If it _needs_ to be there always then I'd roll it out with whatever
automation you use to deploy the Mesos workers; depending on the scale
you're running at, launching it as a task is likely to be less reliable
due to outages etc.
( I understand the 'maybe all hosts' constraint but if it's 'up to
I should say this was tested around mesos 1.0, they may have changed
things - but yes this is vanilla networking, no CNI or anything like that.
But I'm guessing if you're using BRIDGE networking and specifying a
hostPort: you're causing work for yourself (unless you actually care what
port the
Try setting your hostPort to 0, to tell Mesos to select one
(which it will allocate out of the pool the mesos slave is set to use).
This works for me for redis:
{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "redis",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 6379, "hostPort": 0 }
      ]
    }
  }
}
It's on s3 isn't it - maybe CloudFront?
On 20 September 2016 at 05:48, tommy xiao wrote:
> Hi Team and Mesosphere's repo,
>
> could Mesosphere provide a way to sync with http://repos.mesosphere.com/?
> It would help China's community sync packages from a mirror repo.
>
>
I'd try the Docker image approach.
We've done this in the past and used our CM tool to 'seed' all slaves
by running 'docker pull foo:v1' across them all in advance, which saved a
lot of startup time (although we were only dealing with a GB or so of
dependencies).
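A minimal sketch of that seeding step, assuming a plain SSH loop (the host list and the foo:v1 tag are placeholders; we actually drove it from our CM tool):

```shell
# Build a placeholder host list, then print the pull we'd run per host.
# Swap 'echo' for 'ssh "$host"' (or your CM tool) to do it for real.
printf '%s\n' slave1 slave2 slave3 > /tmp/slaves.txt
while read -r host; do
  echo "$host: docker pull foo:v1"
done < /tmp/slaves.txt
```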
On 5 July 2016 at 11:23, Kota UENISHI
> How can I give you a log file to check?
>
> 2016-06-12 10:42 GMT+02:00 Dick Davies <d...@hellooperator.net>:
>>
>> Try putting the IP you're binding to (the actual IP on the master) in
>> /etc/mesos-*/ip , and the externally accessible IP in
>> /etc/mesos-*/h
Try putting the IP you're binding to (the actual IP on the master) in
/etc/mesos-*/ip , and the externally accessible IP in
/etc/mesos-*/hostname.
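With the stock packages that boils down to something like this (the IP and hostname below are placeholders, not values from the thread):

```shell
# One flag value per file; the init wrapper passes these to mesos-master.
echo "192.168.122.132"     > /etc/mesos-master/ip        # address actually bound
echo "master1.example.com" > /etc/mesos-master/hostname  # externally reachable name
```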
On 12 June 2016 at 00:57, Stefano Bianchi wrote:
> ok i guess i figured out.
> The reason for which i put floating IP on
is started to
> only connect with local zk:
> $ cat /opt/mesosphere/etc/mesos-master | grep ZK
> MESOS_ZK=zk://127.0.0.1:2181/mesos
>
> So I think I do not have to specify all the zk on each master.
>
> Thanks,
> Qian Zhang
>
> On Sun, Jun
> I tried both:
> sudo ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/mesos --quorum=2
> --work_dir=/var/lib/mesos/master
> and
> sudo ./bin/mesos-master.sh
> --zk=zk://192.168.122.132:2181,192.168.122.171:2181,192.168.122.225:2181/mesos
> --quorum=2 --work_dir=/var/lib/m
You told the master it needed a quorum of 2 and it's the only one
online, so it's bombing out.
That's the expected behaviour.
You need to start at least 2 zookeepers before it will be a functional
group, same for the masters.
You haven't mentioned how you setup your zookeeper cluster, so i'm
I'd imagine it's reporting whatever partition the --work_dir argument
on the slave is set to (sandboxes live under that directory).
On 3 May 2016 at 12:21, Rinaldo Digiorgio wrote:
> Hi,
>
> I have a configuration with a root file system and other file
> systems.
Hi June
are you running Mesos as root, or a non-privileged user? A non-root user
won't be able to raise its own ulimit very high
(sorry, not an upstart expert as RHEL's is laughably incomplete).
On 25 April 2016 at 19:15, June Taylor wrote:
> What I'm saying is even putting them within the
think that's it.
On 18 April 2016 at 20:39, Stefano Bianchi <jazzist...@gmail.com> wrote:
> Hi Dick Davies
>
> Could you please share your solution?
> How did you set up mesos/Zookeeper to interconnect masters and slaves among
> networks?
>
> Thanks a lot!
>
>
+1 for that theory, we had some screwy issues when we tried to span
subnets until we set every slave and master
to listen on a specific IP so we could tie down routing correctly.
Saw very similar symptoms to those described.
On 18 April 2016 at 18:35, Alex Rukletsov
MB.
>
> On Sat, Apr 16, 2016 at 6:52 PM, Shuai Lin <linshuai2...@gmail.com> wrote:
>> Take a look at
>> http://stackoverflow.com/questions/35614923/errors-compiling-mesos-on-alpine-linux
>> , this guy has successfully patched an older version of the mesos to build
>
Has anyone been able to build libmesos (0.28.x ideally) on Alpine Linux yet?
I'm trying to get a smaller spark docker image and thought that was
straightforward, but the docs say I need libmesos in the image to be able
to use it (which I find a bit surprising, but it seems to be correct).
You are probably building on an older version of Golang - I think the
Timeout attribute was added to http.Client around 1.5 or 1.6?
On 15 April 2016 at 13:56, June Taylor wrote:
> David,
>
> Thanks for the assistance. How did you get the mesos-exporter installed?
> When I tried the
We just grab them with collectd's mesos plugin and log to Graphite,
which gives us long-term trend details.
https://github.com/rayrod2030/collectd-mesos
Haven't used this one but it supposedly does per-task metric collection:
https://github.com/bobrik/collectd-mesos-tasks
On 14 April 2016 at 13:37,
Oh please yes!
On 13 April 2016 at 08:00, Sam wrote:
> +1
>
> Sent from my iPhone
>
> On Apr 13, 2016, at 12:44 PM, Avinash Sridharan
> wrote:
>
> +1
>
> On Tue, Apr 12, 2016 at 9:31 PM, Jie Yu wrote:
>>
>> +1
>>
>> On Tue, Apr
03:12:24.512336 1715 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> W0413 03:12:34.519641 1710 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> W
There's no mention of a slave there; have a look at the logs on the
slave's filesystem and see if it is giving any errors.
On 12 April 2016 at 10:17, wrote:
> The GUI log shows like this:
>
>
>
> I0412 08:45:51.379609 3616 master.cpp:3673] Processing DECLINE
o swap,
> or maybe we want to swap for non-latency sensitive containers. However, it's
> more complicated (the user and operator have to co-operate more, there are
> more ways to run things, etc), and so the general advice is to disable swap
> to keep things simple and deterministic.
&
Great!
I'm not really sure why mesos even allows RSS limiting without VMEM,
it takes down slaves like the Black Death
if you accidentally deploy a 'leaker'. I'm sure there's a use case I'm
not seeing :)
On 18 March 2016 at 16:27, Shiyao Ma wrote:
> Thanks. The limit_swap works.
Last time I tried (not on the latest release) I also had to have
cgroups set to limit swap, otherwise
as soon as the process hit the RAM limit it would just start to consume swap.
try adding --cgroups_limit_swap to the slaves startup flags.
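For reference, a sketch of the full slave invocation with both pieces in place (the ZK address and work_dir are placeholders):

```shell
# cgroups isolators plus hard swap limiting on a slave
mesos-slave --master=zk://zk1:2181/mesos \
            --work_dir=/var/lib/mesos \
            --isolation=cgroups/cpu,cgroups/mem \
            --cgroups_limit_swap
```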
On 17 March 2016 at 16:21, Shiyao Ma
document link append in the JIRA ticket.
>
> Thanks,
>
> Guangya
>
> On Wed, Mar 16, 2016 at 5:24 PM, Dick Davies <d...@hellooperator.net> wrote:
>>
>> Quick question - what versions of Mesos (if any) support rkt/appc?
>>
>> Saw the announcement of th
Quick question - what versions of Mesos (if any) support rkt/appc?
Saw the announcement of the Unified Containerizer
( http://mesos.apache.org/documentation/container-image/ )
but I wasn't clear if this was a refactoring of existing support, or
new functionality.
Agreed, vMotion always struck me as something for those monolithic
apps with a lot of local state.
The industry seems to be moving away from that as fast as its little
legs will carry it.
On 19 February 2016 at 11:35, Jason Giedymin wrote:
> Food for thought:
>
> One
It sounds like you want to use checkpointing, that should keep the
tasks alive as you update
the mesos slave process itself.
On 30 December 2015 at 11:43, Mike Michel wrote:
> Hi,
>
>
>
> i need to update slaves from time to time and looking for a way to take them
> out of
zookeeper really wants a dedicated cluster IMO; preferably with SSD
under it - if zookeeper
starts to run slow then everything else will start to bog down. I've
co-hosted it with mesos masters
before now for demo purposes etc. but for production it's probably
worth choosing dedicated hosts.
On 24
+1 for the collectd plugin. Been using that for about 9 months
and it does the job nicely.
On 11 November 2015 at 06:59, Du, Fan wrote:
> Hi Mesos experts
>
> There are server and client snapshot metrics in JSON format provided by
> Mesos itself,
> but more often we want to
You might want to look at the maintenance primitives feature in 0.25.0:
https://mesos.apache.org/blog/mesos-0-25-0-released/
On 29 October 2015 at 18:19, John Omernik wrote:
> I am wondering if there are some easy ways to take a healthy slave/agent
> and start a process
Hi Chris
Spark is a Mesos native, I'd have no hesitation running it on Mesos.
Cassandra not so much -
that's not to disparage the work people are putting in there, I think
it's really interesting. But personally with complex beasts like Cassandra
I want to be running as 'stock' as possible, as
The active master has a flag set in /metrics/snapshot :
master/elected which is 1 for the active
master and 0 otherwise, so it's easy enough to only load the metrics
from the active master.
(I use the collectd plugin and push data rather than poll, but the
same principle should apply).
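A rough sketch of gating on that flag; the inline payload stands in for `curl -s http://master:5050/metrics/snapshot` and its exact shape is an assumption:

```shell
# Only treat this master's metrics as authoritative when it is elected.
snapshot='{"master/elected": 1.0, "master/uptime_secs": 123.4}'
elected=$(printf '%s' "$snapshot" \
  | python3 -c 'import sys, json; print(int(json.load(sys.stdin)["master/elected"]))')
if [ "$elected" -eq 1 ]; then
  echo "active master - keep these metrics"
fi
```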
On 7
That doesn't sound too bad (it's a fairly typical setup e.g. on an Amazon VPC).
You probably want to avoid NAT or similar things between master and
slaves to avoid
a lot of LIBPROCESS_IP tricks so same switch sounds good.
Personally I quite like the master/slave distinction.
I wouldn't want a
Do the masters first, as described at the link.
On 19 June 2015 at 10:17, tommy xiao xia...@gmail.com wrote:
Thanks Alex Rukletsov. In my earlier try, the newer mesos slave (version
0.21.1) couldn't connect to the mesos master (version 0.20.0), so that was
annoying. Anyway, I will test again, let me
, they should reattach themselves to the
respective slaves.
Thanks
Nikolay
-Original Message-
From: rasput...@gmail.com [mailto:rasput...@gmail.com] On Behalf Of Dick
Davies
Sent: Monday, May 18, 2015 5:26 AM
To: user@mesos.apache.org
Subject: cluster confusion after zookeeper blip
We run a 3 node marathon cluster on top of 3 mesos masters + 6 slaves.
(mesos 0.21.0, marathon 0.7.5)
This morning we had a network outage long enough for everything to
lose zookeeper.
Now our marathon UI is empty (all 3 marathons think someone else is a
master, and
marathons 'proxy to leader'
Been banging my head against this for a while now.
mesos 0.21.0 , marathon 0.7.5, centos 6 servers.
When I enable cgroups (flags are: --cgroups_limit_swap
--isolation=cgroups/cpu,cgroups/mem) the memory limits I'm setting
are reflected in memory.soft_limit_in_bytes but not in
on:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L393
Ian
On Tue, Apr 28, 2015 at 7:42 AM, Dick Davies d...@hellooperator.net wrote:
Been banging my head against this for a while now.
mesos 0.21.0 , marathon 0.7.5, centos 6 servers.
When I
verify this? It should be shortly
after the LOG(INFO) on line 358.
Ian
On Tue, Apr 28, 2015 at 9:54 AM, Dick Davies d...@hellooperator.net wrote:
Thanks Ian.
Digging around the cgroup there are 3 processes in there;
* the mesos-executor
* the shell script marathon starts the app
You may very well be right, but I'd like to keep this specific thread
focussed on figuring
out why the expected/implemented behaviour isn't happening in my case
if that's ok.
On 28 April 2015 at 19:26, CCAAT cc...@tampabay.rr.com wrote:
I really hate to be the 'old fashion computer scientist'
Thanks Craig, that's really handy!
Dumb question for the list: are there any plans to support multiple
isolation flags somehow?
I need cgroups, but would really like the disk quota feature too (and
network isolation come to that.
And a pony).
On 25 March 2015 at 01:00, craig w
(cgroups/cpu,cgroups/mem,posix/disk)
Tim
On Wed, Mar 25, 2015 at 12:46 AM, Dick Davies d...@hellooperator.net
wrote:
Thanks Craig, that's really handy!
Dumb question for the list: are there any plans to support multiple
isolation flags somehow?
I need cgroups, but would really like the disk
/plugins/python/
Import "mesos-master"
<Module mesos-master>
    Host "localhost"
    Port 5050
    Verbose false
    Version "0.21.0"
</Module>
</Plugin>
Anything wrong with the above settings?
Cheers,
Dan
2015-03-10 17:21 GMT-05:00 Dick Davies d...@hellooperator.net
Yeah, that confused me too - I think that figure is specific to the
master/slave polled
(and that'll just be the active one, since you're only reporting when
master/elected is true).
I'm using this one https://github.com/rayrod2030/collectd-mesos , not
sure if that's
the same as yours?
On 7
This is due to the upstart scripts shipped with the RPM.
mesos has been shipping these since at least 0.17.x (as
that's when we started using it).
Where's the repo to send a PR to correct the docs?
On 5 February 2015 at 09:48, Chengwei Yang chengwei.yang...@gmail.com wrote:
On Mon, Feb 02, 2015
The offer is only for 455 MB of RAM. You can check that in the slave UI,
but it looks like you have other tasks running that are using some of that
1863 MB.
On 2 February 2015 at 05:11, Hepple, Robert rhep...@tnsi.com wrote:
Yeah but ... the slave is reporting 1863 MB RAM and 2 CPUs - so how come
Be careful, there's now nothing stopping those 2 masters from forming
2 clusters.
Add a third asap.
On 28 January 2015 at 08:25, xiaokun xiaokun...@gmail.com wrote:
hi, I changed the quorum to 1. Slave can be displayed now!
Thanks!
2015-01-28 16:19 GMT+08:00 xiaokun xiaokun...@gmail.com:
Those RPMs are built for CentOS 6 i think.
For testing, you can get it to start up by just dropping in a symlink:
/lib64/libsasl2.so.2 -> /lib64/libsasl2.so.3
On 26 January 2015 at 01:33, Yu Wenhua s...@yuwh.net wrote:
[root@zone1_0 ~]# uname -a
Linux zone1_0 3.10.0-123.el7.x86_64 #1 SMP
On 23 January 2015 at 21:20, Sharma Podila spod...@netflix.com wrote:
Here's one possible scenario:
A DataCenter runs Databases, Webservers, MicroServices, Hadoop or other
batch jobs, stream processing jobs, etc. There's 1000s, if not 100s, of
systems available for all of this. Ideally,
There's an RPM repo, see documentation at:
https://mesosphere.com/2014/07/17/mesosphere-package-repositories/
On 23 January 2015 at 09:27, Yu Wenhua s...@yuwh.net wrote:
Hi,
Can anyone tell me how to build a mesos rpm package? So I can deploy it to
slave node easily
Thanks.
Yu.
To view the slaves logs, you need to be able to connect to that URL
from your browser, not the master
(the data is read directly from the slave by your browser, it doesn't
go via the master).
On 15 January 2015 at 21:42, Dan Dong dongda...@gmail.com wrote:
Hi, All,
Now sandbox could be
Might be worth getting a packaged release for your OS, especially
if you're new to this.
On 7 January 2015 at 16:53, Dan Dong dongda...@gmail.com wrote:
Hi, Brian,
It's not there:
ls /etc/default/mesos
ls: cannot access /etc/default/mesos: No such file or directory
I installed mesos from
The quorum flag is for the number of mesos masters, not zookeepers.
if you only have one master, it's going to have trouble reaching a
quorum of 2 :)
either set --quorum=1 or spin up more masters.
On 6 November 2014 21:01, sujinzhao sujinz...@gmail.com wrote:
Hi,all,
I set up zookeeper
Golden Rule: don't use even numbers of members with quorum systems.
You need a quorum to function, so with 2 masters and quorum=2 you can't
ever take a member down. With 2 masters and quorum=1, you're asking
for split brain.
(this is exactly the same with zookeeper by the way, it's also a
-HDFS sources.
On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies d...@hellooperator.net wrote:
I think Spark needs a way to send jobs to/from the workers - the Spark
distro itself
will pull down the executor ok, but in my (very basic) tests I got
stuck without HDFS.
So basically it depends
Issue seems to be with how the tasks are asking for port resources -
I'd guess whichever
tutorial you're using may be using an old/invalid syntax.
What tutorial are you working from?
On 18 October 2014 15:08, David Palaitis david.palai...@twosigma.com wrote:
I am having trouble getting
One gotcha - the marathon timeout is in seconds, so pass '300' in your case.
let us know if it works - I spotted this the other day and anecdotally
it addresses the issue for some users; it'd be good to get more feedback.
On 16 October 2014 09:49, Grzegorz Graczyk gregor...@gmail.com wrote:
Make sure
To answer point 2) - yes, your executors will create their 'sandboxes'
under work_dir.
On 8 October 2014 00:13, Arunabha Ghosh arunabha...@gmail.com wrote:
Thanks Steven !
On Tue, Oct 7, 2014 at 4:08 PM, Steven Schlansker
sschlans...@opentable.com wrote:
On Oct 7, 2014, at 4:06 PM,
One thing to check - have you upped
--executor_registration_timeout
from the default of 1min? a docker pull can easily take longer than that.
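A sketch of that flag (10mins is an arbitrary example; Mesos accepts duration suffixes like secs/mins):

```shell
# Give slow docker pulls longer than the 1min default before the slave
# gives up waiting for the executor to register.
mesos-slave --executor_registration_timeout=10mins \
            --containerizers=docker,mesos
```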
On 2 October 2014 22:18, Michael Babineau michael.babin...@gmail.com wrote:
I'm seeing an issue where tasks are being marked as killed but remain
What version of docker does that give you, out of interest?
mainline EL7 is still shipping a pre-1.0 that won't work with mesos
(although since docker is just a static Go binary, it's trivial to overwrite
/usr/bin/docker and get everything to work).
On 25 September 2014 20:23, John Mickey
The master is advertising itself as being on 127.0.0.1 - try running
it with an --ip flag.
On 23 September 2014 11:10, Grzegorz Graczyk gregor...@gmail.com wrote:
Thanks for your response!
Mounting /sys did the job, cgroups are working, but now mesos-slave is just
crashing after detecting
Don't suppose there's any chance of a fix for
https://issues.apache.org/jira/browse/MESOS-1195
is there?
(I'll settle for a workaround to get mesos running on EL7 somehow, mind)
On 18 September 2014 18:18, Adam Bordelon a...@mesosphere.io wrote:
Great. I'll roll that into an rc3 today. Any
I don't think that's the issue - i have a custom work_dir too and can
see the logs fine.
Don't they still get served up from the slaves themselves (port 5051)?
Maybe you've got
a firewall blocking that from where you're viewing the mesos ui?
On 4 September 2014 23:58, John Omernik
On 22 July 2014 10:40, Tomas Barton barton.to...@gmail.com wrote:
I have 4 Mesos masters, which would mean that quorum > 2, i.e. quorum=3, right?
Yes, that's right. 2 won't be enough.
quorum=1, mesos-masters=1
quorum=2, mesos-masters=3
quorum=3, mesos-masters=5
quorum=4, mesos-masters=7
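That table is just majority quorum; a tiny sketch of the arithmetic:

```shell
# Majority quorum for n masters: floor(n/2) + 1
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 4   # prints 3 - the same quorum 5 masters need, so a 4th buys nothing
quorum 5   # prints 3
```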
Is
the things very cleanly when you remove the formation? Though I find
their JSON file very difficult to navigate and their Update Feature doesn't
seem to work too well...
On Wed, Jul 16, 2014 at 10:46 AM, Dick Davies d...@hellooperator.net
wrote:
I'd like to show you my playbooks
Are you using cgroups, or the default (posix) isolation?
On 11 July 2014 17:06, Asim linka...@gmail.com wrote:
Hi,
I am running a job on few machines in my Linux cluster. Each machine is an
Intel 8 core (with 32 threads). I see a total of 32 CPUs in /proc/cpuinfo and
within mesos web
I might be wrong but doesn't the new quorum setting mean
it only makes sense to run an odd number of masters
(a la zookeepers)?
i.e. 4 masters is no more resilient than 3 (in fact less so, since
you increase your chance of a node failure as number of nodes
increases).
That's fantastic news, really good to see some integration happening
between chocolate and peanut butter
here. Deimos has been pretty difficult for us to deploy on our
platforms (largely down to the python implementation,
which has problems on the ancient python EL6 ships with).
On 20 June 2014
.
On Thu, Jun 19, 2014 at 3:03 AM, Dick Davies d...@hellooperator.net wrote:
Fab, thanks Vinod. Turns out that feature (different FQDN to serve the ui
up on)
might well be really useful for us, so every cloud has a silver lining :)
back to the metadata feature though - do you know why just the 'id
as available. More details/logs would help diagnose
the issue.
HTH,
On Wed, Jun 18, 2014 at 4:26 AM, Dick Davies d...@hellooperator.net wrote:
Should have said, the CLI for this is :
/usr/local/sbin/mesos-slave --master=zk://10.10.10.105:2181/mesos
--log_dir=/var/log/mesos --ip=10.10.10.101
So we're running with default isolation (posix)
and thinking about enabling cgroups (mesos 0.17.0
right now but the upgrade to 0.18.2 was seamless
in dev. so that'll probably happen too).
I just need to justify the effort and extra complexity,
so can someone explain briefly
* what cgroup
I'd try a newer version before you file bugs - but to be honest log rotation
is logrotate's job, it's really not very hard to set up.
In our stack we run under upstart, so things make it into syslog and we
don't have to worry about rotation - scales better too as it's easier to
centralize.
On 14
/#/slaves/201405120912-16777343-5050-23673-0
On Thu, May 8, 2014 at 9:21 AM, Dick Davies d...@hellooperator.net
wrote:
I've found the sandbox logs to be very useful in debugging
misbehaving frameworks, typos, etc. - the usual n00b stuff I suppose.
I've got a vagrant stack running quite
can specify taskRateLimit (max number of tasks to start
per second) as part of your app definition.
On Wed, Apr 30, 2014 at 11:30 AM, Dick Davies d...@hellooperator.net
wrote:
Managed to take out a mesos slave today with a typo while launching
a marathon app, and wondered
Can't you just '/ignore' the IRC bot if it bothers you?
On 17 April 2014 03:01, Chengwei Yang chengwei.yang...@gmail.com wrote:
Hi All,
I am an IRC guy, maybe you are too. However, I found that there are two
bots for JIRA, one for the mesos-dev mailing list, one for the IRC
channel.
I