Re: ensuring a particular task is deployed to "all" Mesos Worker hosts

2017-07-01 Thread Dick Davies
If it _needs_ to be there always then I'd roll it out with whatever automation you use to deploy the mesos workers ; depending on the scale you're running at launching it as a task is likely to be less reliable due to outages etc. ( I understand the 'maybe all hosts' constraint but if it's 'up to

Re: Mesos (and Marathon) port mapping

2017-03-29 Thread Dick Davies
I should say this was tested around mesos 1.0, they may have changed things - but yes this is vanilla networking, no CNI or anything like that. But I'm guessing if you're using BRIDGE networking and specifying a hostPort: you're causing work for yourself (unless you actually care what port the

Re: Mesos (and Marathon) port mapping

2017-03-28 Thread Dick Davies
Try setting your hostPort to 0, to tell Mesos to select one (which it will allocate out of the pool the mesos slave is set to use). This works for me for redis: { "container": { "type": "DOCKER", "docker": { "image": "redis", "network": "BRIDGE", "portMappings": [
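For anyone wanting the whole shape of such an app definition, here's a rough sketch in Python that builds one with a dynamically assigned host port. The snippet above is cut off at portMappings, so the app id, the cpus/mem values and containerPort 6379 (redis's default) are assumptions for illustration only, not the settings from the original mail.

```python
import json


def redis_app(app_id="/redis-example", container_port=6379):
    """Build a Marathon app definition that lets Mesos pick the host port.

    app_id, cpus, mem, instances and container_port are illustrative
    assumptions; only the image, network type and hostPort 0 come from
    the thread.
    """
    return {
        "id": app_id,
        "cpus": 0.5,
        "mem": 256,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "redis",
                "network": "BRIDGE",
                "portMappings": [
                    # hostPort 0 asks Mesos to allocate a port out of the
                    # pool the slave is set to use.
                    {
                        "containerPort": container_port,
                        "hostPort": 0,
                        "protocol": "tcp",
                    }
                ],
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be submitted to Marathon.
    print(json.dumps(redis_app(), indent=2))
```

The port Mesos actually picked then shows up in the task details, so nothing in the definition needs to know it in advance.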

Re: mirror of mesosphere's repo

2016-09-20 Thread Dick Davies
It's on s3 isn't it - maybe CloudFront? On 20 September 2016 at 05:48, tommy xiao wrote: > Hi Team and Mesosphere's repo, > > can Mesosphere provide a sync server way with http://repos.mesosphere.com/. > it will help china's community to sync the package from mirror repo. > >

Re: Fetcher cache: caching even more while an executor is alive

2016-07-07 Thread Dick Davies
I'd try the Docker image approach. We've done this in the past and used our CM tool to 'seed' all slaves by running 'docker pull foo:v1' across them all in advance, saved a lot of startup time (although we were only dealing with a Gb or so of dependencies). On 5 July 2016 at 11:23, Kota UENISHI

Re: Mesos 0.28.2 does not start

2016-06-13 Thread Dick Davies
> How can i give you a log file do check? > > 2016-06-12 10:42 GMT+02:00 Dick Davies <d...@hellooperator.net>: >> >> Try putting the IP you're binding to (the actual IP on the master) in >> /etc/mesos-*/ip , and the externally accessible IP in >> /etc/mesos-*/h

Re: Mesos 0.28.2 does not start

2016-06-12 Thread Dick Davies
Try putting the IP you're binding to (the actual IP on the master) in /etc/mesos-*/ip , and the externally accessible IP in /etc/mesos-*/hostname. On 12 June 2016 at 00:57, Stefano Bianchi wrote: > ok i guess i figured out. > The reason for which i put floating IP on

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-05 Thread Dick Davies
is started to > only connect with local zk: > $ cat /opt/mesosphere/etc/mesos-master | grep ZK > MESOS_ZK=zk://127.0.0.1:2181/mesos > > So I think I do not have to specify all the zk on each master. > > > > > > > > Thanks, > Qian Zhang > > On Sun, Jun

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-05 Thread Dick Davies
> I tried both: > sudo ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/mesos --quorum=2 > --work_dir=/var/lib/mesos/master > and > sudo ./bin/mesos-master.sh > --zk=zk://192.168.122.132:2181,192.168.122.171:2181,192.168.122.225:2181/mesos > --quorum=2 --work_dir=/var/lib/m

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-04 Thread Dick Davies
You told the master it needed a quorum of 2 and it's the only one online, so it's bombing out. That's the expected behaviour. You need to start at least 2 zookeepers before it will be a functional group, same for the masters. You haven't mentioned how you set up your zookeeper cluster, so I'm

Re: How to add other file systems to an agent

2016-05-03 Thread Dick Davies
I'd imagine it's reporting whatever partition the --work-dir argument on the slave is set to (sandboxes live under that directory). On 3 May 2016 at 12:21, Rinaldo Digiorgio wrote: > Hi, > > I have a configuration with a root file system and other file > systems.

Re: Setting ulimits on mesos-slave

2016-04-25 Thread Dick Davies
Hi June, are you running Mesos as root, or a non-privileged user? Non-root won't be able to up their own ulimit too high (sorry, not an upstart expert as RHEL's is laughably incomplete). On 25 April 2016 at 19:15, June Taylor wrote: > What I'm saying is even putting them within the

Re: removed slace "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Dick Davies
think that's it. On 18 April 2016 at 20:39, Stefano Bianchi <jazzist...@gmail.com> wrote: > Hi Dick Davies > > Could you please share your solution? > How did you set up mesos/Zookeeper to interconnect masters and slaves among > networks? > > Thanks a lot! > >

Re: removed slace "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Dick Davies
+1 for that theory, we had some screwy issues when we tried to span subnets until we set every slave and master to listen on a specific IP so we could tie down routing correctly. Saw very similar symptoms that have been described. On 18 April 2016 at 18:35, Alex Rukletsov

Re: libmesos on alpine linux?

2016-04-17 Thread Dick Davies
MB. > > On Sat, Apr 16, 2016 at 6:52 PM, Shuai Lin <linshuai2...@gmail.com> wrote: >> Take a look at >> http://stackoverflow.com/questions/35614923/errors-compiling-mesos-on-alpine-linux >> , this guy has successfully patched an older version of the mesos to build >

libmesos on alpine linux?

2016-04-16 Thread Dick Davies
Has anyone been able to build libmesos (0.28.x ideally) on Alpine Linux yet? I'm trying to get a smaller spark docker image and though that was straightforward, the docs say I need libmesos in the image to be able to use it (which I find a bit surprising, but it seems to be correct).

Re: Prometheus Exporters on Marathon

2016-04-15 Thread Dick Davies
You are probably building on an older version of Golang - I think the Timeout attribute was added to http.Client around 1.5 or 1.6? On 15 April 2016 at 13:56, June Taylor wrote: > David, > > Thanks for the assistance. How did you get the mesos-exporter installed? > When I tried the

Re: Mesos Task History

2016-04-14 Thread Dick Davies
We just grab them with collectd's mesos plugin and log to Graphite, gives us long term trend details. https://github.com/rayrod2030/collectd-mesos Haven't used this one but it supposedly does per-task metric collection: https://github.com/bobrik/collectd-mesos-tasks On 14 April 2016 at 13:37,

Re: [Proposal] Remove the default value for agent work_dir

2016-04-13 Thread Dick Davies
Oh please yes! On 13 April 2016 at 08:00, Sam wrote: > +1 > > Sent from my iPhone > > On Apr 13, 2016, at 12:44 PM, Avinash Sridharan > wrote: > > +1 > > On Tue, Apr 12, 2016 at 9:31 PM, Jie Yu wrote: >> >> +1 >> >> On Tue, Apr

Re: Slaves not getting registered

2016-04-13 Thread Dick Davies
03:12:24.512336 1715 group.cpp:503] Timed out waiting to connect to > ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration > > W0413 03:12:34.519641 1710 group.cpp:503] Timed out waiting to connect to > ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration > > W

Re: Slaves not getting registered

2016-04-12 Thread Dick Davies
There's no mention of a slave there, have a look at the logs on the slave's filesystem and see if it is giving any errors. On 12 April 2016 at 10:17, wrote: > The GUI log shows like this: > > > > I0412 08:45:51.379609 3616 master.cpp:3673] Processing DECLINE

Re: How to kill tasks when memory exceeds the cgroup limit?

2016-03-19 Thread Dick Davies
o swap, > or maybe we want to swap for non-latency sensitive containers. However, it's > more complicated (the user and operator have to co-operate more, there are > more ways to run things, etc), and so the general advice is to disable swap > to keep things simple and deterministic.

Re: How to kill tasks when memory exceeds the cgroup limit?

2016-03-19 Thread Dick Davies
Great! I'm not really sure why mesos even allows RSS limiting without VMEM, it takes down slaves like the Black Death if you accidentally deploy a 'leaker'. I'm sure there's a use case I'm not seeing :) On 18 March 2016 at 16:27, Shiyao Ma wrote: > Thanks. The limit_swap works.

Re: How to kill tasks when memory exceeds the cgroup limit?

2016-03-19 Thread Dick Davies
Last time I tried (not on the latest release) I also had to have cgroups set to limit swap, otherwise as soon as the process hit the RAM limit it would just start to consume swap. Try adding --cgroups_limit_swap to the slave's startup flags. On 17 March 2016 at 16:21, Shiyao Ma

Re: rkt / appc support

2016-03-16 Thread Dick Davies
document link append in the JIRA ticket. > > Thanks, > > Guangya > > On Wed, Mar 16, 2016 at 5:24 PM, Dick Davies <d...@hellooperator.net> wrote: >> >> Quick question - what versions of Mesos (if any) support rkt/appc? >> >> Saw the announcement of th

rkt / appc support

2016-03-16 Thread Dick Davies
Quick question - what versions of Mesos (if any) support rkt/appc? Saw the announcement of the Unified Containerizer ( http://mesos.apache.org/documentation/container-image/ ) but I wasn't clear if this was a refactoring of existing support, or new functionality.

Re: AW: Feature request: move in-flight containers w/o stopping them

2016-02-19 Thread Dick Davies
Agreed, vMotion always struck me as something for those monolithic apps with a lot of local state. The industry seems to be moving away from that as fast as its little legs will carry it. On 19 February 2016 at 11:35, Jason Giedymin wrote: > Food for thought: > > One

Re: make slaves not getting tasks anymore

2015-12-30 Thread Dick Davies
It sounds like you want to use checkpointing, that should keep the tasks alive as you update the mesos slave process itself. On 30 December 2015 at 11:43, Mike Michel wrote: > Hi, > > > > i need to update slaves from time to time and looking for a way to take them > out of

Re: Mesos masters and zookeeper running together?

2015-12-24 Thread Dick Davies
zookeeper really wants a dedicated cluster IMO; preferably with SSD under it - if zookeeper starts to run slow then everything else will start to bog down. I've co-hosted it with mesos masters before now for demo purposes etc. but for production it's probably worth choosing dedicated hosts. On 24

Re: what's the best way to monitor mesos cluster

2015-11-11 Thread Dick Davies
+1 for the collectd plugin. been using that for about 9 months and it does the job nicely. On 11 November 2015 at 06:59, Du, Fan wrote: > Hi Mesos experts > > There is server and client snapshot metrics in jason format provided by > Mesos itself. > but more often we want to

Re: Cluster Maintanence

2015-10-29 Thread Dick Davies
You might want to look at the maintenance primitives feature in 0.25.0: https://mesos.apache.org/blog/mesos-0-25-0-released/ On 29 October 2015 at 18:19, John Omernik wrote: > I am wondering if there are some easy ways to take a healthy slave/agent > and start a process

Re: How production un-ready are Mesos Cassandra, Spark and Kafka Frameworks?

2015-10-12 Thread Dick Davies
Hi Chris Spark is a Mesos native, I'd have no hesitation running it on Mesos. Cassandra not so much - that's not to disparage the work people are putting in there, I think it's really interesting. But personally with complex beasts like Cassandra I want to be running as 'stock' as possible, as

Re: Java detector for mess masters and leader

2015-07-07 Thread Dick Davies
The active master has a flag set in /metrics/snapshot : master/elected which is 1 for the active master and 0 otherwise, so it's easy enough to only load the metrics from the active master. (I use the collectd plugin and push data rather than poll, but the same principle should apply). On 7
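A small sketch of that filtering step in Python, in case it's useful. It assumes you've already fetched and JSON-decoded /metrics/snapshot from each master into a dict; the hostnames and the extra metric in the sample data are made up for illustration.

```python
def elected_masters(snapshots):
    """Return the hosts whose metrics snapshot reports master/elected == 1.

    snapshots maps a host name to the decoded /metrics/snapshot dict
    for that master.
    """
    return [
        host
        for host, metrics in sorted(snapshots.items())
        if metrics.get("master/elected") == 1
    ]


if __name__ == "__main__":
    # Hypothetical sample data: three masters, one of them elected.
    sample = {
        "master1.example": {"master/elected": 0.0, "master/uptime_secs": 120.0},
        "master2.example": {"master/elected": 1.0, "master/uptime_secs": 340.0},
        "master3.example": {"master/elected": 0.0, "master/uptime_secs": 95.0},
    }
    print(elected_masters(sample))
```

Returning a list rather than a single host makes it obvious when zero or more than one master claims to be elected, which is worth alerting on in its own right.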

Re: Thoughts and opinions in physically building a cluster

2015-06-25 Thread Dick Davies
That doesn't sound too bad (it's a fairly typical setup e.g. on an Amazon VPC). You probably want to avoid NAT or similar things between master and slaves to avoid a lot of LIBPROCESS_IP tricks so same switch sounds good. Personally I quite like the master/slave distinction. I wouldn't want a

Re: How to upgrade mesos version from a running mesos cluster

2015-06-19 Thread Dick Davies
Do the masters first, as described at the link. On 19 June 2015 at 10:17, tommy xiao xia...@gmail.com wrote: Thanks Alex Rukletsov. In my earlier try, the newer mesos slave ( version 0.21.1) can't connect to mesos master (version 0.20.0), So it annoies to me. anyway, i will test again, let me

Re: cluster confusion after zookeeper blip

2015-05-18 Thread Dick Davies
, they should reattach themselves to the respective slaves. Thanks Nikolay -Original Message- From: rasput...@gmail.com [mailto:rasput...@gmail.com] On Behalf Of Dick Davies Sent: Monday, May 18, 2015 5:26 AM To: user@mesos.apache.org Subject: cluster confusion after zookeeper blip

cluster confusion after zookeeper blip

2015-05-18 Thread Dick Davies
We run a 3 node marathon cluster on top of 3 mesos masters + 6 slaves. (mesos 0.21.0, marathon 0.7.5) This morning we had a network outage long enough for everything to lose zookeeper. Now our marathon UI is empty (all 3 marathons think someone else is a master, and marathons 'proxy to leader'

group memory limits are always 'soft' . how do I ensure info-pid.isNone() ?

2015-04-28 Thread Dick Davies
Been banging my head against this for a while now. mesos 0.21.0 , marathon 0.7.5, centos 6 servers. When I enable cgroups (flags are : --cgroups_limit_swap --isolation=cgroups/cpu,groups/mem ) the memory limits I'm setting are reflected in memory.soft_limit_in_bytes but not in

Re: group memory limits are always 'soft' . how do I ensure info-pid.isNone() ?

2015-04-28 Thread Dick Davies
on: https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L393 Ian On Tue, Apr 28, 2015 at 7:42 AM, Dick Davies d...@hellooperator.net wrote: Been banging my head against this for a while now. mesos 0.21.0 , marathon 0.7.5, centos 6 servers. When I

Re: group memory limits are always 'soft' . how do I ensure info-pid.isNone() ?

2015-04-28 Thread Dick Davies
verify this? It should be shortly after the LOG(INFO) on line 358. Ian On Tue, Apr 28, 2015 at 9:54 AM, Dick Davies d...@hellooperator.net wrote: Thanks Ian. Digging around the cgroup there are 3 processes in there; * the mesos-executor * the shell script marathon starts the app

Re: group memory limits are always 'soft' . how do I ensure info-pid.isNone() ?

2015-04-28 Thread Dick Davies
You may very well be right, but I'd like to keep this specific thread focussed on figuring out why the expected/implemented behaviour isn't happening in my case if that's ok. On 28 April 2015 at 19:26, CCAAT cc...@tampabay.rr.com wrote: I really hate to be the 'old fashion computer scientist'

Re: [RESULT][VOTE] Release Apache Mesos 0.22.0 (rc4)

2015-03-25 Thread Dick Davies
Thanks Craig, that's really handy! Dumb question for the list: are there any plans to support multiple isolation flags somehow? I need cgroups, but would really like the disk quota feature too (and network isolation come to that. And a pony). On 25 March 2015 at 01:00, craig w

Re: [RESULT][VOTE] Release Apache Mesos 0.22.0 (rc4)

2015-03-25 Thread Dick Davies
(cgroups/cpu,cgroups/mem,posix/disk) Tim On Wed, Mar 25, 2015 at 12:46 AM, Dick Davies d...@hellooperator.net wrote: Thanks Craig, that's really handy! Dumb question for the list: are there any plans to support multiple isolation flags somehow? I need cgroups, but would really like the disk

Re: mesos-collectd-plugin

2015-03-11 Thread Dick Davies
/plugins/python/ Import mesos-master Module mesos-master Host localhost Port 5050 Verbose false Version 0.21.0 /Module /Plugin Anything wrong with the above settings? Cheers, Dan 2015-03-10 17:21 GMT-05:00 Dick Davies d...@hellooperator.net

Re: Question on Monitoring a Mesos Cluster

2015-03-07 Thread Dick Davies
Yeah, that confused me too - I think that figure is specific to the master/slave polled (and that'll just be the active one since you're only reporting when master/elected is true). I'm using this one https://github.com/rayrod2030/collectd-mesos , not sure if that's the same as yours? On 7

Re: Mesosphere on Centos 6.6

2015-02-05 Thread Dick Davies
This is due to the upstart scripts shipped with the RPM. mesos has been shipping these since at least 0.17.x (as that's when we started using it). Where's the repo to send a PR to correct the docs? On 5 February 2015 at 09:48, Chengwei Yang chengwei.yang...@gmail.com wrote: On Mon, Feb 02, 2015

Re: Is mesos spamming me?

2015-02-01 Thread Dick Davies
The offer is only for 455 Mb of RAM. You can check that in the slave UI, but it looks like you have other tasks running that are using some of that 1863Mb. On 2 February 2015 at 05:11, Hepple, Robert rhep...@tnsi.com wrote: Yeah but ... the slave is reporting 1863Mb RAM and 2 CPUS - so how come

Re: Slave cannot be registered while masters keep switching to another one.

2015-01-28 Thread Dick Davies
Be careful, there's now nothing stopping those 2 masters from forming 2 clusters. Add a third asap. On 28 January 2015 at 08:25, xiaokun xiaokun...@gmail.com wrote: hi, I changed the quorum to 1. Slave can be displayed now! Thanks! 2015-01-28 16:19 GMT+08:00 xiaokun xiaokun...@gmail.com:

Re: how to create rpm package

2015-01-26 Thread Dick Davies
Those RPMs are built for CentOS 6 I think. For testing, you can get it to start up by just dropping in a symlink : /lib64/libsasl2.so.2 -> /lib64/libsasl2.so.3 On 26 January 2015 at 01:33, Yu Wenhua s...@yuwh.net wrote: [root@zone1_0 ~]# uname -a Linux zone1_0 3.10.0-123.el7.x86_64 #1 SMP

Re: cluster wide init

2015-01-23 Thread Dick Davies
On 23 January 2015 at 21:20, Sharma Podila spod...@netflix.com wrote: Here's one possible scenario: A DataCenter runs Databases, Webservers, MicroServices, Hadoop or other batch jobs, stream processing jobs, etc. There's 1000s, if not 100s, of systems available for all of this. Ideally,

Re: how to create rpm package

2015-01-23 Thread Dick Davies
There's an RPM repo, see documentation at: https://mesosphere.com/2014/07/17/mesosphere-package-repositories/ On 23 January 2015 at 09:27, Yu Wenhua s...@yuwh.net wrote: Hi, Can anyone tell me how to build a mesos rpm package? So I can deploy it to slave node easily Thanks. Yu.

Re: hadoop job stuck.

2015-01-16 Thread Dick Davies
To view the slaves logs, you need to be able to connect to that URL from your browser, not the master (the data is read directly from the slave by your browser, it doesn't go via the master). On 15 January 2015 at 21:42, Dan Dong dongda...@gmail.com wrote: Hi, All, Now sandbox could be

Re: conf files location of mesos.

2015-01-07 Thread Dick Davies
Might be worth getting a packaged release for your OS, especially if you're new to this. On 7 January 2015 at 16:53, Dan Dong dongda...@gmail.com wrote: Hi, Brian, It's not there: ls /etc/default/mesos ls: cannot access /etc/default/mesos: No such file or directory I installed mesos from

Re: Problems of running mesos-0.20.0 with zookeeper

2014-11-06 Thread Dick Davies
The quorum flag is for the number of mesos masters, not zookeepers. If you only have one master, it's going to have trouble reaching a quorum of 2 :) Either set --quorum=1 or spin up more masters. On 6 November 2014 21:01, sujinzhao sujinz...@gmail.com wrote: Hi,all, I set up zookeeper

Re: Problems of running mesos-0.20.0 with zookeeper

2014-11-06 Thread Dick Davies
Golden Rule : Don't use even numbers of members with quorum systems. You need a quorum to function so with 2 masters and quorum=2, you can't ever take a member down. With 2 masters and quorum=1, you're asking for split brain. (this is exactly the same with zookeeper by the way, it's also a
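The arithmetic behind that rule, as a quick Python sketch. It assumes the usual strict-majority quorum (more than half the members); the function names are mine, not anything from Mesos or ZooKeeper.

```python
def majority_quorum(members):
    """Smallest quorum that is a strict majority of the members."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a majority quorum still exists."""
    return members - majority_quorum(members)


if __name__ == "__main__":
    # Show quorum size and failure tolerance for small cluster sizes.
    for n in range(1, 8):
        print(n, majority_quorum(n), tolerated_failures(n))
```

Going from 3 members to 4 raises the quorum from 2 to 3 but still only tolerates one failure, so the extra even member buys nothing except one more node that can break.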

Re: Do i really need HDFS?

2014-10-22 Thread Dick Davies
-HDFS sources. On Tue, Oct 21, 2014 at 4:57 AM, Dick Davies d...@hellooperator.net wrote: I think Spark needs a way to send jobs to/from the workers - the Spark distro itself will pull down the executor ok, but in my (very basic) tests I got stuck without HDFS. So basically it depends

Re: Cassandra Mesos Framework Issue

2014-10-19 Thread Dick Davies
Issue seems to be with how the tasks are asking for port resources - I'd guess whichever tutorial you're using may be using an old/invalid syntax. What tutorial are you working from? On 18 October 2014 15:08, David Palaitis david.palai...@twosigma.com wrote: I am having trouble getting

Re: Staging docker task KILLED after 1 minute

2014-10-16 Thread Dick Davies
One gotcha - the marathon timeout is in seconds, so pass '300' in your case. let us know if it works, I spotted this the other day and anecdotally it addresses the issue for some users, be good to get more feedback. On 16 October 2014 09:49, Grzegorz Graczyk gregor...@gmail.com wrote: Make sure

Re: Multiple disks with Mesos

2014-10-08 Thread Dick Davies
To answer point 2) - yes, your executors will create their 'sandboxes' under work_dir. On 8 October 2014 00:13, Arunabha Ghosh arunabha...@gmail.com wrote: Thanks Steven ! On Tue, Oct 7, 2014 at 4:08 PM, Steven Schlansker sschlans...@opentable.com wrote: On Oct 7, 2014, at 4:06 PM,

Re: Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Dick Davies
One thing to check - have you upped --executor_registration_timeout from the default of 1min? a docker pull can easily take longer than that. On 2 October 2014 22:18, Michael Babineau michael.babin...@gmail.com wrote: I'm seeing an issue where tasks are being marked as killed but remain

Re: Build on Amazon Linux

2014-09-26 Thread Dick Davies
What version of docker does that give you, out of interest? mainline EL7 is still shipping a pre-1.0 that won't work with mesos (although since docker is just a static Go binary, it's trivial to overwrite /usr/bin/docker and get everything to work). On 25 September 2014 20:23, John Mickey

Re: Running mesos-slave in Docker container

2014-09-23 Thread Dick Davies
The master is advertising itself as being on 127.0.0.1 - try running it with an --ip flag. On 23 September 2014 11:10, Grzegorz Graczyk gregor...@gmail.com wrote: Thanks for your response! Mounting /sys did the job, cgroups are working, but now mesos-slave is just crushing after detecting

Re: [VOTE] Release Apache Mesos 0.20.1 (rc2)

2014-09-18 Thread Dick Davies
Don't suppose there's any chance of a fix for https://issues.apache.org/jira/browse/MESOS-1195 is there? (I'll settle for a workaround to get mesos running on EL7 somehow, mind) On 18 September 2014 18:18, Adam Bordelon a...@mesosphere.io wrote: Great. I'll roll that into an rc3 today. Any

Re: Sandbox Log Links

2014-09-04 Thread Dick Davies
I don't think that's the issue - i have a custom work_dir too and can see the logs fine. Don't they still get served up from the slaves themselves (port 5051)? Maybe you've got a firewall blocking that from where you're viewing the mesos ui? On 4 September 2014 23:58, John Omernik

Re: Mesos 0.19 registrar upgrade

2014-07-22 Thread Dick Davies
On 22 July 2014 10:40, Tomas Barton barton.to...@gmail.com wrote: I have 4 Mesos masters, which would mean that quorum 2 - quorum=3, right? Yes, that's right. 2 won't be enough. quorum=1, mesos-masters=1 quorum=2, mesos-masters=3 quorum=3, mesos-masters=5 quorum=4, mesos-masters=7 Is

Re: how to update master cluster

2014-07-16 Thread Dick Davies
the things very cleanly when you remove the formation? Though I find their JSON file very difficult to navigate and their Update Feature doesnt seem to work too well.. On Wed, Jul 16, 2014 at 10:46 AM, Dick Davies d...@hellooperator.net wrote: I'd like to show you my playbooks

Re: mesos isolation

2014-07-11 Thread Dick Davies
Are you using cgroups, or the default (posix) isolation? On 11 July 2014 17:06, Asim linka...@gmail.com wrote: Hi, I am running a job on few machines in my Linux cluster. Each machine is an Intel 8 core (with 32 threads). I see a total of 32 CPUs in /etc/cpuinfo and within mesos web

number of masters and quorum

2014-07-01 Thread Dick Davies
I might be wrong but doesn't the new quorum setting mean it only makes sense to run an odd number of masters (a la zookeepers)? i.e. 4 masters is no more resilient than 3 (in fact less so, since you increase your chance of a node failure as number of nodes increases).

Re: Docker support in Mesos core

2014-06-21 Thread Dick Davies
That's fantastic news, really good to see some integration happening between chocolate and peanut butter here. Deimos has been pretty difficult for us to deploy on our platforms (largely down to the python implementation, which has problems on the ancient python EL6 ships with). On 20 June 2014

Re: Failed to perform recovery: Incompatible slave info detected

2014-06-19 Thread Dick Davies
. On Thu, Jun 19, 2014 at 3:03 AM, Dick Davies d...@hellooperator.net wrote: Fab, thanks Vinod. Turns out that feature (different FQDN to serve the ui up on) might well be really useful for us, so every cloud has a silver lining :) back to the metadata feature though - do you know why just the 'id

Re: Failed to perform recovery: Incompatible slave info detected

2014-06-18 Thread Dick Davies
as available. More details/logs would help diagnose the issue. HTH, On Wed, Jun 18, 2014 at 4:26 AM, Dick Davies d...@hellooperator.net wrote: Should have said, the CLI for this is : /usr/local/sbin/mesos-slave --master=zk://10.10.10.105:2181/mesos --log_dir=/var/log/mesos --ip=10.10.10.101

n00b isolation docs?

2014-06-09 Thread Dick Davies
So we're running with default isolation (posix) and thinking about enabling cgroups (mesos 0.17.0 right now but the upgrade to 0.18.2 was seamless in dev. so that'll probably happen too). I just need to justify the effort and extra complexity, so can someone explain briefly * what croup

Re: Log managment

2014-05-16 Thread Dick Davies
I'd try a newer version before you file bugs - but to be honest log rotation is logrotate's job, it's really not very hard to set up. In our stack we run under upstart, so things make it into syslog and we don't have to worry about rotation - scales better too as it's easier to centralize. On 14

Re: how does the web UI get sandbox logs?

2014-05-16 Thread Dick Davies
/#/slaves/201405120912-16777343-5050-23673-0 On Thu, May 8, 2014 at 9:21 AM, Dick Davies d...@hellooperator.net wrote: I've found the sandbox logs to be very useful in debugging misbehaving frameworks, typos, etc. - the usual n00b stuff I suppose. I've got a vagrant stack running quite

Re: protecting mesos from fat fingers

2014-05-02 Thread Dick Davies
can specify taskRateLimit (max number of tasks to start per second) as part of your app definition. On Wed, Apr 30, 2014 at 11:30 AM, Dick Davies d...@hellooperator.net wrote: Managed to take out a mesos slave today with a typo while launching a marathon app, and wondered

Re: How about disable the irc ASFBot to flood the irc channel?

2014-04-17 Thread Dick Davies
Can't you just '/ignore' the IRC bot if it bothers you? On 17 April 2014 03:01, Chengwei Yang chengwei.yang...@gmail.com wrote: Hi All, I am a irc guy, maybe so as you. However, I found that there are two bots for JIRA, one for the mesos-dev mailing list, one for the irc channel. I