Re: mesos agent not recovering after ZK init failure

2016-02-23 Thread Sharma Podila
Hi Ben, let me know if a new issue has been created for this; I would like to add myself as a watcher. Thanks. On Wed, Feb 10, 2016 at 9:54 AM, Sharma Podila wrote: > Hi Ben, > > That is accurate, with one additional line: > > -Agent running fine with 0.24.1 > -Transient

Re: Safe update of agent attributes

2016-02-23 Thread Vinod Kone
On Tue, Feb 23, 2016 at 12:59 PM, Zameer Manji wrote: > Is incompatible slave info signaled by a certain exit code? > Not currently, but we could. A naive/hacky implementation could look at log lines.
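A minimal Python sketch of the log-scanning approach Vinod describes. It assumes a supervisor that captures the agent's log output and, if it sees a marker indicating incompatible slave info, relaunches the agent with `--recover=cleanup` (the flag discussed in this thread). The marker string `INCOMPATIBLE_MARKER` is a hypothetical placeholder, not the verified Mesos log message, and the sample log lines and ZooKeeper host are made up for illustration.

```python
from typing import Iterable, List

# Hypothetical marker: replace it with the exact text your Mesos agent
# logs when it detects incompatible slave info during recovery.
INCOMPATIBLE_MARKER = "Incompatible slave info"


def next_agent_args(base_args: List[str], log_lines: Iterable[str],
                    marker: str = INCOMPATIBLE_MARKER) -> List[str]:
    """Return the argv for the next agent launch.

    If any captured log line contains the marker, drop any existing
    --recover flag and append --recover=cleanup; otherwise relaunch
    with the original arguments unchanged.
    """
    if not any(marker in line for line in log_lines):
        return list(base_args)
    # Match the flag name exactly so that similarly named flags such as
    # --recovery_timeout are kept.
    args = [a for a in base_args if a.split("=", 1)[0] != "--recover"]
    args.append("--recover=cleanup")
    return args


if __name__ == "__main__":
    base = ["mesos-slave", "--master=zk://zk1:2181/mesos", "--recover=reconnect"]
    # Made-up sample log lines, for illustration only.
    logs = ["I0223 recovering state", "E0223 Incompatible slave info detected"]
    print(next_agent_args(base, logs))
```

Note that, per the Mesos agent flag documentation, `cleanup` kills any old live executors and then exits, so a wrapper like this would still need to restart the agent in normal mode afterwards, and it does not avoid the task loss Zhitao is concerned about.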

Re: Safe update of agent attributes

2016-02-23 Thread Zameer Manji
Is incompatible slave info signaled by a certain exit code? On Tue, Feb 23, 2016 at 11:15 AM, Vinod Kone wrote: > > On Tue, Feb 23, 2016 at 8:44 AM, Zhitao Li wrote: > >> Can we consider adding a new option like "--auto_recovery_cleanup" which >> would

Re: Safe update of agent attributes

2016-02-23 Thread Vinod Kone
On Tue, Feb 23, 2016 at 8:44 AM, Zhitao Li wrote: > Can we consider adding a new option like "--auto_recovery_cleanup" which > would automatically perform the cleanup if it detected incompatible slave > info, or changing the default behavior for "--recover"? > Wouldn't you want to

Re: Mesos 0.25 not increasing Staged/Started counters in the UI

2016-02-23 Thread haosdent
Hi, I am not sure whether your problem is related to this issue: https://issues.apache.org/jira/browse/MESOS-3282. Could you also check the "master/tasks_staging" value in the master's "/metrics/snapshot" endpoint? I think the "Staged" value comes from there. On Wed, Feb 24, 2016 at 1:23 AM,
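To follow haosdent's suggestion, a minimal Python sketch that reads the task-state gauges from the master's `/metrics/snapshot` endpoint. It assumes the master listens on its default port 5050 at a hypothetical host `mesos-master.example`, and that the snapshot is a flat JSON object keyed by metric name (for example `master/tasks_staging`). Parsing is kept separate from fetching so it can be checked against a saved snapshot.

```python
import json
from typing import Dict, Optional
from urllib.request import urlopen

# Task-state gauges exposed in the Mesos master's metrics snapshot.
TASK_KEYS = ("master/tasks_staging", "master/tasks_starting", "master/tasks_running")


def task_counters(snapshot_json: str) -> Dict[str, Optional[float]]:
    """Extract task-state gauges from a /metrics/snapshot JSON body.

    Missing keys map to None, so "metric not reported" is not confused
    with "metric reported as zero".
    """
    snapshot = json.loads(snapshot_json)
    return {key: snapshot.get(key) for key in TASK_KEYS}


def fetch_snapshot(master: str = "http://mesos-master.example:5050") -> str:
    """Fetch the raw snapshot JSON (hypothetical host, default port)."""
    with urlopen(master + "/metrics/snapshot", timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Against a live master this would be called as `task_counters(fetch_snapshot())`. If `master/tasks_staging` is non-zero while the UI still shows 0, the discrepancy is likely in the UI rather than in the master's accounting.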

Mesos 0.25 not increasing Staged/Started counters in the UI

2016-02-23 Thread Geoffroy Jabouley
Hello, since we moved to Mesos 0.25, we have noticed that in the left column of the UI, in the TASKS section, the counters for Staged and Started tasks are always equal to 0. Is this normal, or is it a known issue? With 0.22.1, the Started counter was always zero, but at least

Re: Safe update of agent attributes

2016-02-23 Thread Zhitao Li
Hi Adam, the command `mesos-slave --recover=cleanup` could indeed be used to clean up after an incompatible change. I am still concerned about the possibility that a perfectly valid change to attribute or resource values could leave the Mesos agent in a crash loop, losing critical tasks after

Re: Can Marathon ensure single instance of a service at any given time?

2016-02-23 Thread Shuai Lin
> > If I would like to allow it to restart on any node in a cluster, can I use > Marathon to simplify the implementation, or does it warrant a more involved > implementation using ZooKeeper? Does Mesos provide any other helpers to simplify > this use case? Marathon can do that, but be aware that there is
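A minimal Python sketch of the Marathon side: an app definition with `instances` set to 1 and an `upgradeStrategy` of `minimumHealthCapacity: 0` and `maximumOverCapacity: 0`, so that on a deployment Marathon stops the old task before starting the new one instead of briefly running both. The app id, command, and resource sizes are hypothetical. This only limits overlap during deployments; it is not a distributed lock, so a leader-election guard such as Curator's LeaderSelector may still be needed if strict mutual exclusion matters (for example, if a task on a partitioned agent keeps running).

```python
import json
from typing import Any, Dict


def single_instance_app(app_id: str, cmd: str) -> Dict[str, Any]:
    """Build a Marathon app definition that runs exactly one instance.

    With minimumHealthCapacity=0 and maximumOverCapacity=0, Marathon
    kills the old instance before launching its replacement during a
    deployment, so old and new are not meant to run side by side.
    """
    return {
        "id": app_id,
        "cmd": cmd,
        "instances": 1,
        "cpus": 0.1,  # hypothetical resource sizing
        "mem": 128,
        "upgradeStrategy": {
            "minimumHealthCapacity": 0,
            "maximumOverCapacity": 0,
        },
    }


if __name__ == "__main__":
    # Hypothetical app id and command.
    app = single_instance_app("/leader-worker", "./run-worker.sh")
    print(json.dumps(app, indent=2))
```

The resulting JSON would be submitted to Marathon's `/v2/apps` REST endpoint; Marathon then relaunches the task elsewhere in the cluster if it or its agent fails.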

Re: Can Marathon ensure single instance of a service at any given time?

2016-02-23 Thread Guangya Liu
Marathon is designed for long-running services, whether stateless or stateful, and provides HA by default. Marathon will restart your services if they go down. Thanks, Guangya On Tue, Feb 23, 2016 at 6:06 PM, Petr Novak wrote: > Hello, > if I need to run

Can Marathon ensure single instance of a service at any given time?

2016-02-23 Thread Petr Novak
Hello, suppose I need to run a single stateless instance, or only a single leader doing work, at any given time; something I would typically implement using ZooKeeper with Curator's LeaderSelector. Can I use Marathon to ensure this without having to implement mutual exclusion myself? Let's assume that other parts of

Re: Mesos integration with OpenStack HEAT AutoScaling

2016-02-23 Thread Petr Novak
Thanks, everybody, for the answers. Our use case is to run our BigData platform on top of Mesos: a pretty classic setup of Kafka, Spark, ES, and HDFS plus custom services, with future extensions for other components as decided. It seems doable with the Magnum API. We will currently run mostly barebone but