RE: Custom python executor with Docker

2015-11-10 Thread Plotka, Bartlomiej
Hi All,

I can see there are many use cases for extending Docker tasks with some
pre/post work.

In our solution we want to prepare and attach some volumes (e.g. Ceph volumes)
to Docker. As you know, we cannot (easily) attach a volume to a running Docker
container, so we need a pre-script that prepares & attaches it within the same
task. Unfortunately, we also need to clean up these volumes afterwards, and it
seems there is no "slavePostLaunchDockerHook" available in Mesos yet.

We've developed a short script for running such tasks using the plain Mesos
containerizer, without any hook module in Mesos. We can just put it in the
Marathon "command" parameter. Do you think a Docker container created this way
will have its CPU shares & memory limited by the task's Mesos resources?

Here is the pseudo-bash code (placeholder values are illustrative):

#!/bin/bash

# Illustrative placeholders; the real values were elided in the original.
CONTAINER_NAME="my-task"
HOST_PORT=10022
IMAGE="my-image"
TASK_CMD="echo hello"

function cleanup {
  # -f so that a still-running container is removed too
  docker rm -f "$CONTAINER_NAME"
}

# SIGKILL and SIGSTOP cannot be trapped, so they are omitted here;
# EXIT makes cleanup also run after a normal completion.
trap cleanup SIGHUP SIGINT SIGTERM EXIT

docker run --name "$CONTAINER_NAME" -p "$HOST_PORT":22 "$IMAGE" bash -c "$TASK_CMD"

We are aware that there wouldn't be any sandbox support or any of the other
Docker containerizer features. However, our main concern is cgroups… We are
not sure whether a Docker container created via this script will inherit our
task's cgroups and, as a result, have its memory and CPUs limited by the
default Mesos task resource parameters.

Does anyone know the answer?

As far as I understand, without special configuration 'docker run' just hands
the request to the Docker daemon, so we have no control over the container's
cgroup…

Kind Regards,
Bartek Plotka (bplotka)

From: Kevin Sweeney [mailto:kevi...@apache.org]
Sent: Wednesday, August 12, 2015 1:52 AM
To: user@mesos.apache.org
Subject: Re: Custom python executor with Docker

Apache Aurora [1] uses a custom Python executor and supports Docker via the
containerizer. There's just one problem: the container has to have a Python 2.7
runtime inside that can run the executor PEX file [2]. So if you're okay with
that restriction you're in business (and you can use the Aurora configuration
DSL to describe setup/teardown steps).

[1] https://aurora.apache.org
[2] https://pex.readthedocs.org/en/latest/

On Tue, Aug 11, 2015 at 4:42 PM, Tim Chen wrote:
So currently there is a review out for pre-hooks 
(https://reviews.apache.org/r/36185/) before a docker container launches.

We can also add a post hook, but we'd like to see if the specified hook
satisfies what you guys are looking for.

Tim

On Tue, Aug 11, 2015 at 4:28 PM, Tom Fordon wrote:
We ended up implementing a solution where we did the pre/post steps as separate
mesos tasks and added logic to our scheduler to ensure they were run on the
same machine.  If anybody knows of a standard / openly available DockerExecutor
like what is described below, my team would be greatly interested.


On Fri, Aug 7, 2015 at 4:01 AM, Kapil Malik wrote:
Hi,

We have a similar use case while running multi-user workloads on mesos. Users
provide docker images encapsulating application logic, which we (we = say some
"Central API") schedule on Chronos / Marathon. However, we need to run some
standard pre / post steps for every docker submitted by users. We have the
following options:


1.   Ask every user to embed their logic inside a pre-defined docker
template which will perform pre/post steps.

==> This is error prone, makes us dependent on whether the users followed the
template, and is not very popular with users either.



2.   Extend every user docker (FROM <>) and find a way to add pre-post
steps in our docker. Refer to this docker when scheduling on chronos / marathon.

==> Building new dockers does not scale as users and applications grow.



3.   Write a custom executor which will perform the pre-post steps and 
manage the user docker lifetime.

==> Deals with user docker lifetime and is obviously complex.

Is there a standard / openly available DockerExecutor which manages the docker
lifetime and which I can extend to build my custom executor? This way I will be
concerned only with my custom logic (pre/post steps) and still get the benefits
of a standard way to manage docker containers.

Btw, thanks for the meaningful discussion below, it is very helpful.

Thanks and regards,

Kapil Malik | kma...@adobe.com | 33430 / 8800836581

From: James DeFelice 
[mailto:james.defel...@gmail.com]
Sent: 09 April 2015 18:12
To: user@mesos.apache.org
Subject: Re: Custom python executor with Docker

If you can run the pre/post steps in a container then I'd recommend building a 
Docker image that includes your pre/post step scripting + your algorithm and 
launching it using the built-in mesos Docker containerizer. It's much simpler 
than managing the lifetime of the Docker container yourself.


RE: Custom python executor with Docker

2015-11-10 Thread Plotka, Bartlomiej
This is somehow possible using Kubernetes over Mesos: 
https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/mesos.md

Kind Regards,
Bartek Plotka

From: Aaron Carey [mailto:aca...@ilm.com]
Sent: Tuesday, November 10, 2015 4:33 PM
To: user@mesos.apache.org
Subject: RE: Custom python executor with Docker

We would also be interested in some sort of standardised DockerExecutor which 
would allow us to add pre and post launch steps.

Also having the ability to run two containers together as one task would be
very useful (i.e. on the same host and linked together).


RE: Custom python executor with Docker

2015-11-10 Thread Aaron Carey
Yeah.. it'd be nice to do it natively though :)



RE: Custom python executor with Docker

2015-11-10 Thread Aaron Carey
We would also be interested in some sort of standardised DockerExecutor which 
would allow us to add pre and post launch steps.

Also having the ability to run two containers together as one task would be
very useful (i.e. on the same host and linked together).



On Thu, Apr 9, 2015 at 8:29 AM, Tom Fordon wrote:
Thanks for all the responses, I really appreciate the help.  Let me try to
state my problem more clearly.

Our project is performing file-based data processing.  I would like to keep the 
actual algorithm as contained as possible since we are in an R setting and 
will be getting untested code.  We have some pre/post steps that need to be run 
on the same box as the actual algorithm: downloading/uploading files and 
database calls.

We can run the pre/post steps and algorithm within the same container.  The 
algorithm will be a little less contained, but it will work.

Docker letting you specify a cgroup parent is really exciting.  If I invoke a
docker container with the executor as the cgroup-parent, are there any other
steps I need to perform?  Would I need to do anything special to make mesos
aware of the resource usage, or is that handled since the docker process would
be in the executor's cgroup?

Thanks again,
Tom

On Tue, Apr 7, 2015 at 8:10 PM, Timothy Chen wrote:
Hi Tom(s),

Tom Arnfeld is right: if you want to launch your own docker container
in your custom executor you will have to handle all the issues
yourself, and you won't be able to use the Docker containerizer at all.

Alternatively, you can have Mesos launch your custom executor in a
Docker container, by specifying the ContainerInfo in the
ExecutorInfo.
What this means is that your custom executor is already running in a
docker container, and you can do your custom logic afterwards. This
does mean you can't simply launch multiple containers from the
executor anymore.

If there is something you want to do that doesn't fit these, let us know
what you're trying to achieve and we can see what we can do.

Tim

On Tue, Apr 7, 2015 at 4:15 PM, Tom Arnfeld wrote:
> It's not possible to invoke the docker containerizer from outside of Mesos,
> as far as I know.
>
> If you pursue this route, you can run into issues with orphaned containers
> as your executor may die for some unknown reason, and the container is still
> running.

Re: Zookeeper cluster changes

2015-11-10 Thread Donald Laidlaw
I agree, you want to apply the changes gradually so as not to lose a quorum. 
The problem is automating this so that it happens in a lights-out environment, 
in the cloud, without some poor slob's pager going off in the middle of the 
night :)

While health checks can detect and replace a dead server reliably on any number
of clouds, the new server comes up with a new IP address. This server can
reliably join the zookeeper ensemble. However, it is tough to automate the
rolling restart of the other mesos servers, both masters and slaves, that needs
to occur to keep them happy.

One thing I have not tried is to just ignore the change and use something to
detect the masters just prior to starting mesos. If they truly fail fast when
they lose a zookeeper connection, then maybe it doesn't matter that they were
started with an out-of-date list of zookeeper servers.

What do mesos-master and mesos-slave do with a list of zookeeper servers to
connect to? Just try them in order until one works, then use that one until it
fails? If so, and it fails fast, then letting it continue to run with a stale
list will have no ill effects. Or does it keep trying the servers in the list
when a connection fails?
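For reference, this is the zk:// form both daemons accept (hostnames
illustrative); the ZooKeeper client library picks any live server from the
list and fails over to another member when its session drops:

mesos-master --zk=zk://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/mesos \
             --quorum=2 --work_dir=/var/lib/mesos
mesos-slave  --master=zk://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/mesos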

Don Laidlaw


> On Nov 10, 2015, at 4:42 AM, Erik Weathers wrote:
> 
> Keep in mind that mesos is designed to "fail fast".  So when there are 
> problems (such as losing connectivity to the resolved ZooKeeper IP) the 
> daemon(s) (master & slave) die.
> 
> Due to this design, we are all supposed to run the mesos daemons under 
> "supervision", which means auto-restart after they crash.  This can be done 
> with monit/god/runit/etc.
> 
> So, to perform maintenance on ZooKeeper, I would firstly ensure the 
> mesos-master processes are running under "supervision" so that they restart 
> quickly after a ZK connectivity failure occurs.  Then proceed with standard 
> ZooKeeper maintenance (exhibitor-based or manual), pausing between downing of 
> ZK servers to ensure you have "enough" mesos-master processes running.  (I 
> *would* say a "pausing until you have a quorum of mesos-masters up", but if 
> you only have 2 of 3 up and then take down the ZK that the leader is 
> connected to, that would be temporarily bad.  So I'd make sure they're all 
> up.)
> 
> - Erik
> 
> On Mon, Nov 9, 2015 at 11:07 PM, Marco Massenzio wrote:
> The way I would do it in a production cluster would be *not* to use directly 
> IP addresses for the ZK ensemble, but instead rely on some form of internal 
> DNS and use internally-resolvable hostnames (eg, {zk1, zk2, 
> ...}.prod.example.com etc) and have the 
> provisioning tooling (Chef, Puppet, Ansible, what have you) handle the 
> setting of the hostname when restarting/replacing a failing/crashed ZK server.
> 
> This way your list of zk's to Mesos never changes, even though the FQN's will 
> map to different IPs / VMs.
> 
> Obviously, this may not be always desirable / feasible (eg, if your prod 
> environment does not support DNS resolution).
> 
> You are correct in that Mesos does not currently support dynamically changing 
> the ZK's addresses, but I don't know whether that's a limitation of Mesos 
> code or of the ZK C++ client driver.
> I'll look into it and let you know what I find (if anything).
> 
> --
> Marco Massenzio
> Distributed Systems Engineer
> http://codetrips.com 
> 
> On Mon, Nov 9, 2015 at 6:01 AM, Donald Laidlaw wrote:
> How do mesos masters and slaves react to zookeeper cluster changes? When the 
> masters and slaves start they are given a set of addresses to connect to 
> zookeeper. But over time, one of those zookeepers fails, and is replaced by a 
> new server at a new address. How should this be handled in the mesos servers?
> 
> I am guessing that mesos does not automatically detect and react to that 
> change. But obviously we should do something to keep the mesos servers happy 
> as well. What should we do?
> 
> The obvious thing is to stop the mesos servers, one at a time, and restart 
> them with the new configuration. But it would be really nice to be able to do 
> this dynamically without restarting the server. After all, coordinating a 
> rolling restart is a fairly hard job.
> 
> Any suggestions or pointers?
> 
> Best regards,
> Don Laidlaw
> 
> 
> 
> 



Re: spark on mesos with docker issue

2015-11-10 Thread Rad Gruchalski
Stavros,  

As mentioned a couple of weeks ago: 
https://issues.apache.org/jira/browse/SPARK-11638
Happy to answer any questions.

Kind regards,

Radek Gruchalski
ra...@gruchalski.com
de.linkedin.com/in/radgruchalski/



On Thursday, 22 October 2015 at 22:45, Stavros Kontopoulos wrote:

> Thnx Rad, sounds pretty cool :). Elizabeth, one note for the jira ticket: i do 
> not run the cluster with zookeeper, i use mesos master in standalone mode... 
> i guess it makes no difference, right?
>  
> On Thu, Oct 22, 2015 at 10:38 PM, Rad Gruchalski wrote:
> > There are 2 things:  
> >  
> >  - Akka remote in 2.3.x does not support advertising hostname / port 
> > different to what it binds to
> >  - All other services: file server, broadcast server, repl class server do 
> > not support advertising hostnames / ports different than what they bind to
> >  
> > Just to expand on the previous one, we are in the process of contributing 
> > the following bits:  
> >  
> >  - akka-remote bind-hostname and bind-port backport to akka 2.3.x (not the 
> > typesafe closed support implementation, our own implementation)
> >  - spark patches for spark 1.4.0+ which enable running Spark on Mesos in 
> > Docker Bridge networking
> >  
> > Unfortunately, due to the nature of how my employer operates I can’t share 
> > the code yet. We are working with our legal team to make these available 
> > asap.
> >  
> > We do run this stuff in production.  
> >
> > Kind regards,
> > Radek Gruchalski
> > ra...@gruchalski.com
> > de.linkedin.com/in/radgruchalski/
> >
> >  
> >  
> >  
> > On Thursday, 22 October 2015 at 22:28, Iulian Dragoș wrote:
> >  
> > >  
> > >  
> > > On Thu, Oct 22, 2015 at 9:10 PM, Rad Gruchalski wrote:
> > > > Stavros,  
> > > >  
> > > > Spark does not support this. I am currently in the process of 
> > > > submitting patches for it; however, it first has to pass through the 
> > > > legal team at the company I work for.
> > >  
> > > What exactly is missing in Spark?
> > >   
> > > >
> > > > Kind regards,
> > > > Radek Gruchalski
> > > > ra...@gruchalski.com
> > > > de.linkedin.com/in/radgruchalski/
> > > >
> > > >  
> > > >  
> > > >  
> > > > On Thursday, 22 October 2015 at 21:08, Stavros Kontopoulos wrote:
> > > >  
> > > > > Bridge... with the latest mesos library version 0.25...
> > > > >  
> > > > > On Thu, Oct 22, 2015 at 9:07 PM, Elizabeth Lingg wrote:
> > > > > > Are you using Bridge or Host Networking?
> > > > > >  
> > > > > > -Elizabeth
> > > > > >  
> > > > > >  
> > > > > >  
> > > > > > On Thu, Oct 22, 2015 at 12:02 PM, Stavros Kontopoulos wrote:
> > > > > > > Hi,
> > > > > > >  
> > > > > > > I'm using spark on mesos on docker. I have linked my slaves to the 
> > > > > > > master and a
> > > > > > > spark repl works fine inside the master container.
> > > > > > >  
> > > > > > > If i try to create the same spark repl from the host i get stuck 
> > > > > > > at the point when the framework tries to register with the mesos 
> > > > > > > master (here the framework is the spark repl itself).
> > > > > > > I can ping the container from my host and vice versa. So 
> > > > > > > networking is not the problem.
> > > > > > > What i noticed from the logs is that mesos does not resolve the 
> > > > > > > correct ip:
> > > > > > >  


Re: Mesos and Zookeeper TCP keepalive

2015-11-10 Thread tommy xiao
Same here, same question as Erik. Could you please give more background info?
Thanks.

2015-11-10 15:56 GMT+08:00 Erik Weathers:

> It would really help if you (Jeremy) explained the *actual* problem you
> are facing.  I'm *guessing* that it's a firewall timing out the sessions
> because there isn't activity on them for whatever the timeout of the
> firewall is?   It seems likely to be unreasonably short, given that mesos
> has constant activity between master and
> slave/agent/whatever-it-is-being-called-nowadays-but-not-really-yet-maybe-someday-for-reals.
>
> - Erik
>
> On Mon, Nov 9, 2015 at 10:00 PM, Jojy Varghese  wrote:
>
>> Hi Jeremy
>>  It's great that you are making progress, but I doubt this is what you
>> intend to achieve, since network failures are a valid state in distributed
>> systems. If you think there is a special case you are trying to solve, I
>> suggest proposing a design document for review.
>>   For ZK client code, I would suggest asking the zookeeper mailing list.
>>
>> thanks
>> -Jojy
>>
>> On Nov 9, 2015, at 7:56 PM, Jeremy Olexa  wrote:
>>
>> Alright, great, I'm making some progress,
>>
>> I did a simple copy/paste modification and recompiled mesos. The
>> keepalive timer is set from slave to master so this is an improvement for
>> me. I didn't test the other direction yet -
>> https://gist.github.com/jolexa/ee9e152aa7045c558e02 - I'd like to file
>> an enhancement request for this since it seems like an improvement for
>> other people as well, after some real world testing
>>
>> I'm having a harder time figuring out the zk client code. I started by
>> modifying build/3rdparty/zookeeper-3.4.5/src/c/zookeeper.c but either a) my
>> change wasn't correct or b) I'm modifying the wrong file, since I
>> just assumed it uses the C client. Is this the correct place?
>>
>> Thanks much,
>> Jeremy
>>
>>
>> --
>> *From:* Jojy Varghese 
>> *Sent:* Monday, November 9, 2015 2:09 PM
>> *To:* user@mesos.apache.org
>> *Subject:* Re: Mesos and Zookeeper TCP keepalive
>>
>> Hi Jeremy
>>  The “network” code is at
>> "3rdparty/libprocess/include/process/network.hpp” ,
>> "3rdparty/libprocess/src/poll_socket.hpp/cpp”.
>>
>> thanks
>> jojy
>>
>>
>> On Nov 9, 2015, at 6:54 AM, Jeremy Olexa  wrote:
>>
>> Hi all,
>>
>> Jojy, That is correct, but more specifically a keepalive timer from slave
>> to master and slave to zookeeper. Can you send a link to the portion of the
>> code that builds the socket/connection? Is there any reason to not set the
>> SO_KEEPALIVE option in your opinion?
>>
>> hasodent, I'm not looking for keepalive between zk quorum members, like
>> the ZOOKEEPER JIRA is referencing.
>>
>> Thanks,
>> Jeremy
>>
>>
>> --
>> *From:* Jojy Varghese 
>> *Sent:* Sunday, November 8, 2015 8:37 PM
>> *To:* user@mesos.apache.org
>> *Subject:* Re: Mesos and Zookeeper TCP keepalive
>>
>> Hi Jeremy
>>   Are you trying to establish a keepalive timer between mesos master and
>> mesos slave? If so, I don’t believe its possible today as SO_KEEPALIVE
>> option is  not set on an accepting socket.
>>
>> -Jojy
>>
>> On Nov 8, 2015, at 8:43 AM, haosdent  wrote:
>>
>> I think the keepalive option should be set in Zookeeper, not in Mesos. See
>> this related issue in Zookeeper.
>> https://issues.apache.org/jira/browse/ZOOKEEPER-2246?focusedCommentId=14724085&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14724085
>>
>> On Sun, Nov 8, 2015 at 4:47 AM, Jeremy Olexa wrote:
>>
>>> Hello all,
>>>
>>> We have been fighting some network/session disconnection issues between
>>> datacenters and I'm curious if there is any way to enable tcp keepalive on
>>> the zookeeper/mesos sockets? If there were a way, then the sysctl tcp
>>> kernel settings would be used. I believe keepalive has to be enabled by the
>>> software which is opening the connection. (That is my understanding anyway)
>>>
>>> Here is what I see via netstat --timers -tn:
>>> tcp0  0 172.18.1.1:55842  10.10.1.1:2181
>>>  ESTABLISHED off (0.00/0/0)
>>> tcp0  0 172.18.1.1:49702  10.10.1.1:5050
>>>  ESTABLISHED off (0.00/0/0)
>>>
>>>
>>> Where 172 is the mesos-slave network and 10 is the mesos-master network.
>>> The "off" keyword means that keepalive's are not being sent.
>>>
>>> I've trolled through JIRA, git, etc and cannot easily determine if this
>>> is expected behavior or should be an enhancement request. Any ideas?
>>>
>>> Thanks much!
>>> -Jeremy
>>>
>>>
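For reference, the sysctl TCP kernel settings referred to above are these
(values shown are the usual Linux defaults; a sketch, not tuned
recommendations):

sysctl net.ipv4.tcp_keepalive_time    # 7200s of idle before the first probe
sysctl net.ipv4.tcp_keepalive_intvl   # 75s between probes
sysctl net.ipv4.tcp_keepalive_probes  # 9 failed probes, then the connection is reset
# e.g. to probe well inside a one-hour firewall session timeout:
sudo sysctl -w net.ipv4.tcp_keepalive_time=600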
>>
>>
>> --
>> Best Regards,
>> Haosdent Huang
>>
>>
>>
>


-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com


Re: Failed to authenticate

2015-11-10 Thread Pradeep Kiruvale
This issue occurs only on CentOS 7; on Ubuntu it's working fine.

Any idea?

Regards,
Pradeep


On 9 November 2015 at 17:32, Pradeep Kiruvale wrote:

> Hi All,
>
> I am getting an authentication issue on my mesos cluster.
>
> Please find the slave side and master side logs.
>
> Regards,
> Pradeep
>
> *Slave  logs *
>
> W1110 01:54:18.641191 111550 slave.cpp:877] Authentication timed out
> W1110 01:54:18.641309 111550 slave.cpp:841] Failed to authenticate with
> master master@192.168.0.102:5050: Authentication discarded
> I1110 01:54:18.641355 111550 slave.cpp:792] Authenticating with master
> master@192.168.0.102:5050
> I1110 01:54:18.641369 111550 slave.cpp:797] Using default CRAM-MD5
> authenticatee
> I1110 01:54:18.641616 111539 authenticatee.cpp:123] Creating new client
> SASL connection
> W1110 01:54:23.646075 111555 slave.cpp:877] Authentication timed out
> W1110 01:54:23.646205 111555 slave.cpp:841] Failed to authenticate with
> master master@192.168.0.102:5050: Authentication discarded
> I1110 01:54:23.646266 111555 slave.cpp:792] Authenticating with master
> master@192.168.0.102:5050
> I1110 01:54:23.646286 111555 slave.cpp:797] Using default CRAM-MD5
> authenticatee
> I1110 01:54:23.646406 111544 authenticatee.cpp:123] Creating new client
> SASL connection
> W1110 01:54:28.651070 111554 slave.cpp:877] Authentication timed out
> W1110 01:54:28.651206 111554 slave.cpp:841] Failed to authenticate with
> master master@192.168.0.102:5050: Authentication discarded
> I1110 01:54:28.651257 111554 slave.cpp:792] Authenticating with master
> master@192.168.0.102:5050
>
>
> *Master logs*
>
> E1109 17:27:36.455260 27950 process.cpp:1911] Failed to shutdown socket
> with fd 11: Transport endpoint is not connected
> W1109 17:27:36.455517 27949 master.cpp:5177] Failed to authenticate
> slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
> E1109 17:27:36.455602 27950 process.cpp:1911] Failed to shutdown socket
> with fd 12: Transport endpoint is not connected
> I1109 17:27:41.459787 27946 master.cpp:5150] Authenticating slave(1)@
> 192.168.0.169:5051
> I1109 17:27:41.460211 27946 authenticator.cpp:100] Creating new server
> SASL connection
> E1109 17:27:41.460376 27950 process.cpp:1911] Failed to shutdown socket
> with fd 11: Transport endpoint is not connected
> W1109 17:27:41.460578 27947 master.cpp:5177] Failed to authenticate
> slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
> E1109 17:27:41.460695 27950 process.cpp:1911] Failed to shutdown socket
> with fd 12: Transport endpoint is not connected
> I1109 17:27:46.460510 27948 master.cpp:5150] Authenticating slave(1)@
> 192.168.0.169:5051
> I1109 17:27:46.460930 27944 authenticator.cpp:100] Creating new server
> SASL connection
> E1109 17:27:46.461139 27950 process.cpp:1911] Failed to shutdown socket
> with fd 11: Transport endpoint is not connected
> W1109 17:27:46.461392 27944 master.cpp:5177] Failed to authenticate
> slave(1)@192.168.0.169:5051: Failed to communicate with authenticatee
> E1109 17:27:46.461444 27950 process.cpp:1911] Failed to shutdown socket
> with fd 12: Transport endpoint is not connected
> I1109 17:27:51.466349 27945 master.cpp:5150] Authenticating slave(1)@
> 192.168.0.169:5051
> I1109 17:27:51.466747 27945 authenticator.cpp:100] Creating new server
> SASL connection
>
>


Re: mess slave can't register to master via master ip:port

2015-11-10 Thread Xiaodong Zhang
What should I do in this scenario:

A slave registers to the master with --master=masterip:masterport.

After that, the master nodes change their leader.

I found the mesos-slave can't register to the master anymore. So it seems 
masterip:masterport is not a PROD-READY choice.

Does that mean slaves have to register to the master via zk?

If we use zk, how should mesos secure the communication when my master and 
slaves talk to each other via public IPs?
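As the replies below confirm, --master only accepts the following forms (a
sketch; the comma-separated IP list above is not among them):

mesos-slave --master=192.0.2.10:5050                        # single master; breaks on failover
mesos-slave --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos  # via ZK; follows the elected leader
mesos-slave --master=file:///etc/mesos/zk_url               # file containing one of the above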


From: Guangya Liu
Reply-To: "user@mesos.apache.org"
Date: Tuesday, November 3, 2015, 2:10 PM
To: "user@mesos.apache.org"
Subject: Re: mess slave can't register to master via master ip:port

I filed a jira ticket https://issues.apache.org/jira/browse/MESOS-3822 to trace 
this. Thanks.

On Tue, Nov 3, 2015 at 2:02 PM, haosdent wrote:
I think it is not correct.

On Tue, Nov 3, 2015 at 12:44 PM, Xiaodong Zhang wrote:
If that's so, I think this document should be modified.

http://mesos.apache.org/documentation/latest/configuration/#SlaveOptions


Right?


From: Guangya Liu
Reply-To: "user@mesos.apache.org"
Date: Tuesday, November 3, 2015, 12:39 PM
To: "user@mesos.apache.org"
Subject: Re: mess slave can't register to master via master ip:port
主题: Re: mess slave can't register to master via master ip:port

Seems mesos does not support such mode, please refer to 
https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L105-L111 for 
the format of "--master". Thanks!

On Tue, Nov 3, 2015 at 12:28 PM, haosdent 
> wrote:
After checking the code, it seems Mesos only supports --master=IP1:5050,
--master=zk://xx, or --master=file:///.

On Tue, Nov 3, 2015 at 12:15 PM, haosdent wrote:
Are your masters already managed by zookeeper? And what is your master
start command?

On Tue, Nov 3, 2015 at 12:06 PM, Xiaodong Zhang wrote:
Hi all:

My slave command like this:

/usr/sbin/mesos-slave --master=IP1:5050,IP2:5050,IP3:5050 …. --credential …

Only if IP1 is the leader can the slave register to the master successfully;
otherwise registration fails.

Slave log like this:

Creating new client SASL connection
Authentication timed out
Failed to authenticate with master 
master@172.31.43.77:5050: Authentication 
discarded
Authenticating with master 
master@172.31.43.77:5050
Using default CRAM-MD5 authenticatee

Is this a bug?Or it is designed like this.

BTW: --master=zk://xxx works well.



--
Best Regards,
Haosdent Huang



--
Best Regards,
Haosdent Huang




--
Best Regards,
Haosdent Huang



Re: mess slave can't register to master via master ip:port

2015-11-10 Thread haosdent
How about using ZooKeeper ACLs?
https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_ZooKeeperAccessControl
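A sketch of what that could look like, assuming ZooKeeper digest auth and that
Mesos' zk:// URLs accept credentials in user:pass@ form (names and secrets
illustrative):

# In zkCli.sh, authenticate and then restrict the /mesos znode:
#   addauth digest mesos:secret
#   setAcl /mesos auth::cdrwa
# Then point the daemons at ZK with the same digest credentials:
mesos-master --zk=zk://mesos:secret@zk1:2181,zk2:2181,zk3:2181/mesos --quorum=2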


-- 
Best Regards,
Haosdent Huang


Re: Failed to authenticate

2015-11-10 Thread Klaus Ma
Would you help log a JIRA to trace it?


Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform Symphony/DCOS Development & Support, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Tue, Nov 10, 2015 at 5:12 PM, Pradeep Kiruvale wrote:

> This issue occurs only on CentOS 7; on Ubuntu it's working fine.
>
> Any idea?
>
> Regards,
> Pradeep


Re: Custom python executor with Docker

2015-11-10 Thread Weitao
A customized framework with Mesos persistent volumes can get you the desired
feature, IMO; the volume can be recovered even after the framework has
deregistered.


> On Nov 10, 2015, at 23:43, Aaron Carey wrote:
> 
> Yeah.. it'd be nice to do it natively though :)
> 

Re: Mesos and Zookeeper TCP keepalive

2015-11-10 Thread Joris Van Remoortere
Hi Jeremy,

Can you read the description of these parameters on the master, and possibly
share your values for these flags?

It seems from the re-registration attempt on the agent, that the master has
already treated the agent as "failed", and so will tell it to shut down on
any re-registration attempt.

I'm curious if there is a conflict (or too narrow of a time gap) of
timeouts in your environment to allow re-registration by the agent after
the agent notices it needs to re-establish the connection.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Nov 10, 2015 at 5:02 AM, Jeremy Olexa wrote:

> Hi Tommy, Erik, all,
>
>
> You are correct in your assumption that I'm trying to solve for a one hour
> session expire time on a firewall. For some more background info, our
> master cluster is in datacenter X, the slaves in X will stay "up" for days
> and days. The slaves in a different datacenter, Y, connected to that master
> cluster will stay "up" for about a few days and restart. The master cluster
> is healthy, with a stable leader for months (no flapping), same for the ZK
> "leader". There are about 35 slaves in datacenter Y. Maybe the firewall
> session timer is a red herring because the slave restart is seemingly
> random (the slave with the highest uptime is 6 days, but a handful only
> have uptime of a day)
>
>
> I've started debugging this awhile ago, and the gist of the logs is here:
> https://gist.github.com/jolexa/1a80e26a4b017846d083 I've posted this back
> in October seeking help and Benjamin suggested network issues in both
> directions, so I thought firewall.
>
>
> Thanks for any hints,
>
> Jeremy
>

what's the best way to monitor mesos cluster

2015-11-10 Thread Du, Fan

Hi Mesos experts

There are server and client snapshot metrics in JSON format provided by Mesos
itself, but often we want to extend the metrics a bit beyond that.
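For reference, the built-in snapshot endpoints can be polled like this (hosts
illustrative):

curl -s http://master-host:5050/metrics/snapshot | python -m json.tool
curl -s http://slave-host:5051/metrics/snapshot | python -m json.tool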

I have been looking into this for a couple of days, and https://collectd.org/
came to my attention; it also has a mesos plugin:
https://github.com/rayrod2030/collectd-mesos.


Is there any recommended open source project for this task?
Thanks.

