[jira] [Commented] (MESOS-7271) JNI SIGSEGV failed when connecting spark to mesos master

2017-05-18 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016089#comment-16016089
 ] 

Michael Gummelt commented on MESOS-7271:


No, Oracle 1.8.0_112

> JNI SIGSEGV failed when connecting spark to mesos master
> 
>
> Key: MESOS-7271
> URL: https://issues.apache.org/jira/browse/MESOS-7271
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 1.1.0, 1.2.0
> Environment: Ubuntu 16.04, OpenJDK 8, Spark 2.1.1
>Reporter: Qi Cui
>
> Run starting. Expected test count is: 1
> SampleDataFrameTest:
> 17/03/20 11:53:16 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0320 11:53:19.775842  4679 process.cpp:1071] libprocess is initialized on 
> 192.168.0.99:38293 with 8 worker threads
> I0320 11:53:19.775975  4679 logging.cpp:199] Logging to STDERR
> I0320 11:53:19.789871  4725 sched.cpp:226] Version: 1.1.0
> I0320 11:53:19.832826  4717 sched.cpp:330] New master detected at 
> master@192.168.0.50:5050
> I0320 11:53:19.838253  4717 sched.cpp:341] No credentials provided. 
> Attempting to register without authentication
> I0320 11:53:19.838337  4717 sched.cpp:820] Sending SUBSCRIBE call to 
> master@192.168.0.50:5050
> I0320 11:53:19.840265  4717 sched.cpp:853] Will retry registration in 
> 32.354951ms if necessary
> I0320 11:53:19.844734  4717 sched.cpp:743] Framework registered with 
> 6e147824-5d88-411b-9c09-a7137565c309-0001
> I0320 11:53:19.864850  4717 sched.cpp:757] Scheduler::registered took 
> 20.022604ms
> ERROR: exception pending on entry to FindMesosClass()
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7ffa06fea4a6, pid=4677, tid=0x7ff9a1a46700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build 
> 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x6744a6]
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /media/sf_G_DRIVE/src/spark-testing-base/hs_err_pid4677.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7271) JNI SIGSEGV failed when connecting spark to mesos master

2017-03-24 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940744#comment-15940744
 ] 

Michael Gummelt commented on MESOS-7271:


I don't know, but I've been running Spark 2.1 against Mesos 1.2 w/o any 
problems, so I can't repro this.

> JNI SIGSEGV failed when connecting spark to mesos master
> 
>
> Key: MESOS-7271
> URL: https://issues.apache.org/jira/browse/MESOS-7271
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Affects Versions: 1.1.0, 1.2.0
> Environment: Ubuntu 16.04, OpenJDK 8, Spark 2.1.1
>Reporter: Qi Cui
>
> Run starting. Expected test count is: 1
> SampleDataFrameTest:
> 17/03/20 11:53:16 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0320 11:53:19.775842  4679 process.cpp:1071] libprocess is initialized on 
> 192.168.0.99:38293 with 8 worker threads
> I0320 11:53:19.775975  4679 logging.cpp:199] Logging to STDERR
> I0320 11:53:19.789871  4725 sched.cpp:226] Version: 1.1.0
> I0320 11:53:19.832826  4717 sched.cpp:330] New master detected at 
> master@192.168.0.50:5050
> I0320 11:53:19.838253  4717 sched.cpp:341] No credentials provided. 
> Attempting to register without authentication
> I0320 11:53:19.838337  4717 sched.cpp:820] Sending SUBSCRIBE call to 
> master@192.168.0.50:5050
> I0320 11:53:19.840265  4717 sched.cpp:853] Will retry registration in 
> 32.354951ms if necessary
> I0320 11:53:19.844734  4717 sched.cpp:743] Framework registered with 
> 6e147824-5d88-411b-9c09-a7137565c309-0001
> I0320 11:53:19.864850  4717 sched.cpp:757] Scheduler::registered took 
> 20.022604ms
> ERROR: exception pending on entry to FindMesosClass()
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x7ffa06fea4a6, pid=4677, tid=0x7ff9a1a46700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build 
> 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
> # Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.so+0x6744a6]
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core 
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /media/sf_G_DRIVE/src/spark-testing-base/hs_err_pid4677.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> #



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-6875) Copy backend fails to copy container

2017-01-06 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6875:
--

 Summary: Copy backend fails to copy container
 Key: MESOS-6875
 URL: https://issues.apache.org/jira/browse/MESOS-6875
 Project: Mesos
  Issue Type: Bug
  Components: agent, containerization
Affects Versions: 1.1.0
Reporter: Michael Gummelt


cc [~gilbert]

I get the following error when trying to launch a custom executor in 
mgummelt/couchbase:latest (which is just ubuntu:14.04 with {{erl}} installed).

{code}
E0106 19:43:18.759450  3597 slave.cpp:4562] Container 
'c1958040-3ca0-4d46-ab32-0c307919be9b' for executor 
'server__5cebe7d5-28c3-465c-a442-0ecd49364e62' of framework 
dbf21cd6-e559-45cf-a159-704aa10d2482-0002 failed to start: Collect failed: 
Failed to copy layer: cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Africa/Lusaka':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Africa/Mbabane':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/America/Curacao':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Katmandu':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Kuwait':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Thimphu':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Asia/Urumqi':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Atlantic/St_Helena':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Lord_Howe':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/North':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Sydney':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Australia/Tasmania':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Pacific/Easter':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Pacific/Saipan':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/Zulu':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/right/Africa/Lusaka':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/right/Africa/Mbabane':
 Too many levels of symbolic links
cp: cannot stat 
'/var/lib/mesos/slave/provisioner/containers/c1958040-3ca0-4d46-ab32-0c307919be9b/backends/copy/rootfses/e838669b-c728-4609-961e-218584210909/usr/share/zoneinfo/right/America/Curacao':
 Too many levels of symbolic 

[jira] [Created] (MESOS-6874) Agent silently ignores FS isolation when protobuf is malformed

2017-01-06 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6874:
--

 Summary: Agent silently ignores FS isolation when protobuf is 
malformed
 Key: MESOS-6874
 URL: https://issues.apache.org/jira/browse/MESOS-6874
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Michael Gummelt


cc [~vinodkone]

I accidentally set my Mesos ContainerInfo to include a DockerInfo instead of a 
MesosInfo:

{code}
executorInfoBuilder.setContainer(
 Protos.ContainerInfo.newBuilder()
 .setType(Protos.ContainerInfo.Type.MESOS)
 .setDocker(Protos.ContainerInfo.DockerInfo.newBuilder()
 .setImage(podSpec.getContainer().get().getImageName())));
{code}

I would have expected a validation error before or during containerization, but 
instead the agent silently ignored filesystem isolation altogether and launched 
my executor on the host filesystem. 
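
For reference, a minimal sketch of what the intended MesosInfo-based 
configuration presumably looks like (same builder chain as above, with the 
image specified via {{Protos.Image}} rather than {{DockerInfo}}):

{code}
// Sketch only: MESOS containerizer with a Docker *image*, which is
// what filesystem isolation expects for a custom executor.
executorInfoBuilder.setContainer(
    Protos.ContainerInfo.newBuilder()
        .setType(Protos.ContainerInfo.Type.MESOS)
        .setMesos(Protos.ContainerInfo.MesosInfo.newBuilder()
            .setImage(Protos.Image.newBuilder()
                .setType(Protos.Image.Type.DOCKER)
                .setDocker(Protos.Image.Docker.newBuilder()
                    .setName(podSpec.getContainer().get().getImageName())))));
{code}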



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6754) Include command in task's state.json entry

2016-12-07 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6754:
--

 Summary: Include command in task's state.json entry
 Key: MESOS-6754
 URL: https://issues.apache.org/jira/browse/MESOS-6754
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Michael Gummelt


I'd often like to determine which command a task is running w/o having to 
SSH into the box and run {{ps}}.  I'm currently doing this for HDFS, for example.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6113) Offer Quota resources as revocable

2016-09-02 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459148#comment-15459148
 ] 

Michael Gummelt commented on MESOS-6113:


Maybe.  It's not clear to me from the title or the description that MESOS-4392 
is proposing to mark quota as revocable. 

> Offer Quota resources as revocable
> --
>
> Key: MESOS-6113
> URL: https://issues.apache.org/jira/browse/MESOS-6113
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> *Goal:*
> I have high-priority Spark jobs, and best-effort jobs.  I need my 
> high-priority jobs to pre-empt my best-effort jobs, so I'd like to launch the 
> best-effort jobs on revocable resources. 
> *Problem:*
> Revocable resources are currently only created via oversubscription, where 
> resources allocated to but not used by a framework will be offered to other 
> frameworks.  This doesn't support the ability for a high-pri framework to 
> start up and pre-empt a low-pri framework.
> *Solution:*
> Let's allow quota (and ideally any reserved resources) to be configurable to 
> be offered as revocable resources to other frameworks that don't register 
> with the role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6113) Offer reserved resources as revocable

2016-09-01 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456102#comment-15456102
 ] 

Michael Gummelt commented on MESOS-6113:


I tend to think of quota as just another interface to marking resources as 
reserved, but I understand there are some differences, yes.

> Offer reserved resources as revocable
> -
>
> Key: MESOS-6113
> URL: https://issues.apache.org/jira/browse/MESOS-6113
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> *Goal:*
> I have high-priority Spark jobs, and best-effort jobs.  I need my 
> high-priority jobs to pre-empt my best-effort jobs, so I'd like to launch the 
> best-effort jobs on revocable resources. 
> *Problem:*
> Revocable resources are currently only created via oversubscription, where 
> resources allocated to but not used by a framework will be offered to other 
> frameworks.  This doesn't support the ability for a high-pri framework to 
> start up and pre-empt a low-pri framework.
> *Solution:*
> Let's allow quota (and ideally any reserved resources) to be configurable to 
> be offered as revocable resources to other frameworks that don't register 
> with the role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-09-01 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456099#comment-15456099
 ] 

Michael Gummelt commented on MESOS-6112:


That's a fine workaround, yeah.  I can solve my immediate problem.  The purpose 
of this JIRA is more to make this kind of cooperative scheduling 
unnecessary.

> Frameworks are starved when > 5 are run concurrently
> 
>
> Key: MESOS-6112
> URL: https://issues.apache.org/jira/browse/MESOS-6112
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> As I understand it, the master will send an offer to a list of frameworks 
> ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
> each offering.  Once the decline timeout for the first framework has been 
> reached, rather than continuing to submit the offer to the rest of the 
> frameworks in the list, the master starts over at the beginning, starving the 
> rest of the frameworks.
> This means that in order for Mesos to support > 5 concurrent frameworks, all 
> frameworks must be good citizens and set their decline timeout to something 
> large or suppress offers.  I think this is a fairly undesirable state of 
> things.
> I propose that the master instead continues to submit the offer to every 
> registered framework, even if the declineOffer timeout has been reached.
> The potential increase in task startup latency that could be introduced by 
> this change can be obviated in part if we also make the master smarter about 
> how long to wait between successive offers, rather than a static 1s.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-08-31 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453107#comment-15453107
 ] 

Michael Gummelt commented on MESOS-6112:


It is a duplicate, yes, but the quota workaround doesn't work for all cases 
(including mine).  If I statically partition my starved framework so that it 
always has enough resources, this prevents other frameworks from using the 
slack, which is undesirable.  I just don't want a framework ranked high on DRF to 
starve that framework even when it's *not* using those resources.

> Frameworks are starved when > 5 are run concurrently
> 
>
> Key: MESOS-6112
> URL: https://issues.apache.org/jira/browse/MESOS-6112
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> As I understand it, the master will send an offer to a list of frameworks 
> ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
> each offering.  Once the decline timeout for the first framework has been 
> reached, rather than continuing to submit the offer to the rest of the 
> frameworks in the list, the master starts over at the beginning, starving the 
> rest of the frameworks.
> This means that in order for Mesos to support > 5 concurrent frameworks, all 
> frameworks must be good citizens and set their decline timeout to something 
> large or suppress offers.  I think this is a fairly undesirable state of 
> things.
> I propose that the master instead continues to submit the offer to every 
> registered framework, even if the declineOffer timeout has been reached.
> The potential increase in task startup latency that could be introduced by 
> this change can be obviated in part if we also make the master smarter about 
> how long to wait between successive offers, rather than a static 1s.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6113) Offer reserved resources as revocable

2016-08-31 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-6113:
---
Description: 
*Goal:*

I have high-priority Spark jobs, and best-effort jobs.  I need my high-priority 
jobs to pre-empt my best-effort jobs, so I'd like to launch the best-effort 
jobs on revocable resources. 

*Problem:*

Revocable resources are currently only created via oversubscription, where 
resources allocated to but not used by a framework will be offered to other 
frameworks.  This doesn't support the ability for a high-pri framework to start 
up and pre-empt a low-pri framework.

*Solution:*

Let's allow quota (and ideally any reserved resources) to be configurable to be 
offered as revocable resources to other frameworks that don't register with the 
role.

  was:
*Goal:*

I have high-priority Spark jobs, and best-effort jobs.  I need my high-priority 
jobs to pre-empt my best-effort jobs, so I'd like to launch the best-effort 
jobs on revocable resources. 

*Problem:*

Revocable resources are currently only created via oversubscription, where 
resources allocated to but not used by a framework will be offered to other 
frameworks.  This doesn't support the ability for a high-pri framework to start 
up and pre-empt a low-pri framework.

*Solution:*

Let's allow quota to be configured to be offered as revocable resources to 
other frameworks that don't register with the role.


> Offer reserved resources as revocable
> -
>
> Key: MESOS-6113
> URL: https://issues.apache.org/jira/browse/MESOS-6113
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> *Goal:*
> I have high-priority Spark jobs, and best-effort jobs.  I need my 
> high-priority jobs to pre-empt my best-effort jobs, so I'd like to launch the 
> best-effort jobs on revocable resources. 
> *Problem:*
> Revocable resources are currently only created via oversubscription, where 
> resources allocated to but not used by a framework will be offered to other 
> frameworks.  This doesn't support the ability for a high-pri framework to 
> start up and pre-empt a low-pri framework.
> *Solution:*
> Let's allow quota (and ideally any reserved resources) to be configurable to 
> be offered as revocable resources to other frameworks that don't register 
> with the role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6113) Offer reserved resources as revocable

2016-08-31 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-6113:
---
Summary: Offer reserved resources as revocable  (was: Revocable resources 
for quota)

> Offer reserved resources as revocable
> -
>
> Key: MESOS-6113
> URL: https://issues.apache.org/jira/browse/MESOS-6113
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> *Goal:*
> I have high-priority Spark jobs, and best-effort jobs.  I need my 
> high-priority jobs to pre-empt my best-effort jobs, so I'd like to launch the 
> best-effort jobs on revocable resources. 
> *Problem:*
> Revocable resources are currently only created via oversubscription, where 
> resources allocated to but not used by a framework will be offered to other 
> frameworks.  This doesn't support the ability for a high-pri framework to 
> start up and pre-empt a low-pri framework.
> *Solution:*
> Let's allow quota to be configured to be offered as revocable resources to 
> other frameworks that don't register with the role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6113) Revocable resources for quota

2016-08-31 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453042#comment-15453042
 ] 

Michael Gummelt commented on MESOS-6113:


cc [~clambert]

> Revocable resources for quota
> -
>
> Key: MESOS-6113
> URL: https://issues.apache.org/jira/browse/MESOS-6113
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> *Goal:*
> I have high-priority Spark jobs, and best-effort jobs.  I need my 
> high-priority jobs to pre-empt my best-effort jobs, so I'd like to launch the 
> best-effort jobs on revocable resources. 
> *Problem:*
> Revocable resources are currently only created via oversubscription, where 
> resources allocated to but not used by a framework will be offered to other 
> frameworks.  This doesn't support the ability for a high-pri framework to 
> start up and pre-empt a low-pri framework.
> *Solution:*
> Let's allow quota to be configured to be offered as revocable resources to 
> other frameworks that don't register with the role.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6113) Revocable resources for quota

2016-08-31 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6113:
--

 Summary: Revocable resources for quota
 Key: MESOS-6113
 URL: https://issues.apache.org/jira/browse/MESOS-6113
 Project: Mesos
  Issue Type: Task
  Components: allocation
Affects Versions: 1.0.1
Reporter: Michael Gummelt


*Goal:*

I have high-priority Spark jobs, and best-effort jobs.  I need my high-priority 
jobs to pre-empt my best-effort jobs, so I'd like to launch the best-effort 
jobs on revocable resources. 

*Problem:*

Revocable resources are currently only created via oversubscription, where 
resources allocated to but not used by a framework will be offered to other 
frameworks.  This doesn't support the ability for a high-pri framework to start 
up and pre-empt a low-pri framework.

*Solution:*

Let's allow quota to be configured to be offered as revocable resources to 
other frameworks that don't register with the role.
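
For context, on the consumer side frameworks can already detect revocable 
resources in offers; a minimal sketch in the Java API, where {{offersList}} 
stands in for the offers passed to {{resourceOffers()}} (the proposal above is 
about *producing* such resources from quota, not about this check):

{code}
// Sketch: a best-effort framework picking out revocable resources.
// Resources marked revocable may be reclaimed by Mesos at any time.
for (Protos.Offer offer : offersList) {
  for (Protos.Resource resource : offer.getResourcesList()) {
    if (resource.hasRevocable()) {
      // eligible for launching best-effort (pre-emptable) tasks
    }
  }
}
{code}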



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-08-31 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452826#comment-15452826
 ] 

Michael Gummelt edited comment on MESOS-6112 at 8/31/16 5:38 PM:
-

{quote}
When a framework declines an offer for 5s it says "I don't need these 
particular resources for the next 5s".
{quote}

Sort of.  My scheduler (e.g. Kafka) is really saying "I don't need these 
particular resources right now.  I don't know when I may need them in the 
future.  Here's a timeout that represents some tradeoff I've determined between 
latency and good citizenship (fairness)."

{quote}
Or, even better, call suppressOffers()? Is it hard to understand / implement?
{quote}

I can call {{suppressOffers()}}.  You're right, it's not that hard.  But it 
only partially solves the problem.  There will still exist practically 
unbounded periods of time when I can't suppress.  For example, when one of my 
data nodes fails, I'll try to wait until its persistent volume is offered back 
to me.

But the larger issue is that solutions such as this require all frameworks to 
be good citizens, which is brittle and unscalable.






was (Author: mgummelt):
> When a framework declines an offer for 5s it says "I don't need these 
> particular resources for the next 5s".

Sort of.  My scheduler (e.g. Kafka) is really saying "I don't need these 
particular resources right now.  I don't know when I may need them in the 
future.  Here's a timeout that represents some tradeoff I've determined between 
latency and good citizenship (fairness)."

>  Or, even better, call suppressOffers()? Is it hard to understand / implement?

I can call {{suppressOffers()}}.  You're right, it's not that hard.  But it 
only partially solves the problem.  There will still exist practically 
unbounded periods of time when I can't suppress.  For example, when one of my 
data nodes fails, I'll try to wait until its persistent volume is offered back 
to me.

But the larger issue is that solutions such as this require all frameworks to 
be good citizens, which is brittle and unscalable.





> Frameworks are starved when > 5 are run concurrently
> 
>
> Key: MESOS-6112
> URL: https://issues.apache.org/jira/browse/MESOS-6112
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> As I understand it, the master will send an offer to a list of frameworks 
> ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
> each offering.  Once the decline timeout for the first framework has been 
> reached, rather than continuing to submit the offer to the rest of the 
> frameworks in the list, the master starts over at the beginning, starving the 
> rest of the frameworks.
> This means that in order for Mesos to support > 5 concurrent frameworks, all 
> frameworks must be good citizens and set their decline timeout to something 
> large or suppress offers.  I think this is a fairly undesirable state of 
> things.
> I propose that the master instead continues to submit the offer to every 
> registered framework, even if the declineOffer timeout has been reached.
> The potential increase in task startup latency that could be introduced by 
> this change can be obviated in part if we also make the master smarter about 
> how long to wait between successive offers, rather than a static 1s.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-08-31 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-6112:
---
Comment: was deleted

(was: > When a framework declines an offer for 5s it says "I don't need these 
particular resources for the next 5s".

Sort of.  My scheduler (e.g. Kafka) is really saying "I don't need these 
particular resources right now.  I don't know when I may need them in the 
future.  Here's a timeout that represents some tradeoff I've determined between 
latency and good citizenship (fairness)."

>  Or, even better, call suppressOffers()? Is it hard to understand / implement?

I can call {{suppressOffers()}}.  You're right, it's not that hard.  But it 
only partially solves the problem.  There will still exist practically 
unbounded periods of time when I can't suppress.  For example, when one of my 
data nodes fails, I'll try to wait until its persistent volume is offered back 
to me.

But the larger issue is that solutions such as this require all frameworks to 
be good citizens, which is brittle and unscalable.



)

> Frameworks are starved when > 5 are run concurrently
> 
>
> Key: MESOS-6112
> URL: https://issues.apache.org/jira/browse/MESOS-6112
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> As I understand it, the master will send an offer to a list of frameworks 
> ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
> each offering.  Once the decline timeout for the first framework has been 
> reached, rather than continuing to submit the offer to the rest of the 
> frameworks in the list, the master starts over at the beginning, starving the 
> rest of the frameworks.
> This means that in order for Mesos to support > 5 concurrent frameworks, all 
> frameworks must be good citizens and set their decline timeout to something 
> large or suppress offers.  I think this is a fairly undesirable state of 
> things.
> I propose that the master instead continues to submit the offer to every 
> registered framework, even if the declineOffer timeout has been reached.
> The potential increase in task startup latency that could be introduced by 
> this change can be obviated in part if we also make the master smarter about 
> how long to wait between successive offers, rather than a static 1s.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-08-31 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452827#comment-15452827
 ] 

Michael Gummelt commented on MESOS-6112:


> When a framework declines an offer for 5s it says "I don't need these 
> particular resources for the next 5s".

Sort of.  My scheduler (e.g. Kafka) is really saying "I don't need these 
particular resources right now.  I don't know when I may need them in the 
future.  Here's a timeout that represents some tradeoff I've determined between 
latency and good citizenship (fairness)."

>  Or, even better, call suppressOffers()? Is it hard to understand / implement?

I can call {{suppressOffers()}}.  You're right, it's not that hard.  But it 
only partially solves the problem.  There will still exist practically 
unbounded periods of time when I can't suppress.  For example, when one of my 
data nodes fails, I'll try to wait until its persistent volume is offered back 
to me.

But the larger issue is that solutions such as this require all frameworks to 
be good citizens, which is brittle and unscalable.
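
For context, the "good citizen" behavior under discussion looks roughly like 
the following with the V0 Java SchedulerDriver ({{driver}} and {{offer}} are 
assumed in scope; the timeout value is illustrative):

{code}
// Decline with a long refuse timeout instead of the 5s default...
Protos.Filters filters = Protos.Filters.newBuilder()
    .setRefuseSeconds(3600)
    .build();
driver.declineOffer(offer.getId(), filters);

// ...or stop receiving offers entirely while idle,
driver.suppressOffers();
// and revive them when resources are needed again (e.g. after a
// data node fails and its persistent volume must be re-offered):
driver.reviveOffers();
{code}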





> Frameworks are starved when > 5 are run concurrently
> 
>
> Key: MESOS-6112
> URL: https://issues.apache.org/jira/browse/MESOS-6112
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> As I understand it, the master will send an offer to a list of frameworks 
> ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
> each offering.  Once the decline timeout for the first framework has been 
> reached, rather than continuing to submit the offer to the rest of the 
> frameworks in the list, the master starts over at the beginning, starving the 
> rest of the frameworks.
> This means that in order for Mesos to support > 5 concurrent frameworks, all 
> frameworks must be good citizens and set their decline timeout to something 
> large or suppress offers.  I think this is a fairly undesirable state of 
> things.
> I propose that the master instead continues to submit the offer to every 
> registered framework, even if the declineOffer timeout has been reached.
> The potential increase in task startup latency that could be introduced by 
> this change can be obviated in part if we also make the master smarter about 
> how long to wait between successive offers, rather than a static 1s.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-08-31 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452826#comment-15452826
 ] 

Michael Gummelt commented on MESOS-6112:


> When a framework declines an offer for 5s it says "I don't need these 
> particular resources for the next 5s".

Sort of.  My scheduler (e.g. Kafka) is really saying "I don't need these 
particular resources right now.  I don't know when I may need them in the 
future.  Here's a timeout that represents some tradeoff I've determined between 
latency and good citizenship (fairness)."

>  Or, even better, call suppressOffers()? Is it hard to understand / implement?

I can call {{suppressOffers()}}.  You're right, it's not that hard.  But it 
only partially solves the problem.  There will still exist practically 
unbounded periods of time when I can't suppress.  For example, when one of my 
data nodes fails, I'll try to wait until its persistent volume is offered back 
to me.

But the larger issue is that solutions such as this require all frameworks to 
be good citizens, which is brittle and unscalable.





> Frameworks are starved when > 5 are run concurrently
> 
>
> Key: MESOS-6112
> URL: https://issues.apache.org/jira/browse/MESOS-6112
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> As I understand it, the master will send an offer to a list of frameworks 
> ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
> each offering.  Once the decline timeout for the first framework has been 
> reached, rather than continuing to submit the offer to the rest of the 
> frameworks in the list, the master starts over at the beginning, starving the 
> rest of the frameworks.
> This means that in order for Mesos to support > 5 concurrent frameworks, all 
> frameworks must be good citizens and set their decline timeout to something 
> large or suppress offers.  I think this is a fairly undesirable state of 
> things.
> I propose that the master instead continues to submit the offer to every 
> registered framework, even if the declineOffer timeout has been reached.
> The potential increase in task startup latency that could be introduced by 
> this change can be obviated in part if we also make the master smarter about 
> how long to wait between successive offers, rather than a static 1s.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-08-30 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-6112:
---
Component/s: allocation

> Frameworks are starved when > 5 are run concurrently
> 
>
> Key: MESOS-6112
> URL: https://issues.apache.org/jira/browse/MESOS-6112
> Project: Mesos
>  Issue Type: Task
>  Components: allocation, master
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> As I understand it, the master will send an offer to a list of frameworks 
> ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
> each offering.  Once the decline timeout for the first framework has been 
> reached, rather than continuing to submit the offer to the rest of the 
> frameworks in the list, the master starts over at the beginning, starving the 
> rest of the frameworks.
> This means that in order for Mesos to support > 5 concurrent frameworks, all 
> frameworks must be good citizens and set their decline timeout to something 
> large or suppress offers.  I think this is a fairly undesirable state of 
> things.
> I propose that the master instead continues to submit the offer to every 
> registered framework, even if the declineOffer timeout has been reached.
> The potential increase in task startup latency that could be introduced by 
> this change can be obviated in part if we also make the master smarter about 
> how long to wait between successive offers, rather than a static 1s.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-08-30 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450578#comment-15450578
 ] 

Michael Gummelt commented on MESOS-6112:


cc [~gabriel.hartm...@gmail.com] [~clambert]

> Frameworks are starved when > 5 are run concurrently
> 
>
> Key: MESOS-6112
> URL: https://issues.apache.org/jira/browse/MESOS-6112
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Affects Versions: 1.0.1
>Reporter: Michael Gummelt
>
> As I understand it, the master will send an offer to a list of frameworks 
> ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
> each offering.  Once the decline timeout for the first framework has been 
> reached, rather than continuing to submit the offer to the rest of the 
> frameworks in the list, the master starts over at the beginning, starving the 
> rest of the frameworks.
> This means that in order for Mesos to support > 5 concurrent frameworks, all 
> frameworks must be good citizens and set their decline timeout to something 
> large or suppress offers.  I think this is a fairly undesirable state of 
> things.
> I propose that the master instead continues to submit the offer to every 
> registered framework, even if the declineOffer timeout has been reached.
> The potential increase in task startup latency that could be introduced by 
> this change can be obviated in part if we also make the master smarter about 
> how long to wait between successive offers, rather than a static 1s.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6112) Frameworks are starved when > 5 are run concurrently

2016-08-30 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6112:
--

 Summary: Frameworks are starved when > 5 are run concurrently
 Key: MESOS-6112
 URL: https://issues.apache.org/jira/browse/MESOS-6112
 Project: Mesos
  Issue Type: Task
  Components: master
Affects Versions: 1.0.1
Reporter: Michael Gummelt


As I understand it, the master will send an offer to a list of frameworks 
ordered by DRF, until the offer is accepted.  There is a 1s wait time between 
each offering.  Once the decline timeout for the first framework has been 
reached, rather than continuing to submit the offer to the rest of the 
frameworks in the list, the master starts over at the beginning, starving the 
rest of the frameworks.

This means that in order for Mesos to support > 5 concurrent frameworks, all 
frameworks must be good citizens and set their decline timeout to something 
large or suppress offers.  I think this is a fairly undesirable state of things.

I propose that the master instead continues to submit the offer to every 
registered framework, even if the declineOffer timeout has been reached.

The potential increase in task startup latency that could be introduced by this 
change can be obviated in part if we also make the master smarter about how 
long to wait between successive offers, rather than a static 1s.

  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6111) Offer cycle is undocumented

2016-08-30 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6111:
--

 Summary: Offer cycle is undocumented
 Key: MESOS-6111
 URL: https://issues.apache.org/jira/browse/MESOS-6111
 Project: Mesos
  Issue Type: Task
  Components: documentation
Affects Versions: 1.0.1
Reporter: Michael Gummelt


cc [~neilc]

AFAICT, the "offer cycle" in Mesos is undocumented.  As it has been explained 
to me, the master will send an offer to a successive list of frameworks ordered 
by DRF, with a 1s gap in between each offer.  And when the decline timeout 
(default 5s) is reached, it will start over at the beginning of the list.  This 
means that, by default, all frameworks other than the first 5 in DRF 
ordering will be starved.

I'm going to submit a separate JIRA with a proposal to fix this, but at the 
very least, we should document the above behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6030) Offer API

2016-08-11 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-6030:
--

 Summary: Offer API
 Key: MESOS-6030
 URL: https://issues.apache.org/jira/browse/MESOS-6030
 Project: Mesos
  Issue Type: Improvement
  Components: master
Affects Versions: 1.0.0
Reporter: Michael Gummelt


It's often difficult to debug a framework without knowing what it's being 
offered.  The scheduler can log the offers, but not all schedulers do so, and 
it's often behind a verbose logging option that can be difficult to enable in 
certain environments.

It would be much more helpful if Mesos offered an API for clients to view 
recent offers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5998) FINISHED task shown as Active in the UI

2016-08-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410056#comment-15410056
 ] 

Michael Gummelt commented on MESOS-5998:


http://mgummelt-mesos.s3.amazonaws.com/ui_screenshot.png

> FINISHED task shown as Active in the UI
> ---
>
> Key: MESOS-5998
> URL: https://issues.apache.org/jira/browse/MESOS-5998
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Michael Gummelt
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-5998) FINISHED task shown as Active in the UI

2016-08-05 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-5998:
---
Comment: was deleted

(was: http://mgummelt-mesos.s3.amazonaws.com/ui_screenshot.png)

> FINISHED task shown as Active in the UI
> ---
>
> Key: MESOS-5998
> URL: https://issues.apache.org/jira/browse/MESOS-5998
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Michael Gummelt
>
> http://mgummelt-mesos.s3.amazonaws.com/ui_screenshot.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5998) FINISHED task shown as Active in the UI

2016-08-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15410057#comment-15410057
 ] 

Michael Gummelt commented on MESOS-5998:


Can I add attachments to this JIRA?  I don't see how.

> FINISHED task shown as Active in the UI
> ---
>
> Key: MESOS-5998
> URL: https://issues.apache.org/jira/browse/MESOS-5998
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Michael Gummelt
>
> http://mgummelt-mesos.s3.amazonaws.com/ui_screenshot.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5998) FINISHED task shown as Active in the UI

2016-08-05 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-5998:
---
Description: http://mgummelt-mesos.s3.amazonaws.com/ui_screenshot.png

> FINISHED task shown as Active in the UI
> ---
>
> Key: MESOS-5998
> URL: https://issues.apache.org/jira/browse/MESOS-5998
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.0.0
>Reporter: Michael Gummelt
>
> http://mgummelt-mesos.s3.amazonaws.com/ui_screenshot.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5998) FINISHED task shown as Active in the UI

2016-08-05 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-5998:
--

 Summary: FINISHED task shown as Active in the UI
 Key: MESOS-5998
 URL: https://issues.apache.org/jira/browse/MESOS-5998
 Project: Mesos
  Issue Type: Bug
  Components: webui
Affects Versions: 1.0.0
Reporter: Michael Gummelt






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5971) Better handling for docker credentials

2016-08-02 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-5971:
---
Component/s: docker
 containerization

> Better handling for docker credentials
> --
>
> Key: MESOS-5971
> URL: https://issues.apache.org/jira/browse/MESOS-5971
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Michael Gummelt
>
> Users often want to run Spark jobs in custom docker images that reside in 
> private registries.  We can adapt the Marathon approach of passing docker 
> configs as fetcher URIs: 
> https://mesosphere.github.io/marathon/docs/native-docker-private-registry.html
> But this is a hack.  It would be nice if I could configure a Mesos agent with 
> docker credentials beforehand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5971) Better handling for docker credentials

2016-08-02 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-5971:
--

 Summary: Better handling for docker credentials
 Key: MESOS-5971
 URL: https://issues.apache.org/jira/browse/MESOS-5971
 Project: Mesos
  Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Michael Gummelt


Users often want to run Spark jobs in custom docker images that reside in 
private registries.  We can adapt the Marathon approach of passing docker 
configs as fetcher URIs: 
https://mesosphere.github.io/marathon/docs/native-docker-private-registry.html

But this is a hack.  It would be nice if I could configure a Mesos agent with 
docker credentials beforehand.
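
For reference, a rough sketch of the Marathon-style workaround at the 
framework level: ship an archive containing {{.docker/config.json}} as a 
fetcher URI so it lands in the sandbox before the task starts (the URI below 
is a placeholder):

{code}
// Sketch only: fetch docker credentials into the sandbox via a URI.
Protos.CommandInfo.Builder command = Protos.CommandInfo.newBuilder()
    .addUris(Protos.CommandInfo.URI.newBuilder()
        .setValue("https://internal.example.com/docker.tar.gz")  // placeholder
        .setExtract(true));
{code}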



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5866) MESOS_DIRECTORY set to a host path when using a docker image w/ unified containerizer

2016-07-19 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-5866:
--

 Summary: MESOS_DIRECTORY set to a host path when using a docker 
image w/ unified containerizer
 Key: MESOS-5866
 URL: https://issues.apache.org/jira/browse/MESOS-5866
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.28.2
Reporter: Michael Gummelt


When running Spark with the unified containerizer, it fails with:

{code}
16/07/19 21:03:09 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) 
failed in Unknown s due to Job aborted due to stage failure: Task serialization 
failed: java.io.IOException: Failed to create local dir in 
/var/lib/mesos/slave/slaves/003ebcc2-64e2-488f-87b9-f6fa7630c01b-S0/frameworks/003ebcc2-64e2-488f-87b9-f6fa7630c01b-0001/executors/driver-20160719210109-0002/runs/8f21b32e-b929-4369-bce9-9f49a3a8844f/blockmgr-e3a611d4-e0de-48cb-b17a-1e41d97e84c2/11.
{code}

This is because MESOS_DIRECTORY is set to /var/lib/mesos/, which is a host 
path.  The container can't see the host path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5865) MESOS_DIRECTORY is not set in the docker containerizer

2016-07-19 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-5865:
--

 Summary: MESOS_DIRECTORY is not set in the docker containerizer
 Key: MESOS-5865
 URL: https://issues.apache.org/jira/browse/MESOS-5865
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.28.2
Reporter: Michael Gummelt


I'm running Spark with the docker containerizer.  It sets MESOS_SANDBOX, but 
not MESOS_DIRECTORY.  The docs indicate that MESOS_DIRECTORY should be set: 
https://github.com/apache/mesos/blob/2127376b8e092684312ec9843173b532df931d20/docs/executor-http-api.md#executor-environment-variables

It would be preferable for there to be just one env var containing the sandbox 
location, independent of the containerizer.
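
In the meantime, executors that must run under both containerizers end up 
with a fallback along these lines (a sketch, not a fix):

{code}
// Sketch: resolve the sandbox path under either containerizer.
String sandbox = System.getenv("MESOS_SANDBOX");   // docker containerizer
if (sandbox == null) {
  sandbox = System.getenv("MESOS_DIRECTORY");      // mesos containerizer
}
{code}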



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5785) Port documentation mistakes - ephemeral ports

2016-07-05 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-5785:
--

 Summary: Port documentation mistakes - ephemeral ports
 Key: MESOS-5785
 URL: https://issues.apache.org/jira/browse/MESOS-5785
 Project: Mesos
  Issue Type: Bug
Reporter: Michael Gummelt


The docs here: 
http://mesos.apache.org/documentation/latest/attributes-resources/

Should probably recommend that users not configure their agents to offer ports 
in the ephemeral port range (32768+: 
https://en.wikipedia.org/wiki/Ephemeral_port).  We avoid this in DC/OS, for 
example.  The example includes ports offered in this range, so we should fix 
that.
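
For example (illustrative only; exact ranges are deployment-specific), the 
docs could show an agent configured to offer ports that stay below the 
ephemeral range:

{code}
mesos-agent --resources='cpus:8;mem:16384;ports:[1025-32000]' ...
{code}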

Further, the docs state that ports have "pre-defined behavior", but they don't 
say what that behavior is, and I'm not clear on it myself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5754) CommandInfo.user not honored in docker containerizer

2016-06-30 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358002#comment-15358002
 ] 

Michael Gummelt commented on MESOS-5754:


> The workaround is to specify a CLI parameter: 

Assuming you're launching through Marathon, yes.

> CommandInfo.user not honored in docker containerizer
> 
>
> Key: MESOS-5754
> URL: https://issues.apache.org/jira/browse/MESOS-5754
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Michael Gummelt
>
> Repro by creating a framework that starts a task with CommandInfo.user set, 
> and observe that the dockerized executor is still running as the default 
> (e.g. root).
> cc [~kaysoky]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5754) CommandInfo.user not honored in docker containerizer

2016-06-30 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-5754:
--

 Summary: CommandInfo.user not honored in docker containerizer
 Key: MESOS-5754
 URL: https://issues.apache.org/jira/browse/MESOS-5754
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Michael Gummelt


Repro by creating a framework that starts a task with CommandInfo.user set, and 
observe that the dockerized executor is still running as the default (e.g. 
root).
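
A minimal sketch of such a repro in the Java API (the command just reports 
the effective user):

{code}
// With the docker containerizer, `id` here still reports root even
// though CommandInfo.user is set.
Protos.CommandInfo command = Protos.CommandInfo.newBuilder()
    .setUser("nobody")
    .setValue("id")
    .build();
{code}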

cc [~kaysoky]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3220) Offer ability to kill tasks from the API

2016-05-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276822#comment-15276822
 ] 

Michael Gummelt commented on MESOS-3220:


+1.

I'm implementing this behavior in Spark.  It would be more efficient if Mesos 
offered it, so we wouldn't have to reimplement it at the framework level.
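
For context, the only supported path today is the scheduler API, e.g. via the 
Java driver (the task ID is a placeholder; {{driver}} is the framework's 
SchedulerDriver):

{code}
// Today a kill must be issued by the framework's own scheduler:
driver.killTask(Protos.TaskID.newBuilder().setValue("my-task-id").build());
{code}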

> Offer ability to kill tasks from the API
> 
>
> Key: MESOS-3220
> URL: https://issues.apache.org/jira/browse/MESOS-3220
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Sunil Shah
>  Labels: mesosphere
>
> We are investigating adding a {{dcos task kill}} command to our DCOS (and 
> Mesos) command line interface. Currently the ability to kill tasks is only 
> offered via the scheduler API so it would be useful to have some ability to 
> kill tasks directly.
> This would complement the Maintenance Primitives, in that it would enable the 
> operator to terminate those tasks which, for whatever reasons, do not respond 
> to Inverse Offers events.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5197) Log executor commands w/o verbose logs enabled

2016-05-05 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273184#comment-15273184
 ] 

Michael Gummelt commented on MESOS-5197:


[~kaysoky] How can we solve this problem?  I'm working with yet another 
customer where this info would be invaluable.  Generally, "My command is 
failing.  What was the command?" is a very common scenario.

> Log executor commands w/o verbose logs enabled
> --
>
> Key: MESOS-5197
> URL: https://issues.apache.org/jira/browse/MESOS-5197
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Gummelt
>Assignee: Yong Tang
>  Labels: mesosphere
>
> To debug executors, it's often necessary to know the command that ran the 
> executor.  For example, when Spark executors fail, I'd like to know the 
> command used to invoke the executor (Spark uses the command executor in a 
> docker container).  Currently, it's only output if GLOG_v is enabled, but I 
> don't think this should be a "verbose" output.  It's a common debugging need.
> https://github.com/apache/mesos/blob/2e76199a3dd977152110fbb474928873f31f7213/src/docker/docker.cpp#L677
> cc [~kaysoky]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5197) Log executor commands w/o verbose logs enabled

2016-04-18 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245949#comment-15245949
 ] 

Michael Gummelt commented on MESOS-5197:


How can we make it so the commands are printed?

> Log executor commands w/o verbose logs enabled
> --
>
> Key: MESOS-5197
> URL: https://issues.apache.org/jira/browse/MESOS-5197
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Gummelt
>Assignee: Yong Tang
>  Labels: mesosphere
>
> To debug executors, it's often necessary to know the command that ran the 
> executor.  For example, when Spark executors fail, I'd like to know the 
> command used to invoke the executor (Spark uses the command executor in a 
> docker container).  Currently, it's only output if GLOG_v is enabled, but I 
> don't think this should be a "verbose" output.  It's a common debugging need.
> https://github.com/apache/mesos/blob/2e76199a3dd977152110fbb474928873f31f7213/src/docker/docker.cpp#L677
> cc [~kaysoky]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5197) Log executor commands w/o verbose logs enabled

2016-04-18 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245948#comment-15245948
 ] 

Michael Gummelt commented on MESOS-5197:


How can we make it so the commands are printed?

> Log executor commands w/o verbose logs enabled
> --
>
> Key: MESOS-5197
> URL: https://issues.apache.org/jira/browse/MESOS-5197
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Gummelt
>Assignee: Yong Tang
>  Labels: mesosphere
>
> To debug executors, it's often necessary to know the command that ran the 
> executor.  For example, when Spark executors fail, I'd like to know the 
> command used to invoke the executor (Spark uses the command executor in a 
> docker container).  Currently, it's only output if GLOG_v is enabled, but I 
> don't think this should be a "verbose" output.  It's a common debugging need.
> https://github.com/apache/mesos/blob/2e76199a3dd977152110fbb474928873f31f7213/src/docker/docker.cpp#L677
> cc [~kaysoky]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-5197) Log executor commands w/o verbose logs enabled

2016-04-18 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-5197:
---
Comment: was deleted

(was: How can we make it so the commands are printed?)

> Log executor commands w/o verbose logs enabled
> --
>
> Key: MESOS-5197
> URL: https://issues.apache.org/jira/browse/MESOS-5197
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Gummelt
>Assignee: Yong Tang
>  Labels: mesosphere
>
> To debug executors, it's often necessary to know the command that ran the 
> executor.  For example, when Spark executors fail, I'd like to know the 
> command used to invoke the executor (Spark uses the command executor in a 
> docker container).  Currently, it's only output if GLOG_v is enabled, but I 
> don't think this should be a "verbose" output.  It's a common debugging need.
> https://github.com/apache/mesos/blob/2e76199a3dd977152110fbb474928873f31f7213/src/docker/docker.cpp#L677
> cc [~kaysoky]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5198) state.json incorrectly serves an empty {{executors}} field

2016-04-12 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-5198:
--

 Summary: state.json incorrectly serves an empty {{executors}} field
 Key: MESOS-5198
 URL: https://issues.apache.org/jira/browse/MESOS-5198
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.28.1
Reporter: Michael Gummelt


The {{frameworks.executors}} array in {{state.json}} is empty, despite the 
framework having running tasks.  I believe this is incorrect, since you can't 
have tasks w/o an executor.  Perhaps the intended meaning is "custom 
executors", but I think we should serve info for all executors run by the 
framework, including command executors.  I often need to look up, for example, 
which command is run by the command executor.
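
A minimal sketch of how to spot the mismatch (the master URL is an assumption; 
the field names follow the current {{state.json}} layout):

{code:python}
# Sketch: flag frameworks that report running tasks but an empty
# "executors" array in the master's state endpoint.
import json
import urllib.request

MASTER = "http://localhost:5050"  # assumption: your master's address

with urllib.request.urlopen(MASTER + "/master/state") as resp:
    state = json.load(resp)

for fw in state.get("frameworks", []):
    if fw.get("tasks") and not fw.get("executors"):
        print("framework %s: %d tasks but empty executors field"
              % (fw.get("id"), len(fw["tasks"])))
{code}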



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5197) Log executor commands w/o verbose logs enabled

2016-04-12 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-5197:
--

 Summary: Log executor commands w/o verbose logs enabled
 Key: MESOS-5197
 URL: https://issues.apache.org/jira/browse/MESOS-5197
 Project: Mesos
  Issue Type: Task
Reporter: Michael Gummelt


To debug executors, it's often necessary to know the command that ran the 
executor.  For example, when Spark executors fail, I'd like to know the command 
used to invoke the executor (Spark uses the command executor in a docker 
container).  Currently, it's only output if GLOG_v is enabled, but I don't 
think this should be a "verbose" output.  It's a common debugging need.

https://github.com/apache/mesos/blob/2e76199a3dd977152110fbb474928873f31f7213/src/docker/docker.cpp#L677

cc [~kaysoky]
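
Until that changes, the only way to see the command is the verbose-log 
workaround. A minimal sketch of starting an agent with glog verbosity turned 
up (the flag values and paths are assumptions for illustration):

{code:python}
# Sketch: start the agent with GLOG_v=1 so VLOG output, including the
# docker run command, is written to the log. Flag values are assumptions.
import os
import subprocess

env = dict(os.environ, GLOG_v="1")
subprocess.Popen(
    ["mesos-slave",
     "--master=zk://localhost:2181/mesos",
     "--work_dir=/var/lib/mesos"],
    env=env)
{code}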



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5197) Log executor commands w/o verbose logs enabled

2016-04-12 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-5197:
---
Labels: mesosphere  (was: )

> Log executor commands w/o verbose logs enabled
> --
>
> Key: MESOS-5197
> URL: https://issues.apache.org/jira/browse/MESOS-5197
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Gummelt
>  Labels: mesosphere
>
> To debug executors, it's often necessary to know the command that ran the 
> executor.  For example, when Spark executors fail, I'd like to know the 
> command used to invoke the executor (Spark uses the command executor in a 
> docker container).  Currently, it's only output if GLOG_v is enabled, but I 
> don't think this should be a "verbose" output.  It's a common debugging need.
> https://github.com/apache/mesos/blob/2e76199a3dd977152110fbb474928873f31f7213/src/docker/docker.cpp#L677
> cc [~kaysoky]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4769) Update state endpoints to allow clients to determine how many resources for a given role have been used

2016-02-24 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-4769:
---
Labels: mesosphere  (was: )

> Update state endpoints to allow clients to determine how many resources for a 
> given role have been used
> ---
>
> Key: MESOS-4769
> URL: https://issues.apache.org/jira/browse/MESOS-4769
> Project: Mesos
>  Issue Type: Task
>Affects Versions: 0.27.1
>Reporter: Michael Gummelt
>  Labels: mesosphere
>
> AFAICT, this is currently impossible.  Say I have a cluster with 4 CPUs 
> reserved for {{spark}} and 4 CPUs unreserved, I have a framework registered as 
> {{spark}}, and I would like to determine how many CPUs reserved for {{spark}} 
> have been used.  AFAIK, there are two endpoints with interesting information: 
> {{/master/state}} and {{/master/roles}}.  Both endpoints tell me how many 
> resources are used by the framework registered as {{spark}}, but neither 
> tells me which role those resources belong to (i.e. whether they are reserved 
> or unreserved).
> A simple fix would be to update {{/master/roles}} to split out resources into 
> "reserved" and "unreserved".  However, this will fail to solve the problem if 
> (and hopefully when) Mesos supports multi-role frameworks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4769) Update state endpoints to allow clients to determine how many resources for a given role have been used

2016-02-24 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-4769:
--

 Summary: Update state endpoints to allow clients to determine how 
many resources for a given role have been used
 Key: MESOS-4769
 URL: https://issues.apache.org/jira/browse/MESOS-4769
 Project: Mesos
  Issue Type: Task
Affects Versions: 0.27.1
Reporter: Michael Gummelt


AFAICT, this is currently impossible.  Say I have a cluster with 4 CPUs reserved 
for {{spark}} and 4 CPUs unreserved, I have a framework registered as {{spark}}, 
and I would like to determine how many CPUs reserved for {{spark}} have been 
used.  AFAIK, there are two endpoints with interesting information: 
{{/master/state}} and {{/master/roles}}.  Both endpoints tell me how many 
resources are used by the framework registered as {{spark}}, but neither tells 
me which role those resources belong to (i.e. whether they are reserved or 
unreserved).

A simple fix would be to update {{/master/roles}} to split out resources into 
"reserved" and "unreserved".  However, this will fail to solve the problem if 
(and hopefully when) Mesos supports multi-role frameworks.
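
The gap is easy to see by querying both endpoints. A minimal sketch (the 
master URL and exact field names are assumptions based on the current output):

{code:python}
# Sketch: neither endpoint answers "how many CPUs reserved for a role
# are in use". URL and field names are assumptions.
import json
import urllib.request

MASTER = "http://localhost:5050"  # assumption

def get(path):
    with urllib.request.urlopen(MASTER + path) as resp:
        return json.load(resp)

# Per-framework usage: aggregated across roles, so reservations are lost.
for fw in get("/master/state").get("frameworks", []):
    print(fw.get("name"), fw.get("used_resources", {}).get("cpus"))

# Per-role allocation: no reserved vs. unreserved breakdown.
for role in get("/master/roles").get("roles", []):
    print(role.get("name"), role.get("resources", {}).get("cpus"))
{code}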



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4751) Convenient API for getting free resources by role

2016-02-23 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-4751:
---
Priority: Minor  (was: Major)

> Convenient API for getting free resources by role
> -
>
> Key: MESOS-4751
> URL: https://issues.apache.org/jira/browse/MESOS-4751
> Project: Mesos
>  Issue Type: Task
>  Components: json api
>Reporter: Michael Gummelt
>Priority: Minor
>
> /master/roles provides allocation by role, but it doesn't provide the total 
> resources assigned to each role, so I can't compute the remaining resources.  
> It seems natural that this endpoint should also include the total assigned to 
> each role.
> Also, please consider normalizing the data in `state.json`.  e.g.:
> {code:javascript}
> "resources": [
>   {
> "cpus"
> "disk"
> "mem"
> "role"
> "used"  
>   }
> ]
> {code}
> It would make it easier to support arbitrary queries if the data were 
> normalized as such.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4751) Convenient API for getting free resources by role

2016-02-23 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-4751:
---
Description: 
/master/roles provides allocation by role, but it doesn't provide the total 
resources assigned to each role, so I can't compute the remaining resources.  
It seems natural that this endpoint should also include the total assigned to 
each role.

Also, please consider normalizing the data in `state.json`.  e.g.:

{code:javascript}
"resources": [
  {
"cpus"
"disk"
"mem"
"role"
"used"  
  }
]
{code}
It would make it easier to support arbitrary queries if the data were 
normalized as such.

  was:
/master/roles provides allocation by role, but it doesn't provide the total 
resources assigned to each role, so I can't compute the remaining resources.  
It seems natural that this endpoint should also include the total assigned to 
each role.

Also, please consider normalizing the data in `state.json`.  e.g.:

{{
"resources": [
  {
"cpus"
"disk"
"mem"
"role"
"used"  
  }
]
}}
It would make it easier to support arbitrary queries if the data were 
normalized as such.
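
With the normalized layout proposed above, computing free resources per role 
becomes a simple fold over one array. A minimal sketch against the 
hypothetical schema (not an existing endpoint):

{code:python}
# Sketch: free CPUs per role, assuming the normalized "resources" array
# proposed above. The schema and values are hypothetical.
proposed = [
    {"role": "spark", "cpus": 4.0, "mem": 8192, "disk": 100, "used": 2.5},
    {"role": "*",     "cpus": 4.0, "mem": 8192, "disk": 100, "used": 1.0},
]

free_cpus = {r["role"]: r["cpus"] - r["used"] for r in proposed}
print(free_cpus)  # {'spark': 1.5, '*': 3.0}
{code}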


> Convenient API for getting free resources by role
> -
>
> Key: MESOS-4751
> URL: https://issues.apache.org/jira/browse/MESOS-4751
> Project: Mesos
>  Issue Type: Task
>  Components: json api
>Reporter: Michael Gummelt
>
> /master/roles provides allocation by role, but it doesn't provide the total 
> resources assigned to each role, so I can't compute the remaining resources.  
> It seems natural that this endpoint should also include the total assigned to 
> each role.
> Also, please consider normalizing the data in `state.json`.  e.g.:
> {code:javascript}
> "resources": [
>   {
> "cpus"
> "disk"
> "mem"
> "role"
> "used"  
>   }
> ]
> {code}
> It would make it easier to support arbitrary queries if the data were 
> normalized as such.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4698) "Composing" containerizer docs are confusing

2016-02-18 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152885#comment-15152885
 ] 

Michael Gummelt commented on MESOS-4698:


To "compose" means to combine two things to form something else.  This 
containerizer isn't doing that.  It's using EITHER the mesos or the docker 
containerizer.  That's not composition.  Even if we can't agree on the 
definition of the work, just as evidence that it's confusing, both me and my 
peer at Typesafe independently interpreted "composition" to mean something like 
nesting. 

Also, it's inconsistent to list it in the docs as a containerizer type, then 
not include it in the list of `--containerizer` options. 

> "Composing" containerizer docs are confusing
> 
>
> Key: MESOS-4698
> URL: https://issues.apache.org/jira/browse/MESOS-4698
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Michael Gummelt
>  Labels: mesosphere
>
> Both my peer at Typesafe and I have found the containerizer docs confusing 
> (the 'Composing Containerizer' part):
> https://github.com/apache/mesos/blob/master/docs/containerizer.md
> "Composing" suggests that I can launch tasks in nested containers.
> Also, the structure of the docs suggests that there's a third container type 
> called "composing", which is not true, or at least is not exposed in the 
> UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4698) "Composing" containerizer docs are confusing

2016-02-17 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-4698:
--

 Summary: "Composing" containerizer docs are confusing
 Key: MESOS-4698
 URL: https://issues.apache.org/jira/browse/MESOS-4698
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Reporter: Michael Gummelt


Both my peer at Typesafe and I have found the docs confusing.

"Composing" suggests that I can launch tasks in nested containers.

Also, the structure of the docs suggests that there's a third container type 
called "composing", which is not true, or at least is not exposed in the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4582) state.json serving duplicate "active" fields

2016-02-02 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-4582:
---
Attachment: error.json

> state.json serving duplicate "active" fields
> 
>
> Key: MESOS-4582
> URL: https://issues.apache.org/jira/browse/MESOS-4582
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27
>Reporter: Michael Gummelt
> Attachments: error.json
>
>
> state.json is serving duplicate "active" fields in frameworks.  See the 
> framework "47df96c2-3f85-4bc5-b781-709b2c30c752-" in the attached file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4582) state.json serving duplicate "active" fields

2016-02-02 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-4582:
--

 Summary: state.json serving duplicate "active" fields
 Key: MESOS-4582
 URL: https://issues.apache.org/jira/browse/MESOS-4582
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.27
Reporter: Michael Gummelt
 Attachments: error.json

state.json is serving duplicate "active" fields in frameworks.  See the 
framework "47df96c2-3f85-4bc5-b781-709b2c30c752-" in the attached file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4585) mesos-fetcher LIBPROCESS_PORT set to 5051 URI fetch failure

2016-02-02 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-4585:
---
Attachment: hdfs-stderr.log

HDFS links are also failing.  See attached log.

> mesos-fetcher LIBPROCESS_PORT set to 5051 URI fetch failure
> ---
>
> Key: MESOS-4585
> URL: https://issues.apache.org/jira/browse/MESOS-4585
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Drew Robb
> Attachments: hdfs-stderr.log
>
>
> When starting a task with an {{s3a://}} URI, the fetch fails when the fetcher 
> tries to bind to the slave's port 5051. The URI itself gets successfully 
> downloaded, but the error is fatal. The failure does not occur if the URI is 
> changed to {{http://}}. The root cause is that the mesos-fetcher process has 
> {{LIBPROCESS_PORT=5051}} in its environment, as I was able to find from 
> {{cat "/proc/`pgrep mesos-fetcher`/environ"}}.
> stderr from a failing task:
> {quote}
> I0203 00:11:55.815500  4964 fetcher.cpp:424] Fetcher Info: 
> {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/ede0e5bc-d7ac-4b9a-8d35-b210fa785db0-S0","items":[{"action":"BYPASS_CACHE","uri":{"cache":false,"executable":false,"extract":true,"value":"s3a:\/\/strava.mesos\/foo"}}],"sandbox_directory":"\/mnt\/mesos\/slaves\/ede0e5bc-d7ac-4b9a-8d35-b210fa785db0-S0\/frameworks\/fe927665-1516-46cf-94dd-6d2ca84007f1-\/executors\/uris-test.bc047306-ca0a-11e5-b742-e2162bf6108e\/runs\/24ebd807-b065-4776-a0bf-84bda4a82f01"}
> I0203 00:11:55.816830  4964 fetcher.cpp:379] Fetching URI 
> 's3a://strava.mesos/foo'
> I0203 00:11:55.816846  4964 fetcher.cpp:250] Fetching directly into the 
> sandbox directory
> I0203 00:11:55.816864  4964 fetcher.cpp:187] Fetching URI 
> 's3a://strava.mesos/foo'
> I0203 00:11:56.191640  4964 fetcher.cpp:109] Downloading resource with Hadoop 
> client from 's3a://strava.mesos/foo' to 
> '/mnt/mesos/slaves/ede0e5bc-d7ac-4b9a-8d35-b210fa785db0-S0/frameworks/fe927665-1516-46cf-94dd-6d2ca84007f1-/executors/uris-test.bc047306-ca0a-11e5-b742-e2162bf6108e/runs/24ebd807-b065-4776-a0bf-84bda4a82f01/foo'
> F0203 00:11:56.192503  4964 process.cpp:892] Failed to initialize: Failed to 
> bind on 0.0.0.0:5051: Address already in use: Address already in use [98]
> *** Check failure stack trace: ***
> @ 0x7f229ce50e7d  google::LogMessage::Fail()
> @ 0x7f229ce52c10  google::LogMessage::SendToLog()
> @ 0x7f229ce50a42  google::LogMessage::Flush()
> @ 0x7f229ce50c89  google::LogMessage::~LogMessage()
> @ 0x7f229ce51c32  google::ErrnoLogMessage::~ErrnoLogMessage()
> @ 0x7f229cdf16b9  process::initialize()
> @ 0x7f229cdf2f36  process::ProcessBase::ProcessBase()
> @ 0x7f229ce22875  process::reap()
> @ 0x7f229ce2ced7  process::subprocess()
> @ 0x7f229c50ab7b  HDFS::copyToLocal()
> @   0x40f03e  download()
> @   0x40b69f  main
> @ 0x7f229adc8a40  (unknown)
> @   0x40cf59  _start
> Aborted (core dumped)
> {quote}
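
The pattern that avoids this class of failure is to scrub libprocess variables 
from the environment before spawning a helper, so the child doesn't try to 
bind the parent's port. A minimal sketch of the idea (not the actual Mesos 
fix; the real fetcher invocation is elided):

{code:python}
# Sketch: spawn a helper with libprocess variables scrubbed so it does
# not try to bind the parent's port. Illustrates the root cause above;
# the real fetcher takes its arguments differently.
import os
import subprocess

env = {k: v for k, v in os.environ.items()
       if k not in ("LIBPROCESS_PORT", "LIBPROCESS_IP")}
subprocess.run(["mesos-fetcher"], env=env, check=True)
{code}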



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3866) The docker containerizer sets MESOS_NATIVE_JAVA_LIBRARY in docker executors

2015-12-22 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068375#comment-15068375
 ] 

Michael Gummelt commented on MESOS-3866:


It's not a dupe.  MESOS-3751 regards MESOS_NATIVE_JAVA_LIBRARY not being set 
when it should be (in mesos).  This issue regards it being set when it 
shouldn't be (in docker).

> The docker containerizer sets MESOS_NATIVE_JAVA_LIBRARY in docker executors
> ---
>
> Key: MESOS-3866
> URL: https://issues.apache.org/jira/browse/MESOS-3866
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.25.0
>Reporter: Michael Gummelt
>
> It's set here: 
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L281
> And passed to the docker executor here: 
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L844
> This leaks the host path of the library into the docker image, which of 
> course can't see it. This is breaking DCOS Spark, which runs in a docker 
> image that has set its own value for MESOS_NATIVE_JAVA_LIBRARY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3866) The docker containerizer sets MESOS_NATIVE_JAVA_LIBRARY in docker executors

2015-11-09 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-3866:
--

 Summary: The docker containerizer sets MESOS_NATIVE_JAVA_LIBRARY 
in docker executors
 Key: MESOS-3866
 URL: https://issues.apache.org/jira/browse/MESOS-3866
 Project: Mesos
  Issue Type: Bug
  Components: containerization
Affects Versions: 0.25.0
Reporter: Michael Gummelt


It's set here: 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L281

And passed to the docker executor here: 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L844

This leaks the host path of the library into the docker image, which of course 
can't see it. This is breaking Spark, which runs in a docker image that has set 
its own value for MESOS_NATIVE_JAVA_LIBRARY.
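
The general fix pattern is to drop host-only variables when assembling the 
container's environment. A minimal sketch of that idea (the image name and 
filter set are hypothetical):

{code:python}
# Sketch: build a docker run command whose environment omits host-only
# variables such as MESOS_NATIVE_JAVA_LIBRARY, which point at paths the
# container cannot see. Illustrative only, not the containerizer code.
import os
import subprocess

HOST_ONLY = {"MESOS_NATIVE_JAVA_LIBRARY"}

cmd = ["docker", "run", "--rm"]
for key, value in os.environ.items():
    if key.startswith("MESOS_") and key not in HOST_ONLY:
        cmd += ["-e", "%s=%s" % (key, value)]
cmd += ["my-spark-image:latest"]  # hypothetical image
subprocess.run(cmd, check=True)
{code}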




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3866) The docker containerizer sets MESOS_NATIVE_JAVA_LIBRARY in docker executors

2015-11-09 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated MESOS-3866:
---
Description: 
It's set here: 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L281

And passed to the docker executor here: 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L844

This leaks the host path of the library into the docker image, which of course 
can't see it. This is breaking DCOS Spark, which runs in a docker image that 
has set its own value for MESOS_NATIVE_JAVA_LIBRARY.


  was:
It's set here: 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L281

And passed to the docker executor here: 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L844

This leaks the host path of the library into the docker image, which of course 
can't see it. This is breaking Spark, which runs in a docker image that has set 
its own value for MESOS_NATIVE_JAVA_LIBRARY.



> The docker containerizer sets MESOS_NATIVE_JAVA_LIBRARY in docker executors
> ---
>
> Key: MESOS-3866
> URL: https://issues.apache.org/jira/browse/MESOS-3866
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.25.0
>Reporter: Michael Gummelt
>
> It's set here: 
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L281
> And passed to the docker executor here: 
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L844
> This leaks the host path of the library into the docker image, which of 
> course can't see it. This is breaking DCOS Spark, which runs in a docker 
> image that has set its own value for MESOS_NATIVE_JAVA_LIBRARY.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3836) `--executor-environment-variables` may not apply to docker containers

2015-11-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996136#comment-14996136
 ] 

Michael Gummelt commented on MESOS-3836:


bq. Every marathon app task got every environment variable that mesos-slave had 
unless the marathon app definition explicitly overrode it.

That's because marathon tasks run under the command executor.  As I said, this 
is the only scenario where you can say with certainty that tasks inherit env 
vars from the host.

bq. Executors in many ways are like Tasks and should be fully containerized like 
them

I'm not sure what you mean by "fully" containerized, but tasks aren't fully 
isolated.  In fact, you can't really say anything about tasks.  It doesn't 
even make sense to talk about env vars set on tasks, because tasks aren't 
necessarily processes at all.  All of this env var talk only applies to 
executors.  We should be clear with terms.

Definitional nitpicks aside, I do agree that we should head toward total host 
isolation, but let's focus on solving the immediate problem.

> `--executor-environment-variables` may not apply to docker containers
> -
>
> Key: MESOS-3836
> URL: https://issues.apache.org/jira/browse/MESOS-3836
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, slave
>Affects Versions: 0.25.0
> Environment: Mesos 0.25.0 configured with 
> --executor-environment-variables
>Reporter: Cody Maloney
>Assignee: Marco Massenzio
>Priority: Minor
>  Labels: mesosphere
>
> In our use case we set {{PATH}} as part of the 
> {{\-\-executor_environment_variables}} in order to limit what binaries all 
> tasks which are launched via Mesos have readily available to them, making it 
> much harder for people launching tasks on mesos to accidentally depend on 
> something which isn't part of the "guaranteed" environment / platform.
> Docker containers can be used as executors, and have a fully isolated 
> filesystem. For executors which run in docker containers setting {{PATH}}  to 
> our path on the host filesystem may potentially break the docker container.
> The previous code of only copying across environment variables when 
> {{includeOsEnvironment}} is set dealt with this 
> (https://github.com/apache/mesos/blob/56510afe149758a69a5a714dfaab16111dd0d9c3/src/slave/containerizer/containerizer.cpp#L267)
> If {{includeOsEnvironment}} is set, then we should copy across the current 
> {{\-\-executor_environment_variables}}. If it isn't, then 
> {{\-\-executor_environment_variables}} shouldn't be used at all.
> Another option which could be useful is to make it so that there are two sets 
> of "Executor Environment Variables". One for when {{includeOsEnvironment}} is 
> set, and one for when it is not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3836) `--executor-environment-variables` may not apply to docker containers

2015-11-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996101#comment-14996101
 ] 

Michael Gummelt commented on MESOS-3836:


It looks like the original goal of MESOS-2832, where 
{{--executor-environment-variables}} was introduced, was to replace the 
inherited host environment with a different environment, which would only apply 
to non-docker containers, since they're the only ones that inherit the host 
environment.  However, as implemented, it's set on all executors.

So the central question is whether we want to keep the functionality of setting 
env vars on all executors, or revert to the original goal of replacing the 
inherited host environment, which would only apply to non-docker containers 
(mesos and external).

[~tnachen]: I don't see how your proposal for a 
{{--docker-task-environment-variables}} flag solves the {{PATH}} problem.  
Adding more docker env vars doesn't prevent us from setting the existing 
{{--executor-environment-variables}} on docker executors.

[~cmaloney]: 

bq. The --executor-environment-variables is given directly to executors, and 
then gets inherited from the executor by all tasks the executors launch 
currently.

Not really.  Custom executors can launch tasks however they want.  It's up to 
them whether or not they pass their env vars.  And the docker command executor 
(mesos-docker-executor) doesn't pass env vars through.  So this is really only 
true for the mesos command executor.
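
The point is visible from a custom executor's side: whether a task inherits 
the executor's environment is entirely the executor's choice. A minimal sketch 
(hypothetical executor code, not Mesos internals):

{code:python}
# Sketch: a custom executor decides whether its tasks inherit its env.
# Nothing in Mesos forces either choice.
import os
import subprocess

def launch_task(command, inherit_env=False):
    env = dict(os.environ) if inherit_env else {}
    return subprocess.Popen(["/bin/sh", "-c", command], env=env)

launch_task("env | sort", inherit_env=True)   # task sees the executor's env
launch_task("env | sort", inherit_env=False)  # task sees an empty env
{code}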



> `--executor-environment-variables` may not apply to docker containers
> -
>
> Key: MESOS-3836
> URL: https://issues.apache.org/jira/browse/MESOS-3836
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, slave
>Affects Versions: 0.25.0
> Environment: Mesos 0.25.0 configured with 
> --executor-environment-variables
>Reporter: Cody Maloney
>Assignee: Marco Massenzio
>Priority: Minor
>  Labels: mesosphere
>
> In our use case we set {{PATH}} as part of the 
> {{\-\-executor_environment_variables}} in order to limit what binaries all 
> tasks which are launched via Mesos have readily available to them, making it 
> much harder for people launching tasks on mesos to accidentally depend on 
> something which isn't part of the "guaranteed" environment / platform.
> Docker containers can be used as executors, and have a fully isolated 
> filesystem. For executors which run in docker containers setting {{PATH}}  to 
> our path on the host filesystem may potentially break the docker container.
> The previous code of only copying across environment variables when 
> {{includeOsEnvironment}} is set dealt with this 
> (https://github.com/apache/mesos/blob/56510afe149758a69a5a714dfaab16111dd0d9c3/src/slave/containerizer/containerizer.cpp#L267)
> If {{includeOsEnvironment}} is set, then we should copy across the current 
> {{\-\-executor_environment_variables}}. If it isn't, then 
> {{\-\-executor_environment_variables}} shouldn't be used at all.
> Another option which could be useful is to make it so that there are two sets 
> of "Executor Environment Variables". One for when {{includeOsEnvironment}} is 
> set, and one for when it is not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3836) `--executor-environment-variables` may not apply to docker containers

2015-11-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996144#comment-14996144
 ] 

Michael Gummelt commented on MESOS-3836:


bq.  I mean every executor should adhere to the same isolators that tasks do

Isolators are set on containers.  Thus executors and tasks, which run in 
containers, adhere to the same isolators.  There are no isolators that tasks 
adhere to that executors don't.

> `--executor-environment-variables` may not apply to docker containers
> -
>
> Key: MESOS-3836
> URL: https://issues.apache.org/jira/browse/MESOS-3836
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, slave
>Affects Versions: 0.25.0
> Environment: Mesos 0.25.0 configured with 
> --executor-environment-variables
>Reporter: Cody Maloney
>Assignee: Marco Massenzio
>Priority: Minor
>  Labels: mesosphere
>
> In our use case we set {{PATH}} as part of the 
> {{\-\-executor_environment_variables}} in order to limit what binaries all 
> tasks which are launched via Mesos have readily available to them, making it 
> much harder for people launching tasks on mesos to accidentally depend on 
> something which isn't part of the "guaranteed" environment / platform.
> Docker containers can be used as executors, and have a fully isolated 
> filesystem. For executors which run in docker containers setting {{PATH}}  to 
> our path on the host filesystem may potentially break the docker container.
> The previous code of only copying across environment variables when 
> {{includeOsEnvironment}} is set dealt with this 
> (https://github.com/apache/mesos/blob/56510afe149758a69a5a714dfaab16111dd0d9c3/src/slave/containerizer/containerizer.cpp#L267)
> If {{includeOsEnvironment}} is set, then we should copy across the current 
> {{\-\-executor_environment_variables}}. If it isn't, then 
> {{\-\-executor_environment_variables}} shouldn't be used at all.
> Another option which could be useful is to make it so that there are two sets 
> of "Executor Environment Variables". One for when {{includeOsEnvironment}} is 
> set, and one for when it is not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3836) `--executor-environment-variables` may not apply to docker containers

2015-11-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996105#comment-14996105
 ] 

Michael Gummelt commented on MESOS-3836:


If we decide to keep the existing functionality, my proposal is to have both 
{{--executor-environment-variables}} and something like 
{{--inherited-environment-variables}} or {{--host-environment-variables}}.  The 
former would set env vars on all executors.  The latter would set the inherited 
environment for containers, which would only apply to those containerizers that 
inherit the host environment (mesos and external).

> `--executor-environment-variables` may not apply to docker containers
> -
>
> Key: MESOS-3836
> URL: https://issues.apache.org/jira/browse/MESOS-3836
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, slave
>Affects Versions: 0.25.0
> Environment: Mesos 0.25.0 configured with 
> --executor-environment-variables
>Reporter: Cody Maloney
>Assignee: Marco Massenzio
>Priority: Minor
>  Labels: mesosphere
>
> In our use case we set {{PATH}} as part of the 
> {{\-\-executor_environment_variables}} in order to limit what binaries all 
> tasks which are launched via Mesos have readily available to them, making it 
> much harder for people launching tasks on mesos to accidentally depend on 
> something which isn't part of the "guaranteed" environment / platform.
> Docker containers can be used as executors, and have a fully isolated 
> filesystem. For executors which run in docker containers setting {{PATH}}  to 
> our path on the host filesystem may potentially break the docker container.
> The previous code of only copying across environment variables when 
> {{includeOsEnvironment}} is set dealt with this 
> (https://github.com/apache/mesos/blob/56510afe149758a69a5a714dfaab16111dd0d9c3/src/slave/containerizer/containerizer.cpp#L267)
> If {{includeOsEnvironment}} is set, then we should copy across the current 
> {{\-\-executor_environment_variables}}. If it isn't, then 
> {{\-\-executor_environment_variables}} shouldn't be used at all.
> Another option which could be useful is to make it so that there are two sets 
> of "Executor Environment Variables". One for when {{includeOsEnvironment}} is 
> set, and one for when it is not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3836) `--executor-environment-variables` may not apply to docker containers

2015-11-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996144#comment-14996144
 ] 

Michael Gummelt edited comment on MESOS-3836 at 11/9/15 7:15 AM:
-

bq.  I mean every executor should adhere to the same isolators that tasks do

Isolators are set on containers (or rather, they define containers).  Thus 
executors and tasks, which run in containers, adhere to the same isolators.  
There are no isolators that tasks adhere to that executors don't.


was (Author: mgummelt):
bq.  I mean every executor should adhere to the same isolators that tasks do

Isolators are set on containers.  Thus executors and tasks, which run in 
containers, adhere to the same isolators.  There are no isolators that tasks 
adhere to that executors don't.

> `--executor-environment-variables` may not apply to docker containers
> -
>
> Key: MESOS-3836
> URL: https://issues.apache.org/jira/browse/MESOS-3836
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, slave
>Affects Versions: 0.25.0
> Environment: Mesos 0.25.0 configured with 
> --executor-environment-variables
>Reporter: Cody Maloney
>Assignee: Marco Massenzio
>Priority: Minor
>  Labels: mesosphere
>
> In our use case we set {{PATH}} as part of the 
> {{\-\-executor_environment_variables}} in order to limit what binaries all 
> tasks which are launched via Mesos have readily available to them, making it 
> much harder for people launching tasks on mesos to accidentally depend on 
> something which isn't part of the "guaranteed" environment / platform.
> Docker containers can be used as executors, and have a fully isolated 
> filesystem. For executors which run in docker containers setting {{PATH}}  to 
> our path on the host filesystem may potentially break the docker container.
> The previous code of only copying across environment variables when 
> {{includeOsEnvironment}} is set dealt with this 
> (https://github.com/apache/mesos/blob/56510afe149758a69a5a714dfaab16111dd0d9c3/src/slave/containerizer/containerizer.cpp#L267)
> If {{includeOsEnvironment}} is set, then we should copy across the current 
> {{\-\-executor_environment_variables}}. If it isn't, then 
> {{\-\-executor_environment_variables}} shouldn't be used at all.
> Another option which could be useful is to make it so that there are two sets 
> of "Executor Environment Variables". One for when {{includeOsEnvironment}} is 
> set, and one for when it is not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2797) mesos-slave dies when it hits open file descriptor limit

2015-06-01 Thread Michael Gummelt (JIRA)
Michael Gummelt created MESOS-2797:
--

 Summary: mesos-slave dies when it hits open file descriptor limit
 Key: MESOS-2797
 URL: https://issues.apache.org/jira/browse/MESOS-2797
 Project: Mesos
  Issue Type: Bug
  Components: general
Affects Versions: 0.22.1
Reporter: Michael Gummelt


I'm running mesos-slave under systemd as part of Mesosphere's DCOS.  The slave 
process is repeatedly dying as it hits the system's open file descriptor limit 
of 1024.  See the mesos-slave.log file below.

I stop mesos-slave, remove the directory specified in the slave logs, and still 
get the same error.  lsof shows that mesos-slave has several hundred pipes 
open.  See the lsof.log file below.
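
A quick way to watch the leak (assuming access to /proc on the agent host) is 
to sample the slave's open descriptor count over time:

{code:python}
# Sketch: sample the mesos-slave process's open file descriptor count
# via /proc (Linux only; needs permission to read the process's fd dir).
import os
import subprocess
import time

pid = subprocess.check_output(
    ["pgrep", "-o", "mesos-slave"]).decode().strip()

for _ in range(5):
    print("open fds:", len(os.listdir("/proc/%s/fd" % pid)))
    time.sleep(10)
{code}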

mesos-slave.log
Jun 01 23:49:19 dcos-01 systemd[1]: mesos-slave.service holdoff time over, 
scheduling restart.
Jun 01 23:49:19 dcos-01 systemd[1]: Stopping Mesos Slave...
Jun 01 23:49:19 dcos-01 systemd[1]: Starting Mesos Slave...
Jun 01 23:49:19 dcos-01 ping[14896]: PING leader.mesos (172.17.8.101) 56(84) 
bytes of data.
Jun 01 23:49:19 dcos-01 ping[14896]: 64 bytes from dcos-01 (172.17.8.101): 
icmp_seq=1 ttl=64 time=0.023 ms
Jun 01 23:49:19 dcos-01 ping[14896]: --- leader.mesos ping statistics ---
Jun 01 23:49:19 dcos-01 ping[14896]: 1 packets transmitted, 1 received, 0% 
packet loss, time 0ms
Jun 01 23:49:19 dcos-01 ping[14896]: rtt min/avg/max/mdev = 
0.023/0.023/0.023/0.000 ms
Jun 01 23:49:19 dcos-01 systemd[1]: Started Mesos Slave.
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.713110 14899 
logging.cpp:172] INFO level logging started!
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.715564 14899 
main.cpp:156] Build: 2015-05-19 18:43:41 by
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.715600 14899 
main.cpp:158] Version: 0.22.1
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.715618 14899 
main.cpp:165] Git SHA: dd082c8656eb6e93e091a12fc5cfee3700a61bb1
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.830142 14899 
containerizer.cpp:110] Using isolation: cgroups/cpu,cgroups/mem
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.845340 14899 
linux_launcher.cpp:94] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
for the Linux launcher
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.845696 14899 
main.cpp:200] Starting Mesos slave
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,845:14899(0x7f111ff43700):ZOO_INFO@log_env@712: Client 
environment:zookeeper.version=zookeeper C client 3.4.5
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,846:14899(0x7f111ff43700):ZOO_INFO@log_env@716: Client 
environment:host.name=dcos-01
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,846:14899(0x7f111ff43700):ZOO_INFO@log_env@723: Client 
environment:os.name=Linux
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,846:14899(0x7f111ff43700):ZOO_INFO@log_env@724: Client 
environment:os.arch=3.19.0
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,846:14899(0x7f111ff43700):ZOO_INFO@log_env@725: Client 
environment:os.version=#2 SMP Thu Mar 26 10:44:46 UTC 2015
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,846:14899(0x7f111ff43700):ZOO_INFO@log_env@733: Client 
environment:user.name=(null)
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,846:14899(0x7f111ff43700):ZOO_INFO@log_env@741: Client 
environment:user.home=/root
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,846:14899(0x7f111ff43700):ZOO_INFO@log_env@753: Client 
environment:user.dir=/
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,846:14899(0x7f111ff43700):ZOO_INFO@zookeeper_init@786: Initiating 
client connection, host=leader.mesos:2181 sessionTimeout=1 
watcher=0x7f11246c0140 sessionId=0 sessionPasswd=null context=0x7f1114000b40 
flags=0
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.846161 14899 
slave.cpp:174] Slave started on 1)@172.17.8.101:5051
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.846206 14899 
slave.cpp:194] Moving slave process into its own cgroup for subsystem: cpu
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,855:14899(0x7f110bde7700):ZOO_INFO@check_events@1703: initiated 
connection to server [172.17.8.101:2181]
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: 2015-06-01 
23:49:19,855:14899(0x7f110bde7700):ZOO_INFO@check_events@1750: session 
establishment complete on server [172.17.8.101:2181], 
sessionId=0x14d77b31175030e, negotiated timeout=1
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.856979 14900 
group.cpp:313] Group process (group(1)@172.17.8.101:5051) connected to ZooKeeper
Jun 01 23:49:19 dcos-01 mesos-slave[14899]: I0601 23:49:19.857028 14900 
group.cpp:790] Syncing group operations: queue size (joins, cancels, datas) = 
(0, 0, 0)
Jun