[jira] [Updated] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6118:
--
Fix Version/s: 1.1.0

> Agent would crash with docker container tasks due to host mount table read.
> ---
>
> Key: MESOS-6118
> URL: https://issues.apache.org/jira/browse/MESOS-6118
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 1.0.1
> Environment: Build: 2016-08-26 23:06:27 by centos
> Version: 1.0.1
> Git tag: 1.0.1
> Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> systemd version `219` detected
> Inializing systemd state
> Created systemd slice: `/run/systemd/system/mesos_executors.slice`
> Started systemd slice `mesos_executors.slice`
> Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
>  Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
> Linux ip-10-254-192-40 3.10.0-327.28.3.el7.x86_64 #1 SMP Thu Aug 18 19:05:49 
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Jamie Briant
>Assignee: Kevin Klues
>Priority: Blocker
>  Labels: linux, slave
> Fix For: 1.0.2, 1.1.0
>
> Attachments: crashlogfull.log, cycle2.log, cycle3.log, cycle5.log, 
> cycle6.log, slave-crash.log
>
>
> I have a framework which schedules thousands of short-running tasks (each 
> lasting a few seconds to a few minutes) over a period of several minutes. In 
> 1.0.1, the slave process will crash every few minutes (with systemd 
> restarting it).
> Crash is:
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: F0901 20:52:23.905678  1232 
> fs.cpp:140] Check failed: !visitedParents.contains(parentId)
> Sep 01 20:52:23 ip-10-254-192-99 mesos-slave: *** Check failure stack trace: 
> ***
> Version 1.0.0 works without this issue.
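The {{CHECK}} above fires in the agent's mount-table parsing (fs.cpp) when the parent chain of a mount entry revisits a mount ID, i.e. the host mount table appears to contain a cycle. A minimal sketch of that traversal invariant, using hypothetical simplified types (not the actual Mesos code):

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical simplified view of a /proc/self/mountinfo entry:
// each mount has an id and a parent id.
struct MountEntry {
  int id;
  int parentId;
};

// Walk the parent chain of `start`, mirroring the traversal that
// guards against cycles in the mount table. Returns false if a
// cycle is detected -- the condition under which the agent's
// CHECK(!visitedParents.contains(parentId)) fails and the process
// aborts.
bool validateParentChain(const std::map<int, MountEntry>& mounts, int start) {
  std::set<int> visitedParents;
  int current = start;
  while (mounts.count(current) > 0) {
    int parentId = mounts.at(current).parentId;
    if (parentId == current) {
      break;  // Root mount is its own parent; the chain terminates.
    }
    if (visitedParents.count(parentId) > 0) {
      return false;  // Cycle detected in the parent chain.
    }
    visitedParents.insert(parentId);
    current = parentId;
  }
  return true;
}
```

Under heavy container churn the agent can read the mount table while mounts are being added and removed, which is presumably how an apparently cyclic parent chain gets observed.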



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6118:
--
Target Version/s:   (was: 1.0.2, 1.1.0)






[jira] [Updated] (MESOS-6118) Agent would crash with docker container tasks due to host mount table read.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6118:
--
Fix Version/s: 1.0.2






[jira] [Commented] (MESOS-6173) Authentication in v2 protobuf should not be `required`.

2016-10-13 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574075#comment-15574075
 ] 

Jie Yu commented on MESOS-6173:
---

Re-open if you still see the issue.

> Authentication in v2 protobuf should not be `required`.
> ---
>
> Key: MESOS-6173
> URL: https://issues.apache.org/jira/browse/MESOS-6173
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.28.0, 0.28.1, 0.28.2, 1.0.0, 1.0.1
> Environment: Ubuntu 14.04
>Reporter: Bill Zhao
>Assignee: Gilbert Song
>  Labels: containerizer, docker, mesosphere
> Fix For: 1.1.0
>
>
> I was testing the Mesos GPU support and noticed an issue that depends on the 
> docker repository used. The public Docker Hub works fine, but a private 
> docker repository hosted by JFrog doesn't work as expected.
> I set up the environment according to this document:
> https://mesosphere.github.io/marathon/docs/native-docker.html
> Tested with mesos-execute command:
> billz2:/etc/mesos-slave$ sudo mesos-execute   
> --master=bz01.apple.com:5050   --name=gpu-test   
> --docker_image=docker.apple.com/nvidia/cuda   --command="nvidia-smi"  
>  --framework_capabilities="GPU_RESOURCES"   --resources="gpus:2"
> I0914 18:32:51.571482 26084 scheduler.cpp:172] Version: 1.0.1
> I0914 18:32:51.579815 26087 scheduler.cpp:461] New master detected at 
> master@17.x.x.x:5050
> Subscribed with ID 'c0968c96-cc66-4990-9c49-d5ef26d07a07-0015'
> Submitted task 'gpu-test' to agent 
> 'c0968c96-cc66-4990-9c49-d5ef26d07a07-S17370'
> Received status update TASK_FAILED for task 'gpu-test'
>   message: 'Failed to launch container: Failed to parse the image manifest: 
> Protobuf parse failed: Missing required fields: signatures[0].header.jwk.kid; 
> Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> The authentication in v2 protobuf should
> not be `required`.
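The failure above is a consequence of proto2 {{required}} semantics: if any {{required}} field (here {{signatures[0].header.jwk.kid}}) is absent, the entire message fails to parse, so a manifest from a registry that omits image signatures can never be provisioned. A minimal sketch that mimics the two labelings, using hypothetical types modeled on the error path (not the actual Mesos protobufs):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the v2 manifest signature chain from the
// error path `signatures[0].header.jwk.kid`.
struct Jwk {
  bool hasKid = false;  // Presence bit, as protobuf tracks for optional fields.
  std::string kid;
};

struct Signature {
  Jwk jwk;
};

struct Manifest {
  std::vector<Signature> signatures;
};

// Behavior as if `kid` were labeled `required` in proto2: any
// signature missing it makes the whole manifest unparseable
// ("Protobuf parse failed: Missing required fields: ...").
bool parseWithRequiredKid(const Manifest& m) {
  for (const Signature& s : m.signatures) {
    if (!s.jwk.hasKid) {
      return false;
    }
  }
  return true;
}

// Behavior as if `kid` were `optional`: the manifest parses, and
// callers decide per-signature whether the key id is usable.
bool parseWithOptionalKid(const Manifest&) {
  return true;
}
```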





[jira] [Updated] (MESOS-6391) Command task's sandbox should not be owned by root if it uses container image.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6391:
--
Target Version/s: 1.1.0

> Command task's sandbox should not be owned by root if it uses container image.
> --
>
> Key: MESOS-6391
> URL: https://issues.apache.org/jira/browse/MESOS-6391
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.2, 1.0.1
>Reporter: Jie Yu
>Assignee: Jie Yu
>
> Currently, if the task defines a container image, the command executor will 
> be run under root because it needs to perform pivot_root.
> That means if the task wants to run under an unprivileged user, the sandbox 
> of that task will not be writable because it's owned by root.
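A sketch of the direction the ticket implies: keep the command executor running as root (it needs to perform pivot_root), but chown the sandbox to the task's user before the task runs so the sandbox stays writable. The helper below is illustrative only; the uid/gid lookup and the actual Mesos code paths are assumptions:

```cpp
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: hand ownership of the task sandbox to the
// unprivileged task user. In the real agent, `uid`/`gid` would come
// from resolving the user named in the task's CommandInfo.
bool chownSandbox(const std::string& sandbox, uid_t uid, gid_t gid) {
  return ::chown(sandbox.c_str(), uid, gid) == 0;
}
```

Note that chowning to an arbitrary user requires the caller to be root, which the command executor already is in the container-image case.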





[jira] [Assigned] (MESOS-6391) Command task's sandbox should not be owned by root if it uses container image.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu reassigned MESOS-6391:
-

Assignee: Jie Yu






[jira] [Commented] (MESOS-6391) Command task's sandbox should not be owned by root if it uses container image.

2016-10-13 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574063#comment-15574063
 ] 

Jie Yu commented on MESOS-6391:
---

https://reviews.apache.org/r/52854/
https://reviews.apache.org/r/52855/






[jira] [Updated] (MESOS-6391) Command task's sandbox should not be owned by root if it uses container image.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6391:
--
Description: 
Currently, if the task defines a container image, the command executor will be 
run under root because it needs to perform pivot_root.

That means if the task wants to run under an unprivileged user, the sandbox of 
that task will not be writable because it's owned by root.

  was:
Currently, is the task defines a container image, the command executor will be 
run under root because it needs to perform pivot_root.

That means if the task wants to run under an unprivileged user, the sandbox of 
that task will not be writable because it's owned by root.







[jira] [Updated] (MESOS-6173) Authentication in v2 protobuf should not be `required`.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6173:
--
Target Version/s: 1.1.0  (was: 1.2.0)






[jira] [Updated] (MESOS-6393) Deprecated SSL_ environment variables are non functional already.

2016-10-13 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6393:
-
Priority: Blocker  (was: Major)

> Deprecated SSL_ environment variables are non functional already.
> -
>
> Key: MESOS-6393
> URL: https://issues.apache.org/jira/browse/MESOS-6393
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>Priority: Blocker
>  Labels: SSL, libprocess, security
>
> {noformat}
> $ SSL_ENABLED=true SSL_SUPPORT_DOWNGRADE=true 
> SSL_KEY_FILE=~/Development/ssl/snakeoil.key 
> SSL_CERT_FILE=~/Development/ssl/snakeoil.crt ./bin/mesos-master.sh 
> --work_dir=/tmp/mesos
> {noformat}
> {noformat}
> $ curl -k https://127.0.0.1:5050/master/state.json
> curl: (35) Server aborted the SSL handshake
> {noformat}
> Only when using the new {{LIBPROCESS_SSL}} prefix are the variables actually 
> respected.
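A sketch of one possible compatibility shim for the deprecation window: mirror legacy {{SSL_}}-prefixed variables onto the new {{LIBPROCESS_SSL_}} names before the SSL configuration is read. This is an assumption about the shape of the fix, not the actual libprocess change:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical shim: for every legacy SSL_* variable, add the
// equivalent LIBPROCESS_SSL_* entry unless one was already set
// explicitly (an explicit new-style value always wins).
std::map<std::string, std::string> upgradeSSLEnv(
    const std::map<std::string, std::string>& env) {
  std::map<std::string, std::string> result = env;
  const std::string oldPrefix = "SSL_";
  const std::string newPrefix = "LIBPROCESS_SSL_";
  for (const auto& kv : env) {
    if (kv.first.compare(0, oldPrefix.size(), oldPrefix) == 0) {
      std::string upgraded = newPrefix + kv.first.substr(oldPrefix.size());
      if (result.count(upgraded) == 0) {
        result[upgraded] = kv.second;
      }
    }
  }
  return result;
}
```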





[jira] [Updated] (MESOS-6393) Deprecated SSL_ environment variables are non functional already.

2016-10-13 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6393:
--
Affects Version/s: 1.1.0
   1.0.2






[jira] [Assigned] (MESOS-6393) Deprecated SSL_ environment variables are non functional already.

2016-10-13 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff reassigned MESOS-6393:
-

Assignee: Till Toenshoff






[jira] [Commented] (MESOS-6393) Deprecated SSL_ environment variables are non functional already.

2016-10-13 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573830#comment-15573830
 ] 

Till Toenshoff commented on MESOS-6393:
---

The commit that introduced this problem is 
acbebcc6f13d75956022f849ef93955fdfb33f4c. It has also already been backported 
to 1.0.2 and hence needs to be fixed there as well, [~vinodkone].






[jira] [Created] (MESOS-6393) Deprecated SSL_ environment variables are non functional already.

2016-10-13 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-6393:
-

 Summary: Deprecated SSL_ environment variables are non functional 
already.
 Key: MESOS-6393
 URL: https://issues.apache.org/jira/browse/MESOS-6393
 Project: Mesos
  Issue Type: Bug
Reporter: Till Toenshoff


{noformat}
$ SSL_ENABLED=true SSL_SUPPORT_DOWNGRADE=true 
SSL_KEY_FILE=~/Development/ssl/snakeoil.key 
SSL_CERT_FILE=~/Development/ssl/snakeoil.crt ./bin/mesos-master.sh 
--work_dir=/tmp/mesos
{noformat}

{noformat}
$ curl -k https://127.0.0.1:5050/master/state.json
curl: (35) Server aborted the SSL handshake
{noformat}

Only when using the new {{LIBPROCESS_SSL}} prefix are the variables actually 
respected.






[jira] [Comment Edited] (MESOS-6177) Return unregistered agents recovered from registrar in `GetAgents` and/or `/state.json`

2016-10-13 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572997#comment-15572997
 ] 

Zhitao Li edited comment on MESOS-6177 at 10/14/16 1:25 AM:


(edited after recalling that pid is not in SlaveInfo. We should think about 
adding {{Address}} to {{SlaveInfo}} if possible but that has to be a different 
ticket)

[~anandmazumdar], after some more thought, I'm inclined to return the full 
{{AgentInfo}} instead of only {{AgentID}} for agents in the {{recovered}} state.

This has the benefit of letting operators know the hostname of an agent that 
has not recovered yet, without calling the registry again.

-My primary intention is to have a hold of {{pid}}, so the operator/subscriber 
can know the ip:port the agent is listening at. If we only return {{AgentID}}, 
the operator can do little additional babysitting steps to validate the state 
of the agent, except for waiting for {{--agent_reregistration_timeout}} to 
pass.-

-This is also pretty easy to implement IIUIC: we can simply change the 
{{slaves.recovered}} from {{hashset}} to {{hashmap}}. The {{SlaveInfo}} is already available after Registrar recovers 
it.-



was (Author: zhitao):

(edited)

[~anandmazumdar], after some more thoughts, I'm inclined to return the full 
{{AgentInfo}} instead of only {{AgentID}} for agents in {{recovered}} state.

This has the benefit to help operators to know the hostname of the agent id 
which is not recovered yet without calling registry again.

-My primary intention is to have a hold of {{pid}}, so the operator/subscriber 
can know the ip:port the agent is listening at. If we only return {{AgentID}}, 
the operator can do little additional babysitting steps to validate the state 
of the agent, except for waiting for {{--agent_reregistration_timeout}} to 
pass.-

-This is also pretty easy to implement IIUIC: we can simply change the 
{{slaves.recovered}} from {{hashset}} to {{hashmap}}. The {{SlaveInfo}} is already available after Registrar recovers 
it.-


> Return unregistered agents recovered from registrar in `GetAgents` and/or 
> `/state.json`
> ---
>
> Key: MESOS-6177
> URL: https://issues.apache.org/jira/browse/MESOS-6177
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> Use case:
> This can be used for any software which talks to Mesos master to better 
> understand state of an unregistered agent after a master failover.
> If this information is available, the use case in MESOS-6174 can be handled 
> with a simpler decision of whether the corresponding agent is removed.





[jira] [Comment Edited] (MESOS-6177) Return unregistered agents recovered from registrar in `GetAgents` and/or `/state.json`

2016-10-13 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572997#comment-15572997
 ] 

Zhitao Li edited comment on MESOS-6177 at 10/14/16 1:24 AM:



(edited)

[~anandmazumdar], after some more thought, I'm inclined to return the full 
{{AgentInfo}} instead of only {{AgentID}} for agents in the {{recovered}} state.

This has the benefit of letting operators know the hostname of an agent that 
has not recovered yet, without calling the registry again.

-My primary intention is to have a hold of {{pid}}, so the operator/subscriber 
can know the ip:port the agent is listening at. If we only return {{AgentID}}, 
the operator can do little additional babysitting steps to validate the state 
of the agent, except for waiting for {{--agent_reregistration_timeout}} to 
pass.-

-This is also pretty easy to implement IIUIC: we can simply change the 
{{slaves.recovered}} from {{hashset}} to {{hashmap}}. The {{SlaveInfo}} is already available after Registrar recovers 
it.-



was (Author: zhitao):
[~anandmazumdar], after some more thoughts, I'm inclined to return the full 
{{AgentInfo}} instead of only {{AgentID}} for agents in {{recovered}} state.

My primary intention is to have a hold of {{pid}}, so the operator/subscriber 
can know the ip:port the agent is listening at. If we only return {{AgentID}}, 
the operator can do little additional babysitting steps to validate the state 
of the agent, except for waiting for {{--agent_reregistration_timeout}} to pass.

This is also pretty easy to implement IIUIC: we can simply change the 
{{slaves.recovered}} from {{hashset}} to {{hashmap}}. The {{SlaveInfo}} is already available after Registrar recovers 
it.






[jira] [Created] (MESOS-6392) Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`

2016-10-13 Thread Alex Clemmer (JIRA)
Alex Clemmer created MESOS-6392:
---

 Summary: Remove `TEST*_TEMP_DISABLED_ON_WINDOWS`
 Key: MESOS-6392
 URL: https://issues.apache.org/jira/browse/MESOS-6392
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Alex Clemmer
Assignee: Alex Clemmer


As a temporary measure, we introduced the family of macros 
`TEST*_TEMP_DISABLED_ON_WINDOWS`. This creates a `DISABLED_` test on Windows, 
but enables it on every other platform.

Eventually, permanently-disabled tests should be `#ifdef`'d out and these 
macros should be removed.
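The macro family leans on googletest's convention that a test whose name starts with {{DISABLED_}} compiles but is skipped at runtime. A sketch of how such a macro typically works (illustrative; not the exact stout definition):

```cpp
#include <string>

// On Windows, prepend googletest's DISABLED_ prefix so the test
// still compiles but is skipped; elsewhere, leave the name alone.
// (Illustrative sketch, not the actual stout macro.)
#ifdef _WIN32
#define TEMP_DISABLED_ON_WINDOWS(name) DISABLED_##name
#else
#define TEMP_DISABLED_ON_WINDOWS(name) name
#endif

// Two-level stringification so the macro argument is expanded
// before being turned into a string literal.
#define STRINGIFY2(x) #x
#define STRINGIFY(x) STRINGIFY2(x)

// "MyTest" on non-Windows platforms, "DISABLED_MyTest" on Windows.
const std::string kTestName = STRINGIFY(TEMP_DISABLED_ON_WINDOWS(MyTest));
```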





[jira] [Commented] (MESOS-6032) Add infrastructure for unit tests in the new python-based CLI.

2016-10-13 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573682#comment-15573682
 ] 

Joseph Wu commented on MESOS-6032:
--

Moved target to 1.2.0.

> Add infrastructure for unit tests in the new python-based CLI.
> --
>
> Key: MESOS-6032
> URL: https://issues.apache.org/jira/browse/MESOS-6032
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>






[jira] [Updated] (MESOS-6032) Add infrastructure for unit tests in the new python-based CLI.

2016-10-13 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6032:
-
Target Version/s: 1.2.0  (was: 1.1.0)







[jira] [Updated] (MESOS-6282) CNI isolator should print plugin's stderr

2016-10-13 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6282:
-
Target Version/s: 1.2.0  (was: 1.1.0)

> CNI isolator should print plugin's stderr
> -
>
> Key: MESOS-6282
> URL: https://issues.apache.org/jira/browse/MESOS-6282
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, isolation, network
>Reporter: Dan Osborne
>Assignee: Avinash Sridharan
>
> It's quite difficult for both operators and CNI plugin developers to diagnose 
> CNI plugin errors in production or in test when the only error information 
> available is the stdout error string returned by the plugin (assuming it even 
> managed to print correctly formatted text to stdout).
> Many CNI plugins print logging information to stderr, [as per the CNI 
> spec|https://github.com/containernetworking/cni/blob/master/SPEC.md#result]:
> bq. In addition, stderr can be used for unstructured output such as logs.
> Therefore, I propose the Mesos CNI Isolator capture the CNI plugin's stderr 
> output and log it to the Agent Logs, for easier diagnosis.
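The proposal can be sketched as follows: run the plugin with its stderr attached to a pipe, drain the pipe, and forward the text to the agent log. The helper below is a hypothetical simplification (the real isolator would likely use libprocess subprocess facilities); the plugin's stdout, which carries the structured CNI result, is left untouched here:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <string>

// Hypothetical helper: execute `path` with `argv`, capturing only
// the child's stderr so it can be forwarded to the agent logs.
std::string captureStderr(const char* path, char* const argv[]) {
  int fds[2];
  if (pipe(fds) != 0) {
    return "";
  }
  pid_t pid = fork();
  if (pid == 0) {
    close(fds[0]);
    dup2(fds[1], STDERR_FILENO);  // Redirect the child's stderr into the pipe.
    close(fds[1]);
    execv(path, argv);
    _exit(127);  // Only reached if execv fails.
  }
  close(fds[1]);
  std::string out;
  char buf[256];
  ssize_t n;
  while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
    out.append(buf, n);
  }
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return out;
}
```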





[jira] [Commented] (MESOS-6173) Authentication in v2 protobuf should not be `required`.

2016-10-13 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573550#comment-15573550
 ] 

Gilbert Song commented on MESOS-6173:
-

Temporarily target it for 1.2.0.

[~jieyu], please re-target it if we should land it for 1.1.






[jira] [Updated] (MESOS-6390) Ensure Python support scripts are linted

2016-10-13 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6390:
-
Shepherd: Joseph Wu

> Ensure Python support scripts are linted
> 
>
> Key: MESOS-6390
> URL: https://issues.apache.org/jira/browse/MESOS-6390
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>  Labels: newbie, python
>
> Currently {{support/mesos-style.py}} does not lint files under {{support/}}. 
> This is mostly because these scripts are so inconsistent style-wise that they 
> wouldn't even pass the linter now.
> We should clean up all Python scripts under {{support/}} so they pass the 
> Python linter, and activate that directory in the linter for future 
> additions. 





[jira] [Commented] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-10-13 Thread Aniket Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573273#comment-15573273
 ] 

Aniket Bhat commented on MESOS-6010:


[~gilbert] this looks similar to the issue I am seeing with MESOS-6378.



> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when {{./src/mesos-execute}}, the errors like below:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull;
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
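The 401 above is the Docker registry v2 token-auth handshake: the registry answers the first request with a {{Www-Authenticate: Bearer}} challenge, and the puller is expected to fetch a token from the advertised realm and retry. A minimal Python sketch of parsing that challenge and building the token URL (function names are illustrative, not Mesos's actual puller code):

```python
import re
from urllib.parse import urlencode

def parse_challenge(header):
    """Split a 'Bearer realm="...",service="...",scope="..."' challenge
    into a dict of its quoted key/value parameters."""
    scheme, _, params = header.partition(" ")
    if scheme != "Bearer":
        raise ValueError("unsupported auth scheme: " + scheme)
    return dict(re.findall(r'(\w+)="([^"]*)"', params))

def token_url(challenge):
    """Build the token-endpoint URL a puller would GET before retrying
    the manifest request with the returned bearer token."""
    query = urlencode({"service": challenge["service"],
                       "scope": challenge["scope"]})
    return challenge["realm"] + "?" + query
```

Fed the challenge from the log above, this yields the auth.docker.io token endpoint with the registry service and repository scope as query parameters.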





[jira] [Created] (MESOS-6390) Ensure Python support scripts are linted

2016-10-13 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-6390:
---

 Summary: Ensure Python support scripts are linted
 Key: MESOS-6390
 URL: https://issues.apache.org/jira/browse/MESOS-6390
 Project: Mesos
  Issue Type: Improvement
Reporter: Benjamin Bannier


Currently {{support/mesos-style.py}} does not lint files under {{support/}}. 
This is mostly due to the fact that these scripts are so inconsistent 
style-wise that they wouldn't even pass the linter now.

We should clean up all Python scripts under {{support/}} so they pass the 
Python linter, and activate that directory in the linter for future additions. 
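As a sketch of what "activating that directory in the linter" could look like, the snippet below collects Python sources from an explicit list of linted directories (the list and helper names are illustrative; {{mesos-style.py}}'s real configuration may differ):

```python
import os

# Directories whose Python files the style checker walks.  Adding
# "support" to such a list is the one-line "activation" the ticket asks
# for (names here are illustrative, not mesos-style.py's actual config).
LINTED_DIRS = ["src", "support"]

def python_sources(root, dirs=tuple(LINTED_DIRS)):
    """Collect every .py file under the activated directories."""
    sources = []
    for d in dirs:
        for base, _, files in os.walk(os.path.join(root, d)):
            sources.extend(os.path.join(base, f)
                           for f in files if f.endswith(".py"))
    return sorted(sources)
```

The collected paths would then be handed to the existing lint pass unchanged.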





[jira] [Updated] (MESOS-6387) Improve reporting of parallel test runner.

2016-10-13 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6387:

Description: 
We should improve the logging of the parallel test runner. Improved logging 
seems to require some parsing of GoogleTest output, e.g., to prevent 
interleaving output from concurrently executing tests.

We should add a verbose mode which prints results as they arrive from shards, 
so that users are not left wondering whether tests are executing.

We should also provide a way to properly unify output from different shards, 
e.g., report passed and failed tests in a single list instead of listing all 
shards separately (as it stands, failures in shards reported first are harder 
to discover).

Distinguishing reports from tests run in parallel and sequentially might be 
useful as well.

  was:
We should improve the logging of the parallel test runner. Improved logging 
seems to require some parsing of GoogleTest output, e.g., to prevent 
interleaving test output from concurrently executing tests.

We should add a verbose mode which can print results as they arrive from shards 
in order to not let users wonder if tests are executing.

We should also provide a way to properly unify output from different shards, 
e.g., report passed and failed tests in a single list instead of listing all 
shards separately (this e.g., makes tests from failed shards reported first 
harder to discover).


> Improve reporting of parallel test runner.
> --
>
> Key: MESOS-6387
> URL: https://issues.apache.org/jira/browse/MESOS-6387
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie, python
>
> We should improve the logging of the parallel test runner. Improved logging 
> seems to require some parsing of GoogleTest output, e.g., to prevent 
> interleaving output from concurrently executing tests.
> We should add a verbose mode which prints results as they arrive from 
> shards, so that users are not left wondering whether tests are executing.
> We should also provide a way to properly unify output from different shards, 
> e.g., report passed and failed tests in a single list instead of listing all 
> shards separately (as it stands, failures in shards reported first are 
> harder to discover).
> Distinguishing reports from tests run in parallel and sequentially might be 
> useful as well.





[jira] [Created] (MESOS-6389) Update webui for PARTITION_AWARE changes

2016-10-13 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6389:
--

 Summary: Update webui for PARTITION_AWARE changes
 Key: MESOS-6389
 URL: https://issues.apache.org/jira/browse/MESOS-6389
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: Neil Conway
Assignee: Neil Conway








[jira] [Created] (MESOS-6388) Report new PARTITION_AWARE task statuses in HTTP endpoints

2016-10-13 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6388:
--

 Summary: Report new PARTITION_AWARE task statuses in HTTP endpoints
 Key: MESOS-6388
 URL: https://issues.apache.org/jira/browse/MESOS-6388
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Neil Conway
Assignee: Neil Conway


At a minimum, the {{/state-summary}} endpoint needs to be updated.





[jira] [Created] (MESOS-6387) Improve reporting of parallel test runner

2016-10-13 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-6387:
---

 Summary: Improve reporting of parallel test runner
 Key: MESOS-6387
 URL: https://issues.apache.org/jira/browse/MESOS-6387
 Project: Mesos
  Issue Type: Improvement
  Components: test
Reporter: Benjamin Bannier


We should improve the logging of the parallel test runner. Improved logging 
seems to require some parsing of GoogleTest output, e.g., to prevent 
interleaving output from concurrently executing tests.

We should add a verbose mode which prints results as they arrive from shards, 
so that users are not left wondering whether tests are executing.

We should also provide a way to properly unify output from different shards, 
e.g., report passed and failed tests in a single list instead of listing all 
shards separately (as it stands, failures in shards reported first are harder 
to discover).
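The "unify output from different shards" idea can be sketched as a small merge step over per-shard results (the data shape below is hypothetical, not the runner's actual one):

```python
def merge_shards(shard_results):
    """Fold per-shard GoogleTest results into single passed/failed
    lists, so failures are not buried in whichever shard reported
    first.  Each shard is a list of (test_name, passed) pairs."""
    passed, failed = [], []
    for shard in shard_results:
        for name, ok in shard:
            (passed if ok else failed).append(name)
    return sorted(passed), sorted(failed)
```

The runner would print the merged failed list once, after all shards finish, instead of one report per shard.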





[jira] [Updated] (MESOS-6387) Improve reporting of parallel test runner.

2016-10-13 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6387:
---
Summary: Improve reporting of parallel test runner.  (was: Improve 
reporting of parallel test runner)

> Improve reporting of parallel test runner.
> --
>
> Key: MESOS-6387
> URL: https://issues.apache.org/jira/browse/MESOS-6387
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie, python
>
> We should improve the logging of the parallel test runner. Improved logging 
> seems to require some parsing of GoogleTest output, e.g., to prevent 
> interleaving output from concurrently executing tests.
> We should add a verbose mode which prints results as they arrive from 
> shards, so that users are not left wondering whether tests are executing.
> We should also provide a way to properly unify output from different shards, 
> e.g., report passed and failed tests in a single list instead of listing all 
> shards separately (as it stands, failures in shards reported first are 
> harder to discover).





[jira] [Updated] (MESOS-6387) Improve reporting of parallel test runner.

2016-10-13 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6387:
---
Labels: mesosphere newbie python  (was: newbie python)

> Improve reporting of parallel test runner.
> --
>
> Key: MESOS-6387
> URL: https://issues.apache.org/jira/browse/MESOS-6387
> Project: Mesos
>  Issue Type: Improvement
>  Components: test
>Reporter: Benjamin Bannier
>  Labels: mesosphere, newbie, python
>
> We should improve the logging of the parallel test runner. Improved logging 
> seems to require some parsing of GoogleTest output, e.g., to prevent 
> interleaving output from concurrently executing tests.
> We should add a verbose mode which prints results as they arrive from 
> shards, so that users are not left wondering whether tests are executing.
> We should also provide a way to properly unify output from different shards, 
> e.g., report passed and failed tests in a single list instead of listing all 
> shards separately (as it stands, failures in shards reported first are 
> harder to discover).





[jira] [Commented] (MESOS-6177) Return unregistered agents recovered from registrar in `GetAgents` and/or `/state.json`

2016-10-13 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572997#comment-15572997
 ] 

Zhitao Li commented on MESOS-6177:
--

[~anandmazumdar], I'm strongly inclined to return the full {{AgentInfo}} 
instead of only {{AgentID}} for agents in {{recovered}} state.

My primary intention is to get hold of the {{pid}}, so the operator/subscriber 
can know the ip:port the agent is listening on. If we only return {{AgentID}}, 
the operator can do little to validate the state of the agent beyond waiting 
for {{--agent_reregistration_timeout}} to pass.

This is also pretty easy to implement IIUC: we can simply change 
{{slaves.recovered}} from {{hashset}} to {{hashmap}}. The {{SlaveInfo}} is 
already available after the Registrar recovers it.

> Return unregistered agents recovered from registrar in `GetAgents` and/or 
> `/state.json`
> ---
>
> Key: MESOS-6177
> URL: https://issues.apache.org/jira/browse/MESOS-6177
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> Use case:
> This can be used for any software which talks to Mesos master to better 
> understand state of an unregistered agent after a master failover.
> If this information is available, the use case in MESOS-6174 can be handled 
> with a simpler decision of whether the corresponding agent is removed.





[jira] [Created] (MESOS-6386) "Reached unreachable statement" in LinuxCapabilitiesIsolatorTest

2016-10-13 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6386:
--

 Summary: "Reached unreachable statement" in 
LinuxCapabilitiesIsolatorTest
 Key: MESOS-6386
 URL: https://issues.apache.org/jira/browse/MESOS-6386
 Project: Mesos
  Issue Type: Bug
  Components: containerization
 Environment: CentOS Linux release 7.2.1511 (Core), amd64
Reporter: Neil Conway
Priority: Minor


{noformat}
[ RUN  ] TestParam/LinuxCapabilitiesIsolatorTest.ROOT_Ping/2
Failed to execute command: Permission denied
Reached unreachable statement at 
../../mesos/src/slave/containerizer/mesos/launch.cpp:710
[   OK ] TestParam/LinuxCapabilitiesIsolatorTest.ROOT_Ping/2 (366 ms)
{noformat}

Observed running the tests as root on CentOS 7.2. Verbose test output attached.





[jira] [Updated] (MESOS-6386) "Reached unreachable statement" in LinuxCapabilitiesIsolatorTest

2016-10-13 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-6386:
---
Attachment: verbose-test-output.txt

> "Reached unreachable statement" in LinuxCapabilitiesIsolatorTest
> 
>
> Key: MESOS-6386
> URL: https://issues.apache.org/jira/browse/MESOS-6386
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: CentOS Linux release 7.2.1511 (Core), amd64
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere
> Attachments: verbose-test-output.txt
>
>
> {noformat}
> [ RUN  ] TestParam/LinuxCapabilitiesIsolatorTest.ROOT_Ping/2
> Failed to execute command: Permission denied
> Reached unreachable statement at 
> ../../mesos/src/slave/containerizer/mesos/launch.cpp:710
> [   OK ] TestParam/LinuxCapabilitiesIsolatorTest.ROOT_Ping/2 (366 ms)
> {noformat}
> Observed running the tests as root on CentOS 7.2. Verbose test output 
> attached.





[jira] [Commented] (MESOS-6386) "Reached unreachable statement" in LinuxCapabilitiesIsolatorTest

2016-10-13 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572734#comment-15572734
 ] 

Neil Conway commented on MESOS-6386:


The fact that we're reaching an "unreachable" statement seems, uh, unexpected. 
[~bbannier] [~jieyu]

> "Reached unreachable statement" in LinuxCapabilitiesIsolatorTest
> 
>
> Key: MESOS-6386
> URL: https://issues.apache.org/jira/browse/MESOS-6386
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: CentOS Linux release 7.2.1511 (Core), amd64
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere
> Attachments: verbose-test-output.txt
>
>
> {noformat}
> [ RUN  ] TestParam/LinuxCapabilitiesIsolatorTest.ROOT_Ping/2
> Failed to execute command: Permission denied
> Reached unreachable statement at 
> ../../mesos/src/slave/containerizer/mesos/launch.cpp:710
> [   OK ] TestParam/LinuxCapabilitiesIsolatorTest.ROOT_Ping/2 (366 ms)
> {noformat}
> Observed running the tests as root on CentOS 7.2. Verbose test output 
> attached.





[jira] [Updated] (MESOS-6385) Document how containerization works in terms of entering namespaces / setting up cgroups, etc.

2016-10-13 Thread Kevin Klues (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6385:
---
Component/s: documentation

> Document how containerization works in terms of entering namespaces / 
> setting up cgroups, etc.
> ---
>
> Key: MESOS-6385
> URL: https://issues.apache.org/jira/browse/MESOS-6385
> Project: Mesos
>  Issue Type: Task
>  Components: containerization, documentation
>Reporter: Kevin Klues
>Priority: Minor
>
> There is currently a lot of tribal knowledge about what it actually means to 
> set up a container and launch a process inside of it.  It would be nice to see 
> some documentation produced which outlines the exact process of launching a 
> new container, as well as the process involved in executing a new task inside 
> that container (or as a nested container, which shares some portion of the 
> container, but not all of it).





[jira] [Created] (MESOS-6384) Support for size with create-on-demand volumes with Docker containerizer

2016-10-13 Thread David Bosschaert (JIRA)
David Bosschaert created MESOS-6384:
---

 Summary: Support for size with create-on-demand volumes with 
Docker containerizer
 Key: MESOS-6384
 URL: https://issues.apache.org/jira/browse/MESOS-6384
 Project: Mesos
  Issue Type: Improvement
  Components: docker
Reporter: David Bosschaert


This issue was originally filed as Marathon issue 
https://github.com/mesosphere/marathon/issues/4583

When specifying an external volume to be used with a Docker container, the size 
of the external volume cannot be specified. External volumes are often created 
on demand, when first used, but if no size can be specified, all volumes will 
have the same default size.
In the case of DVDI with REX-Ray, the default size is 16GB. This default can be 
modified, but it is clearly desirable to be able to specify per volume what the 
size should be.

When Mesos creates the external volume on demand, which I presume normally 
happens before the Docker container is launched, it should allow the size of 
this volume to be specified.





[jira] [Updated] (MESOS-6335) Add user doc for task group tasks

2016-10-13 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6335:
--
Shepherd: Benjamin Mahler  (was: Vinod Kone)
Assignee: Vinod Kone

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>






[jira] [Updated] (MESOS-6335) Add user doc for task group tasks

2016-10-13 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6335:
--
Sprint: Mesosphere Sprint 44

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>






[jira] [Created] (MESOS-6383) NvidiaGpuAllocator::resources cannot load symbol nvmlGetDeviceMinorNumber - can the device minor number be ascertained reliably using an older set of API calls?

2016-10-13 Thread Dylan Bethune-Waddell (JIRA)
Dylan Bethune-Waddell created MESOS-6383:


 Summary: NvidiaGpuAllocator::resources cannot load symbol 
nvmlGetDeviceMinorNumber - can the device minor number be ascertained reliably 
using an older set of API calls?
 Key: MESOS-6383
 URL: https://issues.apache.org/jira/browse/MESOS-6383
 Project: Mesos
  Issue Type: Improvement
Affects Versions: 1.0.1
Reporter: Dylan Bethune-Waddell
Priority: Minor


We're attempting to deploy Mesos on a cluster with 2 Nvidia GPUs per host. We 
are not in a position to upgrade the Nvidia drivers in the near future, and are 
currently at driver version 319.72.

When attempting to launch an agent with the following command and take 
advantage of Nvidia GPU support (master address elided):

bq. {{./bin/mesos-agent.sh --master=: 
--work_dir=/tmp/mesos --isolation="cgroups/devices,gpu/nvidia"}}

I receive the following error message:

bq. {{Failed to create a containerizer: Failed call to 
NvidiaGpuAllocator::resources: Failed to nvml::initialize: Failed to load 
symbol 'nvmlDeviceGetMinorNumber': Error looking up symbol 
'nvmlDeviceGetMinorNumber' in 'libnvidia-ml.so.1' : 
/usr/lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetMinorNumber}}

Based on the change log for the NVML module, it seems that 
{{nvmlDeviceGetMinorNumber}} is only available for driver versions 331 and 
later as per info under the [Changes between NVML v5.319 Update and 
v331|http://docs.nvidia.com/deploy/nvml-api/change-log.html#change-log] heading 
in the NVML API reference.

Is there an alternate method of obtaining this information at runtime to 
enable support for older versions of the Nvidia driver? A modest search has not 
yet yielded much insight into a path forward.
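The failure is a plain dlsym lookup miss, so one possible mitigation is to probe for the newer symbol and fall back to an older call only when it is absent. A Python/ctypes sketch of the probing half (demonstrated against the process's own libc, since libnvidia-ml is not assumed to be present; the candidate list beyond nvmlDeviceGetMinorNumber is a placeholder, not a known NVML entry point):

```python
import ctypes

def has_symbol(lib, name):
    """True if the dynamic loader can resolve `name` in the loaded
    library; ctypes raises AttributeError on a miss, mirroring the
    'undefined symbol' error above."""
    try:
        getattr(lib, name)
        return True
    except AttributeError:
        return False

def pick_minor_number_call(lib, candidates=("nvmlDeviceGetMinorNumber",)):
    """Return the first resolvable candidate symbol, or None.  A real
    fallback would append whatever pre-331 call can supply the minor
    number to `candidates`."""
    for name in candidates:
        if has_symbol(lib, name):
            return name
    return None
```

On a host with a pre-331 driver, `pick_minor_number_call` would return None for the newer symbol, signaling the allocator to take the fallback path instead of failing outright.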





[jira] [Updated] (MESOS-6035) Add non-recursive version of cgroups::get

2016-10-13 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6035:

Target Version/s: 1.2.0  (was: 1.1.0)

> Add non-recursive version of cgroups::get
> -
>
> Key: MESOS-6035
> URL: https://issues.apache.org/jira/browse/MESOS-6035
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> In some cases, we only need to get the top level cgroups instead of to get 
> all cgroups recursively. Add a non-recursive version could help to avoid 
> unnecessary paths.
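A sketch of the proposed non-recursive variant, modeled on filesystem listing (a Python stand-in for the C++ {{cgroups::get}}; the real API differs):

```python
import os

def get_cgroups(hierarchy, recursive=True):
    """Return cgroup paths under `hierarchy`, relative to it.  With
    recursive=False only the top-level cgroups are listed, avoiding
    the walk over every nested path."""
    if not recursive:
        return sorted(d for d in os.listdir(hierarchy)
                      if os.path.isdir(os.path.join(hierarchy, d)))
    result = []
    for base, dirs, _ in os.walk(hierarchy):
        result.extend(os.path.relpath(os.path.join(base, d), hierarchy)
                      for d in dirs)
    return sorted(result)
```

Callers that only care about top-level cgroups would pass recursive=False and skip the full traversal.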





[jira] [Updated] (MESOS-3335) FlagsBase copy-ctor leads to dangling pointer.

2016-10-13 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3335:
--
Story Points: 8

> FlagsBase copy-ctor leads to dangling pointer.
> --
>
> Key: MESOS-3335
> URL: https://issues.apache.org/jira/browse/MESOS-3335
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Benjamin Bannier
>  Labels: mesosphere
> Attachments: lambda_capture_bug.cpp
>
>
> Per [#3328], ubsan detects the following problem:
> [ RUN ] FaultToleranceTest.ReregisterCompletedFrameworks
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp:303:25:
>  runtime error: load of value 33, which is not a valid value for type 'bool'
> I believe what is going on here is the following:
> * The test calls StartMaster(), which does MesosTest::CreateMasterFlags()
> * MesosTest::CreateMasterFlags() allocates a new master::Flags on the stack, 
> which is subsequently copy-constructed back to StartMaster()
> * The FlagsBase constructor is:
> bq. {{FlagsBase() { add(&help, "help", "...", false); }}}
> where "help" is a member variable -- i.e., it is allocated on the stack in 
> this case.
> * {{FlagsBase()::add}} captures {{&help}}, e.g.:
> {noformat}
> flag.stringify = [t1](const FlagsBase&) -> Option {
> return stringify(*t1);
>   };}}
> {noformat}
> * The implicit copy constructor for FlagsBase is just going to copy the 
> lambda above, i.e., the result of the copy constructor will have a lambda 
> that points into MesosTest::CreateMasterFlags()'s stack frame, which is bad 
> news.
> Not sure the right fix -- comments welcome. You could define a copy-ctor for 
> FlagsBase that does something gross (basically remove the old help flag and 
> define a new one that points into the target of the copy), but that seems, 
> well, gross.
> Probably not a pressing problem to fix -- AFAICS the worst symptom is that we end 
> up reading one byte from some random stack location when serving 
> {{state.json}}, for example.





[jira] [Updated] (MESOS-6293) HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.

2016-10-13 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6293:
---
Shepherd: Alexander Rukletsov

> HealthCheckTest.HealthyTaskViaHTTPWithoutType fails on some distros.
> 
>
> Key: MESOS-6293
> URL: https://issues.apache.org/jira/browse/MESOS-6293
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: health-check, mesosphere
>
> I see consistent failures of this test in the internal CI in *some* distros, 
> specifically CentOS 6, Ubuntu 14, 15, 16. The source of the health check 
> failure is always the same: {{curl}} cannot connect to the target:
> {noformat}
> Received task health update, healthy: false
> W0929 17:22:05.270992  2730 health_checker.cpp:204] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) couldn't connect to host
> I0929 17:22:05.273634 26850 slave.cpp:3609] Handling status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from executor(1)@172.30.2.20:58660
> I0929 17:22:05.274178 26844 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274226 26844 status_update_manager.cpp:377] Forwarding update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to the agent
> I0929 17:22:05.274314 26845 slave.cpp:4026] Forwarding the update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to master@172.30.2.20:38955
> I0929 17:22:05.274415 26845 slave.cpp:3920] Status update manager 
> successfully handled status update TASK_RUNNING (UUID: 
> f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> I0929 17:22:05.274436 26845 slave.cpp:3936] Sending acknowledgement for 
> status update TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for 
> task aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of 
> framework 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- to 
> executor(1)@172.30.2.20:58660
> I0929 17:22:05.274534 26849 master.cpp:5661] Status update TASK_RUNNING 
> (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- from agent 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-S0 at slave(77)@172.30.2.20:38955 
> (ip-172-30-2-20.mesosphere.io)
> ../../src/tests/health_check_tests.cpp:1398: Failure
> I0929 17:22:05.274567 26849 master.cpp:5723] Forwarding status update 
> TASK_RUNNING (UUID: f5408ac9-f6ba-447f-b3d7-9dce44384ffe) for task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda in health state unhealthy of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f-
> Value of: statusHealth.get().healthy()
>   Actual: false
>   Expected: true
> I0929 17:22:05.274636 26849 master.cpp:7560] Updating the state of task 
> aa0792d3-8d85-4c32-bd04-56a9b552ebda of framework 
> 2e0e9ea1-0ae5-4f28-80bb-a9abc56c5a6f- (latest state: TASK_RUNNING, status 
> update state: TASK_RUNNING)
> I0929 17:22:05.274829 26844 sched.cpp:1025] Scheduler::statusUpdate took 
> 43297ns
> Received SHUTDOWN event
> {noformat}





[jira] [Created] (MESOS-6382) Add option to enable parallel test runner for cmake builds

2016-10-13 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-6382:
---

 Summary: Add option to enable parallel test runner for cmake builds
 Key: MESOS-6382
 URL: https://issues.apache.org/jira/browse/MESOS-6382
 Project: Mesos
  Issue Type: Bug
  Components: cmake
Reporter: Benjamin Bannier


We should add a configuration option to the CMake setup that enables the 
parallel test runner already available in the autotools setup.





[jira] [Created] (MESOS-6381) Support for Docker's --live-restore setting

2016-10-13 Thread Alex Withrow (JIRA)
Alex Withrow created MESOS-6381:
---

 Summary: Support for Docker's --live-restore setting
 Key: MESOS-6381
 URL: https://issues.apache.org/jira/browse/MESOS-6381
 Project: Mesos
  Issue Type: Wish
  Components: docker
Reporter: Alex Withrow


Docker 1.12.0 and later have a --live-restore option which allows containers to 
continue running when the Docker daemon itself goes down (such as during 
upgrades). It would be great if Mesos supported this feature when running tasks 
in Docker. Currently, if the Docker daemon goes down, all tasks running in 
Docker containers are reported FAILED and are rescheduled.

Perhaps the addition of a timeout setting before ending the tasks would allow 
this to work? Having this would allow for upgrades of the Docker daemon without 
having to reschedule the current tasks, similar to how Mesos agent upgrades 
work.
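For reference, Docker's daemon-side switch for this behavior is the documented {{live-restore}} key in {{/etc/docker/daemon.json}} (Docker 1.12+); the open question in this ticket is whether the Mesos agent can tolerate the daemon restart on top of it:

```json
{
  "live-restore": true
}
```

With this set, restarting dockerd leaves containers running; what is asked for here is that Mesos not mark those tasks FAILED while the daemon is briefly away.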





[jira] [Updated] (MESOS-6301) Recursive destroy in MesosContainerizer is problematic.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6301:
--
Story Points: 3

> Recursive destroy in MesosContainerizer is problematic.
> ---
>
> Key: MESOS-6301
> URL: https://issues.apache.org/jira/browse/MESOS-6301
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> When doing recursive destroy, we should not simply return the collected
> future of the nested container destroys. Instead, we should fail the
> corresponding termination and return that termination if any nested
> container destroy failed.
> 
> In addition, we cannot remove the 'Container' struct from the internal map
> when the destroy of a nested container fails. This is to ensure that
> the top level container does not proceed with destroy if any of its
> nested containers failed to destroy.





[jira] [Updated] (MESOS-6323) 'mesos-containerizer launch' should inherit agent environment variables.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6323:
--
Story Points: 5

> 'mesos-containerizer launch' should inherit agent environment variables.
> 
>
> Key: MESOS-6323
> URL: https://issues.apache.org/jira/browse/MESOS-6323
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> If some dynamic libraries that the agent depends on are stored in a 
> non-standard location and the operator starts the agent using 
> LD_LIBRARY_PATH, then when we fork/exec the 'mesos-containerizer launch' 
> helper we need to make sure it inherits the agent's environment variables. 
> Otherwise, it might throw linking errors. This makes sense because it's a 
> Mesos-controlled process.
> However, when the helper actually fork/execs the user container (or 
> executor), we need to make sure to strip the agent environment variables.
> The tricky case is the default executor and command executor. These two are 
> controlled by Mesos as well, so we also want them to have the agent 
> environment variables. We need to somehow distinguish this from the custom 
> executor case.





[jira] [Updated] (MESOS-6300) A destroyed nested container is not reflected in the parent container's children map.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6300:
--
Story Points: 2

> A destroyed nested container is not reflected in the parent container's 
> children map.
> -
>
> Key: MESOS-6300
> URL: https://issues.apache.org/jira/browse/MESOS-6300
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> We should update the parent container's children map when one of its
> nested containers is terminated.
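A toy model of the bookkeeping in question: a registry of containers keyed by id, where terminating a nested container must also update its parent's children map. All names here are illustrative, not the actual Mesos data structures.

```python
# Registry of containers; nested ids are written "parent.child" here
# purely for illustration.
containers = {
    "parent": {"children": {"parent.child1", "parent.child2"}},
    "parent.child1": {"children": set()},
    "parent.child2": {"children": set()},
}

def on_terminated(container_id: str) -> None:
    """Remove a terminated nested container AND update the parent's
    children map -- forgetting the second step is the bug described."""
    containers.pop(container_id, None)
    parent_id, sep, _ = container_id.rpartition(".")
    if sep and parent_id in containers:
        containers[parent_id]["children"].discard(container_id)

on_terminated("parent.child1")
print(containers["parent"]["children"])  # {'parent.child2'}
```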



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6284) MesosContainerizer should skip non-nesting aware isolators for nested container.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6284:
--
Story Points: 3

> MesosContainerizer should skip non-nesting aware isolators for nested 
> container.
> 
>
> Key: MESOS-6284
> URL: https://issues.apache.org/jira/browse/MESOS-6284
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> If an isolator is not nesting aware, we should not invoke its methods for
> nested containers. This ensures that an old isolator does not encounter
> surprises.
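The guard described above amounts to checking a capability flag before dispatching to each isolator for a nested container. A minimal sketch, with an invented `supports_nesting` attribute and `prepare` method standing in for the real isolator interface:

```python
class Isolator:
    """Illustrative isolator with a nesting-capability flag."""
    def __init__(self, name: str, supports_nesting: bool):
        self.name = name
        self.supports_nesting = supports_nesting

    def prepare(self, container_id: str) -> str:
        return f"{self.name} prepared {container_id}"

def prepare_all(isolators, container_id, is_nested):
    results = []
    for isolator in isolators:
        if is_nested and not isolator.supports_nesting:
            # Old isolator: skip it, so it never sees nested ids.
            continue
        results.append(isolator.prepare(container_id))
    return results

isolators = [Isolator("cgroups", True), Isolator("legacy", False)]
print(prepare_all(isolators, "parent.child", is_nested=True))
# ['cgroups prepared parent.child'] -- the legacy isolator was skipped
```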



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6284) MesosContainerizer should skip non-nesting aware isolators for nested container.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6284:
--
Labels: mesosphere  (was: )

> MesosContainerizer should skip non-nesting aware isolators for nested 
> container.
> 
>
> Key: MESOS-6284
> URL: https://issues.apache.org/jira/browse/MESOS-6284
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> If an isolator is not nesting aware, we should not invoke its methods for
> nested containers. This ensures that an old isolator does not encounter
> surprises.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6301) Recursive destroy in MesosContainerizer is problematic.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6301:
--
Labels: mesosphere  (was: )

> Recursive destroy in MesosContainerizer is problematic.
> ---
>
> Key: MESOS-6301
> URL: https://issues.apache.org/jira/browse/MESOS-6301
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> When doing recursive destroy, we should not simply return the collected
> future of nested container destroys. Instead, we should fail the
> corresponding termination and return that termination if a nested
> container destroy failed.
> 
> In addition, we cannot remove the 'Container' struct from the internal map
> when the destroy of a nested container fails. This ensures that the
> top-level container does not proceed with destroy if any of its nested
> containers failed to destroy.
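The destroy rule above can be modeled in a few lines: destroy children first, and if any child destroy fails, keep the container's entry in the map and surface the failure instead of proceeding. This is purely illustrative (synchronous code standing in for the future-based C++), with an invented failure set to simulate one nested destroy failing.

```python
containers = {
    "top": {"children": ["top.a", "top.b"]},
    "top.a": {"children": []},
    "top.b": {"children": []},
}
destroy_should_fail = {"top.b"}  # simulate a failing nested destroy

def destroy(container_id):
    """Recursively destroy; on any child failure, keep this container's
    entry in the map and propagate the errors."""
    entry = containers[container_id]
    errors = [err for child in entry["children"] for err in destroy(child)]
    if container_id in destroy_should_fail:
        errors.append(f"failed to destroy {container_id}")
    if errors:
        return errors          # do NOT remove the 'Container' struct
    containers.pop(container_id)
    return []

print(destroy("top"))          # ['failed to destroy top.b']
print("top" in containers)     # True: parent did not proceed with destroy
```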



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6323) 'mesos-containerizer launch' should inherit agent environment variables.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6323:
--
Labels: mesosphere  (was: )

> 'mesos-containerizer launch' should inherit agent environment variables.
> 
>
> Key: MESOS-6323
> URL: https://issues.apache.org/jira/browse/MESOS-6323
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> If some dynamic libraries that the agent depends on are stored in a
> non-standard location, the operator may start the agent with
> LD_LIBRARY_PATH set. When we fork/exec the 'mesos-containerizer launch'
> helper, we need to make sure it inherits the agent's environment
> variables; otherwise, it might fail with linking errors. This makes sense
> because it is a Mesos-controlled process.
> However, when the helper itself forks/execs the user container (or
> executor), we need to make sure to strip the agent's environment
> variables.
> The tricky case is the default executor and the command executor: these
> two are controlled by Mesos as well, so we also want them to receive the
> agent's environment variables. We need to somehow distinguish them from
> the custom executor case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6300) A destroyed nested container is not reflected in the parent container's children map.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6300:
--
Labels: mesosphere  (was: )

> A destroyed nested container is not reflected in the parent container's 
> children map.
> -
>
> Key: MESOS-6300
> URL: https://issues.apache.org/jira/browse/MESOS-6300
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> We should update the parent container's children map when one of its
> nested containers is terminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6284) MesosContainerizer should skip non-nesting aware isolators for nested container.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6284:
--
Sprint: Mesosphere Sprint 44

> MesosContainerizer should skip non-nesting aware isolators for nested 
> container.
> 
>
> Key: MESOS-6284
> URL: https://issues.apache.org/jira/browse/MESOS-6284
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Blocker
> Fix For: 1.1.0
>
>
> If an isolator is not nesting aware, we should not invoke its methods for
> nested containers. This ensures that an old isolator does not encounter
> surprises.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6301) Recursive destroy in MesosContainerizer is problematic.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6301:
--
Sprint: Mesosphere Sprint 44

> Recursive destroy in MesosContainerizer is problematic.
> ---
>
> Key: MESOS-6301
> URL: https://issues.apache.org/jira/browse/MESOS-6301
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
> Fix For: 1.1.0
>
>
> When doing recursive destroy, we should not simply return the collected
> future of nested container destroys. Instead, we should fail the
> corresponding termination and return that termination if a nested
> container destroy failed.
> 
> In addition, we cannot remove the 'Container' struct from the internal map
> when the destroy of a nested container fails. This ensures that the
> top-level container does not proceed with destroy if any of its nested
> containers failed to destroy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6323) 'mesos-containerizer launch' should inherit agent environment variables.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6323:
--
Sprint: Mesosphere Sprint 44

> 'mesos-containerizer launch' should inherit agent environment variables.
> 
>
> Key: MESOS-6323
> URL: https://issues.apache.org/jira/browse/MESOS-6323
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
>Priority: Critical
> Fix For: 1.1.0
>
>
> If some dynamic libraries that the agent depends on are stored in a
> non-standard location, the operator may start the agent with
> LD_LIBRARY_PATH set. When we fork/exec the 'mesos-containerizer launch'
> helper, we need to make sure it inherits the agent's environment
> variables; otherwise, it might fail with linking errors. This makes sense
> because it is a Mesos-controlled process.
> However, when the helper itself forks/execs the user container (or
> executor), we need to make sure to strip the agent's environment
> variables.
> The tricky case is the default executor and the command executor: these
> two are controlled by Mesos as well, so we also want them to receive the
> agent's environment variables. We need to somehow distinguish them from
> the custom executor case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6300) A destroyed nested container is not reflected in the parent container's children map.

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6300:
--
Sprint: Mesosphere Sprint 44

> A destroyed nested container is not reflected in the parent container's 
> children map.
> -
>
> Key: MESOS-6300
> URL: https://issues.apache.org/jira/browse/MESOS-6300
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Jie Yu
> Fix For: 1.1.0
>
>
> We should update the parent container's children map when one of its
> nested containers is terminated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6302) Agent recovery can fail after nested containers are launched

2016-10-13 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6302:
--
Shepherd: Jie Yu

> Agent recovery can fail after nested containers are launched
> 
>
> Key: MESOS-6302
> URL: https://issues.apache.org/jira/browse/MESOS-6302
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Greg Mann
>Assignee: Gilbert Song
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.0
>
> Attachments: read_write_app.json
>
>
> After launching a nested container which used a Docker image, I restarted the 
> agent which ran that task group and saw the following in the agent logs 
> during recovery:
> {code}
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]: 
> I1001 01:45:10.813596  4640 status_update_manager.cpp:203] Recovering status 
> update manager
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]: 
> I1001 01:45:10.813622  4640 status_update_manager.cpp:211] Recovering 
> executor 'instance-testvolume.02c26bce-8778-11e6-9ff3-7a3cd7c1568e' of 
> framework 118ca38d-daee-4b2d-b584-b5581738a3dd-
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]: 
> I1001 01:45:10.814249  4639 docker.cpp:745] Recovering Docker containers
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]: 
> I1001 01:45:10.815294  4642 containerizer.cpp:581] Recovering containerizer
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]: 
> Failed to perform recovery: Collect failed: Unable to list rootfses belonged 
> to container a7d576da-fd0f-4dc1-bd5a-6d0a93ac8a53: Unable to list the 
> container directory: Failed to opendir 
> '/var/lib/mesos/slave/provisioner/containers/a7d576da-fd0f-4dc1-bd5a-6d0a93ac8a53/backends':
>  No such file or directory
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]: 
> To remedy this do as follows:
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]: 
> Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]:   
>   This ensures agent doesn't recover old live executors.
> Oct 01 01:45:10 ip-10-0-3-133.us-west-2.compute.internal mesos-agent[4629]: 
> Step 2: Restart the agent.
> {code}
> and the agent continues to restart in this fashion. Attached is the Marathon 
> app definition that I used to launch the task group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-10-13 Thread Lukas Loesche (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571916#comment-15571916
 ] 

Lukas Loesche edited comment on MESOS-6238 at 10/13/16 1:29 PM:


Can confirm compiling works now on Fedora 24.


was (Author: lloesche):
Can confirm it works now.

> SSL / libevent support broken in IPv6 patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6238
> URL: https://issues.apache.org/jira/browse/MESOS-6238
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>Assignee: Benno Evers
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 
> make fails when configure options --enable-ssl --enable-libevent were given.
> Error message:
> {noformat}
> ...
> ...
> ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void 
> process::SocketManager::link_connect(const process::Future&, 
> process::network::Socket, const process::UPID&)’:
> ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not 
> declared in this scope
>Try ip = url.ip;
>  ^
> Makefile:997: recipe for target 'libprocess_la-process.lo' failed
> make[5]: *** [libprocess_la-process.lo] Error 1
> ...
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6238) SSL / libevent support broken in IPv6 patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-10-13 Thread Lukas Loesche (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571916#comment-15571916
 ] 

Lukas Loesche commented on MESOS-6238:
--

Can confirm it works now.

> SSL / libevent support broken in IPv6 patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6238
> URL: https://issues.apache.org/jira/browse/MESOS-6238
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>Assignee: Benno Evers
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9 
> make fails when configure options --enable-ssl --enable-libevent were given.
> Error message:
> {noformat}
> ...
> ...
> ../../../3rdparty/libprocess/src/process.cpp: In member function ‘void 
> process::SocketManager::link_connect(const process::Future&, 
> process::network::Socket, const process::UPID&)’:
> ../../../3rdparty/libprocess/src/process.cpp:1457:25: error: ‘url’ was not 
> declared in this scope
>Try ip = url.ip;
>  ^
> Makefile:997: recipe for target 'libprocess_la-process.lo' failed
> make[5]: *** [libprocess_la-process.lo] Error 1
> ...
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6350) Raise minimum required cmake version

2016-10-13 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-6350:

Description: We currently require at least cmake-2.8, which had its first 
point release in 2010 and its last update in 2013. Meanwhile, upstream is 
preparing the release of 3.7.0. While cmake support in Mesos is still 
experimental, we should evaluate how far we can raise the minimum required 
version so we are not locked into an old version lacking desirable 
features.  (was: It seems cmake's {{BYPRODUCTS}} or {{BUILD_BYPRODUCTS}} 
clauses are the correct tool to properly model implicitly generated 3rdparty 
artifacts.

However these are only available in cmake-3.2 or 3.3, respectively. 
https://cmake.org/cmake/help/v3.3/policy/CMP0058.html

We should evaluate what is holding us back to upgrade 
{{cmake_minimum_required}} to at least 3.3. Meanwhile upstream is preparing the 
release of 3.7.0. )
Summary: Raise minimum required cmake version  (was: Raised minimum 
required cmake version)

> Raise minimum required cmake version
> 
>
> Key: MESOS-6350
> URL: https://issues.apache.org/jira/browse/MESOS-6350
> Project: Mesos
>  Issue Type: Improvement
>  Components: cmake
>Reporter: Benjamin Bannier
>  Labels: mesosphere, tech-debt
>
> We currently require at least cmake-2.8, which had its first point release 
> in 2010 and its last update in 2013. Meanwhile, upstream is preparing the 
> release of 3.7.0. While cmake support in Mesos is still experimental, we 
> should evaluate how far we can raise the minimum required version so we 
> are not locked into an old version lacking desirable features.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6237) Agent Sandbox inaccessible when using IPv6 address in patch from https://github.com/lava/mesos/tree/bennoe/ipv6

2016-10-13 Thread Lukas Loesche (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571714#comment-15571714
 ] 

Lukas Loesche commented on MESOS-6237:
--

Example:
{noformat}
./mesos-master --work_dir=/tmp --ip=2001:41d0:1000:ab9:: --no-hostname_lookup
sudo ./mesos-slave --work_dir=/tmp --ip=2001:41d0:1000:ab9:: 
--master=[2001:41d0:1000:ab9::]:5050 --no-hostname_lookup
{noformat}


> Agent Sandbox inaccessible when using IPv6 address in patch from 
> https://github.com/lava/mesos/tree/bennoe/ipv6
> ---
>
> Key: MESOS-6237
> URL: https://issues.apache.org/jira/browse/MESOS-6237
> Project: Mesos
>  Issue Type: Bug
>Reporter: Lukas Loesche
>Assignee: Benno Evers
>
> Affects https://github.com/lava/mesos/tree/bennoe/ipv6 at commit 
> 2199a24c0b7a782a0381aad8cceacbc95ec3d5c9
> When using IPs instead of hostnames, the Agent Sandbox is inaccessible in 
> the Web UI. The problem seems to be that there are no brackets around the 
> IP, so it tries to access e.g. http://2001:41d0:1000:ab9:::5051 instead of 
> http://[2001:41d0:1000:ab9::]:5051
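The fix the ticket implies is to bracket IPv6 literals when building a host:port URL, as RFC 3986 requires. A small sketch (the helper name `endpoint` is invented for illustration):

```python
import ipaddress

def endpoint(host: str, port: int) -> str:
    """Build an http URL, bracketing the host if it is an IPv6 literal."""
    try:
        if ipaddress.ip_address(host).version == 6:
            return f"http://[{host}]:{port}"
    except ValueError:
        pass  # not an IP literal -- treat as a hostname
    return f"http://{host}:{port}"

print(endpoint("2001:41d0:1000:ab9::", 5051))
# http://[2001:41d0:1000:ab9::]:5051
print(endpoint("10.0.3.133", 5051))
# http://10.0.3.133:5051
```

Without the brackets, the colons of the IPv6 address are indistinguishable from the port separator, which is exactly the malformed URL shown above.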



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6380) mesos-local failed to start without sudo

2016-10-13 Thread haosdent (JIRA)
haosdent created MESOS-6380:
---

 Summary: mesos-local failed to start without sudo
 Key: MESOS-6380
 URL: https://issues.apache.org/jira/browse/MESOS-6380
 Project: Mesos
  Issue Type: Bug
Reporter: haosdent


Got this error when launching mesos-local without sudo:

{code}
 message: 'Failed to launch container: Failed to make the containerizer runtime 
directory '/var/run/mesos/containers/f2d6947f-2916-4f1a-90dc-3d137b360b9c': 
Permission denied; Abnormal executor termination: unknown container'
{code}
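The failure above comes from trying to create the containerizer runtime directory under /var/run, which needs root. One hedged sketch of a workaround pattern: fall back to a user-writable runtime directory (in real Mesos this would be configured explicitly, e.g. via a runtime-directory flag; the helper and paths below are examples only).

```python
import os
import tempfile

def ensure_runtime_dir(preferred: str) -> str:
    """Create the preferred runtime dir, falling back to a user-writable
    location when the process lacks permission (e.g. not running as root)."""
    try:
        os.makedirs(preferred, exist_ok=True)
        return preferred
    except PermissionError:
        fallback = os.path.join(tempfile.gettempdir(), "mesos", "containers")
        os.makedirs(fallback, exist_ok=True)
        return fallback

path = ensure_runtime_dir("/var/run/mesos/containers")
print(os.path.isdir(path))  # True -- some writable runtime dir exists
```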



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)