[jira] [Updated] (MESOS-7003) Introduce the AuthenticationContext

2017-01-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-7003:
-
Summary: Introduce the AuthenticationContext  (was: Update the default 
executor to authenticate)

> Introduce the AuthenticationContext
> ---
>
> Key: MESOS-7003
> URL: https://issues.apache.org/jira/browse/MESOS-7003
> Project: Mesos
>  Issue Type: Task
>  Components: executor, security
>Reporter: Greg Mann
>  Labels: executor, security
>
> The default executor should be updated to authenticate with the agent when 
> HTTP executor authentication is enabled. This will entail:
> * loading the default JWT authenticatee module
> * calling into the authenticatee before making requests to the agent
> * decorating requests with the headers returned by the authenticatee



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6304) Add authentication support to the default executor

2017-01-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6304:
-
Description: 
The default executor should be updated to authenticate with the agent when HTTP 
executor authentication is enabled. This will entail:
* loading the default JWT authenticatee module
* calling into the authenticatee before making requests to the agent
* decorating requests with the headers returned by the authenticatee

  was:




> Add authentication support to the default executor
> --
>
> Key: MESOS-6304
> URL: https://issues.apache.org/jira/browse/MESOS-6304
> Project: Mesos
>  Issue Type: Improvement
>  Components: executor, modules, security
>Reporter: Galen Pewtherer
>Assignee: Greg Mann
>  Labels: executor, module, security
>
> The default executor should be updated to authenticate with the agent when 
> HTTP executor authentication is enabled. This will entail:
> * loading the default JWT authenticatee module
> * calling into the authenticatee before making requests to the agent
> * decorating requests with the headers returned by the authenticatee





[jira] [Updated] (MESOS-6304) Add authentication support to the default executor

2017-01-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6304:
-
Story Points: 5
  Labels: executor module security  (was: )
 Component/s: security
  modules
  executor

> Add authentication support to the default executor
> --
>
> Key: MESOS-6304
> URL: https://issues.apache.org/jira/browse/MESOS-6304
> Project: Mesos
>  Issue Type: Improvement
>  Components: executor, modules, security
>Reporter: Galen Pewtherer
>Assignee: Greg Mann
>  Labels: executor, module, security
>
> The default executor should be updated to authenticate with the agent when 
> HTTP executor authentication is enabled. This will entail:
> * loading the default JWT authenticatee module
> * calling into the authenticatee before making requests to the agent
> * decorating requests with the headers returned by the authenticatee





[jira] [Created] (MESOS-7003) Update the default executor to authenticate

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-7003:


 Summary: Update the default executor to authenticate
 Key: MESOS-7003
 URL: https://issues.apache.org/jira/browse/MESOS-7003
 Project: Mesos
  Issue Type: Task
  Components: executor, security
Reporter: Greg Mann


The default executor should be updated to authenticate with the agent when HTTP 
executor authentication is enabled. This will entail:
* loading the default JWT authenticatee module
* calling into the authenticatee before making requests to the agent
* decorating requests with the headers returned by the authenticatee
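
The three steps above can be sketched roughly as follows. This is only an illustration: {{Headers}}, {{Request}}, {{Authenticatee}}, {{BearerAuthenticatee}} and {{decorate}} are hypothetical stand-ins invented for the example, not the actual Mesos module interface, and module loading is reduced to constructing an object.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; these are not the real Mesos types.
using Headers = std::map<std::string, std::string>;

struct Request {
  std::string url;
  Headers headers;
};

// Assumed shape of an authenticatee: given a credential, it returns the
// headers that should accompany each request to the agent.
class Authenticatee {
public:
  virtual ~Authenticatee() = default;
  virtual Headers authenticate(const std::string& credential) = 0;
};

// Illustrative token-based authenticatee that emits a bearer header.
class BearerAuthenticatee : public Authenticatee {
public:
  Headers authenticate(const std::string& credential) override {
    return {{"Authorization", "Bearer " + credential}};
  }
};

// Calls into the authenticatee and merges the returned headers into the
// request before it would be sent to the agent.
Request decorate(
    Request request,
    Authenticatee& authenticatee,
    const std::string& credential) {
  for (const auto& header : authenticatee.authenticate(credential)) {
    request.headers[header.first] = header.second;
  }
  return request;
}
```

In this sketch the executor would call {{decorate}} on every outgoing request, so the authenticatee remains the only component that knows how credentials map to headers.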





[jira] [Created] (MESOS-7002) Implement a JWT HTTP authenticatee module

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-7002:


 Summary: Implement a JWT HTTP authenticatee module
 Key: MESOS-7002
 URL: https://issues.apache.org/jira/browse/MESOS-7002
 Project: Mesos
  Issue Type: Task
  Components: executor, security
Reporter: Greg Mann


An implementation of the new {{HttpAuthenticatee}} interface should be added 
for executors to use when authenticating with the default JSON web token (JWT) 
authenticator module. This module will be loaded into the default executor by 
default when HTTP executor authentication is enabled.





[jira] [Created] (MESOS-7001) Implement a JWT authenticator

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-7001:


 Summary: Implement a JWT authenticator
 Key: MESOS-7001
 URL: https://issues.apache.org/jira/browse/MESOS-7001
 Project: Mesos
  Issue Type: Task
  Components: modules, security
Reporter: Greg Mann


A JSON web token (JWT) authenticator module should be added to authenticate 
executors which use default credentials generated by the agent. This module 
will be loaded as an HTTP authenticator by default when 
{{--authenticate_http_executors}} is set, unless HTTP authenticators are 
specified explicitly.
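
As a rough illustration of the first thing such an authenticator has to do, the sketch below extracts a bearer token from an {{Authorization}} header value and checks that it has the three dot-separated segments of a compact JWT (header, payload, signature). The function names are hypothetical, and the sketch deliberately performs no signature verification or claim validation, which a real authenticator must do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Extracts the token from a "Bearer <token>" header value. Returns an empty
// string if the value does not use the Bearer scheme.
std::string bearerToken(const std::string& headerValue) {
  const std::string prefix = "Bearer ";
  if (headerValue.compare(0, prefix.size(), prefix) != 0) {
    return "";
  }
  return headerValue.substr(prefix.size());
}

// Splits a token into its dot-separated segments.
std::vector<std::string> jwtSegments(const std::string& token) {
  std::vector<std::string> segments;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type dot = token.find('.', start);
    if (dot == std::string::npos) {
      segments.push_back(token.substr(start));
      break;
    }
    segments.push_back(token.substr(start, dot - start));
    start = dot + 1;
  }
  return segments;
}

// Structural check only: a signed compact JWT has exactly three non-empty
// segments. This does NOT verify the signature.
bool looksLikeJwt(const std::string& token) {
  const std::vector<std::string> segments = jwtSegments(token);
  if (segments.size() != 3) {
    return false;
  }
  for (const std::string& segment : segments) {
    if (segment.empty()) {
      return false;
    }
  }
  return true;
}
```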





[jira] [Created] (MESOS-7000) Implement a JWT CredentialGenerator

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-7000:


 Summary: Implement a JWT CredentialGenerator
 Key: MESOS-7000
 URL: https://issues.apache.org/jira/browse/MESOS-7000
 Project: Mesos
  Issue Type: Task
  Components: agent, modules, security
Reporter: Greg Mann


The default {{CredentialGenerator}} for the generation of default executor 
credentials will be a module which generates JSON web tokens. This module will 
be loaded by default when executor credential generation is enabled.





[jira] [Created] (MESOS-6999) Add agent flag to generate and pass executor credentials

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6999:


 Summary: Add agent flag to generate and pass executor credentials
 Key: MESOS-6999
 URL: https://issues.apache.org/jira/browse/MESOS-6999
 Project: Mesos
  Issue Type: Task
  Components: agent, security
Reporter: Greg Mann


A new agent flag {{--generate_executor_credentials}} is needed to support 
executor authentication. It should enable the generation of default executor 
credentials, which will entail:
* loading the default {{CredentialGenerator}} module
* calling the credential generator when launching an executor
* passing the generated credential into the executor's environment
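
The flow described above could look roughly like the sketch below. Everything here is an assumption made for illustration: the generator is modelled as a plain callback, the flag as a boolean, and the variable name {{EXECUTOR_CREDENTIAL}} is invented rather than taken from Mesos.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using Environment = std::map<std::string, std::string>;

// Assumed shape of a credential generator: executor ID in, credential out.
using CredentialGenerator =
  std::function<std::string(const std::string& executorId)>;

// Hypothetical variable name, chosen only for this example.
const char* const CREDENTIAL_VARIABLE = "EXECUTOR_CREDENTIAL";

// Builds the environment for an executor that is about to be launched. The
// generator is only called when credential generation is enabled.
Environment executorEnvironment(
    Environment environment,
    bool generateExecutorCredentials,
    const CredentialGenerator& generator,
    const std::string& executorId) {
  if (generateExecutorCredentials) {
    environment[CREDENTIAL_VARIABLE] = generator(executorId);
  }
  return environment;
}
```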





[jira] [Created] (MESOS-6998) Add agent flag to enable authentication of '/v1/executor'

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6998:


 Summary: Add agent flag to enable authentication of '/v1/executor'
 Key: MESOS-6998
 URL: https://issues.apache.org/jira/browse/MESOS-6998
 Project: Mesos
  Issue Type: Task
  Components: agent, security
Reporter: Greg Mann


The new agent flag {{--authenticate_http_executors}} must be added. When set, 
it will require that requests received on the {{/v1/executor}} endpoint be 
authenticated. Note that this will require the addition of a new authentication 
realm for that endpoint.
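
A minimal sketch of the intended gating, under stated assumptions: {{AgentFlags}}, {{handleExecutorCall}} and the authenticator callback are hypothetical, and the status codes are placeholders (401 for a rejected request, 202 for an accepted call) rather than a statement of what the agent returns in every case.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;

// Hypothetical flag holder; only the flag discussed in this ticket is modelled.
struct AgentFlags {
  bool authenticate_http_executors = false;
};

// Assumed authenticator shape: returns true if the request headers carry a
// valid credential for the endpoint's authentication realm.
using Authenticator = std::function<bool(const Headers&)>;

// Returns a placeholder HTTP status code for a call on '/v1/executor'.
int handleExecutorCall(
    const AgentFlags& flags,
    const Headers& headers,
    const Authenticator& authenticator) {
  // Authentication is only enforced when the flag is set.
  if (flags.authenticate_http_executors && !authenticator(headers)) {
    return 401;
  }
  return 202;
}
```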





[jira] [Created] (MESOS-6997) Add new module interfaces for executor authentication

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6997:


 Summary: Add new module interfaces for executor authentication
 Key: MESOS-6997
 URL: https://issues.apache.org/jira/browse/MESOS-6997
 Project: Mesos
  Issue Type: Task
  Components: executor, modules, security
Reporter: Greg Mann


Two new module interfaces are needed to accommodate executor authentication:
* {{CredentialGenerator}}
* {{HttpAuthenticatee}}





[jira] [Created] (MESOS-6996) Add a 'Secret' protobuf message

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6996:


 Summary: Add a 'Secret' protobuf message
 Key: MESOS-6996
 URL: https://issues.apache.org/jira/browse/MESOS-6996
 Project: Mesos
  Issue Type: Task
  Components: security
Reporter: Greg Mann


A {{Secret}} protobuf message should be added to serve as a generic message for 
sending credentials and other secrets throughout Mesos.

A new field of type {{Secret}} should also be added to the {{Environment}} 
message to enable the inclusion of secrets in executor and task environments.





[jira] [Updated] (MESOS-6304) Add authentication support to the default executor

2017-01-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-6304:
-
Description: 



  was:
Right now the default executor (used to launch task groups) does not 
authenticate with either the executor API (/v1/executor) or the agent API (v1). 
Of course, the driver-based executor doesn't authenticate either.

It would be great to come up with a solution that works for both the built-in 
executors and custom executors.



> Add authentication support to the default executor
> --
>
> Key: MESOS-6304
> URL: https://issues.apache.org/jira/browse/MESOS-6304
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Galen Pewtherer
>Assignee: Greg Mann
>






[jira] [Updated] (MESOS-6831) Add metrics for `slave` libprocess' event queue

2017-01-25 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6831:

Assignee: Zhitao Li

> Add metrics for `slave` libprocess' event queue
> ---
>
> Key: MESOS-6831
> URL: https://issues.apache.org/jira/browse/MESOS-6831
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>  Labels: monitoring
>
> The monitoring documentation at 
> http://mesos.apache.org/documentation/latest/monitoring/ lists event queue 
> metrics for the master and the allocator, but there is no event queue length 
> metric for `slave`, the most important libprocess actor in the agent.
> I propose adding similar metrics for this actor. At a minimum, they would 
> help when debugging whether a Mesos agent is overloaded.
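
One way to picture the proposal is a pull-style gauge that reports the actor's queue length whenever metrics are sampled. The sketch below is a toy model, not libprocess: {{Actor}} and {{Metrics}} are hypothetical classes, and the metric name {{slave/event_queue_messages}} is an assumption made by analogy with the master metrics.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>

// Toy actor with an event queue; a stand-in for a libprocess actor.
class Actor {
public:
  void enqueue(const std::string& event) { events.push(event); }

  void processOne() {
    if (!events.empty()) {
      events.pop();
    }
  }

  std::size_t eventQueueLength() const { return events.size(); }

private:
  std::queue<std::string> events;
};

// Minimal registry of pull gauges: each gauge is a callback that is
// evaluated when a snapshot is taken.
class Metrics {
public:
  void addGauge(const std::string& name, std::function<double()> gauge) {
    gauges[name] = std::move(gauge);
  }

  std::map<std::string, double> snapshot() const {
    std::map<std::string, double> values;
    for (const auto& entry : gauges) {
      values[entry.first] = entry.second();
    }
    return values;
  }

private:
  std::map<std::string, std::function<double()>> gauges;
};
```

A gauge registered as a lambda that returns {{actor.eventQueueLength()}} would then report a growing value whenever the actor falls behind, which is the overload signal the ticket is after.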





[jira] [Commented] (MESOS-6989) Docker executor segfaults in ~MesosExecutorDriver()

2017-01-25 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839128#comment-15839128
 ] 

Gilbert Song commented on MESOS-6989:
-

[~kaysoky], sure, thanks. Might relate to another `finalize()` that was 
introduced.

> Docker executor segfaults in ~MesosExecutorDriver()
> ---
>
> Key: MESOS-6989
> URL: https://issues.apache.org/jira/browse/MESOS-6989
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Jan-Philip Gehrcke
>
> With the current Mesos master state (commit 
> 42e515bc5c175a318e914d34473016feda4db6ff), the Docker executor segfaults 
> during shutdown. 
> Steps to reproduce:
> 1) Start master:
> {code}
> $ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/jp/mesos
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0125 13:41:15.963775 14744 main.cpp:278] Build: 2017-01-25 13:37:42 by jp
> I0125 13:41:15.963868 14744 main.cpp:279] Version: 1.2.0
> I0125 13:41:15.963877 14744 main.cpp:286] Git SHA: 
> 42e515bc5c175a318e914d34473016feda4db6ff
> {code}
> (note that building it at 13:37 is not part of the repro)
> 2) Start agent:
> {code}
> $ ./bin/mesos-slave.sh --containerizers=mesos,docker --master=127.0.0.1:5050 
> --work_dir=/tmp/jp/mesos
> {code}
> 3) Run {{mesos-execute}} with the Docker containerizer:
> {code}
> $ ./src/mesos-execute --master=127.0.0.1:5050 --name=testcommand 
> --containerizer=docker --docker_image=debian --command=env
> I0125 13:43:59.704973 14951 scheduler.cpp:184] Version: 1.2.0
> I0125 13:43:59.706425 14952 scheduler.cpp:470] New master detected at 
> master@127.0.0.1:5050
> Subscribed with ID 57596743-06f4-45f1-a975-348cf70589b1-
> Submitted task 'testcommand' to agent 
> '57596743-06f4-45f1-a975-348cf70589b1-S0'
> Received status update TASK_RUNNING for task 'testcommand'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FINISHED for task 'testcommand'
>   message: 'Container exited with status 0'
>   source: SOURCE_EXECUTOR
> {code}
> Relevant agent output that shows the executor segfault:
> {code}
> [...]
> I0125 13:44:16.249191 14823 slave.cpp:4328] Got exited event for 
> executor(1)@192.99.40.208:33529
> I0125 13:44:16.347095 14830 docker.cpp:2358] Executor for container 
> 396282a9-7bf0-48ee-ba07-3ff2ca801d53 has exited
> I0125 13:44:16.347127 14830 docker.cpp:2052] Destroying container 
> 396282a9-7bf0-48ee-ba07-3ff2ca801d53
> I0125 13:44:16.347439 14830 docker.cpp:2179] Running docker stop on container 
> 396282a9-7bf0-48ee-ba07-3ff2ca801d53
> I0125 13:44:16.349215 14826 slave.cpp:4691] Executor 'testcommand' of 
> framework 57596743-06f4-45f1-a975-348cf70589b1- terminated with signal 
> Segmentation fault (core dumped)
> [...]
> {code}
> The complete task stderr:
> {code}
> $ cat 
> /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-/executors/testcommand/runs/latest/stderr
>  
> I0125 13:44:12.850073 15030 exec.cpp:162] Version: 1.2.0
> I0125 13:44:12.864229 15050 exec.cpp:237] Executor registered on agent 
> 57596743-06f4-45f1-a975-348cf70589b1-S0
> I0125 13:44:12.865842 15054 docker.cpp:850] Running docker -H 
> unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 
> --env-file /tmp/xFZ8G9 -v 
> /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-/executors/testcommand/runs/396282a9-7bf0-48ee-ba07-3ff2ca801d53:/mnt/mesos/sandbox
>  --net host --entrypoint /bin/sh --name 
> mesos-57596743-06f4-45f1-a975-348cf70589b1-S0.396282a9-7bf0-48ee-ba07-3ff2ca801d53
>  debian -c env
> I0125 13:44:15.248721 15064 exec.cpp:410] Executor asked to shutdown
> *** Aborted at 1485369856 (unix time) try "date -d @1485369856" if you are 
> using GNU date ***
> PC: @ 0x7fb38f153dd0 (unknown)
> *** SIGSEGV (@0x68) received by PID 15030 (TID 0x7fb3961a88c0) from PID 104; 
> stack trace: ***
> @ 0x7fb38f15b5c0 (unknown)
> @ 0x7fb38f153dd0 (unknown)
> @ 0x7fb39332c607 __gthread_mutex_lock()
> @ 0x7fb39332c657 __gthread_recursive_mutex_lock()
> @ 0x7fb39332edca std::recursive_mutex::lock()
> @ 0x7fb393337bd8 
> _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENKUlPS0_E_clES5_
> @ 0x7fb393337bf8 
> _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENUlPS0_E_4_FUNES5_
> @ 0x7fb39333ba6b Synchronized<>::Synchronized()
> @ 0x7fb393337cac synchronize<>()
> @ 0x7fb39492f15c process::ProcessManager::wait()
> @ 0x7fb3949353f0 process::wait()
> @ 0x55fd63f31fe5 process::wait()
> @ 0x7fb39332ce3c mesos::MesosExecutorDriver::~MesosExecutorDriver()
> @ 0x55fd63f2bd86 main
> @ 0x7fb38e4fc401 __libc_start_main
> @ 0x55fd63f2ab5a _start
> 

[jira] [Commented] (MESOS-6989) Docker executor segfaults in ~MesosExecutorDriver()

2017-01-25 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839052#comment-15839052
 ] 

Joseph Wu commented on MESOS-6989:
--

This is likely related: 
https://github.com/apache/mesos/blame/4eed1bddb96d26d18aaaed5ba6196b8e1f1f4c7d/src/docker/executor.cpp#L811

If you need/want me to take a look, I'll be happy to help.

> Docker executor segfaults in ~MesosExecutorDriver()
> ---
>
> Key: MESOS-6989
> URL: https://issues.apache.org/jira/browse/MESOS-6989
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Jan-Philip Gehrcke
>

[jira] [Commented] (MESOS-6989) Docker executor segfaults in ~MesosExecutorDriver()

2017-01-25 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839044#comment-15839044
 ] 

Gilbert Song commented on MESOS-6989:
-

Hmm, I cannot reproduce this with my local branch (~1 week behind 
`6c63a3fc7aba4d4cfc2f004362e4a6e3a384bd55`), but the segfault does happen with 
the master branch. It should be something introduced recently. I will take a 
look later tonight.

> Docker executor segfaults in ~MesosExecutorDriver()
> ---
>
> Key: MESOS-6989
> URL: https://issues.apache.org/jira/browse/MESOS-6989
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Jan-Philip Gehrcke
>

[jira] [Updated] (MESOS-6958) Support linux filesystem type detection.

2017-01-25 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6958:

Issue Type: Task  (was: Bug)

> Support linux filesystem type detection.
> 
>
> Key: MESOS-6958
> URL: https://issues.apache.org/jira/browse/MESOS-6958
> Project: Mesos
>  Issue Type: Task
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: filesystem, linux
>
> We should support detecting a Linux filesystem type (e.g., xfs, extfs) and 
> its filesystem id mapping.
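
A minimal sketch of such an id mapping, assuming the id is the magic number that Linux reports in the {{f_type}} field of {{statfs(2)}}. The function names are hypothetical and the table covers only a few well-known magic numbers; the {{detectFilesystem}} helper is compiled on Linux only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a filesystem magic number to a short name. Only a handful of
// well-known values are listed here.
std::string filesystemName(std::uint32_t magic) {
  switch (magic) {
    case 0xEF53:     return "ext";        // ext2, ext3 and ext4 share a magic.
    case 0x58465342: return "xfs";
    case 0x01021994: return "tmpfs";
    case 0x794C7630: return "overlayfs";
    default:         return "unknown";
  }
}

#ifdef __linux__
#include <sys/vfs.h>

// Looks up the filesystem type backing `path` via statfs(2).
std::string detectFilesystem(const std::string& path) {
  struct statfs buffer;
  if (::statfs(path.c_str(), &buffer) != 0) {
    return "unknown";
  }
  return filesystemName(static_cast<std::uint32_t>(buffer.f_type));
}
#endif
```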





[jira] [Commented] (MESOS-6000) Overlayfs backend cannot support the image with numerous layers.

2017-01-25 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839035#comment-15839035
 ] 

Gilbert Song commented on MESOS-6000:
-

Should we consider backporting this to Mesos 1.0.3?

> Overlayfs backend cannot support the image with numerous layers.
> 
>
> Key: MESOS-6000
> URL: https://issues.apache.org/jira/browse/MESOS-6000
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 15
> Or any os with kernel 4.0+
>Reporter: Gilbert Song
>Assignee: Zhitao Li
>  Labels: backend, containerizer, overlayfs
> Fix For: 1.1.0
>
>
> This issue is exposed when testing unified containerizer with overlayfs 
> backend using any image with numerous layers (e.g., 38 layers). It can be 
> reproduced by using this image: `gilbertsong/cirros:34` (for anyone who wants 
> to test it out).
> Here is the partial log:
> {noformat}
> I0805 21:50:02.631873 11136 provisioner.cpp:315] Provisioning image rootfs 
> '/tmp/provisioner/containers/36c69ade-69db-4de3-9cd4-18b9b9c99e73/backends/overlay/rootfses/ba255b76-8326-4611-beb5-002f202b52e0'
>  for container 36c69ade-69db-4de3-9cd4-18b9b9c99e73 using overlay backend
> I0805 21:50:02.632990 11138 overlay.cpp:156] Provisioning image rootfs with 
> overlayfs: 
> 

[jira] [Commented] (MESOS-6653) Overlayfs backend may fail to mount the rootfs if both container image and image volume are specified.

2017-01-25 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839028#comment-15839028
 ] 

Gilbert Song commented on MESOS-6653:
-

commit 27a5016154384eb18ec5d7577c1723c9d17e87f7
Author: Gilbert Song 
Date:   Wed Jan 25 07:51:35 2017 -0800

Fixed overlay backend provisioning multi images symlink.

Since the fix of MESOS-6000, symlinks are used in overlayfs
backend to shorten the arguments when mounting the rootfs.
E.g., '.../backends/overlay/links' is the symlink created
for a provisioned image. It becomes problematic if a
container image is specified while some image volumes are
specified for the same container. A unique symlink is
needed for each image to be provisioned.

Please note that changing the symlinks directory would
still be backward compatible for legacy containers, since
the container backend directory will be removed anyway in
provisioner::destroy().

Review: https://reviews.apache.org/r/54212/
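
The collision and its fix can be modelled in a few lines. This is a toy model under stated assumptions: the filesystem is a set of paths, {{createLink}} fails when the link already exists (as symlink creation does), and the per-image layout produced by {{uniqueLinkPath}} is hypothetical, not necessarily the directory layout chosen in the commit.

```cpp
#include <cassert>
#include <set>
#include <string>

// Simulates symlink creation: returns false if the link path already exists.
bool createLink(std::set<std::string>& existing, const std::string& link) {
  return existing.insert(link).second;
}

// Hypothetical per-image layout: one link path per provisioned rootfs ID.
std::string uniqueLinkPath(
    const std::string& backendDir,
    const std::string& rootfsId) {
  return backendDir + "/links/" + rootfsId;
}
```

With a single shared path, the second call to {{createLink}} for the same container fails. With one path per rootfs ID, a container image and its image volumes no longer compete for the same link.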

> Overlayfs backend may fail to mount the rootfs if both container image and 
> image volume are specified.
> --
>
> Key: MESOS-6653
> URL: https://issues.apache.org/jira/browse/MESOS-6653
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>Priority: Critical
>  Labels: backend, containerizer, overlayfs
>
> Since the fix for MESOS-6000, we use a symlink to shorten the overlayfs mount 
> arguments. However, if more than one image needs to be provisioned (e.g., a 
> container image is specified while image volumes are specified for the same 
> container), creating the symlink .../backends/overlay/links fails because it 
> already exists.
> Here is a log from a run with overlayfs hard-coded as the default backend:
> {noformat}
> [07:02:45] :   [Step 10/10] [ RUN  ] 
> Nesting/VolumeImageIsolatorTest.ROOT_ImageInVolumeWithRootFilesystem/0
> [07:02:46] :   [Step 10/10] I1127 07:02:46.416021  2919 
> containerizer.cpp:207] Using isolation: 
> filesystem/linux,volume/image,docker/runtime,network/cni
> [07:02:46] :   [Step 10/10] I1127 07:02:46.419312  2919 
> linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [07:02:46] :   [Step 10/10] E1127 07:02:46.425336  2919 shell.hpp:107] 
> Command 'hadoop version 2>&1' failed; this is the output:
> [07:02:46] :   [Step 10/10] sh: 1: hadoop: not found
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425379  2919 fetcher.cpp:69] 
> Skipping URI fetcher plugin 'hadoop' as it could not be created: Failed to 
> create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was 
> either not found or exited with a non-zero exit status: 127
> [07:02:46] :   [Step 10/10] I1127 07:02:46.425452  2919 local_puller.cpp:94] 
> Creating local puller with docker registry '/tmp/R6OUei/registry'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427258  2934 
> containerizer.cpp:956] Starting container 
> 9af6c98a-d9f7-4c89-a5ed-fc7ae2fa1330 for executor 'test_executor' of 
> framework 
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427592  2938 
> metadata_manager.cpp:167] Looking for image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.427774  2936 local_puller.cpp:147] 
> Untarring image 'test_image_rootfs' from 
> '/tmp/R6OUei/registry/test_image_rootfs.tar' to 
> '/tmp/R6OUei/store/staging/9krDz2'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512070  2933 local_puller.cpp:167] 
> The repositories JSON file for image 'test_image_rootfs' is 
> '{"test_image_rootfs":{"latest":"815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346"}}'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.512279  2933 local_puller.cpp:295] 
> Extracting layer tar ball 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/layer.tar
>  to rootfs 
> '/tmp/R6OUei/store/staging/9krDz2/815b809d588c80fd6ddf4d6ac244ad1c01ae4cbe0f91cc7480e306671ee9c346/rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617442  2937 
> metadata_manager.cpp:155] Successfully cached image 'test_image_rootfs'
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617908  2938 provisioner.cpp:286] 
> Image layers: 1
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617925  2938 provisioner.cpp:296] 
> Should hit here
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617949  2938 provisioner.cpp:315] 
> : bind
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617959  2938 provisioner.cpp:315] 
> : overlay
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617967  2938 provisioner.cpp:315] 
> : copy
> [07:02:46] :   [Step 10/10] I1127 07:02:46.617974  2938 

[jira] [Comment Edited] (MESOS-6981) Allow disabling name based SSL checks

2017-01-25 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838929#comment-15838929
 ] 

Till Toenshoff edited comment on MESOS-6981 at 1/26/17 12:54 AM:
-

The implementation should be straight-forward. We would add a new SSL-flag; 
e.g. {{LIBPROCESS_SSL_WEAK_VERIFY}}.

Then we add 
{noformat}
if (ssl_flags->weak_verify) {
  return Nothing();
}
{noformat}

here 
https://github.com/apache/mesos/blob/16f479d151d5a6554f8ebfcedfdc6b62dc7a0edb/3rdparty/libprocess/src/openssl.cpp#L646
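
To make the intent of the early return concrete, here is a minimal sketch of where such a flag could sit in a verification routine. It assumes the CA signature check runs before the name check; `SslFlags`, `PeerCertificate` and `verifyPeer` are hypothetical stand-ins, not libprocess types, and `std::nullopt` loosely plays the role of `Nothing()` in the snippet above.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration only; not libprocess types.
struct SslFlags {
  bool verify_cert = true;   // require a certificate signed by a trusted CA
  bool weak_verify = false;  // proposed flag: skip the hostname/IP comparison
};

struct PeerCertificate {
  bool signed_by_trusted_ca;
  std::string subject_name;  // name the certificate was issued for
};

// Returns std::nullopt on success, or an error message on failure.
std::optional<std::string> verifyPeer(
    const SslFlags& flags,
    const PeerCertificate& cert,
    const std::string& peer_hostname)
{
  if (!flags.verify_cert) {
    return std::nullopt;
  }

  if (!cert.signed_by_trusted_ca) {
    return "certificate is not signed by a trusted CA";
  }

  // The proposed early return: the CA signature has already been checked,
  // so the name-based check is skipped when weak verification is enabled.
  if (flags.weak_verify) {
    return std::nullopt;
  }

  if (cert.subject_name != peer_hostname) {
    return "certificate name '" + cert.subject_name +
           "' does not match peer '" + peer_hostname + "'";
  }

  return std::nullopt;
}
```

With `weak_verify` set, a certificate issued for one name is accepted from any address as long as the CA check passes; an untrusted certificate is still rejected.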
 


was (Author: tillt):
The implementation should be straight-forward. We would add a new SSL-flag; 
e.g. `LIBPROCESS_SSL_WEAK_VERIFY`.

Then we add 
{noformat}
if (ssl_flags->weak_verify) {
  return Nothing();
}
{noformat}

here 
https://github.com/apache/mesos/blob/16f479d151d5a6554f8ebfcedfdc6b62dc7a0edb/3rdparty/libprocess/src/openssl.cpp#L646
 

> Allow disabling name based SSL checks
> -
>
> Key: MESOS-6981
> URL: https://issues.apache.org/jira/browse/MESOS-6981
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Kevin Cox
>  Labels: mesosphere, security
>
> Currently if you want to use verified certificates you need to enable 
> validation by hostname or IP. However if you are running your own CA for 
> these certificates it is often sufficient to verify solely based on the CA 
> signature.
> For example if an admin wants to connect it is a pain to make sure that they 
> always have a valid certificate for their IP or reverse DNS. It would be nice 
> if the admin could be given a certificate that was trusted no matter where he 
> is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6981) Allow disabling name based SSL checks

2017-01-25 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838929#comment-15838929
 ] 

Till Toenshoff edited comment on MESOS-6981 at 1/26/17 12:54 AM:
-

The implementation should be straight-forward. We would add a new SSL-flag; 
e.g. `LIBPROCESS_SSL_WEAK_VERIFY`.

Then we add 
{noformat}
if (ssl_flags->weak_verify) {
  return Nothing();
}
{noformat}

here 
https://github.com/apache/mesos/blob/16f479d151d5a6554f8ebfcedfdc6b62dc7a0edb/3rdparty/libprocess/src/openssl.cpp#L646
 


was (Author: tillt):
The implementation should be straight-forward. We would add a new SSL-flag; 
e.g. `LIBPROCESS_SSL_WEAK_VERIFY`.

Then we add 
{noformat}
if (!ssl_flags->weak_verify) {
  return Nothing();
}
{noformat}

here 
https://github.com/apache/mesos/blob/16f479d151d5a6554f8ebfcedfdc6b62dc7a0edb/3rdparty/libprocess/src/openssl.cpp#L646
 

> Allow disabling name based SSL checks
> -
>
> Key: MESOS-6981
> URL: https://issues.apache.org/jira/browse/MESOS-6981
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Kevin Cox
>  Labels: mesosphere, security
>
> Currently if you want to use verified certificates you need to enable 
> validation by hostname or IP. However if you are running your own CA for 
> these certificates it is often sufficient to verify solely based on the CA 
> signature.
> For example if an admin wants to connect it is a pain to make sure that they 
> always have a valid certificate for their IP or reverse DNS. It would be nice 
> if the admin could be given a certificate that was trusted no matter where he 
> is.





[jira] [Commented] (MESOS-6981) Allow disabling name based SSL checks

2017-01-25 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838929#comment-15838929
 ] 

Till Toenshoff commented on MESOS-6981:
---

The implementation should be straight-forward. We would add a new SSL-flag; 
e.g. `LIBPROCESS_SSL_WEAK_VERIFY`.

Then we add 
{noformat}
if (!ssl_flags->weak_verify) {
  return Nothing();
}
{noformat}

here 
https://github.com/apache/mesos/blob/16f479d151d5a6554f8ebfcedfdc6b62dc7a0edb/3rdparty/libprocess/src/openssl.cpp#L646
 

> Allow disabling name based SSL checks
> -
>
> Key: MESOS-6981
> URL: https://issues.apache.org/jira/browse/MESOS-6981
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Kevin Cox
>  Labels: mesosphere, security
>
> Currently if you want to use verified certificates you need to enable 
> validation by hostname or IP. However if you are running your own CA for 
> these certificates it is often sufficient to verify solely based on the CA 
> signature.
> For example if an admin wants to connect it is a pain to make sure that they 
> always have a valid certificate for their IP or reverse DNS. It would be nice 
> if the admin could be given a certificate that was trusted no matter where he 
> is.





[jira] [Updated] (MESOS-6981) Allow disabling name based SSL checks

2017-01-25 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6981:
--
Labels: mesosphere security  (was: )

> Allow disabling name based SSL checks
> -
>
> Key: MESOS-6981
> URL: https://issues.apache.org/jira/browse/MESOS-6981
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Kevin Cox
>  Labels: mesosphere, security
>
> Currently if you want to use verified certificates you need to enable 
> validation by hostname or IP. However if you are running your own CA for 
> these certificates it is often sufficient to verify solely based on the CA 
> signature.
> For example if an admin wants to connect it is a pain to make sure that they 
> always have a valid certificate for their IP or reverse DNS. It would be nice 
> if the admin could be given a certificate that was trusted no matter where he 
> is.





[jira] [Commented] (MESOS-6981) Allow disabling name based SSL checks

2017-01-25 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838882#comment-15838882
 ] 

Adam B commented on MESOS-6981:
---

Fair point. It seems reasonable to me. [~tillt] should have further thoughts.
We've definitely run into this issue before at Mesosphere.

> Allow disabling name based SSL checks
> -
>
> Key: MESOS-6981
> URL: https://issues.apache.org/jira/browse/MESOS-6981
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Kevin Cox
>  Labels: mesosphere, security
>
> Currently if you want to use verified certificates you need to enable 
> validation by hostname or IP. However if you are running your own CA for 
> these certificates it is often sufficient to verify solely based on the CA 
> signature.
> For example if an admin wants to connect it is a pain to make sure that they 
> always have a valid certificate for their IP or reverse DNS. It would be nice 
> if the admin could be given a certificate that was trusted no matter where he 
> is.





[jira] [Updated] (MESOS-6973) Fix BOOST random generator initialization on Windows

2017-01-25 Thread Alex Clemmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Clemmer updated MESOS-6973:

Labels: Windows microsoft  (was: Windows)

> Fix BOOST random generator initialization on Windows
> 
>
> Key: MESOS-6973
> URL: https://issues.apache.org/jira/browse/MESOS-6973
> Project: Mesos
>  Issue Type: Bug
>Reporter: Daniel Pravat
>Assignee: Alex Clemmer
>  Labels: Windows, microsoft
>
> seed_rng::seed_rng does not produce the expected result on Windows since it 
> is using the `/dev/urandom` file.  
> 0:005> k
>  # Child-SP  RetAddr   Call Site
> 00 0049`22dfc108 7ff6`5193822f kernel32!CreateFileW
> ...
> 0e 0049`22dfc660 7ff6`502228fd 
> mesos_agent!boost::uuids::detail::seed_rng::seed_rng+0x3d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 80]
> 0f 0049`22dfc690 7ff6`502591e3 
> mesos_agent!boost::uuids::detail::seed int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >+0x4d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 246]
> 10 0049`22dfc790 7ff6`50395518 
> mesos_agent!boost::uuids::basic_random_generator int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >::basic_random_generator
>  >+0xd3 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\random_generator.hpp
>  @ 50]
> 11 0049`22dfc800 7ff6`500ad140 mesos_agent!id::UUID::random+0x78 
> [d:\repositories\mesoswin\3rdparty\stout\include\stout\uuid.hpp @ 49]
> 12 0049`22dfc870 7ff6`5007ff55 
> mesos_agent!mesos::internal::slave::Framework::launchExecutor+0x70 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 6301]
> 13 0049`22dfd520 7ff6`502a0a35 
> mesos_agent!mesos::internal::slave::Slave::_run+0x2455 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 1990]
> ...
> 0:005> du @rcx
> 01d7`cc55fb60  "/dev/urandom"
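
As an illustration of seeding that does not open `/dev/urandom` by path, here is a minimal sketch using only the C++ standard library. It is not the Boost or Mesos fix; it swaps in `std::random_device`, whose entropy source is implementation-defined, so its suitability on a given Windows toolchain is an assumption to verify. `makeSeededEngine` and `randomUuid` are hypothetical names.

```cpp
#include <array>
#include <cstdint>
#include <random>

// Seeds a Mersenne Twister from std::random_device instead of reading the
// "/dev/urandom" file directly. The quality of std::random_device depends
// on the standard library implementation (assumption: it is adequate here).
std::mt19937 makeSeededEngine()
{
  std::random_device device;
  std::array<std::uint32_t, 8> words;
  for (auto& word : words) {
    word = device();
  }
  std::seed_seq seq(words.begin(), words.end());
  return std::mt19937(seq);
}

// Builds a version 4 (random) UUID as 16 raw bytes.
std::array<std::uint8_t, 16> randomUuid(std::mt19937& engine)
{
  std::uniform_int_distribution<unsigned int> byte(0, 255);
  std::array<std::uint8_t, 16> uuid;
  for (auto& b : uuid) {
    b = static_cast<std::uint8_t>(byte(engine));
  }
  // Set the version (4) and variant (RFC 4122) bits.
  uuid[6] = static_cast<std::uint8_t>((uuid[6] & 0x0F) | 0x40);
  uuid[8] = static_cast<std::uint8_t>((uuid[8] & 0x3F) | 0x80);
  return uuid;
}
```

A caller would create the engine once, for example `std::mt19937 engine = makeSeededEngine();`, and reuse it for every `randomUuid(engine)` call.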





[jira] [Commented] (MESOS-6868) Transition Windows away from `os::killtree`.

2017-01-25 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838859#comment-15838859
 ] 

Andrew Schwartzmeyer commented on MESOS-6868:
-

I've started the transition away from `os::killtree` for the `WindowsLauncher` 
and default executor, but there are still many instances in use elsewhere in 
the code base.

> Transition Windows away from `os::killtree`.
> 
>
> Key: MESOS-6868
> URL: https://issues.apache.org/jira/browse/MESOS-6868
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: microsoft
>
> Windows does not have as robust a notion of a process hierarchy as Unix, and 
> thus functions like `os::killtree` will always have critical limitations and 
> semantic mismatches between Unix and Windows.
> We should transition away from this function when we can, and replace it with 
> something similar to how we kill a cgroup.





[jira] [Commented] (MESOS-6973) Fix BOOST random generator initialization on Windows

2017-01-25 Thread Alex Clemmer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838858#comment-15838858
 ] 

Alex Clemmer commented on MESOS-6973:
-

[~klueska] Thanks for the heads up. I don't know about this issue offhand, but 
I will assign it to myself and tag it `microsoft` so we remember to close it 
before March 1.

> Fix BOOST random generator initialization on Windows
> 
>
> Key: MESOS-6973
> URL: https://issues.apache.org/jira/browse/MESOS-6973
> Project: Mesos
>  Issue Type: Bug
>Reporter: Daniel Pravat
>Assignee: Alex Clemmer
>  Labels: Windows, microsoft
>
> seed_rng::seed_rng does not produce the expected result on Windows since it 
> is using the `/dev/urandom` file.  
> 0:005> k
>  # Child-SP  RetAddr   Call Site
> 00 0049`22dfc108 7ff6`5193822f kernel32!CreateFileW
> ...
> 0e 0049`22dfc660 7ff6`502228fd 
> mesos_agent!boost::uuids::detail::seed_rng::seed_rng+0x3d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 80]
> 0f 0049`22dfc690 7ff6`502591e3 
> mesos_agent!boost::uuids::detail::seed int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >+0x4d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 246]
> 10 0049`22dfc790 7ff6`50395518 
> mesos_agent!boost::uuids::basic_random_generator int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >::basic_random_generator
>  >+0xd3 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\random_generator.hpp
>  @ 50]
> 11 0049`22dfc800 7ff6`500ad140 mesos_agent!id::UUID::random+0x78 
> [d:\repositories\mesoswin\3rdparty\stout\include\stout\uuid.hpp @ 49]
> 12 0049`22dfc870 7ff6`5007ff55 
> mesos_agent!mesos::internal::slave::Framework::launchExecutor+0x70 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 6301]
> 13 0049`22dfd520 7ff6`502a0a35 
> mesos_agent!mesos::internal::slave::Slave::_run+0x2455 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 1990]
> ...
> 0:005> du @rcx
> 01d7`cc55fb60  "/dev/urandom"





[jira] [Assigned] (MESOS-6892) Reconsider process creation primitives on Windows

2017-01-25 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer reassigned MESOS-6892:
---

Assignee: Andrew Schwartzmeyer  (was: Alex Clemmer)

> Reconsider process creation primitives on Windows
> -
>
> Key: MESOS-6892
> URL: https://issues.apache.org/jira/browse/MESOS-6892
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: microsoft
>
> Windows does not have the same notions of process hierarchies as Unix, and so 
> killing groups of processes requires us to make sure all processes are 
> contained in a job object, which acts something like a cgroup. This is 
> particularly important when we decide to kill a task, as there is no way to 
> reliably do this unless all the processes you'd like to kill are in the job 
> object.
> This causes us a number of issues; it is a big reason we needed to fork the 
> command executor, and it is the reason tasks are currently unkillable in the 
> default executor.
> As we clean this issue up, we need to think carefully about the process 
> governance semantics of Mesos, and how we can map them to a reliable, simple 
> Windows implementation.





[jira] [Assigned] (MESOS-6973) Fix BOOST random generator initialization on Windows

2017-01-25 Thread Alex Clemmer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Clemmer reassigned MESOS-6973:
---

Assignee: Alex Clemmer

> Fix BOOST random generator initialization on Windows
> 
>
> Key: MESOS-6973
> URL: https://issues.apache.org/jira/browse/MESOS-6973
> Project: Mesos
>  Issue Type: Bug
>Reporter: Daniel Pravat
>Assignee: Alex Clemmer
>  Labels: Windows, microsoft
>
> seed_rng::seed_rng does not produce the expected result on Windows since it 
> is using the `/dev/urandom` file.  
> 0:005> k
>  # Child-SP  RetAddr   Call Site
> 00 0049`22dfc108 7ff6`5193822f kernel32!CreateFileW
> ...
> 0e 0049`22dfc660 7ff6`502228fd 
> mesos_agent!boost::uuids::detail::seed_rng::seed_rng+0x3d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 80]
> 0f 0049`22dfc690 7ff6`502591e3 
> mesos_agent!boost::uuids::detail::seed int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >+0x4d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 246]
> 10 0049`22dfc790 7ff6`50395518 
> mesos_agent!boost::uuids::basic_random_generator int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >::basic_random_generator
>  >+0xd3 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\random_generator.hpp
>  @ 50]
> 11 0049`22dfc800 7ff6`500ad140 mesos_agent!id::UUID::random+0x78 
> [d:\repositories\mesoswin\3rdparty\stout\include\stout\uuid.hpp @ 49]
> 12 0049`22dfc870 7ff6`5007ff55 
> mesos_agent!mesos::internal::slave::Framework::launchExecutor+0x70 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 6301]
> 13 0049`22dfd520 7ff6`502a0a35 
> mesos_agent!mesos::internal::slave::Slave::_run+0x2455 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 1990]
> ...
> 0:005> du @rcx
> 01d7`cc55fb60  "/dev/urandom"





[jira] [Commented] (MESOS-6815) Enable glog stack traces when we call things like `ABORT` on Windows

2017-01-25 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838849#comment-15838849
 ] 

Andrew Schwartzmeyer commented on MESOS-6815:
-

Work in progress is here: https://github.com/andschwa/glog/tree/test, with an 
open but unfinished pull request here: https://github.com/google/glog/pull/151.

This is currently paused. Stack tracing itself works, but relying on a signal 
handler to report the stack trace does not provide the desired behavior. On 
Linux, the stack trace is reported from the thread on which the signal was 
raised; on Windows, it seems that the stack trace is reported from the thread 
on which the signal handler was installed, regardless of the thread that 
raised the signal. I believe the correct approach is to slightly change how 
glog reports stack traces (at least on Windows) so that it does not rely on 
signal handlers.

> Enable glog stack traces when we call things like `ABORT` on Windows
> 
>
> Key: MESOS-6815
> URL: https://issues.apache.org/jira/browse/MESOS-6815
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>Priority: Critical
>  Labels: microsoft, windows-mvp
>
> Currently in the Windows builds, if we call `ABORT` (etc.) we will simply 
> bail out, with no stack traces.
> This is highly undesirable. Stack traces are important for operating clusters 
> in production. We should work to enable this behavior, including possibly 
> working with glog to add this support if they do not currently support it 
> natively.





[jira] [Commented] (MESOS-6951) Docker containerizer: mangled environment when env value contains LF byte

2017-01-25 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838791#comment-15838791
 ] 

Kevin Klues commented on MESOS-6951:


[~gilbert] Can you please take a look at this when you get a chance?

> Docker containerizer: mangled environment when env value contains LF byte
> -
>
> Key: MESOS-6951
> URL: https://issues.apache.org/jira/browse/MESOS-6951
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Jan-Philip Gehrcke
>
> Consider this Marathon app definition:
> {code}
> {
>   "id": "/testapp",
>   "cmd": "env && tail -f /dev/null",
>   "env":{
> "TESTVAR":"line1\nline2"
>   },
>   "cpus": 0.1,
>   "mem": 10,
>   "instances": 1,
>   "container": {
> "type": "DOCKER",
> "docker": {
>   "image": "alpine"
> }
>   }
> }
> {code}
> The JSON-encoded newline in the value of the {{TESTVAR}} environment variable 
> leads to a corrupted task environment. What follows is a subset of the 
> resulting task environment (as printed via {{env}}, i.e. in key=value 
> notation):
> {code}
> line2=
> TESTVAR=line1
> {code}
> That is, the trailing part of the intended value ended up being interpreted 
> as variable name, and only the leading part of the intended value was used as 
> actual value for {{TESTVAR}}.
> Common application scenarios that would badly break with that involve 
> pretty-printed JSON documents or YAML documents passed along via the 
> environment.
> Following the code and information flow led to the conclusion that Docker's 
> {{--env-file}} command line interface is the weak point in the flow. It is 
> currently used in Mesos' Docker containerizer for passing the environment to 
> the container:
> {code}
>   argv.push_back("--env-file");
>   argv.push_back(environmentFile);
> {code}
> (Ref: 
> [code|https://github.com/apache/mesos/blob/c0aee8cc10b1d1f4b2db5ff12b771372fdd5b1f3/src/docker/docker.cpp#L584])
> Docker's {{--env-file}} argument behavior is documented via
> {quote}
> The --env-file flag takes a filename as an argument
> and expects each line to be in the VAR=VAL format,
> {quote}
> (Ref: https://docs.docker.com/engine/reference/commandline/run/)
> That is, Docker identifies individual environment variable key/value pair 
> definitions based on newline bytes in that file which explains the observed 
> environment variable value fragmentation. Notably, Docker does not provide a 
> mechanism for escaping newline bytes in the values specified in this 
> environment file.
> I think it is important to understand that Docker's {{--env-file}} mechanism 
> is ill-posed in the sense that it is not capable of transmitting the whole 
> range of environment variable values allowed by POSIX. That's what the Single 
> UNIX Specification, Version 3 has to say about environment variable values:
> {quote}
> the value shall be composed of characters from the
> portable character set (except NUL and as indicated below). 
> {quote}
> (Ref: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html)
> About "The portable character set": 
> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap06.html#tagtcjh_3
> It includes (among others) the LF byte. Understandably, the current Docker 
> {{--env-file}} behavior will not change, so this is not an issue that can be 
> deferred to Docker: https://github.com/docker/docker/issues/12997
> Notably, the {{--env-file}} method for communicating environment variables to 
> Docker containers was just recently introduced to Mesos as of 
> https://issues.apache.org/jira/browse/MESOS-6566, for not leaking secrets 
> through the process listing. Previously, we specified env key/value pairs on 
> the command line which leaked secrets to the process list and probably also 
> did not support the full range of valid environment variable values.
> We need a solution that
> 1) does not leak sensitive values (i.e. is compliant with MESOS-6566).
> 2) allows for passing arbitrary environment variable values.
> It seems that Docker's {{--env}} method can be used for that. It can be used 
> to define _just the names of the environment variables_ to-be-passed-along, 
> in which case the docker binary will read the corresponding values from its 
> own environment, which we can clearly prepare appropriately when we invoke 
> the corresponding child process. This method would still leak environment 
> variable _names_ to the process listing, but (especially if documented) this 
> should be fine.
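
The fragmentation described in the report, and the proposed names-only alternative, can be sketched as follows. This is a toy model under stated assumptions: `parseEnvFile` is a simple line-oriented parser that mirrors the observed result (`line2=`), not Docker's actual parser, and `writeEnvFile`, `parseEnvFile` and `envNameArguments` are hypothetical helpers, not Mesos functions.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Environment = std::map<std::string, std::string>;

// Serializes variables the way an env file expects: one VAR=VAL per line.
// There is no escaping, so a newline inside a value starts a new line.
std::string writeEnvFile(const Environment& env)
{
  std::string out;
  for (const auto& entry : env) {
    out += entry.first + "=" + entry.second + "\n";
  }
  return out;
}

// Toy line-oriented parser (an assumption, not Docker's implementation).
// A line without '=' is treated as a variable name with an empty value.
Environment parseEnvFile(const std::string& contents)
{
  Environment env;
  std::istringstream in(contents);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) {
      continue;
    }
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) {
      env[line] = "";
    } else {
      env[line.substr(0, eq)] = line.substr(eq + 1);
    }
  }
  return env;
}

// Proposed alternative: pass only the variable names as `--env NAME`
// arguments; the values would travel in the child process environment.
std::vector<std::string> envNameArguments(const Environment& env)
{
  std::vector<std::string> argv;
  for (const auto& entry : env) {
    argv.push_back("--env");
    argv.push_back(entry.first);
  }
  return argv;
}
```

Round-tripping `TESTVAR` with the value `line1\nline2` through `writeEnvFile` and `parseEnvFile` yields two variables, `TESTVAR=line1` and `line2=`, whereas `envNameArguments` only ever places the name `TESTVAR` on the command line.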





[jira] [Commented] (MESOS-6989) Docker executor segfaults in ~MesosExecutorDriver()

2017-01-25 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838789#comment-15838789
 ] 

Kevin Klues commented on MESOS-6989:


[~gilbert] Can you please take a look at this when you get a chance?

> Docker executor segfaults in ~MesosExecutorDriver()
> ---
>
> Key: MESOS-6989
> URL: https://issues.apache.org/jira/browse/MESOS-6989
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Jan-Philip Gehrcke
>
> With the current Mesos master state (commit 
> 42e515bc5c175a318e914d34473016feda4db6ff), the Docker executor segfaults 
> during shutdown. 
> Steps to reproduce:
> 1) Start master:
> {code}
> $ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/jp/mesos
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0125 13:41:15.963775 14744 main.cpp:278] Build: 2017-01-25 13:37:42 by jp
> I0125 13:41:15.963868 14744 main.cpp:279] Version: 1.2.0
> I0125 13:41:15.963877 14744 main.cpp:286] Git SHA: 
> 42e515bc5c175a318e914d34473016feda4db6ff
> {code}
> (note that building it at 13:37 is not part of the repro)
> 2) Start agent:
> {code}
> $ ./bin/mesos-slave.sh --containerizers=mesos,docker --master=127.0.0.1:5050 
> --work_dir=/tmp/jp/mesos
> {code}
> 3) Run {{mesos-execute}} with the Docker containerizer:
> {code}
> $ ./src/mesos-execute --master=127.0.0.1:5050 --name=testcommand 
> --containerizer=docker --docker_image=debian --command=env
> I0125 13:43:59.704973 14951 scheduler.cpp:184] Version: 1.2.0
> I0125 13:43:59.706425 14952 scheduler.cpp:470] New master detected at 
> master@127.0.0.1:5050
> Subscribed with ID 57596743-06f4-45f1-a975-348cf70589b1-
> Submitted task 'testcommand' to agent 
> '57596743-06f4-45f1-a975-348cf70589b1-S0'
> Received status update TASK_RUNNING for task 'testcommand'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FINISHED for task 'testcommand'
>   message: 'Container exited with status 0'
>   source: SOURCE_EXECUTOR
> {code}
> Relevant agent output that shows the executor segfault:
> {code}
> [...]
> I0125 13:44:16.249191 14823 slave.cpp:4328] Got exited event for 
> executor(1)@192.99.40.208:33529
> I0125 13:44:16.347095 14830 docker.cpp:2358] Executor for container 
> 396282a9-7bf0-48ee-ba07-3ff2ca801d53 has exited
> I0125 13:44:16.347127 14830 docker.cpp:2052] Destroying container 
> 396282a9-7bf0-48ee-ba07-3ff2ca801d53
> I0125 13:44:16.347439 14830 docker.cpp:2179] Running docker stop on container 
> 396282a9-7bf0-48ee-ba07-3ff2ca801d53
> I0125 13:44:16.349215 14826 slave.cpp:4691] Executor 'testcommand' of 
> framework 57596743-06f4-45f1-a975-348cf70589b1- terminated with signal 
> Segmentation fault (core dumped)
> [...]
> {code}
> The complete task stderr:
> {code}
> $ cat 
> /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-/executors/testcommand/runs/latest/stderr
>  
> I0125 13:44:12.850073 15030 exec.cpp:162] Version: 1.2.0
> I0125 13:44:12.864229 15050 exec.cpp:237] Executor registered on agent 
> 57596743-06f4-45f1-a975-348cf70589b1-S0
> I0125 13:44:12.865842 15054 docker.cpp:850] Running docker -H 
> unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 
> --env-file /tmp/xFZ8G9 -v 
> /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-/executors/testcommand/runs/396282a9-7bf0-48ee-ba07-3ff2ca801d53:/mnt/mesos/sandbox
>  --net host --entrypoint /bin/sh --name 
> mesos-57596743-06f4-45f1-a975-348cf70589b1-S0.396282a9-7bf0-48ee-ba07-3ff2ca801d53
>  debian -c env
> I0125 13:44:15.248721 15064 exec.cpp:410] Executor asked to shutdown
> *** Aborted at 1485369856 (unix time) try "date -d @1485369856" if you are 
> using GNU date ***
> PC: @ 0x7fb38f153dd0 (unknown)
> *** SIGSEGV (@0x68) received by PID 15030 (TID 0x7fb3961a88c0) from PID 104; 
> stack trace: ***
> @ 0x7fb38f15b5c0 (unknown)
> @ 0x7fb38f153dd0 (unknown)
> @ 0x7fb39332c607 __gthread_mutex_lock()
> @ 0x7fb39332c657 __gthread_recursive_mutex_lock()
> @ 0x7fb39332edca std::recursive_mutex::lock()
> @ 0x7fb393337bd8 
> _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENKUlPS0_E_clES5_
> @ 0x7fb393337bf8 
> _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENUlPS0_E_4_FUNES5_
> @ 0x7fb39333ba6b Synchronized<>::Synchronized()
> @ 0x7fb393337cac synchronize<>()
> @ 0x7fb39492f15c process::ProcessManager::wait()
> @ 0x7fb3949353f0 process::wait()
> @ 0x55fd63f31fe5 process::wait()
> @ 0x7fb39332ce3c mesos::MesosExecutorDriver::~MesosExecutorDriver()
> @ 0x55fd63f2bd86 main
> @ 0x7fb38e4fc401 __libc_start_main
> @ 0x55fd63f2ab5a _start
> {code}




[jira] [Commented] (MESOS-6981) Allow disabling name based SSL checks

2017-01-25 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838785#comment-15838785
 ] 

Kevin Klues commented on MESOS-6981:


[~tillt] [~kaysoky] [~adam-mesos] [~arojas] Can you please comment here?

> Allow disabling name based SSL checks
> -
>
> Key: MESOS-6981
> URL: https://issues.apache.org/jira/browse/MESOS-6981
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Kevin Cox
>
> Currently if you want to use verified certificates you need to enable 
> validation by hostname or IP. However if you are running your own CA for 
> these certificates it is often sufficient to verify solely based on the CA 
> signature.
> For example if an admin wants to connect it is a pain to make sure that they 
> always have a valid certificate for their IP or reverse DNS. It would be nice 
> if the admin could be given a certificate that was trusted no matter where he 
> is.





[jira] [Commented] (MESOS-6973) Fix BOOST random generator initialization on Windows

2017-01-25 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838781#comment-15838781
 ] 

Kevin Klues commented on MESOS-6973:


[~hausdorff] Do you have any insight on this?

> Fix BOOST random generator initialization on Windows
> 
>
> Key: MESOS-6973
> URL: https://issues.apache.org/jira/browse/MESOS-6973
> Project: Mesos
>  Issue Type: Bug
>Reporter: Daniel Pravat
>  Labels: Windows
>
> seed_rng::seed_rng does not produce the expected result on Windows since it 
> is using the `/dev/urandom` file.  
> 0:005> k
>  # Child-SP  RetAddr   Call Site
> 00 0049`22dfc108 7ff6`5193822f kernel32!CreateFileW
> ...
> 0e 0049`22dfc660 7ff6`502228fd 
> mesos_agent!boost::uuids::detail::seed_rng::seed_rng+0x3d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 80]
> 0f 0049`22dfc690 7ff6`502591e3 
> mesos_agent!boost::uuids::detail::seed int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >+0x4d 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\seed_rng.hpp
>  @ 246]
> 10 0049`22dfc790 7ff6`50395518 
> mesos_agent!boost::uuids::basic_random_generator int,32,624,397,31,2567483615,11,4294967295,7,2636928640,15,4022730752,18,1812433253>
>  >::basic_random_generator
>  >+0xd3 
> [d:\repositories\mesoswin\build\3rdparty\boost-1.53.0\src\boost-1.53.0\boost\uuid\random_generator.hpp
>  @ 50]
> 11 0049`22dfc800 7ff6`500ad140 mesos_agent!id::UUID::random+0x78 
> [d:\repositories\mesoswin\3rdparty\stout\include\stout\uuid.hpp @ 49]
> 12 0049`22dfc870 7ff6`5007ff55 
> mesos_agent!mesos::internal::slave::Framework::launchExecutor+0x70 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 6301]
> 13 0049`22dfd520 7ff6`502a0a35 
> mesos_agent!mesos::internal::slave::Slave::_run+0x2455 
> [d:\repositories\mesoswin\src\slave\slave.cpp @ 1990]
> ...
> 0:005> du @rcx
> 01d7`cc55fb60  "/dev/urandom"





[jira] [Commented] (MESOS-6957) timestamp based Task reconcillation

2017-01-25 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838778#comment-15838778
 ] 

Kevin Klues commented on MESOS-6957:


This seems more like a question for the mailing list than something that 
needs a ticket yet.

> timestamp based Task reconcillation
> ---
>
> Key: MESOS-6957
> URL: https://issues.apache.org/jira/browse/MESOS-6957
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Shi Lu
>
> If the Mesos master supported timestamp-based task reconciliation (e.g. the 
> client sends a reconcile request with a list of taskIDs and a time T, and the 
> master streams back the task changes that occurred after T), this could 
> reduce the overhead of task reconciliation a lot.
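
For illustration only, the request described above could be sketched as follows. This is not an existing Mesos API; `TaskChange` and `changes_since` are hypothetical names, and the sketch assumes the master keeps a log of timestamped task changes.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class TaskChange:
    # Hypothetical record of one task state change kept by the master.
    task_id: str
    state: str
    timestamp: float  # seconds since the epoch


def changes_since(changes: Iterable[TaskChange],
                  task_ids: List[str],
                  since: float) -> Iterator[TaskChange]:
    """Yield changes for the requested tasks that happened after `since`."""
    wanted = set(task_ids)
    for change in changes:
        # Only stream back changes newer than the client's time T.
        if change.task_id in wanted and change.timestamp > since:
            yield change
```

A client that last reconciled at time T would pass T as `since` and receive only the newer changes, rather than the full state of every task.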





[jira] [Commented] (MESOS-6956) Out of band Task reconcillation

2017-01-25 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838773#comment-15838773
 ] 

Kevin Klues commented on MESOS-6956:


[~vinodkone] You are probably in the best position to answer this question.

> Out of band Task reconcillation 
> 
>
> Key: MESOS-6956
> URL: https://issues.apache.org/jira/browse/MESOS-6956
> Project: Mesos
>  Issue Type: Task
>Reporter: Shi Lu
>
> Can we add a capability to the Mesos master for out-of-band task 
> reconciliation? For example, the client could send a request to the master 
> with a list of taskIDs that it wants to reconcile, and the master would 
> return the state of those tasks in the response instead of sending it back 
> via the subscribed connection.





[jira] [Created] (MESOS-6995) Update the webui to reflect hierarchical roles.

2017-01-25 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6995:
--

 Summary: Update the webui to reflect hierarchical roles.
 Key: MESOS-6995
 URL: https://issues.apache.org/jira/browse/MESOS-6995
 Project: Mesos
  Issue Type: Task
  Components: webui
Reporter: Benjamin Mahler


It may not need any changes, but we should confirm that the new role format for 
hierarchical roles is correctly displayed in the webui.

In addition, we can add a roles tab that shows the summary information (shares, 
weights, quotas). For now, we don't need to make any of this clickable (e.g. to 
see the tasks / frameworks under the role).





[jira] [Created] (MESOS-6994) Add HIERARCHICAL_ROLES framework capability.

2017-01-25 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6994:
--

 Summary: Add HIERARCHICAL_ROLES framework capability.
 Key: MESOS-6994
 URL: https://issues.apache.org/jira/browse/MESOS-6994
 Project: Mesos
  Issue Type: Task
  Components: framework api
Reporter: Benjamin Mahler


With hierarchical roles, frameworks are expected to use a new format for roles 
where there is a leading slash and there may be intermediate slashes:

/eng
/eng/frontend
/eng/frontend/server

In order to enforce the role format that matches the framework's intention, we 
will add a HIERARCHICAL_ROLES capability that makes the framework's intention 
explicit. This allows us to validate the roles differently, rather than 
implicitly deciding if the framework has the capability.
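
As a rough sketch of the idea (in Python for brevity, not the actual master code), validation could branch on the declared capability. `validate_role` is a hypothetical helper, and the checks are limited to the slash rules described in this ticket.

```python
HIERARCHICAL_ROLES = "HIERARCHICAL_ROLES"  # capability name from this ticket


def validate_role(role: str, capabilities: set) -> bool:
    """Hypothetical validation that depends on the framework's capability."""
    if HIERARCHICAL_ROLES in capabilities:
        # New format: the role must begin with a slash, e.g. "/eng/frontend".
        return role.startswith("/")
    # Old format: no slashes at all, e.g. "eng" or "*".
    return "/" not in role
```

With the capability made explicit, a role such as `eng` is rejected for a hierarchical framework instead of being silently treated as an old-style role.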





[jira] [Created] (MESOS-6993) Translate hierarchical roles back to the old format for non-hierarchical role schedulers.

2017-01-25 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6993:
--

 Summary: Translate hierarchical roles back to the old format for 
non-hierarchical role schedulers.
 Key: MESOS-6993
 URL: https://issues.apache.org/jira/browse/MESOS-6993
 Project: Mesos
  Issue Type: Task
  Components: master
Reporter: Benjamin Mahler


When sending roles back to frameworks (within Resource.role or 
Resource.allocation_info.role), we need to translate the role back to the format 
the framework used when subscribing to the role. That is, if the framework did 
not use a leading slash or used “*”, we will omit the leading slash and convert 
the “/” role to “*”.

This is required to ensure backwards compatibility with schedulers that 
continue to use the non-hierarchical role format.
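
A minimal sketch of the translation, assuming roles are stored internally in the hierarchical format. It is written in Python for brevity; `to_framework_format` and its boolean parameter are hypothetical and do not name anything in the Mesos code base.

```python
def to_framework_format(role: str, framework_uses_hierarchical: bool) -> str:
    """Hypothetical translation of an internal role for one framework.

    `role` is assumed to be in the hierarchical format (leading slash).
    `framework_uses_hierarchical` records which format the framework used
    when it subscribed.
    """
    if framework_uses_hierarchical:
        return role  # Nothing to translate.
    if role == "/":
        return "*"  # The "/" role maps back to "*".
    # Omit the leading slash for old-style frameworks.
    return role[1:] if role.startswith("/") else role
```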





[jira] [Created] (MESOS-6992) Remove validation against "/" characters in roles to support hierarchical roles.

2017-01-25 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-6992:
--

 Summary: Remove validation against "/" characters in roles to 
support hierarchical roles.
 Key: MESOS-6992
 URL: https://issues.apache.org/jira/browse/MESOS-6992
 Project: Mesos
  Issue Type: Task
  Components: allocation, master
Reporter: Benjamin Mahler


With the introduction of hierarchical roles, we need to allow "/" characters in 
roles, as these are now used to provide the hierarchical placement of the role. 
We will allow roles containing a slash only when the role begins with a slash.





[jira] [Commented] (MESOS-6986) abort in DRFSorter::add

2017-01-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838710#comment-15838710
 ] 

Vinod Kone commented on MESOS-6986:
---

Can you paste more master log lines around the crash? It's hard to debug 
without the surrounding context.

> abort in DRFSorter::add
> ---
>
> Key: MESOS-6986
> URL: https://issues.apache.org/jira/browse/MESOS-6986
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Affects Versions: 1.0.1
> Environment: Mesosphere Enterprise DC/OS, CoreOS
>Reporter: Yvan Royon
>  Labels: mesosphere
>
> My mesos-master process terminated on SIGABRT.
> The CHECK failed in function {{DRFSorter::add}}:
> https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L74
> It seems there is a condition during framework registration where names are 
> lost?
> We are using the mesos-go library ({{next}} branch), which uses the new HTTP 
> API. The framework is custom Go code. The crash is hard to reliably reproduce.
> {code}
> mesos-master[90061]: F0119 01:07:57.426159 90086 sorter.cpp:73] Check failed: 
> !contains(name)
> mesos-master[90061]: *** Check failure stack trace: ***
> mesos-master[90061]: @ 0x7f960d9299fd  google::LogMessage::Fail()
> mesos-master[90061]: @ 0x7f960d92b82d  google::LogMessage::SendToLog()
> mesos-master[90061]: @ 0x7f960d9295ec  google::LogMessage::Flush()
> mesos-master[90061]: @ 0x7f960d92c129  
> google::LogMessageFatal::~LogMessageFatal()
> mesos-master[90061]: @ 0x7f960d03460d  
> mesos::internal::master::allocator::DRFSorter::add()
> mesos-master[90061]: @ 0x7f960d021177  
> mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::addFramework()
> mesos-master[90061]: @ 0x7f960d8b9381  process::ProcessManager::resume()
> mesos-master[90061]: @ 0x7f960d8b9687  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> mesos-master[90061]: @ 0x7f960bf52d73  (unknown)
> mesos-master[90061]: @ 0x7f960b74f52c  (unknown)
> mesos-master[90061]: @ 0x7f960b49180d  (unknown)
> systemd[1]: dcos-mesos-master.service: Main process exited, code=killed, 
> status=6/ABRT
> {code}





[jira] [Commented] (MESOS-6953) A compromised mesos-master node can execute code as root on agents.

2017-01-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838700#comment-15838700
 ] 

Vinod Kone commented on MESOS-6953:
---

Is this a short-term solution? Why are you only protecting against the "run 
task" message from master? What about other messages like "kill task", 
"shutdown framework", etc. that can come from a compromised master?

The right long term solution sounds like mutual authentication/authorization 
between master and agent.

> A compromised mesos-master node can execute code as root on agents.
> ---
>
> Key: MESOS-6953
> URL: https://issues.apache.org/jira/browse/MESOS-6953
> Project: Mesos
>  Issue Type: Bug
>  Components: security
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: security, slave
>
> mesos-master has a `--[no-]root_submissions` flag that controls whether 
> frameworks with `root` user are admitted to the cluster.
> However, if a mesos-master node is compromised, it can attempt to schedule 
> tasks on agents as the `root` user. Since mesos-agent has no check against 
> tasks running on the agent for specific users, tasks can get run with `root` 
> privileges within the container on the agent.





[jira] [Updated] (MESOS-6296) Default executor should be able to launch multiple task groups

2017-01-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6296:
--
  Sprint: Mesosphere Sprint 50
Story Points: 5

> Default executor should be able to launch multiple task groups
> --
>
> Key: MESOS-6296
> URL: https://issues.apache.org/jira/browse/MESOS-6296
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>
> This gives more flexibility for schedulers that do not know all the tasks 
> that they want to launch up front, for example a backup task that needs to be 
> launched regularly next to a main task in the same executor.





[jira] [Assigned] (MESOS-6296) Default executor should be able to launch multiple task groups

2017-01-25 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar reassigned MESOS-6296:
-

Assignee: Anand Mazumdar

> Default executor should be able to launch multiple task groups
> --
>
> Key: MESOS-6296
> URL: https://issues.apache.org/jira/browse/MESOS-6296
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Anand Mazumdar
>
> This gives more flexibility for schedulers that do not know all the tasks 
> that they want to launch up front, for example a backup task that needs to be 
> launched regularly next to a main task in the same executor.





[jira] [Commented] (MESOS-6648) MesosContainerizer launch helper should take ContainerLaunchInfo.

2017-01-25 Thread Stephan Erb (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838594#comment-15838594
 ] 

Stephan Erb commented on MESOS-6648:


I fear the sudden interface change makes upgrading difficult. Would it be 
possible to provide a backwards compatible shim so that users can migrate to 
the new command line API seamlessly?

Context:
* https://issues.apache.org/jira/browse/AURORA-1882
* https://reviews.apache.org/r/55951/

> MesosContainerizer launch helper should take ContainerLaunchInfo.
> -
>
> Key: MESOS-6648
> URL: https://issues.apache.org/jira/browse/MESOS-6648
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: Jie Yu
>  Labels: mesosphere
> Fix For: 1.2.0
>
>
> Currently, the launch helper takes various flags from MesosContainerizer to 
> launch the container. This makes it very hard to add more parameters to the 
> launch helper. To simplify that, MesosContainerizer can pass 
> 'ContainerLaunchInfo' to the launch helper instead. 'ContainerLaunchInfo' is 
> also the protobuf message returned by isolators during 'prepare()'. This 
> makes it very easy to merge them and send it to the launch helper. More 
> importantly, this makes it very easy to add more parameters to the launch 
> helper in the future.





[jira] [Commented] (MESOS-6991) Change `Environment.Variable.Value` from required to optional

2017-01-25 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838491#comment-15838491
 ] 

Greg Mann commented on MESOS-6991:
--

Reviews here:
https://reviews.apache.org/r/55954/
https://reviews.apache.org/r/55955/

> Change `Environment.Variable.Value` from required to optional
> -
>
> Key: MESOS-6991
> URL: https://issues.apache.org/jira/browse/MESOS-6991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>
> To prepare for future work which will enable the modular fetching of secrets, 
> we should change the {{Environment.Variable.Value}} field from {{required}} 
> to {{optional}}. This way, the field can be left empty and filled in by a 
> secret fetching module.





[jira] [Assigned] (MESOS-6991) Change `Environment.Variable.Value` from required to optional

2017-01-25 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-6991:


Assignee: Greg Mann

> Change `Environment.Variable.Value` from required to optional
> -
>
> Key: MESOS-6991
> URL: https://issues.apache.org/jira/browse/MESOS-6991
> Project: Mesos
>  Issue Type: Bug
>Reporter: Greg Mann
>Assignee: Greg Mann
>
> To prepare for future work which will enable the modular fetching of secrets, 
> we should change the {{Environment.Variable.Value}} field from {{required}} 
> to {{optional}}. This way, the field can be left empty and filled in by a 
> secret fetching module.





[jira] [Created] (MESOS-6991) Change `Environment.Variable.Value` from required to optional

2017-01-25 Thread Greg Mann (JIRA)
Greg Mann created MESOS-6991:


 Summary: Change `Environment.Variable.Value` from required to 
optional
 Key: MESOS-6991
 URL: https://issues.apache.org/jira/browse/MESOS-6991
 Project: Mesos
  Issue Type: Bug
Reporter: Greg Mann


To prepare for future work which will enable the modular fetching of secrets, 
we should change the {{Environment.Variable.Value}} field from {{required}} to 
{{optional}}. This way, the field can be left empty and filled in by a secret 
fetching module.
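
To illustrate the intent (a sketch only, in Python, not the Mesos protobuf or module API): once the value is optional, a variable can arrive without a value and be filled in later. `resolve_environment` and `fetch_secret` are hypothetical names; `fetch_secret` stands in for whatever a secret fetching module would provide.

```python
from typing import Callable, Dict, Optional


def resolve_environment(variables: Dict[str, Optional[str]],
                        fetch_secret: Callable[[str], str]) -> Dict[str, str]:
    """Fill in variables whose value was left empty (None).

    `fetch_secret` is a hypothetical stand-in for a secret fetching module.
    """
    resolved = {}
    for name, value in variables.items():
        # Keep explicit values; ask the module only for the missing ones.
        resolved[name] = fetch_secret(name) if value is None else value
    return resolved
```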





[jira] [Created] (MESOS-6990) `PartitionTest.TaskCompletedOnPartitionedAgent` is flaky

2017-01-25 Thread Michael Park (JIRA)
Michael Park created MESOS-6990:
---

 Summary: `PartitionTest.TaskCompletedOnPartitionedAgent` is flaky
 Key: MESOS-6990
 URL: https://issues.apache.org/jira/browse/MESOS-6990
 Project: Mesos
  Issue Type: Bug
  Components: tests
Reporter: Michael Park


Observed in the ASF Jenkins CI:

{noformat}
/mesos/src/tests/partition_tests.cpp:2055: Failure
Actual function call count doesn't match EXPECT_CALL(sched, 
statusUpdate(, _))...
 Expected: to be called once
   Actual: never called - unsatisfied and active
{noformat}

Full log for the test:

{noformat}
[ RUN  ] PartitionTest.TaskCompletedOnPartitionedAgent
I0125 15:16:42.170163 25314 cluster.cpp:160] Creating default 'local' authorizer
I0125 15:16:42.171134 25325 master.cpp:383] Master 
6361cb74-ebfe-43e5-9927-652201a9677a (9cdefe4ff6bc) started on 172.17.0.3:57726
I0125 15:16:42.171160 25325 master.cpp:385] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/GAnqYR/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/GAnqYR/master" 
--zk_session_timeout="10secs"
I0125 15:16:42.171417 25325 master.cpp:435] Master only allowing authenticated 
frameworks to register
I0125 15:16:42.171427 25325 master.cpp:449] Master only allowing authenticated 
agents to register
I0125 15:16:42.171433 25325 master.cpp:462] Master only allowing authenticated 
HTTP frameworks to register
I0125 15:16:42.171439 25325 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/GAnqYR/credentials'
I0125 15:16:42.171571 25325 master.cpp:507] Using default 'crammd5' 
authenticator
I0125 15:16:42.171614 25325 http.cpp:922] Using default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0125 15:16:42.171658 25325 http.cpp:922] Using default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0125 15:16:42.171684 25325 http.cpp:922] Using default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0125 15:16:42.171710 25325 master.cpp:587] Authorization enabled
I0125 15:16:42.172552 25325 hierarchical.cpp:151] Initialized hierarchical 
allocator process
I0125 15:16:42.172575 25325 whitelist_watcher.cpp:77] No whitelist given
I0125 15:16:42.173259 25325 master.cpp:2121] Elected as the leading master!
I0125 15:16:42.173274 25325 master.cpp:1643] Recovering from registrar
I0125 15:16:42.173328 25325 registrar.cpp:329] Recovering registrar
I0125 15:16:42.173552 25325 registrar.cpp:362] Successfully fetched the 
registry (0B) in 0ns
I0125 15:16:42.173588 25325 registrar.cpp:461] Applied 1 operations in 8907ns; 
attempting to update the registry
I0125 15:16:42.173854 25325 registrar.cpp:506] Successfully updated the 
registry in 0ns
I0125 15:16:42.173898 25325 registrar.cpp:392] Successfully recovered registrar
I0125 15:16:42.174008 25325 master.cpp:1759] Recovered 0 agents from the 
registry (129B); allowing 10mins for agents to re-register
I0125 15:16:42.174048 25325 hierarchical.cpp:178] Skipping recovery of 
hierarchical allocator: nothing to recover
I0125 15:16:42.175926 25314 cluster.cpp:446] Creating default 'local' authorizer
I0125 15:16:42.176554 25321 slave.cpp:209] Mesos agent started on 
(93)@172.17.0.3:57726
I0125 15:16:42.176578 25321 slave.cpp:210] Flags at startup: --acls="" 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="mesos" 

[jira] [Commented] (MESOS-6052) Unable to launch containers on CNI networks on CoreOS

2017-01-25 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838384#comment-15838384
 ] 

Vinod Kone commented on MESOS-6052:
---

Backported to 1.0.3.

commit 23da4b9418a61e65b8ab5357c57ad9a3bd4d7fe9
Author: Avinash sridharan 
Date:   Wed Sep 7 13:30:53 2016 -0700

Modified network file setup in `network/cni` isolator.

In case /etc/hosts and /etc/hostname files are not present in the host
filesystem, we were ignoring these files and assuming that they would
not be required by the executor when it is launched in a new network
namespace. This assumption is incorrect, since the executor needs
/etc/hosts in the new network namespace to resolve its hostname.
Hence, we are explicitly creating these files in the host file system
in case they are not present, so that containers /etc/hosts and
/etc/hostname can be mounted on these mount points. This solves the
problem in distributions such as CoreOS that don't have /etc/hosts in
their host filesystem.

Review: https://reviews.apache.org/r/51643/


> Unable to launch containers on CNI networks on CoreOS
> -
>
> Key: MESOS-6052
> URL: https://issues.apache.org/jira/browse/MESOS-6052
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 1.0.0
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> CoreOS does not have an `/etc/hosts`. Currently, in the `network/cni` 
> isolator, if we don't see an `/etc/hosts` on the host filesystem we don't bind 
> mount the container's `hosts` file to this target for the `command executor`. 
> On distros such as CoreOS this fails the container launch, since the 
> `libprocess` initialization of the `command executor` fails because it can't 
> resolve its `hostname`.
> We should be creating the `/etc/hosts` and `/etc/hostname` files when they 
> are absent on the host filesystem, since creating these files should not 
> affect name resolution on the host network namespace, and it will allow the 
> `/etc/hosts` file to be bind mounted correctly and allow name resolution in 
> the container's network namespace as well.
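
The approach described above (create the mount points only when they are missing) can be sketched like this. It is a Python illustration, not the isolator code; `ensure_files` is a hypothetical helper, and the `root` parameter exists only so the sketch can be tried against a scratch directory instead of the real `/etc`.

```python
import os


def ensure_files(root: str, names=("hosts", "hostname")) -> list:
    """Create empty files under `root`/etc if absent; return the created paths."""
    etc = os.path.join(root, "etc")
    os.makedirs(etc, exist_ok=True)
    created = []
    for name in names:
        path = os.path.join(etc, name)
        if not os.path.exists(path):
            # Append mode creates the file without touching existing content.
            with open(path, "a"):
                pass
            created.append(path)
    return created
```

Existing files are left alone, so a host that already has `/etc/hosts` sees no change.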





[jira] [Created] (MESOS-6989) Docker executor segfaults in ~MesosExecutorDriver()

2017-01-25 Thread Jan-Philip Gehrcke (JIRA)
Jan-Philip Gehrcke created MESOS-6989:
-

 Summary: Docker executor segfaults in ~MesosExecutorDriver()
 Key: MESOS-6989
 URL: https://issues.apache.org/jira/browse/MESOS-6989
 Project: Mesos
  Issue Type: Bug
  Components: docker
Reporter: Jan-Philip Gehrcke


With the current Mesos master state (commit 
42e515bc5c175a318e914d34473016feda4db6ff), the Docker executor segfaults during 
shutdown. 

Steps to reproduce:

1) Start master:
{code}
$ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/jp/mesos
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0125 13:41:15.963775 14744 main.cpp:278] Build: 2017-01-25 13:37:42 by jp
I0125 13:41:15.963868 14744 main.cpp:279] Version: 1.2.0
I0125 13:41:15.963877 14744 main.cpp:286] Git SHA: 
42e515bc5c175a318e914d34473016feda4db6ff
{code}
(note that building it at 13:37 is not part of the repro)

2) Start agent:
{code}
$ ./bin/mesos-slave.sh --containerizers=mesos,docker --master=127.0.0.1:5050 
--work_dir=/tmp/jp/mesos
{code}

3) Run {{mesos-execute}} with the Docker containerizer:
{code}
$ ./src/mesos-execute --master=127.0.0.1:5050 --name=testcommand 
--containerizer=docker --docker_image=debian --command=env
I0125 13:43:59.704973 14951 scheduler.cpp:184] Version: 1.2.0
I0125 13:43:59.706425 14952 scheduler.cpp:470] New master detected at 
master@127.0.0.1:5050
Subscribed with ID 57596743-06f4-45f1-a975-348cf70589b1-
Submitted task 'testcommand' to agent '57596743-06f4-45f1-a975-348cf70589b1-S0'
Received status update TASK_RUNNING for task 'testcommand'
  source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'testcommand'
  message: 'Container exited with status 0'
  source: SOURCE_EXECUTOR
{code}

Relevant agent output that shows the executor segfault:
{code}
[...]
I0125 13:44:16.249191 14823 slave.cpp:4328] Got exited event for 
executor(1)@192.99.40.208:33529
I0125 13:44:16.347095 14830 docker.cpp:2358] Executor for container 
396282a9-7bf0-48ee-ba07-3ff2ca801d53 has exited
I0125 13:44:16.347127 14830 docker.cpp:2052] Destroying container 
396282a9-7bf0-48ee-ba07-3ff2ca801d53
I0125 13:44:16.347439 14830 docker.cpp:2179] Running docker stop on container 
396282a9-7bf0-48ee-ba07-3ff2ca801d53
I0125 13:44:16.349215 14826 slave.cpp:4691] Executor 'testcommand' of framework 
57596743-06f4-45f1-a975-348cf70589b1- terminated with signal Segmentation 
fault (core dumped)
[...]
{code}

The complete task stderr:
{code}
$ cat 
/tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-/executors/testcommand/runs/latest/stderr
 
I0125 13:44:12.850073 15030 exec.cpp:162] Version: 1.2.0
I0125 13:44:12.864229 15050 exec.cpp:237] Executor registered on agent 
57596743-06f4-45f1-a975-348cf70589b1-S0
I0125 13:44:12.865842 15054 docker.cpp:850] Running docker -H 
unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 --env-file 
/tmp/xFZ8G9 -v 
/tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-/executors/testcommand/runs/396282a9-7bf0-48ee-ba07-3ff2ca801d53:/mnt/mesos/sandbox
 --net host --entrypoint /bin/sh --name 
mesos-57596743-06f4-45f1-a975-348cf70589b1-S0.396282a9-7bf0-48ee-ba07-3ff2ca801d53
 debian -c env
I0125 13:44:15.248721 15064 exec.cpp:410] Executor asked to shutdown
*** Aborted at 1485369856 (unix time) try "date -d @1485369856" if you are 
using GNU date ***
PC: @ 0x7fb38f153dd0 (unknown)
*** SIGSEGV (@0x68) received by PID 15030 (TID 0x7fb3961a88c0) from PID 104; 
stack trace: ***
@ 0x7fb38f15b5c0 (unknown)
@ 0x7fb38f153dd0 (unknown)
@ 0x7fb39332c607 __gthread_mutex_lock()
@ 0x7fb39332c657 __gthread_recursive_mutex_lock()
@ 0x7fb39332edca std::recursive_mutex::lock()
@ 0x7fb393337bd8 
_ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENKUlPS0_E_clES5_
@ 0x7fb393337bf8 
_ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENUlPS0_E_4_FUNES5_
@ 0x7fb39333ba6b Synchronized<>::Synchronized()
@ 0x7fb393337cac synchronize<>()
@ 0x7fb39492f15c process::ProcessManager::wait()
@ 0x7fb3949353f0 process::wait()
@ 0x55fd63f31fe5 process::wait()
@ 0x7fb39332ce3c mesos::MesosExecutorDriver::~MesosExecutorDriver()
@ 0x55fd63f2bd86 main
@ 0x7fb38e4fc401 __libc_start_main
@ 0x55fd63f2ab5a _start
{code}









[jira] [Commented] (MESOS-6280) Task group executor should support command health checks.

2017-01-25 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838282#comment-15838282
 ] 

haosdent commented on MESOS-6280:
-

{code}
commit 636866c540c696ec99785fd27328f93201e88d3d
Author: Gastón Kleiman 
Date:   Thu Jan 26 02:11:13 2017 +0800

Renamed `taskID` to `taskId` in `HealthChecker`.

We normally use `taskId` for variables that hold a `TaskID`, so I
renamed the attribute/parameters to be consistent with the rest of the
codebase.

Review: https://reviews.apache.org/r/55899/
{code}

> Task group executor should support command health checks.
> -
>
> Key: MESOS-6280
> URL: https://issues.apache.org/jira/browse/MESOS-6280
> Project: Mesos
>  Issue Type: Improvement
>  Components: executor
>Affects Versions: 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>Priority: Critical
>  Labels: health-check, mesosphere
>
> Currently, the default (aka pod) executor supports only HTTP and TCP health 
> checks. We should also support command health checks as well.





[jira] [Commented] (MESOS-6396) Hooks should allow sandbox dependent environment variables.

2017-01-25 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838078#comment-15838078
 ] 

Adam B commented on MESOS-6396:
---

Also:
{code}
commit 30377be38927022b1dc641a6be936cf4ac46
Author: Till Toenshoff 
Date:   Tue Nov 29 22:05:46 2016 +0100

Removed superseded `slavePreLaunchDockerHook` hook.

Review: https://reviews.apache.org/r/54174/

commit 63cc0fb49a0370d596e2c9cf87c6c5025372dd15
Author: Till Toenshoff 
Date:   Tue Nov 29 22:05:39 2016 +0100

Fixed conflict in hook result handling.

Review: https://reviews.apache.org/r/54165/

commit 1fe9876ca069162e46cc2436384e8d5da2b9d551
Author: Till Toenshoff 
Date:   Tue Nov 29 22:05:30 2016 +0100

Removed superseded `slavePreLaunchDockerEnvironmentDecorator` hook.

Review: https://reviews.apache.org/r/54129/

commit f765828257cf0e0c43e6f110b204524da7dcf728
Author: Till Toenshoff 
Date:   Tue Nov 29 22:05:15 2016 +0100

Added test for `slavePreLaunchDockerTaskExecutorDecorator` hook.

Review: https://reviews.apache.org/r/54128/
{code}

> Hooks should allow sandbox dependent environment variables.
> ---
>
> Key: MESOS-6396
> URL: https://issues.apache.org/jira/browse/MESOS-6396
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>  Labels: containerizer, docker, hooks, module
> Fix For: 1.2.0
>
>
> The {{slaveExecutorEnvironmentDecorator}} hook is the only one that allows 
> mutating the executor environment of a Docker container. That callback has no 
> means of getting the location of the sandbox. That in turn means that it is 
> not possible for a hook to create files and respective environment variables 
> listing paths within the sandbox for the executor to access.





[jira] [Assigned] (MESOS-6988) CLONE - WebUI redirect doesn't work with stats from /metric/snapshot

2017-01-25 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-6988:
---

Assignee: haosdent

> CLONE - WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: haosdent
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)





[jira] [Commented] (MESOS-6988) WebUI redirect doesn't work with stats from /metrics/snapshot

2017-01-25 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837848#comment-15837848
 ] 

haosdent commented on MESOS-6988:
-

Hi [~xujyan], I just checked the code, and it looks like we always read this 
endpoint from the leading master:
{code}
mesos/src/webui/master/static/js/controllers.js:

  var pollMetrics = function() {
    $http.jsonp(leadingMasterURL('/metrics/snapshot?jsonp=JSON_CALLBACK'))
      .success(function(response) {
        if (updateMetrics($scope, $timeout, response)) {
{code}

Did you encounter any problem with this?

> WebUI redirect doesn't work with stats from /metrics/snapshot
> -
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: haosdent
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)





[jira] [Updated] (MESOS-6988) WebUI redirect doesn't work with stats from /metrics/snapshot

2017-01-25 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6988:

Summary: WebUI redirect doesn't work with stats from /metrics/snapshot  
(was: CLONE - WebUI redirect doesn't work with stats from /metric/snapshot)

> WebUI redirect doesn't work with stats from /metrics/snapshot
> -
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: haosdent
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)





[jira] [Updated] (MESOS-6988) CLONE - WebUI redirect doesn't work with stats from /metric/snapshot

2017-01-25 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6988:
--
Description: 
The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially for 
non-leading masters)


  was:The issue described in MESOS-6446 is still not fixed in 1.1.0.


> CLONE - WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)





[jira] [Updated] (MESOS-6988) CLONE - WebUI redirect doesn't work with stats from /metric/snapshot

2017-01-25 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6988:
--
Target Version/s: 1.2.0

> CLONE - WebUI redirect doesn't work with stats from /metric/snapshot
> 
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)





[jira] [Updated] (MESOS-6631) Disallow frameworks from modifying FrameworkInfo.roles.

2017-01-25 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-6631:

Shepherd: Michael Park  (was: Benjamin Mahler)

> Disallow frameworks from modifying FrameworkInfo.roles.
> ---
>
> Key: MESOS-6631
> URL: https://issues.apache.org/jira/browse/MESOS-6631
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>
> In "phase 1" of the multi-role framework support, we want to preserve the 
> existing behavior of single-role framework support in that we disallow 
> frameworks from modifying their role.
> With multi-role framework support, we will initially disallow frameworks from 
> modifying the roles field. Note that in the case that the master has failed 
> over but the framework hasn't re-registered yet, we will use the framework 
> info from the agents to disallow changes to the roles field. We will treat 
> {{FrameworkInfo.roles}} as a set rather than a list, so ordering does not 
> matter for equality.
> One difference between {{role}} and {{roles}} is that for {{role}} 
> modification, we ignore it. But, with {{roles}} modification, since this is a 
> new feature, we can disallow it by rejecting the framework subscription.
> Later, in phase 2, we will allow frameworks to modify their roles, see 
> MESOS-6627.
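
The phase 1 rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the master's implementation (which lives in the C++ master code): the function names {{sameRoleSet}} and {{checkResubscription}}, and the shape of the returned object, are hypothetical.

```javascript
// Hypothetical sketch of the phase 1 check; names and return shape are
// assumptions for illustration, not Mesos APIs.

// Compare two roles lists as sets, so ordering (and duplicates) do not
// affect equality.
function sameRoleSet(knownRoles, incomingRoles) {
  var a = new Set(knownRoles);
  var b = new Set(incomingRoles);
  if (a.size !== b.size) {
    return false;
  }
  for (var role of a) {
    if (!b.has(role)) {
      return false;
    }
  }
  return true;
}

// known:    the FrameworkInfo already on record (from the master, or
//           recovered from the agents after a master failover).
// incoming: the FrameworkInfo sent with the (re-)subscription.
function checkResubscription(known, incoming) {
  // A change to `roles` is rejected outright.
  if (!sameRoleSet(known.roles || [], incoming.roles || [])) {
    return {
      accepted: false,
      reason: 'FrameworkInfo.roles may not be modified'
    };
  }

  // A change to the single `role` field is ignored rather than rejected:
  // the subscription is accepted and the known role stays in effect.
  return {
    accepted: true,
    effectiveRole: known.role,
    roleChangeIgnored: known.role !== incoming.role
  };
}
```

Under these assumptions, re-subscribing with {{roles}} of {{["b", "a"]}} against a recorded {{["a", "b"]}} is accepted, while adding or removing a role is rejected.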


