[jira] [Commented] (MESOS-2092) Make ACLs dynamic

2016-11-09 Thread Vijay Srinivasaraghavan (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653244#comment-15653244
 ] 

Vijay Srinivasaraghavan commented on MESOS-2092:


Is anyone working on this feature?

> Make ACLs dynamic
> -
>
> Key: MESOS-2092
> URL: https://issues.apache.org/jira/browse/MESOS-2092
> Project: Mesos
>  Issue Type: Task
>  Components: security
>Reporter: Alexander Rukletsov
>Assignee: Yongqiao Wang
>  Labels: mesosphere, newbie
>
> Master loads ACLs once during its launch and there is no way to update them 
> in a running master. Making them dynamic will allow for updating ACLs on the 
> fly, for example granting a new framework necessary rights.
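For context, ACLs are currently supplied to the master as JSON via the {{--acls}} flag at startup. A dynamic mechanism would allow adding entries like the following without a master restart (a sketch only; the principal, role, and user names are hypothetical):

{noformat}
{
  "register_frameworks": [
    {
      "principals": { "values": ["new-framework-principal"] },
      "roles": { "values": ["analytics"] }
    }
  ],
  "run_tasks": [
    {
      "principals": { "values": ["new-framework-principal"] },
      "users": { "values": ["nobody"] }
    }
  ]
}
{noformat}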



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6479) add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and add string flag for framework-name

2016-11-09 Thread Hubert Asamer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653183#comment-15653183
 ] 

Hubert Asamer commented on MESOS-6479:
--

Thanks for your responses & clarifications. You may close the issue...

> add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and 
> add string flag for framework-name
> 
>
> Key: MESOS-6479
> URL: https://issues.apache.org/jira/browse/MESOS-6479
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.1.0
> Environment: all
>Reporter: Hubert Asamer
>Assignee: Hubert Asamer
>Priority: Trivial
>  Labels: testing
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Extend execute.cpp to use TaskGroupInfo as a container for batch jobs to 
> distribute tasks based on available offers. A simple boolean CLI flag shall 
> enable/disable this behavior. If enabled, the contents of TaskGroupInfo are 
> executed not as tasks within a "pod" (on a single host) but as distributed 
> jobs (on multiple hosts).
> In addition, an optional CLI flag for setting the temporary framework name 
> (e.g. to better distinguish between running and finished frameworks) could be 
> useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6570) Keep container information around for a time.

2016-11-09 Thread Charles Allen (JIRA)
Charles Allen created MESOS-6570:


 Summary: Keep container information around for a time.
 Key: MESOS-6570
 URL: https://issues.apache.org/jira/browse/MESOS-6570
 Project: Mesos
  Issue Type: Wish
  Components: containerization, HTTP API, statistics
Reporter: Charles Allen


http://mesos.apache.org/documentation/latest/endpoints/slave/containers/ 
describes the stats that are available upon probing an agent. If tasks start 
and finish quickly, they might be missed between probes of the {{/containers}} 
endpoint. The endpoint documentation states that only running containers are 
described.

The ask is that recently terminated containers continue to report their stats 
for a configurable time, and that an extra field be added to the JSON indicating 
whether the container is currently running.
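For illustration, an entry for a recently finished container might look roughly like the sketch below. The {{running}} field is the proposed addition; the IDs and statistics values are hypothetical and the exact shape of the existing {{/containers}} entries may differ:

{noformat}
{
  "container_id": "a9000a0b-2b92-4a0f-b482-5c4a1b1e6d1c",
  "executor_id": "default",
  "framework_id": "1f8e621e-732c-4e0f-b7bb-a8b7b7c4c5b7-0000",
  "running": false,
  "statistics": {
    "cpus_user_time_secs": 0.12,
    "mem_rss_bytes": 12345678,
    "timestamp": 1478660000.0
  }
}
{noformat}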



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1763) Add support for frameworks to receive resources for multiple roles.

2016-11-09 Thread Charles Allen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652742#comment-15652742
 ] 

Charles Allen commented on MESOS-1763:
--

Is a specific application required to register only one framework? Why would 
you need to modify frameworks to use multiple roles instead of just ensuring 
that an application can have multiple parallel registrations with different 
framework IDs?

> Add support for frameworks to receive resources for multiple roles.
> ---
>
> Key: MESOS-1763
> URL: https://issues.apache.org/jira/browse/MESOS-1763
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, framework api, master
>Reporter: Vinod Kone
>Assignee: Benjamin Mahler
>  Labels: mesosphere, multi-tenancy
>
> Currently, a framework can only obtain resources for a single allocation 
> role. This design discusses allowing frameworks to obtain resources for 
> multiple allocation roles.
> Use cases:
> * Allow an instance of a framework to be “multi-tenant” (e.g. Marathon, 
> Aurora, etc). Currently, users run multiple instances of a framework under 
> different roles to support multiple tenants.
> * Allow a framework to further leverage the resource allocation primitives 
> within Mesos to ensure it has sufficient resource guarantees in place (e.g. a 
> framework may want to set different guarantees amongst the tasks it needs to 
> run, without necessarily being multi-tenant).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6569) MesosContainerizer/DefaultExecutorTest.KillTask/0 failing on ASF CI

2016-11-09 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652704#comment-15652704
 ] 

Anand Mazumdar commented on MESOS-6569:
---

Similar to the existing logic that allows the {{TASK_KILLED}} updates for the 
tasks to be received in any order, we need logic to ensure that the 
{{TASK_RUNNING}} updates for the tasks can be received in any order by the 
scheduler.
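To make the intended fix concrete, here is a minimal standalone C++ sketch (not the actual test code; the task IDs are hypothetical) of tracking expected updates by task ID so the check does not depend on arrival order:

{noformat}
// Standalone sketch: collect status updates keyed by task ID so the check
// is independent of the order in which the scheduler receives them.
#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>

int main()
{
  // Tasks the test launched and expects to see running.
  std::set<std::string> awaitingRunning = {"task-1", "task-2"};

  // Updates as they might arrive from the master, in either order.
  std::vector<std::pair<std::string, std::string>> updates = {
    {"task-2", "TASK_RUNNING"},
    {"task-1", "TASK_RUNNING"},
  };

  for (const auto& update : updates) {
    if (update.second == "TASK_RUNNING") {
      awaitingRunning.erase(update.first);
    }
  }

  std::cout << (awaitingRunning.empty()
                  ? "all tasks reached TASK_RUNNING"
                  : "still waiting for TASK_RUNNING updates")
            << std::endl;

  return 0;
}
{noformat}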

> MesosContainerizer/DefaultExecutorTest.KillTask/0 failing on ASF CI
> ---
>
> Key: MESOS-6569
> URL: https://issues.apache.org/jira/browse/MESOS-6569
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: 
> https://builds.apache.org/job/Mesos/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)&&(!ubuntu-eu2)/
>Reporter: Yan Xu
>  Labels: flaky, newbie
>
> {noformat:title=}
> [ RUN  ] MesosContainerizer/DefaultExecutorTest.KillTask/0
> I1110 01:20:11.482097 29700 cluster.cpp:158] Creating default 'local' 
> authorizer
> I1110 01:20:11.485241 29700 leveldb.cpp:174] Opened db in 2.774513ms
> I1110 01:20:11.486237 29700 leveldb.cpp:181] Compacted db in 953614ns
> I1110 01:20:11.486299 29700 leveldb.cpp:196] Created db iterator in 24739ns
> I1110 01:20:11.486325 29700 leveldb.cpp:202] Seeked to beginning of db in 
> 2300ns
> I1110 01:20:11.486344 29700 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 378ns
> I1110 01:20:11.486399 29700 replica.cpp:776] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1110 01:20:11.486933 29733 recover.cpp:451] Starting replica recovery
> I1110 01:20:11.487289 29733 recover.cpp:477] Replica is in EMPTY status
> I1110 01:20:11.488503 29721 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from __req_res__(7318)@172.17.0.3:52462
> I1110 01:20:11.488855 29727 recover.cpp:197] Received a recover response from 
> a replica in EMPTY status
> I1110 01:20:11.489398 29729 recover.cpp:568] Updating replica status to 
> STARTING
> I1110 01:20:11.490223 29723 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 575135ns
> I1110 01:20:11.490284 29732 master.cpp:380] Master 
> d28fbae1-c3dc-45fa-8384-32ab9395a975 (3a31be8bf679) started on 
> 172.17.0.3:52462
> I1110 01:20:11.490317 29732 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/k50x7x/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
> --registry_max_agent_count="102400" --registry_store_timeout="100secs" 
> --registry_strict="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/mesos/mesos-1.2.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/k50x7x/master" --zk_session_timeout="10secs"
> I1110 01:20:11.490696 29732 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1110 01:20:11.490712 29732 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1110 01:20:11.490720 29732 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1110 01:20:11.490730 29732 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/k50x7x/credentials'
> I1110 01:20:11.490281 29723 replica.cpp:320] Persisted replica status to 
> STARTING
> I1110 01:20:11.491210 29732 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1110 01:20:11.491225 29720 recover.cpp:477] Replica is in STARTING status
> I1110 01:20:11.491394 29732 http.cpp:895] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1110 01:20:11.491621 29732 http.cpp:895] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1110 01:20:11.491770 29732 http.cpp:895] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1110 01:20:11.491937 29732 master.cpp:584] Authorization 

[jira] [Commented] (MESOS-6569) MesosContainerizer/DefaultExecutorTest.KillTask/0 failing on ASF CI

2016-11-09 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652668#comment-15652668
 ] 

Yan Xu commented on MESOS-6569:
---

[~vinodkone] any insight?

> MesosContainerizer/DefaultExecutorTest.KillTask/0 failing on ASF CI
> ---
>
> Key: MESOS-6569
> URL: https://issues.apache.org/jira/browse/MESOS-6569
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
> Environment: 
> https://builds.apache.org/job/Mesos/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)&&(!ubuntu-eu2)/
>Reporter: Yan Xu
>
> {noformat:title=}
> [ RUN  ] MesosContainerizer/DefaultExecutorTest.KillTask/0
> I1110 01:20:11.482097 29700 cluster.cpp:158] Creating default 'local' 
> authorizer
> I1110 01:20:11.485241 29700 leveldb.cpp:174] Opened db in 2.774513ms
> I1110 01:20:11.486237 29700 leveldb.cpp:181] Compacted db in 953614ns
> I1110 01:20:11.486299 29700 leveldb.cpp:196] Created db iterator in 24739ns
> I1110 01:20:11.486325 29700 leveldb.cpp:202] Seeked to beginning of db in 
> 2300ns
> I1110 01:20:11.486344 29700 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 378ns
> I1110 01:20:11.486399 29700 replica.cpp:776] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1110 01:20:11.486933 29733 recover.cpp:451] Starting replica recovery
> I1110 01:20:11.487289 29733 recover.cpp:477] Replica is in EMPTY status
> I1110 01:20:11.488503 29721 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from __req_res__(7318)@172.17.0.3:52462
> I1110 01:20:11.488855 29727 recover.cpp:197] Received a recover response from 
> a replica in EMPTY status
> I1110 01:20:11.489398 29729 recover.cpp:568] Updating replica status to 
> STARTING
> I1110 01:20:11.490223 29723 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 575135ns
> I1110 01:20:11.490284 29732 master.cpp:380] Master 
> d28fbae1-c3dc-45fa-8384-32ab9395a975 (3a31be8bf679) started on 
> 172.17.0.3:52462
> I1110 01:20:11.490317 29732 master.cpp:382] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/k50x7x/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --http_authenticators="basic" --http_framework_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
> --registry_max_agent_count="102400" --registry_store_timeout="100secs" 
> --registry_strict="false" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/mesos/mesos-1.2.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/k50x7x/master" --zk_session_timeout="10secs"
> I1110 01:20:11.490696 29732 master.cpp:432] Master only allowing 
> authenticated frameworks to register
> I1110 01:20:11.490712 29732 master.cpp:446] Master only allowing 
> authenticated agents to register
> I1110 01:20:11.490720 29732 master.cpp:459] Master only allowing 
> authenticated HTTP frameworks to register
> I1110 01:20:11.490730 29732 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/k50x7x/credentials'
> I1110 01:20:11.490281 29723 replica.cpp:320] Persisted replica status to 
> STARTING
> I1110 01:20:11.491210 29732 master.cpp:504] Using default 'crammd5' 
> authenticator
> I1110 01:20:11.491225 29720 recover.cpp:477] Replica is in STARTING status
> I1110 01:20:11.491394 29732 http.cpp:895] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I1110 01:20:11.491621 29732 http.cpp:895] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I1110 01:20:11.491770 29732 http.cpp:895] Using default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I1110 01:20:11.491937 29732 master.cpp:584] Authorization enabled
> I1110 01:20:11.492276 29725 whitelist_watcher.cpp:77] No whitelist given
> I1110 01:20:11.492310 29723 hierarchical.cpp:149] Initialized hierarchical 
> allocator process
> I1110 01:20:11.492569 29721 replica.cpp:673] Replica in S

[jira] [Created] (MESOS-6569) MesosContainerizer/DefaultExecutorTest.KillTask/0 failing on ASF CI

2016-11-09 Thread Yan Xu (JIRA)
Yan Xu created MESOS-6569:
-

 Summary: MesosContainerizer/DefaultExecutorTest.KillTask/0 failing 
on ASF CI
 Key: MESOS-6569
 URL: https://issues.apache.org/jira/browse/MESOS-6569
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.1.0
 Environment: 
https://builds.apache.org/job/Mesos/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-6)&&(!ubuntu-eu2)/
Reporter: Yan Xu




{noformat:title=}
[ RUN  ] MesosContainerizer/DefaultExecutorTest.KillTask/0
I1110 01:20:11.482097 29700 cluster.cpp:158] Creating default 'local' authorizer
I1110 01:20:11.485241 29700 leveldb.cpp:174] Opened db in 2.774513ms
I1110 01:20:11.486237 29700 leveldb.cpp:181] Compacted db in 953614ns
I1110 01:20:11.486299 29700 leveldb.cpp:196] Created db iterator in 24739ns
I1110 01:20:11.486325 29700 leveldb.cpp:202] Seeked to beginning of db in 2300ns
I1110 01:20:11.486344 29700 leveldb.cpp:271] Iterated through 0 keys in the db 
in 378ns
I1110 01:20:11.486399 29700 replica.cpp:776] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1110 01:20:11.486933 29733 recover.cpp:451] Starting replica recovery
I1110 01:20:11.487289 29733 recover.cpp:477] Replica is in EMPTY status
I1110 01:20:11.488503 29721 replica.cpp:673] Replica in EMPTY status received a 
broadcasted recover request from __req_res__(7318)@172.17.0.3:52462
I1110 01:20:11.488855 29727 recover.cpp:197] Received a recover response from a 
replica in EMPTY status
I1110 01:20:11.489398 29729 recover.cpp:568] Updating replica status to STARTING
I1110 01:20:11.490223 29723 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 575135ns
I1110 01:20:11.490284 29732 master.cpp:380] Master 
d28fbae1-c3dc-45fa-8384-32ab9395a975 (3a31be8bf679) started on 172.17.0.3:52462
I1110 01:20:11.490317 29732 master.cpp:382] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/k50x7x/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--quiet="false" --recovery_agent_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_gc_interval="15mins" --registry_max_agent_age="2weeks" 
--registry_max_agent_count="102400" --registry_store_timeout="100secs" 
--registry_strict="false" --root_submissions="true" --user_sorter="drf" 
--version="false" --webui_dir="/mesos/mesos-1.2.0/_inst/share/mesos/webui" 
--work_dir="/tmp/k50x7x/master" --zk_session_timeout="10secs"
I1110 01:20:11.490696 29732 master.cpp:432] Master only allowing authenticated 
frameworks to register
I1110 01:20:11.490712 29732 master.cpp:446] Master only allowing authenticated 
agents to register
I1110 01:20:11.490720 29732 master.cpp:459] Master only allowing authenticated 
HTTP frameworks to register
I1110 01:20:11.490730 29732 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/k50x7x/credentials'
I1110 01:20:11.490281 29723 replica.cpp:320] Persisted replica status to 
STARTING
I1110 01:20:11.491210 29732 master.cpp:504] Using default 'crammd5' 
authenticator
I1110 01:20:11.491225 29720 recover.cpp:477] Replica is in STARTING status
I1110 01:20:11.491394 29732 http.cpp:895] Using default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I1110 01:20:11.491621 29732 http.cpp:895] Using default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I1110 01:20:11.491770 29732 http.cpp:895] Using default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I1110 01:20:11.491937 29732 master.cpp:584] Authorization enabled
I1110 01:20:11.492276 29725 whitelist_watcher.cpp:77] No whitelist given
I1110 01:20:11.492310 29723 hierarchical.cpp:149] Initialized hierarchical 
allocator process
I1110 01:20:11.492569 29721 replica.cpp:673] Replica in STARTING status 
received a broadcasted recover request from __req_res__(7319)@172.17.0.3:52462
I1110 01:20:11.492830 29719 recover.cpp:197] Received a recover response from a 
replica in STARTING status
I1110 01:20:11.493371 29720 recover.cpp:568] Updating replica status to VOTING
I1110 01:20:11.494002 29721 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 367673ns
I1110 01:20:11.494032 2972

[jira] [Commented] (MESOS-6567) Actively Scan for CNI Configurations

2016-11-09 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652501#comment-15652501
 ] 

Qian Zhang commented on MESOS-6567:
---

Currently, modifications to an existing CNI network configuration file under 
{{--network_cni_config_dir}} are automatically picked up by the Mesos agent when 
it attaches a container to the CNI network.

> Actively Scan for CNI Configurations
> 
>
> Key: MESOS-6567
> URL: https://issues.apache.org/jira/browse/MESOS-6567
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Dan Osborne
>
> Mesos-Agent currently loads the CNI configs into memory at startup. After 
> this point, new configurations that are added will remain unknown to the 
> Mesos Agent process until it is restarted.
> This ticket is to request that the Mesos Agent process scan the CNI config 
> directory each time it is networking a task, so that modifying, adding, and 
> removing networks will not require a slave reboot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6464) Add fine grained control of which namespaces a nested container should inherit (or not).

2016-11-09 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-6464:
--
Shepherd: Jie Yu

> Add fine grained control of which namespaces a nested container should 
> inherit (or not).
> 
>
> Key: MESOS-6464
> URL: https://issues.apache.org/jira/browse/MESOS-6464
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> We need finer grained control of which namespaces / cgroups a nested 
> container should inherit or not.
> Right now, there are some implicit assumptions about which cgroups we enter 
> and which namespaces we inherit when we launch a nested container. For 
> example, under the current semantics, a nested container will always get a 
> new pid namespace but inherit the network namespace from its parent. 
> Moreover, nested containers will always inherit all of the cgroups from their 
> parent (except the freezer cgroup), with no possibility of choosing any 
> different configuration.
> My current thinking is to pass the set of isolators to 
> {{containerizer->launch()}} that we would like to have invoked as part of 
> launching a new container. Only if that isolator is enabled (via the agent 
> flags) AND it is passed in via {{launch()}}, will it be used to isolate the 
> new container (note that both cgroup isolation and namespace membership are 
> also implemented using isolators). This is a sort of whitelist approach, 
> where we have to know the full set of isolators we want our container 
> launched with ahead of time.
> Alternatively, we could consider passing in the set of isolators that we 
> would like *disabled* instead.  This way we could blacklist certain isolators 
> from kicking in, even if they have been enabled via the agent flags.
> In both approaches, one major caveat of this is that it will have to become 
> part of the top-level containerizer API, but it is specific only to the 
> universal containerizer. Maybe this is OK as we phase out the docker 
> containerizer anyway.
> I am leaning towards the blacklist approach at the moment...
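To make the blacklist idea concrete, here is a minimal standalone C++ sketch. All names are hypothetical and this is not the Mesos containerizer API; it only illustrates a launch path that skips isolators the caller has disabled:

{noformat}
#include <iostream>
#include <set>
#include <string>

// Stand-in for the launch request of a nested container.
struct LaunchRequest
{
  std::string containerId;
  // Isolators to skip for this launch, even if enabled via agent flags.
  std::set<std::string> disabledIsolators;
};

// Pretend "containerizer" launch path: decide which isolators to invoke.
void launch(const LaunchRequest& request,
            const std::set<std::string>& enabledIsolators)
{
  for (const std::string& isolator : enabledIsolators) {
    if (request.disabledIsolators.count(isolator) == 0) {
      std::cout << "applying isolator: " << isolator << std::endl;
    } else {
      std::cout << "skipping isolator: " << isolator << std::endl;
    }
  }
}

int main()
{
  // Isolators enabled on the agent (real isolator names, hypothetical setup).
  std::set<std::string> enabled = {
    "cgroups/cpu", "namespaces/pid", "network/cni"};

  // A nested container that wants to share its parent's pid namespace.
  LaunchRequest request{"nested-container-1", {"namespaces/pid"}};

  launch(request, enabled);

  return 0;
}
{noformat}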



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5900) Support Unix domain socket connections in libprocess

2016-11-09 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5900:
--
Shepherd: Jie Yu

> Support Unix domain socket connections in libprocess
> 
>
> Key: MESOS-5900
> URL: https://issues.apache.org/jira/browse/MESOS-5900
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Benjamin Hindman
>  Labels: mesosphere
>
> We should consider allowing two programs on the same host using libprocess to 
> communicate via Unix domain sockets rather than TCP. This has a few 
> advantages:
> * Security: remote hosts cannot connect to the Unix socket. Domain sockets 
> also offer additional support for 
> [authentication|https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Authentication-UNIX_Domain.html].
> * Performance: domain sockets are marginally faster than localhost TCP.
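For illustration, the kind of local-only endpoint being proposed above looks like the following plain POSIX sketch (independent of libprocess; the socket path is hypothetical):

{noformat}
// Create an AF_UNIX listener instead of a TCP socket; only processes on
// this host can connect to it.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include <cstdio>
#include <cstring>

int main()
{
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) { perror("socket"); return 1; }

  sockaddr_un addr;
  memset(&addr, 0, sizeof(addr));
  addr.sun_family = AF_UNIX;
  strncpy(addr.sun_path, "/tmp/example.sock", sizeof(addr.sun_path) - 1);

  unlink(addr.sun_path);  // remove a stale socket file, if any

  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
    perror("bind");
    return 1;
  }

  if (listen(fd, 16) < 0) { perror("listen"); return 1; }

  printf("listening on %s\n", addr.sun_path);
  close(fd);
  return 0;
}
{noformat}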



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6556) UTS namespace isolator

2016-11-09 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652210#comment-15652210
 ] 

James Peach commented on MESOS-6556:


| Add net::setDomainname() helper API. | https://reviews.apache.org/r/53626/ |
| Implement a namespaces/uts isolator. | https://reviews.apache.org/r/53627/ |
|  Document the namespaces/uts isolator. | https://reviews.apache.org/r/53628/ |

> UTS namespace isolator
> --
>
> Key: MESOS-6556
> URL: https://issues.apache.org/jira/browse/MESOS-6556
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Add a {{namespace/uts}} isolator for doing UTS namespace isolation without 
> using the CNI isolator.
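For background, UTS namespace isolation boils down to the following Linux-only syscall sketch (not the isolator code; requires root, and the hostname is hypothetical): the container gets its own hostname without affecting the host.

{noformat}
#include <sched.h>
#include <unistd.h>

#include <cstdio>
#include <cstring>

int main()
{
  // Create a new UTS namespace for this process (needs CAP_SYS_ADMIN).
  if (unshare(CLONE_NEWUTS) != 0) { perror("unshare"); return 1; }

  const char* hostname = "container-host";  // hypothetical name
  if (sethostname(hostname, strlen(hostname)) != 0) {
    perror("sethostname");
    return 1;
  }

  char buf[256];
  gethostname(buf, sizeof(buf));
  printf("hostname inside new UTS namespace: %s\n", buf);
  return 0;
}
{noformat}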



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6562) Use JSON content type in mesos-execute.

2016-11-09 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652174#comment-15652174
 ] 

James Peach commented on MESOS-6562:


| Use JSON content type in mesos-execute. | https://reviews.apache.org/r/53624/ |

> Use JSON content type in mesos-execute.
> ---
>
> Key: MESOS-6562
> URL: https://issues.apache.org/jira/browse/MESOS-6562
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> Use {{mesos::ContentType::JSON}} in {{mesos-execute}} so that we can easily 
> packet trace the scheduler interactions. This makes debugging a lot easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6479) add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and add string flag for framework-name

2016-11-09 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651999#comment-15651999
 ] 

Vinod Kone commented on MESOS-6479:
---

Agree with Joseph here.

> add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and 
> add string flag for framework-name
> 
>
> Key: MESOS-6479
> URL: https://issues.apache.org/jira/browse/MESOS-6479
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.1.0
> Environment: all
>Reporter: Hubert Asamer
>Assignee: Hubert Asamer
>Priority: Trivial
>  Labels: testing
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Extend execute.cpp to use TaskGroupInfo as a container for batch jobs to 
> distribute tasks based on available offers. A simple boolean CLI flag shall 
> enable/disable this behavior. If enabled, the contents of TaskGroupInfo are 
> executed not as tasks within a "pod" (on a single host) but as distributed 
> jobs (on multiple hosts).
> In addition, an optional CLI flag for setting the temporary framework name 
> (e.g. to better distinguish between running and finished frameworks) could be 
> useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Updated] (MESOS-6568) JSON serialization should not omit empty arrays in HTTP APIs

2016-11-09 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-6568:
---
Description: 
When using the JSON content type with the HTTP APIs, a {{repeated}} protobuf 
field is omitted entirely from the JSON serialization of the message. For 
example, this is a response to the {{GetTasks}} call:

{noformat}
{
  "get_tasks": {
"tasks": [{...}]
  },
  "type": "GET_TASKS"
}
{noformat}

I think it would be better to include empty arrays for the other fields of the 
message ({{pending_tasks}}, {{completed_tasks}}, etc.). Advantages:

# Consistency with the old HTTP endpoints, e.g., /state
# Semantically, an empty array is more accurate. The master's response should 
be interpreted as saying it doesn't know about any pending/completed tasks; 
that is more accurately conveyed by explicitly including an empty array, not by 
omitting the key entirely.

  was:
When using the JSON content type with the HTTP APIs, a {{repeated}} protobuf 
field is omitted entirely from the JSON serialization of the message. For 
example, this is a response to the {{GetTasks}} call:

{noformat}
{
  "get_tasks": {
"tasks": [{...}]
  },
  "type": "GET_TASKS"
}
{noformat}

I think it would be better to include empty arrays for the other fields of the 
message ({{pending_tasks}}, {{completed_tasks}}, etc.). Advantages:

1. Consistency with the old HTTP endpoints, e.g., /state
2. Semantically, an empty array is more accurate. The master's response should 
be interpreted as saying it doesn't know about any pending/completed tasks; 
that is more accurately conveyed by explicitly including an empty array, not by 
omitting the key entirely.


> JSON serialization should not omit empty arrays in HTTP APIs
> 
>
> Key: MESOS-6568
> URL: https://issues.apache.org/jira/browse/MESOS-6568
> Project: Mesos
>  Issue Type: Improvement
>  Components: HTTP API
>Reporter: Neil Conway
>  Labels: mesosphere
>
> When using the JSON content type with the HTTP APIs, a {{repeated}} protobuf 
> field is omitted entirely from the JSON serialization of the message. For 
> example, this is a response to the {{GetTasks}} call:
> {noformat}
> {
>   "get_tasks": {
> "tasks": [{...}]
>   },
>   "type": "GET_TASKS"
> }
> {noformat}
> I think it would be better to include empty arrays for the other fields of 
> the message ({{pending_tasks}}, {{completed_tasks}}, etc.). Advantages:
> # Consistency with the old HTTP endpoints, e.g., /state
> # Semantically, an empty array is more accurate. The master's response should 
> be interpreted as saying it doesn't know about any pending/completed tasks; 
> that is more accurately conveyed by explicitly including an empty array, not 
> by omitting the key entirely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6568) JSON serialization should not omit empty arrays in HTTP APIs

2016-11-09 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6568:
--

 Summary: JSON serialization should not omit empty arrays in HTTP 
APIs
 Key: MESOS-6568
 URL: https://issues.apache.org/jira/browse/MESOS-6568
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API
Reporter: Neil Conway


When using the JSON content type with the HTTP APIs, a {{repeated}} protobuf 
field is omitted entirely from the JSON serialization of the message. For 
example, this is a response to the {{GetTasks}} call:

{noformat}
{
  "get_tasks": {
"tasks": [{...}]
  },
  "type": "GET_TASKS"
}
{noformat}

I think it would be better to include empty arrays for the other fields of the 
message ({{pending_tasks}}, {{completed_tasks}}, etc.). Advantages:

1. Consistency with the old HTTP endpoints, e.g., /state
2. Semantically, an empty array is more accurate. The master's response should 
be interpreted as saying it doesn't know about any pending/completed tasks; 
that is more accurately conveyed by explicitly including an empty array, not by 
omitting the key entirely.
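For comparison, the proposed serialization would look roughly like this (a sketch based only on the fields named above; any other repeated fields would be rendered the same way):

{noformat}
{
  "get_tasks": {
    "tasks": [{...}],
    "pending_tasks": [],
    "completed_tasks": []
  },
  "type": "GET_TASKS"
}
{noformat}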



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2016-11-09 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651768#comment-15651768
 ] 

Jason Lai commented on MESOS-6162:
--

Thanks Haosdent! I also asked Zhitao to temporarily hold this task for me until 
I can assign it to myself.

> Add support for cgroups blkio subsystem
> ---
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: Zhitao Li
>
> Note that the cgroups blkio subsystem may have a performance issue; refer to 
> https://github.com/opencontainers/runc/issues/861



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6567) Actively Scan for CNI Configurations

2016-11-09 Thread Dan Osborne (JIRA)
Dan Osborne created MESOS-6567:
--

 Summary: Actively Scan for CNI Configurations
 Key: MESOS-6567
 URL: https://issues.apache.org/jira/browse/MESOS-6567
 Project: Mesos
  Issue Type: Improvement
Reporter: Dan Osborne


Mesos-Agent currently loads the CNI configs into memory at startup. After this 
point, new configurations that are added will remain unknown to the Mesos Agent 
process until it is restarted.

This ticket is to request that the Mesos Agent process scan the CNI config 
directory each time it is networking a task, so that modifying, adding, and 
removing networks will not require a slave reboot.
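A minimal standalone C++ sketch of the requested behavior (not the agent's code; the directory path is hypothetical), re-listing the configuration directory on demand instead of caching it once at startup:

{noformat}
#include <dirent.h>

#include <cstdio>
#include <string>
#include <vector>

// Return the CNI network configuration files currently present.
std::vector<std::string> scanCniConfigs(const std::string& dir)
{
  std::vector<std::string> configs;

  DIR* d = opendir(dir.c_str());
  if (d == nullptr) {
    return configs;  // treat a missing directory as "no networks"
  }

  for (dirent* entry = readdir(d); entry != nullptr; entry = readdir(d)) {
    const std::string name = entry->d_name;
    // CNI network configurations are JSON files.
    if (name.size() > 5 && name.compare(name.size() - 5, 5, ".json") == 0) {
      configs.push_back(dir + "/" + name);
    }
  }

  closedir(d);
  return configs;
}

int main()
{
  // Called each time a task is being networked, so new or removed files
  // are picked up without restarting the agent.
  for (const std::string& path : scanCniConfigs("/etc/mesos/cni")) {
    printf("found CNI config: %s\n", path.c_str());
  }
  return 0;
}
{noformat}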



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6479) add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and add string flag for framework-name

2016-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6479:
-
Labels: testing  (was: newbie newbie++ testing)

In my opinion, we may not want to introduce these capabilities to 
{{mesos-execute}}:

* Batch capabilities are already available in some community frameworks, like 
Chronos and Aurora (and probably others).
* Thus far, the purpose of {{mesos-execute}} has been for single-use commands, 
usually for testing things on a small/single-node cluster.  {{mesos-execute}} 
sits somewhere between being an example framework and a tool for people to play 
with.  We keep it up to date, but we also want to keep {{mesos-execute}} as 
simple as possible.
* Breaking apart a {{TaskGroup}} into multiple separate {{Task}}s effectively 
negates the concept of a {{TaskGroup}}. You'd be better off running 
{{mesos-execute}} once per {{Task}}.

> add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and 
> add string flag for framework-name
> 
>
> Key: MESOS-6479
> URL: https://issues.apache.org/jira/browse/MESOS-6479
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.1.0
> Environment: all
>Reporter: Hubert Asamer
>Assignee: Hubert Asamer
>Priority: Trivial
>  Labels: testing
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Extend execute.cpp to use TaskGroupInfo as a container for batch jobs to 
> distribute tasks based on available offers. A simple boolean CLI flag shall 
> enable/disable this behavior. If enabled, the contents of TaskGroupInfo are 
> executed not as tasks within a "pod" (on a single host) but as distributed 
> jobs (on multiple hosts).
> In addition, an optional CLI flag for setting the temporary framework name 
> (e.g. to better distinguish between running and finished frameworks) could be 
> useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6162) Add support for cgroups blkio subsystem

2016-11-09 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li reassigned MESOS-6162:


Assignee: Zhitao Li

> Add support for cgroups blkio subsystem
> ---
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>Assignee: Zhitao Li
>
> Note that the cgroups blkio subsystem may have a performance issue; refer to 
> https://github.com/opencontainers/runc/issues/861



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

2016-11-09 Thread Ilya Pronin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651538#comment-15651538
 ] 

Ilya Pronin edited comment on MESOS-6563 at 11/9/16 5:52 PM:
-

For the record: I ran tests on CentOS 7 and CentOS 5. On CentOS 7 the problem 
appears because the mount propagation mode for the root fs is set to shared 
(it looks like systemd does that). So the {{/tmp}} mount appears in all 
namespaces and messes with subsequent mounts for other containers.

Fixing it would require introducing a new private mount point. Looks like 
that's what {{filesystem/linux}} isolator does.


was (Author: ipronin):
For the history. I ran tests on CentOS 7 and CentOS 5. On CentOS7 the problem 
appears because the mount propagation mode for the root fs is set to shared. So 
{{/tmp}} mount appears in all namespaces and messes with subsequent mounts for 
other containers.

Fixing it would require introducing a new private mount point. Looks like 
that's what {{filesystem/linux}} isolator does.

> Shared Filesystem Isolator does not clean up mounts
> ---
>
> Key: MESOS-6563
> URL: https://issues.apache.org/jira/browse/MESOS-6563
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: David Robinson
>Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that 
> mounts are not unmounted; agents ended up with 1000s of mounts, one for each 
> task that has run.
> To reproduce the problem start a mesos agent w/ 
> --isolation="filesystem/shared" and 
> --default_container_info="file:///tmp/the-container-info-below.json", then 
> launch and kill several tasks. After the tasks are killed the mount points 
> should be unmounted, but they are not.
> {noformat:title=container info}
> {
> "type": "MESOS",
> "volumes": [
> {
> "container_path": "/tmp",
> "host_path": "tmp",
> "mode": "RW"
> }
> ]
> }
> {noformat}
> Mounts are supposed to be [cleaned automatically by the kernel when the 
> process 
> exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future> SharedFilesystemIsolatorProcess::prepare(
> const ContainerID& containerId,
> const ContainerConfig& containerConfig)
> {
> {noformat}
> We found during testing that an agent would have 1000s of dangling mounts, 
> all of them attributed to the mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2

[jira] [Commented] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

2016-11-09 Thread Ilya Pronin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651538#comment-15651538
 ] 

Ilya Pronin commented on MESOS-6563:


For the record: I ran tests on CentOS 7 and CentOS 5. On CentOS 7 the problem 
appears because the mount propagation mode for the root fs is set to shared. So 
the {{/tmp}} mount appears in all namespaces and messes with subsequent mounts 
for other containers.

Fixing it would require introducing a new private mount point. Looks like 
that's what {{filesystem/linux}} isolator does.
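For illustration, the private-propagation fix amounts to something like the following Linux-only sketch (not the isolator's actual code), run before setting up per-container mounts:

{noformat}
#include <sys/mount.h>

#include <cstdio>

int main()
{
  // Requires root. MS_REC | MS_PRIVATE turns off "shared" propagation
  // for the whole mount tree, so per-container mounts made afterwards do
  // not leak into other mount namespaces.
  if (mount("none", "/", nullptr, MS_REC | MS_PRIVATE, nullptr) != 0) {
    perror("mount");
    return 1;
  }

  printf("root mount tree is now private\n");
  return 0;
}
{noformat}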

> Shared Filesystem Isolator does not clean up mounts
> ---
>
> Key: MESOS-6563
> URL: https://issues.apache.org/jira/browse/MESOS-6563
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: David Robinson
>Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that 
> mounts are not unmounted; agents ended up with 1000s of mounts, one for each 
> task that has run.
> To reproduce the problem start a mesos agent w/ 
> --isolation="filesystem/shared" and 
> --default_container_info="file:///tmp/the-container-info-below.json", then 
> launch and kill several tasks. After the tasks are killed the mount points 
> should be unmounted, but they are not.
> {noformat:title=container info}
> {
> "type": "MESOS",
> "volumes": [
> {
> "container_path": "/tmp",
> "host_path": "tmp",
> "mode": "RW"
> }
> ]
> }
> {noformat}
> Mounts are supposed to be [cleaned automatically by the kernel when the 
> process 
> exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future> SharedFilesystemIsolatorProcess::prepare(
> const ContainerID& containerId,
> const ContainerConfig& containerConfig)
> {
> {noformat}
> We found during testing that an agent would have 1000s of dangling mounts, 
> all of them attributed to the mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp
>  xfs rw,noatime,attr2,

[jira] [Commented] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

2016-11-09 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651527#comment-15651527
 ] 

Jie Yu commented on MESOS-6563:
---

[~idownes] Yes, this is correct.

> Shared Filesystem Isolator does not clean up mounts
> ---
>
> Key: MESOS-6563
> URL: https://issues.apache.org/jira/browse/MESOS-6563
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: David Robinson
>Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that 
> mounts are not unmounted; agents ended up with 1000s of mounts, one for each 
> task that has run.
> To reproduce the problem start a mesos agent w/ 
> --isolation="filesystem/shared" and 
> --default_container_info="file:///tmp/the-container-info-below.json", then 
> launch and kill several tasks. After the tasks are killed the mount points 
> should be unmounted, but they are not.
> {noformat:title=container info}
> {
> "type": "MESOS",
> "volumes": [
> {
> "container_path": "/tmp",
> "host_path": "tmp",
> "mode": "RW"
> }
> ]
> }
> {noformat}
> Mounts are supposed to be [cleaned automatically by the kernel when the 
> process 
> exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future> SharedFilesystemIsolatorProcess::prepare(
> const ContainerID& containerId,
> const ContainerConfig& containerConfig)
> {
> {noformat}
> We found during testing that an agent would have 1000s of dangling mounts, 
> all of them attributed to the mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-58582496-e551-4d80-8ae5-9eacac5e8a36/runs/6b5a7f56-af89-4eab-bbfa-883ca43744ad/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/s

[jira] [Commented] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

2016-11-09 Thread Ian Downes (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651503#comment-15651503
 ] 

Ian Downes commented on MESOS-6563:
---

[~jieyu] Am I correct in understanding that the {{filesystem/linux}} isolator 
supports per container filesystems but doesn't require them? If so, then yes, 
I'd agree that we should just use that isolator instead.

> Shared Filesystem Isolator does not clean up mounts
> ---
>
> Key: MESOS-6563
> URL: https://issues.apache.org/jira/browse/MESOS-6563
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: David Robinson
>Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that 
> mounts are not unmounted; agents ended up with 1000s of mounts, one for each 
> task that has run.
> To reproduce the problem start a mesos agent w/ 
> --isolation="filesystem/shared" and 
> --default_container_info="file:///tmp/the-container-info-below.json", then 
> launch and kill several tasks. After the tasks are killed the mount points 
> should be unmounted, but they are not.
> {noformat:title=container info}
> {
> "type": "MESOS",
> "volumes": [
> {
> "container_path": "/tmp",
> "host_path": "tmp",
> "mode": "RW"
> }
> ]
> }
> {noformat}
> Mounts are supposed to be [cleaned automatically by the kernel when the 
> process 
> exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future> SharedFilesystemIsolatorProcess::prepare(
> const ContainerID& containerId,
> const ContainerConfig& containerConfig)
> {
> {noformat}
> We found during testing that an agent would have 1000s of dangling mounts, 
> all of them attributed to the mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-dro

[jira] [Commented] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

2016-11-09 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651442#comment-15651442
 ] 

Jie Yu commented on MESOS-6563:
---

Any reason not to use the filesystem/linux isolator, which does not have this 
issue and is a superset of this isolator?

The filesystem/shared isolator has been deprecated for a while and will be 
removed from the tree.

cc [~idownes] [~drobinson]

> Shared Filesystem Isolator does not clean up mounts
> ---
>
> Key: MESOS-6563
> URL: https://issues.apache.org/jira/browse/MESOS-6563
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: David Robinson
>Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that 
> mounts are not unmounted; agents ended up with 1000s of mounts, one for each 
> task that has run.
> To reproduce the problem start a mesos agent w/ 
> --isolation="filesystem/shared" and 
> --default_container_info="file:///tmp/the-container-info-below.json", then 
> launch and kill several tasks. After the tasks are killed the mount points 
> should be unmounted, but they are not.
> {noformat:title=container info}
> {
> "type": "MESOS",
> "volumes": [
> {
> "container_path": "/tmp",
> "host_path": "tmp",
> "mode": "RW"
> }
> ]
> }
> {noformat}
> Mounts are supposed to be [cleaned automatically by the kernel when the 
> process 
> exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically done by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future<Option<ContainerLaunchInfo>> SharedFilesystemIsolatorProcess::prepare(
> const ContainerID& containerId,
> const ContainerConfig& containerConfig)
> {
> {noformat}
> We found during testing that an agent would have 1000s of dangling mounts, 
> all of them attributed to the mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/ex

[jira] [Commented] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

2016-11-09 Thread Ilya Pronin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651295#comment-15651295
 ] 

Ilya Pronin commented on MESOS-6563:


The mounts list looks strange: it looks like something is mounted TO the 
directory that was supposed to be mounted as {{/tmp}}. Investigating.
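
A quick way to check that from a shell on the agent host is to list what is 
mounted where under the sandbox paths; the work_dir, the placeholder path 
segments, and the use of findmnt below are assumptions for illustration:

{code}
# Show "bind-mount source root -> mount point" for every mount that targets an
# executor sandbox (fields 4 and 5 of /proc/self/mountinfo are the mount root
# and the mount point, respectively).
grep '/var/lib/mesos/slaves/' /proc/self/mountinfo | awk '{print $4, "->", $5}'

# Or inspect a single sandbox /tmp directly (placeholder path):
findmnt --target /var/lib/mesos/slaves/<agent-id>/.../runs/<container-id>/tmp
{code}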

> Shared Filesystem Isolator does not clean up mounts
> ---
>
> Key: MESOS-6563
> URL: https://issues.apache.org/jira/browse/MESOS-6563
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: David Robinson
>Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that 
> mounts are not unmounted; agents ended up with thousands of mounts, one for 
> each task that has run.
> To reproduce the problem, start a Mesos agent with 
> --isolation="filesystem/shared" and 
> --default_container_info="file:///tmp/the-container-info-below.json", then 
> launch and kill several tasks. After the tasks are killed the mount points 
> should be unmounted, but they are not.
> {noformat:title=container info}
> {
> "type": "MESOS",
> "volumes": [
> {
> "container_path": "/tmp",
> "host_path": "tmp",
> "mode": "RW"
> }
> ]
> }
> {noformat}
> Mounts are supposed to be [cleaned automatically by the kernel when the 
> process 
> exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically done by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future<Option<ContainerLaunchInfo>> SharedFilesystemIsolatorProcess::prepare(
> const ContainerID& containerId,
> const ContainerConfig& containerConfig)
> {
> {noformat}
> We found during testing that an agent would have 1000s of dangling mounts, 
> all of them attributed to the mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-58582496-e551-4d80-8ae5-9eacac5e8a36/runs/6b5a7f56-af89-

[jira] [Created] (MESOS-6566) The Docker executor should not leak task env variables in the Docker command cmd line.

2016-11-09 Thread JIRA
Gastón Kleiman created MESOS-6566:
-

 Summary: The Docker executor should not leak task env variables in 
the Docker command cmd line.
 Key: MESOS-6566
 URL: https://issues.apache.org/jira/browse/MESOS-6566
 Project: Mesos
  Issue Type: Bug
  Components: docker, security
Reporter: Gastón Kleiman


Task environment variables are sensitive, as they might contain secrets.

The Docker executor starts tasks by executing a {{docker run}} command and 
includes the env variables on the command line of that command, exposing them 
to every user on the machine:

{code}
$ ./src/mesos-execute --command="sleep 200" --containerizer=docker 
--docker_image=alpine --env='{"foo": "bar"}' --master=10.0.2.15:5050 --name=test
$ ps aux | grep bar
[...] docker -H unix:///var/run/docker.sock run [...] -e foo=bar [...] alpine 
-c sleep 200
$
{code}

The Docker executor could pass Docker the {{--env-file}} flag, pointing it to a 
file with the environment variables.
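
A minimal sketch of that approach, assuming the executor writes the variables to 
a file inside the task sandbox (the file name and the use of $MESOS_SANDBOX here 
are illustrative, not the actual executor implementation):

{code}
# Write the task environment to a sandbox-private file instead of the command line.
cat > "$MESOS_SANDBOX/docker.env" <<'EOF'
foo=bar
EOF
chmod 600 "$MESOS_SANDBOX/docker.env"

# Reference the file via --env-file, so the values never show up in `ps` output.
docker -H unix:///var/run/docker.sock run \
  --env-file "$MESOS_SANDBOX/docker.env" \
  alpine sh -c 'sleep 200'
{code}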



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-6563) Shared Filesystem Isolator does not clean up mounts

2016-11-09 Thread Ilya Pronin (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Pronin reassigned MESOS-6563:
--

Assignee: Ilya Pronin

> Shared Filesystem Isolator does not clean up mounts
> ---
>
> Key: MESOS-6563
> URL: https://issues.apache.org/jira/browse/MESOS-6563
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation
>Reporter: David Robinson
>Assignee: Ilya Pronin
>
> While testing the agent's 'filesystem/shared' isolator we discovered that 
> mounts are not unmounted; agents ended up with thousands of mounts, one for 
> each task that has run.
> To reproduce the problem, start a Mesos agent with 
> --isolation="filesystem/shared" and 
> --default_container_info="file:///tmp/the-container-info-below.json", then 
> launch and kill several tasks. After the tasks are killed the mount points 
> should be unmounted, but they are not.
> {noformat:title=container info}
> {
> "type": "MESOS",
> "volumes": [
> {
> "container_path": "/tmp",
> "host_path": "tmp",
> "mode": "RW"
> }
> ]
> }
> {noformat}
> Mounts are supposed to be [cleaned automatically by the kernel when the 
> process 
> exits|https://github.com/apache/mesos/blob/3845ab8af83a6eebfbf32e98f9000ab695cf2661/src/slave/containerizer/mesos/isolators/filesystem/shared.cpp#L70].
> {noformat}
> // We only need to implement the `prepare()` function in this
> // isolator. There is nothing to recover because we do not keep any
> // state and do not monitor filesystem usage or perform any action on
> // cleanup. Cleanup of mounts is done automatically done by the kernel
> // when the mount namespace is destroyed after the last process
> // terminates.
> Future<Option<ContainerLaunchInfo>> SharedFilesystemIsolatorProcess::prepare(
> const ContainerID& containerId,
> const ContainerConfig& containerConfig)
> {
> {noformat}
> We found during testing that an agent would have 1000s of dangling mounts, 
> all of them attributed to the mesos agent:
> {noformat}
> root[7]server-001 ~ # tail /proc/mounts
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dda59747-848a-4b3b-8424-d0032f8a38f7/runs/e31bea31-22d7-4758-bc8b-6837919d7ed7/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-3a001926-a442-45c4-9cbc-dad182954fed/runs/bd0a8e36-d147-4511-9cc5-afff9f1c0fbe/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-04204a72-53d8-44a8-bac5-613835ff85a7/runs/967739ea-5284-41ed-af1a-1cb5a77dd690/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-95d1ac39-323a-4c15-b1dc-645ed79c4128/runs/6ff6d2b3-2867-4ad4-b2bb-20e27a0fa925/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-91f6a946-f560-43a3-95c2-424c5dd71684/runs/a4821acc-58f8-4457-bdc9-bd83bdeb8231/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-dd3b34f1-10c6-43d3-8741-a3164a642e93/runs/0ef8cf17-6c18-48a4-9943-66c448de5d44/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-fb704ef8-1cf9-4d35-854d-7b6247cf4bc2/runs/e65ec976-057f-4939-9053-1ddcddfc98f8/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-cdf7b06d-2265-41fe-b1e9-84366dc88b62/runs/1bed4289-7442-4a91-bf45-a7de10ab79bb/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/frameworks/201103282247-19-/executors/thermos-drobinson-test-sleep2-0-58582496-e551-4d80-8ae5-9eacac5e8a36/runs/6b5a7f56-af89-4eab-bbfa-883ca43744ad/tmp
>  xfs rw,noatime,attr2,nobarrier,inode64,prjquota 0 0
> /dev/sda1 
> /var/lib/mesos/slaves/b600ee92-bb38-4447-984a-4047c3d2c176-S2/fr

[jira] [Updated] (MESOS-4875) overlayfs does not work when launching tasks.

2016-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4875:
---
Shepherd: Jie Yu

> overlayfs does not work when launching tasks.
> -
>
> Key: MESOS-4875
> URL: https://issues.apache.org/jira/browse/MESOS-4875
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 1.0.0
>
>
> Enable the overlay backend and launch a task; the task fails to start. The 
> executor log shows the following:
> {code}
> Failed to create sandbox mount point  at 
> '/tmp/mesos/slaves/bbc41bda-747a-420e-88d2-cf100fa8b6d5-S1/frameworks/bbc41bda-747a-420e-88d2-cf100fa8b6d5-0001/executors/test_mesos/runs/3736fb2a-de7a-4aba-9b08-25c73be7879f/.rootfs/mnt/mesos/sandbox':
>  Read-only file system
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
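
For context on the description above, enabling the overlay backend amounts to 
something like the following agent invocation; the exact flag combination is an 
assumption for illustration, --image_provisioner_backend=overlay is the 
relevant part:

{code}
# Hypothetical agent flags for provisioning Docker images with the overlay backend.
mesos-agent \
  --work_dir=/var/lib/mesos \
  --containerizers=mesos \
  --image_providers=docker \
  --isolation=filesystem/linux,docker/runtime \
  --image_provisioner_backend=overlay
{code}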


[jira] [Commented] (MESOS-4875) overlayfs does not work when launching tasks.

2016-11-09 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650638#comment-15650638
 ] 

Alexander Rukletsov commented on MESOS-4875:


Please make sure the "shepherd" field is properly set.

> overlayfs does not work when launching tasks.
> -
>
> Key: MESOS-4875
> URL: https://issues.apache.org/jira/browse/MESOS-4875
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 1.0.0
>
>
> Enable the overlay backend and launch a task; the task fails to start. The 
> executor log shows the following:
> {code}
> Failed to create sandbox mount point  at 
> '/tmp/mesos/slaves/bbc41bda-747a-420e-88d2-cf100fa8b6d5-S1/frameworks/bbc41bda-747a-420e-88d2-cf100fa8b6d5-0001/executors/test_mesos/runs/3736fb2a-de7a-4aba-9b08-25c73be7879f/.rootfs/mnt/mesos/sandbox':
>  Read-only file system
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4875) overlayfs does not work when launching tasks.

2016-11-09 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-4875:
---
Summary: overlayfs does not work when launching tasks.  (was: overlayfs 
does not work when lauching tasks)

> overlayfs does not work when launching tasks.
> -
>
> Key: MESOS-4875
> URL: https://issues.apache.org/jira/browse/MESOS-4875
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
> Fix For: 1.0.0
>
>
> Enable the overlay backend and launch a task; the task fails to start. The 
> executor log shows the following:
> {code}
> Failed to create sandbox mount point  at 
> '/tmp/mesos/slaves/bbc41bda-747a-420e-88d2-cf100fa8b6d5-S1/frameworks/bbc41bda-747a-420e-88d2-cf100fa8b6d5-0001/executors/test_mesos/runs/3736fb2a-de7a-4aba-9b08-25c73be7879f/.rootfs/mnt/mesos/sandbox':
>  Read-only file system
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)