[jira] [Commented] (MESOS-6002) The whiteout file cannot be removed correctly using aufs backend.

2016-10-18 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587744#comment-15587744
 ] 

Jie Yu commented on MESOS-6002:
---

commit ee7496ad8fba889438fe63dadc11c1ca2068d304
Author: Qian Zhang 
Date:   Tue Oct 18 22:36:25 2016 -0700

Updated aufs mount with `rw` and `ro+wh` options.

Review: https://reviews.apache.org/r/52254/
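For context, the `ro+wh` branch option tells aufs, as I understand it, to honor whiteouts on a read-only branch. The whiteout convention itself can be sketched in plain Python (this is an illustrative model, not the Mesos backend code): a file named `.wh.<name>` in an upper layer hides `<name>` from the layers below it.

```python
# Minimal sketch (not Mesos code) of Docker-style whiteout handling: a file
# named '.wh.<name>' in an upper layer deletes '<name>' from lower layers.
WHITEOUT_PREFIX = ".wh."

def flatten_layers(layers):
    """Flatten an ordered list of layers (lowest first); each layer is a
    dict of {filename: content}, using basenames only for simplicity."""
    merged = {}
    for layer in layers:
        for name, content in layer.items():
            if name.startswith(WHITEOUT_PREFIX):
                # The whiteout marker hides the shadowed file.
                merged.pop(name[len(WHITEOUT_PREFIX):], None)
            else:
                merged[name] = content
    return merged

lower = {"passwd": "root:x:0:0", "scratch": "tmp data"}
upper = {".wh.scratch": "", "hosts": "127.0.0.1 localhost"}
print(sorted(flatten_layers([lower, upper])))  # → ['hosts', 'passwd']
```

The bug in this ticket was that the backend's mount options prevented such whiteout markers from taking effect, so deleted files reappeared in the provisioned rootfs.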

> The whiteout file cannot be removed correctly using aufs backend.
> -
>
> Key: MESOS-6002
> URL: https://issues.apache.org/jira/browse/MESOS-6002
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 14, Ubuntu 12
> Or any os with aufs module
>Reporter: Gilbert Song
>Assignee: Qian Zhang
>  Labels: aufs, backend, containerizer
> Attachments: whiteout.diff
>
>
> The whiteout file is not removed correctly when using the aufs backend in 
> the unified containerizer. This can be verified by running this unit test 
> with the aufs backend manually specified.
> {noformat}
> [20:11:24] :   [Step 10/10] [ RUN  ] 
> ProvisionerDockerPullerTest.ROOT_INTERNET_CURL_Whiteout
> [20:11:24]W:   [Step 10/10] I0805 20:11:24.986734 24295 cluster.cpp:155] 
> Creating default 'local' authorizer
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.001153 24295 leveldb.cpp:174] 
> Opened db in 14.308627ms
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003731 24295 leveldb.cpp:181] 
> Compacted db in 2.558329ms
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003749 24295 leveldb.cpp:196] 
> Created db iterator in 3086ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003754 24295 leveldb.cpp:202] 
> Seeked to beginning of db in 595ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003758 24295 leveldb.cpp:271] 
> Iterated through 0 keys in the db in 314ns
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.003769 24295 replica.cpp:776] 
> Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004086 24315 recover.cpp:451] 
> Starting replica recovery
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004251 24312 recover.cpp:477] 
> Replica is in EMPTY status
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004546 24314 replica.cpp:673] 
> Replica in EMPTY status received a broadcasted recover request from 
> __req_res__(5640)@172.30.2.105:36006
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004607 24312 recover.cpp:197] 
> Received a recover response from a replica in EMPTY status
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004762 24313 recover.cpp:568] 
> Updating replica status to STARTING
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004776 24314 master.cpp:375] 
> Master 21665992-d47e-402f-a00c-6f8fab613019 (ip-172-30-2-105.mesosphere.io) 
> started on 172.30.2.105:36006
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004787 24314 master.cpp:377] Flags 
> at startup: --acls="" --agent_ping_timeout="15secs" 
> --agent_reregister_timeout="10mins" --allocation_interval="1secs" 
> --allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/0z753P/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="100secs" 
> --registry_strict="true" --root_submissions="true" --user_sorter="drf" 
> --version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/0z753P/master" --zk_session_timeout="10secs"
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004920 24314 master.cpp:427] 
> Master only allowing authenticated frameworks to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004930 24314 master.cpp:441] 
> Master only allowing authenticated agents to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004935 24314 master.cpp:454] 
> Master only allowing authenticated HTTP frameworks to register
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.004942 24314 credentials.hpp:37] 
> Loading credentials for authentication from '/tmp/0z753P/credentials'
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.005018 24314 master.cpp:499] Using 
> default 'crammd5' authenticator
> [20:11:25]W:   [Step 10/10] I0805 20:11:25.005101 24314 http.cpp:883] Using 
> default 'basic' HTTP authenticator for 

[jira] [Commented] (MESOS-6416) Design doc for Restartable Tasks

2016-10-18 Thread Megha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587494#comment-15587494
 ] 

Megha commented on MESOS-6416:
--

Here's the design doc for Restartable tasks:
https://docs.google.com/document/d/1YS_EBUNLkzpSru0dwn_hPUIeTATiWckSaosXSIaHUCo/edit?usp=sharing

> Design doc for Restartable Tasks
> 
>
> Key: MESOS-6416
> URL: https://issues.apache.org/jira/browse/MESOS-6416
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Megha
>Assignee: Megha
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6416) Design doc for Restartable Tasks

2016-10-18 Thread Megha (JIRA)
Megha created MESOS-6416:


 Summary: Design doc for Restartable Tasks
 Key: MESOS-6416
 URL: https://issues.apache.org/jira/browse/MESOS-6416
 Project: Mesos
  Issue Type: Improvement
Reporter: Megha
Assignee: Megha








[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587476#comment-15587476
 ] 

haosdent commented on MESOS-6414:
-

In addition, could you share the details of how to reproduce this? I would 
like to verify whether cgroups namespaces could resolve this or not, [~anindya.sinha].

> Task cleanup fails when the containers includes cgroups not owned by Mesos
> --
>
> Key: MESOS-6414
> URL: https://issues.apache.org/jira/browse/MESOS-6414
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>
> If a mesos task is launched in a cgroup outside of the context of Mesos,  
> Mesos is unaware of that cgroup created in the task context.
> Now when the Mesos task terminates: Mesos tries to cleanup all cgroups within 
> the top level cgroup it knows about. If the cgroup created in the task 
> context exists when LinuxLauncherProcess::destroy() is called but is 
> eventually cleaned up by the container before we do a freeze() or thaw() or 
> remove(), it fails at those stages leading to an incomplete cleanup of the 
> container.





[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587469#comment-15587469
 ] 

haosdent commented on MESOS-6414:
-

I think [~anindya.sinha] means that his tasks create cgroups of their own 
while running, but those cgroups created by user tasks would be cleaned up by 
{{LinuxLauncherProcess::destroy()}}. I think the correct way to fix this 
problem is to use cgroups namespaces.
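An alternative mitigation, sketched below with plain directories rather than real cgroups (hypothetical helper, not the actual Mesos code), is to make each cleanup stage tolerate the race where the cgroup has already disappeared between enumeration and removal:

```python
import os
import tempfile

def remove_cgroup(path):
    """Sketch (plain directories, not real cgroups): remove a directory but
    treat 'already gone' as success, tolerating the race where the task's own
    cleanup removed the cgroup between enumeration and destroy()."""
    try:
        os.rmdir(path)
    except FileNotFoundError:
        pass  # the container already cleaned it up; nothing left to do
    return not os.path.isdir(path)

path = os.path.join(tempfile.mkdtemp(), "child_cgroup")
os.mkdir(path)
print(remove_cgroup(path))  # → True (directory existed and was removed)
print(remove_cgroup(path))  # → True (already gone is not an error)
```

The same "absent means done" treatment would apply at the freeze()/thaw() stages as well, though cgroup namespaces remain the cleaner isolation fix.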

> Task cleanup fails when the containers includes cgroups not owned by Mesos
> --
>
> Key: MESOS-6414
> URL: https://issues.apache.org/jira/browse/MESOS-6414
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>
> If a mesos task is launched in a cgroup outside of the context of Mesos,  
> Mesos is unaware of that cgroup created in the task context.
> Now when the Mesos task terminates: Mesos tries to cleanup all cgroups within 
> the top level cgroup it knows about. If the cgroup created in the task 
> context exists when LinuxLauncherProcess::destroy() is called but is 
> eventually cleaned up by the container before we do a freeze() or thaw() or 
> remove(), it fails at those stages leading to an incomplete cleanup of the 
> container.





[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-18 Thread kasim (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587231#comment-15587231
 ] 

kasim commented on MESOS-6400:
--

I am using Marathon 1.3.0-1.0.506.el7.

Yes, when I restarted Marathon it got a new framework ID and started some 
tasks (all duplicates of the orphan tasks). Due to a lack of resources, it 
cannot start all of the tasks, so I'd like to remove the orphan tasks 
immediately. Is there any way to do that?

> Not able to remove Orphan Tasks
> ---
>
> Key: MESOS-6400
> URL: https://issues.apache.org/jira/browse/MESOS-6400
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: centos 7 x64
>Reporter: kasim
>
> The problem may be caused by Mesos and Marathon being out of sync:
> https://github.com/mesosphere/marathon/issues/616
> When I noticed the orphan tasks, I:
> 1. restarted Marathon
> 2. Marathon did not sync the orphan tasks, but started new tasks.
> 3. The orphan tasks still held their resources, so I have to delete them.
> 4. I found that all orphan tasks are under framework 
> `ef169d8a-24fc-41d1-8b0d-c67718937a48-`,
> and curl -XGET `http://c196:5050/master/frameworks` shows that framework 
> under `unregistered_frameworks`:
> {code}
> {
> "frameworks": [
> .
> ],
> "completed_frameworks": [ ],
> "unregistered_frameworks": [
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-"
> ]
> }
> {code}
> 5. I tried {code}curl -XPOST http://c196:5050/master/teardown -d 
> 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-' {code}
> but got `No framework found with specified ID`.
> So I have no way to delete the orphan tasks.
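The `unregistered_frameworks` list from `/master/frameworks` can contain duplicates, as the response above shows; a small sketch (with made-up framework IDs, since the real IDs are truncated in the report) of extracting the distinct orphaned IDs before attempting any teardown:

```python
import json

# Sketch with made-up framework IDs: parse the /master/frameworks response
# and de-duplicate the unregistered (orphaned) framework IDs.
state = json.loads("""
{
  "frameworks": [],
  "completed_frameworks": [],
  "unregistered_frameworks": ["framework-id-a", "framework-id-a", "framework-id-b"]
}
""")
orphans = sorted(set(state["unregistered_frameworks"]))
for fid in orphans:
    # Each ID would then be POSTed to the teardown endpoint, e.g.
    #   curl -XPOST http://<master>/master/teardown -d "frameworkId=<fid>"
    print(fid)
```

Note that the teardown endpoint appears to reject frameworks the master no longer considers registered, which would explain the `No framework found with specified ID` error in step 5.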





[jira] [Updated] (MESOS-3545) Investigate restoring tasks/executors after machine reboot.

2016-10-18 Thread Megha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megha updated MESOS-3545:
-
Epic Name: Restartable Tasks

> Investigate restoring tasks/executors after machine reboot.
> ---
>
> Key: MESOS-3545
> URL: https://issues.apache.org/jira/browse/MESOS-3545
> Project: Mesos
>  Issue Type: Epic
>  Components: slave
>Reporter: Benjamin Hindman
>Assignee: Megha
>
> If a task/executor is restartable (see MESOS-3544) it might make sense to 
> force an agent to restart these tasks/executors after a machine reboot in 
> the event that the machine is network partitioned away from the master (or 
> the master has failed) but we'd like to get these services running again. 
> Assuming the agent(s) running on the machine has not been disconnected from 
> the master for longer than the master's agent re-registration timeout, the 
> agent should be able to re-register (i.e., after a network partition is 
> resolved) without a problem. However, in the same way that a framework would 
> be interested in knowing that its tasks/executors were restarted, we'd want 
> to send something like a TASK_RESTARTED status update.





[jira] [Updated] (MESOS-3545) Investigate restoring tasks/executors after machine reboot.

2016-10-18 Thread Megha (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Megha updated MESOS-3545:
-
  Shepherd: Yan Xu
Labels:   (was: mesosphere)
Issue Type: Epic  (was: Improvement)

> Investigate restoring tasks/executors after machine reboot.
> ---
>
> Key: MESOS-3545
> URL: https://issues.apache.org/jira/browse/MESOS-3545
> Project: Mesos
>  Issue Type: Epic
>  Components: slave
>Reporter: Benjamin Hindman
>Assignee: Megha
>
> If a task/executor is restartable (see MESOS-3544) it might make sense to 
> force an agent to restart these tasks/executors after a machine reboot in 
> the event that the machine is network partitioned away from the master (or 
> the master has failed) but we'd like to get these services running again. 
> Assuming the agent(s) running on the machine has not been disconnected from 
> the master for longer than the master's agent re-registration timeout, the 
> agent should be able to re-register (i.e., after a network partition is 
> resolved) without a problem. However, in the same way that a framework would 
> be interested in knowing that its tasks/executors were restarted, we'd want 
> to send something like a TASK_RESTARTED status update.





[jira] [Comment Edited] (MESOS-6415) Create an unit test for OOM in Mesos containerizer's mem isolator

2016-10-18 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587117#comment-15587117
 ] 

Zhitao Li edited comment on MESOS-6415 at 10/19/16 12:10 AM:
-

[~jieyu] and I chatted on the containerizer Slack, and this is exactly what I 
want to pursue as a separate test. 

Slack history:
{quote}
It seems like we don't have any integration test practicing the case of 
exceeding container memory limit. @jieyu @gilbert ?

Jie Yu [4:43 PM]  
we do have a balloon framework

Zhitao Li [4:44 PM]  
Is it exercised through a test?

Jie Yu [4:44 PM]  
yeah

Gilbert Song [4:44 PM]  
yes,

[4:44]  
through a script in a unit test

Jie Yu [4:44 PM]  
in retrospect, we can simply use a command task

[4:45]  
at the time balloon framework was written, command task does not exist yet

Zhitao Li [4:45 PM]  
I'd volunteer me or someone from our team to write a smaller test, if you want 
to shepherd (edited)

Jie Yu [4:45 PM]  
yup, i’d be happy to shepherd

[4:45]  
you should add one to cgroups_isolator_tests.cpp

Zhitao Li [4:46 PM]  
Will file an issue and claim it under my umbrella for now. Thanks

Jie Yu [4:46 PM]  
oh

[4:46]  
hold on

[4:46]  
we do have MemoryPressureMesosTest

[4:48]  
but I guess we don’t have a oom test

[4:48]  
memory pressure is mainly for the stats

[4:49]  
yeah, @zhitao, we should add a OOM test
{quote}


was (Author: zhitao):
[~jieyu] and I chatted on the containerizer Slack, and this is exactly what I 
want to pursue as a separate test. 

Slack history:
```

It seems like we don't have any integration test practicing the case of 
exceeding container memory limit. @jieyu @gilbert ?

Jie Yu [4:43 PM]  
we do have a balloon framework

Zhitao Li [4:44 PM]  
Is it exercised through a test?

Jie Yu [4:44 PM]  
yeah

Gilbert Song [4:44 PM]  
yes,

[4:44]  
through a script in a unit test

Jie Yu [4:44 PM]  
in retrospect, we can simply use a command task

[4:45]  
at the time balloon framework was written, command task does not exist yet

Zhitao Li [4:45 PM]  
I'd volunteer me or someone from our team to write a smaller test, if you want 
to shepherd (edited)

Jie Yu [4:45 PM]  
yup, i’d be happy to shepherd

[4:45]  
you should add one to cgroups_isolator_tests.cpp

Zhitao Li [4:46 PM]  
Will file an issue and claim it under my umbrella for now. Thanks

Jie Yu [4:46 PM]  
oh

[4:46]  
hold on

[4:46]  
we do have MemoryPressureMesosTest

[4:48]  
but I guess we don’t have a oom test

[4:48]  
memory pressure is mainly for the stats

[4:49]  
yeah, @zhitao, we should add a OOM test
```

> Create an unit test for OOM in Mesos containerizer's mem isolator
> -
>
> Key: MESOS-6415
> URL: https://issues.apache.org/jira/browse/MESOS-6415
> Project: Mesos
>  Issue Type: Improvement
>  Components: testing
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>Priority: Minor
>
> It seems like we don't have any integration test practicing the case of 
> exceeding container memory limit.
> We could add one to cgroups_isolator_tests.cpp.
> Good starting task for anyone interested in this area, including myself.





[jira] [Commented] (MESOS-6415) Create an unit test for OOM in Mesos containerizer's mem isolator

2016-10-18 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587117#comment-15587117
 ] 

Zhitao Li commented on MESOS-6415:
--

[~jieyu] and I chatted on the containerizer Slack, and this is exactly what I 
want to pursue as a separate test. 

Slack history:
```

It seems like we don't have any integration test practicing the case of 
exceeding container memory limit. @jieyu @gilbert ?

Jie Yu [4:43 PM]  
we do have a balloon framework

Zhitao Li [4:44 PM]  
Is it exercised through a test?

Jie Yu [4:44 PM]  
yeah

Gilbert Song [4:44 PM]  
yes,

[4:44]  
through a script in a unit test

Jie Yu [4:44 PM]  
in retrospect, we can simply use a command task

[4:45]  
at the time balloon framework was written, command task does not exist yet

Zhitao Li [4:45 PM]  
I'd volunteer me or someone from our team to write a smaller test, if you want 
to shepherd (edited)

Jie Yu [4:45 PM]  
yup, i’d be happy to shepherd

[4:45]  
you should add one to cgroups_isolator_tests.cpp

Zhitao Li [4:46 PM]  
Will file an issue and claim it under my umbrella for now. Thanks

Jie Yu [4:46 PM]  
oh

[4:46]  
hold on

[4:46]  
we do have MemoryPressureMesosTest

[4:48]  
but I guess we don’t have a oom test

[4:48]  
memory pressure is mainly for the stats

[4:49]  
yeah, @zhitao, we should add a OOM test
```

> Create an unit test for OOM in Mesos containerizer's mem isolator
> -
>
> Key: MESOS-6415
> URL: https://issues.apache.org/jira/browse/MESOS-6415
> Project: Mesos
>  Issue Type: Improvement
>  Components: testing
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>Priority: Minor
>
> It seems like we don't have any integration test practicing the case of 
> exceeding container memory limit.
> We could add one to cgroups_isolator_tests.cpp.
> Good starting task for anyone interested in this area, including myself.





[jira] [Commented] (MESOS-6415) Create an unit test for OOM in Mesos containerizer's mem isolator

2016-10-18 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587109#comment-15587109
 ] 

Joseph Wu commented on MESOS-6415:
--

This technically counts, but I'd love to see this in a non-{{TEST_SCRIPT}} 
form, as those are pretty fragile tests.
https://github.com/apache/mesos/blob/1.1.x/src/tests/containerizer/cgroups_isolator_tests.cpp#L74-L77

> Create an unit test for OOM in Mesos containerizer's mem isolator
> -
>
> Key: MESOS-6415
> URL: https://issues.apache.org/jira/browse/MESOS-6415
> Project: Mesos
>  Issue Type: Improvement
>  Components: testing
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>Priority: Minor
>
> It seems like we don't have any integration test practicing the case of 
> exceeding container memory limit.
> We could add one to cgroups_isolator_tests.cpp.
> Good starting task for anyone interested in this area, including myself.





[jira] [Created] (MESOS-6415) Create an unit test for OOM in Mesos containerizer's mem isolator

2016-10-18 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-6415:


 Summary: Create an unit test for OOM in Mesos containerizer's mem 
isolator
 Key: MESOS-6415
 URL: https://issues.apache.org/jira/browse/MESOS-6415
 Project: Mesos
  Issue Type: Improvement
  Components: testing
Reporter: Zhitao Li
Assignee: Zhitao Li
Priority: Minor


It seems like we don't have any integration test practicing the case of 
exceeding container memory limit.

We could add one to cgroups_isolator_tests.cpp.

Good starting task for anyone interested in this area, including myself.
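The "balloon" idea from the Slack discussion can be sketched outside of Mesos (a hypothetical stand-in, not the actual BalloonFramework or the cgroups OOM killer): a child process allocates memory until a limit makes allocation fail, and the test asserts on how the child exited. Here the limit is an address-space rlimit rather than a cgroup memory limit, which keeps the sketch runnable without root.

```python
import resource
import subprocess
import sys

# Hypothetical "balloon": the child allocates 1 MiB at a time until the
# address-space limit makes allocation fail, standing in for a task
# exceeding its container memory limit.
BALLOON = r"""
import sys
blocks = []
try:
    while True:
        blocks.append(bytearray(1 << 20))  # grow by 1 MiB per iteration
except MemoryError:
    sys.exit(9)  # stand-in for being OOM-killed
"""

def run_ballooned(limit_bytes):
    def apply_limit():  # runs in the child just before exec (Linux only)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    return subprocess.run([sys.executable, "-c", BALLOON],
                          preexec_fn=apply_limit).returncode

print(run_ballooned(1 << 30))  # expect 9: the child hit the limit and "died"
```

A real cgroups OOM test would instead set `memory.limit_in_bytes` on the container's cgroup and assert that the task terminates with an OOM-related status.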





[jira] [Updated] (MESOS-5966) Add libprocess HTTP tests with SSL support

2016-10-18 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-5966:
--
Shepherd: Joseph Wu  (was: Vinod Kone)

> Add libprocess HTTP tests with SSL support
> --
>
> Key: MESOS-5966
> URL: https://issues.apache.org/jira/browse/MESOS-5966
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Libprocess contains SSL unit tests which test our SSL support using simple 
> sockets. We should add tests which also make use of libprocess's various HTTP 
> classes and helpers in a variety of SSL configurations.





[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2016-10-18 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-3753:
--
Shepherd: Joseph Wu  (was: Vinod Kone)

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can manually test this by spinning up an SSL-enabled master and attempt 
> to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certifications,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.





[jira] [Commented] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586935#comment-15586935
 ] 

ASF GitHub Bot commented on MESOS-6409:
---

Github user ronaldpetty closed the pull request at:

https://github.com/apache/mesos/pull/171


> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>Assignee: Ronald Petty
> Fix For: 1.2.0
>
>
> Fresh install on Ubuntu 16.04. I followed the POSIX build instructions, then 
> installed libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> Using 'curl' works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and an http:// prefix; those error out as well.
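The `'127.0.0.1:5050\n'` in the error suggests the master address picked up a trailing newline (e.g. from a file or flag) before being used as an HTTP header value. A sketch of the defensive normalization (hypothetical helper, not the actual mesos-ps code):

```python
# Sketch of the likely failure mode: a master address read from a flag or
# file keeps a trailing newline, which is an invalid HTTP header value.
# Normalizing first avoids "Invalid header value '127.0.0.1:5050\n'".
def normalize_master(addr):
    addr = addr.strip()  # drop stray whitespace such as the trailing '\n'
    if addr.startswith(("http://", "https://")):
        return addr
    return "http://" + addr

print(normalize_master("127.0.0.1:5050\n"))  # → http://127.0.0.1:5050
```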





[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-18 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586919#comment-15586919
 ] 

Jie Yu commented on MESOS-6414:
---

I don't fully understand the problem. What do you mean by "a mesos task is 
launched in a cgroup outside of the context of Mesos"?

> Task cleanup fails when the containers includes cgroups not owned by Mesos
> --
>
> Key: MESOS-6414
> URL: https://issues.apache.org/jira/browse/MESOS-6414
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>Priority: Minor
>
> If a mesos task is launched in a cgroup outside of the context of Mesos,  
> Mesos is unaware of that cgroup created in the task context.
> Now when the Mesos task terminates: Mesos tries to cleanup all cgroups within 
> the top level cgroup it knows about. If the cgroup created in the task 
> context exists when LinuxLauncherProcess::destroy() is called but is 
> eventually cleaned up by the container before we do a freeze() or thaw() or 
> remove(), it fails at those stages leading to an incomplete cleanup of the 
> container.





[jira] [Created] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-18 Thread Anindya Sinha (JIRA)
Anindya Sinha created MESOS-6414:


 Summary: Task cleanup fails when the containers includes cgroups 
not owned by Mesos
 Key: MESOS-6414
 URL: https://issues.apache.org/jira/browse/MESOS-6414
 Project: Mesos
  Issue Type: Bug
  Components: cgroups
Reporter: Anindya Sinha
Assignee: Anindya Sinha
Priority: Minor


If a Mesos task is launched in a cgroup outside of the context of Mesos, Mesos 
is unaware of that cgroup created in the task context.

Now when the Mesos task terminates, Mesos tries to clean up all cgroups within 
the top-level cgroup it knows about. If the cgroup created in the task context 
exists when LinuxLauncherProcess::destroy() is called but is eventually cleaned 
up by the container before we do a freeze() or thaw() or remove(), destroy() 
fails at those stages, leading to an incomplete cleanup of the container.





[jira] [Commented] (MESOS-6357) `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586817#comment-15586817
 ] 

Gilbert Song commented on MESOS-6357:
-

I thought the multi-digit fd bug did not affect Debian 8, but I just realized 
that the bug affects both Ubuntu and Debian, since their default sh is dash.

I still don't understand why {{ParentSigterm}} is flaky only on Debian 8 and 
not on Ubuntu; maybe it is due to a separate reason. We can post the fix for 
the fd issue first and see if the test is still flaky. 

> `NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit` is flaky in Debian 8.
> 
>
> Key: MESOS-6357
> URL: https://issues.apache.org/jira/browse/MESOS-6357
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 1.1.0
> Environment: Debian 8 with SSL enabled
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: flaky-test
>
> {noformat}
> [00:21:51] :   [Step 10/10] [ RUN  ] 
> NestedMesosContainerizerTest.ROOT_CGROUPS_ParentExit
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.357839 23530 
> containerizer.cpp:202] Using isolation: 
> cgroups/cpu,filesystem/linux,namespaces/pid,network/cni,volume/image
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.361143 23530 
> linux_launcher.cpp:150] Using /sys/fs/cgroup/freezer as the freezer hierarchy 
> for the Linux launcher
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.366930 23547 
> containerizer.cpp:557] Recovering containerizer
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.367962 23551 provisioner.cpp:253] 
> Provisioner recovery complete
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.368253 23549 
> containerizer.cpp:954] Starting container 
> 42589936-56b2-4e41-86d8-447bfaba4666 for executor 'executor' of framework 
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.368577 23548 cgroups.cpp:404] 
> Creating cgroup at 
> '/sys/fs/cgroup/cpu,cpuacct/mesos_test_458f8018-67e7-4cc6-8126-a535974db35d/42589936-56b2-4e41-86d8-447bfaba4666'
>  for container 42589936-56b2-4e41-86d8-447bfaba4666
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.369863 23544 cpu.cpp:103] Updated 
> 'cpu.shares' to 1024 (cpus 1) for container 
> 42589936-56b2-4e41-86d8-447bfaba4666
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.370384 23545 
> containerizer.cpp:1443] Launching 'mesos-containerizer' with flags 
> '--command="{"shell":true,"value":"read key <&30"}" --help="false" 
> --pipe_read="30" --pipe_write="34" 
> --pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" 
> --runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_sEbtvQ/containers/42589936-56b2-4e41-86d8-447bfaba4666"
>  --unshare_namespace_mnt="false" 
> --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_MqjHi0"'
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.370483 23544 
> linux_launcher.cpp:421] Launching container 
> 42589936-56b2-4e41-86d8-447bfaba4666 and cloning with namespaces CLONE_NEWNS 
> | CLONE_NEWPID
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.374867 23545 
> containerizer.cpp:1480] Checkpointing container's forked pid 14139 to 
> '/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_gzjeKG/meta/slaves/frameworks/executors/executor/runs/42589936-56b2-4e41-86d8-447bfaba4666/pids/forked.pid'
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.376519 23551 
> containerizer.cpp:1648] Starting nested container 
> 42589936-56b2-4e41-86d8-447bfaba4666.a5bc9913-c32c-40c6-ab78-2b08910847f8
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.377296 23549 
> containerizer.cpp:1443] Launching 'mesos-containerizer' with flags 
> '--command="{"shell":true,"value":"sleep 1000"}" --help="false" 
> --pipe_read="30" --pipe_write="34" 
> --pre_exec_commands="[{"arguments":["mesos-containerizer","mount","--help=false","--operation=make-rslave","--path=\/"],"shell":false,"value":"\/mnt\/teamcity\/work\/4240ba9ddd0997c3\/build\/src\/mesos-containerizer"},{"shell":true,"value":"mount
>  -n -t proc proc \/proc -o nosuid,noexec,nodev"}]" 
> --runtime_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_sEbtvQ/containers/42589936-56b2-4e41-86d8-447bfaba4666/containers/a5bc9913-c32c-40c6-ab78-2b08910847f8"
>  --unshare_namespace_mnt="false" 
> --working_directory="/mnt/teamcity/temp/buildTmp/NestedMesosContainerizerTest_ROOT_CGROUPS_ParentExit_MqjHi0/containers/a5bc9913-c32c-40c6-ab78-2b08910847f8"'
> [00:21:51]W:   [Step 10/10] I1008 00:21:51.377424 23548 
> linux_launcher.cpp:421] 

[jira] [Commented] (MESOS-6412) Improve socket connect error message.

2016-10-18 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586737#comment-15586737
 ] 

James Peach commented on MESOS-6412:


https://reviews.apache.org/r/52997/

> Improve socket connect error message.
> -
>
> Key: MESOS-6412
> URL: https://issues.apache.org/jira/browse/MESOS-6412
> Project: Mesos
>  Issue Type: Bug
>Reporter: James Peach
>Assignee: James Peach
>
> The error from {{Socket::connect}} just says it failed. Improve this to 
> report the error (from {{errno}}) and the address we are trying to connect 
> to. This gives the operator a fighting chance at debugging.
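
The improvement described above could be sketched as follows. This is an
illustrative helper only (the name `connectError` and its signature are
assumptions, not the Mesos API): the point is that the message carries both
the `strerror(errno)` text and the peer address.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: build a connect failure message that includes both
// the address we tried to reach and the errno description, instead of a
// bare "connect failed".
std::string connectError(const std::string& address, int errnoValue) {
  return "Failed to connect to " + address + ": " +
         std::strerror(errnoValue);
}
```

With a message of this shape, a refused connection reports something like
"Failed to connect to 10.0.0.1:5050: Connection refused", which is what gives
the operator a fighting chance.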



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6413) Support "rsync"-like features for review board scripts

2016-10-18 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6413:
--
Labels: mesosphere newbie  (was: mesosphere)

> Support "rsync"-like features for review board scripts
> --
>
> Key: MESOS-6413
> URL: https://issues.apache.org/jira/browse/MESOS-6413
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere, newbie
>
> When working on a branch containing many reviews, it can be easy to forget to 
> update the review-board copy of a review request after revising it and 
> rebasing. It would be useful to be able to:
> * automatically "rsync" a local branch against the version on review-board, 
> only creating new RB versions when the diff is non-empty
> * list which RRs at review-board are not the same as the corresponding local 
> RR
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5396) After failover, master does not remove agents with same UPID

2016-10-18 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-5396:
---
Shepherd: Vinod Kone

> After failover, master does not remove agents with same UPID
> 
>
> Key: MESOS-5396
> URL: https://issues.apache.org/jira/browse/MESOS-5396
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Critical
>  Labels: mesosphere
>
> Scenario:
> * master fails over
> * an agent host is restarted; the agent attempts to *register* (not 
> reregister) with Mesos using the same UPID as the previous agent instance; 
> this means it will get a new agent ID
> * framework isn't notified about the status of the tasks on the *old* agentID 
> until the {{agent_reregister_timeout}} expires (10 mins)
> This isn't necessarily wrong, but it is suboptimal: when the agent attempts 
> to register with the same UPID that was used by the previous agent instance, 
> we know that a *reregistration* attempt for the old (UPID, agentID) pair 
> will never be seen. Hence we can declare the old agentID to be gone forever 
> and notify frameworks appropriately, without waiting for the full 
> {{agent_reregister_timeout}} to expire.
> Note that we already implement the proposed behavior for the case when the 
> master does *not* failover 
> (https://github.com/apache/mesos/blob/0.28.1/src/master/master.cpp#L4162-L4172).
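
The proposed master-side behavior can be sketched roughly as below. This is
illustrative only, not the Mesos implementation: the map, the helper name
`markOldAgentGone`, and the string-typed IDs are all assumptions. The idea is
that a *registration* from a UPID already associated with a different agent ID
lets us mark the old agent ID gone immediately.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical bookkeeping: UPID -> currently registered agent ID.
// When a registration arrives on a known UPID with a new agent ID, the old
// agent ID can never reregister, so report it as gone instead of waiting
// for agent_reregister_timeout.
std::vector<std::string> markOldAgentGone(
    std::map<std::string, std::string>& agentIdByUpid,
    const std::string& upid,
    const std::string& newAgentId) {
  std::vector<std::string> gone;
  auto it = agentIdByUpid.find(upid);
  if (it != agentIdByUpid.end() && it->second != newAgentId) {
    gone.push_back(it->second);  // old agent ID will never reregister
  }
  agentIdByUpid[upid] = newAgentId;
  return gone;
}
```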





[jira] [Commented] (MESOS-6100) Make fails compiling 1.0.1

2016-10-18 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586714#comment-15586714
 ] 

Vinod Kone commented on MESOS-6100:
---

Cherry-picked to 1.0.x

commit aa6dad2117ef088ed086df760a9488e77376885d
Author: Neil Conway 
Date:   Wed Sep 21 10:23:47 2016 -0700

Fixed an uninitialized variable warning.

Observed with clang-tidy.

Review: https://reviews.apache.org/r/52113/


> Make fails compiling 1.0.1 
> ---
>
> Key: MESOS-6100
> URL: https://issues.apache.org/jira/browse/MESOS-6100
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: Alpine Linux  (Edge)
> GCC 6.1.1
>Reporter: Gennady Feldman
>Assignee: Kevin Klues
> Fix For: 1.0.2, 1.1.0
>
>
> linux/fs.cpp: In static member function 'static 
> Try 
> mesos::internal::fs::MountInfoTable::read(const Option&, bool)':
> linux/fs.cpp:152:27: error: 'rootParentId' may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
>  sortFrom(rootParentId);
>^
> cc1plus: all warnings being treated as errors
> P.S. This is something new since I am able to compile 1.0.0 just fine.
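
The warning above is the common GCC 6 pattern of a variable assigned only on
some paths and then read unconditionally. A minimal, self-contained
reproduction of the pattern and the usual remedy is sketched below; this is
illustrative only (the real fix is the cherry-picked commit in the comment
above), and `findRootParentId` is a hypothetical stand-in, not the Mesos code.

```cpp
// Before (schematically): `rootParentId` was assigned only inside a
// conditional, so GCC could not prove it was initialized when read:
//
//   int rootParentId;
//   ... if (...) { rootParentId = ...; } ...
//   sortFrom(rootParentId);  // warning: may be used uninitialized
//
// After: give the variable a definite initial value on every path.
int findRootParentId(const int* parentIds, int n) {
  int rootParentId = -1;  // definite initial value silences the warning
  for (int i = 0; i < n; i++) {
    if (parentIds[i] == 0) {  // hypothetical "root" condition
      rootParentId = i;
    }
  }
  return rootParentId;
}
```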





[jira] [Created] (MESOS-6413) Support "rsync"-like features for review board scripts

2016-10-18 Thread Neil Conway (JIRA)
Neil Conway created MESOS-6413:
--

 Summary: Support "rsync"-like features for review board scripts
 Key: MESOS-6413
 URL: https://issues.apache.org/jira/browse/MESOS-6413
 Project: Mesos
  Issue Type: Bug
Reporter: Neil Conway
Priority: Minor


When working on a branch containing many reviews, it can be easy to forget to 
update the review-board copy of a review request after revising it and 
rebasing. It would be useful to be able to:

* automatically "rsync" a local branch against the version on review-board, 
only creating new RB versions when the diff is non-empty
* list which RRs at review-board are not the same as the corresponding local RR

etc.





[jira] [Commented] (MESOS-6400) Not able to remove Orphan Tasks

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586677#comment-15586677
 ] 

Gilbert Song commented on MESOS-6400:
-

[~mithril], we may need more information to help you debug. What are your 
Mesos and Marathon versions?

Most likely your Marathon will receive a new framework ID after the restart. 
Can you find it from the master's frameworks endpoint? If so, the master is 
supposed to remove your old framework after a configurable timeout (7 days by 
default), after which your tasks from that unregistered framework should be 
cleaned up.

> Not able to remove Orphan Tasks
> ---
>
> Key: MESOS-6400
> URL: https://issues.apache.org/jira/browse/MESOS-6400
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: centos 7 x64
>Reporter: kasim
>
> The problem may be caused by Mesos and Marathon being out of sync:
> https://github.com/mesosphere/marathon/issues/616
> When I found orphan tasks happening, I:
> 1. restarted Marathon
> 2. Marathon did not sync the orphan tasks, but started new tasks.
> 3. The orphan tasks still hold resources, so I have to delete them.
> 4. I found all the orphan tasks are under framework 
> `ef169d8a-24fc-41d1-8b0d-c67718937a48-`,
> and curl -XGET `http://c196:5050/master/frameworks` shows that framework 
> under `unregistered_frameworks`:
> {code}
> {
> "frameworks": [
> .
> ],
> "completed_frameworks": [ ],
> "unregistered_frameworks": [
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-",
> "ef169d8a-24fc-41d1-8b0d-c67718937a48-"
> ]
> }
> {code}
> 5. Trying {code}curl -XPOST http://c196:5050/master/teardown -d 
> 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-' {code}
> returns `No framework found with specified ID`.
> So I have no way to delete the orphan tasks.





[jira] [Commented] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread Ronald Petty (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586610#comment-15586610
 ] 

Ronald Petty commented on MESOS-6409:
-

Sweet done!

> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>Assignee: Ronald Petty
> Fix For: 1.2.0
>
>
> Fresh install on Ubuntu 16.04. Followed the POSIX build instructions, then 
> installed libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> If you use 'curl' it works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and http://; those also error out.
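
The error string (`'127.0.0.1:5050\n'`) suggests a trailing newline from the
`--master` value leaking into an HTTP header. The actual fix is reportedly a
one-line change in the Python CLI; as a generic illustration of the remedy
(hypothetical helper, not the actual patch), strip trailing whitespace before
the value is used in a header:

```cpp
#include <string>

// Hypothetical helper: remove trailing newline/carriage-return/space/tab
// characters before using a user-supplied value in an HTTP header.
std::string stripTrailingWhitespace(std::string s) {
  while (!s.empty() && (s.back() == '\n' || s.back() == '\r' ||
                        s.back() == ' ' || s.back() == '\t')) {
    s.pop_back();
  }
  return s;
}
```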





[jira] [Created] (MESOS-6412) Improve socket connect error message.

2016-10-18 Thread James Peach (JIRA)
James Peach created MESOS-6412:
--

 Summary: Improve socket connect error message.
 Key: MESOS-6412
 URL: https://issues.apache.org/jira/browse/MESOS-6412
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach
Assignee: James Peach


The error from {{Socket::connect}} just says it failed. Improve this to report 
the error (from {{errno}}) and the address we are trying to connect to. This 
gives the operator a fighting chance at debugging.





[jira] [Assigned] (MESOS-4945) Garbage collect unused docker layers in the store.

2016-10-18 Thread Zhitao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhitao Li reassigned MESOS-4945:


Assignee: Zhitao Li

> Garbage collect unused docker layers in the store.
> --
>
> Key: MESOS-4945
> URL: https://issues.apache.org/jira/browse/MESOS-4945
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: Zhitao Li
>
> Right now, we don't have any garbage collection in place for docker layers. 
> It's not straightforward to implement because we don't know what container is 
> currently using the layer. We probably need a way to track the current usage 
> of layers.





[jira] [Commented] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586416#comment-15586416
 ] 

Gilbert Song commented on MESOS-6409:
-

You may need to find a shepherd to help commit the patch.

[~vinodkone] [~kaysoky], do you have cycles to help shepherd this patch? (It 
should be just a one-line change.) :)

> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>
> Fresh install on Ubuntu 16.04. Followed the POSIX build instructions, then 
> installed libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> If you use 'curl' it works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and http://; those also error out.





[jira] [Commented] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread Ronald Petty (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586360#comment-15586360
 ] 

Ronald Petty commented on MESOS-6409:
-

Thanks Gilbert.

https://reviews.apache.org/r/52993/

> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>
> Fresh install on Ubuntu 16.04. Followed the POSIX build instructions, then 
> installed libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> If you use 'curl' it works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and http://; those also error out.





[jira] [Updated] (MESOS-6398) mesos-agent executable: Failing commands display full help text.

2016-10-18 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-6398:

Issue Type: Improvement  (was: Bug)
   Summary: mesos-agent executable: Failing commands display full help 
text.  (was: mesos-agent executable: Failing commands display full help text)

> mesos-agent executable: Failing commands display full help text.
> 
>
> Key: MESOS-6398
> URL: https://issues.apache.org/jira/browse/MESOS-6398
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: DC/OS cluster on AWS, 1-master-2-agents (1 public and 1 
> private node) configuration.
>Reporter: Mischa Krüger
>Priority: Minor
>
> When executing a command (in a shell) with the `mesos-agent` binary that 
> fails for some reason (in my case it was setting 
> `--executor_environment_variables=config_file.json`, which didn't work ^^), 
> the full (insanely huge) help text gets displayed again, and the little 
> useful information at the beginning of the output scrolls off the top.
> This should definitely be changed to display the help text only if 
> `mesos-agent` is started without any command or with `--help`.





[jira] [Commented] (MESOS-6398) mesos-agent executable: Failing commands display full help text

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586346#comment-15586346
 ] 

Gilbert Song commented on MESOS-6398:
-

Agreed. Thanks, [~Makman2]. I have found this huge help text annoying before 
as well, since the only useful information is printed at the top; we have to 
scroll up to find out what exactly is wrong with the arguments.

We should improve the way the help text is printed.

> mesos-agent executable: Failing commands display full help text
> ---
>
> Key: MESOS-6398
> URL: https://issues.apache.org/jira/browse/MESOS-6398
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: DC/OS cluster on AWS, 1-master-2-agents (1 public and 1 
> private node) configuration.
>Reporter: Mischa Krüger
>Priority: Minor
>
> When executing a command (in a shell) with the `mesos-agent` binary that 
> fails for some reason (in my case it was setting 
> `--executor_environment_variables=config_file.json`, which didn't work ^^), 
> the full (insanely huge) help text gets displayed again, and the little 
> useful information at the beginning of the output scrolls off the top.
> This should definitely be changed to display the help text only if 
> `mesos-agent` is started without any command or with `--help`.





[jira] [Commented] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-10-18 Thread Aniket Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586344#comment-15586344
 ] 

Aniket Bhat commented on MESOS-6010:


(1) seems like the right way to go about it. But it may be a long-ish term fix. 


> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when running {{./src/mesos-execute}}, errors like the following occur:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull;
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}





[jira] [Commented] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586311#comment-15586311
 ] 

Gilbert Song commented on MESOS-6409:
-

[~ronald.petty], thanks for investigating this issue and sending out a PR. We 
do plan to improve the Mesos CLI further.

Unfortunately, we only accept pull requests on GitHub for documentation and 
the contributor list. Could you please submit a patch following this doc?
http://mesos.apache.org/documentation/latest/submitting-a-patch/

We will be more than happy to help you fix this issue.

> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>
> Fresh install on Ubuntu 16.04.  Follow posix instructions then install libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> If you use 'curl' it works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and http, also errors out.





[jira] [Commented] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586271#comment-15586271
 ] 

Gilbert Song commented on MESOS-6010:
-

The root cause is that our 3rd-party http_parser cannot handle an HTTP 
response that does not contain any headers or body; it returns `No response 
decoded`.

Currently, we are using `curl` to pull Docker images. When we first send a 
request to the registry server for the manifest, we expect an HTTP response 
like the following:
{noformat}

{noformat}

However, when the agent node is behind a proxy, the response returned from 
curl may contain the proxy CONNECT information as an extra HTTP response 
(which may or may not contain headers/body). Then the response may look like 
the following:
{noformat}

{noformat}

It is problematic for the 3rd-party http_parser (currently we are using 
http-parser-2.6.2) to parse a response without headers/body. We should either:
1. Fix the 3rd-party http_parser library, upgrade to a version that contains 
the fix, or find a substitute library.
2. Introduce a workaround to skip the response from the proxy (e.g., configure 
curl with flags to ignore the proxy response).

I would personally prefer (1), since similar issues may come up in the future.
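
Workaround (2) above could look roughly like this sketch: drop a proxy's
"200 Connection established" preamble before handing the bytes to the parser.
This is illustrative only; `skipProxyConnectResponse` is a hypothetical
helper, not the Mesos implementation.

```cpp
#include <string>

// If the raw bytes begin with a proxy CONNECT response, skip past it.
// Such a response carries no headers or body, so it ends at the first
// blank line; the real registry response follows.
std::string skipProxyConnectResponse(const std::string& raw) {
  const std::string preamble = "HTTP/1.1 200 Connection established";
  if (raw.compare(0, preamble.size(), preamble) == 0) {
    size_t end = raw.find("\r\n\r\n");
    if (end != std::string::npos) {
      return raw.substr(end + 4);
    }
  }
  return raw;
}
```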

> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when running {{./src/mesos-execute}}, errors like the following occur:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull;
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}





[jira] [Comment Edited] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586271#comment-15586271
 ] 

Gilbert Song edited comment on MESOS-6010 at 10/18/16 6:35 PM:
---

The root cause is that our 3rd-party http_parser cannot handle an HTTP 
response that does not contain any headers or body; it returns `No response 
decoded`.

Currently, we are using `curl` to pull Docker images. When we first send a 
request to the registry server for the manifest, we expect an HTTP response 
like the following:
{noformat}
HTTP/1.1 401 Unauthorized
Content-Type: application/json; charset=utf-8
Docker-Distribution-Api-Version: registry/2.0
Www-Authenticate: Bearer 
realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull;
Date: Tue, 09 Aug 2016 08:10:32 GMT
Content-Length: 145
Strict-Transport-Security: max-age=31536000

{"errors":[{"code":"UNAUTHORIZED","message":"authentication 
required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
{noformat}

However, when the agent node is behind a proxy, the response returned from 
curl may contain the proxy CONNECT information as an extra HTTP response 
(which may or may not contain headers/body). Then the response may look like 
the following:
{noformat}

{noformat}

It is problematic for the 3rd-party http_parser (currently we are using 
http-parser-2.6.2) to parse a response without headers/body. We should either:
1. Fix the 3rd-party http_parser library, upgrade to a version that contains 
the fix, or find a substitute library.
2. Introduce a workaround to skip the response from the proxy (e.g., configure 
curl with flags to ignore the proxy response).

I would personally prefer (1), since similar issues may come up in the future.


was (Author: gilbert):
The root cause is that our 3rd-party http_parser cannot handle an HTTP 
response that does not contain any headers or body; it returns `No response 
decoded`.

Currently, we are using `curl` to pull Docker images. When we first send a 
request to the registry server for the manifest, we expect an HTTP response 
like the following:
{noformat}

{noformat}

However, when the agent node is behind a proxy, the response returned from 
curl may contain the proxy CONNECT information as an extra HTTP response 
(which may or may not contain headers/body). Then the response may look like 
the following:
{noformat}

{noformat}

It is problematic for the 3rd-party http_parser (currently we are using 
http-parser-2.6.2) to parse a response without headers/body. We should either:
1. Fix the 3rd-party http_parser library, upgrade to a version that contains 
the fix, or find a substitute library.
2. Introduce a workaround to skip the response from the proxy (e.g., configure 
curl with flags to ignore the proxy response).

I would personally prefer (1), since similar issues may come up in the future.

> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when running {{./src/mesos-execute}}, errors like the following occur:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> 

[jira] [Commented] (MESOS-6010) Docker registry puller shows decode error "No response decoded".

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586096#comment-15586096
 ] 

Gilbert Song commented on MESOS-6010:
-

Chatted with [~abhat]; their Mesos agent nodes are OpenStack compute 
instances hosted behind a proxy.

> Docker registry puller shows decode error "No response decoded".
> 
>
> Key: MESOS-6010
> URL: https://issues.apache.org/jira/browse/MESOS-6010
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Sunzhe
>  Labels: Docker, mesos-containerizer
>
> The {{mesos-agent}} flags:
> {code}
>  GLOG_v=1 ./bin/mesos-agent.sh \
>   --master=zk://${MESOS_MASTER_IP}:2181/mesos  \
>   --ip=10.100.3.3  \
>   --work_dir=${MESOS_WORK_DIR} \
>   
> --isolation=cgroups/devices,gpu/nvidia,disk/du,docker/runtime,filesystem/linux
>  \
>   --enforce_container_disk_quota \
>   --containerizers=mesos \
>   --image_providers=docker \
>   --executor_environment_variables="{}"
> {code}
> And the {{mesos-execute}} flags:
> {code}
>  ./src/mesos-execute \
>--master=${MESOS_MASTER_IP}:5050 \
>--name=${INSTANCE_NAME} \
>--docker_image=${DOCKER_IMAGE} \
>--framework_capabilities=GPU_RESOURCES \
>--shell=false
> {code}
> But when running {{./src/mesos-execute}}, errors like the following occur:
> {code}
> I0809 16:11:46.207875 25583 scheduler.cpp:172] Version: 1.0.0
> I0809 16:11:46.212442 25582 scheduler.cpp:461] New master detected at 
> master@10.103.0.125:5050
> Subscribed with ID '168ab900-ee7e-4829-a59a-d16de956637e-0009'
> Submitted task 'test' to agent '168ab900-ee7e-4829-a59a-d16de956637e-S1'
> Received status update TASK_FAILED for task 'test'
>   message: 'Failed to launch container: Failed to decode HTTP responses: No 
> response decoded
> HTTP/1.1 200 Connection established
> HTTP/1.1 401 Unauthorized
> Content-Type: application/json; charset=utf-8
> Docker-Distribution-Api-Version: registry/2.0
> Www-Authenticate: Bearer 
> realm="https://auth.docker.io/token",service="registry.docker.io",scope="repository:library/redis:pull;
> Date: Tue, 09 Aug 2016 08:10:32 GMT
> Content-Length: 145
> Strict-Transport-Security: max-age=31536000
> {"errors":[{"code":"UNAUTHORIZED","message":"authentication 
> required","detail":[{"Type":"repository","Name":"library/redis","Action":"pull"}]}]}
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}





[jira] [Updated] (MESOS-6014) Create a CNI plugin that provides port mapping functionality for various CNI plugins.

2016-10-18 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6014:
-
Target Version/s: 1.1.0  (was: 1.1.1)

> Create a CNI plugin that provides port mapping functionality for various CNI 
> plugins.
> -
>
> Key: MESOS-6014
> URL: https://issues.apache.org/jira/browse/MESOS-6014
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> Currently there is no CNI plugin that supports port mapping. Given that the 
> unified containerizer is starting to become the de facto container runtime, 
> having a CNI plugin that provides port mapping is a must-have. It is 
> primarily required to support BRIDGE networking mode, similar to the Docker 
> bridge networking that users expect when using Docker containers. 
> While the most obvious use case is pairing the port-mapper plugin with the 
> bridge plugin, the port-mapping functionality itself is generic and should 
> be usable with any CNI plugin that needs it.
> Keeping port mapping as a CNI plugin gives operators the ability to use the 
> default port-mapper (CNI plugin) that Mesos provides, or to use their own 
> plugin.





[jira] [Commented] (MESOS-6378) Error downloading docker images using mesos-execute cli

2016-10-18 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586027#comment-15586027
 ] 

Gilbert Song commented on MESOS-6378:
-

[~abhat], I am closing this issue since it duplicates MESOS-6010. Let's 
continue the conversation in that JIRA.

> Error downloading docker images using mesos-execute cli
> ---
>
> Key: MESOS-6378
> URL: https://issues.apache.org/jira/browse/MESOS-6378
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, fetcher
>Affects Versions: 1.0.1
>Reporter: Aniket Bhat
> Attachments: mesos-slave.INFO, mesos-slave.log
>
>
> When using mesos-execute cli with mesos containerizer to spawn a docker 
> image, the curl for the image fails. 
> {code}
> [root@host-62-214 mesos]#  sudo mesos-execute --command=/bin/bash 
> --docker_image=library/ubuntu:latest --master=172.22.62.215:5050 
> --name="yeah" --containerizer=mesos
> I1012 16:30:15.426878 15457 scheduler.cpp:172] Version: 1.0.1
> I1012 16:30:15.430287 15462 scheduler.cpp:461] New master detected at 
> master@172.22.62.215:5050
> Subscribed with ID 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-0036'
> Submitted task 'yeah' to agent 'cd0ce0ef-330f-441b-8189-ab1a1ee760d1-S10'
> Received status update TASK_FAILED for task 'yeah'
>   message: 'Failed to launch container: Failed to perform 'curl': curl: (7) 
> Failed connect to registry-1.docker.io:443; Operation now in progress
> ; Container destroyed while provisioning images'
>   source: SOURCE_AGENT
>   reason: REASON_CONTAINER_LAUNCH_FAILED
> {code}
> Mesos-slave args:
> {code}
> /usr/sbin/mesos-slave --master=zk://172.22.62.215:2181/mesos 
> --log_dir=/var/log/mesos --containerizers=mesos,docker 
> --executor_registration_timeout=5mins --image_providers=appc,docker 
> --isolation=filesystem/linux,docker/runtime --work_dir=/var/lib/mesos
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6040) Add a CMake build for `mesos-port-mapper`

2016-10-18 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6040:
-
Sprint: Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere Sprint 45  
(was: Mesosphere Sprint 41, Mesosphere Sprint 42)

> Add a CMake build for `mesos-port-mapper`
> -
>
> Key: MESOS-6040
> URL: https://issues.apache.org/jira/browse/MESOS-6040
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>Priority: Blocker
>  Labels: mesosphere
>
> Once the port-mapper binary compiles with GNU make, we need to modify the 
> CMake to build the port-mapper binary as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586014#comment-15586014
 ] 

ASF GitHub Bot commented on MESOS-6409:
---

GitHub user ronaldpetty opened a pull request:

https://github.com/apache/mesos/pull/171

Update cli.py

The CLI tools are failing per #MESOS-6409 in 1.0.1.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ronaldpetty/mesos master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mesos/pull/171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #171


commit 3891871f256ddb75d4643253cd3cd75f97891e16
Author: Ronald Petty 
Date:   2016-10-18T17:11:59Z

Update cli.py

The CLI tools are failing per #MESOS-6409 in 1.0.1.




> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>
> Fresh install on Ubuntu 16.04. Followed the POSIX instructions, then installed libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> If you use 'curl' it works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and http, also errors out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread Ronald Petty (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586007#comment-15586007
 ] 

Ronald Petty commented on MESOS-6409:
-

Agree.

> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>
> Fresh install on Ubuntu 16.04. Followed the POSIX instructions, then installed libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> If you use 'curl' it works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and http, also errors out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread Ronald Petty (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ronald Petty updated MESOS-6409:

Comment: was deleted

(was: Also noticed this code is repeated in the CLI tools and could use some 
DRY'ing up. If someone with more knowledge than I have agrees (or has other 
advice), I am happy to jump in and work on this. I still assume it's me (user 
error), but it's unclear how at the moment.)

> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>
> Fresh install on Ubuntu 16.04. Followed the POSIX instructions, then installed libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> If you use 'curl' it works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and http, also errors out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6409) mesos-ps - Invalid header value

2016-10-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585922#comment-15585922
 ] 

haosdent commented on MESOS-6409:
-

[~ronald.petty] Thank you for reporting this. It is caused by `mesos-resolve` 
not trimming its output (the trailing {{\n}}).

I think we could fix it this way:

{code}
diff --git a/src/python/cli/src/mesos/cli.py b/src/python/cli/src/mesos/cli.py
index f342992..4a9b558 100644
--- a/src/python/cli/src/mesos/cli.py
+++ b/src/python/cli/src/mesos/cli.py
@@ -51,7 +51,7 @@ def resolve(master):
 raise Exception('Failed to execute \'mesos-resolve %s\':\n%s'
 % (master, process.stderr.read()))

-result = process.stdout.read()
+result = process.stdout.read().strip()
 process.stdout.close()
 process.stderr.close()
 return result
{code}
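Outside Mesos, the effect of the missing {{strip()}} can be sketched like this ({{echo}} is just a stand-in for {{mesos-resolve}}: any CLI tool whose stdout ends with a newline):

```python
import subprocess

# `echo` stands in for `mesos-resolve`: its stdout ends with '\n', and
# that newline is illegal inside an HTTP header value such as Host.
process = subprocess.Popen(["echo", "127.0.0.1:5050"],
                           stdout=subprocess.PIPE)
raw = process.stdout.read().decode()   # '127.0.0.1:5050\n'
clean = raw.strip()                    # '127.0.0.1:5050'

assert raw.endswith("\n")
assert clean == "127.0.0.1:5050"
```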

> mesos-ps - Invalid header value
> ---
>
> Key: MESOS-6409
> URL: https://issues.apache.org/jira/browse/MESOS-6409
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.0.1
> Environment: Linux
>Reporter: Ronald Petty
>
> Fresh install on Ubuntu 16.04. Followed the POSIX instructions, then installed libz.
> user@nodea:~$ mesos-ps --master=127.0.0.1:5050
> Failed to get the master state: Invalid header value '127.0.0.1:5050\n'
> Master log shows:
> I1017 21:08:28.685926 70112 http.cpp:381] HTTP GET for /master/state from 
> 127.0.0.1:50526 with User-Agent='Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
> rv:49.0) Gecko/20100101 Firefox/49.0'
> If you use 'curl' it works:
> curl localhost:5050/state.json
> I also tried 127.0.0.1 and http, also errors out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6383) NvidiaGpuAllocator::resources cannot load symbol nvmlGetDeviceMinorNumber - can the device minor number be ascertained reliably using an older set of API calls?

2016-10-18 Thread Kevin Klues (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585846#comment-15585846
 ] 

Kevin Klues commented on MESOS-6383:


Hi Dylan,

Thanks for reporting this.  I see the problem, but it's not immediately clear 
to me what the solution would be.  We don't want to just parse the output of 
{{nvidia-smi}} because that also changes from version to version (we talked 
with Nvidia directly about this, and they *highly* discouraged trying to rely 
on the output of {{nvidia-smi}}).

One thing I could imagine doing is to change the code that attempts to load the 
{{nvmlDeviceGetMinorNumber}} symbol from NVML. It could attempt to load the 
symbol, and if that failed, it would fall back to implementing our wrapper 
function for {{nvml::deviceGetMinorNumber()}} using a different method (meaning 
there would be no changes to {{NvidiaGpuAllocator}}). Do you know what (if any) 
methods were available in the 5.319 driver to determine the minor number? How 
does the old {{nvidia-smi}} determine them?

Also, are you sure this is the only symbol we aren't able to load from the old 
driver, or did you just hit this one first?
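The load-and-fall-back idea can be sketched as a symbol probe; this sketch uses libc instead of {{libnvidia-ml.so.1}} purely so it is self-contained, and the fallback itself is left abstract:

```python
import ctypes
import ctypes.util

# Probe a shared library for a symbol; return None when it is absent.
# libc stands in for libnvidia-ml.so.1 so this sketch runs anywhere.
def load_symbol(lib, name):
    try:
        return getattr(lib, name)  # dlsym() under the hood
    except AttributeError:
        return None                # symbol missing: take the fallback path

libc = ctypes.CDLL(ctypes.util.find_library("c"))

assert load_symbol(libc, "getpid") is not None
assert load_symbol(libc, "nvmlDeviceGetMinorNumber") is None
```

In the C++ wrapper the same probe would decide, once at initialization, whether {{deviceGetMinorNumber}} dispatches to NVML or to the alternative method.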

> NvidiaGpuAllocator::resources cannot load symbol nvmlGetDeviceMinorNumber - 
> can the device minor number be ascertained reliably using an older set of API 
> calls?
> 
>
> Key: MESOS-6383
> URL: https://issues.apache.org/jira/browse/MESOS-6383
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.0.1
>Reporter: Dylan Bethune-Waddell
>Priority: Minor
>  Labels: gpu
>
> We're attempting to deploy Mesos on a cluster with 2 Nvidia GPUs per host. We 
> are not in a position to upgrade the Nvidia drivers in the near future, and 
> are currently at driver version 319.72
> When attempting to launch an agent with the following command and take 
> advantage of Nvidia GPU support (master address elided):
> bq. {{./bin/mesos-agent.sh --master=: 
> --work_dir=/tmp/mesos --isolation="cgroups/devices,gpu/nvidia"}}
> I receive the following error message:
> bq. {{Failed to create a containerizer: Failed call to 
> NvidiaGpuAllocator::resources: Failed to nvml::initialize: Failed to load 
> symbol 'nvmlDeviceGetMinorNumber': Error looking up symbol 
> 'nvmlDeviceGetMinorNumber' in 'libnvidia-ml.so.1' : 
> /usr/lib64/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetMinorNumber}}
> Based on the change log for the NVML module, it seems that 
> {{nvmlDeviceGetMinorNumber}} is only available for driver versions 331 and 
> later as per info under the [Changes between NVML v5.319 Update and 
> v331|http://docs.nvidia.com/deploy/nvml-api/change-log.html#change-log] 
> heading in the NVML API reference.
> Is there an alternate method of obtaining this information at runtime to 
> enable support for older versions of the Nvidia driver? Based on discussion 
> in the design document, obtaining this information from the {{nvidia-smi}} 
> command output is a feasible alternative. 
> I am willing to submit a PR that amends the behaviour of 
> {{NvidiaGpuAllocator}} such that it first attempts calls to 
> {{nvml::nvmlGetDeviceMinorNumber}} via libnvidia-ml, and if the symbol cannot 
> be found, falls back on {{--nvidia-smi="/path/to/nvidia-smi"}} option 
> obtained from mesos-agent if provided or attempts to run {{nvidia-smi}} if 
> found on path and parses the output to obtain this information. Otherwise, 
> raise an exception indicating all this was attempted.
> Would a function or class for parsing {{nvidia-smi}} output be a useful 
> contribution?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5396) After failover, master does not remove agents with same UPID

2016-10-18 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-5396:
---
Description: 
Scenario:

* master fails over
* an agent host is restarted; the agent attempts to *register* (not reregister) 
with Mesos using the same UPID as the previous agent instance; this means it 
will get a new agent ID
* framework isn't notified about the status of the tasks on the *old* agentID 
until the {{agent_reregister_timeout}} expires (10 mins)

This isn't necessarily wrong but it is suboptimal: when the agent attempts to 
register with the same UPID that was used by the previous agent instance, we 
know that a *reregistration* attempt for the old <UPID, agentID> pair will 
never be seen. Hence we can declare the old agentID to be gone-forever and 
notify frameworks appropriately, without waiting for the full 
{{agent_reregister_timeout}} to expire.

Note that we already implement the proposed behavior for the case when the 
master does *not* failover 
(https://github.com/apache/mesos/blob/0.28.1/src/master/master.cpp#L4162-L4172).

  was:
Scenario:

* master fails over
* an agent host is restarted; the agent attempts to *register* (not reregister) 
with Mesos using the same UPID as the previous agent instance; this means it 
will get a new agent ID
* framework isn't notified about the status of the tasks on the *old* slaveID 
until the {{agent_reregister_timeout}} expires (10 mins)

This isn't necessarily wrong but it is suboptimal: when the agent attempts to 
register with the same UPID that was used by the previous agent instance, we 
know that a *reregistration* attempt for the old <UPID, agentID> pair will 
never be seen. Hence we can declare the old agentID to be gone-forever and 
notify frameworks appropriately, without waiting for the full 
{{agent_reregister_timeout}} to expire.

Note that we already implement the proposed behavior for the case when the 
master does *not* failover 
(https://github.com/apache/mesos/blob/0.28.1/src/master/master.cpp#L4162-L4172).


> After failover, master does not remove agents with same UPID
> 
>
> Key: MESOS-5396
> URL: https://issues.apache.org/jira/browse/MESOS-5396
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Critical
>  Labels: mesosphere
>
> Scenario:
> * master fails over
> * an agent host is restarted; the agent attempts to *register* (not 
> reregister) with Mesos using the same UPID as the previous agent instance; 
> this means it will get a new agent ID
> * framework isn't notified about the status of the tasks on the *old* agentID 
> until the {{agent_reregister_timeout}} expires (10 mins)
> This isn't necessarily wrong but it is suboptimal: when the agent attempts to 
> register with the same UPID that was used by the previous agent instance, we 
> know that a *reregistration* attempt for the old <UPID, agentID> pair will 
> never be seen. Hence we can declare the old agentID to be gone-forever and 
> notify frameworks appropriately, without waiting for the full 
> {{agent_reregister_timeout}} to expire.
> Note that we already implement the proposed behavior for the case when the 
> master does *not* failover 
> (https://github.com/apache/mesos/blob/0.28.1/src/master/master.cpp#L4162-L4172).
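The proposed master-side behavior can be sketched as follows (names and data structures are hypothetical, not Mesos code):

```python
# Hypothetical sketch of the proposed logic: when an agent *registers*
# (not reregisters) from a UPID the master already knows, the old agent
# ID behind that UPID can never reregister, so it can be declared gone
# immediately instead of waiting for agent_reregister_timeout.
registered = {"slave(1)@10.0.0.1:5051": "old-agent-id"}

def handle_register(upid, new_agent_id, notify_gone):
    old_agent_id = registered.get(upid)
    if old_agent_id is not None and old_agent_id != new_agent_id:
        notify_gone(old_agent_id)  # frameworks learn about lost tasks now
    registered[upid] = new_agent_id

gone = []
handle_register("slave(1)@10.0.0.1:5051", "new-agent-id", gone.append)
assert gone == ["old-agent-id"]
```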



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-5396) After failover, master does not remove agents with same UPID

2016-10-18 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-5396:
---
Description: 
Scenario:

* master fails over
* an agent host is restarted; the agent attempts to *register* (not reregister) 
with Mesos using the same UPID as the previous agent instance; this means it 
will get a new agent ID
* framework isn't notified about the status of the tasks on the *old* slaveID 
until the {{agent_reregister_timeout}} expires (10 mins)

This isn't necessarily wrong but it is suboptimal: when the agent attempts to 
register with the same UPID that was used by the previous agent instance, we 
know that a *reregistration* attempt for the old <UPID, agentID> pair will 
never be seen. Hence we can declare the old agentID to be gone-forever and 
notify frameworks appropriately, without waiting for the full 
{{agent_reregister_timeout}} to expire.

Note that we already implement the proposed behavior for the case when the 
master does *not* failover 
(https://github.com/apache/mesos/blob/0.28.1/src/master/master.cpp#L4162-L4172).

  was:
Scenario:

* master fails over
* an agent host is restarted; the agent attempts to register with Mesos using 
the same UPID as the previous agent instance; this means it will get a new 
agent ID
* framework isn't notified about the status of the tasks on the *old* slaveID 
until the slave_reregister_timeout expires (10 mins)

This isn't necessarily wrong, but it is suboptimal: when the slave attempts to 
register with the same UPID that was used by the previous slave instance, we 
know that a *reregistration* attempt for the old <UPID, slaveID> pair will 
never be seen. Hence we can declare the old slaveID to be gone-forever and 
notify frameworks appropriately, without waiting for the full 
slave_reregister_timeout to expire.

Note that we already implement the proposed behavior for the case when the 
master does *not* failover 
(https://github.com/apache/mesos/blob/0.28.1/src/master/master.cpp#L4162-L4172).


> After failover, master does not remove agents with same UPID
> 
>
> Key: MESOS-5396
> URL: https://issues.apache.org/jira/browse/MESOS-5396
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Critical
>  Labels: mesosphere
>
> Scenario:
> * master fails over
> * an agent host is restarted; the agent attempts to *register* (not 
> reregister) with Mesos using the same UPID as the previous agent instance; 
> this means it will get a new agent ID
> * framework isn't notified about the status of the tasks on the *old* slaveID 
> until the {{agent_reregister_timeout}} expires (10 mins)
> This isn't necessarily wrong but it is suboptimal: when the agent attempts to 
> register with the same UPID that was used by the previous agent instance, we 
> know that a *reregistration* attempt for the old <UPID, agentID> pair will 
> never be seen. Hence we can declare the old agentID to be gone-forever and 
> notify frameworks appropriately, without waiting for the full 
> {{agent_reregister_timeout}} to expire.
> Note that we already implement the proposed behavior for the case when the 
> master does *not* failover 
> (https://github.com/apache/mesos/blob/0.28.1/src/master/master.cpp#L4162-L4172).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3335) FlagsBase copy-ctor leads to dangling pointer.

2016-10-18 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585813#comment-15585813
 ] 

Michael Park commented on MESOS-3335:
-

{noformat}
commit 92fb8e7ee2280b5b57840fd3d2f5f3e3814c
Author: Benjamin Bannier 
Date:   Tue Oct 18 03:29:54 2016 -0400

Removed the non-member pointer overloads of `FlagsBase::add`.

Review: https://reviews.apache.org/r/46825/
{noformat}
{noformat}
commit f441eb9adb8c1443e62e10d17ed4019b66391168
Author: Benjamin Bannier 
Date:   Tue Oct 18 04:48:10 2016 -0400

Fully qualified addresses of flag members in `add` calls in mesos.

While right now we can technically `add` variables to `Flags` classes
which are not members, the in order to have correct copy semantics for
`Flags` only member variables should be used.

Here we changed all instances to a full pointer-to-member syntax in
the current code.

Review: https://reviews.apache.org/r/46824/
{noformat}
{noformat}
commit dde5eee7b11be8df874571316e29a9a25ae59150
Author: Benjamin Bannier 
Date:   Tue Oct 18 03:29:39 2016 -0400

Fully qualified addresses of flag members in `add` calls in libprocess.

While right now we can technically `add` variables to `Flags` classes
which are not members, the in order to have correct copy semantics for
`Flags` only member variables should be used.

Here we changed all instances to a full pointer-to-member syntax in
the current code.

Review: https://reviews.apache.org/r/46823/
{noformat}
{noformat}
commit 5aeecca345f80d5d34c9a8c8b64d460bcd773e30
Author: Benjamin Bannier 
Date:   Tue Oct 18 03:29:14 2016 -0400

Fully qualified addresses of flag members in `add` calls in stout.

While right now we can technically `add` variables to `Flags` classes
which are not members, the in order to have correct copy semantics for
`Flags` only member variables should be used.

Here we changed all instances to a full pointer-to-member syntax in
the current code.

Review: https://reviews.apache.org/r/52390/
{noformat}

> FlagsBase copy-ctor leads to dangling pointer.
> --
>
> Key: MESOS-3335
> URL: https://issues.apache.org/jira/browse/MESOS-3335
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Benjamin Bannier
>  Labels: mesosphere
> Attachments: lambda_capture_bug.cpp
>
>
> Per [#3328], ubsan detects the following problem:
> [ RUN ] FaultToleranceTest.ReregisterCompletedFrameworks
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp:303:25:
>  runtime error: load of value 33, which is not a valid value for type 'bool'
> I believe what is going on here is the following:
> * The test calls StartMaster(), which does MesosTest::CreateMasterFlags()
> * MesosTest::CreateMasterFlags() allocates a new master::Flags on the stack, 
> which is subsequently copy-constructed back to StartMaster()
> * The FlagsBase constructor is:
> bq. {{FlagsBase() { add(&help, "help", "...", false); }}}
> where "help" is a member variable -- i.e., it is allocated on the stack in 
> this case.
> * {{FlagsBase()::add}} captures {{&help}}, e.g.:
> {noformat}
> flag.stringify = [t1](const FlagsBase&) -> Option<std::string> {
>   return stringify(*t1);
> };
> {noformat}
> * The implicit copy constructor for FlagsBase is just going to copy the 
> lambda above, i.e., the result of the copy constructor will have a lambda 
> that points into MesosTest::CreateMasterFlags()'s stack frame, which is bad 
> news.
> Not sure the right fix -- comments welcome. You could define a copy-ctor for 
> FlagsBase that does something gross (basically remove the old help flag and 
> define a new one that points into the target of the copy), but that seems, 
> well, gross.
> Probably not a pressing-problem to fix -- AFAICS worst symptom is that we end 
> up reading one byte from some random stack location when serving 
> {{state.json}}, for example.
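A Python analogy of the capture problem (Python closures keep the original object alive rather than dangling, but the surprise is the same: a copied flags object whose closure still reads the *original* instance's field):

```python
import copy

class Flags:
    def __init__(self):
        self.help = False
        # Like the C++ ctor, the closure is bound to *this particular
        # instance*; a copied Flags object carries the closure unchanged.
        self.stringify = lambda: str(self.help)

f1 = Flags()
f2 = copy.copy(f1)   # shallow copy, like the implicit C++ copy-ctor
f2.help = True

# The copied object's stringify still reads f1.help, not f2.help:
assert f2.stringify() == "False"
```

In the C++ case the captured address additionally points into a dead stack frame, which is what ubsan flags.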



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2016-10-18 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585762#comment-15585762
 ] 

Benjamin Bannier commented on MESOS-3160:
-

[~tillt]: This test is "disabled" by an {{ASSERT}} on systems with swap 
enabled:
{code}
// TODO(vinod): Instead of asserting here dynamically disable
// the test if swap is enabled on the host.
ASSERT_EQ(memory.get().totalSwap, Bytes(0))
{code}

Instead you should either disable swap on your host, or filter that test 
yourself for the time being.
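The TODO's dynamic check could look roughly like this (a sketch, not the test suite's actual helper):

```python
# Sketch: read SwapTotal from /proc/meminfo so the test could skip
# itself dynamically instead of asserting, per the TODO above.
def total_swap_kb():
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("SwapTotal:"):
                    return int(line.split()[1])  # value is in kB
    except OSError:
        pass  # not Linux, or /proc unavailable
    return 0

if total_swap_kb() > 0:
    print("swap enabled; skipping memory-pressure test")
```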

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
>Reporter: Paul Brett
>  Labels: cgroups, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2016-10-18 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585618#comment-15585618
 ] 

Till Toenshoff edited comment on MESOS-3160 at 10/18/16 2:46 PM:
-

Just saw it failing on Centos6 in an SSL build as well.

{noformat}
[ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS
../../../src/tests/containerizer/cgroups_tests.cpp:1093: Failure
Value of: Bytes(0)
  Actual: 0B
Expected: memory.get().totalSwap
Which is: 61437944KB
-
We cannot run this test because it appears you have swap
enabled, but feel free to disable this test.
-
{noformat}


was (Author: tillt):
Just saw it failing on Centos6 in an SSL build as well.

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
>Reporter: Paul Brett
>  Labels: cgroups, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2918) CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky

2016-10-18 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585626#comment-15585626
 ] 

Till Toenshoff commented on MESOS-2918:
---

Just saw it failing on Centos6 on an SSL build:

{noformat}
[ RUN  ] CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen
../../../src/tests/containerizer/cgroups_tests.cpp:573: Failure
Value of: Bytes(0)
  Actual: 0B
Expected: memory.get().totalSwap
Which is: 61437944KB
-
We cannot run this test because it appears you have swap
enabled, but feel free to disable this test.
-
{noformat}


> CgroupsAnyHierarchyWithCpuMemoryTest.ROOT_CGROUPS_Listen Flaky
> --
>
> Key: MESOS-2918
> URL: https://issues.apache.org/jira/browse/MESOS-2918
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, test
>Affects Versions: 0.23.0
>Reporter: Paul Brett
>Assignee: Chi Zhang
>  Labels: flaky, flaky-test, mesosphere, test, twitter
>
> This test fails when swap is enabled on the platform because it creates a 
> memory hog with the expectation that the OOM killer will kill the hog but 
> with swap enabled, the hog is just swapped out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2016-10-18 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585618#comment-15585618
 ] 

Till Toenshoff commented on MESOS-3160:
---

Just saw it failing on Centos6 in an SSL build as well.

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
>Reporter: Paul Brett
>  Labels: cgroups, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6404) My program cannot access a .so file while being run with mesos containerization on a docker image.

2016-10-18 Thread Mark Hammons (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585405#comment-15585405
 ] 

Mark Hammons commented on MESOS-6404:
-

I would attach the original image, but it's massive and I don't have a 
Dockerfile for it.

> My program cannot access a .so file while being run with mesos 
> containerization on a docker image.
> --
>
> Key: MESOS-6404
> URL: https://issues.apache.org/jira/browse/MESOS-6404
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: CentOS Linux release 7.2.1511 (Core) 
>Reporter: Mark Hammons
>Priority: Minor
> Attachments: Dockerfile, IUWT_140926aR_t000_ch00.log
>
>
> I have an application compiled within a Docker environment called 
> ubuntu-mesos:0.11-17102016-IUWT. I've defined the executor for said 
> application with the following code:
> {code}
> val iuwtURI = CommandInfo.URI.newBuilder()
>   .setValue("http://***/IUWT.tar.gz")
>   .setExtract(true)
>   .setCache(false)
>   .build()
> val iuwtjURI = CommandInfo.URI.newBuilder()
>   .setValue("http://***/iuwtExecutor-assembly-0.1-SNAPSHOT.jar")
>   .setExecutable(false)
>   .setCache(false)
>   .build()
> val iuwtExec = "java -jar iuwtExecutor-assembly-0.1-SNAPSHOT.jar -Xmx1024M -Xmx128M"
> val iuwtCommand = CommandInfo.newBuilder
>   .setValue(iuwtExec)
>   .addAllUris(List(iuwtjURI, iuwtURI).asJava)
>   .setShell(true)
>   .build()
> val iuwtImageInfo = Image.newBuilder()
>   .setType(Image.Type.DOCKER)
>   .setDocker(Image.Docker.newBuilder.setName("ubuntu-mesos:0.11-17102016-IUWT").build())
>   .build()
> val iuwtContInfo = ContainerInfo.MesosInfo.newBuilder()
>   .setImage(iuwtImageInfo)
>   .build()
> val iuwtContainer = ContainerInfo.newBuilder()
>   .setMesos(iuwtContInfo)
>   .setType(ContainerInfo.Type.MESOS)
>   .build()
> val iuwtExecutor = ExecutorInfo.newBuilder()
>   .setCommand(iuwtCommand)
>   .setContainer(iuwtContainer)
>   .setExecutorId(ExecutorID.newBuilder().setValue("iuwt-executor"))
>   .setName("iuwt-executor")
>   .build()
> {code}
> My executor then downloads some additional data and then tries to launch the 
> application with the input data. Unfortunately, the application fails to 
> launch because "exec: error while loading shared libraries: libtiff.so.5: 
> cannot open shared object file: No such file or directory". I've attached 
> logs showing that libtiff.so.5 is present both in /usr/lib/x86_64-linux-gnu 
> and in /usr/lib.
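Independent of the root cause, a loader error like this can be narrowed down from inside the container image. A diagnostic sketch (the binary path is a placeholder, not taken from the attached logs):

```shell
# Placeholder path; run inside the provisioned container rootfs.
# List the shared libraries the binary needs and flag unresolved ones:
ldd /opt/iuwt/your-binary | grep 'not found'

# Check whether the dynamic loader's cache knows about libtiff:
ldconfig -p | grep libtiff

# If the .so is on disk but still unresolved, the image's loader cache may
# be stale; regenerating it inside the rootfs can help:
ldconfig
```

Here the library was visible in both /usr/lib/x86_64-linux-gnu and /usr/lib, which points at the file being masked at the filesystem layer (as the later comments about the Dockerfile's {{rm}} suggest) rather than at a search-path problem.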



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6312) Update CHANGELOG to mention addition of agent '--runtime_dir' flag.

2016-10-18 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6312:
--
Summary: Update CHANGELOG to mention addition of agent '--runtime_dir' 
flag.  (was: Update CHANGELOG to mention addtion of agent '--runtime_dir' flag.)

> Update CHANGELOG to mention addition of agent '--runtime_dir' flag.
> ---
>
> Key: MESOS-6312
> URL: https://issues.apache.org/jira/browse/MESOS-6312
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>Priority: Blocker
> Fix For: 1.1.0
>
>
> We recently introduced a new agent flag, {{\-\-runtime_dir}}. Unlike 
> {{\-\-work_dir}}, this directory is designed to hold the state of a running 
> agent between subsequent agent restarts (but not across host reboots).
> By default, this flag is set to {{/var/run/mesos}}, since this is a {{tmpfs}} 
> on Linux that gets automatically cleaned up on reboot. When running as 
> non-root we set the default to {{os::temp()/mesos/runtime}}.
> We should call this out in the CHANGELOG.
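For illustration, the flag described above would be passed like this when launching an agent (the master address is a placeholder; the directories shown are just the defaults being discussed, not a recommendation):

```shell
# Root agent: runtime state in /var/run/mesos (a tmpfs, so it survives
# agent restarts but is cleared on host reboot).
mesos-agent --master=zk://master.example.com:2181/mesos \
            --work_dir=/var/lib/mesos \
            --runtime_dir=/var/run/mesos
```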



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6404) My program cannot access a .so file while being run with mesos containerization on a docker image.

2016-10-18 Thread Mark Hammons (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584911#comment-15584911
 ] 

Mark Hammons commented on MESOS-6404:
-

As you suggested, creating a fresh Dockerfile without the {{rm}} step (which 
I've attached) eliminates this issue.
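The {{rm}} detail is significant: when a Dockerfile layer deletes files that were created in an earlier layer, the deletion is recorded as aufs whiteout entries ({{.wh.*}} files) in that layer, and how those entries are applied during provisioning is exactly what this class of bug concerns. A minimal illustrative sketch (base image and package are examples, not the attached Dockerfile):

```dockerfile
FROM ubuntu:14.04

# Layer 1: installs libtiff5 and also leaves apt list files behind.
RUN apt-get update && apt-get install -y libtiff5

# Layer 2: deleting files created in layer 1 is recorded as .wh.* whiteout
# entries in this layer -- the pattern that triggered the problem here.
RUN rm -rf /var/lib/apt/lists/*

# Workaround consistent with the comment above: do the cleanup in the same
# RUN step, so the files never persist into a committed layer and no
# whiteout entries are produced:
# RUN apt-get update && apt-get install -y libtiff5 && rm -rf /var/lib/apt/lists/*
```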

> My program cannot access a .so file while being run with mesos 
> containerization on a docker image.
> --
>
> Key: MESOS-6404
> URL: https://issues.apache.org/jira/browse/MESOS-6404
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: CentOS Linux release 7.2.1511 (Core) 
>Reporter: Mark Hammons
>Priority: Minor
> Attachments: Dockerfile, IUWT_140926aR_t000_ch00.log
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6404) My program cannot access a .so file while being run with mesos containerization on a docker image.

2016-10-18 Thread Mark Hammons (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hammons updated MESOS-6404:

Attachment: Dockerfile

This works

> My program cannot access a .so file while being run with mesos 
> containerization on a docker image.
> --
>
> Key: MESOS-6404
> URL: https://issues.apache.org/jira/browse/MESOS-6404
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.1
> Environment: CentOS Linux release 7.2.1511 (Core) 
>Reporter: Mark Hammons
>Priority: Minor
> Attachments: Dockerfile, IUWT_140926aR_t000_ch00.log
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584818#comment-15584818
 ] 

haosdent commented on MESOS-6410:
-

Hi [~Lei Xu] Did you add {{--privileged=true}} when starting the container?
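For context, {{--privileged}} (or at least a shared mount namespace) is generally needed here because the agent performs bind mounts for persistent volumes from inside its container; without it, the {{mount(2)}} call fails with exactly this "Operation not permitted". A hedged sketch of such an invocation (image name, master address, and host paths are placeholders, not taken from this report):

```shell
docker run -d --privileged=true \
  --net=host \
  --name=mesos-slave \
  -v /var/lib/mesos:/var/lib/mesos \
  -v /sys:/sys \
  example/mesos-slave:0.28.2 \
  --master=zk://master.example.com:2181/mesos \
  --work_dir=/var/lib/mesos
```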

> Fail to mount persistent volume when run mesos slave in docker
> --
>
> Key: MESOS-6410
> URL: https://issues.apache.org/jira/browse/MESOS-6410
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, volumes
>Affects Versions: 0.28.2
> Environment: Mesos 0.28.2
> Docker 1.12.1
>Reporter: Lei Xu
>Priority: Critical
>
> Here are some error logs from the slave:
> {code}
> E1018 07:52:06.18692630 slave.cpp:3758] Container 
> 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
> 'storm_nimbus_mpubpushsmart.d
> 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
> 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
> persistent
>  volume from 
> '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
>  to '/var/lib/meso
> s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
> mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
>  Operation not permitted
> E1018 07:52:09.91687725 slave.cpp:3758] Container 
> 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
> 'storm_nimbus_mpubpushsmart.d
> 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
> 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
> persistent
>  volume from 
> '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
>  to '/var/lib/meso
> s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
> mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
>  Operation not permitted
> {code}
> But outside of Docker, the Mesos slave works fine with persistent volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-18 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6410:

Description: 
Here are some error logs from the slave:

{code}
E1018 07:52:06.18692630 slave.cpp:3758] Container 
'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
 Operation not permitted
E1018 07:52:09.91687725 slave.cpp:3758] Container 
'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
 Operation not permitted
{code}

But outside of Docker, the Mesos slave works fine with persistent volumes.


  was:
Here are some error logs from the slave:

{quote}
E1018 07:52:06.18692630 slave.cpp:3758] Container 
'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
 Operation not permitted
E1018 07:52:09.91687725 slave.cpp:3758] Container 
'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
 Operation not permitted
{quote}

But outside of Docker, the Mesos slave works fine with persistent volumes.



> Fail to mount persistent volume when run mesos slave in docker
> --
>
> Key: MESOS-6410
> URL: https://issues.apache.org/jira/browse/MESOS-6410
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, volumes
>Affects Versions: 0.28.2
> Environment: Mesos 0.28.2
> Docker 1.12.1
>Reporter: Lei Xu
>Priority: Critical
>

[jira] [Updated] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-18 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6410:
--
Description: 
Here are some error logs from the slave:

{quote}
E1018 07:52:06.18692630 slave.cpp:3758] Container 
'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
 Operation not permitted
E1018 07:52:09.91687725 slave.cpp:3758] Container 
'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
 Operation not permitted
{quote}

But outside of Docker, the Mesos slave works fine with persistent volumes.


  was:
Here are some error logs from the slave:

{quote}
E1018 07:52:06.18692630 slave.cpp:3758] Container 
'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
 Operation not permitted
E1018 07:52:09.91687725 slave.cpp:3758] Container 
'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
 Operation not permitted
{quote}




> Fail to mount persistent volume when run mesos slave in docker
> --
>
> Key: MESOS-6410
> URL: https://issues.apache.org/jira/browse/MESOS-6410
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, volumes
>Affects Versions: 0.28.2
> Environment: Mesos 0.28.2
> Docker 1.12.1
>Reporter: Lei Xu
>Priority: Critical
>

[jira] [Updated] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-18 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6410:
--
Description: 
Here are some error logs from the slave:

{quote}
E1018 07:52:06.18692630 slave.cpp:3758] Container 
'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
 Operation not permitted
E1018 07:52:09.91687725 slave.cpp:3758] Container 
'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
 Operation not permitted
{quote}



> Fail to mount persistent volume when run mesos slave in docker
> --
>
> Key: MESOS-6410
> URL: https://issues.apache.org/jira/browse/MESOS-6410
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, volumes
>Affects Versions: 0.28.2
> Environment: Mesos 0.28.2
> Docker 1.12.1
>Reporter: Lei Xu
>Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-18 Thread Lei Xu (JIRA)
Lei Xu created MESOS-6410:
-

 Summary: Fail to mount persistent volume when run mesos slave in 
docker
 Key: MESOS-6410
 URL: https://issues.apache.org/jira/browse/MESOS-6410
 Project: Mesos
  Issue Type: Bug
  Components: containerization, volumes
Affects Versions: 0.28.2
 Environment: Mesos 0.28.2
Docker 1.12.1
Reporter: Lei Xu
Priority: Critical






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4440) Clean get/post/deleteRequest func and let the caller use the general function.

2016-10-18 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15584690#comment-15584690
 ] 

Adam B commented on MESOS-4440:
---

Sorry, I'm only just now getting around to this. I took a look over the patches 
and you're on the right track; just a few minor things to clean up. I hope you 
are still interested in pursuing this patch set.

> Clean get/post/deleteRequest func and let the caller use the general 
> function.
> 
>
> Key: MESOS-4440
> URL: https://issues.apache.org/jira/browse/MESOS-4440
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Yongqiao Wang
>Assignee: Yongqiao Wang
>Priority: Minor
>  Labels: tech-debt
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)