[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741930#comment-14741930
 ] 

haosdent commented on MESOS-3349:
-

I am testing, thank you!

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: Jie Yu
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-3349:

Assignee: Jie Yu  (was: haosdent)

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: Jie Yu
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741925#comment-14741925
 ] 

Jie Yu commented on MESOS-3349:
---

Alright, this is the true fix. Tested on CentOS5 and CentOS6. 
[~haosd...@gmail.com], can you test the patches on Ubuntu? Thanks!

https://reviews.apache.org/r/38328/
https://reviews.apache.org/r/38329/

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3417) Log source address replicated log received broadcasts

2015-09-11 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3417:
--
Labels: mesosphere newbie  (was: mesosphere)

> Log source address replicated log received broadcasts
> -
>
> Key: MESOS-3417
> URL: https://issues.apache.org/jira/browse/MESOS-3417
> Project: Mesos
>  Issue Type: Improvement
>  Components: replicated log
>Affects Versions: 0.23.0, 0.24.0
> Environment: Mesos 0.23
>Reporter: Cody Maloney
>Assignee: Adam B
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Currently Mesos doesn't log what machine a replicated log status broadcast 
> was received from:
> {code}
> Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.320164 15637 
> replica.cpp:641] Replica in EMPTY status received a broadcasted recover 
> request
> Sep 11 21:41:14 master-01 mesos-dns[15583]: I0911 21:41:14.321097   15583 
> detect.go:118] ignoring children-changed event, leader has not changed: /mesos
> Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.353914 15639 
> replica.cpp:641] Replica in EMPTY status received a broadcasted recover 
> request
> Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.479132 15639 
> replica.cpp:641] Replica in EMPTY status received a broadcasted recover 
> request
> {code}
> It would be really useful for debugging replicated log startup issues to have 
> info about where the message came from (libprocess address, IP, or hostname).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3417) Log source address replicated log received broadcasts

2015-09-11 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3417:
--
Shepherd: Adam B
Assignee: (was: Adam B)

> Log source address replicated log received broadcasts
> -
>
> Key: MESOS-3417
> URL: https://issues.apache.org/jira/browse/MESOS-3417
> Project: Mesos
>  Issue Type: Improvement
>  Components: replicated log
>Affects Versions: 0.23.0, 0.24.0
> Environment: Mesos 0.23
>Reporter: Cody Maloney
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Currently Mesos doesn't log what machine a replicated log status broadcast 
> was received from:
> {code}
> Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.320164 15637 
> replica.cpp:641] Replica in EMPTY status received a broadcasted recover 
> request
> Sep 11 21:41:14 master-01 mesos-dns[15583]: I0911 21:41:14.321097   15583 
> detect.go:118] ignoring children-changed event, leader has not changed: /mesos
> Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.353914 15639 
> replica.cpp:641] Replica in EMPTY status received a broadcasted recover 
> request
> Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.479132 15639 
> replica.cpp:641] Replica in EMPTY status received a broadcasted recover 
> request
> {code}
> It would be really useful for debugging replicated log startup issues to have 
> info about where the message came from (libprocess address, IP, or hostname).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3417) Log source address replicated log received broadcasts

2015-09-11 Thread Cody Maloney (JIRA)
Cody Maloney created MESOS-3417:
---

 Summary: Log source address replicated log received broadcasts
 Key: MESOS-3417
 URL: https://issues.apache.org/jira/browse/MESOS-3417
 Project: Mesos
  Issue Type: Improvement
  Components: replicated log
Affects Versions: 0.24.0, 0.23.0
 Environment: Mesos 0.23
Reporter: Cody Maloney
Assignee: Adam B
Priority: Minor


Currently Mesos doesn't log what machine a replicated log status broadcast was 
received from:
{code}
Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.320164 15637 
replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
Sep 11 21:41:14 master-01 mesos-dns[15583]: I0911 21:41:14.321097   15583 
detect.go:118] ignoring children-changed event, leader has not changed: /mesos
Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.353914 15639 
replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
Sep 11 21:41:14 master-01 mesos-master[15625]: I0911 21:41:14.479132 15639 
replica.cpp:641] Replica in EMPTY status received a broadcasted recover request
{code}

It would be really useful for debugging replicated log startup issues to have 
info about where the message came from (libprocess address, IP, or hostname).
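
A change along these lines would do it; this is only a sketch (not an actual 
patch), assuming the replica's message handler has the sender's libprocess 
UPID available as {{from}}, which is how libprocess hands it to installed 
handlers:
{code}
// Sketch: include the sender's UPID in the existing log line
// (assumes 'status' and 'from' are in scope, as in a libprocess handler).
LOG(INFO) << "Replica in " << status << " status received a broadcasted "
          << "recover request from " << from;
{code}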



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3340) Command-line flags should take precedence over OS Env variables

2015-09-11 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741915#comment-14741915
 ] 

Michael Park commented on MESOS-3340:
-

[~marco-mesos] I think it does make sense to prioritize command-line flags over 
environment variables.
[~klaus1982] mentioned in the description of his review request that he 
implemented it that way since he thinks overwriting may cause confusion for the 
user.

I would imagine people typically have {{export MESOS_IP=127.0.0.1}} in their 
{{bashrc}}, which they use by default in most cases, and provide 
{{--ip=127.168.1.2}} on occasion (e.g. testing) when they want to override it. 
Having to change or unset the environment variable in order to provide the 
command-line flag would, I think, make things more difficult to use.

A similar situation exists in SSL, where flags {{verify_cert}} and 
{{require_cert}} can conflict. If {{require_cert = true}} and {{verify_cert = 
false}}, rather than causing an error, we simply override the {{verify_cert}} 
flag to {{true}} and proceed.

[~klaus1982]: What do you think?
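
To make the precedence concrete, here is a rough sketch (my own illustration, 
not the stout implementation; {{resolve}} and its parameters are made up):
{code}
#include <cstdlib>
#include <map>
#include <string>

#include <stout/strings.hpp>

// Hypothetical helper: the command line wins, then the MESOS_* environment
// variable, then the flag's default value.
std::string resolve(
    const std::string& name,
    const std::map<std::string, std::string>& commandLine,
    const std::string& defaultValue)
{
  if (commandLine.count(name) > 0) {
    return commandLine.at(name);
  }

  const char* env = ::getenv(("MESOS_" + strings::upper(name)).c_str());
  if (env != nullptr) {
    return std::string(env);
  }

  return defaultValue;
}
{code}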

> Command-line flags should take precedence over OS Env variables
> ---
>
> Key: MESOS-3340
> URL: https://issues.apache.org/jira/browse/MESOS-3340
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.24.0
>Reporter: Marco Massenzio
>Assignee: Klaus Ma
>  Labels: mesosphere, tech-debt
>
> Currently, it appears that re-defining a flag on the command-line that was 
> already defined via an OS Env var ({{MESOS_*}}) causes the Master to fail with 
> a not very helpful message.
> For example, if one has {{MESOS_QUORUM}} defined, this happens:
> {noformat}
> $ ./mesos-master --zk=zk://192.168.1.4/mesos --quorum=1 
> --hostname=192.168.1.4 --ip=192.168.1.4
> Duplicate flag 'quorum' on command line
> {noformat}
> which is not very helpful.
> Ideally, we would parse the flags with a "well-known" priority (command-line 
> first, environment last) - but at the very least, the error message should be 
> more helpful in explaining what the issue is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1790) Add "chown" option to CommandInfo.URI

2015-09-11 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741911#comment-14741911
 ] 

Adam B commented on MESOS-1790:
---

I can imagine k8s-mesos wanting similar privileges, since it could also launch 
tasks on behalf of other users.
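
For illustration, the proposal could look something like this in mesos.proto 
(the field name, tag, and default here are my assumptions, not the attached 
patch):
{code}
message URI {
  // ... existing fields (value, executable, extract, ...) ...

  // Proposed: let frameworks skip the fetcher's chown() of extracted
  // URIs, e.g. to preserve setuid bits when the slave runs as root.
  optional bool chown = 5 [default = true];
}
{code}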

> Add "chown" option to CommandInfo.URI
> -
>
> Key: MESOS-1790
> URL: https://issues.apache.org/jira/browse/MESOS-1790
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Jim Klucar
>  Labels: myriad, newbie
> Attachments: 
> 0001-MESOS-1790-Adds-chown-option-to-CommandInfo.URI.patch
>
>
> Mesos fetcher always chown()s the extracted executor URIs as the executor 
> user but sometimes this is not desirable, e.g., "setuid" bit gets lost during 
> chown() if slave/fetcher is running as root. 
> It would be nice to give frameworks the ability to skip the chown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741907#comment-14741907
 ] 

Jie Yu commented on MESOS-3349:
---

Great. I'll try to come up with a patch for the true fix.

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741899#comment-14741899
 ] 

haosdent commented on MESOS-3349:
-

It works if I mark the parent as a shared mount, but it does not work if I 
mark the volume itself as a shared mount directly. So I misunderstood the 
document. LoL

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741877#comment-14741877
 ] 

haosdent commented on MESOS-3349:
-

Let me try.

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741876#comment-14741876
 ] 

Jie Yu commented on MESOS-3349:
---

Can you try to make the parent of the persistent volume a shared mount? See if 
that solves the problem.

Sent from my iPhone



> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741873#comment-14741873
 ] 

haosdent commented on MESOS-3349:
-

[~jieyu] I tried that on Ubuntu 14.04 (3.13.0-32) and CentOS 6 (2.6.32-504). 
After adding trace code in executor.cpp, I am now sure that umount succeeds 
while rmdir fails because the executor is still running and holding the mount 
point. This is also why rmdir succeeds a few seconds later: the executor 
terminates a few seconds after sending TASK_FINISHED.

We can reproduce this problem with the following simple shell snippet:
{code}
# Create a bind mount.
mkdir /tmp/source /tmp/target
mount --bind /tmp/source /tmp/target

# Spawn a process in a new mount namespace; it holds a copy of the mount.
unshare -m -- /bin/bash -c "sleep 2" &

# On affected kernels, umount succeeds but rmdir fails with EBUSY.
umount /tmp/target && rmdir /tmp/target
{code}

This test case also simulates the problem: 
https://reviews.apache.org/r/38300/diff/1#index_header

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2863) Command executor can send TASK_KILLED after TASK_FINISHED

2015-09-11 Thread Vaibhav Khanduja (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741825#comment-14741825
 ] 

Vaibhav Khanduja commented on MESOS-2863:
-

[~vinodkone][~tnachen] Do you have any thoughts here? Can you point me toward 
the problem?
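
One possible direction, as a sketch of the idea only (not the actual executor 
code): remember that a terminal update already went out and suppress any later 
one.
{code}
// Hypothetical guard: the command executor would consult this before
// sending any terminal update (TASK_FINISHED, TASK_KILLED, ...).
class TerminalUpdateGuard
{
public:
  // Returns true only for the first terminal update.
  bool allow()
  {
    if (terminalSent) {
      return false; // e.g. TASK_FINISHED already sent; drop TASK_KILLED.
    }
    terminalSent = true;
    return true;
  }

private:
  bool terminalSent = false;
};
{code}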

> Command executor can send TASK_KILLED after TASK_FINISHED
> -
>
> Key: MESOS-2863
> URL: https://issues.apache.org/jira/browse/MESOS-2863
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Vaibhav Khanduja
>  Labels: newbie++
>
> Observed this while doing some tests in our test cluster.
> If the command executor gets a shutdown() (e.g., framework unregistered) 
> after sending TASK_FINISHED but before exiting (there is a forced sleep), it 
> could send a TASK_KILLED update to the slave.
> Ideally the command executor should not send multiple terminal updates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741792#comment-14741792
 ] 

Jie Yu commented on MESOS-3349:
---

Posted a workaround while still investigating alternatives:
https://reviews.apache.org/r/38327

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741771#comment-14741771
 ] 

Jie Yu commented on MESOS-3349:
---

[~wangcong] pointed me to a patch that might explain why kernel 4.1 no longer 
has this problem. The patch was merged in 3.18.

With the patch, if the rmdir target is a mount point, the kernel will try to 
unmount it lazily. But that still does not explain why umount returns OK even 
when the mount point cannot be completely removed.

commit 8ed936b5671bfb33d89bc60bdcc7cf0470ba52fe
Author: Eric W. Biederman 
Date:   Tue Oct 1 18:33:48 2013 -0700

vfs: Lazily remove mounts on unlinked files and directories.

With the introduction of mount namespaces and bind mounts it became
possible to access files and directories that on some paths are mount
points but are not mount points on other paths.  It is very confusing
when rm -rf somedir returns -EBUSY simply because somedir is mounted
somewhere else.  With the addition of user namespaces allowing
unprivileged mounts this condition has gone from annoying to allowing
a DOS attack on other users in the system.

The possibility for mischief is removed by updating the vfs to support
rename, unlink and rmdir on a dentry that is a mountpoint and by
lazily unmounting mountpoints on deleted dentries.

In particular this change allows rename, unlink and rmdir system calls
on a dentry without a mountpoint in the current mount namespace to
succeed, and it allows rename, unlink, and rmdir performed on a
distributed filesystem to update the vfs cache even if when there is a
mount in some namespace on the original dentry.

There are two common patterns of maintaining mounts: Mounts on trusted
paths with the parent directory of the mount point and all ancestory
directories up to / owned by root and modifiable only by root
(i.e. /media/xxx, /dev, /dev/pts, /proc, /sys, /sys/fs/cgroup/{cpu,
cpuacct, ...}, /usr, /usr/local).  Mounts on unprivileged directories
maintained by fusermount.

In the case of mounts in trusted directories owned by root and
modifiable only by root the current parent directory permissions are
sufficient to ensure a mount point on a trusted path is not removed
or renamed by anyone other than root, even if there is a context
where the there are no mount points to prevent this.

In the case of mounts in directories owned by less privileged users
races with users modifying the path of a mount point are already a
danger.  fusermount already uses a combination of chdir,
/proc//fd/NNN, and UMOUNT_NOFOLLOW to prevent these races.  The
removable of global rename, unlink, and rmdir protection really adds
nothing new to consider only a widening of the attack window, and
fusermount is already safe against unprivileged users modifying the
directory simultaneously.

In principle for perfect userspace programs returning -EBUSY for
unlink, rmdir, and rename of dentires that have mounts in the local
namespace is actually unnecessary.  Unfortunately not all userspace
programs are perfect so retaining -EBUSY for unlink, rmdir and rename
of dentries that have mounts in the current mount namespace plays an
important role of maintaining consistency with historical behavior and
making imperfect userspace applications hard to exploit.

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3416) Publish egg for 0.24.0 to PyPI

2015-09-11 Thread Bill Farner (JIRA)
Bill Farner created MESOS-3416:
--

 Summary: Publish egg for 0.24.0 to PyPI
 Key: MESOS-3416
 URL: https://issues.apache.org/jira/browse/MESOS-3416
 Project: Mesos
  Issue Type: Task
Reporter: Bill Farner
Priority: Blocker


0.24.0 was released, but the python egg has not been published.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741723#comment-14741723
 ] 

Jie Yu commented on MESOS-3349:
---

Dug up a thread in the Chrome project:
https://code.google.com/p/chromium/issues/detail?id=358933

Looks like a kernel issue in older kernels. Not sure which commit fixed it.

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2825) Run heavy disk IO operations at low IO priority

2015-09-11 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741687#comment-14741687
 ] 

Yan Xu commented on MESOS-2825:
---

Hi [~SEJeff], 

I don't think the current use case (filesystem provisioning) for this requires 
it to be an abstraction by itself yet. Our {{Subprocess}} allows a [setup 
function to be 
specified|https://github.com/apache/mesos/blob/05c608f80e3b120302bd2040780ce477677671dc/3rdparty/libprocess/include/process/subprocess.hpp#L240]
 where we could call {{ioprio_set}} on Linux.

There is no use case for other systems yet.
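
For what it's worth, here is a Linux-only sketch of what such a setup function 
could do; the constants are the standard Linux ioprio values, and since glibc 
has no wrapper for {{ioprio_set}} this goes through {{syscall()}}:
{code}
#include <sys/syscall.h>
#include <unistd.h>

#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))

#define IOPRIO_WHO_PROCESS 1
#define IOPRIO_CLASS_IDLE 3

// Candidate setup function for the child: drop its IO priority to "idle"
// before exec. A non-zero return would signal failure to Subprocess.
static int lowerIOPriority()
{
  return syscall(
      SYS_ioprio_set,
      IOPRIO_WHO_PROCESS,
      0, // 0 means the calling process.
      IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));
}
{code}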

> Run heavy disk IO operations at low IO priority
> ---
>
> Key: MESOS-2825
> URL: https://issues.apache.org/jira/browse/MESOS-2825
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Affects Versions: 0.23.0
>Reporter: Ian Downes
>Assignee: Yan Xu
>Priority: Minor
>  Labels: twitter
>
> Slave IO operations on large images can be disruptive to tasks, e.g., 
> decompressing, decrypting, hashing, and extracting. Provide option to run 
> such operations at a lower IO priority, i.e., at "idle" or by explicitly 
> setting the "best effort" priority to less than the tasks', e.g., 0 as the 
> lowest best effort priority. All of the existing operations in the Appc 
> provisioner are run as Subprocesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3340) Command-line flags should take precedence over OS Env variables

2015-09-11 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741611#comment-14741611
 ] 

Marco Massenzio commented on MESOS-3340:


Thanks [~mcypark] for taking this up.
I've done a first round of review.

The main question (for which I don't really have an answer) is whether we 
should actually have a "priority" for flags (e.g., {{command-line; OS Env; 
default value}}) and, instead of erroring out, we should just use the highest 
priority value provided by the user.

In other words:
{noformat}
MESOS_IP=127.0.0.1 ./bin/mesos-master.sh --ip=192.168.1.2 ...
{noformat}

would result in {{hostname == "192.168.1.2"}} instead of an error.

What do you think, [~mcypark]?

[cc: [~vinodkone]]

> Command-line flags should take precedence over OS Env variables
> ---
>
> Key: MESOS-3340
> URL: https://issues.apache.org/jira/browse/MESOS-3340
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.24.0
>Reporter: Marco Massenzio
>Assignee: Klaus Ma
>  Labels: mesosphere, tech-debt
>
> Currently, it appears that re-defining a flag on the command-line that was 
> already defined via an OS Env var ({{MESOS_*}}) causes the Master to fail with 
> a not very helpful message.
> For example, if one has {{MESOS_QUORUM}} defined, this happens:
> {noformat}
> $ ./mesos-master --zk=zk://192.168.1.4/mesos --quorum=1 
> --hostname=192.168.1.4 --ip=192.168.1.4
> Duplicate flag 'quorum' on command line
> {noformat}
> which is not very helpful.
> Ideally, we would parse the flags with a "well-known" priority (command-line 
> first, environment last) - but at the very least, the error message should be 
> more helpful in explaining what the issue is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-2984) Deprecating '.json' extension in files endpoints url

2015-09-11 Thread Isabel Jimenez (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611631#comment-14611631
 ] 

Isabel Jimenez edited comment on MESOS-2984 at 9/11/15 9:53 PM:


https://reviews.apache.org/r/36127/
https://reviews.apache.org/r/38321/


was (Author: ijimenez):
https://reviews.apache.org/r/36127/

> Deprecating '.json' extension in files endpoints url
> 
>
> Key: MESOS-2984
> URL: https://issues.apache.org/jira/browse/MESOS-2984
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Isabel Jimenez
>Assignee: Isabel Jimenez
>  Labels: HTTP, mesosphere
>
> Remove the '.json' extension on endpoints such as `/files/browse.json` so it 
> becomes `/files/browse`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3340) Command-line flags should take precedence over OS Env variables

2015-09-11 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-3340:

Shepherd: Michael Park  (was: Bernd Mathiske)

> Command-line flags should take precedence over OS Env variables
> ---
>
> Key: MESOS-3340
> URL: https://issues.apache.org/jira/browse/MESOS-3340
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Affects Versions: 0.24.0
>Reporter: Marco Massenzio
>Assignee: Klaus Ma
>  Labels: mesosphere, tech-debt
>
> Currently, it appears that re-defining a flag on the command-line that was 
> already defined via an OS Env var ({{MESOS_*}}) causes the Master to fail with 
> a not very helpful message.
> For example, if one has {{MESOS_QUORUM}} defined, this happens:
> {noformat}
> $ ./mesos-master --zk=zk://192.168.1.4/mesos --quorum=1 
> --hostname=192.168.1.4 --ip=192.168.1.4
> Duplicate flag 'quorum' on command line
> {noformat}
> which is not very helpful.
> Ideally, we would parse the flags with a "well-known" priority (command-line 
> first, environment last) - but at the very least, the error message should be 
> more helpful in explaining what the issue is.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741438#comment-14741438
 ] 

Jie Yu edited comment on MESOS-3349 at 9/11/15 7:52 PM:


[~haosd...@gmail.com] We definitely have this test passing on CentOS6 
(linux-4.1) when running as ROOT. If what you said were true, we should 
definitely get the same error there, but we don't. Also, we cannot get rid of 
CLONE_NEWNS because the container expects a new mount namespace and much of 
our code relies on that.

Also, I observed a case where I still got EBUSY from rmdir even after all 
processes in the container were killed (centos5, linux-3.14). That's in the 
containerizer->destroy path. Still investigating.


was (Author: jieyu):
[~haosd...@gmail.com] We definitely have this test passing on CentOS6 when 
running as ROOT. If what you said were true, we should definitely get the same 
error there, but we don't. Also, we cannot get rid of CLONE_NEWNS because the 
container expects a new mount namespace and much of our code relies on that.

Also, I observed a case where I still got EBUSY from rmdir even after all 
processes in the container were killed. That's in the containerizer->destroy 
path. Still investigating.

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741438#comment-14741438
 ] 

Jie Yu commented on MESOS-3349:
---

[~haosd...@gmail.com] We definitely have this test passing on CentOS6 when 
running as ROOT. If what you said were true, we should definitely get the same 
error there, but we don't. Also, we cannot get rid of CLONE_NEWNS because the 
container expects a new mount namespace and much of our code relies on that.

Also, I observed a case where I still got EBUSY from rmdir even after all 
processes in the container were killed. That's in the containerizer->destroy 
path. Still investigating.

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3415) Appc provisioner doesn't clean up the the root dir of a container's provisioned rootfses.

2015-09-11 Thread Yan Xu (JIRA)
Yan Xu created MESOS-3415:
-

 Summary: Appc provisioner doesn't clean up the the root dir of a 
container's provisioned rootfses.
 Key: MESOS-3415
 URL: https://issues.apache.org/jira/browse/MESOS-3415
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.25.0
Reporter: Yan Xu
Assignee: Yan Xu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3413) Docker containerizer does not symlink persistent volumes into sandbox

2015-09-11 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741243#comment-14741243
 ] 

haosdent commented on MESOS-3413:
-

In the Mesos docker containerizer, we use docker -v to mount. I think the 
symlink is not related to this. [~neunhoef] Do you set the container_path in 
the Volume of your disk Resource correctly when calling driver.acceptOffers? 
For example, if you set the container_path to "path1", it should be accessible 
in the docker container as "/path1".

> Docker containerizer does not symlink persistent volumes into sandbox
> -
>
> Key: MESOS-3413
> URL: https://issues.apache.org/jira/browse/MESOS-3413
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker, slave
>Affects Versions: 0.23.0
>Reporter: Max Neunhöffer
>Assignee: haosdent
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For the ArangoDB framework I am trying to use the persistent primitives. 
> Nearly all is working, but I am missing a crucial piece at the end: I have 
> successfully created a persistent disk resource and have set the persistence 
> and volume information in the DiskInfo message. However, I do not see any way 
> to find out what directory on the host the mesos slave has reserved for us. I 
> know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles//_ but we 
> have no way to query this information anywhere. The docker containerizer does 
> not automatically mount this directory into our docker container, or symlink 
> it into our sandbox. Therefore, I have essentially no access to it. Note that 
> the mesos containerizer (which I cannot use for other reasons) seems to 
> create a symlink in the sandbox to the actual path for the persistent volume. 
> With that, I could mount the volume into our docker container and all would 
> be well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3051) performance issues with port ranges comparison

2015-09-11 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741193#comment-14741193
 ] 

Jie Yu commented on MESOS-3051:
---

Joris, totally! I have no intention of blocking you guys at all. Just thought 
it would be great if this part of the code base could be cleaned up later 
using IntervalSet.
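
To make that concrete, a small sketch with stout's {{IntervalSet}} 
(stout/interval.hpp), which keeps ranges coalesced on insertion instead of 
repeatedly materializing protobuf {{Value_Ranges}} temporaries:
{code}
#include <stdint.h>

#include <stout/interval.hpp>

// Two adjacent port ranges; IntervalSet coalesces them on insertion.
IntervalSet<uint16_t> ports;
ports += (Bound<uint16_t>::closed(31000), Bound<uint16_t>::closed(31999));
ports += (Bound<uint16_t>::closed(32000), Bound<uint16_t>::closed(32099));

// Containment is then a cheap set query, with no temporaries:
bool ok = ports.contains(31500);
{code}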


> performance issues with port ranges comparison
> --
>
> Key: MESOS-3051
> URL: https://issues.apache.org/jira/browse/MESOS-3051
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Affects Versions: 0.22.1
>Reporter: James Peach
>Assignee: Joerg Schad
>  Labels: mesosphere
>
> Testing in an environment with lots of frameworks (>200), where the 
> frameworks permanently decline resources they don't need. The allocator ends 
> up spending a lot of time figuring out whether offers are refused (the code 
> path through {{HierarchicalAllocatorProcess::isFiltered()}}).
> In profiling a synthetic benchmark, it turns out that comparing port ranges 
> is very expensive, involving many temporary allocations. 61% of 
> Resources::contains() run time is in operator -= (Resource). 35% of 
> Resources::contains() run time is in Resources::_contains().
> The heaviest call chain through {{Resources::_contains}} is:
> {code}
> Running Time     Self (ms)  Symbol Name
> 7237.0ms  35.5%      4.0  mesos::Resources::_contains(mesos::Resource const&) const
> 7200.0ms  35.3%      1.0   mesos::contains(mesos::Resource const&, mesos::Resource const&)
> 7133.0ms  35.0%    121.0    mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&)
> 6319.0ms  31.0%      7.0     mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Ranges const&)
> 6240.0ms  30.6%    161.0      mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
> 1867.0ms   9.1%     25.0       mesos::Value_Ranges::add_range()
> 1694.0ms   8.3%      4.0       mesos::Value_Ranges::~Value_Ranges()
> 1495.0ms   7.3%     16.0       mesos::Value_Ranges::operator=(mesos::Value_Ranges const&)
>  445.0ms   2.1%     94.0       mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
>  154.0ms   0.7%     24.0       mesos::Value_Ranges::range(int) const
>  103.0ms   0.5%     24.0       mesos::Value_Ranges::range_size() const
>   95.0ms   0.4%      2.0       mesos::Value_Range::Value_Range(mesos::Value_Range const&)
>   59.0ms   0.2%      4.0       mesos::Value_Ranges::Value_Ranges()
>   50.0ms   0.2%     50.0       mesos::Value_Range::begin() const
>   28.0ms   0.1%     28.0       mesos::Value_Range::end() const
>   26.0ms   0.1%      0.0       mesos::Value_Range::~Value_Range()
> {code}
> mesos::coalesce(Value_Ranges) gets done a lot and ends up being really 
> expensive. The heaviest parts of the inverted call chain are:
> {code}
> Running Time     Self (ms)  Symbol Name
> 3209.0ms  15.7%   3209.0  mesos::Value_Range::~Value_Range()
> 3209.0ms  15.7%      0.0   google::protobuf::internal::GenericTypeHandler<mesos::Value_Range>::Delete(mesos::Value_Range*)
> 3209.0ms  15.7%      0.0    void google::protobuf::internal::RepeatedPtrFieldBase::Destroy<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 3209.0ms  15.7%      0.0     google::protobuf::RepeatedPtrField<mesos::Value_Range>::~RepeatedPtrField()
> 3209.0ms  15.7%      0.0      google::protobuf::RepeatedPtrField<mesos::Value_Range>::~RepeatedPtrField()
> 3209.0ms  15.7%      0.0       mesos::Value_Ranges::~Value_Ranges()
> 3209.0ms  15.7%      0.0        mesos::Value_Ranges::~Value_Ranges()
> 2441.0ms  11.9%      0.0         mesos::coalesce(mesos::Value_Ranges*, mesos::Value_Range const&)
>  452.0ms   2.2%      0.0         mesos::remove(mesos::Value_Ranges*, mesos::Value_Range const&)
>  169.0ms   0.8%      0.0         mesos::operator<=(mesos::Value_Ranges const&, mesos::Value_Ranges const&)
>   82.0ms   0.4%      0.0         mesos::operator-=(mesos::Value_Ranges&, mesos::Value_Ranges const&)
>   65.0ms   0.3%      0.0         mesos::Value_Ranges::~Value_Ranges()
> 2541.0ms  12.4%   2541.0  google::protobuf::internal::GenericTypeHandler<mesos::Value_Range>::New()
> 2541.0ms  12.4%      0.0   google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler::Type* google::protobuf::internal::RepeatedPtrFieldBase::Add<google::protobuf::RepeatedPtrField<mesos::Value_Range>::TypeHandler>()
> 2305.0ms  11.3%      0.0    google::protobuf::RepeatedPtrField<mesos::Value_Range>::Add()
> 2305.0ms  11.3%      0.0     mesos::Value_Ranges:

[jira] [Assigned] (MESOS-2091) docs/mesos-developers.guide.md does not mention github pull-requests.

2015-09-11 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song reassigned MESOS-2091:
---

Assignee: Gilbert Song

> docs/mesos-developers.guide.md does not mention github pull-requests.
> -
>
> Key: MESOS-2091
> URL: https://issues.apache.org/jira/browse/MESOS-2091
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Till Toenshoff
>Assignee: Gilbert Song
>Priority: Minor
>  Labels: newbie
>
> Given that we do actually support github pull-requests as a way to suggest 
> patches, our guidelines should be updated accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3413) Docker containerizer does not symlink persistent volumes into sandbox

2015-09-11 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-3413:
---

Assignee: haosdent

> Docker containerizer does not symlink persistent volumes into sandbox
> -
>
> Key: MESOS-3413
> URL: https://issues.apache.org/jira/browse/MESOS-3413
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker, slave
>Affects Versions: 0.23.0
>Reporter: Max Neunhöffer
>Assignee: haosdent
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For the ArangoDB framework I am trying to use the persistent primitives. 
> Nearly all is working, but I am missing a crucial piece at the end: I have 
> successfully created a persistent disk resource and have set the persistence 
> and volume information in the DiskInfo message. However, I do not see any way 
> to find out what directory on the host the mesos slave has reserved for us. I 
> know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles//_ but we 
> have no way to query this information anywhere. The docker containerizer does 
> not automatically mount this directory into our docker container, or symlink 
> it into our sandbox. Therefore, I have essentially no access to it. Note that 
> the mesos containerizer (which I cannot use for other reasons) seems to 
> create a symlink in the sandbox to the actual path for the persistent volume. 
> With that, I could mount the volume into our docker container and all would 
> be well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3270) Fix error GMOCK_CONFIG_CMD in CMake

2015-09-11 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741183#comment-14741183
 ] 

Neil Conway commented on MESOS-3270:


Should be fixed in https://reviews.apache.org/r/37370

> Fix error GMOCK_CONFIG_CMD in CMake
> ---
>
> Key: MESOS-3270
> URL: https://issues.apache.org/jira/browse/MESOS-3270
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>
> The GMOCK_CONFIG_CMD doesn't contain $\{CMAKE_CXX_FLAGS\}, which makes the 
> gmock build with GTEST_LANG_CXX11 fail.
> This bug was reported by [~hausdorff] in https://reviews.apache.org/r/37370/ 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3413) Docker containerizer does not symlink persistent volumes into sandbox

2015-09-11 Thread Vaibhav Khanduja (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741150#comment-14741150
 ] 

Vaibhav Khanduja edited comment on MESOS-3413 at 9/11/15 5:12 PM:
--

Looking at the code in docker.cpp, it maps the sandbox directory as a volume 
inside the container. The value of the sandbox directory comes in as the 
environment variable "MESOS_SANDBOX":
{code}
 argv.push_back("-e");
 argv.push_back("MESOS_SANDBOX=" + mappedDirectory);
 argv.push_back("-e");
 argv.push_back("MESOS_CONTAINER_NAME=" + name);
{code}

It may be a good idea to try this for now, though it is not a good solution.

Docker bind mounts these volumes inside the container. A symlink to a 
directory would be mounted, but since docker does a chroot, the link may not 
be valid inside the container.


was (Author: vaibhavkhanduja):
Looking at the code in docker.cpp, it maps the sandbox directory as a volume 
inside the container. The value of the sandbox directory comes in as the 
environment variable "MESOS_SANDBOX":
{code}
 argv.push_back("-e");
 argv.push_back("MESOS_SANDBOX=" + mappedDirectory);
 argv.push_back("-e");
 argv.push_back("MESOS_CONTAINER_NAME=" + name);
{code}

It may be a good idea to try this for now, though it is not a good solution.

> Docker containerizer does not symlink persistent volumes into sandbox
> -
>
> Key: MESOS-3413
> URL: https://issues.apache.org/jira/browse/MESOS-3413
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker, slave
>Affects Versions: 0.23.0
>Reporter: Max Neunhöffer
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For the ArangoDB framework I am trying to use the persistent primitives. 
> Nearly all is working, but I am missing a crucial piece at the end: I have 
> successfully created a persistent disk resource and have set the persistence 
> and volume information in the DiskInfo message. However, I do not see any way 
> to find out what directory on the host the mesos slave has reserved for us. I 
> know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles//_ but we 
> have no way to query this information anywhere. The docker containerizer does 
> not automatically mount this directory into our docker container, or symlink 
> it into our sandbox. Therefore, I have essentially no access to it. Note that 
> the mesos containerizer (which I cannot use for other reasons) seems to 
> create a symlink in the sandbox to the actual path for the persistent volume. 
> With that, I could mount the volume into our docker container and all would 
> be well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3413) Docker containerizer does not symlink persistent volumes into sandbox

2015-09-11 Thread Vaibhav Khanduja (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741150#comment-14741150
 ] 

Vaibhav Khanduja commented on MESOS-3413:
-

Looking at the code in docker.cpp, it maps the sandbox directory as a volume 
inside the container. The value of the sandbox directory comes in as the 
environment variable "MESOS_SANDBOX":
{code}
 argv.push_back("-e");
 argv.push_back("MESOS_SANDBOX=" + mappedDirectory);
 argv.push_back("-e");
 argv.push_back("MESOS_CONTAINER_NAME=" + name);
{code}

It may be a good idea to try this for now, though it is not a good solution.

> Docker containerizer does not symlink persistent volumes into sandbox
> -
>
> Key: MESOS-3413
> URL: https://issues.apache.org/jira/browse/MESOS-3413
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker, slave
>Affects Versions: 0.23.0
>Reporter: Max Neunhöffer
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For the ArangoDB framework I am trying to use the persistent primitives. 
> Nearly all is working, but I am missing a crucial piece at the end: I have 
> successfully created a persistent disk resource and have set the persistence 
> and volume information in the DiskInfo message. However, I do not see any way 
> to find out what directory on the host the mesos slave has reserved for us. I 
> know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles//_ but we 
> have no way to query this information anywhere. The docker containerizer does 
> not automatically mount this directory into our docker container, or symlink 
> it into our sandbox. Therefore, I have essentially no access to it. Note that 
> the mesos containerizer (which I cannot use for other reasons) seems to 
> create a symlink in the sandbox to the actual path for the persistent volume. 
> With that, I could mount the volume into our docker container and all would 
> be well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3414) Failing test TaskValidationTest.DuplicatedTaskID

2015-09-11 Thread Ian Downes (JIRA)
Ian Downes created MESOS-3414:
-

 Summary: Failing test TaskValidationTest.DuplicatedTaskID
 Key: MESOS-3414
 URL: https://issues.apache.org/jira/browse/MESOS-3414
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.24.0
Reporter: Ian Downes
Priority: Minor


Observed in CI for a CentOS5 environment

{noformat}
DEBUG: [ RUN  ] TaskValidationTest.DuplicatedTaskID
DEBUG: Using temporary directory 
'/tmp/TaskValidationTest_DuplicatedTaskID_rgSTX3'
DEBUG: tests/master_validation_tests.cpp:832: Failure
DEBUG: Failed to wait 15secs for offers
DEBUG: tests/master_validation_tests.cpp:826: Failure
DEBUG: Actual function call count doesn't match EXPECT_CALL(sched, 
resourceOffers(&driver, _))...
DEBUG:  Expected: to be called at least once
DEBUG:Actual: never called - unsatisfied and active
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3380) Include libevent in Windows CMake build

2015-09-11 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3380:
-
Labels: build cmake mesosphere  (was: build cmake mesos)

> Include libevent in Windows CMake build
> ---
>
> Key: MESOS-3380
> URL: https://issues.apache.org/jira/browse/MESOS-3380
> Project: Mesos
>  Issue Type: Task
>  Components: build
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: build, cmake, mesosphere
>
> Windows will probably require libevent to work. This means we need to insert 
> the code to retrieve, build, and link against it for the Windows path, since 
> it isn't rebundled and distributed as part of Mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3367) Mesos fetcher does not extract archives for URI with parameters

2015-09-11 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3367:
-
Labels: mesosphere  (was: )

> Mesos fetcher does not extract archives for URI with parameters
> ---
>
> Key: MESOS-3367
> URL: https://issues.apache.org/jira/browse/MESOS-3367
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 0.22.1, 0.23.0
> Environment: DCOS 1.1
>Reporter: Renat Zubairov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere
>
> I'm deploying Marathon applications with sources served from S3. I'm 
> using a signed URL to give only temporary access to the S3 resources, so the 
> URL of the resource has some query parameters.
> So the URI is 'https://foo.com/file.tgz?hasi' and the fetcher stores it in a 
> file named 'file.tgz?hasi'; it then thinks the extension 'hasi' is not tgz, 
> hence extraction is skipped, despite the fact that the MIME type of the HTTP 
> resource is 'application/x-tar'.
> Workaround: add an additional parameter like '&workaround=.tgz'
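
To make the failure mode concrete, a minimal sketch (not the actual fetcher 
code) of why a suffix check on the saved file name misses the archive type 
once the query string is kept, and the kind of normalization that would avoid 
it:

{code}
// Minimal sketch, not the actual fetcher code: strip the query/fragment
// from a URI's basename before testing the extension. The raw basename
// "file.tgz?hasi" does not end with ".tgz"; the stripped one does.
#include <iostream>
#include <string>

std::string basenameWithoutQuery(const std::string& uri)
{
  const std::string base = uri.substr(uri.find_last_of('/') + 1);
  const size_t query = base.find_first_of("?#");
  return query == std::string::npos ? base : base.substr(0, query);
}

int main()
{
  const std::string uri = "https://foo.com/file.tgz?hasi";
  std::cout << basenameWithoutQuery(uri) << std::endl; // => file.tgz
  return 0;
}
{code}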



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3358) Add TaskStatus label decorator hooks for Master

2015-09-11 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3358:
-
Labels: mesosphere  (was: )

> Add TaskStatus label decorator hooks for Master
> ---
>
> Key: MESOS-3358
> URL: https://issues.apache.org/jira/browse/MESOS-3358
> Project: Mesos
>  Issue Type: Task
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> The hook will be triggered when the Master receives a TaskStatus message 
> from an Agent or when the Master itself generates a TASK_LOST status. The 
> hook should also provide a list of the previous TaskStatuses to the module.
> The use case is to allow a "cleanup" module to release IPs if an agent is 
> lost. The previous statuses will contain the IP address(es) to be released.
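
Purely as illustration, a toy sketch of the hook shape this describes, using 
stub types; the real module interface will be whatever the patch for this 
ticket defines:

{code}
// Toy sketch with stub types; not the Mesos module API. It only shows
// the idea: on TASK_LOST, collect labels (e.g. assigned IPs) from the
// previous statuses so a cleanup module can release them.
#include <string>
#include <vector>

struct TaskStatus
{
  std::string state;
  std::vector<std::string> labels; // e.g. previously assigned IPs
};

std::vector<std::string> taskStatusLabelDecorator(
    const TaskStatus& status,
    const std::vector<TaskStatus>& previousStatuses)
{
  std::vector<std::string> labels = status.labels;

  if (status.state == "TASK_LOST") {
    for (const TaskStatus& previous : previousStatuses) {
      labels.insert(
          labels.end(), previous.labels.begin(), previous.labels.end());
    }
  }

  return labels;
}

int main() { return 0; }
{code}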



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3370) Deprecate the external containerizer

2015-09-11 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740945#comment-14740945
 ] 

Till Toenshoff commented on MESOS-3370:
---

Yes, a vote would be the right approach here, but I am also +1 (with a 
small tear in my eyes ;) ).

> Deprecate the external containerizer
> 
>
> Key: MESOS-3370
> URL: https://issues.apache.org/jira/browse/MESOS-3370
> Project: Mesos
>  Issue Type: Task
>Reporter: Niklas Quarfot Nielsen
>
> To our knowledge, no one is using the external containerizer, and we could 
> clean up code paths in the slave and containerizer interface (the dual 
> launch() signatures).
> In a deprecation cycle, we can move this code into a module (dependent on 
> containerizer modules landing) and from there, move it into its own repo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3390) A new TaskStatus.reason to state capability mismatch between SlaveInfo::Capabilities and TaskInfo.

2015-09-11 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740941#comment-14740941
 ] 

Bernd Mathiske commented on MESOS-3390:
---

Can you describe this in more detail, please? And please state the problem, not 
the solution, first.

> A new TaskStatus.reason to state capability mismatch between 
> SlaveInfo::Capabilities and TaskInfo.
> --
>
> Key: MESOS-3390
> URL: https://issues.apache.org/jira/browse/MESOS-3390
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kapil Arya
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740888#comment-14740888
 ] 

haosdent commented on MESOS-3349:
-

A test case to simplify this problem: https://reviews.apache.org/r/38300/

> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume
> I0901 02:17:26.435140 39432 exec.cpp:133] Version: 0.25.0
> I0901 02:17:26.442129 39461 exec.cpp:207] Executor registered on slave 
> 20150901-021726-1828659978-52102-32604-S0
> Registered executor on hostname
> Starting task d8ff1f00-e720-4a61-b440-e111009dfdc3
> sh -c 'echo abc > path1/file'
> Forked command at 39484
> Command exited with status 0 (pid: 39484)
> ../../src/tests/persistent_volume_tests.cpp:579: Failure
> Value of: os::exists(path::join(directory, "path1"))
>   Actual: true
> Expected: false
> [  FAILED  ] PersistentVolumeTest.AccessPersistentVolume (777 ms)
> {noformat}
> FYI [~jieyu] [~mcypark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3413) Docker containerizer does not symlink persistent volumes into sandbox

2015-09-11 Thread Max Neunhöffer (JIRA)
Max Neunhöffer created MESOS-3413:
-

 Summary: Docker containerizer does not symlink persistent volumes 
into sandbox
 Key: MESOS-3413
 URL: https://issues.apache.org/jira/browse/MESOS-3413
 Project: Mesos
  Issue Type: Bug
  Components: containerization, docker, slave
Affects Versions: 0.23.0
Reporter: Max Neunhöffer


For the ArangoDB framework I am trying to use the persistent primitives. Nearly 
all is working, but I am missing a crucial piece at the end: I have 
successfully created a persistent disk resource and have set the persistence 
and volume information in the DiskInfo message. However, I do not see any way 
to find out what directory on the host the mesos slave has reserved for us. I 
know it is ${MESOS_SLAVE_WORKDIR}/volumes/roles/<role>/<persistence-id> but we 
have no way to query this information anywhere. The docker containerizer does 
not automatically mount this directory into our docker container, or symlink 
it into our sandbox. Therefore, I have essentially no access to it. Note that 
the mesos containerizer (which I cannot use for other reasons) seems to create 
a symlink in the sandbox to the actual path for the persistent volume. With 
that, I could mount the volume into our docker container and all would be well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-3349) PersistentVolumeTest.AccessPersistentVolume fails when run as root.

2015-09-11 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738557#comment-14738557
 ] 

haosdent edited comment on MESOS-3349 at 9/11/15 9:30 AM:
--

According to 
[CLONE_NEWNS|http://stackoverflow.com/questions/22889241/linux-understanding-the-mount-namespace-clone-clone-newns-flag]
 and [bind_mount|https://lwn.net/Articles/159092/], I think I can explain the 
behaviours so far.

In LinuxFilesystemIsolatorProcess, we mount the persistent volume (the default 
behaviour is make-private) before launching the executor. After LinuxLauncher 
forks with CLONE_NEWNS, we can unmount the persistent volume in 
LinuxFilesystemIsolatorProcess, but this doesn't stop the executor from 
continuing to hold that mount point. When the slave receives TASK_FINISHED and 
LinuxFilesystemIsolatorProcess tries to rmdir that mount point, it fails 
because the executor is still running and holding the mount point (I observed 
this after adding some trace code to show when the executor exited). So a 
possible way to fix this is to use make-shared or make-slave when mounting the 
persistent volume. But my attempts at this failed.

{code}
45 22 8:3 
/tmp/PersistentVolumeTest_AccessPersistentVolume_L6fY1a/volumes/roles/role1/id1 
/tmp/PersistentVolumeTest_AccessPersistentVolume_L6fY1a/slaves/20150911-170559-162297291-49192-21628-S0/frameworks/20150911-170559-162297291-49192-21628-/executors/c6bcf76f-7cf5-42e6-8eb8-2d21e393ba3d/runs/454bbfa3-0305-4900-b05d-389f6b215c32/path1
 rw,relatime shared:1 - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
{code}

{code}
78 48 8:3 
/tmp/PersistentVolumeTest_AccessPersistentVolume_L6fY1a/volumes/roles/role1/id1 
/tmp/PersistentVolumeTest_AccessPersistentVolume_L6fY1a/slaves/20150911-170559-162297291-49192-21628-S0/frameworks/20150911-170559-162297291-49192-21628-/executors/c6bcf76f-7cf5-42e6-8eb8-2d21e393ba3d/runs/454bbfa3-0305-4900-b05d-389f6b215c32/path1
 rw,relatime shared:1 - ext4 
/dev/disk/by-uuid/98708f21-a59d-4b80-a85c-27b78c22e316 
rw,errors=remount-ro,data=ordered
{code}

You can see the persistent volume has already been mounted as shared, but the 
test still failed.
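
For reference, a minimal sketch (Linux only, requires root; the paths are 
hypothetical) of the make-shared remount attempted above; it mirrors the idea, 
not the isolator's actual code:

{code}
// Minimal sketch, Linux only, requires root; paths are hypothetical.
// Bind-mount a volume and remount it as shared so that mount/unmount
// events propagate to mount namespaces created later with CLONE_NEWNS,
// instead of the child keeping a private copy of the mount.
#include <sys/mount.h>

#include <cerrno>
#include <cstring>
#include <iostream>

int main()
{
  const char* source = "/tmp/volume";         // hypothetical volume dir
  const char* target = "/tmp/sandbox/path1";  // hypothetical mount point

  if (mount(source, target, nullptr, MS_BIND, nullptr) != 0) {
    std::cerr << "bind mount failed: " << strerror(errno) << std::endl;
    return 1;
  }

  // The default propagation is private; remount as shared.
  if (mount(nullptr, target, nullptr, MS_SHARED, nullptr) != 0) {
    std::cerr << "make-shared failed: " << strerror(errno) << std::endl;
    return 1;
  }

  return 0;
}
{code}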


was (Author: haosd...@gmail.com):
After seeing 
http://stackoverflow.com/questions/22889241/linux-understanding-the-mount-namespace-clone-clone-newns-flag
 about CLONE_NEWNS, I think I can explain the behaviours so far.

In LinuxFilesystemIsolatorProcess, we mount in the parent (pid 24073).
{code}
I0910 18:07:42.768034 24073 linux.cpp:598] Mounting 
'/tmp/PersistentVolumeTest_AccessPersistentVolume_PTx7g0/volumes/roles/role1/id1'
 to 
'/tmp/PersistentVolumeTest_AccessPersistentVolume_PTx7g0/slaves/20150910-180742-162297291-42795-24055-S0/frameworks/20150910-180742-162297291-42795-24055-/executors/72989615-cc6e-449c-a561-264fcee7edc3/runs/0cdc0d01-4c59-48e8-925a-7a6c06feb2ae/path1'
 for persistent volume disk(role1)[id1:path1]:64 of container 
0cdc0d01-4c59-48e8-925a-7a6c06feb2ae
{code}

After LinuxLauncher forks with CLONE_NEWNS, the child (pid 24071) can unmount 
it, but it still cannot rmdir it, because there is another mount point held by 
the parent.
{code}
I0910 18:07:44.868654 24071 linux.cpp:493] Removing mount 
'/tmp/PersistentVolumeTest_AccessPersistentVolume_PTx7g0/slaves/20150910-180742-162297291-42795-24055-S0/frameworks/20150910-180742-162297291-42795-24055-/executors/72989615-cc6e-449c-a561-264fcee7edc3/runs/0cdc0d01-4c59-48e8-925a-7a6c06feb2ae/path1'
 for persistent volume disk(role1)[id1:path1]:64 of container 
0cdc0d01-4c59-48e8-925a-7a6c06feb2ae
E0910 18:07:44.876619 24076 slave.cpp:2870] Failed to update resources for 
container 0cdc0d01-4c59-48e8-925a-7a6c06feb2ae of executor 
72989615-cc6e-449c-a561-264fcee7edc3 running task 
72989615-cc6e-449c-a561-264fcee7edc3 on status update for terminal task, 
destroying container: Collect failed: Failed to remove persistent volume mount 
point at 
'/tmp/PersistentVolumeTest_AccessPersistentVolume_PTx7g0/slaves/20150910-180742-162297291-42795-24055-S0/frameworks/20150910-180742-162297291-42795-24055-/executors/72989615-cc6e-449c-a561-264fcee7edc3/runs/0cdc0d01-4c59-48e8-925a-7a6c06feb2ae/path1':
 Device or resource busy
{code}


> PersistentVolumeTest.AccessPersistentVolume fails when run as root.
> ---
>
> Key: MESOS-3349
> URL: https://issues.apache.org/jira/browse/MESOS-3349
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: Ubuntu 14.04, CentOS 5
>Reporter: Benjamin Mahler
>Assignee: haosdent
>  Labels: flaky-test
>
> When running the tests as root:
> {noformat}
> [ RUN  ] PersistentVolumeTest.AccessPersistentVolume

[jira] [Commented] (MESOS-3177) Make Mesos own configuration of roles/weights

2015-09-11 Thread Cody Maloney (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740398#comment-14740398
 ] 

Cody Maloney commented on MESOS-3177:
-

There hasn't been any design documentation or development work so far.

In my mind I've been thinking of it as: before you start the Mesos masters, 
you create the initial replicated log state, which contains the first set of 
roles and weights to operate with. Then, from that point on, Mesos has 
"add_role" and "remove_role" endpoints to manage them. Even better would be 
that, if you don't have authentication turned on, Mesos just adds new roles as 
it sees them (and removes them as all things with that role disappear). If 
authentication is turned on, the authentication mechanism effectively 
"permanently" owns all the roles it defines (if it's just a static 
configuration file). If it's a dynamic source / database, then the interface 
to talk about ownership would probably need to get more complicated.
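
As a toy illustration of that flow (purely illustrative, not a proposed Mesos 
interface), see the sketch below:

{code}
// Toy sketch of the role-management behaviour described above; purely
// illustrative, not a Mesos interface.
#include <map>
#include <string>

class RoleRegistry
{
public:
  explicit RoleRegistry(bool authenticationEnabled)
    : authenticationEnabled_(authenticationEnabled) {}

  // Explicit management, in the spirit of "add_role" / "remove_role".
  void addRole(const std::string& role, double weight)
  {
    weights_[role] = weight;
  }

  void removeRole(const std::string& role)
  {
    weights_.erase(role);
  }

  // Implicit creation: with authentication off, a newly seen role is
  // simply added with a default weight.
  void observeRole(const std::string& role)
  {
    if (!authenticationEnabled_ && weights_.count(role) == 0) {
      weights_[role] = 1.0;
    }
  }

private:
  bool authenticationEnabled_;
  std::map<std::string, double> weights_;
};

int main()
{
  RoleRegistry registry(false);
  registry.observeRole("analytics"); // auto-added with weight 1.0
  registry.addRole("batch", 2.0);
  registry.removeRole("batch");
  return 0;
}
{code}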

> Make Mesos own configuration of roles/weights
> -
>
> Key: MESOS-3177
> URL: https://issues.apache.org/jira/browse/MESOS-3177
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Cody Maloney
>Assignee: Thomas Rampelberg
>  Labels: mesosphere
>
> All roles and weights must currently be specified up-front when starting 
> Mesos masters. In addition, they should be consistent on every master, 
> otherwise unexpected behavior could occur (you can have them be inconsistent 
> for some upgrade paths / when changing the set).
> This makes it hard to introduce new groups of machines under new roles 
> dynamically (we have to generate a new master configuration and deploy it 
> before we can connect slaves with a new role to the cluster).
> Ideally an administrator can manually add / remove / edit roles and have the 
> settings replicated / passed to all masters in the cluster by Mesos. 
> Effectively, Mesos takes ownership of the setting, rather than requiring it 
> to be done externally.
> In addition, if a new slave joins the cluster with an unexpected / new role, 
> that should just work, making it much easier to introduce machines with new 
> roles. (Policy around whether or not a slave can cause creation of a new 
> role, whether a given slave can register with a given role, etc. is out of 
> scope, and would be controlled in the general registration process.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)