[jira] [Comment Edited] (MESOS-6507) 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails consistently.

2016-10-28 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617270#comment-15617270
 ] 

Anand Mazumdar edited comment on MESOS-6507 at 10/29/16 2:53 AM:
-

It was an oversight on my part; I had forgotten to backport Ben's UUID patches.
This should unblock the 1.0.2 release.

{noformat}
commit 9e0f9505bae40a5f803d9a3eebfebe62287fbe91
Author: Benjamin Mahler 
Date:   Tue Sep 20 14:17:30 2016 -0700

Updated scheduler library to handle UUID parsing error.

Previously this would have thrown an exception.

Review: https://reviews.apache.org/r/52099

commit 4e4d058ea3c012b2e6d4bbed58ef7fbaea5b60fb
Author: Benjamin Mahler 
Date:   Tue Sep 20 14:14:39 2016 -0700

Updated UUID::fromString to not throw an exception on error.

The exception from the string_generator needs to be caught so
that we can surface a Try to the caller.

Review: https://reviews.apache.org/r/52098
{noformat}


was (Author: anandmazumdar):
It was an oversight on my part; I had forgotten to backport Ben's UUID patches.
This should unblock the 1.0.2 release. However, we still need to fix the test
flakiness on HEAD. I will update the JIRA with the flaky test log.


> 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails 
> consistently.
> --
>
> Key: MESOS-6507
> URL: https://issues.apache.org/jira/browse/MESOS-6507
> Project: Mesos
>  Issue Type: Bug
>  Components: docker, test
>Reporter: Gilbert Song
>Priority: Blocker
>  Labels: failure
>
> Here is the log:
> {noformat}
> [23:09:24] :   [Step 10/10] [ RUN  ] 
> DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID
> [23:09:24] :   [Step 10/10] I1028 23:09:24.304638 31435 docker.cpp:933] 
> Running docker -H unix:///var/run/docker.sock rm -f -v mesos-s1.malformedUUID
> [23:09:24] :   [Step 10/10] I1028 23:09:24.398941 31435 resources.cpp:572] 
> Parsing resources as JSON failed: cpus:1;mem:512
> [23:09:24] :   [Step 10/10] Trying semicolon-delimited string format instead
> [23:09:24] :   [Step 10/10] I1028 23:09:24.399123 31435 docker.cpp:809] 
> Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 
> 536870912 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-s1.malformedUUID -v 
> /mnt/teamcity/temp/buildTmp/DockerContainerizerTest_ROOT_DOCKER_SkipRecoverMalformedUUID_rjDyqa:/mnt/mesos/sandbox
>  --net host --entrypoint /bin/sh --name mesos-s1.malformedUUID alpine -c 
> sleep 1000
> [23:09:24] :   [Step 10/10] I1028 23:09:24.401227 31435 docker.cpp:972] 
> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:24] :   [Step 10/10] I1028 23:09:24.700460 31435 docker.cpp:972] 
> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:24] :   [Step 10/10] I1028 23:09:24.804401 31453 docker.cpp:785] 
> Recovering Docker containers
> [23:09:24] :   [Step 10/10] I1028 23:09:24.804477 31453 docker.cpp:1091] 
> Running docker -H unix:///var/run/docker.sock ps -a
> [23:09:24] :   [Step 10/10] I1028 23:09:24.905027 31454 docker.cpp:972] 
> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:25] :   [Step 10/10] W1028 23:09:25.008965 31454 docker.cpp:838] 
> Skipping recovery of executor '' of framework '' because its latest run could 
> not be recovered
> [23:09:25] :   [Step 10/10] I1028 23:09:25.008996 31454 docker.cpp:957] 
> Checking if Docker container named '/mesos-s1.malformedUUID' was started by 
> Mesos
> [23:09:25] :   [Step 10/10] I1028 23:09:25.009019 31454 docker.cpp:967] 
> Checking if Mesos container with ID 'malformedUUID' has been orphaned
> [23:09:25] :   [Step 10/10] I1028 23:09:25.009052 31454 docker.cpp:860] 
> Running docker -H unix:///var/run/docker.sock stop -t 0 
> 1e9990dbadad6078ceda5d5e0cbfd62b9242c22359126b42dca77d6fdd9a2747
> [23:09:25] :   [Step 10/10] I1028 23:09:25.109345 31451 docker.cpp:933] 
> Running docker -H unix:///var/run/docker.sock rm -v 
> 

[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2016-10-28 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617272#comment-15617272
 ] 

haosdent commented on MESOS-6162:
-

No problem, thank you!

> Add support for cgroups blkio subsystem
> ---
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>
> Noted that cgroups blkio subsystem may have performance issue, refer to 
> https://github.com/opencontainers/runc/issues/861



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6507) 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails consistently.

2016-10-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6507:
--
Target Version/s:   (was: 1.0.2)


[jira] [Commented] (MESOS-6507) 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails consistently.

2016-10-28 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617270#comment-15617270
 ] 

Anand Mazumdar commented on MESOS-6507:
---

It was an oversight on my part; I had forgotten to backport Ben's UUID patches.
This should unblock the 1.0.2 release. However, we still need to fix the test
flakiness on HEAD. I will update the JIRA with the flaky test log.

{noformat}
commit 9e0f9505bae40a5f803d9a3eebfebe62287fbe91
Author: Benjamin Mahler 
Date:   Tue Sep 20 14:17:30 2016 -0700

Updated scheduler library to handle UUID parsing error.

Previously this would have thrown an exception.

Review: https://reviews.apache.org/r/52099

commit 4e4d058ea3c012b2e6d4bbed58ef7fbaea5b60fb
Author: Benjamin Mahler 
Date:   Tue Sep 20 14:14:39 2016 -0700

Updated UUID::fromString to not throw an exception on error.

The exception from the string_generator needs to be caught so
that we can surface a Try to the caller.

Review: https://reviews.apache.org/r/52098
{noformat}


[jira] [Updated] (MESOS-6507) 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails consistently.

2016-10-28 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6507:
--
Target Version/s: 1.0.2


[jira] [Commented] (MESOS-6507) 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails consistently.

2016-10-28 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617059#comment-15617059
 ] 

Jie Yu commented on MESOS-6507:
---

[~tnachen] Can you take a look? This is blocking 1.0.2 release.


[jira] [Updated] (MESOS-6507) 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails consistently.

2016-10-28 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-6507:
--
Priority: Blocker  (was: Major)


[jira] [Commented] (MESOS-6507) 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails consistently.

2016-10-28 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617025#comment-15617025
 ] 

Gilbert Song commented on MESOS-6507:
-

cc [~ManuwelaKanade-GSLab][~tnachen]

> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:27] :   [Step 10/10] I1028 23:09:27.028252 31435 docker.cpp:972] 
> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:27] :   [Step 10/10] I1028 23:09:27.331799 31435 docker.cpp:972] 
> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:27] :   [Step 10/10] I1028 23:09:27.634660 31435 docker.cpp:972] 
> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:27] :   [Step 10/10] I1028 23:09:27.938190 31435 docker.cpp:972] 
> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:28] :   [Step 10/10] I1028 23:09:28.241756 31435 docker.cpp:972] 
> Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
> [23:09:28] :   [Step 10/10] I1028 

[jira] [Created] (MESOS-6507) 'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails consistently.

2016-10-28 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-6507:
---

 Summary: 
'DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID' fails 
consistently.
 Key: MESOS-6507
 URL: https://issues.apache.org/jira/browse/MESOS-6507
 Project: Mesos
  Issue Type: Bug
  Components: docker, test
Reporter: Gilbert Song


Here is the log:
{noformat}
[23:09:24] : [Step 10/10] [ RUN  ] 
DockerContainerizerTest.ROOT_DOCKER_SkipRecoverMalformedUUID
[23:09:24] : [Step 10/10] I1028 23:09:24.304638 31435 docker.cpp:933] 
Running docker -H unix:///var/run/docker.sock rm -f -v mesos-s1.malformedUUID
[23:09:24] : [Step 10/10] I1028 23:09:24.398941 31435 resources.cpp:572] 
Parsing resources as JSON failed: cpus:1;mem:512
[23:09:24] : [Step 10/10] Trying semicolon-delimited string format instead
[23:09:24] : [Step 10/10] I1028 23:09:24.399123 31435 docker.cpp:809] 
Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 
536870912 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e 
MESOS_CONTAINER_NAME=mesos-s1.malformedUUID -v 
/mnt/teamcity/temp/buildTmp/DockerContainerizerTest_ROOT_DOCKER_SkipRecoverMalformedUUID_rjDyqa:/mnt/mesos/sandbox
 --net host --entrypoint /bin/sh --name mesos-s1.malformedUUID alpine -c sleep 
1000
[23:09:24] : [Step 10/10] I1028 23:09:24.401227 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:24] : [Step 10/10] I1028 23:09:24.700460 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:24] : [Step 10/10] I1028 23:09:24.804401 31453 docker.cpp:785] 
Recovering Docker containers
[23:09:24] : [Step 10/10] I1028 23:09:24.804477 31453 docker.cpp:1091] 
Running docker -H unix:///var/run/docker.sock ps -a
[23:09:24] : [Step 10/10] I1028 23:09:24.905027 31454 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:25] : [Step 10/10] W1028 23:09:25.008965 31454 docker.cpp:838] 
Skipping recovery of executor '' of framework '' because its latest run could 
not be recovered
[23:09:25] : [Step 10/10] I1028 23:09:25.008996 31454 docker.cpp:957] 
Checking if Docker container named '/mesos-s1.malformedUUID' was started by 
Mesos
[23:09:25] : [Step 10/10] I1028 23:09:25.009019 31454 docker.cpp:967] 
Checking if Mesos container with ID 'malformedUUID' has been orphaned
[23:09:25] : [Step 10/10] I1028 23:09:25.009052 31454 docker.cpp:860] 
Running docker -H unix:///var/run/docker.sock stop -t 0 
1e9990dbadad6078ceda5d5e0cbfd62b9242c22359126b42dca77d6fdd9a2747
[23:09:25] : [Step 10/10] I1028 23:09:25.109345 31451 docker.cpp:933] 
Running docker -H unix:///var/run/docker.sock rm -v 
1e9990dbadad6078ceda5d5e0cbfd62b9242c22359126b42dca77d6fdd9a2747
[23:09:25] : [Step 10/10] I1028 23:09:25.212870 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:25] : [Step 10/10] I1028 23:09:25.513255 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:25] : [Step 10/10] I1028 23:09:25.815946 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:26] : [Step 10/10] I1028 23:09:26.119107 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:26] : [Step 10/10] I1028 23:09:26.421722 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:26] : [Step 10/10] I1028 23:09:26.724777 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:27] : [Step 10/10] I1028 23:09:27.028252 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:27] : [Step 10/10] I1028 23:09:27.331799 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:27] : [Step 10/10] I1028 23:09:27.634660 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:27] : [Step 10/10] I1028 23:09:27.938190 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:28] : [Step 10/10] I1028 23:09:28.241756 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:28] : [Step 10/10] I1028 23:09:28.544697 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:28] : [Step 10/10] I1028 23:09:28.847909 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock inspect mesos-s1.malformedUUID
[23:09:29] : [Step 10/10] I1028 23:09:29.151430 31435 docker.cpp:972] 
Running docker -H unix:///var/run/docker.sock 

[jira] [Created] (MESOS-6506) Show framework info in /state and /frameworks for frameworks that have orphan tasks

2016-10-28 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6506:
-

 Summary: Show framework info in /state and /frameworks for 
frameworks that have orphan tasks
 Key: MESOS-6506
 URL: https://issues.apache.org/jira/browse/MESOS-6506
 Project: Mesos
  Issue Type: Improvement
Reporter: Vinod Kone
Assignee: Vinod Kone


Since Mesos 1.0, the master has access to the FrameworkInfo of frameworks that 
have orphan tasks, so we could expose this information in the /state and 
/frameworks endpoints. Note that this information is already present in the v1 
operator API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MESOS-6502) _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.

2016-10-28 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616903#comment-15616903
 ] 

Joris Van Remoortere edited comment on MESOS-6502 at 10/28/16 11:16 PM:


{{1.1.x}}
{code}
commit e105363a52e219a565acc91144788600eb0b9aeb
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}
{{1.0.2}}
{code}
commit 9b8c54282c5337e28d99bc0025661131bde2246f
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}


was (Author: jvanremoortere):
{{1.1.x}}
{code}
commit e105363a52e219a565acc91144788600eb0b9aeb
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}

> _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java 
> binding.
> ---
>
> Key: MESOS-6502
> URL: https://issues.apache.org/jira/browse/MESOS-6502
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.2, 1.1.0
>
>
> When the macros were re-assigned they were not flushed fully through the 
> codebase:
> https://github.com/apache/mesos/commit/6bc6a40a54491cfd733263cd3962e490b0b4bdbb





[jira] [Commented] (MESOS-6502) _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.

2016-10-28 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616903#comment-15616903
 ] 

Joris Van Remoortere commented on MESOS-6502:
-

{{1.1.x}}
{code}
commit e105363a52e219a565acc91144788600eb0b9aeb
Author: Joris Van Remoortere 
Date:   Fri Oct 28 15:50:10 2016 -0400

Fixed MesosNativeLibrary to use '_NUM' MESOS_VERSION macros.

Review: https://reviews.apache.org/r/53270
{code}

> _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java 
> binding.
> ---
>
> Key: MESOS-6502
> URL: https://issues.apache.org/jira/browse/MESOS-6502
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.0.2, 1.1.0
>
>
> When the macros were re-assigned they were not flushed fully through the 
> codebase:
> https://github.com/apache/mesos/commit/6bc6a40a54491cfd733263cd3962e490b0b4bdbb





[jira] [Commented] (MESOS-4638) versioning preprocessor macros

2016-10-28 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616900#comment-15616900
 ] 

Joris Van Remoortere commented on MESOS-4638:
-

{{1.0.2}}:
{code}
commit 5668d4ff2655f120ca3d66c509efa40e24d5faf3
Author: Zhitao Li 
Date:   Wed Aug 17 09:34:27 2016 -0700

Introduce MESOS_{MAJOR|MINOR|PATCH}_VERSION_NUM macros.

This makes version based conditional compiling much easier for
module writers.

Review: https://reviews.apache.org/r/50992/
{code}

> versioning preprocessor macros
> --
>
> Key: MESOS-4638
> URL: https://issues.apache.org/jira/browse/MESOS-4638
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: James Peach
>Assignee: Zhitao Li
> Fix For: 1.0.2, 1.1.0
>
>
> The macros in {{version.hpp}} cannot be used for conditional build because 
> they are strings not integers. It would be helpful to have integer versions 
> of these for conditionally building code against different versions of the 
> Mesos API.





[jira] [Updated] (MESOS-6505) Figure out a way to only show output for failed tests

2016-10-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-6505:
-
Attachment: gtest-wrapper.py

Attached a wrapper that I wrote a while ago but didn't really use, because it 
was just a proof-of-concept.

> Figure out a way to only show output for failed tests
> -
>
> Key: MESOS-6505
> URL: https://issues.apache.org/jira/browse/MESOS-6505
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Reporter: Vinod Kone
> Attachments: gtest-wrapper.py
>
>
> Currently, whether `make check` shows output or not is an all-or-nothing 
> setting. This makes the CI logs unnecessarily verbose. If there were a way to 
> show output only for failed tests, that would be ideal.





[jira] [Updated] (MESOS-6501) Add a test for duplicate framework ids in "unregistered_frameworks"

2016-10-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6501:
--
Shepherd: Anand Mazumdar
  Sprint: Mesosphere Sprint 46
Story Points: 2

https://reviews.apache.org/r/53275/

> Add a test for duplicate framework ids in "unregistered_frameworks"
> ---
>
> Key: MESOS-6501
> URL: https://issues.apache.org/jira/browse/MESOS-6501
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Vinod Kone
>
> For details see MESOS-4973 and MESOS-6461.





[jira] [Updated] (MESOS-4638) versioning preprocessor macros

2016-10-28 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4638:

Fix Version/s: 1.0.2

> versioning preprocessor macros
> --
>
> Key: MESOS-4638
> URL: https://issues.apache.org/jira/browse/MESOS-4638
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api
>Reporter: James Peach
>Assignee: Zhitao Li
> Fix For: 1.0.2, 1.1.0
>
>
> The macros in {{version.hpp}} cannot be used for conditional build because 
> they are strings not integers. It would be helpful to have integer versions 
> of these for conditionally building code against different versions of the 
> Mesos API.





[jira] [Updated] (MESOS-6461) Duplicate framework ids in /master/frameworks endpoint 'unregistered_frameworks'.

2016-10-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6461:
--
Fix Version/s: 1.1.0

Backported to 1.1.0

commit 3c51acb88179c6dd12f81a131edbd667a578b961
Author: Vinod Kone 
Date:   Fri Oct 28 14:04:18 2016 -0700

Added MESOS-4973 and MESOS-6461 to 1.1.0 CHANGELOG.

commit 7460d1d12dd8d75d1829527392594c0d5786d015
Author: Vinod Kone 
Date:   Mon Oct 24 17:05:12 2016 -0700

Fixed duplicate framework ids in "unregistered_frameworks".

The existing test (MasterTest.OrphanTasks) continues to pass after
the change. I will try to write another test that spawns multiple
agents to ensure the duplicate framework ids are not shown.

Review: https://reviews.apache.org/r/53159


> Duplicate framework ids in /master/frameworks endpoint 
> 'unregistered_frameworks'.
> -
>
> Key: MESOS-6461
> URL: https://issues.apache.org/jira/browse/MESOS-6461
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Gilbert Song
>Assignee: Vinod Kone
>Priority: Minor
>  Labels: master
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
>
> This issue was exposed from MESOS-6400. There are duplicate framework ids 
> presented from the /master/frameworks endpoint due to:
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1338
> We should use a `set` or a `hashset` instead of an array, to avoid duplicate 
> ids.





[jira] [Updated] (MESOS-4973) Duplicates in 'unregistered_frameworks' in /state

2016-10-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4973:
--
Fix Version/s: 1.1.0

Backported to 1.1.0

commit 3c51acb88179c6dd12f81a131edbd667a578b961
Author: Vinod Kone 
Date:   Fri Oct 28 14:04:18 2016 -0700

Added MESOS-4973 and MESOS-6461 to 1.1.0 CHANGELOG.

commit 7460d1d12dd8d75d1829527392594c0d5786d015
Author: Vinod Kone 
Date:   Mon Oct 24 17:05:12 2016 -0700

Fixed duplicate framework ids in "unregistered_frameworks".

The existing test (MasterTest.OrphanTasks) continues to pass after
the change. I will try to write another test that spawns multiple
agents to ensure the duplicate framework ids are not shown.

Review: https://reviews.apache.org/r/53159


> Duplicates in 'unregistered_frameworks' in /state 
> --
>
> Key: MESOS-4973
> URL: https://issues.apache.org/jira/browse/MESOS-4973
> Project: Mesos
>  Issue Type: Bug
>Reporter: Yan Xu
>Assignee: Vinod Kone
>Priority: Minor
>  Labels: mesosphere
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
>
> In our clusters where many frameworks run, 'unregistered_frameworks' 
> currently doesn't show what it semantically means, but rather "a list of 
> frameworkIDs for each orphaned task", which means a lot of duplicated 
> frameworkIDs.
> For this field to be useful we need to deduplicate when outputting the list.





[jira] [Updated] (MESOS-6461) Duplicate framework ids in /master/frameworks endpoint 'unregistered_frameworks'.

2016-10-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-6461:
--
Fix Version/s: 1.0.2

Backported to 1.0.2

commit d073ed53b62b978385b4df2b63df32711f41de99
Author: Vinod Kone 
Date:   Fri Oct 28 13:56:50 2016 -0700

Added MESOS-4973 and MESOS-6461 to 1.0.2 CHANGELOG.

commit 81f9c6977f0d7706ef4e454b4e06da024076e82c
Author: Vinod Kone 
Date:   Mon Oct 24 17:05:12 2016 -0700

Fixed duplicate framework ids in "unregistered_frameworks".

The existing test (MasterTest.OrphanTasks) continues to pass after
the change. I will try to write another test that spawns multiple
agents to ensure the duplicate framework ids are not shown.

Review: https://reviews.apache.org/r/53159


> Duplicate framework ids in /master/frameworks endpoint 
> 'unregistered_frameworks'.
> -
>
> Key: MESOS-6461
> URL: https://issues.apache.org/jira/browse/MESOS-6461
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Gilbert Song
>Assignee: Vinod Kone
>Priority: Minor
>  Labels: master
> Fix For: 1.0.2, 1.2.0
>
>
> This issue was exposed from MESOS-6400. There are duplicate framework ids 
> presented from the /master/frameworks endpoint due to:
> https://github.com/apache/mesos/blob/master/src/master/http.cpp#L1338
> We should use a `set` or a `hashset` instead of an array, to avoid duplicate 
> ids.





[jira] [Commented] (MESOS-6503) Mesos containerizer launch should not be pid 1 inside the container.

2016-10-28 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616529#comment-15616529
 ] 

Jason Lai commented on MESOS-6503:
--

Would be good to:
# make the `mesos-containerizer` launcher process live outside of the PID 
namespace, like Docker containerd's shim process;
# have the namespace unsharing happen in the launcher's child process before 
`execv*(2)`-ing into executors.

> Mesos containerizer launch should not be pid 1 inside the container.
> 
>
> Key: MESOS-6503
> URL: https://issues.apache.org/jira/browse/MESOS-6503
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> This can solve two issues:
> 1) people cannot see arguments of mesos-containerizer launch
> 2) allow users to use other init system (e.g., systemd) inside the container





[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem

2016-10-28 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616497#comment-15616497
 ] 

Jason Lai commented on MESOS-6162:
--

Hi haosdent! Not sure if you're working on this ticket. At Uber we have been 
collecting blkio stats for Docker containers, and I would like to take this 
task as an effort to maintain feature parity with our Docker containers. Are 
you okay with that?

> Add support for cgroups blkio subsystem
> ---
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>Reporter: haosdent
>
> Noted that cgroups blkio subsystem may have performance issue, refer to 
> https://github.com/opencontainers/runc/issues/861





[jira] [Commented] (MESOS-4975) mesos::internal::master::Slave::tasks can grow unboundedly

2016-10-28 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616489#comment-15616489
 ] 

Yan Xu commented on MESOS-4975:
---

Backported to 1.0.x.

{noformat:title=}
commit 864fd7db38bdca51e09ae9a7771c3f2172a19e4c
Author: Jiang Yan Xu 
Date:   Thu Oct 27 13:30:24 2016 -0700

Fixed master that leaks empty entries in its hashmaps.

This fixes the CHECK failure mentioned in MESOS-6482.

Review: https://reviews.apache.org/r/53208/
{noformat}

> mesos::internal::master::Slave::tasks can grow unboundedly
> --
>
> Key: MESOS-4975
> URL: https://issues.apache.org/jira/browse/MESOS-4975
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Yan Xu
>Assignee: Yan Xu
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
>
> So in a Mesos cluster we observed the following
> {noformat:title=}
> $ jq '.orphan_tasks | length' state.json
> 1369
> $ jq '.unregistered_frameworks | length' state.json
> 20162
> {noformat}
> Aside from {{unregistered_frameworks}} here being "the list of frameworkIDs 
> for each orphan task" (described in MESOS-4973), the discrepancy between the 
> two values above is surprising.
> I think the problem is that we do this in the master:
> From 
> [source|https://github.com/apache/mesos/blob/e376d3aa0074710278224ccd17afd51971820dfb/src/master/master.cpp#L2212]:
> {code}
> foreachvalue (Slave* slave, slaves.registered) {
>   foreachvalue (Task* task, slave->tasks[framework->id()]) {
> framework->addTask(task);
>   }
>   foreachvalue (const ExecutorInfo& executor,
> slave->executors[framework->id()]) {
> framework->addExecutor(slave->id, executor);
>   }
> }
> {code}
> Here an {{operator[]}} is used whenever a framework subscribes regardless of 
> whether this agent has tasks for the framework or not.
> If the agent has no such task for this framework, then this \{frameworkID: 
> empty hashmap\} entry will stay in the map indefinitely! If frameworks are 
> ephemeral and new ones keep coming in, the map grows unboundedly.
> We should do {{tasks.contains(frameworkId)}} before using the {{[] operator}}.





[jira] [Created] (MESOS-6504) Use 'geteuid()' for the root privileges check.

2016-10-28 Thread Gilbert Song (JIRA)
Gilbert Song created MESOS-6504:
---

 Summary: Use 'geteuid()' for the root privileges check.
 Key: MESOS-6504
 URL: https://issues.apache.org/jira/browse/MESOS-6504
 Project: Mesos
  Issue Type: Bug
Reporter: Gilbert Song
Assignee: Gilbert Song


Currently, parts of the Mesos code check for root privileges by comparing 
os::user() to "root", which is not sufficient, since it compares the real user. 
When people set up the mesos binary with 'setuid root', the real-user check can 
fail even though the process actually runs with root privileges.

We should check the effective user id instead in our code. 





[jira] [Updated] (MESOS-4975) mesos::internal::master::Slave::tasks can grow unboundedly

2016-10-28 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4975:
--
Fix Version/s: 1.0.2

> mesos::internal::master::Slave::tasks can grow unboundedly
> --
>
> Key: MESOS-4975
> URL: https://issues.apache.org/jira/browse/MESOS-4975
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Yan Xu
>Assignee: Yan Xu
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
>
> So in a Mesos cluster we observed the following
> {noformat:title=}
> $ jq '.orphan_tasks | length' state.json
> 1369
> $ jq '.unregistered_frameworks | length' state.json
> 20162
> {noformat}
> Aside from {{unregistered_frameworks}} here being "the list of frameworkIDs 
> for each orphan task" (described in MESOS-4973), the discrepancy between the 
> two values above is surprising.
> I think the problem is that we do this in the master:
> From 
> [source|https://github.com/apache/mesos/blob/e376d3aa0074710278224ccd17afd51971820dfb/src/master/master.cpp#L2212]:
> {code}
> foreachvalue (Slave* slave, slaves.registered) {
>   foreachvalue (Task* task, slave->tasks[framework->id()]) {
> framework->addTask(task);
>   }
>   foreachvalue (const ExecutorInfo& executor,
> slave->executors[framework->id()]) {
> framework->addExecutor(slave->id, executor);
>   }
> }
> {code}
> Here an {{operator[]}} is used whenever a framework subscribes regardless of 
> whether this agent has tasks for the framework or not.
> If the agent has no such task for this framework, then this \{frameworkID: 
> empty hashmap\} entry will stay in the map indefinitely! If frameworks are 
> ephemeral and new ones keep coming in, the map grows unboundedly.
> We should do {{tasks.contains(frameworkId)}} before using the {{[] operator}}.





[jira] [Updated] (MESOS-6502) _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java binding.

2016-10-28 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-6502:

Summary: _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in 
libmesos java binding.  (was: MESOS_{MAJOR,MINOR,PATCH}_VERSION incorrect in 
libmesos java binding)

> _version uses incorrect MESOS_{MAJOR,MINOR,PATCH}_VERSION in libmesos java 
> binding.
> ---
>
> Key: MESOS-6502
> URL: https://issues.apache.org/jira/browse/MESOS-6502
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>Priority: Blocker
>  Labels: mesosphere
> Fix For: 1.1.0
>
>
> When the macros were re-assigned they were not flushed fully through the 
> codebase:
> https://github.com/apache/mesos/commit/6bc6a40a54491cfd733263cd3962e490b0b4bdbb





[jira] [Updated] (MESOS-6497) Java Scheduler Adapter does not surface MasterInfo.

2016-10-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6497:
--
  Sprint: Mesosphere Sprint 46
Story Points: 2

> Java Scheduler Adapter does not surface MasterInfo.
> ---
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
> Fix For: 1.1.0
>
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> calls provided the MasterInfo to the framework.
> cc [~vinodkone]





[jira] [Created] (MESOS-6501) Add a test for duplicate framework ids in "unregistered_frameworks"

2016-10-28 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-6501:
-

 Summary: Add a test for duplicate framework ids in 
"unregistered_frameworks"
 Key: MESOS-6501
 URL: https://issues.apache.org/jira/browse/MESOS-6501
 Project: Mesos
  Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone


For details see MESOS-4973 and MESOS-6461.





[jira] [Created] (MESOS-6500) SlaveRecoveryTest/0.ReconnectHTTPExecutor is flaky

2016-10-28 Thread Anand Mazumdar (JIRA)
Anand Mazumdar created MESOS-6500:
-

 Summary: SlaveRecoveryTest/0.ReconnectHTTPExecutor is flaky
 Key: MESOS-6500
 URL: https://issues.apache.org/jira/browse/MESOS-6500
 Project: Mesos
  Issue Type: Bug
Reporter: Anand Mazumdar


Showed up on ReviewBot. Unfortunately, the ReviewBot cleaned up the logs.

It seems like we are leaving orphan processes behind upon test suite 
completion, which leads to this test failing.
{code}
../../src/tests/environment.cpp:825: Failure
Failed
Tests completed with child processes remaining:
-+- 29429 /mesos/mesos-1.2.0/_build/src/.libs/lt-mesos-tests 
 \-+- 5970 /mesos/mesos-1.2.0/_build/src/.libs/lt-mesos-containerizer launch 
--command={"arguments":["mesos-executor","--launcher_dir=\/mesos\/mesos-1.2.0\/_build\/src"],"shell":false,"value":"\/mesos\/mesos-1.2.0\/_build\/src\/mesos-executor"}
 
--environment={"LIBPROCESS_PORT":"0","MESOS_AGENT_ENDPOINT":"172.17.0.2:52560","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/tmp\/SlaveRecoveryTest_0_ReconnectHTTPExecutor_kyPmzZ\/slaves\/6ace31e5-eac7-41f8-a938-64d648610484-S0\/frameworks\/6ace31e5-eac7-41f8-a938-64d648610484-\/executors\/e4f3e7e4-1acf-46d6-9768-259be617a17a\/runs\/6517ed10-859f-41d1-b5b4-75dc5c0c2a23","MESOS_EXECUTOR_ID":"e4f3e7e4-1acf-46d6-9768-259be617a17a","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"6ace31e5-eac7-41f8-a938-64d648610484-","M
 
ESOS_HTTP_COMMAND_EXECUTOR":"1","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/tmp\/SlaveRecoveryTest_0_ReconnectHTTPExecutor_kyPmzZ\/slaves\/6ace31e5-eac7-41f8-a938-64d648610484-S0\/frameworks\/6ace31e5-eac7-41f8-a938-64d648610484-\/executors\/e4f3e7e4-1acf-46d6-9768-259be617a17a\/runs\/6517ed10-859f-41d1-b5b4-75dc5c0c2a23","MESOS_SLAVE_ID":"6ace31e5-eac7-41f8-a938-64d648610484-S0","MESOS_SLAVE_PID":"agent@172.17.0.2:52560","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin"}
 --help=false --pipe_read=72 --pipe_write=77 --pre_exec_commands=[] 
--runtime_directory=/tmp/SlaveRecoveryTest_0_ReconnectHTTPExecutor_FIHcEr/containers/6517ed10-859f-41d1-b5b4-75dc5c0c2a23
 --unshare_namespace_mnt=false --user=mesos 
--working_directory=/tmp/SlaveRecoveryTest_0_ReconnectHTTPExecutor_
 
kyPmzZ/slaves/6ace31e5-eac7-41f8-a938-64d648610484-S0/frameworks/6ace31e5-eac7-41f8-a938-64d648610484-/executors/e4f3e7e4-1acf-46d6-9768-259be617a17a/runs/6517ed10-859f-41d1-b5b4-75dc5c0c2a23
 
   \-+- 5984 /mesos/mesos-1.2.0/_build/src/.libs/lt-mesos-executor 
--launcher_dir=/mesos/mesos-1.2.0/_build/src 
 \-+- 6015 sh -c sleep 1000 
   \--- 6029 sleep 1000 
[==] 1369 tests from 155 test cases ran. (465777 ms total)
[  PASSED  ] 1368 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] SlaveRecoveryTest/0.ReconnectHTTPExecutor, where TypeParam = 
mesos::internal::slave::MesosContainerizer
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6500) SlaveRecoveryTest/0.ReconnectHTTPExecutor is flaky

2016-10-28 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616236#comment-15616236
 ] 

Anand Mazumdar commented on MESOS-6500:
---

cc: [~jieyu] [~gilbert]

> SlaveRecoveryTest/0.ReconnectHTTPExecutor is flaky
> --
>
> Key: MESOS-6500
> URL: https://issues.apache.org/jira/browse/MESOS-6500
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>  Labels: flaky, flaky-test
>
> Showed up on ReviewBot. Unfortunately, the ReviewBot cleaned up the logs.
> It seems like we are leaving orphan processes upon the test suite completion 
> that leads to this test failing.
> {code}
> ../../src/tests/environment.cpp:825: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 29429 /mesos/mesos-1.2.0/_build/src/.libs/lt-mesos-tests 
>  \-+- 5970 /mesos/mesos-1.2.0/_build/src/.libs/lt-mesos-containerizer launch 
> --command={"arguments":["mesos-executor","--launcher_dir=\/mesos\/mesos-1.2.0\/_build\/src"],"shell":false,"value":"\/mesos\/mesos-1.2.0\/_build\/src\/mesos-executor"}
>  
> --environment={"LIBPROCESS_PORT":"0","MESOS_AGENT_ENDPOINT":"172.17.0.2:52560","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/tmp\/SlaveRecoveryTest_0_ReconnectHTTPExecutor_kyPmzZ\/slaves\/6ace31e5-eac7-41f8-a938-64d648610484-S0\/frameworks\/6ace31e5-eac7-41f8-a938-64d648610484-\/executors\/e4f3e7e4-1acf-46d6-9768-259be617a17a\/runs\/6517ed10-859f-41d1-b5b4-75dc5c0c2a23","MESOS_EXECUTOR_ID":"e4f3e7e4-1acf-46d6-9768-259be617a17a","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"6ace31e5-eac7-41f8-a938-64d648610484-","MESOS_HTTP_COMMAND_EXECUTOR":"1","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/tmp\/SlaveRecoveryTest_0_ReconnectHTTPExecutor_kyPmzZ\/slaves\/6ace31e5-eac7-41f8-a938-64d648610484-S0\/frameworks\/6ace31e5-eac7-41f8-a938-64d648610484-\/executors\/e4f3e7e4-1acf-46d6-9768-259be617a17a\/runs\/6517ed10-859f-41d1-b5b4-75dc5c0c2a23","MESOS_SLAVE_ID":"6ace31e5-eac7-41f8-a938-64d648610484-S0","MESOS_SLAVE_PID":"agent@172.17.0.2:52560","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin"}
>  --help=false --pipe_read=72 --pipe_write=77 --pre_exec_commands=[] 
> --runtime_directory=/tmp/SlaveRecoveryTest_0_ReconnectHTTPExecutor_FIHcEr/containers/6517ed10-859f-41d1-b5b4-75dc5c0c2a23
>  --unshare_namespace_mnt=false --user=mesos 
> --working_directory=/tmp/SlaveRecoveryTest_0_ReconnectHTTPExecutor_kyPmzZ/slaves/6ace31e5-eac7-41f8-a938-64d648610484-S0/frameworks/6ace31e5-eac7-41f8-a938-64d648610484-/executors/e4f3e7e4-1acf-46d6-9768-259be617a17a/runs/6517ed10-859f-41d1-b5b4-75dc5c0c2a23
>  
>\-+- 5984 /mesos/mesos-1.2.0/_build/src/.libs/lt-mesos-executor 
> --launcher_dir=/mesos/mesos-1.2.0/_build/src 
>  \-+- 6015 sh -c sleep 1000 
>\--- 6029 sleep 1000 
> [==] 1369 tests from 155 test cases ran. (465777 ms total)
> [  PASSED  ] 1368 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] SlaveRecoveryTest/0.ReconnectHTTPExecutor, where TypeParam = 
> mesos::internal::slave::MesosContainerizer
> {code}





[jira] [Updated] (MESOS-4973) Duplicates in 'unregistered_frameworks' in /state

2016-10-28 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-4973:
--
Shepherd: Anand Mazumdar
Assignee: Vinod Kone
Target Version/s: 1.0.2

> Duplicates in 'unregistered_frameworks' in /state 
> --
>
> Key: MESOS-4973
> URL: https://issues.apache.org/jira/browse/MESOS-4973
> Project: Mesos
>  Issue Type: Bug
>Reporter: Yan Xu
>Assignee: Vinod Kone
>Priority: Minor
>  Labels: mesosphere
>
> In our clusters where many frameworks run, 'unregistered_frameworks' 
> currently doesn't show what it semantically means, but rather "a list of 
> frameworkIDs for each orphaned task", which means a lot of duplicated 
> frameworkIDs.
> For this field to be useful we need to deduplicate when outputting the list.
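The deduplication the ticket asks for is a small change once the IDs are collected into a set; a hedged sketch (the function name and types are illustrative, not the actual master code):

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Deduplicate framework IDs while preserving first-seen order, as
// suggested for the 'unregistered_frameworks' list in /state.
std::vector<std::string> deduplicate(const std::vector<std::string>& ids)
{
  std::unordered_set<std::string> seen;
  std::vector<std::string> result;

  for (const std::string& id : ids) {
    // insert() returns {iterator, bool}; the bool is true iff 'id' is new.
    if (seen.insert(id).second) {
      result.push_back(id);
    }
  }

  return result;
}
```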





[jira] [Commented] (MESOS-6483) Check failure when a 1.1 master marking a 0.28 agent as unreachable

2016-10-28 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616123#comment-15616123
 ] 

Yan Xu commented on MESOS-6483:
---

Backported to 1.1.x.

{noformat:title=}
commit b18c5ccdbfcfea133fe366c82dc0578c948134b9
Author: Neil Conway 
Date:   Thu Oct 27 14:16:01 2016 -0700

Avoided CHECK failure with pre-1.0 agents.

We don't guarantee compatibility with pre-1.0 agents. However, since it
is easy to avoid a CHECK failure in the master when an old agent
re-registers, it seems worth doing so.

Review: https://reviews.apache.org/r/53202/
{noformat}

> Check failure when a 1.1 master marking a 0.28 agent as unreachable
> ---
>
> Key: MESOS-6483
> URL: https://issues.apache.org/jira/browse/MESOS-6483
> Project: Mesos
>  Issue Type: Bug
>Reporter: Megha
>Assignee: Neil Conway
> Fix For: 1.1.0, 1.2.0
>
>
> When upgrading directly from Mesos version 0.28 to a version > 1.0, there is 
> a scenario that may cause the 
> CHECK(frameworks.recovered.contains(frameworkId)) in 
> Master::_markUnreachable(..) to fail. The following sequence of events can 
> happen.
> 1) The master gets upgraded first to the new version while an agent, say X, 
> is still at Mesos version 0.28.
> 2) Agent X (at Mesos 0.28) attempts to re-register with the master (at, say, 
> 1.1) and as a result doesn't send the frameworks (frameworkInfos) in the 
> ReRegisterSlave message, since that field wasn't available in the older 
> Mesos version.
> 3) Among the frameworks on agent X is a framework Y which didn't re-register 
> after the master's failover. Since the master builds frameworks.recovered 
> from the frameworkInfos that agents provide, framework Y is in neither the 
> recovered nor the registered frameworks.
> 4) After re-registering, agent X fails the master's health check and is 
> marked unreachable by the master. The check 
> CHECK(frameworks.recovered.contains(frameworkId)) fires for framework Y 
> since it is neither recovered nor registered but has tasks running on 
> agent X.





[jira] [Updated] (MESOS-6483) Check failure when a 1.1 master marking a 0.28 agent as unreachable

2016-10-28 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-6483:
--
Fix Version/s: 1.2.0

> Check failure when a 1.1 master marking a 0.28 agent as unreachable
> ---
>
> Key: MESOS-6483
> URL: https://issues.apache.org/jira/browse/MESOS-6483
> Project: Mesos
>  Issue Type: Bug
>Reporter: Megha
>Assignee: Neil Conway
> Fix For: 1.1.0, 1.2.0
>
>
> When upgrading directly from Mesos version 0.28 to a version > 1.0, there is 
> a scenario that may cause the 
> CHECK(frameworks.recovered.contains(frameworkId)) in 
> Master::_markUnreachable(..) to fail. The following sequence of events can 
> happen.
> 1) The master gets upgraded first to the new version while an agent, say X, 
> is still at Mesos version 0.28.
> 2) Agent X (at Mesos 0.28) attempts to re-register with the master (at, say, 
> 1.1) and as a result doesn't send the frameworks (frameworkInfos) in the 
> ReRegisterSlave message, since that field wasn't available in the older 
> Mesos version.
> 3) Among the frameworks on agent X is a framework Y which didn't re-register 
> after the master's failover. Since the master builds frameworks.recovered 
> from the frameworkInfos that agents provide, framework Y is in neither the 
> recovered nor the registered frameworks.
> 4) After re-registering, agent X fails the master's health check and is 
> marked unreachable by the master. The check 
> CHECK(frameworks.recovered.contains(frameworkId)) fires for framework Y 
> since it is neither recovered nor registered but has tasks running on 
> agent X.





[jira] [Updated] (MESOS-4975) mesos::internal::master::Slave::tasks can grow unboundedly

2016-10-28 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-4975:
--
Fix Version/s: 1.2.0

> mesos::internal::master::Slave::tasks can grow unboundedly
> --
>
> Key: MESOS-4975
> URL: https://issues.apache.org/jira/browse/MESOS-4975
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Yan Xu
>Assignee: Yan Xu
> Fix For: 1.1.0, 1.2.0
>
>
> So in a Mesos cluster we observed the following
> {noformat:title=}
> $ jq '.orphan_tasks | length' state.json
> 1369
> $ jq '.unregistered_frameworks | length' state.json
> 20162
> {noformat}
> Aside from {{unregistered_frameworks}} here being "the list of frameworkIDs 
> for each orphan task" (described in MESOS-4973), the discrepancy between the 
> two values above is surprising.
> I think the problem is that we do this in the master:
> From 
> [source|https://github.com/apache/mesos/blob/e376d3aa0074710278224ccd17afd51971820dfb/src/master/master.cpp#L2212]:
> {code}
> foreachvalue (Slave* slave, slaves.registered) {
>   foreachvalue (Task* task, slave->tasks[framework->id()]) {
> framework->addTask(task);
>   }
>   foreachvalue (const ExecutorInfo& executor,
> slave->executors[framework->id()]) {
> framework->addExecutor(slave->id, executor);
>   }
> }
> {code}
> Here {{operator[]}} is used whenever a framework subscribes, regardless of 
> whether this agent has tasks for the framework or not.
> If the agent has no task for this framework, then this \{frameworkID: 
> empty hashmap\} entry will stay in the map indefinitely! If frameworks are 
> ephemeral and new ones keep coming in, the map grows unboundedly.
> We should check {{tasks.contains(frameworkId)}} before using {{operator[]}}.
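The side effect described above is easy to reproduce with a plain {{std::unordered_map}}; a hedged sketch contrasting the leaking lookup with the contains-first lookup (names and the simplified value type are illustrative, not the actual master code):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// framework ID -> (task ID -> some task state), much simplified.
using TaskMap =
  std::unordered_map<std::string, std::unordered_map<std::string, int>>;

// Buggy pattern: operator[] default-constructs an empty inner map for
// every framework ID ever looked up, so the outer map only grows.
std::size_t lookupWithBracket(TaskMap& tasks, const std::string& frameworkId)
{
  return tasks[frameworkId].size();
}

// Fixed pattern: check for the key first; nothing is inserted.
std::size_t lookupWithContains(
    const TaskMap& tasks, const std::string& frameworkId)
{
  auto it = tasks.find(frameworkId);
  return it == tasks.end() ? 0 : it->second.size();
}
```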





[jira] [Commented] (MESOS-4975) mesos::internal::master::Slave::tasks can grow unboundedly

2016-10-28 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616120#comment-15616120
 ] 

Yan Xu commented on MESOS-4975:
---

Backported to 1.1.x

{noformat:title=}
commit 55322e7a206618e04109f462986c6a550afbd352
Author: Jiang Yan Xu 
Date:   Thu Oct 27 13:30:24 2016 -0700

Fixed master that leaks empty entries in its hashmaps.

This fixes the CHECK failure mentioned in MESOS-6482.

Review: https://reviews.apache.org/r/53208/
{noformat}

> mesos::internal::master::Slave::tasks can grow unboundedly
> --
>
> Key: MESOS-4975
> URL: https://issues.apache.org/jira/browse/MESOS-4975
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Yan Xu
>Assignee: Yan Xu
> Fix For: 1.1.0, 1.2.0
>
>
> So in a Mesos cluster we observed the following
> {noformat:title=}
> $ jq '.orphan_tasks | length' state.json
> 1369
> $ jq '.unregistered_frameworks | length' state.json
> 20162
> {noformat}
> Aside from {{unregistered_frameworks}} here being "the list of frameworkIDs 
> for each orphan task" (described in MESOS-4973), the discrepancy between the 
> two values above is surprising.
> I think the problem is that we do this in the master:
> From 
> [source|https://github.com/apache/mesos/blob/e376d3aa0074710278224ccd17afd51971820dfb/src/master/master.cpp#L2212]:
> {code}
> foreachvalue (Slave* slave, slaves.registered) {
>   foreachvalue (Task* task, slave->tasks[framework->id()]) {
> framework->addTask(task);
>   }
>   foreachvalue (const ExecutorInfo& executor,
> slave->executors[framework->id()]) {
> framework->addExecutor(slave->id, executor);
>   }
> }
> {code}
> Here {{operator[]}} is used whenever a framework subscribes, regardless of 
> whether this agent has tasks for the framework or not.
> If the agent has no task for this framework, then this \{frameworkID: 
> empty hashmap\} entry will stay in the map indefinitely! If frameworks are 
> ephemeral and new ones keep coming in, the map grows unboundedly.
> We should check {{tasks.contains(frameworkId)}} before using {{operator[]}}.





[jira] [Commented] (MESOS-3574) Support replacing ZooKeeper with replicated log

2016-10-28 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616044#comment-15616044
 ] 

Zhitao Li commented on MESOS-3574:
--

How will frameworks and agents detect where the leading master is when using 
the replicated log? Are clients expected to hard-code a list of the masters' 
ip:port pairs and rely on redirect messages from the master?

> Support replacing ZooKeeper with replicated log
> ---
>
> Key: MESOS-3574
> URL: https://issues.apache.org/jira/browse/MESOS-3574
> Project: Mesos
>  Issue Type: Improvement
>  Components: leader election, replicated log
>Reporter: Neil Conway
>  Labels: mesosphere
>
> It would be useful to support using the replicated log without also requiring 
> ZooKeeper to be running. This would simplify the process of 
> configuring/operating a high-availability configuration of Mesos.
> At least three things would need to be done:
> 1. Abstract away the stuff we use Zk for into an interface that can be 
> implemented (e.g., by etcd, consul, rep-log, or Zk). This might be done 
> already as part of [MESOS-1806]
> 2. Enhance the replicated log to be able to do its own leader election + 
> failure detection (to decide when the current master is down).
> 3. Validate replicated log performance to ensure it is adequate (per Joris, 
> likely needs some significant work)





[jira] [Comment Edited] (MESOS-6420) Mesos Agent leaking sockets when port mapping network isolator is ON

2016-10-28 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614902#comment-15614902
 ] 

Till Toenshoff edited comment on MESOS-6420 at 10/28/16 5:08 PM:
-

Cherry-picked onto 1.1.x:

{noformat}
commit ab92dc562fd93f8edccc37ff1919d655c15dd558
Author: Santhosh Kumar Shanmugham 
Date:   Thu Oct 20 08:54:52 2016 -0700

Close socket after setting flags on the interface.

Review: https://reviews.apache.org/r/53049/
{noformat}


was (Author: alexr):
Cherry-picked.

> Mesos Agent leaking sockets when port mapping network isolator is ON
> 
>
> Key: MESOS-6420
> URL: https://issues.apache.org/jira/browse/MESOS-6420
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network, slave
>Affects Versions: 1.0.2
>Reporter: Santhosh Shanmugham
>Priority: Blocker
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
>
> Mesos Agent leaks one socket per task launched and eventually runs out of 
> sockets. We were able to track it down to the network isolator 
> (port_mapping.cpp). When we turned off the port mapping isolator, no file 
> descriptors were leaked. The leaked fd is a SOCK_STREAM socket.
> Leaked Sockets:
> $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep 
> "can't"
> [sudo] password for sshanmugham:
> mesos-sla 57688 root   19u  sock0,6  0t0 2993216948 can't 
> identify protocol
> mesos-sla 57688 root   27u  sock0,6  0t0 2993216468 can't 
> identify protocol
> Extract from strace:
> ...
> [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494395 close(19)   = 0
> [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494844 close(19)   = 0
> [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.495565 close(19)   = 0
> [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496072 close(19)   = 0
> [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496758 close(19)   = 0
> [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497270 close(19)   = 0
> [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497698 close(19)   = 0
> [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498407 close(19)   = 0
> [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498899 close(19)   = 0
> [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 63682] 19:14:02.499091 close(18 
> [pid 57701] 19:14:02.499634 close(19)   = 0
> [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500044 close(19)   = 0
> [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500734 close(19)   = 0
> [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.501271 close(19)   = 0
> [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.502030 close(19)   = 0
> [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
> ...
> ...
> [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY  ...>
> [pid 57691] 19:18:03.461460 close(27)   = 0
> [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461632 close(3 
> [pid  6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY 
> [pid  6138] 19:18:03.462190 close(3 
> [pid 57691] 19:18:03.462374 close(27)   = 0
> [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 
> [pid  6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY 
> [pid  6138] 19:18:03.462678 close(3 
> [pid  6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY  ...>
> [pid 57691] 19:18:03.463046 close(27)   = 0
> [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.463225 close(3 
> [pid 57691] 19:18:03.463845 close(27)   = 0
> [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.464604 close(27)   = 0
> [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465074 close(27)   = 0
> [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465862 close(27)   = 0
> [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.466713 close(27)   = 0
> [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.467472 close(27)   = 0
> [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> 
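The fix ("Close socket after setting flags on the interface") boils down to pairing every socket(2) with a close(2) on all paths. One idiomatic way to guarantee that in C++ is an RAII wrapper; a hedged sketch (this wrapper is illustrative, not the actual port-mapping isolator code):

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// RAII wrapper: the descriptor is closed when the wrapper goes out of
// scope, so early returns between socket() and close() cannot leak it.
class ScopedSocket
{
public:
  explicit ScopedSocket(int fd) : fd_(fd) {}
  ~ScopedSocket() { if (fd_ >= 0) ::close(fd_); }

  // Non-copyable: copying would lead to a double close.
  ScopedSocket(const ScopedSocket&) = delete;
  ScopedSocket& operator=(const ScopedSocket&) = delete;

  int get() const { return fd_; }

private:
  int fd_;
};

// Returns true if 'fd' still refers to an open descriptor.
bool isOpen(int fd) { return ::fcntl(fd, F_GETFD) != -1; }
```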

[jira] [Comment Edited] (MESOS-6497) Java Scheduler Adapter does not surface MasterInfo.

2016-10-28 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615960#comment-15615960
 ] 

Till Toenshoff edited comment on MESOS-6497 at 10/28/16 5:07 PM:
-

1.1.x cherry-pick:

{noformat}
commit d9236ab50cea13a1eafb4b4ade902675ee211493
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:48:24 2016 -0700

Populated `MasterInfo` in the v0 Java adapter.

Review: https://reviews.apache.org/r/53247

commit aa0f2b662e4d87b2fd547d88dc1d6c894827636b
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:46:01 2016 -0700

Populated `MasterInfo` evolving from v0 framework registered message.

Review: https://reviews.apache.org/r/53246

commit 5e37ed57b457c330ab05e4ed619f5be8e073808f
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:42:45 2016 -0700

Added `MasterInfo` to the subscribed event.

This would be useful for schedulers that want more information
about the master if they are using an alternate detector
implementation that does not query Master ZK entry. Also,
this is needed presently by the V0 -> V1 HTTP adapter. The
old driver based schedulers used to get this information via
the `Framework(Re-)registered` event.

Review: https://reviews.apache.org/r/53245
{noformat}


was (Author: tillt):
1.1.x backport:

{noformat}
commit d9236ab50cea13a1eafb4b4ade902675ee211493
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:48:24 2016 -0700

Populated `MasterInfo` in the v0 Java adapter.

Review: https://reviews.apache.org/r/53247

commit aa0f2b662e4d87b2fd547d88dc1d6c894827636b
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:46:01 2016 -0700

Populated `MasterInfo` evolving from v0 framework registered message.

Review: https://reviews.apache.org/r/53246

commit 5e37ed57b457c330ab05e4ed619f5be8e073808f
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:42:45 2016 -0700

Added `MasterInfo` to the subscribed event.

This would be useful for schedulers that want more information
about the master if they are using an alternate detector
implementation that does not query Master ZK entry. Also,
this is needed presently by the V0 -> V1 HTTP adapter. The
old driver based schedulers used to get this information via
the `Framework(Re-)registered` event.

Review: https://reviews.apache.org/r/53245
{noformat}

> Java Scheduler Adapter does not surface MasterInfo.
> ---
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
> Fix For: 1.1.0
>
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> calls provided the MasterInfo to the framework.
> cc [~vinodkone]





[jira] [Commented] (MESOS-6497) Java Scheduler Adapter does not surface MasterInfo.

2016-10-28 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615960#comment-15615960
 ] 

Till Toenshoff commented on MESOS-6497:
---

1.1.x backport:

{noformat}
commit d9236ab50cea13a1eafb4b4ade902675ee211493
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:48:24 2016 -0700

Populated `MasterInfo` in the v0 Java adapter.

Review: https://reviews.apache.org/r/53247

commit aa0f2b662e4d87b2fd547d88dc1d6c894827636b
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:46:01 2016 -0700

Populated `MasterInfo` evolving from v0 framework registered message.

Review: https://reviews.apache.org/r/53246

commit 5e37ed57b457c330ab05e4ed619f5be8e073808f
Author: Anand Mazumdar 
Date:   Thu Oct 27 15:42:45 2016 -0700

Added `MasterInfo` to the subscribed event.

This would be useful for schedulers that want more information
about the master if they are using an alternate detector
implementation that does not query Master ZK entry. Also,
this is needed presently by the V0 -> V1 HTTP adapter. The
old driver based schedulers used to get this information via
the `Framework(Re-)registered` event.

Review: https://reviews.apache.org/r/53245
{noformat}

> Java Scheduler Adapter does not surface MasterInfo.
> ---
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
> Fix For: 1.1.0
>
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> calls provided the MasterInfo to the framework.
> cc [~vinodkone]





[jira] [Updated] (MESOS-6479) add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and add string flag for framework-name

2016-10-28 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6479:
--
Target Version/s: 1.2.0  (was: 1.1.0)

> add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and 
> add string flag for framework-name
> 
>
> Key: MESOS-6479
> URL: https://issues.apache.org/jira/browse/MESOS-6479
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.1.0
> Environment: all
>Reporter: Hubert Asamer
>Priority: Trivial
>  Labels: newbie, newbie++, testing
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Extend execute.cpp to use TaskGroupInfo as a container for batch jobs, 
> distributing tasks based on available offers. A simple bool CLI flag shall 
> enable/disable this behavior. If enabled, the tasks in TaskGroupInfo are not 
> executed within a "pod" (on a single host) but as distributed jobs (on 
> multiple hosts).
> In addition, an optional CLI flag for setting the temporary framework name 
> (e.g. to better distinguish between running/finished frameworks) could be 
> useful.





[jira] [Commented] (MESOS-6479) add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and add string flag for framework-name

2016-10-28 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615950#comment-15615950
 ] 

Till Toenshoff commented on MESOS-6479:
---

I am retargeting this as 1.1.0 RC2 will be cut soon. 

> add ability to execute batch jobs from TaskGroupInfo proto in execute.cpp and 
> add string flag for framework-name
> 
>
> Key: MESOS-6479
> URL: https://issues.apache.org/jira/browse/MESOS-6479
> Project: Mesos
>  Issue Type: Improvement
>  Components: cli
>Affects Versions: 1.1.0
> Environment: all
>Reporter: Hubert Asamer
>Priority: Trivial
>  Labels: newbie, newbie++, testing
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Extend execute.cpp to use TaskGroupInfo as a container for batch jobs, 
> distributing tasks based on available offers. A simple bool CLI flag shall 
> enable/disable this behavior. If enabled, the tasks in TaskGroupInfo are not 
> executed within a "pod" (on a single host) but as distributed jobs (on 
> multiple hosts).
> In addition, an optional CLI flag for setting the temporary framework name 
> (e.g. to better distinguish between running/finished frameworks) could be 
> useful.





[jira] [Created] (MESOS-6499) Add metric to track active subscribers in operator API

2016-10-28 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-6499:


 Summary: Add metric to track active subscribers in operator API
 Key: MESOS-6499
 URL: https://issues.apache.org/jira/browse/MESOS-6499
 Project: Mesos
  Issue Type: Improvement
  Components: HTTP API
Reporter: Zhitao Li
Assignee: Zhitao Li








[jira] [Updated] (MESOS-6348) Allow `network/cni` isolator unit-tests to run with CNI plugins

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6348:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Allow `network/cni` isolator unit-tests to run with CNI plugins 
> 
>
> Key: MESOS-6348
> URL: https://issues.apache.org/jira/browse/MESOS-6348
> Project: Mesos
>  Issue Type: Task
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Currently, we don't have any infrastructure that allows CNI plugins to be 
> used in `network/cni` isolator unit-tests. This forces us to mock CNI 
> plugins that don't use new network namespaces, leading to a very 
> restrictive form of unit-tests. 
> Especially for the port-mapper plugin, in order to test its DNAT 
> functionality it will be very useful to run the containers in a separate 
> network namespace, which requires an actual CNI plugin.
> The proposal is to introduce a test filter called CNIPLUGIN that gets set 
> when the CNI_PATH env var is set. Tests using the CNIPLUGIN filter can then 
> use actual CNI plugins in their tests.





[jira] [Updated] (MESOS-6291) Add unit tests for nested container case for filesystem/linux isolator.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6291:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add unit tests for nested container case for filesystem/linux isolator.
> ---
>
> Key: MESOS-6291
> URL: https://issues.apache.org/jira/browse/MESOS-6291
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Parameterize the existing tests so that they all work for both top-level 
> containers and nested containers.





[jira] [Updated] (MESOS-6142) Frameworks may RESERVE for an arbitrary role.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6142:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Frameworks may RESERVE for an arbitrary role.
> -
>
> Key: MESOS-6142
> URL: https://issues.apache.org/jira/browse/MESOS-6142
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master
>Affects Versions: 1.0.0
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>  Labels: mesosphere, reservations
>
> The master does not validate that resources from a reservation request have 
> the same role the framework is registered with. As a result, frameworks may 
> reserve resources for arbitrary roles.
> I've modified the role in [the {{ReserveThenUnreserve}} 
> test|https://github.com/apache/mesos/blob/bca600cf5602ed8227d91af9f73d689da14ad786/src/tests/reservation_tests.cpp#L117]
>  to "yoyo" and observed the following in the test's log:
> {noformat}
> I0908 18:35:43.379122 2138112 master.cpp:3362] Processing ACCEPT call for 
> offers: [ dfaf67e6-7c1c-4988-b427-c49842cb7bb7-O0 ] on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train) for framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- 
> (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116
> I0908 18:35:43.379170 2138112 master.cpp:3022] Authorizing principal 
> 'test-principal' to reserve resources 'cpus(yoyo, test-principal):1; 
> mem(yoyo, test-principal):512'
> I0908 18:35:43.379678 2138112 master.cpp:3642] Applying RESERVE operation for 
> resources cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 from 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- (default) at 
> scheduler-ca12a660-9f08-49de-be4e-d452aa3aa6da@10.200.181.237:60116 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.379767 2138112 master.cpp:7341] Sending checkpointed resources 
> cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512 to agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 at slave(1)@10.200.181.237:60116 
> (alexr.railnet.train)
> I0908 18:35:43.380273 3211264 slave.cpp:2497] Updated checkpointed resources 
> from  to cpus(yoyo, test-principal):1; mem(yoyo, test-principal):512
> I0908 18:35:43.380574 2674688 hierarchical.cpp:760] Updated allocation of 
> framework dfaf67e6-7c1c-4988-b427-c49842cb7bb7- on agent 
> dfaf67e6-7c1c-4988-b427-c49842cb7bb7-S0 from cpus(*):1; mem(*):512; 
> disk(*):470841; ports(*):[31000-32000] to ports(*):[31000-32000]; cpus(yoyo, 
> test-principal):1; disk(*):470841; mem(yoyo, test-principal):512 with RESERVE 
> operation
> {noformat}





[jira] [Updated] (MESOS-6493) Add test cases for the HTTPS health checks.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6493:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add test cases for the HTTPS health checks.
> ---
>
> Key: MESOS-6493
> URL: https://issues.apache.org/jira/browse/MESOS-6493
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere, test
>






[jira] [Updated] (MESOS-6405) Benchmark call ingestion path on the Mesos master.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6405:
-
Sprint: Mesosphere Sprint 45, Mesosphere Sprint 46  (was: Mesosphere Sprint 
45)

> Benchmark call ingestion path on the Mesos master.
> --
>
> Key: MESOS-6405
> URL: https://issues.apache.org/jira/browse/MESOS-6405
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: Anand Mazumdar
>Priority: Critical
>  Labels: mesosphere
>
> [~drexin] reported on the user mailing 
> [list|http://mail-archives.apache.org/mod_mbox/mesos-user/201610.mbox/%3C6B42E374-9AB7--A315-A6558753E08B%40apple.com%3E]
>  that there seems to be a significant performance regression on the call 
> ingestion path on the Mesos master with respect to the scheduler driver (v0 
> API). 
> We should create a benchmark to first get a sense of the numbers and then go 
> about fixing the performance issues. 





[jira] [Updated] (MESOS-6464) Add fine grained control of which namespaces / cgroups a nested container should inherit (or not).

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6464:
-
Sprint: Mesosphere Sprint 45, Mesosphere Sprint 46  (was: Mesosphere Sprint 
45)

> Add fine grained control of which namespaces / cgroups a nested container 
> should inherit (or not).
> --
>
> Key: MESOS-6464
> URL: https://issues.apache.org/jira/browse/MESOS-6464
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> We need finer grained control of which namespaces / cgroups a nested 
> container should inherit or not.
> Right now, there are some implicit assumptions about which cgroups we enter 
> and which namespaces we inherit when we launch a nested container. For 
> example, under the current semantics, a nested container will always get a 
> new pid namespace but inherit the network namespace from its parent. 
> Moreover, nested containers will always inherit all of the cgroups from their 
> parent (except the freezer cgroup), with no possibility of choosing any 
> different configuration.
> My current thinking is to pass the set of isolators to 
> {{containerizer->launch()}} that we would like to have invoked as part of 
> launching a new container. An isolator will be used to isolate the new 
> container only if it is enabled (via the agent flags) AND it is passed in 
> via {{launch()}} (note that both cgroup isolation and namespace membership 
> are also implemented using isolators). This is a sort of whitelist approach, 
> where we have to know ahead of time the full set of isolators we want our 
> container launched with.
> Alternatively, we could consider passing in the set of isolators that we 
> would like *disabled* instead.  This way we could blacklist certain isolators 
> from kicking in, even if they have been enabled via the agent flags.
> In both approaches, one major caveat of this is that it will have to become 
> part of the top-level containerizer API, but it is specific only to the 
> universal containerizer. Maybe this is OK as we phase out the docker 
> containerizer anyway.
> I am leaning towards the blacklist approach at the moment...





[jira] [Updated] (MESOS-5856) Logrotate ContainerLogger module does not rotate logs when run as root with --switch_user

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5856:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Logrotate ContainerLogger module does not rotate logs when run as root with 
> --switch_user
> -
>
> Key: MESOS-5856
> URL: https://issues.apache.org/jira/browse/MESOS-5856
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0, 0.28.0, 1.0.0
>Reporter: Joseph Wu
>Assignee: Sivaram Kannan
>Priority: Critical
>  Labels: logger, mesosphere, newbie
>
> The logrotate ContainerLogger module runs as the agent's user.  In most 
> cases, this is {{root}}.
> When {{logrotate}} is run as root, there is an additional check the 
> configuration files must pass (because a root {{logrotate}} needs to be 
> secured against non-root modifications to the configuration):
> https://github.com/logrotate/logrotate/blob/fe80cb51a2571ca35b1a7c8ba0695db5a68feaba/config.c#L807-L815
> Log rotation will fail under the following scenario:
> 1) The agent is run with {{--switch_user}} (default: true)
> 2) A task is launched with a non-root user specified
> 3) The logrotate module spawns a few companion processes (as root) and this 
> creates the {{stdout}}, {{stderr}}, {{stdout.logrotate.conf}}, and 
> {{stderr.logrotate.conf}} files (as root).  This step races with the next 
> step.
> 4) The Mesos containerizer and Fetcher will {{chown}} the task's sandbox to 
> the non-root user, including the files just created.
> 5) When {{logrotate}} is run, it will skip any non-root configuration files.  
> This means the files are not rotated.
> 
> Fix: The logrotate module's companion processes should call {{setuid}} and 
> {{setgid}}.





[jira] [Updated] (MESOS-6462) Design Doc: Mesos Support for Container Attach and Container Exec

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6462:
-
Sprint: Mesosphere Sprint 45, Mesosphere Sprint 46  (was: Mesosphere Sprint 
45)

> Design Doc: Mesos Support for Container Attach and Container Exec
> -
>
> Key: MESOS-6462
> URL: https://issues.apache.org/jira/browse/MESOS-6462
> Project: Mesos
>  Issue Type: Task
>Reporter: Kevin Klues
>Assignee: Kevin Klues
>  Labels: debugging, mesosphere
>
> Here is a link to the design doc:
> https://docs.google.com/document/d/1nAVr0sSSpbDLrgUlAEB5hKzCl482NSVk8V0D56sFMzU
> It is not yet complete, but it is filled out enough to start eliciting 
> feedback. Please feel free to add comments (or even add content!) as you wish.





[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-3753:
-
Sprint: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 41, 
Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere 
Sprint 46  (was: Mesosphere Sprint 39, Mesosphere Sprint 40, Mesosphere Sprint 
41, Mesosphere Sprint 42, Mesosphere Sprint 44, Mesosphere Sprint 45)

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Greg Mann
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can manually test this by spinning up an SSL-enabled master and 
> attempting to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certifications,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.





[jira] [Updated] (MESOS-5966) Add libprocess HTTP tests with SSL support

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-5966:
-
Sprint: Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, 
Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  (was: 
Mesosphere Sprint 40, Mesosphere Sprint 41, Mesosphere Sprint 42, Mesosphere 
Sprint 44, Mesosphere Sprint 45)

> Add libprocess HTTP tests with SSL support
> --
>
> Key: MESOS-5966
> URL: https://issues.apache.org/jira/browse/MESOS-5966
> Project: Mesos
>  Issue Type: Task
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Libprocess contains SSL unit tests which test our SSL support using simple 
> sockets. We should add tests which also make use of libprocess's various HTTP 
> classes and helpers in a variety of SSL configurations.





[jira] [Updated] (MESOS-6411) Add documentation for CNI port-mapper plugin.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6411:
-
Sprint: Mesosphere Sprint 45, Mesosphere Sprint 46  (was: Mesosphere Sprint 
45)

> Add documentation for CNI port-mapper plugin.
> -
>
> Key: MESOS-6411
> URL: https://issues.apache.org/jira/browse/MESOS-6411
> Project: Mesos
>  Issue Type: Documentation
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Need to add the CNI port-mapper plugin to the CNI documentation within Mesos.





[jira] [Updated] (MESOS-6335) Add user doc for task group tasks

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6335:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add user doc for task group tasks
> -
>
> Key: MESOS-6335
> URL: https://issues.apache.org/jira/browse/MESOS-6335
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Gilbert Song
>
> Committed some basic documentation, so I am moving this to the 
> pods-improvements epic and targeting it for 1.2.0. I would like this ticket 
> to track the more comprehensive documentation.





[jira] [Updated] (MESOS-6366) Design doc for agent secrets

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6366:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Design doc for agent secrets
> 
>
> Key: MESOS-6366
> URL: https://issues.apache.org/jira/browse/MESOS-6366
> Project: Mesos
>  Issue Type: Task
>  Components: slave
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Produce a design for the passing of credentials to the agent, and their use 
> in the following three scenarios:
> * HTTP executor authentication
> * Container image fetching
> * Artifact fetching





[jira] [Updated] (MESOS-6292) Add unit tests for nested container case for docker/runtime isolator.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6292:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Add unit tests for nested container case for docker/runtime isolator.
> -
>
> Key: MESOS-6292
> URL: https://issues.apache.org/jira/browse/MESOS-6292
> Project: Mesos
>  Issue Type: Improvement
>  Components: isolation
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>
> Launch nested containers with different container images specified.





[jira] [Updated] (MESOS-6193) Make the docker/volume isolator nesting aware.

2016-10-28 Thread Artem Harutyunyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Harutyunyan updated MESOS-6193:
-
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 45, Mesosphere Sprint 46  
(was: Mesosphere Sprint 44, Mesosphere Sprint 45)

> Make the docker/volume isolator nesting aware.
> --
>
> Key: MESOS-6193
> URL: https://issues.apache.org/jira/browse/MESOS-6193
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Gilbert Song
>  Labels: isolator, mesosphere
>






[jira] [Updated] (MESOS-6022) unit-test for port-mapper CNI plugin

2016-10-28 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-6022:
-
Summary: unit-test for port-mapper CNI plugin  (was: unit-test for adding 
port-mapping using ptp plugin)

> unit-test for port-mapper CNI plugin
> 
>
> Key: MESOS-6022
> URL: https://issues.apache.org/jira/browse/MESOS-6022
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
> Environment: Linux
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: mesosphere
>
> Write unit-tests for the port mapper plugin.





[jira] [Updated] (MESOS-6496) Support up-casting of Shared and Owned

2016-10-28 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway updated MESOS-6496:
---
Description: It should be possible to pass a {{Shared<T2>}} value to 
an object that takes a parameter of type {{Shared<T1>}}. Similarly for 
{{Owned}}. In general, {{Shared<T2>}} should be implicitly convertible to 
{{Shared<T1>}} iff {{T2*}} is implicitly convertible to {{T1*}}. In C++11, 
this works because they define the appropriate conversion constructor.  (was: 
It should be possible to pass a {{Shared<T2>}} value to an object that takes 
a parameter of type {{Shared<T1>}}. Similarly for {{Owned}}. In general, 
{{Shared<T2>}} should be implicitly convertible to {{Shared<T1>}} iff {{T2}} 
is implicitly convertible to {{T1}}. In C++11, this works because they define 
the appropriate conversion constructor.)

> Support up-casting of Shared and Owned
> 
>
> Key: MESOS-6496
> URL: https://issues.apache.org/jira/browse/MESOS-6496
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, tech-debt
>
> It should be possible to pass a {{Shared<T2>}} value to an object that 
> takes a parameter of type {{Shared<T1>}}. Similarly for {{Owned}}. In 
> general, {{Shared<T2>}} should be implicitly convertible to {{Shared<T1>}} 
> iff {{T2*}} is implicitly convertible to {{T1*}}. In C++11, this works 
> because they define the appropriate conversion constructor.





[jira] [Assigned] (MESOS-6496) Support up-casting of Shared and Owned

2016-10-28 Thread Neil Conway (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Conway reassigned MESOS-6496:
--

Assignee: Neil Conway

> Support up-casting of Shared and Owned
> 
>
> Key: MESOS-6496
> URL: https://issues.apache.org/jira/browse/MESOS-6496
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere, tech-debt
>
> It should be possible to pass a {{Shared<T2>}} value to an object that 
> takes a parameter of type {{Shared<T1>}}. Similarly for {{Owned}}. In 
> general, {{Shared<T2>}} should be implicitly convertible to {{Shared<T1>}} 
> iff {{T2}} is implicitly convertible to {{T1}}. In C++11, this works because 
> they define the appropriate conversion constructor.





[jira] [Updated] (MESOS-6497) Java Scheduler Adapter does not surface MasterInfo.

2016-10-28 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-6497:
--
Summary: Java Scheduler Adapter does not surface MasterInfo.  (was: HTTP 
Adapter does not surface MasterInfo.)

> Java Scheduler Adapter does not surface MasterInfo.
> ---
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
> Fix For: 1.1.0
>
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the v0 API, where the {{registered}} and {{reregistered}} 
> callbacks provided the {{MasterInfo}} to the framework.
> cc [~vinodkone]





[jira] [Commented] (MESOS-6480) Support for docker live-restore option in Mesos

2016-10-28 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615551#comment-15615551
 ] 

haosdent commented on MESOS-6480:
-

We could use {{docker run -d}} first and then use {{docker logs --follow}} to 
redirect the container's logs to the sandbox's stdout/stderr.

> Support for docker live-restore option in Mesos
> ---
>
> Key: MESOS-6480
> URL: https://issues.apache.org/jira/browse/MESOS-6480
> Project: Mesos
>  Issue Type: Task
>Reporter: Milind Chawre
>
> Docker-1.12 supports live-restore option which keeps containers alive during 
> docker daemon downtime https://docs.docker.com/engine/admin/live-restore/
> I tried to use this option in my Mesos setup and observed the following:
> 1. On a Mesos worker node, stop the docker daemon.
> 2. After some time, start the docker daemon. All the containers running on 
> that node are still visible using "docker ps". This is the expected 
> behaviour of the live-restore option.
> 3. When I check the Mesos and Marathon UIs, they show no active tasks 
> running on that node. The containers which are still running on that node 
> are now scheduled on different Mesos nodes, which is not right, since I can 
> still see the containers in "docker ps" output because of the live-restore 
> option.





[jira] [Updated] (MESOS-6424) Possible nullptr dereference in flag loading

2016-10-28 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6424:
--
Fix Version/s: 1.1.0
   1.0.2

> Possible nullptr dereference in flag loading
> 
>
> Key: MESOS-6424
> URL: https://issues.apache.org/jira/browse/MESOS-6424
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: coverity
> Fix For: 1.0.2, 1.1.0
>
>
> Coverity reports the following:
> {code}
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add std::basic_string, char 
> [10], mesos::internal::logger::rotate::Flags::Flags()::[lambda(const 
> std::basic_string&) 
> (instance 1)]>(T2 T1::*, const flags::Name &, const Option &, 
> const std::basic_string&, 
> const T3 *, T4)::[lambda(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) (instance 1)]::operator 
> ()(flags::FlagsBase*, const std::basic_string std::allocator>&) const()
> 369 Flags* flags = dynamic_cast<Flags*>(base);
> 370 if (base != nullptr) {
> 371   // NOTE: 'fetch' "retrieves" the value if necessary and then
> 372   // invokes 'parse'. See 'fetch' for more details.
> 373   Try t = fetch(value);
> 374   if (t.isSome()) {
>CID 1374083:(FORWARD_NULL)
>Dereferencing null pointer "flags".
> 375 flags->*t1 = t.get();
> 376   } else {
> 377 return Error("Failed to load value '" + value + "': " + 
> t.error());
> 378   }
> 379 }
> 380 
> ** CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add Option (*)(const Bytes &)>(T2 T1::*, const flags::Name &, 
> Option&, const std::basic_string std::allocator>&, const T3 *, T4)::[lambda(flags::FlagsBase*, const 
> std::basic_string&) 
> (instance 1)]::operator ()(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) const()
> 
> *** CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add Option (*)(const Bytes &)>(T2 T1::*, const flags::Name &, 
> Option&, const std::basic_string std::allocator>&, const T3 *, T4)::[lambda(flags::FlagsBase*, const 
> std::basic_string&) 
> (instance 1)]::operator ()(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) const()
> 369 Flags* flags = dynamic_cast<Flags*>(base);
> 370 if (base != nullptr) {
> 371   // NOTE: 'fetch' "retrieves" the value if necessary and then
> 372   // invokes 'parse'. See 'fetch' for more details.
> 373   Try t = fetch(value);
> 374   if (t.isSome()) {
>CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
>Dereferencing null pointer "flags".
> 375 flags->*t1 = t.get();
> 376   } else {
> 377 return Error("Failed to load value '" + value + "': " + 
> t.error());
> 378   }
> 379 }
> 380 
> {code}
> The {{dynamic_cast}} is needed here if the derived {{Flags}} class got 
> intentionally sliced (e.g., to a {{FlagsBase}}). Since the base class of the 
> hierarchy ({{FlagsBase}}) stores the flags, they would not be sliced away; the 
> {{dynamic_cast}} here effectively filters out all flags still valid for the 
> {{Flags}} used when the {{Flag}} was {{add}}'ed.
> It seems the intention here was to confirm that the {{dynamic_cast}} to 
> {{Flags*}} succeeded like is done e.g., in {{flags.stringify}} and 
> {{flags.validate}} just below.
> AFAICT this code has existed since 2013, but was only reported by coverity 
> recently.





[jira] [Commented] (MESOS-6424) Possible nullptr dereference in flag loading

2016-10-28 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615549#comment-15615549
 ] 

Till Toenshoff commented on MESOS-6424:
---

1.0.x backport:
{noformat}
commit ddb5be4410e83027ad99af914a4e66d4c8801b59
Author: Benjamin Bannier 
Date:   Fri Oct 28 13:39:02 2016 +0200

Fixed incorrect check in dynamic_cast.

We checked here whether the passed pointer is not null, while we in
fact wanted to confirm that the performed dynamic_cast succeeded
(like is already done e.g., in the definitions of flags.stringify or
flags.validate).

This change makes us check the result of the dynamic_cast like
originally intended.

Review: https://reviews.apache.org/r/53055/
{noformat}

> Possible nullptr dereference in flag loading
> 
>
> Key: MESOS-6424
> URL: https://issues.apache.org/jira/browse/MESOS-6424
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: coverity
>
> Coverity reports the following:
> {code}
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add std::basic_string, char 
> [10], mesos::internal::logger::rotate::Flags::Flags()::[lambda(const 
> std::basic_string&) 
> (instance 1)]>(T2 T1::*, const flags::Name &, const Option &, 
> const std::basic_string&, 
> const T3 *, T4)::[lambda(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) (instance 1)]::operator 
> ()(flags::FlagsBase*, const std::basic_string std::allocator>&) const()
> 369 Flags* flags = dynamic_cast<Flags*>(base);
> 370 if (base != nullptr) {
> 371   // NOTE: 'fetch' "retrieves" the value if necessary and then
> 372   // invokes 'parse'. See 'fetch' for more details.
> 373   Try t = fetch(value);
> 374   if (t.isSome()) {
>CID 1374083:(FORWARD_NULL)
>Dereferencing null pointer "flags".
> 375 flags->*t1 = t.get();
> 376   } else {
> 377 return Error("Failed to load value '" + value + "': " + 
> t.error());
> 378   }
> 379 }
> 380 
> ** CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add Option (*)(const Bytes &)>(T2 T1::*, const flags::Name &, 
> Option&, const std::basic_string std::allocator>&, const T3 *, T4)::[lambda(flags::FlagsBase*, const 
> std::basic_string&) 
> (instance 1)]::operator ()(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) const()
> 
> *** CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add Option (*)(const Bytes &)>(T2 T1::*, const flags::Name &, 
> Option&, const std::basic_string std::allocator>&, const T3 *, T4)::[lambda(flags::FlagsBase*, const 
> std::basic_string&) 
> (instance 1)]::operator ()(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) const()
> 369 Flags* flags = dynamic_cast<Flags*>(base);
> 370 if (base != nullptr) {
> 371   // NOTE: 'fetch' "retrieves" the value if necessary and then
> 372   // invokes 'parse'. See 'fetch' for more details.
> 373   Try t = fetch(value);
> 374   if (t.isSome()) {
>CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
>Dereferencing null pointer "flags".
> 375 flags->*t1 = t.get();
> 376   } else {
> 377 return Error("Failed to load value '" + value + "': " + 
> t.error());
> 378   }
> 379 }
> 380 
> {code}
> The {{dynamic_cast}} is needed here if the derived {{Flags}} class got 
> intentionally sliced (e.g., to a {{FlagsBase}}). Since the base class of the 
> hierarchy ({{FlagsBase}}) stores the flags, they would not be sliced away; the 
> {{dynamic_cast}} here effectively filters out all flags still valid for the 
> {{Flags}} used when the {{Flag}} was {{add}}'ed.
> It seems the intention here was to confirm that the {{dynamic_cast}} to 
> {{Flags*}} succeeded like is done e.g., in {{flags.stringify}} and 
> {{flags.validate}} just 

[jira] [Commented] (MESOS-6424) Possible nullptr dereference in flag loading

2016-10-28 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615532#comment-15615532
 ] 

Till Toenshoff commented on MESOS-6424:
---

{noformat}
commit 189ac949784bfebfb14c99c76dc75d2bf924a412
Author: Benjamin Bannier 
Date:   Fri Oct 28 13:39:02 2016 +0200

Fixed incorrect check in dynamic_cast.

We checked here whether the passed pointer is not null, while we in
fact wanted to confirm that the performed dynamic_cast succeeded
(like is already done e.g., in the definitions of flags.stringify or
flags.validate).

This change makes us check the result of the dynamic_cast like
originally intended.

Review: https://reviews.apache.org/r/53055/
{noformat}

> Possible nullptr dereference in flag loading
> 
>
> Key: MESOS-6424
> URL: https://issues.apache.org/jira/browse/MESOS-6424
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: coverity
>
> Coverity reports the following:
> {code}
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add std::basic_string, char 
> [10], mesos::internal::logger::rotate::Flags::Flags()::[lambda(const 
> std::basic_string&) 
> (instance 1)]>(T2 T1::*, const flags::Name &, const Option &, 
> const std::basic_string&, 
> const T3 *, T4)::[lambda(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) (instance 1)]::operator 
> ()(flags::FlagsBase*, const std::basic_string std::allocator>&) const()
> 369 Flags* flags = dynamic_cast<Flags*>(base);
> 370 if (base != nullptr) {
> 371   // NOTE: 'fetch' "retrieves" the value if necessary and then
> 372   // invokes 'parse'. See 'fetch' for more details.
> 373   Try t = fetch(value);
> 374   if (t.isSome()) {
>CID 1374083:(FORWARD_NULL)
>Dereferencing null pointer "flags".
> 375 flags->*t1 = t.get();
> 376   } else {
> 377 return Error("Failed to load value '" + value + "': " + 
> t.error());
> 378   }
> 379 }
> 380 
> ** CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add Option (*)(const Bytes &)>(T2 T1::*, const flags::Name &, 
> Option&, const std::basic_string std::allocator>&, const T3 *, T4)::[lambda(flags::FlagsBase*, const 
> std::basic_string&) 
> (instance 1)]::operator ()(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) const()
> 
> *** CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add Option (*)(const Bytes &)>(T2 T1::*, const flags::Name &, 
> Option&, const std::basic_string std::allocator>&, const T3 *, T4)::[lambda(flags::FlagsBase*, const 
> std::basic_string&) 
> (instance 1)]::operator ()(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) const()
> 369 Flags* flags = dynamic_cast<Flags*>(base);
> 370 if (base != nullptr) {
> 371   // NOTE: 'fetch' "retrieves" the value if necessary and then
> 372   // invokes 'parse'. See 'fetch' for more details.
> 373   Try<T2> t = fetch<T2>(value);
> 374   if (t.isSome()) {
>CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
>Dereferencing null pointer "flags".
> 375 flags->*t1 = t.get();
> 376   } else {
> 377 return Error("Failed to load value '" + value + "': " + 
> t.error());
> 378   }
> 379 }
> 380 
> {code}
> The {{dynamic_cast}} is needed here if the derived {{Flags}} class got 
> intentionally sliced (e.g., to a {{FlagsBase}}). Since the base class of the 
> hierarchy ({{FlagsBase}}) stores the flags they would not be sliced away; the 
> {{dynamic_cast}} here effectively filters out all flags still valid for the 
> {{Flags}} used when the {{Flag}} was {{add}}'ed.
> It seems the intention here was to confirm that the {{dynamic_cast}} to 
> {{Flags*}} succeeded like is done e.g., in {{flags.stringify}} and 
> {{flags.validate}} just below.
> AFAICT this code has existed since 2013, but was only reported by coverity 
> recently.

[jira] [Comment Edited] (MESOS-6424) Possible nullptr dereference in flag loading

2016-10-28 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615526#comment-15615526
 ] 

Till Toenshoff edited comment on MESOS-6424 at 10/28/16 2:13 PM:
-

Landed on master - we should actually backport this one for 1.0.2 - 
[~vinodkone] do you agree?


was (Author: tillt):
Landed on master - we should actually backport this one for 1.0.2 - @vinodkone 
do you agree?

> Possible nullptr dereference in flag loading
> 
>
> Key: MESOS-6424
> URL: https://issues.apache.org/jira/browse/MESOS-6424
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: coverity
>
> Coverity reports the following:
> {code}
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add std::basic_string, char 
> [10], mesos::internal::logger::rotate::Flags::Flags()::[lambda(const 
> std::basic_string&) 
> (instance 1)]>(T2 T1::*, const flags::Name &, const Option &, 
> const std::basic_string&, 
> const T3 *, T4)::[lambda(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) (instance 1)]::operator 
> ()(flags::FlagsBase*, const std::basic_string std::allocator>&) const()
> 369 Flags* flags = dynamic_cast<Flags*>(base);
> 370 if (base != nullptr) {
> 371   // NOTE: 'fetch' "retrieves" the value if necessary and then
> 372   // invokes 'parse'. See 'fetch' for more details.
> 373   Try<T2> t = fetch<T2>(value);
> 374   if (t.isSome()) {
>CID 1374083:  Null pointer dereferences  (FORWARD_NULL)
>Dereferencing null pointer "flags".
> 375 flags->*t1 = t.get();
> 376   } else {
> 377 return Error("Failed to load value '" + value + "': " + 
> t.error());
> 378   }
> 379 }
> 380 
> ** CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add Option (*)(const Bytes &)>(T2 T1::*, const flags::Name &, 
> Option&, const std::basic_string std::allocator>&, const T3 *, T4)::[lambda(flags::FlagsBase*, const 
> std::basic_string&) 
> (instance 1)]::operator ()(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) const()
> 
> *** CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
> /3rdparty/stout/include/stout/flags/flags.hpp: 375 in 
> flags::FlagsBase::add Option (*)(const Bytes &)>(T2 T1::*, const flags::Name &, 
> Option&, const std::basic_string std::allocator>&, const T3 *, T4)::[lambda(flags::FlagsBase*, const 
> std::basic_string&) 
> (instance 1)]::operator ()(flags::FlagsBase*, const std::basic_string std::char_traits, std::allocator>&) const()
> 369 Flags* flags = dynamic_cast<Flags*>(base);
> 370 if (base != nullptr) {
> 371   // NOTE: 'fetch' "retrieves" the value if necessary and then
> 372   // invokes 'parse'. See 'fetch' for more details.
> 373   Try<T2> t = fetch<T2>(value);
> 374   if (t.isSome()) {
>CID 1374082:  Null pointer dereferences  (FORWARD_NULL)
>Dereferencing null pointer "flags".
> 375 flags->*t1 = t.get();
> 376   } else {
> 377 return Error("Failed to load value '" + value + "': " + 
> t.error());
> 378   }
> 379 }
> 380 
> {code}
> The {{dynamic_cast}} is needed here if the derived {{Flags}} class got 
> intentionally sliced (e.g., to a {{FlagsBase}}). Since the base class of the 
> hierarchy ({{FlagsBase}}) stores the flags they would not be sliced away; the 
> {{dynamic_cast}} here effectively filters out all flags still valid for the 
> {{Flags}} used when the {{Flag}} was {{add}}'ed.
> It seems the intention here was to confirm that the {{dynamic_cast}} to 
> {{Flags*}} succeeded like is done e.g., in {{flags.stringify}} and 
> {{flags.validate}} just below.
> AFAICT this code has existed since 2013, but was only reported by coverity 
> recently.
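The flaw described above can be illustrated with a minimal sketch. The types here are hypothetical stand-ins, not the actual stout hierarchy; the point is only that after a {{dynamic_cast}}, the null check must test the *result* of the cast, not the original base pointer.

```cpp
#include <cassert>

// Hypothetical stand-ins for the flags hierarchy.
struct FlagsBase {
  virtual ~FlagsBase() {}
};

struct Flags : FlagsBase {
  int value = 0;
};

// Stores 'v' only if 'base' actually points at a Flags; returns whether
// the store happened.
bool load(FlagsBase* base, int v) {
  Flags* flags = dynamic_cast<Flags*>(base);
  if (flags != nullptr) {  // Checking 'base != nullptr' here was the bug.
    flags->value = v;
    return true;
  }
  return false;
}
```

With the original `base != nullptr` check, passing a `FlagsBase` that is not a `Flags` would yield a non-null `base` but a null `flags`, and the member write would dereference null, which is exactly what Coverity flagged.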



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6424) Possible nullptr dereference in flag loading

2016-10-28 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615526#comment-15615526
 ] 

Till Toenshoff commented on MESOS-6424:
---

Landed on master - we should actually backport this one for 1.0.2 - @vinodkone 
do you agree?

> Possible nullptr dereference in flag loading
> 
>
> Key: MESOS-6424
> URL: https://issues.apache.org/jira/browse/MESOS-6424
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: coverity
>





[jira] [Updated] (MESOS-6455) DefaultExecutorTests fail when running on hosts without docker.

2016-10-28 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6455:
--
Assignee: Gastón Kleiman

> DefaultExecutorTests fail when running on hosts without docker.
> ---
>
> Key: MESOS-6455
> URL: https://issues.apache.org/jira/browse/MESOS-6455
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: Gastón Kleiman
>
> {noformat:title=}
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_TaskRunning/1, where 
> GetParam() = "docker,mesos"
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_KillTask/1, where 
> GetParam() = "docker,mesos"
> [  FAILED  ] Containterizers/DefaultExecutorTest.ROOT_TaskUsesExecutor/1, 
> where GetParam() = "docker,mesos"
> {noformat}
> {noformat:title=}
> ../../src/tests/default_executor_tests.cpp:98: Failure
> slave: Failed to create containerizer: Could not create DockerContainerizer: 
> Failed to create docker: Failed to get docker version: Failed to execute 
> 'docker -H unix:///var/run/docker.sock --version': exited with status 127
> {noformat}
> Maybe we can put {{DOCKER_}} in the instantiation name and use another 
> instantiation for tests that don't require docker?
> /cc [~vinodkone] [~anandmazumdar]





[jira] [Updated] (MESOS-6455) DefaultExecutorTests fail when running on hosts without docker.

2016-10-28 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-6455:
--
Shepherd: Till Toenshoff

> DefaultExecutorTests fail when running on hosts without docker.
> ---
>
> Key: MESOS-6455
> URL: https://issues.apache.org/jira/browse/MESOS-6455
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>





[jira] [Commented] (MESOS-6202) Docker containerizer kills containers whose name starts with 'mesos-'

2016-10-28 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615520#comment-15615520
 ] 

Anand Mazumdar commented on MESOS-6202:
---

Nopes, we can close this issue.

> Docker containerizer kills containers whose name starts with 'mesos-'
> -
>
> Key: MESOS-6202
> URL: https://issues.apache.org/jira/browse/MESOS-6202
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 1.0.1
> Environment: Dockerized 
> {{mesosphere/mesos-slave:1.0.1-2.0.93.ubuntu1404}}
>Reporter: Marc Villacorta
>
> I run 3 docker containers in my CoreOS system whose names start with 
> _'mesos-'_ those are: _'mesos-master'_, _'mesos-dns'_ and _'mesos-agent'_.
> I can start the first two without any problem but when I start the third one 
> _('mesos-agent')_ all three containers are killed by the docker daemon.
> If I rename the containers to _'m3s0s-master'_, _'m3s0s-dns'_ and 
> _'m3s0s-agent'_ everything works.
> I tracked down the problem to 
> [this|https://github.com/apache/mesos/blob/16a563aca1f226b021b8f8815c4d115a3212f02b/src/slave/containerizer/docker.cpp#L116-L120]
>  code which is marked to be removed after deprecation cycle.
> I was previously running Mesos 0.28.2 without this problem.





[jira] [Updated] (MESOS-6459) PosixRLimitsIsolatorTest.TaskExceedingLimit fails on OS X

2016-10-28 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-6459:
--
Target Version/s: 1.2.0
  Labels: mesosphere  (was: )

> PosixRLimitsIsolatorTest.TaskExceedingLimit fails on OS X
> -
>
> Key: MESOS-6459
> URL: https://issues.apache.org/jira/browse/MESOS-6459
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> This test consistently fails on OS X:
> {code}
> 31-7e9c-4248-acfd-21634150a657@172.28.128.1:64864 on agent 
> 52cc4957-1a39-4d66-ace6-5622fac3b85e-S0
> ../../src/tests/containerizer/posix_rlimits_isolator_tests.cpp:120: Failure
> Value of: statusFailed->state()
>   Actual: TASK_FINISHED
> Expected: TASK_FAILED
> {code}





[jira] [Updated] (MESOS-6494) Clean up the flags parsing in the executors

2016-10-28 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-6494:
--
Shepherd: Alexander Rukletsov
  Sprint: Mesosphere Sprint 46  (was: Mesosphere Sprint 45)
Target Version/s: 1.2.0
  Labels: mesosphere  (was: )

> Clean up the flags parsing in the executors
> ---
>
> Key: MESOS-6494
> URL: https://issues.apache.org/jira/browse/MESOS-6494
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> The current executors and the executor libraries use a mix of {{stout::flags}} 
> and {{os::getenv}} to parse flags, leading to a lot of unnecessary and 
> sometimes duplicated code.
> This should be cleaned up, using only {{stout::flags}} to parse flags.
> Environment variables should be used for the flags that are common to ALL the 
> executors (listed in the Executor HTTP API doc).
> Command line parameters should be used for flags that apply only to 
> individual executors.
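The split proposed above can be sketched as a simple lookup rule: executor-specific values come from the command line and take precedence, while shared values fall back to the environment. The function and names below are illustrative, not Mesos' actual flag names or API.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Resolve a flag value: command line first (executor-specific flags),
// then an environment variable (flags common to all executors).
std::string resolveFlag(
    const std::map<std::string, std::string>& commandLine,
    const std::string& flagName,
    const std::string& envName) {
  auto it = commandLine.find(flagName);
  if (it != commandLine.end()) {
    return it->second;  // Executor-specific: command line wins.
  }
  const char* env = std::getenv(envName.c_str());
  return env != nullptr ? std::string(env) : std::string();
}
```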





[jira] [Updated] (MESOS-6492) Deprecate the existing `SSL_` env variables

2016-10-28 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MESOS-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gastón Kleiman updated MESOS-6492:
--
Assignee: Gastón Kleiman
  Labels: mesosphere  (was: )

> Deprecate the existing `SSL_` env variables
> ---
>
> Key: MESOS-6492
> URL: https://issues.apache.org/jira/browse/MESOS-6492
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> {{SSL_}} env variables are deprecated by {{LIBPROCESS_SSL_}}.
> Cleanup the code once the deprecation cycle is over.





[jira] [Updated] (MESOS-6171) Introduce "global" decision policy for unhealthy tasks.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6171:
---
Sprint: Mesosphere Sprint 44  (was: Mesosphere Sprint 44, Mesosphere Sprint 
45)

> Introduce "global" decision policy for unhealthy tasks.
> ---
>
> Key: MESOS-6171
> URL: https://issues.apache.org/jira/browse/MESOS-6171
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.0.0
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>  Labels: health-check, mesosphere
>
> Currently, if the task is deemed unhealthy, i.e. it failed a health check a 
> certain number of times, it is killed by both default executors: 
> [command|https://github.com/apache/mesos/blob/b053572bc424478cafcd60d1bce078f5132c4590/src/launcher/executor.cpp#L299]
>  and 
> [docker|https://github.com/apache/mesos/blob/b053572bc424478cafcd60d1bce078f5132c4590/src/docker/executor.cpp#L315].
>  This is what can be called a "local" kill policy.
> While a local kill policy can save some network traffic and unload the 
> scheduler, there are cases when a scheduler may want to decide what to do, 
> and when. This is what can be called a "global" policy: the health check 
> library reports whether a health check failed or succeeded, and the 
> executor forwards this update to the scheduler without taking any action.





[jira] [Updated] (MESOS-6395) HealthChecker sends updates to executor via libprocess messaging.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6395:
---
Sprint: Mesosphere Sprint 46  (was: Mesosphere Sprint 45)

> HealthChecker sends updates to executor via libprocess messaging.
> -
>
> Key: MESOS-6395
> URL: https://issues.apache.org/jira/browse/MESOS-6395
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently {{HealthChecker}} sends status updates via libprocess messaging to 
> the executor's UPID. This seems unnecessary after refactoring the health 
> checker into a library: a simple callback will do. Moreover, not requiring 
> the executor's {{UPID}} will simplify creating a mocked {{HealthChecker}}.
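The callback approach suggested above can be sketched as follows. Type and member names are illustrative stand-ins, not Mesos' actual API; the point is that the checker only invokes a {{std::function}} instead of sending a libprocess message to a UPID, which makes mocking trivial.

```cpp
#include <functional>
#include <string>
#include <utility>

// Illustrative health update payload.
struct TaskHealthStatus {
  std::string taskId;
  bool healthy;
};

// A health checker that reports results through a callback only;
// it takes no kill decision and needs no executor UPID.
class HealthChecker {
public:
  explicit HealthChecker(std::function<void(const TaskHealthStatus&)> callback)
    : callback_(std::move(callback)) {}

  // Invoked by the checking logic; simply forwards the result.
  void report(const std::string& taskId, bool healthy) {
    callback_({taskId, healthy});
  }

private:
  std::function<void(const TaskHealthStatus&)> callback_;
};
```

A test can then construct the checker with a capturing lambda and assert on the recorded updates, with no libprocess plumbing involved.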





[jira] [Updated] (MESOS-5963) HealthChecker should not decide when to kill tasks and when to stop performing health checks.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5963:
---
Sprint: Mesosphere Sprint 46  (was: Mesosphere Sprint 45)

> HealthChecker should not decide when to kill tasks and when to stop 
> performing health checks.
> -
>
> Key: MESOS-5963
> URL: https://issues.apache.org/jira/browse/MESOS-5963
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> Currently, the {{HealthChecker}} library decides when a task should be killed 
> based on its health status. Moreover, it stops checking its health after that. 
> This seems unfortunate, because it's up to the executor and/or framework to 
> decide both when to kill tasks and when to health check them. 





[jira] [Updated] (MESOS-6184) Health checks should use a general mechanism to enter namespaces of the task.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6184:
---
Sprint: Mesosphere Sprint 44, Mesosphere Sprint 46  (was: Mesosphere Sprint 
44, Mesosphere Sprint 45)

> Health checks should use a general mechanism to enter namespaces of the task.
> -
>
> Key: MESOS-6184
> URL: https://issues.apache.org/jira/browse/MESOS-6184
> Project: Mesos
>  Issue Type: Improvement
>Reporter: haosdent
>Assignee: haosdent
>Priority: Blocker
>  Labels: health-check, mesosphere
>
> To perform health checks for tasks, we need to enter the corresponding 
> namespaces of the container. For now, health checks use a custom clone to 
> implement this:
> {code}
>   return process::defaultClone([=]() -> int {
> if (taskPid.isSome()) {
>   foreach (const string& ns, namespaces) {
> Try<Nothing> setns = ns::setns(taskPid.get(), ns);
> if (setns.isError()) {
>   ...
> }
>   }
> }
> return func();
>   });
> {code}
> After the childHooks patches merged, we could change the health check to use 
> childHooks to call {{setns}} and make {{process::defaultClone}} private 
> again.  
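The child-hook idea above can be sketched generically: instead of a custom clone that calls {{ns::setns}} inline, a list of hooks runs in the child before the user function, and any hook failure aborts the launch. Names are illustrative; the real hooks would enter the task's namespaces.

```cpp
#include <functional>
#include <vector>

// Run all child hooks, then the user function. A non-zero hook result
// (e.g. a failed namespace entry) aborts and is returned as-is.
int runChild(
    const std::vector<std::function<int()>>& childHooks,
    const std::function<int()>& func) {
  for (const auto& hook : childHooks) {
    int result = hook();
    if (result != 0) {
      return result;  // Bail out if entering a namespace failed.
    }
  }
  return func();
}
```

This keeps the namespace-entering logic out of the clone implementation, which is what would let {{process::defaultClone}} become private again.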





[jira] [Updated] (MESOS-6457) Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6457:
---
Sprint: Mesosphere Sprint 46  (was: Mesosphere Sprint 45)

> Tasks shouldn't transition from TASK_KILLING to TASK_RUNNING
> 
>
> Key: MESOS-6457
> URL: https://issues.apache.org/jira/browse/MESOS-6457
> Project: Mesos
>  Issue Type: Bug
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>
> A task can currently transition from {{TASK_KILLING}} to {{TASK_RUNNING}}, if, 
> for example, it starts or stops passing a health check after it has entered 
> the {{TASK_KILLING}} state.
> I think that this behaviour is counterintuitive. It also makes the life of 
> framework/tools developers harder, since they have to keep track of the 
> complete task status history in order to know if a task is being killed.
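The rule proposed by this ticket can be sketched as a one-way guard on the state machine: once a task reaches {{TASK_KILLING}}, a recovered health check must not move it back to {{TASK_RUNNING}}. The enum and function names below are illustrative, not Mesos' actual implementation.

```cpp
// Illustrative subset of task states.
enum class TaskState { TASK_RUNNING, TASK_KILLING, TASK_KILLED };

// State to report when a health check starts passing again:
// a task that is being killed stays in TASK_KILLING.
TaskState onHealthRecovered(TaskState current) {
  if (current == TaskState::TASK_KILLING) {
    return TaskState::TASK_KILLING;  // Never revert to TASK_RUNNING.
  }
  return TaskState::TASK_RUNNING;
}
```

With this guard, a framework can treat {{TASK_KILLING}} as terminal-bound and no longer needs the full status history to know a kill is in progress.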





[jira] [Updated] (MESOS-6119) TCP health checks are not portable.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6119:
---
Sprint: Mesosphere Sprint 42, Mesosphere Sprint 43, Mesosphere Sprint 44, 
Mesosphere Sprint 46  (was: Mesosphere Sprint 42, Mesosphere Sprint 43, 
Mesosphere Sprint 44, Mesosphere Sprint 45)

> TCP health checks are not portable.
> ---
>
> Key: MESOS-6119
> URL: https://issues.apache.org/jira/browse/MESOS-6119
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> MESOS-3567 introduced a dependency on "bash" for TCP health checks, which is 
> undesirable. We should implement a portable solution for TCP health checks.





[jira] [Commented] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-28 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614923#comment-15614923
 ] 

Alexander Rukletsov commented on MESOS-6497:


Please post the complete chain.

https://reviews.apache.org/r/53246/
https://reviews.apache.org/r/53247/

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> callbacks provided the {{MasterInfo}} to the framework.
> cc [~vinodkone]





[jira] [Issue Comment Deleted] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6497:
---
Comment: was deleted

(was: Please post the complete chain.

https://reviews.apache.org/r/53246/
https://reviews.apache.org/r/53247/)

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> callbacks provided the {{MasterInfo}} to the framework.
> cc [~vinodkone]





[jira] [Commented] (MESOS-6497) HTTP Adapter does not surface MasterInfo.

2016-10-28 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614924#comment-15614924
 ] 

Alexander Rukletsov commented on MESOS-6497:


Please post the complete chain.

https://reviews.apache.org/r/53246/
https://reviews.apache.org/r/53247/

> HTTP Adapter does not surface MasterInfo.
> -
>
> Key: MESOS-6497
> URL: https://issues.apache.org/jira/browse/MESOS-6497
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Joris Van Remoortere
>Assignee: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere, v1_api
>
> The HTTP adapter does not surface the {{MasterInfo}}. This makes it 
> incompatible with the V0 API, where the {{registered}} and {{reregistered}} 
> callbacks provided the {{MasterInfo}} to the framework.
> cc [~vinodkone]





[jira] [Commented] (MESOS-6420) Mesos Agent leaking sockets when port mapping network isolator is ON

2016-10-28 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614902#comment-15614902
 ] 

Alexander Rukletsov commented on MESOS-6420:


Cherry-picked.

> Mesos Agent leaking sockets when port mapping network isolator is ON
> 
>
> Key: MESOS-6420
> URL: https://issues.apache.org/jira/browse/MESOS-6420
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, network, slave
>Affects Versions: 1.0.2
>Reporter: Santhosh Shanmugham
>Priority: Blocker
> Fix For: 1.0.2, 1.1.0, 1.2.0
>
>
> Mesos Agent leaks one socket per task launched and eventually runs out of 
> sockets. We were able to track it down to the network isolator 
> (port_mapping.cpp). When we turned off the port mapping isolator, no file 
> descriptors were leaked. The leaked fd is a SOCK_STREAM socket.
> Leaked Sockets:
> $ sudo lsof -p $(pgrep -u root -o -f /usr/local/sbin/mesos-slave) -nP | grep 
> "can't"
> [sudo] password for sshanmugham:
> mesos-sla 57688 root   19u  sock0,6  0t0 2993216948 can't 
> identify protocol
> mesos-sla 57688 root   27u  sock0,6  0t0 2993216468 can't 
> identify protocol
> Extract from strace:
> ...
> [pid 57701] 19:14:02.493718 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494395 close(19)   = 0
> [pid 57701] 19:14:02.494448 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.494844 close(19)   = 0
> [pid 57701] 19:14:02.494913 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.495565 close(19)   = 0
> [pid 57701] 19:14:02.495617 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496072 close(19)   = 0
> [pid 57701] 19:14:02.496128 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.496758 close(19)   = 0
> [pid 57701] 19:14:02.496812 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497270 close(19)   = 0
> [pid 57701] 19:14:02.497319 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.497698 close(19)   = 0
> [pid 57701] 19:14:02.497750 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498407 close(19)   = 0
> [pid 57701] 19:14:02.498456 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.498899 close(19)   = 0
> [pid 57701] 19:14:02.498963 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 63682] 19:14:02.499091 close(18 
> [pid 57701] 19:14:02.499634 close(19)   = 0
> [pid 57701] 19:14:02.499689 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500044 close(19)   = 0
> [pid 57701] 19:14:02.500093 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.500734 close(19)   = 0
> [pid 57701] 19:14:02.500782 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.501271 close(19)   = 0
> [pid 57701] 19:14:02.501339 socket(PF_NETLINK, SOCK_RAW, 0) = 19
> [pid 57701] 19:14:02.502030 close(19)   = 0
> [pid 57701] 19:14:02.502101 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 19
> ...
> ...
> [pid 57691] 19:18:03.461022 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461345 open("/etc/selinux/config", O_RDONLY  ...>
> [pid 57691] 19:18:03.461460 close(27)   = 0
> [pid 57691] 19:18:03.461520 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.461632 close(3 
> [pid  6138] 19:18:03.461781 open("/proc/mounts", O_RDONLY 
> [pid  6138] 19:18:03.462190 close(3 
> [pid 57691] 19:18:03.462374 close(27)   = 0
> [pid 57691] 19:18:03.462430 socket(PF_NETLINK, SOCK_RAW, 0 
> [pid  6138] 19:18:03.462456 open("/proc/net/psched", O_RDONLY 
> [pid  6138] 19:18:03.462678 close(3 
> [pid  6138] 19:18:03.462915 open("/etc/libnl/classid", O_RDONLY  ...>
> [pid 57691] 19:18:03.463046 close(27)   = 0
> [pid 57691] 19:18:03.463111 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid  6138] 19:18:03.463225 close(3 
> [pid 57691] 19:18:03.463845 close(27)   = 0
> [pid 57691] 19:18:03.463911 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.464604 close(27)   = 0
> [pid 57691] 19:18:03.464664 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465074 close(27)   = 0
> [pid 57691] 19:18:03.465132 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.465862 close(27)   = 0
> [pid 57691] 19:18:03.465928 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.466713 close(27)   = 0
> [pid 57691] 19:18:03.466780 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.467472 close(27)   = 0
> [pid 57691] 19:18:03.467524 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468012 close(27)   = 0
> [pid 57691] 19:18:03.468075 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.468799 close(27)   = 0
> [pid 57691] 19:18:03.468950 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.469505 close(27)   = 0
> [pid 57691] 19:18:03.469578 socket(PF_NETLINK, SOCK_RAW, 0) = 27
> [pid 57691] 19:18:03.470301 

[jira] [Updated] (MESOS-6455) DefaultExecutorTests fail when running on hosts without docker

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6455:
---
Target Version/s: 1.2.0  (was: 1.1.0)

> DefaultExecutorTests fail when running on hosts without docker 
> ---
>
> Key: MESOS-6455
> URL: https://issues.apache.org/jira/browse/MESOS-6455
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6026) Tasks mistakenly marked as FAILED due to race b/w sendExecutorTerminatedStatusUpdate() and _statusUpdate().

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6026:
---
Summary: Tasks mistakenly marked as FAILED due to race b/w 
sendExecutorTerminatedStatusUpdate() and _statusUpdate().  (was: Tasks 
mistakenly marked as FAILED due to race b/w 
{{sendExecutorTerminatedStatusUpdate()}} and {{_statusUpdate()}})

> Tasks mistakenly marked as FAILED due to race b/w 
> sendExecutorTerminatedStatusUpdate() and _statusUpdate().
> ---
>
> Key: MESOS-6026
> URL: https://issues.apache.org/jira/browse/MESOS-6026
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Kapil Arya
>Assignee: Benjamin Mahler
>  Labels: mesosphere
> Fix For: 1.0.2, 1.1.0
>
>
> Tasks can be mistakenly marked as FAILED due to a race between 
> {{sendExecutorTerminatedStatusUpdate()}} and {{_statusUpdate()}} that 
> happens when the task has just finished and the executor is exiting.
> Here is an example of slave log messages:
> {code}
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959374 
> 20418 slave.cpp:3211] Handling status update TASK_FINISHED (UUID: 
> fd79d0bd-4ece-41dc-bced-b93491f6bb2e) for task 291 of framework 
> 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 from executor(1)@10.10.0.205:53504
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959604 
> 20418 slave.cpp:3732] executor(1)@10.10.0.205:53504 exited
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959643 
> 20418 slave.cpp:4089] Executor '291' of framework 
> 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 exited with status 0
> Aug 10 21:32:53 ip-10-10-0-205 mesos-slave[20413]: I0810 21:32:53.959744 
> 20418 slave.cpp:3211] Handling status update TASK_FAILED (UUID: 
> b94722fb-1658-4936-b604-6d642ffe20a0) for task 291 of framework 
> 340dfe26-a09f-4857-85b8-faba5f8d95df-0008 from @0.0.0.0:0
> {code}
> As can be noticed, the task is marked as TASK_FAILED after the executor has 
> exited.





[jira] [Updated] (MESOS-6493) Add test cases for the HTTPS health checks.

2016-10-28 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-6493:
---
Fix Version/s: (was: 1.2.0)

> Add test cases for the HTTPS health checks.
> ---
>
> Key: MESOS-6493
> URL: https://issues.apache.org/jira/browse/MESOS-6493
> Project: Mesos
>  Issue Type: Task
>  Components: tests
>Reporter: haosdent
>Assignee: haosdent
>  Labels: health-check, mesosphere, test
>






[jira] [Commented] (MESOS-6484) Memory leak in `Future::after()`

2016-10-28 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614689#comment-15614689
 ] 

Alexander Rojas commented on MESOS-6484:


So, here is how the leak works: a {{Future<T>}} is a wrapper around a 
{{std::shared_ptr<Data> data}}. The expression 
{{lambda::bind(&internal::expired<T>, f, latch, promise, *this)}} creates a 
copy of {{*this}}, so {{data}} is now referenced both by {{this}} and by the 
new copy of the future. That copy is stored inside the callable object 
created by {{lambda::bind()}}, to be executed later. When we then execute 
{{onAny(lambda::bind(&internal::after<T>, latch, promise, timer, 
lambda::_1))}}, a callable object holding a copy of {{timer}} is stored in 
{{onAnyCallbacks}}, which itself lives inside {{data}}. As a result, 
{{data}} indirectly holds a reference-counted pointer to itself, so its 
count never reaches zero and it is never destroyed.

> Memory leak in `Future::after()`
> ---
>
> Key: MESOS-6484
> URL: https://issues.apache.org/jira/browse/MESOS-6484
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess
>Affects Versions: 1.1.0
>Reporter: Alexander Rojas
>  Labels: libprocess, mesosphere
>
> The problem arises when one tries to associate an {{after()}} call to copied 
> futures. The following test case is enough to reproduce the issue:
> {code}
> TEST(FutureTest, After3)
> {
>   auto policy = std::make_shared<int>(0);
>   {
>     auto generator = []() {
>       return Future<Nothing>();
>     };
>     Future<Nothing> future = generator()
>       .after(Milliseconds(1),
>              [policy](const Future<Nothing>&) {
>                return Nothing();
>              });
>     AWAIT_READY(future);
>   }
>   EXPECT_EQ(1, policy.use_count());
> }
> {code}
> In the test, one would expect only one remaining reference to 
> {{policy}}, hence the expectation {{EXPECT_EQ(1, policy.use_count())}}. 
> However, if {{after()}} is triggered more than once, each extra call adds 
> one undeleted reference to {{policy}}.


