[jira] [Commented] (MESOS-8474) Test StorageLocalResourceProviderTest.ROOT_ConvertPreExistingVolume is flaky

2018-01-24 Thread Chun-Hung Hsiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338648#comment-16338648
 ] 

Chun-Hung Hsiao commented on MESOS-8474:


This is a similar issue, but this time it is a race between {{CREATE_VOLUME}} and 
{{CREATE_BLOCK}}. I will implement better synchronization logic in this test.

> Test StorageLocalResourceProviderTest.ROOT_ConvertPreExistingVolume is flaky
> 
>
> Key: MESOS-8474
> URL: https://issues.apache.org/jira/browse/MESOS-8474
> Project: Mesos
>  Issue Type: Bug
>  Components: storage, test
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>Assignee: Chun-Hung Hsiao
>Priority: Major
>  Labels: flaky, flaky-test, mesosphere
> Attachments: consoleText.txt, consoleText.txt
>
>
> Observed on our internal CI on ubuntu16.04 with SSL and GRPC enabled,
> {noformat}
> ../../src/tests/storage_local_resource_provider_tests.cpp:1898
>   Expected: 2u
>   Which is: 2
> To be equal to: destroyed.size()
>   Which is: 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8487) Introduce API changes for supporting quota limits.

2018-01-24 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338639#comment-16338639
 ] 

Benjamin Mahler commented on MESOS-8487:


https://reviews.apache.org/r/65334/

> Introduce API changes for supporting quota limits.
> --
>
> Key: MESOS-8487
> URL: https://issues.apache.org/jira/browse/MESOS-8487
> Project: Mesos
>  Issue Type: Task
>  Components: HTTP API
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Major
>
> Per MESOS-8068, the introduction of a quota limit requires introducing 
> corresponding changes in the API. We should send out the proposed changes more 
> broadly in the interest of being more rigorous about API changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8488) Docker bug can cause unkillable tasks

2018-01-24 Thread Gilbert Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gilbert Song updated MESOS-8488:

Component/s: containerization

> Docker bug can cause unkillable tasks
> -
>
> Key: MESOS-8488
> URL: https://issues.apache.org/jira/browse/MESOS-8488
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Affects Versions: 1.5.0
>Reporter: Greg Mann
>Priority: Major
>  Labels: mesosphere
>
> Due to an [issue on the Moby 
> project|https://github.com/moby/moby/issues/33820], it's possible for Docker 
> versions 1.13 and later to fail to catch a container exit, so that the 
> {{docker run}} command which was used to launch the container will never 
> return. This can lead to the Docker executor becoming stuck in a state where 
> it believes the container is still running and cannot be killed.
> We should update the Docker executor to ensure that containers stuck in such 
> a state cannot cause unkillable Docker executors/tasks.
> One way to do this would be a timeout, after which the Docker executor will 
> commit suicide if a kill task attempt has not succeeded. However, if we do 
> this we should also ensure that in the case that the container was actually 
> still running, either the Docker daemon or the DockerContainerizer would 
> clean up the container when it does exit.
> Another option might be for the Docker executor to directly {{wait()}} on the 
> container's Linux PID, in order to notice when the container exits.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8488) Docker bug can cause unkillable tasks

2018-01-24 Thread Greg Mann (JIRA)
Greg Mann created MESOS-8488:


 Summary: Docker bug can cause unkillable tasks
 Key: MESOS-8488
 URL: https://issues.apache.org/jira/browse/MESOS-8488
 Project: Mesos
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Greg Mann


Due to an [issue on the Moby 
project|https://github.com/moby/moby/issues/33820], it's possible for Docker 
versions 1.13 and later to fail to catch a container exit, so that the {{docker 
run}} command which was used to launch the container will never return. This 
can lead to the Docker executor becoming stuck in a state where it believes the 
container is still running and cannot be killed.

We should update the Docker executor to ensure that containers stuck in such a 
state cannot cause unkillable Docker executors/tasks.

One way to do this would be a timeout, after which the Docker executor will 
commit suicide if a kill task attempt has not succeeded. However, if we do this 
we should also ensure that in the case that the container was actually still 
running, either the Docker daemon or the DockerContainerizer would clean up the 
container when it does exit.

Another option might be for the Docker executor to directly {{wait()}} on the 
container's Linux PID, in order to notice when the container exits.
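
For illustration, here is a minimal C++ sketch of the timeout idea above: after 
issuing a kill, arm an escalation deadline and have the executor terminate itself 
if the kill has not completed in time. The {{killContainer()}} helper, the exit 
flag, and the grace period value are hypothetical placeholders, not the Docker 
executor's actual API.

{code}
#include <atomic>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <thread>

// Hypothetical placeholder for the real kill path (`docker stop`, etc.).
void killContainer() {}

// Set by whatever observes the container exit; never set if the
// `docker run` command is stuck due to the Moby bug.
std::atomic<bool> containerExited{false};

int main()
{
  const std::chrono::seconds killGracePeriod(30);

  killContainer();

  // Escalation: if the container has not been observed to exit within
  // the grace period, assume `docker run` is stuck and terminate the
  // executor so the task cannot remain unkillable.
  std::thread watchdog([&] {
    std::this_thread::sleep_for(killGracePeriod);
    if (!containerExited.load()) {
      std::cerr << "Kill timed out; executor committing suicide" << std::endl;
      std::exit(EXIT_FAILURE);
    }
  });

  watchdog.join();
  return 0;
}
{code}

As the description notes, such a suicide trades an unblocked task for a possibly 
leaked container, so the Docker daemon or the DockerContainerizer would still 
need to clean up the container if it eventually exits.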



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6822) CNI reports confusing error message for failed interface setup.

2018-01-24 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338509#comment-16338509
 ] 

Qian Zhang commented on MESOS-6822:
---

commit 2cdbec02e37c794627204f0e1fadf09e5325507d
Author: Qian Zhang 
Date: Tue Jan 23 15:54:58 2018 +0800

Updated the way to output error messages in `NetworkCniIsolatorSetup`.
 
 Review: https://reviews.apache.org/r/65306

> CNI reports confusing error message for failed interface setup.
> ---
>
> Key: MESOS-6822
> URL: https://issues.apache.org/jira/browse/MESOS-6822
> Project: Mesos
>  Issue Type: Bug
>  Components: network
>Affects Versions: 1.1.0
>Reporter: Alexander Rukletsov
>Assignee: Qian Zhang
>Priority: Major
> Fix For: 1.6.0
>
>
> Saw this today:
> {noformat}
> Failed to bring up the loopback interface in the new network namespace of pid 
> 17067: Success
> {noformat}
> which is produced by this code: 
> https://github.com/apache/mesos/blob/1e72605e9892eb4e518442ab9c1fe2a1a1696748/src/slave/containerizer/mesos/isolators/network/cni/cni.cpp#L1854-L1859
> Note that ssh'ing into the machine confirmed that {{ifconfig}} is available 
> in {{PATH}}.
> Full log: http://pastebin.com/hVdNz6yk
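
The dangling ": Success" suffix is the classic symptom of formatting {{errno}} 
after a failure that never set it. Below is a minimal C++ illustration of the 
broken pattern and the direction of the fix; {{bringUpLoopback()}} is a 
hypothetical stand-in, not the actual isolator code.

{code}
#include <cerrno>
#include <cstring>
#include <iostream>

// Hypothetical stand-in for the real setup step: it fails by returning
// false (e.g. a child process exited non-zero) without touching errno.
bool bringUpLoopback() { return false; }

int main()
{
  errno = 0;

  if (!bringUpLoopback()) {
    // BUG: the failed step never set errno, so strerror(errno) yields
    // "Success", producing messages like
    // "Failed to bring up the loopback interface ...: Success".
    std::cerr << "Failed to bring up the loopback interface: "
              << std::strerror(errno) << std::endl;

    // FIX: report the actual failure reason (e.g. the command's exit
    // status or captured stderr) instead of blindly formatting errno.
    std::cerr << "Failed to bring up the loopback interface: "
              << "command exited with a non-zero status" << std::endl;
  }

  return 0;
}
{code}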



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8487) Introduce API changes for supporting quota limits.

2018-01-24 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8487:
--

 Summary: Introduce API changes for supporting quota limits.
 Key: MESOS-8487
 URL: https://issues.apache.org/jira/browse/MESOS-8487
 Project: Mesos
  Issue Type: Task
  Components: HTTP API
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


Per MESOS-8068, the introduction of a quota limit requires introducing 
corresponding changes in the API. We should send out the proposed changes more 
broadly in the interest of being more rigorous about API changes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8068) Non-revocable bursting over quota guarantees via limits.

2018-01-24 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-8068:
---
Target Version/s: 1.6.0

> Non-revocable bursting over quota guarantees via limits.
> 
>
> Key: MESOS-8068
> URL: https://issues.apache.org/jira/browse/MESOS-8068
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation
>Reporter: Benjamin Mahler
>Priority: Major
>  Labels: multitenancy
>
> Prior to introducing a revocable tier of allocation (see MESOS-4441), there 
> is a notion of whether a role can burst over its quota guarantee.
> We currently apply implicit limits in the following way:
> No quota guarantee set: (guarantee 0, no limit)
> Quota guarantee set: (guarantee G, limit G)
> That is, we only support burst-only without a guarantee and guarantee-only 
> without bursting. We do not support bursting over some non-zero guarantee: 
> (guarantee G, limit L >= G).
> The idea here is that we should make these implicit limits explicit to 
> clarify for users the distinction between guarantees and limits, and to 
> support bursting over the guarantee.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8486) Webui should display role limits.

2018-01-24 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8486:
--

 Summary: Webui should display role limits.
 Key: MESOS-8486
 URL: https://issues.apache.org/jira/browse/MESOS-8486
 Project: Mesos
  Issue Type: Task
  Components: webui
Reporter: Benjamin Mahler


With the addition of quota limits (see MESOS-8068), the UI should be updated to 
display the per role limit information. Specifically, the 'Roles' tab needs to 
be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7292) Introduce a "sensitive mode" in Mesos which prevents leaks of sensitive data.

2018-01-24 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338447#comment-16338447
 ] 

Till Toenshoff commented on MESOS-7292:
---

I have linked another environment-handling improvement story, as they could 
possibly be solved in one go. 

> Introduce a "sensitive mode" in Mesos which prevents leaks of sensitive data.
> -
>
> Key: MESOS-7292
> URL: https://issues.apache.org/jira/browse/MESOS-7292
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Alexander Rukletsov
>Priority: Major
>  Labels: debugging, mesosphere, newbie++, security
>
> Consider the following scenario. A user passes some sensitive data in an 
> environment variable to a task. These data may be logged by Mesos components, 
> e.g., the executor as part of the {{mesos-containerizer}} invocation. While 
> this is useful for debugging, it might be an issue in some production 
> environments.
> One solution is to have a global "sensitive mode" that turns off logging of 
> such sensitive data.
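
A minimal C++ sketch of what such a mode could look like at a logging call site, 
assuming a hypothetical global {{sensitiveMode}} flag (illustrative only, not an 
actual Mesos flag):

{code}
#include <iostream>
#include <map>
#include <string>

// Hypothetical global switch; in Mesos this would be a master/agent flag.
static bool sensitiveMode = true;

// Logs an environment map, redacting values when sensitive mode is on.
void logEnvironment(const std::map<std::string, std::string>& env)
{
  for (const auto& entry : env) {
    std::cout << entry.first << "="
              << (sensitiveMode ? "<redacted>" : entry.second)
              << std::endl;
  }
}

int main()
{
  // Prints "DB_PASSWORD=<redacted>" and "HOME=<redacted>".
  logEnvironment({{"DB_PASSWORD", "hunter2"}, {"HOME", "/root"}});
  return 0;
}
{code}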



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8484) stout test NumifyTest.HexNumberTest fails.

2018-01-24 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338220#comment-16338220
 ] 

Benno Evers commented on MESOS-8484:


In boost 1.53, lexical_cast implements its own parser that doesn't handle the 
'0x' prefix, so parsing the two strings in the test would return an 
error.

 

In boost 1.65, lexical_cast calls std::istream::operator>>, which on macOS (i.e., 
using libc++) can successfully parse strings of the form "0x10.9" or "0x1p-5" 
and returns the correct number. On Linux platforms (i.e., using libstdc++), 
std::istream::operator>> is not able to parse these strings and thus returns an 
error.

 

The function {{stout::numify}} aims to achieve platform independence by 
forbidding these kinds of literals on all platforms. However, the checks only 
happen *after* boost has already been given a chance to parse the string, which 
has platform-dependent behaviour.
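
A sketch of the platform-independent behaviour described above: reject hex-style 
literals before any parser sees them, so boost's platform-dependent parsing never 
comes into play. The simplified {{numifyDouble()}} helper below is illustrative, 
not the actual stout patch.

{code}
#include <boost/lexical_cast.hpp>
#include <iostream>
#include <string>

// Reject hex-style literals up front so boost::lexical_cast (whose
// behaviour differs between libc++ and libstdc++) never sees them.
bool numifyDouble(const std::string& s, double* result)
{
  if (s.find("0x") != std::string::npos ||
      s.find("0X") != std::string::npos) {
    return false;  // Treat "0x10.9", "0x1p-5", etc. as errors everywhere.
  }

  try {
    *result = boost::lexical_cast<double>(s);
    return true;
  } catch (const boost::bad_lexical_cast&) {
    return false;
  }
}

int main()
{
  double value;
  std::cout << std::boolalpha
            << numifyDouble("0x10.9", &value) << std::endl   // false
            << numifyDouble("10.9", &value) << std::endl;    // true
  return 0;
}
{code}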

> stout test NumifyTest.HexNumberTest fails. 
> ---
>
> Key: MESOS-8484
> URL: https://issues.apache.org/jira/browse/MESOS-8484
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: macOS 10.13.2 (17C88)
> Apple LLVM version 9.0.0 (clang-900.0.37)
> ../configure && make check -j6
>Reporter: Till Toenshoff
>Assignee: Benjamin Bannier
>Priority: Blocker
>
> The current Mesos master shows the following on my machine:
> {noformat}
> [ RUN  ] NumifyTest.HexNumberTest
> ../../../3rdparty/stout/tests/numify_tests.cpp:57: Failure
> Value of: numify("0x10.9").isError()
>   Actual: false
> Expected: true
> ../../../3rdparty/stout/tests/numify_tests.cpp:58: Failure
> Value of: numify("0x1p-5").isError()
>   Actual: false
> Expected: true
> [  FAILED  ] NumifyTest.HexNumberTest (0 ms)
> {noformat}
> This problem disappears for me when reverting the latest boost upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.

2018-01-24 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338131#comment-16338131
 ] 

Gilbert Song commented on MESOS-8480:
-

[~zhitao], very likely, since we got -1s for 
https://issues.apache.org/jira/browse/MESOS-8481

> Mesos returns high resource usage when killing a Docker task.
> -
>
> Key: MESOS-8480
> URL: https://issues.apache.org/jira/browse/MESOS-8480
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Major
> Fix For: 1.3.2, 1.4.2, 1.6.0, 1.5.1
>
> Attachments: test.cpp
>
>
> The way we get resource statistics for Docker tasks is through getting the 
> cgroup subsystem path through {{/proc//cgroup}} first (taking the 
> {{cpuacct}} subsystem as an example):
> {noformat}
> 9:cpuacct,cpu:/docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b
> {noformat}
> Then read 
> {{/sys/fs/cgroup/cpuacct//docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b/cpuacct.stat}}
>  to get the statistics:
> {noformat}
> user 4
> system 0
> {noformat}
> However, when a Docker container is being torn down, it seems that Docker 
> or the operating system will first move the process to the root cgroup before 
> actually killing it, making {{/proc//docker}} look like the following:
> {noformat}
> 9:cpuacct,cpu:/
> {noformat}
> This makes a racy call to 
> [{{cgroup::internal::cgroup()}}|https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1935]
>  return a single '/', which in turn makes 
> [{{DockerContainerizerProcess::cgroupsStatistics()}}|https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L1991]
>  read {{/sys/fs/cgroup/cpuacct///cpuacct.stat}}, which contains the 
> statistics for the root cgroup:
> {noformat}
> user 228058750
> system 24506461
> {noformat}
> This can be reproduced by [^test.cpp] with the following command:
> {noformat}
> $ docker run --name sleep -d --rm alpine sleep 1000; ./test $(docker inspect 
> sleep | jq .[].State.Pid) & sleep 1 && docker rm -f sleep
> ...
> Reading file '/proc/44224/cgroup'
> Reading file 
> '/sys/fs/cgroup/cpuacct//docker/1d79a6c877e2af3081630aa57d23d853e6bd7d210dad28f897556bfea20bc9c1/cpuacct.stat'
> user 4
> system 0
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
> user 228058750
> system 24506461
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
> user 228058750
> system 24506461
> Failed to open file '/proc/44224/cgroup'
> sleep
> [2]-  Exit 1  ./test $(docker inspect sleep | jq 
> .[].State.Pid)
> {noformat}
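
A minimal C++ sketch of the defensive check this bug suggests: parse 
{{/proc/<pid>/cgroup}}, and if the subsystem path has already collapsed to '/', 
report an error instead of reading root-cgroup statistics. The {{cgroupOf()}} 
helper is a simplified stand-in for the real cgroups code.

{code}
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Extracts the cgroup of `pid` for a given subsystem from
// /proc/<pid>/cgroup, whose lines look like
// "9:cpuacct,cpu:/docker/66fbe67b...".
std::string cgroupOf(int pid, const std::string& subsystem)
{
  std::ifstream file("/proc/" + std::to_string(pid) + "/cgroup");
  std::string line;

  while (std::getline(file, line)) {
    std::istringstream fields(line);
    std::string id, subsystems, path;
    std::getline(fields, id, ':');
    std::getline(fields, subsystems, ':');
    std::getline(fields, path);

    if (subsystems.find(subsystem) != std::string::npos) {
      return path;
    }
  }

  return "";
}

int main()
{
  const std::string cgroup = cgroupOf(1, "cpuacct");

  // Defensive check: a bare "/" means the process has already been moved
  // to the root cgroup (e.g. mid-teardown), so reading
  // /sys/fs/cgroup/cpuacct/<cgroup>/cpuacct.stat would return statistics
  // for the whole machine. Report an error instead.
  if (cgroup.empty() || cgroup == "/") {
    std::cerr << "Container cgroup not found; likely being torn down"
              << std::endl;
    return 1;
  }

  std::cout << "cgroup: " << cgroup << std::endl;
  return 0;
}
{code}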



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-7258) Provide scheduler calls to subscribe to additional roles and unsubscribe from roles.

2018-01-24 Thread Kapil Arya (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kapil Arya reassigned MESOS-7258:
-

Assignee: Kapil Arya

> Provide scheduler calls to subscribe to additional roles and unsubscribe from 
> roles.
> 
>
> Key: MESOS-7258
> URL: https://issues.apache.org/jira/browse/MESOS-7258
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, scheduler api
>Reporter: Benjamin Mahler
>Assignee: Kapil Arya
>Priority: Major
>  Labels: multitenancy
>
> The current support for schedulers to subscribe to additional roles or 
> unsubscribe from some of their roles requires that the scheduler obtain a new 
> subscription with the master, which invalidates the event stream.
> A more lightweight mechanism would be to provide calls for the scheduler to 
> subscribe to additional roles or unsubscribe from some roles such that the 
> existing event stream remains open and offers to the new roles arrive on the 
> existing event stream. E.g.
> SUBSCRIBE_TO_ROLE
>  UNSUBSCRIBE_FROM_ROLE
> One open question pertains to the terminology here, whether we would want to 
> avoid using "subscribe" in this context. An alternative would be:
> UPDATE_FRAMEWORK_INFO
> Which provides a generic mechanism for a framework to perform framework info 
> updates without obtaining a new event stream.
> In addition, it would be easier to use if it returned 200 on success and an 
> error response if invalid, etc., rather than returning 202.
> *NOTE*: Not specific to this issue, but we need to figure out how to allow 
> the framework to not leak reservations, e.g. MESOS-7651.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-7258) Provide scheduler calls to subscribe to additional roles and unsubscribe from roles.

2018-01-24 Thread Benjamin Mahler (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-7258:
---
Description: 
The current support for schedulers to subscribe to additional roles or 
unsubscribe from some of their roles requires that the scheduler obtain a new 
subscription with the master, which invalidates the event stream.

A more lightweight mechanism would be to provide calls for the scheduler to 
subscribe to additional roles or unsubscribe from some roles such that the 
existing event stream remains open and offers to the new roles arrive on the 
existing event stream. E.g.

SUBSCRIBE_TO_ROLE
 UNSUBSCRIBE_FROM_ROLE

One open question pertains to the terminology here, whether we would want to 
avoid using "subscribe" in this context. An alternative would be:

UPDATE_FRAMEWORK_INFO

Which provides a generic mechanism for a framework to perform framework info 
updates without obtaining a new event stream.

In addition, it would be easier to use if it returned 200 on success and an 
error response if invalid, etc., rather than returning 202.

*NOTE*: Not specific to this issue, but we need to figure out how to allow the 
framework to not leak reservations, e.g. MESOS-7651.

  was:
The current support for schedulers to subscribe to additional roles or 
unsubscribe from some of their roles requires that the scheduler obtain a new 
subscription with the master, which invalidates the event stream.

A more lightweight mechanism would be to provide calls for the scheduler to 
subscribe to additional roles or unsubscribe from some roles such that the 
existing event stream remains open and offers to the new roles arrive on the 
existing event stream. E.g.

SUBSCRIBE_TO_ROLE
UNSUBSCRIBE_FROM_ROLE

One open question pertains to the terminology here, whether we would want to 
avoid using "subscribe" in this context. An alternative would be:

UPDATE_FRAMEWORK_INFO

Which provides a generic mechanism for a framework to perform framework info 
updates without obtaining a new event stream.

*NOTE*: Not specific to this issue, but we need to figure out how to allow the 
framework to not leak reservations, e.g. MESOS-7651.


> Provide scheduler calls to subscribe to additional roles and unsubscribe from 
> roles.
> 
>
> Key: MESOS-7258
> URL: https://issues.apache.org/jira/browse/MESOS-7258
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, scheduler api
>Reporter: Benjamin Mahler
>Priority: Major
>  Labels: multitenancy
>
> The current support for schedulers to subscribe to additional roles or 
> unsubscribe from some of their roles requires that the scheduler obtain a new 
> subscription with the master, which invalidates the event stream.
> A more lightweight mechanism would be to provide calls for the scheduler to 
> subscribe to additional roles or unsubscribe from some roles such that the 
> existing event stream remains open and offers to the new roles arrive on the 
> existing event stream. E.g.
> SUBSCRIBE_TO_ROLE
>  UNSUBSCRIBE_FROM_ROLE
> One open question pertains to the terminology here, whether we would want to 
> avoid using "subscribe" in this context. An alternative would be:
> UPDATE_FRAMEWORK_INFO
> Which provides a generic mechanism for a framework to perform framework info 
> updates without obtaining a new event stream.
> In addition, it would be easier to use if it returned 200 on success and an 
> error response if invalid, etc., rather than returning 202.
> *NOTE*: Not specific to this issue, but we need to figure out how to allow 
> the framework to not leak reservations, e.g. MESOS-7651.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8484) stout test NumifyTest.HexNumberTest fails.

2018-01-24 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-8484:
--
Shepherd: Till Toenshoff

> stout test NumifyTest.HexNumberTest fails. 
> ---
>
> Key: MESOS-8484
> URL: https://issues.apache.org/jira/browse/MESOS-8484
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: macOS 10.13.2 (17C88)
> Apple LLVM version 9.0.0 (clang-900.0.37)
> ../configure && make check -j6
>Reporter: Till Toenshoff
>Assignee: Benjamin Bannier
>Priority: Blocker
>
> The current Mesos master shows the following on my machine:
> {noformat}
> [ RUN  ] NumifyTest.HexNumberTest
> ../../../3rdparty/stout/tests/numify_tests.cpp:57: Failure
> Value of: numify("0x10.9").isError()
>   Actual: false
> Expected: true
> ../../../3rdparty/stout/tests/numify_tests.cpp:58: Failure
> Value of: numify("0x1p-5").isError()
>   Actual: false
> Expected: true
> [  FAILED  ] NumifyTest.HexNumberTest (0 ms)
> {noformat}
> This problem disappears for me when reverting the latest boost upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8484) stout test NumifyTest.HexNumberTest fails.

2018-01-24 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff reassigned MESOS-8484:
-

Assignee: Benjamin Bannier

> stout test NumifyTest.HexNumberTest fails. 
> ---
>
> Key: MESOS-8484
> URL: https://issues.apache.org/jira/browse/MESOS-8484
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: macOS 10.13.2 (17C88)
> Apple LLVM version 9.0.0 (clang-900.0.37)
> ../configure && make check -j6
>Reporter: Till Toenshoff
>Assignee: Benjamin Bannier
>Priority: Blocker
>
> The current Mesos master shows the following on my machine:
> {noformat}
> [ RUN  ] NumifyTest.HexNumberTest
> ../../../3rdparty/stout/tests/numify_tests.cpp:57: Failure
> Value of: numify("0x10.9").isError()
>   Actual: false
> Expected: true
> ../../../3rdparty/stout/tests/numify_tests.cpp:58: Failure
> Value of: numify("0x1p-5").isError()
>   Actual: false
> Expected: true
> [  FAILED  ] NumifyTest.HexNumberTest (0 ms)
> {noformat}
> This problem disappears for me when reverting the latest boost upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.

2018-01-24 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338114#comment-16338114
 ] 

Zhitao Li edited comment on MESOS-8480 at 1/24/18 7:39 PM:
---

Will this also be cherry-picked to 1.5.0, since the RC is not finalized yet?


was (Author: zhitao):
Will this also be backported to 1.5.0, since the RC is not finalized yet?

> Mesos returns high resource usage when killing a Docker task.
> -
>
> Key: MESOS-8480
> URL: https://issues.apache.org/jira/browse/MESOS-8480
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Major
> Fix For: 1.3.2, 1.4.2, 1.6.0, 1.5.1
>
> Attachments: test.cpp
>
>
> The way we get resource statistics for Docker tasks is through getting the 
> cgroup subsystem path through {{/proc//cgroup}} first (taking the 
> {{cpuacct}} subsystem as an example):
> {noformat}
> 9:cpuacct,cpu:/docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b
> {noformat}
> Then read 
> {{/sys/fs/cgroup/cpuacct//docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b/cpuacct.stat}}
>  to get the statistics:
> {noformat}
> user 4
> system 0
> {noformat}
> However, when a Docker container is being torn down, it seems that Docker 
> or the operating system will first move the process to the root cgroup before 
> actually killing it, making {{/proc//docker}} look like the following:
> {noformat}
> 9:cpuacct,cpu:/
> {noformat}
> This makes a racy call to 
> [{{cgroup::internal::cgroup()}}|https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1935]
>  return a single '/', which in turn makes 
> [{{DockerContainerizerProcess::cgroupsStatistics()}}|https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L1991]
>  read {{/sys/fs/cgroup/cpuacct///cpuacct.stat}}, which contains the 
> statistics for the root cgroup:
> {noformat}
> user 228058750
> system 24506461
> {noformat}
> This can be reproduced by [^test.cpp] with the following command:
> {noformat}
> $ docker run --name sleep -d --rm alpine sleep 1000; ./test $(docker inspect 
> sleep | jq .[].State.Pid) & sleep 1 && docker rm -f sleep
> ...
> Reading file '/proc/44224/cgroup'
> Reading file 
> '/sys/fs/cgroup/cpuacct//docker/1d79a6c877e2af3081630aa57d23d853e6bd7d210dad28f897556bfea20bc9c1/cpuacct.stat'
> user 4
> system 0
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
> user 228058750
> system 24506461
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
> user 228058750
> system 24506461
> Failed to open file '/proc/44224/cgroup'
> sleep
> [2]-  Exit 1  ./test $(docker inspect sleep | jq 
> .[].State.Pid)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8480) Mesos returns high resource usage when killing a Docker task.

2018-01-24 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338114#comment-16338114
 ] 

Zhitao Li commented on MESOS-8480:
--

Will this also be backported to 1.5.0, since the RC is not finalized yet?

> Mesos returns high resource usage when killing a Docker task.
> -
>
> Key: MESOS-8480
> URL: https://issues.apache.org/jira/browse/MESOS-8480
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Major
> Fix For: 1.3.2, 1.4.2, 1.6.0, 1.5.1
>
> Attachments: test.cpp
>
>
> The way we get resource statistics for Docker tasks is through getting the 
> cgroup subsystem path through {{/proc//cgroup}} first (taking the 
> {{cpuacct}} subsystem as an example):
> {noformat}
> 9:cpuacct,cpu:/docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b
> {noformat}
> Then read 
> {{/sys/fs/cgroup/cpuacct//docker/66fbe67b64ad3a86c6e080e18578bc9e540e55ee0bdcae09c2e131a4264a3a3b/cpuacct.stat}}
>  to get the statistics:
> {noformat}
> user 4
> system 0
> {noformat}
> However, when a Docker container is being torn down, it seems that Docker 
> or the operating system will first move the process to the root cgroup before 
> actually killing it, making {{/proc//docker}} look like the following:
> {noformat}
> 9:cpuacct,cpu:/
> {noformat}
> This makes a racy call to 
> [{{cgroup::internal::cgroup()}}|https://github.com/apache/mesos/blob/master/src/linux/cgroups.cpp#L1935]
>  return a single '/', which in turn makes 
> [{{DockerContainerizerProcess::cgroupsStatistics()}}|https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L1991]
>  read {{/sys/fs/cgroup/cpuacct///cpuacct.stat}}, which contains the 
> statistics for the root cgroup:
> {noformat}
> user 228058750
> system 24506461
> {noformat}
> This can be reproduced by [^test.cpp] with the following command:
> {noformat}
> $ docker run --name sleep -d --rm alpine sleep 1000; ./test $(docker inspect 
> sleep | jq .[].State.Pid) & sleep 1 && docker rm -f sleep
> ...
> Reading file '/proc/44224/cgroup'
> Reading file 
> '/sys/fs/cgroup/cpuacct//docker/1d79a6c877e2af3081630aa57d23d853e6bd7d210dad28f897556bfea20bc9c1/cpuacct.stat'
> user 4
> system 0
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
> user 228058750
> system 24506461
> Reading file '/proc/44224/cgroup'
> Reading file '/sys/fs/cgroup/cpuacct///cpuacct.stat'
> user 228058750
> system 24506461
> Failed to open file '/proc/44224/cgroup'
> sleep
> [2]-  Exit 1  ./test $(docker inspect sleep | jq 
> .[].State.Pid)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8469) Mesos master might drop some events in the operator API stream

2018-01-24 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338104#comment-16338104
 ] 

Greg Mann commented on MESOS-8469:
--

Related test reviews:
https://reviews.apache.org/r/65315/
https://reviews.apache.org/r/65316/

> Mesos master might drop some events in the operator API stream
> --
>
> Key: MESOS-8469
> URL: https://issues.apache.org/jira/browse/MESOS-8469
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Greg Mann
>Priority: Critical
> Fix For: 1.5.0
>
>
> Inside `Master::updateTask`, we call `Subscribers::send`, which asynchronously 
> calls `Subscribers::Subscriber::send` on each subscriber.
> But the problem is that inside `Subscribers::Subscriber::send` we are looking 
> up the state of the master (e.g., getting Task* and Framework*), which might 
> have changed between `Subscribers::send` and `Subscribers::Subscriber::send`.
>  
> For example, if a terminal task received an acknowledgement, the task might be 
> removed from the master's state, causing us to drop the TASK_UPDATED event.
>  
> We noticed this in an internal cluster, where a TASK_KILLED update was sent 
> to one subscriber but not the other.
>  
>  
>  
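
A minimal C++ sketch of the usual fix direction for this kind of race: capture 
the event payload by value at the point of the state change, so the deferred send 
cannot observe (or miss) later mutations of master state. This is illustrative 
only, not the actual master code.

{code}
#include <functional>
#include <iostream>
#include <queue>
#include <string>

// Stand-in for a TASK_UPDATED operator API event.
struct TaskUpdatedEvent
{
  std::string taskId;
  std::string state;
};

// Stand-in for the asynchronous dispatch to each subscriber.
std::queue<std::function<void()>> dispatchQueue;

void sendToSubscribers(const TaskUpdatedEvent& event)
{
  // Copy the event by value into the deferred closure so that later
  // mutations of master state (e.g. the task being removed upon a
  // terminal acknowledgement) cannot change or drop what subscribers see.
  dispatchQueue.push([event] {
    std::cout << "TASK_UPDATED " << event.taskId
              << " -> " << event.state << std::endl;
  });
}

int main()
{
  sendToSubscribers({"task-5602", "TASK_KILLED"});

  // ... the task may be removed from master state here ...

  while (!dispatchQueue.empty()) {
    dispatchQueue.front()();
    dispatchQueue.pop();
  }
  return 0;
}
{code}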



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-8475) Event-specific overloads for 'Master::Subscribers::Subscriber::send()'

2018-01-24 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338102#comment-16338102
 ] 

Greg Mann edited comment on MESOS-8475 at 1/24/18 7:33 PM:
---

NOTE that when this ticket is addressed, we will also need to update the 
related test {{MasterAPITest.EventAuthorizationDelayed}}, since it currently 
depends on each event causing 4 calls into the authorizer: 
https://reviews.apache.org/r/65316


was (Author: greggomann):
NOTE that when this ticket is addressed, we will also need to update the 
related test {{MasterAPITest.EventAuthorizationDelayed}}, since it currently 
depends on each event causing 4 calls into the authorizer.

> Event-specific overloads for 'Master::Subscribers::Subscriber::send()'
> --
>
> Key: MESOS-8475
> URL: https://issues.apache.org/jira/browse/MESOS-8475
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Greg Mann
>Priority: Major
>  Labels: authorization, mesosphere
>
> The code could be more efficient and more readable if we introduce 
> event-specific overloads for the {{Master::Subscribers::Subscriber::send()}} 
> method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8475) Event-specific overloads for 'Master::Subscribers::Subscriber::send()'

2018-01-24 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338102#comment-16338102
 ] 

Greg Mann commented on MESOS-8475:
--

NOTE that when this ticket is addressed, we will also need to update the 
related test {{MasterAPITest.EventAuthorizationDelayed}}, since it currently 
depends on each event causing 4 calls into the authorizer.

> Event-specific overloads for 'Master::Subscribers::Subscriber::send()'
> --
>
> Key: MESOS-8475
> URL: https://issues.apache.org/jira/browse/MESOS-8475
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Greg Mann
>Priority: Major
>  Labels: authorization, mesosphere
>
> The code could be more efficient and more readable if we introduce 
> event-specific overloads for the {{Master::Subscribers::Subscriber::send()}} 
> method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8485) MasterTest.RegistryGcByCount is flaky

2018-01-24 Thread Benno Evers (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benno Evers reassigned MESOS-8485:
--

Assignee: Benno Evers

> MasterTest.RegistryGcByCount is flaky
> -
>
> Key: MESOS-8485
> URL: https://issues.apache.org/jira/browse/MESOS-8485
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
>Reporter: Vinod Kone
>Assignee: Benno Evers
>Priority: Major
>  Labels: flaky-test
>
> Observed this while testing Mesos 1.5.0-rc1 in ASF CI.
>  
> {code}
> 3: [ RUN      ] MasterTest.RegistryGcByCount
> ..snip...
> 3: I0123 19:22:05.929347 15994 slave.cpp:1201] Detecting new master
> 3: I0123 19:22:05.931701 15988 slave.cpp:1228] Authenticating with master 
> master@172.17.0.2:45634
> 3: I0123 19:22:05.931838 15988 slave.cpp:1237] Using default CRAM-MD5 
> authenticatee
> 3: I0123 19:22:05.932153 15999 authenticatee.cpp:121] Creating new client 
> SASL connection
> 3: I0123 19:22:05.932580 15992 master.cpp:8958] Authenticating 
> slave(442)@172.17.0.2:45634
> 3: I0123 19:22:05.932822 15990 authenticator.cpp:414] Starting authentication 
> session for crammd5-authenticatee(870)@172.17.0.2:45634
> 3: I0123 19:22:05.933163 15989 authenticator.cpp:98] Creating new server SASL 
> connection
> 3: I0123 19:22:05.933465 16001 authenticatee.cpp:213] Received SASL 
> authentication mechanisms: CRAM-MD5
> 3: I0123 19:22:05.933495 16001 authenticatee.cpp:239] Attempting to 
> authenticate with mechanism 'CRAM-MD5'
> 3: I0123 19:22:05.933631 15987 authenticator.cpp:204] Received SASL 
> authentication start
> 3: I0123 19:22:05.933712 15987 authenticator.cpp:326] Authentication requires 
> more steps
> 3: I0123 19:22:05.933851 15987 authenticatee.cpp:259] Received SASL 
> authentication step
> 3: I0123 19:22:05.934006 15987 authenticator.cpp:232] Received SASL 
> authentication step
> 3: I0123 19:22:05.934041 15987 auxprop.cpp:109] Request to lookup properties 
> for user: 'test-principal' realm: '455912973e2c' server FQDN: '455912973e2c' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: false 
> 3: I0123 19:22:05.934095 15987 auxprop.cpp:181] Looking up auxiliary property 
> '*userPassword'
> 3: I0123 19:22:05.934147 15987 auxprop.cpp:181] Looking up auxiliary property 
> '*cmusaslsecretCRAM-MD5'
> 3: I0123 19:22:05.934279 15987 auxprop.cpp:109] Request to lookup properties 
> for user: 'test-principal' realm: '455912973e2c' server FQDN: '455912973e2c' 
> SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
> SASL_AUXPROP_AUTHZID: true 
> 3: I0123 19:22:05.934298 15987 auxprop.cpp:131] Skipping auxiliary property 
> '*userPassword' since SASL_AUXPROP_AUTHZID == true
> 3: I0123 19:22:05.934307 15987 auxprop.cpp:131] Skipping auxiliary property 
> '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> 3: I0123 19:22:05.934324 15987 authenticator.cpp:318] Authentication success
> 3: I0123 19:22:05.934463 15995 authenticatee.cpp:299] Authentication success
> 3: I0123 19:22:05.934563 16002 master.cpp:8988] Successfully authenticated 
> principal 'test-principal' at slave(442)@172.17.0.2:45634
> 3: I0123 19:22:05.934708 15993 authenticator.cpp:432] Authentication session 
> cleanup for crammd5-authenticatee(870)@172.17.0.2:45634
> 3: I0123 19:22:05.934891 15995 slave.cpp:1320] Successfully authenticated 
> with master master@172.17.0.2:45634
> 3: I0123 19:22:05.935261 15995 slave.cpp:1764] Will retry registration in 
> 2.234083ms if necessary
> 3: I0123 19:22:05.935436 15999 master.cpp:6061] Received register agent 
> message from slave(442)@172.17.0.2:45634 (455912973e2c)
> 3: I0123 19:22:05.935662 15999 master.cpp:3867] Authorizing agent with 
> principal 'test-principal'
> 3: I0123 19:22:05.936161 15992 master.cpp:6123] Authorized registration of 
> agent at slave(442)@172.17.0.2:45634 (455912973e2c)
> 3: I0123 19:22:05.936261 15992 master.cpp:6234] Registering agent at 
> slave(442)@172.17.0.2:45634 (455912973e2c) with id 
> eef8ea11-9247-44f3-84cf-340b24df3a52-S0
> 3: I0123 19:22:05.936993 15989 registrar.cpp:495] Applied 1 operations in 
> 227911ns; attempting to update the registry
> 3: I0123 19:22:05.937814 15989 registrar.cpp:552] Successfully updated the 
> registry in 743168ns
> 3: I0123 19:22:05.938057 15991 master.cpp:6282] Admitted agent 
> eef8ea11-9247-44f3-84cf-340b24df3a52-S0 at slave(442)@172.17.0.2:45634 
> (455912973e2c)
> 3: I0123 19:22:05.938891 15991 master.cpp:6331] Registered agent 
> eef8ea11-9247-44f3-84cf-340b24df3a52-S0 at slave(442)@172.17.0.2:45634 
> (455912973e2c) with cpus:2; mem:1024; disk:1024; ports:[31000-32000]
> 3: I0123 19:22:05.939159 16002 slave.cpp:1764] Will retry registration in 
> 26.332876ms if necessary

[jira] [Commented] (MESOS-6985) os::getenv() can segfault

2018-01-24 Thread Ilya Pronin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338076#comment-16338076
 ] 

Ilya Pronin commented on MESOS-6985:


[~vinodkone], sorry, I somehow missed the comment. I have a POC-like patch for 
this but didn't have time to finish it; I'll try to finish it, maybe next week. 
Feel free to reassign if somebody would like to work on it before then.

> os::getenv() can segfault
> -
>
> Key: MESOS-6985
> URL: https://issues.apache.org/jira/browse/MESOS-6985
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
> Environment: ASF CI, Ubuntu 14.04 and CentOS 7 both with and without 
> libevent/SSL
>Reporter: Greg Mann
>Assignee: Ilya Pronin
>Priority: Major
>  Labels: flaky-test, reliability, stout
> Attachments: 
> MasterMaintenanceTest.InverseOffersFilters-truncated.txt, 
> MasterTest.MultipleExecutors.txt
>
>
> This was observed on ASF CI. The segfault first showed up on CI on 9/20/16 
> and has been produced by the tests {{MasterTest.MultipleExecutors}} and 
> {{MasterMaintenanceTest.InverseOffersFilters}}. In both cases, 
> {{os::getenv()}} segfaults with the same stack trace:
> {code}
> *** Aborted at 1485241617 (unix time) try "date -d @1485241617" if you are 
> using GNU date ***
> PC: @ 0x2ad59e3ae82d (unknown)
> I0124 07:06:57.422080 28619 exec.cpp:162] Version: 1.2.0
> *** SIGSEGV (@0xf0) received by PID 28591 (TID 0x2ad5a7b87700) from PID 240; 
> stack trace: ***
> I0124 07:06:57.422336 28615 exec.cpp:212] Executor started at: 
> executor(75)@172.17.0.2:45752 with pid 28591
> @ 0x2ad5ab953197 (unknown)
> @ 0x2ad5ab957479 (unknown)
> @ 0x2ad59e165330 (unknown)
> @ 0x2ad59e3ae82d (unknown)
> @ 0x2ad594631358 os::getenv()
> @ 0x2ad59aba6acf mesos::internal::slave::executorEnvironment()
> @ 0x2ad59ab845c0 mesos::internal::slave::Framework::launchExecutor()
> @ 0x2ad59ab818a2 mesos::internal::slave::Slave::_run()
> @ 0x2ad59ac1ec10 
> _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_
> @ 0x2ad59ac1e6bf 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2ad59bce2304 std::function<>::operator()()
> @ 0x2ad59bcc9824 process::ProcessBase::visit()
> @ 0x2ad59bd4028e process::DispatchEvent::visit()
> @ 0x2ad594616df1 process::ProcessBase::serve()
> @ 0x2ad59bcc72b7 process::ProcessManager::resume()
> @ 0x2ad59bcd567c 
> process::ProcessManager::init_threads()::$_2::operator()()
> @ 0x2ad59bcd5585 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_2vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x2ad59bcd std::_Bind_simple<>::operator()()
> @ 0x2ad59bcd552c std::thread::_Impl<>::_M_run()
> @ 0x2ad59d9e6a60 (unknown)
> @ 0x2ad59e15d184 start_thread
> @ 0x2ad59e46d37d (unknown)
> make[4]: *** [check-local] Segmentation fault
> {code}
> Find attached the full log from a failed run of 
> {{MasterTest.MultipleExecutors}} and a truncated log from a failed run of 
> {{MasterMaintenanceTest.InverseOffersFilters}}.
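
For context, a minimal C++ sketch of one common mitigation: {{::getenv()}} can 
race with concurrent {{setenv()}}/{{unsetenv()}} calls that reallocate 
{{environ}}, so all environment access is funneled through a single lock and the 
result is copied while the lock is held. This sketches the general technique 
only, not the POC patch mentioned above.

{code}
#include <cstdlib>
#include <mutex>
#include <string>

namespace safe_env {

// Single lock serializing all environment reads and writes.
std::mutex mutex;

// Returns true and stores a copy of the value if `name` is set.
bool get(const std::string& name, std::string* value)
{
  std::lock_guard<std::mutex> lock(mutex);
  const char* result = ::getenv(name.c_str());
  if (result == nullptr) {
    return false;
  }
  *value = result;  // Copy while still holding the lock.
  return true;
}

int set(const std::string& name, const std::string& value)
{
  std::lock_guard<std::mutex> lock(mutex);
  return ::setenv(name.c_str(), value.c_str(), 1);
}

} // namespace safe_env

int main()
{
  std::string home;
  return safe_env::get("HOME", &home) ? 0 : 1;
}
{code}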



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8485) MasterTest.RegistryGcByCount is flaky

2018-01-24 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-8485:
-

 Summary: MasterTest.RegistryGcByCount is flaky
 Key: MESOS-8485
 URL: https://issues.apache.org/jira/browse/MESOS-8485
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
Reporter: Vinod Kone


Observed this while testing Mesos 1.5.0-rc1 in ASF CI.

 

{code}

3: [ RUN      ] MasterTest.RegistryGcByCount

..snip...

3: I0123 19:22:05.929347 15994 slave.cpp:1201] Detecting new master

3: I0123 19:22:05.931701 15988 slave.cpp:1228] Authenticating with master 
master@172.17.0.2:45634

3: I0123 19:22:05.931838 15988 slave.cpp:1237] Using default CRAM-MD5 
authenticatee

3: I0123 19:22:05.932153 15999 authenticatee.cpp:121] Creating new client SASL 
connection

3: I0123 19:22:05.932580 15992 master.cpp:8958] Authenticating 
slave(442)@172.17.0.2:45634

3: I0123 19:22:05.932822 15990 authenticator.cpp:414] Starting authentication 
session for crammd5-authenticatee(870)@172.17.0.2:45634

3: I0123 19:22:05.933163 15989 authenticator.cpp:98] Creating new server SASL 
connection

3: I0123 19:22:05.933465 16001 authenticatee.cpp:213] Received SASL 
authentication mechanisms: CRAM-MD5

3: I0123 19:22:05.933495 16001 authenticatee.cpp:239] Attempting to 
authenticate with mechanism 'CRAM-MD5'

3: I0123 19:22:05.933631 15987 authenticator.cpp:204] Received SASL 
authentication start

3: I0123 19:22:05.933712 15987 authenticator.cpp:326] Authentication requires 
more steps

3: I0123 19:22:05.933851 15987 authenticatee.cpp:259] Received SASL 
authentication step

3: I0123 19:22:05.934006 15987 authenticator.cpp:232] Received SASL 
authentication step

3: I0123 19:22:05.934041 15987 auxprop.cpp:109] Request to lookup properties 
for user: 'test-principal' realm: '455912973e2c' server FQDN: '455912973e2c' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: false 

3: I0123 19:22:05.934095 15987 auxprop.cpp:181] Looking up auxiliary property 
'*userPassword'

3: I0123 19:22:05.934147 15987 auxprop.cpp:181] Looking up auxiliary property 
'*cmusaslsecretCRAM-MD5'

3: I0123 19:22:05.934279 15987 auxprop.cpp:109] Request to lookup properties 
for user: 'test-principal' realm: '455912973e2c' server FQDN: '455912973e2c' 
SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false 
SASL_AUXPROP_AUTHZID: true 

3: I0123 19:22:05.934298 15987 auxprop.cpp:131] Skipping auxiliary property 
'*userPassword' since SASL_AUXPROP_AUTHZID == true

3: I0123 19:22:05.934307 15987 auxprop.cpp:131] Skipping auxiliary property 
'*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true

3: I0123 19:22:05.934324 15987 authenticator.cpp:318] Authentication success

3: I0123 19:22:05.934463 15995 authenticatee.cpp:299] Authentication success

3: I0123 19:22:05.934563 16002 master.cpp:8988] Successfully authenticated 
principal 'test-principal' at slave(442)@172.17.0.2:45634

3: I0123 19:22:05.934708 15993 authenticator.cpp:432] Authentication session 
cleanup for crammd5-authenticatee(870)@172.17.0.2:45634

3: I0123 19:22:05.934891 15995 slave.cpp:1320] Successfully authenticated with 
master master@172.17.0.2:45634

3: I0123 19:22:05.935261 15995 slave.cpp:1764] Will retry registration in 
2.234083ms if necessary

3: I0123 19:22:05.935436 15999 master.cpp:6061] Received register agent message 
from slave(442)@172.17.0.2:45634 (455912973e2c)

3: I0123 19:22:05.935662 15999 master.cpp:3867] Authorizing agent with 
principal 'test-principal'

3: I0123 19:22:05.936161 15992 master.cpp:6123] Authorized registration of 
agent at slave(442)@172.17.0.2:45634 (455912973e2c)

3: I0123 19:22:05.936261 15992 master.cpp:6234] Registering agent at 
slave(442)@172.17.0.2:45634 (455912973e2c) with id 
eef8ea11-9247-44f3-84cf-340b24df3a52-S0

3: I0123 19:22:05.936993 15989 registrar.cpp:495] Applied 1 operations in 
227911ns; attempting to update the registry

3: I0123 19:22:05.937814 15989 registrar.cpp:552] Successfully updated the 
registry in 743168ns

3: I0123 19:22:05.938057 15991 master.cpp:6282] Admitted agent 
eef8ea11-9247-44f3-84cf-340b24df3a52-S0 at slave(442)@172.17.0.2:45634 
(455912973e2c)

3: I0123 19:22:05.938891 15991 master.cpp:6331] Registered agent 
eef8ea11-9247-44f3-84cf-340b24df3a52-S0 at slave(442)@172.17.0.2:45634 
(455912973e2c) with cpus:2; mem:1024; disk:1024; ports:[31000-32000]

3: I0123 19:22:05.939159 16002 slave.cpp:1764] Will retry registration in 
26.332876ms if necessary

3: I0123 19:22:05.939349 15994 master.cpp:6061] Received register agent message 
from slave(442)@172.17.0.2:45634 (455912973e2c)

3: I0123 19:22:05.939347 15998 hierarchical.cpp:574] Added agent 
eef8ea11-9247-44f3-84cf-340b24df3a52-S0 (455912973e2c) with cpus:2; mem:1024; 
disk:1024; ports:[31000-32000] (allocated: {})

3: I0123 19:22:05.939574 15994 master.cpp:3867] Authorizing agent with 

[jira] [Commented] (MESOS-8484) stout test NumifyTest.HexNumberTest fails.

2018-01-24 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338072#comment-16338072
 ] 

Till Toenshoff commented on MESOS-8484:
---

commits tried:
- current head (cd2774efde5e55cc027721086af14fbc78688849) -> fails
- e91ce42ed56c5ab65220fbba740a8a50c7f835ae -> works

> stout test NumifyTest.HexNumberTest fails. 
> ---
>
> Key: MESOS-8484
> URL: https://issues.apache.org/jira/browse/MESOS-8484
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: macOS 10.13.2 (17C88)
> Apple LLVM version 9.0.0 (clang-900.0.37)
> ../configure && make check -j6
>Reporter: Till Toenshoff
>Priority: Blocker
>
> The current Mesos master shows the following on my machine:
> {noformat}
> [ RUN  ] NumifyTest.HexNumberTest
> ../../../3rdparty/stout/tests/numify_tests.cpp:57: Failure
> Value of: numify("0x10.9").isError()
>   Actual: false
> Expected: true
> ../../../3rdparty/stout/tests/numify_tests.cpp:58: Failure
> Value of: numify("0x1p-5").isError()
>   Actual: false
> Expected: true
> [  FAILED  ] NumifyTest.HexNumberTest (0 ms)
> {noformat}
> This problem disappears for me when reverting the latest boost upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8484) stout test NumifyTest.HexNumberTest fails.

2018-01-24 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-8484:
--
Environment: 
macOS 10.13.2 (17C88)
Apple LLVM version 9.0.0 (clang-900.0.37)

../configure && make check -j6

> stout test NumifyTest.HexNumberTest fails. 
> ---
>
> Key: MESOS-8484
> URL: https://issues.apache.org/jira/browse/MESOS-8484
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: macOS 10.13.2 (17C88)
> Apple LLVM version 9.0.0 (clang-900.0.37)
> ../configure && make check -j6
>Reporter: Till Toenshoff
>Priority: Blocker
>
> The current Mesos master shows the following on my machine:
> {noformat}
> [ RUN  ] NumifyTest.HexNumberTest
> ../../../3rdparty/stout/tests/numify_tests.cpp:57: Failure
> Value of: numify("0x10.9").isError()
>   Actual: false
> Expected: true
> ../../../3rdparty/stout/tests/numify_tests.cpp:58: Failure
> Value of: numify("0x1p-5").isError()
>   Actual: false
> Expected: true
> [  FAILED  ] NumifyTest.HexNumberTest (0 ms)
> {noformat}
> This problem disappears for me when reverting the latest boost upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-8483) ExampleTests PythonFramework fails with sigabort.

2018-01-24 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338061#comment-16338061
 ] 

Benjamin Bannier commented on MESOS-8483:
-

This fails for me in a different way,
{noformat}
% ./examples/python/test-framework local

[libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
exists in database: mesos/mesos.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:1394] CHECK failed: 
generated_database_->Add(encoded_file_descriptor, size):
libc++abi.dylib: terminating with uncaught exception of type 
google::protobuf::FatalException: CHECK failed: 
generated_database_->Add(encoded_file_descriptor, size):
[1]  57083 abort      ./examples/python/test-framework local
{noformat}

I am using python-2.7.14 and unbundled protobuf-3.5.1 from homebrew.

> ExampleTests PythonFramework fails with sigabort.
> -
>
> Key: MESOS-8483
> URL: https://issues.apache.org/jira/browse/MESOS-8483
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.5.0
> Environment: macOS 10.13.2 (17C88)
> Python 2.7.10 (Apple's default - not homebrew)
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Starting the {{PythonFramework}} manually results in a sigabort:
> {noformat}
> $ ./src/examples/python/test-framework local
> [..]
> I0124 15:22:46.637238 65925120 master.cpp:563] Using default 'crammd5' 
> authenticator
> W0124 15:22:46.637269 65925120 authenticator.cpp:513] No credentials 
> provided, authentication requests will be refused
> I0124 15:22:46.637284 65925120 authenticator.cpp:520] Initializing server SASL
> I0124 15:22:46.659503 2385417024 resolver.cpp:69] Creating default secret 
> resolver
> I0124 15:22:46.659624 2385417024 containerizer.cpp:304] Using isolation { 
> environment_secret, filesystem/posix, posix/mem, posix/cpu }
> I0124 15:22:46.659951 2385417024 provisioner.cpp:299] Using default backend 
> 'copy'
> I0124 15:22:46.661628 67534848 slave.cpp:262] Mesos agent started on 
> (1)@192.168.178.20:49682
> I0124 15:22:46.661669 67534848 slave.cpp:263] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/store/appc"
>  --authenticate_http_executors="false" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --container_disk_watch_interval="15secs" --containerizers="mesos" 
> --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
> --docker_stop_timeout="0ns" 
> --docker_store_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/store/docker"
>  --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_reregistration_timeout="2secs" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/fetch"
>  --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
> --gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
> --hostname_lookup="true" --http_command_executor="false" 
> --http_heartbeat_interval="30secs" --initialize_driver_logging="true" 
> --isolation="posix/cpu,posix/mem" --launcher="posix" 
> --launcher_dir="/usr/local/libexec/mesos" --logbufsecs="0" 
> --logging_level="INFO" --max_completed_executors_per_framework="150" 
> --oversubscribed_resources_interval="15secs" --port="5051" 
> --qos_correction_interval_min="0ns" --quiet="false" 
> --reconfiguration_policy="equal" --recover="reconnect" 
> --recovery_timeout="15mins" --registration_backoff_factor="1secs" 
> --runtime_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/run"
>  --sandbox_directory="/mnt/mesos/sandbox" --strict="true" 
> --switch_user="true" --version="false" 
> --work_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/work"
>  --zk_session_timeout="10secs"
> python(1780,0x74068000) malloc: *** error for object 0x106ac07c8: pointer 
> being freed was not allocated
> *** set a breakpoint in malloc_error_break to debug
> {noformat}
> When running the {{PythonFramework}} via lldb, I get the following stacktrace:
> {noformat}
> * thread #7, stop reason = signal SIGABRT
>   * frame #0: 0x7fff55321e3e libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x7fff55460150 libsystem_pthread.dylib`pthread_kill + 333
> frame #2: 0x7fff5527e312 libsystem_c.dylib`abort + 127
> frame #3: 0x7fff5537b866 libsystem_malloc.dylib`free + 521
> 

[jira] [Updated] (MESOS-8481) Agent reboot during checkpointing may result in empty checkpoints.

2018-01-24 Thread Michael Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Park updated MESOS-8481:

Priority: Blocker  (was: Major)
Target Version/s: 1.5.0

> Agent reboot during checkpointing may result in empty checkpoints.
> --
>
> Key: MESOS-8481
> URL: https://issues.apache.org/jira/browse/MESOS-8481
> Project: Mesos
>  Issue Type: Bug
>Reporter: Chun-Hung Hsiao
>Assignee: Michael Park
>Priority: Blocker
>
> An empty checkpoint file was created due to the following incident.
> At 17:12:25, the master assigned a task to an agent:
> {noformat}
> I0123 17:12:25.00 18618 master.cpp:11457] Adding task 5602 with resources 
> cpus(allocated: *):0.1; mem(allocated: *):128 on agent 
> aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4 at slave(1)@:5051 
> ()
> I0123 17:12:25.00 18618 master.cpp:5017] Launching task 5602 of framework 
> 6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112 (Balloon Framework OOM) at 
> scheduler-fbba22f7-ebbc-4864-8394-0aa558f8ffaa@:10015 with resources 
> [...] on agent aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4 at 
> slave(1)@:5051 ()
> {noformat}
> Meanwhile, the agent is being rebooted:
> {noformat}
> $ last reboot
> reboot   system boot  3.10.0-693.11.6. Tue Jan 23 17:14 - 00:09  (06:55)
> {noformat}
> The agent log did not show any information about the task, possibly because 
> there was no fsync before reboot:
> {noformat}
> I0123 17:12:09.00 17237 http.cpp:851] Authorizing principal 
> 'dcos_checks_agent' to GET the endpoint '/metrics/snapshot'
> -- Reboot --
> I0123 17:15:40.00  2689 logsink.cpp:89] Added FileSink for glog logs to: 
> /var/log/mesos/mesos-agent.log
> {noformat}
> However, the agent was checkpointing the task before reboot:
> {noformat}
> $ sudo stat 
> /var/lib/mesos/slave/meta/slaves/aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4/frameworks/6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112/executors/5602/
>   File: 
> ‘/var/lib/mesos/slave/meta/slaves/aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4/frameworks/6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112/executors/5602/’
> Size: 39        Blocks: 0          IO Block: 4096   directory
> Device: ca40h/51776d    Inode: 67306254    Links: 3
> Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)
> Context: system_u:object_r:unlabeled_t:s0
> Access: 2018-01-24 00:23:43.237322609 +
> Modify: 2018-01-23 17:12:25.751463030 +
> Change: 2018-01-23 17:12:25.751463030 +
>  Birth: -
> {noformat}
> And since there was no fsync before reboot, all checkpoints resulted in empty 
> files:
> {noformat}
> $ sudo stat 
> /var/lib/mesos/slave/meta/slaves/aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4/frameworks/6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112/framework.info
>   File: 
> ‘/var/lib/mesos/slave/meta/slaves/aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4/frameworks/6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112/framework.info’
>   Size: 0         Blocks: 0          IO Block: 4096   regular empty file
> Device: ca40h/51776d    Inode: 33967500    Links: 1
> Access: (0600/-rw-------)  Uid: (0/root)   Gid: (0/root)
> Context: system_u:object_r:unlabeled_t:s0
> Access: 2018-01-23 17:15:41.485506070 +0000
> Modify: 2018-01-23 17:12:25.749463047 +0000
> Change: 2018-01-23 17:12:25.749463047 +0000
>  Birth: -
> $ sudo stat 
> /var/lib/mesos/slave/meta/slaves/aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4/frameworks/6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112/framework.pid
>   File: 
> ‘/var/lib/mesos/slave/meta/slaves/aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4/frameworks/6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112/framework.pid’
>   Size: 0         Blocks: 0          IO Block: 4096   regular empty file
> Device: ca40h/51776d    Inode: 33967495    Links: 1
> Access: (0600/-rw-------)  Uid: (0/root)   Gid: (0/root)
> Context: system_u:object_r:unlabeled_t:s0
> Access: 2018-01-23 23:00:42.190975780 +0000
> Modify: 2018-01-23 17:12:25.749463047 +0000
> Change: 2018-01-23 17:12:25.749463047 +0000
>  Birth: -
> $ sudo stat 
> /var/lib/mesos/slave/meta/slaves/aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4/frameworks/6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112/executors/5602/executor.info
>   File: 
> ‘/var/lib/mesos/slave/meta/slaves/aaf0a62f-a6eb-4c1d-80db-5fdd26fe8008-S4/frameworks/6f9b0688-38f7-4b38-bb1c-421f55e486e5-0112/executors/5602/executor.info’
>   Size: 0         Blocks: 0          IO Block: 4096   regular empty file
> Device: ca40h/51776d    Inode: 67306255    Links: 1
> Access: (0600/-rw-------)  Uid: (0/root)   Gid: (0/root)
> Context: system_u:object_r:unlabeled_t:s0
> Access: 2018-01-23 17:12:25.751463030 +0000
> Modify: 2018-01-23 17:12:25.751463030 +0000
> Change: 2018-01-23 17:12:25.751463030 +0000
>  Birth: -
> {noformat}
> So were 
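> For illustration only, a minimal sketch of the write-to-temp/fsync/rename 
> pattern that closes this window. The {{checkpoint}} helper below is 
> hypothetical and is not the actual Mesos checkpointing code:
> {code}
> #include <fcntl.h>
> #include <unistd.h>
>
> #include <cstdio>
> #include <string>
>
> // Returns 0 on success, -1 on failure.
> int checkpoint(const std::string& path, const std::string& data)
> {
>   const std::string temp = path + ".tmp";
>
>   int fd = ::open(temp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
>   if (fd < 0) {
>     return -1;
>   }
>
>   if (::write(fd, data.data(), data.size()) !=
>       static_cast<ssize_t>(data.size())) {
>     ::close(fd);
>     return -1;
>   }
>
>   // Flush the data to disk *before* the rename, so a reboot can never
>   // expose a non-empty path whose contents were not yet written.
>   if (::fsync(fd) != 0) {
>     ::close(fd);
>     return -1;
>   }
>
>   ::close(fd);
>
>   // rename(2) is atomic: readers see either the old checkpoint or the
>   // fully synced new one, never a partially written file.
>   return ::rename(temp.c_str(), path.c_str());
> }
> {code}
> (For full durability, the containing directory also needs an fsync after 
> the rename so the directory entry itself survives the reboot.)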

[jira] [Updated] (MESOS-8484) stout test NumifyTest.HexNumberTest fails.

2018-01-24 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-8484:

Affects Version/s: 1.6.0

> stout test NumifyTest.HexNumberTest fails. 
> ---
>
> Key: MESOS-8484
> URL: https://issues.apache.org/jira/browse/MESOS-8484
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Till Toenshoff
>Priority: Blocker
>
> The current Mesos master shows the following on my machine:
> {noformat}
> [ RUN  ] NumifyTest.HexNumberTest
> ../../../3rdparty/stout/tests/numify_tests.cpp:57: Failure
> Value of: numify("0x10.9").isError()
>   Actual: false
> Expected: true
> ../../../3rdparty/stout/tests/numify_tests.cpp:58: Failure
> Value of: numify("0x1p-5").isError()
>   Actual: false
> Expected: true
> [  FAILED  ] NumifyTest.HexNumberTest (0 ms)
> {noformat}
> This problem disappears for me when reverting the latest boost upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8484) stout test NumifyTest.HexNumberTest fails.

2018-01-24 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-8484:
-

 Summary: stout test NumifyTest.HexNumberTest fails. 
 Key: MESOS-8484
 URL: https://issues.apache.org/jira/browse/MESOS-8484
 Project: Mesos
  Issue Type: Bug
Reporter: Till Toenshoff


The current Mesos master shows the following on my machine:

{noformat}
[ RUN  ] NumifyTest.HexNumberTest
../../../3rdparty/stout/tests/numify_tests.cpp:57: Failure
Value of: numify("0x10.9").isError()
  Actual: false
Expected: true
../../../3rdparty/stout/tests/numify_tests.cpp:58: Failure
Value of: numify("0x1p-5").isError()
  Actual: false
Expected: true
[  FAILED  ] NumifyTest.HexNumberTest (0 ms)
{noformat}

This problem disappears for me when reverting the latest boost upgrade.
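
For illustration, a sketch of the kind of guard that could restore the 
expected behavior, assuming the intent is to reject C99 hexadecimal floating 
point syntax up front instead of relying on the (now more permissive) lexical 
cast. The {{isHexFloat}} helper is hypothetical and is not stout's 
implementation:

{code}
#include <string>

// Detect C99 hexadecimal floating point literals such as "0x10.9" or
// "0x1p-5", which a strict numify should reject even if the underlying
// parser (e.g. boost) happens to accept them.
bool isHexFloat(const std::string& s)
{
  // Skip an optional leading sign.
  const size_t i = (!s.empty() && (s[0] == '+' || s[0] == '-')) ? 1 : 0;

  // Only hex-prefixed strings are of interest.
  if (s.compare(i, 2, "0x") != 0 && s.compare(i, 2, "0X") != 0) {
    return false;
  }

  // A hex fraction ('.') or a binary exponent ('p'/'P') after the
  // prefix marks a hex float.
  return s.find_first_of(".pP", i + 2) != std::string::npos;
}
{code}

A numify that checks this predicate first can return an error for such inputs 
regardless of how permissive the underlying parser is.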



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2018-01-24 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3160:
-
Story Points: 3
  Sprint: Mesosphere Sprint 73

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
> Environment: Ubuntu 14.04
> CentOS 7
>Reporter: Paul Brett
>Assignee: Greg Mann
>Priority: Major
>  Labels: cgroups, flaky-test, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-6985) os::getenv() can segfault

2018-01-24 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16337966#comment-16337966
 ] 

Vinod Kone commented on MESOS-6985:
---

Should we re-assign this to someone else [~ipronin]?

> os::getenv() can segfault
> -
>
> Key: MESOS-6985
> URL: https://issues.apache.org/jira/browse/MESOS-6985
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
> Environment: ASF CI, Ubuntu 14.04 and CentOS 7 both with and without 
> libevent/SSL
>Reporter: Greg Mann
>Assignee: Ilya Pronin
>Priority: Major
>  Labels: flaky-test, reliability, stout
> Attachments: 
> MasterMaintenanceTest.InverseOffersFilters-truncated.txt, 
> MasterTest.MultipleExecutors.txt
>
>
> This was observed on ASF CI. The segfault first showed up on CI on 9/20/16 
> and has been produced by the tests {{MasterTest.MultipleExecutors}} and 
> {{MasterMaintenanceTest.InverseOffersFilters}}. In both cases, 
> {{os::getenv()}} segfaults with the same stack trace:
> {code}
> *** Aborted at 1485241617 (unix time) try "date -d @1485241617" if you are 
> using GNU date ***
> PC: @ 0x2ad59e3ae82d (unknown)
> I0124 07:06:57.422080 28619 exec.cpp:162] Version: 1.2.0
> *** SIGSEGV (@0xf0) received by PID 28591 (TID 0x2ad5a7b87700) from PID 240; 
> stack trace: ***
> I0124 07:06:57.422336 28615 exec.cpp:212] Executor started at: 
> executor(75)@172.17.0.2:45752 with pid 28591
> @ 0x2ad5ab953197 (unknown)
> @ 0x2ad5ab957479 (unknown)
> @ 0x2ad59e165330 (unknown)
> @ 0x2ad59e3ae82d (unknown)
> @ 0x2ad594631358 os::getenv()
> @ 0x2ad59aba6acf mesos::internal::slave::executorEnvironment()
> @ 0x2ad59ab845c0 mesos::internal::slave::Framework::launchExecutor()
> @ 0x2ad59ab818a2 mesos::internal::slave::Slave::_run()
> @ 0x2ad59ac1ec10 
> _ZZN7process8dispatchIN5mesos8internal5slave5SlaveERKNS_6FutureIbEERKNS1_13FrameworkInfoERKNS1_12ExecutorInfoERK6OptionINS1_8TaskInfoEERKSF_INS1_13TaskGroupInfoEES6_S9_SC_SH_SL_EEvRKNS_3PIDIT_EEMSP_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_ENKUlPNS_11ProcessBaseEE_clES16_
> @ 0x2ad59ac1e6bf 
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal5slave5SlaveERKNS0_6FutureIbEERKNS5_13FrameworkInfoERKNS5_12ExecutorInfoERK6OptionINS5_8TaskInfoEERKSJ_INS5_13TaskGroupInfoEESA_SD_SG_SL_SP_EEvRKNS0_3PIDIT_EEMST_FvT0_T1_T2_T3_T4_ET5_T6_T7_T8_T9_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x2ad59bce2304 std::function<>::operator()()
> @ 0x2ad59bcc9824 process::ProcessBase::visit()
> @ 0x2ad59bd4028e process::DispatchEvent::visit()
> @ 0x2ad594616df1 process::ProcessBase::serve()
> @ 0x2ad59bcc72b7 process::ProcessManager::resume()
> @ 0x2ad59bcd567c 
> process::ProcessManager::init_threads()::$_2::operator()()
> @ 0x2ad59bcd5585 
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_2vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x2ad59bcd std::_Bind_simple<>::operator()()
> @ 0x2ad59bcd552c std::thread::_Impl<>::_M_run()
> @ 0x2ad59d9e6a60 (unknown)
> @ 0x2ad59e15d184 start_thread
> @ 0x2ad59e46d37d (unknown)
> make[4]: *** [check-local] Segmentation fault
> {code}
> Find attached the full log from a failed run of 
> {{MasterTest.MultipleExecutors}} and a truncated log from a failed run of 
> {{MasterMaintenanceTest.InverseOffersFilters}}.
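> One plausible failure mode (unconfirmed here): {{::getenv()}} returns a 
> pointer into the process environment, so a concurrent {{setenv()}} or 
> {{unsetenv()}} from another thread can invalidate that pointer while 
> {{os::getenv()}} is copying it into a {{std::string}}. A minimal sketch of 
> one possible mitigation, with hypothetical helper names (this is not 
> stout's actual code):
> {code}
> #include <cstdlib>
> #include <mutex>
> #include <string>
>
> // One process-wide lock serializing all environment access.
> static std::mutex& envLock()
> {
>   static std::mutex m;
>   return m;
> }
>
> // Copy the value while still holding the lock, so no concurrent
> // setenv() can invalidate the buffer mid-copy.
> bool getenvSafe(const std::string& key, std::string* value)
> {
>   std::lock_guard<std::mutex> guard(envLock());
>
>   const char* result = ::getenv(key.c_str());
>   if (result == nullptr) {
>     return false;
>   }
>
>   *value = result;
>   return true;
> }
>
> // Writers must take the same lock for the scheme to work.
> bool setenvSafe(const std::string& key, const std::string& value)
> {
>   std::lock_guard<std::mutex> guard(envLock());
>   return ::setenv(key.c_str(), value.c_str(), 1) == 0;
> }
> {code}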



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (MESOS-8434) Cleanup Authorization logic in master and agent

2018-01-24 Thread Alexander Rojas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rojas reassigned MESOS-8434:
--

Assignee: Alexander Rojas

> Cleanup Authorization logic in master and agent
> ---
>
> Key: MESOS-8434
> URL: https://issues.apache.org/jira/browse/MESOS-8434
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Affects Versions: 1.4.1
>Reporter: Alexander Rojas
>Assignee: Alexander Rojas
>Priority: Major
>  Labels: mesosphere, security
>
> During MesosCon EU 2017, [~benjaminhindman] came up with a neat abstraction 
> called [{{ObjectApprovers}}|https://reviews.apache.org/r/63258/] which goes a 
> long way toward streamlining and unifying the authorization logic used within 
> Mesos. However, the patches became stale afterwards.
> Given the benefits of this approach, we should make the effort to land these 
> patches.
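> To make the pattern concrete, here is a conceptual sketch using hypothetical 
> stand-in types rather than the real {{mesos::authorization}} API (see the 
> linked review for the actual abstraction): approvers are fetched once per 
> request and then consulted synchronously per object, instead of issuing one 
> authorizer round trip per object.
> {code}
> #include <map>
> #include <memory>
> #include <string>
> #include <utility>
>
> // Hypothetical stand-ins for the real authorization types.
> enum class Action { VIEW_TASK, VIEW_FRAMEWORK };
>
> struct Object { std::string value; };
>
> struct ObjectApprover
> {
>   // Mesos returns a Try<bool>; a plain bool keeps the sketch short.
>   virtual bool approved(const Object& object) const = 0;
>   virtual ~ObjectApprover() = default;
> };
>
> // The core of the pattern: one container holding an approver per
> // action, created once per request, then queried cheaply per object.
> class ObjectApprovers
> {
> public:
>   void put(Action action, std::shared_ptr<ObjectApprover> approver)
>   {
>     approvers[action] = std::move(approver);
>   }
>
>   bool approved(Action action, const Object& object) const
>   {
>     auto it = approvers.find(action);
>     return it != approvers.end() && it->second->approved(object);
>   }
>
> private:
>   std::map<Action, std::shared_ptr<ObjectApprover>> approvers;
> };
> {code}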



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8483) ExampleTests PythonFramework fails with sigabort.

2018-01-24 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-8483:
--
Affects Version/s: 1.5.0

> ExampleTests PythonFramework fails with sigabort.
> -
>
> Key: MESOS-8483
> URL: https://issues.apache.org/jira/browse/MESOS-8483
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.5.0
> Environment: macOS 10.13.2 (17C88)
> Python 2.7.10 (Apple's default - not homebrew)
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Starting the {{PythonFramework}} manually results in a sigabort:
> {noformat}
> $ ./src/examples/python/test-framework local
> [..]
> I0124 15:22:46.637238 65925120 master.cpp:563] Using default 'crammd5' 
> authenticator
> W0124 15:22:46.637269 65925120 authenticator.cpp:513] No credentials 
> provided, authentication requests will be refused
> I0124 15:22:46.637284 65925120 authenticator.cpp:520] Initializing server SASL
> I0124 15:22:46.659503 2385417024 resolver.cpp:69] Creating default secret 
> resolver
> I0124 15:22:46.659624 2385417024 containerizer.cpp:304] Using isolation { 
> environment_secret, filesystem/posix, posix/mem, posix/cpu }
> I0124 15:22:46.659951 2385417024 provisioner.cpp:299] Using default backend 
> 'copy'
> I0124 15:22:46.661628 67534848 slave.cpp:262] Mesos agent started on 
> (1)@192.168.178.20:49682
> I0124 15:22:46.661669 67534848 slave.cpp:263] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/store/appc"
>  --authenticate_http_executors="false" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --container_disk_watch_interval="15secs" --containerizers="mesos" 
> --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
> --docker_stop_timeout="0ns" 
> --docker_store_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/store/docker"
>  --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_reregistration_timeout="2secs" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/fetch"
>  --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
> --gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
> --hostname_lookup="true" --http_command_executor="false" 
> --http_heartbeat_interval="30secs" --initialize_driver_logging="true" 
> --isolation="posix/cpu,posix/mem" --launcher="posix" 
> --launcher_dir="/usr/local/libexec/mesos" --logbufsecs="0" 
> --logging_level="INFO" --max_completed_executors_per_framework="150" 
> --oversubscribed_resources_interval="15secs" --port="5051" 
> --qos_correction_interval_min="0ns" --quiet="false" 
> --reconfiguration_policy="equal" --recover="reconnect" 
> --recovery_timeout="15mins" --registration_backoff_factor="1secs" 
> --runtime_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/run"
>  --sandbox_directory="/mnt/mesos/sandbox" --strict="true" 
> --switch_user="true" --version="false" 
> --work_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/work"
>  --zk_session_timeout="10secs"
> python(1780,0x74068000) malloc: *** error for object 0x106ac07c8: pointer 
> being freed was not allocated
> *** set a breakpoint in malloc_error_break to debug
> {noformat}
> When running the {{PythonFramework}} via lldb, I get the following stacktrace:
> {noformat}
> * thread #7, stop reason = signal SIGABRT
>   * frame #0: 0x7fff55321e3e libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x7fff55460150 libsystem_pthread.dylib`pthread_kill + 333
> frame #2: 0x7fff5527e312 libsystem_c.dylib`abort + 127
> frame #3: 0x7fff5537b866 libsystem_malloc.dylib`free + 521
> frame #4: 0x00010d24daac 
> _scheduler.so`google::protobuf::internal::ArenaStringPtr::DestroyNoArena(this=0x7ac355b0,
>  default_value="") at arenastring.h:264
> frame #5: 0x00010d2fe1aa 
> _scheduler.so`mesos::Resource::SharedDtor(this=0x7ac35580) at 
> mesos.pb.cc:31016
> frame #6: 0x00010d2fe063 
> _scheduler.so`mesos::Resource::~Resource(this=0x7ac35580) at 
> mesos.pb.cc:31011
> frame #7: 0x00010d2fe485 
> _scheduler.so`mesos::Resource::~Resource(this=0x7ac35580) at 
> mesos.pb.cc:31009
> frame #8: 0x00010b0257c7 
> _scheduler.so`mesos::Resources::parse(name="cpus", value="8", 

[jira] [Updated] (MESOS-8483) ExampleTests PythonFramework fails with sigabort.

2018-01-24 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-8483:
--
Environment: 
macOS 10.13.2 (17C88)
Python 2.7.10 (Apple's default - not homebrew)

> ExampleTests PythonFramework fails with sigabort.
> -
>
> Key: MESOS-8483
> URL: https://issues.apache.org/jira/browse/MESOS-8483
> Project: Mesos
>  Issue Type: Bug
> Environment: macOS 10.13.2 (17C88)
> Python 2.7.10 (Apple's default - not homebrew)
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Starting the {{PythonFramework}} manually results in a sigabort:
> {noformat}
> $ ./src/examples/python/test-framework local
> [..]
> I0124 15:22:46.637238 65925120 master.cpp:563] Using default 'crammd5' 
> authenticator
> W0124 15:22:46.637269 65925120 authenticator.cpp:513] No credentials 
> provided, authentication requests will be refused
> I0124 15:22:46.637284 65925120 authenticator.cpp:520] Initializing server SASL
> I0124 15:22:46.659503 2385417024 resolver.cpp:69] Creating default secret 
> resolver
> I0124 15:22:46.659624 2385417024 containerizer.cpp:304] Using isolation { 
> environment_secret, filesystem/posix, posix/mem, posix/cpu }
> I0124 15:22:46.659951 2385417024 provisioner.cpp:299] Using default backend 
> 'copy'
> I0124 15:22:46.661628 67534848 slave.cpp:262] Mesos agent started on 
> (1)@192.168.178.20:49682
> I0124 15:22:46.661669 67534848 slave.cpp:263] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://" 
> --appc_store_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/store/appc"
>  --authenticate_http_executors="false" --authenticate_http_readonly="false" 
> --authenticate_http_readwrite="false" --authenticatee="crammd5" 
> --authentication_backoff_factor="1secs" --authorizer="local" 
> --container_disk_watch_interval="15secs" --containerizers="mesos" 
> --default_role="*" --disk_watch_interval="1mins" --docker="docker" 
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
> --docker_stop_timeout="0ns" 
> --docker_store_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/store/docker"
>  --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
> --enforce_container_disk_quota="false" 
> --executor_registration_timeout="1mins" 
> --executor_reregistration_timeout="2secs" 
> --executor_shutdown_grace_period="5secs" 
> --fetcher_cache_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/fetch"
>  --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
> --gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
> --hostname_lookup="true" --http_command_executor="false" 
> --http_heartbeat_interval="30secs" --initialize_driver_logging="true" 
> --isolation="posix/cpu,posix/mem" --launcher="posix" 
> --launcher_dir="/usr/local/libexec/mesos" --logbufsecs="0" 
> --logging_level="INFO" --max_completed_executors_per_framework="150" 
> --oversubscribed_resources_interval="15secs" --port="5051" 
> --qos_correction_interval_min="0ns" --quiet="false" 
> --reconfiguration_policy="equal" --recover="reconnect" 
> --recovery_timeout="15mins" --registration_backoff_factor="1secs" 
> --runtime_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/run"
>  --sandbox_directory="/mnt/mesos/sandbox" --strict="true" 
> --switch_user="true" --version="false" 
> --work_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/work"
>  --zk_session_timeout="10secs"
> python(1780,0x74068000) malloc: *** error for object 0x106ac07c8: pointer 
> being freed was not allocated
> *** set a breakpoint in malloc_error_break to debug
> {noformat}
> When running the {{PythonFramework}} via lldb, I get the following stacktrace:
> {noformat}
> * thread #7, stop reason = signal SIGABRT
>   * frame #0: 0x7fff55321e3e libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x7fff55460150 libsystem_pthread.dylib`pthread_kill + 333
> frame #2: 0x7fff5527e312 libsystem_c.dylib`abort + 127
> frame #3: 0x7fff5537b866 libsystem_malloc.dylib`free + 521
> frame #4: 0x00010d24daac 
> _scheduler.so`google::protobuf::internal::ArenaStringPtr::DestroyNoArena(this=0x7ac355b0,
>  default_value="") at arenastring.h:264
> frame #5: 0x00010d2fe1aa 
> _scheduler.so`mesos::Resource::SharedDtor(this=0x7ac35580) at 
> mesos.pb.cc:31016
> frame #6: 0x00010d2fe063 
> _scheduler.so`mesos::Resource::~Resource(this=0x7ac35580) at 
> mesos.pb.cc:31011
> frame #7: 0x00010d2fe485 
> _scheduler.so`mesos::Resource::~Resource(this=0x7ac35580) at 
> mesos.pb.cc:31009
> frame #8: 0x00010b0257c7 
> 

[jira] [Created] (MESOS-8483) ExampleTests PythonFramework fails with sigabort.

2018-01-24 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-8483:
-

 Summary: ExampleTests PythonFramework fails with sigabort.
 Key: MESOS-8483
 URL: https://issues.apache.org/jira/browse/MESOS-8483
 Project: Mesos
  Issue Type: Bug
Reporter: Till Toenshoff


Starting the {{PythonFramework}} manually results in a sigabort:

{noformat}
$ ./src/examples/python/test-framework local
[..]
I0124 15:22:46.637238 65925120 master.cpp:563] Using default 'crammd5' 
authenticator
W0124 15:22:46.637269 65925120 authenticator.cpp:513] No credentials provided, 
authentication requests will be refused
I0124 15:22:46.637284 65925120 authenticator.cpp:520] Initializing server SASL
I0124 15:22:46.659503 2385417024 resolver.cpp:69] Creating default secret 
resolver
I0124 15:22:46.659624 2385417024 containerizer.cpp:304] Using isolation { 
environment_secret, filesystem/posix, posix/mem, posix/cpu }
I0124 15:22:46.659951 2385417024 provisioner.cpp:299] Using default backend 
'copy'
I0124 15:22:46.661628 67534848 slave.cpp:262] Mesos agent started on 
(1)@192.168.178.20:49682
I0124 15:22:46.661669 67534848 slave.cpp:263] Flags at startup: 
--appc_simple_discovery_uri_prefix="http://" 
--appc_store_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/store/appc"
 --authenticate_http_executors="false" --authenticate_http_readonly="false" 
--authenticate_http_readwrite="false" --authenticatee="crammd5" 
--authentication_backoff_factor="1secs" --authorizer="local" 
--container_disk_watch_interval="15secs" --containerizers="mesos" 
--default_role="*" --disk_watch_interval="1mins" --docker="docker" 
--docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" 
--docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" 
--docker_stop_timeout="0ns" 
--docker_store_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/store/docker"
 --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" 
--enforce_container_disk_quota="false" --executor_registration_timeout="1mins" 
--executor_reregistration_timeout="2secs" 
--executor_shutdown_grace_period="5secs" 
--fetcher_cache_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/fetch"
 --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname_lookup="true" --http_command_executor="false" 
--http_heartbeat_interval="30secs" --initialize_driver_logging="true" 
--isolation="posix/cpu,posix/mem" --launcher="posix" 
--launcher_dir="/usr/local/libexec/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_completed_executors_per_framework="150" 
--oversubscribed_resources_interval="15secs" --port="5051" 
--qos_correction_interval_min="0ns" --quiet="false" 
--reconfiguration_policy="equal" --recover="reconnect" 
--recovery_timeout="15mins" --registration_backoff_factor="1secs" 
--runtime_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/run"
 --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" 
--version="false" 
--work_dir="/var/folders/_t/rdp354gx7j5fjww270kbk6_rgn/T/mesos/work/agents/0/work"
 --zk_session_timeout="10secs"
python(1780,0x74068000) malloc: *** error for object 0x106ac07c8: pointer 
being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
{noformat}


When running the {{PythonFramework}} via lldb, I get the following stacktrace:

{noformat}
* thread #7, stop reason = signal SIGABRT
  * frame #0: 0x7fff55321e3e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff55460150 libsystem_pthread.dylib`pthread_kill + 333
frame #2: 0x7fff5527e312 libsystem_c.dylib`abort + 127
frame #3: 0x7fff5537b866 libsystem_malloc.dylib`free + 521
frame #4: 0x00010d24daac 
_scheduler.so`google::protobuf::internal::ArenaStringPtr::DestroyNoArena(this=0x7ac355b0,
 default_value="") at arenastring.h:264
frame #5: 0x00010d2fe1aa 
_scheduler.so`mesos::Resource::SharedDtor(this=0x7ac35580) at 
mesos.pb.cc:31016
frame #6: 0x00010d2fe063 
_scheduler.so`mesos::Resource::~Resource(this=0x7ac35580) at 
mesos.pb.cc:31011
frame #7: 0x00010d2fe485 
_scheduler.so`mesos::Resource::~Resource(this=0x7ac35580) at 
mesos.pb.cc:31009
frame #8: 0x00010b0257c7 
_scheduler.so`mesos::Resources::parse(name="cpus", value="8", role="*") at 
resources.cpp:702
frame #9: 0x00010c7ae4c9 
_scheduler.so`mesos::internal::slave::Containerizer::resources(flags=0x00010202bac0)
 at containerizer.cpp:118
frame #10: 0x00010c3a93e1 
_scheduler.so`mesos::internal::slave::Slave::initialize(this=0x00010202ba00)
 at slave.cpp:472
frame #11: 0x00010c3d7cb2 _scheduler.so`virtual thunk to 
mesos::internal::slave::Slave::initialize(this=0x00010202ba00) at 
slave.cpp:0
frame #12: 0x00010e459c39 

[jira] [Updated] (MESOS-8482) Signed/Unsigned comparisons in tests

2018-01-24 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8482:
---
Description: 
Many tests in Mesos currently contain comparisons between signed and unsigned 
integers, e.g.:
{noformat}
    ASSERT_EQ(4, v1Response->read_file().size());
{noformat}
or comparisons between values of different enums, e.g. TaskState and 
v1::TaskState:
{noformat}
  ASSERT_EQ(TASK_STARTING, startingUpdate->status().state());
{noformat}
Usually, the compiler would catch these and emit a warning, but these are 
currently silenced because gtest headers are included using the {{-isystem}} 
command line flag.

  was:
Many tests in Mesos currently contain comparisons between signed and unsigned 
integers, e.g.:
{noformat}
    ASSERT_EQ(4, v1Response->read_file().size());
{noformat}
or comparisons between values of different enums, e.g. TaskState and 
v1::TaskState:
{noformat}
  ASSERT_EQ(TASK_STARTING, startingUpdate->status().state());
{noformat}
Usually, the compiler would catch these and emit a warning, but these are 
currently silenced because gtest headers are included using the `-isystem` 
command line flag.


> Signed/Unsigned comparisons in tests
> 
>
> Key: MESOS-8482
> URL: https://issues.apache.org/jira/browse/MESOS-8482
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Priority: Major
>  Labels: mesosphere, newbie, tech-debt
>
> Many tests in Mesos currently contain comparisons between signed and unsigned 
> integers, e.g.:
> {noformat}
>     ASSERT_EQ(4, v1Response->read_file().size());
> {noformat}
> or comparisons between values of different enums, e.g. TaskState and 
> v1::TaskState:
> {noformat}
>   ASSERT_EQ(TASK_STARTING, startingUpdate->status().state());
> {noformat}
> Usually, the compiler would catch these and emit a warning, but these are 
> currently silenced because gtest headers are included using the {{-isystem}} 
> command line flag.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (MESOS-8482) Signed/Unsigned comparisons in tests

2018-01-24 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8482:
---
Labels: mesosphere newbie tech-debt  (was: )

> Signed/Unsigned comparisons in tests
> 
>
> Key: MESOS-8482
> URL: https://issues.apache.org/jira/browse/MESOS-8482
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Priority: Major
>  Labels: mesosphere, newbie, tech-debt
>
> Many tests in Mesos currently contain comparisons between signed and unsigned 
> integers, e.g.:
> {noformat}
>     ASSERT_EQ(4, v1Response->read_file().size());
> {noformat}
> or comparisons between values of different enums, e.g. TaskState and 
> v1::TaskState:
> {noformat}
>   ASSERT_EQ(TASK_STARTING, startingUpdate->status().state());
> {noformat}
> Usually, the compiler would catch these and emit a warning, but these are 
> currently silenced because gtest headers are included using the `-isystem` 
> command line flag.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8482) Signed/Unsigned comparisons in tests

2018-01-24 Thread Benno Evers (JIRA)
Benno Evers created MESOS-8482:
--

 Summary: Signed/Unsigned comparisons in tests
 Key: MESOS-8482
 URL: https://issues.apache.org/jira/browse/MESOS-8482
 Project: Mesos
  Issue Type: Bug
Reporter: Benno Evers


Many tests in Mesos currently contain comparisons between signed and unsigned 
integers, e.g.:
{noformat}
    ASSERT_EQ(4, v1Response->read_file().size());
{noformat}
or comparisons between values of different enums, e.g. TaskState and 
v1::TaskState:
{noformat}
  ASSERT_EQ(TASK_STARTING, startingUpdate->status().state());
{noformat}
Usually, the compiler would catch these and emit a warning, but these are 
currently silenced because gtest headers are included using the `-isystem` 
command line flag.
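
For illustration, a self-contained sketch of the straightforward fixes once 
the warnings surface; the test below is synthetic, and {{v1Response}} / 
{{startingUpdate}} from the snippets above would be fixed the same way:

{code}
// Compile with, e.g.:
//   g++ -Wall -Wextra -Wsign-compare example.cpp -lgtest -lgtest_main
#include <string>
#include <vector>

#include <gtest/gtest.h>

TEST(SignednessExample, SizeComparisons)
{
  const std::vector<std::string> files = {"a", "b", "c", "d"};

  // BAD (warns once gtest is off the -isystem path):
  //   ASSERT_EQ(4, files.size());   // int vs. size_t

  // GOOD: the 'u' suffix makes the expected value unsigned as well.
  ASSERT_EQ(4u, files.size());

  // Equally valid: cast the actual value to the expected type.
  ASSERT_EQ(4, static_cast<int>(files.size()));
}
{code}

Either making the expected literal unsigned or casting the actual value works; 
the point is that expected and actual share a signedness, so the warning is 
silenced for the right reason.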



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)