[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175836#comment-16175836
 ] 

Qian Zhang commented on MESOS-7975:
---

[~alexr] I have just sent an email to the mailing lists; let's wait for feedback 
from the community.

> The command/default executor can incorrectly send a TASK_FINISHED update even 
> when the task is killed
> -
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default and the command executor 
> incorrectly send a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunate missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.
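
For illustration, here is a minimal sketch of one way to reorder the conditional 
so that the kill always takes precedence over the exit status; the variable names 
follow the snippet above, but the exact fix that lands may differ:
{code}
  // Sketch only: check the 'killed' flag first so that a task that
  // handles the kill signal gracefully (and exits 0) is still
  // reported as TASK_KILLED rather than TASK_FINISHED.
  if (killed) {
    // Send TASK_KILLED if the task was killed as a result of
    // kill() or shutdown(), regardless of the exit status.
    taskState = TASK_KILLED;
  } else if (WSUCCEEDED(status)) {
    taskState = TASK_FINISHED;
  } else {
    taskState = TASK_FAILED;
  }
{code}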





[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175793#comment-16175793
 ] 

Qian Zhang commented on MESOS-7975:
---

[~jpe...@apache.org] When the scheduler sends a kill, will your executor send a 
SIGTERM or a SIGKILL to the task? If it is SIGTERM, and the task handles it 
gracefully and exits with 0, do you think it is reasonable for the executor to 
send a TASK_FINISHED in this case?

> The command/default executor can incorrectly send a TASK_FINISHED update even 
> when the task is killed
> -
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default and the command executor 
> incorrectly send a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunate missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.





[jira] [Commented] (MESOS-7962) Display task state counters in the framework page of the webui.

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175636#comment-16175636
 ] 

ASF GitHub Bot commented on MESOS-7962:
---

Github user asfgit closed the pull request at:

https://github.com/apache/mesos/pull/234


> Display task state counters in the framework page of the webui.
> ---
>
> Key: MESOS-7962
> URL: https://issues.apache.org/jira/browse/MESOS-7962
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Tomasz Janiszewski
>
> Currently the webui displays task state counters across all frameworks on the 
> home page, but it does not display the per-framework task state counters when 
> you click in to a particular framework. We should add the task state counters 
> to the per-framework page.





[jira] [Commented] (MESOS-2657) Support multiple reasons in status update message.

2017-09-21 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175579#comment-16175579
 ] 

James Peach commented on MESOS-2657:


[~haosd...@gmail.com] [~jieyu] As part of the refactoring for MESOS-7963, I'm 
planning to remove the multiple reasons in the {{ContainerTermination}} 
message. I think the main use case for that was supporting multiple limitations 
from isolators, but that never worked and I'm removing that as well :)

 Please let me know if you see any problems with this.

> Support multiple reasons in status update message.
> --
>
> Key: MESOS-2657
> URL: https://issues.apache.org/jira/browse/MESOS-2657
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: haosdent
>
> Sometimes, a single reason in the status update message makes it very hard 
> for frameworks to understand the cause of a status update. For example, we 
> have REASON_EXECUTOR_TERMINATED, but that's a very general reason and 
> sometime we want a sub-reason for that (e.g., REASON_CONTAINER_LAUNCH_FAILED) 
> so that the framework can better react to the status update.
> We could change 'reason' field in TaskStatus to be a repeated field (should 
> be backward compatible). For instance, for a containerizer launch failure, we 
> probably need two reasons for TASK_LOST: 1) the top level reason 
> REASON_EXECUTOR_TERMINATED; 2) the second level reason 
> REASON_CONTAINER_LAUNCH_FAILED.
> Another example. We may want to have a generic reason when resource limit is 
> reached: REASON_RESOURCE_LIMIT_EXCEEDED, and have a second level sub-reason: 
> REASON_OUT_OF_MEMORY.
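
For illustration, assuming {{reason}} were changed to a {{repeated}} field (here 
hypothetically renamed {{reasons}}), the protobuf-generated C++ API would let 
Mesos stack both levels roughly like this; both enum values are the ones named in 
the description above:
{code}
  // Sketch only: a repeated 'reasons' field is hypothetical; today
  // TaskStatus carries a single optional 'reason'.
  TaskStatus status;
  status.set_state(TASK_LOST);

  // Top-level reason first, then the more specific sub-reason.
  status.add_reasons(TaskStatus::REASON_EXECUTOR_TERMINATED);
  status.add_reasons(TaskStatus::REASON_CONTAINER_LAUNCH_FAILED);
{code}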





[jira] [Commented] (MESOS-7990) Support systemd named hierarchy (name=systemd) for Mesos Containerizer.

2017-09-21 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175460#comment-16175460
 ] 

Jie Yu commented on MESOS-7990:
---

[~jasonlai] Yes, I am aware of that (the naming convention). In fact, we're not 
supposed to touch the named systemd cgroup hierarchy manually. However, major 
container orchestrators (Docker, k8s) all manipulate the systemd cgroup hierarchy 
directly. They also all have an alternative mode that supports systemd more 
natively (using machined or a system slice for containers, like rkt does).

We want to support both too. The native systemd support will be added later; 
this ticket is for the cgroupfs support.

> Support systemd named hierarchy (name=systemd) for Mesos Containerizer.
> ---
>
> Key: MESOS-7990
> URL: https://issues.apache.org/jira/browse/MESOS-7990
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Jie Yu
>
> Similar to docker's cgroupfs cgroup driver, we should create cgroups under 
> /sys/fs/cgroup/systemd (if it exists), and move container pid into the 
> corresponding cgroup ( /sys/fs/cgroup/systemd/mesos/).
> This can give us a bunch of benefits:
> 1) systemd-cgls can list mesos containers
> 2) systemd-cgtop can show stats for mesos containers
> ...
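
As a rough sketch of the cgroupfs-style flow described above (the function name 
and error handling are illustrative, not the actual Mesos code, which would go 
through the cgroups helpers):
{code}
#include <sys/stat.h>
#include <sys/types.h>

#include <fstream>
#include <string>

// Sketch only: create the per-container cgroup under the named
// systemd hierarchy and move the container pid into it.
bool moveToSystemdCgroup(pid_t pid, const std::string& containerId)
{
  const std::string cgroup =
    "/sys/fs/cgroup/systemd/mesos/" + containerId;

  // Create the per-container cgroup (ignoring EEXIST for brevity).
  ::mkdir(cgroup.c_str(), 0755);

  // Writing a pid to cgroup.procs moves its whole thread group.
  std::ofstream procs(cgroup + "/cgroup.procs");
  procs << pid;
  return static_cast<bool>(procs);
}
{code}
With the container in the hierarchy, {{systemd-cgls}} and {{systemd-cgtop}} will 
pick it up like any other cgroup.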





[jira] [Commented] (MESOS-7990) Support systemd named hierarchy (name=systemd) for Mesos Containerizer.

2017-09-21 Thread Jason Lai (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175448#comment-16175448
 ] 

Jason Lai commented on MESOS-7990:
--

I'm all for the systemd support, but it isn't as simple as 
{{/systemd/mesos/}}, as systemd imposes some conventions on tasks' 
cgroup names. There are some references 
[here|https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/resource_management_guide/sec-default_cgroup_hierarchies].
 AFAIK, rkt has aligned pretty well with the systemd conventions for its 
containers; it would be worth looking at what they're doing.

> Support systemd named hierarchy (name=systemd) for Mesos Containerizer.
> ---
>
> Key: MESOS-7990
> URL: https://issues.apache.org/jira/browse/MESOS-7990
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Jie Yu
>
> Similar to docker's cgroupfs cgroup driver, we should create cgroups under 
> /sys/fs/cgroup/systemd (if it exists), and move container pid into the 
> corresponding cgroup ( /sys/fs/cgroup/systemd/mesos/).
> This can give us a bunch of benefits:
> 1) systemd-cgls can list mesos containers
> 2) systemd-cgtop can show stats for mesos containers
> ...





[jira] [Comment Edited] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-21 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175118#comment-16175118
 ] 

James Peach edited comment on MESOS-7999 at 9/21/17 7:56 PM:
-

You can write an anonymous Mesos module that uses the lib process 
[metrics|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/metrics/metrics.hpp#L94]
 API to expose metrics into {{/metrics/snapshot}}.


was (Author: jamespeach):
You can write an anonymous Mesos module that uses the lib process 
[metrics|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/metrics/metrics.hpp#L94]
 API to expose metrics into {{/metrics/snapshot}}/

> Add and document ability to expose new /monitor modules on agents
> -
>
> Key: MESOS-7999
> URL: https://issues.apache.org/jira/browse/MESOS-7999
> Project: Mesos
>  Issue Type: Wish
>  Components: agent, json api, modules, statistics
>Reporter: Charles Allen
>
> When looking at how to collect data about the cluster, the best way to 
> support functionality similar to Kubernetes DaemonSets is not completely 
> clear.
> One key use case for DaemonSets is a monitor for system metrics. This ask is 
> that agents are able to have a module which either exposes new endpoints in 
> {{/monitor}} or allows pluggable entries to be added to 
> {{/monitor/statistics}}.





[jira] [Commented] (MESOS-7312) Update Resource proto for storage resource providers.

2017-09-21 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175316#comment-16175316
 ] 

Benjamin Bannier commented on MESOS-7312:
-

{noformat}
commit 91e279ad1855ac7f1ae628778731173aa603d5e3
Author: Benjamin Bannier 
Date:   Thu Sep 21 15:03:22 2017 +0200

Added 'id' and 'metadata' fields to 'Resource.DiskInfo.Source'.

IDs will allow to create distinguishable resources, e.g., of RAW or
BLOCK type. We also add a metadata field which can be used to expose
additional disk information.

Review: https://reviews.apache.org/r/58048/
{noformat}

> Update Resource proto for storage resource providers.
> -
>
> Key: MESOS-7312
> URL: https://issues.apache.org/jira/browse/MESOS-7312
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: storage
>
> Storage resource provider support requires a number of changes to the 
> {{Resource}} proto:
> * support for {{RAW}} and {{BLOCK}} type {{Resource::DiskInfo::Source}}
> * {{ResourceProviderID}} in Resource
> * {{Resource::DiskInfo::Source::Path}} should be {{optional}}.





[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-09-21 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175135#comment-16175135
 ] 

James Peach commented on MESOS-7975:


FWIW, the rule we have in our executor is that if we terminated a task because 
the scheduler sent a kill, we always send a {{TASK_KILLED}} status. That is the 
only reason we send this status.

> The command/default executor can incorrectly send a TASK_FINISHED update even 
> when the task is killed
> -
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default and the command executor 
> incorrectly send a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunate missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.





[jira] [Updated] (MESOS-8003) PersistentVolumeEndpointsTest.SlavesEndpointFullResources is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8003:
---
Attachment: 
PersistentVolumeEndpointsTest.SlavesEndpointFullResources_badrun.txt

> PersistentVolumeEndpointsTest.SlavesEndpointFullResources is flaky.
> ---
>
> Key: MESOS-8003
> URL: https://issues.apache.org/jira/browse/MESOS-8003
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Fedora 23
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: 
> PersistentVolumeEndpointsTest.SlavesEndpointFullResources_badrun.txt
>
>
> Observed on internal CI:
> {noformat}
> ../../src/tests/persistent_volume_endpoints_tests.cpp:1952
> Value of: (response).get().status
>   Actual: "409 Conflict"
> Expected: Accepted().status
> Which is: "202 Accepted"
> {noformat}
> Full log attached.





[jira] [Commented] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-21 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175118#comment-16175118
 ] 

James Peach commented on MESOS-7999:


You can write an anonymous Mesos module that uses the lib process 
[metrics|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/metrics/metrics.hpp#L94]
 API to expose metrics into {{/metrics/snapshot}}.
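
For reference, a minimal sketch of the metric-registration part of such a module 
(the anonymous-module boilerplate is omitted and the metric name is illustrative):
{code}
#include <process/metrics/counter.hpp>
#include <process/metrics/metrics.hpp>

// Sketch only: once added, a libprocess metric appears in the
// /metrics/snapshot endpoint alongside the built-in metrics.
static process::metrics::Counter probes("mymodule/probes");

static void initializeMetrics()
{
  process::metrics::add(probes);  // exposed as "mymodule/probes"
}

// Elsewhere, whenever the module observes an event:
//   probes++;
{code}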

> Add and document ability to expose new /monitor modules on agents
> -
>
> Key: MESOS-7999
> URL: https://issues.apache.org/jira/browse/MESOS-7999
> Project: Mesos
>  Issue Type: Wish
>  Components: agent, json api, modules, statistics
>Reporter: Charles Allen
>
> When looking at how to collect data about the cluster, the best way to 
> support functionality similar to Kubernetes DaemonSets is not completely 
> clear.
> One key use case for DaemonSets is a monitor for system metrics. This ask is 
> that agents are able to have a module which either exposes new endpoints in 
> {{/monitor}} or allows pluggable entries to be added to 
> {{/monitor/statistics}}.





[jira] [Created] (MESOS-8003) PersistentVolumeEndpointsTest.SlavesEndpointFullResources is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8003:
--

 Summary: PersistentVolumeEndpointsTest.SlavesEndpointFullResources 
is flaky.
 Key: MESOS-8003
 URL: https://issues.apache.org/jira/browse/MESOS-8003
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: Fedora 23
Reporter: Alexander Rukletsov


Observed on internal CI:
{noformat}
../../src/tests/persistent_volume_endpoints_tests.cpp:1952
Value of: (response).get().status
  Actual: "409 Conflict"
Expected: Accepted().status
Which is: "202 Accepted"
{noformat}
Full log attached.





[jira] [Updated] (MESOS-8001) PersistentVolumeEndpointsTest.NoAuthentication is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8001:
---
Attachment: PersistentVolumeEndpointsTest.NoAuthentication_badrun.txt

> PersistentVolumeEndpointsTest.NoAuthentication is flaky.
> 
>
> Key: MESOS-8001
> URL: https://issues.apache.org/jira/browse/MESOS-8001
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Ubuntu 16.04 with SSL
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: PersistentVolumeEndpointsTest.NoAuthentication_badrun.txt
>
>
> Observed a failure on internal CI:
> {noformat}
> ../../src/tests/persistent_volume_endpoints_tests.cpp:1385
> Value of: (response).get().status
>   Actual: "409 Conflict"
> Expected: Accepted().status
> Which is: "202 Accepted"
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-8002) Marathon can't start on macOS 10.12.x with Mesos 1.3.0

2017-09-21 Thread Alex Lee (JIRA)
Alex Lee created MESOS-8002:
---

 Summary: Marathon can't start on macOS 10.12.x with Mesos 1.3.0
 Key: MESOS-8002
 URL: https://issues.apache.org/jira/browse/MESOS-8002
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 1.3.0
 Environment: macOS 10.12.x 
Reporter: Alex Lee


We upgraded our Mesos cluster to 1.3.0 and ran into the following error when 
starting Marathon 1.4.7:
{noformat}
I0823 17:19:17.498087 101744640 group.cpp:340] Group process 
(zookeeper-group(1)@127.0.0.1:57708) connected to ZooKeeper
I0823 17:19:17.498652 101744640 group.cpp:830] Syncing group operations: queue 
size (joins, cancels, datas) = (0, 0, 0)
I0823 17:19:17.499153 101744640 group.cpp:418] Trying to create path 
'/mesos/master' in ZooKeeper
Assertion failed: (0), function hash, file 
/BuildRoot/Library/Caches/com.apple.xbs/Sources/cmph/cmph-6/src/hash.c, line 35.
{noformat}
This was reported in: https://jira.mesosphere.com/browse/MARATHON-7727

Interestingly, Marathon was able to start in the same cluster on a macOS 10.11.6 
host. We initially suspected an OS version issue and opened an issue with Apple, 
but the macOS team responded that there may be a regression in Mesos: the 
assertion is raised in libcmph, which libmesos.dylib invokes with invalid input, 
and the hash functions in libcmph don't look like they've changed between 
10.11.6 and 10.12.6, at least with respect to that assert(0) being around.





[jira] [Commented] (MESOS-7963) Task groups can lose the container limitation status.

2017-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174971#comment-16174971
 ] 

Qian Zhang commented on MESOS-7963:
---

Can you let me know what the special case is? Currently, when the default 
executor gets a limitation, it kills all the other nested containers and then 
terminates itself; I do not think we need to change this. And even without my 
proposal (i.e., raising the limitation only for the root container), all the 
nested containers would be killed as well (by the Mesos containerizer), so the 
result is the same. I am not sure when we would need to restart a nested 
container.

> Task groups can lose the container limitation status.
> -
>
> Key: MESOS-7963
> URL: https://issues.apache.org/jira/browse/MESOS-7963
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, executor
>Reporter: James Peach
>
> If you run a single task in a task group and that task fails with a container 
> limitation, that status update can be lost and only the executor failure will 
> be reported to the framework.
> {noformat}
> exec /opt/mesos/bin/mesos-execute --content_type=json 
> --master=jpeach.apple.com:5050 '--task_group={
> "tasks":
> [
> {
> "name": "7f141aca-55fe-4bb0-af4b-87f5ee26986a",
> "task_id": {"value" : "2866368d-7279-4657-b8eb-bf1d968e8ebf"},
> "agent_id": {"value" : ""},
> "resources": [{
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 0.2
> }
> }, {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
> "value": 32
> }
> }, {
> "name": "disk",
> "type": "SCALAR",
> "scalar": {
> "value": 2
> }
> }
> ],
> "command": {
> "value": "sleep 2 ; /usr/bin/dd if=/dev/zero of=out.dat bs=1M 
> count=64 ; sleep 1"
> }
> }
> ]
> }'
> I0911 11:48:01.480689  7340 scheduler.cpp:184] Version: 1.5.0
> I0911 11:48:01.488868  7339 scheduler.cpp:470] New master detected at 
> master@17.228.224.108:5050
> Subscribed with ID aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> Submitted task group with tasks [ 2866368d-7279-4657-b8eb-bf1d968e8ebf ] to 
> agent 'aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-S0'
> Received status update TASK_RUNNING for task 
> '2866368d-7279-4657-b8eb-bf1d968e8ebf'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FAILED for task 
> '2866368d-7279-4657-b8eb-bf1d968e8ebf'
>   message: 'Command terminated with signal Killed'
>   source: SOURCE_EXECUTOR
> {noformat}
> However, the agent logs show that this failed with a memory limitation:
> {noformat}
> I0911 11:48:02.235818  7012 http.cpp:532] Processing call 
> WAIT_NESTED_CONTAINER
> I0911 11:48:02.236395  7013 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> I0911 11:48:02.237083  7016 slave.cpp:4875] Forwarding the update 
> TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 to master@17.228.224.108:5050
> I0911 11:48:02.283661  7007 status_update_manager.cpp:395] Received status 
> update acknowledgement (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> I0911 11:48:04.771455  7014 memory.cpp:516] OOM detected for container 
> 474388fe-43c3-4372-b903-eaca22740996
> I0911 11:48:04.776445  7014 memory.cpp:556] Memory limit exceeded: Requested: 
> 64MB Maximum Used: 64MB
> ...
> I0911 11:48:04.776943  7012 containerizer.cpp:2681] Container 
> 474388fe-43c3-4372-b903-eaca22740996 has reached its limit for resource 
> [{"name":"mem","scalar":{"value":64.0},"type":"SCALAR"}] and will be 
> terminated
> {noformat}
> The following {{mesos-execute}} task will show the container limitation 
> correctly:
> {noformat}
> exec /opt/mesos/bin/mesos-execute --content_type=json 
> --master=jpeach.apple.com:5050 '--task_group={
> "tasks":
> [
> {
> "name": "37db08f6-4f0f-4ef6-97ee-b10a5c5cc211",
> "task_id": {"value" : "1372b2e2-c501-4e80-bcbd-1a5c5194e206"},
> "agent_id": {"value" : ""},
> "resources": [{
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 0.2
> }
> 

[jira] [Commented] (MESOS-7995) libprocess tests breaking on macOS.

2017-09-21 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174936#comment-16174936
 ] 

Till Toenshoff commented on MESOS-7995:
---

Downgrading from Blocker, as the workaround is to downgrade libevent to 
2.0.22.

> libprocess tests breaking on macOS.
> ---
>
> Key: MESOS-7995
> URL: https://issues.apache.org/jira/browse/MESOS-7995
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Affects Versions: 1.5.0
> Environment: libevent 2.1.8
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Many libprocess tests fail on macOS, some even abort.
> Examples:
> {noformat}
> [--] 8 tests from HTTPConnectionTest
> [ RUN  ] HTTPConnectionTest.GzipRequestBody
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
> Failed to wait 15secs for connect
> [  FAILED  ] HTTPConnectionTest.GzipRequestBody (15001 ms)
> [ RUN  ] HTTPConnectionTest.Serial
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1015: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Serial (0 ms)
> [ RUN  ] HTTPConnectionTest.Pipeline
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1094: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Pipeline (1 ms)
> [ RUN  ] HTTPConnectionTest.ClosingRequest
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1190: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingRequest (0 ms)
> [ RUN  ] HTTPConnectionTest.ClosingResponse
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1245: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingResponse (0 ms)
> [ RUN  ] HTTPConnectionTest.ReferenceCounting
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1306: Failure
> (*connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ReferenceCounting (1 ms)
> [ RUN  ] HTTPConnectionTest.Equality
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1333: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Equality (0 ms)
> [ RUN  ] HTTPConnectionTest.RequestStreaming
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1360: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.RequestStreaming (0 ms)
> [--] 8 tests from HTTPConnectionTest (15003 ms total)
> {noformat}
> {noformat}
> [--] 8 tests from HttpAuthenticationTest
> [ RUN  ] HttpAuthenticationTest.NoAuthenticator
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1792: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1786: Failure
> Actual function call count doesn't match EXPECT_CALL(*http.process, 
> authenticated(_, Option::none()))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] HttpAuthenticationTest.NoAuthenticator (1 ms)
> [ RUN  ] HttpAuthenticationTest.Unauthorized
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1816: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0921 12:18:19.947710 2519827264 future.hpp:1151] Check failed: !isFailed() 
> Future::get() but state == FAILED: Failed to connect to 192.168.178.20:51437: 
> Host is down
> *** Check failure stack trace: ***
> *** Aborted at 1505989099 (unix time) try "date -d @1505989099" if you are 
> using GNU date ***
> PC: @ 0x7fff5cd45fce __pthread_kill
> *** SIGABRT (@0x7fff5cd45fce) received by PID 23916 (TID 0x7fff96318340) 
> stack trace: ***
> @ 0x7fff5ce76f5a _sigtramp
> @ 0x7fff5ac5e526 std::__1::locale::facet::__on_zero_shared()
> @ 0x7fff5cca232a abort
> @0x1077b9659 google::logging_fail()
> @0x1077b964a google::LogMessage::Fail()
> @0x1077b72fc google::LogMessage::SendToLog()
> @0x1077b8089 google::LogMessage::Flush()
> @0x1077c12e9 google::LogMessageFatal::~LogMessageFatal()
> @0x1077b9b35 google::LogMessageFatal::~LogMessageFatal()
> @0x106998ad1 process::Future<>::get()
> @0x1069d4d5b HttpAuthenticationTest_Unauthorized_Test::TestBody()
> @0x1070a828e 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @0x10704a96b 
> 

[jira] [Updated] (MESOS-7995) libprocess tests breaking on macOS.

2017-09-21 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-7995:
--
Priority: Major  (was: Blocker)

> libprocess tests breaking on macOS.
> ---
>
> Key: MESOS-7995
> URL: https://issues.apache.org/jira/browse/MESOS-7995
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Affects Versions: 1.5.0
> Environment: libevent 2.1.8
>Reporter: Till Toenshoff
>
> Many libprocess tests fail on macOS, some even abort.
> Examples:
> {noformat}
> [--] 8 tests from HTTPConnectionTest
> [ RUN  ] HTTPConnectionTest.GzipRequestBody
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
> Failed to wait 15secs for connect
> [  FAILED  ] HTTPConnectionTest.GzipRequestBody (15001 ms)
> [ RUN  ] HTTPConnectionTest.Serial
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1015: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Serial (0 ms)
> [ RUN  ] HTTPConnectionTest.Pipeline
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1094: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Pipeline (1 ms)
> [ RUN  ] HTTPConnectionTest.ClosingRequest
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1190: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingRequest (0 ms)
> [ RUN  ] HTTPConnectionTest.ClosingResponse
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1245: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingResponse (0 ms)
> [ RUN  ] HTTPConnectionTest.ReferenceCounting
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1306: Failure
> (*connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ReferenceCounting (1 ms)
> [ RUN  ] HTTPConnectionTest.Equality
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1333: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Equality (0 ms)
> [ RUN  ] HTTPConnectionTest.RequestStreaming
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1360: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.RequestStreaming (0 ms)
> [--] 8 tests from HTTPConnectionTest (15003 ms total)
> {noformat}
> {noformat}
> [--] 8 tests from HttpAuthenticationTest
> [ RUN  ] HttpAuthenticationTest.NoAuthenticator
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1792: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1786: Failure
> Actual function call count doesn't match EXPECT_CALL(*http.process, 
> authenticated(_, Option::none()))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] HttpAuthenticationTest.NoAuthenticator (1 ms)
> [ RUN  ] HttpAuthenticationTest.Unauthorized
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1816: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0921 12:18:19.947710 2519827264 future.hpp:1151] Check failed: !isFailed() 
> Future::get() but state == FAILED: Failed to connect to 192.168.178.20:51437: 
> Host is down
> *** Check failure stack trace: ***
> *** Aborted at 1505989099 (unix time) try "date -d @1505989099" if you are 
> using GNU date ***
> PC: @ 0x7fff5cd45fce __pthread_kill
> *** SIGABRT (@0x7fff5cd45fce) received by PID 23916 (TID 0x7fff96318340) 
> stack trace: ***
> @ 0x7fff5ce76f5a _sigtramp
> @ 0x7fff5ac5e526 std::__1::locale::facet::__on_zero_shared()
> @ 0x7fff5cca232a abort
> @0x1077b9659 google::logging_fail()
> @0x1077b964a google::LogMessage::Fail()
> @0x1077b72fc google::LogMessage::SendToLog()
> @0x1077b8089 google::LogMessage::Flush()
> @0x1077c12e9 google::LogMessageFatal::~LogMessageFatal()
> @0x1077b9b35 google::LogMessageFatal::~LogMessageFatal()
> @0x106998ad1 process::Future<>::get()
> @0x1069d4d5b HttpAuthenticationTest_Unauthorized_Test::TestBody()
> @0x1070a828e 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @0x10704a96b 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10704a896 testing::Test::Run()
> @

[jira] [Created] (MESOS-8001) PersistentVolumeEndpointsTest.NoAuthentication is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8001:
--

 Summary: PersistentVolumeEndpointsTest.NoAuthentication is flaky.
 Key: MESOS-8001
 URL: https://issues.apache.org/jira/browse/MESOS-8001
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: Ubuntu 16.04 with SSL
Reporter: Alexander Rukletsov


Observed a failure on internal CI:
{noformat}
../../src/tests/persistent_volume_endpoints_tests.cpp:1385
Value of: (response).get().status
  Actual: "409 Conflict"
Expected: Accepted().status
Which is: "202 Accepted"
{noformat}
Full log attached.





[jira] [Updated] (MESOS-7995) libprocess tests breaking on macOS.

2017-09-21 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-7995:
--
Environment: libevent 2.1.8

> libprocess tests breaking on macOS.
> ---
>
> Key: MESOS-7995
> URL: https://issues.apache.org/jira/browse/MESOS-7995
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Affects Versions: 1.5.0
> Environment: libevent 2.1.8
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Many libprocess tests fail on macOS, some even abort.
> Examples:
> {noformat}
> [--] 8 tests from HTTPConnectionTest
> [ RUN  ] HTTPConnectionTest.GzipRequestBody
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
> Failed to wait 15secs for connect
> [  FAILED  ] HTTPConnectionTest.GzipRequestBody (15001 ms)
> [ RUN  ] HTTPConnectionTest.Serial
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1015: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Serial (0 ms)
> [ RUN  ] HTTPConnectionTest.Pipeline
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1094: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Pipeline (1 ms)
> [ RUN  ] HTTPConnectionTest.ClosingRequest
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1190: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingRequest (0 ms)
> [ RUN  ] HTTPConnectionTest.ClosingResponse
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1245: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingResponse (0 ms)
> [ RUN  ] HTTPConnectionTest.ReferenceCounting
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1306: Failure
> (*connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ReferenceCounting (1 ms)
> [ RUN  ] HTTPConnectionTest.Equality
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1333: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Equality (0 ms)
> [ RUN  ] HTTPConnectionTest.RequestStreaming
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1360: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.RequestStreaming (0 ms)
> [--] 8 tests from HTTPConnectionTest (15003 ms total)
> {noformat}
> {noformat}
> [--] 8 tests from HttpAuthenticationTest
> [ RUN  ] HttpAuthenticationTest.NoAuthenticator
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1792: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1786: Failure
> Actual function call count doesn't match EXPECT_CALL(*http.process, 
> authenticated(_, Option::none()))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] HttpAuthenticationTest.NoAuthenticator (1 ms)
> [ RUN  ] HttpAuthenticationTest.Unauthorized
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1816: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0921 12:18:19.947710 2519827264 future.hpp:1151] Check failed: !isFailed() 
> Future::get() but state == FAILED: Failed to connect to 192.168.178.20:51437: 
> Host is down
> *** Check failure stack trace: ***
> *** Aborted at 1505989099 (unix time) try "date -d @1505989099" if you are 
> using GNU date ***
> PC: @ 0x7fff5cd45fce __pthread_kill
> *** SIGABRT (@0x7fff5cd45fce) received by PID 23916 (TID 0x7fff96318340) 
> stack trace: ***
> @ 0x7fff5ce76f5a _sigtramp
> @ 0x7fff5ac5e526 std::__1::locale::facet::__on_zero_shared()
> @ 0x7fff5cca232a abort
> @0x1077b9659 google::logging_fail()
> @0x1077b964a google::LogMessage::Fail()
> @0x1077b72fc google::LogMessage::SendToLog()
> @0x1077b8089 google::LogMessage::Flush()
> @0x1077c12e9 google::LogMessageFatal::~LogMessageFatal()
> @0x1077b9b35 google::LogMessageFatal::~LogMessageFatal()
> @0x106998ad1 process::Future<>::get()
> @0x1069d4d5b HttpAuthenticationTest_Unauthorized_Test::TestBody()
> @0x1070a828e 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @0x10704a96b 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10704a896 

[jira] [Updated] (MESOS-8000) DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8000:
---
Attachment: ROOT_VerifyContainerIP_badrun.txt
ROOT_VerifyContainerIP_goodrun.txt

> DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.
> ---
>
> Key: MESOS-8000
> URL: https://issues.apache.org/jira/browse/MESOS-8000
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Ubuntu 16.04
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: ROOT_VerifyContainerIP_badrun.txt, 
> ROOT_VerifyContainerIP_goodrun.txt
>
>
> Observed a failure on internal CI:
> {noformat}
> ../../src/tests/containerizer/cni_isolator_tests.cpp:1419
> Failed to wait 15secs for subscribed
> {noformat}
> Full log attached.





[jira] [Commented] (MESOS-7963) Task groups can lose the container limitation status.

2017-09-21 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174890#comment-16174890
 ] 

James Peach commented on MESOS-7963:


Right now, if an executor gets any limitation, it knows it will be terminated. 
The special case is that in your proposal some kinds of limitation would not 
cause the executor to be terminated, so the executor needs to decide how to 
handle that by either manually tearing everything down or restarting the nested 
container.

> Task groups can lose the container limitation status.
> -
>
> Key: MESOS-7963
> URL: https://issues.apache.org/jira/browse/MESOS-7963
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, executor
>Reporter: James Peach
>
> If you run a single task in a task group and that task fails with a container 
> limitation, that status update can be lost and only the executor failure will 
> be reported to the framework.
> {noformat}
> exec /opt/mesos/bin/mesos-execute --content_type=json 
> --master=jpeach.apple.com:5050 '--task_group={
> "tasks":
> [
> {
> "name": "7f141aca-55fe-4bb0-af4b-87f5ee26986a",
> "task_id": {"value" : "2866368d-7279-4657-b8eb-bf1d968e8ebf"},
> "agent_id": {"value" : ""},
> "resources": [{
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 0.2
> }
> }, {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
> "value": 32
> }
> }, {
> "name": "disk",
> "type": "SCALAR",
> "scalar": {
> "value": 2
> }
> }
> ],
> "command": {
> "value": "sleep 2 ; /usr/bin/dd if=/dev/zero of=out.dat bs=1M 
> count=64 ; sleep 1"
> }
> }
> ]
> }'
> I0911 11:48:01.480689  7340 scheduler.cpp:184] Version: 1.5.0
> I0911 11:48:01.488868  7339 scheduler.cpp:470] New master detected at 
> master@17.228.224.108:5050
> Subscribed with ID aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> Submitted task group with tasks [ 2866368d-7279-4657-b8eb-bf1d968e8ebf ] to 
> agent 'aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-S0'
> Received status update TASK_RUNNING for task 
> '2866368d-7279-4657-b8eb-bf1d968e8ebf'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FAILED for task 
> '2866368d-7279-4657-b8eb-bf1d968e8ebf'
>   message: 'Command terminated with signal Killed'
>   source: SOURCE_EXECUTOR
> {noformat}
> However, the agent logs show that this failed with a memory limitation:
> {noformat}
> I0911 11:48:02.235818  7012 http.cpp:532] Processing call 
> WAIT_NESTED_CONTAINER
> I0911 11:48:02.236395  7013 status_update_manager.cpp:323] Received status 
> update TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> I0911 11:48:02.237083  7016 slave.cpp:4875] Forwarding the update 
> TASK_RUNNING (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010 to master@17.228.224.108:5050
> I0911 11:48:02.283661  7007 status_update_manager.cpp:395] Received status 
> update acknowledgement (UUID: 85e7a8e8-22a7-4561-9000-2cd6d93502d9) for task 
> 2866368d-7279-4657-b8eb-bf1d968e8ebf of framework 
> aabd0847-aabc-4eb4-9c66-7d91fc9e9c32-0010
> I0911 11:48:04.771455  7014 memory.cpp:516] OOM detected for container 
> 474388fe-43c3-4372-b903-eaca22740996
> I0911 11:48:04.776445  7014 memory.cpp:556] Memory limit exceeded: Requested: 
> 64MB Maximum Used: 64MB
> ...
> I0911 11:48:04.776943  7012 containerizer.cpp:2681] Container 
> 474388fe-43c3-4372-b903-eaca22740996 has reached its limit for resource 
> [{"name":"mem","scalar":{"value":64.0},"type":"SCALAR"}] and will be 
> terminated
> {noformat}
> The following {{mesos-execute}} task will show the container limitation 
> correctly:
> {noformat}
> exec /opt/mesos/bin/mesos-execute --content_type=json 
> --master=jpeach.apple.com:5050 '--task_group={
> "tasks":
> [
> {
> "name": "37db08f6-4f0f-4ef6-97ee-b10a5c5cc211",
> "task_id": {"value" : "1372b2e2-c501-4e80-bcbd-1a5c5194e206"},
> "agent_id": {"value" : ""},
> "resources": [{
> "name": "cpus",
> "type": "SCALAR",
> "scalar": {
> "value": 0.2
> }
> },
> {
> "name": "mem",
> "type": "SCALAR",
> "scalar": {
>  

[jira] [Created] (MESOS-8000) DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8000:
--

 Summary: DefaultExecutorCniTest.ROOT_VerifyContainerIP is flaky.
 Key: MESOS-8000
 URL: https://issues.apache.org/jira/browse/MESOS-8000
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: Ubuntu 16.04
Reporter: Alexander Rukletsov


Observed a failure on internal CI:
{noformat}
../../src/tests/containerizer/cni_isolator_tests.cpp:1419
Failed to wait 15secs for subscribed
{noformat}
Full log attached.





[jira] [Created] (MESOS-7999) Add and document ability to expose new /monitor modules on agents

2017-09-21 Thread Charles Allen (JIRA)
Charles Allen created MESOS-7999:


 Summary: Add and document ability to expose new /monitor modules 
on agents
 Key: MESOS-7999
 URL: https://issues.apache.org/jira/browse/MESOS-7999
 Project: Mesos
  Issue Type: Wish
  Components: agent, json api, modules, statistics
Reporter: Charles Allen


When looking at how to collect data about the cluster, the best way to support 
functionality similar to Kubernetes DaemonSets is not completely clear.

One key use case for DaemonSets is a monitor for system metrics. This ask is 
that agents are able to have a module which either exposes new endpoints in 
{{/monitor}} or allows pluggable entries to be added to {{/monitor/statistics}}.





[jira] [Updated] (MESOS-7997) ContentType/MasterAPITest.CreateAndDestroyVolumes is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7997:
---
Attachment: CreateAndDestroyVolumes_goodrun.txt
CreateAndDestroyVolumes_badrun.txt

> ContentType/MasterAPITest.CreateAndDestroyVolumes is flaky.
> ---
>
> Key: MESOS-7997
> URL: https://issues.apache.org/jira/browse/MESOS-7997
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Ubuntu 17.04 with SSL
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: CreateAndDestroyVolumes_badrun.txt, 
> CreateAndDestroyVolumes_goodrun.txt
>
>
> Observed a failure on the internal CI:
> {noformat}
> ../../src/tests/api_tests.cpp:3052
> Value of: Resources(offer.resources()).contains( allocatedResources(volume, 
> frameworkInfo.role()))
>   Actual: false
> Expected: true
> {noformat}
> Full log attached.





[jira] [Updated] (MESOS-7998) PersistentVolumeEndpointsTest.UnreserveVolumeResources is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7998:
---
Attachment: UnreserveVolumeResources_badrun.txt

> PersistentVolumeEndpointsTest.UnreserveVolumeResources is flaky.
> 
>
> Key: MESOS-7998
> URL: https://issues.apache.org/jira/browse/MESOS-7998
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Ubuntu 17.04 with SSL
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: UnreserveVolumeResources_badrun.txt
>
>
> Observed a failure on the internal CI:
> {noformat}
> ../../src/tests/persistent_volume_endpoints_tests.cpp:450
> Value of: (response).get().status
>   Actual: "409 Conflict"
> Expected: Accepted().status
> Which is: "202 Accepted"
> {noformat}
> Full log attached.





[jira] [Updated] (MESOS-7998) PersistentVolumeEndpointsTest.UnreserveVolumeResources is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7998:
---
Summary: PersistentVolumeEndpointsTest.UnreserveVolumeResources is flaky.  
(was: UnreserveVolumeResources is flaky.)

> PersistentVolumeEndpointsTest.UnreserveVolumeResources is flaky.
> 
>
> Key: MESOS-7998
> URL: https://issues.apache.org/jira/browse/MESOS-7998
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Ubuntu 17.04 with SSL
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
>
> Observed a failure on the internal CI:
> {noformat}
> ../../src/tests/persistent_volume_endpoints_tests.cpp:450
> Value of: (response).get().status
>   Actual: "409 Conflict"
> Expected: Accepted().status
> Which is: "202 Accepted"
> {noformat}
> Full log attached.





[jira] [Updated] (MESOS-7998) UnreserveVolumeResources is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7998:
---
Environment: Ubuntu 17.04 with SSL  (was: Ubuntu 17.07 with SSL)

> UnreserveVolumeResources is flaky.
> --
>
> Key: MESOS-7998
> URL: https://issues.apache.org/jira/browse/MESOS-7998
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Ubuntu 17.04 with SSL
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
>
> Observed a failure on the internal CI:
> {noformat}
> ../../src/tests/persistent_volume_endpoints_tests.cpp:450
> Value of: (response).get().status
>   Actual: "409 Conflict"
> Expected: Accepted().status
> Which is: "202 Accepted"
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-7998) UnreserveVolumeResources is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7998:
--

 Summary: UnreserveVolumeResources is flaky.
 Key: MESOS-7998
 URL: https://issues.apache.org/jira/browse/MESOS-7998
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: Ubuntu 17.07 with SSL
Reporter: Alexander Rukletsov


Observed a failure on the internal CI:
{noformat}
../../src/tests/persistent_volume_endpoints_tests.cpp:450
Value of: (response).get().status
  Actual: "409 Conflict"
Expected: Accepted().status
Which is: "202 Accepted"
{noformat}
Full log attached.





[jira] [Created] (MESOS-7997) ContentType/MasterAPITest.CreateAndDestroyVolumes is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7997:
--

 Summary: ContentType/MasterAPITest.CreateAndDestroyVolumes is 
flaky.
 Key: MESOS-7997
 URL: https://issues.apache.org/jira/browse/MESOS-7997
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: Ubuntu 17.04 with SSL
Reporter: Alexander Rukletsov


Observed a failure on the internal CI:
{noformat}
../../src/tests/api_tests.cpp:3052
Value of: Resources(offer.resources()).contains( allocatedResources(volume, 
frameworkInfo.role()))
  Actual: false
Expected: true
{noformat}
Full log attached.





[jira] [Updated] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7996:
---
Attachment: SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt
SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt

> ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.
> --
>
> Key: MESOS-7996
> URL: https://issues.apache.org/jira/browse/MESOS-7996
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.5.0
> Environment: Observed on Ubuntu 17.04 with SSL enabled
>Reporter: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: SchedulerTest.NoOffersWithAllRolesSuppressed_badrun.txt, 
> SchedulerTest.NoOffersWithAllRolesSuppressed_goodrun.txt
>
>
> Observed the failure on internal CI:
> {noformat}
> ../../src/tests/scheduler_tests.cpp:1474
> Mock function called more times than expected - returning directly.
> Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object 
> <48-82 52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
> 00-00 00-00 00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>)
>  Expected: to be never called
>Actual: called once - over-saturated and active
> {noformat}
> Full log attached.





[jira] [Created] (MESOS-7996) ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed is flaky.

2017-09-21 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-7996:
--

 Summary: ContentType/SchedulerTest.NoOffersWithAllRolesSuppressed 
is flaky.
 Key: MESOS-7996
 URL: https://issues.apache.org/jira/browse/MESOS-7996
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: Observed on Ubuntu 17.04 with SSL enabled
Reporter: Alexander Rukletsov


Observed the failure on internal CI:
{noformat}
../../src/tests/scheduler_tests.cpp:1474
Mock function called more times than expected - returning directly.
Function call: offers(0x7b085d90, @0x7f1a88003590 48-byte object <48-82 
52-9F 1A-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 
00-00 00-00 01-00 00-00 04-00 00-00 20-4D 00-88 1A-7F 00-00>)
 Expected: to be never called
   Actual: called once - over-saturated and active
{noformat}
Full log attached.





[jira] [Commented] (MESOS-6162) Add support for cgroups blkio subsystem blkio statistics.

2017-09-21 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174787#comment-16174787
 ] 

Qian Zhang commented on MESOS-6162:
---

I did more tests of this performance issue with Mesos (rather than just 
manually testing it with {{dd}} as in my previous post). I used {{mesos-execute}} 
to launch a task that runs {{dd}}:
{code}mesos-execute --master=192.168.1.6:5050 --name=test --command="dd 
if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync"{code}
I found that this performance issue will *always* happen as long as the 
combination {{ext4/ext3 with the data=ordered option}} + {{cfq IO scheduler}} is 
met, *no matter whether `cgroups/blkio` isolation is enabled or not*: if that 
combination is met, the task always takes much longer to complete (~16s) than it 
takes when the combination is not met (~1.2s), regardless of whether 
`cgroups/blkio` is enabled.

So it seems this performance issue has nothing to do with `cgroups/blkio`, since 
it happens even when `cgroups/blkio` is not enabled at all. However, a weird 
twist I found is that if the process is assigned to the *root* blkio cgroup, the 
performance issue will *not* happen even when that combination is met:
{code}
# echo $$ > /sys/fs/cgroup/blkio/cgroup.procs 
# dd if=/dev/zero of=test.bin bs=512 count=1000 oflag=dsync 
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 1.19546 s, 428 kB/s    <--- No performance issue.
{code}

So the conclusion is that when the combination is met:
# If the process is not assigned to any blkio cgroup (i.e., `cgroups/blkio` 
isolation is not enabled), the performance issue will happen.
# If the process is assigned to a sub blkio cgroup (i.e., `cgroups/blkio` 
isolation is enabled), the performance issue will happen.
# If the process is assigned to the root blkio cgroup, the performance issue 
will not happen.

I think 1 and 2 will happen in the Mesos context but not 3, since a container 
launched by Mesos will never be assigned to the root blkio cgroup. Originally I 
thought we should add a note about the performance issue to the `cgroups/blkio` 
doc, but now I think that may not be the right place to mention it; instead we 
should add such a note to {{mesos-containerizer.md}} and {{persistent-volume.md}}.


> Add support for cgroups blkio subsystem blkio statistics.
> -
>
> Key: MESOS-6162
> URL: https://issues.apache.org/jira/browse/MESOS-6162
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups, containerization
>Reporter: haosdent
>Assignee: Jason Lai
>  Labels: cgroups, containerizer, mesosphere
> Fix For: 1.4.0
>
>
> Noted that cgroups blkio subsystem may have performance issue, refer to 
> https://github.com/opencontainers/runc/issues/861





[jira] [Commented] (MESOS-7500) Command checks via agent lead to flaky tests.

2017-09-21 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174785#comment-16174785
 ] 

Andrei Budnik commented on MESOS-7500:
--

Another example from a failed run, including debug output 
(https://reviews.apache.org/r/59107):
https://pastebin.com/iKA1WaZB

> Command checks via agent lead to flaky tests.
> -
>
> Key: MESOS-7500
> URL: https://issues.apache.org/jira/browse/MESOS-7500
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rukletsov
>Assignee: Gastón Kleiman
>  Labels: check, flaky-test, health-check, mesosphere
>
> Tests that rely on command checks via agent are flaky on Apache CI. Here is 
> an example from one of the failed run: https://pastebin.com/g2mPgYzu



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-09-21 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174742#comment-16174742
 ] 

Alexander Rukletsov commented on MESOS-7742:


Observed this on internal CI, for both {{application/x-protobuf}} and 
{{application/json}}. Same failure:
{noformat}
../../src/tests/api_tests.cpp:6701
Value of: (response).get().status
  Actual: "500 Internal Server Error"
Expected: http::OK().status
Which is: "200 OK"
{noformat}

> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> --
>
> Key: MESOS-7742
> URL: https://issues.apache.org/jira/browse/MESOS-7742
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Gastón Kleiman
>  Labels: flaky-test, mesosphere-oncall
>
> Observed this on ASF CI. 
> [~gkleiman] mind triaging this?
> {code}
> [ RUN  ] 
> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
> I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0629 05:49:33.182234 25306 master.cpp:436] Master 
> 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 
> 172.17.0.3:45726
> I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" -
> -allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --au
> thenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/a5h5J3/credentials" 
> --framework_sorter="drf" --help="false" --hostn
> ame_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" 
> --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="10
> 00" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="in_memory" --registry_fetch_timeout="1mins" 
> --registry_gc_interval="15mins" --registr
> y_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" -
> -version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs"
> I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/a5h5J3/credentials'
> I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
> I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
> allocator process
> I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
> I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
> I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
> I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
> I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 183040ns
> I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 
> 6441ns; attempting to update the registry
> I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
> registry in 147200ns
> I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered 
> registrar
> I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
> W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: 
> AufsBackend 

[jira] [Commented] (MESOS-7995) libprocess tests breaking on macOS.

2017-09-21 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174656#comment-16174656
 ] 

Jan Schlicht commented on MESOS-7995:
-

Forgot to mention: mine is also an SSL build (--enable-libevent --enable-ssl), 
using libevent 2.0.22, at the latest HEAD (c0293a6f7d457a595a3763662e3a9740db31859b).
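
For anyone trying to reproduce: the failing configuration is an SSL build, 
produced with the configure flags noted above, e.g.:
{code}
../configure --enable-libevent --enable-ssl
make check
{code}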

> libprocess tests breaking on macOS.
> ---
>
> Key: MESOS-7995
> URL: https://issues.apache.org/jira/browse/MESOS-7995
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Affects Versions: 1.5.0
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Many libprocess tests fail on macOS, some even abort.
> Examples:
> {noformat}
> [--] 8 tests from HTTPConnectionTest
> [ RUN  ] HTTPConnectionTest.GzipRequestBody
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
> Failed to wait 15secs for connect
> [  FAILED  ] HTTPConnectionTest.GzipRequestBody (15001 ms)
> [ RUN  ] HTTPConnectionTest.Serial
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1015: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Serial (0 ms)
> [ RUN  ] HTTPConnectionTest.Pipeline
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1094: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Pipeline (1 ms)
> [ RUN  ] HTTPConnectionTest.ClosingRequest
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1190: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingRequest (0 ms)
> [ RUN  ] HTTPConnectionTest.ClosingResponse
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1245: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingResponse (0 ms)
> [ RUN  ] HTTPConnectionTest.ReferenceCounting
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1306: Failure
> (*connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ReferenceCounting (1 ms)
> [ RUN  ] HTTPConnectionTest.Equality
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1333: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Equality (0 ms)
> [ RUN  ] HTTPConnectionTest.RequestStreaming
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1360: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.RequestStreaming (0 ms)
> [--] 8 tests from HTTPConnectionTest (15003 ms total)
> {noformat}
> {noformat}
> [--] 8 tests from HttpAuthenticationTest
> [ RUN  ] HttpAuthenticationTest.NoAuthenticator
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1792: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1786: Failure
> Actual function call count doesn't match EXPECT_CALL(*http.process, 
> authenticated(_, Option::none()))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] HttpAuthenticationTest.NoAuthenticator (1 ms)
> [ RUN  ] HttpAuthenticationTest.Unauthorized
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1816: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0921 12:18:19.947710 2519827264 future.hpp:1151] Check failed: !isFailed() 
> Future::get() but state == FAILED: Failed to connect to 192.168.178.20:51437: 
> Host is down
> *** Check failure stack trace: ***
> *** Aborted at 1505989099 (unix time) try "date -d @1505989099" if you are 
> using GNU date ***
> PC: @ 0x7fff5cd45fce __pthread_kill
> *** SIGABRT (@0x7fff5cd45fce) received by PID 23916 (TID 0x7fff96318340) 
> stack trace: ***
> @ 0x7fff5ce76f5a _sigtramp
> @ 0x7fff5ac5e526 std::__1::locale::facet::__on_zero_shared()
> @ 0x7fff5cca232a abort
> @0x1077b9659 google::logging_fail()
> @0x1077b964a google::LogMessage::Fail()
> @0x1077b72fc google::LogMessage::SendToLog()
> @0x1077b8089 google::LogMessage::Flush()
> @0x1077c12e9 google::LogMessageFatal::~LogMessageFatal()
> @0x1077b9b35 google::LogMessageFatal::~LogMessageFatal()
> @0x106998ad1 process::Future<>::get()
> @0x1069d4d5b HttpAuthenticationTest_Unauthorized_Test::TestBody()
> @0x1070a828e 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>  

[jira] [Commented] (MESOS-7995) libprocess tests breaking on macOS.

2017-09-21 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174651#comment-16174651
 ] 

Benjamin Bannier commented on MESOS-7995:
-

I can only repro this in an SSL build.

> libprocess tests breaking on macOS.
> ---
>
> Key: MESOS-7995
> URL: https://issues.apache.org/jira/browse/MESOS-7995
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Affects Versions: 1.5.0
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Many libprocess tests fail on macOS, some even abort.
> Examples:
> {noformat}
> [--] 8 tests from HTTPConnectionTest
> [ RUN  ] HTTPConnectionTest.GzipRequestBody
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
> Failed to wait 15secs for connect
> [  FAILED  ] HTTPConnectionTest.GzipRequestBody (15001 ms)
> [ RUN  ] HTTPConnectionTest.Serial
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1015: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Serial (0 ms)
> [ RUN  ] HTTPConnectionTest.Pipeline
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1094: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Pipeline (1 ms)
> [ RUN  ] HTTPConnectionTest.ClosingRequest
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1190: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingRequest (0 ms)
> [ RUN  ] HTTPConnectionTest.ClosingResponse
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1245: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingResponse (0 ms)
> [ RUN  ] HTTPConnectionTest.ReferenceCounting
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1306: Failure
> (*connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ReferenceCounting (1 ms)
> [ RUN  ] HTTPConnectionTest.Equality
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1333: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Equality (0 ms)
> [ RUN  ] HTTPConnectionTest.RequestStreaming
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1360: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.RequestStreaming (0 ms)
> [--] 8 tests from HTTPConnectionTest (15003 ms total)
> {noformat}
> {noformat}
> [--] 8 tests from HttpAuthenticationTest
> [ RUN  ] HttpAuthenticationTest.NoAuthenticator
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1792: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1786: Failure
> Actual function call count doesn't match EXPECT_CALL(*http.process, 
> authenticated(_, Option::none()))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] HttpAuthenticationTest.NoAuthenticator (1 ms)
> [ RUN  ] HttpAuthenticationTest.Unauthorized
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1816: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0921 12:18:19.947710 2519827264 future.hpp:1151] Check failed: !isFailed() 
> Future::get() but state == FAILED: Failed to connect to 192.168.178.20:51437: 
> Host is down
> *** Check failure stack trace: ***
> *** Aborted at 1505989099 (unix time) try "date -d @1505989099" if you are 
> using GNU date ***
> PC: @ 0x7fff5cd45fce __pthread_kill
> *** SIGABRT (@0x7fff5cd45fce) received by PID 23916 (TID 0x7fff96318340) 
> stack trace: ***
> @ 0x7fff5ce76f5a _sigtramp
> @ 0x7fff5ac5e526 std::__1::locale::facet::__on_zero_shared()
> @ 0x7fff5cca232a abort
> @0x1077b9659 google::logging_fail()
> @0x1077b964a google::LogMessage::Fail()
> @0x1077b72fc google::LogMessage::SendToLog()
> @0x1077b8089 google::LogMessage::Flush()
> @0x1077c12e9 google::LogMessageFatal::~LogMessageFatal()
> @0x1077b9b35 google::LogMessageFatal::~LogMessageFatal()
> @0x106998ad1 process::Future<>::get()
> @0x1069d4d5b HttpAuthenticationTest_Unauthorized_Test::TestBody()
> @0x1070a828e 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @0x10704a96b 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10704a896 

[jira] [Commented] (MESOS-7995) libprocess tests breaking on macOS.

2017-09-21 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174636#comment-16174636
 ] 

Jan Schlicht commented on MESOS-7995:
-

Is there something specifically different about your environment? I can't 
reproduce this on macOS 10.13 with Apple Clang 9.0.0; all libprocess tests pass.

> libprocess tests breaking on macOS.
> ---
>
> Key: MESOS-7995
> URL: https://issues.apache.org/jira/browse/MESOS-7995
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Affects Versions: 1.5.0
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Many libprocess tests fail on macOS, some even abort.
> Examples:
> {noformat}
> [--] 8 tests from HTTPConnectionTest
> [ RUN  ] HTTPConnectionTest.GzipRequestBody
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
> Failed to wait 15secs for connect
> [  FAILED  ] HTTPConnectionTest.GzipRequestBody (15001 ms)
> [ RUN  ] HTTPConnectionTest.Serial
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1015: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Serial (0 ms)
> [ RUN  ] HTTPConnectionTest.Pipeline
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1094: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Pipeline (1 ms)
> [ RUN  ] HTTPConnectionTest.ClosingRequest
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1190: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingRequest (0 ms)
> [ RUN  ] HTTPConnectionTest.ClosingResponse
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1245: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingResponse (0 ms)
> [ RUN  ] HTTPConnectionTest.ReferenceCounting
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1306: Failure
> (*connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ReferenceCounting (1 ms)
> [ RUN  ] HTTPConnectionTest.Equality
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1333: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Equality (0 ms)
> [ RUN  ] HTTPConnectionTest.RequestStreaming
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1360: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.RequestStreaming (0 ms)
> [--] 8 tests from HTTPConnectionTest (15003 ms total)
> {noformat}
> {noformat}
> [--] 8 tests from HttpAuthenticationTest
> [ RUN  ] HttpAuthenticationTest.NoAuthenticator
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1792: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1786: Failure
> Actual function call count doesn't match EXPECT_CALL(*http.process, 
> authenticated(_, Option::none()))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> [  FAILED  ] HttpAuthenticationTest.NoAuthenticator (1 ms)
> [ RUN  ] HttpAuthenticationTest.Unauthorized
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1816: Failure
> (response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0921 12:18:19.947710 2519827264 future.hpp:1151] Check failed: !isFailed() 
> Future::get() but state == FAILED: Failed to connect to 192.168.178.20:51437: 
> Host is down
> *** Check failure stack trace: ***
> *** Aborted at 1505989099 (unix time) try "date -d @1505989099" if you are 
> using GNU date ***
> PC: @ 0x7fff5cd45fce __pthread_kill
> *** SIGABRT (@0x7fff5cd45fce) received by PID 23916 (TID 0x7fff96318340) 
> stack trace: ***
> @ 0x7fff5ce76f5a _sigtramp
> @ 0x7fff5ac5e526 std::__1::locale::facet::__on_zero_shared()
> @ 0x7fff5cca232a abort
> @0x1077b9659 google::logging_fail()
> @0x1077b964a google::LogMessage::Fail()
> @0x1077b72fc google::LogMessage::SendToLog()
> @0x1077b8089 google::LogMessage::Flush()
> @0x1077c12e9 google::LogMessageFatal::~LogMessageFatal()
> @0x1077b9b35 google::LogMessageFatal::~LogMessageFatal()
> @0x106998ad1 process::Future<>::get()
> @0x1069d4d5b HttpAuthenticationTest_Unauthorized_Test::TestBody()
> @0x1070a828e 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @   

[jira] [Commented] (MESOS-7975) The command/default executor can incorrectly send a TASK_FINISHED update even when the task is killed

2017-09-21 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174611#comment-16174611
 ] 

Alexander Rukletsov commented on MESOS-7975:


[~qianzhang] I think we should send an email to the lists. I understand that 
this might seem like a lot of work for "an easy fix", but it is an important 
behavioral change even though it requires only a small code change.
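
For reference, the fix under discussion is small; a minimal sketch of the 
reordered conditional (the actual patch may differ) is to test {{killed}} 
before inspecting the exit status:
{code}
  if (killed) {
    // A kill was requested: report TASK_KILLED even if the task handled
    // the signal gracefully and exited with a zero status code.
    taskState = TASK_KILLED;
  } else if (WSUCCEEDED(status)) {
    taskState = TASK_FINISHED;
  } else {
    taskState = TASK_FAILED;
  }
{code}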

> The command/default executor can incorrectly send a TASK_FINISHED update even 
> when the task is killed
> -
>
> Key: MESOS-7975
> URL: https://issues.apache.org/jira/browse/MESOS-7975
> Project: Mesos
>  Issue Type: Bug
>Reporter: Anand Mazumdar
>Assignee: Qian Zhang
>Priority: Critical
>  Labels: mesosphere
>
> Currently, when a task is killed, the default and the command executor 
> incorrectly send a {{TASK_FINISHED}} status update instead of 
> {{TASK_KILLED}}. This is due to an unfortunate missed conditional check when 
> the task exits with a zero status code.
> {code}
>   if (WSUCCEEDED(status)) {
> taskState = TASK_FINISHED;
>   } else if (killed) {
> // Send TASK_KILLED if the task was killed as a result of
> // kill() or shutdown().
> taskState = TASK_KILLED;
>   } else {
> taskState = TASK_FAILED;
>   }
> {code}
> We should modify the code to correctly send {{TASK_KILLED}} status updates 
> when a task is killed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7995) libprocess tests breaking on macOS.

2017-09-21 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174534#comment-16174534
 ] 

Till Toenshoff commented on MESOS-7995:
---

Example with extended logging (GLOG_v=2):

{noformat}
[--] 8 tests from HTTPConnectionTest
[ RUN  ] HTTPConnectionTest.GzipRequestBody
I0921 12:25:35.704711 115154944 process.cpp:3245] Resuming 
__latch__(39)@192.168.178.20:51793 at 2017-09-25 22:25:35.705722944+00:00
I0921 12:25:35.704720 116764672 process.cpp:3245] Resuming 
help@192.168.178.20:51793 at 2017-09-25 22:25:35.705730112+00:00
I0921 12:25:35.704736 115154944 process.cpp:3383] Cleaning up 
__latch__(39)@192.168.178.20:51793
I0921 12:25:35.704778 2519827264 process.cpp:3235] Spawned process 
(1)@192.168.178.20:51793
I0921 12:25:35.704787 114081792 process.cpp:3245] Resuming 
(1)@192.168.178.20:51793 at 2017-09-25 22:25:35.705836096+00:00
I0921 12:25:35.704764 114618368 process.cpp:3245] Resuming 
help@192.168.178.20:51793 at 2017-09-25 22:25:35.705777984+00:00
I0921 12:25:35.705045 2519827264 process.cpp:3235] Spawned process 
__latch__(40)@192.168.178.20:51793
I0921 12:25:35.705051 116228096 process.cpp:3245] Resuming 
__latch__(40)@192.168.178.20:51793 at 2017-09-25 22:25:35.706059072+00:00
I0921 12:25:35.705068 116228096 process.cpp:3383] Cleaning up 
__latch__(40)@192.168.178.20:51793
I0921 12:25:35.705090 115691520 process.cpp:3245] Resuming 
help@192.168.178.20:51793 at 2017-09-25 22:25:35.706096960+00:00
../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
(connect).failure(): Failed to connect to 192.168.178.20:51793: Host is down
I0921 12:25:35.705114 2519827264 process.cpp:3555] Donating thread to 
(1)@192.168.178.20:51793 while waiting
I0921 12:25:35.705135 2519827264 process.cpp:3245] Resuming 
(1)@192.168.178.20:51793 at 2017-09-25 22:25:35.706139968+00:00
I0921 12:25:35.705147 2519827264 process.cpp:3383] Cleaning up 
(1)@192.168.178.20:51793
I0921 12:25:35.705168 113008640 process.cpp:3245] Resuming 
help@192.168.178.20:51793 at 2017-09-25 22:25:35.706178112+00:00
[  FAILED  ] HTTPConnectionTest.GzipRequestBody (1 ms)
{noformat}

> libprocess tests breaking on macOS.
> ---
>
> Key: MESOS-7995
> URL: https://issues.apache.org/jira/browse/MESOS-7995
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
>Affects Versions: 1.5.0
>Reporter: Till Toenshoff
>Priority: Blocker
>
> Many libprocess tests fail on macOS, some even abort.
> Examples:
> {noformat}
> [--] 8 tests from HTTPConnectionTest
> [ RUN  ] HTTPConnectionTest.GzipRequestBody
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
> Failed to wait 15secs for connect
> [  FAILED  ] HTTPConnectionTest.GzipRequestBody (15001 ms)
> [ RUN  ] HTTPConnectionTest.Serial
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1015: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Serial (0 ms)
> [ RUN  ] HTTPConnectionTest.Pipeline
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1094: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Pipeline (1 ms)
> [ RUN  ] HTTPConnectionTest.ClosingRequest
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1190: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingRequest (0 ms)
> [ RUN  ] HTTPConnectionTest.ClosingResponse
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1245: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ClosingResponse (0 ms)
> [ RUN  ] HTTPConnectionTest.ReferenceCounting
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1306: Failure
> (*connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.ReferenceCounting (1 ms)
> [ RUN  ] HTTPConnectionTest.Equality
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1333: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.Equality (0 ms)
> [ RUN  ] HTTPConnectionTest.RequestStreaming
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1360: Failure
> (connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
> [  FAILED  ] HTTPConnectionTest.RequestStreaming (0 ms)
> [--] 8 tests from HTTPConnectionTest (15003 ms total)
> {noformat}
> {noformat}
> [--] 8 tests from HttpAuthenticationTest
> [ RUN  ] HttpAuthenticationTest.NoAuthenticator
> ../../../3rdparty/libprocess/src/tests/http_tests.cpp:1792: Failure
> (response).failure(): Failed to connect to 

[jira] [Created] (MESOS-7995) libprocess tests breaking on macOS.

2017-09-21 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-7995:
-

 Summary: libprocess tests breaking on macOS.
 Key: MESOS-7995
 URL: https://issues.apache.org/jira/browse/MESOS-7995
 Project: Mesos
  Issue Type: Bug
  Components: libprocess, test
Affects Versions: 1.5.0
Reporter: Till Toenshoff
Priority: Blocker


Many libprocess tests fail on macOS, some even abort.

Examples:
{noformat}
[--] 8 tests from HTTPConnectionTest
[ RUN  ] HTTPConnectionTest.GzipRequestBody
../../../3rdparty/libprocess/src/tests/http_tests.cpp:972: Failure
Failed to wait 15secs for connect
[  FAILED  ] HTTPConnectionTest.GzipRequestBody (15001 ms)
[ RUN  ] HTTPConnectionTest.Serial
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1015: Failure
(connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
[  FAILED  ] HTTPConnectionTest.Serial (0 ms)
[ RUN  ] HTTPConnectionTest.Pipeline
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1094: Failure
(connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
[  FAILED  ] HTTPConnectionTest.Pipeline (1 ms)
[ RUN  ] HTTPConnectionTest.ClosingRequest
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1190: Failure
(connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
[  FAILED  ] HTTPConnectionTest.ClosingRequest (0 ms)
[ RUN  ] HTTPConnectionTest.ClosingResponse
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1245: Failure
(connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
[  FAILED  ] HTTPConnectionTest.ClosingResponse (0 ms)
[ RUN  ] HTTPConnectionTest.ReferenceCounting
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1306: Failure
(*connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
[  FAILED  ] HTTPConnectionTest.ReferenceCounting (1 ms)
[ RUN  ] HTTPConnectionTest.Equality
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1333: Failure
(connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
[  FAILED  ] HTTPConnectionTest.Equality (0 ms)
[ RUN  ] HTTPConnectionTest.RequestStreaming
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1360: Failure
(connect).failure(): Failed to connect to 192.168.178.20:51437: Host is down
[  FAILED  ] HTTPConnectionTest.RequestStreaming (0 ms)
[--] 8 tests from HTTPConnectionTest (15003 ms total)
{noformat}


{noformat}
[--] 8 tests from HttpAuthenticationTest
[ RUN  ] HttpAuthenticationTest.NoAuthenticator
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1792: Failure
(response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1786: Failure
Actual function call count doesn't match EXPECT_CALL(*http.process, 
authenticated(_, Option::none()))...
 Expected: to be called once
   Actual: never called - unsatisfied and active
[  FAILED  ] HttpAuthenticationTest.NoAuthenticator (1 ms)
[ RUN  ] HttpAuthenticationTest.Unauthorized
../../../3rdparty/libprocess/src/tests/http_tests.cpp:1816: Failure
(response).failure(): Failed to connect to 192.168.178.20:51437: Host is down
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0921 12:18:19.947710 2519827264 future.hpp:1151] Check failed: !isFailed() 
Future::get() but state == FAILED: Failed to connect to 192.168.178.20:51437: 
Host is down
*** Check failure stack trace: ***
*** Aborted at 1505989099 (unix time) try "date -d @1505989099" if you are 
using GNU date ***
PC: @ 0x7fff5cd45fce __pthread_kill
*** SIGABRT (@0x7fff5cd45fce) received by PID 23916 (TID 0x7fff96318340) stack 
trace: ***
@ 0x7fff5ce76f5a _sigtramp
@ 0x7fff5ac5e526 std::__1::locale::facet::__on_zero_shared()
@ 0x7fff5cca232a abort
@0x1077b9659 google::logging_fail()
@0x1077b964a google::LogMessage::Fail()
@0x1077b72fc google::LogMessage::SendToLog()
@0x1077b8089 google::LogMessage::Flush()
@0x1077c12e9 google::LogMessageFatal::~LogMessageFatal()
@0x1077b9b35 google::LogMessageFatal::~LogMessageFatal()
@0x106998ad1 process::Future<>::get()
@0x1069d4d5b HttpAuthenticationTest_Unauthorized_Test::TestBody()
@0x1070a828e 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@0x10704a96b 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x10704a896 testing::Test::Run()
@0x10704c60d testing::TestInfo::Run()
@0x10704dc0c testing::TestCase::Run()
@0x10705e14c testing::internal::UnitTestImpl::RunAllTests()
@0x1070ac2fe 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@0x10705db7b 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@

[jira] [Updated] (MESOS-7994) Hard-coded protobuf version in mesos.pom.in

2017-09-21 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-7994:

Component/s: java api

> Hard-coded protobuf version in mesos.pom.in
> ---
>
> Key: MESOS-7994
> URL: https://issues.apache.org/jira/browse/MESOS-7994
> Project: Mesos
>  Issue Type: Bug
>  Components: java api
>Reporter: Benno Evers
>
> Currently, the version of protobuf.jar used by Maven is hard-coded to 3.3.0 
> in `src/java/mesos.pom.in`.
> When building against a non-bundled version of protobuf, this will likely 
> cause a version mismatch that can lead to build errors, because the Java 
> build tries to compile the Java source files generated by the protoc of the 
> non-bundled protobuf.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-7994) Hard-coded protobuf version in mesos.pom.in

2017-09-21 Thread Benno Evers (JIRA)
Benno Evers created MESOS-7994:
--

 Summary: Hard-coded protobuf version in mesos.pom.in
 Key: MESOS-7994
 URL: https://issues.apache.org/jira/browse/MESOS-7994
 Project: Mesos
  Issue Type: Bug
Reporter: Benno Evers


Currently, the version of protobuf.jar used by Maven is hard-coded to 3.3.0 in 
`src/java/mesos.pom.in`.

When building against a non-bundled version of protobuf, this will likely cause 
a version mismatch that can lead to build errors, because the Java build tries 
to compile the Java source files generated by the protoc of the non-bundled 
protobuf.
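
A possible direction (a hypothetical sketch; the substitution variable and the 
surrounding POM structure are illustrative, not the actual template contents) 
would be to let the build system substitute the detected protobuf version into 
the template instead of hardcoding it:
{code}
<!-- src/java/mesos.pom.in (hypothetical): substitute the protobuf version
     detected at configure time instead of hardcoding 3.3.0. -->
<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>@PROTOBUF_VERSION@</version>
</dependency>
{code}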



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)