[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2018-01-22 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334227#comment-16334227
 ] 

Andrei Budnik commented on MESOS-7742:
--

https://reviews.apache.org/r/65261/

I think this patch provides a better solution than retrying to 
[connect|https://github.com/apache/mesos/blob/336e932199643e88c0edbea7c1f08d4b45596389/src/slave/containerizer/mesos/io/switchboard.cpp#L696-L700],
because otherwise it's needed to:
# Use one more `loop` for retrying logic
# Define the limit of retry attempts and delay between attempts
# It might retry to connect due to some non-ECONNREFUSED error

> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> --
>
> Key: MESOS-7742
> URL: https://issues.apache.org/jira/browse/MESOS-7742
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.5.0
>Reporter: Vinod Kone
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: flaky-test, mesosphere-oncall
> Fix For: 1.6.0
>
> Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt, 
> LaunchNestedContainerSessionDisconnected-badrun.txt
>
>
> Observed this on ASF CI and internal Mesosphere CI. Affected tests:
> {noformat}
> AgentAPIStreamingTest.AttachInputToNestedContainerSession
> AgentAPITest.LaunchNestedContainerSession
> AgentAPITest.AttachContainerInputAuthorization/0
> AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> {noformat}
> This issue comes at least in three different flavours. Take 
> {{AgentAPIStreamingTest.AttachInputToNestedContainerSession}} as an example.
> h5. Flavour 1
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "503 Service Unavailable"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}
> h5. Flavour 2
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: "Disconnected"
> {noformat}
> h5. Flavour 3
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/api_tests.cpp:6367
> Value of: (sessionResponse).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2018-01-18 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16330630#comment-16330630
 ] 

Andrei Budnik commented on MESOS-7742:
--

Steps to reproduce second cause:
1. Add a {{::sleep(2);}} after [binding unix 
socket|https://github.com/apache/mesos/blob/634c8af2618c57a1405d20717fa909b399486f37/src/slave/containerizer/mesos/io/switchboard.cpp#L1056].
2. Recompile `make && make check`.
3. Launch a test:
{code:}
GLOG_v=2 sudo GLOG_v=2 ./src/mesos-tests 
--gtest_filter=ContentType/AgentAPITest.LaunchNestedContainerSession/0 
--gtest_break_on_failure --gtest_repeat=1 --verbose
{code}


> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> --
>
> Key: MESOS-7742
> URL: https://issues.apache.org/jira/browse/MESOS-7742
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.5.0
>Reporter: Vinod Kone
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: flaky-test, mesosphere-oncall
> Fix For: 1.6.0
>
> Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt, 
> LaunchNestedContainerSessionDisconnected-badrun.txt
>
>
> Observed this on ASF CI and internal Mesosphere CI. Affected tests:
> {noformat}
> AgentAPIStreamingTest.AttachInputToNestedContainerSession
> AgentAPITest.LaunchNestedContainerSession
> AgentAPITest.AttachContainerInputAuthorization/0
> AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> {noformat}
> This issue comes at least in three different flavours. Take 
> {{AgentAPIStreamingTest.AttachInputToNestedContainerSession}} as an example.
> h5. Flavour 1
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "503 Service Unavailable"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}
> h5. Flavour 2
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: "Disconnected"
> {noformat}
> h5. Flavour 3
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/api_tests.cpp:6367
> Value of: (sessionResponse).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2018-01-17 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329006#comment-16329006
 ] 

Andrei Budnik commented on MESOS-7742:
--

These patches ^^ are fixing the first cause described in the [first 
patch|https://reviews.apache.org/r/65122/].

There is a second cause when an attempt to connect to IO-Switchboard fails with:
{code:java}
I1109 23:47:25.016929 27803 process.cpp:3982] Failed to process request for 
'/slave(812)/api/v1': Failed to connect to 
/tmp/mesos-io-switchboard-56bcba4b-6e81-4aeb-a0e9-41309ec991b5: Connection 
refused
W1109 23:47:25.017009 27803 http.cpp:2944] Failed to attach to nested container 
7ab572dd-78b5-4186-93af-7ac011990f80.b77944da-f1d5-4694-a51b-8fde150c5f7a: 
Failed to connect to 
/tmp/mesos-io-switchboard-56bcba4b-6e81-4aeb-a0e9-41309ec991b5: Connection 
refused
I1109 23:47:25.017063 27803 process.cpp:1590] Returning '500 Internal Server 
Error' for '/slave(812)/api/v1' (Failed to connect to 
/tmp/mesos-io-switchboard-56bcba4b-6e81-4aeb-a0e9-41309ec991b5: Connection 
refused)
{code}
The reason for this failure needs to be investigated.

> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> --
>
> Key: MESOS-7742
> URL: https://issues.apache.org/jira/browse/MESOS-7742
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.5.0
>Reporter: Vinod Kone
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: flaky-test, mesosphere-oncall
> Fix For: 1.6.0
>
> Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt, 
> LaunchNestedContainerSessionDisconnected-badrun.txt
>
>
> Observed this on ASF CI and internal Mesosphere CI. Affected tests:
> {noformat}
> AgentAPIStreamingTest.AttachInputToNestedContainerSession
> AgentAPITest.LaunchNestedContainerSession
> AgentAPITest.AttachContainerInputAuthorization/0
> AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> {noformat}
> This issue comes at least in three different flavours. Take 
> {{AgentAPIStreamingTest.AttachInputToNestedContainerSession}} as an example.
> h5. Flavour 1
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "503 Service Unavailable"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}
> h5. Flavour 2
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: "Disconnected"
> {noformat}
> h5. Flavour 3
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/api_tests.cpp:6367
> Value of: (sessionResponse).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2018-01-09 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318666#comment-16318666
 ] 

Andrei Budnik commented on MESOS-7742:
--

As we have launched 
[`cat`|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/src/tests/api_tests.cpp#L6529]
 command as a nested container, related ioswitchboard process will be in the 
same process group. Whenever a process group leader ({{cat}}) terminates, all 
processes in the process group are killed, including ioswitchboard.
ioswitchboard handles HTTP requests from the slave, e.g. 
{{ATTACH_CONTAINER_INPUT}} request in this test.
Usually, after reading all client's data, {{Http::_attachContainerInput()}} 
invokes a callback which calls 
[writer.close()|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/src/slave/http.cpp#L3223].
[writer.close()|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L561]
 implies sending a 
[\r\n\r\n|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L1045]
 to the ioswitchboard process.
ioswitchboard returns [200 
OK|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/src/slave/containerizer/mesos/io/switchboard.cpp#L1572]
 response, hence agent returns {{200 OK}} for {{ATTACH_CONTAINER_INPUT}} 
request as expected.

However, if ioswitchboard terminates before it receives {{\r\n\r\n}} or before 
agent receives {{200 OK}} response from the ioswitchboard, connection (via unix 
socket) might be closed, so corresponding {{ConnectionProcess}} will handle 
this case as an unexpected [EOF| 
https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L1293
 
https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L1293]
 during 
[read|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L1216]
 of a response. That will lead to {{500 Internal Server Error}} response from 
the agent.

> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> --
>
> Key: MESOS-7742
> URL: https://issues.apache.org/jira/browse/MESOS-7742
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Andrei Budnik
>  Labels: flaky-test, mesosphere-oncall
> Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt, 
> LaunchNestedContainerSessionDisconnected-badrun.txt
>
>
> Observed this on ASF CI and internal Mesosphere CI. Affected tests:
> {noformat}
> AgentAPIStreamingTest.AttachInputToNestedContainerSession
> AgentAPITest.LaunchNestedContainerSession
> AgentAPITest.AttachContainerInputAuthorization/0
> AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> {noformat}
> This issue comes at least in three different flavours. Take 
> {{AgentAPIStreamingTest.AttachInputToNestedContainerSession}} as an example.
> h5. Flavour 1
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "503 Service Unavailable"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}
> h5. Flavour 2
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: "Disconnected"
> {noformat}
> h5. Flavour 3
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/api_tests.cpp:6367
> Value of: (sessionResponse).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2018-01-09 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318567#comment-16318567
 ] 

Andrei Budnik commented on MESOS-7742:
--

How to reproduce Flavour 3:
Put a {{::sleep(1);}} before {{writer.close();}} in 
[Http::_attachContainerInput()|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/src/slave/http.cpp#L3222].

> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> --
>
> Key: MESOS-7742
> URL: https://issues.apache.org/jira/browse/MESOS-7742
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Andrei Budnik
>  Labels: flaky-test, mesosphere-oncall
> Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt, 
> LaunchNestedContainerSessionDisconnected-badrun.txt
>
>
> Observed this on ASF CI and internal Mesosphere CI. Affected tests:
> {noformat}
> AgentAPIStreamingTest.AttachInputToNestedContainerSession
> AgentAPITest.LaunchNestedContainerSession
> AgentAPITest.AttachContainerInputAuthorization/0
> AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> {noformat}
> This issue comes at least in three different flavours. Take 
> {{AgentAPIStreamingTest.AttachInputToNestedContainerSession}} as an example.
> h5. Flavour 1
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "503 Service Unavailable"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}
> h5. Flavour 2
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: "Disconnected"
> {noformat}
> h5. Flavour 3
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/api_tests.cpp:6367
> Value of: (sessionResponse).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: ""
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-10-02 Thread Till Toenshoff (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188081#comment-16188081
 ] 

Till Toenshoff commented on MESOS-7742:
---

Observed this as well on internal CI:
{noformat}
../../src/tests/api_tests.cpp:6951
Value of: (response).get().status
  Actual: "500 Internal Server Error"
Expected: http::OK().status
Which is: "200 OK"
{noformat}

{noformat}
00:50:40  [ RUN  ] 
ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/1
00:50:40  I0930 00:50:40.193588  7413 cluster.cpp:162] Creating default 'local' 
authorizer
00:50:40  I0930 00:50:40.194614 26521 master.cpp:445] Master 
6d4d319b-ce27-402c-91d2-087edb6a4a11 (ip-172-16-10-96.ec2.internal) started on 
172.16.10.96:38662
00:50:40  I0930 00:50:40.194630 26521 master.cpp:447] Flags at startup: 
--acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/wdBG06/credentials" 
--filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --http_authenticators="basic" 
--http_framework_authenticators="basic" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
--max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/wdBG06/master" 
--zk_session_timeout="10secs"
00:50:40  I0930 00:50:40.194720 26521 master.cpp:497] Master only allowing 
authenticated frameworks to register
00:50:40  I0930 00:50:40.194723 26521 master.cpp:511] Master only allowing 
authenticated agents to register
00:50:40  I0930 00:50:40.194725 26521 master.cpp:524] Master only allowing 
authenticated HTTP frameworks to register
00:50:40  I0930 00:50:40.194730 26521 credentials.hpp:37] Loading credentials 
for authentication from '/tmp/wdBG06/credentials'
00:50:40  I0930 00:50:40.194808 26521 master.cpp:569] Using default 'crammd5' 
authenticator
00:50:40  I0930 00:50:40.194844 26521 http.cpp:1045] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-readonly'
00:50:40  I0930 00:50:40.194876 26521 http.cpp:1045] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-readwrite'
00:50:40  I0930 00:50:40.194905 26521 http.cpp:1045] Creating default 'basic' 
HTTP authenticator for realm 'mesos-master-scheduler'
00:50:40  I0930 00:50:40.194932 26521 master.cpp:649] Authorization enabled
00:50:40  I0930 00:50:40.194973 26516 hierarchical.cpp:171] Initialized 
hierarchical allocator process
00:50:40  I0930 00:50:40.195008 26516 whitelist_watcher.cpp:77] No whitelist 
given
00:50:40  I0930 00:50:40.195634 26523 master.cpp:2216] Elected as the leading 
master!
00:50:40  I0930 00:50:40.195659 26523 master.cpp:1705] Recovering from registrar
00:50:40  I0930 00:50:40.195701 26523 registrar.cpp:347] Recovering registrar
00:50:40  I0930 00:50:40.195863 26521 registrar.cpp:391] Successfully fetched 
the registry (0B) in 144128ns
00:50:40  I0930 00:50:40.195896 26521 registrar.cpp:495] Applied 1 operations 
in 6568ns; attempting to update the registry
00:50:40  I0930 00:50:40.196048 26519 registrar.cpp:552] Successfully updated 
the registry in 119808ns
00:50:40  I0930 00:50:40.196079 26519 registrar.cpp:424] Successfully recovered 
registrar
00:50:40  I0930 00:50:40.196159 26520 master.cpp:1809] Recovered 0 agents from 
the registry (168B); allowing 10mins for agents to re-register
00:50:40  I0930 00:50:40.196218 26518 hierarchical.cpp:209] Skipping recovery 
of hierarchical allocator: nothing to recover
00:50:40  I0930 00:50:40.197204  7413 containerizer.cpp:292] Using isolation { 
environment_secret, posix/cpu, posix/mem, filesystem/posix, network/cni }
00:50:40  I0930 00:50:40.202510  7413 linux_launcher.cpp:146] Using 
/sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
00:50:40  I0930 00:50:40.202874  7413 provisioner.cpp:255] Using default 
backend 'overlay'
00:50:40  W0930 00:50:40.204174  7413 process.cpp:3194] Attempted to spawn 
already running process files@172.16.10.96:38662
00:50:40  I0930 00:50:40.204308  7413 cluster.cpp:448] Creating default 'local' 
authorizer
00:50:40  I0930 00:50:40.204797 26523 

[jira] [Commented] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-09-21 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174742#comment-16174742
 ] 

Alexander Rukletsov commented on MESOS-7742:


Observed this on internal CI, for both {{application/x-protobuf}} and 
{{application/json}}. Same failure:
{noformat}
../../src/tests/api_tests.cpp:6701
Value of: (response).get().status
  Actual: "500 Internal Server Error"
Expected: http::OK().status
Which is: "200 OK"
{noformat}

> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> --
>
> Key: MESOS-7742
> URL: https://issues.apache.org/jira/browse/MESOS-7742
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Gastón Kleiman
>  Labels: flaky-test, mesosphere-oncall
>
> Observed this on ASF CI. 
> [~gkleiman] mind triaging this?
> {code}
> [ RUN  ] 
> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
> I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0629 05:49:33.182234 25306 master.cpp:436] Master 
> 90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 
> 172.17.0.3:45726
> I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" -
> -allocator="HierarchicalDRF" --authenticate_agents="true" 
> --authenticate_frameworks="true" --authenticate_http_frameworks="true" 
> --authenticate_http_readonly="true" --au
> thenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/a5h5J3/credentials" 
> --framework_sorter="drf" --help="false" --hostn
> ame_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" 
> --logging_level="INFO" --max_agent_ping_timeouts="5" 
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="10
> 00" --port="5050" --quiet="false" --recovery_agent_removal_limit="100%" 
> --registry="in_memory" --registry_fetch_timeout="1mins" 
> --registry_gc_interval="15mins" --registr
> y_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" -
> -version="false" --webui_dir="/usr/local/share/mesos/webui" 
> --work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs"
> I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing 
> authenticated frameworks to register
> I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing 
> authenticated agents to register
> I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/a5h5J3/credentials'
> I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
> authenticator
> I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
> I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
> allocator process
> I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
> I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
> I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
> I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
> I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 183040ns
> I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 
> 6441ns; attempting to update the registry
> I0629 05:49:33.184885 25304 registrar.cpp:550] Successfully updated the 
> registry in 147200ns
> I0629 05:49:33.184993 25304 registrar.cpp:422] Successfully recovered 
> registrar
> I0629 05:49:33.185148 25308 master.cpp:1799] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0629 05:49:33.185161 25302 hierarchical.cpp:185] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0629 05:49:33.186769 25301 containerizer.cpp:221] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni
> W0629 05:49:33.187232 25301 backend.cpp:76] Failed to create 'aufs' backend: 
> AufsBackend