[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-09-21 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619417#comment-16619417
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 9/21/18 1:01 PM:
-

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:14 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control messages for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and io redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
{noformat}
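The termination rule described in this commit message can be sketched as a small state machine. This is only a toy Python model, not the Mesos C++ code: the class name `SwitchboardModel` and its methods are hypothetical, and the only behaviour taken from the commit message is that termination requires both the acknowledgment and the end of the IO redirects.

```python
class SwitchboardModel:
    """Toy model (hypothetical names): terminate only when the IO redirects
    have finished AND the agent's acknowledgment has been received."""

    def __init__(self):
        self.redirects_finished = False
        self.ack_received = False
        self.terminated = False

    def on_redirects_finished(self):
        # Called when the IO redirects are done.
        self.redirects_finished = True
        self._maybe_terminate()

    def on_ack(self):
        # Called when the acknowledgment control message arrives.
        self.ack_received = True
        self._maybe_terminate()

    def _maybe_terminate(self):
        # Neither event alone is enough; both conditions must hold.
        if self.redirects_finished and self.ack_received:
            self.terminated = True
```

In this model the order of the two events does not matter: whichever arrives second triggers termination, so the final response cannot be dropped by an early exit.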
{noformat}
commit bfa2bd24780b5c49467b3c23260855e3d8b4c948
Author: Andrei Budnik 
AuthorDate: Fri Sep 21 14:51:24 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Fri Sep 21 14:51:24 2018 +0200

Fixed disconnection while sending acknowledgment to IOSwitchboard.

Previously, an HTTP connection to the IOSwitchboard could be garbage
collected before the agent sent an acknowledgment to the IOSwitchboard
via this connection. This patch fixes the issue by keeping a reference
count to the connection in a lambda callback until disconnection
occurs.

Review: https://reviews.apache.org/r/68768/
{noformat}
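The idea of keeping a connection alive by holding a reference in a callback until disconnection can be illustrated with a rough Python analogue. This is an assumption-laden sketch, not the libprocess code: `Connection`, `DisconnectEvent` and `open_connection` are hypothetical names, and the deterministic release relies on CPython reference counting.

```python
import weakref


class Connection:
    """Stand-in for a reference-counted connection handle (hypothetical)."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


class DisconnectEvent:
    """Owned by the I/O layer; holds callbacks until the event fires."""

    def __init__(self):
        self._callbacks = []

    def on_fire(self, callback):
        self._callbacks.append(callback)

    def fire(self):
        # Detach the callbacks first so they are released after being run.
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback()


def open_connection(event):
    conn = Connection()

    def _on_disconnect(keep_alive=conn):
        # The default argument holds a strong reference to `conn`,
        # so the connection lives as long as this callback is registered.
        pass

    event.on_fire(_on_disconnect)
    # The caller only gets a weak reference; the callback is the sole owner.
    return weakref.ref(conn)
```

While the callback is registered, the weak reference still resolves and the acknowledgment can be sent; once `fire()` runs, the callback is dropped and the connection is released.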
{noformat}
commit c3c77cbef818d497d8bd5e67fa72e55a7190e27a
Author: Andrei Budnik 
AuthorDate: Fri Sep 21 14:51:59 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Fri Sep 21 14:51:59 2018 +0200

Fixed broken pipe error in IOSwitchboard.

Previous attempt to fix `HTTP 500` "broken pipe" in review /r/62187/
was not correct: after IOSwitchboard sends a response to the agent for
the `ATTACH_CONTAINER_INPUT` call, the socket is closed immediately,
thus causing the error on the agent. This patch adds a delay after
IO redirects are finished and before IOSwitchboard forcibly send a
response.

Review: https://reviews.apache.org/r/68784/
{noformat}
*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
commit 021a8f4de1ad65167946548e3ecfa74d8e41e9c5
commit 38a914398b6f1aaf08db4f62f4e42cdb80127eb5
{noformat}
*{{1.6.2}}*:
{noformat}
commit 2ddd6f07bebbe91e1e0d5165c4a5ae552b836303
commit c1448f36d4c2c2c8345e7e8d1bf1f206dba18dac
commit 55b0e94f0c8a1896ca079361d89527123faf22c6
commit c40c92b7710b5b238b13ce6f1bacd3d75e04283b
{noformat}
*{{1.5.2}}*:
{noformat}
commit 3bf4fe22e0ed828a36d5b2ea652d07c6eef4b578
commit 33a6bec95b44592d626874ae8deaa3e2a3bbc120
commit 7b8195680104c2c5f61073a956f60ac961c37f45
commit 0216002744517a6053fd782b6b4dc3d6cf77dd5e
{noformat}



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-09-21 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619417#comment-16619417
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 9/21/18 12:56 PM:
--

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:14 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control messages for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and io redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
{noformat}
{noformat}
commit bfa2bd24780b5c49467b3c23260855e3d8b4c948
Author: Andrei Budnik 
AuthorDate: Fri Sep 21 14:51:24 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Fri Sep 21 14:51:24 2018 +0200

Fixed disconnection while sending acknowledgment to IOSwitchboard.

Previously, an HTTP connection to the IOSwitchboard could be garbage
collected before the agent sent an acknowledgment to the IOSwitchboard
via this connection. This patch fixes the issue by keeping a reference
count to the connection in a lambda callback until disconnection
occurs.

Review: https://reviews.apache.org/r/68768/
{noformat}
{noformat}
commit c3c77cbef818d497d8bd5e67fa72e55a7190e27a
Author: Andrei Budnik 
AuthorDate: Fri Sep 21 14:51:59 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Fri Sep 21 14:51:59 2018 +0200

Fixed broken pipe error in IOSwitchboard.

Previous attempt to fix `HTTP 500` "broken pipe" in review /r/62187/
was not correct: after IOSwitchboard sends a response to the agent for
the `ATTACH_CONTAINER_INPUT` call, the socket is closed immediately,
thus causing the error on the agent. This patch adds a delay after
IO redirects are finished and before IOSwitchboard forcibly send a
response.

Review: https://reviews.apache.org/r/68784/
{noformat}
*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
{noformat}
*{{1.6.2}}*:
{noformat}
commit 2ddd6f07bebbe91e1e0d5165c4a5ae552b836303
commit c1448f36d4c2c2c8345e7e8d1bf1f206dba18dac
{noformat}
*{{1.5.2}}*:
{noformat}
commit 3bf4fe22e0ed828a36d5b2ea652d07c6eef4b578
commit 33a6bec95b44592d626874ae8deaa3e2a3bbc120
{noformat}



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-09-18 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619417#comment-16619417
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 9/18/18 6:14 PM:
-

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:14 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control messages for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and io redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
{noformat}
*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
{noformat}
*{{1.6.2}}*:
{noformat}
commit 2ddd6f07bebbe91e1e0d5165c4a5ae552b836303
commit c1448f36d4c2c2c8345e7e8d1bf1f206dba18dac
{noformat}
*{{1.5.2}}*:
{noformat}
commit 3bf4fe22e0ed828a36d5b2ea652d07c6eef4b578
commit 33a6bec95b44592d626874ae8deaa3e2a3bbc120
{noformat}



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-09-18 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619417#comment-16619417
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 9/18/18 5:58 PM:
-

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:14 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control messages for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and io redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
{noformat}
*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
{noformat}
*{{1.6.2}}*:
{noformat}
commit 2ddd6f07bebbe91e1e0d5165c4a5ae552b836303
commit c1448f36d4c2c2c8345e7e8d1bf1f206dba18dac
{noformat}



> AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
> 

[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-09-18 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619417#comment-16619417
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 9/18/18 5:44 PM:
-

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:14 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control messages for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and io redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
{noformat}
*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
{noformat}



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-09-18 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619417#comment-16619417
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 9/18/18 5:43 PM:
-

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:14 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control messages for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and io redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
{noformat}
*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:27:17 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control messages for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and io redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
(cherry picked from commit 5b95bb0f21852058d22703385f2c8e139881bf1a)
{noformat}
{noformat}
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:20 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:27:17 2018 +0200

Fixed broken pipe error in IOSwitchboard.

We force IOSwitchboard to return a final response to the client for the
`ATTACH_CONTAINER_INPUT` call after IO redirects are finished. In this
case, we don't read remaining messages from the input stream. So the
agent might send an acknowledgment for the request before IOSwitchboard
has received remaining messages. We need to delay termination of
IOSwitchboard to give it a chance to read the remaining messages.
Otherwise, the agent might get `HTTP 500` "broken pipe" while
attempting to write the final message.

Review: https://reviews.apache.org/r/62187/
(cherry picked from commit c5cf4d49f47579b5a6cb7afc2f7df7c8f51dc6d0)
{noformat}



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-08-29 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596258#comment-16596258
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 8/29/18 3:01 PM:
-

When the agent handles an {{ATTACH_CONTAINER_INPUT}} call, it creates an HTTP 
[streaming connection|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/http.cpp#L3104]
 to IOSwitchboard.
 After the agent 
[sends|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/http.cpp#L3141]
 a request to IOSwitchboard, a new instance of {{ConnectionProcess}} is 
created, which calls 
[{{ConnectionProcess::read()}}|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1220]
 to read an HTTP response from IOSwitchboard.
 If the socket is closed before the `\r\n\r\n` sequence of the response is 
received, {{ConnectionProcess}} calls 
[{{disconnect()}}|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1326],
 which in turn 
[flushes {{pipeline}}|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1197-L1201],
 which contains a {{Response}} promise. As a result, the 
{{AttachInputToNestedContainerSession}} 
[test|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/tests/api_tests.cpp#L7942-L7943]
 receives an {{HTTP 500}} error with the body "Disconnected".
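The failure path described above (socket closed before the `\r\n\r\n` sequence arrives, so every pending response is failed with "Disconnected") can be sketched in Python. This is a simplified model under stated assumptions, not the libprocess implementation: `read_response` and the dictionary-based pending entries are hypothetical, and an empty chunk stands in for the peer closing the socket.

```python
from collections import deque

HEADER_END = b"\r\n\r\n"


def read_response(chunks, pipeline):
    """Consume chunks from a socket-like source.

    An empty chunk (b"") models the peer closing the socket. `pipeline`
    is a deque of dicts standing in for pending response promises.
    """
    buffer = b""
    for chunk in chunks:
        if not chunk:
            break  # peer closed the socket
        buffer += chunk
        if HEADER_END in buffer:
            # A complete response arrived: satisfy the oldest pending entry.
            pending = pipeline.popleft()
            pending["status"] = "ready"
            pending["value"] = buffer
            return
    # Closed before the terminator was seen: fail everything still pending.
    while pipeline:
        pending = pipeline.popleft()
        pending["status"] = "failed"
        pending["value"] = "Disconnected"
```

For example, feeding `[b"HTTP/1.1 200 OK\r\n", b""]` leaves the pending entry failed with "Disconnected", which corresponds to the error the test observes.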

When the IO redirect finishes, {{IOSwitchboardServerProcess}} calls 
{{terminate(self(), false)}} (either 
[\[1\]|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1262]
 or 
[\[2\]|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1713]).
 Then {{IOSwitchboardServerProcess::finalize()}} sets a value on the 
[{{promise}}|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1304-L1308],
 which 
[unblocks the {{main()}}|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard_main.cpp#L149-L150]
 function. As a result, the IOSwitchboard process terminates immediately.

When IOSwitchboard terminates, some response messages might not yet have been 
[written|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1699]
 to the socket. So, if any delay occurs before 
[sending|https://github.com/apache/mesos/blob/95bbe784da51b3a7eaeb9127e2541ea0b2af07b5/3rdparty/libprocess/src/http.cpp#L1742-L1748]
 the response back to the agent, the socket is closed because the IOSwitchboard 
process has terminated. That leads to the aforementioned premature socket close 
in the agent.
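The ordering problem can be reduced to a deterministic toy model in Python. This is only an illustration of the race, assuming a queue of not-yet-written responses that is lost when the process exits; `ToySwitchboard` and its attributes are hypothetical and do not correspond to Mesos classes.

```python
class ToySwitchboard:
    """Toy model (hypothetical): responses are queued first and only reach
    the peer when flushed; terminating drops whatever is still queued."""

    def __init__(self):
        self.outbox = []  # responses queued but not yet written to the socket
        self.wire = []    # responses the peer actually received
        self.alive = True

    def queue_response(self, response):
        self.outbox.append(response)

    def flush(self):
        # Writing is only possible while the process is still running.
        if self.alive:
            self.wire.extend(self.outbox)
            self.outbox.clear()

    def terminate(self):
        # Process exit closes the socket; queued responses are lost.
        self.alive = False
        self.outbox.clear()
```

If `terminate()` runs before `flush()`, the peer receives nothing, which models the premature socket close seen by the agent; with the opposite order the response arrives.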

See my previous comment which includes steps to reproduce the bug.



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-08-29 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596258#comment-16596258
 ] 

Andrei Budnik edited comment on MESOS-8545 at 8/29/18 12:42 PM:


When the agent handles an `ATTACH_CONTAINER_INPUT` call, it creates an HTTP 
[streaming 
connection|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/http.cpp#L3104]
 to IOSwitchboard.
 After the agent 
[sends|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/http.cpp#L3141]
 a request to IOSwitchboard, a new instance of `ConnectionProcess` is created, 
which calls 
[`ConnectionProcess::read()`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1220]
 to read an HTTP response from IOSwitchboard.
 If the socket is closed before a `\r\n\r\n` response is received, the 
`ConnectionProcess` calls 
`[disconnect()|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1326]`,
 which in turn [flushes 
`pipeline`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1197-L1201]
 containing a `Response` promise. As a result, the 
`AttachInputToNestedContainerSession` 
[test|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/tests/api_tests.cpp#L7942-L7943]
 receives an `HTTP 500` error with body "Disconnected".

When the I/O redirect finishes, `IOSwitchboardServerProcess` calls 
`terminate(self(), false)` (either here 
[[1]|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1262]
 or here 
[[2]|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1713]).
 Then `IOSwitchboardServerProcess::finalize()` sets a value on the 
[`promise`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1304-L1308],
 which [unblocks the 
`main()`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard_main.cpp#L149-L150]
 function. As a result, the IOSwitchboard process terminates immediately.

When IOSwitchboard terminates, some response messages may not yet have been 
[written|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1699]
 to the socket. So, if any delay occurs before 
[sending|https://github.com/apache/mesos/blob/95bbe784da51b3a7eaeb9127e2541ea0b2af07b5/3rdparty/libprocess/src/http.cpp#L1742-L1748]
 the response back to the agent, the socket is closed because the IOSwitchboard 
process has terminated. That leads to the aforementioned premature socket close 
in the agent.

See my previous comment, which includes steps to reproduce the bug.



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-08-29 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596258#comment-16596258
 ] 

Andrei Budnik edited comment on MESOS-8545 at 8/29/18 12:43 PM:


When the agent handles an `ATTACH_CONTAINER_INPUT` call, it creates an HTTP 
[streaming 
connection|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/http.cpp#L3104]
 to IOSwitchboard.
 After the agent 
[sends|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/http.cpp#L3141]
 a request to IOSwitchboard, a new instance of `ConnectionProcess` is created, 
which calls 
[`ConnectionProcess::read()`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1220]
 to read an HTTP response from IOSwitchboard.
 If the socket is closed before a `\r\n\r\n` response is received, the 
`ConnectionProcess` calls 
`[disconnect()|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1326]`,
 which in turn [flushes 
`pipeline`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1197-L1201]
 containing a `Response` promise. As a result, the 
`AttachInputToNestedContainerSession` 
[test|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/tests/api_tests.cpp#L7942-L7943]
 receives an `HTTP 500` error with body "Disconnected".

When the I/O redirect finishes, `IOSwitchboardServerProcess` calls 
`terminate(self(), false)` (either here 
[\[1\]|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1262]
 or here 
[\[2\]|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1713]).
 Then `IOSwitchboardServerProcess::finalize()` sets a value on the 
[`promise`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1304-L1308],
 which [unblocks the 
`main()`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard_main.cpp#L149-L150]
 function. As a result, the IOSwitchboard process terminates immediately.

When IOSwitchboard terminates, some response messages may not yet have been 
[written|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1699]
 to the socket. So, if any delay occurs before 
[sending|https://github.com/apache/mesos/blob/95bbe784da51b3a7eaeb9127e2541ea0b2af07b5/3rdparty/libprocess/src/http.cpp#L1742-L1748]
 the response back to the agent, the socket is closed because the IOSwitchboard 
process has terminated. That leads to the aforementioned premature socket close 
in the agent.

See my previous comment, which includes steps to reproduce the bug.



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-08-29 Thread Andrei Budnik (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596258#comment-16596258
 ] 

Andrei Budnik edited comment on MESOS-8545 at 8/29/18 12:20 PM:


When the agent handles an `ATTACH_CONTAINER_INPUT` call, it creates an HTTP 
[streaming 
connection|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/http.cpp#L3104]
 to IOSwitchboard.
 After the agent 
[sends|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/http.cpp#L3141]
 a request to IOSwitchboard, a new instance of `ConnectionProcess` is created, 
which calls 
[`ConnectionProcess::read()`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1220]
 to read an HTTP response from IOSwitchboard.
 If the socket is closed before a `\r\n\r\n` response is received, the 
`ConnectionProcess` calls 
`[disconnect()|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1326]`,
 which in turn [flushes 
`pipeline`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1197-L1201]
 containing a `Response` promise. As a result, the 
`AttachInputToNestedContainerSession` 
[test|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/tests/api_tests.cpp#L7942-L7943]
 receives an `HTTP 500` error with body "Disconnected".

When the I/O redirect finishes, `IOSwitchboardServerProcess` calls 
`terminate(self(), false)` (either here 
[\[1\]|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1262]
 or here 
[\[2\]|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1713]).
 Then `IOSwitchboardServerProcess::finalize()` sets a value on the 
[`promise`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard.cpp#L1304-L1308],
 which [unblocks the 
`main()`|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/src/slave/containerizer/mesos/io/switchboard_main.cpp#L149-L150]
 function. As a result, the IOSwitchboard process terminates immediately.

When IOSwitchboard terminates, some response messages may not yet have been 
[written|https://github.com/apache/mesos/blob/12636838f78ad06b66466b3d2fa9c9db94ac70b2/3rdparty/libprocess/src/http.cpp#L1699]
 to the socket. So, if any delay occurs before 
[sending|https://github.com/apache/mesos/blob/95bbe784da51b3a7eaeb9127e2541ea0b2af07b5/3rdparty/libprocess/src/http.cpp#L1742-L1748]
 the response back to the agent, the socket is closed because the IOSwitchboard 
process has terminated. That leads to the aforementioned premature socket close 
in the agent.

See my previous comment including steps to reproduce.



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-03-26 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405510#comment-16405510
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 3/26/18 1:04 PM:
-

{noformat}
commit 02ebf9986ab5ce883a71df72e9e3392a3e37e40e
Author: Andrei Budnik 
AuthorDate: Mon Mar 19 22:48:31 2018 +0100
Commit: Alexander Rukletsov 
CommitDate: Mon Mar 19 22:48:31 2018 +0100

Fixed disconnection for ATTACH_CONTAINER_INPUT call in IOSwitchboard.

Previously, an http response for the `ATTACH_CONTAINER_INPUT` call
could be lost due to immediate termination of the IOSwitchboard
process after the termination of the IOSwitchboard actor. Since the
IOSwitchboard process didn't wait for completion of sending all
responses back to the agent, the agent received disconnection error.
To fix the issue, this patch adds explicit finalization of libprocess
before returning from the IOSwitchboard's main function.

Review: https://reviews.apache.org/r/66147/
{noformat}
{noformat}
commit 1ed3eae3ca09c8fdeac349d78e568d2a91be306b
Author: Andrei Budnik 
AuthorDate: Mon Mar 26 15:03:30 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Mon Mar 26 15:03:30 2018 +0200

Ensured correct termination order in IOSwitchboard's main function.

This patch terminates `IOSwitchboardServer` actor before calling
`process::finalize()`. This patch is an addition to commit 02ebf9986a.

Review: https://reviews.apache.org/r/66278/
{noformat}
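
The two commit messages above describe finalizing before {{main()}} returns, and terminating the server actor before that finalization. The sketch below shows, in plain C++, why such an ordering can matter in general. It is a toy model and not the Mesos code: {{Runtime}}, {{Server}} and {{runShutdown}} are hypothetical names, and the assumption that the actor queues a final response while terminating is made purely for illustration; the commits do not state the exact reason for the ordering.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical runtime that holds queued socket writes; not libprocess.
class Runtime {
public:
  explicit Runtime(std::vector<std::string>* wire) : wire_(wire) {
    assert(wire != nullptr);
  }

  // Queues a write; ignored once the runtime has been finalized.
  void enqueue(std::string message) {
    if (!finalized_) {
      pending_.push_back(std::move(message));
    }
  }

  // Flushes all queued writes to the wire, then refuses further work.
  void finalize() {
    for (std::string& message : pending_) {
      wire_->push_back(std::move(message));
    }
    pending_.clear();
    finalized_ = true;
  }

private:
  std::vector<std::string>* wire_;
  std::vector<std::string> pending_;
  bool finalized_ = false;
};

// Hypothetical actor that, by assumption, queues a final response
// while it is being terminated.
struct Server {
  Runtime* runtime;
  void terminate() { runtime->enqueue("200 OK"); }
};

// Returns what reached the wire for the given shutdown order.
std::vector<std::string> runShutdown(bool terminateFirst) {
  std::vector<std::string> wire;
  Runtime runtime(&wire);
  Server server{&runtime};

  if (terminateFirst) {
    server.terminate();
    runtime.finalize();
  } else {
    runtime.finalize();
    server.terminate();
  }
  return wire;
}
```

In this model {{runShutdown(true)}} flushes the final response, while {{runShutdown(false)}} loses it because the response is queued after the runtime has already been finalized.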



> AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
> ---
>
> Key: MESOS-8545
> URL: https://issues.apache.org/jira/browse/MESOS-8545
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.5.0
>Reporter: Andrei Budnik
>Assignee: Andrei Budnik
>Priority: Major
>  Labels: Mesosphere, flaky-test
> Fix For: 1.6.0
>
> Attachments: 
> AgentAPIStreamingTest.AttachInputToNestedContainerSession-badrun.txt, 
> AgentAPIStreamingTest.AttachInputToNestedContainerSession-badrun2.txt
>
>
> {code:java}
> I0205 17:11:01.091872 4898 http_proxy.cpp:132] Returning '500 Internal Server 
> Error' for '/slave(974)/api/v1' (Disconnected)
> /home/centos/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-centos-7/mesos/src/tests/api_tests.cpp:6596:
>  Failure
> Value of: (response).get().status
> Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> Body: "Disconnected"
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)