[jira] [Comment Edited] (MESOS-9131) Health checks launching nested containers while a container is being destroyed lead to unkillable tasks.

2018-09-18 Thread Alexander Rukletsov (JIRA)


[ https://issues.apache.org/jira/browse/MESOS-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619415#comment-16619415 ]

Alexander Rukletsov edited comment on MESOS-9131 at 9/18/18 6:14 PM:
-

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 2fdc8f3cffc5eac91e5f2b0c6aef2254acfc2bd0
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:09:31 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:09:31 2018 +0200

Fixed IOSwitchboard waiting EOF from attach container input request.

Previously, when a corresponding nested container terminated, while the
user was attached to the container's stdin via `ATTACH_CONTAINER_INPUT`
IOSwitchboard didn't terminate immediately. IOSwitchboard was waiting
for EOF message from the input HTTP connection. Since the IOSwitchboard
was stuck, the corresponding nested container was also stuck in
`DESTROYING` state.

This patch fixes the aforementioned issue by sending 200 `OK` response
for `ATTACH_CONTAINER_INPUT` call in the case when io redirect is
finished while reading from the HTTP input connection is not.

Review: https://reviews.apache.org/r/68232/
{noformat}
{noformat}
commit e941d206f651bde861675a6517a89e44d1f61a34
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:01 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:01 2018 +0200

Added `AgentAPITest.LaunchNestedContainerSessionKillTask` test.

This test verifies that IOSwitchboard, which holds an open HTTP input
connection, terminates once IO redirects finish for the corresponding
nested container.

Review: https://reviews.apache.org/r/68230/
{noformat}
{noformat}
commit 7ad390b3aa261f4a39ff7f2c0842f2aae39005f4
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:07 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:07 2018 +0200

Added `AgentAPITest.AttachContainerInputRepeat` test.

This test verifies that we can call `ATTACH_CONTAINER_INPUT` more
than once. We send a short message first then we send a long message
in chunks.

Review: https://reviews.apache.org/r/68231/
{noformat}
*{{1.7.1}}*:
{noformat}
commit e9605a6243db41c1bbc85ec9ade112f2ef806c15
commit f672afef601c71d69a9eb4db3c191bacfe167d3e
commit 4a1b3186a2fa64bf7d94787f3546dd584e2f1186
{noformat}
*{{1.6.2}}*:
{noformat}
commit e3a9eb3b473a10f210913d568c1d9923ed05d933
commit a1798ae1fb2249280f4a4e9fec69eb9e37b95452
commit d82177d00a4a25d70aab172a91c855ad6b07f768
{noformat}
*{{1.5.2}}*:
{noformat}
commit 5a5089938f13a5aafc0a4ee3308f33e76374c408
commit 25de60746de4681ed0d858cba0790372f03ff840
commit fa6eb85fd2a8798842855628495c16664bc68652
{noformat}
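
The fix commit above describes the core change: IOSwitchboard now answers the `ATTACH_CONTAINER_INPUT` call as soon as the container-side IO redirect finishes, instead of blocking until EOF arrives on the input connection. Below is a minimal, Mesos-independent sketch of that behavior change using standard C++ threads; the names (`AttachInputSession`, `waitAndRespond`) are hypothetical and this is not the actual libprocess-based implementation.
{code:cpp}
// Schematic sketch of the fix described above (hypothetical names, plain C++).
// Old behavior: reply only once the client sends EOF on the input connection,
// which can block forever. Fixed behavior: reply with 200 OK once the
// container-side IO redirect has finished, even if EOF never arrives.
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

struct AttachInputSession {
  std::mutex m;
  std::condition_variable cv;
  bool redirectFinished = false;  // container-side stdin redirect closed
  bool inputEof = false;          // client sent EOF on the HTTP connection

  int waitAndRespond() {
    std::unique_lock<std::mutex> lock(m);
    // The old (hanging) predicate would be: `return inputEof;`
    cv.wait(lock, [this] { return redirectFinished || inputEof; });
    return 200;  // "200 OK"
  }
};

int main() {
  AttachInputSession session;

  // Simulate the nested container terminating and its IO redirect finishing,
  // while the attached client never sends EOF.
  std::thread redirect([&session] {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    {
      std::lock_guard<std::mutex> lock(session.m);
      session.redirectFinished = true;
    }
    session.cv.notify_all();
  });

  std::cout << "response: " << session.waitAndRespond() << std::endl;
  redirect.join();
  return 0;
}
{code}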


[jira] [Comment Edited] (MESOS-9131) Health checks launching nested containers while a container is being destroyed lead to unkillable tasks

2018-08-23 Thread Andrei Budnik (JIRA)


[ https://issues.apache.org/jira/browse/MESOS-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589312#comment-16589312 ]

Andrei Budnik edited comment on MESOS-9131 at 8/23/18 6:17 PM:
---

[Test draft implementation|https://github.com/abudnik/mesos/commit/cf6e8cbc9aff4cdd350c1f13a2a37a3b5bce656e]

[Fix draft implementation|https://github.com/abudnik/mesos/commit/65690c8674902cb3ca55a8dddb4e370447856b0f]


was (Author: abudnik):
[Test draft implementation|https://github.com/abudnik/mesos/commit/cf6e8cbc9aff4cdd350c1f13a2a37a3b5bce656e]

[Fix draft implementation|https://github.com/abudnik/mesos/commit/a7b6a7d23e4a190e2d3215c02094c03a7cf72d3a]

> Health checks launching nested containers while a container is being 
> destroyed lead to unkillable tasks
> ---
>
> Key: MESOS-9131
> URL: https://issues.apache.org/jira/browse/MESOS-9131
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Affects Versions: 1.5.1
>Reporter: Jan Schlicht
>Assignee: Qian Zhang
>Priority: Blocker
>  Labels: container-stuck
>
> A container might get stuck in {{DESTROYING}} state if there's a command 
> health check that starts new nested containers while its parent container is 
> getting destroyed.
> Here are some logs which unrelated lines removed. The 
> `REMOVE_NESTED_CONTAINER`/`LAUNCH_NESTED_CONTAINER_SESSION` keeps looping 
> afterwards.
> {noformat}
> 2018-04-16 12:37:54: I0416 12:37:54.235877  3863 containerizer.cpp:2807] 
> Container 
> db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 has 
> exited
> 2018-04-16 12:37:54: I0416 12:37:54.235914  3863 containerizer.cpp:2354] 
> Destroying container 
> db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 in 
> RUNNING state
> 2018-04-16 12:37:54: I0416 12:37:54.235932  3863 containerizer.cpp:2968] 
> Transitioning the state of container 
> db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 
> from RUNNING to DESTROYING
> 2018-04-16 12:37:54: I0416 12:37:54.236100  3852 linux_launcher.cpp:514] 
> Asked to destroy container 
> db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.e6e01854-40a0-4da3-b458-2b4cf52bbc11
> 2018-04-16 12:37:54: I0416 12:37:54.237671  3852 linux_launcher.cpp:560] 
> Using freezer to destroy cgroup 
> mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
> 2018-04-16 12:37:54: I0416 12:37:54.240327  3852 cgroups.cpp:3060] Freezing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
> 2018-04-16 12:37:54: I0416 12:37:54.244179  3852 cgroups.cpp:1415] 
> Successfully froze cgroup 
> /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
>  after 3.814144ms
> 2018-04-16 12:37:54: I0416 12:37:54.250550  3853 cgroups.cpp:3078] Thawing 
> cgroup 
> /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
> 2018-04-16 12:37:54: I0416 12:37:54.256599  3853 cgroups.cpp:1444] 
> Successfully thawed cgroup 
> /sys/fs/cgroup/freezer/mesos/db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3/mesos/0e44d4d7-629f-41f1-80df-4aae9583d133/mesos/e6e01854-40a0-4da3-b458-2b4cf52bbc11
>  after 5.977856ms
> ...
> 2018-04-16 12:37:54: I0416 12:37:54.371117  3837 http.cpp:3502] Processing 
> LAUNCH_NESTED_CONTAINER_SESSION call for container 
> 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd'
> 2018-04-16 12:37:54: W0416 12:37:54.371692  3842 http.cpp:2758] Failed to 
> launch container 
> db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd:
>  Parent container 
> db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133 is 
> in 'DESTROYING' state
> 2018-04-16 12:37:54: W0416 12:37:54.371826  3840 containerizer.cpp:2337] 
> Attempted to destroy unknown container 
> db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.2bfd8eed-b528-493b-8434-04311e453dcd
> ...
> 2018-04-16 12:37:55: I0416 12:37:55.504456  3856 http.cpp:3078] Processing 
> REMOVE_NESTED_CONTAINER call for container 
> 'db1c0ab0-3b73-453b-b2b5-a8fc8e1d0ae3.0e44d4d7-629f-41f1-80df-4aae9583d133.check-f3a1238c-7f0f-4db3-bda4-c0ea951d46b6'
> ...
> 2018-04-16 12:37:55: I0416 12:37:55.556367  3857 http.cpp:3502] Processing 
> LAUNCH_NESTED_CONTAINER_SESSION call for contai

[jira] [Comment Edited] (MESOS-9131) Health checks launching nested containers while a container is being destroyed lead to unkillable tasks

2018-08-19 Thread Qian Zhang (JIRA)


[ https://issues.apache.org/jira/browse/MESOS-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585328#comment-16585328 ]

Qian Zhang edited comment on MESOS-9131 at 8/20/18 1:50 AM:


I found a way to steadily reproduce this issue:

1. Start Mesos master.
{code:java}
$ sudo ./bin/mesos-master.sh --work_dir=/opt/mesos --ip=192.168.56.5
{code}
2. Start Mesos agent.
{code:java}
$ sudo ./bin/mesos-slave.sh --master=192.168.56.5:5050 
--containerizers=mesos,docker --image_providers=docker 
--isolation=filesystem/linux,docker/runtime,network/cni,cgroups/cpu,cgroups/mem 
--network_cni_config_dir=/opt/cni/net_configs 
--network_cni_plugins_dir=/opt/cni/plugins --work_dir=/opt/mesos 
--docker_store_dir=/opt/mesos/store/docker 
--executor_registration_timeout=60mins --ip=192.168.56.5 
--resources='cpus:2;mem:4096;disk:10240'
{code}
3. Launch a task group which has one task.
{code:java}
$ cat task_group.json 
{
  "tasks":[
{
  "name" : "test1",
  "task_id" : {"value" : "test1"},
  "agent_id": {"value" : ""},
  "resources": [
{"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.1}},
{"name": "mem", "type": "SCALAR", "scalar": {"value": 32}}
  ],
  "command": {
"value": "sleep 1000"
  }
}
  ]
}

$ mesos-execute --master=192.168.56.5:5050 
--task_group=file:///home/stack/workspace/config/task_group.json
I0820 09:06:03.655900 16130 scheduler.cpp:188] Version: 1.5.1
I0820 09:06:03.738003 16146 scheduler.cpp:311] Using default 'basic' HTTP 
authenticatee
I0820 09:06:03.739403 16150 scheduler.cpp:494] New master detected at 
master@192.168.56.5:5050
Subscribed with ID 9a871ea9-68aa-40ad-ae2d-f77cab3b63c0-
Submitted task group with tasks [ test1 ] to agent 
'9a871ea9-68aa-40ad-ae2d-f77cab3b63c0-S0'
Received status update TASK_STARTING for task 'test1'
source: SOURCE_EXECUTOR
Received status update TASK_RUNNING for task 'test1'
source: SOURCE_EXECUTOR
{code}
4. Run the `dcos` command below to kill the process of the nested container (task) launched in step 3. The issue does not reproduce if the `-t` option is specified in the command below, or if `-i` is not specified (see the sketch after this step for why the attached input matters).
{code:java}
$ dcos task exec -i test1 bash
kill -9 
{code}
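The hang in the `-i`-only case boils down to a reader that only returns when it sees EOF, and EOF never arrives as long as the attached client keeps its end of the connection open. Below is a minimal, Mesos-independent illustration of that blocking pattern using plain POSIX pipes; the roles are hypothetical and this is not the IOSwitchboard code.
{code:cpp}
// Hypothetical illustration: a reader blocks until every writer has closed the
// other end of the pipe -- analogous to IOSwitchboard waiting for EOF on the
// ATTACH_CONTAINER_INPUT connection that `dcos task exec -i` keeps open.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
  int fds[2];
  if (pipe(fds) != 0) { perror("pipe"); return 1; }

  pid_t pid = fork();
  if (pid == 0) {
    // Child plays the role of the switchboard's input reader.
    close(fds[1]);
    char buf[128];
    while (read(fds[0], buf, sizeof(buf)) > 0) {
      // Forward data to the container's stdin (omitted).
    }
    // Only reached once EOF is seen, i.e. once all writers closed their end.
    printf("reader: got EOF, exiting\n");
    _exit(0);
  }

  // Parent plays the role of the attached client.
  close(fds[0]);
  if (write(fds[1], "hello\n", 6) != 6) { perror("write"); }
  // If the client never closes its end, the reader blocks in read() forever.
  close(fds[1]);  // comment this out to observe the hang
  waitpid(pid, nullptr, 0);
  return 0;
}
{code}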
Now in the agent log, we can see that the nested container launched in step 3 has exited (since it was killed by the `dcos` command in step 4), and that the agent then tried to destroy its child container (i.e., the nested container launched in step 4 by the `dcos` command), but the destroy never completes. As a result, the nested container launched in step 3 is stuck in the `DESTROYING` state, and the `mesos-execute` command launched in step 3 never returns, i.e., from its point of view the task is still in the `TASK_RUNNING` state.
{code:java}
I0820 09:09:33.072842 16004 slave.cpp:6865] Current disk usage 40.20%. Max 
allowed age: 3.486303275117465days
I0820 09:09:37.069425 16004 containerizer.cpp:2807] Container 
95aa3e70-4f1d-42f0-93bb-3963d63126b8.8b3080cc-200b-4484-9661-fb2025668dbe has 
exited
I0820 09:09:37.069540 16004 containerizer.cpp:2354] Destroying container 
95aa3e70-4f1d-42f0-93bb-3963d63126b8.8b3080cc-200b-4484-9661-fb2025668dbe in 
RUNNING state
I0820 09:09:37.069591 16004 containerizer.cpp:2968] Transitioning the state of 
container 
95aa3e70-4f1d-42f0-93bb-3963d63126b8.8b3080cc-200b-4484-9661-fb2025668dbe from 
RUNNING to DESTROYING
I0820 09:09:37.071295 16004 linux_launcher.cpp:514] Asked to destroy container 
95aa3e70-4f1d-42f0-93bb-3963d63126b8.8b3080cc-200b-4484-9661-fb2025668dbe.ba5e905c-af5f-41ea-abf4-9b197cabf8f1
I0820 09:09:37.073197 16004 linux_launcher.cpp:560] Using freezer to destroy 
cgroup 
mesos/95aa3e70-4f1d-42f0-93bb-3963d63126b8/mesos/8b3080cc-200b-4484-9661-fb2025668dbe/mesos/ba5e905c-af5f-41ea-abf4-9b197cabf8f1
I0820 09:09:37.076241 16005 cgroups.cpp:3060] Freezing cgroup 
/sys/fs/cgroup/freezer/mesos/95aa3e70-4f1d-42f0-93bb-3963d63126b8/mesos/8b3080cc-200b-4484-9661-fb2025668dbe/mesos/ba5e905c-af5f-41ea-abf4-9b197cabf8f1
I0820 09:09:37.079506 16002 cgroups.cpp:1415] Successfully froze cgroup 
/sys/fs/cgroup/freezer/mesos/95aa3e70-4f1d-42f0-93bb-3963d63126b8/mesos/8b3080cc-200b-4484-9661-fb2025668dbe/mesos/ba5e905c-af5f-41ea-abf4-9b197cabf8f1
 after 3.18976ms
I0820 09:09:37.083220 16003 cgroups.cpp:3078] Thawing cgroup 
/sys/fs/cgroup/freezer/mesos/95aa3e70-4f1d-42f0-93bb-3963d63126b8/mesos/8b3080cc-200b-4484-9661-fb2025668dbe/mesos/ba5e905c-af5f-41ea-abf4-9b197cabf8f1
I0820 09:09:37.086514 15999 cgroups.cpp:1444] Successfully thawed cgroup 
/sys/fs/cgroup/freezer/mesos/95aa3e70-4f1d-42f0-93bb-3963d63126b8/mesos/8b3080cc-200b-4484-9661-fb2025668dbe/mesos/ba5e905c-af5f-41ea-abf4-9b197cabf8f1
 after 3.170048ms
I0820 09:09:42.178274 16007 switchboard.cpp:789] Sending SIGTERM to I/O 
switchboard server (pid: 16551) since container 
95aa3e70-4f1d-42f0-93bb-3963d63126b8.8b3080cc-200b-4484-9661-fb2025668dbe.
