[jira] [Comment Edited] (MESOS-9228) SLRP does not clean up plugin containers after it is removed.

2018-09-21 Thread Chun-Hung Hsiao (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624534#comment-16624534
 ] 

Chun-Hung Hsiao edited comment on MESOS-9228 at 9/22/18 6:45 AM:
-

{noformat}
commit 0581437999a262277f695592c6a1be9d30bf8c31
Author: Chun-Hung Hsiao 
Date: Tue Sep 18 12:59:52 2018 -0700

Removed unnecessary failure handling in agent HTTP API handlers.

The current agent HTTP API handlers either unnecessarily handle failures
on always-ready futures, or return "500 Internal Server Error"
unnecessarily. This patch removes that unnecessary code.

Review: https://reviews.apache.org/r/68755{noformat}
{noformat}
commit 1adceaacc81bb2a52a9f28a864ac928ac706
Author: Chun-Hung Hsiao 
Date: Tue Sep 18 14:40:55 2018 -0700

Performed RP-specific validations when adding/updating RP configs.

Each type of RP might have some specific validations for RP info. For
example, SLRP requires the `storage` field to be set. This patch makes
the local RP daemon perform such validations when adding/updating
configs, so the `ADD_RESOURCE_PROVIDER_CONFIG` and
`UPDATE_RESOURCE_PROVIDER_CONFIG` calls can fail fast.

Review: https://reviews.apache.org/r/68756{noformat}
{noformat}
commit ae41c14e47dd09db82fb237a9987fde8e100c8be
Author: Chun-Hung Hsiao 
Date: Tue Sep 18 11:54:29 2018 -0700

Removed `ROOT` requirements for `AgentResourceProviderConfigApiTest`.

These tests required `ROOT` in order to use `filesystem/linux`
isolation. This is no longer a requirement, so we can run the tests in
general. These tests appear to be able to run in parallel as well.

We also changed the `AddConflict` test a bit to make it more robust.

Review: https://reviews.apache.org/r/68757{noformat}
{noformat}
commit c174b9d3a7eb842b6de66256f0373c5e12b00cce
Author: Chun-Hung Hsiao 
Date: Wed Sep 19 18:07:50 2018 -0700

Set master/agent flags in `AgentResourceProviderConfigApiTest` fixture.

Review: https://reviews.apache.org/r/68777{noformat}
{noformat}
commit a31509e4206edc12fbc7ed6a8e8f87c36e02f34d
Author: Chun-Hung Hsiao 
Date: Tue Sep 18 13:25:06 2018 -0700

Added unit tests for adding/updating invalid resource provider configs.

Review: https://reviews.apache.org/r/68758{noformat}
{noformat}
commit 88b28f2cd900a53a8d54b7be837ccc9813e3b764
Author: Chun-Hung Hsiao 
Date: Thu Sep 20 11:25:11 2018 -0700

Moved the container ID prefix generation to `LocalResourceProvider`.

It is more reasonable to not allow each specific resource provider to
construct its own container ID prefix; otherwise it would be hard to
avoid conflicts. Therefore we now establish the convention of how the
prefix is constructed in `LocalResourceProvider`.

Review: https://reviews.apache.org/r/68790{noformat}
{noformat}
commit 6366b5c1e5e60dfda5ca0368d6a22da998f0cfa4
Author: Chun-Hung Hsiao 
Date: Thu Sep 20 15:06:51 2018 -0700

Cleaned up residual containers when removing resource provider configs.

When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
provider daemon now performs a best-effort cleanup by killing all
standalone containers prefixed by the 'cid_prefix' of the resource
provider. During the cleanup, no resource provider config with the same
type and name can be added.

Review: https://reviews.apache.org/r/68763{noformat}
{noformat}
commit 678fa8b44bc9c09f5f9908a3a4511f7195150d7b
Author: Chun-Hung Hsiao 
Date: Tue Sep 18 21:40:35 2018 -0700

Tested container cleanup in `AgentResourceProviderConfigApiTest.Remove`.

Review: https://reviews.apache.org/r/68762{noformat}
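
The prefix-based cleanup described in the commits above can be pictured with a minimal sketch. This is not the Mesos implementation: the function name `selectContainersToKill` and the use of plain strings for container IDs are assumptions made only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not a Mesos API): returns the IDs of the standalone
// containers whose ID starts with `cidPrefix`, i.e. the containers that a
// best-effort cleanup would try to kill.
std::vector<std::string> selectContainersToKill(
    const std::vector<std::string>& containerIds,
    const std::string& cidPrefix)
{
  // An empty prefix would match every container, so refuse it.
  assert(!cidPrefix.empty());

  std::vector<std::string> result;
  std::copy_if(
      containerIds.begin(),
      containerIds.end(),
      std::back_inserter(result),
      [&cidPrefix](const std::string& id) {
        // True only if `id` begins with the whole prefix.
        return id.compare(0, cidPrefix.size(), cidPrefix) == 0;
      });

  return result;
}
```

Because every resource provider derives its prefix the same way, a cleanup along these lines only touches containers that belong to the provider being removed.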
 



[jira] [Commented] (MESOS-9228) SLRP does not clean up plugin containers after it is removed.

2018-09-21 Thread Chun-Hung Hsiao (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624535#comment-16624535
 ] 

Chun-Hung Hsiao commented on MESOS-9228:


Backported to 1.7.x:
{noformat}
commit 86bd9e4c5b5c0d807bbbe54e10e53f389cd7a7ec
Author: Chun-Hung Hsiao 
Date: Tue Sep 18 12:59:52 2018 -0700

Removed unnecessary failure handling in agent HTTP API handlers.

The current agent HTTP API handlers either unnecessarily handle failures
on always-ready futures, or return "500 Internal Server Error"
unnecessarily. This patch removes that unnecessary code.

Review: https://reviews.apache.org/r/68755{noformat}
{noformat}
commit 84fa09e3a7a807edce7c0655622e0662ac97031d
Author: Chun-Hung Hsiao 
Date: Tue Sep 18 14:40:55 2018 -0700

Performed RP-specific validations when adding/updating RP configs.

Each type of RP might have some specific validations for RP info. For
example, SLRP requires the `storage` field to be set. This patch makes
the local RP daemon perform such validations when adding/updating
configs, so the `ADD_RESOURCE_PROVIDER_CONFIG` and
`UPDATE_RESOURCE_PROVIDER_CONFIG` calls can fail fast.

Review: https://reviews.apache.org/r/68756{noformat}
{noformat}
commit 76bc3efe4822f285a6783170ba2c12e3546aec35
Author: Chun-Hung Hsiao 
Date: Thu Sep 20 11:25:11 2018 -0700

Moved the container ID prefix generation to `LocalResourceProvider`.

It is more reasonable to not allow each specific resource provider to
construct its own container ID prefix; otherwise it would be hard to
avoid conflicts. Therefore we now establish the convention of how the
prefix is constructed in `LocalResourceProvider`.

Review: https://reviews.apache.org/r/68790{noformat}
{noformat}
commit 39e6911567517689452be23893f9031fb02ceebf
Author: Chun-Hung Hsiao 
Date: Thu Sep 20 15:06:51 2018 -0700

Cleaned up residual containers when removing resource provider configs.

When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
provider daemon now performs a best-effort cleanup by killing all
standalone containers prefixed by the 'cid_prefix' of the resource
provider. During the cleanup, no resource provider config with the same
type and name can be added.

Review: https://reviews.apache.org/r/68763{noformat}

> SLRP does not clean up plugin containers after it is removed.
> -
>
> Key: MESOS-9228
> URL: https://issues.apache.org/jira/browse/MESOS-9228
> Project: Mesos
>  Issue Type: Bug
>  Components: storage
>Affects Versions: 1.5.0, 1.6.0, 1.7.0
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>Priority: Blocker
>  Labels: mesosphere, storage
> Fix For: 1.7.1, 1.8.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-9251) Add a benchmark to measure end-to-end offer generation performance.

2018-09-21 Thread Meng Zhu (JIRA)
Meng Zhu created MESOS-9251:
---

 Summary: Add a benchmark to measure end-to-end offer generation 
performance.
 Key: MESOS-9251
 URL: https://issues.apache.org/jira/browse/MESOS-9251
 Project: Mesos
  Issue Type: Improvement
Reporter: Meng Zhu


The existing allocator benchmarks only measure the performance of the allocator 
actor. They do not cover the offer callback and the fan-out path in the 
master actor.





[jira] [Commented] (MESOS-6765) Consider making the Resources wrapper "copy-on-write" to improve performance.

2018-09-21 Thread Meng Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624276#comment-16624276
 ] 

Meng Zhu commented on MESOS-6765:
-

{noformat}
commit fa7f659b1952344cf52c725ccdf8f86d7a01296b
Author: Meng Zhu 
Date:   Thu Sep 20 13:20:06 2018 -0700

Optimized `class Resources` with copy-on-write.

This patch lets `class Resources` only store shared
pointers to the underlying resource objects, so that
read-only filter operations such as `reserved()`,
`unreserved()`, etc. can avoid making copies of
the whole resource objects. Instead, only shared pointers
are copied.

In write operations, we check if there is more than one
reference to the resource object. If so, a copy is made
for safe mutation without affecting other owners.

To maintain the usage abstraction that `class Resources`
still holds resource objects, we utilize the
`boost::indirect_iterator` iterator adapter to dereference
the shared pointers as we iterate.

Review: https://reviews.apache.org/r/68490/
{noformat}
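
A minimal sketch of the copy-on-write behavior described in this commit, assuming a simplified `Resource` struct in place of the protobuf message. The member names `add`, `setScalar`, `at`, and `size` are hypothetical, and the `boost::indirect_iterator` part is left out to keep the example dependency-free.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for the protobuf `Resource` message (assumption).
struct Resource
{
  std::string name;
  double scalar;
  bool reserved;
};

class Resources
{
public:
  void add(const Resource& r)
  {
    resources.push_back(std::make_shared<Resource>(r));
  }

  // Read-only filter: copies shared pointers only, never the objects.
  Resources reserved() const
  {
    Resources result;
    for (const std::shared_ptr<Resource>& r : resources) {
      if (r->reserved) {
        result.resources.push_back(r);
      }
    }
    return result;
  }

  // Write path: clone the object first if another owner still refers to it.
  void setScalar(std::size_t i, double value)
  {
    assert(i < resources.size());
    if (resources[i].use_count() > 1) {
      resources[i] = std::make_shared<Resource>(*resources[i]);
    }
    resources[i]->scalar = value;
  }

  std::size_t size() const { return resources.size(); }
  const Resource& at(std::size_t i) const { return *resources.at(i); }

private:
  std::vector<std::shared_ptr<Resource>> resources;
};
```

In this sketch, filtering shares the underlying objects between the original and the filtered `Resources`, and the first write through either one detaches it from the other.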


{noformat}
commit 4ffd1099ca8f3a747de9a5282c4cac791113c5cc (HEAD -> master, apache/master)
Author: Meng Zhu m...@mesosphere.io
Date:   Thu Sep 20 14:03:00 2018 -0700


Mitigated accidental mutation of `Resources::resources`.

Due to the copy-on-write optimization (MESOS-6765), one needs to
check the `use_count` of `Resource_` before mutating. Currently,
there is no mechanism to enforce this. As a short-term mitigation
measure, we rename `resources` to
`resourcesNoMutationWithoutExclusiveOwnership` and typedef its item
type to `Resource_UnSafe`
to alert people about obtaining an exclusive ownership before mutating
the `Resource_` objects.
{noformat}


> Consider making the Resources wrapper "copy-on-write" to improve performance.
> -
>
> Key: MESOS-6765
> URL: https://issues.apache.org/jira/browse/MESOS-6765
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>Assignee: Meng Zhu
>Priority: Major
>  Labels: mesosphere, performance
> Fix For: 1.8.0
>
>
> Resources currently directly stores the underlying resource objects:
> {code}
> class Resources
> {
>   ...
>   std::vector<Resource> resources;
> };
> {code}
> What this means is that copying of Resources (which occurs frequently) is 
> expensive since copying a {{Resource}} object is relatively heavy-weight.
> One strategy, in MESOS-4770, is to avoid protobuf in favor of C++ types (i.e. 
> replace {{Value::Scalar}}, {{Value::Set}}, and {{Value::Ranges}} with C++ 
> equivalents). However, metadata like reservations, disk info, etc, is still 
> fairly expensive to copy even if avoiding protobufs.
> An approach to reduce copying would be to only copy the resource objects upon 
> writing, when there are multiple references to the resource object. If there 
> is a single reference to the resource object we could safely mutate it 
> without copying. E.g.
> {code}
> class Resources
> {
>   ...
>   std::vector<std::shared_ptr<Resource>> resources;
> };
> // Mutation function:
> void Resources::mutate(size_t index)
> {
>   // Copy if there are multiple references.
>   if (resources[index].use_count() > 1) {
>     resources[index] = copy(resources[index]);
>   }
>   // Mutate safely.
>   resources[index]->some_mutation();
> }
> {code}
> On the other hand, this introduces an additional level of pointer chasing. So 
> we would need to weigh the approaches.





[jira] [Created] (MESOS-9250) Introduce a copy-on-write type to enforce safe mutation with exclusive ownership.

2018-09-21 Thread Meng Zhu (JIRA)
Meng Zhu created MESOS-9250:
---

 Summary: Introduce a copy-on-write type to enforce safe mutation 
with exclusive ownership.
 Key: MESOS-9250
 URL: https://issues.apache.org/jira/browse/MESOS-9250
 Project: Mesos
  Issue Type: Improvement
Reporter: Meng Zhu
Assignee: Meng Zhu


The resource copy-on-write optimization has landed (MESOS-6765), but the 
no-mutation-without-exclusive-ownership rule is not enforced. This is 
error-prone because ownership needs to be checked manually every time:

{code:c++}
if (resource_.use_count() > 1) {
  resource_ = std::make_shared<Resource_>(*resource_);
}
{code}

It would be great to introduce a templated COW wrapper class that enforces these 
semantics. Such a construct could also be reused in other places.
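
One possible shape for such a wrapper, as a rough sketch only. The class name `CopyOnWrite` and its `get()`/`mutate()` interface are assumptions, not an existing Mesos or stout type.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical copy-on-write wrapper: readers share one object, and the only
// way to obtain a mutable reference is `mutate()`, which clones first unless
// this wrapper is the sole owner.
template <typename T>
class CopyOnWrite
{
public:
  explicit CopyOnWrite(T value)
    : data(std::make_shared<T>(std::move(value))) {}

  // Read access never copies.
  const T& get() const
  {
    assert(data != nullptr); // A moved-from wrapper holds no object.
    return *data;
  }

  // Write access enforces exclusive ownership before handing out `T&`.
  T& mutate()
  {
    assert(data != nullptr);
    if (data.use_count() > 1) {
      data = std::make_shared<T>(*data);
    }
    return *data;
  }

  bool unique() const { return data.use_count() == 1; }

private:
  std::shared_ptr<T> data;
};
```

With this design the ownership check lives in one place, so callers cannot forget it. Note that `use_count()` is only a reliable ownership test when the copies are not shared across threads without synchronization.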





[jira] [Created] (MESOS-9249) Avoid dirtying the DRF sorter when allocating resources.

2018-09-21 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-9249:
--

 Summary: Avoid dirtying the DRF sorter when allocating resources.
 Key: MESOS-9249
 URL: https://issues.apache.org/jira/browse/MESOS-9249
 Project: Mesos
  Issue Type: Improvement
  Components: allocation
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler


The perf data that [~mzhu] provided revealed that when there are many 
frameworks, the {{DRFSorter::sort()}} function ends up dominating the time 
spent during an allocation cycle.

To improve performance, it's possible to avoid dirtying the sorter so that it 
remains sorted throughout the entire allocation cycle.





[jira] [Commented] (MESOS-9239) Improve sorting performance in the DRF sorter.

2018-09-21 Thread Benjamin Mahler (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624262#comment-16624262
 ] 

Benjamin Mahler commented on MESOS-9239:


https://reviews.apache.org/r/68732/

> Improve sorting performance in the DRF sorter.
> --
>
> Key: MESOS-9239
> URL: https://issues.apache.org/jira/browse/MESOS-9239
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Major
>  Labels: performance
>
> The sorting performance of the DRF sorter is negatively affected by the use 
> of hashmaps introduced originally in MESOS-4964.
> Storing a cache-friendlier data structure and avoiding hashing overhead would 
> improve the sort performance.
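
To illustrate the general idea only (this is not the DRF sorter's actual data layout), the sketch below keeps each client's share inline in a contiguous vector so that the sort comparator performs no hash lookups. The `Client` struct and `sortClients` function are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flat record: the share is stored next to the name, so the
// comparator reads contiguous memory instead of consulting a hashmap.
struct Client
{
  std::string name;
  double share;
};

// Returns client names ordered by ascending share, ties broken by name.
std::vector<std::string> sortClients(std::vector<Client> clients)
{
  assert(std::all_of(
      clients.begin(),
      clients.end(),
      [](const Client& c) { return c.share >= 0.0; }));

  std::sort(
      clients.begin(),
      clients.end(),
      [](const Client& a, const Client& b) {
        if (a.share != b.share) {
          return a.share < b.share;
        }
        return a.name < b.name;
      });

  std::vector<std::string> names;
  names.reserve(clients.size());
  for (const Client& c : clients) {
    names.push_back(c.name);
  }
  return names;
}
```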





[jira] [Commented] (MESOS-9248) Consider adding V0 API (Java and Python bindings) to api-client-libraries.md.

2018-09-21 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624256#comment-16624256
 ] 

Till Toenshoff commented on MESOS-9248:
---

Linking towards docs/app-framework-development-guide.md might also be helpful.

> Consider adding V0 API (Java and Python bindings) to api-client-libraries.md.
> -
>
> Key: MESOS-9248
> URL: https://issues.apache.org/jira/browse/MESOS-9248
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Till Toenshoff
>Priority: Minor
>
> {{docs/api-client-libraries.md}} covers only the V1 APIs. We should 
> consider offering a subsection that explains the downsides of V0 but also 
> mentions the Python and Java bindings so that newcomers can get started 
> quickly without further external dependencies. 





[jira] [Created] (MESOS-9248) Consider adding V0 API (Java and Python bindings) to api-client-libraries.md.

2018-09-21 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-9248:
-

 Summary: Consider adding V0 API (Java and Python bindings) to 
api-client-libraries.md.
 Key: MESOS-9248
 URL: https://issues.apache.org/jira/browse/MESOS-9248
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff


{{docs/api-client-libraries.md}} covers only the V1 APIs. We should 
consider offering a subsection that explains the downsides of V0 but also 
mentions the Python and Java bindings so that newcomers can get started 
quickly without further external dependencies. 





[jira] [Commented] (MESOS-895) Unbundle libev.

2018-09-21 Thread James Peach (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624201#comment-16624201
 ] 

James Peach commented on MESOS-895:
---

{noformat}
commit 0b9861e356ec2d7d50163ae54a6be9c1c45f279b
Author: James Peach 
Date:   Fri Sep 21 14:13:29 2018 -0700

Removed bundled libev patch.

Since we now disable the libev SIGCHLD handler at runtime, we no longer
need to bundle the patch to do it at build time. It is still useful to
bundle libev itself, to support older distributions.

Review: https://reviews.apache.org/r/68800/
{noformat}

> Unbundle libev.
> ---
>
> Key: MESOS-895
> URL: https://issues.apache.org/jira/browse/MESOS-895
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Timothy St. Clair
>Assignee: James Peach
>Priority: Major
>  Labels: tech-debt
>
> The libev patch can easily be removed by updating the configuration flags and 
> possibly the accompanying code prior to the include.   
> For configure pass in: 
> CFLAGS=-DEV_CHILD_ENABLE=0
> For inclusion: 
> #define EV_CHILD_ENABLE 0
> #include <ev.h>
> excerpt from maintainer: 
>  that patch is unnecessary
>  schmorp, so if they wanted to just set EV_CHILD_ENABLE=0 they 
> could just pass CFLAGS=-DEV_CHILD_ENABLE=0  through.
>  tstclair: yes, or use a wrapper





[jira] [Comment Edited] (MESOS-895) Unbundle libev.

2018-09-21 Thread James Peach (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624201#comment-16624201
 ] 

James Peach edited comment on MESOS-895 at 9/21/18 9:21 PM:


{noformat}
commit 0b9861e356ec2d7d50163ae54a6be9c1c45f279b
Author: James Peach 
Date:   Fri Sep 21 14:13:29 2018 -0700

Removed bundled libev patch.

Since we now disable the libev SIGCHLD handler at runtime, we no longer
need to bundle the patch to do it at build time. It is still useful to
bundle libev itself, to support older distributions.

Review: https://reviews.apache.org/r/68800/
{noformat}



> Unbundle libev.
> ---
>
> Key: MESOS-895
> URL: https://issues.apache.org/jira/browse/MESOS-895
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.17.0
>Reporter: Timothy St. Clair
>Assignee: James Peach
>Priority: Major
>  Labels: tech-debt
>
> The libev patch can easily be removed by updating the configuration flags and 
> possibly the accompanying code prior to the include.   
> For configure pass in: 
> CFLAGS=-DEV_CHILD_ENABLE=0
> For inclusion: 
> #define EV_CHILD_ENABLE 0
> #include <ev.h>
> excerpt from maintainer: 
>  that patch is unnecessary
>  schmorp, so if they wanted to just set EV_CHILD_ENABLE=0 they 
> could just pass CFLAGS=-DEV_CHILD_ENABLE=0  through.
>  tstclair: yes, or use a wrapper





[jira] [Created] (MESOS-9247) MasterAPITest.EventAuthorizationFiltering is flaky

2018-09-21 Thread Greg Mann (JIRA)
Greg Mann created MESOS-9247:


 Summary: MasterAPITest.EventAuthorizationFiltering is flaky
 Key: MESOS-9247
 URL: https://issues.apache.org/jira/browse/MESOS-9247
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.7.0
Reporter: Greg Mann


Saw this failure on a CentOS 6 SSL build in our internal CI. Build log 
attached. For some reason, it seems that the initial {{TASK_ADDED}} event is 
missed:
{code}
../../src/tests/api_tests.cpp:2922
  Expected: v1::master::Event::TASK_ADDED
  Which is: TASK_ADDED
To be equal to: event->get().type()
  Which is: TASK_UPDATED
{code}





[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-09-21 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619417#comment-16619417
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 9/21/18 1:01 PM:
-

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:14 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, the IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of the
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got a broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control message for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and IO redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
{noformat}
{noformat}
commit bfa2bd24780b5c49467b3c23260855e3d8b4c948
Author: Andrei Budnik 
AuthorDate: Fri Sep 21 14:51:24 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Fri Sep 21 14:51:24 2018 +0200

Fixed disconnection while sending acknowledgment to IOSwitchboard.

Previously, an HTTP connection to the IOSwitchboard could be garbage
collected before the agent sent an acknowledgment to the IOSwitchboard
via this connection. This patch fixes the issue by keeping a reference
count to the connection in a lambda callback until disconnection
occurs.

Review: https://reviews.apache.org/r/68768/
{noformat}
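
The lifetime technique named in this commit, keeping a reference alive by capturing it in a callback, can be sketched as follows. The `Connection` struct and `onDisconnected` function are hypothetical stand-ins and do not reflect the libprocess types used by the actual patch.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for an HTTP connection handle. The destructor sets
// a flag so that the moment of destruction can be observed.
struct Connection
{
  explicit Connection(bool* flag) : closed(flag) {}
  ~Connection() { *closed = true; }

  bool* closed;
};

// Returns a callback that owns a reference to `connection`. The connection
// cannot be destroyed while the callback itself is still alive.
std::function<void()> onDisconnected(std::shared_ptr<Connection> connection)
{
  return [connection]() {
    // Capturing `connection` by value is what extends its lifetime; the
    // body only checks that the reference is still valid.
    assert(connection != nullptr);
  };
}
```

Once the callback is released after the disconnection has been handled, the last reference goes away and the connection is destroyed.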
{noformat}
commit c3c77cbef818d497d8bd5e67fa72e55a7190e27a
Author: Andrei Budnik 
AuthorDate: Fri Sep 21 14:51:59 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Fri Sep 21 14:51:59 2018 +0200

Fixed broken pipe error in IOSwitchboard.

The previous attempt to fix the `HTTP 500` "broken pipe" error in review
/r/62187/ was not correct: after IOSwitchboard sends a response to the
agent for the `ATTACH_CONTAINER_INPUT` call, the socket is closed
immediately, thus causing the error on the agent. This patch adds a
delay after IO redirects are finished and before IOSwitchboard forcibly
sends a response.

Review: https://reviews.apache.org/r/68784/
{noformat}
*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
commit 021a8f4de1ad65167946548e3ecfa74d8e41e9c5
commit 38a914398b6f1aaf08db4f62f4e42cdb80127eb5
{noformat}
*{{1.6.2}}*:
{noformat}
commit 2ddd6f07bebbe91e1e0d5165c4a5ae552b836303
commit c1448f36d4c2c2c8345e7e8d1bf1f206dba18dac
commit 55b0e94f0c8a1896ca079361d89527123faf22c6
commit c40c92b7710b5b238b13ce6f1bacd3d75e04283b
{noformat}
*{{1.5.2}}*:
{noformat}
commit 3bf4fe22e0ed828a36d5b2ea652d07c6eef4b578
commit 33a6bec95b44592d626874ae8deaa3e2a3bbc120
commit 7b8195680104c2c5f61073a956f60ac961c37f45
commit 0216002744517a6053fd782b6b4dc3d6cf77dd5e
{noformat}



[jira] [Comment Edited] (MESOS-8545) AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.

2018-09-21 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619417#comment-16619417
 ] 

Alexander Rukletsov edited comment on MESOS-8545 at 9/21/18 12:56 PM:
--

*{{master}} aka {{1.8-dev}}*:
{noformat}
commit 5b95bb0f21852058d22703385f2c8e139881bf1a
Author: Andrei Budnik 
AuthorDate: Tue Sep 18 19:10:14 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Tue Sep 18 19:10:14 2018 +0200

Fixed HTTP errors caused by dropped HTTP responses by IOSwitchboard.

Previously, the IOSwitchboard process could terminate before all HTTP
responses had been sent to the agent. In the case of the
`ATTACH_CONTAINER_INPUT` call, we could drop a final HTTP `200 OK`
response, so the agent got a broken HTTP connection for the call.
This patch introduces an acknowledgment for the received response
for the `ATTACH_CONTAINER_INPUT` call. This acknowledgment is a new
type of control message for the `ATTACH_CONTAINER_INPUT` call. When
IOSwitchboard receives an acknowledgment, and IO redirects are
finished, it terminates itself. That guarantees that the agent always
receives a response for the `ATTACH_CONTAINER_INPUT` call.

Review: https://reviews.apache.org/r/65168/
{noformat}
{noformat}
commit bfa2bd24780b5c49467b3c23260855e3d8b4c948
Author: Andrei Budnik 
AuthorDate: Fri Sep 21 14:51:24 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Fri Sep 21 14:51:24 2018 +0200

Fixed disconnection while sending acknowledgment to IOSwitchboard.

Previously, an HTTP connection to the IOSwitchboard could be garbage
collected before the agent sent an acknowledgment to the IOSwitchboard
via this connection. This patch fixes the issue by keeping a reference
count to the connection in a lambda callback until disconnection
occurs.

Review: https://reviews.apache.org/r/68768/
{noformat}
{noformat}
commit c3c77cbef818d497d8bd5e67fa72e55a7190e27a
Author: Andrei Budnik 
AuthorDate: Fri Sep 21 14:51:59 2018 +0200
Commit: Alexander Rukletsov 
CommitDate: Fri Sep 21 14:51:59 2018 +0200

Fixed broken pipe error in IOSwitchboard.

The previous attempt to fix the `HTTP 500` "broken pipe" error in review
/r/62187/ was not correct: after IOSwitchboard sends a response to the
agent for the `ATTACH_CONTAINER_INPUT` call, the socket is closed
immediately, thus causing the error on the agent. This patch adds a
delay after IO redirects are finished and before IOSwitchboard forcibly
sends a response.

Review: https://reviews.apache.org/r/68784/
{noformat}
*{{1.7.1}}*:
{noformat}
commit 1672941630960cccf66ed81b11811d84e8a4e3f0
commit 600b388e25c49f4fac4d39bc07bcf6ffce42c679
{noformat}
*{{1.6.2}}*:
{noformat}
commit 2ddd6f07bebbe91e1e0d5165c4a5ae552b836303
commit c1448f36d4c2c2c8345e7e8d1bf1f206dba18dac
{noformat}
*{{1.5.2}}*:
{noformat}
commit 3bf4fe22e0ed828a36d5b2ea652d07c6eef4b578
commit 33a6bec95b44592d626874ae8deaa3e2a3bbc120
{noformat}

