[jira] [Created] (MESOS-8304) Update CHANGELOG to call out agent reconfiguration feature

2017-12-05 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-8304:
-

 Summary: Update CHANGELOG to call out agent reconfiguration feature
 Key: MESOS-8304
 URL: https://issues.apache.org/jira/browse/MESOS-8304
 Project: Mesos
  Issue Type: Documentation
Reporter: Vinod Kone
Assignee: Benno Evers


This is a feature worth calling out in the 1.5.0 CHANGELOG.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (MESOS-8303) Add user doc for agent reconfiguration

2017-12-05 Thread Vinod Kone (JIRA)
Vinod Kone created MESOS-8303:
-

 Summary: Add user doc for agent reconfiguration
 Key: MESOS-8303
 URL: https://issues.apache.org/jira/browse/MESOS-8303
 Project: Mesos
  Issue Type: Documentation
Reporter: Vinod Kone
Assignee: Benno Evers








[jira] [Commented] (MESOS-8302) Improve master failover performance.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279353#comment-16279353
 ] 

Michael Park commented on MESOS-8302:
-

{noformat}
commit 6839897c5464fce6b8cbd253d959a7e2efd72987
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:21:27 2017 -0800

Made `Event` move-only in libprocess.

Review: https://reviews.apache.org/r/64347/
{noformat}
{noformat}
commit c9e6a03c02e9f8dc040b937ccd5ae89e5530fd7e
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:21:11 2017 -0800

Used `std::move` for `Event`s consumption in the master.

Review: https://reviews.apache.org/r/63641/
{noformat}

> Improve master failover performance.
> 
>
> Key: MESOS-8302
> URL: https://issues.apache.org/jira/browse/MESOS-8302
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> This is somewhat more like an epic, but will track the different improvements 
> here for now.





[jira] [Commented] (MESOS-8302) Improve master failover performance.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279350#comment-16279350
 ] 

Michael Park commented on MESOS-8302:
-

{noformat}
commit c9462f4927cfffb1f3a90827467ded730c0f40b9
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:28 2017 -0800

Migrated to use the `EventConsumer` interface.

Review: https://reviews.apache.org/r/63632/
{noformat}
{noformat}
commit 6b91f62769a1f8c525162fc716b1d5c231c77811
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:23 2017 -0800

Separated `Event` visitation and consumption.

This introduces the `EventConsumer` interface, which adds support for
`Event`s with move-only data. This allows consumers to move data out of
an `Event` rather than copying it. This is required to support move-only
objects in `defer`, which must guarantee that the deferred function object
is invoked only once, allowing deferred parameters to be moved into the call.

Review: https://reviews.apache.org/r/63631/
{noformat}
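The split between visiting and consuming an event can be sketched in C++ as follows. This is a simplified illustration under stated assumptions: `MessageEvent`, `EventVisitor`, `EventConsumer` and `Collector` here are hypothetical stand-ins, not the libprocess declarations. A visitor sees the event through a `const` reference, while a consumer receives it as an rvalue and may move its payload out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-ins for libprocess events.
struct MessageEvent;

// A visitor only inspects an event through a const reference.
struct EventVisitor {
  virtual ~EventVisitor() = default;
  virtual void visit(const MessageEvent&) {}
};

// A consumer receives the event as an rvalue and may move data out of it.
struct EventConsumer {
  virtual ~EventConsumer() = default;
  virtual void consume(MessageEvent&&) {}
};

struct MessageEvent {
  explicit MessageEvent(std::unique_ptr<std::string> body)
    : body(std::move(body)) {}

  // Move-only: copying an event would require copying its payload.
  MessageEvent(const MessageEvent&) = delete;
  MessageEvent& operator=(const MessageEvent&) = delete;
  MessageEvent(MessageEvent&&) = default;
  MessageEvent& operator=(MessageEvent&&) = default;

  void visit(EventVisitor* visitor) const { visitor->visit(*this); }

  // Only callable on an rvalue event: consumption gives the event away.
  void consume(EventConsumer* consumer) && {
    consumer->consume(std::move(*this));
  }

  std::unique_ptr<std::string> body;
};

// A consumer that takes ownership of the payload without copying it.
struct Collector : EventConsumer {
  std::unique_ptr<std::string> taken;
  void consume(MessageEvent&& event) override { taken = std::move(event.body); }
};
```

Usage: `std::move(event).consume(&collector)` leaves `collector.taken` owning the original string and `event.body` empty.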
{noformat}
commit 1074a9c3fd7aa9d2e4b484f86dbe657271abecc0
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 07:23:28 2017 -0800

Fixed the signature of `CallableOnce::operator()`.

This changes the `operator()` signature to match the one defined in the
`CallableOnce` template parameter. The previously used form incorrectly
specified that `operator()` can be invoked with an arbitrary number and
types of parameters, which can break other templates that use SFINAE to
check whether a function object can be invoked with specific parameters.

Review: https://reviews.apache.org/r/64337/
{noformat}
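To illustrate the SFINAE problem described above, here is a hedged C++11 sketch contrasting a catch-all variadic `operator()` with one whose signature matches the template parameter. `LooseOnce`, `StrictOnce` and the `is_callable_with` trait are hypothetical names; C++17 code would typically use `std::is_invocable` instead of a hand-written trait.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Detects whether an rvalue F can be invoked with Args... (C++11 SFINAE).
template <typename F, typename... Args>
struct is_callable_with {
 private:
  template <typename G>
  static auto test(int)
      -> decltype(void(std::declval<G>()(std::declval<Args>()...)),
                  std::true_type());

  template <typename>
  static std::false_type test(...);

 public:
  static constexpr bool value = decltype(test<F>(0))::value;
};

template <typename Sig> struct LooseOnce;
template <typename Sig> struct StrictOnce;

// Loose form: the declaration accepts *any* arguments, so SFINAE-based
// checks report every call as valid, even ones that would fail later.
template <typename R, typename... Args>
struct LooseOnce<R(Args...)> {
  template <typename... Ts>
  R operator()(Ts&&... ts) &&;  // declaration only; never called here
};

// Strict form: the signature matches the template parameter exactly.
template <typename R, typename... Args>
struct StrictOnce<R(Args...)> {
  R operator()(Args... args) &&;  // declaration only; never called here
};
```

With the strict form, asking whether `StrictOnce<int(int)>` accepts a `const char*` correctly yields `false`, whereas the loose form claims it does.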

> Improve master failover performance.
> 
>
> Key: MESOS-8302
> URL: https://issues.apache.org/jira/browse/MESOS-8302
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> This is somewhat more like an epic, but will track the different improvements 
> here for now.





[jira] [Commented] (MESOS-8301) Support moving into defer/dispatch/install handlers.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279346#comment-16279346
 ] 

Michael Park commented on MESOS-8301:
-

{noformat}
commit 8014e3f9e1838745a6f3af7c1e2a557bd74349b0
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:20:55 2017 -0800

Added `CallableOnce` support in `Future`.

`Future` guarantees that callbacks are called at most once, so it can
use `lambda::CallableOnce` to explicitly declare this, and allow the
corresponding optimizations with moves.

Review: https://reviews.apache.org/r/63638/
{noformat}
{noformat}
commit 09b72e9bbf87793ce84df5d5f9d5f292c60fa5ee
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:20:41 2017 -0800

Added `CallableOnce` support in `defer`.

This allows the `defer` result to be converted to `CallableOnce`, ensuring
that bound arguments are moved when the call is made, and avoiding
copies of bound arguments.

Review: https://reviews.apache.org/r/63637/
{noformat}
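A minimal sketch of what a callable-once wrapper can look like, assuming C++14 (for the move-only lambda capture in the usage). This hypothetical `CallableOnce` is not the stout implementation; it only shows the idea: the wrapper is move-only and its call operator is rvalue-qualified, so invoking it consumes the wrapper and lets captured state be moved into the call.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical move-only, type-erased function that may be invoked at most
// once. A sketch only; not lambda::CallableOnce from stout.
template <typename Sig> class CallableOnce;

template <typename R, typename... Args>
class CallableOnce<R(Args...)> {
 public:
  template <typename F,
            typename = typename std::enable_if<!std::is_same<
                typename std::decay<F>::type, CallableOnce>::value>::type>
  CallableOnce(F f) : f_(new Model<F>(std::move(f))) {}

  CallableOnce(CallableOnce&&) = default;
  CallableOnce& operator=(CallableOnce&&) = default;

  // Rvalue-qualified: invoking consumes the wrapper and its stored functor.
  R operator()(Args... args) && {
    assert(f_ != nullptr && "CallableOnce invoked twice or after move");
    std::unique_ptr<Concept> f = std::move(f_);
    return std::move(*f).call(std::forward<Args>(args)...);
  }

 private:
  struct Concept {
    virtual ~Concept() = default;
    virtual R call(Args&&... args) && = 0;
  };

  template <typename F>
  struct Model : Concept {
    explicit Model(F f) : f(std::move(f)) {}
    // The functor is invoked as an rvalue, so it may move its captures out.
    R call(Args&&... args) && override {
      return std::move(f)(std::forward<Args>(args)...);
    }
    F f;
  };

  std::unique_ptr<Concept> f_;
};
```

For example, `CallableOnce<std::unique_ptr<int>()>` can wrap a `mutable` lambda that returns `std::move` of a captured `std::unique_ptr`, which a copyable `std::function` could not hold.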

> Support moving into defer/dispatch/install handlers.
> 
>
> Key: MESOS-8301
> URL: https://issues.apache.org/jira/browse/MESOS-8301
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> Currently, dispatch and defer will take copies of the provided arguments. 
> Also, a protobuf message handler cannot move the supplied message. We should 
> support moves for these for efficiency.





[jira] [Commented] (MESOS-8301) Support moving into defer/dispatch/install handlers.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279347#comment-16279347
 ] 

Michael Park commented on MESOS-8301:
-

{noformat}
commit bbd8381ebce3522841e80ae53f56b3049342f15b
Author: Dmitry Zhuk 
Date:   Tue Dec 5 13:47:53 2017 -0800

Replaced `std::shared_ptr` with `std::unique_ptr` in `Future`.

Review: https://reviews.apache.org/r/63913/
{noformat}
{noformat}
commit bca8c6a05d03a2162c04703a9c1ac8172fdfae8a
Author: Dmitry Zhuk 
Date:   Tue Dec 5 13:45:56 2017 -0800

Replaced `std::shared_ptr` with `std::unique_ptr` in `dispatch`.

Since `dispatch` can now handle move-only parameters, the `Promise` and the
function object can be wrapped in `std::unique_ptr` for efficiency.
{noformat}
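The benefit of switching from shared to unique ownership can be sketched with a toy single-threaded mailbox. This is an assumption-laden illustration, not libprocess `dispatch`: `Mailbox` and its thunks are hypothetical, results are delivered through `std::promise`/`std::future`, and only non-`void` return types are handled. Because each queued thunk is move-only and run exactly once, the promise can be owned by a `std::unique_ptr` instead of a `std::shared_ptr`.

```cpp
#include <cassert>
#include <deque>
#include <future>
#include <memory>
#include <utility>

// Hypothetical single-threaded mailbox of one-shot thunks (sketch only).
class Mailbox {
 public:
  // Queues `f` and returns a future for its (non-void) result.
  template <typename F>
  auto dispatch(F f) -> std::future<decltype(f())> {
    using R = decltype(f());
    // Sole owner of the promise: no reference counting needed.
    std::unique_ptr<std::promise<R>> promise(new std::promise<R>());
    std::future<R> future = promise->get_future();
    queue_.emplace_back(new Thunk<F, R>(std::move(f), std::move(promise)));
    return future;
  }

  // Runs queued thunks in FIFO order; each one is moved out and run once.
  void run() {
    while (!queue_.empty()) {
      std::unique_ptr<Base> thunk = std::move(queue_.front());
      queue_.pop_front();
      thunk->run();
    }
  }

 private:
  struct Base {
    virtual ~Base() = default;
    virtual void run() = 0;
  };

  template <typename F, typename R>
  struct Thunk : Base {
    Thunk(F f, std::unique_ptr<std::promise<R>> p)
      : f(std::move(f)), promise(std::move(p)) {}
    void run() override { promise->set_value(f()); }
    F f;
    std::unique_ptr<std::promise<R>> promise;
  };

  std::deque<std::unique_ptr<Base>> queue_;
};
```

Since the thunks live in `std::unique_ptr`s, a dispatched lambda may itself capture move-only state such as another `std::unique_ptr`.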

> Support moving into defer/dispatch/install handlers.
> 
>
> Key: MESOS-8301
> URL: https://issues.apache.org/jira/browse/MESOS-8301
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> Currently, dispatch and defer will take copies of the provided arguments. 
> Also, a protobuf message handler cannot move the supplied message. We should 
> support moves for these for efficiency.





[jira] [Commented] (MESOS-8301) Support moving into defer/dispatch/install handlers.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279344#comment-16279344
 ] 

Michael Park commented on MESOS-8301:
-

{noformat}
commit 13eda27802cfe05800d4dcbed22c46ca7b46bafb
Author: Dmitry Zhuk 
Date:   Tue Dec 5 10:40:37 2017 -0800

Prepared `defer` for use in callable-once contexts.

This changes `defer` to use `lambda::partial` instead of `std::bind`,
which allows it to be used in callable-once contexts.

Review: https://reviews.apache.org/r/63635/
{noformat}
{noformat}
commit 0d9ce98e9df97be06144d2e29cf23a9c090a06b3
Author: Dmitry Zhuk 
Date:   Tue Dec 5 10:39:47 2017 -0800

Changed dispatch to use callable once functors.

`dispatch` guarantees that the functor will be called at most once, and
therefore allows optimizations such as moving deferred objects.

Review: https://reviews.apache.org/r/63634/
{noformat}
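Why `lambda::partial` suits callable-once contexts better than `std::bind` can be sketched with a minimal partial-application helper (C++17). This hypothetical `Partial`/`partial` pair supports no placeholders and is not stout's implementation; it only shows that an rvalue-qualified call operator can move the bound arguments into the callee, whereas `std::bind` always passes its bound arguments as lvalues.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Binds leading arguments; the remaining ones are supplied at call time.
template <typename F, typename... Bound>
class Partial {
 public:
  Partial(F f, Bound... bound)
    : f_(std::move(f)), bound_(std::move(bound)...) {}

  // Lvalue call: bound arguments are passed as lvalues (copied if needed).
  template <typename... Rest>
  decltype(auto) operator()(Rest&&... rest) & {
    return std::apply(
        [&](Bound&... b) -> decltype(auto) {
          return std::invoke(f_, b..., std::forward<Rest>(rest)...);
        },
        bound_);
  }

  // Rvalue call: bound arguments are moved into the callee.
  template <typename... Rest>
  decltype(auto) operator()(Rest&&... rest) && {
    return std::apply(
        [&](Bound&... b) -> decltype(auto) {
          return std::invoke(std::move(f_), std::move(b)...,
                             std::forward<Rest>(rest)...);
        },
        bound_);
  }

 private:
  F f_;
  std::tuple<Bound...> bound_;
};

template <typename F, typename... Bound>
Partial<std::decay_t<F>, std::decay_t<Bound>...> partial(F&& f, Bound&&... bound) {
  return {std::forward<F>(f), std::forward<Bound>(bound)...};
}
```

With this shape, binding a `std::unique_ptr` works as long as the result is invoked as an rvalue, e.g. `std::move(p)()`.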

> Support moving into defer/dispatch/install handlers.
> 
>
> Key: MESOS-8301
> URL: https://issues.apache.org/jira/browse/MESOS-8301
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> Currently, dispatch and defer will take copies of the provided arguments. 
> Also, a protobuf message handler cannot move the supplied message. We should 
> support moves for these for efficiency.





[jira] [Commented] (MESOS-8302) Improve master failover performance.

2017-12-05 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279341#comment-16279341
 ] 

Benjamin Mahler commented on MESOS-8302:


{noformat}
commit f8e4f11e796b2b0d9bd101a7eb6f106e72cbaee1
Author: Dmitry Zhuk 
Date:   Mon Aug 7 11:01:04 2017 -0700

Reduced copying in `defer`, `dispatch` and `Future`.

This reduces the number of copies made for each parameter in
a piece of code like this:

```
future.then(defer(pid, ::someMethod, param1, param2));
```

For objects that do not support move semantics
(e.g., protobuf messages), the number of copies is reduced from 8-10 to 6.
If move semantics are supported, the number of copies is reduced from
6-7 to 1 if the parameter is passed with `std::move`, or 2 otherwise.

Review: https://reviews.apache.org/r/60003/
{noformat}
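Where such copies come from can be illustrated with an instrumented type. This hedged sketch does not reproduce the counts quoted in the commit; `Counted`, `Sink` and the two layering styles are hypothetical. It shows that passing through layers by `const&` and copying costs one copy per layer, while perfect forwarding turns an rvalue argument into moves only.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type that counts copies and moves.
struct Counted {
  static int copies;
  static int moves;
  Counted() = default;
  Counted(const Counted&) { ++copies; }
  Counted(Counted&&) noexcept { ++moves; }
  Counted& operator=(const Counted&) { ++copies; return *this; }
  Counted& operator=(Counted&&) noexcept { ++moves; return *this; }
  static void reset() { copies = 0; moves = 0; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// The final consumer stores its argument.
struct Sink { Counted stored; };

// Copying style: each layer takes const& and makes its own copy.
void storeCopy(Sink& sink, const Counted& c) { sink.stored = c; }
void layerCopy(Sink& sink, const Counted& c) {
  Counted local = c;
  storeCopy(sink, local);
}

// Forwarding style: each layer forwards, so rvalues are only ever moved.
template <typename T>
void storeFwd(Sink& sink, T&& c) { sink.stored = std::forward<T>(c); }
template <typename T>
void layerFwd(Sink& sink, T&& c) { storeFwd(sink, std::forward<T>(c)); }
```

Passing an lvalue through the forwarding layers still costs exactly one copy (at the final store), which mirrors the general pattern of "1 with `std::move`, more otherwise".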

{noformat}
commit 834053d976e2db18c16e1612b3b723fe1c8ca1ac
Author: Dmitry Zhuk 
Date:   Thu Oct 12 14:45:26 2017 -0700

Used protobuf arenas for creating messages in ProtobufProcess.

When passing const protobuf messages and fields, we can allocate
the protobuf message within an arena. Arenas dramatically reduce
the number of malloc calls involved. The use of arenas also improves
the cache locality of the protobuf memory.

Review: https://reviews.apache.org/r/62901/
{noformat}
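Protobuf's own arena API is not shown here. As an analogous illustration (a different allocator, swapped in deliberately), C++17's `std::pmr::monotonic_buffer_resource` also carves many small allocations out of one large upstream block. `CountingResource` and `allocateMany` are hypothetical helpers that count how often the underlying allocator is actually hit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Upstream resource that counts how many times it is asked for memory.
class CountingResource : public std::pmr::memory_resource {
 public:
  int allocations = 0;

 private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    ++allocations;
    return std::pmr::new_delete_resource()->allocate(bytes, align);
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    std::pmr::new_delete_resource()->deallocate(p, bytes, align);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
};

// Creates `count` small short-lived vectors, each allocating from `resource`.
void allocateMany(std::pmr::memory_resource* resource, int count) {
  for (int i = 0; i < count; ++i) {
    std::pmr::vector<int> fields(resource);
    fields.reserve(4);  // exactly one request to `resource`
    fields.push_back(i);
  }
}
```

Allocating through the counting resource directly hits it once per vector, while routing the same work through a sufficiently large monotonic buffer typically needs a single upstream allocation.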

{noformat}
commit f569b841cbd8d8b07eb906a711aeb2d395096af6
Author: Dmitry Zhuk 
Date:   Fri Oct 13 12:06:31 2017 -0700

Simplified RepeatedPtrField to vector conversion.

It's also possible that `vector` can take advantage of
`RepeatedPtrField` iterators implementing the `RandomAccessIterator`
concept and avoid buffer resizes when constructed from a range.

Review: https://reviews.apache.org/r/62973/
{noformat}
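A small sketch of the two conversion styles, using `std::deque` as a stand-in for `RepeatedPtrField` since both provide random-access iterators (an assumption made only to keep the example self-contained). The standard guarantees that constructing a `std::vector` from a range of forward (or stronger) iterators performs no reallocations.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Stand-in for a protobuf repeated field (RepeatedPtrField is not used).
using Repeated = std::deque<std::string>;

// Element-by-element copy: the vector may grow (and reallocate) repeatedly.
std::vector<std::string> convertLoop(const Repeated& field) {
  std::vector<std::string> result;
  for (const std::string& s : field) {
    result.push_back(s);
  }
  return result;
}

// Range construction: with forward (or stronger) iterators the vector can
// compute the distance up front and allocate once.
std::vector<std::string> convertRange(const Repeated& field) {
  return std::vector<std::string>(field.begin(), field.end());
}
```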

> Improve master failover performance.
> 
>
> Key: MESOS-8302
> URL: https://issues.apache.org/jira/browse/MESOS-8302
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> This is somewhat more like an epic, but will track the different improvements 
> here for now.





[jira] [Commented] (MESOS-8301) Support moving into defer/dispatch/install handlers.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279337#comment-16279337
 ] 

Michael Park commented on MESOS-8301:
-

{noformat}
commit ececa115966067551e00098f973f0104a0782730
Author: Michael Park 
Date:   Tue Dec 5 09:57:51 2017 -0800

Removed `constexpr`-ness from `cpp17::invoke`.

`std::invoke` is not marked `constexpr`.

Review: https://reviews.apache.org/r/64332/
{noformat}
{noformat}
commit 59329cbc605d578572e90db62070b2bb99fdbb9c
Author: Michael Park 
Date:   Tue Dec 5 09:57:40 2017 -0800

Added more `cpp17::invoke` test cases.

Review: https://reviews.apache.org/r/64331/
{noformat}

> Support moving into defer/dispatch/install handlers.
> 
>
> Key: MESOS-8301
> URL: https://issues.apache.org/jira/browse/MESOS-8301
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> Currently, dispatch and defer will take copies of the provided arguments. 
> Also, a protobuf message handler cannot move the supplied message. We should 
> support moves for these for efficiency.





[jira] [Commented] (MESOS-8301) Support moving into defer/dispatch/install handlers.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279336#comment-16279336
 ] 

Michael Park commented on MESOS-8301:
-

{noformat}
commit 8670d2e9224485b94a40654d4a2102d8340fe4ac
Author: Michael Park 
Date:   Mon Dec 4 17:25:36 2017 -0800

Added `lambda::partial` to .

Review: https://reviews.apache.org/r/64274/
{noformat}
{noformat}
commit 3b9db404cb0b50edef1958423b26364e63ccaa27
Author: Michael Park 
Date:   Mon Dec 4 17:25:33 2017 -0800

Added `cpp17::invoke` in .

Review: https://reviews.apache.org/r/64248/
{noformat}
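Judging by its name, `cpp17::invoke` presumably mirrors C++17 `std::invoke`; under that assumption, the sketch below uses the standard version directly to show the uniform calling convention it provides for free functions, member function pointers (on objects or pointers) and data member pointers. `Agent`, `twice` and `call` are hypothetical. In C++17, `std::invoke` is not `constexpr` (that changed in C++20), which matches the reason given for removing `constexpr` from `cpp17::invoke`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

struct Agent {
  std::string id;
  std::string describe(const std::string& prefix) const { return prefix + id; }
};

int twice(int x) { return 2 * x; }

// One generic code path that can call any kind of callable via INVOKE.
template <typename F, typename... Args>
decltype(auto) call(F&& f, Args&&... args) {
  return std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
}
```

For instance, `call(&Agent::describe, agent, "id=")` and `call(&Agent::id, agent)` go through the same helper as `call(twice, 21)`.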

> Support moving into defer/dispatch/install handlers.
> 
>
> Key: MESOS-8301
> URL: https://issues.apache.org/jira/browse/MESOS-8301
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> Currently, dispatch and defer will take copies of the provided arguments. 
> Also, a protobuf message handler cannot move the supplied message. We should 
> support moves for these for efficiency.





[jira] [Comment Edited] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279048#comment-16279048
 ] 

Michael Park edited comment on MESOS-6972 at 12/5/17 11:05 PM:
---

{noformat}
commit 8670d2e9224485b94a40654d4a2102d8340fe4ac
Author: Michael Park 
Date:   Mon Dec 4 17:25:36 2017 -0800

Added `lambda::partial` to .

Review: https://reviews.apache.org/r/64274/
{noformat}
{noformat}
commit 3b9db404cb0b50edef1958423b26364e63ccaa27
Author: Michael Park 
Date:   Mon Dec 4 17:25:33 2017 -0800

Added `cpp17::invoke` in .

Review: https://reviews.apache.org/r/64248/
{noformat}


was (Author: mcypark):
{noformat}
commit 8670d2e9224485b94a40654d4a2102d8340fe4ac (private/ci/mpark/utility, 
ci/mpark/utility)
Author: Michael Park 
Date:   Mon Dec 4 17:25:36 2017 -0800

Added `lambda::partial` to .

Review: https://reviews.apache.org/r/64274/
{noformat}
{noformat}
commit 3b9db404cb0b50edef1958423b26364e63ccaa27
Author: Michael Park 
Date:   Mon Dec 4 17:25:33 2017 -0800

Added `cpp17::invoke` in .

Review: https://reviews.apache.org/r/64248/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Commented] (MESOS-7681) Add safeguard for new agents with new features + old master

2017-12-05 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279334#comment-16279334
 ] 

Vinod Kone commented on MESOS-7681:
---

FYI, Master capabilities have landed. [~mcypark] will you be working on this?

> Add safeguard for new agents with new features + old master
> ---
>
> Key: MESOS-7681
> URL: https://issues.apache.org/jira/browse/MESOS-7681
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>  Labels: mesosphere
>
> Consider this scenario:
> * Mesos cluster with 3 masters and 1 agent.
> * 2 of the masters (including the leader) are upgraded to Mesos 1.4; 
> remaining master stays at Mesos 1.3 (e.g., due to operator error).
> * Agent is upgraded to Mesos 1.4
> * Framework creates a reservation refinement on the agent
> * Leading master fails; Mesos 1.3 master is elected as the new leader
> In this scenario, the agent will send resources to the master in the new 
> (post-refinement) format, but the master will not understand those new 
> fields. This results in an inconsistency between the agent's resources and 
> the master's view of the agent's resources. This could lead to various 
> problems -- in effect, the reservation the framework previously made has been 
> "forgotten" during master failover. Similarly, if the agent attempts to 
> unreserve the resources (using the master's version of the resource), that 
> operation will be rejected by the agent.
> To fix this, it seems we need an explicit negotiation between the agent and 
> the master as part of registration/re-registration. The agent would examine 
> its resources and say which capabilities it _requires_ of the master (not 
> just the capabilities the agent _supports_); if the master does not support 
> those capabilities, the agent cannot safely register.
> We could implement this either via master capabilities (agent computes the 
> master capabilities it requires and declines to register if the master isn't 
> new enough), or via agent capabilities (agent tells master the capabilities 
> it is "actively using"; master refuses to allow any agent to register that is 
> using a capability the master doesn't recognize/support). Probably the former 
> is safer/cleaner.
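The first option (the agent computes the master capabilities it requires and declines to register otherwise) could look roughly like this. Everything here is hypothetical: the `Capability` enumerator, the `reservationDepth` field used to detect refinement, and the function names are illustrative, not Mesos APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical capability and resource model, for illustration only.
enum class Capability { RESERVATION_REFINEMENT };

struct Resource {
  std::string name;
  int reservationDepth;  // > 1 means a refined (stacked) reservation
};

// The agent derives the master capabilities its current state *requires*.
std::set<Capability> requiredMasterCapabilities(
    const std::vector<Resource>& resources) {
  std::set<Capability> required;
  for (const Resource& r : resources) {
    if (r.reservationDepth > 1) {
      required.insert(Capability::RESERVATION_REFINEMENT);
    }
  }
  return required;
}

// The agent registers only if the master advertises every required capability.
bool canRegister(const std::vector<Resource>& resources,
                 const std::set<Capability>& masterCapabilities) {
  std::set<Capability> required = requiredMasterCapabilities(resources);
  return std::includes(masterCapabilities.begin(), masterCapabilities.end(),
                       required.begin(), required.end());
}
```

In the scenario above, an agent holding a refined reservation would refuse to register with a master that does not advertise refinement support.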





[jira] [Commented] (MESOS-8302) Improve master failover performance.

2017-12-05 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279335#comment-16279335
 ] 

Benjamin Mahler commented on MESOS-8302:


{noformat}
commit 8ea9245e5c6d1f8b84219eb5d8b36f3cec1c6a7a
Author: Dmitry Zhuk 
Date:   Wed Nov 22 11:59:17 2017 -0700

Optimized resources logging in master.

When the master logs agent or task resources, it uses `operator <<` for raw
protobuf data, which outputs resources in JSON format and is rather
slow. However, resources are known to be valid and refined when logged by
the master, so it's faster to use `operator <<` after the protobuf is
converted to `Resources`.

Review: https://reviews.apache.org/r/63959/
{noformat}

{noformat}
commit 24550e7b863fee371877ea9d1d9153e0f5054155
Author: Dmitry Zhuk 
Date:   Wed Nov 29 18:47:13 2017 -0800

Improved master failover performance by avoiding resource conversions.

RepeatedPtrField can be implicitly converted to Resources,
leading to multiple hidden resource conversions on performance-critical
paths in the master. For example, operator += relies on implicit
conversion when invoked with a RepeatedPtrField argument.
Using protobuf also implies data validation and sanitization, e.g. when
converting to Resources, as protobuf generally comes from untrusted
sources. By doing the conversion only once and then reusing the result,
we also save on these checks: operations on Resources are generally
faster because they can trust the data already in Resources.

Review: https://reviews.apache.org/r/64028/
{noformat}
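The hidden-conversion problem can be sketched with a counter. `RawResource`, `RawResources` and this `Resources` class are hypothetical stand-ins, not the Mesos types; the point is only that a non-`explicit` converting constructor makes every `+=` with raw data pay for a conversion, whereas converting once and reusing the result pays once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins: raw (untrusted) data and a validated wrapper.
struct RawResource { std::string name; double value; };
using RawResources = std::vector<RawResource>;

class Resources {
 public:
  static int conversions;  // counts (expensive) validating conversions

  Resources() = default;

  // Implicit on purpose, to mimic the pitfall: easy to trigger by accident.
  Resources(const RawResources& raw) : items_(raw) {
    ++conversions;
    // ... validation and sanitization would happen here ...
  }

  Resources& operator+=(const Resources& other) {
    items_.insert(items_.end(), other.items_.begin(), other.items_.end());
    return *this;
  }

  std::size_t size() const { return items_.size(); }

 private:
  RawResources items_;
};
int Resources::conversions = 0;
```

Writing `total += raw` in a loop converts `raw` on every iteration; hoisting `Resources converted = raw;` out of the loop converts it once.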

{noformat}
commit b7ad2c0d4e7308a70049a6a04f19e3709df0e539
Author: Dmitry Zhuk 
Date:   Tue Nov 21 10:11:46 2017 -0800

Preallocated buffer for resources conversion.

When converting collections of protobuf `Resource` to `Resources`,
`std::vector` could be resized several times. This patch ensures that
there is enough capacity to fit all resources in the `vector`, avoiding
resizes.

Review: https://reviews.apache.org/r/63960/
{noformat}
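A sketch of the preallocation idea, with hypothetical `Raw` and `Converted` types standing in for the protobuf and converted representations. Calling `reserve` before the loop keeps every `push_back` within capacity, so the buffer is never reallocated; the `reallocated` out-parameter exists only to make that observable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical placeholders for protobuf `Resource` and its converted form.
struct Raw { int value; };
struct Converted { int value; };

std::vector<Converted> convertAll(const std::vector<Raw>& raw, bool* reallocated) {
  std::vector<Converted> result;
  result.reserve(raw.size());  // one allocation up front
  const Converted* initial = result.data();
  for (const Raw& r : raw) {
    result.push_back(Converted{r.value});  // stays within reserved capacity
  }
  *reallocated = (result.data() != initial);
  return result;
}
```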

> Improve master failover performance.
> 
>
> Key: MESOS-8302
> URL: https://issues.apache.org/jira/browse/MESOS-8302
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> This is somewhat more like an epic, but will track the different improvements 
> here for now.





[jira] [Created] (MESOS-8302) Improve master failover performance.

2017-12-05 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8302:
--

 Summary: Improve master failover performance.
 Key: MESOS-8302
 URL: https://issues.apache.org/jira/browse/MESOS-8302
 Project: Mesos
  Issue Type: Improvement
  Components: master
Reporter: Benjamin Mahler
Assignee: Dmitry Zhuk


This is somewhat more like an epic, but will track the different improvements 
here for now.





[jira] [Assigned] (MESOS-5675) Add support for master capabilities

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-5675:
-

Assignee: Benno Evers
Target Version/s: 1.5.0

> Add support for master capabilities
> ---
>
> Key: MESOS-5675
> URL: https://issues.apache.org/jira/browse/MESOS-5675
> Project: Mesos
>  Issue Type: Improvement
>  Components: master
>Reporter: Neil Conway
>Assignee: Benno Evers
>  Labels: mesosphere
>
> Right now, frameworks can advertise their capabilities to the master via the 
> {{FrameworkInfo}} they use for registration/re-registration. This allows 
> masters to provide backward compatibility for old frameworks that don't 
> support new functionality.
> To allow new frameworks to support backward compatibility with old masters, 
> the inverse concept would be useful: masters would tell frameworks which 
> capabilities are supported by the master, which the frameworks could then use 
> to decide whether to use features only supported by more recent versions of 
> the master.
> For now, frameworks can work around this by looking at the master's version 
> number, but that seems a bit fragile and hacky.





[jira] [Commented] (MESOS-8301) Support moving into defer/dispatch/install handlers.

2017-12-05 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279325#comment-16279325
 ] 

Benjamin Mahler commented on MESOS-8301:


{{install}} support:

{noformat}
commit 8cd1bd2fcb8a4d2330a781a86b62e55ada6d4984
Author: Dmitry Zhuk 
Date:   Wed Nov 15 14:19:39 2017 -0800

Enabled rvalue reference parameters in protobuf handlers.

Using rvalue reference parameter in protobuf handler opts-out of arena
usage, and allows the handler to move the message.

Review: https://reviews.apache.org/r/63639/
{noformat}
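One way a dispatcher can tell the two handler styles apart is plain overload resolution on the member-function-pointer type. This is a hedged sketch with hypothetical `Process`, `Message` and `install`/`deliver` names, not the libprocess `ProtobufProcess` API: a `Message&&` handler is given ownership and may move data out, while a `const Message&` handler only reads it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical message and process types; not the libprocess API.
struct Message { std::string payload; };

class Process {
 public:
  // `const Message&` handler: the dispatcher keeps ownership, so the message
  // could live in scratch storage (e.g. an arena) owned by the dispatcher.
  void install(void (Process::*handler)(const Message&)) {
    constHandler_ = handler;
  }

  // `Message&&` handler: the dispatcher hands the message over.
  void install(void (Process::*handler)(Message&&)) {
    rvalueHandler_ = handler;
  }

  void deliver(Message message) {
    if (rvalueHandler_ != nullptr) {
      (this->*rvalueHandler_)(std::move(message));
    } else if (constHandler_ != nullptr) {
      (this->*constHandler_)(message);
    }
  }

  void onConst(const Message& m) { seen = m.payload; }           // copies
  void onRvalue(Message&& m) { seen = std::move(m.payload); }    // moves

  std::string seen;

 private:
  void (Process::*constHandler_)(const Message&) = nullptr;
  void (Process::*rvalueHandler_)(Message&&) = nullptr;
};
```

`install(&Process::onRvalue)` selects the rvalue overload purely from the handler's parameter type, so no extra tagging is needed.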

> Support moving into defer/dispatch/install handlers.
> 
>
> Key: MESOS-8301
> URL: https://issues.apache.org/jira/browse/MESOS-8301
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Mahler
>Assignee: Dmitry Zhuk
>
> Currently, dispatch and defer will take copies of the provided arguments. 
> Also, a protobuf message handler cannot move the supplied message. We should 
> support moves for these for efficiency.





[jira] [Created] (MESOS-8301) Support moving into defer/dispatch/install handlers.

2017-12-05 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8301:
--

 Summary: Support moving into defer/dispatch/install handlers.
 Key: MESOS-8301
 URL: https://issues.apache.org/jira/browse/MESOS-8301
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Benjamin Mahler
Assignee: Dmitry Zhuk


Currently, dispatch and defer will take copies of the provided arguments. Also, 
a protobuf message handler cannot move the supplied message. We should support 
moves for these for efficiency.





[jira] [Updated] (MESOS-7607) Support for first-class fault domains.

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7607:
--
Target Version/s: 1.5.0  (was: 1.4.0)

> Support for first-class fault domains.
> --
>
> Key: MESOS-7607
> URL: https://issues.apache.org/jira/browse/MESOS-7607
> Project: Mesos
>  Issue Type: Epic
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Mesos should support a first-class notion of "fault domains", which 
> effectively provide a common vocabulary for describing the region and zone 
> where a node (either master or agent) is located.
> Design doc: 
> https://drive.google.com/open?id=1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8





[jira] [Updated] (MESOS-7607) Support for first-class fault domains.

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7607:
--
Fix Version/s: 1.5.0

> Support for first-class fault domains.
> --
>
> Key: MESOS-7607
> URL: https://issues.apache.org/jira/browse/MESOS-7607
> Project: Mesos
>  Issue Type: Epic
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Mesos should support a first-class notion of "fault domains", which 
> effectively provide a common vocabulary for describing the region and zone 
> where a node (either master or agent) is located.
> Design doc: 
> https://drive.google.com/open?id=1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8





[jira] [Updated] (MESOS-7607) Support for first-class fault domains.

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated MESOS-7607:
--
Fix Version/s: (was: 1.5.0)

> Support for first-class fault domains.
> --
>
> Key: MESOS-7607
> URL: https://issues.apache.org/jira/browse/MESOS-7607
> Project: Mesos
>  Issue Type: Epic
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> Mesos should support a first-class notion of "fault domains", which 
> effectively provide a common vocabulary for describing the region and zone 
> where a node (either master or agent) is located.
> Design doc: 
> https://drive.google.com/open?id=1gEugdkLRbBsqsiFv3urRPRNrHwUC-i1HwfFfHR_MvC8





[jira] [Assigned] (MESOS-8115) Add a master flag to disallow agents that are not configured with fault domain

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-8115:
-

Assignee: Benno Evers  (was: Vinod Kone)

> Add a master flag to disallow agents that are not configured with fault domain
> --
>
> Key: MESOS-8115
> URL: https://issues.apache.org/jira/browse/MESOS-8115
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Benno Evers
>
> Once Mesos masters and agents in a cluster are *all* upgraded to a version 
> where the fault domains feature is available, it is beneficial to enforce 
> that agents without a fault domain configured are not allowed to join the 
> cluster. 
> This is a safety net for operators who could forget to configure the fault 
> domain of a remote agent and let it join the cluster. If this happens, an 
> agent in a remote region will be considered a local agent by the master and 
> frameworks (because the agent's fault domain is not configured), causing tasks to 
> potentially land on a remote agent, which is undesirable.
> Note that this has to be a configurable flag and not enforced by default 
> because otherwise upgrades from a fault domain non-configured cluster to a 
> configured cluster will not be possible.
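A minimal sketch of the proposed check, assuming C++17 `std::optional`. The `FaultDomain` struct, the `requireAgentDomain` field and `validateAgentDomain` are hypothetical names, not the actual Mesos flag or code. The flag defaults to off, matching the requirement that the check must not be enforced by default.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical types and flag name, for illustration only.
struct FaultDomain {
  std::string region;
  std::string zone;
};

struct MasterFlags {
  // Off by default so clusters can be upgraded before domains are configured.
  bool requireAgentDomain = false;
};

// Returns an error message if the agent must be refused, or nullopt if it
// may register.
std::optional<std::string> validateAgentDomain(
    const MasterFlags& flags, const std::optional<FaultDomain>& agentDomain) {
  if (flags.requireAgentDomain && !agentDomain.has_value()) {
    return std::string("Agent has no fault domain configured");
  }
  return std::nullopt;
}
```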





[jira] [Assigned] (MESOS-8292) Update webui to show fault domains

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-8292:
-

Assignee: Benno Evers

> Update webui to show fault domains
> --
>
> Key: MESOS-8292
> URL: https://issues.apache.org/jira/browse/MESOS-8292
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Vinod Kone
>Assignee: Benno Evers
>  Labels: newbie
>
> At the least, the nodes and tasks pages should show which region and zone they 
> are running in. Maybe the home page can also show the region/zone of the 
> leading master.





[jira] [Assigned] (MESOS-8291) Add documentation about fault domains

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-8291:
-

Assignee: Benno Evers

> Add documentation about fault domains
> -
>
> Key: MESOS-8291
> URL: https://issues.apache.org/jira/browse/MESOS-8291
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Benno Evers
>
> We need some user docs for fault domains.





[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart

2017-12-05 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279313#comment-16279313
 ] 

Vinod Kone commented on MESOS-1739:
---

Design doc: 
https://docs.google.com/document/d/1iOENs0JoXPc7sf1NDBCR2tPJ_KxwU4lLtr53SrE5U3Q/edit

> Allow slave reconfiguration on restart
> --
>
> Key: MESOS-1739
> URL: https://issues.apache.org/jira/browse/MESOS-1739
> Project: Mesos
>  Issue Type: Epic
>Reporter: Patrick Reilly
>Assignee: Benno Evers
>  Labels: external-volumes, mesosphere, myriad
>
> Make it so that, either via a slave restart or an out-of-process "reconfigure" 
> ping, the attributes and resources of a slave can be updated to be a superset 
> of what they used to be.
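The "superset" rule can be expressed as a simple check. This sketch assumes a hypothetical model in which scalar resources are a name-to-amount map and attributes are key/value strings; real Mesos resources and attributes are richer than this, and `AgentConfig`/`isSuperset` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: scalar resources by name, attributes as key/value pairs.
struct AgentConfig {
  std::map<std::string, double> resources;
  std::map<std::string, std::string> attributes;
};

// True if `updated` keeps every old attribute unchanged and offers at least
// as much of every old resource, i.e. it is a superset of `previous`.
bool isSuperset(const AgentConfig& previous, const AgentConfig& updated) {
  for (const auto& [name, amount] : previous.resources) {
    auto it = updated.resources.find(name);
    if (it == updated.resources.end() || it->second < amount) return false;
  }
  for (const auto& [key, value] : previous.attributes) {
    auto it = updated.attributes.find(key);
    if (it == updated.attributes.end() || it->second != value) return false;
  }
  return true;
}
```

Under this rule, adding CPUs or a new attribute is accepted on restart, while shrinking memory or changing an existing attribute is rejected.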





[jira] [Assigned] (MESOS-1739) Allow slave reconfiguration on restart

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-1739:
-

Assignee: Benno Evers

> Allow slave reconfiguration on restart
> --
>
> Key: MESOS-1739
> URL: https://issues.apache.org/jira/browse/MESOS-1739
> Project: Mesos
>  Issue Type: Epic
>Reporter: Patrick Reilly
>Assignee: Benno Evers
>  Labels: external-volumes, mesosphere, myriad
>
> Make it so that, either via a slave restart or an out-of-process "reconfigure" 
> ping, the attributes and resources of a slave can be updated to be a superset 
> of what they used to be.





[jira] [Commented] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279279#comment-16279279
 ] 

Michael Park commented on MESOS-6972:
-

{noformat}
commit bbd8381ebce3522841e80ae53f56b3049342f15b
Author: Dmitry Zhuk 
Date:   Tue Dec 5 13:47:53 2017 -0800

Replaced `std::shared_ptr` with `std::unique_ptr` in `Future`.

Review: https://reviews.apache.org/r/63913/
{noformat}
{noformat}
commit bca8c6a05d03a2162c04703a9c1ac8172fdfae8a
Author: Dmitry Zhuk 
Date:   Tue Dec 5 13:45:56 2017 -0800

Replaced `std::shared_ptr` with `std::unique_ptr` in `dispatch`.

Since `dispatch` can now handle move-only parameters, the `Promise` and the
function object can be wrapped in `std::unique_ptr` for efficiency.
{noformat}
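
A minimal sketch of why move-only support lets `std::shared_ptr` give way to `std::unique_ptr`. A `std::promise` is move-only, so a task that captures one cannot be stored in `std::function` (which requires copyable targets) without a shared wrapper. It can, however, be owned by a `std::unique_ptr` to a type-erased task. `EventQueue`, `Task` and `TaskImpl` are hypothetical names, not libprocess classes, and the sketch assumes the dispatched function returns a non-`void` value.

```cpp
#include <deque>
#include <future>
#include <memory>
#include <utility>

// Hypothetical type-erased task (not the libprocess implementation).
struct Task
{
  virtual ~Task() = default;
  virtual void run() = 0;
};

template <typename F>
struct TaskImpl : Task
{
  explicit TaskImpl(F f) : f(std::move(f)) {}
  void run() override { f(); }
  F f;
};

// Toy single-threaded queue that owns its tasks through `unique_ptr`.
class EventQueue
{
public:
  // Enqueues `f` and returns a future for its (non-void) result.
  template <typename F>
  auto dispatch(F f) -> std::future<decltype(f())>
  {
    using R = decltype(f());
    std::promise<R> promise;
    std::future<R> future = promise.get_future();

    // Move-only closure: it owns the promise and the function object.
    auto task = [promise = std::move(promise), f = std::move(f)]() mutable {
      promise.set_value(f());
    };

    tasks.push_back(
        std::make_unique<TaskImpl<decltype(task)>>(std::move(task)));
    return future;
  }

  // Runs and destroys all queued tasks in FIFO order.
  void drain()
  {
    while (!tasks.empty()) {
      std::unique_ptr<Task> task = std::move(tasks.front());
      tasks.pop_front();
      task->run();
    }
  }

private:
  std::deque<std::unique_ptr<Task>> tasks;
};
```

Each task has exactly one owner at all times, so no reference counting is needed.
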

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.

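The copy described above can be illustrated with a small sketch. Here `std::deque<Entry>` is a hypothetical stand-in for a protobuf `RepeatedPtrField`, used to avoid a protobuf dependency. `Entry` counts its own copies. Converting to a `std::vector` copies every entry, while a handler that accepts the container by const reference copies none.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type that counts how many times it is copied.
struct Entry
{
  explicit Entry(std::string name) : name(std::move(name)) {}
  Entry(const Entry& other) : name(other.name) { ++copies; }
  Entry& operator=(const Entry& other)
  {
    name = other.name;
    ++copies;
    return *this;
  }

  std::string name;
  static inline int copies = 0;
};

// Handler that works on any iterable container of `Entry` by const reference.
template <typename Container>
std::size_t countNamed(const Container& entries)
{
  std::size_t count = 0;
  for (const Entry& entry : entries) {
    if (!entry.name.empty()) {
      ++count;
    }
  }
  return count;
}

// Old calling convention: build a vector first, copying every entry.
template <typename Repeated>
std::size_t countNamedViaVector(const Repeated& repeated)
{
  std::vector<Entry> converted(repeated.begin(), repeated.end());
  return countNamed(converted);
}
```
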




[jira] [Assigned] (MESOS-8288) SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.

2017-12-05 Thread Vinod Kone (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone reassigned MESOS-8288:
-

Assignee: Benjamin Mahler

> SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.
> ---
>
> Key: MESOS-8288
> URL: https://issues.apache.org/jira/browse/MESOS-8288
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: CentOS 7;
> Debian 8;
> Ubuntu 16.04;
>Reporter: Alexander Rukletsov
>Assignee: Benjamin Mahler
>  Labels: flaky-test
> Attachments: 
> IgnoreV0ExecutorIfItReregistersWithoutReconnect-badrun.txt
>
>
> {noformat}
> ../../src/tests/slave_tests.cpp:7888
> Actual function call count doesn't match EXPECT_CALL(exec, shutdown(_))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> {noformat}
> Full log attached.





[jira] [Commented] (MESOS-8288) SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.

2017-12-05 Thread Vinod Kone (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279214#comment-16279214
 ] 

Vinod Kone commented on MESOS-8288:
---

[~bmahler] Can you look into this, as it's been pretty flaky? It looks like you 
wrote the test. The expectation on executor shutdown in the test seems wrong, 
because we are not waiting for it to happen before exiting the test.

```
  // Now spoof an executor re-registration, it should be ignored
  // and the agent should not respond.
  EXPECT_NO_FUTURE_PROTOBUFS(ExecutorReregisteredMessage(), _, _);

  Future<Nothing> executorShutdown;
  EXPECT_CALL(exec, shutdown(_))
    .WillOnce(FutureSatisfy(&executorShutdown));

  UPID executorPid = registerExecutorMessage->from;
  UPID agentPid = registerExecutorMessage->to;

  ReregisterExecutorMessage reregisterExecutorMessage;
  reregisterExecutorMessage.mutable_executor_id()->CopyFrom(
      task.executor().executor_id());
  reregisterExecutorMessage.mutable_framework_id()->CopyFrom(
      frameworkId);

  process::post(executorPid, agentPid, reregisterExecutorMessage);

  Clock::settle();
  EXPECT_TRUE(executorShutdown.isPending());

  driver.stop();
  driver.join();
```

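A generic sketch of the "wait before exiting" point, using only the standard library rather than the libprocess/gmock helpers. The event is signalled on another thread, and the test blocks on the future with a bounded timeout instead of finishing first. `awaitReady` and `signalLater` are hypothetical helper names, with `signalLater` standing in for the agent shutting the executor down asynchronously.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Waits up to `timeout` for `future` to become ready.
template <typename T>
bool awaitReady(std::future<T>& future, std::chrono::milliseconds timeout)
{
  return future.wait_for(timeout) == std::future_status::ready;
}

// Hypothetical stand-in for an asynchronous `shutdown` call that happens
// some time after the test body has moved on.
std::thread signalLater(std::promise<void>& promise, std::chrono::milliseconds delay)
{
  return std::thread([&promise, delay] {
    std::this_thread::sleep_for(delay);
    promise.set_value();
  });
}
```

Checking the future immediately, or not at all, races with the other thread. Waiting with a generous timeout turns the race into a deterministic pass, or a clear failure if the event never happens.
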
> SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.
> ---
>
> Key: MESOS-8288
> URL: https://issues.apache.org/jira/browse/MESOS-8288
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: CentOS 7;
> Debian 8;
> Ubuntu 16.04;
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
> Attachments: 
> IgnoreV0ExecutorIfItReregistersWithoutReconnect-badrun.txt
>
>
> {noformat}
> ../../src/tests/slave_tests.cpp:7888
> Actual function call count doesn't match EXPECT_CALL(exec, shutdown(_))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> {noformat}
> Full log attached.





[jira] [Comment Edited] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279122#comment-16279122
 ] 

Michael Park edited comment on MESOS-6972 at 12/5/17 7:54 PM:
--

{noformat}
commit 6839897c5464fce6b8cbd253d959a7e2efd72987
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:21:27 2017 -0800

Made `Event` move-only in libprocess.

Review: https://reviews.apache.org/r/64347/
{noformat}
{noformat}
commit c9e6a03c02e9f8dc040b937ccd5ae89e5530fd7e
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:21:11 2017 -0800

Used `std::move` for `Event`s consumption in the master.

Review: https://reviews.apache.org/r/63641/
{noformat}
{noformat}
commit 8014e3f9e1838745a6f3af7c1e2a557bd74349b0
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:20:55 2017 -0800

Added `CallableOnce` support in `Future`.

`Future` guarantees that callbacks are called at most once, so it can
use `lambda::CallableOnce` to explicitly declare this, and allow
corresponding optimizations with moves.

Review: https://reviews.apache.org/r/63638/
{noformat}
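
A minimal sketch of what a callable-once wrapper can look like, using only the standard library. It is not stout's `lambda::CallableOnce`. The rvalue-qualified `operator()` makes the single use explicit at the call site. The stored function object is moved into the call, so captured move-only state can be handed off instead of copied.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

template <typename Signature>
class CallableOnce;

template <typename R, typename... Args>
class CallableOnce<R(Args...)>
{
public:
  template <typename F>
  CallableOnce(F f) : impl(std::make_unique<Impl<F>>(std::move(f))) {}

  // Rvalue-qualified: callers must write `std::move(c)(args...)`.
  R operator()(Args... args) &&
  {
    if (!impl) {
      throw std::logic_error("CallableOnce invoked more than once");
    }
    std::unique_ptr<Base> consumed = std::move(impl);
    return std::move(*consumed)(std::forward<Args>(args)...);
  }

private:
  struct Base
  {
    virtual ~Base() = default;
    virtual R operator()(Args&&... args) && = 0;
  };

  template <typename F>
  struct Impl : Base
  {
    explicit Impl(F f) : f(std::move(f)) {}

    // Moves the stored function object into the call.
    R operator()(Args&&... args) && override
    {
      return std::move(f)(std::forward<Args>(args)...);
    }

    F f;
  };

  std::unique_ptr<Base> impl;
};
```
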
{noformat}
commit 09b72e9bbf87793ce84df5d5f9d5f292c60fa5ee
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:20:41 2017 -0800

Added `CallableOnce` support in `defer`.

This allows the result of `defer` to be converted to `CallableOnce`,
ensuring that bound arguments are moved when the call is made, which
avoids making copies of them.

Review: https://reviews.apache.org/r/63637/
{noformat}


was (Author: mcypark):
{noformat}
commit 6839897c5464fce6b8cbd253d959a7e2efd72987 (HEAD -> master, 
upstream/master)
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:21:27 2017 -0800

Made `Event` move-only in libprocess.

Review: https://reviews.apache.org/r/64347/
{noformat}
{noformat}
commit c9e6a03c02e9f8dc040b937ccd5ae89e5530fd7e
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:21:11 2017 -0800

Used `std::move` for `Event`s consumption in the master.

Review: https://reviews.apache.org/r/63641/
{noformat}
{noformat}
commit 8014e3f9e1838745a6f3af7c1e2a557bd74349b0
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:20:55 2017 -0800

Added `CallableOnce` support in `Future`.

`Future` guarantees that callbacks are called at most once, so it can
use `lambda::CallableOnce` to explicitly declare this, and allow
corresponding optimizations with moves.

Review: https://reviews.apache.org/r/63638/
{noformat}
{noformat}
commit 09b72e9bbf87793ce84df5d5f9d5f292c60fa5ee
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:20:41 2017 -0800

Added `CallableOnce` support in `defer`.

This allows the result of `defer` to be converted to `CallableOnce`,
ensuring that bound arguments are moved when the call is made, which
avoids making copies of them.

Review: https://reviews.apache.org/r/63637/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Commented] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279122#comment-16279122
 ] 

Michael Park commented on MESOS-6972:
-

{noformat}
commit 6839897c5464fce6b8cbd253d959a7e2efd72987 (HEAD -> master, 
upstream/master)
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:21:27 2017 -0800

Made `Event` move-only in libprocess.

Review: https://reviews.apache.org/r/64347/
{noformat}
{noformat}
commit c9e6a03c02e9f8dc040b937ccd5ae89e5530fd7e
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:21:11 2017 -0800

Used `std::move` for `Event`s consumption in the master.

Review: https://reviews.apache.org/r/63641/
{noformat}
{noformat}
commit 8014e3f9e1838745a6f3af7c1e2a557bd74349b0
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:20:55 2017 -0800

Added `CallableOnce` support in `Future`.

`Future` guarantees that callbacks are called at most once, so it can
use `lambda::CallableOnce` to explicitly declare this, and allow
corresponding optimizations with moves.

Review: https://reviews.apache.org/r/63638/
{noformat}
{noformat}
commit 09b72e9bbf87793ce84df5d5f9d5f292c60fa5ee
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:20:41 2017 -0800

Added `CallableOnce` support in `defer`.

This allows the result of `defer` to be converted to `CallableOnce`,
ensuring that bound arguments are moved when the call is made, which
avoids making copies of them.

Review: https://reviews.apache.org/r/63637/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Updated] (MESOS-8288) SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8288:
---
Environment: 
CentOS 7;
Debian 8;
Ubuntu 16.04;

  was:CentOS 7, Debian 8


> SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.
> ---
>
> Key: MESOS-8288
> URL: https://issues.apache.org/jira/browse/MESOS-8288
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: CentOS 7;
> Debian 8;
> Ubuntu 16.04;
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
> Attachments: 
> IgnoreV0ExecutorIfItReregistersWithoutReconnect-badrun.txt
>
>
> {noformat}
> ../../src/tests/slave_tests.cpp:7888
> Actual function call count doesn't match EXPECT_CALL(exec, shutdown(_))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> {noformat}
> Full log attached.





[jira] [Commented] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279069#comment-16279069
 ] 

Michael Park commented on MESOS-6972:
-

{noformat}
commit 62b4727310873e80c82516971150082924d9075a
Author: Dmitry Zhuk 
Date:   Tue Dec 5 11:04:00 2017 -0800

Reduced # of supported arguments in `_Deferred` conversion operators.

Conversion of `_Deferred` to `std::function` and `Deferred` currently
supports up to 12 parameters in the function signature. However, this is
unnecessary and adds a large overhead. Most usages require just one
parameter (e.g. when `defer` is used with `Future`), and there are a few
usages with two parameters (in `master.cpp` to initialize the allocator,
and in `slave.cpp` to install a signal handler). This number of parameters
is different from the number of parameters passed to `defer`, but it is
related and can be defined as the maximum number of placeholders that can
be passed to `defer`.

Given that `deferred.hpp` is indirectly included in most source files,
it is beneficial to keep this number low. This patch changes the maximum
number of parameters to 2.

Review: https://reviews.apache.org/r/64338/
{noformat}
{noformat}
commit 13eda27802cfe05800d4dcbed22c46ca7b46bafb
Author: Dmitry Zhuk 
Date:   Tue Dec 5 10:40:37 2017 -0800

Prepared `defer` for use in callable-once contexts.

This changes `defer` to use `lambda::partial` instead of `std::bind`,
which allows it to be used in callable-once contexts.

Review: https://reviews.apache.org/r/63635/
{noformat}
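
Why `std::bind` gets in the way: it always passes bound arguments to the target as lvalues, so a target that takes a move-only parameter by value cannot be called through it. The following is a hypothetical `partial` sketch that moves its bound arguments into the target on the call. It follows the same spirit as `lambda::partial`, but it is not that implementation. It is meant to be invoked once, because the bound values are moved out.

```cpp
#include <functional>
#include <memory>
#include <tuple>
#include <utility>

// Hypothetical move-aware partial application (call at most once).
template <typename F, typename... Bound>
auto partial(F f, Bound... bound)
{
  return [f = std::move(f),
          stored = std::make_tuple(std::move(bound)...)](auto&&... rest) mutable {
    return std::apply(
        [&](auto&... values) {
          // Bound values are moved; call-time arguments are forwarded.
          return std::invoke(
              std::move(f),
              std::move(values)...,
              std::forward<decltype(rest)>(rest)...);
        },
        stored);
  };
}
```

For example, `partial(consume, std::make_unique<int>(40))` yields a callable that hands the `unique_ptr` to `consume` when it is invoked.
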
{noformat}
commit 0d9ce98e9df97be06144d2e29cf23a9c090a06b3
Author: Dmitry Zhuk 
Date:   Tue Dec 5 10:39:47 2017 -0800

Changed dispatch to use callable once functors.

`dispatch` guarantees that the functor will be called at most once,
which allows optimizations such as moving deferred objects.

Review: https://reviews.apache.org/r/63634/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Commented] (MESOS-7370) Fix create symlink code to use flag which enables non-admins to make symlinks

2017-12-05 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279063#comment-16279063
 ] 

Andrew Schwartzmeyer commented on MESOS-7370:
-

For the record, this requires at minimum Windows 10 Insiders Build 14972, and 
was formally delivered in the Windows 10 Creators Update (the Spring 2017 
update).

An "invalid parameter" error ({{ERROR_INVALID_PARAMETER}}) indicates that 
Windows needs to be updated.

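One possible way to handle older Windows builds, sketched under stated assumptions. The idea is to try `SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE` first. If the call fails with `ERROR_INVALID_PARAMETER`, retry without that flag, which then only succeeds for elevated processes. The flag and error values are defined locally so the sketch compiles off Windows. `create` is a hypothetical stand-in for a `CreateSymbolicLinkW` call that reports the Win32 error code. This is not the stout implementation.

```cpp
#include <cstdint>
#include <functional>

// Values of the Win32 constants, defined locally for portability.
constexpr std::uint32_t kSymbolicLinkFlagDirectory = 0x1;               // SYMBOLIC_LINK_FLAG_DIRECTORY
constexpr std::uint32_t kSymbolicLinkFlagAllowUnprivilegedCreate = 0x2; // SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE
constexpr std::uint32_t kErrorInvalidParameter = 87;                    // ERROR_INVALID_PARAMETER

// Hypothetical: returns 0 on success, otherwise a Win32 error code.
using CreateSymlinkFn = std::function<std::uint32_t(std::uint32_t flags)>;

std::uint32_t createSymlinkWithFallback(const CreateSymlinkFn& create, bool isDirectory)
{
  const std::uint32_t base = isDirectory ? kSymbolicLinkFlagDirectory : 0u;

  std::uint32_t error = create(base | kSymbolicLinkFlagAllowUnprivilegedCreate);
  if (error == kErrorInvalidParameter) {
    // Older Windows does not recognize the unprivileged flag. Retry
    // without it, which only works for an elevated process.
    error = create(base);
  }
  return error;
}
```

Whether to fall back silently or report "update Windows" is a design choice. Failing loudly, as the comment above describes, makes the requirement visible.
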
> Fix create symlink code to use flag which enables non-admins to make symlinks
> -
>
> Key: MESOS-7370
> URL: https://issues.apache.org/jira/browse/MESOS-7370
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
> Environment: Windows 10
>Reporter: Andrew Schwartzmeyer
>Assignee: Andrew Schwartzmeyer
>  Labels: windows
> Fix For: 1.6.0
>
>
> Specifically {{SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE}}.
> bq. Specify this flag to allow creation of symbolic links when the process is 
> not elevated





[jira] [Comment Edited] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279050#comment-16279050
 ] 

Michael Park edited comment on MESOS-6972 at 12/5/17 7:02 PM:
--

{noformat}
commit c9462f4927cfffb1f3a90827467ded730c0f40b9
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:28 2017 -0800

Migrated to use the `EventConsumer` interface.

Review: https://reviews.apache.org/r/63632/
{noformat}
{noformat}
commit 6b91f62769a1f8c525162fc716b1d5c231c77811
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:23 2017 -0800

Separated `Event` visitation and consumption.

This introduces the `EventConsumer` interface, which adds support for
`Event`s with move-only data. This allows consumers to move data out of
an `Event` rather than copying it. This is required to support move-only
objects in `defer`, guaranteeing that the deferred function object is
invoked only once and allowing deferred parameters to be moved into the call.

Review: https://reviews.apache.org/r/63631/
{noformat}
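
A simplified sketch of the split between visitation and consumption, using hypothetical `MessageEvent`, `EventVisitor` and `EventConsumer` types rather than the libprocess classes. A visitor only sees a const reference. A consumer receives an rvalue and can move a move-only payload out of the event.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical event with a move-only payload.
struct MessageEvent
{
  std::unique_ptr<std::string> body;
};

// Visitation: read-only access; data can only be inspected or copied.
struct EventVisitor
{
  virtual ~EventVisitor() = default;
  virtual void visit(const MessageEvent& event) = 0;
};

// Consumption: the event arrives as an rvalue, so data can be moved out.
struct EventConsumer
{
  virtual ~EventConsumer() = default;
  virtual void consume(MessageEvent&& event) = 0;
};

struct LengthVisitor : EventVisitor
{
  void visit(const MessageEvent& event) override
  {
    length = event.body ? event.body->size() : 0;
  }
  std::size_t length = 0;
};

struct BodyTaker : EventConsumer
{
  void consume(MessageEvent&& event) override { taken = std::move(event.body); }
  std::unique_ptr<std::string> taken;
};
```
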
{noformat}
commit 1074a9c3fd7aa9d2e4b484f86dbe657271abecc0
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 07:23:28 2017 -0800

Fixed the signature of `CallableOnce::operator()`.

This changes the signature of `operator()` to match the one defined in the
`CallableOnce` template parameter. The previously used form incorrectly
specified that `operator()` can be invoked with an arbitrary number and
types of parameters, which can break other templates that use SFINAE to
check whether a function object can be invoked with specific parameters.

Review: https://reviews.apache.org/r/64337/
{noformat}
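
A small sketch of the SFINAE problem, using hypothetical `CatchAll` and `Exact` wrappers around a function pointer. With a variadic catch-all `operator()` and a declared return type, `std::is_invocable` reports that any arguments work, because the body is never instantiated during the check. An `operator()` copied from the signature gives the correct answer.

```cpp
#include <type_traits>
#include <utility>

// Old style: variadic catch-all call operator with a fixed return type.
struct CatchAll
{
  int (*f)(int);

  template <typename... As>
  int operator()(As&&... as) &&
  {
    return f(std::forward<As>(as)...);
  }
};

// Fixed style: `operator()` mirrors the template signature exactly.
template <typename Signature>
struct Exact;

template <typename R, typename... Args>
struct Exact<R(Args...)>
{
  R (*f)(Args...);

  R operator()(Args... args) &&
  {
    return f(std::forward<Args>(args)...);
  }
};

// The catch-all claims to accept a string (wrong); the exact form does not.
static_assert(std::is_invocable_v<CatchAll, const char*>);
static_assert(!std::is_invocable_v<Exact<int(int)>, const char*>);
static_assert(std::is_invocable_v<Exact<int(int)>, int>);
```
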


was (Author: mcypark):
{noformat}
commit c9462f4927cfffb1f3a90827467ded730c0f40b9
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:28 2017 -0800

Migrated to use the `EventConsumer` interface.

Review: https://reviews.apache.org/r/63632/
{noformat}
{noformat}
commit 6b91f62769a1f8c525162fc716b1d5c231c77811
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:23 2017 -0800

Separated `Event` visitation and consumption.

This introduces the `EventConsumer` interface, which adds support for
`Event`s with move-only data. This allows consumers to move data out of
an `Event` rather than copying it. This is required to support move-only
objects in `defer`, guaranteeing that the deferred function object is
invoked only once and allowing deferred parameters to be moved into the call.

Review: https://reviews.apache.org/r/63631/
{noformat}
{noformat}
commit 1074a9c3fd7aa9d2e4b484f86dbe657271abecc0
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 07:23:28 2017 -0800

Fixed the signature of `CallableOnce::operator()`.

This changes the signature of `operator()` to match the one defined in the
`CallableOnce` template parameter. The previously used form incorrectly
specified that `operator()` can be invoked with an arbitrary number and
types of parameters, which can break other templates that use SFINAE to
check whether a function object can be invoked with specific parameters.

Review: https://reviews.apache.org/r/64337/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Comment Edited] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279050#comment-16279050
 ] 

Michael Park edited comment on MESOS-6972 at 12/5/17 7:02 PM:
--

{noformat}
commit c9462f4927cfffb1f3a90827467ded730c0f40b9
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:28 2017 -0800

Migrated to use the `EventConsumer` interface.

Review: https://reviews.apache.org/r/63632/
{noformat}
{noformat}
commit 6b91f62769a1f8c525162fc716b1d5c231c77811
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:23 2017 -0800

Separated `Event` visitation and consumption.

This introduces the `EventConsumer` interface, which adds support for
`Event`s with move-only data. This allows consumers to move data out of
an `Event` rather than copying it. This is required to support move-only
objects in `defer`, guaranteeing that the deferred function object is
invoked only once and allowing deferred parameters to be moved into the call.

Review: https://reviews.apache.org/r/63631/
{noformat}
{noformat}
commit 1074a9c3fd7aa9d2e4b484f86dbe657271abecc0
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 07:23:28 2017 -0800

Fixed the signature of `CallableOnce::operator()`.

This changes the signature of `operator()` to match the one defined in the
`CallableOnce` template parameter. The previously used form incorrectly
specified that `operator()` can be invoked with an arbitrary number and
types of parameters, which can break other templates that use SFINAE to
check whether a function object can be invoked with specific parameters.

Review: https://reviews.apache.org/r/64337/
{noformat}


was (Author: mcypark):
{noformat}
commit c9462f4927cfffb1f3a90827467ded730c0f40b9
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:28 2017 -0800

Migrated to use the `EventConsumer` interface.

Review: https://reviews.apache.org/r/63632/
{noformat}
{noformat}
commit 6b91f62769a1f8c525162fc716b1d5c231c77811
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:23 2017 -0800

Separated `Event` visitation and consumption.

This introduces the `EventConsumer` interface, which adds support for
`Event`s with move-only data. This allows consumers to move data out of
an `Event` rather than copying it. This is required to support move-only
objects in `defer`, guaranteeing that the deferred function object is
invoked only once and allowing deferred parameters to be moved into the call.

Review: https://reviews.apache.org/r/63631/
{noformat}
{noformat}
commit 1074a9c3fd7aa9d2e4b484f86dbe657271abecc0
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 07:23:28 2017 -0800

Fixed the signature of `CallableOnce::operator()`.

This changes the signature of `operator()` to match the one defined in the
`CallableOnce` template parameter. The previously used form incorrectly
specified that `operator()` can be invoked with an arbitrary number and
types of parameters, which can break other templates that use SFINAE to
check whether a function object can be invoked with specific parameters.

Review: https://reviews.apache.org/r/64337/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Commented] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279049#comment-16279049
 ] 

Michael Park commented on MESOS-6972:
-

{noformat}
commit ececa115966067551e00098f973f0104a0782730
Author: Michael Park 
Date:   Tue Dec 5 09:57:51 2017 -0800

Removed `constexpr`-ness from `cpp17::invoke`.

`std::invoke` is not marked `constexpr`.

Review: https://reviews.apache.org/r/64332/
{noformat}
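
For context, `std::invoke` only became `constexpr` in C++20. What `invoke` provides is one call syntax for free functions, lambdas, and pointers to member functions or data members. The following sketch illustrates this, with `Agent` and `mapAll` as hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <vector>

struct Agent
{
  std::string hostname;
  std::size_t hostnameLength() const { return hostname.size(); }
};

// Applies `f` to every item. Thanks to std::invoke, `f` may be a lambda,
// a pointer to member function, or a pointer to data member.
template <typename F, typename T>
auto mapAll(F f, const std::vector<T>& items)
{
  using Result = std::decay_t<std::invoke_result_t<F&, const T&>>;
  std::vector<Result> results;
  results.reserve(items.size());
  for (const T& item : items) {
    results.push_back(std::invoke(f, item));
  }
  return results;
}
```
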
{noformat}
commit 59329cbc605d578572e90db62070b2bb99fdbb9c
Author: Michael Park 
Date:   Tue Dec 5 09:57:40 2017 -0800

Added more `cpp17::invoke` test cases.

Review: https://reviews.apache.org/r/64331/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Commented] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279050#comment-16279050
 ] 

Michael Park commented on MESOS-6972:
-

{noformat}
commit c9462f4927cfffb1f3a90827467ded730c0f40b9
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:28 2017 -0800

Migrated to use the `EventConsumer` interface.

Review: https://reviews.apache.org/r/63632/
{noformat}
{noformat}
commit 6b91f62769a1f8c525162fc716b1d5c231c77811
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 10:11:23 2017 -0800

Separated `Event` visitation and consumption.

This introduces the `EventConsumer` interface, which adds support for
`Event`s with move-only data. This allows consumers to move data out of
an `Event` rather than copying it. This is required to support move-only
objects in `defer`, guaranteeing that the deferred function object is
invoked only once and allowing deferred parameters to be moved into the call.

Review: https://reviews.apache.org/r/63631/
{noformat}
{noformat}
commit 1074a9c3fd7aa9d2e4b484f86dbe657271abecc0
Author: Dmitry Zhuk dz...@twopensource.com
Date:   Tue Dec 5 07:23:28 2017 -0800

Fixed the signature of `CallableOnce::operator()`.

This changes the signature of `operator()` to match the one defined in the
`CallableOnce` template parameter. The previously used form incorrectly
specified that `operator()` can be invoked with an arbitrary number and
types of parameters, which can break other templates that use SFINAE to
check whether a function object can be invoked with specific parameters.

Review: https://reviews.apache.org/r/64337/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Commented] (MESOS-6972) Improve performance of protobuf message passing by removing RepeatedPtrField to vector conversion.

2017-12-05 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279048#comment-16279048
 ] 

Michael Park commented on MESOS-6972:
-

{noformat}
commit 8670d2e9224485b94a40654d4a2102d8340fe4ac (private/ci/mpark/utility, 
ci/mpark/utility)
Author: Michael Park 
Date:   Mon Dec 4 17:25:36 2017 -0800

Added `lambda::partial` to .

Review: https://reviews.apache.org/r/64274/
{noformat}
{noformat}
commit 3b9db404cb0b50edef1958423b26364e63ccaa27
Author: Michael Park 
Date:   Mon Dec 4 17:25:33 2017 -0800

Added `cpp17::invoke` in .

Review: https://reviews.apache.org/r/64248/
{noformat}

> Improve performance of protobuf message passing by removing RepeatedPtrField 
> to vector conversion.
> --
>
> Key: MESOS-6972
> URL: https://issues.apache.org/jira/browse/MESOS-6972
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Mahler
>  Labels: performance, tech-debt
>
> Currently, all protobuf message handlers must take a {{vector}} for repeated 
> fields, rather than a {{RepeatedPtrField}}.
> This requires that a copy be performed of the repeated field's entries (see 
> [here|https://github.com/apache/mesos/blob/9228ebc239dac42825390bebc72053dbf3ae7b09/3rdparty/libprocess/include/process/protobuf.hpp#L78-L87]),
>  which can be very expensive in some cases. We should avoid requiring this 
> expense on the callers.





[jira] [Comment Edited] (MESOS-8286) Making bind mounts readonly fails with user namespaces.

2017-12-05 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278923#comment-16278923
 ] 

James Peach edited comment on MESOS-8286 at 12/5/17 6:18 PM:
-

This happens because once the mount namespace is a child of a user namespace, 
it is considered unprivileged and the {{CL_UNPRIVILEGED}} flag is set on the 
mount. Once this flag is set, a remount that changes the flags *must* 
preserve all the existing flags on the mount (see 
[do_remount|https://github.com/torvalds/linux/blob/master/fs/namespace.c#L2283]).

When we bind mount a file from the host, the mount flags from the host 
filesystem are inherited by the new bind mount. For example:

{noformat}
/NetworkManager/resolv.conf 
/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf
 rw,nosuid,nodev rw,mode=755 tmpfs tmpfs 
...
Failed to remount bind mount as readonly from '/etc/resolv.conf' to 
'/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf':
 Operation not permitted
{noformat}

Updating the {{MS_RDONLY}} flag fails because, although {{MS_REMOUNT}} only 
implements changing {{MS_RDONLY}}, it actually checks that all the per-mount 
flags were preserved, and we omitted the inherited {{MS_NOSUID|MS_NODEV}} flags.

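A minimal sketch of one way to satisfy this check, assuming Linux and glibc. It reads the current per-mount flags with `statvfs(3)` and passes the locked ones (nosuid, nodev, noexec and the atime flags) back together with `MS_RDONLY` in the `MS_REMOUNT | MS_BIND` call. `readOnlyRemountFlags` and `remountReadOnly` are hypothetical helpers, not the Mesos fix.

```cpp
#include <cerrno>
#include <string>
#include <sys/mount.h>
#include <sys/statvfs.h>

// Maps statvfs(3) per-mount flags onto mount(2) flags for a read-only
// bind remount, preserving flags inherited from the host mount.
unsigned long readOnlyRemountFlags(unsigned long statvfsFlags)
{
  unsigned long flags = MS_REMOUNT | MS_BIND | MS_RDONLY;
  if (statvfsFlags & ST_NOSUID)     flags |= MS_NOSUID;
  if (statvfsFlags & ST_NODEV)      flags |= MS_NODEV;
  if (statvfsFlags & ST_NOEXEC)     flags |= MS_NOEXEC;
  if (statvfsFlags & ST_NOATIME)    flags |= MS_NOATIME;
  if (statvfsFlags & ST_NODIRATIME) flags |= MS_NODIRATIME;
  if (statvfsFlags & ST_RELATIME)   flags |= MS_RELATIME;
  return flags;
}

// Returns 0 on success, otherwise an errno value.
int remountReadOnly(const std::string& target)
{
  struct statvfs info;
  if (::statvfs(target.c_str(), &info) != 0) {
    return errno;
  }
  if (::mount(nullptr, target.c_str(), nullptr,
              readOnlyRemountFlags(info.f_flag), nullptr) != 0) {
    return errno;
  }
  return 0;
}
```

For the {{resolv.conf}} mount above ({{rw,nosuid,nodev}}), this requests {{MS_RDONLY|MS_NOSUID|MS_NODEV}} plus the remount flags, instead of {{MS_RDONLY}} alone.
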

was (Author: jamespeach):
This happens because once the mount namespace is a child of a user namespace, 
it is considered unprivileged and the {{CL_UNPRIVILEGED}} flag is set on the 
mount. Once this flag is set, then a remount that changes the flags *must* 
preserve all the existing flags on the mount (see 
[do_remount|https://github.com/torvalds/linux/blob/master/fs/namespace.c#L2283]).
 

When we bind mount a file from the host, the mount flags from the host 
filesystem are inherited by the new bind mount. For example:

{noformat}
/NetworkManager/resolv.conf 
/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf
 rw,nosuid,nodev rw,mode=755 tmpfs tmpfs 
...
Failed to remount bind mount as readonly from '/etc/resolv.conf' to 
'/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf':
 Operation not permitted
{noformat}

Updating the {{MS_RDONLY}} flag fails because {{MS_REMOUNT}} actually updates 
all the flags and we omitted the inherited {{MS_NOSUID|MS_NODEV}} flags.
 

> Making bind mounts readonly fails with user namespaces.
> ---
>
> Key: MESOS-8286
> URL: https://issues.apache.org/jira/browse/MESOS-8286
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>
> When user namespaces are in effect, the additional mounts performed by the 
> CNI isolator to bind host network files read-only fail. The initial bind 
> mount succeeds, but the subsequent remount is failing. The reason for the 
> failure isn't clear to me - there are a number of kernel checks and I don't 
> know which one is failing yet.
> {noformat}
> ...
> [pid 15609] execve("/home/jpeach/src/mesos/build/src/mesos-containerizer", 
> ["/home/jpeach/src/mesos/build/src"..., "launch"], 0x7f74a001c450 /* 30 vars 
> */I1130 17:04:34.281958 15537 containerizer.cpp:2921] Transitioning the state 
> of container 
> 0a0fdd6b-9532-4010-913b-5e36cad6f666.c4b9a777-eb6c-4c4a-9c4c-5d39e23373eb 
> from PREPARING to ISOLATING
> ) = 0
> strace: Process 15610 attached
> [pid 15610] execve("/home/jpeach/src/mesos/build/src/mesos-containerizer", 
> ["mesos-containerizer", "network-cni-setup", "--bind_host_files=false", 
> "--bind_readonly=true", "--etc_hostname_path=/etc/hostnam"..., 
> "--etc_hosts_path=/etc/hosts", "--etc_resolv_conf=/etc/resolv.co"..., 
> "--help=false", "--pid=15609", "--rootfs=/tmp/ExecutorType_UserN"...], 
> 0x58f07f0 /* 24 vars */) = 0
> [pid 15610] mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL) = 0
> [pid 15610] mount("/etc/resolv.conf", 
> 

[jira] [Comment Edited] (MESOS-8286) Making bind mounts readonly fails with user namespaces.

2017-12-05 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278923#comment-16278923
 ] 

James Peach edited comment on MESOS-8286 at 12/5/17 5:43 PM:
-

This happens because once the mount namespace is a child of a user namespace, 
it is considered unprivileged and the {{CL_UNPRIVILEGED}} flag is set on the 
mount. Once this flag is set, then a remount that changes the flags *must* 
preserve all the existing flags on the mount (see 
[do_remount|https://github.com/torvalds/linux/blob/master/fs/namespace.c#L2283]).
 

When we bind mount a file from the host, the mount flags from the host 
filesystem are inherited by the new bind mount. For example:

{noformat}
/NetworkManager/resolv.conf 
/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf
 rw,nosuid,nodev rw,mode=755 tmpfs tmpfs 
...
Failed to remount bind mount as readonly from '/etc/resolv.conf' to 
'/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf':
 Operation not permitted
{noformat}

Updating the {{MS_RDONLY}} flag fails because {{MS_REMOUNT}} actually updates 
all the flags and we omitted the inherited {{MS_NOSUID|MS_NODEV}} flags.
 


was (Author: jamespeach):
This happens because once the mount namespace is a child of a user namespace, 
it is considered unprivileged and the {{CL_UNPRIVILEGED}} flag is set on the 
mount. Once this flag is set, then a remount that changes the flags *must* 
preserve all the existing flags on the mount (see 
[do_remount|https://github.com/torvalds/linux/blob/master/fs/namespace.c#L2283].
 

When we bind mount a file from the host, the mount flags from the host 
filesystem are inherited by the new bind mount. For example:

{noformat}
/NetworkManager/resolv.conf 
/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf
 rw,nosuid,nodev rw,mode=755 tmpfs tmpfs 
...
Failed to remount bind mount as readonly from '/etc/resolv.conf' to 
'/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf':
 Operation not permitted
{noformat}

Updating the {{MS_RDONLY}} flag fails because {{MS_REMOUNT}} actually updates 
all the flags and we omitted the inherited {{MS_NOSUID|MS_NODEV}} flags.
 

> Making bind mounts readonly fails with user namespaces.
> ---
>
> Key: MESOS-8286
> URL: https://issues.apache.org/jira/browse/MESOS-8286
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>
> When user namespaces are in effect, the additional mounts performed by the 
> CNI isolator to bind host network files read-only fail. The initial bind 
> mount succeeds, but the subsequent remount is failing. The reason for the 
> failure isn't clear to me - there are a number of kernel checks and I don't 
> know which one is failing yet.
> {noformat}
> ...
> [pid 15609] execve("/home/jpeach/src/mesos/build/src/mesos-containerizer", 
> ["/home/jpeach/src/mesos/build/src"..., "launch"], 0x7f74a001c450 /* 30 vars 
> */I1130 17:04:34.281958 15537 containerizer.cpp:2921] Transitioning the state 
> of container 
> 0a0fdd6b-9532-4010-913b-5e36cad6f666.c4b9a777-eb6c-4c4a-9c4c-5d39e23373eb 
> from PREPARING to ISOLATING
> ) = 0
> strace: Process 15610 attached
> [pid 15610] execve("/home/jpeach/src/mesos/build/src/mesos-containerizer", 
> ["mesos-containerizer", "network-cni-setup", "--bind_host_files=false", 
> "--bind_readonly=true", "--etc_hostname_path=/etc/hostnam"..., 
> "--etc_hosts_path=/etc/hosts", "--etc_resolv_conf=/etc/resolv.co"..., 
> "--help=false", "--pid=15609", "--rootfs=/tmp/ExecutorType_UserN"...], 
> 0x58f07f0 /* 24 vars */) = 0
> [pid 15610] mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL) = 0
> [pid 15610] mount("/etc/resolv.conf", 
> 

[jira] [Commented] (MESOS-8286) Making bind mounts readonly fails with user namespaces.

2017-12-05 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278923#comment-16278923
 ] 

James Peach commented on MESOS-8286:


This happens because once the mount namespace is a child of a user namespace, 
it is considered unprivileged and the {{CL_UNPRIVILEGED}} flag is set on the 
mount. Once this flag is set, then a remount that changes the flags *must* 
preserve all the existing flags on the mount (see 
[do_remount|https://github.com/torvalds/linux/blob/master/fs/namespace.c#L2283]).
 

When we bind mount a file from the host, the mount flags from the host 
filesystem are inherited by the new bind mount. For example:

{noformat}
/NetworkManager/resolv.conf 
/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf
 rw,nosuid,nodev rw,mode=755 tmpfs tmpfs 
...
Failed to remount bind mount as readonly from '/etc/resolv.conf' to 
'/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_Ea3QSF/provisioner/containers/2378c60f-d0ab-4144-8df5-46a2f2b5e9fe/containers/a1700239-518d-4908-a7b4-21deda36df8a/backends/overlay/rootfses/12c7b91f-5042-473d-bf38-30f4e9127e08/etc/resolv.conf':
 Operation not permitted
{noformat}

Updating the {{MS_RDONLY}} flag fails because {{MS_REMOUNT}} actually updates 
all the flags and we omitted the inherited {{MS_NOSUID|MS_NODEV}} flags.
 

> Making bind mounts readonly fails with user namespaces.
> ---
>
> Key: MESOS-8286
> URL: https://issues.apache.org/jira/browse/MESOS-8286
> Project: Mesos
>  Issue Type: Improvement
>Reporter: James Peach
>Assignee: James Peach
>
> When user namespaces are in effect, the additional mounts performed by the 
> CNI isolator to bind host network files read-only fail. The initial bind 
> mount succeeds, but the subsequent remount is failing. The reason for the 
> failure isn't clear to me - there are a number of kernel checks and I don't 
> know which one is failing yet.
> {noformat}
> ...
> [pid 15609] execve("/home/jpeach/src/mesos/build/src/mesos-containerizer", 
> ["/home/jpeach/src/mesos/build/src"..., "launch"], 0x7f74a001c450 /* 30 vars 
> */I1130 17:04:34.281958 15537 containerizer.cpp:2921] Transitioning the state 
> of container 
> 0a0fdd6b-9532-4010-913b-5e36cad6f666.c4b9a777-eb6c-4c4a-9c4c-5d39e23373eb 
> from PREPARING to ISOLATING
> ) = 0
> strace: Process 15610 attached
> [pid 15610] execve("/home/jpeach/src/mesos/build/src/mesos-containerizer", 
> ["mesos-containerizer", "network-cni-setup", "--bind_host_files=false", 
> "--bind_readonly=true", "--etc_hostname_path=/etc/hostnam"..., 
> "--etc_hosts_path=/etc/hosts", "--etc_resolv_conf=/etc/resolv.co"..., 
> "--help=false", "--pid=15609", "--rootfs=/tmp/ExecutorType_UserN"...], 
> 0x58f07f0 /* 24 vars */) = 0
> [pid 15610] mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL) = 0
> [pid 15610] mount("/etc/resolv.conf", 
> "/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_IMJpTh/provisioner/containers/0a0fdd6b-9532-4010-913b-5e36cad6f666/containers/c4b9a777-eb6c-4c4a-9c4c-5d39e23373eb/backends/overlay/rootfses/0aaba267-75e7-444a-9f3a-adb22adcf195/etc/resolv.conf",
>  NULL, MS_BIND, NULL) = 0
> [pid 15610] mount(NULL, 
> "/tmp/ExecutorType_UserNamespaceIsolatorTest_ROOT_USER_DockerTask_DefaultExecutor_IMJpTh/provisioner/containers/0a0fdd6b-9532-4010-913b-5e36cad6f666/containers/c4b9a777-eb6c-4c4a-9c4c-5d39e23373eb/backends/overlay/rootfses/0aaba267-75e7-444a-9f3a-adb22adcf195/etc/resolv.conf",
>  NULL, MS_RDONLY|MS_REMOUNT, NULL) = -1 EPERM (Operation not permitted)
> [pid 15610] +++ exited with 1 +++
> ...
> {noformat}
> Note that in this log I've experimentally modified the mount flags, but that 
> doesn't make any difference.





[jira] [Created] (MESOS-8300) build fault

2017-12-05 Thread john skaller (JIRA)
john skaller created MESOS-8300:
---

 Summary: build fault
 Key: MESOS-8300
 URL: https://issues.apache.org/jira/browse/MESOS-8300
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.4.1
 Environment: OSX  10.12.1
~/mesos-1.4.1/build>clang --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.1.0
Thread model: posix
InstalledDir: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Reporter: john skaller


../../../3rdparty/libprocess/../stout/include/stout/jsonify.hpp:113:3: error: 
‘locale_t’ does not name a type
   locale_t original_locale_;





[jira] [Comment Edited] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2017-12-05 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278640#comment-16278640
 ] 

Alexander Rukletsov edited comment on MESOS-3160 at 12/5/17 2:46 PM:
-

At the moment of writing the segfault has not been observed for some time (and 
is probably fixed by MESOS-7921). However, the test still fails frequently with 
the following error:
{noformat}
../../src/tests/containerizer/cgroups_tests.cpp:1132
helper.increaseRSS(os::pagesize()): Failed to sync with the subprocess
{noformat}


was (Author: alexr):
At the moment of writing the segfault has not been observed for some time (and 
is probably fixed by ). However, the test still fails frequently with the 
following error:
{noformat}
../../src/tests/containerizer/cgroups_tests.cpp:1132
helper.increaseRSS(os::pagesize()): Failed to sync with the subprocess
{noformat}

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
> Environment: Ubuntu 14.04
> CentOS 7
>Reporter: Paul Brett
>  Labels: cgroups, flaky-test, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)





[jira] [Commented] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2017-12-05 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278640#comment-16278640
 ] 

Alexander Rukletsov commented on MESOS-3160:


At the moment of writing the segfault has not been observed for some time (and 
is probably fixed by ). However, the test still fails frequently with the 
following error:
{noformat}
../../src/tests/containerizer/cgroups_tests.cpp:1132
helper.increaseRSS(os::pagesize()): Failed to sync with the subprocess
{noformat}

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
> Environment: Ubuntu 14.04
> CentOS 7
>Reporter: Paul Brett
>  Labels: cgroups, flaky-test, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)





[jira] [Updated] (MESOS-3160) CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-3160:
---
Environment: 
Ubuntu 14.04
CentOS 7

> CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseRSS Flaky
> 
>
> Key: MESOS-3160
> URL: https://issues.apache.org/jira/browse/MESOS-3160
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.24.0, 0.26.0
> Environment: Ubuntu 14.04
> CentOS 7
>Reporter: Paul Brett
>  Labels: cgroups, flaky-test, mesosphere
>
> Test will occasionally fail with:
> [ RUN  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): Failed to sync with the subprocess
> ../../src/tests/containerizer/cgroups_tests.cpp:1103: Failure
> helper.increaseRSS(getpagesize()): The subprocess has not been spawned yet
> [  FAILED  ] CgroupsAnyHierarchyMemoryPressureTest.ROOT_IncreaseUnlockedRSS 
> (223 ms)





[jira] [Assigned] (MESOS-8123) GPU tests are failing due to TASK_STARTING.

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov reassigned MESOS-8123:
--

Assignee: Benno Evers  (was: Alexander Rukletsov)

> GPU tests are failing due to TASK_STARTING.
> ---
>
> Key: MESOS-8123
> URL: https://issues.apache.org/jira/browse/MESOS-8123
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Jie Yu
>Assignee: Benno Evers
>  Labels: flaky-test, mesosphere
> Fix For: 1.5.0
>
>
> For instance: NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_VerifyDeviceAccess
> {noformat}
> I1020 22:18:46.180371  1480 exec.cpp:237] Executor registered on agent 
> ca0e7b44-c621-4442-a62e-15f7bf02064b-S0
> I1020 22:18:46.185027  1486 executor.cpp:171] Received SUBSCRIBED event
> I1020 22:18:46.186005  1486 executor.cpp:175] Subscribed executor on core-dev
> I1020 22:18:46.186189  1486 executor.cpp:171] Received LAUNCH event
> I1020 22:18:46.188908  1486 executor.cpp:637] Starting task 
> 3c08cf78-575d-4813-82b6-3ace272db35e
> I1020 22:18:46.192939  1316 slave.cpp:4407] Handling status update 
> TASK_STARTING (UUID: 87cee290-b2fe-4459-9b75-b9f03aab6492) for task 
> 3c08cf78-575d-4813-82b6-3ace272db35e of framework 
> ca0e7b44-c621-4442-a62e-15f7bf02064b- from 
> executor(1)@10.0.49.2:42711
> I1020 22:18:46.196228  1330 status_update_manager.cpp:323] Received status 
> update TASK_STARTING (UUID: 87cee290-b2fe-4459-9b75-b9f03aab6492) for task 
> 3c08cf78-575d-4813-82b6-3ace272db35e of framework 
> ca0e7b44-c621-4442-a62e-15f7bf02064b-
> I1020 22:18:46.197510  1329 slave.cpp:4888] Forwarding the update 
> TASK_STARTING (UUID: 87cee290-b2fe-4459-9b75-b9f03aab6492) for task 
> 3c08cf78-575d-4813-82b6-3ace272db35e of framework 
> ca0e7b44-c621-4442-a62e-15f7bf02064b- to master@10.0.49.2:34819
> I1020 22:18:46.197927  1329 slave.cpp:4798] Sending acknowledgement for 
> status update TASK_STARTING (UUID: 87cee290-b2fe-4459-9b75-b9f03aab6492) for 
> task 3c08cf78-575d-4813-82b6-3ace272db35e of framework 
> ca0e7b44-c621-4442-a62e-15f7bf02064b- to 
> executor(1)@10.0.49.2:42711
> I1020 22:18:46.198098  1332 master.cpp:6998] Status update TASK_STARTING 
> (UUID: 87cee290-b2fe-4459-9b75-b9f03aab6492) for task 
> 3c08cf78-575d-4813-82b6-3ace272db35e of framework 
> ca0e7b44-c621-4442-a62e-15f7bf02064b- from agent 
> ca0e7b44-c621-4442-a62e-15f7bf02064b-S0 at slave(1)@10.0.49.2:34819 (core-dev)
> I1020 22:18:46.198187  1332 master.cpp:7060] Forwarding status update 
> TASK_STARTING (UUID: 87cee290-b2fe-4459-9b75-b9f03aab6492) for task 
> 3c08cf78-575d-4813-82b6-3ace272db35e of 
> framework ca0e7b44-c621-4442-a62e-15f7bf02064b-
> I1020 22:18:46.198463  1332 master.cpp:9162] Updating the state of task 
> 3c08cf78-575d-4813-82b6-3ace272db35e of framework 
> ca0e7b44-c621-4442-a62e-15f7bf02064b- (latest state:
>  TASK_STARTING, status update state: TASK_STARTING)
> I1020 22:18:46.199198  1331 master.cpp:5566] Processing ACKNOWLEDGE call 
> 87cee290-b2fe-4459-9b75-b9f03aab6492 for task 
> 3c08cf78-575d-4813-82b6-3ace272db35e of framework ca0e7b44-
> c621-4442-a62e-15f7bf02064b- (default) at 
> scheduler-f2b66689-382a-4b8c-bdc9-978cff922409@10.0.49.2:34819 on agent 
> ca0e7b44-c621-4442-a62e-15f7bf02064b-S0
> /home/jie/workspace/mesos/src/tests/containerizer/nvidia_gpu_isolator_tests.cpp:142:
>  Failure
>   Expected: TASK_RUNNING
> To be equal to: statusRunning1->state()
>   Which is: TASK_STARTING
> {noformat}
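The log suggests the test binds the first status update it receives to {{statusRunning1}}; now that the executor sends {{TASK_STARTING}} first, that update is no longer {{TASK_RUNNING}}. A minimal sketch of the adjusted ordering check, using a hypothetical {{TaskState}} enum and helper names rather than the Mesos test fixtures:

```cpp
#include <vector>

// Hypothetical stand-in for the task states that appear in the log.
enum class TaskState { TASK_STARTING, TASK_RUNNING, TASK_FINISHED };

// Old expectation: the very first update is already TASK_RUNNING.
// This no longer holds once TASK_STARTING is sent first.
bool firstUpdateIsRunning(const std::vector<TaskState>& updates) {
  return !updates.empty() && updates.front() == TaskState::TASK_RUNNING;
}

// Adjusted expectation: TASK_STARTING arrives first, then TASK_RUNNING.
bool startingThenRunning(const std::vector<TaskState>& updates) {
  return updates.size() >= 2 &&
         updates[0] == TaskState::TASK_STARTING &&
         updates[1] == TaskState::TASK_RUNNING;
}
```

In the Mesos tests themselves this would correspond to expecting one extra status update before the {{TASK_RUNNING}} one, rather than asserting on the first update received.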





[jira] [Updated] (MESOS-7972) SlaveTest.HTTPSchedulerSlaveRestart test is flaky.

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7972:
---
Summary: SlaveTest.HTTPSchedulerSlaveRestart test is flaky.  (was: 
SlaveTest.HTTPSchedulerSlaveRestart test is flaky)

> SlaveTest.HTTPSchedulerSlaveRestart test is flaky.
> --
>
> Key: MESOS-7972
> URL: https://issues.apache.org/jira/browse/MESOS-7972
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.0
>Reporter: Vinod Kone
>Assignee: Benjamin Mahler
>  Labels: flaky-test, mesosphere
> Fix For: 1.5.0
>
> Attachments: slave_test_http_scheduler_restart.bad.log, 
> slave_test_http_scheduler_restart.good.log
>
>
> Saw this on ASF CI when testing 1.4.0-rc5
> {code}
> [ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
> I0912 05:40:15.280185 32547 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0912 05:40:15.282783 32554 master.cpp:442] Master 
> c23ff8cf-cb2f-40d0-8f18-871a41f128cf (b909d5e22907) started on 
> 172.17.0.2:58922
> I0912 05:40:15.282804 32554 master.cpp:444] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="true" --authenticate_frameworks="true" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/he1E9j/credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/he1E9j/master" 
> --zk_session_timeout="10secs"
> I0912 05:40:15.283092 32554 master.cpp:494] Master only allowing 
> authenticated frameworks to register
> I0912 05:40:15.283110 32554 master.cpp:508] Master only allowing 
> authenticated agents to register
> I0912 05:40:15.283118 32554 master.cpp:521] Master only allowing 
> authenticated HTTP frameworks to register
> I0912 05:40:15.283123 32554 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/he1E9j/credentials'
> I0912 05:40:15.283394 32554 master.cpp:566] Using default 'crammd5' 
> authenticator
> I0912 05:40:15.283543 32554 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0912 05:40:15.283731 32554 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0912 05:40:15.283887 32554 http.cpp:1026] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0912 05:40:15.284021 32554 master.cpp:646] Authorization enabled
> I0912 05:40:15.284293 32552 whitelist_watcher.cpp:77] No whitelist given
> I0912 05:40:15.284335 32550 hierarchical.cpp:171] Initialized hierarchical 
> allocator process
> I0912 05:40:15.287078 32561 master.cpp:2163] Elected as the leading master!
> I0912 05:40:15.287103 32561 master.cpp:1702] Recovering from registrar
> I0912 05:40:15.287214 32557 registrar.cpp:347] Recovering registrar
> I0912 05:40:15.287703 32557 registrar.cpp:391] Successfully fetched the 
> registry (0B) in 455936ns
> I0912 05:40:15.287791 32557 registrar.cpp:495] Applied 1 operations in 
> 24179ns; attempting to update the registry
> I0912 05:40:15.288317 32557 registrar.cpp:552] Successfully updated the 
> registry in 473088ns
> I0912 05:40:15.288435 32557 registrar.cpp:424] Successfully recovered 
> registrar
> I0912 05:40:15.288789 32548 master.cpp:1801] Recovered 0 agents from the 
> registry (129B); allowing 10mins for agents to re-register
> I0912 05:40:15.288822 32559 hierarchical.cpp:209] Skipping recovery of 
> hierarchical allocator: nothing to recover
> I0912 05:40:15.292457 32547 containerizer.cpp:246] Using isolation: 
> posix/cpu,posix/mem,filesystem/posix,network/cni,environment_secret
> W0912 05:40:15.293053 32547 backend.cpp:76] Failed to create 'aufs' backend: 
> AufsBackend requires root privileges
> W0912 05:40:15.293184 32547 backend.cpp:76] Failed 

[jira] [Updated] (MESOS-8058) Agent and master can race when updating agent state.

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8058:
---
Summary: Agent and master can race when updating agent state.  (was: Agent 
and master can race when updating agent state)

> Agent and master can race when updating agent state.
> 
>
> Key: MESOS-8058
> URL: https://issues.apache.org/jira/browse/MESOS-8058
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.5.0
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Critical
>  Labels: mesosphere
> Fix For: 1.5.0
>
>
> In {{2af9a5b07dc80151154264e974d03f56a1c25838}} we introduced the use of 
> {{UpdateSlaveMessage}} for the agent to inform the master about its current 
> total resources. Currently we trigger this message only on agent registration 
> and reregistration.
> This can race with operations applied in the master and communicated via 
> {{CheckpointResourcesMessage}}.
> Example:
> 1. Agent ({{cpus:4(\*)}}) registers.
> 2. Master is triggered to apply an operation to the agent's resources, e.g., 
> a reservation: {{cpus:4(\*) -> cpus:4(A)}}. The master applies the operation 
> to its current view of the agent's resources and sends the agent a 
> {{CheckpointResourcesMessage}} so the agent can persist the result.
> 3. The agent sends the master an {{UpdateSlaveMessage}}, e.g., {{cpus:4(\*)}} 
> since it hasn't received the {{CheckpointResourcesMessage}} yet.
> 4. The master processes the {{UpdateSlaveMessage}} and updates its view of 
> the agent's resources to be {{cpus:4(\*)}}.
> 5. The agent processes the {{CheckpointResourcesMessage}} and updates its 
> view of its resources to be {{cpus:4(A)}}.
> 6. The agent and the master have an inconsistent view of the agent's 
> resources.
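The interleaving above can be replayed deterministically. A minimal sketch, modelling resources as plain strings and each message channel as a FIFO queue; {{Views}} and {{replayRace}} are hypothetical names and no real Mesos types are used:

```cpp
#include <deque>
#include <string>

// What each side believes the agent's total resources are.
struct Views {
  std::string master;
  std::string agent;
};

// Replays steps 1-6 of the description with in-order message queues.
Views replayRace() {
  std::string master, agent;
  std::deque<std::string> checkpointResourcesToAgent;  // CheckpointResourcesMessage
  std::deque<std::string> updateSlaveToMaster;         // UpdateSlaveMessage

  // 1. Agent registers with cpus:4(*); master records that total.
  agent = "cpus:4(*)";
  master = agent;

  // 2. Master applies a reservation to its own view and asks the agent
  //    to checkpoint the result.
  master = "cpus:4(A)";
  checkpointResourcesToAgent.push_back(master);

  // 3. Agent has not processed step 2's message yet and reports its
  //    current (stale) total.
  updateSlaveToMaster.push_back(agent);

  // 4. Master handles UpdateSlaveMessage and overwrites its view.
  master = updateSlaveToMaster.front();
  updateSlaveToMaster.pop_front();

  // 5. Agent handles CheckpointResourcesMessage and adopts the reservation.
  agent = checkpointResourcesToAgent.front();
  checkpointResourcesToAgent.pop_front();

  // 6. The two views now disagree.
  return {master, agent};
}
```

After the replay the master holds {{cpus:4(\*)}} while the agent holds {{cpus:4(A)}}, which is the inconsistency described in step 6.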





[jira] [Updated] (MESOS-7742) ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7742:
---
Description: 
Observed this on ASF CI and internal Mesosphere CI. Affected tests:
{noformat}
AgentAPIStreamingTest.AttachInputToNestedContainerSession
AgentAPITest.LaunchNestedContainerSession
AgentAPITest.AttachContainerInputAuthorization/0
AgentAPITest.LaunchNestedContainerSessionWithTTY/0
AgentAPITest.LaunchNestedContainerSessionDisconnected/1
{noformat}

This issue comes in at least three different flavours. Take 
{{AgentAPIStreamingTest.AttachInputToNestedContainerSession}} as an example.
h5. Flavour 1
{noformat}
../../src/tests/api_tests.cpp:6473
Value of: (response).get().status
  Actual: "503 Service Unavailable"
Expected: http::OK().status
Which is: "200 OK"
Body: ""
{noformat}

h5. Flavour 2
{noformat}
../../src/tests/api_tests.cpp:6473
Value of: (response).get().status
  Actual: "500 Internal Server Error"
Expected: http::OK().status
Which is: "200 OK"
Body: "Disconnected"
{noformat}

h5. Flavour 3
{noformat}
/home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/api_tests.cpp:6367
Value of: (sessionResponse).get().status
  Actual: "500 Internal Server Error"
Expected: http::OK().status
Which is: "200 OK"
Body: ""
{noformat}

  was:
Observed this on ASF CI and internal Mesosphere CI. Affected tests:
{noformat}
AgentAPIStreamingTest.AttachInputToNestedContainerSession
AgentAPITest.LaunchNestedContainerSession
AgentAPITest.AttachContainerInputAuthorization/0
AgentAPITest.LaunchNestedContainerSessionWithTTY/0
AgentAPITest.LaunchNestedContainerSessionDisconnected/1
{noformat}

{code}
[ RUN  ] 
ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession/0
I0629 05:49:33.180673 25301 cluster.cpp:162] Creating default 'local' authorizer
I0629 05:49:33.182234 25306 master.cpp:436] Master 
90ea1640-bdf3-49ba-b78f-b2ba7ea30077 (296af9b598c3) started on 172.17.0.3:45726
I0629 05:49:33.182289 25306 master.cpp:438] Flags at startup: --acls="" 
--agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate_agents="true" --authenticate_frameworks="true" 
--authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
--authenticate_http_readwrite="true" --authenticators="crammd5" 
--authorizers="local" --credentials="/tmp/a5h5J3/credentials" 
--framework_sorter="drf" --help="false" --hostname_lookup="true" 
--http_authenticators="basic" --http_framework_authenticators="basic" 
--initialize_driver_logging="true" --log_auto_initialize="true" 
--logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" 
--max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" 
--max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
--recovery_agent_removal_limit="100%" --registry="in_memory" 
--registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
--registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
--registry_store_timeout="100secs" --registry_strict="false" 
--root_submissions="true" --user_sorter="drf" --version="false" 
--webui_dir="/usr/local/share/mesos/webui" 
--work_dir="/tmp/a5h5J3/master" --zk_session_timeout="10secs"
I0629 05:49:33.182561 25306 master.cpp:488] Master only allowing authenticated 
frameworks to register
I0629 05:49:33.182610 25306 master.cpp:502] Master only allowing authenticated 
agents to register
I0629 05:49:33.182636 25306 master.cpp:515] Master only allowing authenticated 
HTTP frameworks to register
I0629 05:49:33.182656 25306 credentials.hpp:37] Loading credentials for 
authentication from '/tmp/a5h5J3/credentials'
I0629 05:49:33.182915 25306 master.cpp:560] Using default 'crammd5' 
authenticator
I0629 05:49:33.183009 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readonly'
I0629 05:49:33.183151 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-readwrite'
I0629 05:49:33.183218 25306 http.cpp:975] Creating default 'basic' HTTP 
authenticator for realm 'mesos-master-scheduler'
I0629 05:49:33.183284 25306 master.cpp:640] Authorization enabled
I0629 05:49:33.183462 25309 hierarchical.cpp:158] Initialized hierarchical 
allocator process
I0629 05:49:33.183504 25309 whitelist_watcher.cpp:77] No whitelist given
I0629 05:49:33.184311 25308 master.cpp:2161] Elected as the leading master!
I0629 05:49:33.184341 25308 master.cpp:1700] Recovering from registrar
I0629 05:49:33.184404 25308 registrar.cpp:345] Recovering registrar
I0629 05:49:33.184622 25308 registrar.cpp:389] Successfully fetched the 
registry (0B) in 183040ns
I0629 05:49:33.184687 25308 registrar.cpp:493] Applied 1 operations in 6441ns; 
attempting to update the registry
I0629 05:49:33.184885 25304 registrar.cpp:550] 

[jira] [Updated] (MESOS-7028) NetSocketTest.EOFBeforeRecv is flaky

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-7028:
---
Environment: 
ASF CI, autotools, gcc, CentOS 7, libevent/SSL enabled;
Mac OS with SSL enabled;
CentOS 6 with SSL enabled;

  was:
ASF CI, autotools, gcc, CentOS 7, libevent/SSL enabled
Mac OS with SSL enabled


> NetSocketTest.EOFBeforeRecv is flaky
> 
>
> Key: MESOS-7028
> URL: https://issues.apache.org/jira/browse/MESOS-7028
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, test
> Environment: ASF CI, autotools, gcc, CentOS 7, libevent/SSL enabled;
> Mac OS with SSL enabled;
> CentOS 6 with SSL enabled;
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: flaky, flaky-test, libprocess, mesosphere, socket, ssl
>
> This was observed on ASF CI:
> {code}
> [ RUN  ] Encryption/NetSocketTest.EOFBeforeRecv/0
> I0128 03:48:51.444228 27745 openssl.cpp:419] CA file path is unspecified! 
> NOTE: Set CA file path with LIBPROCESS_SSL_CA_FILE=
> I0128 03:48:51.444252 27745 openssl.cpp:424] CA directory path unspecified! 
> NOTE: Set CA directory path with LIBPROCESS_SSL_CA_DIR=
> I0128 03:48:51.444257 27745 openssl.cpp:429] Will not verify peer certificate!
> NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
> I0128 03:48:51.444262 27745 openssl.cpp:435] Will only verify peer 
> certificate if presented!
> NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate 
> verification
> I0128 03:48:51.447341 27745 process.cpp:1246] libprocess is initialized on 
> 172.17.0.2:45515 with 16 worker threads
> ../../../3rdparty/libprocess/src/tests/socket_tests.cpp:196: Failure
> Failed to wait 15secs for client->recv()
> [  FAILED  ] Encryption/NetSocketTest.EOFBeforeRecv/0, where GetParam() = 
> "SSL" (15269 ms)
> {code}
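The test name describes the scenario: the peer closes its end of the connection before the client calls `recv()`, so the client's receive should complete with end-of-file instead of waiting out the 15-second timeout shown above. For reference, here is a minimal sketch of that contract using plain POSIX sockets. It is an illustration under stated assumptions, not the libprocess test: it uses `socketpair()` rather than libprocess's `Socket` class and involves no SSL, so it only shows the plain-socket behavior the SSL-enabled socket is presumably expected to match.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>

// Plain POSIX sketch (NOT the libprocess `Socket` API, no SSL involved).
// Returns the result of recv() on one end of a stream socket pair after the
// other end has already been closed: 0 means orderly EOF, -1 means error.
ssize_t recvAfterPeerClose() {
  int fds[2];
  if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return -1;
  }

  // The "server" side closes before the "client" ever calls recv().
  ::close(fds[1]);

  // No data is pending and the peer is gone, so recv() does not block:
  // it returns 0 immediately to signal end-of-stream.
  char buffer[16];
  ssize_t n = ::recv(fds[0], buffer, sizeof(buffer), 0);

  ::close(fds[0]);
  return n;
}
```

A return value of 0 from `recv()` is the POSIX signal for an orderly shutdown by the peer; in the failing SSL runs, the corresponding receive did not complete within 15 seconds.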





[jira] [Updated] (MESOS-8288) SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8288:
---
Environment: CentOS 7, Debian 8  (was: CentOS 7)

> SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect is flaky.
> ---
>
> Key: MESOS-8288
> URL: https://issues.apache.org/jira/browse/MESOS-8288
> Project: Mesos
>  Issue Type: Bug
>  Components: test
> Environment: CentOS 7, Debian 8
>Reporter: Alexander Rukletsov
>  Labels: flaky-test
> Attachments: 
> IgnoreV0ExecutorIfItReregistersWithoutReconnect-badrun.txt
>
>
> {noformat}
> ../../src/tests/slave_tests.cpp:7888
> Actual function call count doesn't match EXPECT_CALL(exec, shutdown(_))...
>  Expected: to be called once
>Actual: never called - unsatisfied and active
> {noformat}
> Full log attached.





[jira] [Updated] (MESOS-5139) Some ProvisionerDockerLocalStoreTest.* are flaky due to tar issue.

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5139:
---
Summary: Some ProvisionerDockerLocalStoreTest.* are flaky due to tar issue. 
 (was: ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky)

> Some ProvisionerDockerLocalStoreTest.* are flaky due to tar issue.
> --
>
> Key: MESOS-5139
> URL: https://issues.apache.org/jira/browse/MESOS-5139
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0, 1.0.4, 1.1.3, 1.2.3, 1.3.1, 1.4.1
> Environment: Ubuntu 14.04
> Ubuntu 16.04
>Reporter: Vinod Kone
>  Labels: mesosphere
>
> These tests still fail occasionally as of Mesos 1.5.0-wip:
> {code}
> ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
> ProvisionerDockerLocalStoreTest.MetadataManagerInitialization
> ProvisionerDockerLocalStoreTest.MissingLayer
> {code}
> Found this on ASF CI while testing 0.28.1-rc2
> {code}
> [ RUN  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
> E0406 18:29:30.870481   520 shell.hpp:93] Command 'hadoop version 2>&1' 
> failed; this is the output:
> sh: 1: hadoop: not found
> E0406 18:29:30.870576   520 fetcher.cpp:59] Failed to create URI fetcher 
> plugin 'hadoop': Failed to create HDFS client: Failed to execute 'hadoop 
> version 2>&1'; the command was either not found or exited with a non-zero 
> exit status: 127
> I0406 18:29:30.871052   520 local_puller.cpp:90] Creating local puller with 
> docker registry '/tmp/3l8ZBv/images'
> I0406 18:29:30.873325   539 metadata_manager.cpp:159] Looking for image 'abc'
> I0406 18:29:30.874438   539 local_puller.cpp:142] Untarring image 'abc' from 
> '/tmp/3l8ZBv/images/abc.tar' to '/tmp/3l8ZBv/store/staging/5tw8bD'
> I0406 18:29:30.901916   547 local_puller.cpp:162] The repositories JSON file 
> for image 'abc' is '{"abc":{"latest":"456"}}'
> I0406 18:29:30.902304   547 local_puller.cpp:290] Extracting layer tar ball 
> '/tmp/3l8ZBv/store/staging/5tw8bD/123/layer.tar to rootfs 
> '/tmp/3l8ZBv/store/staging/5tw8bD/123/rootfs'
> I0406 18:29:30.909144   547 local_puller.cpp:290] Extracting layer tar ball 
> '/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar to rootfs 
> '/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs'
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:183: Failure
> (imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, 
> /tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar, -C, 
> /tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' failed: tar: This does not look 
> like a tar archive
> tar: Exiting with failure status due to previous errors
> [  FAILED  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar (243 ms)
> {code}
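The `Collect` failure above names `456/layer.tar`: `tar` rejected it as not being a tar archive, but the log does not say why the file was malformed at that moment. If one wanted such failures to surface earlier with a clearer message, a cheap pre-flight check on the layer file is one option. The sketch below is hypothetical (the function `looksLikeUstarArchive` is invented here and nothing like it is claimed to exist in Mesos); it looks for the `ustar` magic that POSIX and GNU tar write at byte offset 257 of the first 512-byte header block.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical pre-flight check, NOT part of Mesos. Returns true if the file
// at `path` carries the ustar magic at byte offset 257 of its first
// 512-byte tar header block.
bool looksLikeUstarArchive(const std::string& path) {
  std::ifstream file(path, std::ios::binary);
  if (!file) {
    return false;  // Missing or unreadable file.
  }

  char header[512];
  if (!file.read(header, sizeof(header))) {
    return false;  // Truncated: shorter than a single tar header block.
  }

  // POSIX writes "ustar\0" and GNU tar writes "ustar  \0"; both start
  // with the five bytes "ustar".
  return std::memcmp(header + 257, "ustar", 5) == 0;
}
```

This is only a heuristic: pre-POSIX (v7) tar archives carry no magic string, so a `false` result does not by itself prove the file is unusable.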





[jira] [Updated] (MESOS-5139) ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5139:
---
Affects Version/s: 1.0.4
   1.1.3
   1.2.3
   1.3.1
   1.4.1
  Environment: 
Ubuntu 14.04
Ubuntu 16.04

  was:Ubuntu14.04


> ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky
> --
>
> Key: MESOS-5139
> URL: https://issues.apache.org/jira/browse/MESOS-5139
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0, 1.0.4, 1.1.3, 1.2.3, 1.3.1, 1.4.1
> Environment: Ubuntu 14.04
> Ubuntu 16.04
>Reporter: Vinod Kone
>  Labels: mesosphere
>
> These tests still fail occasionally as of Mesos 1.5.0-wip:
> {code}
> ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
> ProvisionerDockerLocalStoreTest.MetadataManagerInitialization
> ProvisionerDockerLocalStoreTest.MissingLayer
> {code}
> Found this on ASF CI while testing 0.28.1-rc2
> {code}
> [ RUN  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
> E0406 18:29:30.870481   520 shell.hpp:93] Command 'hadoop version 2>&1' 
> failed; this is the output:
> sh: 1: hadoop: not found
> E0406 18:29:30.870576   520 fetcher.cpp:59] Failed to create URI fetcher 
> plugin 'hadoop': Failed to create HDFS client: Failed to execute 'hadoop 
> version 2>&1'; the command was either not found or exited with a non-zero 
> exit status: 127
> I0406 18:29:30.871052   520 local_puller.cpp:90] Creating local puller with 
> docker registry '/tmp/3l8ZBv/images'
> I0406 18:29:30.873325   539 metadata_manager.cpp:159] Looking for image 'abc'
> I0406 18:29:30.874438   539 local_puller.cpp:142] Untarring image 'abc' from 
> '/tmp/3l8ZBv/images/abc.tar' to '/tmp/3l8ZBv/store/staging/5tw8bD'
> I0406 18:29:30.901916   547 local_puller.cpp:162] The repositories JSON file 
> for image 'abc' is '{"abc":{"latest":"456"}}'
> I0406 18:29:30.902304   547 local_puller.cpp:290] Extracting layer tar ball 
> '/tmp/3l8ZBv/store/staging/5tw8bD/123/layer.tar to rootfs 
> '/tmp/3l8ZBv/store/staging/5tw8bD/123/rootfs'
> I0406 18:29:30.909144   547 local_puller.cpp:290] Extracting layer tar ball 
> '/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar to rootfs 
> '/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs'
> ../../src/tests/containerizer/provisioner_docker_tests.cpp:183: Failure
> (imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, 
> /tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar, -C, 
> /tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' failed: tar: This does not look 
> like a tar archive
> tar: Exiting with failure status due to previous errors
> [  FAILED  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar (243 ms)
> {code}





[jira] [Updated] (MESOS-5139) ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-5139:
---
Description: 
These tests still fail occasionally as of Mesos 1.5.0-wip:
{code}
ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
ProvisionerDockerLocalStoreTest.MetadataManagerInitialization
ProvisionerDockerLocalStoreTest.MissingLayer
{code}

Found this on ASF CI while testing 0.28.1-rc2

{code}
[ RUN  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
E0406 18:29:30.870481   520 shell.hpp:93] Command 'hadoop version 2>&1' failed; 
this is the output:
sh: 1: hadoop: not found
E0406 18:29:30.870576   520 fetcher.cpp:59] Failed to create URI fetcher plugin 
'hadoop': Failed to create HDFS client: Failed to execute 'hadoop version 
2>&1'; the command was either not found or exited with a non-zero exit status: 
127
I0406 18:29:30.871052   520 local_puller.cpp:90] Creating local puller with 
docker registry '/tmp/3l8ZBv/images'
I0406 18:29:30.873325   539 metadata_manager.cpp:159] Looking for image 'abc'
I0406 18:29:30.874438   539 local_puller.cpp:142] Untarring image 'abc' from 
'/tmp/3l8ZBv/images/abc.tar' to '/tmp/3l8ZBv/store/staging/5tw8bD'
I0406 18:29:30.901916   547 local_puller.cpp:162] The repositories JSON file 
for image 'abc' is '{"abc":{"latest":"456"}}'
I0406 18:29:30.902304   547 local_puller.cpp:290] Extracting layer tar ball 
'/tmp/3l8ZBv/store/staging/5tw8bD/123/layer.tar to rootfs 
'/tmp/3l8ZBv/store/staging/5tw8bD/123/rootfs'
I0406 18:29:30.909144   547 local_puller.cpp:290] Extracting layer tar ball 
'/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar to rootfs 
'/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs'
../../src/tests/containerizer/provisioner_docker_tests.cpp:183: Failure
(imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, 
/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar, -C, 
/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' failed: tar: This does not look 
like a tar archive
tar: Exiting with failure status due to previous errors

[  FAILED  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar (243 ms)
{code}

  was:
Found this on ASF CI while testing 0.28.1-rc2

{code}
[ RUN  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
E0406 18:29:30.870481   520 shell.hpp:93] Command 'hadoop version 2>&1' failed; 
this is the output:
sh: 1: hadoop: not found
E0406 18:29:30.870576   520 fetcher.cpp:59] Failed to create URI fetcher plugin 
'hadoop': Failed to create HDFS client: Failed to execute 'hadoop version 
2>&1'; the command was either not found or exited with a non-zero exit status: 
127
I0406 18:29:30.871052   520 local_puller.cpp:90] Creating local puller with 
docker registry '/tmp/3l8ZBv/images'
I0406 18:29:30.873325   539 metadata_manager.cpp:159] Looking for image 'abc'
I0406 18:29:30.874438   539 local_puller.cpp:142] Untarring image 'abc' from 
'/tmp/3l8ZBv/images/abc.tar' to '/tmp/3l8ZBv/store/staging/5tw8bD'
I0406 18:29:30.901916   547 local_puller.cpp:162] The repositories JSON file 
for image 'abc' is '{"abc":{"latest":"456"}}'
I0406 18:29:30.902304   547 local_puller.cpp:290] Extracting layer tar ball 
'/tmp/3l8ZBv/store/staging/5tw8bD/123/layer.tar to rootfs 
'/tmp/3l8ZBv/store/staging/5tw8bD/123/rootfs'
I0406 18:29:30.909144   547 local_puller.cpp:290] Extracting layer tar ball 
'/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar to rootfs 
'/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs'
../../src/tests/containerizer/provisioner_docker_tests.cpp:183: Failure
(imageInfo).failure(): Collect failed: Subprocess 'tar, tar, -x, -f, 
/tmp/3l8ZBv/store/staging/5tw8bD/456/layer.tar, -C, 
/tmp/3l8ZBv/store/staging/5tw8bD/456/rootfs' failed: tar: This does not look 
like a tar archive
tar: Exiting with failure status due to previous errors

[  FAILED  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar (243 ms)
{code}


> ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar is flaky
> --
>
> Key: MESOS-5139
> URL: https://issues.apache.org/jira/browse/MESOS-5139
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.0
> Environment: Ubuntu14.04
>Reporter: Vinod Kone
>  Labels: mesosphere
>
> These tests still fail occasionally as of Mesos 1.5.0-wip:
> {code}
> ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
> ProvisionerDockerLocalStoreTest.MetadataManagerInitialization
> ProvisionerDockerLocalStoreTest.MissingLayer
> {code}
> Found this on ASF CI while testing 0.28.1-rc2
> {code}
> [ RUN  ] ProvisionerDockerLocalStoreTest.LocalStoreTestWithTar
> E0406 18:29:30.870481   520 shell.hpp:93] Command 'hadoop version 2>&1' 
> failed; this is the output:
> sh: 1: hadoop: not found
> E0406 18:29:30.870576   520 fetcher.cpp:59] Failed to create URI fetcher 
> plugin 'hadoop': 

[jira] [Updated] (MESOS-8297) Built-in driver-based executors ignore kill task if the task has not been launched.

2017-12-05 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-8297:
---
Description: If the docker executor receives a kill task request and the task 
has never been launched, the request is ignored. We now know why: the executor 
never received the registration confirmation, hence it ignored the launch task 
request, hence the task never started. This is how the executor ends up in an 
idle state, waiting for registration and ignoring kill task requests.

> Built-in driver-based executors ignore kill task if the task has not been 
> launched.
> ---
>
> Key: MESOS-8297
> URL: https://issues.apache.org/jira/browse/MESOS-8297
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>
> If the docker executor receives a kill task request and the task has never 
> been launched, the request is ignored. We now know why: the executor never 
> received the registration confirmation, hence it ignored the launch task 
> request, hence the task never started. This is how the executor ends up in 
> an idle state, waiting for registration and ignoring kill task requests.
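The sequence described in MESOS-8297 can be modeled as a small state machine. The following is a hypothetical sketch, not the code of any Mesos executor: `ExecutorModel`, its fields, and the status strings are invented for illustration. With `ignoreKillBeforeLaunch` set, it reproduces the reported behavior (the launch is dropped because registration was never confirmed, then the kill is dropped because the task never launched). With it cleared, it shows one possible alternative, reporting a terminal update so the task does not linger; this ticket does not state whether that matches the fix actually adopted.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a driver-based executor; NOT actual Mesos code.
enum class TaskState { NOT_LAUNCHED, RUNNING, KILLED };

struct ExecutorModel {
  bool registered = false;             // Registration confirmation received?
  bool ignoreKillBeforeLaunch = true;  // true reproduces the reported bug.
  TaskState state = TaskState::NOT_LAUNCHED;
  std::string lastUpdate;              // Last status update "sent", if any.

  void launchTask() {
    // Without the registration confirmation the launch request is ignored,
    // so the task never starts.
    if (!registered) {
      return;
    }
    state = TaskState::RUNNING;
    lastUpdate = "TASK_RUNNING";
  }

  void killTask() {
    if (state == TaskState::NOT_LAUNCHED && ignoreKillBeforeLaunch) {
      // Reported behavior: the kill is dropped and the executor stays idle,
      // still waiting for registration.
      return;
    }
    // A running task, or (alternative behavior) a task that never launched:
    // either way, report a terminal state.
    state = TaskState::KILLED;
    lastUpdate = "TASK_KILLED";
  }
};
```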


