Re: Review Request 70899: Refactored the agent's task-killing code.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70899/
---

(Updated June 21, 2019, 11:59 p.m.)


Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, and 
Joseph Wu.


Bugs: MESOS-9821
https://issues.apache.org/jira/browse/MESOS-9821


Repository: mesos


Description
---

This patch factors the code responsible for killing tasks
out into two helper functions. This will facilitate the
calling of this common code by the agent-draining handler.


Diffs (updated)
-

  src/slave/slave.hpp 6954f53ff1531b9fcb688ef76acddf6a3d849a41 
  src/slave/slave.cpp 30039b0857a4d85b4b96fa95d7f8724d57cdec6e 


Diff: https://reviews.apache.org/r/70899/diff/3/

Changes: https://reviews.apache.org/r/70899/diff/2-3/


Testing
---

Testing details at the end of this chain.


Thanks,

Greg Mann



Re: Review Request 70834: Added minimal agent handler for 'DrainSlaveMessage'.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70834/
---

(Updated June 21, 2019, 11:58 p.m.)


Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, and 
Joseph Wu.


Bugs: MESOS-9818
https://issues.apache.org/jira/browse/MESOS-9818


Repository: mesos


Description
---

This patch adds a minimal handler to the agent for the
`DrainSlaveMessage`. This handler will later be extended
to implement the full functionality.


Diffs (updated)
-

  src/slave/paths.hpp ad768267e6a3c105eab4b36b05351d175d720eaf 
  src/slave/paths.cpp 1163c88eefd2a9e99b56817a3f515a23ee6e3903 
  src/slave/slave.hpp 6954f53ff1531b9fcb688ef76acddf6a3d849a41 
  src/slave/slave.cpp 30039b0857a4d85b4b96fa95d7f8724d57cdec6e 


Diff: https://reviews.apache.org/r/70834/diff/4/

Changes: https://reviews.apache.org/r/70834/diff/3-4/


Testing
---

Testing details at the end of this chain.


Thanks,

Greg Mann



Re: Review Request 70839: Added agent capability for agent draining.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70839/
---

(Updated June 21, 2019, 11:57 p.m.)


Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, and 
Joseph Wu.


Bugs: MESOS-9818
https://issues.apache.org/jira/browse/MESOS-9818


Repository: mesos


Description
---

Added agent capability for agent draining.


Diffs (updated)
-

  include/mesos/mesos.proto 2b4f350815935220c2d2b0dd0e52346bc74c91d9 
  include/mesos/v1/mesos.proto bafc27499f810791700c4a30dcb1da33b6f31d2e 
  src/common/protobuf_utils.hpp ecaf8ea36c486b859870fe53e834e0844e317182 
  src/common/protobuf_utils.cpp 9ff0cf59c658c7c6a3a439a77aff13aff3c20fe5 
  src/slave/constants.cpp 4c29b00f25aee5475051a4eb6d0d587ca2b7b6d4 
  src/slave/flags.cpp e23061ad929c8e3c24806f52c95f82766e3d1911 
  src/tests/master_tests.cpp 097f1b77a59e29c6690210773d1556ebf2bb701e 
  src/tests/slave_tests.cpp c2035976713abb31b3646c0d23771fa40df93271 


Diff: https://reviews.apache.org/r/70839/diff/2/

Changes: https://reviews.apache.org/r/70839/diff/1-2/


Testing
---

Testing details at the end of this chain.


Thanks,

Greg Mann



Re: Review Request 70822: Added common protobufs for agent draining.

2019-06-21 Thread Greg Mann


> On June 14, 2019, 9:34 a.m., Benjamin Bannier wrote:
> > src/messages/messages.proto
> > Lines 986 (patched)
> > 
> >
> > What does it mean if the master sends a `DRAINED` state to the agent? 
> > Is that something we need to reject in validation?
> > 
> > Does it maybe make sense to instead break out `DrainInfo.state` into 
> > its own message to use in state reporting?
> 
> Greg Mann wrote:
> Yes we can have a CHECK on the agent to make sure the master doesn't send 
> DRAINED in a DrainSlaveMessage. That will happen in another patch.
> 
> What's the benefit of moving `DrainInfo.State` outside of `DrainInfo`?
> 
> Benjamin Bannier wrote:
> The goal would be to use dedicated messages for triggering draining 
> (master->agent) and to report draining (agent->master). That might not only 
> be conceptually simplier, but would likely also lead to simpler, less 
> redundant code. Is there any benefit in using the same message?
> 
> Greg Mann wrote:
> What do you mean by "report draining"? The design doesn't involve any 
> explicit communication of draining progress between agent and master, the 
> master just receives task status updates.
> 
> Benjamin Bannier wrote:
> Sorry, there's no process of communicating `DrainInfo` back to master, 
> but we do report it to users with https://reviews.apache.org/r/70835/.
> 
> Greg Mann wrote:
> Are you proposing having a separate message for inclusion in the agent 
> API outputs?
> 
> Vinod Kone wrote:
> IIUC, Benjamin is questioning why we are storing both the spec (max grace 
> period, mark_gone) and the status (state) in the same proto (do we do this 
> elsewhere in the code base?). If they were separate protos, say DrainRequest 
> and DrainStatus, master would send DrainRequest to the agent, and show 
> DrainStatus (and possibly DrainRequest) in the operator API response. It 
> looks a bit weird that we would send DrainInfo (as it is currently written  
> to the agent and do a CHECK on valid state. The fact that an agent got a 
> Drain message is enough for the agent to know that it needs to drain, it 
> doesn't need to look into the `DrainInfo::State`?
> 
> Vinod Kone wrote:
> Let me repharse my first sentence. I think storing both spec and status 
> in the same proto is probably fine if it makes thing easy to expose in API 
> endpoints and use it internally in the master. But I still think it's weird 
> to send the status in a message to the agent which is not expected to use it. 
> So, if there's a way we can avoid it that would be great.
> 
> Joseph Wu wrote:
> From the master's perspective, having the `State` in the DrainInfo 
> complicates what the master needs to do in the registry.  To report the 
> correct state to any endpoints, the master will need to assume all agents are 
> in a `DRAINING` state, until the agent connects, reconciles, and says it has 
> nothing running.  This means the state within the registry will always be 
> `DRAINING` (and there is no reason to transition that state).
> 
> So moving the `State` field into a separate message would be ideal.

It definitely makes sense to me that the agent doesn't need to receive a drain 
state, so the `DrainSlaveMessage` shouldn't include that information. But we 
could just include the max grace period and the mark_gone bit in the 
`DrainSlaveMessage`, without moving the drain state outside of `DrainInfo`.

I think the question is, will we make use of the drain state outside of the 
`DrainInfo` message? I don't have any use cases in mind for that currently, but 
maybe I'm missing something? Or maybe it's best to have a separate `DrainState` 
enum in case we want to use it outside of `DrainInfo` in the future?


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70822/#review215896
---


On June 18, 2019, 11:17 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70822/
> ---
> 
> (Updated June 18, 2019, 11:17 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, 
> Joseph Wu, and Vinod Kone.
> 
> 
> Bugs: MESOS-9753
> https://issues.apache.org/jira/browse/MESOS-9753
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch makes protobuf message updates which will be used
> by both the master and the agent to facilitate automatic
> draining of agents.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 2b4f350815935220c2d2b0dd0e52346bc74c91d9 
>   include/mesos/type_utils.hpp 57b1893160dbe874aa9fec00a3d1b640b9c54906 
>   include/mesos/v1/mesos.proto bafc27499f810791700c

Review Request 70927: Fixed a memory "leak" of filters in the allocator.

2019-06-21 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70927/
---

Review request for mesos, Andrei Sekretenko and Meng Zhu.


Bugs: MESOS-9852
https://issues.apache.org/jira/browse/MESOS-9852


Repository: mesos


Description
---

Per MESOS-9852, the allocator does not keep a handle to the offer
filter timer, which means it cannot remove the timer if it's no
longer relevant (e.g. due to revive). This means that timers build
up in memory.

Also, the offer filter is allocated on the heap, and is not deleted
until the expiry of the timer (which may take forever!), even if the
offer filter is no longer relevant (e.g. due to revive). This means
that offer filters build up in memory.

The fix applied is to manage offer filters using a shared_ptr in the
maps, which means that they get deleted when erased. The expiration
functions need to now use a weak_ptr in case they run after the
offer filter has been erased (which is possible due to racing between
the expiry timer firing and the timer being discarded).

To discard the timers when no longer needed, the destructors of the
filters perform the discard.

This was tested against a test which spins in a revive + long decline
loop. Previously, the RSS continues to grow, but after this fix
it remains the same.


Diffs
-

  src/master/allocator/mesos/hierarchical.hpp 
25472eb7f5cb8342d2c88449c9c4145d816fb718 
  src/master/allocator/mesos/hierarchical.cpp 
7076e968589c05c770cf50ad289bc7889625b54f 


Diff: https://reviews.apache.org/r/70927/diff/1/


Testing
---

make check in addition to the leak test described in the description
and included in the subsequent patch (that's not planned to be committed)


Thanks,

Benjamin Mahler



Review Request 70928: WIP: leak test.

2019-06-21 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70928/
---

Review request for mesos, Andrei Sekretenko and Meng Zhu.


Bugs: MESOS-9852
https://issues.apache.org/jira/browse/MESOS-9852


Repository: mesos


Description
---

WIP: leak test.


Diffs
-

  src/tests/hierarchical_allocator_tests.cpp 
20b7fd7dd56cae51757d4434e3245cf690f63320 


Diff: https://reviews.apache.org/r/70928/diff/1/


Testing
---


Thanks,

Benjamin Mahler



Re: Review Request 70884: Added optional 'host' string member to UPID.

2019-06-21 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70884/#review216071
---




3rdparty/libprocess/src/pid.cpp
Lines 74 (patched)


Note: I'm a bit unsure about this. On the one hand, it feels super-useful 
to print UPID's the same way they were received, and I've been annoyed more 
often than not that I have to manually resolve an IP address from some Mesos 
log.

On the other hand, this can lead to slightly confusing messages like:
```
W0622 00:09:43.911963  5825 slave.cpp:1584] Ignoring re-registration 
message from master@127.0.1.1:5050 because it is not the expected master: 
master@localhost:5050

```
where it's not immediately obvious that `localhost` is actually ignored by 
the code in question and the problem is actually that it resolved to 
`127.0.0.1` and not `127.0.1.1`.


- Benno Evers


On June 21, 2019, 10:25 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70884/
> ---
> 
> (Updated June 21, 2019, 10:25 p.m.)
> 
> 
> Review request for mesos, Joseph Wu and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This allows client code to access the original hostname
> that was used to specify a libprocess address.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/pid.hpp 
> 9f09ab46fa3ceaeac09b0fbf9f532728c4ed2d7a 
>   3rdparty/libprocess/src/pid.cpp fdc61b5ab6c75b33ce33de7edd11e9302550f300 
>   3rdparty/libprocess/src/process.cpp 
> 799666f03d6a78708aa9336c2dd04bc9b5023aa0 
> 
> 
> Diff: https://reviews.apache.org/r/70884/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 70884: Added optional 'host' string member to UPID.

2019-06-21 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70884/
---

(Updated June 21, 2019, 10:25 p.m.)


Review request for mesos, Joseph Wu and Till Toenshoff.


Summary (updated)
-

Added optional 'host' string member to UPID.


Repository: mesos


Description (updated)
---

This allows client code to access the original hostname
that was used to specify a libprocess address.


Diffs (updated)
-

  3rdparty/libprocess/include/process/pid.hpp 
9f09ab46fa3ceaeac09b0fbf9f532728c4ed2d7a 
  3rdparty/libprocess/src/pid.cpp fdc61b5ab6c75b33ce33de7edd11e9302550f300 
  3rdparty/libprocess/src/process.cpp 799666f03d6a78708aa9336c2dd04bc9b5023aa0 


Diff: https://reviews.apache.org/r/70884/diff/2/

Changes: https://reviews.apache.org/r/70884/diff/1-2/


Testing
---


Thanks,

Benno Evers



Re: Review Request 70924: Added registry operations for DE/RE-ACTIVATE_AGENT calls.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70924/#review216070
---


Fix it, then Ship it!





src/master/registry_operations.cpp
Lines 527 (patched)


Maybe:
```
if (moreThanOneDeactivated) {
  break;
}

continue;
```


- Greg Mann


On June 21, 2019, 4:49 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70924/
> ---
> 
> (Updated June 21, 2019, 4:49 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds the associated registry operation and fields for the
> DEACTIVATE_AGENT and REACTIVATE_AGENT master calls.
> 
> Like the DRAIN_AGENT call, the deactivation state also persists when
> agents become unreachable/reachable, which is already handled by the
> DRAIN_AGENT operation implementation.
> 
> Likewise, this feature is not downgrade compatible, and a minimum
> capability is added when the deactivation feature is used.
> If all draining or deactivated agents are reactivated, the minimum
> capability is removed.
> 
> 
> Diffs
> -
> 
>   src/master/registry_operations.hpp 5a3010d46c4eddb0d660eb7368ec836903477a9b 
>   src/master/registry_operations.cpp c417c4d6c4272eeab15f5313ca5160d4a0aa4da4 
>   src/tests/registrar_tests.cpp 81979d78f1348791e28f1afea4ca2999de6362b8 
> 
> 
> Diff: https://reviews.apache.org/r/70924/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70914: Defined `mapped_type` for `ResourceQuantities` and `ResourceLimits`.

2019-06-21 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70914/#review216068
---



This makes the json serialization pretty implicit, why not just add explicit 
json serialization functions for these types? I'm not that familiar with the 
other implications of adding mapped_type, are you?

- Benjamin Mahler


On June 21, 2019, 2:27 a.m., Meng Zhu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70914/
> ---
> 
> (Updated June 21, 2019, 2:27 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This let the two pass the `HasMappedType` and benefit from
> templated functions such as `json`.
> 
> 
> Diffs
> -
> 
>   include/mesos/resource_quantities.hpp 
> c861df8a8a167b19ea8374c22cdd2a8fe567a6a6 
> 
> 
> Diff: https://reviews.apache.org/r/70914/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>



Re: Review Request 70923: Added a registry operation for the DRAIN_AGENT call.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70923/#review216065
---


Fix it, then Ship it!





src/master/registry_operations.cpp
Lines 128 (patched)


s/any //


- Greg Mann


On June 21, 2019, 4 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70923/
> ---
> 
> (Updated June 21, 2019, 4 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds the associated registry operation and fields for the
> DRAIN_AGENT master call.  This call affects admitted or unreachable
> agents and thus has data fields into two parts of the registry.
> 
> When agents transition from reachable to unreachable, or vice
> versa, the draining info is now copied too.
> 
> Because this feature is not downgrade compatible, a minimum capability
> is added when the draining feature is used.
> 
> 
> Diffs
> -
> 
>   src/master/registry.proto 239789322c45c6a8346332f50661a98b1851b685 
>   src/master/registry_operations.hpp 5a3010d46c4eddb0d660eb7368ec836903477a9b 
>   src/master/registry_operations.cpp c417c4d6c4272eeab15f5313ca5160d4a0aa4da4 
>   src/tests/registrar_tests.cpp 81979d78f1348791e28f1afea4ca2999de6362b8 
> 
> 
> Diff: https://reviews.apache.org/r/70923/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70922: Added master minimum capability for agent draining.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70922/#review216059
---


Fix it, then Ship it!





src/tests/master_tests.cpp
Line 5035 (original), 5035 (patched)


s/the some/these/


- Greg Mann


On June 21, 2019, 3:08 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70922/
> ---
> 
> (Updated June 21, 2019, 3:08 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, Meng 
> Zhu, and Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds a new enum for the agent draining feature in the master.
> When an agent is marked for draining or deactivation, an associated
> entry will be added to the master's registry.  This will prevent
> any master downgrades until all agents are reactivated.
> 
> 
> Diffs
> -
> 
>   docs/downgrades.md 08200015d8b533640b4656c88b0510cf41a7ffa4 
>   include/mesos/mesos.proto 2b4f350815935220c2d2b0dd0e52346bc74c91d9 
>   include/mesos/v1/mesos.proto bafc27499f810791700c4a30dcb1da33b6f31d2e 
>   src/common/protobuf_utils.hpp ecaf8ea36c486b859870fe53e834e0844e317182 
>   src/master/constants.cpp 1109b483daad10560e6d2c717a80492eddb507fc 
>   src/tests/master_tests.cpp 097f1b77a59e29c6690210773d1556ebf2bb701e 
> 
> 
> Diff: https://reviews.apache.org/r/70922/diff/2/
> 
> 
> Testing
> ---
> 
> TODO: Need to look at this on a browser.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70925: Exposed the MESOS-9856 bug in revivieOffers() in a test.

2019-06-21 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70925/#review216064
---


Ship it!




Thanks! Usually we would put this test after the fix, because if it gets 
committed first it will break the build.

- Benjamin Mahler


On June 21, 2019, 5:38 p.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70925/
> ---
> 
> (Updated June 21, 2019, 5:38 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9856
> https://issues.apache.org/jira/browse/MESOS-9856
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch exposes a bug, due to which reviveOffers() called for
> a specific role clears offer filters for all the roles of a framework.
> 
> 
> Diffs
> -
> 
>   src/tests/hierarchical_allocator_tests.cpp 
> 7dc3dc1392b120e2798dcd68f7290cae4afa890d 
> 
> 
> Diff: https://reviews.apache.org/r/70925/diff/1/
> 
> 
> Testing
> ---
> 
> make check - this test fails
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70926: Fixed clearing offer filters for unrelated roles by reviveOffers().

2019-06-21 Thread Benjamin Mahler

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70926/#review216063
---


Ship it!




Ship It!

- Benjamin Mahler


On June 21, 2019, 5:38 p.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70926/
> ---
> 
> (Updated June 21, 2019, 5:38 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Meng Zhu.
> 
> 
> Bugs: MESOS-9856
> https://issues.apache.org/jira/browse/MESOS-9856
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This is a fix for MESOS-9856 bug wich was causing all the offer filters
> to be cleared by reviveOffers() called for a specific subset of roles.
> 
> 
> Diffs
> -
> 
>   src/master/allocator/mesos/hierarchical.cpp 
> c9ed094fcd67c1d8896fe380274ef8e08d5af892 
> 
> 
> Diff: https://reviews.apache.org/r/70926/diff/1/
> 
> 
> Testing
> ---
> 
> `./bin/mesos-tests.sh --gtest_filter="*SuppressAndReviveOffersWithMultiRole*" 
> --gtest_break_on_failure --gtest_repeat=1000`
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70886: WIP: Override source address for executors.

2019-06-21 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70886/#review216060
---



Patch looks great!

Reviews applied: [70747, 70748, 70810, 70883, 70884, 70885, 70796, 70749, 
70795, 70921, 70797, 70886]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On June 19, 2019, 2:48 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70886/
> ---
> 
> (Updated June 19, 2019, 2:48 p.m.)
> 
> 
> Review request for mesos, Jan-Philip Gehrcke, Joseph Wu, and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Ignore an executor's reported IP address and enforce the hostname
> "localhost" instead for TLS hostname validation.
> 
> NOTE: This change is for illustration purposes only, and is not
> intended to be upstreamed.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 30039b0857a4d85b4b96fa95d7f8724d57cdec6e 
> 
> 
> Diff: https://reviews.apache.org/r/70886/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Review Request 70926: Fixed clearing offer filters for unrelated roles by reviveOffers().

2019-06-21 Thread Andrei Sekretenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70926/
---

Review request for mesos and Benjamin Mahler.


Bugs: MESOS-9856
https://issues.apache.org/jira/browse/MESOS-9856


Repository: mesos


Description
---

This is a fix for MESOS-9856 bug wich was causing all the offer filters
to be cleared by reviveOffers() called for a specific subset of roles.


Diffs
-

  src/master/allocator/mesos/hierarchical.cpp 
c9ed094fcd67c1d8896fe380274ef8e08d5af892 


Diff: https://reviews.apache.org/r/70926/diff/1/


Testing
---

`./bin/mesos-tests.sh --gtest_filter="*SuppressAndReviveOffersWithMultiRole*" 
--gtest_break_on_failure --gtest_repeat=1000`


Thanks,

Andrei Sekretenko



Review Request 70925: Exposed the MESOS-9856 bug in revivieOffers() in a test.

2019-06-21 Thread Andrei Sekretenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70925/
---

Review request for mesos and Benjamin Mahler.


Bugs: MESOS-9856
https://issues.apache.org/jira/browse/MESOS-9856


Repository: mesos


Description
---

This patch exposes a bug, due to which reviveOffers() called for
a specific role clears offer filters for all the roles of a framework.


Diffs
-

  src/tests/hierarchical_allocator_tests.cpp 
7dc3dc1392b120e2798dcd68f7290cae4afa890d 


Diff: https://reviews.apache.org/r/70925/diff/1/


Testing
---

make check - this test fails


Thanks,

Andrei Sekretenko



Re: Review Request 70911: Added master endpoints for agent draining.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70911/#review216056
---


Fix it, then Ship it!




In the description: s/tn operator/an operator/


include/mesos/master/master.proto
Lines 233 (patched)


s/The/This/


- Greg Mann


On June 21, 2019, 1:10 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70911/
> ---
> 
> (Updated June 21, 2019, 1:10 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds three master calls:
>   * DRAIN_AGENT
>   * DEACTIVATE_AGENT
>   * REACTIVATE_AGENT
> 
> DRAIN_AGENT starts automated draining of tasks on the specified agent.
> When marked for draining, the agent's resources will not be offered
> and all the agent's tasks will be gracefully killed.
> 
> DEACTIVATE_AGENT stops offers for a specific agent.  This can be used
> for manual draining of tasks, but is not limited to draining.
> For example, tn operator could deactivate an agent prior to
> reserving resources.
> 
> REACTIVATE_AGENT restarts offers for a specific agent, that had been
> drained or deactivated previously.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto 4a653ce59268a47d69362d0e2a3d5eb867ca6fa9 
>   include/mesos/v1/master/master.proto 
> d4dd3fcb0abc947097a5a281c4082ccf04db8614 
>   src/internal/devolve.hpp fefe86e450fa5083b9ff50e92f4594ffb30a54c8 
>   src/internal/devolve.cpp 1d300b49d5cc3de4b8ed409902eb881c7afc07ea 
>   src/master/http.cpp 3cd7df228adb54a963821c23b4d1c26d33622ee7 
>   src/master/master.hpp 7acaa8264eb7a37e17394bc4940971f872ab2de3 
>   src/master/validation.cpp af2d04a46750e5d42e7d92c70ce685e64d646a3b 
> 
> 
> Diff: https://reviews.apache.org/r/70911/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70903: Killed all tasks on the agent when draining.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70903/#review216058
---




src/slave/slave.cpp
Lines 1005-1009 (patched)


Perhaps you can move this entire section and below into a separate 
`_drain()` continuation so that the recovery code an re-use it.

I'm expecting the recovery to set `drainInfo` separately, but then need to 
start killing (this code) again.



src/slave/slave.cpp
Lines 1013-1014 (patched)


Might help to mention this mirrors the (hard coded! D:) defaults in the 
default and command executors.


- Joseph Wu


On June 19, 2019, 12:07 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70903/
> ---
> 
> (Updated June 19, 2019, 12:07 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-9821
> https://issues.apache.org/jira/browse/MESOS-9821
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the agent's `DrainSlaveMessage` handler
> to kill all tasks on the agent when the message is received.
> 
> 
> Diffs
> -
> 
>   include/mesos/type_utils.hpp 57b1893160dbe874aa9fec00a3d1b640b9c54906 
>   src/slave/slave.cpp 30039b0857a4d85b4b96fa95d7f8724d57cdec6e 
> 
> 
> Diff: https://reviews.apache.org/r/70903/diff/1/
> 
> 
> Testing
> ---
> 
> Testing details at the end of this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70911: Added master endpoints for agent draining.

2019-06-21 Thread Greg Mann


> On June 21, 2019, 3:23 p.m., James DeFelice wrote:
> > include/mesos/master/master.proto
> > Lines 93 (patched)
> > 
> >
> > are these new APIs experimental? if so, they should be labeled as such
> 
> Joseph Wu wrote:
> By the time the entire chain of agent draining work is done, these APIs 
> won't be experimental.  If we land this patch independently, then yes, we'll 
> need to mark them experimental.

Hmm I would probably advocate making the new API experimental to start, just in 
case we run into some unforeseen issues and want to make changes?


- Greg


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70911/#review216046
---


On June 21, 2019, 1:10 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70911/
> ---
> 
> (Updated June 21, 2019, 1:10 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds three master calls:
>   * DRAIN_AGENT
>   * DEACTIVATE_AGENT
>   * REACTIVATE_AGENT
> 
> DRAIN_AGENT starts automated draining of tasks on the specified agent.
> When marked for draining, the agent's resources will not be offered
> and all the agent's tasks will be gracefully killed.
> 
> DEACTIVATE_AGENT stops offers for a specific agent.  This can be used
> for manual draining of tasks, but is not limited to draining.
> For example, tn operator could deactivate an agent prior to
> reserving resources.
> 
> REACTIVATE_AGENT restarts offers for a specific agent, that had been
> drained or deactivated previously.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto 4a653ce59268a47d69362d0e2a3d5eb867ca6fa9 
>   include/mesos/v1/master/master.proto 
> d4dd3fcb0abc947097a5a281c4082ccf04db8614 
>   src/internal/devolve.hpp fefe86e450fa5083b9ff50e92f4594ffb30a54c8 
>   src/internal/devolve.cpp 1d300b49d5cc3de4b8ed409902eb881c7afc07ea 
>   src/master/http.cpp 3cd7df228adb54a963821c23b4d1c26d33622ee7 
>   src/master/master.hpp 7acaa8264eb7a37e17394bc4940971f872ab2de3 
>   src/master/validation.cpp af2d04a46750e5d42e7d92c70ce685e64d646a3b 
> 
> 
> Diff: https://reviews.apache.org/r/70911/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70910: Added ACLs for agent draining APIs.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70910/#review216052
---


Ship it!




- Greg Mann


On June 20, 2019, 9 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70910/
> ---
> 
> (Updated June 20, 2019, 9 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds three ACLs for the three endpoints added as part of the
> agent draining feature:
>   * DRAIN_AGENT
>   * DEACTIVATE_AGENT
>   * REACTIVATE_AGENT
> 
> All three ACLs are coarse-grained; a principal is either allowed to
> perform the action, or unauthorized. However, there is scope to
> restrict the action to particular subsets of agents in future.
> 
> 
> Diffs
> -
> 
>   include/mesos/authorizer/acls.proto 
> 78acfe622603f3697150c81bcf8c7a31deed9699 
>   include/mesos/authorizer/authorizer.proto 
> ad40d3add9db7540c328ebcf3aae771ff966528e 
>   src/authorizer/local/authorizer.cpp 
> b89010d09ef3ee564e7659c0cfe335281656e1cd 
>   src/tests/authorization_tests.cpp 7bb0bd831c7c71558cc981ce00ab399e0034bbdf 
>   src/tests/master_authorization_tests.cpp 
> 7d406950d113c40637054f2d0f5edd2b81f7410a 
> 
> 
> Diff: https://reviews.apache.org/r/70910/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70900: Updated an equality operator.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70900/#review216054
---




src/common/type_utils.cpp
Line 573 (original)


I think this is the one case where the MessageDifferencer will not work.

I've played around with one here: https://reviews.apache.org/r/65541/

But that patch (experimental) also did not handle Resources because they 
are a bit tricky to fit.


- Joseph Wu


On June 19, 2019, 11:59 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70900/
> ---
> 
> (Updated June 19, 2019, 11:59 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-9821
> https://issues.apache.org/jira/browse/MESOS-9821
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the equality operator for the `Task`
> message to make use of the `MessageDifferencer`, which
> will eliminate the need to update the operator
> implementation when new fields are added in the future.
> 
> 
> Diffs
> -
> 
>   src/common/type_utils.cpp ef1b3ea15cde1c7a8e0735fb9d7566dd1fd2cfdb 
> 
> 
> Diff: https://reviews.apache.org/r/70900/diff/1/
> 
> 
> Testing
> ---
> 
> Testing details at the end of this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70911: Added master endpoints for agent draining.

2019-06-21 Thread Benjamin Mahler


> On June 21, 2019, 3:22 p.m., James DeFelice wrote:
> > is there a related patch to add these new APIs to the operator V1 API 
> > documentation, along w/ expected response codes from mesos?

+1

There should either be a ticket filed or an item added to a google doc that 
tracks all the pieces needed for this to be "complete".

I assume that's probably under this: 
https://issues.apache.org/jira/browse/MESOS-9845

That's a pretty vague umbrella for the moment. Personally, I find it better to 
track all these items in detail as soon as they come up, it's what we're trying 
to do with the quota work:
https://docs.google.com/document/d/17qxa3y91iBAvriJt8U5_CabBKBGgQPlDWBM-u0EjE64/edit#

(sometimes it's easier to just note something down and follow up with filing a 
nice ticket later, since tickets are a lot "heavier")


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70911/#review216045
---


On June 21, 2019, 1:10 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70911/
> ---
> 
> (Updated June 21, 2019, 1:10 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds three master calls:
>   * DRAIN_AGENT
>   * DEACTIVATE_AGENT
>   * REACTIVATE_AGENT
> 
> DRAIN_AGENT starts automated draining of tasks on the specified agent.
> When marked for draining, the agent's resources will not be offered
> and all the agent's tasks will be gracefully killed.
> 
> DEACTIVATE_AGENT stops offers for a specific agent.  This can be used
> for manual draining of tasks, but is not limited to draining.
> For example, tn operator could deactivate an agent prior to
> reserving resources.
> 
> REACTIVATE_AGENT restarts offers for a specific agent, that had been
> drained or deactivated previously.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto 4a653ce59268a47d69362d0e2a3d5eb867ca6fa9 
>   include/mesos/v1/master/master.proto 
> d4dd3fcb0abc947097a5a281c4082ccf04db8614 
>   src/internal/devolve.hpp fefe86e450fa5083b9ff50e92f4594ffb30a54c8 
>   src/internal/devolve.cpp 1d300b49d5cc3de4b8ed409902eb881c7afc07ea 
>   src/master/http.cpp 3cd7df228adb54a963821c23b4d1c26d33622ee7 
>   src/master/master.hpp 7acaa8264eb7a37e17394bc4940971f872ab2de3 
>   src/master/validation.cpp af2d04a46750e5d42e7d92c70ce685e64d646a3b 
> 
> 
> Diff: https://reviews.apache.org/r/70911/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70899: Refactored the agent's task-killing code.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70899/#review216051
---


Fix it, then Ship it!




Refactor looks reasonable.  One possible commenting oversight below.


src/slave/slave.cpp
Lines 3782-3784 (patched)


In the header, the comment says:
```
This function should be used to kill tasks which are queued or launched, 
but not tasks which are pending.
```

And the refactor seems to prevent this function from being called with a 
pending task.  So shouldn't `executor` be non-null?

Or is there some future use-case where this is not true?


- Joseph Wu


On June 19, 2019, 12:05 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70899/
> ---
> 
> (Updated June 19, 2019, 12:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-9821
> https://issues.apache.org/jira/browse/MESOS-9821
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch factors the code responsible for killing tasks
> out into two helper functions. This will facilitate the
> calling of this common code by the agent-draining handler.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp 6954f53ff1531b9fcb688ef76acddf6a3d849a41 
>   src/slave/slave.cpp 30039b0857a4d85b4b96fa95d7f8724d57cdec6e 
> 
> 
> Diff: https://reviews.apache.org/r/70899/diff/2/
> 
> 
> Testing
> ---
> 
> Testing details at the end of this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70907: Added recovery of agent drain information.

2019-06-21 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70907/#review216022
---


Fix it, then Ship it!





src/slave/state.cpp
Lines 210 (patched)


Make use of the local variable `agentStatePath` here.


- Greg Mann


On June 20, 2019, 9:45 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70907/
> ---
> 
> (Updated June 20, 2019, 9:45 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9822
> https://issues.apache.org/jira/browse/MESOS-9822
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> With this patch the agent will, after executor reregistration finished,
> replay any active drain information so remaining tasks are drained as
> well. We need to wait until executors had a chance to register so they
> are not terminated should we try to send kill task request before the
> executor has registered.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 30039b0857a4d85b4b96fa95d7f8724d57cdec6e 
>   src/slave/state.hpp 45836e5805a5038c0b3b5563ea995c3fa90ab808 
>   src/slave/state.cpp b4bf6292eb482f4e99a27442af0ea03e31c1ddc2 
>   src/tests/slave_tests.cpp c2035976713abb31b3646c0d23771fa40df93271 
> 
> 
> Diff: https://reviews.apache.org/r/70907/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Review Request 70924: Added registry operations for DE/RE-ACTIVATE_AGENT calls.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70924/
---

Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
Vinod Kone.


Bugs: MESOS-9814
https://issues.apache.org/jira/browse/MESOS-9814


Repository: mesos


Description
---

This adds the associated registry operation and fields for the
DEACTIVATE_AGENT and REACTIVATE_AGENT master calls.

Like the DRAIN_AGENT call, the deactivation state also persists when
agents become unreachable/reachable, which is already handled by the
DRAIN_AGENT operation implementation.

Likewise, this feature is not downgrade compatible, and a minimum
capability is added when the deactivation feature is used.
If all draining or deactivated agents are reactivated, the minimum
capability is removed.


Diffs
-

  src/master/registry_operations.hpp 5a3010d46c4eddb0d660eb7368ec836903477a9b 
  src/master/registry_operations.cpp c417c4d6c4272eeab15f5313ca5160d4a0aa4da4 
  src/tests/registrar_tests.cpp 81979d78f1348791e28f1afea4ca2999de6362b8 


Diff: https://reviews.apache.org/r/70924/diff/1/


Testing
---

make check


Thanks,

Joseph Wu



Re: Review Request 70834: Added minimal agent handler for 'DrainSlaveMessage'.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70834/#review216050
---




src/slave/slave.cpp
Lines 752 (patched)


Oops, extra space here.



src/slave/slave.cpp
Lines 991-992 (patched)


Per discussions on https://reviews.apache.org/r/70822/ , this line may 
become redundant if we remove the state from DrainInfo.


- Joseph Wu


On June 18, 2019, 4:20 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70834/
> ---
> 
> (Updated June 18, 2019, 4:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-9818
> https://issues.apache.org/jira/browse/MESOS-9818
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds a minimal handler to the agent for the
> `DrainSlaveMessage`. This handler will later be extended
> to implement the full functionality.
> 
> 
> Diffs
> -
> 
>   src/slave/paths.hpp ad768267e6a3c105eab4b36b05351d175d720eaf 
>   src/slave/paths.cpp 1163c88eefd2a9e99b56817a3f515a23ee6e3903 
>   src/slave/slave.hpp 6954f53ff1531b9fcb688ef76acddf6a3d849a41 
>   src/slave/slave.cpp 30039b0857a4d85b4b96fa95d7f8724d57cdec6e 
> 
> 
> Diff: https://reviews.apache.org/r/70834/diff/3/
> 
> 
> Testing
> ---
> 
> Testing details at the end of this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70839: Added agent capability for agent draining.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70839/#review216049
---


Fix it, then Ship it!





src/slave/constants.cpp
Line 44 (original), 44-45 (patched)


Want to move the ending `};` onto a new line and add an extra comma?

That'll make this a one-line change for future capabilities.


- Joseph Wu


On June 12, 2019, 1:10 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70839/
> ---
> 
> (Updated June 12, 2019, 1:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, and 
> Joseph Wu.
> 
> 
> Bugs: MESOS-9818
> https://issues.apache.org/jira/browse/MESOS-9818
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added agent capability for agent draining.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 2b4f350815935220c2d2b0dd0e52346bc74c91d9 
>   include/mesos/v1/mesos.proto bafc27499f810791700c4a30dcb1da33b6f31d2e 
>   src/common/protobuf_utils.hpp ecaf8ea36c486b859870fe53e834e0844e317182 
>   src/common/protobuf_utils.cpp 9ff0cf59c658c7c6a3a439a77aff13aff3c20fe5 
>   src/slave/constants.cpp 4c29b00f25aee5475051a4eb6d0d587ca2b7b6d4 
>   src/slave/flags.cpp e23061ad929c8e3c24806f52c95f82766e3d1911 
>   src/tests/master_tests.cpp 097f1b77a59e29c6690210773d1556ebf2bb701e 
>   src/tests/slave_tests.cpp c2035976713abb31b3646c0d23771fa40df93271 
> 
> 
> Diff: https://reviews.apache.org/r/70839/diff/1/
> 
> 
> Testing
> ---
> 
> Testing details at the end of this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Review Request 70923: Added a registry operation for the DRAIN_AGENT call.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70923/
---

Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
Vinod Kone.


Bugs: MESOS-9814
https://issues.apache.org/jira/browse/MESOS-9814


Repository: mesos


Description
---

This adds the associated registry operation and fields for the
DRAIN_AGENT master call.  This call affects admitted or unreachable
agents and thus has data fields into two parts of the registry.

When agents transition from reachable to unreachable, or vice
versa, the draining info is now copied too.

Because this feature is not downgrade compatible, a minimum capability
is added when the draining feature is used.


Diffs
-

  src/master/registry.proto 239789322c45c6a8346332f50661a98b1851b685 
  src/master/registry_operations.hpp 5a3010d46c4eddb0d660eb7368ec836903477a9b 
  src/master/registry_operations.cpp c417c4d6c4272eeab15f5313ca5160d4a0aa4da4 
  src/tests/registrar_tests.cpp 81979d78f1348791e28f1afea4ca2999de6362b8 


Diff: https://reviews.apache.org/r/70923/diff/1/


Testing
---

make check


Thanks,

Joseph Wu



Re: Review Request 70749: Introduced optional new scheme for hostname validation.

2019-06-21 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70749/
---

(Updated June 21, 2019, 3:32 p.m.)


Review request for mesos, Alexander Rukletsov, Joseph Wu, and Till Toenshoff.


Bugs: MESOS-9809
https://issues.apache.org/jira/browse/MESOS-9809


Repository: mesos


Description (updated)
---

This commit introduces a new libprocess SSL flag
`hostname_validation_scheme`, which can be used to select
between the previous hostname validation behaviour and a new
option to use standardized OpenSSL algorithms to handle
hostname validation as part of the

As a nice side-effect, the new scheme gets rid of reverse DNS
lookups during TLS connection establishment, which used to be
a common source of hard-to-debug unresponsiveness in Mesos
components.

See `docs/ssl.md` in the follow-up commit for details of and
differences between the schemes.


Diffs (updated)
-

  3rdparty/libprocess/include/process/ssl/flags.hpp 
f3483f97f93bb29117b2c78f0f2ed9735d9c4b3a 
  3rdparty/libprocess/src/openssl.hpp 17bec246e516261f8d772f1647c17f092fae82d1 
  3rdparty/libprocess/src/openssl.cpp e7dbd67913fa8e7fbbf60dee428e7e38895f86ce 
  3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.hpp 
6ef5a86566af3439cfe0b06ab3576076623f7be0 
  3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp 
29a1bf71c1df9d80370455a6269ecea0ec4193b0 


Diff: https://reviews.apache.org/r/70749/diff/5/

Changes: https://reviews.apache.org/r/70749/diff/4-5/


Testing
---

Todo!


Thanks,

Benno Evers



Re: Review Request 70749: Introduced optional new scheme for hostname validation.

2019-06-21 Thread Benno Evers


> On June 21, 2019, 1:32 a.m., Till Toenshoff wrote:
> > 3rdparty/libprocess/src/openssl.cpp
> > Lines 808 (patched)
> > 
> >
> > I know you just moved it, but where do these 100ms come from and how 
> > could we be more explicit about that choice?
> > 
> > I would suggest to use a const with some explaining comment how that 
> > value was chosen - can we please? :D

I'm afraid I have no idea where the 100ms come from. The suggestion sounds 
good, but I think the changes will fit better in a separate review, since 
they're not really related to hostname validation.


> On June 21, 2019, 1:32 a.m., Till Toenshoff wrote:
> > 3rdparty/libprocess/src/openssl.cpp
> > Lines 984-985 (patched)
> > 
> >
> > This sounds like a great idea worth spending 1 more cycle on -- can we 
> > create and reference a ticket that explains this jazz as nicely as we do 
> > here in the code?
> > 
> > My thought is that being open about this idea in JIRA,  we would 
> > provide more chances of getting community support for it.

Opened https://issues.apache.org/jira/browse/MESOS-9855


> On June 21, 2019, 1:32 a.m., Till Toenshoff wrote:
> > 3rdparty/libprocess/src/openssl.cpp
> > Lines 989 (patched)
> > 
> >
> > Shall we explain why this is the `right` way - aka best practices?

Huh, I actually removed this now: I originally had a look at the OpenSSL 
example code and at RFC6125, but missed that partial wildcards are only 
disallowed in *internationalized* domain names. (and for these, openssl already 
does the correct thing regardless of this flag.)

With this removed, we're a bit more loose than what a web browser would accept, 
but Mesos is not a web browser so that seems fine.


- Benno


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70749/#review216017
---


On June 20, 2019, 5:48 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70749/
> ---
> 
> (Updated June 20, 2019, 5:48 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Joseph Wu, and Till Toenshoff.
> 
> 
> Bugs: MESOS-9809
> https://issues.apache.org/jira/browse/MESOS-9809
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This commit introduces a new libprocess SSL flag
> `hostname_validation_scheme`, which can be used to select
> between the previous hostname validation behaviour and a new
> option to use standardized OpenSSL algorithms to handle
> hostname validation as part of the handshake.
> 
> As a nice side-effect, the new scheme gets rid of reverse DNS
> lookups during TLS connection establishment, which used to be
> a common source of hard-to-debug unresponsiveness in Mesos
> components.
> 
> See `docs/ssl.md` in the follow-up commit for details of and
> differences between the schemes.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/ssl/flags.hpp 
> f3483f97f93bb29117b2c78f0f2ed9735d9c4b3a 
>   3rdparty/libprocess/src/openssl.hpp 
> 17bec246e516261f8d772f1647c17f092fae82d1 
>   3rdparty/libprocess/src/openssl.cpp 
> e7dbd67913fa8e7fbbf60dee428e7e38895f86ce 
>   3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.hpp 
> 6ef5a86566af3439cfe0b06ab3576076623f7be0 
>   3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp 
> 29a1bf71c1df9d80370455a6269ecea0ec4193b0 
> 
> 
> Diff: https://reviews.apache.org/r/70749/diff/4/
> 
> 
> Testing
> ---
> 
> Todo!
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 70911: Added master endpoints for agent draining.

2019-06-21 Thread Joseph Wu


> On June 21, 2019, 8:23 a.m., James DeFelice wrote:
> > include/mesos/master/master.proto
> > Lines 93 (patched)
> > 
> >
> > are these new APIs experimental? if so, they should be labeled as such

By the time the entire chain of agent draining work is done, these APIs won't 
be experimental.  If we land this patch independently, then yes, we'll need to 
mark them experimental.


> On June 21, 2019, 8:23 a.m., James DeFelice wrote:
> > include/mesos/v1/master/master.proto
> > Lines 19 (patched)
> > 
> >
> > why this note? i thought proto3 also had support for well-known types.

Once we switch to proto3, we won't need to explicitly include a proto3 type.  
Since we're on proto2, this line pulls in a proto3 object, which otherwise 
isn't present by default.


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70911/#review216046
---


On June 21, 2019, 6:10 a.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70911/
> ---
> 
> (Updated June 21, 2019, 6:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds three master calls:
>   * DRAIN_AGENT
>   * DEACTIVATE_AGENT
>   * REACTIVATE_AGENT
> 
> DRAIN_AGENT starts automated draining of tasks on the specified agent.
> When marked for draining, the agent's resources will not be offered
> and all the agent's tasks will be gracefully killed.
> 
> DEACTIVATE_AGENT stops offers for a specific agent.  This can be used
> for manual draining of tasks, but is not limited to draining.
> For example, tn operator could deactivate an agent prior to
> reserving resources.
> 
> REACTIVATE_AGENT restarts offers for a specific agent, that had been
> drained or deactivated previously.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto 4a653ce59268a47d69362d0e2a3d5eb867ca6fa9 
>   include/mesos/v1/master/master.proto 
> d4dd3fcb0abc947097a5a281c4082ccf04db8614 
>   src/internal/devolve.hpp fefe86e450fa5083b9ff50e92f4594ffb30a54c8 
>   src/internal/devolve.cpp 1d300b49d5cc3de4b8ed409902eb881c7afc07ea 
>   src/master/http.cpp 3cd7df228adb54a963821c23b4d1c26d33622ee7 
>   src/master/master.hpp 7acaa8264eb7a37e17394bc4940971f872ab2de3 
>   src/master/validation.cpp af2d04a46750e5d42e7d92c70ce685e64d646a3b 
> 
> 
> Diff: https://reviews.apache.org/r/70911/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70911: Added master endpoints for agent draining.

2019-06-21 Thread James DeFelice

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70911/#review216046
---




include/mesos/master/master.proto
Lines 93 (patched)


are these new APIs experimental? if so, they should be labeled as such



include/mesos/v1/master/master.proto
Lines 19 (patched)


why this note? i thought proto3 also had support for well-known types.


- James DeFelice


On June 21, 2019, 1:10 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70911/
> ---
> 
> (Updated June 21, 2019, 1:10 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds three master calls:
>   * DRAIN_AGENT
>   * DEACTIVATE_AGENT
>   * REACTIVATE_AGENT
> 
> DRAIN_AGENT starts automated draining of tasks on the specified agent.
> When marked for draining, the agent's resources will not be offered
> and all the agent's tasks will be gracefully killed.
> 
> DEACTIVATE_AGENT stops offers for a specific agent.  This can be used
> for manual draining of tasks, but is not limited to draining.
> For example, tn operator could deactivate an agent prior to
> reserving resources.
> 
> REACTIVATE_AGENT restarts offers for a specific agent, that had been
> drained or deactivated previously.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto 4a653ce59268a47d69362d0e2a3d5eb867ca6fa9 
>   include/mesos/v1/master/master.proto 
> d4dd3fcb0abc947097a5a281c4082ccf04db8614 
>   src/internal/devolve.hpp fefe86e450fa5083b9ff50e92f4594ffb30a54c8 
>   src/internal/devolve.cpp 1d300b49d5cc3de4b8ed409902eb881c7afc07ea 
>   src/master/http.cpp 3cd7df228adb54a963821c23b4d1c26d33622ee7 
>   src/master/master.hpp 7acaa8264eb7a37e17394bc4940971f872ab2de3 
>   src/master/validation.cpp af2d04a46750e5d42e7d92c70ce685e64d646a3b 
> 
> 
> Diff: https://reviews.apache.org/r/70911/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70911: Added master endpoints for agent draining.

2019-06-21 Thread James DeFelice

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70911/#review216045
---



is there a related patch to add these new APIs to the operator V1 API 
documentation, along w/ expected response codes from mesos?

- James DeFelice


On June 21, 2019, 1:10 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70911/
> ---
> 
> (Updated June 21, 2019, 1:10 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
> Vinod Kone.
> 
> 
> Bugs: MESOS-9814
> https://issues.apache.org/jira/browse/MESOS-9814
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This adds three master calls:
>   * DRAIN_AGENT
>   * DEACTIVATE_AGENT
>   * REACTIVATE_AGENT
> 
> DRAIN_AGENT starts automated draining of tasks on the specified agent.
> When marked for draining, the agent's resources will not be offered
> and all the agent's tasks will be gracefully killed.
> 
> DEACTIVATE_AGENT stops offers for a specific agent.  This can be used
> for manual draining of tasks, but is not limited to draining.
> For example, tn operator could deactivate an agent prior to
> reserving resources.
> 
> REACTIVATE_AGENT restarts offers for a specific agent, that had been
> drained or deactivated previously.
> 
> 
> Diffs
> -
> 
>   include/mesos/master/master.proto 4a653ce59268a47d69362d0e2a3d5eb867ca6fa9 
>   include/mesos/v1/master/master.proto 
> d4dd3fcb0abc947097a5a281c4082ccf04db8614 
>   src/internal/devolve.hpp fefe86e450fa5083b9ff50e92f4594ffb30a54c8 
>   src/internal/devolve.cpp 1d300b49d5cc3de4b8ed409902eb881c7afc07ea 
>   src/master/http.cpp 3cd7df228adb54a963821c23b4d1c26d33622ee7 
>   src/master/master.hpp 7acaa8264eb7a37e17394bc4940971f872ab2de3 
>   src/master/validation.cpp af2d04a46750e5d42e7d92c70ce685e64d646a3b 
> 
> 
> Diff: https://reviews.apache.org/r/70911/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Review Request 70922: Added master minimum capability for agent draining.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70922/
---

Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
Vinod Kone.


Bugs: MESOS-9814
https://issues.apache.org/jira/browse/MESOS-9814


Repository: mesos


Description
---

This adds a new enum for the agent draining feature in the master.
When an agent is marked for draining or deactivation, an associated
entry will be added to the master's registry.  This will prevent
any master downgrades until all agents are reactivated.


Diffs
-

  docs/downgrades.md 08200015d8b533640b4656c88b0510cf41a7ffa4 
  include/mesos/mesos.proto 2b4f350815935220c2d2b0dd0e52346bc74c91d9 
  include/mesos/v1/mesos.proto bafc27499f810791700c4a30dcb1da33b6f31d2e 
  src/common/protobuf_utils.hpp ecaf8ea36c486b859870fe53e834e0844e317182 
  src/master/constants.cpp 1109b483daad10560e6d2c717a80492eddb507fc 


Diff: https://reviews.apache.org/r/70922/diff/1/


Testing
---

TODO: Need to look at this on a browser.


Thanks,

Joseph Wu



Re: Review Request 70795: Updated SSL docs to include new libprocess flag.

2019-06-21 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70795/
---

(Updated June 21, 2019, 3:05 p.m.)


Review request for mesos, Alexander Rukletsov, Jan-Philip Gehrcke, Joseph Wu, 
and Till Toenshoff.


Repository: mesos


Description (updated)
---

Added a description of the new `--hostname_validation_scheme` flag
and corresponding `LIBPROCESS_SSL_HOSTNAME_VALIDATION_SCHEME`
environment variable.


Diffs (updated)
-

  docs/ssl.md ce5058896144aa7824986d40d996899d92cb7c1c 


Diff: https://reviews.apache.org/r/70795/diff/4/

Changes: https://reviews.apache.org/r/70795/diff/3-4/


Testing
---


Thanks,

Benno Evers



Review Request 70921: Add OpenSSL related changes to CHANGELOG.

2019-06-21 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70921/
---

Review request for mesos, Joseph Wu and Till Toenshoff.


Repository: mesos


Description
---

Add OpenSSL related changes to CHANGELOG.


Diffs
-

  CHANGELOG 197d7912fe229e3b5b52f97330fb327d7b2b9394 
  docs/upgrades.md 4a818df4993093770b49efb675b8078c42144241 


Diff: https://reviews.apache.org/r/70921/diff/1/


Testing
---


Thanks,

Benno Evers



Re: Review Request 70748: Changed semantics of libprocess TLS flags.

2019-06-21 Thread Benno Evers

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70748/
---

(Updated June 21, 2019, 3:01 p.m.)


Review request for mesos, Alexander Rukletsov, Jan-Philip Gehrcke, Joseph Wu, 
and Till Toenshoff.


Summary (updated)
-

Changed semantics of libprocess TLS flags.


Bugs: MESOS-9810
https://issues.apache.org/jira/browse/MESOS-9810


Repository: mesos


Description (updated)
---

This commit slightly updates the semants of the
`LIBPROCESS_SSL_VERIFY_CERT` and `LIBPROCESS_SSL_REQUIRE_CERT`
environment variables. The former now only applies to connections
in client mode and the latter now only applies to connections in
server mode.

In particular, in TLS server mode we now *only* verify client
certificates when `LIBPROCESS_SSL_REQUIRE_CERT` is set to `true`,
regardless of the value of `LIBPROCESS_SSL_VERIFY_CERT`.

In addtion, when in SSL client mode and  `LIBPROCESS_SSL_VERIFY_CERT`
has been set to `true`, enforce that the server actually presents a
certificate that can be verified. Note that this is expected to be
not a behavioural change in practice, since the TLS specification
already states that a server MUST always send a certificate unless an
anonymous cipher is used, and most TLS ciphersuites are configured to
exclude anonymous ciphers.


Diffs (updated)
-

  3rdparty/libprocess/src/openssl.hpp 17bec246e516261f8d772f1647c17f092fae82d1 
  3rdparty/libprocess/src/openssl.cpp e7dbd67913fa8e7fbbf60dee428e7e38895f86ce 
  3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp 
29a1bf71c1df9d80370455a6269ecea0ec4193b0 
  3rdparty/libprocess/src/tests/ssl_tests.cpp 
6b8496aeeed79ae1bd39d7013f4f403b248fdd4c 


Diff: https://reviews.apache.org/r/70748/diff/5/

Changes: https://reviews.apache.org/r/70748/diff/4-5/


Testing
---

So far, mostly manual testing.


Thanks,

Benno Evers



Re: Review Request 70748: Changed semantics of some libprocess TLS flags.

2019-06-21 Thread Benno Evers


> On June 21, 2019, 1:06 a.m., Till Toenshoff wrote:
> > 3rdparty/libprocess/src/openssl.cpp
> > Line 96 (original), 96 (patched)
> > 
> >
> > Additional to the updates above, we also need to call this out in the 
> > CHANGELOG. Can you please include that in this chain?

I added the `CHANGELOG` changes in a new review 
https://reviews.apache.org/r/70921/


- Benno


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70748/#review216009
---


On June 19, 2019, 2:28 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70748/
> ---
> 
> (Updated June 19, 2019, 2:28 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Jan-Philip Gehrcke, Joseph Wu, 
> and Till Toenshoff.
> 
> 
> Bugs: MESOS-9810
> https://issues.apache.org/jira/browse/MESOS-9810
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This commit slightly updates the semants of the
> `LIBPROCESS_SSL_VERIFY_CERT` and `LIBPROCESS_SSL_REQUIRE_CERT`
> environment variables. The former now only applies to connections
> in client mode and the latter now only applies to connections in
> server mode.
> 
> In particular, in TLS server mode we now *only* verify client
> certificates when `LIBPROCESS_SSL_REQUIRE_CERT` is set to `true`,
> regardless of the value of `LIBPROCESS_SSL_VERIFY_CERT`.
> 
> In addition, when in SSL client mode and  `LIBPROCESS_SSL_VERIFY_CERT`
> has been set to `true`, enforce that the server actually presents a
> certificate that can be verified. Note that this is expected to be
> not a behavioural change in practice, since the TLS specification
> already states that a server MUST always send a certificate unless an
> anonymous cipher is used, and most TLS ciphersuites are configured to
> exclude anonymous ciphers.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/openssl.hpp 
> 17bec246e516261f8d772f1647c17f092fae82d1 
>   3rdparty/libprocess/src/openssl.cpp 
> e7dbd67913fa8e7fbbf60dee428e7e38895f86ce 
>   3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp 
> 29a1bf71c1df9d80370455a6269ecea0ec4193b0 
>   3rdparty/libprocess/src/tests/ssl_tests.cpp 
> 6b8496aeeed79ae1bd39d7013f4f403b248fdd4c 
> 
> 
> Diff: https://reviews.apache.org/r/70748/diff/4/
> 
> 
> Testing
> ---
> 
> So far, mostly manual testing.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 70822: Added common protobufs for agent draining.

2019-06-21 Thread Joseph Wu


> On June 14, 2019, 2:34 a.m., Benjamin Bannier wrote:
> > src/messages/messages.proto
> > Lines 986 (patched)
> > 
> >
> > What does it mean if the master sends a `DRAINED` state to the agent? 
> > Is that something we need to reject in validation?
> > 
> > Does it maybe make sense to instead break out `DrainInfo.state` into 
> > its own message to use in state reporting?
> 
> Greg Mann wrote:
> Yes we can have a CHECK on the agent to make sure the master doesn't send 
> DRAINED in a DrainSlaveMessage. That will happen in another patch.
> 
> What's the benefit of moving `DrainInfo.State` outside of `DrainInfo`?
> 
> Benjamin Bannier wrote:
> The goal would be to use dedicated messages for triggering draining 
> (master->agent) and to report draining (agent->master). That might not only 
> be conceptually simplier, but would likely also lead to simpler, less 
> redundant code. Is there any benefit in using the same message?
> 
> Greg Mann wrote:
> What do you mean by "report draining"? The design doesn't involve any 
> explicit communication of draining progress between agent and master, the 
> master just receives task status updates.
> 
> Benjamin Bannier wrote:
> Sorry, there's no process of communicating `DrainInfo` back to master, 
> but we do report it to users with https://reviews.apache.org/r/70835/.
> 
> Greg Mann wrote:
> Are you proposing having a separate message for inclusion in the agent 
> API outputs?
> 
> Vinod Kone wrote:
> IIUC, Benjamin is questioning why we are storing both the spec (max grace 
> period, mark_gone) and the status (state) in the same proto (do we do this 
> elsewhere in the code base?). If they were separate protos, say DrainRequest 
> and DrainStatus, master would send DrainRequest to the agent, and show 
> DrainStatus (and possibly DrainRequest) in the operator API response. It 
> looks a bit weird that we would send DrainInfo (as it is currently written  
> to the agent and do a CHECK on valid state. The fact that an agent got a 
> Drain message is enough for the agent to know that it needs to drain, it 
> doesn't need to look into the `DrainInfo::State`?
> 
> Vinod Kone wrote:
> Let me repharse my first sentence. I think storing both spec and status 
> in the same proto is probably fine if it makes thing easy to expose in API 
> endpoints and use it internally in the master. But I still think it's weird 
> to send the status in a message to the agent which is not expected to use it. 
> So, if there's a way we can avoid it that would be great.

>From the master's perspective, having the `State` in the DrainInfo complicates 
>what the master needs to do in the registry.  To report the correct state to 
>any endpoints, the master will need to assume all agents are in a `DRAINING` 
>state, until the agent connects, reconciles, and says it has nothing running.  
>This means the state within the registry will always be `DRAINING` (and there 
>is no reason to transition that state).

So moving the `State` field into a separate message would be ideal.


- Joseph


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70822/#review215896
---


On June 18, 2019, 4:17 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70822/
> ---
> 
> (Updated June 18, 2019, 4:17 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Benjamin Mahler, 
> Joseph Wu, and Vinod Kone.
> 
> 
> Bugs: MESOS-9753
> https://issues.apache.org/jira/browse/MESOS-9753
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch makes protobuf message updates which will be used
> by both the master and the agent to facilitate automatic
> draining of agents.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 2b4f350815935220c2d2b0dd0e52346bc74c91d9 
>   include/mesos/type_utils.hpp 57b1893160dbe874aa9fec00a3d1b640b9c54906 
>   include/mesos/v1/mesos.proto bafc27499f810791700c4a30dcb1da33b6f31d2e 
>   src/common/type_utils.cpp ef1b3ea15cde1c7a8e0735fb9d7566dd1fd2cfdb 
>   src/internal/devolve.hpp fefe86e450fa5083b9ff50e92f4594ffb30a54c8 
>   src/internal/devolve.cpp 1d300b49d5cc3de4b8ed409902eb881c7afc07ea 
>   src/internal/evolve.hpp 1044d9df75b6fc1f60d3704be9cb5751e6d4321d 
>   src/internal/evolve.cpp 19c155967bf090fb2ec39211805ff1385787ab59 
>   src/messages/messages.proto e30ad34cc9212b05f85ba5e1d4fcfc9e49ae92c0 
> 
> 
> Diff: https://reviews.apache.org/r/70822/diff/5/
> 
> 
> Testing
> ---
> 
> `make`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70911: Added master endpoints for agent draining.

2019-06-21 Thread Joseph Wu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70911/
---

(Updated June 21, 2019, 6:10 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, Greg Mann, and 
Vinod Kone.


Changes
---

Turns out that you can use proto3 objects in proto2, as long as you include 
them explicitly.  So the extra proto file can be removed.

Also added endpoint stubs.


Summary (updated)
-

Added master endpoints for agent draining.


Bugs: MESOS-9814
https://issues.apache.org/jira/browse/MESOS-9814


Repository: mesos


Description
---

This adds three master calls:
  * DRAIN_AGENT
  * DEACTIVATE_AGENT
  * REACTIVATE_AGENT

DRAIN_AGENT starts automated draining of tasks on the specified agent.
When marked for draining, the agent's resources will not be offered
and all the agent's tasks will be gracefully killed.

DEACTIVATE_AGENT stops offers for a specific agent.  This can be used
for manual draining of tasks, but is not limited to draining.
For example, tn operator could deactivate an agent prior to
reserving resources.

REACTIVATE_AGENT restarts offers for a specific agent, that had been
drained or deactivated previously.


Diffs (updated)
-

  include/mesos/master/master.proto 4a653ce59268a47d69362d0e2a3d5eb867ca6fa9 
  include/mesos/v1/master/master.proto d4dd3fcb0abc947097a5a281c4082ccf04db8614 
  src/internal/devolve.hpp fefe86e450fa5083b9ff50e92f4594ffb30a54c8 
  src/internal/devolve.cpp 1d300b49d5cc3de4b8ed409902eb881c7afc07ea 
  src/master/http.cpp 3cd7df228adb54a963821c23b4d1c26d33622ee7 
  src/master/master.hpp 7acaa8264eb7a37e17394bc4940971f872ab2de3 


Diff: https://reviews.apache.org/r/70911/diff/2/

Changes: https://reviews.apache.org/r/70911/diff/1-2/


Testing (updated)
---

make check


Thanks,

Joseph Wu



Re: Review Request 70798: Improved `namespaces/ipc` isolator for configurable IPC support.

2019-06-21 Thread Qian Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70798/
---

(Updated June 21, 2019, 9:08 p.m.)


Review request for mesos, Andrei Budnik and Gilbert Song.


Changes
---

Addressed review comments.


Bugs: MESOS-9788
https://issues.apache.org/jira/browse/MESOS-9788


Repository: mesos


Description
---

Improved `namespaces/ipc` isolator for configurable IPC support.


Diffs (updated)
-

  src/slave/containerizer/mesos/isolators/namespaces/ipc.hpp 
32c888309ca536d944e4d73641aed214805ccce2 
  src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp 
6c8e8eed69ed355094f7d9b54bde25a8a9bc63f7 
  src/slave/containerizer/mesos/paths.hpp 
a5e0920bd4d753f6ba6f3b092c92cb405e9edf35 
  src/slave/containerizer/mesos/paths.cpp 
4281abc522c942c87fcfd811af26f95cbd6f734f 
  src/tests/containerizer/isolator_tests.cpp 
9c14f3acbc19631b2f5cac4dc7cd9caba8527712 


Diff: https://reviews.apache.org/r/70798/diff/4/

Changes: https://reviews.apache.org/r/70798/diff/3-4/


Testing
---


Thanks,

Qian Zhang



Re: Review Request 70907: Added recovery of agent drain information.

2019-06-21 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70907/#review216040
---



Patch looks great!

Reviews applied: [70822, 70839, 70834, 70835, 70836, 70899, 70900, 70901, 
70903, 70904, 70912, 70906, 70907]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers 
--disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; 
./support/docker-build.sh

- Mesos Reviewbot


On June 20, 2019, 9:45 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70907/
> ---
> 
> (Updated June 20, 2019, 9:45 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9822
> https://issues.apache.org/jira/browse/MESOS-9822
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> With this patch the agent will, after executor reregistration finished,
> replay any active drain information so remaining tasks are drained as
> well. We need to wait until executors had a chance to register so they
> are not terminated should we try to send kill task request before the
> executor has registered.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 30039b0857a4d85b4b96fa95d7f8724d57cdec6e 
>   src/slave/state.hpp 45836e5805a5038c0b3b5563ea995c3fa90ab808 
>   src/slave/state.cpp b4bf6292eb482f4e99a27442af0ea03e31c1ddc2 
>   src/tests/slave_tests.cpp c2035976713abb31b3646c0d23771fa40df93271 
> 
> 
> Diff: https://reviews.apache.org/r/70907/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 70749: Introduced optional new scheme for hostname validation.

2019-06-21 Thread Jan-Philip Gehrcke via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70749/#review215999
---


Fix it, then Ship it!




Just saw that I did not yet submit this review, it's from yesterday's 
conversation. Hope it still applies. Feel free to ignore.


3rdparty/libprocess/src/openssl.cpp
Lines 563 (patched)


Let's maybe emit a warning when VERIFY_CERT is set to true, but 
ALLOW_DOWNGRADE is also set to true


- Jan-Philip Gehrcke


On June 20, 2019, 5:48 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70749/
> ---
> 
> (Updated June 20, 2019, 5:48 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Joseph Wu, and Till Toenshoff.
> 
> 
> Bugs: MESOS-9809
> https://issues.apache.org/jira/browse/MESOS-9809
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This commit introduces a new libprocess SSL flag
> `hostname_validation_scheme`, which can be used to select
> between the previous hostname validation behaviour and a new
> option to use standardized OpenSSL algorithms to handle
> hostname validation as part of the handshake.
> 
> As a nice side-effect, the new scheme gets rid of reverse DNS
> lookups during TLS connection establishment, which used to be
> a common source of hard-to-debug unresponsiveness in Mesos
> components.
> 
> See `docs/ssl.md` in the follow-up commit for details of and
> differences between the schemes.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/include/process/ssl/flags.hpp 
> f3483f97f93bb29117b2c78f0f2ed9735d9c4b3a 
>   3rdparty/libprocess/src/openssl.hpp 
> 17bec246e516261f8d772f1647c17f092fae82d1 
>   3rdparty/libprocess/src/openssl.cpp 
> e7dbd67913fa8e7fbbf60dee428e7e38895f86ce 
>   3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.hpp 
> 6ef5a86566af3439cfe0b06ab3576076623f7be0 
>   3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp 
> 29a1bf71c1df9d80370455a6269ecea0ec4193b0 
> 
> 
> Diff: https://reviews.apache.org/r/70749/diff/4/
> 
> 
> Testing
> ---
> 
> Todo!
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 70795: Updated SSL docs to include new libprocess flag.

2019-06-21 Thread Benno Evers


> On June 21, 2019, 1:34 a.m., Till Toenshoff wrote:
> > docs/ssl.md
> > Lines 194 (patched)
> > 
> >
> > I wonder if we should already start a deprecation of the `libprocess` 
> > scheme - that would be:
> > - announcing that `openssl` will be standard soon on compatible boxes
> > - announcing it to be gone at some point
> > 
> > Or am I too eager for unification here?

It's actually a pretty big change - the 'libprocess' behaviour was built, I 
assume, to "magically" work with normal certificates w/o IP addresses despite 
libprocess only knowing about IP addresses. In DC/OS we don't notice most of 
it, since there all our certificates *do* contain the correct IP address, but 
at least quite a few unit tests will break by switching the default.

So I actually agree we should do this deprecation, but I'm not sure about the 
timeline.


- Benno


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70795/#review216020
---


On June 20, 2019, 5:27 p.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70795/
> ---
> 
> (Updated June 20, 2019, 5:27 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Jan-Philip Gehrcke, Joseph Wu, 
> and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added a description of the new `--hostname_validation_algorithm` flag
> and corresponding `LIBPROCESS_SSL_HOSTNAME_VALIDATION_ALGORITHM`
> environment variable.
> 
> 
> Diffs
> -
> 
>   docs/ssl.md ce5058896144aa7824986d40d996899d92cb7c1c 
> 
> 
> Diff: https://reviews.apache.org/r/70795/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 70906: Fixed pid checkpointing for `TestContainerizer`.

2019-06-21 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70906/
---

(Updated June 21, 2019, 11:59 a.m.)


Review request for mesos, Gilbert Song, Greg Mann, and Joseph Wu.


Repository: mesos


Description
---

In order for a `MockExecutor` to be able to reregister after agent
restart a persisted pid is required. This patch adds checkpointing of
the pid.


Diffs (updated)
-

  src/tests/containerizer.cpp fab7e81339f3b7b3fec2a4f6cb7e3a52df2607ed 


Diff: https://reviews.apache.org/r/70906/diff/2/

Changes: https://reviews.apache.org/r/70906/diff/1-2/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 70820: Updated `filesystem/linux` isolator for configurable IPC support.

2019-06-21 Thread Qian Zhang


> On June 21, 2019, 8:27 a.m., Gilbert Song wrote:
> > could we consider to do the decouple now?
> 
> Gilbert Song wrote:
> probably still need a deprecation cycle :(

Yes, and I have added a TODO in mesos.proto for the deprecation cycle.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70820/#review216030
---


On June 12, 2019, 10:32 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70820/
> ---
> 
> (Updated June 12, 2019, 10:32 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9826
> https://issues.apache.org/jira/browse/MESOS-9826
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If `namespaces/ipc` isolator is not enabled, for backward
> compatibility /dev/shm will still be handled in `filesystem/linux`
> isolator as before. Otherwise, both /dev/shm and IPC namespace
> will be handled by `namespaces/ipc` isolator.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/filesystem/linux.cpp 
> 3cfb6e97a565420c8be2a0e31b481b39cd09d9da 
> 
> 
> Diff: https://reviews.apache.org/r/70820/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 70844: Implemented `cleanup` method of the `namespaces/ipc` isolator.

2019-06-21 Thread Qian Zhang


> On June 21, 2019, 8:26 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp
> > Lines 400 (patched)
> > 
> >
> > if the shmPath not exists, should we log a warning?

No, because shmPath not exists is a normal case for the containers which share 
its parent's shm.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70844/#review216029
---


On June 12, 2019, 10:36 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70844/
> ---
> 
> (Updated June 12, 2019, 10:36 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9788
> https://issues.apache.org/jira/browse/MESOS-9788
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Implemented `cleanup` method of the `namespaces/ipc` isolator.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/namespaces/ipc.hpp 
> 32c888309ca536d944e4d73641aed214805ccce2 
>   src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp 
> 6c8e8eed69ed355094f7d9b54bde25a8a9bc63f7 
> 
> 
> Diff: https://reviews.apache.org/r/70844/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>



Re: Review Request 70798: Improved `namespaces/ipc` isolator for configurable IPC support.

2019-06-21 Thread Qian Zhang


> On June 21, 2019, 8:24 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp
> > Lines 52-54 (patched)
> > 
> >
> > consider to return Result for no parent case?

What do you mean for "no parent case"? This method will only be called for the 
nested container case, i.e., the container must have parent.


> On June 21, 2019, 8:24 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp
> > Lines 56 (patched)
> > 
> >
> > a bit confusing. the parentId could be assigned as a containerId 
> > without a parent?

That is for the top-level container which does not have parent. However, I 
agree it is confusing, let me improve it.


> On June 21, 2019, 8:24 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp
> > Lines 90-93 (patched)
> > 
> >
> > what if for SHARE_PARENT case but the container does not have a parent?

We will return `AGENT_SHM_PATH` because it is the case that a top-level 
container whose `ipc_mode` is `SHARE_PARENT` (i.e., it will share agent's 
/dev/shm).


> On June 21, 2019, 8:24 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp
> > Lines 137-143 (patched)
> > 
> >
> > is this  a new dependency?

Yes.


> On June 21, 2019, 8:24 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp
> > Lines 212 (patched)
> > 
> >
> > are we going to remove the `createContainerMount()` in linux filesystem 
> > isolator?

Do you mean the `createContainerMount()` in linux filesystem isolator for 
/dev/shm? in https://reviews.apache.org/r/70820 I have updated linux filesystem 
to do `createContainerMount()` for /dev/shm only when `namespaces/ipc` isolator 
is not enabled.


> On June 21, 2019, 8:24 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp
> > Line 94 (original), 382 (patched)
> > 
> >
> > if we introduce cleanup(), does it mean we could avoid the dependency 
> > on linux filesystem isolator?

No, we still need linux filesystem to ensure non-debug container has its own 
mount namespace and debug container enters its parent's mount namespace. We 
have the similar dependency in `namespaces/pid` isolator.


> On June 21, 2019, 8:24 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/paths.hpp
> > Lines 85 (patched)
> > 
> >
> > how about `CONTAINER_SHM_DIRECTORY` or just `CONTAINER_SHM`?

Agree! Let's go with `CONTAINER_SHM_DIRECTORY`.


- Qian


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70798/#review216026
---


On June 11, 2019, 10:33 p.m., Qian Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70798/
> ---
> 
> (Updated June 11, 2019, 10:33 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik and Gilbert Song.
> 
> 
> Bugs: MESOS-9788
> https://issues.apache.org/jira/browse/MESOS-9788
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Improved `namespaces/ipc` isolator for configurable IPC support.
> 
> 
> Diffs
> -
> 
>   src/slave/containerizer/mesos/isolators/namespaces/ipc.hpp 
> 32c888309ca536d944e4d73641aed214805ccce2 
>   src/slave/containerizer/mesos/isolators/namespaces/ipc.cpp 
> 6c8e8eed69ed355094f7d9b54bde25a8a9bc63f7 
>   src/slave/containerizer/mesos/paths.hpp 
> a5e0920bd4d753f6ba6f3b092c92cb405e9edf35 
>   src/slave/containerizer/mesos/paths.cpp 
> 4281abc522c942c87fcfd811af26f95cbd6f734f 
>   src/tests/containerizer/isolator_tests.cpp 
> 9c14f3acbc19631b2f5cac4dc7cd9caba8527712 
> 
> 
> Diff: https://reviews.apache.org/r/70798/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>