Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-17 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review212123
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On Jan. 17, 2019, 11:17 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 17, 2019, 11:17 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
>   src/master/master.cpp 7c8d3ce3271fd1d6b8e6e85e996f96572a7d48a6 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 76ec56b530ed259b8e2cb7794074b6b72cec86b3 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/15/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-17 Thread Benjamin Bannier


> On Jan. 17, 2019, 5:42 a.m., Chun-Hung Hsiao wrote:
> > src/slave/http.cpp
> > Lines 3355 (patched)
> > 
> >
> > Most handlers print this line *before* creating the approvers. Let's 
> > move this before L3345.
> > 
> > (Not yours, but the logging of `addResourceProviderConfig`, 
> > `updateResourceProviderConfig` and `removeResourceProviderConfig` should be 
> > adjusted instead. We can address the consistency issue with a followup 
> > patch.)

Fixed and created https://reviews.apache.org/r/69782/.


> On Jan. 17, 2019, 5:42 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8020 (patched)
> > 
> >
> > How about adding the following check?
> > ```
> > CHECK(!protobuf::isTerminalState(operation->latest_status().state()));
> > ```

Good point, we actually only want to work on non-terminal operations here.

Adding a hard `CHECK` here couples us strongly to what operations the agent 
stores, something I'd like to avoid -- if only because it increases the amount 
of invariants one needs to keep in mind. How about just skipping non-terminal 
operations here instead? I updated the patch for that.

Dropping for now, please reopen if you feel this needs further discussion.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review212103
---


On Jan. 17, 2019, 12:17 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 17, 2019, 12:17 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
>   src/master/master.cpp 7c8d3ce3271fd1d6b8e6e85e996f96572a7d48a6 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 76ec56b530ed259b8e2cb7794074b6b72cec86b3 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/15/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-17 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Jan. 17, 2019, 12:17 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
Schlicht.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
  src/master/master.cpp 7c8d3ce3271fd1d6b8e6e85e996f96572a7d48a6 
  src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
  src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
  src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
  src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 76ec56b530ed259b8e2cb7794074b6b72cec86b3 


Diff: https://reviews.apache.org/r/68147/diff/15/

Changes: https://reviews.apache.org/r/68147/diff/14-15/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-16 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review212103
---


Fix it, then Ship it!





src/slave/http.cpp
Lines 3355 (patched)


Most handlers print this line *before* creating the approvers. Let's move 
this before L3345.

(Not yours, but the logging of `addResourceProviderConfig`, 
`updateResourceProviderConfig` and `removeResourceProviderConfig` should be 
adjusted instead. We can address the consistency issue with a followup patch.)



src/slave/slave.cpp
Lines 8020 (patched)


How about adding the following check?
```
CHECK(!protobuf::isTerminalState(operation->latest_status().state()));
```



src/slave/slave.cpp
Lines 8020 (patched)






src/slave/slave.cpp
Lines 8020 (patched)






src/slave/slave.cpp
Lines 8020 (patched)






src/slave/slave.cpp
Lines 8034-8038 (patched)


```
protobuf::createOperationStatus(
OPERATION_GONE_BY_OPERATOR,
operationId,
"The resource provider ... received",
None(),
None(),
info.id());
```
The we can get rid of L8051-8053.


- Chun-Hung Hsiao


On Jan. 16, 2019, 10:51 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 16, 2019, 10:51 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 79a2d2966ff6533629aeeefaaaf19e849b76e0e6 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/14/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-16 Thread Benjamin Bannier


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > 
> >
> > Let's validate that there is no task using the resources provided by 
> > this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
> I believe this should be treated as orthogonal issue. It might e.g., 
> happen that the user triggering the RP removal cannot kill tasks or prevent 
> them from being rescheduled. I'd suggest to leaves as is.
> 
> WDYT?
> 
> Chun-Hung Hsiao wrote:
> The scenario I am worried about is that when the RP is marked as gone, 
> the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
> We don't seem to have a good story on incompatible resource changes ATM. 
> If we e.g., update a RP config in an incompatible way, we make not attempt to 
> detect or rectify this in any way, either. At least in the agent we were in 
> general we were operating under the assumption that tasks would die on 
> incompatible changes to their resources.
> 
> I'd suggest to drop this issue for now and potentially file a JIRA to 
> track draining and situations where it might be needed.
> 
> Benjamin Bannier wrote:
> Dropping this, see above comment.
> 
> Chun-Hung Hsiao wrote:
> Reopening this. Here's my suggestion:
> 
> 1. Create a ticket for a proper design to drain the resource provider.
> 2. For now, let's fail the call here if 
> `slave->resourceProviders.at(resourceProviderId)->totalResources` is not 
> empty. In other words, the operator must first reconfigure the resource 
> provider to "drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` 
> call.
> 
> Benjamin Bannier wrote:
> Okay.
> 
> 1. I created https://issues.apache.org/jira/browse/MESOS-9522.
> 2. We had discussed this some time ago, and I think agreed that 
> preventing operators from marking resource providers as gone whenever a task 
> was running on them was not good (especially if operators cannot drain the 
> RP). We have prior art in `MARK_AGENT_GONE` which asks operators to drain 
> agent before marking as gone. I would suggest for now with MESOS-9522 not 
> implemented we allow operators to mark resource providers as gone even if 
> some of their resources are in use; if needed they can drain them by 
> filtering all tasks using their resources. This also wouldn't expose a race.
> 
> Does this address your concerns?
> 
> Chun-Hung Hsiao wrote:
> I agreed that checking if there is any task running on resources of an RP 
> when marking it as gone would lead to some complexity, so I'm suggesting to 
> use a stronger condition: there must be no resource.
> 
> I'm not convinced that asking the operator to drain the RP is a bad idea. 
> My main concern is that, since `MARK_RESOURCE_PROVIDER_GONE` cannot be 
> undone, we should only perform it when it is safe to do, so the operator can 
> have a chance to fix the problem. Also, this restriction would ensure that no 
> framework will have an inconsistent view of cluster resources. WDYT?

Sorry for not reading your comment -- only allowing marking RPs without 
resources as gone sounds good to me. Updated this and the follow-up patch & 
marking as _Fixed_.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---


On Jan. 16, 2019, 11:51 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 16, 2019, 11:51 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp 

Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-16 Thread Benjamin Bannier


> On Jan. 15, 2019, 6:53 a.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7857 (patched)
> > 
> >
> > ```
> > Owned detector = master.get()->createDetector();
> > ```
> > And
> > ```
> > Try> slave = StartSlave(detector.get(), 
> > slavFlags);
> > ```
> > below.
> 
> Benjamin Bannier wrote:
> Could you explain the pattern we follow? I mostly went for a 
> StandaloneMasterDetector instead of an Owned because of the 
> conceptually simpler type (value vs pointer), but we seem to use both here.
> 
> Chun-Hung Hsiao wrote:
> I'm in favor of abstract types (e.g., `MasterDetector`) over concrete 
> subtypes (e.g., `StandaloneMasterDetector`). Please feel free to drop this if 
> you prefer avoiding the pointer type :)

Dropping.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review211997
---


On Jan. 16, 2019, 11:51 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 16, 2019, 11:51 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 79a2d2966ff6533629aeeefaaaf19e849b76e0e6 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/14/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-16 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Jan. 16, 2019, 11:51 a.m.)


Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
Schlicht.


Changes
---

Only allow RPs w/o resources to be marked as gone.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
  src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
  src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
  src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
  src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
  src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 79a2d2966ff6533629aeeefaaaf19e849b76e0e6 


Diff: https://reviews.apache.org/r/68147/diff/14/

Changes: https://reviews.apache.org/r/68147/diff/13-14/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-15 Thread Chun-Hung Hsiao


> On Aug. 17, 2018, 10:29 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > 
> >
> > Let's validate that there is no task using the resources provided by 
> > this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
> I believe this should be treated as orthogonal issue. It might e.g., 
> happen that the user triggering the RP removal cannot kill tasks or prevent 
> them from being rescheduled. I'd suggest to leaves as is.
> 
> WDYT?
> 
> Chun-Hung Hsiao wrote:
> The scenario I am worried about is that when the RP is marked as gone, 
> the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
> We don't seem to have a good story on incompatible resource changes ATM. 
> If we e.g., update a RP config in an incompatible way, we make not attempt to 
> detect or rectify this in any way, either. At least in the agent we were in 
> general we were operating under the assumption that tasks would die on 
> incompatible changes to their resources.
> 
> I'd suggest to drop this issue for now and potentially file a JIRA to 
> track draining and situations where it might be needed.
> 
> Benjamin Bannier wrote:
> Dropping this, see above comment.
> 
> Chun-Hung Hsiao wrote:
> Reopening this. Here's my suggestion:
> 
> 1. Create a ticket for a proper design to drain the resource provider.
> 2. For now, let's fail the call here if 
> `slave->resourceProviders.at(resourceProviderId)->totalResources` is not 
> empty. In other words, the operator must first reconfigure the resource 
> provider to "drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` 
> call.
> 
> Benjamin Bannier wrote:
> Okay.
> 
> 1. I created https://issues.apache.org/jira/browse/MESOS-9522.
> 2. We had discussed this some time ago, and I think agreed that 
> preventing operators from marking resource providers as gone whenever a task 
> was running on them was not good (especially if operators cannot drain the 
> RP). We have prior art in `MARK_AGENT_GONE` which asks operators to drain 
> agent before marking as gone. I would suggest for now with MESOS-9522 not 
> implemented we allow operators to mark resource providers as gone even if 
> some of their resources are in use; if needed they can drain them by 
> filtering all tasks using their resources. This also wouldn't expose a race.
> 
> Does this address your concerns?

I agreed that checking if there is any task running on resources of an RP when 
marking it as gone would lead to some complexity, so I'm suggesting to use a 
stronger condition: there must be no resource.

I'm not convinced that asking the operator to drain the RP is a bad idea. My 
main concern is that, since `MARK_RESOURCE_PROVIDER_GONE` cannot be undone, we 
should only perform it when it is safe to do, so the operator can have a chance 
to fix the problem. Also, this restriction would ensure that no framework will 
have an inconsistent view of cluster resources. WDYT?


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---


On Jan. 15, 2019, 3:03 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 15, 2019, 3:03 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 79a2d2966ff6533629aeeefaaaf19e849b76e0e6 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/13/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-15 Thread Chun-Hung Hsiao


> On Jan. 15, 2019, 5:53 a.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7857 (patched)
> > 
> >
> > ```
> > Owned detector = master.get()->createDetector();
> > ```
> > And
> > ```
> > Try> slave = StartSlave(detector.get(), 
> > slavFlags);
> > ```
> > below.
> 
> Benjamin Bannier wrote:
> Could you explain the pattern we follow? I mostly went for a 
> StandaloneMasterDetector instead of an Owned because of the 
> conceptually simpler type (value vs pointer), but we seem to use both here.

I'm in favor of abstract types (e.g., `MasterDetector`) over concrete subtypes 
(e.g., `StandaloneMasterDetector`). Please feel free to drop this if you prefer 
avoiding the pointer type :)


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review211997
---


On Jan. 15, 2019, 3:03 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 15, 2019, 3:03 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 79a2d2966ff6533629aeeefaaaf19e849b76e0e6 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/13/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-15 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Jan. 15, 2019, 4:03 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
Schlicht.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
  src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
  src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
  src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
  src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
  src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 79a2d2966ff6533629aeeefaaaf19e849b76e0e6 


Diff: https://reviews.apache.org/r/68147/diff/13/

Changes: https://reviews.apache.org/r/68147/diff/12-13/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-15 Thread Benjamin Bannier


> On Jan. 15, 2019, 6:42 a.m., Chun-Hung Hsiao wrote:
> > src/master/master.cpp
> > Lines 11248 (patched)
> > 
> >
> > Can you add a comment explaining why it is not necessary to call 
> > `allocator->recoverResources(...)` here, and that the 
> > `slave->recoverResources(...)` and `framework->recoverResources(...)` below 
> > are unrelated and thus the bookkeeping would still be consistent even if 
> > `allocator->recoverResources(...)` is not called?

I moved into a block together with the other failed, terminal status since this 
makes immediate sense. I also updated the test to assert that the master sees 
the status update before the `UpdateSlaveMessage` as the agent only forwards 
status updates for operations it knows and the resource accounting in the 
master is only safe for that case.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review211993
---


On Jan. 15, 2019, 3:53 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 15, 2019, 3:53 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-15 Thread Benjamin Bannier


> On Jan. 15, 2019, 6:53 a.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7857 (patched)
> > 
> >
> > ```
> > Owned detector = master.get()->createDetector();
> > ```
> > And
> > ```
> > Try> slave = StartSlave(detector.get(), 
> > slavFlags);
> > ```
> > below.

Could you explain the pattern we follow? I mostly went for a 
StandaloneMasterDetector instead of an Owned because of the 
conceptually simpler type (value vs pointer), but we seem to use both here.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review211997
---


On Jan. 15, 2019, 3:53 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 15, 2019, 3:53 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-15 Thread Benjamin Bannier


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > 
> >
> > Let's validate that there is no task using the resources provided by 
> > this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
> I believe this should be treated as orthogonal issue. It might e.g., 
> happen that the user triggering the RP removal cannot kill tasks or prevent 
> them from being rescheduled. I'd suggest to leaves as is.
> 
> WDYT?
> 
> Chun-Hung Hsiao wrote:
> The scenario I am worried about is that when the RP is marked as gone, 
> the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
> We don't seem to have a good story on incompatible resource changes ATM. 
> If we e.g., update a RP config in an incompatible way, we make not attempt to 
> detect or rectify this in any way, either. At least in the agent we were in 
> general we were operating under the assumption that tasks would die on 
> incompatible changes to their resources.
> 
> I'd suggest to drop this issue for now and potentially file a JIRA to 
> track draining and situations where it might be needed.
> 
> Benjamin Bannier wrote:
> Dropping this, see above comment.
> 
> Chun-Hung Hsiao wrote:
> Reopening this. Here's my suggestion:
> 
> 1. Create a ticket for a proper design to drain the resource provider.
> 2. For now, let's fail the call here if 
> `slave->resourceProviders.at(resourceProviderId)->totalResources` is not 
> empty. In other words, the operator must first reconfigure the resource 
> provider to "drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` 
> call.

Okay.

1. I created https://issues.apache.org/jira/browse/MESOS-9522.
2. We had discussed this some time ago, and I think agreed that preventing 
operators from marking resource providers as gone whenever a task was running 
on them was not good (especially if operators cannot drain the RP). We have 
prior art in `MARK_AGENT_GONE` which asks operators to drain agent before 
marking as gone. I would suggest for now with MESOS-9522 not implemented we 
allow operators to mark resource providers as gone even if some of their 
resources are in use; if needed they can drain them by filtering all tasks 
using their resources. This also wouldn't expose a race.

Does this address your concerns?


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---


On Jan. 15, 2019, 3:53 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Jan. 15, 2019, 3:53 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-15 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Jan. 15, 2019, 3:53 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
Schlicht.


Changes
---

Addressed comments from Chun.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
  src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
  src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
  src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
  src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
  src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 


Diff: https://reviews.apache.org/r/68147/diff/12/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-14 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review211997
---




src/tests/api_tests.cpp
Lines 7857 (patched)


```
Owned detector = master.get()->createDetector();
```
And
```
Try> slave = StartSlave(detector.get(), slavFlags);
```
below.


- Chun-Hung Hsiao


On Oct. 25, 2018, 10:47 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Oct. 25, 2018, 10:47 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-14 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review211993
---




src/master/master.cpp
Lines 11248 (patched)


Can you add a comment explaining why it is not necessary to call 
`allocator->recoverResources(...)` here, and that the 
`slave->recoverResources(...)` and `framework->recoverResources(...)` below are 
unrelated and thus the bookkeeping would still be consistent even if 
`allocator->recoverResources(...)` is not called?



src/slave/http.cpp
Lines 3310 (patched)


Please add the following log line here:
```
LOG(INFO)
  << "Processing MARK_RESOURCE_PROVIDER_GONE for resource provider "
  << resourceProviderId;
```



src/tests/api_tests.cpp
Lines 7877 (patched)


This is not necessary since the `AWAIT_READY` below enforces the proper 
synchronization.



src/tests/api_tests.cpp
Lines 7905 (patched)


`std::move(endpointDetector)`


- Chun-Hung Hsiao


On Oct. 25, 2018, 10:47 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Oct. 25, 2018, 10:47 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2019-01-14 Thread Chun-Hung Hsiao


> On Aug. 17, 2018, 10:29 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > 
> >
> > Let's validate that there is no task using the resources provided by 
> > this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
> I believe this should be treated as orthogonal issue. It might e.g., 
> happen that the user triggering the RP removal cannot kill tasks or prevent 
> them from being rescheduled. I'd suggest to leaves as is.
> 
> WDYT?
> 
> Chun-Hung Hsiao wrote:
> The scenario I am worried about is that when the RP is marked as gone, 
> the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
> We don't seem to have a good story on incompatible resource changes ATM. 
> If we e.g., update a RP config in an incompatible way, we make not attempt to 
> detect or rectify this in any way, either. At least in the agent we were in 
> general we were operating under the assumption that tasks would die on 
> incompatible changes to their resources.
> 
> I'd suggest to drop this issue for now and potentially file a JIRA to 
> track draining and situations where it might be needed.
> 
> Benjamin Bannier wrote:
> Dropping this, see above comment.

Reopening this. Here's my suggestion:

1. Create a ticket for a proper design to drain the resource provider.
2. For now, let's fail the call here if 
`slave->resourceProviders.at(resourceProviderId)->totalResources` is not empty. 
In other words, the operator must first reconfigure the resource provider to 
"drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` call.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---


On Oct. 25, 2018, 10:47 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Oct. 25, 2018, 10:47 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-12-17 Thread Benjamin Bannier


> On Dec. 1, 2018, 4:32 a.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Lines 8209-8210 (patched)
> > 
> >
> > Some high-level feedback after my first glance at this patch: for the 
> > MARK_AGENT_GONE call, the master checkpoints a list of gone agents so that 
> > it can prevent agents in that list from reregistering in the future. While 
> > the size of this list is constrained, and thus it is possible that a gone 
> > agent can reregister in the future if it has been GC'd from the list, this 
> > still allows the master to provide a best-effort guarantee that such agents 
> > will not return.
> > 
> > Should the agent checkpoint a list of gone RPs so that it can attempt 
> > to provide the same guarantee?

In the current design this would happen not in the agent, but the RP manager. 
With this patch we already check if resubscribing RPs do not use a gone RP ID. 
Limiting the size of that list is currently not implemented.

Dropping.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review210994
---


On Oct. 25, 2018, 12:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Oct. 25, 2018, 12:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-11-30 Thread Greg Mann

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review210994
---




src/slave/slave.cpp
Lines 8209-8210 (patched)


Some high-level feedback after my first glance at this patch: for the 
MARK_AGENT_GONE call, the master checkpoints a list of gone agents so that it 
can prevent agents in that list from reregistering in the future. While the 
size of this list is constrained, and thus it is possible that a gone agent can 
reregister in the future if it has been GC'd from the list, this still allows 
the master to provide a best-effort guarantee that such agents will not return.

Should the agent checkpoint a list of gone RPs so that it can attempt to 
provide the same guarantee?


- Greg Mann


On Oct. 25, 2018, 10:47 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Oct. 25, 2018, 10:47 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-10-31 Thread Benjamin Bannier


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 7934 (patched)
> > 
> >
> > `OPERATION_GONE_BY_OPERATOR` is not a terminal state:
> > 
> > https://github.com/apache/mesos/blob/808485da01387bd27a51cb82a90b1f8301d613ee/include/mesos/mesos.proto#L2342-L2349
> 
> Benjamin Bannier wrote:
> From https://reviews.apache.org/r/68410/#comment_rc290945-207560,
> 
> > ... It seems that OPERATOR_GONE_BY_OPERATOR does not carry the 
> semantics we want. Looking at the other possible error codes, it seems we 
> could either use OPERATION_ERROR or OPERATION_FAILED instead where 
> OPERATION_ERROR possibly better fits our requirements as the operation's side 
> effects might have materialized. That would mean we should drop this patch 
> completely.  
> > WDYT?
> 
> Chun-Hung Hsiao wrote:
> I'm not sure if either is a good choice:
> 
> `OPERATION_ERROR` usually means some validation failure before applying 
> the operation, the consumed resource remains (i.e., can be used by other 
> operations), but the operation is non-retryable (i.e., the caller must fix 
> the error).
> `OPERATION_FAILED` usually means something goes wrong after applying the 
> operation, the consumed resource remains, and the operation is retryable.
> 
> In the case of mark RP gone, although we can argue that any operation 
> becomes an non-retryable one and we don't care about the consumed resoure 
> because it will then be removed. There still are some issues:
> 
> 1. The reason of the operation being non-retryable is not due to any 
> client-side error, but a server-side "error," and this seems inconsistent to 
> the current `OPERATION_ERROR` usage.
> 2. We transition an operation to `OPERATION_GONE_BY_OPERATOR` when 
> marking an agent as gone, but to `OPERATION_ERROR` when marking a local RP as 
> gone. This sounds inconsistent to me from an API perspective.
> 3. For `OPERATION_ERROR` and `OPERATION_FAILED`, Mesos provides reliable 
> retry (at least under "normal" circumstances), and expects acks. For 
> `OPERATION_GONE_BY_OPERATOR`, we could just do a best-effort notification 
> (the implementation is missing though) and don't need any ack, similar to 
> `TASK_GONE_BY_OPERATOR`. In the case of mark RP as gone, we probably want to 
> go with the later.
> 
> So it seems better if we still go with `OPERATION_GONE_BY_OPERATOR` and 
> make the proper master changes. And we need to make sure that we call 
> `allocator->updateSlave` before calling `allocator->recoverResources` to 
> avoid the consumed resource being offered to other framework in between.

I added changes to use `OPERATION_GONE_BY_OPERATOR` here, and to treat it as 
terminal.


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > 
> >
> > Let's validate that there is no task using the resources provided by 
> > this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
> I believe this should be treated as orthogonal issue. It might e.g., 
> happen that the user triggering the RP removal cannot kill tasks or prevent 
> them from being rescheduled. I'd suggest to leaves as is.
> 
> WDYT?
> 
> Chun-Hung Hsiao wrote:
> The scenario I am worried about is that when the RP is marked as gone, 
> the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
> We don't seem to have a good story on incompatible resource changes ATM. 
> If we e.g., update a RP config in an incompatible way, we make not attempt to 
> detect or rectify this in any way, either. At least in the agent we were in 
> general we were operating under the assumption that tasks would die on 
> incompatible changes to their resources.
> 
> I'd suggest to drop this issue for now and potentially file a JIRA to 
> track draining and situations where it might be needed.

Dropping this, see above comment.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---


On Oct. 25, 2018, 12:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Oct. 25, 2018, 12:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds 

Re: Review Request 68147: Added agent support to remove local resource providers.

2018-10-25 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Oct. 25, 2018, 12:47 p.m.)


Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
Schlicht.


Changes
---

Rebased; address comments from Chun.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/common/protobuf_utils.cpp 77139d8a3931dc1e9e00fcea83d77d96244a34f3 
  src/master/master.cpp 0c95c438975efd949cbf86f7d8bfea940c20a43a 
  src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
  src/slave/http.cpp a4db532cc6477c386fcd9bf563895214e95a475a 
  src/slave/slave.hpp 0bd340176e2a8cefdfa7ef71e059441fb171aff6 
  src/slave/slave.cpp 7bb2b291a7b67563adf88bcedca1392890f17ffe 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp c681b9b0b83f7472312083730a5433e3d6f0efc0 


Diff: https://reviews.apache.org/r/68147/diff/9/

Changes: https://reviews.apache.org/r/68147/diff/8-9/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-10-17 Thread Benjamin Bannier


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > 
> >
> > Let's validate that there is no task using the resources provided by 
> > this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
> I believe this should be treated as orthogonal issue. It might e.g., 
> happen that the user triggering the RP removal cannot kill tasks or prevent 
> them from being rescheduled. I'd suggest to leaves as is.
> 
> WDYT?
> 
> Chun-Hung Hsiao wrote:
> The scenario I am worried about is that when the RP is marked as gone, 
> the RP may fail to tear itself down if there is any task using its resources.

We don't seem to have a good story on incompatible resource changes ATM. If we 
e.g., update a RP config in an incompatible way, we make not attempt to detect 
or rectify this in any way, either. At least in the agent we were in general we 
were operating under the assumption that tasks would die on incompatible 
changes to their resources.

I'd suggest to drop this issue for now and potentially file a JIRA to track 
draining and situations where it might be needed.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---


On Aug. 20, 2018, 8:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 20, 2018, 8:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp b1d695b59a63a5af34b19faff16bf6c82b7e7d88 
>   src/slave/slave.cpp e6c7e686f287fb4448a0074d4e99298665fc866d 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp ee82350f7a6c6d44ba0590608e7d9a223c0be169 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/8/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-27 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207982
---



Patch looks great!

Reviews applied: [68407, 68143, 68144, 68146, 68362, 68145, 68147]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Aug. 20, 2018, 6:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 20, 2018, 6:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp b1d695b59a63a5af34b19faff16bf6c82b7e7d88 
>   src/slave/slave.cpp e6c7e686f287fb4448a0074d4e99298665fc866d 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp ee82350f7a6c6d44ba0590608e7d9a223c0be169 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/8/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-27 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207979
---



PASS: Mesos patch 68147 was successfully built and tested.

Reviews applied: `['68407', '68143', '68144', '68146', '68362', '68145', 
'68147']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2242/mesos-review-68147

- Mesos Reviewbot Windows


On Aug. 20, 2018, 6:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 20, 2018, 6:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp b1d695b59a63a5af34b19faff16bf6c82b7e7d88 
>   src/slave/slave.cpp e6c7e686f287fb4448a0074d4e99298665fc866d 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp ee82350f7a6c6d44ba0590608e7d9a223c0be169 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/8/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-20 Thread Chun-Hung Hsiao


> On Aug. 17, 2018, 10:29 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 7934 (patched)
> > 
> >
> > `OPERATION_GONE_BY_OPERATOR` is not a terminal state:
> > 
> > https://github.com/apache/mesos/blob/808485da01387bd27a51cb82a90b1f8301d613ee/include/mesos/mesos.proto#L2342-L2349
> 
> Benjamin Bannier wrote:
> From https://reviews.apache.org/r/68410/#comment_rc290945-207560,
> 
> > ... It seems that OPERATOR_GONE_BY_OPERATOR does not carry the 
> semantics we want. Looking at the other possible error codes, it seems we 
> could either use OPERATION_ERROR or OPERATION_FAILED instead where 
> OPERATION_ERROR possibly better fits our requirements as the operation's side 
> effects might have materialized. That would mean we should drop this patch 
> completely.  
> > WDYT?

I'm not sure if either is a good choice:

`OPERATION_ERROR` usually means some validation failure before applying the 
operation, the consumed resource remains (i.e., can be used by other 
operations), but the operation is non-retryable (i.e., the caller must fix the 
error).
`OPERATION_FAILED` usually means something goes wrong after applying the 
operation, the consumed resource remains, and the operation is retryable.

In the case of mark RP gone, although we can argue that any operation becomes 
an non-retryable one and we don't care about the consumed resoure because it 
will then be removed. There still are some issues:

1. The reason of the operation being non-retryable is not due to any 
client-side error, but a server-side "error," and this seems inconsistent to 
the current `OPERATION_ERROR` usage.
2. We transition an operation to `OPERATION_GONE_BY_OPERATOR` when marking an 
agent as gone, but to `OPERATION_ERROR` when marking a local RP as gone. This 
sounds inconsistent to me from an API perspective.
3. For `OPERATION_ERROR` and `OPERATION_FAILED`, Mesos provides reliable retry 
(at least under "normal" circumstances), and expects acks. For 
`OPERATION_GONE_BY_OPERATOR`, we could just do a best-effort notification (the 
implementation is missing though) and don't need any ack, similar to 
`TASK_GONE_BY_OPERATOR`. In the case of mark RP as gone, we probably want to go 
with the later.

So it seems better if we still go with `OPERATION_GONE_BY_OPERATOR` and make 
the proper master changes. And we need to make sure that we call 
`allocator->updateSlave` before calling `allocator->recoverResources` to avoid 
the consumed resource being offered to other framework in between.


> On Aug. 17, 2018, 10:29 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > 
> >
> > Let's validate that there is no task using the resources provided by 
> > this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
> I believe this should be treated as orthogonal issue. It might e.g., 
> happen that the user triggering the RP removal cannot kill tasks or prevent 
> them from being rescheduled. I'd suggest to leaves as is.
> 
> WDYT?

The scenario I am worried about is that when the RP is marked as gone, the RP 
may fail to tear itself down if there is any task using its resources.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---


On Aug. 20, 2018, 6:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 20, 2018, 6:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan 
> Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 0420109ac93e1249906c52437e5859c5ee033fb6 
>   src/slave/slave.cpp 679394a549cdfe84d64a102164c8652ad96f1eb2 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/7/
> 
> 
> Testing
> ---
> 
> `make check`
> 

Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-20 Thread Benjamin Bannier


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 7934 (patched)
> > 
> >
> > `OPERATION_GONE_BY_OPERATOR` is not a terminal state:
> > 
> > https://github.com/apache/mesos/blob/808485da01387bd27a51cb82a90b1f8301d613ee/include/mesos/mesos.proto#L2342-L2349

>From https://reviews.apache.org/r/68410/#comment_rc290945-207560,

> ... It seems that OPERATOR_GONE_BY_OPERATOR does not carry the semantics we 
> want. Looking at the other possible error codes, it seems we could either use 
> OPERATION_ERROR or OPERATION_FAILED instead where OPERATION_ERROR possibly 
> better fits our requirements as the operation's side effects might have 
> materialized. That would mean we should drop this patch completely.  
> WDYT?


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > 
> >
> > Let's validate that there is no task using the resources provided by 
> > this RP before doing the removal, or fail the call.

I believe this should be treated as orthogonal issue. It might e.g., happen 
that the user triggering the RP removal cannot kill tasks or prevent them from 
being rescheduled. I'd suggest to leaves as is.

WDYT?


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---


On Aug. 20, 2018, 11:52 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 20, 2018, 11:52 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 0420109ac93e1249906c52437e5859c5ee033fb6 
>   src/slave/slave.cpp 679394a549cdfe84d64a102164c8652ad96f1eb2 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/7/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-20 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Aug. 20, 2018, 11:52 a.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Changes
---

Use `OPERATION_ERROR` instead of `OPERATION_GONE_BY_OPERATOR`.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
  src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
  src/slave/slave.hpp 0420109ac93e1249906c52437e5859c5ee033fb6 
  src/slave/slave.cpp 679394a549cdfe84d64a102164c8652ad96f1eb2 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 


Diff: https://reviews.apache.org/r/68147/diff/7/

Changes: https://reviews.apache.org/r/68147/diff/6-7/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-17 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
---




src/slave/slave.cpp
Lines 7934 (patched)


`OPERATION_GONE_BY_OPERATOR` is not a terminal state:

https://github.com/apache/mesos/blob/808485da01387bd27a51cb82a90b1f8301d613ee/include/mesos/mesos.proto#L2342-L2349



src/slave/slave.cpp
Lines 8195 (patched)


Let's validate that there is no task using the resources provided by this 
RP before doing the removal, or fail the call.


- Chun-Hung Hsiao


On Aug. 17, 2018, 1:54 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 17, 2018, 1:54 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/6/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-17 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Aug. 17, 2018, 3:54 p.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
  src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
  src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
  src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 


Diff: https://reviews.apache.org/r/68147/diff/6/

Changes: https://reviews.apache.org/r/68147/diff/5-6/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-17 Thread Benjamin Bannier


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/http.cpp
> > Lines 3356 (patched)
> > 
> >
> > We should return an `InternalServerError` on failed/discarded.
> 
> Benjamin Bannier wrote:
> I don't think this is required as a failed `Future` is automatically 
> mapped onto an `InternalServerError`.
> 
> Chun-Hung Hsiao wrote:
> You are right about this. Which one is more appropriate in your opinion: 
> `InternalServerError` or `ServiceUnavailable`? Feel free to drop this if you 
> prefer the former.

I prefer `InternalServerError` as this is very likely a non-transient error.


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8107-8108 (patched)
> > 
> >
> > `"Resource provider manager is not initialized"`
> 
> Benjamin Bannier wrote:
> Wouldn't that be leaking internal information? I tried to map this onto 
> an operator-relevant issue here.
> 
> Chun-Hung Hsiao wrote:
> How about `"Agent has not registered yet"`?

Okay.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207351
---


On Aug. 16, 2018, 4:31 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 16, 2018, 4:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-17 Thread Benjamin Bannier


> On Aug. 17, 2018, 1:27 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 7934 (patched)
> > 
> >
> > This line combined with Line 8079 will crash the agent.

Ouch, added https://reviews.apache.org/r/68410.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207450
---


On Aug. 16, 2018, 4:31 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 16, 2018, 4:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-16 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207450
---




src/slave/slave.cpp
Lines 7934 (patched)


This line combined with Line 8079 will crash the agent.


- Chun-Hung Hsiao


On Aug. 16, 2018, 2:31 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 16, 2018, 2:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-16 Thread Chun-Hung Hsiao


> On Aug. 15, 2018, 8:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/http.cpp
> > Lines 3356 (patched)
> > 
> >
> > We should return an `InternalServerError` on failed/discarded.
> 
> Benjamin Bannier wrote:
> I don't think this is required as a failed `Future` is automatically 
> mapped onto an `InternalServerError`.

You are right about this. Which one is more appropriate in your opinion: 
`InternalServerError` or `ServiceUnavailable`? Feel free to drop this if you 
prefer the former.


> On Aug. 15, 2018, 8:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8107-8108 (patched)
> > 
> >
> > `"Resource provider manager is not initialized"`
> 
> Benjamin Bannier wrote:
> Wouldn't that be leaking internal information? I tried to map this onto 
> an operator-relevant issue here.

How about `"Agent has not registered yet"`?


> On Aug. 15, 2018, 8:54 p.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7171-7179 (patched)
> > 
> >
> > ```
> > Future v1Response =
> >   post(slave.get()->pid, v1Call, contentType, DEFAULT_CREDENTIAL_2);
> > ```
> 
> Benjamin Bannier wrote:
> I would like to explicitly test the return HTTP status code, and using 
> the `post` helper just maps a `Forbidden` on a `Failure`. Leave as is?

Make sense. Dropping this. Maybe we could slightly modify the helper in the 
future.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207351
---


On Aug. 16, 2018, 2:31 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 16, 2018, 2:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-16 Thread Benjamin Bannier


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/http.cpp
> > Lines 3356 (patched)
> > 
> >
> > We should return an `InternalServerError` on failed/discarded.

I don't think this is required as a failed `Future` is automatically mapped 
onto an `InternalServerError`.


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8107-8108 (patched)
> > 
> >
> > `"Resource provider manager is not initialized"`

Wouldn't that be leaking internal information? I tried to map this onto an 
operator-relevant issue here.


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7171-7179 (patched)
> > 
> >
> > ```
> > Future v1Response =
> >   post(slave.get()->pid, v1Call, contentType, DEFAULT_CREDENTIAL_2);
> > ```

I would like to explicitly test the return HTTP status code, and using the 
`post` helper just maps a `Forbidden` on a `Failure`. Leave as is?


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7183-7191 (patched)
> > 
> >
> > ```
> > v1Response = post(slave.get()->pid, v1Call, contentType);
> > ```

Ditto.


- Benjamin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207351
---


On Aug. 16, 2018, 4:31 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 16, 2018, 4:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/resource_provider/manager.cpp abd7e38e5517ea600f9fc9b8a96c7d0d26df0620 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-16 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Aug. 16, 2018, 4:31 p.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Changes
---

Improved handling of removed resource providers in agent; rebased.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/resource_provider/manager.cpp abd7e38e5517ea600f9fc9b8a96c7d0d26df0620 
  src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
  src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
  src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
  src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 


Diff: https://reviews.apache.org/r/68147/diff/4/

Changes: https://reviews.apache.org/r/68147/diff/3-4/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-15 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207351
---




src/slave/http.cpp
Lines 3356 (patched)


We should return an `InternalServerError` on failed/discarded.



src/slave/slave.cpp
Lines 8107-8108 (patched)


`"Resource provider manager is not initialized"`



src/tests/api_tests.cpp
Lines 7171-7179 (patched)


```
Future v1Response =
  post(slave.get()->pid, v1Call, contentType, DEFAULT_CREDENTIAL_2);
```



src/tests/api_tests.cpp
Lines 7183-7191 (patched)


```
v1Response = post(slave.get()->pid, v1Call, contentType);
```


- Chun-Hung Hsiao


On Aug. 15, 2018, 1:53 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 15, 2018, 1:53 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-15 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

(Updated Aug. 15, 2018, 3:53 p.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Changes
---

Addressed issues raised by Chun.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
  src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
  src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
  src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 


Diff: https://reviews.apache.org/r/68147/diff/3/

Changes: https://reviews.apache.org/r/68147/diff/2-3/


Testing
---

`make check`


Thanks,

Benjamin Bannier



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-15 Thread Mesos Reviewbot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207312
---



Patch looks great!

Reviews applied: [68143, 68144, 68145, 68146, 68147]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' 
CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 
MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Aug. 1, 2018, 11:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 1, 2018, 11:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-14 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207277
---




include/mesos/agent/agent.proto
Lines 100 (patched)


Would `MARK_RESOURCE_PROVIDER_GONE` be a more consistent (w.r.t. the 
`MARK_AGENT_GONE` master API) and less confusing name?


- Chun-Hung Hsiao


On Aug. 1, 2018, 3:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 1, 2018, 3:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-02 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review206789
---



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['68143', '68144', '68145', '68146', '68147']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2028/mesos-review-68147

Relevant logs:

- 
[mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2028/mesos-review-68147/logs/mesos-tests-stdout.log):

```
[--] 9 tests from Endpoint/SlaveEndpointTest (980 ms total)

[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (31 
ms)
[ RUN  ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[   OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (36 
ms)
[--] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (70 ms 
total)

[--] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN  ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[   OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (1166 ms)
[--] 1 test from IsolationFlag/CpuIsolatorTest (1187 ms total)

[--] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN  ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[   OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (1891 ms)
[--] 1 test from IsolationFlag/MemoryIsolatorTest (1914 ms total)

[--] Global test environment tear-down
[==] 1017 tests from 98 test cases ran. (737321 ms total)
[  PASSED  ] 1015 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] DockerTest.ROOT_DOCKER_interface
[  FAILED  ] DockerTest.ROOT_DOCKER_kill

 2 FAILED TESTS
  YOU HAVE 222 DISABLED TESTS

```

- 
[mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2028/mesos-review-68147/logs/mesos-tests-stderr.log):

```
I0802 11:12:05.241119 12580 slave.cpp:3939] Shutting down framework 
b05da931-60cd-4f01-95c3-fde8eb96c9e7-
I0802 11:12:05.241119  1596 master.cpp:10963] Updating the state of task 
ea44f51e-006c-4f2f-95a7-c1de36c6229f of framework 
b05da931-60cd-4f01-95c3-fde8eb96c9e7- (latest state: TASK_KILLED, status 
update state: TASK_KILLED)
I0802 11:12:05.241119 12580 slave.cpp:6658] Shutting down executor 
'ea44f51e-006c-4f2f-95a7-c1de36c6229f' of framework b05da931-60cd-I0802 
11:12:04.184103 11216 exec.cpp:162] Version: 1.7.0
I0802 11:12:04.209116 12352 exec.cpp:236] Executor registered on agent 
b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0
I0802 11:12:04.214114 11908 executor.cpp:182] Received SUBSCRIBED event
I0802 11:12:04.218117 11908 executor.cpp:186] Subscribed executor on 
windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net
I0802 11:12:04.219120 11908 executor.cpp:182] Received LAUNCH event
I0802 11:12:04.223115 11908 executor.cpp:679] Starting task 
ea44f51e-006c-4f2f-95a7-c1de36c6229f
I0802 11:12:04.308123 11908 executor.cpp:499] Running 
'D:\DCOS\mesos\src\mesos-containerizer.exe launch '
I0802 11:12:05.215116 11908 executor.cpp:693] Forked command at 14264
I0802 11:12:05.244125  9860 exec.cpp:445] Executor asked to shutdown
I0802 11:12:05.245126 11908 executor.cpp:182] Received SHUTDOWN event
I0802 11:12:05.245126 11908 executor.cpp:796] Shutting down
I0802 11:12:05.245126 11908 executor.cpp:909] Sending SIGTERM to process tree 
at pid 144f01-95c3-fde8eb96c9e7- at executor(1)@192.10.1.6:63036
I0802 11:12:05.243126  1596 master.cpp:11061] Removing task 
ea44f51e-006c-4f2f-95a7-c1de36c6229f with resources cpus(allocated: *):4; 
mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: 
*):[31000-32000] of framework b05da931-60cd-4f01-95c3-fde8eb96c9e7- on 
agent b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 at slave(464)@192.10.1.6:61259 
(windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0802 11:12:05.243126 12580 slave.cpp:931] Agent terminating
W0802 11:12:05.244125 12580 slave.cpp:3935] Ignoring shutdown framework 
b05da931-60cd-4f01-95c3-fde8eb96c9e7- because it is terminating
I0802 11:12:05.246127  1396 master.cpp:1338] Agent 
b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 at slave(464)@192.10.1.6:61259 
(windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net) disconnected
I0802 11:12:05.246127  1396 master.cpp:3354] Disconnecting agent 
b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 at slave(464)@192.10.1.6:61259 
(windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0802 11:12:05.247131  1396 master.cpp:3373] Deactivating agent 
b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 at slave(464)@192.10.1.6:61259 

Re: Review Request 68147: Added agent support to remove local resource providers.

2018-08-01 Thread Mesos Reviewbot Windows

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review206753
---



PASS: Mesos patch 68147 was successfully built and tested.

Reviews applied: `['68143', '68144', '68145', '68146', '68147']`

All the build artifacts available at: 
http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2021/mesos-review-68147

- Mesos Reviewbot Windows


On Aug. 1, 2018, 3:47 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> ---
> 
> (Updated Aug. 1, 2018, 3:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
> https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
>   src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
>   src/slave/http.cpp ab5864d9fd2fde478ed7da2ca7ed8abedc72c7c5 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 182622a62d350ebefc891a385de3f2d35a7c0243 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Review Request 68147: Added agent support to remove local resource providers.

2018-08-01 Thread Benjamin Bannier

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
---

Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Bugs: MESOS-8403
https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
---

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs
-

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f2159367c66ed35ee90aea07f 
  src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
  src/slave/http.cpp ab5864d9fd2fde478ed7da2ca7ed8abedc72c7c5 
  src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
  src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 182622a62d350ebefc891a385de3f2d35a7c0243 


Diff: https://reviews.apache.org/r/68147/diff/1/


Testing
---

`make check`


Thanks,

Benjamin Bannier