Re: Review Request 71080: Master should store the list of completed framework ids for lifecycle.

2019-07-18 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71080/#review216746
---


Fix it, then Ship it!




Gave it a ship it. HoweverI'm not touching the master that much these days, so 
it would be nice to get a review from Greg/Benno/Joseph =).


docs/configuration/master.md
Lines 447 (patched)
<https://reviews.apache.org/r/71080/#comment303976>

Nit: s/ids/IDs/



src/master/flags.cpp
Lines 599-601 (patched)
<https://reviews.apache.org/r/71080/#comment303978>

Nit: s/id/ID/g



src/master/master.hpp
Lines 2215 (patched)
<https://reviews.apache.org/r/71080/#comment303980>

We only care about the key (framework ID), so using a bounded hashmap feels 
a bit wasteful. However stout doesn't provide other bounded data structures and 
I don't think that we should add a new one just for this.



src/master/master.cpp
Lines 11402 (patched)
<https://reviews.apache.org/r/71080/#comment303979>

Nit: s/ids/IDs/


- Gastón Kleiman


On July 17, 2019, 11:58 a.m., Xudong Ni wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71080/
> ---
> 
> (Updated July 17, 2019, 11:58 a.m.)
> 
> 
> Review request for mesos, James Peach and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-8636
> https://issues.apache.org/jira/browse/MESOS-8636
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> It should be separately from that for webUI and endpoints.
> Currently the master stores the history of completed frameworks in
> a map with the full historical data of the framework, it is
> prohibitively expensive to keep a long history; In order to reject
> frameworks from reregistering if they have previously marked as
> completed, we only need to persist the framework ids and are able
> to keep long history.
> 
> 
> Diffs
> -
> 
>   docs/configuration/master.md c56ac8510ea968f9587e23e81ed310caa968ee9e 
>   docs/operator-http-api.md 2d4a9b66e20cf19eceec718b7de3d812ab285772 
>   include/mesos/allocator/allocator.hpp 
> 2bab53ab5fb25931a724c20a039e1301983ba574 
>   src/master/flags.hpp 78623d68bf428cd3f52684303d98a525d42eb878 
>   src/master/flags.cpp 74f4daadd48e8e691be43759b88dc8b3c2df489a 
>   src/master/http.cpp 765d5052ab6a8d731113f8922d20fb280b843003 
>   src/master/master.hpp ffa7423ba533725f7c1123d9aa507b1348e7f281 
>   src/master/master.cpp f1ca637b4cb0382caec53b5a81f6a4eb46f4dd2d 
> 
> 
> Diff: https://reviews.apache.org/r/71080/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> sudo GLOG_v=1 ./bin/mesos-tests.sh --verbose 
> --gtest_filter=MasterTest.MaxCompletedFrameworksFlag --gtest_break_on_failure 
> --gtest_repeat=1000
> [   OK ] MasterTest.MaxCompletedFrameworksFlag (230 ms)
> [--] 1 test from MasterTest (235 ms total)
> 
> [--] Global test environment tear-down
> [==] 1 test from 1 test case ran. (281 ms total)
> [  PASSED  ] 1 test.
> 
> 
> Thanks,
> 
> Xudong Ni
> 
>



Re: Review Request 70528: Updated release guide.

2019-07-10 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70528/#review216515
---


Fix it, then Ship it!





docs/release-guide.md
Lines 69 (patched)
<https://reviews.apache.org/r/70528/#comment303732>

Super minor nit: s/  / / =)


- Gastón Kleiman


On July 10, 2019, 7:58 a.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70528/
> ---
> 
> (Updated July 10, 2019, 7:58 a.m.)
> 
> 
> Review request for mesos, Greg Mann, Joseph Wu, and Till Toenshoff.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated the release guide in `docs/release-guide.md` to match
> the de-facto process we're using to release new versions.
> 
> Also mentioned some additional third-party tooling to the post-release
> desription.
> 
> 
> Diffs
> -
> 
>   docs/release-guide.md a3ad2668a1953a7f20dd7209e122481ad8b30f17 
> 
> 
> Diff: https://reviews.apache.org/r/70528/diff/3/
> 
> 
> Testing
> ---
> 
> Released Mesos 1.8.0 according to this procedure.
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 70587: Fixed unguarded calls to `Option::get()` in the master.

2019-05-03 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70587/#review215039
---


Ship it!




Ship It!

- Gastón Kleiman


On May 2, 2019, 3 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70587/
> ---
> 
> (Updated May 2, 2019, 3 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, Joseph Wu, and Till 
> Toenshoff.
> 
> 
> Bugs: MESOS-9698
> https://issues.apache.org/jira/browse/MESOS-9698
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch avoids making unguarded calls to `Option::get()`
> in `Master::updateOperationStatus()`. During agent reregistration, it's
> possible that a `ReregisterSlaveMessage` from the agent can race with a
> `SlaveReregisteredMessage` from the master, leading to multiple rounds
> of master/agent operation reconciliation. The duplicate operation status
> updates which occur as a result would crash the master before this fix.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 7dcdc9ab62a46638a027eb9a54c1dff173785927 
>   src/tests/agent_operation_feedback_tests.cpp 
> e427441b3ef702acf0fba52adf7ba027ea6bc508 
> 
> 
> Diff: https://reviews.apache.org/r/70587/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> `bin/mesos-tests.sh --gtest_filter="*DroppedOperationDuplicateStatusUpdate*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70593: Fixed flaky test FrameworkPrincipalChangeFails.

2019-05-03 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70593/#review215027
---



The review message doesn't conform with our commit message style.

I'd propose something like this:

```
Fixed flaky test 'FrameworkPrincipalChangeFails'.

Currently the test is flaky due to the race between `EXPECT_CALL` and the
scheduler API.

This patch moves the EXPECT_CALLs before `v1::scheduler::TestMesos` is
instantiated.

Review: https://reviews.apache.org/r/70593/
```

- Gastón Kleiman


On May 3, 2019, 9:38 a.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70593/
> ---
> 
> (Updated May 3, 2019, 9:38 a.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-2842
> https://issues.apache.org/jira/browse/MESOS-2842
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently the test is flaky due to the race between EXPECT_CALL and the 
> scheduler API. 
> EXPECT_CALLs should be moved before v1::scheduler::TestMesos() - I don't know 
> why we have overlooked this in https://reviews.apache.org/r/70377 .
> 
> 
> Diffs
> -
> 
>   src/tests/http_fault_tolerance_tests.cpp 
> a9e8d0c8962d9c978e0488e16317117d55d214fa 
> 
> 
> Diff: https://reviews.apache.org/r/70593/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70593: Fixed flaky test FrameworkPrincipalChangeFails.

2019-05-03 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70593/#review215022
---


Ship it!




Ship It!

- Gastón Kleiman


On May 3, 2019, 9:38 a.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70593/
> ---
> 
> (Updated May 3, 2019, 9:38 a.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-2842
> https://issues.apache.org/jira/browse/MESOS-2842
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently the test is flaky due to the race between EXPECT_CALL and the 
> scheduler API. 
> EXPECT_CALLs should be moved before v1::scheduler::TestMesos() - I don't know 
> why we have overlooked this in https://reviews.apache.org/r/70377 .
> 
> 
> Diffs
> -
> 
>   src/tests/http_fault_tolerance_tests.cpp 
> a9e8d0c8962d9c978e0488e16317117d55d214fa 
> 
> 
> Diff: https://reviews.apache.org/r/70593/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70506: Added metrics to the operation feedback example framework.

2019-04-19 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70506/
---

(Updated April 19, 2019, 4:06 p.m.)


Review request for mesos, Benno Evers, Greg Mann, and Joseph Wu.


Changes
---

Addressed feedback.


Repository: mesos


Description
---

Added metrics to the operation feedback example framework.


Diffs (updated)
-

  src/examples/operation_feedback_framework.cpp 
ab19605dda069db06b6215a95049014244fc2d4f 


Diff: https://reviews.apache.org/r/70506/diff/2/

Changes: https://reviews.apache.org/r/70506/diff/1-2/


Testing
---


Thanks,

Gastón Kleiman



Review Request 70506: Added metrics to the operation feedback example framework.

2019-04-18 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70506/
---

Review request for mesos, Benno Evers, Greg Mann, and Joseph Wu.


Repository: mesos


Description
---

Added metrics to the operation feedback example framework.


Diffs
-

  src/examples/operation_feedback_framework.cpp 
ab19605dda069db06b6215a95049014244fc2d4f 


Diff: https://reviews.apache.org/r/70506/diff/1/


Testing
---


Thanks,

Gastón Kleiman



Review Request 70478: Improved operation feedback example framework.

2019-04-15 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70478/
---

Review request for mesos, Benno Evers and Greg Mann.


Repository: mesos


Description
---

This patch makes the operation feedback example framework use a long
filter when accepting offers and revive offers when it receives an
`OPERATION_FINISH` operation status update.


Diffs
-

  src/examples/operation_feedback_framework.cpp 
ab19605dda069db06b6215a95049014244fc2d4f 


Diff: https://reviews.apache.org/r/70478/diff/1/


Testing
---

Manual testing.


Thanks,

Gastón Kleiman



Re: Review Request 70377: Added tests to check that framework cannot change its principal.

2019-04-12 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70377/#review214641
---


Ship it!




Ship It!

- Gastón Kleiman


On April 10, 2019, 8:31 a.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70377/
> ---
> 
> (Updated April 10, 2019, 8:31 a.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-2842
> https://issues.apache.org/jira/browse/MESOS-2842
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added tests to check that framework cannot change its principal.
> 
> 
> Diffs
> -
> 
>   src/tests/fault_tolerance_tests.cpp 
> 91cc9a2bcb98562cfdb60df229a88a0dabb34792 
>   src/tests/http_fault_tolerance_tests.cpp 
> d3db8a6efa0e56dd332d8cae48123b173d347cf2 
>   src/tests/mesos.hpp d8c17eabdabd14f428946917d56eb8f261bfbd2a 
> 
> 
> Diff: https://reviews.apache.org/r/70377/diff/3/
> 
> 
> Testing
> ---
> 
> Ran make check - these two tests fail, others don't.
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70379: Added validation that the principal stays the same on resubscription.

2019-04-12 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70379/#review214640
---


Ship it!




Ship It!

- Gastón Kleiman


On April 10, 2019, 8:35 a.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70379/
> ---
> 
> (Updated April 10, 2019, 8:35 a.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-2842
> https://issues.apache.org/jira/browse/MESOS-2842
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added validation that the principal stays the same on resubscription.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
> 
> 
> Diff: https://reviews.apache.org/r/70379/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> Now the two failing tests from https://reviews.apache.org/r/70377/ pass.
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70408: Deduplicated common validation code in Master::subscribe()'s.

2019-04-12 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70408/#review214639
---


Fix it, then Ship it!





src/master/master.cpp
Line 2553 (original), 2553 (patched)
<https://reviews.apache.org/r/70408/#comment300825>

Nit: We normally only prefix underscores in method names when adding 
continuations of async methods.

I would call this method `validateFrameworkSubscription`.



src/master/master.cpp
Line 2620 (original), 2600 (patched)
<https://reviews.apache.org/r/70408/#comment300826>

Please shift this line left to fix the indentation.


- Gastón Kleiman


On April 10, 2019, 8:29 a.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70408/
> ---
> 
> (Updated April 10, 2019, 8:29 a.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-2842
> https://issues.apache.org/jira/browse/MESOS-2842
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Deduplicated common validation code in Master::subscribe()'s.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 94891af9deeaddbfc9d6eabb243aed97f7b7 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
> 
> 
> Diff: https://reviews.apache.org/r/70408/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70412: Made the operation feedback example framwork cleanup old reservations.

2019-04-09 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70412/
---

(Updated April 9, 2019, 4:56 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Addressed moar feedback.


Repository: mesos


Description
---

This patch adds a flag to the operation feedback example framework that
makes it unreserve resources that were reserved by another instance of
the framework.

NOTE: The new flag should only be enabled if the example framework is
the only framework making reservations for a given role. Otherwise it
will unreserve reservations made by another framework.


Diffs (updated)
-

  src/examples/operation_feedback_framework.cpp 
2480c340c4ccb4246098c35e0315093f3eb44e81 


Diff: https://reviews.apache.org/r/70412/diff/3/

Changes: https://reviews.apache.org/r/70412/diff/2-3/


Testing
---

Manual testing.


Thanks,

Gastón Kleiman



Re: Review Request 70412: Made the operation feedback example framwork cleanup old reservations.

2019-04-09 Thread Gastón Kleiman


> On April 8, 2019, 5:02 p.m., Greg Mann wrote:
> > src/examples/operation_feedback_framework.cpp
> > Line 496 (original), 528 (patched)
> > <https://reviews.apache.org/r/70412/diff/1/?file=2137761#file2137761line532>
> >
> > Should we remove this log line since we may receive operation status 
> > updates for unreservations of resources from previous runs of the 
> > framework? Or maybe we could change it to INFO level?

We don't set an operation ID when unreserving reserouces from previous runs, so 
we shouldn't get any status updates.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70412/#review214481
-------


On April 5, 2019, 4:35 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70412/
> ---
> 
> (Updated April 5, 2019, 4:35 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds a flag to the operation feedback example framework that
> makes it unreserve resources that were reserved by another instance of
> the framework.
> 
> NOTE: The new flag should only be enabled if the example framework is
> the only framework making reservations for a given role. Otherwise it
> will unreserve reservations made by another framework.
> 
> 
> Diffs
> -
> 
>   src/examples/operation_feedback_framework.cpp 
> 2480c340c4ccb4246098c35e0315093f3eb44e81 
> 
> 
> Diff: https://reviews.apache.org/r/70412/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 70412: Made the operation feedback example framwork cleanup old reservations.

2019-04-09 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70412/
---

(Updated April 9, 2019, 4:39 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Addressed feedback.


Repository: mesos


Description
---

This patch adds a flag to the operation feedback example framework that
makes it unreserve resources that were reserved by another instance of
the framework.

NOTE: The new flag should only be enabled if the example framework is
the only framework making reservations for a given role. Otherwise it
will unreserve reservations made by another framework.


Diffs (updated)
-

  src/examples/operation_feedback_framework.cpp 
2480c340c4ccb4246098c35e0315093f3eb44e81 


Diff: https://reviews.apache.org/r/70412/diff/2/

Changes: https://reviews.apache.org/r/70412/diff/1-2/


Testing
---

Manual testing.


Thanks,

Gastón Kleiman



Re: Review Request 70412: Made the operation feedback example framwork cleanup old reservations.

2019-04-09 Thread Gastón Kleiman


> On April 8, 2019, 10:26 a.m., Benno Evers wrote:
> > src/examples/operation_feedback_framework.cpp
> > Lines 329 (patched)
> > <https://reviews.apache.org/r/70412/diff/1/?file=2137761#file2137761line333>
> >
> > Is this always true even in the presence of `RESERVATION_REFINEMENT`?
> 
> Gastón Kleiman wrote:
> This framework only pushes one reservation, so I guess that there should 
> only be one element for reservations made by it?
> 
> The check would fail if the example framework shares a role with another 
> framework that is using reservation refinement. Is that a case that we want 
> to contemplate for a simple example framework?
> 
> If so, the framework should probably check the labels of all the 
> reservations.
> 
> Greg Mann wrote:
> Yea we would only hit a failure at this CHECK if another framework 
> refined the reservations made by this framework. I'm fine with making the 
> assumption that that will not occur, what do you guys think?
> 
> Benno Evers wrote:
> Hm, the original question was more along the lines of "Does the 
> `Resources::reserved()` guarantee to always return resources with a single 
> reservation even when using reservation refinement?".
> 
> If it does not, then I'm not sure we should be `CHECK()`ing here: A check 
> failure should usually indicate some bug in the local code that led to an 
> inconsistent state where it is impossible to recover. However, here it looks 
> like we're `CHECK()`ing on external input, and it does not even have to be 
> malformed to trigger a check failure.
> 
> I'm fine with making the assumption that this does not happen, but the 
> problem is how to communicate that assumption. I fear that if we distill the 
> discussion here into a comment and put it above the check it might be too 
> distracting, especially since the target audience are anyways people who 
> might not be too familiar with the intricacies of Mesos.
> 
> Maybe one solution would be to remove the `RESERVATION_REFINEMENT` 
> capability below and then also remove the `CHECK()` here completely, leaving 
> the framework to crash with a protobuf error if our assumption was wrong.

I tried removing the `RESERVATION_REFINEMENT` capability, but then the 
`v1::Resources()` constructor crashes because it expects the resources proto to 
be in the post-refinement format. I could upgrade the resources, but that also 
adds unnecessary complexity.

I ended up just removing the `CHECK()`. Note however that the framework has 
some other `CHECK()` statements on external input, e.g., 
https://github.com/apache/mesos/blob/475b5e14085eef88edd868d331bd08b7ba5443bf/src/examples/operation_feedback_framework.cpp#L665
 — we might want to get rid of those statements too.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70412/#review214468
---


On April 5, 2019, 4:35 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70412/
> ---
> 
> (Updated April 5, 2019, 4:35 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds a flag to the operation feedback example framework that
> makes it unreserve resources that were reserved by another instance of
> the framework.
> 
> NOTE: The new flag should only be enabled if the example framework is
> the only framework making reservations for a given role. Otherwise it
> will unreserve reservations made by another framework.
> 
> 
> Diffs
> -
> 
>   src/examples/operation_feedback_framework.cpp 
> 2480c340c4ccb4246098c35e0315093f3eb44e81 
> 
> 
> Diff: https://reviews.apache.org/r/70412/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 70412: Made the operation feedback example framwork cleanup old reservations.

2019-04-08 Thread Gastón Kleiman


> On April 8, 2019, 10:26 a.m., Benno Evers wrote:
> > src/examples/operation_feedback_framework.cpp
> > Lines 329 (patched)
> > <https://reviews.apache.org/r/70412/diff/1/?file=2137761#file2137761line333>
> >
> > Is this always true even in the presence of `RESERVATION_REFINEMENT`?

This framework only pushes one reservation, so I guess that there should only 
be one element for reservations made by it?

The check would fail if the example framework shares a role with another 
framework that is using reservation refinement. Is that a case that we want to 
contemplate for a simple example framework?

If so, the framework should probably check the labels of all the reservations.


> On April 8, 2019, 10:26 a.m., Benno Evers wrote:
> > src/examples/operation_feedback_framework.cpp
> > Lines 735 (patched)
> > <https://reviews.apache.org/r/70412/diff/1/?file=2137761#file2137761line739>
> >
> > I guess this should be camelCase?

Good catch! =)


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70412/#review214468
-------


On April 5, 2019, 4:35 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70412/
> ---
> 
> (Updated April 5, 2019, 4:35 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds a flag to the operation feedback example framework that
> makes it unreserve resources that were reserved by another instance of
> the framework.
> 
> NOTE: The new flag should only be enabled if the example framework is
> the only framework making reservations for a given role. Otherwise it
> will unreserve reservations made by another framework.
> 
> 
> Diffs
> -
> 
>   src/examples/operation_feedback_framework.cpp 
> 2480c340c4ccb4246098c35e0315093f3eb44e81 
> 
> 
> Diff: https://reviews.apache.org/r/70412/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Review Request 70412: Made the operation feedback example framwork cleanup old reservations.

2019-04-05 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70412/
---

Review request for mesos, Benno Evers and Greg Mann.


Repository: mesos


Description
---

This patch adds a flag to the operation feedback example framework that
makes it unreserve resources that were reserved by another instance of
the framework.

NOTE: The new flag should only be enabled if the example framework is
the only framework making reservations for a given role. Otherwise it
will unreserve reservations made by another framework.


Diffs
-

  src/examples/operation_feedback_framework.cpp 
2480c340c4ccb4246098c35e0315093f3eb44e81 


Diff: https://reviews.apache.org/r/70412/diff/1/


Testing
---

Manual testing.


Thanks,

Gastón Kleiman



Review Request 70411: Changed naming of example framework class fields to match convention.

2019-04-05 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70411/
---

Review request for mesos, Benno Evers and Greg Mann.


Repository: mesos


Description
---

Changed naming of example framework class fields to match convention.


Diffs
-

  src/examples/operation_feedback_framework.cpp 
2480c340c4ccb4246098c35e0315093f3eb44e81 


Diff: https://reviews.apache.org/r/70411/diff/1/


Testing
---

Framework still compiles =).


Thanks,

Gastón Kleiman



Re: Review Request 70377: Added tests to check that the framework cannot change its principal.

2019-04-04 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70377/#review214392
---




src/tests/fault_tolerance_tests.cpp
Lines 2035-2036 (patched)
<https://reviews.apache.org/r/70377/#comment300598>

Do we really need these lines?

`DEFAULT_FRAMEWORK_INFO` already has a principal set, and the failover 
timeout doesn't seem to be necessary, because the first driver never 
disconnects.



src/tests/scheduler_http_api_tests.cpp
Lines 581-619 (patched)
<https://reviews.apache.org/r/70377/#comment300600>

We now have some handy helpers to create/subscribe a framework and get the 
ID: 
https://github.com/apache/mesos/blob/003e47d852f6eb9a1cd7c86750d91b37269aec26/src/tests/operation_reconciliation_tests.cpp#L724-L747

The `MockHTTPScheduler` also has a `error` callback that can be used to 
check whether the master rejects the re-registration attempt.


- Gastón Kleiman


On April 3, 2019, 8:42 a.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70377/
> ---
> 
> (Updated April 3, 2019, 8:42 a.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-2842
> https://issues.apache.org/jira/browse/MESOS-2842
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added two tests to check that the framework cannot change its principal: one 
> for HTTP API, another for process API.
> 
> Actually, the process API test reproduces the master crash described in 
> MESOS-2842.
> 
> 
> Diffs
> -
> 
>   src/tests/fault_tolerance_tests.cpp 
> 91cc9a2bcb98562cfdb60df229a88a0dabb34792 
>   src/tests/scheduler_http_api_tests.cpp 
> d5b5eec0df0c01c4706c9bb4438c9daa305bd376 
> 
> 
> Diff: https://reviews.apache.org/r/70377/diff/1/
> 
> 
> Testing
> ---
> 
> Ran make check - these two tests fail, others don't.
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70378: Wrapped access to hashmaps in `frameworks.principals` and `authorized`.

2019-04-04 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70378/#review214387
---




src/master/master.hpp
Lines 100-158 (patched)
<https://reviews.apache.org/r/70378/#comment300594>

I think that I would drop this patch.

However whenever we add new data structures, we add them in stout 
(https://github.com/apache/mesos/blob/003e47d852f6eb9a1cd7c86750d91b37269aec26/3rdparty/stout/README.md)
 instead of in the Mesos codebase.


- Gastón Kleiman


On April 3, 2019, 9:01 a.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70378/
> ---
> 
> (Updated April 3, 2019, 9:01 a.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-2842
> https://issues.apache.org/jira/browse/MESOS-2842
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> !! This code is completely disposable and possibly not properly located; the 
> next patch actually does not depend on this one.
> 
> --
> 
> After having a look at the Master code, I realized that the hashmaps 
> `frameworks.principlas` and `authorized` should satisfy the following 
> property: the value corresponding to a given key never changes.
> 
> Currently, this invariant is enforced MANUALLY with hacks of three kinds. 
> The first hack is constructs like this:
> 
> CHECK(!h.contains(key));
> h[key] = newValue;
> 
> The second hack is much worse: it is avoiding calling non-const 
> `hashmap::operator[]()` for nonexisting keys.
> 
> And the third one is even worse: it is avoiding this situation:
> h.erase(key);
> h[key] = newValue;
> 
> To make sure that at least the first two hacks has been used properly, I 
> added this wrapper around hashmap.
> Actually, this is a dirty hack itself, and not a proper decomposition of the 
> God Object antipattern that, IMO, has somehow started to grow inside of the 
> Master.
> 
> Unfortunately, there is no such simple way to validate that the third kind of 
> hacks is used properly. (In fact, it is not - and this is the direct cause of 
> MESOS-2842.)
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 94891af9deeaddbfc9d6eabb243aed97f7b7 
>   src/master/master.cpp cf5caa0893ba1387a1f3a9d129ecd7d974f776bd 
> 
> 
> Diff: https://reviews.apache.org/r/70378/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70379: Added validation that the principal stays the same on resubscription.

2019-04-04 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70379/#review214382
---




src/master/master.hpp
Lines 1194-1195 (patched)
<https://reviews.apache.org/r/70379/#comment300573>

I'd write something like this:

```
Validates that the principal provided on resubscription matches the one 
already known by the master.
```



src/master/master.cpp
Lines 2525-2527 (patched)
<https://reviews.apache.org/r/70379/#comment300575>

We tend to prefer inlining statements like this one instead of creating 
static methods.



src/master/master.cpp
Lines 2558-2560 (patched)
<https://reviews.apache.org/r/70379/#comment300574>

We use stout's `stringify()` for this, which delegates to the corresponding 
`operator<<` if one exists.



src/master/master.cpp
Lines 2562-2563 (patched)
<https://reviews.apache.org/r/70379/#comment300589>

I would follow the current pattern in `Master::subscribe` and instead of 
adding this method add one with this signature: `bool 
Master::frameworkPrincipalChanged(const FrameworkInfo& frameworkInfo)`.



src/master/master.cpp
Lines 2570 (patched)
<https://reviews.apache.org/r/70379/#comment300577>

we tend to use `auto` only for complex types or inside `foreach` statements.

To stay consistent with what's normally used, I would do:

`Framework* framework = getFramework(frameworkInfo.id());`



src/master/master.cpp
Lines 2573-2574 (patched)
<https://reviews.apache.org/r/70379/#comment300580>

I would change this to:

```
// TODO(asekretenko): Masters don't store `FrameworkInfo` messages in the 
replicated log, so it is possible that the previous principal is still unknown 
at the time of re-registration. This has to be changed if we decide to start 
storing `FrameworkInfo` messages.
```

And add an empty line before the TODO.



src/master/master.cpp
Lines 2592-2596 (patched)
<https://reviews.apache.org/r/70379/#comment300584>

We don't end lines with the `<<` operator. We start new lines with it 
instead, so this would look like:

```
  LOG(WARNING)
<< "Framework " << frameworkInfo.id() << " which had a principal '"
<<  oldPrincipal << "' tried to resubscribe with a new principal '"
<< newPrincipal << "'";
```

However we don't have to log anything here, because `Master::subscribe()` 
already logs validation errors.



src/master/master.cpp
Lines 2598 (patched)
<https://reviews.apache.org/r/70379/#comment300593>

I think I'd rather not mention the old principal to prevent leaking 
information.



src/master/master.cpp
Lines 10763 (patched)
<https://reviews.apache.org/r/70379/#comment300591>

Looks like this variable is not used.


- Gastón Kleiman


On April 3, 2019, 9:04 a.m., Andrei Sekretenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70379/
> ---
> 
> (Updated April 3, 2019, 9:04 a.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-2842
> https://issues.apache.org/jira/browse/MESOS-2842
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added validation that the principal stays the same on a framework 
> resubscription.
> This solves MESOS-2842.
> 
> There are at least two objectionable places in this patch - see TODO in the 
> diff.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 94891af9deeaddbfc9d6eabb243aed97f7b7 
>   src/master/master.cpp cf5caa0893ba1387a1f3a9d129ecd7d974f776bd 
> 
> 
> Diff: https://reviews.apache.org/r/70379/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> Now the two failing tests from https://reviews.apache.org/r/70377/ pass.
> 
> 
> Thanks,
> 
> Andrei Sekretenko
> 
>



Re: Review Request 70335: Sent FrameworkInfo to agents when applying operations.

2019-04-03 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70335/#review214345
---




src/slave/slave.cpp
Lines 4516 (patched)
<https://reviews.apache.org/r/70335/#comment300539>

I'm afraid that if the continuation is not called synchronously, the 
application of a speculative operation can race with the handling of a 
`RunTaskMessage` that tries to consume the result of the operation. This 
ultimately leads to agent crashes =/.


- Gastón Kleiman


On March 28, 2019, 4:27 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70335/
> ---
> 
> (Updated March 28, 2019, 4:27 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Chun-Hung Hsiao, 
> Gastón Kleiman, Joseph Wu, and Meng Zhu.
> 
> 
> Bugs: MESOS-8582
> https://issues.apache.org/jira/browse/MESOS-8582
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the master to send a framework's full
> `FrameworkInfo` to the agent in the `ApplyOperationMessage`.
> The agent is updated to checkpoint frameworks when applying
> operations, and to unschedule GC on the meta directory when
> a new framework is created.
> 
> The test `TerminalOrphanOperationAfterMasterFailover` is
> removed since this patch eliminates the case of orphan
> operations relevant to that test.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp acc67d3763ddee9027e6cf375f1d495ff5805026 
>   src/messages/messages.proto e30ad34cc9212b05f85ba5e1d4fcfc9e49ae92c0 
>   src/slave/slave.hpp 2bffdc4b1e282d3c6dae2dcf23584a8a4550bf94 
>   src/slave/slave.cpp 5373cee5d30c2403497939eeba2ee5405117237e 
>   src/tests/persistent_volume_tests.cpp 
> 7e929a5a3a92e16a5dec10206f37caebc20d66a8 
>   src/tests/reservation_tests.cpp cd84cd24d3587fafc01ae1861f22c47262f2d7e9 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 55c9389453754227de31e0a76d32ba663cc8ca7c 
> 
> 
> Diff: https://reviews.apache.org/r/70335/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70335: Sent FrameworkInfo to agents when applying operations.

2019-04-01 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70335/#review214247
---




src/slave/slave.hpp
Lines 889 (patched)
<https://reviews.apache.org/r/70335/#comment300461>

Why do we need this hashmap? I see that the code inserts/removes entries, 
but nothing ever gets anything from it.



src/slave/slave.cpp
Lines 4420-4421 (patched)
<https://reviews.apache.org/r/70335/#comment300451>

`message.framework_info()` might not be set if the operation is triggered 
by an operator and not a framework, so these variables should probably be 
options.



src/slave/slave.cpp
Lines 4433 (patched)
<https://reviews.apache.org/r/70335/#comment300452>

Nit: I'd name this `unscheduleGC`.



src/slave/slave.cpp
Lines 4439-4443 (patched)
<https://reviews.apache.org/r/70335/#comment300458>

Should we also unschedule the garbage collection of the framework work 
directory?

It gets scheduled for GC here: 
https://github.com/apache/mesos/blob/1acb38c27306326a53f866b5386b5e28a6dc9314/src/slave/slave.cpp#L6834-L6838



src/slave/slave.cpp
Lines 4452-4456 (patched)
<https://reviews.apache.org/r/70335/#comment300456>

Nit: fits in one line.



src/slave/slave.cpp
Lines 4459 (patched)
<https://reviews.apache.org/r/70335/#comment300457>

I noticed that `Slave::run()` only checkpoints the framework if 
`frameworkInfo.checkpoint()` returns `true`.

Should we maybe not checkpoint operations/frameworkInfos of frameworks that 
set `checkpoint` to `false`?



src/slave/slave.cpp
Lines 4479 (patched)
<https://reviews.apache.org/r/70335/#comment300459>

Nit: I'd say: "Failed to unschedule framework directories scheduled for gc"

I would also include the path that couldn't be gc'd.


- Gastón Kleiman


On March 28, 2019, 4:27 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70335/
> ---
> 
> (Updated March 28, 2019, 4:27 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benno Evers, Chun-Hung Hsiao, 
> Gastón Kleiman, Joseph Wu, and Meng Zhu.
> 
> 
> Bugs: MESOS-8582
> https://issues.apache.org/jira/browse/MESOS-8582
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the master to send a framework's full
> `FrameworkInfo` to the agent in the `ApplyOperationMessage`.
> The agent is updated to checkpoint frameworks when applying
> operations, and to unschedule GC on the meta directory when
> a new framework is created.
> 
> The test `TerminalOrphanOperationAfterMasterFailover` is
> removed since this patch eliminates the case of orphan
> operations relevant to that test.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp acc67d3763ddee9027e6cf375f1d495ff5805026 
>   src/messages/messages.proto e30ad34cc9212b05f85ba5e1d4fcfc9e49ae92c0 
>   src/slave/slave.hpp 2bffdc4b1e282d3c6dae2dcf23584a8a4550bf94 
>   src/slave/slave.cpp 5373cee5d30c2403497939eeba2ee5405117237e 
>   src/tests/persistent_volume_tests.cpp 
> 7e929a5a3a92e16a5dec10206f37caebc20d66a8 
>   src/tests/reservation_tests.cpp cd84cd24d3587fafc01ae1861f22c47262f2d7e9 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 55c9389453754227de31e0a76d32ba663cc8ca7c 
> 
> 
> Diff: https://reviews.apache.org/r/70335/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Review Request 70283: Improved handling of resources consumed by orphan operations.

2019-03-22 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70283/
---

Review request for mesos, Benjamin Bannier, Greg Mann, Joseph Wu, and Meng Zhu.


Bugs: MESOS-9635
https://issues.apache.org/jira/browse/MESOS-9635


Repository: mesos


Description
---

This patch makes the master's `UpdateSlaveMessage` handler include
resources consumed by orphan operations when calling
`allocator->addResourceProvider()`.

The change prevents some races that lead to the master reoffering the
resources consumed by the operations and makes the
`OperationReconciliationTest.AgentPendingOperationAfterMasterFailover`
test stable.


Diffs
-

  src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 


Diff: https://reviews.apache.org/r/70283/diff/1/


Testing
---

`OperationReconciliationTest.AgentPendingOperationAfterMasterFailover` passed 
over 5000 iterations under stress. Other tests still pass on GNU/Linux.


Thanks,

Gastón Kleiman



Re: Review Request 70243: Improved operation reconciliation for unsubscribed resource providers.

2019-03-21 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70243/#review213896
---




include/mesos/resource_provider/resource_provider.proto
Lines 101-104 (patched)
<https://reviews.apache.org/r/70243/#comment300013>

I think we should document somewhere (either protos or in the 
reconciliation handlers, or maybe in both places?) the difference in semantics, 
i.e., exactly which cases are handled differently depending on who initiated 
the reconciliation request.



src/master/master.cpp
Lines 9684-9693 (patched)
<https://reviews.apache.org/r/70243/#comment38>

Nit: I think the following version would be more concise:

```
ReconcileOperationsMessage::Operation* forwardedOperation = 
reconciliationMessage.add_operations();
forwardedOperation->mutable_operation_id()
  ->CopyFrom(operation.operation_id());
if (resourceProviderId.isSome()) {
  forwardedOperation->mutable_resource_provider_id()
->CopyFrom(resourceProviderId.get());
}
```



src/slave/slave.cpp
Lines 4524-4525 (patched)
<https://reviews.apache.org/r/70243/#comment300011>

Nit:

```
  } else if (operationId.isSome() &&
operationIds.contains(operationId.get())) {
```

But I like Joseph's suggestion better =).


- Gastón Kleiman


On March 21, 2019, 3:26 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70243/
> ---
> 
> (Updated March 21, 2019, 3:26 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, Gastón Kleiman, 
> and Joseph Wu.
> 
> 
> Bugs: MESOS-9318
> https://issues.apache.org/jira/browse/MESOS-9318
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the operation reconciliation pipeline to
> forward some framework-initiated reconciliation requests to
> the agent. In cases where an explicit reconciliation request
> specifies an operation which is not recognized on a resource
> provider which is also not recognized, the master will
> forward the request to the agent so that the resource
> provider manager can satisfy the request based on whether or
> not the resource provider has been seen before.
> 
> 
> Diffs
> -
> 
>   include/mesos/resource_provider/resource_provider.proto 
> 5ea9e2009209b1609619874ebd63cb1e2e698434 
>   src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 
>   src/messages/messages.proto 633dddbaa874550f7f0d9513c608ed75b18059a8 
>   src/resource_provider/manager.cpp 7d3338ea7fbf330a25416f848db7742ad1bea52f 
>   src/slave/slave.cpp 36424f89a8c1f183febabcc9582975dd21213c25 
> 
> 
> Diff: https://reviews.apache.org/r/70243/diff/2/
> 
> 
> Testing
> ---
> 
> Testing details at the end of this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70208: Added test for reconciliation of multiple operations.

2019-03-15 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70208/#review213756
---


Fix it, then Ship it!





src/tests/operation_reconciliation_tests.cpp
Lines 284-293 (patched)
<https://reviews.apache.org/r/70208/#comment299748>

We could remove this if the first operation also reserved 0.1 CPUs


- Gastón Kleiman


On March 15, 2019, 3:53 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70208/
> ---
> 
> (Updated March 15, 2019, 3:53 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Joseph Wu, Megha Sharma, and Jiang 
> Yan Xu.
> 
> 
> Bugs: MESOS-9648
> https://issues.apache.org/jira/browse/MESOS-9648
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added test for reconciliation of multiple operations.
> 
> 
> Diffs
> -
> 
>   src/tests/operation_reconciliation_tests.cpp 
> 6a815ad694e2a608ce324715c920833f825793a0 
> 
> 
> Diff: https://reviews.apache.org/r/70208/diff/2/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*PendingAndFinishedOperations*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70221: Updated operation feedback documentation.

2019-03-15 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70221/#review213755
---


Ship it!




Ship It!

- Gastón Kleiman


On March 15, 2019, 3:37 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70221/
> ---
> 
> (Updated March 15, 2019, 3:37 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, James DeFelice, Megha Sharma, and 
> Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9648
> https://issues.apache.org/jira/browse/MESOS-9648
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the operation feedback documentation to
> reflect API changes from MESOS-9648.
> 
> 
> Diffs
> -
> 
>   docs/reconciliation.md 150a145ac2cfe7b584d0b2b14e8132e8d2d6eb2e 
>   docs/scheduler-http-api.md 8384336bbecf2ca38a3cd203f9db28d931812d65 
> 
> 
> Diff: https://reviews.apache.org/r/70221/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70200: Changed operation reconciliation to send updates on the event stream.

2019-03-15 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70200/#review213754
---


Ship it!




Ship It!

- Gastón Kleiman


On March 15, 2019, 3:35 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70200/
> ---
> 
> (Updated March 15, 2019, 3:35 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gastón Kleiman, James DeFelice, Megha Sharma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9648
> https://issues.apache.org/jira/browse/MESOS-9648
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the v1 scheduler API's RECONCILE_OPERATIONS
> call to provide a 202 Accepted response with an empty body,
> rather than a response containing all reconciliation results.
> In this new scheme, reconciliation requests are satisfied with
> operation status updates on the scheduler's event stream.
> Related tests are also updated.
> 
> NOTE that this is a breaking change for schedulers consuming
> the experimental operation reconciliation API.
> 
> 
> Diffs
> -
> 
>   include/mesos/scheduler/scheduler.proto 
> 5443048a620395c6beb0d93a325187df905b1b0b 
>   include/mesos/v1/scheduler/scheduler.proto 
> 3cfe0251847431da09a4eb9c81aecb77c84100dc 
>   src/master/http.cpp d6d47405f871f88235cc203ef5cf9f1460754e0c 
>   src/master/master.hpp 953cc5b8ab6a8e1920a3ad63fb2dd6382e3603ec 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/tests/operation_reconciliation_tests.cpp 
> 6a815ad694e2a608ce324715c920833f825793a0 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 7945384867f26fa15dc734a235ae509d5d6d350f 
> 
> 
> Diff: https://reviews.apache.org/r/70200/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70208: Added test for reconciliation of multiple operations.

2019-03-15 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70208/#review213753
---



Tnis test looks good — however in my experience tests which send only `RESERVE` 
opertions don't need a resource provider. Removing the instantiation of the RP 
would make remove a lot of boilerplate and some amount of clock manipulation, 
thus making the test much shorter, readable and less prone to races.

- Gastón Kleiman


On March 13, 2019, 6:41 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70208/
> ---
> 
> (Updated March 13, 2019, 6:41 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Joseph Wu, Megha Sharma, and Jiang 
> Yan Xu.
> 
> 
> Bugs: MESOS-9648
> https://issues.apache.org/jira/browse/MESOS-9648
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds a test to verify that operation reconciliation
> via the v1 scheduler API works correctly when multiple
> operations in different states are reconciled at the same time.
> 
> 
> Diffs
> -
> 
>   src/tests/operation_reconciliation_tests.cpp 
> 6a815ad694e2a608ce324715c920833f825793a0 
> 
> 
> Diff: https://reviews.apache.org/r/70208/diff/1/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*PendingAndFinishedOperations*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70200: Changed operation reconciliation to send updates on the event stream.

2019-03-15 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70200/#review213751
---



Thanks for the update! This is looking pretty good.

We also have to update the docs, are you going to do that here or do you plan 
to add another patch to the chain?


include/mesos/v1/scheduler/scheduler.proto
Line 218 (original), 221 (patched)
<https://reviews.apache.org/r/70200/#comment299744>

Shouldn't we mark this as deprecated by adding: `[deprecated=true]`?

I don't think it has any effect on the generated code, but it would be 
useful for anyone reading the proto.



include/mesos/v1/scheduler/scheduler.proto
Line 226 (original), 229 (patched)
<https://reviews.apache.org/r/70200/#comment299746>

We should mark this field as deprecated.



include/mesos/v1/scheduler/scheduler.proto
Lines 534-536 (patched)
<https://reviews.apache.org/r/70200/#comment299745>

The `call()` method hasn't changed, so I don't think this statement is 
accurate.

The field will be set to something if an API call returns a response, but 
it just happens to be the case that no call does that at the moment.

I would delete this comment.


- Gastón Kleiman


On March 14, 2019, 10:24 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70200/
> ---
> 
> (Updated March 14, 2019, 10:24 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gastón Kleiman, James DeFelice, Megha Sharma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9648
> https://issues.apache.org/jira/browse/MESOS-9648
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the v1 scheduler API's RECONCILE_OPERATIONS
> call to provide a 202 Accepted response with an empty body,
> rather than a response containing all reconciliation results.
> In this new scheme, reconciliation requests are satisfied with
> operation status updates on the scheduler's event stream.
> Related tests are also updated.
> 
> NOTE that this is a breaking change for schedulers consuming
> the experimental operation reconciliation API.
> 
> 
> Diffs
> -
> 
>   include/mesos/scheduler/scheduler.proto 
> 5443048a620395c6beb0d93a325187df905b1b0b 
>   include/mesos/v1/scheduler/scheduler.proto 
> 3cfe0251847431da09a4eb9c81aecb77c84100dc 
>   src/master/http.cpp d6d47405f871f88235cc203ef5cf9f1460754e0c 
>   src/master/master.hpp 953cc5b8ab6a8e1920a3ad63fb2dd6382e3603ec 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/tests/operation_reconciliation_tests.cpp 
> 6a815ad694e2a608ce324715c920833f825793a0 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 7945384867f26fa15dc734a235ae509d5d6d350f 
> 
> 
> Diff: https://reviews.apache.org/r/70200/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70200: Changed operation reconciliation to send updates on the event stream.

2019-03-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70200/#review213691
---




src/master/master.cpp
Line 9557 (original), 9557-9564 (patched)
<https://reviews.apache.org/r/70200/#comment299688>

Should this lambda leverage move semantics? It could take an rvalue and do 
an assignment instead of a copy:

```
update.mutable_update_operation_status()
  ->mutable_status() = status;
```

It would also be possible to create a single `scheduler::Event` message 
instead of one per lambda invocation.



src/tests/operation_reconciliation_tests.cpp
Line 200 (original), 204 (patched)
<https://reviews.apache.org/r/70200/#comment299689>

`s/ASSERT/EXPECT/` here and on the other tests.

The updated test doesn't call `result->response()`, so there is no risk of 
it crashing.



src/tests/storage_local_resource_provider_tests.cpp
Lines 4658-4660 (original), 4662-4664 (patched)
<https://reviews.apache.org/r/70200/#comment299693>

`%s/ASSERT/EXPECT/g`


- Gastón Kleiman


On March 13, 2019, 11:09 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70200/
> ---
> 
> (Updated March 13, 2019, 11:09 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gastón Kleiman, James DeFelice, Megha Sharma, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-9648
> https://issues.apache.org/jira/browse/MESOS-9648
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the v1 scheduler API's RECONCILE_OPERATIONS
> call to provide a 202 Accepted response with an empty body,
> rather than a response containing all reconciliation results.
> In this new scheme, reconciliation requests are satisfied with
> operation status updates on the scheduler's event stream.
> Related tests are also updated.
> 
> NOTE that this is a breaking change for schedulers consuming
> the experimental operation reconciliation API.
> 
> 
> Diffs
> -
> 
>   src/master/http.cpp d6d47405f871f88235cc203ef5cf9f1460754e0c 
>   src/master/master.hpp 953cc5b8ab6a8e1920a3ad63fb2dd6382e3603ec 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/tests/operation_reconciliation_tests.cpp 
> 6a815ad694e2a608ce324715c920833f825793a0 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 7945384867f26fa15dc734a235ae509d5d6d350f 
> 
> 
> Diff: https://reviews.apache.org/r/70200/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70186: Updated protobuf comments related to operation feedback.

2019-03-12 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70186/#review213626
---


Ship it!




Ship It!

- Gastón Kleiman


On March 11, 2019, 12:18 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70186/
> ---
> 
> (Updated March 11, 2019, 12:18 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Gastón Kleiman.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> While the `id` field in `Offer::Operation` was previously marked
> experimental, it is also wise to explicitly mark the related
> scheduler API calls and events as experimental to avoid any
> confusion.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 099873278bfc4ca0f28ce52f329cba0903cec7cb 
>   include/mesos/scheduler/scheduler.proto 
> 27634153b372584a214c9631a939254bd984c735 
>   include/mesos/v1/mesos.proto 3656aa7f9d87ff3868bbc7b221d9fc7c5d7a00a3 
>   include/mesos/v1/scheduler/scheduler.proto 
> 9b0f54604f5b7b904b52b7f0a24c3a6402659ef0 
> 
> 
> Diff: https://reviews.apache.org/r/70186/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 70116: Added metrics for offer operation feedback.

2019-03-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213541
---




src/master/master.hpp
Lines 941-944 (patched)
<https://reviews.apache.org/r/70116/#comment299493>

Is this used anywhere?



src/master/metrics.hpp
Lines 83 (patched)
<https://reviews.apache.org/r/70116/#comment299494>

s/per operation/per operation type/



src/master/metrics.cpp
Lines 158-159 (original), 160-161 (patched)
<https://reviews.apache.org/r/70116/#comment299495>

Should we add a similar metric for operation status updates?



src/master/metrics.cpp
Lines 174-177 (original), 176-179 (patched)
<https://reviews.apache.org/r/70116/#comment299496>

Should we add a similar metrics for operation status updates?



src/master/metrics.cpp
Lines 386-388 (patched)
<https://reviews.apache.org/r/70116/#comment299497>

This is weirdly formatted, I think the following follows the style guide 
and is more readable:

```
if (type == Offer::Operation::UNKNOWN ||
type == Offer::Operation::LAUNCH ||
type == Offer::Operation::LAUNCH_GROUP) {
```



src/master/metrics.cpp
Lines 637-651 (patched)
<https://reviews.apache.org/r/70116/#comment299499>

Can these counters be decremented? If not, should we add a CHECK to ensure 
that `delta` is always positive?



src/master/metrics.cpp
Lines 684 (patched)
<https://reviews.apache.org/r/70116/#comment299500>

Can counters for terminal states be decremented? I guess not? If I'm 
correct, then we could add a `CHECK` here.


- Gastón Kleiman


On March 7, 2019, 9:01 a.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> ---
> 
> (Updated March 7, 2019, 9:01 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
> https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp aceab3455adfdf7ff6e168e033316997a4d36fb5 
>   src/master/master.cpp 665c1c7979a5ae4ecec0d5a68e59c5419049a4d5 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 70156: Added helper to test for metrics values.

2019-03-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70156/#review213540
---




src/tests/utils.hpp
Lines 39 (patched)
<https://reviews.apache.org/r/70156/#comment299492>

Nit: I'd name this `EXPECT_METRIC_EQ`.


- Gastón Kleiman


On March 7, 2019, 8:59 a.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70156/
> ---
> 
> (Updated March 7, 2019, 8:59 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added helper to test for metrics values.
> 
> 
> Diffs
> -
> 
>   src/tests/utils.hpp b2f22cd4db223d167aa35109cd8de6df82ed1f4d 
> 
> 
> Diff: https://reviews.apache.org/r/70156/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 70115: Updated comment about operations.

2019-03-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70115/#review213539
---


Ship it!




Ship It!

- Gastón Kleiman


On March 4, 2019, 9 a.m., Benno Evers wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70115/
> ---
> 
> (Updated March 4, 2019, 9 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> A comment about operation feedback mentioned the master sending 
> `OPERATION_DROPPED` updates for certain operations. However, such updates are 
> only sent by agents; the master will send `OPERATION_ERROR` updates instead.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 665c1c7979a5ae4ecec0d5a68e59c5419049a4d5 
> 
> 
> Diff: https://reviews.apache.org/r/70115/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Benno Evers
> 
>



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-26 Thread Gastón Kleiman


> On Feb. 26, 2019, 3:16 p.m., Joseph Wu wrote:
> > src/slave/slave.cpp
> > Lines 4694 (patched)
> > <https://reviews.apache.org/r/69978/diff/5/?file=2126441#file2126441line4694>
> >
> > Is there any reason why this is not the following?
> > ```
> > operation->latest_status().state()
> > ```

Not that I can think of, changing it would be a nice readibility improvement =).


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/#review213234
-----------


On Feb. 22, 2019, 5:28 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69978/
> ---
> 
> (Updated Feb. 22, 2019, 5:28 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9574
> https://issues.apache.org/jira/browse/MESOS-9574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Make the agent garbage collect an operation status update stream once
> a terminal status update is acknowledged.
> 
> This patch also improves the logging of acknowledgement failures and the
> readability of the `Slave::operationStatusAcknowledgement` method.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/69978/diff/5/
> 
> 
> Testing
> ---
> 
> Manual testing + existing tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 70044: Moved status update streams of operations on agent's default resources.

2019-02-26 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70044/
---

(Updated Feb. 26, 2019, 11:01 a.m.)


Review request for mesos, Benno Evers, Greg Mann, and Joseph Wu.


Changes
---

Added comments.


Bugs: MESOS-9597
https://issues.apache.org/jira/browse/MESOS-9597


Repository: mesos


Description
---

This patch moves the operation status update streams for operations
affecting agent default resources from the root of the metadata
directory to the agent's metadata directory, i.e.,
`meta/operations/` to
`meta/slaves/`.


Diffs (updated)
-

  src/slave/paths.hpp bddaef220b4c9ac7d7988664cd82828e32f5a0df 
  src/slave/paths.cpp 4ff9c5da05aef1496323daf269ab9b3215b0b31f 
  src/slave/slave.hpp 7ad495504e4ff144ac31812fbd4a3a1f4da86f02 
  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/70044/diff/2/

Changes: https://reviews.apache.org/r/70044/diff/1-2/


Testing
---

Tests still pass on GNU/Linux + quite some manual testing.


Thanks,

Gastón Kleiman



Re: Review Request 70044: Moved status update streams of operations on agent's default resources.

2019-02-26 Thread Gastón Kleiman


> On Feb. 26, 2019, 10:46 a.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Lines 7452 (patched)
> > <https://reviews.apache.org/r/70044/diff/1/?file=2126445#file2126445line7461>
> >
> > What's the reason for moving this?

I had to change the method's signature in order to be able to be able to 
short-circuit the recovery process by returning `Nothing()` (and not doing 
anything) if there is no checkpointed state.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70044/#review213205
---


On Feb. 22, 2019, 5:34 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70044/
> ---
> 
> (Updated Feb. 22, 2019, 5:34 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-9597
> https://issues.apache.org/jira/browse/MESOS-9597
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch moves the operation status update streams for operations
> affecting agent default resources from the root of the metadata
> directory to the agent's metadata directory, i.e.,
> `meta/operations/` to
> `meta/slaves/`.
> 
> 
> Diffs
> -
> 
>   src/slave/paths.hpp bddaef220b4c9ac7d7988664cd82828e32f5a0df 
>   src/slave/paths.cpp 4ff9c5da05aef1496323daf269ab9b3215b0b31f 
>   src/slave/slave.hpp 7ad495504e4ff144ac31812fbd4a3a1f4da86f02 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/70044/diff/1/
> 
> 
> Testing
> ---
> 
> Tests still pass on GNU/Linux + quite some manual testing.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Review Request 70044: Moved status update streams of operations on agent's default resources.

2019-02-22 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70044/
---

Review request for mesos, Benno Evers, Greg Mann, and Joseph Wu.


Bugs: MESOS-9597
https://issues.apache.org/jira/browse/MESOS-9597


Repository: mesos


Description
---

This patch moves the operation status update streams for operations
affecting agent default resources from the root of the metadata
directory to the agent's metadata directory, i.e.,
`meta/operations/` to
`meta/slaves/`.


Diffs
-

  src/slave/paths.hpp bddaef220b4c9ac7d7988664cd82828e32f5a0df 
  src/slave/paths.cpp 4ff9c5da05aef1496323daf269ab9b3215b0b31f 
  src/slave/slave.hpp 7ad495504e4ff144ac31812fbd4a3a1f4da86f02 
  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/70044/diff/1/


Testing
---


Thanks,

Gastón Kleiman



Re: Review Request 70044: Moved status update streams of operations on agent's default resources.

2019-02-22 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70044/
---

(Updated Feb. 22, 2019, 5:34 p.m.)


Review request for mesos, Benno Evers, Greg Mann, and Joseph Wu.


Bugs: MESOS-9597
https://issues.apache.org/jira/browse/MESOS-9597


Repository: mesos


Description
---

This patch moves the operation status update streams for operations
affecting agent default resources from the root of the metadata
directory to the agent's metadata directory, i.e.,
`meta/operations/` to
`meta/slaves/`.


Diffs
-

  src/slave/paths.hpp bddaef220b4c9ac7d7988664cd82828e32f5a0df 
  src/slave/paths.cpp 4ff9c5da05aef1496323daf269ab9b3215b0b31f 
  src/slave/slave.hpp 7ad495504e4ff144ac31812fbd4a3a1f4da86f02 
  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/70044/diff/1/


Testing (updated)
---

Tests still pass on GNU/Linux + quite some manual testing.


Thanks,

Gastón Kleiman



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-22 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/
---

(Updated Feb. 22, 2019, 5:28 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Removed unnecessary checkpointing (`Slave::removeOperation()` already 
checkpoints).


Bugs: MESOS-9574
https://issues.apache.org/jira/browse/MESOS-9574


Repository: mesos


Description
---

Make the agent garbage collect an operation status update stream once
a terminal status update is acknowledged.

This patch also improves the logging of acknowledgement failures and the
readability of the `Slave::operationStatusAcknowledgement` method.


Diffs (updated)
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69978/diff/5/

Changes: https://reviews.apache.org/r/69978/diff/4-5/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-21 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/
---

(Updated Feb. 21, 2019, 5:56 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Addressed feedback.


Bugs: MESOS-9574
https://issues.apache.org/jira/browse/MESOS-9574


Repository: mesos


Description
---

Make the agent garbage collect an operation status update stream once
a terminal status update is acknowledged.

This patch also improves the logging of acknowledgement failures and the
readability of the `Slave::operationStatusAcknowledgement` method.


Diffs (updated)
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69978/diff/4/

Changes: https://reviews.apache.org/r/69978/diff/3-4/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69977: Improved agent operation recovery process.

2019-02-21 Thread Gastón Kleiman


> On Feb. 21, 2019, 5:19 p.m., Joseph Wu wrote:
> > src/slave/slave.cpp
> > Lines 7337 (patched)
> > <https://reviews.apache.org/r/69977/diff/2/?file=2125997#file2125997line7341>
> >
> > The SLRP uses this method like this:
> > ```
> > Try> operationPaths =
> >   slave::paths::getOperationPaths(
> >   slave::paths::getResourceProviderPath(
> >   metaDir, slaveId, info.type(), info.name(), info.id()));
> > ```
> > 
> > Which results in a directory tree (taken from `slave/paths.hpp`) like:
> > ```
> > //   |   |-- slaves
> > //   |   |-- latest (symlink)
> > //   |   |-- 
> > //   |   |-- slave.info
> > //   |   |-- resource_providers
> > //   |   |   |-- 
> > //   |   |   |-- 
> > //   |   |   |-- latest (symlink)
> > //   |   |   |-- 
> > //   |   |   |-- resource_provider.state
> > //   |   |   |-- operations
> > //   |   |   |-- 
> > //   |   |   |-- operation.updates
> > ```
> > 
> > But this usage suggests the existence of another directory structure 
> > like:
> > ```
> > //   |   |-- slaves
> > //   |   |-- latest (symlink)
> > //   |   |-- 
> > //   |   |-- slave.info
> > //   |   |-- resource_providers
> > //   |   |   |-- ...
> > //   |   |-- operations
> > //   |   |   |-- 
> > //   |   |   |-- operation.updates
> > ```
> > 
> > This is presumably created by the agent's 
> > `operationStatusUpdateManager`.  But `slave/paths.hpp` does not have the 
> > updated directory structure.  It would also be helpful to note if there is 
> > or is not any duplication between these two directories.

I noticed yesterday that the streams that correspond to operations affecting 
agent's default resouces are currently created under `meta/operations/` instead 
of under `meta/slaves//operations/`.

After discussing it with Greg, we both agreed that the operations should be 
stored in the latter directory.

I will follow up with a patch that changes the place in which streams are 
stored; that patch will also update the ascii drawing in `slave/paths.hpp`: 
https://issues.apache.org/jira/browse/MESOS-9597


> On Feb. 21, 2019, 5:19 p.m., Joseph Wu wrote:
> > src/slave/slave.cpp
> > Lines 7358-7359 (patched)
> > <https://reviews.apache.org/r/69977/diff/2/?file=2125997#file2125997line7362>
> >
> > It would be nice to put the explanation from your commit message:
> > ```
> > This prevents the agent from asking the operation status update manager
> > to recover streams that haven't been created yet.
> > ```
> > Into this logic block somewhere.  i.e.
> > ```
> > // NOTE: Operations found in the `ResourceState`, but not in the
> > // operation update streams directory are streams to be created.
> > // These operations have nothing to recover.
> > ```

I agree, so I added a long comment at the beginning of the block that generates 
the list of IDs to recover.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69977/#review213064
---


On Feb. 21, 2019, 5:52 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69977/
> ---
> 
> (Updated Feb. 21, 2019, 5:52 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8054
> https://issues.apache.org/jira/browse/MESOS-8054
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch makes the agent walk the operation status update streams
> directories in order to generate the list of streams to recover, instead
> of generating it from the checkpointed `ResourceState` message.
> 
> This prevents the agent from asking the operation status update manager
> to recover streams that haven't been created yet.
> 
> The patch also makes the agent garbage collect operation status update
> streams if no correspondng operation is present in the checkpointed
> state. This can happen after the agent fails over while processing the
> acknowledgement of a terminal operation status update.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/69977/diff/3/
> 
> 
> Testing
> ---
> 
> Manual testing + existing tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69977: Improved agent operation recovery process.

2019-02-21 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69977/
---

(Updated Feb. 21, 2019, 5:52 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Addresed feedback.


Bugs: MESOS-8054
https://issues.apache.org/jira/browse/MESOS-8054


Repository: mesos


Description
---

This patch makes the agent walk the operation status update streams
directories in order to generate the list of streams to recover, instead
of generating it from the checkpointed `ResourceState` message.

This prevents the agent from asking the operation status update manager
to recover streams that haven't been created yet.

The patch also makes the agent garbage collect operation status update
streams if no correspondng operation is present in the checkpointed
state. This can happen after the agent fails over while processing the
acknowledgement of a terminal operation status update.


Diffs (updated)
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69977/diff/3/

Changes: https://reviews.apache.org/r/69977/diff/2-3/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69960: Added the concept of "orphaned operations" to the master.

2019-02-21 Thread Gastón Kleiman


> On Feb. 20, 2019, 1:22 p.m., Greg Mann wrote:
> > src/master/master.hpp
> > Lines 156 (patched)
> > <https://reviews.apache.org/r/69960/diff/3/?file=2125868#file2125868line156>
> >
> > s/Marks a non-terminal, non-speculative/Marks a non-speculative/

I think this method will also be called in the case that the agent reregisters 
with a pending operation for a framework that might not be torn down but isn't 
known by the current master.

If I'm correct, then I think we should also mention that case here.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69960/#review212972
---


On Feb. 19, 2019, 4:45 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69960/
> ---
> 
> (Updated Feb. 19, 2019, 4:45 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
> https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> An orphaned operation is a non-terminal, non-speculative operation whose
> originating framework has been torn down.  These operations will
> consume resources until they are terminated, but will have no entry
> in the allocator because their associated framework no longer exists.
> 
> To account for resources used by orphaned operations, the operation's
> resources are removed from the agent's total resources upon being
> orphaned.
> 
> This commit handles one of the two possible code paths which can
> introduce orphaned operations.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp ccd117f607747d49e5259d9ba6645fed61811adf 
>   src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 
> 
> 
> Diff: https://reviews.apache.org/r/69960/diff/3/
> 
> 
> Testing
> ---
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-21 Thread Gastón Kleiman


> On Feb. 20, 2019, 6:13 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Lines 7452 (patched)
> > <https://reviews.apache.org/r/69978/diff/3/?file=2126000#file2126000line7457>
> >
> > Is it possible that the stream could still be present on disk, but the 
> > operation would not be found in the checkpoint file?
> > 
> > Since we use the directories found on disk to recover the SUM but we 
> > use the recovered operation state when calling `addOperation()`, perhaps 
> > this call to `completedOperations.push_back(uuid);` should be moved outside 
> > of this conditional, so that we GC the stream regardless of whether or not 
> > the operation was present in the agent's checkpoint file?

This RR depends on https://reviews.apache.org/r/69977/ which makes 
`Slave::_recoverOperations()` handle that case.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/#review213008
-----------


On Feb. 20, 2019, 4:43 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69978/
> ---
> 
> (Updated Feb. 20, 2019, 4:43 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9574
> https://issues.apache.org/jira/browse/MESOS-9574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Make the agent garbage collect an operation status update stream once
> a terminal status update is acknowledged.
> 
> This patch also improves the logging of acknowledgement failures and the
> readability of the `Slave::operationStatusAcknowledgement` method.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/69978/diff/3/
> 
> 
> Testing
> ---
> 
> Manual testing + existing tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-20 Thread Gastón Kleiman


> On Feb. 15, 2019, 3:50 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Line 4711 (original), 4735 (patched)
> > <https://reviews.apache.org/r/69978/diff/2/?file=2125182#file2125182line4735>
> >
> > By moving this line into the `.then()` continuation, we will no longer 
> > remove the operation if there is an error when calling 
> > `operationStatusUpdateManager.acknowledgement()`; is that what we want? 
> > Perhaps we should add a `removeOperation()` call in the `err()` helper as 
> > well?
> 
> Gastón Kleiman wrote:
> I think the patch is good as-is — if we add a removeOperation() call in 
> the err() helper, the framework won't be able to ack the update in the case 
> that there is an error calling operationStatusUpdateManager.acknowledgement() 
> which prevents the SUM from checkpointing the ack and cleaning up the stream, 
> causing it to resend the status update.
> 
> 
> That means that the agent would keep resending the update until the 
> stream is garbage collected the next time the agent recovers.
> 
> 
> 
> Your comment however made me notice another corner case that the SLRP 
> handles and I had missed: the agent could fail over right before executing 
> this lambda. It should notice that during recovery and garbage collect the 
> stream — I'm adding a new patch to this chain that does this.

Note: I ended up squashing the new patch with this one.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/#review212881
---


On Feb. 20, 2019, 4:43 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69978/
> ---
> 
> (Updated Feb. 20, 2019, 4:43 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9574
> https://issues.apache.org/jira/browse/MESOS-9574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Make the agent garbage collect an operation status update stream once
> a terminal status update is acknowledged.
> 
> This patch also improves the logging of acknowledgement failures and the
> readability of the `Slave::operationStatusAcknowledgement` method.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/69978/diff/3/
> 
> 
> Testing
> ---
> 
> Manual testing + existing tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-20 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/
---

(Updated Feb. 20, 2019, 4:43 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Handled a missed case.


Bugs: MESOS-9574
https://issues.apache.org/jira/browse/MESOS-9574


Repository: mesos


Description
---

Make the agent garbage collect an operation status update stream once
a terminal status update is acknowledged.

This patch also improves the logging of acknowledgement failures and the
readability of the `Slave::operationStatusAcknowledgement` method.


Diffs (updated)
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69978/diff/3/

Changes: https://reviews.apache.org/r/69978/diff/2-3/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-20 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/#review212999
---




src/slave/slave.cpp
Line 4711 (original), 4735 (patched)
<https://reviews.apache.org/r/69978/#comment298854>

I think the patch is good as-is — if we add a `removeOperation()` call in 
the `err()` helper, the framework won't be able to ack the update in the case 
that there is an error calling `operationStatusUpdateManager.acknowledgement()` 
which prevents the SUM from checkpointing the ack and cleaning up the stream, 
causing it to resend the status update.

That means that the agent would keep resending the update until the stream 
is garbage collected the next time the agent recovers.

Your comment however made me notice another corner case that the SLRP 
handles and I had missed: the agent could fail over right before executing this 
lambda. It should notice that during recovery and garbage collect the stream — 
I'm adding a new patch to this chain that does this.


- Gastón Kleiman


On Feb. 13, 2019, 3:15 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69978/
> ---
> 
> (Updated Feb. 13, 2019, 3:15 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9574
> https://issues.apache.org/jira/browse/MESOS-9574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Make the agent garbage collect an operation status update stream once
> a terminal status update is acknowledged.
> 
> This patch also improves the logging of acknowledgement failures and the
> readability of the `Slave::operationStatusAcknowledgement` method.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/69978/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing + existing tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-20 Thread Gastón Kleiman


> On Feb. 15, 2019, 3:50 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Line 4711 (original), 4735 (patched)
> > <https://reviews.apache.org/r/69978/diff/2/?file=2125182#file2125182line4735>
> >
> > By moving this line into the `.then()` continuation, we will no longer 
> > remove the operation if there is an error when calling 
> > `operationStatusUpdateManager.acknowledgement()`; is that what we want? 
> > Perhaps we should add a `removeOperation()` call in the `err()` helper as 
> > well?

I think the patch is good as-is — if we add a removeOperation() call in the 
err() helper, the framework won't be able to ack the update in the case that 
there is an error calling operationStatusUpdateManager.acknowledgement() which 
prevents the SUM from checkpointing the ack and cleaning up the stream, causing 
it to resend the status update.


That means that the agent would keep resending the update until the stream is 
garbage collected the next time the agent recovers.


Your comment however made me notice another corner case that the SLRP handles 
and I had missed: the agent could fail over right before executing this lambda. 
It should notice that during recovery and garbage collect the stream — I'm 
adding a new patch to this chain that does this.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/#review212881
-------


On Feb. 13, 2019, 3:15 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69978/
> ---
> 
> (Updated Feb. 13, 2019, 3:15 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-9574
> https://issues.apache.org/jira/browse/MESOS-9574
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Make the agent garbage collect an operation status update stream once
> a terminal status update is acknowledged.
> 
> This patch also improves the logging of acknowledgement failures and the
> readability of the `Slave::operationStatusAcknowledgement` method.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 
> 
> 
> Diff: https://reviews.apache.org/r/69978/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing + existing tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Review Request 70028: Added garbage collection of terminated operation status streams.

2019-02-20 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70028/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-9574
https://issues.apache.org/jira/browse/MESOS-9574


Repository: mesos


Description
---

Added garbage collection of terminated operation status streams.


Diffs
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/70028/diff/1/


Testing
---

Tests are still passing.


Thanks,

Gastón Kleiman



Re: Review Request 69977: Improved agent operation recovery process.

2019-02-20 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69977/
---

(Updated Feb. 20, 2019, 2:58 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Fixed typo and whitspace.


Bugs: MESOS-8054
https://issues.apache.org/jira/browse/MESOS-8054


Repository: mesos


Description
---

This patch makes the agent walk the operation status update streams
directories in order to generate the list of streams to recover, instead
of generating it from the checkpointed `ResourceState` message.

This prevents the agent from asking the operation status update manager
to recover streams that haven't been created yet.

The patch also makes the agent garbage collect operation status update
streams if no correspondng operation is present in the checkpointed
state. This can happen after the agent fails over while processing the
acknowledgement of a terminal operation status update.


Diffs (updated)
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69977/diff/2/

Changes: https://reviews.apache.org/r/69977/diff/1-2/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69957: Updated master operation handling to account for new agent capability.

2019-02-15 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69957/#review212883
---


Ship it!




Ship It!

- Gastón Kleiman


On Feb. 13, 2019, 8:09 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69957/
> ---
> 
> (Updated Feb. 13, 2019, 8:09 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Since the new AGENT_OPERATION_FEEDBACK capability has been
> made required for agent startup, we need to update the way
> the master handles validation of incoming operations and
> acknowledgement calls to account for cases where each of
> the AGENT_OPERATION_FEEDBACK and RESOURCE_PROVIDER
> capabilities are enabled/disabled.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp cf2210ec26642028d5e4fb7fc1841eb0a1ed3396 
> 
> 
> Diff: https://reviews.apache.org/r/69957/diff/4/
> 
> 
> Testing
> ---
> 
> Testing details at the end of this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69909: Tested unreachable task behavior on agent GC.

2019-02-14 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69909/#review212850
---


Fix it, then Ship it!





src/tests/partition_tests.cpp
Lines 3086-3089 (patched)
<https://reviews.apache.org/r/69909/#comment298734>

This is in the middle of the code related to the explicit reconciliation.

I would move it to line 3077.

Consider explaining a bit more on the comment:

``slave1` has been garbage collected, so the task shoudl have been removed 
from the master's unreachable tasks list`


- Gastón Kleiman


On Feb. 6, 2019, 8:41 a.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69909/
> ---
> 
> (Updated Feb. 6, 2019, 8:41 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman and Greg Mann.
> 
> 
> Bugs: MESOS-8887
> https://issues.apache.org/jira/browse/MESOS-8887
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated `PartitionTest, RegistryGcByCount` test. This test fails
> without the previous patch.
> 
> 
> Diffs
> -
> 
>   src/tests/partition_tests.cpp 15af47b7d4328861a7523e669c2afc6a6574166b 
> 
> 
> Diff: https://reviews.apache.org/r/69909/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 69908: Removed unreachable tasks from `Master::Framework` on agent GC.

2019-02-14 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69908/#review212849
---


Ship it!




Ship It!

- Gastón Kleiman


On Feb. 6, 2019, 8:38 a.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69908/
> ---
> 
> (Updated Feb. 6, 2019, 8:38 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman and Greg Mann.
> 
> 
> Bugs: MESOS-8887
> https://issues.apache.org/jira/browse/MESOS-8887
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Unreachable tasks are stored in `Slaves` and `Framework` structs of
> the master, but they were only being removed from the former when
> an unreachable agent is GCed from the registry. This patch fixes it
> so that the latter is also cleaned up.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 4c9ef2528d0ab74b565dcb03d05c5189d0aa0c0f 
> 
> 
> Diff: https://reviews.apache.org/r/69908/diff/1/
> 
> 
> Testing
> ---
> 
> make check (Updated test in next patch)
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 69907: Fixed variable names in `Master::_doRegistryGC()`.

2019-02-14 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69907/#review212848
---


Fix it, then Ship it!





src/master/master.cpp
Line 1799 (original), 1799 (patched)
<https://reviews.apache.org/r/69907/#comment298733>

Can you fix it here too?


- Gastón Kleiman


On Feb. 6, 2019, 8:35 a.m., Vinod Kone wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69907/
> ---
> 
> (Updated Feb. 6, 2019, 8:35 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Substituted `slave` with `slaveId` to be consistent with the code base.
> No functional changes.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 4c9ef2528d0ab74b565dcb03d05c5189d0aa0c0f 
> 
> 
> Diff: https://reviews.apache.org/r/69907/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>



Re: Review Request 69969: Updated docs related to agent capabilities.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69969/#review212814
---


Ship it!




Ship It!

- Gastón Kleiman


On Feb. 12, 2019, 6:46 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69969/
> ---
> 
> (Updated Feb. 12, 2019, 6:46 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Updated docs related to agent capabilities.
> 
> 
> Diffs
> -
> 
>   docs/configuration/agent.md a4015c409d00a4c117df6c869d5ba5072bcfe58e 
>   include/mesos/mesos.proto 9d5c2b53164adb44eb4a6f12a507e009bd607940 
>   include/mesos/v1/mesos.proto aef31eb5b8781182d3f42d935b12470b319027ed 
>   src/slave/flags.cpp 066b84f528b4c888dde399e0b5d5fe5531934de6 
> 
> 
> Diff: https://reviews.apache.org/r/69969/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69964: Added tests related to operation feedback agent capabilities.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69964/#review212813
---


Ship it!




Ship It!

- Gastón Kleiman


On Feb. 12, 2019, 6:39 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69964/
> ---
> 
> (Updated Feb. 12, 2019, 6:39 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> These tests verify that master and agent behavior is correct
> during a downgrade of an agent to one which does not have the
> AGENT_OPERATION_FEEDBACK capability, and when a new agent's
> capabilities are altered such that it no longer has the
> RESOURCE_PROVIDER capability set.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 9584fa38c6e9ea26b5b6937199fe4679325ef55e 
> 
> 
> Diff: https://reviews.apache.org/r/69964/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/
---

(Updated Feb. 13, 2019, 3:15 p.m.)


Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Changes
---

Fixed typos in the commit message.


Bugs: MESOS-9574
https://issues.apache.org/jira/browse/MESOS-9574


Repository: mesos


Description (updated)
---

Make the agent garbage collect an operation status update stream once
a terminal status update is acknowledged.

This patch also improves the logging of acknowledgement failures and the
readability of the `Slave::operationStatusAcknowledgement` method.


Diffs (updated)
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69978/diff/2/

Changes: https://reviews.apache.org/r/69978/diff/1-2/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Review Request 69978: Added garbage collection of terminated operations status update streams.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69978/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-9574
https://issues.apache.org/jira/browse/MESOS-9574


Repository: mesos


Description
---

Make the agent garbage collect an operation stauts update stream once
a terminal status update is acknowledged.

This patch also improves the logging of acknowledgmenet failures and the
readability of the `Slave::operationStatusAcknowledgement` method.


Diffs
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69978/diff/1/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Review Request 69977: Improved agent operation recovery process.

2019-02-13 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69977/
---

Review request for mesos, Chun-Hung Hsiao and Greg Mann.


Bugs: MESOS-8054
https://issues.apache.org/jira/browse/MESOS-8054


Repository: mesos


Description
---

This patch makes the agent walk the operation status update streams
directories in order to generate the list of streams to recover, instead
of generating it from the checkpointed `ResourceState` message.

This prevents the agent from asking the operation status update manager
to recover streams that haven't been created yet.

The patch also makes the agent garbage collect operation status update
streams if no correspondng operation is present in the checkpointed
state. This can happen after the agent fails over while processing the
acknowledgement of a terminal operation status update.


Diffs
-

  src/slave/slave.cpp e3c2c005d865b5c333e92e50e49ef398fe06ad79 


Diff: https://reviews.apache.org/r/69977/diff/1/


Testing
---

Manual testing + existing tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69964: Added tests related to operation feedback agent capabilities.

2019-02-12 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69964/#review212779
---




src/tests/master_tests.cpp
Lines 9753 (patched)
<https://reviews.apache.org/r/69964/#comment298687>

Can we check here that the framework doesn't get any operation status 
updates?


- Gastón Kleiman


On Feb. 12, 2019, 3:42 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69964/
> ---
> 
> (Updated Feb. 12, 2019, 3:42 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> These tests verify that master and agent behavior is correct
> during a downgrade of an agent to one which does not have the
> AGENT_OPERATION_FEEDBACK capability, and when a new agent's
> capabilities are altered such that it no longer has the
> RESOURCE_PROVIDER capability set.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 9584fa38c6e9ea26b5b6937199fe4679325ef55e 
> 
> 
> Diff: https://reviews.apache.org/r/69964/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69958: Made an agent capability required for agent startup.

2019-02-12 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69958/#review212778
---


Ship it!




Ship It!

- Gastón Kleiman


On Feb. 12, 2019, 1:53 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69958/
> ---
> 
> (Updated Feb. 12, 2019, 1:53 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the agent's flag validation code to
> make the AGENT_OPERATION_FEEDBACK capability required for
> agent startup.
> 
> 
> Diffs
> -
> 
>   src/slave/flags.cpp 066b84f528b4c888dde399e0b5d5fe5531934de6 
> 
> 
> Diff: https://reviews.apache.org/r/69958/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69964: Added tests related to operation feedback agent capabilities.

2019-02-12 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69964/#review212777
---


Fix it, then Ship it!





src/tests/master_tests.cpp
Lines 9558 (patched)
<https://reviews.apache.org/r/69964/#comment298686>

Should we do `GET_OPERATIONS` here and check the response?


- Gastón Kleiman


On Feb. 12, 2019, 3:42 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69964/
> ---
> 
> (Updated Feb. 12, 2019, 3:42 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> These tests verify that master and agent behavior is correct
> during a downgrade of an agent to one which does not have the
> AGENT_OPERATION_FEEDBACK capability, and when a new agent's
> capabilities are altered such that it no longer has the
> RESOURCE_PROVIDER capability set.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 9584fa38c6e9ea26b5b6937199fe4679325ef55e 
> 
> 
> Diff: https://reviews.apache.org/r/69964/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69957: Updated master operation handling to account for new agent capability.

2019-02-12 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69957/#review212772
---




src/master/master.cpp
Lines 4372-4375 (original), 4373-4376 (patched)
<https://reviews.apache.org/r/69957/#comment298681>

The semantics of these capabilities is not very intuitive; we should update 
our documentation with a good explanation.



src/master/master.cpp
Line 6419 (original), 6422-6423 (patched)
<https://reviews.apache.org/r/69957/#comment298682>

I think the following might be a bit more useful for operators:

```
LOG(WARNING)
  << "Cannot send operation status update acknowledgement for status "
  << statusUuid << " of operation '" << operationId << "'"
  << " of framework " << *framework << " to agent " << slaveId
  << " because the agent does not have the RESOURCE_PROVIDER"
  << " and AGENT_OPERATION_FEEDBACK capabilities";
```

If they get that, then it might be easier for them to debug missing 
capabilities.



src/master/master.cpp
Line 8725 (original)
<https://reviews.apache.org/r/69957/#comment298683>

Would the following assertion hold?

`CHECK(slave->capabilities.resourceProvider || 
slave->capabilities.agentOperationFeedback)`


- Gastón Kleiman


On Feb. 12, 2019, 3:39 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69957/
> ---
> 
> (Updated Feb. 12, 2019, 3:39 p.m.)
> 
> 
> Review request for mesos and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Since the new AGENT_OPERATION_FEEDBACK capability has been
> made required for agent startup, we need to update the way
> the master handles validation of incoming operations and
> acknowledgement calls to account for cases where each of
> the AGENT_OPERATION_FEEDBACK and RESOURCE_PROVIDER
> capabilities are enabled/disabled.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp cf2210ec26642028d5e4fb7fc1841eb0a1ed3396 
> 
> 
> Diff: https://reviews.apache.org/r/69957/diff/2/
> 
> 
> Testing
> ---
> 
> Testing details at the end of this chain.
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69911: Added tests for reconciliation of operations on agent default resources.

2019-02-08 Thread Gastón Kleiman


> On Feb. 7, 2019, 4:36 p.m., Greg Mann wrote:
> > src/tests/operation_reconciliation_tests.cpp
> > Lines 1383 (patched)
> > <https://reviews.apache.org/r/69911/diff/2/?file=2124482#file2124482line1383>
> >
> > Could you explain in this comment why the response contains no 
> > operations?

Good catch! I swear I had added this, but it either got lost when squashing the 
commits or I'm starting to imagine things ;-).


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69911/#review212654
-------


On Feb. 8, 2019, 3:12 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69911/
> ---
> 
> (Updated Feb. 8, 2019, 3:12 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9473
> https://issues.apache.org/jira/browse/MESOS-9473
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added tests for reconciliation of operations on agent default resources.
> 
> 
> Diffs
> -
> 
>   src/tests/operation_reconciliation_tests.cpp 
> 9a1fc95752c8ce7c7358561bda3659a76871f5c3 
> 
> 
> Diff: https://reviews.apache.org/r/69911/diff/3/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*OperationReconciliationTest*" 
> --gtest_repeat=5000 --gtest_break_on_failure` passed
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69911: Added tests for reconciliation of operations on agent default resources.

2019-02-08 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69911/
---

(Updated Feb. 8, 2019, 3:12 p.m.)


Review request for mesos, Greg Mann and Joseph Wu.


Changes
---

Added comment.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added tests for reconciliation of operations on agent default resources.


Diffs (updated)
-

  src/tests/operation_reconciliation_tests.cpp 
9a1fc95752c8ce7c7358561bda3659a76871f5c3 


Diff: https://reviews.apache.org/r/69911/diff/3/

Changes: https://reviews.apache.org/r/69911/diff/2-3/


Testing
---

`bin/mesos-tests.sh --gtest_filter="*OperationReconciliationTest*" 
--gtest_repeat=5000 --gtest_break_on_failure` passed


Thanks,

Gastón Kleiman



Re: Review Request 69910: Added tests for feedback for operations on agent default resources.

2019-02-08 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69910/
---

(Updated Feb. 8, 2019, 1:27 p.m.)


Review request for mesos, Greg Mann and Joseph Wu.


Changes
---

Fixed compilation issue with clang 3.9.x


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added tests for feedback for operations on agent default resources.


Diffs (updated)
-

  src/Makefile.am 3f7daf2cca63c1b9c9e78264f241892327741aa0 
  src/tests/CMakeLists.txt 42f820715ac43dc70a776f30783d9bc078ef99a5 
  src/tests/agent_operation_feedback_tests.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/69910/diff/3/

Changes: https://reviews.apache.org/r/69910/diff/2-3/


Testing
---

`bin/mesos-tests.sh--gtest_filter="*AgentOperationFeedbackTest*" 
--gtest_repeat=5000 --gtest_break_on_failure`


Thanks,

Gastón Kleiman



Re: Review Request 69910: Added tests for feedback for operations on agent default resources.

2019-02-07 Thread Gastón Kleiman


> On Feb. 7, 2019, 4:32 p.m., Greg Mann wrote:
> > src/tests/agent_operation_feedback_tests.cpp
> > Lines 179 (patched)
> > <https://reviews.apache.org/r/69910/diff/2/?file=2124478#file2124478line179>
> >
> > We should probably advance the clock by a multiple of this, like Benno 
> > was doing in his test. That way, if the acknowledgement is not processed 
> > correctly, we would be sure to hit the retry, since the agent backs off as 
> > it retries.

Benno's test advances the clock by a multiple of the minimum interval, this 
test advances it by the maximum interval. I manually verified before posting 
the updated patch that commenting out the ack call triggers another retry and 
makes the test fail.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69910/#review212651
-------


On Feb. 7, 2019, 1:53 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69910/
> ---
> 
> (Updated Feb. 7, 2019, 1:53 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9473
> https://issues.apache.org/jira/browse/MESOS-9473
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added tests for feedback for operations on agent default resources.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 3f7daf2cca63c1b9c9e78264f241892327741aa0 
>   src/tests/CMakeLists.txt 42f820715ac43dc70a776f30783d9bc078ef99a5 
>   src/tests/agent_operation_feedback_tests.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69910/diff/2/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh--gtest_filter="*AgentOperationFeedbackTest*" 
> --gtest_repeat=5000 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69913: Added more operation reconciliation tests.

2019-02-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69913/
---

(Updated Feb. 8, 2019, 12:33 a.m.)


Review request for mesos, Greg Mann and Joseph Wu.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added more operation reconciliation tests.


Diffs
-

  src/tests/operation_reconciliation_tests.cpp 
9a1fc95752c8ce7c7358561bda3659a76871f5c3 


Diff: https://reviews.apache.org/r/69913/diff/1/


Testing
---

`GLOG_v=1 bin/mesos-tests.sh --verbose 
--gtest_filter="*OperationReconciliationTest*"` passed


Thanks,

Gastón Kleiman



Re: Review Request 69922: Added test to verify that frameworks receive OPERATION_DROPPED updates.

2019-02-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69922/#review212653
---


Fix it, then Ship it!





src/tests/agent_operation_feedback_tests.cpp
Lines 564 (patched)
<https://reviews.apache.org/r/69922/#comment298500>

Can we also check that the agent ID is set and that the RP ID isn't?


- Gastón Kleiman


On Feb. 7, 2019, 3:23 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69922/
> ---
> 
> (Updated Feb. 7, 2019, 3:23 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman and Joseph Wu.
> 
> 
> Bugs: MESOS-9538
> https://issues.apache.org/jira/browse/MESOS-9538
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds the new test
> `AgentOperationFeedbackTest.DroppedOperationStatusUpdate`
> which verifies that when an operation is dropped on its
> way to the agent, a framework will receive an
> OPERATION_DROPPED status update if it requested feedback
> for that operation.
> 
> 
> Diffs
> -
> 
>   src/tests/agent_operation_feedback_tests.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69922/diff/1/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*DroppedOperationStatusUpdate*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Review Request 69920: Removed `MasterAPITest.OperationFeedbackOnAgentDefaultResources`.

2019-02-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69920/
---

Review request for mesos, Benno Evers, Greg Mann, and Joseph Wu.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

This patch removes a `MasterAPITest`, because the new test suite
`AgentOperationFeedbackTest` already covers the scenario from the
original test.


Diffs
-

  src/tests/api_tests.cpp 428e14ef2e6b9f0e622a6ce05c469dd0ed97d07e 


Diff: https://reviews.apache.org/r/69920/diff/1/


Testing
---


Thanks,

Gastón Kleiman



Review Request 69919: Added missing periods at the end of comments.

2019-02-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69919/
---

Review request for mesos, Greg Mann and Joseph Wu.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added missing periods at the end of comments.


Diffs
-

  src/tests/operation_reconciliation_tests.cpp 
9a1fc95752c8ce7c7358561bda3659a76871f5c3 


Diff: https://reviews.apache.org/r/69919/diff/1/


Testing
---


Thanks,

Gastón Kleiman



Re: Review Request 69910: Added tests for feedback for operations on agent default resources.

2019-02-07 Thread Gastón Kleiman


> On Feb. 7, 2019, 9:24 a.m., Greg Mann wrote:
> > I just realized this test is doing the same thing as 
> > `MasterAPITest.OperationFeedbackOnAgentDefaultResources`, maybe we should 
> > just move that one to your new file? Or remove that one and retain this one?

Removing that test in https://reviews.apache.org/r/69920/.


> On Feb. 7, 2019, 9:24 a.m., Greg Mann wrote:
> > src/tests/agent_operation_feedback_tests.cpp
> > Lines 156 (patched)
> > <https://reviews.apache.org/r/69910/diff/1/?file=2124147#file2124147line156>
> >
> > Is this necessary?

The test passes without it, so it is not necessary =).


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69910/#review212631
-----------


On Feb. 6, 2019, 1:02 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69910/
> ---
> 
> (Updated Feb. 6, 2019, 1:02 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9473
> https://issues.apache.org/jira/browse/MESOS-9473
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added tests for feedback for operations on agent default resources.
> 
> 
> Diffs
> -
> 
>   src/Makefile.am 3f7daf2cca63c1b9c9e78264f241892327741aa0 
>   src/tests/CMakeLists.txt 42f820715ac43dc70a776f30783d9bc078ef99a5 
>   src/tests/agent_operation_feedback_tests.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69910/diff/1/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh--gtest_filter="*AgentOperationFeedbackTest*" 
> --gtest_repeat=5000 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69911: Added tests for reconciliation of operations on agent default resources.

2019-02-07 Thread Gastón Kleiman


> On Feb. 7, 2019, 10:09 a.m., Greg Mann wrote:
> > src/tests/operation_reconciliation_tests.cpp
> > Lines 945 (patched)
> > <https://reviews.apache.org/r/69911/diff/1/?file=2124148#file2124148line945>
> >
> > Use `GetParam()` here?

Already done in https://reviews.apache.org/r/69913/. I squashed both patches.


> On Feb. 7, 2019, 10:09 a.m., Greg Mann wrote:
> > src/tests/operation_reconciliation_tests.cpp
> > Lines 984-985 (patched)
> > <https://reviews.apache.org/r/69911/diff/1/?file=2124148#file2124148line984>
> >
> > Let's also expect that the resource provider ID is unset in the update.

Done here and elsewhere.


> On Feb. 7, 2019, 10:09 a.m., Greg Mann wrote:
> > src/tests/operation_reconciliation_tests.cpp
> > Lines 987 (patched)
> > <https://reviews.apache.org/r/69911/diff/1/?file=2124148#file2124148line987>
> >
> > Period at the end of comment, here and elsewhere.

Also fixed already existing comments here: https://reviews.apache.org/r/69919/


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69911/#review212632
---


On Feb. 7, 2019, 1:55 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69911/
> ---
> 
> (Updated Feb. 7, 2019, 1:55 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9473
> https://issues.apache.org/jira/browse/MESOS-9473
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added tests for reconciliation of operations on agent default resources.
> 
> 
> Diffs
> -
> 
>   src/tests/operation_reconciliation_tests.cpp 
> 9a1fc95752c8ce7c7358561bda3659a76871f5c3 
> 
> 
> Diff: https://reviews.apache.org/r/69911/diff/2/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh --gtest_filter="*OperationReconciliationTest*" 
> --gtest_repeat=5000 --gtest_break_on_failure` passed
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69913: Added more operation reconciliation tests.

2019-02-07 Thread Gastón Kleiman


> On Feb. 7, 2019, 10:32 a.m., Greg Mann wrote:
> > src/tests/operation_reconciliation_tests.cpp
> > Lines 943-945 (original), 943-947 (patched)
> > <https://reviews.apache.org/r/69913/diff/1/?file=2124157#file2124157line943>
> >
> > Ah, looks like this part of the diff belongs in a previous patch.

Squashed both patches together, see: https://reviews.apache.org/r/69911/


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69913/#review212635
-------


On Feb. 6, 2019, 2:04 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69913/
> ---
> 
> (Updated Feb. 6, 2019, 2:04 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9473
> https://issues.apache.org/jira/browse/MESOS-9473
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added more operation reconciliation tests.
> 
> 
> Diffs
> -
> 
>   src/tests/operation_reconciliation_tests.cpp 
> 9a1fc95752c8ce7c7358561bda3659a76871f5c3 
> 
> 
> Diff: https://reviews.apache.org/r/69913/diff/1/
> 
> 
> Testing
> ---
> 
> `GLOG_v=1 bin/mesos-tests.sh --verbose 
> --gtest_filter="*OperationReconciliationTest*"` passed
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69911: Added tests for reconciliation of operations on agent default resources.

2019-02-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69911/
---

(Updated Feb. 7, 2019, 1:55 p.m.)


Review request for mesos, Greg Mann and Joseph Wu.


Changes
---

Addressed feedback and the original patch squashed with the one posted on 
https://reviews.apache.org/r/69913/.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added tests for reconciliation of operations on agent default resources.


Diffs (updated)
-

  src/tests/operation_reconciliation_tests.cpp 
9a1fc95752c8ce7c7358561bda3659a76871f5c3 


Diff: https://reviews.apache.org/r/69911/diff/2/

Changes: https://reviews.apache.org/r/69911/diff/1-2/


Testing
---

`bin/mesos-tests.sh --gtest_filter="*OperationReconciliationTest*" 
--gtest_repeat=5000 --gtest_break_on_failure` passed


Thanks,

Gastón Kleiman



Re: Review Request 69912: Added agent/RP IDs to some operation updates generated by the master.

2019-02-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69912/
---

(Updated Feb. 7, 2019, 1:54 p.m.)


Review request for mesos, Greg Mann and Joseph Wu.


Changes
---

Rebased chain.


Bugs: MESOS-9559
https://issues.apache.org/jira/browse/MESOS-9559


Repository: mesos


Description
---

This patch makes the master include the agent and resource provider IDs
in the OPERATION_GONE_BY_OPERATOR and OPERATION_UNREACHABLE operation
status updates that it generates.


Diffs (updated)
-

  src/master/master.cpp 4c9ef2528d0ab74b565dcb03d05c5189d0aa0c0f 


Diff: https://reviews.apache.org/r/69912/diff/2/

Changes: https://reviews.apache.org/r/69912/diff/1-2/


Testing
---

`bin/mesos-tests.sh` passed


Thanks,

Gastón Kleiman



Re: Review Request 69910: Added tests for feedback for operations on agent default resources.

2019-02-07 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69910/
---

(Updated Feb. 7, 2019, 1:53 p.m.)


Review request for mesos, Greg Mann and Joseph Wu.


Changes
---

Addressed feedback and changed `STATUS_UPDATE_RETRY_INTERVAL_MIN` for 
`STATUS_UPDATE_RETRY_INTERVAL_MAX` in a test.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added tests for feedback for operations on agent default resources.


Diffs (updated)
-

  src/Makefile.am 3f7daf2cca63c1b9c9e78264f241892327741aa0 
  src/tests/CMakeLists.txt 42f820715ac43dc70a776f30783d9bc078ef99a5 
  src/tests/agent_operation_feedback_tests.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/69910/diff/2/

Changes: https://reviews.apache.org/r/69910/diff/1-2/


Testing
---

`bin/mesos-tests.sh--gtest_filter="*AgentOperationFeedbackTest*" 
--gtest_repeat=5000 --gtest_break_on_failure`


Thanks,

Gastón Kleiman



Re: Review Request 69913: Added more operation reconciliation tests.

2019-02-07 Thread Gastón Kleiman


> On Feb. 7, 2019, 10:32 a.m., Greg Mann wrote:
> > src/tests/operation_reconciliation_tests.cpp
> > Lines 1244 (patched)
> > <https://reviews.apache.org/r/69913/diff/1/?file=2124157#file2124157line1244>
> >
> > I think the detector isn't necessary in this test?

`StartSlave()` needs the detector to find the master.

Note that this is not a `StandAloneDetector`.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69913/#review212635
-------


On Feb. 6, 2019, 2:04 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69913/
> ---
> 
> (Updated Feb. 6, 2019, 2:04 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9473
> https://issues.apache.org/jira/browse/MESOS-9473
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added more operation reconciliation tests.
> 
> 
> Diffs
> -
> 
>   src/tests/operation_reconciliation_tests.cpp 
> 9a1fc95752c8ce7c7358561bda3659a76871f5c3 
> 
> 
> Diff: https://reviews.apache.org/r/69913/diff/1/
> 
> 
> Testing
> ---
> 
> `GLOG_v=1 bin/mesos-tests.sh --verbose 
> --gtest_filter="*OperationReconciliationTest*"` passed
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Re: Review Request 69912: Added agent/RP IDs to some operation updates generated by the master.

2019-02-07 Thread Gastón Kleiman


> On Feb. 7, 2019, 10:15 a.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 9089 (patched)
> > <https://reviews.apache.org/r/69912/diff/1/?file=2124156#file2124156line9089>
> >
> > s/Some(resourceProviderId.get())/resourceProviderId/

Note that `resourceProviderId` is a `Result` and not an `Option`, so the 
compiler won't like that:

```
../src/master/master.cpp:9088:11: error: no viable conversion from 
'Result' to 'const Option'
  resourceProviderId.isSome()
  ^~~
```


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69912/#review212634
-------


On Feb. 6, 2019, 2:04 p.m., Gastón Kleiman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69912/
> ---
> 
> (Updated Feb. 6, 2019, 2:04 p.m.)
> 
> 
> Review request for mesos, Greg Mann and Joseph Wu.
> 
> 
> Bugs: MESOS-9559
> https://issues.apache.org/jira/browse/MESOS-9559
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch makes the master include the agent and resource provider IDs
> in the OPERATION_GONE_BY_OPERATOR and OPERATION_UNREACHABLE operation
> status updates that it generates.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 4c9ef2528d0ab74b565dcb03d05c5189d0aa0c0f 
> 
> 
> Diff: https://reviews.apache.org/r/69912/diff/1/
> 
> 
> Testing
> ---
> 
> `bin/mesos-tests.sh` passed
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>



Review Request 69913: Added more operation reconciliation tests.

2019-02-06 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69913/
---

Review request for mesos, Greg Mann and Joseph Wu.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added more operation reconciliation tests.


Diffs
-

  src/tests/operation_reconciliation_tests.cpp 
9a1fc95752c8ce7c7358561bda3659a76871f5c3 


Diff: https://reviews.apache.org/r/69913/diff/1/


Testing
---

`GLOG_v=1 bin/mesos-tests.sh --verbose 
--gtest_filter="*OperationReconciliationTest*"` passed


Thanks,

Gastón Kleiman



Review Request 69912: Added agent/RP IDs to some operation updates generated by the master.

2019-02-06 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69912/
---

Review request for mesos, Greg Mann and Joseph Wu.


Bugs: MESOS-9559
https://issues.apache.org/jira/browse/MESOS-9559


Repository: mesos


Description
---

This patch makes the master include the agent and resource provider IDs
in the OPERATION_GONE_BY_OPERATOR and OPERATION_UNREACHABLE operation
status updates that it generates.


Diffs
-

  src/master/master.cpp 4c9ef2528d0ab74b565dcb03d05c5189d0aa0c0f 


Diff: https://reviews.apache.org/r/69912/diff/1/


Testing
---

`bin/mesos-tests.sh` passed


Thanks,

Gastón Kleiman



Review Request 69911: Added tests for reconciliation of operations on agent default resources.

2019-02-06 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69911/
---

Review request for mesos, Greg Mann and Joseph Wu.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added tests for reconciliation of operations on agent default resources.


Diffs
-

  src/tests/operation_reconciliation_tests.cpp 
9a1fc95752c8ce7c7358561bda3659a76871f5c3 


Diff: https://reviews.apache.org/r/69911/diff/1/


Testing
---

`bin/mesos-tests.sh --gtest_filter="*OperationReconciliationTest*" 
--gtest_repeat=5000 --gtest_break_on_failure` passed


Thanks,

Gastón Kleiman



Review Request 69910: Added tests for feedback for operations on agent default resources.

2019-02-06 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69910/
---

Review request for mesos, Greg Mann and Joseph Wu.


Bugs: MESOS-9473
https://issues.apache.org/jira/browse/MESOS-9473


Repository: mesos


Description
---

Added tests for feedback for operations on agent default resources.


Diffs
-

  src/Makefile.am 3f7daf2cca63c1b9c9e78264f241892327741aa0 
  src/tests/CMakeLists.txt 42f820715ac43dc70a776f30783d9bc078ef99a5 
  src/tests/agent_operation_feedback_tests.cpp PRE-CREATION 


Diff: https://reviews.apache.org/r/69910/diff/1/


Testing
---

`bin/mesos-tests.sh--gtest_filter="*AgentOperationFeedbackTest*" 
--gtest_repeat=5000 --gtest_break_on_failure`


Thanks,

Gastón Kleiman



Re: Review Request 69876: Removed operations from master state when an agent is downgraded.

2019-02-05 Thread Gastón Kleiman


> On Feb. 4, 2019, 4:28 p.m., Gastón Kleiman wrote:
> > src/tests/master_tests.cpp
> > Lines 9419 (patched)
> > <https://reviews.apache.org/r/69876/diff/1/?file=2123554#file2123554line9419>
> >
> > We should consider making the agent not recover the operation status 
> > update manager if it isn't started with the `AGENT_OPERATION_FEEDBACK` 
> > capability.

If we don't, we should not drop this message and make sure that the framework 
can acknowledge the update, so that the agent stops resending it.


- Gastón


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69876/#review212538
---


On Jan. 31, 2019, 3:08 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69876/
> ---
> 
> (Updated Jan. 31, 2019, 3:08 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Gastón 
> Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When an agent is downgraded from one with the AGENT_OPERATION_FEEDBACK
> capability to one without this capability, the master needs to remove
> terminal-but-unACKed operations from its state which operate on agent
> default resources, since the downgraded agent will not resend status
> updates for these operations.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp f74b7c280569e1c24e0940463bb28bd795d429d5 
>   src/tests/master_tests.cpp acc6096239e4992bdca084d0d644ab4a2385 
> 
> 
> Diff: https://reviews.apache.org/r/69876/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> `bin/mesos-tests.sh --gtest_filter="*CleanupOperationsAfterAgentDowngrade*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69876: Removed operations from master state when an agent is downgraded.

2019-02-04 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69876/#review212538
---


Fix it, then Ship it!





src/tests/master_tests.cpp
Lines 9419 (patched)
<https://reviews.apache.org/r/69876/#comment298353>

We should consider making the agent not recover the operation status update 
manager if it isn't started with the `AGENT_OPERATION_FEEDBACK` capability.


- Gastón Kleiman


On Jan. 31, 2019, 3:08 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69876/
> ---
> 
> (Updated Jan. 31, 2019, 3:08 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Gastón 
> Kleiman.
> 
> 
> Bugs: MESOS-9535
> https://issues.apache.org/jira/browse/MESOS-9535
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> When an agent is downgraded from one with the AGENT_OPERATION_FEEDBACK
> capability to one without this capability, the master needs to remove
> terminal-but-unACKed operations from its state which operate on agent
> default resources, since the downgraded agent will not resend status
> updates for these operations.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp f74b7c280569e1c24e0940463bb28bd795d429d5 
>   src/tests/master_tests.cpp acc6096239e4992bdca084d0d644ab4a2385 
> 
> 
> Diff: https://reviews.apache.org/r/69876/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> `bin/mesos-tests.sh --gtest_filter="*CleanupOperationsAfterAgentDowngrade*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69880: Added header comments for two master methods.

2019-02-04 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69880/#review212532
---


Fix it, then Ship it!





src/master/master.hpp
Lines 665 (patched)
<https://reviews.apache.org/r/69880/#comment298350>

Looking at 
https://github.com/apache/mesos/blob/9f6ccbd41a55846e54297ecb31fddbeee3be50c9/src/master/master.cpp#L6851-L6865
 I think that this method will be called only if the agent tries to register 
using an already known PID (IP + port).

`Master::_registerSlave()` has a comment with a bit more detail:

```
  // The slave was previously disconnected but it is now trying
  // to register as a new slave.
  // There are several possible reasons for this to happen:
  // - If the slave failed recovery and hence registering as a new
  //   slave before the master removed the old slave from its map.
  // - If the slave was shutting down while it had a registration
  //   retry scheduled. See MESOS-8463.
```


- Gastón Kleiman


On Feb. 4, 2019, 1:41 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69880/
> ---
> 
> (Updated Feb. 4, 2019, 1:41 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Joseph Wu, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added header comments for two master methods.
> 
> 
> Diffs
> -
> 
>   src/master/master.hpp 75f396a8dbf2046b76801ca044b62ac386f255df 
> 
> 
> Diff: https://reviews.apache.org/r/69880/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69871: Added more documentation for operation feedback.

2019-02-04 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69871/#review212528
---


Fix it, then Ship it!




LGTM!


docs/scheduler-http-api.md
Lines 600 (patched)
<https://reviews.apache.org/r/69871/#comment298345>

Nit: remove this extra line.


- Gastón Kleiman


On Jan. 31, 2019, 9:57 a.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69871/
> ---
> 
> (Updated Jan. 31, 2019, 9:57 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, and James 
> DeFelice.
> 
> 
> Bugs: MESOS-9477
> https://issues.apache.org/jira/browse/MESOS-9477
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added more documentation for operation feedback.
> 
> 
> Diffs
> -
> 
>   docs/high-availability-framework-guide.md 
> 1bf58e166c76af9ef9d7182d9d4e413bc02e253c 
>   docs/reconciliation.md bfb117c11969f97127ef2c55321191232814a1c2 
>   docs/scheduler-http-api.md 84f6187943b60d4472a4b7e0c667678ab055e482 
> 
> 
> Diff: https://reviews.apache.org/r/69871/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

2019-01-29 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
---

(Updated Jan. 29, 2019, 3:33 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Fixed indentation.


Bugs: MESOS-9356
https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
---

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/8/

Changes: https://reviews.apache.org/r/69795/diff/7-8/


Testing
---

Current tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69843: Track operations on agent default resources in master state.

2019-01-29 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69843/#review212429
---


Ship it!




Ship It!

- Gastón Kleiman


On Jan. 25, 2019, 6:22 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69843/
> ---
> 
> (Updated Jan. 25, 2019, 6:22 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Gastón 
> Kleiman.
> 
> 
> Bugs: MESOS-9471
> https://issues.apache.org/jira/browse/MESOS-9471
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch updates the master to add operations on agent
> default resources to its in-memory state upon receipt of
> UpdateSlaveMessages.
> 
> 
> Diffs
> -
> 
>   src/master/master.cpp 2e0a0de4d745112011370741b8cedbb6db17b915 
>   src/tests/master_tests.cpp 1f8da63aa313c4679c0d7a3934802af1474e8f28 
> 
> 
> Diff: https://reviews.apache.org/r/69843/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> `bin/mesos-tests.sh --gtest_filter="*OperationUpdateDuringFailover*" 
> --gtest_repeat=-1 --gtest_break_on_failure`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

2019-01-29 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
---

(Updated Jan. 29, 2019, 3:17 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Fixed bugs.


Bugs: MESOS-9356
https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
---

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/7/

Changes: https://reviews.apache.org/r/69795/diff/6-7/


Testing
---

Current tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69851: Added end-to-end test for operations affecting agent default resources.

2019-01-29 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69851/
---

(Updated Jan. 29, 2019, 1:56 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Addressed feedback.


Bugs: MESOS-9356
https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
---

Added end-to-end test for operations affecting agent default resources.


Diffs (updated)
-

  src/tests/slave_tests.cpp 9168e06348a83dcb20400a2fe0e3bb1f26e6ff1b 


Diff: https://reviews.apache.org/r/69851/diff/2/

Changes: https://reviews.apache.org/r/69851/diff/1-2/


Testing
---

`GLOG_v=1 bin/mesos-tests.sh --verbose 
--gtest_filter="*Slave*RetryOperationStatusUpdateAfterRecovery*" 
--gtest_repeat=5000 --gtest_break_on_failure` passed on CentOS 7.4.1708.


Thanks,

Gastón Kleiman



Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

2019-01-28 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
---

(Updated Jan. 28, 2019, 3:26 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Rebased.


Bugs: MESOS-9356
https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
---

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/6/

Changes: https://reviews.apache.org/r/69795/diff/5-6/


Testing
---

Current tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69794: Made agent checkpoint operations affecting default resources.

2019-01-28 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69794/
---

(Updated Jan. 28, 2019, 3:25 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Rebased.


Bugs: MESOS-9356
https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
---

This patch makes the agent atomically checkpoint resoures and operations
affecting agent default resources. This is needed in order to provide
operation feedback for operations affecting agent default resources.


Diffs (updated)
-

  src/slave/paths.hpp 73b94a8df932e40d4f266de01150bb17c2282f22 
  src/slave/paths.cpp 0ec50d052e29577818e828133a3c501cacf6f2e0 
  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/tests/slave_tests.cpp 9168e06348a83dcb20400a2fe0e3bb1f26e6ff1b 


Diff: https://reviews.apache.org/r/69794/diff/5/

Changes: https://reviews.apache.org/r/69794/diff/4-5/


Testing
---

Current tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69793: Added the `ResourceState` agent protobuf message.

2019-01-28 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69793/
---

(Updated Jan. 28, 2019, 3:25 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Rebased.


Bugs: MESOS-9356
https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
---

This protobuf message will be used by the agent to atomically checkpoint
resources and operations.


Diffs (updated)
-

  src/CMakeLists.txt 1564183b8c59d79ae378aecf7b6f1e25b9e9faf5 
  src/Makefile.am b8105c9f097d43a8ed8dff790d26dd9008c158ab 
  src/slave/state.proto PRE-CREATION 


Diff: https://reviews.apache.org/r/69793/diff/4/

Changes: https://reviews.apache.org/r/69793/diff/3-4/


Testing
---

Current tests still pass.

Compiled new proto with both cmake and autotools.


Thanks,

Gastón Kleiman



Re: Review Request 69790: Fixed a typo.

2019-01-28 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69790/
---

(Updated Jan. 28, 2019, 3:25 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Rebased.


Repository: mesos


Description
---

Fixed a typo.


Diffs (updated)
-

  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 


Diff: https://reviews.apache.org/r/69790/diff/2/

Changes: https://reviews.apache.org/r/69790/diff/1-2/


Testing
---


Thanks,

Gastón Kleiman



Review Request 69851: Added end-to-end test for operations affecting agent default resources.

2019-01-28 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69851/
---

Review request for mesos, Benno Evers and Greg Mann.


Bugs: MESOS-9356
https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
---

Added end-to-end test for operations affecting agent default resources.


Diffs
-

  src/tests/slave_tests.cpp 9168e06348a83dcb20400a2fe0e3bb1f26e6ff1b 


Diff: https://reviews.apache.org/r/69851/diff/1/


Testing
---

`GLOG_v=1 bin/mesos-tests.sh --verbose 
--gtest_filter="*Slave*RetryOperationStatusUpdateAfterRecovery*" 
--gtest_repeat=5000 --gtest_break_on_failure` passed on CentOS 7.4.1708.


Thanks,

Gastón Kleiman



Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

2019-01-25 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
---

(Updated Jan. 25, 2019, 4:57 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
---

Fixed nits.


Bugs: MESOS-9356
https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
---

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/5/

Changes: https://reviews.apache.org/r/69795/diff/4-5/


Testing
---

Current tests still pass.


Thanks,

Gastón Kleiman



Re: Review Request 69786: Added agent capability for operation feedback on default resources.

2019-01-25 Thread Gastón Kleiman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69786/#review212351
---


Ship it!




Ship It!

- Gastón Kleiman


On Jan. 17, 2019, 6:30 p.m., Greg Mann wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69786/
> ---
> 
> (Updated Jan. 17, 2019, 6:30 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Gastón Kleiman.
> 
> 
> Bugs: MESOS-9525
> https://issues.apache.org/jira/browse/MESOS-9525
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added agent capability for operation feedback on default resources.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto 2ef6ba3aef67cf34227569948fd3ddc8dfd5879d 
>   include/mesos/v1/mesos.proto 1a701da65f653fe4191f92ff1fb1436809b50acb 
>   src/common/protobuf_utils.hpp f5294710cbdbcca34a6468ad418938f2fe4e3d15 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441deb 
>   src/slave/constants.cpp 103384c1d75792357623f55fca03f0d798305339 
>   src/slave/flags.cpp 6bac8e1409f04d639204c45eda8a90c098e3dbd0 
>   src/tests/master_tests.cpp 1f8da63aa313c4679c0d7a3934802af1474e8f28 
>   src/tests/slave_tests.cpp dc711de96053ebecb6b71b98f14f22cb9b050c52 
> 
> 
> Diff: https://reviews.apache.org/r/69786/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Greg Mann
> 
>



  1   2   3   4   5   6   7   8   >