Re: Review Request 71414: Gracefully handled duplicated volumes from non-conforming CSI plugins.

2019-08-30 Thread Chun-Hung Hsiao


> On Aug. 30, 2019, 11:02 a.m., Benno Evers wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 1114 (original), 1135 (patched)
> > <https://reviews.apache.org/r/71414/diff/1/?file=2163192#file2163192line1142>
> >
> > It seems like this could be moved to the beginning of the loop, so we 
> > don't even have to create the raw disk resource at all in case of a 
> > duplicated volume id.

Yes I thought about that. I put it here just for printing out the warning 
message with the whole resource proto but it's probably not necessary.


> On Aug. 30, 2019, 11:02 a.m., Benno Evers wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 1145 (original), 1167 (patched)
> > <https://reviews.apache.org/r/71414/diff/1/?file=2163192#file2163192line1182>
> >
> > Probably the name doesnt quite match anymore, now that the conversion 
> > converts more than just metadata.

Good point. I'll have a follow up patch to change the name.


> On Aug. 30, 2019, 11:02 a.m., Benno Evers wrote:
> > src/tests/storage_local_resource_provider_tests.cpp
> > Lines 1379 (patched)
> > <https://reviews.apache.org/r/71414/diff/1/?file=2163193#file2163193line1379>
> >
> > Can this test be run when the clock is stopped? If not, can we document 
> > in a comment why not? Given the many moving parts here, I fear that there's 
> > a high chance this test will become flaky.

In general I usually lean towards stopping the clock only when necessary, and 
enforce as few synchronizations as possible. A more nodeterministic test would 
help revealing race problems in the code, but also might reveal race problems 
in the test (which doesn't worth the time to fix).


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71414/#review217508
---


On Aug. 30, 2019, 12:48 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71414/
> ---
> 
> (Updated Aug. 30, 2019, 12:48 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Jan Schlicht.
> 
> 
> Bugs: MESOS-9956
> https://issues.apache.org/jira/browse/MESOS-9956
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If the SLRP uses a plugin that does not conform to the CSI spec and
> reports duplicated volumes, the duplicate would be removed.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71414/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> The test would fail without the fix.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Review Request 71414: Gracefully handled duplicated volumes from non-conforming CSI plugins.

2019-08-29 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71414/
---

Review request for mesos, Benjamin Bannier and Jan Schlicht.


Bugs: MESOS-9956
https://issues.apache.org/jira/browse/MESOS-9956


Repository: mesos


Description
---

If the SLRP uses a plugin that does not conform to the CSI spec and
reports duplicated volumes, the duplicate would be removed.


Diffs
-

  src/resource_provider/storage/provider.cpp 
6d632606f411d3ca99d3573a57c9f68b02ba8072 
  src/tests/storage_local_resource_provider_tests.cpp 
69b59d48ceefebbb7accefe411c54ac5cecff1c3 


Diff: https://reviews.apache.org/r/71414/diff/1/


Testing
---

make check

The test would fail without the fix.


Thanks,

Chun-Hung Hsiao



Re: Review Request 71151: Performed periodic storage local provider reconciliations.

2019-08-19 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71151/#review217315
---


Fix it, then Ship it!





src/resource_provider/storage/provider.cpp
Lines 777 (patched)
<https://reviews.apache.org/r/71151/#comment304613>

Shouldn't `alwaysUpdate` be false here?



src/resource_provider/storage/provider.cpp
Line 753 (original), 820 (patched)
<https://reviews.apache.org/r/71151/#comment304616>

`resourceVersion` needs to be updated every time `UPDATE_STATE` is sent to 
avoid races. In the original implementation, since it is initialized to a 
random UUID in the constructor, we avoided assigning a new random value for the 
first `UPDATE_STATE` call.

In this new codepath, it's possible that we update the random UUID for the 
first `UPDATE_STATE` call, which is okay, just that the initial random UUID is 
wasted. If we don't care about this small overhead, I'd suggest that we move 
this line into the `sendResourceProviderStateUpdate` function.



src/resource_provider/storage/provider.cpp
Lines 760-775 (original), 827-842 (patched)
<https://reviews.apache.org/r/71151/#comment304614>

These should be be removed. L741-753 already handled the transition.



src/resource_provider/storage/provider.cpp
Line 779 (original), 846 (patched)
<https://reviews.apache.org/r/71151/#comment304615>

This should be removed. L739 resumed the SUM.


- Chun-Hung Hsiao


On Aug. 19, 2019, 11:58 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71151/
> ---
> 
> (Updated Aug. 19, 2019, 11:58 a.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Performed periodic storage local provider reconciliations.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71151/diff/6/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71151: Performed periodic storage local provider reconciliations.

2019-08-19 Thread Chun-Hung Hsiao


> On Aug. 9, 2019, 7:41 a.m., Chun-Hung Hsiao wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 752 (original), 811 (patched)
> > <https://reviews.apache.org/r/71151/diff/4/?file=2159963#file2159963line813>
> >
> > As commented in the previous patch, since we're having two functions, 
> > `reconcileStoragePools` and `reconcileResources`, it's alright to have a 
> > little bit of repeated code and don't introduce an extra function without a 
> > very focused purpose.
> > 
> > In `reconcileResources`, we could have the logic that checks for 
> > `alwaysUpdate`. In `reconcileStoragePools`, since it's only called when 
> > profiles are updated or volumes of a stale profile are deleted, there's no 
> > need for the extra `alwaysUpdate` logic there.
> 
> Benjamin Bannier wrote:
> I updated the code in question. I am still not sure I understood what 
> exactly you were after, could you have a look?

Since you went for merging `reconfileStoragePools` into `reconcileResources`, 
please ignore it :)


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71151/#review217143
---


On Aug. 19, 2019, 11:58 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71151/
> ---
> 
> (Updated Aug. 19, 2019, 11:58 a.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Performed periodic storage local provider reconciliations.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71151/diff/6/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71150: Factored out storage provider method to update resources.

2019-08-19 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71150/#review217316
---




src/resource_provider/storage/provider.cpp
Lines 722-723 (original), 729-739 (patched)
<https://reviews.apache.org/r/71150/#comment304617>

Do you think this extra flattern code makes the next continuation more 
readable? If not maybe let's just keep the original `foreach` loop to save an 
extra copy and an extra continuation?

Otherwise, please feel free to drop this. I already gave you another 
ship-it :)


- Chun-Hung Hsiao


On Aug. 19, 2019, 11:58 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71150/
> ---
> 
> (Updated Aug. 19, 2019, 11:58 a.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Factored out storage provider method to update resources.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71150/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71150: Factored out storage provider method to update resources.

2019-08-19 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71150/#review217314
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On Aug. 19, 2019, 11:58 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71150/
> ---
> 
> (Updated Aug. 19, 2019, 11:58 a.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Factored out storage provider method to update resources.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71150/diff/5/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71151: Performed periodic storage local provider reconciliations.

2019-08-09 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71151/#review217143
---


Fix it, then Ship it!





src/resource_provider/storage/provider.cpp
Line 735 (original), 796 (patched)
<https://reviews.apache.org/r/71151/#comment304408>

Let's check that the `reconciled` future is pending at the very beginning 
of this function, similar to what we do in `reconcileStoragePools`, because 
`getExistingVolumes` and `getStoragePools` must be called after a 
reconciliation starts.



src/resource_provider/storage/provider.cpp
Line 752 (original), 811 (patched)
<https://reviews.apache.org/r/71151/#comment304409>

As commented in the previous patch, since we're having two functions, 
`reconcileStoragePools` and `reconcileResources`, it's alright to have a little 
bit of repeated code and don't introduce an extra function without a very 
focused purpose.

In `reconcileResources`, we could have the logic that checks for 
`alwaysUpdate`. In `reconcileStoragePools`, since it's only called when 
profiles are updated or volumes of a stale profile are deleted, there's no need 
for the extra `alwaysUpdate` logic there.



src/tests/storage_local_resource_provider_tests.cpp
Lines 6743 (patched)
<https://reviews.apache.org/r/71151/#comment304411>

These are kinda ugly. Probably should find some time to refactor them in 
the future :)



src/tests/storage_local_resource_provider_tests.cpp
Lines 6871 (patched)
<https://reviews.apache.org/r/71151/#comment304412>

Let's do a `Clock::settle()` before `Clock::advance()` to ensure that the 
timer has been set up.



src/tests/storage_local_resource_provider_tests.cpp
Lines 6881 (patched)
<https://reviews.apache.org/r/71151/#comment304413>

Ditto.


- Chun-Hung Hsiao


On Aug. 6, 2019, 2:27 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71151/
> ---
> 
> (Updated Aug. 6, 2019, 2:27 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Performed periodic storage local provider reconciliations.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71151/diff/4/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71150: Factored out storage provider method to update resources.

2019-08-09 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71150/#review217142
---


Fix it, then Ship it!





src/resource_provider/storage/provider.cpp
Line 718 (original), 728 (patched)
<https://reviews.apache.org/r/71150/#comment304403>

Move `.then` to the next line.



src/resource_provider/storage/provider.cpp
Lines 746-750 (original), 739-749 (patched)
<https://reviews.apache.org/r/71150/#comment304401>

The order of applying the resource conversions in the inner vector is 
important, so if you prefer having a a flatterned vector, let's add a comment 
saying that the order is preserved when flatterning.



src/resource_provider/storage/provider.cpp
Lines 949-950 (original), 931-932 (patched)
<https://reviews.apache.org/r/71150/#comment304399>

It seems to me now that there's no need to hard-fail the SLRP here.

Also, let's establish a convention that the error handling is always 
handled in the top-level caller that doesn't propagate the failure, to avoid 
repeatitive log messages. So let's remove the `onFailed` and `onDiscarded`.



src/resource_provider/storage/provider.cpp
Lines 976 (patched)
<https://reviews.apache.org/r/71150/#comment304405>

The return type should be `Try` or simply `Nothing` since this is 
not an asynchoronus function and never returns an error.



src/resource_provider/storage/provider.cpp
Lines 979-995 (patched)
<https://reviews.apache.org/r/71150/#comment304404>

Sorry for not making my suggestion clear enough. I was actually thinking 
about removing `reconcileStoragePools()` and always calling 
`reconcileResources()` even when we only want to reconcile storage pools.

This suggestion makes more sense if we don't reconcile storage pools after 
destroying a MOUNT disk with a stale profile, as I suggested in the next patch. 
But if we want to keep this behavior, then this approach would introduce an 
extra `ListVolumes` call.

If you prefer to avoid having this extra grpc call, then adding this extra 
function seems only give us a little bit of code reuse, and I'm not sure if it 
worths adding an extra function name to the already-long list of functions in 
this class. I'd vote for keeping the code in its original place and avoid 
introducing a function name that doesn't convey its purpose clearly.



src/resource_provider/storage/provider.cpp
Line 1874 (original), 1902 (patched)
<https://reviews.apache.org/r/71150/#comment304402>

Let's print out log messages here instead:
```
auto err = [](const Resource& resource, const string& message) {
  LOG(ERROR)
<< "Failed to reconcile storage pools after resource '" << resource
<< "' has been freed: " << message;
};

reconciled = sequence.add(...)
  .onFailed(std::bind(err, resource, lambda::_1))
  .onDiscarded(std::bind(err, resource, "future discarded"));
```


- Chun-Hung Hsiao


On July 23, 2019, 8:18 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71150/
> ---
> 
> (Updated July 23, 2019, 8:18 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Factored out storage provider method to update resources.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
> 
> 
> Diff: https://reviews.apache.org/r/71150/diff/3/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71151: Performed periodic storage local provider reconciliations.

2019-08-02 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71151/#review217049
---




src/resource_provider/storage/provider.cpp
Lines 749-768 (patched)
<https://reviews.apache.org/r/71151/#comment304269>

```
loop(
  self(),
  std::bind(::after, reconciliationInterval),
  [this](const Nothing&) {
// Poll resource provider state ...
reconciled = sequence.add(defer(self(), ::reconcileResources));

return reconciled.then([](const Nothing&) -> ControlFlow {
  return Continue();
});
  });
```



src/resource_provider/storage/provider.cpp
Lines 756-760 (original), 801-821 (patched)
<https://reviews.apache.org/r/71151/#comment304272>

How about introduce a `bool alwaysUpdate` parameter in this function:
```
bool shouldUpdate = alwaysUpdate;

if (result != totalResources) {
  LOG(INFO) << "Removing"... ;
  
  // Update the resource version...
  totalResources = result;
  resourceVersion = id::UUID::random();
  checkpointResourceProviderState();
  
  shouldUpdate = true;
}

if (shouldUpdate) {
  sendResourceProviderStateUpdate();
}
```

Then, in `reconcileResourceProviderState()`:
```
  return reconcileOperationStatuses()
.then(defer(self(), ::reconcileResources, true))
.then(defer(self(), [this] {
  statusUpdateManager.resume();
  
  LOG(INFO)
<< "Resource provider " << info.id() << " is in READY state";

  state = READY;

  return Nothing();
}));
```
So we don't put the state transition logic inside this function?



src/resource_provider/storage/provider.cpp
Lines 1356 (patched)
<https://reviews.apache.org/r/71151/#comment304273>

The delay won't guarantee the happens-before order between the first 
reconciliation and the ones in `watchResources`. How about:

```
reconciled = reconcileResourceProviderState()
  .onReady(defer(self(), ::watchProfiles))
  .onReady(defer(self(), ::watchResources))
  .onFailed(...)
  .onDiscarded(...);



src/resource_provider/storage/provider.cpp
Lines 1869-1886 (original), 1933-1950 (patched)
<https://reviews.apache.org/r/71151/#comment304274>

With periodic reconciliation, this won't be necessary. Removing this would 
also reduce a source of reconciliation, so the reconciliation can only happen 
in either `watchProfiles` or `watchResources`.


- Chun-Hung Hsiao


On July 29, 2019, 8:56 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71151/
> -------
> 
> (Updated July 29, 2019, 8:56 a.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Performed periodic storage local provider reconciliations.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71151/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71150: Factored out storage provider method to update resources.

2019-08-02 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71150/#review216947
---



I'm not sure if this is


src/resource_provider/storage/provider.cpp
Lines 292 (patched)
<https://reviews.apache.org/r/71150/#comment304250>

Did you accidentally add a new line?



src/resource_provider/storage/provider.cpp
Lines 728 (patched)
<https://reviews.apache.org/r/71150/#comment304270>

How about merging this function and `reconcileStoragePools` since the logic 
is very similar?



src/resource_provider/storage/provider.cpp
Lines 730 (patched)
<https://reviews.apache.org/r/71150/#comment304182>

Remove this debug log.



src/resource_provider/storage/provider.cpp
Lines 735 (patched)
<https://reviews.apache.org/r/71150/#comment304183>

Remove this debug log.


- Chun-Hung Hsiao


On July 23, 2019, 8:18 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71150/
> ---
> 
> (Updated July 23, 2019, 8:18 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Factored out storage provider method to update resources.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
> 
> 
> Diff: https://reviews.apache.org/r/71150/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 70831: Fixed synchronization in MasterTest.UpdateSlaveMessageWithPendingOffers.

2019-08-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70831/#review217026
---




src/tests/master_tests.cpp
Lines 9014 (patched)
<https://reviews.apache.org/r/70831/#comment304251>

Alternatively, we can avoid this expectation and do the following:
```
Future providerId = resourceProvider->process->id();
AWAIT_READY(providerId);
```
Then s/`resourceProvider->process->info.id()`/`providerId.get()`/

I'm fine with either approach so please feel free to drop this.


- Chun-Hung Hsiao


On June 11, 2019, 12:38 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70831/
> ---
> 
> (Updated June 11, 2019, 12:38 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In order for a resource provider to have been assigned and stored an
> assigned resource provider ID it needs to receive and process the
> corresponding resource provider `SUBSCRIBED` event.
> 
> This patch fixes `MasterTest.UpdateSlaveMessageWithPendingOffers` so we
> await the correct event. The test was previously awaiting an
> `UpdateSlaveMessage` which is only tangentially related, but does not
> guarantee correct event ordering.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 097f1b77a59e29c6690210773d1556ebf2bb701e 
> 
> 
> Diff: https://reviews.apache.org/r/70831/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 70831: Fixed synchronization in MasterTest.UpdateSlaveMessageWithPendingOffers.

2019-08-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70831/#review217025
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On June 11, 2019, 12:38 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70831/
> ---
> 
> (Updated June 11, 2019, 12:38 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> In order for a resource provider to have been assigned and stored an
> assigned resource provider ID it needs to receive and process the
> corresponding resource provider `SUBSCRIBED` event.
> 
> This patch fixes `MasterTest.UpdateSlaveMessageWithPendingOffers` so we
> await the correct event. The test was previously awaiting an
> `UpdateSlaveMessage` which is only tangentially related, but does not
> guarantee correct event ordering.
> 
> 
> Diffs
> -
> 
>   src/tests/master_tests.cpp 097f1b77a59e29c6690210773d1556ebf2bb701e 
> 
> 
> Diff: https://reviews.apache.org/r/70831/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71148: Explicitly disabled periodic reconciliation for some provider tests.

2019-08-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71148/#review217024
---


Fix it, then Ship it!





src/tests/storage_local_resource_provider_tests.cpp
Line 330 (original), 330 (patched)
<https://reviews.apache.org/r/71148/#comment304249>

Fits in one line.


- Chun-Hung Hsiao


On July 29, 2019, 8:56 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71148/
> ---
> 
> (Updated July 29, 2019, 8:56 a.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Explicitly disabled periodic reconciliation for some provider tests.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71148/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71147: Update config factory to set resource provider reconciliation interval.

2019-08-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71147/#review217023
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On July 29, 2019, 8:56 a.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71147/
> ---
> 
> (Updated July 29, 2019, 8:56 a.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Update config factory to set resource provider reconciliation interval.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69b59d48ceefebbb7accefe411c54ac5cecff1c3 
> 
> 
> Diff: https://reviews.apache.org/r/71147/diff/2/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 70728: Backed `MockResourceProvider` by a process.

2019-07-30 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70728/#review216979
---


Fix it, then Ship it!





src/tests/mesos.hpp
Lines 3038 (patched)
<https://reviews.apache.org/r/70728/#comment304212>

It seems that I'm wrong about `Self`, which seems to become a type template 
in a class template. However, I've tested that we can simply use 
`MockResourceProviderProcess` (whitout any template argument) within this class 
without introducing a type alias. For example, use 
`::connectDefault`.

That said, please feel free to drop this if you think introducing the 
`Self` type alias make the code more consistent.



src/tests/mesos.hpp
Line 3142 (original), 3150 (patched)
<https://reviews.apache.org/r/70728/#comment304213>

s/since //



src/tests/mesos.hpp
Lines 3151 (patched)
<https://reviews.apache.org/r/70728/#comment304214>

This introduces a coupling beween this test and the implementation of the 
JWT secret generator being synchoronous.

How about the following instead:
```
process::Future> token = None();

#ifdef USE_SSL_SOCKET
...

token = secretGenerator.generate(principal)
  .then([](const Secret& secret) -> Option {
return secret.value().data();
  });

#endif

// TODO(bbannier): Remove the `shared_ptr` once we get C++14.
auto detector_ =
  std::make_shared>(std::move(detector));

token.then(defer(self(), [=](const Option& token) {
  driver.reset(new Driver(
  std::move(*detector_),
  ...,
  token));

  driver->start();
  
  return Nothing();
}));
```



src/tests/mesos.hpp
Lines 3302 (patched)
<https://reviews.apache.org/r/70728/#comment304216>

Do we use `FIXME` in the codebase?



src/tests/mesos.hpp
Lines 3312 (patched)
<https://reviews.apache.org/r/70728/#comment304215>

s/provider_id/providerId/ for naming consistency.



src/tests/mesos.hpp
Lines 3359 (patched)
<https://reviews.apache.org/r/70728/#comment304217>

No need to make it public anymore.



src/tests/mesos.hpp
Line 3350 (original), 3408 (patched)
<https://reviews.apache.org/r/70728/#comment304218>

How about s/Mock/Test/ here and below?

This is more of a personal naming opinion so please feel free to drop it.



src/tests/resource_provider_manager_tests.cpp
Lines 1203 (patched)
<https://reviews.apache.org/r/70728/#comment304219>

s/terminate/stop/ and remove the line break.



src/tests/resource_provider_manager_tests.cpp
Lines 1436 (patched)
<https://reviews.apache.org/r/70728/#comment304220>

Ditto.


- Chun-Hung Hsiao


On July 27, 2019, 7:04 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70728/
> ---
> 
> (Updated July 27, 2019, 7:04 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-9560
> https://issues.apache.org/jira/browse/MESOS-9560
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We were passing callbacks into `MockResourceProvider` to the HTTP
> driver. Since the lifecycle of the callbacks and the mock provider were
> decoupled and these callbacks were binding the mock provider instance
> the code was not safe as written as the driver could invoke the callback
> after the provider had been destructed.
> 
> This patch makes sure that the callbacks are defered to a process. We
> also dispatch a number of other functions which are strongly coupled to
> the lifecycle of the provider. We still do not hide the provider away
> completely so the provider can be mocked in tests.
> 
> 
> Diffs
> -
> 
>   src/tests/api_tests.cpp 641eb15153ddb85df322aa6a133ca8e4c6d6a510 
>   src/tests/master_slave_reconciliation_tests.cpp 
> 7b6ac50adacc8416b91c0dde55ff7ba46a20515c 
>   src/tests/master_tests.cpp b9ef13c31a9c3ae16e55d3ae8f9b1538a49cf49a 
>   src/tests/mesos.hpp 4612e2e3d2bd32590248df58b546de8756636964 
>   src/tests/operation_reconciliation_tests.cpp 
> eae318da2273faae904f0155e49bb23cdb24f007 
>   src/tests/resource_provider_manager_tests.cpp 
> 7d48f18e89f046df6c92e52edeef592acfb13b10 
>   src/tests/slave_tests.cpp abee107720d6b78bb017d2762431ae36c0679026 
> 
> 
> Diff: https://reviews.apache.org/r/70728/diff/7/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71149: Renamed a storage provider function.

2019-07-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71149/#review216878
---


Ship it!




You know what? I got stuck at this step trying to come up with a good name when 
I worked on this ticket ;) Thanks for coming up with this name!

- Chun-Hung Hsiao


On July 23, 2019, 8:18 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71149/
> ---
> 
> (Updated July 23, 2019, 8:18 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Renamed a storage provider function.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> 6d632606f411d3ca99d3573a57c9f68b02ba8072 
> 
> 
> Diff: https://reviews.apache.org/r/71149/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71148: Explicitly disabled periodic reconciliation for some provider tests.

2019-07-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71148/#review216877
---




src/tests/storage_local_resource_provider_tests.cpp
Line 846 (original), 846 (patched)
<https://reviews.apache.org/r/71148/#comment304121>

How about using zero here, as described in `mesos.proto`?

And since most tests require no reconciliation, maybe making zero the 
default in `setupResourceProviderConfig`?


- Chun-Hung Hsiao


On July 23, 2019, 8:18 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71148/
> ---
> 
> (Updated July 23, 2019, 8:18 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Explicitly disabled periodic reconciliation for some provider tests.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69861265d94ddf344da96b593797ce145394413e 
> 
> 
> Diff: https://reviews.apache.org/r/71148/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71147: Update config factory to set resource provider reconciliation interval.

2019-07-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71147/#review216876
---




src/tests/storage_local_resource_provider_tests.cpp
Lines 331 (patched)
<https://reviews.apache.org/r/71147/#comment304120>

I'm actually thinking about haveing a really short reconciliation interval 
for testing here, so certain tests won't wait for a long time. WDYT?

Or, maybe we can inject proper reconciliation intervals only in the tests 
that checks for reconciliations.


- Chun-Hung Hsiao


On July 23, 2019, 8:18 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71147/
> ---
> 
> (Updated July 23, 2019, 8:18 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Update config factory to set resource provider reconciliation interval.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69861265d94ddf344da96b593797ce145394413e 
> 
> 
> Diff: https://reviews.apache.org/r/71147/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71146: Clarified a comment in storage local resource provider tests.

2019-07-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71146/#review216875
---


Fix it, then Ship it!





src/tests/storage_local_resource_provider_tests.cpp
Line 615 (original), 615 (patched)
<https://reviews.apache.org/r/71146/#comment304119>

The RP "daemon" does not "get subscribed." The RP daemon is started after 
agent registration, which instantiate the resource providers, which subscribes 
to the RP manager.

So maybe "Since the local resource provider gets subscribed after the agent 
is registered?"

Ditto below.


- Chun-Hung Hsiao


On July 23, 2019, 8:18 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71146/
> ---
> 
> (Updated July 23, 2019, 8:18 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Clarified a comment in storage local resource provider tests.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69861265d94ddf344da96b593797ce145394413e 
> 
> 
> Diff: https://reviews.apache.org/r/71146/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71145: Fixed signature to pass parameter by const ref instead of value.

2019-07-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71145/#review216874
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On July 23, 2019, 8:18 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71145/
> ---
> 
> (Updated July 23, 2019, 8:18 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9254
> https://issues.apache.org/jira/browse/MESOS-9254
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Fixed signature to pass parameter by const ref instead of value.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 69861265d94ddf344da96b593797ce145394413e 
> 
> 
> Diff: https://reviews.apache.org/r/71145/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 71144: Added `reconciliation_interval_seconds` for storage resource providers.

2019-07-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71144/
---

(Updated July 26, 2019, 4:26 a.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comment.


Bugs: MESOS-9254
https://issues.apache.org/jira/browse/MESOS-9254


Repository: mesos


Description
---

This new configuration option controls how frequent a storage resource
provider reconciles existing volumes and storage pools against its CSI
plugin to detect new or missing disk resources.


Diffs (updated)
-

  include/mesos/mesos.proto 2b4f350815935220c2d2b0dd0e52346bc74c91d9 
  include/mesos/v1/mesos.proto bafc27499f810791700c4a30dcb1da33b6f31d2e 
  src/Makefile.am 761dde1d63e0f4f1ac4ab86f129f84f3746d3153 
  src/resource_provider/constants.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/71144/diff/2/

Changes: https://reviews.apache.org/r/71144/diff/1-2/


Testing
---


Thanks,

Chun-Hung Hsiao



Re: Review Request 71143: Moved default constants for CSI RPC retry to a new header.

2019-07-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71143/
---

(Updated July 26, 2019, 4:25 a.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Rebased.


Bugs: MESOS-9254
https://issues.apache.org/jira/browse/MESOS-9254


Repository: mesos


Description
---

Since the default constants for CSI RPC retry do not depend on CSI
versions, these constants are pulled off from version-specific headers
to a common header.


Diffs (updated)
-

  src/Makefile.am 761dde1d63e0f4f1ac4ab86f129f84f3746d3153 
  src/csi/constants.hpp PRE-CREATION 
  src/csi/v0_volume_manager.cpp e19dc7c2933e2bbee5e817f3089bd569b259786b 
  src/csi/v0_volume_manager_process.hpp 
4cfb5b564af9e187a6e3d81f86964db288ae852e 
  src/csi/v1_volume_manager.cpp e7e032988bce576ef8d6a0c3457f26f1aad9bb58 
  src/csi/v1_volume_manager_process.hpp 
30788c3593d4b4bd6271705a1188f73836bc7a85 
  src/tests/storage_local_resource_provider_tests.cpp 
38233053452259571743317326a6efc74d97bd29 


Diff: https://reviews.apache.org/r/71143/diff/2/

Changes: https://reviews.apache.org/r/71143/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 71144: Added `reconciliation_interval_seconds` for storage resource providers.

2019-07-23 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71144/
---

Review request for mesos and Benjamin Bannier.


Bugs: MESOS-9254
https://issues.apache.org/jira/browse/MESOS-9254


Repository: mesos


Description
---

This new configuration option controls how frequent a storage resource
provider reconciles existing volumes and storage pools against its CSI
plugin to detect new or missing disk resources.


Diffs
-

  include/mesos/mesos.proto 2b4f350815935220c2d2b0dd0e52346bc74c91d9 
  include/mesos/v1/mesos.proto bafc27499f810791700c4a30dcb1da33b6f31d2e 
  src/Makefile.am 761dde1d63e0f4f1ac4ab86f129f84f3746d3153 
  src/resource_provider/constants.hpp PRE-CREATION 


Diff: https://reviews.apache.org/r/71144/diff/1/


Testing
---


Thanks,

Chun-Hung Hsiao



Review Request 71143: Moved default constants for CSI RPC retry to a new header.

2019-07-23 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71143/
---

Review request for mesos and Benjamin Bannier.


Bugs: MESOS-9254
https://issues.apache.org/jira/browse/MESOS-9254


Repository: mesos


Description
---

Since the default constants for CSI RPC retry do not depend on CSI
versions, these constants are pulled off from version-specific headers
to a common header.


Diffs
-

  src/Makefile.am 761dde1d63e0f4f1ac4ab86f129f84f3746d3153 
  src/csi/constants.hpp PRE-CREATION 
  src/csi/v0_volume_manager.cpp e19dc7c2933e2bbee5e817f3089bd569b259786b 
  src/csi/v0_volume_manager_process.hpp 
4cfb5b564af9e187a6e3d81f86964db288ae852e 
  src/csi/v1_volume_manager.cpp e7e032988bce576ef8d6a0c3457f26f1aad9bb58 
  src/csi/v1_volume_manager_process.hpp 
30788c3593d4b4bd6271705a1188f73836bc7a85 
  src/tests/storage_local_resource_provider_tests.cpp 
38233053452259571743317326a6efc74d97bd29 


Diff: https://reviews.apache.org/r/71143/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70728: Backed `MockResourceProvider` by a process.

2019-07-18 Thread Chun-Hung Hsiao
thPendingOffers`.



src/tests/resource_provider_manager_tests.cpp
Line 1167 (original), 1167 (patched)
<https://reviews.apache.org/r/70728/#comment303959>

We need to have a proper synchronization here to avoid data races. See my 
comments in `UpdateSlaveMessageWithPendingOffers`.



src/tests/resource_provider_manager_tests.cpp
Line 1195 (original), 1195 (patched)
<https://reviews.apache.org/r/70728/#comment303970>

We're accessing the local variable `resourceProvider` in its process 
context, which seems error prone. How about having a 
`MockResourceProviderProcess::stop` function that resets its driver, and invoke 
it here:
```
Invoke(*resourceProvder->process, ::stop)
```



src/tests/resource_provider_manager_tests.cpp
Line 1272 (original), 1272 (patched)
<https://reviews.apache.org/r/70728/#comment303971>

Let's avoid accessing the local variable `resourceProvider` in its process 
context:
```
Invoke(*resourceProvder->process, ::stop)
```
See my comments in `ResubscribeResourceProvider`.



src/tests/resource_provider_manager_tests.cpp
Line 1342 (original), 1342 (patched)
<https://reviews.apache.org/r/70728/#comment303960>

We need to have a proper synchronization here to avoid data races. See my 
comments in `UpdateSlaveMessageWithPendingOffers`.



src/tests/resource_provider_manager_tests.cpp
Line 1421 (original), 1420 (patched)
<https://reviews.apache.org/r/70728/#comment303972>

Let's avoid accessing the local variable `resourceProvider1` in its process 
context:
```
Invoke(*resourceProvder1->process, ::stop)
```
See my comments in `ResubscribeResourceProvider`.



src/tests/slave_tests.cpp
Line 10835 (original), 10836 (patched)
<https://reviews.apache.org/r/70728/#comment303962>

In this test, `applyOperation` being ready could indicate that `subscribed` 
has been called, which means that `info` has been set up, so no data race here. 
However, it would be nice to make these more explicit and avoid direct accesses 
to `process->info` in different threads. See my comments in 
`UpdateSlaveMessageWithPendingOffers`.



src/tests/slave_tests.cpp
Lines 10870 (patched)
<https://reviews.apache.org/r/70728/#comment303964>

Let's use `resourceProviderId` here instead.



src/tests/slave_tests.cpp
Line 11256 (original), 11258 (patched)
<https://reviews.apache.org/r/70728/#comment303965>

In this test, `operation` being ready could indicate that `subscribed` has 
been called, which means that `info` has been set up, so no data race here. 
However, it would be nice to make these more explicit and avoid direct accesses 
to `process->info` in the remaining of this test. See my comments in 
`UpdateSlaveMessageWithPendingOffers`.



src/tests/slave_tests.cpp
Line 11402 (original), 11406 (patched)
<https://reviews.apache.org/r/70728/#comment303967>

s/`CHECK`/`ASSERT_TRUE`/.

We need to have a proper synchronization here to avoid data races. See my 
comments in `UpdateSlaveMessageWithPendingOffers`.

Also, let's avoid accessing `process->info` directly in the remaining of 
this test.



src/tests/slave_tests.cpp
Line 11769 (original), 11775 (patched)
<https://reviews.apache.org/r/70728/#comment303975>

Let's make `resourceProvider` an `Owned` or `unique_ptr`, and do 
`resourceProvider.reset()` instead.


- Chun-Hung Hsiao


On May 27, 2019, 4:32 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70728/
> ---
> 
> (Updated May 27, 2019, 4:32 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-9560
> https://issues.apache.org/jira/browse/MESOS-9560
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We were passing callbacks into `MockResourceProvider` to the HTTP
> driver. Since the lifecycle of the callbacks and the mock provider were
> decoupled and these callbacks were binding the mock provider instance
> the code was not safe as written as the driver could invoke the callback
> after the provider had been destructed.
> 
> This patch makes sure that the callbacks are defered to a process. We
> also dispatch a number of other functions which are strongly coupled to
> the lifecycle of the provider. We still do not hide the provider away
> completely so the provider can be mocked in tests.
> 
> 
> Diffs
> -
> 
>   src/tests/api_tests.cpp af1d215f00c8c2224e807677afb4af2d3521235a 
>   src/tests/master_slave_reconciliation_tests.cpp 
> 7b6ac50adacc8416b91c0dde55ff7ba46a20515c 
>   src/tests/master_tests.cpp 097f1b77a59e29c6690210773d1556ebf2bb701e 
>   src/tests/mesos.hpp d

Review Request 71018: Fixed a race between status updates and acknowledgements in SLRP tests.

2019-07-05 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71018/
---

Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Greg Mann.


Bugs: MESOS-9881
https://issues.apache.org/jira/browse/MESOS-9881


Repository: mesos


Description
---

SLRP tests `RetryOperationStatusUpdate*` check that no operation update
will be sent by SLRP once acknowledgements are sent by master. However,
since the acknowledgements are delivered via HTTP, it is possible that
operation status updates race with acknowledgements that are still in
HTTP pipes. This patch fixes this race condition.


Diffs
-

  src/tests/storage_local_resource_provider_tests.cpp 
38233053452259571743317326a6efc74d97bd29 


Diff: https://reviews.apache.org/r/71018/diff/1/


Testing
---

Ran the tests under stress 200 times.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70788: Garbage-collected disappeared RPs when agent resources remain unchanged.

2019-06-06 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70788/
---

(Updated June 6, 2019, 6:35 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comment.


Bugs: MESOS-9831
https://issues.apache.org/jira/browse/MESOS-9831


Repository: mesos


Description
---

Previously when there is a missing resource provider in an
`UpdateSlaveMessage` but the agent's total resources remain unchanged,
the update message will be completely ignored, so the missing resource
provider will still be cached in the master's state, which is not the
desired behavior. This patch ensures that the master's state gets
updated if any resource provider is missing.


Diffs (updated)
-

  src/master/master.cpp 4d7c37cf8814e45432b3fe15173f5343676a372b 
  src/tests/api_tests.cpp 37d0cb11ddb24b32f1e83feefe168f2aa2eb659c 


Diff: https://reviews.apache.org/r/70788/diff/2/

Changes: https://reviews.apache.org/r/70788/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70766: Fixed chaining futures infinitely in `UriDiskProfileAdaptor`.

2019-06-05 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70766/
---

(Updated June 6, 2019, 4:58 a.m.)


Review request for mesos, Benjamin Bannier and Benjamin Mahler.


Changes
---

Made `watchers` a vector and avoided the use of iterators.


Bugs: MESOS-9803
https://issues.apache.org/jira/browse/MESOS-9803


Repository: mesos


Description
---

Previously it is possible to have an infinite chain of futures when
`UriDiskProfileAdaptor::watch` is called: if the set of profiles remains
fixed for every poll, each poll would satisfy a promise that triggers
an asynchronous recursive call to `UriDiskProfileAdaptor::watch` again.

This patch fixes the problem by removing the asynchronous recursion.
Instead, we maintain a separated promise for each watcher that is never
associated to another promise. After each poll, we check if the current
set of profiles differs from the known set for a watcher, and satisfy
its own promise if so.


Diffs (updated)
-

  src/resource_provider/storage/uri_disk_profile_adaptor.hpp 
a5a34dc4dc1d518dc69aeb15fe62bd124d828ed3 
  src/resource_provider/storage/uri_disk_profile_adaptor.cpp 
215f7f92b5c2a0e60555134ce9887e8a187e3b1d 


Diff: https://reviews.apache.org/r/70766/diff/2/

Changes: https://reviews.apache.org/r/70766/diff/1-2/


Testing
---

make check

Manually tested with the unit test in r/70764. The unit test will fail at the 
5th poll without the fix and will pass with the fix.


Thanks,

Chun-Hung Hsiao



Review Request 70788: Garbage-collected disappeared RPs when agent resources remain unchanged.

2019-06-05 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70788/
---

Review request for mesos and Benjamin Bannier.


Bugs: MESOS-9831
https://issues.apache.org/jira/browse/MESOS-9831


Repository: mesos


Description
---

Previously when there is a missing resource provider in an
`UpdateSlaveMessage` but the agent's total resources remain unchanged,
the update message will be completely ignored, so the missing resource
provider will still be cached in the master's state, which is not the
desired behavior. This patch ensures that the master's state gets
updated if any resource provider is missing.


Diffs
-

  src/master/master.cpp 4d7c37cf8814e45432b3fe15173f5343676a372b 
  src/tests/api_tests.cpp 37d0cb11ddb24b32f1e83feefe168f2aa2eb659c 


Diff: https://reviews.apache.org/r/70788/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70728: Backed `MockResourceProvider` by a process.

2019-06-05 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70728/#review215705
---



s/completelt/completely/ in the description.


src/tests/master_tests.cpp
Line 9017 (original), 9017 (patched)
<https://reviews.apache.org/r/70728/#comment302518>

Not yours, but it seems to me that the dequeueing of the 
`UpdateSlaveMessage` in the master does not causally happen before the 
`subscribedDefault` call (which sets up the resource provider ID).


- Chun-Hung Hsiao


On May 27, 2019, 4:32 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70728/
> ---
> 
> (Updated May 27, 2019, 4:32 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> We were passing callbacks into `MockResourceProvider` to the HTTP
> driver. Since the lifecycle of the callbacks and the mock provider were
> decoupled and these callbacks were binding the mock provider instance
> the code was not safe as written as the driver could invoke the callback
> after the provider had been destructed.
> 
> This patch makes sure that the callbacks are defered to a process. We
> also dispatch a number of other functions which are strongly coupled to
> the lifecycle of the provider. We still do not hide the provider away
> completelt so the provider can be mocked in tests.
> 
> 
> Diffs
> -
> 
>   src/tests/api_tests.cpp 2220cecc22778a86f0c29317adf495927e1a900d 
>   src/tests/master_slave_reconciliation_tests.cpp 
> 7b6ac50adacc8416b91c0dde55ff7ba46a20515c 
>   src/tests/master_tests.cpp 097f1b77a59e29c6690210773d1556ebf2bb701e 
>   src/tests/mesos.hpp d0c82fa4c1800ed2a1825d563a9462ecd41b77fd 
>   src/tests/operation_reconciliation_tests.cpp 
> eae318da2273faae904f0155e49bb23cdb24f007 
>   src/tests/resource_provider_manager_tests.cpp 
> 7d48f18e89f046df6c92e52edeef592acfb13b10 
>   src/tests/slave_tests.cpp c2035976713abb31b3646c0d23771fa40df93271 
> 
> 
> Diff: https://reviews.apache.org/r/70728/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Review Request 70765: Renamed struct `ProfileRecord` to `ProfileData` for consistency.

2019-05-30 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70765/
---

Review request for mesos, Benjamin Bannier and Benjamin Mahler.


Repository: mesos


Description
---

Renamed struct `ProfileRecord` to `ProfileData` for consistency.


Diffs
-

  src/resource_provider/storage/uri_disk_profile_adaptor.hpp 
a5a34dc4dc1d518dc69aeb15fe62bd124d828ed3 
  src/resource_provider/storage/uri_disk_profile_adaptor.cpp 
215f7f92b5c2a0e60555134ce9887e8a187e3b1d 


Diff: https://reviews.apache.org/r/70765/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 70766: Fixed chaining futures infinitely in `UriDiskProfileAdaptor`.

2019-05-30 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70766/
---

Review request for mesos, Benjamin Bannier and Benjamin Mahler.


Bugs: MESOS-9803
https://issues.apache.org/jira/browse/MESOS-9803


Repository: mesos


Description
---

Previously it is possible to have an infinite chain of futures when
`UriDiskProfileAdaptor::watch` is called: if the set of profiles remains
fixed for every poll, each poll would satisfy a promise that triggers
an asynchronous recursive call to `UriDiskProfileAdaptor::watch` again.

This patch fixes the problem by removing the asynchronous recursion.
Instead, we maintain a separated promise for each watcher that is never
associated to another promise. After each poll, we check if the current
set of profiles differs from the known set for a watcher, and satisfy
its own promise if so.


Diffs
-

  src/resource_provider/storage/uri_disk_profile_adaptor.hpp 
a5a34dc4dc1d518dc69aeb15fe62bd124d828ed3 
  src/resource_provider/storage/uri_disk_profile_adaptor.cpp 
215f7f92b5c2a0e60555134ce9887e8a187e3b1d 


Diff: https://reviews.apache.org/r/70766/diff/1/


Testing
---

make check

Manually tested with the unit test in r/70764. The unit test will fail at the 
5th poll without the fix and will pass with the fix.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70740: Windows: Removed multiple categories of sources from the build.

2019-05-28 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70740/#review215558
---




src/CMakeLists.txt
Lines 32 (patched)
<https://reviews.apache.org/r/70740/#comment302287>

Why are you guarding just the CSI v1 but not v0? They're different from the 
Mesos v0 vs v1 API protos. They are different APIs.

Let's just disable CSI on Windows and guard both for now.



src/CMakeLists.txt
Lines 61 (patched)
<https://reviews.apache.org/r/70740/#comment302288>

IIRC Windows uses some v2s2 spec? 
https://github.com/apache/mesos/blob/fa410f2fb8efb988590f4da2d4cfffbb2ce70637/src/uri/fetchers/docker.cpp#L974


- Chun-Hung Hsiao


On May 28, 2019, 11:11 p.m., Joseph Wu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70740/
> ---
> 
> (Updated May 28, 2019, 11:11 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Gilbert Song.
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This removes some categories of sources from the Windows build,
> where it is possible to do so with minimal ifdef-ing.
> 
> The features removed are all Linux-specific features that cannot be
> feasibly ported to Windows, including:
>   * Container Storage Interface (CSI)
>   * Docker image provisioning, specifically related to V2
>   * Open Container Interface
>   * Volume GID Manager
> 
> Protobufs are excluded where possible, but many of the above categories
> of protobufs are interleaved with other protobufs or source code, 
> which makes exclusion non-trivial.  For example, CSI V0 protobufs
> cannot be excluded without a large change; or libseccomp is a Linux-only
> feature, but its protobufs are now required to build the Mesos
> containerizer's protobufs.
> 
> Docker image provisioning was semi-trivial to exclude, because the
> related components (provisioner & URI fetcher) are somewhat modularized.
> 
> 
> Diffs
> -
> 
>   src/CMakeLists.txt 1d4f541b73c07a307a8b61f217e0cfad5dc095e4 
>   src/slave/containerizer/mesos/containerizer.cpp 
> 043244841a73fa3f5f7119bc38f6d3a04be8990b 
>   src/slave/containerizer/mesos/provisioner/store.cpp 
> 11fce0eb47e9e6dfce6289afe04a1d58a0c4461a 
>   src/tests/CMakeLists.txt e6b1d8a097246f0921467d7f778034d6754fcff6 
>   src/tests/uri_fetcher_tests.cpp c727cc52e82a396fe187a00c8cc3c9e78a919c5d 
>   src/uri/fetcher.hpp cc4bd93b3b8bcb7803f8f912f4ad9d3cf45a58a9 
>   src/uri/fetcher.cpp 8db43eb9763f1cf8040db93a1f03aae0fe9ab3c7 
> 
> 
> Diff: https://reviews.apache.org/r/70740/diff/1/
> 
> 
> Testing
> ---
> 
> cmake --build . --target check
> 
> This slightly decreases the memory footprint of the build, and allowed my 
> build instance (4GB mem) to proceed beyond some agent files (which is where 
> the Windows CI is also running out of memory).  It still ran out of memory 
> when compiling tests however.  After giving the instance more memory (8GB), 
> the build succeeds.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>



Re: Review Request 70651: Notifies master `/api/v1` subscribers about recovered frameworks.

2019-05-22 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70651/
---

(Updated May 23, 2019, 4:17 a.m.)


Review request for mesos and Greg Mann.


Bugs: MESOS-9785
https://issues.apache.org/jira/browse/MESOS-9785


Repository: mesos


Description
---

If one subscribes to master's `/api/v1` endpoint after a master failover
but before an agent reregistration, frameworks recovered through the
agent registration should be notified to the subscriber, otherwise
recovered tasks will have framework IDs referring to frameworks unknown
to the subscriber.


Diffs
-

  src/common/protobuf_utils.cpp 8b252cb11e17356836988dfc44a63953579a1def 
  src/master/master.cpp 4d7c37cf8814e45432b3fe15173f5343676a372b 
  src/tests/api_tests.cpp 37d0cb11ddb24b32f1e83feefe168f2aa2eb659c 


Diff: https://reviews.apache.org/r/70651/diff/3/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 70702: Serialized all events to master's `/api/v1` subscribers.

2019-05-22 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70702/
---

Review request for mesos, Benjamin Mahler and Greg Mann.


Bugs: MESOS-9785
https://issues.apache.org/jira/browse/MESOS-9785


Repository: mesos


Description
---

The master needs to create object approvers before sending an event to
its `/api/v1` subscribers. The creation calls `process::collect`, which
does not have any ordering guarantee. As a result, events might be
reordered, which could be unexpected by subscribers.

This patch imposes an order between events by serializing the creation
of object approvers. The actual creations can still go in parallel, but
the returned futures will be completed in the creation order.


Diffs
-

  src/master/master.hpp 2771f4c045c877b7d8aa5db042810232c0e40ba0 
  src/master/master.cpp 4d7c37cf8814e45432b3fe15173f5343676a372b 


Diff: https://reviews.apache.org/r/70702/diff/1/


Testing
---

make check

Ran the test updated in r/70651 for 500 times under stress.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70651: Notifies master `/api/v1` subscribers about recovered frameworks.

2019-05-22 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70651/
---

(Updated May 22, 2019, 9:41 p.m.)


Review request for mesos and Greg Mann.


Changes
---

Addressed Greg's comments.


Bugs: MESOS-9785
https://issues.apache.org/jira/browse/MESOS-9785


Repository: mesos


Description
---

If one subscribes to master's `/api/v1` endpoint after a master failover
but before an agent reregistration, frameworks recovered through the
agent registration should be notified to the subscriber, otherwise
recovered tasks will have framework IDs referring to frameworks unknown
to the subscriber.


Diffs (updated)
-

  src/common/protobuf_utils.cpp 8b252cb11e17356836988dfc44a63953579a1def 
  src/master/master.cpp 4d7c37cf8814e45432b3fe15173f5343676a372b 
  src/tests/api_tests.cpp 37d0cb11ddb24b32f1e83feefe168f2aa2eb659c 


Diff: https://reviews.apache.org/r/70651/diff/2/

Changes: https://reviews.apache.org/r/70651/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70651: Notifies master `/api/v1` subscribers about recovered frameworks.

2019-05-22 Thread Chun-Hung Hsiao


> On May 17, 2019, 10:16 a.m., Greg Mann wrote:
> > src/tests/api_tests.cpp
> > Lines 2774 (patched)
> > <https://reviews.apache.org/r/70651/diff/1/?file=2145475#file2145475line2777>
> >
> >     Nit: make this a const ref?
> 
> Chun-Hung Hsiao wrote:
> This is copied because `event` is overwritten later. Do you think that 
> it's okay to use a ref, since we don't access to it afterwards?
> 
> Greg Mann wrote:
> As long as `agentAdded` isn't modified, we can make it a const ref? I was 
> suggesting we do the same thing with `agentAdded` that we do previously in 
> this test with `getState`:
> ```
> const v1::master::Response::GetState& getState =
>   event->get().subscribed().get_state();
> ```
> 
> Chun-Hung Hsiao wrote:
> The problem is that `event->get()` will be destructed once `event` is 
> overwritten, so the ref, const or not, will no longer be valid. So making a 
> copy is a bit safer (if anyone touches this test in the future).
> 
> Greg Mann wrote:
> But it looks like we already use the const ref pattern in this test? It 
> seems best to me to just be consistent, but I don't have strong feelings. If 
> you don't want to change it, you can drop the issues.

Given that it's not very likely we'll change this test to access these 
references in the future, I'll take your suggestion for consistency :)


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70651/#review215319
---


On May 16, 2019, 5:11 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70651/
> ---
> 
> (Updated May 16, 2019, 5:11 a.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-9785
> https://issues.apache.org/jira/browse/MESOS-9785
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If one subscribes to master's `/api/v1` endpoint after a master failover
> but before an agent reregistration, frameworks recovered through the
> agent registration should be notified to the subscriber, otherwise
> recovered tasks will have framework IDs referring to frameworks unknown
> to the subscriber.
> 
> 
> Diffs
> -
> 
>   src/common/protobuf_utils.cpp 8b252cb11e17356836988dfc44a63953579a1def 
>   src/master/master.cpp 6c0e30bff3fbfe0ad1f3af44eb3df6f82048f0e1 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70651/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70686: Added a unit test for master operation authorization.

2019-05-21 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70686/
---

(Updated May 22, 2019, 4:09 a.m.)


Review request for mesos, Benjamin Bannier, Gastón Kleiman, and Greg Mann.


Changes
---

Addressed Benjamin's comments.


Bugs: MESOS-9485
https://issues.apache.org/jira/browse/MESOS-9485


Repository: mesos


Description
---

This test verifies that allowing or denying an action will only result
in a success or failure on specific operations but not other operations
in an accept call. This is a regression test for MESOS-9474 and
MESOS-9480.


Diffs (updated)
-

  src/slave/slave.hpp c740bf7c4af1699dd71a31f9169d29c1b4adb40d 
  src/tests/master_authorization_tests.cpp 
f65b6210a8189ec5e9089bce845a46d325253058 
  src/tests/mesos.hpp e70e0e1bbebef41557f9bb1f26c379357fc74e2a 
  src/tests/mock_slave.hpp 326a450eb2ae9912a19c6e0220e80f74d2953add 
  src/tests/mock_slave.cpp 1122c2a8b56b9d4e5deea4e892d81061f5e39a8d 


Diff: https://reviews.apache.org/r/70686/diff/2/

Changes: https://reviews.apache.org/r/70686/diff/1-2/


Testing
---

make check

Ran the unit test under stress for 500 iterations.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70686: Added a unit test for master operation authorization.

2019-05-21 Thread Chun-Hung Hsiao


> On May 21, 2019, 12:07 p.m., Benjamin Bannier wrote:
> > src/tests/master_authorization_tests.cpp
> > Lines 3317-3320 (patched)
> > <https://reviews.apache.org/r/70686/diff/1/?file=2145964#file2145964line3317>
> >
> > If you get rid of the `vector` `evolve` function in r/70683/ we'd 
> > create a temporary `RepeatedPtrField` and convert when creating the accept 
> > call.
> > 
> > 
> > RepeatedPtrField operations_ =
> >   evolve(
> >   {operations.begin(), operations.end()});
> >   
> > mesos.send(v1::createCallAccept(
> > subscribed->framework_id(),
> > offers->offers(0),
> > {operations_.begin(), operations_.end()}));
> > 

Instead of doing this, the operations are now created with v1 proto messages, 
since operation feeback is only supported in v1. As a result, we no longer need 
the vector evolve helper.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70686/#review215405
---


On May 20, 2019, 10:37 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70686/
> ---
> 
> (Updated May 20, 2019, 10:37 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9485
> https://issues.apache.org/jira/browse/MESOS-9485
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test verifies that allowing or denying an action will only result
> in a success or failure on specific operations but not other operations
> in an accept call. This is a regression test for MESOS-9474 and
> MESOS-9480.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.hpp c740bf7c4af1699dd71a31f9169d29c1b4adb40d 
>   src/tests/master_authorization_tests.cpp 
> f65b6210a8189ec5e9089bce845a46d325253058 
>   src/tests/mesos.hpp e70e0e1bbebef41557f9bb1f26c379357fc74e2a 
>   src/tests/mock_slave.hpp 326a450eb2ae9912a19c6e0220e80f74d2953add 
>   src/tests/mock_slave.cpp 1122c2a8b56b9d4e5deea4e892d81061f5e39a8d 
> 
> 
> Diff: https://reviews.apache.org/r/70686/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> Ran the unit test under stress for 500 iterations.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70683: Added helpers for evolving operation-related proto messages.

2019-05-21 Thread Chun-Hung Hsiao


> On May 21, 2019, 12:07 p.m., Benjamin Bannier wrote:
> > src/internal/evolve.hpp
> > Lines 106-114 (original), 122-133 (patched)
> > <https://reviews.apache.org/r/70683/diff/1/?file=2145951#file2145951line122>
> >
> > This helper doesn't seem to fit well, and we'd only use this helper in 
> > a single spot later on in the chain.
> > 
> > I'd suggest to remove it here and just manually adjust in the later 
> > patch.

Removed all evolve helpers and added a devolve helper instead. In the test 
introduced in the later patch, all operations are generated with v1 proto 
messages, since operation feedback is only supported in v1.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70683/#review215403
---


On May 20, 2019, 10:30 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70683/
> ---
> 
> (Updated May 20, 2019, 10:30 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Greg Mann.
> 
> 
> Bugs: MESOS-9485
> https://issues.apache.org/jira/browse/MESOS-9485
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Added helpers for evolving operation-related proto messages.
> 
> 
> Diffs
> -
> 
>   src/internal/evolve.hpp 1044d9df75b6fc1f60d3704be9cb5751e6d4321d 
>   src/internal/evolve.cpp 19c155967bf090fb2ec39211805ff1385787ab59 
> 
> 
> Diff: https://reviews.apache.org/r/70683/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70683: Added a helper to devolve offer operations.

2019-05-21 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70683/
---

(Updated May 22, 2019, 4:06 a.m.)


Review request for mesos, Benjamin Bannier and Greg Mann.


Changes
---

Addressed Benjamin's comment.


Summary (updated)
-

Added a helper to devolve offer operations.


Bugs: MESOS-9485
https://issues.apache.org/jira/browse/MESOS-9485


Repository: mesos


Description (updated)
---

Added a helper to devolve offer operations.


Diffs (updated)
-

  src/internal/devolve.hpp a1f8d8d333c5304b4d3795800485fdbfc14d4455 
  src/internal/devolve.cpp e23ed3ccf6c05f5b8ea91788788469250c70248c 


Diff: https://reviews.apache.org/r/70683/diff/2/

Changes: https://reviews.apache.org/r/70683/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70622: Added a unit test to verify if SLRP allows changes in volume context.

2019-05-21 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70622/
---

(Updated May 22, 2019, 3:15 a.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comments.


Bugs: MESOS-9395
https://issues.apache.org/jira/browse/MESOS-9395


Repository: mesos


Description
---

Added a unit test to verify if SLRP allows changes in volume context.


Diffs (updated)
-

  src/examples/test_csi_plugin.cpp 03f782ead136a79c4c5b099496072933f6737598 
  src/tests/storage_local_resource_provider_tests.cpp 
ecd7aeef1cb3d1a5b4b3419dfd912d41a8c6 


Diff: https://reviews.apache.org/r/70622/diff/2/

Changes: https://reviews.apache.org/r/70622/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70621: Used full paths as volume IDs for the test CSI plugin.

2019-05-21 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70621/
---

(Updated May 22, 2019, 3:03 a.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comments.


Bugs: MESOS-9395
https://issues.apache.org/jira/browse/MESOS-9395


Repository: mesos


Description
---

The full paths of simulated volumes are now in their ID instead of their
contextual information. This simplifies SLRP tests, and makes it cleaner
if we want to customize the contextual information in the future.


Diffs (updated)
-

  src/examples/test_csi_plugin.cpp 03f782ead136a79c4c5b099496072933f6737598 
  src/tests/storage_local_resource_provider_tests.cpp 
ecd7aeef1cb3d1a5b4b3419dfd912d41a8c6 


Diff: https://reviews.apache.org/r/70621/diff/2/

Changes: https://reviews.apache.org/r/70621/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70620: Made SLRP allow changes in volume context.

2019-05-21 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70620/
---

(Updated May 22, 2019, 2:58 a.m.)


Review request for mesos, Benjamin Bannier and James DeFelice.


Changes
---

Addressed Benjamin's comments.


Bugs: MESOS-9395
https://issues.apache.org/jira/browse/MESOS-9395


Repository: mesos


Description
---

To make SLRP more robust against non-conforming CSI plugins that change
volume contexts, the `getExistVolumes` method returns a list of resource
conversions consisting of one for converting old volume contexts to new
volume contexts, and one to remove missing volumes and add new volumes.

To make the interfaces consistent, `getStoragePools` now also returns a
list of resource conversions consisting of one conversion.


Diffs (updated)
-

  include/mesos/mesos.proto dc6a87f14864cb4d46b9856f251c2946a475162d 
  include/mesos/v1/mesos.proto e8086e0f70cac73876a8ae31db3365b5059b5c44 
  src/resource_provider/storage/provider.cpp 
999fe95bb6f38f5a25068accd854b37788b24028 


Diff: https://reviews.apache.org/r/70620/diff/2/

Changes: https://reviews.apache.org/r/70620/diff/1-2/


Testing
---

sudo make check
more testing done later in chain


Thanks,

Chun-Hung Hsiao



Re: Review Request 70621: Used full paths as volume IDs for the test CSI plugin.

2019-05-21 Thread Chun-Hung Hsiao


> On May 14, 2019, 11:37 a.m., Benjamin Bannier wrote:
> > src/examples/test_csi_plugin.cpp
> > Line 1253 (original), 1244 (patched)
> > <https://reviews.apache.org/r/70621/diff/1/?file=2144329#file2144329line1267>
> >
> > It would be great to have some testing that `getVolumePath -> 
> > parseVolumePath` roundtrips. Let's either add a simple dedicated test or, 
> > since this is test code, just add an assertion in e.g., `getVolumePath` 
> > that the created path would roundtrip.

Sure will do. However this function does not decode volume names so extra work 
needs to be done.


> On May 14, 2019, 11:37 a.m., Benjamin Bannier wrote:
> > src/examples/test_csi_plugin.cpp
> > Lines 1250 (patched)
> > <https://reviews.apache.org/r/70621/diff/1/?file=2144329#file2144329line1274>
> >
> > Can we do that "normalization" when we initialize `workDir` instead?

Doing that seems to increase the coupling between the initialization of 
`workDir`, which happens in the constructor, and the code here. Dropping.


> On May 14, 2019, 11:37 a.m., Benjamin Bannier wrote:
> > src/examples/test_csi_plugin.cpp
> > Line 1269 (original), 1269 (patched)
> > <https://reviews.apache.org/r/70621/diff/1/?file=2144329#file2144329line1296>
> >
> > We are leaving `id` default-initialized here.

No it's the context which got default-initialized. Let me make it explicit here.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70621/#review215248
---


On May 10, 2019, 1:17 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70621/
> ---
> 
> (Updated May 10, 2019, 1:17 a.m.)
> 
> 
> Review request for mesos and Benjamin Bannier.
> 
> 
> Bugs: MESOS-9395
> https://issues.apache.org/jira/browse/MESOS-9395
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The full paths of simulated volumes are now in their ID instead of their
> contextual information. This simplifies SLRP tests, and makes it cleaner
> if we want to customize the contextual information in the future.
> 
> 
> Diffs
> -
> 
>   src/examples/test_csi_plugin.cpp 03f782ead136a79c4c5b099496072933f6737598 
>   src/tests/storage_local_resource_provider_tests.cpp 
> ecd7aeef1cb3d1a5b4b3419dfd912d41a8c6 
> 
> 
> Diff: https://reviews.apache.org/r/70621/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70620: Made SLRP allow changes in volume context.

2019-05-20 Thread Chun-Hung Hsiao


> On May 14, 2019, 10:04 a.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 923 (original), 930 (patched)
> > <https://reviews.apache.org/r/70620/diff/1/?file=2144328#file2144328line930>
> >
> > Let's create a dedicated error message.

Do you mean printing out an error or making it a hard failure? In theory this 
should never fail and that's why I make it a hard assertion here.


> On May 14, 2019, 10:04 a.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 1067 (patched)
> > <https://reviews.apache.org/r/70620/diff/1/?file=2144328#file2144328line1076>
> >
> > I don't think we need optional semantics here, I'd suggest to just work 
> > with an empty `Labels` here.
> > 
> > If you want you can convert an empty `Labels` to a `None` when invoking 
> > `createRawDiskResource` below. I think even that function might not need 
> > optional semantics for labels though.

This is for maintaining the following proto2 <-> proto3 semantic, since proto3 
doesn't have an OPTIONAL semantic, instead any field w/ the default value will 
not go onto the wire:

Resource.DiskInfo.Source.has_metadata() <=> !VolumeInfo.context.empty()

Since in practice there's no difference, we can change this to set up 
`metadata`to an empty `Labels` even if the `context` map is empty, as long as 
we handle backward compatibility carefully. But if we don't do this, it seems 
more reasonable to me to pass in a `None` here than passing a `Labels` here and 
doing a special check in another function.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70620/#review215220
---


On May 10, 2019, 1:14 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70620/
> ---
> 
> (Updated May 10, 2019, 1:14 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and James DeFelice.
> 
> 
> Bugs: MESOS-9395
> https://issues.apache.org/jira/browse/MESOS-9395
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> To make SLRP more robust against non-conforming CSI plugins that change
> volume contexts, the `getExistVolumes` method returns a list of resource
> conversions consisting of one for converting old volume contexts to new
> volume contexts, and one to remove missing volumes and add new volumes.
> 
> To make the interfaces consistent, `getStoragePools` now also returns a
> list of resource conversions consisting of one conversion.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto dc6a87f14864cb4d46b9856f251c2946a475162d 
>   include/mesos/v1/mesos.proto e8086e0f70cac73876a8ae31db3365b5059b5c44 
>   src/resource_provider/storage/provider.cpp 
> 999fe95bb6f38f5a25068accd854b37788b24028 
> 
> 
> Diff: https://reviews.apache.org/r/70620/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> more testing done later in chain
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70620: Made SLRP allow changes in volume context.

2019-05-20 Thread Chun-Hung Hsiao


> On May 12, 2019, 12:44 a.m., James DeFelice wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 1078 (patched)
> > <https://reviews.apache.org/r/70620/diff/1/?file=2144328#file2144328line1087>
> >
> > how often are these conversions applied? i suppose it's every time that 
> > reconciliation happens - how often is that?

Right now it only happens once when the SLRP is restarted. The plan is to make 
this happen periodically or on demand through 
https://issues.apache.org/jira/browse/MESOS-9254.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70620/#review215202
---


On May 10, 2019, 1:14 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70620/
> ---
> 
> (Updated May 10, 2019, 1:14 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and James DeFelice.
> 
> 
> Bugs: MESOS-9395
> https://issues.apache.org/jira/browse/MESOS-9395
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> To make SLRP more robust against non-conforming CSI plugins that change
> volume contexts, the `getExistVolumes` method returns a list of resource
> conversions consisting of one for converting old volume contexts to new
> volume contexts, and one to remove missing volumes and add new volumes.
> 
> To make the interfaces consistent, `getStoragePools` now also returns a
> list of resource conversions consisting of one conversion.
> 
> 
> Diffs
> -
> 
>   include/mesos/mesos.proto dc6a87f14864cb4d46b9856f251c2946a475162d 
>   include/mesos/v1/mesos.proto e8086e0f70cac73876a8ae31db3365b5059b5c44 
>   src/resource_provider/storage/provider.cpp 
> 999fe95bb6f38f5a25068accd854b37788b24028 
> 
> 
> Diff: https://reviews.apache.org/r/70620/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> more testing done later in chain
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Review Request 70686: Added a unit test for master operation authorization.

2019-05-20 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70686/
---

Review request for mesos, Benjamin Bannier and Greg Mann.


Bugs: MESOS-9485
https://issues.apache.org/jira/browse/MESOS-9485


Repository: mesos


Description
---

This test verifies that allowing or denying an action will only result
in a success or failure on specific operations but not other operations
in an accept call. This is a regression test for MESOS-9474 and
MESOS-9480.


Diffs
-

  src/slave/slave.hpp c740bf7c4af1699dd71a31f9169d29c1b4adb40d 
  src/tests/master_authorization_tests.cpp 
f65b6210a8189ec5e9089bce845a46d325253058 
  src/tests/mesos.hpp e70e0e1bbebef41557f9bb1f26c379357fc74e2a 
  src/tests/mock_slave.hpp 326a450eb2ae9912a19c6e0220e80f74d2953add 
  src/tests/mock_slave.cpp 1122c2a8b56b9d4e5deea4e892d81061f5e39a8d 


Diff: https://reviews.apache.org/r/70686/diff/1/


Testing
---

make check

Ran the unit test under stress for 500 iterations.


Thanks,

Chun-Hung Hsiao



Review Request 70685: Removed the `TaskStatusUpdateIsTerminalState` matcher.

2019-05-20 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70685/
---

Review request for mesos, Benjamin Bannier and Gilbert Song.


Repository: mesos


Description
---

The `TaskStatusUpdateIsTerminalState` test matcher does not handle both
v0 and v1 status updates. Since introducing such a matcher in the
`v1::scheduler` namespace is inconsistent and it is not hard to use the
built-in `Truly` matcher to implement the same functionality, this
matcher is removed for now.


Diffs
-

  src/tests/containerizer/nvidia_gpu_isolator_tests.cpp 
7f5bd8cd4c07b724fe2c49068bf327e5afe9ec41 
  src/tests/mesos.hpp e70e0e1bbebef41557f9bb1f26c379357fc74e2a 


Diff: https://reviews.apache.org/r/70685/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 70684: Changed the `*TaskIdEq` test matchers to take a `TaskID`.

2019-05-20 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70684/
---

Review request for mesos, Benjamin Bannier and Greg Mann.


Bugs: MESOS-9485
https://issues.apache.org/jira/browse/MESOS-9485


Repository: mesos


Description
---

The `TaskStatusTaskIdEq` and `TaskStatusUpdateTaskIdEq` matchers
previously take a `TaskInfo`, which is not very intuitive and
inconsistent with other matchers such as `OptionTaskHasTaskID`. By
making these matchers take a `TaskID` we also reduce coupling of this
matcher and the structure of `TaskInfo`.


Diffs
-

  src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
  src/tests/containerizer/nvidia_gpu_isolator_tests.cpp 
7f5bd8cd4c07b724fe2c49068bf327e5afe9ec41 
  src/tests/default_executor_tests.cpp 93d7a1cfa1a9df838426c8257b6158e3553b5037 
  src/tests/gc_tests.cpp a583f1d79a9dd9def24ccfeb9e49ea0392173420 
  src/tests/master_tests.cpp 964d935771a99efaee63187affe46b551146f310 
  src/tests/mesos.hpp e70e0e1bbebef41557f9bb1f26c379357fc74e2a 
  src/tests/partition_tests.cpp 8148aff6bf1d9637d486b549d7f8c7124566d86c 
  src/tests/slave_tests.cpp 50882a5aa1b20c234f3cec8366f346fa9bfd4f83 


Diff: https://reviews.apache.org/r/70684/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 70683: Added helpers for evolving operation-related proto messages.

2019-05-20 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70683/
---

Review request for mesos, Benjamin Bannier and Greg Mann.


Bugs: MESOS-9485
https://issues.apache.org/jira/browse/MESOS-9485


Repository: mesos


Description
---

Added helpers for evolving operation-related proto messages.


Diffs
-

  src/internal/evolve.hpp 1044d9df75b6fc1f60d3704be9cb5751e6d4321d 
  src/internal/evolve.cpp 19c155967bf090fb2ec39211805ff1385787ab59 


Diff: https://reviews.apache.org/r/70683/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70651: Notifies master `/api/v1` subscribers about recovered frameworks.

2019-05-20 Thread Chun-Hung Hsiao


> On May 17, 2019, 10:16 a.m., Greg Mann wrote:
> > src/tests/api_tests.cpp
> > Lines 2774 (patched)
> > <https://reviews.apache.org/r/70651/diff/1/?file=2145475#file2145475line2777>
> >
> >     Nit: make this a const ref?
> 
> Chun-Hung Hsiao wrote:
> This is copied because `event` is overwritten later. Do you think that 
> it's okay to use a ref, since we don't access to it afterwards?
> 
> Greg Mann wrote:
> As long as `agentAdded` isn't modified, we can make it a const ref? I was 
> suggesting we do the same thing with `agentAdded` that we do previously in 
> this test with `getState`:
> ```
> const v1::master::Response::GetState& getState =
>   event->get().subscribed().get_state();
> ```

The problem is that `event->get()` will be destructed once `event` is 
overwritten, so the ref, const or not, will no longer be valid. So making a 
copy is a bit safer (if anyone touches this test in the future).


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70651/#review215319
-------


On May 16, 2019, 5:11 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70651/
> ---
> 
> (Updated May 16, 2019, 5:11 a.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-9785
> https://issues.apache.org/jira/browse/MESOS-9785
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If one subscribes to master's `/api/v1` endpoint after a master failover
> but before an agent reregistration, frameworks recovered through the
> agent registration should be notified to the subscriber, otherwise
> recovered tasks will have framework IDs referring to frameworks unknown
> to the subscriber.
> 
> 
> Diffs
> -
> 
>   src/common/protobuf_utils.cpp 8b252cb11e17356836988dfc44a63953579a1def 
>   src/master/master.cpp 6c0e30bff3fbfe0ad1f3af44eb3df6f82048f0e1 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70651/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70651: Notifies master `/api/v1` subscribers about recovered frameworks.

2019-05-17 Thread Chun-Hung Hsiao


> On May 17, 2019, 10:16 a.m., Greg Mann wrote:
> > src/tests/api_tests.cpp
> > Lines 2774 (patched)
> > <https://reviews.apache.org/r/70651/diff/1/?file=2145475#file2145475line2777>
> >
> > Nit: make this a const ref?

This is copied because `event` is overwritten later. Do you think that it's 
okay to use a ref, since we don't access to it afterwards?


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70651/#review215319
-------


On May 16, 2019, 5:11 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70651/
> ---
> 
> (Updated May 16, 2019, 5:11 a.m.)
> 
> 
> Review request for mesos and Greg Mann.
> 
> 
> Bugs: MESOS-9785
> https://issues.apache.org/jira/browse/MESOS-9785
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> If one subscribes to master's `/api/v1` endpoint after a master failover
> but before an agent reregistration, frameworks recovered through the
> agent registration should be notified to the subscriber, otherwise
> recovered tasks will have framework IDs referring to frameworks unknown
> to the subscriber.
> 
> 
> Diffs
> -
> 
>   src/common/protobuf_utils.cpp 8b252cb11e17356836988dfc44a63953579a1def 
>   src/master/master.cpp 6c0e30bff3fbfe0ad1f3af44eb3df6f82048f0e1 
>   src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 
> 
> 
> Diff: https://reviews.apache.org/r/70651/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Review Request 70651: Notifies master `/api/v1` subscribers about recovered frameworks.

2019-05-15 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70651/
---

Review request for mesos and Greg Mann.


Bugs: MESOS-9785
https://issues.apache.org/jira/browse/MESOS-9785


Repository: mesos


Description
---

If one subscribes to master's `/api/v1` endpoint after a master failover
but before an agent reregistration, frameworks recovered through the
agent registration should be notified to the subscriber, otherwise
recovered tasks will have framework IDs referring to frameworks unknown
to the subscriber.


Diffs
-

  src/common/protobuf_utils.cpp 8b252cb11e17356836988dfc44a63953579a1def 
  src/master/master.cpp 6c0e30bff3fbfe0ad1f3af44eb3df6f82048f0e1 
  src/tests/api_tests.cpp bc19d7e9661f091aebf3072967548c7c3196239d 


Diff: https://reviews.apache.org/r/70651/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 70628: Return 409 if `UPDATE_RESOURCE_PROVIDER_CONFIG` names a missing config.

2019-05-10 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70628/
---

Review request for mesos, Benjamin Bannier, Greg Mann, and James DeFelice.


Bugs: MESOS-9779
https://issues.apache.org/jira/browse/MESOS-9779


Repository: mesos


Description
---

Since 404 is returned if the API endpoint route is not set yet, this
error code becomes ambiguous and makes clients hard to programmatically
handle it. Therefore, the error code for specifying a missing config
in this API call is changed to 409 Conflict.


Diffs
-

  include/mesos/agent/agent.proto ff408a4a1f144237e7425961aa3d11cba163e7c0 
  include/mesos/v1/agent/agent.proto 19d6c424365f8fd6e6d4a5bb7a61fbe5a66f031d 
  src/slave/http.cpp 2c4e792d16ad4fd3303760a9db3cba4269152e7d 
  src/tests/agent_resource_provider_config_api_tests.cpp 
7185ac00b5137c7ab208b623e36d03ffd43aab6e 


Diff: https://reviews.apache.org/r/70628/diff/1/


Testing
---

sudo make check


Thanks,

Chun-Hung Hsiao



Review Request 70627: Explicitly marked agent resource provider config calls as experimental.

2019-05-10 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70627/
---

Review request for mesos, Benjamin Bannier and James DeFelice.


Repository: mesos


Description
---

The resource provider feature has been marked as experimental, so we
should also call out the related config API calls are experimental.


Diffs
-

  include/mesos/agent/agent.proto ff408a4a1f144237e7425961aa3d11cba163e7c0 
  include/mesos/v1/agent/agent.proto 19d6c424365f8fd6e6d4a5bb7a61fbe5a66f031d 


Diff: https://reviews.apache.org/r/70627/diff/1/


Testing
---

N/A


Thanks,

Chun-Hung Hsiao



Re: Review Request 70579: Added a Task Scheduler to simplify testing.

2019-05-10 Thread Chun-Hung Hsiao


> On May 10, 2019, 3:31 a.m., Chun-Hung Hsiao wrote:
> > src/tests/utils/task_scheduler.cpp
> > Lines 289 (patched)
> > <https://reviews.apache.org/r/70579/diff/1/?file=2142143#file2142143line289>
> >
> > ```
> > offers_.erase(std::remove_if(...), offers_.end());
> > ```
> > Although the effect is the same since there will be only one element 
> > removed, pairing `std::remove_if` and `end` will be more idomatic.
> 
> Benjamin Mahler wrote:
> Hm.. why is this an "issue" and why is that more idiomatic..? Erase has 
> two versions, one take a position and one takes a range.
> 
> Looking at this code again, it's rather unreadable, and I probably will 
> just write the version with a loop to find the one to erase.

The reason it's more idiomatic is because `it = std::remove_if(...)` returns an 
iterator that points to the new end of valid items, which means that, `[it, 
offers_.end())` should be removed. Since internally `std::remove_if` moves 
items past `it` to their "correct" positions before `it`, erasing only `it` but 
not things after it seems very strange at a first glance.

Calling `erase` with one iterator works here because we're relying on the fact 
that `std::remove_if` will only remove one item in this particular case, which 
is not trivial to me (before looking into what `offers_` contains and when the 
lambda returns true.

If you prefer to use `erase` with one position, how about the following:
```
offers_.erase(std::find_if(
offers_.begin(),
offers_.end(),
[&](const v1::Offer& offer) {
...
}));
```

Also, I just noticed that both the snippet you have and the one I wrote above 
relies on the invariant where `event.rescind().offer_id()` exists in `offers_`. 
`vector::erase(pos)` requires that the iterator passed in should be both valid 
and deferenceable. If there are more than one agents, then this invariant will 
be violated.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70579/#review215174
---


On May 1, 2019, 9:17 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70579/
> -------
> 
> (Updated May 1, 2019, 9:17 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8511
> https://issues.apache.org/jira/browse/MESOS-8511
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The caller can submit tasks to the scheduler and get back statuses.
> This reduces the boilerplate of using the low level scheduler
> library.
> 
> 
> Diffs
> -
> 
>   include/mesos/v1/scheduler.hpp 2f72129ba48943c891d7020bfd2cad3066355026 
>   src/Makefile.am 5f97523fbe2d80733fbdcc7706f2761f5a071f9f 
>   src/scheduler/scheduler.cpp 674483aa80bc74b654343c97892a96f49d5c7ed4 
>   src/tests/CMakeLists.txt e6b1d8a097246f0921467d7f778034d6754fcff6 
>   src/tests/utils/task_scheduler.hpp PRE-CREATION 
>   src/tests/utils/task_scheduler.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/70579/diff/1/
> 
> 
> Testing
> ---
> 
> Tested it via the subsequent patch.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70596: Launched tasks with more memory in SLRP unit tests.

2019-05-09 Thread Chun-Hung Hsiao


> On May 6, 2019, 11:55 a.m., Benjamin Bannier wrote:
> > src/tests/storage_local_resource_provider_tests.cpp
> > Line 1795 (original), 1795 (patched)
> > <https://reviews.apache.org/r/70596/diff/1/?file=2142847#file2142847line1795>
> >
> > Could we move this into a test class-specific constant?
> > 
> > Here and below.

Made this a function that takes `persistentVolumes` as additional resources and 
adds cpus and mem to them.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70596/#review215058
-------


On May 3, 2019, 9:09 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70596/
> ---
> 
> (Updated May 3, 2019, 9:09 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Jan Schlicht.
> 
> 
> Bugs: MESOS-9765
> https://issues.apache.org/jira/browse/MESOS-9765
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Raised the task memory to 128MB (which are the value used in most persistent
> volume tests) in all SLRP tests that launch tasks to avoid OOM.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> ecd7aeef1cb3d1a5b4b3419dfd912d41a8c6 
> 
> 
> Diff: https://reviews.apache.org/r/70596/diff/1/
> 
> 
> Testing
> ---
> 
> sudo make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70579: Added a Task Scheduler to simplify testing.

2019-05-09 Thread Chun-Hung Hsiao
 (patched)
<https://reviews.apache.org/r/70579/#comment301727>

Instead of doing the wiring, we could instead make 
`TaskSchedulerProcess::submit` to take `output` as its second argument.



src/tests/utils/task_scheduler.cpp
Lines 486 (patched)
<https://reviews.apache.org/r/70579/#comment301728>

We could avoid this wiring by making `TaskSchedulerProcess::submit` take a 
second `hashmap` parameter.


- Chun-Hung Hsiao


On May 1, 2019, 9:17 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70579/
> ---
> 
> (Updated May 1, 2019, 9:17 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-8511
> https://issues.apache.org/jira/browse/MESOS-8511
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The caller can submit tasks to the scheduler and get back statuses.
> This reduces the boilerplate of using the low level scheduler
> library.
> 
> 
> Diffs
> -
> 
>   include/mesos/v1/scheduler.hpp 2f72129ba48943c891d7020bfd2cad3066355026 
>   src/Makefile.am 5f97523fbe2d80733fbdcc7706f2761f5a071f9f 
>   src/scheduler/scheduler.cpp 674483aa80bc74b654343c97892a96f49d5c7ed4 
>   src/tests/CMakeLists.txt e6b1d8a097246f0921467d7f778034d6754fcff6 
>   src/tests/utils/task_scheduler.hpp PRE-CREATION 
>   src/tests/utils/task_scheduler.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/70579/diff/1/
> 
> 
> Testing
> ---
> 
> Tested it via the subsequent patch.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Review Request 70620: Made SLRP allow changes in volume context.

2019-05-09 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70620/
---

Review request for mesos, Benjamin Bannier and James DeFelice.


Bugs: MESOS-9395
https://issues.apache.org/jira/browse/MESOS-9395


Repository: mesos


Description
---

To make SLRP more robust against non-conforming CSI plugins that change
volume contexts, the `getExistVolumes` method returns a list of resource
conversions consisting of one for converting old volume contexts to new
volume contexts, and one to remove missing volumes and add new volumes.

To make the interfaces consistent, `getStoragePools` now also returns a
list of resource conversions consisting of one conversion.


Diffs
-

  include/mesos/mesos.proto dc6a87f14864cb4d46b9856f251c2946a475162d 
  include/mesos/v1/mesos.proto e8086e0f70cac73876a8ae31db3365b5059b5c44 
  src/resource_provider/storage/provider.cpp 
999fe95bb6f38f5a25068accd854b37788b24028 


Diff: https://reviews.apache.org/r/70620/diff/1/


Testing
---

sudo make check
more testing done later in chain


Thanks,

Chun-Hung Hsiao



Review Request 70622: Added a unit test to verify if SLRP allows changes in volume context.

2019-05-09 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70622/
---

Review request for mesos and Benjamin Bannier.


Bugs: MESOS-9395
https://issues.apache.org/jira/browse/MESOS-9395


Repository: mesos


Description
---

Added a unit test to verify if SLRP allows changes in volume context.


Diffs
-

  src/examples/test_csi_plugin.cpp 03f782ead136a79c4c5b099496072933f6737598 
  src/tests/storage_local_resource_provider_tests.cpp 
ecd7aeef1cb3d1a5b4b3419dfd912d41a8c6 


Diff: https://reviews.apache.org/r/70622/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 70621: Used full paths as volume IDs for the test CSI plugin.

2019-05-09 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70621/
---

Review request for mesos and Benjamin Bannier.


Bugs: MESOS-9395
https://issues.apache.org/jira/browse/MESOS-9395


Repository: mesos


Description
---

The full paths of simulated volumes are now in their ID instead of their
contextual information. This simplifies SLRP tests, and makes it cleaner
if we want to customize the contextual information in the future.


Diffs
-

  src/examples/test_csi_plugin.cpp 03f782ead136a79c4c5b099496072933f6737598 
  src/tests/storage_local_resource_provider_tests.cpp 
ecd7aeef1cb3d1a5b4b3419dfd912d41a8c6 


Diff: https://reviews.apache.org/r/70621/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70184: Fixed flakiness in 'RetryRpcWithExponentialBackoff'.

2019-05-07 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70184/#review215120
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On May 6, 2019, 9:48 a.m., Jan Schlicht wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70184/
> ---
> 
> (Updated May 6, 2019, 9:48 a.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9594
> https://issues.apache.org/jira/browse/MESOS-9594
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Under some circumstances, offers would be filtered, resulting in the
> test being stuck while waiting for offers. This has been resolved by
> settling the clock before accepting new offers.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> ecd7aeef1cb3d1a5b4b3419dfd912d41a8c6 
> 
> 
> Diff: https://reviews.apache.org/r/70184/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> I could not reproduce the flaky behavior of this test case, hence only assume 
> that this patch resolves the flakiness.
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215075
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On May 6, 2019, 10:04 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 6, 2019, 10:04 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/4/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215074
---


Fix it, then Ship it!




Thanks for making this test more deterministic! Got some comments, but IMO this 
test is robust enough in practice so please feel free to drop my issues.


3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2122 (patched)
<https://reviews.apache.org/r/70595/#comment301475>

If you want libprocess to manager this actor, you can do the following:
```
PID pid = spawn(new TestProcess(), true);

Future waited = dispatch(pid, ::wait_for_terminate);
```
Then you won't need to do `process::wait` and `delete` later.

If you do so, you don't even need `waited`.



3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2138 (patched)
<https://reviews.apache.org/r/70595/#comment301474>

Although highly unlikely, it is theoratically possible that the terminate 
event is enqueued before any event of `process` got dequeued. If this happen, 
this await would fail even w/ the fix.

A possible way to fix this is to set up some promise in 
`wait_for_terminate`, and then wait for the corresponding future before calling 
`terminate` in this test.

That said, fixing this flake that's unlikely to happen might not worth it, 
so please feel free to drop this issue.


- Chun-Hung Hsiao


On May 6, 2019, 6:34 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 6, 2019, 6:34 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/2/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-06 Thread Chun-Hung Hsiao


> On May 3, 2019, 11:31 p.m., Chun-Hung Hsiao wrote:
> > 3rdparty/libprocess/src/tests/process_tests.cpp
> > Lines 2092-2108 (patched)
> > <https://reviews.apache.org/r/70595/diff/1/?file=2142838#file2142838line2092>
> >
> > I tried this unit test w/o the fix in the following settings:
> > 1. `--gtest_repeat=100`, which resulted in 3 passes and 97 failures.
> > 2. Invoked the test w/o repetition in a for-loop, which resulted in 60 
> > passes and 40 failures.
> > 
> > So it seems to me that this test passes fairly easily if it only runs 
> > once.
> > 
> > Then I replace this snippet with the following code, which just used 1 
> > actor but failed every time in the above two settings:
> > ```
> >   UPID pid = spawn(new ProcessBase(), true);
> > 
> >   dispatch(pid, [] { os::sleep(Milliseconds(50)); });
> > 
> >   http::URL url = http::URL(
> >   "http",
> >   process::address().ip,
> >   process::address().port,
> >   "/__processes__");
> > 
> >   Future response = http::get(url);
> > 
> >   terminate(pid, true);
> > ```
> 
> Alexander Rukletsov wrote:
> Agree with Chun, we should avoid races as much as we can. Several 
> suggestions to further improve Chun's version:
> - Pause the clock in the beginnig. We can then sleep for 42 minutes to 
> make sure the first message is still being processes when `http::get()` is 
> invoked (will of course require `settle()` before).
> - `settle()` after `http::get` to ensure `/__processes__` handler is put 
> into the `pid`'s mailbaox.
> - Check that the resulted JSON does not include the entry for `pid`, 
> i.e., is empty.

I'm not sure if adding settles works:
1. Adding a `settle` before `http::get` makes it wait for the completion of the 
sleep.
2. Adding a `settle` after `http::get` makes `terminate` wait for the 
completion of the sleep.
IIUC both will increase the likelihood of the test passing w/o che fix.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215035
---


On May 3, 2019, 7:58 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 3, 2019, 7:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/1/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70594: Fixed an issue where /__processes__ never returns a response.

2019-05-03 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70594/#review215037
---


Ship it!




How about quoting `/__processes__` with backticks?

- Chun-Hung Hsiao


On May 3, 2019, 7:58 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70594/
> ---
> 
> (Updated May 3, 2019, 7:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> It's possible for /__processes__ to never return a response if
> a process is terminated after the /__processes__ handler dispatches
> to it, thus leading the Future to be abandoned.
> 
> This patch ensures we ignore those cases, rather than wait forever.
> 
> See MESOS-9766.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/process.cpp 
> 124836472313721a5dbfe4b1ca55f0da3cecd66b 
> 
> 
> Diff: https://reviews.apache.org/r/70594/diff/1/
> 
> 
> Testing
> ---
> 
> Added a test in the subsequent patch.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Re: Review Request 70595: Adds a regression test for MESOS-9766.

2019-05-03 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70595/#review215035
---




3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2086 (patched)
<https://reviews.apache.org/r/70595/#comment301415>

Should `/__processes__` be quoted by backticks? Ditto below.



3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2090 (patched)
<https://reviews.apache.org/r/70595/#comment301422>

How about `NoProcessesEndpointHang`?



3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2092-2108 (patched)
<https://reviews.apache.org/r/70595/#comment301419>

I tried this unit test w/o the fix in the following settings:
1. `--gtest_repeat=100`, which resulted in 3 passes and 97 failures.
2. Invoked the test w/o repetition in a for-loop, which resulted in 60 
passes and 40 failures.

So it seems to me that this test passes fairly easily if it only runs once.

Then I replace this snippet with the following code, which just used 1 
actor but failed every time in the above two settings:
```
  UPID pid = spawn(new ProcessBase(), true);

  dispatch(pid, [] { os::sleep(Milliseconds(50)); });

  http::URL url = http::URL(
  "http",
  process::address().ip,
  process::address().port,
  "/__processes__");

  Future response = http::get(url);

  terminate(pid, true);
```



3rdparty/libprocess/src/tests/process_tests.cpp
Lines 2110-2112 (patched)
<https://reviews.apache.org/r/70595/#comment301417>

`AWAIT_EXPECT_RESPONSE_STATUS_EQ(http::OK().status, response);`


- Chun-Hung Hsiao


On May 3, 2019, 7:58 p.m., Benjamin Mahler wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70595/
> ---
> 
> (Updated May 3, 2019, 7:58 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9766
> https://issues.apache.org/jira/browse/MESOS-9766
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This test fails on master prior to applying the fix for MESOS-9766.
> It attempts to ensure that processes are terminated after the
> /__processes__ handler dispatches to them.
> 
> 
> Diffs
> -
> 
>   3rdparty/libprocess/src/tests/process_tests.cpp 
> 60f3dd653153c2b2ccf9c3a7eae8c75fd6ff025c 
> 
> 
> Diff: https://reviews.apache.org/r/70595/diff/1/
> 
> 
> Testing
> ---
> 
> Ran in repetition, although it appears to consistently fail on master without 
> repetition needed.
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>



Review Request 70596: Launched tasks with more memory in SLRP unit tests.

2019-05-03 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70596/
---

Review request for mesos, Benjamin Bannier and Jan Schlicht.


Bugs: MESOS-9765
https://issues.apache.org/jira/browse/MESOS-9765


Repository: mesos


Description
---

Raised the task memory to 128MB (which are the value used in most persistent
volume tests) in all SLRP tests that launch tasks to avoid OOM.


Diffs
-

  src/tests/storage_local_resource_provider_tests.cpp 
ecd7aeef1cb3d1a5b4b3419dfd912d41a8c6 


Diff: https://reviews.apache.org/r/70596/diff/1/


Testing
---

sudo make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70133: Removed unnecessary accept filters in SLRP tests.

2019-05-02 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70133/
---

(Updated May 2, 2019, 11:40 p.m.)


Review request for mesos, Benjamin Bannier and Meng Zhu.


Changes
---

Removed the accept filters in a newly-added test.


Repository: mesos


Description
---

Removed unnecessary accept filters in SLRP tests.


Diffs (updated)
-

  src/tests/storage_local_resource_provider_tests.cpp 
8bf4d2331b59770a7b7f3768499047bec2d67677 


Diff: https://reviews.apache.org/r/70133/diff/3/

Changes: https://reviews.apache.org/r/70133/diff/2-3/


Testing
---

`sudo make check`

Especially, tested that each of the three modified tests finishes in 5 seconds.


Thanks,

Chun-Hung Hsiao



Re: Review Request 69955: Added SLRP unit tests for destroying unpublished persistent volumes.

2019-05-02 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69955/
---

(Updated May 2, 2019, 10:46 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comments.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description
---

This patch adds 3 unit tests: `DestroyUnpublishedPersistentVolume`,
`DestroyUnpublishedPersistentVolumeWithRecovery`, and
`DestroyUnpublishedPersistentVolumeWithReboot` to test that the SLRP is
resilient to misbehaved CSI plugins that fail to publish volumes.


Diffs (updated)
-

  src/tests/storage_local_resource_provider_tests.cpp 
8bf4d2331b59770a7b7f3768499047bec2d67677 


Diff: https://reviews.apache.org/r/69955/diff/4/

Changes: https://reviews.apache.org/r/69955/diff/3-4/


Testing
---

make check

All tests passed 400 iterations (200 per parameter value) under stress.


Thanks,

Chun-Hung Hsiao



Re: Review Request 69954: Added a SLRP unit test for persistent block volume creation.

2019-05-02 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69954/
---

(Updated May 2, 2019, 10:44 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Addressed Benjamin's comment.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description
---

Test `CreateDestroyPersistentBlockVolume` verifies that SLRP would fail
a `CREATE` operation on a BLOCK disk resource, and a followup `DESTROY`
will be dropped (instead of failing the SLRP).


Diffs (updated)
-

  src/tests/mock_csi_plugin.cpp cbcb59f720642c2801f5b86c438793e24437b414 
  src/tests/storage_local_resource_provider_tests.cpp 
8bf4d2331b59770a7b7f3768499047bec2d67677 


Diff: https://reviews.apache.org/r/69954/diff/4/

Changes: https://reviews.apache.org/r/69954/diff/3-4/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 69954: Added a SLRP unit test for persistent block volume creation.

2019-05-02 Thread Chun-Hung Hsiao


> On April 16, 2019, 1:06 p.m., Benjamin Bannier wrote:
> > src/tests/storage_local_resource_provider_tests.cpp
> > Lines 3128 (patched)
> > <https://reviews.apache.org/r/69954/diff/3/?file=2138790#file2138790line3128>
> >
> > Are we checking that the resources get returned to the offer pool? In 
> > that case let's confirm that `offers` looks like what we would expect.
> > 
> > Alternatively we might not want to test this here and could just remove 
> > this and add a matching ignore when setting up the mock expectation above. 
> > That seems to make more sense to me.
> 
> Chun-Hung Hsiao wrote:
> It's more about making it deterministic, and also as part of the 
> end-to-end scenario. The contents of the offer has already been checked by 
> the matcher.
> 
> I could instead ignore the offer by doing a `Times(AtMost(1))`. Do you 
> think this would make the test simpler?

As discussed offline, the offer has been checked with the matcher set in the 
expectation.
Instead of using `AWAIT_READY`, we can use `AWAIT_EXPECT_READY` with logging 
here.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69954/#review214689
-------


On April 12, 2019, 3:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69954/
> ---
> 
> (Updated April 12, 2019, 3:15 a.m.)
> 
> 
> Review request for mesos and Benjamin Bannier.
> 
> 
> Bugs: MESOS-9565
> https://issues.apache.org/jira/browse/MESOS-9565
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Test `CreateDestroyPersistentBlockVolume` verifies that SLRP would fail
> a `CREATE` operation on a BLOCK disk resource, and a followup `DESTROY`
> will be dropped (instead of failing the SLRP).
> 
> 
> Diffs
> -
> 
>   src/tests/mock_csi_plugin.cpp cbcb59f720642c2801f5b86c438793e24437b414 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 8bf4d2331b59770a7b7f3768499047bec2d67677 
> 
> 
> Diff: https://reviews.apache.org/r/69954/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70431: Made the `RetryRpcWithExponentialBackoff` SLRP test work with CSI v1.

2019-05-02 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70431/
---

(Updated May 2, 2019, 10:42 p.m.)


Review request for mesos, Benjamin Bannier and Jan Schlicht.


Changes
---

Addressed Benjamin's comments.


Bugs: MESOS-9627
https://issues.apache.org/jira/browse/MESOS-9627


Repository: mesos


Description
---

This patch enables the unit test to test against CSI v1 through the
following changes:

  * The forwarding mode of the test CSI plugin now respects the
`--api_version` option. When specified, only requests of the proper
CSI version will be forwarded.

  * The expectations of `CreateVolume` and `DeleteVolume` calls in the
unit tests are parameterized against the CSI version string.

  * The mock CSI plugin now provides a default implementation for the
`GetCapacity` call so the unit test can be simplified.


Diffs (updated)
-

  src/examples/test_csi_plugin.cpp b54d666a878576f5845027fe6f704e10195079e1 
  src/tests/mock_csi_plugin.cpp cbcb59f720642c2801f5b86c438793e24437b414 
  src/tests/storage_local_resource_provider_tests.cpp 
8bf4d2331b59770a7b7f3768499047bec2d67677 


Diff: https://reviews.apache.org/r/70431/diff/2/

Changes: https://reviews.apache.org/r/70431/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70521: Renamed variables in `Master::_accept` to improve readability.

2019-05-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70521/
---

(Updated May 1, 2019, 8:36 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Rebased.


Repository: mesos


Description
---

Renamed variables in `Master::_accept` to improve readability.


Diffs (updated)
-

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 


Diff: https://reviews.apache.org/r/70521/diff/4/

Changes: https://reviews.apache.org/r/70521/diff/3-4/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-05-01 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
---

(Updated May 1, 2019, 8:35 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Addressed BenM's comments.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description (updated)
---

Currently if a framework accepts an offer with a `RESERVE` operation
without a task consuming the reserved resources, the resources will be
implicitly declined. This is counter to what one would expect (that only
the remaining resources in the offer will be declined):

  Offer `cpus:10` -> `ACCEPT` with `RESERVE cpus(role):1`
 *Actual* implicit decline: `cpus:9;cpus(role):1`
 *Expected* implicit decline: `cpus:9`

The same issue is present with other transformational operations (i.e.,
`UNRESERVE`, `CREATE` and `DESTROY`).

This patch fixes this issue by only implicitly declining the "remaining"
untransformed resources, computed as follows:

  Offered = `cpus:10`
  Remaining = `cpus:9;cpus(role):1`
  Implicitly declined = remaining - (remaining - offered)
  = `cpus:9;cpus(role):1` - `cpus:(role):1`
  = `cpus:9`


Diffs (updated)
-

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/8/

Changes: https://reviews.apache.org/r/70132/diff/7-8/


Testing
---

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70537: Added unit tests for implicit decline.

2019-04-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70537/
---

(Updated April 25, 2019, 10:34 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Reordered test cases to match the documentation.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
---

Added unit tests for implicit decline.


Diffs (updated)
-

  src/tests/scheduler_tests.cpp e0ed02900330c678bbf5c609c1f45d05147851ed 


Diff: https://reviews.apache.org/r/70537/diff/2/

Changes: https://reviews.apache.org/r/70537/diff/1-2/


Testing
---

make check

Tests `ImplicitDecline3` and `ImplicitDecline5` will fail w/o r/70132.


Thanks,

Chun-Hung Hsiao



Review Request 70547: Documented `DECLINE` and added examples for implicit decline.

2019-04-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70547/
---

Review request for mesos.


Repository: mesos


Description
---

Documented `DECLINE` and added examples for implicit decline.


Diffs
-

  docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 


Diff: https://reviews.apache.org/r/70547/diff/1/


Testing
---

N/A


Thanks,

Chun-Hung Hsiao



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-25 Thread Chun-Hung Hsiao


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140649#file2140649line132>
> >
> > What do you think of getting rid of "implicitly declined" behavior for 
> > "cancelling operations"?
> > 
> > It seems that behavior is more driven by the implementation than 
> > intuitive api behavior; it e.g., forces frameworks to reason differently 
> > about operations executed in isolation vs. executed together. It seems 
> > having the identical behavior for both cases would both be easier to 
> > explain and also program against. The behavior that seems to make most 
> > sense for me would be to only ever implictly decline "untouched resources", 
> > e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && 
> > UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.
> 
> Chun-Hung Hsiao wrote:
> It seems to me that "cancelling operations" as something that are both 1. 
> very rare and 2. make little sense for frameworks, so I'm more like 
> delivering a fix for common cases without making the alrealy-messy code path 
> more complicated. WDYT? Also @bmahler what's your opinion on @bbannier's 
> suggestion? IIRC you mentioned something like some are designed behaviors 
> before, but I didn't know the context.
> 
> Benjamin Mahler wrote:
> Thanks for bringing this up, it's certainly a bit bizarre of a use case. 
> I think the more common case is UNRESERVE on its own, where it still seems a 
> bit bizarre that the "untouched" resources are declined with the filter and 
> the UNRESERVE resources are not filtered. That seems a bit arbitrary to me, 
> but I'm not sure what to do about it without allowing the framework to be 
> explicit about which part it wants to "decline and filter" when accepting, 
> and this requires an interface change.
> 
> Personally I would consider RESERVE+UNRESERVE to be "touching" those 
> resources, but I don't think we should worry about it in this patch (I assume 
> that wasn't your intent anyway, and you were more wanting to raise this topic 
> for discussion?)
> 
> Benjamin Bannier wrote:
> What I worry most is that this edge case makes explaining suggested 
> framework behavior harder ("should any of the offer operations in a single 
> accept call cancel each other out you will not get offered the resources 
> again until the default offer filter timeout expires (the timeout isn't up to 
> you here)" -> framework defensively revives after each accept call if it has 
> more work to do). Instead we would like frameworks to focus on getting their 
> offer handling and decline behavior correct and only ever revive in 
> exceptional scenarios (e.g., even "_new_ work arrived").
> 
> Since this patch tries to fix incorrect master behavior we should make 
> sure to get the behavior somewhat right or else risk that frameworks 
> implement suboptimal behavior which will be hard to unlearn. That being said, 
> the fact that no framework author complained when this bug was introduced 
> makes me worry that they either do not care about how fast offers arrive or 
> already implement a overly pessimistc approach (e.g., revive whenever there 
> is more work to do in their state machine).

The timeout is still controlled by the scheduler even in the case where 
operations cancel each other out.

We're trying to move to a world where frameworks are more expilict about what 
filters they want to set up. With this in mind, it seems to me that neither the 
current fix nor what you suggested fits into this goal well, because here we're 
*guessing* what the frameworks' intentions. Say if a framework reserves two 
CPUs and unreserves the two reserved CPUs and then reserves two CPUs again, 
there is no way to infer if the framework is trying to use just two CPUs or 
four CPUs. This could become really messy in terms of both semantics and 
implementation.

Since as you said we're not aware of any complain about this, I'd say let's 
keep the logic simple and determine the declined resources based on the end 
result of an `ACCEPT` call. Dropping this for now.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
---


On April 25, 2019, 10:14 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.

Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-25 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
---

(Updated April 25, 2019, 10:14 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Moved documentation to r/70547.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
---

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/7/

Changes: https://reviews.apache.org/r/70132/diff/6-7/


Testing
---

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70521: Renamed variables in `Master::_accept` to improve readability.

2019-04-23 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70521/
---

(Updated April 24, 2019, 2:04 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Rebased.


Repository: mesos


Description
---

Renamed variables in `Master::_accept` to improve readability.


Diffs (updated)
-

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 


Diff: https://reviews.apache.org/r/70521/diff/2/

Changes: https://reviews.apache.org/r/70521/diff/1-2/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 70537: Added unit tests for implicit decline.

2019-04-23 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70537/
---

Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
---

Added unit tests for implicit decline.


Diffs
-

  src/tests/scheduler_tests.cpp e0ed02900330c678bbf5c609c1f45d05147851ed 


Diff: https://reviews.apache.org/r/70537/diff/1/


Testing
---

make check

Tests `ImplicitDecline3` and `ImplicitDecline5` will fail w/o r/70132.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
---

(Updated April 24, 2019, 2:02 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Addressed some of Benjamin's comments and added examples.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description
---

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-

  docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/6/

Changes: https://reviews.apache.org/r/70132/diff/5-6/


Testing (updated)
---

make check

More testing done in r/70537.


Thanks,

Chun-Hung Hsiao



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Chun-Hung Hsiao


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/tests/slave_tests.cpp
> > Lines 6499 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140651#file2140651line6499>
> >
> > Since the changes in this patch are strongly related to behavior 
> > framework authors need to reason about I strongly feel that we must add a 
> > test for the expected behavior.
> 
> Chun-Hung Hsiao wrote:
> I could add a unit test in a separated patch. This patch itself will be 
> backported, after discussed with @bmahler.

Done in r/70537.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
-------


On April 23, 2019, 1:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 23, 2019, 1:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-23 Thread Chun-Hung Hsiao


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > docs/scheduler-http-api.md
> > Line 132 (original), 132 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140649#file2140649line132>
> >
> > What do you think of getting rid of "implicitly declined" behavior for 
> > "cancelling operations"?
> > 
> > It seems that behavior is more driven by the implementation than 
> > intuitive api behavior; it e.g., forces frameworks to reason differently 
> > about operations executed in isolation vs. executed together. It seems 
> > having the identical behavior for both cases would both be easier to 
> > explain and also program against. The behavior that seems to make most 
> > sense for me would be to only ever implictly decline "untouched resources", 
> > e.g., if accepting offered `cpus:4` with `RESERVE(cpus:2, role) && 
> > UNRESERVE(cpus:2, role)` we would implicitly decline only `cpus:2`.

It seems to me that "cancelling operations" as something that are both 1. very 
rare and 2. make little sense for frameworks, so I'm more like delivering a fix 
for common cases without making the alrealy-messy code path more complicated. 
WDYT? Also @bmahler what's your opinion on @bbannier's suggestion? IIRC you 
mentioned something like some are designed behaviors before, but I didn't know 
the context.


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/master/master.cpp
> > Lines 5963-5964 (original), 5983-5984 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140650#file2140650line5983>
> >
> > Is this a workaround we need until MESOS-4553 gets resolved? If it is, 
> > let's add a `TODO`.

I don't know actually lol. I just copied it from 
https://github.com/apache/mesos/blob/45c9788618e7123f408a1dffcf6772a1285cd2e5/src/master/master.cpp#L10969-L10972,
 as @mzhu suggested that if there's an allocation in between there might be 
offer fragmentation. Is this a workaround for MESOS-4553?


> On April 23, 2019, 10:47 a.m., Benjamin Bannier wrote:
> > src/tests/slave_tests.cpp
> > Lines 6499 (patched)
> > <https://reviews.apache.org/r/70132/diff/5/?file=2140651#file2140651line6499>
> >
> > Since the changes in this patch are strongly related to behavior 
> > framework authors need to reason about I strongly feel that we must add a 
> > test for the expected behavior.

I could add a unit test in a separated patch. This patch itself will be 
backported, after discussed with @bmahler.


- Chun-Hung


-------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/#review214812
---


On April 23, 2019, 1:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70132/
> ---
> 
> (Updated April 23, 2019, 1:15 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.
> 
> 
> Bugs: MESOS-9616
> https://issues.apache.org/jira/browse/MESOS-9616
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Currently if a framework accepts an offer to perform pipelined
> operations, e.g., reserving resource, without a final consumer, the
> converted resources will be implicitly declined. This is an undesired
> behavior as the framework might want to reserve one resource first but
> launch a task later in the next allocation cycle. This patch fixes this
> behavior.
> 
> But, if the framework accepts an offers with multiple operations that
> cancel out each other, the resources consumed by these operations are
> still considered unused and will be declined.
> 
> 
> Diffs
> -
> 
>   docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
>   src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70132/diff/5/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Review Request 70521: Renamed variables in `Master::_accept` to improve readability.

2019-04-22 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70521/
---

Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Repository: mesos


Description
---

Renamed variables in `Master::_accept` to improve readability.


Diffs
-

  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 


Diff: https://reviews.apache.org/r/70521/diff/1/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70132: Do not implicitly decline speculatively converted resources.

2019-04-22 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70132/
---

(Updated April 23, 2019, 1:15 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Meng Zhu.


Changes
---

Updated variable names and comments to make it easier to understand.


Summary (updated)
-

Do not implicitly decline speculatively converted resources.


Bugs: MESOS-9616
https://issues.apache.org/jira/browse/MESOS-9616


Repository: mesos


Description (updated)
---

Currently if a framework accepts an offer to perform pipelined
operations, e.g., reserving resource, without a final consumer, the
converted resources will be implicitly declined. This is an undesired
behavior as the framework might want to reserve one resource first but
launch a task later in the next allocation cycle. This patch fixes this
behavior.

But, if the framework accepts an offers with multiple operations that
cancel out each other, the resources consumed by these operations are
still considered unused and will be declined.


Diffs (updated)
-

  docs/scheduler-http-api.md a5327c229142267836f327f9c382ef50b7e334db 
  src/master/master.cpp ad54ae217863a08f4e6d743b39c176b171353084 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70132/diff/5/

Changes: https://reviews.apache.org/r/70132/diff/4-5/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 65085: WIP: Added a unit test for fd leakage with blocking HTTP calls.

2019-04-16 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65085/
---

(Updated April 16, 2019, 10 p.m.)


Review request for Benjamin Bannier.


Bugs: MESOS-8428
https://issues.apache.org/jira/browse/MESOS-8428


Repository: mesos


Description
---

When running `ContainerDaemonTest.FileDescriptorLeakage` in repetition,
the test will use up all fds.


Diffs (updated)
-

  src/tests/container_daemon_tests.cpp 1396f25bf7aa9fbc62af9111ad412238faff432e 


Diff: https://reviews.apache.org/r/65085/diff/2/

Changes: https://reviews.apache.org/r/65085/diff/1-2/


Testing
---


Thanks,

Chun-Hung Hsiao



Re: Review Request 69954: Added a SLRP unit test for persistent block volume creation.

2019-04-16 Thread Chun-Hung Hsiao


> On April 16, 2019, 1:06 p.m., Benjamin Bannier wrote:
> > src/tests/storage_local_resource_provider_tests.cpp
> > Lines 3128 (patched)
> > <https://reviews.apache.org/r/69954/diff/3/?file=2138790#file2138790line3128>
> >
> > Are we checking that the resources get returned to the offer pool? In 
> > that case let's confirm that `offers` looks like what we would expect.
> > 
> > Alternatively we might not want to test this here and could just remove 
> > this and add a matching ignore when setting up the mock expectation above. 
> > That seems to make more sense to me.

It's more about making it deterministic, and also as part of the end-to-end 
scenario. The contents of the offer has already been checked by the matcher.

I could instead ignore the offer by doing a `Times(AtMost(1))`. Do you think 
this would make the test simpler?


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69954/#review214689
-------


On April 12, 2019, 3:15 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69954/
> ---
> 
> (Updated April 12, 2019, 3:15 a.m.)
> 
> 
> Review request for mesos and Benjamin Bannier.
> 
> 
> Bugs: MESOS-9565
> https://issues.apache.org/jira/browse/MESOS-9565
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Test `CreateDestroyPersistentBlockVolume` verifies that SLRP would fail
> a `CREATE` operation on a BLOCK disk resource, and a followup `DESTROY`
> will be dropped (instead of failing the SLRP).
> 
> 
> Diffs
> -
> 
>   src/tests/mock_csi_plugin.cpp cbcb59f720642c2801f5b86c438793e24437b414 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 8bf4d2331b59770a7b7f3768499047bec2d67677 
> 
> 
> Diff: https://reviews.apache.org/r/69954/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 69955: Added SLRP unit tests for destroying unpublished persistent volumes.

2019-04-16 Thread Chun-Hung Hsiao


> On April 16, 2019, 1:18 p.m., Benjamin Bannier wrote:
> > src/tests/storage_local_resource_provider_tests.cpp
> > Lines 3328 (patched)
> > <https://reviews.apache.org/r/69955/diff/3/?file=2139099#file2139099line3328>
> >
> > If we need to await offers, lets also check their contents. It seems 
> > checking the operation status would already be enough.

This would eliminate the nondeterminism of the times of `resourceOffers` being 
called.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69955/#review214690
---


On April 12, 2019, 10:02 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69955/
> ---
> 
> (Updated April 12, 2019, 10:02 p.m.)
> 
> 
> Review request for mesos and Benjamin Bannier.
> 
> 
> Bugs: MESOS-9565
> https://issues.apache.org/jira/browse/MESOS-9565
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> This patch adds 3 unit tests: `DestroyUnpublishedPersistentVolume`,
> `DestroyUnpublishedPersistentVolumeWithRecovery`, and
> `DestroyUnpublishedPersistentVolumeWithReboot` to test that the SLRP is
> resilient to misbehaved CSI plugins that fail to publish volumes.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 8bf4d2331b59770a7b7f3768499047bec2d67677 
> 
> 
> Diff: https://reviews.apache.org/r/69955/diff/3/
> 
> 
> Testing
> ---
> 
> make check
> 
> All tests passed 400 iterations (200 per parameter value) under stress.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Review Request 70481: Added an image compatibility note for UCR.

2019-04-15 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70481/
---

Review request for mesos and Gilbert Song.


Bugs: MESOS-7617
https://issues.apache.org/jira/browse/MESOS-7617


Repository: mesos


Description
---

Added an image compatibility note for UCR.


Diffs
-

  docs/isolators/docker-runtime.md a6272b4c83401d3c2cf88333f337b55b147fd083 


Diff: https://reviews.apache.org/r/70481/diff/1/


Testing
---

N/A


Thanks,

Chun-Hung Hsiao



Re: Review Request 69955: Added SLRP unit tests for destroying unpublished persistent volumes.

2019-04-12 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69955/
---

(Updated April 12, 2019, 10:02 p.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Rebased and parameterized the tests.


Summary (updated)
-

Added SLRP unit tests for destroying unpublished persistent volumes.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description
---

This patch adds 3 unit tests: `DestroyUnpublishedPersistentVolume`,
`DestroyUnpublishedPersistentVolumeWithRecovery`, and
`DestroyUnpublishedPersistentVolumeWithReboot` to test that the SLRP is
resilient to misbehaved CSI plugins that fail to publish volumes.


Diffs (updated)
-

  src/tests/storage_local_resource_provider_tests.cpp 
8bf4d2331b59770a7b7f3768499047bec2d67677 


Diff: https://reviews.apache.org/r/69955/diff/3/

Changes: https://reviews.apache.org/r/69955/diff/2-3/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Review Request 70468: Fixed crash when recovering a volume failed to publish with CSI v1.

2019-04-12 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70468/
---

Review request for mesos and Benjamin Bannier.


Bugs: MESOS-9729
https://issues.apache.org/jira/browse/MESOS-9729


Repository: mesos


Description
---

The CSI v1 volume manager falsely assumed that the target path always
exists when unpublishing a volume, which is not true if there is a
failure when publishing the volume.


Diffs
-

  src/csi/v1_volume_manager.cpp bf640f9ee836f6d8cc6bb2364f2cdeddc51a7cf3 


Diff: https://reviews.apache.org/r/70468/diff/1/


Testing
---

make check
Testing of this bug is done later in the chain.


Thanks,

Chun-Hung Hsiao



Re: Review Request 69954: Added a SLRP unit test for persistent block volume creation.

2019-04-11 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69954/
---

(Updated April 12, 2019, 3:15 a.m.)


Review request for mesos and Benjamin Bannier.


Changes
---

Rebased and fixed a test flake.


Bugs: MESOS-9565
https://issues.apache.org/jira/browse/MESOS-9565


Repository: mesos


Description
---

Test `CreateDestroyPersistentBlockVolume` verifies that SLRP would fail
a `CREATE` operation on a BLOCK disk resource, and a followup `DESTROY`
will be dropped (instead of failing the SLRP).


Diffs (updated)
-

  src/tests/mock_csi_plugin.cpp cbcb59f720642c2801f5b86c438793e24437b414 
  src/tests/storage_local_resource_provider_tests.cpp 
8bf4d2331b59770a7b7f3768499047bec2d67677 


Diff: https://reviews.apache.org/r/69954/diff/3/

Changes: https://reviews.apache.org/r/69954/diff/2-3/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 69954: Added a SLRP unit test for persistent block volume creation.

2019-04-11 Thread Chun-Hung Hsiao
(allocated: storage):1024; disk(allocated: storage):1024; 
> > ports(allocated: storage):[31000-32000]; disk(allocated: 
> > storage)(reservations: 
> > [(DYNAMIC,storage)])[BLOCK(org.apache.mesos.csi.test.local»
> > I0305 10:39:55.772673 58949 master.cpp:12085] Removing offer 
> > 62009208-e484-499c-ac45-09c08ef1cf4d-O3
> > 
> > GMOCK WARNING:
> > Uninteresting mock function call - returning directly.
> > Function call: offerRescinded(0x7ffd5e523d20, @0x7f8874003458 
> > 62009208-e484-499c-ac45-09c08ef1cf4d-O3)
> > NOTE: You can safely ignore the above warning unless this call should not 
> > happen.  Do not suppress it by blindly adding an EXPECT_CALL() if you don't 
> > mean to enforce the call.  See 
> > https://github.com/google/googletest/blob/master/googlemock/docs/CookBook.md#knowing-when-to-expect
> >  for details.
> > I0305 10:39:55.784617 58960 master.cpp:9865] Sending offers [ 
> > 62009208-e484-499c-ac45-09c08ef1cf4d-O4 ] to framework 
> > 62009208-e484-499c-ac45-09c08ef1cf4d- (default) at 
> > scheduler-e112e00f-8102-4bc0-9613-0cea89dd77d6@66.70.182.167:41287
> > ../src/tests/storage_local_resource_provider_tests.cpp:2887: Failure
> > Mock function called more times than expected - returning directly.
> > Function call: resourceOffers(0x7ffd5e523d20, @0x7f897275f2c8 { 
> > 160-byte object  > 00-00 00-00 00-00 00-00 00-00 00-00 00-00 05-00 00-00 05-00 00-00 40-96 
> > 03-90 88-7F 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 ... F0-DB 
> > 01-90 88-7F 00-00 70-DA 01-90 88-7F 00-00 60-AC 02-90 8»
> >  Expected: to be called once
> >Actual: called twice - over-saturated and active
> > *** Aborted at 1551778795 (unix time) try "date -d @1551778795" if you are 
> > using GNU date ***
> > ```
> > 
> > We should also take care to potentially expect `offerRescinded`.

Okay I understand what happened here now.

Since the `CREATE` and `DESTROY` function cancel each other out, the net effect 
of the last `ACCEPT` call was a no-op, which caused an implicit decline and 
then the block volume was re-offered after the default refuse timeout. However, 
SLRP failed the `CREATE` operation after the block volume was re-offered. Since 
an speculative operation failed, an `UPDATE_STATE` was triggered, which in turn 
rescinded the offer.

Since the purpose of this test is to verify that creating a PV on a block disk 
would fail, let me just remove the `DESTROY` part, which would avoid this 
scenario at all.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69954/#review213431
---


On March 4, 2019, 7:23 p.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69954/
> -------
> 
> (Updated March 4, 2019, 7:23 p.m.)
> 
> 
> Review request for mesos and Benjamin Bannier.
> 
> 
> Bugs: MESOS-9565
> https://issues.apache.org/jira/browse/MESOS-9565
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Test `CreateDestroyPersistentBlockVolume` verifies that SLRP would fail
> a `CREATE` operation on a BLOCK disk resource, and a followup `DESTROY`
> will be dropped (instead of failing the SLRP).
> 
> 
> Diffs
> -
> 
>   src/tests/mock_csi_plugin.cpp 10245705ab39872da0fef1afd02213e2c7f345cb 
>   src/tests/storage_local_resource_provider_tests.cpp 
> a661951a0a326cc342aa0c45dd0967692ae70941 
> 
> 
> Diff: https://reviews.apache.org/r/69954/diff/2/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70184: Fixed flakiness in 'RetryRpcWithExponentialBackoff'.

2019-04-11 Thread Chun-Hung Hsiao


> On March 27, 2019, 5:10 p.m., Chun-Hung Hsiao wrote:
> > src/tests/storage_local_resource_provider_tests.cpp
> > Lines 4926 (patched)
> > <https://reviews.apache.org/r/70184/diff/1/?file=2130933#file2130933line4926>
> >
> > I'm not sure if this is the most appropriate fix before understanding 
> > why the offer was filtered. Can you elaborate more on why this is needed?

Okay I think I now understand what happened here.

It was not because that the offer filter prevented the framework from getting 
the offer. No filter is set to refuse the newly created volume.

What actually happened was that, after the master received 
`OPERATION_FINISHED`, it would call `allocator->updateAllocation(...)` and 
`allocator->recoverResources(...)`:
https://github.com/apache/mesos/blob/cb4c9660eecae59bc2acbeb37ab19214f7b0e19b/src/master/master.cpp#L11962-L11972
Since these steps were done asynchronously, they raced with the clock advanced 
that triggered the batch allocation after L4926 in this test.
So in the failed run, the batch allocation was done before the resource (i.e., 
the created volume) was recovered, so the offer allocated to the framework did 
not have any "new" resource and thus filtered.
This can be reproduced by adding an `os::sleep(Milliseconds(10))` right before 
`allocator->updateAllocation(...)` above.

Adding a `REVIVE` call does the trick, because the revive call, which was 
processed sequentially *after* `OPERATION_FINISHED`, would asynchronuosly queue 
up an event-based allocation after the above two allocator calls.
So even if the race happens, it ensures that another allocation would be 
triggered after the created volume is recovered.
However it seems to me that a more proper fix is the following:
```
   EXPECT_EQ(OPERATION_FINISHED, updateOperationStatus->status().state());
-  // Advance the clock to trigger a batch allocation.
+  // Settle the clock to recover the created disk then advance the clock to
+  // trigger a batch allocation.
+  Clock::settle();
   Clock::advance(masterFlags.allocation_interval);

   AWAIT_READY(offers);
```


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70184/#review214121
---


On March 11, 2019, 3:39 p.m., Jan Schlicht wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70184/
> ---
> 
> (Updated March 11, 2019, 3:39 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9594
> https://issues.apache.org/jira/browse/MESOS-9594
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> Under some circumstances, offers would be filtered, resulting in the
> test being stuck while waiting for offers. This has been resolved by
> removing all offer filters before waiting for offers.
> 
> 
> Diffs
> -
> 
>   src/tests/storage_local_resource_provider_tests.cpp 
> 7945384867f26fa15dc734a235ae509d5d6d350f 
> 
> 
> Diff: https://reviews.apache.org/r/70184/diff/1/
> 
> 
> Testing
> ---
> 
> make check
> 
> I could not reproduce the flaky behavior of this test case, hence only assume 
> that this patch resolves the flakiness.
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>



Re: Review Request 70449: Avoid publishing resources when an HTTP executor resubscribes.

2019-04-11 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70449/
---

(Updated April 11, 2019, 3:55 p.m.)


Review request for mesos, Benjamin Bannier, Greg Mann, and Vinod Kone.


Changes
---

Fixed a typo.


Bugs: MESOS-9711
https://issues.apache.org/jira/browse/MESOS-9711


Repository: mesos


Description
---

After an agent failover, an HTTP executor may resubscribe before any
resource provider resubscribes. If that happens and the executor
has tasks consuming resources from an unsubscribed resource provider,
the agent will fail to publish the resources and kill the executor,
which is an undesired behavior. The patch fixes this issue.


Diffs (updated)
-

  src/slave/slave.cpp 794a9c986b266b4916f6fbada670142798245bd1 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70449/diff/5/

Changes: https://reviews.apache.org/r/70449/diff/4-5/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



Re: Review Request 70449: Avoid publishing resources when an HTTP executor resubscribes.

2019-04-11 Thread Chun-Hung Hsiao


> On April 11, 2019, 3:35 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp
> > Lines 5033 (patched)
> > <https://reviews.apache.org/r/70449/diff/4/?file=2138629#file2138629line5033>
> >
> > s/updated/update/

Oops thanks.


- Chun-Hung


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70449/#review214586
---


On April 11, 2019, 1:30 a.m., Chun-Hung Hsiao wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70449/
> ---
> 
> (Updated April 11, 2019, 1:30 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Greg Mann, and Vinod Kone.
> 
> 
> Bugs: MESOS-9711
> https://issues.apache.org/jira/browse/MESOS-9711
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> After an agent failover, an HTTP executor may resubscribe before any
> resource provider resubscribes. If that happens and the executor
> has tasks consuming resources from an unsubscribed resource provider,
> the agent will fail to publish the resources and kill the executor,
> which is an undesired behavior. The patch fixes this issue.
> 
> 
> Diffs
> -
> 
>   src/slave/slave.cpp 794a9c986b266b4916f6fbada670142798245bd1 
>   src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 
> 
> 
> Diff: https://reviews.apache.org/r/70449/diff/4/
> 
> 
> Testing
> ---
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>



Re: Review Request 70454: Fixed potential use-after-free bug in storage local resource provider.

2019-04-11 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70454/#review214587
---


Ship it!




Ship It!

- Chun-Hung Hsiao


On April 11, 2019, 12:55 p.m., Benjamin Bannier wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70454/
> ---
> 
> (Updated April 11, 2019, 12:55 p.m.)
> 
> 
> Review request for mesos and Chun-Hung Hsiao.
> 
> 
> Bugs: MESOS-9711
> https://issues.apache.org/jira/browse/MESOS-9711
> 
> 
> Repository: mesos
> 
> 
> Description
> ---
> 
> The storage local resource provider manages a metrics instance which is
> shared with the service and volume managers it also holds; these
> services do not manage the lifetime of the metrics instance.
> 
> This patch fixes the lifetime of the metrics instance.
> 
> 
> Diffs
> -
> 
>   src/resource_provider/storage/provider.cpp 
> b2ca5d0735cbd11b8a9689359e05d6bd723b8a64 
> 
> 
> Diff: https://reviews.apache.org/r/70454/diff/1/
> 
> 
> Testing
> ---
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>



Re: Review Request 70449: Avoid publishing resources when an HTTP executor resubscribes.

2019-04-10 Thread Chun-Hung Hsiao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70449/
---

(Updated April 11, 2019, 1:30 a.m.)


Review request for mesos, Benjamin Bannier, Greg Mann, and Vinod Kone.


Changes
---

Made the test fail without the fix.


Bugs: MESOS-9711
https://issues.apache.org/jira/browse/MESOS-9711


Repository: mesos


Description
---

After an agent failover, an HTTP executor may resubscribe before any
resource provider resubscribes. If that happens and the executor
has tasks consuming resources from an unsubscribed resource provider,
the agent will fail to publish the resources and kill the executor,
which is an undesired behavior. The patch fixes this issue.


Diffs (updated)
-

  src/slave/slave.cpp 794a9c986b266b4916f6fbada670142798245bd1 
  src/tests/slave_tests.cpp b1c3a01031b917fb9773c8c890a8f88838870559 


Diff: https://reviews.apache.org/r/70449/diff/4/

Changes: https://reviews.apache.org/r/70449/diff/3-4/


Testing
---

make check


Thanks,

Chun-Hung Hsiao



  1   2   3   4   5   6   7   8   9   10   >