Re: Review Request 66568: Dropped combined operations with GROW and SHRINK volumes.

Zhitao Li Tue, 24 Apr 2018 23:04:17 -0700


> On April 23, 2018, 3:43 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Line 4512 (original), 4515-4527 (patched)
> > <https://reviews.apache.org/r/66568/diff/5/?file=2010510#file2010510line4515>
> >
> >     I was just wondering - perhaps this code belongs in the validation 
> > function for the ACCEPT call? Since this is something that we can easily 
> > confirm synchronously in the top-level scheduler API handler and return 
> > 400, rather than calling into the accept handlers.
> 
> Zhitao Li wrote:
>     I'm fine in either case as long as it provides better consistency and 
> less code duplication (which I think is a problem). I guess the reason there 
> is a split is due to that AuthZ is async right now.
>     
>     I actualy wonder what happens if we move the slave connectivity checks to 
> `Master::accept()`: since master to agent connection through network can be 
> broken at any moment, I do not see checking that in `_accept()` is 100% safe 
> anyway, and `send()` calls to agent need to handle failures anyway.
>     
>     Alternatively, we can consider moving authorization checks as early as we 
> can, and condense all other validations after AuthZ returned (assuming AuthZ 
> do not depends on these validations).
> 
> Zhitao Li wrote:
>     (replied too fast) I feel both work can reduce the code duplicate, and 
> makes it less ambiguous for future development.
> 
> Chun-Hung Hsiao wrote:
>     IMO in general it's better to have validations before authN/authZ, 
> because validations is usually more light-weight than authN/authZ. Also, 
> validations should be synchronous, meaning that if the call is invalid, we 
> will get rid of it faster from the memory (rather than waiting for the 
> authN/authZ to finish).
> 
> Chun-Hung Hsiao wrote:
>     But in this case since this is temporary, I'm fine with putting the logic 
> here.
> 
> Chun-Hung Hsiao wrote:
>     Question: Are these operations usable with this restriction? It seems to 
> me that with this restriction, the framework will need to maintain one more 
> extra state: 1. reserve enough CPUs, mem and free disk (otherwise it is 
> possible that the framework cannot get enough CPUs and mem after growing the 
> volume); 2. grow the persistent volume; 3. use the grown volume. If we 
> implement a proper validation right now, 1 and 2 can be combined, and there 
> will be no need to rewrite the framework in the future. Can you explain again 
> what the concern of implementing the correct validation logic again?

Re: order of valiation vs AuthZ: I agree they should be consistency, but we 
already have splits between `accept()` and `_accept()` and framework could see 
operations dropped either before or after AuthZ too. Given that 1) AuthZ will 
be async by nature and 2) certain states could change between `accept()` and 
`_accept()` (offer validaty, agent connectivity), I feel that only check AuthZ 
on `accept()` but delay any actual validation to `_accept()` results in most 
consistent behavior. Indeed this could increase calls to authorizor but I don't 
know whether that can be quantified.

Re: usability of framework: I generally believe that framework always needs to 
model each state to handle corner cases even if it uses chained operations 
(because that can have partial success). That's also why I hope to eventually 
not support chained operations at all. In the final form of the code, whatever 
"correct validation logic" we write now will not be necessary, and we can 
implicitly rely on `offeredResources` containment checks chained operations 
(because `converted` resources will not be added back immediately). 
Implementing this right now would require us to add extra variables like 
`observedOfferedResources` and `actualOfferedResources` and check them on each 
`case` of operations, which is not be as clean or as local as current patch 
(which also easier to back out).

- Zhitao

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66568/#review201785
-----------------------------------------------------------

On April 23, 2018, 1:07 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66568/
> -----------------------------------------------------------
> 
> (Updated April 23, 2018, 1:07 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-4965
>     https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> These two operations are intended to be non-speculative eventually but
> are implemented as speculative right now. To avoid frameworks opt-in to
> dangerous behavior, we require that accept can only contain one
> `GROW_VOLUME` or `SHRINK_VOLUME` and no other operations.
> 
> This is implemented by reuse existing code which already drops `LAUNCH`
> or `LAUNCH_GROUP` with proper reason and message.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 767ad8cfe142b47ef07172bcb2a4fb49fc3e833a 
> 
> 
> Diff: https://reviews.apache.org/r/66568/diff/5/
> 
> 
> Testing
> -------
> 
> Behavior tested in https://reviews.apache.org/r/66569/.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>

Re: Review Request 66568: Dropped combined operations with GROW and SHRINK volumes.

Reply via email to