> On April 23, 2018, 3:43 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Line 4512 (original), 4515-4527 (patched)
> > <https://reviews.apache.org/r/66568/diff/5/?file=2010510#file2010510line4515>
> >
> >     I was just wondering - perhaps this code belongs in the validation 
> > function for the ACCEPT call? Since this is something that we can easily 
> > confirm synchronously in the top-level scheduler API handler and return 
> > 400, rather than calling into the accept handlers.
> 
> Zhitao Li wrote:
>     I'm fine in either case as long as it provides better consistency and 
> less code duplication (which I think is a problem). I guess the reason there 
> is a split is due to that AuthZ is async right now.
>     
>     I actualy wonder what happens if we move the slave connectivity checks to 
> `Master::accept()`: since master to agent connection through network can be 
> broken at any moment, I do not see checking that in `_accept()` is 100% safe 
> anyway, and `send()` calls to agent need to handle failures anyway.
>     
>     Alternatively, we can consider moving authorization checks as early as we 
> can, and condense all other validations after AuthZ returned (assuming AuthZ 
> do not depends on these validations).
> 
> Zhitao Li wrote:
>     (replied too fast) I feel both work can reduce the code duplicate, and 
> makes it less ambiguous for future development.
> 
> Chun-Hung Hsiao wrote:
>     IMO in general it's better to have validations before authN/authZ, 
> because validations is usually more light-weight than authN/authZ. Also, 
> validations should be synchronous, meaning that if the call is invalid, we 
> will get rid of it faster from the memory (rather than waiting for the 
> authN/authZ to finish).
> 
> Chun-Hung Hsiao wrote:
>     But in this case since this is temporary, I'm fine with putting the logic 
> here.
> 
> Chun-Hung Hsiao wrote:
>     Question: Are these operations usable with this restriction? It seems to 
> me that with this restriction, the framework will need to maintain one more 
> extra state: 1. reserve enough CPUs, mem and free disk (otherwise it is 
> possible that the framework cannot get enough CPUs and mem after growing the 
> volume); 2. grow the persistent volume; 3. use the grown volume. If we 
> implement a proper validation right now, 1 and 2 can be combined, and there 
> will be no need to rewrite the framework in the future. Can you explain again 
> what the concern of implementing the correct validation logic again?
> 
> Zhitao Li wrote:
>     Re: order of valiation vs AuthZ: I agree they should be consistency, but 
> we already have splits between `accept()` and `_accept()` and framework could 
> see operations dropped either before or after AuthZ too. Given that 1) AuthZ 
> will be async by nature and 2) certain states could change between `accept()` 
> and `_accept()` (offer validaty, agent connectivity), I feel that only check 
> AuthZ on `accept()` but delay any actual validation to `_accept()` results in 
> most consistent behavior. Indeed this could increase calls to authorizor but 
> I don't know whether that can be quantified.
>     
>     Re: usability of framework: I generally believe that framework always 
> needs to model each state to handle corner cases even if it uses chained 
> operations (because that can have partial success). That's also why I hope to 
> eventually not support chained operations at all. In the final form of the 
> code, whatever "correct validation logic" we write now will not be necessary, 
> and we can implicitly rely on `offeredResources` containment checks chained 
> operations (because `converted` resources will not be added back 
> immediately). Implementing this right now would require us to add extra 
> variables like `observedOfferedResources` and `actualOfferedResources` and 
> check them on each `case` of operations, which is not be as clean or as local 
> as current patch (which also easier to back out).
> 
> Chun-Hung Hsiao wrote:
>     I'm not sure if I get it. Instead of applying the conversion on 
> `_offeredResources`, we could just do
>     ```
>     _offeredResources -= consumed;
>     ```
>     Wouldn't this solve the issue? Am I missing anything?
> 
> Zhitao Li wrote:
>     This code block 
> (https://github.com/apache/mesos/blob/bd688e4cf9f6f35c9d75aad50b99e1fdeb8104da/src/master/master.cpp#L5502)
>  requires `_offeredResources` properly tracks what allocator needs to 
> recover. I believe we still need to handle `converted` part of `grow` and 
> `shrink`, otherwise allocator could see inconsitent views.
> 
> Chun-Hung Hsiao wrote:
>     Thanks for point it out Zhitao.
>     
>     Just briefly chatted with Greg, and it seems that I misunderstood what he 
> suggested :(
>     He suggested to check the size of operations here: 
> https://github.com/apache/mesos/blob/master/src/master/validation.cpp#L547
>     which will lead to the entire call being dropped: 
> https://github.com/apache/mesos/blob/master/src/master/master.cpp#L2350
>     And we don't need to send status updates for each LAUNCH and 
> LAUNCH_GROUP. It seems cleaner to me.
>     
>     re: usability of framework. If your framework team has no problem using 
> the call this way, or doesn't mind updating their frameworks, then  it's 
> probably fine to have this restriction ;)

I like the suggestion to perform the validation there, although I want to know 
how https://github.com/apache/mesos/blob/master/src/master/master.cpp#L2350 
handles recovery of resources from offers if it just drops the operation w/o 
going through offered resources in each offers (or maybe it does not ?)


- Zhitao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66568/#review201785
-----------------------------------------------------------


On April 23, 2018, 1:07 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66568/
> -----------------------------------------------------------
> 
> (Updated April 23, 2018, 1:07 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Greg Mann.
> 
> 
> Bugs: MESOS-4965
>     https://issues.apache.org/jira/browse/MESOS-4965
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> These two operations are intended to be non-speculative eventually but
> are implemented as speculative right now. To avoid frameworks opt-in to
> dangerous behavior, we require that accept can only contain one
> `GROW_VOLUME` or `SHRINK_VOLUME` and no other operations.
> 
> This is implemented by reuse existing code which already drops `LAUNCH`
> or `LAUNCH_GROUP` with proper reason and message.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp c723a291ed8d219ced4492bc905ac6b52683ae47 
> 
> 
> Diff: https://reviews.apache.org/r/66568/diff/6/
> 
> 
> Testing
> -------
> 
> Behavior tested in https://reviews.apache.org/r/66569/.
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>

Reply via email to