> On Aug. 28, 2017, 11:31 p.m., Jie Yu wrote:
> > src/master/validation.cpp
> > Lines 2205 (patched)
> > <https://reviews.apache.org/r/61946/diff/1/?file=1806110#file1806110line2205>
> >
> >     I think `checkpointedResources` should not be used for Resource 
> > Provider provided resources. It should only apply to agent default 
> > resources. The checkpointing should be done by the corresponding resource 
> > provider, not the agent for RP provided resources.
> >     
> >     As a result, for operations like RESERVE/UNRESERVE/CREATE/DESTROY, we 
> > need to send operation to the corresponding resource provider as well. This 
> > does make sense. If we ask the agent to persist that information, what will be 
> > the semantics if the resource provider is marked as gone?
> >     
> >     However, this does get complicated if we want to guarantee ordering for 
> > operations in one `acceptOffers` call (for backwards compatibility), and we 
> > do want to allow frameworks to launch a task right after reserve operation 
> > (the current semantics).
> >     
> >     To support that, I think we need to speculatively assume the operation 
> > will be successful (thus allow a subsequent launch immediately at the master 
> > side). However, when the checkpointing fails, we need a way to abort the 
> > subsequent launch at the agent side. This is essentially why we CHECK fail 
> > if the checkpointing fails at the agent previously for 
> > `checkpointedResources`.
> >     
> >     For the resource provider case, we should do the same thing. We can 
> > abort the agent if a checkpointing fails. However, this only applies to the 
> > local resource provider that lives in the agent process. If a LRP is 
> > outside of the agent process, how to abort the subsequent task launch if a 
> > previous operation fails is something we should think about. For instance, 
> > always reject operations from the agent's RP manager if the operation is 
> > for a stale stream ID?

Fully agreed, thanks for bringing up the challenges of handling 
`RESERVE`/`UNRESERVE`/`CREATE`/`DESTROY` with local and external resource 
providers. One idea for solving this with external resource providers (ERPs) 
would be to rescind a launch, similar to how we rescind offers. For example, an 
ERP would send a rescind message to the master, which would then instruct the 
agent to stop the launch.
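
To make the rescind idea a bit more concrete, here is a rough sketch of the 
bookkeeping it might need. None of this is existing Mesos code: `Agent`, 
`Master`, `LaunchState`, `rescindLaunch` and the other names are hypothetical 
and only illustrate the flow, assuming the agent holds a speculative launch as 
pending until it learns the outcome of the operation it depends on.

```cpp
#include <map>
#include <string>

// Hypothetical model of the proposal; these are not Mesos types.
enum class LaunchState { PENDING, RUNNING, RESCINDED };

struct Launch {
  std::string operationId;  // Operation the launch speculatively depends on.
  LaunchState state;
};

class Agent {
public:
  // The master forwards a launch assuming `operationId` will succeed.
  void launch(const std::string& taskId, const std::string& operationId) {
    launches[taskId] = Launch{operationId, LaunchState::PENDING};
  }

  // The operation was checkpointed, so dependent launches may proceed.
  void operationSucceeded(const std::string& operationId) {
    transition(operationId, LaunchState::RUNNING);
  }

  // The master relayed a rescind, so dependent pending launches are stopped.
  void rescind(const std::string& operationId) {
    transition(operationId, LaunchState::RESCINDED);
  }

  LaunchState state(const std::string& taskId) const {
    return launches.at(taskId).state;
  }

private:
  // Only launches that are still pending on `operationId` change state.
  void transition(const std::string& operationId, LaunchState next) {
    for (auto& entry : launches) {
      Launch& l = entry.second;
      if (l.operationId == operationId && l.state == LaunchState::PENDING) {
        l.state = next;
      }
    }
  }

  std::map<std::string, Launch> launches;
};

class Master {
public:
  explicit Master(Agent& agent) : agent(agent) {}

  // Called when an ERP reports that checkpointing `operationId` failed.
  void rescindLaunch(const std::string& operationId) {
    agent.rescind(operationId);
  }

private:
  Agent& agent;
};
```

In this sketch a rescind only affects launches that are still pending, so the 
open question of what to do with a task that has already started remains.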


- Jan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61946/#review183988
-----------------------------------------------------------


On Aug. 28, 2017, 5:28 p.m., Jan Schlicht wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61946/
> -----------------------------------------------------------
> 
> (Updated Aug. 28, 2017, 5:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Jie Yu.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added validation of resource provider operations.
> 
> 
> Diffs
> -----
> 
>   src/master/validation.hpp f4925752f20ae8ca4de1d9b4a3d5ffc394db9585 
>   src/master/validation.cpp 7c3247d407c9e6aa8cce457d6c6be0c39f4b532f 
> 
> 
> Diff: https://reviews.apache.org/r/61946/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jan Schlicht
> 
>
