Re: Review Request 63094: Added resource sequence id for offer operations.

Greg Mann Wed, 18 Oct 2017 11:28:46 -0700


> On Oct. 18, 2017, 9:46 a.m., Benjamin Bannier wrote:
> > include/mesos/resource_provider/resource_provider.proto
> > Lines 56-66 (patched)
> > <https://reviews.apache.org/r/63094/diff/1/?file=1861954#file1861954line56>
> >
> >     I would prefer if we would not assume that this number changes for 
> > _every_ resource change, but just for failures.
> >     * * *
> >     
> >     For one we do not want users to assume that this number has certain 
> > values so we can recognize failures. Imagine a user applied an operation to 
> > the resource provider resources (`sequence_id=0`), and the agent speculated 
> > that the new `sequence_id` would be `1`. Now the operation fails in the RP 
> > and its `sequence_id` would be incremented to `1`. We cannot distinguish a 
> > case where the agent assumed the operation succeeded from a case where it 
> > failed based on the clock value alone in this case. Would we just increase 
> > this number in case of failures it would be clear that any operation 
> > assuming `sequence_id=0` would be invalid after the failure since the agent 
> > would be unlikely to assume failures on a "normal" execution path. 
> >     
> >     * * *
> >     
> >     Another reason would be that it would be great to be able to use 
> > similar notions of `sequence_id` for the agent's view of its RPs, and the 
> > master's view of its agents. We plan to implement speculative application 
> > of "legacy" offer operations like e.g., `RESERVE`, `CREATE` in the master, 
> > and incrementing this number even in successful cases would require too 
> > much knowledge about how the agent applies these operations in the master.
> >     
> >     Imagine an agent with resource provider resources `disk(*):1` and 
> > `disk(*):1` (each `disk` from a different resource provider). The agent 
> > will expose `disk(*):2` to the master. A framework now reserves, 
> > `disk(*):2->disk(A):2`. The agent would inform the resource providers about 
> > that operation. Let's say the first of these operations succeeds, but the 
> > second on fails (the argument works as well for the other order, or even 
> > concurrent application). Having the master increment this 
> > `resource_sequence_id` for speculated operations would require it to know 
> > that the single `RESERVE` operation did map onto two state changes, i.e., 
> > it should have incremented its agent `resource_sequence_id` by two instead 
> > of just by one.
> >     
> >     This seems not only leaky, but also inflexible since it becomes hard to 
> > decompose an agent's resources without affecting the master's assumptions.
> >     
> >     * * *
> >     
> >     Having `sequence_id` just increase in case of failures would eliminate 
> > a large class of cases where users would need to make assumption about how 
> > it does change. Instead we would more clearly move into a direction where 
> > it is only changed by pushes from lower layers (where an operation either 
> > succeeded or failed) to user layers above. This should also simplify how we 
> > would project multiple `sequence_id`s onto a single `sequence_id` (multiple 
> > resource provider `sequence_id`s onto agent `sequence_id`); in that case we 
> > could just take the maximum of all RP IDs and the agent ID (e.g., last 
> > failure of an operation on any RP is equivalent to last failure of any 
> > operation on the agent). Counting all operations makes this harder, 
> > especially when thinking of master-speculated operations.


This confused me a bit: "Having the master increment this resource_sequence_id 
for speculated operations"

IIUC, the master will only increment the sequence ID when it receives a 
{Register,Reregister,Update}SlaveMessage from the agent containing a new ID. 
So, I don't think the ID is incremented routinely during the successful 
application of offer operations.

My understanding of the comment in this patch is that the sequence ID will be 
incremented only when a legacy operation fails, or when the RP has its total 
resources updated (i.e., storage backend has more/less storage to offer and 
updates the CSI plugin/RP accordingly).


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63094/#review188465
-----------------------------------------------------------


On Oct. 17, 2017, 11:12 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63094/
> -----------------------------------------------------------
> 
> (Updated Oct. 17, 2017, 11:12 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, Chun-Hung Hsiao, 
> Gaston Kleiman, Greg Mann, Joseph Wu, Jan Schlicht, and Vinod Kone.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This sequence id is used to establish ordering between the operation
> and the resources that the operation is operating on.  Each agent (or
> external resource provider) will keep a sequence id, and increments it
> when the resources on the agent (or the external resource provider)
> has changed. The master will keep track of the last known agent (or
> external resource provider) sequence id, and attach this sequence id
> in each operation it sends out. The agent (or the external resource
> provider) should reject operations that have smaller sequence id than
> its own sequence id, because this means the operation is operating on
> resources that might have already been invalidated.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resource_provider/resource_provider.proto 
> f5a9073075327019fd133bd51265f695ef464845 
>   include/mesos/v1/resource_provider/resource_provider.proto 
> e5cbede5b6e57a8641fca1ebfee5454f292cc24c 
>   src/messages/messages.proto 0a32b3457e9143a7d48670610ca3e56dd516136f 
> 
> 
> Diff: https://reviews.apache.org/r/63094/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>

Re: Review Request 63094: Added resource sequence id for offer operations.

Reply via email to