Re: Review Request 48313: Creation and deletion of persistent volumes across agent restart.

Neil Conway Mon, 13 Jun 2016 02:51:55 -0700


> On June 8, 2016, 1:28 p.m., Neil Conway wrote:
> > Overall seems like a reasonable approach.
> > 
> > One thing that isn't clear to me: what is the advantage of updating the 
> > checkpoint to reflect any partial work that was done before exiting? It 
> > seems that adds a bunch of complexity and room for error. Why not only 
> > update the checkpoint if all changes were made successfully?
> 
> Anindya Sinha wrote:
>     We would need to maintain what was actually successful in any case since 
> in a DESTROY, a failed rmdir does not lead to the agent exiting. So, if we 
> were to do it at one place, we would still need to keep account of the 
> successful operations so as to not update the checkpoint based on a failed 
> rmdir as an example (and hence can be a partial update).
>     
>     Since we are keeping track of result of the operations anyway, I think it 
> is a good idea to update before exiting (only place we do that when CREATE 
> fails and the agent exits) so that the subsequent handling of 
> CheckpointResources does not need to redo such operations when the agent 
> reregisters.
> 
> Neil Conway wrote:
>     On reflection, I wonder whether we should be handling `CREATE` errors 
> differently from `DESTROY` errors. In both cases, the user has asked the 
> agent to do something it wasn't able to do. A failed `DESTROY` has the 
> addditional concern that we might have destroyed some but not all of the data 
> on the volume.
>     
>     Do you think handling `CREATE` vs. `DESTROY` errors differently is a good 
> idea?
> 
> Anindya Sinha wrote:
>     Good point. Here is what I think are the use cases:
>     Say we have checkpointed resources (CR) as {V1,V2} where V1, V2 are 2 
> persistent volumes. So, CR(master) = {V1,V2}, and CR(agent) = {V1,V2}.
>     If we now receive a DESTROY(V2): CR(master) = {V1} [since master's view 
> is not dependent on what happened on the agent]. Suppose that fails on the 
> agent, so CR(agent) = {V1,V2} [since we do not update checkpoint resources on 
> agent on failure in DESTROY, which results in inconsistency between master 
> and agent at this point of time].
>     
>     Case 1 (current implementation): Agent does not restart on failure in 
> DESTROY. Hence, CR(agent) = {V1,V2}. When the next CheckpointResources is 
> received, ie. on a subsequent CREATE/DESTROY/RESERVE/UNRESERVE on a different 
> resource, DESTROY(V2) will be reattempted and if that is successful, we will 
> in sync between agent and master. However if the next CheckpointResources is 
> due to a CREATE(V2) [that can happen since V2 is available as a resource 
> based on offer from master], that would be a no-op on agent since agent does 
> not treat it as a new resource based on the checkpoint since at this point 
> CR(master) = {V1,V2}, and CR(agent) = {V1,V2}, which would be a problem.
>     
>     Case 2 (if we exit agent on failure): The agent restarts which triggers a 
> CheckpointResources from master->agent on ReregisterSlave. That would force a 
> reattempt of DESTROY(V2) since current view is CR(master) = {V1} and 
> CR(agent) = {V1,V2} which will reattempt to bring the checkpointed resources 
> back in sync between master and agent.
>     
>     So, I think it might be a better option to exit the agent on failure in 
> DESTROY as well. However, I think we should still update the checkpoint based 
> on the status of successful operations (other CREATE/DESTROY) on failure 
> (when agent exits) so as to avoid these operations to be repeated in a 
> subsequent CheckpointResources message. Does that sound reasonable to you?
>     
>     Note: I think this use case probably is a good example to consider 
> StatusUpdates (or something similar) for operations on reserved resources, 
> viz. CREATE/DESTROY/RESERVE/UNRESERVE to ensure master and agent are in sync 
> to ensure guaranteed view of offers (to frameworks) for reserved resources.

Thanks for the writeup! Makes sense.

If we cause the agent to exit on DESTROY failure, if we only re-apply DESTROYs 
at `ReregisterSlave`, it seems to me there is still a window in which another 
`CREATE` can be applied at the master. That would mean we wouldn't reapply the 
`DESTROY`, which would be bad.

My concern about updating the checkpointed resources to reflect *partial* 
results is that (a) it seems unnecessary, (b) it means the checkpointed 
resources don't reflect either the *original* or the *desired* state of the 
agent, which seems problematic.

What about the following: when an agent gets a `CheckpointedResourcesMessage` 
that differs from its current state, it first writes the desired resource state 
to a new checkpoint file, `checkpoint.target`. Then it tries to apply the delta 
between `checkpoint.target` and the current checkpoint. If the on-disk state at 
the agent is updated successfully to match `checkpoint.target`, we rename 
`checkpoint.target` -> `checkpoint` and we're done. Otherwise, if any I/O 
operation fails, we exit the agent. Then on restart, we check for the existence 
of `checkpoint.target` *before* we try to reregister with the master; if 
`checkpoint.target` != `checkpoint`, we resume trying to apply the delta 
between `checkpoint.target` and `checkpoint`; if that fails we exit again, and 
if it succeeds we rename `checkpoint.target` -> `checkpoint` and then continue 
to reregister with the master.

Let me know what you think.

- Neil

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48313/#review136638
-----------------------------------------------------------

On June 9, 2016, 12:22 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48313/
> -----------------------------------------------------------
> 
> (Updated June 9, 2016, 12:22 a.m.)
> 
> 
> Review request for mesos, Neil Conway and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-5448
>     https://issues.apache.org/jira/browse/MESOS-5448
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> o Checkpoints on the agent are updated only after successful handling
>   of persistent volume creation and deletion to maintain consistency.
> o If volume creation or deletion fails, checkpoint is updated up until
>   that point, and the agent exits.
> o This ensures that after a agent restart, checkpoints are in sync
>   between the master and the agent after the reregistration workflow.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp a557e97c65194d4aad879fb88d8edefd1c95b8d8 
>   include/mesos/v1/resources.hpp a5ba8fec4c9c3643646308f75a4b28cefe0b3df3 
>   src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 
>   src/slave/slave.cpp d635dd2c6f6fce5a9eeefc5dcdf84e00cdc833b6 
>   src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a 
> 
> Diff: https://reviews.apache.org/r/48313/diff/
> 
> 
> Testing
> -------
> 
> All tests passed.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>

Re: Review Request 48313: Creation and deletion of persistent volumes across agent restart.

Reply via email to