> On June 8, 2016, 1:28 p.m., Neil Conway wrote:
> > Overall seems like a reasonable approach.
> >
> > One thing that isn't clear to me: what is the advantage of updating the
> > checkpoint to reflect any partial work that was done before exiting? It
> > seems that adds a bunch of complexity and room for error. Why not only
> > update the checkpoint if all changes were made successfully?
>
> Anindya Sinha wrote:
> We would need to maintain what was actually successful in any case since
> in a DESTROY, a failed rmdir does not lead to the agent exiting. So, if we
> were to do it at one place, we would still need to keep account of the
> successful operations so as to not update the checkpoint based on a failed
> rmdir as an example (and hence can be a partial update).
>
> Since we are keeping track of result of the operations anyway, I think it
> is a good idea to update before exiting (only place we do that when CREATE
> fails and the agent exits) so that the subsequent handling of
> CheckpointResources does not need to redo such operations when the agent
> reregisters.
>
> Neil Conway wrote:
> On reflection, I wonder whether we should be handling `CREATE` errors
> differently from `DESTROY` errors. In both cases, the user has asked the
> agent to do something it wasn't able to do. A failed `DESTROY` has the
> addditional concern that we might have destroyed some but not all of the data
> on the volume.
>
> Do you think handling `CREATE` vs. `DESTROY` errors differently is a good
> idea?
>
> Anindya Sinha wrote:
> Good point. Here is what I think are the use cases:
> Say we have checkpointed resources (CR) as {V1,V2} where V1, V2 are 2
> persistent volumes. So, CR(master) = {V1,V2}, and CR(agent) = {V1,V2}.
> If we now receive a DESTROY(V2): CR(master) = {V1} [since master's view
> is not dependent on what happened on the agent]. Suppose that fails on the
> agent, so CR(agent) = {V1,V2} [since we do not update checkpoint resources on
> agent on failure in DESTROY, which results in inconsistency between master
> and agent at this point of time].
>
> Case 1 (current implementation): Agent does not restart on failure in
> DESTROY. Hence, CR(agent) = {V1,V2}. When the next CheckpointResources is
> received, ie. on a subsequent CREATE/DESTROY/RESERVE/UNRESERVE on a different
> resource, DESTROY(V2) will be reattempted and if that is successful, we will
> in sync between agent and master. However if the next CheckpointResources is
> due to a CREATE(V2) [that can happen since V2 is available as a resource
> based on offer from master], that would be a no-op on agent since agent does
> not treat it as a new resource based on the checkpoint since at this point
> CR(master) = {V1,V2}, and CR(agent) = {V1,V2}, which would be a problem.
>
> Case 2 (if we exit agent on failure): The agent restarts which triggers a
> CheckpointResources from master->agent on ReregisterSlave. That would force a
> reattempt of DESTROY(V2) since current view is CR(master) = {V1} and
> CR(agent) = {V1,V2} which will reattempt to bring the checkpointed resources
> back in sync between master and agent.
>
> So, I think it might be a better option to exit the agent on failure in
> DESTROY as well. However, I think we should still update the checkpoint based
> on the status of successful operations (other CREATE/DESTROY) on failure
> (when agent exits) so as to avoid these operations to be repeated in a
> subsequent CheckpointResources message. Does that sound reasonable to you?
>
> Note: I think this use case probably is a good example to consider
> StatusUpdates (or something similar) for operations on reserved resources,
> viz. CREATE/DESTROY/RESERVE/UNRESERVE to ensure master and agent are in sync
> to ensure guaranteed view of offers (to frameworks) for reserved resources.
Thanks for the writeup! Makes sense.
If we cause the agent to exit on DESTROY failure, if we only re-apply DESTROYs
at `ReregisterSlave`, it seems to me there is still a window in which another
`CREATE` can be applied at the master. That would mean we wouldn't reapply the
`DESTROY`, which would be bad.
My concern about updating the checkpointed resources to reflect *partial*
results is that (a) it seems unnecessary, (b) it means the checkpointed
resources don't reflect either the *original* or the *desired* state of the
agent, which seems problematic.
What about the following: when an agent gets a `CheckpointedResourcesMessage`
that differs from its current state, it first writes the desired resource state
to a new checkpoint file, `checkpoint.target`. Then it tries to apply the delta
between `checkpoint.target` and the current checkpoint. If the on-disk state at
the agent is updated successfully to match `checkpoint.target`, we rename
`checkpoint.target` -> `checkpoint` and we're done. Otherwise, if any I/O
operation fails, we exit the agent. Then on restart, we check for the existence
of `checkpoint.target` *before* we try to reregister with the master; if
`checkpoint.target` != `checkpoint`, we resume trying to apply the delta
between `checkpoint.target` and `checkpoint`; if that fails we exit again, and
if it succeeds we rename `checkpoint.target` -> `checkpoint` and then continue
to reregister with the master.
Let me know what you think.
- Neil
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48313/#review136638
-----------------------------------------------------------
On June 9, 2016, 12:22 a.m., Anindya Sinha wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48313/
> -----------------------------------------------------------
>
> (Updated June 9, 2016, 12:22 a.m.)
>
>
> Review request for mesos, Neil Conway and Jiang Yan Xu.
>
>
> Bugs: MESOS-5448
> https://issues.apache.org/jira/browse/MESOS-5448
>
>
> Repository: mesos
>
>
> Description
> -------
>
> o Checkpoints on the agent are updated only after successful handling
> of persistent volume creation and deletion to maintain consistency.
> o If volume creation or deletion fails, checkpoint is updated up until
> that point, and the agent exits.
> o This ensures that after a agent restart, checkpoints are in sync
> between the master and the agent after the reregistration workflow.
>
>
> Diffs
> -----
>
> include/mesos/resources.hpp a557e97c65194d4aad879fb88d8edefd1c95b8d8
> include/mesos/v1/resources.hpp a5ba8fec4c9c3643646308f75a4b28cefe0b3df3
> src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968
> src/slave/slave.cpp d635dd2c6f6fce5a9eeefc5dcdf84e00cdc833b6
> src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a
>
> Diff: https://reviews.apache.org/r/48313/diff/
>
>
> Testing
> -------
>
> All tests passed.
>
>
> Thanks,
>
> Anindya Sinha
>
>