> On June 8, 2016, 1:28 p.m., Neil Conway wrote: > > Overall seems like a reasonable approach. > > > > One thing that isn't clear to me: what is the advantage of updating the > > checkpoint to reflect any partial work that was done before exiting? It > > seems that adds a bunch of complexity and room for error. Why not only > > update the checkpoint if all changes were made successfully?
We would need to maintain what was actually successful in any case since in a DESTROY, a failed rmdir does not lead to the agent exiting. So, if we were to do it at one place, we would still need to keep account of the successful operations so as to not update the checkpoint based on a failed rmdir as an example (and hence can be a partial update). Since we are keeping track of result of the operations anyway, I think it is a good idea to update before exiting (only place we do that when CREATE fails and the agent exits) so that the subsequent handling of CheckpointResources does not need to redo such operations when the agent reregisters. - Anindya ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48313/#review136638 ----------------------------------------------------------- On June 7, 2016, 5:52 p.m., Anindya Sinha wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48313/ > ----------------------------------------------------------- > > (Updated June 7, 2016, 5:52 p.m.) > > > Review request for mesos and Jiang Yan Xu. > > > Bugs: MESOS-5448 > https://issues.apache.org/jira/browse/MESOS-5448 > > > Repository: mesos > > > Description > ------- > > o Checkpoints on the agent are updated only after successful handling > of persistent volume creation and deletion to maintain consistency. > o If volume creation or deletion fails, checkpoint is updated up until > that point, and the agent exits. > o This ensures that after a agent restart, checkpoints are in sync > between the master and the agent after the reregistration workflow. > > > Diffs > ----- > > include/mesos/resources.hpp a557e97c65194d4aad879fb88d8edefd1c95b8d8 > include/mesos/v1/resources.hpp a5ba8fec4c9c3643646308f75a4b28cefe0b3df3 > src/common/resources.cpp f6ff92b591c15bc8e93fd85e1896349c3a7bb968 > src/slave/slave.cpp 03a845ebdbb057051ced2d6b7a0d3e5aafd5ac55 > src/v1/resources.cpp 8c3f2d1c1529915a59d47fe37bb3fc7a3267079a > > Diff: https://reviews.apache.org/r/48313/diff/ > > > Testing > ------- > > All tests passed. > > > Thanks, > > Anindya Sinha > >
