> On Nov. 11, 2015, 9:28 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 4244-4247
> > <https://reviews.apache.org/r/40177/diff/1/?file=1122973#file1122973line4244>
> >
> >     Why do it here instead of in recoverFramework() (#4363)? That feels more 
> > consistent with #1345.
> 
> James Peach wrote:
>     I did this after recovery because the original code did not write 
> framework checkpoints if the slave was in RECOVERING state. I did not see a 
> reason for that, but decided to preserve the behavior as much as possible 
> just in case.
> 
> Vinod Kone wrote:
>     Originally, the slave didn't checkpoint the framework during the recovery 
> stage because it was not needed. If it is creating a framework object during 
> recovery, it is because it read the checkpointed data, so there is no need to 
> checkpoint again. 
>     
>     But due to the compatibility issue you found, the slave can re-checkpoint 
> the framework info during recovery because the framework info is *updated*. So 
> I would recommend moving this down to #1345 and re-checkpointing if necessary.

Fixed and re-tested. We now rewrite the framework checkpoint during recovery, 
but only if the recovered framework info had to be updated.
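
For illustration only, the recovery behavior can be sketched roughly as below. 
This is a simplified sketch, not the code in the diff: the FrameworkInfo and 
RecoveredFramework structs, the hasId flag, and this recoverFramework() 
signature are hypothetical stand-ins, and it assumes the agent already knows 
the framework ID from elsewhere in the recovered state.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the checkpointed framework info. The real
// type is a protobuf message; hasId mimics an optional field's presence.
struct FrameworkInfo {
  std::string name;
  bool hasId = false;  // False for checkpoints written by 0.22 or earlier.
  std::string id;
};

// Hypothetical result type: the (possibly updated) info, plus whether
// the checkpoint should be rewritten.
struct RecoveredFramework {
  FrameworkInfo info;
  bool recheckpoint = false;
};

// Accepts a checkpoint with a missing FrameworkID by filling it in from
// an ID the caller already knows (an assumption of this sketch), and
// requests a re-checkpoint only if the info was actually updated.
RecoveredFramework recoverFramework(
    FrameworkInfo checkpointed,
    const std::string& knownFrameworkId)
{
  RecoveredFramework result;

  if (!checkpointed.hasId) {
    checkpointed.hasId = true;
    checkpointed.id = knownFrameworkId;
    result.recheckpoint = true;  // Info changed, so rewrite the checkpoint.
  }

  // After recovery the info always carries an ID.
  assert(checkpointed.hasId);

  result.info = checkpointed;
  return result;
}
```

A checkpoint that already carries an ID passes through unchanged with 
recheckpoint left false, so an up-to-date checkpoint is not rewritten.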


- James


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40177/#review106142
-----------------------------------------------------------


On Nov. 12, 2015, 5:41 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40177/
> -----------------------------------------------------------
> 
> (Updated Nov. 12, 2015, 5:41 a.m.)
> 
> 
> Review request for mesos, Kapil Arya and Vinod Kone.
> 
> 
> Bugs: MESOS-3834
>     https://issues.apache.org/jira/browse/MESOS-3834
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When performing an upgrade cycle, it is possible for a 0.24 and
> later agent to recover from a framework checkpoint written by 0.22
> or earlier. In this case, we need to compatibly accept a missing
> FrameworkID, and then rewrite the framework checkpoint so that
> subsequent upgrades don't hit the same problem.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp ec2dfa99e6b553e2bcd82d12db915ae8625075a1 
>   src/slave/slave.cpp ac2d0e0153721a66495cd6539b25f5b3cee9d979 
> 
> Diff: https://reviews.apache.org/r/40177/diff/
> 
> 
> Testing
> -------
> 
> make check on CentOS 6.7.
> Manual testing with a rolling upgrade from 0.22.
> 
> 
> Thanks,
> 
> James Peach
> 
>
