Re: Review Request 69010: Synced SLRP checkpoints to the filesystem.

Chun-Hung Hsiao Mon, 15 Oct 2018 19:49:35 -0700


> On Oct. 15, 2018, 2:21 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 1158-1171 (patched)
> > <https://reviews.apache.org/r/69010/diff/1/?file=2096857#file2096857line1158>
> >
> >     Since the connection of this to the surrounding could is not 
> > immediately clear, could this be made part of 
> > `recoverResourceProviderState`?
> 
> Chun-Hung Hsiao wrote:
>     This is for constructing the list of operations for SUM (which is an 
> external component wrt the SLRP) to report its checkpointing, so the SLRP 
> could *reconcile* the the status updates. `recoverResourceProviderState` is 
> for reading the SLRP's own state checkpoint and reconstruct its bookkeeping 
> in memory so it seems not a good idea to mix the logic here and that function.


Did a bit of refactoring. Now the garbage collection logic is moved to the 
`garbageCollectOperationPath` function.


> On Oct. 15, 2018, 2:21 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 1175-1187 (original), 1191-1203 (patched)
> > <https://reviews.apache.org/r/69010/diff/1/?file=2096857#file2096857line1191>
> >
> >     Since at this point `uuid` is already not tracked anymore, I'd suggest 
> > to move this garbage collection into `checkpointResourceProviderState`.
> >     
> >     In that more general approach we should probably always check that 
> > `path` exists before turning an `Error` to remove it into a `Failure`.
> 
> Chun-Hung Hsiao wrote:
>     The path is removed only if the operation is terminal. But we call 
> `checkpointResourceProviderState` after every change in totol resources, 
> operations, etc. Since this logic is exercised at a specific scenario, it 
> seems better to keep that function simple. 
>     
>     Also note that the operation path is created by the SUM, not by SLRP.

Refactored to `garbageCollectOperationPath`.


> On Oct. 15, 2018, 2:21 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Lines 1797-1811 (original), 1814-1828 (patched)
> > <https://reviews.apache.org/r/69010/diff/1/?file=2096857#file2096857line1814>
> >
> >     It seems this could be part of `checkpointResourceProviderState`, see 
> > above.

Refactored to `garbaceCollectOperationPath`.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69010/#review209542
-----------------------------------------------------------


On Oct. 15, 2018, 10:08 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69010/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2018, 10:08 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9281
>     https://issues.apache.org/jira/browse/MESOS-9281
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Currently if a system crashes, SLRP checkpoints might not be synced to
> the filesystem, so it is possible that an old or empty checkpoint will
> be read upon recovery. Moreover, if a CSI call has been issued right
> before the crash, the recovered state may be inconsistent with the
> actual state reported by the plugin. For example, the plugin might have
> created a volume but the checkpointed state does not know about it.
> 
> To avoid this inconsistency, we always call fsync()  when checkpointing
> SLRP states.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/storage/provider.cpp 
> db783b53558811081fb2671e005e8bbbd9edbede 
>   src/slave/state.hpp 003211e4670c1092acb1634220d76bafd39e3a20 
> 
> 
> Diff: https://reviews.apache.org/r/69010/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

Re: Review Request 69010: Synced SLRP checkpoints to the filesystem.

Reply via email to