> On Feb. 5, 2019, 5:41 p.m., Benjamin Bannier wrote:
> > src/csi/state.proto
> > Lines 62-67 (original), 62-77 (patched)
> > <https://reviews.apache.org/r/69892/diff/1/?file=2123913#file2123913line62>
> >
> >     Any reason we cannot use a single field containing the `bootId` of the 
> > last transition? A single field would cut down on the number of possible 
> > message permutations, and also allow simpler handling (branching a changed 
> > `boot_id`, triggering `state`-dependent handling). We could set such a 
> > `boot_id` whenever there is a state transition.
> 
> Chun-Hung Hsiao wrote:
>     Consider the following scenario:
>     `CREATED` -> `NODE_READY` -> `VOL_READY` -> `PUBLISHED` -> reboot -> 
> `VOL_READY` -> reboot
>     If we share the same `boot_id` for all transitions, we won't be able to 
> tell that this volume has been published before.
>     If we dedicate `boot_id` to `PUBLISHED`, we won't be able to know that 
> there has been a reboot after the last `VOL_READY` so we need to call 
> `NodeStageVolume` again.
> 
> Chun-Hung Hsiao wrote:
>     After an offline discussion, we decided to simplify the state machine, 
> and SLRP will try to bring a volume to `PUBLISHED` during recovery as long as 
> it has ever been in `VOL_READY` before. This would mean that a 
> misconfiguration that makes a plugin succeed on `NodeStageVolume` but fail on 
> `NodePublishVolume` will make the SLRP unable destroy the persisten volume, 
> even if no data have ever been written to it.
> 
> James DeFelice wrote:
>     How would an operator recover from such a sitution?

The operator needs to be somehow notified, then investigate the problem, then 
restart the SLRP (through a config update or agent restart).

It seems to me that the scenario where the plugin is able to do 
`NodeStageVolume` but not `NodePublishVolume` should be rare.

However, it is more likely that the plugin is not 
`STAGE_UNSTAGE_VOLUME`-capable and misconfigured in a way that it cannot do 
`NodePublishVolume`.
In this case SLRP would use a no-op to transition a volume from `NODE_READY` to 
`VOL_READY`, and set up the boot ID. However, because it cannot use the plugin 
to bring the volume to `PUBLISHED`, it won't be able to destroy the PV or 
recover.

The originaly proposal, where we keep track of the boot IDs of 
`NodeStageVolume` and `NodePublishVolume` separatedly, is more resilient to bad 
plugins: PV destroy and SLRP recovery is blocked only when necessary. WDYT?


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69892/#review212557
-----------------------------------------------------------


On Feb. 5, 2019, 7:40 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69892/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2019, 7:40 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, James DeFelice, and Jie Yu.
> 
> 
> Bugs: MESOS-9544
>     https://issues.apache.org/jira/browse/MESOS-9544
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If a CSI volume has been node-published before a reboot, SLRP will now
> try to bring it back to node-published again. This is important to
> perform synchronous persistent volume cleanup for `DESTROY`.
> 
> To achieve this, in addition to keeping track of the boot ID when a CSI
> volume is node-staged in `VolumeState.vol_ready_boot_id` (formerly
> `VolumeState.boot_id`), SLRP now also keeps track of the boot ID when
> the volume is node-published. This helps SLRP to better determine if a
> volume has been published before reboot.
> 
> 
> Diffs
> -----
> 
>   src/csi/state.proto 264a5657dd37605a6f3bdadd0e8d18ba9673191a 
>   src/resource_provider/storage/provider.cpp 
> d6e20a549ede189c757ae3ae922ab7cb86d2be2c 
>   src/tests/storage_local_resource_provider_tests.cpp 
> e8ed20f818ed7f1a3ce15758ea3c366520443377 
> 
> 
> Diff: https://reviews.apache.org/r/69892/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Testing for publish failures will be done later in chain.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

Reply via email to