> On Feb. 5, 2019, 5:41 p.m., Benjamin Bannier wrote: > > src/csi/state.proto > > Lines 62-67 (original), 62-77 (patched) > > <https://reviews.apache.org/r/69892/diff/1/?file=2123913#file2123913line62> > > > > Any reason we cannot use a single field containing the `bootId` of the > > last transition? A single field would cut down on the number of possible > > message permutations, and also allow simpler handling (branching a changed > > `boot_id`, triggering `state`-dependent handling). We could set such a > > `boot_id` whenever there is a state transition. > > Chun-Hung Hsiao wrote: > Consider the following scenario: > `CREATED` -> `NODE_READY` -> `VOL_READY` -> `PUBLISHED` -> reboot -> > `VOL_READY` -> reboot > If we share the same `boot_id` for all transitions, we won't be able to > tell that this volume has been published before. > If we dedicate `boot_id` to `PUBLISHED`, we won't be able to know that > there has been a reboot after the last `VOL_READY` so we need to call > `NodeStageVolume` again. > > Chun-Hung Hsiao wrote: > After an offline discussion, we decided to simplify the state machine, > and SLRP will try to bring a volume to `PUBLISHED` during recovery as long as > it has ever been in `VOL_READY` before. This would mean that a > misconfiguration that makes a plugin succeed on `NodeStageVolume` but fail on > `NodePublishVolume` will make the SLRP unable destroy the persisten volume, > even if no data have ever been written to it. > > James DeFelice wrote: > How would an operator recover from such a sitution?
The operator needs to be somehow notified, then investigate the problem, then restart the SLRP (through a config update or agent restart). It seems to me that the scenario where the plugin is able to do `NodeStageVolume` but not `NodePublishVolume` should be rare. However, it is more likely that the plugin is not `STAGE_UNSTAGE_VOLUME`-capable and misconfigured in a way that it cannot do `NodePublishVolume`. In this case SLRP would use a no-op to transition a volume from `NODE_READY` to `VOL_READY`, and set up the boot ID. However, because it cannot use the plugin to bring the volume to `PUBLISHED`, it won't be able to destroy the PV or recover. The originaly proposal, where we keep track of the boot IDs of `NodeStageVolume` and `NodePublishVolume` separatedly, is more resilient to bad plugins: PV destroy and SLRP recovery is blocked only when necessary. WDYT? - Chun-Hung ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69892/#review212557 ----------------------------------------------------------- On Feb. 5, 2019, 7:40 a.m., Chun-Hung Hsiao wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69892/ > ----------------------------------------------------------- > > (Updated Feb. 5, 2019, 7:40 a.m.) > > > Review request for mesos, Benjamin Bannier, James DeFelice, and Jie Yu. > > > Bugs: MESOS-9544 > https://issues.apache.org/jira/browse/MESOS-9544 > > > Repository: mesos > > > Description > ------- > > If a CSI volume has been node-published before a reboot, SLRP will now > try to bring it back to node-published again. This is important to > perform synchronous persistent volume cleanup for `DESTROY`. > > To achieve this, in addition to keeping track of the boot ID when a CSI > volume is node-staged in `VolumeState.vol_ready_boot_id` (formerly > `VolumeState.boot_id`), SLRP now also keeps track of the boot ID when > the volume is node-published. This helps SLRP to better determine if a > volume has been published before reboot. > > > Diffs > ----- > > src/csi/state.proto 264a5657dd37605a6f3bdadd0e8d18ba9673191a > src/resource_provider/storage/provider.cpp > d6e20a549ede189c757ae3ae922ab7cb86d2be2c > src/tests/storage_local_resource_provider_tests.cpp > e8ed20f818ed7f1a3ce15758ea3c366520443377 > > > Diff: https://reviews.apache.org/r/69892/diff/1/ > > > Testing > ------- > > `make check` > > Testing for publish failures will be done later in chain. > > > Thanks, > > Chun-Hung Hsiao > >