Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency

Walid Nouri Thu, 11 Sep 2014 15:09:48 -0700

On 11.09.2014 19:44, Dr. David Alan Gilbert wrote:

For keeping the complete system state consistent on the secondary system
there must be a possibility for MC to commit/discard block device state
changes. In normal operation the mirrored block device state changes (block
buffer) are committed to disk when the complete checkpoint is committed. In
case of a crash of the primary system while transferring a checkpoint the
data in the block buffer corresponding to the failed Checkpoint must be
discarded.


I think for COLO there's a requirement that the secondary can do reads/writes
in parallel with the primary, and the secondary can discard those reads/writes
- and that doesn't happen in MC (Yang or Eddie should be able to confirm that).

The storage architecture should be "shared nothing" so that no shared
storage is required and primary/secondary can have separate block device
images.


I admit that my formulation was unintentionally a bit ambiguous :)
I should have written that a shared storage should not be mandatory.

I'm coming from an SMB environment, and (redundant) shared storage systems are still not usual in small companies :)

I looked for a storage agnostic approach which keeps the number of system components as low as possible and still provides redundancy and fault tolerance.


MC/COLO with shared storage still needs some stuff like this; but it's subtly
different. They still need to be able to buffer/release modifications
to the shared storage; if any of this code can also be used in the
shared-storage configurations it would be good.

The proposed approach with a block filter and the commit/discard protocol should be storage agnostic and will also work in a shared storage environment, but only with distinct images (because of the protocol).
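
To make the commit/discard idea for distinct images concrete, here is a minimal Python sketch of the secondary side. It is only an illustration under assumptions: the class name `CheckpointBlockBuffer`, its methods and the fixed 512-byte block size are hypothetical and are not part of QEMU or of any existing MC code, and the secondary's image is modelled as a plain `bytearray`.

```python
class CheckpointBlockBuffer:
    """Hypothetical secondary-side buffer for mirrored block writes."""

    def __init__(self, image: bytearray, block_size: int = 512):
        self.image = image            # the secondary's own image (assumed in memory)
        self.block_size = block_size  # assumed fixed block size
        self.pending = {}             # block number -> newest data of this checkpoint

    def buffer_write(self, block_no: int, data: bytes) -> None:
        """Hold a mirrored write back until the checkpoint is decided."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.pending[block_no] = bytes(data)  # a later write to a block wins

    def commit(self) -> int:
        """Checkpoint complete: apply all buffered writes to the image."""
        for block_no, data in sorted(self.pending.items()):
            off = block_no * self.block_size
            self.image[off:off + self.block_size] = data
        applied = len(self.pending)
        self.pending.clear()
        return applied

    def discard(self) -> int:
        """Primary failed mid-checkpoint: drop the writes, image is untouched."""
        dropped = len(self.pending)
        self.pending.clear()
        return dropped
```

In this sketch the image only ever changes inside `commit()`, so after `discard()` it still matches the last complete checkpoint.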

In case of shared storage with a common image used by both primary and secondary, another storage protocol must be used.


It's not commit/discard but commit/rollback.

The primary still sends the block state changes asynchronously. The secondary buffers block device state changes but doesn't apply them in normal operation. When the next checkpoint is complete the secondary clears the buffer and forgets about the old block state data.

If the primary fails, the secondary must roll back the common image using the block state data corresponding to the current checkpoint. Otherwise the state of the image and the rest of the system state on the secondary will not be in sync.

When there is no block state data corresponding to the current checkpoint, there is nothing to do on the storage for the secondary :)
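
The commit/rollback variant for a common image could be sketched as follows. This is a hedged illustration, not an implementation: it assumes that the block state data buffered by the secondary is the old (pre-write) content of each block the primary overwrites, which is my reading of "old block state data" above. The names `RollbackLog` and `primary_write` are hypothetical, and the common image is again modelled as a `bytearray`.

```python
class RollbackLog:
    """Hypothetical secondary-side undo log for a common image."""

    def __init__(self, image: bytearray, block_size: int = 512):
        self.image = image            # the common image (assumed in memory)
        self.block_size = block_size  # assumed fixed block size
        self.undo = {}                # block number -> content at last checkpoint

    def record_old_block(self, block_no: int, old_data: bytes) -> None:
        # Only the first pre-image per checkpoint matters: it is the
        # block content as of the last complete checkpoint.
        self.undo.setdefault(block_no, bytes(old_data))

    def checkpoint_complete(self) -> None:
        """Next checkpoint is complete: forget the old block state data."""
        self.undo.clear()

    def rollback(self) -> int:
        """Primary failed: restore every block touched since the checkpoint."""
        restored = 0
        for block_no, old_data in self.undo.items():
            off = block_no * self.block_size
            self.image[off:off + self.block_size] = old_data
            restored += 1
        self.undo.clear()
        return restored


def primary_write(image: bytearray, log: RollbackLog, block_no: int, data: bytes) -> None:
    """Stand-in for the primary: send the old block, then overwrite it."""
    off = block_no * log.block_size
    log.record_old_block(block_no, image[off:off + log.block_size])
    image[off:off + log.block_size] = data
```

When the log is empty, `rollback()` returns 0 and leaves the image alone, which is the "nothing to do" case. Otherwise the returned count is the number of blocks the secondary has to write back before it can take over.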

There is a little danger in this though. When the secondary fails during rollback, the common image will be left in an inconsistent state. I think this risk cannot be avoided when using a common image. But this unfortunate situation can also happen in other scenarios.

Sharing a common image with this protocol will lead to a longer failover time when block device state data exists for the current checkpoint. The secondary must initiate the rollback and wait until all blocks of the current checkpoint are committed to the common image before taking over the active role.



Walid
