Re: [Qemu-devel] Microcheckpointing: Memory-VCPU / Disk State consistency

Michael R. Hines Wed, 13 Aug 2014 22:51:37 -0700

On 08/13/2014 10:03 PM, Walid Nouri wrote:

While looking for ideas on how to replicate block devices, I read the paper about the Remus implementation. I think MC can take a similar approach for local disk.


I agree.

Here are the main facts that I have understood:
Local disk contents are viewed as internal state of the primary and secondary. In the explanation they describe that, to keep the disk semantics of the primary and to allow the primary to run speculatively, all disk state changes are written directly to the primary's disk. In parallel, they are sent asynchronously to the secondary. The secondary keeps the pending write requests in two disk buffers: a speculation buffer and a write-out buffer.
After receiving the next checkpoint, the secondary copies the speculation buffer to the write-out buffer, commits the checkpoint, and applies the write-out buffer to its local disk.
When the primary fails, the secondary must wait until the write-out buffer has been completely written to disk before changing its execution mode to run as primary. In this case (failure of the primary), the secondary discards the pending disk writes in its speculation buffer. This protocol keeps the disk state consistent with the last checkpoint.
Remus uses the Xen-specific blktap driver. As far as I know this can’t be used with QEMU (KVM).
I must see how drive-mirror can be used for this kind of protocol.

That's all correct. Theoretically, we would do exactly the same thing: drive-mirror on the source would write immediately to disk but follow the same commit semantics on the destination as Xen.
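
As a rough illustration of the commit semantics described above, here is a minimal Python sketch of the secondary's two disk buffers. It is only a model of the protocol as summarized in this thread, not Remus or QEMU code: the class `SecondaryDisk`, its method names, and the use of a dict as the "disk" are all hypothetical, and the memory/VCPU side of the checkpoint is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryDisk:
    """Hypothetical model of the secondary's disk-side state."""
    disk: dict = field(default_factory=dict)         # sector -> data on the local disk
    speculation: list = field(default_factory=list)  # writes received since the last checkpoint
    write_out: list = field(default_factory=list)    # writes of committed checkpoints, not yet on disk

    def receive_write(self, sector, data):
        # Writes arrive asynchronously from the primary and are only buffered.
        self.speculation.append((sector, data))

    def checkpoint(self):
        # On the next checkpoint: move the speculative writes to the
        # write-out buffer, then apply the write-out buffer to the local disk.
        self.write_out.extend(self.speculation)
        self.speculation.clear()
        self.flush()

    def flush(self):
        # Apply committed writes in arrival order.
        for sector, data in self.write_out:
            self.disk[sector] = data
        self.write_out.clear()

    def failover(self):
        # Primary failed: finish writing out committed data first, then
        # discard writes that belong to the unfinished checkpoint.
        self.flush()
        self.speculation.clear()
```

In this model, a write that reaches the secondary after the last checkpoint never touches `disk` if the primary fails, which is what keeps the disk consistent with the last committed memory/VCPU state.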


I have taken a look at COLO.

IMHO there are two points. Custom changes to the TCP stack are a no-go for proprietary operating systems like Windows. That makes COLO application agnostic but not operating system agnostic. The other point is that with I/O intensive workloads COLO will tend to behave like MC. This is my point of view, but I didn’t invest much time to understand everything in detail.

Actually, if I remember correctly, the TCP stack is only modified at the hypervisor level - they are intercepting and translating TCP sequence numbers "in-flight" to detect divergence of the source and destination - which is not a big problem if the implementation is well-done.

My hope in the future was that the two approaches could be used in a "Hybrid" manner - actually MC has much more of a performance hit for I/O than COLO does because of its buffering requirements.

On the other hand, MC would perform better in a memory-intensive or CPU-intensive situation - so maybe QEMU could "switch" between the two mechanisms at different points in time when the resource bottleneck changes.
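
To make the "switch" idea a bit more concrete, a selection policy could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name `pick_mechanism`, the two normalized pressure inputs (0.0 to 1.0), and the `margin` used to avoid flapping between modes do not correspond to anything that exists in QEMU, MC, or COLO.

```python
def pick_mechanism(current, io_pressure, mem_cpu_pressure, margin=0.1):
    """Return "MC" or "COLO" for the next interval (hypothetical policy).

    io_pressure and mem_cpu_pressure are assumed to be normalized load
    estimates in [0.0, 1.0]; how they would be measured is left open.
    """
    # I/O is clearly the bottleneck: MC's output buffering hurts most here.
    if io_pressure > mem_cpu_pressure + margin:
        return "COLO"
    # Memory or CPU is clearly the bottleneck: MC is expected to do better.
    if mem_cpu_pressure > io_pressure + margin:
        return "MC"
    # Too close to call: keep the current mechanism to avoid switching churn.
    return current
```

For example, `pick_mechanism("MC", 0.9, 0.2)` selects COLO, while two nearly equal pressures leave the current mechanism in place.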


- Michael
