On Wed, Oct 23, 2013 at 1:08 AM, Jules <junqing.w...@cs2c.com.cn> wrote:
>
>> On Tue, Oct 15, 2013 at 03:26:19PM +0800, Jules Wang wrote:
>> > v2 -> v3:
>> > * add documentation of new option in qapi-schema.
>> >
>> > * long option name: ft -> fault-tolerant
>> >
>> > v1 -> v2:
>> > * cmdline: migrate curling:tcp:<address>:<port>
>> >        ->  migrate -f tcp:<address>:<port>
>> >
>> > * sender: use QEMU_VM_FILE_MAGIC_FT as the header of the migration
>> >           to indicate this is a ft migration.
>> >
>> > * receiver: look for the signature:
>> >             QEMU_VM_EOF_MAGIC + QEMU_VM_FILE_MAGIC_FT(64bit total)
>> >             which indicates the end of one migration.
>> > --
>> > Jules Wang (4):
>> >   Curling: add doc
>> >   Curling: cmdline interface.
>> >   Curling: the sender
>> >   Curling: the receiver
>>
>
> First of all, thanks for your superb and spot-on comments.
>
>> It would be helpful to clarify the status of Curling in the cover letter
>> email so reviewers know what to expect.
>
> OK, but I'm not quite clear about how to clarify the status, would you
> pls give me an example?

The status would be an explanation of what is currently included in the
patch series, which functionality already works, and what you still plan
to implement before the series can be merged.

>> This series does not address I/O or failover.  I guess you are aware of
>> the missing topics that I mentioned, here are my thoughts on them:
>>
>> I/O needs to be held back until the destination host has acknowledged
>> receiving the last full migration state.  The outside world cannot
>> witness state changes in the guest until the migration state has been
>> successfully transferred to the destination host.  Otherwise the guest
>> may appear to act incorrectly when resuming execution from the last
>> snapshot.
>>
>> The time period used by the FT sender thread determines how much latency
>> is added to I/O requests.
>
> Yes, there is the latency. That is inevitable.
>
> I guess you mean the following situation:
> If a msg 'hello' is sent to the chat room server just a few seconds
> before the failover happens, there is a possibility that the msg will be
> sent to the others twice or be lost.
>
> Am I right?

Yes, and this is a fundamental requirement for FT.

I/O is not idempotent in general.  Repeating the same operation a
second time does not necessarily produce the same result.

Other fault tolerance solutions include a mechanism to hold back I/O
until the checkpoint has been committed by the other host.  This way
no I/O is repeated and applications will not break during failover.
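
A minimal sketch of that hold-back idea, not taken from the Curling
patches or from any existing solution.  The class and method names
(OutputBuffer, submit, checkpoint_sent, checkpoint_acked, failover) are
hypothetical, and real I/O is modelled as plain Python values:

```python
from collections import deque


class OutputBuffer:
    """Holds outgoing I/O until the checkpoint that covers it is acknowledged."""

    def __init__(self):
        self.pending = deque()  # (epoch, payload) pairs not yet visible outside
        self.released = []      # payloads the outside world has seen
        self.epoch = 0          # checkpoint currently being built

    def submit(self, payload):
        # The guest issued I/O: queue it, tagged with the current epoch.
        self.pending.append((self.epoch, payload))

    def checkpoint_sent(self):
        # State up to this point was sent; later I/O belongs to the next epoch.
        sent = self.epoch
        self.epoch += 1
        return sent

    def checkpoint_acked(self, epoch):
        # The other host confirmed the checkpoint: release the I/O it covers.
        while self.pending and self.pending[0][0] <= epoch:
            self.released.append(self.pending.popleft()[1])

    def failover(self):
        # Unacknowledged I/O was never visible outside, so it can be dropped.
        dropped = [payload for _, payload in self.pending]
        self.pending.clear()
        return dropped
```

I/O submitted after a checkpoint was sent stays queued until the next
acknowledgement, which is where the added latency comes from.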

For example, imagine a "compare and swap" operation.  If the VM sends
out a "compare and swap" command to a remote server and the host then
fails, your current patches may send the command again from the other
host.  The problem is that the command will not succeed the second time
and therefore the application fails with an error.
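
A toy illustration of that replay problem.  RemoteServer and
replay_demo are hypothetical names for this sketch only; the "server"
is an in-process object standing in for a real remote service:

```python
class RemoteServer:
    """Stand-in for a remote service offering compare-and-swap on one value."""

    def __init__(self, value):
        self.value = value

    def compare_and_swap(self, expected, new):
        # Succeeds only if the stored value still equals `expected`.
        if self.value != expected:
            return False
        self.value = new
        return True


def replay_demo():
    server = RemoteServer(value=0)
    # Sent by the VM shortly before the failure.
    first = server.compare_and_swap(expected=0, new=1)
    # Re-sent by the VM resumed from the last snapshot on the other host.
    second = server.compare_and_swap(expected=0, new=1)
    return first, second, server.value
```

The first call succeeds and the replayed call fails, because the server
already holds the new value.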

>>
>> Failover functionality is missing from these patches.  We cannot simply
>> start executing on the destination host when the migration connection
>> ends.  If the guest disk image is located on shared storage then
>> split-brain occurs when a network error terminates the migration
>> connection - will both hosts begin accessing the shared disk?
>
> YES
>
> I have a simple way to handle that. In short: a third point, the
> gateway.
>
> Both the sender and the receiver check connectivity to the gateway
> every X seconds. Let A and B denote whether the sender and the
> receiver, respectively, are connected to the gateway.
>
> When the connection between the sender and the receiver is down,
> A && B is false.
>
> If A is false, the vm instance at the sender will be stopped.
> If B is false, the vm instance at the receiver will not be started.
>
> a. A false, B false: 0 VMs run
> b. A false, B true:  1 VM runs
> c. A true,  B false: 1 VM runs
> d. A true,  B true:  1 VM runs (normal case)
>
> It becomes complicated when we consider the transitions between
> these four states.
>
> I suggest adding this feature to libvirt instead of qemu.

I agree that the details of the failover (aka quorum and fencing)
should be implemented as policies outside QEMU, if possible.
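
As a rough sketch of how such an external policy could express the
gateway rule quoted above (the function name is hypothetical, and this
is not libvirt or QEMU code).  It only models the situation after the
migration connection is lost.  The quoted message takes A && B to be
false in that situation, so the sketch refuses that input rather than
guess:

```python
def vms_running_after_link_loss(a, b):
    """Return (sender_runs, receiver_runs) once the migration link is down.

    a: the sender can reach the gateway
    b: the receiver can reach the gateway
    """
    if a and b:
        # Not covered by the quoted rule, which assumes A && B is false here.
        raise ValueError("rule does not cover A and B both true")
    sender_runs = a    # if A is false, the VM at the sender is stopped
    receiver_runs = b  # if B is false, the VM at the receiver is not started
    return sender_runs, receiver_runs
```

With these inputs at most one of the two flags is ever true, matching
cases a to c in the quoted table.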

Also, there were two presentations about fault tolerance at KVM Forum
2013 a few days ago:
https://docs.google.com/file/d/0BzyAwvVlQckebVBrNXdlaTdWVUk/edit
https://docs.google.com/file/d/0BzyAwvVlQckeczNUZHRod28yVXc/edit

Stefan
