Re: [PATCH 3/3] migration/multifd: fix potential wrong acception order of IOChannel
On 2019/10/24 22:34, Daniel P. Berrangé wrote:
> On Thu, Oct 24, 2019 at 09:53:24PM +0800, cenjiahui wrote:
>> On 2019/10/24 17:52, Daniel P. Berrangé wrote:
>>> On Wed, Oct 23, 2019 at 11:32:14AM +0800, cenjiahui wrote:
>>>> From: Jiahui Cen
>>>>
>>>> Multifd assumes the migration thread IOChannel is always established before
>>>> the multifd IOChannels, but this assumption will be broken in many situations
>>>> like network packet loss.
>>>>
>>>> For example:
>>>> Step1: Source (migration thread IOChannel) --SYN--> Destination
>>>> Step2: Source (migration thread IOChannel) <--SYNACK Destination
>>>> Step3: Source (migration thread IOChannel, lost) --ACK-->X Destination
>>>> Step4: Source (multifd IOChannel) --SYN--> Destination
>>>> Step5: Source (multifd IOChannel) <--SYNACK Destination
>>>> Step6: Source (multifd IOChannel, ESTABLISHED) --ACK--> Destination
>>>> Step7: Destination accepts multifd IOChannel
>>>> Step8: Source (migration thread IOChannel, ESTABLISHED) -ACK,DATA-> Destination
>>>> Step9: Destination accepts migration thread IOChannel
>>>>
>>>> The above situation can be reproduced by creating a weak network environment,
>>>> such as "tc qdisc add dev eth0 root netem loss 50%". The wrong acception order
>>>> will cause magic check failure and thus lead to migration failure.
>>>>
>>>> This patch fixes this issue by sending a migration IOChannel initial packet with
>>>> a unique id when using multifd migration. Since the multifd IOChannels will also
>>>> send initial packets, the destination can judge whether the processing IOChannel
>>>> belongs to multifd by checking the id in the initial packet. This mechanism can
>>>> ensure that different IOChannels will go to correct branches in our test.
>>>
>>> Isn't this going to break back compatibility when new QEMU talks to old
>>> QEMU with multifd enabled ? New QEMU will be sending a packet that old
>>> QEMU isn't expecting IIUC.
>>
>> Yes, it actually breaks back compatibility. But since the old QEMU has bug with
>> multifd, it may be not suitable to use multifd to migrate from new QEMU to old
>> QEMU in my opinion.
>
> We declared multifd supported from v4.0.0 onwards, so changing the wire
> protocol in non-backwards compatibles ways is not acceptable IMHO.
>
> Ideally we'd change QEMU so that the src QEMU serializes the connections,
> such that the migration thread I/O channel is established before we attempt
> to establish the multifd channels.
>
> If changing the wire protocol is unavoidable, then we'd need to invent
> a new migration capability for the mgmt apps to detect & opt-in to when
> both sides support it.

I don't think the source QEMU can guarantee that the connections are
serialized. Multifd is designed so that the migration thread I/O channel is
established first, but that only guarantees the ordering on the source side.
Whether the destination accepts the connections in the same order depends on
the network. Without the correct order, the destination cannot tell the
connections apart unless it reads something from the channels.

There is a somewhat ugly way to fix this. At the beginning of migration, the
migration thread first sends the VM state header, whose MAGIC differs from that
of the multifd initial packet. The destination could read it in advance to
identify the connection, so no additional packet would be needed on the
migration thread I/O channel. However, the destination would then have to keep
the content it has read for later use. Also, with this approach the source is
already sending migration data while the destination is only getting ready to
start the migration.

Do you have any better ideas on this issue?

Regards,
Jiahui Cen
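[Editor's sketch] The "read it in advance" idea above can be illustrated with a peek on the socket, which avoids the problem of having to keep the bytes for later: recv() with MSG_PEEK returns the data but leaves it queued for the normal reader. This is a minimal sketch under stated assumptions, not QEMU code. The two magic values are believed to match QEMU_VM_FILE_MAGIC and MULTIFD_MAGIC and both are assumed to be sent big-endian, but they should be checked against the tree; the function and enum names are invented for illustration, and a raw socket fd stands in for a QIOChannel.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Assumed values: believed to match QEMU's QEMU_VM_FILE_MAGIC and
 * MULTIFD_MAGIC, but treat them as placeholders and verify in the tree. */
#define SKETCH_VM_FILE_MAGIC 0x5145564dU
#define SKETCH_MULTIFD_MAGIC 0x11223344U

/* Hypothetical classification result, not a QEMU type. */
enum channel_kind {
    CHANNEL_ERROR = -1,
    CHANNEL_UNKNOWN = 0,
    CHANNEL_MAIN,
    CHANNEL_MULTIFD
};

/* Classify a 32-bit magic that is already in host byte order. */
enum channel_kind classify_magic(uint32_t magic)
{
    if (magic == SKETCH_VM_FILE_MAGIC) {
        return CHANNEL_MAIN;
    }
    if (magic == SKETCH_MULTIFD_MAGIC) {
        return CHANNEL_MULTIFD;
    }
    return CHANNEL_UNKNOWN;
}

/* Look at the first four bytes of a connected stream socket without
 * consuming them, so the regular reader still sees the full stream.
 * Blocks until four bytes are available or the peer closes. */
enum channel_kind peek_channel_kind(int fd)
{
    uint32_t be_magic;
    ssize_t n = recv(fd, &be_magic, sizeof(be_magic),
                     MSG_PEEK | MSG_WAITALL);

    if (n != (ssize_t)sizeof(be_magic)) {
        return CHANNEL_ERROR;
    }
    return classify_magic(ntohl(be_magic));
}
```

Note that peeking only works on transports that support it (plain sockets here); a channel wrapped in TLS, for example, would need a different mechanism. The second concern raised above, that the source is already sending data before the destination is ready, is not addressed by this sketch.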
Re: [PATCH 3/3] migration/multifd: fix potential wrong acception order of IOChannel
On Thu, Oct 24, 2019 at 09:53:24PM +0800, cenjiahui wrote:
> On 2019/10/24 17:52, Daniel P. Berrangé wrote:
> > On Wed, Oct 23, 2019 at 11:32:14AM +0800, cenjiahui wrote:
> >> From: Jiahui Cen
> >>
> >> Multifd assumes the migration thread IOChannel is always established before
> >> the multifd IOChannels, but this assumption will be broken in many situations
> >> like network packet loss.
> >>
> >> For example:
> >> Step1: Source (migration thread IOChannel) --SYN--> Destination
> >> Step2: Source (migration thread IOChannel) <--SYNACK Destination
> >> Step3: Source (migration thread IOChannel, lost) --ACK-->X Destination
> >> Step4: Source (multifd IOChannel) --SYN--> Destination
> >> Step5: Source (multifd IOChannel) <--SYNACK Destination
> >> Step6: Source (multifd IOChannel, ESTABLISHED) --ACK--> Destination
> >> Step7: Destination accepts multifd IOChannel
> >> Step8: Source (migration thread IOChannel, ESTABLISHED) -ACK,DATA-> Destination
> >> Step9: Destination accepts migration thread IOChannel
> >>
> >> The above situation can be reproduced by creating a weak network environment,
> >> such as "tc qdisc add dev eth0 root netem loss 50%". The wrong acception order
> >> will cause magic check failure and thus lead to migration failure.
> >>
> >> This patch fixes this issue by sending a migration IOChannel initial packet with
> >> a unique id when using multifd migration. Since the multifd IOChannels will also
> >> send initial packets, the destination can judge whether the processing IOChannel
> >> belongs to multifd by checking the id in the initial packet. This mechanism can
> >> ensure that different IOChannels will go to correct branches in our test.
> >
> > Isn't this going to break back compatibility when new QEMU talks to old
> > QEMU with multifd enabled ? New QEMU will be sending a packet that old
> > QEMU isn't expecting IIUC.
>
> Yes, it actually breaks back compatibility. But since the old QEMU has bug with
> multifd, it may be not suitable to use multifd to migrate from new QEMU to old
> QEMU in my opinion.

We declared multifd supported from v4.0.0 onwards, so changing the wire
protocol in non-backwards-compatible ways is not acceptable IMHO.

Ideally we'd change QEMU so that the src QEMU serializes the connections,
such that the migration thread I/O channel is established before we attempt
to establish the multifd channels.

If changing the wire protocol is unavoidable, then we'd need to invent
a new migration capability for the mgmt apps to detect & opt in to when
both sides support it.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
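[Editor's sketch] The opt-in rule described above (a new capability that management applications enable only when both sides support it) can be modelled in a few lines. This is a hedged illustration only: the capability bit CAP_MAIN_CHANNEL_ID and both function names are hypothetical and do not correspond to any existing QEMU capability or API. The point is that an old destination never sees the new packet, because the bit cannot be enabled unless both ends advertise it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability bits for this sketch only. CAP_MAIN_CHANNEL_ID is a
 * hypothetical capability, not one that exists in QEMU. */
#define CAP_MULTIFD         (1u << 0)
#define CAP_MAIN_CHANNEL_ID (1u << 1)

/* Management-app side: the new capability may be switched on only if
 * both the source and the destination report that they support it. */
bool mgmt_can_enable_main_channel_id(uint32_t src_supported,
                                     uint32_t dst_supported)
{
    return (src_supported & dst_supported & CAP_MAIN_CHANNEL_ID) != 0;
}

/* Source side: send the extra initial packet on the main channel only
 * when multifd is in use and the new capability has been enabled.
 * Otherwise the wire format stays exactly as old QEMU expects. */
bool source_sends_initial_packet(uint32_t enabled)
{
    return (enabled & CAP_MULTIFD) != 0 &&
           (enabled & CAP_MAIN_CHANNEL_ID) != 0;
}
```

With this gating, a migration from new QEMU to old QEMU keeps the existing wire protocol (and the existing ordering bug), while new-to-new migrations can opt in to the fix.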
Re: [PATCH 3/3] migration/multifd: fix potential wrong acception order of IOChannel
On 2019/10/24 17:52, Daniel P. Berrangé wrote:
> On Wed, Oct 23, 2019 at 11:32:14AM +0800, cenjiahui wrote:
>> From: Jiahui Cen
>>
>> Multifd assumes the migration thread IOChannel is always established before
>> the multifd IOChannels, but this assumption will be broken in many situations
>> like network packet loss.
>>
>> For example:
>> Step1: Source (migration thread IOChannel) --SYN--> Destination
>> Step2: Source (migration thread IOChannel) <--SYNACK Destination
>> Step3: Source (migration thread IOChannel, lost) --ACK-->X Destination
>> Step4: Source (multifd IOChannel) --SYN--> Destination
>> Step5: Source (multifd IOChannel) <--SYNACK Destination
>> Step6: Source (multifd IOChannel, ESTABLISHED) --ACK--> Destination
>> Step7: Destination accepts multifd IOChannel
>> Step8: Source (migration thread IOChannel, ESTABLISHED) -ACK,DATA-> Destination
>> Step9: Destination accepts migration thread IOChannel
>>
>> The above situation can be reproduced by creating a weak network environment,
>> such as "tc qdisc add dev eth0 root netem loss 50%". The wrong acception order
>> will cause magic check failure and thus lead to migration failure.
>>
>> This patch fixes this issue by sending a migration IOChannel initial packet with
>> a unique id when using multifd migration. Since the multifd IOChannels will also
>> send initial packets, the destination can judge whether the processing IOChannel
>> belongs to multifd by checking the id in the initial packet. This mechanism can
>> ensure that different IOChannels will go to correct branches in our test.
>
> Isn't this going to break back compatibility when new QEMU talks to old
> QEMU with multifd enabled ? New QEMU will be sending a packet that old
> QEMU isn't expecting IIUC.

Yes, it does break backward compatibility. But since the old QEMU has a bug in
multifd, in my opinion it may not be suitable to use multifd when migrating
from new QEMU to old QEMU anyway.

Hi Quintela, what do you think about this?

>
>> Signed-off-by: Jiahui Cen
>> Signed-off-by: Ying Fang

Regards,
Jiahui Cen
Re: [PATCH 3/3] migration/multifd: fix potential wrong acception order of IOChannel
On Wed, Oct 23, 2019 at 11:32:14AM +0800, cenjiahui wrote:
> From: Jiahui Cen
>
> Multifd assumes the migration thread IOChannel is always established before
> the multifd IOChannels, but this assumption will be broken in many situations
> like network packet loss.
>
> For example:
> Step1: Source (migration thread IOChannel) --SYN--> Destination
> Step2: Source (migration thread IOChannel) <--SYNACK Destination
> Step3: Source (migration thread IOChannel, lost) --ACK-->X Destination
> Step4: Source (multifd IOChannel) --SYN--> Destination
> Step5: Source (multifd IOChannel) <--SYNACK Destination
> Step6: Source (multifd IOChannel, ESTABLISHED) --ACK--> Destination
> Step7: Destination accepts multifd IOChannel
> Step8: Source (migration thread IOChannel, ESTABLISHED) -ACK,DATA-> Destination
> Step9: Destination accepts migration thread IOChannel
>
> The above situation can be reproduced by creating a weak network environment,
> such as "tc qdisc add dev eth0 root netem loss 50%". The wrong acception order
> will cause magic check failure and thus lead to migration failure.
>
> This patch fixes this issue by sending a migration IOChannel initial packet with
> a unique id when using multifd migration. Since the multifd IOChannels will also
> send initial packets, the destination can judge whether the processing IOChannel
> belongs to multifd by checking the id in the initial packet. This mechanism can
> ensure that different IOChannels will go to correct branches in our test.

Isn't this going to break backward compatibility when new QEMU talks to old
QEMU with multifd enabled? New QEMU will be sending a packet that old
QEMU isn't expecting, IIUC.

> Signed-off-by: Jiahui Cen
> Signed-off-by: Ying Fang

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
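[Editor's sketch] The mechanism in the quoted patch description (every channel, including the migration thread channel, sends an initial packet, and the destination routes by the id in it rather than by accept order) might look roughly like the following. The packet layout, the reserved id value and all names here are assumptions made for illustration; they are not taken from the patch, whose actual format is not shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for this sketch: a 4-byte big-endian magic followed by a
 * 1-byte channel id. The real packet in the patch may differ. */
#define SKETCH_INIT_MAGIC      0x11223344U
#define SKETCH_MAIN_CHANNEL_ID 0xFFU   /* hypothetical reserved id */
#define SKETCH_INIT_PACKET_LEN 5U

enum route {
    ROUTE_REJECT = 0,
    ROUTE_MAIN,
    ROUTE_MULTIFD
};

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decide which branch an accepted connection belongs to, based only on
 * its initial packet. The order in which connections were accepted
 * plays no role, so Step7 and Step9 above may happen in either order. */
enum route route_initial_packet(const uint8_t *buf, size_t len,
                                unsigned multifd_channels)
{
    uint8_t id;

    if (len < SKETCH_INIT_PACKET_LEN ||
        load_be32(buf) != SKETCH_INIT_MAGIC) {
        return ROUTE_REJECT;
    }
    id = buf[4];
    if (id == SKETCH_MAIN_CHANNEL_ID) {
        return ROUTE_MAIN;
    }
    if (id < multifd_channels) {
        return ROUTE_MULTIFD;
    }
    return ROUTE_REJECT;
}
```

As the question in this message points out, an old destination does not expect any such packet on the migration thread channel, which is why the scheme changes the wire protocol.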