On 11/05/2023 16:09, Juan Quintela wrote:

Avihai Horon <avih...@nvidia.com> wrote:
On 10/05/2023 19:41, Juan Quintela wrote:
Does this make sense?
Yes, thanks a lot for the full and detailed explanation!
Thank you.

This indeed solves the problem in the scenario I mentioned above.

However, this relies on the fact that a device's support for this
feature depends only on the QEMU version.
This is not the case for VFIO devices.
What a surprise :-)

Yes, I couldn't resist.

To support explicit-switchover, a VFIO device also needs host kernel
support for VFIO precopy, i.e., it needs to have the
VFIO_MIGRATION_PRE_COPY flag set.
So, theoretically we could have the following:
- Source and destination QEMU are the same version.
- We migrate two different VFIO devices (i.e., they don't share the
   same kernel driver), device X and device Y.
- Host kernel in source supports VFIO precopy for device X but not for
   device Y.
- Host kernel in destination supports VFIO precopy for both device X
   and device Y.
Without explicit-switchover, migration should work.
But if we enable explicit-switchover and do migration, we would end up
in the same situation where switchover_pending=2 in destination and it
never reaches zero so migration is stuck.
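
The stuck counter in the scenario above can be illustrated with a toy
model. This is only a sketch and not QEMU code: the function name and the
dictionaries are made up for illustration, and it assumes the destination
counts one pending ACK per device that supports the feature locally, and
that a device only ACKs if the source side of that device also supports
VFIO precopy.

```python
def remaining_acks(source_support, dest_support):
    """Toy model of the destination's switchover_pending counter.

    Assumption (not taken from QEMU code): the destination starts with one
    pending ACK per device that supports explicit-switchover locally, and a
    device ACKs only if its source side supports precopy as well.
    """
    pending = sum(1 for supported in dest_support.values() if supported)
    for dev, dest_ok in dest_support.items():
        if dest_ok and source_support.get(dev, False):
            pending -= 1  # this device can complete its ACK
    return pending


# Source kernel: precopy for X only. Destination kernel: precopy for X and Y.
source = {"X": True, "Y": False}
dest = {"X": True, "Y": True}

stuck = remaining_acks(source, dest)
print(stuck)  # non-zero means the switchover would never be acknowledged
```

With matching support on both sides the same function returns zero, which
is the case where migration can proceed.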
I think this is too much for qemu.  You need to work at the
libvirt/management level.

This could be solved by moving the switchover_pending counter to the
source and sending multiple MIG_RP explicit-switchover ACK messages.
However, I also raised a concern about this in my last mail to Peter
[1], where this is not guaranteed to work, depending on the device
implementation of the explicit-switchover feature.
I will not try to be extra clever here.  We have taken qemu support out
of the question, as it is the same qemu on both sides.

So what we have is this configuration:

Host A
------
device X explicit_switchover=on
device Y explicit_switchover=off

Host B
------
device X explicit_switchover=on
device Y explicit_switchover=on

The configuration is different.  That is something that the qemu
protocol doesn't know how to handle, and it is up to the management stack.

You need to configure it explicitly on the qemu command line on host B:
device=Y,explicit_switchover=off

Or whatever it is that configures it off.
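
As a rough sketch of what the management layer would have to do here
(hypothetical helper and data, not libvirt or qemu code): enable the
feature for a device only when both hosts support it, and pass the result
to qemu on both sides.  The printed option strings just mirror the
illustrative form used above; they are not real qemu command line syntax.

```python
def common_settings(host_a, host_b):
    """Per-device setting that both hosts can honour (hypothetical helper).

    A device gets "on" only if both hosts support the feature for it;
    otherwise it must be configured "off" on both sides.
    """
    return {
        dev: "on" if host_a[dev] and host_b.get(dev, False) else "off"
        for dev in host_a
    }


# Capabilities as in the Host A / Host B example above.
host_a = {"X": True, "Y": False}
host_b = {"X": True, "Y": True}

settings = common_settings(host_a, host_b)
for dev, value in settings.items():
    # Illustrative option format only, copied from the example in this mail.
    print(f"device={dev},explicit_switchover={value}")
```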

I understand.


It is exactly the same problem as:

Host A
------

Intel CPU genX

Host B
------

Intel CPU genX-1

i.e. there are features that Host A has but Host B doesn't have.  The only
way to make this work is to configure qemu, when launched on Host A,
with a cpu type that Host B is able to run (i.e. one that doesn't have
any features that Host B is missing).

What is the difference between this and yours?

Hmm, yes, I see your point.



Not sure though if I'm digging too deep in some improbable future
corner cases.
Oh, you are just starting.  Think of the compat layers that CPUs have
needed over the years.  At some point even migration between AMD and
Intel CPUs worked.

Let's go back to the basic question, which is whether we need to send
an "advise" message for each device that supports explicit-switchover.
I think it gives us more flexibility and although not needed at the
moment, might be useful in the future.
I think that is not a good idea, see my previous comment.  We have two
cases:
- both devices have the same features in both places
- they have different features in any of the places

First case, we don't care.  It always works.
Second case, we need to configure it correctly, and that means disabling
features that are not on the other side.

Yep, I understand.


If you want I can send a v2 that addresses the comments and simplifies
the code in other areas and we'll continue discussing the necessity of
the "advise" message then.
Yeap.  I think that is the best course of action.

OK, so let me digest all the new info of this discussion and get back with v2 / conclusions / questions.

Thanks for all the help!

