Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-14 Thread Jiri Denemark
On Mon, Jul 13, 2020 at 14:04:25 +0200, Jiri Denemark wrote:
> On Sat, Jul 11, 2020 at 13:44:19 -0400, Mark Mielke wrote:
> > On Sat, Jul 11, 2020 at 6:04 AM Mark Mielke  wrote:
> > 
> > > On Fri, Jul 10, 2020 at 7:48 AM Mark Mielke  wrote:
> > >
> > >> On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark 
> > >> wrote:
> > >>
> > >>> The implementation seems to be doing exactly what the commit message
> > >>>
> > >> says. The migratable=off default should be used only when QEMU does not
> > >>> support -cpu host,migratable=on|off, that is only when QEMU is very old.
> > >>> Every non-ancient version of libvirt should have the
> > >>> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
> > >>> migrateble=on default.
> > >>>
> > >> QEMU_CAPS_CPU_MIGRATABLE only from the  element? If so, doesn't this
> > >> mean that it is not explicitly listed for host-passthrough, and this 
> > >> means
> > >> the check is not detecting whether it is enabled or not properly?
> > >>
> > > Trying to understand what is going on more - I see "migratable" seems to
> > > be ok when launching a new machine, but the failure scenario was live
> > > migration from 6.4.0 to 6.5.0.
> > >
> > > Is this because the QEMU_CAPS_CPU_MIGRATABLE was not filled in for 6.4.0,
> > > and live migration grabs the capabilities from the source, where the
> > > absence of this capability makes it presume an older Qemu in the above 
> > > code?
> > >
> > 
> > Sorry all - I am having trouble reproducing now. The expected use cases are
> > now working.
> > 
> > Is it possible that the "migratable" flag might have been missing on some
> > of the instances, although migration worked fine, and despite having used
> > Qemu 4.2 and Qemu 5.0?
> 
> When an updated libvirtd which knows about this new capability starts,
> it would reprobe all QEMU capabilities (lazily, i.e., once they are
> needed). However, if there is a running domain, libvirt will use cached
> capabilities probed when the domain was started. I suspect migrating
> such domain could be a problem. I'll try to reproduce locally.

OK, I did not reproduce the failure, because migratable=off doesn't
enable anything more than migratable=on (likely because L1 VM in my
nested environment does not have any non-migratable features enabled).
But I was able to reproduce the issue itself and the migration could
clearly fail if migratable=off enabled some non-migratable features. The
reproducer is actually easy and one doesn't even need to migrate to see
libvirt did something wrong:

1. run libvirtd older then 6.5.0
2. start a domain with host-passthrough CPU (QEMU would default to
   migratable=on)
3. upgrade libvirt to 6.5.0 and restart libvirtd
4. virsh dumpxml $DOMAIN_STARTED_IN_STEP_2

Now you would see



which differs from the default used by QEMU. Migrating such domain would
succeed anyway, because it was actually started with migratable='on'.
But when such domain is migrated to libvirt 6.5.0, we would honor the
migratable attribute and start QEMU with -cpu host,migratable=off which
could cause failures when trying to migrate this domain again.

The problem is exactly where I was afraid it could be. When libvirtd
starts, it reads the QEMU capabilities probed by older libvirt
(QEMU_CAPS_CPU_MIGRATABLE would be off) and wrongly updates the XML of
the running domain. I'll prepare a patch to fix this.

Jirka



Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-13 Thread Jiri Denemark
On Sat, Jul 11, 2020 at 13:44:19 -0400, Mark Mielke wrote:
> On Sat, Jul 11, 2020 at 6:04 AM Mark Mielke  wrote:
> 
> > On Fri, Jul 10, 2020 at 7:48 AM Mark Mielke  wrote:
> >
> >> On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark 
> >> wrote:
> >>
> >>> The implementation seems to be doing exactly what the commit message
> >>>
> >> says. The migratable=off default should be used only when QEMU does not
> >>> support -cpu host,migratable=on|off, that is only when QEMU is very old.
> >>> Every non-ancient version of libvirt should have the
> >>> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
> >>> migrateble=on default.
> >>>
> >> QEMU_CAPS_CPU_MIGRATABLE only from the  element? If so, doesn't this
> >> mean that it is not explicitly listed for host-passthrough, and this means
> >> the check is not detecting whether it is enabled or not properly?
> >>
> > Trying to understand what is going on more - I see "migratable" seems to
> > be ok when launching a new machine, but the failure scenario was live
> > migration from 6.4.0 to 6.5.0.
> >
> > Is this because the QEMU_CAPS_CPU_MIGRATABLE was not filled in for 6.4.0,
> > and live migration grabs the capabilities from the source, where the
> > absence of this capability makes it presume an older Qemu in the above code?
> >
> 
> Sorry all - I am having trouble reproducing now. The expected use cases are
> now working.
> 
> Is it possible that the "migratable" flag might have been missing on some
> of the instances, although migration worked fine, and despite having used
> Qemu 4.2 and Qemu 5.0?

When an updated libvirtd which knows about this new capability starts,
it would reprobe all QEMU capabilities (lazily, i.e., once they are
needed). However, if there is a running domain, libvirt will use cached
capabilities probed when the domain was started. I suspect migrating
such domain could be a problem. I'll try to reproduce locally.

Jirka



Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-11 Thread Mark Mielke
On Sat, Jul 11, 2020 at 6:04 AM Mark Mielke  wrote:

> On Fri, Jul 10, 2020 at 7:48 AM Mark Mielke  wrote:
>
>> On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark 
>> wrote:
>>
>>> The implementation seems to be doing exactly what the commit message
>>>
>> says. The migratable=off default should be used only when QEMU does not
>>> support -cpu host,migratable=on|off, that is only when QEMU is very old.
>>> Every non-ancient version of libvirt should have the
>>> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
>>> migrateble=on default.
>>>
>> QEMU_CAPS_CPU_MIGRATABLE only from the  element? If so, doesn't this
>> mean that it is not explicitly listed for host-passthrough, and this means
>> the check is not detecting whether it is enabled or not properly?
>>
> Trying to understand what is going on more - I see "migratable" seems to
> be ok when launching a new machine, but the failure scenario was live
> migration from 6.4.0 to 6.5.0.
>
> Is this because the QEMU_CAPS_CPU_MIGRATABLE was not filled in for 6.4.0,
> and live migration grabs the capabilities from the source, where the
> absence of this capability makes it presume an older Qemu in the above code?
>

Sorry all - I am having trouble reproducing now. The expected use cases are
now working.

Is it possible that the "migratable" flag might have been missing on some
of the instances, although migration worked fine, and despite having used
Qemu 4.2 and Qemu 5.0?

If I can reproduce it, I'll follow up. Otherwise, thanks for looking.

-- 
Mark Mielke 


Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-11 Thread Mark Mielke
On Fri, Jul 10, 2020 at 7:48 AM Mark Mielke  wrote:

> On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark  wrote:
>
>> > +if (qemuCaps &&
>>
> > +def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
>> > +!def->cpu->migratable) {
>> > +if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_CPU_MIGRATABLE))
>> > +def->cpu->migratable = VIR_TRISTATE_SWITCH_ON;
>> >
>> > *+else if (ARCH_IS_X86(def->os.arch))+
>> >  def->cpu->migratable = VIR_TRISTATE_SWITCH_OFF;*
>> > +}
>>
>> The implementation seems to be doing exactly what the commit message
>> says. The migratable=off default should be used only when QEMU does not
>> support -cpu host,migratable=on|off, that is only when QEMU is very old.
>> Every non-ancient version of libvirt should have the
>> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
>> migrateble=on default.
>>
>
> I wasn't sure what QEMU_CAPS_CPU_MIGRATABLE represents. I initially
> suspected what you are saying, but since it apparently did not work the way
> I expected, I then presumed it does not work the way I expected. :-)
>
> Is QEMU_CAPS_CPU_MIGRATABLE only from the  element? If so, doesn't
> this mean that it is not explicitly listed for host-passthrough, and this
> means the check is not detecting whether it is enabled or not properly?
>

Trying to understand what is going on more - I see "migratable" seems to be
ok when launching a new machine, but the failure scenario was live
migration from 6.4.0 to 6.5.0.

Is this because the QEMU_CAPS_CPU_MIGRATABLE was not filled in for 6.4.0,
and live migration grabs the capabilities from the source, where the
absence of this capability makes it presume an older Qemu in the above code?

-- 
Mark Mielke 


Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-10 Thread Jiri Denemark
On Fri, Jul 10, 2020 at 07:48:26 -0400, Mark Mielke wrote:
> On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark  wrote:
> 
> > On Sun, Jul 05, 2020 at 12:45:55 -0400, Mark Mielke wrote:
> > > With 6.4.0, live migration was working fine with Qemu 5.0. After trying
> > out
> > > 6.5.0, migration broke with the following error:
> > >
> > > libvirt.libvirtError: internal error: unable to execute QEMU command
> > > 'migrate': State blocked by non-migratable CPU device (invtsc flag)
> >
> > Could you please describe the reproducer steps? For example, was the
> > domain you're trying to migrate already running when you upgrade libvirt
> > or is it freshly started by the new libvirt?
> >
> 
> 
> The original case was:
> 
> 1) Machine X running libvirt 6.4.0 + qemu 5.0
> 2) Machine Y running libvirt 6.5.0 + qemu 5.0
> 3) Live migration from X to Y works. Guest appears fine.
> 4) Upgrade Machine X from libvirt 6.4.0 to 6.5.0 and reboot.
> 5) Live migration from Y to X fails with the message shown.

Oh I see, so I guess the bad default is chosen during the incoming
migration to machine Y. I'll try to reproduce myself to see what's going
on.

> In each case, live migration was done with OpenStack Train directing
> libvirt + qemu.
> 
> 
> And it would be helpful to see the  element as shown by virsh
> > dumpxml before you try to start the domain as well as the QEMU command
> > line libvirt used to start the domain (in
> > /var/log/libvirt/qemu/$VM.log).
> >
> 
> The  element looks like this:
> 
>   
> 
>   
> 
> The QEMU command line is very long, and includes details I would avoid
> publishing publicly unless you need them. The "-cpu" portion is just:
> 
> -cpu host
> 
> The QEMU command line itself is generated from libvirt, which is directed
> by OpenStack Train.

These are from machine X before step 3, right? Can you also share the
same from machine Y before step 5?

> I wasn't sure what QEMU_CAPS_CPU_MIGRATABLE represents. I initially
> suspected what you are saying, but since it apparently did not work the way
> I expected, I then presumed it does not work the way I expected. :-)
> 
> Is QEMU_CAPS_CPU_MIGRATABLE only from the  element? If so, doesn't
> this mean that it is not explicitly listed for host-passthrough, and this
> means the check is not detecting whether it is enabled or not properly?

QEMU_CAPS_CPU_MIGRATABLE comes from the QEMU capability probing.
Specifically, the capability is enabled when a given QEMU binary reports
'migratable' property for the CPU object. And the capability detection
tests show we should be properly detecting this capability:

tests/qemucapabilitiesdata $ git grep cpu.migratable
caps_2.12.0.x86_64.xml:  
caps_3.0.0.x86_64.xml:  
caps_3.1.0.x86_64.xml:  
caps_4.0.0.x86_64.xml:  
caps_4.1.0.x86_64.xml:  
caps_4.2.0.x86_64.xml:  
caps_5.0.0.x86_64.xml:  
caps_5.1.0.x86_64.xml:  

> I think it can go either way. There is also convention over configuration
> as a competing principle. However, I also prefer explicit. Just, it needs
> to be correct, otherwise explicit can be very bad, as it seems in my case.
> :-)

Of course, the explicit default must match the implicit one.

Jirka



Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-10 Thread Mark Mielke
On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark  wrote:

> On Sun, Jul 05, 2020 at 12:45:55 -0400, Mark Mielke wrote:
> > With 6.4.0, live migration was working fine with Qemu 5.0. After trying
> out
> > 6.5.0, migration broke with the following error:
> >
> > libvirt.libvirtError: internal error: unable to execute QEMU command
> > 'migrate': State blocked by non-migratable CPU device (invtsc flag)
>
> Could you please describe the reproducer steps? For example, was the
> domain you're trying to migrate already running when you upgrade libvirt
> or is it freshly started by the new libvirt?
>


The original case was:

1) Machine X running libvirt 6.4.0 + qemu 5.0
2) Machine Y running libvirt 6.5.0 + qemu 5.0
3) Live migration from X to Y works. Guest appears fine.
4) Upgrade Machine X from libvirt 6.4.0 to 6.5.0 and reboot.
5) Live migration from Y to X fails with the message shown.

In each case, live migration was done with OpenStack Train directing
libvirt + qemu.


And it would be helpful to see the  element as shown by virsh
> dumpxml before you try to start the domain as well as the QEMU command
> line libvirt used to start the domain (in
> /var/log/libvirt/qemu/$VM.log).
>

The  element looks like this:

  

  

The QEMU command line is very long, and includes details I would avoid
publishing publicly unless you need them. The "-cpu" portion is just:

-cpu host

The QEMU command line itself is generated from libvirt, which is directed
by OpenStack Train.


> > commit 201bd5db639c063862b0c1b1abfab9a9a7c92591
> > Author: Jiri Denemark 
> > Date:   Tue Jun 2 15:34:07 2020 +0200
> >
> > qemu: Fill default value in //cpu/@migratable attribute
> >
> > Before QEMU introduced migratable CPU property, "-cpu host" included
> all
> > features that could be enabled on the host, even those which would
> block
> > migration. In other words, the default was equivalent to
> migratable=off.
> > When the migratable property was introduced, the default changed to
> > migratable=on. Let's record the default in domain XML.
> >
> > Signed-off-by: Jiri Denemark 
> > Reviewed-by: Michal Privoznik 
> >
> > Before this change, qemu was still being launched with "-cpu host", which
> > for any somewhat modern version of qemu, defaults to migratable=on. The
> > above comment acknowledges this, however, the implementation chooses the
> > pessimistic and ancient (and no longer applicable!) value of
> migratable=off:
> >
> > +if (qemuCaps &&
> > +def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
> > +!def->cpu->migratable) {
> > +if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_CPU_MIGRATABLE))
> > +def->cpu->migratable = VIR_TRISTATE_SWITCH_ON;
> >
> > *+else if (ARCH_IS_X86(def->os.arch))+
> >  def->cpu->migratable = VIR_TRISTATE_SWITCH_OFF;*
> > +}
>
> The implementation seems to be doing exactly what the commit message
> says. The migratable=off default should be used only when QEMU does not
> support -cpu host,migratable=on|off, that is only when QEMU is very old.
> Every non-ancient version of libvirt should have the
> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
> migrateble=on default.
>

I wasn't sure what QEMU_CAPS_CPU_MIGRATABLE represents. I initially
suspected what you are saying, but since it apparently did not work the way
I expected, I then presumed it does not work the way I expected. :-)

Is QEMU_CAPS_CPU_MIGRATABLE only from the  element? If so, doesn't
this mean that it is not explicitly listed for host-passthrough, and this
means the check is not detecting whether it is enabled or not properly?

> I think it is not a requirement for "migratable=XXX" to be explicit in
> > libvirt. However, if there is some reason I am unaware of, and it is
> > important for libvirt to know, then I think it is important for libvirt
> to
> > find out the authoritative state rather than guessing.
> Explicit defaults are always better for two reasons: they are visible to
> users and they don't silently change.
>

I think it can go either way. There is also convention over configuration
as a competing principle. However, I also prefer explicit. Just, it needs
to be correct, otherwise explicit can be very bad, as it seems in my case.
:-)

Thanks,


-- 
Mark Mielke 


Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-10 Thread Jiri Denemark
On Fri, Jul 10, 2020 at 05:53:47 -0400, Mark Mielke wrote:
> Hi all:
> 
> Any thoughts on this, or not time enough to look?
> 
> I'm trying to decide what the right solution is on my end to deal with this
> breaking change, before more widely deploying it. If I know what the fix
> will be, I would like to align my strategy against it. Or, if we discuss
> and the libvirt team disagrees with my analysis, then I will have to decide
> whether to enable some sort of conversion mechanism to safely migrate
> machines from libvirt-6.4.0 and before, to libvirt-6.5.0 without losing the
> migration capability.

Ah, sorry about the delay, I was on vacation. Anyway, I just replied to
your original email...

Jirka



Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-10 Thread Jiri Denemark
On Sun, Jul 05, 2020 at 12:45:55 -0400, Mark Mielke wrote:
> Hi all:
> 
> With 6.4.0, live migration was working fine with Qemu 5.0. After trying out
> 6.5.0, migration broke with the following error:
> 
> libvirt.libvirtError: internal error: unable to execute QEMU command
> 'migrate': State blocked by non-migratable CPU device (invtsc flag)

Could you please describe the reproducer steps? For example, was the
domain you're trying to migrate already running when you upgrade libvirt
or is it freshly started by the new libvirt?

And it would be helpful to see the  element as shown by virsh
dumpxml before you try to start the domain as well as the QEMU command
line libvirt used to start the domain (in
/var/log/libvirt/qemu/$VM.log).

> I believe I traced the error back to this commit:
> 
> commit 201bd5db639c063862b0c1b1abfab9a9a7c92591
> Author: Jiri Denemark 
> Date:   Tue Jun 2 15:34:07 2020 +0200
> 
> qemu: Fill default value in //cpu/@migratable attribute
> 
> Before QEMU introduced migratable CPU property, "-cpu host" included all
> features that could be enabled on the host, even those which would block
> migration. In other words, the default was equivalent to migratable=off.
> When the migratable property was introduced, the default changed to
> migratable=on. Let's record the default in domain XML.
> 
> Signed-off-by: Jiri Denemark 
> Reviewed-by: Michal Privoznik 
> 
> 
> Before this change, qemu was still being launched with "-cpu host", which
> for any somewhat modern version of qemu, defaults to migratable=on. The
> above comment acknowledges this, however, the implementation chooses the
> pessimistic and ancient (and no longer applicable!) value of migratable=off:
> 
> +if (qemuCaps &&
> +def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
> +!def->cpu->migratable) {
> +if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_CPU_MIGRATABLE))
> +def->cpu->migratable = VIR_TRISTATE_SWITCH_ON;
> 
> *+else if (ARCH_IS_X86(def->os.arch))+
>  def->cpu->migratable = VIR_TRISTATE_SWITCH_OFF;*
> +}

The implementation seems to be doing exactly what the commit message
says. The migratable=off default should be used only when QEMU does not
support -cpu host,migratable=on|off, that is only when QEMU is very old.
Every non-ancient version of libvirt should have the
QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
migrateble=on default.

> I think it is not a requirement for "migratable=XXX" to be explicit in
> libvirt. However, if there is some reason I am unaware of, and it is
> important for libvirt to know, then I think it is important for libvirt to
> find out the authoritative state rather than guessing.

Explicit defaults are always better for two reasons: they are visible to
users and they don't silently change.

Jirka



Re: libvirt-6.5.0 breaks host passthrough migration

2020-07-10 Thread Mark Mielke
Hi all:

Any thoughts on this, or not time enough to look?

I'm trying to decide what the right solution is on my end to deal with this
breaking change, before more widely deploying it. If I know what the fix
will be, I would like to align my strategy against it. Or, if we discuss
and the libvirt team disagrees with my analysis, then I will have to decide
whether to enable some sort of conversion mechanism to safely migrate
machines from libvirt-6.4.0 and before, to libvirt-6.5.0 without losing the
migration capability.


On Sun, Jul 5, 2020 at 12:45 PM Mark Mielke  wrote:

> Hi all:
>
> With 6.4.0, live migration was working fine with Qemu 5.0. After trying
> out 6.5.0, migration broke with the following error:
>
> libvirt.libvirtError: internal error: unable to execute QEMU command
> 'migrate': State blocked by non-migratable CPU device (invtsc flag)
>
>
> I believe I traced the error back to this commit:
>
> commit 201bd5db639c063862b0c1b1abfab9a9a7c92591
> Author: Jiri Denemark 
> Date:   Tue Jun 2 15:34:07 2020 +0200
>
> qemu: Fill default value in //cpu/@migratable attribute
>
> Before QEMU introduced migratable CPU property, "-cpu host" included
> all
> features that could be enabled on the host, even those which would
> block
> migration. In other words, the default was equivalent to
> migratable=off.
> When the migratable property was introduced, the default changed to
> migratable=on. Let's record the default in domain XML.
>
> Signed-off-by: Jiri Denemark 
> Reviewed-by: Michal Privoznik 
>
>
> Before this change, qemu was still being launched with "-cpu host", which
> for any somewhat modern version of qemu, defaults to migratable=on. The
> above comment acknowledges this, however, the implementation chooses the
> pessimistic and ancient (and no longer applicable!) value of migratable=off:
>
> +if (qemuCaps &&
> +def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
> +!def->cpu->migratable) {
> +if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_CPU_MIGRATABLE))
> +def->cpu->migratable = VIR_TRISTATE_SWITCH_ON;
>
> *+else if (ARCH_IS_X86(def->os.arch))+
>  def->cpu->migratable = VIR_TRISTATE_SWITCH_OFF;*
> +}
>
> Consequently, after live migration qemu is started with "-cpu
> host,migratable=off" and this enables invtsc, which then prevents migration
> as when invtsc is enabled it activates a migration blocker.
>
> I think this patch should be reverted. If the migratable state cannot be
> determined, then it should not be guessed. It should be left as "absent",
> just as it has been working fine prior to libvirt-6.5.0. Qemu still knows
> what flags are enabled, so Qemu remains the authority on what can be safely
> migrated, and what cannot be. I reverted this patch and re-built, and this
> seems to clear the migration problems. If the user chooses to explicitly
> specify migratable=yes or migratable=no, then this value should be
> preserved and passed through to the Qemu "-cpu host,XXX" option.
>
> I think it is not a requirement for "migratable=XXX" to be explicit in
> libvirt. However, if there is some reason I am unaware of, and it is
> important for libvirt to know, then I think it is important for libvirt to
> find out the authoritative state rather than guessing.
>
> Please start by reverting this patch, so that other people do not get
> broken in the same way, and I don't need to carry my revert patch.
> Personally, I think this is important enough to build a 6.5.1. However,
> since I have a local revert patch in place, I am not waiting for this.
>
> Thanks!
>
> --
> Mark Mielke 
>
>

-- 
Mark Mielke 


libvirt-6.5.0 breaks host passthrough migration

2020-07-05 Thread Mark Mielke
Hi all:

With 6.4.0, live migration was working fine with Qemu 5.0. After trying out
6.5.0, migration broke with the following error:

libvirt.libvirtError: internal error: unable to execute QEMU command
'migrate': State blocked by non-migratable CPU device (invtsc flag)


I believe I traced the error back to this commit:

commit 201bd5db639c063862b0c1b1abfab9a9a7c92591
Author: Jiri Denemark 
Date:   Tue Jun 2 15:34:07 2020 +0200

qemu: Fill default value in //cpu/@migratable attribute

Before QEMU introduced migratable CPU property, "-cpu host" included all
features that could be enabled on the host, even those which would block
migration. In other words, the default was equivalent to migratable=off.
When the migratable property was introduced, the default changed to
migratable=on. Let's record the default in domain XML.

Signed-off-by: Jiri Denemark 
Reviewed-by: Michal Privoznik 


Before this change, qemu was still being launched with "-cpu host", which
for any somewhat modern version of qemu, defaults to migratable=on. The
above comment acknowledges this, however, the implementation chooses the
pessimistic and ancient (and no longer applicable!) value of migratable=off:

+if (qemuCaps &&
+def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
+!def->cpu->migratable) {
+if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_CPU_MIGRATABLE))
+def->cpu->migratable = VIR_TRISTATE_SWITCH_ON;

*+else if (ARCH_IS_X86(def->os.arch))+
 def->cpu->migratable = VIR_TRISTATE_SWITCH_OFF;*
+}

Consequently, after live migration qemu is started with "-cpu
host,migratable=off" and this enables invtsc, which then prevents migration
as when invtsc is enabled it activates a migration blocker.

I think this patch should be reverted. If the migratable state cannot be
determined, then it should not be guessed. It should be left as "absent",
just as it has been working fine prior to libvirt-6.5.0. Qemu still knows
what flags are enabled, so Qemu remains the authority on what can be safely
migrated, and what cannot be. I reverted this patch and re-built, and this
seems to clear the migration problems. If the user chooses to explicitly
specify migratable=yes or migratable=no, then this value should be
preserved and passed through to the Qemu "-cpu host,XXX" option.

I think it is not a requirement for "migratable=XXX" to be explicit in
libvirt. However, if there is some reason I am unaware of, and it is
important for libvirt to know, then I think it is important for libvirt to
find out the authoritative state rather than guessing.

Please start by reverting this patch, so that other people do not get
broken in the same way, and I don't need to carry my revert patch.
Personally, I think this is important enough to build a 6.5.1. However,
since I have a local revert patch in place, I am not waiting for this.

Thanks!

-- 
Mark Mielke