Igor Mammedov <imamm...@redhat.com> writes:

> On Sat, 17 Aug 2019 08:17:48 +0200
> Markus Armbruster <arm...@redhat.com> wrote:
>
>> Eduardo Habkost <ehabk...@redhat.com> writes:
>> 
>> > On Fri, Aug 16, 2019 at 03:20:11PM +0200, Igor Mammedov wrote:  
>> >> On Thu, 15 Aug 2019 15:38:03 -0300
>> >> Eduardo Habkost <ehabk...@redhat.com> wrote:
>> >>   
>> >> > We have this issue reported when using libvirt to hotplug CPUs:
>> >> > https://bugzilla.redhat.com/show_bug.cgi?id=1741451
>> >> > 
>> >> > Basically, libvirt is not copying die-id from
>> >> > query-hotpluggable-cpus, but die-id is now mandatory.  
>> >> 
>> >> this should have been gated on compat property and affect
>> >> only new machine types.
>> >> Maybe we should do just that instead of fixup so libvirt
>> >> would finally make proper handling of query-hotpluggable-cpus.
>> >> 
>> >>    
>> >> > We could blame libvirt and say it is not following the documented
>> >> > interface, because we have this buried in the QAPI schema
>> >> > documentation:  
>> >> 
>> >> I wouldn't say buried, if I understand it right QAPI schema
>> >> should be the authoritative source of interface description.
>> >> 
>> >> If I recall it's not the first time, there was similar issue
>> >> for exactly the same reason (libvirt not passing through
>> >> all properties from query-hotpluggable-cpus).
>> >> 
>> >> And we had to fix it up on QEMU side (numa_cpu_pre_plug),
>> >> but it seems 2 years later libvirt is still broken the same way :(
>> >> 
>> >> Should we really do fixups or finally fix it on libvirt side?
>> >
>> > Is it truly a bug in libvirt?  Making QEMU behave differently
>> > when getting exactly the same input sounds like a bad idea, even
>> > if we documented that at the QAPI documentation.
>> >
>> > My suggestion is to instead drop the comment below from the QAPI
>> > documentation.  New properties shouldn't become mandatory.  
>> 
>> The "comment below" is this one, in qapi/machine.json:
>> 
>> >> > > Note: currently there are 5 properties that could be present
>> >> > > but management should be prepared to pass through other
>> >> > > properties with device_add command to allow for future
>> >> > > interface extension. This also requires the filed names to be kept in
>> >> > > sync with the properties passed to -device/device_add.    
>> 
>> Goes back to commit d4633541ee0, v2.7.0.  @die-id was the first such
>> interface extension.
>> 
>> A rule like "to use command C, you must pass it whatever you get from
>> command Q" punches a hole into the "QMP is a stable interface" promise.
>> Retroactively tacking it onto an existing interface like device-add
>> some-existing-device is even more problematic than specifying it for a
>> new interface.  Mind, this is not a categorical "can't ever do that".
>> It's more like "you better show this is less bad than all the
>> alternatives we can think of, and we've thought pretty hard".
>> Since this particular hole failed us the first time anybody actually
>> tried to wiggle through it, I think Eduardo has a point when he calls
>> for filling it in by deleting the comment.
>
> That was a consensus we were able to reach when discussing cpu hotplug
> QMP interface. If I recall correctly idea was that it should work for
> different targets (cpu topology properties target specific) and be
> extensible without breaking old mgmt stack  or requiring its update
> in lock step.
>
> If implemented correctly mgmt would not only query from QEMU/machine
> possible CPUs (with properties and valid values needed to plug it in,
> which it does already) but also 'keep' them around and pass back to
> device_add. In that case it would have worked as designed just fine.
>
> But this also shows a problem that we still need versioned machine type
> to keep old set of properties for old machine types anyway and we can
> miss it during review as tests we have might be not enough
> (tests/cpu-plug-test didn't detect it for some reason).

I think the lesson to learn here is "non-trivial rules on correct
interface use need to be backed by integration tests".

The rule in question is "a CPU hot-plug with device_add must specify all
the properties returned by query-hotpluggable-cpus".
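
For illustration, here is a minimal sketch of what obeying that rule could
look like on the management side.  It assumes the reply entries carry the
"type", "props" and optional "qom-path" members described in the QAPI
schema; the helper names build_device_add and free_slots are made up for
this example and are not libvirt or QEMU code.

```python
def free_slots(reply):
    """Return the entries of a query-hotpluggable-cpus reply that are
    still empty, i.e. that have no "qom-path" member yet."""
    return [slot for slot in reply if "qom-path" not in slot]


def build_device_add(slot, dev_id):
    """Build a device_add command for one hotpluggable CPU slot.

    Every member of slot["props"] is copied through verbatim, so a
    property the caller has never heard of (such as die-id) still
    reaches QEMU.  Hypothetical helper, for illustration only.
    """
    arguments = {"driver": slot["type"], "id": dev_id}
    # Pass through *all* advertised properties, known or not.
    arguments.update(slot["props"])
    return {"execute": "device_add", "arguments": arguments}
```

The point of the sketch is that the caller never enumerates property
names itself; it only forwards what the query returned.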

Sadly, stipulating such rules does not change the de facto API.  Case in
point: libvirt did not obey this one even though it's been in place for
years, yet we're (rightly!) unwilling to blame libvirt for the
regression.  The stipulation was futile.

How could we increase our chances that management applications pick up
such rules?  I can see only one promising way: make tests fail unless
they do.  Add some arbitrary dummy property, fail the hot plug unless
it's given.  Of course, we can't do that, because it's exactly the
breakage we're trying to avoid.  So do it only when QEMU is run with
--future, then have integration tests run it that way.
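
A rough model of that idea, purely as a sketch: neither --future nor the
dummy property exists today, and the names DUMMY_PROP, advertised_props
and check_hotplug are invented here to show the intended behaviour.

```python
# Hypothetical property name; nothing like it exists in QEMU.
DUMMY_PROP = "x-future-dummy"


def advertised_props(topology_props, future=False):
    """Model what query-hotpluggable-cpus would advertise for one slot.

    With future=True (standing in for the proposed --future switch),
    an extra dummy property is added to the advertised set.
    """
    props = dict(topology_props)
    if future:
        props[DUMMY_PROP] = 0
    return props


def check_hotplug(advertised, given):
    """Fail the hot plug unless every advertised property was given."""
    missing = sorted(set(advertised) - set(given))
    if missing:
        raise ValueError("missing CPU properties: " + ", ".join(missing))
```

A client that hard-codes the property names it knows about passes in
normal mode and fails under the future mode, which is exactly what the
integration test is supposed to catch.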

Aside: I'm afraid "# TODO: Better documentation; currently there is
none" didn't exactly help with query-hotpluggable-cpus uptake.

[...]
