Adding upstream because I think there's little reason to keep this
discussion off-list.

The context of this mailing thread is that we are discussing how we might
want to store and record historical QMP API interface information such that
we can programmatically answer questions like "when was X command
introduced?", but possibly also to generate changelog QMP reports for
auditing QMP changes for each version during the release candidate window.

On Thu, Jun 6, 2024 at 6:25 AM Victor Toso <victort...@redhat.com> wrote:

> Hi John!
>
> On Wed, Jun 05, 2024 at 11:47:53AM GMT, John Snow wrote:
> > Hi,
> >
> > as part of my project to modernize QMP documentation I recently
> > forked the QAPI generator and modded it to be able to read
> > historical QAPI schema going all the way back to v1.0.
> >
> > We want to be able to use this ability to generate accurate
> > "since:" data for the docs.
>
> :)
>
> > My question for you is: do you have any input or preferences
> > for the format of "output" or "compiled" schemata?
>
> Not sure if I understand 'output' & 'compiled' reference here.
>

The idea is that even though I can modify the schema parser to cope with
historical versions of the schema, we should re-save the schema into a
unified format so that the gross hacks and kludges made to support old
versions in my forked QAPI parser don't need to be kept in-tree. In theory,
we only need to parse each historical schema precisely once.

(And then we'll only need to parse each new schema going forward with the
version of the QAPI parser that's already in-tree.)

Importantly, old versions of the schema aren't contained *entirely* within
the schema. Here's a timeline:

v0.12.0: QMP first introduced. Events are hardcoded, commands are defined
in qemu-monitor.hx. query commands are hard-coded in monitor.c.
v0.14.0: qemu-monitor.hx is forked into qmp-commands.hx and hmp-commands.hx
v1.0: First version which features qapi-schema.json; all query commands are
qapified but most other commands are not.
v1.1.0: A very large chunk of commands are QAPIfied.
v1.3.0: Most commands are now QAPIfied, but there are 2-3 remaining.
v2.1.0: events are now fully qapified; most are now defined in
qapi/events.json
v2.8.0: The remaining commands are fully qapified; qmp-commands.hx is
removed.

So what I mean by "compiled" schema is parsing historical revisions of the
schema, including descriptive schema definitions for items not-yet-qapified
(but nonetheless remain available in that version), and writing that
curated information back out to disk (and checking into qemu.git) for later
reference.

There are multiple choices for this output format:

(A) Just output in native qapi-schema format, just choose the "latest
version".
(B) Choose some other arbitrary format.
(C) Some secret, third thing.

I don't like the native qapi schema idea, for a few reasons:
- It changes over time
- It does not support nested structures except by reference, but
- Type names are not meant to API visible
- Detecting data changes in nested, shared types is difficult

In my prototype thus far, I have used a JSON-Schema based format with a
type definition library that catalogues all of the command arg / command
return / event data types with *all fields* fully dereferenced and inlined,
except where impossible due to cyclical/recursive references (PCI and
BlockStats come to mind.)

A benefit of this "compilation" is that all commands have their arguments
and return values described solely by type (and for enums, values) and not
by type name - fully removing non-API information from the "compiled"
version.

A downside is that this is yet-another-format that differs from the
existing format that requires someone knowledgeable to manipulate in case
there are errors found in it, or worse - we decide to change or upgrade the
format in some way to support a new feature in the future and we must
yet-again revise our catalogue of historical schemata.


>
> My preference is for this metadata to be there when the generator
> parses the schema. It could another external metadata file, but
> available to the generator.
>

Yes, ideally...

I'm thinking that I'd like to include a qapi/history/ directory which
contains a record of all compiled historical schemata, and the generator
can parse the directory and load up them all up - and then compile a quick
lookup table that is able to answer basic questions about the data.

i.e. "when was X [command/event] first introduced?"

"when was Command.arguments.key first introduced / incompatibly modified?"

The syntax/API for how to ask the QAPISchema object these questions remains
unpondered, as does the question "how do we report 'since' information in
the rendered QMP documentation HTML for nested objects that we begrudgingly
refer to only by type?"

Even if QAPI type names are not API, and even if my new QMP documentation
project eliminates as many type names as it can by inlining inherited
structures, boxed arguments, and branches - there are still referenced
types for argument types. That might be a messy/unsolvable problem right
now.


>
> We can then identify all the cases where something might have
> changed, when it changed, and what proper outputs would we have
> due to that. (e.g: field added/removed)
>
> For a whole discussion of this between Markus, Daniel and Andrea
> with Go output (possibilities) in mind, see the thread from
>
>     Date: Tue, 10 May 2022 13:51:05 +0100
>     From: "Daniel P. Berrangé" <berra...@redhat.com>
>     Subject: Re: [RFC PATCH v1 0/8] qapi: add generator for Golang
> interface
>     Message-ID: <ynpfuyvbu56cc...@redhat.com>
>
> > I figured you might have some opinions because of your dbus and
> > other API work.  Possibly this historical record is useful to
> > you in some way.
>
> I'm thinking back on when I worked in the libvirt-go-module code
> generator.
>
> For the Go bindings itself, I'm inclined to a have its versions
> follow closely QEMU versions which brings each qapi-go tag to
> support to what QAPI itself supports for that QEMU version we
> generated that code. I think this is more or less close to what
> Go applications would expect. So, the Since version is not that
> impactful as it was for libvirt-go-module. Ah, based on what
> Marc-André wrote, I think this is the same for qapi-rs.
>
> Cheers,
> Victor
>

As bonus context: here's output from a small utility I wrote called
schema-diff that compares the compiled v1.0 and v1.1.0 schema:

```
jsnow@scv ~/s/qemu ((v3.1.0))> ./schema-diff.py qapi-compiled-v1.0.json
qapi-compiled-v1.1.0.json
Added commands
==============
block-job-cancel
block-job-set-speed
block-stream
block_set_io_throttle
change-vnc-password
qom-get
qom-list
qom-list-types
qom-set
query-block-jobs
system_wakeup
transaction
xen-save-devices-state

Removed commands
================

Modified commands
=================
add_client (x-qemu-arguments)
    ++ arguments.tls: Optional<boolean>
blockdev-snapshot-sync (x-qemu-arguments)
    -- arguments.snapshot-file: Optional<string>
    ++ arguments.mode: Optional<enum>
    ++ arguments.snapshot-file: string
memsave (x-qemu-arguments)
    ++ arguments.cpu-index: Optional<integer>
query-block (x-qemu-returns)
    ++ returns[].inserted.bps: integer
    ++ returns[].inserted.bps_rd: integer
    ++ returns[].inserted.bps_wr: integer
    ++ returns[].inserted.iops: integer
    ++ returns[].inserted.iops_rd: integer
    ++ returns[].inserted.iops_wr: integer
query-spice (x-qemu-returns)
    ++ returns.mouse-mode: enum
query-status (x-qemu-returns)
       returns.status: enum
        ++ suspended
```

It's possibly rough around the edges (branches, ifcond and features are not
yet supported; alternates are kinda-supported but do not yet diff
well/prettily, ignore the "x-qemu-" stuff), and the diff output uses an
arbitrary/invented "key path" and "field type" notation in order to
simplify the changelog summary, but you can hopefully see the utility of
such a report.

Helpfully, it shows *all levels* of changes, no matter how deeply nested,
so *all* potential changes to the API can be observed / audited /
documented. The only types it ever reports are fundamental JSON types and
"any" and "enum", so any refactoring we do in the QAPI schema with regards
to type names, inheritance, boxing, etc are fully transparent in this
analysis.

It's a useful tool for reviewing QAPI changes during the RC period to
ensure we did not accidentally create any breaking changes since the last
released version.

One more example for the road:

```
jsnow@scv ~/s/qemu ((v3.1.0))> ./schema-diff.py qapi-compiled-v1.1.0.json
qapi-compiled-v1.2.0.json
Added commands
==============
add-fd
device-list-properties
dump-guest-memory
migrate-set-cache-size
migrate-set-capabilities
query-cpu-definitions
query-events
query-fdsets
query-machines
query-migrate-cache-size
query-migrate-capabilities
query-target
remove-fd

Removed commands
================

Modified commands
=================
query-block (x-qemu-returns)
    ++ returns[].inserted.backing_file_depth: integer
    ++ returns[].inserted.encryption_key_missing: boolean
query-migrate (x-qemu-returns)
    ++ returns.disk.duplicate: integer
    ++ returns.disk.normal: integer
    ++ returns.disk.normal-bytes: integer
    ++ returns.ram.duplicate: integer
    ++ returns.ram.normal: integer
    ++ returns.ram.normal-bytes: integer
    ++ returns.total-time: Optional<integer>
    ++ returns.xbzrle-cache: Optional<object>
    ++ returns.xbzrle-cache.bytes: integer
    ++ returns.xbzrle-cache.cache-miss: integer
    ++ returns.xbzrle-cache.cache-size: integer
    ++ returns.xbzrle-cache.overflow: integer
    ++ returns.xbzrle-cache.pages: integer
```

Here, you can see that although the "disk" and "ram" fields for
query-migrate's return type are very likely the same actual type, the
compiler has evaluated everything down to its constituent parts and
reported each difference independently for the report; preserving the
possibility to obtain wire-protocol accurate added/modified data for each
and every individual key.

I'll also point out that this diff view does not confuse "first version
qapified" with "first version introduced"; v1.2.0 actually qapified several
commands for the first time, but they remained available in earlier
versions. The diff output accurately reflects this.

--js

Reply via email to