Adding upstream because I think there's little reason to keep this discussion off-list.
The context of this mailing thread is that we are discussing how we might want to store and record historical QMP API interface information such that we can programmatically answer questions like "when was X command introduced?", but possibly also to generate changelog QMP reports for auditing QMP changes for each version during the release candidate window. On Thu, Jun 6, 2024 at 6:25 AM Victor Toso <victort...@redhat.com> wrote: > Hi John! > > On Wed, Jun 05, 2024 at 11:47:53AM GMT, John Snow wrote: > > Hi, > > > > as part of my project to modernize QMP documentation I recently > > forked the QAPI generator and modded it to be able to read > > historical QAPI schema going all the way back to v1.0. > > > > We want to be able to use this ability to generate accurate > > "since:" data for the docs. > > :) > > > My question for you is: do you have any input or preferences > > for the format of "output" or "compiled" schemata? > > Not sure if I understand 'output' & 'compiled' reference here. > The idea is that even though I can modify the schema parser to cope with historical versions of the schema, we should re-save the schema into a unified format so that the gross hacks and kludges made to support old versions in my forked QAPI parser don't need to be kept in-tree. In theory, we only need to parse each historical schema precisely once. (And then we'll only need to parse each new schema going forward with the version of the QAPI parser that's already in-tree.) Importantly, old versions of the schema aren't contained *entirely* within the schema. Here's a timeline: v0.12.0: QMP first introduced. Events are hardcoded, commands are defined in qemu-monitor.hx. query commands are hard-coded in monitor.c. v0.14.0: qemu-monitor.hx is forked into qmp-commands.hx and hmp-commands.hx v1.0: First version which features qapi-schema.json; all query commands are qapified but most other commands are not. v1.1.0: A very large chunk of commands are QAPIfied. v1.3.0: Most commands are now QAPIfied, but there are 2-3 remaining. v2.1.0: events are now fully qapified; most are now defined in qapi/events.json v2.8.0: The remaining commands are fully qapified; qmp-commands.hx is removed. So what I mean by "compiled" schema is parsing historical revisions of the schema, including descriptive schema definitions for items not-yet-qapified (but nonetheless remain available in that version), and writing that curated information back out to disk (and checking into qemu.git) for later reference. There are multiple choices for this output format: (A) Just output in native qapi-schema format, just choose the "latest version". (B) Choose some other arbitrary format. (C) Some secret, third thing. I don't like the native qapi schema idea, for a few reasons: - It changes over time - It does not support nested structures except by reference, but - Type names are not meant to API visible - Detecting data changes in nested, shared types is difficult In my prototype thus far, I have used a JSON-Schema based format with a type definition library that catalogues all of the command arg / command return / event data types with *all fields* fully dereferenced and inlined, except where impossible due to cyclical/recursive references (PCI and BlockStats come to mind.) A benefit of this "compilation" is that all commands have their arguments and return values described solely by type (and for enums, values) and not by type name - fully removing non-API information from the "compiled" version. A downside is that this is yet-another-format that differs from the existing format that requires someone knowledgeable to manipulate in case there are errors found in it, or worse - we decide to change or upgrade the format in some way to support a new feature in the future and we must yet-again revise our catalogue of historical schemata. > > My preference is for this metadata to be there when the generator > parses the schema. It could another external metadata file, but > available to the generator. > Yes, ideally... I'm thinking that I'd like to include a qapi/history/ directory which contains a record of all compiled historical schemata, and the generator can parse the directory and load up them all up - and then compile a quick lookup table that is able to answer basic questions about the data. i.e. "when was X [command/event] first introduced?" "when was Command.arguments.key first introduced / incompatibly modified?" The syntax/API for how to ask the QAPISchema object these questions remains unpondered, as does the question "how do we report 'since' information in the rendered QMP documentation HTML for nested objects that we begrudgingly refer to only by type?" Even if QAPI type names are not API, and even if my new QMP documentation project eliminates as many type names as it can by inlining inherited structures, boxed arguments, and branches - there are still referenced types for argument types. That might be a messy/unsolvable problem right now. > > We can then identify all the cases where something might have > changed, when it changed, and what proper outputs would we have > due to that. (e.g: field added/removed) > > For a whole discussion of this between Markus, Daniel and Andrea > with Go output (possibilities) in mind, see the thread from > > Date: Tue, 10 May 2022 13:51:05 +0100 > From: "Daniel P. Berrangé" <berra...@redhat.com> > Subject: Re: [RFC PATCH v1 0/8] qapi: add generator for Golang > interface > Message-ID: <ynpfuyvbu56cc...@redhat.com> > > > I figured you might have some opinions because of your dbus and > > other API work. Possibly this historical record is useful to > > you in some way. > > I'm thinking back on when I worked in the libvirt-go-module code > generator. > > For the Go bindings itself, I'm inclined to a have its versions > follow closely QEMU versions which brings each qapi-go tag to > support to what QAPI itself supports for that QEMU version we > generated that code. I think this is more or less close to what > Go applications would expect. So, the Since version is not that > impactful as it was for libvirt-go-module. Ah, based on what > Marc-André wrote, I think this is the same for qapi-rs. > > Cheers, > Victor > As bonus context: here's output from a small utility I wrote called schema-diff that compares the compiled v1.0 and v1.1.0 schema: ``` jsnow@scv ~/s/qemu ((v3.1.0))> ./schema-diff.py qapi-compiled-v1.0.json qapi-compiled-v1.1.0.json Added commands ============== block-job-cancel block-job-set-speed block-stream block_set_io_throttle change-vnc-password qom-get qom-list qom-list-types qom-set query-block-jobs system_wakeup transaction xen-save-devices-state Removed commands ================ Modified commands ================= add_client (x-qemu-arguments) ++ arguments.tls: Optional<boolean> blockdev-snapshot-sync (x-qemu-arguments) -- arguments.snapshot-file: Optional<string> ++ arguments.mode: Optional<enum> ++ arguments.snapshot-file: string memsave (x-qemu-arguments) ++ arguments.cpu-index: Optional<integer> query-block (x-qemu-returns) ++ returns[].inserted.bps: integer ++ returns[].inserted.bps_rd: integer ++ returns[].inserted.bps_wr: integer ++ returns[].inserted.iops: integer ++ returns[].inserted.iops_rd: integer ++ returns[].inserted.iops_wr: integer query-spice (x-qemu-returns) ++ returns.mouse-mode: enum query-status (x-qemu-returns) returns.status: enum ++ suspended ``` It's possibly rough around the edges (branches, ifcond and features are not yet supported; alternates are kinda-supported but do not yet diff well/prettily, ignore the "x-qemu-" stuff), and the diff output uses an arbitrary/invented "key path" and "field type" notation in order to simplify the changelog summary, but you can hopefully see the utility of such a report. Helpfully, it shows *all levels* of changes, no matter how deeply nested, so *all* potential changes to the API can be observed / audited / documented. The only types it ever reports are fundamental JSON types and "any" and "enum", so any refactoring we do in the QAPI schema with regards to type names, inheritance, boxing, etc are fully transparent in this analysis. It's a useful tool for reviewing QAPI changes during the RC period to ensure we did not accidentally create any breaking changes since the last released version. One more example for the road: ``` jsnow@scv ~/s/qemu ((v3.1.0))> ./schema-diff.py qapi-compiled-v1.1.0.json qapi-compiled-v1.2.0.json Added commands ============== add-fd device-list-properties dump-guest-memory migrate-set-cache-size migrate-set-capabilities query-cpu-definitions query-events query-fdsets query-machines query-migrate-cache-size query-migrate-capabilities query-target remove-fd Removed commands ================ Modified commands ================= query-block (x-qemu-returns) ++ returns[].inserted.backing_file_depth: integer ++ returns[].inserted.encryption_key_missing: boolean query-migrate (x-qemu-returns) ++ returns.disk.duplicate: integer ++ returns.disk.normal: integer ++ returns.disk.normal-bytes: integer ++ returns.ram.duplicate: integer ++ returns.ram.normal: integer ++ returns.ram.normal-bytes: integer ++ returns.total-time: Optional<integer> ++ returns.xbzrle-cache: Optional<object> ++ returns.xbzrle-cache.bytes: integer ++ returns.xbzrle-cache.cache-miss: integer ++ returns.xbzrle-cache.cache-size: integer ++ returns.xbzrle-cache.overflow: integer ++ returns.xbzrle-cache.pages: integer ``` Here, you can see that although the "disk" and "ram" fields for query-migrate's return type are very likely the same actual type, the compiler has evaluated everything down to its constituent parts and reported each difference independently for the report; preserving the possibility to obtain wire-protocol accurate added/modified data for each and every individual key. I'll also point out that this diff view does not confuse "first version qapified" with "first version introduced"; v1.2.0 actually qapified several commands for the first time, but they remained available in earlier versions. The diff output accurately reflects this. --js