On 08/10/2018 10:48 AM, Eric Blake wrote:
* Note:
- * - Input must be encoded in UTF-8.
+ * - Input must be encoded in modified UTF-8.
Worth documenting this in the QMP doc as an explicit extension? In
general, our QMP interfaces that take binary input do so via base64
encoding, rather than via a modified UTF-8 string. I don't know how
yajl or jansson would feel about an extension for producing modified
UTF-8 for QMP to consume if we really did want to pass NUL bytes without
the overhead of UTF-8. What's more, even if you can pass NUL, you still
have to worry about all other byte sequences being valid, so base64 is
still better for true binary data; it's hard to argue that we'd ever
have an interface where we want UTF-8 including embedded NUL rather than
true binary. I guess it can also be argued that outputting modified
UTF-8 is a violation of JSON, so the fact that we can round-trip NUL
doesn't help if the client can't read it.
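To illustrate why a client might not be able to read it: a strict UTF-8 decoder treats \xc0\x80 as an overlong encoding and rejects it, while a modified UTF-8 decoder maps it to U+0000. The following is only a minimal sketch; decode_one() and its "modified" flag are hypothetical names, not QEMU's actual decoder.

```c
#include <stddef.h>

/*
 * Decode one code point from s[0..len).  Returns the code point, or -1
 * on invalid input, and stores the sequence length in *used.
 * Hypothetical helper for illustration only.
 */
static long decode_one(const unsigned char *s, size_t len, int modified,
                       size_t *used)
{
    /* Smallest code point that legitimately needs n bytes, indexed by n */
    static const long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    long cp;

    if (len == 0) {
        return -1;
    }
    if (s[0] < 0x80) {
        n = 1; cp = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2; cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3; cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4; cp = s[0] & 0x07;
    } else {
        return -1;              /* stray continuation or invalid lead byte */
    }
    if (n > len) {
        return -1;              /* truncated sequence */
    }
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return -1;          /* not a continuation byte */
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *used = n;
    if (modified && n == 2 && cp == 0) {
        return 0;               /* \xC0\x80: the one overlong form allowed */
    }
    if (cp < min_cp[n]) {
        return -1;              /* overlong encoding */
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;              /* out of range or surrogate */
    }
    return cp;
}
```

With modified set to 0, the two bytes \xc0\x80 come back as -1; with modified set to 1, they decode to code point 0. Every other overlong form is rejected in both modes.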
So having typed all that, I guess the answer is no, we don't want to
document it; for now, the fact that we accept \xc0\x80 on input and
produce it on output is only for the testsuite, and unlikely to matter
to any real client of QMP.
Actually, I guess we never output \xc0\x80; instead, we would output the C
string "\\u0000" (since any byte above 0x1f is passed through our UTF
decoder back into a codepoint, then output with \u). So it's really only
a question of whether our input engine can pass "\x00" vs. "\\u0000"
when we NEED an input NUL, and except for the testsuite, our QAPI schema
never really needs an input NUL.
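A rough sketch of the two spellings of NUL discussed above: the in-memory modified UTF-8 form (\xc0\x80, which keeps a C string NUL-free) and the JSON escape "\\u0000" that an emitter would write. Both function names are hypothetical, the encoder is limited to the BMP for brevity, and the "printable ASCII passes through" rule is an assumption rather than a description of QEMU's emitter.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Encode a BMP code point as modified UTF-8: identical to UTF-8 except
 * that U+0000 becomes the two bytes C0 80.  Returns the number of bytes
 * written, or 0 if cp is outside the range this sketch handles.
 */
static size_t mod_utf8_encode(unsigned cp, unsigned char out[3])
{
    if (cp > 0xFFFF) {
        return 0;                       /* not handled in this sketch */
    }
    if (cp == 0) {
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/*
 * Write one BMP code point as it might appear inside a JSON string:
 * printable ASCII (other than '"' and '\\') is copied, anything else
 * becomes a \uXXXX escape.  Returns snprintf()'s result, or -1 if cp is
 * outside the BMP.
 */
static int json_emit(unsigned cp, char *out, size_t size)
{
    if (cp > 0xFFFF) {
        return -1;                      /* would need a surrogate pair */
    }
    if (cp >= 0x20 && cp < 0x7F && cp != '"' && cp != '\\') {
        return snprintf(out, size, "%c", (char)cp);
    }
    return snprintf(out, size, "\\u%04X", cp);
}
```

Here mod_utf8_encode(0, ...) yields the two bytes C0 80, while json_emit(0, ...) yields the six characters \u0000, so the modified form never has to appear on the wire.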
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org