Markus Armbruster <arm...@redhat.com> writes:

> Eric Blake <ebl...@redhat.com> writes:
>
>> On 08/10/2018 10:48 AM, Eric Blake wrote:
>>> On 08/08/2018 07:03 AM, Markus Armbruster wrote:
>>>> This is consistent with qobject_to_json().  See commit e2ec3f97680.
>>>
>>> Side note: that commit mentions that on output, ASCII DEL (0x7f) is
>>> always escaped. RFC 7159 does not require it to be escaped on input,
>
> Weird, isn't it?
>
>>> but I wonder if any of your earlier testsuite improvements should
>>> specifically cover \x7f vs. \u007f on input being canonicalized to
>>> \u007f on round trip output.
>
> From utf8_string():
>
>         /* 2.2.1  1 byte U+007F */
>         {
>             "\x7F",
>             "\x7F",
>             "\\u007F",
>         },
>
> We test parsing of JSON "\x7F" (expecting C string "\x7F"), unparsing of
> that C string (expecting JSON "\\u007F"), and after PATCH 29 parsing of
> that JSON (expecting the C string again).  Sufficient?
>
>>>>
>>>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>>>> ---
>>>>   qobject/json-lexer.c  | 2 +-
>>>>   qobject/json-parser.c | 2 +-
>>>>   tests/check-qjson.c   | 8 +-------
>>>>   3 files changed, 3 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
>>>> index ca1e0e2c03..36fb665b12 100644
>>>> --- a/qobject/json-lexer.c
>>>> +++ b/qobject/json-lexer.c
>>>> @@ -93,7 +93,7 @@
>>>>    *   interpolation = %((l|ll|I64)[du]|[ipsf])
>>>>    *
>>>>    * Note:
>>>> - * - Input must be encoded in UTF-8.
>>>> + * - Input must be encoded in modified UTF-8.
>>>
>>> Worth documenting this in the QMP doc as an explicit extension?
>
> qmp-spec.txt:
>
>     The sever expects its input to be encoded in UTF-8, and sends its
>     output encoded in ASCII.
>
> The obvious update would be to stick in "modified".

Not really necessary, because:

* Before this patch, the JSON parser rejects \0 as ASCII control
  character, and \xC0\x80 as overlong UTF-8.

  Note that PATCH 17 fixed rejection of \0 in JSON strings.  PATCH 21
  fixed rejection of invalid UTF-8, but \xC0\x80 wasn't broken.

* This patch makes \xC0\x80 pass the "invalid UTF-8" check, only to get
  rejected as ASCII control character.  The error message changes,
  that's all.

The patch's benefit is consistency with the other direction:
qobject_to_json() maps \xC0\x80 to \\u0000.

I guess my commit message should explain this a bit better.

[...]
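
To illustrate the two checks discussed above, here is a minimal standalone sketch.  It is not the code in qobject/json-parser.c.  The names check_string_char() and enum check_result are made up for this example, and it only models one- and two-byte sequences, reporting longer ones as unsupported.  The modified_utf8 flag stands in for "before this patch" (0) versus "after this patch" (1).

```c
#include <stddef.h>

/* Hypothetical result codes, for illustration only. */
enum check_result {
    CHECK_OK,
    CHECK_INVALID_UTF8,   /* malformed or overlong sequence */
    CHECK_CONTROL_CHAR,   /* code point below U+0020 */
    CHECK_UNSUPPORTED     /* 3- and 4-byte sequences: not modeled here */
};

/*
 * Classify the first character of the byte string s[0..len-1] the way
 * a JSON string check might.  If modified_utf8 is nonzero, the overlong
 * sequence \xC0\x80 is accepted as an encoding of U+0000.
 */
static enum check_result check_string_char(const unsigned char *s,
                                           size_t len, int modified_utf8)
{
    unsigned cp;

    if (len == 0) {
        return CHECK_INVALID_UTF8;
    }

    if (s[0] < 0x80) {
        /* One-byte sequence (ASCII) */
        cp = s[0];
    } else if ((s[0] & 0xE0) == 0xC0) {
        /* Two-byte sequence: needs one continuation byte 10xxxxxx */
        if (len < 2 || (s[1] & 0xC0) != 0x80) {
            return CHECK_INVALID_UTF8;
        }
        cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        /* Overlong unless it is \xC0\x80 and modified UTF-8 is allowed */
        if (cp < 0x80 && !(modified_utf8 && cp == 0)) {
            return CHECK_INVALID_UTF8;
        }
    } else if (s[0] >= 0xE0 && s[0] <= 0xF4) {
        return CHECK_UNSUPPORTED;
    } else {
        /* Stray continuation byte, or a byte that never occurs in UTF-8 */
        return CHECK_INVALID_UTF8;
    }

    /* Second check: unescaped control characters are rejected */
    if (cp < 0x20) {
        return CHECK_CONTROL_CHAR;
    }
    return CHECK_OK;
}
```

With modified_utf8 == 0, the input "\xC0\x80" yields CHECK_INVALID_UTF8.  With modified_utf8 == 1 it yields CHECK_CONTROL_CHAR.  Either way the input is rejected, only the reason differs.  "\x7F" is accepted in both modes, since DEL is not below U+0020.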