Re: [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters

2018-08-10 Thread Markus Armbruster
Eric Blake  writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> Fix the lexer to reject unescaped control characters in JSON strings,
>> in accordance with RFC 7159.
>
> Question - can this break existing QMP clients that were relying on
> this extension working?

In theory, yes.

The "extension" is undocumented.  That makes it a bug.

I'm not aware of clients relying on it.

> Libvirt used to use libyajl, now it uses libjansson. So I'll check
> both of those libraries:
>
> yajl: https://github.com/lloyd/yajl/blob/master/src/yajl_encode.c#L32
>
> default:
> if ((unsigned char) str[end] < 32) {
> CharToHex(str[end], hexBuf + 4);
> escaped = hexBuf;
>
> jansson: https://github.com/akheron/jansson/blob/master/src/dump.c#L101
>
> /* mandatory escape or control char */
> if(codepoint == '\\' || codepoint == '"' || codepoint < 0x20)
>
> Okay, both libraries appear to always send control characters encoded,
> and thus were not relying on this accidental QMP extension.
>
> Are we worried about other clients?

Breakage seems unlikely to me.

>> Bonus: we now recover more nicely from unclosed strings.  E.g.
>>
>>  {"one: 1}\n{"two": 2}
>>
>> now recovers cleanly after the newline, where before the lexer
>> remained confused until the next unpaired double quote or lexical
>> error.
>
> On that grounds alone, I could live with this patch, even if we end up
> having to revert it later if some client was actually depending on
> sending raw control characters as part of a string.

Having to revert the patch to stay bug-compatible wouldn't be exactly
terrible.

> Reviewed-by: Eric Blake 

Thanks!



Re: [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters

2018-08-09 Thread Eric Blake

On 08/08/2018 07:02 AM, Markus Armbruster wrote:

Fix the lexer to reject unescaped control characters in JSON strings,
in accordance with RFC 7159.


Question - can this break existing QMP clients that were relying on this 
extension working?


Libvirt used to use libyajl, now it uses libjansson. So I'll check both 
of those libraries:


yajl: https://github.com/lloyd/yajl/blob/master/src/yajl_encode.c#L32

default:
if ((unsigned char) str[end] < 32) {
CharToHex(str[end], hexBuf + 4);
escaped = hexBuf;

jansson: https://github.com/akheron/jansson/blob/master/src/dump.c#L101

/* mandatory escape or control char */
if(codepoint == '\\' || codepoint == '"' || codepoint < 0x20)

Okay, both libraries appear to always send control characters encoded, 
and thus were not relying on this accidental QMP extension.


Are we worried about other clients?



Bonus: we now recover more nicely from unclosed strings.  E.g.

 {"one: 1}\n{"two": 2}

now recovers cleanly after the newline, where before the lexer
remained confused until the next unpaired double quote or lexical
error.


On that grounds alone, I could live with this patch, even if we end up 
having to revert it later if some client was actually depending on 
sending raw control characters as part of a string.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters

2018-08-08 Thread Markus Armbruster
Fix the lexer to reject unescaped control characters in JSON strings,
in accordance with RFC 7159.

Bonus: we now recover more nicely from unclosed strings.  E.g.

{"one: 1}\n{"two": 2}

now recovers cleanly after the newline, where before the lexer
remained confused until the next unpaired double quote or lexical
error.

Signed-off-by: Markus Armbruster 
---
 qobject/json-lexer.c | 4 ++--
 tests/check-qjson.c  | 6 +-
 tests/qmp-test.c | 4 ++--
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 7c0875d225..e85e9a78ff 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -115,7 +115,7 @@ static const uint8_t json_lexer[][256] =  {
 ['u'] = IN_DQ_UCODE0,
 },
 [IN_DQ_STRING] = {
-[1 ... 0xBF] = IN_DQ_STRING,
+[0x20 ... 0xBF] = IN_DQ_STRING,
 [0xC2 ... 0xF4] = IN_DQ_STRING,
 ['\\'] = IN_DQ_STRING_ESCAPE,
 ['"'] = JSON_STRING,
@@ -155,7 +155,7 @@ static const uint8_t json_lexer[][256] =  {
 ['u'] = IN_SQ_UCODE0,
 },
 [IN_SQ_STRING] = {
-[1 ... 0xBF] = IN_SQ_STRING,
+[0x20 ... 0xBF] = IN_SQ_STRING,
 [0xC2 ... 0xF4] = IN_SQ_STRING,
 ['\\'] = IN_SQ_STRING_ESCAPE,
 ['\''] = JSON_STRING,
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index fda2b014a3..7d8ce5c68d 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -171,11 +171,7 @@ static void utf8_string(void)
 "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
 "\x10\x11\x12\x13\x14\x15\x16\x17"
 "\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F",
-/* bug: not corrected (valid UTF-8, but invalid JSON) */
-"\x01\x02\x03\x04\x05\x06\x07"
-"\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
-"\x10\x11\x12\x13\x14\x15\x16\x17"
-"\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F",
+NULL,
 "\\u0001\\u0002\\u0003\\u0004\\u0005\\u0006\\u0007"
 "\\b\\t\\n\\u000B\\f\\r\\u000E\\u000F"
 "\\u0010\\u0011\\u0012\\u0013\\u0014\\u0015\\u0016\\u0017"
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 5117a1ab25..b77987b644 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -86,9 +86,9 @@ static void test_malformed(QTestState *qts)
 g_assert(recovered(qts));
 
 /* lexical error: control character in string */
-qtest_qmp_send_raw(qts, "{'execute': 'nonexistent', 'id':'\n'}");
+qtest_qmp_send_raw(qts, "{'execute': 'nonexistent', 'id':'\n");
 resp = qtest_qmp_receive(qts);
-g_assert_cmpstr(get_error_class(resp), ==, "CommandNotFound"); /* BUG */
+g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
 qobject_unref(resp);
 g_assert(recovered(qts));
 
-- 
2.17.1