On 05/20/2010 10:16 AM, Paolo Bonzini wrote:
On 05/20/2010 03:44 PM, Luiz Capitulino wrote:
I think there's another issue in the handling of strings.
The spec says that valid unescaped chars are in the following range:
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in
strings. Any parser that didn't accept that would be broken.
But we do:
    [IN_DQ_STRING] = {
        [1 ... 0xFF] = IN_DQ_STRING,
        ['\\'] = IN_DQ_STRING_ESCAPE,
        ['"'] = IN_DONE_STRING,
    },
Shouldn't we cover 0x20 .. 0xFF instead?
If it's the lexer, isn't it just being liberal in what it accepts?
I believe the parser correctly rejects invalid UTF-8 sequences.
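
To make the two options concrete, here is a minimal sketch of the liberal table quoted above next to a strict one that follows the spec's "unescaped" range. It is not the actual lexer code: LEX_ERROR, liberal_dq, strict_dq and accepts() are hypothetical names, the tables cover single bytes only (so the upper bound is 0xFF rather than 0x10FFFF), and the escape state is simplified to accept any byte after a backslash. The `[a ... b]` range designator is a GNU C extension, as in the original table.

```c
#include <stddef.h>

/* Hypothetical state names, modelled on the table quoted in the thread. */
enum {
    LEX_ERROR = 0,          /* zero, so unlisted table entries reject */
    IN_DQ_STRING,
    IN_DQ_STRING_ESCAPE,
    IN_DONE_STRING
};

/* Liberal variant: every byte from 0x01 up continues the string,
 * which lets raw control characters such as tab (0x09) through. */
static const unsigned char liberal_dq[256] = {
    [0x01 ... 0x21] = IN_DQ_STRING,
    ['"']           = IN_DONE_STRING,       /* 0x22 */
    [0x23 ... 0x5B] = IN_DQ_STRING,
    ['\\']          = IN_DQ_STRING_ESCAPE,  /* 0x5C */
    [0x5D ... 0xFF] = IN_DQ_STRING,
};

/* Strict variant: mirrors unescaped = %x20-21 / %x23-5B / %x5D-...,
 * so bytes below 0x20 fall through to LEX_ERROR. */
static const unsigned char strict_dq[256] = {
    [0x20 ... 0x21] = IN_DQ_STRING,
    ['"']           = IN_DONE_STRING,
    [0x23 ... 0x5B] = IN_DQ_STRING,
    ['\\']          = IN_DQ_STRING_ESCAPE,
    [0x5D ... 0xFF] = IN_DQ_STRING,
};

/* Returns 1 if s[0..len) is a string body that ends exactly at its
 * closing quote. s starts just after the opening quote. */
static int accepts(const unsigned char table[256], const char *s, size_t len)
{
    unsigned char st = IN_DQ_STRING;

    for (size_t i = 0; i < len; i++) {
        if (st == IN_DQ_STRING_ESCAPE) {
            /* Simplification: any byte after a backslash is accepted. */
            st = IN_DQ_STRING;
            continue;
        }
        st = table[(unsigned char)s[i]];
        if (st == LEX_ERROR) {
            return 0;
        }
        if (st == IN_DONE_STRING) {
            return i + 1 == len;
        }
    }
    return 0;   /* input ended before the closing quote */
}
```

With these tables, accepts(liberal_dq, "a\tb\"", 4) succeeds and accepts(strict_dq, "a\tb\"", 4) fails, which is exactly the raw-tab case being argued about. Neither table says anything about UTF-8 validity; that check would have to live elsewhere.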
Regards,
Anthony Liguori
paolo