On 05/20/2010 10:16 AM, Paolo Bonzini wrote:
On 05/20/2010 03:44 PM, Luiz Capitulino wrote:
  I think there's another issue in the handling of strings.

  The JSON spec (RFC 4627) says that valid unescaped chars are in the following range:

     unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
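
A minimal C sketch of what that rule admits, applied to an already-decoded code point. The function name `json_unescaped_ok` is hypothetical and is not part of QEMU's lexer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if code point `cp` may appear unescaped
 * inside a JSON string, per the ABNF rule
 *     unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
 * Excluded: control characters U+0000..U+001F (tab included),
 * '"' (U+0022) and '\' (U+005C), which must be escaped. */
static bool json_unescaped_ok(uint32_t cp)
{
    return (cp >= 0x20 && cp <= 0x21) ||
           (cp >= 0x23 && cp <= 0x5B) ||
           (cp >= 0x5D && cp <= 0x10FFFF);
}
```

Under this rule a raw tab (0x09) is rejected along with every other code point below 0x20, while DEL (0x7F) and all non-ASCII code points up to U+10FFFF pass.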

That's a spec bug IMHO. Tab is %x09, so that rule excludes it. Surely you can include tabs in strings. Any parser that didn't accept that would be broken.


  But we do:

     [IN_DQ_STRING] = {
         [1 ... 0xFF] = IN_DQ_STRING,
         ['\\'] = IN_DQ_STRING_ESCAPE,
         ['"'] = IN_DONE_STRING,
     },

  Shouldn't we cover 0x20 .. 0xFF instead?
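
For illustration, a sketch of the narrower row being suggested, written in the same GCC range-designator style as the quoted table. The enum values other than the three quoted states, the `lexer_table` and `dq_string_step` names, and the assumption that an unset (zero) entry means `IN_ERROR` are hypothetical stand-ins, not QEMU's actual definitions.

```c
#include <stdint.h>

/* Hypothetical state list; only the three string states come from
 * the quoted table. Assumption: 0 is the error state, so any byte
 * left out of a row's initializer lands in IN_ERROR. */
enum lex_state {
    IN_ERROR = 0,
    IN_DQ_STRING,
    IN_DQ_STRING_ESCAPE,
    IN_DONE_STRING,
    LEX_NSTATES
};

/* Later designators override the range, as in the quoted code. */
static const uint8_t lexer_table[LEX_NSTATES][256] = {
    [IN_DQ_STRING] = {
        [0x20 ... 0xFF] = IN_DQ_STRING,   /* was [1 ... 0xFF] */
        ['\\'] = IN_DQ_STRING_ESCAPE,
        ['"'] = IN_DONE_STRING,
    },
};

/* Next state for byte `ch` while inside a double-quoted string. */
static enum lex_state dq_string_step(unsigned char ch)
{
    return (enum lex_state)lexer_table[IN_DQ_STRING][ch];
}
```

With this row, a raw tab or any other byte below 0x20 inside a string drives the lexer into the error state, while bytes 0x80..0xFF still pass through untouched, leaving UTF-8 validation to a later stage.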

If it's the lexer, isn't it just being liberal in what it accepts?

I believe the parser correctly rejects invalid UTF-8 sequences.
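
As a rough illustration of what rejecting invalid UTF-8 involves, here is a generic validator sketch. It is not QEMU's parser code; `utf8_valid` is a hypothetical name, and the checks follow the well-formedness rules of RFC 3629.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator: true if buf[0..len) is well-formed UTF-8.
 * Rejects stray continuation bytes, invalid lead bytes, truncated
 * sequences, overlong encodings, UTF-16 surrogates (U+D800..U+DFFF)
 * and code points above U+10FFFF. */
static bool utf8_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        size_t n;          /* number of continuation bytes */
        uint32_t cp, min;  /* decoded value, smallest legal value */

        if (b < 0x80) {    /* ASCII: single byte */
            i++;
            continue;
        }
        if ((b & 0xE0) == 0xC0) {
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;  /* continuation byte or 0xF8..0xFF as lead */
        }
        if (len - i <= n)
            return false;  /* sequence runs past the end of the buffer */
        for (size_t k = 1; k <= n; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return false;  /* expected a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;  /* overlong, out of range, or surrogate */
        i += n + 1;
    }
    return true;
}
```

For example, the overlong encoding `C0 AF` of '/' and the surrogate `ED A0 80` are both rejected, while `C3 A9` (U+00E9) is accepted.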

Regards,

Anthony Liguori

paolo

