[pcre-dev] \u in JavaScript compat mode

Zoltán Herczeg Fri, 11 Nov 2011 13:36:35 -0800

Hi,

I recently got a notification that PCRE does not support \u in 
PCRE_JAVASCRIPT_COMPAT mode.


I have checked the latest standard here:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

And it says:

15.10.1 Patterns

CharacterEscape ::
  [...]
  HexEscapeSequence
  UnicodeEscapeSequence
  [...]

Later, in 15.10.2.10 CharacterEscape:

The production CharacterEscape :: HexEscapeSequence evaluates by evaluating the 
CV of the HexEscapeSequence (see 7.8.4) and returning its character result.
The production CharacterEscape :: UnicodeEscapeSequence evaluates by evaluating 
the CV of the UnicodeEscapeSequence (see 7.8.4) and returning its character 
result.

7.8.4 String Literals

[...]
HexEscapeSequence ::
   x HexDigit HexDigit
UnicodeEscapeSequence ::
   u HexDigit HexDigit HexDigit HexDigit
[...]

Thus a \x hex escape in PCRE_JAVASCRIPT_COMPAT mode must be followed by two 
hexadecimal digits and evaluated as a byte, while \u must be followed by four 
hexadecimal digits and evaluated as an unsigned short.
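
To make the digit-count rule concrete, here is a minimal sketch in JavaScript. 
The helper name decodeHexEscape and its return shape are hypothetical, chosen 
only for illustration; this is not PCRE code. It returns null whenever the 
escape letter is not followed by the required number of hex digits.

```javascript
// Hypothetical helper, not part of PCRE: decode a \x or \u escape.
// `pos` is the index of the character right after the backslash.
function decodeHexEscape(pattern, pos) {
  // \x needs exactly two hex digits, \u exactly four (ECMA-262 7.8.4).
  const digitsNeeded = { x: 2, u: 4 }[pattern[pos]];
  if (digitsNeeded === undefined) return null; // not \x or \u

  const digits = pattern.slice(pos + 1, pos + 1 + digitsNeeded);
  // Reject the escape if it is too short or contains a non-hex character.
  if (digits.length !== digitsNeeded || !/^[0-9A-Fa-f]+$/.test(digits)) {
    return null;
  }
  // `length` counts the escape letter plus its digits.
  return { value: parseInt(digits, 16), length: 1 + digitsNeeded };
}

console.log(decodeHexEscape("b\\u0041x", 2)); // well-formed \u escape
console.log(decodeHexEscape("b\\u041x", 2));  // only three hex digits
```

What a caller does with a null result (error, or literal fallback) is left 
open in this sketch.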

I have checked what happens if \u is not followed by 4 hex digits:
/b\u0041x/ matches "bAx"
/b\u041x/ matches "bu041x"
/b\x41x/ matches "bAx"
/b\x1x/ matches "bx1x"
So \u is simply treated as a literal u, and \x as a literal x, if not followed 
by enough hex digits.
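
The four cases above can be re-checked with a short script. This is only a 
sketch, assuming a JavaScript engine such as Node.js and patterns written 
without the "u" flag (with that flag, the malformed escapes are rejected as 
syntax errors instead of falling back to literals).

```javascript
// Each entry pairs a pattern from the list above with its subject string.
const cases = [
  [/b\u0041x/, "bAx"],    // four hex digits: \u0041 is "A"
  [/b\u041x/,  "bu041x"], // three hex digits: \u falls back to a literal "u"
  [/b\x41x/,   "bAx"],    // two hex digits: \x41 is "A"
  [/b\x1x/,    "bx1x"],   // one hex digit: \x falls back to a literal "x"
];

// true means the pattern matched its subject.
const results = cases.map(([re, subject]) => re.test(subject));

cases.forEach(([re, subject], i) => {
  console.log(re.source, JSON.stringify(subject), results[i]);
});
```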

Philip, could we follow the standard here?

Regards,
Zoltan


