Hi, I recently got a notification that PCRE does not support \u in PCRE_JAVASCRIPT_COMPAT mode.
I have checked the latest standard here:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

It says:

15.10.1 Patterns

  CharacterEscape ::
    [...]
    HexEscapeSequence
    UnicodeEscapeSequence
    [...]

Later, in 15.10.2.10 CharacterEscape:

  The production CharacterEscape :: HexEscapeSequence evaluates by evaluating the CV of the HexEscapeSequence (see 7.8.4) and returning its character result.

  The production CharacterEscape :: UnicodeEscapeSequence evaluates by evaluating the CV of the UnicodeEscapeSequence (see 7.8.4) and returning its character result.

7.8.4 String Literals

  [...]
  HexEscapeSequence ::
    x HexDigit HexDigit

  UnicodeEscapeSequence ::
    u HexDigit HexDigit HexDigit HexDigit
  [...]

Thus, in PCRE_JAVASCRIPT_COMPAT mode, a \x hex escape must be followed by two hexadecimal characters and evaluated as a byte, while \u must be followed by four hexadecimal characters and evaluated as an unsigned short.

I have checked what happens if \u or \x is not followed by enough hex digits:

  /b\u0041x/  matches "bAx"
  /b\u041x/   matches "bu041x"
  /b\x41x/    matches "bAx"
  /b\x1x/     matches "bx1x"

So if there are not enough hex characters, \u is simply converted to u, and \x to x.

Philip, could we follow the standard here?

Regards,
Zoltan

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
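To make the rule from 7.8.4 concrete, here is a minimal C sketch of how an ECMAScript-style \x or \u escape could be decoded. It is not PCRE's actual parser: the helper names js_escape and hex_value are hypothetical, and the sketch assumes the caller has already consumed the backslash and that the pattern is a NUL-terminated 8-bit string. When too few hex digits follow, it falls back to treating the letter itself as a literal, which reproduces the four results listed above.

```c
#include <ctype.h>

/* Hypothetical helper: numeric value of a hex digit (caller checks isxdigit). */
static unsigned int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return (unsigned int)(c - '0');
    return (unsigned int)(tolower(c) - 'a' + 10);
}

/*
 * Hypothetical sketch, not PCRE code.
 * p points at the character right after the backslash.
 * \x requires exactly 2 hex digits (HexEscapeSequence),
 * \u requires exactly 4 hex digits (UnicodeEscapeSequence).
 * On success: *value is the code unit value, returns 1 + digit count.
 * Otherwise: *value is the escape letter itself (literal 'x' or 'u'),
 * returns 1, so the following characters are matched literally.
 */
static int js_escape(const char *p, unsigned int *value)
{
    int digits = (*p == 'x') ? 2 : (*p == 'u') ? 4 : 0;
    unsigned int v = 0;

    if (digits == 0) {                 /* not \x or \u: identity escape */
        *value = (unsigned char)*p;
        return 1;
    }
    for (int i = 1; i <= digits; i++) {
        /* Stops at the first non-hex character, including the NUL. */
        if (!isxdigit((unsigned char)p[i])) {
            *value = (unsigned char)*p;
            return 1;
        }
        v = v * 16 + hex_value((unsigned char)p[i]);
    }
    *value = v;                        /* at most 0xFF for \x, 0xFFFF for \u */
    return 1 + digits;
}
```

With this rule, /b\u041x/ reads \u as a literal u followed by "041x", giving "bu041x", while /b\u0041x/ consumes all four digits and yields U+0041 ("A").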
