https://bugs.exim.org/show_bug.cgi?id=2120
--- Comment #5 from Rob <r...@playjax.net> ---

Thanks for replying to my issue. I'll try to clarify ...

I'm using PCRE2 in a JavaScript interpreter for a web browser. Viewing some pages on the New York Times website caused the JavaScript interpreter to throw a syntax error at the following line ...

var f = /[\x00-\x1f\ud800-\udfff\ufffe\uffff\u0300-\u0333\u033d-\u0346\u034a-\u034c\u0350-\u0352\u0357-\u0358\u035c-\u0362\u0374\u037e\u0387\u0591-\u05af\u05c4\u0610-\u0617\u0653-\u0654\u0657-\u065b\u065d-\u065e\u06df-\u06e2\u06eb-\u06ec\u0730\u0732-\u0733\u0735-\u0736\u073a\u073d\u073f-\u0741\u0743\u0745\u0747\u07eb-\u07f1\u0951\u0958-\u095f\u09dc-\u09dd\u09df\u0a33\u0a36\u0a59-\u0a5b\u0a5e\u0b5c-\u0b5d\u0e38-\u0e39\u0f43\u0f4d\u0f52\u0f57\u0f5c\u0f69\u0f72-\u0f76\u0f78\u0f80-\u0f83\u0f93\u0f9d\u0fa2\u0fa7\u0fac\u0fb9\u1939-\u193a\u1a17\u1b6b\u1cda-\u1cdb\u1dc0-\u1dcf\u1dfc\u1dfe\u1f71\u1f73\u1f75\u1f77\u1f79\u1f7b\u1f7d\u1fbb\u1fbe\u1fc9\u1fcb\u1fd3\u1fdb\u1fe3\u1feb\u1fee-\u1fef\u1ff9\u1ffb\u1ffd\u2000-\u2001\u20d0-\u20d1\u20d4-\u20d7\u20e7-\u20e9\u2126\u212a-\u212b\u2329-\u232a\u2adc\u302b-\u302c\uaab2-\uaab3\uf900-\ufa0d\ufa10\ufa12\ufa15-\ufa1e\ufa20\ufa22\ufa25-\ufa26\ufa2a-\ufa2d\ufa30-\ufa6d\ufa70-\ufad9\ufb1d\ufb1f\ufb2a-\ufb36\ufb38-\ufb3c\ufb3e\ufb40-\ufb41\ufb43-\ufb44\ufb46-\ufb4e\ufff0-\uffff]/g;

pcre2_compile fails at the range \ud800-\udfff. The JS interpreter then has to respond with a syntax error, and the script is not executed.

I can't control what JS developers put in their regexes, other than ensuring that the script being parsed is valid UTF-8 (or gets converted to UTF-8 by the browser), and this regex works in both Chrome and Firefox. Checking for the surrogate range seems too benign to be error-worthy.

I could be partly to blame for using PCRE in UTF-8 mode instead of UTF-16 as per the JavaScript specification, and I'm unsure whether it would throw the same error in UTF-16 mode. I understand PCRE2_NO_UTF_CHECK might not be the right solution.
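To make the failing case concrete, here is a minimal sketch in standard C of how an embedder might locate the offending escape before handing a pattern to pcre2_compile. The helper name find_surrogate_escape is hypothetical and not part of PCRE2. It assumes the pattern is a NUL-terminated string that uses JavaScript-style \uXXXX escapes, and it only reports the offset of the first escape in the range U+D800 to U+DFFF, the range named in the error message quoted in this report. It does not attempt to rewrite the pattern.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper, NOT a PCRE2 function.
 * Returns the byte offset of the first \uXXXX escape whose value lies in
 * the surrogate range U+D800..U+DFFF, or -1 if the pattern has none.
 */
long find_surrogate_escape(const char *pattern)
{
    size_t len = strlen(pattern);

    for (size_t i = 0; i + 1 < len; i++) {
        if (pattern[i] != '\\')
            continue;

        if (pattern[i + 1] != 'u') {
            i++;    /* skip the escaped character, e.g. an escaped backslash */
            continue;
        }

        /* Need four hex digits at offsets i+2 .. i+5. */
        if (i + 5 < len) {
            unsigned value = 0;
            int ok = 1;

            for (size_t k = 2; k <= 5; k++) {
                int h = hex_value(pattern[i + k]);
                if (h < 0) { ok = 0; break; }
                value = value * 16 + (unsigned)h;
            }
            if (ok && value >= 0xD800 && value <= 0xDFFF)
                return (long)i;
        }
        i++;        /* skip the 'u' and keep scanning */
    }
    return -1;
}
```

For the character class above, the helper would point at the \ud800 that starts the \ud800-\udfff range. What to do with that offset (reject the script, drop the range, or compile in a different mode) is a policy decision this sketch leaves open.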
I'm quite happy to continue hand-patching new PCRE releases. I mostly submitted this because a Google search for "disallowed Unicode code point (>= 0xd800 && <= 0xdfff)" returned quite a few hits, so it doesn't seem to be an inconsequential issue.

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev