[pcre-dev] [Bug 2120] PCRE2_NO_UTF_CHECK does not disable all checks

admin Tue, 09 May 2017 03:39:41 -0700

https://bugs.exim.org/show_bug.cgi?id=2120


--- Comment #4 from Philip Hazel <p...@hermes.cam.ac.uk> ---
Thanks for the comments. I was thinking about this overnight, and had second
thoughts about it, along the lines of what Christian says. I think we need to
know exactly what the problem is here. A valid UTF-8 string can never contain
characters in the surrogate range 0xd800-0xdfff, and the UTF check in
pcre2_match() will reject any string that does.
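
To see why a UTF check catches these values, the sketch below applies the raw 3-byte UTF-8 bit layout to code points on either side of the surrogate block. It then asks Python's strict UTF-8 decoder whether each result is acceptable. Python's codec stands in for a UTF validity check here. This is not PCRE2's own validation code, and the helper names are hypothetical.

```python
def encode_3byte(cp: int) -> bytes:
    """Apply the 3-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx)
    to a code point in 0x800..0xFFFF, with no check for surrogates."""
    if not 0x800 <= cp <= 0xFFFF:
        raise ValueError("code point needs a different UTF-8 length")
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")  # rejects encoded surrogates such as ED A0 80
    except UnicodeDecodeError:
        return False
    return True


# Code points at the edges of the surrogate block 0xD800-0xDFFF.
for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000):
    raw = encode_3byte(cp)
    print(f"U+{cp:04X} -> {raw.hex(' ')} valid={is_valid_utf8(raw)}")
```

The bit pattern itself happily produces bytes for U+D800 and U+DFFF. Only the validity rule excludes them, which is why a conforming UTF-8 subject can never contain them.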

I *think* he is saying that some websites have explicit checks for character
values in the surrogate range, using patterns containing explicit values such
as \x{d800}. 
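
For illustration, here is what such a check looks like in an engine that accepts surrogate code points in both pattern and subject. Python's `re` module is used as a stand-in because its `str` values may contain lone surrogates. This is not PCRE2 code, and the pattern and function names are hypothetical.

```python
import re

# Hypothetical check: a class covering the whole surrogate block, written
# with explicit code point escapes (compare \x{d800} in a PCRE2 pattern).
SURROGATE_CLASS = re.compile(r"[\ud800-\udfff]")


def has_lone_surrogate(text: str) -> bool:
    """Return True if text contains any code point in 0xD800-0xDFFF."""
    return SURROGATE_CLASS.search(text) is not None


# Python can produce lone surrogates when decoding invalid bytes with the
# "surrogateescape" handler: the byte 0xFF becomes U+DCFF.
raw = b"bad \xff input"
text = raw.decode("utf-8", "surrogateescape")
print(has_lone_surrogate("plain text"), has_lone_surrogate(text))
```

In PCRE2's UTF-8 mode the equivalent subject would fail the UTF check described above, so a pattern like this has nothing valid to match against.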

I have just realized that there is in any case an oddity in PCRE2. The range
[\x{d7ff}-\x{e000}] is accepted, but [\x{d800}-\x{dfff}] is not.
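
The oddity is easy to quantify. Both classes cover exactly the same 2048 surrogate code points, and they differ only in whether the endpoints themselves are surrogates. The sketch below does just that arithmetic. The idea that PCRE2 validates only the endpoint values is an assumption suggested by the observed behaviour, not something stated in this comment.

```python
SURROGATE_LO, SURROGATE_HI = 0xD800, 0xDFFF


def is_surrogate(cp: int) -> bool:
    return SURROGATE_LO <= cp <= SURROGATE_HI


def surrogates_covered(lo: int, hi: int) -> int:
    """Number of surrogate code points inside the class range lo..hi."""
    return max(0, min(hi, SURROGATE_HI) - max(lo, SURROGATE_LO) + 1)


def endpoints_are_surrogates(lo: int, hi: int) -> bool:
    """True if either endpoint is itself a surrogate; an endpoint-only
    check (hypothetical) would look at nothing else."""
    return is_surrogate(lo) or is_surrogate(hi)


for lo, hi in ((0xD7FF, 0xE000), (0xD800, 0xDFFF)):
    print(f"[\\x{{{lo:x}}}-\\x{{{hi:x}}}] "
          f"endpoint surrogate: {endpoints_are_surrogates(lo, hi)}, "
          f"surrogates covered: {surrogates_covered(lo, hi)}")
```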

I will await further input from Rob. Allowing these values in UTF-8 or UTF-32
mode would be possible (under some option), but not in UTF-16 mode, because
there the values 0xd800-0xdfff are reserved for encoding surrogate pairs and
cannot represent characters on their own.
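
The UTF-16 restriction can be shown by encoding the two lone surrogates U+D83D and U+DE00 next to the single character U+1F600. The sketch uses Python's `surrogatepass` error handler as a stand-in for the kind of opt-in option mentioned above. It is not a PCRE2 feature, and the constant and function names are hypothetical.

```python
PAIR_HALVES = "\ud83d\ude00"   # two separate surrogate code points
EMOJI = "\U0001F600"           # one character, U+1F600


def distinguishable(codec: str) -> bool:
    """True if the two lone surrogates encode differently from U+1F600."""
    # "surrogatepass" is Python's opt-in for encoding lone surrogates.
    halves = PAIR_HALVES.encode(codec, "surrogatepass")
    return halves != EMOJI.encode(codec)


for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    print(f"{codec:9} distinguishable={distinguishable(codec)}")
```

In UTF-8 and UTF-32 the two inputs produce different bytes, so surrogate values could be carried without ambiguity under such an option. In UTF-16 the bytes are identical: the code units 0xD83D 0xDE00 already mean U+1F600, so the two surrogate values cannot be expressed as characters in their own right.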

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
