https://bugs.exim.org/show_bug.cgi?id=2120
--- Comment #4 from Philip Hazel <p...@hermes.cam.ac.uk> ---
Thanks for the comments. I was thinking about this overnight and had second thoughts about it, along the lines of what Christian says. I think we need to know exactly what the problem is here.

A valid UTF-8 string can never contain characters in the surrogate range 0xd800-0xdfff, and the UTF check in pcre2_match() will pick this up. I *think* he is saying that some websites have explicit checks for character values in the surrogate range, using patterns containing explicit values such as \x{d800}.

I have just realized that there is in any case an oddity in PCRE2: the range [\x{d7ff}-\x{e000}] is accepted, but [\x{d800}-\x{dfff}] is not.

I will await further input from Rob. Allowing these values in UTF-8 or UTF-32 mode would be possible (under some option), but not in UTF-16 mode, because they cannot be represented in that mode.

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
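To make the surrogate point concrete, here is a minimal Python sketch of the rule a UTF-8 validity check enforces. It is not PCRE2 code: the real check in pcre2_match() is written in C and reports specific error codes. The helper names `is_valid_utf8`, `encode_3byte` and `utf16_code_units` are hypothetical. The validator follows the well-formed byte-sequence table in the Unicode Standard, where the lead byte 0xED may only be followed by 0x80-0x9F, because a second byte of 0xA0-0xBF would encode U+D800-U+DFFF. The last helper shows why UTF-16 has no room for these values: its code units 0xD800-0xDFFF are reserved for the two halves of a surrogate pair.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (Unicode Table 3-7 / RFC 3629).

    Rejects overlong forms, code points above U+10FFFF, and the
    surrogate range U+D800..U+DFFF (bytes ED A0 80 .. ED BF BF).
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                 # 1-byte ASCII
            i += 1
            continue
        # Pick sequence length and the allowed range for the SECOND byte.
        if 0xC2 <= b0 <= 0xDF:        # 2-byte lead (C0/C1 would be overlong)
            length, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xE0:              # exclude overlong 3-byte forms
            length, lo, hi = 3, 0xA0, 0xBF
        elif b0 == 0xED:              # A0..BF here would encode a surrogate
            length, lo, hi = 3, 0x80, 0x9F
        elif 0xE1 <= b0 <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF0:              # exclude overlong 4-byte forms
            length, lo, hi = 4, 0x90, 0xBF
        elif 0xF1 <= b0 <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b0 == 0xF4:              # cap at U+10FFFF
            length, lo, hi = 4, 0x80, 0x8F
        else:                         # stray continuation byte or invalid lead
            return False
        if i + length > n:            # truncated sequence
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for j in range(i + 2, i + length):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += length
    return True


def encode_3byte(cp: int) -> bytes:
    """Naively pack U+0800..U+FFFF into the 3-byte UTF-8 bit layout.

    Deliberately does NOT refuse surrogates, so it can produce the
    ill-formed sequences that a validator must reject.
    """
    if not 0x800 <= cp <= 0xFFFF:
        raise ValueError("3-byte form covers U+0800..U+FFFF only")
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_code_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units.

    Code units 0xD800..0xDFFF are used only as surrogate-pair halves,
    so a surrogate code point itself has no UTF-16 encoding.
    """
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-16")
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
```

With these helpers, `encode_3byte(0xD7FF)` and `encode_3byte(0xE000)` pass `is_valid_utf8`, while `encode_3byte(0xD800)` (bytes ED A0 80) and `encode_3byte(0xDFFF)` fail. These are the same values that appear as range endpoints in the comment above. By contrast, a UTF-32 code unit can simply hold 0xD800, which is why an option to allow such values is conceivable in UTF-8 or UTF-32 mode but not in UTF-16 mode.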