https://bugs.exim.org/show_bug.cgi?id=1866

            Bug ID: 1866
           Summary: UTF-8 class containing \D and \P{Nd} matches incorrectly
           Product: PCRE
           Version: 8.39
          Hardware: x86
                OS: All
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: p...@hermes.cam.ac.uk
          Reporter: justin.vii...@intel.com
                CC: pcre-dev@exim.org

We found an issue with this pattern, which has the PCRE_UTF8 flag set but not
PCRE_UCP:

    /[\D\P{Nd}]/8

pcretest -d shows this class being interpreted as the union of "all non-digit
characters up to \xff" and "all characters not in \p{Nd}":

    $ ./pcretest -d
    PCRE version 8.39 2016-06-14

      re> /[\D\P{Nd}]/8
    ------------------------------------------------------------------
      0  43 Bra
      3     [\x00-/:-\xff\P{Nd}]
     43  43 Ket
     46     End
    ------------------------------------------------------------------
    Capturing subpattern count = 0
    Options: utf
    No first char
    No need char
    data> 0
    No match
    data> _
     0: _
    data> \x{1d7cf}
    No match

However, my reading of the pcrepattern documentation suggests that without the
PCRE_UCP flag, \D should be interpreted as "all non-digit characters up to \xff
and *all other characters*". Under that reading the last test case above,
U+1D7CF (MATHEMATICAL BOLD DIGIT ONE), should match: it is excluded by \P{Nd}
but it lies above \xff, so \D should accept it.

This is what happens if we use /\D/8 on its own, or if we transform the
pattern above into an alternation:

      re> /\D|\P{Nd}/8
    data> 0
    No match
    data> a
     0: a
    data> \x{1d7cf}
     0: \x{1d7cf}

I have checked both PCRE 8.39 and PCRE 10.22, and both show the same
behaviour.
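
The set arithmetic behind the report can be checked without PCRE. The sketch below is only a model in Python and does not call PCRE at all. The function names (`expected_class`, `observed_class`, and the helpers) are hypothetical. It assumes that Python's `unicodedata` general category agrees with PCRE's \p{Nd} for the three subjects tested. `expected_class` encodes the reading of the documentation given above, and `observed_class` encodes the compiled class [\x00-/:-\xff\P{Nd}] shown by pcretest -d.

```python
import unicodedata


def is_ascii_digit(ch: str) -> bool:
    """True for the ten ASCII digits 0-9 only."""
    return "0" <= ch <= "9"


def not_nd(ch: str) -> bool:
    """Model of \\P{Nd}: general category is not Decimal_Number.

    Assumption: Python's Unicode tables agree with PCRE's for the
    characters tested here.
    """
    return unicodedata.category(ch) != "Nd"


def backslash_d_upper_no_ucp(ch: str) -> bool:
    """Model of \\D without PCRE_UCP, per the report's reading of the
    documentation: everything except ASCII digits, including all code
    points above 0xFF."""
    return not is_ascii_digit(ch)


def expected_class(ch: str) -> bool:
    """Model of [\\D\\P{Nd}] as the report argues it should behave."""
    return backslash_d_upper_no_ucp(ch) or not_nd(ch)


def observed_class(ch: str) -> bool:
    """Model of the compiled class [\\x00-/:-\\xff\\P{Nd}] from
    pcretest -d: the \\D part is cut off at 0xFF."""
    d_part_truncated = ord(ch) <= 0xFF and not is_ascii_digit(ch)
    return d_part_truncated or not_nd(ch)


# The three subjects from the pcretest session: "0", "_", U+1D7CF.
SUBJECTS = ["0", "_", "\U0001D7CF"]


def compare(subjects=SUBJECTS):
    """Return (code point label, expected, observed) for each subject."""
    return [
        (f"U+{ord(ch):04X}", expected_class(ch), observed_class(ch))
        for ch in subjects
    ]


if __name__ == "__main__":
    for label, expected, observed in compare():
        print(label, "expected:", expected, "observed:", observed)
```

In this model the two versions agree on "0" and "_" and differ only on U+1D7CF, which is the one subject where pcretest reports "No match" against the report's expectation.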