https://bugs.exim.org/show_bug.cgi?id=2641
Bug ID: 2641
Summary: test pattern delimiters
Product: PCRE
Version: 10.35 (PCRE2)
Hardware: x86
OS: Linux
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: p...@hermes.cam.ac.uk
Reporter: h...@crypt.org
CC: pcre-dev@exim.org

Comments in perltest say:

  # Unless
  # "subject_literal" is on the pattern, data lines are processed as
  # Perl double-quoted strings, so if they contain " $ or @ characters, these
  # have to be escaped. For this reason, all such characters in the
  # Perl-compatible testinput1 and testinput4 files are escaped so that they can
  # be used for perltest as well as for pcre2test.

I assume that by "data lines" it means the strings to match on rather than the patterns; the patterns are processed by C< eval "\$_ =~ ${pattern}" >, which will interpret the pattern as a regexp rather than a double-quoted string _except_ if certain special delimiters such as C<"> or C<'> are used.

For no obvious reason (except for the first one), some of the patterns in testdata/testinput1 are enclosed in those special delimiters:

  1960:"(?>.*/)foo"
  3834:"(?x)(?-x: \s*#\s*)"
  3839:"(?x-is)(?:(?-ixs) \s*#\s*) include"
  5235:"(?>.*)foo"
  5239:"(?>.*?)foo"
  5662:'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
  5665:'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
  5668:'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
  5671:'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
  5730:"Z*(|d*){216}"
  5732:"(?1)(?#?'){8}(a)"
  5744:"(?|(\k'Pm')|(?'Pm'))"
  5836:'(?>ab|abab){1,5}?M'
  5839:'(?>ab|abab){2}?M'
  5842:'((?(?=(a))a)+k)'
  5845:'((?(?=(a))a|)+k)'
  5848:'(?(?!(b))a|b)+k'
  6414:"(?<=X(?(DEFINE)(A)))X(*F)"
  6418:"(?<=X(?(DEFINE)(A)))."
  6421:"(?<=X(?(DEFINE)(.*))Y)."
  6424:"(?<=X(?(DEFINE)(Y))(?1))."
  6427:"(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word"

This causes problems, for example, with the pattern at line 3834: because the special delimiter causes it to be interpolated like a double-quoted string, each "\s" in the pattern is interpolated as "s", so the wrong pattern results.
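As a rough illustration of how the affected lines could be located, here is a minimal Python sketch. It is not part of PCRE2 or perltest; the function name `find_quote_delimited` and the sample data are hypothetical. It assumes that a pattern starts in column 0 and that data lines are indented, which appears to be the convention in the test files, so a continuation line of a multi-line pattern that happens to begin with a quote would be reported as well.

```python
# Delimiters that make Perl treat the pattern as a quoted string
# rather than as a regexp when the test script evaluates it.
QUOTE_DELIMITERS = ('"', "'")


def find_quote_delimited(lines):
    """Return (line_number, delimiter, text) for each suspect pattern line.

    Assumption: patterns start in column 0; data lines are indented.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.startswith(QUOTE_DELIMITERS):
            hits.append((number, text[0], text))
    return hits


# Hypothetical excerpt in the style of a test input file.
sample = [
    '/(?>.*)bar/\n',
    '    foo/bar\n',
    '\n',
    '"(?x)(?-x: \\s*#\\s*)"\n',
    '    A # B\n',
    '\n',
    "'(?>ab|abab){2}?M'\n",
    '    abababM\n',
]

for number, delimiter, text in find_quote_delimited(sample):
    # Same "line:pattern" layout as the list above.
    print(f"{number}:{text}")
```

To check a real file, the same function could be given the lines read from it, for example `find_quote_delimited(open("testdata/testinput1"))`.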

I'd suggest changing the delimiter to '/' for all of these except the first, and for that one using something less special such as '!'.

There are two similar cases in testinput4:

  479:"(?s)(.{1,5})"utf
  2223:"[\S\V\H]"utf

Hope this helps,

Hugo van der Sanden