https://bugs.exim.org/show_bug.cgi?id=2683
Bug ID: 2683
Summary: Pcretest on empty string with /g
Product: PCRE
Version: 8.44
Hardware: x86
OS: Windows
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: philip.ha...@gmail.com
Reporter: c_moi_l_mas...@hotmail.com
CC: pcre-dev@exim.org

Hello,

Although the handling of /g is left to the programmer in PCRE, PCRE's documentation still recommends a Perl-compatible way to do it. The relevant passage in pcre.txt:

"Finding all the matches in a subject is tricky when the pattern can match an empty string. It is possible to emulate Perl's /g behaviour by first trying the match again at the same offset, with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED options, and then if that fails, advancing the starting offset and trying an ordinary match again."

When applying the expression /a?a?/gC to the string "." in pcretest:

- The first match is found in 3 steps (3 callouts). Since it is an empty match, pcretest (which I can only assume follows the above for /g) tries another match at the same position with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED set. This attempt fails, also in 3 steps.
- The next position is then tried, and a second match is found in 3 steps. So far it has taken 9 steps to find two matches.

Now the interesting part: according to me and to the author of regex101.com, the second match is also an empty match, so another attempt must be made there with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED set. pcretest does NOT do this, but regex101.com does (it can be seen in the debugger, totaling 12 steps).

Is pcretest's omission of that last attempt an issue with pcretest, or is it consistent with Perl and not an issue? If it is not an issue, what causes pcretest to stop early there? Is it a small optimization because an empty match has been found at the end of the string?
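The recommendation quoted from pcre.txt, and the difference in attempt counts described in this report, can be sketched as a loop. This is only a minimal illustration in Python, not pcretest's code and not a PCRE binding: `toy_a_opt_a_opt` is a hand-written stand-in for /a?a?/, its two boolean arguments play the role of PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and `stop_on_empty_at_end` is a hypothetical switch that models the early stop asked about above. Whether pcretest actually stops for that reason is the open question of this report.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]
# (subject, start, notempty_atstart, anchored) -> (match_start, match_end) or None
Matcher = Callable[[str, int, bool, bool], Optional[Span]]


def toy_a_opt_a_opt(subject: str, start: int,
                    notempty_atstart: bool, anchored: bool) -> Optional[Span]:
    """Hypothetical stand-in for /a?a?/: greedily takes up to two 'a's."""
    pos = start
    while pos <= len(subject):
        end = pos
        while end < len(subject) and end - pos < 2 and subject[end] == "a":
            end += 1
        empty_at_start = (end == pos == start)
        # An empty match is rejected only at the starting offset.
        if not (notempty_atstart and empty_at_start):
            return (pos, end)
        if anchored:
            return None  # anchored: no later position may be tried
        pos += 1
    return None


def find_all(subject: str, match: Matcher,
             stop_on_empty_at_end: bool = False) -> Tuple[List[Span], int]:
    """Emulate /g as described in pcre.txt; also count match attempts."""
    spans: List[Span] = []
    attempts = 0
    start = 0
    retry_same_offset = False  # True right after an empty match
    while start <= len(subject):
        attempts += 1
        # After an empty match, both options are set for the retry.
        m = match(subject, start, retry_same_offset, retry_same_offset)
        if m is None:
            if not retry_same_offset:
                break  # an ordinary match failed: no more matches
            # The retry failed: advance one character, then match normally.
            retry_same_offset = False
            start += 1
            continue
        spans.append(m)
        start = m[1]
        retry_same_offset = (m[0] == m[1])
        # Assumed shortcut: skip the retry after an empty match at the end.
        if retry_same_offset and stop_on_empty_at_end and m[1] == len(subject):
            break
    return spans, attempts


if __name__ == "__main__":
    print(find_all(".", toy_a_opt_a_opt))
    print(find_all(".", toy_a_opt_a_opt, stop_on_empty_at_end=True))
```

For the subject ".", both variants report the same two empty matches, at offsets 0 and 1. The full loop makes 4 match attempts and the shortcut variant makes 3, which at the 3 callouts per attempt reported above would correspond to 12 and 9 steps respectively. The sketch advances by one character after a failed retry; real code would also have to account for multi-byte characters and CRLF newlines.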
Another observation concerns PCRE2 (see https://github.com/firasdib/Regex101/issues/1236). pcre2.txt contains the exact same passage about how to handle /g, but pcre2test does not seem to behave this way: the expression /(?<=(\G.{2}))(?!$)/g applied to the string "dfgdftrbrtdtr" reports different captures than pcretest does. I believe this is related to /g and empty matches, and that if PCRE2 (or pcre2test) actually followed the same recommendation as PCRE, the two would produce the same captures. Am I correct, or is the difference in captures unrelated to that (in which case it could still be an issue)?