https://bugs.exim.org/show_bug.cgi?id=1883
Bug ID: 1883 Summary: Allow parsing of any context-sensitive grammar by enabling non-atomic recursion Product: PCRE Version: N/A Hardware: All OS: All Status: NEW Severity: wishlist Priority: medium Component: Code Assignee: p...@hermes.cam.ac.uk Reporter: bobw...@hotmail.com CC: pcre-dev@exim.org Essentially: Remove this restriction from PCRE Recursion processing in PCRE differs from Perl in two important ways. In PCRE (like Python, but unlike Perl), a recursive subpattern call is always treated as an atomic group. That is, once it has matched some of the subject string, it is never re-entered, even if it contains untried alternatives and there is a subsequent matching failure. It leads to unparseable context-sensitive grammars like: { S -> 0B1 | 1B0 | 1C1 | 0C0, A -> 0 | 1, B -> ABA | D1, C -> ACA | D0, D -> _E | _ , E -> AE | A } This is essentially adding two bits at a same offset of two binary numbers and checking if it matches the trailing bit. [The actual goal was matching correct additions of two binary numbers in form of a + b = c where a, b and c each match [01]+.] I wished this could be expressed as: (?(DEFINE) (?<s> 0(?&b)1 | 1(?&b)0 | 1(?&c)1 | 0(?&c)0 ) (?<a> 0 | 1 ) (?<b> (?&a)(?&b)(?&a) | (?&d)1 ) (?<c> (?&a)(?&c)(?&a) | (?&d)0 ) (?<d> _(?&e) | _ ) (?<e> (?&a)(?&e) | (?&a) ) ) ^(?&s)$ Example matches: 0_00 01_101 Example non-matches: 0_01 100_00011 In case this is declined, _why_ is PCRE exhibiting that behavior? -- You are receiving this mail because: You are on the CC list for the bug. -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev