https://bugs.exim.org/show_bug.cgi?id=1883

            Bug ID: 1883
           Summary: Allow parsing of any context-sensitive grammar by
                    enabling non-atomic recursion
           Product: PCRE
           Version: N/A
          Hardware: All
                OS: All
            Status: NEW
          Severity: wishlist
          Priority: medium
         Component: Code
          Assignee: p...@hermes.cam.ac.uk
          Reporter: bobw...@hotmail.com
                CC: pcre-dev@exim.org

Essentially: Remove this restriction from PCRE

       Recursion  processing  in PCRE differs from Perl in two important ways.
       In PCRE (like Python, but unlike Perl), a recursive subpattern call  is
       always treated as an atomic group. That is, once it has matched some of
       the subject string, it is never re-entered, even if it contains untried
       alternatives  and  there  is a subsequent matching failure.

It leads to unparseable context-sensitive grammars like:

{ S -> 0B1 | 1B0 | 1C1 | 0C0,
  A -> 0 | 1,
  B -> ABA | D1,
  C -> ACA | D0,
  D -> _E | _ ,
  E -> AE | A }

This is essentially adding two bits at a same offset of two binary numbers and
checking if it matches the trailing bit. [The actual goal was matching correct
additions of two binary numbers in form of a + b = c where a, b and c each
match [01]+.]

I wished this could be expressed as:

(?(DEFINE)
(?<s> 0(?&b)1 | 1(?&b)0 | 1(?&c)1 | 0(?&c)0 )
(?<a> 0 | 1 )
(?<b> (?&a)(?&b)(?&a) | (?&d)1 )
(?<c> (?&a)(?&c)(?&a) | (?&d)0 )
(?<d> _(?&e) | _ )
(?<e> (?&a)(?&e) | (?&a) )
)
^(?&s)$

Example matches:
0_00
01_101

Example non-matches:
0_01
100_00011


In case this is declined, _why_ is PCRE exhibiting that behavior?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 

Reply via email to