http://bugs.exim.org/show_bug.cgi?id=1474

           Summary: Mechanism to record token ID
           Product: PCRE
           Version: N/A
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: wishlist
          Priority: low
         Component: Code
        AssignedTo: [email protected]
        ReportedBy: [email protected]
                CC: [email protected]

I've recently been looking at PCRE for tokenization. It can handle all of the
recognition requirements, but I can't discern a way to easily identify which
token was recognized. I can abuse the named submatch mechanism to get close
(and that is useful), but it's frustrating that it only gets me 95% of the way
there.

I'm wondering if it might be appropriate to introduce a new form of
memoization that allows a token number to be recorded. Like named submatch,
this would take the form of a new bracketing syntax. In contrast to named
submatch, the bracketing would specify a number, and the library API would be
extended to provide for recall of this number. In the absence of any such
specification within the RE, the number returned would be zero.

When executed with engines implementing ordered choice, the token ID returned
should be the one associated with the last token-numbered open paren resulting
in a match. When executed with engines implementing a finite automaton rather
than a pushdown automaton, the token ID returned should follow the leftmost
longest rule.

This is complicated enough that I hesitate to volunteer to implement it, but
I'd be willing to give it a try and submit a patch for consideration if there
is interest in the feature.
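
For illustration, the named-submatch workaround and the two selection rules
described above might be sketched roughly as follows. This is only a sketch
under stated assumptions: it uses Python's re module as a stand-in for PCRE
(both are backtracking engines that accept the (?P<name>...) syntax), and the
token table, the numeric IDs, and the helper names token_id_ordered and
token_id_longest are hypothetical. None of them is part of an existing or
proposed PCRE API. The leftmost-longest helper only emulates what an
automaton-based engine would report, by trying each token pattern separately;
breaking ties in favor of the earlier table entry is an assumption, since the
report does not say how ties should be resolved.

```python
import re

# Hypothetical token table: (numeric ID, group name, pattern).
# IDs start at 1 so that 0 can mean "no token-numbered group matched",
# mirroring the default proposed in the report.
TOKENS = [
    (1, "KEYWORD", r"if|else|while"),
    (2, "IDENT", r"[A-Za-z_]\w*"),
    (3, "NUMBER", r"\d+"),
    (4, "SPACE", r"\s+"),
]

# The workaround: wrap each token in a named group, then map the name of
# the group that matched back to a number after the fact.
NAME_TO_ID = {name: tid for tid, name, _ in TOKENS}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for _, name, pat in TOKENS))


def token_id_ordered(text, pos=0):
    """Ordered choice: the first alternative that matches at pos wins."""
    m = MASTER.match(text, pos)
    if m is None:
        return 0, ""
    return NAME_TO_ID.get(m.lastgroup, 0), m.group()


def token_id_longest(text, pos=0):
    """Leftmost-longest emulation: keep the longest match among all tokens.

    Ties go to the earlier table entry (an assumption, see lead-in).
    """
    best_id, best_text = 0, ""
    for tid, _, pat in TOKENS:
        m = re.compile(pat).match(text, pos)
        if m and len(m.group()) > len(best_text):
            best_id, best_text = tid, m.group()
    return best_id, best_text


if __name__ == "__main__":
    print(token_id_ordered("iffy"))  # (1, 'if')
    print(token_id_longest("iffy"))  # (2, 'iffy')
    print(token_id_ordered("+"))     # (0, '')
```

On the input "iffy" the two rules disagree: ordered choice stops at the
keyword alternative and reports ID 1 for "if", while the leftmost-longest
emulation reports ID 2 for the whole identifier. The name-to-number lookup
after the match is the extra step that a built-in token ID would make
unnecessary.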
