Otto Moerbeek o...@drijf.net wrote:
> Referring to subpatterns is not available in flex. I suppose it is not
> available since it would require a more complex RE engine.
> Interpretation of the lexical value should be hand-crafted.
I also thought complexity could be the reason, but I have doubts. Why should
it be difficult to track the indices in yytext of the beginning and the end
of each matching subexpression, in two arrays of integers (one for
the beginning and one for the end)? Neither memory nor time seems to
be a problem. And hand-crafting means not only avoidable programming
work and unreadability, but also a second pass that adds complexity.
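To make the "two arrays of integers" idea concrete, here is a minimal
sketch using the POSIX regex(3) interface, which already reports the begin
and end offsets of every parenthesized subexpression. This is not a flex
feature; it only illustrates the bookkeeping I have in mind. The helper
name find_submatches and the MAX_GROUPS limit are my own, hypothetical
choices.

```c
#include <regex.h>	/* POSIX regcomp(), regexec(), regfree() */

#define MAX_GROUPS 8	/* illustrative limit, chosen for this sketch */

/*
 * Match `pattern` (POSIX extended syntax) against `text` and record the
 * offsets of each subexpression: begin[i] is the index of its first
 * character, end[i] the index just past its last character.
 * Index 0 describes the whole match; groups that did not take part in
 * the match are reported as -1 by regexec().
 * Returns 0 on a match and a nonzero value otherwise.
 */
int
find_submatches(const char *pattern, const char *text,
    int begin[MAX_GROUPS], int end[MAX_GROUPS])
{
	regex_t re;
	regmatch_t m[MAX_GROUPS];
	int i, rc;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;	/* pattern did not compile */

	rc = regexec(&re, text, MAX_GROUPS, m, 0);
	if (rc == 0) {
		/* Copy the offsets into the two plain integer arrays. */
		for (i = 0; i < MAX_GROUPS; i++) {
			begin[i] = (int)m[i].rm_so;
			end[i] = (int)m[i].rm_eo;
		}
	}
	regfree(&re);
	return rc;
}
```

With the pattern "([a-z]+)=([0-9]+)" and the text "width=42", the first
group comes back as begin 0, end 5 and the second as begin 6, end 8, so
both fields can be read without scanning the text a second time.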
A nice source on regexps is here: https://swtch.com/~rsc/regexp/
In the first article listed there you read:
While writing the text editor sam [6] in the early 1980s, Rob Pike wrote a
new regular expression implementation, which Dave Presotto extracted into
a library that appeared in the Eighth Edition. Pike's implementation
incorporated submatch tracking [sic!] into an efficient NFA simulation but,
like the rest of the Eighth Edition source, was not widely distributed.
Pike himself did not realize that his technique was anything new. Henry
Spencer reimplemented the Eighth Edition library interface from scratch,
but using backtracking, and released his implementation into the public
domain. It became very widely used, eventually serving as the basis for
the slow regular expression implementations mentioned earlier: Perl, PCRE,
Python, and so on. (In his defense, Spencer knew the routines could be
slow, and he didn't know that a more efficient algorithm existed. He
even warned in the documentation, "Many users have found the speed
perfectly adequate, although replacing the insides of egrep with this
code would be a mistake.") Pike's regular expression implementation,
extended to support Unicode, was made freely available with sam in late
1992, but the particularly efficient regular expression search algorithm
went unnoticed. The code is now available in many forms: as part of sam,
as Plan 9's regular expression library, or packaged separately for Unix.
Ville Laurikari independently discovered Pike's algorithm in 1999,
developing a theoretical foundation as well [2].
Note that OpenBSD's regex library seems to use the slow Spencer
implementation.
Rodrigo.