Hi all,
I would like to announce a new feature in PCRE2: running scripts during pattern
matching. Basically, this is an extension of the callout feature with string
arguments. Imagine being able to run PHP, JavaScript, or QML scripts inside a
regex.
In Perl, the regex /ab(?{ print "hello"; })/ matches the string "ab" and also
prints "hello". In PCRE2 you can do something similar with callouts: /ab(?C1)/.
However, a callout takes only a numeric argument in the range 0-255, which is
rather inconvenient in a scripting language, since you need an <id, function>
map.
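To make the <id, function> map concrete, here is a minimal sketch of a host-side callout handler in C. The struct callout_info is a hypothetical stand-in, loosely modelled on the callout_number, callout_string and callout_string_length fields of PCRE2's pcre2_callout_block, so that the sketch compiles without linking libpcre2. say_hello() and run_script() are hypothetical placeholders for the host's own action and script interpreter. A numeric callout such as (?C1) can only be resolved through a hand-maintained table, while a string callout carries its code inside the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the pcre2_callout_block fields used here,
   so the sketch compiles without libpcre2. */
struct callout_info {
    unsigned int callout_number;   /* 0-255 for (?Cn); 0 for string callouts */
    const char *callout_string;    /* NULL for numeric callouts */
    size_t callout_string_length;  /* length of callout_string */
};

typedef int (*callout_fn)(void);

static int hello_calls = 0;
static char last_script[256];

/* Hypothetical action bound to (?C1) in the hand-maintained map. */
static int say_hello(void)
{
    hello_calls++;
    puts("hello");
    return 0;
}

/* Numeric callouts: the <id, function> map. Renumbering a callout means
   editing this table AND every pattern that uses the old number. */
static const callout_fn numeric_table[256] = {
    [1] = say_hello,               /* (?C1) */
};

/* Hypothetical script hook: a real host would pass the code to its
   interpreter; this sketch only copies and prints it. */
static int run_script(const char *code, size_t len)
{
    if (len >= sizeof last_script)
        len = sizeof last_script - 1;  /* truncate to fit the buffer */
    memcpy(last_script, code, len);
    last_script[len] = '\0';
    printf("script:%s\n", last_script);
    return 0;
}

/* Returns 0 to let matching continue (PCRE2's convention for callouts). */
int handle_callout(const struct callout_info *cb)
{
    assert(cb != NULL);
    if (cb->callout_string != NULL)    /* string callout: code is in the pattern */
        return run_script(cb->callout_string, cb->callout_string_length);
    if (cb->callout_number < 256 && numeric_table[cb->callout_number] != NULL)
        return numeric_table[cb->callout_number]();
    return 0;                          /* unmapped number: ignore it */
}
```

In a real program the handler would be registered on a match context with pcre2_set_callout(), whose callback receives a pcre2_callout_block pointer and a user data pointer; the stand-in struct would then be replaced by that block.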
Maintenance is difficult, especially if an id needs to be changed, because all
patterns using it must then be updated manually. But this is over: from now on,
strings can be used instead of numbers. The PCRE2 form of the previous example
is the following:
/ab(?C` print "hello"; `)/
In this example the backtick (`) is used as the delimiter. However, there are
many scripting languages, and they assign different roles to different
characters, so a large set of delimiters is supported:
/ab(?C`code`)/
/ab(?C'code')/
/ab(?C"code")/
/ab(?C$code$)/
/ab(?C@code@)/
/ab(?C[code])/
/ab(?C{code})/
These patterns all represent the same regex, so use whichever delimiter is most
convenient for you. If you need the delimiter character inside the string, just
double it. For example, print("hello world") can be encoded as:
/ab(?C" print(""hello world""); ")/
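When patterns are generated programmatically, the doubling rule is easy to get wrong. Below is a minimal C sketch of a hypothetical helper, make_string_callout(), that wraps script text in a callout item and doubles each occurrence of the closing delimiter. It assumes that only the closing delimiter needs doubling (which matters for the [ ] and { } pairs), and its delimiter table simply mirrors the list in this post; check the documentation of your PCRE2 release for the exact set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Map an opening callout delimiter to its closing delimiter.
   The set mirrors the list in this post; returns 0 for an unknown opener. */
static char closing_delimiter(char open)
{
    switch (open) {
    case '`': case '\'': case '"': case '$': case '@':
        return open;               /* symmetric delimiters */
    case '[': return ']';
    case '{': return '}';
    default:  return 0;
    }
}

/* Build "(?C<open>script<close>)", doubling every closing delimiter that
   occurs inside the script. Returns a malloc'd string the caller must
   free, or NULL for an unknown delimiter or allocation failure. */
char *make_string_callout(const char *script, char open)
{
    char close = closing_delimiter(open);
    size_t len, extra = 0, i, j;
    char *out;

    assert(script != NULL);
    if (close == 0)
        return NULL;

    len = strlen(script);
    for (i = 0; i < len; i++)
        if (script[i] == close)
            extra++;

    /* "(?C" + open + script + doubled closers + close + ")" + NUL */
    out = malloc(len + extra + 7);
    if (out == NULL)
        return NULL;

    memcpy(out, "(?C", 3);
    j = 3;
    out[j++] = open;
    for (i = 0; i < len; i++) {
        out[j++] = script[i];
        if (script[i] == close)
            out[j++] = close;      /* double the delimiter */
    }
    out[j++] = close;
    out[j++] = ')';
    out[j] = '\0';
    return out;
}
```

For example, make_string_callout(" print(\"hello world\"); ", '"') produces the (?C" print(""hello world""); ") item used in the example above; the caller frees the returned string.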
Regarding performance, embedded scripts should not be used too frequently.
However, this approach is usually still faster than splitting a pattern into
multiple patterns and running them conditionally (based on a condition that the
regex engine cannot evaluate).
I hope everybody will like it, and that many scripting languages will add
support for this feature.
Regards,
Zoltan
--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev