Hi all,

I would like to announce a new feature in PCRE2: running scripts during pattern
matching. Basically, this extends the callout feature with string arguments.
Imagine being able to run PHP, JavaScript, or QML scripts inside a regex.

In Perl, the regex /ab(?{ print "hello"; })/ matches the string "ab" and also
prints hello. In PCRE2 you can do something similar with callouts: /ab(?C1)/.
However, a callout takes only a numeric argument in the range 0-255, which is
inconvenient in a scripting language because you need an <id, function> map.
Maintenance is difficult, especially if an id has to change (because you
need to update all patterns manually). This is no longer necessary: from now
on, strings can be used instead of numbers. The PCRE2 form of the previous
example is /ab(?C` print "hello"; `)/. This example uses ` as the delimiter.
However, there are many scripting languages, and they assign different roles
to different characters, so a large set of delimiters is available:

/ab(?C`code`)/
/ab(?C'code')/
/ab(?C"code")/
/ab(?C$code$)/
/ab(?C@code@)/
/ab(?C[code])/
/ab(?C{code})/

All of these patterns represent the same regex, so use whichever delimiter is
most convenient. If the script itself contains the delimiter character, just
double it: print("hello world") can be encoded as
/ab(?C" print(""hello world""); ")/
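
For anyone generating such patterns from code, the doubling rule above can be
sketched roughly as follows. This is only an illustration in Python:
encode_callout and decode_callout are hypothetical helper names, not part of
PCRE2, and the sketch assumes that for the [ ] and { } pairs it is the closing
character that gets doubled.

```python
# Hypothetical helpers, not part of PCRE2: they only build and parse the
# pattern text, they do not compile or run anything.

# Opening delimiter -> closing delimiter, taken from the list in this mail.
DELIMITERS = {"`": "`", "'": "'", '"': '"', "$": "$", "@": "@",
              "[": "]", "{": "}"}


def encode_callout(script, opener='"'):
    """Wrap script in (?C...) using the given opening delimiter."""
    closer = DELIMITERS[opener]
    # Assumption: only the closing delimiter has to be doubled.
    return "(?C" + opener + script.replace(closer, closer * 2) + closer + ")"


def decode_callout(item):
    """Recover the script from a single (?C...) item built by encode_callout."""
    if not item.startswith("(?C") or item[3:4] not in DELIMITERS:
        raise ValueError("not a string callout")
    closer = DELIMITERS[item[3]]
    out = []
    i = 4
    while i < len(item):
        ch = item[i]
        if ch != closer:
            out.append(ch)          # ordinary character
            i += 1
        elif item[i + 1:i + 2] == closer:
            out.append(closer)      # doubled delimiter stands for one
            i += 2
        else:
            break                   # single delimiter ends the string
    if item[i + 1:] != ")":
        raise ValueError("malformed string callout")
    return "".join(out)
```

With these helpers, "ab" + encode_callout(' print("hello world"); ') gives the
pattern text from the example above, and decode_callout returns the original
script again.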

Regarding performance, using these embedded scripts too frequently is not
recommended. However, an embedded script is usually still faster than
splitting a pattern into multiple patterns and running them conditionally
(based on a condition that the regex engine cannot evaluate).

I hope everybody will like it, and that many scripting languages will
implement support for this feature.

Regards,
Zoltan


-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
