On Mon, 27 Nov 2017, I wrote:

> I suppose one might consider providing a function similar to
> pcre2_callout_enumerate(), which enumerates the callouts in a compiled
> pattern. Something like pcre2_fixed_strings_enumerate() which would
> pass back the strings (it could bundle up runs of individual
> characters).

Thinking about this some more ... knowing the fixed strings is not good
enough. Consider a pattern such as ABC|\d\d\d, which can match lines that
do not contain ABC. An external trigram indexing scheme could only work
if the pattern has no wild cards and no verbs such as (*ACCEPT). It
would, of course, be possible to implement a pcre2_pattern_info() option
that gives TRUE only if the pattern contains nothing but literal
characters, vertical bars, non-lookaround parentheses, circumflex, and
dollar. I suppose quantifiers whose minimum is at least 1 could be
permitted in some cases. Also maybe back references.
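
To make the ABC|\d\d\d problem concrete, here is a minimal sketch. It
uses Python's re module as a stand-in for PCRE2 (an assumption made
only for illustration), and trigram_prefilter() is a hypothetical
helper that mimics an index built from the pattern's fixed strings.

```python
import re

# Python's re module stands in for PCRE2 here (illustrative assumption).
pattern = re.compile(r"ABC|\d\d\d")

lines = ["xxABCxx", "order 123 shipped", "nothing here"]


def trigram_prefilter(line):
    """Hypothetical index lookup: keep only lines containing the fixed string."""
    return "ABC" in line


# Lines the pattern really matches.
matches = [line for line in lines if pattern.search(line)]

# Lines the fixed-string prefilter would hand to the matcher.
candidates = [line for line in lines if trigram_prefilter(line)]

# Real matches that the prefilter would wrongly discard.
missed = [line for line in matches if line not in candidates]
print(missed)
```

The line matched only by the \d\d\d branch never reaches the matcher,
which is why a list of the pattern's fixed strings is not enough.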

Is all this going to be worth it?

What you really need (I think) is a function that doesn't just give a
list of strings in the pattern, but gives a list of strings, at least
one of which *must* be present in the subject for there to be a match.
That is something to think about.
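
A rough sketch of that idea, under heavy assumptions: the pattern is
already parsed into a toy tree of tuples ("lit", "any", "cat", "alt"),
which is a made-up representation and not PCRE2's compiled form, and
required_strings() is a hypothetical name. It returns a set of strings
of which at least one must occur in any matching subject, or None when
no such guarantee can be given.

```python
def required_strings(node):
    """Return a set of strings, one of which must appear in any match,
    or None if the toy pattern tree gives no such guarantee."""
    kind = node[0]
    if kind == "lit":
        # A non-empty literal must itself be present.
        return {node[1]} if node[1] else None
    if kind == "any":
        # A wild card (such as \d) constrains nothing.
        return None
    if kind == "alt":
        if not node[1]:
            raise ValueError("empty alternation")
        # Every branch must contribute, otherwise one branch can match
        # without any of the collected strings being present.
        result = set()
        for branch in node[1]:
            req = required_strings(branch)
            if req is None:
                return None
            result |= req
        return result
    if kind == "cat":
        # Every item in a sequence must match, so any one item's set is
        # a valid answer. Prefer longer strings, then fewer alternatives.
        found = [r for r in map(required_strings, node[1]) if r is not None]
        if not found:
            return None
        return max(found, key=lambda s: (min(len(x) for x in s), -len(s)))
    raise ValueError("unknown node kind: %r" % (kind,))


# ABC|\d\d\d : the second branch needs no fixed string at all.
p1 = ("alt", [("lit", "ABC"), ("cat", [("any",), ("any",), ("any",)])])

# (ABC|DEF)\dXY : either ABC or DEF must be present.
p2 = ("cat", [("alt", [("lit", "ABC"), ("lit", "DEF")]), ("any",), ("lit", "XY")])

print(required_strings(p1))
print(required_strings(p2))
```

This ignores quantifiers, lookarounds, back references and verbs such
as (*ACCEPT), all of which would need separate treatment.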

Some time ago I spent a bit of time playing with code that, given a 
compiled pattern, generates strings that match it. I had some success 
until I got to lookarounds, when I realized that I needed a whole new 
approach that included backtracking, and I haven't gone back to it. This
requirement of yours seems similar in some ways.

I'll think about it, but please do not hold your breath.

Philip

-- 
Philip Hazel
