On Mon, 27 Nov 2017, I wrote:

> I suppose one might consider providing a function similar to
> pcre2_callout_enumerate(), which enumerates the callouts in a compiled
> pattern. Something like pcre2_fixed_strings_enumerate() which would
> pass back the strings (it could bundle up runs of individual
> characters).

Thinking about this some more ... knowing the fixed strings is not good
enough. Consider a pattern such as ABC|\d\d\d which can match lines that
do not contain ABC. An external trigram indexing scheme could only work
if the pattern has no wild cards and no verbs such as (*ACCEPT).

It would, of course, be possible to implement a pcre2_pattern_info()
option that gives TRUE only if the pattern contains nothing but literal
characters, vertical bar, non-lookaround parentheses, circumflex, and
dollar. I suppose quantifiers whose minimum is 1 could be permitted in
some cases. Also maybe back references. Is all this going to be worth
it?

What you really need (I think) is a function that doesn't just give a
list of strings in the pattern, but gives a list of strings, at least
one of which *must* be present in the subject for there to be a match.
That is something to think about.

Some time ago I spent a bit of time playing with code that, given a
compiled pattern, generates strings that match it. I had some success
until I got to lookarounds, when I realized that I needed a whole new
approach that included backtracking, and I haven't gone back to it. This
requirement of yours seems similar in some ways. I'll think about it,
but please do not hold your breath.

Philip

--
Philip Hazel

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
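
The difference between "strings that appear in the pattern" and "strings,
at least one of which must be present in the subject" can be shown with a
rough sketch. It is written in Python, with Python's re module standing in
for PCRE2 (an assumption made for brevity; the two behave the same on this
pattern). The helper required_any() is hypothetical and is not part of
PCRE2 or of any proposed API. It deliberately handles only a top-level
alternation of pure literals and gives up on everything else.

```python
import re

# Simplifying assumption: any of these characters means the branch is
# not a plain literal, so no guarantee can be given for it.
META = set(".^$*+?{}[]()|\\")


def required_any(pattern):
    """Return a list of strings, at least one of which must occur in any
    subject that matches, or None when no such guarantee can be made.

    Hypothetical helper for illustration only.
    """
    # Naive split: an escaped or nested '|' leaves a metacharacter in a
    # branch, which makes the function give up (return None).
    branches = pattern.split("|")
    if any(not b or META & set(b) for b in branches):
        return None
    return branches


pattern = r"ABC|\d\d\d"
subject = "order 123"

# The pattern matches even though its only fixed string is absent.
assert re.search(pattern, subject) is not None
assert "ABC" not in subject

# So no set of required strings exists for it ...
assert required_any(pattern) is None
# ... whereas an alternation of literals does have one.
assert required_any("ABC|DEF") == ["ABC", "DEF"]
```

Returning None whenever a branch is not a pure literal is the conservative
choice: an index lookup built on the result can then only be skipped, never
wrongly used to reject a line that would have matched.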