https://bugs.exim.org/show_bug.cgi?id=2106
--- Comment #6 from Philip Hazel <p...@hermes.cam.ac.uk> ---
(In reply to Kyle J. McKay from comment #5)
> > REG_STARTEND is already there.
>
> Except that it's not *BSD compatible -- see bug #2128

Now mended.

> I might expect matching a fixed pattern like "abcabcz" against
> a string like "abcabcabcabcabcz" to not be handled all that
> efficiently by a naive strstr (or memmem) implementation, but
> I'd expect a pattern matching engine to do better.

Interestingly, I would expect the opposite. A regex engine has to worry about all the non-fixed stuff, whereas an engine specifically looking for fixed strings (even strstr) doesn't have to. It might be instructive to write a test program that compares timings for strstr() vs PCRE2 for some fixed strings.

If you are looking for caseful, shortish fixed strings in shortish subject strings, in an 8-bit world, have no binary zeroes in your strings, and are not too worried about performance, then I would have thought that strstr() would be fine. For more "serious" searches, one of the Boyer-Moore-type algorithms is best. I don't know if anyone has written a B-M searching library.

For caseless matching, of course, that doesn't apply. PCRE2 just tries each character one by one against both (all) of its cases.

> (With
> REG_UTF8 does PCRE perform virtual NFC canonicalization while
> matching so, for example, a decomposed e+accent matches the
> precomposed e+accent version? I'm thinking it probably does...)

No, I'm afraid it doesn't. It handles only individual characters, not compositions.

> In any case, a wrapper that wants to implement REG_NOSPEC
> can just kludge it up with calls to strstr/memmem or
> producing a malloc'd duplicate starting with \Q and escaped
> \E (which is \E\\E\Q BTW) replacements -- I don't see why
> the pattern translator can't do that itself though in order to
> provide a REG_NOSPEC option.

Not quite sure what you mean by "pattern translator"?
PCRE's regcomp() is just an API wrapper; it doesn't translate anything (except option bits). I suppose in theory it *could* translate as you suggest, though dealing with \E requires more than just a simple search: consider the pattern "A\\EB".

*If* anything were to be added to PCRE2 (and it would be PCRE2, as PCRE1 is feature-frozen), it might be better to add a PCRE2_LITERAL option to pcre2_compile() which REG_NOSPEC could activate. However, I'm not keen (as you can probably guess).

--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
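Philip points to Boyer-Moore-type algorithms for more "serious" fixed-string searches. As a rough sketch (not code from PCRE2 or any existing library), here is Horspool's simplified Boyer-Moore variant in C. The function name horspool_find is hypothetical. It takes explicit lengths, so unlike strstr() it copes with binary zeroes in either buffer.

```c
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search (a simplified Boyer-Moore variant).
   Returns the offset of the first occurrence of needle[0..nlen) in
   hay[0..hlen), or -1 if there is none. Lengths are explicit, so
   binary zeroes in either buffer are handled, unlike strstr(). */
ptrdiff_t horspool_find(const unsigned char *hay, size_t hlen,
                        const unsigned char *needle, size_t nlen)
{
    size_t shift[256];
    size_t i, pos;

    if (nlen == 0)
        return 0;
    if (nlen > hlen)
        return -1;

    /* Bad-character table: for each byte value, how far the window may
       slide when that byte sits under the window's last position. Bytes
       absent from needle[0..nlen-2] allow a full shift of nlen. */
    for (i = 0; i < 256; i++)
        shift[i] = nlen;
    for (i = 0; i + 1 < nlen; i++)
        shift[needle[i]] = nlen - 1 - i;

    for (pos = 0; pos <= hlen - nlen; pos += shift[hay[pos + nlen - 1]]) {
        if (memcmp(hay + pos, needle, nlen) == 0)
            return (ptrdiff_t)pos;
    }
    return -1;
}
```

For example, searching "abcabcabcabcabcz" for Kyle's "abcabcz" returns offset 9, after checking only four window positions. Horspool keeps just the bad-character table; full Boyer-Moore adds a good-suffix table as well.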
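The caseless approach Philip describes, trying each character against both of its cases, might look roughly like this for 8-bit, ASCII-style text. This is an illustrative sketch, not PCRE2's actual implementation; caseless_find is a hypothetical name, and tolower()/toupper() follow the current C locale (the "C" locale unless the program calls setlocale()).

```c
#include <ctype.h>
#include <stddef.h>

/* Naive caseless fixed-string search for 8-bit text. At every candidate
   position, each needle byte is compared against both of its case forms;
   there is no skip table. Returns the offset of the first match, or -1. */
ptrdiff_t caseless_find(const char *hay, size_t hlen,
                        const char *needle, size_t nlen)
{
    size_t pos, i;

    if (nlen > hlen)
        return -1;
    for (pos = 0; pos + nlen <= hlen; pos++) {
        for (i = 0; i < nlen; i++) {
            unsigned char p = (unsigned char)needle[i];
            unsigned char c = (unsigned char)hay[pos + i];
            /* Accept the subject byte if it equals either case of p. */
            if (c != tolower(p) && c != toupper(p))
                break;
        }
        if (i == nlen)
            return (ptrdiff_t)pos;
    }
    return -1;
}
```

For example, caseless_find("Hello World", 11, "WORLD", 5) returns 6.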
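To make the normalization point concrete: the precomposed and decomposed spellings of "é" are different code point sequences (U+00E9 versus U+0065 U+0301), so a matcher that compares individual characters without normalizing first cannot treat them as equal. The sketch below shows this at the byte level with strstr(); the names precomposed, decomposed and forms_match are illustrative only.

```c
#include <string.h>

/* "café" written two ways in UTF-8:
   U+00E9 (precomposed e-acute)             -> bytes C3 A9
   U+0065 U+0301 (e + combining acute)      -> bytes 65 CC 81 */
static const char precomposed[] = "caf\xC3\xA9";
static const char decomposed[]  = "cafe\xCC\x81";

/* A byte- or character-level matcher with no normalization step sees two
   different sequences, so neither form is found inside the other.
   Returns nonzero if either form contains the other. */
int forms_match(void)
{
    return strstr(precomposed, decomposed) != NULL ||
           strstr(decomposed, precomposed) != NULL;
}
```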
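Kyle's \Q...\E kludge for a REG_NOSPEC-style wrapper could be sketched as follows. quote_literal is a hypothetical helper, not part of PCRE or the POSIX API; it wraps a literal string (rather than translating an existing pattern) and relies on the rewrite Kyle gives, where each \E inside the literal becomes \E\\E\Q. Treat it as a sketch rather than a tested implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Build a pattern that should match lit[0..len) literally: wrap it in
   \Q...\E and rewrite every "\E" inside it as "\E\\E\Q" (end quoting,
   escaped backslash, literal E, resume quoting). Returns a malloc'd,
   NUL-terminated string that the caller frees, or NULL if malloc fails. */
char *quote_literal(const char *lit, size_t len)
{
    size_t i, n = 0, count = 0;
    char *out;

    /* Count non-overlapping "\E" pairs; each grows by 5 bytes (2 -> 7). */
    for (i = 0; i + 1 < len; i++) {
        if (lit[i] == '\\' && lit[i + 1] == 'E') {
            count++;
            i++;
        }
    }

    /* "\Q" + body + "\E" + NUL */
    out = malloc(len + 5 * count + 5);
    if (out == NULL)
        return NULL;

    out[n++] = '\\';
    out[n++] = 'Q';
    for (i = 0; i < len; i++) {
        if (lit[i] == '\\' && i + 1 < len && lit[i + 1] == 'E') {
            memcpy(out + n, "\\E\\\\E\\Q", 7); /* \E\\E\Q */
            n += 7;
            i++; /* skip the 'E' just consumed */
        } else {
            out[n++] = lit[i];
        }
    }
    out[n++] = '\\';
    out[n++] = 'E';
    out[n] = '\0';
    return out;
}
```

For example, the literal x\Ey becomes \Qx\E\\E\Qy\E. Because regcomp() takes a NUL-terminated pattern, a literal containing binary zeroes still cannot be passed through this route.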