https://bugs.exim.org/show_bug.cgi?id=2793
--- Comment #2 from Thomas Tempelmann <tempelm...@gmail.com> --- Philip, thank you for looking into this. A few thoughts: As I search case-insensitively, shouldn't the fast scan also have found the occurances of "e", i.e. treated it the same as "E"? And since "e" appears about as often in both files, this should then not lead to a performance difference, right? In fact, if I search for "eDL" instead, then both searches are fast. That's probably because then the fast scan is able to see each "e" and then perform a check on the following chars, and quickly determine that they do not match, and then continue looking for the next "e". But with the fast scan looking for "E", the case-sensitivity still somehow needs to be observed. The first "E" appears only halfway down the file, at around 5 MB. I wonder if, once it's found, the algorithm "realizes" that it skipped a lot of data without considering the case-insensitive, i.e. that it also needs to look for "e" within that skipped data. Indeed, if I add a few more "E" within the first 5MB of the file, search becomes faster again. Also, if I search for "xDL", with "x" not appearing in either file, I see that this fast scan having the desired effect: It gets about 5x faster than when searching for "eDL". So there seems to be something wrong with how the fast scan behaves with case-insensitivity. Skipping the fast scan with PCRE2_NO_START_OPTIMIZE is an option, but since the optimization helps SO much with what I need from PCRE, i.e. tell me simply whether a file contains the search string and nothing more, I'd like to keep this working. Also, I think it's important to note that with JIT, this performance issue does not surface. My guess is that the fast scan is not used with JIT because the JIT implements code that will do practically the same as the fast scan does, that is, look for the starting match (here: "e" or "E"), via a state machine? Whereas, without JIT, the search for the first state by using a table is a performance pit compared to simply scanning all bytes until you find the "E"? -- You are receiving this mail because: You are on the CC list for the bug. -- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev