https://bugs.exim.org/show_bug.cgi?id=2793

            Bug ID: 2793
           Summary: Case insensitive search gets exponentially slower with
                    larger buffers and a specific text file
           Product: PCRE
           Version: 10.37 (PCRE2)
          Hardware: x86-64
                OS: All
            Status: NEW
          Severity: bug
          Priority: medium
         Component: Code
          Assignee: philip.ha...@gmail.com
          Reporter: tempelm...@gmail.com
                CC: pcre-dev@exim.org

Created attachment 1395
  --> https://bugs.exim.org/attachment.cgi?id=1395&action=edit
main.c, 1.txt, 2.txt

I have two log files. In both, every line is 91 chars long and contains only ASCII chars. The first repeats the same line throughout. The other has "real" log lines, with ever-changing time codes. Both are about 10 MB in size.

When I search the first, it takes milliseconds, but searching the other takes many seconds, and that's clearly wrong.

If I double the file / buffer sizes, the time explodes (i.e. it does not simply double), and only with the second file.

Also, this only happens in non-JIT mode, and only when I choose the case-insensitive option.

And if I try the same with the bundled pcre2grep command, using the options "--buffer-size=32M --no-jit -i", it's not reproducible either. It only goes wrong with my own code.

Here's the code I use to read and search each file. It's as simple as it can get, I think.

    const char *find = "EDL";
    uint32_t regexOptions = PCRE2_CASELESS; // without this, it's fast as expected
    int errNum = 0;
    PCRE2_SIZE errOfs = 0;
    pcre2_code *regEx2 = pcre2_compile_8 ((PCRE2_SPTR)find, PCRE2_ZERO_TERMINATED, regexOptions, &errNum, &errOfs, NULL);
    pcre2_match_data *regEx2Match = pcre2_match_data_create_from_pattern (regEx2, NULL);

    // read from file
    size_t dataLen = 10 * 1024 * 1024; // 10 MB
    void *dataPtr = malloc (dataLen);
    int fd = open ("2.txt", O_RDONLY);
    dataLen = read (fd, dataPtr, dataLen); // actual number of bytes read

    // search the whole buffer once, without JIT
    pcre2_match_8 (regEx2, (PCRE2_SPTR8)dataPtr, dataLen, 0, 0, regEx2Match, NULL);

Attached is the complete "main.c" plus the two text files, zipped (it compressed quite well, to about 400 KB).
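
For anyone who wants to measure how the time grows with the buffer size, here is a minimal sketch of a harness, not the attached main.c. It loads a file while handling short reads and then times a search over prefixes of doubling length. The names read_fully, time_doubling, search_fn and count_byte are made up for this illustration; count_byte is only a stand-in so the harness can be tried without PCRE2. To reproduce the report, the assumption is that you pass a callback that wraps the pcre2_match_8 call shown above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback type: runs one search over buf[0..len). */
typedef int (*search_fn)(const unsigned char *buf, size_t len, void *ctx);

/* Read up to cap bytes from path into buf, looping over short reads.
   Returns the number of bytes read, or -1 on error. */
static ssize_t read_fully(const char *path, unsigned char *buf, size_t cap)
{
    assert(buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try again */
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                  /* end of file */
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time search() on prefixes of buf whose length doubles from start_len
   for as long as the prefix still fits in len. Prints one line per
   prefix and returns the number of measurements taken. */
static int time_doubling(search_fn search, void *ctx,
                         const unsigned char *buf, size_t len,
                         size_t start_len)
{
    int runs = 0;
    if (start_len == 0)
        return 0;
    for (size_t n = start_len; n <= len; n *= 2) {
        double t0 = now_seconds();
        int rc = search(buf, n, ctx);
        double dt = now_seconds() - t0;
        printf("%zu bytes: rc=%d, %.6f s\n", n, rc, dt);
        runs++;
        if (n > len / 2)
            break;                  /* next doubling would not fit */
    }
    return runs;
}

/* Stand-in search, NOT PCRE2: counts the bytes equal to the
   unsigned char that ctx points to. */
static int count_byte(const unsigned char *buf, size_t len, void *ctx)
{
    unsigned char needle = *(const unsigned char *)ctx;
    int hits = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == needle)
            hits++;
    return hits;
}
```

If the time per line roughly doubles each time the prefix doubles, the search is scaling linearly. A ratio that keeps growing from one line to the next would match the behaviour described for 2.txt.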