https://bugs.exim.org/show_bug.cgi?id=2642
Bug ID: 2642
Summary: Searching with PCRE2_MATCH_INVALID_UTF and PCRE2_CASELESS not working in binary files
Product: PCRE
Version: 10.35 (PCRE2)
Hardware: x86-64
OS: All
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: philip.ha...@gmail.com
Reporter: tempelm...@gmail.com
CC: pcre-dev@exim.org

Created attachment 1334
  --> https://bugs.exim.org/attachment.cgi?id=1334&action=edit
the binary file with the subject data

(See also my post on the developers mailing list titled "Getting crash when searching binary data with case-insensitive option")

PCRE2 currently seems unable to find plain ASCII text in binary files when the case-insensitive option is set.

I have attached a sample binary file that shows this. Searching it for the string "AWAVAUATSH", or any other case variation of it, fails when I use the PCRE2_CASELESS option. Without PCRE2_CASELESS, it works.

I see no logical reason why this shouldn't work. Adding the caseless option simply means that the search tree gets bigger, with more decision cases. And since the search works in plain text files, it should work just as well in files that contain invalid Unicode codes (i.e. files that are considered binary). The search pattern is still inside that file and should be found.

Here's the test code.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define PCRE2_CODE_UNIT_WIDTH 8
#include "pcre2.h"

int main(int argc, const char * argv[]) {
    const char *find = "AWAVAUATSH";
    uint32_t regexOptions = PCRE2_MATCH_INVALID_UTF | PCRE2_UTF | PCRE2_CASELESS;
    uint32_t matchOptions = PCRE2_NOTBOL | PCRE2_NOTEOL | PCRE2_NOTEMPTY;

    /* Compile the pattern; errNum and errOfs describe any compile error. */
    int errNum = 0;
    PCRE2_SIZE errOfs = 0;
    pcre2_code *regEx2 = pcre2_compile_8 ((PCRE2_SPTR)find, PCRE2_ZERO_TERMINATED, regexOptions, &errNum, &errOfs, NULL);
    if (regEx2 == NULL) {
        printf("Pattern failed to compile (error %d at offset %zu)\n", errNum, (size_t)errOfs);
        return 1;
    }
    pcre2_match_data *regEx2Match = pcre2_match_data_create_from_pattern (regEx2, NULL);

    /* Read the attached sample file into memory. */
    size_t bufLen = 32 * 1024 * 1024; // 32 MB, in case we test larger files
    void *bufPtr = malloc (bufLen);
    int fd = open ("pcre2_subject_sample", O_RDONLY);
    if (fd < 0) {
        printf("File not found! Please fix the path in the code.\n");
        return 1;
    }
    ssize_t actualLen = read (fd, bufPtr, bufLen);
    close (fd);
    if (actualLen < 0) {
        printf("Reading the file failed.\n");
        return 1;
    }

    /* Search the whole buffer, starting at offset 0. */
    int ok = pcre2_match_8 (regEx2, (PCRE2_SPTR8)bufPtr, (PCRE2_SIZE)actualLen, 0, matchOptions, regEx2Match, NULL);
    if (ok > 0) {
        printf("Pattern found\n");
    } else {
        printf("Pattern NOT found\n");
    }
    return 0;
}
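
For comparison, here is a minimal sketch of the result I would expect, written without PCRE2 at all. It is only an illustration of the expectation stated above, not a description of how PCRE2 matches internally. The helper names fold_ascii and find_ascii_caseless are hypothetical and not part of any library. The sketch assumes the pattern is pure ASCII and compares byte by byte, so bytes that are not valid UTF-8 are simply treated as non-matching data.

```c
#include <assert.h>
#include <stddef.h>

/* Fold ASCII upper-case letters to lower case; every other byte,
 * including bytes >= 0x80, is returned unchanged. */
unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical helper, not a PCRE2 function.
 * Returns the offset of the first ASCII-caseless occurrence of
 * needle in hay, or -1 if there is none. The haystack may contain
 * arbitrary binary data. */
long find_ascii_caseless(const unsigned char *hay, size_t hay_len,
                         const char *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    if (needle_len == 0 || needle_len > hay_len)
        return -1;

    /* Naive scan: try every start offset where the needle still fits. */
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        size_t j = 0;
        while (j < needle_len &&
               fold_ascii(hay[i + j]) == fold_ascii((unsigned char)needle[j]))
            j++;
        if (j == needle_len)
            return (long)i;
    }
    return -1;
}
```

Calling find_ascii_caseless(bufPtr, actualLen, "AWAVAUATSH", 10) on the buffer from the test code should return a non-negative offset if the string occurs in the file in any case variation, which is the behaviour I would expect from the caseless PCRE2 search as well.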