Hi everyone,

I hope I can get some help here with using PCRE2.

I'm writing a search program, similar to grep, but with a GUI and using multiple threads (it's Find Any File for macOS). I had been using Apple's own regex lib, which works reliably (no crashes), but it's too limited in its abilities, so I wanted to switch to PCRE2.

I now run into a reproducible crash, though, and need help resolving it.

My goal is to be able to search for UTF-8 text inside binary data. Maybe I'm doing this wrong. My code for this currently looks like this:

    uint32_t regexOptions = PCRE2_UTF | PCRE2_NO_UTF_CHECK | PCRE2_CASELESS;
    uint32_t matchOptions = PCRE2_NOTBOL | PCRE2_NOTEOL | PCRE2_NOTEMPTY | PCRE2_NO_UTF_CHECK;
    int errNum = 0;
    PCRE2_SIZE errOfs = 0;
    pcre2_code *regEx2 = pcre2_compile_8 ((PCRE2_SPTR)find, PCRE2_ZERO_TERMINATED, regexOptions, &errNum, &errOfs, NULL);
    pcre2_match_data *regEx2Match = pcre2_match_data_create_from_pattern (regEx2, NULL);
    pcre2_match_8 (regEx2, (PCRE2_SPTR8)dataPtr, dataLen, 0, matchOptions, regEx2Match, NULL);

Without the PCRE2_NO_UTF_CHECK option, it seems it won't find anything in binary files.

I also add PCRE2_CASELESS to be able to find text case-insensitively, but that's what leads to the crash.

For instance, if I search my local "locate.database" for "NSURLVolumeNameKey", I get a crash in the "match" function:

    const char *find = "NSURLVolumeNameKey";
    size_t dataLen = 32 * 1024 * 1024; // 32 MB
    void *dataPtr = malloc (dataLen);
    int fd = open ("/var/db/locate.database", O_RDONLY);
    dataLen = read (fd, dataPtr, dataLen);

If I remove either the PCRE2_NO_UTF_CHECK or the PCRE2_CASELESS option, I get no crash. Also, when shortening the search string, I get no crash.
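
One possible explanation, offered as a sketch and not as a confirmed diagnosis: the PCRE2 documentation says that passing an invalid UTF string together with PCRE2_NO_UTF_CHECK gives undefined behaviour, and a binary file like locate.database is very unlikely to be valid UTF-8 throughout. A decoder that trusts the lead byte can then produce a "code point" above 0x10FFFF, and a caseless comparison has to look that value up in the Unicode property tables. The C below illustrates the idea only. Both function names are hypothetical, and decode4_unchecked is a simplified stand-in, not PCRE2's actual decoding code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the n bytes at s are well-formed UTF-8
   (no stray continuation bytes, no overlong forms, no surrogates,
   nothing above U+10FFFF), otherwise 0. */
int utf8_is_valid(const uint8_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n) {
        uint8_t b = s[i];
        size_t len;
        uint32_t c, min;
        if (b < 0x80) { i++; continue; }                 /* ASCII */
        else if ((b & 0xE0) == 0xC0) { len = 2; c = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; c = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; c = b & 0x07; min = 0x10000; }
        else return 0;                                   /* invalid lead byte */
        if (n - i < len) return 0;                       /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;     /* not a continuation */
            c = (c << 6) | (uint32_t)(s[i + k] & 0x3F);
        }
        if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
        i += len;
    }
    return 1;
}

/* Hypothetical stand-in for a non-validating decoder: assumes p points at a
   4-byte sequence and simply combines the payload bits without any checks. */
uint32_t decode4_unchecked(const uint8_t *p)
{
    return ((uint32_t)(p[0] & 0x07) << 18) |
           ((uint32_t)(p[1] & 0x3F) << 12) |
           ((uint32_t)(p[2] & 0x3F) << 6)  |
            (uint32_t)(p[3] & 0x3F);
}
```

For the bytes F7 BF BF BF, utf8_is_valid returns 0, while decode4_unchecked yields 0x1FFFFF, which is beyond the last Unicode code point. If the search really must run over arbitrary bytes, two things may be worth trying: compiling without PCRE2_UTF, or using the PCRE2_MATCH_INVALID_UTF compile option that recent PCRE2 releases document instead of PCRE2_NO_UTF_CHECK. Whether either fits this use case would need to be verified against the 10.35 documentation.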

Here are some details on the crash as shown by Xcode:

        0x10001bbae <+37742>: leaq   0x1221b(%rip), %rdi       ; _pcre2_ucd_stage1_8
        0x10001bbb5 <+37749>: movzwl (%rdi,%rax,2), %eax
        0x10001bbb9 <+37753>: shlq   $0x7, %rax
        0x10001bbbd <+37757>: movl   %ecx, %edi
        0x10001bbbf <+37759>: subl   %edx, %edi
        0x10001bbc1 <+37761>: movslq %edi, %rdx
        0x10001bbc4 <+37764>: addq   %rax, %rdx
        0x10001bbc7 <+37767>: leaq   0x16602(%rip), %rax       ; _pcre2_ucd_stage2_8
    ->  0x10001bbce <+37774>: movzwl (%rax,%rdx,2), %edx
        0x10001bbd2 <+37778>: leaq   0x75d7(%rip), %rax

I get the msg:

    Thread 1: EXC_BAD_ACCESS (code=1, address=0x1004fe2b6)

Registers:

    Exception State Registers:
      trapno      unsigned int   0x00000003
      err         unsigned int   0x00000000
      faultvaddr  unsigned long  0x00007fff95466230

    General Purpose Registers:
      rax     unsigned long  0x00000001000321d0
      rbx     unsigned long  0x00007ffeefbfa76e
      rcx     unsigned long  0x0000000000e97673
      rdx     unsigned long  0x0000000000266073
      rdi     unsigned long  0x0000000000000073
      rsi     unsigned long  0x0000000000000009
      rbp     unsigned long  0x00007ffeefbfa450
      rsp     unsigned long  0x00007ffeefbfa330
      r8      unsigned long  0x0000000000000000
      r9      unsigned long  0x0000000000000080
      r10     unsigned long  0x00007ffeefbfa530
      r11     unsigned long  0x00000001006815cd
      r12     unsigned long  0x0000000000000000
      r13     unsigned long  0x00007ffeefbfa780
      r14     unsigned long  0x00000001006815cb
      r15     unsigned long  0x0000000000000010
      rip     unsigned long  0x000000010001bbce
      rflags  unsigned long  0x0000000000000202
      cs      unsigned long  0x000000000000002b
      fs      unsigned long  0x0000000000000000
      gs      unsigned long  0x0000000000000000
      eax     unsigned int   0x000321d0
      ebx     unsigned int   0xefbfa76e
      ecx     unsigned int   0x00e97673
      edx     unsigned int   0x00266073
      edi     unsigned int   0x00000073
      esi     unsigned int   0x00000009
      ebp     unsigned int   0xefbfa450
      esp     unsigned int   0xefbfa330
      r8d     unsigned int   0x00000000
      r9d     unsigned int   0x00000080
      r10d    unsigned int   0xefbfa530
      r11d    unsigned int   0x006815cd
      r12d    unsigned int   0x00000000
      r13d    unsigned int   0xefbfa780
      r14d    unsigned int   0x006815cb
      r15d    unsigned int   0x00000010

I am using libpcre2-8.a on macOS 10.13.6. The config.log from my build shows:

    > It was created by PCRE2 configure 10.35, which was
    > generated by GNU Autoconf 2.69. Invocation command line was
    > $ ./configure --disable-shared --enable-silent-rules CFLAGS=-O2 -mmacosx-version-min=10.11

You can download my locate.database here, along with the Xcode project I use for testing this (also includes the built libpcre2):

    https://files.tempel.org/tmp/PCRE2_Binary_Search.zip (3.6 MB)

-- 
Thomas Tempelmann, http://apps.tempel.org/
Follow me on Twitter: https://twitter.com/tempelorg
Read my programming blog: http://blog.tempel.org/