Hi,

I think 10.42 has PCRE2_MATCH_INVALID_UTF. Have you tried it?

Regards,
Zoltan

-------- Original message --------
From: Thomas Tempelmann via Pcre-dev <pcre-dev@exim.org>
Date: 10 January 2023, 23:34:43
Subject: [pcre-dev] Crashes in pcre2_match_16 with binary data
To: pcre-dev@exim.org

In 2020 I had asked for help with searching for UTF-8 text in binary data
(i.e. any files on a disk). I got some advice and all worked well.

Now I expanded my code to also search for UTF-16 text in the same data. So I
built the lib with both the 8 and 16 bit functions, and create separate
pcre_code, match_data, context etc. using the _8 and _16 suffixes instead of
the default macros.

And it all works fine: it finds both UTF-8 and UTF-16 strings in files.

However, I sometimes get crashes in the pcre2_match_16() function, whereas I
never get them in the _8 function. It happens both with JIT and without. I
also use the same options for both versions, of course. This is with PCRE2
v10.42.

I also ruled out a mix-up between the _8 and _16 structs by only using the
_16 code, and I don't use concurrent threads.

The crashes are a bit random, i.e. certain files crash often but not always.
But within 5 seconds of scanning random files on my disk, I always get a
crash.

Since I use a prebuilt lib, I cannot easily look at the source code where it
crashes. I wonder if there are command-line tools I can use for testing in
order to rule out a mistake on my end. But it seems that pcre2grep does not
support UTF-16 search, right? Or do I have to build the tool with special
options first?

-- 
Thomas Tempelmann, http://apps.tempel.org/

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
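
One caller-side thing that may be worth ruling out when feeding raw file bytes to the 16-bit library (an assumption on my part, nothing in this thread confirms it is the cause): pcre2_match_16() takes its subject length in 16-bit code units, not bytes, and the subject pointer should be suitably aligned for 16-bit access. The sketch below uses a hypothetical helper, make_utf16_subject(), which is not part of PCRE2. It copies the bytes into a malloc'ed (hence aligned) buffer and reports the length in code units. The data is interpreted in the machine's native byte order, and a trailing odd byte is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type, not part of PCRE2. */
typedef struct {
    uint16_t *units;  /* malloc'ed copy of the data; the caller frees it */
    size_t    count;  /* length in 16-bit code units, NOT in bytes */
} utf16_subject;

/* Copy nbytes of raw data into a freshly allocated buffer that is
 * guaranteed to be aligned for uint16_t, and compute the length in
 * code units. A trailing odd byte cannot form a code unit and is dropped.
 * Returns 0 on success, -1 if the allocation fails. */
static int make_utf16_subject(const unsigned char *bytes, size_t nbytes,
                              utf16_subject *out)
{
    assert(out != NULL);

    out->count = nbytes / 2;

    /* Allocate at least one unit so the pointer is valid for empty input. */
    size_t alloc_units = out->count ? out->count : 1;
    out->units = malloc(alloc_units * sizeof(uint16_t));
    if (out->units == NULL) {
        out->count = 0;
        return -1;
    }

    /* memcpy keeps the byte order as found in the file. */
    if (out->count > 0)
        memcpy(out->units, bytes, out->count * sizeof(uint16_t));

    return 0;
}
```

With this, the call would pass (PCRE2_SPTR16)s.units as the subject and s.count as the length to pcre2_match_16(), and free(s.units) afterwards. Copying is the simple choice here; for large files one would rather check alignment and length at the call site than copy every buffer.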
