[pcre-dev] Question regarding matching invalid unicode

Kilian Kilger via Pcre-dev Fri, 14 Feb 2020 05:29:41 -0800

Dear PCRE2 developers,

we are trying to use PCRE2 to match UCS-2 encoded text, i.e. UTF-16
without any check for "broken" surrogates or any other invalid Unicode.
In UCS-2 encoding every character is 2 bytes and every 2-byte sequence
is accepted as a valid character.
Nevertheless, we want Unicode character properties to be considered when
the code point at the corresponding position is valid UTF-16.
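
To illustrate what a "broken" surrogate means here, the following is a
minimal sketch in C. The helper names are hypothetical and not part of
PCRE2; the code only shows which 16-bit sequences strict UTF-16 would
reject, while UCS-2 handling as described above would accept them as is.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of the PCRE2 API. */

/* High (leading) surrogates occupy U+D800..U+DBFF. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* Low (trailing) surrogates occupy U+DC00..U+DFFF. */
static int is_low_surrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Returns the index of the first unpaired ("broken") surrogate in buf,
   or len if the buffer is well-formed UTF-16. */
static size_t first_broken_surrogate(const uint16_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        if (is_high_surrogate(buf[i])) {
            /* A high surrogate must be followed by a low surrogate. */
            if (i + 1 < len && is_low_surrogate(buf[i + 1])) {
                i += 2;
                continue;
            }
            return i;
        }
        /* A low surrogate that is not preceded by a high one is broken. */
        if (is_low_surrogate(buf[i]))
            return i;
        i++;
    }
    return len;
}
```

A buffer such as { 0x0041, 0xD800, 0x0042 } is broken at index 1 under
this check, yet it is exactly the kind of input that should still be
matched code unit by code unit.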


We noticed that we seem to get this behaviour when we set the following:

PCRE2_UCP | PCRE2_NEVER_UTF | PCRE2_NO_UTF_CHECK

but we *do not set*

PCRE2_UTF

Is this correct? Does this have negative implications or undefined behaviour?

Best regards,
Kilian.

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
