Re: [pcre-dev] Question regarding matching invalid unicode

2020-02-14 Thread ph10
On Fri, 14 Feb 2020, Kilian Kilger via Pcre-dev wrote:

> we try to use PCRE2 to match UCS-2 encoding, i.e. UTF-16 without any
> check for "broken" surrogates or any other invalid unicode. In UCS-2
> encoding every character is 2 bytes and every 2-byte sequence is
> accepted as a valid character.
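
The distinction drawn in the question can be illustrated with a small, library-independent sketch. It is not taken from the thread and does not use any PCRE2 API. The helper name `first_unpaired_surrogate` is hypothetical. It scans a buffer of 16-bit code units and reports the first "broken" surrogate, i.e. the kind of unit that a strict UTF-16 check would reject but that the UCS-2 view described above treats as an ordinary character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE2.
 * Returns the index of the first unpaired surrogate in a buffer of
 * 16-bit code units, or -1 if the buffer is well-formed UTF-16.
 * Under the UCS-2 interpretation every code unit is a character,
 * so no input would be rejected there. */
static long first_unpaired_surrogate(const uint16_t *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint16_t u = s[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 < len && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                i++;              /* valid pair, skip the low half */
            } else {
                return (long)i;   /* high surrogate without a low one */
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return (long)i;       /* low surrogate without a preceding high one */
        }
    }
    return -1;
}
```

For example, the units 0x0041, 0xD83D, 0xDE00, 0x0042 contain a valid surrogate pair and yield -1, whereas 0x0041, 0xD83D, 0x0042 contains a lone high surrogate and yields index 1.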

[pcre-dev] Question regarding matching invalid unicode

2020-02-14 Thread Kilian Kilger via Pcre-dev
Dear PCRE2 developers,

we try to use PCRE2 to match UCS-2 encoding, i.e. UTF-16 without any check for "broken" surrogates or any other invalid unicode. In UCS-2 encoding every character is 2 bytes and every 2-byte sequence is accepted as a valid character. Nevertheless we want unicode char