>"Wow, wow... stop it right there" is good advice. It may seem "impolite"
>advice, especially given that the masking feature was introduced, with good
>intentions, by the same person who did the hard work of implementing 32-bit
>support,...
 
I did not mean to be impolite, and I apologize for the language if anybody was
hurt.  In particular, I did not mean to hurt the person who has done this great
job.
 
>Is there any plan to give the new data format a name, such as "UTF-21", 
...
>Currently, PCRE sets a horrible precedent for a protocol. PCRE_NO_UTF32_CHECK 
>has two meanings at the same time: "don't check the input, since we already 
>know it's valid UTF-32" and "mask the input, since it's not UTF-32, it's 
>really UTF-21". 
>At least this needs to be fixed before the next version of PCRE is released.
 
If we define UTF-21 as a 32-bit data type in which the 11 high-order bits are 
masked off, then instead of overloading PCRE_NO_UTF32_CHECK we could define 
something like PCRE_UTF21 as a run-time option for the 32-bit library that 
forces it to mask the input and work as UTF-21.  Another, and maybe better, 
option is to create a PCRE21 library, in addition to the PCRE32 library, as an 
exact copy of PCRE32 with PCRE_UTF21 turned on.  Obviously, in that case 
PCRE_UTF21 should not be generally available, although we may not really be 
able to prevent its use.  PCRE_NO_UTF32_CHECK would then have only one, 
performance-related meaning: "don't check the input, since we already know 
it's valid UTF-32".  That way we have an easy way to implement UTF-21 in PCRE, 
and UTF-32 and UTF-21 would be distinct implementations.  We could put the 
UTF-21 library in the contrib branch as something that is available and meant 
for specialized use.
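
To make the distinction concrete, here is a rough sketch in C of what "mask" 
versus "check" would mean for a single 32-bit unit.  This is only an 
illustration of the proposal, not PCRE code: the names UTF21_CODEPOINT_MASK, 
utf21_codepoint and utf32_is_valid are made up for this example, and it assumes 
that the low 21 bits carry the code point while the 11 high-order bits are 
free for the application.

```c
#include <stdint.h>

/* Hypothetical name: keeps the low 21 bits, clears the 11 high-order bits. */
#define UTF21_CODEPOINT_MASK UINT32_C(0x001FFFFF)

/* "UTF-21" view of a 32-bit unit: ignore whatever is stored in the high bits.
   Note that 21 bits can still hold values above 0x10FFFF, so masking alone
   does not make a unit a valid code point. */
static inline uint32_t utf21_codepoint(uint32_t unit)
{
    return unit & UTF21_CODEPOINT_MASK;
}

/* Strict UTF-32 view: the whole 32-bit unit must be a Unicode scalar value,
   i.e. at most 0x10FFFF and not a surrogate (0xD800..0xDFFF). */
static inline int utf32_is_valid(uint32_t unit)
{
    return unit <= UINT32_C(0x10FFFF) &&
           (unit < UINT32_C(0xD800) || unit > UINT32_C(0xDFFF));
}
```

With these definitions a unit such as 0xFFE00041 is not valid UTF-32, but as 
UTF-21 it is simply 'A' (0x41) with application data in the high bits.  That is 
why "skip the check" and "mask the input" are two different requests and 
should not share one option.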
 

Ze'ev Atlas
-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
