--- In [email protected], "silvermoonwoman2001" <sheri...@...> wrote:
>
> The unicode plugin's character functions (such as length) apparently are 
> dividing the number of UTF16-based bytes by 2 to get the length, which is 
> true only for the Basic Multilingual Plane. Regex/utf8 works fine tho.

Yes.  I use wcslen, which returns the string length in WCHARs (UTF-16 code units), not in characters, so any character outside the BMP counts as 2.
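To show the difference between what wcslen counts and a true character count, here is a minimal sketch in C. It assumes a zero-terminated array of UTF-16 code units and uses uint16_t instead of WCHAR so it compiles outside Windows. The function names are hypothetical; they are not part of the plugin or of any Microsoft API.

```c
#include <stddef.h>
#include <stdint.h>

/* Count UTF-16 code units up to the terminating 0. This is what wcslen
   reports when wchar_t is 16 bits wide, as it is on Windows. */
size_t utf16_units(const uint16_t *s) {
    size_t n = 0;
    while (s[n] != 0) n++;
    return n;
}

/* Count Unicode code points. A high surrogate (0xD800-0xDBFF) followed by
   a low surrogate (0xDC00-0xDFFF) encodes one code point outside the BMP.
   Unpaired surrogates are counted as one code point each. */
size_t utf16_codepoints(const uint16_t *s) {
    size_t count = 0;
    for (size_t i = 0; s[i] != 0; i++) {
        /* s[i] is nonzero here, so reading s[i + 1] stays in bounds
           (at worst it is the terminator). */
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i++; /* skip the low surrogate of the pair */
        }
        count++;
    }
    return count;
}
```

For example, "A😀B" (U+0041, U+1F600, U+0042) is stored as {0x0041, 0xD83D, 0xDE00, 0x0042, 0}; utf16_units returns 4 while utf16_codepoints returns 3. Note that this counts code points, not user-perceived characters: a combining character sequence still counts as several.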

As I've already said, many unicode services only work in the BMP, because I use 
standard Microsoft WCHAR services.  Comparisons won't work right, nor will any 
service that relies on finding a position in a string (find, index, slice, 
etc).  I'll update the documentation to say so when I get a chance.  Sometime 
much later I'll try to build a new unicode plugin that piggybacks on someone's 
existing code.  No way I'm gonna try to reinvent the wheels of surrogate pair 
detection, case folding, combining character sequences, etc.
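Position-based services (find, index, slice) break for the same reason: a character index has to be translated into a WCHAR offset, and a naive WCHAR index can land between the two halves of a surrogate pair. A minimal sketch of that translation in C, under the same assumptions as before (zero-terminated UTF-16 in uint16_t, hypothetical function name, not the plugin's code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the UTF-16 code-unit offset at which the
   code point numbered cp_index begins. If cp_index equals the number of
   code points, return the offset of the terminator (useful as a slice end).
   If the string is shorter than that, return (size_t)-1. */
size_t utf16_offset_of_codepoint(const uint16_t *s, size_t cp_index) {
    size_t i = 0;  /* current code-unit offset */
    size_t cp = 0; /* current code-point index */
    while (s[i] != 0) {
        if (cp == cp_index) return i;
        /* A valid surrogate pair occupies two code units. */
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            i += 2;
        } else {
            i += 1;
        }
        cp++;
    }
    return cp == cp_index ? i : (size_t)-1;
}
```

With "A😀B" stored as {0x0041, 0xD83D, 0xDE00, 0x0042, 0}, code point 2 ("B") starts at code-unit offset 3, not 2; using offset 2 directly would point at the low surrogate of the emoji.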
