[power-pro] Re: Unicode: multibyte

entropyreduction Sun, 23 Aug 2009 11:30:49 -0700

--- In [email protected], "silvermoonwoman2001" <sheri...@...> wrote:
>
> The unicode plugin's character functions (such as length) apparently are 
> dividing the number of UTF16-based bytes by 2 to get the length, which is 
> true only for the Basic Multilingual Plane. Regex/utf8 works fine tho.


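Yes.  I use wcslen, which returns the string length in WCHARs (16-bit UTF-16 
code units), not in characters, so a character outside the BMP counts as two.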
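To make the difference concrete, here is a minimal sketch that counts the same 
string two ways: by 16-bit code units (what wcslen reports when WCHAR is 16 bits, 
as on Windows) and by code points, treating a high+low surrogate pair as one. 
It uses uint16_t rather than wchar_t because wchar_t is 32 bits on some other 
platforms. The function names are hypothetical, not part of the plugin or any 
Windows API.

```c
#include <stddef.h>
#include <stdint.h>

/* Surrogate ranges defined by UTF-16. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: number of 16-bit code units before the terminating 0,
   i.e. what wcslen reports when WCHAR is 16 bits wide. */
size_t utf16_unit_count(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: number of code points. A valid high+low surrogate
   pair counts once; an unpaired surrogate is counted as a single unit. */
size_t utf16_codepoint_count(const uint16_t *s)
{
    size_t n = 0;
    for (size_t i = 0; s[i] != 0; i++) {
        /* s[i] is nonzero here, so s[i + 1] is at worst the terminator. */
        if (is_high_surrogate(s[i]) && is_low_surrogate(s[i + 1]))
            i++; /* skip the low half of the pair */
        n++;
    }
    return n;
}
```

For "A", U+1F600, "B" stored as {0x0041, 0xD83D, 0xDE00, 0x0042, 0}, the unit 
count is 4 while the code-point count is 3; halving a UTF-16 byte count gives 
the former.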

As I've already said, many unicode services only work correctly within the BMP, 
because I use standard Microsoft WCHAR services.  Comparisons won't work right, 
nor will any service that relies on finding a position in a string (find, index, 
slice, etc).  I'll update the documentation to say so when I get a chance.  Some 
time much later I'll try to build a new unicode plugin that piggybacks on 
someone's existing code.  No way I'm gonna try to reinvent the wheel for 
surrogate pair detection, case folding, combining character sequences, etc.
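The position problem can be sketched the same way. A BMP-only find, index or 
slice treats a code-unit offset as a character index, and the two drift apart 
as soon as a surrogate pair precedes the position. Below is a minimal sketch, 
again assuming 16-bit code units, that maps a code-point index to its code-unit 
offset and decodes the code point found there. The function names are 
hypothetical illustrations, not the plugin's API.

```c
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: code-unit offset at which code point number cp_index
   (0-based) starts. cp_index equal to the code-point count yields the offset
   of the terminator (useful as a slice end); anything larger returns
   (size_t)-1. */
size_t utf16_offset_of_codepoint(const uint16_t *s, size_t cp_index)
{
    size_t i = 0;
    for (size_t cp = 0; cp < cp_index; cp++) {
        if (s[i] == 0)
            return (size_t)-1; /* ran off the end of the string */
        if (is_high_surrogate(s[i]) && is_low_surrogate(s[i + 1]))
            i += 2; /* a surrogate pair is one code point, two units */
        else
            i += 1;
    }
    return i;
}

/* Hypothetical helper: decode the code point starting at code-unit offset i,
   combining a surrogate pair into a supplementary code point. */
uint32_t utf16_decode_at(const uint16_t *s, size_t i)
{
    if (is_high_surrogate(s[i]) && is_low_surrogate(s[i + 1]))
        return 0x10000u + (((uint32_t)s[i] - 0xD800u) << 10)
                        + ((uint32_t)s[i + 1] - 0xDC00u);
    return s[i]; /* BMP character (or an unpaired surrogate) */
}
```

In {0x0041, 0xD83D, 0xDE00, 0x0042, 0}, the third character ("B", code-point 
index 2) starts at code-unit offset 3, not 2, which is exactly where a 
code-unit-based index goes wrong.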
