[power-pro] Re: Unicode: multibyte

silvermoonwoman2001 Mon, 24 Aug 2009 08:23:03 -0700

--- In [email protected], "entropyreduction" 
<alancampbelllists+ya...@...> wrote:


> Anyway, if W2K gets some things wrong in other parts of the api
> besides counting characters, all the more reason to switch to a
> third party lib that's kept up to date. Eventually. 

It seems to me that surrogate pairs are a rarity and a novelty, and special 
support for them in the unicode plugin isn't necessary. The plugin shouldn't 
fail to read them if they turn up in a file, nor fail to accept them if 
present, e.g., in a string from_utf8. Just mention in the docs that character 
counts generated by unicode.services will be overstated in this rare 
circumstance (each surrogate pair counts as two characters), and that 
arbitrary slicing should then be avoided. You could include (in the docs) a 
script that uses a regex to check whether a Unicode string contains any high 
code points:

if (regex.pcrematch(?"[^\x{0001}-\x{FFFF}]", h_ustring, "utf8")==0) do
  ;unicode.services are ok
else
  ;avoid character based unicode.services
endif
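
For anyone who wants to see the overcount outside PowerPro, here is a rough Python sketch of the same check. It is only an illustration, not PowerPro script and not a claim about how the plugin counts internally; the function names (has_high_code_points, utf16_unit_count) are made up for the example, and it assumes the overstated count comes from counting UTF-16 code units.

```python
import re

# Any code point above U+FFFF needs a surrogate pair in UTF-16.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def has_high_code_points(text: str) -> bool:
    """Return True if text contains a code point outside the BMP."""
    return _NON_BMP.search(text) is not None


def utf16_unit_count(text: str) -> int:
    """Count UTF-16 code units (2 bytes each), the assumed source of the overcount."""
    return len(text.encode("utf-16-le")) // 2


# Hypothetical sample: 8 BMP characters followed by U+1D11E (musical G clef).
sample = "G clef: \U0001D11E"

if has_high_code_points(sample):
    # Code-unit count exceeds the code-point count by one per surrogate pair.
    print("avoid character-based slicing:",
          utf16_unit_count(sample), "units vs", len(sample), "code points")
else:
    print("character counts are reliable")
```

The gap between the two counts is exactly the number of surrogate pairs, which is why slicing at an arbitrary code-unit offset can split a pair.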

Regards,
Sheri
