RE: Non-ascii string processing?

jon Mon, 06 Oct 2003 11:57:43 -0700

> But I still don't see any use in knowing how many characters are in an UTF-8
> string, apart the use that I already mentioned: allocating a buffer for a
> UTF-8 to UTF-32 conversion.


I wouldn't use it for that at all. I'd either assume a worst case of one 32-bit word 
of UTF-32 per octet of UTF-8, or else stream the output and so avoid allocating a 
buffer for the entire string at all.
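
To make both options concrete, here is a rough sketch in Python. The function names 
are made up for illustration, and the choice of little-endian UTF-32 without a BOM is 
an assumption of mine, not something the conversion requires.

```python
import codecs


def utf32_worst_case_size(utf8_data: bytes) -> int:
    """Upper bound, in bytes, on the UTF-32 form of utf8_data (no BOM)."""
    # Every code point occupies at least one octet in UTF-8, so there are
    # never more code points than octets; each needs one 32-bit word.
    return 4 * len(utf8_data)


def stream_utf8_to_utf32(chunks):
    """Yield UTF-32LE bytes for an iterable of UTF-8 byte chunks.

    Only one chunk is held at a time, so no buffer for the whole string
    is ever allocated. Chunks may split a multi-octet sequence.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # The decoder keeps any incomplete trailing sequence internally.
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-32-le")
    # Flush; raises UnicodeDecodeError if the input ended mid-sequence.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-32-le")
```

The bound is loose for non-ASCII text (a three-octet character still needs only one 
word), which is the price of not counting first.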

You would need to be able to count UTF-8 characters if you were implementing a spec 
defined in terms of characters rather than bytes. Notably, since XML is defined in 
terms of characters, any mention of string lengths or indices into strings is defined 
in terms of characters as well (e.g. in XSLT, XPointer and elsewhere).
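
For what it's worth, counting and indexing by character over raw UTF-8 can be 
sketched as below. This is only an illustration: the function names are hypothetical, 
"character" is taken to mean a code point, and the input is assumed to be well-formed 
UTF-8 (no validation is done).

```python
def utf8_char_count(data: bytes) -> int:
    """Number of code points in well-formed UTF-8 data."""
    # Continuation octets have the form 10xxxxxx; every other octet
    # starts a new code point, so count those.
    return sum(1 for octet in data if (octet & 0xC0) != 0x80)


def utf8_byte_offset(data: bytes, char_index: int) -> int:
    """Byte offset at which the code point with 0-based index char_index starts."""
    seen = 0
    for offset, octet in enumerate(data):
        if (octet & 0xC0) != 0x80:  # lead octet or ASCII
            if seen == char_index:
                return offset
            seen += 1
    raise IndexError("character index out of range")
```

Both are linear scans, which is the cost of honouring a character-based length or 
index when the storage is UTF-8.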
