Re: [Standards] RTT, take 2

Gunnar Hellström Thu, 23 Jun 2011 12:42:11 -0700

Mark said in the UTF-8 / UTF-16 discussion:

However, I am thinking of following Simon's excellent suggestion.
What do you think of his suggestion of using "code point" counting for length and position attributes? That'd pretty much turn XMPP RTT into a standard for editing an array of 32-bit integers instead (allowing use of native UCS4 string functions in programming languages that store strings in UCS4 format). It makes my 16-bit programming slightly more complicated, but much easier than counting in UTF-8. It might be a better long-term goal.
Opinion?

Yes, counting in code points is the right decision. You do not need to comment on what that means for the programmer. Some may want to work in native UTF-8. Then a Unicode code point is well defined as a 1 to 4 byte UTF-8 sequence, easily isolated.
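For the UTF-8 case, a minimal sketch (assuming well-formed UTF-8 input) of how a code point count and a code point position can be recovered from raw bytes. Every code point starts with exactly one byte that is not a continuation byte (continuation bytes have the form 10xxxxxx), so counting the non-continuation bytes counts code points. The function names are hypothetical illustrations, not part of any RTT specification.

```python
def utf8_codepoint_count(data: bytes) -> int:
    """Count code points in well-formed UTF-8 by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_byte_offset(data: bytes, p: int) -> int:
    """Return the byte offset where code point index p starts.

    p may equal the total code point count, which maps to the end of the buffer.
    Hypothetical helper for translating a code-point position into a byte index.
    """
    count = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte: a new code point begins here
            if count == p:
                return i
            count += 1
    if count == p:
        return len(data)
    raise IndexError("code point position out of range")


data = "aé€😀".encode("utf-8")  # 1 + 2 + 3 + 4 bytes
print(utf8_codepoint_count(data))  # 4 code points in 10 bytes
print([utf8_byte_offset(data, p) for p in range(5)])
```

The byte buffer itself never needs to be decoded; only the lead bytes are inspected.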

Some may want to work in UTF-16. They then need to watch out for 16-bit values in the surrogate range 0xD800 to 0xDFFF and count each pair of such codes as 1 code point, while every other 16-bit code is 1 code point.
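The UTF-16 rule can be sketched as follows, assuming text held as a list of 16-bit code units. A high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF) is counted once; in this sketch an unpaired surrogate is simply counted as one code point, which is an assumption about how malformed input would be treated. The helper names are hypothetical.

```python
def to_utf16_units(s: str) -> list[int]:
    """Convert a Python string to a list of 16-bit UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def utf16_codepoint_count(units: list[int]) -> int:
    """Count code points: a surrogate pair counts as 1, every other unit as 1."""
    count = 0
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            i += 2  # high + low surrogate form one code point
        else:
            i += 1  # BMP character (or unpaired surrogate)
        count += 1
    return count


units = to_utf16_units("a😀b")
print([hex(u) for u in units])       # four 16-bit units
print(utf16_codepoint_count(units))  # but three code points
```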


And some may want to work in the 32 bit expanded Unicode.

Just specify in the protocol that p and n are counted in Unicode code points.
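As an illustration of the "32-bit expanded" case, Python strings are indexed by code point, so p and n counted in code points map directly onto slicing. The exact meaning of p and n in the draft is not spelled out in this message; the sketch below assumes, purely for illustration, an insert at position p and an erase of n code points starting at position p. Both functions are hypothetical.

```python
def rtt_insert(text: str, p: int, new: str) -> str:
    """Hypothetical: insert `new` at code point position p."""
    if not 0 <= p <= len(text):
        raise IndexError("position out of range")
    return text[:p] + new + text[p:]


def rtt_erase(text: str, p: int, n: int) -> str:
    """Hypothetical: remove n code points starting at code point position p."""
    if p < 0 or n < 0 or p + n > len(text):
        raise IndexError("erase range out of range")
    return text[:p] + text[p + n:]


text = "a😀b"
print(len(text))                           # 3 code points
print(len(text.encode("utf-16-le")) // 2)  # 4 UTF-16 units
print(rtt_erase(text, 1, 1))               # the emoji is one position wide
```

Because both ends count the same unit, a sender using UTF-16 and a receiver using UTF-8 agree on every position, even for characters outside the Basic Multilingual Plane.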


/Gunnar
