Regarding Real-Time Text: http://xmpp.org/extensions/inbox/realtimetext.html
On Wed, Jun 22, 2011 at 2:56 PM, Simon McVittie wrote:
> On Wed, 22 Jun 2011 at 13:05:48 -0400, Mark Rejhon wrote:
> > UTF16 and UTF16LE, and even UCS2 has same behaviour in my RTT spec, so I
> > just say "16-bit Unicode". Java, C#, ObjectiveC stores strings in 16-bit,
> > and the various flavours of Unicode C++ STL and stdlib++ also store strings
> > in 16-bit as well. Extensive research and testing shows they all process in
> > flat mode like an array of 16-bit integers
>
> IMO you should either count Unicode codepoints (the underlying data model),
> or bytes of UTF-8 (the XMPP wire protocol). Counting in units of what a
> particular implementation uses internally, if it isn't one of those two,
> seems attractive if you use that particular implementation, but complicates
> things further for everyone who isn't.

About the two options you suggested:

-- Codepoints method: Code points would probably be the ideal choice, even though counting them would complicate programming for users of 16-bit Unicode programming languages. But it's a viable option.

-- UTF-8 method: This would make things even *more* complicated for some languages. In Java, how do you quickly calculate the UTF-8 size of an inserted fragment of text that contains two displayable characters and five combining characters? Yes, you could convert the UTF-16 string to a byte array and count the length of the byte array, but if you're doing real-time text insertions in the middle of a string, you now have to track UTF-8 indexes and offsets, which requires further calculations, or research into which Java class library can do them for you. (The same applies to C# and other 16-bit Unicode languages.)

If we went with your suggestion, it would obviously have to be code-point based, for simplicity. Right now, XMPP RTT is essentially equivalent to editing an array of 16-bit integers. Going to a codepoint method would make XMPP RTT essentially equivalent to editing an array of 32-bit integers.
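To make the trade-off concrete, here is a minimal Java sketch, assuming only the standard java.lang.String API and java.nio.charset.StandardCharsets. It measures a fragment of two displayable characters plus five combining marks in all three candidate units (UTF-16 code units, code points, UTF-8 bytes), and shows how a codepoint-based insert position would have to be mapped onto a Java string. The class and helper names (RttUnits, insertAtCodePoint, utf16ToCodePointIndex) are hypothetical illustrations, not part of the RTT spec.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: measuring and editing text in the three candidate
// RTT counting units using only the Java standard library.
public class RttUnits {

    // Number of 16-bit UTF-16 code units (what String.length() returns).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical helper: convert a UTF-16 index (e.g. a caret position
    // reported by a Java text component) into a code point index.
    static int utf16ToCodePointIndex(String text, int utf16Index) {
        return text.codePointCount(0, utf16Index);
    }

    // Hypothetical helper: insert a fragment at a position counted in code
    // points, as a codepoint-based RTT insert action would require.
    static String insertAtCodePoint(String text, int cpIndex, String fragment) {
        int utf16Index = text.offsetByCodePoints(0, cpIndex);
        return new StringBuilder(text).insert(utf16Index, fragment).toString();
    }

    public static void main(String[] args) {
        // Two displayable characters followed by five combining marks.
        String fragment = "e\u0301\u0302a\u0300\u0303\u0308";
        System.out.println("fragment: utf16=" + utf16Units(fragment)
                + " codepoints=" + codePoints(fragment)
                + " utf8=" + utf8Bytes(fragment));

        // U+1F600 lies above U+FFFF, so Java stores it as a surrogate pair.
        String text = "A\uD83D\uDE00B";
        System.out.println("text: utf16=" + utf16Units(text)
                + " codepoints=" + codePoints(text)
                + " utf8=" + utf8Bytes(text));

        // Code point index 2 (after the emoji) maps to UTF-16 index 3.
        System.out.println(insertAtCodePoint(text, 2, "x").equals("A\uD83D\uDE00xB"));

        // Treating 2 as a UTF-16 index instead splits the surrogate pair.
        String broken = new StringBuilder(text).insert(2, "x").toString();
        System.out.println(Character.isSurrogate(broken.charAt(1)) && broken.charAt(2) == 'x');
    }
}
```

Running it prints "fragment: utf16=7 codepoints=7 utf8=12", then "text: utf16=4 codepoints=3 utf8=6", then "true" twice. For text below U+10000 the UTF-16 and codepoint counts agree while the UTF-8 count already differs; the two 16-bit and 32-bit views only diverge once a supplementary character such as U+1F600 appears, which is where a naive 16-bit index lands inside a surrogate pair.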
In fact, the same standard would apparently work identically right now for programming languages with either 16-bit or 32-bit Unicode strings. The differences only appear when somebody types a Unicode character at U+10000 or above. Any programming mistakes in an implementation would only become apparent in this rare case (rare in most countries, which don't use those characters). Even then, section 4.3.2 self-corrects the programmer's mistake: the correct <body> transmission replaces the flawed real-time message.

It still somewhat complicates things for the programming languages used by all the current XMPP RTT projects (several are in progress), since most of them use 16-bit Unicode. But, yes, XMPP RTT is definitely a long-term standard: it is a candidate to replace the TDD/TTY communications used by deaf people with a mass-market-compatible real-time communication mechanism... So I need to include long-term thinking!

Some thinking needed.

Sincerely,
Mark Rejhon
