Re: XMPP In-Band Real-Time Text http://www.xmpp.org/extensions/inbox/realtimetext.html
Peter -- Thank you again for your comments!
I've now made roughly 99% of the minor edits, from everybody. I just have one slightly 'bigger' edit (a rewrite of "Internationalization Considerations"), which I'd like you to comment on below:

On Thu, Jun 23, 2011 at 9:54 PM, Peter Saint-Andre <[email protected]> wrote:

[about message ID]
> > From what I recall, I think Kevin agreed on erring on the side of
> > simplifying, when I asked if I should remove the msg identifier from the
> > last spec. It can be re-added as an optional feature, as an additional
> > integrity layer to error recovery.
> Kev was right about simplification. I just wanted to gain some clarity
> on whether we had a problem to be solved here.
>

In the future, it may be useful as an improved error-recovery enhancement, especially to keep track of different real-time messages if the specific message with the <body> element gets dropped or lost somehow. However, we've found reliability to be excellent, and the current simple error recovery is solving >99% of our problem cases. So, we all agree, simpler is better.

> On the wire there is no such thing as a code point, there are only code points
> that are encoded using an encoding form like UTF-8 or UTF-16. For
> details, see:
> http://tools.ietf.org/html/draft-ietf-appsawg-rfc3536bis-02
> Given that XMPP is pure UTF-8, I don't see a compelling reason to count
> UTF-16-encoded code points or UTF-32-encoded code points.
>

We have to make it easier for the programmers "on average".
Programming platforms frequently use a different /internal/ format than the /wire/ format (UTF-8):

If we process in UTF-8 directly off the wire, then:
...Languages using UTF-8: Easy
...Languages using UTF-16: Complicated (more math)
...Languages using UCS-4: Complicated (more math)

If we process in UTF-16, then:
...Languages using UTF-8: Somewhat complicated (some math)
...Languages using UTF-16: Easy
...Languages using UCS-4: Minor math

If we process in code points instead, then:
...Languages using UTF-8: Minor math
...Languages using UTF-16: Minor math
...Languages using UCS-4: Easy

This assumes that a given programming language doesn't have easy access to string length/index counting routines for a Unicode encoding other than its own internal one. Some languages make this difficult, and require manual mathematics.

The advantage is that "Unicode code points" has exactly the same meaning for all Unicode encodings, according to unicode.org, which results in consistency for the specification. We have therefore decided to go with code point processing, and to eliminate UTF-16. By saying "Unicode code points" we don't have to worry about mentioning "Unicode encodings", as the code points mean the same thing for all Unicode encodings.

As you can see, we now believe that this is the only 'practical' alternative to minimize "average complexity" spread across all possible programming languages. I would welcome an alternative that is simpler, but currently our research appears to show that the "code point" method has the compelling advantage of avoiding too much complexity for any specific programming language.

> Yes, I figured that out, I just wasn't sure why we needed that kind of
> complexity. I'll look at it again.
>

I've added a small introductory paragraph to the top of the Use Cases, to explain the purpose of the progressive examples leading up to the final real-world example.

I plan to submit the document with all the minor edits by this weekend -- which will make it on time.

Thanks,
Mark Rejhon
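
A minimal sketch to illustrate the comparison of counting units in this message. It is not taken from the specification: the helper names `unit_counts` and `insert_at_code_point` and the sample string are hypothetical, chosen only to show that one string has three different lengths depending on the unit, while the code point count is the same whichever encoding a platform uses internally. It assumes Python 3, where `str` is indexed by code point.

```python
def unit_counts(text: str) -> dict:
    """Return the length of `text` measured in three different units."""
    return {
        # Python 3 strings are sequences of code points.
        "code_points": len(text),
        # Length of the wire format used by XMPP.
        "utf8_bytes": len(text.encode("utf-8")),
        # Each UTF-16 code unit is 2 bytes; the "-le" variant adds no BOM.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


def insert_at_code_point(text: str, position: int, inserted: str) -> str:
    """Insert `inserted` before the code point at index `position`."""
    return text[:position] + inserted + text[position:]


# 'a', 'e' with acute accent, an emoji outside the BMP, 'z'
sample = "a\u00e9\U0001F600z"
counts = unit_counts(sample)
print(counts)

# Position 3 means "after three code points" no matter which encoding
# the sender or the receiver uses internally.
edited = insert_at_code_point(sample, 3, "!")
print(edited)
```

For this sample the three counts differ (4 code points, 8 UTF-8 bytes, 5 UTF-16 code units), so a position expressed in bytes or UTF-16 units would need converting on platforms with a different internal format, whereas a code point position would not.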
