(Oops -- I accidentally sent my last message early. Please disregard the reply I sent 1 hour ago.)

About XEP-0301 In-Band Real-Time Text
http://www.xmpp.org/extensions/xep-0301.html

On Tue, Mar 6, 2012 at 6:04 PM, Florian Zeitz <[email protected]> wrote:

> Hello standards-list, hello Mark, "inspired" by the recent discussions
> surrounding XEP-0301 (Real Time Text) I had a look over its current status
> and felt I should provide some feedback. Some is rather minor some I'm
> somewhat more concerned about, I've ordered it sequentially rather than by
> importance.

Thanks for writing! You have some excellent suggestions of areas to clarify in the specification.

> Section 3: Glossary
> The "RTT" entry seems superfluous to me. It'd be better to just note the
> acronym in the "real-time text" entry. Also the remark about the
> element's name is misleading as that is generally lower-case.

The glossary has been simplified in the next version of the spec, along with several other simplifications that have now been made.

> Section 4.2.2: 'seq' attribute
> It seems to me the start of a session (and therefore when to reset this
> to 0) is not clearly defined, since the start and cancel events are
> purely optional. In a more general sense I'm somewhat concerned about
> the attempt to reimplement the transport layer on the application layer.
> (see below)

The 'seq' attribute was developed because it was found to be necessary in practice:
-- Disconnect and reconnect cycles, including those caused by bad wireless reception (WiFi, 3G, mobile phone).
-- Not all servers support offline message delivery.
-- BOSH failures in a web browser at the client side (e.g. intermittent HTTP request failures).
-- And less often, situations of extreme congestion have happened.

My XEP-0301 is intentionally designed to survive a wide variety of situations that have actually happened in my tests. This attribute is critical to my applications, as well as to Next Generation 9-1-1 experimental demos. See http://tools.ietf.org/html/draft-tschofenig-ecrit-xmpp-es-00 ... "Emergency Services Functionality with the Extensible Messaging and Presence Protocol (XMPP)"; section 4.5 mentions XEP-0301 as one of the possible functionalities that may become a part of NG9-1-1. Real-time text technology is very useful for accessibility too, as it replaces the need to carry around a teletypewriter (TTY).

We need to keep the 'seq' attribute, as it is essential for message integrity during less-than-ideal situations. I actually expanded it to recommend an event='reset' once every 10 XMPP messages, to improve resilience even further -- the latest version of RealJabber now does this. (You can run the new RealJabber on two computers and test this concept by disconnecting and reconnecting in the middle of a conversation while the other person is still typing in real time in RealJabber. The real-time text recovers automatically within a few seconds of reconnecting, because of the automatic event='reset' occurring at regular intervals.)

Also, very rarely (it only happened on BlackBerry's Google Talk client), several XMPP messages transmitted at nearly the same time (e.g. during network congestion) all get the same timestamp and are then delivered in a different order than they were transmitted. This should never happen, but it occasionally does.

An earlier version of XEP-0301 was more complex, having a 'msg' attribute for the message number. That was removed, reduced down to just one 'seq' value for simplification.
-- 'seq' does not need to start at 0. I'll eliminate that requirement (clarification).
-- You can change the 'seq' value anytime there's an event='new' or event='reset'.

Setting it back to 0 again works, although I prefer not to, because of the following danger: a user disconnects while seq='0', and by the time the user has reconnected, seq has been incremented, reset back to '0' again, and incremented to '1'. The reconnected user then gets consecutive seq numbers that belong to totally different real-time messages, parts of which were never delivered.
In this case, the wrong <rtt> will be displayed, resulting in rare text scrambling. This actually happened once in my random testing, so I stopped resetting seq back to 0 every time there was an event='reset'.

> Section 4.2.3: 'event' attribute
> I feel the requirement for session start and cancel needs to be either
> tightened (if we decide we absolutely need it for this protocol) or
> removed. Having it truly optional makes it useless for detecting the
> actual session start and end IMHO. Its sole purpose appears to be
> mapping to SIP, which is a problem possibly best handled separately.
> The new and reset events appear to have been introduced under the
> assumption that messages get lost. If they are not the reset event
> can be safely removed and the new event is implicit upon receipt of a
> <body/> element.

You are right that 'start' and 'cancel' are not really required, so I may remove them. However, I think that 'cancel' may still be necessary to signal the other party to stop transmitting <rtt> for the remainder of the chat session, in order to save bandwidth, whenever the recipient wants to turn off RTT (e.g. via a button or switch in the middle of a conversation). So 'cancel' is still useful even if 'start' is not needed (the start of RTT during a chat session is simply the first delivery of an <rtt> element).

> Section 4.5.1: action elements
> The normative text in this section should be further explained.
> E.g.: What is REQUIRED for the <t/> element? Support, inclusion in each
> <rtt/> element, etc. (It is relatively clear to me what you mean, I just
> wish it was somewhat more fleshed out)
> "A client conforming to this specification MUST accept <t/>, <e/> and
> <d/> elements and handle them as described in the following..."

Agreed, it should be clarified.

> Section 4.5.1.3: counting
> It appears to me that the rules for determining the position and count
> of code points are somewhat backwards.
> In particular if the sending
> client does perform any normalization before sending the counts need to
> be based on the normalized version since the receiving client can not
> undo such normalization (this is the opposite of what is described in
> the text). Also most of the described transformations are only relevant
> for display on screen and should not change the string.
> IMHO it should suffice to count code points based on what is sent over
> the wire.

We have to keep "Unicode code points" (more on that later), but I agree that the normalization paragraph is definitely confusing, so I've modified it to the following wording:

"For interoperability of p and n values, processing MUST be done on the transmitted Unicode real-time message. For senders, this is the version of the Unicode message text after any Unicode normalization, emoticon graphics images conversion to Unicode, display text formatting, processing of Unicode combining marks, etc. For recipients obtaining text from the <t> element, this is the Unicode text immediately after XML processing, and before any further processing. From the perspective of p and n values, a real-time message is treated as an editable array of Unicode code points."

Now, the reason why we have to keep "Unicode code points": section 9, "Internationalization Considerations", explains why XEP-0301 uses the "code points" technique. Different programming platforms use different internal Unicode encodings, which may be different from the transmission encoding (UTF-8) for XMPP.
-- Multiple Unicode code points may represent one displayable Unicode character (e.g. combining marks). Action elements operate on Unicode code points, not on displayable characters.
-- Characters U+10000 through U+10FFFF, which are single code points, are represented as multiple surrogate code units in certain Unicode encodings (e.g. UTF-16). Action elements operate on Unicode code points, not on individual surrogate code units.
-- Some Unicode encodings use a variable number of bytes per Unicode code point (e.g. UTF-8). Action elements operate on Unicode code points, not on individual bytes.

Real-time editing (mid-text inserts/deletes) of Unicode text containing variable-length encodings causes major text scrambling if the recipient and sender do not count positions the same way. So unfortunately, a standardization of counting is essential, especially when international text is involved. The 'char' type of some programming languages is sometimes 8-bit, sometimes 16-bit, and sometimes 32-bit. Often, XMPP libraries already pre-convert the text to a different Unicode encoding. Not all of them give access to the original UTF-8 "wire" text, so I can't depend on counting via the "wire format" (UTF-8) unless I ask implementers to convert everything back to UTF-8 before processing. But that itself is a catch-22, because in many programming languages String.Insert / String.Delete operations don't operate on UTF-8. Therefore, we decided it was necessary to standardize on Unicode code points.

In addition, here's a flowchart diagram that may help in understanding the Unicode preservation scenario better:
http://www.realjabber.org/flowchart_of_xmpp_rtt_path.pdf
(This file will need to be updated for the latest XEP-0301 draft.)

> Section 4.5.2: action elements
> I'd like to hear some rationale on why there is forward and backward
> delete. Both appear to be able to generate the same results.

-- Note that the cursor position never changes with a forward delete operation.
-- The cursor position only changes if a backspace operation is done.
-- Subsequent action elements do not require knowledge of the cursor position of preceding action elements.

However, it is true that the standard could still work with just one of the two text-deleting codes (using the <c/> element to correct the cursor position instead). I have seriously considered removing one of the two codes. However, we found that bandwidth use is more efficient when both codes are kept.
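
The cursor behaviour described above can be sketched in a few lines of Python, chosen here only because its strings are indexed by Unicode code point. This is an illustrative sketch, not RealJabber code: the function names are hypothetical, and it assumes that <t p='#'> inserts at code point index p, that <e p='#' n='#'/> removes the n code points before p (backspace), and that <d p='#' n='#'/> removes the n code points starting at p (forward delete).

```python
def insert_text(msg, p, text):
    """<t p='p'>text</t> (assumed): insert text before code point index p."""
    return msg[:p] + text + msg[p:], p + len(text)


def backspace(msg, p, n=1):
    """<e p='p' n='n'/> (assumed): remove the n code points before p.

    The cursor ends up n positions to the left of p.
    """
    start = max(p - n, 0)
    return msg[:start] + msg[p:], start


def forward_delete(msg, p, n=1):
    """<d p='p' n='n'/> (assumed): remove n code points starting at p.

    The cursor stays at p.
    """
    return msg[:p] + msg[p + n:], p


# One supplementary-plane character (U+1D11E) between two ASCII letters.
msg = "a\U0001D11Eb"
print(len(msg))                            # 3 code points
print(len(msg.encode("utf-16-le")) // 2)   # 4 UTF-16 code units
print(len(msg.encode("utf-8")))            # 6 UTF-8 bytes

# Sender's cursor was at index 2 and the user pressed Backspace.
print(backspace(msg, 2))        # ('ab', 1)
# Sender's cursor was at index 1 and the user pressed Delete.
print(forward_delete(msg, 1))   # ('ab', 1)
```

Both delete calls produce the same text. They differ only in where the sender's cursor was beforehand (index 2 for the backspace, index 1 for the forward delete), which is why keeping both codes can avoid a separate cursor correction. The three length figures also show why p and n have to be counted in code points: the same message has three different lengths depending on the encoding.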

> It did occur to me that they are meant to be used in conjunction with
> cursor display. However, it appears that this would cause interesting
> possible situations. E.g. what happens if a character is forward deleted
> at a position preceding the cursor. In that situation the absolute
> position of the cursor should move one to the left, but will instead
> move 1 to the right relative to the text (it might move over the right
> end). I'd prefer expected cursor position to always be transmitted
> explicitly in these cases and have either delete variant removed.

It actually optimizes bandwidth to have both codes available, because it reduces the number of cursor-position-correcting <c/> elements transmitted. (<c/> is only transmitted when needed, i.e. when the cursor position is not where it should be after the specific delete operation is done.)

The source code that I wrote to comply with section 6.2.1 "Monitoring Message Edits" (http://xmpp.org/extensions/xep-0301.html#monitoring_message_edits) is implemented at line 298 of RealTimeText.cs in RealJabber. The function EncodeRawRTT, viewable at http://code.google.com/p/realjabber/source/browse/trunk/CSharp/RealTimeText.cs#298, decides whether to emit an <e> or a <d> based on where the cursor should end up; see lines 360-370 at that link.

That said, if bandwidth weren't as important, I could very easily remove either <e> or <d>, and it would not make any difference to the end user of RealJabber, since a cursor position correction is done. If we keep <d> and eliminate <e>, backspace operations might need to be accompanied by a corrective cursor repositioning. If we keep <e> and eliminate <d>, delete key operations might need to be accompanied by a corrective cursor repositioning.
(Note: This is optional -- it is only for clients that decide to support transmission / reception of cursor positioning.)

Also, I've actually stopped using <c/> because I realized an empty <t p='#'/> element does exactly the same thing as <c p='#'/>. Therefore I am thinking of removing two action elements from the next XEP-0301:
- Remove <c/> because I can use an empty <t/> to do exactly the same thing.
- Remove <g/> because I can successfully use XEP-0224 instead anyway.

That reduces the number of action elements to just 4, and it makes it easy to merge Tier 1 with Tier 2 into one unified table for simplicity. However, I've found enough reason to keep both <d> and <e> -- but our team can still be swayed by further arguments against having both.

> Section 4.6: error recovery
> As mentioned before, the attempt to correct errors is my biggest concern
> about this XEP. For the case of reconnects it appears to me that the
> sending client will always be able to notice this situation and treat it
> as a new RTT session.

Error recovery is actually simpler than it looks -- it takes up less than 5% of the source code in RealTimeText.cs. One of my business clients had serious problems without error recovery (we had attempted to make it optional), so we actually expanded Error Recovery to RECOMMEND event='reset' at regular intervals, such as once every 10 <rtt> messages (or once every 10 seconds).

Also, sometimes you can't detect online/offline transitions. For example, the Google Talk network sometimes can't see the online/offline status of jabber.org users, and you can have RTT conversations with users that appear offline (e.g. invisible).

> Section 6.4.1: message length limit
> In the second example with the split messages I would not have expected
> an empty <rtt/> element. If that is actually intended to indicate the
> <body/> is part of the RTT session this should be mentioned elsewhere.

Yes, that was the intent. This will need to be clarified.
It's not essential, but it is a very useful indicator.

Your comments are useful, and I'd appreciate hearing back from you. You are very welcome to run RealJabber (www.realjabber.org) and test it out with me -- email me privately to make an appointment. I would appreciate further comments, given your useful insights.

Thanks!
Mark Rejhon
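
P.S. For anyone following the 'seq' and event='reset' discussion above, the recovery idea can be sketched roughly as follows. This is a simplified Python model, not the RealJabber implementation: the class and function names are hypothetical, each <rtt> is reduced to a (seq, payload, event) tuple, a 'new' or 'reset' payload is assumed to carry the whole message so far, and any other payload is assumed to be text appended at the end.

```python
RESET_INTERVAL = 10  # assumed: one event='reset' every 10 <rtt> messages


def outgoing(chunks, first_seq=100):
    """Yield (seq, payload, event) tuples for a message typed chunk by chunk.

    first_seq is arbitrary on purpose: 'seq' need not start at 0.
    """
    sent = ""
    for i, chunk in enumerate(chunks):
        sent += chunk
        if i == 0:
            yield first_seq, sent, "new"
        elif i % RESET_INTERVAL == 0:
            yield first_seq + i, sent, "reset"   # retransmit the whole message
        else:
            yield first_seq + i, chunk, None     # ordinary incremental edit


class RttReceiver:
    def __init__(self):
        self.next_seq = None  # None means out of sync: wait for new/reset
        self.text = ""

    def receive(self, seq, payload, event=None):
        """Apply one <rtt>; return True while the receiver is in sync."""
        if event in ("new", "reset"):
            self.text = payload
            self.next_seq = seq + 1
        elif self.next_seq is not None and seq == self.next_seq:
            self.text += payload
            self.next_seq = seq + 1
        else:
            self.next_seq = None  # gap or reordering: freeze the text
        return self.next_seq is not None


rx = RttReceiver()
for i, rtt in enumerate(outgoing(list("real-time text works"))):
    if i in (3, 4):
        continue  # simulate two <rtt> messages lost during a reconnect
    in_sync = rx.receive(*rtt)
    print(rtt[0], in_sync, repr(rx.text))
```

In this run the receiver notices the gap at the first message after the loss, keeps the last good text frozen instead of applying edits at the wrong positions, and is back in sync as soon as the next periodic reset arrives.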
