I've had a chance to review XEP-0301 in some detail. First, overall I think it is in good shape, I don't have any major concerns, and I think it is appropriate for advancement on the standards track at the XSF.
However, I do have some comments, which I've grouped into technical, editorial, and nits.

### 1. TECHNICAL ###

In Section 4.6.4, why is it a SHOULD to retransmit a partially composed message in the circumstances enumerated there? I don't see what difference this makes regarding interoperability.

The spec provides few guidelines about when to send multiple <message/> stanzas vs. when to include multiple actions in the same <message/> stanza (e.g., see the difference between the examples in Section 7.2 and Section 7.3). I was expecting to find some text about this in the section on Congestion Control.

A similar concern applies to this text from Section 6.4.1: "For long messages, the final <rtt/> transmission may be made in a separate <message/> than the <message/> containing the <body/>." Why? When is this appropriate and when not?

There's some confusion over internationalization terminology. I suggest that the author read RFC 6365. I also provide specific comments below.

The use of conformance terms (MUST/SHOULD/MAY) is inappropriate in the Implementation Notes, and in general is being used with regard to user interface issues, not protocol issues that have an impact on interoperability. I fully realize that user interface issues are important to the author, but it's not correct to use conformance terminology here. I provide detailed suggestions in the next section of my comments.

Also, personally I strenuously avoid lowercase versions of the RFC 2119 conformance terms these days to avoid any possible confusion. Thus I tend to change "may" to "might" or "can" or "it is acceptable to", "should" to "ought to" or "it is best to", "must" to "needs to", etc. I commend this convention to the author.

In Section 6.1.3, we find that "If additional accuracy is required, it is also possible to timecode the <rtt/> elements." How? Is this a matter for implementation? Is it out of scope for this specification?

The schema does not define the allowable values for the 'event' attribute. Are "new", "reset", and "cancel" the only values? Is there a default value (I assume not)? Also, it seems correct in the schema to set the default value of the 'n' attribute to "1".

The security considerations say nothing about the use of this protocol with end-to-end encryption of whatever flavor (XEP-0027, RFC 3923, XEP-0116, OTR, XTLS, xmlenc, draft-miller-xmpp-e2e, etc.). That seems like a fairly significant oversight.

As to congestion control, it's probably a good idea to look again at Section 9 of RFC 4103 for ideas about more detailed suggestions (although the author has probably already done so).

### 2. EDITORIAL ###

Most of these are suggestions of varying weight. The author is free to ignore them, however I think most of them make eminent sense and deserve to be strongly considered.

SECTION 2

I think it makes sense to cite ITU-T T.140 here.

SECTION 4.2.2

The value of "reset" is specified as must-implement, but there is no such statement about "new" and "cancel". Please clarify.

SECTION 4.5.5

Please see RFC 6365 regarding the terminology here. For example, I think you want:

s/glyph/character/
s/character glyphs/characters/
s/surrogate code units/surrogate pairs/

The text here says that "calculations of p and n values MUST be based on Unicode code points". Are you sure that you mean code points? Given that XMPP mandates the use of UTF-8, I think it would be safer and easier to say "UTF-8-encoded code points" (the point about "Some Unicode encodings use a variable number of bytes per Unicode character" is true but hopefully irrelevant here).

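To make the counting question concrete, here is a minimal Python sketch (mine, not taken from the spec) that measures the same string three ways: Unicode code points, UTF-16 code units, and UTF-8 bytes. The helper name count_units and the sample string are hypothetical, chosen only for illustration.

```python
def count_units(text: str) -> dict:
    """Return the length of `text` measured in three different units."""
    return {
        # A Python str is a sequence of Unicode code points.
        "code_points": len(text),
        # Each UTF-16 code unit is 2 bytes; characters outside the BMP
        # take two code units (a surrogate pair).
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 uses 1 to 4 bytes per code point.
        "utf8_bytes": len(text.encode("utf-8")),
    }


# 'a' followed by U+1F600, a character outside the Basic Multilingual Plane.
sample = "a\U0001F600"
counts = count_units(sample)
print(counts)
```

A client that counted UTF-16 code units or UTF-8 bytes would compute different p and n values for this string than one that counted code points, which is the interoperability risk I have in mind.
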
Also, it's not always so simple to say exactly which code point you're supposed to count, because Unicode normalization could come into play -- see my presentation on "Internationalization: A Guide for the Perplexed" at https://stpeter.im/files/i18n-intro.pdf for many examples, but a simple one is IV (Latin capital letter "I" plus Latin capital letter "V") vs. Ⅳ (Roman numeral four).

SECTION 4.6.2

OLD
An indicator MAY be used by the recipient to indicate the loss of sync.

NEW
A client might want to show an indicator to indicate the loss of sync.

(That is: more interface suggestions, no interoperability impact.)

SECTION 6

As mentioned, there's a lot of conformance language here that simply doesn't belong. Here are my suggested changes.

OLD
Senders with bursty output MAY immediately transmit word bursts of text without buffering.

NEW
It is acceptable for senders with bursty output to immediately transmit word bursts of text without buffering.

OLD
It is NOT REQUIRED to monitor or transmit Element <w/> – Interval for transcription.

NEW
It is not necessary to monitor or transmit Element <w/> – Interval for transcription.

OLD
Clients MAY optimize for bandwidth, performance and/or screen repaints by eliminating, merging, or ignoring Element <w/> – Interval selectively, especially those containing shorter intervals.

NEW
Clients can optimize for bandwidth, performance and/or screen repaints by eliminating, merging, or ignoring Element <w/> – Interval selectively, especially those containing shorter intervals.

OLD
The transmission interval of <rtt/> MAY also vary, either intentionally for optimizations, or due to precision limitation.

NEW
It is acceptable for the transmission interval of <rtt/> elements to also vary, either intentionally for optimizations, or due to precision limitations.

OLD
Clients MAY choose to implement alternate text-smoothing methods

NEW
Clients might choose to implement alternate text-smoothing methods

OLD
Processing of intervals (<w/> elements) SHOULD be done via non-blocking programming techniques.

NEW
It is best to process intervals (<w/> elements) via non-blocking programming techniques.

OLD
Upon receiving a <message/> containing <body/> indicating a completed message, the full message SHOULD be displayed immediately in place of the real-time message, and unprocessed action elements cleared from the playback queue.

NEW
Upon receiving a <message/> containing <body/> indicating a completed message, the full message ought to be displayed immediately in place of the real-time message, and unprocessed action elements cleared from the playback queue.

OLD
If the playback queue contains too much delay in <w/> elements (i.e. <w/> elements from two <rtt/> transmissions ago), the recipient client MAY ignore or shorten the intervals of <w/> elements, to allow lagged real-time text to "catch up" more quickly.

NEW
If the playback queue contains too much delay in <w/> elements (i.e. <w/> elements from two <rtt/> transmissions ago), the recipient client can ignore or shorten the intervals of <w/> elements, to allow lagged real-time text to "catch up" more quickly.

OLD
Recipient clients MAY choose to display a cursor (or caret) within incoming real-time messages.

NEW
Recipient clients might choose to display a cursor (or caret) within incoming real-time messages.

OLD
The remote cursor SHOULD be clearly distinguishable from the sender's real local cursor.

NEW
The remote cursor ought to be clearly distinguishable from the sender's real local cursor.

OLD
Whenever the cursor is moving without any text modifications (via arrow keys or mouse), the sender MAY transmit extra Element <t/> – Insert Text with an empty string to update the remote cursor position via attribute p.

NEW
Whenever the cursor is moving without any text modifications (via arrow keys or mouse), it is acceptable for the sender to transmit extra Element <t/> – Insert Text with an empty string to update the remote cursor position via attribute p.

OLD
Real-time text MAY be accompanied with XEP-0085 Chat State Notifications [12].

NEW
Real-time text can be used in conjunction with XEP-0085 Chat State Notifications [12].

OLD
Support for real-time text in MUC is OPTIONAL,

NEW
It can be appropriate to use real-time text in the context of a MUC room,

Note: optional/appropriate for what kinds of implementations? Senders? Receivers? MUC servers? What exactly is meant here by "support"?

OLD
For MUC, the RTT Element event attribute value of 'cancel' SHOULD NOT be used.

NEW
In MUC rooms, senders ought not generate 'event' attributes with a value of "cancel", and receivers ought to ignore such values.

OLD
Software MAY hide idle real-time messages to minimize on-screen clutter when more than one person is typing. Congestion control MAY also be used, via automatic adjustment of Transmission Interval, see Congestion Considerations.

NEW
It is appropriate for software to hide idle real-time messages in order to minimize on-screen clutter when more than one person is typing. Implementers are also encouraged to use congestion control via automatic adjustment of Transmission Interval, see Congestion Considerations.

OLD
Any combination of audio, video, and real-time text MAY be used together simultaneously.

NEW
Any combination of audio, video, and real-time text can be used together simultaneously.

Similarly, at the end of Section 7.9, remove all conformance terms from the bullet points: the conformance language is covered elsewhere so it is unnecessary here.

SECTION 8.1

It is not the place of this specification to make recommendations beyond this protocol.
Therefore:

OLD
It is noted there is also another real-time text standard (RFC 4103, IETF RFC 5194 [17]), used for SIP sessions with real-time text. In the situation where an implementor needs to decide which real-time text standard to use, it is generally recommended to use the real-time text specification of the specific session control standard in use for that particular session. This varies from implementation to implementation. For example, Google Talk network uses XMPP messaging for instant messages sent during audio/video conversations. Therefore, in this situation, it is recommended to use this XEP-0301 specification to add real-time text functionality. However, there are other situations where it is necessary to support multiple real-time-text standards, and to interoperate between the multiple real-time text standards.

NEW
It is noted there is also another real-time text standard (RFC 4103, IETF RFC 5194 [17]), used for SIP sessions with real-time text. In the situation where an implementor needs to decide which real-time text standard to use, it makes sense to use the real-time text specification of the specific session control standard in use for that particular session. This varies from implementation to implementation. For example, the Google Talk network uses XMPP messaging for instant messages sent during audio/video conversations. Therefore, in this situation, it makes sense to use this XEP-0301 specification to add real-time text functionality. However, there are other situations where it is necessary to support multiple real-time-text standards, and to interoperate between the multiple real-time text standards.

SECTION 8.2

It might be worthwhile to reference here the (expired) Internet-Drafts that already define mapping of addresses and signalling between SIP and XMPP: draft-saintandre-sip-xmpp-core and draft-saintandre-sip-xmpp-media.

SECTION 9

Here again please look at RFC 6365.
In particular, I think you might mean "scripts" instead of "languages" (or, to be safe, "languages/scripts").

### 3. NITS ###

Throughout the text, "i.e." is used when I think the author means "e.g.". Please double-check all instances.

Please expand acronyms on first use (e.g., CART).

SECTION 1

s/deaf/hearing impaired/ (?)

Perhaps also mention that RTT functionality is beneficial in emergency situations.

SECTION 2

s/transversal/traversal/

SECTION 4.1

s/Transmission of <rtt/> occurs/Transmission of the <rtt/> element occurs/
s/“urn:xmp:rtt:0”/“urn:xmpp:rtt:0”/

SECTION 4.2.1

The order of sentences is a bit confusing. I suggest...

OLD
###
This REQUIRED attribute is a counter to maintain the integrity of a real-time message. Senders MUST increment the seq attribute by 1 for each subsequent <rtt/> transmitted. Recipients MUST monitor the seq value to verify that it is incrementing. For more info, see Automatic Recovery of Real-Time Text. The bounds of seq is 31-bits, the range of positive values of a signed integer.

The exception to the incrementing rule is <rtt/> elements with an 'event' attribute. In this case, senders MAY use any seq value as the new starting value. For best integrity, seq SHOULD be randomized. The new starting value SHOULD be less than 1 million to allow plenty of incrementing room, and to keep <rtt/> compact.
###

NEW
###
This REQUIRED attribute is a counter to maintain the integrity of a real-time message (its bounds are 31-bits, the range of positive values of a signed integer). Senders MUST increment the seq attribute by 1 for each subsequent <rtt/> transmitted, except when the 'event' attribute has a value of "new". In this case, senders MAY use any seq value as the new starting value. For best integrity, the starting value of seq SHOULD be randomized when initializing a new sequence. In addition, the new starting value SHOULD be less than 1 million to allow plenty of incrementing room, and to keep <rtt/> compact.
Recipients MUST monitor the seq value to verify that it is incrementing. For further details, see Automatic Recovery of Real-Time Text.
###

SECTION 4.3

OLD
Upon receipt of <body/>, the message becomes permanent and can not be edited any further.

NEW
Upon receipt of a message stanza containing a <body/> element, the message becomes permanent and cannot be edited any further using this protocol.

SECTION 4.3.1

OLD
4.3.1 Backwards Compatible
The real-time text standard simply provides early delivery of text before the <body/> element. The <body/> element continues to follow the XMPP Core [7] standard. Clients that do not support real-time text, will continue to behave normally, displaying complete lines of messages as they are delivered.

NEW
4.3.1 Backward Compatibility
The real-time text protocol simply provides early delivery of text before the <body/> element. The <body/> element continues to follow the XMPP Core [7] specification. In particular, because XMPP implementations need to ignore XML elements they do not understand, clients that do not support real-time text will continue to behave normally, displaying complete lines of messages as they are delivered.

SECTION 4.4

OLD
For the best balance between interoperability and usability, the transmission interval of <rtt/> for a continuously-changing message SHOULD be approximately 0.7 second.

NEW
For the best balance between interoperability and usability, the transmission interval of <rtt/> elements for a continuously-changing message SHOULD be approximately 0.7 second.

Note: This one is borderline usability advice. However, since it has implications for congestion control, I think it's acceptable to include it.

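Returning briefly to the seq rules in my suggested text for SECTION 4.2.1 above, here is a rough Python sketch of how I read them. It is an illustration under my own assumptions, not an implementation from the spec: the names SEQ_MAX, START_LIMIT, RESTART_EVENTS, new_start_seq, next_seq, and in_sync are all hypothetical, and I assume that only event="new" restarts the counter, as in my NEW text (the OLD text allows any 'event' attribute to do so).

```python
import secrets

SEQ_MAX = 2**31 - 1        # 31-bit bound: positive range of a signed integer
START_LIMIT = 1_000_000    # starting values are kept below 1 million
RESTART_EVENTS = {"new"}   # assumption: follows the suggested NEW text


def new_start_seq() -> int:
    """Pick a randomized starting value below START_LIMIT."""
    return secrets.randbelow(START_LIMIT)


def next_seq(current: int) -> int:
    """Sender side: increment seq by 1 for each subsequent <rtt/>."""
    return current + 1


def in_sync(last_seq: int, incoming_seq: int, event=None) -> bool:
    """Recipient side: check that seq is incrementing by exactly 1.

    An <rtt/> whose 'event' restarts the sequence may carry any seq value.
    """
    if event in RESTART_EVENTS:
        return True
    return incoming_seq == last_seq + 1


start = new_start_seq()
second = next_seq(start)
```

A recipient for which in_sync returns False would then fall back to whatever recovery the spec defines under Automatic Recovery of Real-Time Text; the sketch deliberately stops short of that.
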
SECTION 4.5.4

s/a compliant XML processor already do this/compliant XML processors already do this/

SECTION 4.6.3

s/Processing of real-time MUST/Processing of real-time messages MUST/

SECTION 6.2.1

OLD
it captures accented characters, Chinese, Arabic and other characters that require multiple key presses to compose.

NEW
it captures Unicode characters that require multiple key presses to compose or that necessitate the use of an input method editor.

OLD
text change events are more cross-platform portable, including on mobile phones.

NEW
text change events are more portable across platforms, including on mobile phones.

SECTION 6.4.3.1

s/full JID/occupant JID/

(at least that's what I think you mean)

SECTION 6.4.3.2

OLD
A good implementation of Message Retransmission will improve user experience, regardless of whether or not XEP-0296 is used (Best Practices for Resource Locking [14]).

NEW
A good implementation of Message Retransmission will improve user experience, regardless of whether or not the software follows Best Practices for Resource Locking [14].

SECTION 10.2

OLD
The load between participants using this specification in the recommended way, will cause a load that is only marginally higher than a user communicating without this specification.

NEW
Use of this specification in the recommended way will cause a load that is only marginally higher than a user communicating without this specification.

I have some even smaller issues of grammar and punctuation, but I can save those for a XEP Editor review before or after Last Call.

Thanks!

Peter

--
Peter Saint-Andre
https://stpeter.im/
