Mark, et al.,

I re-read the draft, and here are my comments. After I wrote all of this, I thought it sounds like I'm pounding on everything. Overall, the text was great; it has come a long way. That said, here are my comments and questions:

Section 1:

"and is favored by deaf and hard of hearing individuals who prefer text conversation"

I suggest you strike the above. What I have been told several times before is that deaf people do not want special equipment; they want to use mainstream technology. While the above text is true, I do not want anyone to view this extension as "something for accessibility" and dismiss it for that reason. It is an extension with wide applicability that I suspect people would appreciate if it were there. I communicate with some people today who type partial messages and hit ENTER just to move the discussion along faster. That is evidence to me of broad utility.

I would also get rid of most of the examples of various prior implementations. It's somewhat more balanced, but still tilted toward accessibility, since 3 of the 6 examples are for that. It would be useful to mention "talk" and ICQ, since those are IP-based text messaging systems that closely parallel XEP-0301. I do have a concern about mentioning ICQ by name. Since AOL did this, perhaps say 'UNIX "talk" and some proprietary instant messaging systems'.

Regarding this:

"Real-time text is suitable for smooth and rapid mainstream communication in text, as an all-inclusive technology to complement instant messaging. It can also allow immediate conversation in situations where speech cannot be used (e.g. quiet environments, privacy, deaf and hard of hearing). Real-time text is also beneficial in emergency situations, due to its immediacy. For a visual animation of real-time text, see Real-Time Text Taskforce [5]."

I would suggest these slight changes:

"Real-time text is suitable for smooth and rapid communication, complementing the existing en bloc mode for sending text messages. It also allows for immediate conversation in situations where speech cannot be used (e.g.
quiet environments, privacy, deaf and hard of hearing). Real-time text is also beneficial in emergency situations, due to its immediacy. For a visual animation of real-time text, see Real-Time Text Taskforce [5]."

Most importantly, we don't need to say it's suitable for mainstream use. Saying so suggests that it somehow might not be. And this does complement IM, but it definitely does not replace it or operate separately from it in some way; I don't want any misinterpretation. XMPP currently delivers messages en bloc, and this extension adds a means of giving the existing XMPP message delivery a real-time feel. "En bloc" may not be the preferred term, but I don't want folks to assume this sits alongside (separate from) IM.

Section 2:

"Next Generation 9-1-1 / 1-1-2 emergency services"

This leaves out so many countries; it is very America/Europe-centric. See http://en.wikipedia.org/wiki/Emergency_telephone_number. Should we just get rid of "9-1-1 / 1-1-2"? The point is that you want this for next generation emergency services anywhere in the world. I would not capitalize "Next Generation", either.

Section 4:

Showing that <body> is not included in the previous message containing <rtt> in Example 1 might lead people to believe this is expected. I would suggest making the first example one that has <body> at the end, since I suspect that will be the typical case. Perhaps a word about this somewhere might be useful (if not already covered).

Section 4.2.1:

Why is "seq" only 31 bits? Since the same memory is consumed for 31 or 32 bits, why not just make it an unsigned 32-bit integer? And why worry about wrap-around? I would allow it to occur. Specify the behavior.

Section 4.2.2:

One value of "init" is that it would remove any ambiguity related to the "seq" value. The "seq" value could always start at 1 if "init" were required. The problem with "init", though, is that if a sender sends three messages one after the other, the first two might go to client A and the last one might go to client B.
This would happen if I have two XMPP clients connected to the server and I disconnect one. Therefore, "init" and "cancel" seem pointless. I'd suggest getting rid of them entirely. I like having "new", since the Client B I refer to would know that if it gets an <rtt> that is not "new", it must be a message somewhere in the middle of typing. It can just ignore those until it gets a <body>, then pick up with RTT on the next <rtt event="new">.

Section 4.2.3:

XEP-0308 specifies use of "id" in <message> and <replace>. Could we not just use "<replace>" along with "<rtt>"? It would require some text in XEP-0308 that says that if <replace> is received without <body>, it shall be ignored. In -0301, it would not be ignored. "id" works, but I would not immediately recognize what that was for if I had not read this part of the spec.

Section 4.4:

"be approximately 0.7 second" --> "be approximately 0.7 seconds"

I would even suggest saying 700ms, as I think that reads better.

Section 4.5.1:

"Wait n thousandths of a second."

I would prefer "wait n milliseconds", especially since the wait time might be 2300ms or more, for example.

Section 4.5.2:

"default value of n MUST be 1" --> "default value of n is 1"

"For the purpose of this specification, the word "character" represents a single Unicode code point. See Unicode Character Counting."

Shouldn't the above be moved to Section 3?

Section 4.5.3.1:

"Support the transmission" --> "Supports the transmission"

Section 4.5.3.2:

"Support the behavior of Backspace" --> "Supports the behavior of backspace"

Section 4.5.3.3:

Suggest changing:

"Allow the transmission of intervals, between real-time text actions, to support the pauses between key presses."

To:

"Allow for the transmission of intervals between real-time text actions to recreate pauses between key presses."
"Wait n thousandths of a second" --> "Wait n milliseconds"

Question on this:

"Also, if a Body Element arrives, pauses SHOULD be interrupted to prevent a delay in message delivery."

Do you want to prevent a delay or realize a delay? I believe you want the entire <rtt> element to be fully processed, including delays, before acting on <body>. I'm not sure how to word that, but the above sentence was not clear to me.

Section 4.7:

"non-compliant servers that modifies messages" --> "non-compliant servers that modify messages"

Section 4.7.2:

"line breaks MUST be treated as a single character, if line breaks are used within real-time text." --> "any line breaks MUST be treated as a single character."

Section 6.2.1:

I think the activation logic is complex. Let each user turn it on or off as he sees fit. If you send <rtt> tags to my client, whether that gets rendered or not depends on my local settings. I don't see a strong need to negotiate this. Just always send <rtt> and display it (if received) whenever the user enables RTT.

Section 6.3:

Whether there is a visible cursor or not, the client has to take steps to render text properly. Since a cursor is not something sent via the protocol, I see no point in talking about it. I'd remove this section.

Section 6.4.4:

I'm not sure what this is telling me. Why are <t> and <e> "unsuitable for most general-purpose clients"? And why encourage a device to use reset rather than provide more complete support? We know rendering is the bigger challenge, but receivers must accept what is sent. I see no reason to suggest a sender be lazy. I'd suggest removing this section unless there is something here of high value that's going over my head.

Section 7.4.2:

It seems that all of the examples show <w> used between every key press. However, if sampling the input buffer (as recommended earlier in the text), one may not know the time between keystrokes.
Perhaps the device samples the buffer and sees:

"a"
"app"

This would translate to:

<rtt><t>a</t></rtt>
<rtt><t>a</t><w n="100"/>pp</rtt>

Right?

Related to <w>, suppose I type "h" and then "e" with about a 100ms delay. Further, suppose the IM client's 700ms timer fires and sends "h" on the wire like this:

<rtt><t>h</t></rtt>

Now, the client restarts the 700ms timer, after which time it sends:

<rtt><w n="100"/><t>e</t></rtt>

Is this correct? So, there was a 700ms "collection" delay, some message transmission delay (perhaps 100 or 200ms), and then an artificial delay of 100ms inserted. So, between "h" and "e", the user might actually wait 700+200+100 = 1000 milliseconds? Or does the receiver maintain a running clock, so that as soon as the message arrives, it sees that w=100, but its internal "wait timer" is already at 700+200ms, and it displays "e" immediately? (I assume this is the case, and it should be described.)

Section 8:

As was mentioned in one discussion thread, H.323 also supports RFC 4103, so it might be useful to mention H.323 here, too.

Section 9:

How does XMPP indicate that a message should be displayed LTR or RTL? Is that derived from the language indicated in the <body> tag? This is legal:

<body xml:lang="en">This would display left-to-right</body>

In any case, we do need to ensure we capture directionality for languages like Hebrew.

Paul
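
Postscript on Section 7.4.2: to make the timing question concrete, here is a rough Python sketch of the two receiver behaviors described above. It is only an illustration under stated assumptions: the function names are hypothetical and do not come from the draft, and the 700ms, 200ms, and 100ms figures are the ones from the "h"/"e" example, with 200ms being a guess at transmission delay.

```python
def naive_delay_ms(collect_ms, transit_ms, wait_ms):
    """Receiver starts the <w/> wait only once the element arrives.

    The wait is added on top of the collection and transmission delays.
    """
    return collect_ms + transit_ms + wait_ms


def running_clock_delay_ms(collect_ms, transit_ms, wait_ms):
    """Receiver keeps a running wait timer (assumed model, not from the draft).

    Time that has already passed when the element arrives counts toward the
    wait, so only the remainder (if any) is still waited out.
    """
    elapsed_on_arrival = collect_ms + transit_ms
    remaining_wait = max(0, wait_ms - elapsed_on_arrival)
    return elapsed_on_arrival + remaining_wait


# Figures from the example: 700ms collection, 200ms transit, <w n="100"/>.
naive = naive_delay_ms(700, 200, 100)
running = running_clock_delay_ms(700, 200, 100)
print(naive, running)
```

With these figures the first model gives 1000ms and the second gives 900ms, which is the "displays e immediately on arrival" case. The two models only agree when the wait is longer than the time already elapsed.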

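
Postscript on Section 4.2.1: if wrap-around of "seq" is simply allowed, the behavior to specify could be as small as the following Python sketch. This assumes "seq" becomes an unsigned 32-bit integer, which is the suggestion above and not what the draft currently says (31 bits). The names SEQ_MODULUS, next_seq, and is_next_in_sequence are made up for illustration.

```python
SEQ_MODULUS = 2 ** 32  # assumption: "seq" is an unsigned 32-bit integer


def next_seq(seq):
    """Return the sequence number that follows seq, wrapping to 0 at 2**32."""
    return (seq + 1) % SEQ_MODULUS


def is_next_in_sequence(last_seq, received_seq):
    """True if received_seq directly follows last_seq, including across a wrap."""
    return received_seq == next_seq(last_seq)


# The largest value is followed by 0 instead of being treated as an error.
print(next_seq(2 ** 32 - 1))
print(is_next_in_sequence(2 ** 32 - 1, 0))
```

A receiver that compares this way never needs to treat wrap-around as a special case; a gap in the sequence is detected the same way before and after the wrap.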