I've had a chance to review XEP-0301 in some detail.

First, overall I think it is in good shape. I don't have any major
concerns, and I think it is appropriate for advancement on the standards
track at the XSF.

However, I do have some comments, which I've grouped into technical,
editorial, and nits.


### 1. TECHNICAL ###


In Section 4.6.4, why is it a SHOULD to retransmit a partially composed
message in the circumstances enumerated there? I don't see what
difference this makes regarding interoperability.

The spec provides few guidelines about when to send multiple <message/>
stanzas vs. when to include multiple actions in the same <message/>
stanza (e.g., see the difference between the examples in Section 7.2 and
Section 7.3). I was expecting to find some text about this in the
section on Congestion Control. A similar concern applies to this text
from Section 6.4.1: "For long messages, the final <rtt/> transmission
may be made in a separate <message/> than the <message/> containing the
<body/>." Why? When is this appropriate and when not?

There's some confusion over internationalization terminology. I suggest
that the author read RFC 6365. I also provide specific comments below.

The use of conformance terms (MUST/SHOULD/MAY) is inappropriate in the
Implementation Notes, and in general is being used with regard to user
interface issues, not protocol issues that have an impact on
interoperability. I fully realize that user interface issues are
important to the author, but it's not correct to use conformance
terminology here. I provide detailed suggestions in the next section of
my comments. Also, personally I strenuously avoid lowercase versions of
the RFC 2119 conformance terms these days to avoid any possible
confusion. Thus I tend to change "may" to "might" or "can" or "it is
acceptable to", "should" to "ought to" or "it is best to", "must" to
"needs to", etc. I commend this convention to the author.

In Section 6.1.3, we find that "If additional accuracy is required, it
is also possible to timecode the <rtt/> elements." How? Is this a matter
for implementation? Is it out of scope for this specification?

The schema does not define the allowable values for the 'event'
attribute. Are "new", "reset", and "cancel" the only values? Is there a
default value (I assume not)?

Also, it seems correct in the schema to set the default value of the 'n'
attribute to "1".
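To illustrate why the schema matters here, the following is a minimal
Python sketch of what a receiver has to hard-code when the schema is
silent. It assumes that "new", "reset", and "cancel" are the only
allowed 'event' values, that 'event' has no default, and that 'n'
defaults to "1" on an action element such as <e/>. The helper names
read_event and read_n are hypothetical, and the sample stanza is
invented for illustration.

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"
# Assumption: only the three values named in the prose are allowed.
KNOWN_EVENTS = {"new", "reset", "cancel"}

def read_event(rtt):
    """Return the 'event' value, or None when the attribute is absent (no default assumed)."""
    event = rtt.get("event")
    if event is not None and event not in KNOWN_EVENTS:
        raise ValueError(f"unexpected event value: {event!r}")
    return event

def read_n(action):
    """Return 'n' as an int, applying an assumed default of 1 when it is omitted."""
    return int(action.get("n", "1"))

# Invented sample: an <rtt/> element with one insert and one erase action.
rtt = ET.fromstring(
    f'<rtt xmlns="{RTT_NS}" seq="0" event="new"><t>Hello</t><e p="5"/></rtt>'
)
erase = rtt.find(f"{{{RTT_NS}}}e")
print(read_event(rtt), read_n(erase))  # prints: new 1
```

If the schema enumerated the 'event' values and declared the default
for 'n', a validating parser could do both checks itself.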

The security considerations say nothing about the use of this protocol
with end-to-end encryption of whatever flavor (XEP-0027, RFC 3923,
XEP-0116, OTR, XTLS, xmlenc, draft-miller-xmpp-e2e, etc.). That seems
like a fairly significant oversight.

As to congestion control, it's probably a good idea to look again at
Section 9 of RFC 4103 for ideas about more detailed suggestions
(although the author has probably already done so).


### 2. EDITORIAL ###


Most of these are suggestions of varying weight. The author is free to
ignore them; however, I think most of them make eminent sense and deserve
to be strongly considered.

SECTION 2

I think it makes sense to cite ITU-T T.140 here.

SECTION 4.2.2

The value of "reset" is specified as must-implement, but there is no
such statement about "new" and "cancel". Please clarify.

SECTION 4.5.5

Please see RFC 6365 regarding the terminology here. For example, I think
you want:

s/glyph/character/
s/character glyphs/characters/
s/surrogate code units/surrogate pairs/

The text here says that "calculations of p and n values MUST be based on
Unicode code points". Are you sure that you mean code points? Given that
XMPP mandates the use of UTF-8, I think it would be safer and easier to
say "UTF-8-encoded code points" (the point about "Some Unicode encodings
use a variable number of bytes per Unicode character" is true but
hopefully irrelevant here). Also, it's not always so simple to say
exactly which code point you're supposed to count, because Unicode
normalization could come into play -- see my presentation on
"Internationalization: A Guide for the Perplexed" at
https://stpeter.im/files/i18n-intro.pdf for many examples, but a simple
one is IV (Latin capital letter "I" plus Latin capital letter "V") vs. Ⅳ
(Roman numeral four).
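
To make the counting question concrete, here is a minimal Python sketch
(an illustration only, not text proposed for the spec) showing that the
same string yields different lengths depending on the unit counted, and
that compatibility normalization changes the number of code points. The
helper name lengths is hypothetical.

```python
import unicodedata

def lengths(text):
    """Return (code points, UTF-16 code units, UTF-8 bytes) for text."""
    code_points = len(text)                           # Python str is indexed by code point
    utf16_units = len(text.encode("utf-16-le")) // 2  # a surrogate pair counts as two units
    utf8_bytes = len(text.encode("utf-8"))
    return code_points, utf16_units, utf8_bytes

# U+1F600 lies outside the Basic Multilingual Plane.
print(lengths("a\U0001F600"))  # prints: (2, 3, 5)

# U+2163 (ROMAN NUMERAL FOUR) is one code point, but NFKC maps it to "IV".
roman_four = "\u2163"
print(len(roman_four), len(unicodedata.normalize("NFKC", roman_four)))  # prints: 1 2
```

A sender and receiver that count in different units, or that normalize
differently, would disagree on p and n for such strings.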

SECTION 4.6.2

OLD
An indicator MAY be used by the recipient to indicate the loss of sync.

NEW
A client might want to show an indicator to indicate the loss of sync.

(That is: more interface suggestions, no interoperability impact.)

SECTION 6

As mentioned, there's a lot of conformance language here that simply
doesn't belong. Here are my suggested changes.

OLD
Senders with bursty output MAY immediately transmit word bursts of text
without buffering.

NEW
It is acceptable for senders with bursty output to immediately transmit
word bursts of text without buffering.

OLD
It is NOT REQUIRED to monitor or transmit Element <w/> – Interval for
transcription.

NEW
It is not necessary to monitor or transmit Element <w/> – Interval for
transcription.

OLD
Clients MAY optimize for bandwidth, performance and/or screen repaints
by eliminating, merging, or ignoring Element <w/> – Interval
selectively, especially those containing shorter intervals.

NEW
Clients can optimize for bandwidth, performance and/or screen repaints
by eliminating, merging, or ignoring Element <w/> – Interval
selectively, especially those containing shorter intervals.

OLD
The transmission interval of <rtt/> MAY also vary, either intentionally
for optimizations, or due to precision limitation.

NEW
It is acceptable for the transmission interval of <rtt/> elements to
also vary, either intentionally for optimizations, or due to precision
limitations.

OLD
Clients MAY choose to implement alternate text-smoothing methods

NEW
Clients might choose to implement alternate text-smoothing methods

OLD
Processing of intervals (<w/> elements) SHOULD be done via non-blocking
programming techniques.

NEW
It is best to process intervals (<w/> elements) via non-blocking
programming techniques.

OLD
Upon receiving a <message/> containing <body/> indicating a completed
message, the full message SHOULD be displayed immediately in place of
the real-time message, and unprocessed action elements cleared from the
playback queue.

NEW
Upon receiving a <message/> containing <body/> indicating a completed
message, the full message ought to be displayed immediately in place of
the real-time message, and unprocessed action elements cleared from the
playback queue.

OLD
If the playback queue contains too much delay in <w/> elements (i.e.
<w/> elements from two <rtt/> transmissions ago), the recipient client
MAY ignore or shorten the intervals of <w/> elements, to allow lagged
real-time text to "catch up" more quickly.

NEW
If the playback queue contains too much delay in <w/> elements (i.e.
<w/> elements from two <rtt/> transmissions ago), the recipient client
can ignore or shorten the intervals of <w/> elements, to allow lagged
real-time text to "catch up" more quickly.

OLD
Recipient clients MAY choose to display a cursor (or caret) within
incoming real-time messages.

NEW
Recipient clients might choose to display a cursor (or caret) within
incoming real-time messages.

OLD
The remote cursor SHOULD be clearly distinguishable from the sender's
real local cursor.

NEW
The remote cursor ought to be clearly distinguishable from the sender's
real local cursor.

OLD
Whenever the cursor is moving without any text modifications (via arrow
keys or mouse), the sender MAY transmit extra Element <t/> – Insert Text
with an empty string to update the remote cursor position via attribute p.

NEW
Whenever the cursor is moving without any text modifications (via arrow
keys or mouse), it is acceptable for the sender to transmit extra
Element <t/> – Insert Text with an empty string to update the remote
cursor position via attribute p.

OLD
Real-time text MAY be accompanied with XEP-0085 Chat State Notifications
[12].

NEW
Real-time text can be used in conjunction with XEP-0085 Chat State
Notifications [12].

OLD
Support for real-time text in MUC is OPTIONAL,

NEW
It can be appropriate to use real-time text in the context of a MUC room,

Note: optional/appropriate for what kinds of implementations? Senders?
Receivers? MUC servers? What exactly is meant here by "support"?

OLD
For MUC, the RTT Element event attribute value of 'cancel' SHOULD NOT be
used.

NEW
In MUC rooms, senders ought not generate 'event' attributes with a value
of "cancel", and receivers ought to ignore such values.

OLD
Software MAY hide idle real-time messages to minimize on-screen clutter
when more than one person is typing. Congestion control MAY also be
used, via automatic adjustment of Transmission Interval, see Congestion
Considerations.

NEW
It is appropriate for software to hide idle real-time messages in order
to minimize on-screen clutter when more than one person is typing.
Implementers are also encouraged to use congestion control via automatic
adjustment of Transmission Interval, see Congestion Considerations.

OLD
Any combination of audio, video, and real-time text MAY be used together
simultaneously.

NEW
Any combination of audio, video, and real-time text can be used together
simultaneously.

Similarly, at the end of Section 7.9, remove all conformance terms from
the bullet points: the conformance language is covered elsewhere so it is
unnecessary here.

SECTION 8.1

It is not the place of this specification to make recommendations beyond
this protocol. Therefore:

OLD
It is noted there is also another real-time text standard (RFC 4103,
IETF RFC 5194 [17]), used for SIP sessions with real-time text. In the
situation where an implementor needs to decide which real-time text
standard to use, it is generally recommended to use the real-time text
specification of the specific session control standard in use for that
particular session. This varies from implementation to implementation.
For example, Google Talk network uses XMPP messaging for instant
messages sent during audio/video conversations. Therefore, in this
situation, it is recommended to use this XEP-0301 specification to add
real-time text functionality. However, there are other situations where
it is necessary to support multiple real-time-text standards, and to
interoperate between the multiple real-time text standards.

NEW
It is noted there is also another real-time text standard (RFC 4103,
IETF RFC 5194 [17]), used for SIP sessions with real-time text. In the
situation where an implementor needs to decide which real-time text
standard to use, it makes sense to use the real-time text specification
of the specific session control standard in use for that particular
session. This varies from implementation to implementation. For example,
the Google Talk network uses XMPP messaging for instant messages sent
during audio/video conversations. Therefore, in this situation, it makes
sense to use this XEP-0301 specification to add real-time text
functionality. However, there are other situations where it is necessary
to support multiple real-time-text standards, and to interoperate
between the multiple real-time text standards.

SECTION 8.2

It might be worthwhile to reference here the (expired) Internet-Drafts
that already define mapping of addresses and signalling between SIP and
XMPP: draft-saintandre-sip-xmpp-core and draft-saintandre-sip-xmpp-media.

SECTION 9

Here again please look at RFC 6365. In particular, I think you might
mean "scripts" instead of "languages" (or, to be safe, "languages/scripts").


### 3. NITS ###


Throughout the text, "i.e." is used when I think the author means
"e.g.". Please double-check all instances.

Please expand acronyms on first use (e.g., CART).

SECTION 1

s/deaf/hearing impaired/ (?)

Perhaps also mention that RTT functionality is beneficial in emergency
situations.

SECTION 2

s/transversal/traversal/

SECTION 4.1

s/Transmission of <rtt/> occurs/Transmission of the <rtt/> element occurs/

s/“urn:xmp:rtt:0”/“urn:xmpp:rtt:0”/

SECTION 4.2.1

The order of sentences is a bit confusing. I suggest...

OLD

###

This REQUIRED attribute is a counter to maintain the integrity of a
real-time message. Senders MUST increment the seq attribute by 1 for
each subsequent <rtt/> transmitted. Recipients MUST monitor the seq
value to verify that it is incrementing. For more info, see Automatic
Recovery of Real-Time Text.

The bounds of seq is 31-bits, the range of positive values of a signed
integer. The exception to the incrementing rule is <rtt/> elements with
an 'event' attribute. In this case, senders MAY use any seq value as the
new starting value. For best integrity, seq SHOULD be randomized. The
new starting value SHOULD be less than 1 million to allow plenty of
incrementing room, and to keep <rtt/> compact.

###

NEW

###

This REQUIRED attribute is a counter to maintain the integrity of a
real-time message (its bounds are 31-bits, the range of positive values
of a signed integer).

Senders MUST increment the seq attribute by 1 for each subsequent <rtt/>
transmitted, except when the 'event' attribute has a value of "new". In
this case, senders MAY use any seq value as the new starting value. For
best integrity, the starting value of seq SHOULD be randomized when
initializing a new sequence. In addition, the new starting value SHOULD
be less than 1 million to allow plenty of incrementing room, and to keep
<rtt/> compact.

Recipients MUST monitor the seq value to verify that it is incrementing.
For further details, see Automatic Recovery of Real-Time Text.

###
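
As an illustration of the reordered text, here is a minimal Python
sketch of the seq rules, following the NEW wording above: the starting
value is random and below 1 million, and only an 'event' value of "new"
restarts the sequence. The names new_start and SeqMonitor are
hypothetical, and what a recipient does after detecting a mismatch is
left out.

```python
import secrets

SEQ_MAX = 2**31 - 1      # 31-bit bound: largest positive signed 32-bit integer
START_LIMIT = 1_000_000  # suggested ceiling for a new starting value

def new_start():
    """Pick a random starting seq below one million."""
    return secrets.randbelow(START_LIMIT)

class SeqMonitor:
    """Recipient-side check that seq increments by 1 between <rtt/> elements."""

    def __init__(self):
        self.expected = None  # None until a starting value has been seen

    def accept(self, seq, event=None):
        """Return True if in sync; an event="new" element restarts the sequence."""
        if not 0 <= seq <= SEQ_MAX:
            return False
        if event == "new":
            self.expected = seq + 1
            return True
        if seq != self.expected:
            return False
        self.expected = seq + 1
        return True

start = new_start()
monitor = SeqMonitor()
# The third element skips a value, so the monitor reports loss of sync.
print(monitor.accept(start, "new"), monitor.accept(start + 1), monitor.accept(start + 3))
# prints: True True False
```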

SECTION 4.3

OLD
Upon receipt of <body/>, the message becomes permanent and can not be
edited any further.

NEW
Upon receipt of a message stanza containing a <body/> element, the
message becomes permanent and cannot be edited any further using this
protocol.

SECTION 4.3.1

OLD
4.3.1 Backwards Compatible
The real-time text standard simply provides early delivery of text
before the <body/> element. The <body/> element continues to follow the
XMPP Core [7] standard. Clients that do not support real-time text, will
continue to behave normally, displaying complete lines of messages as
they are delivered.

NEW
4.3.1 Backward Compatibility
The real-time text protocol simply provides early delivery of text
before the <body/> element. The <body/> element continues to follow the
XMPP Core [7] specification. In particular, because XMPP implementations
need to ignore XML elements they do not understand, clients that do not
support real-time text will continue to behave normally, displaying
complete lines of messages as they are delivered.
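
The behavior described in the NEW text can be sketched in Python as
follows. This is an illustration only: the function visible_text is
hypothetical, the stanzas and addresses are invented, and a real client
would use its XMPP library's stanza handling rather than parsing
strings. The point is that a client that looks only for <body/> never
sees the <rtt/> child at all.

```python
import xml.etree.ElementTree as ET

def visible_text(stanza_xml):
    """Return the <body/> text a legacy client would show, or None if there is none."""
    message = ET.fromstring(stanza_xml)
    body = message.find("{jabber:client}body")  # unknown children are simply not looked up
    return None if body is None else body.text

# Invented stanza carrying only real-time text.
partial = (
    '<message xmlns="jabber:client" to="juliet@example.com">'
    '<rtt xmlns="urn:xmpp:rtt:0" seq="1"><t>Hel</t></rtt>'
    '</message>'
)
# Invented stanza carrying the completed message.
final = (
    '<message xmlns="jabber:client" to="juliet@example.com">'
    '<body>Hello</body>'
    '</message>'
)
print(visible_text(partial), visible_text(final))  # prints: None Hello
```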

SECTION 4.4

OLD
For the best balance between interoperability and usability, the
transmission interval of <rtt/> for a continuously-changing message
SHOULD be approximately 0.7 second.

NEW
For the best balance between interoperability and usability, the
transmission interval of <rtt/> elements for a continuously-changing
message SHOULD be approximately 0.7 second.

Note: This one is borderline usability advice. However, since it has
implications for congestion control, I think it's acceptable to include it.

SECTION 4.5.4

s/a compliant XML processor already do this/compliant XML processors
already do this/

SECTION 4.6.3

s/Processing of real-time MUST/Processing of real-time messages MUST/

SECTION 6.2.1

OLD
it captures accented characters, Chinese, Arabic and other characters
that require multiple key presses to compose.

NEW
it captures Unicode characters that require multiple key presses to
compose or that necessitate the use of an input method editor.

OLD
text change events are more cross-platform portable, including on mobile
phones.

NEW
text change events are more portable across platforms, including on
mobile phones.

SECTION 6.4.3.1

s/full JID/occupant JID/ (at least that's what I think you mean)

SECTION 6.4.3.2

OLD
A good implementation of Message Retransmission will improve user
experience, regardless of whether or not XEP-0296 is used (Best
Practices for Resource Locking [14]).

NEW
A good implementation of Message Retransmission will improve user
experience, regardless of whether or not the software follows Best
Practices for Resource Locking [14].

SECTION 10.2

OLD
The load between participants using this specification in the
recommended way, will cause a load that is only marginally higher than a
user communicating without this specification.

NEW
Use of this specification in the recommended way will cause a load that
is only marginally higher than a user communicating without this
specification.

I have some even smaller issues of grammar and punctuation, but I can
save those for a XEP Editor review before or after Last Call.

Thanks!

Peter

-- 
Peter Saint-Andre
https://stpeter.im/


