(Ooops -- I accidentally sent my last message early.)
(Please disregard the reply I sent 1 hour ago)

About XEP-0301 In-Band Real-Time Text
http://www.xmpp.org/extensions/xep-0301.html

On Tue, Mar 6, 2012 at 6:04 PM, Florian Zeitz <[email protected]> wrote:

> Hello standards-list, hello Mark, "inspired" by the recent discussions
> surrounding XEP-0301 (Real Time Text) I had a look over it's current status
> and felled I should provide some feedback. Some is rather minor some I'm
> somewhat more concerned about, I've ordered it sequentially rather than by
> importance.
>

Thanks for writing!  You have some excellent suggestions of areas to
clarify in the specification.


> Section 3: Glossary
> The "RTT" entry seems superfluous to me. I'd be better to just note the
> acronym in the "real-time text" entry. Also the remark about the
> element's name is misleading as that is generally lower-case.
>

The glossary has been simplified in the next version of the spec, amongst
several other simplifications that have now been done.


> Section 4.2.2: 'seq' attribute
> It seems to me the start of a session (and therefore when to reset this
> to 0) is not clearly defined, since the start and cancel events are
> purely optional. In a more general sense I'm somewhat concerned about
> the attempt to reimplement the transport layer on the application layer.
> (see below)
>

The 'seq' attribute was developed because it was found to be necessary in
practice:
-- Disconnect and reconnect cycles, including those caused by bad wireless
reception (WiFi, 3G, mobile phone).
-- Not all servers support offline message delivery.
-- BOSH failures in a web browser at the client side (i.e. intermittent
HTTP request failures).
-- And, less often, situations of extreme congestion have happened.
My XEP-0301 is intentionally designed to survive a wide variety of
situations that have actually happened in my tests.  This attribute is
critical to my applications, as well as Next Generation 9-1-1 experimental
demos.  See http://tools.ietf.org/html/draft-tschofenig-ecrit-xmpp-es-00 ...
"Emergency Services Functionality with the Extensible Messaging and
Presence Protocol (XMPP)", section 4.5 mentions XEP-0301 as one of the
possible functionalities that may become a part of NG9-1-1.  Real-time text
technology is also very useful for accessibility here, as it replaces the
need to carry around a teletypewriter (TTY).

We need to keep the 'seq' attribute, as it is essential for message
integrity during less-than-ideal situations.  I actually expanded it to
recommend an event='reset' once every 10 XMPP messages, to improve
resilience even further -- the latest version of RealJabber now does this.
(You can run the new RealJabber on two computers and test this concept by
disconnecting and reconnecting in the middle of a conversation while the
other person is still typing in real time in RealJabber; the real-time text
recovers automatically within a few seconds of reconnecting because of the
automatic event='reset' occurring at regular intervals.)

Also, very rarely (it only happened on BlackBerry's Google Talk client),
several XMPP messages transmitted at nearly the same time (e.g. during
network congestion) all get the same timestamp and are then delivered in a
different order than they were transmitted.  This should never happen, but
it occasionally does.  An earlier version of XEP-0301 was more complex,
having a 'msg' attribute for the message number.  That was removed, leaving
just one 'seq' value for simplicity.

-- 'seq' does not need to start at 0.  I'll eliminate that requirement
(clarification).
-- You can change the 'seq' value anytime there's an event='new' or
event='reset'.
Setting it back to 0 works, although I prefer not to, because of the
following danger: a user disconnects while seq='0'; meanwhile 'seq' is
incremented, reset back to '0', and incremented to '1'; the user then
reconnects and receives consecutive seq numbers that belong to totally
different real-time messages, some of which were never delivered.  In this
case, the wrong <rtt> will be displayed, resulting in rare text scrambling.
This actually happened once in my random testing, so I stopped resetting
seq back to 0 every time there was an event='reset'.
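
To make the receiving side of this concrete, here is a rough sketch in
Python.  It is not taken from RealJabber; the class and method names are
made up for illustration, and it assumes that 'seq' increments by 1 per
<rtt> and that an element with event='new' or event='reset' may carry any
starting value.

```python
class RttReceiver:
    """Tracks 'seq' for incoming <rtt/> elements from one sender (illustrative only)."""

    def __init__(self):
        # None means "out of sync": ignore edits until a new/reset arrives.
        self.expected_seq = None

    def accept(self, seq, event=None):
        """Return True if this <rtt/> should be applied to the displayed text."""
        if event in ("new", "reset"):
            # A new/reset element restarts the message, so any seq value is accepted.
            self.expected_seq = seq + 1
            return True
        if self.expected_seq is not None and seq == self.expected_seq:
            self.expected_seq += 1
            return True
        # Gap, duplicate, or out-of-order delivery: stop applying edits until resync.
        self.expected_seq = None
        return False


rx = RttReceiver()
results = [
    rx.accept(7, "new"),     # sender chose an arbitrary starting value
    rx.accept(8),
    rx.accept(10),           # seq 9 was lost, so this one is rejected
    rx.accept(11),           # still out of sync
    rx.accept(42, "reset"),  # periodic reset resynchronizes
    rx.accept(43),
]
print(results)  # [True, True, False, False, True, True]
```

Because the receiver never assumes a particular starting value, the sender
is free to keep counting upward across resets instead of returning to 0.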



> Section 4.2.3: 'event' attribute
> I feel the requirement for session start and cancel needs to be either
> tightened (if we decide we absolutely need it for this protocol) or
> removed. Having it truly optional makes it useless for detecting the
> actual session start and end IMHO. It's sole purpose appears to be
> mapping to SIP, which is a problem possible best handled separately.
>  The new and reset events appear to have been introduced under the
> assumption that messages get lost. If they are not the reset event
> can be safely removed and the new event is implicit upon receipt of a
> <body/> element.
>

You are right that 'start' and 'cancel' are not really required, so I may
be removing them.  However, I think that 'cancel' may still be necessary to
signal the other party to stop transmitting <rtt> for the remainder of the
chat session, in order to save bandwidth, whenever the recipient wants to
turn off RTT (e.g. via a button or switch, while in the middle of a
conversation).  So 'cancel' is still useful even if 'start' is not needed
(the start of RTT during a chat session is simply the first delivery of an
<rtt> element).



> Section 4.5.1: action elements
> The normative text in this section should be further explained.
> E.g.: What is REQUIRED for the <t/> element? Support, inclusion in each
> <rtt/> element, etc. (It is relatively clear to me what you mean, I just
> wish it was somewhat more fleshed out)
> "A client conforming to this specification MUST accept <t/>, <e/> and
> <d/> elements and handle them as described in the following..."
>

Agreed, it should be clarified.


> Section 4.5.1.3: counting
> It appears to me that the rules for determining the position and count
> of code points are somewhat backwards. In particular if the sending
> client does perform any normalization before sending the counts need to
> be based on the normalized version since the receiving client can not
> undo such normalization (this is the opposite of what is described in
> the text). Also most of the described transformations are only relevant
> for display on screen and should not change the string.
> IMHO it should suffice to count code points based on what is send over
> the wire.
>

We have to keep "Unicode code points" (more on that later), but I agree that
the normalization paragraph is definitely confusing, so I've modified it to
the following wording:

   "For interoperability of p and n values, processing MUST be done on
the transmitted Unicode real-time message. For senders, this is the
version of the Unicode message text after any Unicode normalization,
emoticon graphics images conversion to Unicode, display text
formatting, processing of Unicode combining marks, etc. For recipients
obtaining text from the <t> element, this is the Unicode text immediately
after XML processing, and before any further processing. From the
perspective of p and n values, a real-time message is treated as an
editable array of Unicode code points."

Now, the reason why we have to keep "Unicode code points":
Section 9, "Internationalization Considerations", explains why XEP-0301
counts in code points:
Different programming platforms use different internal Unicode encodings,
which may be different from the transmission encoding (UTF-8) for XMPP.
-- Multiple Unicode code points may represent one displayable Unicode
character (e.g. combining marks).
Action elements operate on Unicode code points, not on displayable
characters.
-- Characters U+10000 through U+10FFFF are single code points, but
are represented as multiple surrogate code units in certain Unicode
encodings (e.g. UTF-16).
Action elements operate on Unicode code points, not on individual surrogate
code units.
-- Some Unicode encodings use a variable number of bytes per Unicode code
point (e.g. UTF-8).
Action elements operate on Unicode code points, not on individual bytes.
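
As a small illustration of the three cases above, here is a Python snippet
(my own example text, not from the spec) that counts the same string in
code points, UTF-16 code units, and UTF-8 bytes.  Python's str happens to
be indexed by code point, which is the unit the p and n values use.

```python
# 'e' + combining acute accent + one emoji outside the Basic Multilingual Plane.
# On screen this looks like two characters.
text = "e\u0301\U0001F600"

code_points = len(text)                           # unit used by p and n
utf16_units = len(text.encode("utf-16-le")) // 2  # what a 16-bit 'char' array holds
utf8_bytes = len(text.encode("utf-8"))            # what travels over the XMPP wire

print(code_points, utf16_units, utf8_bytes)  # 3 4 7
```

A sender counting in UTF-16 code units and a recipient counting in code
points would disagree about every position after the emoji.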

Real-time editing (mid-text inserts/deletes) of Unicode text containing
variable-length encodings causes major text scrambling if the recipient
and sender do not count positions in the same way.
So unfortunately, a standardization of counting is essential, especially
when international text is involved.

The 'char' of some programming languages is sometimes 8-bit, sometimes
16-bit, and sometimes 32-bit.
Often, XMPP libraries already pre-convert the text to different Unicode
encodings.
Not all of them have access to the original UTF-8 "wire" text, so I can't
depend on counting via "wire format" (UTF-8) unless I ask them to convert
everything back to UTF-8 before processing.  But that itself is a catch-22,
because in many programming languages String.Insert / String.Delete
operations don't operate on UTF-8.

Therefore, we decided it was necessary to standardize on
Unicode code points.
In addition, here's a flowchart diagram that may help understand the
Unicode preservation scenario better:
http://www.realjabber.org/flowchart_of_xmpp_rtt_path.pdf
(This file will need to be updated for the latest XEP-0301 draft)


> Section 4.5.2: action elements
> I'd like to hear some rational on why there is forward and backward
> delete. Both appear to be able to generate the same results.
>

-- Note that cursor position never changes with a forward delete operation.
-- The cursor position only changes if a backspace operation is done.
-- Subsequent action elements do not require knowledge of the cursor
position of preceding action elements.

However, it is true that the standard could still work with just one of the
two text-deleting codes (and instead using the <c/> element to correct any
cursor position).  I have seriously considered removing one of the two
codes.  However, we found that bandwidth use is more efficient when both
codes are kept.
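
To show why either delete code alone would be enough for the text itself,
here is a minimal Python sketch.  The function names are made up, and it
assumes that <e p='#' n='#'/> removes n code points before position p
(backspace) and that <d p='#' n='#'/> removes n code points starting at
position p (forward delete).  Cursor handling is deliberately left out.

```python
def insert_text(chars, p, text):
    """<t p='p'>text</t>: insert text at code point position p."""
    chars[p:p] = list(text)


def backspace(chars, p, n=1):
    """<e p='p' n='n'/>: remove n code points before position p (assumed semantics)."""
    del chars[max(p - n, 0):p]


def forward_delete(chars, p, n=1):
    """<d p='p' n='n'/>: remove n code points starting at position p (assumed semantics)."""
    del chars[p:p + n]


# Remove the first 'l' of "Hello" in two different ways.
via_backspace = list("Hello")
backspace(via_backspace, 3)            # cursor was after "Hel"

via_forward_delete = list("Hello")
forward_delete(via_forward_delete, 2)  # cursor was after "He"

print("".join(via_backspace), "".join(via_forward_delete))  # Helo Helo
```

The resulting text is identical; the two codes differ only in where the
sender's cursor was, which is what decides whether a cursor correction has
to be sent as well.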


> It did occur to me that they are meant to be used in conjunction with
> cursor display. However, it appears that this would cause interesting
> possible situations. E.g. what happens if a character is forward deleted
> at a position preceding the cursor. In that situation the absolute
> position of the cursor should move one to the left, but will instead
> move 1 to the right relative to the text (it might move over the right
> end). I'd prefer expected cursor position to always be transmitted
> explicitly in these cases and have either delete variant removed.
>

It actually optimizes bandwidth to have both codes available, because it
reduces the number of cursor-position-correcting <c/> elements transmitted.
 (<c/> is only transmitted when needed, i.e. cursor position is not where
it should be after the specific delete operation is done)

The source code that I wrote to comply with section 6.2.1 "Monitoring
Message Edits"
(http://xmpp.org/extensions/xep-0301.html#monitoring_message_edits)
is implemented at line 298 of RealTimeText.cs in RealJabber.  The
function EncodeRawRTT, viewable at
http://code.google.com/p/realjabber/source/browse/trunk/CSharp/RealTimeText.cs#298
intelligently decides whether to do an <e> or a <d> based on where the
cursor should go at the end (see lines 360-370 at the above hyperlink).
That said, if bandwidth weren't as important, I could very easily
remove either <e> or <d> and it would not make any difference to the end
user of RealJabber, since a cursor position correction is done.

If we keep <d> and eliminate <e> it means backspace operations might need
to be accompanied by a corrective cursor repositioning.
If we keep <e> and eliminate <d> it means delete key operations  might need
to be accompanied by a corrective cursor repositioning.
(Note: This is optional -- this is only for clients that decide to support
transmission / reception of cursor positioning)

Also, I've actually stopped using <c/> because I realized an empty <t
p='#'/> element does exactly the same thing as <c p='#'/>  ...
Therefore I am actually thinking of removing two action elements from the
next XEP-0301
- Remove <c/> because I can use empty <t/> to do exactly the same thing.
- Remove <g/> because I can use XEP-0224 instead successfully anyway.
That reduces the number of action elements to just 4, and it makes it easy
to merge Tier 1 with Tier 2 into one unified table for simplicity.
However, I've found enough reason to keep both <d> and <e> -- but our team
can still be swayed by further arguments against having both.


> Section 4.6: error recovery
> As mentioned before, the attempt to correct errors is my biggest concern
> about this XEP. For the case of reconnects it appears to me that the
> sending client will always be able to notice this situation and treat it
> as a new RTT session.
>

Error recovery is actually simpler than it looks -- it consumes less than
5% of the source code in RealTimeText.cs.
One of my business clients had serious problems without error recovery (we
had an attempt to make it optional), so we actually expanded Error Recovery
to RECOMMEND event='reset' at regular intervals, such as once every 10
<rtt> messages (or once every 10 seconds).  Also, sometimes you can't
detect online/offline transitions.  For example, the Google Talk network
sometimes can't see the online/offline status of jabber.org users, and you
can have RTT conversations with users that appear offline (e.g. invisible).
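
The sending side of the periodic reset is small.  Here is a rough Python
sketch, not the RealJabber code: the class name, the tuple layout, and the
choice to count messages rather than seconds are all assumptions made for
illustration.

```python
class RttSender:
    """Decides when an outgoing <rtt/> should be an event='reset' (illustrative only)."""

    RESET_EVERY = 10  # ordinary <rtt/> elements sent between resets

    def __init__(self, first_seq=0):
        self.seq = first_seq
        self.sent_since_reset = 0

    def next_element(self, full_text, edits):
        """Return (seq, event, payload) for the next outgoing <rtt/>."""
        if self.sent_since_reset >= self.RESET_EVERY:
            # Retransmit the whole message so a recipient that missed
            # something can catch up without any acknowledgement.
            event, payload = "reset", full_text
            self.sent_since_reset = 0
        else:
            event, payload = None, edits
            self.sent_since_reset += 1
        element = (self.seq, event, payload)
        self.seq += 1  # keep counting upward; never jump back to 0
        return element


sender = RttSender(first_seq=100)
sent = [sender.next_element("Hello", ["<t>o</t>"]) for _ in range(12)]
events = [event for (_seq, event, _payload) in sent]
print(events.index("reset"))  # 10
```

The sender does not need to know whether the recipient dropped off; the
reset goes out regardless, which covers the case where online/offline
status can't be detected.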


> Section 6.4.1: message length limit
> In the second example with the split messages I would not have expected
> an empty <rtt/> element. If that is actually intended to indicate the
> <body/> is part of the RTT session this should be mentioned elsewhere.
>

Yes, that was the intent.  This will need to be clarified.
It's not essential, but it is a very useful indicator.

Your comments are useful.  I'd appreciate hearing back from you, and you
are very welcome to run RealJabber (www.realjabber.org) and test it out
with me -- email me privately to make an appointment -- I would appreciate
your comments, given your useful insights.

Thanks!
Mark Rejhon
