Re: [Standards] Review: XEP-xxxx: In-Band Real Time Text

Gregg Vanderheiden Wed, 02 Mar 2011 04:00:52 -0800

For those new to the concept of real-time text the following notes can provide 
a better understanding of some aspects.

Why real-time text is important 

Real-time text is an important medium that allows smooth and rapid text 
communication between users, often in combination with other media.  This is 
important in daily communication for those who must rely on text to 
communicate and is even more important in emergency situations.

While it is critical to some users, it is also handy for, and liked by, some 
mainstream users. Others prefer messaging and have no use for or desire to use 
real-time text. Some use it for some communications but not others.  So in 
implementations it should be easy to turn on and off -- and be off by 
default when shipped.

Real-time text provides faster communication because the person receiving the 
message receives it as it is being generated in real time, in the same manner 
as people who are listening to speech or watching someone sign. When the person 
finishes speaking the other person can immediately start responding rather than 
wait for the message to be received and read before being able to respond.

Real-time text can also be sent in parallel with speech (captioned telephony is 
one example of this). This allows the person to listen and still fully 
understand what is being said, even if their hearing or the noise in the 
background (or a combination of the two) would otherwise prevent them from 
accurately hearing instructions or important details in what is being said to 
them.

In emergency situations real-time text has two additional advantages over 
messaging.  First, it allows the 911 operator 
to quickly see if the other person is typing the wrong information rather than 
the information most needed by the 911 operator to determine what type of help 
is needed and where it needs to be sent. The 911 operator can then interrupt 
the user early in their message and get them focused on the essentials. Second, 
it is not uncommon for someone sending a 911 message to be cut off before they 
finish their first utterance. They may pass out, be pulled away by circumstance 
or be prevented from finishing the sentence by their abuser or attacker. “help 
I am having a heart a....” “Someone is breaking into my h...” With speech, 
sign, or real-time text the message is received up until the point where it is 
interrupted.  And the sudden interruption and lack of response also give 
the 911 operator additional information. With messaging-based text, nothing is 
sent if the first utterance is not fully completed and sent.

Use of Delay Codes
Here are a couple of things that might make the importance of the delay codes 
in the real-time text XEP-xxxx clearer.

For those who can only speak in text, the delay codes allow the actual typing 
cadence to be viewed by the recipient (smoothing will not allow this).  This 
allows people to communicate with emphasis, and it is even possible to 
recognize a person's typing from this (like recognizing a voice).  You can also 
recognize mood and condition.  It is sort of the difference between hearing a 
person with intonation and voice quality vs. just seeing a transcript of what 
they say. 

This is actually recognizable by others as well, but it is particularly 
important for conversation.  And once people get accustomed to using them, you 
will find that you can (when you want to) use the way you type to add emphasis 
or intonation to your typing. 
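
As a rough illustration of the difference (a sketch only, and not the wire 
format of the proposed XEP): assume each received batch carries pairs of 
(delay in milliseconds, text typed).  The tuple layout and the names 
playback_schedule and burst_schedule are made up for this example.  With delay 
codes the receiver can show each piece at the moment the sender typed it; 
without them the whole batch appears at once.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation, NOT the XEP's encoding: (delay_ms, text), where
# delay_ms is the pause the sender made before typing `text`.
Keystroke = Tuple[int, str]


def playback_schedule(batch: Iterable[Keystroke]) -> List[Tuple[int, str]]:
    """With delay codes: return (offset_ms, text_shown_so_far) for each step."""
    schedule = []
    offset = 0
    shown = ""
    for delay_ms, text in batch:
        offset += delay_ms          # wait as long as the sender paused
        shown += text
        schedule.append((offset, shown))
    return schedule


def burst_schedule(batch: Iterable[Keystroke]) -> List[Tuple[int, str]]:
    """Without delay codes: the whole batch is shown as soon as it arrives."""
    return [(0, "".join(text for _, text in batch))]


if __name__ == "__main__":
    # The long pause before the first "!" is the kind of cadence that is lost
    # when the batch is rendered in one burst.
    batch = [(0, "N"), (120, "o"), (900, "!"), (150, "!")]
    for offset_ms, shown in playback_schedule(batch):
        print(offset_ms, shown)
    print(burst_schedule(batch))
```

A real client would sleep or set timers for each offset instead of returning a 
list; the list form just makes the timing easy to inspect.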

Should vs Optional
SHOULD was used because, although these features can be safely omitted (the 
technology would still work in message mode), they are important to certain 
classes of users.  So we should include these features in mainstream 
applications - both so that people who need them can use mainstream apps and so 
that they can call everyone else who is using the mainstream apps. 

Hope this is helpful

Thanks for your consideration of these enhancements to XMPP.  I think that many 
users will find that, like captions on television, they are critical to some 
people but very useful to others as well even though not everyone will use them 
or use them all the time. 

Gregg
-----------------------
Gregg Vanderheiden Ph.D.
Director Trace R&D Center
Professor Industrial & Systems Engineering
and Biomedical Engineering
University of Wisconsin-Madison

On Mar 2, 2011, at 4:10 AM, Kevin Smith wrote:

> On Mon, Feb 28, 2011 at 8:52 PM, Mark Rejhon <[email protected]> wrote:
>> With the help of others at realtimetext.org, I will take the
>> opportunity to rewrite portions of the standard to bring it into
>> better compliance with XMPP.org.  I am currently collaborating behind
>> the scenes with realtimetext.org at this time.  I also have open
>> source code I am releasing shortly (goal: end of March), which will
>> help demonstrate the proposed specification.  This may help you all to
>> determine what features are necessary, and what features are
>> unnecessary.
> 
> Great.
> 
>> In regards to some of the ballpark 'concerns':
>> 1) Simplifying the specification
>> Up front, there are a lot of things that I could simplify in the
>> standard to reduce word count dramatically, perhaps by as much as
>> about 40%.  Removing redundant/duplicate information, unnecessary
>> fluff content, rewording certain  parts into more clear English,
>> remove some unnecessary requirements, remove less important features
>> such as Group Chat, etc.  I will work with the people of
>> realtimetext.org on this.
> 
> I think this is worthwhile.
> 
>> 2) Complexity introduced by Delay Codes (Natural Typing)
>> Originally, I was going to make this a private feature of my own
>> implementation (i.e. private extension).  However, testing indicated
>> rave reviews.
> 
> This is somewhat counter-intuitive, as if network latency is
> consistent, the edits will arrive at a similar sort of rate to that at
> which they were transmitted and if it's not consistent, the edits are
> unlikely to arrive in a timely manner. I have a suspicion that what's
> happening here isn't the use of the delays that make the transmission
> delays less noticeable, but rather that the act of rendering the new
> text slowly is itself masking network latency. I wonder if there's a
> substantial difference in user experience between using the latency
> encoding on realtime edits, and simply using fragments, but having the
> client render e.g. n changes a second.
> 
>> 3) Complexity introduced by Real Time Message Editing Protocol
>> - Real time message does complicate the standard.  However, it is a
>> necessary inclusion for reasons already explained.
> 
> Compared with some simpler typing transmission system, or compared
> with whole-message only?
> 
>> - Technically, we could simply retransmit the whole message, which is
>> also allowed by the standard (Section 3.9.3).  However, this is
>> inefficient for long lines of messages,
> 
> I'm not entirely convinced about this (I think stream compression
> should be smoothing out a lot of these data unless the messages get
> *really* big).
> 
>> and makes it difficult to
>> serialize the real time message editing protocol.
> 
> I don't think this is true - XMPP already guarantees in-order
> delivery, so this spec shouldn't have to deal with that.
> 
>> - Cursor movements are included because it makes it much easier to
>> watch the remote person edit their real time text -- otherwise, edits
>> to the middle of their message more often went unnoticed by
>> the recipient (and led to more misunderstandings due to missed
>> edits).
> 
> Surely that's a client rendering issue, though?
> 
>> - Delay codes for inter-keypress delays are good because typing looks
>> natural (and does not 'burst'), regardless of the interval.  Testing
>> shows that this is a highly desirable extension to the spec.   See
>> next section below.
>> - It is noted that both of these features are NOT made 'REQUIRED'
> 
> No, but they're (from memory) SHOULD, which is as near as.
> 
>> - Testing of the open source software clearly showed that highest
>> quality of real time text occurred when I included support for delay
>> codes and cursor positioning.
>> -- The open source software that's being released soon, has an
>> adjustable interval, and allows turning on/off features (including
>> delay codes and cursor codes), so that you can all judge how
>> necessary/unnecessary individual features are.
> 
> I'd be interested, if it allows complete retransmit with fake-delay
> rendering, to see how significant the difference between this and the
> full suite is.
> 
>> 4) Programming complexity
>> I realize the comment about programming simplicity is relative and
>> subject to interpretation.  I got the first version of the real time
>> text working in less than 2 days, in an initial round of programming
>> utilizing the open-source jabber-net library.  If I excluded the
>> optional cursor movements and delay codes, I actually found it really
>> simple to include real time message editing.  I had found most of the
>> complexity is actually found in the delay codes, as well as how I
>> prepared the messages for transmission.  Even with those advanced
>> features thrown in, I had a module (specifically for real time text)
>> that was still only 800 lines of code.  If you ignore all the
>> RECOMMENDED's and OPTIONAL's, the standard is actually much simpler to
>> implement and actually could be crammed into a much smaller document.
> 
> If these things can be safely ignored, then they should be OPTIONAL,
> rather than SHOULD.
> 
>> Perhaps the standard should clearly separate the features so that the
>> easy features are in a separate section from the advanced features, so
>> that it's easier for implementors to do a baseline version of this
>> spec.    Part of the reason why the specification looks more complex
>> than necessary is that the easy and hard parts are interspersed with
>> each other.  By releasing open source code, it will help people
>> understand how easy or complex the specification is.
> 
> Certainly, if there are two different models, one easy, one hard, we
> should make it as easy as possible to implement the easy one. I still
> have doubts about the need for anything difficult - this is why
> Council (or I, anyway) seek community feedback on the spec.
> 
>> 5) Rationale of Attributes ('seq', 'msg', and 'type')
>> I found it necessary to include these attributes, because of the
>> nature of real time message editing.  If a message gets lost, and the
>> message contained an edit  (i.e. an insert/delete in the middle of
>> text became lost), then subsequent edits are invalid -- message length
>> is different, so a subsequent edit won't occur in the correct
>> location, and the text will become mangled.  Therefore, perfect
>> integrity is needed in subsequent real time message edit operations,
>> using some sort of continuity mechanism (sequence ID) or other sync
>> verification method.  There was experimentation done with the
>> open-source software that showed the 'seq' was necessary, as was a
>> method of signalling the first received real time text message
>> (type='new').
> 
> I'm surprised you found missing messages a significant problem, but
> maybe this is necessary.
> 
> /K
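
For readers following point 5 in the quoted message, here is a minimal sketch 
of why a continuity check matters once edits are positional.  It is only an 
illustration under assumptions: the Receiver class, the field names, and the 
"edit" value are made up here, and only 'seq' and type='new' come from the 
quoted text.  Each edit deletes and inserts at a position; if one edit is lost, 
later positions no longer line up, so the receiver stops applying edits until 
it sees a message marked "new".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Receiver:
    """Hypothetical receiver state; not the data model of the proposed XEP."""
    text: str = ""
    next_seq: Optional[int] = None  # None: out of sync, waiting for "new"

    def receive(self, seq: int, kind: str, pos: int, delete: int, insert: str) -> bool:
        """Apply one positional edit; return False if it was discarded."""
        if kind == "new":
            # A "new" message restarts the real-time message and resyncs.
            self.text = ""
            self.next_seq = seq
        if self.next_seq is None or seq != self.next_seq:
            # Gap detected: applying this edit could mangle the text.
            self.next_seq = None
            return False
        self.text = self.text[:pos] + insert + self.text[pos + delete:]
        self.next_seq = seq + 1
        return True


if __name__ == "__main__":
    r = Receiver()
    print(r.receive(0, "new", 0, 0, "helo"), r.text)
    print(r.receive(1, "edit", 3, 0, "l"), r.text)
    print(r.receive(3, "edit", 5, 0, "!"), r.text)   # seq 2 never arrived
    print(r.receive(4, "new", 0, 0, "hi"), r.text)   # resync on "new"
```

Whether lost messages are common enough in practice to need this is exactly 
the question raised in the thread; the sketch only shows what the check guards 
against.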
