Re: [Standards] Review: XEP-xxxx: In-Band Real Time Text

Mark Rejhon Wed, 02 Mar 2011 06:22:55 -0800

Thank you for your comments!   I will address the most important one
first, because it deserves its own reply:


> This is somewhat counter-intuitive, as if network latency is
> consistent, the edits will arrive at a similar sort of rate to that at
> which they were transmitted and if it's not consistent, the edits are
> unlikely to arrive in a timely manner. I have a suspicion that what's
> happening here isn't the use of the delays that make the transmission
> delays less noticeable, but rather that the act of rendering the new
> text slowly is itself masking network latency. I wonder if there's a

That's not the issue.  Our open-source software code now includes all
the following 'experimental' adjustments:
- Higher and lower intervals.
- Displaying text instantly as we receive it.
- Displaying text in a time-smoothed manner.
- Delay codes (Natural Typing)

We found that the last bullet was vastly superior to all the other
options.  (See below for a description of our experiences.)   In our
greatly simplified resubmission of the spec, we currently plan to keep
delay codes in the second draft of our specification.  Before then, we
will also release open-source code so you can test the various
approaches we have tried, all of which are already implemented
(including artificial delays).  We have found many other ways to
substantially simplify the standard without removing the delay code
feature completely.  Let us release some source code & the demo of our
software first -- before we consider other approaches (which may
include turning delay codes into a private extension to a public
standard, since we already have two software packages that will keep
delay codes).

Many deaf people are used to noticing typing variances.  People
type "What? Are you nuts?" differently than "How now, brown cow?":
hurried and erratic typing versus slow and relaxed typing.  There are
a lot of subtleties in typing that can only be transmitted with delay
codes.  Tired people make more errors, excited people often type fast,
relaxed people often type slower.  A deafie who is familiar with
talking to the same person over real time text for a long time (such
as via a Text Telephone or TTY -- see Wikipedia) tends to pick up on
these subtleties.  Plus, we found that delay codes make it easy to
distinguish copy & pastes from natural Internet bursting (e.g. a
mobile connection with highly variable ping, or a long XMPP interval).
The look of the typing actually conveys a small percentage of the
'emotion', which adds further to the context (sarcasm versus
genuineness, etc).  It turns real time text into a high-def experience
for some of us.

Here are the findings:

-- Lower intervals: This works wonderfully on LAN and fast connections.
We even tried extremely low transmission intervals such as 5
milliseconds, to make the typing look 'natural'.  However, Google Talk
started to drop XMPP packets once fast typists were sending about
10-12 XMPP packets per second.  (A typist typing 120 WPM types about
10 keypresses per second.)  If we raise the interval to 50ms or
100ms, a fast typist is still sending about 10 XMPP packets per
second, and the bursty look starts to become marginally noticeable to
the target audience of the specification.  XMPP servers started to
work fairly reliably beginning at around 300ms (about 3 XMPP packets
per second).  But at this point, real time text quality started to
degrade significantly, getting worse at 1000ms and becoming unusable
at a 3000ms interval.  Server-unfriendly, user-unfriendly.
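
As a rough back-of-envelope sketch of the arithmetic above (the
5-characters-per-word convention, the steady-typing assumption and the
function names are my own assumptions for illustration, not part of
the spec):

```python
def keypresses_per_second(wpm, chars_per_word=5):
    """Approximate keypress rate, assuming 5 characters per word."""
    return wpm * chars_per_word / 60.0


def packets_per_second(wpm, interval_ms, chars_per_word=5):
    """Approximate packet rate for a steady typist.

    Assumes a packet is only sent when an interval contains at least
    one keypress, so the rate is capped both by the typing speed and
    by the transmission interval.
    """
    max_by_interval = 1000.0 / interval_ms
    return min(keypresses_per_second(wpm, chars_per_word), max_by_interval)


if __name__ == "__main__":
    for interval in (5, 50, 100, 300, 1000, 3000):
        rate = packets_per_second(120, interval)
        print(f"{interval:>5} ms interval: about {rate:.2f} packets/sec")
```

Under these assumptions, a 120 WPM typist stays at roughly 10 packets
per second for any interval of 100ms or less, and only drops to about
3 per second at 300ms.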

-- Displaying text instantly as we receive it: This leads to a bursty
look.  The bursty look was noticeable even down to 100ms for most
typists, and even at 50ms for fast 100 WPM typists (such as me).
Trying to simulate natural typing through short intervals is not
practical, and it's not very friendly for XMPP servers if we send 10
XMPP packets per second.  User-unfriendly.  Also, short intervals get
bunched up anyway over congested connections, satellite, mobile, and
dial-up connections, so 100ms may look like a 500ms interval because
the delivery of messages is 'clumped' together.  Also, at longer
intervals (e.g. 2000ms and up) it becomes hard to tell apart typing
from copy & pastes.

-- Displaying text in a time-smoothed manner: Artificial delays
inserted between characters actually look pretty good, albeit somewhat
unnatural.  The delay is calculated by counting the number of
characters (and/or number of backspaces and cursor movements)
received, dividing the interval by that count, and using the result as
the smoothing value.  However, time-smoothed text masks out 'emotion'
in the typing.  And copy & pastes can look funny unless there's extra
complexity in the client to distinguish sudden-output text from
non-sudden-output text.  Also, when ping becomes variable (random
congestion, mobile connections with fluctuating reception, etc -- we
tested laptop tethered connections too), time-smoothed display looks
somewhat erratic and even more distractingly unnatural.  Also, I was
surprised to find that good-looking time-smoothing can be more complex
than delay codes (assuming we continued to use an 'edit code' or
'control code' based system of real time text) because we still needed
to use non-blocking methods of delays such as timers or
multithreading.
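
To illustrate the smoothing calculation described above, here is a
minimal sketch.  The function name and the list-of-actions
representation are hypothetical, chosen only for illustration; a real
client would feed these offsets to timers rather than compute them in
one go:

```python
def smoothed_schedule(actions, interval_ms):
    """Spread the actions received in one packet evenly over the interval.

    `actions` is a list of display actions (characters, backspaces,
    cursor movements).  Returns (offset_ms, action) pairs, where
    offset_ms is when to display the action relative to packet arrival.
    """
    if not actions:
        return []
    # Smoothing value: the interval divided by the number of actions.
    step = interval_ms / len(actions)
    return [(i * step, action) for i, action in enumerate(actions)]


if __name__ == "__main__":
    # Four characters received in one 1000ms packet are shown 250ms apart.
    for offset, action in smoothed_schedule(list("Hi!!"), 1000):
        print(f"{offset:7.1f} ms: {action!r}")
```

Note that the original keypress timing is gone: every packet is played
back at a uniform pace, which is why this approach masks the 'emotion'
and makes a paste look like steady typing.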

-- Delay codes: This was the eureka moment.  When we did this, delay
codes made the typing look like local typing, and it looked exactly
the same regardless of a 100ms interval or a 3,000ms interval.  Typing
looked equally natural over high-speed, satellite and dial-up
connections.  It looked the same at 1,000ms ping as at 5ms ping.  It
even looked natural over a highly congested dial-up connection!!
(Ever tried SSH while doing an FTP transfer over dial-up Internet?)
It was much easier to tell the person's original typing 'emotion'.  It
was easy to tell apart copy and pastes.  To explain this more simply,
let's think of VoIP: VoIP is essentially a series of packets of small
recorded snippets of voice.  Likewise, real time text with delay codes
(natural typing) is a series of packets of small recorded snippets of
typing (including original key press delay, cursor movements, etc).
To use an overused cliche, it turns real time text into a
"high-definition experience".
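
To make the "recorded snippets of typing" idea concrete, here is a
minimal sketch.  The tuple-based encoding and the function names are
hypothetical and are NOT the wire format of the spec; they only show
how recording the delays at the sender lets the recipient replay the
original rhythm no matter when the packet arrives:

```python
def encode_with_delays(keypresses):
    """Sender side: turn (timestamp_ms, char) pairs into a list of
    ('text', str) and ('delay', ms) items (a hypothetical encoding)."""
    encoded = []
    previous_ts = None
    for ts, ch in keypresses:
        if previous_ts is not None and ts > previous_ts:
            encoded.append(("delay", ts - previous_ts))
        if encoded and encoded[-1][0] == "text":
            # No pause since the last character: extend the same run.
            encoded[-1] = ("text", encoded[-1][1] + ch)
        else:
            encoded.append(("text", ch))
        previous_ts = ts
    return encoded


def playback_schedule(encoded):
    """Recipient side: return (offset_ms, text) pairs relative to the
    start of playback, rebuilt purely from the recorded delays."""
    schedule = []
    offset = 0
    for kind, value in encoded:
        if kind == "delay":
            offset += value
        else:
            schedule.append((offset, value))
    return schedule


if __name__ == "__main__":
    typed = [(0, "H"), (120, "i"), (900, "!")]   # natural typing
    pasted = [(0, "a"), (0, "b"), (0, "c")]      # copy & paste
    print(playback_schedule(encode_with_delays(typed)))
    print(playback_schedule(encode_with_delays(pasted)))
```

In this sketch the typed text replays with its original pauses, while
the pasted text comes out as a single instant run, which is how the
recipient can tell the two apart regardless of the interval or ping.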

I plan to release the open source code on SourceForge or Google Code.
We came to the conclusion that for many of us, delay codes are a
critical inclusion to the spec, so the spec must be at least
compatible with a private delay codes extension (that also works
properly with other edits including deletes and pastes).  But for the
second draft, we plan to keep delay codes included at least until
everyone has tried them out (or at least seen a video of side-by-side
demos -- we plan to make some).  We already have two software
packages, including the open source code, which I now plan to release
under a permissive open source license (such as Apache 2.0) to help
accelerate adoption.  Any remaining complexity of the specification is
also compensated by the good-will release of permissive open source
code.  The open source license we plan to use permits use in either
open-source or commercial/proprietary projects.  This will maximize
adoption amongst our peers.

Our timeline is to release the open source software sometime before
the end of March.  We will then submit an updated specification right
after the source code is released.
No doubt, between now and then, I'll be picking up on your comments
and feedback about specific things (e.g. the various excellent
suggestions for simplifying the standard, including those that have
already been made).

Regards,
Mark Rejhon
