Re: [Standards] XEP-0301: review comments

Peter Saint-Andre Tue, 26 Jun 2012 21:01:40 -0700

On 6/26/12 5:58 PM, Mark Rejhon wrote:
> About section 4.6.4 of XEP-0301:
> http://xmpp.org/extensions/xep-0301.html#message_retransmission
>  
> 
>     1. When the recipient's presence changes. (i.e. offline to online) 
> 
> 
>     So let's say you are typing a message to me and my presence changes from
>     normal availability to do not disturb or whatever. Why does your client
>     need to retransmit what you have typed so far? Are there specific
>     presence scenarios that you have in mind here?
> 
> 
> Problem: Without retransmit
> - Sender: Starts typing message while recipient is offline


Why is the sender generating RTT messages if it doesn't know whether the
recipient supports the protocol?

You can't assume that *anything* is "real time" if the other party is
offline.

Designing around this "problem" strikes me as silly.

In any case, you're not talking about presence changes, but changing
from offline to online.

>     2. When the recipient sends a <message/> from a different full JID than
>     before. (i.e. Simultaneous Logins)
>     You might want to explain how this recommendation is consistent with
>     XEP-0296.
> 
> 
> You're right, let me do some thinking on how to incorporate this.  It
> benefits regardless whether or not XEP-0296 is used, but I should
> describe the two different behaviours in Implementation Notes, and make
> a link to that from here. (Section 6.4.3.2)

That sounds good.

>     3. At regular intervals, to allow recovery from unexpected situations
>     such as lost <message/> stanzas.
> 
> [snip] 
> 
>     What are you designing for here? Are you defining ways to work around
>     dropped stanzas? If so, isn't the solution to use XEP-0198? It's not
>     clear to me why we're solving a lower-layer problem in the RTT spec.
> 
> 
> You are right, 0198 is one of the many solutions, but it is more
> complicated for many implementations, and not always practical with all
> libraries (e.g. an existing library)

Those don't strike me as very good reasons. Libraries can be updated,
code can be fixed, features can be added. At the least, it would be
better for the spec to say "use XEP-0198 if you possibly can, and
retransmit only if you don't have access to underlying facilities that
prevent dropped stanzas".

>     > I will clarify this in the spec;
>     > Section 7.2 would occur if the text was written fast enough to fit in
>     > one transmission interval (i.e. 700ms, the default recommendation)
>     > Section 7.3 would occur if the text was written slow enough to cover
>     > four transmission interval cycles (i.e. 700ms x 4 = 2.8 seconds)
>     > This is unrelated to congestion control; can you give me context as to
>     > what made you think 7.2 vs 7.3 was a congestion control issue?
> 
>     Because if I send 20 message stanzas when I could send one message
>     stanza instead, I'm potentially causing additional overhead on my
>     connection (and yours).
> 
> 
> Yes, but that's a necessary design of the spec.  
> *Real-time text MUST stay _real-time_.*
> The typing should be lagged by less than 1 second.  
> When the recipient types, the sender should see the recipient's typing
> virtually instantly, in real-time.
> 
> Therefore, if you're continuously typing non-stop for 20 seconds, it's
> necessary to send at LEAST 20 stanzas -- there's no way around that --

Huh?

I'm talking about the example in Section 7.2:

<message to='[email protected]' from='[email protected]/home' type='chat'
id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello bcak</t><e n='3'/><t>ack</t>
  </rtt>
</message>

In that RTT message, you have encapsulated 4 different actions into one
message stanza. Contrast that with Section 7.3:

<message to='[email protected]' from='[email protected]/home' type='chat'
id='a01'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>
    <t>Hello</t>
  </rtt>
</message>

<message to='[email protected]' from='[email protected]/home' type='chat'
id='b02'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='1'>
    <t> bcak</t>
  </rtt>
</message>

<message to='[email protected]' from='[email protected]/home' type='chat'
id='c03'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='2'>
    <e n='3'/>
  </rtt>
</message>

<message to='[email protected]' from='[email protected]/home' type='chat'
id='d04'>
  <rtt xmlns='urn:xmpp:rtt:0' seq='3'>
    <t>ack</t>
  </rtt>
</message>

You don't explain why an implementation would have to follow the
multi-message approach.
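
To make the comparison concrete, here is a minimal Python sketch that replays both examples and checks that they end at the same text. It is an illustration only, not code from the spec or from any implementation. The helper names (apply_rtt, replay) are hypothetical, and the defaults for the 'p' and 'n' attributes (end of text, and 1) are my reading of XEP-0301. Only the <t/> and <e/> actions are handled; seq, event and other elements are ignored.

```python
import xml.etree.ElementTree as ET


def apply_rtt(text, rtt_xml):
    """Apply the <t/> (insert) and <e/> (erase) actions of one <rtt/> element.

    Hypothetical helper: positions are counted in code points, which is
    what Python string indices give us.
    """
    root = ET.fromstring(rtt_xml)
    for action in root:
        tag = action.tag.split("}")[-1]  # strip the XML namespace
        if tag == "t":
            # Insert text at p; assumed default is the end of the text.
            pos = int(action.get("p", len(text)))
            text = text[:pos] + (action.text or "") + text[pos:]
        elif tag == "e":
            # Erase n code points before p; assumed defaults are n=1, p=end.
            pos = int(action.get("p", len(text)))
            n = int(action.get("n", 1))
            text = text[:max(pos - n, 0)] + text[pos:]
    return text


def replay(stanzas):
    """Apply a sequence of <rtt/> payloads to an initially empty message."""
    text = ""
    for rtt_xml in stanzas:
        text = apply_rtt(text, rtt_xml)
    return text


# Section 7.2 style: all four actions in one <rtt/> element.
SINGLE = [
    "<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'>"
    "<t>Hello bcak</t><e n='3'/><t>ack</t></rtt>",
]

# Section 7.3 style: one action per <rtt/> element.
MULTI = [
    "<rtt xmlns='urn:xmpp:rtt:0' seq='0' event='new'><t>Hello</t></rtt>",
    "<rtt xmlns='urn:xmpp:rtt:0' seq='1'><t> bcak</t></rtt>",
    "<rtt xmlns='urn:xmpp:rtt:0' seq='2'><e n='3'/></rtt>",
    "<rtt xmlns='urn:xmpp:rtt:0' seq='3'><t>ack</t></rtt>",
]

print(replay(SINGLE))
print(replay(MULTI))
```

Under these assumptions both forms produce the same final text, so the only difference between them is how many stanzas cross the wire and when.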

>     I have never seen a message size limit that small. Some services might
>     have a limit of 64k.
> 
> 
> Ok, same problem applies -- 64k copy and paste (From a webpage),
> resulting in 64k <rtt/> + 64k <body/> + overhead.

XMPP is not designed for, is not optimized for, and is not appropriate
for, the sending of large messages. If you want to send 100k text blobs,
use in-band bytestreams (XEP-0047) or some other file transfer mechanism.

> However, you're right, it might be overly reactive to complicate the spec.
> I'll try to simplify this section.

Thanks.

>     My main concern was about Unicode normalization forms -- e.g., NFC would
>     leave Roman numeral IV (U+2163) as one code point, whereas NFKC would
>     normalize that character to I (U+0049) V (U+0056). The XMPP RFCs say
>     nothing about Unicode normalization of XML character data, and I think
>     it would be best to leave it that way. 
> 
> 
> Normalization is not a problem as long as it's done before the RTT
> encode on the sender end, and only after the RTT decode on the recipient
> end. 

I need to think about it further.
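
For reference, the NFC/NFKC difference mentioned above can be checked with a few lines of Python using the standard unicodedata module. This is only a sketch of the concern, with variable names chosen for illustration: if one side normalizes between the RTT encode and decode, positions counted on one form no longer line up with the other.

```python
import unicodedata

roman_four = "\u2163"  # ROMAN NUMERAL FOUR

nfc = unicodedata.normalize("NFC", roman_four)    # stays one code point
nfkc = unicodedata.normalize("NFKC", roman_four)  # becomes "I" + "V"

# A position computed on the original text shifts once NFKC is applied.
text = roman_four + "x"
pos_before = text.index("x")
pos_after = unicodedata.normalize("NFKC", text).index("x")

print(len(nfc), len(nfkc), pos_before, pos_after)
```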

> As you can see, we solved the normalization problem --  but we
> removed the wording about normalization from the spec.   Should we
> re-add the wording about normalization?

I think it would be a good idea to add *some* text. The question right
now is determining what text to add. :)

> Please open the PDF file
> at http://www.realjabber.org/flowchart_of_xmpp_rtt_path.pdf -- see where
> it says "Normalization" at the sender end and at the recipient end.
> 
> Christian/Norm (the author of real time text in America Online's AIM,
> but wants to adopt XEP-0301 from now on) totally agreed that
> normalization is not a problem as long as it occurs /before/ the RTT
> encode, or /after/ the RTT decode, according to the flowchart.
> 
> It's more natural for actual XEP-0301 implementors to work with Code
> Points than to work with UTF-8.   Christian Vogler/Norm Williams -- the
> author of the real-time feature now built into AOL Instant Messenger --
> agrees that we must do Code Points instead of UTF-8.   Over time as we
> worked with XEP-0301, we have gradually come to actual near-unanimous
> agreement between actual XEP-0301 implementors that it's more natural to
> do Code Points.
> 
> Since you're still recommending UTF-8, 

I didn't say that.

> we need to be absolutely sure
> that Code Point is the right way to go; 
> while the actual people working with real time text, are insisting on
> Code Points rather than UTF-8.
> 
>  
> 
>     So in the RTT spec we could
>     assume that no normalization is applied, and I think that is the safest
>     approach for several reasons (in fact we probably want to explicitly
>     state that applications MUST NOT apply Unicode normalization to the XML
>     character data of RTT messages).
> 
> 
> I agree with you that talk about Normalization should now be added back
> to the spec.  

Yes, we need to say something about it.

> Do you agree with where normalization is allowed/disallowed as indicated
> in http://www.realjabber.org/flowchart_of_xmpp_rtt_path.pdf ?

I'll review that tomorrow.

>     Is there a reason you don't want to calculate p and n values based on
>     characters instead of code points? The reason I ask is that a code point
>     is a rather abstract notion -- if I scribble ᾧ on a piece of paper or my
>     whiteboard at the office, that's still a code point (U+1FA7). In
>     computer systems, we typically talk about code points that have been
>     encoded using a character encoding scheme such as UTF-8. However, the
>     term "character" is used in the XML specification and is also defined in
>     the Unicode standard as "The basic unit of encoding for the Unicode
>     character encoding." It seems more natural to talk in terms of
>     characters than in terms of code points.
> 
> 
> On the surface, I agree...
> But for internal implementation, it's problematic.  Especially on
> platforms that don't store internally in UTF-8.

The question of "code point" vs. "character" is terminological, not a
matter of encoding.
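
A short Python sketch may help separate the two questions. The lengths helper is hypothetical and exists only for this illustration. It counts the same string three ways: the code point count is the same whatever the storage encoding, while the byte and code unit counts depend on it.

```python
def lengths(s):
    """Count one string as code points, UTF-8 bytes, and UTF-16 code units."""
    return {
        "code_points": len(s),  # Python str is indexed by code point
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


print(lengths("\u1fa7"))      # the Greek letter from the example above
print(lengths("\U0001F600"))  # a code point outside the BMP
```

A platform that stores strings as UTF-16 would need to convert code unit offsets to code point offsets before computing p and n, but that is an implementation detail rather than a question of which term the spec uses.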

> Actual real-time implementors within our taskgroup have gradually
> shifted to a preference for Code Points for many reasons (not just the
> ones I indicated).  
> 
> 
> Your comments are welcome and useful!
> Thank you very much!
> We'd love a response to the remaining comments too -

Time is limited. I'll reply as I can.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/
