Re: [precis] [xmpp] review of draft-ietf-xmpp-6122bis-12

Peter Saint-Andre Fri, 29 Aug 2014 10:30:34 -0700

Hi Joe, thanks for the review and my apologies for taking a month to reply.


On 7/30/14, 3:25 PM, Joe Hildebrand (jhildebr) wrote:

The reason the precis group got a spate of questions from me today was that I
was prepping to do this review.  There are a couple of issues that the
precis folk should pay more attention to.

  > 1.  Introduction
...

  >    Instead, this document builds upon the
  >    internationalization framework defined by the IETF's PRECIS Working
  >    Group [I-D.ietf-precis-framework], while attempting to ensure that
  >    the characters allowed in Jabber IDs under stringprep are still
  >    allowed and handled in the same way under PRECIS.

"the same way" means more backward-compatibility to me than I think we
intend here.

Yes, that is a bit vague, even though it does say "attempting". Here is one possible approach:


OLD

   Instead, this document builds upon the
   internationalization framework defined by the IETF's PRECIS Working
   Group [I-D.ietf-precis-framework], while attempting to ensure that
   the characters allowed in Jabber IDs under stringprep are still
   allowed and handled in the same way under PRECIS.

NEW

   Instead, this document builds upon the
   internationalization framework defined by the IETF's PRECIS Working
   Group [I-D.ietf-precis-framework].  Although every attempt has been
   made to ensure that the characters allowed in Jabber IDs under
   stringprep are still allowed and handled in the same way under
   PRECIS, there is no guarantee of strict backward compatibility
   because of changes in Unicode and the fact that PRECIS handling is
   based on Unicode properties, not a hardcoded table of characters.

  > 3.1.  Fundamentals
  >
  >       jid           = [ localpart "@" ] domainpart [ "/" resourcepart ]
  >       localpart     = 1*1023(localpoint)
  >                       ;
  >                       ; a "localpoint" is a UTF-8 encoded
  >                       ; Unicode code point that conforms to
  >                       ; the "JIDlocalIdentifierClass" profile
  >                       ; of the PRECIS IdentifierClass
  >                       ;

This implies 1023 codepoints, not 1023 bytes to me. Same issue for ifqdn
and resourcepart.  6122 just had 1*; I think going back to that would be
fine since we have a rule below that captures the max size.

Your proposal seems fine to me, too. It's hard to capture these nuances in ABNF at times. That said, the "localbyte" approach suggested later in the thread would also work for me.
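The gap between code points and octets is easy to demonstrate. The following Python sketch (Python is used only for illustration; nothing in the draft prescribes it, and the function name is hypothetical) counts both for the same string. A localpart of 1023 code points satisfies "1*1023(localpoint)" but can still exceed a 1023-octet limit once encoded as UTF-8:

```python
def code_points_and_octets(s: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of octets in UTF-8)."""
    return len(s), len(s.encode("utf-8"))


# 1023 ASCII code points are also 1023 octets, so both readings agree here.
ascii_localpart = "a" * 1023
# 1023 copies of U+5F20 (a CJK ideograph, 3 octets each in UTF-8) are
# fine under "1*1023(localpoint)" but far over a 1023-octet limit.
cjk_localpart = "\u5f20" * 1023

print(code_points_and_octets(ascii_localpart))  # (1023, 1023)
print(code_points_and_octets(cjk_localpart))    # (1023, 3069)
```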

  > 3.2.  Domainpart
  >
  >    The domainpart of a JID is that portion after the '@' character (if
  >    any) and before the '/' character (if any); it is the primary

I think it's often surprising to people that foo/@bar is a valid JID with
"foo" as the domainpart and "@bar" as the resourcepart.  The text above,
although pulled from 6122, might be better as:

The domainpart of a JID is that portion after the first '@' character (if
any) and before the first '/' character (if any);


That's acceptable to me.

and possibly adding the example.


Examples are good. I'll add a few to Table 1.
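Here is a minimal Python sketch of one splitting order that produces the result described above (domainpart "foo", resourcepart "@bar"). First split off the resourcepart at the first '/', and only then look for the first '@' in what remains. Confining the '@' search to the text before the first '/' matters: otherwise the '@' inside "@bar" would be taken as the separator. The name `split_jid` is hypothetical. The sketch applies none of the PRECIS or IDNA rules and only locates the parts:

```python
from typing import NamedTuple, Optional


class JidParts(NamedTuple):
    localpart: Optional[str]
    domainpart: str
    resourcepart: Optional[str]


def split_jid(jid: str) -> JidParts:
    """Locate the parts of a JID without validating or mapping them."""
    # 1. Everything after the FIRST '/' is the resourcepart (it may contain '@' or '/').
    bare, slash, resource = jid.partition("/")
    resourcepart = resource if slash else None
    # 2. Within the remaining bare JID, everything before the FIRST '@' is the localpart.
    local, at, domain = bare.partition("@")
    if at:
        return JidParts(local, domain, resourcepart)
    return JidParts(None, bare, resourcepart)


print(split_jid("foo/@bar"))
# JidParts(localpart=None, domainpart='foo', resourcepart='@bar')
```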

  >    In general, the content of a domainpart is an Internationalized
  >    Domain Name ("IDN") as described in the specifications for
  >    Internationalized Domain Names in Applications (commonly called
  >    "IDNA2008"), and a domainpart is an "IDNA-aware domain name slot" as
  >    defined in [RFC5890].  The following rules apply to a domainpart that
  >    consists of a fully-qualified domain name and MUST be applied in the
  >    following order:

When do these rules need to be applied? Only before comparison or routing?


That is a very good question.

This might be a difference between the "preparation" and "comparison" of the PRECIS acronym.

You'll notice that the PRECIS nickname spec draws a sharper distinction between preparation and comparison than the others:


http://www.ietf.org/archive/id/draft-ietf-precis-nickname-09.txt

Section 2 there says in part:

   For preparation purposes (most commonly, when a chatroom client
   generates a nickname from user input for inclusion as a protocol
   element that represents a "nickname slot"), an application MUST at a
   minimum ensure that the string conforms to the "FreeformClass" string
   class defined in [I-D.ietf-precis-framework]; however, it MAY in
   addition perform the normalization and mapping operations specified
   below for comparison purposes.

   For comparison purposes (e.g., when a chatroom server determines if
   two nicknames are in conflict during the authorization process), an
   application MUST treat a nickname as specified below (these rules
   constitute the "NicknameFreeformClass" profile).  The operations
   specified MUST be completed in the order shown (in particular,
   normalization MUST be performed after the other mapping steps and
   before validity-checking against the definition of the PRECIS
   "FreeformClass", consistent with [I-D.ietf-precis-framework]).

   [various rules elided]

I wonder if we want to say, in general, that there is something of a lower bar for preparation than for comparison. For example, for an XMPP localpart we might say that an entity doing preparation just needs to ensure that it doesn't include any characters outside of the PRECIS IdentifierClass, whereas an entity doing comparison needs to apply the normalization and mapping rules. The primary reason we might do this is that it could ease the burden on XMPP clients or servers during certain operations, whereas at those times when comparison is truly needed (e.g., when authentication or authorization decisions are being made) the full set of rules would be applied.

Although I'm not entirely comfortable with this approach, pragmatically it might be more acceptable than saying that all entities must apply all of the rules all of the time.


This is related to text in Section 4:

   Enforcement of the XMPP address format rules is the responsibility of
   XMPP servers.  Although XMPP clients SHOULD prepare complete JIDs and
   parts of JIDs in accordance with this document before including them
   in protocol slots within XML streams (such that JIDs and parts of
   JIDs are in conformance), XMPP servers MUST enforce the rules
   wherever possible and reject stanzas and other XML elements that
   violate the rules (for stanzas, by returning a <jid-malformed/> error
   to the sender as described in Section 8.3.3.8 of [RFC6120]).

That text seems to imply the same principle: clients prepare and servers enforce (by means of comparison?). But I think we could be clearer about the whole matter by explicitly saying that enforcement includes application of all the rules, just as comparison does. The difference is that comparison applies all of the rules to two strings in order to determine whether they are "equivalent", whereas enforcement applies the rules to a single string.

  >    1.  The domainpart MUST contain only NR-LDH labels and U-labels as
  >        defined in [RFC5890] and MUST consist only of Unicode code points
  >        that conform to the rules specified in [RFC5892] (which includes
  >        Unicode normalization).  This implies that the domainpart MUST
  >        NOT include A-labels as defined in [RFC5890]; each A-label MUST
  >        be converted to a U-label during preparation of a domainpart, and
  >        comparison MUST be performed using U-labels, not A-labels.

This seems like an always rule, including for dumb clients.

Things are a bit more clear-cut with regard to rules that are based on PRECIS, not IDNA, because the models are slightly different. In PRECIS we have base string classes (IdentifierClass and FreeformClass), so it might make sense to say that preparation involves ensuring that the preparing entity doesn't allow in any code points that are disallowed for that base string class. We don't have base string classes in IDNA. Although the foregoing rule is similar to the base string class idea, it goes beyond that idea by including normalization. I'd almost prefer that we figure this out very clearly first for PRECIS-based identifiers (in XMPP, the localpart and resourcepart) and then see how the resulting text can be ported over to our use of IDNA-based identifiers (in XMPP, the domainpart).
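Rule 1 requires converting A-labels to U-labels before comparison. The Python sketch below covers just that conversion step. It relies on Python's built-in `punycode` codec (RFC 3492) and deliberately avoids the built-in `idna` codec, which implements IDNA2003 rather than IDNA2008. The function name is hypothetical. A real implementation would also apply the IDNA2008 validity rules of RFC 5891/5892, which this sketch omits:

```python
def alabels_to_ulabels(domain: str) -> str:
    """Convert each A-label ("xn--...") in a domain name to its U-label form.

    Simplified sketch: performs only the Punycode decoding step and does NOT
    check the IDNA2008 validity rules.
    """
    labels = []
    for label in domain.split("."):
        if label[:4].lower() == "xn--":
            # Python's "punycode" codec implements RFC 3492 decoding.
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(alabels_to_ulabels("xn--bcher-kva.example"))  # bücher.example
```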

  >    2.  All uppercase and titlecase code points within the domainpart
  >        MUST be mapped to their lowercase equivalents, preferably using
  >        Unicode Default Case Folding as defined in Chapter 3 of the
  >        Unicode Standard [UNICODE].

Dumb clients might get away with this and the system would still work.

  >    3.  Fullwidth and halfwidth characters within the domainpart MUST be
  >        mapped to their decomposition mappings.

Dumb clients have no shot at this one.

Right: in the emerging approach we're exploring here, the latter two rules would be a matter of enforcement and comparison only, not of preparation.
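On the enforcement side, rules 2 and 3 might be implemented as in the Python sketch below, applied in the draft's order (case first, then width). It makes two assumptions. First, it uses `str.casefold()`, which the Python documentation describes as implementing the Default Case Folding algorithm of the Unicode Standard (section 3.13). Second, it treats "decomposition mapping" as the `<wide>`/`<narrow>` compatibility decompositions reported by `unicodedata.decomposition()`. Function names are hypothetical:

```python
import unicodedata


def width_map(s: str) -> str:
    """Map fullwidth and halfwidth characters to their decomposition mappings."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)  # e.g. "<wide> 0041"
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def enforce_domainpart_mappings(domain: str) -> str:
    """Rules 2 and 3 for a domainpart, in order: case folding, then width mapping."""
    folded = domain.casefold()  # rule 2: Default Case Folding
    return width_map(folded)    # rule 3: width mapping


print(enforce_domainpart_mappings("ＥＸＡＭＰＬＥ.Com"))  # example.com
```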

  >       Implementation Note: The foregoing order is different from the
  >       order for localparts and resourceparts as described below, to
  >       maintain consistency with the IDNA methods in both [RFC5892] and
  >       [RFC5895].
  >
  >    After any and all normalization, conversion, and mapping of code
  >    points,

as well as conversion to UTF-8.

True, although we kind of assume that in the XMPP world because all data sent over an XMPP stream is required to be UTF-8. Mentioning it seems useful, though.

  >    a domainpart MUST NOT be zero octets in length and MUST NOT
  >    be more than 1023 octets in length.  (Naturally, the length limits of
  >    [RFC1034] apply, and nothing in this document is to be interpreted as
  >    overriding those more fundamental limits.)
  >
  > 3.3.  Localpart
  >
  >    The localpart of a JID is an optional identifier placed before the
  >    domainpart and separated from the latter by the '@' character.
  >    Typically a localpart uniquely identifies the entity requesting and
  >    using network access provided by a server (i.e., a local account),
  >    although it can also represent other kinds of entities (e.g., a chat
  >    room associated with a multi-user chat service [XEP-0045]).  The
  >    entity represented by an XMPP localpart is addressed within the
  >    context of a specific domain (i.e., <localpart@domainpart>).
  >
  >    A localpart MUST NOT be zero octets in length and MUST NOT be more
  >    than 1023 octets in length.  This rule is to be enforced after any
  >    normalization and mapping of code points.

and conversion to UTF-8.


As above.
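The required ordering is: normalize and map, then encode as UTF-8, then measure. A Python sketch follows. NFKC plus `str.casefold()` stands in here for whatever mapping steps apply to the part in question. That substitution is an assumption of the sketch, not the draft's rule set, and the function name is hypothetical:

```python
import unicodedata


def check_part_length(part: str, max_octets: int = 1023) -> int:
    """Return the UTF-8 octet length of an already-mapped JID part.

    Raises ValueError if the part is zero octets or longer than max_octets.
    """
    octets = len(part.encode("utf-8"))
    if octets == 0:
        raise ValueError("a JID part MUST NOT be zero octets in length")
    if octets > max_octets:
        raise ValueError(f"JID part is {octets} octets; the limit is {max_octets}")
    return octets


raw = "ＪＵＬＩＥＴ"  # fullwidth input: 6 code points, 18 octets in UTF-8
# Stand-in for the draft's mapping steps (width mapping plus case folding):
mapped = unicodedata.normalize("NFKC", raw).casefold()
print(check_part_length(mapped))  # 6 (measured after mapping, not 18)
```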

  >    A localpart MUST consist only of Unicode code points that conform to
  >    the "JIDlocalIdentifierClass" profile of the "IdentifierClass" base
  >    string class defined in [I-D.ietf-precis-framework].  The
  >    JIDlocalIdentifierClass profile includes all code points allowed by
  >    the IdentifierClass base class, with the exception of the following
  >    characters that are explicitly disallowed in XMPP localparts:

(special precis focus)
I would have expected this to be phrased more similarly to step 2 of
http://tools.ietf.org/html/draft-ietf-precis-framework-17#section-5, or
for section 5 to just have a step about codepoints forbidden in a given
usage of the selected precis class.

Good point. I agree that more internal harmony between the framework and the various profiles would be helpful here.
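Under the lower bar for preparation suggested earlier in this message (check membership in the base class, apply no mappings), a client-side localpart check could look like the Python sketch below. It rests on two assumptions. First, the IdentifierClass test is a rough approximation: it looks only at the framework's ASCII7, LetterDigits, and HasCompat categories, and it ignores the exceptions, contextual rules, and backward-compatibility list. Second, because the draft's own list of disallowed characters is elided in the quotation above, the disallowed set here is the eight ASCII characters prohibited by Nodeprep in RFC 6122. All names are hypothetical:

```python
import unicodedata

# ASSUMPTION: the ASCII characters RFC 6122's Nodeprep prohibits in localparts.
JID_LOCAL_DISALLOWED = set("\"&'/:<>@")

# General categories making up the PRECIS "LetterDigits" group.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def roughly_identifier_class(ch: str) -> bool:
    """Very rough approximation of PRECIS IdentifierClass membership."""
    if unicodedata.normalize("NFKC", ch) != ch:
        return False  # HasCompat (e.g. fullwidth letters) is disallowed
    if 0x21 <= ord(ch) <= 0x7E:
        return True   # ASCII7: printable, non-space ASCII
    return unicodedata.category(ch) in LETTER_DIGITS


def prepare_localpart(s: str) -> bool:
    """Preparation only: no mapping, just 'is every code point allowed?'"""
    return bool(s) and all(
        roughly_identifier_class(ch) and ch not in JID_LOCAL_DISALLOWED for ch in s
    )


print(prepare_localpart("StPeter"))  # True: uppercase passes preparation
print(prepare_localpart("ｊｕｌｉｅｔ"))  # False: fullwidth has a compatibility mapping
```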

  >    The normalization and mapping rules for the JIDlocalIdentifierClass
  >    are as follows, where the operations specified MUST be completed in
  >    the order shown:

Again, I think we need language about when these rules are applied.  The
rest of the section is about what is allowed, not about how to compare.

As discussed above, I think we need to more clearly delineate what's required for preparation, what's required for enforcement, and what's required for comparison. As mentioned, it seems to me right now that the same rules are involved in enforcement and comparison. The difference is that applying those rules during enforcement determines whether a single string conforms, whereas applying them during comparison determines whether two strings are "equivalent". That said, your use of the phrase "about what is allowed, not about how to compare" might suggest that more is involved in comparison than in enforcement.

To choose a simple example, is the JID <StPeter@jabber.org> "allowed" if the jabber.org server enforces all the rules for a localpart? It seems to me not. We're saying that a client could send that JID (since both "S" and "P" are allowed by the category "Lu - Uppercase_Letter" and thus would pass the preparation test), but that a server which is enforcing the rules would map "S" to "s" and "P" to "p". However, the rules for comparison are the same as for enforcement: "StPeter" and "stpeter" would compare as equivalent.
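The distinction drawn here can be sketched in a few lines of Python. Enforcement applies the mapping rules to one string. Comparison enforces both strings and then compares the results exactly. The sketch implements only the two localpart steps quoted in this message (width mapping, then Default Case Folding via `str.casefold()`) plus the length rule. Any further steps the draft may specify are omitted, and all function names are hypothetical:

```python
import unicodedata


def width_map(s: str) -> str:
    """Map fullwidth/halfwidth characters to their <wide>/<narrow> decompositions."""
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            out.append("".join(chr(int(cp, 16)) for cp in decomp.split()[1:]))
        else:
            out.append(ch)
    return "".join(out)


def enforce_localpart(s: str) -> str:
    """Enforcement: apply the quoted mapping steps to ONE string, then check length."""
    s = width_map(s)  # step 1: width mapping
    s = s.casefold()  # step 2: case mapping via Default Case Folding
    if not 1 <= len(s.encode("utf-8")) <= 1023:
        raise ValueError("localpart length out of range")
    return s


def localparts_equivalent(a: str, b: str) -> bool:
    """Comparison: enforce both strings, then compare the results exactly."""
    return enforce_localpart(a) == enforce_localpart(b)


print(enforce_localpart("StPeter"))                     # stpeter
print(localparts_equivalent("StPeter", "stpeter"))      # True
print(localparts_equivalent("ＳｔＰｅｔｅｒ", "stpeter"))  # True
```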

  >    1.  Fullwidth and halfwidth characters MUST be mapped to their
  >        decomposition mappings.
  >
  >    2.  Uppercase and titlecase characters MUST be mapped to their
  >        lowercase equivalents, preferably using Unicode Default Case
  >        Folding as defined in Chapter 3 of the Unicode Standard
  >        [UNICODE].

Nothing about SpecialCasing?


That's a question for the WG. :-)

The PRECIS framework states:

   If case mapping is desired (instead of case preservation), it is
   RECOMMENDED to use Unicode Default Case Folding as defined in Chapter
   3 of the Unicode Standard [Unicode6.3].

      Note: Unicode Default Case Folding is not designed to handle
      various localization issues (such as so-called "dotless i" in
      several Turkic languages).  The PRECIS mappings document
      [I-D.ietf-precis-mappings] describes these issues in greater
      detail and defines a "local case mapping" method that handles some
      locale-dependent and context-dependent mappings.

Given the discussions in recent PRECIS WG meetings, I would shy away from applying locale-dependent and context-dependent mappings in XMPP localparts. However, I'm open to argument.
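For reference, here is what Default Case Folding does and does not do, sketched with Python's `str.casefold()` (which the Python documentation describes as following the Unicode Standard's Default Case Folding) alongside plain `str.lower()`:

```python
# Full case folding maps "ß" to "ss"; simple lowercasing leaves it alone.
print("STRASSE".lower() == "straße".lower())        # False ("strasse" vs "straße")
print("STRASSE".casefold() == "straße".casefold())  # True (both become "strasse")

# Default Case Folding is locale-independent: "I" always folds to "i",
# never to the Turkic dotless "ı" (U+0131) that a Turkish user might expect.
print("I".casefold() == "i")       # True
print("I".casefold() == "\u0131")  # False
```

Locale-dependent behavior such as the Turkic mapping would need something beyond Default Case Folding, such as the "local case mapping" described in the PRECIS mappings document.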

  >    A resourcepart MUST NOT be zero octets in length and MUST NOT be more
  >    than 1023 octets in length.  This rule is to be enforced after any
  >    normalization and mapping of code points.
  >
  >    A resourcepart MUST consist only of Unicode code points that conform
  >    to the "JIDresourceFreeformClass" profile of the "FreeformClass" base
  >    string class defined in [I-D.ietf-precis-framework].
  >
  >    The normalization and mapping rules for the resourcepart of a JID are
  >    as follows, where the operations specified MUST be completed in the
  >    order shown:

Again, when are the rules applied?


See above.

  >    1.  Fullwidth and halfwidth characters MAY be mapped to their
  >        decomposition mappings.

(precis)
I need a hint as to when do this.  "MAY" isn't nearly enough.

Do you mean "when" as in "in what contexts is it smart to do width mapping on resourceparts", or something else (e.g., "when" could mean "by which entities", such as clients, servers, and XMPP "components")?

Later in this thread, you and Florian Zeitz seem to think that "MUST NOT perform width mapping" is the right approach.

However, resourceparts are used in multiple contexts (we could say that there are multiple "resourcepart slots").


For the JIDs of connected resources (user@domain/foo), I tend to agree.

For the JIDs of chatroom participants, the precis-nickname spec says to use NFKC, which handles width mapping as part of normalization (and thus might be taken to violate the proposed MUST NOT approach).

I haven't yet taken the time to find and think about other resourcepart slots in various XMPP extensions, but I hesitate to make a categorical statement in 6122bis, since the applicability of width mapping might depend on the context in which a resourcepart is used.
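The tension between slots can be shown in a few lines of Python. The same fullwidth resourcepart stays distinct from its ASCII counterpart under a "MUST NOT width-map" rule for connected resources. It collapses onto that counterpart under the NFKC normalization that precis-nickname specifies for chatroom nicknames. The variable names are illustrative only:

```python
import unicodedata

resourcepart = "ｂａｌｃｏｎｙ"  # fullwidth Latin letters

# Connected-resource slot (user@domain/foo) under a "MUST NOT width-map" rule:
# the string is left as-is.
as_connected_resource = resourcepart

# Chatroom-participant slot following precis-nickname: NFKC normalization,
# which performs width mapping as a side effect.
as_nickname = unicodedata.normalize("NFKC", resourcepart)

print(as_connected_resource == "balcony")  # False
print(as_nickname == "balcony")            # True
```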

  >    2.  Map any instances of non-ASCII space to ASCII space (U+0020).

(precis)
I was hoping either the framework doc or the mappings doc would tell me
more about which characters to map here.  RFC 3454 had table C.1.2, but I
don't see any hints about what I'm supposed to do now.


Good catch.

Is the rule "has a
compatibility mapping to U+0020"?

BTW, I count at least three kinds of compatibility mapping to U+0020: <compat> (as in U+0384 GREEK TONOS), <noBreak> (as in U+2007 FIGURE SPACE), and <wide> (as in U+3000 IDEOGRAPHIC SPACE).

That doesn't hit U+200B which is in
C.1.2,

Right. I am not sure whether ZERO WIDTH SPACE really ought to be mapped to U+0020. See Florian's comment later in this thread.

nor does "has category Zs".


IMHO that is insufficient.

My intuition is that by "non-ASCII space" we mean anything that has a compatibility mapping of any kind to U+0020, since that seems safest (it casts a wider net) and is something we can apply in a programmatic way. However, my intuitions are not always correct, and applying this rule would result in a larger table than what we find in Appendix C.1.2 of RFC 3454.
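To see how such a rule could be applied programmatically, here is a Python sketch that derives a table rather than hardcoding one. It assumes the narrowest reading: a code point counts as a "non-ASCII space" if its NFKC form is exactly U+0020. Under that reading, U+0384 GREEK TONOS is excluded, because its mapping is U+0020 followed by a combining accent. U+200B is also excluded, because it has no decomposition at all. A broader reading that includes mappings which merely begin with U+0020 would give a different set. The sketch also computes the "has category Zs" set so the two readings can be compared. Function names are hypothetical:

```python
import sys
import unicodedata


def spaces_by_compat_mapping() -> set[int]:
    """Code points (other than U+0020) whose NFKC form is exactly U+0020."""
    result = set()
    for cp in range(sys.maxunicode + 1):
        if cp == 0x20 or 0xD800 <= cp <= 0xDFFF:
            continue  # skip ASCII space itself and the surrogate range
        if unicodedata.normalize("NFKC", chr(cp)) == " ":
            result.add(cp)
    return result


def spaces_by_category() -> set[int]:
    """Code points (other than U+0020) in general category Zs."""
    return {cp for cp in range(sys.maxunicode + 1)
            if cp != 0x20 and unicodedata.category(chr(cp)) == "Zs"}


compat = spaces_by_compat_mapping()
zs = spaces_by_category()
# Print the differences between the two readings for inspection.
print("compat only:", sorted(f"U+{cp:04X}" for cp in compat - zs))
print("Zs only:", sorted(f"U+{cp:04X}" for cp in zs - compat))
```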

draft-ietf-precis-mappings says
"Therefore, the special mapping table should be based on a well-
    defined mapping table for each protocol", which although I don't
particularly like, I can live with - but we need the table here.

Do you feel that we need the table in 6122bis or in the framework? As you say, the mappings document implies that each specification that defines a rule like "map non-ASCII space to ASCII space" needs to define its own table, but that seems like a recipe for trouble. If, say, SASL and XMPP and LDAP each define a different table, authentication might become confusing (especially since XMPP uses SASL and authentication might be based on an LDAP lookup).

  >    3.  So-called additional mappings MAY be applied, such as mapping of
  >        characters that are similar to common delimiters (such as '@',
  >        ':', '/', '+', '-', and '.', e.g., mapping of IDEOGRAPHIC FULL
  >        STOP (U+3002) to FULL STOP (U+002E)) and special handling of
  >        certain characters or classes of characters (e.g., mapping of
  >        non-ASCII spaces to ASCII space); the PRECIS mappings document
  >        [I-D.ietf-precis-mappings] describes such mappings in more
  >        detail.
  >
  >    4.  Uppercase and titlecase characters MAY be mapped to their
  >        lowercase equivalents, preferably using Unicode Default Case
  >        Folding as defined in Chapter 3 of the Unicode Standard
  >        [UNICODE].

Again, I need more about the MAY here.
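One way to make the MAYs in the resourcepart steps concrete is to force each deployment (or each resourcepart slot) to state its choice explicitly. The Python sketch below applies the quoted steps in order. Step 2 is unconditional, as in the quoted text, while steps 1, 3, and 4 are switched by a policy object. The `ResourcepartPolicy` type, its field names, and its defaults are hypothetical. The look-alike table contains only the IDEOGRAPHIC FULL STOP example from the draft text. "Non-ASCII space" is assumed to mean "anything whose NFKC form is exactly U+0020":

```python
import unicodedata
from dataclasses import dataclass

# Illustrative, deliberately incomplete look-alike table (only the draft's example).
DELIMITER_LOOKALIKES = {"\u3002": "."}  # IDEOGRAPHIC FULL STOP -> FULL STOP


@dataclass(frozen=True)
class ResourcepartPolicy:
    """Hypothetical per-slot policy: each MAY step becomes an explicit choice."""
    width_map: bool = False       # step 1 (MAY)
    extra_mappings: bool = False  # step 3 (MAY)
    case_fold: bool = False       # step 4 (MAY)


def _width(ch: str) -> str:
    """Map one fullwidth/halfwidth character to its decomposition mapping."""
    d = unicodedata.decomposition(ch)
    if d.startswith(("<wide>", "<narrow>")):
        return "".join(chr(int(cp, 16)) for cp in d.split()[1:])
    return ch


def map_resourcepart(s: str, policy: ResourcepartPolicy) -> str:
    """Apply the quoted resourcepart steps in order, honoring the policy."""
    if policy.width_map:
        s = "".join(_width(ch) for ch in s)
    # Step 2 (unconditional): map non-ASCII spaces to U+0020.
    s = "".join(" " if unicodedata.normalize("NFKC", ch) == " " else ch for ch in s)
    if policy.extra_mappings:
        s = "".join(DELIMITER_LOOKALIKES.get(ch, ch) for ch in s)
    if policy.case_fold:
        s = s.casefold()
    return s


print(map_resourcepart("Home\u3000PC\u3002", ResourcepartPolicy()))  # Home PC。
print(map_resourcepart("Home\u3000PC\u3002",
                       ResourcepartPolicy(extra_mappings=True, case_fold=True)))  # home pc.
```

The defaults shown are placeholders, not a recommendation. Which values each resourcepart slot should use is exactly the open question.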

  > 6.  IANA Considerations
  >
  >    The following completed templates provide the information necessary
  >    for the IANA to add 'JIDlocalIdentifierClass' and
  >    'JIDresourceFreeformClass' to the PRECIS Profiles Registry.

Should we also ask them to mark the status of nodeprep and resourceprep to
deprecated in the stringprep profiles registry?


Yes.

  > Appendix A.  Differences from RFC 6122
  >
  >    Based on consensus derived from working group discussion,
  >    implementation and deployment experience, and formal interoperability
  >    testing, the following substantive modifications were made from RFC
  >    6122.

I think it might be nice to point out that this may have made
previously-valid JIDs no longer valid (or vice-versa), and that we suggest
careful testing before migrating user data.

+1 to at least that text. Ideally we'd perform the kind of analysis that Takahiro Nemoto performed for SASLprep vs. SASLprepbis:


http://www.ietf.org/mail-archive/web/precis/current/msg00790.html

I haven't done that yet, though.
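Short of a full analysis of that kind, one practical pre-migration test is to run every stored localpart through an implementation of the new rules. The test looks for values that are rejected, or that collapse onto the same enforced form and would therefore merge previously distinct accounts. Here is a minimal Python sketch. `audit_localparts` and `toy_enforce` are hypothetical names, and `toy_enforce` is only a placeholder for a real implementation of the new rules:

```python
from collections import defaultdict
from typing import Callable, Iterable


def audit_localparts(stored: Iterable[str], enforce: Callable[[str], str]):
    """Report stored localparts that the new rules reject or merge together.

    `enforce` should return the enforced string or raise ValueError.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    rejected: list[str] = []
    for lp in stored:
        try:
            groups[enforce(lp)].append(lp)
        except ValueError:
            rejected.append(lp)
    collisions = {k: v for k, v in groups.items() if len(v) > 1}
    return collisions, rejected


def toy_enforce(s: str) -> str:
    """Placeholder for real enforcement: a non-empty check plus case folding."""
    if not s:
        raise ValueError("empty localpart")
    return s.casefold()


collisions, rejected = audit_localparts(["Juliet", "juliet", "romeo", ""], toy_enforce)
print(collisions)  # {'juliet': ['Juliet', 'juliet']}
print(rejected)    # ['']
```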

Thanks again to you and Florian for your careful reviews.

Peter


_______________________________________________
precis mailing list
precis@ietf.org
https://www.ietf.org/mailman/listinfo/precis
