[precis] review of draft-ietf-xmpp-6122bis-12

Joe Hildebrand (jhildebr) Wed, 30 Jul 2014 14:26:24 -0700

The reason the precis group got a spate of questions from me today is that
I was prepping to do this review.  There are a couple of issues that the
precis folk should pay more attention to.


 > 1.  Introduction
... 

 >    Instead, this document builds upon the
 >    internationalization framework defined by the IETF's PRECIS Working
 >    Group [I-D.ietf-precis-framework], while attempting to ensure that
 >    the characters allowed in Jabber IDs under stringprep are still
 >    allowed and handled in the same way under PRECIS.

"the same way" means more backward-compatibility to me than I think we
intend here.

 > 3.1.  Fundamentals
 > 
 >       jid           = [ localpart "@" ] domainpart [ "/" resourcepart ]
 >       localpart     = 1*1023(localpoint)
 >                       ;
 >                       ; a "localpoint" is a UTF-8 encoded
 >                       ; Unicode code point that conforms to
 >                       ; the "JIDlocalIdentifierClass" profile
 >                       ; of the PRECIS IdentifierClass
 >                       ;

To me, this implies 1023 code points, not 1023 bytes.  Same issue for ifqdn
and resourcepart.  RFC 6122 just had 1*; I think going back to that would be
fine since we have a rule below that captures the max size.
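
To illustrate the gap between the two readings, here is a minimal Python
sketch.  The sample localpart is made up; it only shows that a string of
1023 code points can be far longer than 1023 octets once encoded as UTF-8.

```python
# Hypothetical localpart: 1023 code points, each of which takes
# 3 octets in UTF-8 (U+3042 HIRAGANA LETTER A).
localpart = "\u3042" * 1023

codepoints = len(localpart)                  # counts code points
octets = len(localpart.encode("utf-8"))      # counts UTF-8 octets

print(codepoints)  # 1023
print(octets)      # 3069
```

Under the ABNF reading this string passes, while under the octet rule
later in the draft it does not.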

 > 3.2.  Domainpart
 > 
 >    The domainpart of a JID is that portion after the '@' character (if
 >    any) and before the '/' character (if any); it is the primary

I think it's often surprising to people that foo/@bar is a valid JID with
"foo" as the domainpart and "@bar" as the resourcepart.  The text above,
although pulled from 6122, might be better as:

The domainpart of a JID is that portion after the first '@' character (if
any) and before the first '/' character (if any);

and possibly adding the example.
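
A rough Python sketch of the "first '/' then first '@'" reading follows.
The function name split_jid is hypothetical, and no validation or PRECIS
processing is attempted; it only shows where the splits fall.

```python
def split_jid(jid):
    """Split a JID string into (localpart, domainpart, resourcepart).

    Missing parts are returned as None.  No validation is performed.
    """
    # The resourcepart is everything after the first '/'.
    bare, slash, resource = jid.partition("/")
    # Only an '@' that appears before that first '/' is a separator.
    local, at, domain = bare.partition("@")
    if not at:
        # No '@' in the bare part: the whole thing is the domainpart.
        local, domain = None, bare
    return local, domain, (resource if slash else None)


print(split_jid("foo/@bar"))                    # (None, 'foo', '@bar')
print(split_jid("juliet@example.com/balcony"))  # ('juliet', 'example.com', 'balcony')
```

Splitting on '/' before looking for '@' is what makes "foo/@bar" come out
with "foo" as the domainpart.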

 >    In general, the content of a domainpart is an Internationalized
 >    Domain Name ("IDN") as described in the specifications for
 >    Internationalized Domain Names in Applications (commonly called
 >    "IDNA2008"), and a domainpart is an "IDNA-aware domain name slot" as
 >    defined in [RFC5890].  The following rules apply to a domainpart that
 >    consists of a fully-qualified domain name and MUST be applied in the
 >    following order:

When do these rules need to be applied? Only before comparison or routing?

 >    1.  The domainpart MUST contain only NR-LDH labels and U-labels as
 >        defined in [RFC5890] and MUST consist only of Unicode code points
 >        that conform to the rules specified in [RFC5892] (which includes
 >        Unicode normalization).  This implies that the domainpart MUST
 >        NOT include A-labels as defined in [RFC5890]; each A-label MUST
 >        be converted to a U-label during preparation of a domainpart, and
 >        comparison MUST be performed using U-labels, not A-labels.

This seems like an always rule, including for dumb clients.
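
As a small illustration of the A-label to U-label conversion, here is a
Python sketch.  Note the assumption: Python's built-in "idna" codec
implements the older IDNA2003 rules, not IDNA2008, so this only
approximates what the draft asks for.  The domain name is made up.

```python
# Assumption: the stdlib "idna" codec (IDNA2003) stands in for a real
# IDNA2008 implementation here; results can differ for some labels.
a_label_form = b"xn--bcher-kva.example"

# Decoding converts each A-label ("xn--...") to its Unicode form.
u_label_form = a_label_form.decode("idna")

print(u_label_form)  # bücher.example
```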

 >    2.  All uppercase and titlecase code points within the domainpart
 >        MUST be mapped to their lowercase equivalents, preferably using
 >        Unicode Default Case Folding as defined in Chapter 3 of the
 >        Unicode Standard [UNICODE].

Dumb clients might get away with this and the system would still work.

 >    3.  Fullwidth and halfwidth characters within the domainpart MUST be
 >        mapped to their decomposition mappings.

Dumb clients have no shot at this one.
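
For a sense of what this step needs, here is a minimal Python sketch of
width mapping.  It assumes that "decomposition mappings" means the
Unicode Character Database entries tagged <wide> or <narrow>; the
function name map_width is hypothetical.

```python
import unicodedata


def map_width(text):
    """Replace characters whose decomposition is tagged <wide> or <narrow>."""
    out = []
    for ch in text:
        # Returns e.g. "<wide> 0041", "<narrow> 30A2", or "" if none.
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            _tag, *points = decomp.split()
            out.append("".join(chr(int(p, 16)) for p in points))
        else:
            out.append(ch)
    return "".join(out)


print(map_width("\uff21\uff42\uff43"))  # Abc
```

The lookup depends on having the Unicode data tables available, which is
the part a very simple client is unlikely to carry.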

 >       Implementation Note: The foregoing order is different from the
 >       order for localparts and resourceparts as described below, to
 >       maintain consistency with the IDNA methods in both [RFC5892] and
 >       [RFC5895].
 > 
 >    After any and all normalization, conversion, and mapping of code
 >    points, 

as well as conversion to UTF-8.

 >    a domainpart MUST NOT be zero octets in length and MUST NOT
 >    be more than 1023 octets in length.  (Naturally, the length limits of
 >    [RFC1034] apply, and nothing in this document is to be interpreted as
 >    overriding those more fundamental limits.)
 > 
 > 3.3.  Localpart
 > 
 >    The localpart of a JID is an optional identifier placed before the
 >    domainpart and separated from the latter by the '@' character.
 >    Typically a localpart uniquely identifies the entity requesting and
 >    using network access provided by a server (i.e., a local account),
 >    although it can also represent other kinds of entities (e.g., a chat
 >    room associated with a multi-user chat service [XEP-0045]).  The
 >    entity represented by an XMPP localpart is addressed within the
 >    context of a specific domain (i.e., <localpart@domainpart>).
 > 
 >    A localpart MUST NOT be zero octets in length and MUST NOT be more
 >    than 1023 octets in length.  This rule is to be enforced after any
 >    normalization and mapping of code points.

and conversion to UTF-8.

 >    A localpart MUST consist only of Unicode code points that conform to
 >    the "JIDlocalIdentifierClass" profile of the "IdentifierClass" base
 >    string class defined in [I-D.ietf-precis-framework].  The
 >    JIDlocalIdentifierClass profile includes all code points allowed by
 >    the IdentifierClass base class, with the exception of the following
 >    characters that are explicitly disallowed in XMPP localparts:

(special precis focus)
I would have expected this to be phrased more similarly to step 2 of
http://tools.ietf.org/html/draft-ietf-precis-framework-17#section-5, or
for section 5 to just have a step about codepoints forbidden in a given
usage of the selected precis class.

 >    The normalization and mapping rules for the JIDlocalIdentifierClass
 >    are as follows, where the operations specified MUST be completed in
 >    the order shown:

Again, I think we need language about when these rules are applied.  The
rest of the section is about what is allowed, not about how to compare.

 >    1.  Fullwidth and halfwidth characters MUST be mapped to their
 >        decomposition mappings.
 > 
 >    2.  Uppercase and titlecase characters MUST be mapped to their
 >        lowercase equivalents, preferably using Unicode Default Case
 >        Folding as defined in Chapter 3 of the Unicode Standard
 >        [UNICODE].

Nothing about SpecialCasing?
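
To show why the choice matters, here is a short Python sketch comparing
lowercasing with full case folding on two characters where they differ.
It uses Python's str.lower() and str.casefold() as stand-ins; whether
either matches what the draft intends is exactly the open question.

```python
# Two characters where lowercasing and case folding disagree.
samples = {
    "LATIN SMALL LETTER SHARP S": "\u00df",
    "GREEK SMALL LETTER FINAL SIGMA": "\u03c2",
}

lowered = {name: ch.lower() for name, ch in samples.items()}
folded = {name: ch.casefold() for name, ch in samples.items()}

for name in samples:
    # Print code points so the difference is visible in plain text.
    print(name,
          [f"U+{ord(c):04X}" for c in lowered[name]],
          [f"U+{ord(c):04X}" for c in folded[name]])
```

Lowercasing leaves both characters unchanged, while case folding maps
sharp s to "ss" and final sigma to U+03C3, so two implementations that
pick different operations would not agree on these localparts.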

 >    A resourcepart MUST NOT be zero octets in length and MUST NOT be more
 >    than 1023 octets in length.  This rule is to be enforced after any
 >    normalization and mapping of code points.
 > 
 >    A resourcepart MUST consist only of Unicode code points that conform
 >    to the "JIDresourceFreeformClass" profile of the "FreeformClass" base
 >    string class defined in [I-D.ietf-precis-framework].
 > 
 >    The normalization and mapping rules for the resourcepart of a JID are
 >    as follows, where the operations specified MUST be completed in the
 >    order shown:

Again, when are the rules applied?

 >    1.  Fullwidth and halfwidth characters MAY be mapped to their
 >        decomposition mappings.

(precis)
I need a hint as to when to do this.  "MAY" isn't nearly enough.

 >    2.  Map any instances of non-ASCII space to ASCII space (U+0020).

(precis)
I was hoping either the framework doc or the mappings doc would tell me
more about which characters to map here.  RFC 3454 had table C.1.2, but I
don't see any hints about what I'm supposed to do now.  Is the rule "has a
compatibility mapping to U+0020"?  That doesn't hit U+200B which is in
C.1.2, nor does "has category Zs".  draft-ietf-precis-mappings says
"Therefore, the special mapping table should be based on a well-
   defined mapping table for each protocol", which although I don't
particularly like, I can live with - but we need the table here.
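
The mismatch between the candidate rules can be checked with a short
Python sketch.  The list below is RFC 3454 table C.1.2 as I read it, and
NFKC is used as a stand-in for "has a compatibility mapping to U+0020";
results depend on the Unicode version shipped with the interpreter.

```python
import unicodedata

# Code points in RFC 3454 table C.1.2 (non-ASCII space characters).
C_1_2 = [0x00A0, 0x1680, *range(0x2000, 0x200C), 0x202F, 0x205F, 0x3000]

# Entries that the rule "has general category Zs" would miss.
not_zs = [f"U+{cp:04X}" for cp in C_1_2
          if unicodedata.category(chr(cp)) != "Zs"]

# Entries that the rule "compatibility-maps to U+0020" would miss.
not_compat = [f"U+{cp:04X}" for cp in C_1_2
              if unicodedata.normalize("NFKC", chr(cp)) != " "]

print(not_zs)
print(not_compat)
```

Neither derived rule reproduces the old table exactly, which is why an
explicit table in this document would help.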

 >    3.  So-called additional mappings MAY be applied, such as mapping of
 >        characters that are similar to common delimiters (such as '@',
 >        ':', '/', '+', '-', and '.', e.g., mapping of IDEOGRAPHIC FULL
 >        STOP (U+3002) to FULL STOP (U+002E)) and special handling of
 >        certain characters or classes of characters (e.g., mapping of
 >        non-ASCII spaces to ASCII space); the PRECIS mappings document
 >        [I-D.ietf-precis-mappings] describes such mappings in more
 >        detail.
 > 
 >    4.  Uppercase and titlecase characters MAY be mapped to their
 >        lowercase equivalents, preferably using Unicode Default Case
 >        Folding as defined in Chapter 3 of the Unicode Standard
 >        [UNICODE].

Again, I need more about the MAY here.

 > 6.  IANA Considerations
 > 
 >    The following completed templates provide the information necessary
 >    for the IANA to add 'JIDlocalIdentifierClass' and
 >    'JIDresourceFreeformClass' to the PRECIS Profiles Registry.

Should we also ask them to change the status of nodeprep and resourceprep
to deprecated in the stringprep profiles registry?

 > Appendix A.  Differences from RFC 6122
 > 
 >    Based on consensus derived from working group discussion,
 >    implementation and deployment experience, and formal interoperability
 >    testing, the following substantive modifications were made from RFC
 >    6122.

I think it might be nice to point out that this may have made
previously-valid JIDs no longer valid (or vice-versa), and that we suggest
careful testing before migrating user data.


-- 
Joe Hildebrand



_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
