Re: Proposed Update Unicode Technical Standard #46 (Unicode IDNA Compatibility Processing)

Mark Davis ☕ Sun, 19 Sep 2010 16:52:49 -0700

Thanks for checking the data. I'm sorry for not responding earlier; I was on
vacation, and am now working through my backlog of email.

Some of the differences are because UTS#46 provides a compatibility 'bridge'
between IDNA2003 and IDNA2008. For details of these particular cases, see
below.

Note that the current tests do not attempt to be exhaustive, e.g., by including
a line for every character that gives its status (valid or not).
Such a test can be written using the main data file at
http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt.
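
As a rough sketch of how such an exhaustive test could be generated, the
following parses lines in the format of the mapping table and expands ranges
into one entry per code point. The function names and the SAMPLE text are
illustrative assumptions; only the 2461 line is taken from this message, and a
real test would read the whole file from the URL above.

```python
def parse_mapping_table(text):
    """Yield (first, last, status, mapping) for each data line of the table."""
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        first, _, last = fields[0].partition("..")
        status = fields[1]
        mapping = None
        if len(fields) > 2:
            # Third field: space-separated hex code points (may be empty).
            mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
        yield int(first, 16), int(last or first, 16), status, mapping


def per_character_status(text):
    """Expand ranges so there is one (code point, status) pair per character."""
    for first, last, status, _ in parse_mapping_table(text):
        for cp in range(first, last + 1):
            yield cp, status


# Illustrative sample in the table's format (not the full data file).
SAMPLE = """\
# comment line
0061..0063    ; valid                          # 1.1  LATIN SMALL LETTER A..C
2461          ; mapped                 ; 0032  # 1.1         CIRCLED DIGIT TWO
"""

for cp, status in per_character_status(SAMPLE):
    print(f"{cp:04X} {status}")
```

Each (code point, status) pair could then be turned into one test line.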

Other test cases can be added in the future; if you (or others) have
suggestions for good test lines, please let us know.

Mark

*— Il meglio è l’inimico del bene —*

On Thu, Sep 16, 2010 at 14:59, Colosi, John <[email protected]> wrote:

>  Hello all,
>
>
>
> I represent the VeriSign Domain Name Registry as an implementer of the
> latest IDNA specifications.  The following four (4) questions arose during
> our implementation of the conformance test.
>
>
>
>
>
> *Question       **1 of 4***
>
> *Line*              204
>
> *Input*             \u0646 \u0627 \u0645 \u0647 \u200C \u0627 \u06CC
>
> *Reference*      Appendix A.1 of RFC 5892 (Tables) <https://trac.tools.ietf.org/html/rfc5892>
>
> *Issue*             Per the reference, the ZWNJ (\u200C) must meet one of
> two qualifications.  It must be preceded by a character with VIRAMA
> combining class.  OR the characters in the label must have a certain pattern
> of joining types.  This input does not meet either of these criteria, and
> appears to be an invalid IDN label with respect to the IDNA 2008 standards.
> There are ten (10) such lines in the input file.
>

This is by design. UTS#46 does not have the contextual checks for ZWJ and
ZWNJ.

Background: While those are excellent checks to have, and are recommended,
they only prevent a small fraction of the homoglyph exploits, so they are
not required by UTS#46 and are not tested for in the file. (If you disagree
with that approach, you should bring that up to the UTC for the next version
of UTS#46.) UTS#46 does allow for implementations to be stricter if desired,
so any implementation can apply those IDNA2008 checks.
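
For an implementation that wants to be stricter, here is a minimal sketch of
the first half of the ZWNJ rule described in the question (the preceding
character has the Virama combining class). The function name is an assumption,
and the joining-type alternative is deliberately left out because Python's
unicodedata module does not expose Joining_Type; a label that fails this check
may still be acceptable under that second condition.

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA_CCC = 9  # canonical combining class shared by Virama characters


def zwnj_follows_virama(label):
    """True if every ZWNJ in the label is immediately preceded by a Virama.

    Partial check only: the joining-type alternative is not implemented.
    """
    for i, ch in enumerate(label):
        if ch == ZWNJ:
            if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
                return False
    return True


# The label from question 1: ZWNJ is preceded by U+0647, which is not a Virama.
print(zwnj_follows_virama("\u0646\u0627\u0645\u0647\u200c\u0627\u06cc"))
# A Devanagari sequence where ZWNJ follows U+094D (a Virama).
print(zwnj_follows_virama("\u0915\u094d\u200c\u0937"))
```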

Note that we could add a field in the test file that indicated whether the
input (or mapped input [see below]) was valid under IDNA2008. Do people
think that would be helpful?

>
>
>
>
> *Question       **2 of 4***
>
> *Line*              319
>
> *Input*                 …
> 1234567890123456789012345678901234567890123456789012345678901234…
>
> *Reference*      Sections 3.1 and 3.5 of RFC 1034 <http://www.ietf.org/rfc/rfc1034.txt>
>
> *Issue*             Per the reference, DNS labels cannot contain more than
> 63 octets.  It appears that this is a purposeful test, since the first label
> is exactly 63 octets, and the second label is 64 octets.  This does not
> apply to other applications, but these lines of input are not valid for
> DNS.  There are three (3) such lines in the input file.
>

This appears to be a mistake in the conformance file generation. I'll look
at it to see what is happening.
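
For reference, the 63-octet limit from the question can be checked along these
lines. This is only a sketch: the function name is hypothetical, and it assumes
the domain is already in its ASCII (A-label) form.

```python
def overlong_labels(domain, limit=63):
    """Return the labels of an ASCII domain that exceed `limit` octets."""
    return [
        label
        for label in domain.split(".")
        if len(label.encode("ascii")) > limit
    ]


# One label of exactly 63 octets and one of 64, as in the case described.
ok_label = "a" * 63
long_label = "b" * 64
print(overlong_labels(ok_label + "." + long_label) == [long_label])
```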

>
>
>
>
> *Question       **3 of 4***
>
> *Line*              319
>
> *Input*             U \u0308 . xn--tda
>
> *Reference*      Section 4.1 of RFC 5891 (Protocol) <https://trac.tools.ietf.org/html/rfc5891>
>
> *Issue*             Per the reference, input into the IDNA Registration
> process “MUST be… in Normalization Form C”.  This input does not meet these
> standards.  The first label is not properly normalized.  Implementations of
> IDNA 2008 for registration should expect an exception.  There are four (4)
> such lines in the input file.
>

Here is the situation:

   - IDNA2003 allows as input denormalized text; it requires that text be
   normalized (and case-folded) in the process of generating the punycode.
   - IDNA2008 disallows denormalized text per se; however it allows a
   mapping phase for the input, which can do a normalization and case folding
   for consistency with IDNA2003.

UTS#46 provides for a mapping that is consistent with IDNA2003 and allowed
by IDNA2008. That mapping normalizes U\u0308 to a lowercase U-umlaut, which
is valid.
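
To illustrate that case, a small sketch: lowercasing followed by NFC is used
here as a simplified stand-in for the UTS#46 mapping (the real mapping is
driven by the data file, so this is an assumption that happens to hold for this
input). The function name is hypothetical.

```python
import unicodedata


def map_and_normalize(label):
    """Simplified stand-in for the mapping step: lowercase, then NFC."""
    return unicodedata.normalize("NFC", label.lower())


mapped = map_and_normalize("U\u0308")  # U followed by COMBINING DIAERESIS
a_label = "xn--" + mapped.encode("punycode").decode("ascii")

print(mapped == "\u00fc")  # True: a single precomposed lowercase u-umlaut
print(a_label)             # xn--tda
```

The result matches the second label of the test input, xn--tda.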

>
>
>
> *Question       **4 of 4***
>
> *Line*              276
>
> *Input*             xn--53h
>
> *Reference*      Appendix B.1 of RFC 5892 (Tables) <https://trac.tools.ietf.org/html/rfc5892>
>
> *Issue*             Per the reference, the character \u2615 is disallowed.
>
> 2460..26CD  ; DISALLOWED  # CIRCLED DIGIT ONE..DISABLED CAR
>
> Implementations should expect an exception.  There are twenty (20) such
> lines in the input file.
>
>
>

This is another instance where UTS#46 applies a mapping; such a mapping is
permitted by IDNA2008. See the following line of
http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt:

2461          ; mapped                 ; 0032  # 1.1         CIRCLED DIGIT TWO
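
As a small sketch of what this looks like in code, the following decodes the
A-label from the question and applies the single mapping entry quoted above.
The helper names are hypothetical, and the one-entry dictionary is an
assumption for illustration; a real implementation would load the whole table.

```python
# Single entry copied from the table line quoted above (2461 -> 0032).
MAPPED = {0x2461: "\u0032"}


def decode_a_label(a_label):
    """Decode an xn-- label to its Unicode form using the punycode codec."""
    if not a_label.lower().startswith("xn--"):
        raise ValueError("not an A-label: " + a_label)
    return a_label[4:].encode("ascii").decode("punycode")


def apply_mapping(text):
    """Replace each character that has a 'mapped' entry; keep the rest."""
    return "".join(MAPPED.get(ord(ch), ch) for ch in text)


print(decode_a_label("xn--53h") == "\u2615")  # True: the character in question
print(apply_mapping("\u2461"))                # 2
```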

>
> Any input is appreciated,
>
> -- John
>
>
>
>
>
> John Colosi     |     Naming Services     |     Veri*Sign*, Inc.
> 703.948.3211    703.967.4062    703.421.8233
>
> *This message is intended for the use of the individual or entity to
> **which it is addressed, and may contain information that is privileged,
> **confidential and exempt from disclosure under applicable law. Any
> **unauthorized use, distribution, or disclosure is strictly prohibited. If
> **you have received this message in error, please notify sender
> **immediately and destroy/delete the original transmission.
>
> *
>
