RE: Proposed Update Unicode Technical Standard #46 (Unicode IDNA Compatibility Processing)

Colosi, John Wed, 22 Sep 2010 13:06:43 -0700

Hi Mark,


 

Thanks for the response.  I appreciate your points.  If I can summarize, I 
think IDNA2008 (RFC 5891) creates special rules for registries that differ from 
the rules for other kinds of clients.
 
“By the time a string enters the IDNA registration process … it MUST be … in 
Normalization Form C.  … [Registries] MUST accept only the exact string for 
which registration is requested, free of any mappings or local adjustments.”
-- RFC 5891, section 4.1
 

I think your point is that UTS #46 has a broader scope than just registries, 
and so it allows the Unicode 6.0 mapping.  In fact, its very purpose seems to 
be bridging the gap between IDNA2003 and IDNA2008.  So, in a trivial sense, a 
strict reading of IDNA2008 for registries will cause some issues with the 
conformance tests.

 

 

I’m still confused about the last example.  The A-label “xn--53h” decodes 
for me to U+2615 (HOT BEVERAGE).
The mapping table gives no mapping for this character; it is simply valid:
2614..2615    ; valid
And the Tables RFC (RFC 5892) appears to disallow the code point:
2460..26CD  ; DISALLOWED  # CIRCLED DIGIT ONE..DISABLED CAR

Maybe I’m still missing something, but this does not look like valid IDNA2008 
input to me, even if I apply the mapping.  Not sure.
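To reproduce the comparison above, the A-label can be decoded with Python's built-in punycode codec and the resulting code point checked against the two quoted data lines. This is a minimal sketch: the range lists hold only those two lines (not the full files), and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int, str]  # (first code point, last code point, status)

# Only the two data lines quoted above, not the complete files.
UTS46_RANGES: List[Range] = [(0x2614, 0x2615, "valid")]          # IdnaMappingTable.txt (6.0.0)
RFC5892_RANGES: List[Range] = [(0x2460, 0x26CD, "DISALLOWED")]   # RFC 5892, Appendix B.1


def decode_a_label(label: str) -> str:
    """Hypothetical helper: strip the ACE prefix and Punycode-decode the rest."""
    if not label.lower().startswith("xn--"):
        return label  # not an A-label; return unchanged
    return label[4:].encode("ascii").decode("punycode")


def status_of(cp: int, ranges: List[Range]) -> Optional[str]:
    # Return the status of the first range containing cp, if any.
    for first, last, status in ranges:
        if first <= cp <= last:
            return status
    return None


for ch in decode_a_label("xn--53h"):
    cp = ord(ch)
    print(f"U+{cp:04X}", status_of(cp, UTS46_RANGES), status_of(cp, RFC5892_RANGES))
# prints: U+2615 valid DISALLOWED
```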

 

 

Thanks again,

-- John

 

 

John Colosi     |     Naming Services     |     VeriSign, Inc. 
703.948.3211     |     703.967.4062     |     703.421.8233 

This message is intended for the use of the individual or entity to
which it is addressed, and may contain information that is privileged,
confidential and exempt from disclosure under applicable law. Any
unauthorized use, distribution, or disclosure is strictly prohibited. If
you have received this message in error, please notify sender
immediately and destroy/delete the original transmission.



From: [email protected] [mailto:[email protected]] On 
Behalf Of Mark Davis
Sent: Sunday, September 19, 2010 7:27 PM
To: Colosi, John
Cc: [email protected]; UTC; Markus Scherer
Subject: Re: Proposed Update Unicode Technical Standard #46 (Unicode IDNA 
Compatibility Processing)

 

Thanks for checking the data. I'm sorry for not responding earlier; I was on 
vacation, and am now working through my backlog of email.

 

Some of the differences are because UTS#46 provides a compatibility 'bridge' 
between IDNA2003 and IDNA2008. For details of these particular cases, see below.

 

Note that the current tests do not attempt to be exhaustive; for example, they 
do not include a line for every character giving its validity status. Such a test 
can be written using the main data file at 
http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt.
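One way to start such an exhaustive test is to expand the data file into one status (and optional mapping) per code point, then emit a test line for each entry. The parser below is a sketch that assumes only the semicolon-separated layout visible in the lines quoted in this thread (code point or range, status, optional hex mapping, then a # comment); any further fields are ignored. The function name is hypothetical, and it reads from any iterable of lines, such as a locally saved copy of IdnaMappingTable.txt.

```python
from typing import Dict, Iterable, Optional, Tuple

Entry = Tuple[str, Optional[str]]  # (status, mapped string or None)


def parse_mapping_table(lines: Iterable[str]) -> Dict[int, Entry]:
    """Expand IdnaMappingTable.txt-style lines into a per-code-point dict.

    Assumed layout: "XXXX[..YYYY] ; status [; hex code points] # comment".
    """
    table: Dict[int, Entry] = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        first_hex, _, last_hex = fields[0].partition("..")
        first = int(first_hex, 16)
        last = int(last_hex or first_hex, 16)  # single code point if no range
        status = fields[1]
        mapping = None
        if len(fields) > 2 and fields[2]:
            mapping = "".join(chr(int(h, 16)) for h in fields[2].split())
        for cp in range(first, last + 1):
            table[cp] = (status, mapping)
    return table


sample = [
    "2461          ; mapped                 ; 0032  # 1.1         CIRCLED DIGIT TWO",
    "2614..2615    ; valid",
]
for cp, (status, mapping) in sorted(parse_mapping_table(sample).items()):
    print(f"U+{cp:04X}; {status}; {mapping!r}")  # one candidate test line per code point
```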

 

Other test cases can be added in the future; if you (or others) have 
suggestions for good test lines, please let us know.


Mark

The best is the enemy of the good.



On Thu, Sep 16, 2010 at 14:59, Colosi, John <[email protected]> wrote:

Hello all,

 

I represent the VeriSign Domain Name Registry as an implementer of the latest 
IDNA specifications.  The following four (4) questions arose during our 
implementation of the conformance test.

 

 

Question       1 of 4

Line              204

Input             \u0646 \u0627 \u0645 \u0647 \u200C \u0627 \u06CC

Reference      Appendix A.1 of RFC 5892 (Tables) 
<https://trac.tools.ietf.org/html/rfc5892> 

Issue             Per the reference, the ZWNJ (\u200C) must meet one of two 
conditions: either it is preceded by a character whose canonical combining 
class is Virama, or the characters in the label match a specific pattern of 
joining types, (L|D) T* ZWNJ T* (R|D).  This input does not meet either of 
these criteria, and appears to be an invalid IDN label with respect to the 
IDNA2008 standards.  There are ten (10) such lines in the input file.

 

This is by design. UTS#46 does not have the contextual checks for ZWJ and ZWNJ. 

 

Background: While those are excellent checks to have, and are recommended, they 
only prevent a small fraction of homoglyph exploits, so they are not 
required by UTS#46 and are not tested for in the file. (If you disagree with 
that approach, you should raise it with the UTC for the next version of 
UTS#46.) UTS#46 does allow implementations to be stricter if desired, so 
any implementation can apply those IDNA2008 checks.
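For implementations that do want the stricter IDNA2008 behaviour, the ZWNJ rule from RFC 5892 Appendix A.1 can be sketched as below. Python's standard library exposes no Joining_Type property, so the table is a hypothetical, hand-filled subset covering only the characters in the line-204 input; a real check would load the full joining-type data from the Unicode Character Database (ArabicShaping.txt). With those values, HEH (U+0647, type D) before the ZWNJ and ALEF (U+0627, type R) after it appear to satisfy the second condition, so that line may be worth re-checking against the actual data.

```python
import unicodedata
from typing import Dict

# Hypothetical subset of Joining_Type (from ArabicShaping.txt), only for the
# characters used in this example.
JOINING_TYPE: Dict[str, str] = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0646": "D",  # ARABIC LETTER NOON
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u06CC": "D",  # ARABIC LETTER FARSI YEH
}
ZWNJ = "\u200C"
VIRAMA = 9  # Canonical_Combining_Class value named "Virama"


def joining_type(ch: str) -> str:
    # Simplification: anything not in the table is treated as non-joining ("U").
    return JOINING_TYPE.get(ch, "U")


def zwnj_allowed(label: str, i: int) -> bool:
    """RFC 5892 Appendix A.1 rule for the ZWNJ at position i of label."""
    # Condition 1: the preceding character has combining class Virama.
    if i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA:
        return True
    # Condition 2: (L|D) T* ZWNJ T* (R|D), skipping transparent (T) characters.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")


def label_passes_zwnj_rule(label: str) -> bool:
    # Every ZWNJ in the label must satisfy the rule; labels without one pass.
    return all(zwnj_allowed(label, i) for i, ch in enumerate(label) if ch == ZWNJ)


line_204 = "\u0646\u0627\u0645\u0647\u200C\u0627\u06CC"
print(label_passes_zwnj_rule(line_204))  # True with the table above
```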

 

Note that we could add a field in the test file that indicated whether the 
input (or mapped input [see below]) was valid under IDNA2008. Do people think 
that would be helpful?

 

         

         

        Question       2 of 4

        Line              319

        Input                 
…1234567890123456789012345678901234567890123456789012345678901234…

        Reference      Sections 3.1 and 3.5 of RFC 1034 
<http://www.ietf.org/rfc/rfc1034.txt> 

        Issue             Per the reference, DNS labels cannot contain more 
than 63 octets.  This appears to be a deliberate test, since the first 
label is exactly 63 octets and the second label is 64 octets.  The limit may 
not matter for other applications, but these lines of input are not valid for DNS.  
There are three (3) such lines in the input file.

 

This appears to be a mistake in the conformance file generation. I'll look at 
it to see what is happening.
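A registry that wants to reject such lines could check the RFC 1034 limits on the ASCII (A-label) form of the name. This is a minimal sketch assuming the name has already been converted to A-labels; the function name is hypothetical. The 255-octet limit counts the wire format, i.e. one length octet per label plus the terminating zero octet of the root.

```python
from typing import List

MAX_LABEL_OCTETS = 63   # RFC 1034, section 3.1
MAX_NAME_OCTETS = 255   # RFC 1034, section 3.1 (wire format)


def dns_length_errors(name: str) -> List[str]:
    """Return length problems for an ASCII (A-label) domain name."""
    if name.endswith("."):
        name = name[:-1]  # a trailing dot just marks the root label
    labels = name.split(".")
    errors: List[str] = []
    for label in labels:
        n = len(label.encode("ascii"))
        if n == 0:
            errors.append("empty label")
        elif n > MAX_LABEL_OCTETS:
            errors.append(f"label of {n} octets exceeds {MAX_LABEL_OCTETS}")
    # Wire length: a length octet plus the label bytes for each label,
    # plus the single zero octet of the root label.
    wire = sum(len(label) + 1 for label in labels) + 1
    if wire > MAX_NAME_OCTETS:
        errors.append(f"name of {wire} octets exceeds {MAX_NAME_OCTETS}")
    return errors


label_63 = "1234567890" * 6 + "123"
print(dns_length_errors(label_63 + "." + label_63 + "4"))
# ['label of 64 octets exceeds 63']
```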

 

         

         

        Question       3 of 4

        Line              319

        Input             U \u0308 . xn--tda

        Reference      Section 4.1 of RFC 5891 (Protocol) 
<https://trac.tools.ietf.org/html/rfc5891> 

        Issue             Per the reference, input into the IDNA Registration 
process “MUST be… in Normalization Form C”.  This input does not meet that 
requirement: the first label is not in NFC.  Implementations of 
IDNA2008 for registration should expect an exception.  There are four (4) such 
lines in the input file.

 

Here is the situation:

*       IDNA2003 accepts denormalized text as input; it requires that the text be 
normalized (and case-folded) in the process of generating the Punycode. 
*       IDNA2008 disallows denormalized text per se; however, it allows a 
mapping phase for the input, which can perform normalization and case folding for 
consistency with IDNA2003.

UTS#46 provides a mapping that is consistent with IDNA2003 and allowed by 
IDNA2008. That mapping normalizes U\u0308 to a lowercase u-umlaut (U+00FC), which is 
valid.
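The difference between the registration rule and the UTS#46 mapping can be seen with Python's unicodedata module. This sketch approximates the UTS#46 mapping with str.casefold() followed by NFC; that is an assumption for illustration only, since the real mapping comes from IdnaMappingTable.txt and differs for many characters. The helper names are hypothetical.

```python
import unicodedata


def registration_ready(label: str) -> bool:
    # RFC 5891, section 4.1: a string submitted for registration must already be NFC.
    return unicodedata.is_normalized("NFC", label)


def approximate_uts46_map(label: str) -> str:
    # Rough stand-in for the UTS#46 mapping: case-fold, then normalize to NFC.
    return unicodedata.normalize("NFC", label.casefold())


raw = "U\u0308"                                   # first label of the test line
print(registration_ready(raw))                    # False: not in NFC
mapped = approximate_uts46_map(raw)
print(mapped == "\u00fc", registration_ready(mapped))  # True True
print(b"tda".decode("punycode") == mapped)        # True: xn--tda is the same label
```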

 

         

         

        Question       4 of 4

        Line              276

        Input             xn--53h

        Reference      Appendix B.1 of RFC 5892 (Tables) 
<https://trac.tools.ietf.org/html/rfc5892> 

        Issue             Per the reference, the character \u2615 is disallowed.

        2460..26CD  ; DISALLOWED  # CIRCLED DIGIT ONE..DISABLED CAR

        Implementations should expect an exception.  There are twenty (20) such 
lines in the input file.

          

 

This is another instance where UTS#46 applies a mapping. See the following line of 
http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt. Such a mapping is 
permitted by IDNA2008.

2461          ; mapped                 ; 0032  # 1.1         CIRCLED DIGIT TWO
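The mapping step itself can be sketched as a per-character lookup on status. The table below is a hypothetical two-entry excerpt: the U+2461 line quoted above plus the 2614..2615 "valid" entry quoted in John's follow-up. Only valid, mapped, and ignored are handled; other statuses (for example deviation or the disallowed_STD3_* values) raise in this sketch, and the subsequent NFC and validity checks are omitted. Note that the quoted line appears to cover CIRCLED DIGIT TWO, whereas xn--53h decodes to U+2615.

```python
from typing import Dict, Optional, Tuple

# Hypothetical excerpt of IdnaMappingTable.txt (6.0.0): code point -> (status, mapping).
TABLE: Dict[int, Tuple[str, Optional[str]]] = {
    0x2461: ("mapped", "2"),  # CIRCLED DIGIT TWO
    0x2615: ("valid", None),  # HOT BEVERAGE
}


def map_label(label: str) -> str:
    """Apply the mapping step to one label (simplified sketch)."""
    out = []
    for ch in label:
        # Assumption: characters missing from this excerpt are treated as disallowed.
        status, mapping = TABLE.get(ord(ch), ("disallowed", None))
        if status == "valid":
            out.append(ch)
        elif status == "mapped":
            out.append(mapping or "")
        elif status == "ignored":
            continue  # ignored characters are dropped
        else:
            raise ValueError(f"U+{ord(ch):04X} has status {status}")
    return "".join(out)


print(map_label("\u2461"))            # 2
print(map_label("\u2615") == "\u2615")  # True: valid, left unchanged
```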

         

        Any input is appreciated,

        -- John

         

         

