Thanks for checking the data. I'm sorry for not responding earlier; I was on vacation, and am now working through my backlog of email.
Some of the differences are because UTS#46 provides a compatibility 'bridge' between IDNA2003 and IDNA2008. For details of these particular cases, see below.

Note that the current tests do not attempt to be exhaustive, e.g. by including a line for every character with its status (valid or not). Such a test can be written using the main data file at http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt.

Other test cases can be added in the future; if you (or others) have suggestions for good test lines, please let us know.

Mark

*— Il meglio è l’inimico del bene —*

On Thu, Sep 16, 2010 at 14:59, Colosi, John <[email protected]> wrote:

> Hello all,
>
> I represent the VeriSign Domain Name Registry as an implementer of the
> latest IDNA specifications. The following four (4) questions arose during
> our implementation of the conformance test.
>
> *Question 1 of 4*
>
> *Line* 204
>
> *Input* \u0646 \u0627 \u0645 \u0647 \u200C \u0627 \u06CC
>
> *Reference* Appendix A.1 of *RFC 5892 (Tables)* <https://trac.tools.ietf.org/html/rfc5892>
>
> *Issue* Per the reference, the ZWNJ (\u200C) must meet one of
> two qualifications. It must be preceded by a character with VIRAMA
> combining class. OR the characters in the label must have a certain pattern
> of joining types. This input does not meet either of these criteria, and
> appears to be an invalid IDN label with respect to the IDNA 2008 standards.
> There are ten (10) such lines in the input file.

This is by design. UTS#46 does not have the contextual checks for ZWJ and ZWNJ.

Background: While those are excellent checks to have, and are recommended, they only prevent a small fraction of the homoglyph exploits, so they are not required by UTS#46 and are not tested for in the file. (If you disagree with that approach, you should bring that up to the UTC for the next version of UTS#46.)
UTS#46 does allow for implementations to be stricter if desired, so any implementation can apply those IDNA2008 checks.

Note that we could add a field in the test file that indicated whether the input (or mapped input [see below]) was valid under IDNA2008. Do people think that would be helpful?

> *Question 2 of 4*
>
> *Line* 319
>
> *Input* …
> 1234567890123456789012345678901234567890123456789012345678901234…
>
> *Reference* Sections 3.1 and 3.5 of *RFC 1034* <http://www.ietf.org/rfc/rfc1034.txt>
>
> *Issue* Per the reference, DNS labels cannot contain more than
> 63 octets. It appears that this is a purposeful test, since the first label
> is exactly 63 octets, and the second label is 64 octets. This does not
> apply to other applications, but these lines of input are not valid for
> DNS. There are three (3) such lines in the input file.

This appears to be a mistake in the conformance file generation. I'll look at it to see what is happening.

> *Question 3 of 4*
>
> *Line* 319
>
> *Input* U \u0308 . xn--tda
>
> *Reference* Section 4.1 of *RFC 5891 (Protocol)* <https://trac.tools.ietf.org/html/rfc5891>
>
> *Issue* Per the reference, input into the IDNA Registration
> process “MUST be… in Normalization Form C”. This input does not meet these
> standards. The first label is not properly normalized. Implementations of
> IDNA 2008 for registration should expect an exception. There are four (4)
> such lines in the input file.

Here is the situation:

- IDNA2003 allows as input denormalized text; it requires that text be normalized (and case-folded) in the process of generating the punycode.
- IDNA2008 disallows denormalized text per se; however it allows a mapping phase for the input, which can do a normalization and case folding for consistency with IDNA2003.

UTS#46 provides for a mapping that is consistent with IDNA2003 and allowed by IDNA2008. That mapping normalizes U\u0308 to a lowercase U-umlaut, which is valid.
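To make the Question 3 case concrete, here is a rough Python sketch of what a mapping phase does to the first label. It is only an illustration under stated assumptions: the helper names `map_label` and `to_ace` are hypothetical, and lowercasing followed by NFC merely approximates the UTS#46 mapping for this one simple input; a real implementation would be driven by IdnaMappingTable.txt.

```python
import unicodedata


def map_label(label: str) -> str:
    """Approximate a mapping phase for simple input (hypothetical helper).

    Assumption: str.lower() followed by NFC stands in for the table-driven
    UTS#46 mapping; that is adequate for "U" + U+0308 but not in general.
    """
    return unicodedata.normalize("NFC", label.lower())


def to_ace(label: str) -> str:
    """Convert an already-mapped label to its ASCII-compatible form."""
    if label.isascii():
        return label
    # Python's built-in "punycode" codec performs only the raw encoding;
    # the "xn--" ACE prefix is added here.
    return "xn--" + label.encode("punycode").decode("ascii")


raw = "U\u0308"  # LATIN CAPITAL LETTER U + COMBINING DIAERESIS

# The raw input is not in Normalization Form C, which is what Question 3 notes.
print(unicodedata.normalize("NFC", raw) == raw)

# After mapping, the label is the single code point U+00FC.
mapped = map_label(raw)
print(mapped == "\u00fc")

# The mapped label encodes to the same ACE form as the second label in the test line.
print(to_ace(mapped))
```

With this sketch, the mapped first label and the second label of the test line (xn--tda) denote the same label, which is why the line is treated as valid after mapping.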
> *Question 4 of 4*
>
> *Line* 276
>
> *Input* xn--53h
>
> *Reference* Appendix B.1 of *RFC 5892 (Tables)* <https://trac.tools.ietf.org/html/rfc5892>
>
> *Issue* Per the reference, the character \u2615 is disallowed.
>
> 2460..26CD ; DISALLOWED # CIRCLED DIGIT ONE..DISABLED CAR
>
> Implementations should expect an exception. There are twenty (20) such
> lines in the input file.

This is another instance where UTS#46 is mapping. Such a mapping is permitted by IDNA2008. See http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt, which has the following line:

2461 ; mapped ; 0032 # 1.1 CIRCLED DIGIT TWO

> Any input is appreciated,
>
> -- John
>
> John Colosi | Naming Services | Veri*Sign*, Inc.
> 703.948.3211 703.967.4062 703.421.8233
>
> *This message is intended for the use of the individual or entity to
> which it is addressed, and may contain information that is privileged,
> confidential and exempt from disclosure under applicable law. Any
> unauthorized use, distribution, or disclosure is strictly prohibited. If
> you have received this message in error, please notify sender
> immediately and destroy/delete the original transmission.*

