Re: [DNSOP] draft-yao-dnsop-idntld-implementation-01.txt
> There is a genuine user problem here (though whether one should actually solve it is still an open question).

It is a genuine user problem, but I disagree with your latter statement. It is not an open question: it must be solved. It is a serious enough problem for Chinese that it must be resolved for the Chinese user. The open question is how, not if.

For some background on CJK ideographs, you may refer to an expired draft I wrote many years ago:
http://james.seng.sg/files/draft-ietf-idn-cjk-01.pdf

RFC 3743 will also be a good point of reference.

-James Seng

On Sat, Nov 7, 2009 at 3:33 AM, Andrew Sullivan a...@shinkuro.com wrote:
> On Fri, Nov 06, 2009 at 07:06:38PM +, Alex Bligh wrote:
> > I should probably declare my hand in that I think in most cases the variant stuff is a non-problem (blocking is adequate apart from where the user is likely to mistype themselves)
>
> According to some people I'm inclined to believe (I can't say this from personal experience), in non-alphabetic scripts there is indeed a serious problem with respect to variants. It's not actually like the case of (say) colour vs. color, because users tend to have access to one version or another of the same character, but that character is actually encoded under Unicode as a different character. This is, I am led to believe, a very common problem in areas where, for instance, Simplified and Traditional Chinese characters are both in regular use.
>
> The upshot is that every competent user of the language will recognize two different arrangements of symbols as the same word, each user will be able to type one or the other of the arrangements (but not both) at their keyboard, and yet the two different arrangements do not constitute equivalent labels. So, it's as though by configuring your system with locale en-CA, you were _unable_ to type color.com into a resolution context.
>
> There is a genuine user problem here (though whether one should actually solve it is still an open question).
>
> Best,
>
> A
>
> --
> Andrew Sullivan
> a...@shinkuro.com
> Shinkuro, Inc.
___
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
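The variant problem described in this message can be made concrete with a short sketch. The character pair below is an assumed illustration (it is not taken from the thread), and the conversion uses Python's built-in "idna" codec, which implements IDNA2003 and stands in here for whatever IDNA processing a real registry or client would apply. The point is only that two spellings a reader treats as the same word are distinct code point sequences, and therefore distinct labels.

```python
# Assumed example pair: the same word in Simplified and Traditional Chinese.
simplified = "\u4e2d\u56fd"   # Simplified form, U+4E2D U+56FD
traditional = "\u4e2d\u570b"  # Traditional form, U+4E2D U+570B


def code_points(label: str) -> list[str]:
    """Return the Unicode code points of a label in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in label]


def to_a_label(label: str) -> bytes:
    """Convert a single label to its ASCII form with the stdlib IDNA2003 codec."""
    return label.encode("idna")


if __name__ == "__main__":
    for label in (simplified, traditional):
        print(code_points(label), to_a_label(label))
    # The two forms differ in one code point, so the resulting labels differ.
    print("same label?", to_a_label(simplified) == to_a_label(traditional))
```

Nothing in the conversion step links the two forms; any equivalence would have to be added by policy, for example at registration time, which is the "how" under discussion.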
Re: [DNSOP] Some second-hand remarks on draft-liman-tld-names-00.txt
By the same logic, the whole of IDN would be pointless, because RFC 1035 restricts labels to alphabetic letters only. IDNA transforms IDN labels into Punycode so that they become transparent to the resolvers that made those assumptions.

-James Seng

> I think this is what's up for dispute. If people have interpreted the text in 1123 as normative and built resolvers using the logic there, then that is a technical reason to limit TLD characters. Even if we think those resolvers were mistaken in their implementation, they're deployed. Interoperation is one of our more important values, and that includes interoperation with reasonable interpretations of RFCs that we nevertheless think are mistaken.
>
> Best,
>
> A
>
> --
> Andrew Sullivan
> a...@shinkuro.com
> Shinkuro, Inc.
Re: [DNSOP] Some second-hand remarks on draft-liman-tld-names-00.txt
On Wed, Mar 11, 2009 at 11:36 PM, Andrew Sullivan a...@shinkuro.com wrote:
> On Wed, Mar 11, 2009 at 10:56:10PM +0800, James Seng wrote:
> > By the same logic, the whole of IDN would be pointless, because RFC 1035 restricts labels to alphabetic letters only.
>
> I'd like the reference to where 1035 says that, please. In particular, the following passage in §3.1 of RFC 1035 seems to say something different:

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
...
<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case

> you seem to be making my argument for me. The reason IDNA is preferable to some of the alternatives is that some resolver software indeed understood 1034 and 1035 to mean that the preferred syntax ought to be enforced (in what seems to me a plain violation of those RFCs). We have to live with those widely-deployed resolvers, and therefore we need to design other protocols as though the additional restrictions that are _not_ part of the DNS protocol are in fact part of it. Designing the protocols for the actually existing conditions in the network is what makes the design activity engineering rather than research, I think.

Precisely. Punycode was selected instead of UTF-8 because of widely deployed implementations, even though DNS should theoretically be 8-bit clean.

My point is that the RFC 1123 statement on the alphabetic requirement

a) is highly debatable, because it is not an explicit requirement: it is mentioned in passing, in a section called DISCUSSION ("since at least the highest-level component label will be alphabetic"), in a context where TLDs were alphabetic only as a matter of fact at that time, not as a matter of technical requirement;

b) even if it is an explicit requirement, should be taken in context and in spirit, as much as RFC 1035 forbids non-alphabetic characters in labels.

-James Seng
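For readers who want to see what enforcing the quoted grammar looks like in practice, here is a minimal sketch of the kind of check a strict resolver or application might apply. The function name is hypothetical, and the regular expression is one possible transcription of the <label> production plus the 63-octet label limit from RFC 1035. It is not taken from any particular implementation.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# A letter, optionally followed by letters/digits/hyphens, ending in a
# letter or digit.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_preferred_label(label: str) -> bool:
    """Hypothetical check against the RFC 1035 preferred name syntax."""
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None


if __name__ == "__main__":
    for candidate in ("example", "xn--bcher-kva", "3com", "a-", "b\u00fccher"):
        print(f"{candidate!r}: {is_preferred_label(candidate)}")
```

An A-label such as xn--bcher-kva passes this check while the raw non-ASCII label does not, which is why an ASCII-compatible encoding gets past software that enforces the preferred syntax. A label starting with a digit fails here because this sketch follows RFC 1035 only and does not apply the RFC 1123 relaxation.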
Re: [DNSOP] RFC1035 and permitted characters in labels
Agreed :)

DNS is supposed to be 8-bit clean according to RFC 1035. But taking the recommended syntax section of RFC 1035 in context, together with RFC 952, many legacy implementations already assumed that DNS must be LDH. By the time RFC 2181 came along, it was too late.

This was one of the reasons why Punycode, and not UTF-8, was chosen for IDN.

-James Seng

> Er, that's in Section 2.3.1: Preferred Name Syntax, which says before the BNF:
>
>     The following syntax will result in fewer problems with many applications that use domain names
>
> RFC1035 does not say that labels can only be composed of ASCII letters and digits. RFC1123 imposes limitations on the characters permissible in a host name. But that's not the same as a domain name.
>
> PS Apologies for changing the Subject: header into something appropriate.
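The difference between the two encodings discussed here can be sketched in a few lines. The label is an assumed example, and the "idna" codec is Python's built-in IDNA2003 implementation. The sketch shows only that the UTF-8 form carries octets outside the LDH repertoire, while the Punycode-based form does not.

```python
import string

# Letters, digits and hyphen: the repertoire legacy software tends to expect.
LDH = set(string.ascii_letters + string.digits + "-")


def is_ldh(octets: bytes) -> bool:
    """True if every octet is an ASCII letter, digit or hyphen."""
    return all(chr(octet) in LDH for octet in octets)


label = "b\u00fccher"  # assumed example label ("bücher")

utf8_form = label.encode("utf-8")  # b'b\xc3\xbccher'
a_label = label.encode("idna")     # b'xn--bcher-kva'

if __name__ == "__main__":
    print(utf8_form, "LDH?", is_ldh(utf8_form))  # contains 0xC3 0xBC, not LDH
    print(a_label, "LDH?", is_ldh(a_label))      # letters, digits, hyphens only
```

An implementation that assumes LDH never sees anything unusual in the second form, which matches the deployment argument made in this message.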
Re: [DNSOP] Some second-hand remarks on draft-liman-tld-names-00.txt
> The DISCUSSION portion of 2.1 is explaining why relaxing RFC 952's restriction is safe. The safety flows exclusively from the premise that the highest-level component label of a domain name will be alphabetic; this guarantees that a syntactic check for an IP address will fail due to at least one label being made up only of letters. It may be, therefore, that the alphabetic restriction is in fact policy, and is not strictly a protocol issue. The problem is that it is policy on which other technical decisions rest. Change the policy, and the justification for those other technical decisions is undermined. In this sense, the claim in the DISCUSSION portion of 2.1 is not just a policy: it is also the foundation of other protocol issues, and is therefore normative on the protocol even if it _is_ a policy matter.

Okay, I agree with this line of logic.

1. We agree that the TLD restriction is therefore a policy one, and that other technical specifications (e.g. allowing digit labels at the 2LD) were derived on the assumption that the policy holds.

2. However, IDNA does not change that technical assumption, since an A-label will never be all digits, or start with a digit or end with one.

> The 2001 introduction of a number of new TLDs was rockier than necessary partly because of those checks, even though there was never an RFC that suggested such was a good check.

Agreed.

> 1123 _does_ suggest that it is reasonable to check for top-level labels being alphabetic, and I'd bet a pretty good lunch that we can find implementations that decide whether something is a domain name based on whether the top label starts with a letter. Therefore, even if we don't think that 1123 does in fact restrict the top-level label to letters only, it is prudent to treat such a restriction as a _de facto_ part of the protocol.

This is where we differ.

1. RFC 1123 does not suggest that top-level labels be checked for being alphabetic. RFC 1123 assumed that TLDs are alphabetic and therefore made certain technical assumptions about what is considered valid or not. I agree with you that there will be implementations that decide what a TLD should be, but that is a problem with the implementation, not with RFC 1123 or RFC 952, especially given what they did not say.

2. IDNA does not change it either, since an A-label is always LDH, or at least valid according to RFC 1123.

> To the extent we want to change that de facto part of the protocol, we want to do as little damage as possible. An argument in favour of John Klensin's suggestion to make an explicit exception for IDNA2008 A-labels is that it is the smallest change that can be made that still accommodates the new feature we want.

What I fail to see is why we need an update to RFC 1123, but I can accept the small change proposed by John if that's what the group thinks is best moving forward.

-James Seng
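The safety argument in this exchange can be sketched as code. The function below is a hypothetical syntactic test for a dotted-decimal IPv4 address, written for illustration only. It is not the check prescribed by RFC 1123 or used by any specific implementation, and the sample names are assumptions. A name whose top label contains a letter can never pass it, and an A-label always starts with the letters "xn".

```python
def looks_like_ipv4(name: str) -> bool:
    """Hypothetical syntactic check for a dotted-decimal IPv4 address."""
    parts = name.split(".")
    return len(parts) == 4 and all(
        part.isascii() and part.isdigit() and int(part) <= 255
        for part in parts
    )


def top_label(name: str) -> str:
    """Return the rightmost label of a name, ignoring a trailing root dot."""
    return name.rstrip(".").rsplit(".", 1)[-1]


if __name__ == "__main__":
    samples = [
        "192.0.2.1",              # dotted-decimal address
        "3com.com",               # digit-initial label below an alphabetic TLD
        "example.xn--bcher-kva",  # hypothetical name with an A-label as TLD
    ]
    for name in samples:
        print(f"{name}: top label {top_label(name)!r}, "
              f"address-like: {looks_like_ipv4(name)}")
```

The A-label TLD is not purely alphabetic, since it contains hyphens and may contain digits, so a stricter letters-only test on the top label would reject it. That is the gap between the address-ambiguity argument and the _de facto_ alphabetic check debated above.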