Re: [DNSOP] draft-yao-dnsop-idntld-implementation-01.txt

2009-11-07 Thread James Seng
There is a genuine user problem here (though whether one should actually
solve it is still an open question).

It is a genuine user problem, but I disagree with your latter statement.

It is not an open question; it must be solved. It is a serious enough problem
for Chinese that it must be resolved for Chinese users. The open question
is how, not if.

For some background on CJK ideographs, you may refer to an expired draft I
wrote many years ago:

http://james.seng.sg/files/draft-ietf-idn-cjk-01.pdf

RFC 3743 will also be a good point of reference.

-James Seng

On Sat, Nov 7, 2009 at 3:33 AM, Andrew Sullivan a...@shinkuro.com wrote:

 On Fri, Nov 06, 2009 at 07:06:38PM +, Alex Bligh wrote:

  I should probably declare my hand in that I think in most cases the
  variant stuff is a non-problem (blocking is adequate apart from
  where the user is likely to mistype themselves)

 According to some people I'm inclined to believe (I can't say this
 from personal experience), in non-alphabetic scripts there is indeed a
 serious problem with respect to variants.  It's not actually like the
 case of (say) colour vs. color, because users tend to have access to
 one version or another of the same character, but that character is
 actually encoded under Unicode as a different character.  This is, I
 am led to believe, a very common problem in areas where, for instance,
 Simplified and Traditional Chinese characters are both in regular use.
 The upshot is that every competent user of the language will recognize
 two different arrangements of symbols as the same word, each user
 will be able to type one or the other of the arrangements (but not
 both) at their keyboard, and yet the two different arrangements do not
 constitute equivalent labels.  So, it's as though by configuring your
 system with locale en-CA, you were _unable_ to type color.com into a
 resolution context.  There is a genuine user problem here (though
 whether one should actually solve it is still an open question).
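To make the variant problem concrete, here is a minimal sketch. It assumes Python's built-in `idna` codec, which implements the older IDNA2003 rules, and it uses 中国 / 中國 purely as an illustrative pair that is not taken from this thread. The Simplified and Traditional forms differ in one code point, so they yield unrelated A-labels unless something outside the DNS protocol ties them together.

```python
# Illustrative pair (an assumption, not from the thread): the same word
# written in Simplified and in Traditional Chinese.
simplified = "中国"
traditional = "中國"


def code_points(label: str) -> list[str]:
    """Return the Unicode code points of a label as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in label]


def to_a_label(label: str) -> str:
    """Encode a single label with Python's built-in IDNA2003 codec."""
    return label.encode("idna").decode("ascii")


print(code_points(simplified), to_a_label(simplified))
print(code_points(traditional), to_a_label(traditional))

# A reader sees one word, but the DNS sees two unrelated labels.
print(to_a_label(simplified) == to_a_label(traditional))
```

Nothing in the encoding itself records that the two labels are variants of each other; that knowledge would have to live in registry policy or in the zone.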

 Best,

 A

 --
 Andrew Sullivan
 a...@shinkuro.com
 Shinkuro, Inc.
 ___
 DNSOP mailing list
 DNSOP@ietf.org
 https://www.ietf.org/mailman/listinfo/dnsop



Re: [DNSOP] Some second-hand remarks on draft-liman-tld-names-00.txt

2009-03-11 Thread James Seng
By the same logic, the whole of IDN would be pointless, because RFC 1035
restricts labels to alphabetic letters only.

IDNA transforms IDN labels into Punycode so that they become transparent
to the resolvers that made those assumptions.
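As a rough illustration of that transformation, the sketch below round-trips a label through Python's built-in `idna` codec. The codec implements IDNA2003, so treat this as an approximation of the behaviour under discussion; the helper names and the sample label "bücher" are assumptions made for the example.

```python
def to_a_label(u_label: str) -> str:
    """Convert a Unicode label to its ASCII-compatible (xn--) form."""
    return u_label.encode("idna").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an ASCII-compatible label back to Unicode."""
    return a_label.encode("ascii").decode("idna")


a_label = to_a_label("bücher")
print(a_label)              # the form a resolver actually sees
print(to_u_label(a_label))  # the form shown to the user

# A label that is already plain ASCII passes through unchanged, so
# existing names are unaffected by the transformation.
print(to_a_label("example"))
```

The resolver only ever handles the ASCII form, which is why software that assumed letters, digits and hyphens keeps working.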

-James Seng

 I think this is what's up for dispute.  If people have interpreted the
 text in 1123 as normative and built resolvers using the logic there,
 then that is a technical reason to limit TLD characters.  Even if we
 think those resolvers were mistaken in their implementation, they're
 deployed.  Interoperation is one of our more important values, and
 that includes interoperation with reasonable interpretations of RFCs
 that we nevertheless think are mistaken.

 Best,

 A
 --
 Andrew Sullivan
 a...@shinkuro.com
 Shinkuro, Inc.


Re: [DNSOP] Some second-hand remarks on draft-liman-tld-names-00.txt

2009-03-11 Thread James Seng
On Wed, Mar 11, 2009 at 11:36 PM, Andrew Sullivan a...@shinkuro.com wrote:
 On Wed, Mar 11, 2009 at 10:56:10PM +0800, James Seng wrote:
 By the same logic, the whole of IDN would be pointless, because RFC 1035
 restricts labels to alphabetic letters only.

 I'd like the reference to where 1035 says that, please.  In
 particular, the following passage in §3.1 of RFC 1035 seems to say
 something different:


<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

...

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case
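For readers who want to test labels against the grammar quoted above, here is a minimal sketch of the preferred-syntax rule as a regular expression. The function name is an assumption, and the 63-octet limit is the RFC 1035 label length limit rather than part of the quoted passage. This models only the preferred syntax, not what the DNS protocol itself can carry.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# A letter, optionally followed by letters/digits/hyphens and ending in
# a letter or digit.
PREFERRED_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_preferred_label(label: str) -> bool:
    """True if the label matches the RFC 1035 preferred name syntax."""
    return len(label) <= 63 and PREFERRED_LABEL.fullmatch(label) is not None


for candidate in ("example", "xn--fiqs8s", "3com", "a-", "a"):
    print(candidate, is_preferred_label(candidate))
```

Note that "3com" fails here only because this grammar requires a leading letter; RFC 1123 later relaxed that for host names.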

 you seem to be making my argument for me.  The reason IDNA is
 preferable to some of the alternatives is that some resolver software
 indeed understood 1034 and 1035 to mean that the preferred syntax
 ought to be enforced (in what seems to me a plain violation of those
 RFCs).  We have to live with those widely-deployed resolvers, and
 therefore we need to design other protocols as though the additional
 restrictions that are _not_ part of the DNS protocol are in fact part
 of it.  Designing the protocols for the actually existing conditions
 in the network is what makes the design activity engineering rather
 than research, I think.

Precisely. Punycode was selected instead of UTF-8 because of widely
deployed implementations, even though DNS should theoretically be 8-bit
clean.

My point is that the RFC 1123 statement on the alphabetic requirement

a) is highly debatable, because it is not an explicit requirement: it is
mentioned only in passing, in a section called DISCUSSION ("since at
least the highest-level component label will be alphabetic"), in a
context where TLDs were alphabetic only as a matter of fact at that
time, not as a matter of technical requirement

b) even if it is an explicit requirement, should be taken in context
and in spirit, as much as RFC 1035 forbids non-alphabetic characters in
labels.

-James Seng


Re: [DNSOP] RFC1035 and permitted characters in labels

2009-03-11 Thread James Seng
Agreed :)

DNS is supposed to be 8-bit clean according to RFC 1035. But taken in
context with the recommended syntax section in RFC 1035, together with
RFC 952, many legacy implementations already assumed DNS must be LDH.
By the time RFC 2181 came along, it was too late.

This was one of the reasons why Punycode was chosen and not UTF-8 for IDN.
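A small sketch of why that mattered in practice: the same label encoded as UTF-8 carries bytes outside the LDH set, while its Punycode form does not. The check below is a hypothetical stand-in for what an LDH-only legacy implementation might do, not code from any real resolver, and it relies on Python's IDNA2003 `idna` codec.

```python
LDH_BYTES = (
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789-"
)


def is_ldh(wire_label: bytes) -> bool:
    """Hypothetical legacy check: every octet is a letter, digit or hyphen."""
    return len(wire_label) > 0 and all(octet in LDH_BYTES for octet in wire_label)


label = "bücher"
as_utf8 = label.encode("utf-8")  # what an 8-bit-clean DNS could carry
as_ace = label.encode("idna")    # what IDNA actually puts in the query

print(as_utf8, is_ldh(as_utf8))
print(as_ace, is_ldh(as_ace))
```

Only the second form gets past software that enforces LDH, which is the deployment constraint described above.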

-James Seng

 Er, that's in Section 2.3.1: Preferred Name Syntax which says before the
 BNF:

 The following syntax will result in fewer problems with many
 applications that use domain names

 RFC1035 does not say that labels can only be composed of ASCII letters and
 digits. RFC1123 imposes limitations on the characters permissible in a host
 name. But that's not the same as a domain name.


 PS Apologies for changing the Subject: header into something appropriate.



Re: [DNSOP] Some second-hand remarks on draft-liman-tld-names-00.txt

2009-03-11 Thread James Seng
 The DISCUSSION portion of 2.1 is explaining why relaxing RFC 952's
 restriction is safe.  The safety flows exclusively from the premise
 that the highest-level component label of a domain name will be
 alphabetic; this guarantees that a syntactic check for an IP address
 will fail due to at least one label being made up only of letters.  It
 may be, therefore, that the alphabetic restriction is in fact
 policy, and is not strictly a protocol issue.  The problem is that it
 is policy on which other technical decisions rest.  Change the policy,
 and the justification for those other technical decisions is
 undermined.  In this sense, the claim in the DISCUSSION portion of 2.1
 is not just a policy: it is also the foundation of other protocol
 issues, and is therefore normative on the protocol even if it _is_ a
 policy matter.

Okay, I agree with this line of logic.

1. We agree that the TLD restriction is therefore a policy one, and that
other technical specifications (e.g. allowing labels made of digits at
the 2LD) are derived from the assumption that this policy holds.

2. However, IDNA does not change that technical assumption, since an
A-label will never be all digits, nor will it start or end with a
digit.
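The dependency between the two points can be sketched as follows. This assumes a purely syntactic dotted-decimal test of the kind the RFC 1123 DISCUSSION has in mind; the function name and the sample names are hypothetical. Because every A-label begins with "xn--", a name whose top-level label is an A-label can never pass the test.

```python
def looks_like_dotted_decimal(name: str) -> bool:
    """Purely syntactic #.#.#.# test: four labels, each ASCII digits only."""
    labels = name.rstrip(".").split(".")
    return len(labels) == 4 and all(
        label.isascii() and label.isdigit() for label in labels
    )


for name in ("192.0.2.1", "example.com", "1.2.3.com", "1.2.3.xn--fiqs8s"):
    print(name, looks_like_dotted_decimal(name))
```

A single label containing a non-digit is enough to make the check fail, so an alphabetic TLD and an A-label TLD give the same guarantee here.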

 The 2001 introduction of a number of new TLDs was
 rockier than necessary partly because of those checks, even though
 there was never an RFC that suggested such was a good check.

Agreed

 1123 _does_ suggest that it is reasonable to check for top-level labels
 being alphabetic, and I'd bet a pretty good lunch that we can find
 implementations that decide whether something is a domain name based
 on whether the top label starts with a letter.  Therefore, even if we
 don't think that 1123 does in fact restrict the top-level label to
 letters only, it is prudent to treat such a restriction as a _de
 facto_ part of the protocol.

This is where we differ.

1. RFC 1123 does not suggest that top-level labels be checked for being
alphabetic. RFC 1123 assumed that TLDs are alphabetic and therefore made
certain technical assumptions about what is considered valid.

But I agree with you that there will be implementations that decide
what a TLD should be. That is a problem with the implementations, not
with RFC 1123 or RFC 952, especially regarding what they did not say.

2. Again, IDNA does not change it either, since an A-label is always LDH,
or at least valid according to RFC 1123.
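To compare the positions above, the sketch below puts an RFC 1123 style label check next to two hypothetical TLD checks that an implementation might have added on its own initiative. None of the function names come from real software; they are assumptions made for illustration.

```python
import re

# RFC 1123 host name label: starts and ends with a letter or digit,
# with letters, digits and hyphens in between.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1123_label(label: str) -> bool:
    """True if the label is valid under the relaxed RFC 1123 syntax."""
    return len(label) <= 63 and RFC1123_LABEL.fullmatch(label) is not None


def tld_is_all_letters(tld: str) -> bool:
    """Hypothetical strict check: the TLD consists of ASCII letters only."""
    return tld.isascii() and tld.isalpha()


def tld_starts_with_letter(tld: str) -> bool:
    """Hypothetical looser check: the TLD merely begins with an ASCII letter."""
    first = tld[:1]
    return first.isascii() and first.isalpha()


for tld in ("com", "xn--fiqs8s"):
    print(tld, is_rfc1123_label(tld), tld_is_all_letters(tld),
          tld_starts_with_letter(tld))
```

An A-label passes the RFC 1123 syntax and the "starts with a letter" test, but fails a letters-only test, which is the kind of deployed check the de facto argument is concerned with.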

 To the extent we want to change that de facto part of the protocol, we
 want to do as little damage as possible.  An argument in favour of
 John Klensin's suggestion to make an explicit exception for IDNA2008
 A-labels is that it is the smallest change that can be made that still
 accommodates the new feature we want.

What I fail to see is why we need an update to RFC 1123... but I can
accept the small change proposed by John if that's what the group
thinks is the best way forward.

-James Seng