--On Wednesday, September 03, 2014 09:46 +0300 Andrew Sullivan
<[email protected]> wrote:

> On Tue, Sep 02, 2014 at 05:42:28PM -0700, Dan Chiba wrote:
>> usernames "fussball" and "fußball" should be treated as
>> equal for case insensitive matching, or as distinct for case
>> sensitive matching. 
> 
> I don't see any case in those two strings.  What am I missing?

That applying ToCaseFold(fußball) yields "fussball".  The
reasons why that made sense were clear, although questioned in
some quarters, when it was done.  Whatever else can be said
about it, that folding (and similar issues such as those with
dotless "i") is generally much less problematic in running text
and/or when the language and locale context are known than with,
e.g., short identifiers with no language context (such as with
IDNA and at least some PRECIS contexts).  At this stage, stability
rules
prevent any reconsideration in the Unicode context: if the
ToCaseFold behavior is not satisfactory for a given context,
that operation either needs to be avoided (as IDNA2008 does) or
replaced by something else.
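
For anyone who wants to see the folding behavior concretely, here
is a minimal sketch in Python.  It assumes that Python's
str.casefold() is an acceptable stand-in for the Unicode full case
folding discussed above; the helper name fold_equal is invented for
illustration and does not come from any PRECIS or IDNA
specification.

```python
def fold_equal(a: str, b: str) -> bool:
    """Compare two strings after default (locale-independent) case folding.

    Hypothetical helper for illustration only.
    """
    return a.casefold() == b.casefold()


# Sharp s folds to "ss", so the two usernames collide after folding.
print(fold_equal("fußball", "fussball"))

# Plain lowercasing leaves the sharp s alone, so the strings stay distinct.
print("fußball".lower() == "fussball")

# Dotless i (U+0131) is untouched by the default folding, so it never
# matches ASCII "i", although Turkish readers see it as lowercase "I".
print(fold_equal("\u0131", "i"))
print(fold_equal("I", "i"))
```

Whether the first comparison ought to succeed for short identifiers
is exactly the question at issue; the sketch shows only what the
default folding does, not what a profile should require.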

> But anyway, it's not clear to me that the usernames "fussball"
> and "fußball" should in fact always and everywhere be treated
> as equal. For my understanding is that the use of ß is very
> much a localization issue: the ligature is never used in Swiss
> written German.  One could argue that it'd be extremely unwise
> to permit distinct user registrations that are otherwise
> identical except for ss and ß (certainly I would).  But if we
> think that's a good reason to insist on a mapping, then that
> would also seem to be a good reason to say, "No case folding,
> period."  Indeed, that's one of the big differences between
> IDNA2003 and IDNA2008; IDNA2008 violates a deep tradition in
> the DNS in this way, basically because we couldn't make the
> mapping work the way we'd like.

I would question the "deep tradition" because, in reality, it
applies only to the ASCII repertoire, i.e., a subset of the set
of undecorated Latin characters.  I note that, for scripts that
do not have case as that term is understood in
Greek-Latin-Cyrillic but that do have
Initial-Medial-Final-Isolated distinctions, no one of whom I'm
aware has seriously suggested a CharPositionFormFold operation,
and we have dealt with those cases inconsistently: by banning one
of the forms, by quietly mapping them into another form and losing
whatever information might be present (the option IDNA2008 quite
deliberately eliminated), or by treating them as distinct.
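
As one illustration of the "quietly mapping" option, the sketch
below uses NFKC normalization from Python's unicodedata module,
which collapses the Arabic presentation forms of a letter into its
base code point.  This is offered only as an example of that kind of
mapping, under the assumption that it is representative; the name
fold_positional is hypothetical, and no CharPositionFormFold
operation actually exists.

```python
import unicodedata

# Presentation forms of ARABIC LETTER BEH (base code point U+0628).
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def fold_positional(s: str) -> str:
    """Hypothetical stand-in: map positional forms away using NFKC."""
    return unicodedata.normalize("NFKC", s)


# Each positional form maps to the same base letter, so whatever
# information the form carried is lost after the mapping.
for name, ch in BEH_FORMS.items():
    print(name, fold_positional(ch) == "\u0628")

# Greek final sigma is a related case: default case folding maps it
# to the ordinary small sigma.
print("\u03C2".casefold() == "\u03C3")
```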

>...

best,
   john

_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
