--On Wednesday, September 03, 2014 09:46 +0300 Andrew Sullivan <[email protected]> wrote:
> On Tue, Sep 02, 2014 at 05:42:28PM -0700, Dan Chiba wrote:
>> usernames "fussball" and "fußball" should be treated as
>> equal for case insensitive matching, or as distinct for case
>> sensitive matching.
>
> I don't see any case in those two strings.  What am I missing?

That applying ToCaseFold(fußball) yields "fussball". The reasons for that choice were clear when it was made, although they were questioned in some quarters. Whatever else can be said about it, it (like similar issues, such as those with dotless "i") is generally much less problematic in running text, or when the language and locale context are known, than with, e.g., short identifiers that have no language context (such as those in IDNA and at least some PRECIS contexts). At this stage, Unicode's stability rules prevent any reconsideration: if the ToCaseFold behavior is not satisfactory for a given context, that operation either needs to be avoided (as IDNA2008 does) or replaced by something else.

> But anyway, it's not clear to me that the usernames "fussball"
> and "fußball" should in fact always and everywhere be treated
> as equal. For my understanding is that the use of ß is very
> much a localization issue: the ligature is never used in Swiss
> written German. One could argue that it'd be extremely unwise
> to permit distinct user registrations that are otherwise
> identical except for ss and ß (certainly I would). But if we
> think that's a good reason to insist on a mapping, then that
> would also seem to be a good reason to say, "No case folding,
> period." Indeed, that's one of the big differences between
> IDNA2003 and IDNA2008; IDNA2008 violates a deep tradition in
> the DNS in this way, basically because we couldn't make the
> mapping work the way we'd like.

I would question the "deep tradition" because, in reality, it applies only to the ASCII repertoire, i.e., a subset of the undecorated Latin characters.
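The difference between full Unicode case folding and the ASCII-only folding of the DNS tradition can be seen in a few lines. This is a sketch, assuming Python's str.casefold() as a stand-in for ToCaseFold (Python documents it as the Default Case Folding of the Unicode Standard, without the Turkic-specific mappings); the helper names full_casefold and ascii_only_fold are hypothetical, introduced only for illustration.

```python
def full_casefold(s: str) -> str:
    """Full Unicode case folding via str.casefold(); no Turkic (T) mappings."""
    return s.casefold()


def ascii_only_fold(s: str) -> str:
    """Fold only A-Z to a-z and leave every other code point untouched,
    mirroring case-insensitive comparison limited to the ASCII repertoire."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


pairs = [
    ("fussball", "fußball"),  # full folding maps ß to "ss"
    ("FUSSBALL", "fussball"),  # plain ASCII case difference
    ("I", "ı"),  # without Turkic rules, I folds to dotted i, not dotless ı
]

for a, b in pairs:
    print(
        f"{a!r} vs {b!r}: "
        f"casefold equal={full_casefold(a) == full_casefold(b)}, "
        f"ascii-fold equal={ascii_only_fold(a) == ascii_only_fold(b)}"
    )

# Note that lowercasing is not the same operation: "fußball".lower() keeps the ß.
print("fußball".lower())
```

Only the ASCII pair is equal under both rules; "fussball" and "fußball" match only under full case folding, and neither rule makes "I" equal to dotless "ı", which is the kind of outcome that depends on language context the identifier does not carry.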
I note that, for scripts that do not have case as that term is understood in Greek-Latin-Cyrillic but that do have Initial-Medial-Final-Isolated distinctions, no one I'm aware of has seriously suggested a CharPositionFormFold operation. Instead, we have dealt with those cases inconsistently: by banning one of the forms, by quietly mapping them into another form and losing whatever information might be present (the option IDNA2008 quite deliberately eliminated), or by treating them as distinct.

>...

best,
   john

_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
