On Sunday, June 1, 2014 2:01:09 PM UTC+5:30, Chris Angelico wrote:
> On Sun, Jun 1, 2014 at 5:58 PM, Marko Rauhamaa wrote:
> > As a Finnish-speaker, I hope that patch doesn't become default behavior.
> > Too many times, we have been victimized by the German conventions. A
> > Finnish-speaker would much rather see
> >    Järvenpää => Jarvenpaa
> >    Öllölä => Ollola
> >    Kärkkäinen => Karkkainen
> > than
> >    Järvenpää => Jaervenpaeae
> >    Öllölä => Oelloelae
> >    Kärkkäinen => Kaerkkaeinen

> It's even worse than that. The rules for ASCIIfying adorned characters
> vary according to context - Müller and Mueller are different names,
> and in many contexts should sort and compare differently, and I
> remember reading somewhere that there's a context in which it's more
> useful to decompose ü to u rather than ue. There is no "safe" lossy
> transformation that can be done to any language's words, and this is
> no exception. ASCIIfication has to be accepted as flawed; this issue
> (an inability to handle non-ASCII labels) is similar to a lot of blog
> URLs - 
> http://rosuav.blogspot.com/2013/08/20th-international-g-festival-awards.html
> is talking about the "International G&S Festival" awards, but the URL
> drops the "&S" part. (If you absolutely have to transmit something
> losslessly in pure ASCII, you need a scheme like Punycode, which is a
> lot less clean and readable than a decomposition scheme.)

> Of course, the better solution is to permit the full Unicode alphabet
> in identifiers...

Yes, that is the real point.

Changing the current behavior, which maps [ö,ä…] → [o,a…], to a new
behavior that maps them to [oe,ae…], and then arguing over whether this
should or should not become the default, is the wrong battle.
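
For reference, the two mappings can be sketched in a few lines of Python.
This is only an illustration: the GERMAN_STYLE table and both function
names are made up here and are not taken from the patch under discussion.

```python
import unicodedata

# Hypothetical table for the German-style convention (umlauts only).
GERMAN_STYLE = {"ä": "ae", "ö": "oe", "ü": "ue",
                "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}

def strip_marks(text):
    """Decompose, then drop combining marks: ä -> a, ö -> o."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def german_style(text):
    """Replace umlauts with vowel + e, then strip any remaining marks."""
    composed = unicodedata.normalize("NFC", text)  # so the table lookup matches
    replaced = "".join(GERMAN_STYLE.get(ch, ch) for ch in composed)
    return strip_marks(replaced)

for name in ("Järvenpää", "Öllölä", "Kärkkäinen", "Müller"):
    print(name, "=>", strip_marks(name), "|", german_style(name))

# Both mappings are lossy: distinct names collapse into one.
assert strip_marks("Müller") == strip_marks("Muller")
assert german_style("Müller") == german_style("Mueller")
```

Whichever table is chosen, some pair of different names ends up identical.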

The more useful question is: why have this conversion at all?
Until barely three years ago, HTML authors wrote non-ASCII characters as
HTML entities. Now the standard practice is to write the character
directly and make sure the page is explicitly declared as UTF-8.
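
A quick sketch of the two styles, using only the Python standard library
(the sample name is borrowed from the quoted text above):

```python
import html

entity_form = "J&auml;rvenp&auml;&auml;"   # older style: HTML entities
direct_form = "Järvenpää"                  # current style: the characters themselves

# Both spellings denote the same text once entities are resolved.
assert html.unescape(entity_form) == direct_form

# Written directly, the text must be sent in a declared encoding such as UTF-8.
utf8_bytes = direct_form.encode("utf-8")
print(len(direct_form), "characters,", len(utf8_bytes), "bytes in UTF-8")

# Going the other way: force pure ASCII with numeric character references.
print(direct_form.encode("ascii", "xmlcharrefreplace"))
```

The entity form stays pure ASCII, but only the direct form is readable in
the page source.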

It's only a matter of time before this becomes standard practice in
all domains.
