Before dropping any accent, you should make sure that the drop will not break (at least) the relative ordering at the primary collation strength. Checking this gives very useful hints about cases such as German umlauts rewritten as the base vowel followed by e, or Danish rings rewritten as double vowels, instead of simply being dropped.
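
As a rough illustration of the difference between "decompose and drop the marks" and a language-aware fallback, here is a minimal Python sketch using only the standard `unicodedata` module. The `FALLBACKS` table and the helper names `strip_marks` and `to_ascii_fallback` are hypothetical and only cover the German and Danish letters mentioned in this thread; it does not perform the collation check described above, which would need a collation library such as ICU.

```python
import unicodedata

# Hypothetical per-language fallback table (illustrative only, not taken
# from any standard). It covers just the letters discussed in this thread.
FALLBACKS = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
    "da": {"å": "aa", "Å": "Aa"},
}


def strip_marks(text):
    """Decompose to NFD, then drop every character whose canonical
    combining class is nonzero (the combining accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_ascii_fallback(text, lang):
    """Apply the language-specific replacements first, then strip
    whatever combining marks remain."""
    table = FALLBACKS.get(lang, {})
    # Compose first so that precomposed letters such as "ü" match the table.
    composed = unicodedata.normalize("NFC", text)
    replaced = "".join(table.get(ch, ch) for ch in composed)
    return strip_marks(replaced)


# NFD alone only decomposes: "å" becomes "a" plus U+030A COMBINING RING ABOVE.
print([hex(ord(ch)) for ch in unicodedata.normalize("NFD", "å")])

print(to_ascii_fallback("Müller", "de"))    # Mueller
print(to_ascii_fallback("Århus", "da"))     # Aarhus
print(to_ascii_fallback("pingüino", "es"))  # pinguino (no Spanish table entry)
```

The language tag has to come from the caller: the same letter "ü" ends up as "ue" for German and as "u" for Spanish, so the text alone cannot decide which replacement applies.
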
2013/11/1 Jukka K. Korpela <[email protected]>

> 2013-11-01 17:37, Jennifer Wong wrote:
>
>> I would like to ask for advice on removing accents from characters.
>
> To address first the question you ask in the Subject line, “How to remove
> accents while conforming to language standards?”, but do not ask in the
> message body, the answer is: You can’t. Well, except in cases where
> language standards permit the omission. For example, according to modern
> French orthography standards, the circumflex in “fraîche” could and should
> be dropped (though it is still very common to keep it).
>
>> While the normalization process is straightforward (NFD, remove
>> accents),
>
> NFD does *not* remove accents. It is decomposition, not destruction. It
> decomposes, say, “å” to “a” followed by a combining ring above. If you then
> have your own code remove the combining marks, that’s a different issue,
> and generally a wrong thing to do.
>
>> For example,
>> Danish, "å" should be mapped to "aa", not "a".
>
> “Should” as per which standard or policy? It is generally accepted for
> Danish to replace “å” by “aa” if you cannot use “å”. But what might be the
> situation, in the year 2013, where you really cannot use “å”?
>
>> Likewise, in German, "ä"
>> "ö" "ü" should be mapped to "ae", "oe" and "ue" respectively, not "a",
>> "o", "u". Are there common practices on how to handle these special
>> cases?
>
> There are various language-specific practices. They are not universal. For
> example, in Spanish texts, I don’t think many people would find it
> acceptable to replace “ü” by “ue”, rather than just “u”, if some evil
> powers force you to stick to ASCII characters.
>
> Yucca

