2013-11-01 17:37, Jennifer Wong wrote:

I would like to ask for advice on removing accents from characters.

To address first the question you ask in the Subject line, “How to remove accents while conforming to language standards?”, but do not ask in the message body, the answer is: You can’t. Well, except in cases where language standards permit the omission. For example, according to modern French orthography standards, the circumflex in “fraîche” could and should be dropped (though it is still very common to keep it).

While the normalization process is straightforward (NFD, remove
accents),

NFD does *not* remove accents. It is decomposition, not destruction. It decomposes, say, “å” into “a” followed by a combining ring above (U+030A). If your own code then removes the combining marks, that is a separate step, and generally the wrong thing to do.
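To make the distinction concrete, here is a minimal Python sketch using the standard unicodedata module. The function names decompose and strip_marks are made up for illustration; the second function is the separate, lossy step that people tend to lump together with NFD.

```python
import unicodedata

def decompose(text):
    """Return the NFD form of text. Nothing is removed."""
    return unicodedata.normalize("NFD", text)

def strip_marks(text):
    """Separate, lossy step: drop nonspacing combining marks (category Mn) after NFD."""
    return "".join(
        ch for ch in decompose(text) if unicodedata.category(ch) != "Mn"
    )

# U+00E5 is the precomposed letter a with ring above.
decomposed = decompose("\u00e5")
print([unicodedata.name(ch) for ch in decomposed])
# ['LATIN SMALL LETTER A', 'COMBINING RING ABOVE']

# "bl\u00e5b\u00e6r" is the Danish word for blueberry. The ae ligature (U+00E6)
# has no decomposition, so it survives the stripping untouched.
print(strip_marks("bl\u00e5b\u00e6r"))
# bla + U+00E6 + r
```

Note that the stripped result is neither correct Danish nor plain Ascii, which illustrates why this approach tends to go wrong.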

For example, in
Danish, "å" should be mapped to "aa", not "a".

“Should” as per which standard or policy? It is generally accepted for Danish to replace “å” by “aa” if you cannot use “å”. But what might be the situation, in the year 2013, where you really cannot use “å”?

Likewise, in German, "ä"
"ö" "ü" should be mapped to "ae", "oe" and "ue" respectively, not "a",
"o", "u". Are there common practices on how to handle these special
cases?

There are various language-specific practices. They are not universal. For example, in Spanish texts, I don’t think many people would find it acceptable to replace “ü” by “ue”, rather than just “u”, if some evil powers force you to stick to Ascii characters.
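If you are nevertheless forced into such replacements, they have to be driven by the language of the text, not by the character alone. The following Python sketch assumes you know the language of each string. The table FALLBACKS and the function ascii_fallback are hypothetical names, and the entries cover only the lowercase letters mentioned in this thread; this is not a complete or standardized mapping.

```python
import unicodedata

# Hypothetical per-language fallback tables, keyed by ISO 639-1 code.
# Only the lowercase letters discussed in this thread are listed.
FALLBACKS = {
    "da": {"\u00e5": "aa"},                                   # a with ring
    "de": {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue"},   # a, o, u with diaeresis
    "es": {"\u00fc": "u"},                                    # u with diaeresis
}

def ascii_fallback(text, lang):
    """Replace letters using the table for lang; leave everything else as is."""
    table = FALLBACKS.get(lang, {})
    # Compose first so that "u" + combining diaeresis matches the table key.
    composed = unicodedata.normalize("NFC", text)
    return "".join(table.get(ch, ch) for ch in composed)

print(ascii_fallback("M\u00fcller", "de"))     # Mueller
print(ascii_fallback("ping\u00fcino", "es"))   # pinguino
```

The same “ü” gives “ue” in the German word and “u” in the Spanish one, so no language-independent table can get both right. Unknown languages and unlisted characters pass through unchanged here, which is a design choice of the sketch, not a recommendation.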

Yucca



