Thank you, everyone, for your input. The use case is that customers want to integrate data from our enterprise solution into their ASCII-based downstream systems, so all accents need to be removed.
Ilay's "Transliteration on Passport" doc is very useful. We can use that as a basis to map special transliteration cases before normalizing and removing accents. Jennifer From: Markus Scherer <[email protected]<mailto:[email protected]>> Date: Monday, November 4, 2013 11:54 AM To: Jennifer Wong <[email protected]<mailto:[email protected]>> Cc: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Subject: Re: How to remove accents while conforming to language standards? Hi Jennifer, On Fri, Nov 1, 2013 at 8:37 AM, Jennifer Wong <[email protected]<mailto:[email protected]>> wrote: I would like to ask for advice on removing accents from characters. While the normalization process is straight forward (NFD, remove accents), it does not take into account of special cases. For example, Danish, "å" should be mapped to "aa", not "a". Likewise, in German, "ä" "ö" "ü" should be mapped to "ae", "oe" and "ue" respectively, not "a", "e", "u". Are there common practices on how to handle these special cases? Thank you. Can you describe what your use case is? One possible area that appears not to have been discussed yet is sorting of strings and full-text search (as in ctrl-F in a browser or word processor). If you are after those, then please look for "unicode collation" and "cldr collation". The ICU libraries<http://userguide.icu-project.org/collation> might also help. Best regards, markus

