Re: How to remove accents while conforming to language standards?

Markus Scherer Mon, 04 Nov 2013 11:04:19 -0800

Hi Jennifer,

On Fri, Nov 1, 2013 at 8:37 AM, Jennifer Wong <[email protected]> wrote:


> I would like to ask for advice on removing accents from characters.
> While the normalization process is straightforward (NFD, remove accents),
> it does not take special cases into account. For example, in Danish, "å"
> should be mapped to "aa", not "a". Likewise, in German, "ä", "ö", "ü" should
> be mapped to "ae", "oe" and "ue" respectively, not "a", "o", "u". Are
> there common practices on how to handle these special cases? Thank you.
>

Can you describe what your use case is?

One area that appears not to have been discussed yet is sorting of
strings and full-text search (as in Ctrl-F in a browser or word processor).
If that is what you are after, then please look for "unicode collation" and
"cldr collation". The ICU libraries
<http://userguide.icu-project.org/collation> might also help.
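
If the goal is matching rather than display, a minimal sketch in Python
(standard library only) might look like the following. It is not ICU and
does not implement the Unicode Collation Algorithm. It only builds a
comparison key and leaves the stored text unchanged. The function names
and the OVERRIDES table are hypothetical; the mappings are the ones from
the examples in the question.

```python
import unicodedata

# Hypothetical per-language overrides, taken from the examples in the question.
# A real application would need a fuller, reviewed table per language.
OVERRIDES = {
    "da": {"å": "aa", "Å": "Aa"},
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
}


def fold_accents(text, lang=None):
    """Apply language overrides, then drop combining marks after NFD."""
    table = OVERRIDES.get(lang, {})
    # Compose first so that decomposed input also hits the override table.
    composed = unicodedata.normalize("NFC", text)
    replaced = "".join(table.get(ch, ch) for ch in composed)
    decomposed = unicodedata.normalize("NFD", replaced)
    # unicodedata.combining() is nonzero for combining marks such as U+0301.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(text, lang=None):
    """Accent-folded, case-folded key for comparison; never show it to users."""
    return fold_accents(text, lang).casefold()
```

For example, search_key("MÜLLER", "de") and search_key("Mueller", "de")
compare equal, while without a language the same input folds to "muller".
For real sorting and searching, a collation library remains the better tool.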

Best regards,
markus
