On Fri, 19 Dec 2003 [EMAIL PROTECTED] wrote:

> Quoting Hallvard B Furuseth <[EMAIL PROTECTED]>:
>
> > I need a function which converts Latin Unicode characters to the closest
> > equivalent ASCII characters, e.g. "é" -> "e".
>
> 1. Produce the NFD normalisation of the text.
> 2. Remove all characters with a non-zero combining class.
> 3. Some non-ASCII characters may remain (particularly those from non-Latin
> scripts). Some of these can be handled nicely, but others may require you to
> raise an exception or output a replacement character, depending on your
> application. Specialised handling of some characters is possible; for
> instance, you could convert the trademark sign to "(TM)" to avoid confusion.

For Korean syllables (U+AC00 - U+Dxxx), you can use 'Hangul Syllable Short
Names', which can be derived algorithmically with small tables.
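The three quoted steps can be sketched in Python with the standard `unicodedata` module. This is only an illustration, not code from this thread: `to_ascii`, the `SPECIAL` table and the `"?"` replacement default are hypothetical choices for step 3.

```python
import unicodedata

# Hypothetical step-3 table of special cases; extend to suit the application.
SPECIAL = {
    "\u2122": "(TM)",  # TRADE MARK SIGN
}

def to_ascii(text: str, replacement: str = "?") -> str:
    """Approximate text in ASCII: NFD, drop combining marks, special-case the rest."""
    out = []
    # Step 1: canonical decomposition, e.g. "é" -> "e" + U+0301
    for ch in unicodedata.normalize("NFD", text):
        # Step 2: drop characters with a non-zero canonical combining class
        if unicodedata.combining(ch):
            continue
        if ord(ch) < 128:
            out.append(ch)
        # Step 3: leftovers get a special mapping or the replacement string
        elif ch in SPECIAL:
            out.append(SPECIAL[ch])
        else:
            out.append(replacement)
    return "".join(out)

print(to_ascii("café Müller\u2122"))  # cafe Muller(TM)
```

Letters with no canonical decomposition, such as "ø" or "ß", pass through step 2 unchanged and land in step 3. So do precomposed Hangul syllables: NFD turns them into conjoining jamo, which have combining class 0 and are not ASCII.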
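For the Korean case, the precomposed syllables run from U+AC00 to U+D7A3 (19 × 21 × 28 = 11,172 syllables). Each syllable's short name is the concatenation of the Jamo short names of its leading consonant, vowel and optional trailing consonant, as used in the character names "HANGUL SYLLABLE ...". A sketch of that derivation in Python follows; `hangul_short_name` is a hypothetical helper, and the three tables are the Jamo short names from the Unicode Standard. These names are a naming convention rather than an official romanization, so whether they are a suitable ASCII rendering depends on the application.

```python
import unicodedata
from typing import Optional

# Jamo short names as given in the Unicode Standard (Jamo.txt)
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS", "",
          "J", "JJ", "C", "K", "T", "P", "H"]                      # 19 leading consonants
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]  # 21 vowels
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]                                       # 28 trailing (incl. none)

S_BASE = 0xAC00
T_COUNT = len(JAMO_T)            # 28
N_COUNT = len(JAMO_V) * T_COUNT  # 588 syllables per leading consonant
S_COUNT = len(JAMO_L) * N_COUNT  # 11172, so the block ends at U+D7A3

def hangul_short_name(ch: str) -> Optional[str]:
    """Return the short name (e.g. "HAN") of a precomposed Hangul syllable, else None."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    l_index = s_index // N_COUNT
    v_index = (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    return JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index]

# Cross-check one syllable against Python's own character name table
assert unicodedata.name("\ud55c") == "HANGUL SYLLABLE " + hangul_short_name("\ud55c")

print("".join(hangul_short_name(c) or c for c in "\ud55c\uae00"))  # HANGEUL
```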

