Re: Unicode->ASCII approximate conversion

Jungshik Shin Fri, 19 Dec 2003 07:02:33 -0800

On Fri, 19 Dec 2003 [EMAIL PROTECTED] wrote:

> Quoting Hallvard B Furuseth <[EMAIL PROTECTED]>:
>
> > I need a function which converts Latin Unicode characters to the closest
> > equivalent ASCII characters, e.g. "é" -> "e".


> 1. Produce the NFD normalisation of the text.
> 2. Remove all characters with a non-zero combining class.
> 3. Some non-ASCII characters may remain (particularly those from non-Latin
> scripts). Some of these can be handled nicely, but others may require you to
> raise an exception or output a replacement character.
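
The three steps quoted above could be sketched in Python roughly as follows.
This is only a minimal illustration, assuming the standard `unicodedata`
module; the function name `to_ascii_approx` and the `replacement` parameter
are made up for the example and do not come from the thread.

```python
import unicodedata


def to_ascii_approx(text, replacement="?"):
    """Hypothetical helper: approximate `text` with ASCII characters."""
    out = []
    # Step 1: canonical decomposition (NFD) splits "é" into "e" + U+0301.
    for ch in unicodedata.normalize("NFD", text):
        # Step 2: drop every character with a non-zero combining class.
        if unicodedata.combining(ch) != 0:
            continue
        # Step 3: whatever is still non-ASCII gets a replacement character
        # (raising an exception here would be the stricter alternative).
        out.append(ch if ord(ch) < 128 else replacement)
    return "".join(out)
```

Letters with no canonical decomposition, such as "ø", fall through to step 3
and come out as the replacement character, so a real application would
probably want its own table for those.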

> on your application. Specialised handling of some characters is possible, for
> instance you could convert the trademark sign to "(TM)" to avoid confusion,

  For Korean syllables (U+AC00 - U+Dxxx), you can use 'Hangul Syllable
Short Names' that can be algorithmically derived with small tables.
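
  A rough sketch of that derivation in Python, assuming the arithmetic
decomposition and the Jamo short name tables from the Unicode Standard
(precomposed syllables run from U+AC00 to U+D7A3). The function name
`hangul_short_name` is only illustrative.

```python
# Constants for the arithmetic decomposition of Hangul syllables.
S_BASE = 0xAC00
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT           # 11172 precomposed syllables in total

# Jamo short names: leading consonants, vowels, trailing consonants.
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]


def hangul_short_name(ch):
    """Return the short name of one precomposed syllable, else None."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None  # not a precomposed Hangul syllable
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index]
```

  With this, "\uD55C\uAE00" (Hangul written in Hangul) maps to "HAN" and
"GEUL", which reads like a romanization and stays within ASCII.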
