John Machin wrote:
> I have developed a table which maps most latin-decorated Unicode
> characters into the non-decorated basic form.

Here is a fascinating article by Sean Burke (a linguist) about converting all Unicode characters into US-ASCII. The conversion is primarily based on sound, so in theory running soundex on the result could be somewhat useful.

http://interglacial.com/~sburke/tpj/as_html/tpj22.html

You can find his tables, encoded as Perl data structures, at this link:

http://cpansearch.perl.org/src/SBURKE/Text-Unidecode-0.04/lib/Text/Unidecode/

Roger

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
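
To make the "strip the decoration, then run soundex" idea concrete, here is a rough Python sketch. It is an assumption-laden illustration, not Sean Burke's method: it does not use his Text::Unidecode tables, and instead relies on Unicode NFKD decomposition from the standard library, which only handles characters that decompose into a base letter plus combining marks (so letters such as "ø" or "ß" pass through unchanged). The function names strip_decorations and soundex are made up for this example, and the soundex shown is the common American variant.

```python
import unicodedata


def strip_decorations(text):
    """Drop combining marks after NFKD decomposition (hypothetical helper).

    This is NOT Text::Unidecode; it only covers characters that Unicode
    can decompose into a base letter plus combining marks.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Letter-to-digit table for American Soundex.
_CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(word):
    """Return the four-character American Soundex code of an ASCII word."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code counts again
    return (result + "000")[:4]


if __name__ == "__main__":
    for name in ["Müller", "Muller", "José", "Jose"]:
        plain = strip_decorations(name)
        print(name, plain, soundex(plain))
```

With something like this, a decorated and an undecorated spelling of the same name end up with the same code, which is the property that would make the result useful as a fuzzy-match key.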