On Oct 2, 2008, at 2:25 AM, Morgan Kay wrote:
>> One approach is to transliterate your input, e.g.:
>>
>> http://interglacial.com/~sburke/tpj/as_html/tpj22.html
>> -- Sean M. Burke, Unidecode!, 2001
>>
>> That way, "Chrétien" becomes "chretien" or some such for the purpose
>> of your search, but remains "Chrétien" in the text.
>>
>> For example, both El-Aaiún and El-Aaiun could reference the same
>> underlying text:
>>
>> http://svr225.stepx.com:3388/El-Aaiún
>> http://svr225.stepx.com:3388/El-Aaiun
>>
>
> This looks really promising, but after reading up on this for a while, I
> don't see how to get it to work with Rails... could you give me a few
> pointers or direct me to some documentation?

At its core, Unidecode is simply a lookup table. It should be rather straightforward to port to Ruby if that hasn't been done already.

Here is the original Perl implementation:

http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Unidecode.pm

And below is a Lua port of it:

http://dev.alt.textdrive.com/browser/HTTP/Unidecode.lua

As well as the lookup tables themselves:

http://dev.alt.textdrive.com/browser/HTTP/Unidecode

Usage example:

local Unidecode = require( 'Unidecode' )

print( 1, 'Москва́', Unidecode( 'Москва́' ) )
print( 2, '北京', Unidecode( '北京' ) )
print( 3, 'Ἀθηνᾶ', Unidecode( 'Ἀθηνᾶ' ) )
print( 4, '서울', Unidecode( '서울' ) )
print( 5, '東京', Unidecode( '東京' ) )
print( 6, '京都市', Unidecode( '京都市' ) )
print( 7, 'नेपाल', Unidecode( 'नेपाल' ) )

> 1 Москва́ Moskva
> 2 北京 beijing
> 3 Ἀθηνᾶ Athena
> 4 서울 seoul
> 5 東京 dongjing
> 6 京都市 jingdushi
> 7 नेपाल nepaal

If Unidecode is too much of a good thing, one could use iconv translit or such, e.g. iconv( 'utf-8', 'us-ascii//TRANSLIT' )...

One way or another, the crux of it is to transliterate your data as well as your query, and then use the latter to search the former.

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/
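For the Ruby side, here is a rough sketch of the "transliterate both the data and the query" idea. It is not a port of Unidecode: it leans on Ruby's built-in `String#unicode_normalize` to strip accents and falls back to a tiny hand-written table, so it only copes with Latin-ish text. The names `FALLBACK`, `search_key` and `find` are made up for this example, and the in-memory hash stands in for whatever column or index you would actually search in Rails.

```ruby
# Hypothetical helpers for illustration; not part of Unidecode or Rails.

# Small hand-written table for characters that NFD does not decompose.
# A real Unidecode port would cover far more of Unicode.
FALLBACK = {
  "ß" => "ss",
  "ø" => "o",
  "æ" => "ae",
  "đ" => "d",
  "ł" => "l"
}.freeze

# Reduce a string to a plain ASCII key used only for searching.
def search_key(text)
  text
    .unicode_normalize(:nfd)  # split "é" into "e" plus a combining accent
    .gsub(/\p{Mn}/, "")       # drop the combining marks
    .downcase
    .gsub(/[^[:ascii:]]/) { |ch| FALLBACK.fetch(ch, "?") }
end

# Transliterate the data once, keeping the original text as the value.
names = ["Chrétien", "El-Aaiún", "Straße"]
index = names.group_by { |name| search_key(name) }

# Transliterate the query the same way, then use it to search the data.
def find(index, query)
  index.fetch(search_key(query), [])
end

p find(index, "Chretien")
p find(index, "el-aaiún")
```

In a Rails app the same key would presumably be stored in its own column next to the original text and refreshed whenever the record is saved, with queries run through `search_key` before they reach the database.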

