On Oct 2, 2008, at 2:25 AM, Morgan Kay wrote:

>
>> One approach is to transliterate your input, e.g.:
>>
>> http://interglacial.com/~sburke/tpj/as_html/tpj22.html
>> -- Sean M. Burke, Unidecode!, 2001
>>
>> That way, "Chrétien" becomes "chretien" or some such for the purpose
>> of your search, but remains "Chrétien" in the text.
>>
>> For example, both El-Aaiún and El-Aaiun could reference the same
>> underlying text:
>>
>> http://svr225.stepx.com:3388/El-Aaiún
>> http://svr225.stepx.com:3388/El-Aaiun
>>
>
> This looks really promising, but after reading up on this for a while, I
> don't see how to get it to work with Rails... could you give me a few
> pointers or direct me to some documentation?

At its core, Unidecode is simply a lookup table. It should be rather
straightforward to port to Ruby, if that hasn't been done already.
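To give an idea of what "simply a lookup table" means, here is a minimal Ruby sketch. The table contents, the TRANSLIT_TABLE constant and the toy_unidecode method are all made up for illustration; the real Unidecode tables cover far more of Unicode than these four entries.

```ruby
# Hypothetical, tiny lookup table for illustration only; a real port
# would load the full Unidecode tables instead.
TRANSLIT_TABLE = {
  "\u00E9" => 'e',   # e with acute accent
  "\u00FA" => 'u',   # u with acute accent
  "\u00DF" => 'ss',  # German sharp s
  "\u00C6" => 'AE'   # AE ligature
}.freeze

# Pass ASCII through untouched, replace characters found in the table,
# and fall back to '?' for anything the table does not cover.
def toy_unidecode(text)
  text.each_char.map do |ch|
    ch.ascii_only? ? ch : TRANSLIT_TABLE.fetch(ch, '?')
  end.join
end

puts toy_unidecode("Chr\u00E9tien")  # prints "Chretien"
```

The '?' fallback is an arbitrary choice for this sketch; dropping unknown characters or keeping them as-is would work just as well for search purposes.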

Here is the original Perl implementation:

http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Unidecode.pm

And below is a Lua port of it:

http://dev.alt.textdrive.com/browser/HTTP/Unidecode.lua

As well as the lookup tables themselves:

http://dev.alt.textdrive.com/browser/HTTP/Unidecode

Usage example:

local Unidecode = require( 'Unidecode' )

print( 1, 'Москва́', Unidecode( 'Москва́' ) )
print( 2, '北京', Unidecode( '北京' ) )
print( 3, 'Ἀθηνᾶ', Unidecode( 'Ἀθηνᾶ' ) )
print( 4, '서울', Unidecode( '서울' ) )
print( 5, '東京', Unidecode( '東京' ) )
print( 6, '京都市', Unidecode( '京都市' ) )
print( 7, 'नेपाल', Unidecode( 'नेपाल' ) )

 > 1    Москва́ Moskva
 > 2    北京      beijing
 > 3    Ἀθηνᾶ   Athena
 > 4    서울      seoul
 > 5    東京      dongjing
 > 6    京都市     jingdushi
 > 7    नेपाल   nepaal

If Unidecode is too much of a good thing, one could use iconv's
transliteration mode or such, e.g. iconv( 'utf-8', 'us-ascii//TRANSLIT' )...
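In the same lightweight spirit, and as a swapped-in technique rather than iconv itself, one can decompose the text and drop the combining marks. The sketch below assumes Ruby 2.2 or later for String#unicode_normalize; the strip_marks name is made up. Unlike Unidecode, it only folds accents and leaves non-Latin scripts alone.

```ruby
# Hypothetical helper: fold accented Latin letters to their base letter.
# NFD splits a precomposed letter into base letter + combining mark(s);
# \p{Mn} then matches those non-spacing marks so they can be removed.
def strip_marks(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_marks("El-Aai\u00FAn")  # prints "El-Aaiun"
```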

One way or another, the crux of it is to transliterate your data as
well as your query, and then use the latter to search the former.
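As a rough sketch of that idea in plain Ruby: the fold method here is a deliberately tiny, hypothetical stand-in for a real transliterator, and TITLES, INDEX and search are made-up names. In a Rails app the folded key would presumably live in its own column next to the original text, but that part is an assumption and not shown.

```ruby
# Hypothetical stand-in for Unidecode: handles only the two accented
# letters used in this thread, then lowercases.
def fold(text)
  text.tr("\u00E9\u00FA", 'eu').downcase
end

TITLES = ["Chr\u00E9tien de Troyes", "El-Aai\u00FAn", 'Kyoto'].freeze

# Transliterate the data once: folded key => original, untouched text.
INDEX = TITLES.map { |title| [fold(title), title] }.to_h.freeze

# Transliterate the query the same way, match on the folded keys,
# and return the original text.
def search(query)
  key = fold(query)
  INDEX.select { |folded, _title| folded.include?(key) }.values
end

p search('chretien') == ["Chr\u00E9tien de Troyes"]            # prints true
p search('El-Aaiun') == search("El-Aai\u00FAn")                # prints true
```

Both spellings of the query land on the same folded key, while the text handed back keeps its accents.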

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/

