Gautam Rege wrote:
> Example,
> I have some data stored in database in ISO-10646-1 and ISO-8859-1
> encoding, i.e. Devanagari script (for the sake of discussion). I need to
> search on the contents.
>
> Can this be done in Sphinx? Any ideas?
>
> Would it be better if the database is in UTF-8 encoding and the localized
> data is then converted to UTF-8 and stored?
> If so, how can I index and search on a localized string?
I believe you *can* index single-byte encodings (like ISO-8859-1 and friends) by setting the charset_type option in sphinx.yml to "sbcs". This will index your content at the byte level; however, you'll need to set up a charset_table if your language is anything other than English or Russian.

If possible, storing your content in a UTF-8 database will make things significantly easier. TS uses UTF-8 by default, and all you'll need to do is set up a charset_table with the Unicode codepoints you want indexed.

For more info, have a read of the Sphinx docs (http://www.sphinxsearch.com/docs/current.html#charsets) and a blog post I wrote on using TS with Unicode (http://yob.id.au/blog/2008/05/08/thinking_sphinx_and_unicode/).

--
James Healy <jimmy-at-deefa-dot-com>
Sat, 01 Aug 2009 02:39:31 +1000

You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en
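For the UTF-8 route suggested above, a minimal sphinx.yml sketch might look like the fragment below. It assumes a Thinking Sphinx app that reads per-environment settings from config/sphinx.yml and content that is already stored as UTF-8. The Devanagari range is the Unicode Devanagari block (U+0900..U+097F); the exact table is illustrative and untested, so check it against the Sphinx charset docs linked above before relying on it. (Devanagari text cannot be represented in ISO-8859-1 at all, which is one more reason to convert to UTF-8 first.)

```yaml
# config/sphinx.yml -- illustrative values, repeat for each environment you use
development:
  # Index text as UTF-8 (already Thinking Sphinx's default, shown for clarity)
  charset_type: utf-8
  # ASCII digits and letters (uppercase folded to lowercase), underscore,
  # plus the whole Unicode Devanagari block U+0900..U+097F
  charset_table: "0..9, A..Z->a..z, _, a..z, U+900..U+97F"
```

Sphinx treats any character not listed in charset_table as a word separator, so including the whole block (not just the consonants) keeps vowel signs and the virama inside words instead of splitting on them. After changing sphinx.yml, you would regenerate the Sphinx configuration and reindex (e.g. with `rake thinking_sphinx:index`).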
