Thanks James, I will try out some stunts and post the result on the group.
Thanks,
Gautam

On Fri, Jul 31, 2009 at 10:14 PM, James Healy <[email protected]> wrote:
>
> Gautam Rege wrote:
> > Example,
> > I have some data stored in a database in ISO-10646-1 and ISO-8859-1
> > encoding, i.e. Devanagari script (for the sake of discussion). I need to
> > search on the contents.
> >
> > Can this be done in sphinx? Any ideas?
> >
> > Would it be better if the database is in UTF-8 encoding and the localized
> > data is then converted to UTF-8 and stored?
> > If so, how can I index and search on a localized string?
>
> I believe you *can* index single byte encodings (like ISO-8859-1 and
> friends) by setting the charset: option in sphinx.yml to "sbcs". This
> will index your content at a byte level, however you'll need to set up a
> charset_table if your language is anything other than English or
> Russian.
>
> If possible, storing your content in a utf-8 database will make things
> significantly easier. TS uses utf-8 by default and all you'll need to do
> is set up a charset_table with the Unicode codepoints you want indexed.
>
> For more info, have a read of the sphinx docs
> (http://www.sphinxsearch.com/docs/current.html#charsets) and a blog post
> I wrote on using TS with Unicode
> (http://yob.id.au/blog/2008/05/08/thinking_sphinx_and_unicode/).
>
> -- James Healy <jimmy-at-deefa-dot-com> Sat, 01 Aug 2009 02:39:31 +1000

--
~~~~~~~~~~~~~~~
All wiyht. Rho sritched mg kegtops awound?
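
For anyone reading this thread later, here is a rough sketch of what the UTF-8 setup James describes might look like in config/sphinx.yml. It is not taken from his message or his blog post. The key names charset_type and charset_table are assumed from the corresponding Sphinx option names (James refers to the first one as "charset:"), and the environment name is only an example, so check both against your Thinking Sphinx version. The range U+0900..U+097F is the Unicode Devanagari block.

```yaml
# config/sphinx.yml (illustrative sketch; key names assumed from Sphinx's
# charset_type / charset_table options, verify against your TS version)
production:
  charset_type: utf-8
  # Digits, ASCII letters (upper case folded to lower case), underscore,
  # plus the Unicode Devanagari block U+0900..U+097F.
  charset_table: "0..9, A..Z->a..z, _, a..z, U+0900..U+097F"
```

Characters that are not listed in charset_table are treated as separators by Sphinx, so any script you want to search on has to be listed explicitly. After changing the table, the index needs to be rebuilt before searches reflect it.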
