Gautam Rege wrote:
> Example,
>  I have some data stored in a database in ISO-10646-1 and ISO-8859-1
> encodings, i.e. Devanagari script (for the sake of discussion). I need to
> search on the contents.
> 
> Can this be done in sphinx? Any ideas?
> 
> Would it be better if the database used UTF-8 encoding and the localized
> data were converted to UTF-8 before being stored?
> If so, how can I index and search on a localized string?

I believe you *can* index single-byte encodings (like ISO-8859-1 and
friends) by setting the charset_type option in sphinx.yml to "sbcs". This
will index your content at the byte level; however, you'll need to set up a
charset_table if your language is anything other than English or
Russian.
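A minimal sketch of what that might look like, assuming the usual per-environment layout of config/sphinx.yml and a Thinking Sphinx version that passes charset_type and charset_table straight through to the Sphinx index config (the underlying Sphinx option is charset_type). The table is a hypothetical one for Latin-1 text, not a recommended default: it keeps digits, underscore and letters, and folds the accented Latin-1 capitals onto their lowercase forms.

```yaml
# config/sphinx.yml (hypothetical example; repeat for each environment)
development:
  # Treat the source data as single bytes instead of decoding UTF-8
  charset_type: sbcs
  # Digits, ASCII letters and underscore, plus Latin-1 letters:
  # capitals U+C0..U+D6 and U+D8..U+DE are folded to lowercase,
  # U+D7 (multiplication sign) and U+F7 (division sign) are left out
  charset_table: "0..9, A..Z->a..z, _, a..z, U+C0..U+D6->U+E0..U+F6, U+D8..U+DE->U+F8..U+FE, U+DF..U+F6, U+F8..U+FF"
```

In sbcs mode the U+xx values refer to byte values, which happen to line up with the codepoints for ISO-8859-1; for other single-byte encodings you'd need to work out the byte values yourself.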

If possible, storing your content in a UTF-8 database will make things
significantly easier. Thinking Sphinx (TS) uses UTF-8 by default, and all
you'll need to do is set up a charset_table with the Unicode codepoints you
want indexed.
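As a rough sketch, assuming the same config/sphinx.yml layout, a table for Hindi text might add the Unicode Devanagari block (U+0900..U+097F) to the usual ASCII entries. The exact table is an illustrative assumption; check it against the Sphinx charset docs for your version.

```yaml
# config/sphinx.yml (hypothetical example)
development:
  # Thinking Sphinx's default, shown explicitly
  charset_type: utf-8
  # Digits, ASCII letters and underscore, plus the whole
  # Unicode Devanagari block (U+0900..U+097F)
  charset_table: "0..9, A..Z->a..z, _, a..z, U+0900..U+097F"
```

Devanagari has no upper and lower case, so the block is listed as-is with no "->" mapping. You'll need to re-index after changing the table.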

For more info, have a read of the sphinx docs
(http://www.sphinxsearch.com/docs/current.html#charsets) and a blog post
I wrote on using TS with Unicode
(http://yob.id.au/blog/2008/05/08/thinking_sphinx_and_unicode/).

-- James Healy <jimmy-at-deefa-dot-com>  Sat, 01 Aug 2009 02:39:31 +1000

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en
-~----------~----~----~----~------~----~------~--~---
