Re: Best approach to multiple languages

Grant Ingersoll Wed, 22 Jul 2009 11:03:24 -0700

Typically, people take one of three approaches:

1. Put 'em all in one big field

2. Split fields (as you and others have described). I'm not sure why no one ever splits on documents (one document per language), which is viable too, but it comes with repeated data.

3. Split indexes
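To make the difference between options #2 and "splitting on documents" concrete, here is a minimal sketch that lays out one product both ways. It assumes products are plain dicts with the productname, category and per-language descriptions Andrew mentions; the sample values, the `language` field and the `<id>_<lang>` id scheme are hypothetical and only for illustration.

```python
from typing import Dict, List

# Hypothetical sample product (field names follow the original question).
product = {
    "id": "p1",
    "productname": "Widget",
    "category": "tools",
    "descriptions": {"en": "A handy widget", "de": "Ein praktisches Widget"},
}


def split_fields(p: Dict) -> Dict[str, str]:
    """Option #2: one document, one description_<lang> field per language."""
    doc = {"id": p["id"], "productname": p["productname"], "category": p["category"]}
    for lang, text in p["descriptions"].items():
        doc[f"description_{lang}"] = text
    return doc


def split_documents(p: Dict) -> List[Dict[str, str]]:
    """Splitting on documents: one document per language.

    productname and category are copied into every document, which is
    the repeated data this layout costs.
    """
    return [
        {
            "id": f"{p['id']}_{lang}",  # hypothetical id scheme: <id>_<lang>
            "productname": p["productname"],
            "category": p["category"],
            "language": lang,
            "description": text,
        }
        for lang, text in p["descriptions"].items()
    ]


print(split_fields(product))
for d in split_documents(product):
    print(d)
```

With split documents, a language filter on the hypothetical `language` field replaces the choice of field, at the cost of storing the shared fields once per language.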

For your case, #1 isn't going to work, since you want language-specific search. I'd likely go with #2, but #3 has its merits too. #3 lets you manage the languages separately: you can update the Spanish document without affecting the English version, and you can also take a whole language's collection offline without affecting the other indexes. That can sometimes be helpful, but the cost is more operational complexity.
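The isolation that option #3 buys can be sketched with in-memory stand-ins. This is not Lucene or Solr code: each "index" below is just a dict of id to document, standing in for a separate index or core (names such as products_es would be hypothetical), and the substring search is a placeholder for a real query.

```python
from typing import Dict, List, Set


class PerLanguageIndexes:
    """Option #3 sketch: one independent index per language."""

    def __init__(self) -> None:
        self.indexes: Dict[str, Dict[str, dict]] = {}
        self.offline: Set[str] = set()

    def update(self, lang: str, doc_id: str, doc: dict) -> None:
        # A write only ever touches the index for its own language.
        self.indexes.setdefault(lang, {})[doc_id] = doc

    def take_offline(self, lang: str) -> None:
        # Taking one language offline leaves the other indexes searchable.
        self.offline.add(lang)

    def search(self, lang: str, term: str) -> List[str]:
        # Naive substring match stands in for a real Lucene query.
        if lang in self.offline:
            return []
        index = self.indexes.get(lang, {})
        return [doc_id for doc_id, doc in index.items()
                if term.lower() in doc["description"].lower()]


idx = PerLanguageIndexes()
idx.update("en", "p1", {"description": "A handy widget"})
idx.update("es", "p1", {"description": "Un widget práctico"})
idx.update("es", "p1", {"description": "Un widget muy práctico"})  # English untouched
idx.take_offline("es")
print(idx.search("en", "widget"))  # ['p1']
print(idx.search("es", "widget"))  # []
```

The operational cost shows up as soon as this is real: every index needs its own configuration, deployment and monitoring.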


-Grant

On Jul 22, 2009, at 12:39 PM, Andrew McCombe wrote:

Hi

We will know the user's language choice before searching.

Regards
Andrew

2009/7/22 Grant Ingersoll <gsing...@apache.org>
How do you want to search those descriptions?  Do you know the query
language going in?


On Jul 22, 2009, at 6:12 AM, Andrew McCombe wrote:

Hi
We have a dataset that contains productname, category and descriptions. The descriptions can be in one or more different languages. What would be the recommended way of indexing these?
My initial thoughts are to index each description as a separate field and append the language identifier to the field name, for example, three fields: description_en, description_de and description_fr. Is this the best approach, or is there a better way?
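Because the user's language is known before searching, the per-language field naming above lets the client pick the field to query directly. The sketch below builds a query string in the standard Lucene `field:(terms)` syntax. The supported-language set, the fallback to English for unknown languages, and the simplified escaping of special characters are all assumptions for illustration, not part of any Solr API.

```python
SUPPORTED_LANGS = {"en", "de", "fr"}  # the three languages from the question
DEFAULT_LANG = "en"  # hypothetical fallback for unsupported languages

# Simplified set of characters that are special in the Lucene query syntax.
SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape(term: str) -> str:
    # Backslash-escape each special character so it is matched literally.
    return "".join("\\" + c if c in SPECIAL else c for c in term)


def description_field(lang: str) -> str:
    # Map the user's language onto description_<lang>, falling back if unknown.
    lang = lang.lower()
    return f"description_{lang if lang in SUPPORTED_LANGS else DEFAULT_LANG}"


def build_query(user_lang: str, user_text: str) -> str:
    terms = " ".join(escape(t) for t in user_text.split())
    return f"{description_field(user_lang)}:({terms})"


print(build_query("de", "rotes Kleid"))  # description_de:(rotes Kleid)
print(build_query("es", "C++ book"))     # description_en:(C\+\+ book)
```

Each description_<lang> field would normally get its own language-specific analyzer (stemming, stopwords) in the schema, which is the main reason to keep the languages in separate fields.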

Regards
Andrew McCombe
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
