Typically, people choose one of three options:

1. Put 'em all in one big field
2. Split fields (as you and others have described). I'm not sure why no one ever splits on documents (one document per language), which is viable too, but comes with repeated data
3. Split indexes

For your case, #1 isn't going to work, since you want language-specific search. I'd likely go with #2, but #3 has its merits too. #3 lets you manage the languages separately: you can update the Spanish document w/o affecting the English version, and you can take a whole collection offline w/o affecting the other indexes. That can sometimes be helpful, but the cost is more operational complexity.
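As a rough illustration of #2, here is a minimal sketch in plain Python. It does not call Lucene or Solr; the helper names (description_field, build_document, build_query) and the language list are hypothetical, and only the description_<lang> naming comes from the thread. The query string uses the usual field:(terms) form of the Lucene query syntax and does no escaping of the user's terms.

```python
# Hypothetical sketch of option #2 (split fields). Nothing here talks to
# Lucene/Solr; it only shows how field names and queries could be derived.

SUPPORTED_LANGUAGES = ("en", "de", "fr")  # assumed set of languages


def description_field(lang):
    """Return the per-language field name, e.g. 'description_en'."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    return f"description_{lang}"


def build_document(productname, category, descriptions):
    """Build one document with one description field per available language."""
    doc = {"productname": productname, "category": category}
    for lang, text in descriptions.items():
        doc[description_field(lang)] = text
    return doc


def build_query(user_lang, terms):
    """Restrict the search to the field for the user's known language."""
    # Note: terms are not escaped; a real client would need to do that.
    return f"{description_field(user_lang)}:({terms})"
```

Since the user's language is known before searching, build_query("de", "...") only ever touches description_de. With #3 the same language code would instead select which index to query, and the field could simply be called description in each index.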

-Grant

On Jul 22, 2009, at 12:39 PM, Andrew McCombe wrote:

Hi

We will know the user's language choice before searching.

Regards
Andrew

2009/7/22 Grant Ingersoll <gsing...@apache.org>

How do you want to search those descriptions?  Do you know the query
language going in?


On Jul 22, 2009, at 6:12 AM, Andrew McCombe wrote:

Hi

We have a dataset that contains productname, category and descriptions.
The descriptions can be in one or more different languages. What would be
the recommended way of indexing these?

My initial thoughts are to index each description as a separate field and
append the language identifier to the field name, for example, three fields
with description_en, description_de, description_fr. Is this the best
approach or is there a better way?

Regards
Andrew McCombe


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene:
http://www.lucidimagination.com/search


