Typically there are three options people use:
1. Put 'em all in one big field
2. Split Fields (as you and others have described) - not sure why no
one ever splits on documents, which is viable too, but comes with
repeated data
3. Split indexes
For your case, #1 isn't going to work since you want language-specific
search. I'd likely go with #2, but #3 has its merits too. #3 allows for
managing the languages separately (you can update the Spanish document
w/o affecting the English version, and you can also take one language's
whole collection offline w/o affecting the other indexes), which can
sometimes be helpful, but the cost is more operational complexity.
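To make the difference between #2 and #3 concrete, here is a rough Python sketch of how the user's known language could pick the search target at query time. The index names (products, products_<lang>), the language list, and the helper names are assumptions for illustration only. q and df (default field) are standard Solr query parameters, but check them against your version and schema.

```python
# Sketch only: index names, language list and helper names are illustrative assumptions.
SUPPORTED_LANGUAGES = ("en", "de", "fr", "es")


def check_language(lang):
    """Reject language codes we have no field or index for."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError("unsupported language: %r" % (lang,))
    return lang


def split_field_target(user_query, lang):
    """Option #2: one index, one description field per language."""
    lang = check_language(lang)
    return {
        "index": "products",
        "params": {"q": user_query, "df": "description_" + lang},
    }


def split_index_target(user_query, lang):
    """Option #3: one index per language, same field name in each."""
    lang = check_language(lang)
    return {
        "index": "products_" + lang,
        "params": {"q": user_query, "df": "description"},
    }
```

With #2 only the field name changes per request. With #3 the request goes to a different index, which is where the extra operational work comes from.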
-Grant
On Jul 22, 2009, at 12:39 PM, Andrew McCombe wrote:
Hi
We will know the user's language choice before searching.
Regards
Andrew
2009/7/22 Grant Ingersoll <gsing...@apache.org>
How do you want to search those descriptions? Do you know the query
language going in?
On Jul 22, 2009, at 6:12 AM, Andrew McCombe wrote:
Hi
We have a dataset that contains productname, category and descriptions.
The descriptions can be in one or more different languages. What would
be the recommended way of indexing these?
My initial thoughts are to index each description as a separate field
and append the language identifier to the field name, for example, three
fields with description_en, description_de, description_fr. Is this the
best approach or is there a better way?
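A minimal sketch of that field-per-language layout in Python, assuming each source record carries a descriptions mapping keyed by language code. The input shape, the function name, and the sample product are hypothetical; only the description_<lang> naming comes from the message above.

```python
def to_index_document(product):
    """Flatten a product record into one document with a description field per language."""
    doc = {
        "productname": product["productname"],
        "category": product["category"],
    }
    # Assumed input shape: {"en": "...", "de": "..."}; languages without text are skipped.
    for lang, text in sorted(product.get("descriptions", {}).items()):
        if text:
            doc["description_" + lang] = text
    return doc


# Made-up sample record, for illustration only.
example = to_index_document({
    "productname": "Trail Shoe",
    "category": "Footwear",
    "descriptions": {"en": "Lightweight trail shoe", "de": "Leichter Trailschuh"},
})
```

A product with no French text simply gets no description_fr field, so no placeholder values are needed.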
Regards
Andrew McCombe
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
Solr/Lucene:
http://www.lucidimagination.com/search