Actually, this is one of the biggest disadvantages of Solr for multilingual content. Solr is field based, which means you have to know the language _before_ you feed the content to a specific field, because the content is processed with that field's analyzer. This results in a separate field per language. E.g. for Europe that means 24 to 26 languages for each title, keyword, description, ...
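To make the field-per-language approach concrete, here is a minimal sketch of the indexing side. The field names (title_en, title_de, ...) and the helper function are hypothetical; they assume your schema defines one field per language, each with a matching analyzer:

```python
# Sketch: route content into language-suffixed fields before indexing.
# Field names like title_de are assumptions about your schema, not a
# fixed Solr convention -- the point is only that the language must be
# known up front so the text lands in a field with the right analyzer.

def to_solr_doc(doc_id, language, title, description):
    """Map generic fields to language-suffixed Solr fields."""
    return {
        "id": doc_id,
        "language": language,
        f"title_{language}": title,
        f"description_{language}": description,
    }

doc = to_solr_doc("42", "de", "Hallo Welt", "Ein Beispiel")
# doc now carries title_de / description_de, ready to post to Solr
```

The obvious downside, as noted above, is that every searchable field multiplies by the number of languages you support.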
I guess when they started with Lucene/Solr they never had multilingual content in mind.

One alternative is to have a separate index for each language. There, too, you have to know the language of the content _before_ feeding it to the core. E.g. again for Europe you end up with 24 to 26 cores.

Another option is to treat the multilingual fields (title, keywords, description, ...) as a "subdocument": write a filter class as a subpipeline, run language and encoding detection as the first step of that pipeline, then apply all further linguistic processing within the pipeline and return the processed content to the field for further filtering and storing.

Many solutions, but nothing out of the box :-)

Bernd

On 22.09.2010 12:01, Andy wrote:
> I have documents that are in different languages. There's a field in the
> documents specifying what language it's in.
>
> Is it possible to index the documents such that based on what language a
> document is in, a different analyzer will be used on that document?
>
> What is the "normal" way to handle documents in different languages?
>
> Thanks
> Andy
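P.S. A rough sketch of the detect-then-process idea, to show the shape of such a subpipeline. The stopword-hint detector below is a toy stand-in; a real setup would use a proper language identifier (e.g. Tika's), and the per-language processing would be real stemming/stopword filtering rather than this placeholder:

```python
# Sketch of a "subdocument" pipeline: detect the language first, then
# run language-specific processing before the content reaches the field.
# STOPWORD_HINTS and detect_language are toy placeholders, not a real
# language identifier.

STOPWORD_HINTS = {
    "en": {"the", "and", "is"},
    "de": {"der", "und", "ist"},
}

def detect_language(text):
    """Toy detector: pick the language whose hint words match most."""
    words = set(text.lower().split())
    best, hits = "en", 0
    for lang, hints in STOPWORD_HINTS.items():
        n = len(words & hints)
        if n > hits:
            best, hits = lang, n
    return best

def process_field(text):
    """First pipeline step: detect language, then filter accordingly."""
    lang = detect_language(text)
    # real linguistic processing (stemming, decompounding, ...) goes here
    tokens = [t for t in text.lower().split()
              if t not in STOPWORD_HINTS.get(lang, set())]
    return lang, tokens

lang, tokens = process_field("der Hund und die Katze")
```

The essential point is only the ordering: detection runs first, and everything downstream in the pipeline branches on its result.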