On Dec 18, 2008, at 6:25 AM, Sujatha Arun wrote:

Hi,
I am prototyping language search using Solr 1.3. I have 3 fields in the
schema: id, content and language.

I am indexing 3 PDF files; the languages are Foroyo, Chinese and Japanese.

I use xpdf to convert the content of each PDF to text and push the text to Solr
in the content field.

What analyzer do I need to use for the above?

When I use the default text analyzer and post this content to Solr, I do
not get any results.

Does Solr support stemming for the above languages?

I'm not familiar with Foroyo, but there should be tokenizers/analysis available for Chinese and Japanese. Are you putting all three languages into the same field? If that is the case, you will need some type of language detection step that can choose the correct analyzer.
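
As a rough illustration of the language detection idea, here is a minimal Python sketch that guesses the language from Unicode script ranges and routes the text to a per-language field. This is only a heuristic, not a Solr feature: the function names and the field names content_ja, content_zh and content_other are hypothetical, and each such field would still need a suitable analyzer configured in the schema. It also cannot tell Foroyo apart from any other non-CJK text.

```python
def guess_language(text):
    """Crude script-based guess: 'ja', 'zh' or 'other' (heuristic only)."""
    has_han = False
    for ch in text:
        cp = ord(ch)
        # Hiragana (U+3040-U+309F) or Katakana (U+30A0-U+30FF) implies Japanese.
        if 0x3040 <= cp <= 0x30FF:
            return "ja"
        # CJK Unified Ideographs appear in both Chinese and Japanese text.
        if 0x4E00 <= cp <= 0x9FFF:
            has_han = True
    # Ideographs without any kana: assume Chinese.
    return "zh" if has_han else "other"


def to_solr_doc(doc_id, text):
    """Build a document dict with the text in a per-language field.

    The 'content_<lang>' field names are assumptions for illustration.
    """
    lang = guess_language(text)
    return {"id": doc_id, "language": lang, "content_" + lang: text}
```

A Japanese document made up only of kanji would be misclassified as Chinese by this sketch, so a real language identification library is likely a better choice for production use.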

How are your users searching? That is, do you know the language they want to search in? If so, then you can have a field for each language.
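
If the search language is known at query time, the query can simply be pointed at the matching field. The Python sketch below builds the parameters for such a request; the content_<lang> field names and the helper name are assumptions for illustration, not something Solr defines for you.

```python
# Hypothetical mapping from a user's chosen language to a schema field.
LANGUAGE_FIELDS = {
    "ja": "content_ja",
    "zh": "content_zh",
}
DEFAULT_FIELD = "content_other"  # assumed fallback field


def build_select_params(user_lang, terms):
    """Return query parameters that search only the field for user_lang."""
    field = LANGUAGE_FIELDS.get(user_lang, DEFAULT_FIELD)
    # Escape backslashes and double quotes so the phrase stays well-formed.
    escaped = terms.replace("\\", "\\\\").replace('"', '\\"')
    return {"q": '%s:"%s"' % (field, escaped)}
```

The resulting dict can be URL-encoded (for example with urllib.parse.urlencode) and sent to the select handler.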

-Grant
