On Dec 18, 2008, at 6:25 AM, Sujatha Arun wrote:
Hi,
I am prototyping language search using Solr 1.3. I have 3 fields in the
schema: id, content and language.
I am indexing 3 PDF files; the languages are Foroyo, Chinese and
Japanese.
I use xpdf to convert the content of each PDF to text and push the text
to Solr in the content field.
Which analyzer do I need to use for the above?
When I use the default text analyzer and post this content to Solr, I am
not getting any results.
Does Solr support stemming for the above languages?
I'm not familiar with Foroyo, but there should be tokenizers/analysis
available for Chinese and Japanese. Are you putting all three
languages into the same field? If that is the case, you will need
some type of language detection piece that can choose the correct
analyzer.
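As a rough illustration of what such a detection piece might look like, here
is a minimal Python sketch that guesses the language from Unicode character
ranges. The function name guess_language and the labels "ja", "zh" and
"other" are hypothetical and not part of Solr. The heuristic is deliberately
naive: any kana means Japanese, Han ideographs without kana mean Chinese, and
everything else is "other", so Japanese text written only in kanji would be
misclassified as Chinese.

```python
def guess_language(text):
    """Naive script-based guess: returns 'ja', 'zh' or 'other' (hypothetical labels)."""
    has_han = False
    for ch in text:
        cp = ord(ch)
        # Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF): treat as Japanese.
        if 0x3040 <= cp <= 0x30FF:
            return "ja"
        # CJK Unified Ideographs (U+4E00-U+9FFF) are shared by Chinese and Japanese.
        if 0x4E00 <= cp <= 0x9FFF:
            has_han = True
    # Han characters with no kana: assume Chinese. No CJK at all: something else.
    return "zh" if has_han else "other"
```

The returned label could then be stored in the language field and used to
pick the analyzer or target field at index time.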
How are your users searching? That is, do you know the language they
want to search in? If so, then you can have a field for each language.
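A minimal Python sketch of the field-per-language idea, assuming the language
of each document and of each query is already known. The field names
content_zh and content_ja are hypothetical; they would have to be declared in
schema.xml with a suitable analyzer for each language. The helper names are
also made up for illustration.

```python
# Hypothetical mapping from language code to a language-specific Solr field.
LANGUAGE_FIELDS = {"zh": "content_zh", "ja": "content_ja"}
DEFAULT_FIELD = "content"  # fallback for languages without a dedicated field


def field_for(language):
    """Return the field that holds text in the given language."""
    return LANGUAGE_FIELDS.get(language, DEFAULT_FIELD)


def build_document(doc_id, text, language):
    """Build a document dict whose text goes into the language-specific field."""
    return {"id": doc_id, "language": language, field_for(language): text}


def build_query(user_terms, language):
    """Restrict the query to the field for the user's language.

    Note: user_terms is not escaped here; real code should escape
    query-syntax special characters first.
    """
    return "%s:(%s)" % (field_for(language), user_terms)
```

For example, build_query("東京", "ja") produces the query string
content_ja:(東京), so the terms are analyzed with whatever analyzer is
configured for that field.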
-Grant