Hi,

In the Solr 1.3 download, under the folder
src/java/org/apache/solr/analysis

I find the following tokenizer classes for languages other than
English:

1. Chinese tokenizer
2. CJK tokenizer, which is not expected to work very well with Japanese;
   for Chinese we already have the Chinese tokenizer

Only these two tokenizers are there for specific languages.

I also see stem filter factories and plain filter factories for some
languages, such as
DutchStemFilterFactory, BrazilianStemFilterFactory,
GermanStemFilterFactory, etc.,

and plain filters like ChineseFilterFactory.

What does a stem filter factory do? Does it stem the words without
including the Snowball Porter filter factory?

What do the simple filter factories do?

Where do I look for analyzers for other languages, and for information
on which languages I can use the standard analyzer with?

For example, given only the classes above: for German language analysis,
am I to use the standard analyzer with the German filter factory and
German stemmers?
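
To make the question concrete, this is roughly the schema.xml field type I
have in mind. It is only a sketch and may well be wrong: the name "text_de"
is made up, and I am assuming that StandardTokenizerFactory and
LowerCaseFilterFactory are the right companions for GermanStemFilterFactory.

```xml
<!-- Hypothetical field type; "text_de" is an illustrative name only -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Assumption: the standard tokenizer is adequate for German text -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase tokens before stemming -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The German stem filter from the analysis folder mentioned above -->
    <filter class="solr.GermanStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Is this the intended way to combine them, or should the Snowball Porter
filter factory be used in place of GermanStemFilterFactory?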

Are there more language-specific tokenizers in Lucene, and if so, what are
the steps to integrate them into Solr?

Regards
Revas
