Hi,

In the Solr 1.3 download, under the folder src/java/org/apache/solr/analysis,
I find the following tokenizer classes for languages other than English:

1. Chinese tokenizer
2. CJK tokenizer, which is not expected to work very well with Japanese (for Chinese we already have the Chinese tokenizer)

Only these two tokenizers are there for other languages.

I also see stem filter factories and plain filter factories for some languages, such as DutchStemFilterFactory, BrazilianStemFilterFactory.java, and GermanStemFilterFactory, and plain filters such as ChineseFilterFactory.java.

What does a stem filter factory do? Does it stem the words without my having to include the Snowball Porter filter factory? And what do the plain filter factories do?

Where do I look for analyzers for other languages, and for information on which languages I can use the standard analyzer with? For example, given only the classes above, for German language analysis am I to use the standard analyzer with the German filter factory and German stemmers?

Are there more language-specific tokenizers in Lucene, and if so, what are the steps to integrate them into Solr?

Regards,
Revas