Hi all,

I'm using the CJKTokenizerFactory tokenizer to handle text that contains both Japanese and alphabetic words. However, I noticed that CJKTokenizerFactory converts alphabetic characters to lowercase, so I cannot use the WordDelimiterFilterFactory filter with the splitOnCaseChange property to split camel-case words.
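To make the problem concrete, here is a rough sketch of the kind of analyzer chain in question. The field type name text_cjk_camel is made up for illustration, and the WordDelimiterFilterFactory attributes shown are only one plausible setup, not a tested configuration:

```xml
<!-- Hypothetical field type; the name "text_cjk_camel" is an assumption for this example -->
<fieldType name="text_cjk_camel" class="solr.TextField">
  <analyzer>
    <!-- CJKTokenizerFactory emits alphabetic tokens already lowercased -->
    <tokenizer class="solr.CJKTokenizerFactory" />
    <!-- splitOnCaseChange needs the original case, which is gone by this point -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            splitOnCaseChange="1" />
  </analyzer>
</fieldType>
```

With a chain like this, a word such as "camelCase" reaches the filter as "camelcase", so there is no case boundary left for splitOnCaseChange to split on.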
I switched to NGramTokenizerFactory (2-gram), but it only parses the first 1024 characters, so I cannot use NGramTokenizerFactory either.

I tried the following two settings, and both seem to work fine, but I don't know whether they are good choices or whether there are better solutions.

1)
    <tokenizer class="solr.CJKTokenizerFactory" />
    <filter class="solr.NGramFilterFactory" maxGramSize="2" minGramSize="2" />

2)
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.NGramFilterFactory" maxGramSize="1" minGramSize="1" />

Any advice would be appreciated.

Thank you.
Tiffany