Hi Solr users,

This is my first post to this list; I have been experimenting with Solr for a few days, so please bear with me.
I am trying to set up a text field for searching CJK text. At the moment I am trying the n-gram tokenizer factory, defined in schema.xml as follows:

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory"/>
        <!-- <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/> -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.NGramTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="variants.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

I can test this in the administrative interface, and it seems to work. However, when I run actual searches, I only get matches for single-character queries or for queries that match the complete contents of a text field.

What I am trying to achieve is substring matching: a query should match any sequence of characters anywhere in the target field.

Any help appreciated,
Christian

--
Christian Wittern, Kyoto
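P.S. I left the gram sizes at their defaults. As far as I can tell, NGramTokenizerFactory defaults to minGramSize="1" and maxGramSize="2", so each tokenizer line in the field type should be equivalent to this explicit (untested) form, which makes it clearer that only one- and two-character grams are indexed:

    <!-- Same tokenizer with the assumed default gram sizes spelled out;
         intended for both the index and the query analyzer. -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="2"/>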