Hi Solr users,

This is my first posting to this list, after experimenting with Solr
for a few days.  Please bear with me.

I am trying to set up a text field for searching CJK text.  At the
moment, I am trying to use the n-gram tokenizer factory, defined in
schema.xml as follows:

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory"/>
        <!-- <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/> -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.NGramTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="variants.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

I can test this in the administrative interface and it seems to work.
However, when I run actual searches, I only get matches for
single-character queries, or for queries that match a complete text
field.  What I am trying to achieve is a substring match: a query
should match any sequence of characters in the target field.
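
In case it helps to pin down what I mean, here is a rough sketch of a
variant with the gram sizes spelled out.  As far as I understand, the
factory accepts minGramSize and maxGramSize attributes; the field type
name "text_ngram" and the values 1 and 2 are only my guesses for
illustration, and I have not verified that this gives the substring
behaviour I am after:

    <!-- Sketch only: "text_ngram" is a made-up name, and the gram
         sizes are illustrative guesses, not tested settings. -->
    <fieldtype name="text_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1" maxGramSize="2"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

With settings like these I would expect a field value to be indexed as
its one- and two-character grams, so that a longer query could match
through its overlapping grams.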

Any help appreciated,

Christian



-- 
Christian Wittern, Kyoto
