Re: Chinese chars are not indexed ?

go canal Mon, 28 Jun 2010 00:28:40 -0700

Oh yes, *...* works. Thanks.

I see that the tokenizer is defined in schema.xml, in a few different places. 
I am wondering whether it is enough to change only the one marked below:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- Is this the only one I need to modify? -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English"
                protected="protwords.txt"
                />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt"
                ignoreCase="true"
                expand="true"
                />
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English"
                protected="protwords.txt"
                />
      </analyzer>
</fieldType>

Thanks,
canal




________________________________
From: Ahmet Arslan <iori...@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Mon, June 28, 2010 2:54:16 PM
Subject: Re: Chinese chars are not indexed ?

> I am using the sample, not deploying Solr in Tomcat. Is
> there a place I can modify this setting ?


Ha, okay, if you are using Jetty with java -jar start.jar then that is fine.
But for Chinese you need a special tokenizer, since Chinese is written without 
spaces between words.

<tokenizer class="solr.CJKTokenizerFactory"/>


Or you can search with both a leading and a trailing star. q=*ChineseText* 
should return something.
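
As a rough sketch of how the suggested tokenizer might be wired in: the snippet 
below declares a separate field type with a single analyzer, so the same 
tokenizer is applied at both index and query time. The name text_cjk is 
hypothetical and not part of the sample schema.xml, and the English-oriented 
filters from the text type are left out on the assumption that they add little 
for Chinese text.

```xml
<!-- Hypothetical field type name, chosen for illustration only -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <!-- An analyzer without a type attribute is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
```

If the index and query analyzers used different tokenizers, the query terms 
would likely not line up with the indexed terms. Existing documents would also 
need to be reindexed after such a change, since their index entries were 
produced by the old tokenizer.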
