I notice this is in the future tense. Is the CJKTokenizer available yet? From what I can see, the CJK code should be a Filter instead anyway. Also, the ChineseFilter and CJKTokenizer do two different things.
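To make the difference concrete, here is a rough Python sketch of the two behaviours. It is not the Lucene code: the function names are made up for illustration, and it only mimics the outputs described in this message (overlapping two-character tokens versus one token per character). How a lone single character is handled is an assumption on my part.

```python
def cjk_bigrams(text):
    """Overlapping two-character tokens, as described for CJKTokenizer.

    Hypothetical helper, not a Lucene API.
    """
    if len(text) < 2:
        # Assumption: a lone character is emitted as a single token.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def chinese_unigrams(text):
    """One token per character, as described for ChineseFilter.

    Hypothetical helper, not a Lucene API.
    """
    return list(text)


# C1C2C3C4 -> 'C1C2 C2C3 C3C4'
print(" ".join(cjk_bigrams("中华人民")))

# C1C2 -> 'C1 C2'
print(" ".join(chinese_unigrams("中华")))
```

With bigrams, a two-character query only matches where both characters are adjacent in the text; with unigrams, each character is indexed separately, so adjacency has to come from a phrase query.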
CJKTokenizer turns C1C2C3C4 into 'C1C2 C2C3 C3C4'. ChineseFilter (from 2001) turns C1C2 into 'C1 C2'. I hope someone who speaks Mandarin or Cantonese understands what this should do.

Lance

-----Original Message-----
From: Eswar K [mailto:[EMAIL PROTECTED]
Sent: Monday, November 26, 2007 10:28 AM
To: solr-user@lucene.apache.org
Subject: Re: CJK Analyzers for Solr

Hoss,

Thanks a lot. Will look into it.

Regards,
Eswar

On Nov 26, 2007 11:55 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote:

> : Does Solr come with Language analyzers for CJK? If not, can you please
> : direct me to some good CJK analyzers?
>
> Lucene has a CJKTokenizer and CJKAnalyzer in the contrib/analyzers jar.
> they can be used in Solr. both have been included in Solr for a while
> now, so you can specify CJKAnalyzer in your schema with Solr 1.2, but
> starting with Solr 1.3 a Factory for the Tokenizer will also be
> included so it can be used in a more complex analysis chain defined in
> the schema.
>
> -Hoss