RE: CJK Analyzers for Solr

Norskog, Lance Mon, 26 Nov 2007 12:31:59 -0800

I notice this is in the future tense. Is the CJKTokenizer available yet?
From what I can see, the CJK code should be a Filter instead anyway.
Also, the ChineseFilter and CJKTokenizer do two different things.


CJKTokenizer turns C1C2C3C4 into 'C1C2 C2C3 C3C4'. ChineseFilter (from
2001) turns C1C2 into 'C1 C2'. I hope someone who speaks Mandarin or
Cantonese understands what this should do.
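
To make the difference concrete, here is a minimal sketch in plain
Python. It is not the Lucene code; the function names cjk_bigrams and
cjk_unigrams are made up for illustration, and the handling of a lone
character in the bigram case is an assumption.

```python
def cjk_bigrams(text):
    """Overlapping two-character tokens: C1C2C3C4 -> C1C2, C2C3, C3C4."""
    if len(text) < 2:
        # Assumption: a lone character is emitted as a single token.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def cjk_unigrams(text):
    """One token per character: C1C2 -> C1, C2."""
    return list(text)


if __name__ == "__main__":
    sample = "中文分词"
    print(" ".join(cjk_bigrams(sample)))       # 中文 文分 分词
    print(" ".join(cjk_unigrams(sample[:2])))  # 中 文
```

The bigram form produces more terms per document, since every interior
character appears in two tokens.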

Lance

-----Original Message-----
From: Eswar K [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 26, 2007 10:28 AM
To: solr-user@lucene.apache.org
Subject: Re: CJK Analyzers for Solr

Hoss,

Thanks a lot. Will look into it.

Regards,
Eswar

On Nov 26, 2007 11:55 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote:

>
> : Does Solr come with Language analyzers for CJK? If not, can you please
> : direct me to some good CJK analyzers?
>
> Lucene has a CJKTokenizer and CJKAnalyzer in the contrib/analyzers jar.
> They can be used in Solr. Both have been included in Solr for a while
> now, so you can specify CJKAnalyzer in your schema with Solr 1.2, but
> starting with Solr 1.3 a Factory for the Tokenizer will also be
> included so it can be used in a more complex analysis chain defined in
> the schema.
>
>
>
> -Hoss
>
>
