Hao Wu wrote on 2/17/17 4:44 PM:
Hi all,

I use the StandardTokenizer. Searching by English words works, but
searching in Chinese gives me strange results.

my $tokenizer = Lucy::Analysis::StandardTokenizer->new;
my $raw_type  = Lucy::Plan::FullTextType->new(
    analyzer => $tokenizer,
);

I was also going to use the EasyAnalyzer (
https://metacpan.org/pod/distribution/Lucy/lib/Lucy/Analysis/EasyAnalyzer.pod
), but Chinese is not supported.

What is the simplest way to use Lucy with Chinese documents? Thanks.

There is currently no equivalent of
https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/cjk/CJKTokenizer.html
within core Lucy.

Furthermore, there is no automatic language detection in Lucy. You'll note in https://metacpan.org/pod/distribution/Lucy/lib/Lucy/Analysis/EasyAnalyzer.pod that the language must be explicitly specified, and that is for the stemming analyzer. Also, Chinese is not among the supported languages listed.

Maybe something wrapped around https://metacpan.org/pod/Lingua::CJK::Tokenizer would work as a custom analyzer.

You can see an example of subclassing Analyzer in the documentation here:
https://metacpan.org/pod/Lucy::Analysis::Analyzer#new
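As a lighter-weight workaround that stays within core Lucy, you could also try RegexTokenizer with a pattern that emits each Han ideograph as its own token. This is a rough sketch of crude unigram indexing, not the overlapping bigrams Lucene's CJKTokenizer produces, so expect more false-positive matches; the pattern and field name here are just illustrative assumptions:

```perl
use strict;
use warnings;
use Lucy::Analysis::RegexTokenizer;
use Lucy::Plan::FullTextType;

# Each token is either a single Han ideograph or a run of ASCII
# word characters. (Deliberately not \w+ for the second branch,
# since Perl's \w also matches Han characters and would swallow
# mixed Chinese/Latin runs into one token.)
my $cjk_tokenizer = Lucy::Analysis::RegexTokenizer->new(
    pattern => '\p{Han}|[a-zA-Z0-9]+',
);

# Hypothetical field type using the unigram tokenizer.
my $cjk_type = Lucy::Plan::FullTextType->new(
    analyzer => $cjk_tokenizer,
);
```

Because both indexing and query parsing run the same analyzer, a Chinese query string gets split into the same single-character tokens, so phrase and term matches line up.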



--
Peter Karman  .  https://peknet.com/  .  https://keybase.io/peterkarman
