Hao Wu wrote on 2/17/17 4:44 PM:
Hi all,

I use the StandardTokenizer. Searching for English words works, but
searching in Chinese gives me strange results.

my $tokenizer = Lucy::Analysis::StandardTokenizer->new;
my $raw_type = Lucy::Plan::FullTextType->new(
    analyzer => $tokenizer,
);

Also, I was going to use the EasyAnalyzer, but Chinese is not supported.

What is the simplest way to use Lucy with Chinese documents? Thanks.

There is currently no equivalent of
within core Lucy.

Furthermore, there is no automatic language detection in Lucy. You'll note in https://metacpan.org/pod/distribution/Lucy/lib/Lucy/Analysis/EasyAnalyzer.pod that the language must be explicitly specified, and that is for the stemming analyzer. Also, Chinese is not among the supported languages listed.

Maybe something wrapped around https://metacpan.org/pod/Lingua::CJK::Tokenizer would work as a custom analyzer.

You can see an example in the documentation here
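
To give a rough idea of what the tokenizing step of such a custom analyzer has to do, here is a minimal pure-Perl sketch. It is only an assumption about one workable approach, not Lucy's or Lingua::CJK::Tokenizer's actual behavior: the helper name cjk_unigram_tokens is hypothetical, and it simply emits every Han character as its own token while keeping runs of other word characters together. Wiring the resulting tokens into a Lucy analyzer subclass is not shown.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not part of Lucy or Lingua::CJK::Tokenizer).
# Splits text into tokens: each Han character becomes its own token,
# and runs of non-Han word characters stay together as one token.
# Returns a list of [ text, start_offset, end_offset ] array refs,
# with offsets counted in characters.
sub cjk_unigram_tokens {
    my ($text) = @_;
    my @tokens;
    while ( $text =~ /(\p{Han}|[^\W\p{Han}]+)/g ) {
        push @tokens, [ $1, $-[1], $+[1] ];
    }
    return @tokens;
}

binmode STDOUT, ':encoding(UTF-8)';
print join( '|', map { $_->[0] } cjk_unigram_tokens('Lucy 搜索引擎') ), "\n";
```

Single-character tokens are the crudest option; they make every Chinese character searchable, at the cost of less precise matches than a real word segmenter or n-gram tokenizer would give.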

Peter Karman  .  https://peknet.com/  .  https://keybase.io/peterkarman
