Hi Vishal,

When searching Chinese, you have to use a proper analyzer that splits the Chinese 
characters into separate, space-delimited tokens. For example, in the case you mentioned:

Original query text: (第二期)
Parsed query text: ( 第 二 期 )

An English analogy makes this easier to understand:
Original query text: (TheSecondTerm)
Parsed query text: ( The Second Term )

As you can see, TheSecondTerm is not indexed as a single word and cannot be matched, and neither can (第二期).

With such an analyzer, you get the right results, because Lucene has indexed each 
Chinese character as a word (in the English sense) and matching works perfectly.
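As a concrete starting point, here is a minimal sketch of a CJK-friendly field type, modeled on the text_cjk field type that ships with Solr's default configsets. The field name "test" is taken from your message; the rest is an assumption about your schema, so adjust names to fit:

  <!-- StandardTokenizer emits each Chinese character as its own token;
       CJKBigramFilter then joins adjacent CJK characters into bigrams
       for better matching precision. -->
  <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- Normalizes full-width and half-width character forms -->
      <filter class="solr.CJKWidthFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.CJKBigramFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="test" type="text_cjk" indexed="true" stored="true"/>

After switching the field type you must reindex; then a query like test:(第二期) should match. You can check exactly how text is tokenized on the Analysis screen of the Solr Admin UI.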

Regards,

Shi Jinghai

________________________________
From: Vishal Patel <vishalpatel199...@outlook.com>
Sent: May 17, 2024 12:33
To: users@solr.apache.org <users@solr.apache.org>
Subject: solr search is not working when we use `WhitespaceTokenizerFactory` and 
want to search Chinese characters

I have a Solr schema configuration for the field "test" of type "text", for which 
we have used solr.WhitespaceTokenizerFactory.
Data gets indexed in Solr for this "test" field, but when we try to search it, I am 
not able to get results using Solr queries like the ones below:

field:((第二期)) /  field:((第二期)*)

For example: the input value (第二期) gets indexed, but when we try to search for it, 
we get no results.
I have tried solr.ShingleFilterFactory in the query analyzer, but it does not help 
here.
Any suggestions to fix this?
Thanks in advance !

Regards,
Vishal Patel
