Re: Is it possible to specify only one-character term synonym for 2-gram tokenizer?

Emir Arnautovic Thu, 22 Oct 2015 02:09:13 -0700

Hi Scott,

I don't have experience with Chinese, but SynonymFilter works on tokens, so if CJKTokenizer recognizes C1 and Cm as tokens, it should work. If not, then you can try configuring PatternReplaceCharFilter to replace C1 with C2 during both indexing and searching and get a match.
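
To illustrate the second suggestion, here is a minimal Python sketch. It is not Solr code: normalize, bigrams and matches are hypothetical stand-ins for a character filter, a 2-gram tokenizer and a query match. It assumes the tokenizer emits overlapping two-character tokens, as the "2-gram" in the subject suggests. Under that assumption a single character never appears as a token on its own, so the replacement has to happen before tokenizing.

```python
# Hypothetical stand-ins for illustration only; none of these are Solr/Lucene APIs.
VARIANTS = {"台": "臺"}  # replace C1 with C2, as suggested above


def normalize(text):
    """Mimic a character filter: rewrite variant characters before tokenizing."""
    return "".join(VARIANTS.get(ch, ch) for ch in text)


def bigrams(text):
    """Mimic a 2-gram tokenizer: emit overlapping two-character tokens."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def matches(query, document, use_char_filter):
    """Return True if every query token is also a token of the document."""
    prep = normalize if use_char_filter else (lambda s: s)
    doc_tokens = set(bigrams(prep(document)))
    return all(tok in doc_tokens for tok in bigrams(prep(query)))


if __name__ == "__main__":
    print(bigrams("臺北市"))
    print(matches("台北", "臺北市", use_char_filter=False))
    print(matches("台北", "臺北市", use_char_filter=True))
```

In this sketch the same replacement is applied to both the document and the query, which corresponds to configuring the char filter for indexing and for searching.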


Thanks,
Emir

On 22.10.2015 10:53, Scott Chu wrote:

Hi solr-user,
I always use CJKTokenizer on a fair amount of Chinese news articles. Say that in Chinese, character C1 has the same meaning as character C2 (e.g. 台=臺). Is it possible to add only this line to synonym.txt:
C1,C2 (and in a true example: 台, 臺)
so that, by applying CJKTokenizer and SynonymFilter, I only have to query "C1Cm..." (say Cm is an arbitrary Chinese character) and Solr will return documents that match either "C1Cm" or "C2Cm"?
Scott Chu, scott....@udngroup.com
2015/10/22


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
