Hi Scott,
I don't have experience with Chinese, but SynonymFilter works on tokens,
so if CJKTokenizer emits C1 and Cm as separate tokens, it should work. If
not, then you can try configuring PatternReplaceCharFilter to replace C1
with C2 during both indexing and searching, so that either form matches.
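
To illustrate the char filter idea, here is a rough Python sketch. It is not Solr code: `char_filter`, `bigrams` and `analyze` are hypothetical stand-ins, and it assumes CJKTokenizer emits overlapping two-character tokens for CJK text, which is my understanding of its behavior. Under that assumption a single-character synonym entry never sees a token equal to 台 or 臺 alone, while replacing the character before tokenizing makes both spellings produce the same tokens.

```python
# Hypothetical simulation of the analysis chain; none of these names are Solr APIs.

# Variant -> canonical character, applied before tokenizing (the char filter step).
VARIANTS = {"臺": "台"}


def char_filter(text):
    """Replace each variant character with its canonical form."""
    return "".join(VARIANTS.get(ch, ch) for ch in text)


def bigrams(text):
    """Overlapping two-character tokens (assumed CJKTokenizer-like output)."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def analyze(text):
    """Char filter first, then tokenize; use the same chain at index and query time."""
    return bigrams(char_filter(text))


indexed_tokens = analyze("臺北")  # document text written with C2
query_tokens = analyze("台北")    # query written with C1

# Without the char filter the two spellings give different bigram tokens.
raw_tokens_differ = bigrams("臺北") != bigrams("台北")
```

With the same replacement applied on both the index and query side, `indexed_tokens` and `query_tokens` come out identical, so the query matches regardless of which variant the document used.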
Thanks,
Emir
On 22.10.2015 10:53, Scott Chu wrote:
Hi solr-user,
I always use CJKTokenizer on a fair amount of Chinese news
articles. Say that in Chinese, character C1 has the same meaning as
character C2 (e.g. 台=臺). Is it possible to add only this line to
synonym.txt:
C1,C2 (and in a real example: 台, 臺)
so that, by applying CJKTokenizer and SynonymFilter, I only have to query
"C1Cm..." (where Cm is an arbitrary Chinese character) and Solr will
return documents that match either "C1Cm" or "C2Cm"?
Scott Chu, scott....@udngroup.com
2015/10/22
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/