Hi Scott,
I don't have experience with Chinese, but SynonymFilter works on tokens, so if CJKTokenizer emits C1 and Cm as separate tokens, it should work. If not, then you can try configuring PatternReplaceCharFilter to replace C1 with C2 during both indexing and searching, so that either form produces a match.
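To illustrate the "if not" case, here is a minimal, self-contained Java sketch. It does not use Lucene or Solr classes. It assumes CJKTokenizer splits a run of CJK text into overlapping two-character tokens (a lone character becomes a single token). All method names (bigrams, expand, charFilter, synonymMatch, charFilterMatch) are hypothetical stand-ins. The sketch shows why a token-level rule like 台,臺 may not fire on a bigram token such as 臺北, while a character-level replacement applied before tokenization normalizes both spellings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CjkSynonymSketch {
    // Hypothetical synonym rule "台,臺", applied token by token.
    static final Map<String, String> SYNONYMS = Map.of("台", "臺", "臺", "台");

    // Stand-in for CJK bigram tokenization (assumed behavior, not Lucene code):
    // overlapping two-character tokens; a lone character is emitted as itself.
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            tokens.add(text);
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    // Token-level synonym expansion: a token is expanded only if the WHOLE
    // token equals one side of the rule, so the bigram "臺北" is left alone.
    static Set<String> expand(List<String> tokens) {
        Set<String> terms = new HashSet<>(tokens);
        for (String t : tokens) {
            String syn = SYNONYMS.get(t);
            if (syn != null) {
                terms.add(syn);
            }
        }
        return terms;
    }

    // Stand-in for a char filter: rewrites 臺 to 台 before tokenization.
    static String charFilter(String text) {
        return text.replace("臺", "台");
    }

    // Index-time synonyms: the document matches if it contains every query token.
    static boolean synonymMatch(String doc, String query) {
        return expand(bigrams(doc)).containsAll(bigrams(query));
    }

    // Char-filter approach: both sides are normalized before tokenizing.
    static boolean charFilterMatch(String doc, String query) {
        return bigrams(charFilter(doc)).containsAll(bigrams(charFilter(query)));
    }

    public static void main(String[] args) {
        System.out.println(synonymMatch("臺", "台"));        // true: single-character tokens
        System.out.println(synonymMatch("臺北", "台北"));    // false: bigram token not expanded
        System.out.println(charFilterMatch("臺北", "台北")); // true: normalized before tokenizing
    }
}
```

In Solr itself, the char-filter route would presumably be a PatternReplaceCharFilterFactory declared in both the index and query analyzers of the field type. Please check the Solr reference guide for its exact attributes.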

Thanks,
Emir

On 22.10.2015 10:53, Scott Chu wrote:
Hi solr-user,
I always use CJKTokenizer on a fair amount of Chinese news articles. Say that in Chinese, character C1 has the same meaning as character C2 (e.g. 台=臺). Is it possible to add only this line to synonym.txt:
C1,C2 (in a real example: 台,臺)
and, by applying CJKTokenizer and SynonymFilter, only have to query "C1Cm..." (where Cm is an arbitrary Chinese character) for Solr to return documents that match either "C1Cm" or "C2Cm"?
Scott Chu, scott....@udngroup.com
2015/10/22


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
