On 23/09/2015 16:23, Alexandre Rafalovitch wrote:
You may find the following articles interesting:
http://discovery-grindstone.blogspot.ca/2014/01/searching-in-solr-analyzing-results-and.html
(a whole epic journey)
https://dzone.com/articles/indexing-chinese-solr
The latter article is great and we drew on it when helping a recent
client with Chinese indexing. However, if you do use Paoding, bear in
mind that it has few if any tests and all the comments are in Chinese.
We found a problem with it recently (it breaks the Lucene highlighters)
and have submitted a patch:
http://git.oschina.net/zhzhenqin/paoding-analysis/issues/1
Cheers
Charlie
Regards,
Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 23 September 2015 at 10:41, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
Hi,
I would like to check: will StandardTokenizerFactory work well for indexing
both English and Chinese (bilingual) documents, or do we need tokenizers
that are customised for Chinese (e.g. HMMChineseTokenizerFactory)?
Regards,
Edwin
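
To make the question above more concrete, here is a rough Python sketch of the
two token shapes involved. It is not Lucene code and only approximates the
behaviour: StandardTokenizer follows the Unicode text segmentation rules and,
as far as Chinese is concerned, emits each Han character as its own token,
while Lucene's CJKBigramFilter can combine adjacent Han characters into
overlapping pairs. The function names and the restriction to the basic CJK
Unified Ideographs block are assumptions made for illustration only.

```python
import re

# Simplifying assumption: "Han" means the basic CJK Unified Ideographs block only.
HAN_RE = re.compile("[\u4e00-\u9fff]")
# One Han character, or a run of ASCII letters/digits; everything else is a separator.
TOKEN_RE = re.compile("[\u4e00-\u9fff]|[A-Za-z0-9]+")


def unigram_tokens(text):
    """Whole lower-cased Latin words; every Han character as a separate token."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def bigram_tokens(text):
    """Latin words unchanged; contiguous Han runs as overlapping character pairs."""
    out, run, prev_end = [], [], None

    def flush():
        # Emit the pending Han run: a lone character stays as-is,
        # a longer run becomes overlapping bigrams.
        if len(run) == 1:
            out.append(run[0])
        elif run:
            out.extend(a + b for a, b in zip(run, run[1:]))
        run.clear()

    for m in TOKEN_RE.finditer(text):
        tok = m.group(0)
        is_han = HAN_RE.fullmatch(tok) is not None
        # A Latin word, or a gap (space, punctuation) before this token, ends the run.
        if not is_han or m.start() != prev_end:
            flush()
        if is_han:
            run.append(tok)
        else:
            out.append(tok.lower())
        prev_end = m.end()
    flush()
    return out


if __name__ == "__main__":
    sample = "Solr 支持中文搜索"
    print(unigram_tokens(sample))
    print(bigram_tokens(sample))
```

Neither shape corresponds to real Chinese words. Dictionary-based or HMM-based
segmenters such as HMMChineseTokenizerFactory or Paoding try to emit actual
words instead, which is the trade-off the question is about.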
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk