Re: Can StandardTokenizerFactory works well for Chinese and English (Bilingual)?

Charlie Hull Fri, 25 Sep 2015 01:44:01 -0700

On 23/09/2015 16:23, Alexandre Rafalovitch wrote:

You may find the following articles interesting:
http://discovery-grindstone.blogspot.ca/2014/01/searching-in-solr-analyzing-results-and.html
(a whole epic journey)
https://dzone.com/articles/indexing-chinese-solr

The latter article is great and we drew on it when helping a recent
client with Chinese indexing. However, if you do use Paoding, bear in
mind that it has few if any tests and all the comments are in Chinese.
We found a problem with it recently (it breaks the Lucene highlighters)
and have submitted a patch:
http://git.oschina.net/zhzhenqin/paoding-analysis/issues/1


Cheers

Charlie


Regards,
    Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 23 September 2015 at 10:41, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

Hi,

Would like to check: will StandardTokenizerFactory work well for indexing
both English and Chinese (bilingual) documents, or do we need tokenizers
that are customised for Chinese (e.g. HMMChineseTokenizerFactory)?


Regards,
Edwin
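
For anyone weighing the options in the question above, here is a rough Python sketch of the difference in behaviour. It is not Lucene code. It only approximates two things: StandardTokenizer keeps English words whole but emits each Han character as its own token, and a CJKBigramFilter-style step then joins adjacent Han characters into overlapping bigrams. The function names and the regex are made up for illustration, and only the basic CJK Unified Ideographs block is covered. A dictionary-based tokenizer such as HMMChineseTokenizerFactory would instead try to emit real words, which this sketch does not attempt.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block counts as "Han" here.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def _is_han(term):
    """True if term is a single character in the basic Han block."""
    return len(term) == 1 and "\u4e00" <= term <= "\u9fff"


def unigram_tokens(text):
    """Approximate StandardTokenizer output as (term, start, end) tuples.

    Alphanumeric runs stay whole; every Han character becomes one token.
    """
    return [(m.group(0), m.start(), m.end()) for m in _TOKEN.finditer(text)]


def cjk_bigrams(tokens):
    """Join contiguous Han unigrams into overlapping bigrams.

    Non-Han terms pass through unchanged, and a lone Han character
    stays a unigram. Contiguity is checked via the character offsets.
    """
    out = []
    i = 0
    while i < len(tokens):
        term = tokens[i][0]
        if not _is_han(term):
            out.append(term)
            i += 1
            continue
        # Extend the run while the next token is Han and starts
        # exactly where the previous one ended.
        j = i + 1
        while (j < len(tokens) and _is_han(tokens[j][0])
               and tokens[j][1] == tokens[j - 1][2]):
            j += 1
        run = [t[0] for t in tokens[i:j]]
        if len(run) == 1:
            out.append(run[0])
        else:
            out.extend(run[k] + run[k + 1] for k in range(len(run) - 1))
        i = j
    return out


if __name__ == "__main__":
    toks = unigram_tokens("Solr 搜索引擎 rocks")
    print([t[0] for t in toks])
    print(cjk_bigrams(toks))
```

With single-character tokens, a query for one Chinese word can match any document containing its characters in any position, so precision tends to suffer. Bigrams or a dictionary-based tokenizer are the usual ways around that.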



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
