I used it in a demo where I searched for Thai words using approximate
English sound-equivalents:
https://github.com/arafalov/solr-thai-test/blob/master/collection1/conf/schema.xml#L34
I thought that was pretty cool and unexpectedly powerful :-)
Regards,
Alex.
http://www.solr-start.com/ -
> From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
> Sent: Tuesday, June 20, 2017 12:02 PM
> To: solr-user@lucene.apache.org
> Subject: RE: How are people using the ICUTokenizer?
>
> Joel,
>
> I think the issue is doing word-breaking according to ICU rules. So, if
> you are trying to make sure your index breaks words
g in 6.6.
> use the ICUNormalizer
I could not agree with this more.
-----Original Message-----
From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
Sent: Tuesday, June 20, 2017 12:02 PM
To: solr-user@lucene.apache.org
Subject: RE: How are people using the ICUTokenizer?
Joel,
I think
knows more than I do.
-----Original Message-----
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Tuesday, June 20, 2017 12:13 PM
To: solr-user@lucene.apache.org
Subject: Re: How are people using the ICUTokenizer?
Have you successfully used the shingles with the MoreLikeThis query?
Really curious whether this would return the "interesting Phrases"
On Tue, Jun 20, 2017 at 12:01 PM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:
> Joel,
>
> I think the issue is doing word-breaking according
Joel,
I think the issue is doing word-breaking according to ICU rules. So, if you
are trying to make sure your index breaks words properly on eastern languages,
just use ICU Tokenizer. Unless your text is already in an ICU normal form,
you should always use the ICUNormalizer character