Sent from my HTC

----- Reply message -----
From: "Shawn Heisey" <s...@elyograg.org>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Subject: ICUTokenizer acting very strangely with oriental characters
Date: Tue, Aug 12, 2014 19:00

See the original message on this thread for full details. Some additional information:

This happens on versions 4.6.1, 4.7.2, and 4.9.0.

Here is a screenshot showing the analysis problem in more detail. The first line you can see is the ICUTokenizer.

https://www.dropbox.com/s/9wbi7lz77ivya9j/ICUTokenizer-wrong-analysis.png

The original field value was:

20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導 者・軍人;[政 治],100peopeof20century,pploftwentycentury,pploftwentycentury

Thanks,
Shawn