Re: Problem with Han character in ICUFoldingFilter

2016-10-30 Thread Steve Rowe
Among several other foldings, ICUFoldingFilter performs the Unicode NFC transform, which consists of canonical decomposition (NFD) followed by canonical composition. NFD transforms U+FA04 to U+5B85, and canonical composition leaves U+5B85 as-is. U+FA04 is in the “Pronunciation variants from KS
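
The normalization step described above can be checked outside Lucene. The following is a minimal sketch using Python's standard `unicodedata` module rather than ICU or ICUFoldingFilter itself; the variable names and the `codepoints` helper are illustrative only. It covers just the NFD/NFC behavior for U+FA04, not the other foldings the filter applies.

```python
import unicodedata

compat = "\uFA04"   # CJK COMPATIBILITY IDEOGRAPH-FA04
unified = "\u5B85"  # CJK UNIFIED IDEOGRAPH-5B85

# Canonical decomposition (NFD), then the full NFC transform
# (canonical decomposition followed by canonical composition).
nfd = unicodedata.normalize("NFD", compat)
nfc = unicodedata.normalize("NFC", compat)


def codepoints(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


print("input:", codepoints(compat))
print("NFD:  ", codepoints(nfd))
print("NFC:  ", codepoints(nfc))
```

Under these assumptions, both transforms map the compatibility ideograph to the unified ideograph, so any analysis chain that applies NFC (or a stronger normalization) would index U+5B85 in place of U+FA04.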

Re: Problem with Han character in ICUFoldingFilter

2016-10-30 Thread Ahmet Arslan
Hi Eyal, ICUFoldingFilter uses ICU (http://site.icu-project.org) under the hood. If you think there is a bug, it is better to ask on the ICU mailing list. Ahmet On Sunday, October 30, 2016 3:41 PM, "eyal.naam...@exlibrisgroup.com" wrote: Hi, I was wondering if anyone ran into the following issue, or a s

Problem with Han character in ICUFoldingFilter

2016-10-30 Thread eyal.naam...@exlibrisgroup.com
Hi, I was wondering if anyone has run into the following issue, or a similar one: In Han script there are two separate characters: 宅 (U+FA04) and 宅 (U+5B85). It seems that ICUFoldingFilter converts U+FA04 to U+5B85, which results in the wrong character being indexed. Does anyone have any idea if and how thi