Re: "textels"

Eric Muller Fri, 16 Sep 2016 08:50:54 -0700

On 9/16/2016 8:30 AM, Janusz S. Bien wrote:

Quote - Eric Muller <[email protected]> (Fri, 16 Sep 2016, 17:03:54):
On 9/16/2016 6:52 AM, Janusz S. Bień wrote:
(when working on a corpus of historical Polish we
noticed some cases where standard Unicode equivalence was not
convenient).
I'm very interested to know more about those cases.
For our search engine we were unable to use compatibility equivalence "out of the box" for splitting the ligature because it also converted long s to short s while we wanted to preserve the distinction.

I am interested in the problems with *canonical* equivalence. I thought that you were talking about those before.

Compatibility equivalence is a completely different beast. It is, IMHO, too coarse a tool and best forgotten. For any particular task, it's typically doing too much (e.g. long/short s folding in your case) and too little (not everything you need). There was an attempt at improving the situation, by providing a whole bunch of fine grained, targeted transformations (http://www.unicode.org/reports/tr30/), but that did not pan out.
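
To make the "too much" part concrete, here is a minimal Python sketch. It assumes the ligatures in question are the Latin ones in U+FB00..U+FB06; the table and the helper name split_ligatures are hypothetical, not something from the search engine discussed above. NFKC splits the ligature but also folds long s, whereas a small targeted mapping splits the ligature and leaves long s alone.

```python
import unicodedata

# Hypothetical targeted mapping: split only the Latin ligatures,
# keeping long s (U+017F) where the ligature contains one.
LIGATURES = {
    "\uFB00": "ff",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB03": "ffi",
    "\uFB04": "ffl",
    "\uFB05": "\u017Ft",  # long s + t ligature -> long s, t
    "\uFB06": "st",
}
LIGATURE_TABLE = str.maketrans(LIGATURES)


def split_ligatures(text: str) -> str:
    """Split Latin ligatures without touching any other character."""
    return text.translate(LIGATURE_TABLE)


sample = "\uFB01\u017F"  # fi ligature followed by long s

# Compatibility normalization splits the ligature AND folds long s to s.
nfkc = unicodedata.normalize("NFKC", sample)

# The targeted mapping splits the ligature and preserves long s.
targeted = split_ligatures(sample)
```

With this sample, nfkc is "fis" while targeted is "fi" followed by U+017F, so the long/short s distinction survives. Any further folding a search engine needs would have to be added to the table explicitly.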


Eric.



