Re: Character folding in text editors

Janusz S. Bien Sat, 20 Feb 2016 09:14:35 -0800

Quote/Cytat - Elias Mårtenson <[email protected]> (Sat 20 Feb 2016 11:23:13 AM CET):

Hello Unicode,


I have been involved in a rather long discussion on the Emacs-devel mailing
list[1] concerning the right way to do character folding and we've reached
a point where input from Unicode experts would be welcome.

The problem is the implementation of equivalence when searching for
characters. For example, if I have a buffer containing the following
characters (both using the precomposed and canonical forms):

    o ö ø ó n ñ

The character folding feature in Emacs allows a search for "o" to match some
or even all of these characters. The discussion on the mailing list has
revolved around both the fact that the correct behaviour here is
locale-dependent and the correct way to implement this matching
absent any locale-specific exceptions.


What about just using the POSIX equivalence classes in regular expressions?

From

http://www.regular-expressions.info/posixbrackets.html

A POSIX locale can define character equivalents that indicate that certain characters should be considered as identical for sorting. In French, for example, accents are ignored when ordering words. élève comes before être which comes before événement. é and ê are all the same as e, but l comes before t which comes before v. With the locale set to French, a POSIX-compliant regular expression engine matches e, é, è and ê when you use the collating sequence [=e=] in the bracket expression [[=e=]].
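
To make the idea concrete, here is a rough sketch in Python. Python's re module does not implement [[=e=]], so the sketch only approximates an equivalence class: it treats two characters as equivalent when they share the same base letter after canonical decomposition (NFD). That rule is an assumption for illustration, not the POSIX definition, and it ignores the locale entirely. The helper names base_letter and equivalence_class are made up for this example.

```python
import re
import unicodedata


def base_letter(ch: str) -> str:
    """Strip combining marks from ch after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def equivalence_class(base: str, alphabet: str) -> str:
    """Build a bracket expression from the characters of alphabet that fold to base.

    Assumption: "equivalent" means "same base letter after NFD", which is
    locale-independent and only an approximation of a POSIX [[=x=]] class.
    """
    members = sorted(ch for ch in set(alphabet) if base_letter(ch) == base)
    return "[" + "".join(re.escape(ch) for ch in members) + "]"


# Normalize to NFC so every accented letter is a single precomposed character.
text = unicodedata.normalize("NFC", "élève être événement")

pattern = equivalence_class("e", text)
matches = re.findall(pattern, text)
print(pattern, matches)
```

With this rule a class built for "o" would pick up ö and ó but not ø, because ø has no canonical decomposition. Whether ø should fold to "o" is exactly the kind of decision that needs locale-specific or hand-written data.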


Regards

Janusz
(an Emacs user)


--

Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)

Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
[email protected], [email protected], http://fleksem.klf.uw.edu.pl/~jsbien/
