Quote/Cytat - Elias Mårtenson <[email protected]> (Sat 20 Feb 2016 11:23:13 AM CET):

Hello Unicode,

I have been involved in a rather long discussion on the Emacs-devel mailing
list[1] concerning the right way to do character folding and we've reached
a point where input from Unicode experts would be welcome.

The problem is the implementation of equivalence when searching for
characters. For example, if I have a buffer containing the following
characters (in both the precomposed and canonical forms):

    o ö ø ó n ñ

The character folding feature in Emacs allows a search for "o" to match some
or even all of these characters. The discussion on the mailing list has
revolved around both the fact that the correct behaviour here is
locale-dependent and the correct way to implement this matching
absent any locale-specific exceptions.
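
To make the locale-independent case concrete, here is a minimal sketch in Python. It is not how Emacs implements character folding; it only assumes that folding means "canonically decompose (NFD), then drop combining marks". The helper names fold and folded_matches are hypothetical.

```python
import unicodedata

def fold(text):
    """Canonically decompose text (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def folded_matches(needle, haystack):
    """Return the whitespace-separated items whose folded form equals the folded needle."""
    target = fold(needle)
    return [item for item in haystack.split() if fold(item) == target]

# The sample buffer from the message, written with escapes so the
# precomposed code points are unambiguous: o ö ø ó n ñ
buffer_text = "o \u00f6 \u00f8 \u00f3 n \u00f1"
print(folded_matches("o", buffer_text))
```

Under this assumption a search for "o" finds o, ö and ó but not ø, because U+00F8 has no canonical decomposition. Whether ø should match "o" is exactly the kind of decision that decomposition alone does not settle.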

What about just using the POSIX equivalence classes in regular expressions?

From

http://www.regular-expressions.info/posixbrackets.html

A POSIX locale can define character equivalents that indicate that certain characters should be considered as identical for sorting. In French, for example, accents are ignored when ordering words. élève comes before être which comes before événement. é and ê are all the same as e, but l comes before t which comes before v. With the locale set to French, a POSIX-compliant regular expression engine matches e, é, è and ê when you use the collating sequence [=e=] in the bracket expression [[=e=]].
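
Python's re module does not support [[=e=]], so the following is only a rough emulation of the idea, not POSIX behaviour. It assumes that a character belongs to the class of "e" when its canonical decomposition starts with "e", and it looks only at a hypothetical candidate set (the Latin-1 range). A real POSIX locale defines its equivalence classes in its collation data instead.

```python
import re
import unicodedata

def equivalence_class(base, candidates):
    """Build a regex character class from the candidates whose NFD form starts with base."""
    members = [c for c in candidates
               if unicodedata.normalize("NFD", c)[0] == base]
    return "[" + "".join(re.escape(c) for c in members) + "]"

# Assumed candidate set: printable ASCII plus the rest of Latin-1.
latin1 = [chr(cp) for cp in range(0x20, 0x100)]
e_class = equivalence_class("e", latin1)

# Normalize the text to NFC first so that accented letters are single code points.
words = unicodedata.normalize("NFC", "\u00e9l\u00e8ve \u00eatre \u00e9v\u00e9nement")
print(e_class)
print(re.findall(e_class, words))
```

With this candidate set the class contains e, è, é, ê and ë, so it matches every "e" variant in élève, être and événement. Unlike a locale-defined class, it knows nothing about language-specific exceptions.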

Regards

Janusz
(an Emacs user)


--
Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
[email protected], [email protected], http://fleksem.klf.uw.edu.pl/~jsbien/
