Quote/Cytat - Elias Mårtenson <[email protected]> (Sat 20 Feb 2016
11:23:13 AM CET):
Hello Unicode,
I have been involved in a rather long discussion on the Emacs-devel mailing
list[1] concerning the right way to do character folding and we've reached
a point where input from Unicode experts would be welcome.
The problem is the implementation of equivalence when searching for
characters. For example, if I have a buffer containing the following
characters (both using the precomposed and canonical forms):
o ö ø ó n ñ
The character folding feature in Emacs allows a search for "o" to match some
or even all of these characters. The discussion on the mailing list has
revolved around both the fact that the correct behaviour here is
locale-dependent and the correct way to implement this matching
absent any locale-specific exceptions.
What about just using the POSIX equivalence classes in regular expressions?
From
http://www.regular-expressions.info/posixbrackets.html
A POSIX locale can define character equivalents that indicate that
certain characters should be considered as identical for sorting. In
French, for example, accents are ignored when ordering words. élève
comes before être which comes before événement. é and ê are all the
same as e, but l comes before t which comes before v. With the locale
set to French, a POSIX-compliant regular expression engine matches e,
é, è and ê when you use the collating sequence [=e=] in the bracket
expression [[=e=]].
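
To make the idea concrete, here is a rough Python sketch. It is only an
approximation: Python's re module does not support [[=e=]], so the sketch
imitates an equivalence class by grouping characters according to the first
code point of their canonical decomposition (NFD). The helper names
base_letter and equivalence_class are made up for this illustration, and
nothing here consults a locale the way a real POSIX engine would.

```python
import re
import unicodedata


def base_letter(ch: str) -> str:
    """Return the first code point of the canonical decomposition (NFD) of ch."""
    return unicodedata.normalize("NFD", ch)[0]


def equivalence_class(base: str, candidates: str) -> str:
    """Build a bracket expression from the candidates whose NFD base is `base`.

    This only imitates [[=base=]]; it is an assumption-laden stand-in,
    not what a locale-aware POSIX regex engine actually does.
    """
    members = sorted({c for c in candidates if base_letter(c) == base})
    return "[" + "".join(re.escape(c) for c in members) + "]"


# The sample buffer from the quoted message, written with precomposed
# characters: o, o-diaeresis, o-stroke, o-acute, n, n-tilde.
text = "o \u00f6 \u00f8 \u00f3 n \u00f1"

pattern = equivalence_class("o", text)
matches = re.findall(pattern, text)
print(pattern, matches)
```

With this approach the o-stroke (U+00F8) is not matched by the class built
for "o", because it has no canonical decomposition. Whether it should fold
to "o" is exactly the kind of decision that seems to need either a locale
or an explicit exception table.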
Regards
Janusz
(an Emacs user)
--
Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra
Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
[email protected], [email protected], http://fleksem.klf.uw.edu.pl/~jsbien/