> From: "Doug Ewell" <[email protected]>
> Date: Sat, 20 Feb 2016 14:43:15 -0700
>
> > What about language-independent character-folding: where in the
> > Unicode database is the data for that?
>
> The OP kind of alluded to that: there is no such thing really as
> language-independent character folding.
Emacs is currently looking for a useful approximation, given that the
language of the text is in general unknown. The folding can be toggled
off (either as a global default or for the current search) in those
use cases where it is undesirable or gets in the way.

> About the closest approximation you can get using Unicode data alone
> (not CLDR) is to normalize to NFD, then ignore the combining diacritics.

This is what Emacs currently does, IIUC what you say. The NFD
normalization uses the decomposition data included in UnicodeData.txt.
Is this what you mean?

> But that still doesn't work for a character like ø, which doesn't
> decompose to o + anything

Why doesn't it, btw? Same question about ł. I've heard an opinion that
UnicodeData.txt only includes decompositions when the combining mark's
glyph doesn't overlap that of the base character. Is that correct?

> and more importantly, it still won't meet expectations because of
> the n/ñ and o/ö/ø language-dependency problems.

Given that the feature can be turned off easily, do you think it will
nonetheless be useful, even though the language-dependent parts are not
available?
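For illustration, here is a minimal Python sketch of the approximation discussed above: normalize to NFD, then drop the combining marks. It is not Emacs's own code; the function name fold_diacritics is hypothetical, and treating every nonspacing mark (general category Mn) as a "diacritic" is an assumption. It also shows concretely why ø and ł survive folding unchanged: UnicodeData.txt gives them no decomposition, so NFD leaves them alone.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Approximate language-independent folding (hypothetical helper).

    Step 1: canonical decomposition (NFD), e.g. 'ñ' -> 'n' + U+0303.
    Step 2: drop every character whose general category is Mn
    (nonspacing mark); counting all Mn as "diacritics" is an assumption.
    Characters with no decomposition, such as U+00F8 'ø' and
    U+0142 'ł', pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    print(fold_diacritics("ñandú"))     # nandu
    print(fold_diacritics("Ångström"))  # Angstrom
    print(fold_diacritics("øł"))        # øł  (no decomposition available)
    # The raw decomposition field from UnicodeData.txt:
    print(repr(unicodedata.decomposition("ñ")))  # '006E 0303'
    print(repr(unicodedata.decomposition("ø")))  # ''
```

Note that this folding is language-blind by construction: it maps ñ to n and ö to o for every language, which is exactly the n/ñ and o/ö/ø problem raised above, and it can only be as good as the decomposition data it reads.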

