Re: Annoyances from Implementation of Canonical Equivalence

2019-10-16 Thread Richard Wordingham via Unicode
On Wed, 16 Oct 2019 09:33:38 +0300
Eli Zaretskii via Unicode  wrote:

> > These are complaints about primary-level searches, not canonical
> > equivalence.  
> 
> Not sure what you call primary-level searches, but if you deduced the
> complaints were only about searches for base characters, then that's
> not so.  They are long discussions with many sub-threads, so it might
> be hard to find the specific details you are looking for.

The nearest I've found to complaints about including canonical
equivalences are:

(a) an observation that very occasionally one would need to switch
canonical equivalence off.  In such cases, one is not concerned with
the text as such, but rather with how Unicode non-compliant processes
will handle it.  Compliant processes are often built out of
non-compliant processes.

(b) just possibly

"What we have seen is that the behavior that comes from that Unicode
data does not please the users very much.  Users seem to have many
different ideas of what folding is useful, and disagree with each
other greatly." -
https://lists.gnu.org/archive/html/emacs-devel/2016-02/msg01359.html

I can't tell what (b) was talking about; it may well have been about
folding or asymmetric search, as opposed to supporting canonical
equivalence.

(c) A search for 'n' finding 'ñ'.

When it comes to canonical equivalence, one answer to (c) is that as
soon as one adds the next letter, e.g. 'na', the search will no
longer match 'ñ'.  (This doesn't apply to diacritic-ignoring folding.)
That argument doesn't work with the Polish letter 'ń' though, as it can
be word-final.
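
As a rough illustration of the point above, here is a minimal Python
sketch. It assumes a deliberately naive search that normalizes both
strings to NFD and then looks for a plain substring; the helper name
naive_find is hypothetical and is not taken from Emacs or any library.

```python
import unicodedata


def naive_find(needle: str, haystack: str) -> bool:
    """Substring test on the NFD forms, with no boundary check at all."""
    nfd_needle = unicodedata.normalize("NFD", needle)
    nfd_haystack = unicodedata.normalize("NFD", haystack)
    return nfd_needle in nfd_haystack


# U+00F1 decomposes to 'n' + U+0303, so a bare 'n' is found inside it.
print(naive_find("n", "\u00f1"))        # True

# Adding the next letter breaks the match: 'n' is followed by U+0303, not 'a'.
print(naive_find("na", "ma\u00f1a"))    # False

# Word-final U+0144 (n with acute) has no following letter to rescue it.
print(naive_find("kon", "ko\u0144"))    # True
```

The last case is the Polish one: there is no further letter to add, so
the unwanted match remains.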

In programming, one might be able to prevent the issue
by using 'n\b{g}', but that is a requirement of RL2.2, which doesn't
seem to be high on the list of implementers' priorities, especially as
it depends on properties outwith the UCD, defined in a non-ASCII file
to boot.  A better supported solution is probably 'n\P{Mn}'.
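
Python's standard re module supports neither \b{g} nor \P{Mn}, so the
following is only a sketch of the same idea done by hand, assuming NFD
text and the general category from unicodedata.  The function name is
hypothetical.  Unlike a literal 'n\P{Mn}', it also accepts a match at
the very end of the text, which is closer to a negative lookahead.
Checking only for Mn is an approximation of a grapheme cluster
boundary, not an implementation of one.

```python
import unicodedata


def find_not_followed_by_mark(needle: str, haystack: str) -> list[int]:
    """Return offsets (into the NFD haystack) of matches of needle that are
    not immediately followed by a nonspacing mark (general category Mn)."""
    text = unicodedata.normalize("NFD", haystack)
    pat = unicodedata.normalize("NFD", needle)
    hits = []
    start = text.find(pat)
    while start != -1:
        end = start + len(pat)
        # Accept at end of text, or when the next character is not Mn.
        if end == len(text) or unicodedata.category(text[end]) != "Mn":
            hits.append(start)
        start = text.find(pat, start + 1)
    return hits


# 'n' inside decomposed U+00F1 is rejected; a plain 'n' is accepted.
print(find_not_followed_by_mark("n", "\u00f1"))            # []
print(find_not_followed_by_mark("n", "ko\u0144 i noc"))    # [7]
```

Note that the offsets refer to the normalized text, so a real
implementation would still have to map them back to the original.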

In many cases, the answer might be a search by collation graphemes, but
that has other issues besides language sensitivity.

Richard.



Re: Annoyances from Implementation of Canonical Equivalence (was: Pure Regular Expression Engines and Literal Clusters)

2019-10-16 Thread Eli Zaretskii via Unicode
> Date: Tue, 15 Oct 2019 20:52:15 +0100
> From: Richard Wordingham via Unicode 
> 
> > > > I'm well aware of the official position.  However, when we
> > > > attempted to implement it unconditionally in Emacs, some people
> > > > objected, and brought up good reasons.  You can, of course, elect
> > > > to disregard this experience, and instead learn it from your
> > > > own.  
> > > 
> > > Is there a good record of these complaints anywhere?  
> > 
> > You could look up these discussions:
> > 
> >   https://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00189.html
> >   https://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00506.html
> 
> These are complaints about primary-level searches, not canonical
> equivalence.

Not sure what you call primary-level searches, but if you deduced the
complaints were only about searches for base characters, then that's
not so.  They are long discussions with many sub-threads, so it might
be hard to find the specific details you are looking for.

However, the conclusion was very firm, and since we made the folding
optional 3 years ago, we have had no complaints.