On 2020-01-24 17:22, Tom Lane wrote:
> Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
>> But that's a different POV. The input to this function could come from
>> arbitrary user input from any application whatsoever. So the only
>> reason we can get away with that is because the example regression case
>> Juan José added (which uses non-normals) does not conform to the
>> standard.
> I'm unsure about "conforming to standard", but I think it's reasonable
> to put the onus of doing normalization when necessary on the user.
> Otherwise, we need to move normalization logic into basically all
> the string processing functions (even texteq), which seems like a
> pretty huge cost that will benefit only a small minority of people.
> (If it's not a small minority, then where's the bug reports complaining
> that we don't do it today?)
These reports do exist, and this behavior is known. However, the impact
is mostly that results "look wrong" (strings look the same but don't
compare as equal) rather than causing inconsistency and corruption, so
it's mostly shrugged off. The nondeterministic collation feature was
introduced in part to be able to deal with this; the pending
normalization patch is another way to address it. Still, this behavior
is baked deeply into Unicode, so no single feature or facility will
simply make it go away.
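To make the "looks the same but doesn't compare as equal" case concrete, here is a minimal sketch in Python (not PostgreSQL code; the sample string and variable names are only for illustration). It builds the same visible text from precomposed and decomposed code points and compares them before and after normalizing to NFC:

```python
import unicodedata

# Two spellings that render identically (illustrative sample text):
composed = "Jos\u00e9"      # "é" as a single code point, U+00E9
decomposed = "Jose\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# A plain code-point comparison treats them as different strings.
looks_same_but_differs = composed != decomposed

# Normalizing both sides to the same form (NFC here) makes them equal.
equal_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(looks_same_but_differs, equal_after_nfc)
```

Whether that normalization step belongs in the application or in the server is exactly the question being argued above; the sketch only shows the mechanics.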
AFAICT, we haven't so far had any code that looks up non-ASCII
strings in a table, which is why we haven't had this discussion yet.
Now that I think about it, you could also make an argument that this
should be handled through collation, so the function that looks up the
string in the locale table should go through texteq. However, this
would mostly satisfy the purists but create a bizarre user experience.
Looking through the patch quickly, if you want to get Unicode-fancy,
doing a case-insensitive comparison by running lower-case on both
strings is also wrong in corner cases. All the Greek month names end in
sigma, which has two lowercase forms (σ, and ς at the end of a word), so
I suspect that this patch might not work correctly in such cases.
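As an illustration of the sigma problem, here is a rough Python sketch. The helper naive_lower is hypothetical: it lowercases one character at a time, which is my assumption about how a simple lower-case-both-sides comparison might behave, not a description of what the patch actually does. Python's str.casefold() is used only as one example of a full case-folding comparison.

```python
def naive_lower(s: str) -> str:
    """Hypothetical helper: lowercase each character in isolation,
    with no awareness of the character's position in the word."""
    return "".join(ch.lower() for ch in s)

month = "Μάρτιος"          # Greek "March"; ends in final sigma (ς)
shouted = month.upper()    # the final ς becomes capital Σ

# Character-by-character lowering maps Σ to σ but leaves ς alone,
# so the two spellings of the same month do not match.
naive_match = naive_lower(shouted) == naive_lower(month)

# Full case folding maps both σ and ς to σ, so the comparison succeeds.
casefold_match = shouted.casefold() == month.casefold()

print(naive_match, casefold_match)
```

Under these assumptions, an upper-case month name typed by the user would fail to match the locale's own spelling even though the two differ only in case.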
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services