On 05/06/18 07:10, Melanie Plageman wrote:
Hi,
I noticed what seems like a bug in collation.

This query errors out:
SELECT (SELECT 'c' COLLATE "de_DE") > (SELECT 'ç' COLLATE "es_ES");  --
error

While this query does not:
SELECT 'c' COLLATE "de_DE" > (SELECT 'ç' COLLATE "es_ES");  -- no error

It seems like this is in conflict with what the documentation
<https://www.postgresql.org/docs/devel/static/collation.html> says:
"If any input expression has an explicit collation derivation, then all
explicitly derived collations among the input expressions must be the same,
otherwise an error is raised."

After digging into it a bit, I believe the reason for this is that when we
are assigning collations in assign_collations_walker, we always set
collation strength to IMPLICIT for the subquery and always set the
collation strength to EXPLICIT for the collate node on the other side of
the OpExpr. So, we don't hit an error later like the one in
merge_collation_state when the collation of one expression is conflicting
with that of its parent and the strength of both is EXPLICIT.
I think this still applies to our case because one of the two arguments to
OpExpr would have set their parent's collation strength to either IMPLICIT
or EXPLICIT and then we will process the other argument which would then
have a different collation strength than the one we just set its parent to.
So, we end up setting the inputcollid for the OpExpr to that of the
explicit collation in the collate node.

Basically, it seems like our subquery will always have its collation
strength set to IMPLICIT, so, if we have explicit collation in the first
target entry of the subquery's target list, it looks like we will never
truly treat that as explicit collation.

Right, a subquery's result is considered IMPLICIT, even if there is an explicit COLLATE inside the subquery.

You could mark the subquery's result with a collation like this:

postgres=# SELECT 'c' COLLATE "de_DE" > (SELECT 'ç') COLLATE "es_ES";
ERROR:  collation mismatch between explicit collations "de_DE" and "es_ES"

I'm not sure if this behavior is considered a bug, but I also can't imagine
how it would be expected given the current documentation. It seems to me
one or the other should be updated.

It seems correct to me. It does say "An explicit collation derivation occurs when a COLLATE clause is used; all other collation derivations are implicit". A subquery falls under the "all other collation derivations" category. Perhaps we could make it more clear what the COLLATE clause binds to, especially with subqueries, but I'm not sure how exactly to phrase it. Perhaps an additional example with a subquery would help?

- Heikki

Reply via email to