Consider collation when proving subquery uniqueness

rel_is_distinct_for()'s RTE_SUBQUERY branch passed only the equality
operator from each join clause to query_is_distinct_for(), discarding
the operator's input collation.  query_is_distinct_for() then verified
opfamily compatibility but never checked collations, so a DISTINCT /
GROUP BY / set-op operating under one collation was trusted to prove
uniqueness for a comparison performed under an unrelated collation.  As
with the recent fix in relation_has_unique_index_for(), this is unsound
for nondeterministic collations and yields wrong query results in any
optimization that consumes the proof.
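
The unsoundness described above can be illustrated outside the planner.
The following is a minimal sketch only, not PostgreSQL code: the names
collation_cmp, cmp_case_sensitive, cmp_case_insensitive,
rows_are_distinct and count_matches are hypothetical, and an ASCII
case-insensitive comparison stands in for a nondeterministic collation.
It shows that rows made distinct under one collation can still contain
several matches for a key compared under another.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "a collation": a three-way string comparator. */
typedef int (*collation_cmp) (const char *a, const char *b);

/* Stand-in for a deterministic collation: byte-wise comparison. */
static int
cmp_case_sensitive(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Stand-in for a nondeterministic collation: ASCII case-insensitive. */
static int
cmp_case_insensitive(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == '\0')
            return 0;
    }
}

/* Return 1 if no two rows compare equal under cmp, else 0. */
static int
rows_are_distinct(const char *const *rows, size_t nrows, collation_cmp cmp)
{
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = i + 1; j < nrows; j++)
            if (cmp(rows[i], rows[j]) == 0)
                return 0;
    return 1;
}

/* Count the rows that compare equal to key under cmp. */
static size_t
count_matches(const char *const *rows, size_t nrows,
              const char *key, collation_cmp cmp)
{
    size_t n = 0;

    for (size_t i = 0; i < nrows; i++)
        if (cmp(rows[i], key) == 0)
            n++;
    return n;
}
```

With the rows "abc", "ABC" and "xyz", rows_are_distinct() holds under
cmp_case_sensitive, yet count_matches() for the key "abc" under
cmp_case_insensitive finds two rows.  A uniqueness proof established
under the first comparator therefore says nothing about a join clause
evaluated under the second.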

Fix by carrying each clause's operator input collation into
query_is_distinct_for() and validating it at every check-site against
the subquery target expression's collation.

Back-patch to all supported branches.  query_is_distinct_for() is
declared in an installed header, so on stable branches the existing
two-list signature is retained as a thin wrapper that forwards to a new
collation-aware entry point; external callers continue to receive the
historical collation-blind answer.

Author: Richard Guo <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/CAMbWs4_XUUSTyzCaRjUeeahWNqi=8zoa5q4coi8zuvedsbk...@mail.gmail.com
Backpatch-through: 14

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/172034f6e08847219239b5fa45f2d4117aadf11b

Modified Files
--------------
src/backend/optimizer/plan/analyzejoins.c      | 182 +++++++++++++++++--------
src/test/regress/expected/collate.icu.utf8.out | 181 ++++++++++++++++++++++++
src/test/regress/sql/collate.icu.utf8.sql      |  58 ++++++++
src/tools/pgindent/typedefs.list               |   1 +
4 files changed, 366 insertions(+), 56 deletions(-)
