On Wed, 2025-06-11 at 09:03 +0200, Peter Eisentraut wrote: > I think no matter how we slice it, there is going to be some case > that > will be degraded until some update is applied.
The problem I see is a conflict between two goals: 1. Record appropriate dependencies if a function is sensitive to ordering or ctype. 2. Raise parse errors if we cannot infer the collation for a function call site where the function is sensitive to ordering or ctype. The safest assumption with respect to the first goal is to assume that UDFs are sensitive to ordering and ctype. Otherwise, we will miss recording dependencies for, e.g., a validation function that uses a regex that depends on character classification. But the safest assumption with respect to the second goal is to assume that UDFs are not sensitive to ordering or ctype. Otherwise, we'd throw an error for queries that work today (see below example). To resolve this conflict I think we need some notion about whether the markings are explicitly specified or left as the defaults. If CREATE FUNCTION doesn't specify any markings, then the dependency tracking code can make one assumption and the parser can make the opposite assumption. We need to sort out the actual syntax of CREATE FUNCTION, and I'm starting to think we need some options syntax (similar to storage parameters for CREATE TABLE). > Why would a user define their own concatenation function? It's more likely that someone combines a few primitive functions: CREATE OR REPLACE FUNCTION shorten(t TEXT) RETURNS TEXT LANGUAGE plpgsql AS $$ BEGIN IF (length(t) < 4) THEN RETURN t; END IF; RETURN substr(t,1,1) || (length(t) - 2)::text || substr(t,length(t),1); END; $$; CREATE TABLE c(x TEXT COLLATE "C", y TEXT COLLATE "en_US"); INSERT INTO c VALUES ('kuber','netes'); SELECT x = y FROM c; ERROR: could not determine which collation to use for string comparison -- currently succeeds even though collation cannot be inferred SELECT shorten(x || y) FROM c; shorten --------- k8s (1 row) The example is a bit silly, but I think there are realistic cases along those lines. > Everything beyond that looks at the characters and needs to take > collation/ctype/etc. into account. I'm not sure. My guess would be that the various kinds of markings you might want (or no markings at all) are all common enough cases that they shouldn't be ignored. > > Perhaps pg_dump can apply some properties during upgrades? Interesting idea. We'd still need to account for CREATE FUNCTION statements that come from other places (e.g. direct from applications, or migration scripts, or extension scripts). > Regards, Jeff Davis