Re: [19] Proposal: function markers to indicate collation/ctype sensitivity

Jeff Davis Wed, 11 Jun 2025 15:01:40 -0700

On Wed, 2025-06-11 at 09:03 +0200, Peter Eisentraut wrote:
> I think no matter how we slice it, there is going to be some case
> that 
> will be degraded until some update is applied.


The problem I see is a conflict between two goals:

1. Record appropriate dependencies if a function is sensitive to
ordering or ctype.

2. Raise parse errors if we cannot infer the collation for a function
call site where the function is sensitive to ordering or ctype.

The safest assumption with respect to the first goal is to assume that
UDFs are sensitive to ordering and ctype. Otherwise, we will miss
recording dependencies for, e.g., a validation function that uses a
regex that depends on character classification.

But the safest assumption with respect to the second goal is to assume
that UDFs are not sensitive to ordering or ctype. Otherwise, we'd throw
an error for queries that work today (see below example).

To resolve this conflict I think we need some notion about whether the
markings are explicitly specified or left as the defaults. If CREATE
FUNCTION doesn't specify any markings, then the dependency tracking
code can make one assumption and the parser can make the opposite
assumption. We need to sort out the actual syntax of CREATE FUNCTION,
and I'm starting to think we need some options syntax (similar to
storage parameters for CREATE TABLE).

> Why would a user define their own concatenation function?

It's more likely that someone combines a few primitive functions:

  CREATE OR REPLACE FUNCTION shorten(t TEXT) RETURNS TEXT
    LANGUAGE plpgsql AS $$
    BEGIN
      IF (length(t) < 4) THEN
        RETURN t;
      END IF;
      RETURN substr(t,1,1) ||
             (length(t) - 2)::text ||
             substr(t,length(t),1);
    END;
    $$;

  CREATE TABLE c(x TEXT COLLATE "C", y TEXT COLLATE "en_US");
  INSERT INTO c VALUES ('kuber','netes');

  SELECT x = y FROM c;
  ERROR:  could not determine which collation to use for string
comparison

  -- currently succeeds even though collation cannot be inferred
  SELECT shorten(x || y) FROM c;
   shorten 
  ---------
   k8s
  (1 row)

The example is a bit silly, but I think there are realistic cases along
those lines.

> Everything beyond that looks at the characters and needs to take 
> collation/ctype/etc. into account.

I'm not sure. My guess would be that the various kinds of markings you
might want (or no markings at all) are all common enough cases that
they shouldn't be ignored.

> 
> Perhaps pg_dump can apply some properties during upgrades?

Interesting idea. We'd still need to account for CREATE FUNCTION
statements that come from other places (e.g. direct from applications,
or migration scripts, or extension scripts).

> 
Regards,
        Jeff Davis

Re: [19] Proposal: function markers to indicate collation/ctype sensitivity

Reply via email to