Re: ANY_VALUE aggregate

Vik Fearing Thu, 09 Feb 2023 01:42:34 -0800

On 1/23/23 08:50, David Rowley wrote:

On Thu, 19 Jan 2023 at 06:01, Vik Fearing <v...@postgresfriends.org> wrote:

Thank you for the review.  Attached is a new version rebased to d540a02a72.


I've only a bunch of nit-picks, personal preferences and random
thoughts to offer as a review:

1. I'd be inclined *not* to mention the possible future optimisation in:

+ * Currently this just returns the first value, but in the future it might be
+ * able to signal to the aggregate that it does not need to be called anymore.

I think it's unlikely that the transfn would "signal" such a thing. It
seems more likely if we did anything about it that nodeAgg.c would
maybe have some additional knowledge not to call that function if the
agg state already has a value. Just so we're not preempting how we
might do such a thing in the future, it seems best just to remove the
mention of it. I don't really think it serves as a good reminder that
we might want to do this one day anyway.

Modified. My logic in having the transition function signal that it isfinished is to one day allow something like:


  array_agg(x order by y limit z)

2. +any_value_trans(PG_FUNCTION_ARGS)

Many of transition function names end in "transfn", not "trans". I
think it's better to follow the existing (loosely followed) naming
pattern that a few aggregates seem to follow rather than invent a new
one.


Renamed.

3. I tend to try to copy the capitalisation of keywords from the
surrounding regression tests. I see the following breaks that.

+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);

(obviously, ideally, we'd always just follow the same capitalisation
of keywords everywhere in each .sql file, but we've long broken that
and the best way can do is be consistent with surrounding tests)


Downcased.

4. I think I'd use the word "Returns" instead of "Chooses" in:

+        Chooses a non-deterministic value from the non-null input values.


Done.

5. I've not managed to find a copy of the 2023 draft, so I'm assuming
you've got the ignoring of NULLs correct.

Yes, I do. This is part of <computational operation>, so SQL:2016 10.9GR 7.a applies.

6. Is it worth adding a WindowFunc test somewhere in window.sql with
an any_value(...) over (...)?  Is what any_value() returns as a
WindowFunc equally as non-deterministic as when it's used as an
Aggref? Can we assume there's no guarantee that it'll return the same
value for each partition in each row? Does the spec mention anything
about that?

This is governed by SQL:2016 10.9 GR 1.d and 1.e which defines thesource rows for the aggregate: either a group or a window frame. Thereis no difference in behavior. I don't think a windowed test is usefulhere unless I were to implement moving transitions. I think that mightbe overkill for this function.

7. I wondered if it's worth adding a
SupportRequestOptimizeWindowClause support function for this
aggregate. I'm thinking that it might not be as likely people would
use something more specific like first_value/nth_value/last_value
instead of using any_value as a WindowFunc. Also, I'm currently
thinking that a SupportRequestWFuncMonotonic for any_value() is not
worth the dozen or so lines of code it would take to write it. I'm
assuming it would always be a MONOTONICFUNC_BOTH function. It seems
unlikely that someone would have a subquery with a WHERE clause in the
upper-level query referencing the any_value() aggregate.  Thought I'd
mention both of these things anyway as someone else might think of
some good reason we should add them that I didn't think of.


I thought about this for a while and decided that it was not worthwhile.

v4 attached. I am putting this back to Needs Review in the commitfestapp, but these changes were editorial so it is probably RfC like Peterhad set it. I will let you be the judge of that.

--
Vik Fearing

From 410751cfa6367e5436e20011f3f47f37888190a1 Mon Sep 17 00:00:00 2001
From: Vik Fearing <v...@postgresfriends.org>
Date: Thu, 9 Feb 2023 10:37:10 +0100
Subject: [PATCH v4] Implement ANY_VALUE aggregate

SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an
implementation-dependent (i.e. non-deterministic) value from the
aggregated rows.
---
 doc/src/sgml/func.sgml                   | 14 ++++++++++++++
 src/backend/catalog/sql_features.txt     |  1 +
 src/backend/utils/adt/misc.c             | 10 ++++++++++
 src/include/catalog/pg_aggregate.dat     |  4 ++++
 src/include/catalog/pg_proc.dat          |  8 ++++++++
 src/test/regress/expected/aggregates.out | 24 ++++++++++++++++++++++++
 src/test/regress/sql/aggregates.sql      |  6 ++++++
 7 files changed, 67 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..8bdef6eb32 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19735,6 +19735,20 @@ SELECT NULLIF(value, '(none)') ...
      </thead>
 
      <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>any_value</primary>
+        </indexterm>
+        <function>any_value</function> ( <type>anyelement</type> )
+        <returnvalue><replaceable>same as input type</replaceable></returnvalue>
+       </para>
+       <para>
+        Returns a non-deterministic value from the non-null input values.
+       </para></entry>
+       <entry>Yes</entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..0eb905c177 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -520,6 +520,7 @@ T622	Trigonometric functions			YES
 T623	General logarithm functions			YES	
 T624	Common logarithm functions			YES	
 T625	LISTAGG			NO	
+T626	ANY_VALUE			YES	
 T631	IN predicate with one list element			YES	
 T641	Multiple column assignment			NO	only some syntax variants supported
 T651	SQL-schema statements in SQL routines			YES	
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 220ddb8c01..0f0010b06d 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -1041,3 +1041,13 @@ pg_get_replica_identity_index(PG_FUNCTION_ARGS)
 	else
 		PG_RETURN_NULL();
 }
+
+/*
+ * Transition function for the ANY_VALUE aggregate
+ */
+Datum
+any_value_transfn(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+}
+
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 4fea7d8dc1..d7895cd676 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -634,4 +634,8 @@
   aggfinalfn => 'dense_rank_final', aggfinalextra => 't', aggfinalmodify => 'w',
   aggmfinalmodify => 'w', aggtranstype => 'internal' },
 
+# any_value
+{ aggfnoid => 'any_value(anyelement)', aggtransfn => 'any_value_transfn',
+  aggcombinefn => 'any_value_transfn', aggtranstype => 'anyelement' },
+
 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..d03a72daf0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11918,4 +11918,12 @@
   prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
   prosrc => 'brin_minmax_multi_summary_send' },
 
+{ oid => '8981', descr => 'arbitrary value from among input values',
+  proname => 'any_value', prokind => 'a', proisstrict => 'f',
+  prorettype => 'anyelement', proargtypes => 'anyelement',
+  prosrc => 'aggregate_dummy' },
+{ oid => '8982', descr => 'any_value transition function',
+  proname => 'any_value_transfn', prorettype => 'anyelement', proargtypes => 'anyelement anyelement',
+  prosrc => 'any_value_transfn' },
+
 ]
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 82d0961524..95bf49f3d9 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -25,6 +25,24 @@ SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
  32.6666666666666667
 (1 row)
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+ any_value 
+-----------
+         1
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+ any_value 
+-----------
+ 
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (array['hello', 'world'])) AS v (v);
+   any_value   
+---------------
+ {hello,world}
+(1 row)
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 SELECT avg(b)::numeric(10,3) AS avg_107_943 FROM aggtest;
@@ -2025,6 +2043,12 @@ from (values ('a', 'b')) AS v(foo,bar);
  a
 (1 row)
 
+select any_value(v) filter (where v > 2) from (values (1), (2), (3)) as v (v);
+ any_value 
+-----------
+         3
+(1 row)
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index e81a22465b..e83c984ed0 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -24,6 +24,10 @@ SELECT avg(four) AS avg_1 FROM onek;
 
 SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
 
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+SELECT any_value(v) FROM (VALUES (array['hello', 'world'])) AS v (v);
+
 -- In 7.1, avg(float4) is computed using float8 arithmetic.
 -- Round the result to 3 digits to avoid platform-specific results.
 
@@ -806,6 +810,8 @@ having exists (select 1 from onek b where sum(distinct a.four) = b.four);
 select max(foo COLLATE "C") filter (where (bar collate "POSIX") > '0')
 from (values ('a', 'b')) AS v(foo,bar);
 
+select any_value(v) filter (where v > 2) from (values (1), (2), (3)) as v (v);
+
 -- outer reference in FILTER (PostgreSQL extension)
 select (select count(*)
         from (values (1)) t0(inner_c))

base-commit: ef7002dbe0b06e4e5b42c89becd4eb9be2e9aa89
-- 
2.34.1

Re: ANY_VALUE aggregate

Reply via email to