On 1/23/23 08:50, David Rowley wrote:
On Thu, 19 Jan 2023 at 06:01, Vik Fearing <v...@postgresfriends.org> wrote:
Thank you for the review. Attached is a new version rebased to d540a02a72.
I've only a bunch of nit-picks, personal preferences and random
thoughts to offer as a review:
1. I'd be inclined *not* to mention the possible future optimisation in:
+ * Currently this just returns the first value, but in the future it might be
+ * able to signal to the aggregate that it does not need to be called anymore.
I think it's unlikely that the transfn would "signal" such a thing. It
seems more likely if we did anything about it that nodeAgg.c would
maybe have some additional knowledge not to call that function if the
agg state already has a value. Just so we're not preempting how we
might do such a thing in the future, it seems best just to remove the
mention of it. I don't really think it serves as a good reminder that
we might want to do this one day anyway.
Modified. My logic in having the transition function signal that it is
finished is to one day allow something like:
array_agg(x order by y limit z)
2. +any_value_trans(PG_FUNCTION_ARGS)
Many of transition function names end in "transfn", not "trans". I
think it's better to follow the existing (loosely followed) naming
pattern that a few aggregates seem to follow rather than invent a new
one.
Renamed.
3. I tend to try to copy the capitalisation of keywords from the
surrounding regression tests. I see the following breaks that.
+SELECT any_value(v) FILTER (WHERE v > 2) FROM (VALUES (1), (2), (3)) AS v (v);
(obviously, ideally, we'd always just follow the same capitalisation
of keywords everywhere in each .sql file, but we've long broken that
and the best way can do is be consistent with surrounding tests)
Downcased.
4. I think I'd use the word "Returns" instead of "Chooses" in:
+ Chooses a non-deterministic value from the non-null input values.
Done.
5. I've not managed to find a copy of the 2023 draft, so I'm assuming
you've got the ignoring of NULLs correct.
Yes, I do. This is part of <computational operation>, so SQL:2016 10.9
GR 7.a applies.
6. Is it worth adding a WindowFunc test somewhere in window.sql with
an any_value(...) over (...)? Is what any_value() returns as a
WindowFunc equally as non-deterministic as when it's used as an
Aggref? Can we assume there's no guarantee that it'll return the same
value for each partition in each row? Does the spec mention anything
about that?
This is governed by SQL:2016 10.9 GR 1.d and 1.e which defines the
source rows for the aggregate: either a group or a window frame. There
is no difference in behavior. I don't think a windowed test is useful
here unless I were to implement moving transitions. I think that might
be overkill for this function.
7. I wondered if it's worth adding a
SupportRequestOptimizeWindowClause support function for this
aggregate. I'm thinking that it might not be as likely people would
use something more specific like first_value/nth_value/last_value
instead of using any_value as a WindowFunc. Also, I'm currently
thinking that a SupportRequestWFuncMonotonic for any_value() is not
worth the dozen or so lines of code it would take to write it. I'm
assuming it would always be a MONOTONICFUNC_BOTH function. It seems
unlikely that someone would have a subquery with a WHERE clause in the
upper-level query referencing the any_value() aggregate. Thought I'd
mention both of these things anyway as someone else might think of
some good reason we should add them that I didn't think of.
I thought about this for a while and decided that it was not worthwhile.
v4 attached. I am putting this back to Needs Review in the commitfest
app, but these changes were editorial so it is probably RfC like Peter
had set it. I will let you be the judge of that.
--
Vik Fearing
From 410751cfa6367e5436e20011f3f47f37888190a1 Mon Sep 17 00:00:00 2001
From: Vik Fearing <v...@postgresfriends.org>
Date: Thu, 9 Feb 2023 10:37:10 +0100
Subject: [PATCH v4] Implement ANY_VALUE aggregate
SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an
implementation-dependent (i.e. non-deterministic) value from the
aggregated rows.
---
doc/src/sgml/func.sgml | 14 ++++++++++++++
src/backend/catalog/sql_features.txt | 1 +
src/backend/utils/adt/misc.c | 10 ++++++++++
src/include/catalog/pg_aggregate.dat | 4 ++++
src/include/catalog/pg_proc.dat | 8 ++++++++
src/test/regress/expected/aggregates.out | 24 ++++++++++++++++++++++++
src/test/regress/sql/aggregates.sql | 6 ++++++
7 files changed, 67 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e09e289a43..8bdef6eb32 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19735,6 +19735,20 @@ SELECT NULLIF(value, '(none)') ...
</thead>
<tbody>
+ <row>
+ <entry role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>any_value</primary>
+ </indexterm>
+ <function>any_value</function> ( <type>anyelement</type> )
+ <returnvalue><replaceable>same as input type</replaceable></returnvalue>
+ </para>
+ <para>
+ Returns a non-deterministic value from the non-null input values.
+ </para></entry>
+ <entry>Yes</entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index 3766762ae3..0eb905c177 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -520,6 +520,7 @@ T622 Trigonometric functions YES
T623 General logarithm functions YES
T624 Common logarithm functions YES
T625 LISTAGG NO
+T626 ANY_VALUE YES
T631 IN predicate with one list element YES
T641 Multiple column assignment NO only some syntax variants supported
T651 SQL-schema statements in SQL routines YES
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index 220ddb8c01..0f0010b06d 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -1041,3 +1041,13 @@ pg_get_replica_identity_index(PG_FUNCTION_ARGS)
else
PG_RETURN_NULL();
}
+
+/*
+ * Transition function for the ANY_VALUE aggregate
+ */
+Datum
+any_value_transfn(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_DATUM(PG_GETARG_DATUM(0));
+}
+
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 4fea7d8dc1..d7895cd676 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -634,4 +634,8 @@
aggfinalfn => 'dense_rank_final', aggfinalextra => 't', aggfinalmodify => 'w',
aggmfinalmodify => 'w', aggtranstype => 'internal' },
+# any_value
+{ aggfnoid => 'any_value(anyelement)', aggtransfn => 'any_value_transfn',
+ aggcombinefn => 'any_value_transfn', aggtranstype => 'anyelement' },
+
]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c0f2a8a77c..d03a72daf0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11918,4 +11918,12 @@
prorettype => 'bytea', proargtypes => 'pg_brin_minmax_multi_summary',
prosrc => 'brin_minmax_multi_summary_send' },
+{ oid => '8981', descr => 'arbitrary value from among input values',
+ proname => 'any_value', prokind => 'a', proisstrict => 'f',
+ prorettype => 'anyelement', proargtypes => 'anyelement',
+ prosrc => 'aggregate_dummy' },
+{ oid => '8982', descr => 'any_value transition function',
+ proname => 'any_value_transfn', prorettype => 'anyelement', proargtypes => 'anyelement anyelement',
+ prosrc => 'any_value_transfn' },
+
]
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 82d0961524..95bf49f3d9 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -25,6 +25,24 @@ SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
32.6666666666666667
(1 row)
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+ any_value
+-----------
+ 1
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+ any_value
+-----------
+
+(1 row)
+
+SELECT any_value(v) FROM (VALUES (array['hello', 'world'])) AS v (v);
+ any_value
+---------------
+ {hello,world}
+(1 row)
+
-- In 7.1, avg(float4) is computed using float8 arithmetic.
-- Round the result to 3 digits to avoid platform-specific results.
SELECT avg(b)::numeric(10,3) AS avg_107_943 FROM aggtest;
@@ -2025,6 +2043,12 @@ from (values ('a', 'b')) AS v(foo,bar);
a
(1 row)
+select any_value(v) filter (where v > 2) from (values (1), (2), (3)) as v (v);
+ any_value
+-----------
+ 3
+(1 row)
+
-- outer reference in FILTER (PostgreSQL extension)
select (select count(*)
from (values (1)) t0(inner_c))
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index e81a22465b..e83c984ed0 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -24,6 +24,10 @@ SELECT avg(four) AS avg_1 FROM onek;
SELECT avg(a) AS avg_32 FROM aggtest WHERE a < 100;
+SELECT any_value(v) FROM (VALUES (1)) AS v (v);
+SELECT any_value(v) FROM (VALUES (NULL)) AS v (v);
+SELECT any_value(v) FROM (VALUES (array['hello', 'world'])) AS v (v);
+
-- In 7.1, avg(float4) is computed using float8 arithmetic.
-- Round the result to 3 digits to avoid platform-specific results.
@@ -806,6 +810,8 @@ having exists (select 1 from onek b where sum(distinct a.four) = b.four);
select max(foo COLLATE "C") filter (where (bar collate "POSIX") > '0')
from (values ('a', 'b')) AS v(foo,bar);
+select any_value(v) filter (where v > 2) from (values (1), (2), (3)) as v (v);
+
-- outer reference in FILTER (PostgreSQL extension)
select (select count(*)
from (values (1)) t0(inner_c))
base-commit: ef7002dbe0b06e4e5b42c89becd4eb9be2e9aa89
--
2.34.1