On Wed, Oct 25, 2023 at 05:10:17PM -0700, David G. Johnston wrote:
> The paragraph leading into the last added example needs to be tweaked:
>
> If DISTINCT is specified within an aggregate, the data is sorted in ascending
> order while extracting unique values. You can add an ORDER BY clause, limited
> to expressions matching the regular arguments of the aggregate, to sort the
> output in descending order.
>
> (show existing - DISTINCT only - example here)
>
> <programlisting>
> WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
> SELECT string_agg(DISTINCT v::text, ';' ORDER BY v::text DESC) FROM vals;
> string_agg
> -----------
> 4;3;2;1
> </programlisting>
>
> (existing note)
I see what you mean. I added an example that doesn't match the existing
paragraph. I have rewritten the paragraph and used a relevant example;
patch attached.
> Question: Do you know whether we for certain always sort ascending here to
> compute the unique values or whether if, say, there is an index on the column
> in descending order (or ascending and traversed backwards) that the data
> within
> the aggregate could, with an order by, be returned in descending order? If it
> is ascending, is that part of the SQL Standard (since it doesn't even allow an
> order by to give the user the ability the control the output ordering) or does
> the SQL Standard expect that even a random order would be fine since there are
> algorithms that can be used that do not involve sorting the input?
I don't think order is ever guaranteed in the standard without an ORDER
BY.
> It seems redundant to first say "regular arguments" then negate it in order to
> say "DISTINCT list". Using the positive form with "DISTINCT list" should get
> the point across sufficiently and succinctly. It also avoids me feeling like
> there should be an example of what happens when you do "sort on an expression
> that is not included in the DISTINCT list".
Agreed, I rewrote that.
> Interestingly:
>
> WITH vals (v,l) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
> SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM vals;
>
> ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in
> argument list
> LINE 2: SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM...
>
> But both expressions in the argument list (el and semicolon) do appear in the
> ORDER BY...
I think ORDER BY has to match DISTINCT columns, while you are using ';'.
I used a simpler example with array_agg() in my patch to avoid the issue.
--
Bruce Momjian <[email protected]> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..3b49e63987 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20273,6 +20273,13 @@ SELECT NULLIF(value, '(none)') ...
aggregation.
</para>
+ <para>
+ While all aggregates below accept an optional
+ <literal>ORDER BY</literal> clause (as outlined in <xref
+ linkend="syntax-aggregates"/>), the clause has only been added to
+ aggregates whose output is affected by ordering.
+ </para>
+
<table id="functions-aggregate-table">
<title>General-Purpose Aggregate Functions</title>
<tgroup cols="2">
@@ -20310,7 +20317,7 @@ SELECT NULLIF(value, '(none)') ...
<indexterm>
<primary>array_agg</primary>
</indexterm>
- <function>array_agg</function> ( <type>anynonarray</type> )
+ <function>array_agg</function> ( <type>anynonarray</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> )
<returnvalue>anyarray</returnvalue>
</para>
<para>
@@ -20321,7 +20328,7 @@ SELECT NULLIF(value, '(none)') ...
<row>
<entry role="func_table_entry"><para role="func_signature">
- <function>array_agg</function> ( <type>anyarray</type> )
+ <function>array_agg</function> ( <type>anyarray</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> )
<returnvalue>anyarray</returnvalue>
</para>
<para>
@@ -20526,14 +20533,14 @@ SELECT NULLIF(value, '(none)') ...
<indexterm>
<primary>json_agg</primary>
</indexterm>
- <function>json_agg</function> ( <type>anyelement</type> )
+ <function>json_agg</function> ( <type>anyelement</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> )
<returnvalue>json</returnvalue>
</para>
<para role="func_signature">
<indexterm>
<primary>jsonb_agg</primary>
</indexterm>
- <function>jsonb_agg</function> ( <type>anyelement</type> )
+ <function>jsonb_agg</function> ( <type>anyelement</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> )
<returnvalue>jsonb</returnvalue>
</para>
<para>
@@ -20573,7 +20580,8 @@ SELECT NULLIF(value, '(none)') ...
</indexterm>
<function>json_object_agg</function> ( <parameter>key</parameter>
<type>"any"</type>, <parameter>value</parameter>
- <type>"any"</type> )
+ <type>"any"</type>
+ <literal>ORDER BY</literal> <literal>input_sort_columns</literal> )
<returnvalue>json</returnvalue>
</para>
<para role="func_signature">
@@ -20582,7 +20590,8 @@ SELECT NULLIF(value, '(none)') ...
</indexterm>
<function>jsonb_object_agg</function> ( <parameter>key</parameter>
<type>"any"</type>, <parameter>value</parameter>
- <type>"any"</type> )
+ <type>"any"</type>
+ <literal>ORDER BY</literal> <literal>input_sort_columns</literal> )
<returnvalue>jsonb</returnvalue>
</para>
<para>
@@ -20819,7 +20828,8 @@ SELECT NULLIF(value, '(none)') ...
</para>
<para role="func_signature">
<function>string_agg</function> ( <parameter>value</parameter>
- <type>bytea</type>, <parameter>delimiter</parameter> <type>bytea</type> )
+ <type>bytea</type>, <parameter>delimiter</parameter> <type>bytea</type>
+ <literal>ORDER BY</literal> <literal>input_sort_columns</literal> )
<returnvalue>bytea</returnvalue>
</para>
<para>
@@ -20877,7 +20887,7 @@ SELECT NULLIF(value, '(none)') ...
<indexterm>
<primary>xmlagg</primary>
</indexterm>
- <function>xmlagg</function> ( <type>xml</type> )
+ <function>xmlagg</function> ( <type>xml</type> <literal>ORDER BY</literal> <literal>input_sort_columns</literal> )
<returnvalue>xml</returnvalue>
</para>
<para>
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3ba844057f..c5627001c7 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1647,7 +1647,26 @@ sqrt(2)
are always just expressions and cannot be output-column names or numbers.
For example:
<programlisting>
-SELECT array_agg(a ORDER BY b DESC) FROM table;
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(v ORDER BY v DESC) FROM vals;
+ array_agg
+-------------
+ {4,3,3,2,1}
+</programlisting>
+ Since <type>jsonb</type> only keeps the last matching key, ordering
+ of its keys can be significant:
+<programlisting>
+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v) FROM vals;
+ jsonb_object_agg
+----------------------------
+ {"key0": "1", "key1": "2"}
+
+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v ORDER BY v) FROM vals;
+ jsonb_object_agg
+----------------------------
+ {"key0": "1", "key1": "3"}
</programlisting>
</para>
@@ -1668,11 +1687,17 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
</para>
<para>
- If <literal>DISTINCT</literal> is specified in addition to an
- <replaceable>order_by_clause</replaceable>, then all the <literal>ORDER BY</literal>
- expressions must match regular arguments of the aggregate; that is,
- you cannot sort on an expression that is not included in the
- <literal>DISTINCT</literal> list.
+ If <literal>DISTINCT</literal> is specified with an
+ <replaceable>order_by_clause</replaceable>, <literal>ORDER
+ BY</literal> expressions can only reference columns in the
+ <literal>DISTINCT</literal> list. For example:
+<programlisting>
+WITH vals (v1, v2) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
+SELECT array_agg(DISTINCT v2 ORDER BY v2 DESC) FROM vals;
+ array_agg
+-------------
+ {Z,T,R,D,A}
+</programlisting>
</para>
<note>