Re: Document aggregate functions better w.r.t. ORDER BY

2023-11-03 Thread Bruce Momjian
On Thu, Oct 26, 2023 at 04:05:12PM -0700, David G. Johnston wrote:
> On Thu, Oct 26, 2023 at 4:03 PM Bruce Momjian  wrote:
> 
> 
> Sure, done in the attached patch.
> 
> 
> 
> WFM.  Thank You!

Patch applied to master.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-26 Thread David G. Johnston
On Thu, Oct 26, 2023 at 4:03 PM Bruce Momjian  wrote:

>
> Sure, done in the attached patch.
>
>
WFM.  Thank You!

David J.


Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-26 Thread Bruce Momjian
On Thu, Oct 26, 2023 at 03:44:14PM -0700, David G. Johnston wrote:
> On Thu, Oct 26, 2023 at 3:36 PM Bruce Momjian  wrote:
> 
> No sneaking.  ;-)  It would be bad to document this unevenly because it
> sets expectations in other parts of the system if we don't mention it.
> 
> 
> Agreed.
> 
> Last suggestion, remove the first jsonb_agg example that lacks an order by.
> 
> +WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
> +SELECT jsonb_object_agg(k, v) FROM vals;
> +      jsonb_object_agg
> +
> + {"key0": "1", "key1": "2"}
> +
> 
> We shouldn't write an example that relies on the rows being evaluated 1-2-3
> without specifying an order by clause.

Sure, done in the attached patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..3b49e63987 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20273,6 +20273,13 @@ SELECT NULLIF(value, '(none)') ...
aggregation.
   
 
+  
+   While all aggregates below accept an optional
+   ORDER BY clause (as outlined in ), the clause has only been added to
+   aggregates whose output is affected by ordering.
+  
+

 General-Purpose Aggregate Functions
 
@@ -20310,7 +20317,7 @@ SELECT NULLIF(value, '(none)') ...
 
  array_agg
 
-array_agg ( anynonarray )
+array_agg ( anynonarray ORDER BY input_sort_columns )
 anyarray


@@ -20321,7 +20328,7 @@ SELECT NULLIF(value, '(none)') ...
 
   

-array_agg ( anyarray )
+array_agg ( anyarray ORDER BY input_sort_columns )
 anyarray


@@ -20526,14 +20533,14 @@ SELECT NULLIF(value, '(none)') ...
 
  json_agg
 
-json_agg ( anyelement )
+json_agg ( anyelement ORDER BY input_sort_columns )
 json


 
  jsonb_agg
 
-jsonb_agg ( anyelement )
+jsonb_agg ( anyelement ORDER BY input_sort_columns )
 jsonb


@@ -20573,7 +20580,8 @@ SELECT NULLIF(value, '(none)') ...
 
 json_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 json


@@ -20582,7 +20590,8 @@ SELECT NULLIF(value, '(none)') ...
 
 jsonb_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 jsonb


@@ -20819,7 +20828,8 @@ SELECT NULLIF(value, '(none)') ...


 string_agg ( value
- bytea, delimiter bytea )
+ bytea, delimiter bytea
+ ORDER BY input_sort_columns )
 bytea


@@ -20877,7 +20887,7 @@ SELECT NULLIF(value, '(none)') ...
 
  xmlagg
 
-xmlagg ( xml )
+xmlagg ( xml ORDER BY input_sort_columns )
 xml


diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3ba844057f..80d71d362f 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1647,7 +1647,19 @@ sqrt(2)
 are always just expressions and cannot be output-column names or numbers.
 For example:
 
-SELECT array_agg(a ORDER BY b DESC) FROM table;
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(v ORDER BY v DESC) FROM vals;
+  array_agg
+-
+ {4,3,3,2,1}
+
+Since jsonb only keeps the last matching key, ordering
+of its keys can be significant:
+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v ORDER BY v) FROM vals;
+  jsonb_object_agg
+
+ {"key0": "1", "key1": "3"}
 

 
@@ -1668,20 +1680,19 @@ SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect

 

-If DISTINCT is specified in addition to an
-order_by_clause, then all the ORDER BY
-expressions must match regular arguments of the aggregate; that is,
-you cannot sort on an expression that is not included in the
-DISTINCT list.
+If DISTINCT is specified with an
+order_by_clause, ORDER
+BY expressions can only reference columns in the
+DISTINCT list.  For example:
+
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(DISTINCT v ORDER BY v DESC) FROM vals;
+ array_agg
+---
+ {4,3,2,1}
+

 
-   
-
- The ability to specify both DISTINCT and ORDER BY
- in an aggregate function is a PostgreSQL extension.
-
-   
-

 Placing ORDER BY within the aggregate's regular argument
 list, as described so far, is used when ordering the input rows for


Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-26 Thread David G. Johnston
On Thu, Oct 26, 2023 at 3:36 PM Bruce Momjian  wrote:

> No sneaking.  ;-)  It would be bad to document this unevenly because it
> sets expectations in other parts of the system if we don't mention it.
>

Agreed.

Last suggestion, remove the first jsonb_agg example that lacks an order by.

+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v) FROM vals;
+  jsonb_object_agg
+
+ {"key0": "1", "key1": "2"}
+

We shouldn't write an example that relies on the rows being evaluated 1-2-3
without specifying an order by clause.

David J.


Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-26 Thread Bruce Momjian
On Thu, Oct 26, 2023 at 03:09:26PM -0700, David G. Johnston wrote:
> On Thu, Oct 26, 2023 at 2:56 PM Bruce Momjian  wrote:
> 
> On Wed, Oct 25, 2023 at 10:34:10PM -0700, David G. Johnston wrote:
> > I would reword the existing note to be something like:
> >
> > The SQL Standard defines specific aggregates and their properties,
> including
> > which of DISTINCT and/or ORDER BY is allowed.  Due to the extensible
> nature of
> > PostgreSQL it accepts either or both clauses for any aggregate.
> 
> Uh, is this something in my patch or somewhere else?  I don't think
> PostgreSQL extensible is an example of syntax flexibility.
> 
> 
> https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES
> 
> Note
> The ability to specify both DISTINCT and ORDER BY in an aggregate function is 
> a
> PostgreSQL extension.
> 
> I am pointing out that the first sentence of the existing note above seems to
> be factually incorrect.  I tried to make it correct - while explaining why we
> differ.  Though in truth I'd probably rather just remove the note.

Agreed, removed, patch attached.  This is just too complex to specify.

> > We get enough complaints regarding "apparent ordering" that I would like
> to
> > add:
> >
> > As a reminder, while some DISTINCT processing algorithms produce sorted
> output
> > as a side-effect, only by specifying ORDER BY is the output order
> guaranteed.
> 
> Well, we need to create a new email thread for this and look at all the
> areas is applies to since this is a much larger issue.
> 
> I was hoping to sneak this one in regardless of the bigger picture issues,
> since this specific combination is guaranteed to output ordered presently.

No sneaking.  ;-)  It would be bad to document this unevenly because it
sets expectations in other parts of the system if we don't mention it.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..3b49e63987 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20273,6 +20273,13 @@ SELECT NULLIF(value, '(none)') ...
aggregation.
   
 
+  
+   While all aggregates below accept an optional
+   ORDER BY clause (as outlined in ), the clause has only been added to
+   aggregates whose output is affected by ordering.
+  
+

 General-Purpose Aggregate Functions
 
@@ -20310,7 +20317,7 @@ SELECT NULLIF(value, '(none)') ...
 
  array_agg
 
-array_agg ( anynonarray )
+array_agg ( anynonarray ORDER BY input_sort_columns )
 anyarray


@@ -20321,7 +20328,7 @@ SELECT NULLIF(value, '(none)') ...
 
   

-array_agg ( anyarray )
+array_agg ( anyarray ORDER BY input_sort_columns )
 anyarray


@@ -20526,14 +20533,14 @@ SELECT NULLIF(value, '(none)') ...
 
  json_agg
 
-json_agg ( anyelement )
+json_agg ( anyelement ORDER BY input_sort_columns )
 json


 
  jsonb_agg
 
-jsonb_agg ( anyelement )
+jsonb_agg ( anyelement ORDER BY input_sort_columns )
 jsonb


@@ -20573,7 +20580,8 @@ SELECT NULLIF(value, '(none)') ...
 
 json_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 json


@@ -20582,7 +20590,8 @@ SELECT NULLIF(value, '(none)') ...
 
 jsonb_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 jsonb


@@ -20819,7 +20828,8 @@ SELECT NULLIF(value, '(none)') ...


 string_agg ( value
- bytea, delimiter bytea )
+ bytea, delimiter bytea
+ ORDER BY input_sort_columns )
 bytea


@@ -20877,7 +20887,7 @@ SELECT NULLIF(value, '(none)') ...
 
  xmlagg
 
-xmlagg ( xml )
+xmlagg ( xml ORDER BY input_sort_columns )
 xml


diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3ba844057f..ec089fac06 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1647,7 +1647,26 @@ sqrt(2)
 are always just expressions and cannot be output-column names or numbers.
 For example:
 
-SELECT array_agg(a ORDER BY b DESC) FROM table;
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(v ORDER BY v DESC) FROM vals;
+  array_agg
+-
+ {4,3,3,2,1}
+
+Since jsonb only keeps the last matching key, ordering
+of its keys can be significant:
+
+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v) FROM 

Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-26 Thread David G. Johnston
On Thu, Oct 26, 2023 at 2:56 PM Bruce Momjian  wrote:

> On Wed, Oct 25, 2023 at 10:34:10PM -0700, David G. Johnston wrote:
> > I would reword the existing note to be something like:
> >
> > The SQL Standard defines specific aggregates and their properties,
> including
> > which of DISTINCT and/or ORDER BY is allowed.  Due to the extensible
> nature of
> > PostgreSQL it accepts either or both clauses for any aggregate.
>
> Uh, is this something in my patch or somewhere else?  I don't think
> PostgreSQL extensible is an example of syntax flexibility.
>

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES

Note
The ability to specify both DISTINCT and ORDER BY in an aggregate function
is a PostgreSQL extension.

I am pointing out that the first sentence of the existing note above seems
to be factually incorrect.  I tried to make it correct - while explaining
why we differ.  Though in truth I'd probably rather just remove the note.

> We get enough complaints regarding "apparent ordering" that I would like
> to
> > add:
> >
> > As a reminder, while some DISTINCT processing algorithms produce sorted
> output
> > as a side-effect, only by specifying ORDER BY is the output order
> guaranteed.
>
> Well, we need to create a new email thread for this and look at all the
> areas is applies to since this is a much larger issue.
>
>
I was hoping to sneak this one in regardless of the bigger picture issues,
since this specific combination is guaranteed to output ordered presently.

David J.


Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-26 Thread Bruce Momjian
On Wed, Oct 25, 2023 at 10:34:10PM -0700, David G. Johnston wrote:
> I would reword the existing note to be something like:
> 
> The SQL Standard defines specific aggregates and their properties, including
> which of DISTINCT and/or ORDER BY is allowed.  Due to the extensible nature of
> PostgreSQL it accepts either or both clauses for any aggregate.

Uh, is this something in my patch or somewhere else?  I don't think
PostgreSQL extensible is an example of syntax flexibility.

> From the most recent patch:
> 
>     
> -    If DISTINCT is specified in addition to an
> -    order_by_clause, then all the ORDER 
> BY
> 
> -    expressions must match regular arguments of the aggregate; that is,
> -    you cannot sort on an expression that is not included in the
> -    DISTINCT list.
> +    If DISTINCT is specified with an
> +    order_by_clause, ORDER
> +    BY expressions can only reference columns in the
> +    DISTINCT list.  For example:
> +
> +WITH vals (v1, v2) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
> +SELECT array_agg(DISTINCT v2 ORDER BY v2 DESC) FROM vals;
> +  array_agg
> +-
> + {Z,T,R,D,A}
> +
> 
> The change to a two-column vals was mostly to try and find corner-cases that
> might need to be addressed.  If we don't intend to show the error case of
> DISTINCT v1 ORDER BY v2 then we should go back to the original example and 
> just
> add ORDER BY v DESC.  I'm fine with not using string_agg here.
> 
> +    For example:
> +
> +WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
> +SELECT array_agg(DISTINCT v ORDER BY v DESC) FROM vals;
> + array_agg
> +---
> + {4,3,2,1}
> +

Okay, good, switched in the attached patch.

> We get enough complaints regarding "apparent ordering" that I would like to
> add:
> 
> As a reminder, while some DISTINCT processing algorithms produce sorted output
> as a side-effect, only by specifying ORDER BY is the output order guaranteed.

Well, we need to create a new email thread for this and look at all the
areas is applies to since this is a much larger issue.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..3b49e63987 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20273,6 +20273,13 @@ SELECT NULLIF(value, '(none)') ...
aggregation.
   
 
+  
+   While all aggregates below accept an optional
+   ORDER BY clause (as outlined in ), the clause has only been added to
+   aggregates whose output is affected by ordering.
+  
+

 General-Purpose Aggregate Functions
 
@@ -20310,7 +20317,7 @@ SELECT NULLIF(value, '(none)') ...
 
  array_agg
 
-array_agg ( anynonarray )
+array_agg ( anynonarray ORDER BY input_sort_columns )
 anyarray


@@ -20321,7 +20328,7 @@ SELECT NULLIF(value, '(none)') ...
 
   

-array_agg ( anyarray )
+array_agg ( anyarray ORDER BY input_sort_columns )
 anyarray


@@ -20526,14 +20533,14 @@ SELECT NULLIF(value, '(none)') ...
 
  json_agg
 
-json_agg ( anyelement )
+json_agg ( anyelement ORDER BY input_sort_columns )
 json


 
  jsonb_agg
 
-jsonb_agg ( anyelement )
+jsonb_agg ( anyelement ORDER BY input_sort_columns )
 jsonb


@@ -20573,7 +20580,8 @@ SELECT NULLIF(value, '(none)') ...
 
 json_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 json


@@ -20582,7 +20590,8 @@ SELECT NULLIF(value, '(none)') ...
 
 jsonb_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 jsonb


@@ -20819,7 +20828,8 @@ SELECT NULLIF(value, '(none)') ...


 string_agg ( value
- bytea, delimiter bytea )
+ bytea, delimiter bytea
+ ORDER BY input_sort_columns )
 bytea


@@ -20877,7 +20887,7 @@ SELECT NULLIF(value, '(none)') ...
 
  xmlagg
 
-xmlagg ( xml )
+xmlagg ( xml ORDER BY input_sort_columns )
 xml


diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3ba844057f..92336fb929 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1647,7 +1647,26 @@ sqrt(2)
 are always just expressions and cannot be output-column names or numbers.
 For example:
 
-SELECT array_agg(a ORDER BY b DESC) FROM table;
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(v ORDER BY v DESC) FROM vals;
+  array_agg
+-
+ {4,3,3,2,1}
+
+Since jsonb only keeps the last matching key, ordering
+of its 

Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-25 Thread David G. Johnston
On Wed, Oct 25, 2023 at 7:13 PM David Rowley  wrote:

> On Thu, 26 Oct 2023 at 13:10, David G. Johnston
>  wrote:
> > Question: Do you know whether we for certain always sort ascending here
> to compute the unique values or whether if, say, there is an index on the
> column in descending order (or ascending and traversed backwards) that the
> data within the aggregate could, with an order by, be returned in
> descending order?
>
> The way it's currently coded, we seem to always require ascending
> order.  See addTargetToGroupList().  The call to
> get_sort_group_operators() only requests the ltOpr.
>
> A quick test creating an index on a column with DESC shows that we end
> up doing a backwards index scan so that we get the requested ascending
> order:
>
> create table b (b text);
> create index on b (b desc);
> explain select string_agg(distinct b,',') from b;
> QUERY PLAN
>
> --
>  Aggregate  (cost=67.95..67.97 rows=1 width=32)
>->  Index Only Scan Backward using b_b_idx on b  (cost=0.15..64.55
> rows=1360 width=32)
> (2 rows)
>
> However, I think we'd best stay clear of offering any guarantees in
> the documents about this.  If we did that it would be much harder in
> the future if we wanted to implement the DISTINCT aggregates by
> hashing.
>
> So, I think we are mischaracterizing the Standard here, if only in the
specific case of array_agg.

SQL Standard: 4.16.4

Every unary aggregate function takes an arbitrary  as the
argument; most unary aggregate
functions can optionally be qualified with either DISTINCT or ALL.

If ARRAY_AGG is specified, then an array value with one element formed from
the 
evaluated for each row that qualifies.

Neither DISTINCT nor ALL are allowed to be specified for VAR_POP, VAR_SAMP,
STDDEV_POP, or
STDDEV_SAMP; redundant duplicates are not removed when computing these
functions.

10.9

 ::=
ARRAY_AGG
  [ ORDER BY  ]


I would reword the existing note to be something like:

The SQL Standard defines specific aggregates and their properties,
including which of DISTINCT and/or ORDER BY is allowed.  Due to the
extensible nature of PostgreSQL it accepts either or both clauses for any
aggregate.

>From the most recent patch:


-If DISTINCT is specified in addition to an
-order_by_clause, then all the
ORDER BY
-expressions must match regular arguments of the aggregate; that is,
-you cannot sort on an expression that is not included in the
-DISTINCT list.
+If DISTINCT is specified with an
+order_by_clause, ORDER
+BY expressions can only reference columns in the
+DISTINCT list.  For example:
+
+WITH vals (v1, v2) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
+SELECT array_agg(DISTINCT v2 ORDER BY v2 DESC) FROM vals;
+  array_agg
+-
+ {Z,T,R,D,A}
+

The change to a two-column vals was mostly to try and find corner-cases
that might need to be addressed.  If we don't intend to show the error case
of DISTINCT v1 ORDER BY v2 then we should go back to the original example
and just add ORDER BY v DESC.  I'm fine with not using string_agg here.

+For example:
+
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(DISTINCT v ORDER BY v DESC) FROM vals;
+ array_agg
+---
+ {4,3,2,1}
+

We get enough complaints regarding "apparent ordering" that I would like to
add:

As a reminder, while some DISTINCT processing algorithms produce sorted
output as a side-effect, only by specifying ORDER BY is the output order
guaranteed.

David J.


Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-25 Thread David Rowley
On Thu, 26 Oct 2023 at 13:10, David G. Johnston
 wrote:
> Question: Do you know whether we for certain always sort ascending here to 
> compute the unique values or whether if, say, there is an index on the column 
> in descending order (or ascending and traversed backwards) that the data 
> within the aggregate could, with an order by, be returned in descending order?

The way it's currently coded, we seem to always require ascending
order.  See addTargetToGroupList().  The call to
get_sort_group_operators() only requests the ltOpr.

A quick test creating an index on a column with DESC shows that we end
up doing a backwards index scan so that we get the requested ascending
order:

create table b (b text);
create index on b (b desc);
explain select string_agg(distinct b,',') from b;
QUERY PLAN
--
 Aggregate  (cost=67.95..67.97 rows=1 width=32)
   ->  Index Only Scan Backward using b_b_idx on b  (cost=0.15..64.55
rows=1360 width=32)
(2 rows)

However, I think we'd best stay clear of offering any guarantees in
the documents about this.  If we did that it would be much harder in
the future if we wanted to implement the DISTINCT aggregates by
hashing.

David




Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-25 Thread Bruce Momjian
On Wed, Oct 25, 2023 at 05:10:17PM -0700, David G. Johnston wrote:
> The paragraph leading into the last added example needs to be tweaked:
> 
> If DISTINCT is specified within an aggregate, the data is sorted in ascending
> order while extracting unique values.  You can add an ORDER BY clause, limited
> to expressions matching the regular arguments of the aggregate, to sort the
> output in descending order.
> 
> (show existing - DISTINCT only - example here)
> 
> 
> WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
> SELECT string_agg(DISTINCT v::text, ';' ORDER BY v::text DESC) FROM vals;
>  string_agg
> ---
>   4;3;2;1
> 
> 
> (existing note)

I see what you mean.  I added an example that doesn't match the existing
paragraph.  I have rewritten the paragraph and used a relevant example;
patch attached.

> Question: Do you know whether we for certain always sort ascending here to
> compute the unique values or whether if, say, there is an index on the column
> in descending order (or ascending and traversed backwards) that the data 
> within
> the aggregate could, with an order by, be returned in descending order?  If it
> is ascending, is that part of the SQL Standard (since it doesn't even allow an
> order by to give the user the ability the control the output ordering) or does
> the SQL Standard expect that even a random order would be fine since there are
> algorithms that can be used that do not involve sorting the input?

I don't think order is ever guaranteed in the standard without an ORDER
BY.

> It seems redundant to first say "regular arguments" then negate it in order to
> say "DISTINCT list".  Using the positive form with "DISTINCT list" should get
> the point across sufficiently and succinctly.  It also avoids me feeling like
> there should be an example of what happens when you do "sort on an expression
> that is not included in the DISTINCT list".

Agreed, I rewrote that.

> Interestingly:
> 
> WITH vals (v,l) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
> SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM vals;
> 
> ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in
> argument list
> LINE 2: SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM...
> 
> But both expressions in the argument list (el and semicolon) do appear in the
> ORDER BY...

I think ORDER BY has to match DISTINCT columns, while you are using ';'.
I used a simpler example with array_agg() in my patch to avoid the issue.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..3b49e63987 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20273,6 +20273,13 @@ SELECT NULLIF(value, '(none)') ...
aggregation.
   
 
+  
+   While all aggregates below accept an optional
+   ORDER BY clause (as outlined in ), the clause has only been added to
+   aggregates whose output is affected by ordering.
+  
+

 General-Purpose Aggregate Functions
 
@@ -20310,7 +20317,7 @@ SELECT NULLIF(value, '(none)') ...
 
  array_agg
 
-array_agg ( anynonarray )
+array_agg ( anynonarray ORDER BY input_sort_columns )
 anyarray


@@ -20321,7 +20328,7 @@ SELECT NULLIF(value, '(none)') ...
 
   

-array_agg ( anyarray )
+array_agg ( anyarray ORDER BY input_sort_columns )
 anyarray


@@ -20526,14 +20533,14 @@ SELECT NULLIF(value, '(none)') ...
 
  json_agg
 
-json_agg ( anyelement )
+json_agg ( anyelement ORDER BY input_sort_columns )
 json


 
  jsonb_agg
 
-jsonb_agg ( anyelement )
+jsonb_agg ( anyelement ORDER BY input_sort_columns )
 jsonb


@@ -20573,7 +20580,8 @@ SELECT NULLIF(value, '(none)') ...
 
 json_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 json


@@ -20582,7 +20590,8 @@ SELECT NULLIF(value, '(none)') ...
 
 jsonb_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 jsonb


@@ -20819,7 +20828,8 @@ SELECT NULLIF(value, '(none)') ...


 string_agg ( value
- bytea, delimiter bytea )
+ bytea, delimiter bytea
+ ORDER BY input_sort_columns )
 bytea


@@ -20877,7 +20887,7 @@ SELECT NULLIF(value, '(none)') ...
 
  xmlagg
 
-xmlagg ( xml )
+xmlagg ( xml ORDER BY input_sort_columns )
 xml


diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3ba844057f..c5627001c7 100644
--- 

Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-25 Thread David G. Johnston
On Wed, Oct 25, 2023 at 4:22 PM Bruce Momjian  wrote:

> On Wed, Oct 25, 2023 at 04:14:11PM -0700, David G. Johnston wrote:
> > Yeah, we punt on the entire concept in the data type section:
> >
> > "Managing these errors and how they propagate through calculations is the
> > subject of an entire branch of mathematics and computer science and will
> not be
> > discussed here," ...
> >
> > Also, I'm now led to believe that the relevant IEEE 754 floating point
> addition
> > is indeed commutative.  Given that, I am inclined to simply not add the
> order
> > by clause at all to those four functions. (actually, you already got rid
> of the
> > avg()s but the sum()s are still present, so just those two).
>
> Ah, yes, sum() removed.  Updated patch attached.
>
>
The paragraph leading into the last added example needs to be tweaked:

If DISTINCT is specified within an aggregate, the data is sorted in
ascending order while extracting unique values.  You can add an ORDER BY
clause, limited to expressions matching the regular arguments of the
aggregate, to sort the output in descending order.

(show existing - DISTINCT only - example here)


WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
SELECT string_agg(DISTINCT v::text, ';' ORDER BY v::text DESC) FROM vals;
 string_agg
---
  4;3;2;1


(existing note)

Question: Do you know whether we for certain always sort ascending here to
compute the unique values or whether if, say, there is an index on the
column in descending order (or ascending and traversed backwards) that the
data within the aggregate could, with an order by, be returned in
descending order?  If it is ascending, is that part of the SQL Standard
(since it doesn't even allow an order by to give the user the ability the
control the output ordering) or does the SQL Standard expect that even a
random order would be fine since there are algorithms that can be used that
do not involve sorting the input?

It seems redundant to first say "regular arguments" then negate it in order
to say "DISTINCT list".  Using the positive form with "DISTINCT list"
should get the point across sufficiently and succinctly.  It also avoids me
feeling like there should be an example of what happens when you do "sort
on an expression that is not included in the DISTINCT list".

Interestingly:

WITH vals (v,l) AS ( VALUES (1,'Z'),(3,'D'),(4,'R'),(3,'A'),(2,'T') )
SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM vals;

ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in
argument list
LINE 2: SELECT string_agg(DISTINCT l, ';' ORDER BY l, ';' DESC) FROM...

But both expressions in the argument list (el and semicolon) do appear in
the ORDER BY...

David J.


Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-25 Thread Bruce Momjian
On Wed, Oct 25, 2023 at 04:14:11PM -0700, David G. Johnston wrote:
> Yeah, we punt on the entire concept in the data type section:
> 
> "Managing these errors and how they propagate through calculations is the
> subject of an entire branch of mathematics and computer science and will not 
> be
> discussed here," ...
> 
> Also, I'm now led to believe that the relevant IEEE 754 floating point 
> addition
> is indeed commutative.  Given that, I am inclined to simply not add the order
> by clause at all to those four functions. (actually, you already got rid of 
> the
> avg()s but the sum()s are still present, so just those two).

Ah, yes, sum() removed.  Updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..3b49e63987 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20273,6 +20273,13 @@ SELECT NULLIF(value, '(none)') ...
aggregation.
   
 
+  
+   While all aggregates below accept an optional
+   ORDER BY clause (as outlined in ), the clause has only been added to
+   aggregates whose output is affected by ordering.
+  
+

 General-Purpose Aggregate Functions
 
@@ -20310,7 +20317,7 @@ SELECT NULLIF(value, '(none)') ...
 
  array_agg
 
-array_agg ( anynonarray )
+array_agg ( anynonarray ORDER BY input_sort_columns )
 anyarray


@@ -20321,7 +20328,7 @@ SELECT NULLIF(value, '(none)') ...
 
   

-array_agg ( anyarray )
+array_agg ( anyarray ORDER BY input_sort_columns )
 anyarray


@@ -20526,14 +20533,14 @@ SELECT NULLIF(value, '(none)') ...
 
  json_agg
 
-json_agg ( anyelement )
+json_agg ( anyelement ORDER BY input_sort_columns )
 json


 
  jsonb_agg
 
-jsonb_agg ( anyelement )
+jsonb_agg ( anyelement ORDER BY input_sort_columns )
 jsonb


@@ -20573,7 +20580,8 @@ SELECT NULLIF(value, '(none)') ...
 
 json_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 json


@@ -20582,7 +20590,8 @@ SELECT NULLIF(value, '(none)') ...
 
 jsonb_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 jsonb


@@ -20819,7 +20828,8 @@ SELECT NULLIF(value, '(none)') ...


 string_agg ( value
- bytea, delimiter bytea )
+ bytea, delimiter bytea
+ ORDER BY input_sort_columns )
 bytea


@@ -20877,7 +20887,7 @@ SELECT NULLIF(value, '(none)') ...
 
  xmlagg
 
-xmlagg ( xml )
+xmlagg ( xml ORDER BY input_sort_columns )
 xml


diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3ba844057f..b64733d8aa 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1647,7 +1647,26 @@ sqrt(2)
 are always just expressions and cannot be output-column names or numbers.
 For example:
 
-SELECT array_agg(a ORDER BY b DESC) FROM table;
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(v ORDER BY v DESC) FROM vals;
+  array_agg
+-
+ {4,3,3,2,1}
+
+Since jsonb only keeps the last matching key, ordering
+of its keys can be significant:
+
+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v) FROM vals;
+  jsonb_object_agg
+
+ {"key0": "1", "key1": "2"}
+
+WITH vals (k, v) AS ( VALUES ('key0','1'), ('key1','3'), ('key1','2') )
+SELECT jsonb_object_agg(k, v ORDER BY v) FROM vals;
+  jsonb_object_agg
+
+ {"key0": "1", "key1": "3"}
 

 
@@ -1673,6 +1692,14 @@ SELECT string_agg(a ORDER BY a, ',') FROM table;  -- incorrect
 expressions must match regular arguments of the aggregate; that is,
 you cannot sort on an expression that is not included in the
 DISTINCT list.
+For example:
+
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(DISTINCT v) FROM vals;
+ array_agg
+---
+ {1,2,3,4}
+

 



Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-25 Thread David G. Johnston
On Wed, Oct 25, 2023 at 8:36 AM Bruce Momjian  wrote:

> On Tue, Oct 24, 2023 at 06:45:48PM -0700, David G. Johnston wrote:
> > I'd prefer to keep pointing out that the ones documented are those whose
> > outputs will vary due to ordering.
>
> Okay, I re-added it in the attached patch, and tightened up the text.
>

Thanks


> I think you are right that it belongs in the syntax section;  we cover
> ordering extensively there.  We already have queries there, but not
> output, so I moved the relevant examples to there and replaced the
> example that had no output.
>

Thanks


> > The same goes for the special knowledge of floating point behavior for
> why
> > we've chosen to document avg/sum, something that typically doesn't care
> about
> > order, as having an optional order by.
>
> The floating example seems too obscure to mention in our function docs.
> I can put a sentence in the syntax docs, but is there value in
> explaining that to users?  How it that helpful?  Example?
>
>
Yeah, we punt on the entire concept in the data type section:

"Managing these errors and how they propagate through calculations is the
subject of an entire branch of mathematics and computer science and will
not be discussed here," ...

Also, I'm now led to believe that the relevant IEEE 754 floating point
addition is indeed commutative.  Given that, I am inclined to simply not
add the order by clause at all to those four functions. (actually, you
already got rid of the avg()s but the sum()s are still present, so just
those two).

David J.


Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-25 Thread Bruce Momjian
On Tue, Oct 24, 2023 at 06:45:48PM -0700, David G. Johnston wrote:
> I'd prefer to keep pointing out that the ones documented are those whose
> outputs will vary due to ordering.

Okay, I re-added it in the attached patch, and tightened up the text.

> I've been sympathetic to the user comments that we don't have enough 
> examples. 

Good point.

> Just using array_agg for that purpose, showing both DISTINCT and ORDER BY 
> seems
> like a fair compromise (removes two from my original proposal).  The examples
> in the section we tell them to go see aren't of that great quality.  If you
> strongly dislike having the function table contain the examples we should at
> least improve the page we are sending them to.  (As an aside to this, I've
> personally always found the syntax block with the 5 syntaxes shown there to be
> intimidating/hard-to-read).

I think you are right that it belongs in the syntax section;  we cover
ordering extensively there.  We already have queries there, but not
output, so I moved the relevant examples to there and replaced the
example that had no output.

> I'd at least suggest you reconsider the commentary and examples surrounding
> jsonb_object_agg.

I moved that as well, and tightened the example.

> The same goes for the special knowledge of floating point behavior for why
> we've chosen to document avg/sum, something that typically doesn't care about
> order, as having an optional order by.

The floating example seems too obscure to mention in our function docs. 
I can put a sentence in the syntax docs, but is there value in
explaining that to users?  How it that helpful?  Example?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..d60c65f8fc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20273,6 +20273,13 @@ SELECT NULLIF(value, '(none)') ...
aggregation.
   
 
+  
+   While all aggregates below accept an optional
+   ORDER BY clause (as outlined in ), the clause has only been added to
+   aggregates whose output is affected by ordering.
+  
+

 General-Purpose Aggregate Functions
 
@@ -20310,7 +20317,7 @@ SELECT NULLIF(value, '(none)') ...
 
  array_agg
 
-array_agg ( anynonarray )
+array_agg ( anynonarray ORDER BY input_sort_columns )
 anyarray


@@ -20321,7 +20328,7 @@ SELECT NULLIF(value, '(none)') ...
 
   

-array_agg ( anyarray )
+array_agg ( anyarray ORDER BY input_sort_columns )
 anyarray


@@ -20526,14 +20533,14 @@ SELECT NULLIF(value, '(none)') ...
 
  json_agg
 
-json_agg ( anyelement )
+json_agg ( anyelement ORDER BY input_sort_columns )
 json


 
  jsonb_agg
 
-jsonb_agg ( anyelement )
+jsonb_agg ( anyelement ORDER BY input_sort_columns )
 jsonb


@@ -20573,7 +20580,8 @@ SELECT NULLIF(value, '(none)') ...
 
 json_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 json


@@ -20582,7 +20590,8 @@ SELECT NULLIF(value, '(none)') ...
 
 jsonb_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_columns )
 jsonb


@@ -20819,7 +20828,8 @@ SELECT NULLIF(value, '(none)') ...


 string_agg ( value
- bytea, delimiter bytea )
+ bytea, delimiter bytea
+ ORDER BY input_sort_columns )
 bytea


@@ -20851,11 +20861,11 @@ SELECT NULLIF(value, '(none)') ...
 numeric


-sum ( real )
+sum ( real  ORDER BY input_sort_columns  )
 real


-sum ( double precision )
+sum ( double precision  ORDER BY input_sort_columns  )
 double precision


@@ -20877,7 +20887,7 @@ SELECT NULLIF(value, '(none)') ...
 
  xmlagg
 
-xmlagg ( xml )
+xmlagg ( xml ORDER BY input_sort_columns )
 xml


diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 3ba844057f..b64733d8aa 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1647,7 +1647,26 @@ sqrt(2)
 are always just expressions and cannot be output-column names or numbers.
 For example:
 
-SELECT array_agg(a ORDER BY b DESC) FROM table;
+WITH vals (v) AS ( VALUES (1),(3),(4),(3),(2) )
+SELECT array_agg(v ORDER BY v DESC) FROM vals;
+  array_agg
+-
+ {4,3,3,2,1}
+
+Since jsonb only keeps the last matching key, ordering
+of its keys can be significant:
+
+WITH vals (k, v) AS ( VALUES ('key0','1'), 

Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-24 Thread David G. Johnston
On Tue, Oct 24, 2023 at 1:39 PM Bruce Momjian  wrote:

> On Tue, Dec 13, 2022 at 07:38:15PM -0700, David G. Johnston wrote:
> > All,
> >
> > The recent discussion surrounding aggregates and ORDER BY moved me to
> look over
> > our existing documentation, especially now that we've reworked the
> function
> > tables, to see what improvements can be had by simply documenting those
> > functions where ORDER BY may change the user-visible output.  I skipped
> range
> > aggregates for the moment but handled the others on the aggregates page
> (not
> > window functions).  This includes the float types for sum and avg.
> >
> > I added a note just before the table linking back to the syntax chapter
> and
> > describing the newly added rules and syntax choice in the table.
> >
> > The nuances of floating point math suggest to me that specifying order
> by for
> > those is in some kind of gray area and so I've marked it optional...any
> > suggestions for wording (or an xref) to explain those nuances or should
> it just
> > be shown non-optional like the others?  Or not shown at all?
> >
> > The novelty of my examples is up for bikeshedding.  I didn't want
> anything too
> > long so a subquery didn't make sense, and I was trying to avoid
> duplication as
> > well as multiple lines - hence creating a CTE that can be copied onto
> all of
> > the example queries to produce the noted result.
> >
> > I added a DISTINCT example to array_agg because it is the first
> aggregate on
> > the page and so hopefully will be seen during a cursory reading.  Plus,
> > array_agg is the go-to function for doing this kind of experimentation.
>
> I like this idea, though the examples seemed too detailed so I skipped
> them.  Here is the trimmed-down patch I would like to apply.
>
>
I'd prefer to keep pointing out that the ones documented are those whose
outputs will vary due to ordering.

I've been sympathetic to the user comments that we don't have enough
examples.  Just using array_agg for that purpose, showing both DISTINCT and
ORDER BY seems like a fair compromise (removes two from my original
proposal).  The examples in the section we tell them to go see aren't of
that great quality.  If you strongly dislike having the function table
contain the examples we should at least improve the page we are sending
them to.  (As an aside to this, I've personally always found the syntax
block with the 5 syntaxes shown there to be intimidating/hard-to-read).

I'd at least suggest you reconsider the commentary and examples surrounding
jsonb_object_agg.

The same goes for the special knowledge of floating point behavior for why
we've chosen to document avg/sum, something that typically doesn't care
about order, as having an optional order by.

David J.


Re: Document aggregate functions better w.r.t. ORDER BY

2023-10-24 Thread Bruce Momjian
On Tue, Dec 13, 2022 at 07:38:15PM -0700, David G. Johnston wrote:
> All,
> 
> The recent discussion surrounding aggregates and ORDER BY moved me to look 
> over
> our existing documentation, especially now that we've reworked the function
> tables, to see what improvements can be had by simply documenting those
> functions where ORDER BY may change the user-visible output.  I skipped range
> aggregates for the moment but handled the others on the aggregates page (not
> window functions).  This includes the float types for sum and avg.
> 
> I added a note just before the table linking back to the syntax chapter and
> describing the newly added rules and syntax choice in the table.
> 
> The nuances of floating point math suggest to me that specifying order by for
> those is in some kind of gray area and so I've marked it optional...any
> suggestions for wording (or an xref) to explain those nuances or should it 
> just
> be shown non-optional like the others?  Or not shown at all?
> 
> The novelty of my examples is up for bikeshedding.  I didn't want anything too
> long so a subquery didn't make sense, and I was trying to avoid duplication as
> well as multiple lines - hence creating a CTE that can be copied onto all of
> the example queries to produce the noted result.
> 
> I added a DISTINCT example to array_agg because it is the first aggregate on
> the page and so hopefully will be seen during a cursory reading.  Plus,
> array_agg is the go-to function for doing this kind of experimentation.

I like this idea, though the examples seemed too detailed so I skipped
them.  Here is the trimmed-down patch I would like to apply.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..35d8924c6b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20273,6 +20273,18 @@ SELECT NULLIF(value, '(none)') ...
aggregation.
   
 
+  
+   
+The aggregates in this section all accept an optional ORDER
+BY clause as described in the general syntax rules
+for aggregates (see ).
+input_sort_expr specifies the column names to sort
+by before being supplied to the aggreagate function.  For example,
+array_agg(... ORDER BY ...) produces an array with
+ordered values.
+   
+  
+

 General-Purpose Aggregate Functions
 
@@ -20310,7 +20322,7 @@ SELECT NULLIF(value, '(none)') ...
 
  array_agg
 
-array_agg ( anynonarray )
+array_agg ( anynonarray ORDER BY input_sort_expr)
 anyarray


@@ -20321,7 +20333,7 @@ SELECT NULLIF(value, '(none)') ...
 
   

-array_agg ( anyarray )
+array_agg ( anyarray ORDER BY input_sort_expr )
 anyarray


@@ -20356,11 +20368,11 @@ SELECT NULLIF(value, '(none)') ...
 numeric


-avg ( real )
+avg ( real  ORDER BY input_sort_expr  )
 double precision


-avg ( double precision )
+avg ( double precision  ORDER BY input_sort_expr  )
 double precision


@@ -20526,14 +20538,14 @@ SELECT NULLIF(value, '(none)') ...
 
  json_agg
 
-json_agg ( anyelement )
+json_agg ( anyelement ORDER BY input_sort_expr )
 json


 
  jsonb_agg
 
-jsonb_agg ( anyelement )
+jsonb_agg ( anyelement ORDER BY input_sort_expr )
 jsonb


@@ -20573,7 +20585,8 @@ SELECT NULLIF(value, '(none)') ...
 
 json_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_expr )
 json


@@ -20582,7 +20595,8 @@ SELECT NULLIF(value, '(none)') ...
 
 jsonb_object_agg ( key
  "any", value
- "any" )
+ "any"
+ ORDER BY input_sort_expr )
 jsonb


@@ -20819,7 +20833,8 @@ SELECT NULLIF(value, '(none)') ...


 string_agg ( value
- bytea, delimiter bytea )
+ bytea, delimiter bytea
+ ORDER BY input_sort_expr )
 bytea


@@ -20851,11 +20866,11 @@ SELECT NULLIF(value, '(none)') ...
 numeric


-sum ( real )
+sum ( real  ORDER BY input_sort_expr  )
 real


-sum ( double precision )
+sum ( double precision  ORDER BY input_sort_expr  )
 double precision


@@ -20877,7 +20892,7 @@ SELECT NULLIF(value, '(none)') ...
 
  xmlagg
 
-xmlagg ( xml )
+xmlagg ( xml ORDER BY input_sort_expr )
 xml