Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Tom Lane
I wrote:
> Moving on to the exact color of the bikeshed: it seems like the right
> way to present this to users of CREATE AGGREGATE is in terms of "does
> the final function modify the transition state?".  So maybe the values
> could be spelled
> SMODIFY = READ_ONLY   ffunc never touches state, ok as window agg
> SMODIFY = SHARABLEffunc does some one-time change like sorting,
>   so state merging is OK but not window agg
> SMODIFY = READ_WRITE  ffunc trashes state, can't do merging either
> I'm not set on these names by any means; anyone have a better idea?

After contemplating the existing CREATE AGGREGATE parameters, particularly
[M]FINALFUNC_EXTRA, it seemed to me that better nomenclature is

[M]FINALFUNC_MODIFY = READ_ONLY
[M]FINALFUNC_MODIFY = STOP_UPDATES
[M]FINALFUNC_MODIFY = READ_WRITE

where "stop updates" is intended to imply "you can't call the transfn
anymore after the first finalfn call".  I'm still not that much in
love with that terminology, but don't have a better idea now.

Attached is a WIP patch; I believe it's code-complete but the SGML
docs are lacking.  Barring objections I'll finish up the docs and
push this.

regards, tom lane

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9af77c1..cfec246 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***
*** 487,492 
--- 487,512 
True to pass extra dummy arguments to aggmfinalfn
   
   
+   aggfinalmodify
+   char
+   
+   Whether aggfinalfn modifies the
+transition state value:
+r if it is read-only,
+s if the aggtransfn
+cannot be applied after the aggfinalfn, or
+w if it writes on the value
+   
+  
+  
+   aggmfinalmodify
+   char
+   
+   Like aggfinalmodify, but for
+the aggmfinalfn
+   
+  
+  
aggsortop
oid
pg_operator.oid
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index a920450..ca3fd81 100644
*** a/src/backend/catalog/pg_aggregate.c
--- b/src/backend/catalog/pg_aggregate.c
***
*** 19,24 
--- 19,25 
  #include "catalog/dependency.h"
  #include "catalog/indexing.h"
  #include "catalog/pg_aggregate.h"
+ #include "catalog/pg_aggregate_fn.h"
  #include "catalog/pg_language.h"
  #include "catalog/pg_operator.h"
  #include "catalog/pg_proc.h"
*** AggregateCreate(const char *aggName,
*** 65,70 
--- 66,73 
  List *aggmfinalfnName,
  bool finalfnExtraArgs,
  bool mfinalfnExtraArgs,
+ char finalfnModify,
+ char mfinalfnModify,
  List *aggsortopName,
  Oid aggTransType,
  int32 aggTransSpace,
*** AggregateCreate(const char *aggName,
*** 656,661 
--- 659,666 
  	values[Anum_pg_aggregate_aggmfinalfn - 1] = ObjectIdGetDatum(mfinalfn);
  	values[Anum_pg_aggregate_aggfinalextra - 1] = BoolGetDatum(finalfnExtraArgs);
  	values[Anum_pg_aggregate_aggmfinalextra - 1] = BoolGetDatum(mfinalfnExtraArgs);
+ 	values[Anum_pg_aggregate_aggfinalmodify - 1] = CharGetDatum(finalfnModify);
+ 	values[Anum_pg_aggregate_aggmfinalmodify - 1] = CharGetDatum(mfinalfnModify);
  	values[Anum_pg_aggregate_aggsortop - 1] = ObjectIdGetDatum(sortop);
  	values[Anum_pg_aggregate_aggtranstype - 1] = ObjectIdGetDatum(aggTransType);
  	values[Anum_pg_aggregate_aggtransspace - 1] = Int32GetDatum(aggTransSpace);
diff --git a/src/backend/commands/aggregatecmds.c b/src/backend/commands/aggregatecmds.c
index a63539a..07ef4c5 100644
*** a/src/backend/commands/aggregatecmds.c
--- b/src/backend/commands/aggregatecmds.c
***
*** 26,31 
--- 26,32 
  #include "catalog/dependency.h"
  #include "catalog/indexing.h"
  #include "catalog/pg_aggregate.h"
+ #include "catalog/pg_aggregate_fn.h"
  #include "catalog/pg_proc.h"
  #include "catalog/pg_type.h"
  #include "commands/alter.h"
***
*** 39,44 
--- 40,48 
  #include "utils/syscache.h"
  
  
+ static char extractModify(DefElem *defel);
+ 
+ 
  /*
   *	DefineAggregate
   *
*** DefineAggregate(ParseState *pstate, List
*** 67,72 
--- 71,78 
  	List	   *mfinalfuncName = NIL;
  	bool		finalfuncExtraArgs = false;
  	bool		mfinalfuncExtraArgs = false;
+ 	char		finalfuncModify = 0;
+ 	char		mfinalfuncModify = 0;
  	List	   *sortoperatorName = NIL;
  	TypeName   *baseType = NULL;
  	TypeName   *transType = NULL;
*** DefineAggregate(ParseState *pstate, List
*** 143,148 
--- 149,158 
  			finalfuncExtraArgs = defGetBoolean(defel);
  		else if (pg_strcasecmp(defel->defname, "mfinalfunc_extra") == 0)
  			mfinalfuncExtraArgs = defGetBoolean(defel);
+ 		else if (pg_strcasecmp(defel->defname, "finalfunc_modify") == 0)
+ 			finalfuncModify = extractModify(defel);
+ 		else if (pg_strcasecmp(defel->defname, "mfinalfunc_modify") == 0)
+ 			mfinalfuncModify = extractModify(defel);
  		else if 

Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Tom Lane
Heikki Linnakangas  writes:
> We've been doing that window agg thing for a long time, so I think 
> "works as window agg" should be the default for regular aggregates. For 
> ordered-set aggregates, "no merging, no more transfn() calls after 
> finalfn()" seems safest.

> It's a bit of a shame to have different defaults for regular and 
> ordered-set aggregates. But that is what we implicitly assume today, 
> anyway, and it's not too that hard to document.

It's kind of a pain in the rear, really, especially for purposes like
pg_dump, which will have to know the rule to decide whether it can
omit an SMODIFY clause.  (Which it should do if the value is default,
so as not to make back-porting of dumps harder than necessary.)

On reflection though I see little choice.  Right now it's impossible
to use an OSA as a window agg at all because the parser doesn't support
that combination of options, and nodeWindowAgg is lacking necessary
support too.  If we ever get around to relaxing that, it would really
be necessary that the default for user-created OSAs be "not safe as
window agg", because almost certainly they wouldn't be.  So I think
there's no option but to have different default safety levels for OSA
and normal aggs.  As long as we're forced into that anyway, it does
make sense to default to "not safe to merge" as well.  We don't know
for sure whether there are any user-created OSAs in the wild, but
if there are, they likely borrowed code from orderedsetaggs.c.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Tom Lane
Heikki Linnakangas  writes:
> On 10/13/2017 02:08 AM, Tom Lane wrote:
>> I started to look into fixing orderedsetaggs.c so that we could revert
>> 52328727b, and soon found a rather nasty problem.  Although the plain
>> OSAs seem amenable to supporting multiple finalfn calls on the same
>> transition state, the "hypothetical set" functions are not at all.
>> What they do is to accumulate all the regular aggregate input into
>> a tuplesort, then add the direct arguments as an additional "hypothetical"
>> row, and finally sort the result.  There's no realistic way of undoing
>> the addition of the hypothetical row, so these finalfns simply can't
>> share tuplesort state.

> The current implementation, with the extra flag column, is quite naive. 
> We could add some support to tuplesort to find a row with given 
> attributes, and call that instead of actually adding the hypothetical 
> row to the tuplesort and iterating to re-find it after performsort. For 
> a result set that fits in memory, you could even do a binary search 
> instead of linearly iterating through the result set.

I'd had some idle thoughts about using a heap to support window
aggregation of an OSA, rather than ever performing a full sort.
It'd probably work well as long as the window doesn't get wide enough
that the heap stops fitting in memory.  But in any case, the immediate
point is that we shouldn't just assume that every aggregate
implementation is capable of supporting state merging, and while we're
at it, it'd be better to have a non-syntax-based distinction between
aggs that can support use as window functions and aggs that can't.

Moving on to the exact color of the bikeshed: it seems like the right
way to present this to users of CREATE AGGREGATE is in terms of "does
the final function modify the transition state?".  So maybe the values
could be spelled

SMODIFY = READ_ONLY   ffunc never touches state, ok as window agg
SMODIFY = SHARABLEffunc does some one-time change like sorting,
  so state merging is OK but not window agg
SMODIFY = READ_WRITE  ffunc trashes state, can't do merging either

I'm not set on these names by any means; anyone have a better idea?
Also, I'm not sure whether we'd need a separate MMODIFY flag for
the moving-aggregate sub-implementation, but it seems possible.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Heikki Linnakangas

On 10/13/2017 02:41 AM, Tom Lane wrote:

David Rowley  writes:

On 13 October 2017 at 12:08, Tom Lane  wrote:

Therefore, I think we need to bite the bullet and provide an aggregate
property (CREATE AGGREGATE argument / pg_aggregate column) that tells
whether the aggregate supports transition state merging.  Likely this
should have been in the state-merging patch to begin with, but better
late than never.



Are you considering that this is an option only for ordered-set
aggregates or for all?


All.


If the user defines their normal aggregate as not safe for merging,
then surely it'll not be suitable to be used as a window function
either, since the final function will also be called there multiple
times per state.


Yeah, we would probably also want to check the flag in nodeWindowAgg.
Not sure exactly how that should play out --- maybe we end up with
a tri-valued property "works as normal agg without merging, works
as normal agg with merging, works as window agg".  But this would
arguably be an improvement over the current situation.  Right now
I'm sure there are user-written aggs out there that will just crash
if used as a window agg, and the authors don't have much choice because
the performance costs of not modifying the transition state in the
finalfn are higher than they're willing to bear.  At least with a
flag they could ensure that the case will fail cleanly.


Sounds right to me. I'm not so sure there really are aggregates out 
there that would crash today if used as a window aggregate, but it sure 
would be nice to give some control on that.


We've been doing that window agg thing for a long time, so I think 
"works as window agg" should be the default for regular aggregates. For 
ordered-set aggregates, "no merging, no more transfn() calls after 
finalfn()" seems safest.


It's a bit of a shame to have different defaults for regular and 
ordered-set aggregates. But that is what we implicitly assume today, 
anyway, and it's not too that hard to document.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-13 Thread Heikki Linnakangas

On 10/13/2017 02:08 AM, Tom Lane wrote:

I started to look into fixing orderedsetaggs.c so that we could revert
52328727b, and soon found a rather nasty problem.  Although the plain
OSAs seem amenable to supporting multiple finalfn calls on the same
transition state, the "hypothetical set" functions are not at all.
What they do is to accumulate all the regular aggregate input into
a tuplesort, then add the direct arguments as an additional "hypothetical"
row, and finally sort the result.  There's no realistic way of undoing
the addition of the hypothetical row, so these finalfns simply can't
share tuplesort state.

You could imagine accumulating the regular input into a tuplestore
and then copying it into a tuplesort in each finalfn call.  But that
seems pretty icky, and I'm not sure it'd really be any more performant
than just keeping separate tuplesort objects for each aggregate.


The current implementation, with the extra flag column, is quite naive. 
We could add some support to tuplesort to find a row with given 
attributes, and call that instead of actually adding the hypothetical 
row to the tuplesort and iterating to re-find it after performsort. For 
a result set that fits in memory, you could even do a binary search 
instead of linearly iterating through the result set.


I'm not suggesting that we do that right now, though. And even if we 
did, there might be other aggregates out there that are not safe to 
merge, so we should  still add the option to CREATE AGGREGATE. But 
that'd be nice little project, if someone wanted to make those functions 
faster.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-12 Thread Tom Lane
David Rowley  writes:
> On 13 October 2017 at 12:41, Tom Lane  wrote:
>> Yeah, we would probably also want to check the flag in nodeWindowAgg.
>> Not sure exactly how that should play out --- maybe we end up with
>> a tri-valued property "works as normal agg without merging, works
>> as normal agg with merging, works as window agg".

> hmm, maybe I'm lacking imagination here, but surely the final function
> is either destructive or it's not? I can't understand what the
> difference between nodeAgg.c calling the finalfn multiple times on the
> same state and nodeWindowAgg.c doing it. Maybe there's something I'm
> not accounting for that you are?

nodeWindowAgg is doing something more: not only is it calling the finalfn
repeatedly, but it's continuing to mutate the transition state in between.
The ordered-set aggs provide a counterexample to considering that to be
equivalent to state merging.  The OSAs can cope with state merging as
long as they have a flag to make sure only the first finalfn does
tuplesort_performsort ... but that's not good enough to make them workable
as window aggs.  Once we sort, we can't absorb more rows into the
tuplesort object.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-12 Thread David Rowley
On 13 October 2017 at 12:41, Tom Lane  wrote:
> David Rowley  writes:
>> If the user defines their normal aggregate as not safe for merging,
>> then surely it'll not be suitable to be used as a window function
>> either, since the final function will also be called there multiple
>> times per state.
>
> Yeah, we would probably also want to check the flag in nodeWindowAgg.
> Not sure exactly how that should play out --- maybe we end up with
> a tri-valued property "works as normal agg without merging, works
> as normal agg with merging, works as window agg".  But this would
> arguably be an improvement over the current situation.  Right now
> I'm sure there are user-written aggs out there that will just crash
> if used as a window agg, and the authors don't have much choice because
> the performance costs of not modifying the transition state in the
> finalfn are higher than they're willing to bear.  At least with a
> flag they could ensure that the case will fail cleanly.

hmm, maybe I'm lacking imagination here, but surely the final function
is either destructive or it's not? I can't understand what the
difference between nodeAgg.c calling the finalfn multiple times on the
same state and nodeWindowAgg.c doing it. Maybe there's something I'm
not accounting for that you are?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-12 Thread Tom Lane
David Rowley  writes:
> On 13 October 2017 at 12:08, Tom Lane  wrote:
>> Therefore, I think we need to bite the bullet and provide an aggregate
>> property (CREATE AGGREGATE argument / pg_aggregate column) that tells
>> whether the aggregate supports transition state merging.  Likely this
>> should have been in the state-merging patch to begin with, but better
>> late than never.

> Are you considering that this is an option only for ordered-set
> aggregates or for all?

All.

> If the user defines their normal aggregate as not safe for merging,
> then surely it'll not be suitable to be used as a window function
> either, since the final function will also be called there multiple
> times per state.

Yeah, we would probably also want to check the flag in nodeWindowAgg.
Not sure exactly how that should play out --- maybe we end up with
a tri-valued property "works as normal agg without merging, works
as normal agg with merging, works as window agg".  But this would
arguably be an improvement over the current situation.  Right now
I'm sure there are user-written aggs out there that will just crash
if used as a window agg, and the authors don't have much choice because
the performance costs of not modifying the transition state in the
finalfn are higher than they're willing to bear.  At least with a
flag they could ensure that the case will fail cleanly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions

2017-10-12 Thread David Rowley
On 13 October 2017 at 12:08, Tom Lane  wrote:
> Therefore, I think we need to bite the bullet and provide an aggregate
> property (CREATE AGGREGATE argument / pg_aggregate column) that tells
> whether the aggregate supports transition state merging.  Likely this
> should have been in the state-merging patch to begin with, but better
> late than never.
>
> The main thing that probably has to be hashed out before we can write
> that patch is what the default should be for user-created aggregates.
> I am inclined to think that we should err on the side of safety and
> default it to false (no merge support).  You could argue that the
> lack of complaints since 9.6 came out is sufficient evidence that
> defaulting to true would be all right, but I'm not sure.

Are you considering that this is an option only for ordered-set
aggregates or for all?

If the user defines their normal aggregate as not safe for merging,
then surely it'll not be suitable to be used as a window function
either, since the final function will also be called there multiple
times per state.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers