Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-04 Thread Vik Fearing
On 07/02/2014 10:15 PM, Andrew Gierth wrote:
 (Had one request so far for a mode() variant that returns the unique
 modal value if one exists, otherwise null; so the current set of
 ordered-set aggs by no means exhausts the possible applications.)

I resemble that remark. :)

But seriously, am I the only one who doesn't want some random value when
there is no dominant one?  And the fact that I can't create my own
without C code is a real bummer.
-- 
Vik


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-03 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom If we're going to do that, I think it needs to be in 9.4.  Are
 Tom you going to work up a patch?

How's this?  (applies and passes regression on 9.4 and master)

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index 09ff035..0388735 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -2199,44 +2199,56 @@ AggGetAggref(FunctionCallInfo fcinfo)
 }
 
 /*
- * AggGetPerTupleEContext - fetch per-input-tuple ExprContext
+ * AggGetTempMemoryContext - fetch short-term memory context
  *
- * This is useful in agg final functions; the econtext returned is the
- * same per-tuple context that the transfn was called in (which can
- * safely get reset during the final function).
+ * This is useful in agg final functions; the context returned is one that the
+ * final function can safely reset as desired. This isn't useful for transition
+ * functions, since the context returned MAY (we don't promise) be the same as
+ * the context those are called in.
  *
  * As above, this is currently not useful for aggs called as window functions.
  */
-ExprContext *
-AggGetPerTupleEContext(FunctionCallInfo fcinfo)
+MemoryContext
+AggGetTempMemoryContext(FunctionCallInfo fcinfo)
 {
 	if (fcinfo-context  IsA(fcinfo-context, AggState))
 	{
 		AggState   *aggstate = (AggState *) fcinfo-context;
 
-		return aggstate-tmpcontext;
+		return aggstate-tmpcontext-ecxt_per_tuple_memory;
 	}
 	return NULL;
 }
 
 /*
- * AggGetPerAggEContext - fetch per-output-tuple ExprContext
+ * AggRegisterCallback - register a cleanup callback linked to aggcontext
  *
  * This is useful for aggs to register shutdown callbacks, which will ensure
- * that non-memory resources are freed.
+ * that non-memory resources are freed.  These callbacks will occur just before
+ * the associated aggcontext (as returned by AggCheckCallContext) is reset,
+ * either between groups or as a result of rescanning the query.  They will NOT
+ * be called on error paths.  The primary intent is to allow for freeing of
+ * tuplestores or tuplesorts maintained in aggcontext, or pins held by slots
+ * created by the agg functions.  (They will not be called until after the
+ * result of the finalfn is no longer needed, so it's safe for the finalfn to
+ * return data that will be freed by the callback.)
  *
  * As above, this is currently not useful for aggs called as window functions.
  */
-ExprContext *
-AggGetPerAggEContext(FunctionCallInfo fcinfo)
+void
+AggRegisterCallback(FunctionCallInfo fcinfo,
+	ExprContextCallbackFunction func,
+	Datum arg)
 {
 	if (fcinfo-context  IsA(fcinfo-context, AggState))
 	{
 		AggState   *aggstate = (AggState *) fcinfo-context;
 
-		return aggstate-ss.ps.ps_ExprContext;
+		RegisterExprContextCallback(aggstate-ss.ps.ps_ExprContext, func, arg);
+
+		return;
 	}
-	return NULL;
+	elog(ERROR,Aggregate function cannot register a callback in this context);
 }
 
 
diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index efb0411..d116a63 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -49,8 +49,6 @@ typedef struct OSAPerQueryState
 	MemoryContext qcontext;
 	/* Memory context containing per-group data: */
 	MemoryContext gcontext;
-	/* Agg plan node's output econtext: */
-	ExprContext *peraggecontext;
 
 	/* These fields are used only when accumulating tuples: */
 
@@ -117,7 +115,6 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 		Aggref	   *aggref;
 		MemoryContext qcontext;
 		MemoryContext gcontext;
-		ExprContext *peraggecontext;
 		List	   *sortlist;
 		int			numSortCols;
 
@@ -133,10 +130,6 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 			elog(ERROR, ordered-set aggregate called in non-aggregate context);
 		if (!AGGKIND_IS_ORDERED_SET(aggref-aggkind))
 			elog(ERROR, ordered-set aggregate support function called for non-ordered-set aggregate);
-		/* Also get output exprcontext so we can register shutdown callback */
-		peraggecontext = AggGetPerAggEContext(fcinfo);
-		if (!peraggecontext)
-			elog(ERROR, ordered-set aggregate called in non-aggregate context);
 
 		/*
 		 * Prepare per-query structures in the fn_mcxt, which we assume is the
@@ -150,7 +143,6 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 		qstate-aggref = aggref;
 		qstate-qcontext = qcontext;
 		qstate-gcontext = gcontext;
-		qstate-peraggecontext = peraggecontext;
 
 		/* Extract the sort information */
 		sortlist = aggref-aggorder;
@@ -291,9 +283,9 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 	osastate-number_of_rows = 0;
 
 	/* Now register a shutdown callback to clean things up */
-	RegisterExprContextCallback(qstate-peraggecontext,
-ordered_set_shutdown,
-PointerGetDatum(osastate));
+	AggRegisterCallback(fcinfo,
+		ordered_set_shutdown,
+		

Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-03 Thread Andrew Gierth
Here are two more, cumulative with the previous patch and each other:

The first removes the assumption in ordered_set_startup that the
aggcontext can be cached in fn_extra, and puts it in the transvalue
instead.

The second exposes the OSA* structures in a header file, so that
user-defined ordered-set aggs can use ordered_set_transition[_multi]
directly rather than having to cargo-cult it into their own code.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index d116a63..92fa9f3 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -47,8 +47,6 @@ typedef struct OSAPerQueryState
 	Aggref	   *aggref;
 	/* Memory context containing this struct and other per-query data: */
 	MemoryContext qcontext;
-	/* Memory context containing per-group data: */
-	MemoryContext gcontext;
 
 	/* These fields are used only when accumulating tuples: */
 
@@ -86,6 +84,8 @@ typedef struct OSAPerGroupState
 {
 	/* Link to the per-query state for this aggregate: */
 	OSAPerQueryState *qstate;
+	/* MemoryContext for per-group data */
+	MemoryContext gcontext;
 	/* Sort object we're accumulating data in: */
 	Tuplesortstate *sortstate;
 	/* Number of normal rows inserted into sortstate: */
@@ -104,6 +104,15 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 	OSAPerGroupState *osastate;
 	OSAPerQueryState *qstate;
 	MemoryContext oldcontext;
+	MemoryContext gcontext;
+
+	/*
+	 * Check we're called as aggregate (and not a window function), and
+	 * get the Agg node's group-lifespan context (which might not be the
+	 * same throughout the query, but will be the same for each transval)
+	 */
+	if (AggCheckCallContext(fcinfo, gcontext) != AGG_CONTEXT_AGGREGATE)
+		elog(ERROR, ordered-set aggregate called in non-aggregate context);
 
 	/*
 	 * We keep a link to the per-query state in fn_extra; if it's not there,
@@ -114,16 +123,9 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 	{
 		Aggref	   *aggref;
 		MemoryContext qcontext;
-		MemoryContext gcontext;
 		List	   *sortlist;
 		int			numSortCols;
 
-		/*
-		 * Check we're called as aggregate (and not a window function), and
-		 * get the Agg node's group-lifespan context
-		 */
-		if (AggCheckCallContext(fcinfo, gcontext) != AGG_CONTEXT_AGGREGATE)
-			elog(ERROR, ordered-set aggregate called in non-aggregate context);
 		/* Need the Aggref as well */
 		aggref = AggGetAggref(fcinfo);
 		if (!aggref)
@@ -142,7 +144,6 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 		qstate = (OSAPerQueryState *) palloc0(sizeof(OSAPerQueryState));
 		qstate-aggref = aggref;
 		qstate-qcontext = qcontext;
-		qstate-gcontext = gcontext;
 
 		/* Extract the sort information */
 		sortlist = aggref-aggorder;
@@ -259,10 +260,11 @@ ordered_set_startup(FunctionCallInfo fcinfo, bool use_tuples)
 	}
 
 	/* Now build the stuff we need in group-lifespan context */
-	oldcontext = MemoryContextSwitchTo(qstate-gcontext);
+	oldcontext = MemoryContextSwitchTo(gcontext);
 
 	osastate = (OSAPerGroupState *) palloc(sizeof(OSAPerGroupState));
 	osastate-qstate = qstate;
+	osastate-gcontext = gcontext;
 
 	/* Initialize tuplesort object */
 	if (use_tuples)
diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index 92fa9f3..bbad0e5 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -16,6 +16,8 @@
 
 #include math.h
 
+#include utils/orderedset.h
+
 #include catalog/pg_aggregate.h
 #include catalog/pg_operator.h
 #include catalog/pg_type.h
@@ -30,71 +32,8 @@
 #include utils/tuplesort.h
 
 
-/*
- * Generic support for ordered-set aggregates
- *
- * The state for an ordered-set aggregate is divided into a per-group struct
- * (which is the internal-type transition state datum returned to nodeAgg.c)
- * and a per-query struct, which contains data and sub-objects that we can
- * create just once per query because they will not change across groups.
- * The per-query struct and subsidiary data live in the executor's per-query
- * memory context, and go away implicitly at ExecutorEnd().
- */
-
-typedef struct OSAPerQueryState
-{
-	/* Aggref for this aggregate: */
-	Aggref	   *aggref;
-	/* Memory context containing this struct and other per-query data: */
-	MemoryContext qcontext;
-
-	/* These fields are used only when accumulating tuples: */
-
-	/* Tuple descriptor for tuples inserted into sortstate: */
-	TupleDesc	tupdesc;
-	/* Tuple slot we can use for inserting/extracting tuples: */
-	TupleTableSlot *tupslot;
-	/* Per-sort-column sorting information */
-	int			numSortCols;
-	AttrNumber *sortColIdx;
-	Oid		   *sortOperators;
-	Oid		   *eqOperators;
-	Oid		   *sortCollations;
-	bool	   *sortNullsFirsts;
-	/* Equality operator call info, created only if needed: */
-	FmgrInfo   *equalfns;
-
-	/* These fields are used only when accumulating datums: */
-
-	/* Info about datatype 

Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-03 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom If we're going to do that, I think it needs to be in 9.4.  Are
  Tom you going to work up a patch?

 How's this?  (applies and passes regression on 9.4 and master)

Pushed with minor cosmetic adjustments.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-03 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  How's this?  (applies and passes regression on 9.4 and master)

 Tom Pushed with minor cosmetic adjustments.

Thanks!

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-03 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Here are two more, cumulative with the previous patch and each other:
 The first removes the assumption in ordered_set_startup that the
 aggcontext can be cached in fn_extra, and puts it in the transvalue
 instead.

OK, done.  (ATM it seems like we wouldn't need gcontext in the transvalue
either, but I left it there in case we want it later.)

 The second exposes the OSA* structures in a header file, so that
 user-defined ordered-set aggs can use ordered_set_transition[_multi]
 directly rather than having to cargo-cult it into their own code.

I'm not very happy about this one; we intentionally did not expose
these structs, so that we could freely make fixes like, for example,
your immediately preceding patch.

I think it'd be worth considering whether there's a way to allow
third-party ordered-set aggs to use the infrastructure in orderedsetaggs.c
while still treating these structs as opaque.  In any case we'd need a
more carefully thought-through API than just decreeing that some private
structs are now public.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-02 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 1. Assumptions about the relationship between aggcontext and fn_extra

 Tom's version of the ordered-set aggregate code makes the assumption
 that it is safe to store the aggcontext returned by AggCheckCallContext
 in a structure hung off flinfo-fn_extra. This implicitly assumes that
 the function will be called in only one aggcontext, or at least only one
 per flinfo.

 Doing rollup via GroupAggregate by maintaining multiple transition
 values at a time (one per grouping set) means that the transfn is being
 called interleaved for transition values in different contexts. So the
 question becomes: is it wrong for the transition function to assume that
 aggcontext can be cached this way, or is it necessary for the executor
 to use a separate flinfo for each concurrent grouping set?

Hm.  We could probably move gcontext into the per-group data.  I'm not
sure though if there are any other dependencies there on the groups being
evaluated serially.  More generally, I wonder whether user-defined
aggregates are likely to be making assumptions that will be broken by
this.

 2. AggGetPerAggEContext

 The comment documents this as returning the per-output-tuple econtext,
 and the ordered-set code assumes that the rescan of this context implies
 that the aggcontext is also about to be reset (so cleanup functions can
 be called).

 Since it seems that the cleanup callback is the sole reason for this
 function to exist, is it acceptable to remove any implication that the
 context returned is the overall per-output-tuple one, and simply state
 that it is a context whose cleanup functions are called at the
 appropriate time before aggcontext is reset?

Well, I think the implication is that it's the econtext in which the
result of the aggregation will be used.

Another approach would be to remove AggGetPerAggEContext as such from the
API altogether, and instead offer an interface that says register an
aggregate cleanup callback, leaving it to the agg/window core code to
figure out which context to hang that on.  I had thought that exposing
this econtext and the per-input-tuple one would provide useful generality,
but maybe we should rethink that.

Obviously, the time grows short to reconsider this API, but it's not
quite too late.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-02 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Doing rollup via GroupAggregate by maintaining multiple transition
  values at a time (one per grouping set) means that the transfn is
  being called interleaved for transition values in different
  contexts. So the question becomes: is it wrong for the transition
  function to assume that aggcontext can be cached this way, or is
  it necessary for the executor to use a separate flinfo for each
  concurrent grouping set?

 Tom Hm.  We could probably move gcontext into the per-group data.
 Tom I'm not sure though if there are any other dependencies there on
 Tom the groups being evaluated serially.  More generally, I wonder
 Tom whether user-defined aggregates are likely to be making
 Tom assumptions that will be broken by this.

The thing is that almost everything _except_ aggcontext and
AggGetPerAggEContext that a transfn might want to hang off fn_extra
really is going to be constant across all groups.

The big question, as you say, is whether this is going to be an issue
for existing user-defined aggregates.

  Since it seems that the cleanup callback is the sole reason for
  this function to exist, is it acceptable to remove any implication
  that the context returned is the overall per-output-tuple one, and
  simply state that it is a context whose cleanup functions are
  called at the appropriate time before aggcontext is reset?

 Tom Well, I think the implication is that it's the econtext in which
 Tom the result of the aggregation will be used.

In the rollup case, though, it does not seem reasonable to have
multiple result-tuple econtexts (that would significantly complicate
the projection of rows, the handling of rescans, etc.).

 Tom Another approach would be to remove AggGetPerAggEContext as such
 Tom from the API altogether, and instead offer an interface that
 Tom says register an aggregate cleanup callback, leaving it to the
 Tom agg/window core code to figure out which context to hang that
 Tom on.  I had thought that exposing this econtext and the
 Tom per-input-tuple one would provide useful generality, but maybe
 Tom we should rethink that.

Works for me.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-02 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom Another approach would be to remove AggGetPerAggEContext as such
  Tom from the API altogether, and instead offer an interface that
  Tom says register an aggregate cleanup callback, leaving it to the
  Tom agg/window core code to figure out which context to hang that
  Tom on.  I had thought that exposing this econtext and the
  Tom per-input-tuple one would provide useful generality, but maybe
  Tom we should rethink that.

 Works for me.

If we're going to do that, I think it needs to be in 9.4.  Are you
going to work up a patch?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-02 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 Tom Another approach would be to remove AggGetPerAggEContext as such
 Tom from the API altogether, and instead offer an interface that
 Tom says register an aggregate cleanup callback, leaving it to the
 Tom agg/window core code to figure out which context to hang that
 Tom on.  I had thought that exposing this econtext and the
 Tom per-input-tuple one would provide useful generality, but maybe
 Tom we should rethink that.

  Works for me.

 Tom If we're going to do that, I think it needs to be in 9.4.  Are
 Tom you going to work up a patch?

Do we want a decision on the fn_extra matter first, or shall I do one
patch for the econtext, and a following one for fn_extra?

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-02 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:
  Tom If we're going to do that, I think it needs to be in 9.4.  Are
  Tom you going to work up a patch?

 Do we want a decision on the fn_extra matter first, or shall I do one
 patch for the econtext, and a following one for fn_extra?

I think they're somewhat independent, and probably best patched
separately.  In any case orderedsetagg.c's use of fn_extra is a local
matter that we'd not really have to fix in 9.4, except to the extent
that you think third-party code might copy it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate function API versus grouping sets

2014-07-02 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Do we want a decision on the fn_extra matter first, or shall I do
  one patch for the econtext, and a following one for fn_extra?

 Tom I think they're somewhat independent, and probably best patched
 Tom separately.  In any case orderedsetagg.c's use of fn_extra is a
 Tom local matter that we'd not really have to fix in 9.4, except to
 Tom the extent that you think third-party code might copy it.

Given that there's been no attempt to expose ordered_set_startup /
ordered_set_transition* as some sort of API, I think it's virtually
inevitable that people will cargo-cult all of that code into any new
ordered set aggregate they might wish to create.

(Had one request so far for a mode() variant that returns the unique
modal value if one exists, otherwise null; so the current set of
ordered-set aggs by no means exhausts the possible applications.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers