Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-07-02 Thread Robert Haas
On Thu, Jun 28, 2012 at 10:22 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 Honestly, I'm not sure that it's worth including this, considering the use
 case...

Since nobody seems crazy about pursuing this, I'm marking this patch Rejected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-07-02 Thread Ants Aasma
On Mon, Jul 2, 2012 at 5:33 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 28, 2012 at 10:22 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 Honestly, I'm not sure that it's worth including this, considering the use
 case...

 Since nobody seems crazy about pursuing this, I'm marking this patch Rejected.

Thank you for considering this, and many thanks to Etsuro Fujita for reviewing.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-28 Thread Etsuro Fujita
Hi Ants,

 -Original Message-
 From: Ants Aasma [mailto:a...@cybertec.at]
 Sent: Wednesday, June 27, 2012 9:23 PM
 To: Robert Haas
 Cc: Etsuro Fujita; Jay Levitt; Tom Lane; PostgreSQL-development; Francois
 Deliege
 Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is
needed
 
 Sorry about the delay in answering. I have been swamped with non-PG
 related things lately.
 
 On Tue, Jun 26, 2012 at 11:08 PM, Robert Haas robertmh...@gmail.com wrote:
  On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas robertmh...@gmail.com wrote:
  On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
  fujita.ets...@lab.ntt.co.jp wrote:
  I'm confused by this remark, because surely the query planner does it
this
  way only if there's no LIMIT.  When there is a LIMIT, we choose based on
  the startup cost plus the estimated fraction of the total cost we expect
  to pay based on dividing the LIMIT by the overall row count estimate.  Or
  is this not what you're talking about?
 
 My reasoning was that query_planner returns the cheapest-total path and
 the cheapest fractional path presorted by the aggregation pathkeys. When
 evaluating hash aggregates with this patch these two are indeed compared
 considering the estimated fraction of the total cost, but this might miss
 the cheapest fractional unordered path for lazy hash aggregates.
 
 Reviewing the code now, I discovered this path could be picked out from
 the pathlist, just as get_cheapest_fractional_path_for_pathkeys does when
 pathkeys is NIL. This path would need to be returned in addition to the
 other two paths. To minimize overhead, this should only be done when we
 might want to consider lazy hash aggregation (there is a group clause
 with no aggregates and grouping is hashable). But this is starting to get
 pretty crufty considering that there doesn't seem to be any really
 compelling use case for this.
 
  Ants, do you intend to update this patch for this CommitFest?  Or at
  all?  It seems nobody's too excited about this, so I'm not sure
  whether it makes sense for you to put more work on it.  But please
  advise as to your plans.
 
 If anyone thinks that this patch might be worth considering, then I'm
 prepared to do minor cleanup this CF (I saw some possibly unnecessary
 cruft in agg_fill_hash_and_retrieve). On the other hand, if you think
 the use case is too marginal to consider for inclusion then I won't
 shed a tear if this gets rejected. For me this was mostly a learning
 experience for poking around in the planner.

Honestly, I'm not sure that it's worth including this, considering the use
case...

Thanks,

Best regards,
Etsuro Fujita

 Ants Aasma
 --
 Cybertec Schönig & Schönig GmbH
 Gröhrmühlgasse 26
 A-2700 Wiener Neustadt
 Web: http://www.postgresql-support.de
 





Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-27 Thread Ants Aasma
Sorry about the delay in answering. I have been swamped with non-PG
related things lately.

On Tue, Jun 26, 2012 at 11:08 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 I'm confused by this remark, because surely the query planner does it this
 way only if there's no LIMIT.  When there is a LIMIT, we choose based on
 the startup cost plus the estimated fraction of the total cost we expect
 to pay based on dividing the LIMIT by the overall row count estimate.  Or
 is this not what you're talking about?

My reasoning was that query_planner returns the cheapest-total path and
the cheapest fractional path presorted by the aggregation pathkeys. When
evaluating hash aggregates with this patch these two are indeed compared
considering the estimated fraction of the total cost, but this might miss
the cheapest fractional unordered path for lazy hash aggregates.

Reviewing the code now, I discovered this path could be picked out from
the pathlist, just as get_cheapest_fractional_path_for_pathkeys does when
pathkeys is NIL. This path would need to be returned in addition to the
other two paths. To minimize overhead, this should only be done when we
might want to consider lazy hash aggregation (there is a group clause
with no aggregates and grouping is hashable). But this is starting to get
pretty crufty considering that there doesn't seem to be any really
compelling use case for this.

 Ants, do you intend to update this patch for this CommitFest?  Or at
 all?  It seems nobody's too excited about this, so I'm not sure
 whether it makes sense for you to put more work on it.  But please
 advise as to your plans.

If anyone thinks that this patch might be worth considering, then I'm
prepared to do minor cleanup this CF (I saw some possibly unnecessary
cruft in agg_fill_hash_and_retrieve). On the other hand, if you think
the use case is too marginal to consider for inclusion then I won't
shed a tear if this gets rejected. For me this was mostly a learning
experience for poking around in the planner.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-26 Thread Robert Haas
On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
 I'm confused by this remark, because surely the query planner does it this
 way only if there's no LIMIT.  When there is a LIMIT, we choose based on
 the startup cost plus the estimated fraction of the total cost we expect
 to pay based on dividing the LIMIT by the overall row count estimate.  Or
 is this not what you're talking about?

 I think that Ants is referring to the way costs are estimated in
 choose_hashed_grouping()/choose_hashed_distinct(), i.e., cost_agg() for
 cheapest_path + hashagg, where the costs are calculated based only on the
 total cost of cheapest_path.  I think that it might be good to do cost_agg()
 for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
 strategy.

 Well, Ants already made some adjustments to those functions; not sure
 if this means they need some more adjustment, but I don't see that
 there's a general problem with the costing algorithm around LIMIT.

Ants, do you intend to update this patch for this CommitFest?  Or at
all?  It seems nobody's too excited about this, so I'm not sure
whether it makes sense for you to put more work on it.  But please
advise as to your plans.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-26 Thread Etsuro Fujita
Hi,

 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Wednesday, June 27, 2012 5:09 AM
 To: Etsuro Fujita
 Cc: Ants Aasma; Jay Levitt; Tom Lane; PostgreSQL-development; Francois Deliege
 Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is
needed
 
 On Fri, Jun 22, 2012 at 10:12 AM, Robert Haas robertmh...@gmail.com wrote:
  On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
  fujita.ets...@lab.ntt.co.jp wrote:
  I'm confused by this remark, because surely the query planner does it this
  way only if there's no LIMIT.  When there is a LIMIT, we choose based on
  the startup cost plus the estimated fraction of the total cost we expect
  to pay based on dividing the LIMIT by the overall row count estimate.  Or
  is this not what you're talking about?
 
  I think that Ants is referring to the way costs are estimated in
  choose_hashed_grouping()/choose_hashed_distinct(), i.e., cost_agg() for
  cheapest_path + hashagg, where the costs are calculated based only on the
  total cost of cheapest_path.  I think that it might be good to do cost_agg()
  for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
  strategy.
 
  Well, Ants already made some adjustments to those functions; not sure
  if this means they need some more adjustment, but I don't see that
  there's a general problem with the costing algorithm around LIMIT.
 
 Ants, do you intend to update this patch for this CommitFest?  Or at
 all?  It seems nobody's too excited about this, so I'm not sure
 whether it makes sense for you to put more work on it.  But please
 advise as to your plans.

Please excuse my slow response; I would also like to know your plans.

Best regards,
Etsuro Fujita

 Thanks,
 
 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company
 





Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-22 Thread Robert Haas
On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 I'm confused by this remark, because surely the query planner does it this
 way only if there's no LIMIT.  When there is a LIMIT, we choose based on
 the startup cost plus the estimated fraction of the total cost we expect
 to pay based on dividing the LIMIT by the overall row count estimate.  Or
 is this not what you're talking about?

 I think that Ants is referring to the way costs are estimated in
 choose_hashed_grouping()/choose_hashed_distinct(), i.e., cost_agg() for
 cheapest_path + hashagg, where the costs are calculated based only on the
 total cost of cheapest_path.  I think that it might be good to do cost_agg()
 for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
 strategy.

Well, Ants already made some adjustments to those functions; not sure
if this means they need some more adjustment, but I don't see that
there's a general problem with the costing algorithm around LIMIT.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-19 Thread Etsuro Fujita
Hi,

 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Tuesday, June 19, 2012 3:12 AM
 To: Ants Aasma
 Cc: Etsuro Fujita; Jay Levitt; Tom Lane; PostgreSQL-development; Francois
 Deliege
 Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is
 needed

 On Fri, Jun 15, 2012 at 9:22 AM, Ants Aasma a...@cybertec.at wrote:
  Exactly. I think the first question for this patch should be whether
  this use-case is worth the complexity of the patch. I can't imagine
  any really compelling use cases that need an arbitrary distinct subset
  of results.

 Me neither.

  The original complaint on -performance [1] didn't specify a real-world
  use case, but it seemed to be a case of an ORM generating
  suboptimal queries. On the other hand, the patch itself is in my
  opinion rather simple, so it might be worth it.

 Yeah.

  It has one outstanding issue: query_planner chooses the cheapest path
  based on total cost. This can be suboptimal when that path happens to
  have high startup cost. It seems to me that enabling the query_planner
  to find the cheapest unsorted path returning a limited amount of
  tuples would require some major surgery to the planner. To be clear,
  this is only a case of missed optimization, not a regression.

 I'm confused by this remark, because surely the query planner does it this
 way only if there's no LIMIT.  When there is a LIMIT, we choose based on
 the startup cost plus the estimated fraction of the total cost we expect
 to pay based on dividing the LIMIT by the overall row count estimate.  Or
 is this not what you're talking about?

I think that Ants is referring to the way costs are estimated in
choose_hashed_grouping()/choose_hashed_distinct(), i.e., cost_agg() for
cheapest_path + hashagg, where the costs are calculated based only on the
total cost of cheapest_path.  I think that it might be good to do cost_agg()
for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
strategy.

  It won't help set returning functions because the tuplestore for those
  is fully materialized when the first row is fetched.

 That's surely a problem for another day.

OK

Best regards,
Etsuro Fujita






Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-18 Thread Robert Haas
On Fri, Jun 15, 2012 at 9:22 AM, Ants Aasma a...@cybertec.at wrote:
 Exactly. I think the first question for this patch should be whether
 this use-case is worth the complexity of the patch. I can't imagine
 any really compelling use cases that need an arbitrary distinct subset
 of results.

Me neither.

 The original complaint on -performance [1] didn't specify
 a real-world use case, but it seemed to be a case of an ORM generating
 suboptimal queries. On the other hand, the patch itself is in my
 opinion rather simple, so it might be worth it.

Yeah.

 It has one outstanding issue: query_planner chooses the cheapest path
 based on total cost. This can be suboptimal when that path happens to
 have high startup cost. It seems to me that enabling the query_planner
 to find the cheapest unsorted path returning a limited amount of
 tuples would require some major surgery to the planner. To be clear,
 this is only a case of missed optimization, not a regression.

I'm confused by this remark, because surely the query planner does it
this way only if there's no LIMIT.  When there is a LIMIT, we choose
based on the startup cost plus the estimated fraction of the total
cost we expect to pay based on dividing the LIMIT by the overall row
count estimate.  Or is this not what you're talking about?
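
As a made-up illustration of that fraction-based comparison (the numbers are
hypothetical, not real planner output): for a path with startup cost 0, total
cost 1000 and an estimated 100000 output rows, a LIMIT 2 means the path is
effectively charged startup_cost + (2/100000) * (total_cost - startup_cost):

    -- purely illustrative arithmetic
    SELECT 0 + (2.0 / 100000) * (1000 - 0) AS effective_cost;  -- 0.02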

 It won't help set returning functions because the tuplestore for those
 is fully materialized when the first row is fetched.

That's surely a problem for another day.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-15 Thread Etsuro Fujita

Hi,

I would like to ask a question before looking into the patch.

At 21:56 12/03/30 -0400, Jay Levitt wrote:
 Tom Lane wrote:
 Ants Aasma a...@cybertec.at writes:
 A user complained on pgsql-performance that SELECT col FROM table
 GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
 to return tuples from hash-aggregate as they are found when no
 aggregate functions are in use. Attached is a first shot at that.

 As I commented in the other thread, the user would be a lot better off
 if he'd had an index on the column in question. I'm not sure it's worth
 complicating the hashagg logic when an indexscan + groupagg would
 address the case better.

 Would this patch help in the case where table is actually a
 set-returning function, and thus can't have an index?

ISTM that in many cases, the result size of a set-returning function is
not so large compared with that of a full plain table scan.  So, in such a
case, a full hash aggregation is not so time-consuming.  Am I wrong?

Best regards,
Etsuro Fujita





Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-15 Thread Robert Haas
On Fri, Jun 15, 2012 at 6:55 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 A user complained on pgsql-performance that SELECT col FROM table
 GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
 to return tuples from hash-aggregate as they are found when no
 aggregate functions are in use. Attached is a first shot at that.


 As I commented in the other thread, the user would be a lot better off
 if he'd had an index on the column in question. I'm not sure it's worth
 complicating the hashagg logic when an indexscan + groupagg would
 address the case better.

 Would this patch help in the case where table is actually a
 set-returning function, and thus can't have an index?

 ISTM that in many cases, the result size of a set-returning function is not
 so large compared with that of a full plain table scan.  So, in such a case,
 a full hash aggregation is not so time-consuming.  Am I wrong?

This query is a little unusual in that it involves both an aggregate
and a limit.

Now, sorted aggregates work pretty well with limit, because you can be
sure upon seeing the beginning of the next group that you are done
with the previous group.  But in a hash aggregate, you normally can't
start returning results until you've read the entire input, so it
doesn't work so well with limit.

However, as Ants points out, we could make it work better for the
special case where we're not actually doing any aggregation, because
in that case we can emit the row for each group when the group is
created, rather than waiting until end-of-input.  This is only going
to help when there is a LIMIT, though.  Moreover, if there happens to
be an ORDER BY, then the data will have to be pre-sorted, in which
case you may as well use a sorted aggregate.  So the use case for this
optimization is basically DISTINCT plus LIMIT but not ORDER BY.
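
Roughly speaking, with a hypothetical table tab(col), the cases look like this:

    -- can stop as soon as two distinct values have been seen (the targeted case)
    SELECT DISTINCT col FROM tab LIMIT 2;
    SELECT col FROM tab GROUP BY col LIMIT 2;

    -- needs sorted output anyway, so a sorted aggregate fits better
    SELECT DISTINCT col FROM tab ORDER BY col LIMIT 2;

    -- with a hash aggregate, a real aggregate such as count(*) cannot emit
    -- any group until the whole input has been read
    SELECT col, count(*) FROM tab GROUP BY col LIMIT 2;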

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-15 Thread Ants Aasma
On Fri, Jun 15, 2012 at 3:13 PM, Robert Haas robertmh...@gmail.com wrote:
 However, as Ants points out, we could make it work better for the
 special case where we're not actually doing any aggregation, because
 in that case we can emit the row for each group when the group is
 created, rather than waiting until end-of-input.  This is only going
 to help when there is a LIMIT, though.  Moreover, if there happens to
 be an ORDER BY, then the data will have to be pre-sorted, in which
 case you may as well use a sorted aggregate.  So the use case for this
 optimization is basically DISTINCT plus LIMIT but not ORDER BY.

Exactly. I think the first question for this patch should be whether
this use-case is worth the complexity of the patch. I can't imagine
any really compelling use cases that need an arbitrary distinct subset
of results. The original complaint on -performance [1] didn't specify
a real-world use case, but it seemed to be a case of an ORM generating
suboptimal queries. On the other hand, the patch itself is in my
opinion rather simple, so it might be worth it.

It has one outstanding issue: query_planner chooses the cheapest path
based on total cost. This can be suboptimal when that path happens to
have high startup cost. It seems to me that enabling the query_planner
to find the cheapest unsorted path returning a limited amount of
tuples would require some major surgery to the planner. To be clear,
this is only a case of missed optimization, not a regression.
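
To put made-up numbers on that: suppose path A has startup cost 0 and total
cost 1000, while path B has startup cost 500 and total cost 600. query_planner
keeps B as the cheapest-total path, but if the LIMIT only needs about 1% of
the output, A would have been the better starting point:

    -- illustrative arithmetic only: startup + fraction * (total - startup)
    SELECT 0   + 0.01 * (1000 - 0)  AS path_a_cost,   -- 10
           500 + 0.01 * (600 - 500) AS path_b_cost;   -- 501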

It won't help set returning functions because the tuplestore for those
is fully materialized when the first row is fetched.

[1] 
http://archives.postgresql.org/message-id/16737833.463.1332881676120.JavaMail.geo-discussion-forums%40pbcpw7

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-03-30 Thread Jay Levitt

Tom Lane wrote:
 Ants Aasma a...@cybertec.at writes:
 A user complained on pgsql-performance that SELECT col FROM table
 GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
 to return tuples from hash-aggregate as they are found when no
 aggregate functions are in use. Attached is a first shot at that.

 As I commented in the other thread, the user would be a lot better off
 if he'd had an index on the column in question. I'm not sure it's worth
 complicating the hashagg logic when an indexscan + groupagg would
 address the case better.

Would this patch help in the case where table is actually a set-returning
function, and thus can't have an index? (I don't yet know enough about the
tree to know when hashaggs get used.) I'm wondering if this is a useful
exception to the "restrictions can't get pushed down through GROUP BYs" rule.

Jay



Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-03-28 Thread Tom Lane
Ants Aasma a...@cybertec.at writes:
 A user complained on pgsql-performance that SELECT col FROM table
 GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
 to return tuples from hash-aggregate as they are found when no
 aggregate functions are in use. Attached is a first shot at that.

As I commented in the other thread, the user would be a lot better off
if he'd had an index on the column in question.  I'm not sure it's worth
complicating the hashagg logic when an indexscan + groupagg would
address the case better.
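
For instance, with a hypothetical table tab(col) standing in for the example,
something like

    CREATE INDEX tab_col_idx ON tab (col);
    -- an index scan feeding a group aggregate can stop after two groups
    SELECT col FROM tab GROUP BY col LIMIT 2;

could let the planner satisfy the LIMIT without a full table scan.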

regards, tom lane



[HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-03-27 Thread Ants Aasma
A user complained on pgsql-performance that SELECT col FROM table
GROUP BY col LIMIT 2; performs a full table scan. ISTM that it's safe
to return tuples from hash-aggregate as they are found when no
aggregate functions are in use. Attached is a first shot at that. The
planner is modified so that, when the optimization applies, the hash table
size check is compared against the limit and the startup cost comes from
the input. The executor is modified so that, when the hash table is not
yet filled and the optimization applies, tuples are returned
immediately.

Can somebody poke holes in this? The patch definitely needs some code
cleanup in nodeAgg.c, but otherwise it passes regression tests and
seems to work as intended. It also optimizes the SELECT DISTINCT col
FROM table LIMIT 2; case, but not the SELECT DISTINCT ON (col) col FROM
table LIMIT 2; case, because the latter is explicitly forced to use sorted
aggregation.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index c9aa921..53cdd09 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -274,9 +274,10 @@ static Bitmapset *find_unaggregated_cols(AggState *aggstate);
 static bool find_unaggregated_cols_walker(Node *node, Bitmapset **colnos);
 static void build_hash_table(AggState *aggstate);
 static AggHashEntry lookup_hash_entry(AggState *aggstate,
-  TupleTableSlot *inputslot);
+  TupleTableSlot *inputslot, bool *isnew);
 static TupleTableSlot *agg_retrieve_direct(AggState *aggstate);
 static void agg_fill_hash_table(AggState *aggstate);
+static TupleTableSlot *agg_fill_hash_and_retrieve(AggState *aggstate);
 static TupleTableSlot *agg_retrieve_hash_table(AggState *aggstate);
 static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
 
@@ -920,12 +921,11 @@ hash_agg_entry_size(int numAggs)
  * When called, CurrentMemoryContext should be the per-query context.
  */
 static AggHashEntry
-lookup_hash_entry(AggState *aggstate, TupleTableSlot *inputslot)
+lookup_hash_entry(AggState *aggstate, TupleTableSlot *inputslot, bool *isnew)
 {
 	TupleTableSlot *hashslot = aggstate->hashslot;
 	ListCell   *l;
 	AggHashEntry entry;
-	bool		isnew;
 
 	/* if first time through, initialize hashslot by cloning input slot */
 	if (hashslot->tts_tupleDescriptor == NULL)
@@ -948,9 +948,9 @@ lookup_hash_entry(AggState *aggstate, TupleTableSlot *inputslot)
 	/* find or create the hashtable entry using the filtered tuple */
 	entry = (AggHashEntry) LookupTupleHashEntry(aggstate->hashtable,
 												hashslot,
-												&isnew);
+												isnew);
 
-	if (isnew)
+	if (*isnew)
 	{
 		/* initialize aggregates for new tuple group */
 		initialize_aggregates(aggstate, aggstate->peragg, entry->pergroup);
@@ -1004,7 +1004,12 @@ ExecAgg(AggState *node)
 	if (((Agg *) node->ss.ps.plan)->aggstrategy == AGG_HASHED)
 	{
 		if (!node->table_filled)
-			agg_fill_hash_table(node);
+		{
+			if (node->numaggs)
+				agg_fill_hash_table(node);
+			else
+				return agg_fill_hash_and_retrieve(node);
+		}
 		return agg_retrieve_hash_table(node);
 	}
 	else
@@ -1222,6 +1227,7 @@ agg_fill_hash_table(AggState *aggstate)
 	ExprContext *tmpcontext;
 	AggHashEntry entry;
 	TupleTableSlot *outerslot;
+	bool 		isnew;
 
 	/*
 	 * get state info from node
@@ -1243,7 +1249,7 @@ agg_fill_hash_table(AggState *aggstate)
 		tmpcontext->ecxt_outertuple = outerslot;
 
 		/* Find or build hashtable entry for this tuple's group */
-		entry = lookup_hash_entry(aggstate, outerslot);
+		entry = lookup_hash_entry(aggstate, outerslot, &isnew);
 
 		/* Advance the aggregates */
 		advance_aggregates(aggstate, entry-pergroup);
@@ -1258,6 +1264,101 @@ agg_fill_hash_table(AggState *aggstate)
 }
 
 /*
+ * ExecAgg for hashed case: phase 1, read input and build hash table
+ * return found tuples immediately.
+ */
+static TupleTableSlot *
+agg_fill_hash_and_retrieve(AggState *aggstate)
+{
+	PlanState  *outerPlan;
+	ExprContext *tmpcontext;
+	AggHashEntry entry;
+	TupleTableSlot *outerslot;
+	bool		isnew;
+	ExprContext *econtext;
+	TupleTableSlot *firstSlot;
+
+	/*
+	 * get state info from node
+	 */
+	outerPlan = outerPlanState(aggstate);
+	/* tmpcontext is the per-input-tuple expression context */
+	tmpcontext = aggstate->tmpcontext;
+
+	econtext = aggstate->ss.ps.ps_ExprContext;
+	firstSlot = aggstate->ss.ss_ScanTupleSlot;
+
+	Assert(aggstate->numaggs == 0);
+
+	/*
+	 * Process each outer-plan tuple, and then fetch the next one, until we
+	 * exhaust the outer plan.
+	 */
+	for (;;)
+	{
+		outerslot = ExecProcNode(outerPlan);
+		if (TupIsNull(outerslot))
+		{
+			aggstate->table_filled = true;
+			/* Initialize to walk the hash table */
+			ResetTupleHashIterator(aggstate->hashtable, aggstate->hashiter);
+			return NULL;
+		}
+		/* set up for advance_aggregates call */
+		tmpcontext->ecxt_outertuple = outerslot;
+
+		/* Find or build hashtable entry for this tuple's group */
+