Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-12-28 Thread Jeff Davis
On Thu, 2014-12-11 at 02:46 -0800, Jeff Davis wrote: On Sun, 2014-08-10 at 14:26 -0700, Jeff Davis wrote: This patch is requires the Memory Accounting patch, or something similar to track memory usage. The attached patch enables hashagg to spill to disk, which means that hashagg will

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-12-11 Thread Jeff Davis
On Sun, 2014-08-10 at 14:26 -0700, Jeff Davis wrote: This patch is requires the Memory Accounting patch, or something similar to track memory usage. The attached patch enables hashagg to spill to disk, which means that hashagg will contain itself to work_mem even if the planner makes a bad

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-12-11 Thread Tomas Vondra
On 11.12.2014 11:46, Jeff Davis wrote: New patch attached. All open items are complete, though the patch may have a few rough edges. Summary of changes: * rebased on top of latest memory accounting patch http://www.postgresql.org/message-id/1417497257.5584.5.camel@jeff-desktop * added

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-09-08 Thread Robert Haas
On Wed, Sep 3, 2014 at 7:34 PM, Tomas Vondra t...@fuzzy.cz wrote: Well, I think you're certainly right that a hash table lookup is more expensive than modulo on a 32-bit integer; so much is obvious. But if join can increase the number of batches on the fly, but only by doubling it, so you

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-09-03 Thread Tomas Vondra
On 4.9.2014 00:42, Tomas Vondra wrote: Attached are two CSV files contain both raw results (4 runs per query), and aggregated results (average of the runs), logs with complete logs and explain (analyze) plans of the queries for inspection. Of course, I forgot to attach the CSV files ... here

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-09-03 Thread Tomas Vondra
On 20.8.2014 20:32, Robert Haas wrote: On Sun, Aug 17, 2014 at 1:17 PM, Tomas Vondra t...@fuzzy.cz wrote: Being able to batch inner and outer relations in a matching way is certainly one of the reasons why hashjoin uses that particular scheme. There are other reasons, though - for example

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-09-03 Thread Tomas Vondra
On 4.9.2014 01:34, Tomas Vondra wrote: On 20.8.2014 20:32, Robert Haas wrote: As I see it, the advantage of Jeff's approach is that it doesn't really matter whether our estimates are accurate or not. We don't have to decide at the beginning how many batches to do, and then possibly end up

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-28 Thread Tomas Vondra
On 26.8.2014 21:38, Jeff Davis wrote: On Tue, 2014-08-26 at 12:39 +0300, Heikki Linnakangas wrote: I think this is enough for this commitfest - we have consensus on the design. For the next one, please address those open items, and resubmit. Agreed, return with feedback. I need to get

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-26 Thread Heikki Linnakangas
Summary of this thread so far: There was a lot of discussion comparing this with Tomas Vondra's Hash Join patch. The conclusion was that while it would be nice to be able to dump transition state to disk, for aggregates like array_agg, the patch is fine as it is. Dumping transition states

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-26 Thread Jeff Davis
On Tue, 2014-08-26 at 12:39 +0300, Heikki Linnakangas wrote: I think this is enough for this commitfest - we have consensus on the design. For the next one, please address those open items, and resubmit. Agreed, return with feedback. I need to get the accounting patch in first, which needs to

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-21 Thread Jeff Davis
On Wed, 2014-08-20 at 14:32 -0400, Robert Haas wrote: Well, I think you're certainly right that a hash table lookup is more expensive than modulo on a 32-bit integer; so much is obvious. But if the load factor is not too large, I think that it's not a *lot* more expensive, so it could be

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-20 Thread Robert Haas
On Sun, Aug 17, 2014 at 1:17 PM, Tomas Vondra t...@fuzzy.cz wrote: Being able to batch inner and outer relations in a matching way is certainly one of the reasons why hashjoin uses that particular scheme. There are other reasons, though - for example being able to answer 'Does this group

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-19 Thread Jeff Davis
On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote: I think that's right, and I rather like your (Jeff's) approach. It's definitely true that we could do better if we have a mechanism for serializing and deserializing group states, but (1) I think an awful lot of cases would get an awful

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-19 Thread Tomas Vondra
On 19 Srpen 2014, 9:52, Jeff Davis wrote: On Fri, 2014-08-15 at 13:53 -0400, Robert Haas wrote: I think that's right, and I rather like your (Jeff's) approach. It's definitely true that we could do better if we have a mechanism for serializing and deserializing group states, but (1) I think

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-17 Thread Tomas Vondra
On 10.8.2014 23:26, Jeff Davis wrote: This patch is requires the Memory Accounting patch, or something similar to track memory usage. The attached patch enables hashagg to spill to disk, which means that hashagg will contain itself to work_mem even if the planner makes a bad misestimate of

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-17 Thread Tomas Vondra
On 15.8.2014 19:53, Robert Haas wrote: On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis pg...@j-davis.com wrote: On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote: Oh? So if we have aggregates like array_agg whose memory footprint increases over time, the patch completely fails to avoid bloat?

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-15 Thread Robert Haas
On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis pg...@j-davis.com wrote: On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote: Oh? So if we have aggregates like array_agg whose memory footprint increases over time, the patch completely fails to avoid bloat? Yes, in its current form. I might think

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Jeff Davis
I think the hash-join like approach is reasonable, but I also think you're going to run into a lot of challenges that make it more complex for HashAgg. For instance, let's say you have the query: SELECT x, array_agg(y) FROM foo GROUP BY x; Say the transition state is an array (for the sake of

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes: I think the hash-join like approach is reasonable, but I also think you're going to run into a lot of challenges that make it more complex for HashAgg. For instance, let's say you have the query: SELECT x, array_agg(y) FROM foo GROUP BY x; Say the

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14 Srpen 2014, 9:22, Jeff Davis wrote: I think the hash-join like approach is reasonable, but I also think you're going to run into a lot of challenges that make it more complex for HashAgg. For instance, let's say you have the query: SELECT x, array_agg(y) FROM foo GROUP BY x; Say the

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Jeff Davis
On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote: If you're following the HashJoin model, then what you do is the same thing it does: you write the input tuple back out to the pending batch file for the hash partition that now contains key 1001, whence it will be processed when you get to

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Atri Sharma
On Thursday, August 14, 2014, Jeff Davis pg...@j-davis.com wrote: On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote: If you're following the HashJoin model, then what you do is the same thing it does: you write the input tuple back out to the pending batch file for the hash partition that

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes: HashJoin only deals with tuples. With HashAgg, you have to deal with a mix of tuples and partially-computed aggregate state values. Not impossible, but it is a little more awkward than HashJoin. Not sure that I follow your point. You're going to have to

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14 Srpen 2014, 18:12, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: HashJoin only deals with tuples. With HashAgg, you have to deal with a mix of tuples and partially-computed aggregate state values. Not impossible, but it is a little more awkward than HashJoin. Not sure that I

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14 Srpen 2014, 18:02, Atri Sharma wrote: On Thursday, August 14, 2014, Jeff Davis pg...@j-davis.com wrote: On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote: If you're following the HashJoin model, then what you do is the same thing it does: you write the input tuple back out to the

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tom Lane
Tomas Vondra t...@fuzzy.cz writes: On 14 Srpen 2014, 18:12, Tom Lane wrote: Not sure that I follow your point. You're going to have to deal with that no matter what, no? That is not how the patch work. Once the memory consumption hits work_mem, it keeps the already existing groups in

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Jeff Davis
On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote: Either it belongs to the current batch (and either it's in the hash table, or you add it there), or it's not - in that case write it to a temp file. I think the part you left out is that you need two files per batch: one for the dumped-out

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Atri Sharma
On Thu, Aug 14, 2014 at 10:21 PM, Tomas Vondra t...@fuzzy.cz wrote: On 14 Srpen 2014, 18:02, Atri Sharma wrote: On Thursday, August 14, 2014, Jeff Davis pg...@j-davis.com wrote: On Thu, 2014-08-14 at 10:06 -0400, Tom Lane wrote: If you're following the HashJoin model, then what you do

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Jeff Davis
On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote: Oh? So if we have aggregates like array_agg whose memory footprint increases over time, the patch completely fails to avoid bloat? Yes, in its current form. I might think a patch with such a limitation was useful, if it weren't for the fact

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14.8.2014 18:54, Jeff Davis wrote: On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote: Either it belongs to the current batch (and either it's in the hash table, or you add it there), or it's not - in that case write it to a temp file. I think the part you left out is that you need

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-14 Thread Tomas Vondra
On 14.8.2014 21:47, Tomas Vondra wrote: On 14.8.2014 18:54, Jeff Davis wrote: On Thu, 2014-08-14 at 16:17 +0200, Tomas Vondra wrote: Either it belongs to the current batch (and either it's in the hash table, or you add it there), or it's not - in that case write it to a temp file. I think

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-13 Thread Tomas Vondra
On 13 Srpen 2014, 7:02, Jeff Davis wrote: On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote: CREATE AGGREGATE myaggregate ( ... SERIALIZE_FUNC = 'dump_data', DESERIALIZE_FUNC = 'read_data', ... ); Seems reasonable. I don't see why it should get messy? In the end, you

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-13 Thread Tomas Vondra
On 13.8.2014 12:31, Tomas Vondra wrote: On 13 Srpen 2014, 7:02, Jeff Davis wrote: On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote: (b) bad estimate of required memory - this is common for aggregates passing 'internal' state (planner uses some quite high defaults) Maybe some

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-12 Thread Tomas Vondra
On 12 Srpen 2014, 7:06, Jeff Davis wrote: On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote: On 10.8.2014 23:26, Jeff Davis wrote: This patch is requires the Memory Accounting patch, or something similar to track memory usage. I think the patch you sent actually includes the accounting

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-12 Thread Jeff Davis
On Tue, 2014-08-12 at 14:58 +0200, Tomas Vondra wrote: CREATE AGGREGATE myaggregate ( ... SERIALIZE_FUNC = 'dump_data', DESERIALIZE_FUNC = 'read_data', ... ); Seems reasonable. I don't see why it should get messy? In the end, you have a chunk of data and a hash for it.

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-11 Thread Jeff Davis
On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote: On 10.8.2014 23:26, Jeff Davis wrote: This patch is requires the Memory Accounting patch, or something similar to track memory usage. I think the patch you sent actually includes the accounting patch. Is that on purpose, or by

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-10 Thread Tomas Vondra
Hi, it's 1AM here, so only a few comments after quickly reading the patch. On 10.8.2014 23:26, Jeff Davis wrote: This patch is requires the Memory Accounting patch, or something similar to track memory usage. I think the patch you sent actually includes the accounting patch. Is that on