Re: Default setting for enable_hashagg_disk

2020-07-29 Thread Peter Geoghegan
On Mon, Jul 27, 2020 at 5:55 PM Peter Geoghegan wrote: > Pushed the hashagg_avoid_disk_plan patch -- thanks! Pushed the hash_mem_multiplier patch as well -- thanks again! As I've said before, I am not totally opposed to adding a true escape hatch. That has not proven truly necessary just yet.
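
For context, a minimal usage sketch of the hash_mem_multiplier GUC discussed above (the committed default is 1.0, i.e. no behavioral change; it scales work_mem for hash-based nodes such as hash aggregate and hash join):

    SET work_mem = '64MB';
    SET hash_mem_multiplier = 2.0;
    -- Hash-based nodes may now use roughly work_mem * hash_mem_multiplier
    -- (about 128MB here) before spilling, while sorts remain bounded by work_mem.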

Re: Default setting for enable_hashagg_disk

2020-07-29 Thread Jeff Davis
On Sat, 2020-07-25 at 17:52 -0700, Peter Geoghegan wrote: > BTW, your HLL patch ameliorates the problem with my extreme "sorted > vs > random input" test case from this morning [1] (the thing that I just > discussed with Tomas). Without the HLL patch the sorted case had 2424 > batches. With the

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Peter Geoghegan
On Mon, Jul 27, 2020 at 5:10 PM Jeff Davis wrote: > I noticed that one of the conditionals, "cheapest_total_path != NULL", > was already redundant with the outer conditional before your patch. I > guess that was just a mistake which your patch corrects along the way? Makes sense. > Anyway, the

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Jeff Davis
On Mon, 2020-07-27 at 11:30 -0700, Peter Geoghegan wrote: > The v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch changes are > surprisingly complicated. It would be nice if you could take a look > at > that aspect (or confirm that it's included in your review). I noticed that one of the

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Peter Geoghegan
On Mon, Jul 27, 2020 at 12:52 PM Alvaro Herrera wrote: > On 2020-Jul-27, Peter Geoghegan wrote: > > The v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch changes are > > surprisingly complicated. It would be nice if you could take a look at > > that aspect (or confirm that it's included in your

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Alvaro Herrera
On 2020-Jul-27, Peter Geoghegan wrote: > The v4-0001-Remove-hashagg_avoid_disk_plan-GUC.patch changes are > surprisingly complicated. It would be nice if you could take a look at > that aspect (or confirm that it's included in your review). I think you mean "it replaces surprisingly complicated

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Peter Geoghegan
On Mon, Jul 27, 2020 at 11:24 AM Alvaro Herrera wrote: > > Are you proposing that I just put the prototype in miscadmin.h, while > > leaving the implementation where it is (in nodeHash.c)? > > Yes, that's in the part of my reply you didn't quote: > > : It remains strange to have the function in

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Alvaro Herrera
On 2020-Jul-27, Peter Geoghegan wrote: > On Mon, Jul 27, 2020 at 10:30 AM Alvaro Herrera > wrote: > > On 2020-Jul-23, Peter Geoghegan wrote: > > I notice you put the prototype for get_hash_mem in nodeHash.h. This > > would be fine if not for the fact that optimizer needs to call the > >

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Peter Geoghegan
On Mon, Jul 27, 2020 at 10:30 AM Alvaro Herrera wrote: > On 2020-Jul-23, Peter Geoghegan wrote: > I notice you put the prototype for get_hash_mem in nodeHash.h. This > would be fine if not for the fact that optimizer needs to call the > function too, which means now optimizer have to include

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Alvaro Herrera
On 2020-Jul-23, Peter Geoghegan wrote: > Attached is v3 of the hash_mem_multiplier patch series, which now has > a preparatory patch that removes hashagg_avoid_disk_plan. I notice you put the prototype for get_hash_mem in nodeHash.h. This would be fine if not for the fact that optimizer needs

Re: Default setting for enable_hashagg_disk

2020-07-27 Thread Peter Geoghegan
On Sun, Jul 26, 2020 at 11:34 AM Tomas Vondra wrote: > That's 1.6GB, if I read it right. Which is more than 200MB ;-) Sigh. That solves that "mystery": the behavior that my sorted vs random example exhibited is a known limitation in hash aggs that spill (and an acceptable one). The memory usage

Re: Default setting for enable_hashagg_disk

2020-07-26 Thread Tomas Vondra
On Sat, Jul 25, 2020 at 05:13:00PM -0700, Peter Geoghegan wrote: On Sat, Jul 25, 2020 at 5:05 PM Tomas Vondra wrote: I'm not sure what you mean by "reported memory usage doesn't reflect the space used for transition state"? Surely it does include that, we've built the memory accounting stuff

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 5:31 PM Peter Geoghegan wrote: > I'm glad that this better principled approach is possible. It's hard > to judge how much of a problem this really is, though. We'll need to > think about this aspect some more. BTW, your HLL patch ameliorates the problem with my extreme

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 4:56 PM Jeff Davis wrote: > I wrote a quick patch to use HyperLogLog to estimate the number of > groups contained in a spill file. It seems to reduce the > overpartitioning effect, and is a more principled approach than what I > was doing before. This pretty much fixes

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 5:05 PM Tomas Vondra wrote: > I'm not sure what you mean by "reported memory usage doesn't reflect the > space used for transition state"? Surely it does include that, we've > built the memory accounting stuff pretty much exactly to do that. > > I think it's pretty clear

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Tomas Vondra
On Sat, Jul 25, 2020 at 10:07:37AM -0700, Peter Geoghegan wrote: On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan wrote: "Peak Memory Usage: 1605334kB" Hash agg avoids spilling entirely (so the planner gets it right this time around). It even uses notably less memory. I guess that this is

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Sat, 2020-07-25 at 13:27 -0700, Peter Geoghegan wrote: > It's not clear to me that overpartitioning is a real problem in > > this > > case -- but I think the fact that it's causing confusion is enough > > reason to see if we can fix it. > > I'm not sure about that either. > > FWIW I notice

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 1:10 PM Jeff Davis wrote: > On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote: > > What worries me a bit is the sharp discontinuities when spilling with > > significantly less work_mem than the "optimal" amount. For example, > > with Tomas' TPC-H query (against my

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote: > What worries me a bit is the sharp discontinuities when spilling with > significantly less work_mem than the "optimal" amount. For example, > with Tomas' TPC-H query (against my smaller TPC-H dataset), I find > that setting work_mem to
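
A hedged sketch of the kind of work_mem sweep being described (the table and grouping query are illustrative stand-ins, not the exact TPC-H setup used in the thread):

    SET work_mem = '2MB';
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT l_partkey, avg(l_quantity) FROM lineitem GROUP BY l_partkey;

    SET work_mem = '64MB';
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT l_partkey, avg(l_quantity) FROM lineitem GROUP BY l_partkey;
    -- Comparing runtimes and batch counts across settings exposes the
    -- discontinuities under discussion.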

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 10:23 AM Jeff Davis wrote: > There's also another effect at work that can cause the total number of > batches to be higher for larger work_mem values: when we do recurse, we > again need to estimate the number of partitions needed. Right now, we > overestimate the number

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Thu, 2020-07-23 at 19:33 -0700, Peter Geoghegan wrote: > That does make it sound like the costs of the hash agg aren't being > represented. I suppose it isn't clear if this is a costing issue > because it isn't clear if the execution time performance itself is > pathological or is instead

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Fri, 2020-07-24 at 10:40 +0200, Tomas Vondra wrote: > FWIW one more suspicious thing that I forgot to mention is the > behavior > of the "planned partitions" depending on work_mem, which looks like > this: > > 2MB Planned Partitions: 64   HashAgg Batches: 4160 > 4MB

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan wrote: > "Peak Memory Usage: 1605334kB" > > Hash agg avoids spilling entirely (so the planner gets it right this > time around). It even uses notably less memory. I guess that this is because the reported memory usage doesn't reflect the space used
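
The figures quoted here ("Peak Memory Usage", "HashAgg Batches", "Disk Usage") come from EXPLAIN ANALYZE output; a minimal sketch of how such numbers are obtained, with an illustrative table and query:

    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT a, count(*) FROM t GROUP BY a;
    -- The HashAggregate node's ANALYZE output includes the memory, disk and
    -- batch counters quoted in this thread (exact labels vary by version).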

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Peter Geoghegan
On Fri, Jul 24, 2020 at 12:55 PM Peter Geoghegan wrote: > Could that be caused by clustering in the data? > > If the input data is in totally random order then we have a good > chance of never having to spill skewed "common" values. That is, we're > bound to encounter common values before

Re: Default setting for enable_hashagg_disk

2020-07-25 Thread Jeff Davis
On Fri, 2020-07-24 at 21:16 +0200, Tomas Vondra wrote: > Surely more recursive spilling should do more I/O, but the Disk Usage > reported by explain analyze does not show anything like ... I suspect that's because of disk reuse in logtape.c. Regards, Jeff Davis

Re: Default setting for enable_hashagg_disk

2020-07-24 Thread Peter Geoghegan
On Fri, Jul 24, 2020 at 12:16 PM Tomas Vondra wrote: > Maybe, but we're nowhere close to these limits. See this table which I > posted earlier: > > 2MB Planned Partitions: 64   HashAgg Batches: 4160 > 4MB Planned Partitions: 128   HashAgg Batches: 16512 > 8MB

Re: Default setting for enable_hashagg_disk

2020-07-24 Thread Tomas Vondra
On Fri, Jul 24, 2020 at 11:03:54AM -0700, Peter Geoghegan wrote: On Fri, Jul 24, 2020 at 8:19 AM Robert Haas wrote: This is all really good analysis, I think, but this seems like the key finding. It seems like we don't really understand what's actually getting written. Whether we use hash or

Re: Default setting for enable_hashagg_disk

2020-07-24 Thread Peter Geoghegan
On Fri, Jul 24, 2020 at 1:40 AM Tomas Vondra wrote: > Maybe, not sure what exactly you think is pathological? The trouble is > hashagg has to spill input tuples but the memory used in no-spill case > represents aggregated groups, so I'm not sure how you could extrapolate > from that ... Yeah,

Re: Default setting for enable_hashagg_disk

2020-07-24 Thread Peter Geoghegan
On Fri, Jul 24, 2020 at 8:19 AM Robert Haas wrote: > This is all really good analysis, I think, but this seems like the key > finding. It seems like we don't really understand what's actually > getting written. Whether we use hash or sort doesn't seem like it > should have this kind of impact on

Re: Default setting for enable_hashagg_disk

2020-07-24 Thread Tomas Vondra
On Fri, Jul 24, 2020 at 11:18:48AM -0400, Robert Haas wrote: On Thu, Jul 23, 2020 at 9:22 PM Tomas Vondra wrote: 2MB 4MB 8MB 64MB 256MB --- hash 6.71 6.70 6.73 6.44

Re: Default setting for enable_hashagg_disk

2020-07-24 Thread Robert Haas
On Thu, Jul 23, 2020 at 9:22 PM Tomas Vondra wrote: > 2MB 4MB 8MB 64MB 256MB > --- > hash 6.71 6.70 6.73 6.44 5.81 > hash CP_SMALL_TLIST 5.28 5.26 5.24

Re: Default setting for enable_hashagg_disk

2020-07-24 Thread Tomas Vondra
On Fri, Jul 24, 2020 at 10:40:47AM +0200, Tomas Vondra wrote: On Thu, Jul 23, 2020 at 07:33:45PM -0700, Peter Geoghegan wrote: On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra wrote: So let me share some fresh I/O statistics collected on the current code using iosnoop. I've done the tests on two

Re: Default setting for enable_hashagg_disk

2020-07-24 Thread Tomas Vondra
On Thu, Jul 23, 2020 at 07:33:45PM -0700, Peter Geoghegan wrote: On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra wrote: So let me share some fresh I/O statistics collected on the current code using iosnoop. I've done the tests on two different machines using the "aggregate part" of TPC-H Q17,

Re: Default setting for enable_hashagg_disk

2020-07-23 Thread Peter Geoghegan
On Thu, Jul 23, 2020 at 6:22 PM Tomas Vondra wrote: > So let me share some fresh I/O statistics collected on the current code > using iosnoop. I've done the tests on two different machines using the > "aggregate part" of TPC-H Q17, i.e. essentially this: > > SELECT * FROM ( > SELECT
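
The truncated query above is the inner aggregation of TPC-H Q17; reconstructed from the standard Q17 text (not quoted verbatim from the message), it is essentially:

    SELECT * FROM (
        SELECT l_partkey AS agg_partkey, 0.2 * avg(l_quantity) AS avg_quantity
        FROM lineitem
        GROUP BY l_partkey
    ) part_agg;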

Re: Default setting for enable_hashagg_disk

2020-07-23 Thread Peter Geoghegan
On Fri, Jul 17, 2020 at 5:13 PM Jeff Davis wrote: > The patch itself looks reasonable to me. I don't see a lot of obvious > dangers, but perhaps someone would like to take a closer look at the > planner changes as you suggest. Attached is v3 of the hash_mem_multiplier patch series, which now has

Re: Default setting for enable_hashagg_disk

2020-07-22 Thread Peter Geoghegan
On Tue, Jul 21, 2020 at 1:30 PM Bruce Momjian wrote: > On Tue, Jul 14, 2020 at 03:49:40PM -0700, Peter Geoghegan wrote: > > Maybe I missed your point here. The problem is not so much that we'll > > get HashAggs that spill -- there is nothing intrinsically wrong with > > that. While it's true that

Re: Default setting for enable_hashagg_disk

2020-07-22 Thread Robert Haas
On Tue, Jul 14, 2020 at 6:49 PM Peter Geoghegan wrote: > Maybe I missed your point here. The problem is not so much that we'll > get HashAggs that spill -- there is nothing intrinsically wrong with > that. While it's true that the I/O pattern is not as sequential as a > similar group agg + sort,

Re: Default setting for enable_hashagg_disk

2020-07-21 Thread Bruce Momjian
On Tue, Jul 14, 2020 at 03:49:40PM -0700, Peter Geoghegan wrote: > Maybe I missed your point here. The problem is not so much that we'll > get HashAggs that spill -- there is nothing intrinsically wrong with > that. While it's true that the I/O pattern is not as sequential as a > similar group agg

Re: Default setting for enable_hashagg_disk

2020-07-20 Thread Tomas Vondra
On Mon, Jul 20, 2020 at 09:17:21AM -0400, Tom Lane wrote: Tomas Vondra writes: There's a minor problem here, though - these stats were collected before we fixed the tlist issue, so hashagg was spilling about 10x the amount of data compared to sort+groupagg. So maybe that's the first thing we

Re: Default setting for enable_hashagg_disk

2020-07-20 Thread Tom Lane
Tomas Vondra writes: > There's a minor problem here, though - these stats were collected before > we fixed the tlist issue, so hashagg was spilling about 10x the amount > of data compared to sort+groupagg. So maybe that's the first thing we > should do, before contemplating changes to the costing

Re: Default setting for enable_hashagg_disk

2020-07-19 Thread Tomas Vondra
On Sun, Jul 19, 2020 at 02:17:15PM -0700, Jeff Davis wrote: On Sat, 2020-07-18 at 21:15 -0400, Tom Lane wrote: Jeff Davis writes: > What is your opinion about pessimizing the HashAgg disk costs (not > affecting HashAgg plans expected to stay in memory)? Tomas Vondra > presented some evidence

Re: Default setting for enable_hashagg_disk

2020-07-19 Thread Peter Geoghegan
On Sat, Jul 18, 2020 at 3:04 PM Jeff Davis wrote: > > I think the entire discussion > > is way out ahead of any field evidence that we need such a knob. > > In the absence of evidence, our default position ought to be to > > keep it simple, not to accumulate backwards-compatibility kluges. > >

Re: Default setting for enable_hashagg_disk

2020-07-19 Thread Jeff Davis
On Sat, 2020-07-18 at 21:15 -0400, Tom Lane wrote: > Jeff Davis writes: > > What is your opinion about pessimizing the HashAgg disk costs (not > > affecting HashAgg plans expected to stay in memory)? Tomas Vondra > > presented some evidence that Sort had some better IO patterns in > > some > >

Re: Default setting for enable_hashagg_disk

2020-07-19 Thread David G. Johnston
On Sun, Jul 19, 2020 at 4:38 AM Stephen Frost wrote: > > (The only reason I'm in favor of heap_mem[_multiplier] is that it > > seems like it might be possible to use it to get *better* plans > > than before. I do not see it as a backwards-compatibility knob.) > > I still don't think a

Re: Default setting for enable_hashagg_disk

2020-07-19 Thread Tom Lane
Stephen Frost writes: > In other words, if we'd stop trying to shoehorn something in, which > we're doing because we're in beta, we'd very likely be talking about all > of this in a very different way and probably be contemplating something > like a query_mem that provides for an overall memory

Re: Default setting for enable_hashagg_disk

2020-07-19 Thread Stephen Frost
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Jeff Davis writes: > > On Fri, 2020-07-17 at 18:38 -0700, Peter Geoghegan wrote: > >> There is also the separate question of what to do about the > >> hashagg_avoid_disk_plan GUC (this is a separate open item that > >> requires a separate

Re: Default setting for enable_hashagg_disk

2020-07-18 Thread Tom Lane
Jeff Davis writes: > What is your opinion about pessimizing the HashAgg disk costs (not > affecting HashAgg plans expected to stay in memory)? Tomas Vondra > presented some evidence that Sort had some better IO patterns in some > cases that weren't easily reflected in a principled way in the cost

Re: Default setting for enable_hashagg_disk

2020-07-18 Thread Jeff Davis
On Sat, 2020-07-18 at 14:30 -0400, Tom Lane wrote: > You're being optimistic about it being possible to remove a GUC once > we ship it. That seems to be a hard sell most of the time. If nothing else, a repeat of this thread in a year or two to discuss removing a GUC doesn't seem appealing. > I

Re: Default setting for enable_hashagg_disk

2020-07-18 Thread Peter Geoghegan
On Sat, Jul 18, 2020 at 11:30 AM Tom Lane wrote: > Jeff Davis writes: > > Yes, I think we should have that GUC (hashagg_avoid_disk_plan) for at > > least one release. > > You're being optimistic about it being possible to remove a GUC once > we ship it. That seems to be a hard sell most of the

Re: Default setting for enable_hashagg_disk

2020-07-18 Thread Tom Lane
Jeff Davis writes: > On Fri, 2020-07-17 at 18:38 -0700, Peter Geoghegan wrote: >> There is also the separate question of what to do about the >> hashagg_avoid_disk_plan GUC (this is a separate open item that >> requires a separate resolution). Tom leans slightly towards removing >> it now. Is

Re: Default setting for enable_hashagg_disk

2020-07-18 Thread Jeff Davis
On Fri, 2020-07-17 at 18:38 -0700, Peter Geoghegan wrote: > There is also the separate question of what to do about the > hashagg_avoid_disk_plan GUC (this is a separate open item that > requires a separate resolution). Tom leans slightly towards removing > it now. Is your position about the same

Re: Default setting for enable_hashagg_disk

2020-07-17 Thread Peter Geoghegan
On Fri, Jul 17, 2020 at 5:13 PM Jeff Davis wrote: > The idea is growing on me a bit. It doesn't give exactly v12 behavior, > but it does offer another lever that might tackle a lot of the > practical cases. Cool. > If I were making the decision alone, I'd still choose > the escape hatch based

Re: Default setting for enable_hashagg_disk

2020-07-17 Thread Jeff Davis
On Tue, 2020-07-14 at 21:12 -0700, Peter Geoghegan wrote: > Attached is a WIP patch implementing hash_mem_multiplier, with 1.0 as > the GUC's default value (i.e. the patch introduces no behavioral > changes by default). The first patch in the series renames some local > variables whose name is

Re: Default setting for enable_hashagg_disk

2020-07-14 Thread Peter Geoghegan
On Mon, Jul 13, 2020 at 9:47 AM Alvaro Herrera wrote: > I'm in favor of hash_mem_multiplier. I think a >1 default is more > sensible than =1 in the long run, but if strategic vote is what we're > doing, then I support the =1 option. Attached is a WIP patch implementing hash_mem_multiplier, with

Re: Default setting for enable_hashagg_disk

2020-07-14 Thread Peter Geoghegan
On Tue, Jul 14, 2020 at 12:46 PM Robert Haas wrote: > - I thought the problem we were trying to solve here was that, in v12, > if the planner thinks that your hashagg will fit in memory when really > it doesn't, you will get good performance because we'll cheat; in v13, > you'll get VERY bad

Re: Default setting for enable_hashagg_disk

2020-07-14 Thread Robert Haas
On Mon, Jul 13, 2020 at 2:50 PM Peter Geoghegan wrote: > Primarily in favor of escape hatch: > > Jeff, > DavidR, > Pavel, > Andres, > Robert ??, > Amit ?? > > Primarily in favor of hash_mem/hash_mem_multiplier: > > PeterG, > Tom, > Alvaro, > Tomas, > Justin, > DavidG, > Jonathan Katz > > There

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Amit Kapila
On Mon, Jul 13, 2020 at 9:50 PM Peter Geoghegan wrote: > > On Mon, Jul 13, 2020 at 6:13 AM Stephen Frost wrote: > > Yes, increasing work_mem isn't unusual, at all. > > It's unusual as a way of avoiding OOMs! > > > Eh? That's not at all what it looks like- they were getting OOM's > > because

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Tom Lane
"David G. Johnston" writes: > To be clear, by "escape hatch" you mean "add a GUC that instructs the > PostgreSQL executor to ignore hash_mem when deciding whether to spill the > contents of the hash table to disk - IOW to never spill the contents of a > hash table to disk"? If so that seems

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Geoghegan
On Mon, Jul 13, 2020 at 12:57 PM David G. Johnston wrote: > To be clear, by "escape hatch" you mean "add a GUC that instructs the > PostgreSQL executor to ignore hash_mem when deciding whether to spill the > contents of the hash table to disk - IOW to never spill the contents of a > hash table

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread David G. Johnston
On Mon, Jul 13, 2020 at 11:50 AM Peter Geoghegan wrote: > > Primarily in favor of escape hatch: > > Jeff, > DavidR, > Pavel, > Andres, > Robert ??, > Amit ?? > > To be clear, by "escape hatch" you mean "add a GUC that instructs the PostgreSQL executor to ignore hash_mem when deciding whether to

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Geoghegan
On Mon, Jul 13, 2020 at 7:25 AM David Rowley wrote: > I think it would be good if we could try to move towards getting > consensus here rather than reiterating our arguments over and over. +1 > Updated summary: > * For hash_mem = Tomas [7], Justin [16] > * For hash_mem_multiplier with a default

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Justin Pryzby
On Mon, Jul 13, 2020 at 12:47:36PM -0400, Alvaro Herrera wrote: > On 2020-Jul-13, Jeff Davis wrote: > > > On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote: > > > Updated summary: > > > * For hash_mem = Tomas [7], Justin [16] > > > * For hash_mem_multiplier with a default > 1.0 = DavidG [21]

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Tom Lane
Alvaro Herrera writes: > I'm in favor of hash_mem_multiplier. I think a >1 default is more > sensible than =1 in the long run, but if strategic vote is what we're > doing, then I support the =1 option. FWIW, I also think that we'll eventually end up with >1 default. But the evidence to support

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Alvaro Herrera
On 2020-Jul-13, Jeff Davis wrote: > On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote: > > Updated summary: > > * For hash_mem = Tomas [7], Justin [16] > > * For hash_mem_multiplier with a default > 1.0 = DavidG [21] > > * For hash_mem_multiplier with default = 1.0 = PeterG [15][0], Tom > >

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Geoghegan
On Mon, Jul 13, 2020 at 6:13 AM Stephen Frost wrote: > Yes, increasing work_mem isn't unusual, at all. It's unusual as a way of avoiding OOMs! > Eh? That's not at all what it looks like- they were getting OOM's > because they set work_mem to be higher than the actual amount of memory > they

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Eisentraut
On 2020-07-13 16:11, Tomas Vondra wrote: Why is running out of disk space worse experience than running out of memory? Sure, it'll take longer and ultimately the query fails (and if it fills the device used by the WAL then it may also cause shutdown of the main instance due to inability to

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Jeff Davis
On Tue, 2020-07-14 at 02:25 +1200, David Rowley wrote: > Updated summary: > * For hash_mem = Tomas [7], Justin [16] > * For hash_mem_multiplier with a default > 1.0 = DavidG [21] > * For hash_mem_multiplier with default = 1.0 = PeterG [15][0], Tom > [20][24] I am OK with these options, but I

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread David Rowley
be fine with setting the default to 1.0, per [0]. This thread did split off a while back into "Default setting for enable_hashagg_disk (hash_mem)", I did try and summarise who sits where on this in [19]. I think it would be good if we could try to move towards getting consensus h

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Tomas Vondra
On Mon, Jul 13, 2020 at 01:51:42PM +0200, Peter Eisentraut wrote: On 2020-04-07 20:20, Jeff Davis wrote: Now that we have Disk-based Hash Aggregation, there are a lot more situations where the planner can choose HashAgg. The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Stephen Frost
Greetings, * Peter Geoghegan (p...@bowt.ie) wrote: > On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > > Have you got a better proposal that is reasonably implementable for v13? > > > (I do not accept the argument that "do nothing" is a better

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Eisentraut
On 2020-07-13 14:16, David Rowley wrote: Isn't that what temp_file_limit is for? Yeah, I guess that is so rarely used that I had forgotten about it. So maybe that is also something that more users will want to be aware of. -- Peter Eisentraut http://www.2ndQuadrant.com/
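
For reference, a minimal sketch of the setting being recalled here (temp_file_limit caps the temporary-file space a single session may use; the default of -1 means no limit, and it is a superuser-level setting):

    SET temp_file_limit = '10GB';   -- fail a query rather than let its temp files exceed 10GB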

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread David Rowley
On Mon, 13 Jul 2020 at 23:51, Peter Eisentraut wrote: > I have an anecdote that might be related to this discussion. > > I was running an unrelated benchmark suite. With PostgreSQL 12, one > query ran out of memory. With PostgreSQL 13, the same query instead ran > out of disk space. I bisected

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Peter Eisentraut
On 2020-04-07 20:20, Jeff Davis wrote: Now that we have Disk-based Hash Aggregation, there are a lot more situations where the planner can choose HashAgg. The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on costing. If false, it only generates a HashAgg path if it thinks it
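
Per the description in this message, the behavior was controlled like so during the v13 development cycle (the GUC was later renamed hashagg_avoid_disk_plan and ultimately removed before release, as covered earlier in this thread):

    SET enable_hashagg_disk = off;  -- only generate a HashAgg path when the planner
                                    -- expects the hash table to fit in memory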

Re: Default setting for enable_hashagg_disk

2020-07-13 Thread Amit Kapila
On Sat, Jul 11, 2020 at 3:30 AM Tom Lane wrote: > > Stephen Frost writes: > > I don't see hash_mem as being any kind of proper fix- it's just punting > > to the user saying "we can't figure this out, how about you do it" and, > > worse, it's in conflict with how we already ask the user that

Re: Default setting for enable_hashagg_disk

2020-07-12 Thread Tomas Vondra
On Sat, Jul 11, 2020 at 10:26:22PM -0700, David G. Johnston wrote: On Saturday, July 11, 2020, Tom Lane wrote: "David G. Johnston" writes: > On Sat, Jul 11, 2020 at 5:47 PM Tom Lane wrote: >> It seems like a lot of the disagreement here is focused on Peter's >> proposal to make

Re: Default setting for enable_hashagg_disk

2020-07-12 Thread Tomas Vondra
On Sat, Jul 11, 2020 at 08:47:54PM -0400, Tom Lane wrote: Tomas Vondra writes: I don't know, but one of the main arguments against simply suggesting people to bump up work_mem (if they're hit by the hashagg spill in v13) was that it'd increase overall memory usage for them. It seems strange to

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread David G. Johnston
On Saturday, July 11, 2020, Tom Lane wrote: > "David G. Johnston" writes: > > On Sat, Jul 11, 2020 at 5:47 PM Tom Lane wrote: > >> It seems like a lot of the disagreement here is focused on Peter's > >> proposal to make hash_mem_multiplier default to 2.0. But it doesn't > >> seem to me that

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Tom Lane
"David G. Johnston" writes: > On Sat, Jul 11, 2020 at 5:47 PM Tom Lane wrote: >> It seems like a lot of the disagreement here is focused on Peter's >> proposal to make hash_mem_multiplier default to 2.0. But it doesn't >> seem to me that that's a critical element of the proposal. Why not just

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread David G. Johnston
On Sat, Jul 11, 2020 at 5:47 PM Tom Lane wrote: > Tomas Vondra writes: > > I don't know, but one of the main arguments against simply suggesting > > people to bump up work_mem (if they're hit by the hashagg spill in v13) > > was that it'd increase overall memory usage for them. It seems strange

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Thomas Munro
On Sun, Jul 12, 2020 at 2:27 AM Tom Lane wrote: > David Rowley writes: > > hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented > > it differently. The problem is made worse by the fact that we'll only > > release the memory for the hash table during ExecEndHashJoin(). If the >

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Peter Geoghegan
I would be okay with a default of 1.0. Peter Geoghegan (Sent from my phone)

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Tom Lane
Tomas Vondra writes: > I don't know, but one of the main arguments against simply suggesting > people to bump up work_mem (if they're hit by the hashagg spill in v13) > was that it'd increase overall memory usage for them. It seems strange > to then propose a new GUC set to a default that would

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Tomas Vondra
On Sat, Jul 11, 2020 at 09:02:43AM -0700, David G. Johnston wrote: On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote: There now seems to be some suggestions that not only should we have a new GUC, but we should default to having it not be equal to work_mem (or 1.0 or whatever) and instead

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Peter Geoghegan
On Sat, Jul 11, 2020 at 4:23 PM Tomas Vondra wrote: > I find that example rather suspicious. I mean, what exactly in the > GroupAgg plan would consume this memory? Surely it'd have to be some > node below the grouping, but sort shouldn't do that, no? > > Seems strange. Well, I imagine hash

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 10:00 PM David Rowley wrote: > hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented > it differently. The problem is made worse by the fact that we'll only > release the memory for the hash table during ExecEndHashJoin(). If the > planner had some ability

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Tomas Vondra
On Sat, Jul 11, 2020 at 09:49:43AM -0700, Peter Geoghegan wrote: On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: > Have you got a better proposal that is reasonably implementable for v13? > (I do not accept the argument that "do nothing" is a better

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Peter Geoghegan
On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Have you got a better proposal that is reasonably implementable for v13? > > (I do not accept the argument that "do nothing" is a better proposal.) > So, no, I don't agree that 'do nothing' (except

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread David G. Johnston
On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost wrote: > There now seems to be some suggestions that not only should we have a > new GUC, but we should default to having it not be equal to work_mem (or > 1.0 or whatever) and instead by higher, to be *twice* or larger whatever > the existing

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Tom Lane
David Rowley writes: > hmm yeah. It's unfortunate, but I'm not sure how I'd have implemented > it differently. The problem is made worse by the fact that we'll only > release the memory for the hash table during ExecEndHashJoin(). If the > planner had some ability to provide the executor with

Re: Default setting for enable_hashagg_disk

2020-07-11 Thread Stephen Frost
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > I don't see hash_mem as being any kind of proper fix- it's just punting > > to the user saying "we can't figure this out, how about you do it" and, > > worse, it's in conflict with how we already ask the user that

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 14:02, Peter Geoghegan wrote: > > On Fri, Jul 10, 2020 at 6:19 PM David Rowley wrote: > > If we have to have a new GUC, my preference would be hashagg_mem, > > where -1 means use work_mem and a value between 64 and MAX_KILOBYTES > > would mean use that value. We'd need
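
A sketch of the proposed, never-committed hashagg_mem semantics described here, purely to illustrate the idea (the GUC is hypothetical):

    SET hashagg_mem = -1;        -- -1: fall back to work_mem for hash aggregation
    SET hashagg_mem = '256MB';   -- 64..MAX_KILOBYTES: an explicit HashAgg memory budget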

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 6:19 PM David Rowley wrote: > If we get hash_mem > or some variant that is a multiplier of work_mem, then that user is in > exactly the same situation for that plan. i.e there's no ability to > increase the memory allowances for Hash Agg alone. That's true, of course. >

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 6:43 PM David Rowley wrote: > On Sat, 11 Jul 2020 at 13:36, David G. Johnston > wrote: > > If we add a setting that defaults to work_mem then the benefit is > severely reduced. You still have to modify individual queries, but the > change can simply be more targeted

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 13:36, David G. Johnston wrote: > If we add a setting that defaults to work_mem then the benefit is severely > reduced. You still have to modify individual queries, but the change can > simply be more targeted than changing work_mem alone. I think the idea is that this

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 6:19 PM David Rowley wrote: > If we have to have a new GUC, my preference would be hashagg_mem, > where -1 means use work_mem and a value between 64 and MAX_KILOBYTES > would mean use that value. We'd need some sort of check hook to > disallow 0-63. I really am just

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 12:47, David G. Johnston wrote: > The multiplier seems strictly better than "rely on work_mem alone, i.e., do > nothing"; the detracting factor being one more GUC. Even if one wants to > argue the solution is ugly or imperfect the current state seems worse and a > more

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David G. Johnston
On Fri, Jul 10, 2020 at 5:16 PM David Rowley wrote: > Stephen mentions in [1] that: > > Users who are actually hit by this in a negative way > > have an option- increase work_mem to reflect what was actually happening > > already. > > Peter is not a fan of that idea, which can only be due to the

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread David Rowley
On Sat, 11 Jul 2020 at 10:00, Tom Lane wrote: > > Stephen Frost writes: > > I don't see hash_mem as being any kind of proper fix- it's just punting > > to the user saying "we can't figure this out, how about you do it" and, > > worse, it's in conflict with how we already ask the user that

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Peter Geoghegan
On Fri, Jul 10, 2020 at 2:50 PM Stephen Frost wrote: > Nothing of what you've said thus far has shown me that there were > material bits of the discussion that I've missed. Maybe that's just because you missed those bits too? > No, that other people > feel differently or have made comments

Re: Default setting for enable_hashagg_disk

2020-07-10 Thread Tom Lane
Stephen Frost writes: > I don't see hash_mem as being any kind of proper fix- it's just punting > to the user saying "we can't figure this out, how about you do it" and, > worse, it's in conflict with how we already ask the user that question. > Turning it into a multiplier doesn't change that
