Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Sat, Jul 11, 2020 at 7:22 AM Stephen Frost <sfr...@snowman.net> wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > Have you got a better proposal that is reasonably implementable for v13?
> > > (I do not accept the argument that "do nothing" is a better proposal.)
> >
> > So, no, I don't agree that 'do nothing' (except ripping out the one GUC
> > that was already added) is a worse proposal than adding another
> > work_mem-like thing that's only for some node types.
>
> The question was "Have you got a better proposal that is reasonably
> implementable for v13?".
>
> This is anecdotal, but just today somebody on Twitter reported
> *increasing* work_mem to stop getting OOMs from group aggregate +
> sort:
>
> https://twitter.com/theDressler/status/1281942941133615104
Yes, increasing work_mem isn't unusual, at all.

What that tweet shows, and what I don't think folks who are suggesting
things like setting this factor to 2.0 appreciate, is that people may have
work_mem configured in the gigabytes- meaning that with work_mem at 5GB, a
2.0 value would result in a hash_mem of 10GB. Now, I'm all for telling
people to review their configurations between major versions, but that's a
large difference that's going to be pretty deeply hidden in a 'multiplier'
setting.

I'm still wholly unconvinced that we need such a setting, just to be
clear, but I don't think there's any way it'd be reasonable to have it set
to something other than "whatever work_mem is" by default- and it needs to
actually be "what work_mem is" and not "have the same default value", or
*everyone* would have to configure it.

> It was possible to fix the problem in this instance, since evidently
> there wasn't anything else that really did try to consume ~5 GB of
> work_mem memory. Evidently the memory isn't available in any general
> sense, so there are no OOMs now. Nevertheless, we can expect OOMs on
> this server just as soon as there is a real need to do a ~5GB sort,
> regardless of anything else.

Eh? That's not at all what it looks like- they were getting OOMs because
they set work_mem higher than the actual amount of memory they had, and
the Sort before the GroupAgg was actually trying to use all that memory.
The HashAgg ended up not needing that much memory because the aggregated
set wasn't actually that large.

If anything, this shows exactly what Jeff's fine work here is (hopefully)
going to give us- the option to plan a HashAgg in such cases, since we can
accept spilling to disk if we end up underestimating, or take advantage of
that HashAgg being entirely in memory if we overestimate.

> I don't think that this kind of perverse effect is uncommon.
> Hash aggregate can naturally be far faster than group agg + sort, Hash
> agg can naturally use a lot less memory in many cases, and we have every
> reason to think that grouping estimates are regularly totally wrong.

I'm confused as to why we're talking about the relative performance of a
HashAgg vs. a Sort+GroupAgg- of course the HashAgg is going to be faster
if it's got enough memory, but that's a constraint we have to consider and
deal with. Otherwise, the query can end up failing and potentially
impacting other queries or activity on the system- up to and including the
entire database system falling over because the OOM killer fires and kills
a process, forcing the database to restart and go through crash recovery,
which is going to be quite a bit worse than performance maybe not being
great.

> You're significantly underestimating the risk.

Of... what? That we'll end up getting worse performance because we
underestimated the size of the result set and we end up spilling to disk
with the HashAgg? I think I'm giving that risk the amount of concern it
deserves- which is, frankly, not very much. Users who run into that issue,
as this tweet *also* showed, are familiar with work_mem and can tune it to
address it.

This reaction of demanding a new GUC to break up work_mem into pieces
strikes me as unjustified, and doing so during beta makes it that much
worse.

Having looked back, I'm not sure that I'm really in the minority regarding
the proposal to add this at this time either- there have been a few
different comments that it's too late for v13, and/or that we should see
if users actually end up seriously complaining about the lack of a
separate way to specify the memory for a given node type, and/or that if
we're going to do this then we should have a broader set of options
covering other node types too- all positions that I agree with.

Thanks,

Stephen
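P.S. To make the multiplier arithmetic above concrete, here is a
hypothetical postgresql.conf fragment- the values are illustrative of how
the proposed multiplier would scale with an already-large work_mem, not a
recommendation:

```ini
# Illustrative only: with work_mem already in the gigabytes, a 2.0
# multiplier silently doubles the per-node budget for hash-based nodes.
work_mem = '5GB'            # sort nodes etc. may each use up to 5GB
hash_mem_multiplier = 2.0   # hash-based nodes may each use up to 10GB
```

The point being that the 10GB figure never appears anywhere in the
configuration file- it only exists as the product of two settings.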