> On 08/19/2015 03:53 PM, Tom Lane wrote:
> >
> > I don't see anything very wrong with constraining the initial
> > allocation to 1GB, or even less. That will prevent consuming insane
> > amounts of work_mem when the planner's rows estimate is too high
> > rather than too low. And we do have the ability to increase the hash
> > table size on the fly.
>
> Perhaps. Limiting the initial allocation to 1GB might help with lowering
> memory consumed in case of over-estimation, and it's high enough so that
> regular queries don't run into that.
>
> My initial thought was "if the memory consumption is a problem, lower
> the work_mem" but after thinking about it for a while I don't see a
> reason to limit the memory consumption this way, if possible.
>
> But what is the impact on queries that actually need more than 1GB of
> buckets? I assume we'd only limit the initial allocation and still allow
> the resize based on the actual data (i.e. the 9.5 improvement), so the
> queries would start with 1GB and then resize once the optimal size is
> known (as done in 9.5). The resize is not very expensive, but it's not
> free either, and with so many tuples (requiring more than 1GB of
> buckets, i.e. ~130M tuples) it's probably just noise in the total query
> runtime. But it'd be nice to see some proof of that ...
>
The problem here is that we cannot know the exact size until the Hash node
has read the entire inner relation. All we can do is rely on the planner's
estimation; however, it often computes a crazy number of rows. I think
resizing the hash buckets is a reasonable compromise.
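
To illustrate the clamp-then-resize idea, here is a simplified standalone
sketch. It is not the actual code in nodeHash.c; choose_initial_nbuckets()
and resize_nbuckets() are made-up names, and the 1GB cap is just a parameter.
With 8-byte bucket pointers that cap corresponds to about 134M buckets,
which is the ~130M figure mentioned above (NTUP_PER_BUCKET = 1 as on 9.5).

/*
 * Simplified sketch only: clamp the *initial* bucket array to a 1GB cap
 * computed from the planner's (possibly bogus) row estimate, then double
 * nbuckets once the real tuple count is known.
 */
#include <stdint.h>
#include <stdio.h>

#define INITIAL_ALLOC_CAP   ((uint64_t) 1024 * 1024 * 1024)  /* hypothetical 1GB cap */
#define NTUP_PER_BUCKET     1                                 /* as on 9.5 */
#define BUCKET_PTR_SIZE     ((uint64_t) sizeof(void *))       /* 8 bytes on 64-bit */

/* nbuckets is kept as a power of two, as in the real hash join code. */
static uint64_t
next_pow2(uint64_t v)
{
    uint64_t    r = 1;

    while (r < v)
        r <<= 1;
    return r;
}

/* Pick the initial bucket count from the planner's row estimate, clamped. */
static uint64_t
choose_initial_nbuckets(double planner_rows)
{
    uint64_t    nbuckets = next_pow2((uint64_t) (planner_rows / NTUP_PER_BUCKET));
    uint64_t    max_buckets = INITIAL_ALLOC_CAP / BUCKET_PTR_SIZE;  /* ~134M */

    if (nbuckets > max_buckets)
        nbuckets = max_buckets;     /* never allocate more than the cap upfront */
    return nbuckets;
}

/* Grow (double) the bucket count once the actual tuple count is known. */
static uint64_t
resize_nbuckets(uint64_t nbuckets, uint64_t ntuples_actual)
{
    while (nbuckets * NTUP_PER_BUCKET < ntuples_actual)
        nbuckets <<= 1;     /* the resized array may exceed 1GB, so it needs
                             * a "huge" allocation - the bug in this thread */
    return nbuckets;
}

int
main(void)
{
    /* a query that really needs more than 1GB of buckets: 300M inner rows */
    uint64_t    nbuckets = choose_initial_nbuckets(300e6);

    printf("initial: nbuckets = %llu (%llu MB)\n",
           (unsigned long long) nbuckets,
           (unsigned long long) (nbuckets * BUCKET_PTR_SIZE >> 20));

    nbuckets = resize_nbuckets(nbuckets, 300000000ULL);
    printf("resized: nbuckets = %llu (%llu MB)\n",
           (unsigned long long) nbuckets,
           (unsigned long long) (nbuckets * BUCKET_PTR_SIZE >> 20));
    return 0;
}

For an over-estimate the clamp alone bounds the initial array to 1GB and no
resize happens. For a query that really needs more buckets, like the 300M-row
case above, we start at the cap and resize afterwards, and the resized array
exceeds 1GB - exactly where the "huge" allocation (and the palloc fix) comes in.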
> Also, why 1GB and not 512MB (which is effectively the current limit on
> 9.4, because we can't allocate 1GB there so we end up with 1/2 of that)?
> Why not some small percentage of work_mem, e.g. 5%?
>
No clear reason at this moment.

> Most importantly, this is mostly orthogonal to the original bug report.
> Even if we do this, we still have to fix the palloc after the resize.
>
> So I'd say let's do a minimal bugfix of the actual issue, and then
> perhaps improve the behavior in case of significant overestimates. The
> bugfix should happen ASAP so that it gets into 9.5.0 (and backported).
> We already have patches for that.
>
> Limiting the initial allocation deserves more thorough discussion and
> testing, and I'd argue it's 9.6 material at this point.
>
Indeed, the original bug was caused by the normal MemoryContextAlloc().
The memory over-consumption issue is a new problem that arises from this;
I didn't notice it two months ago.

> > The real problem here is the potential integer overflow in
> > ExecChooseHashTableSize. Now that we know there is one, that should
> > be fixed (and not only in HEAD/9.5).
>
> I don't think there's any integer overflow. The problem is that we end
> up with nbuckets so high that the bucket array (nbuckets * 8 bytes)
> exceeds the 1GB-1 allocation limit.
>
> There's a protection against integer overflow in place (it has been
> there for a long time), but there never was any protection against the
> 1GB limit, because with much smaller work_mem values and
> NTUP_PER_BUCKET=10 we could not actually reach it.
>
> > But I believe Kaigai-san withdrew his initial proposed patch, and we
> > don't have a replacement as far as I saw.
>
> I believe the patch proposed by KaiGai-san is the right one to fix the
> bug discussed in this thread. My understanding is KaiGai-san withdrew
> the patch as he wants to extend it to address the over-estimation issue.
>
> I don't think we should do that - IMHO that's an unrelated improvement
> and should be addressed in a separate patch.
>
OK, this might not be a problem we should try to settle within a few days,
just before the beta release.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>