Tomas Vondra <tomas.von...@2ndquadrant.com> writes: > On 02/06/2016 02:34 AM, Tom Lane wrote: >> I have sussed what's happening in bug #13908. Basically, commit >> 45f6240a8fa9d355 ("Pack tuples in a hash join batch densely, to save >> memory") broke things for the case where a hash join is using a skew >> table.
> Damn, that's an embarrassing oversight :-/ > I believe the attached patch should fix this by actually copying the > tuples into the densely allocated chunks. Haven't tested it though, will > do in a few hours. Yeah, that's one fix approach I was contemplating last night. (I think the patch as written leaks memory and doesn't account for space usage properly either, but certainly this is a direction we could take.) The other answer I was thinking about was to get rid of the assumption that iterating over the chunk storage is a valid thing to do, and instead scan the hashbucket chains when we need to visit all tuples. I really do not like the patch as designed, for several reasons: * It incorporates a bespoke reimplementation of palloc into hash joins. This is not a maintainable/scalable way to go about reducing memory consumption. It should have been done with an arm's-length API to a new type of memory context, IMO (probably one that supports palloc but not pfree, repalloc, or any chunk-header-dependent operations). * No arm's-length API would conceivably allow remote callers to iterate over all allocated chunks in the way this code does, which is why we need to get rid of that behavior. * There's no way to delete single tuples from the hash table given this coding, which no doubt is why you didn't migrate the skew tuples into this representation; but it doesn't seem like a very future-proof data structure. * Not doing anything for the skew tuples doesn't seem very good either, considering the whole point of that sub-module is that there are likely to be a lot of them. I note also that while the idea of ExecHashRemoveNextSkewBucket is to reduce memory consumed by the skew table to make it available to the main hash table, in point of fact it's unlikely that the freed space will be of any use at all, since it will be in tuple-sized chunks far too small for dense_alloc's requests. So the spaceUsed bookkeeping being done there is an academic exercise unconnected to reality, and we need to rethink what the space management plan is. So I'm of the opinion that a great deal more work is needed here. But it's not something we're going to be able to get done for 9.5.1, or realistically 9.5.anything. Whereas adding additional klugery to ExecHashRemoveNextSkewBucket probably is doable over the weekend. regards, tom lane -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers