Re: [HACKERS] Explanation for bug #13908: hash joins are badly broken

Tomas Vondra Sat, 06 Feb 2016 01:04:41 -0800

On 02/06/2016 02:34 AM, Tom Lane wrote:

I have sussed what's happening in bug #13908. Basically, commit
45f6240a8fa9d355 ("Pack tuples in a hash join batch densely, to save
memory") broke things for the case where a hash join is using a skew
table. The reason is that that commit only changed the storage of
tuples going into the main hash table; tuples going into the skew
table are still allocated with a palloc apiece, without being put
into the "chunk" storage. Now, if we're loading the hash table and we
find that we've exceeded the storage allowed for skew tuples,
ExecHashRemoveNextSkewBucket wants to push some skew tuples back into
the main hash table; and it believes that linking such tuples into
the appropriate main hashbucket chain is sufficient for that. Which
it was before the aforesaid commit, and still is in simple cases.
However, if we later try to do ExecHashIncreaseNumBatches, that
function contains new code that assumes that it can find all tuples
in the main hashtable by scanning the "chunk" storage directly. Thus,
the pushed-back tuples are not scanned and are neither re-entered
into the hash table nor dumped into a batch file. So they won't get
joined.


Damn, that's an embarrassing oversight :-/

I believe the attached patch should fix this by actually copying thetuples into the densely allocated chunks. Haven't tested it though, willdo in a few hours.

It looks like ExecHashIncreaseNumBuckets, if it were to run after
some executions of ExecHashRemoveNextSkewBucket, would break things
in the same way. That's not what's happening in this particular test
case, though.

ExecHashIncreaseNumBuckets assumes all the tuples can be reached bysimply walking the chunks (from dense_alloc). So if removing skew bucketonly updates pointers in buckets, that gets broken. But I don't thinkthat's a bug in ExecHashIncreaseNumBuckets and should be resolved byfixing ExecHashRemoveNextSkewBucket.

I'm of the opinion that this is a stop-ship bug for 9.5.1. Barring
somebody producing a fix over the weekend, I will deal with it by
reverting the aforementioned commit.

I think it's not quite possible to revert just the one commit as theother hashjoin improvements in 9.5 built on top of that. So the revertwould either be quite invasive (requiring more code changes than thefix), or we'd have to revert all the hashjoin goodies.


FWIW I'm willing to put some time into fixing this over the weekend.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index 8a8fdf2..653faf5 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -1576,9 +1576,16 @@ ExecHashRemoveNextSkewBucket(HashJoinTable hashtable)
 		/* Decide whether to put the tuple in the hash table or a temp file */
 		if (batchno == hashtable->curbatch)
 		{
+			/* keep tuple in memory - copy it into the new chunk */
+			HashJoinTuple copyTuple;
+
+			copyTuple = (HashJoinTuple) dense_alloc(hashtable, tupleSize);
+			memcpy(copyTuple, hashTuple, tupleSize);
+
 			/* Move the tuple to the main hash table */
-			hashTuple->next = hashtable->buckets[bucketno];
-			hashtable->buckets[bucketno] = hashTuple;
+			copyTuple->next = hashtable->buckets[bucketno];
+			hashtable->buckets[bucketno] = copyTuple;
+
 			/* We have reduced skew space, but overall space doesn't change */
 			hashtable->spaceUsedSkew -= tupleSize;
 		}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Explanation for bug #13908: hash joins are badly broken

Reply via email to