On Thu, Feb 9, 2017 at 2:03 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
>>
>> 0004-hj-refactor-batch-increases-v4.patch:
>>
>> Modify the existing hash join code to detect work_mem exhaustion at
>> the point where chunks are allocated, instead of checking after every
>> tuple insertion.  This matches the logic used for estimating, and more
>> importantly allows for some parallelism in later patches.
>
> The patch has three changes
> 1. change dense_alloc() to accept respect_workmem argument and use it
> within the function.
> 2. Move call to ExecHashIncreaseNumBatches() into dense_alloc() from
> ExecHashTableInsert() to account for memory before inserting new tuple
> 3. Check growEnabled before calling ExecHashIncreaseNumBatches().
Thanks for the review!

> I think checking growEnabled within ExecHashIncreaseNumBatches() is
> more easy to maintain that checking at every caller. If someone is to
> add a caller tomorrow, s/he has to remember to add the check.

Hmm.  Yeah.  In the later 0010 patch, ExecHashIncreaseNumBatches will
be used in a slightly different way -- not for making decisions or
performing the hash table shrink, but only for reallocating the batch
arrays.  For the next version I will see whether putting the
growEnabled check back in there in the 0004 patch, and then
refactoring it out in a later patch, makes more sense to someone
reviewing the patches independently.

> It might be better to add some comments in
> ExecHashRemoveNextSkewBucket() explaining why dense_alloc() should be
> called with respect_work_mem = false? ExecHashSkewTableInsert() does
> call ExecHashIncreaseNumBatches() after calling
> ExecHashRemoveNextSkewBucket() multiple times, so it looks like we do
> expect increase in space used and thus go beyond work_mem for a short
> while. Is there a way we can handle this case in dense_alloc()?

Right, that needs some explanation, which I'll add for the next
version.  The explanation is that while "shrinking" the hash table, we
may need to exceed the work_mem limit by one chunk for a short time.
That is already true in master, but by moving the work_mem checks into
dense_alloc() I ran into a new problem: dense_alloc() might decide to
shrink the hash table, which itself needs to call dense_alloc().
Shrinking works by spinning through all the chunks, copying only the
tuples we want to keep into new chunks and freeing the old chunks as
we go.  We temporarily go one chunk over work_mem when we allocate the
first new chunk, before we've freed the first old one.  Because we
don't want shrink operations to trigger recursive shrink operations,
we disable respect for work_mem when calling dense_alloc() from
ExecHashIncreaseNumBatches().  In the course of regular hash table
loading, we do want to respect work_mem.
Looking at the v5 patch series I posted yesterday, I see that in fact
ExecHashIncreaseNumBatches calls dense_alloc with respect_work_mem =
true in the 0004 patch, and then I corrected that mistake in the 0008
patch; I'll move the correction back to the 0004 patch in the next
version.  To reach ExecHashIncreaseNumBatches, see the "ugly" query in
hj-test-queries.sql (posted with v5).

In ExecHashRemoveNextSkewBucket I preserved the existing behaviour of
not caring about work_mem when performing the rare operation of
copying a tuple from the skew bucket into a dense_alloc memory chunk
so that it can be inserted into a regular (non-skew) bucket.

> Is it possible that increasing the number of batches changes the
> bucket number of the tuple being inserted? If so, should we
> recalculate the bucket and batch of the tuple being inserted?

No -- see the function documentation for ExecHashGetBucketAndBatch.

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers