Re: [HACKERS] WIP: [[Parallel] Shared] Hash

Ashutosh Bapat Wed, 08 Feb 2017 05:04:40 -0800

>
> 0004-hj-refactor-batch-increases-v4.patch:
>
> Modify the existing hash join code to detect work_mem exhaustion at
> the point where chunks are allocated, instead of checking after every
> tuple insertion.  This matches the logic used for estimating, and more
> importantly allows for some parallelism in later patches.


The patch has three changes
1. change dense_alloc() to accept respect_workmem argument and use it
within the function.
2. Move call to ExecHashIncreaseNumBatches() into dense_alloc() from
ExecHashTableInsert() to account for memory before inserting new tuple
3. Check growEnabled before calling ExecHashIncreaseNumBatches().

I think checking growEnabled within ExecHashIncreaseNumBatches() is
more easy to maintain that checking at every caller. If someone is to
add a caller tomorrow, s/he has to remember to add the check.

It might be better to add some comments in
ExecHashRemoveNextSkewBucket() explaining why dense_alloc() should be
called with respect_work_mem = false? ExecHashSkewTableInsert() does
call ExecHashIncreaseNumBatches() after calling
ExecHashRemoveNextSkewBucket() multiple times, so it looks like we do
expect increase in space used and thus go beyond work_mem for a short
while. Is there a way we can handle this case in dense_alloc()?

Is it possible that increasing the number of batches changes the
bucket number of the tuple being inserted? If so, should we
recalculate the bucket and batch of the tuple being inserted?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: [[Parallel] Shared] Hash

Reply via email to