Thomas Munro <> writes:
> I have been wondering about a couple of different worst case execution
> strategies that would be better than throwing our hands up and
> potentially exploding memory once we detect that further partitioning
> is not going to help, if we still manage to reach that case despite
> adding stats-based defences like this due to statistics being missing,
> bad or confused by joins below us.

Yeah, it would definitely be nice if we could constrain the runtime
space consumption better.

> 1.  We could probe the fraction of the hash table that we have managed
> to load into work_mem so far and then rewind the outer batch and do it
> again for the next work_mem-sized fraction of the inner batch and so
> on.  For outer joins we'd need to scan for unmatched tuples after each
> hash table refill.

I do not understand how that works for a left join?  You'd need to track
whether a given outer tuple has been matched in any one of the fractions
of the inner batch, so that when you're done with the batch you could know
which outer tuples need to be emitted null-extended.  Right now we only
need to track that state for the current outer tuple, but in a rescan
situation we'd have to remember it for each outer tuple in the batch.

Perhaps it could be done by treating the outer batch file as read/write
and incorporating a state flag in each tuple; or to reduce write volume,
maintaining a separate outer batch file parallel to the main one with just
a bool or even just a bit per outer tuple.  Seems messy though.

                        regards, tom lane

Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Reply via email to