On Tue, Jun 10, 2014 at 1:43 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Tue, Jun 10, 2014 at 5:17 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> The problem case is when you have 1 batch and the increased memory
>> consumption causes you to switch to 2 batches.  That's expensive.  It
>> seems clear based on previous testing that *on the average*
>> NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an
>> increase in the number of batches it will be much worse - particularly
>> because the only way we ever increase the number of batches is to
>> double it, which is almost always going to be a huge loss.
> Is there a reason we don't do hybrid hashing, where if 80% fits in memory
> than we write out only the 20% that doesn't? And then when probing the table
> with the other input, the 80% that land in in-memory buckets get handled
> immediately, and only the 20 that land in the on-disk buckets get written
> for the next step?

We have an optimization that is a little bit like that.  The "skew"
hash join stuff tries to (essentially) ensure that the MCVs are in the
first batch.

But more could probably be done.  For example, suppose we have 256
buckets.  If the hash table overflows work_mem, we could write the
contents of *one bucket* out to disk, rather than (as we currently do)
half of the table.  If we overflow again, we write another bucket.
When the number of buckets written reaches half the total, we split
all of the remaining buckets so that all 256 slots are once again
active.  Repeat as needed.

If something like that worked out, it would drastically reduce the
penalty for slightly overrunning work_mem.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to