* Tom Lane (t...@sss.pgh.pa.us) wrote: > I think the point is that there may *not* be few hash collisions ...
Right, but that's actually all entirely based on concerns over there being duplicates (hence why we consider the MCVs and ndistinct), which makes *some* sense given that we currently have a single linked list in each bucket into which any dups are placed.

It occurs to me that we could get away from this by having a 2-level system: we hash to a bucket which contains a linked list of linked lists. The bucket-level list would be for actual hash collisions (which we don't consider at all in the current bucket estimating), and each of those entries would head a linked list of the duplicates for that key. A rough sketch of what I mean is at the end of this mail.

This would add some small cost to building the hash table, since we would have to step through each entry in the bucket to determine whether the entry being added is new or not, but not very much unless we're worried about lots of collisions happening (which I certainly hope isn't the case). Hash tables are generally expected to take more effort to build up front anyway, so I don't see a problem putting more logic there.

Also, when the input is known to be unique we could skip checking each entry in the bucket entirely and just add the new entry, since it can't match any existing one. We could still work through the bucketing logic and add some costing for the case where we are hashing on only one key of a multi-key join and expect a lot of duplicates to exist. I'm not sure how often that happens though; I would hope that we use a composite hash key most of the time that we have multi-key joins using hash tables.

Thoughts?

Thanks,

Stephen
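
For illustration, here's a rough sketch of the list-of-lists bucket layout and the "input known unique" shortcut I'm describing. The struct and function names are made up for the example; this isn't the actual nodeHash.c code, just the shape of the idea.

    /*
     * Rough sketch only: hypothetical structs/functions, not the real
     * hash join data structures in src/backend/executor/nodeHash.c.
     */
    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct DupTuple             /* one tuple in a chain of duplicates */
    {
        struct DupTuple *next;
        void       *tuple;              /* stand-in for the real tuple */
    } DupTuple;

    typedef struct BucketEntry          /* one distinct key in the bucket */
    {
        struct BucketEntry *next;       /* next distinct key (true collision) */
        unsigned int hashvalue;
        DupTuple   *dups;               /* all tuples sharing this key */
    } BucketEntry;

    typedef struct Bucket
    {
        BucketEntry *entries;           /* list of distinct keys, hopefully short */
    } Bucket;

    /* would recheck the actual join key(s), since hash values alone can collide */
    extern bool keys_equal(void *a, void *b);

    static void
    bucket_insert(Bucket *bucket, unsigned int hashvalue, void *tuple,
                  bool input_known_unique)
    {
        BucketEntry *entry = NULL;
        DupTuple   *dup;

        if (!input_known_unique)
        {
            /* the extra build-time cost: scan the bucket for this key */
            for (entry = bucket->entries; entry != NULL; entry = entry->next)
            {
                if (entry->hashvalue == hashvalue &&
                    keys_equal(entry->dups->tuple, tuple))
                    break;
            }
        }
        /* else: the new tuple can't match any existing entry, so skip the scan */

        if (entry == NULL)
        {
            /* new distinct key: add a bucket-level entry for it */
            entry = malloc(sizeof(BucketEntry));
            entry->hashvalue = hashvalue;
            entry->dups = NULL;
            entry->next = bucket->entries;
            bucket->entries = entry;
        }

        /* hang the tuple off that key's duplicate chain */
        dup = malloc(sizeof(DupTuple));
        dup->tuple = tuple;
        dup->next = entry->dups;
        entry->dups = dup;
    }

The probe side would then only compare its key against one representative tuple per bucket entry, and on a match could return everything on that entry's dup chain without re-checking the keys, which is where the win over the current single per-bucket list would come from.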