Stephen Frost <sfr...@snowman.net> writes:
> * Jeff Davis (pg...@j-davis.com) wrote:
>> In Stephen's case the table was only 41KB, so something still seems off.
>> Maybe we should model the likelihood of a collision based on the
>> cardinalities (assuming a reasonably good hash function)?
> It's not really 'hash collisions' that we're trying to be wary of, per
> se, it's the risk of duplicates.

I spent some time looking at this.  I think the real issue is that the
code is trying to charge something close to cpu_operator_cost for each
time an outer tuple is compared to a hashtable entry.  That was actually
reasonable back when the costing code was last touched --- the current
scheme of storing and comparing the hash codes before testing the
equality operator proper only dates to commit
849074f9ae422c64501bb1d53ef840de870bf65c.  I punted on changing the cost
estimates at that time, and I think what this example is showing is that
that was a bad decision.  Really, when we're traipsing down a bucket
list, skipping over bucket entries with the wrong hash code is just
about free, or at least it's a whole lot cheaper than applying ExecQual.

Perhaps what we should do is charge the hash_qual_cost only for some
small multiple of the number of tuples that we expect will *pass* the
hash quals, which is a number we have to compute anyway.  The multiple
would represent the rate of hash-code collisions we expect.

I'd still be inclined to charge something per bucket entry, but it
should be really small, perhaps on the order of 0.01 times
cpu_operator_cost.

Or we could just drop that term entirely.  It strikes me that the reason
to be worried about a skewed distribution in the inner relation is not
really that it changes the run time much, but rather that it risks
blowing out work_mem if specific virtual buckets have too many members
(or at best, we're forced to increase the number of batches more than we
expected in order to stay under work_mem, which carries runtime costs of
its own).  Maybe what we should be doing with the bucketsize numbers is
estimating peak memory consumption to gate whether we'll accept the plan
at all, rather than adding terms to the cost estimate.

			regards, tom lane
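
For concreteness, a minimal standalone sketch of the charging scheme
described above might look like this: charge hash_qual_cost only for the
rows expected to pass the hash quals, scaled by an assumed hash-code
collision rate, plus a very small per-bucket-entry charge.  The function
names, the 1% collision-rate guess, and the 0.01 multiplier are
illustrative assumptions, not costsize.c identifiers or settled values.

/*
 * Illustrative sketch only; compiles standalone and is not a patch
 * against costsize.c.  All names and constants are placeholders.
 */
#include <stdio.h>

static const double cpu_operator_cost = 0.0025;

/* assumed multiples; both would need real justification */
static const double HASH_CODE_COLLISION_RATE = 1.01;	/* ~1% extra qual evals */
static const double BUCKET_ENTRY_COST_FACTOR = 0.01;	/* per entry skipped */

/*
 * outer_rows: probe-side tuples
 * entries_per_probe: expected inner tuples examined per probe
 * matching_rows: tuples expected to actually pass the hash quals
 * hash_qual_cost: per-tuple cost of evaluating the join quals
 */
static double
probe_cost_current(double outer_rows, double entries_per_probe,
				   double hash_qual_cost)
{
	/* roughly the behavior complained about above: charge close to the
	 * full per-comparison cost for every bucket entry visited */
	return hash_qual_cost * outer_rows * entries_per_probe;
}

static double
probe_cost_proposed(double outer_rows, double entries_per_probe,
					double matching_rows, double hash_qual_cost)
{
	/* proposed: qual cost only for rows expected to pass (times a small
	 * collision-rate multiple), plus a near-free charge for walking past
	 * entries whose stored hash codes don't match */
	return hash_qual_cost * matching_rows * HASH_CODE_COLLISION_RATE +
		cpu_operator_cost * BUCKET_ENTRY_COST_FACTOR *
		outer_rows * entries_per_probe;
}

int
main(void)
{
	/* a skewed inner side: each probe wades through ~10 bucket entries
	 * but still produces only ~1 match */
	double		outer_rows = 1000000.0;
	double		entries_per_probe = 10.0;
	double		matching_rows = 1000000.0;
	double		hash_qual_cost = 0.0025;

	printf("current-style estimate:  %.1f\n",
		   probe_cost_current(outer_rows, entries_per_probe, hash_qual_cost));
	printf("proposed-style estimate: %.1f\n",
		   probe_cost_proposed(outer_rows, entries_per_probe,
							   matching_rows, hash_qual_cost));
	return 0;
}

With these made-up numbers the current style charges the qual cost
against all ten entries per probe (25000 units), while the proposed
style charges it only against the expected matches plus a token
bucket-walking term (about 2525 + 250 = 2775 units), reflecting the
point that skipping entries with mismatched hash codes is nearly free.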
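
A similarly hedged sketch of the other idea, using the bucketsize
numbers to gate on estimated peak memory rather than folding skew into
the cost estimate.  The inputs and the arithmetic are deliberately
simplified assumptions; the key point is that tuples sharing one hash
value all land in the same batch, so a heavily skewed bucket sets a
floor on peak memory no matter how many batches we use.

#include <stdbool.h>
#include <stdio.h>

/*
 * Illustrative gate only; not the planner's actual bookkeeping.
 */
static bool
hashjoin_fits_in_work_mem(double inner_rows,
						  double largest_bucket_fraction,	/* e.g. from MCV stats */
						  double tuple_width_bytes,
						  int num_batches,
						  double work_mem_bytes)
{
	/* expected in-memory footprint of one batch of the inner relation */
	double		batch_bytes = inner_rows * tuple_width_bytes / num_batches;

	/* a heavily skewed virtual bucket cannot be split across batches,
	 * so it bounds peak memory from below regardless of num_batches */
	double		skew_bytes = inner_rows * largest_bucket_fraction *
		tuple_width_bytes;

	double		peak_bytes = (skew_bytes > batch_bytes) ? skew_bytes : batch_bytes;

	return peak_bytes <= work_mem_bytes;
}

int
main(void)
{
	/* 1M inner rows, 20% of them in one bucket, 64-byte tuples,
	 * 8 batches, 4MB work_mem: the skewed bucket alone blows the budget */
	bool		ok = hashjoin_fits_in_work_mem(1000000.0, 0.20, 64.0, 8,
											   4 * 1024.0 * 1024.0);

	printf("plan %s\n", ok ? "accepted" : "rejected (would exceed work_mem)");
	return 0;
}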