* Jeff Davis (pg...@j-davis.com) wrote:
> In Stephen's case the table was only 41KB, so something still seems off.
> Maybe we should model the likelihood of a collision based on the
> cardinalities (assuming a reasonably good hash function)?
It's not really 'hash collisions' that we're trying to be wary of, per se;
it's the risk of duplicates.

To push this very far in the other direction: if you have 41k of the *same
value* in the small table, then it's currently faster to build the hash
table on the large table and then seq scan the small table (10s vs. 14s on
my laptop, which is running unplugged and therefore a bit slower than
usual).  Now, that's a pretty ridiculous case, but it seems like we're
being pretty dumb here: for every input row from the outer table, we're
looking through all 41K *duplicate* keys in that one hash bucket.

This goes back to the suggestion I just made: if the hash bucket's list
contained only unique values (so multiple entries in one bucket would only
come from actual hash collisions), then we'd only be doing *one* comparison
for each outer-table row that doesn't match, and when we found one which
*did* match, we'd only need to step through the dup list for that one entry
and blast back all of those rows, skipping the rest of the bucket, which we
know can't match.

> Also, I think I found an important assumption that seems dubious (in
> comment for estimate_hash_bucketsize()):

I've wondered about that also.  It certainly seems quite bogus that we can
end up with an 'estimated # of entries in a bucket' that's larger than the
number of entries we've found for the MCV in the table, especially *double*
that.

> Stephen, do you think this could explain your problem?

As I tried to articulate in my initial email: even if we had a *perfect*
answer to "how many comparisons will we need to do", the current costing
would cause us to pick the plan that, intuitively and empirically, takes
longer (hash the 41M row table), because that comparison cost is multiplied
by the number of outer-table rows and the cpu_tuple_cost (charged to build
the hash table) isn't high enough relative to cpu_operator_cost (charged to
do the comparisons in the buckets).

Thanks,

Stephen
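P.S.  To make the dup-list idea a bit more concrete, below is a very rough
sketch of the kind of bucket layout I have in mind.  The struct and
function names here are entirely made up for illustration; this is not the
actual nodeHash.c data structure, just one way the idea could look.

    #include <stdint.h>
    #include <stddef.h>

    /*
     * One bucket entry per *distinct* key, with every duplicate inner
     * tuple hanging off that entry's dup list.  (Illustrative only.)
     */
    typedef struct DupTuple
    {
        struct DupTuple *next;      /* next inner tuple with the same key */
        void            *tuple;     /* the stored inner tuple */
    } DupTuple;

    typedef struct BucketEntry
    {
        struct BucketEntry *next;   /* next distinct key in this bucket,
                                     * i.e. a genuine hash collision */
        uint32_t            hashvalue;
        DupTuple           *dups;   /* all inner tuples with this key */
    } BucketEntry;

    /*
     * Probing costs one key comparison per distinct key in the bucket,
     * not one per stored tuple; a matching outer row then walks only its
     * own dup list and never touches the rest of the bucket.
     */
    static DupTuple *
    probe_bucket(BucketEntry *bucket, uint32_t hashvalue,
                 int (*keys_equal)(const void *outer_key, const void *tuple),
                 const void *outer_key)
    {
        BucketEntry *entry;

        for (entry = bucket; entry != NULL; entry = entry->next)
        {
            if (entry->hashvalue == hashvalue &&
                keys_equal(outer_key, entry->dups->tuple))
                return entry->dups; /* caller emits every dup, then stops */
        }
        return NULL;
    }

With something like this, the per-probe cost would be driven by the number
of distinct keys in a bucket rather than the raw number of stored tuples,
which is also the number the costing would then want to estimate.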