On Sunday, June 23, 2013, Simon Riggs wrote:
> On 23 June 2013 03:16, Stephen Frost <sfr...@snowman.net>
> wrote:
>
> > Will think on it more.
>
> Some other thoughts related to this...
>
> * Why are we building a special kind of hash table? Why don't we just
> use the hash table code that we use in every other place in the backend.
> If that code is so bad why do we use it everywhere else? That is
> extensible, so we could try just using that. (Has anyone actually
> tried?)

I've not looked at the hash table in the rest of the backend.

> * We're not thinking about cache locality and set correspondence
> either. If the join is expected to hardly ever match, then we should
> be using a bitmap as a bloom filter rather than assuming that a very
> large hash table is easily accessible.

That's what I was suggesting earlier, though I don't think it's
technically a Bloom filter; doesn't that require multiple hash functions?
I don't think we want to require every data type to provide multiple
hash functions.

> * The skew hash table will be hit frequently and would show good L2
> cache usage. I think I'll try adding the skew table always to see if
> that improves the speed of the hash join.

The skew table is just for common values, though... To be honest, I
have some doubts about that structure really being a terribly good
approach for anything which is completely in memory.

Thanks,

Stephen
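
For illustration, here is a minimal C sketch of the bitmap pre-filter discussed above, assuming each join key has already been hashed once to a uint32 by its data type's hash function. It is not PostgreSQL code; BitmapFilter, mix32 and the filter_* functions are hypothetical names. With k = 1 it is exactly the single-hash bitmap described in the thread. For k > 1 it derives the extra bit positions from that one hash value by double hashing (h1 + i*h2, the Kirsch and Mitzenmacher technique), remixing it with a generic integer finalizer, so no data type would need to supply more than one hash function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical probe-side pre-filter for a hash join (not PostgreSQL code).
 * Inner-side key hashes are added; an outer tuple whose hash fails the
 * filter cannot match, so the hash table lookup can be skipped. */
typedef struct {
    uint64_t *words;
    uint32_t nbits;   /* must be a power of two */
    int k;            /* bit positions per key; k = 1 is a plain bitmap */
} BitmapFilter;

/* Murmur3 32-bit finalizer, used only to derive a second hash value
 * from the single hash the data type provides. */
static uint32_t mix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

static BitmapFilter *filter_create(uint32_t nbits, int k) {
    assert(nbits != 0 && (nbits & (nbits - 1)) == 0 && k >= 1);
    BitmapFilter *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->nbits = nbits;
    f->k = k;
    f->words = calloc((nbits + 63) / 64, sizeof(uint64_t));
    if (f->words == NULL) {
        free(f);
        return NULL;
    }
    return f;
}

static void filter_free(BitmapFilter *f) {
    free(f->words);
    free(f);
}

/* i-th bit position for a key: h1 + i*h2, masked to the bitmap size. */
static uint32_t bit_pos(const BitmapFilter *f, uint32_t hash, int i) {
    uint32_t h2 = mix32(hash) | 1u;   /* odd step, so positions vary with i */
    return (hash + (uint32_t) i * h2) & (f->nbits - 1);
}

/* Called for every inner-side (build) key hash. */
static void filter_add(BitmapFilter *f, uint32_t hash) {
    for (int i = 0; i < f->k; i++) {
        uint32_t p = bit_pos(f, hash, i);
        f->words[p / 64] |= UINT64_C(1) << (p % 64);
    }
}

/* false: key is certainly absent from the inner side.
 * true: key may be present, so probe the real hash table. */
static bool filter_maybe_contains(const BitmapFilter *f, uint32_t hash) {
    for (int i = 0; i < f->k; i++) {
        uint32_t p = bit_pos(f, hash, i);
        if ((f->words[p / 64] & (UINT64_C(1) << (p % 64))) == 0)
            return false;
    }
    return true;
}
```

The filter never produces false negatives, so it is only a cheap early reject for outer tuples. Whether it pays off depends on the join rarely matching, as Simon suggests, and on the bitmap being small enough to stay in cache.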