On Wed, Mar 1, 2017 at 8:48 AM, Andres Freund <and...@anarazel.de> wrote:
>> BTW, another option to consider is lowering the target fillfactor.
>> IIRC, Kuntal mentioned to me that cranking it down seemed to fix the
>> issue. Obviously, it's better to detect when we need a lower
>> fillfactor than to always use a lower one, but obviously the tighter
>> you pack the hash table, the more likely it is that you're going to
>> have these kinds of problems.
>
> Yea, that'd be one approach, but I feel better dynamically increasing
> the fillfactor like in the patch. Even with a lower fillfactor you could
> see issues, and as you say a higher fillfactor is nice [TM]; after the
> patch I played with *increasing* the fillfactor, without being able to
> measure a downside.

I've tested with different fill factors, and the query completed with a
fill factor of 0.6.
With linear probing, performance degrades quickly at high fill factors
because of primary clustering: a tendency for one collision to cause more
collisions nearby. So, increasing the fill factor further doesn't seem
like a good idea. Besides, when I tested with the patch, I saw the hash
table grow even when only 10-12% of it was filled:

LOG:  size: 8388608, members: 845056, filled: 0.100739, total chain: 735723, max chain: 37, avg chain: 0.870620, total_collisions: 219101, max_collisions: 6, avg_collisions: 0.259274
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 16777216, members: 845056, filled: 0.050369, total chain: 197047, max chain: 6, avg chain: 0.233176, total_collisions: 121185, max_collisions: 5, avg_collisions: 0.143405
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 16777216, members: 2127649, filled: 0.126818, total chain: 1471327, max chain: 35, avg chain: 0.691527, total_collisions: 486310, max_collisions: 6, avg_collisions: 0.228567
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;
LOG:  size: 33554432, members: 2127649, filled: 0.063409, total chain: 413073, max chain: 7, avg chain: 0.194145, total_collisions: 265766, max_collisions: 5, avg_collisions: 0.124911
STATEMENT:  explain analyze select l_orderkey from lineitem group by l_orderkey;

This seems bad: we're wasting a lot of memory in the hash table.

Apart from that, I was thinking about the following logic in the code:

/*
 * If the bucket is not empty, we either found a match (in which case
 * we're done), or we have to decide whether to skip over or move the
 * colliding entry. When the colliding element's distance to its
 * optimal position is smaller than the to-be-inserted entry's, we
 * shift the colliding entry (and its followers) forward by one.
 */

I'm worried about the fact that we are increasing the chain length of
each follower.
It may happen that the chain length of the last element increases by a
large factor. So, I was looking for alternatives and found one called
Robin Hood hashing. In this approach, when you probe for a position to
insert a new element, if the probe length of the existing element is less
than the current probe length of the element being inserted, you swap the
two elements and keep going. This keeps the variance of the probe lengths
low across all entries, rather than penalizing the last one. Has this
approach already been considered?

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers