On 10/02/2016 02:17 AM, Andres Freund wrote:
> ...

I think a crucial part of the benchmarking will be identifying and
measuring corner cases, e.g. a lot of rows with the same key, etc.
Although that probably is not a major issue for the two places
switched to the new implementation (e.g. execGrouping merges the
duplicates to a single group, by definition).

Hm. I don't really see a case where that's going to cause issues - all
the execGrouping.c users only store one key, and either ignore later
ones, or add the data to the initial tuple in the group.  I don't really
see any case where repeated keys should cause an issue for a hash table?

Yep, that's pretty much what I suggested. I was wondering more about the
other places where this hash table might be used - I've been thinking about
hashjoins, but now that I think of it - that's a fairly specialized and
tuned code. In any case, let's not complicate the discussion for now.

My point is that I don't really see any potential usecase where
that's an issue.

OK, I think we agree the two places modified by the submitted patch series are fine. Let's leave discussion about places modified by future patches for the future ;-)

A few comments, after quickly skimming through the first two patches:

1) SH_CREATE does this:

        /* round up size to the next power of 2, eases lookups */
        if (size < 2)
                size = 2;
                size = sh_pow2_int(size);

I very much doubt a hash table with 2 buckets is very useful. What about
using some reasonable lower boundary - IIRC hashjoins use 1024 buckets,
which might be a bit too much, but perhaps 256 would work?

For some cases that'll waste a good bit of space. Consider
e.g. catcache.c. Callers can (and do) specify more appropriate sizes for
the individual uses.

Hmmm, OK. I have my doubts about those hardcoded constants, but you're right 256 is probably an overkill.


Do we expect to shrink hash tables?

Not at the moment. Should be doable, but I'm not sure it's worth
developing and testing - I don't see any usecases in pg atm.

OK, sure. Then what about adding this to SH_RESIZE?

Assert(oldsize <= newsize);

I assumed we might shrink the catcache after invalidation, for example (but maybe I don't really know how that works).

It's not entirely clear why is it guaranteed that there's always
an element with optimal position (when load factor < 1)? Adding an
 explanation or a link to a paper would be helpful, I guess.

As the load factor always has to be below 1, there always has to be
an empty bucket. Every bucket after an empty bucket has to be at
it's optimal position.

Hmmm, thanks to moving the entries after delete? Adding an assert() enforcing this seems like a good idea.

7) Should hash agg size estimation for the planner consider the fillfactor?

I think we should account for fillfactor - we should account for the
allocated memory, even if there's free space in the hash table. After all,
that's what hashjoins do.

We didn't really for dynahash (as it basically assumed a fillfactor of 1
all the time), not sure if changing it won't also cause issues.

That's a case of glass half-full vs. half-empty, I guess. If we assumed load factor 1.0, then I see it as accounting for load factor (while you see it as not accounting of it).

I don't see why this should cause issues - of course, the hash table may be a bit larger, exceed work_mem and make the tidbitmap lossy, or switch HashAggregate to GroupAggregate. But I don't think that's a major issue, as those decisions depend on estimates anyway, so we can't really guarantee them.

Also, with load factor 0.8 the table is only ~20% larger, so even if we don't include that into work_mem it's a reasonably small error (easily smaller than errors in the pre-9.5 HashJoin accounting, which did not include chunk headers etc.).

So I won't fight for this, but I don't see why not to account for it.


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to