On 09/24/2015 05:09 PM, Robert Haas wrote:
On Thu, Sep 24, 2015 at 9:49 AM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
So while it does not introduce behavior change in this particular
case (because it fails, as you point out), it introduces a behavior
change in general - it simply triggers behavior that does not
happen below the limit. Would we accept the change if the proposed
limit was 256MB, for example?

So, I'm a huge fan of arbitrary limits.

That's probably the single thing I'll say this year that sounds most
like a troll, but it isn't. I really, honestly believe that.
Doubling things is very sensible when they are small, but at some
point it ceases to be sensible. The fact that we can't set a
black-and-white threshold as to when we've crossed over that line
doesn't mean that there is no line. We can't imagine that the
occasional 32GB allocation when 4GB would have been optimal is no
more problematic than the occasional 32MB allocation when 4MB would
have been optimal. Where exactly to put the divider is subjective,
but "what palloc will take" is not an obviously unreasonable
barometer.

Consider two machines: one with 32GB of RAM and work_mem=2GB, the other with 256GB of RAM and work_mem=16GB. The machines are hosting about the same data, just scaled accordingly (~8x more data on the large machine).

Let's assume there's a significant over-estimate - we expect to get about 10x the actual number of tuples, and the hash table is expected to almost exactly fill work_mem. Using the 1:3 ratio of buckets to entries (as in the query at the beginning of this thread) we'll use ~512MB and ~4GB for the buckets, and the rest is for entries.

Thanks to the 10x over-estimate, ~64MB and 512MB would be enough for the buckets, so we're wasting ~448MB (13% of RAM) on the small machine and ~3.5GB (~1.3%) on the large machine.

How does it make any sense to address the 1.3% and not the 13%?
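
For anyone who wants to re-check the bucket arithmetic above, here is a rough sketch. It is only an illustration, not the executor's actual sizing code: the helper names are made up, and rounding the required bucket array up to a power of two is an assumption used to explain how ~51MB becomes ~64MB.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def next_pow2(n):
    """Smallest power of two >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def bucket_waste(work_mem, overestimate=10):
    """Return (allocated, needed, wasted) bucket bytes.

    Assumes a 1:3 buckets-to-entries split, so buckets get 1/4 of
    work_mem, and that the bucket array is rounded up to a power of
    two (an assumption made for this sketch).
    """
    allocated = work_mem // 4
    # Ceiling division: bytes needed if the estimate had been accurate.
    needed = next_pow2(-(-allocated // overestimate))
    return allocated, needed, allocated - needed


for work_mem in (2 * GB, 16 * GB):
    allocated, needed, wasted = bucket_waste(work_mem)
    print(work_mem // GB, "GB work_mem:",
          allocated // MB, "MB allocated,",
          needed // MB, "MB needed,",
          wasted // MB, "MB wasted")
```

The only inputs are the two work_mem values and the 10x over-estimate from the example; everything else is derived from them.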


Of course, if we can postpone sizing the hash table until after the
input size is known, as you suggest, then that would be better still
(but not back-patch material).

This dynamic resize is 9.5-only anyway.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)