On 09/11/2015 06:55 PM, Robert Haas wrote:
On Wed, Sep 9, 2015 at 11:54 AM, Tomas Vondra
Secondly, we limit the number of buckets to INT_MAX, so about 16GB (because
buckets are just pointers). No matter how awful estimate you get (or how
insanely high you set work_mem) you can't exceed this.
OK, so this is an interesting point, and I think it clarifies things.
Essentially, we're arguing about whether a 16GB limit is as good as a
512MB limit. Right now, if we would have allocated more than 512MB,
we instead fail. There are two possible solutions:
1. I'm arguing for maintaining the 512MB limit, but by clamping the
allocation to 512MB (and the number of buckets accordingly) so that it
works with fewer buckets instead of failing.
2. You're arguing for removing the 512MB limit, allowing an initial
allocation of up to 16GB.
I'm arguing for fixing the existing bug, and then addressing the case of
over-estimation separately, with proper analysis.
My judgement is that #2 could give some people a nasty surprise, in
that such a large initial allocation might cause problems, especially
if driven by a bad estimate. Your judgement is that this is unlikely
to be a problem, and that the performance consequences of limiting a
hash join to an initial allocation of 64 million buckets rather than 2
billion buckets are the thing to worry about.
Not quite, my judgment is that
- We shouldn't address this in this particular bugfix, because it's a
separete problem (even if we limit the initial allocation, we still
have to fix the repalloc after we build the Hash).
- I assume the "might cause problems" refers to malloc() issues on some
platforms. In that case we still have to apply it to both places, not
just to the initial allocation. I don't know if this is a problem (I
haven't heard any such reports until now), but if it is we better
address this consistently everywhere, not just this one place.
- I'm not really sure about the impact of the additional resize. I
surely don't want to significantly penalize the well-estimated cases,
so I'd like to see some numbers first.
I guess we'll need to wait for some other opinions.
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: