* Tom Lane (t...@sss.pgh.pa.us) wrote:
Stephen Frost sfr...@snowman.net writes:
I'm certainly curious about those, but I'm also very interested in the
possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable
depending on the work_mem setting.
Not sure about that. That
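As a rough illustration of the idea of tying the tuples-per-bucket target to work_mem (a hypothetical sketch only, not nodeHash.c code or any patch from this thread), the target could shrink whenever the build side fits in work_mem with plenty of slack, keeping bucket chains short:

    #include <stddef.h>

    /* Default tuples-per-bucket target, as in nodeHash.c at the time. */
    #define NTUP_PER_BUCKET 10

    /*
     * Hypothetical sketch only -- not PostgreSQL code or a patch from this
     * thread.  Let the tuples-per-bucket target shrink when the build side
     * fits in work_mem with room to spare, so bucket chains stay short.
     */
    static int
    choose_ntup_per_bucket(double ntuples, size_t tuple_width, size_t work_mem_bytes)
    {
        size_t  data_bytes = (size_t) (ntuples * tuple_width);

        if (data_bytes * 4 < work_mem_bytes)
            return 1;               /* lots of slack: aim for one tuple per bucket */
        if (data_bytes * 2 < work_mem_bytes)
            return 4;
        return NTUP_PER_BUCKET;     /* default behaviour */
    }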
In this example, hashing the large table is actually 2 seconds *faster*
than hashing the small table (again, all on my laptop).
Are you running the laptop on battery? When I benchmarked pgsql last
time I used my laptop as well, and it only occurred to me after a lot of
trying that laptops
* Matthias (nitrogen...@gmail.com) wrote:
In this example, hashing the large table is actually 2 seconds *faster*
than hashing the small table (again, all on my laptop).
Are you running the laptop on battery? When I benchmarked pgsql
last time I used my laptop as well, and it only occurred
Tom, all,
* Tom Lane (t...@sss.pgh.pa.us) wrote:
So that's at least going in the right direction.
I agree that this is going in the right direction; it certainly would
make the plan that I *expect* to be chosen more likely, however..
I've been fiddling with this on the very much larger overall
Stephen Frost sfr...@snowman.net writes:
I've been fiddling with this on the very much larger overall database
where this test case came from and have found that hashing the large
table can actually be *faster* and appears to cause a more consistent
and constant amount of disk i/o (which is
* Tom Lane (t...@sss.pgh.pa.us) wrote:
What I'm trying to get at in this overall email is: why in the world is
it so expensive to do hash lookups?
perf or oprofile reveal anything?
Working on a test case actually- I've got one now:
http://snowman.net/~sfrost/test_case2.sql
In this
* Tom Lane (t...@sss.pgh.pa.us) wrote:
perf or oprofile reveal anything?
Here's what we get from oprofile (perhaps not too surprising):
Hash the small table / scan the big table:
samples  cum. samples  %        cum. %    linenr info     image name      symbol name
167374
* Stephen Frost (sfr...@snowman.net) wrote:
85041    252415        24.3624  72.3115   mcount.c:60     libc-2.15.so    __mcount_internal
28370    280785         8.1274  80.4389   _mcount.S:33    libc-2.15.so    mcount
[...]
And as a side-note, I'm
* Stephen Frost (sfr...@snowman.net) wrote:
It does look like reducing bucket depth, as I outlined before through
the use of a 2-level hashing system, might help speed up
ExecScanHashBucket, as it would hopefully have very few (eg: 1-2)
entries to consider instead of more. Along those same
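A hypothetical sketch of that two-level bucket idea (an illustration of the description above, not an actual patch): each top-level bucket is split into a few sub-chains keyed on otherwise-unused high bits of the hash value, so a probe walks only a fraction of the bucket's entries.

    #include <stdint.h>

    #define NUM_SUBBUCKETS 8        /* must be a power of two */

    typedef struct HashEntry
    {
        uint32_t            hashvalue;
        struct HashEntry   *next;
        /* ... tuple data would follow here ... */
    } HashEntry;

    typedef struct Bucket
    {
        HashEntry  *sub[NUM_SUBBUCKETS];
    } Bucket;

    /* Low bits already chose the bucket, so use high bits for the sub-chain. */
    static inline unsigned
    subbucket_of(uint32_t hashvalue)
    {
        return (hashvalue >> 24) & (NUM_SUBBUCKETS - 1);
    }

    /* Return the next entry whose hash code matches exactly; the caller
     * still has to re-check the join quals on it. */
    static HashEntry *
    scan_bucket(const Bucket *bucket, uint32_t hashvalue, HashEntry *prev)
    {
        HashEntry  *e;

        e = prev ? prev->next : bucket->sub[subbucket_of(hashvalue)];
        for (; e != NULL; e = e->next)
        {
            if (e->hashvalue == hashvalue)
                return e;
        }
        return NULL;
    }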
Stephen Frost sfr...@snowman.net writes:
Looking with opannotate, there's two main hotspots in
ExecScanHashBucket:
12846  17.4001 :    hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];
and
22100  29.9348 :    hashTuple = hashTuple->next;
Those are, of course, pretty trivial
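For context, the loop those two lines sit in looks roughly like this (a simplified paraphrase of ExecScanHashBucket from memory, not the exact source); nearly all of its cost is pointer chasing, since each hop down a long chain is typically a cache miss:

    /* Simplified paraphrase, not the exact 9.2 source. */
    if (hashTuple != NULL)
        hashTuple = hashTuple->next;                              /* resume mid-chain */
    else
        hashTuple = hashtable->buckets[hjstate->hj_CurBucketNo];  /* the 17.4% line */

    while (hashTuple != NULL)
    {
        if (hashTuple->hashvalue == hashvalue)
        {
            /* hash codes match: only now are the join quals evaluated */
            if (ExecQual(hjclauses, econtext, false))
            {
                hjstate->hj_CurTuple = hashTuple;
                return true;
            }
        }
        hashTuple = hashTuple->next;                              /* the 29.9% line */
    }
    return false;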
On Thu, Apr 04, 2013 at 04:16:12PM -0400, Stephen Frost wrote:
* Stephen Frost (sfr...@snowman.net) wrote:
It does look like reducing bucket depth, as I outlined before through
the use of a 2-level hashing system, might help speed up
ExecScanHashBucket, as it would hopefully have very few
On Sun, 2013-03-31 at 15:45 -0400, Tom Lane wrote:
Really, when we're traipsing down a bucket
list, skipping over bucket entries with the wrong hash code is just
about free, or at least it's a whole lot cheaper than applying ExecQual.
Perhaps what we should do is charge the hash_qual_cost
On Mon, Apr 1, 2013 at 2:35 AM, Jeff Davis pg...@j-davis.com wrote:
On Sun, 2013-03-31 at 15:45 -0400, Tom Lane wrote:
Really, when we're traipsing down a bucket
list, skipping over bucket entries with the wrong hash code is just
about free, or at least it's a whole lot cheaper than applying
I wrote:
Perhaps what we should do is charge the hash_qual_cost only for some
small multiple of the number of tuples that we expect will *pass* the
hash quals, which is a number we have to compute anyway. The multiple
would represent the rate of hash-code collisions we expect.
I tried the
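To make the arithmetic concrete, here is a rough sketch of that costing idea (a hypothetical illustration, not the actual cost_hashjoin() code nor the patch being tested in this thread):

    /*
     * Hypothetical illustration only.  Charge the hash quals for the rows
     * we expect to pass them, times a small multiple representing the
     * expected rate of hash-code collisions, rather than for every tuple
     * that merely lands in the same bucket.
     */
    static double
    hash_qual_run_cost(double qual_cost_per_tuple,
                       double expected_matching_rows,   /* rows passing the quals */
                       double collision_multiple)       /* small, e.g. 1.0 .. 2.0 */
    {
        return qual_cost_per_tuple * expected_matching_rows * collision_multiple;
    }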
Stephen Frost sfr...@snowman.net writes:
* Jeff Davis (pg...@j-davis.com) wrote:
In Stephen's case the table was only 41KB, so something still seems off.
Maybe we should model the likelihood of a collision based on the
cardinalities (assuming a reasonably good hash function)?
It's not really
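One way to put numbers on that suggestion (a back-of-the-envelope sketch assuming a reasonably good 32-bit hash; not anything from the thread's patches):

    /*
     * Back-of-the-envelope sketch of a cardinality-based collision model.
     * With a good 32-bit hash, an outer key that has no true match still
     * shares a hash code with some inner value with probability of roughly
     * ndistinct_inner / 2^32, so for realistic cardinalities spurious qual
     * evaluations from pure hash collisions are negligible.
     */
    static double
    estimated_spurious_match_rate(double ndistinct_inner)
    {
        return ndistinct_inner / 4294967296.0;      /* 2^32 possible hash codes */
    }

Even with four million distinct inner values that works out to roughly a 0.1% chance per probe, which supports the point that pure hash-code collisions are not where the time is going.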
Jeff,
* Jeff Davis (pg...@j-davis.com) wrote:
On Thu, 2013-03-28 at 19:56 -0400, Stephen Frost wrote:
41K hashed, seqscan 4M: 115030.10 + 1229.46 = 116259.56
4M hashed, seqscan 41K: 1229.46 + 211156.20 = 212385.66
I think those are backwards -- typo?
Yes, sorry, those are backwards.
* Tom Lane (t...@sss.pgh.pa.us) wrote:
I think the point is that there may *not* be few hash collisions ...
Right, but that's actually all entirely based on concerns over there
being duplicates (hence why we consider the MCVs and ndistinct), which
makes *some* sense given that we currently have
* Jeff Davis (pg...@j-davis.com) wrote:
In Stephen's case the table was only 41KB, so something still seems off.
Maybe we should model the likelihood of a collision based on the
cardinalities (assuming a reasonably good hash function)?
It's not really 'hash collisions' that we're trying to be
On Thu, 2013-03-28 at 19:56 -0400, Stephen Frost wrote:
41K hashed, seqscan 4M: 115030.10 + 1229.46 = 116259.56
4M hashed, seqscan 41K: 1229.46 + 211156.20 = 212385.66
I think those are backwards -- typo?
In the end, I think the problem here is that we are charging far too
much for
Jeff Davis pg...@j-davis.com writes:
Yes, I have run into this issue (or something very similar). I don't
understand why the bucketsize even matters much -- assuming few hash
collisions, we are not actually evaluating the quals any more times than
necessary. So why all of the hashjoin-specific
On Fri, 2013-03-29 at 16:37 -0400, Tom Lane wrote:
Jeff Davis pg...@j-davis.com writes:
Yes, I have run into this issue (or something very similar). I don't
understand why the bucketsize even matters much -- assuming few hash
collisions, we are not actually evaluating the quals any more