On 19.7.2014 20:28, Tomas Vondra wrote:
> On 19.7.2014 20:24, Tom Lane wrote:
>> Tomas Vondra <t...@fuzzy.cz> writes:
>>> I've reviewed the two test cases mentioned here, and sadly there's
>>> nothing that can be 'fixed' by this patch. The problem here lies in the
>>> planning stage, which decides to hash the large table - we can't fix
>>> that in the executor.
>> We've heard a couple reports before of the planner deciding to hash a
>> larger table rather than a smaller one.  The only reason I can think of
>> for that is if the smaller table has many more duplicates, so that the
>> planner thinks the executor might end up traversing long hash chains.
>> The planner's estimates could easily be off in this area, of course.
>> estimate_hash_bucketsize() is the likely culprit if it's wrong.
>> Which test case are you seeing this in, exactly?
> The two reported by Stephen here:
> http://www.postgresql.org/message-id/20130328235627.gv4...@tamriel.snowman.net
> Just download this (I had to gunzip it):
>   http://snowman.net/~sfrost/test_case.sql
>   http://snowman.net/~sfrost/test_case2.sql

Anyway, looking a bit at the data, I don't think the distributions are
significantly skewed. In the first test case, I get this:

test=# select cnt, count(*) from (select id_short, count(*) AS cnt from
small_table group by 1) foo group by cnt order by cnt;
 cnt | count
-----+-------
   1 |    23
   2 | 18795
   3 |    13
   4 |   726
   5 |     4
   6 |    93
   8 |     4
  10 |     1
(8 rows)

and in the second one I get this:

test=# select cnt, count(*) from (select id_short, count(*) AS cnt from
small_table group by 1) foo group by cnt order by cnt;
 cnt | count
-----+--------
   1 | 182161
   2 |   5582
   3 |    371
   4 |     28
   5 |      2
   6 |      3
   9 |      1
(7 rows)

So while #1 contains most values twice, #2 is almost perfectly distinct.
In the big_table, the values are unique.

For the first case, a WARNING at the end of estimate_hash_bucketsize
says this:

WARNING:  nbuckets=8388608.00 estfract=0.000001
WARNING:  nbuckets=65536.00 estfract=0.000267

There are ~4.3M rows in big_table, so that means ~4 tuples/bucket (the
selectivity is clamped at the 1e-6 floor), and ~11 tuples/bucket for
small_table (~41k rows).

For the second case, I get this:

WARNING:  nbuckets=16777216.00 estfract=0.000001
WARNING:  nbuckets=262144.00 estfract=0.000100

i.e. ~11 tuples/bucket for big_table and ~20 tuples/bucket for small_table.
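
For reference, estfract is a selectivity, so the expected tuples per
probed bucket is just the table's row count times estfract. A quick
sanity check of the numbers above (row counts taken from the plans
further down):

```python
# estfract is a selectivity: expected tuples in a probed hash bucket
# is simply row_count * estfract.
def tuples_per_bucket(nrows, estfract):
    return nrows * estfract

# test case 1
print(tuples_per_bucket(4272271, 1e-6))     # big_table   -> ~4.3
print(tuples_per_bucket(41176, 0.000267))   # small_table -> ~11.0
# test case 2
print(tuples_per_bucket(11755372, 1e-6))    # big_table   -> ~11.8
print(tuples_per_bucket(194587, 0.000100))  # small_table -> ~19.5
```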

That sounds really strange, because small_table in the second test case
is almost perfectly unique, and in the first test case only very few
keys have more than 2 occurrences.

By explicitly setting the selectivity in estimate_hash_bucketsize to
1e-6, I got these plans:

Test case #1

                          QUERY PLAN
 Hash Join  (cost=1229.46..79288.95 rows=41176 width=24) (actual
time=50.397..634.986 rows=13731 loops=1)
   Hash Cond: (big_table.id_short = small_table.id_short)
   ->  Seq Scan on big_table  (cost=0.00..61626.71 rows=4272271 width=4)
(actual time=0.018..229.597 rows=4272271 loops=1)
   ->  Hash  (cost=714.76..714.76 rows=41176 width=24) (actual
time=31.653..31.653 rows=41176 loops=1)
         Buckets: 65536  Batches: 1  Memory Usage: 3086kB
         ->  Seq Scan on small_table  (cost=0.00..714.76 rows=41176
width=24) (actual time=0.012..13.341 rows=41176 loops=1)
 Planning time: 0.597 ms
 Execution time: 635.498 ms
(8 rows)

Without explain analyze: 486 ms (original plan: >850ms).

Test case #2

                          QUERY PLAN
 Hash Join  (cost=5240.21..220838.44 rows=194587 width=4) (actual
time=88.252..2143.457 rows=149540 loops=1)
   Hash Cond: (big_table.id_short = small_table.id_short)
   ->  Seq Scan on big_table  (cost=0.00..169569.72 rows=11755372
width=4) (actual time=0.018..533.955 rows=11755475 loops=1)
   ->  Hash  (cost=2807.87..2807.87 rows=194587 width=4) (actual
time=87.548..87.548 rows=194587 loops=1)
         Buckets: 262144  Batches: 1  Memory Usage: 8889kB
         ->  Seq Scan on small_table  (cost=0.00..2807.87 rows=194587
width=4) (actual time=0.012..29.929 rows=194587 loops=1)
 Planning time: 0.438 ms
 Execution time: 2146.818 ms
(8 rows)

Without explain analyze: 1600 ms (original plan: >2300 ms).

The differences are fairly small: the plans are 30-40% faster, but the
absolute savings are under 1 second in both cases. I wonder whether
much longer queries could run into the same problem, and whether they
would still see a 30-40% improvement.

