On Sun, Feb 28, 2016 at 2:06 AM, David Rowley <david.row...@2ndquadrant.com>
wrote:

> On 27 February 2016 at 11:07, James Parks <james.pa...@meraki.net> wrote:
> >
> > CREATE TABLE a (id bigint primary key, nonce bigint);
> > CREATE TABLE b (id bigint primary key, a_id bigint not null);
> > CREATE INDEX a_idx ON b (a_id);
> >
> > The query:
> >
> > SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = ? ORDER BY
> b.id
> > ASC;
> >
> > (skip down to [1] and [2] to see the query performance)
> >
> > What I know:
> >
> > If you force the query planner to use a merge join on the above query, it
> > takes 10+ minutes to complete using the data as per below. If you force
> the
> > query planner to use a hash join on the same data, it takes ~200
> > milliseconds.
>
> I believe I know what is going on here, but can you please test;
>
> SELECT b.* FROM b WHERE EXISTS (SELECT 1 FROM a ON b.a_id = a.id AND
> a.nonce = ?) ORDER BY b.id ASC;
>
> using the merge join plan.
>
>
Here's the query plan for that query (slight modifications to get it to
run):

postgres=# explain (analyze,buffers) SELECT b.* FROM b WHERE EXISTS (SELECT
1 FROM a WHERE b.a_id = a.id AND a.nonce = 64) ORDER BY b.id ASC;
                                                            QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=16639.19..16855.62 rows=86572 width=16) (actual
time=145.117..173.298 rows=83427 loops=1)
   Sort Key: b.id
   Sort Method: external merge  Disk: 2112kB
   Buffers: shared hit=86193 read=881, temp read=269 written=269
   I/O Timings: read=3.199
   ->  Merge Semi Join  (cost=795.82..8059.09 rows=86572 width=16) (actual
time=6.680..91.862 rows=83427 loops=1)
         Merge Cond: (b.a_id = a.id)
         Buffers: shared hit=86193 read=881
         I/O Timings: read=3.199
         ->  Index Scan using a_idx on b  (cost=0.00..3036.70 rows=86572
width=16) (actual time=0.005..25.193 rows=83427 loops=1)
               Buffers: shared hit=1064 read=374
               I/O Timings: read=1.549
         ->  Index Scan using a_pkey on a  (cost=0.00..26259.85 rows=891
width=8) (actual time=6.663..35.177 rows=237 loops=1)
               Filter: (nonce = 64)
               Rows Removed by Filter: 87939
               Buffers: shared hit=85129 read=507
               I/O Timings: read=1.650
 Total runtime: 186.825 ms

... so, yes, it does indeed get a lot faster


> If this performs much better then the problem is due to the merge join
> mark/restore causing the join to have to transition through many
> tuples which don't match the a.nonce = ? predicate. The mark and
> restore is not required for the rewritten query, as this use a semi
> join rather than a regular inner join. With the semi join the executor
> knows that it's only meant to be matching a single tuple in "a", so
> once the first match is found it can move to the next row in the outer
> relation without having to restore the scan back to where it started
> matching that inner row again.
>
> If I'm right, to get around the problem you could; create index on a
> (nonce, id);
>
>
postgres=# CREATE INDEX a_id_nonce_idx ON a (nonce, id);
CREATE INDEX
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id =
a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
                                                                 QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3376.57..3376.92 rows=140 width=16) (actual
time=144.340..160.875 rows=83427 loops=1)
   Sort Key: b.id
   Sort Method: external merge  Disk: 2120kB
   Buffers: shared hit=368 read=522, temp read=266 written=266
   ->  Merge Join  (cost=69.50..3371.58 rows=140 width=16) (actual
time=0.056..88.409 rows=83427 loops=1)
         Merge Cond: (b.a_id = a.id)
         Buffers: shared hit=368 read=522
         ->  Index Scan using a_idx on b  (cost=0.29..2724.58 rows=83427
width=16) (actual time=0.009..23.834 rows=83427 loops=1)
               Buffers: shared hit=171 read=518
         ->  Index Only Scan using a_id_nonce_idx on a  (cost=0.42..2450.58
rows=820 width=8) (actual time=0.041..20.776 rows=83658 loops=1)
               Index Cond: (nonce = 64)
               Heap Fetches: 83658
               Buffers: shared hit=197 read=4
 Planning time: 0.241 ms
 Execution time: 172.346 ms

Looks pretty fast to me. That being said, the number of rows returned by
the Index Only Scan seems a bit high, as compared to the results below, so
I added your patch below and got [2].


> If such an index is out of the question then a patch has been
> submitted for review which should fix this problem in (hopefully)
> either 9.6 or 9.7
> https://commitfest.postgresql.org/9/129/
> If you have a test environment handy, it would be nice if you could
> test the patch on the current git head to see if this fixes your
> problem. The findings would be quite interesting for me. Please note
> this patch is for test environments only at this stage, not for
> production use.
>

Can confirm that your patch there seems to improve performance.

HEAD [1], with patch:

postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id =
a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
                                                            QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7320.39..7320.75 rows=144 width=16) (actual
time=162.199..181.497 rows=83427 loops=1)
   Sort Key: b.id
   Sort Method: external merge  Disk: 2120kB
   Buffers: shared hit=85780 read=472, temp read=266 written=266
   ->  Merge Join  (cost=808.17..7315.23 rows=144 width=16) (actual
time=9.719..101.864 rows=83427 loops=1)
         Merge Cond: (b.a_id = a.id)
         Buffers: shared hit=85780 read=472
         ->  Index Scan using a_idx on b  (cost=0.29..2721.49 rows=83427
width=16) (actual time=0.059..26.346 rows=83427 loops=1)
               Buffers: shared hit=460 read=229
         ->  Index Scan using a_pkey on a  (cost=0.42..24394.88 rows=846
width=8) (actual time=9.651..41.231 rows=237 loops=1)
               Filter: (nonce = 64)
               Rows Removed by Filter: 87939
               Buffers: shared hit=85320 read=243
 Planning time: 0.289 ms
 Execution time: 195.071 ms

HEAD [1], without patch:

postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id =
a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
                                                             QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7285.48..7285.83 rows=140 width=16) (actual
time=710073.059..710089.924 rows=83427 loops=1)
   Sort Key: b.id
   Sort Method: external merge  Disk: 2120kB
   Buffers: shared hit=2149282200 read=243, temp read=266 written=266
   ->  Merge Join  (cost=685.28..7280.49 rows=140 width=16) (actual
time=10.022..709878.946 rows=83427 loops=1)
         Merge Cond: (b.a_id = a.id)
         Buffers: shared hit=2149282200 read=243
         ->  Index Scan using a_idx on b  (cost=0.29..2724.58 rows=83427
width=16) (actual time=0.013..59.103 rows=83427 loops=1)
               Buffers: shared hit=689
         ->  Index Scan using a_pkey on a  (cost=0.42..24404.69 rows=820
width=8) (actual time=9.998..709595.905 rows=83658 loops=1)
               Filter: (nonce = 64)
               Rows Removed by Filter: 2201063696
               Buffers: shared hit=2149281511 read=243
 Planning time: 0.297 ms
 Execution time: 710101.931 ms

Thank you for your response! This allows us to understand the situations in
which we can run into trouble, and what we can do in most, if not all
cases, to resolve.

Regards,
James

[1] HEAD is "a892234 Change the format of the VM fork to add a second bit
per page." as found on the Github mirror of postgres

[2]
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id =
a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
                                                                QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=11000.03..11001.07 rows=417 width=16) (actual
time=109.739..129.031 rows=83427 loops=1)
   Sort Key: b.id
   Sort Method: external merge  Disk: 2120kB
   Buffers: shared hit=886, temp read=266 written=266
   ->  Merge Join  (cost=0.71..10981.88 rows=417 width=16) (actual
time=0.050..55.701 rows=83427 loops=1)
         Merge Cond: (b.a_id = a.id)
         Buffers: shared hit=886
         ->  Index Scan using a_idx on b  (cost=0.29..3975.70 rows=83427
width=16) (actual time=0.017..23.647 rows=83427 loops=1)
               Buffers: shared hit=689
         ->  Index Only Scan using a_id_nonce_idx on a  (cost=0.42..6787.31
rows=2451 width=8) (actual time=0.028..0.172 rows=237 loops=1)
               Index Cond: (nonce = 64)
               Heap Fetches: 237
               Buffers: shared hit=197
 Planning time: 0.153 ms
 Execution time: 142.994 ms

Reply via email to