I'm trying to understand how I can get the planner to always do the
right thing with this query:

        , bb_t
        , cc_t
    WHERE bb_t.bb_id = aa_t.bb_id
        AND aa_t.realm_id = cc_t.realm_id
        AND aa_t.server_id = 21
    ORDER BY aa_t.min_date_time desc
    LIMIT 1
    OFFSET 674

There's a extreme elbow in the performance curve around the sum of
LIMIT and OFFSET.  The two plans follow.  First for the query above:

 Limit  (cost=21569.56..21601.56 rows=1 width=84) (actual time=59.60..59.69 rows=1 
   ->  Nested Loop  (cost=0.00..110535.66 rows=3454 width=84) (actual time=0.19..59.20 
rows=676 loops=1)
         ->  Nested Loop  (cost=0.00..93177.46 rows=3454 width=65) (actual 
time=0.14..44.41 rows=676 loops=1)
               ->  Index Scan Backward using aa_t20 on aa_t  (cost=0.00..76738.77 
rows=3454 width=46) (actual time=0.10..31.30 rows=676 loops=1)
                     Filter: (server_id = 21::numeric)
               ->  Index Scan using cc_t1 on cc_t  (cost=0.00..4.75 rows=1 width=19) 
(actual time=0.01..0.01 rows=1 loops=676)
                     Index Cond: ("outer".realm_id = cc_t.realm_id)
         ->  Index Scan using bb_t1 on bb_t  (cost=0.00..5.01 rows=1 width=19) (actual 
time=0.02..0.02 rows=1 loops=676)
               Index Cond: (bb_t.bb_id = "outer".bb_id)
 Total runtime: 59.89 msec
(10 rows)

Setting OFFSET to 675 in the above query, results in this 100 times
slower plan:

 Limit  (cost=21614.48..21614.48 rows=1 width=84) (actual time=4762.39..4762.39 rows=1 
   ->  Sort  (cost=21612.79..21621.42 rows=3454 width=84) (actual 
time=4761.45..4761.92 rows=677 loops=1)
         Sort Key: aa_t.min_date_time
         ->  Merge Join  (cost=21139.96..21409.80 rows=3454 width=84) (actual 
time=4399.80..4685.24 rows=41879 loops=1)
               Merge Cond: ("outer".bb_id = "inner".bb_id)
               ->  Sort  (cost=8079.83..8184.53 rows=41879 width=19) (actual 
time=936.99..967.37 rows=41879 loops=1)
                     Sort Key: bb_t.bb_id
                     ->  Seq Scan on bb_t  (cost=0.00..4864.79 rows=41879 width=19) 
(actual time=0.06..729.60 rows=41879 loops=1)
               ->  Sort  (cost=13060.13..13068.76 rows=3454 width=65) (actual 
time=3462.76..3493.97 rows=41879 loops=1)
                     Sort Key: aa_t.bb_id
                     ->  Merge Join  (cost=12794.42..12857.14 rows=3454 width=65) 
(actual time=2923.62..3202.78 rows=41879 loops=1)
                           Merge Cond: ("outer".realm_id = "inner".realm_id)
                           ->  Sort  (cost=12762.78..12771.41 rows=3454 width=46) 
(actual time=2920.78..2950.87 rows=41879 loops=1)
                                 Sort Key: aa_t.realm_id
                                 ->  Index Scan using aa_t5 on aa_t  
(cost=0.00..12559.79 rows=3454 width=46) (actual time=0.18..2589.22 rows=41879 loops=1)
                                       Index Cond: (server_id = 21::numeric)
                           ->  Sort  (cost=31.64..32.78 rows=455 width=19) (actual 
time=2.54..33.12 rows=42163 loops=1)
                                 Sort Key: cc_t.realm_id
                                 ->  Seq Scan on cc_t  (cost=0.00..11.55 rows=455 
width=19) (actual time=0.04..0.86 rows=455 loops=1)
 Total runtime: 4792.84 msec
(20 rows)

Twiddling effective_cache_size and random_page_cost allows for a large
LIMIT+OFFSET number but not enough.  These tests are made with 400000
effective_cache_size and random_page_cost of 4.

I can increase the LIMIT+OFFSET elbow to 1654 by changing the
query thusly:

< AND aa_t.server_id = 21
> AND aa_t.server_id IN (21, 0)

The value 0 is an invalid server_id, so I know it won't be returned.
However, I've got 41K rows that could be returned by this query and
growing, and 1654 is obviously not enough.  (aa is 690K rows, bb is
41K rows, and cc is 500 rows.)

If I drop the ORDER BY, the query goes much faster, but the query is
useless without the ORDER BY.

I've figured out that the second plan is slow, because it is writing a
huge result set to disk (+200MB).  This doesn't make sense to me,
since sort_mem is 32000.

Is there a way to tell the optimizer to use Nested Loop plan always
instead of the Merge/Join plan?  Turning off enable_mergejoin is
obviously not an option.


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Reply via email to