Re: [PERFORM] ORDER BY Optimization

2005-05-09 Thread Derek Buttineau|Compu-SOLVE
Thanks for the response :)
> You could probably get your larger server to try the no-sort plan if
> you said set enable_sort = 0 first.  It would be interesting to
> compare the EXPLAIN ANALYZE results for that case with the other
> server.
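
For readers following along, the suggested experiment might look roughly like the sketch below. The query text is an assumption reconstructed from the table, column, and index names visible in the plans in this thread; the actual statement is not included in this message.

```sql
-- Assumed query shape, inferred from the plans; not the poster's actual statement.
SET enable_sort = off;   -- session-local; makes the planner avoid explicit Sort nodes when it can

EXPLAIN ANALYZE
SELECT m.*, mr.*
FROM maillog m
JOIN maillog_received mr ON mr.maillog_id = m.id
WHERE mr.subscription = 15245
  AND m.spam = 1
ORDER BY m.msg_date DESC
LIMIT 10;

RESET enable_sort;       -- restore the default for the rest of the session
```

Because the setting only changes cost estimates for the current session, it is useful for comparing plans but is not something to leave on in production.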
 

Odd, I went to investigate this switch on the larger server, but the 
query planner is now using the reverse index sort for this particular 
subscription.  I'm guessing it's now accumulated enough rows for the 
planner to justify the reverse sort?

Limit  (cost=0.00..14808.49 rows=10 width=299) (actual time=3.760..11.689 rows=10 loops=1)
  ->  Nested Loop  (cost=0.00..15594816.65 rows=10531 width=299) (actual time=3.750..11.600 rows=10 loops=1)
        ->  Index Scan Backward using maillog_msg_date_idx on maillog m  (cost=0.00..805268.22 rows=2454190 width=256) (actual time=0.132..5.548 rows=194 loops=1)
              Filter: (spam = 1)
        ->  Index Scan using maillog_received_subscription_maillog_id_idx on maillog_received mr  (cost=0.00..6.01 rows=1 width=43) (actual time=0.020..0.021 rows=0 loops=194)
              Index Cond: ((mr.subscription = 89) AND (mr.maillog_id = outer.id))
Total runtime: 11.878 ms

I decided to try the same query with enable_sort on and off to see 
roughly how much of a difference it made:

With enable_sort = 1:
Limit  (cost=7515.77..7515.79 rows=10 width=299) (actual time=13153.300..13153.412 rows=10 loops=1)
  ->  Sort  (cost=7515.77..7516.26 rows=196 width=299) (actual time=13153.288..13153.324 rows=10 loops=1)
        Sort Key: m.msg_date
        ->  Nested Loop  (cost=0.00..7508.30 rows=196 width=299) (actual time=0.171..13141.099 rows=853 loops=1)
              ->  Index Scan using maillog_received_subscription_idx on maillog_received mr  (cost=0.00..4266.90 rows=1069 width=43) (actual time=0.095..5240.645 rows=993 loops=1)
                    Index Cond: (subscription = 15245)
              ->  Index Scan using maillog_pkey on maillog m  (cost=0.00..3.02 rows=1 width=256) (actual time=7.893..7.902 rows=1 loops=993)
                    Index Cond: (outer.maillog_id = m.id)
                    Filter: (spam = 1)
Total runtime: 13153.812 ms

With enable_sort = 0:
Limit  (cost=0.00..795580.99 rows=10 width=299) (actual time=108.345..3801.446 rows=10 loops=1)
  ->  Nested Loop  (cost=0.00..15593387.49 rows=196 width=299) (actual time=108.335..3801.352 rows=10 loops=1)
        ->  Index Scan Backward using maillog_msg_date_idx on maillog m  (cost=0.00..805194.97 rows=2453965 width=256) (actual time=0.338..3338.096 rows=15594 loops=1)
              Filter: (spam = 1)
        ->  Index Scan using maillog_received_subscription_maillog_id_idx on maillog_received mr  (cost=0.00..6.01 rows=1 width=43) (actual time=0.020..0.020 rows=0 loops=15594)
              Index Cond: ((mr.subscription = 15245) AND (mr.maillog_id = outer.id))
Total runtime: 3801.676 ms

In comparison, here is the query plan on the smaller server (it used a 
sort for this subscription rather than a reverse scan):

Limit  (cost=197.37..197.38 rows=6 width=313) (actual time=883.576..883.597 rows=10 loops=1)
  ->  Sort  (cost=197.37..197.38 rows=6 width=313) (actual time=883.571..883.577 rows=10 loops=1)
        Sort Key: m.msg_date
        ->  Nested Loop  (cost=0.00..197.29 rows=6 width=313) (actual time=106.334..873.928 rows=47 loops=1)
              ->  Index Scan using maillog_received_subscription_idx on maillog_received mr  (cost=0.00..109.17 rows=28 width=41) (actual time=47.289..389.775 rows=58 loops=1)
                    Index Cond: (subscription = 15245)
              ->  Index Scan using maillog_pkey on maillog m  (cost=0.00..3.13 rows=1 width=272) (actual time=8.319..8.322 rows=1 loops=58)
                    Index Cond: (outer.maillog_id = m.id)
                    Filter: (spam = 1)
Total runtime: 883.820 ms

> The contents of the pg_stats row for mr.subscription in each server
> would be informative, too.
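
A query along the following lines should retrieve the requested row. This is a minimal sketch; it assumes the table is visible under the name maillog_received (as in the plans above) and that only one schema contains a table by that name.

```sql
-- Planner statistics for the subscription column of maillog_received.
SELECT null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs,
       histogram_bounds
FROM pg_stats
WHERE tablename = 'maillog_received'
  AND attname = 'subscription';
```

The most_common_vals and most_common_freqs arrays are the part to look at for a specific subscription value: if the value is absent from the list, the planner falls back to a generic estimate based on n_distinct.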

I've increased the statistics targets to 300, so these rows are pretty 
bulky; however, I've attached them to this message as text files 
(pg_stats_large.txt and pg_stats_small.txt).
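
For reference, one way to raise a statistics target to 300 for a single column is sketched below. The message does not say whether the target was raised per column or through default_statistics_target, so treat the per-column form here as an assumption.

```sql
-- Assumed per-column approach; the poster may have changed default_statistics_target instead.
ALTER TABLE maillog_received
    ALTER COLUMN subscription SET STATISTICS 300;

-- The new target only takes effect once the table is analyzed again.
ANALYZE maillog_received;
```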

> One rowcount estimate that does look
> wrong is
>   ->  Index Scan using maillog_received_subscription_idx on maillog_received mr  (cost=0.00..17789.73 rows=4479 width=43) (actual time=0.030..33554.061 rows=65508 loops=1)
>         Index Cond: (subscription = 89)
>
> so the stats row is suggesting there are only 4479 rows with
> subscription = 89 when really there are 65508.  (The preceding
> discussion hopefully makes it clear why this is a potentially critical
> mistake.)

This could potentially make sense on the larger server (if my 
understanding of the vacuum process is correct).  The regular 
maintenance of the large server (which is currently the only one being 
updated regularly) does a VACUUM ANALYZE once per day, a scheduled 
VACUUM once per hour, and autovacuum for the remainder of the time (which 
might be overkill).  With the function of these tables, it is 

Re: [PERFORM] [SQL] ORDER BY Optimization

2005-05-06 Thread Derek Buttineau|Compu-SOLVE
Thanks for the response :)
> That's 50-ish ms versus 80-odd seconds.
> It seems to me a merge join might be more appropriate here than a
> nestloop. What's your work_mem set at?  Off-the-cuff numbers show the
> dataset weighing in the sub-ten mbyte range.
> Provided it's not already at least that big, and you don't want to up
> it permanently, try saying:
> SET work_mem = 10240; -- 10 mbytes
 

It's currently set at 16 MB. I've also tried upping sort_mem, without 
any noticeable impact on the uncached query. :(

> immediately before running this query (uncached, of course) and see
> what happens.
> Also, your row-count estimates look pretty off-base.  When were these
> tables last VACUUMed or ANALYZEd?
 

I'm not entirely sure what's up with the row-count estimates. The tables 
are updated quite frequently (and VACUUM is also run quite frequently); 
however, I had just run a VACUUM ANALYZE on both databases before running 
the explain.

I'm also still baffled by the differences in the plans between the two 
servers. For comparison, on the one that uses the index to sort, I get a 
nestloop of:

Nested Loop  (cost=0.00..1175943.99 rows=1814 width=311) (actual time=25.337..26.867 rows=10 loops=1)

The plan that the live server is using seems fairly inefficient.
Derek