[PERFORM] Strange planner decision on quite simple select
Hello! I've got a table BOARD_MESSAGE (message_id int8, thread_id int8, ...) with pk on message_id and and a non_unique not_null index on thread_id. A count(*) on BOARD_MESSAGE currently yields a total of 1231171 rows, the planner estimated a total of 1232530 rows in this table. I've got pg_autovacuum running on the database and run an additional nightly VACUUM ANALYZE over it every night. I've got a few queries of the following type: select * from PUBLIC.BOARD_MESSAGE where THREAD_ID = 3354253 order byMESSAGE_ID asc limit 20 offset 0; There are currently roughly 4500 rows with this thread_id in BOARD_MESSAGE. Explain-output is like so: QUERY PLAN -- Limit (cost=0.00..3927.22 rows=20 width=1148) - Index Scan using pk_board_message on board_message (cost=0.00..1100800.55 rows=5606 width=1148) Filter: (thread_id = 3354253) (3 rows) I didn't have the patience to actually complete an explain analyze on that one - I cancelled the query on several attempts after more than 40 minutes runtime. Now I fiddled a little with this statement and tried nudging the planner in the right direction like so: explain analyze select * from (select * from PUBLIC.BOARD_MESSAGE where THREAD_ID = 3354253 order byMESSAGE_ID asc ) as foo limit 20 offset 0; QUERY PLAN - Limit (cost=8083.59..8083.84 rows=20 width=464) (actual time=1497.455..1498.466 rows=20 loops=1) - Subquery Scan foo (cost=8083.59..8153.67 rows=5606 width=464) (actual time=1497.447..1498.408 rows=20 loops=1) - Sort (cost=8083.59..8097.61 rows=5606 width=1148) (actual time=1497.326..1497.353 rows=20 loops=1) Sort Key: message_id - Index Scan using nidx_bm_thread_id on board_message (cost=0.00..7734.54 rows=5606 width=1148) (actual time=0.283..1431.752 rows=4215 loops=1) Index Cond: (thread_id = 3354253) Total runtime: 1502.138 ms Now this is much more like it. As far as I interpret the explain output, in the former case the planner decides to just sort the whole table with it's 1.2m rows by it's primary key on message_id and then filters out the few thousand rows matching the requested thread_id. In the latter case, it selects the few thousand rows with the matching thread_id _first_ and _then_ sorts them according to their message_id. The former attempt involves sorting of more than a million rows and then filtering through the result, the latter just uses the index to retrieve a few thousand rows and sorts those - which is much more efficient. What's more puzzling is that the results vary somewhat depending on the overall load situation. When using the first approach without the subselect, sometimes the planner chooses exactly the same plan as it does with the second approach - with equally satisfying results in regard to total execution time; sometimes it does use the first plan and does complete with a very acceptable execution time, too. But sometimes (when overall load is sufficiently high, I presume) it just runs and runs for minutes on end - I've had this thing running for more than one hour on several occasions until I made some changes to my app which limits the maximum execution time for a query to no more than 55 seconds. With this IMHO quite ugly subselect-workaround, performance is reproducably stable and sufficiently good under either load, so I chose to stick with it for the time being - but I'd still like to know if I could have done anything to have the planner choose the evidently better plan for the first query without such a workaround? Kind regards Markus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Strange planner decision on quite simple select
Markus Wollny wrote: Hello! I've got a table BOARD_MESSAGE (message_id int8, thread_id int8, ...) with pk on message_id and and a non_unique not_null index on thread_id. A count(*) on BOARD_MESSAGE currently yields a total of 1231171 rows, the planner estimated a total of 1232530 rows in this table. I've got pg_autovacuum running on the database and run an additional nightly VACUUM ANALYZE over it every night. I've got a few queries of the following type: select * from PUBLIC.BOARD_MESSAGE where THREAD_ID = 3354253 order byMESSAGE_ID asc limit 20 offset 0; There are currently roughly 4500 rows with this thread_id in BOARD_MESSAGE. Explain-output is like so: QUERY PLAN -- Limit (cost=0.00..3927.22 rows=20 width=1148) - Index Scan using pk_board_message on board_message (cost=0.00..1100800.55 rows=5606 width=1148) Filter: (thread_id = 3354253) (3 rows) I didn't have the patience to actually complete an explain analyze on that one - I cancelled the query on several attempts after more than 40 minutes runtime. Now I fiddled a little with this statement and tried nudging the planner in the right direction like so: Hmm - it shouldn't take that long. If I'm reading this right, it's expecting to have to fetch 5606 rows to match thread_id=3354253 the 20 times you've asked for. Now, what it probably doesn't know is that thread_id is correlated with message_id quite highly (actually, I don't know that, I'm guessing). So - it starts at message_id=1 and works along, but I'm figuring that it needs to reach message_id's in the 3-4 million range to see any of the required thread. Suggestions: 1. Try ORDER BY thread_id,message_id and see if that nudges things your way. 2. Keep #1 and try replacing the index on (thread_id) with (thread_id,message_id) -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Strange planner decision on quite simple select
Hi! -Ursprüngliche Nachricht- Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Im Auftrag von Richard Huxton Gesendet: Dienstag, 25. Oktober 2005 12:07 An: Markus Wollny Cc: pgsql-performance@postgresql.org Betreff: Re: [PERFORM] Strange planner decision on quite simple select Hmm - it shouldn't take that long. If I'm reading this right, it's expecting to have to fetch 5606 rows to match thread_id=3354253 the 20 times you've asked for. Now, what it probably doesn't know is that thread_id is correlated with message_id quite highly (actually, I don't know that, I'm guessing). So - it starts at message_id=1 and works along, but I'm figuring that it needs to reach message_id's in the 3-4 million range to see any of the required thread. Reading this I tried with adding a AND MESSAGE_ID = THREAD_ID to the WHERE-clause, as you've guessed quite correctly, both message_id and thread_id are derived from the same sequence and thread_id equals the lowest message_id in a thread. This alone did quite a lot to improve things - I got stable executing times down from an average 12 seconds to a mere 2 seconds - just about the same as with the subselect. Suggestions: 1. Try ORDER BY thread_id,message_id and see if that nudges things your way. 2. Keep #1 and try replacing the index on (thread_id) with (thread_id,message_id) Did both (though adding such an index during ordinary workload took some time as did the VACUUM ANALYZE afterwards) and that worked like a charm - I've got execution times down to as little as a few milliseconds - wow! Thank you very much for providing such insightful hints! Kind regards Markus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match