Re: [PERFORM] Strange planner decision on quite simple select
Hi! > -Ursprüngliche Nachricht- > Von: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Im Auftrag > von Richard Huxton > Gesendet: Dienstag, 25. Oktober 2005 12:07 > An: Markus Wollny > Cc: pgsql-performance@postgresql.org > Betreff: Re: [PERFORM] Strange planner decision on quite simple select > > Hmm - it shouldn't take that long. If I'm reading this right, > it's expecting to have to fetch 5606 rows to match > thread_id=3354253 the 20 times you've asked for. Now, what it > probably doesn't know is that thread_id is correlated with > message_id quite highly (actually, I don't know that, I'm > guessing). So - it starts at message_id=1 and works along, > but I'm figuring that it needs to reach message_id's in the > 3-4 million range to see any of the required thread. Reading this I tried with adding a "AND MESSAGE_ID >= THREAD_ID" to the WHERE-clause, as you've guessed quite correctly, both message_id and thread_id are derived from the same sequence and thread_id equals the lowest message_id in a thread. This alone did quite a lot to improve things - I got stable executing times down from an average 12 seconds to a mere 2 seconds - just about the same as with the subselect. > Suggestions: > 1. Try "ORDER BY thread_id,message_id" and see if that nudges > things your way. > 2. Keep #1 and try replacing the index on (thread_id) with > (thread_id,message_id) Did both (though adding such an index during ordinary workload took some time as did the VACUUM ANALYZE afterwards) and that worked like a charm - I've got execution times down to as little as a few milliseconds - wow! Thank you very much for providing such insightful hints! Kind regards Markus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Strange planner decision on quite simple select
Markus Wollny wrote: Hello! I've got a table BOARD_MESSAGE (message_id int8, thread_id int8, ...) with pk on message_id and and a non_unique not_null index on thread_id. A count(*) on BOARD_MESSAGE currently yields a total of 1231171 rows, the planner estimated a total of 1232530 rows in this table. I've got pg_autovacuum running on the database and run an additional nightly VACUUM ANALYZE over it every night. I've got a few queries of the following type: select * from PUBLIC.BOARD_MESSAGE where THREAD_ID = 3354253 order byMESSAGE_ID asc limit 20 offset 0; There are currently roughly 4500 rows with this thread_id in BOARD_MESSAGE. Explain-output is like so: QUERY PLAN -- Limit (cost=0.00..3927.22 rows=20 width=1148) -> Index Scan using pk_board_message on board_message (cost=0.00..1100800.55 rows=5606 width=1148) Filter: (thread_id = 3354253) (3 rows) I didn't have the patience to actually complete an explain analyze on that one - I cancelled the query on several attempts after more than 40 minutes runtime. Now I fiddled a little with this statement and tried nudging the planner in the right direction like so: Hmm - it shouldn't take that long. If I'm reading this right, it's expecting to have to fetch 5606 rows to match thread_id=3354253 the 20 times you've asked for. Now, what it probably doesn't know is that thread_id is correlated with message_id quite highly (actually, I don't know that, I'm guessing). So - it starts at message_id=1 and works along, but I'm figuring that it needs to reach message_id's in the 3-4 million range to see any of the required thread. Suggestions: 1. Try "ORDER BY thread_id,message_id" and see if that nudges things your way. 2. Keep #1 and try replacing the index on (thread_id) with (thread_id,message_id) -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster