Re: [PERFORM] Strange planner decision on quite simple select

2005-10-25 Thread Markus Wollny
 
Hi!

> -Ursprüngliche Nachricht-
> Von: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] Im Auftrag 
> von Richard Huxton
> Gesendet: Dienstag, 25. Oktober 2005 12:07
> An: Markus Wollny
> Cc: pgsql-performance@postgresql.org
> Betreff: Re: [PERFORM] Strange planner decision on quite simple select
> 
> Hmm - it shouldn't take that long. If I'm reading this right, 
> it's expecting to have to fetch 5606 rows to match 
> thread_id=3354253 the 20 times you've asked for. Now, what it 
> probably doesn't know is that thread_id is correlated with 
> message_id quite highly (actually, I don't know that, I'm 
> guessing). So - it starts at message_id=1 and works along, 
> but I'm figuring that it needs to reach message_id's in the 
> 3-4 million range to see any of the required thread.

Reading this I tried with adding a "AND MESSAGE_ID >= THREAD_ID" to the 
WHERE-clause, as you've guessed quite correctly, both message_id and thread_id 
are derived from the same sequence and thread_id equals the lowest message_id 
in a thread. This alone did quite a lot to improve things - I got stable 
executing times down from an average 12 seconds to a mere 2 seconds - just 
about the same as with the subselect.

> Suggestions:
> 1. Try "ORDER BY thread_id,message_id" and see if that nudges 
> things your way.
> 2. Keep #1 and try replacing the index on (thread_id) with
> (thread_id,message_id)

Did both (though adding such an index during ordinary workload took some time 
as did the VACUUM ANALYZE afterwards) and that worked like a charm - I've got 
execution times down to as little as a few milliseconds - wow! Thank you very 
much for providing such insightful hints!

Kind regards

   Markus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Strange planner decision on quite simple select

2005-10-25 Thread Richard Huxton

Markus Wollny wrote:

Hello!

I've got a table BOARD_MESSAGE (message_id int8, thread_id int8, ...)
with pk on message_id and and a non_unique not_null index on thread_id.
A count(*) on BOARD_MESSAGE currently yields a total of 1231171 rows,
the planner estimated a total of 1232530 rows in this table. I've got
pg_autovacuum running on the database and run an additional nightly
VACUUM ANALYZE over it every night.

I've got a few queries of the following type:

select  * 
from PUBLIC.BOARD_MESSAGE 
where THREAD_ID = 3354253 
order byMESSAGE_ID asc 
limit   20 
offset  0; 



There are currently roughly 4500 rows with this thread_id in
BOARD_MESSAGE. Explain-output is like so:

  QUERY PLAN 



-- 
 Limit  (cost=0.00..3927.22 rows=20 width=1148) 
   ->  Index Scan using pk_board_message on board_message
(cost=0.00..1100800.55 rows=5606 width=1148) 
 Filter: (thread_id = 3354253) 
(3 rows) 


I didn't have the patience to actually complete an explain analyze on
that one - I cancelled the query on several attempts after more than 40
minutes runtime. Now I fiddled a little with this statement and tried
nudging the planner in the right direction like so:


Hmm - it shouldn't take that long. If I'm reading this right, it's 
expecting to have to fetch 5606 rows to match thread_id=3354253 the 20 
times you've asked for. Now, what it probably doesn't know is that 
thread_id is correlated with message_id quite highly (actually, I don't 
know that, I'm guessing). So - it starts at message_id=1 and works 
along, but I'm figuring that it needs to reach message_id's in the 3-4 
million range to see any of the required thread.


Suggestions:
1. Try "ORDER BY thread_id,message_id" and see if that nudges things 
your way.
2. Keep #1 and try replacing the index on (thread_id) with 
(thread_id,message_id)


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster