On Friday, February 22, 2013 at 21:33 , Tom Lane wrote:
> The reason is that the LIMIT may stop the query before it's scanned all
> of the index. The planner estimates on the assumption that the desired
> rows are roughly uniformly distributed within the created_at index, and
> on that assumption, it looks like this query will stop fairly soon ...
> but evidently, that's wrong. On the other hand, it knows quite well
> that the other plan will require pulling out 5000-some rows and then
> sorting them before it can return anything, so that's not going to be
> exactly instantaneous either.
> 
> In this example, I'll bet that conversation_id and created_at are pretty
> strongly correlated, and that most or all of the rows with that specific
> conversation_id are quite far down the created_at ordering, so that the
> search through the index takes a long time to run. OTOH, with another
> conversation_id the same plan might run almost instantaneously.


That's right. So I created a composite index, and not only does this make the 
plan correct, but the planner now chooses a much more efficient plan than the 
previous index that indexed only on "conversation_id":

    Limit  (cost=0.00..30.80 rows=13 width=12) (actual time=0.042..0.058 
rows=13 loops=1)                                                                
                   
      Buffers: shared hit=8                                                     
                                                                                
            
      ->  Index Scan using index_comments_on_conversation_id_and_created_at on 
comments  (cost=0.00..14127.83 rows=5964 width=12) (actual time=0.039..0.054 
rows=13 loops=1)
            Index Cond: (conversation_id = 3975979)                             
                                                                                
            
            Buffers: shared hit=8                                               
                                                                                
            
    Total runtime: 0.094 ms                                                     
                                                                                
            


Is this because it can get the value of "created_at" from the index, or is it 
because it can know that the index is pre-sorted, or both?

Very impressed that Postgres can use a multi-column index for this. I just 
assumed, wrongly, that it couldn't. I will have to go review my other tables 
now and see if they can benefit from multi-column indexes.

Thanks!


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to