I have a planner problem that looks like a bug, but I'm not familiar enough
with how the planner works to say for sure.

This is my schema:

    create table comments (
      id serial primary key,
      conversation_id integer,
      created_at timestamp
    );
    create index comments_conversation_id_index on comments (conversation_id);
    create index comments_created_at_index on comments (created_at);

The table has 3.5M rows and 650k unique values for "conversation_id". The
frequency histogram tops out at about 54,000 rows for the most frequent ID,
with a long tail; only 20 values occur with a frequency of 1,000 or higher.
The "created_at" column has 3.5M distinct values.
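
For reference, the numbers above come from pg_stats, roughly like this:

    select n_distinct, most_common_vals, most_common_freqs
    from pg_stats
    where tablename = 'comments' and attname = 'conversation_id';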

Now, I have this query:

    select comments.id from comments where
      conversation_id = 3975979 order by created_at limit 13


This matches about 5,000 rows and returns the oldest 13. But the query is
consistently planned wrong:

    Limit  (cost=0.00..830.53 rows=13 width=12) (actual time=3174.862..3179.525 rows=13 loops=1)
      Buffers: shared hit=2400709 read=338923 written=21
      ->  Index Scan using comments_created_at_index on comments  (cost=0.00..359938.52 rows=5634 width=12) (actual time=3174.860..3179.518 rows=13 loops=1)
            Filter: (conversation_id = 3975979)
            Rows Removed by Filter: 2817751
            Buffers: shared hit=2400709 read=338923 written=21
    Total runtime: 3179.553 ms

It takes anywhere between 3 seconds and several minutes to run, depending on 
how warm the disk cache is. This is the correct plan and index usage:

    Limit  (cost=6214.34..6214.38 rows=13 width=12) (actual time=25.471..25.473 rows=13 loops=1)
      Buffers: shared hit=197 read=4510
      ->  Sort  (cost=6214.34..6228.02 rows=5471 width=12) (actual time=25.469..25.470 rows=13 loops=1)
            Sort Key: created_at
            Sort Method: top-N heapsort  Memory: 25kB
            Buffers: shared hit=197 read=4510
            ->  Index Scan using comments_conversation_id_index on comments  (cost=0.00..6085.76 rows=5471 width=12) (actual time=1.163..23.955 rows=5834 loops=1)
                  Index Cond: (conversation_id = 3975979)
                  Buffers: shared hit=197 read=4510
    Total runtime: 25.500 ms

Now, the problem for Postgres is obviously estimating how many rows have a
given "conversation_id" value, but it does have that number. I'm at a loss to
explain why it thinks scanning a huge index that covers the entire table will
ever beat scanning a small index that has 17% of the table's values.
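
If I understand the costing right (my assumption; I haven't checked the
source), the Limit node simply takes the full created_at index scan cost and
scales it by the fraction of rows it expects to need, which presumes the
matching rows are spread uniformly through that index:

    -- where the bad plan's Limit cost appears to come from:
    -- full scan cost, scaled by the 13 rows wanted out of 5634 estimated
    select 359938.52 * 13 / 5634;  -- ≈ 830.53, the Limit cost above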

It will consistently use the bad plan for higher-frequency values, and the good 
plan for lower-frequency values.

If I run ANALYZE repeatedly, the planner will sometimes, oddly enough, choose 
the correct plan. This behaviour actually seems related to 
effective_cache_size; if it's set small (128MB), the planner will sometimes 
favour the good plan, but if large (2GB) it will consistently use the bad plan.
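
In case anyone wants to reproduce this, the flip shows up with plain
session-level SETs (a sketch, using the values mentioned above):

    set effective_cache_size = '128MB';  -- sometimes yields the good plan
    explain select comments.id from comments
      where conversation_id = 3975979 order by created_at limit 13;

    set effective_cache_size = '2GB';    -- consistently yields the bad plan
    explain select comments.id from comments
      where conversation_id = 3975979 order by created_at limit 13;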

I have bumped the statistics target up to 10000, but it does not help. I have
also tried setting n_distinct for the column manually, since Postgres guesses
it's 285k instead of 650k, but that does not help either.
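
Concretely, these were the commands, give or take (the figures match the ones
above):

    alter table comments alter column conversation_id set statistics 10000;
    alter table comments alter column conversation_id set (n_distinct = 650000);
    analyze comments;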

What is odd is that the bad plan is never correct for this query in any
situation: it will *always* be better to start from the
"comments_conversation_id_index" index.

Our environment: PostgreSQL 9.2.2 with tweaked memory parameters (work_mem,
sort_mem, shared_buffers, effective_cache_size); we have not touched the
planner/cost parameters. The problem also exists on 9.2.3.

(Please CC me as I'm not subscribed to the list. Thanks.) 



