[PERFORM] Bad query plan with high-cardinality column

2013-02-28 Thread Alexander Staubo
I have a planner problem that looks like a bug, but I'm not familiar enough
with how the planner works to say for sure.

This is my schema:

create table comments (
  id serial primary key,
  conversation_id integer,
  created_at timestamp
);
create index comments_conversation_id_index on comments (conversation_id);
create index comments_created_at_index on comments (created_at);

The table has 3.5M rows, and 650k unique values for conversation_id, where 
the histogram goes up to 54000 rows for the most frequent ID, with a long tail. 
There are only 20 values with a frequency of 1000 or higher. The created_at 
column has 3.5M distinct values.

Now, I have this query:

select comments.id from comments where
  conversation_id = 3975979 order by created_at limit 13


This filters about 5000 rows and returns the oldest 13 rows. But the query is 
consistently planned wrong:

Limit  (cost=0.00..830.53 rows=13 width=12) (actual time=3174.862..3179.525 rows=13 loops=1)
  Buffers: shared hit=2400709 read=338923 written=21
  ->  Index Scan using comments_created_at_index on comments  (cost=0.00..359938.52 rows=5634 width=12) (actual time=3174.860..3179.518 rows=13 loops=1)
        Filter: (conversation_id = 3975979)
        Rows Removed by Filter: 2817751
        Buffers: shared hit=2400709 read=338923 written=21
Total runtime: 3179.553 ms

It takes anywhere between 3 seconds and several minutes to run, depending on 
how warm the disk cache is. This is the correct plan and index usage:

Limit  (cost=6214.34..6214.38 rows=13 width=12) (actual time=25.471..25.473 rows=13 loops=1)
  Buffers: shared hit=197 read=4510
  ->  Sort  (cost=6214.34..6228.02 rows=5471 width=12) (actual time=25.469..25.470 rows=13 loops=1)
        Sort Key: created_at
        Sort Method: top-N heapsort  Memory: 25kB
        Buffers: shared hit=197 read=4510
        ->  Index Scan using comments_conversation_id_index on comments  (cost=0.00..6085.76 rows=5471 width=12) (actual time=1.163..23.955 rows=5834 loops=1)
              Index Cond: (conversation_id = 3975979)
              Buffers: shared hit=197 read=4510
Total runtime: 25.500 ms

Now, the problem for Postgres is obviously to estimate how many rows have a
given conversation_id value, but it does have that number. I'm at a loss to
explain why it thinks scanning a huge index that covers the entire table
will ever beat scanning a small index that has 0.17% of the table's values.

It will consistently use the bad plan for higher-frequency values, and the good 
plan for lower-frequency values.

If I run ANALYZE repeatedly, the planner will sometimes, oddly enough, choose 
the correct plan. This behaviour actually seems related to 
effective_cache_size; if it's set small (128MB), the planner will sometimes 
favour the good plan, but if large (2GB) it will consistently use the bad plan.

I have bumped the statistics target up to 10000, but it does not help. I have
also tried setting n_distinct for the column manually, since Postgres guesses
it's 285k instead of 650k, but that does not help either.
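
For reference, the two adjustments above correspond to statements along these
lines (a sketch; the statistics target and n_distinct values are the figures
quoted in this thread):

   -- Raise the per-column statistics target (10000 is the maximum) and re-analyze.
   ALTER TABLE comments ALTER COLUMN conversation_id SET STATISTICS 10000;
   ANALYZE comments;

   -- Override the planner's n_distinct estimate; the override only takes
   -- effect at the next ANALYZE.
   ALTER TABLE comments ALTER COLUMN conversation_id SET (n_distinct = 650000);
   ANALYZE comments;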

What is odd is that the bad plan is never correct for this query in any
situation. It will *always* be better to branch off the
comments_conversation_id_index index.

Our environment: 9.2.2, tweaked memory parameters (work_mem, 

Re: [PERFORM] Bad query plan with high-cardinality column

2013-02-23 Thread Tom Lane
Alexander Staubo a...@bengler.no writes:
 That's right. So I created a composite index, and not only does this make the 
 plan correct, but the planner now chooses a much more efficient plan than the 
 previous index that indexed only on conversation_id:
 ...
 Is this because it can get the value of created_at from the index, or is it 
 because it can know that the index is pre-sorted, or both?

What it knows is that leading index columns that have equality
constraints are effectively don't-cares for ordering purposes.
So in general, an indexscan on an index on (x,y) will be seen to
provide the ordering required by any of these queries:

select ... order by x;
select ... order by x,y;
select ... where x = constant order by x,y;
select ... where x = constant order by y;

Your query is an example of the last pattern.  So the planner sees that
the bare indexscan, with no additional sort step, can satisfy the query,
and then its cost estimate for that with the effects of the LIMIT will
be less than for the other possible plans.  There's no need to scan and
then sort thousands of rows, and there's no need to read through a
hard-to-guess-but-certainly-large number of irrelevant index entries.
The relevant part of the index is a small number of adjacent entries
that are already in the right order.
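
Concretely, those patterns map onto this thread's composite index on
(conversation_id, created_at) as follows (a sketch; the index name is taken
from the plan Alexander posted). Each of these queries can be satisfied by a
plain scan of index_comments_on_conversation_id_and_created_at, with no
separate Sort step:

   SELECT id FROM comments ORDER BY conversation_id;                -- order by x
   SELECT id FROM comments ORDER BY conversation_id, created_at;    -- order by x, y
   SELECT id FROM comments WHERE conversation_id = 3975979
      ORDER BY conversation_id, created_at;                         -- x = constant, order by x, y
   SELECT id FROM comments WHERE conversation_id = 3975979
      ORDER BY created_at LIMIT 13;                                 -- x = constant, order by y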

regards, tom lane




[PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Alexander Staubo
I have a problem with a query that is planned wrong. This is my schema:

   create table comments (
 id serial primary key,
 conversation_id integer,
 created_at timestamp
   );
   create index comments_conversation_id_index on comments (conversation_id);
   create index comments_created_at_index on comments (created_at);

The table has 3.5M rows, and 650k unique values for conversation_id, where 
the histogram goes up to 54000 rows for the most frequent ID, with a long tail. 
There are only 20 values with a frequency of 1000 or higher. The created_at 
column has 3.5M distinct values.
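
(For the record, figures like these can be pulled straight from the table; the
queries below are a sketch and were not part of the original post:)

   -- Total rows (~3.5M) and distinct conversation_id values (~650k).
   SELECT count(*) AS total_rows,
          count(DISTINCT conversation_id) AS distinct_conversations
   FROM comments;

   -- Head of the frequency distribution: the most frequent conversation_id
   -- has ~54000 rows, and only ~20 values occur 1000 times or more.
   SELECT conversation_id, count(*) AS n
   FROM comments
   GROUP BY conversation_id
   HAVING count(*) >= 1000
   ORDER BY n DESC;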

Now, I have this query:

   select comments.id from comments where
 conversation_id = 3975979 order by created_at limit 13

This filters about 5000 rows and returns the oldest 13 rows. But the query is 
consistently planned wrong:

   Limit  (cost=0.00..830.53 rows=13 width=12) (actual time=3174.862..3179.525 rows=13 loops=1)
     Buffers: shared hit=2400709 read=338923 written=21
     ->  Index Scan using comments_created_at_index on comments  (cost=0.00..359938.52 rows=5634 width=12) (actual time=3174.860..3179.518 rows=13 loops=1)
           Filter: (conversation_id = 3975979)
           Rows Removed by Filter: 2817751
           Buffers: shared hit=2400709 read=338923 written=21
   Total runtime: 3179.553 ms

It takes anywhere between 3 seconds and several minutes to run, depending on 
how warm the disk cache is. This is the correct plan and index usage:

   Limit  (cost=6214.34..6214.38 rows=13 width=12) (actual time=25.471..25.473 rows=13 loops=1)
     Buffers: shared hit=197 read=4510
     ->  Sort  (cost=6214.34..6228.02 rows=5471 width=12) (actual time=25.469..25.470 rows=13 loops=1)
           Sort Key: created_at
           Sort Method: top-N heapsort  Memory: 25kB
           Buffers: shared hit=197 read=4510
           ->  Index Scan using comments_conversation_id_index on comments  (cost=0.00..6085.76 rows=5471 width=12) (actual time=1.163..23.955 rows=5834 loops=1)
                 Index Cond: (conversation_id = 3975979)
                 Buffers: shared hit=197 read=4510
   Total runtime: 25.500 ms


The problem for Postgres is obviously to estimate how many rows have a given 
conversation_id value, but I have confirmed that the value is correctly 
tracked in the histogram.
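
One way to check that is to look at the column's entry in the standard
pg_stats view (a sketch; this query was not in the original message):

   -- n_distinct, the most-common-values list with frequencies, and the
   -- histogram bounds collected by ANALYZE for the column.
   SELECT n_distinct, most_common_vals, most_common_freqs, histogram_bounds
   FROM pg_stats
   WHERE tablename = 'comments'
     AND attname = 'conversation_id';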

I'm at a loss to explain why the planner thinks scanning a huge index that
covers the entire table will ever beat scanning a small index that has 0.17%
of the table's values. Even if the entire database were in RAM, this would
touch far too many buffers unnecessarily. (I have determined that the planner
will consistently use the bad plan for higher-frequency values, and the good
plan for lower-frequency values.) It will *always* be better to branch off the
comments_conversation_id_index index.

Another curious thing: If I run ANALYZE repeatedly, the planner will sometimes, 
oddly enough, choose the correct plan. This behaviour actually seems related to 
effective_cache_size; if it's set small (128MB), the planner will sometimes 
favour the good plan, but if large (>= 2GB) it will consistently use the bad
plan. Not sure if ANALYZE is changing anything or if it's just bad timing.
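
The effect is easy to try in a single session, since effective_cache_size can
be changed per session (a sketch, not from the original post):

   -- effective_cache_size only influences cost estimates; SET affects just
   -- this session. Re-run with '2GB' to compare the chosen plans.
   SET effective_cache_size = '128MB';
   EXPLAIN (ANALYZE, BUFFERS)
     SELECT comments.id FROM comments
     WHERE conversation_id = 3975979
     ORDER BY created_at LIMIT 13;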

Things I have tried: I have bumped the statistics target up to 10000, but it
does not help. I have also tried setting n_distinct for the column manually,
since Postgres guesses it's 285k instead of 650k, but

Re: [PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Tom Lane
Alexander Staubo a...@bengler.no writes:
select comments.id from comments where
  conversation_id = 3975979 order by created_at limit 13

 I'm at a loss to explain why the planner thinks scanning a huge
 index that covers the entire table will ever beat scanning a small index
 that has 0.17% of the table's values.

The reason is that the LIMIT may stop the query before it's scanned all
of the index.  The planner estimates on the assumption that the desired
rows are roughly uniformly distributed within the created_at index, and
on that assumption, it looks like this query will stop fairly soon ...
but evidently, that's wrong.  On the other hand, it knows quite well
that the other plan will require pulling out 5000-some rows and then
sorting them before it can return anything, so that's not going to be
exactly instantaneous either.

In this example, I'll bet that conversation_id and created_at are pretty
strongly correlated, and that most or all of the rows with that specific
conversation_id are quite far down the created_at ordering, so that the
search through the index takes a long time to run.  OTOH, with another
conversation_id the same plan might run almost instantaneously.

If you're concerned mostly with this type of query then a 2-column index
on (conversation_id, created_at) would serve your purposes nicely.  You
could likely even dispense with the separate index on conversation_id
alone.
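
In SQL terms the suggestion amounts to something like this (a sketch; the
index name is the one Alexander ends up using later in the thread):

   -- Equality on the leading column plus ORDER BY on the trailing column can
   -- be served directly from this index, already in order, with no sort.
   CREATE INDEX index_comments_on_conversation_id_and_created_at
       ON comments (conversation_id, created_at);

   -- The single-column index on conversation_id becomes largely redundant,
   -- since the composite index can serve the same lookups.
   DROP INDEX comments_conversation_id_index;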

regards, tom lane




Re: [PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Kevin Grittner
Alexander Staubo a...@bengler.no wrote:

 This is my schema:

   create table comments (
 id serial primary key,
 conversation_id integer,
 created_at timestamp
   );
   create index comments_conversation_id_index on comments (conversation_id);
   create index comments_created_at_index on comments (created_at);

I suspect you would be better off without those two indexes, and
instead having an index on (conversation_id, created_at).  Not just
for the query you show, but in general.

   select comments.id from comments where
 conversation_id = 3975979 order by created_at limit 13

 This filters about 5000 rows and returns the oldest 13 rows. But
 the query is consistently planned wrong:

 [planner thinks it will be cheaper to read index in ORDER BY
 sequence and filter rows until it has 13 than to read 5471 rows
 and sort them to pick the top 13 after the sort.]

In my experience these problems come largely from the planner not
knowing the cost of dealing with each tuple.  I see a lot less of
this if I raise cpu_tuple_cost to something in the 0.03 to 0.05
range.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Alexander Staubo
On Friday, February 22, 2013 at 21:33, Tom Lane wrote:
 The reason is that the LIMIT may stop the query before it's scanned all
 of the index. The planner estimates on the assumption that the desired
 rows are roughly uniformly distributed within the created_at index, and
 on that assumption, it looks like this query will stop fairly soon ...
 but evidently, that's wrong. On the other hand, it knows quite well
 that the other plan will require pulling out 5000-some rows and then
 sorting them before it can return anything, so that's not going to be
 exactly instantaneous either.
 
 In this example, I'll bet that conversation_id and created_at are pretty
 strongly correlated, and that most or all of the rows with that specific
 conversation_id are quite far down the created_at ordering, so that the
 search through the index takes a long time to run. OTOH, with another
 conversation_id the same plan might run almost instantaneously.


That's right. So I created a composite index, and not only does this make the 
plan correct, but the planner now chooses a much more efficient plan than the 
previous index that indexed only on conversation_id:

Limit  (cost=0.00..30.80 rows=13 width=12) (actual time=0.042..0.058 rows=13 loops=1)
  Buffers: shared hit=8
  ->  Index Scan using index_comments_on_conversation_id_and_created_at on comments  (cost=0.00..14127.83 rows=5964 width=12) (actual time=0.039..0.054 rows=13 loops=1)
        Index Cond: (conversation_id = 3975979)
        Buffers: shared hit=8
Total runtime: 0.094 ms

Is this because it can get the value of created_at from the index, or is it 
because it can know that the index is pre-sorted, or both?

Very impressed that Postgres can use a multi-column index for this. I just 
assumed, wrongly, that it couldn't. I will have to go review my other tables 
now and see if they can benefit from multi-column indexes.

Thanks!




Re: [PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Alexander Staubo
On Friday, February 22, 2013 at 21:47, Kevin Grittner wrote:
 I suspect you would be better off without those two indexes, and
 instead having an index on (conversation_id, created_at). Not just
 for the query you show, but in general.


Indeed, that solved it, thanks!
 


 In my experience these problems come largely from the planner not
 knowing the cost of dealing with each tuple. I see a lot less of
 this if I raise cpu_tuple_cost to something in the 0.03 to 0.05
 range.


Is this something I can just frob a bit without worrying about it adversely 
impacting database performance across the board, or should I be very careful 
and do lots of testing on a staging box first?




Re: [PERFORM] Bad query plan with high-cardinality column

2013-02-22 Thread Kevin Grittner
Alexander Staubo a...@bengler.no wrote:
 On Friday, February 22, 2013 at 21:47, Kevin Grittner wrote:

 In my experience these problems come largely from the planner
 not knowing the cost of dealing with each tuple. I see a lot
 less of this if I raise cpu_tuple_cost to something in the 0.03
 to 0.05 range.

 Is this something I can just frob a bit without worrying about it
 adversely impacting database performance across the board, or
 should I be very careful and do lots of testing on a staging box
 first?

If possible, I would recommend trying it with the old indexes and
seeing whether it causes it to choose the better plan.  (Of course,
you're not going to beat the plan you get with the two-column index
for this query, but it might help it better cost the other
alternatives, which would be a clue that it makes your overall
costing model more accurate and would have a more general benefit.)
You can play with settings like this in a single session without
affecting any other sessions.
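
A minimal way to try that in one session (a sketch; it assumes the original
single-column indexes are still in place):

   -- SET affects only the current session; other connections keep the
   -- default cpu_tuple_cost of 0.01.
   SET cpu_tuple_cost = 0.03;

   EXPLAIN (ANALYZE, BUFFERS)
     SELECT comments.id FROM comments
     WHERE conversation_id = 3975979
     ORDER BY created_at LIMIT 13;

   -- Return this session to the server default when done.
   RESET cpu_tuple_cost;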

I always recommend testing a change like this in staging and
closely monitoring after deploying to production, to confirm the
overall benefit and look for any odd cases which might suffer a
performance regression.  For this particular change, I have never
seen a negative effect, but I'm sure that it's possible to have a
scenario where it isn't helpful.

Personally, I have changed this setting many times and have often
noted that 0.02 was not enough to cause choice of an optimal plan,
0.03 was always enough to do it if adjusting this setting was going
to help at all, and boosting it to 0.05 never caused further plan
changes in the cases I tested.  I have never seen such increases
cause less optimal plan choice.

If you try this, please post your results.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

