Re: [PERFORM] query against large table not using sensible index to find very small amount of data

Shaun Thomas Tue, 08 Apr 2014 06:41:02 -0700

> Other possibly relevant pieces of information...  The entity type
> column has a cardinality in the neighborhood of a couple dozen.
> Meanwhile, for some of the entity types there is a large and ongoing
> number of events, and for other entity types there is a smaller and
> more sporadic number of events.  Every now and again a new entity
> type shows up.


With that as the case, I have two questions for you:

1. Why do you have a low cardinality column as the first column in an index?
2. Do you have any queries at all that only use the entity type as the only 
where clause?

I agree that the planner is probably wrong here, but these choices aren't 
helping. The low cardinality of the first column causes very large buckets that 
don't limit results very well at all. Combined with the order-by clause, the 
planner really wants to walk the date index backwards to find results instead. 
I would do a couple of things.

First, remove the type/date index. Next, do a count of each type in the table 
with something like this:

SELECT type_id, count(1)
  FROM my_table
 GROUP BY 2

Any type that is more than 20% of the table will probably never be useful in an 
index. At this point, you have a choice. You can create a new index with date 
and type *in that order* or create a new partial index on date and type that 
also ignores the top matches. For instance, if you had a type that was 90% of 
the values, this would be my suggestion:

CREATE INDEX idx_foo_table_date_event_type_part ON foo_table (event_date, 
event_type)
 WHERE event_type != 14;

Or whatever. If the IDs are basically evenly distributed, it won't really 
matter.

In any case, index order matters. The planner wants to restrict data as quickly 
as possible. If you provide an order clause, it wants to read the index in that 
order. Your specified type as the first column disrupts that, so it has to 
fetch the values first, which is usually more expensive. Even if that's wrong 
in your particular case, planner stats are not precise enough to know that.

Either way, try moving the indexes around. I can't think of many indexes in our 
database where I have the low cardinality value as the first column. Databases 
have an easier time managing many shallow buckets of values, than a few deep 
ones.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
312-676-8870
[email protected]

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] query against large table not using sensible index to find very small amount of data

Reply via email to