On Mon, 2008-09-08 at 09:16 -0700, Matt Smiley wrote:
> Hi Kiran,
>
> You gave great info on your problem.
>
> First, is this the query you're actually trying to speed up, or is it a
> simplified version? It looks like the optimizer has already chosen the best
> execution plan for the given query. Since the query has no joins, we only
> have to consider access paths. You're fetching 58221/37909009 = 0.15% of the
> rows, so a sequential scan is clearly inappropriate. A basic index scan is
> likely to incur extra scattered I/O, so a bitmap index scan is favored.
Thanks for your analysis, and sorry for the long silence.
It's a simplified version. I was tackling this part of the original query
plan because I saw that I got inaccurate stats on one of the tables.
>
> To improve on this query's runtime, you could try any of the following:
>
> - Reorganize the data to reduce this query's scattered I/O (i.e. cluster on
> "paliasorigin_search3_idx" rather than "paliasorigin_alias_casefold_idx").
> Bear in mind, this may adversely affect other queries.
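> A minimal sketch (CLUSTER ... USING is the 8.3 syntax; it takes an
> exclusive lock while it rewrites the table, and an ANALYZE afterwards
> refreshes the correlation stats):
> cluster paliasorigin using paliasorigin_search3_idx;
> analyze paliasorigin;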
I applied this to a different table, which solved my original problem!
The query was hitting statement_timeout but now runs in reasonable
time. I re-clustered one of the tables in my actual query on a more
appropriate index.
>
> - Increase the cache hit frequency by ensuring the underlying filesystem
> cache has plenty of RAM (usually the case under Linux) and checking that
> other concurrent queries aren't polluting the cache. Consider adding RAM if
> you think the working set of blocks required by most queries is larger than
> the combined Postgres and filesystem caches. If processes other than the
> database do I/O on this machine, count them as resource consumers, too.
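> A rough way to check the hit ratio from inside Postgres (this only sees
> the Postgres buffer cache, not the filesystem cache):
> select sum(heap_blks_hit) * 100.0
> / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) as pct_hit
> from pg_statio_user_tables;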
>
> - Restructure the table, partitioning along a column that would be useful
> for pruning whole partitions for your painful queries. In this case,
> origin_id or tax_id seems like a good bet, but again, consider other queries
> against this table. 38 million rows probably makes your table around 2 GB
> (guessing about 55 bytes/row). Depending on the size and growth rate of the
> table, it may be time to consider partitioning. Out of curiosity, what
> runtime are you typically seeing from this query? The explain-analyze ran in
> 113 ms, which I'm guessing is the effect of caching, not the runtime you're
> trying to improve.
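> A minimal sketch of the inheritance-style partitioning available in 8.x,
> assuming tax_id as the partition key (the child-table name is made up for
> illustration; each child also needs its own indexes):
> create table paliasorigin_tax9606
> ( check (tax_id = 9606) ) inherits (paliasorigin);
> -- and in postgresql.conf, so the planner can skip non-matching children:
> -- constraint_exclusion = on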
This seems inevitable eventually if my tables keep growing.
> - Rebuild the indexes on this table. Under certain use conditions, btree
> indexes can get horribly bloated. Rebuilding the indexes returns them to
> their most compact and balanced form. For example:
> reindex index "paliasorigin_search3_idx";
> Apart from the locking and CPU usage during the rebuild, this has no
> negative consequences, so I'd try it before something drastic like
> partitioning. First review the current size of the index for comparison:
> select pg_size_pretty(pg_relation_size('paliasorigin_search3_idx'));
This didn't improve the stats.
>
> Since you asked specifically about improving the row-count estimate: as the
> previous responder said, you should consider increasing the statistics
> target. This will help if individual columns are being underestimated, but
> not if the overestimate is due to joint variation. In other words, the
> optimizer has no way to tell whether there is a logical relationship between
> columns A and B such that certain values of B only occur with certain values
> of A. Just judging from the names, it sounds like origin_id and tax_id might
> have a parent-child relationship, so I thought it was worth mentioning.
>
> Do the columns individually have good estimates?
Yes.
> explain analyze select * from paliasorigin where origin_id=20;
> explain analyze select * from paliasorigin where tax_id=9606;
>
> If not, increase the statistics on that column, reanalyze the table, and
> recheck the selectivity estimate:
> alter table paliasorigin alter column origin_id set statistics 20;
> analyze paliasorigin;
> explain analyze select * from paliasorigin where origin_id=20;
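> Also worth checking the combined predicate; a row estimate that is only
> off when both columns are filtered together points at joint variation:
> explain analyze select * from paliasorigin where origin_id=20 and tax_id=9606;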
My default_statistics_target is set to 1000, and I did set some
column-specific statistics targets, but that didn't help in this case.
Thanks a lot.
-Kiran