On Mon, 2008-09-08 at 09:16 -0700, Matt Smiley wrote: > Hi Kiran, > > You gave great info on your problem. > > First, is this the query you're actually trying to speed up, or is it a > simplified version? It looks like the optimizer has already chosen the best > execution plan for the given query. Since the query has no joins, we only > have to consider access paths. You're fetching 58221/37909009 = 0.15% of the > rows, so a sequential scan is clearly inappropriate. A basic index scan is > likely to incur extra scattered I/O, so a bitmap index scan is favored.
Thanks for your analysis and sorry for the long silence. Its a simplified version. I was tackling this part of the original query plan since I saw that I got inaccurate stats on one of the tables. > > To improve on this query's runtime, you could try any of the following: > > - Reorganize the data to reduce this query's scattered I/O (i.e. cluster on > "paliasorigin_search3_idx" rather than "paliasorigin_alias_casefold_idx"). > Bear in mind, this may adversely affect other queries. I applied this on a different table which solved my original problem! The query was hitting statement_timeouts but now runs in reasonable time. I re clustered one of the tables in my actual query on a more appropriate index. > > - Increase the cache hit frequency by ensuring the underlying filesystem > cache has plenty of RAM (usually so under Linux) and checking that other > concurrent queries aren't polluting the cache. Consider adding RAM if you > think the working set of blocks required by most queries is larger than the > combined Postgres and filesystem caches. If other processes than the db do > I/O on this machine, consider them as resource consumers, too. > > - Restructure the table, partitioning along a column that would be useful > for pruning whole partitions for your painful queries. In this case, > origin_id or tax_id seems like a good bet, but again, consider other queries > against this table. 38 million rows probably makes your table around 2 GB > (guessing about 55 bytes/row). Depending on the size and growth rate of the > table, it may be time to consider partitioning. Out of curiosity, what > runtime are you typically seeing from this query? The explain-analyze ran in > 113 ms, which I'm guessing is the effect of caching, not the runtime you're > trying to improve. This seems inevitable eventually, if my tables keep growing in size. > - Rebuild the indexes on this table. Under certain use conditions, btree > indexes can get horribly bloated. Rebuilding the indexes returns them to > their most compact and balanced form. For example: reindex index > "paliasorigin_search3_idx"; Apart from the locking and CPU usage during the > rebuild, this has no negative consequences, so I'd try this before something > drastic like partitioning. First review the current size of the index for > comparison: select > pg_size_pretty(pg_relation_size('paliasorigin_search3_idx')); This didn't improve the stats. > > Since you asked specifically about improving the row-count estimate, like the > previous responder said, you should consider increasing the statistics > target. This will help if individual columns are being underestimated, but > not if the overestimate is due to joint variation. In other words, the > optimizer has no way to tell if there is there a logical relationship between > columns A and B such that certain values in B only occur with certain values > of A. Just judging from the names, it sounds like origin_id and tax_id might > have a parent-child relationship, so I thought it was worth mentioning. > > Do the columns individually have good estimates? Yes. > explain analyze select * from paliasorigin where origin_id=20; > explain analyze select * from paliasorigin where tax_id=9606; > > If not, increase the statistics on that column, reanalyze the table, and > recheck the selectivity estimate: > alter table paliasorigin alter column origin_id set statistics 20; > analyze paliasorigin; > explain analyze select * from paliasorigin where origin_id=20; my default_statistics_target is set to 1000 but I did set some column specific statistics. But didn't help in this case. Thanks a lot. -Kiran -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance