[PERFORM] Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1
Hello, We are using PostgreSQL 9.0.3, compiled by Visual C++ build 1500, 32-bit, installed on Windows 2003 R2 32-bit. We have an 'aisposition' table used for a GPS tracking application, containing ~30 million rows and a number of indexes. Two of these are: idx_receiveddatetime: indexes aisposition(receiveddatetime timestamp) idx_userid_receiveddatetime: indexes aisposition(userid int4 desc, receiveddatetime timestamp desc) The problem we get is that the following query is taking many minutes to run: select * from aisposition where userid = 311369000 order by userid desc, receiveddatetime desc limit 1 When we 'EXPLAIN' this query, PostgreSQL says it is using the index idx_receiveddatetime. The way the application is designed means that in virtually all cases the query will have to scan a very long way into idx_receiveddatetime to find the first record where userid = 311369000. If however we delete the idx_receiveddatetime index, the query uses the idx_userid_receiveddatetime index, and the query only takes a few milliseconds. The EXPLAIN ANALYZE output with idx_receiveddatetime in place is: Limit (cost=0.00..1.30 rows=1 width=398) (actual time=1128097.540..1128097.541 rows=1 loops=1) - Index Scan Backward using idx_receiveddatetime on aisposition (cost=0.00..2433441.05 rows=1875926 width=398) (actual time=1128097.532..1128097.532 rows=1 loops=1) Filter: (userid = 311369000) Total runtime: 1128097.609 ms And with that index deleted: Limit (cost=0.00..4.01 rows=1 width=398) (actual time=60.633..60.634 rows=1 loops=1) - Index Scan using idx_userid_receiveddatetime on aisposition (cost=0.00..7517963.47 rows=1875926 width=398) (actual time=60.629..60.629 rows=1 loops=1) Index Cond: (userid = 311369000) Total runtime: 60.736 ms We would obviously prefer PostgreSQL to use the idx_userid_receiveddatetime index in all cases, because we know that this will guarantee results in a timely manner, whereas using idx_receiveddatetime will usually require a scan of much of the table and our application will not work. What are we doing wrong? Cheers now, John
Re: [PERFORM] Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1
The row estimate is way off. Is autovacuum disabled? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1
John Surcombe john.surco...@digimap.gg writes: When we 'EXPLAIN' this query, PostgreSQL says it is using the index idx_receiveddatetime. The way the application is designed means that in virtually all cases the query will have to scan a very long way into idx_receiveddatetime to find the first record where userid = 311369000. If however we delete the idx_receiveddatetime index, the query uses the idx_userid_receiveddatetime index, and the query only takes a few milliseconds. That's just bizarre ... it knows the index is applicable, and the cost estimates clearly favor the better index, so why did it pick the worse one? I tried to duplicate this locally, without success, so there's some contributing factor you've neglected to mention. Can you put together a self-contained test case that acts like this? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?
Don't insert data into an indexed table. A very important point with bulk-loading is that you should load all the data first, then create the indexes. Running multiple (different) CREATE INDEX queries in parallel can additionally save a lot of time. Also don't move data back and forth between the tables, just drop the original when you're done. I just saw your post and it looks similar to what I'm doing. We're going to be loading 12G of data from a MySQL dump into our pg 9.0.3 database next weekend. I've been testing this for the last two weeks. Tried removing the indexes and other constraints just for the import but for a noob like me, this was too much to ask. Maybe when I get more experience. So I *WILL* be importing all of my data into indexed tables. I timed it and it will take eight hours. I'm sure I could get it down to two or three hours for the import if I really knew more about postgres but that's the price you pay when you slam dunk a project and your staff isn't familiar with the database back-end. Other than being very inefficient, and consuming more time than necessary, is there any other down side to importing into an indexed table? In the four test imports I've done, everything seems to work fine, just takes a long time. Sorry for hijacking your thread here!
Re: [PERFORM] Planner wrongly shuns multi-column index for select .. order by col1, col2 limit 1
I wrote: John Surcombe john.surco...@digimap.gg writes: When we 'EXPLAIN' this query, PostgreSQL says it is using the index idx_receiveddatetime. The way the application is designed means that in virtually all cases the query will have to scan a very long way into idx_receiveddatetime to find the first record where userid = 311369000. If however we delete the idx_receiveddatetime index, the query uses the idx_userid_receiveddatetime index, and the query only takes a few milliseconds. That's just bizarre ... it knows the index is applicable, and the cost estimates clearly favor the better index, so why did it pick the worse one? No, scratch that, I misread the plans. It *is* picking the plan it thinks has lower cost; it's just a mistaken cost estimate. It's strange though that the less selective indexscan is getting a lower cost estimate. I wonder whether your table is (almost) perfectly ordered by receiveddatetime, such that the one-column index has correlation close to 1.0. That could possibly lower the cost estimate to the point where it'd appear to dominate the other index. It'd be useful to see the pg_stats.correlation value for both the userid and receiveddatetime columns. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance