[SQL] Sequence vs. Index Scan
I have two schemas, both with the exact same DDL. One performs great and the other has problems with a specific query:

    SELECT *
    FROM branch_active_vw
    WHERE branch_id = get_branch_for_zip('22151')
    ORDER BY branch_name;

I am not defining the view here because the explain plans show the real problem. I can post the view as well if it will help. The explain plans are as follows:

Fast Schema:

 1. Sort  (cost=17.50..17.51 rows=1 width=680) (actual time=2838.583..2838.586 rows=1 loops=1)
 2.   Sort Key: branch.branch_name
 3.   ->  Nested Loop Left Join  (cost=0.00..17.49 rows=1 width=680) (actual time=2838.060..2838.093 rows=1 loops=1)
 4.         Join Filter: ("inner".branch_group_id = "outer".branch_group_id)
 5.         ->  Nested Loop  (cost=0.00..11.45 rows=1 width=647) (actual time=2837.776..2837.804 rows=1 loops=1)
 6.               ->  Nested Loop  (cost=0.00..7.88 rows=1 width=618) (actual time=2837.697..2837.716 rows=1 loops=1)
 7.                     Join Filter: ("inner".locale_id = "outer".locale_id)
 8.                     ->  Nested Loop  (cost=0.00..6.86 rows=1 width=598) (actual time=2837.666..2837.676 rows=1 loops=1)
 9.                           Join Filter: ("inner".corp_id = "outer".corp_id)
10.                           ->  Index Scan using branch_pkey on branch  (cost=0.00..5.84 rows=1 width=560) (actual time=2837.621..2837.624 rows=1 loops=1)
11.                                 Index Cond: (branch_id = get_branch_for_zip('22151'::character varying))
12.                                 Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())))
13.                           ->  Seq Scan on corp  (cost=0.00..1.01 rows=1 width=46) (actual time=0.015..0.017 rows=1 loops=1)
14.                     ->  Seq Scan on locale  (cost=0.00..1.01 rows=1 width=28) (actual time=0.014..0.016 rows=1 loops=1)
15.               ->  Index Scan using zip_cd_pkey on zip_cd branch_address_zip_cd  (cost=0.00..3.55 rows=1 width=37) (actual time=0.066..0.069 rows=1 loops=1)
16.                     Index Cond: (branch_address_zip_cd.zip_cd_id = "outer".branch_address_zip_id)
17.         ->  Seq Scan on branch_group  (cost=0.00..1.07 rows=7 width=41) (actual time=0.013..0.029 rows=7 loops=1)
18.   SubPlan
19.     ->  Seq Scan on branch_area  (cost=0.00..4.89 rows=1 width=6) (actual time=0.132..0.137 rows=2 loops=1)
20.           Filter: (branch_id = $0)
21. Total runtime: 2839.044 ms

Slow Schema:

    Sort  (cost=12.77..12.78 rows=1 width=1380) (actual time=157492.513..157492.515 rows=1 loops=1)
 1.   Sort Key: branch.branch_name
 2.   ->  Nested Loop Left Join  (cost=0.00..12.76 rows=1 width=1380) (actual time=130130.384..157492.484 rows=1 loops=1)
 3.         Join Filter: ("inner".branch_group_id = "outer".branch_group_id)
 4.         ->  Nested Loop  (cost=0.00..10.34 rows=1 width=1360) (actual time=130130.157..157492.253 rows=1 loops=1)
 5.               Join Filter: ("inner".locale_id = "outer".locale_id)
 6.               ->  Nested Loop  (cost=0.00..9.31 rows=1 width=1340) (actual time=130130.127..157492.213 rows=1 loops=1)
 7.                     ->  Nested Loop  (cost=0.00..5.75 rows=1 width=1311) (actual time=130130.042..157492.119 rows=1 loops=1)
 8.                           Join Filter: ("inner".corp_id = "outer".corp_id)
 9.                           ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
10.                                 Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
11.                           ->  Seq Scan on corp  (cost=0.00..1.01 rows=1 width=38) (actual time=0.022..0.024 rows=1 loops=1)
12.                     ->  Index Scan using zip_cd_pkey on zip_cd branch_address_zip_cd  (cost=0.00..3.55 rows=1 width=37) (actual time=0.070..0.073 rows=1 loops=1)
13.                           Index Cond: (branch_address_zip_cd.zip_cd_id = "outer".branch_address_zip_id)
14.               ->  Seq Scan on locale  (cost=0.00..1.01 rows=1 width=28) (actual time=0.013..0.015 rows=1 loops=1)
15.         ->  Seq Scan on branch_group  (cost=0.00..1.01 rows=1 width=28) (actual time=0.013..0.015 rows=1 loops=1)
16.   SubPlan
17.     ->  Seq Scan on branch_area  (cost=0.00..1.40 rows=1 width=5) (actual time=0.077..0.084 rows=1 loops=1)
18.           Filter: (branch_id = $0)
19. Total runtime: 157492.890 ms

The problem is that lines 10-12 of the fast schema show an index scan while lines 9-10 of the slow schema show a sequential scan. The sequential scan on branch, combined with the rest of the filter, takes forever. I have checked and there IS an index, specifically a primary key index, on branch_id in both schemas, so I cannot figure out why the optimizer is treating them differently. In fact, the table branch_id comes from has the exact same indices and foreign keys in both schemas. Any direction would be deeply appreciated.
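As a side note on verifying "there IS an index": the definitions in the two schemas can be compared straight from the catalogs. A minimal sketch, assuming the schema names fast_schema and slow_schema that show up in the VACUUM output later in this thread:

    SELECT schemaname, indexname, indexdef
    FROM pg_indexes
    WHERE tablename = 'branch'
      AND schemaname IN ('fast_schema', 'slow_schema')   -- assumed schema names
    ORDER BY schemaname, indexname;

If both schemas really list an identical branch_pkey here, the difference has to come from somewhere other than the DDL.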
Re: [SQL] Sequence vs. Index Scan
On Sat, May 05, 2007 at 04:05:52PM -0500, Aaron Bono wrote:
> I have two schemas, both with the exact same DDL. One performs great and

Do they have the same data?

> I have checked and there IS an index, specifically a primary key index, on
> the branch_id in both schemas so I cannot figure out why the optimizer is
> looking at these differently. In fact, the table the branch_id comes from
> has the exact same indices and foreign keys on both schemas.

Different data? Different statistics? Different numbers of (possibly dead) rows?

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Everything that happens in the world happens at some place.
        --Jane Jacobs
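One quick way to answer the "different data / different statistics" questions is to look at what the planner has recorded for the table in each schema. A minimal sketch, again assuming the fast_schema/slow_schema names used later in the thread:

    -- relpages/reltuples are the planner's cached size estimates;
    -- they are only refreshed by VACUUM or ANALYZE.
    SELECT n.nspname AS schema_name, c.relname AS table_name,
           c.relpages, c.reltuples
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relname = 'branch'
      AND n.nspname IN ('fast_schema', 'slow_schema');   -- assumed schema names

If the two schemas show very different relpages for tables of similar size, dead rows or a missing ANALYZE would be the first suspects.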
Re: [SQL] Sequence vs. Index Scan
On 5/5/07, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> On Sat, May 05, 2007 at 04:05:52PM -0500, Aaron Bono wrote:
> > I have two schemas, both with the exact same DDL. One performs great and
>
> Do they have the same data?

They have different data. The fast one has about 150 rows and the slow one has about 40 rows. The field in question here, branch_id, is a BIGSERIAL in both.

> > I have checked and there IS an index, specifically a primary key index, on
> > the branch_id in both schemas so I cannot figure out why the optimizer is
> > looking at these differently. In fact, the table the branch_id comes from
> > has the exact same indices and foreign keys on both schemas.
>
> Different data? Different statistics? Different numbers of (possibly dead)
> rows?

We don't allow deletes and updates are fairly infrequent. I also did a VACUUM ANALYZE to no effect. What can I do to get more statistics that would help?

Thanks for the help,
Aaron

--
==
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==
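For "more statistics", the per-column numbers the planner actually uses are visible in the pg_stats view; a minimal sketch for the column in question (schema names assumed as above):

    SELECT schemaname, tablename, attname, null_frac, n_distinct, correlation
    FROM pg_stats
    WHERE tablename = 'branch'
      AND attname = 'branch_id'
      AND schemaname IN ('fast_schema', 'slow_schema');   -- assumed schema names

For a table of roughly 40 rows, though, even perfect statistics will usually favor a sequential scan on cost grounds; as the rest of the thread shows, the expensive part is how often the function gets called, not the scan itself.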
Re: [SQL] Sequence vs. Index Scan
"Aaron Bono" <[EMAIL PROTECTED]> writes: > 9. -> Seq Scan on branch (cost=0.00..4.72 rows=1 > width=1281) (actual time=130129.988..157492.057 rows=1 loops=1) > 10.Filter: ((start_day <= now()) AND > ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = > get_branch_for_zip('22151'::character varying))) There is something *awfully* wacko about that entry --- the fact that the cost estimate is less than 5 units means that the planner thinks there's 4 or fewer pages; either that's way wrong or the get_branch_for_zip function is taking enormous amounts of time per row. Have you tried timing that function on its own? One possible reason for the performance difference is if you have get_branch_for_zip marked as stable in one database and volatile in the other --- volatile would prevent it from being used in an indexqual as you'd like. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [SQL] Sequence vs. Index Scan
On Sat, May 05, 2007 at 05:00:53PM -0500, Aaron Bono wrote:
> They have different data. The fast one has about 150 rows and the slow one
> has about 40 rows. The field in question here, the branch_id, is a
> BIGSERIAL in both.

I'd be astonished if a table of 40 rows ever got index scanned. It's probably more efficient to read the whole table. But it seems your case may be strange.

> We don't allow deletes and updates are fairly infrequent. I also did a
> vacuum analyze to no effect.

How do you "not allow" deletes? Does this happen because of a failed foreign key? If so, you can end up with dead tuples. I'd look at the output of VACUUM VERBOSE to make sure you don't have a lot of dead tuples.

That said, I wonder if fiddling with the statistics on your tables might help.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
        --Brad Holland
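A combined check for dead tuples plus a statistics refresh, run in each schema, might look like the following sketch (100 is an arbitrary statistics target; the 8.1 default is 10):

    VACUUM VERBOSE branch;                                          -- reports dead row versions
    ALTER TABLE branch ALTER COLUMN branch_id SET STATISTICS 100;   -- raise the per-column target
    ANALYZE branch;                                                 -- rebuild the column statistics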
Re: [SQL] Sequence vs. Index Scan
On 5/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Aaron Bono" <[EMAIL PROTECTED]> writes:
> >  9.   ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
> > 10.         Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
>
> There is something *awfully* wacko about that entry --- the fact that
> the cost estimate is less than 5 units means that the planner thinks
> there's 4 or fewer pages; either that's way wrong or the
> get_branch_for_zip function is taking enormous amounts of time per row.
> Have you tried timing that function on its own?

Yes, the first thing I did was run it on its own. It takes about 2-6 seconds to run, which is not great but acceptable, and the functions are marked as stable. It is the same, give or take a second or two, on each of the two schemas.

The function is a plpgsql function that executes a dynamic SQL string. Would it be better to change it to a straight SQL function? Would that help the planner? I may be able to change it if this will help.

> One possible reason for the performance difference is if you have
> get_branch_for_zip marked as stable in one database and volatile in the
> other --- volatile would prevent it from being used in an indexqual as
> you'd like.

I just switched the functions in both schemas to be stable (before I posted the original question); they were volatile. After doing that, the fast schema worked great but the slow one is still underperforming. I even went back to the slow schema and dropped and recreated the function with no discernible difference.

This is for PostgreSQL 8.1.3 and the results are the same on the Windows and Linux versions. The Windows version is a restore from the Linux version's backup.

--
==
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==
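On the "straight SQL function" question: the body of get_branch_for_zip is not shown anywhere in this thread, so the following is only a hypothetical sketch of what a SQL-language version might look like. The branch_area/zip_cd join and the zip_cd column name are assumptions, and the bigint return type is guessed from branch_id being a BIGSERIAL. The main win for this particular query comes from the STABLE marking either way, but a plain SQL body avoids the dynamic-SQL EXECUTE entirely:

    -- HYPOTHETICAL sketch: the real table/column names and lookup logic may differ.
    CREATE OR REPLACE FUNCTION get_branch_for_zip(character varying)
    RETURNS bigint AS $$
        SELECT ba.branch_id
        FROM branch_area ba
        JOIN zip_cd z ON z.zip_cd_id = ba.zip_cd_id      -- assumed join column
        WHERE z.zip_cd = $1                              -- assumed zip-code column
        LIMIT 1;
    $$ LANGUAGE sql STABLE;

CREATE OR REPLACE only works if the argument and return types match the existing function; otherwise the old one has to be dropped first.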
Re: [SQL] Sequence vs. Index Scan
On 5/5/07, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> On Sat, May 05, 2007 at 05:00:53PM -0500, Aaron Bono wrote:
> > We don't allow deletes and updates are fairly infrequent. I also did a
> > vacuum analyze to no effect.
>
> How do you "not allow" deletes?

Permissions are set on the tables to only allow the application to do select/insert/update. We use start/end dates to indicate that records are no longer active and then have the application run against views that filter out inactive records. It also allows "removal" of records sometime in the future, gives us the ability to "undelete" records, and keeps us from having to code the application to clean up because of foreign key constraints. There are other advantages.

> I'd look at the output of VACUUM VERBOSE to make sure you don't have a lot
> of dead tuples.

Fast Schema:

psql:vacuumverbose.sql:1: INFO:  vacuuming "fast_schema.branch"
psql:vacuumverbose.sql:1: INFO:  index "branch_pkey" now contains 150 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.

Slow Schema:

psql:vacuumverbose.sql:1: INFO:  vacuuming "slow_schema.branch"
psql:vacuumverbose.sql:1: INFO:  index "branch_pkey" now contains 29 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.

Like Tom said earlier, I really think it has something to do with the volatile/stable indicator on the function, but I changed that. The function in the slow schema acts like it is still volatile while the fast schema is obviously stable. Is there another way to analyze how the database is using the function in the selects? Or is there a way to just look at how the function is being used by the optimizer?

I do appreciate the feedback.

--
==
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==
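One way to see how the optimizer is treating the function is to take it out of the equation: run it once by hand, then feed the resulting id into the view as a literal. A minimal sketch (42 is just a placeholder for whatever the first query returns):

    SELECT get_branch_for_zip('22151');    -- run once, note the returned branch_id

    EXPLAIN ANALYZE
    SELECT * FROM branch_active_vw
    WHERE branch_id = 42                   -- placeholder for the value returned above
    ORDER BY branch_name;

If the literal version is fast in both schemas, the plan shape itself is fine and the time is going entirely into repeated calls of get_branch_for_zip.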
Re: [SQL] Sequence vs. Index Scan
On 5/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Aaron Bono" <[EMAIL PROTECTED]> writes:
> >  9.   ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
> > 10.         Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
>
> There is something *awfully* wacko about that entry --- the fact that
> the cost estimate is less than 5 units means that the planner thinks
> there's 4 or fewer pages; either that's way wrong or the
> get_branch_for_zip function is taking enormous amounts of time per row.
> Have you tried timing that function on its own?
>
> One possible reason for the performance difference is if you have
> get_branch_for_zip marked as stable in one database and volatile in the
> other --- volatile would prevent it from being used in an indexqual as
> you'd like.

I verified it by putting a RAISE NOTICE in the function. The fast schema runs the function twice (odd, I would think it would run only once). The slow schema runs it 30 times (the number of rows in the branch table + 1).

I know I put the functions into both schemas as stable and even dropped and recreated the function. Then I verified with EMS Manager and it tells me the DDL for the function in the database is set to stable.

Is there something I can do to tell PostgreSQL that I really did mean stable?

Thanks for all the help,
Aaron

--
==
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==
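Those call counts line up with the runtimes: roughly 30 calls at the reported 2-6 seconds each accounts for most of the 157-second total in the slow schema, while the fast schema's single index-condition evaluation matches its roughly 2.8-second runtime. For reference, the RAISE NOTICE instrumentation described above amounts to something like the following pattern (the real dynamic-SQL body is not in the thread, so only the wrapper is shown):

    -- Pattern sketch only; the actual lookup logic is omitted.
    CREATE OR REPLACE FUNCTION get_branch_for_zip(character varying)
    RETURNS bigint AS $$
    DECLARE
        v_branch_id bigint;
    BEGIN
        RAISE NOTICE 'get_branch_for_zip(%) called', $1;
        -- EXECUTE '...' INTO v_branch_id;   -- original dynamic-SQL lookup goes here
        RETURN v_branch_id;
    END;
    $$ LANGUAGE plpgsql STABLE;

Counting the NOTICE lines in the psql output then shows how many times the executor actually evaluated the function for a given query.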
Re: [SQL] Sequence vs. Index Scan
On 5/5/07, Aaron Bono <[EMAIL PROTECTED]> wrote:
> On 5/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> > "Aaron Bono" <[EMAIL PROTECTED]> writes:
> > >  9.   ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
> > > 10.         Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
> >
> > There is something *awfully* wacko about that entry --- the fact that
> > the cost estimate is less than 5 units means that the planner thinks
> > there's 4 or fewer pages; either that's way wrong or the
> > get_branch_for_zip function is taking enormous amounts of time per row.
> > Have you tried timing that function on its own?
> >
> > One possible reason for the performance difference is if you have
> > get_branch_for_zip marked as stable in one database and volatile in the
> > other --- volatile would prevent it from being used in an indexqual as
> > you'd like.
>
> I verified it by putting a RAISE NOTICE in the function. The fast schema
> runs the function twice (odd, I would think it would run only once). The
> slow schema runs it 30 times (the number of rows in the branch table + 1).
>
> I know I put the functions into both schemas as stable and even dropped and
> recreated the function. Then I verified with EMS Manager and it tells me
> the DDL for the function in the database is set to stable.
>
> Is there something I can do to tell PostgreSQL that I really did mean
> stable?

Maybe this is silly, but you can verify what the database thinks of the function by selecting from pg_proc:

    select pronamespace, provolatile
    from pg_proc
    where proname = 'get_branch_for_zip';

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to build
bigger and better idiot-proof programs and the universe trying to produce
bigger and better idiots. So far, the universe is winning."
                                                           Richard Cook
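Since the same function name exists in two schemas, it may also be worth resolving the namespace and reading provolatile directly ('i' = immutable, 's' = stable, 'v' = volatile); a slightly expanded sketch of the same check:

    SELECT n.nspname AS schema_name, p.proname, p.provolatile
    FROM pg_proc p
    JOIN pg_namespace n ON n.oid = p.pronamespace
    WHERE p.proname = 'get_branch_for_zip';

A 'v' next to the slow schema's copy would explain why the planner keeps re-evaluating the function per row instead of using it once in the index condition.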