Re: [PERFORM] Improve performance of query
The first thing to check... Did you do a recent VACUUM ANALYZE? This updates all the statistics. There are a number of places where it says "rows=1000" which is usually the "I have no idea, let me guess 1000". Also, there are a number of places where the estimates are pretty far off. For instance: Richard Rowell wrote: -> Subquery Scan "*SELECT* 1" (cost=0.00..64034.15 rows=10540 width=24) (actual time=279.089..4419.371 rows=161 loops=1) estimating 10,000 when only 161 is a little bit different. -> Seq Scan on da_answer a (cost=0.00..63928.75 rows=10540 width=24) (actual time=279.080..4418.808 rows=161 loops=1) Filter: ((date_effective <= 99::double precision) AND (inactive <> 1) AND (subplan)) Though this could be a lack of cross-column statistics. If 2 columns are correlated, the planner isn't as accurate as it could be. Also, date_effective <= 99 doesn't seem very restrictive, could you use a between statement? (date between 0 and 999). I know for timestamps usually giving a between is better than a single sided query. This one was underestimated. -> Subquery Scan "*SELECT* 2" (cost=988627.58..989175.52 rows=2799 width=24) (actual time=290.730..417.720 rows=7556 loops=1) -> Hash Join (cost=988627.58..989147.53 rows=2799 width=24) (actual time=290.722..395.739 rows=7556 loops=1) Hash Cond: ("outer".main_id = "inner".uid) This is one of the ones that looks like it didn't have any ideas. It could be because of the function. You might consider adding a function index, though I think there are some caveats there. -> Function Scan on svp_getparentproviderids (cost=0.00..12.50 rows=1000 width=4) (actual time=0.473..0.474 rows=1 loops=1) Another very poor estimation. It might be a need to increase the statistics for this column (ALTER TABLE, ALTER COLUMN, SET STATISTICS). IIRC, compared with other db's postgres defaults to a much lower statistics value. Try changing it from 10 (?) to 100 or so. There was a discussion that every column with an index should use higher statistics. -> Index Scan using in_da_dr_type_provider on da_data_restrict (cost=0.00..145.50 rows=46 width=8) (actual time=0.041..26.627 rows=7280 loops=1) I'm not a great optimizer, these are just some first things to look at. Your sort mem seems pretty low to me (considering you have 1GB of RAM). Perhaps you could bump that up to 40MB instead of 4MB. Also, if you run this query twice in a row, is it still slow? (Sometimes it takes a bit of work to get the right indexes loaded into ram, but then it is faster.) Just some guesses, John =:-> signature.asc Description: OpenPGP digital signature
Re: [PERFORM] Improve performance of query
Richard Rowell <[EMAIL PROTECTED]> writes: > I'm trying to port our application from MS-SQL to Postgres. We have > implemented all of our rather complicated application security in the > database. The query that follows takes a half of a second or less on > MS-SQL server and around 5 seconds on Postgres. The EXPLAIN shows that most of the time is going into repeated executions of svp_getparentproviderids() in the first UNION arm: > -> Seq Scan on da_answer a > (cost=0.00..63928.75 rows=10540 width=24) (actual time=279.080..4418.808 > rows=161 loops=1) >Filter: ((date_effective <= > 99::double precision) AND (inactive <> 1) AND (subplan)) >SubPlan > -> Function Scan on > svp_getparentproviderids (cost=0.00..15.00 rows=5 width=4) (actual > time=0.203..0.203 rows=0 loops=21089) >Filter: > (svp_getparentproviderids = $1) I'd suggest replacing the EXISTS coding by IN: (EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE svp_getparentproviderids = a.provider_id)) to (a.provider_id IN (SELECT * FROM svp_getparentproviderids(1))) The latter form is likely to be significantly faster in PG 7.4. It's also possible that the speed loss compared to MSSQL is really inside the svp_getparentproviderids function; you should look into that rather than assuming this query per se is at fault. Also, do you actually need UNION as opposed to UNION ALL? The duplicate-elimination behavior of UNION is a bit expensive if not needed. It looks from the EXPLAIN output that some of the unions aren't actually eliminating any rows. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Improve performance of query
The first thing to check... Did you do a recent VACUUM ANALYZE? This updates all the statistics. There are a number of places where it says "rows=1000" which is usually the "I have no idea, let me guess 1000". Also, there are a number of places where the estimates are pretty far off. For instance: Richard Rowell wrote: -> Subquery Scan "*SELECT* 1" (cost=0.00..64034.15 rows=10540 width=24) (actual time=279.089..4419.371 rows=161 loops=1) estimating 10,000 when only 161 is a little bit different. -> Seq Scan on da_answer a (cost=0.00..63928.75 rows=10540 width=24) (actual time=279.080..4418.808 rows=161 loops=1) Filter: ((date_effective <= 99::double precision) AND (inactive <> 1) AND (subplan)) Though this could be a lack of cross-column statistics. If 2 columns are correlated, the planner isn't as accurate as it could be. Also, date_effective <= 99 doesn't seem very restrictive, could you use a between statement? (date between 0 and 999). I know for timestamps usually giving a between is better than a single sided query. This one was underestimated. -> Subquery Scan "*SELECT* 2" (cost=988627.58..989175.52 rows=2799 width=24) (actual time=290.730..417.720 rows=7556 loops=1) -> Hash Join (cost=988627.58..989147.53 rows=2799 width=24) (actual time=290.722..395.739 rows=7556 loops=1) Hash Cond: ("outer".main_id = "inner".uid) This is one of the ones that looks like it didn't have any ideas. It could be because of the function. You might consider adding a function index, though I think there are some caveats there. -> Function Scan on svp_getparentproviderids (cost=0.00..12.50 rows=1000 width=4) (actual time=0.473..0.474 rows=1 loops=1) Another very poor estimation. It might be a need to increase the statistics for this column (ALTER TABLE, ALTER COLUMN, SET STATISTICS). IIRC, compared with other db's postgres defaults to a much lower statistics value. Try changing it from 10 (?) to 100 or so. There was a discussion that every column with an index should use higher statistics. -> Index Scan using in_da_dr_type_provider on da_data_restrict (cost=0.00..145.50 rows=46 width=8) (actual time=0.041..26.627 rows=7280 loops=1) I'm not a great optimizer, these are just some first things to look at. Your sort mem seems pretty low to me (considering you have 1GB of RAM). Perhaps you could bump that up to 40MB instead of 4MB. Also, if you run this query twice in a row, is it still slow? (Sometimes it takes a bit of work to get the right indexes loaded into ram, but then it is faster.) Just some guesses, John =:-> signature.asc Description: OpenPGP digital signature
Re: [PERFORM] Improve performance of query
Richard Rowell wrote: I'm trying to port our application from MS-SQL to Postgres. We have implemented all of our rather complicated application security in the database. The query that follows takes a half of a second or less on MS-SQL server and around 5 seconds on Postgres. My concern is that this data set is rather "small" by our applications standards. It is not unusual for the da_answer table to have 2-4 million records. I'm worried that if this very small data set is taking 5 seconds, then a "regular sized" data set will take far too long. I originally thought the NOT EXISTS on the "da_data_restrict_except_open" table was killing performance, but the query took the exact same amount of time after I deleted all rows from this table. Note that the hard-coded 9.0, and 4000 parameters, as well as the parameter to svp_getparentproviders are the three variables that change from one run of this query to the next. I'm using Postgres 7.4.5 as packaged in Debian. shared_buffers is set to 57344 and sort_mem=4096. That shared_buffers value sounds too large for 1GB RAM - rewind to 1 say. Also make sure you've read the "performance tuning" article at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php I have included an EXPLAIN ANALYZE, relevant table counts, and relevant indexing information. If anyone has any suggestions on how to improve performance TIA! I think it's the function call(s). SELECT tab.answer_id, client_id, question_id, recordset_id, date_effective, virt_field_name FROM ( SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id, date_effective FROM da_answer a WHERE a.date_effective <= 99.0 AND a.inactive != 1 AND ( 5000 = 4000 OR (EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE svp_getparentproviderids = a.provider_id)) ) ... SubPlan -> Function Scan on svp_getparentproviderids (cost=0.00..15.00 rows=5 width=4) (actual time=0.203..0.203 rows=0 loops=21089) Filter: (svp_getparentproviderids = $1) Here it's running 21,089 loops around your function. Each one isn't costing much, but it's the total that's killing you I think. It might be possible to mark the function STABLE or such, depending on what it does - see http://www.postgresql.org/docs/7.4/static/sql-createfunction.html -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Improve performance of query
* Richard Rowell ([EMAIL PROTECTED]) wrote: > I have included an EXPLAIN ANALYZE, relevant table counts, and relevant > indexing information. If anyone has any suggestions on how to improve > performance TIA! Just a thought- do the UNION's actually have to be union's or would having them be 'UNION ALL's work? Stephen signature.asc Description: Digital signature
[PERFORM] Improve performance of query
I'm trying to port our application from MS-SQL to Postgres. We have implemented all of our rather complicated application security in the database. The query that follows takes a half of a second or less on MS-SQL server and around 5 seconds on Postgres. My concern is that this data set is rather "small" by our applications standards. It is not unusual for the da_answer table to have 2-4 million records. I'm worried that if this very small data set is taking 5 seconds, then a "regular sized" data set will take far too long. I originally thought the NOT EXISTS on the "da_data_restrict_except_open" table was killing performance, but the query took the exact same amount of time after I deleted all rows from this table. Note that the hard-coded 9.0, and 4000 parameters, as well as the parameter to svp_getparentproviders are the three variables that change from one run of this query to the next. I'm using Postgres 7.4.5 as packaged in Debian. shared_buffers is set to 57344 and sort_mem=4096. The machine has an AMD 1.8+ and ` gig of RAM. Here are some relevant performance statistics: richard:/usr/share/cups/model# cat /proc/sys/kernel/shmmax 536870912 richard:/usr/share/cups/model# cat /proc/sys/kernel/shmall 536870912 richard:/home/richard# hdparm -tT /dev/hda Timing cached reads: 1112 MB in 2.00 seconds = 556.00 MB/sec Timing buffered disk reads: 176 MB in 3.02 seconds = 58.28 MB/sec I have included an EXPLAIN ANALYZE, relevant table counts, and relevant indexing information. If anyone has any suggestions on how to improve performance TIA! SELECT tab.answer_id, client_id, question_id, recordset_id, date_effective, virt_field_name FROM ( SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id, date_effective FROM da_answer a WHERE a.date_effective <= 99.0 AND a.inactive != 1 AND ( 5000 = 4000 OR (EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE svp_getparentproviderids = a.provider_id)) ) UNION SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id, date_effective FROM da_answer a, ( SELECT main_id FROM da_data_restrict WHERE type_id = 2 AND (provider_id IN (SELECT * FROM svp_getparentproviderids(1))) UNION SELECT sa.uid AS main_id FROM da_answer sa JOIN da_data_restrict_except_closed dr ON dr.main_id = sa.uid AND dr.type_id = 2 AND dr.except_provider_id = 1 WHERE (restricted = 1) AND (restricted_closed_except = 1) AND sa.covered_by_roi = 1 UNION SELECT sa.uid AS main_id FROM da_answer sa WHERE (restricted = 0) AND (restricted_open_except = 1) AND (NOT EXISTS (SELECT dr.main_id FROM da_data_restrict_except_open dr WHERE (dr.main_id = sa.uid) AND (dr.type_id = 2) AND (dr.except_provider_id in (select * from svp_getparentproviderids(1) AND sa.covered_by_roi = 1 UNION SELECT sa.uid AS main_id FROM da_answer sa WHERE (restricted = 0) AND (restricted_open_except = 0) AND sa.covered_by_roi = 1 ) sec WHERE a.covered_by_roi = 1 AND a.date_effective <= 99.0 AND a.inactive != 1 AND a.uid = sec.main_id AND 5000 > 4000 ) tab, da_question q WHERE tab.question_id = q.uid AND (min_access_level <= 4000 OR min_access_level IS NULL) Table counts from relevant tables da_question 1095 da_answer 21117 da_question 1095 da_data_restrict_except_closed 3087 da_data_restrict_except_open13391 svp_getparentproviderids(1) 1 Relevant Index create index in_da_data_restrict_provider_id on da_data_restrict(provider_id); create index in_da_data_restrict_main_id on da_data_restrict(main_id); create index in_da_data_restrict_type_id on da_data_restrict(type_id); create index in_da_data_restrict_client_id on da_data_restrict(client_id); create index in_da_dr_type_provider on da_data_restrict(type_id,provider_id); create index in_da_data_rec_provider_id ON da_data_restrict_except_closed(provider_id); create index in_da_data_rec_type_id ON da_data_restrict_except_closed(type_id); create index in_da_data_rec_main_id ON da_data_restrict_except_closed(main_id); create index in_da_data_rec_except_provider_id ON da_data_restrict_except_closed(except_provider_id); create index in_da_data_reo_provider_id ON da_data_restrict_except_open(provider_id); create index in_da_data_reo_type_id ON da_data_restrict_except_open(type_id); create index in_da_data_reo_main_id ON da_data_restrict_except_open(main_id); create index in_da_data_reo_except_provider_id ON da_data_restrict_except_open(except_provider_id); create index in_da_answer_client_id ON