Re: [PERFORM] Improve performance of query

2004-12-17 Thread John A Meinel
The first thing to check... Did you do a recent VACUUM ANALYZE? This 
updates all the statistics. There are a number of places where it says 
"rows=1000" which is usually the "I have no idea, let me guess 1000". 
Also, there are a number of places where the estimates are pretty far 
off. For instance:

Richard Rowell wrote:
->  Subquery Scan "*SELECT* 1"  (cost=0.00..64034.15 rows=10540 width=24) (actual time=279.089..4419.371 rows=161 loops=1)
 

estimating 10,000 when only 161 is a little bit different.
->  Seq Scan on da_answer a  (cost=0.00..63928.75 rows=10540 width=24) (actual time=279.080..4418.808 rows=161 loops=1)
  Filter: ((date_effective <= 99::double precision) AND (inactive <> 1) AND (subplan))
 

Though this could be a lack of cross-column statistics. If 2 columns are 
correlated, the planner isn't as accurate as it could be. Also, 
date_effective <= 99 doesn't seem very restrictive, could you 
use a between statement? (date between 0 and 999). I know for 
timestamps usually giving a between is better than a single sided query.

This one was underestimated.
->  Subquery Scan "*SELECT* 2"  (cost=988627.58..989175.52 rows=2799 width=24) (actual time=290.730..417.720 rows=7556 loops=1)
->  Hash Join  (cost=988627.58..989147.53 rows=2799 width=24) (actual time=290.722..395.739 rows=7556 loops=1)
  Hash Cond: ("outer".main_id = "inner".uid)
 

This is one of the ones that looks like it didn't have any ideas. It 
could be because of the function. You might consider adding a function 
index, though I think there are some caveats there.

->  Function Scan on svp_getparentproviderids  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.473..0.474 rows=1 loops=1)
 

Another very poor estimation. It might be a need to increase the 
statistics for this column (ALTER TABLE, ALTER COLUMN, SET STATISTICS). 
IIRC, compared with other db's postgres defaults to a much lower 
statistics value. Try changing it from 10 (?) to 100 or so. There was a 
discussion that every column with an index should use higher statistics.

->  Index Scan using in_da_dr_type_provider on da_data_restrict  (cost=0.00..145.50 rows=46 width=8) (actual time=0.041..26.627 rows=7280 loops=1)
 

I'm not a great optimizer, these are just some first things to look at. 
Your sort mem seems pretty low to me (considering you have 1GB of RAM). 
Perhaps you could bump that up to 40MB instead of 4MB. Also, if you run 
this query twice in a row, is it still slow? (Sometimes it takes a bit 
of work to get the right indexes loaded into ram, but then it is faster.)

Just some guesses,
John
=:->


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Improve performance of query

2004-12-16 Thread Tom Lane
Richard Rowell <[EMAIL PROTECTED]> writes:
> I'm trying to port our application from MS-SQL to Postgres.  We have
> implemented all of our rather complicated application security in the
> database.  The query that follows takes a half of a second or less on
> MS-SQL server and around 5 seconds on Postgres.

The EXPLAIN shows that most of the time is going into repeated
executions of svp_getparentproviderids() in the first UNION arm:

>  ->  Seq Scan on da_answer a  
> (cost=0.00..63928.75 rows=10540 width=24) (actual time=279.080..4418.808 
> rows=161 loops=1)
>Filter: ((date_effective <= 
> 99::double precision) AND (inactive <> 1) AND (subplan))
>SubPlan
>  ->  Function Scan on 
> svp_getparentproviderids  (cost=0.00..15.00 rows=5 width=4) (actual 
> time=0.203..0.203 rows=0 loops=21089)
>Filter: 
> (svp_getparentproviderids = $1)

I'd suggest replacing the EXISTS coding by IN:
(EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE 
svp_getparentproviderids = a.provider_id))
to
(a.provider_id IN (SELECT * FROM svp_getparentproviderids(1)))
The latter form is likely to be significantly faster in PG 7.4.

It's also possible that the speed loss compared to MSSQL is really
inside the svp_getparentproviderids function; you should look into
that rather than assuming this query per se is at fault.

Also, do you actually need UNION as opposed to UNION ALL?  The
duplicate-elimination behavior of UNION is a bit expensive if not
needed.  It looks from the EXPLAIN output that some of the unions
aren't actually eliminating any rows.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Improve performance of query

2004-12-16 Thread John A Meinel
The first thing to check... Did you do a recent VACUUM ANALYZE? This
updates all the statistics. There are a number of places where it says
"rows=1000" which is usually the "I have no idea, let me guess 1000".
Also, there are a number of places where the estimates are pretty far
off. For instance:
Richard Rowell wrote:
->  Subquery Scan "*SELECT* 1"  (cost=0.00..64034.15 rows=10540 width=24) (actual time=279.089..4419.371 rows=161 loops=1)
 

estimating 10,000 when only 161 is a little bit different.
->  Seq Scan on da_answer a  (cost=0.00..63928.75 rows=10540 width=24) (actual time=279.080..4418.808 rows=161 loops=1)
  Filter: ((date_effective <= 99::double precision) AND (inactive <> 1) AND (subplan))
 

Though this could be a lack of cross-column statistics. If 2 columns are
correlated, the planner isn't as accurate as it could be. Also,
date_effective <= 99 doesn't seem very restrictive, could you
use a between statement? (date between 0 and 999). I know for
timestamps usually giving a between is better than a single sided query.
This one was underestimated.
->  Subquery Scan "*SELECT* 2"  (cost=988627.58..989175.52 rows=2799 width=24) (actual time=290.730..417.720 rows=7556 loops=1)
->  Hash Join  (cost=988627.58..989147.53 rows=2799 width=24) (actual time=290.722..395.739 rows=7556 loops=1)
  Hash Cond: ("outer".main_id = "inner".uid)
 

This is one of the ones that looks like it didn't have any ideas. It
could be because of the function. You might consider adding a function
index, though I think there are some caveats there.
->  Function Scan on svp_getparentproviderids  (cost=0.00..12.50 rows=1000 width=4) (actual time=0.473..0.474 rows=1 loops=1)
 

Another very poor estimation. It might be a need to increase the
statistics for this column (ALTER TABLE, ALTER COLUMN, SET STATISTICS).
IIRC, compared with other db's postgres defaults to a much lower
statistics value. Try changing it from 10 (?) to 100 or so. There was a
discussion that every column with an index should use higher statistics.
->  Index Scan using in_da_dr_type_provider on da_data_restrict  (cost=0.00..145.50 rows=46 width=8) (actual time=0.041..26.627 rows=7280 loops=1)
 

I'm not a great optimizer, these are just some first things to look at.
Your sort mem seems pretty low to me (considering you have 1GB of RAM).
Perhaps you could bump that up to 40MB instead of 4MB. Also, if you run
this query twice in a row, is it still slow? (Sometimes it takes a bit
of work to get the right indexes loaded into ram, but then it is faster.)
Just some guesses,
John
=:->



signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Improve performance of query

2004-12-16 Thread Richard Huxton
Richard Rowell wrote:
I'm trying to port our application from MS-SQL to Postgres.  We have
implemented all of our rather complicated application security in the
database.  The query that follows takes a half of a second or less on
MS-SQL server and around 5 seconds on Postgres.  My concern is that this
data set is rather "small" by our applications standards.  It is not
unusual for the da_answer table to have 2-4 million records.  I'm
worried that if this very small data set is taking 5 seconds, then a
"regular sized" data set will take far too long.
I originally thought the NOT EXISTS on the
"da_data_restrict_except_open" table was killing performance, but the
query took the exact same amount of time after I deleted all rows from
this table.  Note that the hard-coded 9.0, and 4000 parameters,
as well as the parameter to svp_getparentproviders are the three
variables that change from one run of this query to the next.
I'm using Postgres 7.4.5 as packaged in Debian.  shared_buffers is set
to 57344 and sort_mem=4096.
That shared_buffers value sounds too large for 1GB RAM - rewind to 1 
say. Also make sure you've read the "performance tuning" article at:
  http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

I have included an EXPLAIN ANALYZE, relevant table counts, and relevant
indexing information.  If anyone has any suggestions on how to improve
performance  TIA!
I think it's the function call(s).
SELECT tab.answer_id, client_id, question_id, recordset_id,
date_effective, virt_field_name
FROM
(
SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id,
date_effective
FROM da_answer a
WHERE a.date_effective <= 99.0
AND a.inactive != 1
AND
(
5000 = 4000 
OR
(EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE
svp_getparentproviderids = a.provider_id))
)
...
SubPlan
 ->  Function Scan on svp_getparentproviderids  (cost=0.00..15.00 rows=5 
width=4) (actual time=0.203..0.203 rows=0 loops=21089)
   Filter: (svp_getparentproviderids = $1)
Here it's running 21,089 loops around your function. Each one isn't 
costing much, but it's the total that's killing you I think. It might be 
possible to mark the function STABLE or such, depending on what it does 
- see http://www.postgresql.org/docs/7.4/static/sql-createfunction.html

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Improve performance of query

2004-12-16 Thread Stephen Frost
* Richard Rowell ([EMAIL PROTECTED]) wrote:
> I have included an EXPLAIN ANALYZE, relevant table counts, and relevant
> indexing information.  If anyone has any suggestions on how to improve
> performance  TIA!

Just a thought- do the UNION's actually have to be union's or would
having them be 'UNION ALL's work?

Stephen


signature.asc
Description: Digital signature


[PERFORM] Improve performance of query

2004-12-16 Thread Richard Rowell
I'm trying to port our application from MS-SQL to Postgres.  We have
implemented all of our rather complicated application security in the
database.  The query that follows takes a half of a second or less on
MS-SQL server and around 5 seconds on Postgres.  My concern is that this
data set is rather "small" by our applications standards.  It is not
unusual for the da_answer table to have 2-4 million records.  I'm
worried that if this very small data set is taking 5 seconds, then a
"regular sized" data set will take far too long.

I originally thought the NOT EXISTS on the
"da_data_restrict_except_open" table was killing performance, but the
query took the exact same amount of time after I deleted all rows from
this table.  Note that the hard-coded 9.0, and 4000 parameters,
as well as the parameter to svp_getparentproviders are the three
variables that change from one run of this query to the next.

I'm using Postgres 7.4.5 as packaged in Debian.  shared_buffers is set
to 57344 and sort_mem=4096.

The machine has an AMD 1.8+ and ` gig of RAM.  Here are some relevant
performance statistics:
richard:/usr/share/cups/model# cat /proc/sys/kernel/shmmax
536870912
richard:/usr/share/cups/model# cat /proc/sys/kernel/shmall
536870912
richard:/home/richard# hdparm -tT /dev/hda
 Timing cached reads:   1112 MB in  2.00 seconds = 556.00 MB/sec
 Timing buffered disk reads:  176 MB in  3.02 seconds =  58.28 MB/sec

I have included an EXPLAIN ANALYZE, relevant table counts, and relevant
indexing information.  If anyone has any suggestions on how to improve
performance  TIA!

SELECT tab.answer_id, client_id, question_id, recordset_id,
date_effective, virt_field_name
FROM
(
SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id,
date_effective
FROM da_answer a
WHERE a.date_effective <= 99.0
AND a.inactive != 1
AND
(
5000 = 4000 
OR
(EXISTS (SELECT * FROM svp_getparentproviderids(1) WHERE
svp_getparentproviderids = a.provider_id))
)
UNION
SELECT a.uid AS answer_id, a.client_id, a.question_id, recordset_id,
date_effective
FROM da_answer a,
(
SELECT main_id 
FROM da_data_restrict
WHERE type_id = 2
AND (provider_id IN (SELECT * FROM svp_getparentproviderids(1)))
   
UNION

SELECT sa.uid AS main_id 
FROM da_answer sa
JOIN da_data_restrict_except_closed dr ON dr.main_id =
sa.uid AND dr.type_id = 2 AND dr.except_provider_id = 1
WHERE (restricted = 1) 
AND (restricted_closed_except = 1)
AND sa.covered_by_roi = 1
UNION
SELECT sa.uid AS main_id 
FROM da_answer sa
WHERE (restricted = 0) 
AND (restricted_open_except = 1)
AND (NOT EXISTS (SELECT dr.main_id FROM
da_data_restrict_except_open dr WHERE (dr.main_id = sa.uid) AND
(dr.type_id = 2) AND (dr.except_provider_id in (select * from
svp_getparentproviderids(1)
AND sa.covered_by_roi = 1
UNION
SELECT sa.uid AS main_id FROM da_answer sa WHERE (restricted
= 0) AND (restricted_open_except = 0)
AND sa.covered_by_roi = 1
) sec
WHERE a.covered_by_roi = 1
AND a.date_effective <= 99.0
AND a.inactive != 1
AND a.uid = sec.main_id
AND 5000 > 4000
) tab, da_question q
WHERE tab.question_id = q.uid AND (min_access_level <= 4000 OR
min_access_level IS NULL)

Table counts from relevant tables
da_question  1095
da_answer   21117
da_question  1095
da_data_restrict_except_closed   3087
da_data_restrict_except_open13391
svp_getparentproviderids(1) 1

Relevant Index
create index in_da_data_restrict_provider_id on
da_data_restrict(provider_id);
create index in_da_data_restrict_main_id on da_data_restrict(main_id);
create index in_da_data_restrict_type_id on da_data_restrict(type_id);
create index in_da_data_restrict_client_id on
da_data_restrict(client_id);
create index in_da_dr_type_provider on
da_data_restrict(type_id,provider_id);

create index in_da_data_rec_provider_id ON
da_data_restrict_except_closed(provider_id);
create index in_da_data_rec_type_id ON
da_data_restrict_except_closed(type_id);
create index in_da_data_rec_main_id ON
da_data_restrict_except_closed(main_id);
create index in_da_data_rec_except_provider_id ON
da_data_restrict_except_closed(except_provider_id);

create index in_da_data_reo_provider_id ON
da_data_restrict_except_open(provider_id);
create index in_da_data_reo_type_id ON
da_data_restrict_except_open(type_id);
create index in_da_data_reo_main_id ON
da_data_restrict_except_open(main_id);
create index in_da_data_reo_except_provider_id ON
da_data_restrict_except_open(except_provider_id);

create index in_da_answer_client_id ON