[PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
Hi,

  I have a query that uses an IN clause and it seems in perform great
when there is more than two values in it but if there is only one it is
really slow. Also if I change the query to use an = instead of IN in the
case of only one value it is still slow. Possibly I need to reindex this
particular index?

thanks 

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
nd: (batteryidentifier = 1177470)
   ->  Bitmap Index Scan on ix_batcomment  (cost=0.00..1.02
rows=6 width=0) (actual time=0.032..0.032 rows=0 loops=1)
 Index Cond: (batteryidentifier = 1177470)
 Total runtime: 19275.838 ms
(18 rows)


Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555
-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 2:17 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] strange query behavior 

"Tim Jones" <[EMAIL PROTECTED]> writes:
>   I have a query that uses an IN clause and it seems in perform great 
> when there is more than two values in it but if there is only one it 
> is really slow. Also if I change the query to use an = instead of IN 
> in the case of only one value it is still slow.

Please provide EXPLAIN ANALYZE output for both cases.

> Possibly I need to reindex this
> particular index?

More likely you need to ANALYZE the table so that the planner has
up-to-date stats ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
Version 8.1

Here are the planner constraints I believe we changed
effective_cache_size and random_page_cost
BTW this is an AIX 5.2 

#---

# QUERY TUNING
#---


# - Planner Method Configuration -

#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on

# - Planner Cost Constants -

#effective_cache_size = 1   # typically 8KB each
effective_cache_size = 40
random_page_cost = 3.8  # units are one sequential page fetch
# cost
#cpu_tuple_cost = 0.01  # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025 # (same)

# - Genetic Query Optimizer -

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5# range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0   # selects default based on effort
#geqo_selection_bias = 2.0  # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#constraint_exclusion = off
#from_collapse_limit = 8
#join_collapse_limit = 8# 1 disables collapsing of explicit
# JOINs


Thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 4:59 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] strange query behavior 

"Tim Jones" <[EMAIL PROTECTED]> writes:
> The tables for theses queries are vacuumed and analyzed regularly. I 
> just did an analyze to be sure and here are the results ...

There's something pretty wacko about the choice of plan in the slow case
--- I don't see why it'd not have used the same plan structure as for
the IN case.  It's coming up with a cost a lot higher than for the
other, so it certainly knows this isn't a great plan ...

Which PG version is this exactly?  Are you running with any nondefault
planner parameters?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
Looks like 8.1.2

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 5:37 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] strange query behavior 

"Tim Jones" <[EMAIL PROTECTED]> writes:
>> Which PG version is this exactly?  Are you running with any 
>> nondefault planner parameters?

> Version 8.1

8.1.what?

> Here are the planner constraints I believe we changed 
> effective_cache_size and random_page_cost

Those look reasonably harmless.

My best bet at the moment is that you've got a pretty early 8.1.x
release and are hitting one of the planner bugs that we fixed earlier
this year.  Not enough info to say for sure though.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] strange query behavior

2006-12-13 Thread Tim Jones
That's what I did and got 8.1.2 ... do you want gcc version etc 3.3.2
powerpc aix5.2

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Matthew O'Connor [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 5:51 PM
To: Tim Jones
Subject: Re: [PERFORM] strange query behavior

 From psql perform: select version();
and send us that output.

Tim Jones wrote:
> Looks like 8.1.2
> 
> Tim Jones
> Healthcare Project Manager
> Optio Software, Inc.
> (770) 576-3555
> 
> -Original Message-
> From: Tom Lane [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, December 13, 2006 5:37 PM
> To: Tim Jones
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] strange query behavior
> 
> "Tim Jones" <[EMAIL PROTECTED]> writes:
>>> Which PG version is this exactly?  Are you running with any 
>>> nondefault planner parameters?
> 
>> Version 8.1
> 
> 8.1.what?
> 
>> Here are the planner constraints I believe we changed 
>> effective_cache_size and random_page_cost
> 
> Those look reasonably harmless.
> 
> My best bet at the moment is that you've got a pretty early 8.1.x 
> release and are hitting one of the planner bugs that we fixed earlier 
> this year.  Not enough info to say for sure though.
> 
>   regards, tom lane
> 
> ---(end of 
> broadcast)---
> TIP 5: don't forget to increase your free space map settings
> 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] strange query behavior

2006-12-14 Thread Tim Jones

18,273,008 rows in observationresults

pg_stats:

select * from pg_stats where tablename='observationresults' and
attname='batteryidentifier';

 schemaname | tablename  |  attname  | null_frac |
avg_width | n_distinct | most_common_vals
|most_common_freqs
|  histogram_bounds
| correlation
++---+---+--
-++-
-+--
---+
-+-
 public | observationresults | batteryidentifier | 0 |
4 |  12942 |
{437255,1588952,120420,293685,356599,504069,589910,693683,834990,854693}
|
{0.0013,0.0013,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001}
|
{3561,271263,556929,839038,1125682,1406538,1697589,1970463,2226781,25392
41,2810844} |     0.31779

thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 13, 2006 6:25 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] strange query behavior 


The large rowcount estimate makes it back off to a non-nestloop plan for
the outer joins, and in this situation that's a loser.

I'm actually not sure why they're not both too high --- with the
rowcount estimate of 1362 for the inner scan in the first example, you'd
expect about twice that for the join result.  But the immediate problem
is that in the case where it knows exactly what batteryidentifier is
being probed for, it's still off by more than a factor of 100 on the
rowcount estimate for observationresults.  How many rows in
observationresults, and may we see the pg_stats entry for
observationresults.batteryidentifier?

It's likely that the answer for you will be "raise the statistics target
for observationresults and re-ANALYZE", but I'd like to gather more info
about what's going wrong first.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] strange query behavior

2006-12-14 Thread Tim Jones
ok thanks Tom I will alter the statistics and re-analyze the table.

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 14, 2006 12:49 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] strange query behavior 

"Tim Jones" <[EMAIL PROTECTED]> writes:
> 18,273,008 rows in observationresults
> [ and n_distinct = 12942 ]

OK, so the estimated rowcounts are coming from those two numbers.
It's notoriously hard to get a decent n_distinct estimate from a small
sample :-(, and I would imagine the number of batteryidentifiers is
really a lot more than 12942?

What you need to do is increase the statistics target for
observationresults.batteryidentifier (see ALTER TABLE) and re-ANALYZE
and see if you get a saner n_distinct in pg_stats.  I'd try 100 and then
1000 as target.  Or you could just increase the global default target
(see postgresql.conf) but that might be overkill.

It's still a bit odd that the case with two batteryidentifiers was
estimated fairly accurately when the other wasn't; I'll go look into
that.  But in any case you need better stats if you want good plans.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] query plans different for 8.1 on windows and aix

2006-01-19 Thread Tim Jones




Hi,
 
  I am trying 
to perform the following type of query  'select patientname ... from 
patient were patientname LIKE 'JONES%, %' order by patientname asc limit 100'. 
There about 1.4 million rows in the table. On my windows machine (2GB Ram ,3Ghz, 
Windows XP, 120GB Hard Drive, postgres 8.1beta4) it takes about 150 millisecs 
and the query plan is 
 
    
'Limit  (cost=18381.90..18384.40 rows=100 width=404)''  
->  Unique  (cost=18381.90..18418.62 rows=1469 
width=404)''    ->  
Sort  (cost=18381.90..18385.57 rows=1469 
width=404)''  
Sort Key: patientname, patientidentifier, patientvipindicator, 
patientconfidentiality, patientmrn, patientfacility, patientssn, patientsex, 
patientbirthdate''  
->  Bitmap Heap Scan on patient  (cost=81.08..18304.62 rows=1469 
width=404)''    
Filter: ((patientname)::text ~~ ''BILL%, 
%''::text)''    
->  Bitmap Index Scan on ix_patientname  (cost=0.00..81.08 
rows=7347 
width=0)''  
Index Cond: (((patientname)::text >= ''BILL''::character varying) AND 
((patientname)::text < ''BILM''::character varying))'
However the same 
query on AIX (4 1.5Ghz processors, 60GB filesystem, 4GB Ram, postgres 8.1.2) it 
takes like 5 secs because the query plan just uses sequentials 
scans
 
Limit  
(cost=100054251.96..100054253.41 rows=58 width=161)   ->  
Unique  (cost=100054251.96..100054253.41 rows=58 
width=161) ->  
Sort  (cost=100054251.96..100054252.11 rows=58 
width=161)   
Sort Key: patientname, patientidentifier, patientvipindicator, 
patientconfidentiality, patientmrn, patientfacility, patientssn, patientsex, 
patientbirthdate   
->  Seq Scan on patient  (cost=1.00..100054250.26 rows=58 
width=161)     
Filter: ((patientname)::text ~~ 'SMITH%, NA%'::text)
Why is postgres 
using a sequential scan and not the index what parameters do I need to 
adjust
 
thanks
Tim 
Jones
Optio 
Software


[PERFORM] joining two tables slow due to sequential scan

2006-02-10 Thread Tim Jones



 
I am trying to join 
two tables and keep getting a sequential scan in the plan even though there is 
an index on the columns I am joining on.  Basically this the deal  ... 
I have two tables with docid in them which is what I am using for the 
join.  
 
ClinicalDocs ... (no 
primary key) though it does not help if I make docid primary 
key
docid integer 
(index)
patientid integer 
(index)
visitid integer 
(index)
 ...
 
Documentversions
docid integer 
(index)
docversionnumber 
(index)
docversionidentifier 
(primary key)
 
It seems to do an 
index scan if I put the primary key as docid.  This is what occurs when I 
link on the patid from ClinicalDocs to patient table.  However I can not 
make the docid primary key because it gets repeated depending on how may 
versions of a document I have.  I have tried using a foreign key on 
documentversions with no sucess. 
 
In addition this 
query
 
select * from 
documentversions join clinicaldocuments on 
documentversions.documentidentifier= clinicaldocuments.dssdocumentidentifier 
where documentversions.documentstatus = 'AC'; 
 
does index scan 

but if I change the 
order e.g
 
select * from clinicaldocuments 
join documentversions on clinicaldocuments.dssdocumentidentifier= 
documentversions .documentidentifier where clinicaldocuments.patientidentifier= 
123;
 
does sequential scan what I need is bottom 
query
it is extremely slow ... Any ideas 
?
 
Tim Jones
Healthcare Project Manager
Optio Software, 
Inc.
(770) 576-3555
 


Re: [PERFORM] joining two tables slow due to sequential scan

2006-02-10 Thread Tim Jones


OK.  I'm gonna make a couple of guesses here:

1:  clinicaldocuments.patientidentifier is an int8 and you're running
7.4 or before.

-- nope int4  and 8.1

2: There are more rows with clinicaldocuments.patientidentifier= 123
than with documentversions.documentstatus = 'AC'.

-- nope generally speaking all statuses are 'AC'

3: documentversions.documentidentifier and
clinicaldocuments.dssdocumentidentifier are not the same type.

-- nope both int4

Any of those things true?

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] joining two tables slow due to sequential scan

2006-02-10 Thread Tim Jones
for first query

QUERY PLAN
'Limit  (cost=4.69..88.47 rows=10 width=1350) (actual
time=32.195..32.338 rows=10 loops=1)'
'  ->  Nested Loop  (cost=4.69..4043.09 rows=482 width=1350) (actual
time=32.190..32.316 rows=10 loops=1)'
'->  Bitmap Heap Scan on documentversions  (cost=4.69..1139.40
rows=482 width=996) (actual time=32.161..32.171 rows=10 loops=1)'
'  Recheck Cond: (documentstatus = ''AC''::bpchar)'
'  ->  Bitmap Index Scan on ix_docstatus  (cost=0.00..4.69
rows=482 width=0) (actual time=31.467..31.467 rows=96368 loops=1)'
'Index Cond: (documentstatus = ''AC''::bpchar)'
'->  Index Scan using ix_cdocdid on clinicaldocuments
(cost=0.00..6.01 rows=1 width=354) (actual time=0.006..0.007 rows=1
loops=10)'
'  Index Cond: ("outer".documentidentifier =
clinicaldocuments.dssdocumentidentifier)'
 
 
 for second query

QUERY PLAN
'Hash Join  (cost=899.83..4384.17 rows=482 width=1350)'
'  Hash Cond: ("outer".documentidentifier =
"inner".dssdocumentidentifier)'
'  ->  Seq Scan on documentversions  (cost=0.00..2997.68 rows=96368
width=996)'
'  ->  Hash  (cost=898.62..898.62 rows=482 width=354)'
'->  Bitmap Heap Scan on clinicaldocuments  (cost=4.69..898.62
rows=482 width=354)'
'  Recheck Cond: (patientidentifier = 123)'
'  ->  Bitmap Index Scan on ix_cdocpid  (cost=0.00..4.69
rows=482 width=0)'
'    Index Cond: (patientidentifier = 123)'


thnx

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555
 



From: Dave Dutcher [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 5:15 PM
To: Tim Jones; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] joining two tables slow due to sequential scan



What version of postgres are you using?  Can you post the output from
EXPLAIN ANALYZE?

 

 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones
Sent: Friday, February 10, 2006 4:07 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] joining two tables slow due to sequential scan

 

 

I am trying to join two tables and keep getting a sequential scan in the
plan even though there is an index on the columns I am joining on.
Basically this the deal  ... I have two tables with docid in them which
is what I am using for the join.  

 

ClinicalDocs ... (no primary key) though it does not help if I make
docid primary key

docid integer (index)

patientid integer (index)

visitid integer (index)

 ...

 

Documentversions

docid integer (index)

docversionnumber (index)

docversionidentifier (primary key)

 

It seems to do an index scan if I put the primary key as docid.  This is
what occurs when I link on the patid from ClinicalDocs to patient table.
However I can not make the docid primary key because it gets repeated
depending on how may versions of a document I have.  I have tried using
a foreign key on documentversions with no sucess. 

 

In addition this query

 

select * from documentversions join clinicaldocuments on
documentversions.documentidentifier
= clinicaldocuments.dssdocumentidentifier where
documentversions.documentstatus = 'AC'; 

 

does index scan 

but if I change the order e.g

 

select * from clinicaldocuments join documentversions on
clinicaldocuments.dssdocumentidentifier
= documentversions .documentidentifier where
clinicaldocuments.patientidentifier= 123;

 

does sequential scan what I need is bottom query

it is extremely slow ... Any ideas ?

 

Tim Jones

Healthcare Project Manager

Optio Software, Inc.

(770) 576-3555

 


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] joining two tables slow due to sequential scan

2006-02-10 Thread Tim Jones
oops

QUERY PLAN
'Hash Join  (cost=899.83..4384.17 rows=482 width=1350) (actual
time=0.203..0.203 rows=0 loops=1)'
'  Hash Cond: ("outer".documentidentifier =
"inner".dssdocumentidentifier)'
'  ->  Seq Scan on documentversions  (cost=0.00..2997.68 rows=96368
width=996) (actual time=0.007..0.007 rows=1 loops=1)'
'  ->  Hash  (cost=898.62..898.62 rows=482 width=354) (actual
time=0.161..0.161 rows=0 loops=1)'
'->  Bitmap Heap Scan on clinicaldocuments  (cost=4.69..898.62
rows=482 width=354) (actual time=0.159..0.159 rows=0 loops=1)'
'  Recheck Cond: (patientidentifier = 123)'
'  ->  Bitmap Index Scan on ix_cdocpid  (cost=0.00..4.69
rows=482 width=0) (actual time=0.153..0.153 rows=0 loops=1)'
'Index Cond: (patientidentifier = 123)'
'Total runtime: 0.392 ms'

note I have done these on a smaller db than what I am using but the
plans are the same 


Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 5:39 PM
To: Tim Jones
Cc: Dave Dutcher; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] joining two tables slow due to sequential scan

On Fri, 2006-02-10 at 16:37, Tim Jones wrote:
> for first query
> 
> QUERY PLAN
> 'Limit  (cost=4.69..88.47 rows=10 width=1350) (actual
> time=32.195..32.338 rows=10 loops=1)'
> '  ->  Nested Loop  (cost=4.69..4043.09 rows=482 width=1350) (actual
> time=32.190..32.316 rows=10 loops=1)'
> '->  Bitmap Heap Scan on documentversions  (cost=4.69..1139.40
> rows=482 width=996) (actual time=32.161..32.171 rows=10 loops=1)'
> '  Recheck Cond: (documentstatus = ''AC''::bpchar)'
> '  ->  Bitmap Index Scan on ix_docstatus  (cost=0.00..4.69
> rows=482 width=0) (actual time=31.467..31.467 rows=96368 loops=1)'
> 'Index Cond: (documentstatus = ''AC''::bpchar)'
> '->  Index Scan using ix_cdocdid on clinicaldocuments
> (cost=0.00..6.01 rows=1 width=354) (actual time=0.006..0.007 rows=1 
> loops=10)'
> '  Index Cond: ("outer".documentidentifier =
> clinicaldocuments.dssdocumentidentifier)'
>  
> 
>  for second query
> 
> QUERY PLAN
> 'Hash Join  (cost=899.83..4384.17 rows=482 width=1350)'
> '  Hash Cond: ("outer".documentidentifier = 
> "inner".dssdocumentidentifier)'
> '  ->  Seq Scan on documentversions  (cost=0.00..2997.68 rows=96368 
> width=996)'
> '  ->  Hash  (cost=898.62..898.62 rows=482 width=354)'
> '->  Bitmap Heap Scan on clinicaldocuments  (cost=4.69..898.62
> rows=482 width=354)'
> '  Recheck Cond: (patientidentifier = 123)'
> '  ->  Bitmap Index Scan on ix_cdocpid  (cost=0.00..4.69
> rows=482 width=0)'
> 'Index Cond: (patientidentifier = 123)'

OK, the first one is explain analyze, but the second one is just plain
explain...

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] joining two tables slow due to sequential scan

2006-02-10 Thread Tim Jones
ok here is real db

the first  query I had seems to make no sense because it is only fast if
I limit the rows since almost all rows have status = 'AC'

second query
 tables both have about 10 million rows and it takes a long time as you
can see but this person only has approx 160 total documents


 QUERY PLAN  

---
 Hash Join  (cost=84813.14..1510711.97 rows=48387 width=555) (actual
time=83266.854..91166.315 rows=3 loops=1)
   Hash Cond: ("outer".documentidentifier =
"inner".dssdocumentidentifier)
   ->  Seq Scan on documentversions  (cost=0.00..269141.98 rows=9677398
width=415) (actual time=0.056..49812.459 rows=9677398 loops=1)
   ->  Hash  (cost=83660.05..83660.05 rows=48036 width=140) (actual
time=10.833..10.833 rows=3 loops=1)
 ->  Bitmap Heap Scan on clinicaldocuments
(cost=301.13..83660.05 rows=48036 width=140) (actual time=0.243..0.258
rows=3 loops=1)
   Recheck Cond: (patientidentifier = 690193)
   ->  Bitmap Index Scan on ix_cdocpid  (cost=0.00..301.13
rows=48036 width=0) (actual time=0.201..0.201 rows=3 loops=1)
 Index Cond: (patientidentifier = 690193)
 Total runtime: 91166.540 ms


Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 5:52 PM
To: Tim Jones
Cc: Scott Marlowe; Dave Dutcher; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] joining two tables slow due to sequential scan 

"Tim Jones" <[EMAIL PROTECTED]> writes:
> QUERY PLAN
> 'Hash Join  (cost=899.83..4384.17 rows=482 width=1350) (actual
> time=0.203..0.203 rows=0 loops=1)'
> ...
> 'Total runtime: 0.392 ms'

Hardly seems like evidence of a performance problem ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] joining two tables slow due to sequential scan

2006-02-13 Thread Tim Jones
ok I am retarded :) Apparently I thought I had done analyze on these
tables but I actually had not and that was all that was needed. but
thanks for the help.


Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Dave Dutcher [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 6:25 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] joining two tables slow due to sequential scan 

OK, if I'm reading this correctly, it looks like the planner is choosing
a sequential scan because it expects 48,000 rows for that
patientidentifier, but its actually only getting 3.  The planner has the
number of rows right for the sequential scan, so it seems like the stats
are up to date.  I would try increasing the stats for the
patientindentifier column with 'alter table set statistics...' or
increasing the default_statistics_target for the whole DB.  Once you
have changed the stats I believe you need to run analyze again.



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones
Sent: Friday, February 10, 2006 4:59 PM
To: Tom Lane
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] joining two tables slow due to sequential scan 

ok here is real db

the first  query I had seems to make no sense because it is only fast if
I limit the rows since almost all rows have status = 'AC'

second query
 tables both have about 10 million rows and it takes a long time as you
can see but this person only has approx 160 total documents


 QUERY PLAN  

---
 Hash Join  (cost=84813.14..1510711.97 rows=48387 width=555) (actual
time=83266.854..91166.315 rows=3 loops=1)
   Hash Cond: ("outer".documentidentifier =
"inner".dssdocumentidentifier)
   ->  Seq Scan on documentversions  (cost=0.00..269141.98 rows=9677398
width=415) (actual time=0.056..49812.459 rows=9677398 loops=1)
   ->  Hash  (cost=83660.05..83660.05 rows=48036 width=140) (actual
time=10.833..10.833 rows=3 loops=1)
 ->  Bitmap Heap Scan on clinicaldocuments
(cost=301.13..83660.05 rows=48036 width=140) (actual time=0.243..0.258
rows=3 loops=1)
   Recheck Cond: (patientidentifier = 690193)
   ->  Bitmap Index Scan on ix_cdocpid  (cost=0.00..301.13
rows=48036 width=0) (actual time=0.201..0.201 rows=3 loops=1)
 Index Cond: (patientidentifier = 690193)  Total
runtime: 91166.540 ms


Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Friday, February 10, 2006 5:52 PM
To: Tim Jones
Cc: Scott Marlowe; Dave Dutcher; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] joining two tables slow due to sequential scan 

"Tim Jones" <[EMAIL PROTECTED]> writes:
> QUERY PLAN
> 'Hash Join  (cost=899.83..4384.17 rows=482 width=1350) (actual
> time=0.203..0.203 rows=0 loops=1)'
> ...
> 'Total runtime: 0.392 ms'

Hardly seems like evidence of a performance problem ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] slow query using sub select

2006-05-22 Thread Tim Jones
Hi,
  I am having a problem with a sub select query being kinda slow.  The
query is as follows:
 
select batterycode, batterydescription, observationdate from Battery t1
where patientidentifier=611802158 and observationdate = (select
max(observationdate) from Battery t2 where t2.batterycode=t1.batterycode
and patientidentifier=611802158) order by batterydescription.
 
explain analyze:


'Sort (cost=1697.16..1697.16 rows=1 width=31) (actual
time=910.721..910.729 rows=22 loops=1)'
' Sort Key: batterydescription'
' -> Index Scan using ix_battery_patient on battery t1
(cost=0.00..1697.15 rows=1 width=31) (actual time=241.836..910.580
rows=22 loops=1)'
' Index Cond: (patientidentifier = 611802158)'
' Filter: (observationdate = (subplan))'
' SubPlan'
' -> Aggregate (cost=26.25..26.26 rows=1 width=8) (actual
time=9.666..9.667 rows=1 loops=94)'
' -> Bitmap Heap Scan on battery t2 (cost=22.23..26.25 rows=1 width=8)
(actual time=9.606..9.620 rows=7 loops=94)'
' Recheck Cond: ((patientidentifier = 611802158) AND
((batterycode)::text = ($0)::text))'
' -> BitmapAnd (cost=22.23..22.23 rows=1 width=0) (actual
time=9.596..9.596 rows=0 loops=94)'
' -> Bitmap Index Scan on ix_battery_patient (cost=0.00..2.20 rows=58
width=0) (actual time=0.039..0.039 rows=94 loops=94)'
' Index Cond: (patientidentifier = 611802158)'
' -> Bitmap Index Scan on ix_battery_code (cost=0.00..19.78 rows=2794
width=0) (actual time=9.514..9.514 rows=27323 loops=94)'
' Index Cond: ((batterycode)::text = ($0)::text)'
'Total runtime: 910.897 ms'

Basically I am just trying to display the batterycode with its most
recent date.  Is there a better way to do this query ?

thanks
 
 
Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] slow query using sub select

2006-05-23 Thread Tim Jones
that worked like a champ nice call as always! 

thanks

Tim Jones
Healthcare Project Manager
Optio Software, Inc.
(770) 576-3555

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Monday, May 22, 2006 7:07 PM
To: Tim Jones
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] slow query using sub select 

"Tim Jones" <[EMAIL PROTECTED]> writes:
>   I am having a problem with a sub select query being kinda slow.  The

> query is as follows:
 
> select batterycode, batterydescription, observationdate from Battery 
> t1 where patientidentifier=611802158 and observationdate = (select
> max(observationdate) from Battery t2 where 
> t2.batterycode=t1.batterycode and patientidentifier=611802158) order
by batterydescription.

Yeah, this is essentially impossible for the planner to optimize,
because it doesn't see any way to de-correlate the subselect, so it does
it over again for every row.  You might find it works better if you cast
the thing as a SELECT DISTINCT ON problem (look at the "weather report"
example in the SELECT reference page).

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match