[PERFORM] Why the difference in plans ??

2006-09-15 Thread Joost Kraaijeveld
Hi,

I have two tables: customers and salesorders. salesorders has a foreign
key to customers.

If I run this query:

SELECT 
salesOrders.objectid, 
salesOrders.ordernumber, 
salesOrders.orderdate, 
customers.objectid, 
customers.customernumber, 
customers.lastname 
FROM prototype.salesorders 
INNER JOIN prototype.customers ON ( 
customers.objectid = salesorders.customer 
) 
where 
lastname ilike 'Boonk' 
order by ordernumber asc LIMIT 1


WITHOUT LIMIT 1 this query plan is executed (EXPLAIN ANALYZE):


Sort  (cost=41811.90..41812.78 rows=353 width=103) (actual time=623.855..623.867 rows=7 loops=1)
  Sort Key: salesorders.ordernumber
  ->  Nested Loop  (cost=2.15..41796.96 rows=353 width=103) (actual time=0.166..623.793 rows=7 loops=1)
        ->  Seq Scan on customers  (cost=0.00..21429.44 rows=118 width=55) (actual time=0.037..623.325 rows=5 loops=1)
              Filter: (lastname ~~* 'Boonk'::text)
        ->  Bitmap Heap Scan on salesorders  (cost=2.15..172.06 rows=44 width=88) (actual time=0.075..0.079 rows=1 loops=5)
              Recheck Cond: ("outer".objectid = salesorders.customer)
              ->  Bitmap Index Scan on orders_customer  (cost=0.00..2.15 rows=44 width=0) (actual time=0.066..0.066 rows=1 loops=5)
                    Index Cond: ("outer".objectid = salesorders.customer)
Total runtime: 624.051 ms



With the limit this query plan is used (EXPLAIN ANALYZE):

Limit  (cost=0.00..18963.24 rows=1 width=103) (actual time=18404.730..18404.732 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..6694025.41 rows=353 width=103) (actual time=18404.723..18404.723 rows=1 loops=1)
        ->  Index Scan using prototype_orders_ordernumber on salesorders  (cost=0.00..37263.14 rows=1104381 width=88) (actual time=26.715..1862.408 rows=607645 loops=1)
        ->  Index Scan using pk_prototype_customers on customers  (cost=0.00..6.02 rows=1 width=55) (actual time=0.023..0.023 rows=0 loops=607645)
              Index Cond: (customers.objectid = "outer".customer)
              Filter: (lastname ~~* 'Boonk'::text)
Total runtime: 18404.883 ms


Both tables have been freshly and fully vacuumed and analyzed.

Why the difference, and can I influence the planner so that the first
(faster) query plan is used in both cases? I would have expected the
LIMIT to be applied after the sort.

TIA

-- 
Regards,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] High CPU Load

2006-09-15 Thread Markus Schaber
Hi, Jérôme,

Jérôme BENOIS wrote:

 max_connections = 512

Do you really have that many concurrent connections? If so, you should
probably think about getting a larger machine.

You will definitely want to play with commit_delay and commit_siblings
settings in that case, especially if you have write access.

 work_mem = 65536
 effective_cache_size = 131072

hmm, 131072*8*1024 + 512*65536*1024 = 35433480192 - that's 33 GB of
memory you assume here, not counting OS usage, and the fact that certain
queries can use up a multiple of work_mem.

Even on a machine that big, I'd be inclined to dedicate more memory to
caching, and less to the backends, unless specific needs dictate it. You
could try to use SQLRelay or pgpool to cut down the number of backends
you need.

 My Server is Dual Xeon 3.06GHz

For Xeons, there have been reports of context switch storms that kill
performance.

 with 2 GB RAM and good SCSI disks.

For 2 Gigs of ram, you should cut down the number of concurrent backends.

Does your machine go into swap?
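To make that concrete, here is a hypothetical starting point for a 2 GB box along these lines. The values are illustrative assumptions only, not a recommendation; they must be tuned against the actual workload:

```
# postgresql.conf sketch (assumed values) for a 2 GB machine
max_connections = 100           # pool via pgpool/SQLRelay rather than 512 backends
work_mem = 8192                 # 8 MB per sort/hash; one query may use several
effective_cache_size = 131072   # ~1 GB, roughly what the OS can really cache
commit_delay = 10000            # microseconds; only helps with many concurrent writers
commit_siblings = 5
```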

Markus
-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org




Re: [PERFORM] Vacuums on large busy databases

2006-09-15 Thread Markus Schaber
Hi, Francisco,

Francisco Reyes wrote:

 I am looking to either improve the time of the vacuum or decrease its
 impact on the loads.
 Are the variables:
 #vacuum_cost_delay = 0  # 0-1000 milliseconds
 #vacuum_cost_page_hit = 1   # 0-10000 credits
 #vacuum_cost_page_miss = 10 # 0-10000 credits
 #vacuum_cost_page_dirty = 20# 0-10000 credits
 #vacuum_cost_limit = 200# 0-10000 credits

Just to avoid a silly mistake:

You pasted those settings with a leading # sign, which means that
PostgreSQL treats them as comments and uses the defaults instead. You
should make sure that you use real, uncommented settings in your config.
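For example, uncommented and with illustrative values (the numbers below are assumptions to show the form, not tuning advice):

```
vacuum_cost_delay = 50          # sleep 50 ms each time the cost limit is reached
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 200
```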

HTH,
Markus

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org



Re: [PERFORM] Why the difference in plans ??

2006-09-15 Thread Tom Lane
Joost Kraaijeveld [EMAIL PROTECTED] writes:
 Why the difference and can I influence the result so that the first
 query plan (which is the fastest) is actually used in both cases (I
 would expect that the limit would be done after the sort?)? 

It likes the second plan because 6694025.41/353 < 41812.78.  It would
probably be right, too, if the number of matching rows were indeed 353,
but it seems there are only 7.  Try increasing your statistics target
and re-analyzing.
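A sketch of what that looks like, using the table and column from Joost's query (the target value 100 is an arbitrary example):

```sql
-- Raise the per-column statistics target, then refresh the statistics:
ALTER TABLE prototype.customers ALTER COLUMN lastname SET STATISTICS 100;
ANALYZE prototype.customers;

-- Alternatively, raise default_statistics_target in postgresql.conf
-- (or per session: SET default_statistics_target = 100;) and re-ANALYZE.
```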

regards, tom lane



[PERFORM] Optimize SQL

2006-09-15 Thread Pallav Kalva
Hi,

   Is there any way we can optimize this SQL? It is doing a full table
scan on the listing and address tables. Postgres version 8.0.2.

Thanks!
Pallav.


explain analyze
select listing0_.listingid as col_0_0_, 
getmaxdate(listing0_.lastupdate, max(addressval2_.createdate)) as col_1_0_
from listing.listing listing0_
left outer join listing.address listingadd1_
on listing0_.fkbestaddressid=listingadd1_.addressid
left outer join listing.addressvaluation addressval2_
on listingadd1_.addressid=addressval2_.fkaddressid
where listing0_.lastupdate > '2006-09-15 08:31:26.927'
and listing0_.lastupdate <= current_timestamp
or addressval2_.createdate > '2006-09-15 08:31:26.927' and
addressval2_.createdate <= current_timestamp
group by listing0_.listingid , listing0_.lastupdate
order by getmaxdate(listing0_.lastupdate, max(addressval2_.createdate))
asc limit 10;


Limit  (cost=2399501.49..2399501.51 rows=10 width=20) (actual time=414298.076..414298.174 rows=10 loops=1)
  ->  Sort  (cost=2399501.49..2410707.32 rows=4482333 width=20) (actual time=414298.068..414298.098 rows=10 loops=1)
        Sort Key: getmaxdate(listing0_.lastupdate, max(addressval2_.createdate))
        ->  GroupAggregate  (cost=1784490.47..1851725.47 rows=4482333 width=20) (actual time=414212.926..414284.927 rows=2559 loops=1)
              ->  Sort  (cost=1784490.47..1795696.31 rows=4482333 width=20) (actual time=414174.678..414183.536 rows=2563 loops=1)
                    Sort Key: listing0_.listingid, listing0_.lastupdate
                    ->  Merge Right Join  (cost=1113947.32..1236714.45 rows=4482333 width=20) (actual time=273257.256..414163.920 rows=2563 loops=1)
                          Merge Cond: ("outer".fkaddressid = "inner".addressid)
                          Filter: ((("inner".lastupdate > '2006-09-15 08:31:26.927'::timestamp without time zone) AND ("inner".lastupdate <= ('now'::text)::timestamp(6) with time zone)) OR (("outer".createdate > '2006-09-15 08:31:26.927'::timestamp without time zone) AND ("outer".createdate <= ('now'::text)::timestamp(6) with time zone)))
                          ->  Index Scan using idx_addressvaluation_fkaddressid on addressvaluation addressval2_  (cost=0.00..79769.55 rows=947056 width=12) (actual time=0.120..108240.633 rows=960834 loops=1)
                          ->  Sort  (cost=1113947.32..1125153.15 rows=4482333 width=16) (actual time=256884.646..275823.217 rows=5669719 loops=1)
                                Sort Key: listingadd1_.addressid
                                ->  Hash Left Join  (cost=228115.38..570557.39 rows=4482333 width=16) (actual time=93874.356..205054.946 rows=4490963 loops=1)
                                      Hash Cond: ("outer".fkbestaddressid = "inner".addressid)
                                      ->  Seq Scan on listing listing0_  (cost=0.00..112111.33 rows=4482333 width=16) (actual time=0.026..25398.685 rows=4490963 loops=1)
                                      ->  Hash  (cost=18.70..18.70 rows=6990270 width=4) (actual time=93873.659..93873.659 rows=0 loops=1)
                                            ->  Seq Scan on address listingadd1_  (cost=0.00..18.70 rows=6990270 width=4) (actual time=13.256..69441.056 rows=6990606 loops=1)




Re: [PERFORM] RAID 0 not as fast as expected

2006-09-15 Thread Luke Lonergan
Greg, Josh,

Something I found out while doing this - lvm (and lvm2) slows the block
stream down dramatically.  At first I was using it for convenience sake to
implement partitions on top of the md devices, but I found I was stuck at
about 700 MB/s.  Removing lvm2 from the picture allowed me to get within
chucking distance of 2GB/s.

When we first started working with Solaris ZFS, we were getting about
400-600 MB/s, and after working with the Solaris Engineering team we now get
rates approaching 2GB/s.  The updates needed to Solaris are part of the
Solaris 10 U3 available in October (and already in Solaris Express, aka
Solaris 11).

- Luke   


On 9/15/06 5:43 AM, Spiegelberg, Greg [EMAIL PROTECTED] wrote:

 That's an all-PCI-X box, which makes sense.  There are 6 SATA controllers
 in that little beastie also.  You can always count on Sun to provide
 over-engineered boxes.
 
  
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 Joshua D. Drake
 Sent: Friday, September 15, 2006 12:01 AM
 To: Luke Lonergan
 Cc: Craig A. James; pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] RAID 0 not as fast as expected
 
 Luke Lonergan wrote:
 Josh,
 
 On 9/14/06 8:47 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 
 I've obtained 1,950 MB/s using Linux software RAID on SATA drives.
 With what? :)
 
 Sun X4500 (aka Thumper) running stock RedHat 4.3 (actually
 CentOS 4.3) 
 with XFS and the linux md driver without lvm.  Here is a
 summary of the results:
 
 
 
 Good god!
 
  
  Read Test
  RAID Level Max Readahead (KB) RAID Chunksize Max Readahead on Disks (KB) Max Time (s)  Read Bandwidth (MB/s)
  0 65536 64 256 16.689  1,917.43
  0 4096 64 256 21.269  1,504.54
  0 65536 256 256 17.967  1,781.04
  0 2816 256 256 18.835  1,698.96
  0 65536 1024 256 18.538  1,726.18
  0 65536 64 512 18.295  1,749.11
  0 65536 64 256 18.931  1,690.35
  0 65536 64 256 18.873  1,695.54
  0 64768 64 256 18.545  1,725.53
  0 131172 64 256 18.548  1,725.25
  0 131172 64 65536 19.046  1,680.14
  0 131172 64 524288 18.125  1,765.52
  0 131172 64 1048576 18.701  1,711.14
  5 2560 64 256 39.933  801.34
  5 16777216 64 256 37.76  847.46
  5 524288 64 256 53.497  598.16
  5 65536 32 256 38.472  831.77
  5 65536 32 256 38.004  842.02
  5 65536 32 256 37.884  844.68
  5 2560 16 256 41.39  773.13
  5 65536 16 256 48.902  654.37
  10 65536 64 256 83.256  384.36
  1+0 65536 64 256 19.394  1,649.99
  1+0 65536 64 256 19.047  1,680.05
  1+0 65536 64 256 19.195  1,667.10
  1+0 65536 64 256 18.806  1,701.58
  1+0 65536 64 256 18.848  1,697.79
  1+0 65536 64 256 18.371  1,741.88
  1+0 65536 64 256 21.446  1,492.12
  1+0 65536 64 256 20.254  1,579.93
 
 
 
 
 -- 
 
 === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
 Providing the most comprehensive  PostgreSQL solutions since 1997
   http://www.commandprompt.com/
 
 
 
 
 





Re: [PERFORM] Optimize SQL

2006-09-15 Thread Tom Lane
Pallav Kalva [EMAIL PROTECTED] writes:
 select listing0_.listingid as col_0_0_, 
 getmaxdate(listing0_.lastupdate, max(addressval2_.createdate)) as col_1_0_
 from listing.listing listing0_
 left outer join listing.address listingadd1_
 on listing0_.fkbestaddressid=listingadd1_.addressid
 left outer join listing.addressvaluation addressval2_
 on listingadd1_.addressid=addressval2_.fkaddressid
 where listing0_.lastupdate > '2006-09-15 08:31:26.927'
 and listing0_.lastupdate <= current_timestamp
 or addressval2_.createdate > '2006-09-15 08:31:26.927' and
 addressval2_.createdate <= current_timestamp
 group by listing0_.listingid , listing0_.lastupdate
 order by getmaxdate(listing0_.lastupdate, max(addressval2_.createdate))
 asc limit 10;

If that WHERE logic is actually what you need, then getting this query
to run quickly seems pretty hopeless.  The database must form the full
outer join result: it cannot discard any listing0_ rows, even if they
have lastupdate outside the given range, because they might join to
addressval2_ rows within the given createdate range.  And conversely
it can't discard any addressval2_ rows early.  Is there any chance
that you wanted AND not OR there?

One thing that might help a bit is to change the join order:

from listing.listing listing0_
left outer join listing.addressvaluation addressval2_
on listing0_.fkbestaddressid=addressval2_.fkaddressid
left outer join listing.address listingadd1_
on listing0_.fkbestaddressid=listingadd1_.addressid

so that at least the WHERE clause can be applied before having joined to
listingadd1_.  The semantics of your ON clauses are probably wrong anyway
--- did you think twice about what happens if there's no matching
listingadd1_ entry?

regards, tom lane



Re: [PERFORM] Why the difference in plans ??

2006-09-15 Thread Joost Kraaijeveld
On Fri, 2006-09-15 at 10:08 -0400, Tom Lane wrote:
 but it seems there are only 7.  Try increasing your statistics target
 and re-analyzing.

By "increasing my statistics target", do you mean changing the value of
default_statistics_target = 10 to a bigger number? If so, changing it
to 900 did not make any difference (PostgreSQL restarted, vacuumed,
analyzed, etc.).

-- 
Regards,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl



Re: [PERFORM] Why the difference in plans ??

2006-09-15 Thread Tom Lane
Joost Kraaijeveld [EMAIL PROTECTED] writes:
 Do you mean with increasing my statistics target changing the value of
 default_statistics_target = 10 to a bigger number? If so, changing it
 to 900 did not make any difference (PostgreSQL restarted, vacuumed
 analysed etc).

Hm, did the 353 rowcount estimate not change at all?

regards, tom lane



Re: [PERFORM] Optimize SQL

2006-09-15 Thread Arjen van der Meijden

On 15-9-2006 17:53 Tom Lane wrote:

If that WHERE logic is actually what you need, then getting this query
to run quickly seems pretty hopeless.  The database must form the full
outer join result: it cannot discard any listing0_ rows, even if they
have lastupdate outside the given range, because they might join to
addressval2_ rows within the given createdate range.  And conversely
it can't discard any addressval2_ rows early.  Is there any chance
that you wanted AND not OR there?


Couldn't it also help to do something like this?

SELECT ..., (SELECT MAX(createdate) FROM addressval ...)
FROM listing l
  LEFT JOIN address ...
WHERE l.id IN (SELECT id FROM listing WHERE lastupdate ...
  UNION
SELECT id FROM listing JOIN addressval a ON ... WHERE 
a.createdate ...)



It's not pretty, but looking at the EXPLAIN output, only a small number
of records match both clauses. So this should allow the use of indexes
for both the createdate clause and the lastupdate clause.


Best regards,

Arjen



Re: [PERFORM] RAID 0 not as fast as expected

2006-09-15 Thread Bucky Jordan
When we first started working with Solaris ZFS, we were getting about
400-600 MB/s, and after working with the Solaris Engineering team we
now get
rates approaching 2GB/s.  The updates needed to Solaris are part of the
Solaris 10 U3 available in October (and already in Solaris Express, aka
Solaris 11).

Luke,

What other file systems have you had good success with? Solaris would be
nice, but it looks like I'm stuck running on FreeBSD (6.1, amd64), so
UFS2 would be the default. I'm not sure about XFS on BSD, and I'm not
sure at the moment that ext2/3 provide enough benefit over UFS to spend
much time on them.

Also, has anyone had any experience with gmirror (good or bad)? I'm
thinking of trying to use it to stripe two hardware mirrored sets since
HW RAID10 wasn't doing as well as I had hoped (Dell Perc5/I controller).
For a 4 disk RAID 10 (10k rpm SAS/SCSI disks) what would be a good
target performance number? Right now, dd shows 224 MB/s. 

And lastly, for a more OLAP style database, would I be correct in
assuming that sequential access speed would be more important than is
normally the case? (I have a relatively small number of connections, but
each running on pretty large data sets). 

Thanks,

Bucky



[PERFORM] Performance of IN (...) vs. = ANY array[...]

2006-09-15 Thread Benjamin Minshall

Greetings:

I'm running 8.1.4, and have noticed major differences in execution time 
for plpgsql functions running queries that differ only in use of an 
array such as:



slower_function( vals integer[] )
[query] WHERE id = ANY vals;


faster_function( vals integer[] )
vals_text := array_to_string( vals, ',' )
EXECUTE '[query] WHERE id IN (' || vals_text || ')';


In general, there are about 10 integers in the lookup set on average and 
50 max.


What are the advantages or disadvantages of using arrays in this 
situation?  The = ANY array method makes plpgsql development cleaner, 
but seems to really lack performance in certain cases.  What do you 
recommend as the preferred method?
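For concreteness, a minimal sketch of the two patterns on 8.1 might look like this. The table `items(id integer)` and the count query are hypothetical stand-ins, since the original [query] is elided:

```sql
-- Slower on 8.1: "= ANY (array)" is not optimized into an index scan.
CREATE FUNCTION slower_function(vals integer[]) RETURNS bigint AS $$
BEGIN
    RETURN (SELECT count(*) FROM items WHERE id = ANY (vals));
END;
$$ LANGUAGE plpgsql;

-- Faster on 8.1: build an IN list of constants and EXECUTE it dynamically,
-- which lets the planner use index scans.
CREATE FUNCTION faster_function(vals integer[]) RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    EXECUTE 'SELECT count(*) FROM items WHERE id IN ('
            || array_to_string(vals, ',') || ')' INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```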


Thanks for your comments.

--
Benjamin Minshall [EMAIL PROTECTED]
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz




Re: [PERFORM] Performance of IN (...) vs. = ANY array[...]

2006-09-15 Thread Tom Lane
Benjamin Minshall [EMAIL PROTECTED] writes:
 What are the advantages or disadvantages of using arrays in this 
 situation?  The = ANY array method makes plpgsql development cleaner, 
 but seems to really lack performance in certain cases.

In existing releases, the form with IN (list-of-scalar-constants)
can be optimized into indexscan(s), but = ANY (array) isn't.

8.2 will treat them equivalently (in fact, it converts IN (...) to
= ANY (ARRAY[...]) !).  So depending on your time horizon, you might
wish to stick with whichever is cleaner for your calling code.
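In other words (with a hypothetical table `items`):

```sql
-- 8.1 and earlier: only the IN form can become an index scan.
SELECT * FROM items WHERE id IN (1, 2, 3);
SELECT * FROM items WHERE id = ANY (ARRAY[1, 2, 3]);
-- 8.2: both are planned identically; EXPLAIN shows the IN list
-- rewritten as (id = ANY ('{1,2,3}'::integer[])).
```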

regards, tom lane



Re: [PERFORM] RAID 0 not as fast as expected

2006-09-15 Thread Luke Lonergan
Josh,

On 9/14/06 8:47 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 I've obtained 1,950 MB/s using Linux software RAID on SATA drives.
 
 With what? :)

Sun X4500 (aka Thumper) running stock RedHat 4.3 (actually CentOS 4.3) with
XFS and the linux md driver without lvm.  Here is a summary of the results:

 
 Read Test  
 RAID Level Max Readahead (KB) RAID Chunksize Max Readahead on Disks (KB)
Max Time (s)  Read Bandwidth (MB/s)
 0 65536 64 256 16.689  1,917.43
 0 4096 64 256 21.269  1,504.54
 0 65536 256 256 17.967  1,781.04
 0 2816 256 256 18.835  1,698.96
 0 65536 1024 256 18.538  1,726.18
 0 65536 64 512 18.295  1,749.11
 0 65536 64 256 18.931  1,690.35
 0 65536 64 256 18.873  1,695.54
 0 64768 64 256 18.545  1,725.53
 0 131172 64 256 18.548  1,725.25
 0 131172 64 65536 19.046  1,680.14
 0 131172 64 524288 18.125  1,765.52
 0 131172 64 1048576 18.701  1,711.14
 5 2560 64 256 39.933  801.34
 5 16777216 64 256 37.76  847.46
 5 524288 64 256 53.497  598.16
 5 65536 32 256 38.472  831.77
 5 65536 32 256 38.004  842.02
 5 65536 32 256 37.884  844.68
 5 2560 16 256 41.39  773.13
 5 65536 16 256 48.902  654.37
 10 65536 64 256 83.256  384.36
 1+0 65536 64 256 19.394  1,649.99
 1+0 65536 64 256 19.047  1,680.05
 1+0 65536 64 256 19.195  1,667.10
 1+0 65536 64 256 18.806  1,701.58
 1+0 65536 64 256 18.848  1,697.79
 1+0 65536 64 256 18.371  1,741.88
 1+0 65536 64 256 21.446  1,492.12
 1+0 65536 64 256 20.254  1,579.93  



