Re: [PERFORM] Optimising a query

2007-12-19 Thread Richard Huxton

Paul Lambert wrote:

Paul Lambert wrote:
<snip>



This part of the query alone takes a significant part of the time:

SELECT DISTINCT ON (finbalance.dealer_id, finbalance.year_id, 
finbalance.subledger_id, finbalance.account_id)

finbalance.year_id AS year,
finbalance.dealer_id AS dealer_id,
lpad(finbalance.subledger_id::text,4,'0') AS subledger,
lpad(finbalance.account_id::text,4,'0') AS account
FROM finbalance

Runs with a query plan of :

Unique  (cost=30197.98..32782.33 rows=20675 width=16) (actual time=5949.695..7197.475 rows=17227 loops=1)
  ->  Sort  (cost=30197.98..30714.85 rows=206748 width=16) (actual time=5949.691..7018.931 rows=206748 loops=1)
        Sort Key: dealer_id, year_id, subledger_id, account_id
        Sort Method:  external merge  Disk: 8880kB
        ->  Seq Scan on finbalance  (cost=0.00..8409.70 rows=206748 width=16) (actual time=0.042..617.949 rows=206748 loops=1)

Total runtime: 7210.966 ms


So basically selecting from the finbalance table (approx. 206,000 
records) takes 10 seconds, even longer without the distinct clause in 
there - the distinct collapses the result-set down to around 17,000 rows.


Well, if you need to summarise all the rows then that plan is as good as 
any.


If you run this query very frequently, you'll probably want to look into 
keeping a summary table updated via triggers.
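For instance, roughly along these lines (untested, names invented, and it
only covers the INSERT side - DELETEs need similar handling, you'd seed it
once with an INSERT ... SELECT DISTINCT, and concurrent inserts of a new
key can still collide on the primary key):

CREATE TABLE finbalance_keys (
    dealer_id    integer,
    year_id      integer,
    subledger_id integer,
    account_id   integer,
    PRIMARY KEY (dealer_id, year_id, subledger_id, account_id)
);

CREATE FUNCTION finbalance_keys_ins() RETURNS trigger AS $$
BEGIN
    -- record this key combination the first time it appears
    PERFORM 1 FROM finbalance_keys
     WHERE dealer_id = NEW.dealer_id AND year_id = NEW.year_id
       AND subledger_id = NEW.subledger_id AND account_id = NEW.account_id;
    IF NOT FOUND THEN
        INSERT INTO finbalance_keys
        VALUES (NEW.dealer_id, NEW.year_id, NEW.subledger_id, NEW.account_id);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER finbalance_keys_trig AFTER INSERT ON finbalance
    FOR EACH ROW EXECUTE PROCEDURE finbalance_keys_ins();

Your query then becomes a plain scan of the ~17,000-row summary table.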


Before that though, try issuing a SET work_mem = '9MB' before running 
your query. If that doesn't change the plan, step up gradually. You 
should be able to get the sort stage to happen in RAM rather than on 
disk (see Sort Method above). Don't go overboard though, your big 
query will probably use multiples of that value.
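
For example:

  SET work_mem = '9MB';

then EXPLAIN ANALYZE the query again and see whether the Sort Method line
switches from external merge to an in-memory quicksort.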


Taking out the two lpad's in there knocks off about 1500ms, so I can 
come up with something else for them - but I'd like to get the query as 
a whole down to under a second.


Stick the lpads in a query that wraps your DISTINCT query.
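
Something like this (untested):

SELECT year, dealer_id,
       lpad(subledger_id::text,4,'0') AS subledger,
       lpad(account_id::text,4,'0') AS account
  FROM (SELECT DISTINCT ON (dealer_id, year_id, subledger_id, account_id)
               year_id AS year, dealer_id, subledger_id, account_id
          FROM finbalance) AS distinct_keys;

That way the lpad() calls run once per distinct row (~17,000) rather than
once per scanned row (~206,000).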

dealer_id, year_id, subledger_id and account_id are all part of the 
primary key on the finbalance table, so I don't think I can index them 
down any further.


A CLUSTER <pkey-index> ON <table> might help, but it will degrade as you 
update the finbalance table.
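
For example, assuming the index is named finbalance_pkey:

  CLUSTER finbalance_pkey ON finbalance;
  ANALYZE finbalance;

Note that CLUSTER takes an exclusive lock while it rewrites the table.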


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Optimising a query

2007-12-19 Thread Gregory Stark
Richard Huxton [EMAIL PROTECTED] writes:

 Paul Lambert wrote:

   ->  Sort  (cost=30197.98..30714.85 rows=206748 width=16) (actual time=5949.691..7018.931 rows=206748 loops=1)
         Sort Key: dealer_id, year_id, subledger_id, account_id
         Sort Method:  external merge  Disk: 8880kB

 Before that though, try issuing a SET work_mem = '9MB' before running your
 query. If that doesn't change the plan, step up gradually. You should be able
 to get the sort stage to happen in RAM rather than on disk (see Sort Method
 above).

FWIW you'll probably need more than that. Try something more like 20MB.

Also, note you can change this with SET for just this connection and even just
this query and then reset it to the normal value (or use SET LOCAL). You don't
have to change it in the config file and restart the whole server.
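
For example:

  BEGIN;
  SET LOCAL work_mem = '20MB';  -- reverts automatically at COMMIT/ROLLBACK
  -- run the big query here
  COMMIT;

or with a plain SET, just issue RESET work_mem afterwards.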

Also, try replacing the DISTINCT with GROUP BY. The code path for DISTINCT
unfortunately needs a bit of cleaning up and isn't exactly equivalent to GROUP
BY. In particular it doesn't support hash aggregates which, if your work_mem
is large enough, might work for you here.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



Re: [PERFORM] Optimising a query

2007-12-19 Thread Gregory Stark

 Also, try replacing the DISTINCT with GROUP BY. The code path for DISTINCT
 unfortunately needs a bit of cleaning up and isn't exactly equivalent to GROUP
 BY. In particular it doesn't support hash aggregates which, if your work_mem
 is large enough, might work for you here.

Sorry, strike that last suggestion. I was looking at the plan and forgot that
the query had DISTINCT ON. It is possible to replace DISTINCT ON with GROUP BY
but it's not going to be faster than the DISTINCT ON case since you'll need
the sort anyways.
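
For reference, the GROUP BY spelling - equivalent here only because every
selected column is part of the DISTINCT ON key - would be something like:

SELECT year_id AS year, dealer_id,
       lpad(subledger_id::text,4,'0') AS subledger,
       lpad(account_id::text,4,'0') AS account
  FROM finbalance
 GROUP BY dealer_id, year_id, subledger_id, account_id;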

Actually it's possible to do without the sort if you write some fancy
aggregate functions but for this large a query that's going to be awfully
complex.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!



[PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Stuart Bishop
Hi.

We are looking at upgrading our primary servers. The final boxes will have
128GB ram, fast disk arrays and 4 CPUs.

We currently have some eval units with 8GB ram and crappy disk to let us
benchmark CPU choice. One box has 4 3GHz dual core Opterons with 1MB cache,
the other box has 4 3GHz quad core Xeons with 4MB cache.

model name  : Intel(R) Xeon(R) CPU   X7350  @ 2.93GHz
cache size  : 4096 KB
model name  : Dual-Core AMD Opteron(tm) Processor 8222 SE
cache size  : 1024 KB

I haven't had a chance to play with the hardware myself yet. The sysadmins
have been running some benchmarks themselves though.

For every non PG related benchmark they have run, the Xeon wins by around 20%.

For pgbench (PG 8.2 running Ubuntu), the Opteron is getting about 6x TPS
over the Xeon (3000+ TPS on Opteron vs ~500 on Xeon). Things get a little
better for Xeon with PG 8.3 (570-540 TPS).

Does this match what other people are seeing or expect, or have we screwed
our benchmarks somehow?

Is this a PG specific win for Opteron, or will we see similar results with
other DBs?

Do people see wins for non-PG databases on Xeon, and are they as dramatic as
we are seeing for PG on Opteron?

With PG 8.2 and 8.3, is it still pretty much limited to 8 cores making 2 of
the quad core Xeons redundant or detrimental?

I expect we will be running this hardware for 8.2, 8.3 and 8.4. Anyone aware
of anything that might change the landscape for 8.4?

-- 
Stuart Bishop [EMAIL PROTECTED]
http://www.stuartbishop.net/





Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Cédric Villemain

Stuart Bishop wrote:

Hi.

We are looking at upgrading our primary servers. The final boxes will have
128GB ram, fast disk arrays and 4 CPUs.

We currently have some eval units with 8GB ram and crappy disk to let us
benchmark CPU choice. One box has 4 3GHz dual core Opterons with 1MB cache,
the other box has 4 3GHz quad core Xeons with 4MB cache.

model name  : Intel(R) Xeon(R) CPU   X7350  @ 2.93GHz
cache size  : 4096 KB
model name  : Dual-Core AMD Opteron(tm) Processor 8222 SE
cache size  : 1024 KB

I haven't had a chance to play with the hardware myself yet. The sysadmins
have been running some benchmarks themselves though.

For every non PG related benchmark they have run, the Xeon wins by around 20%.

For pgbench (PG 8.2 running Ubuntu), the Opteron is getting about 6x TPS
over the Xeon (3000+ TPS on Opteron vs ~500 on Xeon). Things get a little
better for Xeon with PG 8.3 (570-540 TPS).

Does this match what other people are seeing or expect, or have we screwed
our benchmarks somehow?
  

http://tweakers.net/reviews/661/7

as an example

You can browse the rest of the site for other benchmarks (there are 
several covering dual and quad core).



Is this a PG specific win for Opteron, or will we see similar results with
other DBs?

Do people see wins for non-PG databases on Xeon, and are they as dramatic as
we are seeing for PG on Opteron?

With PG 8.2 and 8.3, is it still pretty much limited to 8 cores making 2 of
the quad core Xeons redundant or detrimental?

I expect we will be running this hardware for 8.2, 8.3 and 8.4. Anyone aware
of anything that might change the landscape for 8.4?

  



--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org





Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Scott Marlowe
On Dec 19, 2007 6:04 AM, Stuart Bishop [EMAIL PROTECTED] wrote:
 Hi.

 We are looking at upgrading our primary servers. The final boxes will have
 128GB ram, fast disk arrays and 4 CPUs.

 We currently have some eval units with 8GB ram and crappy disk to let us
 benchmark CPU choice. One box has 4 3GHz dual core Opterons with 1MB cache,
 the other box has 4 3GHz quad core Xeons with 4MB cache.

Imagine two scenarios.  In one you have an infinite number of hard
drives with an infinite amount of battery backed cache, and an
infinite I/O bandwidth.  In the other you have one disk.  Which one is
likely to be I/O bound?

Yep.  So, it's not likely you'll be able to do a realistic benchmark
of the CPUs with such a limited disk subsystem...

 For pgbench (PG 8.2 running Ubuntu), the Opteron is getting about 6x TPS
 over the Xeon (3000+ TPS on Opteron vs ~500 on Xeon). Things get a little
 better for Xeon with PG 8.3 (570-540 TPS).

pgbench is a mostly I/O bound benchmark.  What are your -c, -t and -s
settings btw?
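
For reference, a typical run looks something like:

  pgbench -i -s 100 bench        # initialise a scratch db at scale factor 100
  pgbench -c 16 -t 1000 bench    # 16 clients x 1000 transactions each

and you generally want -s comfortably larger than -c, otherwise the clients
mostly queue up behind each other updating the same branches rows.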

It would be much better if you could benchmark something like the
real load you'll be running in the future. Are you looking at
reporting, transactions, content management, etc...?



Re: [PERFORM] viewing source code

2007-12-19 Thread Roberts, Jon

 -Original Message-
 From: Trevor Talbot [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, December 19, 2007 9:45 AM
 To: Joshua D. Drake
 Cc: Roberts, Jon; Kris Jurka; Merlin Moncure; Jonah H. Harris; Bill Moran;
 pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] viewing source code
 
 On 12/18/07, Joshua D. Drake [EMAIL PROTECTED] wrote:
 
  On Tue, 18 Dec 2007 10:05:46 -0600
  Roberts, Jon [EMAIL PROTECTED] wrote:
 
   If we are talking about enhancement requests, I would propose we
   create a role that can be granted/revoked that enables a user to see
   dictionary objects like source code.  Secondly, users should be able
   to see their own code they write but not others unless they have been
   granted this dictionary role.
 
  You are likely not going to get any support on an obfuscation front.
  This is an Open Source project :P
 
 Wait, what? This is a DBMS, with some existing security controls
 regarding the data users are able to access, and the proposal is about
 increasing the granularity of that control. Arbitrary function bodies
 are just as much data as anything else in the system.
 
 Obfuscation would be something like encrypting the function bodies so
 that even the owner or administrator cannot view or modify the code
 without significant reverse engineering. I mean, some people do want
 that sort of thing, but this proposal isn't even close.

Trevor, thank you for making the proposal clearer.

The more I thought about a counter proposal to put views on pg_proc, I
realized that isn't feasible either.  It would break functionality of
pgAdmin because users couldn't view their source code with the tool.

 
 Where on earth did obfuscation come from?

Don't know.  :)


This really is a needed feature to make PostgreSQL more attractive to
businesses.  A more robust security model that better follows commercial
products is needed for adoption.


Jon




Re: [PERFORM] Optimising a query

2007-12-19 Thread Jeremy Harris

Paul Lambert wrote:
->  Merge Join  (cost=35.56..19732.91 rows=12 width=48) (actual time=0.841..2309.828 rows=206748 loops=1)


I'm no expert, but in the interests of learning: why is the
rows estimate so far out for this join?

Thanks,
   Jeremy




Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Greg Smith

On Wed, 19 Dec 2007, Stuart Bishop wrote:


For pgbench (PG 8.2 running Ubuntu), the Opteron is getting about 6x TPS
over the Xeon (3000+ TPS on Opteron vs ~500 on Xeon). Things get a little
better for Xeon with PG 8.3 (570-540 TPS).


The 3000+ TPS figure is the correct one for a controller that can cache 
writes.  Around 500 TPS is normal for a setup without one.  I suspect all 
you're testing is the difference between the I/O subsystems in the two 
servers, and it's probably the case that the Opteron disk subsystem caches 
writes while the Xeon doesn't.  You haven't drawn any useful conclusions 
comparing Xeons and Opterons yet.


Warning:  the system with the write caching can easily be doing that 
incorrectly, in a way that can corrupt your database one day.  See 
http://momjian.us/main/writings/pgsql/sgml/wal-reliability.html for an 
intro and 
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm for way 
more detail.


If you don't have a real disk setup, you can't use the default pgbench 
test and expect the results to be useful.  The main thing it does is write 
heavily in a way that makes the disk controller and associated I/O the 
bottleneck in most cases.


The only useful test you can do right now with pgbench is to pass it the 
-S parameter so that it does only reads instead.  That will give you a 
much better idea how the CPUs compare.  You still need to be careful about 
the database scale relative to the amount of RAM; at some point even the 
read test will be limited by disk parameters instead of CPU.  Take a look 
at http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm 
for a tutorial on using pgbench to quantify read performance.  Note that 
the way I compute the sizes of things in there is a little difficult, one 
day I'm going to use some of the suggestions at 
http://andreas.scherbaum.la/blog/archives/282-table-size,-database-size.html 
to improve that and you should take a look there as well.
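
Something like:

  pgbench -S -c 16 -t 10000 bench   # select-only test

with the scale factor sized relative to RAM as that article describes.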


You'll also need to vary the number of clients a bit.  You should see the 
largest difference between the two servers with around 16 of them (where 
the Xeon system has a dedicated core for each while the Opteron has 2 
clients/core) while a useful spot to compare the maximum throughput of the 
servers will be around 64 clients.



With PG 8.2 and 8.3, is it still pretty much limited to 8 cores making 2 of
the quad core Xeons redundant or detrimental?


Where'd you get the idea 8 cores was a limit?  As cores go up eventually 
you run out of disk or memory bandwidth, but how that plays out is very 
application dependant and there's no hard line anywhere.



I expect we will be running this hardware for 8.2, 8.3 and 8.4. Anyone aware
of anything that might change the landscape for 8.4?


8.4 is only in the earliest of planning stages right now, nobody knows 
what that will bring yet.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Joshua D. Drake

On Wed, 19 Dec 2007 13:50:29 -0500 (EST)
Greg Smith [EMAIL PROTECTED] wrote:

  With PG 8.2 and 8.3, is it still pretty much limited to 8 cores
  making 2 of the quad core Xeons redundant or detrimental?
 
 Where'd you get the idea 8 cores was a limit?  As cores go up
 eventually you run out of disk or memory bandwidth, but how that
 plays out is very application dependant and there's no hard line
 anywhere.

Actually this is not true. Although I have yet to test 8.3. It is
pretty much common knowledge that after 8 cores the acceleration of
performance drops with PostgreSQL...

This has gotten better every release. 8.1 for example handles 8 cores
very well, 8.0 didn't and 7.4 well :)

Sincerely,

Joshua D. Drake


-- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'





Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Scott Marlowe
On Dec 19, 2007 12:59 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 On Wed, 19 Dec 2007 13:50:29 -0500 (EST)
 Greg Smith [EMAIL PROTECTED] wrote:

   With PG 8.2 and 8.3, is it still pretty much limited to 8 cores
   making 2 of the quad core Xeons redundant or detrimental?
 
  Where'd you get the idea 8 cores was a limit?  As cores go up
  eventually you run out of disk or memory bandwidth, but how that
  plays out is very application dependant and there's no hard line
  anywhere.

 Actually this is not true. Although I have yet to test 8.3. It is
 pretty much common knowledge that after 8 cores the acceleration of
 performance drops with PostgreSQL...

I thought Tom had played with some simple hacks that got the scaling
pretty close to linear for up to 16 cores earlier this year...



Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Joshua D. Drake

On Wed, 19 Dec 2007 13:03:32 -0600
Scott Marlowe [EMAIL PROTECTED] wrote:


  Actually this is not true. Although I have yet to test 8.3. It is
  pretty much common knowledge that after 8 cores the acceleration of
  performance drops with PostgreSQL...
 
 I thought Tom had played with some simple hacks that got the scaling
 pretty close to linear for up to 16 cores earlier this year...
 

See... as above, I have not tested 8.3, and 8.2 is better than 8.1, etc...

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'





Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Scott Marlowe
On Dec 19, 2007 1:07 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 On Wed, 19 Dec 2007 13:03:32 -0600
 Scott Marlowe [EMAIL PROTECTED] wrote:


   Actually this is not true. Although I have yet to test 8.3. It is
   pretty much common knowledge that after 8 cores the acceleration of
   performance drops with PostgreSQL...
 
  I thought Tom had played with some simple hacks that got the scaling
  pretty close to linear for up to 16 cores earlier this year...
 

 See... as above, I have not tested 8.3, and 8.2 is better than 8.1, etc...

Well, I'm not even sure if those got applied or were just Tom hacking
in the basement or, heck, my fevered imagination. :)



Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Ron Mayer
Joshua D. Drake wrote:
 Actually this is not true. Although I have yet to test 8.3. It is
 pretty much common knowledge that after 8 cores the acceleration of
 performance drops with PostgreSQL...
 
 This has gotten better every release. 8.1 for example handles 8 cores
 very well, 8.0 didn't and 7.4 well :)

I agree with the spirit of what you say, but are you overstating
things a bit?

Benchmarks I see[1] suggest that 8.1.2 scaled pretty reasonably to 16
cores (from the chart on page 9 in the link below).  But yeah, 8.0
scaled to maybe 2 cores if you're lucky. :-)

Agree with the rest of the things you say, tho.   It's getting
way better every recent release.



[1] 
http://www.pgcon.org/2007/schedule/attachments/22-Scaling%20PostgreSQL%20on%20SMP%20Architectures%20--%20An%20Update





Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Joshua D. Drake

On Wed, 19 Dec 2007 11:14:08 -0800
Ron Mayer [EMAIL PROTECTED] wrote:

 Joshua D. Drake wrote:
  Actually this is not true. Although I have yet to test 8.3. It is
  pretty much common knowledge that after 8 cores the acceleration of
  performance drops with PostgreSQL...
  
  This has gotten better every release. 8.1 for example handles 8
  cores very well, 8.0 didn't and 7.4 well :)
 
 I agree with the spirit of what you say, but are you overstating
 things a bit?

My point was :)... which that PDF actually illustrates: the gain
between, say, 2 cores and 8 cores is greater than between 8 and 16,
and smaller still beyond 16.

 
 Benchmarks I see[1] suggest that 8.1.2 scaled pretty reasonably to 16
 cores (from the chart on page 9 in the link below).  But yeah, 8.0
 scaled to maybe 2 cores if you're lucky. :-)

I really need to check this test out more though because their numbers
don't reflect mine. I wonder if that is per connection.

Sincerely,

Joshua D. Drake
-- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'





Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Greg Smith

On Wed, 19 Dec 2007, Ron Mayer wrote:


Benchmarks I see[1] suggest that 8.1.2 scaled pretty reasonably to 16
cores (from the chart on page 9 in the link below).  But yeah, 8.0
scaled to maybe 2 cores if you're lucky. :-)
[1] 
http://www.pgcon.org/2007/schedule/attachments/22-Scaling%20PostgreSQL%20on%20SMP%20Architectures%20--%20An%20Update


Thank you, I was looking for that one but couldn't find it again.  Note 
that those results are using a TPC-C variant, which is not the most CPU 
intensive of tests out there.  It's certainly possible that an application 
that has more processing to do per transaction (I'm thinking something 
more in the scientific computing database realm) could scale even better.


While I'd expect the bang per buck to go down quite a bit beyond 8 cores, 
I know I haven't seen any data on what new systems running 8.3 are capable 
of, and extrapolating performance rules of thumb based on old data is 
perilous.  Bottlenecks shift around in unexpected ways.  In that Unisys 
example, they're running 32-bit single core Xeons circa 2004 with 4MB of 
*L3* cache and there's evidence that it scales to 16 processors.  Current Xeons 
are considerably faster and you can get them with 4-8MB of *L2* cache.


What does that do to scalability?  Beats me.  Maybe since the individual 
CPUs are faster, you bottleneck on something else way before you can use 
16 of them usefully.  Maybe the much better CPU cache means there's less 
reliance on the memory bus and they scale better.  It depends a lot on the 
CPU vs. memory vs. disk requirements of your app, which is what I was 
suggesting before.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Optimising a query

2007-12-19 Thread Paul Lambert

Gregory Stark wrote:

Richard Huxton [EMAIL PROTECTED] writes:


Paul Lambert wrote:


  ->  Sort  (cost=30197.98..30714.85 rows=206748 width=16) (actual time=5949.691..7018.931 rows=206748 loops=1)
        Sort Key: dealer_id, year_id, subledger_id, account_id
        Sort Method:  external merge  Disk: 8880kB



Before that though, try issuing a SET work_mem = '9MB' before running your
query. If that doesn't change the plan, step up gradually. You should be able to
get the sort stage to happen in RAM rather than on disk (see Sort Method
above). 


FWIW you'll probably need more than that. Try something more like 20MB.

Also, note you can change this with SET for just this connection and even just
this query and then reset it to the normal value (or use SET LOCAL). You don't
have to change it in the config file and restart the whole server.

Also, try replacing the DISTINCT with GROUP BY. The code path for DISTINCT
unfortunately needs a bit of cleaning up and isn't exactly equivalent to GROUP
BY. In particular it doesn't support hash aggregates which, if your work_mem
is large enough, might work for you here.




I changed work_mem to 20MB per suggestion and that knocks the query time 
down to just over 6 seconds... still a bit slow for my liking, but any 
higher work_mem doesn't change the result - i.e. 30, 40, 50MB all give 
just over 6 seconds.


The explain analyze shows all the sorts taking place in memory now as 
quicksorts rather than on-disk merge in the previous query plan, so I'll 
make a permanent change to the config to set work_mem to 20MB.


I've also changed the inner-most select into a two level select with the 
lpad's on the outer so they are not being evaluated on every row, just 
the collapsed rows - that accounted for about 1 second of the overall 
time reduction.


Would increasing the stats of anything on any of these tables speed 
things up any more?


--
Paul Lambert
Database Administrator
AutoLedgers - A Reynolds  Reynolds Company



Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Gregory Stark
Joshua D. Drake [EMAIL PROTECTED] writes:

 It is pretty much common knowledge that

I think we have too much common knowledge.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



[PERFORM] Minimizing dead tuples caused by update triggers

2007-12-19 Thread Dan Harris
I've been fighting with the common workarounds for inadequate response  
times on select count(*) and min(),max() on tables with tens of  
millions of rows for quite a while now and understand the reasons for  
the table scans.


I have applications that regularly poll a table ( ideally, the more  
frequent, the better ) to learn the most recent data inside it as well  
as the # of rows in it ( among a few other things ).  As the databases  
have grown in size, these summarizations could no longer be done on  
the fly, so I wrote a database wrapper API that tracks those values  
internally.


This wrapper has grown very complex and is difficult to manage across  
different systems.  What I'd like to do instead is implement triggers  
for insert, updates, and deletes to check and/or replace a value in a  
table_stats, representing table count, min/max dates, and a few  
other costly operations.. that can then be queried in short order.  I  
know this is a fairly common thing to do.
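
i.e. schematically something like this (column and table names invented,
INSERT case only, and it assumes the table_stats rows are pre-seeded):

CREATE TABLE table_stats (
    table_name text PRIMARY KEY,
    row_count  bigint,
    max_date   timestamp
);

CREATE FUNCTION bump_table_stats() RETURNS trigger AS $$
BEGIN
    -- maintain the running count and most-recent date for this table
    UPDATE table_stats
       SET row_count = row_count + 1,
           max_date  = CASE WHEN NEW.created_at > max_date
                            THEN NEW.created_at ELSE max_date END
     WHERE table_name = TG_RELNAME;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER monitored_table_stats AFTER INSERT ON monitored_table
    FOR EACH ROW EXECUTE PROCEDURE bump_table_stats();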


The thing that concerns me is dead tuples on the table_stats table.  I  
believe that every insert of new data in one of the monitored tables  
will result in an UPDATE of the table_stats table.  When thousands  
( or millions ) of rows are inserted, the select performance ( even  
trying with an index ) on table_stats slows down in a hurry.  If I  
wrap the inserts into large transactions, will it only call the update  
on table_stats when I commit?


Obviously I want to vacuum this table regularly to recover this.  The  
problem I'm running into is contention between VACUUM ( not full ) and  
pg_dump ( version 8.0.12 ).  My system backup takes 6 hours to run  
pg_dump on a 400GB cluster directory.  If the vacuum command fires  
during the dump, it forces an exclusive lock and any queries will hang  
until pg_dump finishes.


If I have to wait until pg_dump is finished before issuing the VACUUM  
command, everything slows down significantly as the dead tuples in  
table_stats pile up.


What strategy could I employ to either:

1. resolve the contention between pg_dump and vacuum, or
2. reduce the dead tuple pile up between vacuums

Thanks for reading

-Dan



Re: [PERFORM] Minimizing dead tuples caused by update triggers

2007-12-19 Thread Tom Lane
Dan Harris [EMAIL PROTECTED] writes:
 The thing that concerns me is dead tuples on the table_stats table.  I  
 believe that every insert of new data in one of the monitored tables  
 will result in an UPDATE of the table_stats table.  When thousands  
 ( or millions ) of rows are inserted, the select performance ( even  
 trying with an index ) on table_stats slows down in a hurry.

Yup.  FWIW, 8.3's HOT tuple updates might help this quite a lot.
Not a lot to be done about it in 8.0.x though :-(

regards, tom lane



Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Decibel!

On Dec 19, 2007, at 4:54 PM, Gregory Stark wrote:

Joshua D. Drake [EMAIL PROTECTED] writes:


It is pretty much common knowledge that


I think we have too much common knowledge.



Yeah. For a lot of folks it's still common knowledge that you should  
only set shared_buffers to 10% of memory...

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Minimizing dead tuples caused by update triggers

2007-12-19 Thread Decibel!

On Dec 19, 2007, at 6:39 PM, Tom Lane wrote:
The thing that concerns me is dead tuples on the table_stats table.  I
believe that every insert of new data in one of the monitored tables
will result in an UPDATE of the table_stats table.  When thousands
( or millions ) of rows are inserted, the select performance ( even
trying with an index ) on table_stats slows down in a hurry.


Yup.  FWIW, 8.3's HOT tuple updates might help this quite a lot.
Not a lot to be done about it in 8.0.x though :-(



A work-around others have used is to have the trigger just insert  
into a 'staging' table and then periodically take the records from  
that table and summarize them somewhere else. You still have a vacuum  
concern on the staging table, but the advantage is that your trigger  
path is a simple insert instead of an update, which is effectively a  
delete and an insert.
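
Schematically (names invented, and the lock is a crude way to make sure the
DELETE only removes rows the UPDATE has already counted):

CREATE TABLE stats_staging (table_name text, n int, event_date timestamp);

-- the trigger body is then just:
INSERT INTO stats_staging VALUES (TG_RELNAME, 1, NEW.created_at);

-- and a periodic job folds it into the real summary table:
BEGIN;
LOCK TABLE stats_staging IN EXCLUSIVE MODE;
UPDATE table_stats
   SET row_count = table_stats.row_count + agg.n
  FROM (SELECT table_name, sum(n) AS n
          FROM stats_staging GROUP BY table_name) agg
 WHERE table_stats.table_name = agg.table_name;
DELETE FROM stats_staging;
COMMIT;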


This is a case where a cron'd vacuum that runs once a minute is  
probably a wise idea.
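
e.g. a crontab line along the lines of:

  * * * * *  vacuumdb --quiet --table=table_stats yourdb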

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Joshua D. Drake

On Wed, 19 Dec 2007 19:51:13 -0600
Decibel! [EMAIL PROTECTED] wrote:

 On Dec 19, 2007, at 4:54 PM, Gregory Stark wrote:
  Joshua D. Drake [EMAIL PROTECTED] writes:
 
  It is pretty much common knowledge that
 
  I think we have too much common knowledge.
 
 
 Yeah. For a lot of folks it's still common knowledge that you should  
 only set shared_buffers to 10% of memory...

Sometimes that's true ;).

Joshua D. Drake

-- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'





Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Tom Lane
Scott Marlowe [EMAIL PROTECTED] writes:
 Well, I'm not even sure if those got applied or were just Tom hacking
 in the basement or, heck, my fevered imagination. :)

For the record, I hack in the attic ... or what I tell the IRS is my
third-floor office ...

regards, tom lane
