Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-08-17 Thread Guy Rouillier
I originally posted the question below back in Dec 2006, and many 
helpful suggestions resulted.  Unfortunately, since this was a closet 
effort, my official duties pushed further exploration to the back 
burner, then I lost my original test environment.  So while I can no 
longer compare to BigDBMS, I've just made some discoveries that I 
thought others might find helpful.


The app (which I inherited) was implemented making exhaustive use of 
stored procedures.  All inserts and updates are done using procs.  When 
configuration changes produced no noticeable improvements in 
performance, I turned to the application architecture.  In a new 
environment, I updated an insert/update intensive part of the app to use 
embedded insert and update statements instead of invoking stored 
procedures that did the same work.  All the remaining code, database 
implementation, hardware, etc remains the same.


The results were significant.  Running a repeatable test set of data 
produced the following results:


With stored procs: 2595 seconds
With embedded inserts/updates: 991 seconds

So at least in this one scenario, it looks like the extensive use of 
stored procs is contributing significantly to long run times.
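
To make the difference concrete, here is a rough sketch of the change; the
procedure, table, and column names below are invented for illustration and
are not from the actual application:

  -- Before: one stored-procedure call per row
  SELECT update_usage_stats(1234, '2007-08-17 10:05:00', 42);

  -- After: the equivalent statement embedded directly in the application
  UPDATE stats_normalized
     SET bytes_in = bytes_in + 42
   WHERE device_id   = 1234
     AND sample_time = '2007-08-17 10:05:00';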


Guy Rouillier wrote:
I don't want to violate any license agreement by discussing performance, 
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as 
BigDBMS here.


I'm trying to convince my employer to replace BigDBMS with PostgreSQL 
for at least some of our Java applications.  As a proof of concept, I 
started with a high-volume (but conceptually simple) network data 
collection application.  This application collects files of 5-minute 
usage statistics from our network devices, and stores a raw form of 
these stats into one table and a normalized form into a second table. We 
are currently storing about 12 million rows a day in the normalized 
table, and each month we start new tables.  For the normalized data, the 
app inserts rows initialized to zero for the entire current day first 
thing in the morning, then throughout the day as stats are received, 
executes updates against existing rows.  So the app has very high update 
activity.
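
As a sketch of that daily pattern (the table, column, and device names here
are assumptions for illustration only):

  -- First thing in the morning: pre-insert zeroed rows for every device
  -- and every 5-minute slot of the current day (288 slots)
  INSERT INTO stats_normalized (device_id, sample_time, bytes_in, bytes_out)
  SELECT d.device_id,
         DATE '2007-01-05' + (n * INTERVAL '5 minutes'),
         0, 0
    FROM devices d
   CROSS JOIN generate_series(0, 287) AS n;

  -- Throughout the day, as each stats file arrives, update existing rows
  UPDATE stats_normalized
     SET bytes_in = 1042, bytes_out = 998
   WHERE device_id   = 17
     AND sample_time = TIMESTAMP '2007-01-05 10:05:00';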


In my test environment, I have a dual-x86 Linux platform running the 
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and 
PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays 
attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those 
familiar with these devices.)  The arrays are set up with RAID5.  So I'm 
working with a consistent hardware platform for this comparison.  I'm 
only processing a small subset of files (144.)


BigDBMS processed this set of data in 20000 seconds, with all foreign 
keys in place.  With all foreign keys in place, PG took 54000 seconds to 
complete the same job.  I've tried various approaches to autovacuum 
(none, 30-seconds) and it doesn't seem to make much difference.  What 
does seem to make a difference is eliminating all the foreign keys; in 
that configuration, PG takes about 30000 seconds.  Better, but BigDBMS 
still has it beat significantly.
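
One way to take the per-row foreign key checks out of the load path is to
drop the constraints before the bulk run and re-create them afterwards, so
the references are validated in a single pass at the end rather than once
per insert/update. A hedged sketch (constraint, table, and column names are
invented), usable only if a window without enforcement is acceptable:

  ALTER TABLE stats_normalized DROP CONSTRAINT stats_device_fk;

  -- ... run the bulk insert/update job ...

  ALTER TABLE stats_normalized
    ADD CONSTRAINT stats_device_fk
        FOREIGN KEY (device_id) REFERENCES devices (device_id);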


I've got PG configured so that the system database is on disk array 
2, as are the transaction log files.  The default table space for the 
test database is disk array 3.  I've got all the reference tables (the 
tables to which the foreign keys in the stats tables refer) on this 
array.  I also store the stats tables on this array.  Finally, I put the 
indexes for the stats tables on disk array 4.  I don't use disk array 1 
because I believe it is a software array.
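
For reference, this kind of layout is expressed with tablespaces; a minimal
sketch with invented paths and names:

  CREATE TABLESPACE array3 LOCATION '/array3/pgdata';
  CREATE TABLESPACE array4 LOCATION '/array4/pgdata';

  -- stats and reference tables on disk array 3
  CREATE TABLE stats_raw (
      device_id   integer,
      sample_time timestamp,
      raw_line    text
  ) TABLESPACE array3;

  -- indexes for the stats tables on disk array 4
  CREATE INDEX stats_raw_idx
      ON stats_raw (device_id, sample_time)
      TABLESPACE array4;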


I'm out of ideas how to improve this picture any further.  I'd 
appreciate some suggestions.  Thanks.





--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-10 Thread Jim C. Nasby
On Tue, Jan 09, 2007 at 09:10:51AM -0800, Jeff Frost wrote:
 On Tue, 9 Jan 2007, Jim C. Nasby wrote:
 
 On Thu, Dec 28, 2006 at 02:15:31PM -0800, Jeff Frost wrote:
 When benchmarking various options for a new PG server at one of my 
 clients,
 I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be
 fastest to have ext2 for the WAL.  The winning time was 157m46.713s for
 ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3
 data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/
 1GB BBU cache.  This config benched out faster than a 6disk RAID10 + 2 
 disk
 RAID1 for those of you who have been wondering if the BBU write back cache
 mitigates the need for separate WAL (at least on this workload).  Those 
 are
 the fastest times for each config, but ext2 WAL was always faster than the
 other two options.  I didn't test any other filesystems in this go around.
 
 Uh, if I'm reading this correctly, you're saying that WAL on a separate
 ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
 ~158.5 minutes, or 0.4%? Is that even above the noise for your
 measurements? I suspect the phase of the moon might play a bigger role
 ;P
 
 That's what I thought too...cept I ran it 20 times and ext2 won by that 
 margin every time, so it was quite repeatable. :-/

Even so, you've got to really be hunting for performance to go through
the hassle of different filesystems just to gain 0.4%... :)
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-10 Thread Jeff Frost

On Wed, 10 Jan 2007, Jim C. Nasby wrote:


RAID1 for those of you who have been wondering if the BBU write back cache
mitigates the need for separate WAL (at least on this workload).  Those
are
the fastest times for each config, but ext2 WAL was always faster than the
other two options.  I didn't test any other filesystems in this go around.


Uh, if I'm reading this correctly, you're saying that WAL on a separate
ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
~158.5 minutes, or 0.4%? Is that even above the noise for your
measurements? I suspect the phase of the moon might play a bigger role
;P


That's what I thought too...cept I ran it 20 times and ext2 won by that
margin every time, so it was quite repeatable. :-/


Even so, you've got to really be hunting for performance to go through
the hassle of different filesystems just to gain 0.4%... :)


Indeed, but actually, I did the math again and it appears that it saves close 
to 2 minutes versus one big ext3.  I guess the moral of the story is that 
having a separate pg_xlog even on the same physical volume tends to be 
slightly faster for write oriented workloads.  Ext2 is slightly faster than 
ext3, but of course you could likely go with another filesystem yet and be 
even slightly faster as well. :-)


I guess the real moral of the story is that you can probably use one big ext3 
with the default config and it won't matter much more than 1-2% if you have a 
BBU.


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-09 Thread Jim C. Nasby
On Thu, Dec 28, 2006 at 02:15:31PM -0800, Jeff Frost wrote:
 When benchmarking various options for a new PG server at one of my clients, 
 I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be 
 fastest to have ext2 for the WAL.  The winning time was 157m46.713s for 
 ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3 
 data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/ 
 1GB BBU cache.  This config benched out faster than a 6disk RAID10 + 2 disk 
 RAID1 for those of you who have been wondering if the BBU write back cache 
 mitigates the need for separate WAL (at least on this workload).  Those are 
 the fastest times for each config, but ext2 WAL was always faster than the 
 other two options.  I didn't test any other filesystems in this go around.

Uh, if I'm reading this correctly, you're saying that WAL on a separate
ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
~158.5 minutes, or 0.4%? Is that even above the noise for your
measurements? I suspect the phase of the moon might play a bigger role
;P
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-09 Thread Jim C. Nasby
On Sun, Jan 07, 2007 at 11:26:01PM -0500, Guy Rouillier wrote:
 Ok, I ran with the settings below, but with
 
 shared_buffers=768MB
 effective_cache_size=2048MB
 fsync=on
 
 This run took 29000 seconds.  I'm beginning to think configuration 
 changes are not going to buy significant additional improvement.  Time 
 to look at the app implementation.

Very likely, but one thing I haven't seen mentioned is what your
bottleneck actually is. Is it CPU? Disk? Something else (i.e., neither CPU
nor IO is at 100%)? Additionally, since you have multiple arrays, are you
sure they're being utilized equally? Having something like MRTG or
cricket will make your tuning much easier. Unlike Oracle, PostgreSQL has
no ability to avoid hitting the base table even if an index could cover
a query... so compared to Oracle you'll need to dedicate a lot more IO
to the base tables.

Search around for PostgreSQL on Solaris tuning tips... there's some
OS-settings that can make a huge difference. In particular, by default
Solaris will only dedicate a fraction of memory to disk caching. That
won't bother Oracle much but it's a big deal to PostgreSQL. I think
there's some other relevant OS parameters as well.

For vacuum, you're going to need to tune the vacuum_cost_* settings so
that you can balance the IO impact of vacuums with the need to complete
the vacuums in a reasonable time. You'll find this easiest to tune by
running manual vacuums and monitoring IO activity.
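
A minimal sketch of that kind of experiment (the values and table name are
illustrative assumptions, not recommendations):

  -- cost-based vacuum delay: sleep 20ms each time ~200 cost units accrue
  SET vacuum_cost_delay = 20;
  SET vacuum_cost_limit = 200;

  -- run a manual vacuum while watching disk activity (iostat, MRTG, etc.)
  VACUUM VERBOSE stats_normalized;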

You'll also likely need to tune the bgwriter so that checkpoints aren't
killing you. If you're targeting a checkpoint every 5 minutes you'll
need to at least up bgwriter_all_maxpages to shared_buffers (in pages) /
300 / 5. I'd round up a bit. As with everything, you'll need to tweak
your values from there. If you're using stock bgwriter settings then
you'll probably be seeing a big IO spike every time a checkpoint occurs.

Speaking of which... how often are checkpoints? If you can tolerate 5
minutes of recovery time, (the default checkpoint_timeout), I suggest
setting checkpoint_warning to 290 seconds or so; that way if you're
getting checkpoints much more often than every 5 minutes you'll be able
to see it in the logs.

Speaking of which, going longer between checkpoints will likely help
performance, if you can tolerate longer recovery times. I haven't
actually tested the correlation, but I would expect recovery to complete
in a maximum of checkpoint_timeout seconds. If you can tolerate being in
recovery mode for 10 minutes after a crash, try bumping
checkpoint_timeout, checkpoint_warning and checkpoint_segments and see
what it does for performance (if you do that you'll also want to tweak
bgwriter further... in this case increasing bgwriter_delay would be
easiest).

Given what sounds like decent IO capabilities, you'll likely get better
query plans from decreasing random_page_cost, probably to between 2 and
3.
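
Pulling the checkpoint, bgwriter, and random_page_cost suggestions together,
a postgresql.conf excerpt might look roughly like this; the values are
illustrative assumptions only (with shared_buffers = 768MB, the formula
above gives 98304 pages / 300 / 5, or about 66, rounded up here):

  checkpoint_timeout = 600        # seconds; longer helps, at the cost of recovery time
  checkpoint_warning = 290        # log if checkpoints come more often than this
  checkpoint_segments = 64        # allow more WAL between checkpoints
  bgwriter_all_maxpages = 70      # ~ shared_buffers in pages / 300 / 5
  random_page_cost = 2.5          # reflects the fairly capable disk arrays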

Speaking of IO... if you can switch to RAID10 you'll likely get better
performance since your write load is so heavy. Normally RAID5 is a
complete performance killer as soon as you're doing much writing, but
I'm guessing that those nice expensive Sun arrays are better than most
RAID controllers.

All that being said... generally the biggest tuning impact to be had for
any database environment is in how the application is using the
database. A few sub-optimal things in the application/database design
could easily erase every gain you'll get from all your tuning. I suggest
running EXPLAIN ANALYZE on the queries that are run most often and
seeing what that shows.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-09 Thread Jeff Frost

On Tue, 9 Jan 2007, Jim C. Nasby wrote:


On Thu, Dec 28, 2006 at 02:15:31PM -0800, Jeff Frost wrote:

When benchmarking various options for a new PG server at one of my clients,
I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be
fastest to have ext2 for the WAL.  The winning time was 157m46.713s for
ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3
data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/
1GB BBU cache.  This config benched out faster than a 6disk RAID10 + 2 disk
RAID1 for those of you who have been wondering if the BBU write back cache
mitigates the need for separate WAL (at least on this workload).  Those are
the fastest times for each config, but ext2 WAL was always faster than the
other two options.  I didn't test any other filesystems in this go around.


Uh, if I'm reading this correctly, you're saying that WAL on a separate
ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
~158.5 minutes, or 0.4%? Is that even above the noise for your
measurements? I suspect the phase of the moon might play a bigger role
;P


That's what I thought too...cept I ran it 20 times and ext2 won by that margin 
every time, so it was quite repeatable. :-/


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-08 Thread Bruce Momjian
Craig A. James wrote:
 Postgres functions like count() and max() are plug-ins, which has huge
 architectural advantages.  But in pre-8.1 releases, there was a big
 speed penalty for this: functions like count() were very, very slow,
 requiring a full table scan.  I think this is vastly improved from 8.0x
 to 8.1 and forward; others might be able to comment whether count() is
 now as fast in Postgres as Oracle.  The idiom to replace count() was

 ^^

BigDBMS == Oracle.  ;-)

--
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Shane Ambler

Guy Rouillier wrote:
The application is fairly straightforward, but as you say, what is 
working okay with BigDBMS isn't working as well under PG.  I'm going to 
try other configuration suggestions made by others before I attempt 
logic changes.  The core logic is unchangeable; millions of rows of data 
in a single table will be updated throughout the day.  If PG can't 
handle high volume updates well, this may be a brick wall.


I understand your reluctance to change your working design in the 
changeover to PostgreSQL, but -


1. Your table definitions may or may not be the issue, and a small change 
in design (even just the choice of datatype) may be all that is needed to 
get the needed performance out of PostgreSQL. These changes would be 
done before you put PostgreSQL into production use, so the amount of 
current usage is not relevant when deciding/analyzing these changes, but 
they may affect your ability to use PostgreSQL as an alternative.


2. I think that the idea of logic changes suggested earlier was more 
aimed at your select/update commands than the structure of your tables. 
You should expect some SQL changes when moving between any two databases, 
and select/updates designed to take advantage of PostgreSQL's strengths 
can give you performance improvements.



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Dave Cramer


On 6-Jan-07, at 11:32 PM, Guy Rouillier wrote:


Dave Cramer wrote:



The box has 3 GB of memory.  I would think that BigDBMS would be  
hurt by this more than PG.  Here are the settings I've modified  
in postgresql.conf:
As I said you need to set shared_buffers to at least 750MB; this is  
the starting point, it can actually go higher. Additionally,  
effective cache should be set to 2.25 GB. Turning fsync off is not a  
real world situation. Additional tuning of file systems can  
provide some gain, however as Craig pointed out some queries may  
need to be tweaked.


Dave, thanks for the hard numbers, I'll try them.  I agree turning  
fsync off is not a production option.  In another reply to my  
original posting, Alex mentioned that BigDBMS gets an advantage  
from its async IO.  So simply as a test, I turned fsync off in an  
attempt to open wide all the pipes.


Regarding shared_buffers=750MB, the last discussions I remember on  
this subject said that anything over 10,000 (8K buffers = 80 MB)  
had unproven benefits.  So I'm surprised to see such a large value  
suggested.  I'll certainly give it a try and see what happens.


That is 25% of your available memory. This is just a starting point.  
There are reports that going as high as 50% can be advantageous,  
however you need to measure it yourself.






autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

--
Guy Rouillier







Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Craig A. James

Craig A. James wrote:
The idiom to replace count() was 
select col from tbl order by col desc limit 1.  It worked miracles for 
my app.


Sorry, I meant to write, the idiom to replace MAX(), not count()...  MAX() 
was the function that was killing me, 'tho count() also gave me problems.

Craig



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Adam Rich

I'm using 8.2 and using order by / limit is still faster than MAX()
even though MAX() now seems to rewrite to an almost identical plan
internally.

Count(*) still seems to use a full table scan rather than an index scan.

Using one of our tables, MySQL/Oracle/MS-SQL all return instantly while
PG takes longer than 700ms.  Luckily we can design around this issue.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Craig A.
James
Sent: Sunday, January 07, 2007 5:57 PM
To: Guy Rouillier; PostgreSQL Performance
Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS


Craig A. James wrote:
 The idiom to replace count() was 
 select col from tbl order by col desc limit 1.  It worked miracles
for 
 my app.

Sorry, I meant to write, the idiom to replace MAX(), not count()...
MAX() was the function that was killing me, 'tho count() also gave me
problems.

Craig





Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Tom Lane
Adam Rich [EMAIL PROTECTED] writes:
 I'm using 8.2 and using order by / limit is still faster than MAX()
 even though MAX() now seems to rewrite to an almost identical plan
 internally.

Care to quantify that?  AFAICT any difference is within measurement
noise, at least for the case of separately-issued SQL commands.

 Count(*) still seems to use a full table scan rather than an index scan.

Yup.  Don't hold your breath for something different.  Postgres has made
design choices that make certain cases fast and others slow, and
count(*) is one case that has come out on the short end of the stick.
If that's your most important measure of performance, then indeed you
should select a different database that's made different tradeoffs.

regards, tom lane



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Joshua D. Drake
On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote:
 I'm using 8.2 and using order by / limit is still faster than MAX()
 even though MAX() now seems to rewrite to an almost identical plan
 internally.


Gonna need you to back that up :) Can we get an explain analyze?


 Count(*) still seems to use a full table scan rather than an index scan.
 

There is a TODO out there to help this. Don't know if it will get done.

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Guy Rouillier

Dave Cramer wrote:


On 6-Jan-07, at 11:32 PM, Guy Rouillier wrote:


Dave Cramer wrote:



The box has 3 GB of memory.  I would think that BigDBMS would be 
hurt by this more than PG.  Here are the settings I've modified in 
postgresql.conf:
As I said you need to set shared_buffers to at least 750MB; this is 
the starting point, it can actually go higher. Additionally, effective 
cache should be set to 2.25 GB. Turning fsync off is not a real world 
situation. Additional tuning of file systems can provide some gain, 
however as Craig pointed out some queries may need to be tweaked.


Dave, thanks for the hard numbers, I'll try them.  I agree turning 
fsync off is not a production option.  In another reply to my original 
posting, Alex mentioned that BigDBMS gets an advantage from its async 
IO.  So simply as a test, I turned fsync off in an attempt to open 
wide all the pipes.


Regarding shared_buffers=750MB, the last discussions I remember on 
this subject said that anything over 10,000 (8K buffers = 80 MB) had 
unproven benefits.  So I'm surprised to see such a large value 
suggested.  I'll certainly give it a try and see what happens.


That is 25% of your available memory. This is just a starting point. 
There are reports that going as high as 50% can be advantageous, however 
you need to measure it yourself.


Ok, I ran with the settings below, but with

shared_buffers=768MB
effective_cache_size=2048MB
fsync=on

This run took 29000 seconds.  I'm beginning to think configuration 
changes are not going to buy significant additional improvement.  Time 
to look at the app implementation.








autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

--Guy Rouillier







--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Adam Rich

Here are the queries and explains... Granted, it's not a huge difference
here, but both timings are absolutely consistent.  Using max(), this runs
at almost 15 queries/sec, and limit 1 runs at almost 40 queries/sec.

Is the difference in explain analyze expected behavior? (rows=168196 vs.
rows=1)
(The table is freshly analyzed)



select max(item_id) from receipt_items

Result  (cost=0.04..0.05 rows=1 width=0) (actual time=0.030..0.031 rows=1 loops=1)
  InitPlan
    ->  Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.023..0.024 rows=1 loops=1)
          ->  Index Scan Backward using receipt_items_pkey on receipt_items  (cost=0.00..6883.71 rows=168196 width=4) (actual time=0.020..0.020 rows=1 loops=1)
                Filter: (item_id IS NOT NULL)
Total runtime: 0.067 ms


select item_id 
from receipt_items
order by item_id desc
limit 1

Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1)
  ->  Index Scan Backward using receipt_items_pkey on receipt_items  (cost=0.00..6883.71 rows=168196 width=4) (actual time=0.008..0.008 rows=1 loops=1)
Total runtime: 0.026 ms


A couple more similar examples from this table:



select max(create_date) from receipt_items

Result  (cost=0.05..0.06 rows=1 width=0) (actual time=0.032..0.032 rows=1 loops=1)
  InitPlan
    ->  Limit  (cost=0.00..0.05 rows=1 width=8) (actual time=0.025..0.026 rows=1 loops=1)
          ->  Index Scan Backward using test_idx_1 on receipt_items  (cost=0.00..7986.82 rows=168196 width=8) (actual time=0.022..0.022 rows=1 loops=1)
                Filter: (create_date IS NOT NULL)
Total runtime: 0.069 ms


select create_date
from receipt_items
order by create_date desc
limit 1;

Limit  (cost=0.00..0.05 rows=1 width=8) (actual time=0.011..0.012 rows=1 loops=1)
  ->  Index Scan Backward using test_idx_1 on receipt_items  (cost=0.00..7986.82 rows=168196 width=8) (actual time=0.009..0.009 rows=1 loops=1)
Total runtime: 0.027 ms






-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Sunday, January 07, 2007 8:48 PM
To: Adam Rich
Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance'
Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS 


Adam Rich [EMAIL PROTECTED] writes:
 I'm using 8.2 and using order by / limit is still faster than MAX()
 even though MAX() now seems to rewrite to an almost identical plan
 internally.

Care to quantify that?  AFAICT any difference is within measurement
noise, at least for the case of separately-issued SQL commands.

 Count(*) still seems to use a full table scan rather than an index
scan.

Yup.  Don't hold your breath for something different.  Postgres has made
design choices that make certain cases fast and others slow, and
count(*) is one case that has come out on the short end of the stick.
If that's your most important measure of performance, then indeed you
should select a different database that's made different tradeoffs.

regards, tom lane




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Guy Rouillier

Ron wrote:
C= What file system are you using?  Unlike BigDBMS, pg does not have its 
own native one, so you have to choose the one that best suits your 
needs.  For update heavy applications involving lots of small updates 
jfs and XFS should both be seriously considered.


Ron, thanks for your ideas.  Many of them I've addressed in response to 
suggestions from others.  I wanted to address this one in particular. 
Unfortunately, I do not have the liberty to change file systems on this 
old Sun box.  All file systems are formatted Sun UFS.  BigDBMS is 
equally subject to whatever pluses or minuses can be attributed to this 
file system, so I'm thinking that this issue would be a wash between the 
two.


I've come to the conclusion that configuration changes to PG alone will 
not level the playing field.  My next step is to try to determine where 
the biggest payback will be regarding changing the implementation.


--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Adam Rich

Here's another, more drastic example... Here the order by / limit
version
runs in less than 1/7000 the time of the MAX() version.


select max(item_id)
from events e, receipts r, receipt_items ri
where e.event_id=r.event_id and r.receipt_id=ri.receipt_id

Aggregate  (cost=10850.84..10850.85 rows=1 width=4) (actual time=816.382..816.383 rows=1 loops=1)
  ->  Hash Join  (cost=2072.12..10503.30 rows=139019 width=4) (actual time=155.177..675.870 rows=147383 loops=1)
        Hash Cond: (ri.receipt_id = r.receipt_id)
        ->  Seq Scan on receipt_items ri  (cost=0.00..4097.56 rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1)
        ->  Hash  (cost=2010.69..2010.69 rows=24571 width=4) (actual time=155.146..155.146 rows=24571 loops=1)
              ->  Hash Join  (cost=506.84..2010.69 rows=24571 width=4) (actual time=34.803..126.452 rows=24571 loops=1)
                    Hash Cond: (r.event_id = e.event_id)
                    ->  Seq Scan on receipts r  (cost=0.00..663.58 rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1)
                    ->  Hash  (cost=469.73..469.73 rows=14843 width=4) (actual time=34.780..34.780 rows=14843 loops=1)
                          ->  Seq Scan on events e  (cost=0.00..469.73 rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1)
Total runtime: 816.645 ms

select item_id
from events e, receipts r, receipt_items ri
where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
order by item_id desc limit 1


Limit  (cost=0.00..0.16 rows=1 width=4) (actual time=0.047..0.048 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..22131.43 rows=139019 width=4) (actual time=0.044..0.044 rows=1 loops=1)
        ->  Nested Loop  (cost=0.00..12987.42 rows=168196 width=8) (actual time=0.032..0.032 rows=1 loops=1)
              ->  Index Scan Backward using receipt_items_pkey on receipt_items ri  (cost=0.00..6885.50 rows=168196 width=8) (actual time=0.016..0.016 rows=1 loops=1)
              ->  Index Scan using receipts_pkey on receipts r  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
                    Index Cond: (r.receipt_id = ri.receipt_id)
        ->  Index Scan using events_pkey on events e  (cost=0.00..0.04 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
              Index Cond: (e.event_id = r.event_id)
Total runtime: 0.112 ms





-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joshua D.
Drake
Sent: Sunday, January 07, 2007 9:10 PM
To: Adam Rich
Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance'
Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS


On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote:
 I'm using 8.2 and using order by / limit is still faster than MAX()
 even though MAX() now seems to rewrite to an almost identical plan
 internally.


Gonna need you to back that up :) Can we get an explain analyze?


 Count(*) still seems to use a full table scan rather than an index
scan.
 

There is a TODO out there to help this. Don't know if it will get done.

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate








Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-06 Thread Guy Rouillier

Craig A. James wrote:
I don't know if you have access to the application's SQL, or the time to 
experiment a bit, but unless your schema is trivial and your SQL is 
boneheaded simple, you're not going to get equal performance from 
Postgres until you do some analysis of your application under real-world 
conditions, and optimize the problem areas.


Craig, thanks for taking the time to think about this.  Yes, I have all 
the application source code, and all the time in the world, as I'm doing 
this experimentation on my own time.  The test hardware is old stuff no 
one intends to use for production work ever again, so I can use it as 
long as I want.


The application is fairly straightforward, but as you say, what is 
working okay with BigDBMS isn't working as well under PG.  I'm going to 
try other configuration suggestions made by others before I attempt 
logic changes.  The core logic is unchangeable; millions of rows of data 
in a single table will be updated throughout the day.  If PG can't 
handle high volume updates well, this may be a brick wall.


--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-06 Thread Guy Rouillier

Dave Cramer wrote:



The box has 3 GB of memory.  I would think that BigDBMS would be hurt 
by this more than PG.  Here are the settings I've modified in 
postgresql.conf:


As I said you need to set shared_buffers to at least 750MB; this is the 
starting point, it can actually go higher. Additionally, effective cache 
should be set to 2.25 GB. Turning fsync off is not a real world situation. 
Additional tuning of file systems can provide some gain, however as 
Craig pointed out some queries may need to be tweaked.


Dave, thanks for the hard numbers, I'll try them.  I agree turning fsync 
off is not a production option.  In another reply to my original 
posting, Alex mentioned that BigDBMS gets an advantage from its async 
IO.  So simply as a test, I turned fsync off in an attempt to open wide 
all the pipes.


Regarding shared_buffers=750MB, the last discussions I remember on this 
subject said that anything over 10,000 (8K buffers = 80 MB) had unproven 
benefits.  So I'm surprised to see such a large value suggested.  I'll 
certainly give it a try and see what happens.




autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-06 Thread Joshua D. Drake

 Regarding shared_buffers=750MB, the last discussions I remember on this 
 subject said that anything over 10,000 (8K buffers = 80 MB) had unproven 
 benefits.  So I'm surprised to see such a large value suggested.  I'll 
 certainly give it a try and see what happens.
 

That is old news :) As of 8.1 it is quite beneficial to go well above
the aforementioned amount.

J


 
  autovacuum=on
  stats_row_level = on
  max_connections = 10
  listen_addresses = 'db01,localhost'
  shared_buffers = 128MB
  work_mem = 16MB
  maintenance_work_mem = 64MB
  temp_buffers = 32MB
  max_fsm_pages = 204800
  checkpoint_segments = 30
  redirect_stderr = on
  log_line_prefix = '%t %d'
-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-06 Thread Craig A. James

Guy,

The application is fairly straightforward, but as you say, what is 
working okay with BigDBMS isn't working as well under PG.  I'm going to 
try other configuration suggestions made by others before I attempt 
logic changes.  The core logic is unchangeable; millions of rows of data 
in a single table will be updated throughout the day.  If PG can't 
handle high volume updates well, this may be a brick wall.


Here are a couple things I learned.

ANALYZE is VERY important, surprisingly so even for small tables.  I had a case last week 
where a temporary scratch table with just 100 rows was joined to two more 
tables of 6 and 12 million rows.  You might think that a 100-row table wouldn't need to 
be analyzed, but it does: Without the ANALYZE, Postgres generated a horrible plan that 
took many minutes to run; with the ANALYZE, it took milliseconds.  Any time a table's 
contents change dramatically, ANALYZE it, ESPECIALLY if it's a small table.  After all, 
changing 20 rows in a 100-row table has a much larger effect on its statistics than 
changing 20 rows in a million-row table.
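
A tiny illustration of the point (table and column names invented):

  CREATE TEMP TABLE scratch_ids AS
      SELECT id FROM big_table LIMIT 100;   -- small scratch table

  ANALYZE scratch_ids;   -- without this the planner only guesses at its size
                         -- and can pick a terrible plan for the big join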

Postgres functions like count() and max() are plug-ins, which has huge architectural advantages.  
But in pre-8.1 releases, there was a big speed penalty for this: functions like count() were very, very slow, 
requiring a full table scan.  I think this is vastly improved from 8.0x to 8.1 and forward; others might be 
able to comment whether count() is now as fast in Postgres as Oracle.  The idiom to replace 
count() was select col from tbl order by col desc limit 1.  It worked miracles for my app.

Postgres has explicit garbage collection via VACUUM, and you have to design your 
application with this in mind.  In Postgres, update is delete+insert, meaning updates 
create garbage.  If you have very wide tables, but only a subset of the 
columns are updated frequently, put these columns in a separate table with an index to 
join the two tables.  For example, my original design was something like this:

  integer primary key
  very large text column
  ... a bunch of integer columns, float columns, and small text columns

The properties were updated by the application, but the large text column never 
changed.  This led to huge garbage-collection problems as the large text field 
was repeatedly deleted and reinserted by the updates.  By separating these into 
two tables, one with the large text column, and the other table with the 
dynamic, but smaller, columns, garbage is massively reduced, and performance 
increased, both immediately (smaller data set to update) and long term (smaller 
vacuums).  You can use views to recreate your original combined columns, so the 
changes to your app are limited to where updates occur.
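
A minimal sketch of that split (all names invented), with a view preserving
the original combined shape for read-only code:

  CREATE TABLE doc_static (
      doc_id   integer PRIMARY KEY,
      big_text text                      -- large, rarely-changing column
  );

  CREATE TABLE doc_dynamic (
      doc_id     integer PRIMARY KEY REFERENCES doc_static,
      hit_count  integer,
      score      float8,
      updated_at timestamp
  );

  CREATE VIEW doc AS
  SELECT s.doc_id, s.big_text, d.hit_count, d.score, d.updated_at
    FROM doc_static s
    JOIN doc_dynamic d USING (doc_id);

  -- frequent updates now rewrite only the narrow row
  UPDATE doc_dynamic SET hit_count = hit_count + 1 WHERE doc_id = 42;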

If you have a column that is *frequently* updated (say, for example, a user's 
last-access timestamp each time s/he hits your web server) then you definitely 
want this in its own table, not mixed in with the user's name, address, etc.

Partitioning in Postgres is more powerful than in Oracle.  Use it if you can.
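
In the 8.x releases discussed here, partitioning means table inheritance plus
CHECK constraints and constraint exclusion; a rough sketch with invented
names (rows are inserted directly into the appropriate child table):

  CREATE TABLE stats (
      device_id   integer,
      sample_time timestamp,
      bytes_in    bigint
  );

  CREATE TABLE stats_2007_01 (
      CHECK (sample_time >= '2007-01-01' AND sample_time < '2007-02-01')
  ) INHERITS (stats);

  CREATE TABLE stats_2007_02 (
      CHECK (sample_time >= '2007-02-01' AND sample_time < '2007-03-01')
  ) INHERITS (stats);

  SET constraint_exclusion = on;

  -- only stats_2007_01 is scanned for this query
  SELECT sum(bytes_in)
    FROM stats
   WHERE sample_time >= '2007-01-05' AND sample_time < '2007-01-06';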

Partial indexes are VERY nice in Postgres, if your data is poorly distributed 
(for example, a mostly-NULL column with a small percentage of very important 
values).
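
For example (invented table; error_code is assumed to be NULL for the vast
majority of rows):

  CREATE TABLE event_log (
      event_id   serial PRIMARY KEY,
      logged_at  timestamp,
      error_code integer              -- NULL except for the rare failures
  );

  -- the partial index stays small and covers only the interesting rows
  CREATE INDEX event_log_error_idx
      ON event_log (error_code)
   WHERE error_code IS NOT NULL;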

I'm sure there are more things that others can contribute.

Craig




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-05 Thread Guy Rouillier
I've got back access to my test system.  I ran another test run with the 
same input data set.  This time I put pg_xlog on a different RAID volume 
(the unused one that I suspect is a software RAID), and I turned 
fsync=off in postgresql.conf.  I left the rest of the configuration 
alone (all foreign keys removed), etc.  Unfortunately, this only dropped 
elapsed time down to about 28000 seconds (from 30000), still 
significantly more than BigDBMS.  Additional info inline below.


Shoaib Mir wrote:

Here are my few recommendations that might help you:

- You will need to do table partitioning 
(http://www.postgresql.org/docs/current/static/ddl-partitioning.html) 
as you are storing quite a lot of data in one table per day.


I'm focusing on the detailed perspective for now.  The 144 files I'm 
processing represent not even two hours of data, so that surely wouldn't 
be split up.




- You are using a RAID5 setup which is something that can also affect 
performance so switching to RAID1 might help you there, but again you 
have a RAID5 with 12 disks so hmm that shouldn't be that much of a problem.


Agreed.



- Have you done the tuning for postgresql.conf parameters? if not then 
you really need to do this for like checkpoint segments, random page 
cost, shared buffers, cache size, fsm pages, vacuum cost delay, 
work_mem, bgwriter etc etc. You can get good advice for tuning these 
parameters at -- http://www.powerpostgresql.com/PerfList/


The box has 3 GB of memory.  I would think that BigDBMS would be hurt by 
this more than PG.  Here are the settings I've modified in postgresql.conf:


autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'



- For autovacuuming you need to properly tune the thresholds so that the 
vacuum and analyze is done at the right time not affecting the database 
server performance. (You can find help for this at 
http://www.postgresql.org/docs/current/static/routine-vacuuming.html 
under 22.1.4. The auto-vacuum daemon)


The real-life load on this database would be fairly constant throughout 
the day.  Stats from network devices are received every 15 minutes from 
each device, but they are staggered.  As a result, the database is 
almost constantly being updated, so there is no dead time to do vacuums.




- You will need to separate your transactional logs i.e. pg_xlog folder 
to a different drive other then your database server drive. This can be 
done by creating symlinks for pg_xlog folder.


Done, see opening remarks.  Unfortunately minor impact.



- I hope you are doing proper connection pool management, because good 
use of database connections can really affect the overall 
performance; connections can be expensive to create, and consume memory 
if they are not properly exited.


I probably should have mentioned this originally but was afraid of 
information overload.  The application runs on JBoss and uses JBoss 
connection pools.  So connections are pooled, but I don't know how they 
would compare to native PG connection pools.  Essentially, JBoss gets 
native JDBC connections, and the pools simply allow them to be re-used 
without opening and closing each time.  So if the native PG connection 
pools provide any pooling optimizations beyond that, those advantages 
are not being realized.




Hope that helps your tests...


Thanks to everyone for providing suggestions, and I apologize for my 
delay in responding to each of them.





Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 12/28/06, Guy Rouillier [EMAIL PROTECTED] wrote:


I don't want to violate any license agreement by discussing
performance,
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as
BigDBMS here.

I'm trying to convince my employer to replace BigDBMS with PostgreSQL
for at least some of our Java applications.  As a proof of concept, I
started with a high-volume (but conceptually simple) network data
collection application.  This application collects files of 5-minute
usage statistics from our network devices, and stores a raw form of
these stats into one table and a normalized form into a second table.
We are currently storing about 12 million rows a day in the normalized
table, and each month we start new tables.  For the normalized data, the
app inserts rows initialized to zero for the entire current day first
thing in the morning, then throughout the day as stats are received,
executes updates against existing rows.  So the app has very high update
activity.

In my test environment, I have a dual-x86 Linux platform running the
application, and an 

Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-05 Thread Craig A. James

Guy Rouillier wrote:
I've got back access to my test system.  I ran another test run with the 
same input data set.  This time I put pg_xlog on a different RAID volume 
(the unused one that I suspect is a software RAID), and I turned 
fsync=off in postgresql.conf.  I left the rest of the configuration 
alone (all foreign keys removed), etc.  Unfortunately, this only dropped 
elapsed time down to about 28000 seconds (from 30000), still 
significantly more than BigDBMS.  Additional info inline below.


Although tuning is extremely important, you also have to look at the 
application itself.  I discovered (the hard way) that there's simply no 
substitute for a bit of redesign/rewriting of the schema and/or SQL statements.

Many of us who grew up on Oracle assume that their SQL is standard stuff, and that 
Oracle's optimizer is the way it's done.  But in fact most Oracle applications are 
tweaked and tuned to take advantage of Oracle's strengths and avoid its weaknesses.  If you 
designed an application from the ground up to use Postgres, then migrated to Oracle, you would 
probably be equally frustrated by Oracle's poor performance on your Postgres-tuned application.

I don't know if you have access to the application's SQL, or the time to 
experiment a bit, but unless your schema is trivial and your SQL is boneheaded 
simple, you're not going to get equal performance from Postgres until you do 
some analysis of your application under real-world conditions, and optimize the 
problem areas.

In my case, I found just a few specific SQL constructs that, with a bit of 
tuning, made massive differences in performance.

Craig



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-05 Thread Dave Cramer


On 5-Jan-07, at 9:51 PM, Guy Rouillier wrote:

I've got back access to my test system.  I ran another test run  
with the same input data set.  This time I put pg_xlog on a  
different RAID volume (the unused one that I suspect is a software  
RAID), and I turned fsync=off in postgresql.conf.  I left the rest  
of the configuration alone (all foreign keys removed), etc.   
Unfortunately, this only dropped elapsed time down to about 28000  
seconds (from 30000), still significantly more than BigDBMS.  
Additional info inline below.


Shoaib Mir wrote:

Here are my few recommendations that might help you:
- You will need to do table partitioning 
(http://www.postgresql.org/docs/current/static/ddl-partitioning.html) 
as you are storing quite a lot of data in one table per day.


I'm focusing on the detailed perspective for now.  The 144 files  
I'm processing represent not even two hours of data, so that surely  
wouldn't be split up.


- You are using a RAID5 setup which is something that can also  
affect performance so switching to RAID1 might help you there, but  
again you have a RAID5 with 12 disks so hmm that shouldn't be that  
much of a problem.


Agreed.

- Have you done the tuning for postgresql.conf parameters? if not  
then you really need to do this for like checkpoint segments,  
random page cost, shared buffers, cache size, fsm pages, vacuum  
cost delay, work_mem, bgwriter etc etc. You can get good advice  
for tuning these parameters at -- http://www.powerpostgresql.com/ 
PerfList/


The box has 3 GB of memory.  I would think that BigDBMS would be  
hurt by this more than PG.  Here are the settings I've modified in  
postgresql.conf:


As I said you need to set shared_buffers to at least 750MB; this is  
the starting point, it can actually go higher. Additionally, effective  
cache should be set to 2.25 GB. Turning fsync off is not a real world  
situation. Additional tuning of file systems can provide some gain,  
however as Craig pointed out some queries may need to be tweaked.


autovacuum=on
stats_row_level = on
max_connections = 10
listen_addresses = 'db01,localhost'
shared_buffers = 128MB
work_mem = 16MB
maintenance_work_mem = 64MB
temp_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 30
redirect_stderr = on
log_line_prefix = '%t %d'

- For autovacuuming you need to properly tune the thresholds so  
that the vacuum and analyze is done at the right time not  
affecting the database server performance. (You can find help for  
this at http://www.postgresql.org/docs/current/static/routine- 
vacuuming.html under 22.1.4. The auto-vacuum daemon)


The real-life load on this database would be fairly constant  
throughout the day.  Stats from network devices are received every  
15 minutes from each device, but they are staggered.  As a result,  
the database is almost constantly being updated, so there is no  
dead time to do vacuums.


- You will need to separate your transactional logs i.e. pg_xlog  
folder to a different drive other then your database server drive.  
This can be done by creating symlinks for pg_xlog folder.


Done, see opening remarks.  Unfortunately minor impact.

- I hope you are doing proper connection pool management, because  
good use of database connections can really affect the overall  
performance; connections can be expensive to create, and consume  
memory if they are not properly exited.


I probably should have mentioned this originally but was afraid of  
information overload.  The application runs on JBoss and uses JBoss  
connection pools.  So connections are pooled, but I don't know how  
they would compare to native PG connection pools.  Essentially,  
JBoss gets native JDBC connections, and the pools simply allow them  
to be re-used without opening and closing each time.  So if the  
native PG connection pools provide any pooling optimizations beyond  
that, those advantages are not being realized.


the PG Connection pools will not help, they do not currently provide  
any extra optimization.


Dave



Hope that helps your tests...


Thanks to everyone for providing suggestions, and I apologize for  
my delay in responding to each of them.




Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
On 12/28/06, Guy Rouillier [EMAIL PROTECTED] wrote:

I don't want to violate any license agreement by discussing
performance,
so I'll refer to a large, commercial PostgreSQL-compatible  
DBMS only as

BigDBMS here.
I'm trying to convince my employer to replace BigDBMS with  
PostgreSQL
for at least some of our Java applications.  As a proof of  
concept, I

started with a high-volume (but conceptually simple) network data
collection application.  This application collects files of 5- 
minute
usage statistics from our network devices, and stores a raw  
form of
these 

Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-02 Thread Geoffrey

Alvaro Herrera wrote:

Ron wrote:

C= What file system are you using?  Unlike BigDBMS, pg does not have 
its own native one, so you have to choose the one that best suits 
your needs.  For update heavy applications involving lots of small 
updates jfs and XFS should both be seriously considered.


Actually it has been suggested that a combination of ext2 (for WAL) and
ext3 (for data, with data journalling disabled) is a good performer.
AFAIK you don't want the overhead of journalling for the WAL partition.


I'm curious as to why ext3 for data with journalling disabled?  Would 
that not be the same as ext2?


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-02 Thread Ragnar
On þri, 2007-01-02 at 09:04 -0500, Geoffrey wrote:
 Alvaro Herrera wrote:
  
  Actually it has been suggested that a combination of ext2 (for WAL) and
  ext3 (for data, with data journalling disabled) is a good performer.
  AFAIK you don't want the overhead of journalling for the WAL partition.
 
 I'm curious as to why ext3 for data with journalling disabled?  Would 
 that not be the same as ext2?

I believe Alvaro was referring to ext3 with journalling enabled 
for meta-data, but not for data.
I also believe this is the standard ext3 configuration, but I
could be wrong on that.

gnari






Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-02 Thread Lars Heidieker



On 2 Jan 2007, at 14:54, Ragnar wrote:


On þri, 2007-01-02 at 09:04 -0500, Geoffrey wrote:

Alvaro Herrera wrote:


Actually it has been suggested that a combination of ext2 (for  
WAL) and

ext3 (for data, with data journalling disabled) is a good performer.
AFAIK you don't want the overhead of journalling for the WAL  
partition.


I'm curious as to why ext3 for data with journalling disabled?  Would
that not be the same as ext2?


I believe Alvaro was referring to ext3 with journalling enabled
for meta-data, but not for data.
I also believe this is the standard ext3 configuration, but I
could be wrong on that.

gnari



it doesn't really belong here but ext3 has
data journaled (data and meta data)
ordered (meta data journaled but data written before meta data (default))
writeback (meta data only journaled)
modes.

The performance differences between ordered and meta data only
journaling should be very small anyway


--

Best regards,
Lars Heidieker

[EMAIL PROTECTED]
http://paradoxon.info

--

Mystical explanations.
Mystical explanations are considered deep;
the truth is that they are not even superficial.
 -- Friedrich Nietzsche






Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-02 Thread Jeremy Haile
More specifically, you should set the noatime,data=writeback options in
fstab on ext3 partitions for best performance.  Correct?

 it doesn't really belong here but ext3 has
 data journaled (data and meta data)
 ordered (meta data journaled but data written before meta data (default))
 writeback (meta data only journaled)
 modes.
 
 The performance differences between ordered and meta data only
 journaling should be very small anyway



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-02 Thread Jeff Davis
On Fri, 2006-12-29 at 07:52 -0500, Ron wrote:
 A= go through each query and see what work_mem needs to be for that 
 query to be as RAM resident as possible.  If you have enough RAM, set 
 work_mem for that query that large.  Remember that work_mem is =per 
 query=, so queries running in parallel eat the sum of each of their 
 work_mem's.

Just to clarify, from the docs on work_mem at
http://www.postgresql.org/docs/current/static/runtime-config-resource.html :

Specifies the amount of memory to be used by internal sort operations
and hash tables before switching to temporary disk files. The value is
specified in kilobytes, and defaults to 1024 kilobytes (1 MB). Note that
for a complex query, several sort or hash operations might be running in
parallel; each one will be allowed to use as much memory as this value
specifies before it starts to put data into temporary files. Also,
several running sessions could be doing such operations concurrently. So
the total memory used could be many times the value of work_mem; it is
necessary to keep this fact in mind when choosing the value. Sort
operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables
are used in hash joins, hash-based aggregation, and hash-based
processing of IN subqueries.

Regards,
Jeff Davis




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-30 Thread Guy Rouillier

Tom Lane wrote:

Shoaib Mir [EMAIL PROTECTED] writes:

Here are my few recommendations that might help you:
[ snip good advice ]


Another thing to look at is whether you are doing inserts/updates as
individual transactions, and if so see if you can batch them to
reduce the per-transaction overhead.


Thank you everyone who replied with suggestions.  Unfortunately, this is 
a background activity for me, so I can only work on it when I can 
squeeze in time.  Right now, I can't do anything; I swapped out a broken 
switch in our network and the DB server is currently inaccessible ;(.  I 
will eventually work through all suggestions, but I'll start with the 
ones I can respond to without further investigation.


I'm not doing updates as individual transactions.  I cannot use the Java 
batch functionality because the code uses stored procedures to do the 
inserts and updates, and the PG JDBC driver cannot handle executing 
stored procedures in batch.  Briefly, executing a stored procedure 
returns a result set, and Java batches don't expect result sets.


So, in the code I turn autocommit off, and do a commit every 100 
executions of the stored proc.  The exact same code is running against 
BigDBMS, so any penalty from this approach should be evenly felt.
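

For anyone following along, what that amounts to on the wire is wrapping a
batch of stored-proc calls in one explicit transaction instead of committing
each call separately.  A rough sketch, with a made-up proc name and arguments
standing in for the real ones:

   BEGIN;
   -- hypothetical stored proc, invoked once per row in the batch
   SELECT insert_stats(1001, '2006-12-28 00:00', 0);
   SELECT insert_stats(1002, '2006-12-28 00:00', 0);
   -- ... roughly 100 calls ...
   COMMIT;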


--
Guy Rouillier



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Ron

At 12:46 AM 12/28/2006, Guy Rouillier wrote:
I don't want to violate any license agreement by discussing 
performance, so I'll refer to a large, commercial 
PostgreSQL-compatible DBMS only as BigDBMS here.


I'm trying to convince my employer to replace BigDBMS with 
PostgreSQL for at least some of our Java applications.  As a proof 
of concept, I started with a high-volume (but conceptually simple) 
network data collection application.  This application collects 
files of 5-minute usage statistics from our network devices, and 
stores a raw form of these stats into one table and a normalized 
form into a second table. We are currently storing about 12 million 
rows a day in the normalized table, and each month we start new 
tables.  For the normalized data, the app inserts rows initialized 
to zero for the entire current day first thing in the morning, then 
throughout the day as stats are received, executes updates against 
existing rows.  So the app has very high update activity.


In my test environment, I have a dual-x86 Linux platform running the 
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS 
and PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk 
arrays attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, 
for those familiar with these devices.)  The arrays are set up with 
RAID5.  So I'm working with a consistent hardware platform for this 
comparison.  I'm only processing a small subset of files (144.)


BigDBMS processed this set of data in 2 seconds, with all 
foreign keys in place.  With all foreign keys in place, PG took 
54000 seconds to complete the same job.  I've tried various 
approaches to autovacuum (none, 30-seconds) and it doesn't seem to 
make much difference.  What does seem to make a difference is 
eliminating all the foreign keys; in that configuration, PG takes 
about 3 seconds.  Better, but BigDBMS still has it beat significantly.


If you are using pg with its default configuration as installed, you are not 
getting pg's best performance.  Ditto if you are using data structures 
optimized for BigDBMS.


A= go through each query and see what work_mem needs to be for that 
query to be as RAM resident as possible.  If you have enough RAM, set 
work_mem for that query that large.  Remember that work_mem is =per 
query=, so queries running in parallel eat the sum of each of their work_mem's.


B= Make sure shared buffers is set reasonably.  A good rule of thumb 
for 8.x is that shared buffers should be at least ~1/4 your RAM.  If 
your E4500 is maxed with RAM, there's a good chance shared buffers 
should be considerably more than 1/4 of RAM.  (A sample postgresql.conf 
sketch along these lines follows after point I below.)


C= What file system are you using?  Unlike BigDBMS, pg does not have 
its own native one, so you have to choose the one that best suits 
your needs.  For update heavy applications involving lots of small 
updates jfs and XFS should both be seriously considered.


D= Your table schema and physical table layout probably needs to 
change.  What BigDBMS likes here is most likely different from what pg likes.


E= pg does not actually update records in place.  It appends new 
records to the table and marks the old version invalid.  This means 
that things like page size, RAID stripe size, etc may need to 
have different values than they do for BigDBMS.  Another consequence 
is that pg likes RAID 10 even more than most of its competitors.


F= This may seem obvious, but how many of the foreign keys and other 
overhead do you actually need?  Get rid of the unnecessary.


G= Bother the folks at Sun, like Josh Berkus, who know pg inside and 
out +and+ know your HW inside and out (or have access to those that 
do ;-) ).  I'll bet they'll have ideas I'm not thinking of.


H= Explain Analyze is your friend.  Slow queries may need better 
table statistics, or better SQL, or may be symptoms of issues C or 
D above or ...


I've got PG configured so that the system database is on disk 
array 2, as are the transaction log files.  The default table space 
for the test database is disk array 3.  I've got all the reference 
tables (the tables to which the foreign keys in the stats tables 
refer) on this array.  I also store the stats tables on this 
array.  Finally, I put the indexes for the stats tables on disk 
array 4.  I don't use disk array 1 because I believe it is a software array.
I= With 4 arrays of 12 HDs each, you definitely have enough spindles 
to place pg_xlog somewhere separate from all the other pg tables.  In 
addition, you should analyze your table access patterns and then 
scatter them across your 4 arrays in such a way as to minimize head 
contention.
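
As a concrete starting point for B (and the general .conf tuning discussed 
elsewhere in this thread), something along these lines in postgresql.conf 
might be reasonable for a box of that class.  Every value here is an 
illustrative guess rather than a measured recommendation, and shared_buffers 
in particular has to fit whatever RAM the E4500 actually has:

   shared_buffers = 1GB          # ~1/4 of RAM, per the rule of thumb in B
   work_mem = 16MB               # per sort/hash operation, so mind concurrency
   checkpoint_segments = 32      # fewer, larger checkpoints for update-heavy loads
   effective_cache_size = 2GB    # rough estimate of the OS cache available to pg
   autovacuum = on               # plus sane thresholds, per the autovacuum docs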



I'm out of ideas how to improve this picture any further.  I'd 
appreciate some suggestions.  Thanks.

Hope this helps,

Ron Peacetree 




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Arnau

Hi all,

A= go through each query and see what work_mem needs to be for that 
query to be as RAM resident as possible.  If you have enough RAM, set 
work_mem for that query that large.  Remember that work_mem is =per 
query=, so queries running in parallel eat the sum of each of their 
work_mem's.


How can I know how much work_mem a query needs?

Regards
--
Arnau



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Alvaro Herrera
Ron wrote:

 C= What file system are you using?  Unlike BigDBMS, pg does not have 
 its own native one, so you have to choose the one that best suits 
 your needs.  For update heavy applications involving lots of small 
 updates jfs and XFS should both be seriously considered.

Actually it has been suggested that a combination of ext2 (for WAL) and
ext3 (for data, with data journalling disabled) is a good performer.
AFAIK you don't want the overhead of journalling for the WAL partition.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Jeff Frost

On Fri, 29 Dec 2006, Alvaro Herrera wrote:


Ron wrote:


C= What file system are you using?  Unlike BigDBMS, pg does not have
its own native one, so you have to choose the one that best suits
your needs.  For update heavy applications involving lots of small
updates jfs and XFS should both be seriously considered.


Actually it has been suggested that a combination of ext2 (for WAL) and
ext3 (for data, with data journalling disabled) is a good performer.
AFAIK you don't want the overhead of journalling for the WAL partition.


When benchmarking various options for a new PG server at one of my clients, I 
tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be fastest 
to have ext2 for the WAL.  The winning time was 157m46.713s for ext2, 
159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3 
data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/ 1GB 
BBU cache.  This config benched out faster than a 6disk RAID10 + 2 disk RAID1 
for those of you who have been wondering if the BBU write back cache mitigates 
the need for separate WAL (at least on this workload).  Those are the fastest 
times for each config, but ext2 WAL was always faster than the other two 
options.  I didn't test any other filesystems in this go around.


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-29 Thread Alex Turner

You should search the archives for Luke Lonergan's posting about how IO in
Postgresql is significantly bottlenecked because it's not async.  A 12-disk
array is going to max out Postgresql's theoretical write capacity to
disk, and therefore BigRDBMS is always going to win in such a config.  You
can also look towards Bizgres, which allegedly eliminates some of these
problems and is cheaper than most BigRDBMS products.

Alex.

On 12/28/06, Guy Rouillier [EMAIL PROTECTED] wrote:


[ snip original post ]



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-28 Thread Shoaib Mir

Here are my few recommendations that might help you:

- You will need to do table partitioning
(http://www.postgresql.org/docs/current/static/ddl-partitioning.html), as you
are storing quite a lot of data in one table per day; a minimal sketch of such
a layout follows after this list.

- You are using a RAID5 setup, which is something that can also affect
performance, so switching to RAID1 might help you there; but again, you have a
RAID5 with 12 disks, so that shouldn't be that much of a problem.

- Have you done the tuning for the postgresql.conf parameters? If not, then you
really need to do this for settings like checkpoint_segments, random_page_cost,
shared_buffers, effective_cache_size, max_fsm_pages, vacuum_cost_delay,
work_mem, the bgwriter settings, etc. You can get good advice for tuning these
parameters at http://www.powerpostgresql.com/PerfList/

- For autovacuuming you need to properly tune the thresholds so that the
vacuum and analyze are done at the right time without affecting database
server performance. (You can find help for this at
http://www.postgresql.org/docs/current/static/routine-vacuuming.html under
"22.1.4 The auto-vacuum daemon".)

- You will need to separate your transaction logs, i.e. the pg_xlog folder, onto
a different drive than your database server drive. This can be done by
creating a symlink for the pg_xlog folder.

- I hope you are doing proper connection pool management, because good use
of database connections can really affect overall performance;
connections can be expensive to create, and they consume memory if they are
not properly closed.
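
Since partitioning in 8.2 is done with table inheritance, a minimal sketch of a
monthly layout looks something like the following.  The table and column names
are invented, and the trigger or rule needed to route inserts into the right
child is left out:

   -- parent table; holds no rows of its own
   CREATE TABLE stats (
       device_id   integer,
       sample_time timestamp,
       bytes_in    bigint
   );

   -- one child per month, with a CHECK constraint the planner can use
   CREATE TABLE stats_2006_12 (
       CHECK (sample_time >= DATE '2006-12-01' AND sample_time < DATE '2007-01-01')
   ) INHERITS (stats);

   -- let the planner skip children that cannot match the WHERE clause
   SET constraint_exclusion = on;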

Hope that helps your tests...


Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 12/28/06, Guy Rouillier [EMAIL PROTECTED] wrote:


[ snip original post ]



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-28 Thread Dave Cramer

Guy,

Did you tune postgresql?  How much memory does the box have?


Dave
On 28-Dec-06, at 12:46 AM, Guy Rouillier wrote:

[ snip original post ]






Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-28 Thread Tom Lane
Shoaib Mir [EMAIL PROTECTED] writes:
 Here are my few recommendations that might help you:
 [ snip good advice ]

Another thing to look at is whether you are doing inserts/updates as
individual transactions, and if so see if you can batch them to
reduce the per-transaction overhead.

regards, tom lane



[PERFORM] High update activity, PostgreSQL vs BigDBMS

2006-12-27 Thread Guy Rouillier
I don't want to violate any license agreement by discussing performance, 
so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as 
BigDBMS here.


I'm trying to convince my employer to replace BigDBMS with PostgreSQL 
for at least some of our Java applications.  As a proof of concept, I 
started with a high-volume (but conceptually simple) network data 
collection application.  This application collects files of 5-minute 
usage statistics from our network devices, and stores a raw form of 
these stats into one table and a normalized form into a second table. 
We are currently storing about 12 million rows a day in the normalized 
table, and each month we start new tables.  For the normalized data, the 
app inserts rows initialized to zero for the entire current day first 
thing in the morning, then throughout the day as stats are received, 
executes updates against existing rows.  So the app has very high update 
activity.


In my test environment, I have a dual-x86 Linux platform running the 
application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and 
PostgreSQL 8.2.0 (only one at a time.)  The Sun box has 4 disk arrays 
attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those 
familiar with these devices.)  The arrays are set up with RAID5.  So I'm 
working with a consistent hardware platform for this comparison.  I'm 
only processing a small subset of files (144.)


BigDBMS processed this set of data in 2 seconds, with all foreign 
keys in place.  With all foreign keys in place, PG took 54000 seconds to 
complete the same job.  I've tried various approaches to autovacuum 
(none, 30-seconds) and it doesn't seem to make much difference.  What 
does seem to make a difference is eliminating all the foreign keys; in 
that configuration, PG takes about 3 seconds.  Better, but BigDBMS 
still has it beat significantly.


I've got PG configured so that the system database is on disk array 
2, as are the transaction log files.  The default table space for the 
test database is disk array 3.  I've got all the reference tables (the 
tables to which the foreign keys in the stats tables refer) on this 
array.  I also store the stats tables on this array.  Finally, I put the 
indexes for the stats tables on disk array 4.  I don't use disk array 1 
because I believe it is a software array.


I'm out of ideas how to improve this picture any further.  I'd 
appreciate some suggestions.  Thanks.


--
Guy Rouillier

