Re: [PERFORM] Testing Sandforce SSD

2010-08-05 Thread Brad Nicholson

 On 10-08-04 03:49 PM, Scott Carey wrote:

On Aug 2, 2010, at 7:26 AM, Merlin Moncure wrote:


On Fri, Jul 30, 2010 at 11:01 AM, Yeb Havingayebhavi...@gmail.com  wrote:

After a week testing I think I can answer the question above: does it work
like it's supposed to under PostgreSQL?

YES

The drive I have tested is the $435,- 50GB OCZ Vertex 2 Pro,
http://www.newegg.com/Product/Product.aspx?Item=N82E16820227534

* it is safe to mount filesystems with barrier off, since it has a 'supercap
backed cache'. That data is not lost is confirmed by a dozen power switch
off tests while running either diskchecker.pl or pgbench.
* the above implies its also safe to use this SSD with barriers, though that
will perform less, since this drive obeys write trough commands.
* the highest pgbench tps number for the TPC-B test for a scale 300 database
(~5GB) I could get was over 6700. Judging from the iostat average util of
~40% on the xlog partition, I believe that this number is limited by other
factors than the SSD, like CPU, core count, core MHz, memory size/speed, 8.4
pgbench without threads. Unfortunately I don't have a faster/more core
machines available for testing right now.
* pgbench numbers for a larger than RAM database, read only was over 25000
tps (details are at the end of this post), during which iostat reported
~18500 read iops and 100% utilization.
* pgbench max reported latencies are 20% of comparable BBWC setups.
* how reliable it is over time, and how it performs over time I cannot say,
since I tested it only for a week.

Thank you very much for posting this analysis.  This has IMNSHO the
potential to be a game changer.  There are still some unanswered
questions in terms of how the drive wears, reliability, errors, and
lifespan but 6700 tps off of a single 400$ device with decent fault
tolerance is amazing (Intel, consider yourself upstaged).  Ever since
the first samsung SSD hit the market I've felt the days of the
spinning disk have been numbered.  Being able to build a 100k tps
server on relatively inexpensive hardware without an entire rack full
of drives is starting to look within reach.

Intel's next gen 'enterprise' SSD's are due out later this year.  I have heard 
from those with access to to test samples that they really like them -- these 
people rejected the previous versions because of the data loss on power failure.

So, hopefully there will be some interesting competition later this year in the 
medium price range enterprise ssd market.



I'll be doing some testing on Enterprise grade SSD's this year.  I'll 
also be looking at some hybrid storage products that use as  SSD's as 
accelerators mixed with lower cost storage.


--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] vacuum performance on insert

2010-08-05 Thread Sean Chen
Hi, I'm curious -- does vacuum analyze e.g. table1 improve
performance on insert into table1  I understand the vacuum
analyze helps out the query -- select, etc., but just not quite sure
on insert.

Specifically, I'm doing the following.

1, delete records ...
2, insert records ...

if I add vacuum analyze in-between this two steps, will it help on
the performance on the insert?
More importantly, If so, why?

Thanks,
Sean

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] vacuum performance on insert

2010-08-05 Thread Kevin Grittner
Sean Chen zysc...@gmail.com wrote:
 
 1, delete records ...
 2, insert records ...
 
 if I add vacuum analyze in-between this two steps, will it help
 on the performance on the insert?
 
Assuming there are no long-running transactions which would still be
able to see the deleted rows, a VACUUM between those statements
would allow the INSERT to re-use the space previously occupied by
the deleted rows, rather than possibly needing to allocate new space
from the OS.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] vacuum performance on insert

2010-08-05 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Sean Chen zysc...@gmail.com wrote:
 1, delete records ...
 2, insert records ...
 
 if I add vacuum analyze in-between this two steps, will it help
 on the performance on the insert?
 
 Assuming there are no long-running transactions which would still be
 able to see the deleted rows, a VACUUM between those statements
 would allow the INSERT to re-use the space previously occupied by
 the deleted rows, rather than possibly needing to allocate new space
 from the OS.

But on the other side of the coin, the ANALYZE step is probably not very
helpful there.  Better to do that after you've loaded the new data.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Kenneth Cox
I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM running  
CentOS 5.4 x86_64.  I have a ServeRAID 8k controller with 6 SATA 7500RPM  
disks in RAID 6, and for the OLAP workload it feels* slow.  I have 6 more  
disks to add, and the RAID has to be rebuilt in any case, but first I  
would like to solicit general advice.  I know that's little data to go on,  
and I believe in the scientific method, but in this case I don't have the  
time to make many iterations.


My questions are simple, but in my reading I have not been able to find  
definitive answers:


1) Should I switch to RAID 10 for performance?  I see things like RAID 5  
is bad for a DB and RAID 5 is slow with = 6 drives but I see little on  
RAID 6.  RAID 6 was the original choice for more usable space with good  
redundancy.  My current performance is 85MB/s write, 151 MB/s reads (using  
dd of 2xRAM per  
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).


2) Should I configure the ext3 file system with noatime and/or  
data=writeback or data=ordered?  My controller has a battery, the logical  
drive has write cache enabled (write-back), and the physical devices have  
write cache disabled (write-through).


3) Do I just need to spend more time configuring postgresql?  My  
non-default settings were largely generated by pgtune-0.9.3:


max_locks_per_transaction = 128 # manual; avoiding out of shared  
memory

default_statistics_target = 100
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 16GB
work_mem = 352MB
wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 2316MB
max_connections = 32

I am happy to take informed opinion.  If you don't have the time to  
properly cite all your sources but have suggestions, please send them.


Thanks in advance,
Ken

* I know feels slow is not scientific.  What I mean is that any single  
query on a fact table, or any 'rm -rf' of a big directory sends disk  
utilization to 100% (measured with iostat -x 3).


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Two fast searches turn slow when used with OR clause

2010-08-05 Thread Craig James

I can query either my PARENT table joined to PRICES, or my VERSION table joined 
to PRICES, and get an answer in 30-40 msec.  But put the two together, it jumps 
to 4 seconds.  What am I missing here?  I figured this query would be nearly 
instantaneous.  The VERSION.ISOSMILES and PARENT.ISOSMILES columns both have 
unique indexes.  Instead of using these indexes, it's doing a full-table scan 
of both tables, even though there can't possibly be more than one match in each 
table.

I guess I could rewrite this as a UNION of the two subqueries, but that seems 
contrived.

This is PG 8.3.10 on Linux.

Thanks,
Craig


= explain analyze select p.price, p.amount, p.units, s.catalogue_id, 
vn.version_id
- from plus p join sample s
-  on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
- join version vn on (s.version_id = vn.version_id) join parent pn
-  on (s.parent_id = pn.parent_id)
- where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
- or pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
- order by price;

 Sort  (cost=71922.00..71922.00 rows=1 width=19) (actual 
time=4337.114..4337.122 rows=10 loops=1)
   Sort Key: p.price   Sort Method:  quicksort  Memory: 25kB
   -  Nested Loop  (cost=18407.53..71921.99 rows=1 width=19) (actual 
time=1122.685..4337.028 rows=10 loops=1)
 -  Hash Join  (cost=18407.53..71903.71 rows=4 width=20) (actual 
time=1122.624..4336.682 rows=7 loops=1)
   Hash Cond: (s.version_id = vn.version_id)
   Join Filter: ((vn.isosmiles = 
'Fc1n1B1OC(C)(C)C(C)(C)O1'::text) OR (pn.isosmiles = 
'Fc1n1B1OC(C)(C)C(C)(C)O1'::text))
   -  Hash Join  (cost=8807.15..44470.73 rows=620264 width=54) 
(actual time=431.501..2541.329 rows=620264 loops=1)
 Hash Cond: (s.parent_id = pn.parent_id)
 -  Seq Scan on sample s  (cost=0.00..21707.64 rows=620264 
width=24) (actual time=0.008..471.340 rows=620264 loops=1)
 -  Hash  (cost=5335.40..5335.40 rows=277740 width=38) 
(actual time=431.166..431.166 rows=277740 loops=1)
   -  Seq Scan on parent pn  (cost=0.00..5335.40 
rows=277740 width=38) (actual time=0.012..195.822 rows=277740 loops=1)
   -  Hash  (cost=5884.06..5884.06 rows=297306 width=38) (actual 
time=467.267..467.267 rows=297306 loops=1)
 -  Seq Scan on version vn  (cost=0.00..5884.06 
rows=297306 width=38) (actual time=0.017..215.285 rows=297306 loops=1)
 -  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 
rows=4 width=26) (actual time=0.039..0.041 rows=1 loops=7)
   Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id 
= s.compound_id))
 Total runtime: 4344.222 ms
(17 rows)


If I only query the VERSION table, it's very fast:

x= explain analyze select p.price, p.amount, p.units, s.catalogue_id, 
vn.version_id
- from plus p
- join sample s on (p.compound_id = s.compound_id and p.supplier_id = 
s.supplier_id)
- join version vn on (s.version_id = vn.version_id)
- where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1' order by price;

Sort  (cost=45.73..45.74 rows=1 width=19) (actual time=32.438..32.448 rows=10 
loops=1)
   Sort Key: p.price
   Sort Method:  quicksort  Memory: 25kB
   -  Nested Loop  (cost=0.00..45.72 rows=1 width=19) (actual 
time=32.309..32.411 rows=10 loops=1)
 -  Nested Loop  (cost=0.00..36.58 rows=2 width=20) (actual 
time=32.295..32.319 rows=7 loops=1)
   -  Index Scan using i_version_isosmiles on version vn  
(cost=0.00..8.39 rows=1 width=4) (actual time=32.280..32.281 rows=1 loops=1)
 Index Cond: (isosmiles = 
'Fc1n1B1OC(C)(C)C(C)(C)O1'::text)
   -  Index Scan using i_sample_version_id on sample s  
(cost=0.00..28.12 rows=6 width=20) (actual time=0.011..0.024 rows=7 loops=1)
 Index Cond: (s.version_id = vn.version_id)
 -  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 
rows=4 width=26) (actual time=0.010..0.011 rows=1 loops=7)
   Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id 
= s.compound_id))
 Total runtime: 32.528 ms
(12 rows)


Same good performance if I only query the PARENT table:

x= explain analyze select p.price, p.amount, p.units, s.catalogue_id, 
pn.parent_id from plus p join sample s on (p.compound_id = s.compound_id and 
p.supplier_id = s.supplier_id) join parent pn on (s.parent_id = pn.parent_id) 
where pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1' order by price;
QUERY PLAN
---
 Sort  (cost=57.73..57.74 rows=1 width=19) (actual time=43.564..43.564 rows=10 
loops=1)
   Sort Key: p.price
   Sort Method:  quicksort  Memory: 25kB
   -  Nested Loop  (cost=0.00..57.72 rows=1 width=19) (actual 
time=43.429..43.537 

Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Alan Hodgson
On Thursday, August 05, 2010, Kenneth Cox kens...@gmail.com wrote:
 1) Should I switch to RAID 10 for performance?  I see things like RAID 5
 is bad for a DB and RAID 5 is slow with = 6 drives but I see little
 on RAID 6.  RAID 6 was the original choice for more usable space with
 good redundancy.  My current performance is 85MB/s write, 151 MB/s reads
 (using dd of 2xRAM per
 http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).

If you can spare the drive space, go to RAID 10. RAID 5/6 usually look fine 
on single-threaded sequential tests (unless your controller really sucks), 
but in the real world with multiple processes doing random I/O RAID 10 will 
go a lot further on the same drives. Plus your recovery time from disk 
failures will be a lot faster.

If you can't spare the drive space ... you should buy more drives. 

 
 2) Should I configure the ext3 file system with noatime and/or
 data=writeback or data=ordered?  My controller has a battery, the logical
 drive has write cache enabled (write-back), and the physical devices have
 write cache disabled (write-through).

noatime is fine but really minor filesystem options rarely show much impact. 
My best performance comes from XFS filesystems created with stripe options 
matching the underlying RAID array. Anything else is just a bonus.

 * I know feels slow is not scientific.  What I mean is that any single
 query on a fact table, or any 'rm -rf' of a big directory sends disk
 utilization to 100% (measured with iostat -x 3).

.. and it should. Any modern system can peg a small disk array without much 
effort. Disks are slow.

-- 
No animals were harmed in the recording of this episode. We tried but that 
damn monkey was just too fast.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Scott Marlowe
On Thu, Aug 5, 2010 at 12:28 PM, Kenneth Cox kens...@gmail.com wrote:
 I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM running
 CentOS 5.4 x86_64.  I have a ServeRAID 8k controller with 6 SATA 7500RPM
 disks in RAID 6, and for the OLAP workload it feels* slow.  I have 6 more
 disks to add, and the RAID has to be rebuilt in any case, but first I would
 like to solicit general advice.  I know that's little data to go on, and I
 believe in the scientific method, but in this case I don't have the time to
 make many iterations.

 My questions are simple, but in my reading I have not been able to find
 definitive answers:

 1) Should I switch to RAID 10 for performance?  I see things like RAID 5 is
 bad for a DB and RAID 5 is slow with = 6 drives but I see little on RAID
 6.  RAID 6 was the original choice for more usable space with good
 redundancy.  My current performance is 85MB/s write, 151 MB/s reads (using
 dd of 2xRAM per
 http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).

Sequential read / write is not very useful for a database benchmark.
It does kind of give you a baseline for throughput, but most db access
is mixed enough that random access becomes the important measurement.

RAID6 is basically RAID5 with a hot spare already built into the
array.  This makes rebuild less of an issue since you can reduce the
spare io used to rebuild the array to something really small.
However, it's in the same performance ballpark as RAID 5 with the
accompanying write performance penalty.

RAID-10 is pretty much the only way to go for a DB, and if you need
more space, you need more or bigger drives, not RAID-5/6

-- 
To understand recursion, one must first understand recursion.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Greg Smith

Kenneth Cox wrote:
1) Should I switch to RAID 10 for performance?  I see things like 
RAID 5 is bad for a DB and RAID 5 is slow with = 6 drives but I 
see little on RAID 6.  RAID 6 was the original choice for more usable 
space with good redundancy.  My current performance is 85MB/s write, 
151 MB/s reads 


RAID6 is no better than RAID5 performance wise, it just has better fault 
tolerance.  And the ServeRAID 8k is a pretty underpowered card as RAID 
controllers go, so it would not be impossible for it computing RAID 
parity and the like to be the bottleneck here.  I'd expect a 6-disk 
RAID10 with 7200RPM drives to be closer to 120MB/s on writes, so you're 
not getting ideal performance there.  Your read figure is more 
competative, but that's usually the RAID5 pattern--decent on reads, 
slugging on writes.


2) Should I configure the ext3 file system with noatime and/or 
data=writeback or data=ordered?  My controller has a battery, the 
logical drive has write cache enabled (write-back), and the physical 
devices have write cache disabled (write-through).


data=ordered is the ext3 default and usually a reasonable choice.  Using 
writeback instead can be dangerous, I wouldn't advise starting there.  
noatime is certainly a good thing, but the speedup is pretty minor if 
you have a battery-backed write cache.



3) Do I just need to spend more time configuring postgresql?  My 
non-default settings were largely generated by pgtune-0.9.3


Those look reasonable enough, except no reason to make wal_buffers 
bigger than 16MB.  That work_mem figure might be high too, that's a 
known concern with pgtune I need to knock out of it one day soon.  When 
you are hitting high I/O wait periods, is the system running out of RAM 
and swapping?  That can cause really nasty I/O wait.


Your basic hardware is off a bit, but not so badly that I'd start 
there.  Have you turned on slow query logging to see what is hammering 
the system when the iowait climbs?  Often tuning those by looking at the 
EXPLAIN ANALYZE output can be much more effective than hardware/server 
configuration tuning.


* I know feels slow is not scientific.  What I mean is that any 
single query on a fact table, or any 'rm -rf' of a big directory sends 
disk utilization to 100% (measured with iostat -x 3).


rm -rf is really slow on ext3 on just about any hardware.  If your 
fact tables aren't in RAM and you run a query against them, paging them 
back in again will hammer the disks until it's done.  That's not 
necessarily indicative of a misconfiguration on its own.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Craig James

On 8/5/10 11:28 AM, Kenneth Cox wrote:

I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM
running CentOS 5.4 x86_64. I have a ServeRAID 8k controller with 6 SATA
7500RPM disks in RAID 6, and for the OLAP workload it feels* slow
 My current performance is 85MB/s write, 151 MB/s reads


I get 193MB/sec write and 450MB/sec read on a RAID10 on 8 SATA 7200 RPM disks.  
RAID10 seems to scale linearly -- add disks, get more speed, to the limit of 
your controller.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Scott Marlowe
On Thu, Aug 5, 2010 at 4:27 PM, Pierre C li...@peufeu.com wrote:

 1) Should I switch to RAID 10 for performance?  I see things like RAID 5
 is bad for a DB and RAID 5 is slow with = 6 drives but I see little on
 RAID 6.

 As others said, RAID6 is RAID5 + a hot spare.

 Basically when you UPDATE a row, at some point postgres will write the page
 which contains that row.

 RAID10 : write the page to all mirrors.
 RAID5/6 : write the page to the relevant disk. Read the corresponding page
 from all disks (minus one), compute parity, write parity.

Actually it's not quite that bad.  You only have to read from two
disks, the data disk and the parity disk, then compute new parity and
write to both disks.  Still 2 reads / 2 writes for every write.

 As you can see one small write will need to hog all drives in the array.
 RAID5/6 performance for small random writes is really, really bad.

 Databases like RAID10 for reads too because when you need some random data
 you can get it from any of the mirrors, so you get increased parallelism on
 reads too.

Also for sequential access RAID-10 can read both drives in a pair
interleaved so you get 50% of the data you need from each drive and
double the read rate there.  This is even true for linux software md
RAID.

 with good redundancy.  My current performance is 85MB/s write, 151 MB/s
 reads

 FYI, I get 200 MB/s sequential out of the software RAID5 of 3 cheap desktop
 consumer SATA drives in my home multimedia server...

On a machine NOT configured for max seq throughput (it's used for
mostly OLTP stuff) I get 325M/s both read and write speed with a 26
disk RAID-10.  OTOH, that setup gets ~6000 to 7000 transactions per
second with multi-day runs of pgbench.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Dave Crooke
Definitely switch to RAID-10  it's not merely that it's a fair bit
faster on normal operations (less seek contention), it's **WAY** faster than
any parity based RAID (RAID-2 through RAID-6) in degraded mode when you lose
a disk and have to rebuild it. This is something many people don't test for,
and then get bitten badly when they lose a drive under production loads.

Use higher capacity drives if necessary to make your data fit in the number
of spindles your controller supports ... the difference in cost is modest
compared to an overall setup, especially with SATA. Make sure you still
leave at least one hot spare!

In normal operation, RAID-5 has to read and write 2 drives for every write
... not sure about RAID-6 but I suspect it needs to read the entire stripe
to recalculate the Hamming parity, and it definitely has to write to 3
drives for each write, which means seeking all 3 of those drives to that
position. In degraded mode (a disk rebuilding) with either of those levels,
ALL the drives have to seek to that point for every write, and for any reads
of the failed drive, so seek contention is horrendous.

RAID-5 and RAID-6 are designed for optimum capacity, protection, and low
write performance, which is fine for a general file server.

Parity RAID simply isn't suitable for database use  anyone who claims
otherwise either (a) doesn't understand the failure modes of RAID, or (b) is
running in a situation where performance simply doesn't matter.

Cheers
Dave

On Thu, Aug 5, 2010 at 1:28 PM, Kenneth Cox kens...@gmail.com wrote:

 My questions are simple, but in my reading I have not been able to find
 definitive answers:

 1) Should I switch to RAID 10 for performance?  I see things like RAID 5
 is bad for a DB and RAID 5 is slow with = 6 drives but I see little on
 RAID 6.  RAID 6 was the original choice for more usable space with good
 redundancy.  My current performance is 85MB/s write, 151 MB/s reads (using
 dd of 2xRAM per
 http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htmhttp://www.westnet.com/%7Egsmith/content/postgresql/pg-disktesting.htm
 ).



Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Scott Marlowe
On Thu, Aug 5, 2010 at 5:13 PM, Dave Crooke dcro...@gmail.com wrote:
 Definitely switch to RAID-10  it's not merely that it's a fair bit
 faster on normal operations (less seek contention), it's **WAY** faster than
 any parity based RAID (RAID-2 through RAID-6) in degraded mode when you lose
 a disk and have to rebuild it. This is something many people don't test for,
 and then get bitten badly when they lose a drive under production loads.

Had a friend with a 600G x 5 disk RAID-5 and one drive died.  It took
nearly 48 hours to rebuild the array.

 Use higher capacity drives if necessary to make your data fit in the number
 of spindles your controller supports ... the difference in cost is modest
 compared to an overall setup, especially with SATA. Make sure you still
 leave at least one hot spare!

Yeah, a lot of chassis hold an even number of drives, and I wind up
with 2 hot spares because of it.

 Parity RAID simply isn't suitable for database use  anyone who claims
 otherwise either (a) doesn't understand the failure modes of RAID, or (b) is
 running in a situation where performance simply doesn't matter.

The only time it's acceptable is when you're running something like
low write volume report generation / batch processing, where you're
mostly sequentially reading and writing 100s of gigabytes at a time in
one or maybe two threads.

-- 
To understand recursion, one must first understand recursion.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Mark Kirkwood

On 06/08/10 06:28, Kenneth Cox wrote:
I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM 
running CentOS 5.4 x86_64.  I have a ServeRAID 8k controller with 6 
SATA 7500RPM disks in RAID 6, and for the OLAP workload it feels* 
slow.  I have 6 more disks to add, and the RAID has to be rebuilt in 
any case, but first I would like to solicit general advice.  I know 
that's little data to go on, and I believe in the scientific method, 
but in this case I don't have the time to make many iterations.


My questions are simple, but in my reading I have not been able to 
find definitive answers:


1) Should I switch to RAID 10 for performance?  I see things like 
RAID 5 is bad for a DB and RAID 5 is slow with = 6 drives but I 
see little on RAID 6.  RAID 6 was the original choice for more usable 
space with good redundancy.  My current performance is 85MB/s write, 
151 MB/s reads (using dd of 2xRAM per 
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm).




Normally I'd agree with the others and recommend RAID10 - but you say 
you have an OLAP workload - if it is *heavily* read biased you may get 
better performance with RAID5 (more effective disks to read from). 
Having said that, your sequential read performance right now is pretty 
low (151 MB/s  - should be double this), which may point to an issue 
with this controller. Unfortunately this *may* be important for an OLAP 
workload (seq scans of big tables).




2) Should I configure the ext3 file system with noatime and/or 
data=writeback or data=ordered?  My controller has a battery, the 
logical drive has write cache enabled (write-back), and the physical 
devices have write cache disabled (write-through).




Probably wise to use noatime. If you have a heavy write workload (i.e so 
what I just wrote above does *not* apply), then you might find adjusting 
the ext3 commit interval upwards from its default of 5 seconds can help 
(I'm doing some testing at the moment and commit=20 seemed to improve 
performance by about 5-10%).


3) Do I just need to spend more time configuring postgresql?  My 
non-default settings were largely generated by pgtune-0.9.3:


max_locks_per_transaction = 128 # manual; avoiding out of shared 
memory

default_statistics_target = 100
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 16GB
work_mem = 352MB
wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 2316MB
max_connections = 32



Possibly higher checkpoint_segments and lower wal_buffers (I recall 
someone - maybe Greg suggesting that there was no benefit in having the 
latter  10MB). I wonder about setting shared_buffers higher - how large 
is the database?


Cheers

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Alan Hodgson
On Thursday, August 05, 2010, Mark Kirkwood mark.kirkw...@catalyst.net.nz 
wrote:
 Normally I'd agree with the others and recommend RAID10 - but you say
 you have an OLAP workload - if it is *heavily* read biased you may get
 better performance with RAID5 (more effective disks to read from).
 Having said that, your sequential read performance right now is pretty
 low (151 MB/s  - should be double this), which may point to an issue
 with this controller. Unfortunately this *may* be important for an OLAP
 workload (seq scans of big tables).

Probably a low (default) readahead limitation. ext3 doesn't help but it can 
usually get up over 400MB/sec. Doubt it's the controller.

-- 
No animals were harmed in the recording of this episode. We tried but that 
damn monkey was just too fast.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Mark Kirkwood

On 06/08/10 11:58, Alan Hodgson wrote:

On Thursday, August 05, 2010, Mark Kirkwoodmark.kirkw...@catalyst.net.nz
wrote:
   

Normally I'd agree with the others and recommend RAID10 - but you say
you have an OLAP workload - if it is *heavily* read biased you may get
better performance with RAID5 (more effective disks to read from).
Having said that, your sequential read performance right now is pretty
low (151 MB/s  - should be double this), which may point to an issue
with this controller. Unfortunately this *may* be important for an OLAP
workload (seq scans of big tables).
 

Probably a low (default) readahead limitation. ext3 doesn't help but it can
usually get up over 400MB/sec. Doubt it's the controller.

   


Yeah - good suggestion, so cranking up readahead (man blockdev) and 
retesting is recommended.


Cheers

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance