Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-31 Thread Merlin Moncure
On Sun, Aug 30, 2009 at 7:38 PM, Greg Starkgsst...@mit.edu wrote:
 On Sun, Aug 30, 2009 at 11:56 PM, Merlin Moncuremmonc...@gmail.com wrote:
 192k written
  raid 10: six writes
  raid 5: four writes, one read (but the read and one of the writes is
 same physical location)

 now, by 'same physical' location, that may mean that the drive head
 has to move if the data is not in cache.

 I realize that many raid 5 implementations tend to suck.  That said,
 raid 5 should offer higher theoretical performance for writing than
 raid 10, both for sequential and random.

 In the above there are two problems.

 1) 192kB is not a random access pattern. Any time you're writing a
 whole raid stripe or more then RAID5 can start performing reasonably
 but that's not random, that's sequential i/o. The relevant random i/o
 pattern is writing 8kB chunks at random offsets into a multi-terabyte
 storage which doesn't fit in cache.

 2) It's not clear but I think you're saying but the read and one of
 the writes is same physical location on the basis that this mitigates
 the costs. In fact it's the worst case. It means after doing the read
 and calculating the parity block the drive must then spin a full
 rotation before being able to write it back out. So instead of an
 average latency of 1/2 of a rotation you have that plus a full
 rotation, or 3x as much latency before the write can be performed as
 without raid5.

 It's not a fault of the implementations, it's a fundamental problem
 with RAId5. Even a spectacular implementation of RAID5 will be awful
 for random access writes. The only saving grace some hardware
 implementations have is having huge amounts of battery backed cache
 which mean that they can usually buffer all the writes for long enough
 that the access patterns no longer look random. If you buffer enough
 then you can hope you'll eventually overwrite the whole stripe and can
 write out the new parity without reading the old data. Or failing that
 you can perform the reads of the old data when it's convenient because
 you're reading nearby data effectively turning it into sequential i/o.

I agree, that's good analysis.  The main point I was making was that
if you have say a 10 disk raid 5, you don't involve 10 disks, only
two...a very common misconception.  I made another mistake that you
didn't catch: you need to read *both* the data drive and the parity
drive before writing, not just the parity drive.

I wonder if flash SSD are a better fit for raid 5 since the reads are
much cheaper than writes and there is no rotational latency.  (also,
$/gb is different, and so are the failure cases).

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-31 Thread Aidan Van Dyk
* Merlin Moncure mmonc...@gmail.com [090831 10:38]:
 
 I agree, that's good analysis.  The main point I was making was that
 if you have say a 10 disk raid 5, you don't involve 10 disks, only
 two...a very common misconception.  I made another mistake that you
 didn't catch: you need to read *both* the data drive and the parity
 drive before writing, not just the parity drive.
 
 I wonder if flash SSD are a better fit for raid 5 since the reads are
 much cheaper than writes and there is no rotational latency.  (also,
 $/gb is different, and so are the failure cases).

The other thing that scares me about raid-5 is the write-hole, and the
possible delayed inconsistency that brings...

Again, hopefully mitigated by a dependable controller w/ BBU...

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-31 Thread Jeff Janes
On Sun, Aug 30, 2009 at 1:01 PM, Joseph S j...@selectacast.net wrote:

 I've already learned my lesson and will never use raid 5 again.  The
 question is what I do with my 14 drives. Should I use only 1 pair for
 indexes or should I use 4 drives?  The wal logs are already slated for an
 SSD.




Why not just spread all your index data over 14 spindles, and do the same
with your table data?  I haven't encountered this debate in in the pgsql
world, but from the Oracle world it seems to me the Stripe And Mirror
Everything people had the better argument than the separate tables and
indexes people.


Jeff


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-31 Thread Kevin Grittner
Jeff Janes jeff.ja...@gmail.com wrote:
 Joseph S j...@selectacast.net wrote:
 
 The question is what I do with my 14 drives. Should I use only 1
 pair for indexes or should I use 4 drives?  The wal logs are
 already slated for an SSD.
 
 Why not just spread all your index data over 14 spindles, and do the
 same with your table data?
 
If you have the luxury of being able to test more than one
configuration with something resembling your actual workload, I would
strongly recommend including this as one of your configurations.
Spreading everything over the larger number of spindles might well
out-perform your most carefully hand-crafted tuning of object
placement on smaller spindle sets.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-31 Thread Scott Marlowe
On Mon, Aug 31, 2009 at 10:31 AM, Kevin
Grittnerkevin.gritt...@wicourts.gov wrote:
 Jeff Janes jeff.ja...@gmail.com wrote:
 Joseph S j...@selectacast.net wrote:

 The question is what I do with my 14 drives. Should I use only 1
 pair for indexes or should I use 4 drives?  The wal logs are
 already slated for an SSD.

 Why not just spread all your index data over 14 spindles, and do the
 same with your table data?

 If you have the luxury of being able to test more than one
 configuration with something resembling your actual workload, I would
 strongly recommend including this as one of your configurations.
 Spreading everything over the larger number of spindles might well
 out-perform your most carefully hand-crafted tuning of object
 placement on smaller spindle sets.

The first thing I'd test would be if having a separate mirror set for
pg_xlog helps.  If you have a high write environment moving pg_xlog
off of the main data set can help a lot.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-30 Thread Merlin Moncure
On Sat, Aug 29, 2009 at 9:59 AM, Scott Marlowescott.marl...@gmail.com wrote:
 On Sat, Aug 29, 2009 at 2:46 AM, Greg Starkgsst...@mit.edu wrote:
 On Sat, Aug 29, 2009 at 5:20 AM, Luke Koopsluke.ko...@entrust.com wrote:
 Joseph S Wrote
 If I have 14 drives in a RAID 10 to split between data tables
 and indexes what would be the best way to allocate the drives
 for performance?

 RAID-5 can be much faster than RAID-10 for random reads and writes.  It is 
 much slower than RAID-10 for sequential writes, but about the same for 
 sequential reads.  For typical access patterns, I would put the data and 
 indexes on RAID-5 unless you expect there to be lots of sequential scans.

 That's pretty much exactly backwards. RAID-5 will at best slightly
 slower than RAID-0 or RAID-10 for sequential reads or random reads.
 For sequential writes it performs *terribly*, especially for random
 writes. The only write pattern where it performs ok sometimes is
 sequential writes of large chunks.

 Note that while RAID-10 is theoretically always better than RAID-5,
 I've run into quite a few cheapie controllers that were heavily
 optimised for RAID-5 and de-optimised for RAID-10.  However, if it's
 got battery backed cache and can run in JBOD mode, linux software
 RAID-10 or hybrid RAID-1 in hardware RAID-0 in software will almost
 always beat hardware RAID-5 on the same controller.


raid 5 can outperform raid 10 on sequential writes in theory.  if you
are writing 100mb of actual data on, say, a 8 drive array, the raid 10
system has to write 200mb data and the raid 5 system has to write 100
* (8/7) or about 114mb.  Of course, the raid 5 system has to do
parity, etc.

For random writes, raid 5 has to write a minimum of two drives, the
data being written and parity.  Raid 10 also has to write two drives
minimum.  A lot of people think parity is a big deal in terms of raid
5 performance penalty, but I don't -- relative to the what's going on
in the drive, xor calculation costs (one of the fastest operations in
computing) are basically zero, and off-lined if you have a hardware
raid controller.

I bet part of the problem with raid 5 is actually contention. since
your write to a stripe can conflict with other writes to a different
stripe.  The other problem with raid 5 that I see is that you don't
get very much extra protection -- it's pretty scary doing a rebuild
even with a hot spare (and then you should probably be doing raid 6).
On read performance RAID 10 wins all day long because more drives can
be involved.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-30 Thread Greg Stark
On Sun, Aug 30, 2009 at 4:40 PM, Merlin Moncuremmonc...@gmail.com wrote:

 For random writes, raid 5 has to write a minimum of two drives, the
 data being written and parity.  Raid 10 also has to write two drives
 minimum.  A lot of people think parity is a big deal in terms of raid
 5 performance penalty, but I don't -- relative to the what's going on
 in the drive, xor calculation costs (one of the fastest operations in
 computing) are basically zero, and off-lined if you have a hardware
 raid controller.

The cost is that in order to calculate the parity block the RAID
controller has to *read* in either the old data block being
overwritten and the old parity block or all the other data blocks
which participate in that parity block. So every random write becomes
not just two writes but two reads + two writes.

If you're always writing large sequential hunks at a time then this is
minimized because the RAID controller can just calculate the new
parity block for the whole new hunk. But if you often just seek to
random places in the file and overwrite 8k at a time then things go
very very poorly.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-30 Thread Mark Mielke

On 08/30/2009 11:40 AM, Merlin Moncure wrote:

For random writes, raid 5 has to write a minimum of two drives, the
data being written and parity.  Raid 10 also has to write two drives
minimum.  A lot of people think parity is a big deal in terms of raid
5 performance penalty, but I don't -- relative to the what's going on
in the drive, xor calculation costs (one of the fastest operations in
computing) are basically zero, and off-lined if you have a hardware
raid controller.

I bet part of the problem with raid 5 is actually contention. since
your write to a stripe can conflict with other writes to a different
stripe.  The other problem with raid 5 that I see is that you don't
get very much extra protection -- it's pretty scary doing a rebuild
even with a hot spare (and then you should probably be doing raid 6).
On read performance RAID 10 wins all day long because more drives can
be involved.
   


In real life, with real life writes (i.e. not sequential from the start 
of the disk to the end of the disk), where the stripes on the disk being 
written are not already in RAM (to allow for XOR to be cheap), RAID 5 is 
horrible. I still recall naively playing with software RAID 5 on a three 
disk system and finding write performance to be 20% - 50% less than a 
single drive on its own.


People need to realize that the cost of maintaining parity is not the 
XOR itself - XOR is cheap - the cost is having knowledge of all drives 
in the stripe in order to write the parity. This implies it is already 
in cache (requires a very large cache, or a very localized load such 
that the load all fits in cache), or it requires 1 or more reads before 
2 or more writes. Latency is a killer here - latency is already the 
slowest part of the disk, so to effectively multiply latency x 2 has a 
huge impact.


I will never use RAID 5 again unless I have a huge memory backed cache 
for it to cache writes against. By huge, I mean something approximately 
the size of the data normally read and written. Having 1 Gbytes of RAM 
dedicated to RAID 5 for a 1 Tbyte drive may not be enough.


RAID 1+0 on the other hand, has never disappointed me yet. Disks are 
cheap, and paying x2 for single disk redundancy is an acceptable price.


Cheers,
mark

--
Mark Mielkem...@mielke.cc


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-30 Thread Joseph S
I've already learned my lesson and will never use raid 5 again.  The 
question is what I do with my 14 drives. Should I use only 1 pair for 
indexes or should I use 4 drives?  The wal logs are already slated for 
an SSD.


Scott Marlowe wrote:

On Sat, Aug 29, 2009 at 2:46 AM, Greg Starkgsst...@mit.edu wrote:

On Sat, Aug 29, 2009 at 5:20 AM, Luke Koopsluke.ko...@entrust.com wrote:

Joseph S Wrote

If I have 14 drives in a RAID 10 to split between data tables
and indexes what would be the best way to allocate the drives
for performance?

RAID-5 can be much faster than RAID-10 for random reads and writes.  It is much 
slower than RAID-10 for sequential writes, but about the same for sequential 
reads.  For typical access patterns, I would put the data and indexes on RAID-5 
unless you expect there to be lots of sequential scans.

That's pretty much exactly backwards. RAID-5 will at best slightly
slower than RAID-0 or RAID-10 for sequential reads or random reads.
For sequential writes it performs *terribly*, especially for random
writes. The only write pattern where it performs ok sometimes is
sequential writes of large chunks.


Note that while RAID-10 is theoretically always better than RAID-5,
I've run into quite a few cheapie controllers that were heavily
optimised for RAID-5 and de-optimised for RAID-10.  However, if it's
got battery backed cache and can run in JBOD mode, linux software
RAID-10 or hybrid RAID-1 in hardware RAID-0 in software will almost
always beat hardware RAID-5 on the same controller.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-30 Thread Merlin Moncure
On Sun, Aug 30, 2009 at 1:36 PM, Mark Mielkem...@mark.mielke.cc wrote:
 On 08/30/2009 11:40 AM, Merlin Moncure wrote:

 For random writes, raid 5 has to write a minimum of two drives, the
 data being written and parity.  Raid 10 also has to write two drives
 minimum.  A lot of people think parity is a big deal in terms of raid
 5 performance penalty, but I don't -- relative to the what's going on
 in the drive, xor calculation costs (one of the fastest operations in
 computing) are basically zero, and off-lined if you have a hardware
 raid controller.

 I bet part of the problem with raid 5 is actually contention. since
 your write to a stripe can conflict with other writes to a different
 stripe.  The other problem with raid 5 that I see is that you don't
 get very much extra protection -- it's pretty scary doing a rebuild
 even with a hot spare (and then you should probably be doing raid 6).
 On read performance RAID 10 wins all day long because more drives can
 be involved.


 In real life, with real life writes (i.e. not sequential from the start of
 the disk to the end of the disk), where the stripes on the disk being
 written are not already in RAM (to allow for XOR to be cheap), RAID 5 is
 horrible. I still recall naively playing with software RAID 5 on a three
 disk system and finding write performance to be 20% - 50% less than a single
 drive on its own.

 People need to realize that the cost of maintaining parity is not the XOR
 itself - XOR is cheap - the cost is having knowledge of all drives in the
 stripe in order to write the parity. This implies it is already in cache
 (requires a very large cache, or a very localized load such that the load
 all fits in cache), or it requires 1 or more reads before 2 or more writes.
 Latency is a killer here - latency is already the slowest part of the disk,
 so to effectively multiply latency x 2 has a huge impact.

This is not necessarily correct.  As long as the data you are writing
is less than the raid stripe size (say 64kb), then you only need the
old data for that stripe (which is stored on one disk only), the
parity (also stored on one disk only), and the data being written to
recalculate the parity.  A raid stripe is usually on one disk.  So a
raid 5 random write will only involve two drives if it's less than
stripe size (and three drives if it's up to 2x stripe size, etc).

IOW, if your stripe size is 64k:
64k written:
  raid 10: two writes
  raid 5: two writes, one read (but the read and one of the writes is
same physical location)
128k written
  raid 10: four writes
  raid 5: three writes, one read (but the read and one of the writes
is same physical location)
192k written
  raid 10: six writes
  raid 5: four writes, one read (but the read and one of the writes is
same physical location)

now, by 'same physical' location, that may mean that the drive head
has to move if the data is not in cache.

I realize that many raid 5 implementations tend to suck.  That said,
raid 5 should offer higher theoretical performance for writing than
raid 10, both for sequential and random. (many, many online
descriptions of raid get this wrong and stupidly blame the overhead of
parity calculation).  raid 10 wins on read all day long.  Of course,
on a typical system with lots of things going on, it gets a lot more
complicated...

(just for the record, I use raid 10 on my databases always) :-)

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-30 Thread Greg Stark
On Sun, Aug 30, 2009 at 11:56 PM, Merlin Moncuremmonc...@gmail.com wrote:
 192k written
  raid 10: six writes
  raid 5: four writes, one read (but the read and one of the writes is
 same physical location)

 now, by 'same physical' location, that may mean that the drive head
 has to move if the data is not in cache.

 I realize that many raid 5 implementations tend to suck.  That said,
 raid 5 should offer higher theoretical performance for writing than
 raid 10, both for sequential and random.

In the above there are two problems.

1) 192kB is not a random access pattern. Any time you're writing a
whole raid stripe or more then RAID5 can start performing reasonably
but that's not random, that's sequential i/o. The relevant random i/o
pattern is writing 8kB chunks at random offsets into a multi-terabyte
storage which doesn't fit in cache.

2) It's not clear but I think you're saying but the read and one of
the writes is same physical location on the basis that this mitigates
the costs. In fact it's the worst case. It means after doing the read
and calculating the parity block the drive must then spin a full
rotation before being able to write it back out. So instead of an
average latency of 1/2 of a rotation you have that plus a full
rotation, or 3x as much latency before the write can be performed as
without raid5.

It's not a fault of the implementations, it's a fundamental problem
with RAId5. Even a spectacular implementation of RAID5 will be awful
for random access writes. The only saving grace some hardware
implementations have is having huge amounts of battery backed cache
which mean that they can usually buffer all the writes for long enough
that the access patterns no longer look random. If you buffer enough
then you can hope you'll eventually overwrite the whole stripe and can
write out the new parity without reading the old data. Or failing that
you can perform the reads of the old data when it's convenient because
you're reading nearby data effectively turning it into sequential i/o.


-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-29 Thread Greg Stark
On Sat, Aug 29, 2009 at 5:20 AM, Luke Koopsluke.ko...@entrust.com wrote:
 Joseph S Wrote
 If I have 14 drives in a RAID 10 to split between data tables
 and indexes what would be the best way to allocate the drives
 for performance?

 RAID-5 can be much faster than RAID-10 for random reads and writes.  It is 
 much slower than RAID-10 for sequential writes, but about the same for 
 sequential reads.  For typical access patterns, I would put the data and 
 indexes on RAID-5 unless you expect there to be lots of sequential scans.

That's pretty much exactly backwards. RAID-5 will at best slightly
slower than RAID-0 or RAID-10 for sequential reads or random reads.
For sequential writes it performs *terribly*, especially for random
writes. The only write pattern where it performs ok sometimes is
sequential writes of large chunks.

 Always put the transaction logs (WAL Files) on RAID-10 (or RAID-1 if you 
 don't want to dedicate so many drives to the logs).  The only significant 
 performance difference between RAID-10 and RAID-1 is that RAID-1 is much 
 slower (factor of 4 or 5) for random reads.

no, RAID-10 and RAID-1 should perform the same for reads. RAID-10 will
be slower at writes by about a factor equal to the number of mirror
sides.

 I think the ratio of random reads from the transaction logs would typically 
 be quite low.

During normal operation the logs are *never* read, neither randomly
nor sequentially.

 You don't want your transaction logs or any swapfiles on RAID-5.  The slow 
 sequential write performance can be a killer.

As i mentioned sequential writes are the only write case when RAID-5
sometimes ok. However the picture is complicated by transaction
syncing which would make RAID-5 see it more as random i/o. In any case
wal normally doesn't take much disk space so there's not much reason
to use anything but RAID-1.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-29 Thread Merlin Moncure
On Fri, Aug 28, 2009 at 8:19 PM, Jeff Janesjeff.ja...@gmail.com wrote:
 -- Forwarded message --
 From: Joseph S j...@selectacast.net
 To: Greg Smith gsm...@gregsmith.com, pgsql-performance@postgresql.org
 Date: Fri, 28 Aug 2009 10:25:10 -0400
 Subject: Re: What exactly is postgres doing during INSERT/UPDATE ?
 Greg Smith wrote:

 The main two things you can do to improve this on the database side:

 -Increase checkpoint_segments, which reduces how often updated data has
 to be flushed to disk

 It fsync is turned off, does this matter so much?

 It still matters.  The kernel is only willing to have so much dirty data
 sitting in the disk cache.  Once it reaches that limit, user processes doing
 writes start blocking while the kernel flushes stuff on their behalf.

it doesn't matter nearly as much though.  if you are outrunning the
o/s write cache with fsync off, then it's time to start looking at new
hardware.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-29 Thread Scott Marlowe
On Sat, Aug 29, 2009 at 2:46 AM, Greg Starkgsst...@mit.edu wrote:
 On Sat, Aug 29, 2009 at 5:20 AM, Luke Koopsluke.ko...@entrust.com wrote:
 Joseph S Wrote
 If I have 14 drives in a RAID 10 to split between data tables
 and indexes what would be the best way to allocate the drives
 for performance?

 RAID-5 can be much faster than RAID-10 for random reads and writes.  It is 
 much slower than RAID-10 for sequential writes, but about the same for 
 sequential reads.  For typical access patterns, I would put the data and 
 indexes on RAID-5 unless you expect there to be lots of sequential scans.

 That's pretty much exactly backwards. RAID-5 will at best slightly
 slower than RAID-0 or RAID-10 for sequential reads or random reads.
 For sequential writes it performs *terribly*, especially for random
 writes. The only write pattern where it performs ok sometimes is
 sequential writes of large chunks.

Note that while RAID-10 is theoretically always better than RAID-5,
I've run into quite a few cheapie controllers that were heavily
optimised for RAID-5 and de-optimised for RAID-10.  However, if it's
got battery backed cache and can run in JBOD mode, linux software
RAID-10 or hybrid RAID-1 in hardware RAID-0 in software will almost
always beat hardware RAID-5 on the same controller.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-29 Thread David Rees
On Sat, Aug 29, 2009 at 1:46 AM, Greg Starkgsst...@mit.edu wrote:
 On Sat, Aug 29, 2009 at 5:20 AM, Luke Koopsluke.ko...@entrust.com wrote:
 RAID-5 can be much faster than RAID-10 for random reads and writes.  It is 
 much slower than
 RAID-10 for sequential writes, but about the same for sequential reads.  For 
 typical access
 patterns, I would put the data and indexes on RAID-5 unless you expect there 
 to be lots of
 sequential scans.

 That's pretty much exactly backwards. RAID-5 will at best slightly
 slower than RAID-0 or RAID-10 for sequential reads or random reads.
 For sequential writes it performs *terribly*, especially for random
 writes. The only write pattern where it performs ok sometimes is
 sequential writes of large chunks.

Also note that how terribly RAID5 performs on those small random
writes depends on a LOT on the implementation.  A good controller with
a large BBU cache will be able to mitigate the performance penalty of
having to read stripes before small writes to calculate parity (of
course, if the writes are really random enough, it's still not going
to help much).

 Always put the transaction logs (WAL Files) on RAID-10 (or RAID-1 if you 
 don't want to dedicate
 so many drives to the logs).  The only significant performance difference 
 between RAID-10 and
 RAID-1 is that RAID-1 is much slower (factor of 4 or 5) for random reads.

 no, RAID-10 and RAID-1 should perform the same for reads. RAID-10 will
 be slower at writes by about a factor equal to the number of mirror
 sides.

Let's keep in mind that a 2-disk RAID-10 is really the same as a
2-disk RAID-1, it just doesn't have any mirrors to stripe over.  So
since you really need 4-disks for a true RAID-10, the performance of
a RAID-10 array compared to a RAID1 array is pretty much proportional
to the number of disks in the array (more disks = more performance).

The far RAID-10 layout that is available when using Linux software
raid is interesting.  It will lay the data out on the disks so that
you can get the streaming read performance of a RAID-0 array, but
streaming write performance will suffer a bit since now the disk will
have to seek to perform those writes.  You can also use this layout
with just 2 disks instead of RAID1.  Some claim that the performance
hit isn't noticeable due to write caching/IO ordering, but I have not
tested it's performance using PostgreSQL.  Might be a nice thing for
someone to try.

http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

-Dave

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-29 Thread Jeff Janes
On Sat, Aug 29, 2009 at 6:26 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Fri, Aug 28, 2009 at 8:19 PM, Jeff Janesjeff.ja...@gmail.com wrote:
  -- Forwarded message --
  From: Joseph S j...@selectacast.net
  To: Greg Smith gsm...@gregsmith.com, pgsql-performance@postgresql.org
  Date: Fri, 28 Aug 2009 10:25:10 -0400
  Subject: Re: What exactly is postgres doing during INSERT/UPDATE ?
  Greg Smith wrote:
 
  The main two things you can do to improve this on the database side:
 
  -Increase checkpoint_segments, which reduces how often updated data has
  to be flushed to disk
 
  It fsync is turned off, does this matter so much?
 
  It still matters.  The kernel is only willing to have so much dirty data
  sitting in the disk cache.  Once it reaches that limit, user processes
 doing
  writes start blocking while the kernel flushes stuff on their behalf.

 it doesn't matter nearly as much though.


True, but it matters enough that it ought not be ignored.  I've run into it
more than once, and I haven't been at this very long.


 if you are outrunning the
 o/s write cache with fsync off, then it's time to start looking at new
 hardware.


Or to start looking at tweaking the kernel VM settings.  The kernel doesn't
always handle these situations as gracefully as it could, and might produce
a practical throughput that is much less than the theoretical one.  But
reducing the frequency of checkpoints is easier than either of these, and
cheaper than buying new hardware.  I don't see why the hardest and most
expensive option would be the first choice.

 Jeff


[PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Joseph S

If I run  dd if=/dev/zero bs=1024k of=file count=1000  iostat shows me:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 671.5088.00113496.00176 226992


However postgres 8.3.7 doing a bulk data write (a slony slave, doing 
inserts and updates) doesn't go nearly as fast:


 Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 418.41   648.76  7052.74   1304  14176

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 237.5044.00  3668.00 88   7336

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 221.50   444.00  3832.00888   7664

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 166.00   248.00  3360.00496   6720

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 163.00   480.00  3184.00960   6368

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 102.50   724.00  1736.00   1448   3472

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 295.50   712.00  6004.00   1424  12008

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 109.45   433.83  2260.70872   4544

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 180.00   640.00  3512.00   1280   7024

top shows the cpu usage of the pg process ranges from zero to never more 
than ten percent of a cpu, and that one cpu is always ninety some odd 
percent in iowait.  So what is postgres doing (with fsync off) that 
causes the cpu to spend so much time in iowait?


This is a 64 bit amd linux system with ext3 filesystem. free shows:

 total   used   free sharedbuffers cached
Mem:   81169928085848  31144  0 1030163098568
-/+ buffers/cache:48842643232728
Swap:  669729620355084661788

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Pierre Frédéric Caillau d


top shows the cpu usage of the pg process ranges from zero to never more  
than ten percent of a cpu, and that one cpu is always ninety some odd  
percent in iowait.  So what is postgres doing (with fsync off) that  
causes the cpu to spend so much time in iowait?


Updating indexes ?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Greg Smith

On Fri, 28 Aug 2009, Joseph S wrote:


If I run  dd if=/dev/zero bs=1024k of=file count=1000  iostat shows me:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 671.5088.00113496.00176 226992


That's the sequential transfer rate of your drive.  It's easier to present 
these numbers if you use vmstat 1 instead; that shows the I/O in more 
useful units, and with the CPU stats on the same line.


However postgres 8.3.7 doing a bulk data write (a slony slave, doing inserts 
and updates) doesn't go nearly as fast:


In PostgreSQL, an update is:

1) A read of the old data
2) Writing out the updated data
3) Marking the original data as dead
4) Updating any indexes involved
5) Later cleaning up after the now dead row

On top of that Slony may need to do its own metadata updates.

This sort of workload involves random I/O rather than sequential.  On 
regular hard drives this normally happens at a tiny fraction of the speed 
because of how the disk has to seek around.  Typically a single drive 
capable of 50-100MB/s on sequential I/O will only do 1-2MB/s on a 
completely random workload.  You look like you're getting somewhere in the 
middle there, on the low side which doesn't surprise me.


The main two things you can do to improve this on the database side:

-Increase checkpoint_segments, which reduces how often updated data has to 
be flushed to disk


-Increase shared_buffers in order to hold more of the working set of data 
in RAM, so that more reads are satisfied by the database cache and less 
data gets evicted to disk.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Scott Marlowe
On Fri, Aug 28, 2009 at 2:08 AM, Greg Smithgsm...@gregsmith.com wrote:

 This sort of workload involves random I/O rather than sequential.  On
 regular hard drives this normally happens at a tiny fraction of the speed
 because of how the disk has to seek around.  Typically a single drive
 capable of 50-100MB/s on sequential I/O will only do 1-2MB/s on a completely
 random workload.  You look like you're getting somewhere in the middle
 there, on the low side which doesn't surprise me.

 The main two things you can do to improve this on the database side:

 -Increase checkpoint_segments, which reduces how often updated data has to
 be flushed to disk

 -Increase shared_buffers in order to hold more of the working set of data in
 RAM, so that more reads are satisfied by the database cache and less data
 gets evicted to disk.

After that you have to start looking at hardware.  Soimething as
simple as a different drive for indexes and another for WAL, and
another for the base tables can make a big difference.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Joseph S

Greg Smith wrote:


The main two things you can do to improve this on the database side:

-Increase checkpoint_segments, which reduces how often updated data has 
to be flushed to disk


It fsync is turned off, does this matter so much?




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Joseph S

Scott Marlowe wrote:


After that you have to start looking at hardware.  Soimething as
simple as a different drive for indexes and another for WAL, and
another for the base tables can make a big difference.

If I have 14 drives in a RAID 10 to split between data tables and 
indexes what would be the best way to allocate the drives for performance?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Jeff Janes

 -- Forwarded message --
 From: Joseph S j...@selectacast.net
 To: Greg Smith gsm...@gregsmith.com, pgsql-performance@postgresql.org
 Date: Fri, 28 Aug 2009 10:25:10 -0400
 Subject: Re: What exactly is postgres doing during INSERT/UPDATE ?
 Greg Smith wrote:

  The main two things you can do to improve this on the database side:

 -Increase checkpoint_segments, which reduces how often updated data has to
 be flushed to disk


 It fsync is turned off, does this matter so much?


It still matters.  The kernel is only willing to have so much dirty data
sitting in the disk cache.  Once it reaches that limit, user processes doing
writes start blocking while the kernel flushes stuff on their behalf.

Jeff


Re: [PERFORM] What exactly is postgres doing during INSERT/UPDATE ?

2009-08-28 Thread Luke Koops
Joseph S Wrote
 If I have 14 drives in a RAID 10 to split between data tables
 and indexes what would be the best way to allocate the drives
 for performance?

RAID-5 can be much faster than RAID-10 for random reads and writes.  It is much 
slower than RAID-10 for sequential writes, but about the same for sequential 
reads.  For typical access patterns, I would put the data and indexes on RAID-5 
unless you expect there to be lots of sequential scans.

If you do this, you can drop the random_page_cost from the default 4.0 to 1.0.  
That should also encourage postgres to use the index more often.  I think the 
default costs for postgres assume that the data is on a RAID-1 array.  Either 
that, or they are a compromise that isn't quite right for any system.  On a 
plain old disk the random_page_cost should be 8.0 or 10.0.

The division of the drives into two arrays would depend on how much space will 
be occupied by the tables vs the indexes.  This is very specific to your 
database.  For example, if indexes take half as much space as tables, then you 
want 2/3rds for tables and 1/3rd for indexes.  8 drives for tables, 5 drives 
for indexes, and 1 for a hot standby.  The smaller array may be a bit slower 
for some operations due to reduced parallelism.  This also depends on the 
intelligence of your RAID controller.

Always put the transaction logs (WAL Files) on RAID-10 (or RAID-1 if you don't 
want to dedicate so many drives to the logs).  The only significant performance 
difference between RAID-10 and RAID-1 is that RAID-1 is much slower (factor of 
4 or 5) for random reads.  I think the ratio of random reads from the 
transaction logs would typically be quite low.  They are written sequentially 
and during checkpoint they are read sequentially.  In the interim, the data is 
probably still in shared memory if it needs to be read.

You don't want your transaction logs or any swapfiles on RAID-5.  The slow 
sequential write performance can be a killer.

-Luke

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance