Re: [PERFORM] Hardware vs Software RAID

2008-06-27 Thread Matthew Wakeling

On Thu, 26 Jun 2008, Merlin Moncure wrote:
In addition there are many different types of flash (MLC/SLC) and the 
flash cells themselves can be organized in particular ways involving 
various trade-offs.


Yeah, I wouldn't go for MLC, given it has a tenth the lifespan of SLC.

The main issue is lousy random write performance that basically makes 
them useless for any kind of OLTP operation.


For the mentioned device, they claim a sequential read speed of 100MB/s, 
sequential write speed of 80MB/s, random read speed of 80MB/s and random 
write speed of 30MB/s. This is *much* better than figures quoted for many 
other devices, but of course unless they publish the block size they used 
for the random speed tests, the figures are completely useless.
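
As a rough illustration of how much the unpublished block size matters,
two arbitrary example block sizes turn that same 30MB/s random write
figure into wildly different request rates (my arithmetic, nothing from
the vendor):

    # The claimed 30MB/s random write rate means very different things
    # depending on which block size the vendor happened to test with.
    claimed = 30 * 1024 ** 2                       # 30MB/s, in bytes/sec
    for block_size in (4 * 1024, 128 * 1024):      # two arbitrary guesses
        iops = claimed / block_size
        print(f"{block_size // 1024:>3}kB blocks -> {iops:,.0f} writes/sec")
    # prints:   4kB blocks -> 7,680 writes/sec  (impressive)
    #         128kB blocks -> 240 writes/sec    (nothing special)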


Matthew

--
sed -e '/^[when][coders]/!d;/^...[discover].$/d;/^..[real].[code]$/!d
' `locate dict/words`



Re: [PERFORM] Hardware vs Software RAID

2008-06-27 Thread Merlin Moncure
On Fri, Jun 27, 2008 at 7:00 AM, Matthew Wakeling [EMAIL PROTECTED] wrote:
 On Thu, 26 Jun 2008, Merlin Moncure wrote:

 In addition there are many different types of flash (MLC/SLC) and the
 flash cells themselves can be organized in particular ways involving various
 trade-offs.

 Yeah, I wouldn't go for MLC, given it has a tenth the lifespan of SLC.

 The main issue is lousy random write performance that basically makes them
 useless for any kind of OLTP operation.

 For the mentioned device, they claim a sequential read speed of 100MB/s,
 sequential write speed of 80MB/s, random read speed of 80MB/s and random
 write speed of 30MB/s. This is *much* better than figures quoted for many
 other devices, but of course unless they publish the block size they used
 for the random speed tests, the figures are completely useless.

right. not likely completely truthful. here's why:

A 15k drive can deliver around 200 seeks/sec (under worst case
conditions translating to 1-2mb/sec with an 8k block size).  30mb/sec of
random performance would then be roughly equivalent to around 40 15k
drives configured in a raid 10.  Of course, I'm assuming the block
size :-).
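
A quick back-of-the-envelope check of that estimate in Python, using the
same assumed figures (8k blocks, 200 seeks/sec per drive, and the claimed
30MB/s random write rate), so this is my sketch of the arithmetic rather
than a measurement:

    # Sanity check of the "around 40 drives" figure, using assumed inputs.
    BLOCK_SIZE = 8 * 1024              # assumed 8k random-write block size
    DRIVE_SEEKS_PER_SEC = 200          # worst-case seeks/sec, one 15k drive
    SSD_RANDOM_WRITE = 30 * 1024 ** 2  # claimed 30MB/s random write rate

    drive_mb_per_sec = DRIVE_SEEKS_PER_SEC * BLOCK_SIZE / 1024 ** 2  # ~1.6
    ssd_write_iops = SSD_RANDOM_WRITE / BLOCK_SIZE                   # 3840
    # In a raid 10 each write hits both halves of a mirror, so N drives
    # deliver roughly N/2 * per-drive write IOPS.
    equivalent_drives = 2 * ssd_write_iops / DRIVE_SEEKS_PER_SEC     # 38.4
    print(drive_mb_per_sec, ssd_write_iops, equivalent_drives)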

Unless there were some other mitigating factors (lifetime, etc.), this
would demonstrate that flash ssd would crush disks in any reasonable
cost/performance metric.  It's probably not so cut and dried, otherwise
we'd be hearing more about them (pure speculation on my part).

merlin



Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Greg Smith

On Wed, 25 Jun 2008, Andrew Sullivan wrote:

the key thing to do is to ensure you have good testing infrastructure in 
place to check that things will work before you deploy to production.


This is true whether you're using Linux or completely closed source 
software.  There are two main differences from my view:


-OSS lets you look at the code before a typical closed-source 
company would have pushed a product out the door at all.  The downside is 
that you need to recognize when code is still at that early stage.  Linux 
kernels, for example, need a significant amount of real-world exposure 
after release before they're ready for most people.


-If your OSS program doesn't work, you can potentially find the problem 
yourself.  I find that I don't actually fix issues very often when I come 
across them, but being able to browse the source code for something that 
isn't working often makes it easier to understand what's going on as part 
of troubleshooting.


It's not like closed source software doesn't have the same kinds of bugs. 
The way commercial software (and projects like PostgreSQL) get organized 
into a smaller number of official releases tends to focus the QA process a 
bit better though, so that regular customers don't see as many rough 
edges.  Linux used to do a decent job of this with their development vs. 
stable kernels, which I really miss.  Unfortunately there's just not 
enough time for the top-level developers to manage that while still 
keeping up with the pace needed just for new work.  Sorting out which are 
the stable kernel releases seems to have become the job of the 
distributors (RedHat, SuSE, Debian, etc.) instead of the core kernel 
developers.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Vivek Khera


On Jun 25, 2008, at 11:35 AM, Matthew Wakeling wrote:


On Wed, 25 Jun 2008, Greg Smith wrote:

A firewire-attached log device is an extremely bad idea.


Anyone have experience with IDE, SATA, or SAS-connected flash  
devices like the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely -  
32GB, at a transfer rate of 100MB/s, and doesn't degrade much in  
performance when writing small random blocks. But what's it actually  
like, and is it reliable?


None of these manufacturers rates these drives for massive amounts of  
writes.  They're sold as suitable for laptop/desktop use, which  
normally is not a heavy wear and tear operation like a DB.  Once they  
claim suitability for this purpose, be sure that I and a lot of others  
will dive into it to see how well it really works.  Until then, it  
will just be an expensive brick-making experiment, I'm sure.




Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Matthew Wakeling

On Thu, 26 Jun 2008, Vivek Khera wrote:
Anyone have experience with IDE, SATA, or SAS-connected flash devices like 
the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a transfer 
rate of 100MB/s, and doesn't degrade much in performance when writing small 
random blocks. But what's it actually like, and is it reliable?


None of these manufacturers rates these drives for massive amounts of writes. 
They're sold as suitable for laptop/desktop use, which normally is not a 
heavy wear and tear operation like a DB.  Once they claim suitability for 
this purpose, be sure that I and a lot of others will dive into it to see how 
well it really works.  Until then, it will just be an expensive brick-making 
experiment, I'm sure.


It claims an MTBF of 2,000,000 hours, but no further reliability 
information seems forthcoming. I thought the idea that flash couldn't cope 
with many writes was no longer true these days?


Matthew

--
I work for an investment bank. I have dealt with code written by stock
exchanges. I have seen how the computer systems that store your money are
run. If I ever make a fortune, I will store it in gold bullion under my
bed.  -- Matthew Crosby



Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Scott Marlowe
On Thu, Jun 26, 2008 at 10:14 AM, Matthew Wakeling [EMAIL PROTECTED] wrote:
 On Thu, 26 Jun 2008, Vivek Khera wrote:

 Anyone have experience with IDE, SATA, or SAS-connected flash devices
 like the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a
 transfer rate of 100MB/s, and doesn't degrade much in performance when
 writing small random blocks. But what's it actually like, and is it
 reliable?

 None of these manufacturers rates these drives for massive amounts of
 writes. They're sold as suitable for laptop/desktop use, which normally is
 not a heavy wear and tear operation like a DB.  Once they claim suitability
 for this purpose, be sure that I and a lot of others will dive into it to
 see how well it really works.  Until then, it will just be an expensive
 brick-making experiment, I'm sure.

 It claims a MTBF of 2,000,000 hours, but no further reliability information
 seems forthcoming. I thought the idea that flash couldn't cope with many
 writes was no longer true these days?

What's mainly happened is that a great increase in storage capacity has
allowed flash-based devices to spread their writes out over so many
cells that the time it takes to overwrite all the cells enough to get
dead ones is measured in much longer intervals.  Instead of dying in
weeks or months, they'll now die, for most workloads, in years or
more.

However, I've tested a few less expensive solid state storage devices,
and for some transactional loads they were much faster, but then for
things like report queries scanning whole tables they were several times
slower than a sw RAID-10 array of just 4 spinning disks.  But pgbench was
quite snappy using the solid state storage for pg_xlog.



Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Merlin Moncure
On Thu, Jun 26, 2008 at 9:49 AM, Peter T. Breuer [EMAIL PROTECTED] wrote:
 Also sprach Merlin Moncure:
 As discussed down thread, software raid still gets benefits of
 write-back caching on the raid controller...but there are a couple of

 (I wish I knew what write-back caching was!)

hardware raid controllers generally have some dedicated memory for
caching.  the controllers can be configured in one of two modes (the
jargon is so common it's almost standard):
write back: the raid controller can lie to the host o/s.  when the o/s
asks the controller to sync, the controller can hold data in cache (for
a time).
write through: the raid controller cannot lie.  all sync requests must
pass through to disk.

The thinking is that the bbu (battery backup unit) on the controller can
hold scheduled writes in memory (for a time), which are then replayed to
disk when the server restarts in the event of a power failure.  This is a
reasonable compromise between data integrity and performance.  'write
back' caching provides insane burst IOPS (because you are writing to
controller cache) and somewhat improved sustained IOPS because the
controller is reorganizing writes on the fly in (hopefully) optimal
fashion.
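
To make the two modes concrete, here is a toy Python sketch of my own
(the class and method names are invented, and no real controller firmware
looks like this) of how a sync request is treated differently in each
mode:

    # Toy model (illustration only, not real firmware) of the two cache
    # modes described above.
    from collections import deque

    class ControllerCache:
        def __init__(self, write_back):
            self.write_back = write_back
            self.cache = deque()          # stands in for battery-backed DRAM

        def write(self, block):
            self.cache.append(block)      # bursts land in cache: very fast

        def sync(self, disk):
            if self.write_back:
                # "lie" to the host: ack now, flush later; the bbu keeps the
                # cached blocks alive across a power failure.
                return "ok (still in controller cache)"
            while self.cache:             # write through: hit the disk now
                disk.append(self.cache.popleft())
            return "ok (on disk)"

    disk = []
    ctrl = ControllerCache(write_back=True)
    ctrl.write(b"commit record")
    print(ctrl.sync(disk))                # acks immediately: burst IOPS win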

 This imposes a considerable extra resource burden. It's a mystery to me
 However the lack of extra buffering is really deliberate (double
 buffering is a horrible thing in many ways, not least because of the

snip
completely unconvincing.  the overhead of various cache layers is
completely minute compared to a full fault to disk that requires a
seek which is several orders of magnitude slower.

The linux software raid algorithms are highly optimized, and run on a
(presumably much faster) cpu than the one on the controller.
However, there is still some extra oomph you can get out of letting
the raid controller do what the software raid can't...namely, delay
sync for a time.

merlin



Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Merlin Moncure
On Thu, Jun 26, 2008 at 12:14 PM, Matthew Wakeling [EMAIL PROTECTED] wrote:
 None of these manufacturers rates these drives for massive amounts of
 writes. They're sold as suitable for laptop/desktop use, which normally is
 not a heavy wear and tear operation like a DB.  Once they claim suitability
 for this purpose, be sure that I and a lot of others will dive into it to
 see how well it really works.  Until then, it will just be an expensive
 brick-making experiment, I'm sure.

 It claims a MTBF of 2,000,000 hours, but no further reliability information
 seems forthcoming. I thought the idea that flash couldn't cope with many
 writes was no longer true these days?

Flash and disks have completely different failure modes, and you can't
do apples to apples MTBF comparisons.  In addition there are many
different types of flash (MLC/SLC) and the flash cells themselves can
be organized in particular ways involving various trade-offs.

The best flash drives combined with smart wear leveling are
anecdotally believed to provide lifetimes that are good enough to
warrant use in high-duty server environments.  The main issue is lousy
random write performance that basically makes them useless for any
kind of OLTP operation.  There are a couple of software workarounds
(hacks?) out there which may address this problem if the technology
doesn't get there first.

If the random write problem were solved, a single ssd would provide
the equivalent of a stack of 15k disks in a raid 10.

see:
http://www.bigdbahead.com/?p=44
http://feedblog.org/2008/01/30/24-hours-with-an-ssd-and-mysql/

merlin



Re: [PERFORM] Hardware vs Software Raid

2008-06-26 Thread Peter T. Breuer
Also sprach Merlin Moncure:
 write back: raid controller can lie to host o/s. when o/s asks

This is not what the linux software raid controller does, then. It 
does not queue requests internally at all, nor ack requests that have
not already been acked by the components (modulo the fact that one can
deliberately choose to have a slow component not be synchronous by allowing
write-behind on it, in which case the controller will ack the
incoming request after one of the components has been serviced,
without waiting for both).

 integrity and performance.  'write back' caching provides insane burst
 IOPS (because you are writing to controller cache) and somewhat
 improved sustained IOPS because the controller is reorganizing writes
 on the fly in (hopefully) optimal fashion.

This is what is provided by Linux file system and (ordinary) block
device driver subsystem. It is deliberately eschewed by the soft raid
driver, because any caching will already have been done above and below
the driver, either in the FS or in the components. 

  However the lack of extra buffering is really deliberate (double
  buffering is a horrible thing in many ways, not least because of the
 
 snip
 completely unconvincing. 

But true.  Therefore the problem in attaining conviction must be at your
end.  Double buffering just doubles the resources dedicated to a single
request, without doing anything for it!  It doubles the frequency with
which one runs out of resources, it doubles the frequency of the burst
limit being reached.  It's deadly (deadlockly :) in the situation where
the receiving component device also needs resources in order to service
the request, such as when the transport is network tcp (and I have my
suspicions about scsi too).

 the overhead of various cache layers is
 completely minute compared to a full fault to disk that requires a
 seek which is several orders of magnitude slower.

That's absolutely true when by overhead you mean computation cycles
and absolutely false when by overhead you mean memory resources, as I
do.  Double buffering is a killer.

 The linux software raid algorithms are highly optimized, and run on a

I can confidently tell you that that's balderdash both as a Linux author
and as a software RAID linux author (check the attributions in the
kernel source, or look up something like Raiding the Noosphere on
google).

 presumably (much faster) cpu than what the controller supports.
 However, there is still some extra oomph you can get out of letting
 the raid controller do what the software raid can't...namely delay
 sync for a time.

There are several design problems left in software raid in the linux kernel.
One of them is the need for extra memory to dispatch requests with and
as (i.e. buffer heads and buffers, both). bhs should be OK since the
small cache per device won't be exceeded while the raid driver itself
serialises requests, which is essentially the case (it does not do any
buffering, queuing, whatever .. and tries hard to avoid doing so). The
need for extra buffers for the data is a problem. On different
platforms different aspects of that problem are important (would you
believe that on ARM mere copying takes so much cpu time that one wants
to avoid it at all costs, whereas on intel it's a forgettable trivium).

I also wouldn't absolutely swear that request ordering is maintained
under ordinary circumstances.

But of course we try.


Peter



Re: [PERFORM] Hardware vs Software Raid

2008-06-26 Thread Merlin Moncure
On Thu, Jun 26, 2008 at 1:03 AM, Peter T. Breuer [EMAIL PROTECTED] wrote:
 Also sprach Merlin Moncure:
 write back: raid controller can lie to host o/s. when o/s asks

 This is not what the linux software raid controller does, then. It
 does not queue requests internally at all, nor ack requests that have
 not already been acked by the components (modulo the fact that one can
 deliberately choose to have a slow component not be synchronous by allowing
 write-behind on it, in which case the controller will ack the
 incoming request after one of the components has been serviced,
 without waiting for both).

 integrity and performance.  'write back' caching provides insane burst
 IOPS (because you are writing to controller cache) and somewhat
 improved sustained IOPS because the controller is reorganizing writes
 on the fly in (hopefully) optimal fashion.

 This is what is provided by Linux file system and (ordinary) block
 device driver subsystem. It is deliberately eschewed by the soft raid
 driver, because any caching will already have been done above and below
 the driver, either in the FS or in the components.

  However the lack of extra buffering is really deliberate (double
  buffering is a horrible thing in many ways, not least because of the

 snip
 completely unconvincing.

 But true.  Therefore the problem in attaining conviction must be at your
 end.  Double buffering just doubles the resources dedicated to a single
 request, without doing anything for it!  It doubles the frequency with
 which one runs out of resources, it doubles the frequency of the burst
 limit being reached.  It's deadly (deadlockly :) in the situation where

Only if those resources are drawn from the same pool.  You are
oversimplifying a calculation that has many variables such as cost.
CPUs for example are introducing more cache levels (l1, l2, l3), etc.
 Also, the different levels of cache have different capabilities.
Only the hardware controller cache is (optionally) allowed to delay
acknowledgment of a sync.  In postgresql terms, we get roughly the
same effect with the computer's entire working memory with fsync
disabled...so that we are trusting, rightly or wrongly, that all
writes will eventually make it to disk.  In this case, the raid
controller cache is redundant and only marginally useful.

 the receiving component device also needs resources in order to service
 the request, such as when the transport is network tcp (and I have my
 suspicions about scsi too).

 the overhead of various cache layers is
 completely minute compared to a full fault to disk that requires a
 seek which is several orders of magnitude slower.

 That's absolutely true when by overhead you mean computation cycles
 and absolutely false when by overhead you mean memory resources, as I
 do.  Double buffering is a killer.

Double buffering is most certainly _not_ a killer (or at least, _the_
killer) in practical terms.  Most database systems that do any amount
of writing (that is, interesting databases) are bound by the ability
to randomly read and write to the storage medium, and only that.

This is why raid controllers come with a relatively small amount of
cache...there are diminishing returns from reorganizing writes.  This
is also why up and coming storage technologies (like flash) are so
interesting.  Disk drives have made only marginal improvements in
speed since the early 80's.

 The linux software raid algorithms are highly optimized, and run on a

 I can confidently tell you that that's balderdash both as a Linux author

I'm just saying here that there is little/no cpu overhead for using
software raid on modern hardware.

 believe that on ARM mere copying takes so much cpu time that one wants
 to avoid it at all costs, whereas on intel it's a forgettable trivium).

This is a database list.  The main area of interest is in dealing with
server class hardware.

merlin



Re: [PERFORM] Hardware vs Software Raid

2008-06-26 Thread david

On Thu, 26 Jun 2008, Peter T. Breuer wrote:


Also sprach Merlin Moncure:

The linux software raid algorithms are highly optimized, and run on a


I can confidently tell you that that's balderdash both as a Linux author
and as a software RAID linux author (check the attributions in the
kernel source, or look up something like Raiding the Noosphere on
google).


presumably (much faster) cpu than what the controller supports.
However, there is still some extra oomph you can get out of letting
the raid controller do what the software raid can't...namely delay
sync for a time.


There are several design problems left in software raid in the linux kernel.
One of them is the need for extra memory to dispatch requests with and
as (i.e. buffer heads and buffers, both). bhs should be OK since the
small cache per device won't be exceeded while the raid driver itself
serialises requests, which is essentially the case (it does not do any
buffering, queuing, whatever .. and tries hard to avoid doing so). The
need for extra buffers for the data is a problem. On different
platforms different aspects of that problem are important (would you
believe that on ARM mere copying takes so much cpu time that one wants
to avoid it at all costs, whereas on intel it's a forgettable trivium).

I also wouldn't absolutely swear that request ordering is maintained
under ordinary circumstances.


which flavor of linux raid are you talking about (the two main families I 
am aware of are the md and dm ones)?


David Lang



Re: [PERFORM] Hardware vs Software Raid

2008-06-26 Thread Greg Smith

On Thu, 26 Jun 2008, Peter T. Breuer wrote:


Double buffering is a killer.


No, it isn't; it's a completely trivial bit of overhead.  It only exists 
during the time when blocks are queued to write but haven't been written 
yet.  On any database system, in those cases I/O congestion at the disk 
level (probably things backed up behind seeks) is going to block writes 
way before the memory used or the bit of CPU time making the extra copy 
becomes a factor on anything but minimal platforms.


You seem to know quite a bit about the RAID implementation, but you are a) 
extrapolating from that knowledge into areas of database performance you 
need to spend some more time researching first and b) extrapolating based 
on results from trivial hardware, relative to what the average person on 
this list is running a database server on in 2008.  The weakest platform I 
deploy PostgreSQL on and consider relevant today has two cores and 2GB of 
RAM, for a single-user development system that only has to handle a small 
amount of data relative to what the real servers handle.  If you note the 
kind of hardware people ask about here that's pretty typical.


You have some theories here, Merlin and I have positions that come from 
running benchmarks, and watching theories suffer a brutal smack-down from 
the real world is one of those things that happens every day.  There is 
absolutely some overhead from paths through the Linux software RAID that 
consume resources.  But you can't even measure that in database-oriented 
comparisons against hardware setups that don't use those resources, which 
means that for practical purposes the overhead doesn't exist in this 
context.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Robert Treat
On Wednesday 25 June 2008 11:24:23 Greg Smith wrote:
 What I often do is get a hardware RAID controller, just to accelerate disk
 writes, but configure it in JBOD mode and use Linux or other software RAID
 on that platform.


JBOD + RAIDZ2 FTW ;-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Matthew Wakeling

On Wed, 25 Jun 2008, Merlin Moncure wrote:

Has anyone done some benchmarks between hardware RAID vs Linux MD software
RAID?


I have here:
http://merlinmoncure.blogspot.com/2007/08/following-are-results-of-our-testing-of.html

The upshot is I don't really see a difference in performance.


The main difference is that you can get hardware RAID with 
battery-backed-up cache, which means small writes will be much quicker 
than software RAID. Postgres does a lot of small writes under some use 
cases.


Without a BBU cache, it is sensible to put the transaction logs on a 
separate disc system from the main database, to make the transaction log 
writes fast (due to no seeking on those discs). However, with a BBU cache, 
that advantage is irrelevant, as the cache will absorb the writes.
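
For anyone who wants to try the separate-log-disc layout, here is a
minimal sketch of the usual relocate-and-symlink approach (the paths are
hypothetical, the server must be stopped while the directory is moved,
and the WAL directory is pg_xlog as mentioned elsewhere in the thread):

    # Relocate the WAL directory (pg_xlog) onto a dedicated disc and leave
    # a symlink behind.  Paths are assumptions; stop PostgreSQL first.
    import os
    import shutil

    data_dir = "/var/lib/postgresql/data"      # assumed PGDATA location
    xlog_src = os.path.join(data_dir, "pg_xlog")
    xlog_dst = "/mnt/logdisc/pg_xlog"          # assumed dedicated log disc

    shutil.move(xlog_src, xlog_dst)            # move the existing WAL files
    os.symlink(xlog_dst, xlog_src)             # pg_xlog now points at the new disc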


However, not all hardware RAID will have such a battery-backed-up cache, 
and those that do tend to have a hefty price tag.


Matthew

--
$ rm core
Segmentation Fault (core dumped)



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Peter T. Breuer
Also sprach Matthew Wakeling:
  Has anyone done some benchmarks between hardware RAID vs Linux MD software
  RAID?
  ...
  The upshot is I don't really see a difference in performance.
 
 The main difference is that you can get hardware RAID with 
 battery-backed-up cache, which means small writes will be much quicker 
 than software RAID. Postgres does a lot of small writes under some use 

It doesn't mean that, I'm afraid.  You can put the log/bitmap wherever
you want in software raid, including on a battery-backed local ram disk
if you feel so inclined.  So there is no intrinsic advantage to be
gained there at all.

 However, not all hardware RAID will have such a battery-backed-up cache, 
 and those that do tend to have a hefty price tag.

Whereas software raid and a firewire-attached log device do not.


Peter



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Greg Smith

On Wed, 25 Jun 2008, Peter T. Breuer wrote:

You can put the log/bitmap wherever you want in software raid, including 
on a battery-backed local ram disk if you feel so inclined.  So there is 
no intrinsic advantage to be gained there at all.


You are technically correct but this is irrelevant.  There are zero 
mainstream battery-backed local RAM disk setups appropriate for database 
use that don't cost substantially more than the upgrade cost to just 
getting a good hardware RAID controller with cache integrated and using 
regular disks.


What I often do is get a hardware RAID controller, just to accelerate disk 
writes, but configure it in JBOD mode and use Linux or other software RAID 
on that platform.


Advantages of using software RAID, in general and in some cases even with 
a hardware disk controller:


-Your CPU is inevitably faster than the one on the controller, so this can 
give better performance than having RAID calculations done on the 
controller itself.


-If the RAID controllers dies, you can move everything to another machine 
and know that the RAID setup will transfer.  Usually hardware RAID 
controllers use a formatting process such that you can't read the array 
without such a controller, so you're stuck with having a replacement 
controller around if you're paranoid.  As long as I've got any hardware 
that can read the disks, I can get a software RAID back again.


-There is a transparency to having the disks directly attached to the OS 
you lose with most hardware RAID.  Often with hardware RAID you lose the 
ability to do things like monitor drive status and temperature without 
using a special utility to read SMART and similar data.
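
As a concrete example of that transparency, a minimal sketch (assuming
smartmontools' smartctl is installed, the drives show up to the OS as
/dev/sda, /dev/sdb, and so on, and noting that attribute names vary by
drive) that polls health and temperature on directly attached disks:

    # Poll SMART health and temperature on directly attached disks.
    # Assumes smartctl (from smartmontools) is available; run as root.
    import glob
    import subprocess

    for dev in sorted(glob.glob("/dev/sd[a-z]")):
        out = subprocess.run(["smartctl", "-H", "-A", dev],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            if "overall-health" in line or "Temperature_Celsius" in line:
                print(dev, line.strip())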


Disadvantages:

-Maintenance tasks like disk replacement rebuilds will be using up your main 
CPU and its resources (like I/O bus bandwidth), work that could otherwise be 
offloaded onto the hardware RAID controller.


-It's harder to set up a redundant boot volume with software RAID that 
works right with a typical PC BIOS.  If you use hardware RAID it tends to 
insulate you from the BIOS quirks.


-If a disk fails, I've found a full hardware RAID setup is less likely to 
result in an OS crash than a software RAID is.  The same transparency and 
visibility into what the individual disks are doing can be a problem when 
a disk goes crazy and starts spewing junk the OS has to listen to. 
Hardware controllers tend to do a better job planning for that sort of 
failure, and some of that is lost even by putting them into JBOD mode.



However, not all hardware RAID will have such a battery-backed-up cache,
and those that do tend to have a hefty price tag.


Whereas software raid and a firewire-attached log device does not.


A firewire-attached log device is an extremely bad idea.  First off, 
you're at the mercy of the firewire bridge's write guarantees, which may 
or may not be sensible.  It's not hard to find reports of people whose 
disks were corrupted when the disk was accidentally disconnected, or of 
buggy drive controller firmware causing problems.  I stopped using 
Firewire years ago because it seems you need to do some serious QA to 
figure out which combinations are reliable and which aren't, and I don't 
use external disks enough to spend that kind of time with them.


Second, there are few if any Firewire setups where the host gets to read 
SMART error data from the disk.  This means that you can continue to use a 
flaky disk long past the point where a directly connected drive would have 
been kicked out of an array for being unreliable.  SMART doesn't detect 
100% of drive failures in advance, but you'd be silly to set up a database 
system where you don't get to take advantage of the ~50% it does catch 
before you lose any data.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Jonah H. Harris
On Wed, Jun 25, 2008 at 11:24 AM, Greg Smith [EMAIL PROTECTED] wrote:
 SMART doesn't detect 100% of drive failures in advance, but you'd be silly
 to setup a database system where you don't get to take advantage of the
 ~50% it does catch before you lose any data.

Can't argue with that one.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | [EMAIL PROTECTED]
Edison, NJ 08837 | http://www.enterprisedb.com/



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Joshua D. Drake


On Wed, 2008-06-25 at 11:30 -0400, Jonah H. Harris wrote:
 On Wed, Jun 25, 2008 at 11:24 AM, Greg Smith [EMAIL PROTECTED] wrote:
  SMART doesn't detect 100% of drive failures in advance, but you'd be silly
  to setup a database system where you don't get to take advantage of the
  ~50% it does catch before you lose any data.
 
 Can't argue with that one.

SMART has certainly saved our butts more than once.

Joshua D. Drake





Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Matthew Wakeling

On Wed, 25 Jun 2008, Greg Smith wrote:

A firewire-attached log device is an extremely bad idea.


Anyone have experience with IDE, SATA, or SAS-connected flash devices like 
the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a 
transfer rate of 100MB/s, and doesn't degrade much in performance when 
writing small random blocks. But what's it actually like, and is it 
reliable?


Matthew

--
Terrorists evolve but security is intelligently designed?  -- Jake von Slatt



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Joshua D. Drake


On Wed, 2008-06-25 at 09:53 -0600, Scott Marlowe wrote:
 On Wed, Jun 25, 2008 at 5:05 AM, Adrian Moisey
 [EMAIL PROTECTED] wrote:
  Hi

 I'm currently having a problem with a well known very large
 server manufacturer who shall remain unnamed and their semi-custom
 RAID controller firmware not getting along with the driver for ubuntu.

/me waves to Dell.

Joshua D. Drake





Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Merlin Moncure
On Wed, Jun 25, 2008 at 11:55 AM, Joshua D. Drake [EMAIL PROTECTED] wrote:
 On Wed, 2008-06-25 at 09:53 -0600, Scott Marlowe wrote:
 On Wed, Jun 25, 2008 at 5:05 AM, Adrian Moisey
 [EMAIL PROTECTED] wrote:

 I'm currently having a problem with a well known very large
 server manufacturer who shall remain unnamed and their semi-custom
 RAID controller firmware not getting along with the driver for ubuntu.

 /me waves to Dell.

not just ubuntu...the dell perc/x line software utilities also
explicitly check the hardware platform so they only run on dell
hardware.  However, the lsi logic command line utilities run just
fine.  As for ubuntu sas support, ubuntu supports the mpt fusion/sas
line directly through the kernel.

In fact, installing ubuntu server fixed an unrelated issue relating to
a qlogic fibre hba that was causing reboots under heavy load with a
pci-x fibre controller on centos.  So, based on this and other
experiences, i'm starting to be more partial to linux distributions
with faster moving kernels, mainly because i trust the kernel drivers
more than the vendor provided drivers.  The in place distribution
upgrade is also very nice.

merlin



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Merlin Moncure
On Wed, Jun 25, 2008 at 9:03 AM, Matthew Wakeling [EMAIL PROTECTED] wrote:
 On Wed, 25 Jun 2008, Merlin Moncure wrote:

 Has anyone done some benchmarks between hardware RAID vs Linux MD
 software
 RAID?

 I have here:

 http://merlinmoncure.blogspot.com/2007/08/following-are-results-of-our-testing-of.html

 The upshot is I don't really see a difference in performance.

 The main difference is that you can get hardware RAID with battery-backed-up
 cache, which means small writes will be much quicker than software RAID.
 Postgres does a lot of small writes under some use cases.

As discussed down thread, software raid still gets benefits of
write-back caching on the raid controller...but there are a couple of
things I'd like to add.  First, if your server is extremely busy, the
write back cache will eventually get overrun and performance will
degrade to more typical ('write through') performance.
Secondly, many hardware raid controllers have really nasty behavior in
this scenario.  Linux software raid degrades fairly gracefully in overload
conditions, but many popular raid controllers (dell perc/lsi logic sas
for example) become unpredictable and very bursty in sustained high
load conditions.

As greg mentioned, I trust the linux kernel software raid much more
than the black box hw controllers.  Also, contrary to vast popular
mythology, the 'overhead' of sw raid in most cases is zero except in
very particular conditions.

merlin



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Andrew Sullivan
On Wed, Jun 25, 2008 at 01:35:49PM -0400, Merlin Moncure wrote:
 experiences, i'm starting to be more partial to linux distributions
 with faster moving kernels, mainly because i trust the kernel drivers
 more than the vendor provided drivers.

While I have some experience that agrees with this, I'll point out
that I've had the opposite experience, too: upgrading the kernel made
a perfectly stable system both unstable and prone to data loss.  I
think this is a blade that cuts both ways, and the key thing to do is
to ensure you have good testing infrastructure in place to check that
things will work before you deploy to production.  (The other way to
say that, of course, is "Linux is only free if your time is worth
nothing."  Substitute your favourite free software for Linux, of
course.  ;-) )

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Andrew Sullivan
On Wed, Jun 25, 2008 at 01:07:25PM -0500, Kevin Grittner wrote:
  
 It doesn't have to be free software to cut that way.  I've actually
 found the free software to waste less of my time.  

No question.  But one of the unfortunate facts of the
no-charge-for-licenses world is that many people expect the systems to
be _really free_.  It appears that some people think, because they've
already paid $smallfortune for a license, it's therefore ok to pay
another amount in operation costs and experts to run the system.  Free
systems, for some reason, are expected also magically to run
themselves.  This tendency is getting better, but hasn't gone away.
It's partly because the budget for the administrators is often buried
in the overall large system budget, so nobody balks when there's a big
figure attached there.  When you present a budget for free software
that includes the cost of a few administrators, the accounting people
want to know why the free software costs so much.  

 If you depend on your systems, though, you should never deploy any
 change, no matter how innocuous it seems, without testing.

I agree completely.

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Greg Smith

On Wed, 25 Jun 2008, Peter T. Breuer wrote:


I refrained from saying in my reply that I would set up a firewire-based
link to ram in a spare old portable (which comes with a battery) if I
wanted to do this cheaply.


Maybe, but this is kind of a weird setup.  Not many people are going to 
run a production database that way and us wandering into the details too 
much risks confusing everybody else.


The log is sync. Therefore it doesn't matter what the guarantees are, or 
at least I assume you are worrying about acks coming back before the 
write has been sent, etc.  Only an actual net write will be acked by the 
firewire transport as far as I know.


That's exactly the issue; it's critical for database use that a disk not 
lie to you about writes being done if they're actually sitting in a cache 
somewhere.  (S)ATA disks do that, so you have to turn that off for them to 
be safe to use.  Since the firewire enclosure is a black box, it's 
difficult to know exactly what it's doing here, and history says that 
every type of (S)ATA disk does the wrong thing in the default case.  I 
expect that for any Firewire/USB device, if I write to the disk, then 
issue an fsync, it will return success once the data has been written to 
the disk's cache--which is crippling behavior from the database's 
perspective the day you get a crash.
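
One crude way to check whether a given device does this (my own sketch,
not an official tool; the file path and timings are arbitrary) is to
count how many fsync-backed rewrites of a single block it claims to
complete per second.  A lone 7200rpm disk can't honestly do much more
than its ~120 rotations per second, so rates in the thousands mean
something in the path is caching the write and acknowledging it early:

    # Crude lying-cache probe: time fsync-backed rewrites of one 8k block.
    # The path below is an assumption; point it at the device under test.
    import os
    import time

    path = "/mnt/testdisk/fsync_probe"
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    start, count = time.time(), 0
    while time.time() - start < 5:          # probe for five seconds
        os.pwrite(fd, b"x" * 8192, 0)       # rewrite the same block
        os.fsync(fd)                        # demand it reach stable storage
        count += 1
    os.close(fd)
    print(f"{count / 5:.0f} fsyncs/sec")    # thousands => a cache is lying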


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Greg Smith

On Wed, 25 Jun 2008, Merlin Moncure wrote:

So, based on this and other experiences, i'm starting to be more partial 
to linux distributions with faster moving kernels, mainly because i 
trust the kernel drivers more than the vendor provided drivers.


Depends on how fast.  I find it takes a minimum of 3-6 months before any 
new kernel release stabilizes (somewhere around 2.6.X-5 to -10), and some 
distributions push them out way before that.  Also, after major changes, 
it can be a year or more before a new kernel is not a regression either in 
reliability, performance, or worst-case behavior.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
