On Jul 15, 2010, at 6:22 PM, Scott Marlowe wrote:

> On Thu, Jul 15, 2010 at 10:30 AM, Scott Carey <sc...@richrelevance.com> wrote:
>> 
>> On Jul 14, 2010, at 7:50 PM, Ben Chobot wrote:
>> 
>>> On Jul 14, 2010, at 6:57 PM, Scott Carey wrote:
>>> 
>>>> But none of this explains why a 4-disk raid 10 is slower than a 1 disk 
>>>> system.  If there is no write-back caching on the RAID, it should still be 
>>>> similar to the one disk setup.
>>> 
>>> Many raid controllers are smart enough to always turn off write caching on 
>>> the drives, and also disable the feature on their own buffer without a BBU. 
>>> Add a BBU, and the cache on the controller starts getting used, but *not* 
>>> the cache on the drives.
>> 
>> This does not make sense.
> 
> Basically, you can have cheap, fast, and dangerous (a drive with write
> cache enabled, which responds positively to fsync even when it hasn't
> actually fsynced the data).  You can have cheap, slow, and safe with a
> drive that has a cache, but since it'll be fsyncing all the time the
> write cache won't actually get used.  Or you can have fast, expensive,
> and safe, which is what a BBU RAID card gets by saying the data is
> fsynced when it's actually just in cache, but a safe cache that won't
> get lost on power down.
> 
> I don't find it that complicated.

It doesn't make sense that a raid 10 would be slower than a 1-disk setup unless 
the former respects fsync() and the latter does not.  Individual drive write 
caches do not explain the situation.  That is what does not make sense.
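
A quick way to tell which camp a given stack is in is to time small synchronous 
commits.  A 7200 RPM drive that honestly flushes its cache can't do much more 
than ~120 of them per second, so numbers in the thousands mean something in the 
path is acknowledging writes it hasn't made durable.  Something like the sketch 
below, same spirit as postgres' test_fsync utility; the file name, block size, 
and iteration count are arbitrary:

/* fsync_rate.c -- rough measure of honest-fsync commits per second.
 * Rates far above the drive's rotational limit (~120/s at 7200 RPM)
 * suggest a write cache somewhere is acking flushes it hasn't done.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    const int iterations = 1000;
    char buf[8192];
    memset(buf, 'x', sizeof(buf));

    int fd = open("fsync_test.dat", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    struct timeval start, end;
    gettimeofday(&start, NULL);

    for (int i = 0; i < iterations; i++)
    {
        /* overwrite the same block, then force it to stable storage */
        if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t) sizeof(buf))
        { perror("pwrite"); return 1; }
        if (fsync(fd) != 0)
        { perror("fsync"); return 1; }
    }

    gettimeofday(&end, NULL);
    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_usec - start.tv_usec) / 1e6;

    printf("%d fsyncs in %.2f s = %.0f commits/sec\n",
           iterations, secs, iterations / secs);

    close(fd);
    unlink("fsync_test.dat");
    return 0;
}

If a single 7200 RPM SATA drive reports a few thousand per second here, 
something above the platter is lying about durability, which is exactly the 
cheap-fast-dangerous case above.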

When in _write-through_ mode, there is no reason to turn off the drive's write 
cache unless the drive does not properly respect its cache-flush command, or 
the RAID card is too dumb to issue cache-flush commands.  The RAID card simply 
has to issue its writes, then issue the flush commands, then return to the OS 
when those complete.  With drive write caches on, this is perfectly safe.  The 
only way it is unsafe is if the drive lies and returns from a cache flush 
before the data from its cache is actually flushed.

Some SSDs on the market currently lie.  A handful of the thousands of hard 
drive models sold in the server, desktop, and laptop space over the last decade 
did not respect the cache-flush command properly, and to my knowledge none of 
the offenders were in the SAS/SCSI or 'enterprise SATA' space.  Information on 
this topic has come across this list several times.

The explanation for why one setup respects fsync() and another does not almost 
always lies in the FS + OS combination.  HFS+ on OS X does not respect fsync().  
ext3, until recently, only did fdatasync() when you told it to fsync() (which 
is fine for postgres' transaction log anyway).
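
On OS X the workaround is to ask for the full flush explicitly: 
fcntl(F_FULLFSYNC) pushes the drive's cache where plain fsync() does not, and 
that is what wal_sync_method = fsync_writethrough uses.  A rough, 
Darwin-specific sketch:

/* full_fsync.c -- on OS X, fsync() does not flush the drive's write
 * cache, but fcntl(F_FULLFSYNC) does.  Sketch only; F_FULLFSYNC is
 * Darwin-specific, so fall back to fsync() elsewhere.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int flush_all_the_way(int fd)
{
#ifdef F_FULLFSYNC
    /* Ask the OS *and* the drive to make the data durable. */
    return fcntl(fd, F_FULLFSYNC);
#else
    /* Elsewhere, trust fsync() plus a storage stack that honors
     * cache-flush / barrier requests. */
    return fsync(fd);
#endif
}

int main(void)
{
    int fd = open("wal_like_file.dat", O_WRONLY | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, "commit\n", 7) != 7) { perror("write"); return 1; }
    if (flush_all_the_way(fd) != 0)    { perror("flush"); return 1; }

    close(fd);
    return 0;
}

Note that F_FULLFSYNC can be dramatically slower than a bare fsync() there; 
that's the price of the data actually reaching the platter.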

A raid card, especially with SAS/SCSI drives, has no reason to turn off the 
drives' write caches unless it _wants_ to return to the OS before the data is 
on the drive.  That condition occurs in write-back cache mode, when the RAID 
card's cache is safe thanks to a battery or some other mechanism.  In that case 
it should turn off the drives' write caches so that it can be sure data is on 
disk if power fails, without having to issue the cache-flush command on every 
write.  That way it can drop data from its RAM as soon as a drive returns from 
the write.

In write-through mode it should turn the drive caches back on and rely on the 
cache-flush command, passing through direct writes, cache-flush demands, and 
barrier requests.  It could optionally turn the caches off, but that won't 
improve data safety unless a drive cannot faithfully flush its cache.
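
To put that policy in one place, here is a purely conceptual sketch of the 
decision a card should make.  This is not any real controller's firmware, and 
every name in it is made up:

#include <stdio.h>

enum cache_mode { WRITE_BACK, WRITE_THROUGH };

struct raid_card {
    enum cache_mode mode;
    int             cache_is_safe;  /* BBU (or similar) protects the card's cache */
};

/* Stand-in for whatever command the card uses to toggle the drives'
 * volatile write caches. */
static void set_drive_write_cache(int on)
{
    printf("drive write cache: %s\n", on ? "on" : "off");
}

static void apply_drive_cache_policy(const struct raid_card *card)
{
    if (card->mode == WRITE_BACK && card->cache_is_safe)
    {
        /* The card acks the OS from its own protected cache and wants
         * to drop data from RAM as soon as a drive returns from a
         * write, so the drives' volatile caches must be off. */
        set_drive_write_cache(0);
    }
    else
    {
        /* Write-through: the card only returns to the OS after the
         * writes and a cache flush complete, so the drive caches can
         * stay on; turning them off adds no safety unless a drive
         * can't flush faithfully. */
        set_drive_write_cache(1);
    }
}

int main(void)
{
    struct raid_card card = { WRITE_BACK, 1 };
    apply_drive_cache_policy(&card);
    return 0;
}

The point is simply that the only mode in which disabling the drive caches buys 
anything is the write-back-with-protected-cache case.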



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
