Re: [zfs-discuss] Finding corrupted files

Edward Ned Harvey Wed, 06 Oct 2010 19:06:54 -0700

> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Stephan Budach
> 
> Ian,
> 
> yes, although these vdevs are FC raids themselves, so the risk is… uhm…
> calculated.


Whenever possible, you should JBOD the storage and let ZFS manage the raid, 
for several reasons (see below).  Also, as counter-intuitive as this sounds, 
you should disable the hardware write-back cache (even with BBU), because it 
hurts performance in any of these situations:  (a) you have access to an SSD 
or other nonvolatile dedicated log device, (b) you know all of your writes to 
be async mode and not sync mode, or (c) you've opted to disable the ZIL.

* Hardware raid blindly assumes the redundant data written to disk is written 
correctly.  So later, if you experience a checksum error (such as you have), 
it's impossible for ZFS to correct it.  The hardware raid doesn't know a 
checksum error has occurred, and there is no way for the OS to read the "other 
side of the mirror" to attempt repairing the bad block from redundant data.
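
To make that concrete, here is a toy sketch in Python.  It is not how ZFS is 
implemented; the class name CheckedVdev is made up for illustration, and 
zlib.crc32 only stands in for the real ZFS checksums.  The point is just that 
repair needs a second copy the filesystem can address by itself.

```python
import zlib


class CheckedVdev:
    """Toy model of a vdev whose block checksums are kept apart from the data."""

    def __init__(self, copies):
        # One dict per copy that the filesystem can address individually.
        # copies=2 models a ZFS-managed mirror on JBOD disks; copies=1 models
        # a hardware raid LUN, where the controller hides its second copy.
        self.sides = [{} for _ in range(copies)]
        self.checksums = {}

    def write(self, blkno, data):
        for side in self.sides:
            side[blkno] = data
        self.checksums[blkno] = zlib.crc32(data)

    def read(self, blkno):
        expected = self.checksums[blkno]
        for side in self.sides:
            data = side[blkno]
            if zlib.crc32(data) == expected:
                # Found a good copy: rewrite every copy that differs from it.
                for other in self.sides:
                    if other[blkno] != data:
                        other[blkno] = data
                return data
        raise IOError("checksum error on block %d, no good copy" % blkno)
```

With copies=2, corrupting one side still lets read() return the good data and 
rewrite the bad side.  With copies=1, the same corruption can only be 
reported, not repaired.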

* ZFS has knowledge of both the filesystem and the block-level devices, while 
hardware raid has knowledge of only the block-level devices.  This means ZFS 
is able to optimize performance in ways that hardware cannot possibly do.  For 
example, whenever there are many small writes taking place concurrently, ZFS is 
able to remap the physical disk blocks of those writes, to aggregate them into 
a single sequential write.  Depending on your metric, this yields 1-2 orders of 
magnitude higher IOPS.
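
A back-of-envelope sketch of why aggregation matters, in Python.  The model 
is deliberately crude (one seek per random write, one seek total for the 
aggregated write), and the function names and every number you feed it are 
assumptions for illustration, not measurements.

```python
def random_write_seconds(n_writes, write_bytes, seek_s, bandwidth_bps):
    """Each small write pays its own seek before transferring its data."""
    return n_writes * (seek_s + write_bytes / bandwidth_bps)


def aggregated_write_seconds(n_writes, write_bytes, seek_s, bandwidth_bps):
    """All small writes are laid out contiguously, so only one seek is paid."""
    return seek_s + n_writes * write_bytes / bandwidth_bps


def speedup(n_writes, write_bytes, seek_s, bandwidth_bps):
    """Ratio of the random-write time to the aggregated-write time."""
    args = (n_writes, write_bytes, seek_s, bandwidth_bps)
    return random_write_seconds(*args) / aggregated_write_seconds(*args)
```

For example, speedup(1000, 4096, 0.008, 100e6) assumes 1000 writes of 4 KiB, 
an 8 ms seek, and 100 MB/s of sequential bandwidth.  With those made-up 
inputs the ratio comes out at roughly two orders of magnitude.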

* Because ZFS automatically buffers writes in ram in order to aggregate them 
as previously mentioned, the hardware WB cache is not beneficial.  There is 
one exception.  If you are doing sync writes to spindle disks, and you don't 
have a dedicated log device, then the WB cache will benefit you, approximately 
half as much as you would benefit by adding a dedicated log device.  The sync 
write sort-of bypasses the ram buffer, and that's the reason why the WB is 
able to do some good in the case of sync writes.

Ironically, if you have WB enabled and you have an SSD log device, then the WB 
hurts you.  You get the best performance with SSD log and no WB.  The WB 
"lies" to the OS, saying some tiny chunk of data has been written, so the OS 
will happily write another tiny chunk, and another, and another.  The WB ends 
up buffering a lot of tiny random writes, and in aggregate it will only go as 
fast as the random writes.  It undermines ZFS's ability to aggregate small 
writes into sequential writes.
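
The WB rule of thumb above boils down to a few lines of Python.  This is only 
a restatement of the advice in this mail, not anything ZFS or a raid 
controller exposes; the function and parameter names are made up.

```python
def wb_cache_helps(has_dedicated_log, all_writes_async, zil_disabled):
    """Return True only in the one case where hardware WB cache is a win.

    That case is sync writes to spindle disks with the ZIL enabled and no
    dedicated log device.  In every other case the advice is to disable WB.
    """
    if has_dedicated_log or all_writes_async or zil_disabled:
        return False
    return True
```

So wb_cache_helps(False, False, False) is the lone exception, and any single 
True argument means you should turn the WB cache off.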

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
