Hello can,

>> 
>> Journaling vs ZFS - well, I've been managing some
>> rather large
>> environment and having fsck (even with journaling)
>> from time to time

cyg> 'From time to time' suggests at least several occurrences:  just
cyg> how many were there?  What led you to think that doing an fsck
cyg> was necessary?  What did journaling fail to handle?  What
cyg> journaling file system were you using?

From time to time means several times a year across the entire
environment. UFS with journaling was used. Why was fsck necessary?
Well, during normal operation the file system started to complain
about some inodes, remounted itself read-only (or locked up) and
printed in the logs: please run fsck. Then you had about 20-30 hours
to wait, and sometimes after fsck finished it told you that you had to
re-run it...

We've migrated to ZFS and all these problems are gone, except for some
checksum errors.



>> The same happens on ext2/3 - from time to time you've
>> got to run fsck.

cyg> Of course using ext2 sometimes requires fscking, but please be
cyg> specific about when ext3 does.

You reboot and it asks you to run fsck - rarely, but it still happens.
Also, search Google and you'll find other users in a similar situation
(forced to run fsck on ext3).



>> ZFS end-to-end checksumming - well, you definitely
>> underestimate it.

cyg> Au contraire:  I estimate its worth quite accurately from the
cyg> undetected error rates reported in the CERN "Data Integrity"
cyg> paper published last April (first hit if you Google 'cern "data
cyg> integrity"').


>> While I have yet to see any checksum error reported
>> by ZFS on
>> Symmetrix arrays or FC/SAS arrays with some other
>> "cheap" HW I've seen
>> many of them

cyg> While one can never properly diagnose anecdotal issues off the
cyg> cuff in a Web forum, given CERN's experience you should probably
cyg> check your configuration very thoroughly for things like marginal
cyg> connections:  unless you're dealing with a far larger data set
cyg> than CERN was, you shouldn't have seen 'many' checksum errors.


Maybe I shouldn't get checksum errors, but I do.
Sometimes it's a HW/firmware problem - check with EMC CLARiiON on SATA
disks - there was a bug which caused data corruption (the array just
said that some sectors were lost but the RAID still continued to
work). Then there was (and probably still is) a problem on IBM's FastT
arrays with SATA disks which causes data corruption. Then Sun's 3511
array with SATA disks in RAID-5 also had a bug which caused data
corruption... just go to the vendors' bug databases and look for data
corruption and you will be surprised (I was). Even with "simple" PCI
cards you will find bugs causing silent data corruption (I like the
one with IBM's ServeRAID card which occurred only if you had 8+ GB of
memory in a host).


We've also got quite a lot of storage on x4500, and so far not a
single checksum error has been detected by ZFS - so you're right, it's
not only about the disks but rather about the entire solution (disks,
controllers, firmware at all levels, connections, switches, HBAs,
drivers, ...).

The point is that ZFS gives you really good protection in all these
cases, and it has already paid off for me.
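
For what it's worth, here is a minimal, purely illustrative sketch in
Python (not ZFS code; sha256 and a two-way mirror are just assumptions
for the example) of why keeping the checksum in the parent block
pointer lets a read both catch a bad copy and repair it from a good
one:

import hashlib

def checksum(data: bytes) -> str:
    # real ZFS offers several algorithms (fletcher2/4, sha256); sha256 for simplicity
    return hashlib.sha256(data).hexdigest()

class BlockPointer:
    """Toy block pointer: addresses of redundant copies plus the expected checksum."""
    def __init__(self, copies, cksum):
        self.copies = copies   # disk addresses of the copies (e.g. two mirror sides)
        self.cksum = cksum     # checksum kept in the *parent*, not next to the data

def write_block(disk, addrs, data):
    for addr in addrs:
        disk[addr] = data
    return BlockPointer(addrs, checksum(data))

def read_block(disk, bp):
    """Verify against the parent-held checksum; self-heal bad copies if a good one exists."""
    for addr in bp.copies:
        data = disk.get(addr, b"")
        if checksum(data) == bp.cksum:
            for other in bp.copies:          # repair any copy that fails verification
                if checksum(disk.get(other, b"")) != bp.cksum:
                    disk[other] = data
            return data
    raise IOError("checksum error on every copy (unrecoverable)")

# demo: silent corruption of one mirror side is detected and repaired on read
disk = {}
bp = write_block(disk, ["diskA:0", "diskB:0"], b"application data")
disk["diskA:0"] = b"application dat\x00"        # bit rot / firmware bug / stale block
assert read_block(disk, bp) == b"application data"
assert disk["diskA:0"] == b"application data"   # healed from the good copy

Nothing exotic - the point is simply that the verifier (the parent
pointer) is written separately from the data it describes.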


cyg> Since fsck does not detect errors in user data (unless you're
cyg> talking about *detectable* errors due to 'bit rot' which a full
cyg> surface scan could discover, the incidence of which is just not
cyg> very high in disks run within spec), and since user data
cyg> comprises the vast majority of disk data in most installations,
cyg> something sounds a bit strange here.  Are you saying that you ran
cyg> fsck after noticing some otherwise-undetected error in your user
cyg> data?  If so, did fsck find anything additional wrong when you ran it?

I know it doesn't check user data - we only ran fsck when we had to,
due to the fs being remounted RO or locked with an advice to run fsck.
Sometimes some inodes were fixed, sometimes fsck didn't detect
anything. Nevertheless, to get the fs working again we had to fsck.
Also, very rarely, we actually found some files with corrupted
content - as we developed all the applications ourselves, we thought
the problem was a bug in our code. Once we migrated to ZFS there
wasn't a single occurrence of bad files, so it probably wasn't our
code after all. Some problems were related to firmware bugs on IBM's
and other arrays, some maybe due to other reasons.
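
To make that distinction concrete, a toy sketch in Python (completely
hypothetical, nothing like real fsck internals): a metadata-only
consistency check stays perfectly happy while the contents of a file
rot underneath it.

# fsck-style checks look at metadata (sizes, link counts, allocation),
# never at the bytes inside regular files
files = {
    "/etc/app.conf": {"size": 11, "links": 1, "data": b"mode=strict"},
}

def toy_fsck(fs):
    """Metadata-only consistency check, loosely in the spirit of fsck."""
    problems = []
    for path, inode in fs.items():
        if inode["size"] != len(inode["data"]):
            problems.append(path + ": size mismatch")
        if inode["links"] < 1:
            problems.append(path + ": bad link count")
    return problems

# silent corruption of user data: same length, so metadata is still consistent
files["/etc/app.conf"]["data"] = b"mode=strxct"
print(toy_fsck(files))   # -> []  (fsck is happy; the application is not)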


cyg> In any event, finding and fixing the hardware that is likely to
cyg> be producing errors at the levels you suggest should be a high
cyg> priority even if ZFS helps you discover the need for this in the
cyg> first place (other kinds of checks could also help discover such
cyg> problems, but ZFS does make it easy and provides an additional
cyg> level of protection until their underlying causes have been corrected).

I agree. The point is that before ZFS we weren't even sure where the
problem was, and the main suspects were the applications. Once we
started moving to ZFS, the main suspect was ZFS reporting badly. Then
it turned out there was a bug in IBM's firmware... then in EMC's
(fixed), and we still got some errors - unfortunately, even after
changing FC cables, GBICs, etc., the problem is still there from time
to time, for whatever reason. Fortunately, thanks to ZFS, the problem
does not propagate to the application.


>> Then check this list for other reports on checksum
>> errors from people
>> running on home x86 equipment.

cyg> Such anecdotal information (especially from enthusiasts) is of
cyg> limited value, I'm afraid, especially when compared with a more
cyg> quantitative study like CERN's.  Then again, many home systems
cyg> may be put together less carefully than CERN's (or for that
cyg> matter my own) are.

Then you probably have a more homogeneous environment at CERN. As I
wrote, we have a lot of storage on x4500 with not a single checksum
error so far. So it's about the entire solution, and your experience
may vary from environment to environment. The point, again, is that
ZFS makes your life better if you are the unlucky one.


>> Then you're complaining that ZFS isn't novel...

cyg> When you paraphrase people and don't choose to quote them
cyg> directly, it's a good idea at least to point to the material that
cyg> you're purportedly representing - keeps you honest, even if you
cyg> *think* you're being honest already.

cyg> I certainly don't ever recall saying anything like that, so I'll
cyg> ask you for that reference.  I *have* suggested that *some*
cyg> portions of ZFS are not as novel as Sun (perhaps I should have
cyg> been more specific and said "Jonathan", since it's his recent
cyg> spin in such areas that I find particularly offensive) seems to
cyg> be suggesting that they are.

http://weblog.infoworld.com/yager/archives/2007/10/suns_zfs_is_clo.html
"So while ZFS really isn't all that 'close to perfect', nor as
entirely novel as Sun might have one believe [...]"

I'm sorry, you're right. You worded it differently.



>> well, comparing to other products, ease of management and rich
>> features, all in one, is a good enough reason for some
>> environments.

cyg> Then why can't those over-hyping ZFS limit themselves to that
cyg> kind of (entirely reasonable) assessment?

Well, because if you want to win, as we all know, it's not only about
the technology. You can provide the best technology on the market, but
without proper marketing and hype you will probably lose...



>>   While WAFL
>> offers
>> checksumming its done differently which does offer
>> less protection
>> than what ZFS does.

cyg> I'm afraid that you just don't know what you're talking about,
cyg> Robert - and IIRC I've corrected you on this elsewhere, so you
cyg> have no excuse for repeating your misconceptions now.

I haven't spotted your correction.

cyg> WAFL provides not one but two checksum mechanisms which separate
cyg> the checksums and their updates from the data that they protect
cyg> and hence should offer every bit as much protection as ZFS's checksums do.

Can you point to any documentation?

Well, the link below (to a NetApp page) says that the checksum and the
data block are in the same block. So it's nothing like ZFS, unless
they changed something and haven't updated that page. Again - can you
point to some documentation?

http://www.netapp.com/go/techontap/matl/sample/0206tot_resiliency.html
"Brace yourself, because we saved the most insidious disk problem for last.
 With extreme rarity, a disk malfunction occurs in which a write operation
 fails but the disk is unable to detect the write failure and signals a 
successful write status.
 This event is called a "lost write," and it causes silent data corruption if 
no detection and
 correction mechanism is in place. You might think that checksums and RAID will 
protect you against
 this type of failure, but that isn't the case. Checksums are written in the 
block metadata—coresident
 with the block—during the same I/O. In this failure mode, neither the block 
nor the checksum gets written,
 so what you see on disk is the previous data that was written to that block 
location with a valid checksum.

 Only NetApp, with its innovative WAFL (Write Anywhere File Layout)
 storage virtualization technology closely integrated with RAID, identifies 
this failure.
 WAFL never rewrites a block to the same location. If a block is changed, it is 
written to a new location,
 and the old block is freed. The identity of a block changes each time it is 
written. WAFL stores the identity
 of each block in the block's metadata and cross checks the identity on each 
read to ensure that the block being
 read belongs to the file and has the correct offset. If not, the data is 
recreated using RAID.
 The check doesn't have any performance impact.
"



When it comes to NetApp - they are really great. However, thanks to
ZFS I can basically get the same and more than what NetApp offers,
while spending much less money at the same time.


-- 
Best regards,
 Robert Milkowski                           mailto:[EMAIL PROTECTED]
                                       http://milek.blogspot.com

