Hello can,
>> >> Journaling vs ZFS - well, I've been managing some
>> rather large
>> environments and having fsck (even with journaling)
>> from time to time

cyg> 'From time to time' suggests at least several occurrences: just
cyg> how many were there? What led you to think that doing an fsck
cyg> was necessary? What did journaling fail to handle? What
cyg> journaling file system were you using?

"From time to time" means several times a year across the entire
environment. UFS with journaling was used. Why was fsck necessary?
Well, during normal operation the file system started to complain about
some inodes, remounted itself read-only (or locked up) and printed in
the logs: please run fsck. Then you had about 20-30 hours to wait, and
sometimes after fsck finished it told you that you had to re-run
fsck... We've migrated to ZFS and all these problems are gone, except
for some checksum errors.

>> The same happens on ext2/3 - from time to time you've
>> got to run fsck.

cyg> Of course using ext2 sometimes requires fscking, but please be
cyg> specific about when ext3 does.

You reboot and it asks you for an fsck - rarely, but it still happens.
Also, search the web and you'll find other users in the same situation
(forced to run fsck on ext3).

>> ZFS end-to-end checksumming - well, you definitely
>> underestimate it.

cyg> Au contraire: I estimate its worth quite accurately from the
cyg> undetected error rates reported in the CERN "Data Integrity"
cyg> paper published last April (first hit if you Google 'cern "data
cyg> integrity"').

>> While I have yet to see any checksum error reported
>> by ZFS on
>> Symmetrix arrays or FC/SAS arrays, with some other
>> "cheap" HW I've seen
>> many of them

cyg> While one can never properly diagnose anecdotal issues off the
cyg> cuff in a Web forum, given CERN's experience you should probably
cyg> check your configuration very thoroughly for things like marginal
cyg> connections: unless you're dealing with a far larger data set
cyg> than CERN was, you shouldn't have seen 'many' checksum errors.

Maybe I shouldn't get checksum errors, but I do. Sometimes it's a
HW/firmware problem - check EMC Clariion with SATA disks: there was a
bug which caused data corruption (the array just said that some sectors
were lost, but the RAID continued to work). Then there was (and
probably still is) a problem on IBM's FAStT arrays with SATA disks
which causes data corruption. Then Sun's 3511 array with SATA disks in
RAID-5 also had a bug which caused data corruption... Just go to the
vendors' bug databases and search for data corruption and you will be
surprised (I was). Even with "simple" PCI cards you will find bugs
causing silent data corruption (I particularly like the one with IBM's
ServeRAID card, which occurred only if the host had 8+ GB of memory).

Then we've also got quite a lot of storage on x4500, and so far not a
single checksum error has been detected by ZFS - so you're right, it's
not only about the disks but about the entire solution (disks,
controllers, firmware on all levels, connections, switches, HBAs,
drivers, ...). The point is that ZFS gives you really good protection
in all these cases, and it has already paid off for me.
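To make the "entire solution" point concrete, here is a minimal Python
sketch of the end-to-end idea (illustrative only - the FlakyStoragePath
class, the SHA-256 checksum and all the names are my own assumptions,
not ZFS internals): because the checksum is kept apart from the block
it protects, corruption introduced by any layer of the path is caught
when the block is read back.

import hashlib

# Toy model of a storage path that may silently mangle data anywhere
# between the application and the platter (HBA, switch, array firmware,
# disk).  Purely illustrative - nothing here is ZFS code.
class FlakyStoragePath:
    def __init__(self):
        self.blocks = {}

    def write(self, addr, data, corrupt=False, lost=False):
        if lost:                 # "lost write": device reports success, stores nothing
            return
        if corrupt:              # a bit flip somewhere along the path
            data = bytes([data[0] ^ 0x01]) + data[1:]
        self.blocks[addr] = data

    def read(self, addr):
        return self.blocks.get(addr, b"\x00" * 8)

def checksum(data):
    # ZFS uses fletcher2/4 or SHA-256; any strong hash shows the principle.
    return hashlib.sha256(data).digest()

# End-to-end style: the checksum is stored *apart* from the block it
# protects (in ZFS it lives in the parent block pointer), so whatever
# the path does to the data, the reader can still tell good from bad.
parent_pointer = {}              # addr -> checksum kept with the parent, not the data
path = FlakyStoragePath()

data = b"userdata"
parent_pointer[0] = checksum(data)
path.write(0, data, corrupt=True)           # corruption introduced in the path

if checksum(path.read(0)) != parent_pointer[0]:
    print("checksum error detected on read")    # reported instead of returned silently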
cyg> Since fsck does not detect errors in user data (unless you're
cyg> talking about *detectable* errors due to 'bit rot' which a full
cyg> surface scan could discover, the incidence of which is just not
cyg> very high in disks run within spec), and since user data
cyg> comprises the vast majority of disk data in most installations,
cyg> something sounds a bit strange here. Are you saying that you ran
cyg> fsck after noticing some otherwise-undetected error in your user
cyg> data? If so, did fsck find anything additional wrong when you ran
cyg> it?

I know it doesn't check user data - we only ran fsck when we had to,
because the file system had remounted itself read-only or locked up
with advice to run fsck. Sometimes some inodes were fixed, sometimes
fsck didn't detect anything. Nevertheless, to get the file system
working again we had to fsck it.

Also, very rarely, we actually found some files with corrupted content.
As we develop all the applications ourselves, we assumed the problem
was a bug in our code - but since migrating to ZFS there hasn't been a
single occurrence of bad files, so it probably wasn't our code after
all. Some problems were related to firmware bugs on IBM's and other
arrays, some maybe due to other reasons.

cyg> In any event, finding and fixing the hardware that is likely to
cyg> be producing errors at the levels you suggest should be a high
cyg> priority even if ZFS helps you discover the need for this in the
cyg> first place (other kinds of checks could also help discover such
cyg> problems, but ZFS does make it easy and provides an additional
cyg> level of protection until their underlying causes have been
cyg> corrected).

I agree. The point is that before ZFS we weren't even sure where the
problem was, and the main suspects were the applications. Once we
started moving to ZFS, the main suspect was ZFS reporting errors
incorrectly. Then it turned out there was a bug in IBM's firmware...
then in EMC's (fixed), and we still get some errors - unfortunately,
even after changing FC cables, GBICs, etc. the problem still shows up
from time to time for whatever reason. Fortunately, thanks to ZFS, the
problem is not propagated to the applications.
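Roughly how that "not propagated" behaviour works on a redundant pool -
again a sketch under my own assumptions (toy Python, not ZFS code):
when one copy of a block fails its checksum, the good copy is returned
to the application and used to rewrite the bad one.

import hashlib

def checksum(data):
    return hashlib.sha256(data).digest()

# Two mirrored copies of the same logical block; one has been silently
# corrupted somewhere below the file system (disk, firmware, cable, ...).
expected = checksum(b"application data")
copies = [b"applicatiom data",        # corrupted copy
          b"application data"]        # intact copy

def read_self_healing(copies, expected):
    for data in copies:
        if checksum(data) == expected:
            # Return the verified copy and repair the bad ones from it,
            # so the application never sees the corruption.
            for i, other in enumerate(copies):
                if checksum(other) != expected:
                    copies[i] = data
            return data
    raise IOError("no copy matches the checksum - unrecoverable error")

print(read_self_healing(copies, expected))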
>> Then check this list for other reports on checksum
>> errors from people
>> running on home x86 equipment.

cyg> Such anecdotal information (especially from enthusiasts) is of
cyg> limited value, I'm afraid, especially when compared with a more
cyg> quantitative study like CERN's. Then again, many home systems
cyg> may be put together less carefully than CERN's (or for that
cyg> matter my own) are.

You probably also have a more homogeneous environment at CERN. As I
wrote, we do have a lot of storage on x4500 with not a single checksum
error so far. So it's about the entire solution, and your experience
may vary from environment to environment. The point, again, is that
ZFS makes your life better if you are one of the unlucky ones.

>> Then you're complaining that ZFS isn't novel...

cyg> When you paraphrase people and don't choose to quote them
cyg> directly, it's a good idea at least to point to the material that
cyg> you're purportedly representing - keeps you honest, even if you
cyg> *think* you're being honest already.
cyg>
cyg> I certainly don't ever recall saying anything like that, so I'll
cyg> ask you for that reference. I *have* suggested that *some*
cyg> portions of ZFS are not as novel as Sun (perhaps I should have
cyg> been more specific and said "Jonathan", since it's his recent
cyg> spin in such areas that I find particularly offensive) seems to
cyg> be suggesting that they are.

http://weblog.infoworld.com/yager/archives/2007/10/suns_zfs_is_clo.html

"So while ZFS really isn't all that 'close to perfect', nor as entirely
novel as Sun might have one believe [...]"

I'm sorry, you're right. You put it differently.

>> well, comparing to other
>> products, ease of management and richness of features, all
>> in one, is a
>> good enough reason for some environments.

cyg> Then why can't those over-hyping ZFS limit themselves to that
cyg> kind of (entirely reasonable) assessment?

Well, because if you want to win, as we all know, it's not only about
technology. You can provide the best technology on the market, but
without proper marketing and hype you will probably lose...

>> While WAFL
>> offers
>> checksumming, it's done differently, which does offer
>> less protection
>> than what ZFS does.

cyg> I'm afraid that you just don't know what you're talking about,
cyg> Robert - and IIRC I've corrected you on this elsewhere, so you
cyg> have no excuse for repeating your misconceptions now.

I haven't spotted your correction.

cyg> WAFL provides not one but two checksum mechanisms which separate
cyg> the checksums and their updates from the data that they protect
cyg> and hence should offer every bit as much protection as ZFS's
cyg> checksums do.

Can you point to any document? The link below (to a NetApp page) says
that the checksum and the data block are in the same block, so it's
nothing like ZFS - unless they changed something and haven't updated
that page. Again, can you point to some documentation?

http://www.netapp.com/go/techontap/matl/sample/0206tot_resiliency.html

"Brace yourself, because we saved the most insidious disk problem for
last. With extreme rarity, a disk malfunction occurs in which a write
operation fails but the disk is unable to detect the write failure and
signals a successful write status. This event is called a "lost write,"
and it causes silent data corruption if no detection and correction
mechanism is in place. You might think that checksums and RAID will
protect you against this type of failure, but that isn't the case.
Checksums are written in the block metadata, coresident with the block,
during the same I/O. In this failure mode, neither the block nor the
checksum gets written, so what you see on disk is the previous data
that was written to that block location with a valid checksum. Only
NetApp, with its innovative WAFL (Write Anywhere File Layout) storage
virtualization technology closely integrated with RAID, identifies this
failure. WAFL never rewrites a block to the same location. If a block
is changed, it is written to a new location, and the old block is
freed. The identity of a block changes each time it is written. WAFL
stores the identity of each block in the block's metadata and cross
checks the identity on each read to ensure that the block being read
belongs to the file and has the correct offset. If not, the data is
recreated using RAID. The check doesn't have any performance impact."

When it comes to NetApp - they are really great. However, thanks to ZFS
I can basically get the same and more than what NetApp offers, while
spending much less money at the same time.

--
Best regards,
Robert Milkowski
mailto:[EMAIL PROTECTED]
http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss