Hey, Bob,

Though I have already got the answer I was looking for here, I thought I'd at least take the time to provide my point of view as to my *why*...
First: I don't think any of us have forgotten the goodness that ZFS's checksum *can* bring.

I'm also keenly aware that we have some customers running HDS / EMC boxes who disable the ZFS checksum by default because they 'don't want to have files break due to a single bit flip...' They really don't care where the flip happens, and they don't want to 'waste' disks or bandwidth allowing ZFS to do its own protection when they already pay for it inside their zillion dollar disk box. (Some say waste, some call it insurance... ;) Oracle users in particular seem to have this mindset, though that's another thread entirely. :)

I'd suspect we don't hear people whining about single bit flips because they would not know it's happening unless the app sitting on top had its own protection, or the error was obvious, or it crashed their system, or they were running ZFS. And even with ZFS, at this stage we cannot delineate between single bit errors and massively crapped out blocks, so what's to say we are NOT seeing it?

Also - don't assume bit rot on disk is the only way we can get single bit errors. Considering that until very recently (and quite likely even now to a reasonable extent) most CPUs did not have data protection in *every* place data transited through, single bit flips are still a very real possibility, and becoming more likely as process shrinks continue. Granted, on CPUs with register parity protection, undetected doubles are more likely to 'slip under the radar', as registers are typically protected with parity at best, if at all... A single bit flip in a parity protected register will be detected, a double won't.

It does seem that some of us are getting a little caught up in disks and their magnificence in what they write to the platter and read back, and overlooking the potential value of a simple (though potentially computationally expensive) circus trick, which might, just might, make your broken 1TB archive useful again...
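To make the 'circus trick' concrete, here is a rough sketch of what I mean: when a block fails its checksum, flip each bit in turn, recompute the checksum, and see whether any single flip makes it match again. This is only an illustration, not ZFS code. The function names are made up for the example, and SHA-256 from Python's hashlib is an assumption standing in for whatever checksum the block was actually written with.

```python
import hashlib
from typing import Optional


def checksum(data: bytes) -> bytes:
    # Stand-in checksum (assumption): SHA-256 over the whole block.
    return hashlib.sha256(data).digest()


def repair_single_bit_flip(block: bytes, expected: bytes) -> Optional[bytes]:
    """Return a copy of block that matches expected, or None.

    Hypothetical helper: tries every possible single-bit flip.
    """
    if checksum(block) == expected:
        return block  # nothing to repair

    candidate = bytearray(block)
    for byte_index in range(len(candidate)):
        for bit in range(8):
            mask = 1 << bit
            candidate[byte_index] ^= mask      # flip one bit
            if checksum(candidate) == expected:
                return bytes(candidate)        # this flip fixes the block
            candidate[byte_index] ^= mask      # undo and try the next bit
    return None  # more than one bit is wrong, or the damage is elsewhere


if __name__ == "__main__":
    good = bytes(range(256)) * 2
    expected = checksum(good)

    damaged = bytearray(good)
    damaged[100] ^= 0x10                       # simulate one flipped bit
    repaired = repair_single_bit_flip(bytes(damaged), expected)
    print("repaired" if repaired == good else "not repaired")
```

The worst case is one full-block checksum per bit in the block, which is exactly why it is expensive, and why it only helps when the damage really is a single flipped bit. A double flip simply falls through and returns None.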
I don't think it's a good idea for us to assume that it's OK to 'leave out' potential goodness for the masses that want to use ZFS in non-enterprise environments like laptops / home PCs, or use commodity components in conjunction with the Big Stuff... (Like white box PCs connected to an EMC or HDS box...)

Anyhoo - I'm glad we have pretty much already done this work once before. It gives me hope that we'll see it make a comeback. ;)

(And I look forward to Jeff & Co developing a hyper cool way of generating 128000000 checksums using all 64 threads of a Niagara 2, using the same source data in cache, so we don't need to hit memory, so that it happens in the blink of an eye. Or two. OK - maybe three... ;) Maybe we could also use the SPUs as well...

OK - so, I'm possibly dreaming here, but hell, if I'm dreaming, why not dream big. :)

Nathan.

Bob Friesenhahn wrote:
> On Mon, 3 Mar 2008, me wrote:
>
>> I'm sure people using no redundancy (e.g. future OSX users) would
>> appreciate it, saving some grief if the bad blocks are indeed just
>> single bit flips.
>
> In case people have somehow forgotten, most other filesystems in
> common use do not checksum data blocks. In spite of this, we rarely
> hear users wailing about single bit flips in their files. Instead we
> usually hear about people who find whole chunks of their file missing
> or overwritten, or find that the hard disk does not spin up at all any
> more. As we move toward solid state storage, the typical error cases
> will surely differ.
>
> Since ZFS is smart and is able to perform tasks in the background, one
> possibility to consider is to use otherwise unused storage space to
> store "weak" ditto copies or even forward error correction data.
> However, rather than explicitly writing these blocks during normal
> I/O, they could be created by a background task, and reused for other
> purposes when required. In this way, otherwise unused disk blocks
> would be taken advantage of in a similar way that otherwise unused
> memory is used to cache filesystem data. If the filesystem becomes
> very full, then there would be less protection but if the filesystem
> has plenty of free space then there would be lots of protection.
>
> Bob
> ======================================
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss