Hans Reiser wrote:
> I am skeptical that bitflip errors above the storage layer are as common
> as the ZFS authors say, and their statistics that I have seen somehow
> lack a lot of detail about how they were gathered. If, say, a device
> with 100 errors counts as 100 instances for their statistics..... Well,
> it would be nice to know how they were gathered. Next time I meet them
> I must ask.
I think that most big vendors have a lot of information about failure
rates on drives, but cannot actually share the details in public (due to
NDAs with the suppliers).
One thing that we are trying to do is to get some of the more
"community" oriented people at Seagate Research to come out and talk to
people about what types of errors are reasonable to code against. The
current idea is to get everyone in the same place a couple of days
before the next FAST conference (i.e., Linux IO and file system people
and these vendors). (See the USENIX page for details on FAST at
http://www.usenix.org/events/fast07/cfp/).
I will say that media errors tend to be larger than single-bit errors,
i.e., you will lose a set of sectors rather than seeing a single bit flip
on one sector (remember that the drive vendors do extensive ECC at their
level). What their ECC will not fix is something like junk settling on
the platter or a really bad failure like a bad disk head.
> That said, if users want it, there should be a plugin that checks the bits.
> I agree that stripe awareness and the need to signal the underlying raid
> that a block needs to be recovered is important. Checksumming at the fs
> level seems like a reasonable plugin.
> I have no opinion on the computational cost of ECC vs. checksums, I will
> trust that you are correct.
What we can (and should) do is to make sure that we detect errors much
better than we do today. I think that ECC would be overkill, but we
certainly could do simple checksums for strategic parts of the file
system data.
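Just to make that concrete, here is a rough sketch (not reiser4 code or
any real implementation) of what a checksummed metadata read path with a
"please rebuild this block" hook down into the RAID layer could look
like. fs_read_block() and raid_rebuild_block() are invented names, and
zlib's crc32() just stands in for whatever checksum a real plugin would
pick:

/*
 * Hypothetical sketch only -- not reiser4 code.  Verify a metadata
 * block's checksum on read and, on a mismatch, ask the underlying RAID
 * to rebuild the block from its redundancy before re-reading it.
 * fs_read_block() and raid_rebuild_block() are invented names here;
 * zlib's crc32() stands in for whatever checksum a plugin would use.
 */
#include <stdint.h>
#include <zlib.h>

#define FS_BLOCK_SIZE 4096

struct fs_meta_block {
	uint32_t csum;                       /* CRC32 of the payload below */
	uint8_t  payload[FS_BLOCK_SIZE - 4]; /* the actual metadata */
};

/* Assumed to be provided by the fs and RAID layers in this sketch. */
int fs_read_block(uint64_t blocknr, struct fs_meta_block *buf);
int raid_rebuild_block(uint64_t blocknr);

static uint32_t block_csum(const struct fs_meta_block *b)
{
	return crc32(0L, b->payload, sizeof(b->payload));
}

/* Read a metadata block; retry once after asking RAID to repair it. */
int fs_read_meta_checked(uint64_t blocknr, struct fs_meta_block *buf)
{
	if (fs_read_block(blocknr, buf))
		return -1;
	if (block_csum(buf) == buf->csum)
		return 0;                    /* checksum matches, all good */

	/* Mismatch: signal the RAID below us, then read the repaired copy. */
	if (raid_rebuild_block(blocknr))
		return -1;
	if (fs_read_block(blocknr, buf))
		return -1;
	return (block_csum(buf) == buf->csum) ? 0 : -1;
}

The interesting part is not the checksum itself but that second step:
the file system has to be able to tell the RAID which block is bad so
the redundancy can actually be used for repair.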
Also note that there is work underway in the SCSI space on something
called "block guard" that defines some extra bytes per disk sector for
application-level data. That could be used for per-block sanity-checking
information, but how to get at it from the file system is an interesting
question.
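For what it is worth, my rough picture of that layout (the T10 DIF work)
is 8 extra bytes tacked onto each 512-byte sector, something like the
struct below. This is just how I understand it, not text from the spec:

/*
 * Rough picture only, not spec text: the 8 bytes of protection
 * information per 512-byte sector as I understand the T10 DIF work.
 */
#include <stdint.h>

struct sector_guard {
	uint16_t guard_tag; /* CRC-16 over the 512 data bytes of the sector */
	uint16_t app_tag;   /* left for application / OS use */
	uint32_t ref_tag;   /* typically the low 32 bits of the target LBA */
} __attribute__((packed)); /* appended to each sector on 520-byte media */

As I understand it, the guard tag lets the HBA and the drive verify the
data on every hop, the reference tag catches misdirected writes, and the
application tag is the piece an OS or file system could claim for its
own sanity checks -- which is exactly the "how to get at it" question.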
Val's write-up at LWN on the file system workshop, and the comments on
that write-up, have an active discussion of this kind of thing
(http://lwn.net/Articles/190222/).
ric