Joel,

yes, I will of course work on your recommended RAID 1 with checksumming.
I just forgot to add that.

Thanks,
Karel

On Thu, Jun 18, 2015 at 10:40 PM, Karel Gardas <[email protected]> wrote:
> On Thu, Jun 18, 2015 at 6:32 PM, Joel Sing <[email protected]> wrote:
>>
>> Re adding some form of checksumming, it only seems to make sense in the case
>> of RAID 1 where you can decide that the data on a disk is invalid, then fail
>> the read and pull the data from another drive. That coupled with block
>> level "healing" or similar could be interesting. Otherwise checksumming on
>> its own is not overly useful at this level - you would simply fail a read,
>> which then potentially results in something worse than bit-flipping at
>> higher layers.
>
> Honestly speaking, I value reliability in an extreme way, so I would
> rather have a failed read than a silent bit-flip propagated up to the
> application level.
>
>> If you wanted to investigate this I would suggest considering it as an option
>> to the existing RAID 1 implementation. The bulk of it would be calculating
>> and adding a checksum to each write and offsetting each block accordingly,
>> along with verification on read. The failure modes would need to be thought
>> through and handled - the re-reading from a different disk is already there,
>> however what you then do with the failure is an open question (failing the
>> chunk entirely is the heavy handed but already supported approach).
>
> I will see what I can do with it over the summer. There are indeed a
> lot of questions which will need to be solved, but let's see if I'm
> able to come up with some patch as a proof of concept first. For now
> I'm just thinking about propagating failures in some form into a
> sensor value and showing them to the user without a hard chunk
> failure -- at least up to some predefined threshold. Also, what I'm
> used to using is a kind of scrub, which looks like it is somewhat
> supported in the kernel; at least there are some signs of scrub
> support there, but I have yet to find out for what reason, since
> bioctl does not know about it (yet?). Anyway, correct scrubbing will
> require zeroing the whole drive first, but then it may probably be
> enforced just by dd'ing the drive to /dev/null -- just a wild guess.
>
> Thanks for your information!
> Karel
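To make the idea discussed above a bit more concrete, here is a minimal in-memory sketch in Python of RAID 1 with a per-block checksum: a checksum is stored on each write, verified on each read, a bad copy falls back to the other mirror and is rewritten ("healed"), and errors are counted per disk rather than failing the chunk outright. This is only an illustration under stated assumptions, not softraid code. The names Mirror, Raid1, ChecksumError, flip_bit and checksum_errors are all hypothetical, CRC32 is just a convenient stand-in for whatever checksum would actually be chosen, and the checksum is kept next to the payload in a dictionary instead of offsetting on-disk blocks.

```python
import zlib


class ChecksumError(IOError):
    """Raised when no mirror holds a copy that matches its checksum."""


class Mirror:
    """One simulated disk: block number -> (payload, stored CRC32)."""

    def __init__(self):
        self.blocks = {}
        # Hypothetical counter; could feed a sensor value instead of
        # failing the whole chunk on the first bad block.
        self.checksum_errors = 0

    def write(self, blkno, data):
        self.blocks[blkno] = (bytes(data), zlib.crc32(data))

    def read(self, blkno):
        """Return the payload, or None if it does not match its checksum."""
        data, stored = self.blocks[blkno]
        if zlib.crc32(data) != stored:
            self.checksum_errors += 1
            return None
        return data


class Raid1:
    def __init__(self, mirrors):
        self.mirrors = mirrors

    def write(self, blkno, data):
        for m in self.mirrors:
            m.write(blkno, data)

    def read(self, blkno):
        """Return the first verified copy and heal any bad copies seen so far."""
        bad = []
        for m in self.mirrors:
            data = m.read(blkno)
            if data is not None:
                for b in bad:
                    b.write(blkno, data)  # block-level "healing"
                return data
            bad.append(m)
        raise ChecksumError("block %d: no valid copy on any mirror" % blkno)

    def scrub(self):
        """Verify every copy of every block; return the number of copies repaired."""
        repaired = 0
        for blkno in sorted(self.mirrors[0].blocks):
            copies = [m.read(blkno) for m in self.mirrors]
            good = next((c for c in copies if c is not None), None)
            if good is None:
                raise ChecksumError("block %d: no valid copy on any mirror" % blkno)
            for m, c in zip(self.mirrors, copies):
                if c is None:
                    m.write(blkno, good)
                    repaired += 1
        return repaired


def flip_bit(mirror, blkno, bit=0):
    """Simulate silent corruption: damage the payload, keep the old checksum."""
    data, stored = mirror.blocks[blkno]
    damaged = bytearray(data)
    damaged[bit // 8] ^= 1 << (bit % 8)
    mirror.blocks[blkno] = (bytes(damaged), stored)
```

With two mirrors, flipping a bit on one of them and then reading the block returns the good data from the other mirror, rewrites the damaged copy, and bumps that mirror's checksum_errors counter. Only when every copy fails verification does the read fail, which is the "failed read rather than silent bit-flip" behaviour mentioned above. What to do once the counter passes some threshold is left open here, as it is in the thread.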
