Joel,
yes, I will of course work on your recommended RAID 1 with checksumming.
I just forgot to add that.
Thanks,
Karel

On Thu, Jun 18, 2015 at 10:40 PM, Karel Gardas <[email protected]> wrote:
> On Thu, Jun 18, 2015 at 6:32 PM, Joel Sing <[email protected]> wrote:
>>
>> Re adding some form of checksumming, it only seems to make sense in the case
>> of RAID 1 where you can decide that the data on a disk is invalid, then fail
>> the read and pull the data from another drive. That coupled with block
>> level "healing" or similar could be interesting. Otherwise checksumming on
>> its own is not overly useful at this level - you would simply fail a read,
>> which then results in potentially worse than bit-flipping at higher layers.
>
> Honestly speaking, I value reliability to an extreme degree, so I would
> rather have a failed read than a silent bit-flip propagated up to the
> application level.
>
>> If you wanted to investigate this I would suggest considering it as an option
>> to the existing RAID 1 implementation. The bulk of it would be calculating
>> and adding a checksum to each write and offsetting each block accordingly,
>> along with verification on read. The failure modes would need to be thought
>> through and handled - the re-reading from a different disk is already there,
>> however what you then do with the failure is an open question (failing the
> chunk entirely is the heavy-handed but already supported approach).
>
> I will see what I can do with it over the summer. There are indeed
> a lot of questions that will need to be solved, but let's see if I'm
> able to come up with a proof-of-concept patch first. For now I'm
> just thinking about propagating failures in some form into a sensor
> value and showing them to the user without a hard chunk failure, at
> least up to some predefined threshold. Also, what I'm used to is some
> kind of scrub. It looks like that is at least partly supported in the
> kernel (there are some signs of scrub support there), but I have yet
> to find out why, since bioctl does not know about it (yet?). Anyway,
> correct scrubbing will first require zeroing the whole drive, but
> after that it could probably be triggered just by dd-ing the drive to
> /dev/null. Just a wild guess.
>
> Thanks for your information!
> Karel
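The design discussed above (a checksum added to each write, each block offset to make room for it, verification on read, fallback to the other mirror, a failure counter instead of a hard chunk failure, and a scrub pass) can be sketched as a toy userland model. This is a minimal illustration under stated assumptions, not softraid code: CRC32 via Python's zlib, the 512-byte block size, the rewrite-the-good-copy "healing" and the error counter standing in for a sensor value are all hypothetical choices. initialize() plays the role of the "zeroing the whole drive first" step, since a raw zeroed slot does not carry a valid checksum.

```python
import zlib

BLOCK = 512  # hypothetical data bytes per logical block
CSUM = 4     # CRC32 stored right after each block's data


class ChecksumError(Exception):
    pass


def encode(data: bytes) -> bytes:
    """Append a CRC32 of the data; this is what gets written to every mirror."""
    assert len(data) == BLOCK
    return data + zlib.crc32(data).to_bytes(CSUM, "little")


def decode(raw: bytes) -> bytes:
    """Return the data if its stored CRC32 matches, else raise ChecksumError."""
    data, stored = raw[:BLOCK], raw[BLOCK:]
    if zlib.crc32(data).to_bytes(CSUM, "little") != stored:
        raise ChecksumError("checksum mismatch")
    return data


class Mirror:
    """Toy RAID 1 volume: each 'disk' is a bytearray of (BLOCK + CSUM)-byte slots."""

    def __init__(self, ndisks: int, nblocks: int):
        self.slot = BLOCK + CSUM  # each block is offset to make room for its checksum
        self.nblocks = nblocks
        self.disks = [bytearray(self.slot * nblocks) for _ in range(ndisks)]
        self.errors = 0  # hypothetical counter that could feed a sensor value

    def write(self, blk: int, data: bytes) -> None:
        raw = encode(data)
        off = blk * self.slot
        for d in self.disks:
            d[off:off + self.slot] = raw

    def initialize(self) -> None:
        """Write zero blocks with valid checksums, so a later scrub has a clean baseline."""
        for blk in range(self.nblocks):
            self.write(blk, bytes(BLOCK))

    def read(self, blk: int) -> bytes:
        off = blk * self.slot
        saw_bad_copy = False
        for d in self.disks:
            try:
                data = decode(bytes(d[off:off + self.slot]))
            except ChecksumError:
                self.errors += 1
                saw_bad_copy = True
                continue  # re-read from the next mirror
            if saw_bad_copy:
                self.write(blk, data)  # hypothetical healing: rewrite the good copy everywhere
            return data
        # No valid copy anywhere: fail the read rather than return flipped bits.
        raise ChecksumError(f"block {blk}: no valid copy")

    def scrub(self) -> list:
        """Read every block; bad copies get healed, unrecoverable blocks are returned."""
        bad = []
        for blk in range(self.nblocks):
            try:
                self.read(blk)
            except ChecksumError:
                bad.append(blk)
        return bad
```

A single flipped bit on one disk is then detected, served from the other disk and repaired; only when every copy of a block is bad does the read fail, and the scrub reports that block instead of failing the whole chunk.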
