Re: [PATCH] fix write error handling on SR RAID1
On Sat, Jul 11, 2015 at 3:44 PM, Joel Sing wrote:
> Your analysis is incorrect - offlining of chunks is handled via
> sr_ccb_done(). If lower level I/O indicates an error occurred then the
> chunk is marked offline, providing that the discipline has redundancy
> (for example, we do not offline chunks for RAID 0 or CRYPTO - it usually
> just makes things worse). This applies to both read and write operations.

Joel, thanks a lot for correcting me. Indeed, sr_ccb_done looks "suspicious" and looks like it is doing exactly what I expected sr_wu_done to do. Thanks for the lesson; I'll also need to read softraid.c more closely next time.

BTW: Speaking about SR, how do you test this stuff? Do you have any workflow for injecting random or intended errors into the device (or an emulated device), to test that the code is doing exactly what it should? I'm thinking here about several vnds and corrupting their files, or about running OpenBSD inside VBox and corrupting the VM files, etc. I would really appreciate any ideas in this domain, since I'll need to test my checksumming RAID 1 once the code is ready.

Thanks,
Karel
Re: [PATCH] fix write error handling on SR RAID1
On Friday 10 July 2015 22:01:43 Karel Gardas wrote:
> On Fri, Jul 10, 2015 at 9:34 PM, Chris Cappuccio wrote:
> > My first impression, offlining the drive after a single chunk failure
> > may be too aggressive as some errors are a result of issues other than
> > drive failures.
>
> Indeed, it may look too aggressive, but is the analysis written in my
> comment correct? I mean: if there is a write error for whatever reason
> on one or more chunk(s), and we completely ignore it since at least one
> write succeeded, then the array is in an inconsistent state where some
> drive(s) hold correct data and other drive(s) hold previous data. Since
> reading is done in round-robin fashion, there is a chance that you will
> read old data in the future. If this is correct, then I think it calls
> for a fix.

Your analysis is incorrect - offlining of chunks is handled via sr_ccb_done(). If lower level I/O indicates an error occurred then the chunk is marked offline, providing that the discipline has redundancy (for example, we do not offline chunks for RAID 0 or CRYPTO - it usually just makes things worse). This applies to both read and write operations.

> If you do not like off-lining drive(s) just after one failed write, then
> perhaps the correct approach would be to restart the whole work unit and
> enforce writing again? We could even have some threshold after which we
> stop and consider the problematic block really not writeable. Is
> something like that a better solution?

We already offline after a single read or write failure occurs - it would be possible to implement some form of retry algorithm, however at some point we have to trust the lower layers (VFS, disk controller driver, disk hardware, etc).
Re: [PATCH] fix write error handling on SR RAID1
On Fri, Jul 10, 2015 at 9:34 PM, Chris Cappuccio wrote:
> My first impression, offlining the drive after a single chunk failure
> may be too aggressive as some errors are a result of issues other than
> drive failures.

Indeed, it may look too aggressive, but is the analysis written in my comment correct? I mean: if there is a write error for whatever reason on one or more chunk(s), and we completely ignore it since at least one write succeeded, then the array is in an inconsistent state where some drive(s) hold correct data and other drive(s) hold previous data. Since reading is done in round-robin fashion, there is a chance that you will read old data in the future. If this is correct, then I think it calls for a fix.

If you do not like off-lining drive(s) just after one failed write, then perhaps the correct approach would be to restart the whole work unit and enforce writing again? We could even have some threshold after which we stop and consider the problematic block really not writeable. Is something like that a better solution?

Thanks,
Karel
Re: [PATCH] fix write error handling on SR RAID1
Karel Gardas [gard...@gmail.com] wrote:
> Hello,
>
> I think I've found a bug in the software RAID1 implementation's handling
> of write errors. IMHO the code should check that every write to every
> chunk succeeds. If not, there is an error which it needs to handle. The
> proposed patch handles such an error by off-lining the problematic drive.
> The patch compiles on -current, but so far it's not tested at all -- I'm
> sending it to get some advice from developers on whether this is the way
> to go, and if so, how exactly to proceed with sending the patch.

My first impression: offlining the drive after a single chunk failure may be too aggressive, as some errors are a result of issues other than drive failures.
[PATCH] fix write error handling on SR RAID1
Hello,

I think I've found a bug in the software RAID1 implementation's handling of write errors. IMHO the code should check that every write to every chunk succeeds. If not, there is an error which it needs to handle. The proposed patch handles such an error by off-lining the problematic drive. The patch compiles on -current, but so far it's not tested at all -- I'm sending it to get some advice from developers on whether this is the way to go, and if so, how exactly to proceed with sending the patch.

Thanks,
Karel

Index: sys/dev/softraid_raid1.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid_raid1.c,v
retrieving revision 1.60
diff -u -r1.60 softraid_raid1.c
--- sys/dev/softraid_raid1.c	27 Jan 2015 10:12:45 -0000	1.60
+++ sys/dev/softraid_raid1.c	10 Jul 2015 17:56:36 -0000
@@ -415,14 +415,28 @@
 {
 	struct sr_discipline	*sd = wu->swu_dis;
 	struct scsi_xfer	*xs = wu->swu_xs;
+	int			chunk_no;
+	int			succeeding_ios;
+	struct sr_ccb		*ccb_error;
 
-	/* If at least one I/O succeeded, we are okay. */
-	if (wu->swu_ios_succeeded > 0) {
+	chunk_no = sd->sd_meta->ssdi.ssd_chunk_no;
+	succeeding_ios = wu->swu_ios_succeeded;
+
+	/*
+	 * In case of reads, we're OK when exactly one I/O succeeds;
+	 * in case of writes we need to have all write I/Os succeed,
+	 * since otherwise we may end up reading from the failed
+	 * write in the future.
+	 */
+	if ((xs->flags & SCSI_DATA_IN) && succeeding_ios == 1) {
 		xs->error = XS_NOERROR;
 		return SR_WU_OK;
 	}
-
-	/* If all I/O failed, retry reads and give up on writes. */
+	if ((xs->flags & SCSI_DATA_OUT) && succeeding_ios == chunk_no) {
+		xs->error = XS_NOERROR;
+		return SR_WU_OK;
+	}
+	/* Something failed here, so retry reads and give up on writes. */
 	if (xs->flags & SCSI_DATA_IN) {
 		printf("%s: retrying read on block %lld\n",
 		    sd->sd_meta->ssd_devname, (long long)wu->swu_blk_start);
@@ -436,6 +450,13 @@
 	} else {
 		printf("%s: permanently failing write on block %lld\n",
 		    sd->sd_meta->ssd_devname, (long long)wu->swu_blk_start);
+		TAILQ_FOREACH(ccb_error, &wu->swu_ccb, ccb_link) {
+			if (ccb_error->ccb_state == SR_CCB_FAILED) {
+				printf("%s: chunk %d going off-line due to write error\n",
+				    sd->sd_meta->ssd_devname, ccb_error->ccb_target);
+				sr_raid1_set_chunk_state(sd, ccb_error->ccb_target, BIOC_SDOFFLINE);
+			}
+		}
 	}
 
 	wu->swu_state = SR_WU_FAILED;