Re: [PATCH] fix write error handling on SR RAID1

2015-07-11 Thread Karel Gardas
On Sat, Jul 11, 2015 at 3:44 PM, Joel Sing  wrote:
> Your analysis is incorrect - offlining of chunks is handled via
> sr_ccb_done(). If the lower-level I/O indicates that an error occurred,
> the chunk is marked offline, provided that the discipline has redundancy
> (for example, we do not offline chunks for RAID 0 or CRYPTO - it usually
> just makes things worse). This applies to both read and write operations.

Joel,

thanks a lot for correcting me. Indeed, sr_ccb_done() looks "suspicious"
and appears to do exactly what I expected sr_wu_done() to do. Thanks for
the lesson; next time I'll also read softraid.c more closely.

BTW: speaking of SR, how do you test this code? Do you have a workflow
for injecting random or deliberate errors into a real or emulated device
to verify that the code does exactly what it should? I'm thinking of
setting up several vnds and corrupting their backing files, or of
running OpenBSD inside VirtualBox and corrupting the VM disk files, etc.
I would really appreciate any ideas in this area, since I'll need to
test my checksumming RAID 1 once the code is ready.
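
For the vnd route I imagine something as small as this to damage a
backing file at a chosen offset (an untested sketch; the file name and
offset are whatever the test scenario needs):

/*
 * corrupt.c -- overwrite one "sector" of a vnd backing file with
 * inverted bytes to simulate media corruption.
 */
#include <sys/types.h>

#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	unsigned char buf[512];
	ssize_t n, i;
	off_t off;
	int fd;

	if (argc != 3)
		errx(1, "usage: corrupt file offset");
	off = strtoll(argv[2], NULL, 0);

	if ((fd = open(argv[1], O_RDWR)) == -1)
		err(1, "open %s", argv[1]);
	if ((n = pread(fd, buf, sizeof(buf), off)) == -1)
		err(1, "pread");
	for (i = 0; i < n; i++)
		buf[i] ^= 0xff;		/* flip every byte in the sector */
	if (pwrite(fd, buf, n, off) == -1)
		err(1, "pwrite");
	close(fd);
	return 0;
}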

Thanks,
Karel



Re: [PATCH] fix write error handling on SR RAID1

2015-07-11 Thread Joel Sing
On Friday 10 July 2015 22:01:43 Karel Gardas wrote:
> On Fri, Jul 10, 2015 at 9:34 PM, Chris Cappuccio  wrote:
> My first impression is that offlining the drive after a single chunk
> failure may be too aggressive, as some errors result from issues other
> than drive failure.
> 
> Indeed, it may look too aggressive, but is the analysis written in my
> comment correct? I mean: if a write to one or more chunks fails for
> whatever reason, and we completely ignore it because at least one write
> succeeded, then the array is in an inconsistent state where some drives
> hold correct data while other drives hold the previous data. Since reads
> are done in round-robin fashion, there is a chance of reading the old
> data in the future. If that is correct, then I think it calls for a fix.

Your analysis is incorrect - offlining of chunks is handled via
sr_ccb_done(). If the lower-level I/O indicates that an error occurred,
the chunk is marked offline, provided that the discipline has redundancy
(for example, we do not offline chunks for RAID 0 or CRYPTO - it usually
just makes things worse). This applies to both read and write operations.
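
Conceptually the flow is along these lines (a self-contained toy sketch
of the idea only, not the actual sr_ccb_done() code):

/* Toy model: per-I/O completion decides whether to offline a chunk. */
#include <stdio.h>

#define SD_REDUNDANT	0x01		/* hypothetical discipline flag */

enum chunk_state { CHUNK_ONLINE, CHUNK_OFFLINE };

struct chunk {
	enum chunk_state state;
};

struct ccb {
	struct chunk	*chunk;		/* chunk this I/O went to */
	int		 error;		/* reported by the lower layer */
};

static void
ccb_done(struct ccb *ccb, int disc_flags)
{
	if (!ccb->error)
		return;
	if (disc_flags & SD_REDUNDANT) {
		/* Redundant discipline: take the failing chunk offline. */
		ccb->chunk->state = CHUNK_OFFLINE;
		printf("chunk marked offline\n");
	} else {
		/* RAID 0/CRYPTO: offlining would only make things worse. */
		printf("I/O error noted, chunk left online\n");
	}
}

int
main(void)
{
	struct chunk c = { CHUNK_ONLINE };
	struct ccb ccb = { &c, 1 };	/* simulate a failed I/O */

	ccb_done(&ccb, SD_REDUNDANT);
	return 0;
}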

> If you do not like off-lining a drive after just one failed write, then
> perhaps the correct approach would be to restart the whole work unit and
> force the writes again? We could even have a threshold at which we stop
> and consider the problematic block really not writeable. Would something
> like that be a better solution?

We already offline after a single read or write failure occurs - it
would be possible to implement some form of retry algorithm, but at some
point we have to trust the lower layers (VFS, disk controller driver,
disk hardware, etc.).



Re: [PATCH] fix write error handling on SR RAID1

2015-07-10 Thread Karel Gardas
On Fri, Jul 10, 2015 at 9:34 PM, Chris Cappuccio  wrote:
> My first impression is that offlining the drive after a single chunk
> failure may be too aggressive, as some errors result from issues other
> than drive failure.

Indeed, it may look too aggressive, but is the analysis written in my
comment correct? I mean: if a write to one or more chunks fails for
whatever reason, and we completely ignore it because at least one write
succeeded, then the array is in an inconsistent state where some drives
hold correct data while other drives hold the previous data. Since reads
are done in round-robin fashion, there is a chance of reading the old
data in the future. If that is correct, then I think it calls for a fix.
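
To illustrate the failure mode I mean with a toy model (not softraid
code): one mirror side silently keeps the previous version, and
round-robin reads eventually return it:

#include <stdio.h>

int
main(void)
{
	/* Disk 0 got the new data (v2); the write to disk 1 failed (v1). */
	int mirror[2] = { 2, 1 };
	int counter = 0, i, disk;

	for (i = 0; i < 4; i++) {
		disk = counter++ % 2;	/* round-robin chunk selection */
		printf("read %d from disk %d: v%d%s\n", i, disk,
		    mirror[disk], mirror[disk] == 1 ? "  <-- stale data" : "");
	}
	return 0;
}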

If you do not like off-lining a drive after just one failed write, then
perhaps the correct approach would be to restart the whole work unit and
force the writes again? We could even have a threshold at which we stop
and consider the problematic block really not writeable. Would something
like that be a better solution?
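
Roughly this is the retry scheme I have in mind (a sketch with made-up
names, not a proposed interface):

#include <stdio.h>

#define WRITE_RETRY_MAX	3	/* hypothetical threshold */

/* Retry a failing write up to the threshold, then give up on it. */
static int
write_with_retry(int (*do_write)(void *), void *wu)
{
	int try;

	for (try = 0; try < WRITE_RETRY_MAX; try++)
		if (do_write(wu) == 0)
			return 0;	/* write succeeded */
	return -1;	/* block considered really not writeable */
}

static int
always_fail(void *wu)
{
	(void)wu;
	return -1;	/* simulate a persistently bad block */
}

int
main(void)
{
	if (write_with_retry(always_fail, NULL) == -1)
		printf("giving up: block not writeable\n");
	return 0;
}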

Thanks,
Karel



Re: [PATCH] fix write error handling on SR RAID1

2015-07-10 Thread Chris Cappuccio
Karel Gardas [gard...@gmail.com] wrote:
> Hello,
> 
> I think I've found a bug in the software RAID1 implementation's handling
> of write errors. IMHO the code should check that every write to every
> chunk succeeded; if not, there is an error that needs to be handled. The
> proposed patch handles such an error by off-lining the problematic
> drive. The patch compiles on -current, but so far it is completely
> untested -- I'm sending it to get advice from developers on whether this
> is the way to go and, if so, on how exactly to proceed with submitting
> the patch.
> 

My first impression is that offlining the drive after a single chunk
failure may be too aggressive, as some errors result from issues other
than drive failure.



[PATCH] fix write error handling on SR RAID1

2015-07-10 Thread Karel Gardas
Hello,

I think I've found a bug in the software RAID1 implementation's handling
of write errors. IMHO the code should check that every write to every
chunk succeeded; if not, there is an error that needs to be handled. The
proposed patch handles such an error by off-lining the problematic
drive. The patch compiles on -current, but so far it is completely
untested -- I'm sending it to get advice from developers on whether this
is the way to go and, if so, on how exactly to proceed with submitting
the patch.

Thanks,
Karel

Index: sys/dev/softraid_raid1.c
===================================================================
RCS file: /cvs/src/sys/dev/softraid_raid1.c,v
retrieving revision 1.60
diff -u -r1.60 softraid_raid1.c
--- sys/dev/softraid_raid1.c	27 Jan 2015 10:12:45 -0000	1.60
+++ sys/dev/softraid_raid1.c	10 Jul 2015 17:56:36 -0000
@@ -415,14 +415,28 @@
 {
 	struct sr_discipline	*sd = wu->swu_dis;
 	struct scsi_xfer	*xs = wu->swu_xs;
+	int			chunk_no;
+	int			succeeding_ios;
+	struct sr_ccb		*ccb_error;
 
-	/* If at least one I/O succeeded, we are okay. */
-	if (wu->swu_ios_succeeded > 0) {
+	chunk_no = sd->sd_meta->ssdi.ssd_chunk_no;
+	succeeding_ios = wu->swu_ios_succeeded;
+
+	/*
+	 * In the case of reads we are OK when exactly one I/O
+	 * succeeds; in the case of writes every write I/O has to
+	 * succeed, since otherwise we may end up reading from the
+	 * failed write in the future.
+	 */
+	if ((xs->flags & SCSI_DATA_IN) && succeeding_ios == 1) {
 		xs->error = XS_NOERROR;
 		return SR_WU_OK;
 	}
-
-	/* If all I/O failed, retry reads and give up on writes. */
+	if ((xs->flags & SCSI_DATA_OUT) && succeeding_ios == chunk_no) {
+		xs->error = XS_NOERROR;
+		return SR_WU_OK;
+	}
+	/* Something failed, so retry reads and give up on writes. */
 	if (xs->flags & SCSI_DATA_IN) {
 		printf("%s: retrying read on block %lld\n",
 		    sd->sd_meta->ssd_devname, (long long)wu->swu_blk_start);
@@ -436,6 +450,13 @@
 	} else {
 		printf("%s: permanently failing write on block %lld\n",
 		    sd->sd_meta->ssd_devname, (long long)wu->swu_blk_start);
+		TAILQ_FOREACH(ccb_error, &wu->swu_ccb, ccb_link) {
+			if (ccb_error->ccb_state == SR_CCB_FAILED) {
+				printf("%s: chunk %d going offline due to write error\n",
+				    sd->sd_meta->ssd_devname, ccb_error->ccb_target);
+				sr_raid1_set_chunk_state(sd, ccb_error->ccb_target, BIOC_SDOFFLINE);
+			}
+		}
 	}
 
 	wu->swu_state = SR_WU_FAILED;