Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread Bart Van Assche
On 05/26/14 17:27, James Bottomley wrote:
 On Mon, 2014-05-26 at 17:15 +0200, Bart Van Assche wrote:
 Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
 block layer core.
 
 I don't see the value of patches 2,3 they're checking for an impossible
 condition ... why might it be possible?

When reading the source code in scsi_error.c it's easy to overlook that
scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
set. Although it is possible to mention this as a comment above these
functions, such comments are not checked at runtime. It would require
additional work from the reader to verify whether or not such source
code comments are up to date. However, the condition inside a
WARN_ON_ONCE() statement is checked every time the code is executed.
Hence my preference for a WARN_ON_ONCE() statement instead of writing
down somewhere that these three functions operate on requests in which
the REQ_ATOM_COMPLETE bit has been set.

Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread h...@infradead.org
On Tue, May 27, 2014 at 09:49:48AM +0200, Bart Van Assche wrote:
  I don't see the value of patches 2,3 they're checking for an impossible
  condition ... why might it be possible?
 
 When reading the source code in scsi_error.c it's easy to overlook that
 scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
 all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
 set. Although it is possible to mention this as a comment above these
 functions, such comments are not checked at runtime. It would require
 additional work from the reader to verify whether or not such source
 code comments are up to date. However, the condition inside a
 WARN_ON_ONCE() statement is checked every time the code is executed.
 Hence my preference for a WARN_ON_ONCE() statement instead of writing
 down somewhere that these three functions operate on requests in which
 the REQ_ATOM_COMPLETE bit has been set.


I really do like the REQ_ATOM_COMPLETE asserts - as experience tells
these kinds of assumptions are best checked, otherwise they will
unintentionally be violated.

I'm less excited about the list walk I have to say, as the overhead is
getting fairly large for a simple assertation.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread James Bottomley
On Tue, 2014-05-27 at 09:49 +0200, Bart Van Assche wrote:
 On 05/26/14 17:27, James Bottomley wrote:
  On Mon, 2014-05-26 at 17:15 +0200, Bart Van Assche wrote:
  Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
  block layer core.
  
  I don't see the value of patches 2,3 they're checking for an impossible
  condition ... why might it be possible?
 
 When reading the source code in scsi_error.c it's easy to overlook that
 scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
 all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
 set.

I really don't like this entanglement of state of block and SCSI.
complete in block terms isn't the same as in SCSI terms.  The
REQ_ATOM_COMPLETE flag is fully internal to block and indicates that
we've taken over processing the command and any completions into block
get ignored.  This is for the possible race between timeout and inbound
command completion.  If you start coding SCSI assertions in terms of it,
you're entangling layers that should be separate.

The assertion in SCSI terms is that abort and -done cannot race.

James

  Although it is possible to mention this as a comment above these
 functions, such comments are not checked at runtime. It would require
 additional work from the reader to verify whether or not such source
 code comments are up to date. However, the condition inside a
 WARN_ON_ONCE() statement is checked every time the code is executed.
 Hence my preference for a WARN_ON_ONCE() statement instead of writing
 down somewhere that these three functions operate on requests in which
 the REQ_ATOM_COMPLETE bit has been set.


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread Bart Van Assche
On 05/27/14 10:23, James Bottomley wrote:
 On Tue, 2014-05-27 at 09:49 +0200, Bart Van Assche wrote:
 When reading the source code in scsi_error.c it's easy to overlook that
 scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
 all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
 set.
 
 I really don't like this entanglement of state of block and SCSI.
 complete in block terms isn't the same as in SCSI terms.  The
 REQ_ATOM_COMPLETE flag is fully internal to block and indicates that
 we've taken over processing the command and any completions into block
 get ignored.  This is for the possible race between timeout and inbound
 command completion.  If you start coding SCSI assertions in terms of it,
 you're entangling layers that should be separate.
 
 The assertion in SCSI terms is that abort and -done cannot race.

Recent e-mail threads on the linux-scsi mailing list have shown that it
is easily overlooked which error handler functions are called only for
requests marked by the block layer as complete and hence cannot race
with scsi_softirq_done(). So far my proposals for how to improve the
documentation of how this race is avoided were rejected. Do you perhaps
have a proposal of how this should be documented properly ?

Thanks,

Bart.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread James Bottomley
On Tue, 2014-05-27 at 11:00 +0200, Bart Van Assche wrote:
 On 05/27/14 10:23, James Bottomley wrote:
  On Tue, 2014-05-27 at 09:49 +0200, Bart Van Assche wrote:
  When reading the source code in scsi_error.c it's easy to overlook that
  scmd_eh_abort_handler(), scsi_abort_command() and scsi_eh_scmd_add() are
  all invoked for requests in which the REQ_ATOM_COMPLETE bit has been
  set.
  
  I really don't like this entanglement of state of block and SCSI.
  complete in block terms isn't the same as in SCSI terms.  The
  REQ_ATOM_COMPLETE flag is fully internal to block and indicates that
  we've taken over processing the command and any completions into block
  get ignored.  This is for the possible race between timeout and inbound
  command completion.  If you start coding SCSI assertions in terms of it,
  you're entangling layers that should be separate.
  
  The assertion in SCSI terms is that abort and -done cannot race.
 
 Recent e-mail threads on the linux-scsi mailing list have shown that it
 is easily overlooked which error handler functions are called only for
 requests marked by the block layer as complete and hence cannot race
 with scsi_softirq_done(). So far my proposals for how to improve the
 documentation of how this race is avoided were rejected. Do you perhaps
 have a proposal of how this should be documented properly ?

Um, if that's your goal, then I don't see how adding a 
WARN_ON_ONCE(!blk_rq_completed(scmd-request));
makes it clear that timeout and the softirq cannot race.

I suppose for me, it's just an obvious systems assertion since every
command must be in a defined state for the state model, either a command
has timed out or it's in normal completion.  But since we inherit this
race mediation from block, isn't that the place to document it if people
are confused?

I could also see us one day extending the TMF capability to abort any
running command, which would make even an assertion of block timed out
or completed invalid.

James

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread Paolo Bonzini

Il 27/05/2014 12:21, James Bottomley ha scritto:

I could also see us one day extending the TMF capability to abort any
running command, which would make even an assertion of block timed out
or completed invalid.


Actually the assertion would remain valid, and that's exactly what Bart 
wants to document with this assertion.


Completed from the point of view of the block layer really means do 
not run the softirq.  If you wanted to abort any running command, you 
still would mark the request as completed for the block layer before 
issuing the TMF.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread James Bottomley
On Tue, 2014-05-27 at 12:47 +0200, Paolo Bonzini wrote:
 Il 27/05/2014 12:21, James Bottomley ha scritto:
  I could also see us one day extending the TMF capability to abort any
  running command, which would make even an assertion of block timed out
  or completed invalid.
 
 Actually the assertion would remain valid, and that's exactly what Bart 
 wants to document with this assertion.

No, it wouldn't: if we abort a running command by definition the command
hadn't timed out and might not be completed.  This is required by TMF
handling because now you have an abort racing with a completion.  Either
the command completes normally because it misses the abort or the abort
gets to it and its returned status is set to TASK_ABORTED.  That's the
only way you can tell if the abort was successful or not.

If you're thinking we would tell block to ignore returning commands
before issuing the abort, we'd never be able to tell if the abort were
successful, so we have to allow the race to collect the status.

James

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread Paolo Bonzini

Il 27/05/2014 12:59, James Bottomley ha scritto:

On Tue, 2014-05-27 at 12:47 +0200, Paolo Bonzini wrote:

Il 27/05/2014 12:21, James Bottomley ha scritto:

I could also see us one day extending the TMF capability to abort any
running command, which would make even an assertion of block timed out
or completed invalid.


Actually the assertion would remain valid, and that's exactly what Bart
wants to document with this assertion.


No, it wouldn't: if we abort a running command by definition the command
hadn't timed out and might not be completed.  This is required by TMF
handling because now you have an abort racing with a completion.  Either
the command completes normally because it misses the abort or the abort
gets to it and its returned status is set to TASK_ABORTED.  That's the
only way you can tell if the abort was successful or not.

If you're thinking we would tell block to ignore returning commands
before issuing the abort, we'd never be able to tell if the abort were
successful, so we have to allow the race to collect the status.


You could use a different mechanism than a softirq to tell the abort 
were successful, for example by overriding scsi_done.  But with respect 
to the block layer, the mechanics of avoiding the race and double-free 
would probably be the same.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread Paolo Bonzini

Il 27/05/2014 13:26, James Bottomley ha scritto:

You could use a different mechanism than a softirq to tell the abort
were successful, for example by overriding scsi_done.  But with respect
to the block layer, the mechanics of avoiding the race and double-free
would probably be the same.


I think there's some confusion about what the race and double free is:
It only occurs with timeouts.  In a timeout situation, the host had
decided it's not waiting any longer for the target to respond and
proceeds to error recovery.  At any time between the host making this
decision up to the point it kicks the target hard enough to clear all
in-flight commands, the target may return the command.  If we didn't
have some ignore function on command completions while we're handling
errors, this would lead to double completion.

If we decided to allow arbitrary aborts of running commands, we would
send a TMF in during the normal (i.e. un timed out) command period.
Because there's no timeout involved, there's no double free problem.
The race in this case is whether the abort catches the command or not
and to mediate that race we need the normal status return.


I'm not sure why no timeout implies no double free.  There would 
still be a race between the interrupt handler and softirq on one side, 
and the abort handler on the other.  The interrupt handler's call to 
cmd-scsi_done ends up triggering the softirq and thus freeing the 
command with scsi_put_command.  The abort handler, as you mentioned, 
wants the status return so it needs the interrupt handler to run---but 
not the softirq.


A simple way to avoid this could be to skip the softirq processing, by 
marking the request block-layer-complete, and do everything in the abort 
handler.  The interrupt handler would still run and fill in the status.


Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-27 Thread James Bottomley
On Tue, 2014-05-27 at 13:52 +0200, Paolo Bonzini wrote:
 Il 27/05/2014 13:26, James Bottomley ha scritto:
  You could use a different mechanism than a softirq to tell the abort
  were successful, for example by overriding scsi_done.  But with respect
  to the block layer, the mechanics of avoiding the race and double-free
  would probably be the same.
 
  I think there's some confusion about what the race and double free is:
  It only occurs with timeouts.  In a timeout situation, the host had
  decided it's not waiting any longer for the target to respond and
  proceeds to error recovery.  At any time between the host making this
  decision up to the point it kicks the target hard enough to clear all
  in-flight commands, the target may return the command.  If we didn't
  have some ignore function on command completions while we're handling
  errors, this would lead to double completion.
 
  If we decided to allow arbitrary aborts of running commands, we would
  send a TMF in during the normal (i.e. un timed out) command period.
  Because there's no timeout involved, there's no double free problem.
  The race in this case is whether the abort catches the command or not
  and to mediate that race we need the normal status return.
 
 I'm not sure why no timeout implies no double free.  There would 
 still be a race between the interrupt handler and softirq on one side, 
 and the abort handler on the other.

You're assuming the current error handling after timeout methodology.
If we allow arbitrary TMFs, that doesn't hold because we have to wait
for the command completion.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-26 Thread Bart Van Assche
Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
block layer core.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Jens Axboe ax...@fb.com
Cc: Hannes Reinecke h...@suse.de
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Joe Lawrence joe.lawre...@stratus.com
---
 block/blk-softirq.c| 13 +
 include/linux/blkdev.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/block/blk-softirq.c b/block/blk-softirq.c
index 53b1737..fc7d160 100644
--- a/block/blk-softirq.c
+++ b/block/blk-softirq.c
@@ -172,6 +172,19 @@ void blk_complete_request(struct request *req)
 }
 EXPORT_SYMBOL(blk_complete_request);
 
+/**
+ * blk_rq_completed - whether or not a request has been completed
+ *
+ * Intended for debugging purposes only. Any completion handler code should go
+ * into the softirq_done_fn() and/or in the rq_timed_out_fn() request_queue
+ * callback functions.
+ */
+bool blk_rq_completed(struct request *rq)
+{
+   return test_bit(REQ_ATOM_COMPLETE, rq-atomic_flags);
+}
+EXPORT_SYMBOL(blk_rq_completed);
+
 static __init int blk_softirq_init(void)
 {
int i;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0d84981..a621bc5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -952,6 +952,7 @@ extern void blk_complete_request(struct request *);
 extern void __blk_complete_request(struct request *);
 extern void blk_abort_request(struct request *);
 extern void blk_unprep_request(struct request *);
+extern bool blk_rq_completed(struct request *);
 
 /*
  * Access functions for manipulating queue properties
-- 
1.8.4.5

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] block: Introduce blk_rq_completed()

2014-05-26 Thread Hannes Reinecke

On 05/26/2014 05:15 PM, Bart Van Assche wrote:

Make it possible to test the REQ_ATOM_COMPLETE bit from outside the
block layer core.

Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Jens Axboe ax...@fb.com
Cc: Hannes Reinecke h...@suse.de
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Christoph Hellwig h...@infradead.org
Cc: Joe Lawrence joe.lawre...@stratus.com

Reviewed-by: Hannes Reinecke h...@suse.de

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html