Re: [dm-devel] bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-08 Thread Benjamin Block
On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:54am -0400,
> Jens Axboe <ax...@kernel.dk> wrote:
> 
> > On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> > >On Fri, Aug 05 2016 at 11:33P -0400,
> > >Jens Axboe <ax...@kernel.dk> wrote:
> > >
> > >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> > >>>On Wed, Aug 03 2016 at 11:35am -0400,
> > >>>Benjamin Block <bbl...@linux.vnet.ibm.com> wrote:
> > >>>
> > >>>>Hej Mike,
> > >>>>
> > >>>>when running a debug-kernel today with several multipath-devices using
> > >>>>the round-robin path selector I noticed that the kernel throws these
> > >>>>warnings here:
> > >>>>
> > >>>>BUG: using smp_processor_id() in preemptible [] code: 
> > >>>>kdmwork-252:0/881
> > >>>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> > >>>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> > >>>>  617679b8 61767a48 0002 
> > >>>> 
> > >>>>  61767ae8 61767a60 61767a60 
> > >>>> 001145d0
> > >>>>   00b962ae 00bb291e 
> > >>>> 000b
> > >>>>  61767aa8 61767a48  
> > >>>> 
> > >>>>  07b962ae 001145d0 61767a48 
> > >>>> 61767aa8
> > >>>>Call Trace:
> > >>>>([<001144a2>] show_trace+0x8a/0xe0)
> > >>>>([<00114586>] show_stack+0x8e/0xf0)
> > >>>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> > >>>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> > >>>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> > >>>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> > >>>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> > >>>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> > >>>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> > >>>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> > >>>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> > >>>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> > >>>>([<001681da>] kthread+0x112/0x120)
> > >>>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> > >>>>([<00983784>] kernel_thread_starter+0x0/0xc)
> > >>>>no locks held by kdmwork-252:0/881.
> > >>>>
[:snip:]
> > 
> > I always forget the details (if this confuses lockdep or not), but you
> > could potentially turn it into:
> > 
> > local_irq_save(flags);
> > x = this_cpu_ptr();
> > [...]
> > 
> > spin_lock(>lock);
> > [...]
> > 
> > instead.
> 
> Cool, I've coded up the patch (compile tested only).
> 
> Benjamin, any chance you could test this against your v4.7 kernel and
> report back?
> 
> Thanks,
> Mike
> 
> diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
> index 4ace1da..ed446f8 100644
> --- a/drivers/md/dm-round-robin.c
> +++ b/drivers/md/dm-round-robin.c
> @@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
> path_selector *ps, size_t nr_bytes)
>   struct path_info *pi = NULL;
>   struct dm_path *current_path = NULL;
> 
> + local_irq_save(flags);
>   current_path = *this_cpu_ptr(s->current_path);
>   if (current_path) {
>   percpu_counter_dec(>repeat_count);
> - if (percpu_counter_read_positive(>repeat_count) > 0)
> + if (percpu_counter_read_positive(>repeat_count) > 0) {
> + local_irq_restore(flags);
>   return current_path;
> + }
>   }
> 
> - spin_lock_irqsave(>lock, flags);
> + spin_lock(>lock);
>   if (!list_empty(>valid_paths)) {
>   pi = list_entry(s->valid_paths.next, struct path_info, list);
>   list_move_tail(>list, >valid_paths);
> @@ -225,7 +228,8 @@ static struct dm_path *rr_select_path(struct 
&g

Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-07 Thread Benjamin Block
On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:54am -0400,
> Jens Axboe <ax...@kernel.dk> wrote:
> 
> > On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> > >On Fri, Aug 05 2016 at 11:33P -0400,
> > >Jens Axboe <ax...@kernel.dk> wrote:
> > >
> > >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> > >>>On Wed, Aug 03 2016 at 11:35am -0400,
> > >>>Benjamin Block <bbl...@linux.vnet.ibm.com> wrote:
> > >>>
> > >>>>Hej Mike,
> > >>>>
> > >>>>when running a debug-kernel today with several multipath-devices using
> > >>>>the round-robin path selector I noticed that the kernel throws these
> > >>>>warnings here:
> > >>>>
> > >>>>BUG: using smp_processor_id() in preemptible [] code: 
> > >>>>kdmwork-252:0/881
> > >>>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> > >>>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> > >>>>  617679b8 61767a48 0002 
> > >>>> 
> > >>>>  61767ae8 61767a60 61767a60 
> > >>>> 001145d0
> > >>>>   00b962ae 00bb291e 
> > >>>> 000b
> > >>>>  61767aa8 61767a48  
> > >>>> 
> > >>>>  07b962ae 001145d0 61767a48 
> > >>>> 61767aa8
> > >>>>Call Trace:
> > >>>>([<001144a2>] show_trace+0x8a/0xe0)
> > >>>>([<00114586>] show_stack+0x8e/0xf0)
> > >>>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> > >>>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> > >>>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> > >>>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> > >>>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> > >>>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> > >>>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> > >>>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> > >>>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> > >>>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> > >>>>([<001681da>] kthread+0x112/0x120)
> > >>>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> > >>>>([<00983784>] kernel_thread_starter+0x0/0xc)
> > >>>>no locks held by kdmwork-252:0/881.
> > >>>>
> > >>>>
> > >>>>There are several more of these warnings, but all have the same
> > >>>>stack-trace (this is on s390x, but this looks like its only common code)
> > >>>>- sometimes the process-context is multipath.
> > >>>>
> > >>>>Looking at the changes in this function, it rather looks like this is
> > >>>>caused by changes made in commit
> > >>>>b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
> > >>>>'repeat_count' and 'current_path').
> > >>>>
> > >>>>The kernel is a stock v4.7 with some debug options enabled (prominently
> > >>>>DEBUG_PREEMPT). Need any more info?
> > >>>
> > >>>As you can see from commit b0b477c7e0dd9 the round-robin path selector
> > >>>is now using percpu data (pointer) and a percpu_counter.
> > >>>
> > >>>I'm really not sure how else to access this percpu data.
> > >>>
> > >>>Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
> > >>>of getting an answer to how to fix this.
> > >>
> > >>From a quick look, looks like you are using this_cpu_ptr() without
> > >>having preemption disabled.
> > >
> > >Right, that is what it looked like to me too.
> > >
> > >I'm just not sure on what the proper pattern is to fix this.
> > >
> > >I'll look closer though.
> > 
> > I always forget the details (if this confuses lockdep or not), b

Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-12 Thread Benjamin Block
On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> Hello Jens,
> 
> The six patches in this patch series fix the queue lockup I reported
> recently on the linux-block mailing list. Please consider these patches
> for inclusion in the upstream kernel.
> 

Hey Bart,

just out of curiosity. Is this maybe related to similar stuff happening
when CPUs are hot plugged - at least in that the stack gets stuck? Like
in this thread here:
https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html

Would be interesting, because we recently saw similar stuff happening.


Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-13 Thread Benjamin Block
On Wed, Apr 12, 2017 at 06:11:25PM +, Bart Van Assche wrote:
> On Wed, 2017-04-12 at 12:55 +0200, Benjamin Block wrote:
> > On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> > > The six patches in this patch series fix the queue lockup I reported
> > > recently on the linux-block mailing list. Please consider these patches
> > > for inclusion in the upstream kernel.
> >
> > just out of curiosity. Is this maybe related to similar stuff happening
> > when CPUs are hot plugged - at least in that the stack gets stuck? Like
> > in this thread here:
> > https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html
> >
> > Would be interesting, because we recently saw similar stuff happening.
>
> Hello Benjamin,
>
> My proposal is to repeat that test with Jens' for-next branch. If the issue
> still occurs with that tree then please check the contents of
> /sys/kernel/debug/block/*/mq/*/{dispatch,*/rq_list}. That will allow to
> determine whether or not any block layer requests are still pending. If
> running the command below resolves the deadlock then it means that a
> trigger to run a block layer queue is still missing somewhere:
>
> for a in /sys/kernel/debug/block/*/mq/state; do echo run >$a; done
>
> See also git://git.kernel.dk/linux-block.git.
>

Thx for the hint! I'll forward that and see if the affected folks are
willing to reproduce.


    Beste Grüße / Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block
d: GB   
4.13.0-rc4-bsg-regression+ #2
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<00117532>] show_stack+0x8a/0xe0)
 [<00bcbaee>] dump_stack+0x96/0xd8
 [<003cd5fc>] slab_err+0xac/0xc0
 [<003d68e4>] free_debug_processing+0x554/0x570
 [<003d69ae>] __slab_free+0xae/0x618
 [<003d7dce>] kfree+0x44e/0x4a0
 [<00859b4e>] blk_done_softirq+0x146/0x160
 [<00bf4ec0>] __do_softirq+0x3d0/0x840
 [<001662a6>] run_ksoftirqd+0x3e/0xb8
 [<001957fc>] smpboot_thread_fn+0x2f4/0x318
 [<0018f6f6>] kthread+0x166/0x178
 [<00bf3cf2>] kernel_thread_starter+0x6/0xc
 [<00bf3cec>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
FIX kmalloc-1024: Object at 0x4ad9d650 not freed
=
BUG kmalloc-1024 (Tainted: GB  ): Invalid object pointer 
0x4ad9eb90
-

INFO: Slab 0x03d1012b6600 objects=24 used=17 fp=0x4ad99548 
flags=0x3fffc008101
CPU: 2 PID: 20 Comm: ksoftirqd/2 Tainted: GB   
4.13.0-rc4-bsg-regression+ #2
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<00117532>] show_stack+0x8a/0xe0)
 [<00bcbaee>] dump_stack+0x96/0xd8
 [<003cd5fc>] slab_err+0xac/0xc0
 [<003d68e4>] free_debug_processing+0x554/0x570
 [<003d69ae>] __slab_free+0xae/0x618
 [<003d7dce>] kfree+0x44e/0x4a0
 [<00859b4e>] blk_done_softirq+0x146/0x160
 [<00bf4ec0>] __do_softirq+0x3d0/0x840
 [<001662a6>] run_ksoftirqd+0x3e/0xb8
 [<001957fc>] smpboot_thread_fn+0x2f4/0x318
 [<0018f6f6>] kthread+0x166/0x178
 [<00bf3cf2>] kernel_thread_starter+0x6/0xc
 [<00bf3cec>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
FIX kmalloc-1024: Object at 0x4ad9eb90 not freed

> 
> ---
> From 4cd32ee48e334b62b55bff0d380833b978454040 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <h...@lst.de>
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
> 
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
> 
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
> 
> Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc: <sta...@vger.kernel.org> #4.11+
> ---
>  block/bsg-lib.c | 53 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 31 insertions(+), 25 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..215893dbd038 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -100,7 +100,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> 
>   bsg_job_put(job);
>  }
> @@ -129,26 +129,9 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct 
> request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> - struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   int ret;
> 
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
> - job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
> - job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
> -  * allocated */
>   if (req->bio) {
>   ret = bsg_map_buffer(>request_payload, req);
>   if (ret)
> @@ -187,7 +170,6 @@ static void bsg_request_fn(struct request_queue *q)
>  {
>   struct device *dev = q->qu

Re: [RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-10 Thread Benjamin Block
On Thu, Aug 10, 2017 at 10:24:56AM +0200, Johannes Thumshirn wrote:
> On Wed, Aug 09, 2017 at 04:11:18PM +0200, Benjamin Block wrote:
> > +   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
> 
> return !(bc->hdr.flags & BSG_FLAG_Q_AT_TAIL); and make the function return
> bool? I have to admit, this is the 1st time I have seen the above construct.
> 

Hmm yeah, odd. To be honest, I just copied the expression so that it is
obvious that no behavior changed, but just the place. I'll replace it
with what you suggest.



Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-10 Thread Benjamin Block
On Thu, Aug 10, 2017 at 11:35:31AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 09, 2017 at 04:11:19PM +0200, Benjamin Block wrote:
> > Since struct bsg_command is now used in every calling case, we don't
> > need separation of arguments anymore that are contained in the same
> > bsg_command.
> > 
> > Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> > ---
> >  block/bsg.c | 13 ++---
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/block/bsg.c b/block/bsg.c
> > index 8517361a9b3f..6ee2ffca808a 100644
> > --- a/block/bsg.c
> > +++ b/block/bsg.c
> > @@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
> >   * map sg_io_v4 to a request.
> >   */
> >  static struct request *
> > -bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
> > has_write_perm,
> > -   u8 *reply_buffer)
> > +bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
> > +   fmode_t has_write_perm)
> 
> I wish we could just rename the argument to mode and pass on the
> whole file->f_mode while you are cleaning up this code.  That should
> be a separate patch, though.
> 

Hmm, I did a quick pass through the code and the only place this seems
to be used, is to pass it to blk_verify_command() if the subcommand used
in the BSG request is a SCSI Command. And this has the same semantics.
So I guess this would require adjustments to the whole stack, as this is
also used from the 'normal' SG side of the world.



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-10 Thread Benjamin Block
On Fri, Aug 11, 2017 at 12:10:38AM +0200, Benjamin Block wrote:
> On Thu, Aug 10, 2017 at 11:32:17AM +0200, Christoph Hellwig wrote:
> > We can't use an on-stack buffer for the sense data, as drivers will
> > dma to it.  So we should reuse the SCSI init_rq_fn() for the BSG
> > queues and/or implement the same scheme.
> > 
> 
...
> 
>  struct sg_io_v4
>  +--+
>  |  |  
>  | request>++
>  |   + _len |  ||
>  |   (A)|  | BSG Request|
>  |  |  | e.g. struct fc_bsg_request | Depends on BSG 
> implementation
>  |  |  || FC vs. iSCSI vs. ...
>  |  |  ++
>  |  |
>  | response--->++ Used as _Output_
>  |   + max_len  |  || User doesn't initialize
>  |   (B)|  | BSG Reply  | User provides (optional)
>  |  |  | e.g. struct fc_bsg_reply   |   memory; May be NULL.
>  |  |  ||
>  |  |  ++
>  |  |
>  | dout_xferp->+---+ Stuff send on the wire by
>  |   + _len |  |   |   the LLD
>  |   (C)|  | Transport Protocol IU | Aligned on PAGE_SIZE
>  |  |  | e.g. FC-GS-7 CT_IU|
>  |  |  |   |
>  |  |  +---+
>  |  |
>  | din_xferp-->+---+ Buffer for response data by
>  |   + _len |  |   |   the LLD
>  |   (D)|  | Transport Protocol IU | Aligned on PAGE_SIZE
>  |  |  | e.g. FC-GS-7 CT_IU|
>  |  |  |   |
>  |  |  +---+
>  +--+
> 
...
> 
>  struct request (E)
>  +--+
>  |  |  struct scsi_request
>  | scsi_request--->+-+
>  |  |  | |
>  |  |  | cmd-> Copy of (A)
>  |  |  |  + _len | Space in struct or kzalloc
>  |  |  |  (G)|
>  |  |  | |
>  |  |  | sense---> Space for BSG Reply
>  |  |  |  + _len | Same Data-Structure as (B)
>  |  |  |  (H)| NOT actually pointer (B)
>  |  |  | | 'reply_buffer' in my patch 
>  |  |  +-+
>  |  |
>  | bio> Mapped via blk_rq_map_user() to (C) dout_xferp
>  |  |
>  | next_rq-+
>  |  |  |
>  +--+  |
>|
>  struct request (F)|(if used)
>  +--+<-+
>  |  |
>  | scsi_request---> Unused here
>  |  |
>  | bio> Mapped via blk_rq_map_user() to (D) din_xferp
>  |  |
>  +--+
> 
...
> 
>  struct bsg_job
>  +-+
>  | |
>  | request---> (G) scsi_request->cmd -> Copy of (A)
>  |   + _len|   e.g. struct fc_bsg_request
>  | |
>  | reply-> (H) scsi_request->sense -> 'reply_buffer'
>  |   + _len|   e.g. struct fc_bsg_reply
>  | |
>  | request_payload---> struct scatterlist ... map (E)->bio
>  |   + _len|
>  |   (I)   |
>  | |
>  | reply_payload-> struct scatterlist ... map (F)->bio
>  |   + _len|
>  |   (J)   |
>  | |
>  +-+
> 

> 
> This worked till it broke. Right now every driver that tries to access
> (H) will panic the system, or cause very undefined behavior. I suspect
> no driver right now tries to do any DMA into (H); before the regression,
> this has been also an on-stack variable (I suspect since BSG was
> introduced, haven't checked though).
> 
> The asymmetries between the first struct request (E) and the following
> (F) also makes it hard to use the same scheme as in other drivers, where
> init_rq_fn() gets to initialize each request in the same'ish way. Or?
> Just looking at it right now, this would require some bigger rework that
> is not appropriate for a stable bug-fix.
> 

Just some more brain-dump here.

One more problem for direct DMA into (H) in the current BSG setup is
probably, that the transport classes have each their own private format
for the BSG reply (

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-14 Thread Benjamin Block
Hey Christoph,

I looked over the patch in detail, see below.

> From f5b03b82df0569c035022c1c2535696186907f1a Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <h...@lst.de>
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
>
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
>
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
>
> Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc: <sta...@vger.kernel.org> #4.11+
> ---
>  block/bsg-lib.c | 51 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 30 insertions(+), 24 deletions(-)
>
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..c07333c3b785 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -37,13 +37,11 @@ static void bsg_destroy_job(struct kref *kref)
>   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
>   struct request *rq = job->req;
>
> - blk_end_request_all(rq, BLK_STS_OK);
> -
>   put_device(job->dev);   /* release reference for the request */
>
>   kfree(job->request_payload.sg_list);
>   kfree(job->reply_payload.sg_list);
> - kfree(job);
> + blk_end_request_all(rq, BLK_STS_OK);

What is the reason for moving that last line? Just wondering whether
that might change the behavior somehow, although it doesn't look like it
from the code.

>  }
>
>  void bsg_job_put(struct bsg_job *job)
> @@ -100,7 +98,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
>
>   bsg_job_put(job);
>  }
> @@ -129,24 +127,11 @@ static int bsg_map_buffer(struct bsg_buffer *buf, 
> struct request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
>   int ret;
>
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
>   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
>* allocated */
>   if (req->bio) {
> @@ -187,7 +172,6 @@ static void bsg_request_fn(struct request_queue *q)
>  {
>   struct device *dev = q->queuedata;
>   struct request *req;
> - struct bsg_job *job;
>   int ret;
>
>   if (!get_device(dev))
> @@ -207,8 +191,7 @@ static void bsg_request_fn(struct request_queue *q)
>   continue;
>   }
>
> - job = req->special;
> - ret = q->bsg_job_fn(job);
> + ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
>   spin_lock_irq(q->queue_lock);
>   if (ret)
>   break;
> @@ -219,6 +202,27 @@ static void bsg_request_fn(struct request_queue *q)
>   spin_lock_irq(q->queue_lock);
>  }
>
> +static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t 
> gfp)
> +{
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
> +
> + memset(job, 0, sizeof(*job));
> + job->req = req;
> + job->request = job->sreq.cmd;

That doesn't work with bsg.c if the request submitted by the user is
bigger than BLK_MAX_CDB. There is code in blk_fill_sgv4_hdr_rq() that
will reassign the req->cmd point in that case to something else.

This is maybe wrong in the same vein as my Patch 1 is. If not for the
legacy code in bsg.c, setting this here, will miss changes to that
pointer between request-allocation and job-submission.

So maybe just move this to bsg_create_job().

> + job->dd_data = job + 1;
> + job

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-14 Thread Benjamin Block
On Sun, Aug 13, 2017 at 04:39:40PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 06:01:42PM +0200, Benjamin Block wrote:
> > When the BSG interface is used with bsg-lib, and the user sends a
> > Bidirectional command - so when he gives an input- and output-buffer
> > (most users of our interface will likely do that, if they wanna get the
> > transport-level response data) - bsg will allocate two requests from the
> > queue. The first request's bio is used to map the input and the second
> > request's bio for the output (see bsg_map_hdr() in the if-statement with
> > (op == REQ_OP_SCSI_OUT && hdr->din_xfer_len)).
> > 
> > When we now allocate the full space of bsg_job, sense, dd_data for each
> > request, these will be wasted on the (linked) second request. They will
> > go unused all the time, as only the first request's bsg_job, sense and
> > dd_data is used by the LLDs and BSG itself.
> > 
> > Right now, because we don't allocate this on each request, those spaces
> > are only allocated for the first request in bsg-lib.
> > 
> > Maybe we can ignore this, if it gets to complicated, I don't wanne
> > prolong this unnecessary.
> 
> We have the same 'issue' with bidirection scsi commands - it's a side
> effect of having two request structures for these commands, and the
> only real fix would be to have a single request structure, which would
> be nice especially if we can't do it without growing struct request.
> 

Alright, I was not aware of that. That is fair then. Thx.


    Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[RFC PATCH 2/6] bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE

2017-08-09 Thread Benjamin Block
We do set rq->sense_len when we assigne the reply-buffer in
blk_fill_sgv4_hdr_rq(). No point in possibly deviating from this value
later on.

bsg-lib.h specifies:
unsigned int reply_len;
/*
 * On entry : reply_len indicates the buffer size allocated for
 * the reply.
 *
 * ...
 */

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg-lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..c7c2c6bbb5ae 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -147,8 +147,8 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
job->request = rq->cmd;
job->request_len = rq->cmd_len;
job->reply = rq->sense;
-   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
-* allocated */
+   job->reply_len = rq->sense_len;
+
if (req->bio) {
ret = bsg_map_buffer(>request_payload, req);
if (ret)
-- 
2.12.2



[RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-09 Thread Benjamin Block
Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 8517361a9b3f..6ee2ffca808a 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
  * map sg_io_v4 to a request.
  */
 static struct request *
-bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
has_write_perm,
-   u8 *reply_buffer)
+bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
+   fmode_t has_write_perm)
 {
struct request_queue *q = bd->queue;
struct request *rq, *next_rq = NULL;
+   struct sg_io_v4 *hdr = >hdr;
int ret;
unsigned int op, dxfer_len;
void __user *dxferp = NULL;
@@ -244,7 +245,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, 
fmode_t has_write_perm,
if (IS_ERR(rq))
return rq;
 
-   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, reply_buffer,
+   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, bc->reply_buffer,
   has_write_perm);
if (ret)
goto out;
@@ -633,8 +634,7 @@ static int __bsg_write(struct bsg_device *bd, const char 
__user *buf,
/*
 * get a request, fill in the blanks, and add to request queue
 */
-   rq = bsg_map_hdr(bd, >hdr, has_write_perm,
-bc->reply_buffer);
+   rq = bsg_map_hdr(bd, bc, has_write_perm);
if (IS_ERR(rq)) {
ret = PTR_ERR(rq);
rq = NULL;
@@ -934,8 +934,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
goto sg_io_out;
}
 
-   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
-bc->reply_buffer);
+   bc->rq = bsg_map_hdr(bd, bc, file->f_mode & FMODE_WRITE);
if (IS_ERR(bc->rq)) {
ret = PTR_ERR(bc->rq);
goto sg_io_out;
-- 
2.12.2



[RFC PATCH 0/6] bsg: fix regression resulting in panics when sending commands via BSG and some sanity cleanups

2017-08-09 Thread Benjamin Block
Hello all,

Steffen noticed recently that we have a regression in the BSG code that
prevents us from sending any traffic over this interface. After I
researched this a bit, it turned out that this affects not only zFCP, but
likely all LLDs that implements the BSG API. This was introduced in 4.11
(details in Patch 1 of this series).

I imagine the regression happened because of some very "unfortunate"
variable- namings. I can not fix them, as they involve the base struct
request, but I tried to add some cleanups that should make the
relationships between stuff more visible in the future I hope.

Patch 1   - Regression Fix; Also tagged for stable
Patch 2-6 - Cleanups

I tagged this as RFC. Patches 2-6 are a 'nice to have' IMO, Patch 1 is
obviously necessary, and if it is OK, I can re-send it separately if
necessary. If you don't like the changes in the other patches, I don't mind
dropping them.

I am not sure about Patch 4. It certainly works, but it changes
user-visible behavior, in what I believe is within the behavior described
by the SG interface. It makes the different methods of how BSG passes
commands down to the LLDs more conform with each other - even though I
can't make them the exact same. More details in the patch description.

I rebased the series on Jens' for-next and I have function-tested the
series on s390x's zFCP with the tools provided in the zfcp HBA API library
(https://www.ibm.com/developerworks/linux/linux390/zfcp-hbaapi.html) and
some custom code to test the read/write interface of BSG.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (6):
  bsg: fix kernel panic resulting from missing allocation of a
reply-buffer
  bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE
  bsg: scsi-transport: add compile-tests to prevent reply-buffer
overflows
  bsg: refactor ioctl to use regular BSG-command infrastructure for
SG_IO
  bsg: reduce unecessary arguments for bsg_map_hdr()
  bsg: reduce unecessary arguments for blk_complete_sgv4_hdr_rq()

 block/bsg-lib.c |  4 +-
 block/bsg.c | 90 ++---
 drivers/scsi/scsi_transport_fc.c|  3 ++
 drivers/scsi/scsi_transport_iscsi.c |  3 ++
 include/linux/bsg-lib.h |  2 +
 5 files changed, 65 insertions(+), 37 deletions(-)

--
2.12.2



[RFC PATCH 3/6] bsg: scsi-transport: add compile-tests to prevent reply-buffer overflows

2017-08-09 Thread Benjamin Block
The BSG implementations use the bsg_job's reply buffer as storage for their
own custom reply structures (e.g.: struct fc_bsg_reply or
struct iscsi_bsg_reply). The size of bsg_job's reply buffer and those of
the implementations is not dependent in any way the compiler can currently
check.

To make it easier to notice accidental violations add an explicit compile-
time check that tests whether the implementations' reply buffer is at most
as large as bsg_job's.

To do so, we have to move the size-define from bsg.c to a common header.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 3 +--
 drivers/scsi/scsi_transport_fc.c| 3 +++
 drivers/scsi/scsi_transport_iscsi.c | 3 +++
 include/linux/bsg-lib.h | 2 ++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 285b1b8126c3..b924f1c23c58 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -74,8 +75,6 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
-#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
-
 /*
  * our internal command type
  */
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 892fbd9800d9..ce6654b5d329 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3736,6 +3736,9 @@ static int fc_bsg_dispatch(struct bsg_job *job)
 {
struct Scsi_Host *shost = fc_bsg_to_shost(job);
 
+   BUILD_BUG_ON(sizeof(struct fc_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
if (scsi_is_fc_rport(job->dev))
return fc_bsg_rport_dispatch(shost, job);
else
diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index a424eaeafeb0..4e021c949ad7 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1483,6 +1483,9 @@ static int iscsi_bsg_host_dispatch(struct bsg_job *job)
int cmdlen = sizeof(uint32_t);  /* start with length of msgcode */
int ret;
 
+   BUILD_BUG_ON(sizeof(struct iscsi_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
/* check if we have the msgcode value at least */
if (job->request_len < sizeof(uint32_t)) {
ret = -ENOMSG;
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index e34dde2da0ef..85d7c7678cc6 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -25,6 +25,8 @@
 
 #include 
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 struct request;
 struct device;
 struct scatterlist;
-- 
2.12.2



[RFC PATCH 6/6] bsg: reduce unnecessary arguments for blk_complete_sgv4_hdr_rq()

2017-08-09 Thread Benjamin Block
Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 6ee2ffca808a..09f767cdf816 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -399,13 +399,14 @@ static struct bsg_command *bsg_get_done_cmd(struct 
bsg_device *bd)
return bc;
 }
 
-static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
-   struct bio *bio, struct bio *bidi_bio)
+static int blk_complete_sgv4_hdr_rq(struct bsg_command *bc)
 {
+   struct sg_io_v4 *hdr = >hdr;
+   struct request *rq = bc->rq;
struct scsi_request *req = scsi_req(rq);
int ret = 0;
 
-   dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
+   dprintk("rq %p bio %p 0x%x\n", rq, bc->bio, req->result);
/*
 * fill in all the output members
 */
@@ -432,7 +433,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (rq->next_rq) {
hdr->dout_resid = req->resid_len;
hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
-   blk_rq_unmap_user(bidi_bio);
+   blk_rq_unmap_user(bc->bidi_bio);
blk_put_request(rq->next_rq);
} else if (rq_data_dir(rq) == READ)
hdr->din_resid = req->resid_len;
@@ -448,7 +449,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (!ret && req->result < 0)
ret = req->result;
 
-   blk_rq_unmap_user(bio);
+   blk_rq_unmap_user(bc->bio);
scsi_req_free_cmd(req);
blk_put_request(rq);
 
@@ -507,8 +508,7 @@ static int bsg_complete_all_commands(struct bsg_device *bd)
if (IS_ERR(bc))
break;
 
-   tret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-   bc->bidi_bio);
+   tret = blk_complete_sgv4_hdr_rq(bc);
if (!ret)
ret = tret;
 
@@ -542,8 +542,7 @@ __bsg_read(char __user *buf, size_t count, struct 
bsg_device *bd,
 * after completing the request. so do that here,
 * bsg_complete_work() cannot do that for us
 */
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(buf, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
@@ -944,8 +943,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(uarg, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
-- 
2.12.2



[RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-09 Thread Benjamin Block
Before, the SG_IO ioctl for BSG devices used to use its own on-stack data
to assemble and send the specified command. The read and write calls use
their own infrastructure build around the struct bsg_command and a custom
slab-pool for that.

Rafactor this, so that SG_IO ioctl also uses struct bsg_command and the
surrounding infrastructure. This way we use global defines like
BSG_COMMAND_REPLY_BUFFERSIZE only in one place, rather than two, the
handling of BSG commands gets more consistent, and it reduces some code-
duplications (the bio-pointer handling). It also reduces the stack
footprint by 320 to 384 bytes (depending on how large pointers are), and
uses the active slab-implementation for efficient alloc/free.

There are two other side-effects:
 - the 'duration' field in the sg header is now also filled for SG_IO
   calls, unlike before were it was always zero.
 - the BSG device queue-limit is also applied to SG_IO, unlike before were
   you could flood one BSG device with as many commands as you'd like. If
   one can trust older SG documentation this limit is applicable to either
   normal writes, or SG_IO calls; but this wasn't enforced before for
   SG_IO.

A complete unification is not possible, as it then would also enqueue SG_IO
commands in the BGS devices's command list, but this is only for the read-
and write-calls.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 60 
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index b924f1c23c58..8517361a9b3f 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -320,6 +320,17 @@ static void bsg_rq_end_io(struct request *rq, blk_status_t 
status)
wake_up(>wq_done);
 }
 
+static int bsg_prep_add_command(struct bsg_command *bc, struct request *rq)
+{
+   bc->rq = rq;
+   bc->bio = rq->bio;
+   if (rq->next_rq)
+   bc->bidi_bio = rq->next_rq->bio;
+   bc->hdr.duration = jiffies;
+
+   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
+}
+
 /*
  * do final setup of a 'bc' and submit the matching 'rq' to the block
  * layer for io
@@ -327,16 +338,11 @@ static void bsg_rq_end_io(struct request *rq, 
blk_status_t status)
 static void bsg_add_command(struct bsg_device *bd, struct request_queue *q,
struct bsg_command *bc, struct request *rq)
 {
-   int at_head = (0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL));
+   int at_head = bsg_prep_add_command(bc, rq);
 
/*
 * add bc command to busy queue and submit rq for io
 */
-   bc->rq = rq;
-   bc->bio = rq->bio;
-   if (rq->next_rq)
-   bc->bidi_bio = rq->next_rq->bio;
-   bc->hdr.duration = jiffies;
spin_lock_irq(>lock);
list_add_tail(>list, >busy_list);
spin_unlock_irq(>lock);
@@ -916,31 +922,37 @@ static long bsg_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg)
return scsi_cmd_ioctl(bd->queue, NULL, file->f_mode, cmd, uarg);
}
case SG_IO: {
-   struct request *rq;
-   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE] = { 0, };
-   struct bio *bio, *bidi_bio = NULL;
-   struct sg_io_v4 hdr;
+   struct bsg_command *bc;
int at_head;
 
-   if (copy_from_user(, uarg, sizeof(hdr)))
-   return -EFAULT;
+   bc = bsg_alloc_command(bd);
+   if (IS_ERR(bc))
+   return PTR_ERR(bc);
 
-   rq = bsg_map_hdr(bd, , file->f_mode & FMODE_WRITE,
-reply_buffer);
-   if (IS_ERR(rq))
-   return PTR_ERR(rq);
+   if (copy_from_user(>hdr, uarg, sizeof(bc->hdr))) {
+   ret = -EFAULT;
+   goto sg_io_out;
+   }
 
-   bio = rq->bio;
-   if (rq->next_rq)
-   bidi_bio = rq->next_rq->bio;
+   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
+bc->reply_buffer);
+   if (IS_ERR(bc->rq)) {
+   ret = PTR_ERR(bc->rq);
+   goto sg_io_out;
+   }
 
-   at_head = (0 == (hdr.flags & BSG_FLAG_Q_AT_TAIL));
-   blk_execute_rq(bd->queue, NULL, rq, at_head);
-   ret = blk_complete_sgv4_hdr_rq(rq, , bio, bidi_bio);
+   at_head = bsg_prep_add_command(bc, bc->rq);
+   blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
+   bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   if (copy_to_user(uarg, , sizeof(hdr)))
-  

[RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-09 Thread Benjamin Block
In contrast to the normal SCSI-lib, the BSG block-queue doesn't make use of
any extra init_rq_fn() to make additional allocations during
request-creation, and the request sense-pointer is not used to transport
SCSI sense data, but is used as backing for the bsg_job->reply pointer;
that in turn is used in the LLDs to store protocol IUs or similar stuff.
This 're-purposing' of the sense-pointer is done in the BSG blk-lib
(bsg_create_job()), during the queue-processing.

Failing to allocate/assign it results in illegal dereferences because LLDs
use this pointer unquestioned, as can be seen in the various
BSG-implementations:

drivers/scsi/libfc/fc_lport.c:  fc_lport_bsg_request()
drivers/scsi/qla2xxx/qla_bsg.c: qla24xx_bsg_request()
drivers/scsi/qla4xxx/ql4_bsg.c: qla4xxx_process_vendor_specific()
drivers/s390/scsi/zfcp_fc.c:zfcp_fc_ct_els_job_handler()
...

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

To prevent this, allocate a buffer when the BSG blk-request is setup, and
before it is queued for LLD processing.

Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc: <sta...@vger.kernel.org> #4.11+
---
 block/bsg.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 37663b664666..285b1b8126c3 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -74,6 +74,8 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 /*
  * our internal command type
  */
@@ -85,6 +87,7 @@ struct bsg_command {
struct bio *bidi_bio;
int err;
struct sg_io_v4 hdr;
+   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE];
 };
 
 static void bsg_free_command(struct bsg_command *bc)
@@ -137,7 +140,7 @@ static inline struct hlist_head *bsg_dev_idx_hash(int index)
 
 static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
struct sg_io_v4 *hdr, struct bsg_devic

Re: Is the possible cross-talking between unrelated file-descriptors on bsg-device by design?

2017-09-21 Thread Benjamin Block
On Tue, Sep 19, 2017 at 02:16:26PM -0400, Douglas Gilbert wrote:
> On 2017-09-19 10:56 AM, Benjamin Block wrote:
> > Hello linux-block,
> > 
> > I wrote some tests recently to test patches against bsg.c and bsg-lib.c,
> > and while writing those I noticed something strange:
> > 
> > When you use the write() and read() call on multiple file-descriptors
> > for a single bsg-device (FC or SCSI), it is possible that you get
> > cross-talk between those different file-descriptors.
> > 
> > E.g.: You have two independent processes open a FD on /dev/bsg/fc_host0
> > and you send commands via write() in both processes, when they both do
> > read() later on - to read the response for the commands they send before
> > -, it is possible that process A gets the response (Sense-Data,
> > FC-Protocol-Data, ...) for a command that process B wrote and vis-a-vis.
> > 
> > I noticed this because in my tests I 'tag' each command I send with
> > write() via a value I write into the usr_ptr field of sg_io_v4. When I
> > later user read() to receive responses I check for this tag in a
> > hash-table and so can look up the original command. When I used this and
> > spawned multiple processes with the same target bsg-device, I got
> > responses for commands I don't have tags for in my hash-table, so they
> > were written by an other process. This never happend when I only spawn
> > one test-process.
> > 
> > This seems awfully contra-productive.. so much that I don't see any
> > point in allowing users to open more than one FD per bsg-device, because
> > that would inevitably lead to very hard to debug 'bugs'. And I wonder if
> > this is really by design or some regression that happend over time.
> > 
> > I looked at the code in bsg.c and yes, it seems this is what is coded
> > right now. We have a hash-table which manages all 'struct bsg_device's
> > who are indexed by device-minors, so one hash-table entry per
> > device-node.
> > 
> > So eh, long talk short question, is that intended?
> 
> Hi,
> About once a year I point out that major misfeature in the bsg driver
> and no-one seems to care. Most recently I pointed it out in a
> discussion about SCSI SG CVE-2017-0794 6 days ago:
>   " ... Last time I looked at the bsg driver, a SCSI command
> could be issued on one file descriptor and its data-in block
> and SCSI status delivered to another file descriptor to the
> same block device (potentially in another process). IOW chaos"
> 
> It is hard to imagine any sane application relying on this bizarre
> behaviour so fixing it should not break any applications. My guess
> is that it would require a non-trivial patch set to fix. Would you
> like to volunteer?
> 

Interesting. So this is not a regression then.

Well, personally I am intrigued to try to fix this, but professionally
it is not really up to me to choose to work on this. Especially because
as you say, this looks not trivial - at least if you want to let
multiple applications open a FD for a single BSG device node. Lets see.

But thank you for elaborating on this!


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH 1/1] bsg-lib: fix use-after-free under memory-pressure

2017-09-21 Thread Benjamin Block
When under memory-pressure it is possible that the mempool which backs
the 'struct request_queue' will make use of up to BLKDEV_MIN_RQ count
emergency buffers - in case it can't get a regular allocation. These
buffers are preallocated and once they are also used, they are
re-supplied with old finished requests from the same request_queue (see
mempool_free()).

The bug is, when re-supplying the emergency pool, the old requests are
not again ran through the callback mempool_t->alloc(), and thus also not
through the callback bsg_init_rq(). Thus we skip initialization, and
while the sense-buffer still should be good, scsi_request->cmd might
have become to be an invalid pointer in the meantime. When the request
is initialized in bsg.c, and the user's CDB is larger than BLK_MAX_CDB,
bsg will replace it with a custom allocated buffer, which is freed when
the user's command is finished, thus it dangles afterwards. When next a
command is sent by the user that has a smaller/similar CDB as
BLK_MAX_CDB, bsg will assume that scsi_request->cmd is backed by
scsi_request->__cmd, will not make a custom allocation, and write into
undefined memory.

Fix this by splitting bsg_init_rq() into two functions:
 - bsg_init_job() directly replace bsg_init_rq() and only does the
   allocation of the sense-buffer, which is used to back the bsg job's
   reply buffer. This pointer should never change during the lifetime of
   a scsi_request, so it doesn't need re-initialization.
 - bsg_init_rq() is a new function that make use of
   'struct request_queue's initialize_rq_fn callback (which was
   introduced in v4.12). This is always called before the request is
   given out via blk_get_request(). This function does the remaining
   initialization that was previously done in bsg_init_rq(), and will
   also do it when the request is taken from the emergency-pool of the
   backing mempool.

Also rename bsg_exit_rq() into bsg_exit_job(), to make it fit the
name-scheme.

Fixes: 50b4d485528d ("bsg-lib: fix kernel panic resulting from missing 
allocation of reply-buffer")
Cc: <sta...@vger.kernel.org> # 4.11+
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---

Notes:
I did test this on zFCP with FC CT commands send via the ioctl() and
write() system-call. That did work fine. But I would very much
appreciate if anyone could run this against an other HBA or even an
other implementer of bsg-lib, such as now SAS, because I have no access
to such hardware here.

This should make no difference to the normal cases - where each request
is allocated via slab - with- or without this patch; if I didn't miss
anything. Only the order is a bit mixed up - the memset is done after
the sense-allocation, so I have to buffer the sense-pointer for that.
But otherwise there is no difference I am aware of, so it should behave
the same (does for me).

I could not reproduce the memory-pressure case here in the lab.. I
don't see any reason why it should work now, but I am open to
suggestions :)

Beste Grüße / Best regards,
      - Benjamin Block

 block/bsg-lib.c | 41 -
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c82408c7cc3c..634d1557da38 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -203,28 +203,42 @@ static void bsg_request_fn(struct request_queue *q)
spin_lock_irq(q->queue_lock);
 }
 
-static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
+static int bsg_init_job(struct request_queue *q, struct request *req, gfp_t 
gfp)
 {
struct bsg_job *job = blk_mq_rq_to_pdu(req);
struct scsi_request *sreq = >sreq;
 
-   memset(job, 0, sizeof(*job));
+   /* called right after the request is allocated for the request_queue */
 
-   scsi_req_init(sreq);
-   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
-   sreq->sense = kzalloc(sreq->sense_len, gfp);
+   sreq->sense = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
if (!sreq->sense)
return -ENOMEM;
 
-   job->req = req;
-   job->reply = sreq->sense;
-   job->reply_len = sreq->sense_len;
-   job->dd_data = job + 1;
-
return 0;
 }
 
-static void bsg_exit_rq(struct request_queue *q, struct request *req)
+static void bsg_init_rq(struct request *req)
+{
+   struct bsg_job *job = blk_mq_rq_to_pdu(req);
+   struct scsi_request *sreq = >sreq;
+   void *sense = sreq->sense;
+
+   /* called right before the request is given to the request_queue user */
+
+   memset(job, 0, sizeof(*job));
+
+   scsi_req_init(sreq);
+
+   sreq->sense = sense;
+   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
+
+   job->req = req;
+   job->reply = sense;

Is the possible cross-talking between unrelated file-descriptors on bsg-device by design?

2017-09-19 Thread Benjamin Block
Hello linux-block,

I wrote some tests recently to test patches against bsg.c and bsg-lib.c,
and while writing those I noticed something strange:

When you use the write() and read() call on multiple file-descriptors
for a single bsg-device (FC or SCSI), it is possible that you get
cross-talk between those different file-descriptors.

E.g.: You have two independent processes open a FD on /dev/bsg/fc_host0
and you send commands via write() in both processes, when they both do
read() later on - to read the response for the commands they send before
-, it is possible that process A gets the response (Sense-Data,
FC-Protocol-Data, ...) for a command that process B wrote and vis-a-vis.

I noticed this because in my tests I 'tag' each command I send with
write() via a value I write into the usr_ptr field of sg_io_v4. When I
later user read() to receive responses I check for this tag in a
hash-table and so can look up the original command. When I used this and
spawned multiple processes with the same target bsg-device, I got
responses for commands I don't have tags for in my hash-table, so they
were written by an other process. This never happend when I only spawn
one test-process.

This seems awfully contra-productive.. so much that I don't see any
point in allowing users to open more than one FD per bsg-device, because
that would inevitably lead to very hard to debug 'bugs'. And I wonder if
this is really by design or some regression that happend over time.

I looked at the code in bsg.c and yes, it seems this is what is coded
right now. We have a hash-table which manages all 'struct bsg_device's
who are indexed by device-minors, so one hash-table entry per
device-node.

So eh, long talk short question, is that intended?



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 8/9] block: pass full fmode_t to blk_verify_command

2017-10-16 Thread Benjamin Block
0;
> 
>   return -EPERM;
> @@ -234,7 +234,7 @@ static int blk_fill_sghdr_rq(struct request_queue *q, 
> struct request *rq,
> 
>   if (copy_from_user(req->cmd, hdr->cmdp, hdr->cmd_len))
>   return -EFAULT;
> - if (blk_verify_command(req->cmd, mode & FMODE_WRITE))
> + if (blk_verify_command(req->cmd, mode))
>   return -EPERM;
> 
>   /*
> @@ -469,7 +469,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk 
> *disk, fmode_t mode,
>   if (in_len && copy_from_user(buffer, sic->data + cmdlen, in_len))
>   goto error;
> 
> - err = blk_verify_command(req->cmd, mode & FMODE_WRITE);
> + err = blk_verify_command(req->cmd, mode);
>   if (err)
>   goto error;
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 0419c2298eab..92fd870e1315 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -217,7 +217,7 @@ static int sg_allow_access(struct file *filp, unsigned 
> char *cmd)
>   if (sfp->parentdp->device->type == TYPE_SCANNER)
>   return 0;
> 
> - return blk_verify_command(cmd, filp->f_mode & FMODE_WRITE);
> + return blk_verify_command(cmd, filp->f_mode);
>  }
> 
>  static int
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 02fa42d24b52..75fe9d45ead1 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1371,7 +1371,7 @@ static inline int sb_issue_zeroout(struct super_block 
> *sb, sector_t block,
>   gfp_mask, 0);
>  }
> 
> -extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> +extern int blk_verify_command(unsigned char *cmd, fmode_t mode);
> 
>  enum blk_default_limits {
>   BLK_MAX_SEGMENTS= 128,
> -- 
> 2.14.1
> 

Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 6/9] bsg-lib: introduce a timeout field in struct bsg_job

2017-10-16 Thread Benjamin Block
On Tue, Oct 03, 2017 at 12:48:42PM +0200, Christoph Hellwig wrote:
> The zfcp driver wants to know the timeout for a bsg job, so add a field
> to struct bsg_job for it in preparation of not exposing the request
> to the bsg-lib users.
> 
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
>  block/bsg-lib.c | 1 +
>  drivers/s390/scsi/zfcp_fc.c | 4 ++--
>  include/linux/bsg-lib.h | 2 ++
>  3 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index 15d25ccd51a5..0d5bbf6a2ddd 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -132,6 +132,7 @@ static int bsg_prepare_job(struct device *dev, struct 
> request *req)
>   struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   int ret;
> 
> + job->timeout = req->timeout;
>   job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> 
> diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c
> index 8210645c2111..9d6f69b92c81 100644
> --- a/drivers/s390/scsi/zfcp_fc.c
> +++ b/drivers/s390/scsi/zfcp_fc.c
> @@ -960,7 +960,7 @@ static int zfcp_fc_exec_els_job(struct bsg_job *job,
>   d_id = ntoh24(bsg_request->rqst_data.h_els.port_id);
> 
>   els->handler = zfcp_fc_ct_els_job_handler;
> - return zfcp_fsf_send_els(adapter, d_id, els, job->req->timeout / HZ);
> + return zfcp_fsf_send_els(adapter, d_id, els, job->timeout / HZ);
>  }
> 
>  static int zfcp_fc_exec_ct_job(struct bsg_job *job,
> @@ -979,7 +979,7 @@ static int zfcp_fc_exec_ct_job(struct bsg_job *job,
>   return ret;
> 
>   ct->handler = zfcp_fc_ct_job_handler;
> - ret = zfcp_fsf_send_ct(wka_port, ct, NULL, job->req->timeout / HZ);
> + ret = zfcp_fsf_send_ct(wka_port, ct, NULL, job->timeout / HZ);
>   if (ret)
>   zfcp_fc_wka_port_put(wka_port);
> 
> diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
> index b1be0233ce35..402223c95ce1 100644
> --- a/include/linux/bsg-lib.h
> +++ b/include/linux/bsg-lib.h
> @@ -44,6 +44,8 @@ struct bsg_job {
> 
>   struct kref kref;
> 
> + unsigned int timeout;
> +
>   /* Transport/driver specific request/reply structs */
>   void *request;
>   void *reply;
> -- 
> 2.14.1
> 

Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 1/2] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-09-06 Thread Benjamin Block
On Wed, Sep 06, 2017 at 08:07:43AM -0600, Jens Axboe wrote:
> On 09/06/2017 07:44 AM, Christoph Hellwig wrote:
> > From: Benjamin Block <bbl...@linux.vnet.ibm.com>
> > 
> > Since we split the scsi_request out of struct request bsg fails to
> > provide a reply-buffer for the drivers. This was done via the pointer
> > for sense-data, that is not preallocated anymore.
> 
> Confused, this is already in master, was included before 4.13 was
> finalized.
> 

I think this is the backport for 4.11 and 4.12 as requested by Greg. I
just send a mail as answer that I would do it, but I guess Christoph
already had something in the pipe.

Or? I'll take a look.


Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH v2 1/1] bsg-lib: fix use-after-free under memory-pressure

2017-10-01 Thread Benjamin Block
When under memory-pressure it is possible that the mempool which backs
the 'struct request_queue' will make use of up to BLKDEV_MIN_RQ count
emergency buffers - in case it can't get a regular allocation. These
buffers are preallocated and once they are also used, they are
re-supplied with old finished requests from the same request_queue (see
mempool_free()).

The bug is, when re-supplying the emergency pool, the old requests are
not again ran through the callback mempool_t->alloc(), and thus also not
through the callback bsg_init_rq(). Thus we skip initialization, and
while the sense-buffer still should be good, scsi_request->cmd might
have become to be an invalid pointer in the meantime. When the request
is initialized in bsg.c, and the user's CDB is larger than BLK_MAX_CDB,
bsg will replace it with a custom allocated buffer, which is freed when
the user's command is finished, thus it dangles afterwards. When next a
command is sent by the user that has a smaller/similar CDB as
BLK_MAX_CDB, bsg will assume that scsi_request->cmd is backed by
scsi_request->__cmd, will not make a custom allocation, and write into
undefined memory.

Fix this by splitting bsg_init_rq() into two functions:
 - bsg_init_rq() is changed to only do the allocation of the
   sense-buffer, which is used to back the bsg job's reply buffer. This
   pointer should never change during the lifetime of a scsi_request, so
   it doesn't need re-initialization.
 - bsg_initialize_rq() is a new function that makes use of
   'struct request_queue's initialize_rq_fn callback (which was
   introduced in v4.12). This is always called before the request is
   given out via blk_get_request(). This function does the remaining
   initialization that was previously done in bsg_init_rq(), and will
   also do it when the request is taken from the emergency-pool of the
   backing mempool.

Fixes: 50b4d485528d ("bsg-lib: fix kernel panic resulting from missing 
allocation of reply-buffer")
Cc: <sta...@vger.kernel.org> # 4.11+
Reviewed-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---

Notes:
I did test this on zFCP with FC CT commands send via the ioctl() and
write() system-call. That did work fine. But I would very much
appreciate if anyone could run this against an other HBA or even an
other implementer of bsg-lib, such as now SAS, because I have no access
to such hardware here.

This should make no difference to the normal cases - where each request
is allocated via slab - with- or without this patch; if I didn't miss
anything. Only the order is a bit mixed up - the memset is done after
the sense-allocation, so I have to buffer the sense-pointer for that.
But otherwise there is no difference I am aware of, so it should behave
the same (does for me).

I could not reproduce the memory-pressure case here in the lab.. I
don't see any reason why it should work now, but I am open to
suggestions :)

Beste Grüße / Best regards,
  - Benjamin Block

Changes in v2:
 - Renamed function names to reflect the function-pointer names in
   request_queue

 block/bsg-lib.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c82408c7cc3c..ff5a158b22fa 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -208,20 +208,34 @@ static int bsg_init_rq(struct request_queue *q, struct 
request *req, gfp_t gfp)
struct bsg_job *job = blk_mq_rq_to_pdu(req);
struct scsi_request *sreq = >sreq;
 
-   memset(job, 0, sizeof(*job));
+   /* called right after the request is allocated for the request_queue */
 
-   scsi_req_init(sreq);
-   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
-   sreq->sense = kzalloc(sreq->sense_len, gfp);
+   sreq->sense = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
if (!sreq->sense)
return -ENOMEM;
 
+   return 0;
+}
+
+static void bsg_initialize_rq(struct request *req)
+{
+   struct bsg_job *job = blk_mq_rq_to_pdu(req);
+   struct scsi_request *sreq = >sreq;
+   void *sense = sreq->sense;
+
+   /* called right before the request is given to the request_queue user */
+
+   memset(job, 0, sizeof(*job));
+
+   scsi_req_init(sreq);
+
+   sreq->sense = sense;
+   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
+
job->req = req;
-   job->reply = sreq->sense;
+   job->reply = sense;
job->reply_len = sreq->sense_len;
job->dd_data = job + 1;
-
-   return 0;
 }
 
 static void bsg_exit_rq(struct request_queue *q, struct request *req)
@@ -252,6 +266,7 @@ struct request_queue *bsg_setup_queue(struct device *dev, 
const char *name,
q->cmd_size = si

Re: [PATCH 1/1] bsg-lib: fix use-after-free under memory-pressure

2017-09-25 Thread Benjamin Block
On Mon, Sep 25, 2017 at 08:53:07AM -0700, Christoph Hellwig wrote:
> > if (!q)
> > return ERR_PTR(-ENOMEM);
> > q->cmd_size = sizeof(struct bsg_job) + dd_job_size;
> > -   q->init_rq_fn = bsg_init_rq;
> > -   q->exit_rq_fn = bsg_exit_rq;
> > +   q->init_rq_fn = bsg_init_job;
> > +   q->exit_rq_fn = bsg_exit_job;
> > +   q->initialize_rq_fn = bsg_init_rq;
> 
> Please use function names that match the method names, that is keep
> the existing names and name the new helper bsg_initialize_rq;
>

OK, I can change that.


Beste Grüße / Best regards,
  - Benjamin Block

> 
> Except for that the patch looks fine to me:
> 
> Reviewed-by: Christoph Hellwig <h...@lst.de>
> 

-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-08-24 Thread Benjamin Block
On Thu, Aug 24, 2017 at 10:45:56AM +0200, Christoph Hellwig wrote:
> >  /**
> > - * bsg_destroy_job - routine to teardown/delete a bsg job
> > + * bsg_teardown_job - routine to teardown a bsg job
> >   * @job: bsg_job that is to be torn down
> >   */
> > -static void bsg_destroy_job(struct kref *kref)
> > +static void bsg_teardown_job(struct kref *kref)
> 
> Why this rename?  The destroy name seems to be one of the most
> common patterns for the kref_put callbacks.
>

Hmm, I did it mostly so it is symmetric with bsg_prepare_job() and it
doesn't really itself destroy the job-struct anymore. If there are other
thing amiss I can change that along with them, if it bothers poeple.


Beste Grüße / Best regards,
      - Benjamin Block

> 
> Otherwise this looks fine:
> 
> Reviewed-by: Christoph Hellwig <h...@lst.de>
> 

-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG

2017-08-23 Thread Benjamin Block
Hello all,

This is the second try for fixing the regression in the BSG-interface that
exists since v4.11 (for more infos see the first series).

I separated my other changes from the bug-fix so that it is easier to apply
if judged good. I will rebase my cleanups I sent in v1 and send them when I
get a bit more time. But the regression-fix is more important, so here's
that.

I did some more tests on it than on v1, including some heavy parallel I/O
on the same blk-queue using both BSG and the normal SCSI-stack at the same
time (throwing some intentional bad commands in it too). That seemed to
work all well enough - i.e. it didn't crash and got the expected results. I
haven't done any external error-inject, but IMO that would be beyond the
scope right now.

The fix is based on Christoph's idea, I discussed this with him off-list
already.

I rebased the series on Jens' for-next.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (1):
  bsg-lib: fix kernel panic resulting from missing allocation of
reply-buffer

 block/bsg-lib.c | 74 +
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

-- 
Linux on z Systems Development / IBM Systems & Technology Group
   IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-08-23 Thread Benjamin Block
Since we split the scsi_request out of struct request bsg fails to
provide a reply-buffer for the drivers. This was done via the pointer
for sense-data, that is not preallocated anymore.

Failing to allocate/assign it results in illegal dereferences because
LLDs use this pointer unquestioned.

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

This patch moves bsg-lib to allocate and setup struct bsg_job ahead of
time, including the allocation of a buffer for the reply-data.

This means, struct bsg_job is not allocated separately anymore, but as part
of struct request allocation - similar to struct scsi_cmd. Reflect this in
the function names that used to handle creation/destruction of struct
bsg_job.

Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Suggested-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc: <sta...@vger.kernel.org> #4.11+
---
 block/bsg-lib.c | 74 +
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..dd56d7460cb9 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -29,26 +29,25 @@
 #include 
 
 /**
- * bsg_destroy_job - routine to teardown/delete a bsg job
+ * bsg_teardown_job - routine to teardown a bsg job
  * @job: bsg_job that is to be torn down
  */
-static void bsg_destroy_job(struct kref *kref)
+static void bsg_teardown_job(struct kref *kref)
 {
struct bsg_job *job = container_of(kref, struct bsg_job, kref);
struct request *rq = job->req;
 
-   blk_end_request_all(rq, BLK_STS_OK);
-
put_device(job->dev);   /* release reference for the request */
 
kfree(job->request_payload.sg_list);
kfree(job->reply_payload.sg_list);
-   kfree(job);
+
+   blk_end_request_all(rq, BLK_STS_OK);
 }
 
 void bsg_job_put(struct bsg_job *job)
 {
-   kref_put(>kref, bsg_destroy_job);
+   kref_p

Re: [PATCH 9/9] bsg: split handling of SCSI CDBs vs transport requeues

2017-10-24 Thread Benjamin Block
> +static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 
> *hdr)
> +{
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> + int ret = 0;
> +
> + /*
> +  * The assignments below don't make much sense, but are kept for
> +  * bug by bug backwards compatibility:
> +  */
> + hdr->device_status = job->result & 0xff;
> + hdr->transport_status = host_byte(job->result);
> + hdr->driver_status = driver_byte(job->result);
> + hdr->info = 0;
> + if (hdr->device_status || hdr->transport_status || hdr->driver_status)
> + hdr->info |= SG_INFO_CHECK;
> + hdr->response_len = 0;
> +
> + if (job->result < 0) {
> + /* we're only returning the result field in the reply */
> + job->reply_len = sizeof(u32);
> + ret = job->result;
> + }
> +
> + if (job->reply_len && hdr->response) {
> + int len = min(hdr->max_response_len, job->reply_len);
> +
> + if (copy_to_user(ptr64(hdr->response), job->reply, len))
> + ret = -EFAULT;
> + else
> + hdr->response_len = len;
> + }
> +
> + /* we assume all request payload was transferred, residual == 0 */
> + hdr->dout_resid = 0;
> +
> + if (rq->next_rq) {
> + unsigned int rsp_len = blk_rq_bytes(rq->next_rq);
> +
> + if (WARN_ON(job->reply_payload_rcv_len > rsp_len))

This gives my a lot of new Warnings when running my tests on zFCP, non
of which happen when I run on 4.13.

After browsing the source some, I figured this is because in comparison
to the old code, blk_rq_bytes() is now called after we finished the
blk-request already and blk_update_bidi_request()+blk_update_request()
was already called. This will update rq->__data_len, and thus the call
here to get rsp_len will get a wrong value. Thus the warnings, and the
following calculation is actually wrong.

I figure you can just replace this call for blk_rq_bytes() with
'job->reply_payload.payload_len'. Its essentially the same value, but
the later is not changed after the job is committed as far as I could see
in the source. Driver makes use of it, but only reading as far as I
could make out after browsing the code for a bit.

I did a quick test with that change in place and that seems to work fine
now. As far as my tests go, they behave as they did before.


Beste Grüße / Best regards,
  - Benjamin Block

> + hdr->din_resid = 0;
> + else
> + hdr->din_resid = rsp_len - job->reply_payload_rcv_len;
> + } else {
> + hdr->din_resid = 0;
> + }
> +
> + return ret;
> +}
> +

--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 9/9] bsg: split handling of SCSI CDBs vs transport requeues

2017-10-19 Thread Benjamin Block
b.c
> @@ -2105,8 +2105,6 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct 
> request_queue *q)
>  {
>   struct device *dev = shost->dma_dev;
>
> - queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
> -
>   /*
>* this limit is imposed by hardware restrictions
>*/
> @@ -2202,6 +2200,7 @@ struct request_queue *scsi_old_alloc_queue(struct 
> scsi_device *sdev)
>   }
>
>   __scsi_init_queue(shost, q);
> + queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
>   blk_queue_prep_rq(q, scsi_prep_fn);
>   blk_queue_unprep_rq(q, scsi_unprep_fn);
>   blk_queue_softirq_done(q, scsi_softirq_done);
> @@ -2231,6 +2230,7 @@ struct request_queue *scsi_mq_alloc_queue(struct 
> scsi_device *sdev)
>
>   sdev->request_queue->queuedata = sdev;
>   __scsi_init_queue(sdev->host, sdev->request_queue);
> + queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, 
> sdev->request_queue);
>   return sdev->request_queue;
>  }
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index bf53356f41f0..bfb4e6875643 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1262,8 +1262,7 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
>   transport_add_device(>sdev_gendev);
>   sdev->is_visible = 1;
>
> - error = bsg_register_queue(rq, >sdev_gendev, NULL, NULL);
> -
> + error = bsg_scsi_register_queue(rq, >sdev_gendev);
>   if (error)
>   /* we're treating error on bsg register as non-fatal,
>* so pretend nothing went wrong */
> diff --git a/drivers/scsi/scsi_transport_sas.c 
> b/drivers/scsi/scsi_transport_sas.c
> index 736a1f4f9676..4889bd432382 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -228,7 +228,6 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, 
> struct sas_rphy *rphy)
>*/
>   blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
>   queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
> - queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
>   return 0;
>  }
>
> diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
> index 08762d297cbd..28a7ccc55c89 100644
> --- a/include/linux/bsg-lib.h
> +++ b/include/linux/bsg-lib.h
> @@ -38,7 +38,6 @@ struct bsg_buffer {
>  };
>
>  struct bsg_job {
> - struct scsi_request sreq;
>   struct device *dev;
>
>   struct kref kref;
> @@ -64,6 +63,9 @@ struct bsg_job {
>   struct bsg_buffer request_payload;
>   struct bsg_buffer reply_payload;
>
> + int result;
> + unsigned int reply_payload_rcv_len;
> +
>   void *dd_data;  /* Used for driver-specific storage */
>  };
>
> diff --git a/include/linux/bsg.h b/include/linux/bsg.h
> index 7173f6e9d2dd..ed98946a8324 100644
> --- a/include/linux/bsg.h
> +++ b/include/linux/bsg.h
> @@ -1,33 +1,42 @@
> -#ifndef BSG_H
> -#define BSG_H
> +#ifndef _LINUX_BSG_H
> +#define _LINUX_BSG_H
>
>  #include 
>
> +struct request;
> +
> +#ifdef CONFIG_BLK_DEV_BSG
> +struct bsg_ops {
> + int (*check_proto)(struct sg_io_v4 *hdr);
> + int (*fill_hdr)(struct request *rq, struct sg_io_v4 *hdr,
> + fmode_t mode);
> + int (*complete_rq)(struct request *rq, struct sg_io_v4 *hdr);
> + void(*free_rq)(struct request *rq);
> +};
>
> -#if defined(CONFIG_BLK_DEV_BSG)
>  struct bsg_class_device {
>   struct device *class_dev;
>   struct device *parent;
>   int minor;
>   struct request_queue *queue;
>   struct kref ref;
> + const struct bsg_ops *ops;
>   void (*release)(struct device *);
>  };
>
> -extern int bsg_register_queue(struct request_queue *q,
> -   struct device *parent, const char *name,
> -   void (*release)(struct device *));
> -extern void bsg_unregister_queue(struct request_queue *);
> +int bsg_register_queue(struct request_queue *q, struct device *parent,
> + const char *name, const struct bsg_ops *ops,
> + void (*release)(struct device *));
> +int bsg_scsi_register_queue(struct request_queue *q, struct device *parent);
> +void bsg_unregister_queue(struct request_queue *);
>  #else
> -static inline int bsg_register_queue(struct request_queue *q,
> -  struct device *parent, const char *name,
> -  void (*release)(struct device *))
> +static inline int bsg_scsi_register_queue(struct request_queue *q,
> + struct device *parent)
>  {
>   return 0;
>  }
>  static inline void bsg_unregister_queue(struct request_queue *q)
>  {
>  }
> -#endif
> -
> -#endif
> +#endif /* CONFIG_BLK_DEV_BSG */
> +#endif /* _LINUX_BSG_H */
> --
> 2.14.1
>

Otherwise I haven't seen anything. I'll run it through my tests on zFCP
and look whether I see something strange happening.

I like the idea of splitting things up here, it gets rid of some
code-duplications and unnecessary double book-keeping. Although I have
to say, looking at functions like bsg_transport_complete_rq() and
bsg_scsi_complete_rq() it also creates some new duplications that
haven't been there before.


Beste Grüße / Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 9/9] bsg: split handling of SCSI CDBs vs transport requeues

2017-10-20 Thread Benjamin Block
On Fri, Oct 20, 2017 at 06:26:30PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 19, 2017 at 05:59:33PM +0200, Benjamin Block wrote:
> > > +#define ptr64(val) ((void __user *)(uintptr_t)(val))
> > 
> > Better to reflect the special property, that it is a user pointer, in
> > the name of the macro. Maybe something like user_ptr(64). The same
> > comment for the same macro in bsg.c.
> 
> Not sure it's worth it especially now that Martin has merged the patch.

He did? I only saw a mail that he picked patches 2-5. So all the bsg
changes are still open I think.

(Maybe I just missed that, I haven't exactly followed the list very
closely as of late)

> But given how many interface we have all over the kernel that use a u64
> to store a user pointer in ioctls and similar it might make sense to
> lift a helper like this to a generic header.  In that case we'll need
> a more descriptive name for sure.
> 
> > > +static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
> > > +{
> > > + if (hdr->protocol != BSG_PROTOCOL_SCSI  ||
> > > + hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_TRANSPORT)
> > > + return -EINVAL;
> > > + if (!capable(CAP_SYS_RAWIO))
> > > + return -EPERM;
> > 
> > Any particular reason why this is not symmetric with bsg_scsi? IOW
> > permission checking done in bsg_transport_fill_hdr(), like it is done in
> > bsg_scsi_fill_hdr()?
> > 
> > We might save some time copying memory with this (we also only talk
> > about ~20 bytes here), but on the other hand the interface would be more
> > clean otherwise IMO (if we already do restructure the interface) -
> > similar callbacks have similar responsibilities.
> 
> I could move the capable check around, no sure why I had done it that
> way, it's been a while.  Probably because blk_verify_command needs the
> CDB while a simple capable() check does not.

That was my guess, too. I just though it would be more consistent otherwise.
Its not a big thing, really.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294