Re: [PATCH] bsg: convert to use blk-mq

2018-10-26 Thread Benjamin Block
On Fri, Oct 26, 2018 at 10:08:30AM -0600, Jens Axboe wrote:
> On 10/26/18 10:06 AM, Benjamin Block wrote:
> > On Thu, Oct 25, 2018 at 05:12:59PM -0600, Jens Axboe wrote:
> >> On 10/23/18 12:07 PM, Jens Axboe wrote:
> >>> On 10/23/18 11:40 AM, Benjamin Block wrote:
> >>>>
> >>>> So I tried 4.19.0 with only the two patches from you:
> >>>> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions=2b2ffa16193e9a69a076595ed64429b8cc9b42aa
> >>>> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions=142dc9f36e3113b6a76d472978c33c8c2a2b702c
> >>>>
> >>>> This fixed the first warning from before, as you suggested, but it still
> >>>> crash like this:
> >>>>
> >>>> [ ] Unable to handle kernel pointer dereference in virtual kernel 
> >>>> address space
> >>>> [ ] Failing address:  TEID: 0483
> >>>> [ ] Fault in home space mode while using kernel ASCE.
> >>>> [ ] AS:025f0007 R3:dffb8007 S:dffbf000 
> >>>> P:013d
> >>>> [ ] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >>>> [ ] Modules linked in: 
> >>>> [ ] CPU: 2 PID: 609 Comm: kworker/2:1H Kdump: loaded Tainted: GW 
> >>>> 4.19.0-bb-next+ #1
> >>>> [ ] Hardware name: IBM 3906 M03 704 (LPAR)
> >>>> [ ] Workqueue: kblockd blk_mq_run_work_fn
> >>>> [ ] Krnl PSW : 0704e0018000 03ff806a6b40 
> >>>> (zfcp_fc_exec_bsg_job+0x1c0/0x440 [zfcp])
> >>>> [ ]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 
> >>>> EA:3
> >>>> [ ] Krnl GPRS:  83e0f3c0  
> >>>> 0300
> >>>> [ ]0300 03ff806a6b3a a86b5948 
> >>>> a86b5988
> >>>> [ ]83e0f3f0  a86b5938 
> >>>> 984aee80
> >>>> [ ]a86b5800 03ff806ba950 03ff806a6b3a 
> >>>> 98a5ed88
> >>>> [ ] Krnl Code: 03ff806a6b30: b9040029lgr %r2,%r9
> >>>>03ff806a6b34: c0e58b7ebrasl   
> >>>> %r14,3ff80698230
> >>>>   #03ff806a6b3a: e310f0a4lg  
> >>>> %r1,160(%r15)
> >>>>   >03ff806a6b40: e3109024stg %r1,0(%r9)
> >>>>03ff806a6b46: 4120a040la  %r2,64(%r10)
> >>>>03ff806a6b4a: c0e58ae7brasl   
> >>>> %r14,3ff80698118
> >>>>03ff806a6b50: e310a044lg  %r1,64(%r10)
> >>>>03ff806a6b56: e310f0a00024stg 
> >>>> %r1,160(%r15)
> >>>> [ ] Call Trace:
> >>>> [ ] ([<03ff806a6aa2>] zfcp_fc_exec_bsg_job+0x122/0x440 [zfcp])
> >>>> [ ]  [<03ff8061e928>] fc_bsg_dispatch+0x310/0x398 [scsi_transport_fc]
> >>>> [ ]  [<00d2995a>] bsg_queue_rq+0x15a/0x198
> >>>> [ ]  [<00d03566>] blk_mq_dispatch_rq_list+0x966/0xf90
> >>>> [ ]  [<00d0e37a>] blk_mq_sched_dispatch_requests+0x3fa/0x410
> >>>> [ ]  [<00cfc230>] __blk_mq_run_hw_queue+0x218/0x248
> >>>> [ ]  [<001cb124>] process_one_work+0x8c4/0xe90
> >>>> [ ]  [<001cbe58>] worker_thread+0x768/0xbb0
> >>>> [ ]  [<001dc67a>] kthread+0x22a/0x248
> >>>> [ ]  [<0137b812>] kernel_thread_starter+0x6/0xc
> >>>> [ ]  [<0137b80c>] kernel_thread_starter+0x0/0xc
> >>>> [ ] INFO: lockdep is turned off.
> >>>> [ ] Last Breaking-Event-Address:
> >>>> [ ]  [<005d9b08>] __asan_store8+0x98/0xa0
> >>>> [ ]
> >>>> [ ] Kernel panic - not syncing: Fatal exception: panic_on_oops
> >>>>
> >>>> zfcp_fc_exec_bsg_job+0x1c0 is here:
> >>>>
> >>>> int zfcp_fc_exec_bsg_job(struct bsg_job *job)
> >>>> {
> >>>> struct Scsi_Host *shost;
> >>>> struct zfcp_adapter *adapter;
> >>>> struct zfcp_fsf_ct_els *ct_els = job->dd_data;
> >>>

Re: [PATCH] bsg: convert to use blk-mq

2018-10-26 Thread Benjamin Block
On Thu, Oct 25, 2018 at 05:12:59PM -0600, Jens Axboe wrote:
> On 10/23/18 12:07 PM, Jens Axboe wrote:
> > On 10/23/18 11:40 AM, Benjamin Block wrote:
> >>
> >> So I tried 4.19.0 with only the two patches from you:
> >> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions=2b2ffa16193e9a69a076595ed64429b8cc9b42aa
> >> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions=142dc9f36e3113b6a76d472978c33c8c2a2b702c
> >>
> >> This fixed the first warning from before, as you suggested, but it still
> >> crash like this:
> >>
> >> [ ] Unable to handle kernel pointer dereference in virtual kernel address 
> >> space
> >> [ ] Failing address:  TEID: 0483
> >> [ ] Fault in home space mode while using kernel ASCE.
> >> [ ] AS:025f0007 R3:dffb8007 S:dffbf000 
> >> P:013d
> >> [ ] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >> [ ] Modules linked in: 
> >> [ ] CPU: 2 PID: 609 Comm: kworker/2:1H Kdump: loaded Tainted: GW   
> >>   4.19.0-bb-next+ #1
> >> [ ] Hardware name: IBM 3906 M03 704 (LPAR)
> >> [ ] Workqueue: kblockd blk_mq_run_work_fn
> >> [ ] Krnl PSW : 0704e0018000 03ff806a6b40 
> >> (zfcp_fc_exec_bsg_job+0x1c0/0x440 [zfcp])
> >> [ ]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
> >> [ ] Krnl GPRS:  83e0f3c0  
> >> 0300
> >> [ ]0300 03ff806a6b3a a86b5948 
> >> a86b5988
> >> [ ]83e0f3f0  a86b5938 
> >> 984aee80
> >> [ ]a86b5800 03ff806ba950 03ff806a6b3a 
> >> 98a5ed88
> >> [ ] Krnl Code: 03ff806a6b30: b9040029lgr %r2,%r9
> >>03ff806a6b34: c0e58b7ebrasl   
> >> %r14,3ff80698230
> >>   #03ff806a6b3a: e310f0a4lg  %r1,160(%r15)
> >>   >03ff806a6b40: e3109024stg %r1,0(%r9)
> >>03ff806a6b46: 4120a040la  %r2,64(%r10)
> >>03ff806a6b4a: c0e58ae7brasl   
> >> %r14,3ff80698118
> >>03ff806a6b50: e310a044lg  %r1,64(%r10)
> >>03ff806a6b56: e310f0a00024stg %r1,160(%r15)
> >> [ ] Call Trace:
> >> [ ] ([<03ff806a6aa2>] zfcp_fc_exec_bsg_job+0x122/0x440 [zfcp])
> >> [ ]  [<03ff8061e928>] fc_bsg_dispatch+0x310/0x398 [scsi_transport_fc]
> >> [ ]  [<00d2995a>] bsg_queue_rq+0x15a/0x198
> >> [ ]  [<00d03566>] blk_mq_dispatch_rq_list+0x966/0xf90
> >> [ ]  [<00d0e37a>] blk_mq_sched_dispatch_requests+0x3fa/0x410
> >> [ ]  [<00cfc230>] __blk_mq_run_hw_queue+0x218/0x248
> >> [ ]  [<001cb124>] process_one_work+0x8c4/0xe90
> >> [ ]  [<001cbe58>] worker_thread+0x768/0xbb0
> >> [ ]  [<001dc67a>] kthread+0x22a/0x248
> >> [ ]  [<0137b812>] kernel_thread_starter+0x6/0xc
> >> [ ]  [<0137b80c>] kernel_thread_starter+0x0/0xc
> >> [ ] INFO: lockdep is turned off.
> >> [ ] Last Breaking-Event-Address:
> >> [ ]  [<005d9b08>] __asan_store8+0x98/0xa0
> >> [ ]
> >> [ ] Kernel panic - not syncing: Fatal exception: panic_on_oops
> >>
> >> zfcp_fc_exec_bsg_job+0x1c0 is here:
> >>
> >> int zfcp_fc_exec_bsg_job(struct bsg_job *job)
> >> {
> >> struct Scsi_Host *shost;
> >> struct zfcp_adapter *adapter;
> >> struct zfcp_fsf_ct_els *ct_els = job->dd_data;
> >> struct fc_bsg_request *bsg_request = job->request;
> >> struct fc_rport *rport = fc_bsg_to_rport(job);
> >>
> >> shost = rport ? rport_to_shost(rport) : fc_bsg_to_shost(job);
> >> adapter = (struct zfcp_adapter *)shost->hostdata[0];
> >>
> >> if (!(atomic_read(>status) & ZFCP_STATUS_COMMON_OPEN))
> >> return -EINVAL;
> >>
> >> ==> ct_els->req = job->request_payload.sg_list;
> >> ct_els->resp = job->reply_payload.sg_list;
> >> ct_els->handler_data = job;
> >

Re: [PATCH] bsg: convert to use blk-mq

2018-10-23 Thread Benjamin Block
On Mon, Oct 22, 2018 at 06:38:36AM -0600, Jens Axboe wrote:
> On 10/22/18 4:03 AM, Benjamin Block wrote:
> > On Fri, Oct 19, 2018 at 09:50:53AM -0600, Jens Axboe wrote:
> > 
> > Ok so, that gets past the stage where we initialize the queues. Simple
> > SCSI-I/O also seems to work, that is for example an INQUIRY(10), but
> > transport commands that get passed to the driver break. Tried to send
> > a FibreChannel GPN_FT (remote port discovery).
> > 
> > As the BSG interface goes. This is a bidirectional command, that has
> > both a buffer for the request and for the reply. AFAIR BSG will create a
> > struct request for each of them. Protocol is BSG_PROTOCOL_SCSI,
> > Subprotocol BSG_SUB_PROTOCOL_SCSI_TRANSPORT. The rest should be
> > transparent till we get into the driver.
> > 
> > First got this:
> > 
> > [  566.531100] BUG: sleeping function called from invalid context at 
> > mm/slab.h:421
> > [  566.531452] in_atomic(): 1, irqs_disabled(): 0, pid: 3104, name: 
> > bsg_api_test
> > [  566.531460] 1 lock held by bsg_api_test/3104:
> > [  566.531466]  #0: cb4b58e8 (rcu_read_lock){}, at: 
> > hctx_lock+0x30/0x118
> > [  566.531498] Preemption disabled at:
> > [  566.531503] [<008175d0>] __blk_mq_delay_run_hw_queue+0x50/0x218
> > [  566.531519] CPU: 3 PID: 3104 Comm: bsg_api_test Tainted: GW  
> >4.19.0-rc6-bb-next+ #1
> > [  566.531527] Hardware name: IBM 3906 M03 704 (LPAR)
> > [  566.531533] Call Trace:
> > [  566.531544] ([<001167fa>] show_stack+0x8a/0xd8)
> > [  566.531555]  [<00bcc6d2>] dump_stack+0x9a/0xd8
> > [  566.531565]  [<00196410>] ___might_sleep+0x280/0x298
> > [  566.531576]  [<003e528c>] __kmalloc+0xbc/0x560
> > [  566.531584]  [<0083186a>] bsg_map_buffer+0x5a/0xb0
> > [  566.531591]  [<00831948>] bsg_queue_rq+0x88/0x118
> > [  566.531599]  [<0081ab56>] blk_mq_dispatch_rq_list+0x37e/0x670
> > [  566.531607]  [<0082050e>] blk_mq_do_dispatch_sched+0x11e/0x130
> > [  566.531615]  [<00820dfe>] 
> > blk_mq_sched_dispatch_requests+0x156/0x1a0
> > [  566.531622]  [<00817564>] __blk_mq_run_hw_queue+0x144/0x160
> > [  566.531630]  [<00817614>] __blk_mq_delay_run_hw_queue+0x94/0x218
> > [  566.531638]  [<008178b2>] blk_mq_run_hw_queue+0xda/0xf0
> > [  566.531645]  [<008211d8>] blk_mq_sched_insert_request+0x1a8/0x1e8
> > [  566.531653]  [<00811ee2>] blk_execute_rq_nowait+0x72/0x80
> > [  566.531660]  [<00811f66>] blk_execute_rq+0x76/0xb8
> > [  566.531778]  [<00830d0e>] bsg_ioctl+0x426/0x500
> > [  566.531787]  [<00440cb4>] do_vfs_ioctl+0x68c/0x710
> > [  566.531794]  [<00440dac>] ksys_ioctl+0x74/0xa0
> > [  566.531801]  [<00440e0a>] sys_ioctl+0x32/0x40
> > [  566.531808]  [<00bf1dd0>] system_call+0xd8/0x2d0
> > [  566.531815] 1 lock held by bsg_api_test/3104:
> > [  566.531821]  #0: cb4b58e8 (rcu_read_lock){}, at: 
> > hctx_lock+0x30/0x118
> > 
> 
> The first one is an easy fix, not sure how I missed that. The other
> one I have no idea, any chance you could try with this one:
> 
> http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions=142dc9f36e3113b6a76d472978c33c8c2a2b702c
> 
> which fixes the first one, and also corrects a wrong end_io call,
> but I don't think that's the cause of the above.
> 
> If it crashes, can you figure out where in the source that is?
> Basically just do
> 
> gdb vmlinux
> l *zfcp_fc_exec_bsg_job+0x116
> 
> assuming that works fine on s390 :-)
> 

So I tried 4.19.0 with only the two patches from you:
http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions=2b2ffa16193e9a69a076595ed64429b8cc9b42aa
http://git.kernel.dk/cgit/linux-block/commit/?h=mq-conversions=142dc9f36e3113b6a76d472978c33c8c2a2b702c

This fixed the first warning from before, as you suggested, but it still
crash like this:

[ ] Unable to handle kernel pointer dereference in virtual kernel address space
[ ] Failing address:  TEID: 0483
[ ] Fault in home space mode while using kernel ASCE.
[ ] AS:025f0007 R3:dffb8007 S:dffbf000 
P:013d
[ ] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ ] Modules linked in: 
[ ] CPU: 2 PID: 609 Comm: kworker/2:1H Kdump: loaded Tainted: GW
 4.19.0-bb-next+ #1
[ ] Hardware name: IBM 3906 M03 704 (LPAR)
[ ] Workqueue: kblockd blk_mq_run_work_fn
[ ] Krnl PSW : 0704e0018000 0

Re: [PATCH] bsg: convert to use blk-mq

2018-10-22 Thread Benjamin Block
On Mon, Oct 22, 2018 at 06:38:36AM -0600, Jens Axboe wrote:
> On 10/22/18 4:03 AM, Benjamin Block wrote:
> > On Fri, Oct 19, 2018 at 09:50:53AM -0600, Jens Axboe wrote:
> >> On 10/19/18 9:01 AM, Benjamin Block wrote:
> >>> On Wed, Oct 17, 2018 at 10:01:16AM -0600, Jens Axboe wrote:
> >>>> On 10/17/18 9:55 AM, Benjamin Block wrote:
> >>>>> On Tue, Oct 16, 2018 at 08:43:01AM -0600, Jens Axboe wrote:
> >>>>>> Requires a few changes to the FC transport class as well.
> >>>>>>
> >>>>>> Cc: Johannes Thumshirn 
> >>>>>> Cc: Benjamin Block 
> >>>>>> Cc: linux-scsi@vger.kernel.org
> >>>>>> Signed-off-by: Jens Axboe 
> >>>>>> ---
> >>>>>>  block/bsg-lib.c  | 102 +--
> >>>>>>  drivers/scsi/scsi_transport_fc.c |  61 ++
> >>>>>>  2 files changed, 91 insertions(+), 72 deletions(-)
> >>>>>>
> >>
> >> but that's not going to apply cleanly... Can you just try and run my
> >> mq-conversions branch? That has everything, and it also has that
> >> alloc failure fixed.
> >>
> >> git://git.kernel.dk/linux-block mq-conversions
> >>
> > 
> > Ok so, that gets past the stage where we initialize the queues. Simple
> > SCSI-I/O also seems to work, that is for example an INQUIRY(10), but
> > transport commands that get passed to the driver break. Tried to send
> > a FibreChannel GPN_FT (remote port discovery).
> > 
> > As the BSG interface goes. This is a bidirectional command, that has
> > both a buffer for the request and for the reply. AFAIR BSG will create a
> > struct request for each of them. Protocol is BSG_PROTOCOL_SCSI,
> > Subprotocol BSG_SUB_PROTOCOL_SCSI_TRANSPORT. The rest should be
> > transparent till we get into the driver.
> > 
> > First got this:
> > 
> > [  566.531100] BUG: sleeping function called from invalid context at 
> > mm/slab.h:421
> > [  566.531452] in_atomic(): 1, irqs_disabled(): 0, pid: 3104, name: 
> > bsg_api_test
> > [  566.531460] 1 lock held by bsg_api_test/3104:
> > [  566.531466]  #0: cb4b58e8 (rcu_read_lock){}, at: 
> > hctx_lock+0x30/0x118
> > [  566.531498] Preemption disabled at:
> > [  566.531503] [<008175d0>] __blk_mq_delay_run_hw_queue+0x50/0x218
> > [  566.531519] CPU: 3 PID: 3104 Comm: bsg_api_test Tainted: GW  
> >4.19.0-rc6-bb-next+ #1
> > [  566.531527] Hardware name: IBM 3906 M03 704 (LPAR)
> > [  566.531533] Call Trace:
> > [  566.531544] ([<001167fa>] show_stack+0x8a/0xd8)
> > [  566.531555]  [<00bcc6d2>] dump_stack+0x9a/0xd8
> > [  566.531565]  [<00196410>] ___might_sleep+0x280/0x298
> > [  566.531576]  [<003e528c>] __kmalloc+0xbc/0x560
> > [  566.531584]  [<0083186a>] bsg_map_buffer+0x5a/0xb0
> > [  566.531591]  [<00831948>] bsg_queue_rq+0x88/0x118
> > [  566.531599]  [<0081ab56>] blk_mq_dispatch_rq_list+0x37e/0x670
> > [  566.531607]  [<0082050e>] blk_mq_do_dispatch_sched+0x11e/0x130
> > [  566.531615]  [<00820dfe>] 
> > blk_mq_sched_dispatch_requests+0x156/0x1a0
> > [  566.531622]  [<00817564>] __blk_mq_run_hw_queue+0x144/0x160
> > [  566.531630]  [<00817614>] __blk_mq_delay_run_hw_queue+0x94/0x218
> > [  566.531638]  [<008178b2>] blk_mq_run_hw_queue+0xda/0xf0
> > [  566.531645]  [<008211d8>] blk_mq_sched_insert_request+0x1a8/0x1e8
> > [  566.531653]  [<00811ee2>] blk_execute_rq_nowait+0x72/0x80
> > [  566.531660]  [<00811f66>] blk_execute_rq+0x76/0xb8
> > [  566.531778]  [<00830d0e>] bsg_ioctl+0x426/0x500
> > [  566.531787]  [<00440cb4>] do_vfs_ioctl+0x68c/0x710
> > [  566.531794]  [<00440dac>] ksys_ioctl+0x74/0xa0
> > [  566.531801]  [<00440e0a>] sys_ioctl+0x32/0x40
> > [  566.531808]  [<00bf1dd0>] system_call+0xd8/0x2d0
> > [  566.531815] 1 lock held by bsg_api_test/3104:
> > [  566.531821]  #0: cb4b58e8 (rcu_read_lock){}, at: 
> > hctx_lock+0x30/0x118
> > 
> > And then it dies completely:
> > 
> > [  566.531854] Unable to handle kernel pointer dereference in virtual 
> > kernel address space
> > [  566.531861] Failing address:  TEID: 0483
> > [  566.5318

Re: [PATCH] bsg: convert to use blk-mq

2018-10-22 Thread Benjamin Block
On Fri, Oct 19, 2018 at 09:50:53AM -0600, Jens Axboe wrote:
> On 10/19/18 9:01 AM, Benjamin Block wrote:
> > On Wed, Oct 17, 2018 at 10:01:16AM -0600, Jens Axboe wrote:
> >> On 10/17/18 9:55 AM, Benjamin Block wrote:
> >>> On Tue, Oct 16, 2018 at 08:43:01AM -0600, Jens Axboe wrote:
> >>>> Requires a few changes to the FC transport class as well.
> >>>>
> >>>> Cc: Johannes Thumshirn 
> >>>> Cc: Benjamin Block 
> >>>> Cc: linux-scsi@vger.kernel.org
> >>>> Signed-off-by: Jens Axboe 
> >>>> ---
> >>>>  block/bsg-lib.c  | 102 +--
> >>>>  drivers/scsi/scsi_transport_fc.c |  61 ++
> >>>>  2 files changed, 91 insertions(+), 72 deletions(-)
> >>>>
> 
> but that's not going to apply cleanly... Can you just try and run my
> mq-conversions branch? That has everything, and it also has that
> alloc failure fixed.
> 
> git://git.kernel.dk/linux-block mq-conversions
> 

Ok so, that gets past the stage where we initialize the queues. Simple
SCSI-I/O also seems to work, that is for example an INQUIRY(10), but
transport commands that get passed to the driver break. Tried to send
a FibreChannel GPN_FT (remote port discovery).

As the BSG interface goes. This is a bidirectional command, that has
both a buffer for the request and for the reply. AFAIR BSG will create a
struct request for each of them. Protocol is BSG_PROTOCOL_SCSI,
Subprotocol BSG_SUB_PROTOCOL_SCSI_TRANSPORT. The rest should be
transparent till we get into the driver.

First got this:

[  566.531100] BUG: sleeping function called from invalid context at 
mm/slab.h:421
[  566.531452] in_atomic(): 1, irqs_disabled(): 0, pid: 3104, name: bsg_api_test
[  566.531460] 1 lock held by bsg_api_test/3104:
[  566.531466]  #0: cb4b58e8 (rcu_read_lock){}, at: 
hctx_lock+0x30/0x118
[  566.531498] Preemption disabled at:
[  566.531503] [<008175d0>] __blk_mq_delay_run_hw_queue+0x50/0x218
[  566.531519] CPU: 3 PID: 3104 Comm: bsg_api_test Tainted: GW 
4.19.0-rc6-bb-next+ #1
[  566.531527] Hardware name: IBM 3906 M03 704 (LPAR)
[  566.531533] Call Trace:
[  566.531544] ([<001167fa>] show_stack+0x8a/0xd8)
[  566.531555]  [<00bcc6d2>] dump_stack+0x9a/0xd8
[  566.531565]  [<00196410>] ___might_sleep+0x280/0x298
[  566.531576]  [<003e528c>] __kmalloc+0xbc/0x560
[  566.531584]  [<0083186a>] bsg_map_buffer+0x5a/0xb0
[  566.531591]  [<00831948>] bsg_queue_rq+0x88/0x118
[  566.531599]  [<0081ab56>] blk_mq_dispatch_rq_list+0x37e/0x670
[  566.531607]  [<0082050e>] blk_mq_do_dispatch_sched+0x11e/0x130
[  566.531615]  [<00820dfe>] blk_mq_sched_dispatch_requests+0x156/0x1a0
[  566.531622]  [<00817564>] __blk_mq_run_hw_queue+0x144/0x160
[  566.531630]  [<00817614>] __blk_mq_delay_run_hw_queue+0x94/0x218
[  566.531638]  [<008178b2>] blk_mq_run_hw_queue+0xda/0xf0
[  566.531645]  [<008211d8>] blk_mq_sched_insert_request+0x1a8/0x1e8
[  566.531653]  [<00811ee2>] blk_execute_rq_nowait+0x72/0x80
[  566.531660]  [<00811f66>] blk_execute_rq+0x76/0xb8
[  566.531778]  [<00830d0e>] bsg_ioctl+0x426/0x500
[  566.531787]  [<00440cb4>] do_vfs_ioctl+0x68c/0x710
[  566.531794]  [<00440dac>] ksys_ioctl+0x74/0xa0
[  566.531801]  [<00440e0a>] sys_ioctl+0x32/0x40
[  566.531808]  [<00bf1dd0>] system_call+0xd8/0x2d0
[  566.531815] 1 lock held by bsg_api_test/3104:
[  566.531821]  #0: cb4b58e8 (rcu_read_lock){}, at: 
hctx_lock+0x30/0x118

And then it dies completely:

[  566.531854] Unable to handle kernel pointer dereference in virtual kernel 
address space
[  566.531861] Failing address:  TEID: 0483
[  566.531867] Fault in home space mode while using kernel ASCE.
[  566.531885] AS:013ec007 R3:effc8007 S:effce000 
P:013d
[  566.531927] Oops: 0004 ilc:3 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  566.531938] Modules linked in: ...
[  566.532042] CPU: 3 PID: 3104 Comm: bsg_api_test Tainted: GW 
4.19.0-rc6-bb-next+ #1
[  566.532047] Hardware name: IBM 3906 M03 704 (LPAR)
[  566.532051] Krnl PSW : d16c67b2 e4a74b5c 
(zfcp_fc_exec_bsg_job+0x116/0x2c0 [zfcp])
[  566.532071]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:3 PM:0 
RI:0 EA:3
[  566.532077] Krnl GPRS: 1000 bfb84178 0001 
8004
[  566.532082]1000 a6625108  
0001
[  566.532086]bfb870e8  b6276930 
bb3a6fc8
[  566.53209

Re: [PATCH] bsg: convert to use blk-mq

2018-10-19 Thread Benjamin Block
On Fri, Oct 19, 2018 at 09:50:53AM -0600, Jens Axboe wrote:
> On 10/19/18 9:01 AM, Benjamin Block wrote:
> > On Wed, Oct 17, 2018 at 10:01:16AM -0600, Jens Axboe wrote:
> >> On 10/17/18 9:55 AM, Benjamin Block wrote:
> >>> On Tue, Oct 16, 2018 at 08:43:01AM -0600, Jens Axboe wrote:
> >>>> Requires a few changes to the FC transport class as well.
> >>>>
> >>>> Cc: Johannes Thumshirn 
> >>>> Cc: Benjamin Block 
> >>>> Cc: linux-scsi@vger.kernel.org
> >>>> Signed-off-by: Jens Axboe 
> >>>> ---
> >>>>  block/bsg-lib.c  | 102 +--
> >>>>  drivers/scsi/scsi_transport_fc.c |  61 ++
> >>>>  2 files changed, 91 insertions(+), 72 deletions(-)
> >>>>
> >>>
> >>> Hey Jens,
> >>>
> >>> I haven't had time to look into this in any deep way - but I did plan to
> >>> -, but just to see whether it starts and runs some I/O I tried giving it
> >>> a spin and came up with nothing (see line 3 and 5):
> >>
> >> I'm an idiot, can you try this on top?
> >>
> >>
> >> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> >> index 1aa0ed3fc339..95e12b635225 100644
> >> --- a/block/bsg-lib.c
> >> +++ b/block/bsg-lib.c
> >> @@ -311,7 +311,7 @@ struct request_queue *bsg_setup_queue(struct device 
> >> *dev, const char *name,
> >>int ret = -ENOMEM;
> >>  
> >>set = kzalloc(sizeof(*set), GFP_KERNEL);
> >> -  if (set)
> >> +  if (!set)
> >>return ERR_PTR(-ENOMEM);
> >>  
> >>set->ops = _mq_ops;
> >>
> > 
> > Well, yes, that would be wrong. But it still doesn't fly (s390x stack
> > trace):
> > 
> > [   36.271953] scsi host0: scsi_eh_0: sleeping
> > [   36.272571] scsi host0: zfcp
> > [   36.298065] WARNING: CPU: 7 PID: 856 at block/blk-settings.c:71 
> > blk_queue_rq_timed_out+0x44/0x60
> > [   36.298315] Modules linked in: zfcp scsi_transport_fc dm_multipath 
> > [   36.299015] CPU: 7 PID: 856 Comm: systemd-udevd Tainted: GW  
> >4.19.0-rc8-bb-next+ #1
> > [   36.299059] Hardware name: IBM 3906 M03 704 (LPAR)
> > [   36.299101] Krnl PSW : 0704e0018000 00ced494 
> > (blk_queue_rq_timed_out+0x44/0x60)
> > [   36.299192]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 
> > PM:0 RI:0 EA:3
> > [   36.299259] Krnl GPRS:  015cee60 
> > a0ce0180 0300
> > [   36.299304]0300 00ced486 
> > a5738000 03ff8069eba0
> > [   36.299349]a11ec6a8 a0ce 
> > a11ec400 03ff805ee438
> > [   36.299394]a0ce 03ff805f6b00 
> > 00ced486 a28af0b0
> > [   36.299460] Krnl Code: 00ced486: e310c182ltg 
> > %r1,384(%r12)
> >   00ced48c: a7840004brc 
> > 8,ced494
> >  #00ced490: a7f40001brc 
> > 15,ced492
> >  >00ced494: 4120c150la  
> > %r2,336(%r12)
> >   00ced498: c0e5ffc76290brasl   
> > %r14,5d99b8
> >   00ced49e: e3b0c1500024stg 
> > %r11,336(%r12)
> >   00ced4a4: ebbff0a4lmg 
> > %r11,%r15,160(%r15)
> >   00ced4aa: 07febcr 
> > 15,%r14
> > [   36.299879] Call Trace:
> > [   36.299922] ([<a11ec6a8>] 0xa11ec6a8)
> > [   36.299979]  [<03ff805ee3fa>] fc_host_setup+0x622/0x660 
> > [scsi_transport_fc] 
> > [   36.300029]  [<00f0baca>] transport_setup_classdev+0x62/0x70 
> > [   36.300075]  [<00f0b592>] 
> > attribute_container_add_device+0x182/0x1d0 
> > [   36.300163]  [<03ff80503cae>] scsi_sysfs_add_host+0x5e/0x100 
> > [scsi_mod] 
> > [   36.300245]  [<03ff804e6d7c>] scsi_add_host_with_dma+0x2dc/0x468 
> > [scsi_mod] 
> > [   36.300310]  [<03ff806835f2>] zfcp_scsi_adapter_register+0x222/0x260 
> > [zfcp] 
> > [   36.300373]  [<03ff8066a49a>] zfcp_adapter_enqueue+0xae2/0xb20 
> > [zfcp] 
> > [   36.300435]  [<03ff8066b436>] zfcp_ccw_set_online+0xb6/0x208 [zfcp] 
> >

Re: [PATCH] bsg: convert to use blk-mq

2018-10-19 Thread Benjamin Block
On Wed, Oct 17, 2018 at 10:01:16AM -0600, Jens Axboe wrote:
> On 10/17/18 9:55 AM, Benjamin Block wrote:
> > On Tue, Oct 16, 2018 at 08:43:01AM -0600, Jens Axboe wrote:
> >> Requires a few changes to the FC transport class as well.
> >>
> >> Cc: Johannes Thumshirn 
> >> Cc: Benjamin Block 
> >> Cc: linux-scsi@vger.kernel.org
> >> Signed-off-by: Jens Axboe 
> >> ---
> >>  block/bsg-lib.c  | 102 +--
> >>  drivers/scsi/scsi_transport_fc.c |  61 ++
> >>  2 files changed, 91 insertions(+), 72 deletions(-)
> >>
> > 
> > Hey Jens,
> > 
> > I haven't had time to look into this in any deep way - but I did plan to
> > -, but just to see whether it starts and runs some I/O I tried giving it
> > a spin and came up with nothing (see line 3 and 5):
> 
> I'm an idiot, can you try this on top?
> 
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index 1aa0ed3fc339..95e12b635225 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -311,7 +311,7 @@ struct request_queue *bsg_setup_queue(struct device *dev, 
> const char *name,
>   int ret = -ENOMEM;
>  
>   set = kzalloc(sizeof(*set), GFP_KERNEL);
> - if (set)
> + if (!set)
>   return ERR_PTR(-ENOMEM);
>  
>   set->ops = _mq_ops;
> 

Well, yes, that would be wrong. But it still doesn't fly (s390x stack
trace):

[   36.271953] scsi host0: scsi_eh_0: sleeping
[   36.272571] scsi host0: zfcp
[   36.298065] WARNING: CPU: 7 PID: 856 at block/blk-settings.c:71 
blk_queue_rq_timed_out+0x44/0x60
[   36.298315] Modules linked in: zfcp scsi_transport_fc dm_multipath 
[   36.299015] CPU: 7 PID: 856 Comm: systemd-udevd Tainted: GW 
4.19.0-rc8-bb-next+ #1
[   36.299059] Hardware name: IBM 3906 M03 704 (LPAR)
[   36.299101] Krnl PSW : 0704e0018000 00ced494 
(blk_queue_rq_timed_out+0x44/0x60)
[   36.299192]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 
RI:0 EA:3
[   36.299259] Krnl GPRS:  015cee60 a0ce0180 
0300
[   36.299304]0300 00ced486 a5738000 
03ff8069eba0
[   36.299349]a11ec6a8 a0ce a11ec400 
03ff805ee438
[   36.299394]a0ce 03ff805f6b00 00ced486 
a28af0b0
[   36.299460] Krnl Code: 00ced486: e310c182ltg 
%r1,384(%r12)
  00ced48c: a7840004brc 8,ced494
 #00ced490: a7f40001brc 
15,ced492
 >00ced494: 4120c150la  
%r2,336(%r12)
  00ced498: c0e5ffc76290brasl   
%r14,5d99b8
  00ced49e: e3b0c1500024stg 
%r11,336(%r12)
  00ced4a4: ebbff0a4lmg 
%r11,%r15,160(%r15)
  00ced4aa: 07febcr 15,%r14
[   36.299879] Call Trace:
[   36.299922] ([<a11ec6a8>] 0xa11ec6a8)
[   36.299979]  [<03ff805ee3fa>] fc_host_setup+0x622/0x660 
[scsi_transport_fc] 
[   36.300029]  [<00f0baca>] transport_setup_classdev+0x62/0x70 
[   36.300075]  [<00f0b592>] attribute_container_add_device+0x182/0x1d0 
[   36.300163]  [<03ff80503cae>] scsi_sysfs_add_host+0x5e/0x100 [scsi_mod] 
[   36.300245]  [<03ff804e6d7c>] scsi_add_host_with_dma+0x2dc/0x468 
[scsi_mod] 
[   36.300310]  [<03ff806835f2>] zfcp_scsi_adapter_register+0x222/0x260 
[zfcp] 
[   36.300373]  [<03ff8066a49a>] zfcp_adapter_enqueue+0xae2/0xb20 [zfcp] 
[   36.300435]  [<03ff8066b436>] zfcp_ccw_set_online+0xb6/0x208 [zfcp] 
[   36.300482]  [<00f8ad14>] ccw_device_set_online+0x41c/0x878 
[   36.300527]  [<00f8b374>] online_store_recog_and_online+0x204/0x230 
[   36.300572]  [<00f8de20>] online_store+0x290/0x3e8 
[   36.300619]  [<007842c0>] kernfs_fop_write+0x1b0/0x288 
[   36.300663]  [<0064217e>] __vfs_write+0xee/0x398 
[   36.300705]  [<006426b4>] vfs_write+0xec/0x238 
[   36.300754]  [<00642abe>] ksys_write+0xd6/0x148 
[   36.300800]  [<0137b668>] system_call+0x2b0/0x2d0 
[   36.300843] 5 locks held by systemd-udevd/856:
[   36.300882]  #0: da3c74e2 (sb_writers#4){.+.+}, at: 
vfs_write+0xd6/0x238
[   36.301053]  #1: 2a1f461f (>mutex){+.+.}, at: 
kernfs_fop_write+0x258/0x288
[   36.301202]  #2: deb28615 (kn->count#52){.+.+}, at: 
kernfs_fop_write+0x26e/0x288
[   36.301374]  #3: 2ddb9663 (>mutex){}, at: 
online_store+0x19c/0x3e

Re: [PATCH] bsg: convert to use blk-mq

2018-10-17 Thread Benjamin Block
On Tue, Oct 16, 2018 at 08:43:01AM -0600, Jens Axboe wrote:
> Requires a few changes to the FC transport class as well.
> 
> Cc: Johannes Thumshirn 
> Cc: Benjamin Block 
> Cc: linux-scsi@vger.kernel.org
> Signed-off-by: Jens Axboe 
> ---
>  block/bsg-lib.c  | 102 +--
>  drivers/scsi/scsi_transport_fc.c |  61 ++
>  2 files changed, 91 insertions(+), 72 deletions(-)
> 

Hey Jens,

I haven't had time to look into this in any deep way - but I did plan to
-, but just to see whether it starts and runs some I/O I tried giving it
a spin and came up with nothing (see line 3 and 5):

[   14.082079] scsi host0: scsi_eh_0: sleeping
[   14.082288] scsi host0: zfcp
[   14.086823] scsi host0: fc_host0: bsg interface failed to initialize - setup 
queue
[   14.089198] qdio: 0.0.1900 ZFCP on SC 107 using AI:1 QEBSM:0 PRI:1 TDD:1 
SIGA: W AP
[   15.228583]  rport-0:0-0: failed to setup bsg queue
[   15.229886] scsi 0:0:0:0: scsi scan: INQUIRY pass 1 length 36
[   15.230801] scsi 0:0:0:0: scsi scan: INQUIRY successful with code 0x0
[   15.230860] scsi 0:0:0:0: scsi scan: INQUIRY pass 2 length 164
[   15.231171] scsi 0:0:0:0: scsi scan: INQUIRY successful with code 0x0
[   15.231190] scsi 0:0:0:0: scsi scan: peripheral device type of 31, no device 
added
[   15.233171] scsi 0:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
[   15.234144] scsi 0:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 
0x0
[   15.234156] scsi 0:0:0:0: scsi scan: REPORT LUN scan
[   15.235040] scsi 0:0:0:1082671104: scsi scan: INQUIRY pass 1 length 36
[   15.235220] scsi host1: scsi_eh_1: sleeping
[   15.235336] scsi 0:0:0:1082671104: scsi scan: INQUIRY successful with code 
0x0
[   15.235355] scsi 0:0:0:1082671104: scsi scan: INQUIRY pass 2 length 164
[   15.235434] scsi host1: zfcp
[   15.235667] scsi 0:0:0:1082671104: scsi scan: INQUIRY successful with code 
0x0
[   15.235709] scsi 0:0:0:1082671104: Direct-Access IBM  2107900
  3230 PQ: 0 ANSI: 5
[   15.238468] scsi host1: fc_host1: bsg interface failed to initialize - setup 
queue


Seems to already fail at setting up the bsg interfaces for zFCP - at
loading time of the module. This is just your patch on top of
4.19.0-rc8.

-- 
With Best Regards, Benjamin Block  /  Linux on IBM Z Kernel Development
IBM Systems & Technology Group   /  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz   /  Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 9/9] bsg: split handling of SCSI CDBs vs transport requeues

2017-10-24 Thread Benjamin Block
> +static int bsg_transport_complete_rq(struct request *rq, struct sg_io_v4 
> *hdr)
> +{
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> + int ret = 0;
> +
> + /*
> +  * The assignments below don't make much sense, but are kept for
> +  * bug by bug backwards compatibility:
> +  */
> + hdr->device_status = job->result & 0xff;
> + hdr->transport_status = host_byte(job->result);
> + hdr->driver_status = driver_byte(job->result);
> + hdr->info = 0;
> + if (hdr->device_status || hdr->transport_status || hdr->driver_status)
> + hdr->info |= SG_INFO_CHECK;
> + hdr->response_len = 0;
> +
> + if (job->result < 0) {
> + /* we're only returning the result field in the reply */
> + job->reply_len = sizeof(u32);
> + ret = job->result;
> + }
> +
> + if (job->reply_len && hdr->response) {
> + int len = min(hdr->max_response_len, job->reply_len);
> +
> + if (copy_to_user(ptr64(hdr->response), job->reply, len))
> + ret = -EFAULT;
> + else
> + hdr->response_len = len;
> + }
> +
> + /* we assume all request payload was transferred, residual == 0 */
> + hdr->dout_resid = 0;
> +
> + if (rq->next_rq) {
> + unsigned int rsp_len = blk_rq_bytes(rq->next_rq);
> +
> + if (WARN_ON(job->reply_payload_rcv_len > rsp_len))

This gives my a lot of new Warnings when running my tests on zFCP, non
of which happen when I run on 4.13.

After browsing the source some, I figured this is because in comparison
to the old code, blk_rq_bytes() is now called after we finished the
blk-request already and blk_update_bidi_request()+blk_update_request()
was already called. This will update rq->__data_len, and thus the call
here to get rsp_len will get a wrong value. Thus the warnings, and the
following calculation is actually wrong.

I figure you can just replace this call for blk_rq_bytes() with
'job->reply_payload.payload_len'. Its essentially the same value, but
the later is not changed after the job is committed as far as I could see
in the source. Driver makes use of it, but only reading as far as I
could make out after browsing the code for a bit.

I did a quick test with that change in place and that seems to work fine
now. As far as my tests go, they behave as they did before.


Beste Grüße / Best regards,
  - Benjamin Block

> + hdr->din_resid = 0;
> + else
> + hdr->din_resid = rsp_len - job->reply_payload_rcv_len;
> + } else {
> + hdr->din_resid = 0;
> + }
> +
> + return ret;
> +}
> +

--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 9/9] bsg: split handling of SCSI CDBs vs transport requeues

2017-10-20 Thread Benjamin Block
On Fri, Oct 20, 2017 at 06:26:30PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 19, 2017 at 05:59:33PM +0200, Benjamin Block wrote:
> > > +#define ptr64(val) ((void __user *)(uintptr_t)(val))
> > 
> > Better to reflect the special property, that it is a user pointer, in
> > the name of the macro. Maybe something like user_ptr(64). The same
> > comment for the same macro in bsg.c.
> 
> Not sure it's worth it especially now that Martin has merged the patch.

He did? I only saw a mail that he picked patches 2-5. So all the bsg
changes are still open I think.

(Maybe I just missed that, I haven't exactly followed the list very
closely as of late)

> But given how many interface we have all over the kernel that use a u64
> to store a user pointer in ioctls and similar it might make sense to
> lift a helper like this to a generic header.  In that case we'll need
> a more descriptive name for sure.
> 
> > > +static int bsg_transport_check_proto(struct sg_io_v4 *hdr)
> > > +{
> > > + if (hdr->protocol != BSG_PROTOCOL_SCSI  ||
> > > + hdr->subprotocol != BSG_SUB_PROTOCOL_SCSI_TRANSPORT)
> > > + return -EINVAL;
> > > + if (!capable(CAP_SYS_RAWIO))
> > > + return -EPERM;
> > 
> > Any particular reason why this is not symmetric with bsg_scsi? IOW
> > permission checking done in bsg_transport_fill_hdr(), like it is done in
> > bsg_scsi_fill_hdr()?
> > 
> > We might save some time copying memory with this (we also only talk
> > about ~20 bytes here), but on the other hand the interface would be more
> > clean otherwise IMO (if we already do restructure the interface) -
> > similar callbacks have similar responsibilities.
> 
> I could move the capable check around, no sure why I had done it that
> way, it's been a while.  Probably because blk_verify_command needs the
> CDB while a simple capable() check does not.

That was my guess, too. I just though it would be more consistent otherwise.
Its not a big thing, really.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 9/9] bsg: split handling of SCSI CDBs vs transport requeues

2017-10-19 Thread Benjamin Block
b.c
> @@ -2105,8 +2105,6 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct 
> request_queue *q)
>  {
>   struct device *dev = shost->dma_dev;
>
> - queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
> -
>   /*
>* this limit is imposed by hardware restrictions
>*/
> @@ -2202,6 +2200,7 @@ struct request_queue *scsi_old_alloc_queue(struct 
> scsi_device *sdev)
>   }
>
>   __scsi_init_queue(shost, q);
> + queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
>   blk_queue_prep_rq(q, scsi_prep_fn);
>   blk_queue_unprep_rq(q, scsi_unprep_fn);
>   blk_queue_softirq_done(q, scsi_softirq_done);
> @@ -2231,6 +2230,7 @@ struct request_queue *scsi_mq_alloc_queue(struct 
> scsi_device *sdev)
>
>   sdev->request_queue->queuedata = sdev;
>   __scsi_init_queue(sdev->host, sdev->request_queue);
> + queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, 
> sdev->request_queue);
>   return sdev->request_queue;
>  }
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index bf53356f41f0..bfb4e6875643 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1262,8 +1262,7 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
>   transport_add_device(>sdev_gendev);
>   sdev->is_visible = 1;
>
> - error = bsg_register_queue(rq, >sdev_gendev, NULL, NULL);
> -
> + error = bsg_scsi_register_queue(rq, >sdev_gendev);
>   if (error)
>   /* we're treating error on bsg register as non-fatal,
>* so pretend nothing went wrong */
> diff --git a/drivers/scsi/scsi_transport_sas.c 
> b/drivers/scsi/scsi_transport_sas.c
> index 736a1f4f9676..4889bd432382 100644
> --- a/drivers/scsi/scsi_transport_sas.c
> +++ b/drivers/scsi/scsi_transport_sas.c
> @@ -228,7 +228,6 @@ static int sas_bsg_initialize(struct Scsi_Host *shost, 
> struct sas_rphy *rphy)
>*/
>   blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
>   queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
> - queue_flag_set_unlocked(QUEUE_FLAG_SCSI_PASSTHROUGH, q);
>   return 0;
>  }
>
> diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
> index 08762d297cbd..28a7ccc55c89 100644
> --- a/include/linux/bsg-lib.h
> +++ b/include/linux/bsg-lib.h
> @@ -38,7 +38,6 @@ struct bsg_buffer {
>  };
>
>  struct bsg_job {
> - struct scsi_request sreq;
>   struct device *dev;
>
>   struct kref kref;
> @@ -64,6 +63,9 @@ struct bsg_job {
>   struct bsg_buffer request_payload;
>   struct bsg_buffer reply_payload;
>
> + int result;
> + unsigned int reply_payload_rcv_len;
> +
>   void *dd_data;  /* Used for driver-specific storage */
>  };
>
> diff --git a/include/linux/bsg.h b/include/linux/bsg.h
> index 7173f6e9d2dd..ed98946a8324 100644
> --- a/include/linux/bsg.h
> +++ b/include/linux/bsg.h
> @@ -1,33 +1,42 @@
> -#ifndef BSG_H
> -#define BSG_H
> +#ifndef _LINUX_BSG_H
> +#define _LINUX_BSG_H
>
>  #include 
>
> +struct request;
> +
> +#ifdef CONFIG_BLK_DEV_BSG
> +struct bsg_ops {
> + int (*check_proto)(struct sg_io_v4 *hdr);
> + int (*fill_hdr)(struct request *rq, struct sg_io_v4 *hdr,
> + fmode_t mode);
> + int (*complete_rq)(struct request *rq, struct sg_io_v4 *hdr);
> + void(*free_rq)(struct request *rq);
> +};
>
> -#if defined(CONFIG_BLK_DEV_BSG)
>  struct bsg_class_device {
>   struct device *class_dev;
>   struct device *parent;
>   int minor;
>   struct request_queue *queue;
>   struct kref ref;
> + const struct bsg_ops *ops;
>   void (*release)(struct device *);
>  };
>
> -extern int bsg_register_queue(struct request_queue *q,
> -   struct device *parent, const char *name,
> -   void (*release)(struct device *));
> -extern void bsg_unregister_queue(struct request_queue *);
> +int bsg_register_queue(struct request_queue *q, struct device *parent,
> + const char *name, const struct bsg_ops *ops,
> + void (*release)(struct device *));
> +int bsg_scsi_register_queue(struct request_queue *q, struct device *parent);
> +void bsg_unregister_queue(struct request_queue *);
>  #else
> -static inline int bsg_register_queue(struct request_queue *q,
> -  struct device *parent, const char *name,
> -  void (*release)(struct device *))
> +static inline int bsg_scsi_register_queue(struct request_queue *q,
> + struct device *parent)
>  {
>   return 0;
>  }
>  static inline void bsg_unregister_queue(struct request_queue *q)
>  {
>  }
> -#endif
> -
> -#endif
> +#endif /* CONFIG_BLK_DEV_BSG */
> +#endif /* _LINUX_BSG_H */
> --
> 2.14.1
>

Otherwise I haven't seen anything. I'll run it through my tests on zFCP
and look whether I see something strange happening.

I like the idea of splitting things up here, it gets rid of some
code-duplications and unnecessary double book-keeping. Although I have
to say, looking at functions like bsg_transport_complete_rq() and
bsg_scsi_complete_rq() it also creates some new duplications that
haven't been there before.


Beste Grüße / Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 8/9] block: pass full fmode_t to blk_verify_command

2017-10-16 Thread Benjamin Block
0;
> 
>   return -EPERM;
> @@ -234,7 +234,7 @@ static int blk_fill_sghdr_rq(struct request_queue *q, 
> struct request *rq,
> 
>   if (copy_from_user(req->cmd, hdr->cmdp, hdr->cmd_len))
>   return -EFAULT;
> - if (blk_verify_command(req->cmd, mode & FMODE_WRITE))
> + if (blk_verify_command(req->cmd, mode))
>   return -EPERM;
> 
>   /*
> @@ -469,7 +469,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk 
> *disk, fmode_t mode,
>   if (in_len && copy_from_user(buffer, sic->data + cmdlen, in_len))
>   goto error;
> 
> - err = blk_verify_command(req->cmd, mode & FMODE_WRITE);
> + err = blk_verify_command(req->cmd, mode);
>   if (err)
>   goto error;
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 0419c2298eab..92fd870e1315 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -217,7 +217,7 @@ static int sg_allow_access(struct file *filp, unsigned 
> char *cmd)
>   if (sfp->parentdp->device->type == TYPE_SCANNER)
>   return 0;
> 
> - return blk_verify_command(cmd, filp->f_mode & FMODE_WRITE);
> + return blk_verify_command(cmd, filp->f_mode);
>  }
> 
>  static int
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 02fa42d24b52..75fe9d45ead1 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1371,7 +1371,7 @@ static inline int sb_issue_zeroout(struct super_block 
> *sb, sector_t block,
>   gfp_mask, 0);
>  }
> 
> -extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> +extern int blk_verify_command(unsigned char *cmd, fmode_t mode);
> 
>  enum blk_default_limits {
>   BLK_MAX_SEGMENTS= 128,
> -- 
> 2.14.1
> 

Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 7/9] bsg-lib: remove bsg_job.req

2017-10-16 Thread Benjamin Block
On Tue, Oct 03, 2017 at 12:48:43PM +0200, Christoph Hellwig wrote:
> Users of the bsg-lib interface should only use the bsg_job data structure
> and not know about implementation details of it.
> 
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
>  block/bsg-lib.c | 14 ++
>  include/linux/bsg-lib.h |  1 -
>  2 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index 0d5bbf6a2ddd..6299526bd2c3 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -35,7 +35,7 @@
>  static void bsg_teardown_job(struct kref *kref)
>  {
>   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
> - struct request *rq = job->req;
> + struct request *rq = blk_mq_rq_from_pdu(job);
> 
>   put_device(job->dev);   /* release reference for the request */
> 
> @@ -68,19 +68,18 @@ EXPORT_SYMBOL_GPL(bsg_job_get);
>  void bsg_job_done(struct bsg_job *job, int result,
> unsigned int reply_payload_rcv_len)
>  {
> - struct request *req = job->req;
> + struct request *req = blk_mq_rq_from_pdu(job);
>   struct request *rsp = req->next_rq;
> - struct scsi_request *rq = scsi_req(req);
>   int err;
> 
> - err = scsi_req(job->req)->result = result;
> + err = job->sreq.result = result;
>   if (err < 0)
>   /* we're only returning the result field in the reply */
> - rq->sense_len = sizeof(u32);
> + job->sreq.sense_len = sizeof(u32);
>   else
> - rq->sense_len = job->reply_len;
> + job->sreq.sense_len = job->reply_len;
>   /* we assume all request payload was transferred, residual == 0 */
> - rq->resid_len = 0;
> + job->sreq.resid_len = 0;
> 
>   if (rsp) {
>   WARN_ON(reply_payload_rcv_len > scsi_req(rsp)->resid_len);
> @@ -232,7 +231,6 @@ static void bsg_initialize_rq(struct request *req)
>   sreq->sense = sense;
>   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
> 
> - job->req = req;
>   job->reply = sense;
>   job->reply_len = sreq->sense_len;
>   job->dd_data = job + 1;
> diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
> index 402223c95ce1..08762d297cbd 100644
> --- a/include/linux/bsg-lib.h
> +++ b/include/linux/bsg-lib.h
> @@ -40,7 +40,6 @@ struct bsg_buffer {
>  struct bsg_job {
>   struct scsi_request sreq;
>   struct device *dev;
> - struct request *req;
> 
>   struct kref kref;
> 
> -- 
> 2.14.1
> 

Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 6/9] bsg-lib: introduce a timeout field in struct bsg_job

2017-10-16 Thread Benjamin Block
On Tue, Oct 03, 2017 at 12:48:42PM +0200, Christoph Hellwig wrote:
> The zfcp driver wants to know the timeout for a bsg job, so add a field
> to struct bsg_job for it in preparation of not exposing the request
> to the bsg-lib users.
> 
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
>  block/bsg-lib.c | 1 +
>  drivers/s390/scsi/zfcp_fc.c | 4 ++--
>  include/linux/bsg-lib.h | 2 ++
>  3 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index 15d25ccd51a5..0d5bbf6a2ddd 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -132,6 +132,7 @@ static int bsg_prepare_job(struct device *dev, struct 
> request *req)
>   struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   int ret;
> 
> + job->timeout = req->timeout;
>   job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> 
> diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c
> index 8210645c2111..9d6f69b92c81 100644
> --- a/drivers/s390/scsi/zfcp_fc.c
> +++ b/drivers/s390/scsi/zfcp_fc.c
> @@ -960,7 +960,7 @@ static int zfcp_fc_exec_els_job(struct bsg_job *job,
>   d_id = ntoh24(bsg_request->rqst_data.h_els.port_id);
> 
>   els->handler = zfcp_fc_ct_els_job_handler;
> - return zfcp_fsf_send_els(adapter, d_id, els, job->req->timeout / HZ);
> + return zfcp_fsf_send_els(adapter, d_id, els, job->timeout / HZ);
>  }
> 
>  static int zfcp_fc_exec_ct_job(struct bsg_job *job,
> @@ -979,7 +979,7 @@ static int zfcp_fc_exec_ct_job(struct bsg_job *job,
>   return ret;
> 
>   ct->handler = zfcp_fc_ct_job_handler;
> - ret = zfcp_fsf_send_ct(wka_port, ct, NULL, job->req->timeout / HZ);
> + ret = zfcp_fsf_send_ct(wka_port, ct, NULL, job->timeout / HZ);
>   if (ret)
>   zfcp_fc_wka_port_put(wka_port);
> 
> diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
> index b1be0233ce35..402223c95ce1 100644
> --- a/include/linux/bsg-lib.h
> +++ b/include/linux/bsg-lib.h
> @@ -44,6 +44,8 @@ struct bsg_job {
> 
>   struct kref kref;
> 
> + unsigned int timeout;
> +
>   /* Transport/driver specific request/reply structs */
>   void *request;
>   void *reply;
> -- 
> 2.14.1
> 

Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH v2 1/1] bsg-lib: fix use-after-free under memory-pressure

2017-10-01 Thread Benjamin Block
When under memory-pressure it is possible that the mempool which backs
the 'struct request_queue' will make use of up to BLKDEV_MIN_RQ count
emergency buffers - in case it can't get a regular allocation. These
buffers are preallocated and once they are also used, they are
re-supplied with old finished requests from the same request_queue (see
mempool_free()).

The bug is, when re-supplying the emergency pool, the old requests are
not again ran through the callback mempool_t->alloc(), and thus also not
through the callback bsg_init_rq(). Thus we skip initialization, and
while the sense-buffer still should be good, scsi_request->cmd might
have become to be an invalid pointer in the meantime. When the request
is initialized in bsg.c, and the user's CDB is larger than BLK_MAX_CDB,
bsg will replace it with a custom allocated buffer, which is freed when
the user's command is finished, thus it dangles afterwards. When next a
command is sent by the user that has a smaller/similar CDB as
BLK_MAX_CDB, bsg will assume that scsi_request->cmd is backed by
scsi_request->__cmd, will not make a custom allocation, and write into
undefined memory.

Fix this by splitting bsg_init_rq() into two functions:
 - bsg_init_rq() is changed to only do the allocation of the
   sense-buffer, which is used to back the bsg job's reply buffer. This
   pointer should never change during the lifetime of a scsi_request, so
   it doesn't need re-initialization.
 - bsg_initialize_rq() is a new function that makes use of
   'struct request_queue's initialize_rq_fn callback (which was
   introduced in v4.12). This is always called before the request is
   given out via blk_get_request(). This function does the remaining
   initialization that was previously done in bsg_init_rq(), and will
   also do it when the request is taken from the emergency-pool of the
   backing mempool.

Fixes: 50b4d485528d ("bsg-lib: fix kernel panic resulting from missing 
allocation of reply-buffer")
Cc: <sta...@vger.kernel.org> # 4.11+
Reviewed-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---

Notes:
I did test this on zFCP with FC CT commands send via the ioctl() and
write() system-call. That did work fine. But I would very much
appreciate if anyone could run this against an other HBA or even an
other implementer of bsg-lib, such as now SAS, because I have no access
to such hardware here.

This should make no difference to the normal cases - where each request
is allocated via slab - with- or without this patch; if I didn't miss
anything. Only the order is a bit mixed up - the memset is done after
the sense-allocation, so I have to buffer the sense-pointer for that.
But otherwise there is no difference I am aware of, so it should behave
the same (does for me).

I could not reproduce the memory-pressure case here in the lab.. I
don't see any reason why it should work now, but I am open to
suggestions :)

Beste Grüße / Best regards,
  - Benjamin Block

Changes in v2:
 - Renamed function names to reflect the function-pointer names in
   request_queue

 block/bsg-lib.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c82408c7cc3c..ff5a158b22fa 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -208,20 +208,34 @@ static int bsg_init_rq(struct request_queue *q, struct 
request *req, gfp_t gfp)
struct bsg_job *job = blk_mq_rq_to_pdu(req);
struct scsi_request *sreq = >sreq;
 
-   memset(job, 0, sizeof(*job));
+   /* called right after the request is allocated for the request_queue */
 
-   scsi_req_init(sreq);
-   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
-   sreq->sense = kzalloc(sreq->sense_len, gfp);
+   sreq->sense = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
if (!sreq->sense)
return -ENOMEM;
 
+   return 0;
+}
+
+static void bsg_initialize_rq(struct request *req)
+{
+   struct bsg_job *job = blk_mq_rq_to_pdu(req);
+   struct scsi_request *sreq = >sreq;
+   void *sense = sreq->sense;
+
+   /* called right before the request is given to the request_queue user */
+
+   memset(job, 0, sizeof(*job));
+
+   scsi_req_init(sreq);
+
+   sreq->sense = sense;
+   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
+
job->req = req;
-   job->reply = sreq->sense;
+   job->reply = sense;
job->reply_len = sreq->sense_len;
job->dd_data = job + 1;
-
-   return 0;
 }
 
 static void bsg_exit_rq(struct request_queue *q, struct request *req)
@@ -252,6 +266,7 @@ struct request_queue *bsg_setup_queue(struct device *dev, 
const char *name,
q->cmd_size = si

Re: [PATCH 1/1] bsg-lib: fix use-after-free under memory-pressure

2017-09-25 Thread Benjamin Block
On Mon, Sep 25, 2017 at 08:53:07AM -0700, Christoph Hellwig wrote:
> > if (!q)
> > return ERR_PTR(-ENOMEM);
> > q->cmd_size = sizeof(struct bsg_job) + dd_job_size;
> > -   q->init_rq_fn = bsg_init_rq;
> > -   q->exit_rq_fn = bsg_exit_rq;
> > +   q->init_rq_fn = bsg_init_job;
> > +   q->exit_rq_fn = bsg_exit_job;
> > +   q->initialize_rq_fn = bsg_init_rq;
> 
> Please use function names that match the method names, that is keep
> the existing names and name the new helper bsg_initialize_rq;
>

OK, I can change that.


Beste Grüße / Best regards,
  - Benjamin Block

> 
> Except for that the patch looks fine to me:
> 
> Reviewed-by: Christoph Hellwig <h...@lst.de>
> 

-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: Is the possible cross-talking between unrelated file-descriptors on bsg-device by design?

2017-09-21 Thread Benjamin Block
On Tue, Sep 19, 2017 at 02:16:26PM -0400, Douglas Gilbert wrote:
> On 2017-09-19 10:56 AM, Benjamin Block wrote:
> > Hello linux-block,
> > 
> > I wrote some tests recently to test patches against bsg.c and bsg-lib.c,
> > and while writing those I noticed something strange:
> > 
> > When you use the write() and read() call on multiple file-descriptors
> > for a single bsg-device (FC or SCSI), it is possible that you get
> > cross-talk between those different file-descriptors.
> > 
> > E.g.: You have two independent processes open a FD on /dev/bsg/fc_host0
> > and you send commands via write() in both processes, when they both do
> > read() later on - to read the response for the commands they send before
> > -, it is possible that process A gets the response (Sense-Data,
> > FC-Protocol-Data, ...) for a command that process B wrote and vis-a-vis.
> > 
> > I noticed this because in my tests I 'tag' each command I send with
> > write() via a value I write into the usr_ptr field of sg_io_v4. When I
> > later user read() to receive responses I check for this tag in a
> > hash-table and so can look up the original command. When I used this and
> > spawned multiple processes with the same target bsg-device, I got
> > responses for commands I don't have tags for in my hash-table, so they
> > were written by an other process. This never happend when I only spawn
> > one test-process.
> > 
> > This seems awfully contra-productive.. so much that I don't see any
> > point in allowing users to open more than one FD per bsg-device, because
> > that would inevitably lead to very hard to debug 'bugs'. And I wonder if
> > this is really by design or some regression that happend over time.
> > 
> > I looked at the code in bsg.c and yes, it seems this is what is coded
> > right now. We have a hash-table which manages all 'struct bsg_device's
> > who are indexed by device-minors, so one hash-table entry per
> > device-node.
> > 
> > So eh, long talk short question, is that intended?
> 
> Hi,
> About once a year I point out that major misfeature in the bsg driver
> and no-one seems to care. Most recently I pointed it out in a
> discussion about SCSI SG CVE-2017-0794 6 days ago:
>   " ... Last time I looked at the bsg driver, a SCSI command
> could be issued on one file descriptor and its data-in block
> and SCSI status delivered to another file descriptor to the
> same block device (potentially in another process). IOW chaos"
> 
> It is hard to imagine any sane application relying on this bizarre
> behaviour so fixing it should not break any applications. My guess
> is that it would require a non-trivial patch set to fix. Would you
> like to volunteer?
> 

Interesting. So this is not a regression then.

Well, personally I am intrigued to try to fix this, but professionally
it is not really up to me to choose to work on this. Especially because
as you say, this looks not trivial - at least if you want to let
multiple applications open a FD for a single BSG device node. Lets see.

But thank you for elaborating on this!


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH 1/1] bsg-lib: fix use-after-free under memory-pressure

2017-09-21 Thread Benjamin Block
When under memory-pressure it is possible that the mempool which backs
the 'struct request_queue' will make use of up to BLKDEV_MIN_RQ count
emergency buffers - in case it can't get a regular allocation. These
buffers are preallocated and once they are also used, they are
re-supplied with old finished requests from the same request_queue (see
mempool_free()).

The bug is, when re-supplying the emergency pool, the old requests are
not again ran through the callback mempool_t->alloc(), and thus also not
through the callback bsg_init_rq(). Thus we skip initialization, and
while the sense-buffer still should be good, scsi_request->cmd might
have become to be an invalid pointer in the meantime. When the request
is initialized in bsg.c, and the user's CDB is larger than BLK_MAX_CDB,
bsg will replace it with a custom allocated buffer, which is freed when
the user's command is finished, thus it dangles afterwards. When next a
command is sent by the user that has a smaller/similar CDB as
BLK_MAX_CDB, bsg will assume that scsi_request->cmd is backed by
scsi_request->__cmd, will not make a custom allocation, and write into
undefined memory.

Fix this by splitting bsg_init_rq() into two functions:
 - bsg_init_job() directly replace bsg_init_rq() and only does the
   allocation of the sense-buffer, which is used to back the bsg job's
   reply buffer. This pointer should never change during the lifetime of
   a scsi_request, so it doesn't need re-initialization.
 - bsg_init_rq() is a new function that make use of
   'struct request_queue's initialize_rq_fn callback (which was
   introduced in v4.12). This is always called before the request is
   given out via blk_get_request(). This function does the remaining
   initialization that was previously done in bsg_init_rq(), and will
   also do it when the request is taken from the emergency-pool of the
   backing mempool.

Also rename bsg_exit_rq() into bsg_exit_job(), to make it fit the
name-scheme.

Fixes: 50b4d485528d ("bsg-lib: fix kernel panic resulting from missing 
allocation of reply-buffer")
Cc: <sta...@vger.kernel.org> # 4.11+
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---

Notes:
I did test this on zFCP with FC CT commands send via the ioctl() and
write() system-call. That did work fine. But I would very much
appreciate if anyone could run this against an other HBA or even an
other implementer of bsg-lib, such as now SAS, because I have no access
to such hardware here.

This should make no difference to the normal cases - where each request
is allocated via slab - with- or without this patch; if I didn't miss
anything. Only the order is a bit mixed up - the memset is done after
the sense-allocation, so I have to buffer the sense-pointer for that.
But otherwise there is no difference I am aware of, so it should behave
the same (does for me).

I could not reproduce the memory-pressure case here in the lab.. I
don't see any reason why it should work now, but I am open to
suggestions :)

Beste Grüße / Best regards,
      - Benjamin Block

 block/bsg-lib.c | 41 -
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c82408c7cc3c..634d1557da38 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -203,28 +203,42 @@ static void bsg_request_fn(struct request_queue *q)
spin_lock_irq(q->queue_lock);
 }
 
-static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
+static int bsg_init_job(struct request_queue *q, struct request *req, gfp_t 
gfp)
 {
struct bsg_job *job = blk_mq_rq_to_pdu(req);
struct scsi_request *sreq = >sreq;
 
-   memset(job, 0, sizeof(*job));
+   /* called right after the request is allocated for the request_queue */
 
-   scsi_req_init(sreq);
-   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
-   sreq->sense = kzalloc(sreq->sense_len, gfp);
+   sreq->sense = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
if (!sreq->sense)
return -ENOMEM;
 
-   job->req = req;
-   job->reply = sreq->sense;
-   job->reply_len = sreq->sense_len;
-   job->dd_data = job + 1;
-
return 0;
 }
 
-static void bsg_exit_rq(struct request_queue *q, struct request *req)
+static void bsg_init_rq(struct request *req)
+{
+   struct bsg_job *job = blk_mq_rq_to_pdu(req);
+   struct scsi_request *sreq = >sreq;
+   void *sense = sreq->sense;
+
+   /* called right before the request is given to the request_queue user */
+
+   memset(job, 0, sizeof(*job));
+
+   scsi_req_init(sreq);
+
+   sreq->sense = sense;
+   sreq->sense_len = SCSI_SENSE_BUFFERSIZE;
+
+   job->req = req;
+   job->reply = sense;

Is the possible cross-talking between unrelated file-descriptors on bsg-device by design?

2017-09-19 Thread Benjamin Block
Hello linux-block,

I wrote some tests recently to test patches against bsg.c and bsg-lib.c,
and while writing those I noticed something strange:

When you use the write() and read() call on multiple file-descriptors
for a single bsg-device (FC or SCSI), it is possible that you get
cross-talk between those different file-descriptors.

E.g.: You have two independent processes open a FD on /dev/bsg/fc_host0
and you send commands via write() in both processes, when they both do
read() later on - to read the response for the commands they send before
-, it is possible that process A gets the response (Sense-Data,
FC-Protocol-Data, ...) for a command that process B wrote and vis-a-vis.

I noticed this because in my tests I 'tag' each command I send with
write() via a value I write into the usr_ptr field of sg_io_v4. When I
later user read() to receive responses I check for this tag in a
hash-table and so can look up the original command. When I used this and
spawned multiple processes with the same target bsg-device, I got
responses for commands I don't have tags for in my hash-table, so they
were written by an other process. This never happend when I only spawn
one test-process.

This seems awfully contra-productive.. so much that I don't see any
point in allowing users to open more than one FD per bsg-device, because
that would inevitably lead to very hard to debug 'bugs'. And I wonder if
this is really by design or some regression that happend over time.

I looked at the code in bsg.c and yes, it seems this is what is coded
right now. We have a hash-table which manages all 'struct bsg_device's
who are indexed by device-minors, so one hash-table entry per
device-node.

So eh, long talk short question, is that intended?



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 1/2] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-09-06 Thread Benjamin Block
On Wed, Sep 06, 2017 at 08:07:43AM -0600, Jens Axboe wrote:
> On 09/06/2017 07:44 AM, Christoph Hellwig wrote:
> > From: Benjamin Block <bbl...@linux.vnet.ibm.com>
> > 
> > Since we split the scsi_request out of struct request bsg fails to
> > provide a reply-buffer for the drivers. This was done via the pointer
> > for sense-data, that is not preallocated anymore.
> 
> Confused, this is already in master, was included before 4.13 was
> finalized.
> 

I think this is the backport for 4.11 and 4.12 as requested by Greg. I
just send a mail as answer that I would do it, but I guess Christoph
already had something in the pipe.

Or? I'll take a look.


Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-08-24 Thread Benjamin Block
On Thu, Aug 24, 2017 at 10:45:56AM +0200, Christoph Hellwig wrote:
> >  /**
> > - * bsg_destroy_job - routine to teardown/delete a bsg job
> > + * bsg_teardown_job - routine to teardown a bsg job
> >   * @job: bsg_job that is to be torn down
> >   */
> > -static void bsg_destroy_job(struct kref *kref)
> > +static void bsg_teardown_job(struct kref *kref)
> 
> Why this rename?  The destroy name seems to be one of the most
> common patterns for the kref_put callbacks.
>

Hmm, I did it mostly so it is symmetric with bsg_prepare_job() and it
doesn't really itself destroy the job-struct anymore. If there are other
thing amiss I can change that along with them, if it bothers poeple.


Beste Grüße / Best regards,
      - Benjamin Block

> 
> Otherwise this looks fine:
> 
> Reviewed-by: Christoph Hellwig <h...@lst.de>
> 

-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG

2017-08-23 Thread Benjamin Block
Hello all,

This is the second try for fixing the regression in the BSG-interface that
exists since v4.11 (for more infos see the first series).

I separated my other changes from the bug-fix so that it is easier to apply
if judged good. I will rebase my cleanups I sent in v1 and send them when I
get a bit more time. But the regression-fix is more important, so here's
that.

I did some more tests on it than on v1, including some heavy parallel I/O
on the same blk-queue using both BSG and the normal SCSI-stack at the same
time (throwing some intentional bad commands in it too). That seemed to
work all well enough - i.e. it didn't crash and got the expected results. I
haven't done any external error-inject, but IMO that would be beyond the
scope right now.

The fix is based on Christoph's idea, I discussed this with him off-list
already.

I rebased the series on Jens' for-next.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (1):
  bsg-lib: fix kernel panic resulting from missing allocation of
reply-buffer

 block/bsg-lib.c | 74 +
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

-- 
Linux on z Systems Development / IBM Systems & Technology Group
   IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-08-23 Thread Benjamin Block
Since we split the scsi_request out of struct request bsg fails to
provide a reply-buffer for the drivers. This was done via the pointer
for sense-data, that is not preallocated anymore.

Failing to allocate/assign it results in illegal dereferences because
LLDs use this pointer unquestioned.

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

This patch moves bsg-lib to allocate and setup struct bsg_job ahead of
time, including the allocation of a buffer for the reply-data.

This means, struct bsg_job is not allocated separately anymore, but as part
of struct request allocation - similar to struct scsi_cmd. Reflect this in
the function names that used to handle creation/destruction of struct
bsg_job.

Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Suggested-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc: <sta...@vger.kernel.org> #4.11+
---
 block/bsg-lib.c | 74 +
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..dd56d7460cb9 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -29,26 +29,25 @@
 #include 
 
 /**
- * bsg_destroy_job - routine to teardown/delete a bsg job
+ * bsg_teardown_job - routine to teardown a bsg job
  * @job: bsg_job that is to be torn down
  */
-static void bsg_destroy_job(struct kref *kref)
+static void bsg_teardown_job(struct kref *kref)
 {
struct bsg_job *job = container_of(kref, struct bsg_job, kref);
struct request *rq = job->req;
 
-   blk_end_request_all(rq, BLK_STS_OK);
-
put_device(job->dev);   /* release reference for the request */
 
kfree(job->request_payload.sg_list);
kfree(job->reply_payload.sg_list);
-   kfree(job);
+
+   blk_end_request_all(rq, BLK_STS_OK);
 }
 
 void bsg_job_put(struct bsg_job *job)
 {
-   kref_put(>kref, bsg_destroy_job);
+   kref_p

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-14 Thread Benjamin Block
On Sun, Aug 13, 2017 at 04:39:40PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 06:01:42PM +0200, Benjamin Block wrote:
> > When the BSG interface is used with bsg-lib, and the user sends a
> > Bidirectional command - so when he gives an input- and output-buffer
> > (most users of our interface will likely do that, if they wanna get the
> > transport-level response data) - bsg will allocate two requests from the
> > queue. The first request's bio is used to map the input and the second
> > request's bio for the output (see bsg_map_hdr() in the if-statement with
> > (op == REQ_OP_SCSI_OUT && hdr->din_xfer_len)).
> > 
> > When we now allocate the full space of bsg_job, sense, dd_data for each
> > request, these will be wasted on the (linked) second request. They will
> > go unused all the time, as only the first request's bsg_job, sense and
> > dd_data is used by the LLDs and BSG itself.
> > 
> > Right now, because we don't allocate this on each request, those spaces
> > are only allocated for the first request in bsg-lib.
> > 
> > Maybe we can ignore this, if it gets to complicated, I don't wanne
> > prolong this unnecessary.
> 
> We have the same 'issue' with bidirection scsi commands - it's a side
> effect of having two request structures for these commands, and the
> only real fix would be to have a single request structure, which would
> be nice especially if we can't do it without growing struct request.
> 

Alright, I was not aware of that. That is fair then. Thx.


    Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-14 Thread Benjamin Block
Hey Christoph,

I looked over the patch in detail, see below.

> From f5b03b82df0569c035022c1c2535696186907f1a Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <h...@lst.de>
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
>
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
>
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
>
> Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc: <sta...@vger.kernel.org> #4.11+
> ---
>  block/bsg-lib.c | 51 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 30 insertions(+), 24 deletions(-)
>
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..c07333c3b785 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -37,13 +37,11 @@ static void bsg_destroy_job(struct kref *kref)
>   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
>   struct request *rq = job->req;
>
> - blk_end_request_all(rq, BLK_STS_OK);
> -
>   put_device(job->dev);   /* release reference for the request */
>
>   kfree(job->request_payload.sg_list);
>   kfree(job->reply_payload.sg_list);
> - kfree(job);
> + blk_end_request_all(rq, BLK_STS_OK);

What is the reason for moving that last line? Just wondering whether
that might change the behavior somehow, although it doesn't look like it
from the code.

>  }
>
>  void bsg_job_put(struct bsg_job *job)
> @@ -100,7 +98,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
>
>   bsg_job_put(job);
>  }
> @@ -129,24 +127,11 @@ static int bsg_map_buffer(struct bsg_buffer *buf, 
> struct request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
>   int ret;
>
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
>   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
>* allocated */
>   if (req->bio) {
> @@ -187,7 +172,6 @@ static void bsg_request_fn(struct request_queue *q)
>  {
>   struct device *dev = q->queuedata;
>   struct request *req;
> - struct bsg_job *job;
>   int ret;
>
>   if (!get_device(dev))
> @@ -207,8 +191,7 @@ static void bsg_request_fn(struct request_queue *q)
>   continue;
>   }
>
> - job = req->special;
> - ret = q->bsg_job_fn(job);
> + ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
>   spin_lock_irq(q->queue_lock);
>   if (ret)
>   break;
> @@ -219,6 +202,27 @@ static void bsg_request_fn(struct request_queue *q)
>   spin_lock_irq(q->queue_lock);
>  }
>
> +static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t 
> gfp)
> +{
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
> +
> + memset(job, 0, sizeof(*job));
> + job->req = req;
> + job->request = job->sreq.cmd;

That doesn't work with bsg.c if the request submitted by the user is
bigger than BLK_MAX_CDB. There is code in blk_fill_sgv4_hdr_rq() that
will reassign the req->cmd point in that case to something else.

This is maybe wrong in the same vein as my Patch 1 is. If not for the
legacy code in bsg.c, setting this here, will miss changes to that
pointer between request-allocation and job-submission.

So maybe just move this to bsg_create_job().

> + job->dd_data = job + 1;
> + job

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block
On Fri, Aug 11, 2017 at 05:35:53PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 05:32:03PM +0200, Benjamin Block wrote:
> > So when the bsg interface is used with something different than the
> > bsg-lib request queue?
> 
> Yes.
> 
> > I haven't actually thought about that (presuming
> > the bsg-lib queue was the only one being used). Fair enough, I haven't
> > completely read that code now, but that seems bad then, to reassign a
> > space allocated in someone else's request queue. 
> > 
> > That still leaves open that we now over-allocate space in bsg-lib, or?
> 
> Which space do we over-allocate?

When the BSG interface is used with bsg-lib, and the user sends a
Bidirectional command - so when he gives an input- and output-buffer
(most users of our interface will likely do that, if they wanna get the
transport-level response data) - bsg will allocate two requests from the
queue. The first request's bio is used to map the input and the second
request's bio for the output (see bsg_map_hdr() in the if-statement with
(op == REQ_OP_SCSI_OUT && hdr->din_xfer_len)).

When we now allocate the full space of bsg_job, sense, dd_data for each
request, these will be wasted on the (linked) second request. They will
go unused all the time, as only the first request's bsg_job, sense and
dd_data is used by the LLDs and BSG itself.

Right now, because we don't allocate this on each request, those spaces
are only allocated for the first request in bsg-lib.

Maybe we can ignore this, if it gets to complicated, I don't wanne
prolong this unnecessary.

> 
> > My diff tells that this was the same patch as before.
> 
> Next try:

I will have a look when I am back in the office next week.


Beste Grüße / Best regards,
  - Benjamin Block
> 
> ---
> From f5b03b82df0569c035022c1c2535696186907f1a Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <h...@lst.de>
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
> 
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
> 
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
> 
> Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc: <sta...@vger.kernel.org> #4.11+
> ---
>  block/bsg-lib.c | 51 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 30 insertions(+), 24 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..c07333c3b785 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -37,13 +37,11 @@ static void bsg_destroy_job(struct kref *kref)
>   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
>   struct request *rq = job->req;
> 
> - blk_end_request_all(rq, BLK_STS_OK);
> -
>   put_device(job->dev);   /* release reference for the request */
> 
>   kfree(job->request_payload.sg_list);
>   kfree(job->reply_payload.sg_list);
> - kfree(job);
> + blk_end_request_all(rq, BLK_STS_OK);
>  }
> 
>  void bsg_job_put(struct bsg_job *job)
> @@ -100,7 +98,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> 
>   bsg_job_put(job);
>  }
> @@ -129,24 +127,11 @@ static int bsg_map_buffer(struct bsg_buffer *buf, 
> struct request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
>   int ret;
> 
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
>

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block
On Fri, Aug 11, 2017 at 04:36:49PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 03:49:29PM +0200, Benjamin Block wrote:
> > On Fri, Aug 11, 2017 at 11:14:15AM +0200, Christoph Hellwig wrote:
> > > But patch 1 still creates an additional copy of the sense data for
> > > all bsg users.
> > >
> > 
> > Huh? What additional copy? There is one reply-buffer and that is copied
> > into the user-buffer should it contain valid data. Just like in your
> > patch, neither you, nor me touches any of the copy-code. There is also
> > no changes to how the driver get their data into that buffer, it will
> > still be copied in both cases.
> 
> You're right - I misread your patch.  But that does make it worse as
> this means that with your patch we re-assign the scsi_request.sense
> pointer when using bsg.  That will lead to crashes if using the bsg
> code against e.g. a normal scsi device using bsg when that request
> later gets reused for something that is not bsg.
>

So when the bsg interface is used with something different than the
bsg-lib request queue? I haven't actually thought about that (presuming
the bsg-lib queue was the only one being used). Fair enough, I haven't
completely read that code now, but that seems bad then, to reassign a
space allocated in someone else's request queue. 

That still leaves open that we now over-allocate space in bsg-lib, or?

> 
> > 
> > > 
> > > Can you test the patch below which implements my suggestion?  Your
> > > other patches should still apply fine on top modulo minor context
> > > changes.
> > 
> > Only your patch on top of 4.13-rc4. din_xferp (D) is also empty, which is
> > not taken from the sense-buffer.
> 
> Can't parse this.
> 
> > =
> > BUG kmalloc-1024 (Not tainted): Invalid object pointer 0x4ad9e0f0
> > -
> 
> Oops - if we don't allocate the job separately we should not free it either.
> Updated patch for that below:
>

My diff tells that this was the same patch as before.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block
d: GB   
4.13.0-rc4-bsg-regression+ #2
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<00117532>] show_stack+0x8a/0xe0)
 [<00bcbaee>] dump_stack+0x96/0xd8
 [<003cd5fc>] slab_err+0xac/0xc0
 [<003d68e4>] free_debug_processing+0x554/0x570
 [<003d69ae>] __slab_free+0xae/0x618
 [<003d7dce>] kfree+0x44e/0x4a0
 [<00859b4e>] blk_done_softirq+0x146/0x160
 [<00bf4ec0>] __do_softirq+0x3d0/0x840
 [<001662a6>] run_ksoftirqd+0x3e/0xb8
 [<001957fc>] smpboot_thread_fn+0x2f4/0x318
 [<0018f6f6>] kthread+0x166/0x178
 [<00bf3cf2>] kernel_thread_starter+0x6/0xc
 [<00bf3cec>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
FIX kmalloc-1024: Object at 0x4ad9d650 not freed
=
BUG kmalloc-1024 (Tainted: GB  ): Invalid object pointer 
0x4ad9eb90
-

INFO: Slab 0x03d1012b6600 objects=24 used=17 fp=0x4ad99548 
flags=0x3fffc008101
CPU: 2 PID: 20 Comm: ksoftirqd/2 Tainted: GB   
4.13.0-rc4-bsg-regression+ #2
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<00117532>] show_stack+0x8a/0xe0)
 [<00bcbaee>] dump_stack+0x96/0xd8
 [<003cd5fc>] slab_err+0xac/0xc0
 [<003d68e4>] free_debug_processing+0x554/0x570
 [<003d69ae>] __slab_free+0xae/0x618
 [<003d7dce>] kfree+0x44e/0x4a0
 [<00859b4e>] blk_done_softirq+0x146/0x160
 [<00bf4ec0>] __do_softirq+0x3d0/0x840
 [<001662a6>] run_ksoftirqd+0x3e/0xb8
 [<001957fc>] smpboot_thread_fn+0x2f4/0x318
 [<0018f6f6>] kthread+0x166/0x178
 [<00bf3cf2>] kernel_thread_starter+0x6/0xc
 [<00bf3cec>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
FIX kmalloc-1024: Object at 0x4ad9eb90 not freed

> 
> ---
> From 4cd32ee48e334b62b55bff0d380833b978454040 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <h...@lst.de>
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
> 
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
> 
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
> 
> Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc: <sta...@vger.kernel.org> #4.11+
> ---
>  block/bsg-lib.c | 53 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 31 insertions(+), 25 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..215893dbd038 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -100,7 +100,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> 
>   bsg_job_put(job);
>  }
> @@ -129,26 +129,9 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct 
> request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> - struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   int ret;
> 
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
> - job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
> - job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
> -  * allocated */
>   if (req->bio) {
>   ret = bsg_map_buffer(>request_payload, req);
>   if (ret)
> @@ -187,7 +170,6 @@ static void bsg_request_fn(struct request_queue *q)
>  {
>   struct device *dev = q->qu

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-10 Thread Benjamin Block
On Fri, Aug 11, 2017 at 12:10:38AM +0200, Benjamin Block wrote:
> On Thu, Aug 10, 2017 at 11:32:17AM +0200, Christoph Hellwig wrote:
> > We can't use an on-stack buffer for the sense data, as drivers will
> > dma to it.  So we should reuse the SCSI init_rq_fn() for the BSG
> > queues and/or implement the same scheme.
> > 
> 
...
> 
>  struct sg_io_v4
>  +--+
>  |  |  
>  | request>++
>  |   + _len |  ||
>  |   (A)|  | BSG Request|
>  |  |  | e.g. struct fc_bsg_request | Depends on BSG 
> implementation
>  |  |  || FC vs. iSCSI vs. ...
>  |  |  ++
>  |  |
>  | response--->++ Used as _Output_
>  |   + max_len  |  || User doesn't initialize
>  |   (B)|  | BSG Reply  | User provides (optional)
>  |  |  | e.g. struct fc_bsg_reply   |   memory; May be NULL.
>  |  |  ||
>  |  |  ++
>  |  |
>  | dout_xferp->+---+ Stuff send on the wire by
>  |   + _len |  |   |   the LLD
>  |   (C)|  | Transport Protocol IU | Aligned on PAGE_SIZE
>  |  |  | e.g. FC-GS-7 CT_IU|
>  |  |  |   |
>  |  |  +---+
>  |  |
>  | din_xferp-->+---+ Buffer for response data by
>  |   + _len |  |   |   the LLD
>  |   (D)|  | Transport Protocol IU | Aligned on PAGE_SIZE
>  |  |  | e.g. FC-GS-7 CT_IU|
>  |  |  |   |
>  |  |  +---+
>  +--+
> 
...
> 
>  struct request (E)
>  +--+
>  |  |  struct scsi_request
>  | scsi_request--->+-+
>  |  |  | |
>  |  |  | cmd-> Copy of (A)
>  |  |  |  + _len | Space in struct or kzalloc
>  |  |  |  (G)|
>  |  |  | |
>  |  |  | sense---> Space for BSG Reply
>  |  |  |  + _len | Same Data-Structure as (B)
>  |  |  |  (H)| NOT actually pointer (B)
>  |  |  | | 'reply_buffer' in my patch 
>  |  |  +-+
>  |  |
>  | bio> Mapped via blk_rq_map_user() to (C) dout_xferp
>  |  |
>  | next_rq-+
>  |  |  |
>  +--+  |
>|
>  struct request (F)|(if used)
>  +--+<-+
>  |  |
>  | scsi_request---> Unused here
>  |  |
>  | bio> Mapped via blk_rq_map_user() to (D) din_xferp
>  |  |
>  +--+
> 
...
> 
>  struct bsg_job
>  +-+
>  | |
>  | request---> (G) scsi_request->cmd -> Copy of (A)
>  |   + _len|   e.g. struct fc_bsg_request
>  | |
>  | reply-> (H) scsi_request->sense -> 'reply_buffer'
>  |   + _len|   e.g. struct fc_bsg_reply
>  | |
>  | request_payload---> struct scatterlist ... map (E)->bio
>  |   + _len|
>  |   (I)   |
>  | |
>  | reply_payload-> struct scatterlist ... map (F)->bio
>  |   + _len|
>  |   (J)   |
>  | |
>  +-+
> 

> 
> This worked till it broke. Right now every driver that tries to access
> (H) will panic the system, or cause very undefined behavior. I suspect
> no driver right now tries to do any DMA into (H); before the regression,
> this has been also an on-stack variable (I suspect since BSG was
> introduced, haven't checked though).
> 
> The asymmetries between the first struct request (E) and the following
> (F) also makes it hard to use the same scheme as in other drivers, where
> init_rq_fn() gets to initialize each request in the same'ish way. Or?
> Just looking at it right now, this would require some bigger rework that
> is not appropriate for a stable bug-fix.
> 

Just some more brain-dump here.

One more problem for direct DMA into (H) in the current BSG setup is
probably, that the transport classes have each their own private format
for the BSG reply (

Re: [RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-10 Thread Benjamin Block
On Thu, Aug 10, 2017 at 11:35:31AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 09, 2017 at 04:11:19PM +0200, Benjamin Block wrote:
> > Since struct bsg_command is now used in every calling case, we don't
> > need separation of arguments anymore that are contained in the same
> > bsg_command.
> > 
> > Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> > ---
> >  block/bsg.c | 13 ++---
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/block/bsg.c b/block/bsg.c
> > index 8517361a9b3f..6ee2ffca808a 100644
> > --- a/block/bsg.c
> > +++ b/block/bsg.c
> > @@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
> >   * map sg_io_v4 to a request.
> >   */
> >  static struct request *
> > -bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
> > has_write_perm,
> > -   u8 *reply_buffer)
> > +bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
> > +   fmode_t has_write_perm)
> 
> I wish we could just rename the argument to mode and pass on the
> whole file->f_mode while you are cleaning up this code.  That should
> be a separate patch, though.
> 

Hmm, I did a quick pass through the code and the only place this seems
to be used, is to pass it to blk_verify_command() if the subcommand used
in the BSG request is a SCSI Command. And this has the same semantics.
So I guess this would require adjustments to the whole stack, as this is
also used from the 'normal' SG side of the world.



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-10 Thread Benjamin Block
On Thu, Aug 10, 2017 at 10:24:56AM +0200, Johannes Thumshirn wrote:
> On Wed, Aug 09, 2017 at 04:11:18PM +0200, Benjamin Block wrote:
> > +   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
> 
> return !(bc->hdr.flags & BSG_FLAG_Q_AT_TAIL); and make the function return
> bool? I have to admit, this is the 1st time I have seen the above construct.
> 

Hmm yeah, odd. To be honest, I just copied the expression so that it is
obvious that no behavior changed, but just the place. I'll replace it
with what you suggest.



Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-10 Thread Benjamin Block
 on (H), the LLD fills in some information for
accounting and such. In case of FC, there is also some
protocol-information, but this is by no means a standard IU.

This is then passed up the stack pretty quick and if the user passed
something with (B), data from (H) is copied into (B). But this is
optional. The main payload is transferred to the user via (J) which is a
remap of (D).

So this is my understanding here. (I don't wanna say that this is
necessarily completely correct ;-), pleas correct me, if I am wrong. The
lack of proper documentation is also not helping at all.)

This worked till it broke. Right now every driver that tries to access
(H) will panic the system, or cause very undefined behavior. I suspect
no driver right now tries to do any DMA into (H); before the regression,
this has been also an on-stack variable (I suspect since BSG was
introduced, haven't checked though).

The asymmetries between the first struct request (E) and the following
(F) also makes it hard to use the same scheme as in other drivers, where
init_rq_fn() gets to initialize each request in the same'ish way. Or?
Just looking at it right now, this would require some bigger rework that
is not appropriate for a stable bug-fix.

I think it would be best if we fix the possible panics every user of
this is gonna experience and after that we can still think about
improving it further beyond what the rest of the patch set already does,
if that is necessary.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-09 Thread Benjamin Block
In contrast to the normal SCSI-lib, the BSG block-queue doesn't make use of
any extra init_rq_fn() to make additional allocations during
request-creation, and the request sense-pointer is not used to transport
SCSI sense data, but is used as backing for the bsg_job->reply pointer;
that in turn is used in the LLDs to store protocol IUs or similar stuff.
This 're-purposing' of the sense-pointer is done in the BSG blk-lib
(bsg_create_job()), during the queue-processing.

Failing to allocate/assign it results in illegal dereferences because LLDs
use this pointer unquestioned, as can be seen in the various
BSG-implementations:

drivers/scsi/libfc/fc_lport.c:  fc_lport_bsg_request()
drivers/scsi/qla2xxx/qla_bsg.c: qla24xx_bsg_request()
drivers/scsi/qla4xxx/ql4_bsg.c: qla4xxx_process_vendor_specific()
drivers/s390/scsi/zfcp_fc.c:zfcp_fc_ct_els_job_handler()
...

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

To prevent this, allocate a buffer when the BSG blk-request is setup, and
before it is queued for LLD processing.

Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc: <sta...@vger.kernel.org> #4.11+
---
 block/bsg.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 37663b664666..285b1b8126c3 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -74,6 +74,8 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 /*
  * our internal command type
  */
@@ -85,6 +87,7 @@ struct bsg_command {
struct bio *bidi_bio;
int err;
struct sg_io_v4 hdr;
+   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE];
 };
 
 static void bsg_free_command(struct bsg_command *bc)
@@ -137,7 +140,7 @@ static inline struct hlist_head *bsg_dev_idx_hash(int index)
 
 static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
struct sg_io_v4 *hdr, struct bsg_devic

[RFC PATCH 0/6] bsg: fix regression resulting in panics when sending commands via BSG and some sanity cleanups

2017-08-09 Thread Benjamin Block
Hello all,

Steffen noticed recently that we have a regression in the BSG code that
prevents us from sending any traffic over this interface. After I
researched this a bit, it turned out that this affects not only zFCP, but
likely all LLDs that implements the BSG API. This was introduced in 4.11
(details in Patch 1 of this series).

I imagine the regression happened because of some very "unfortunate"
variable- namings. I can not fix them, as they involve the base struct
request, but I tried to add some cleanups that should make the
relationships between stuff more visible in the future I hope.

Patch 1   - Regression Fix; Also tagged for stable
Patch 2-6 - Cleanups

I tagged this as RFC. Patches 2-6 are a 'nice to have' IMO, Patch 1 is
obviously necessary, and if it is OK, I can re-send it separately if
necessary. If you don't like the changes in the other patches, I don't mind
dropping them.

I am not sure about Patch 4. It certainly works, but it changes
user-visible behavior, in what I believe is within the behavior described
by the SG interface. It makes the different methods of how BSG passes
commands down to the LLDs more conform with each other - even though I
can't make them the exact same. More details in the patch description.

I rebased the series on Jens' for-next and I have function-tested the
series on s390x's zFCP with the tools provided in the zfcp HBA API library
(https://www.ibm.com/developerworks/linux/linux390/zfcp-hbaapi.html) and
some custom code to test the read/write interface of BSG.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (6):
  bsg: fix kernel panic resulting from missing allocation of a
reply-buffer
  bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE
  bsg: scsi-transport: add compile-tests to prevent reply-buffer
overflows
  bsg: refactor ioctl to use regular BSG-command infrastructure for
SG_IO
  bsg: reduce unecessary arguments for bsg_map_hdr()
  bsg: reduce unecessary arguments for blk_complete_sgv4_hdr_rq()

 block/bsg-lib.c |  4 +-
 block/bsg.c | 90 ++---
 drivers/scsi/scsi_transport_fc.c|  3 ++
 drivers/scsi/scsi_transport_iscsi.c |  3 ++
 include/linux/bsg-lib.h |  2 +
 5 files changed, 65 insertions(+), 37 deletions(-)

--
2.12.2



[RFC PATCH 3/6] bsg: scsi-transport: add compile-tests to prevent reply-buffer overflows

2017-08-09 Thread Benjamin Block
The BSG implementations use the bsg_job's reply buffer as storage for their
own custom reply structures (e.g.: struct fc_bsg_reply or
struct iscsi_bsg_reply). The size of bsg_job's reply buffer and those of
the implementations is not dependent in any way the compiler can currently
check.

To make it easier to notice accidental violations add an explicit compile-
time check that tests whether the implementations' reply buffer is at most
as large as bsg_job's.

To do so, we have to move the size-define from bsg.c to a common header.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 3 +--
 drivers/scsi/scsi_transport_fc.c| 3 +++
 drivers/scsi/scsi_transport_iscsi.c | 3 +++
 include/linux/bsg-lib.h | 2 ++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 285b1b8126c3..b924f1c23c58 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -74,8 +75,6 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
-#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
-
 /*
  * our internal command type
  */
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 892fbd9800d9..ce6654b5d329 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3736,6 +3736,9 @@ static int fc_bsg_dispatch(struct bsg_job *job)
 {
struct Scsi_Host *shost = fc_bsg_to_shost(job);
 
+   BUILD_BUG_ON(sizeof(struct fc_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
if (scsi_is_fc_rport(job->dev))
return fc_bsg_rport_dispatch(shost, job);
else
diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index a424eaeafeb0..4e021c949ad7 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1483,6 +1483,9 @@ static int iscsi_bsg_host_dispatch(struct bsg_job *job)
int cmdlen = sizeof(uint32_t);  /* start with length of msgcode */
int ret;
 
+   BUILD_BUG_ON(sizeof(struct iscsi_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
/* check if we have the msgcode value at least */
if (job->request_len < sizeof(uint32_t)) {
ret = -ENOMSG;
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index e34dde2da0ef..85d7c7678cc6 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -25,6 +25,8 @@
 
 #include 
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 struct request;
 struct device;
 struct scatterlist;
-- 
2.12.2



[RFC PATCH 6/6] bsg: reduce unnecessary arguments for blk_complete_sgv4_hdr_rq()

2017-08-09 Thread Benjamin Block
Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 6ee2ffca808a..09f767cdf816 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -399,13 +399,14 @@ static struct bsg_command *bsg_get_done_cmd(struct 
bsg_device *bd)
return bc;
 }
 
-static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
-   struct bio *bio, struct bio *bidi_bio)
+static int blk_complete_sgv4_hdr_rq(struct bsg_command *bc)
 {
+   struct sg_io_v4 *hdr = >hdr;
+   struct request *rq = bc->rq;
struct scsi_request *req = scsi_req(rq);
int ret = 0;
 
-   dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
+   dprintk("rq %p bio %p 0x%x\n", rq, bc->bio, req->result);
/*
 * fill in all the output members
 */
@@ -432,7 +433,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (rq->next_rq) {
hdr->dout_resid = req->resid_len;
hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
-   blk_rq_unmap_user(bidi_bio);
+   blk_rq_unmap_user(bc->bidi_bio);
blk_put_request(rq->next_rq);
} else if (rq_data_dir(rq) == READ)
hdr->din_resid = req->resid_len;
@@ -448,7 +449,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (!ret && req->result < 0)
ret = req->result;
 
-   blk_rq_unmap_user(bio);
+   blk_rq_unmap_user(bc->bio);
scsi_req_free_cmd(req);
blk_put_request(rq);
 
@@ -507,8 +508,7 @@ static int bsg_complete_all_commands(struct bsg_device *bd)
if (IS_ERR(bc))
break;
 
-   tret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-   bc->bidi_bio);
+   tret = blk_complete_sgv4_hdr_rq(bc);
if (!ret)
ret = tret;
 
@@ -542,8 +542,7 @@ __bsg_read(char __user *buf, size_t count, struct 
bsg_device *bd,
 * after completing the request. so do that here,
 * bsg_complete_work() cannot do that for us
 */
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(buf, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
@@ -944,8 +943,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(uarg, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
-- 
2.12.2



[RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-09 Thread Benjamin Block
Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 8517361a9b3f..6ee2ffca808a 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
  * map sg_io_v4 to a request.
  */
 static struct request *
-bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
has_write_perm,
-   u8 *reply_buffer)
+bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
+   fmode_t has_write_perm)
 {
struct request_queue *q = bd->queue;
struct request *rq, *next_rq = NULL;
+   struct sg_io_v4 *hdr = >hdr;
int ret;
unsigned int op, dxfer_len;
void __user *dxferp = NULL;
@@ -244,7 +245,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, 
fmode_t has_write_perm,
if (IS_ERR(rq))
return rq;
 
-   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, reply_buffer,
+   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, bc->reply_buffer,
   has_write_perm);
if (ret)
goto out;
@@ -633,8 +634,7 @@ static int __bsg_write(struct bsg_device *bd, const char 
__user *buf,
/*
 * get a request, fill in the blanks, and add to request queue
 */
-   rq = bsg_map_hdr(bd, >hdr, has_write_perm,
-bc->reply_buffer);
+   rq = bsg_map_hdr(bd, bc, has_write_perm);
if (IS_ERR(rq)) {
ret = PTR_ERR(rq);
rq = NULL;
@@ -934,8 +934,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
goto sg_io_out;
}
 
-   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
-bc->reply_buffer);
+   bc->rq = bsg_map_hdr(bd, bc, file->f_mode & FMODE_WRITE);
if (IS_ERR(bc->rq)) {
ret = PTR_ERR(bc->rq);
goto sg_io_out;
-- 
2.12.2



[RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-09 Thread Benjamin Block
Before, the SG_IO ioctl for BSG devices used to use its own on-stack data
to assemble and send the specified command. The read and write calls use
their own infrastructure build around the struct bsg_command and a custom
slab-pool for that.

Rafactor this, so that SG_IO ioctl also uses struct bsg_command and the
surrounding infrastructure. This way we use global defines like
BSG_COMMAND_REPLY_BUFFERSIZE only in one place, rather than two, the
handling of BSG commands gets more consistent, and it reduces some code-
duplications (the bio-pointer handling). It also reduces the stack
footprint by 320 to 384 bytes (depending on how large pointers are), and
uses the active slab-implementation for efficient alloc/free.

There are two other side-effects:
 - the 'duration' field in the sg header is now also filled for SG_IO
   calls, unlike before were it was always zero.
 - the BSG device queue-limit is also applied to SG_IO, unlike before were
   you could flood one BSG device with as many commands as you'd like. If
   one can trust older SG documentation this limit is applicable to either
   normal writes, or SG_IO calls; but this wasn't enforced before for
   SG_IO.

A complete unification is not possible, as it then would also enqueue SG_IO
commands in the BGS devices's command list, but this is only for the read-
and write-calls.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 60 
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index b924f1c23c58..8517361a9b3f 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -320,6 +320,17 @@ static void bsg_rq_end_io(struct request *rq, blk_status_t 
status)
wake_up(>wq_done);
 }
 
+static int bsg_prep_add_command(struct bsg_command *bc, struct request *rq)
+{
+   bc->rq = rq;
+   bc->bio = rq->bio;
+   if (rq->next_rq)
+   bc->bidi_bio = rq->next_rq->bio;
+   bc->hdr.duration = jiffies;
+
+   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
+}
+
 /*
  * do final setup of a 'bc' and submit the matching 'rq' to the block
  * layer for io
@@ -327,16 +338,11 @@ static void bsg_rq_end_io(struct request *rq, 
blk_status_t status)
 static void bsg_add_command(struct bsg_device *bd, struct request_queue *q,
struct bsg_command *bc, struct request *rq)
 {
-   int at_head = (0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL));
+   int at_head = bsg_prep_add_command(bc, rq);
 
/*
 * add bc command to busy queue and submit rq for io
 */
-   bc->rq = rq;
-   bc->bio = rq->bio;
-   if (rq->next_rq)
-   bc->bidi_bio = rq->next_rq->bio;
-   bc->hdr.duration = jiffies;
spin_lock_irq(>lock);
list_add_tail(>list, >busy_list);
spin_unlock_irq(>lock);
@@ -916,31 +922,37 @@ static long bsg_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg)
return scsi_cmd_ioctl(bd->queue, NULL, file->f_mode, cmd, uarg);
}
case SG_IO: {
-   struct request *rq;
-   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE] = { 0, };
-   struct bio *bio, *bidi_bio = NULL;
-   struct sg_io_v4 hdr;
+   struct bsg_command *bc;
int at_head;
 
-   if (copy_from_user(, uarg, sizeof(hdr)))
-   return -EFAULT;
+   bc = bsg_alloc_command(bd);
+   if (IS_ERR(bc))
+   return PTR_ERR(bc);
 
-   rq = bsg_map_hdr(bd, , file->f_mode & FMODE_WRITE,
-reply_buffer);
-   if (IS_ERR(rq))
-   return PTR_ERR(rq);
+   if (copy_from_user(>hdr, uarg, sizeof(bc->hdr))) {
+   ret = -EFAULT;
+   goto sg_io_out;
+   }
 
-   bio = rq->bio;
-   if (rq->next_rq)
-   bidi_bio = rq->next_rq->bio;
+   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
+bc->reply_buffer);
+   if (IS_ERR(bc->rq)) {
+   ret = PTR_ERR(bc->rq);
+   goto sg_io_out;
+   }
 
-   at_head = (0 == (hdr.flags & BSG_FLAG_Q_AT_TAIL));
-   blk_execute_rq(bd->queue, NULL, rq, at_head);
-   ret = blk_complete_sgv4_hdr_rq(rq, , bio, bidi_bio);
+   at_head = bsg_prep_add_command(bc, bc->rq);
+   blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
+   bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   if (copy_to_user(uarg, , sizeof(hdr)))
-  

[RFC PATCH 2/6] bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE

2017-08-09 Thread Benjamin Block
We do set rq->sense_len when we assigne the reply-buffer in
blk_fill_sgv4_hdr_rq(). No point in possibly deviating from this value
later on.

bsg-lib.h specifies:
unsigned int reply_len;
/*
 * On entry : reply_len indicates the buffer size allocated for
 * the reply.
 *
 * ...
 */

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg-lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..c7c2c6bbb5ae 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -147,8 +147,8 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
job->request = rq->cmd;
job->request_len = rq->cmd_len;
job->reply = rq->sense;
-   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
-* allocated */
+   job->reply_len = rq->sense_len;
+
if (req->bio) {
ret = bsg_map_buffer(>request_payload, req);
if (ret)
-- 
2.12.2



Re: [PATCH 00/22] zfcp fixes and cleanups

2017-08-08 Thread Benjamin Block
On Mon, Aug 07, 2017 at 01:19:11PM -0400, Martin K. Petersen wrote:
> > here is a series of (important) fixes and some additional cleanups for
> > the zfcp driver. Our fixes are marked for stable accordingly.
> >
> > Patches 01 - 04 are cleanups and external patches (also cleanups) that we
> >   & 13 - 22 had queued for quite some time now
> >
> > Patches 05 - 12 are driver fixes that are all marked for stable as well
> 
> Given the ordering, it's a bit unclear how you want me to apply these.
> 
> There are a bunch of fixes you labeled as "important" but they are not
> at the beginning of the series. And I am not sure if these have
> dependencies on the preceding cleanup patches?
> 
> If you have fixes you would like to get into 4.13, please post those
> first as a separate series. And clearly indicate that that's the branch
> you are aiming for.
> 
> If you are OK with everything going into 4.14, then let me know. In that
> case you don't have to shuffle the patches around.
> 

I think it is OK for us, if this goes into 4.14 as one series. So if it
is OK with you, just apply the complete series for that.

Thanks.



    Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



[PATCH 22/22] zfcp: early returns for traces disabled via level

2017-07-28 Thread Benjamin Block
From: Martin Peschke <mpesc...@linux.vnet.ibm.com>

This patch adds early checks to avoid burning CPU cycles on
the assembly of trace entries which would be skipped anyway.

Introduce a static const variable to keep the trace level to check with
debug_level_enabled() in sync with the actual trace emit with
debug_event(). In order not to refactor the SAN tracing too much,
simply use a define instead.

This change is only for the non / semi hot paths,
while the actual (I/O) hot path was already improved earlier:
zfcp_dbf_scsi() is already guarded by its only caller _zfcp_dbf_scsi()
since commit dcd20e2316cd ("[SCSI] zfcp: Only collect SCSI debug data for
matching trace levels").
zfcp_dbf_hba_fsf_res() is already guarded by its only caller
zfcp_dbf_hba_fsf_response() since commit 2e261af84cdb ("[SCSI] zfcp: Only
collect FSF/HBA debug data for matching trace levels").

Signed-off-by: Martin Peschke <mpesc...@linux.vnet.ibm.com>
[ma...@linux.vnet.ibm.com: rebase, reword, default level 3 branch prediction]
Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.c | 54 +---
 1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 484da0b2d678..8227076c9cbb 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -113,8 +113,12 @@ void zfcp_dbf_hba_fsf_uss(char *tag, struct zfcp_fsf_req 
*req)
struct zfcp_dbf *dbf = req->adapter->dbf;
struct fsf_status_read_buffer *srb = req->data;
struct zfcp_dbf_hba *rec = >hba_buf;
+   static int const level = 2;
unsigned long flags;
 
+   if (unlikely(!debug_level_enabled(dbf->hba, level)))
+   return;
+
spin_lock_irqsave(>hba_lock, flags);
memset(rec, 0, sizeof(*rec));
 
@@ -142,7 +146,7 @@ void zfcp_dbf_hba_fsf_uss(char *tag, struct zfcp_fsf_req 
*req)
zfcp_dbf_pl_write(dbf, srb->payload.data, rec->pl_len,
  "fsf_uss", req->req_id);
 log:
-   debug_event(dbf->hba, 2, rec, sizeof(*rec));
+   debug_event(dbf->hba, level, rec, sizeof(*rec));
spin_unlock_irqrestore(>hba_lock, flags);
 }
 
@@ -156,8 +160,12 @@ void zfcp_dbf_hba_bit_err(char *tag, struct zfcp_fsf_req 
*req)
struct zfcp_dbf *dbf = req->adapter->dbf;
struct zfcp_dbf_hba *rec = >hba_buf;
struct fsf_status_read_buffer *sr_buf = req->data;
+   static int const level = 1;
unsigned long flags;
 
+   if (unlikely(!debug_level_enabled(dbf->hba, level)))
+   return;
+
spin_lock_irqsave(>hba_lock, flags);
memset(rec, 0, sizeof(*rec));
 
@@ -169,7 +177,7 @@ void zfcp_dbf_hba_bit_err(char *tag, struct zfcp_fsf_req 
*req)
memcpy(>u.be, _buf->payload.bit_error,
   sizeof(struct fsf_bit_error_payload));
 
-   debug_event(dbf->hba, 1, rec, sizeof(*rec));
+   debug_event(dbf->hba, level, rec, sizeof(*rec));
spin_unlock_irqrestore(>hba_lock, flags);
 }
 
@@ -186,8 +194,12 @@ void zfcp_dbf_hba_def_err(struct zfcp_adapter *adapter, 
u64 req_id, u16 scount,
struct zfcp_dbf *dbf = adapter->dbf;
struct zfcp_dbf_pay *payload = >pay_buf;
unsigned long flags;
+   static int const level = 1;
u16 length;
 
+   if (unlikely(!debug_level_enabled(dbf->pay, level)))
+   return;
+
if (!pl)
return;
 
@@ -202,7 +214,7 @@ void zfcp_dbf_hba_def_err(struct zfcp_adapter *adapter, u64 
req_id, u16 scount,
 
while (payload->counter < scount && (char *)pl[payload->counter]) {
memcpy(payload->data, (char *)pl[payload->counter], length);
-   debug_event(dbf->pay, 1, payload, zfcp_dbf_plen(length));
+   debug_event(dbf->pay, level, payload, zfcp_dbf_plen(length));
payload->counter++;
}
 
@@ -217,15 +229,19 @@ void zfcp_dbf_hba_basic(char *tag, struct zfcp_adapter 
*adapter)
 {
struct zfcp_dbf *dbf = adapter->dbf;
struct zfcp_dbf_hba *rec = >hba_buf;
+   static int const level = 1;
unsigned long flags;
 
+   if (unlikely(!debug_level_enabled(dbf->hba, level)))
+   return;
+
spin_lock_irqsave(>hba_lock, flags);
memset(rec, 0, sizeof(*rec));
 
memcpy(rec->tag, tag, ZFCP_DBF_TAG_LEN);
rec->id = ZFCP_DBF_HBA_BASIC;
 
-   debug_event(dbf->hba, 1, rec, sizeof(*rec));
+   debug_event(dbf->hba, level, rec, sizeof(*rec));
spin_unlock_irqrestore(>hba_lock, flags);
 }
 
@@ -264,9 +280,13 @@ void zfcp_dbf_rec_trig(char 

[PATCH 17/22] zfcp: fix kernel doc comment typos for struct zfcp_dbf_scsi

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

Improves commit 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing
for SCSI records.")

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index 33ccf15b63e0..3508c00458f4 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -208,12 +208,12 @@ enum zfcp_dbf_scsi_id {
  * @scsi_result: scsi result
  * @scsi_retries: current retry number of scsi request
  * @scsi_allowed: allowed retries
- * @fcp_rsp_info: FCP response info
+ * @fcp_rsp_info: FCP response info code
  * @scsi_opcode: scsi opcode
  * @fsf_req_id: request id of fsf request
  * @host_scribble: LLD specific data attached to SCSI request
- * @pl_len: length of paload stored as zfcp_dbf_pay
- * @fsf_rsp: response for fsf request
+ * @pl_len: length of payload stored as zfcp_dbf_pay
+ * @fcp_rsp: response for FCP request
  * @scsi_lun_64_hi: scsi device logical unit number, high part of 64 bit
  */
 struct zfcp_dbf_scsi {
-- 
2.11.2



[PATCH 05/22] zfcp: fix queuecommand for scsi_eh commands when DIX enabled

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

Since commit db007fc5e20c ("[SCSI] Command protection operation"),
scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to
SCSI_PROT_NORMAL.
Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand()
to only access any of scsi_prot_sg...() if
(scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).

Do the same thing for zfcp, which introduced DIX support with
commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for
DIF/DIX").

Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp,
because the regular SCSI command with DIX protection data, that scsi_eh
re-uses in scsi_send_eh_cmnd(), of course still has
(scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the
FCP channel hardware.

This causes scsi_eh_test_devices() to have (finish_cmds == 0)
[not SCSI device is online or not scsi_eh_tur() failed]
so regular SCSI commands, that caused / were affected by scsi_eh,
are moved to work_q and scsi_eh_test_devices() itself returns false.
In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs()
beyond host reset to finally scsi_eh_offline_sdevs()
which sets affected SCSI devices offline with the following kernel message:

"kernel: sd H:0:T:L: Device offlined - not ready after error recovery"

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX")
Cc: <sta...@vger.kernel.org> #2.6.36+
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_fsf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 3b69ec5e69ed..e894ec92076c 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -2257,7 +2257,8 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *scsi_cmnd)
fcp_cmnd = (struct fcp_cmnd *) >qtcb->bottom.io.fcp_cmnd;
zfcp_fc_scsi_to_fcp(fcp_cmnd, scsi_cmnd, 0);
 
-   if (scsi_prot_sg_count(scsi_cmnd)) {
+   if ((scsi_get_prot_op(scsi_cmnd) != SCSI_PROT_NORMAL) &&
+   scsi_prot_sg_count(scsi_cmnd)) {
zfcp_qdio_set_data_div(qdio, >qdio_req,
   scsi_prot_sg_count(scsi_cmnd));
retval = zfcp_qdio_sbals_from_sg(qdio, >qdio_req,
-- 
2.11.2



[PATCH 01/22] zfcp: replace zfcp_qdio_sbale_count by sg_nents

2017-07-28 Thread Benjamin Block
From: LABBE Corentin <clabbe.montj...@gmail.com>

The zfcp_qdio_sbale_count function do the same work than sg_nents().
So replace it by sg_nents() for removing duplicate code.

Signed-off-by: LABBE Corentin <clabbe.montj...@gmail.com>
Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_fsf.c  |  3 +--
 drivers/s390/scsi/zfcp_qdio.h | 15 ---
 2 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 27ff38f839fc..3b69ec5e69ed 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -991,8 +991,7 @@ static int zfcp_fsf_setup_ct_els_sbals(struct zfcp_fsf_req 
*req,
qtcb->bottom.support.resp_buf_length =
zfcp_qdio_real_bytes(sg_resp);
 
-   zfcp_qdio_set_data_div(qdio, >qdio_req,
-   zfcp_qdio_sbale_count(sg_req));
+   zfcp_qdio_set_data_div(qdio, >qdio_req, sg_nents(sg_req));
zfcp_qdio_set_sbale_last(qdio, >qdio_req);
zfcp_qdio_set_scount(qdio, >qdio_req);
return 0;
diff --git a/drivers/s390/scsi/zfcp_qdio.h b/drivers/s390/scsi/zfcp_qdio.h
index 497cd379b0d1..85cdb82127e8 100644
--- a/drivers/s390/scsi/zfcp_qdio.h
+++ b/drivers/s390/scsi/zfcp_qdio.h
@@ -225,21 +225,6 @@ void zfcp_qdio_set_data_div(struct zfcp_qdio *qdio,
 }
 
 /**
- * zfcp_qdio_sbale_count - count sbale used
- * @sg: pointer to struct scatterlist
- */
-static inline
-unsigned int zfcp_qdio_sbale_count(struct scatterlist *sg)
-{
-   unsigned int count = 0;
-
-   for (; sg; sg = sg_next(sg))
-   count++;
-
-   return count;
-}
-
-/**
  * zfcp_qdio_real_bytes - count bytes used
  * @sg: pointer to struct scatterlist
  */
-- 
2.11.2



[PATCH 18/22] zfcp: clean up redundant code with fall through in link down SRB switch case

2017-07-28 Thread Benjamin Block
From: Martin Peschke <mpesc...@linux.vnet.ibm.com>

Signed-off-by: Martin Peschke <mpesc...@linux.vnet.ibm.com>
[ma...@linux.vnet.ibm.com: re-worded short description for more details]
Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_fsf.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index cc923c71a0fa..eefb474a9e42 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -197,8 +197,6 @@ static void zfcp_fsf_status_read_link_down(struct 
zfcp_fsf_req *req)
 
switch (sr_buf->status_subtype) {
case FSF_STATUS_READ_SUB_NO_PHYSICAL_LINK:
-   zfcp_fsf_link_down_info_eval(req, ldi);
-   break;
case FSF_STATUS_READ_SUB_FDISC_FAILED:
zfcp_fsf_link_down_info_eval(req, ldi);
break;
-- 
2.11.2



[PATCH 16/22] zfcp: use endianness conversions with common FC(P) struct fields

2017-07-28 Thread Benjamin Block
c:502:38: warning: restricted __be64 degrades to 
integer
drivers/s390/scsi/zfcp_fc.c:541:40: warning: incorrect type in assignment 
(different base types)
drivers/s390/scsi/zfcp_fc.c:541:40:expected restricted __be64 [usertype] 
adisc_wwpn
drivers/s390/scsi/zfcp_fc.c:541:40:got unsigned long long [unsigned] 
[usertype] port_name
drivers/s390/scsi/zfcp_fc.c:542:40: warning: incorrect type in assignment 
(different base types)
drivers/s390/scsi/zfcp_fc.c:542:40:expected restricted __be64 [usertype] 
adisc_wwnn
drivers/s390/scsi/zfcp_fc.c:542:40:got unsigned long long [unsigned] 
[usertype] node_name
drivers/s390/scsi/zfcp_fc.c:669:16: warning: restricted __be16 degrades to 
integer
drivers/s390/scsi/zfcp_fc.c:696:24: warning: restricted __be64 degrades to 
integer
drivers/s390/scsi/zfcp_fc.c:699:54: warning: incorrect type in argument 2 
(different base types)
drivers/s390/scsi/zfcp_fc.c:699:54:expected unsigned long long [unsigned] 
[usertype] 
drivers/s390/scsi/zfcp_fc.c:699:54:got restricted __be64 [usertype] fp_wwpn
  CC  drivers/s390/scsi/zfcp_fc.o
  CHECK   drivers/s390/scsi/zfcp_fsf.c
drivers/s390/scsi/zfcp_fsf.c:479:34: warning: incorrect type in assignment 
(different base types)
drivers/s390/scsi/zfcp_fsf.c:479:34:expected unsigned long long [unsigned] 
[usertype] port_name
drivers/s390/scsi/zfcp_fsf.c:479:34:got restricted __be64 [usertype] fl_wwpn
drivers/s390/scsi/zfcp_fsf.c:480:34: warning: incorrect type in assignment 
(different base types)
drivers/s390/scsi/zfcp_fsf.c:480:34:expected unsigned long long [unsigned] 
[usertype] node_name
drivers/s390/scsi/zfcp_fsf.c:480:34:got restricted __be64 [usertype] fl_wwnn
drivers/s390/scsi/zfcp_fsf.c:506:36: warning: incorrect type in assignment 
(different base types)
drivers/s390/scsi/zfcp_fsf.c:506:36:expected unsigned long long [unsigned] 
[usertype] peer_wwpn
drivers/s390/scsi/zfcp_fsf.c:506:36:got restricted __be64 [usertype] fl_wwpn
drivers/s390/scsi/zfcp_fsf.c:507:36: warning: incorrect type in assignment 
(different base types)
drivers/s390/scsi/zfcp_fsf.c:507:36:expected unsigned long long [unsigned] 
[usertype] peer_wwnn
drivers/s390/scsi/zfcp_fsf.c:507:36:got restricted __be64 [usertype] fl_wwnn
drivers/s390/scsi/zfcp_fc.h:269:46: warning: restricted __be32 degrades to 
integer
drivers/s390/scsi/zfcp_fc.h:270:29: error: incompatible types in comparison 
expression (different base types)

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.c |  5 +++--
 drivers/s390/scsi/zfcp_fc.c  | 46 +++-
 drivers/s390/scsi/zfcp_fc.h  | 19 +++---
 drivers/s390/scsi/zfcp_fsf.c |  8 
 4 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 225e60d5d3fc..484da0b2d678 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -461,7 +461,7 @@ static u16 zfcp_dbf_san_res_cap_len_if_gpn_ft(char *tag,
  && reqh->ct_fs_subtype == FC_NS_SUBTYPE
  && reqh->ct_options == 0
  && reqh->_ct_resvd1 == 0
- && reqh->ct_cmd == FC_NS_GPN_FT
+ && reqh->ct_cmd == cpu_to_be16(FC_NS_GPN_FT)
  /* reqh->ct_mr_size can vary so do not match but read below */
  && reqh->_ct_resvd2 == 0
  && reqh->ct_reason == 0
@@ -481,7 +481,8 @@ static u16 zfcp_dbf_san_res_cap_len_if_gpn_ft(char *tag,
(resph->ct_cmd != cpu_to_be16(FC_FS_ACC)))
return max(FC_CT_HDR_LEN, ZFCP_DBF_SAN_MAX_PAYLOAD);
 
-   max_entries = (reqh->ct_mr_size * 4 / sizeof(struct fc_gpn_ft_resp))
+   max_entries = (be16_to_cpu(reqh->ct_mr_size) * 4 /
+  sizeof(struct fc_gpn_ft_resp))
+ 1 /* zfcp_fc_scan_ports: bytes correct, entries off-by-one
 * to account for header as 1st pseudo "entry" */;
 
diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c
index cc3f378782b2..3e715597b739 100644
--- a/drivers/s390/scsi/zfcp_fc.c
+++ b/drivers/s390/scsi/zfcp_fc.c
@@ -260,7 +260,8 @@ static void zfcp_fc_incoming_rscn(struct zfcp_fsf_req 
*fsf_req)
page = (struct fc_els_rscn_page *) head;
 
/* see FC-FS */
-   no_entries = head->rscn_plen / sizeof(struct fc_els_rscn_page);
+   no_entries = be16_to_cpu(head->rscn_plen) /
+   sizeof(struct fc_els_rscn_page);
 
for (i = 1; i < no_entries; i++) {
/* skip head and start with 1st element */
@@ -296,7 +297,7 @@ static void zfcp_fc_incoming_plogi(struct zfcp_fsf_req *req)
 
status_buffer = (struct fsf_status_read_buffer *) req->data;

[PATCH 20/22] zfcp: clean up a member of struct zfcp_qdio that was assigned but never used

2017-07-28 Thread Benjamin Block
From: Martin Peschke <mpesc...@linux.vnet.ibm.com>

v2.6.38 commit a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing
for HBA records.")
dropped trace information previously introduced with
v2.6.27 commit c3baa9a26c5a ("[SCSI] zfcp: Add information about interrupt
to trace.")
but kept and needlessly assigned a now no longer used struct field.

Signed-off-by: Martin Peschke <mpesc...@linux.vnet.ibm.com>
[ma...@linux.vnet.ibm.com: reword, added git history]
Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_fsf.c  | 1 -
 drivers/s390/scsi/zfcp_qdio.h | 2 --
 2 files changed, 3 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index eefb474a9e42..69d1dc3ec79d 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -2394,7 +2394,6 @@ void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int 
sbal_idx)
  req_id, dev_name(>ccw_device->dev));
}
 
-   fsf_req->qdio_req.sbal_response = sbal_idx;
zfcp_fsf_req_complete(fsf_req);
 
if (likely(sbale->eflags & SBAL_EFLAGS_LAST_ENTRY))
diff --git a/drivers/s390/scsi/zfcp_qdio.h b/drivers/s390/scsi/zfcp_qdio.h
index 85cdb82127e8..7f647a90c750 100644
--- a/drivers/s390/scsi/zfcp_qdio.h
+++ b/drivers/s390/scsi/zfcp_qdio.h
@@ -54,7 +54,6 @@ struct zfcp_qdio {
  * @sbal_last: last sbal for this request
  * @sbal_limit: last possible sbal for this request
  * @sbale_curr: current sbale at creation of this request
- * @sbal_response: sbal used in interrupt
  * @qdio_outb_usage: usage of outbound queue
  */
 struct zfcp_qdio_req {
@@ -64,7 +63,6 @@ struct zfcp_qdio_req {
u8  sbal_last;
u8  sbal_limit;
u8  sbale_curr;
-   u8  sbal_response;
u16 qdio_outb_usage;
 };
 
-- 
2.11.2



[PATCH 00/22] zfcp fixes and cleanups

2017-07-28 Thread Benjamin Block
Hi all,

here is a series of (important) fixes and some additional cleanups for the
zfcp driver. Our fixes are marked for stable accordingly.

Patches 01 - 04 are cleanups and external patches (also cleanups) that we
  & 13 - 22 had queued for quite some time now

Patches 05 - 12 are driver fixes that are all marked for stable as well

The set applies to the fixes branch of James' scsi.git.


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (2):
  zfcp: convert bool-definitions to use 'true' instead of '1'
  zfcp: add handling for FCP_RESID_OVER to the fcp ingress path

Corentin Labbe (1):
  zfcp: Remove unneeded linux/miscdevice.h include

LABBE Corentin (1):
  zfcp: replace zfcp_qdio_sbale_count by sg_nents

Lukáš Korenčik (1):
  zfcp: use setup_timer instead of init_timer

Martin Peschke (4):
  zfcp: clean up redundant code with fall through in link down SRB
switch case
  zfcp: clean up a member of struct zfcp_qdio that was assigned but
never used
  zfcp: clean up unnecessary module_param_named() with
no_auto_port_rescan
  zfcp: early returns for traces disabled via level

Steffen Maier (13):
  zfcp: fix queuecommand for scsi_eh commands when DIX enabled
  zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
  zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
  zfcp: fix missing trace records for early returns in TMF eh handlers
  zfcp: fix payload with full FCP_RSP IU in SCSI trace records
  zfcp: trace HBA FSF response by default on dismiss or timedout late
response
  zfcp: trace high part of "new" 64 bit SCSI LUN
  zfcp: more fitting constant for fc_ct_hdr.ct_reason on port scan
response
  zfcp: clarify that we don't need "link" test on failed open port
  zfcp: use common code fcp_cmnd and fcp_resp with union in
fsf_qtcb_bottom_io
  zfcp: use endianness conversions with common FC(P) struct fields
  zfcp: fix kernel doc comment typos for struct zfcp_dbf_scsi
  zfcp: clean up no longer existent prototype from zfcp API header

 drivers/s390/scsi/zfcp_aux.c  |  1 -
 drivers/s390/scsi/zfcp_dbf.c  | 95 +++
 drivers/s390/scsi/zfcp_dbf.h  | 25 
 drivers/s390/scsi/zfcp_erp.c  |  5 +--
 drivers/s390/scsi/zfcp_ext.h  |  1 -
 drivers/s390/scsi/zfcp_fc.c   | 52 +++
 drivers/s390/scsi/zfcp_fc.h   | 25 
 drivers/s390/scsi/zfcp_fsf.c  | 35 
 drivers/s390/scsi/zfcp_fsf.h  | 12 --
 drivers/s390/scsi/zfcp_qdio.c |  2 +-
 drivers/s390/scsi/zfcp_qdio.h | 17 
 drivers/s390/scsi/zfcp_scsi.c | 18 
 12 files changed, 178 insertions(+), 110 deletions(-)

--
2.11.2



[PATCH 21/22] zfcp: clean up unnecessary module_param_named() with no_auto_port_rescan

2017-07-28 Thread Benjamin Block
From: Martin Peschke <mpesc...@linux.vnet.ibm.com>

Improves commit 43f60cbd56c4 ("[SCSI] zfcp: No automatic port_rescan on
events")

Signed-off-by: Martin Peschke <mpesc...@linux.vnet.ibm.com>
[ma...@linux.vnet.ibm.com: reword, underscore in description to match sysfs]
Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_fc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c
index 3e715597b739..8210645c2111 100644
--- a/drivers/s390/scsi/zfcp_fc.c
+++ b/drivers/s390/scsi/zfcp_fc.c
@@ -29,7 +29,7 @@ static u32 zfcp_fc_rscn_range_mask[] = {
 };
 
 static bool no_auto_port_rescan;
-module_param_named(no_auto_port_rescan, no_auto_port_rescan, bool, 0600);
+module_param(no_auto_port_rescan, bool, 0600);
 MODULE_PARM_DESC(no_auto_port_rescan,
 "no automatic port_rescan (default off)");
 
-- 
2.11.2



[PATCH 11/22] zfcp: trace HBA FSF response by default on dismiss or timedout late response

2017-07-28 Thread Benjamin Block
=> fs_rerr
Request ID : 0x
Request status : 0x1010 ZFCP_STATUS_FSFREQ_DISMISSED
| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd   : 0x0001
FSF sequence no: 0x...
FSF issued : ...
FSF stat   : 0x FSF_GOOD
FSF stat qual  :    
Prot stat  : 0x0001 FSF_PROT_GOOD
Prot stat qual :    
Port handle: 0x...
LUN handle : 0x...
|
Timestamp  : ...
Area   : SCSI
Subarea: 00
Level  : 3
Exception  : -
CPU ID : ..
Caller : ...
Record ID  : 1
Tag: rsl_err
Request ID : 0x
SCSI ID: 0x...
SCSI LUN   : 0x...
SCSI result: 0x000e DID_TRANSPORT_DISRUPTED
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x
SCSI opcode: 28...  Read(10)
FCP rsp inf cod: 0x00
FCP rsp IU :    
 ^^ SAM_STAT_GOOD
  

Only with luck in both above cases, we could see a follow-on trace record
of an unsuccesful event following a successful but late FSF response with
FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests
resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED
[On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets
ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
However, the reason for this follow-on trace was invisible because the
corresponding HBA trace record was missing at the default trace level
(by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open").

On adapter recovery, after we had shut down the QDIO queues, we perform
unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED
for each pending FSF request in zfcp_fsf_req_dismiss_all().
In order to find the root cause, we need to see all pseudo responses even
if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.

Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED
or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr".

It does not matter that there are numerous places which set
ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response
early. These cases are based on protocol status != FSF_PROT_GOOD or
== FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default
as trace tag "fs_perr" or "fs_ferr" respectively.

NB: The trace record with tag "fssrh_1" for status read buffers on dismiss
all remains. zfcp_fsf_req_complete() handles this and returns early.
All other FSF request types are handled separately and as described above.

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features")
Fixes: 2e261af84cdb ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching 
trace levels")
Cc: <sta...@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index 776d1ac125ff..8e7f8e6037d2 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -323,7 +323,11 @@ void zfcp_dbf_hba_fsf_response(struct zfcp_fsf_req *req)
 {
struct fsf_qtcb *qtcb = req->qtcb;
 
-   if ((qtcb->prefix.prot_status != FSF_PROT_GOOD) &&
+   if (unlikely(req->status & (ZFCP_STATUS_FSFREQ_DISMISSED |
+   ZFCP_STATUS_FSFREQ_ERROR))) {
+   zfcp_dbf_hba_fsf_resp("fs_rerr", 3, req);
+
+   } else if ((qtcb->prefix.prot_status != FSF_PROT_GOOD) &&
(qtcb->prefix.prot_status != FSF_PROT_FSF_STATUS_PRESENTED)) {
zfcp_dbf_hba_fsf_resp("fs_perr", 1, req);
 
-- 
2.11.2



[PATCH 15/22] zfcp: use common code fcp_cmnd and fcp_resp with union in fsf_qtcb_bottom_io

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

This eases crash dump analysis by automatically dissecting these
protocol headers at least somewhat rather than getting a string
interpretation of large unstructured character array buffer fields.

Also, we can get rid of some unnecessary and error-prone type casts.

This change is possible since v2.6.33 commit 4318e08c84e4
("[SCSI] zfcp: Update FCP protocol related code").

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.c |  3 +--
 drivers/s390/scsi/zfcp_dbf.h |  2 +-
 drivers/s390/scsi/zfcp_fsf.c | 10 ++
 drivers/s390/scsi/zfcp_fsf.h | 12 +---
 4 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 34367d172961..225e60d5d3fc 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -573,8 +573,7 @@ void zfcp_dbf_scsi(char *tag, int level, struct scsi_cmnd 
*sc,
if (fsf) {
rec->fsf_req_id = fsf->req_id;
rec->pl_len = FCP_RESP_WITH_EXT;
-   fcp_rsp = (struct fcp_resp_with_ext *)
-   &(fsf->qtcb->bottom.io.fcp_rsp);
+   fcp_rsp = &(fsf->qtcb->bottom.io.fcp_rsp.iu);
/* mandatory parts of FCP_RSP IU in this SCSI record */
memcpy(>fcp_rsp, fcp_rsp, FCP_RESP_WITH_EXT);
if (fcp_rsp->resp.fr_flags & FCP_RSP_LEN_VAL) {
diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index b60667c145fd..33ccf15b63e0 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -301,7 +301,7 @@ bool zfcp_dbf_hba_fsf_resp_suppress(struct zfcp_fsf_req 
*req)
 
if (qtcb->prefix.qtcb_type != FSF_IO_COMMAND)
return false; /* not an FCP response */
-   fcp_rsp = (struct fcp_resp *)>bottom.io.fcp_rsp;
+   fcp_rsp = >bottom.io.fcp_rsp.iu.resp;
rsp_flags = fcp_rsp->fr_flags;
fr_status = fcp_rsp->fr_status;
return (fsf_stat == FSF_FCP_RSP_AVAILABLE) &&
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 9f73b8fc7f3b..6ddaee5f3701 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -2143,7 +2143,8 @@ static void zfcp_fsf_fcp_cmnd_handler(struct zfcp_fsf_req 
*req)
zfcp_scsi_dif_sense_error(scpnt, 0x3);
goto skip_fsfstatus;
}
-   fcp_rsp = (struct fcp_resp_with_ext *) >qtcb->bottom.io.fcp_rsp;
+   BUILD_BUG_ON(sizeof(struct fcp_resp_with_ext) > FSF_FCP_RSP_SIZE);
+   fcp_rsp = >qtcb->bottom.io.fcp_rsp.iu;
zfcp_fc_eval_fcp_rsp(fcp_rsp, scpnt);
 
 skip_fsfstatus:
@@ -2256,7 +2257,8 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *scsi_cmnd)
if (zfcp_fsf_set_data_dir(scsi_cmnd, >data_direction))
goto failed_scsi_cmnd;
 
-   fcp_cmnd = (struct fcp_cmnd *) >qtcb->bottom.io.fcp_cmnd;
+   BUILD_BUG_ON(sizeof(struct fcp_cmnd) > FSF_FCP_CMND_SIZE);
+   fcp_cmnd = >qtcb->bottom.io.fcp_cmnd.iu;
zfcp_fc_scsi_to_fcp(fcp_cmnd, scsi_cmnd, 0);
 
if ((scsi_get_prot_op(scsi_cmnd) != SCSI_PROT_NORMAL) &&
@@ -2301,7 +2303,7 @@ static void zfcp_fsf_fcp_task_mgmt_handler(struct 
zfcp_fsf_req *req)
 
zfcp_fsf_fcp_handler_common(req);
 
-   fcp_rsp = (struct fcp_resp_with_ext *) >qtcb->bottom.io.fcp_rsp;
+   fcp_rsp = >qtcb->bottom.io.fcp_rsp.iu;
rsp_info = (struct fcp_resp_rsp_info *) _rsp[1];
 
if ((rsp_info->rsp_code != FCP_TMF_CMPL) ||
@@ -2350,7 +2352,7 @@ struct zfcp_fsf_req *zfcp_fsf_fcp_task_mgmt(struct 
scsi_cmnd *scmnd,
 
zfcp_qdio_set_sbale_last(qdio, >qdio_req);
 
-   fcp_cmnd = (struct fcp_cmnd *) >qtcb->bottom.io.fcp_cmnd;
+   fcp_cmnd = >qtcb->bottom.io.fcp_cmnd.iu;
zfcp_fc_scsi_to_fcp(fcp_cmnd, scmnd, tm_flags);
 
zfcp_fsf_start_timer(req, ZFCP_SCSI_ER_TIMEOUT);
diff --git a/drivers/s390/scsi/zfcp_fsf.h b/drivers/s390/scsi/zfcp_fsf.h
index ea3c76ac0de1..88feba5bfda4 100644
--- a/drivers/s390/scsi/zfcp_fsf.h
+++ b/drivers/s390/scsi/zfcp_fsf.h
@@ -3,7 +3,7 @@
  *
  * Interface to the FSF support functions.
  *
- * Copyright IBM Corp. 2002, 2016
+ * Copyright IBM Corp. 2002, 2017
  */
 
 #ifndef FSF_H
@@ -312,8 +312,14 @@ struct fsf_qtcb_bottom_io {
u32 data_block_length;
u32 prot_data_length;
u8  res2[4];
-   u8  fcp_cmnd[FSF_FCP_CMND_SIZE];
-   u8  fcp_rsp[FSF_FCP_RSP_SIZE];
+   union {
+   u8  byte[FSF_FCP_CMND_SIZE];
+   struct fcp_cmnd iu;
+   }   fcp_cmnd;
+   union {
+   u8   byte[FSF_FCP_RSP_SIZE];
+   struct fcp_resp_with_ext iu;
+   }   fcp_rsp;
u8  res3[64];
 } __attribute__ ((packed));
 
-- 
2.11.2



[PATCH 04/22] zfcp: convert bool-definitions to use 'true' instead of '1'

2017-07-28 Thread Benjamin Block
Better form and cleans remaining warnings.

Found with scripts/coccinelle/misc/boolinit.cocci.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_qdio.c | 2 +-
 drivers/s390/scsi/zfcp_scsi.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_qdio.c b/drivers/s390/scsi/zfcp_qdio.c
index dbf2b54703f7..9e358fc04b78 100644
--- a/drivers/s390/scsi/zfcp_qdio.c
+++ b/drivers/s390/scsi/zfcp_qdio.c
@@ -14,7 +14,7 @@
 #include "zfcp_ext.h"
 #include "zfcp_qdio.h"
 
-static bool enable_multibuffer = 1;
+static bool enable_multibuffer = true;
 module_param_named(datarouter, enable_multibuffer, bool, 0400);
 MODULE_PARM_DESC(datarouter, "Enable hardware data router support (default 
on)");
 
diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 0678cf714c0e..d99da4e1e27a 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -28,7 +28,7 @@ static bool enable_dif;
 module_param_named(dif, enable_dif, bool, 0400);
 MODULE_PARM_DESC(dif, "Enable DIF/DIX data integrity support");
 
-static bool allow_lun_scan = 1;
+static bool allow_lun_scan = true;
 module_param(allow_lun_scan, bool, 0600);
 MODULE_PARM_DESC(allow_lun_scan, "For NPIV, scan and attach all storage LUNs");
 
-- 
2.11.2



[PATCH 07/22] zfcp: fix capping of unsuccessful GPN_FT SAN response trace records

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records
(req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a884
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.

While it already saves space by not dumping any empty residual entries
of the large successful GPN_FT response (4 pages), there are seldom cases
where the GPN_FT response is unsuccessful and likely does not have
FC_NS_FID_LAST set in fp_flags so we did not cap the trace record.
We typically see such case for an initiator WWPN, which is not in any zone.

Cap unsuccessful responses to at least the actual basic CT_IU response
plus whatever fits the SAN trace record built-in "payload" buffer
just in case there's trailing information
of which we would at least see the existence and its beginning.

In order not to erroneously cap successful responses, we need to swap
calling the trace function and setting the CT / ELS status to success (0).

Example trace record pair formatted with zfcpdbf:

Timestamp  : ...
Area   : SAN
Subarea: 00
Level  : 1
Exception  : -
CPU ID : ..
Caller : 0x...
Record ID  : 1
Tag: fssct_1
Request ID : 0x
Destination ID : 0x00fc
SAN req short  : 0100 fc02 01720ffc 
 0008
SAN req length : 20
|
Timestamp  : ...
Area   : SAN
Subarea: 00
Level  : 1
Exception  : -
CPU ID : ..
Caller : 0x...
Record ID  : 2
Tag: fsscth2
Request ID : 0x
Destination ID : 0x00fc
SAN resp short : 0100 fc02 8001 00090700
     [trailing info]
     [trailing info]
SAN resp length: 16384
San resp info  : 0100 fc02 8001 00090700
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]
     [trailing info]

The fix saves all but one of the previously associated 64 PAYload trace
record chunks of size 256 bytes each.

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records 
(req,resp,iels)")
Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN 
records.")
Cc: <sta...@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.c | 10 +-
 drivers/s390/scsi/zfcp_fsf.c |  4 ++--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index d5bf36ec8a75..31d62ea5fdcd 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -3,7 +3,7 @@
  *
  * Debug traces for zfcp.
  *
- * Copyright IBM Corp. 2002, 2016
+ * Copyright IBM Corp. 2002, 2017
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -447,6 +447,7 @@ static u16 zfcp_dbf_san_res_cap_len_if_gpn_ft(char *tag,
struct fc_ct_hdr *reqh = sg_virt(ct_els->req);
struct fc_ns_gid_ft *reqn = (struct fc_ns_gid_ft *)(reqh + 1);
struct scatterlist *resp_entry = ct_els->resp;
+   struct fc_ct_hdr *resph;
struct fc_gpn_ft_resp *acc;
int max_entries, x, last = 0;
 
@@ -473,6 +474,13 @@ static u16 zfcp_dbf_san_res_cap_len_if_gpn_ft(char *tag,
return len; /* not GPN_FT response so do not cap */
 
acc = sg_virt(resp_entry);
+
+   /* cap all but accept CT responses to at least the CT header */
+   resph = (struct fc_ct_hdr *)acc;
+   if ((ct_els->status) ||
+   (resph->ct_cmd != cpu_to_be16(FC_FS_ACC)))
+   return max(FC_CT_HDR_LEN, ZFCP_DBF_SAN_MAX_PAYLOAD);
+
max_entries = (reqh->ct_mr_size * 4 / si

[PATCH 06/22] zfcp: add handling for FCP_RESID_OVER to the fcp ingress path

2017-07-28 Thread Benjamin Block
Up until now zfcp would just ignore the FCP_RESID_OVER flag in the FCP
response IU. When this flag is set, it is possible, in regards to the
FCP standard, that the storage-server processes the command normally, up
to the point where data is missing and simply ignores those.

In this case no CHECK CONDITION would be set, and because we ignored the
FCP_RESID_OVER flag we resulted in at least a data loss or even
-corruption as a follow-up error, depending on how the
applications/layers on top behave. To prevent this, we now set the
host-byte of the corresponding scsi_cmnd to DID_ERROR.

Other storage-behaviors, where the same condition results in a CHECK
CONDITION set in the answer, don't need to be changed as they are
handled in the mid-layer already.

Following is an example trace record decoded with zfcpdbf from the
s390-tools package. We forcefully injected a fc_dl which is one byte too
small:

Timestamp  : ...
Area   : SCSI
Subarea: 00
Level  : 3
Exception  : -
CPU ID : ..
Caller : 0x...
Record ID  : 1
Tag: rsl_err
Request ID : 0x...
SCSI ID: 0x...
SCSI LUN   : 0x...
SCSI result: 0x0007
 ^^DID_ERROR
SCSI retries   : 0x..
SCSI allowed   : 0x..
SCSI scribble  : 0x...
SCSI opcode: 2a00  0800 
FCP rsp inf cod: 0x00
FCP rsp IU :   0400 0001
   ^^fr_flags==FCP_RESID_OVER
 ^^fr_status==SAM_STAT_GOOD
fr_resid
  

As of now, we don't actively handle to possibility that a response IU
has both flags - FCP_RESID_OVER and FCP_RESID_UNDER - set at once.

Reported-by: Luke M. Hopkins <lmhop...@us.ibm.com>
Reviewed-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: 553448f6c483 ("[SCSI] zfcp: Message cleanup")
Fixes: ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.") 
(tglx/history.git)
Cc: <sta...@vger.kernel.org> #2.6.33+
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_fc.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_fc.h b/drivers/s390/scsi/zfcp_fc.h
index df2b541c8287..a2275825186f 100644
--- a/drivers/s390/scsi/zfcp_fc.h
+++ b/drivers/s390/scsi/zfcp_fc.h
@@ -4,7 +4,7 @@
  * Fibre Channel related definitions and inline functions for the zfcp
  * device driver
  *
- * Copyright IBM Corp. 2009
+ * Copyright IBM Corp. 2009, 2017
  */
 
 #ifndef ZFCP_FC_H
@@ -279,6 +279,10 @@ void zfcp_fc_eval_fcp_rsp(struct fcp_resp_with_ext 
*fcp_rsp,
 !(rsp_flags & FCP_SNS_LEN_VAL) &&
 fcp_rsp->resp.fr_status == SAM_STAT_GOOD)
set_host_byte(scsi, DID_ERROR);
+   } else if (unlikely(rsp_flags & FCP_RESID_OVER)) {
+   /* FCP_DL was not sufficient for SCSI data length */
+   if (fcp_rsp->resp.fr_status == SAM_STAT_GOOD)
+   set_host_byte(scsi, DID_ERROR);
}
 }
 
-- 
2.11.2



[PATCH 19/22] zfcp: clean up no longer existent prototype from zfcp API header

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

Commit a54ca0f62f95 ("[SCSI] zfcp: Redesign of the debug tracing for HBA
records.") refactored zfcp_dbf_hba_berr into zfcp_dbf_hba_bit_err
but added the prototype for the latter without removing it for the former.

Suggested-by: Martin Peschke <mpesc...@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_ext.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_ext.h b/drivers/s390/scsi/zfcp_ext.h
index 9afdbc32b23f..a9e968717dd9 100644
--- a/drivers/s390/scsi/zfcp_ext.h
+++ b/drivers/s390/scsi/zfcp_ext.h
@@ -41,7 +41,6 @@ extern void zfcp_dbf_rec_run_wka(char *, struct 
zfcp_fc_wka_port *, u64);
 extern void zfcp_dbf_hba_fsf_uss(char *, struct zfcp_fsf_req *);
 extern void zfcp_dbf_hba_fsf_res(char *, int, struct zfcp_fsf_req *);
 extern void zfcp_dbf_hba_bit_err(char *, struct zfcp_fsf_req *);
-extern void zfcp_dbf_hba_berr(struct zfcp_dbf *, struct zfcp_fsf_req *);
 extern void zfcp_dbf_hba_def_err(struct zfcp_adapter *, u64, u16, void **);
 extern void zfcp_dbf_hba_basic(char *, struct zfcp_adapter *);
 extern void zfcp_dbf_san_req(char *, struct zfcp_fsf_req *, u32);
-- 
2.11.2



[PATCH 10/22] zfcp: fix payload with full FCP_RSP IU in SCSI trace records

2017-07-28 Thread Benjamin Block
^==FCP_RSP_INFO

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI 
records.")
Cc: <sta...@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 31d62ea5fdcd..c801f9782cb2 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -572,19 +572,32 @@ void zfcp_dbf_scsi(char *tag, int level, struct scsi_cmnd 
*sc,
 
if (fsf) {
rec->fsf_req_id = fsf->req_id;
+   rec->pl_len = FCP_RESP_WITH_EXT;
fcp_rsp = (struct fcp_resp_with_ext *)
&(fsf->qtcb->bottom.io.fcp_rsp);
+   /* mandatory parts of FCP_RSP IU in this SCSI record */
memcpy(>fcp_rsp, fcp_rsp, FCP_RESP_WITH_EXT);
if (fcp_rsp->resp.fr_flags & FCP_RSP_LEN_VAL) {
fcp_rsp_info = (struct fcp_resp_rsp_info *) _rsp[1];
rec->fcp_rsp_info = fcp_rsp_info->rsp_code;
+   rec->pl_len += be32_to_cpu(fcp_rsp->ext.fr_rsp_len);
}
if (fcp_rsp->resp.fr_flags & FCP_SNS_LEN_VAL) {
-   rec->pl_len = min((u16)SCSI_SENSE_BUFFERSIZE,
- (u16)ZFCP_DBF_PAY_MAX_REC);
-   zfcp_dbf_pl_write(dbf, sc->sense_buffer, rec->pl_len,
- "fcp_sns", fsf->req_id);
+   rec->pl_len += be32_to_cpu(fcp_rsp->ext.fr_sns_len);
}
+   /* complete FCP_RSP IU in associated PAYload record
+* but only if there are optional parts
+*/
+   if (fcp_rsp->resp.fr_flags != 0)
+   zfcp_dbf_pl_write(
+   dbf, fcp_rsp,
+   /* at least one full PAY record
+* but not beyond hardware response field
+*/
+   min_t(u16, max_t(u16, rec->pl_len,
+ZFCP_DBF_PAY_MAX_REC),
+ FSF_FCP_RSP_SIZE),
+   "fcp_riu", fsf->req_id);
}
 
debug_event(dbf->scsi, level, rec, sizeof(*rec));
-- 
2.11.2



[PATCH 08/22] zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

Without this fix we get SCSI trace records on task management functions
which cannot be correlated to HBA trace records because all fields
related to the FSF request are empty (zero).
Also, the FCP_RSP_IU is missing as well as any sense data if available.

This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement
of zfcp debug features") introducing trace records for TMFs but
hard coding NULL for a possibly existing TMF FSF request.
The scsi_cmnd scribble is also zero or unrelated for the TMF request
so it also could not lookup a suitable FSF request from there.

A broken example trace record formatted with zfcpdbf from the s390-tools
package:

Timestamp  : ...
Area   : SCSI
Subarea: 00
Level  : 1
Exception  : -
CPU ID : ..
Caller : 0x...
Record ID  : 1
Tag: lr_fail
Request ID : 0x
    no correlation to HBA record
SCSI ID: 0x
SCSI LUN   : 0x
SCSI result: 0x000e
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x
SCSI opcode: 2a17 3bb8 0800 
FCP rsp inf cod: 0x00
   ^^ no TMF response
FCP rsp IU :    
 ^^^
  
 ^ no interesting FCP_RSP_IU
Sense len  : ...
 no sense data length
Sense info : ...
 no sense data content, even if present

There are some true cases where we really do not have an FSF request:
"rsl_fai" from zfcp_dbf_scsi_fail_send() called for early
returns / completions in zfcp_scsi_queuecommand(),
"abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from
zfcp_scsi_eh_abort_handler() where we did not get as far,
"lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're
successful and do not need to do anything because adapter stopped.
For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi().

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features")
Cc: <sta...@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.h  | 7 ---
 drivers/s390/scsi/zfcp_scsi.c | 8 
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index db186d44cfaf..776d1ac125ff 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -2,7 +2,7 @@
  * zfcp device driver
  * debug feature declarations
  *
- * Copyright IBM Corp. 2008, 2016
+ * Copyright IBM Corp. 2008, 2017
  */
 
 #ifndef ZFCP_DBF_H
@@ -401,7 +401,8 @@ void zfcp_dbf_scsi_abort(char *tag, struct scsi_cmnd *scmd,
  * @flag: indicates type of reset (Target Reset, Logical Unit Reset)
  */
 static inline
-void zfcp_dbf_scsi_devreset(char *tag, struct scsi_cmnd *scmnd, u8 flag)
+void zfcp_dbf_scsi_devreset(char *tag, struct scsi_cmnd *scmnd, u8 flag,
+   struct zfcp_fsf_req *fsf_req)
 {
char tmp_tag[ZFCP_DBF_TAG_LEN];
 
@@ -411,7 +412,7 @@ void zfcp_dbf_scsi_devreset(char *tag, struct scsi_cmnd 
*scmnd, u8 flag)
memcpy(tmp_tag, "lr_", 3);
 
memcpy(_tag[3], tag, 4);
-   _zfcp_dbf_scsi(tmp_tag, 1, scmnd, NULL);
+   _zfcp_dbf_scsi(tmp_tag, 1, scmnd, fsf_req);
 }
 
 /**
diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index d99da4e1e27a..4c9ac55d053d 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -3,7 +3,7 @@
  *
  * Interface to Linux SCSI midlayer.
  *
- * Copyright IBM Corp. 2002, 2016
+ * Copyright IBM Corp. 2002, 2017
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -278,7 +278,7 @@ static int zfcp_task_mgmt_function(struct scsi_cmnd *scpnt, 
u8 tm_flags)
 
if (!(atomic_read(>status) &
  ZFCP_STATUS_COMMON_RUNNING)) {
-   zfcp_dbf_scsi_devreset("nres", scpnt, tm_flags);
+   zfcp_dbf_scsi_devreset("nres", scpnt, tm_flags, NULL);
return SUCCESS;
}
}
@@ -288,10 +288,10 @@ static int zfcp_task_mgmt_function(struct scsi_cmnd 
*scpnt, u8 tm_flags)
wait_for_completion(_req->completion);
 
if (fsf_req->status & ZFCP_STATUS_FSFREQ_TMFUNCFAILED) {
-   zfcp_dbf_scsi_devreset("fail", scpnt, tm_flags);
+   zfcp_dbf_scsi_devreset("fail", scpnt, tm_flags, fsf_req);
retval = FAILED;
} else {
-   zfcp_dbf_scsi_devreset("okay", scpnt, tm_flags);
+   zfcp_dbf_scsi_devreset("okay", scpnt, tm_flags, fsf_req);
zfcp_scsi_forget_cmnds(zfcp_sdev, tm_flags);
}
 
-- 
2.11.2



[PATCH 09/22] zfcp: fix missing trace records for early returns in TMF eh handlers

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

For problem determination we need to see that we were in scsi_eh
as well as whether and why we were successful or not.

The following commits introduced new early returns without adding
a trace record:

v2.6.35 commit a1dbfddd02d2
("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
on fc_block_scsi_eh() returning != 0 which is FAST_IO_FAIL,

v2.6.30 commit 63caf367e1c9
("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
on not having gotten an FSF request after the maximum number of retry
attempts and thus could not issue a TMF and has to return FAILED.

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to 
scsi eh")
Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in 
zfcp")
Cc: <sta...@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_scsi.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
index 4c9ac55d053d..ec3ddd1d31d5 100644
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -273,8 +273,10 @@ static int zfcp_task_mgmt_function(struct scsi_cmnd 
*scpnt, u8 tm_flags)
 
zfcp_erp_wait(adapter);
ret = fc_block_scsi_eh(scpnt);
-   if (ret)
+   if (ret) {
+   zfcp_dbf_scsi_devreset("fiof", scpnt, tm_flags, NULL);
return ret;
+   }
 
if (!(atomic_read(>status) &
  ZFCP_STATUS_COMMON_RUNNING)) {
@@ -282,8 +284,10 @@ static int zfcp_task_mgmt_function(struct scsi_cmnd 
*scpnt, u8 tm_flags)
return SUCCESS;
}
}
-   if (!fsf_req)
+   if (!fsf_req) {
+   zfcp_dbf_scsi_devreset("reqf", scpnt, tm_flags, NULL);
return FAILED;
+   }
 
wait_for_completion(_req->completion);
 
-- 
2.11.2



[PATCH 14/22] zfcp: clarify that we don't need "link" test on failed open port

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_fsf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c10b4b9f1574..9f73b8fc7f3b 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -3,7 +3,7 @@
  *
  * Implementation of FSF commands.
  *
- * Copyright IBM Corp. 2002, 2015
+ * Copyright IBM Corp. 2002, 2017
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -1393,6 +1393,8 @@ static void zfcp_fsf_open_port_handler(struct 
zfcp_fsf_req *req)
case FSF_ADAPTER_STATUS_AVAILABLE:
switch (header->fsf_status_qual.word[0]) {
case FSF_SQ_INVOKE_LINK_TEST_PROCEDURE:
+   /* no zfcp_fc_test_link() with failed open port */
+   /* fall through */
case FSF_SQ_ULP_DEPENDENT_ERP_REQUIRED:
case FSF_SQ_NO_RETRY_POSSIBLE:
req->status |= ZFCP_STATUS_FSFREQ_ERROR;
-- 
2.11.2



[PATCH 12/22] zfcp: trace high part of "new" 64 bit SCSI LUN

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

Complements debugging aspects of the otherwise functionally complete
v3.17 commit 9cb78c16f5da ("scsi: use 64-bit LUNs").

While I don't have access to a target exporting 3 or 4 level LUNs,
I did test it by explicitly attaching a non-existent fake 4 level LUN
by means of zfcp sysfs attribute "unit_add".
In order to see corresponding trace records of otherwise successful
events, we had to increase the trace level of area SCSI and HBA to 6.

$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level
$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level

$ echo 0x4011402240334044 > \
  /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add

Example output formatted by an updated zfcpdbf from the s390-tools
package interspersed with kernel messages at scsi_logging_level=4605:

Timestamp  : ...
Area   : REC
Subarea: 00
Level  : 1
Exception  : -
CPU ID : ..
Caller : 0x...
Record ID  : 1
Tag: scsla_1
LUN: 0x4011402240334044
WWPN   : 0x50050763031bd327
D_ID   : 0x00..
Adapter status : 0x5400050b
Port status: 0x5401
LUN status : 0x4100
Ready count: 0x0001
Running count  : 0x
ERP want   : 0x01
ERP need   : 0x01

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0

Timestamp  : ...
Area   : HBA
Subarea: 00
Level  : 6
Exception  : -
CPU ID : ..
Caller : 0x...
Record ID  : 1
Tag: fs_norm
Request ID : 0x
Request status : 0x0010
FSF cmnd   : 0x0001
FSF sequence no: 0x...
FSF issued : ...
FSF stat   : 0x
FSF stat qual  :    
Prot stat  : 0x0001
Prot stat qual :    
Port handle: 0x...
LUN handle : 0x...
|
Timestamp  : ...
Area   : SCSI
Subarea: 00
Level  : 6
Exception  : -
CPU ID : ..
Caller : 0x...
Record ID  : 1
Tag: rsl_nor
Request ID : 0x
SCSI ID: 0x
SCSI LUN   : 0x40224011
SCSI LUN high  : 0x40444033 <===
SCSI result: 0x
SCSI retries   : 0x00
SCSI allowed   : 0x03
SCSI scribble  : 0x
SCSI opcode: 1200 a400  
FCP rsp inf cod: 0x00
FCP rsp IU :    
  

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0
scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \
no device added

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: 9cb78c16f5da ("scsi: use 64-bit LUNs")
Cc: <sta...@vger.kernel.org> #3.17+
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jre...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_dbf.c | 2 +-
 drivers/s390/scsi/zfcp_dbf.h | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index c801f9782cb2..34367d172961 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -563,8 +563,8 @@ void zfcp_dbf_scsi(char *tag, int level, struct scsi_cmnd 
*sc,
rec->scsi_retries = sc->retries;
rec->scsi_allowed = sc->allowed;
rec->scsi_id = sc->device->id;
-   /* struct zfcp_dbf_scsi needs to be updated to handle 64bit LUNs */
rec->scsi_lun = (u32)sc->device->lun;
+   rec->scsi_lun_64_hi = (u32)(sc->device->lun >> 32);
rec->host_scribble = (unsigned long)sc->host_scribble;
 
memcpy(rec->scsi_opcode, sc->cmnd,
diff --git a/drivers/s390/scsi/zfcp_dbf.h b/drivers/s390/scsi/zfcp_dbf.h
index 8e7f8e6037d2..b60667c145fd 100644
--- a/drivers/s390/scsi/zfcp_dbf.h
+++ b/drivers/s390/scsi/zfcp_dbf.h
@@ -204,7 +204,7 @@ enum zfcp_dbf_scsi_id {
  * @id: unique number of recovery record type
  * @tag: identifier string specifying the location of initiation
  * @scsi_id: scsi device id
- * @scsi_lun: scsi device logical unit number
+ * @scsi_lun: scsi device logical unit number, low part of 64 bit, old 32 bit
  * @scsi_result: scsi result
  * @scsi_retries: current retry number of scsi request
  * @scsi_allowed: allowed retries
@@ -214,6 +214,7 @@ enum zfcp_dbf_scsi_id {
  * @host_scribble: LLD specific data attached to SCSI request
  * @pl_len: length of paload stored as zfcp_dbf_pay
  * @fsf_rsp: response for fsf request
+ * @scsi_lun_64_hi: scsi device logical unit number, high part of 64 bit
  */
 struct zfcp_dbf_scsi {
u8 id;
@@ -230,6 +231,7 @@ str

[PATCH 03/22] zfcp: Remove unneeded linux/miscdevice.h include

2017-07-28 Thread Benjamin Block
From: Corentin Labbe <clabbe.montj...@gmail.com>

drivers/s390/scsi/zfcp_aux.c does not contain any miscdevice so the
inclusion of linux/miscdevice.h is unnecessary.

[ma...@linux.vnet.ibm.com: just for the records, this is in fact a
 minor missing code cleanup of the following older "feature"
 which also dropped the only former use of a misc device in zfcp:
 commit 663e0890e31c ("[SCSI] zfcp: remove access control tables
interface")
 commit b5dc3c4800cc ("[SCSI] zfcp: remove access control tables
interface (keep sysfs files)")
 commit 1b33ef23946a ("zfcp: remove access control tables interface
 (port leftovers)")]

Signed-off-by: Corentin Labbe <clabbe.montj...@gmail.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_aux.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
index bcc8f3dfd4c4..82ac331d9125 100644
--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -29,7 +29,6 @@
 #define KMSG_COMPONENT "zfcp"
 #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
 
-#include 
 #include 
 #include 
 #include 
-- 
2.11.2



[PATCH 13/22] zfcp: more fitting constant for fc_ct_hdr.ct_reason on port scan response

2017-07-28 Thread Benjamin Block
From: Steffen Maier <ma...@linux.vnet.ibm.com>

v2.6.33 commit dbf5dfe9dbce ("[SCSI] zfcp: Use common code definitions for
FC CT structs") replaced own definitions with common code definitions.
While FC_BA_RJT_UNABLE happens to be defined with the same value 9 as
FC_FS_RJT_UNABL and thus also works, here we should use the latter from
fc_gs.h.
See also its use in libfc's fc_disc_gpn_ft_resp().

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_fc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c
index 7331eea67435..cc3f378782b2 100644
--- a/drivers/s390/scsi/zfcp_fc.c
+++ b/drivers/s390/scsi/zfcp_fc.c
@@ -3,7 +3,7 @@
  *
  * Fibre Channel related functions for the zfcp device driver.
  *
- * Copyright IBM Corp. 2008, 2010
+ * Copyright IBM Corp. 2008, 2017
  */
 
 #define KMSG_COMPONENT "zfcp"
@@ -667,7 +667,7 @@ static int zfcp_fc_eval_gpn_ft(struct zfcp_fc_req *fc_req,
return -EIO;
 
if (hdr->ct_cmd != FC_FS_ACC) {
-   if (hdr->ct_reason == FC_BA_RJT_UNABLE)
+   if (hdr->ct_reason == FC_FS_RJT_UNABL)
return -EAGAIN; /* might be a temporary condition */
return -EIO;
}
-- 
2.11.2



[PATCH 02/22] zfcp: use setup_timer instead of init_timer

2017-07-28 Thread Benjamin Block
From: Lukáš Korenčik <xkore...@fi.muni.cz>

Use initialization with setup_timer function instead of using
init_timer function and data fields. It improves readability.

Signed-off-by: Lukáš Korenčik <xkore...@fi.muni.cz>
Signed-off-by: Jiri Slaby <jsl...@suse.cz>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_erp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_erp.c b/drivers/s390/scsi/zfcp_erp.c
index 7ccfce559034..37408f5f81ce 100644
--- a/drivers/s390/scsi/zfcp_erp.c
+++ b/drivers/s390/scsi/zfcp_erp.c
@@ -572,9 +572,8 @@ static void zfcp_erp_memwait_handler(unsigned long data)
 
 static void zfcp_erp_strategy_memwait(struct zfcp_erp_action *erp_action)
 {
-   init_timer(_action->timer);
-   erp_action->timer.function = zfcp_erp_memwait_handler;
-   erp_action->timer.data = (unsigned long) erp_action;
+   setup_timer(_action->timer, zfcp_erp_memwait_handler,
+   (unsigned long) erp_action);
erp_action->timer.expires = jiffies + HZ;
add_timer(_action->timer);
 }
-- 
2.11.2



[PATCH] MAINTAINERS: Add myself to S390 ZFCP DRIVER as a co-maintainer

2017-07-28 Thread Benjamin Block
I have been working with Steffen on zFCP for quite a while now and we
decided adding me as a co-maintainer might be a good thing.

Acked-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f6ec1be48609..2c1a52d1376c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11342,6 +11342,7 @@ F:  drivers/s390/crypto/
 
 S390 ZFCP DRIVER
 M: Steffen Maier <ma...@linux.vnet.ibm.com>
+M: Benjamin Block <bbl...@linux.vnet.ibm.com>
 L: linux-s...@vger.kernel.org
 W: http://www.ibm.com/developerworks/linux/linux390/
 S: Supported
-- 
2.11.2



Re: A bug in scsi_alloc_target of drivers/scsi/scsi_scan.c

2017-05-16 Thread Benjamin Block
Hello Dashi,

On Tue, May 09, 2017 at 09:08:14AM +, Dashi DS1 Cao wrote:
> When debugging a race condition in scsi_remove_target of 3.12, I ran into 
> this possible bug within scsi_alloc_target.
> When an existing "struct scsi_target" is found and used, the starget just got 
> through kzmalloc should be freed, rather than dong a "put_device(dev)".

But that is exactly what is done when put_device is called and the
internal ref-count drops below 1. It will go through the kobj-core and
end up in scsi_target_dev_release().

Also this specific code was changed in 12fb8c1574d7d in 2010, see the
commit message there.

Beste Grüße / Best regards,
  - Benjamin Block
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 81d4151..96795d4 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -483,7 +483,7 @@ static struct scsi_target *scsi_alloc_target(struct 
> device *parent,
> 
> spin_unlock_irqrestore(shost->host_lock, flags);
> if (ref_got) {
> -   put_device(dev);
> +   kfree(starget);
> return found_target;
> }
> /*
> --
> 
> Dashi Cao
> 
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [RFC] scsi: generate uevent for SCSI sense code

2017-05-16 Thread Benjamin Block
 *envp[3];
> + int idx = 0, i;
> + char *envp[5];  /* SDEV_EVT_SCSI_SENSE needs most entries (4) */
> + int free_envp = -1;
>
>   switch (evt->evt_type) {
>   case SDEV_EVT_MEDIA_CHANGE:
> @@ -2682,6 +2683,23 @@ static void scsi_evt_emit(struct scsi_device *sdev, 
> struct scsi_event *evt)
>   case SDEV_EVT_ALUA_STATE_CHANGE_REPORTED:
>   envp[idx++] = "SDEV_UA=ASYMMETRIC_ACCESS_STATE_CHANGED";
>   break;
> +#ifdef CONFIG_SCSI_SENSE_UEVENT
> + case SDEV_EVT_SCSI_SENSE:
> + envp[idx++] = "SDEV_UA=SCSI_SENSE";
> + for (i = idx; i < idx + 3; ++i) {
> + envp[i] = kzalloc(32, GFP_ATOMIC);

Why try so hard with GFP_ATOMIC? This is run in its own work-item and AFAIK
there is no need to be so strict here.

> + if (!envp[i])
> + break;

The error-handling seems a bit optimistic. Especially when your write on
the pointers afterwards.

> + free_envp = i;
> + }
> + snprintf(envp[idx++], 32, "LBA=%lu", evt->sense_evt_data.lba);
> + snprintf(envp[idx++], 32, "SIZE=%d", evt->sense_evt_data.size);
> + snprintf(envp[idx++], 32, "SENSE_CODE=%1x/%02x/%02x",
> +  evt->sense_evt_data.sshdr.sense_key,
> +  evt->sense_evt_data.sshdr.asc,
> +  evt->sense_evt_data.sshdr.ascq);
> + break;

How do you get to this 3x'32'? Seems like quite some waste when we talk
about frequent events.

> +#endif
>   default:
>   /* do nothing */
>   break;
> @@ -2690,6 +2708,10 @@ static void scsi_evt_emit(struct scsi_device *sdev, 
> struct scsi_event *evt)
>   envp[idx++] = NULL;
>
>   kobject_uevent_env(>sdev_gendev.kobj, KOBJ_CHANGE, envp);
> +
> + /* no need to free envp[0], so start with i = 1 */
> + for (i = 1 ; i < free_envp; ++i)
> + kfree(envp[i]);
>  }
>
>  /**
> @@ -2786,6 +2808,7 @@ struct scsi_event *sdev_evt_alloc(enum 
> scsi_device_event evt_type,
>   case SDEV_EVT_MODE_PARAMETER_CHANGE_REPORTED:
>   case SDEV_EVT_LUN_CHANGE_REPORTED:
>   case SDEV_EVT_ALUA_STATE_CHANGE_REPORTED:
> + case SDEV_EVT_SCSI_SENSE:
>   default:
>   /* do nothing */
>   break;
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 82dfe07..cfc7380 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1072,6 +1072,63 @@ static DEVICE_ATTR(queue_ramp_up_period, S_IRUGO | 
> S_IWUSR,
>  sdev_show_queue_ramp_up_period,
>  sdev_store_queue_ramp_up_period);
>
> +#ifdef CONFIG_SCSI_SENSE_UEVENT
> +
> +/*
> + * SCSI sense key could be 0x00 - 0x08, 0x0a, 0x0b, 0x0d, 0x0e, so the
> + * mask is 0x6dff.
> + */
> +#define SCSI_SENSE_EVENT_FILTER_MASK 0x6dff

Why omit 0x09? Its vendor specific, but that doesn't mean no-one ever
will use it? Well, I don't know a real-life example, but anyway, doesn't
hurt IMO.

Also SPC-4 specifies 0x0F.

> +
> +static ssize_t
> +sdev_show_sense_event_filter(struct device *dev,
> +  struct device_attribute *attr,
> +  char *buf)
> +{
> + struct scsi_device *sdev;
> +
> + sdev = to_scsi_device(dev);
> + return snprintf(buf, 20, "0x%04lx\n",
> + (sdev->sense_event_filter &
> +  SCSI_SENSE_EVENT_FILTER_MASK));
> +}
> +
> +static ssize_t
> +sdev_store_sense_event_filter(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t count)
> +{
> + struct scsi_device *sdev = to_scsi_device(dev);
> + unsigned long filter;
> + int i;
> +
> + if (buf[0] == '0' && buf[1] == 'x') {
> + if (kstrtoul(buf + 2, 16, ))
> + return -EINVAL;
> + } else
> + if (kstrtoul(buf, 10, ))
> + return -EINVAL;

Why the manual parsing? AFAIK kstrtoul can do that already, if you pass
0 as second argument.


Beste Grüße / Best regards,
  - Benjamin Block
> +
> + /*
> +  * Accurate mask for all sense keys is 0x6dff. However, we allow
> +  * user to enable event for all sense keys by echoing 0x
> +  */
> + if ((filter & 0x) != filter)
> + return -EINVAL;
> +
> + for (i = 0; i < 15; i++)
> 

Re: [PATCH v3 3/4] sd: Make synchronize cache upon shutdown asynchronous

2017-04-18 Thread Benjamin Block
Hej Bart,

sry for the late'ish reply, had a long weekend.

On Thu, Apr 13, 2017 at 12:28:54AM +, Bart Van Assche wrote:
> On Wed, 2017-04-12 at 16:41 +0200, Benjamin Block wrote:
> > On Mon, Apr 10, 2017 at 10:54:01AM -0700, Bart Van Assche wrote:
> > > [ ... ]
> > OK, so I take it the problem is when the queue is stopped, then the
> > completion in blk_execute_rq() will never be triggered and then we wait
> > for a timeout there, or potentially forever?
>
> Hello Benjamin,
>
> Thanks for the review.
>
> If a request is queued after a queue has been stopped then that request
> will never be started and hence even the timeout timer won't be started.
> blk_cleanup_queue() hangs if invoked for a stopped queue and one or more
> requests have not yet been started.
>
> > But then what is the point in trying to do it async here anyway? Won't
> > that just be doomed in the same way, just that we don't see the effect?
>
> Have you noticed that patch 4/4 in this series restarts the queue just
> before calling blk_cleanup_queue()?
>
> Anyway, can you have a look at the patch below and see whether this new
> version addresses all the concerns you had reported in your previous
> e-mail?
>

Yes, the code- and comment-changes in sd_shutdown() are good. Apparently
there is something new with the done-function now, but you got that from
Israel.

I still wonder why we try 'so hard' scheduling a command for a dead
device, but as that seems to be the status quo, and only lacks in the
case where the LLD is already half-way gone, its ok for me too. I mean,
the order is a bit screwed.. we apparently first remove the driver and
post-factum try to drain the queue.. that is strange.


- Benjamin

On Mon, Apr 17, 2017 at 10:34:35AM -0700, Bart Van Assche wrote:
> This patch avoids that sd_shutdown() hangs on the SYNCHRONIZE CACHE
> command if the block layer queue has been stopped by
> scsi_target_block().
> 
> Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
> Cc: Israel Rukshin <isra...@mellanox.com>
> Cc: Max Gurtovoy <m...@mellanox.com>
> Cc: Hannes Reinecke <h...@suse.de>
> Cc: Benjamin Block <bbl...@linux.vnet.ibm.com>
> ---
>  drivers/scsi/sd.c | 45 -
>  1 file changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index fe0f7997074e..deff564fe649 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1489,6 +1489,33 @@ static unsigned int sd_check_events(struct gendisk 
> *disk, unsigned int clearing)
>   return retval;
>  }
> 
> +static void sd_sync_cache_done(struct request *rq, int e)
> +{
> + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +
> + sd_printk(KERN_DEBUG, sdkp, "%s\n", __func__);
> +
> + blk_put_request(rq);
> +}
> +
> +/*
> + * Issue a SYNCHRONIZE CACHE command asynchronously. Since 
> blk_cleanup_queue()
> + * waits for all commands to finish, __scsi_remove_device() will wait for the
> + * SYNCHRONIZE CACHE command to finish.
> + */
> +static int sd_sync_cache_async(struct scsi_disk *sdkp)
> +{
> + const struct scsi_device *sdp = sdkp->device;
> + const int timeout = sdp->request_queue->rq_timeout *
> + SD_FLUSH_TIMEOUT_MULTIPLIER;
> + const unsigned char cmd[10] = { SYNCHRONIZE_CACHE };
> +
> + sd_printk(KERN_DEBUG, sdkp, "%s\n", __func__);
> + return scsi_execute_async(sdp, sdkp->disk, cmd, DMA_NONE, NULL, 0,
> +   timeout, SD_MAX_RETRIES, 0, 0,
> +   sd_sync_cache_done);
> +}
> +
>  static int sd_sync_cache(struct scsi_disk *sdkp)
>  {
>   int retries, res;
> @@ -3349,13 +3376,15 @@ static int sd_start_stop_device(struct scsi_disk 
> *sdkp, int start)
>  }
> 
>  /*
> - * Send a SYNCHRONIZE CACHE instruction down to the device through
> - * the normal SCSI command structure.  Wait for the command to
> - * complete.
> + * Send a SYNCHRONIZE CACHE instruction down to the device through the normal
> + * SCSI command structure. When stopping the disk, wait for the command to
> + * complete. When not stopping the disk, the blk_cleanup_queue() call in
> + * __scsi_remove_device() will wait for this command to complete.
>   */
>  static void sd_shutdown(struct device *dev)
>  {
>   struct scsi_disk *sdkp = dev_get_drvdata(dev);
> + bool stop_disk;
> 
>   if (!sdkp)
>   return; /* this can happen */
> @@ -3363,12 +3392,18 @@ static void sd_shutdown(struct device *dev)
>   if (pm_runtime_suspended(dev))
>   return;
> 
> + stop

Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-13 Thread Benjamin Block
On Wed, Apr 12, 2017 at 06:11:25PM +, Bart Van Assche wrote:
> On Wed, 2017-04-12 at 12:55 +0200, Benjamin Block wrote:
> > On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> > > The six patches in this patch series fix the queue lockup I reported
> > > recently on the linux-block mailing list. Please consider these patches
> > > for inclusion in the upstream kernel.
> >
> > just out of curiosity. Is this maybe related to similar stuff happening
> > when CPUs are hot plugged - at least in that the stack gets stuck? Like
> > in this thread here:
> > https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html
> >
> > Would be interesting, because we recently saw similar stuff happening.
>
> Hello Benjamin,
>
> My proposal is to repeat that test with Jens' for-next branch. If the issue
> still occurs with that tree then please check the contents of
> /sys/kernel/debug/block/*/mq/*/{dispatch,*/rq_list}. That will allow to
> determine whether or not any block layer requests are still pending. If
> running the command below resolves the deadlock then it means that a
> trigger to run a block layer queue is still missing somewhere:
>
> for a in /sys/kernel/debug/block/*/mq/state; do echo run >$a; done
>
> See also git://git.kernel.dk/linux-block.git.
>

Thx for the hint! I'll forward that and see if the affected folks are
willing to reproduce.


    Beste Grüße / Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH v2 3/4] sd: Make synchronize cache upon shutdown asynchronous

2017-04-12 Thread Benjamin Block
On Mon, Apr 10, 2017 at 10:54:01AM -0700, Bart Van Assche wrote:
> This patch avoids that sd_shutdown() hangs on the SYNCHRONIZE CACHE
> command if the block layer queue has been stopped by
> scsi_target_block().
> 
> Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com>
> Cc: Israel Rukshin <isra...@mellanox.com>
> Cc: Max Gurtovoy <m...@mellanox.com>
> Cc: Hannes Reinecke <h...@suse.de>
> ---
>  drivers/scsi/sd.c | 25 +++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index fe0f7997074e..8e98b7684893 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1489,6 +1489,22 @@ static unsigned int sd_check_events(struct gendisk 
> *disk, unsigned int clearing)
>   return retval;
>  }
> 
> +/*
> + * Issue a SYNCHRONIZE CACHE command asynchronously. Since 
> blk_cleanup_queue()
> + * waits for all commands to finish, __scsi_remove_device() will wait for the
> + * SYNCHRONIZE CACHE command to finish.
> + */
> +static int sd_sync_cache_async(struct scsi_disk *sdkp)
> +{
> + const struct scsi_device *sdp = sdkp->device;
> + const int timeout = sdp->request_queue->rq_timeout *
> + SD_FLUSH_TIMEOUT_MULTIPLIER;
> + const unsigned char cmd[10] = { SYNCHRONIZE_CACHE };
> +
> + return scsi_execute_async(sdp, cmd, DMA_NONE, NULL, 0, timeout,
> +   SD_MAX_RETRIES, 0, 0);
> +}
> +

OK, so I take it the problem is when the queue is stopped, then the
completion in blk_execute_rq() will never be triggered and then we wait
for a timeout there, or potentially forever?

But then what is the point in trying to do it async here anyway? Won't
that just be doomed in the same way, just that we don't see the effect?

>  static int sd_sync_cache(struct scsi_disk *sdkp)
>  {
>   int retries, res;
> @@ -3356,6 +3372,8 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, 
> int start)
>  static void sd_shutdown(struct device *dev)
>  {
>   struct scsi_disk *sdkp = dev_get_drvdata(dev);
> + const bool stop_disk = system_state != SYSTEM_RESTART &&
> + sdkp->device->manage_start_stop;
> 
>   if (!sdkp)
>   return; /* this can happen */

That seems wrong then. You already dereference sdkp before the function
checks whether or not the pointer is valid (at least if you can trust
the comment there).

> @@ -3365,10 +3383,13 @@ static void sd_shutdown(struct device *dev)
> 
>   if (sdkp->WCE && sdkp->media_present) {
>   sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
> - sd_sync_cache(sdkp);
> + if (stop_disk)
> + sd_sync_cache(sdkp);
> + else
> +     sd_sync_cache_async(sdkp);

That makes the function-documentation obsolete, doesn't it?


Beste Grüße / Best regards,
  - Benjamin Block

>   }
> 
> - if (system_state != SYSTEM_RESTART && sdkp->device->manage_start_stop) {
> + if (stop_disk) {
>   sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>   sd_start_stop_device(sdkp, 0);
>   }
> -- 
> 2.12.0
> 
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH v4 0/6] Avoid that scsi-mq and dm-mq queue processing stalls sporadically

2017-04-12 Thread Benjamin Block
On Fri, Apr 07, 2017 at 11:16:48AM -0700, Bart Van Assche wrote:
> Hello Jens,
> 
> The six patches in this patch series fix the queue lockup I reported
> recently on the linux-block mailing list. Please consider these patches
> for inclusion in the upstream kernel.
> 

Hey Bart,

just out of curiosity. Is this maybe related to similar stuff happening
when CPUs are hot plugged - at least in that the stack gets stuck? Like
in this thread here:
https://www.mail-archive.com/linux-block@vger.kernel.org/msg06057.html

Would be interesting, because we recently saw similar stuff happening.


Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command

2017-03-21 Thread Benjamin Block
On Thu, Mar 16, 2017 at 12:53:45PM +0100, Hannes Reinecke wrote:
>On 03/16/2017 12:01 PM, Benjamin Block wrote:
>> On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
>>> On 03/14/2017 06:56 PM, Benjamin Block wrote:
>>>> Hello Hannes,
>>>>
>>>> On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
>>>>> When a command is sent as part of the error handling there
>>>>> is not point whatsoever to start EH escalation when that
>>>>> command fails; we are _already_ in the error handler,
>>>>> and the escalation is about to commence anyway.
>>>>> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
>>>>> commands and let the main EH routine handle the rest.
>>>>>
>>>>> Signed-off-by: Hannes Reinecke <h...@suse.de>
>>>>> Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>
>>>>> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
>>>>> ---
>>>>>  drivers/scsi/scsi_error.c | 11 +--
>>>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>>>> index e1ca3b8..4613aa1 100644
>>>>> --- a/drivers/scsi/scsi_error.c
>>>>> +++ b/drivers/scsi/scsi_error.c
>>>>> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct 
>>>>> scsi_host_template *hostt,
>>>>>   return hostt->eh_abort_handler(scmd);
>>>>>  }
>>>>>
>>>>> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
>>>>> -{
>>>>> - if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
>>>>> - if (scsi_try_bus_device_reset(scmd) != SUCCESS)
>>>>> - if (scsi_try_target_reset(scmd) != SUCCESS)
>>>>> - if (scsi_try_bus_reset(scmd) != SUCCESS)
>>>>> - scsi_try_host_reset(scmd);
>>>>> -}
>>>>> -
>>>>>  /**
>>>>>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error 
>>>>> recovery
>>>>>   * @scmd:   SCSI command structure to hijack
>>>>> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd 
>>>>> *scmd, unsigned char *cmnd,
>>>>>   break;
>>>>>   }
>>>>>   } else if (rtn != FAILED) {
>>>>> - scsi_abort_eh_cmnd(scmd);
>>>>> + scsi_try_to_abort_cmd(shost->hostt, scmd);
>>>>>   rtn = FAILED;
>>>>>   }
>>>>
>>>> The idea is sound, but this implementation would cause "use-after-free"s.
>>>>
>>>> I only know our own LLD well enough to judge, but with zFCP there will
>>>> always be a chance that an abort fails - be it memory pressure,
>>>> hardware/firmware behavior or internal EH in zFCP.
>>>>
>>>> Calling queuecommand() will mean for us in the LLD, that we allocate a
>>>> unique internal request struct for the scsi_cmnd (struct
>>>> zfcp_fsf_request) and add that to our internal hash-table with
>>>> outstanding commands. We assume this scsi_cmnd-pointer is ours till we
>>>> complete it via scsi_done are yield it via successful EH-actions.
>>>>
>>>> In case the abort fails, you fail to take back the ownership over the
>>>> scsi command. Which in turn means possible "use-after-free"s when we
>>>> still thinks the scsi command is ours, but EH has already overwritten
>>>> the scsi-command with the original one. When we still get an answer or
>>>> otherwise use the scsi_cmnd-pointer we would access an invalid one.
>>>>
>>> That is actually not try.
>>> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
>>> assumed to reside in the SCSI midlayer;
>>
>> That can not be true. First of all, look at the function itself (v4.10):
>>
>>  static int scsi_try_to_abort_cmd...
>>  {
>>  if (!hostt->eh_abort_handler)
>>  return FAILED;
>>
>>  return hostt->eh_abort_handler(scmd);
>>  }
>>
>> If what you say is true, then this whole API of LLDs providing or
>> choosing not to provide implementations for these function would 

Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command

2017-03-16 Thread Benjamin Block
On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
> On 03/14/2017 06:56 PM, Benjamin Block wrote:
> > Hello Hannes,
> >
> > On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
> >> When a command is sent as part of the error handling there
> >> is not point whatsoever to start EH escalation when that
> >> command fails; we are _already_ in the error handler,
> >> and the escalation is about to commence anyway.
> >> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> >> commands and let the main EH routine handle the rest.
> >>
> >> Signed-off-by: Hannes Reinecke <h...@suse.de>
> >> Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>
> >> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
> >> ---
> >>  drivers/scsi/scsi_error.c | 11 +--
> >>  1 file changed, 1 insertion(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> >> index e1ca3b8..4613aa1 100644
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct 
> >> scsi_host_template *hostt,
> >>return hostt->eh_abort_handler(scmd);
> >>  }
> >>
> >> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
> >> -{
> >> -  if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
> >> -  if (scsi_try_bus_device_reset(scmd) != SUCCESS)
> >> -  if (scsi_try_target_reset(scmd) != SUCCESS)
> >> -  if (scsi_try_bus_reset(scmd) != SUCCESS)
> >> -  scsi_try_host_reset(scmd);
> >> -}
> >> -
> >>  /**
> >>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
> >>   * @scmd:   SCSI command structure to hijack
> >> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
> >> unsigned char *cmnd,
> >>break;
> >>}
> >>} else if (rtn != FAILED) {
> >> -  scsi_abort_eh_cmnd(scmd);
> >> +  scsi_try_to_abort_cmd(shost->hostt, scmd);
> >>rtn = FAILED;
> >>}
> >
> > The idea is sound, but this implementation would cause "use-after-free"s.
> >
> > I only know our own LLD well enough to judge, but with zFCP there will
> > always be a chance that an abort fails - be it memory pressure,
> > hardware/firmware behavior or internal EH in zFCP.
> >
> > Calling queuecommand() will mean for us in the LLD, that we allocate a
> > unique internal request struct for the scsi_cmnd (struct
> > zfcp_fsf_request) and add that to our internal hash-table with
> > outstanding commands. We assume this scsi_cmnd-pointer is ours till we
> > complete it via scsi_done are yield it via successful EH-actions.
> >
> > In case the abort fails, you fail to take back the ownership over the
> > scsi command. Which in turn means possible "use-after-free"s when we
> > still thinks the scsi command is ours, but EH has already overwritten
> > the scsi-command with the original one. When we still get an answer or
> > otherwise use the scsi_cmnd-pointer we would access an invalid one.
> >
> That is actually not try.
> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
> assumed to reside in the SCSI midlayer;

That can not be true. First of all, look at the function itself (v4.10):

static int scsi_try_to_abort_cmd...
{
if (!hostt->eh_abort_handler)
return FAILED;

return hostt->eh_abort_handler(scmd);
}

If what you say is true, then this whole API of LLDs providing or
choosing not to provide implementations for these function would be
fundamentally broken.
The function itself returns FAILED when there is no such function.. how
is a LLD that does not implement it ever to know that you took ownership
by calling scsi_try_to_abort_cmd()?

Then look at the function-comment:

/**
 * scsi_try_to_abort_cmd - ...
 * ...
 * Notes:
 *SUCCESS does not necessarily indicate that the command
 *has been aborted; it only indicates that the LLDDs
 *has cleared all references to that command.
 *LLDDs should return FAILED only if an abort was required
 *but could not be executed. LLDDs should return FAST_IO_FAIL
 *if the device is temporarily unavaila

Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory

2017-03-15 Thread Benjamin Block
On Wed, Mar 15, 2017 at 02:54:09PM +0100, Hannes Reinecke wrote:
> On 03/14/2017 06:33 PM, Benjamin Block wrote:
> > Hello Hannes,
> >
> > On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote:
> >> There hasn't been any reports for HBAs where asynchronous abort
> >> would not work, so we should make it mandatory and remove
> >> the fallback.
> >>
> >> Signed-off-by: Hannes Reinecke <h...@suse.de>
> >> Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>
> >> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
> >> ---
> >>  Documentation/scsi/scsi_eh.txt | 18 --
> >>  drivers/scsi/scsi_error.c  | 81 
> >> --
> >>  drivers/scsi/scsi_lib.c|  2 +-
> >>  drivers/scsi/scsi_priv.h   |  3 +-
> >>  include/scsi/scsi_host.h   |  5 ---
> >>  5 files changed, 15 insertions(+), 94 deletions(-)
> >>
> >> diff --git a/Documentation/scsi/scsi_eh.txt 
> >> b/Documentation/scsi/scsi_eh.txt
> >> index 4edb9c1c..11e447b 100644
> >> --- a/Documentation/scsi/scsi_eh.txt
> >> +++ b/Documentation/scsi/scsi_eh.txt
> >> @@ -70,7 +70,7 @@ with the command.
> >>scmd is requeued to blk queue.
> >>
> >>   - otherwise
> >> -  scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
> >> +  scsi_eh_scmd_add(scmd) is invoked for the command.  See
> >>[1-3] for details of this function.
> >>
> >>
> >> @@ -103,9 +103,7 @@ function
> >>  eh_timed_out() callback did not handle the command.
> >>Step #2 is taken.
> >>
> >> - 2. If the host supports asynchronous completion (as indicated by the
> >> -no_async_abort setting in the host template) scsi_abort_command()
> >> -is invoked to schedule an asynchrous abort.
> >> + 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
> >>  Asynchronous abort are not invoked for commands which the
> >>  SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
> >>  already had been aborted once, and this is a retry which failed),
> >> @@ -127,16 +125,13 @@ function
> >>
> >>   scmds enter EH via scsi_eh_scmd_add(), which does the following.
> >>
> >> - 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
> >> -completions and SCSI_EH_CANCEL_CMD for timeouts.
> >> + 1. Links scmd->eh_entry to shost->eh_cmd_q
> >>
> >> - 2. Links scmd->eh_entry to shost->eh_cmd_q
> >> + 2. Sets SHOST_RECOVERY bit in shost->shost_state
> >>
> >> - 3. Sets SHOST_RECOVERY bit in shost->shost_state
> >> + 3. Increments shost->host_failed
> >>
> >> - 4. Increments shost->host_failed
> >> -
> >> - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> >> + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> >>
> >>   As can be seen above, once any scmd is added to shost->eh_cmd_q,
> >>  SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
> >> @@ -252,7 +247,6 @@ scmd->allowed.
> >>
> >>   1. Error completion / time out
> >>  ACTION: scsi_eh_scmd_add() is invoked for scmd
> >> -  - set scmd->eh_eflags
> >>- add scmd to shost->eh_cmd_q
> >>- set SHOST_RECOVERY
> >>- shost->host_failed++
> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> >> index 802a081..7b70ee9 100644
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> >> *shost)
> >>}
> >>}
> >>
> >> -  scsi_eh_scmd_add(scmd, 0);
> >> +  scsi_eh_scmd_add(scmd);
> >>  }
> >>
> >>  /**
> >> @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> >> *shost)
> >>  /**
> >>   * scsi_eh_scmd_add - add scsi cmd to error handling.
> >>   * @scmd: scmd to run eh on.
> >> - * @eh_flag:  optional SCSI_EH flag.
> >>   */
> >> -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> >> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
> >>  {
> >>struct Scsi_Host *shost = scmd->device->host;
> >>unsigned long flags;
> >> @

Re: [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed

2017-03-14 Thread Benjamin Block
Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:19AM +0100, Hannes Reinecke wrote:
> scsi_eh_scmd_add() currently only will fail if no
> error handler thread is started (which will never be the
> case) or if the state machine encounters an illegal transition.
> 
> But if we're encountering an invalid state transition
> chances is we cannot fixup things with the error handler.
> So better add a WARN_ON for illegal host states and
> make scsi_dh_scmd_add() a void function.
> 
> Signed-off-by: Hannes Reinecke <h...@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>
> ---
>  drivers/scsi/scsi_error.c | 41 +
>  drivers/scsi/scsi_lib.c   |  4 ++--
>  drivers/scsi/scsi_priv.h  |  2 +-
>  3 files changed, 16 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 4613aa1..802a081 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -163,13 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> *shost)
>   }
>   }
> 
> - if (!scsi_eh_scmd_add(scmd, 0)) {
> - SCSI_LOG_ERROR_RECOVERY(3,
> - scmd_printk(KERN_WARNING, scmd,
> - "terminate aborted command\n"));
> - set_host_byte(scmd, DID_TIME_OUT);
> - scsi_finish_command(scmd);
> - }
> + scsi_eh_scmd_add(scmd, 0);
>  }
> 
>  /**
> @@ -224,28 +218,23 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
> *shost)
>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>   * @scmd:scmd to run eh on.
>   * @eh_flag: optional SCSI_EH flag.
> - *
> - * Return value:
> - *   0 on failure.
>   */
> -int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  {
>   struct Scsi_Host *shost = scmd->device->host;
>   unsigned long flags;
> - int ret = 0;
> + int ret;
> 
> - if (!shost->ehandler)
> - return 0;
> + WARN_ON_ONCE(!shost->ehandler);
>

So I saw that you already changed stuff here after Bart reviewed it. Do
you think it would be useful to make the warning per-host-instance,
rather than per-linux-instance?

Like, if for some reason the EH thread for one SCSI host dies - however
that might happen - we would get an individual warning in the klog for
this one host and could maybe even save the setup by
disabling/re-enabling the whole host - effectively removing the host and
adding a new one, plus a new EH thread for it.

With this WARN_ON_ONCE we end up in a broken setup w/o any information
what exactly broke. Previously we would get at least a SCSI-logging
message. Which would also help with analysis of such bugs - correlate
data etc.


Beste Grüße / Best regards,
  - Benjamin Block

> 
>   spin_lock_irqsave(shost->host_lock, flags);
> - if (scsi_host_set_state(shost, SHOST_RECOVERY))
> - if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
> - goto out_unlock;
> -
> + if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
> + ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
> + WARN_ON_ONCE(ret);
> + }
>   if (shost->eh_deadline != -1 && !shost->last_reset)
>   shost->last_reset = jiffies;
> 
> - ret = 1;
>   if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>   eh_flag &= ~SCSI_EH_CANCEL_CMD;
>   scmd->eh_eflags |= eh_flag;
> @@ -253,9 +242,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>   list_add_tail(>eh_entry, >eh_cmd_q);
>   shost->host_failed++;
>   scsi_eh_wakeup(shost);
> - out_unlock:
>   spin_unlock_irqrestore(shost->host_lock, flags);
> - return ret;
>  }
> 
>  /**
> @@ -284,13 +271,11 @@ enum blk_eh_timer_return scsi_times_out(struct request 
> *req)
>   rtn = host->hostt->eh_timed_out(scmd);
> 
>   if (rtn == BLK_EH_NOT_HANDLED) {
> - if (!host->hostt->no_async_abort &&
> - scsi_abort_command(scmd) == SUCCESS)
> - return BLK_EH_NOT_HANDLED;
> -
> - set_host_byte(scmd, DID_TIME_OUT);
> - if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
> - rtn = BLK_EH_HANDLED;
> + if (host->hostt->no_async_abort ||
> + scsi_abort_command(scmd) != SUCCESS) {
> + set_host_byte(scmd, DID_TIME

Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command

2017-03-14 Thread Benjamin Block
Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
> When a command is sent as part of the error handling there
> is not point whatsoever to start EH escalation when that
> command fails; we are _already_ in the error handler,
> and the escalation is about to commence anyway.
> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> commands and let the main EH routine handle the rest.
>
> Signed-off-by: Hannes Reinecke <h...@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumsh...@suse.de>
> Reviewed-by: Bart Van Assche <bart.vanass...@sandisk.com>
> ---
>  drivers/scsi/scsi_error.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index e1ca3b8..4613aa1 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct 
> scsi_host_template *hostt,
>   return hostt->eh_abort_handler(scmd);
>  }
>
> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
> -{
> - if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
> - if (scsi_try_bus_device_reset(scmd) != SUCCESS)
> - if (scsi_try_target_reset(scmd) != SUCCESS)
> - if (scsi_try_bus_reset(scmd) != SUCCESS)
> - scsi_try_host_reset(scmd);
> -}
> -
>  /**
>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
>   * @scmd:   SCSI command structure to hijack
> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, 
> unsigned char *cmnd,
>   break;
>   }
>   } else if (rtn != FAILED) {
> - scsi_abort_eh_cmnd(scmd);
> + scsi_try_to_abort_cmd(shost->hostt, scmd);
>   rtn = FAILED;
>   }

The idea is sound, but this implementation would cause "use-after-free"s.

I only know our own LLD well enough to judge, but with zFCP there will
always be a chance that an abort fails - be it memory pressure,
hardware/firmware behavior or internal EH in zFCP.

Calling queuecommand() will mean for us in the LLD, that we allocate a
unique internal request struct for the scsi_cmnd (struct
zfcp_fsf_request) and add that to our internal hash-table with
outstanding commands. We assume this scsi_cmnd-pointer is ours till we
complete it via scsi_done are yield it via successful EH-actions.

In case the abort fails, you fail to take back the ownership over the
scsi command. Which in turn means possible "use-after-free"s when we
still thinks the scsi command is ours, but EH has already overwritten
the scsi-command with the original one. When we still get an answer or
otherwise use the scsi_cmnd-pointer we would access an invalid one.

I guess this might as well be true for other LLDs.


Beste Grüße / Best regards,
  - Benjamin Block

>
> --
> 1.8.5.6
>

--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCHv3 3/6] scsi: make eh_eflags persistent

2017-03-14 Thread Benjamin Block
a/include/scsi/scsi_eh.h
> +++ b/include/scsi/scsi_eh.h
> @@ -31,6 +31,7 @@ extern int scsi_get_sense_info_fld(const u8 * sense_buffer, 
> int sb_len,
>  struct scsi_eh_save {
>   /* saved state */
>   int result;
> + int eh_eflags;
>   enum dma_data_direction data_direction;
>   unsigned underflow;
>   unsigned char cmd_len;
> --
> 1.8.5.6
>

So I think the code is good. The only thing I am missing is a bit of
reasoning why we want to escalate immediately after an already retried
command fails again. Anyway, would not require code-changes.


Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>


Beste Grüße / Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory

2017-03-14 Thread Benjamin Block
eh_abort_cmds(_work_q, _done_q))
> - scsi_eh_ready_devs(shost, _work_q, _done_q);
> + scsi_eh_ready_devs(shost, _work_q, _done_q);
> 
>   spin_lock_irqsave(shost->host_lock, flags);
>   if (shost->eh_deadline != -1)
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0735a46..98b2df8 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq)
>   scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>   break;
>   default:
> - scsi_eh_scmd_add(cmd, 0);
> + scsi_eh_scmd_add(cmd);
>   break;
>   }
>  }
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 5be6cbf6..e20ab10 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -18,7 +18,6 @@
>  /*
>   * Scsi Error Handler Flags
>   */
> -#define SCSI_EH_CANCEL_CMD   0x0001  /* Cancel this cmd */
>  #define SCSI_EH_ABORT_SCHEDULED  0x0002  /* Abort has been scheduled */
> 
>  #define SCSI_SENSE_VALID(scmd) \
> @@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, 
> char *vendor,
>  extern int scsi_error_handler(void *host);
>  extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
>  extern void scsi_eh_wakeup(struct Scsi_Host *shost);
> -extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
> +extern void scsi_eh_scmd_add(struct scsi_cmnd *);
>  void scsi_eh_ready_devs(struct Scsi_Host *shost,
>   struct list_head *work_q,
>   struct list_head *done_q);
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 3cd8c3b..afb0481 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -452,11 +452,6 @@ struct scsi_host_template {
>   unsigned no_write_same:1;
> 
>   /*
> -  * True if asynchronous aborts are not supported
> -  */
> - unsigned no_async_abort:1;
> -
> - /*
>* Countdown for host blocking with no commands outstanding.
>*/
>   unsigned int max_host_blocked;
> -- 
> 1.8.5.6
> 

Hmm so, I guess we compromise in terms of how granular we want to
recover? 

When say an abort for command A on LUN 1 behind Port α fails for some
reason, then we also skip all possible aborts for command B on LUN 2
behind Port α and command C on LUN 1 behind Port β? (The host might
already be in recovery by the time command B and C fail)

I mean, in the end, all traffic holds on that host anyway, as long as we
get into EH - which is true as soon as one abort fails. Might as well
skip the wait time till we get there.

I would like it, if we could document that fact a bit more direct.
Otherwise I think that's good. The code also looks good.


Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCH 21/29] drivers, s390: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t

2017-03-06 Thread Benjamin Block
On Mon, Mar 06, 2017 at 04:27:11PM +0100, Johannes Thumshirn wrote:
> On 03/06/2017 03:21 PM, Elena Reshetova wrote:
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> 
> The subject is wrong, should be something like "scsi: libfc convert
> fc_fcp_pkt.ref_cnt from atomic_t to refcount_t" but not s390.
> 

Yes please, I was extremely confused for a moment here.



Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294



Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run

2017-03-02 Thread Benjamin Block
Hej Hannes,

On Wed, Mar 01, 2017 at 10:15:15AM +0100, Hannes Reinecke wrote:
> The current medium access timeout counter will be increased for
> each command, so if there are enough failed commands we'll hit
> the medium access timeout for even a single failure.
> Fix this by making the timeout per EH run, ie the counter will
> only be increased once per device and EH run.
>
> Cc: Ewan Milne <emi...@redhat.com>
> Cc: Lawrence Obermann <lober...@redhat.com>
> Cc: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Cc: Steffen Maier <ma...@de.ibm.com>
> Signed-off-by: Hannes Reinecke <h...@suse.com>

Steffen already suggested it, It would be nice to have a stable-tag
here. This already caused multiple real-world false-positive outages, I
think that qualifies for stable :)

> ---
>  drivers/scsi/scsi_error.c  | 16 +++-
>  drivers/scsi/sd.c  | 21 +
>  drivers/scsi/sd.h  |  1 +
>  include/scsi/scsi_driver.h |  2 +-
>  4 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index f2cafae..cec439c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -58,6 +58,7 @@
>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>struct scsi_cmnd *);
> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>
>  /* called with shost->host_lock held */
>  void scsi_eh_wakeup(struct Scsi_Host *shost)
> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>   if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>   eh_flag &= ~SCSI_EH_CANCEL_CMD;
>   scmd->eh_eflags |= eh_flag;
> + scsi_eh_reset(scmd);
>   list_add_tail(>eh_entry, >eh_cmd_q);
>   shost->host_failed++;
>   scsi_eh_wakeup(shost);
> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int 
> rtn)
>   if (!blk_rq_is_passthrough(scmd->request)) {
>   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>   if (sdrv->eh_action)
> - rtn = sdrv->eh_action(scmd, rtn);
> + rtn = sdrv->eh_action(scmd, rtn, false);
> + }
> + return rtn;
> +}
> +
> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
> +{
> + int rtn = SUCCESS;
> +
> + if (!blk_rq_is_passthrough(scmd->request)) {
> + struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
> + if (sdrv->eh_action)
> + rtn = sdrv->eh_action(scmd, rtn, true);
>   }
>   return rtn;
>  }
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c7839f6..c794686 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -115,7 +115,7 @@
>  static int sd_init_command(struct scsi_cmnd *SCpnt);
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
>  static int sd_done(struct scsi_cmnd *);
> -static int sd_eh_action(struct scsi_cmnd *, int);
> +static int sd_eh_action(struct scsi_cmnd *, int, bool);
>  static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
>  static void scsi_disk_release(struct device *cdev);
>  static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
> @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 
> key)
>   *   sd_eh_action - error handling callback
>   *   @scmd:  sd-issued command that has failed
>   *   @eh_disp:   The recovery disposition suggested by the midlayer
> + *   @reset: Reset medium access counter
>   *
>   *   This function is called by the SCSI midlayer upon completion of an
>   *   error test command (currently TEST UNIT READY). The result of sending
>   *   the eh command is passed in eh_disp.  We're looking for devices that
>   *   fail medium access commands but are OK with non access commands like
>   *   test unit ready (so wrongly see the device as having a successful
> - *   recovery)
> + *   recovery).
> + *   We have to be careful to count a medium access failure only once
> + *   per SCSI EH run; there might be several timed out commands which
> + *   will cause the 'max_medium_access_timeouts' counter to trigger
> + *   after the first SCSI EH run already and set the device to offline.
>   **/
> -static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
> +static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool reset)
>  {
>   struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
>
> + if (reset) {
> + /* New SCSI EH run, reset gate va

Re: [PATCH v3 1/2] scsi: Handle Unit Attention when issuing SCSI command

2016-10-25 Thread Benjamin Block
On 01:20 Mon 24 Oct , Gabriel Krisman Bertazi wrote:
> James,
> 
> I fixed the things you pointed out on the previous review.  As
> discussed, I didn't change the code to reuse the request yet.  We can
> follow up on that later.
> 
> Thanks,
> 
> >8
> 
> Usually, re-sending the SCSI command is enough to recover from a Unit
> Attention (UA).  This adds a generic retry code to the SCSI command path
> in case of an UA, before giving up and returning the error condition to
> the caller.
> 
> I added the UA verification into scsi_execute instead of
> scsi_execute_req because there are at least a few callers that invoke
> scsi_execute directly and would benefit from the internal UA retry.
> Also, I didn't use scsi_normalize_sense to not duplicate functionality
> with scsi_execute_req_flags.  Instead, scsi_execute uses a small helper
> function that verifies only the UA condition directly from the raw sense
> buffer.  If this design is not OK, I can refactor to use
> scsi_normalize_sense.
> 
> This prevents us from duplicating the retry code in at least a few
> places.  In particular, it fixes an issue found in some IBM enclosures,
> in which the device may return an Unit Attention during probe, breaking
> the bind with the ses module:
> 
> scsi 1:0:7:0: Failed to get diagnostic page 0x802
> scsi 1:0:7:0: Failed to bind enclosure -19
> 
> Link: https://patchwork.kernel.org/patch/9336763/
> Suggested-by: Brian King <brk...@linux.vnet.ibm.com>
> Suggested-by: James Bottomley <j...@linux.vnet.ibm.com>
> Signed-off-by: Gabriel Krisman Bertazi <kris...@linux.vnet.ibm.com>
> ---
>  drivers/scsi/scsi_lib.c | 24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index c71344aebdbb..4f6a91d6ee34 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -163,6 +163,16 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
>  {
>   __scsi_queue_insert(cmd, reason, 1);
>  }
> +
> +static inline bool scsi_sense_unit_attention(const char *sense)
> +{
> + int resp = sense[0] & 0x7f;
> +
> + return ((resp & 0x70) &&
> + ((resp >= 0x72 && (sense[1] & 0xf) == UNIT_ATTENTION) ||
> +  (resp < 0x72 && (sense[2] & 0xf) == UNIT_ATTENTION)));
> +}
> +

Just some minor nit-pick. In other places where we check for valid sense
(SCSI_SENSE_VALID(), scsi_sense_valid())
we always check for whether the upper nibble really only contains 0x70
((resp & 0x70) == 0x70). 

I also find it strange that we now have 3 different and independent
functions/places that check for valid sense.


Beste Grüße / Best regards,
  - Benjamin Block

>  /**
>   * scsi_execute - insert request and wait for the result
>   * @sdev:scsi device
> @@ -187,7 +197,14 @@ int scsi_execute(struct scsi_device *sdev, const 
> unsigned char *cmd,
>   struct request *req;
>   int write = (data_direction == DMA_TO_DEVICE);
>   int ret = DRIVER_ERROR << 24;
> + unsigned char sense_buf[SCSI_SENSE_BUFFERSIZE];
> 
> + if (!sense) {
> + sense = sense_buf;
> + memset(sense, 0, SCSI_SENSE_BUFFERSIZE);
> + }
> +
> + retry:
>   req = blk_get_request(sdev->request_queue, write, __GFP_RECLAIM);
>   if (IS_ERR(req))
>   return ret;
> @@ -210,6 +227,13 @@ int scsi_execute(struct scsi_device *sdev, const 
> unsigned char *cmd,
>*/
>   blk_execute_rq(req->q, NULL, req, 1);
> 
> + if (scsi_sense_unit_attention(sense) && req->retries > 0) {
> + memset(sense, 0, SCSI_SENSE_BUFFERSIZE);
> + retries = req->retries - 1;
> + blk_put_request(req);
> + goto retry;
> + }
> +
>   /*
>* Some devices (USB mass-storage in particular) may transfer
>* garbage data together with a residue indicating that the data
> -- 
> 2.7.4

-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] zfcp: Revert to original scanning behaviour

2016-04-12 Thread Benjamin Block
On 22:14 Mon 11 Apr , Martin K. Petersen wrote:
> >>>>> "Hannes" == Hannes Reinecke <h...@suse.de> writes:
> 
> Hannes> zfcp has its own mechanism for selective scanning, so revert to
> Hannes> the original scanning behaviour to not confuse users.
> 
> Benjamin?
> 

Sry, I meant to quickly test this yesterday but got carried away. As far
as I understand the patch-series this should be good.

Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] scsi: disable automatic target scan

2016-04-05 Thread Benjamin Block
On 09:29 Fri 01 Apr , Hannes Reinecke wrote:
> On 03/30/2016 09:41 PM, Benjamin Block wrote:
> > Hello Hannes,
> > 
> > I am a bit late here, but as this got pulled and Steffen didn't have
> > time to give it a review, I did today.
> > 
> > On 08:39 Thu 17 Mar , Hannes Reinecke wrote:
> >> On larger installations it is useful to disable automatic LUN
> >> scanning, and only add the required LUNs via udev rules.
> >> This can speed up bootup dramatically.
> >>
> >> This patch introduces a new scan module parameter value 'manual',
> >> which works like 'none', but can be overriden by setting the 'rescan'
> >> value from scsi_scan_target to 'SCSI_SCAN_MANUAL'.
> >> And it updates all relevant callers to set the 'rescan' value
> >> to 'SCSI_SCAN_MANUAL' if invoked via the 'scan' option in sysfs.
> >>
> >> Signed-off-by: Hannes Reinecke <h...@suse.de>
> >> ---
> > 
> > [:snip:]
> > 
> >>
> >> diff --git a/drivers/s390/scsi/zfcp_unit.c b/drivers/s390/scsi/zfcp_unit.c
> >> index 157d3d2..08bba7c 100644
> >> --- a/drivers/s390/scsi/zfcp_unit.c
> >> +++ b/drivers/s390/scsi/zfcp_unit.c
> >> @@ -26,7 +26,8 @@ void zfcp_unit_scsi_scan(struct zfcp_unit *unit)
> >>lun = scsilun_to_int((struct scsi_lun *) >fcp_lun);
> >>
> >>if (rport && rport->port_state == FC_PORTSTATE_ONLINE)
> >> -  scsi_scan_target(>dev, 0, rport->scsi_target_id, lun, 1);
> >> +  scsi_scan_target(>dev, 0, rport->scsi_target_id, lun,
> >> +   SCSI_SCAN_RESCAN);
> > 
> > I'd rather use the new SCSI_SCAN_MANUAL here. We don't want this to be
> > "blocked" by an attribute set in a different (new) code-path and
> > config-location.
> > 
> > This path is used by both, zfcp-recovery and -userinterfaces (sysfs) and
> > in both cases we call it with the intend that the scan is really done -
> > hence the SCSI_SCAN_MANUAL to force a scan. I'd find it very weird if
> > suddenly our users would have to additionally use yet an other config to
> > get the old interfaces working properly.
> > 
> Okay, no problem with that.
> zfcp has its own scanning rules (cf allow_lun_scan module parameter), so
> setting it to 'MANUAL' will just restore the original behaviour.
> 
> Will you be sending a patch for it?
> 

Yeah, sry, once I am through our backlog and I find time to test that
properly.



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv3] scsi: disable automatic target scan

2016-03-30 Thread Benjamin Block
Hello Hannes,

I am a bit late here, but as this got pulled and Steffen didn't have
time to give it a review, I did today.

On 08:39 Thu 17 Mar , Hannes Reinecke wrote:
> On larger installations it is useful to disable automatic LUN
> scanning, and only add the required LUNs via udev rules.
> This can speed up bootup dramatically.
> 
> This patch introduces a new scan module parameter value 'manual',
> which works like 'none', but can be overriden by setting the 'rescan'
> value from scsi_scan_target to 'SCSI_SCAN_MANUAL'.
> And it updates all relevant callers to set the 'rescan' value
> to 'SCSI_SCAN_MANUAL' if invoked via the 'scan' option in sysfs.
> 
> Signed-off-by: Hannes Reinecke <h...@suse.de>
> ---

[:snip:]

> 
> diff --git a/drivers/s390/scsi/zfcp_unit.c b/drivers/s390/scsi/zfcp_unit.c
> index 157d3d2..08bba7c 100644
> --- a/drivers/s390/scsi/zfcp_unit.c
> +++ b/drivers/s390/scsi/zfcp_unit.c
> @@ -26,7 +26,8 @@ void zfcp_unit_scsi_scan(struct zfcp_unit *unit)
>   lun = scsilun_to_int((struct scsi_lun *) >fcp_lun);
> 
>   if (rport && rport->port_state == FC_PORTSTATE_ONLINE)
> - scsi_scan_target(>dev, 0, rport->scsi_target_id, lun, 1);
> + scsi_scan_target(>dev, 0, rport->scsi_target_id, lun,
> +  SCSI_SCAN_RESCAN);

I'd rather use the new SCSI_SCAN_MANUAL here. We don't want this to be
"blocked" by an attribute set in a different (new) code-path and
config-location.

This path is used by both, zfcp-recovery and -userinterfaces (sysfs) and
in both cases we call it with the intend that the scan is really done -
hence the SCSI_SCAN_MANUAL to force a scan. I'd find it very weird if
suddenly our users would have to additionally use yet an other config to
get the old interfaces working properly.



    Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question about expected behavior of terminate_rport_io() in fc_function_template

2015-09-25 Thread Benjamin Block
Hej Hannes,

thx for the short explanation.

On 23:05 Wed 23 Sep , Hannes Reinecke wrote:
> On 09/23/2015 07:06 PM, Benjamin Block wrote:
> > Hello,
> > 
> > just a short question. If a low-level driver implements the function
> > `terminate_rport_io()` in `struct fc_function_template`, and it gets
> > called after IO failed, is the low-level driver expected to handle this
> > request synchronously or can it just schedule an action that is worked on
> > asynchronously to the call to the function?
> > 
> Actually, it doesn't matter, as 'terminate_rport_io()' should cause the
> driver to about outstanding commands. The main idea behind this is that
> the driver clears up any additional state it might have tacked onto the
> command. And calling '->done()', obviously.
> 
> Main goal is to have outstanding I/O returned to the upper layers, so
> that things like multipath can redirect outstanding I/O to other paths
> and facilitate quick failover.
>

Yeah, that is what I thought as well, after I read the initial patch
that introduced that function to the template and stack. Makes much more
sense then an implicit rule.

> 
> > Trouble is, we are seeing problems with SCSI-Commands being used by the
> > upper layers when we expect them to still be ours, after we got a call to
> > that function and didn't react upon it immediately. They do not contain
> > valid content anymore when they should.
> > 
> True; after terminate_rport_io() I/O should have been aborted.
> However, the SCSI layer really shouldn't reuse commands before ->done()
> has been invoked or the command itself has been aborted.
> 
> > I've looked into other implementations and it seems there are both
> > version, some LLDs explicitly wait upon completions of requests they
> > schedule and others just schedule work-items and return. That may
> > already be the answer, but I wanted to make sure I am not missing
> > something here. The documentation on it is not really existing, or I
> > missed it.
> > 
> As indicated, the driver is expected to call ->done() on outstanding
> commands when terminate_rport_io() is called.
> This smells more like an issue with the driver itself; if I were to
> guess I would think that some aborts are not handled correctly ...
> 
> But it's hard to know without details. Do you have some message log or
> something?
> 

It may well be that this is a problem in the driver. I am still working
on it, I have logs but those are very messy because the test load
involves LVM volumes with multiple LUNs and multipathing, and I am
trying to reduce it in order to be better able to debug it.



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
   Geschäftsführung: Dirk Wittkopp / Sitz der Gesellschaft: Böblingen
   Registergericht: Amtsgericht Stuttgart, HRB 243294

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question about expected behavior of terminate_rport_io() in fc_function_template

2015-09-23 Thread Benjamin Block
Hello,

just a short question. If a low-level driver implements the function
`terminate_rport_io()` in `struct fc_function_template`, and it gets
called after IO failed, is the low-level driver expected to handle this
request synchronously or can it just schedule an action that is worked on
asynchronously to the call to the function?

Trouble is, we are seeing problems with SCSI-Commands being used by the
upper layers when we expect them to still be ours, after we got a call to
that function and didn't react upon it immediately. They do not contain
valid content anymore when they should.

I've looked into other implementations and it seems there are both
version, some LLDs explicitly wait upon completions of requests they
schedule and others just schedule work-items and return. That may
already be the answer, but I wanted to make sure I am not missing
something here. The documentation on it is not really existing, or I
missed it.


Beste Grüße / Best regards,
  - Benjamin Block
--
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
   Geschäftsführung: Dirk Wittkopp / Sitz der Gesellschaft: Böblingen
   Registergericht: Amtsgericht Stuttgart, HRB 243294

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html