date:20170809

Re: [PATCH] sd_zbc: Write unlock zone from sd_uninit_cmnd()

2017-08-09 Thread Damien Le Moal



On 8/10/17 11:14, Damien Le Moal wrote:
> Bart,
> 
> On 8/10/17 00:24, Bart Van Assche wrote:
>> On Wed, 2017-08-09 at 14:15 +0900, Damien Le Moal wrote:
>>> Overall, yes, that is possible. However, considering only write commands
>>> to a single zone, since at any time only one command is dequeued (the
>>> one holding the zone lock), having the "lock+dequeue" and
>>> "unlock+requeue" both atomic prevent any reordering of write commands
>>> directed to that zone. The first command in the queue for the zone will
>>> either (1) get the zone lock and be dequeued OR (2) stay in the queue.
>>> In case (1), the next write for the zone will hit case (2) until the
>>> dispatch command completes. If the dispatch command is requeued (at the
>>> dispatch queue head), we hit back again the (1) or (2) cases, without
>>> any change in the order. Isnt't it ?
>>
>> Hello Damien,
>>
>> Commands that get requeued are not reinserted at their original position
>> in the request queue but usually at a different position. This is what makes
>> requeueing cause request reordering. Anyway, can you check whether replacing
>> patch "sd_zbc: Write unlock zone from sd_uninit_cmnd()" with the (untested)
>> patch below does not trigger any lockups and prevents request reordering?
>> Since that patch keeps the zone lock across requeuing attempts it should
>> prevent request reordering.
> 
> Your patch will indeed prevent deadlock for the case where the write
> command that acquired the lock gets re-queued before running/completing.
> If after re-queueing the command end ups being *before* any following
> sequential write to the same zone, it will be fine as the flag indicates
> that the zone lock is already acquired. So the command prep will not
> cause deadlocking.
> 
> However, the failure pattern I have seen most of the time is a write
> command never tried before ending up at the head of the dispatch queue,
> trying to acquire the zone lock and failing to do so because the lock
> owning command is behind in the queue.
> 
> Ex: The dispatch queue has sequential write commands A at the head and
> write B following.
> (1) A is dequeued and prep-ed, the zone lock is acquired.
> (2) For whatever reason, A gets requeued, still holding the zone lock.
> (3) Between (1) and (2), another context tries to push command B and
> dequeues it. Zone lock fails and (B) goes for requeue (using a
> workqueue, so different context)
> (4) If the requeue work for A runs before the one for B (instead of both
> A and B ending up in the same work), then B ends up at the head of the
> queue.
> (5) From here, B fails systematically. Requests behind B are not tried
> since the device is marked busy. So B stays at the head and the failure
> persists.

Correction: At step (3), since command B is not issued to the HBA, it
goes for requeue in the same context as scsi_queue_rq() call.
At step (4), since A was issued, requeue is done with the requeue list
workqueue. But A can still end up behind B because initially, B is only
requeued at the head of the dispatch local list, which is itself spliced
into the dispatch queue after that. A requeue may still be done before B
goes back in the dispatch queue.

> 
> The patch I sent is not about solving the ordering problem. Never was.
> It just avoids all possible deadlock situations.
> 
> There are more reordering situations possible higher up in the stack.
> Ex: without a scheduler, blk-mq simply moves all submitted requests from
> the CPU context queues into the hctx dispatch queue, in CPU order. This
> means that a thread cleanly writing sequentially to a zone but moved
> from one CPU to another half-way through the sequence can see its write
> reordered in the dispatch queue. Same for the actual dispatch code
> calling scsi_queue_rq(). This code loops over a local list, not the hctx
> queue. So this code running from 2 different contexts with the second
> one starting half way through our well behaving thread sequence will end
> up splitting the sequence into 2 different local lists and so loosing
> the sequential ordering.
> 
> The no-scheduler case is a little special but needs handling. As for the
> blk-mq dispatch code, we need that serialized/single threaded for zoned
> block devices. I am inclined to say that this should be the case for any
> HDD anyway as performance can only be better. Ming's series already
> brings improvements, but no ordering guarantees for zoned devices.
> 
> I am currently trying different approaches for this. In the mean time, I
> would like to see the unlock change patch be applied to fix the deadlock
> problem.
> 
> Best.
> 
>>
>> Thanks,
>>
>> Bart.
>>
>> ---
>>  drivers/scsi/sd_zbc.c| 8 
>>  include/scsi/scsi_cmnd.h | 3 +++
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
>> index 96855df9f49d..6d18b1bd3b26 100644
>> --- a/drivers/scsi/sd_zbc.c
>> +++ b/drivers/scsi/sd_zbc.c
>> @@ -290,10 +290,15 @@ int

Re: [PATCH] sd_zbc: Write unlock zone from sd_uninit_cmnd()

2017-08-09 Thread Damien Le Moal

Bart,

On 8/10/17 00:24, Bart Van Assche wrote:
> On Wed, 2017-08-09 at 14:15 +0900, Damien Le Moal wrote:
>> Overall, yes, that is possible. However, considering only write commands
>> to a single zone, since at any time only one command is dequeued (the
>> one holding the zone lock), having the "lock+dequeue" and
>> "unlock+requeue" both atomic prevent any reordering of write commands
>> directed to that zone. The first command in the queue for the zone will
>> either (1) get the zone lock and be dequeued OR (2) stay in the queue.
>> In case (1), the next write for the zone will hit case (2) until the
>> dispatch command completes. If the dispatch command is requeued (at the
>> dispatch queue head), we hit back again the (1) or (2) cases, without
>> any change in the order. Isnt't it ?
> 
> Hello Damien,
> 
> Commands that get requeued are not reinserted at their original position
> in the request queue but usually at a different position. This is what makes
> requeueing cause request reordering. Anyway, can you check whether replacing
> patch "sd_zbc: Write unlock zone from sd_uninit_cmnd()" with the (untested)
> patch below does not trigger any lockups and prevents request reordering?
> Since that patch keeps the zone lock across requeuing attempts it should
> prevent request reordering.

Your patch will indeed prevent deadlock for the case where the write
command that acquired the lock gets re-queued before running/completing.
If after re-queueing the command end ups being *before* any following
sequential write to the same zone, it will be fine as the flag indicates
that the zone lock is already acquired. So the command prep will not
cause deadlocking.

However, the failure pattern I have seen most of the time is a write
command never tried before ending up at the head of the dispatch queue,
trying to acquire the zone lock and failing to do so because the lock
owning command is behind in the queue.

Ex: The dispatch queue has sequential write commands A at the head and
write B following.
(1) A is dequeued and prep-ed, the zone lock is acquired.
(2) For whatever reason, A gets requeued, still holding the zone lock.
(3) Between (1) and (2), another context tries to push command B and
dequeues it. Zone lock fails and (B) goes for requeue (using a
workqueue, so different context)
(4) If the requeue work for A runs before the one for B (instead of both
A and B ending up in the same work), then B ends up at the head of the
queue.
(5) From here, B fails systematically. Requests behind B are not tried
since the device is marked busy. So B stays at the head and the failure
persists.

The patch I sent is not about solving the ordering problem. Never was.
It just avoids all possible deadlock situations.

There are more reordering situations possible higher up in the stack.
Ex: without a scheduler, blk-mq simply moves all submitted requests from
the CPU context queues into the hctx dispatch queue, in CPU order. This
means that a thread cleanly writing sequentially to a zone but moved
from one CPU to another half-way through the sequence can see its write
reordered in the dispatch queue. Same for the actual dispatch code
calling scsi_queue_rq(). This code loops over a local list, not the hctx
queue. So this code running from 2 different contexts with the second
one starting half way through our well behaving thread sequence will end
up splitting the sequence into 2 different local lists and so loosing
the sequential ordering.

The no-scheduler case is a little special but needs handling. As for the
blk-mq dispatch code, we need that serialized/single threaded for zoned
block devices. I am inclined to say that this should be the case for any
HDD anyway as performance can only be better. Ming's series already
brings improvements, but no ordering guarantees for zoned devices.

I am currently trying different approaches for this. In the mean time, I
would like to see the unlock change patch be applied to fix the deadlock
problem.

Best.

> 
> Thanks,
> 
> Bart.
> 
> ---
>  drivers/scsi/sd_zbc.c| 8 
>  include/scsi/scsi_cmnd.h | 3 +++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index 96855df9f49d..6d18b1bd3b26 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -290,10 +290,15 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
>* threads running on different CPUs write to the same
>* zone (with a synchronized sequential pattern).
>*/
> + if (cmd->flags & SCMD_IN_PROGRESS)
> + return BLKPREP_OK;
> +
>   if (sdkp->zones_wlock &&
>   test_and_set_bit(zno, sdkp->zones_wlock))
>   return BLKPREP_DEFER;
>  
> + cmd->flags |= SCMD_IN_PROGRESS;
> +
>   return BLKPREP_OK;
>  }
>  
> @@ -302,6 +307,9 @@ void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
>   struct request *rq = cmd->request;
>   struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
>  
> +

[RFT PATCH v3 1/1] aic7xxx: printk() lines unbroken (WIP)

2017-08-09 Thread Michał Mirosław

Current code is not SMP-friendly and since now each printk() call
generates a separate line in the log, the output is mostly unreadable.

This patch makes printed lines consistent and unbroken. It is necessarily
big since it touches almost every printk() in the driver.

Signed-off-by: Michał Mirosław 
---
v3:
 - update after testing: turns out, there are a lot more of broken lines

---
 drivers/scsi/Kconfig |   3 +-
 drivers/scsi/aic7xxx/Kconfig |   6 +
 drivers/scsi/aic7xxx/Kconfig.aic79xx |   1 +
 drivers/scsi/aic7xxx/Kconfig.aic7xxx |   1 +
 drivers/scsi/aic7xxx/Makefile|   1 +
 drivers/scsi/aic7xxx/aic79xx.h   |   7 -
 drivers/scsi/aic7xxx/aic79xx_core.c  | 382 --
 drivers/scsi/aic7xxx/aic79xx_osm.c   |  77 ++--
 drivers/scsi/aic7xxx/aic79xx_osm.h   |   2 +-
 drivers/scsi/aic7xxx/aic79xx_reg.h_shipped   | 307 +++
 drivers/scsi/aic7xxx/aic79xx_reg_print.c_shipped | 480 +++
 drivers/scsi/aic7xxx/aic7xxx.h   |   7 -
 drivers/scsi/aic7xxx/aic7xxx_core.c  | 181 -
 drivers/scsi/aic7xxx/aic7xxx_reg.h_shipped   | 157 
 drivers/scsi/aic7xxx/aic7xxx_reg_print.c_shipped | 242 ++--
 drivers/scsi/aic7xxx/aicasm/aicasm_symbol.c  |  39 +-
 drivers/scsi/aic7xxx/aiclib.c| 140 +--
 drivers/scsi/aic7xxx/aiclib.h|  30 ++
 18 files changed, 1035 insertions(+), 1028 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 3c52867dfe28..13971769bb62 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -463,8 +463,7 @@ config SCSI_AACRAID
  will be called aacraid.
 
 
-source "drivers/scsi/aic7xxx/Kconfig.aic7xxx"
-source "drivers/scsi/aic7xxx/Kconfig.aic79xx"
+source "drivers/scsi/aic7xxx/Kconfig"
 source "drivers/scsi/aic94xx/Kconfig"
 source "drivers/scsi/hisi_sas/Kconfig"
 source "drivers/scsi/mvsas/Kconfig"
diff --git a/drivers/scsi/aic7xxx/Kconfig b/drivers/scsi/aic7xxx/Kconfig
new file mode 100644
index ..aeae1ba88281
--- /dev/null
+++ b/drivers/scsi/aic7xxx/Kconfig
@@ -0,0 +1,6 @@
+config SCSI_AICLIB
+   tristate
+   default n
+
+source "drivers/scsi/aic7xxx/Kconfig.aic7xxx"
+source "drivers/scsi/aic7xxx/Kconfig.aic79xx"
diff --git a/drivers/scsi/aic7xxx/Kconfig.aic79xx 
b/drivers/scsi/aic7xxx/Kconfig.aic79xx
index 3b3d599103f8..7cf29839bc54 100644
--- a/drivers/scsi/aic7xxx/Kconfig.aic79xx
+++ b/drivers/scsi/aic7xxx/Kconfig.aic79xx
@@ -6,6 +6,7 @@ config SCSI_AIC79XX
tristate "Adaptec AIC79xx U320 support"
depends on PCI && SCSI
select SCSI_SPI_ATTRS
+   select SCSI_AICLIB
help
This driver supports all of Adaptec's Ultra 320 PCI-X
based SCSI controllers.
diff --git a/drivers/scsi/aic7xxx/Kconfig.aic7xxx 
b/drivers/scsi/aic7xxx/Kconfig.aic7xxx
index 55ac55ee6068..0d6cea939551 100644
--- a/drivers/scsi/aic7xxx/Kconfig.aic7xxx
+++ b/drivers/scsi/aic7xxx/Kconfig.aic7xxx
@@ -6,6 +6,7 @@ config SCSI_AIC7XXX
tristate "Adaptec AIC7xxx Fast -> U160 support (New Driver)"
depends on (PCI || EISA) && SCSI
select SCSI_SPI_ATTRS
+   select SCSI_AICLIB
---help---
This driver supports all of Adaptec's Fast through Ultra 160 PCI
based SCSI controllers as well as the aic7770 based EISA and VLB
diff --git a/drivers/scsi/aic7xxx/Makefile b/drivers/scsi/aic7xxx/Makefile
index 58ce5af3970f..834cb410a44a 100644
--- a/drivers/scsi/aic7xxx/Makefile
+++ b/drivers/scsi/aic7xxx/Makefile
@@ -7,6 +7,7 @@
 # Let kbuild descend into aicasm when cleaning
 subdir-+= aicasm
 
+obj-$(CONFIG_SCSI_AICLIB)  += aiclib.o
 obj-$(CONFIG_SCSI_AIC7XXX) += aic7xxx.o
 obj-$(CONFIG_SCSI_AIC79XX) += aic79xx.o
 
diff --git a/drivers/scsi/aic7xxx/aic79xx.h b/drivers/scsi/aic7xxx/aic79xx.h
index d47b527b25dd..f81287db98bb 100644
--- a/drivers/scsi/aic7xxx/aic79xx.h
+++ b/drivers/scsi/aic7xxx/aic79xx.h
@@ -1468,11 +1468,4 @@ extern uint32_t ahd_debug;
 void   ahd_print_devinfo(struct ahd_softc *ahd,
  struct ahd_devinfo *devinfo);
 void   ahd_dump_card_state(struct ahd_softc *ahd);
-intahd_print_register(const ahd_reg_parse_entry_t *table,
-  u_int num_entries,
-  const char *name,
-  u_int address,
-  u_int value,
-  u_int *cur_column,
-  u_int wrap_point);
 #endif /* _AIC79XX_H_ */
diff --git a/drivers/scsi/aic7xxx/aic79xx_core.c 
b/drivers/scsi/aic7xxx/aic79xx_core.c
index 95d8f25cbcca..cc517f7f4c5c 100644
---

Re: [PATCH v2 3/5] aic7xxx: remove rules for shipped files

2017-08-09 Thread Michał Mirosław

On Fri, Aug 04, 2017 at 01:28:09AM +0200, Michał Mirosław wrote:
[...]
> --- a/drivers/scsi/aic7xxx/Makefile
> +++ b/drivers/scsi/aic7xxx/Makefile
> @@ -61,8 +61,6 @@ $(obj)/aic7xxx_seq.h: $(src)/aic7xxx.seq $(src)/aic7xxx.reg 
> $(obj)/aicasm/aicasm
[...]
> -else
> -$(obj)/aic7xxx_reg_print.c: $(src)/aic7xxx_reg_print.c_shipped
[...]

Hi,

Please drop this patch (only this one). I didn't notice, that the file is
needed by modpost:

  Building modules, stage 2.
  MODPOST 785 modules
WARNING: could not open drivers/scsi/aic7xxx/aic79xx_reg_print.c: No
such file or directory
  LD [M]  drivers/scsi/aic7xxx/aic79xx.ko

Best Regards,
Michał Mirosław

RE: [PATCH RESEND 0/6] hpsa: support legacy boards

2017-08-09 Thread Don Brace

> -Original Message-
> From: Hannes Reinecke [mailto:h...@suse.de]
> Sent: Tuesday, August 08, 2017 3:35 AM
> To: Martin K. Petersen 
> Cc: Don Brace ; Christoph Hellwig
> ; James Bottomley
> ; Meelis Roos
> ; linux-scsi@vger.kernel.org; Hannes Reinecke
> 
> Subject: [PATCH RESEND 0/6] hpsa: support legacy boards
> 
> EXTERNAL EMAIL
> 
> 
> (Resend to include linux-scsi)
> 
> Hi all,
> 
> this patch adds support for legacy boards, ie for boards previously
> supported by cciss only.
> With this patchset the hpsa driver should work with all Smart Array
> boards if the 'hpsa_allow_any' module option is set, rendering the
> cciss driver obsolete.
> 
> Hannes Reinecke (5):
>   hpsa: consolidate status variables
>   hpsa: add support for legacy boards
>   hpsa: disable volume status check for older controller
>   hpsa: Ignore errors for unsupported LV_DEVICE_ID VPD page
>   hpsa: do not print errors for unsupported report luns format
> 
> Jeff Mahoney (1):
>   hpsa: handle unsupported devices more gracefully
> 
>  drivers/scsi/hpsa.c | 131
> 
>  drivers/scsi/hpsa.h |  57 ---
>  2 files changed, 162 insertions(+), 26 deletions(-)
> 
> --
> 1.8.5.6

NACK this series

We do not want to support these older controllers in the hpsa driver.

While the driver may load and manage requests under healthy conditions
many features will not be supported. Customers will not have support
for any issues involving these controllers.

We would need to track these changes in our OOB driver adding
more confusion.

For more information please see:
http://cciss.sourceforge.net


Thanks,
Don Brace
ESC - Smart Storage
Microsemi Corporation

Re: [PATCHv2 4/4] scsi_devinfo: fixup string compare

2017-08-09 Thread Alan Stern

On Wed, 9 Aug 2017, Hannes Reinecke wrote:

> When checking the model and vendor string we need to use the
> minimum value of either string, otherwise we'll miss out on
> wildcard matches.

This is now true only for the model string, not the vendor string.  And 
even for the model string, you allow the lengths to differ in only one 
direction (the model string can be longer than the devinfo model 
string, but not vice versa).

> And we should take card when matching with zero size strings;
> results might be unpredictable.
> With this patch the rules for matching devinfo strings are
> as follows:
> - Vendor strings must match exactly
> - Empty Model strings will only match if the devinfo model
>   is also empty
> - Model strings shorter than the devinfo model string will
>   not match

Good, this is a lot better than before.  These rules make sense.

However, the rules and the code are both somewhat redundant.  For 
example, the second rule above is already a consequence of the third 
rule: If the model string is empty and the devinfo model string isn't, 
then the model string is shorter than the devinfo model string so they 
won't match.

> Fixes: 5e7ff2c ("SCSI: fix new bug in scsi_dev_info_list string matching")
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/scsi_devinfo.c | 46 
> -
>  1 file changed, 33 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
> index 776c701..f8f5cb7 100644
> --- a/drivers/scsi/scsi_devinfo.c
> +++ b/drivers/scsi/scsi_devinfo.c
> @@ -399,8 +399,8 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
> *vendor, char *model,
>  
>  /**
>   * scsi_dev_info_list_find - find a matching dev_info list entry.
> - * @vendor:  vendor string
> - * @model:   model (product) string
> + * @vendor:  full vendor string
> + * @model:   full model (product) string
>   * @key: specify list to use
>   *
>   * Description:
> @@ -440,6 +440,8 @@ static struct scsi_dev_info_list 
> *scsi_dev_info_list_find(const char *vendor,
>   /* Also skip trailing spaces */
>   while (vmax > 0 && vskip[vmax - 1] == ' ')
>   --vmax;
> + if (!vmax)
> + vskip = NULL;
>  
>   mmax = sizeof(devinfo->model);
>   mskip = model;
> @@ -449,27 +451,45 @@ static struct scsi_dev_info_list 
> *scsi_dev_info_list_find(const char *vendor,
>   }
>   while (mmax > 0 && mskip[mmax - 1] == ' ')
>   --mmax;
> + if (!mmax)
> + mskip = NULL;

Neither of these changes is needed.

>  
>   list_for_each_entry(devinfo, _table->scsi_dev_info_list,
>   dev_info_list) {
>   if (devinfo->compatible) {
>   /*
> -  * Behave like the older version of get_device_flags.
> +  * vendor strings must be an exact match
>*/
> - if (memcmp(devinfo->vendor, vskip, vmax) ||
> - (vmax < sizeof(devinfo->vendor) &&
> - devinfo->vendor[vmax]))
> + if (vmax != strlen(devinfo->vendor))
>   continue;
> - if (memcmp(devinfo->model, mskip, mmax) ||
> - (mmax < sizeof(devinfo->model) &&
> - devinfo->model[mmax]))
> + if (vskip && vmax &&
> + memcmp(devinfo->vendor, vskip, vmax))
>   continue;

There's no need to test vskip and vmax.  The memcmp test alone is 
sufficient, since you know the lengths are equal and you are looking 
for an exact match.  So in the end, you could just do this:

if (vmax != strlen(devinfo->vendor) ||
memcmp(devinfo->vendor, vskip, vmax))
continue;

> - return devinfo;
> + /*
> +  * Empty model strings only match if both strings
> +  * are empty.
> +  */
> + if (!mmax && !strlen(devinfo->model))
> + return devinfo;

As mentioned above, this is not necessary.

> +
> + /*
> +  * Empty @model never matches
> +  */
> + if (!mskip)
> + continue;

Nor is this.

> +
> + /*
> +  * @model specifies the full string, and
> +  * must be larger or equal to devinfo->model
> +  */
> + if (!memcmp(devinfo->model, mskip,
> + strlen(devinfo->model)))
> + return devinfo;

It would be safer to do it this way:

Re: [v4.13-rc BUG] system lockup when running big buffered write(4M) to IB SRP via mpath

2017-08-09 Thread Bart Van Assche

On Wed, 2017-08-09 at 12:43 -0400, Laurence Oberman wrote:
> Your latest patch on stock upstream without Ming's latest patches is 
> behaving for me.
> 
> As already mentioned, the requeue -11 and clone failure messages are 
> gone and I am not actually seeing any soft lockups or hard lockups.
> 
> When Ming gets back I will work with him on his patch set and the lockups.
> 
> Running 10 parallel writes which easily trips into soft lockups on 
> Ming's kernel (even with your patch) has been stable here on 4.13-RC3 
> with your patch.
> 
> I will leave it running for a while now but the patch is good.
> 
> If it survives 4 hours I will add a Tested-by to your latest patch.

Hello Laurence,

I'm working on an additional patch that should reduce unnecessary requeuing
even further. I will let you know when it's ready.

Additionally, please trim e-mails when replying such that e-mails do not get
too long.

Thanks,

Bart.

Re: Increased memory usage with scsi-mq

2017-08-09 Thread Paolo Bonzini

On 09/08/2017 18:01, Christoph Hellwig wrote:
> On Mon, Aug 07, 2017 at 03:07:48PM +0200, Paolo Bonzini wrote:
>> can_queue should depend on the virtqueue size, which unfortunately can
>> vary for each virtio-scsi device in theory.  The virtqueue size is
>> retrieved by drivers/virtio prior to calling vring_create_virtqueue, and
>> in QEMU it is the second argument to virtio_add_queue.
> 
> Why is that unfortunate?  We don't even have to set can_queue in
> the host template, we can dynamically set it on per-host basis.

Ah, cool, I thought allocations based on can_queue happened already in
scsi_host_alloc, but they happen at scsi_add_host time.

Paolo

Re: [v4.13-rc BUG] system lockup when running big buffered write(4M) to IB SRP via mpath

2017-08-09 Thread Laurence Oberman




On 08/08/2017 10:28 PM, Laurence Oberman wrote:

On Tue, 2017-08-08 at 20:11 -0400, Laurence Oberman wrote:

On Tue, 2017-08-08 at 22:17 +0800, Ming Lei wrote:

Hi Guys,
  
Laurence and I see a system lockup issue when running

concurrent

big buffered write(4M bytes) to IB SRP on v4.13-rc3.
  
1 how to reproduce
  
1) setup IB_SRR & multi path
  
  	#./start_opensm.sh

#./start_srp.sh
  
  	# cat start_opensm.sh

#!/bin/bash
rmmod ib_srpt
opensm -F opensm.1.conf &
opensm -F opensm.2.conf &
  
  	# cat start_srp.sh

run_srp_daemon  -f /etc/ddn/srp_daemon.conf -R 30 -T 10

-t 7000

-ance -i mlx5_0 -p 1 1>/root/srp1.log 2>&1 &
run_srp_daemon  -f /etc/ddn/srp_daemon.conf -R 30 -T 10

-t 7000

-ance -i mlx5_1 -p 1 1>/root/srp2.log 2>&1 &

  
2) run the following test script
  
  	#./test-mp.sh
  
  	#cat test-mp.sh

#!/bin/sh
DEVS="360014051186d24cc27b4d04994e3
36001405b2b5c6c24c084b6fa4d55da2f
36001405b26ebe76dcb94a489f6f245f8"
  
  	for i in $DEVS; do

for j in `seq 0 15`; do
./hammer_write.sh $i 1>/dev/null 2>&1 &
done
done
  
  	#cat hammer_write.sh

#!/bin/bash
while true; do
dd if=/dev/zero of=/dev/mapper/$1 bs=4096k

count=800

done
  
  
2 lockup log

[root@ibclient ~]# ./test-mp.sh
[root@ibclient ~]# [  169.095494] perf: interrupt took too long
(2575

2500), lowering kernel.perf_event_max_sample_rate to 77000


  
[  178.569124] perf: interrupt took too long (3249 > 3218),

lowering
kernel.perf_event_max_sample_rate to 61000
[  190.586624] perf: interrupt took too long (4066 > 4061),
lowering
kernel.perf_event_max_sample_rate to 49000
[  210.381836] perf: interrupt took too long (5083 > 5082),
lowering
kernel.perf_event_max_sample_rate to 39000
[  240.498659] perf: interrupt took too long (6373 > 6353),
lowering
kernel.perf_event_max_sample_rate to 31000
  
[root@ibclient ~]#

[root@ibclient ~]# [  292.252795] perf: interrupt took too long
(8006

7966), lowering kernel.perf_event_max_sample_rate to 24000


  
[  410.354443] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.432725] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.499530] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.564952] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.644493] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.698091] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.755175] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.796784] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.837690] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  410.878743] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.476823] multipath_clone_and_map: 32 callbacks suppressed
[  463.506773] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.586752] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.656880] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.698919] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.723572] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.751820] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.781110] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.828267] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.856997] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  463.885998] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.572060] multipath_clone_and_map: 5201 callbacks

suppressed

[  468.602191] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.655761] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.697540] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.738610] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.779597] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.820705] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.862712] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.904662] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.945132] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  468.985128] device-mapper: multipath: blk_get_request()

returned

-11 - requeuing
[  473.643013] multipath_clone_and_map: 6569 callbacks

suppressed

[  473.673466] device-mapper: multipath: blk_get_request()

returned

-11 -

Re: [PATCHv2] scsi_dh_alua: suppress errors from unsupported devices

2017-08-09 Thread Christoph Hellwig

> index 0962fd5..c3aea06 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -1100,6 +1100,8 @@ static int alua_bus_attach(struct scsi_device *sdev)
>   err = alua_initialize(sdev, h);
>   if (err == SCSI_DH_NOMEM)
>   ret = -ENOMEM;
> + if (err == SCSI_DH_DEV_UNSUPP)
> + ret = -ENODEV;
>   if (err != SCSI_DH_OK && err != SCSI_DH_DEV_OFFLINED)
>   goto failed;

This screams for a switch statement..

> --- a/drivers/scsi/scsi_dh.c
> +++ b/drivers/scsi/scsi_dh.c
> @@ -133,8 +133,9 @@ static int scsi_dh_handler_attach(struct scsi_device 
> *sdev,
>  
>   error = scsi_dh->attach(sdev);
>   if (error) {
> - sdev_printk(KERN_ERR, sdev, "%s: Attach failed (%d)\n",
> - scsi_dh->name, error);
> + if (error != -ENODEV)
> + sdev_printk(KERN_ERR, sdev, "%s: Attach failed (%d)\n",
> + scsi_dh->name, error);
>   module_put(scsi_dh->module);

OTOH why don't we just return the SCSI_DH_ values from ->attach?

Re: Increased memory usage with scsi-mq

2017-08-09 Thread Christoph Hellwig

On Mon, Aug 07, 2017 at 03:07:48PM +0200, Paolo Bonzini wrote:
> can_queue should depend on the virtqueue size, which unfortunately can
> vary for each virtio-scsi device in theory.  The virtqueue size is
> retrieved by drivers/virtio prior to calling vring_create_virtqueue, and
> in QEMU it is the second argument to virtio_add_queue.

Why is that unfortunate?  We don't even have to set can_queue in
the host template, we can dynamically set it on per-host basis.

Re: [PATCH 2/6] hpsa: add support for legacy boards

2017-08-09 Thread Christoph Hellwig

On Wed, Aug 09, 2017 at 05:08:05PM +0200, Hannes Reinecke wrote:
> On 08/09/2017 03:41 PM, Christoph Hellwig wrote:
> > On Tue, Aug 08, 2017 at 10:35:11AM +0200, Hannes Reinecke wrote:
> >> Add support for legacy boards, ensuring to enable the driver for
> >> those boards only when 'hpsa_allow_any' is set.
> > 
> > Why the wildcard instead of the specific IDs from the cciss driver?
> > 
> Because I'm lazy? And we're catching those boards with the wildcard
> entry anyway? And the distinction between wildcard entry (designed to
> catch unsupported devices) and pci IDs from the cciss drivers (which
> will be unsupported, too) is meaningless?

So what again is unsupported?

How do we know there never was any other compaq device claiming to be
raid class?

Re: [PATCH] scsi: make 'state' device attribute pollable

2017-08-09 Thread Christoph Hellwig

On Wed, Aug 09, 2017 at 03:09:18PM +0200, Hannes Reinecke wrote:
> While the 'state' attribute can (and will) change occasionally,
> calling 'poll()' or 'select()' on it fails as sysfs is never
> notified that the state has changed.
> With this patch calling 'poll()' or 'select()' will work
> properly.
> 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/scsi_lib.c   | 7 +--
>  drivers/scsi/scsi_transport_srp.c | 5 -
>  2 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 41c19c7..2101cfd 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2654,6 +2654,7 @@ void scsi_exit_queue(void)
>  
>   }
>   sdev->sdev_state = state;
> + sysfs_notify(>sdev_gendev.kobj, NULL, "state");
>   return 0;
>  
>   illegal:
> @@ -3074,14 +3075,16 @@ int scsi_internal_device_unblock_nowait(struct 
> scsi_device *sdev,
>* offlined states and goose the device queue if successful.
>*/
>   if ((sdev->sdev_state == SDEV_BLOCK) ||
> - (sdev->sdev_state == SDEV_TRANSPORT_OFFLINE))
> + (sdev->sdev_state == SDEV_TRANSPORT_OFFLINE)) {
>   sdev->sdev_state = new_state;
> - else if (sdev->sdev_state == SDEV_CREATED_BLOCK) {
> + sysfs_notify(>sdev_gendev.kobj, NULL, "state");
> + } else if (sdev->sdev_state == SDEV_CREATED_BLOCK) {
>   if (new_state == SDEV_TRANSPORT_OFFLINE ||
>   new_state == SDEV_OFFLINE)
>   sdev->sdev_state = new_state;
>   else
>   sdev->sdev_state = SDEV_CREATED;
> + sysfs_notify(>sdev_gendev.kobj, NULL, "state");
>   } else if (sdev->sdev_state != SDEV_CANCEL &&
>sdev->sdev_state != SDEV_OFFLINE)
>   return -EINVAL;

This would be a lot more readable using switch statements:

switch (sdev->sdev_state) {
case SDEV_BLOCK:
case SDEV_TRANSPORT_OFFLINE:
sdev->sdev_state = new_state;
sysfs_notify(>sdev_gendev.kobj, NULL, "state");
break;
case SDEV_CREATED_BLOCK:
switch (new_state) {
case SDEV_TRANSPORT_OFFLINE:
case SDEV_OFFLINE:
sdev->sdev_state = new_state;
break;
default:
sdev->sdev_state = SDEV_CREATED;
break;
}
sysfs_notify(>sdev_gendev.kobj, NULL, "state");
break;
case SDEV_CANCEL:
case SDEV_OFFLINE:
break;
default:
return -EINVAL;
}

Otherwise it looks fine to me.

Re: [PATCH] sd_zbc: Write unlock zone from sd_uninit_cmnd()

2017-08-09 Thread Bart Van Assche

On Wed, 2017-08-09 at 14:15 +0900, Damien Le Moal wrote:
> Overall, yes, that is possible. However, considering only write commands
> to a single zone, since at any time only one command is dequeued (the
> one holding the zone lock), having the "lock+dequeue" and
> "unlock+requeue" both atomic prevent any reordering of write commands
> directed to that zone. The first command in the queue for the zone will
> either (1) get the zone lock and be dequeued OR (2) stay in the queue.
> In case (1), the next write for the zone will hit case (2) until the
> dispatch command completes. If the dispatch command is requeued (at the
> dispatch queue head), we hit back again the (1) or (2) cases, without
> any change in the order. Isnt't it ?

Hello Damien,

Commands that get requeued are not reinserted at their original position
in the request queue but usually at a different position. This is what makes
requeueing cause request reordering. Anyway, can you check whether replacing
patch "sd_zbc: Write unlock zone from sd_uninit_cmnd()" with the (untested)
patch below does not trigger any lockups and prevents request reordering?
Since that patch keeps the zone lock across requeuing attempts it should
prevent request reordering.

Thanks,

Bart.

---
 drivers/scsi/sd_zbc.c| 8 
 include/scsi/scsi_cmnd.h | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 96855df9f49d..6d18b1bd3b26 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -290,10 +290,15 @@ int sd_zbc_write_lock_zone(struct scsi_cmnd *cmd)
 * threads running on different CPUs write to the same
 * zone (with a synchronized sequential pattern).
 */
+   if (cmd->flags & SCMD_IN_PROGRESS)
+   return BLKPREP_OK;
+
if (sdkp->zones_wlock &&
test_and_set_bit(zno, sdkp->zones_wlock))
return BLKPREP_DEFER;
 
+   cmd->flags |= SCMD_IN_PROGRESS;
+
return BLKPREP_OK;
 }
 
@@ -302,6 +307,9 @@ void sd_zbc_write_unlock_zone(struct scsi_cmnd *cmd)
struct request *rq = cmd->request;
struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
 
+   WARN_ON_ONCE(!(cmd->flags & SCMD_IN_PROGRESS));
+   cmd->flags &= ~SCMD_IN_PROGRESS;
+
if (sdkp->zones_wlock) {
unsigned int zno = sd_zbc_zone_no(sdkp, blk_rq_pos(rq));
WARN_ON_ONCE(!test_bit(zno, sdkp->zones_wlock));
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index a1266d318c85..f18c812094b5 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -57,6 +57,9 @@ struct scsi_pointer {
 /* for scmd->flags */
 #define SCMD_TAGGED(1 << 0)
 #define SCMD_UNCHECKED_ISA_DMA (1 << 1)
+#ifdef CONFIG_BLK_DEV_ZONED
+#define SCMD_IN_PROGRESS   (1 << 2)
+#endif
 
 struct scsi_cmnd {
struct scsi_request req;

Re: [PATCH 3/4] scsi: add Mylex RAID controller

2017-08-09 Thread Hannes Reinecke

On 08/09/2017 04:10 PM, Christoph Hellwig wrote:
> On Mon, Aug 07, 2017 at 08:09:11AM +0200, Hannes Reinecke wrote:
>> On 08/05/2017 01:39 PM, Christoph Hellwig wrote:
>>> Can you use normal linux style for the code instead of copy and
>>> pasting the weird naming and capitalization from the DAC960 driver?
>>>
>> Yes; already planned for v2. But first wanted to get some general
>> feedback (like: is anyone interested in that at all?)
> 
> Yes, please go ahead and kill off the DAC960 driver.
> 
I knew you would be saying that :-)

>>
>>> Also please use the driver name as prefix for the functions.
>>>
>> Ok.
>>
>>> Maybe myraid instead of mylex?
>>>
>> Nah; I'd rather stick with mylex.
>> (Especially as it says 'Mylex' in big fat letters on the board :-)
> 
> Naming drivers after the vendor only is usually a bad idea as vendors
> have/had multiple products.  In this case we have plenty other drivers
> that support Mylex products.
> 
Okay. Make it 'myraid_pio' and 'myraid_mmio' then.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH 0/4] mylex: Replace DAC960 block driver

2017-08-09 Thread Hannes Reinecke

On 08/09/2017 04:23 PM, Christoph Hellwig wrote:
> On Wed, Aug 02, 2017 at 04:13:24PM +0200, Hannes Reinecke wrote:
>> Hi all,
>>
>> as we're trying to get rid of the remaining request_fn drivers here's
>> a patchset to move the DAC960 driver to the SCSI stack.
>> The new driver is called 'mylex'.
>>
>> The Mylex/DAC960 HBA comes in two flavours; the later one (V2) already
>> has a pretty complete SCSI emulation layer, so we just have to reformat
>> the command.
>> For for earlier ones (V1) we don't have a SCSI emulation layer for the
>> logical drives, so I've added a (pretty rudimentary, admittedly) SCSI
>> translation for them.
>> And the weird proc interface has been converted to sysfs attributes.
> 
> It seems like V1 and V2 basically don't share any code at all, so it seems
> like they should be separate drivers.  And for V1 it seems like it might
> even be better off as a blk-mq driver - which could then reuse the DAC960
> name and device nodes?
> 
> For the V2 firmware your approach looks great, though.
> 
Guess what, I've started doing the separation already :-)
For V1 I'm indeed unsure; thing is, it _does_ have a SCSI passthrough
for the physical devices, so that definitely simplifies things.
And for the logical drives; doing a SCSI emulation is not hard to do
(especially so if one sticks with SCSI-3 :-), so I'd rather move it to
become a SCSI driver.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH 2/6] hpsa: add support for legacy boards

2017-08-09 Thread Hannes Reinecke

On 08/09/2017 03:41 PM, Christoph Hellwig wrote:
> On Tue, Aug 08, 2017 at 10:35:11AM +0200, Hannes Reinecke wrote:
>> Add support for legacy boards, ensuring to enable the driver for
>> those boards only when 'hpsa_allow_any' is set.
> 
> Why the wildcard instead of the specific IDs from the cciss driver?
> 
Because I'm lazy? And we're catching those boards with the wildcard
entry anyway? And the distinction between wildcard entry (designed to
catch unsupported devices) and pci IDs from the cciss drivers (which
will be unsupported, too) is meaningless?

But if you insist ...

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH 5/6] hpsa: do not print errors for unsupported report luns format

2017-08-09 Thread Hannes Reinecke

On 08/09/2017 03:46 PM, Christoph Hellwig wrote:
> I don't like the misnamed supported flag.  Either we should always ignore
> the errors, or key it off a specific flag for newer firmware.
> 
Well, there is no such hardware flag; there are several features
implemented across the board. Hence we need to store it in the driver
structure, which actually is the reason for the 'supported' flag.

I can rename it to 'legacy_board' or the like if 'supported' is not an
acceptable naming ...

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH 2/6] hpsa: add support for legacy boards

2017-08-09 Thread Hannes Reinecke

On 08/09/2017 03:45 PM, Christoph Hellwig wrote:
>> -static int hpsa_lookup_board_id(struct pci_dev *pdev, u32 *board_id)
>> +static int hpsa_lookup_board_id(struct pci_dev *pdev, u32 *board_id,
>> +bool *supported)
>>  {
>>  int i;
>>  u32 subsystem_vendor_id, subsystem_device_id;
>> @@ -7242,9 +7266,22 @@ static int hpsa_lookup_board_id(struct pci_dev *pdev, 
>> u32 *board_id)
>>  *board_id = ((subsystem_device_id << 16) & 0x) |
>>  subsystem_vendor_id;
>>  
>> +if (supported)
>> +*supported = true;
>>  for (i = 0; i < ARRAY_SIZE(products); i++)
>> -if (*board_id == products[i].board_id)
>> -return i;
>> +if (*board_id == products[i].board_id) {
>> +if (products[i].access != _access &&
>> +products[i].access != _access)
>> +return i;
>> +if (hpsa_allow_any) {
>> +dev_warn(>dev,
>> + "unsupported board ID: 0x%08x\n",
>> + *board_id);
>> +if (supported)
>> +*supported = false;
>> +return i;
>> +}
>> +}
> 
> Can you explain the point of the supported flag?
> 
'hpsa_allow_any' just enables the _driver_ to bind to older /
unsupported boards, it doesn't tell us if this particular instance is an
old board.
For older boards we should suppress messages from not-implemented
features, whereas on 'real' boards those features should be implemented,
and we should be throwing an error or a warning.
Hence the =supported" flag.

>> +unsigned long register_value  =
>> +readl(h->vaddr + SA5_INTR_STATUS);
>> +return (register_value & SA5B_INTR_PENDING);
> 
> This should be condensed into:
> 
>   return readl(h->vaddr + SA5_INTR_STATUS) & SA5B_INTR_PENDING;
> 
Yeah; but this is _actually_ just copied over, so other lines will be
affected here, too.
Will be adding a patch for that.

>>  .command_completed = SA5_completed,
>>  };
>>  
>> +/* Duplicate entry of the above to mark unsupported boards */
>> +static struct access_method SA5A_access = {
>> +.submit_command = SA5_submit_command,
>> +.set_intr_mask = SA5_intr_mask,
>> +.intr_pending = SA5_intr_pending,
>> +.command_completed = SA5_completed,
>> +};
>> +
>> +static struct access_method SA5B_access = {
>> +.submit_command = SA5_submit_command,
>> +.set_intr_mask = SA5B_intr_mask,
>> +.intr_pending = SA5B_intr_pending,
>> +.command_completed = SA5_completed,
>> +};
> 
> Please align the fields nicely, e.g.:
> 
> static struct access_method SA5A_access = {
>   .submit_command = SA5_submit_command,
>   .set_intr_mask  = SA5_intr_mask,
>   ...
> 
Okay.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH 0/4] mylex: Replace DAC960 block driver

2017-08-09 Thread Christoph Hellwig

On Wed, Aug 02, 2017 at 04:13:24PM +0200, Hannes Reinecke wrote:
> Hi all,
> 
> as we're trying to get rid of the remaining request_fn drivers here's
> a patchset to move the DAC960 driver to the SCSI stack.
> The new driver is called 'mylex'.
> 
> The Mylex/DAC960 HBA comes in two flavours; the later one (V2) already
> has a pretty complete SCSI emulation layer, so we just have to reformat
> the command.
> For for earlier ones (V1) we don't have a SCSI emulation layer for the
> logical drives, so I've added a (pretty rudimentary, admittedly) SCSI
> translation for them.
> And the weird proc interface has been converted to sysfs attributes.

It seems like V1 and V2 basically don't share any code at all, so it seems
like they should be separate drivers.  And for V1 it seems like it might
even be better off as a blk-mq driver - which could then reuse the DAC960
name and device nodes?

For the V2 firmware your approach looks great, though.

[RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-09 Thread Benjamin Block

In contrast to the normal SCSI-lib, the BSG block-queue doesn't make use of
any extra init_rq_fn() to make additional allocations during
request-creation, and the request sense-pointer is not used to transport
SCSI sense data, but is used as backing for the bsg_job->reply pointer;
that in turn is used in the LLDs to store protocol IUs or similar stuff.
This 're-purposing' of the sense-pointer is done in the BSG blk-lib
(bsg_create_job()), during the queue-processing.

Failing to allocate/assign it results in illegal dereferences because LLDs
use this pointer unquestioned, as can be seen in the various
BSG-implementations:

drivers/scsi/libfc/fc_lport.c:  fc_lport_bsg_request()
drivers/scsi/qla2xxx/qla_bsg.c: qla24xx_bsg_request()
drivers/scsi/qla4xxx/ql4_bsg.c: qla4xxx_process_vendor_specific()
drivers/s390/scsi/zfcp_fc.c:zfcp_fc_ct_els_job_handler()
...

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

To prevent this, allocate a buffer when the BSG blk-request is setup, and
before it is queued for LLD processing.

Reported-by: Steffen Maier 
Signed-off-by: Benjamin Block 
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc:  #4.11+
---
 block/bsg.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 37663b664666..285b1b8126c3 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -74,6 +74,8 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 /*
  * our internal command type
  */
@@ -85,6 +87,7 @@ struct bsg_command {
struct bio *bidi_bio;
int err;
struct sg_io_v4 hdr;
+   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE];
 };
 
 static void bsg_free_command(struct bsg_command *bc)
@@ -137,7 +140,7 @@ static inline struct hlist_head *bsg_dev_idx_hash(int index)
 
 static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
struct sg_io_v4 *hdr, struct bsg_device *bd,
-   fmode_t has_write_perm)
+   u8 *reply_buffer, fmode_t has_write_perm)
 {
struct

[RFC PATCH 0/6] bsg: fix regression resulting in panics when sending commands via BSG and some sanity cleanups

2017-08-09 Thread Benjamin Block

Hello all,

Steffen noticed recently that we have a regression in the BSG code that
prevents us from sending any traffic over this interface. After I
researched this a bit, it turned out that this affects not only zFCP, but
likely all LLDs that implements the BSG API. This was introduced in 4.11
(details in Patch 1 of this series).

I imagine the regression happened because of some very "unfortunate"
variable- namings. I can not fix them, as they involve the base struct
request, but I tried to add some cleanups that should make the
relationships between stuff more visible in the future I hope.

Patch 1   - Regression Fix; Also tagged for stable
Patch 2-6 - Cleanups

I tagged this as RFC. Patches 2-6 are a 'nice to have' IMO, Patch 1 is
obviously necessary, and if it is OK, I can re-send it separately if
necessary. If you don't like the changes in the other patches, I don't mind
dropping them.

I am not sure about Patch 4. It certainly works, but it changes
user-visible behavior, in what I believe is within the behavior described
by the SG interface. It makes the different methods of how BSG passes
commands down to the LLDs more conform with each other - even though I
can't make them the exact same. More details in the patch description.

I rebased the series on Jens' for-next and I have function-tested the
series on s390x's zFCP with the tools provided in the zfcp HBA API library
(https://www.ibm.com/developerworks/linux/linux390/zfcp-hbaapi.html) and
some custom code to test the read/write interface of BSG.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (6):
  bsg: fix kernel panic resulting from missing allocation of a
reply-buffer
  bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE
  bsg: scsi-transport: add compile-tests to prevent reply-buffer
overflows
  bsg: refactor ioctl to use regular BSG-command infrastructure for
SG_IO
  bsg: reduce unecessary arguments for bsg_map_hdr()
  bsg: reduce unecessary arguments for blk_complete_sgv4_hdr_rq()

 block/bsg-lib.c |  4 +-
 block/bsg.c | 90 ++---
 drivers/scsi/scsi_transport_fc.c|  3 ++
 drivers/scsi/scsi_transport_iscsi.c |  3 ++
 include/linux/bsg-lib.h |  2 +
 5 files changed, 65 insertions(+), 37 deletions(-)

--
2.12.2

[RFC PATCH 3/6] bsg: scsi-transport: add compile-tests to prevent reply-buffer overflows

2017-08-09 Thread Benjamin Block

The BSG implementations use the bsg_job's reply buffer as storage for their
own custom reply structures (e.g.: struct fc_bsg_reply or
struct iscsi_bsg_reply). The size of bsg_job's reply buffer and those of
the implementations is not dependent in any way the compiler can currently
check.

To make it easier to notice accidental violations add an explicit compile-
time check that tests whether the implementations' reply buffer is at most
as large as bsg_job's.

To do so, we have to move the size-define from bsg.c to a common header.

Signed-off-by: Benjamin Block 
---
 block/bsg.c | 3 +--
 drivers/scsi/scsi_transport_fc.c| 3 +++
 drivers/scsi/scsi_transport_iscsi.c | 3 +++
 include/linux/bsg-lib.h | 2 ++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 285b1b8126c3..b924f1c23c58 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -74,8 +75,6 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
-#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
-
 /*
  * our internal command type
  */
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 892fbd9800d9..ce6654b5d329 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3736,6 +3736,9 @@ static int fc_bsg_dispatch(struct bsg_job *job)
 {
struct Scsi_Host *shost = fc_bsg_to_shost(job);
 
+   BUILD_BUG_ON(sizeof(struct fc_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
if (scsi_is_fc_rport(job->dev))
return fc_bsg_rport_dispatch(shost, job);
else
diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index a424eaeafeb0..4e021c949ad7 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1483,6 +1483,9 @@ static int iscsi_bsg_host_dispatch(struct bsg_job *job)
int cmdlen = sizeof(uint32_t);  /* start with length of msgcode */
int ret;
 
+   BUILD_BUG_ON(sizeof(struct iscsi_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
/* check if we have the msgcode value at least */
if (job->request_len < sizeof(uint32_t)) {
ret = -ENOMSG;
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index e34dde2da0ef..85d7c7678cc6 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -25,6 +25,8 @@
 
 #include 
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 struct request;
 struct device;
 struct scatterlist;
-- 
2.12.2

[RFC PATCH 6/6] bsg: reduce unnecessary arguments for blk_complete_sgv4_hdr_rq()

2017-08-09 Thread Benjamin Block

Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block 
---
 block/bsg.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 6ee2ffca808a..09f767cdf816 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -399,13 +399,14 @@ static struct bsg_command *bsg_get_done_cmd(struct 
bsg_device *bd)
return bc;
 }
 
-static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
-   struct bio *bio, struct bio *bidi_bio)
+static int blk_complete_sgv4_hdr_rq(struct bsg_command *bc)
 {
+   struct sg_io_v4 *hdr = >hdr;
+   struct request *rq = bc->rq;
struct scsi_request *req = scsi_req(rq);
int ret = 0;
 
-   dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
+   dprintk("rq %p bio %p 0x%x\n", rq, bc->bio, req->result);
/*
 * fill in all the output members
 */
@@ -432,7 +433,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (rq->next_rq) {
hdr->dout_resid = req->resid_len;
hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
-   blk_rq_unmap_user(bidi_bio);
+   blk_rq_unmap_user(bc->bidi_bio);
blk_put_request(rq->next_rq);
} else if (rq_data_dir(rq) == READ)
hdr->din_resid = req->resid_len;
@@ -448,7 +449,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (!ret && req->result < 0)
ret = req->result;
 
-   blk_rq_unmap_user(bio);
+   blk_rq_unmap_user(bc->bio);
scsi_req_free_cmd(req);
blk_put_request(rq);
 
@@ -507,8 +508,7 @@ static int bsg_complete_all_commands(struct bsg_device *bd)
if (IS_ERR(bc))
break;
 
-   tret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-   bc->bidi_bio);
+   tret = blk_complete_sgv4_hdr_rq(bc);
if (!ret)
ret = tret;
 
@@ -542,8 +542,7 @@ __bsg_read(char __user *buf, size_t count, struct 
bsg_device *bd,
 * after completing the request. so do that here,
 * bsg_complete_work() cannot do that for us
 */
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(buf, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
@@ -944,8 +943,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(uarg, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
-- 
2.12.2

[RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-09 Thread Benjamin Block

Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block 
---
 block/bsg.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 8517361a9b3f..6ee2ffca808a 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
  * map sg_io_v4 to a request.
  */
 static struct request *
-bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
has_write_perm,
-   u8 *reply_buffer)
+bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
+   fmode_t has_write_perm)
 {
struct request_queue *q = bd->queue;
struct request *rq, *next_rq = NULL;
+   struct sg_io_v4 *hdr = >hdr;
int ret;
unsigned int op, dxfer_len;
void __user *dxferp = NULL;
@@ -244,7 +245,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, 
fmode_t has_write_perm,
if (IS_ERR(rq))
return rq;
 
-   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, reply_buffer,
+   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, bc->reply_buffer,
   has_write_perm);
if (ret)
goto out;
@@ -633,8 +634,7 @@ static int __bsg_write(struct bsg_device *bd, const char 
__user *buf,
/*
 * get a request, fill in the blanks, and add to request queue
 */
-   rq = bsg_map_hdr(bd, >hdr, has_write_perm,
-bc->reply_buffer);
+   rq = bsg_map_hdr(bd, bc, has_write_perm);
if (IS_ERR(rq)) {
ret = PTR_ERR(rq);
rq = NULL;
@@ -934,8 +934,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
goto sg_io_out;
}
 
-   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
-bc->reply_buffer);
+   bc->rq = bsg_map_hdr(bd, bc, file->f_mode & FMODE_WRITE);
if (IS_ERR(bc->rq)) {
ret = PTR_ERR(bc->rq);
goto sg_io_out;
-- 
2.12.2

[RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-09 Thread Benjamin Block

Before, the SG_IO ioctl for BSG devices used to use its own on-stack data
to assemble and send the specified command. The read and write calls use
their own infrastructure build around the struct bsg_command and a custom
slab-pool for that.

Rafactor this, so that SG_IO ioctl also uses struct bsg_command and the
surrounding infrastructure. This way we use global defines like
BSG_COMMAND_REPLY_BUFFERSIZE only in one place, rather than two, the
handling of BSG commands gets more consistent, and it reduces some code-
duplications (the bio-pointer handling). It also reduces the stack
footprint by 320 to 384 bytes (depending on how large pointers are), and
uses the active slab-implementation for efficient alloc/free.

There are two other side-effects:
 - the 'duration' field in the sg header is now also filled for SG_IO
   calls, unlike before were it was always zero.
 - the BSG device queue-limit is also applied to SG_IO, unlike before were
   you could flood one BSG device with as many commands as you'd like. If
   one can trust older SG documentation this limit is applicable to either
   normal writes, or SG_IO calls; but this wasn't enforced before for
   SG_IO.

A complete unification is not possible, as it then would also enqueue SG_IO
commands in the BGS devices's command list, but this is only for the read-
and write-calls.

Signed-off-by: Benjamin Block 
---
 block/bsg.c | 60 
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index b924f1c23c58..8517361a9b3f 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -320,6 +320,17 @@ static void bsg_rq_end_io(struct request *rq, blk_status_t 
status)
wake_up(>wq_done);
 }
 
+static int bsg_prep_add_command(struct bsg_command *bc, struct request *rq)
+{
+   bc->rq = rq;
+   bc->bio = rq->bio;
+   if (rq->next_rq)
+   bc->bidi_bio = rq->next_rq->bio;
+   bc->hdr.duration = jiffies;
+
+   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
+}
+
 /*
  * do final setup of a 'bc' and submit the matching 'rq' to the block
  * layer for io
@@ -327,16 +338,11 @@ static void bsg_rq_end_io(struct request *rq, 
blk_status_t status)
 static void bsg_add_command(struct bsg_device *bd, struct request_queue *q,
struct bsg_command *bc, struct request *rq)
 {
-   int at_head = (0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL));
+   int at_head = bsg_prep_add_command(bc, rq);
 
/*
 * add bc command to busy queue and submit rq for io
 */
-   bc->rq = rq;
-   bc->bio = rq->bio;
-   if (rq->next_rq)
-   bc->bidi_bio = rq->next_rq->bio;
-   bc->hdr.duration = jiffies;
spin_lock_irq(>lock);
list_add_tail(>list, >busy_list);
spin_unlock_irq(>lock);
@@ -916,31 +922,37 @@ static long bsg_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg)
return scsi_cmd_ioctl(bd->queue, NULL, file->f_mode, cmd, uarg);
}
case SG_IO: {
-   struct request *rq;
-   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE] = { 0, };
-   struct bio *bio, *bidi_bio = NULL;
-   struct sg_io_v4 hdr;
+   struct bsg_command *bc;
int at_head;
 
-   if (copy_from_user(, uarg, sizeof(hdr)))
-   return -EFAULT;
+   bc = bsg_alloc_command(bd);
+   if (IS_ERR(bc))
+   return PTR_ERR(bc);
 
-   rq = bsg_map_hdr(bd, , file->f_mode & FMODE_WRITE,
-reply_buffer);
-   if (IS_ERR(rq))
-   return PTR_ERR(rq);
+   if (copy_from_user(>hdr, uarg, sizeof(bc->hdr))) {
+   ret = -EFAULT;
+   goto sg_io_out;
+   }
 
-   bio = rq->bio;
-   if (rq->next_rq)
-   bidi_bio = rq->next_rq->bio;
+   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
+bc->reply_buffer);
+   if (IS_ERR(bc->rq)) {
+   ret = PTR_ERR(bc->rq);
+   goto sg_io_out;
+   }
 
-   at_head = (0 == (hdr.flags & BSG_FLAG_Q_AT_TAIL));
-   blk_execute_rq(bd->queue, NULL, rq, at_head);
-   ret = blk_complete_sgv4_hdr_rq(rq, , bio, bidi_bio);
+   at_head = bsg_prep_add_command(bc, bc->rq);
+   blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
+   bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   if (copy_to_user(uarg, , sizeof(hdr)))
-   return -EFAULT;
+   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
+  bc->bidi_bio);
 
+   if

[RFC PATCH 2/6] bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE

2017-08-09 Thread Benjamin Block

We do set rq->sense_len when we assigne the reply-buffer in
blk_fill_sgv4_hdr_rq(). No point in possibly deviating from this value
later on.

bsg-lib.h specifies:
unsigned int reply_len;
/*
 * On entry : reply_len indicates the buffer size allocated for
 * the reply.
 *
 * ...
 */

Signed-off-by: Benjamin Block 
---
 block/bsg-lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..c7c2c6bbb5ae 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -147,8 +147,8 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
job->request = rq->cmd;
job->request_len = rq->cmd_len;
job->reply = rq->sense;
-   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
-* allocated */
+   job->reply_len = rq->sense_len;
+
if (req->bio) {
ret = bsg_map_buffer(>request_payload, req);
if (ret)
-- 
2.12.2

Re: [PATCH 3/4] scsi: add Mylex RAID controller

2017-08-09 Thread Christoph Hellwig

On Mon, Aug 07, 2017 at 08:09:11AM +0200, Hannes Reinecke wrote:
> On 08/05/2017 01:39 PM, Christoph Hellwig wrote:
> > Can you use normal linux style for the code instead of copy and
> > pasting the weird naming and capitalization from the DAC960 driver?
> > 
> Yes; already planned for v2. But first wanted to get some general
> feedback (like: is anyone interested in that at all?)

Yes, please go ahead and kill off the DAC960 driver.

> 
> > Also please use the driver name as prefix for the functions.
> > 
> Ok.
> 
> > Maybe myraid instead of mylex?
> > 
> Nah; I'd rather stick with mylex.
> (Especially as it says 'Mylex' in big fat letters on the board :-)

Naming drivers after the vendor only is usually a bad idea as vendors
have/had multiple products.  In this case we have plenty other drivers
that support Mylex products.

Re: [PATCH] megaraid_sas: move command counter to correct place

2017-08-09 Thread Tomas Henzl

On 8.8.2017 09:37, Sumit Saxena wrote:
>> -Original Message-
>> From: Martin K. Petersen [mailto:martin.peter...@oracle.com]
>> Sent: Monday, August 07, 2017 11:07 PM
>> To: Tomas Henzl
>> Cc: linux-scsi@vger.kernel.org; sumit.sax...@broadcom.com;
>> kashyap.de...@broadcom.com
>> Subject: Re: [PATCH] megaraid_sas: move command counter to correct place
>>
>>
>> Tomas,
>>
>>> the eh reset function returns success when fw_outstanding equals zero,
>>> that means that the counter shouldn't be decremented when the driver
>>> still owns the command
>> Applied to 4.13/scsi-fixes. Thank you!
> Just realized that this patch may cause performance regression.
> With this patch below scenario may occur-
>
> -Consider outstanding IOs reaches to controller's Queue depth.
> -Driver frees up command and complete it back to SML.
> -Since host_busy is decremented, SML can issue one new  IO to driver.
> -By the time, if "fw_outstanding" is not decremented by driver, then
> driver will return newly submitted IO back to SML with return status
> SCSI_MLQUEUE_HOST_BUSY because of below code
>   in megaraid_sas driver's IO submission path-
>
> if (atomic_inc_return(>fw_outstanding) >
> instance->host->can_queue) {
> atomic_dec(>fw_outstanding);
> return SCSI_MLQUEUE_HOST_BUSY;
> }
>
> This situation will be more evident when RAID1 fastpath IOs are running as
> in that case driver will be issuing two IOs to firmware for single IO
> issued from SML.
> Above situation can be avoided, if this patch is removed.
>
> Regarding Tomas' concern, there should not be any problem as driver calls
> "synchronize_irq" before checking "fw_outstanding". Once fw_outstanding is
> decremented and
> driver frees up command, then only driver will be checking
> "fw_outstanding" equals to zero or not so all this will always fall in a
> sequence and will not
> cause the problem stated by Tomas.

I haven't expected this to fix a real issue in latest upstream code,
just wanted to follow the correct ordering.
If it creates a performance issue, reverting the patch is fine for me.

>
> I am sorry for confusion and would request to revert this patch.
>
> Thanks,
> Sumit
>
>> --
>> Martin K. Petersen   Oracle Linux Engineering

Re: [PATCH 6/6] hpsa: handle unsupported devices more gracefully

2017-08-09 Thread Christoph Hellwig

On Tue, Aug 08, 2017 at 10:35:15AM +0200, Hannes Reinecke wrote:
> From: Jeff Mahoney 
> 
> Add a warning message if an unsupported device is found and the
> hpsa_allow_any parameter is not set.
> Also make the hpsa_allow_any parameter writeable once the hpsa
> driver is loaded.

This looks ok, but maybe we should also plan for just setting the
flag by defauly sooner or later?

> +static struct kernel_param_ops hpsa_allow_any_ops = {

const?

> +
> + if (hpsa_allow_any && !newval) {
> + if (hpsa_claimed_unsupported) {
> + pr_info("hpsa: can't disable hpsa_allow_any parameter. 
> Devices already in use.\n");
> + return -EPERM;
> + } else
> + hpsa_allow_any = false;
> + } else if (!hpsa_allow_any && newval) {
> + pr_info("hpsa: allowing unsupported devices. If devices are 
> claimed, this will result in an unsupported environment.\n");
> + hpsa_allow_any = true;
> + }
> + return 0;

Do we really need this to start with?  It's normal that module
parameters only affect newly probed devices when changed at runtime,
so I think we could just stick to the simple module parameter with
permissions that allow runtime changes.

Re: [PATCH 5/6] hpsa: do not print errors for unsupported report luns format

2017-08-09 Thread Christoph Hellwig

I don't like the misnamed supported flag.  Either we should always ignore
the errors, or key it off a specific flag for newer firmware.

Re: [PATCH 4/6] hpsa: Ignore errors for unsupported LV_DEVICE_ID VPD page

2017-08-09 Thread Christoph Hellwig

Looks fine,

Reviewed-by: Christoph Hellwig

Re: [PATCH 2/6] hpsa: add support for legacy boards

2017-08-09 Thread Christoph Hellwig

> -static int hpsa_lookup_board_id(struct pci_dev *pdev, u32 *board_id)
> +static int hpsa_lookup_board_id(struct pci_dev *pdev, u32 *board_id,
> + bool *supported)
>  {
>   int i;
>   u32 subsystem_vendor_id, subsystem_device_id;
> @@ -7242,9 +7266,22 @@ static int hpsa_lookup_board_id(struct pci_dev *pdev, 
> u32 *board_id)
>   *board_id = ((subsystem_device_id << 16) & 0x) |
>   subsystem_vendor_id;
>  
> + if (supported)
> + *supported = true;
>   for (i = 0; i < ARRAY_SIZE(products); i++)
> - if (*board_id == products[i].board_id)
> - return i;
> + if (*board_id == products[i].board_id) {
> + if (products[i].access != _access &&
> + products[i].access != _access)
> + return i;
> + if (hpsa_allow_any) {
> + dev_warn(>dev,
> +  "unsupported board ID: 0x%08x\n",
> +  *board_id);
> + if (supported)
> + *supported = false;
> + return i;
> + }
> + }

Can you explain the point of the supported flag?

> + unsigned long register_value  =
> + readl(h->vaddr + SA5_INTR_STATUS);
> + return (register_value & SA5B_INTR_PENDING);

This should be condensed into:

return readl(h->vaddr + SA5_INTR_STATUS) & SA5B_INTR_PENDING;

>   .command_completed = SA5_completed,
>  };
>  
> +/* Duplicate entry of the above to mark unsupported boards */
> +static struct access_method SA5A_access = {
> + .submit_command = SA5_submit_command,
> + .set_intr_mask = SA5_intr_mask,
> + .intr_pending = SA5_intr_pending,
> + .command_completed = SA5_completed,
> +};
> +
> +static struct access_method SA5B_access = {
> + .submit_command = SA5_submit_command,
> + .set_intr_mask = SA5B_intr_mask,
> + .intr_pending = SA5B_intr_pending,
> + .command_completed = SA5_completed,
> +};

Please align the fields nicely, e.g.:

static struct access_method SA5A_access = {
.submit_command = SA5_submit_command,
.set_intr_mask  = SA5_intr_mask,
...

Re: [PATCH 3/6] hpsa: disable volume status check for older controller

2017-08-09 Thread Christoph Hellwig

Why that weird supported flag check?

(which also means I need to got back to the previous patch and understand
that part..)

Re: [PATCH 2/6] hpsa: add support for legacy boards

2017-08-09 Thread Christoph Hellwig

On Tue, Aug 08, 2017 at 10:35:11AM +0200, Hannes Reinecke wrote:
> Add support for legacy boards, ensuring to enable the driver for
> those boards only when 'hpsa_allow_any' is set.

Why the wildcard instead of the specific IDs from the cciss driver?

Re: [PATCH 1/6] hpsa: consolidate status variables

2017-08-09 Thread Christoph Hellwig

These seem to be modified from sysfs files, so you'll run into
read-modify-write atomicy issues with bitfields.

That being said while plain ints are somewhat safe the right thing
would be to use proper atomic bitops.

[PATCH] scsi: make 'state' device attribute pollable

2017-08-09 Thread Hannes Reinecke

While the 'state' attribute can (and will) change occasionally,
calling 'poll()' or 'select()' on it fails as sysfs is never
notified that the state has changed.
With this patch calling 'poll()' or 'select()' will work
properly.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_lib.c   | 7 +--
 drivers/scsi/scsi_transport_srp.c | 5 -
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 41c19c7..2101cfd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2654,6 +2654,7 @@ void scsi_exit_queue(void)
 
}
sdev->sdev_state = state;
+   sysfs_notify(>sdev_gendev.kobj, NULL, "state");
return 0;
 
  illegal:
@@ -3074,14 +3075,16 @@ int scsi_internal_device_unblock_nowait(struct 
scsi_device *sdev,
 * offlined states and goose the device queue if successful.
 */
if ((sdev->sdev_state == SDEV_BLOCK) ||
-   (sdev->sdev_state == SDEV_TRANSPORT_OFFLINE))
+   (sdev->sdev_state == SDEV_TRANSPORT_OFFLINE)) {
sdev->sdev_state = new_state;
-   else if (sdev->sdev_state == SDEV_CREATED_BLOCK) {
+   sysfs_notify(>sdev_gendev.kobj, NULL, "state");
+   } else if (sdev->sdev_state == SDEV_CREATED_BLOCK) {
if (new_state == SDEV_TRANSPORT_OFFLINE ||
new_state == SDEV_OFFLINE)
sdev->sdev_state = new_state;
else
sdev->sdev_state = SDEV_CREATED;
+   sysfs_notify(>sdev_gendev.kobj, NULL, "state");
} else if (sdev->sdev_state != SDEV_CANCEL &&
 sdev->sdev_state != SDEV_OFFLINE)
return -EINVAL;
diff --git a/drivers/scsi/scsi_transport_srp.c 
b/drivers/scsi/scsi_transport_srp.c
index f617021..698cc46 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -556,8 +556,11 @@ int srp_reconnect_rport(struct srp_rport *rport)
 */
shost_for_each_device(sdev, shost) {
mutex_lock(>state_mutex);
-   if (sdev->sdev_state == SDEV_OFFLINE)
+   if (sdev->sdev_state == SDEV_OFFLINE) {
sdev->sdev_state = SDEV_RUNNING;
+   sysfs_notify(>sdev_gendev.kobj,
+NULL, "state");
+   }
mutex_unlock(>state_mutex);
}
} else if (rport->state == SRP_RPORT_RUNNING) {
-- 
1.8.5.6

Re: [PATCH v3 03/13] mpt3sas: SGL to PRP Translation for I/Os to NVMe devices

2017-08-09 Thread Suganath Prabu Subramani

Hi Martin,
This code was added to detect holes, when we started testing with 4.9
kernel. when we disabled "use_blk_mq" and no merges, we are hitting
issues with holes. Anyhow In latest upstream, it got fixed with this
commit 5a8d75a1b8c99bdc926ba69b7b7dbe4fae81a5af

So we are removing the code related to hole detection .

Thanks,
Suganath Prabu S



On Tue, Aug 8, 2017 at 9:42 PM, Martin K. Petersen
 wrote:
>
> Suganath,
>
>> + /*
>> +  ** Below code detects gaps/holes in IO data buffers.
>> +  ** What does holes/gaps mean?
>> +  ** Any SGE except first one in a SGL starts at non NVME page size
>> +  ** aligned address OR Any SGE except last one in a SGL ends at
>> +  ** non NVME page size boundary.
>> +  **
>> +  ** Driver has already informed block layer by setting boundary rules
>> +  ** for bio merging done at NVME page size boundary calling kernel API
>> +  ** blk_queue_virt_boundary inside slave_config.
>> +  ** Still there is possibility of IO coming with holes to driver 
>> because
>> +  ** of IO merging done by IO scheduler.
>
> All this SGL to PRP code needs to go.
>
> If you are seeing anything that's not a valid PRP after setting the
> queue virt boundary then there's a block layer bug that needs to be
> debugged and fixed. Regardless of whether you are using an I/O scheduler
> or not.
>
> --
> Martin K. Petersen  Oracle Linux Engineering

[no subject]

2017-08-09 Thread системы администратор

внимания;

Ваши сообщения превысил лимит памяти, который составляет 5 Гб, определенных 
администратором, который в настоящее время работает на 10.9GB, Вы не сможете 
отправить или получить новую почту, пока вы повторно не проверить ваш почтовый 
ящик почты. Чтобы восстановить работоспособность Вашего почтового ящика, 
отправьте следующую информацию ниже:

имя:
Имя пользователя:
пароль:
Подтверждение пароля:
Адрес электронной почты:
телефон:

Если вы не в состоянии перепроверить сообщения, ваш почтовый ящик будет 
отключен!

Приносим извинения за неудобства.
Проверочный код: EN: Ru...9o76ypp2345t..2017
Почты технической поддержки ©2017

спасибо
системы администратор

[PATCH 1/2] tests/scsi: add SCSI midlayer test group

2017-08-09 Thread Hannes Reinecke

Add a test group for tests of the SCSI midlayer.

Signed-off-by: Hannes Reinecke 
---
 tests/scsi/group | 26 ++
 1 file changed, 26 insertions(+)
 create mode 100644 tests/scsi/group

diff --git a/tests/scsi/group b/tests/scsi/group
new file mode 100644
index 000..73459a6
--- /dev/null
+++ b/tests/scsi/group
@@ -0,0 +1,26 @@
+#!/bin/bash
+#
+# SCSI regression tests
+#
+# Copyright (C) 2017 Hannes Reinecke, SUSE Linux GmbH
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+#
+# Usually, group_requires() just needs to check that any necessary programs and
+# kernel features are available using the _have_foo helpers. If
+# group_requires() returns non-zero, all tests in this group will be skipped.
+group_requires() {
+   _have_root
+}
-- 
1.8.5.6

[PATCH 2/2] tests/scsi/0001: Regression test for SCSI device blacklisting

2017-08-09 Thread Hannes Reinecke

SCSI device blacklisting seems to be a tricky subject, with
lots of potential for messing up the selection algorithm.
This adds a test for catching regressions here.

Signed-off-by: Hannes Reinecke 
---
 tests/scsi/001 | 69 ++
 tests/scsi/001.out | 10 
 2 files changed, 79 insertions(+)
 create mode 100644 tests/scsi/001
 create mode 100644 tests/scsi/001.out

diff --git a/tests/scsi/001 b/tests/scsi/001
new file mode 100644
index 000..374a458
--- /dev/null
+++ b/tests/scsi/001
@@ -0,0 +1,69 @@
+#!/bin/bash
+#
+# Regression test for scsi device blacklisting
+#
+# Copyright (C) 2017 Hannes Reinecke, SUSE Linux GmbH
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+DESCRIPTION="SCSI device blacklisting"
+
+QUICK=1
+
+CHECK_DMESG=0
+
+requires() {
+if modinfo scsi_debug | grep -q inq_vendor ; then
+   return 0
+fi
+return 1
+}
+
+test() {
+local inq vendor model host dev blacklist
+
+echo "Running ${TEST_NAME}"
+
+for inq in \
+   "" \
+   "" \
+   "HITACHI OPEN-V  " \
+   "Scanner " \
+   "Inateck " \
+   "Promise STEX" \
+   "HITAOPEN-V  " \
+   "ABCDScanner " ; do
+   vendor=${inq:0:8}
+   model=${inq:8:16}
+   modprobe scsi_debug inq_vendor="$vendor" inq_product="$model"
+   host=$(lsscsi -H | sed -n 's/.\([0-9]*\).*scsi_debug/\1/p')
+   if [ -z "$host" ] ; then
+   echo "Test failed, scsi_debug could not be loaded"
+   return 1
+   fi
+   dev=$(lsscsi | grep $host | sed -n 's/.*\/dev\/\(sd[a-z]*\).*/\1/p')
+   if [ -z "$dev" ] ; then
+   echo "Test failed, SCSI device not found"
+   rmmod scsi_debug
+   return 1
+   fi
+   vendor=$(cat /sys/block/$dev/device/vendor)
+   model=$(cat /sys/block/$dev/device/model)
+   blacklist=$(cat /sys/block/$dev/device/blacklist)
+   echo "$vendor $model $blacklist"
+   rmmod scsi_debug
+done
+echo "Test complete"
+return 0
+}
diff --git a/tests/scsi/001.out b/tests/scsi/001.out
new file mode 100644
index 000..64db97c
--- /dev/null
+++ b/tests/scsi/001.out
@@ -0,0 +1,10 @@
+Running scsi/001
+  0x0
+  0x0
+HITACHI  OPEN-V   0x2
+ Scanner  0x1
+Inateck   0x0
+Promise  STEX 0x40
+HITA OPEN-V   0x0
+ABCD Scanner  0x0
+Test complete
-- 
1.8.5.6

[PATCHv2 3/4] scsi: whitespace fixes in scsi_devinfo.c

2017-08-09 Thread Hannes Reinecke

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_devinfo.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 28fea83..776c701 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -304,8 +304,8 @@ static void scsi_strcpy_devinfo(char *name, char *to, 
size_t to_length,
 */
to[from_length] = '\0';
} else {
-   /* 
-* space pad the string if it is short. 
+   /*
+* space pad the string if it is short.
 */
strncpy([from_length], spaces,
to_length - from_length);
@@ -325,10 +325,10 @@ static void scsi_strcpy_devinfo(char *name, char *to, 
size_t to_length,
  * @flags: if strflags NULL, use this flag value
  *
  * Description:
- * Create and add one dev_info entry for @vendor, @model, @strflags or
- * @flag. If @compatible, add to the tail of the list, do not space
- * pad, and set devinfo->compatible. The scsi_static_device_list entries
- * are added with @compatible 1 and @clfags NULL.
+ * Create and add one dev_info entry for @vendor, @model, @strflags or
+ * @flag. If @compatible, add to the tail of the list, do not space
+ * pad, and set devinfo->compatible. The scsi_static_device_list entries
+ * are added with @compatible 1 and @clfags NULL.
  *
  * Returns: 0 OK, -error on failure.
  **/
@@ -350,11 +350,11 @@ static int scsi_dev_info_list_add(int compatible, char 
*vendor, char *model,
  * @key:   specify list to use
  *
  * Description:
- * Create and add one dev_info entry for @vendor, @model,
- * @strflags or @flag in list specified by @key. If @compatible,
- * add to the tail of the list, do not space pad, and set
- * devinfo->compatible. The scsi_static_device_list entries are
- * added with @compatible 1 and @clfags NULL.
+ * Create and add one dev_info entry for @vendor, @model,
+ * @strflags or @flag in list specified by @key. If @compatible,
+ * add to the tail of the list, do not space pad, and set
+ * devinfo->compatible. The scsi_static_device_list entries are
+ * added with @compatible 1 and @clfags NULL.
  *
  * Returns: 0 OK, -error on failure.
  **/
@@ -405,7 +405,7 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
  *
  * Description:
  * Finds the first dev_info entry matching @vendor, @model
- * in list specified by @key.
+ * in list specified by @key.
  *
  * Returns: pointer to matching entry, or ERR_PTR on failure.
  **/
@@ -508,10 +508,10 @@ int scsi_dev_info_list_del_keyed(char *vendor, char 
*model, int key)
  * @dev_list:  string of device flags to add
  *
  * Description:
- * Parse dev_list, and add entries to the scsi_dev_info_list.
- * dev_list is of the form "vendor:product:flag,vendor:product:flag".
- * dev_list is modified via strsep. Can be called for command line
- * addition, for proc or mabye a sysfs interface.
+ * Parse dev_list, and add entries to the scsi_dev_info_list.
+ * dev_list is of the form "vendor:product:flag,vendor:product:flag".
+ * dev_list is modified via strsep. Can be called for command line
+ * addition, for proc or mabye a sysfs interface.
  *
  * Returns: 0 if OK, -error on failure.
  **/
@@ -701,7 +701,7 @@ static int proc_scsi_devinfo_open(struct inode *inode, 
struct file *file)
return seq_open(file, _devinfo_seq_ops);
 }
 
-/* 
+/*
  * proc_scsi_dev_info_write - allow additions to scsi_dev_info_list via /proc.
  *
  * Description: Adds a black/white list entry for vendor and model with an
@@ -840,8 +840,8 @@ int scsi_dev_info_remove_list(int key)
  * scsi_init_devinfo - set up the dynamic device list.
  *
  * Description:
- * Add command line entries from scsi_dev_flags, then add
- * scsi_static_device_list entries to the scsi device info list.
+ * Add command line entries from scsi_dev_flags, then add
+ * scsi_static_device_list entries to the scsi device info list.
  */
 int __init scsi_init_devinfo(void)
 {
-- 
1.8.5.6

[PATCHv2 0/4] Fixup blacklist handling

2017-08-09 Thread Hannes Reinecke

Hi all,

the SCSI blacklist handling seems to be rather tricky issue;
everytime a fix is included it tends to break other devices.
This patchset attempt to simplify the devlist handling yet again,
but this time implementing the framework for regression testing, too.
A patch adding a regression test to the blktest suite will follow.

As usual, comments and reviews are welcome.

Changes to v1:
- Implement exact match for vendor string as suggested by Bart
- Straigten out issues pointed out by Alan Stern
- Reshuffle patches

Hannes Reinecke (4):
  scsi_debug: allow to specify inquiry vendor and model
  scsi: Export blacklist flags to sysfs
  scsi: whitespace fixes in scsi_devinfo.c
  scsi_devinfo: fixup string compare

 drivers/scsi/scsi_debug.c   | 25 +-
 drivers/scsi/scsi_devinfo.c | 84 -
 drivers/scsi/scsi_scan.c|  1 +
 drivers/scsi/scsi_sysfs.c   | 11 ++
 4 files changed, 81 insertions(+), 40 deletions(-)

-- 
1.8.5.6

[PATCHv2 4/4] scsi_devinfo: fixup string compare

2017-08-09 Thread Hannes Reinecke

When checking the model and vendor string we need to use the
minimum value of either string, otherwise we'll miss out on
wildcard matches.
And we should take card when matching with zero size strings;
results might be unpredictable.
With this patch the rules for matching devinfo strings are
as follows:
- Vendor strings must match exactly
- Empty Model strings will only match if the devinfo model
  is also empty
- Model strings shorter than the devinfo model string will
  not match

Fixes: 5e7ff2c ("SCSI: fix new bug in scsi_dev_info_list string matching")
Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_devinfo.c | 46 -
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
index 776c701..f8f5cb7 100644
--- a/drivers/scsi/scsi_devinfo.c
+++ b/drivers/scsi/scsi_devinfo.c
@@ -399,8 +399,8 @@ int scsi_dev_info_list_add_keyed(int compatible, char 
*vendor, char *model,
 
 /**
  * scsi_dev_info_list_find - find a matching dev_info list entry.
- * @vendor:vendor string
- * @model: model (product) string
+ * @vendor:full vendor string
+ * @model: full model (product) string
  * @key:   specify list to use
  *
  * Description:
@@ -440,6 +440,8 @@ static struct scsi_dev_info_list 
*scsi_dev_info_list_find(const char *vendor,
/* Also skip trailing spaces */
while (vmax > 0 && vskip[vmax - 1] == ' ')
--vmax;
+   if (!vmax)
+   vskip = NULL;
 
mmax = sizeof(devinfo->model);
mskip = model;
@@ -449,27 +451,45 @@ static struct scsi_dev_info_list 
*scsi_dev_info_list_find(const char *vendor,
}
while (mmax > 0 && mskip[mmax - 1] == ' ')
--mmax;
+   if (!mmax)
+   mskip = NULL;
 
list_for_each_entry(devinfo, _table->scsi_dev_info_list,
dev_info_list) {
if (devinfo->compatible) {
/*
-* Behave like the older version of get_device_flags.
+* vendor strings must be an exact match
 */
-   if (memcmp(devinfo->vendor, vskip, vmax) ||
-   (vmax < sizeof(devinfo->vendor) &&
-   devinfo->vendor[vmax]))
+   if (vmax != strlen(devinfo->vendor))
continue;
-   if (memcmp(devinfo->model, mskip, mmax) ||
-   (mmax < sizeof(devinfo->model) &&
-   devinfo->model[mmax]))
+   if (vskip && vmax &&
+   memcmp(devinfo->vendor, vskip, vmax))
continue;
-   return devinfo;
+   /*
+* Empty model strings only match if both strings
+* are empty.
+*/
+   if (!mmax && !strlen(devinfo->model))
+   return devinfo;
+
+   /*
+* Empty @model never matches
+*/
+   if (!mskip)
+   continue;
+
+   /*
+* @model specifies the full string, and
+* must be larger or equal to devinfo->model
+*/
+   if (!memcmp(devinfo->model, mskip,
+   strlen(devinfo->model)))
+   return devinfo;
} else {
if (!memcmp(devinfo->vendor, vendor,
-sizeof(devinfo->vendor)) &&
-!memcmp(devinfo->model, model,
- sizeof(devinfo->model)))
+   sizeof(devinfo->vendor)) &&
+   !memcmp(devinfo->model, model,
+   sizeof(devinfo->model)))
return devinfo;
}
}
-- 
1.8.5.6

[PATCHv2 1/4] scsi_debug: allow to specify inquiry vendor and model

2017-08-09 Thread Hannes Reinecke

For testing purposes we need to be able to pass in the inquiry
vendor and model.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_debug.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index dc095a2..c8a3f8d 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -953,9 +953,9 @@ static int fetch_to_dev_buffer(struct scsi_cmnd *scp, 
unsigned char *arr,
 }
 
 
-static const char * inq_vendor_id = "Linux   ";
-static const char * inq_product_id = "scsi_debug  ";
-static const char *inq_product_rev = "0186";   /* version less '.' */
+static char sdebug_inq_vendor_id[9] = "Linux   ";
+static char sdebug_inq_product_id[17] = "scsi_debug  ";
+static char sdebug_inq_product_rev[5] = "0186";/* version less '.' */
 /* Use some locally assigned NAAs for SAS addresses. */
 static const u64 naa3_comp_a = 0x3220ULL;
 static const u64 naa3_comp_b = 0x3330ULL;
@@ -975,8 +975,8 @@ static int inquiry_vpd_83(unsigned char *arr, int 
port_group_id,
arr[0] = 0x2;   /* ASCII */
arr[1] = 0x1;
arr[2] = 0x0;
-   memcpy([4], inq_vendor_id, 8);
-   memcpy([12], inq_product_id, 16);
+   memcpy([4], sdebug_inq_vendor_id, 8);
+   memcpy([12], sdebug_inq_product_id, 16);
memcpy([28], dev_id_str, dev_id_str_len);
num = 8 + 16 + dev_id_str_len;
arr[3] = num;
@@ -1408,9 +1408,9 @@ static int resp_inquiry(struct scsi_cmnd *scp, struct 
sdebug_dev_info *devip)
arr[6] = 0x10; /* claim: MultiP */
/* arr[6] |= 0x40; ... claim: EncServ (enclosure services) */
arr[7] = 0xa; /* claim: LINKED + CMDQUE */
-   memcpy([8], inq_vendor_id, 8);
-   memcpy([16], inq_product_id, 16);
-   memcpy([32], inq_product_rev, 4);
+   memcpy([8], sdebug_inq_vendor_id, 8);
+   memcpy([16], sdebug_inq_product_id, 16);
+   memcpy([32], sdebug_inq_product_rev, 4);
/* version descriptors (2 bytes each) follow */
put_unaligned_be16(0xc0, arr + 58);   /* SAM-6 no version claimed */
put_unaligned_be16(0x5c0, arr + 60);  /* SPC-5 no version claimed */
@@ -4152,6 +4152,12 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct 
sdebug_dev_info *devip,
 module_param_named(fake_rw, sdebug_fake_rw, int, S_IRUGO | S_IWUSR);
 module_param_named(guard, sdebug_guard, uint, S_IRUGO);
 module_param_named(host_lock, sdebug_host_lock, bool, S_IRUGO | S_IWUSR);
+module_param_string(inq_vendor, sdebug_inq_vendor_id,
+   sizeof(sdebug_inq_vendor_id), S_IRUGO|S_IWUSR);
+module_param_string(inq_product, sdebug_inq_product_id,
+   sizeof(sdebug_inq_product_id), S_IRUGO|S_IWUSR);
+module_param_string(inq_rev, sdebug_inq_product_rev,
+   sizeof(sdebug_inq_product_rev), S_IRUGO|S_IWUSR);
 module_param_named(lbpu, sdebug_lbpu, int, S_IRUGO);
 module_param_named(lbpws, sdebug_lbpws, int, S_IRUGO);
 module_param_named(lbpws10, sdebug_lbpws10, int, S_IRUGO);
@@ -4203,6 +4209,9 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct 
sdebug_dev_info *devip,
 MODULE_PARM_DESC(fake_rw, "fake reads/writes instead of copying (def=0)");
 MODULE_PARM_DESC(guard, "protection checksum: 0=crc, 1=ip (def=0)");
 MODULE_PARM_DESC(host_lock, "host_lock is ignored (def=0)");
+MODULE_PARM_DESC(inq_vendor, "SCSI INQUIRY vendor string (def=\"Linux\")");
+MODULE_PARM_DESC(inq_product, "SCSI INQUIRY product string 
(def=\"scsi_debug\")");
+MODULE_PARM_DESC(inq_rev, "SCSI INQUIRY revision string (def=\"scsi_debug\")");
 MODULE_PARM_DESC(lbpu, "enable LBP, support UNMAP command (def=0)");
 MODULE_PARM_DESC(lbpws, "enable LBP, support WRITE SAME(16) with UNMAP bit 
(def=0)");
 MODULE_PARM_DESC(lbpws10, "enable LBP, support WRITE SAME(10) with UNMAP bit 
(def=0)");
-- 
1.8.5.6

[PATCHv2 2/4] scsi: Export blacklist flags to sysfs

2017-08-09 Thread Hannes Reinecke

Each scsi device is scanned according to the found blacklist flags,
but this information is never presented to sysfs.
This makes it quite hard to figure out if blacklisting worked as
expected.
With this patch we're exporting an additional attribute 'blacklist'
containing the blacklist flags for this device.

Signed-off-by: Hannes Reinecke 
---
 drivers/scsi/scsi_scan.c  |  1 +
 drivers/scsi/scsi_sysfs.c | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3832ba5..e4e4374 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -984,6 +984,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
scsi_attach_vpd(sdev);
 
sdev->max_queue_depth = sdev->queue_depth;
+   sdev->sdev_bflags = *bflags;
 
/*
 * Ok, the device is now all set up, we can
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 5e8ace2..37d436b 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -953,6 +953,16 @@ static DEVICE_ATTR(queue_depth, S_IRUGO | S_IWUSR, 
sdev_show_queue_depth,
 }
 static DEVICE_ATTR(wwid, S_IRUGO, sdev_show_wwid, NULL);
 
+static ssize_t
+sdev_show_blacklist(struct device *dev, struct device_attribute *attr,
+   char *buf)
+{
+   struct scsi_device *sdev = to_scsi_device(dev);
+
+   return snprintf (buf, 20, "0x%x\n", sdev->sdev_bflags);
+}
+static DEVICE_ATTR(blacklist, S_IRUGO, sdev_show_blacklist, NULL);
+
 #ifdef CONFIG_SCSI_DH
 static ssize_t
 sdev_show_dh_state(struct device *dev, struct device_attribute *attr,
@@ -1138,6 +1148,7 @@ static umode_t scsi_sdev_bin_attr_is_visible(struct 
kobject *kobj,
_attr_queue_depth.attr,
_attr_queue_type.attr,
_attr_wwid.attr,
+   _attr_blacklist.attr,
 #ifdef CONFIG_SCSI_DH
_attr_dh_state.attr,
_attr_access_state.attr,
-- 
1.8.5.6

[PATCH] scsi: dpt_i2o: remove redundant null check on array device

2017-08-09 Thread Colin King

From: Colin Ian King 

The null check on pHba->channel[chan].device is redundant because
device is an array and hence can never be null. Remove the check.

Detected by CoverityScan, CID#115362 ("Array compared against 0")

Signed-off-by: Colin Ian King 
---
 drivers/scsi/dpt_i2o.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
index 256dd6791fcc..fd172b0890d3 100644
--- a/drivers/scsi/dpt_i2o.c
+++ b/drivers/scsi/dpt_i2o.c
@@ -1169,11 +1169,6 @@ static struct adpt_device* adpt_find_device(adpt_hba* 
pHba, u32 chan, u32 id, u6
if(chan < 0 || chan >= MAX_CHANNEL)
return NULL;

-   if( pHba->channel[chan].device == NULL){
-   printk(KERN_DEBUG"Adaptec I2O RAID: Trying to find device 
before they are allocated\n");
-   return NULL;
-   }
-
d = pHba->channel[chan].device[id];
if(!d || d->tid == 0) {
return NULL;
-- 
2.11.0

UFS maximum access unit is 8KB

2017-08-09 Thread Bean Huo (beanhuo)

Hi,
I am now using one hikey960, and Linux kernel version is 4.4.77.
And found that for UFS driver version, the maximum transformation is 8KB.
Means that if I using 128KB chuck size to program in the user space, but, from 
ftrace/blktrace, 
It shows that it always programs by 8KB in kernel. please see below log, who 
knows where is wrong with it?

24992.855327 |   7)   |  vfs_read() {
24992.855340 |   7) + 89.584 us   |submit_bio();
24992.855432 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676352 + 
16 [fio] */
24992.855434 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676368 + 
16 [fio] */
24992.855435 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676384 + 
16 [fio] */
24992.855437 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676400 + 
16 [fio] */
24992.855438 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676416 + 
16 [fio] */
24992.855439 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676432 + 
16 [fio] */
24992.855440 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676448 + 
16 [fio] */
24992.855441 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676464 + 
16 [fio] */
24992.855443 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676480 + 
16 [fio] */
24992.855443 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676496 + 
16 [fio] */
24992.855444 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676512 + 
16 [fio] */
24992.855445 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676528 + 
16 [fio] */
24992.855446 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676544 + 
16 [fio] */
24992.855447 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676560 + 
16 [fio] */
24992.855448 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676576 + 
16 [fio] */
24992.855449 |   7)   |  /* block_rq_insert: 8,48 R 0 () 40676592 + 
16 [fio] */
24992.855451 |   7)   |__blk_run_queue() {
24992.855451 |   7)   |  scsi_request_fn() {
24992.855453 |   7)   |/* block_rq_issue: 8,48 R 0 () 
40676352 + 16 [fio] */
24992.855456 |   7)   |scsi_dispatch_cmd() {
24992.855457 |   7)   |  /* scsi_dispatch_cmd_start: 
host_no=0 channel=0 id=0 lun=3 data_sgl=2 prot_sgl=0 prot_op=SCSI_PROT_NORMAL 
cmnd=(READ_10 lba=5084544 txlen=2 protect=0 raw=28 00 00 4d 95 80 00 00 02 00) 
*/
24992.855459 |   7) + 15.104 us   |  scsi_dma_map();
24992.855476 |   7) + 19.271 us   |}
24992.855478 |   7)   |/* block_rq_issue: 8,48 R 0 () 
40676368 + 16 [fio] */
24992.855481 |   7)   |scsi_dispatch_cmd() {
24992.855481 |   7)   |  /* scsi_dispatch_cmd_start: 
host_no=0 channel=0 id=0 lun=3 data_sgl=2 prot_sgl=0 prot_op=SCSI_PROT_NORMAL 
cmnd=(READ_10 lba=5084546 txlen=2 protect=0 raw=28 00 00 4d 95 82 00 00 02 00) 
*/
24992.855483 |   7)   2.083 us|  scsi_dma_map();
24992.855486 |   7)   5.730 us|}
24992.855488 |   7)   |/* block_rq_issue: 8,48 R 0 () 
40676384 + 16 [fio] */
24992.855489 |   7)   |scsi_dispatch_cmd() {
24992.855490 |   7)   |  /* scsi_dispatch_cmd_start: 
host_no=0 channel=0 id=0 lun=3 data_sgl=2 prot_sgl=0 prot_op=SCSI_PROT_NORMAL 
cmnd=(READ_10 lba=5084548 txlen=2 protect=0 raw=28 00 00 4d 95 84 00 00 02 00) 
*/
24992.855491 |   7)   2.083 us|  scsi_dma_map();
24992.855495 |   7)   5.209 us|}
24992.855497 |   7)   |/* block_rq_issue: 8,48 R 0 () 
40676400 + 16 [fio] */
24992.855498 |   7)   |scsi_dispatch_cmd() {
24992.855499 |   7)   |  /* scsi_dispatch_cmd_start: 
host_no=0 channel=0 id=0 lun=3 data_sgl=2 prot_sgl=0 prot_op=SCSI_PROT_NORMAL 
cmnd=(READ_10 lba=5084550 txlen=2 protect=0 raw=28 00 00 4d 95 86 00 00 02 00) 
*/
24992.855500 |   7)   2.083 us|  scsi_dma_map();
24992.855503 |   7)   5.208 us|}
24992.855505 |   7)   |/* block_rq_issue: 8,48 R 0 () 
40676416 + 16 [fio] */
24992.855507 |   7)   |scsi_dispatch_cmd() {
24992.855508 |   7)   |  /* scsi_dispatch_cmd_start: 
host_no=0 channel=0 id=0 lun=3 data_sgl=2 prot_sgl=0 prot_op=SCSI_PROT_NORMAL 
cmnd=(READ_10 lba=5084552 txlen=2 protect=0 raw=28 00 00 4d 95 88 00 00 02 00) 
*/
24992.855509 |   7)   2.084 us|  scsi_dma_map();
24992.855513 |   7)   5.208 us|}
24992.855514 |   7)   |/* block_rq_issue: 8,48 R 0 () 
40676432 + 16 [fio] */
24992.855516 |   7)   |scsi_dispatch_cmd() {
24992.855516 |   7)   |  /* scsi_dispatch_cmd_start: 
host_no=0 channel=0 id=0 lun=3 data_sgl=2 prot_sgl=0 prot_op=SCSI_PROT_NORMAL 
cmnd=(READ_10 lba=5084554 txlen=2 protect=0 raw=28 00 00 4d 95 8a 00 00 02 00) 
*/
24992.855518

Re: [PATCH 0/5] qcom-ufs: phy/hcd: Refactor phy initialization code

2017-08-09 Thread Vivek Gautam

Hi Martin, Subhash


On Wed, Aug 9, 2017 at 11:18 AM, Kishon Vijay Abraham I  wrote:
> Vivek,
>
> On Tuesday 08 August 2017 09:20 PM, Vivek Gautam wrote:
>> Hi Koshon,
>>
>> On 2017-08-08 17:39, Kishon Vijay Abraham I wrote:
>>> Hi,
>>>
>>> On Friday 04 August 2017 12:18 PM, Vivek Gautam wrote:
 Refactoring the qcom-ufs phy and host controller code to move
 further towards the generic phy usage. Right now the qcom-ufs exports
 a bunch of APIs that are used by the host controller to initialize
 the phy.
 With this patch series, we populate the phy_init() which was a no-op
 earlier. The host controller then calls the phy_init() at the designated
 place rather than doing it invariably in ufs_hcd_init().

 As part of this series, we introduce phy modes for ufs phy.
 The M-PHY has two data rates defined for each generations (Gears) -
 Rate A and Rate B. These can serve as the two modes of ufs HS phy.
 Host controller can direct the phy to set the respective configurations
 based on the phy modes.

 The patch-series has been tested with necessary dt patches on db820c.
>>>
>>> Can the first 3 patches go independently of the other 2 or should all this 
>>> be
>>> merged together?
>>
>> The first 3 patches are independent, but the next 2 patches depend on those 3
>> for functionality.
>> I would prefer all to go in one tree. If you want to pull these in the phy 
>> tree,
>> I will request Subhash/Martin to ack the patches.

Can you kindly review this patch series (for UFS controller changes) and
consider giving your Ack so that Kishon can pull in the series through phy tree.
Thanks.

best regards
Vivek

>
> sure, that should be fine!
>
> Thanks
> Kishon
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

49 matches

Mail list logo