RE: [PATCH] aacraid: Fix 2T+ drives on SmartIOC-2000

2017-09-08 Thread Raghava Aditya Renukunta
> -Original Message-
> From: Dave Carroll [mailto:david.carr...@microsemi.com]
> Sent: Wednesday, August 30, 2017 3:23 PM
> To: Martin K . Petersen ; James Bottomley
> 
> Cc: Dave Carroll ; linux-scsi  s...@vger.kernel.org>; dl-esc-Aacraid Linux Driver
> ; Scott Benesh ;
> Brian King ; sta...@vger.kernel.org
> Subject: [PATCH] aacraid: Fix 2T+ drives on SmartIOC-2000
> 
> The logic for supporting large drives was previously tied to 4Kn support
> for SmartIOC-2000. As SmartIOC-2000 does not support volumes using 4Kn
> drives, use the intended option flag AAC_OPT_NEW_COMM_64 to
> determine
> support for volumes greater than 2T.
> 
> cc: sta...@vger.kernel.org
> Signed-off-by: Dave Carroll 
> 
> ---
>  drivers/scsi/aacraid/aachba.c  | 12 ++--
>  drivers/scsi/aacraid/aacraid.h |  5 +
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
> index 4591113..18a1a1f 100644
> --- a/drivers/scsi/aacraid/aachba.c
> +++ b/drivers/scsi/aacraid/aachba.c
> @@ -695,13 +695,13 @@ static void _aac_probe_container1(void * context,
> struct fib * fibptr)
>   int status;
> 
>   dresp = (struct aac_mount *) fib_data(fibptr);
> - if (!(fibptr->dev->supplement_adapter_info.supported_options2 &
> - AAC_OPTION_VARIABLE_BLOCK_SIZE))
> + if (!aac_supports_2T(fibptr->dev)) {
>   dresp->mnt[0].capacityhigh = 0;
> - if ((le32_to_cpu(dresp->status) != ST_OK) ||
> - (le32_to_cpu(dresp->mnt[0].vol) != CT_NONE)) {
> - _aac_probe_container2(context, fibptr);
> - return;
> + if ((le32_to_cpu(dresp->status) == ST_OK) &&
> + (le32_to_cpu(dresp->mnt[0].vol) != CT_NONE)) {
> + _aac_probe_container2(context, fibptr);
> + return;
> + }
>   }
>   scsicmd = (struct scsi_cmnd *) context;
> 
> diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
> index 6981299..998fbad 100644
> --- a/drivers/scsi/aacraid/aacraid.h
> +++ b/drivers/scsi/aacraid/aacraid.h
> @@ -2701,6 +2701,11 @@ static inline int aac_is_src(struct aac_dev *dev)
>   return 0;
>  }
> 
> +static inline int aac_supports_2T(struct aac_dev *dev)
> +{
> + return (dev->adapter_info.options & AAC_OPT_NEW_COMM_64);
> +}
> +
>  char * get_container_type(unsigned type);
>  extern int numacb;
>  extern char aac_driver_version[];
> --
> 2.8.4

Reviewed-by: Raghava Aditya Renukunta 


Re: [PATCH V3 7/8] block: allow to allocate req with REQF_PREEMPT when queue is preempt frozen

2017-09-08 Thread Bart Van Assche
On Fri, 2017-09-08 at 11:08 +0800, Ming Lei wrote:
> Looks I replied or clarified all your questions/comments on this
> patchset.

No, you have not addressed all my comments, you know that you have not
addressed all my comments so you should not have written that you have
addressed all my comments. This patch series not only introduces ugly
changes in the request queue freeze mechanism but it also introduces an
unacceptable race condition between blk_get_request() and request queue
cleanup.

BTW, you don't have to spend more time on this patch series. I have
implemented an alternative and much cleaner approach to fix SCSI device
suspend. I'm currently testing that patch series.

Bart.

Re: [PATCH V2 10/12] scsi: sd_zbc: Disable zone write locking with scsi-mq

2017-09-08 Thread Damien Le Moal
Ming,

On 9/8/17 05:43, Ming Lei wrote:
> Hi Damien,
> 
> On Fri, Sep 08, 2017 at 01:16:38AM +0900, Damien Le Moal wrote:
>> In the case of a ZBC disk used with scsi-mq, zone write locking does
>> not prevent write reordering in sequential zones. Unlike the legacy
>> case, zone locking can only be done after the command request is
>> removed from the scheduler dispatch queue. That is, at the time of
>> zone locking, the write command may already be out of order.
> 
> Per my understanding, for legacy case, it can be quite tricky to let
> the existed I/O scheduler guarantee the write order for ZBC disk.
> I guess requeue still might cause write reorder even in legacy path,
> since requeue can happen in both scsi_request_fn() and scsi_io_completion()
> with q->queue_lock released, meantime new rq belonging to the same
> zone can come and be inserted to queue.

Yes, the write ordering will always depend on the scheduler doing the
right thing. But both cfq, deadline and even noop do the right thing
there, even considering the aging case. The next write for a zone will
always be the oldest in the queue for that zone, if it is not, it means
that the application did not write sequentially. Extensive testing in
the legacy case never showed a problem due to the scheduler itself.

scsi_requeue_command() does the unprep (zone unlock) and requeue while
holding the queue lock. So this is atomic with new write command
insertion. Requeued commands are added to the dispatch queue head, and
since a zone will only have a single write in-flight, there is no
reordering possible. The next write command for a zone to go again is
the last requeued one or the next in lba order. It works.

Note that for write commands that failed due to an unaligned write
error, there is no retry done, so no requeue. The requeue case for
writes would only happen for other conditions (a dead drive being the
most likely in this case).

>> Disable zone write locking in sd_zbc_write_lock_zone() if the disk is
>> used with scsi-mq. Write order guarantees can be provided by an
>> adapted I/O scheduler.
> 
> Sounds a good idea to enhance the order in a new scheduler, will
> look at the following patch.

For blk-mq, I only tried mq-deadline. The zoned scheduler I posted is
based on it. There is no fundamental change to the ordering on
insertion. Only different choices on dispatch (using the zone lock).

For rotating rust and blk-mq, I think that getting calls to dispatch
serialized would naturally enhance ordering and also merging to some
extent. Ordering really gets killed when multiple context try to push
down requests, which each context ending up each with only a few
requests in their local dispatch lists. Some initial patch I wrote for
zbc that attacked the problem from within blk-mq did that serialization.
That is not mandatory anymore with the zoned scheduler, but I think
would still be benefitial to both ZBC disks and standard disks too.

Best regards.


-- 
Damien Le Moal,
Western Digital Research


Re: [PATCH V2 00/12] scsi-mq support for ZBC disks

2017-09-08 Thread Damien Le Moal
Christoph,

On 9/8/17 01:20, Christoph Hellwig wrote:
> I'm really worried about how this is tied into the sd driver.
> 
> From a quick look it seems like the only thing you really need in
> the scheduler that is done through struct scsi_disk is the
> offset + len -> zone number lookup - I'd much rather expose this
> from sd upwards and keep the scheduler in the block code.

The only things that the scheduler needs are:
1) The zone size and the number of zones of the device (for the bitmaps
allocation and offset->zone number conversion).
2) Zone type for the optimization that avoids locking conventional zones.

(2) is optional. We can do without, but still really nice to have from a
performance perspective as conventional zones tend to be used for
storing metadata. So a lot of small random writes is more likely and
high queue depth writing would improve performance significantly.

For (1), the zone size is known through q->limits.chunk_sectors. But the
disk capacity is not known using request_queue only, so the number of
zones cannot be calculated... I thought of exporting it through queue
limits too, but then stacking of device mappers using ZBC drives becomes
a pain as the number of zones needs to be recalculated.

The best solution would really be to be able to get a disk capacity
given a request queue. Any idea on how to do this ?
For optimization (2), there is also the problem that given a request
queue, the block_device struct is not known, so it is not possible to
issue report zones (granted, the scheduler could build the request
directly to do that, but that is really an ugly solution).

I did not find any nice solution for either problem, so ended up tying
the scheduler to sd since the scsi_disk struct already had all the info
needed.

> Except for that this looks like a fine approach to me.

Thanks.

-- 
Damien Le Moal,
Western Digital Research


Re: [PATCH V2 10/12] scsi: sd_zbc: Disable zone write locking with scsi-mq

2017-09-08 Thread Ming Lei
Hi Damien,

On Fri, Sep 08, 2017 at 01:16:38AM +0900, Damien Le Moal wrote:
> In the case of a ZBC disk used with scsi-mq, zone write locking does
> not prevent write reordering in sequential zones. Unlike the legacy
> case, zone locking can only be done after the command request is
> removed from the scheduler dispatch queue. That is, at the time of
> zone locking, the write command may already be out of order.

Per my understanding, for legacy case, it can be quite tricky to let
the existed I/O scheduler guarantee the write order for ZBC disk.
I guess requeue still might cause write reorder even in legacy path,
since requeue can happen in both scsi_request_fn() and scsi_io_completion()
with q->queue_lock released, meantime new rq belonging to the same
zone can come and be inserted to queue.

> 
> Disable zone write locking in sd_zbc_write_lock_zone() if the disk is
> used with scsi-mq. Write order guarantees can be provided by an
> adapted I/O scheduler.

Sounds a good idea to enhance the order in a new scheduler, will
look at the following patch.

-- 
Ming


Re: [PATCH V2 09/12] scsi: sd_zbc: Limit zone write locking to sequential zones

2017-09-08 Thread Johannes Thumshirn
On Fri, Sep 08, 2017 at 11:48:09AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 08, 2017 at 10:39:25AM +0200, Johannes Thumshirn wrote:
> > It's a shame we have all this overflow checking kcalloc() and
> > kmalloc_array() functions but none of them is taking NUMA nodes into
> > account.
> 
> Something that could be trivially fixed..

I know I'm on it ;-)

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH] mpt3sas: Fix a double unlock in _transport_smp_handler()

2017-09-08 Thread Dan Carpenter
We can't "goto out;" if we're not holding "ioc->transport_cmds.mutex".
It leads to a double unlock bug, and I don't think we should set
"ioc->transport_cmds.status" if we don't have the lock.

Fixes: 651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP 
passthrough")
Signed-off-by: Dan Carpenter 
---
I'm not totally sure about the .status thing.  This is a static checker
fix.

diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c 
b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index d3940c5d079d..c9cd9ed90002 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -1936,12 +1936,12 @@ _transport_smp_handler(struct bsg_job *job, struct 
Scsi_Host *shost,
pr_info(MPT3SAS_FMT "%s: host reset in progress!\n",
__func__, ioc->name);
rc = -EFAULT;
-   goto out;
+   goto job_done;
}
 
rc = mutex_lock_interruptible(>transport_cmds.mutex);
if (rc)
-   goto out;
+   goto job_done;
 
if (ioc->transport_cmds.status != MPT3_CMD_NOT_USED) {
pr_err(MPT3SAS_FMT "%s: transport_cmds in use\n", ioc->name,
@@ -2066,6 +2066,7 @@ _transport_smp_handler(struct bsg_job *job, struct 
Scsi_Host *shost,
  out:
ioc->transport_cmds.status = MPT3_CMD_NOT_USED;
mutex_unlock(>transport_cmds.mutex);
+ job_done:
bsg_job_done(job, rc, reslen);
 }
 


Re: [PATCH V2 09/12] scsi: sd_zbc: Limit zone write locking to sequential zones

2017-09-08 Thread Christoph Hellwig
On Fri, Sep 08, 2017 at 10:39:25AM +0200, Johannes Thumshirn wrote:
> It's a shame we have all this overflow checking kcalloc() and
> kmalloc_array() functions but none of them is taking NUMA nodes into
> account.

Something that could be trivially fixed..


Re: [PATCH V2 09/12] scsi: sd_zbc: Limit zone write locking to sequential zones

2017-09-08 Thread Johannes Thumshirn
On Fri, Sep 08, 2017 at 01:16:37AM +0900, Damien Le Moal wrote:
> +/*
> + * Allocate a zone bitmap (one bit per zone).
> + */
> +static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
> +{
> + struct request_queue *q = sdkp->disk->queue;
> +
> + return kzalloc_node(BITS_TO_LONGS(sdkp->nr_zones)
> + * sizeof(unsigned long),
> + GFP_KERNEL, q->node);
> +}

It's a shame we have all this overflow checking kcalloc() and
kmalloc_array() functions but none of them is taking NUMA nodes into
account.

But that has nothing to do with your patch, just a general rant,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH V2 00/12] scsi-mq support for ZBC disks

2017-09-08 Thread Christoph Hellwig
I'm really worried about how this is tied into the sd driver.

>From a quick look it seems like the only thing you really need in
the scheduler that is done through struct scsi_disk is the
offset + len -> zone number lookup - I'd much rather expose this
from sd upwards and keep the scheduler in the block code.

Except for that this looks like a fine approach to me.


Re: [PATCH V2 08/12] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()

2017-09-08 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH V2 07/12] scsi: sd_zbc.c: Use well defined macros

2017-09-08 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH V2 06/12] scsi: sd_zbc: Rearrange code

2017-09-08 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH V2 05/12] scsi: sd_zbc: Fix comments and indentation

2017-09-08 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH V2 04/12] scsi: sd_zbc: Move zbc disk declarations to sd_zbc.h

2017-09-08 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH V2 03/12] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h

2017-09-08 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH V2 02/12] block: Fix declaration of blk-mq scheduler functions

2017-09-08 Thread Johannes Thumshirn
Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH V2 01/12] block: Fix declaration of blk-mq debugfs functions

2017-09-08 Thread Johannes Thumshirn
On Fri, Sep 08, 2017 at 01:16:29AM +0900, Damien Le Moal wrote:
> __blk_mq_debugfs_rq_show() and blk_mq_debugfs_rq_show() are exported
> symbols but ar eonly declared in the block internal file
 are only

Otherwise this looks good to me,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH] scsi: lpfc: remove redundant null check on eqe

2017-09-08 Thread Colin King
From: Colin Ian King 

The pointer eqe is always non-null inside the while loop, so the
check to see if eqe is NULL is redudant and hence can be removed.

Detected by CoverityScan CID#1248693 ("Logically Dead Code")

Signed-off-by: Colin Ian King 
---
 drivers/scsi/lpfc/lpfc_sli.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 8b119f87b51d..80036a9fae7f 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -13760,9 +13760,6 @@ lpfc_sli4_hba_intr_handler(int irq, void *dev_id)
 * Process all the event on FCP fast-path EQ
 */
while ((eqe = lpfc_sli4_eq_get(fpeq))) {
-   if (eqe == NULL)
-   break;
-
ccount += lpfc_sli4_hba_handle_eqe(phba, eqe, hba_eqidx);
if (!(++ecount % fpeq->entry_repost) ||
ccount > LPFC_MAX_ISR_CQE)
-- 
2.14.1



[PATCH] scsi: shost->async_scan should be protected by mutex_lock

2017-09-08 Thread Ouyangzhaowei (Charles)
shost->async_scan should be protected by mutex_lock, otherwise the check
of "called twice" won't work.

Signed-off-by: Ouyang Zhaowei 
---
 drivers/scsi/scsi_scan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index fd88dab..1d1df51 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1722,6 +1722,7 @@ static struct async_scan_data
*scsi_prep_async_scan(struct Scsi_Host *shost)
if (strncmp(scsi_scan_type, "sync", 4) == 0)
return NULL;

+   mutex_lock(>scan_mutex);
if (shost->async_scan) {
shost_printk(KERN_DEBUG, shost, "%s called twice\n",
__func__);
return NULL;
@@ -1735,7 +1736,6 @@ static struct async_scan_data
*scsi_prep_async_scan(struct Scsi_Host *shost)
goto err;
init_completion(>prev_finished);

-   mutex_lock(>scan_mutex);
spin_lock_irqsave(shost->host_lock, flags);
shost->async_scan = 1;
spin_unlock_irqrestore(shost->host_lock, flags);