Re: [PATCH V4 08/10] block: allow to allocate req with RQF_PREEMPT when queue is preempt frozen

2017-09-11 Thread Ming Lei
On Mon, Sep 11, 2017 at 04:03:55PM +, Bart Van Assche wrote:
> On Mon, 2017-09-11 at 19:10 +0800, Ming Lei wrote:
> > @@ -787,6 +787,35 @@ int blk_queue_enter(struct request_queue *q, unsigned 
> > flags)
> > if (percpu_ref_tryget_live(&q->q_usage_counter))
> > return 0;
> >  
> > +   /*
> > +* If queue is preempt frozen and the caller needs to allocate
> > +* a request with RQF_PREEMPT, we grab the .q_usage_counter
> > +* unconditionally and return successfully.
> > +*
> > +* There isn't a race with queue cleanup because:
> > +*
> > +* 1) it is guaranteed that preempt freeze can't be
> > +* started after queue is set as dying
> > +*
> > +* 2) normal freeze runs exclusively with preempt
> > +* freeze, so even after queue is set as dying
> > +* afterwards, blk_queue_cleanup() won't move on
> > +* until preempt freeze is done
> > +*
> > +* 3) blk_queue_dying() needn't be checked here
> > +*  - for the legacy path, it will be checked in
> > +*  __get_request()
> 
> For the legacy block layer core, what do you think will happen if the
> "dying" state is set by another thread after __get_request() has passed the
> blk_queue_dying() check?

Without this patchset, the block core still needs to handle the above
situation, so your question isn't related to this patchset.

Also, q->queue_lock is required both for setting the dying flag and
for checking it in __get_request(). But the lock can be released
inside __get_request(), so it is possible to allocate a request after
the queue is set as dying, and that request can be dispatched to a
dying queue too on the legacy path.
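
A rough sketch of that window (illustrative only; the helper name
alloc_request_struct() is made up, but __get_request() really does drop
q->queue_lock around the sleeping allocation):

	spin_lock_irq(q->queue_lock);
	if (blk_queue_dying(q)) {	/* check passes: not dying yet */
		spin_unlock_irq(q->queue_lock);
		return ERR_PTR(-ENODEV);
	}
	...
	spin_unlock_irq(q->queue_lock);	/* dropped so the allocation may sleep */
	/* <-- another CPU can run blk_set_queue_dying(q) in this window */
	rq = alloc_request_struct(q, gfp_mask);	/* hypothetical helper */
	spin_lock_irq(q->queue_lock);	/* re-acquired; rq is handed out anyway */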

> 
> > +*  - blk-mq depends on the driver to handle dying well,
> > +*  because it is normal for the queue to be set as dying
> > +*  just between blk_queue_enter() and allocating a new
> > +*  request.
> 
> The above comment is not correct. The block layer core handles the "dying"
> state. Block drivers other than dm-mpath should not have to query this state
> directly.

If blk-mq doesn't query the dying state, how does it know the queue is
dying so it can handle that state? Also, blk-mq isn't different from
legacy with respect to depending on the driver to handle dying.
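
To illustrate what "depends on the driver" means in practice, a hedged
blk-mq sketch (my_queue_rq() and the 'gone' flag are hypothetical; the
point is that dispatch can still happen after the queue is marked dying,
and the driver fails such requests itself):

	static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
					const struct blk_mq_queue_data *bd)
	{
		struct my_dev *dev = hctx->queue->queuedata;

		if (dev->gone)			/* device already torn down */
			return BLK_STS_IOERR;	/* fail the request */
		/* ... normal submission path ... */
		return BLK_STS_OK;
	}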

> 
> > +*/
> > +   if ((flags & BLK_REQ_PREEMPT) &&
> > +   blk_queue_is_preempt_frozen(q)) {
> > +   blk_queue_enter_live(q);
> > +   return 0;
> > +   }
> > +
> 
> Sorry but to me it looks like the above code introduces a race condition
> between blk_queue_cleanup() and blk_get_request() for at least blk-mq.
> Consider e.g. the following scenario:
> * A first thread preempt-freezes a request queue.
> * A second thread calls blk_get_request() with BLK_REQ_PREEMPT set. That
>   results in a call of blk_queue_is_preempt_frozen().
> * A context switch occurs to the first thread.
> * The first thread preempt-unfreezes the same request queue and calls
>   blk_queue_cleanup(). That last function changes the request queue state
>   into DYING and waits until all pending requests have finished.
> * The second thread continues and calls blk_queue_enter_live(), allocates
>   a request and submits it.

OK, that looks like a race I didn't think of, but it can be fixed easily
by calling blk_queue_enter_live() while holding q->freeze_lock, and it
won't cause a performance issue either, since it is in the slow path.

For example, we can introduce the following code in blk_queue_enter():

	if ((flags & BLK_REQ_PREEMPT) &&
	    blk_queue_enter_preempt_freeze(q))
		return 0;

static inline bool blk_queue_enter_preempt_freeze(struct request_queue *q)
{
	bool preempt_frozen;

	spin_lock(&q->freeze_lock);
	preempt_frozen = q->preempt_freezing && !q->preempt_unfreezing;
	if (preempt_frozen)
		blk_queue_enter_live(q);
	spin_unlock(&q->freeze_lock);

	return preempt_frozen;
}

> 
> In other words, a request gets submitted against a dying queue. This must
> not happen. See also my explanation of queue shutdown from a few days ago

That is not correct; think about why the queue *dead* flag is checked in
__blk_run_queue_uncond() instead of the dying flag. We still need to
submit requests to the driver while the queue is dying, and the driver
knows how to handle that.
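
For reference, the legacy dispatch gate of that era looks roughly like
this (paraphrased from blk-core.c; only DEAD stops dispatch):

	static void __blk_run_queue_uncond(struct request_queue *q)
	{
		if (unlikely(blk_queue_dead(q)))
			return;

		/*
		 * Requests may still be dispatched while the queue is
		 * DYING; the driver is expected to cope with them.
		 */
		q->request_fn_active++;
		q->request_fn(q);
		q->request_fn_active--;
	}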

> (https://marc.info/?l=linux-block&m=150449845831789).

> from (https://marc.info/?l=linux-block&m=150449845831789):
>> Do you understand how request queue cleanup works? The algorithm used for
>> request queue cleanup is as follows:
>> * Set the DYING flag. This flag makes all later blk_get_request() calls
>>   fail.

Your description isn't true for either legacy or blk-mq:

For legacy, as you can see, q->queue_lock can be released in
__get_request(), and at that time, the 

Re: [PATCH V4 0/10] block/scsi: safe SCSI quiescing

2017-09-11 Thread Oleksandr Natalenko
For v4 with regard to suspend/resume:

Tested-by: Oleksandr Natalenko 

On Monday, 11 September 2017 at 13:10:11 CEST, Ming Lei wrote:
> Hi,
> 
> The current SCSI quiesce isn't safe, and it is easy to trigger an I/O deadlock.
> 
> Once the SCSI device is put into QUIESCE, no new request except for
> RQF_PREEMPT can be dispatched to SCSI successfully, and
> scsi_device_quiesce() simply waits for completion of the I/O already
> dispatched to the SCSI stack. That isn't enough at all.
> 
> Because new requests can still be coming in, but all the allocated
> requests can't be dispatched successfully, the request pool can be
> used up easily.
> 
> Then a request with RQF_PREEMPT can't be allocated and waits forever;
> meanwhile scsi_device_resume() waits for completion of RQF_PREEMPT,
> so the system hangs forever, such as during system suspend or
> while sending SCSI domain validation.
> 
> Both the I/O hang inside system suspend[1] and the one during SCSI
> domain validation were reported before.
> 
> This patchset introduces preempt freeze, and solves the issue
> by preempt freezing the block queue during SCSI quiesce, allowing
> allocation of RQF_PREEMPT requests while the queue is in this state.
> 
> Oleksandr verified that V3 does fix the hang during suspend/resume,
> and Cathy verified that revised V3 fixes hang in sending
> SCSI domain validation.
> 
> Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset fixes
> them both by introducing/unifying blk_freeze_queue_preempt() and
> blk_unfreeze_queue_preempt(), and cleanup is done along the way.
> 
> The patchset can be found in the following gitweb:
> 
>   https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V4
> 
> V4:
>   - reorganize patch order to make it more reasonable
>   - support nested preempt freeze, as required by SCSI transport spi
>   - check preempt freezing in the slow path of blk_queue_enter()
>   - add "SCSI: transport_spi: resume a quiesced device"
>   - wake up freeze queue in setting dying for both blk-mq and legacy
>   - rename blk_mq_[freeze|unfreeze]_queue() in one patch
>   - rename .mq_freeze_wq and .mq_freeze_depth
>   - improve comment
> 
> V3:
>   - introduce q->preempt_unfreezing to fix one bug of preempt freeze
>   - call blk_queue_enter_live() only when queue is preempt frozen
>   - cleanup a bit on the implementation of preempt freeze
>   - only patch 6 and 7 are changed
> 
> V2:
>   - drop the 1st patch in V1 because percpu_ref_is_dying() is
>   enough as pointed by Tejun
>   - introduce preempt version of blk_[freeze|unfreeze]_queue
>   - sync between preempt freeze and normal freeze
>   - fix warning from percpu-refcount as reported by Oleksandr
> 
> 
> [1] https://marc.info/?t=150340250100013&r=3&w=2
> 
> 
> Thanks,
> Ming
> 
> 
> Ming Lei (10):
>   blk-mq: only run hw queues for blk-mq
>   block: tracking request allocation with q_usage_counter
>   blk-mq: rename blk_mq_[freeze|unfreeze]_queue
>   blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
>   block: rename .mq_freeze_wq and .mq_freeze_depth
>   block: pass flags to blk_queue_enter()
>   block: introduce preempt version of blk_[freeze|unfreeze]_queue
>   block: allow to allocate req with RQF_PREEMPT when queue is preempt
> frozen
>   SCSI: transport_spi: resume a quiesced device
>   SCSI: preempt freeze block queue when SCSI device is put into quiesce
> 
>  block/bfq-iosched.c   |   2 +-
>  block/blk-cgroup.c|   8 +-
>  block/blk-core.c  |  95 
>  block/blk-mq.c| 180 --
>  block/blk-mq.h|   1 -
>  block/blk-timeout.c   |   2 +-
>  block/blk.h   |  12 +++
>  block/elevator.c  |   4 +-
>  drivers/block/loop.c  |  24 ++---
>  drivers/block/rbd.c   |   2 +-
>  drivers/nvme/host/core.c  |   8 +-
>  drivers/scsi/scsi_lib.c   |  25 +-
>  drivers/scsi/scsi_transport_spi.c |   3 +
>  fs/block_dev.c|   4 +-
>  include/linux/blk-mq.h|  15 ++--
>  include/linux/blkdev.h|  32 +--
>  16 files changed, 313 insertions(+), 104 deletions(-)




[PATCH] scsi: acornscsi: fix build error

2017-09-11 Thread Arnd Bergmann
A cleanup patch introduced a fatal typo via imbalanced curly
braces:

drivers/scsi/arm/acornscsi.c: In function 'acornscsi_host_reset':
drivers/scsi/arm/acornscsi.c:2773:1: error: ISO C90 forbids mixed declarations 
and code [-Werror=declaration-after-statement]
drivers/scsi/arm/acornscsi.c:2795:12: error: invalid storage class for function 
'acornscsi_show_info'
 static int acornscsi_show_info(struct seq_file *m, struct Scsi_Host *instance)

The same patch incorrectly changed the argument type of the reset
handler, as shown by this warning:

drivers/scsi/arm/acornscsi.c:2888:27: error: initialization of 'int (*)(struct 
scsi_cmnd *)' from incompatible pointer type 'int (*)(struct Scsi_Host *)' 
[-Werror=incompatible-pointer-types]
  .eh_host_reset_handler = acornscsi_host_reset,

This removes the extraneous opening brace and reverts the
argument type change.

Fixes: 74fa80ee3fae ("scsi: acornscsi: move bus reset to host reset")
Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/arm/acornscsi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/arm/acornscsi.c b/drivers/scsi/arm/acornscsi.c
index 690816f3c6af..8d54283a99b2 100644
--- a/drivers/scsi/arm/acornscsi.c
+++ b/drivers/scsi/arm/acornscsi.c
@@ -2725,9 +2725,9 @@ int acornscsi_abort(struct scsi_cmnd *SCpnt)
  * Params   : SCpnt  - command causing reset
  * Returns  : one of SCSI_RESET_ macros
  */
-int acornscsi_host_reset(struct Scsi_Host *shpnt)
+int acornscsi_host_reset(struct scsi_cmnd *SCpnt)
 {
-   AS_Host *host = (AS_Host *)shpnt->hostdata;
+   AS_Host *host = (AS_Host *)SCpnt->device->host->hostdata;
struct scsi_cmnd *SCptr;
 
 host->stats.resets += 1;
@@ -2741,7 +2741,7 @@ int acornscsi_host_reset(struct Scsi_Host *shpnt)
 
printk(KERN_WARNING "acornscsi_reset: ");
print_sbic_status(asr, ssr, host->scsi.phase);
-   for (devidx = 0; devidx < 9; devidx ++) {
+   for (devidx = 0; devidx < 9; devidx ++)
acornscsi_dumplog(host, devidx);
 }
 #endif
-- 
2.9.0



Re: [PATCH V4 08/10] block: allow to allocate req with RQF_PREEMPT when queue is preempt frozen

2017-09-11 Thread Bart Van Assche
On Mon, 2017-09-11 at 19:10 +0800, Ming Lei wrote:
> @@ -787,6 +787,35 @@ int blk_queue_enter(struct request_queue *q, unsigned 
> flags)
>   if (percpu_ref_tryget_live(&q->q_usage_counter))
>   return 0;
>  
> + /*
> +  * If queue is preempt frozen and the caller needs to allocate
> +  * a request with RQF_PREEMPT, we grab the .q_usage_counter
> +  * unconditionally and return successfully.
> +  *
> +  * There isn't a race with queue cleanup because:
> +  *
> +  * 1) it is guaranteed that preempt freeze can't be
> +  * started after queue is set as dying
> +  *
> +  * 2) normal freeze runs exclusively with preempt
> +  * freeze, so even after queue is set as dying
> +  * afterwards, blk_queue_cleanup() won't move on
> +  * until preempt freeze is done
> +  *
> +  * 3) blk_queue_dying() needn't be checked here
> +  *  - for the legacy path, it will be checked in
> +  *  __get_request()

For the legacy block layer core, what do you think will happen if the
"dying" state is set by another thread after __get_request() has passed the
blk_queue_dying() check?

> +  *  - blk-mq depends on the driver to handle dying well,
> +  *  because it is normal for the queue to be set as dying
> +  *  just between blk_queue_enter() and allocating a new
> +  *  request.

The above comment is not correct. The block layer core handles the "dying"
state. Block drivers other than dm-mpath should not have to query this state
directly.

> +  */
> + if ((flags & BLK_REQ_PREEMPT) &&
> + blk_queue_is_preempt_frozen(q)) {
> + blk_queue_enter_live(q);
> + return 0;
> + }
> +

Sorry but to me it looks like the above code introduces a race condition
between blk_queue_cleanup() and blk_get_request() for at least blk-mq.
Consider e.g. the following scenario:
* A first thread preempt-freezes a request queue.
* A second thread calls blk_get_request() with BLK_REQ_PREEMPT set. That
  results in a call of blk_queue_is_preempt_frozen().
* A context switch occurs to the first thread.
* The first thread preempt-unfreezes the same request queue and calls
  blk_queue_cleanup(). That last function changes the request queue state
  into DYING and waits until all pending requests have finished.
* The second thread continues and calls blk_queue_enter_live(), allocates
  a request and submits it.

In other words, a request gets submitted against a dying queue. This must
not happen. See also my explanation of queue shutdown from a few days ago
(https://marc.info/?l=linux-block&m=150449845831789).

Bart.

Re: [man-pages PATCH] cciss.4, hpsa.4: mention cciss removal in Linux 4.13

2017-09-11 Thread Michael Kerrisk (man-pages)
Hi Eugene,

On 11 September 2017 at 14:36, Eugene Syromyatnikov  wrote:
> On Mon, Sep 11, 2017 at 8:58 AM, Meelis Roos  wrote:
>> On Mon, 11 Sep 2017, Eugene Syromyatnikov wrote:
>>
>>> During the Linux 4.13 development cycle, the cciss driver was removed
>>> in favor of the hpsa driver, which has been amended with some legacy
>>> board support.
>>
>> It's gone in 4.14 - 4.13 works the old way.
>
> Uhh, my bad, sorry: forgot to add --contains to git describe. I'll
> resend the patch.

Too late. I'll fix manually.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


RE: [man-pages PATCH v2] cciss.4, hpsa.4: mention cciss removal in Linux 4.14

2017-09-11 Thread Don Brace
> -Original Message-
> From: Eugene Syromyatnikov [mailto:evg...@gmail.com]
> Sent: Monday, September 11, 2017 7:48 AM
> To: mtk.manpa...@gmail.com
> Cc: linux-...@vger.kernel.org; Hannes Reinecke ; Martin
> K. Petersen ; Christoph Hellwig
> ; James Bottomley
> ; Don Brace
> ; Meelis Roos ; linux-
> s...@vger.kernel.org
> Subject: [man-pages PATCH v2] cciss.4, hpsa.4: mention cciss removal in
> Linux 4.14
> 
> EXTERNAL EMAIL
> 
> 
> During the Linux 4.14 development cycle, the cciss driver was removed
> in favor of the hpsa driver, which has been amended with some legacy
> board support.
> 
> * man4/cciss.4 (.SH DESCRIPTION): Mention driver removal.
> * man4/hpsa.4 (.SH DESCRIPTION): Mention the list of boards that are
> recognised since Linux 4.14.
> 
> Signed-off-by: Eugene Syromyatnikov 
> ---
>  man4/cciss.4 |  7 +++
>  man4/hpsa.4  | 26 ++
>  2 files changed, 33 insertions(+)

Acked-by: Don Brace 
Nice find.
Thanks for the patch.

Thanks,
Don Brace
ESC - Smart Storage
Microsemi Corporation


> 
> diff --git a/man4/cciss.4 b/man4/cciss.4
> index e6ba93d..4b543ba 100644
> --- a/man4/cciss.4
> +++ b/man4/cciss.4
> @@ -15,6 +15,13 @@ cciss \- HP Smart Array block driver
>  modprobe cciss [ cciss_allow_hpsa=1 ]
>  .fi
>  .SH DESCRIPTION
> +.\" commit 253d2464df446456c0bba5ed4137a7be0b278aa8
> +.BR Note :
> +This obsolete driver was removed from the kernel in version 4.14,
> +as it is superseded by
> +.BR hpsa (4)
> +driver in newer kernels.
> +.PP
>  .B cciss
>  is a block driver for older HP Smart Array RAID controllers.
>  .SS Options
> diff --git a/man4/hpsa.4 b/man4/hpsa.4
> index 63000bf..64f4536 100644
> --- a/man4/hpsa.4
> +++ b/man4/hpsa.4
> @@ -52,6 +52,32 @@ driver supports the following Smart Array boards:
>  Smart Array P711m
>  StorageWorks P1210m
>  .fi
> +.PP
> +.\" commit 135ae6edeb51979d0998daf1357f149a7d6ebb08
> +Since Linux 4.14, the following Smart Array boards are also supported:
> +.PP
> +.nf
> +Smart Array 5300
> +Smart Array 5312
> +Smart Array 532
> +Smart Array 5i
> +Smart Array 6400
> +Smart Array 6400 EM
> +Smart Array 641
> +Smart Array 642
> +Smart Array 6i
> +Smart Array E200
> +Smart Array E200i
> +Smart Array E200i
> +Smart Array E200i
> +Smart Array E200i
> +Smart Array E500
> +Smart Array P400
> +Smart Array P400i
> +Smart Array P600
> +Smart Array P700m
> +Smart Array P800
> +.fi
>  .SS Configuration details
>  To configure HP Smart Array controllers,
>  use the HP Array Configuration Utility (either
> --
> 2.1.4



[PATCH] scsi: qla2xxx: remove unnecessary call to memset

2017-09-11 Thread Himanshu Jha
A call to memset() that zeroes memory immediately after allocating it
with kzalloc() is unnecessary, as kzalloc() returns memory that is
already zero-filled.

Semantic patch used to resolve this issue:

@@
expression e,e2; constant c;
statement S;
@@

  e = kzalloc(e2, c);
  if(e == NULL) S
- memset(e, 0, e2);
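
For clarity, the redundant pattern this matches looks like the
following minimal sketch (the actual qla2xxx call site is in the diff
below):

	struct qla_qpair *qpair;

	qpair = kzalloc(sizeof(struct qla_qpair), GFP_KERNEL);
	if (qpair == NULL)
		return NULL;
	memset(qpair, 0, sizeof(struct qla_qpair));	/* redundant: already zeroed */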

Signed-off-by: Himanshu Jha 
---
 drivers/scsi/qla2xxx/qla_init.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c
index b5b48dd..54c1d63 100644
--- a/drivers/scsi/qla2xxx/qla_init.c
+++ b/drivers/scsi/qla2xxx/qla_init.c
@@ -7917,7 +7917,6 @@ struct qla_qpair *qla2xxx_create_qpair(struct 
scsi_qla_host *vha, int qos,
"Failed to allocate memory for queue pair.\n");
return NULL;
}
-   memset(qpair, 0, sizeof(struct qla_qpair));
 
qpair->hw = vha->hw;
qpair->vha = vha;
-- 
2.7.4



[man-pages PATCH v2] cciss.4, hpsa.4: mention cciss removal in Linux 4.14

2017-09-11 Thread Eugene Syromyatnikov
During the Linux 4.14 development cycle, the cciss driver was removed
in favor of the hpsa driver, which has been amended with some legacy
board support.

* man4/cciss.4 (.SH DESCRIPTION): Mention driver removal.
* man4/hpsa.4 (.SH DESCRIPTION): Mention the list of boards that are
recognised since Linux 4.14.

Signed-off-by: Eugene Syromyatnikov 
---
 man4/cciss.4 |  7 +++
 man4/hpsa.4  | 26 ++
 2 files changed, 33 insertions(+)

diff --git a/man4/cciss.4 b/man4/cciss.4
index e6ba93d..4b543ba 100644
--- a/man4/cciss.4
+++ b/man4/cciss.4
@@ -15,6 +15,13 @@ cciss \- HP Smart Array block driver
 modprobe cciss [ cciss_allow_hpsa=1 ]
 .fi
 .SH DESCRIPTION
+.\" commit 253d2464df446456c0bba5ed4137a7be0b278aa8
+.BR Note :
+This obsolete driver was removed from the kernel in version 4.14,
+as it is superseded by
+.BR hpsa (4)
+driver in newer kernels.
+.PP
 .B cciss
 is a block driver for older HP Smart Array RAID controllers.
 .SS Options
diff --git a/man4/hpsa.4 b/man4/hpsa.4
index 63000bf..64f4536 100644
--- a/man4/hpsa.4
+++ b/man4/hpsa.4
@@ -52,6 +52,32 @@ driver supports the following Smart Array boards:
 Smart Array P711m
 StorageWorks P1210m
 .fi
+.PP
+.\" commit 135ae6edeb51979d0998daf1357f149a7d6ebb08
+Since Linux 4.14, the following Smart Array boards are also supported:
+.PP
+.nf
+Smart Array 5300
+Smart Array 5312
+Smart Array 532
+Smart Array 5i
+Smart Array 6400
+Smart Array 6400 EM
+Smart Array 641
+Smart Array 642
+Smart Array 6i
+Smart Array E200
+Smart Array E200i
+Smart Array E200i
+Smart Array E200i
+Smart Array E200i
+Smart Array E500
+Smart Array P400
+Smart Array P400i
+Smart Array P600
+Smart Array P700m
+Smart Array P800
+.fi
 .SS Configuration details
 To configure HP Smart Array controllers,
 use the HP Array Configuration Utility (either
-- 
2.1.4



Re: [man-pages PATCH] cciss.4, hpsa.4: mention cciss removal in Linux 4.13

2017-09-11 Thread Eugene Syromyatnikov
On Mon, Sep 11, 2017 at 8:58 AM, Meelis Roos  wrote:
> On Mon, 11 Sep 2017, Eugene Syromyatnikov wrote:
>
>> During the Linux 4.13 development cycle, the cciss driver was removed
>> in favor of the hpsa driver, which has been amended with some legacy
>> board support.
>
> It's gone in 4.14 - 4.13 works the old way.

Uhh, my bad, sorry: forgot to add --contains to git describe. I'll
resend the patch.

-- 
Eugene Syromyatnikov
mailto:evg...@gmail.com
xmpp:esyr@jabber.{ru|org}


Re: [PATCH V2 00/12] scsi-mq support for ZBC disks

2017-09-11 Thread Christoph Hellwig
On Fri, Sep 08, 2017 at 09:12:12AM -0700, Damien Le Moal wrote:
> 1) The zone size and the number of zones of the device (for the bitmaps
> allocation and offset->zone number conversion).
> 2) Zone type for the optimization that avoids locking conventional zones.
> 
> (2) is optional. We can do without it, but it is still really nice to
> have from a performance perspective, as conventional zones tend to be
> used for storing metadata. So a lot of small random writes are more
> likely, and high queue depth writing would improve performance significantly.
> 
> For (1), the zone size is known through q->limits.chunk_sectors. But the
> disk capacity is not known using request_queue only, so the number of
> zones cannot be calculated... I thought of exporting it through queue
> limits too, but then stacking of device mappers using ZBC drives becomes
> a pain as the number of zones needs to be recalculated.

For 4.14-rc+ you should be able to easily get at the gendisk, as the
whole submission path is now gendisk based, although it might need
some minor argument reshuffling to pass it instead of the request_queue
in a few places.  Note that everywhere the request is passed you can
get the gendisk from the request, as it contains a pointer.

The only annoying issue is that some of our passthrough character
device callers don't have a gendisk at all, which causes
problems all over and should probably be fixed at some point.
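
A hedged sketch of what that looks like with the 4.14-rc interfaces
(the helper my_nr_zones() is made up; the accessors are the ones
discussed above):

	#include <linux/blkdev.h>
	#include <linux/genhd.h>
	#include <linux/log2.h>

	/* number of zones = disk capacity / zone size, both in sectors */
	static unsigned int my_nr_zones(struct request *rq)
	{
		struct gendisk *disk = rq->rq_disk;	/* request carries the disk */
		sector_t capacity = get_capacity(disk);
		sector_t zone_sectors = blk_queue_zone_sectors(disk->queue);

		if (!zone_sectors)
			return 0;
		/* ZBC zone sizes are a power of two, so shift instead of divide */
		return capacity >> ilog2(zone_sectors);
	}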


Re: 4.13+git: undefined references to bsg_setup_queue and bsg_job_done

2017-09-11 Thread Johannes Thumshirn
On Mon, Sep 11, 2017 at 02:58:59PM +0300, Meelis Roos wrote:
> Just went and changed kernel conf to HPSA instead of old CCISS but got a 
> compilation failure:
> 
> drivers/scsi/scsi_transport_sas.o: In function `sas_bsg_initialize':
> scsi_transport_sas.c:(.text+0x12fd): undefined reference to `bsg_setup_queue'
> scsi_transport_sas.c:(.text+0x13b2): undefined reference to `bsg_setup_queue'
> drivers/scsi/scsi_transport_sas.o: In function `sas_smp_dispatch':
> scsi_transport_sas.c:(.text+0x188e): undefined reference to `bsg_job_done'
> Makefile:1000: recipe for target 'vmlinux' failed
> make: *** [vmlinux] Error 1

There's a fix from Arnd [1] for this and it is in Martin's queue.

[1] 580b71e9f64e0e6e9063466fce4564c56156695d

Byte,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: 4.13+git: undefined references to bsg_setup_queue and bsg_job_done

2017-09-11 Thread Christoph Hellwig
On Mon, Sep 11, 2017 at 02:58:59PM +0300, Meelis Roos wrote:
> Just went and changed kernel conf to HPSA instead of old CCISS but got a 
> compilation failure:

Martin has already applied a fix for this from Arnd, but it doesn't
seem to have made it to Linus yet due to the usual detour
via James' tree.

The fix is here:

https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.14/scsi-queue=580b71e9f64e0e6e9063466fce4564c56156695d


4.13+git: undefined references to bsg_setup_queue and bsg_job_done

2017-09-11 Thread Meelis Roos
Just went and changed kernel conf to HPSA instead of old CCISS but got a 
compilation failure:

drivers/scsi/scsi_transport_sas.o: In function `sas_bsg_initialize':
scsi_transport_sas.c:(.text+0x12fd): undefined reference to `bsg_setup_queue'
scsi_transport_sas.c:(.text+0x13b2): undefined reference to `bsg_setup_queue'
drivers/scsi/scsi_transport_sas.o: In function `sas_smp_dispatch':
scsi_transport_sas.c:(.text+0x188e): undefined reference to `bsg_job_done'
Makefile:1000: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1


Config:
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.13.0 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_BPF is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_SOCK_CGROUP_DATA is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y

[PATCH V4 09/10] SCSI: transport_spi: resume a quiesced device

2017-09-11 Thread Ming Lei
We have to preempt-freeze the queue in scsi_device_quiesce()
and unfreeze it in scsi_device_resume(), so call scsi_device_resume()
for a device that was quiesced by scsi_device_quiesce().

Signed-off-by: Ming Lei 
---
 drivers/scsi/scsi_transport_spi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_transport_spi.c 
b/drivers/scsi/scsi_transport_spi.c
index d0219e36080c..bdecf99b6ab1 100644
--- a/drivers/scsi/scsi_transport_spi.c
+++ b/drivers/scsi/scsi_transport_spi.c
@@ -1040,6 +1040,9 @@ spi_dv_device(struct scsi_device *sdev)
 
scsi_target_resume(starget);
 
+   /* undo what scsi_device_quiesce() did */
+   scsi_device_resume(sdev);
+
spi_initial_dv(starget) = 1;
 
  out_free:
-- 
2.9.5



[PATCH V4 10/10] SCSI: preempt freeze block queue when SCSI device is put into quiesce

2017-09-11 Thread Ming Lei
Simply quiescing the SCSI device and waiting for completion of the I/O
dispatched to the SCSI queue isn't safe: it is easy to use up the
request pool, because none of the previously allocated requests can
be dispatched while the device is put in QUIESCE. Then no request
can be allocated for RQF_PREEMPT, and the system may hang somewhere,
such as when sending sync_cache or start_stop commands during the
system suspend path.

Before quiescing SCSI, this patch freezes the block queue in preempt
mode first, so no new normal request can enter the queue any more,
and all pending requests are drained once blk_freeze_queue_preempt()
returns. Then RQF_PREEMPT requests can be allocated successfully
during the preempt freeze.

Signed-off-by: Ming Lei 
---
 drivers/scsi/scsi_lib.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..751a956b7b2b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -252,9 +252,10 @@ int scsi_execute(struct scsi_device *sdev, const unsigned 
char *cmd,
struct scsi_request *rq;
int ret = DRIVER_ERROR << 24;
 
-   req = blk_get_request(sdev->request_queue,
+   req = __blk_get_request(sdev->request_queue,
data_direction == DMA_TO_DEVICE ?
-   REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+   REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
+   BLK_REQ_PREEMPT);
if (IS_ERR(req))
return ret;
rq = scsi_req(req);
@@ -2928,12 +2929,28 @@ scsi_device_quiesce(struct scsi_device *sdev)
 {
int err;
 
+   /*
+* Simply quiescing the SCSI device isn't safe, as it is easy
+* to use up requests because all these allocated requests
+* can't be dispatched while the device is put in QUIESCE.
+* Then no request can be allocated and we may hang
+* somewhere, such as during system suspend/resume.
+*
+* So we freeze the block queue in preempt mode first; no new
+* normal request can enter the queue any more, and all pending
+* requests are drained once blk_freeze_queue_preempt()
+* returns. Only RQF_PREEMPT is allowed in preempt freeze.
+*/
+   blk_freeze_queue_preempt(sdev->request_queue);
+
mutex_lock(&sdev->state_mutex);
err = scsi_device_set_state(sdev, SDEV_QUIESCE);
mutex_unlock(&sdev->state_mutex);
 
-   if (err)
+   if (err) {
+   blk_unfreeze_queue_preempt(sdev->request_queue);
return err;
+   }
 
scsi_run_queue(sdev->request_queue);
while (atomic_read(&sdev->device_busy)) {
@@ -2964,6 +2981,8 @@ void scsi_device_resume(struct scsi_device *sdev)
scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
scsi_run_queue(sdev->request_queue);
mutex_unlock(&sdev->state_mutex);
+
+   blk_unfreeze_queue_preempt(sdev->request_queue);
 }
 EXPORT_SYMBOL(scsi_device_resume);
 
-- 
2.9.5



[PATCH V4 08/10] block: allow to allocate req with RQF_PREEMPT when queue is preempt frozen

2017-09-11 Thread Ming Lei
RQF_PREEMPT is a bit special because the request is required
to be dispatched to the LLD even when the SCSI device is quiesced.

So this patch introduces __blk_get_request() to allow the block
layer to allocate a request while the queue is preempt frozen, since
we will preempt-freeze the queue before quiescing the SCSI device in
the following patch, for supporting safe SCSI quiescing.
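
For illustration, a caller that must make progress during quiesce would
allocate like this (a sketch mirroring the scsi_execute() change in
patch 10 of this series):

	struct request *req;

	/* BLK_REQ_PREEMPT lets the allocation proceed while the queue
	 * is preempt frozen; a plain blk_get_request() would not. */
	req = __blk_get_request(sdev->request_queue, REQ_OP_SCSI_IN,
				__GFP_RECLAIM, BLK_REQ_PREEMPT);
	if (IS_ERR(req))
		return PTR_ERR(req);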

Signed-off-by: Ming Lei 
---
 block/blk-core.c   | 48 ++--
 block/blk-mq.c |  3 +--
 include/linux/blk-mq.h |  7 ---
 include/linux/blkdev.h | 17 ++---
 4 files changed, 57 insertions(+), 18 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ade9b5484a6e..1c8e264753f0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -787,6 +787,35 @@ int blk_queue_enter(struct request_queue *q, unsigned 
flags)
if (percpu_ref_tryget_live(&q->q_usage_counter))
return 0;
 
+   /*
+* If queue is preempt frozen and the caller needs to allocate
+* a request with RQF_PREEMPT, we grab the .q_usage_counter
+* unconditionally and return successfully.
+*
+* There isn't a race with queue cleanup because:
+*
+* 1) it is guaranteed that preempt freeze can't be
+* started after queue is set as dying
+*
+* 2) normal freeze runs exclusively with preempt
+* freeze, so even after queue is set as dying
+* afterwards, blk_queue_cleanup() won't move on
+* until preempt freeze is done
+*
+* 3) blk_queue_dying() needn't be checked here
+*  - for the legacy path, it will be checked in
+*  __get_request()
+*  - blk-mq depends on the driver to handle dying well,
+*  because it is normal for the queue to be set as dying
+*  just between blk_queue_enter() and allocating a new
+*  request.
+*/
+   if ((flags & BLK_REQ_PREEMPT) &&
+   blk_queue_is_preempt_frozen(q)) {
+   blk_queue_enter_live(q);
+   return 0;
+   }
+
if (flags & BLK_REQ_NOWAIT)
return -EBUSY;
 
@@ -1410,7 +1439,8 @@ static struct request *get_request(struct request_queue 
*q, unsigned int op,
 }
 
 static struct request *blk_old_get_request(struct request_queue *q,
-  unsigned int op, gfp_t gfp_mask)
+  unsigned int op, gfp_t gfp_mask,
+  unsigned int flags)
 {
struct request *rq;
int ret = 0;
@@ -1420,8 +1450,7 @@ static struct request *blk_old_get_request(struct 
request_queue *q,
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
 
-   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ?
-   BLK_REQ_NOWAIT : 0);
+   ret = blk_queue_enter(q, flags & BLK_REQ_BITS_MASK);
if (ret)
return ERR_PTR(ret);
spin_lock_irq(q->queue_lock);
@@ -1439,26 +1468,25 @@ static struct request *blk_old_get_request(struct 
request_queue *q,
return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
-   gfp_t gfp_mask)
+struct request *__blk_get_request(struct request_queue *q, unsigned int op,
+ gfp_t gfp_mask, unsigned int flags)
 {
struct request *req;
 
+   flags |= gfp_mask & __GFP_DIRECT_RECLAIM ? 0 : BLK_REQ_NOWAIT;
if (q->mq_ops) {
-   req = blk_mq_alloc_request(q, op,
-   (gfp_mask & __GFP_DIRECT_RECLAIM) ?
-   0 : BLK_MQ_REQ_NOWAIT);
+   req = blk_mq_alloc_request(q, op, flags);
if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
q->mq_ops->initialize_rq_fn(req);
} else {
-   req = blk_old_get_request(q, op, gfp_mask);
+   req = blk_old_get_request(q, op, gfp_mask, flags);
if (!IS_ERR(req) && q->initialize_rq_fn)
q->initialize_rq_fn(req);
}
 
return req;
 }
-EXPORT_SYMBOL(blk_get_request);
+EXPORT_SYMBOL(__blk_get_request);
 
 /**
  * blk_requeue_request - put a request back on queue
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 096c5f0ea518..720559724f97 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -482,8 +482,7 @@ struct request *blk_mq_alloc_request(struct request_queue 
*q, unsigned int op,
struct request *rq;
int ret;
 
-   ret = blk_queue_enter(q, (flags & BLK_MQ_REQ_NOWAIT) ?
-   BLK_REQ_NOWAIT : 0);
+ 

[PATCH V4 07/10] block: introduce preempt version of blk_[freeze|unfreeze]_queue

2017-09-11 Thread Ming Lei
The two APIs are required to allow allocation of RQF_PREEMPT
requests while the queue is preempt frozen.

We have to guarantee that normal freeze and preempt freeze
run exclusively, because for normal freezing, once
blk_freeze_queue_wait() returns, no request can enter the
queue any more.

Another issue we should pay attention to is the race between
preempt freeze and blk_cleanup_queue(); it is avoided by not
allowing preempt freeze after the queue becomes dying,
otherwise preempt freeze may hang forever.
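
As a sketch of the intended semantics (the V4 notes mention nested
preempt freeze support for the SCSI spi transport; freeze_depth makes
each kind nestable, while the two kinds exclude each other):

	blk_freeze_queue_preempt(q);	/* outer, e.g. spi_dv_device() */
	blk_freeze_queue_preempt(q);	/* inner one just bumps freeze_depth */
	/* ... issue RQF_PREEMPT requests ... */
	blk_unfreeze_queue_preempt(q);
	blk_unfreeze_queue_preempt(q);	/* last one reinits q_usage_counter */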

Signed-off-by: Ming Lei 
---
 block/blk-core.c   |   2 +
 block/blk-mq.c | 133 +++--
 block/blk.h|  11 
 include/linux/blk-mq.h |   2 +
 include/linux/blkdev.h |   6 +++
 5 files changed, 140 insertions(+), 14 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 04327a60061e..ade9b5484a6e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -905,6 +905,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
if (blkcg_init_queue(q))
goto fail_ref;
 
+   spin_lock_init(&q->freeze_lock);
+
return q;
 
 fail_ref:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 358b2ca33010..096c5f0ea518 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -118,19 +118,6 @@ void blk_mq_in_flight(struct request_queue *q, struct 
hd_struct *part,
blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight, &mi);
 }
 
-void blk_freeze_queue_start(struct request_queue *q)
-{
-   int freeze_depth;
-
-   freeze_depth = atomic_inc_return(&q->freeze_depth);
-   if (freeze_depth == 1) {
-   percpu_ref_kill(&q->q_usage_counter);
-   if (q->mq_ops)
-   blk_mq_run_hw_queues(q, false);
-   }
-}
-EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-
 void blk_freeze_queue_wait(struct request_queue *q)
 {
if (!q->mq_ops)
@@ -148,6 +135,69 @@ int blk_mq_freeze_queue_wait_timeout(struct request_queue 
*q,
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
 
+static bool queue_freeze_is_over(struct request_queue *q,
+   bool preempt, bool *queue_dying)
+{
+   /*
+* preempt freeze has to be prevented after queue is set as
+* dying, otherwise we may hang forever
+*/
+   if (preempt) {
+   spin_lock_irq(q->queue_lock);
+   *queue_dying = !!blk_queue_dying(q);
+   spin_unlock_irq(q->queue_lock);
+
+   return !q->normal_freezing || *queue_dying;
+   }
+   return !q->preempt_freezing;
+}
+
+static void __blk_freeze_queue_start(struct request_queue *q, bool preempt)
+{
+   int freeze_depth;
+   bool queue_dying;
+
+   /*
+* Make sure normal freeze and preempt freeze are run
+* exclusively, but each kind itself is allowed to be
+* run concurrently, even nested.
+*/
+   spin_lock(&q->freeze_lock);
+   wait_event_cmd(q->freeze_wq,
+  queue_freeze_is_over(q, preempt, &queue_dying),
+  spin_unlock(&q->freeze_lock),
+  spin_lock(&q->freeze_lock));
+
+   if (preempt && queue_dying)
+   goto unlock;
+
+   freeze_depth = atomic_inc_return(&q->freeze_depth);
+   if (freeze_depth == 1) {
+   if (preempt) {
+   q->preempt_freezing = 1;
+   q->preempt_unfreezing = 0;
+   } else
+   q->normal_freezing = 1;
+   spin_unlock(&q->freeze_lock);
+
+   percpu_ref_kill(&q->q_usage_counter);
+   if (q->mq_ops)
+   blk_mq_run_hw_queues(q, false);
+
+   /* have to drain I/O here for preempt quiesce */
+   if (preempt)
+   blk_freeze_queue_wait(q);
+   } else
+ unlock:
+   spin_unlock(&q->freeze_lock);
+}
+
+void blk_freeze_queue_start(struct request_queue *q)
+{
+   __blk_freeze_queue_start(q, false);
+}
+EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
+
 /*
  * Guarantee no request is in use, so we can change any data structure of
  * the queue afterward.
@@ -166,20 +216,75 @@ void blk_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
-void blk_unfreeze_queue(struct request_queue *q)
+static void blk_start_unfreeze_queue_preempt(struct request_queue *q)
+{
+   /* no new request can be coming after unfreezing */
+   spin_lock(&q->freeze_lock);
+   q->preempt_unfreezing = 1;
+   spin_unlock(&q->freeze_lock);
+
+   blk_freeze_queue_wait(q);
+}
+
+static void __blk_unfreeze_queue(struct request_queue *q, bool preempt)
 {
int freeze_depth;
 
freeze_depth = atomic_dec_return(&q->freeze_depth);
WARN_ON_ONCE(freeze_depth < 0);
if (!freeze_depth) {
+   if (preempt)
+   blk_start_unfreeze_queue_preempt(q);
+

[PATCH V4 05/10] block: rename .mq_freeze_wq and .mq_freeze_depth

2017-09-11 Thread Ming Lei
Both are used for legacy and blk-mq, so rename them
to .freeze_wq and .freeze_depth to avoid confusing
people.

No functional change.

Signed-off-by: Ming Lei 
---
 block/blk-core.c   | 12 ++--
 block/blk-mq.c | 12 ++--
 include/linux/blkdev.h |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 2347107eeca4..eec5881a9e74 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -630,7 +630,7 @@ void blk_set_queue_dying(struct request_queue *q)
 * We need to ensure that processes currently waiting on
 * the queue are notified as well.
 */
-   wake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->freeze_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
@@ -793,14 +793,14 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
/*
 * read pair of barrier in blk_freeze_queue_start(),
 * we need to order reading __PERCPU_REF_DEAD flag of
-* .q_usage_counter and reading .mq_freeze_depth or
+* .q_usage_counter and reading .freeze_depth or
 * queue dying flag, otherwise the following wait may
 * never return if the two reads are reordered.
 */
smp_rmb();
 
-   ret = wait_event_interruptible(q->mq_freeze_wq,
-   !atomic_read(&q->mq_freeze_depth) ||
+   ret = wait_event_interruptible(q->freeze_wq,
+   !atomic_read(&q->freeze_depth) ||
blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
@@ -819,7 +819,7 @@ static void blk_queue_usage_counter_release(struct 
percpu_ref *ref)
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   wake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->freeze_wq);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -891,7 +891,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
q->bypass_depth = 1;
__set_bit(QUEUE_FLAG_BYPASS, >queue_flags);
 
-   init_waitqueue_head(&q->mq_freeze_wq);
+   init_waitqueue_head(&q->freeze_wq);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 345943fea998..205ae2d3da14 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -122,7 +122,7 @@ void blk_freeze_queue_start(struct request_queue *q)
 {
int freeze_depth;
 
-   freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
+   freeze_depth = atomic_inc_return(&q->freeze_depth);
if (freeze_depth == 1) {
percpu_ref_kill(&q->q_usage_counter);
if (q->mq_ops)
@@ -135,14 +135,14 @@ void blk_freeze_queue_wait(struct request_queue *q)
 {
if (!q->mq_ops)
blk_drain_queue(q);
-   wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
+   wait_event(q->freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_wait);
 
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 unsigned long timeout)
 {
-   return wait_event_timeout(q->mq_freeze_wq,
+   return wait_event_timeout(q->freeze_wq,
percpu_ref_is_zero(&q->q_usage_counter),
timeout);
 }
@@ -170,11 +170,11 @@ void blk_unfreeze_queue(struct request_queue *q)
 {
int freeze_depth;
 
-   freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
+   freeze_depth = atomic_dec_return(&q->freeze_depth);
WARN_ON_ONCE(freeze_depth < 0);
if (!freeze_depth) {
percpu_ref_reinit(&q->q_usage_counter);
-   wake_up_all(&q->mq_freeze_wq);
+   wake_up_all(&q->freeze_wq);
}
 }
 EXPORT_SYMBOL_GPL(blk_unfreeze_queue);
@@ -2424,7 +2424,7 @@ void blk_mq_free_queue(struct request_queue *q)
 /* Basically redo blk_mq_init_queue with queue frozen */
 static void blk_mq_queue_reinit(struct request_queue *q)
 {
-   WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
+   WARN_ON_ONCE(!atomic_read(&q->freeze_depth));
 
blk_mq_debugfs_unregister_hctxs(q);
blk_mq_sysfs_unregister(q);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 460294bb0fa5..b8053bcc6b5f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -564,7 +564,7 @@ struct request_queue {
struct mutexsysfs_lock;
 
int bypass_depth;
-   atomic_tmq_freeze_depth;
+   atomic_tfreeze_depth;
 
 #if defined(CONFIG_BLK_DEV_BSG)
bsg_job_fn  *bsg_job_fn;
@@ -576,7 +576,7 @@ struct request_queue {
struct throtl_data *td;
 #endif
struct 

[PATCH V4 06/10] block: pass flags to blk_queue_enter()

2017-09-11 Thread Ming Lei
We need to pass PREEMPT flags to blk_queue_enter()
for allocating requests with RQF_PREEMPT in the
following patch.

Signed-off-by: Ming Lei 
---
 block/blk-core.c   | 10 ++
 block/blk-mq.c |  5 +++--
 block/blk-timeout.c|  2 +-
 fs/block_dev.c |  4 ++--
 include/linux/blkdev.h |  7 ++-
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index eec5881a9e74..04327a60061e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -779,7 +779,7 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
-int blk_queue_enter(struct request_queue *q, bool nowait)
+int blk_queue_enter(struct request_queue *q, unsigned flags)
 {
while (true) {
int ret;
@@ -787,7 +787,7 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
if (percpu_ref_tryget_live(&q->q_usage_counter))
return 0;
 
-   if (nowait)
+   if (flags & BLK_REQ_NOWAIT)
return -EBUSY;
 
/*
@@ -1418,7 +1418,8 @@ static struct request *blk_old_get_request(struct 
request_queue *q,
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
 
-   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ?
+   BLK_REQ_NOWAIT : 0);
if (ret)
return ERR_PTR(ret);
spin_lock_irq(q->queue_lock);
@@ -2225,7 +2226,8 @@ blk_qc_t generic_make_request(struct bio *bio)
do {
struct request_queue *q = bio->bi_disk->queue;
 
-   if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
+   if (likely(blk_queue_enter(q, (bio->bi_opf & REQ_NOWAIT) ?
+   BLK_REQ_NOWAIT : 0) == 0)) {
struct bio_list lower, same;
 
/* Create a fresh bio_list for all subordinate requests 
*/
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 205ae2d3da14..358b2ca33010 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -377,7 +377,8 @@ struct request *blk_mq_alloc_request(struct request_queue 
*q, unsigned int op,
struct request *rq;
int ret;
 
-   ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+   ret = blk_queue_enter(q, (flags & BLK_MQ_REQ_NOWAIT) ?
+   BLK_REQ_NOWAIT : 0);
if (ret)
return ERR_PTR(ret);
 
@@ -416,7 +417,7 @@ struct request *blk_mq_alloc_request_hctx(struct 
request_queue *q,
if (hctx_idx >= q->nr_hw_queues)
return ERR_PTR(-EIO);
 
-   ret = blk_queue_enter(q, true);
+   ret = blk_queue_enter(q, BLK_REQ_NOWAIT);
if (ret)
return ERR_PTR(ret);
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 17ec83bb0900..e803106a5e5b 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -134,7 +134,7 @@ void blk_timeout_work(struct work_struct *work)
struct request *rq, *tmp;
int next_set = 0;
 
-   if (blk_queue_enter(q, true))
+   if (blk_queue_enter(q, BLK_REQ_NOWAIT))
return;
spin_lock_irqsave(q->queue_lock, flags);
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index bb715b2fcfb8..44e6502e377e 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -674,7 +674,7 @@ int bdev_read_page(struct block_device *bdev, sector_t 
sector,
if (!ops->rw_page || bdev_get_integrity(bdev))
return result;
 
-   result = blk_queue_enter(bdev->bd_queue, false);
+   result = blk_queue_enter(bdev->bd_queue, 0);
if (result)
return result;
result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false);
@@ -710,7 +710,7 @@ int bdev_write_page(struct block_device *bdev, sector_t 
sector,
 
if (!ops->rw_page || bdev_get_integrity(bdev))
return -EOPNOTSUPP;
-   result = blk_queue_enter(bdev->bd_queue, false);
+   result = blk_queue_enter(bdev->bd_queue, 0);
if (result)
return result;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b8053bcc6b5f..54450715915b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -857,6 +857,11 @@ enum {
BLKPREP_INVALID,/* invalid command, kill, return -EREMOTEIO */
 };
 
+/* passed to blk_queue_enter */
+enum {
+   BLK_REQ_NOWAIT = (1 << 0),
+};
+
 extern unsigned long blk_max_low_pfn, blk_max_pfn;
 
 /*
@@ -962,7 +967,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct 
gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 struct scsi_ioctl_command __user *);
 
-extern int blk_queue_enter(struct request_queue *q, bool nowait);
+extern int blk_queue_enter(struct request_queue *q, 

[PATCH V4 04/10] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait

2017-09-11 Thread Ming Lei
The only change on the legacy side is that blk_drain_queue() is run
from blk_freeze_queue(), which is called in blk_cleanup_queue().

So this patch removes the explicit call to __blk_drain_queue() in
blk_cleanup_queue().

Signed-off-by: Ming Lei 
---
 block/blk-core.c | 17 +++--
 block/blk-mq.c   |  8 +---
 block/blk.h  |  1 +
 drivers/nvme/host/core.c |  2 +-
 include/linux/blk-mq.h   |  2 +-
 5 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7e436ee04e08..2347107eeca4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -530,6 +530,21 @@ static void __blk_drain_queue(struct request_queue *q, 
bool drain_all)
 }
 
 /**
+ * blk_drain_queue - drain requests from request_queue
+ * @q: queue to drain
+ *
+ * Drain requests from @q.  All pending requests are drained.
+ * The caller is responsible for ensuring that no new requests
+ * which need to be drained are queued.
+ */
+void blk_drain_queue(struct request_queue *q)
+{
+   spin_lock_irq(q->queue_lock);
+   __blk_drain_queue(q, true);
+   spin_unlock_irq(q->queue_lock);
+}
+
+/**
  * blk_queue_bypass_start - enter queue bypass mode
  * @q: queue of interest
  *
@@ -659,8 +674,6 @@ void blk_cleanup_queue(struct request_queue *q)
 */
blk_freeze_queue(q);
spin_lock_irq(lock);
-   if (!q->mq_ops)
-   __blk_drain_queue(q, true);
queue_flag_set(QUEUE_FLAG_DEAD, q);
spin_unlock_irq(lock);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aacf47f15b9a..345943fea998 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -131,11 +131,13 @@ void blk_freeze_queue_start(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
-void blk_mq_freeze_queue_wait(struct request_queue *q)
+void blk_freeze_queue_wait(struct request_queue *q)
 {
+   if (!q->mq_ops)
+   blk_drain_queue(q);
wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
-EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
+EXPORT_SYMBOL_GPL(blk_freeze_queue_wait);
 
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 unsigned long timeout)
@@ -160,7 +162,7 @@ void blk_freeze_queue(struct request_queue *q)
 * exported to drivers as the only user for unfreeze is blk_mq.
 */
blk_freeze_queue_start(q);
-   blk_mq_freeze_queue_wait(q);
+   blk_freeze_queue_wait(q);
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
diff --git a/block/blk.h b/block/blk.h
index fcb9775b997d..21eed59d96db 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -64,6 +64,7 @@ void blk_rq_bio_prep(struct request_queue *q, struct request 
*rq,
struct bio *bio);
 void blk_queue_bypass_start(struct request_queue *q);
 void blk_queue_bypass_end(struct request_queue *q);
+void blk_drain_queue(struct request_queue *q);
 void __blk_queue_free_tags(struct request_queue *q);
 void blk_freeze_queue(struct request_queue *q);
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 62808c4536f5..8506f7402a8d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2904,7 +2904,7 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
 
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list)
-   blk_mq_freeze_queue_wait(ns->queue);
+   blk_freeze_queue_wait(ns->queue);
mutex_unlock(&ctrl->namespaces_mutex);
 }
 EXPORT_SYMBOL_GPL(nvme_wait_freeze);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 355d74507656..62c3d1f7d12a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -256,7 +256,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_freeze_queue(struct request_queue *q);
 void blk_unfreeze_queue(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
-void blk_mq_freeze_queue_wait(struct request_queue *q);
+void blk_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 unsigned long timeout);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set,
-- 
2.9.5



[PATCH V4 03/10] blk-mq: rename blk_mq_[freeze|unfreeze]_queue

2017-09-11 Thread Ming Lei
We will support freezing the queue on the block legacy path too.

No functional change.

Signed-off-by: Ming Lei 
---
 block/bfq-iosched.c  |  2 +-
 block/blk-cgroup.c   |  8 
 block/blk-mq.c   | 27 +--
 block/blk-mq.h   |  1 -
 block/elevator.c |  4 ++--
 drivers/block/loop.c | 24 
 drivers/block/rbd.c  |  2 +-
 drivers/nvme/host/core.c |  6 +++---
 include/linux/blk-mq.h   |  4 ++--
 9 files changed, 34 insertions(+), 44 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a4783da90ba8..a18f36bfbdf0 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4758,7 +4758,7 @@ static int bfq_init_queue(struct request_queue *q, struct 
elevator_type *e)
 * The invocation of the next bfq_create_group_hierarchy
 * function is the head of a chain of function calls
 * (bfq_create_group_hierarchy->blkcg_activate_policy->
-* blk_mq_freeze_queue) that may lead to the invocation of the
+* blk_freeze_queue) that may lead to the invocation of the
 * has_work hook function. For this reason,
 * bfq_create_group_hierarchy is invoked only after all
 * scheduler data has been initialized, apart from the fields
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index d3f56baee936..ffc984381e4b 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1298,7 +1298,7 @@ int blkcg_activate_policy(struct request_queue *q,
return 0;
 
if (q->mq_ops)
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
else
blk_queue_bypass_start(q);
 pd_prealloc:
@@ -1339,7 +1339,7 @@ int blkcg_activate_policy(struct request_queue *q,
spin_unlock_irq(q->queue_lock);
 out_bypass_end:
if (q->mq_ops)
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
if (pd_prealloc)
@@ -1365,7 +1365,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
return;
 
if (q->mq_ops)
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
else
blk_queue_bypass_start(q);
 
@@ -1390,7 +1390,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
spin_unlock_irq(q->queue_lock);
 
if (q->mq_ops)
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index eee86adc0f53..aacf47f15b9a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -162,18 +162,9 @@ void blk_freeze_queue(struct request_queue *q)
blk_freeze_queue_start(q);
blk_mq_freeze_queue_wait(q);
 }
+EXPORT_SYMBOL_GPL(blk_freeze_queue);
 
-void blk_mq_freeze_queue(struct request_queue *q)
-{
-   /*
-* ...just an alias to keep freeze and unfreeze actions balanced
-* in the blk_mq_* namespace
-*/
-   blk_freeze_queue(q);
-}
-EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
-
-void blk_mq_unfreeze_queue(struct request_queue *q)
+void blk_unfreeze_queue(struct request_queue *q)
 {
int freeze_depth;
 
@@ -184,7 +175,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
wake_up_all(&q->mq_freeze_wq);
}
 }
-EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
+EXPORT_SYMBOL_GPL(blk_unfreeze_queue);
 
 /*
  * FIXME: replace the scsi_internal_device_*block_nowait() calls in the
@@ -2176,9 +2167,9 @@ static void blk_mq_update_tag_set_depth(struct 
blk_mq_tag_set *set,
lockdep_assert_held(>tag_list_lock);
 
list_for_each_entry(q, &set->tag_list, tag_set_list) {
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
queue_set_hctx_shared(q, shared);
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
}
 }
 
@@ -2609,7 +2600,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, 
unsigned int nr)
if (!set)
return -EINVAL;
 
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
 
ret = 0;
queue_for_each_hw_ctx(q, hctx, i) {
@@ -2634,7 +2625,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, 
unsigned int nr)
if (!ret)
q->nr_requests = nr;
 
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);
 
return ret;
 }
@@ -2652,7 +2643,7 @@ static void __blk_mq_update_nr_hw_queues(struct 
blk_mq_tag_set *set,
return;
 
list_for_each_entry(q, &set->tag_list, tag_set_list)
-   blk_mq_freeze_queue(q);
+   blk_freeze_queue(q);
 
set->nr_hw_queues = nr_hw_queues;
blk_mq_update_queue_map(set);
@@ -2662,7 +2653,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
}
 
list_for_each_entry(q, &set->tag_list, tag_set_list)
-   blk_mq_unfreeze_queue(q);
+   blk_unfreeze_queue(q);

[PATCH V4 02/10] block: tracking request allocation with q_usage_counter

2017-09-11 Thread Ming Lei
This usage is basically the same as blk-mq's, so that we can
easily support freezing the legacy queue.

Also 'wake_up_all(&q->mq_freeze_wq)' has to be moved
into blk_set_queue_dying() since both legacy and blk-mq
may wait on the .mq_freeze_wq wait queue.
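
Roughly, the pairing on the legacy path becomes (just a sketch of
the idea; the real hunks are below):

	blk_queue_enter(q, nowait);	/* grab q_usage_counter at allocation */
	rq = get_request(q, op, NULL, gfp_mask);
	...
	blk_queue_exit(q);		/* drop it when the request is freed,
					   in __blk_put_request() */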

Signed-off-by: Ming Lei 
---
 block/blk-core.c | 14 ++
 block/blk-mq.c   |  7 ---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d709c0e3a2ac..7e436ee04e08 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -610,6 +610,12 @@ void blk_set_queue_dying(struct request_queue *q)
}
spin_unlock_irq(q->queue_lock);
}
+
+   /*
+* We need to ensure that processes currently waiting on
+* the queue are notified as well.
+*/
+   wake_up_all(&q->mq_freeze_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
@@ -1392,16 +1398,21 @@ static struct request *blk_old_get_request(struct request_queue *q,
   unsigned int op, gfp_t gfp_mask)
 {
struct request *rq;
+   int ret = 0;
 
WARN_ON_ONCE(q->mq_ops);
 
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
 
+   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+   if (ret)
+   return ERR_PTR(ret);
spin_lock_irq(q->queue_lock);
rq = get_request(q, op, NULL, gfp_mask);
if (IS_ERR(rq)) {
spin_unlock_irq(q->queue_lock);
+   blk_queue_exit(q);
return rq;
}
 
@@ -1573,6 +1584,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
blk_free_request(rl, req);
freed_request(rl, sync, rq_flags);
blk_put_rl(rl);
+   blk_queue_exit(q);
}
 }
 EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1854,8 +1866,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 * Grab a free request. This is might sleep but can not fail.
 * Returns with the queue unlocked.
 */
+   blk_queue_enter_live(q);
req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
if (IS_ERR(req)) {
+   blk_queue_exit(q);
__wbt_done(q->rq_wb, wb_acct);
if (PTR_ERR(req) == -ENOMEM)
bio->bi_status = BLK_STS_RESOURCE;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c364497cc44..eee86adc0f53 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -256,13 +256,6 @@ void blk_mq_wake_waiters(struct request_queue *q)
queue_for_each_hw_ctx(q, hctx, i)
if (blk_mq_hw_queue_mapped(hctx))
blk_mq_tag_wakeup_all(hctx->tags, true);
-
-   /*
-* If we are called because the queue has now been marked as
-* dying, we need to ensure that processes currently waiting on
-* the queue are notified as well.
-*/
-   wake_up_all(&q->mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
-- 
2.9.5



[PATCH V4 01/10] blk-mq: only run hw queues for blk-mq

2017-09-11 Thread Ming Lei
Hardware queues exist only for blk-mq; this patch just makes that
explicit.

Reviewed-by: Johannes Thumshirn 
Signed-off-by: Ming Lei 
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3f18cff80050..9c364497cc44 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
if (freeze_depth == 1) {
percpu_ref_kill(&q->q_usage_counter);
-   blk_mq_run_hw_queues(q, false);
+   if (q->mq_ops)
+   blk_mq_run_hw_queues(q, false);
}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-- 
2.9.5



[PATCH V4 0/10] block/scsi: safe SCSI quiescing

2017-09-11 Thread Ming Lei
Hi,

The current SCSI quiesce isn't safe: it is easy to trigger I/O deadlock.

Once a SCSI device is put into QUIESCE, no new request except for
RQF_PREEMPT can be dispatched to SCSI successfully, and
scsi_device_quiesce() simply waits for completion of the I/Os
already dispatched to the SCSI stack. That isn't enough at all.

New requests can still come in, but none of the allocated
requests can be dispatched successfully, so the request pool can
easily be exhausted.

Then a request with RQF_PREEMPT can't be allocated and waits
forever; meanwhile scsi_device_resume() waits for completion of the
RQF_PREEMPT request, so the system hangs forever, for example during
system suspend or SCSI domain validation.

I/O hangs during both system suspend[1] and SCSI domain validation
have been reported before.
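
Concretely, the deadlock sequence is roughly:

1) the SCSI device is put into QUIESCE
2) normal I/O keeps allocating requests that can't be dispatched,
   so the request pool is consumed up
3) the request with RQF_PREEMPT can't be allocated any more and
   waits forever
4) scsi_device_resume() is therefore never reached, and the system
   hangs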

This patchset introduces preempt freeze and solves the issue
by preempt-freezing the block queue during SCSI quiesce, while
still allowing requests with RQF_PREEMPT to be allocated in this
state.

Oleksandr verified that V3 does fix the hang during suspend/resume,
and Cathy verified that the revised V3 fixes the hang during
SCSI domain validation.

Both SCSI and SCSI_MQ have this I/O deadlock issue; this patchset
fixes both by introducing/unifying blk_freeze_queue_preempt() and
blk_unfreeze_queue_preempt(), with cleanup done along the way.
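
In rough outline, the intended usage looks like this (a sketch only,
using the helper names introduced by this series):

	/* scsi_device_quiesce() */
	blk_freeze_queue_preempt(q);	/* new I/O blocked, RQF_PREEMPT
					   still allowed */

	/* quiesced period: suspend or SPI domain validation I/O is
	 * submitted via RQF_PREEMPT requests */

	/* scsi_device_resume() */
	blk_unfreeze_queue_preempt(q);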

The patchset can be found in the following gitweb:

https://github.com/ming1/linux/tree/blk_safe_scsi_quiesce_V4

V4:
- reorganize patch order to make it more reasonable
- support nested preempt freeze, as required by SCSI transport spi
- check preempt freezing in the slow path of blk_queue_enter()
- add "SCSI: transport_spi: resume a quiesced device"
- wake up freeze queue in setting dying for both blk-mq and legacy
- rename blk_mq_[freeze|unfreeze]_queue() in one patch
- rename .mq_freeze_wq and .mq_freeze_depth
- improve comment

V3:
- introduce q->preempt_unfreezing to fix one bug of preempt freeze
- call blk_queue_enter_live() only when queue is preempt frozen
- cleanup a bit on the implementation of preempt freeze
- only patch 6 and 7 are changed

V2:
- drop the 1st patch in V1 because percpu_ref_is_dying() is
enough as pointed by Tejun
- introduce preempt version of blk_[freeze|unfreeze]_queue
- sync between preempt freeze and normal freeze
- fix warning from percpu-refcount as reported by Oleksandr


[1] https://marc.info/?t=150340250100013&r=3&w=2


Thanks,
Ming


Ming Lei (10):
  blk-mq: only run hw queues for blk-mq
  block: tracking request allocation with q_usage_counter
  blk-mq: rename blk_mq_[freeze|unfreeze]_queue
  blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
  block: rename .mq_freeze_wq and .mq_freeze_depth
  block: pass flags to blk_queue_enter()
  block: introduce preempt version of blk_[freeze|unfreeze]_queue
  block: allow to allocate req with RQF_PREEMPT when queue is preempt
frozen
  SCSI: transport_spi: resume a quiesced device
  SCSI: preempt freeze block queue when SCSI device is put into quiesce

 block/bfq-iosched.c   |   2 +-
 block/blk-cgroup.c|   8 +-
 block/blk-core.c  |  95 
 block/blk-mq.c| 180 --
 block/blk-mq.h|   1 -
 block/blk-timeout.c   |   2 +-
 block/blk.h   |  12 +++
 block/elevator.c  |   4 +-
 drivers/block/loop.c  |  24 ++---
 drivers/block/rbd.c   |   2 +-
 drivers/nvme/host/core.c  |   8 +-
 drivers/scsi/scsi_lib.c   |  25 +-
 drivers/scsi/scsi_transport_spi.c |   3 +
 fs/block_dev.c|   4 +-
 include/linux/blk-mq.h|  15 ++--
 include/linux/blkdev.h|  32 +--
 16 files changed, 313 insertions(+), 104 deletions(-)

-- 
2.9.5



Re: [PATCH] csi: libcxgbi: remove redundant check and close on csk

2017-09-11 Thread Varun Prakash
On Thu, Sep 07, 2017 at 02:51:33PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> csk is always null on the error return path and so the non-null
> check and call to cxgbi_sock_closed on csk is redundant and
> can be removed.
> 
> Detected by: CoverityScan CID#114329 ("Logically dead code")
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/scsi/cxgbi/libcxgbi.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
> index 512c8f1ea5b0..da36c2de069e 100644
> --- a/drivers/scsi/cxgbi/libcxgbi.c
> +++ b/drivers/scsi/cxgbi/libcxgbi.c
> @@ -688,8 +688,6 @@ cxgbi_check_route(struct sockaddr *dst_addr, int ifindex)
>  
>  rel_rt:
>   ip_rt_put(rt);
> - if (csk)
> - cxgbi_sock_closed(csk);
>  err_out:
>   return ERR_PTR(err);
>  }

Acked-by: Varun Prakash  


Re: [PATCH 2/2] scsi: Align queue to ARCH_DMA_MINALIGN in non-coherent DMA mode

2017-09-11 Thread 陈华才
Hi, Christoph

I think we cannot simply modify dma_get_cache_alignment(), because
existing callers may rely on it unconditionally returning
ARCH_DMA_MINALIGN.
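
For what it's worth, one way to add the argument without breaking
existing callers is sketched below; the dma_map_ops hook is
hypothetical, not an existing API:

/*
 * Sketch only: dev == NULL (or no hook provided) falls back to the
 * old unconditional ARCH_DMA_MINALIGN behaviour, so existing callers
 * keep working unchanged.
 */
int dma_get_cache_alignment(struct device *dev)
{
	const struct dma_map_ops *ops = dev ? get_dma_ops(dev) : NULL;

	if (ops && ops->get_cache_alignment)	/* hypothetical hook */
		return ops->get_cache_alignment(dev);
#ifdef ARCH_DMA_MINALIGN
	return ARCH_DMA_MINALIGN;
#endif
	return 1;
}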

Huacai
 
 
-- Original --
From:  "Christoph Hellwig";
Date:  Mon, Sep 11, 2017 03:39 PM
To:  "Huacai Chen"; 
Cc:  "James E . J . Bottomley"; "Martin K . 
Petersen"; "Fuxin Zhang"; 
"linux-scsi"; 
"linux-kernel"; "stable"; 
Subject:  Re: [PATCH 2/2] scsi: Align queue to ARCH_DMA_MINALIGN in non-coherent DMA mode

 
> + if (plat_device_is_coherent(dev))

We can't just call platform device code.  We'll need a proper
DMA API call for this.

> + blk_queue_dma_alignment(q, 0x04 - 1);
> + else
> + blk_queue_dma_alignment(q, dma_get_cache_alignment() - 1);

Which we already have with dma_get_cache_alignment, except that it
doesn't take a struct device pointer and doesn't call into dma_map
ops.  So please add a struct device argument to dma_get_cache_alignment,
and let it call into dma_map_ops where needed.

With that you can replace the above with:

blk_queue_dma_alignment(q,
max(0x04U, dma_get_cache_alignment(dev)) - 1);

Re: [PATCH 3/3] fcoe: open-code fcoe_destroy_work() for NETDEV_UNREGISTER

2017-09-11 Thread Johannes Thumshirn

Acked-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de  +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH 2/3] fcoe: separate out fcoe_vport_remove()

2017-09-11 Thread Johannes Thumshirn

Acked-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de  +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH 1/3] fcoe: move fcoe_interface_remove() out of fcoe_interface_cleanup()

2017-09-11 Thread Johannes Thumshirn
With an updated patch description (see Lee's mail)

Acked-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de  +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


Re: [PATCH 2/2] scsi: Align queue to ARCH_DMA_MINALIGN in non-coherent DMA mode

2017-09-11 Thread Christoph Hellwig
> + if (plat_device_is_coherent(dev))

We can't just call platform device code.  We'll need a proper
DMA API call for this.

> + blk_queue_dma_alignment(q, 0x04 - 1);
> + else
> + blk_queue_dma_alignment(q, dma_get_cache_alignment() - 1);

Which we already have with dma_get_cache_alignment, except that it
doesn't take a struct device pointer and doesn't call into dma_map
ops.  So please add a struct device argument to dma_get_cache_alignment,
and let it call into dma_map_ops where needed.

With that you can replace the above with:

blk_queue_dma_alignment(q,
max(0x04U, dma_get_cache_alignment(dev)) - 1);


Re: [man-pages PATCH] cciss.4, hpsa.4: mention cciss removal in Linux 4.13

2017-09-11 Thread Meelis Roos
On Mon, 11 Sep 2017, Eugene Syromyatnikov wrote:

> During the Linux 4.13 development cycle, the cciss driver was removed
> in favor of the hpsa driver, which has been amended with some legacy
> board support.

It's gone in 4.14 - 4.13 works the old way.

> 
> * man4/cciss.4 (.SH DESCRIPTION): Mention driver removal.
> * man4/hpsa.4 (.SH DESCRIPTION): Mention the list of boards recognised
> since Linux 4.13.
> 
> Signed-off-by: Eugene Syromyatnikov 
> ---
>  man4/cciss.4 |  7 +++
>  man4/hpsa.4  | 26 ++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/man4/cciss.4 b/man4/cciss.4
> index e6ba93d..ff4c248 100644
> --- a/man4/cciss.4
> +++ b/man4/cciss.4
> @@ -15,6 +15,13 @@ cciss \- HP Smart Array block driver
>  modprobe cciss [ cciss_allow_hpsa=1 ]
>  .fi
>  .SH DESCRIPTION
> +.\" commit 253d2464df446456c0bba5ed4137a7be0b278aa8
> +.BR Note :
> +This obsolete driver was removed from the kernel in version 4.13,
> +as it is superseded by
> +.BR hpsa (4)
> +driver in newer kernels.
> +.PP
>  .B cciss
>  is a block driver for older HP Smart Array RAID controllers.
>  .SS Options
> diff --git a/man4/hpsa.4 b/man4/hpsa.4
> index 63000bf..9b7fd82 100644
> --- a/man4/hpsa.4
> +++ b/man4/hpsa.4
> @@ -52,6 +52,32 @@ driver supports the following Smart Array boards:
>  Smart Array P711m
>  StorageWorks P1210m
>  .fi
> +.PP
> +.\" commit 135ae6edeb51979d0998daf1357f149a7d6ebb08
> +Since Linux 4.13, the following Smart Array boards are also supported:
> +.PP
> +.nf
> +Smart Array 5300
> +Smart Array 5312
> +Smart Array 532
> +Smart Array 5i
> +Smart Array 6400
> +Smart Array 6400 EM
> +Smart Array 641
> +Smart Array 642
> +Smart Array 6i
> +Smart Array E200
> +Smart Array E200i
> +Smart Array E200i
> +Smart Array E200i
> +Smart Array E200i
> +Smart Array E500
> +Smart Array P400
> +Smart Array P400i
> +Smart Array P600
> +Smart Array P700m
> +Smart Array P800
> +.fi
>  .SS Configuration details
>  To configure HP Smart Array controllers,
>  use the HP Array Configuration Utility (either
> 

-- 
Meelis Roos (mr...@ut.ee)  http://www.cs.ut.ee/~mroos/