date:20171010

Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen

2017-10-10 Thread Shaohua Li

On Tue, Oct 10, 2017 at 11:33:06PM +, Bart Van Assche wrote:
> On Tue, 2017-10-10 at 15:30 -0700, Shaohua Li wrote:
> > On Tue, Oct 10, 2017 at 02:03:39PM -0700, Bart Van Assche wrote:
> > > Some people use the md driver on laptops and use the suspend and
> > > resume functionality. Since it is essential that submitting of
> > > new I/O requests stops before a hibernation image is created,
> > > interrupt the md resync and reshape actions if the system is
> > > being frozen. Note: the resync and reshape will restart after
> > > the system is resumed and a message similar to the following
> > > will appear in the system log:
> > > 
> > > md: md0: data-check interrupted.
> > 
> > Where do we restart resync and reshape?
> 
> Hello Shaohua,
> 
> My understanding of the md driver is as follows:
> - Before any write occurs, md_write_start() is called. That function clears
>   mddev->safemode and checks mddev->in_sync. If that variable is set, it is
>   cleared and md_write_start() wakes up mddev->thread and waits until the
>   superblock has been written to disk.
> - All mddev->thread implementations call md_check_recovery(). That function
>   updates the superblock by calling md_update_sb() if needed.
>   md_check_recovery() also starts a resync thread if the array is not in
>   sync. See also the comment block above md_check_recovery().
> - md_do_sync() performs the resync. If it completes successfully the
>   "in_sync" state is set (set_in_sync()). If it is interrupted the "in_sync"
>   state is not modified.
> - super_90_sync() sets the MD_SB_CLEAN flag in the on-disk superblock if the
>   "in_sync" flag is set.
> 
> In other words: if the array is in sync the MD_SB_CLEAN flag is set in the
> on-disk superblock. If a resync is needed that flag is not set. The
> MD_SB_CLEAN flag is checked when the superblock is loaded. If that flag is
> not set a resync is started.

The problem is __md_stop_writes set some bit like MD_RECOVERY_FROZEN, which
will prevent md_check_recovery restarting resync/reshape. I think we need
preserve mddev->recovery in suspend prepare and restore after resume

Thanks,
Shaohua

[PATCH] scsi: fix doc. typo for I2O

2017-10-10 Thread Randy Dunlap

From: Randy Dunlap 

Fix typo: I20 should be I2O.

Signed-off-by: Randy Dunlap 
---
 Documentation/driver-api/scsi.rst |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-414-rc3.orig/Documentation/driver-api/scsi.rst
+++ lnx-414-rc3/Documentation/driver-api/scsi.rst
@@ -28,7 +28,7 @@ SCSI commands can be transported over ju
 are the default protocol for storage devices attached to USB, SATA, SAS,
 Fibre Channel, FireWire, and ATAPI devices. SCSI packets are also
 commonly exchanged over Infiniband,
-`I20 `__, TCP/IP
+`I2O `__, TCP/IP
 (`iSCSI `__), even `Parallel
 ports `__.

[PATCH] scsi: ses: don't ask for diagnostic pages repeatedly during probe

2017-10-10 Thread Li Dongyang

We are testing if there is a match with the ses device in a loop
by calling ses_match_to_enclosure(), which will issue scsi receive
diagnostics commands to the ses device for every device on the
same host.
On one of our boxes with 840 disks, it takes a long time to load
the driver:

[root@g1b-oss06 ~]# time modprobe ses

real40m48.247s
user0m0.001s
sys 0m0.196s

With the patch:

[root@g1b-oss06 ~]# time modprobe ses

real0m17.915s
user0m0.008s
sys 0m0.053s

Note that we still need to refresh page 10 when we see a new disk
to create the link.

Signed-off-by: Li Dongyang 
---
 drivers/scsi/ses.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index 11826c5c2dd4..62f04c0511cf 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -615,13 +615,16 @@ static void ses_enclosure_data_process(struct 
enclosure_device *edev,
 }
 
 static void ses_match_to_enclosure(struct enclosure_device *edev,
-  struct scsi_device *sdev)
+  struct scsi_device *sdev,
+  int refresh)
 {
+   struct scsi_device *edev_sdev = to_scsi_device(edev->edev.parent);
struct efd efd = {
.addr = 0,
};
 
-   ses_enclosure_data_process(edev, to_scsi_device(edev->edev.parent), 0);
+   if (refresh)
+   ses_enclosure_data_process(edev, edev_sdev, 0);
 
if (scsi_is_sas_rphy(sdev->sdev_target->dev.parent))
efd.addr = sas_get_address(sdev);
@@ -652,7 +655,7 @@ static int ses_intf_add(struct device *cdev,
struct enclosure_device *prev = NULL;
 
while ((edev = enclosure_find(>host->shost_gendev, prev)) 
!= NULL) {
-   ses_match_to_enclosure(edev, sdev);
+   ses_match_to_enclosure(edev, sdev, 1);
prev = edev;
}
return -ENODEV;
@@ -768,7 +771,7 @@ static int ses_intf_add(struct device *cdev,
shost_for_each_device(tmp_sdev, sdev->host) {
if (tmp_sdev->lun != 0 || scsi_device_enclosure(tmp_sdev))
continue;
-   ses_match_to_enclosure(edev, tmp_sdev);
+   ses_match_to_enclosure(edev, tmp_sdev, 0);
}
 
return 0;
-- 
2.14.2

Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen

2017-10-10 Thread Bart Van Assche

On Tue, 2017-10-10 at 15:30 -0700, Shaohua Li wrote:
> On Tue, Oct 10, 2017 at 02:03:39PM -0700, Bart Van Assche wrote:
> > Some people use the md driver on laptops and use the suspend and
> > resume functionality. Since it is essential that submitting of
> > new I/O requests stops before a hibernation image is created,
> > interrupt the md resync and reshape actions if the system is
> > being frozen. Note: the resync and reshape will restart after
> > the system is resumed and a message similar to the following
> > will appear in the system log:
> > 
> > md: md0: data-check interrupted.
> 
> Where do we restart resync and reshape?

Hello Shaohua,

My understanding of the md driver is as follows:
- Before any write occurs, md_write_start() is called. That function clears
  mddev->safemode and checks mddev->in_sync. If that variable is set, it is
  cleared and md_write_start() wakes up mddev->thread and waits until the
  superblock has been written to disk.
- All mddev->thread implementations call md_check_recovery(). That function
  updates the superblock by calling md_update_sb() if needed.
  md_check_recovery() also starts a resync thread if the array is not in
  sync. See also the comment block above md_check_recovery().
- md_do_sync() performs the resync. If it completes successfully the
  "in_sync" state is set (set_in_sync()). If it is interrupted the "in_sync"
  state is not modified.
- super_90_sync() sets the MD_SB_CLEAN flag in the on-disk superblock if the
  "in_sync" flag is set.

In other words: if the array is in sync the MD_SB_CLEAN flag is set in the
on-disk superblock. If a resync is needed that flag is not set. The
MD_SB_CLEAN flag is checked when the superblock is loaded. If that flag is
not set a resync is started.

Bart.

RE: [PATCH 2/2] hpsa: destroy sas transport properties before scsi_host

2017-10-10 Thread Don Brace

> -Original Message-
> From: Johannes Thumshirn [mailto:jthumsh...@suse.de]
> Sent: Monday, November 21, 2016 8:15 AM
> To: Martin Wilck 
> Cc: Don Brace ; dl-esc-Team ESD Storage Dev
> Support ;
> iss_storage...@hp.com; linux-scsi@vger.kernel.org; jbottom...@odin.com;
> h...@lst.de; h...@suse.de
> Subject: Re: [PATCH 2/2] hpsa: destroy sas transport properties before
> scsi_host
> 
> EXTERNAL EMAIL
> 
> 
> On Mon, Nov 21, 2016 at 03:04:29PM +0100, Martin Wilck wrote:
> > Unloading the hpsa driver causes warnings
> >
> > [ 1063.793652] WARNING: CPU: 1 PID: 4850 at ../fs/sysfs/group.c:237
> device_del+0x54/0x240()
> > [ 1063.793659] sysfs group 81cf21a0 not found for kobject 'port-2:0'
> >
> > with two different stacks:
> > 1)
> > [ 1063.793774]  [] device_del+0x54/0x240
> > [ 1063.793780]  []
> transport_remove_classdev+0x4a/0x60
> > [ 1063.793784]  []
> attribute_container_device_trigger+0xa6/0xb0
> > [ 1063.793802]  [] sas_port_delete+0x126/0x160
> [scsi_transport_sas]
> > [ 1063.793819]  [] hpsa_free_sas_port+0x3c/0x70 [hpsa]
> >
> > 2)
> > [ 1063.797103]  [] device_del+0x54/0x240
> > [ 1063.797118]  [] sas_port_delete+0x12e/0x160
> [scsi_transport_sas]
> > [ 1063.797134]  [] hpsa_free_sas_port+0x3c/0x70 [hpsa]
> >
> > This is caused by the fact that host device hostX is deleted before the
> > SAS transport devices hostX/port-a:b.
> >
> > This patch fixes this by reverting the order of device deletions.
> >
> > References: bsc#1010946
> > Signed-off-by: Martin Wilck 
> > ---
> 
> With the References changed to the bug link like in patch 1/2
> Reviewed-by: Johannes Thumshirn 

Now that Hannes's patch  9441284fbc39610c0f9ec0ed118ff85d78352906
has been applied, this patch corrects the stack trace issue.

Would you like to re-submit this patch or would you like me to send it up?
I'll run some quick tests if you do decide to send it up. 
If you want me to send it up, you will get the credit anyway.

Thanks for your help and attention to this issue.
And thanks again to Hannes.

Thanks,
Don Brace
ESC - Smart Storage
Microsemi Corporation


> --
> Johannes Thumshirn  Storage
> jthumsh...@suse.de+49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen

2017-10-10 Thread Shaohua Li

On Tue, Oct 10, 2017 at 02:03:39PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before a hibernation image is created,
> interrupt the md resync and reshape actions if the system is
> being frozen. Note: the resync and reshape will restart after
> the system is resumed and a message similar to the following
> will appear in the system log:
> 
> md: md0: data-check interrupted.

Where do we restart resync and reshape?
 
> Signed-off-by: Bart Van Assche 
> Cc: Shaohua Li 
> Cc: linux-r...@vger.kernel.org
> Cc: Ming Lei 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> ---
>  drivers/md/md.c | 30 +-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b99584e5d6b1..d712d3320c1d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -66,6 +66,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  #include "md.h"
> @@ -7439,6 +7441,7 @@ static int md_thread(void *arg)
>*/
>  
>   allow_signal(SIGKILL);
> + set_freezable();
>   while (!kthread_should_stop()) {
>  
>   /* We need to wait INTERRUPTIBLE so that
> @@ -7449,7 +7452,7 @@ static int md_thread(void *arg)
>   if (signal_pending(current))
>   flush_signals(current);
>  
> - wait_event_interruptible_timeout
> + wait_event_freezable_timeout
>   (thread->wqueue,
>test_bit(THREAD_WAKEUP, >flags)
>|| kthread_should_stop() || kthread_should_park(),
> @@ -8963,6 +8966,29 @@ static void md_stop_all_writes(void)
>   mdelay(1000*1);
>  }
>  
> +/*
> + * Ensure that neither resyncing nor reshaping occurs while the system is
> + * frozen.
> + */
> +static int md_notify_pm(struct notifier_block *bl, unsigned long state,
> + void *unused)
> +{
> + pr_debug("%s: state = %ld\n", __func__, state);
> +
> + switch (state) {
> + case PM_HIBERNATION_PREPARE:
> + case PM_SUSPEND_PREPARE:
> + case PM_RESTORE_PREPARE:
> + md_stop_all_writes();
> + break;
> + }
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block md_pm_notifier = {
> + .notifier_call  = md_notify_pm,
> +};
> +
>  static int md_notify_reboot(struct notifier_block *this,
>   unsigned long code, void *x)
>  {
> @@ -9009,6 +9035,7 @@ static int __init md_init(void)
>   md_probe, NULL, NULL);
>  
>   register_reboot_notifier(_reboot_notifier);
> + register_pm_notifier(_pm_notifier);
>   raid_table_header = register_sysctl_table(raid_root_table);
>  
>   md_geninit();
> @@ -9248,6 +9275,7 @@ static __exit void md_exit(void)
>  
>   unregister_blkdev(MD_MAJOR,"md");
>   unregister_blkdev(mdp_major, "mdp");
> + unregister_pm_notifier(_pm_notifier);
>   unregister_reboot_notifier(_reboot_notifier);
>   unregister_sysctl_table(raid_table_header);
>  
> -- 
> 2.14.2
>

[PATCH] scsi: ibmvscsi: Convert timers to use timer_setup()

2017-10-10 Thread Kees Cook

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: "Martin K. Petersen" 
Cc: Tyrel Datwyler 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: "James E.J. Bottomley" 
Cc: linux-scsi@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Kees Cook 
---
This requires commit 686fef928bba ("timer: Prepare to change timer
callback argument type") in v4.14-rc3, but should be otherwise
stand-alone.
---
 drivers/scsi/ibmvscsi/ibmvfc.c   | 14 ++
 drivers/scsi/ibmvscsi/ibmvscsi.c |  7 +++
 2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index b491af31a5f8..0d2f7eb3acb6 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -1393,8 +1393,9 @@ static int ibmvfc_map_sg_data(struct scsi_cmnd *scmd,
  *
  * Called when an internally generated command times out
  **/
-static void ibmvfc_timeout(struct ibmvfc_event *evt)
+static void ibmvfc_timeout(struct timer_list *t)
 {
+   struct ibmvfc_event *evt = from_timer(evt, t, timer);
struct ibmvfc_host *vhost = evt->vhost;
dev_err(vhost->dev, "Command timed out (%p). Resetting connection\n", 
evt);
ibmvfc_reset_host(vhost);
@@ -1424,12 +1425,10 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
BUG();
 
list_add_tail(>queue, >sent);
-   init_timer(>timer);
+   timer_setup(>timer, ibmvfc_timeout, 0);
 
if (timeout) {
-   evt->timer.data = (unsigned long) evt;
evt->timer.expires = jiffies + (timeout * HZ);
-   evt->timer.function = (void (*)(unsigned long))ibmvfc_timeout;
add_timer(>timer);
}
 
@@ -3692,8 +3691,9 @@ static void ibmvfc_tgt_adisc_cancel_done(struct 
ibmvfc_event *evt)
  * out, reset the CRQ. When the ADISC comes back as cancelled,
  * log back into the target.
  **/
-static void ibmvfc_adisc_timeout(struct ibmvfc_target *tgt)
+static void ibmvfc_adisc_timeout(struct timer_list *t)
 {
+   struct ibmvfc_target *tgt = from_timer(tgt, t, timer);
struct ibmvfc_host *vhost = tgt->vhost;
struct ibmvfc_event *evt;
struct ibmvfc_tmf *tmf;
@@ -3778,9 +3778,7 @@ static void ibmvfc_tgt_adisc(struct ibmvfc_target *tgt)
if (timer_pending(>timer))
mod_timer(>timer, jiffies + (IBMVFC_ADISC_TIMEOUT * HZ));
else {
-   tgt->timer.data = (unsigned long) tgt;
tgt->timer.expires = jiffies + (IBMVFC_ADISC_TIMEOUT * HZ);
-   tgt->timer.function = (void (*)(unsigned 
long))ibmvfc_adisc_timeout;
add_timer(>timer);
}
 
@@ -3912,7 +3910,7 @@ static int ibmvfc_alloc_target(struct ibmvfc_host *vhost, 
u64 scsi_id)
tgt->vhost = vhost;
tgt->need_login = 1;
tgt->cancel_key = vhost->task_set++;
-   init_timer(>timer);
+   timer_setup(>timer, ibmvfc_adisc_timeout, 0);
kref_init(>kref);
ibmvfc_init_tgt(tgt, ibmvfc_tgt_implicit_logout);
spin_lock_irqsave(vhost->host->host_lock, flags);
diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index 7d156b161482..17df76f0be3c 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -837,8 +837,9 @@ static void ibmvscsi_reset_host(struct ibmvscsi_host_data 
*hostdata)
  *
  * Called when an internally generated command times out
 */
-static void ibmvscsi_timeout(struct srp_event_struct *evt_struct)
+static void ibmvscsi_timeout(struct timer_list *t)
 {
+   struct srp_event_struct *evt_struct = from_timer(evt_struct, t, timer);
struct ibmvscsi_host_data *hostdata = evt_struct->hostdata;
 
dev_err(hostdata->dev, "Command timed out (%x). Resetting connection\n",
@@ -927,11 +928,9 @@ static int ibmvscsi_send_srp_event(struct srp_event_struct 
*evt_struct,
 */
list_add_tail(_struct->list, >sent);
 
-   init_timer(_struct->timer);
+   timer_setup(_struct->timer, ibmvscsi_timeout, 0);
if (timeout) {
-   evt_struct->timer.data = (unsigned long) evt_struct;
evt_struct->timer.expires = jiffies + (timeout * HZ);
-   evt_struct->timer.function = (void (*)(unsigned 
long))ibmvscsi_timeout;
add_timer(_struct->timer);
}
 
-- 
2.7.4


-- 
Kees Cook
Pixel Security

Re: [PATCH] scsi: be2iscsi: Use kasprintf

2017-10-10 Thread Kyle Fortin

Hi Himanshu,

On Oct 6, 2017, at 2:57 PM, Himanshu Jha  wrote:
> 
> Use kasprintf instead of combination of kmalloc and sprintf.
> 
> Signed-off-by: Himanshu Jha 
> ---
> drivers/scsi/be2iscsi/be_main.c | 12 +---
> 1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
> index b4542e7..6a9ee0e 100644
> --- a/drivers/scsi/be2iscsi/be_main.c
> +++ b/drivers/scsi/be2iscsi/be_main.c
> @@ -803,15 +803,14 @@ static int beiscsi_init_irqs(struct beiscsi_hba *phba)
> 
>   if (pcidev->msix_enabled) {
>   for (i = 0; i < phba->num_cpus; i++) {
> - phba->msi_name[i] = kzalloc(BEISCSI_MSI_NAME,
> - GFP_KERNEL);
> + phba->msi_name[i] = kasprintf(GFP_KERNEL,
> +   "beiscsi_%02x_%02x",
> +   phba->shost->host_no, i);
>   if (!phba->msi_name[i]) {
>   ret = -ENOMEM;
>   goto free_msix_irqs;
>   }
> 
> - sprintf(phba->msi_name[i], "beiscsi_%02x_%02x",
> - phba->shost->host_no, i);
>   ret = request_irq(pci_irq_vector(pcidev, i),
> be_isr_msix, 0, phba->msi_name[i],
> _context->be_eq[i]);
> @@ -824,13 +823,12 @@ static int beiscsi_init_irqs(struct beiscsi_hba *phba)
>   goto free_msix_irqs;
>   }
>   }
> - phba->msi_name[i] = kzalloc(BEISCSI_MSI_NAME, GFP_KERNEL);
> + phba->msi_name[i] = kasprintf(GFP_KERNEL, "beiscsi_mcc_%02x",
> +   phba->shost->host_no);
>   if (!phba->msi_name[i]) {
>   ret = -ENOMEM;
>   goto free_msix_irqs;
>   }
> - sprintf(phba->msi_name[i], "beiscsi_mcc_%02x",
> - phba->shost->host_no);
>   ret = request_irq(pci_irq_vector(pcidev, i), be_isr_mcc, 0,
> phba->msi_name[i], _context->be_eq[i]);
>   if (ret) {
> -- 
> 2.7.4

Since you are getting rid of the only use for BEISCSI_MSI_NAME within 
drivers/scsi/be2iscsi/be_main.h, that should be removed too.

--
Kyle Fortin - Oracle Linux Engineering

[PATCH v8 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests

2017-10-10 Thread Bart Van Assche

Convert blk_get_request(q, op, __GFP_RECLAIM) into
blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not
change any functionality.

Signed-off-by: Bart Van Assche 
Cc: Martin K. Petersen 
Acked-by: David S. Miller  [ for IDE ]
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/ide/ide-pm.c| 4 ++--
 drivers/scsi/scsi_lib.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c
index 544f02d673ca..f56d742908df 100644
--- a/drivers/ide/ide-pm.c
+++ b/drivers/ide/ide-pm.c
@@ -89,9 +89,9 @@ int generic_ide_resume(struct device *dev)
}
 
memset(, 0, sizeof(rqpm));
-   rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+   rq = blk_get_request_flags(drive->queue, REQ_OP_DRV_IN,
+  BLK_MQ_REQ_PREEMPT);
ide_req(rq)->type = ATA_PRIV_PM_RESUME;
-   rq->rq_flags |= RQF_PREEMPT;
rq->special = 
rqpm.pm_step = IDE_PM_START_RESUME;
rqpm.pm_state = PM_EVENT_ON;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..1c16a247fae6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -252,9 +252,9 @@ int scsi_execute(struct scsi_device *sdev, const unsigned 
char *cmd,
struct scsi_request *rq;
int ret = DRIVER_ERROR << 24;
 
-   req = blk_get_request(sdev->request_queue,
+   req = blk_get_request_flags(sdev->request_queue,
data_direction == DMA_TO_DEVICE ?
-   REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+   REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
if (IS_ERR(req))
return ret;
rq = scsi_req(req);
@@ -268,7 +268,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned 
char *cmd,
rq->retries = retries;
req->timeout = timeout;
req->cmd_flags |= flags;
-   req->rq_flags |= rq_flags | RQF_QUIET | RQF_PREEMPT;
+   req->rq_flags |= rq_flags | RQF_QUIET;
 
/*
 * head injection *required* here otherwise quiesce won't work
-- 
2.14.2

[PATCH v8 06/10] block: Introduce BLK_MQ_REQ_PREEMPT

2017-10-10 Thread Bart Van Assche

Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to
blk_get_request_flags().

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Ming Lei 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 block/blk-core.c   | 4 +++-
 block/blk-mq.c | 2 ++
 include/linux/blk-mq.h | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 03156d9ede94..842e67e0b7ab 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1260,6 +1260,8 @@ static struct request *__get_request(struct request_list 
*rl, unsigned int op,
blk_rq_set_rl(rq, rl);
rq->cmd_flags = op;
rq->rq_flags = rq_flags;
+   if (flags & BLK_MQ_REQ_PREEMPT)
+   rq->rq_flags |= RQF_PREEMPT;
 
/* init elvpriv */
if (rq_flags & RQF_ELVPRIV) {
@@ -1441,7 +1443,7 @@ struct request *blk_get_request_flags(struct 
request_queue *q, unsigned int op,
struct request *req;
 
WARN_ON_ONCE(op & REQ_NOWAIT);
-   WARN_ON_ONCE(flags & ~BLK_MQ_REQ_NOWAIT);
+   WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT));
 
if (q->mq_ops) {
req = blk_mq_alloc_request(q, op, flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8766587ae8b8..bdbfe760bda0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -290,6 +290,8 @@ static struct request *blk_mq_rq_ctx_init(struct 
blk_mq_alloc_data *data,
rq->q = data->q;
rq->mq_ctx = data->ctx;
rq->cmd_flags = op;
+   if (data->flags & BLK_MQ_REQ_PREEMPT)
+   rq->rq_flags |= RQF_PREEMPT;
if (blk_queue_io_stat(data->q))
rq->rq_flags |= RQF_IO_STAT;
/* do not touch atomic flags, it needs atomic ops against the timer */
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 50c6485cb04f..ee9fc3fb7fca 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -200,6 +200,7 @@ enum {
BLK_MQ_REQ_NOWAIT   = (1 << 0), /* return when out of requests */
BLK_MQ_REQ_RESERVED = (1 << 1), /* allocate from reserved pool */
BLK_MQ_REQ_INTERNAL = (1 << 2), /* allocate internal/sched tag */
+   BLK_MQ_REQ_PREEMPT  = (1 << 3), /* set RQF_PREEMPT */
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-- 
2.14.2

[PATCH v8 04/10] block: Make q_usage_counter also track legacy requests

2017-10-10 Thread Bart Van Assche

From: Ming Lei 

This patch makes it possible to pause request allocation for
the legacy block layer by calling blk_mq_freeze_queue() and
blk_mq_unfreeze_queue().

Signed-off-by: Ming Lei 
[ bvanassche: Combined two patches into one, edited a comment and made sure
  REQ_NOWAIT is handled properly in blk_old_get_request() ]
Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
Cc: Ming Lei 
Cc: Hannes Reinecke 
---
 block/blk-core.c | 12 
 block/blk-mq.c   | 10 ++
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14f7674fa0b1..265a69511c47 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -610,6 +610,9 @@ void blk_set_queue_dying(struct request_queue *q)
}
spin_unlock_irq(q->queue_lock);
}
+
+   /* Make blk_queue_enter() reexamine the DYING flag. */
+   wake_up_all(>mq_freeze_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
@@ -1395,16 +1398,22 @@ static struct request *blk_old_get_request(struct 
request_queue *q,
   unsigned int op, gfp_t gfp_mask)
 {
struct request *rq;
+   int ret = 0;
 
WARN_ON_ONCE(q->mq_ops);
 
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
 
+   ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
+ (op & REQ_NOWAIT));
+   if (ret)
+   return ERR_PTR(ret);
spin_lock_irq(q->queue_lock);
rq = get_request(q, op, NULL, gfp_mask);
if (IS_ERR(rq)) {
spin_unlock_irq(q->queue_lock);
+   blk_queue_exit(q);
return rq;
}
 
@@ -1576,6 +1585,7 @@ void __blk_put_request(struct request_queue *q, struct 
request *req)
blk_free_request(rl, req);
freed_request(rl, sync, rq_flags);
blk_put_rl(rl);
+   blk_queue_exit(q);
}
 }
 EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1857,8 +1867,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, 
struct bio *bio)
 * Grab a free request. This is might sleep but can not fail.
 * Returns with the queue unlocked.
 */
+   blk_queue_enter_live(q);
req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
if (IS_ERR(req)) {
+   blk_queue_exit(q);
__wbt_done(q->rq_wb, wb_acct);
if (PTR_ERR(req) == -ENOMEM)
bio->bi_status = BLK_STS_RESOURCE;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 076cbab9c3e0..8766587ae8b8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
freeze_depth = atomic_inc_return(>mq_freeze_depth);
if (freeze_depth == 1) {
percpu_ref_kill(>q_usage_counter);
-   blk_mq_run_hw_queues(q, false);
+   if (q->mq_ops)
+   blk_mq_run_hw_queues(q, false);
}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
@@ -255,13 +256,6 @@ void blk_mq_wake_waiters(struct request_queue *q)
queue_for_each_hw_ctx(q, hctx, i)
if (blk_mq_hw_queue_mapped(hctx))
blk_mq_tag_wakeup_all(hctx->tags, true);
-
-   /*
-* If we are called because the queue has now been marked as
-* dying, we need to ensure that processes currently waiting on
-* the queue are notified as well.
-*/
-   wake_up_all(>mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
-- 
2.14.2

[PATCH v8 05/10] block: Introduce blk_get_request_flags()

2017-10-10 Thread Bart Van Assche

A side effect of this patch is that the GFP mask that is passed to
several allocation functions in the legacy block layer is changed
from GFP_KERNEL into __GFP_DIRECT_RECLAIM.

Signed-off-by: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Ming Lei 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 block/blk-core.c   | 50 +++---
 include/linux/blkdev.h |  3 +++
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 265a69511c47..03156d9ede94 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1157,7 +1157,7 @@ int blk_update_nr_requests(struct request_queue *q, 
unsigned int nr)
  * @rl: request list to allocate from
  * @op: operation and flags
  * @bio: bio to allocate request for (can be %NULL)
- * @gfp_mask: allocation mask
+ * @flags: BLQ_MQ_REQ_* flags
  *
  * Get a free request from @q.  This function may fail under memory
  * pressure or if @q is dead.
@@ -1167,7 +1167,7 @@ int blk_update_nr_requests(struct request_queue *q, 
unsigned int nr)
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *__get_request(struct request_list *rl, unsigned int op,
-   struct bio *bio, gfp_t gfp_mask)
+struct bio *bio, unsigned int flags)
 {
struct request_queue *q = rl->q;
struct request *rq;
@@ -1176,6 +1176,8 @@ static struct request *__get_request(struct request_list 
*rl, unsigned int op,
struct io_cq *icq = NULL;
const bool is_sync = op_is_sync(op);
int may_queue;
+   gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
+__GFP_DIRECT_RECLAIM;
req_flags_t rq_flags = RQF_ALLOCED;
 
lockdep_assert_held(q->queue_lock);
@@ -1336,7 +1338,7 @@ static struct request *__get_request(struct request_list 
*rl, unsigned int op,
  * @q: request_queue to allocate request from
  * @op: operation and flags
  * @bio: bio to allocate request for (can be %NULL)
- * @gfp_mask: allocation mask
+ * @flags: BLK_MQ_REQ_* flags.
  *
  * Get a free request from @q.  If %__GFP_DIRECT_RECLAIM is set in @gfp_mask,
  * this function keeps retrying under memory pressure and fails iff @q is dead.
@@ -1346,7 +1348,7 @@ static struct request *__get_request(struct request_list 
*rl, unsigned int op,
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *get_request(struct request_queue *q, unsigned int op,
-   struct bio *bio, gfp_t gfp_mask)
+  struct bio *bio, unsigned int flags)
 {
const bool is_sync = op_is_sync(op);
DEFINE_WAIT(wait);
@@ -1358,7 +1360,7 @@ static struct request *get_request(struct request_queue 
*q, unsigned int op,
 
rl = blk_get_rl(q, bio);/* transferred to @rq on success */
 retry:
-   rq = __get_request(rl, op, bio, gfp_mask);
+   rq = __get_request(rl, op, bio, flags);
if (!IS_ERR(rq))
return rq;
 
@@ -1367,7 +1369,7 @@ static struct request *get_request(struct request_queue 
*q, unsigned int op,
return ERR_PTR(-EAGAIN);
}
 
-   if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) 
{
+   if ((flags & BLK_MQ_REQ_NOWAIT) || unlikely(blk_queue_dying(q))) {
blk_put_rl(rl);
return rq;
}
@@ -1394,10 +1396,13 @@ static struct request *get_request(struct request_queue 
*q, unsigned int op,
goto retry;
 }
 
+/* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */
 static struct request *blk_old_get_request(struct request_queue *q,
-  unsigned int op, gfp_t gfp_mask)
+  unsigned int op, unsigned int flags)
 {
struct request *rq;
+   gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
+__GFP_DIRECT_RECLAIM;
int ret = 0;
 
WARN_ON_ONCE(q->mq_ops);
@@ -1410,7 +1415,7 @@ static struct request *blk_old_get_request(struct 
request_queue *q,
if (ret)
return ERR_PTR(ret);
spin_lock_irq(q->queue_lock);
-   rq = get_request(q, op, NULL, gfp_mask);
+   rq = get_request(q, op, NULL, flags);
if (IS_ERR(rq)) {
spin_unlock_irq(q->queue_lock);
blk_queue_exit(q);
@@ -1424,25 +1429,40 @@ static struct request *blk_old_get_request(struct 
request_queue *q,
return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
-   gfp_t gfp_mask)
+/**
+ * blk_get_request_flags - allocate a request
+ * @q: request queue to allocate a request for
+ * @op: operation (REQ_OP_*) and REQ_* flags, e.g. REQ_SYNC.
+ * @flags: BLK_MQ_REQ_* flags, e.g.

[PATCH v8 00/10] block, scsi, md: Improve suspend and resume

2017-10-10 Thread Bart Van Assche

Hello Jens,

It is known that during the resume following a hibernate, especially when
using an md RAID1 array created on top of SCSI devices, sometimes the system
hangs instead of coming up properly. This patch series fixes that
problem. These patches have been tested on top of the block layer for-next
branch. Please consider these changes for kernel v4.15.

Thanks,

Bart.

Changes between v7 and v8:
- Fixed a (theoretical?) race identified by Ming Lei.
- Added a tenth patch that checks whether the proper type of flags has been
  passed to a range of block layer functions.

Changes between v6 and v7:
- Added support for the PREEMPT_ONLY flag in blk-mq-debugfs.c.
- Fixed kerneldoc header of blk_queue_enter().
- Added a rcu_read_lock_sched() / rcu_read_unlock_sched() pair in
  blk_set_preempt_only().
- Removed a synchronize_rcu() call from scsi_device_quiesce().
- Modified the description of patch 9/9 in this series.
- Removed scsi_run_queue() call from scsi_device_resume().

Changes between v5 and v6:
- Split an md patch into two patches to make it easier to review the changes.
- For the md patch that suspends I/O while the system is frozen, switched back
  to the freezer mechanism because this makes the code shorter and easier to
  review.
- Changed blk_set/clear_preempt_only() from EXPORT_SYMBOL() into
  EXPORT_SYMBOL_GPL().
- Made blk_set_preempt_only() behave as a test-and-set operation.
- Introduced blk_get_request_flags() and BLK_MQ_REQ_PREEMPT as requested by
  Christoph and reduced the number of arguments of blk_queue_enter() back from
  three to two.
- In scsi_device_quiesce(), moved the blk_mq_freeze_queue() call out of a
  critical section. Made the explanation of why the synchronize_rcu() call
  is necessary more detailed.

Changes between v4 and v5:
- Split blk_set_preempt_only() into two functions as requested by Christoph.
- Made blk_get_request() trigger WARN_ONCE() if it is attempted to allocate
  a request while the system is frozen. This is a big help to identify drivers
  that submit I/O while the system is frozen.
- Since kernel thread freezing is on its way out, reworked the approach for
  avoiding that the md driver submits I/O while the system is frozen such that
  the approach no longer depends on the kernel thread freeze mechanism.
- Fixed the (theoretical) deadlock in scsi_device_quiesce() that was identified
  by Ming.

Changes between v3 and v4:
- Made sure that this patch series not only works for scsi-mq but also for
  the legacy SCSI stack.
- Removed an smp_rmb()/smp_wmb() pair from the hot path and added a comment
  that explains why that is safe.
- Reordered the patches in this patch series.

Changes between v2 and v3:
- Made md kernel threads freezable.
- Changed the approach for quiescing SCSI devices again.
- Addressed Ming's review comments.

Changes compared to v1 of this patch series:
- Changed the approach and rewrote the patch series.

Bart Van Assche (9):
  md: Rename md_notifier into md_reboot_notifier
  md: Introduce md_stop_all_writes()
  md: Neither resync nor reshape while the system is frozen
  block: Introduce blk_get_request_flags()
  block: Introduce BLK_MQ_REQ_PREEMPT
  ide, scsi: Tell the block layer at request allocation time about
preempt requests
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  block, scsi: Make SCSI quiesce and resume work reliably
  block, nvme: Introduce blk_mq_req_flags_t

Ming Lei (1):
  block: Make q_usage_counter also track legacy requests

 block/blk-core.c  | 132 +++---
 block/blk-mq-debugfs.c|   1 +
 block/blk-mq.c|  20 +++
 block/blk-mq.h|   2 +-
 block/blk-timeout.c   |   2 +-
 drivers/ide/ide-pm.c  |   4 +-
 drivers/md/md.c   |  45 +---
 drivers/nvme/host/core.c  |   5 +-
 drivers/nvme/host/nvme.h  |   5 +-
 drivers/scsi/scsi_lib.c   |  35 +++-
 fs/block_dev.c|   4 +-
 include/linux/blk-mq.h|  16 --
 include/linux/blk_types.h |   2 +
 include/linux/blkdev.h|  11 +++-
 14 files changed, 217 insertions(+), 67 deletions(-)

-- 
2.14.2

[PATCH v8 10/10] block, nvme: Introduce blk_mq_req_flags_t

2017-10-10 Thread Bart Van Assche

Several block layer and NVMe core functions accept a combination
of BLK_MQ_REQ_* flags through the 'flags' argument but there is
no verification at compile time whether the right type of block
layer flags is passed. Make it possible for sparse to verify this.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Cc: linux-n...@lists.infradead.org
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
Cc: Ming Lei 
---
 block/blk-core.c  | 12 ++--
 block/blk-mq.c|  4 ++--
 block/blk-mq.h|  2 +-
 drivers/nvme/host/core.c  |  5 +++--
 drivers/nvme/host/nvme.h  |  5 +++--
 include/linux/blk-mq.h| 17 +++--
 include/linux/blk_types.h |  2 ++
 include/linux/blkdev.h|  4 ++--
 8 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3847ea42e341..3bd4d42cee80 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -799,7 +799,7 @@ EXPORT_SYMBOL(blk_alloc_queue);
  * @q: request queue pointer
  * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
  */
-int blk_queue_enter(struct request_queue *q, unsigned int flags)
+int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
 
@@ -1224,7 +1224,7 @@ int blk_update_nr_requests(struct request_queue *q, 
unsigned int nr)
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *__get_request(struct request_list *rl, unsigned int op,
-struct bio *bio, unsigned int flags)
+struct bio *bio, blk_mq_req_flags_t flags)
 {
struct request_queue *q = rl->q;
struct request *rq;
@@ -1407,7 +1407,7 @@ static struct request *__get_request(struct request_list 
*rl, unsigned int op,
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *get_request(struct request_queue *q, unsigned int op,
-  struct bio *bio, unsigned int flags)
+  struct bio *bio, blk_mq_req_flags_t flags)
 {
const bool is_sync = op_is_sync(op);
DEFINE_WAIT(wait);
@@ -1457,7 +1457,7 @@ static struct request *get_request(struct request_queue 
*q, unsigned int op,
 
 /* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */
 static struct request *blk_old_get_request(struct request_queue *q,
-  unsigned int op, unsigned int flags)
+   unsigned int op, blk_mq_req_flags_t flags)
 {
struct request *rq;
gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
@@ -1494,7 +1494,7 @@ static struct request *blk_old_get_request(struct 
request_queue *q,
  * @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT.
  */
 struct request *blk_get_request_flags(struct request_queue *q, unsigned int op,
- unsigned int flags)
+ blk_mq_req_flags_t flags)
 {
struct request *req;
 
@@ -2290,7 +2290,7 @@ blk_qc_t generic_make_request(struct bio *bio)
current->bio_list = bio_list_on_stack;
do {
struct request_queue *q = bio->bi_disk->queue;
-   unsigned int flags = bio->bi_opf & REQ_NOWAIT ?
+   blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
BLK_MQ_REQ_NOWAIT : 0;
 
if (likely(blk_queue_enter(q, flags) == 0)) {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 44a06e8541f2..8fa03e3f97ca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -380,7 +380,7 @@ static struct request *blk_mq_get_request(struct 
request_queue *q,
 }
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-   unsigned int flags)
+   blk_mq_req_flags_t flags)
 {
struct blk_mq_alloc_data alloc_data = { .flags = flags };
struct request *rq;
@@ -406,7 +406,7 @@ struct request *blk_mq_alloc_request(struct request_queue 
*q, unsigned int op,
 EXPORT_SYMBOL(blk_mq_alloc_request);
 
 struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
-   unsigned int op, unsigned int flags, unsigned int hctx_idx)
+   unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx)
 {
struct blk_mq_alloc_data alloc_data = { .flags = flags };
struct request *rq;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ef15b3414da5..1f0f7bb3c7e4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -108,7 +108,7 @@ static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
 struct blk_mq_alloc_data {
/* input parameter */
struct request_queue *q;
-   unsigned int flags;
+   blk_mq_req_flags_t flags;
unsigned int shallow_depth;

[PATCH v8 02/10] md: Introduce md_stop_all_writes()

2017-10-10 Thread Bart Van Assche

Introduce md_stop_all_writes() because the next patch will add
a second caller for this function. This patch does not change
any functionality.

Signed-off-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
---
 drivers/md/md.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8933cafc212d..b99584e5d6b1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8937,8 +8937,7 @@ int rdev_clear_badblocks(struct md_rdev *rdev, sector_t 
s, int sectors,
 }
 EXPORT_SYMBOL_GPL(rdev_clear_badblocks);
 
-static int md_notify_reboot(struct notifier_block *this,
-   unsigned long code, void *x)
+static void md_stop_all_writes(void)
 {
struct list_head *tmp;
struct mddev *mddev;
@@ -8962,6 +8961,12 @@ static int md_notify_reboot(struct notifier_block *this,
 */
if (need_delay)
mdelay(1000*1);
+}
+
+static int md_notify_reboot(struct notifier_block *this,
+   unsigned long code, void *x)
+{
+   md_stop_all_writes();
 
return NOTIFY_DONE;
 }
-- 
2.14.2

[PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen

2017-10-10 Thread Bart Van Assche

Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before a hibernation image is created,
interrupt the md resync and reshape actions if the system is
being frozen. Note: the resync and reshape will restart after
the system is resumed and a message similar to the following
will appear in the system log:

md: md0: data-check interrupted.

Signed-off-by: Bart Van Assche 
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 drivers/md/md.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index b99584e5d6b1..d712d3320c1d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include "md.h"
@@ -7439,6 +7441,7 @@ static int md_thread(void *arg)
 */
 
allow_signal(SIGKILL);
+   set_freezable();
while (!kthread_should_stop()) {
 
/* We need to wait INTERRUPTIBLE so that
@@ -7449,7 +7452,7 @@ static int md_thread(void *arg)
if (signal_pending(current))
flush_signals(current);
 
-   wait_event_interruptible_timeout
+   wait_event_freezable_timeout
(thread->wqueue,
 test_bit(THREAD_WAKEUP, >flags)
 || kthread_should_stop() || kthread_should_park(),
@@ -8963,6 +8966,29 @@ static void md_stop_all_writes(void)
mdelay(1000*1);
 }
 
+/*
+ * Ensure that neither resyncing nor reshaping occurs while the system is
+ * frozen.
+ */
+static int md_notify_pm(struct notifier_block *bl, unsigned long state,
+   void *unused)
+{
+   pr_debug("%s: state = %ld\n", __func__, state);
+
+   switch (state) {
+   case PM_HIBERNATION_PREPARE:
+   case PM_SUSPEND_PREPARE:
+   case PM_RESTORE_PREPARE:
+   md_stop_all_writes();
+   break;
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block md_pm_notifier = {
+   .notifier_call  = md_notify_pm,
+};
+
 static int md_notify_reboot(struct notifier_block *this,
unsigned long code, void *x)
 {
@@ -9009,6 +9035,7 @@ static int __init md_init(void)
md_probe, NULL, NULL);
 
register_reboot_notifier(_reboot_notifier);
+   register_pm_notifier(_pm_notifier);
raid_table_header = register_sysctl_table(raid_root_table);
 
md_geninit();
@@ -9248,6 +9275,7 @@ static __exit void md_exit(void)
 
unregister_blkdev(MD_MAJOR,"md");
unregister_blkdev(mdp_major, "mdp");
+   unregister_pm_notifier(_pm_notifier);
unregister_reboot_notifier(_reboot_notifier);
unregister_sysctl_table(raid_table_header);
 
-- 
2.14.2

[PATCH v8 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag

2017-10-10 Thread Bart Van Assche

This flag will be used in the next patch to let the block layer
core know whether or not a SCSI request queue has been quiesced.
A quiesced SCSI queue namely only processes RQF_PREEMPT requests.

Signed-off-by: Bart Van Assche 
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 block/blk-core.c   | 30 ++
 block/blk-mq-debugfs.c |  1 +
 include/linux/blkdev.h |  6 ++
 3 files changed, 37 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 842e67e0b7ab..ed992cbd107f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -346,6 +346,36 @@ void blk_sync_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_sync_queue);
 
+/**
+ * blk_set_preempt_only - set QUEUE_FLAG_PREEMPT_ONLY
+ * @q: request queue pointer
+ *
+ * Returns the previous value of the PREEMPT_ONLY flag - 0 if the flag was not
+ * set and 1 if the flag was already set.
+ */
+int blk_set_preempt_only(struct request_queue *q)
+{
+   unsigned long flags;
+   int res;
+
+   spin_lock_irqsave(q->queue_lock, flags);
+   res = queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q);
+   spin_unlock_irqrestore(q->queue_lock, flags);
+
+   return res;
+}
+EXPORT_SYMBOL_GPL(blk_set_preempt_only);
+
+void blk_clear_preempt_only(struct request_queue *q)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(q->queue_lock, flags);
+   queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+   spin_unlock_irqrestore(q->queue_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
+
 /**
  * __blk_run_queue_uncond - run a queue whether or not it has been stopped
  * @q: The queue to run
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 7f4a1ba532af..75f31535f280 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -74,6 +74,7 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(REGISTERED),
QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
QUEUE_FLAG_NAME(QUIESCED),
+   QUEUE_FLAG_NAME(PREEMPT_ONLY),
 };
 #undef QUEUE_FLAG_NAME
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3f7b08a9b9ae..89555eea742b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -630,6 +630,7 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26  /* queue has been registered to a disk 
*/
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27 /* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED28  /* queue has been quiesced */
+#define QUEUE_FLAG_PREEMPT_ONLY29  /* only process REQ_PREEMPT 
requests */
 
 #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |\
 (1 << QUEUE_FLAG_SAME_COMP)|   \
@@ -730,6 +731,11 @@ static inline void queue_flag_clear(unsigned int flag, 
struct request_queue *q)
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)  test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
+#define blk_queue_preempt_only(q)  \
+   test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
+
+extern int blk_set_preempt_only(struct request_queue *q);
+extern void blk_clear_preempt_only(struct request_queue *q);
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.14.2

[PATCH v8 01/10] md: Rename md_notifier into md_reboot_notifier

2017-10-10 Thread Bart Van Assche

This avoids confusion with the pm notifier that will be added
through a later patch.

Signed-off-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Cc: Shaohua Li 
Cc: linux-r...@vger.kernel.org
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
---
 drivers/md/md.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5d61049e7417..8933cafc212d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8966,7 +8966,7 @@ static int md_notify_reboot(struct notifier_block *this,
return NOTIFY_DONE;
 }
 
-static struct notifier_block md_notifier = {
+static struct notifier_block md_reboot_notifier = {
.notifier_call  = md_notify_reboot,
.next   = NULL,
.priority   = INT_MAX, /* before any real devices */
@@ -9003,7 +9003,7 @@ static int __init md_init(void)
blk_register_region(MKDEV(mdp_major, 0), 1UL<

[PATCH v8 09/10] block, scsi: Make SCSI quiesce and resume work reliably

2017-10-10 Thread Bart Van Assche

The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.

It is essential during suspend and resume that neither the filesystem
state nor the filesystem metadata in RAM changes. This is why while
the hibernation image is being written or restored that SCSI devices
are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
and scsi_device_resume(). In the SDEV_QUIESCE state execution of
non-preempt requests is deferred. This is realized by returning
BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
devices. Avoid that a full queue prevents power management requests
to be submitted by deferring allocation of non-preempt requests for
devices in the quiesced state. This patch has been tested by running
the following commands and by verifying that after resume the fio job
is still running:

for d in /sys/class/block/sd*[a-z]; do
  hcil=$(readlink "$d/device")
  hcil=${hcil#../../../}
  echo 4 > "$d/queue/nr_requests"
  echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
done
bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
bdev=${bdev#../../}
hcil=$(readlink "/sys/block/$bdev/device")
hcil=${hcil#../../../}
fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 --rw=randread \
  --ioengine=libaio --numjobs=4 --iodepth=16 --iodepth_batch=1 --thread \
  --loops=$((2**31)) &
pid=$!
sleep 1
systemctl hibernate
sleep 10
kill $pid

Reported-by: Oleksandr Natalenko 
References: "I/O hangs after resuming from suspend-to-ram" 
(https://marc.info/?l=linux-block=150340235201348).
Signed-off-by: Bart Van Assche 
Cc: Martin K. Petersen 
Cc: Ming Lei 
Cc: Christoph Hellwig 
Cc: Hannes Reinecke 
Cc: Johannes Thumshirn 
---
 block/blk-core.c| 42 +++---
 block/blk-mq.c  |  4 ++--
 block/blk-timeout.c |  2 +-
 drivers/scsi/scsi_lib.c | 29 +++--
 fs/block_dev.c  |  4 ++--
 include/linux/blkdev.h  |  2 +-
 6 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ed992cbd107f..3847ea42e341 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -372,6 +372,7 @@ void blk_clear_preempt_only(struct request_queue *q)
 
spin_lock_irqsave(q->queue_lock, flags);
queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+   wake_up_all(>mq_freeze_wq);
spin_unlock_irqrestore(q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
@@ -793,15 +794,40 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
-int blk_queue_enter(struct request_queue *q, bool nowait)
+/**
+ * blk_queue_enter() - try to increase q->q_usage_counter
+ * @q: request queue pointer
+ * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
+ */
+int blk_queue_enter(struct request_queue *q, unsigned int flags)
 {
+   const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
+
while (true) {
+   bool success = false;
int ret;
 
-   if (percpu_ref_tryget_live(>q_usage_counter))
+   rcu_read_lock_sched();
+   if (percpu_ref_tryget_live(>q_usage_counter)) {
+   /*
+* The code that sets the PREEMPT_ONLY flag is
+* responsible for ensuring that that flag is globally
+* visible before the queue is unfrozen.
+*/
+   if (preempt || !blk_queue_preempt_only(q)) {
+   success = true;
+   } else {
+   percpu_ref_put(>q_usage_counter);
+   WARN_ONCE("%s: Attempt to allocate non-preempt 
request in preempt-only mode.\n",
+ kobject_name(q->kobj.parent));
+   }
+   }
+   rcu_read_unlock_sched();
+
+   if (success)
return 0;
 
-   if (nowait)
+   if (flags & BLK_MQ_REQ_NOWAIT)
return -EBUSY;
 
/*
@@ -814,7 +840,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
smp_rmb();
 
ret = wait_event_interruptible(q->mq_freeze_wq,
-   !atomic_read(>mq_freeze_depth) ||
+   (atomic_read(>mq_freeze_depth) == 0 &&
+(preempt || !blk_queue_preempt_only(q))) ||
blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
@@

[PATCH 5/5] scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()

2017-10-10 Thread Damien Le Moal

The three values starting at byte 8 of the Zoned Block Device
Characteristics VPD page B6h are 32 bits values, not 64bits. So use
get_unaligned_be32() to retrieve the values and not get_unaligned_be64()

Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Cc: 

Signed-off-by: Damien Le Moal 
Reviewed-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Christoph Hellwig 
---
 drivers/scsi/sd_zbc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index bbad851c1789..27793b9f54c0 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -423,15 +423,15 @@ static int sd_zbc_read_zoned_characteristics(struct 
scsi_disk *sdkp,
if (sdkp->device->type != TYPE_ZBC) {
/* Host-aware */
sdkp->urswrz = 1;
-   sdkp->zones_optimal_open = get_unaligned_be64([8]);
-   sdkp->zones_optimal_nonseq = get_unaligned_be64([12]);
+   sdkp->zones_optimal_open = get_unaligned_be32([8]);
+   sdkp->zones_optimal_nonseq = get_unaligned_be32([12]);
sdkp->zones_max_open = 0;
} else {
/* Host-managed */
sdkp->urswrz = buf[4] & 1;
sdkp->zones_optimal_open = 0;
sdkp->zones_optimal_nonseq = 0;
-   sdkp->zones_max_open = get_unaligned_be64([16]);
+   sdkp->zones_max_open = get_unaligned_be32([16]);
}
 
return 0;
-- 
2.13.6

[PATCH 4/5] scsi: sd_zbc: Use well defined macros

2017-10-10 Thread Damien Le Moal

instead of open coding, use the min() macro to calculate a report zones
reply buffer length in sd_zbc_check_zone_size() and the round_up()
macro for calculating the number of zones in sd_zbc_setup().

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
---
 drivers/scsi/sd_zbc.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 7dbaf920679e..bbad851c1789 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -475,7 +475,7 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, 
unsigned char *buf)
return 0;
 }
 
-#define SD_ZBC_BUF_SIZE 131072
+#define SD_ZBC_BUF_SIZE 131072U
 
 /**
  * sd_zbc_check_zone_size - Check the device zone sizes
@@ -526,10 +526,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
/* Parse REPORT ZONES header */
list_length = get_unaligned_be32([0]) + 64;
rec = buf + 64;
-   if (list_length < SD_ZBC_BUF_SIZE)
-   buf_len = list_length;
-   else
-   buf_len = SD_ZBC_BUF_SIZE;
+   buf_len = min(list_length, SD_ZBC_BUF_SIZE);
 
/* Parse zone descriptors */
while (rec < buf + buf_len) {
@@ -599,9 +596,8 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
/* chunk_sectors indicates the zone size */
blk_queue_chunk_sectors(sdkp->disk->queue,
logical_to_sectors(sdkp->device, sdkp->zone_blocks));
-   sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
-   if (sdkp->capacity & (sdkp->zone_blocks - 1))
-   sdkp->nr_zones++;
+   sdkp->nr_zones =
+   round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
 
if (!sdkp->zones_wlock) {
sdkp->zones_wlock = kcalloc(BITS_TO_LONGS(sdkp->nr_zones),
-- 
2.13.6

[PATCH 3/5] scsi: sd_zbc: Rearrange code

2017-10-10 Thread Damien Le Moal

Rearrange sd_zbc_setup() to include use_16_for_rw and use_10_for_rw
assignments and move the calculation of sdkp->zone_shift together
with the assignment of the verified zone_blocks value in
sd_zbc_check_zone_size().

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Bart Van Assche 
---
 drivers/scsi/sd_zbc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 023f705ae235..7dbaf920679e 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -584,6 +584,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
}
 
sdkp->zone_blocks = zone_blocks;
+   sdkp->zone_shift = ilog2(zone_blocks);
 
return 0;
 }
@@ -591,10 +592,13 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
 static int sd_zbc_setup(struct scsi_disk *sdkp)
 {
 
+   /* READ16/WRITE16 is mandatory for ZBC disks */
+   sdkp->device->use_16_for_rw = 1;
+   sdkp->device->use_10_for_rw = 0;
+
/* chunk_sectors indicates the zone size */
blk_queue_chunk_sectors(sdkp->disk->queue,
logical_to_sectors(sdkp->device, sdkp->zone_blocks));
-   sdkp->zone_shift = ilog2(sdkp->zone_blocks);
sdkp->nr_zones = sdkp->capacity >> sdkp->zone_shift;
if (sdkp->capacity & (sdkp->zone_blocks - 1))
sdkp->nr_zones++;
@@ -657,10 +661,6 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned 
char *buf)
if (ret)
goto err;
 
-   /* READ16/WRITE16 is mandatory for ZBC disks */
-   sdkp->device->use_16_for_rw = 1;
-   sdkp->device->use_10_for_rw = 0;
-
return 0;
 
 err:
-- 
2.13.6

[PATCH 2/5] scsi: sd_zbc: Fix comments and indentation

2017-10-10 Thread Damien Le Moal

Fix comments style (use kernel-doc style) and content to clarify some
functions. Also fix some functions signature indentation and remove a
useless blank line in sd_zbc_read_zones().

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
---
 drivers/scsi/scsi_lib.c |   5 ++-
 drivers/scsi/sd_zbc.c   | 117 +---
 2 files changed, 104 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..c72b97a74906 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1752,7 +1752,10 @@ static void scsi_done(struct scsi_cmnd *cmd)
  *
  * Returns: Nothing
  *
- * Lock status: IO request lock assumed to be held when called.
+ * Lock status: request queue lock assumed to be held when called.
+ *
+ * Note: See sd_zbc.c sd_zbc_write_lock_zone() for write order
+ * protection for ZBC disks.
  */
 static void scsi_request_fn(struct request_queue *q)
__releases(q->queue_lock)
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 692c8cbc7ed8..023f705ae235 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -32,10 +32,14 @@
 #include "sd.h"
 
 /**
- * Convert a zone descriptor to a zone struct.
+ * sd_zbc_parse_report - Convert a zone descriptor to a struct blk_zone,
+ * @sdkp: The disk the report originated from
+ * @buf: Address of the report zone descriptor
+ * @zone: the destination zone structure
+ *
+ * All LBA sized values are converted to 512B sectors unit.
  */
-static void sd_zbc_parse_report(struct scsi_disk *sdkp,
-   u8 *buf,
+static void sd_zbc_parse_report(struct scsi_disk *sdkp, u8 *buf,
struct blk_zone *zone)
 {
struct scsi_device *sdp = sdkp->device;
@@ -58,7 +62,13 @@ static void sd_zbc_parse_report(struct scsi_disk *sdkp,
 }
 
 /**
- * Issue a REPORT ZONES scsi command.
+ * sd_zbc_report_zones - Issue a REPORT ZONES scsi command.
+ * @sdkp: The target disk
+ * @buf: Buffer to use for the reply
+ * @buflen: the buffer size
+ * @lba: Start LBA of the report
+ *
+ * For internal use during device validation.
  */
 static int sd_zbc_report_zones(struct scsi_disk *sdkp, unsigned char *buf,
   unsigned int buflen, sector_t lba)
@@ -99,6 +109,12 @@ static int sd_zbc_report_zones(struct scsi_disk *sdkp, 
unsigned char *buf,
return 0;
 }
 
+/**
+ * sd_zbc_setup_report_cmnd - Prepare a REPORT ZONES scsi command
+ * @cmd: The command to setup
+ *
+ * Call in sd_init_command() for a REQ_OP_ZONE_REPORT request.
+ */
 int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
 {
struct request *rq = cmd->request;
@@ -141,6 +157,14 @@ int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
return BLKPREP_OK;
 }
 
+/**
+ * sd_zbc_report_zones_complete - Process a REPORT ZONES scsi command reply.
+ * @scmd: The completed report zones command
+ * @good_bytes: reply size in bytes
+ *
+ * Convert all reported zone descriptors to struct blk_zone. The conversion
+ * is done in-place, directly in the request specified sg buffer.
+ */
 static void sd_zbc_report_zones_complete(struct scsi_cmnd *scmd,
 unsigned int good_bytes)
 {
@@ -196,17 +220,32 @@ static void sd_zbc_report_zones_complete(struct scsi_cmnd 
*scmd,
local_irq_restore(flags);
 }
 
+/**
+ * sd_zbc_zone_sectors - Get the device zone size in number of 512B sectors.
+ * @sdkp: The target disk
+ */
 static inline sector_t sd_zbc_zone_sectors(struct scsi_disk *sdkp)
 {
return logical_to_sectors(sdkp->device, sdkp->zone_blocks);
 }
 
+/**
+ * sd_zbc_zone_no - Get the number of the zone conataining a sector.
+ * @sdkp: The target disk
+ * @sector: 512B sector address contained in the zone
+ */
 static inline unsigned int sd_zbc_zone_no(struct scsi_disk *sdkp,
  sector_t sector)
 {
return sectors_to_logical(sdkp->device, sector) >> sdkp->zone_shift;
 }
 
+/**
+ * sd_zbc_setup_reset_cmnd - Prepare a RESET WRITE POINTER scsi command.
+ * @cmd: the command to setup
+ *
+ * Called from sd_init_command() for a REQ_OP_ZONE_RESET request.
+ */
 int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
 {
struct request *rq = cmd->request;
@@ -239,6 +278,23 @@ int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
return BLKPREP_OK;
 }
 
+/**
+ * sd_zbc_write_lock_zone - Write lock a sequential zone.
+ * @cmd: write command
+ *
+ * Called from sd_init_cmd() for write requests (standard write, write same or
+ * write zeroes operations). If the request target zone is not already locked,
+ * the zone is locked and BLKPREP_OK returned, allowing the request to proceed
+ * through dispatch in scsi_request_fn(). Otherwise, BLKPREP_DEFER is returned,
+ * forcing the request to wait for the zone to be unlocked, that is, for the
+ *

[PATCH 0/5] ZBC support clenaup and fixes

2017-10-10 Thread Damien Le Moal

Martin,

First part of the "scsi-mq support for ZBC disks" series that is only scsi
specific. These patches are just cleanups with only one bug fix (last pacth)
and do not functionally change anything.

The remaining rework of the zone locking including scsi-mq support is still
ongoing and will be sent later.

Please consider these for addition to 4.15.

Damien Le Moal (5):
  scsi: sd_zbc: Move ZBC declarations to scsi_proto.h
  scsi: sd_zbc: Fix comments and indentation
  scsi: sd_zbc: Rearrange code
  scsi: sd_zbc: Use well defined macros
  scsi: sd_zbc: Fix sd_zbc_read_zoned_characteristics()

 drivers/scsi/scsi_lib.c   |   5 +-
 drivers/scsi/sd_zbc.c | 169 ++
 include/scsi/scsi_proto.h |  45 +---
 3 files changed, 150 insertions(+), 69 deletions(-)

-- 
2.13.6

[PATCH 1/5] scsi: sd_zbc: Move ZBC declarations to scsi_proto.h

2017-10-10 Thread Damien Le Moal

Move standard macro definitions for the zone types and zone conditions
to scsi_proto.h together with the definitions related to the
REPORT ZONES command. While at it, define all values in the enums to
be clear.

Also remove unnecessary includes in sd_zbc.c.

No functional change is introduced by this patch.

Signed-off-by: Damien Le Moal 
Reviewed-by: Bart Van Assche 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Christoph Hellwig 
---
 drivers/scsi/sd_zbc.c | 24 
 include/scsi/scsi_proto.h | 45 ++---
 2 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 8aa54779aac1..692c8cbc7ed8 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -28,32 +28,8 @@
 
 #include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
 
 #include "sd.h"
-#include "scsi_priv.h"
-
-enum zbc_zone_type {
-   ZBC_ZONE_TYPE_CONV = 0x1,
-   ZBC_ZONE_TYPE_SEQWRITE_REQ,
-   ZBC_ZONE_TYPE_SEQWRITE_PREF,
-   ZBC_ZONE_TYPE_RESERVED,
-};
-
-enum zbc_zone_cond {
-   ZBC_ZONE_COND_NO_WP,
-   ZBC_ZONE_COND_EMPTY,
-   ZBC_ZONE_COND_IMP_OPEN,
-   ZBC_ZONE_COND_EXP_OPEN,
-   ZBC_ZONE_COND_CLOSED,
-   ZBC_ZONE_COND_READONLY = 0xd,
-   ZBC_ZONE_COND_FULL,
-   ZBC_ZONE_COND_OFFLINE,
-};
 
 /**
  * Convert a zone descriptor to a zone struct.
diff --git a/include/scsi/scsi_proto.h b/include/scsi/scsi_proto.h
index 8c285d9a06d8..39130a9c05bf 100644
--- a/include/scsi/scsi_proto.h
+++ b/include/scsi/scsi_proto.h
@@ -301,19 +301,42 @@ struct scsi_lun {
 
 /* Reporting options for REPORT ZONES */
 enum zbc_zone_reporting_options {
-   ZBC_ZONE_REPORTING_OPTION_ALL = 0,
-   ZBC_ZONE_REPORTING_OPTION_EMPTY,
-   ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN,
-   ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN,
-   ZBC_ZONE_REPORTING_OPTION_CLOSED,
-   ZBC_ZONE_REPORTING_OPTION_FULL,
-   ZBC_ZONE_REPORTING_OPTION_READONLY,
-   ZBC_ZONE_REPORTING_OPTION_OFFLINE,
-   ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
-   ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE,
-   ZBC_ZONE_REPORTING_OPTION_NON_WP = 0x3f,
+   ZBC_ZONE_REPORTING_OPTION_ALL   = 0x00,
+   ZBC_ZONE_REPORTING_OPTION_EMPTY = 0x01,
+   ZBC_ZONE_REPORTING_OPTION_IMPLICIT_OPEN = 0x02,
+   ZBC_ZONE_REPORTING_OPTION_EXPLICIT_OPEN = 0x03,
+   ZBC_ZONE_REPORTING_OPTION_CLOSED= 0x04,
+   ZBC_ZONE_REPORTING_OPTION_FULL  = 0x05,
+   ZBC_ZONE_REPORTING_OPTION_READONLY  = 0x06,
+   ZBC_ZONE_REPORTING_OPTION_OFFLINE   = 0x07,
+   /* 0x08 to 0x0f are reserved */
+   ZBC_ZONE_REPORTING_OPTION_NEED_RESET_WP = 0x10,
+   ZBC_ZONE_REPORTING_OPTION_NON_SEQWRITE  = 0x11,
+   /* 0x12 to 0x3e are reserved */
+   ZBC_ZONE_REPORTING_OPTION_NON_WP= 0x3f,
 };
 
 #define ZBC_REPORT_ZONE_PARTIAL 0x80
 
+/* Zone types of REPORT ZONES zone descriptors */
+enum zbc_zone_type {
+   ZBC_ZONE_TYPE_CONV  = 0x1,
+   ZBC_ZONE_TYPE_SEQWRITE_REQ  = 0x2,
+   ZBC_ZONE_TYPE_SEQWRITE_PREF = 0x3,
+   /* 0x4 to 0xf are reserved */
+};
+
+/* Zone conditions of REPORT ZONES zone descriptors */
+enum zbc_zone_cond {
+   ZBC_ZONE_COND_NO_WP = 0x0,
+   ZBC_ZONE_COND_EMPTY = 0x1,
+   ZBC_ZONE_COND_IMP_OPEN  = 0x2,
+   ZBC_ZONE_COND_EXP_OPEN  = 0x3,
+   ZBC_ZONE_COND_CLOSED= 0x4,
+   /* 0x5 to 0xc are reserved */
+   ZBC_ZONE_COND_READONLY  = 0xd,
+   ZBC_ZONE_COND_FULL  = 0xe,
+   ZBC_ZONE_COND_OFFLINE   = 0xf,
+};
+
 #endif /* _SCSI_PROTO_H_ */
-- 
2.13.6

Re: [PATCH] scsi: logging_level: update bits description

2017-10-10 Thread Kyle Fortin

Hi Randy,

On Oct 10, 2017, at 3:05 PM, Randy Dunlap  wrote:
> 
> From: Randy Dunlap 
> 
> Update the description of 'scsi_logging_level' from 8 4-bit nibbles
> to the (pre-git) reality of 10 3-bit 'nibbles'.
> 
> Signed-off-by: Randy Dunlap 
> ---
> drivers/scsi/scsi_logging.h |8 
> 1 file changed, 4 insertions(+), 4 deletions(-)
> 
> --- lnx-414-rc3.orig/drivers/scsi/scsi_logging.h
> +++ lnx-414-rc3/drivers/scsi/scsi_logging.h
> @@ -3,10 +3,10 @@
> 
> 
> /*
> - * This defines the scsi logging feature.  It is a means by which the user
> - * can select how much information they get about various goings on, and it
> - * can be really useful for fault tracing.  The logging word is divided into
> - * 8 nibbles, each of which describes a loglevel.  The division of things is
> + * This defines the scsi logging feature.  It is a means by which the user 
> can
> + * select how much information they get about various goings on, and it can 
> be
> + * really useful for fault tracing.  The logging word is divided into 10 
> 3-bit
> + * 'nibbles', each of which describes a loglevel.  The division of things is

I think ‘bitfields' is more appropriate than ‘nibbles’ (a 4-bit construct in 
compute).

>  * somewhat arbitrary, and the division of the word could be changed if it
>  * were really needed for any reason.  The numbers below are the only place
>  * where these are specified.  For a first go-around, 3 bits is more than

Reviewed-by: Kyle Fortin

[PATCH] qla2xxx: Fix uninitialize work element

2017-10-10 Thread Himanshu Madhani

From: Quinn Tran 

Fixes following stack trace

kernel: Call Trace:
kernel: dump_stack+0x63/0x84
kernel: __warn+0xd1/0xf0
kernel: warn_slowpath_null+0x1d/0x20
kernel: __queue_work+0x37a/0x420
kernel: queue_work_on+0x27/0x40
kernel: queue_work+0x14/0x20 [qla2xxx]
kernel: schedule_work+0x13/0x20 [qla2xxx]
kernel: qla2x00_post_work+0xab/0xb0 [qla2xxx]
kernel: qla2x00_post_aen_work+0x3b/0x50 [qla2xxx]
kernel: qla2x00_async_event+0x20d/0x15d0 [qla2xxx]
kernel: ? lock_timer_base+0x7d/0xa0
kernel: qla24xx_intr_handler+0x1da/0x310 [qla2xxx]
kernel: qla2x00_poll+0x36/0x60 [qla2xxx]
kernel: qla2x00_mailbox_command+0x659/0xec0 [qla2xxx]
kernel: ? proc_create_data+0x7a/0xd0
kernel: qla25xx_init_rsp_que+0x15b/0x240 [qla2xxx]
kernel: ? request_irq+0x14/0x20 [qla2xxx]
kernel: qla25xx_create_rsp_que+0x256/0x3c0 [qla2xxx]
kernel: qla2xxx_create_qpair+0x2af/0x5b0 [qla2xxx]
kernel: qla2x00_probe_one+0x1107/0x1c30 [qla2xxx]

Fixes: ec7193e26055 ("qla2xxx: Fix delayed response to command for loop 
mode/direct connect.")
Cc:  # 4.13
Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
---
Hi Martin,

Please apply this patch to 4.14.0-rc5.  User will see call stack everytime 
qla2xxx driver is loaded with kernel where original patch was added. 

Thanks,
Himanshu
---
 drivers/scsi/qla2xxx/qla_os.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 5b2437a5ea44..937209805baf 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3175,6 +3175,8 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct 
pci_device_id *id)
host->can_queue, base_vha->req,
base_vha->mgmt_svr_loop_id, host->sg_tablesize);
 
+   INIT_WORK(_vha->iocb_work, qla2x00_iocb_work_fn);
+
if (ha->mqenable) {
bool mq = false;
bool startit = false;
@@ -3223,7 +3225,6 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct 
pci_device_id *id)
 */
qla2xxx_wake_dpc(base_vha);
 
-   INIT_WORK(_vha->iocb_work, qla2x00_iocb_work_fn);
INIT_WORK(>board_disable, qla2x00_disable_board_on_pci_error);
 
if (IS_QLA8031(ha) || IS_MCTP_CAPABLE(ha)) {
-- 
2.12.0

[PATCH] scsi: logging_level: update bits description

2017-10-10 Thread Randy Dunlap

From: Randy Dunlap 

Update the description of 'scsi_logging_level' from 8 4-bit nibbles
to the (pre-git) reality of 10 3-bit 'nibbles'.

Signed-off-by: Randy Dunlap 
---
 drivers/scsi/scsi_logging.h |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- lnx-414-rc3.orig/drivers/scsi/scsi_logging.h
+++ lnx-414-rc3/drivers/scsi/scsi_logging.h
@@ -3,10 +3,10 @@
 
 
 /*
- * This defines the scsi logging feature.  It is a means by which the user
- * can select how much information they get about various goings on, and it
- * can be really useful for fault tracing.  The logging word is divided into
- * 8 nibbles, each of which describes a loglevel.  The division of things is
+ * This defines the scsi logging feature.  It is a means by which the user can
+ * select how much information they get about various goings on, and it can be
+ * really useful for fault tracing.  The logging word is divided into 10 3-bit
+ * 'nibbles', each of which describes a loglevel.  The division of things is
  * somewhat arbitrary, and the division of the word could be changed if it
  * were really needed for any reason.  The numbers below are the only place
  * where these are specified.  For a first go-around, 3 bits is more than

Re: [PATCH V6 5/5] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed

2017-10-10 Thread Omar Sandoval

On Mon, Oct 09, 2017 at 07:24:24PM +0800, Ming Lei wrote:
> During dispatching, we moved all requests from hctx->dispatch to
> one temporary list, then dispatch them one by one from this list.
> Unfortunately during this period, run queue from other contexts
> may think the queue is idle, then start to dequeue from sw/scheduler
> queue and still try to dispatch because ->dispatch is empty. This way
> hurts sequential I/O performance because requests are dequeued when
> lld queue is busy.
> 
> This patch introduces the state of BLK_MQ_S_DISPATCH_BUSY to
> make sure that request isn't dequeued until ->dispatch is
> flushed.
> 
> Reviewed-by: Bart Van Assche 
> Reviewed-by: Christoph Hellwig 

I think this will do for now.

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  block/blk-mq-debugfs.c |  1 +
>  block/blk-mq-sched.c   | 38 --
>  block/blk-mq.c |  5 +
>  include/linux/blk-mq.h |  1 +
>  4 files changed, 39 insertions(+), 6 deletions(-)

Re: [PATCH V6 4/5] blk-mq-sched: improve dispatching from sw queue

2017-10-10 Thread Omar Sandoval

On Mon, Oct 09, 2017 at 07:24:23PM +0800, Ming Lei wrote:
> SCSI devices use host-wide tagset, and the shared driver tag space is
> often quite big. Meantime there is also queue depth for each lun(
> .cmd_per_lun), which is often small, for example, on both lpfc and
> qla2xxx, .cmd_per_lun is just 3.
> 
> So lots of requests may stay in sw queue, and we always flush all
> belonging to same hw queue and dispatch them all to driver, unfortunately
> it is easy to cause queue busy because of the small .cmd_per_lun.
> Once these requests are flushed out, they have to stay in hctx->dispatch,
> and no bio merge can participate into these requests, and sequential IO
> performance is hurt a lot.
> 
> This patch introduces blk_mq_dequeue_from_ctx for dequeuing request from
> sw queue so that we can dispatch them in scheduler's way, then we can
> avoid to dequeue too many requests from sw queue when ->dispatch isn't
> flushed completely.
> 
> This patch improves dispatching from sw queue when there is per-request-queue
> queue depth by taking request one by one from sw queue, just like the way
> of IO scheduler.

This still didn't address Jens' concern about using q->queue_depth as
the heuristic for whether to do the full sw queue flush or one-by-one
dispatch. The EWMA approach is a bit too complex for now, can you please
try the heuristic of whether the driver ever returned BLK_STS_RESOURCE?

> Reviewed-by: Omar Sandoval 
> Reviewed-by: Bart Van Assche 
> Signed-off-by: Ming Lei 
> ---
>  block/blk-mq-sched.c   | 50 
> --
>  block/blk-mq.c | 39 +++
>  block/blk-mq.h |  2 ++
>  include/linux/blk-mq.h |  2 ++
>  4 files changed, 91 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index be29ba849408..14b354f617e5 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -104,6 +104,39 @@ static void blk_mq_do_dispatch_sched(struct 
> blk_mq_hw_ctx *hctx)
>   } while (blk_mq_dispatch_rq_list(q, _list));
>  }
>  
> +static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> +   struct blk_mq_ctx *ctx)
> +{
> + unsigned idx = ctx->index_hw;
> +
> + if (++idx == hctx->nr_ctx)
> + idx = 0;
> +
> + return hctx->ctxs[idx];
> +}
> +
> +static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> +{
> + struct request_queue *q = hctx->queue;
> + LIST_HEAD(rq_list);
> + struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> +
> + do {
> + struct request *rq;
> +
> + rq = blk_mq_dequeue_from_ctx(hctx, ctx);
> + if (!rq)
> + break;
> + list_add(>queuelist, _list);
> +
> + /* round robin for fair dispatch */
> + ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> +
> + } while (blk_mq_dispatch_rq_list(q, _list));
> +
> + WRITE_ONCE(hctx->dispatch_from, ctx);
> +}
> +
>  void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
>  {
>   struct request_queue *q = hctx->queue;
> @@ -143,10 +176,23 @@ void blk_mq_sched_dispatch_requests(struct 
> blk_mq_hw_ctx *hctx)
>*/
>   if (!list_empty(_list)) {
>   blk_mq_sched_mark_restart_hctx(hctx);
> - if (blk_mq_dispatch_rq_list(q, _list) && has_sched_dispatch)
> - blk_mq_do_dispatch_sched(hctx);
> + if (blk_mq_dispatch_rq_list(q, _list)) {
> + if (has_sched_dispatch)
> + blk_mq_do_dispatch_sched(hctx);
> + else
> + blk_mq_do_dispatch_ctx(hctx);
> + }
>   } else if (has_sched_dispatch) {
>   blk_mq_do_dispatch_sched(hctx);
> + } else if (q->queue_depth) {
> + /*
> +  * If there is per-request_queue depth, we dequeue
> +  * request one by one from sw queue for avoiding to mess
> +  * up I/O merge when dispatch runs out of resource, which
> +  * can be triggered easily when there is per-request_queue
> +  * queue depth or .cmd_per_lun, such as SCSI device.
> +  */
> + blk_mq_do_dispatch_ctx(hctx);
>   } else {
>   blk_mq_flush_busy_ctxs(hctx, _list);
>   blk_mq_dispatch_rq_list(q, _list);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 076cbab9c3e0..394cb75d66fa 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -911,6 +911,45 @@ void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, 
> struct list_head *list)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_flush_busy_ctxs);
>  
> +struct dispatch_rq_data {
> + struct blk_mq_hw_ctx *hctx;
> + struct request *rq;
> +};
> +
> +static bool dispatch_rq_from_ctx(struct sbitmap *sb, unsigned int bitnr,
> +

Re: [PATCH V6 3/5] sbitmap: introduce __sbitmap_for_each_set()

2017-10-10 Thread Omar Sandoval

On Mon, Oct 09, 2017 at 07:24:22PM +0800, Ming Lei wrote:
> We need to iterate ctx starting from any ctx in round robin
> way, so introduce this helper.
> 
> Cc: Omar Sandoval 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  include/linux/sbitmap.h | 64 
> -
>  1 file changed, 47 insertions(+), 17 deletions(-)

Re: [PATCH V6 1/5] blk-mq-sched: fix scheduler bad performance

2017-10-10 Thread Omar Sandoval

On Mon, Oct 09, 2017 at 07:24:20PM +0800, Ming Lei wrote:
> When hw queue is busy, we shouldn't take requests from
> scheduler queue any more, otherwise it is difficult to do
> IO merge.
> 
> This patch fixes the awful IO performance on some
> SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber
> is used by not taking requests if hw queue is busy.
> 
> Reviewed-by: Bart Van Assche 
> Reviewed-by: Christoph Hellwig 

Reviewed-by: Omar Sandoval 

> Signed-off-by: Ming Lei 
> ---
>  block/blk-mq-sched.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 4ab69435708c..eca011fdfa0e 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -94,7 +94,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx 
> *hctx)
>   struct request_queue *q = hctx->queue;
>   struct elevator_queue *e = q->elevator;
>   const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
> - bool did_work = false;
> + bool do_sched_dispatch = true;
>   LIST_HEAD(rq_list);
>  
>   /* RCU or SRCU read lock is needed before checking quiesced flag */
> @@ -125,18 +125,18 @@ void blk_mq_sched_dispatch_requests(struct 
> blk_mq_hw_ctx *hctx)
>*/
>   if (!list_empty(_list)) {
>   blk_mq_sched_mark_restart_hctx(hctx);
> - did_work = blk_mq_dispatch_rq_list(q, _list);
> + do_sched_dispatch = blk_mq_dispatch_rq_list(q, _list);
>   } else if (!has_sched_dispatch) {
>   blk_mq_flush_busy_ctxs(hctx, _list);
>   blk_mq_dispatch_rq_list(q, _list);
>   }
>  
>   /*
> -  * We want to dispatch from the scheduler if we had no work left
> -  * on the dispatch list, OR if we did have work but weren't able
> -  * to make progress.
> +  * We want to dispatch from the scheduler if there was nothing
> +  * on the dispatch list or we were able to dispatch from the
> +  * dispatch list.
>*/
> - if (!did_work && has_sched_dispatch) {
> + if (do_sched_dispatch && has_sched_dispatch) {
>   do {
>   struct request *rq;
>  
> -- 
> 2.9.5
>

Re: [PATCH v7 9/9] block, scsi: Make SCSI quiesce and resume work reliably

2017-10-10 Thread Bart Van Assche

On Tue, 2017-10-10 at 18:56 +0800, Ming Lei wrote:
> On Mon, Oct 09, 2017 at 04:14:00PM -0700, Bart Van Assche wrote:
> > [ ... ]
> >  int
> >  scsi_device_quiesce(struct scsi_device *sdev)
> >  {
> > +   struct request_queue *q = sdev->request_queue;
> > int err;
> >  
> > +   /* If the SCSI device already has been quiesced, do nothing. */
> > +   if (blk_set_preempt_only(q))
> > +   return 0;
> 
> This way isn't safe:
> 
> 1) suppose path1 sets the flag, and blk_mq_freeze_queue() isn't
> finished, or even not started;
> 
> 2) path2 sees the flag set at the exact time, and returns immediately,
> and unfortunately SCSI QUIESCE isn't respected in this context.

That comment applies to concurrent invocations of scsi_device_quiesce() only.
I think concurrent calls of scsi_device_quiesce() can only occur when power
management suspends or
hibernates a system that is equipped with a parallel
port and on which SCSI parallel domain validation occurs. I think that's a
very unlikely combination. And if we have to
address this, I propose to
disable power management during SCSI parallel domain validation, e.g. by
locking pm_mutex before SPI DV starts and by unlocking that mutex after SPI
DV
has finished. I think that will result in code that is easier to review
than the approach you proposed.

> > +   blk_mq_freeze_queue(q);
> > +   blk_mq_unfreeze_queue(q);
> 
> This way isn't safe too, because queue may has been frozen before
> scsi_device_quiesce() is run, then there isn't synchronize_rcu()
> implicated in blk_mq_freeze_queue().

It's very unlikely that this will cause trouble in practice. Anyway, the
previous version of this code did not show this race so I will change this
code fragment back to what I had in the previous version.

Bart.

Re: system hung up when offlining CPUs

2017-10-10 Thread YASUAKI ISHIMATSU

Hi Thomas,

Sorry for the late reply.

I'll apply the patches and retest in this week.
Please wait a while.

Thanks,
Yasuaki Ishimatsu

On 10/04/2017 05:04 PM, Thomas Gleixner wrote:
> On Tue, 3 Oct 2017, Thomas Gleixner wrote:
>> Can you please apply the debug patch below.
> 
> I found an issue with managed interrupts when the affinity mask of an
> managed interrupt spawns multiple CPUs. Explanation in the changelog
> below. I'm not sure that this cures the problems you have, but at least I
> could prove that it's not doing what it should do. The failure I'm seing is
> fixed, but I can't test that megasas driver due to -ENOHARDWARE.
> 
> Can you please apply the patch below on top of Linus tree and retest?
> 
> Please send me the outputs I asked you to provide last time in any case
> (success or fail).
> 
> @block/scsi folks: Can you please run that through your tests as well?
> 
> Thanks,
> 
>   tglx
> 
> 8<---
> Subject: genirq/cpuhotplug: Enforce affinity setting on startup of managed 
> irqs
> From: Thomas Gleixner 
> Date: Wed, 04 Oct 2017 21:07:38 +0200
> 
> Managed interrupts can end up in a stale state on CPU hotplug. If the
> interrupt is not targeting a single CPU, i.e. the affinity mask spawns
> multiple CPUs then the following can happen:
> 
> After boot:
> 
> dstate:   0x01601200
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_SET
> IRQD_AFFINITY_MANAGED
> node: 0
> affinity: 24-31
> effectiv: 24
> pending:  0
> 
> After offlining CPU 31 - 24
> 
> dstate:   0x01a31000
> IRQD_IRQ_DISABLED
> IRQD_IRQ_MASKED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_SET
> IRQD_AFFINITY_MANAGED
> IRQD_MANAGED_SHUTDOWN
> node: 0
> affinity: 24-31
> effectiv: 24
> pending:  0
> 
> Now CPU 25 gets onlined again, so it should get the effective interrupt
> affinity for this interruopt, but due to the x86 interrupt affinity setter
> restrictions this ends up after restarting the interrupt with:
> 
> dstate:   0x01601300
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_SET
> IRQD_SETAFFINITY_PENDING
> IRQD_AFFINITY_MANAGED
> node: 0
> affinity: 24-31
> effectiv: 24
> pending:  24-31
> 
> So the interrupt is still affine to CPU 24, which was the last CPU to go
> offline of that affinity set and the move to an online CPU within 24-31,
> in this case 25, is pending. This mechanism is x86/ia64 specific as those
> architectures cannot move interrupts from thread context and do this when
> an interrupt is actually handled. So the move is set to pending.
> 
> Whats worse is that offlining CPU 25 again results in:
> 
> dstate:   0x01601300
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_SET
> IRQD_SETAFFINITY_PENDING
> IRQD_AFFINITY_MANAGED
> node: 0
> affinity: 24-31
> effectiv: 24
> pending:  24-31
> 
> This means the interrupt has not been shut down, because the outgoing CPU
> is not in the effective affinity mask, but of course nothing notices that
> the effective affinity mask is pointing at an offline CPU.
> 
> In the case of restarting a managed interrupt the move restriction does not
> apply, so the affinity setting can be made unconditional. This needs to be
> done _before_ the interrupt is started up as otherwise the condition for
> moving it from thread context would not longer be fulfilled.
> 
> With that change applied onlining CPU 25 after offlining 31-24 results in:
> 
> dstate:   0x01600200
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_MANAGED
> node: 0
> affinity: 24-31
> effectiv: 25
> pending:  
> 
> And after offlining CPU 25:
> 
> dstate:   0x01a3
> IRQD_IRQ_DISABLED
> IRQD_IRQ_MASKED
> IRQD_SINGLE_TARGET
> IRQD_AFFINITY_MANAGED
> IRQD_MANAGED_SHUTDOWN
> node: 0
> affinity: 24-31
> effectiv: 25
> pending:  
> 
> which is the correct and expected result.
> 
> To complete that, add some debug code to catch this kind of situation in
> the cpu offline code and warn about interrupt chips which allow affinity
> setting and do not update the effective affinity mask if that feature is
> enabled.
> 
> Reported-by: YASUAKI ISHIMATSU 
> Signed-off-by: Thomas Gleixner 
> 
> ---
>  kernel/irq/chip.c   |2 +-
>  kernel/irq/cpuhotplug.c |   28 +++-
>  kernel/irq/manage.c |   17 +
>  3 files changed, 45 insertions(+), 2 deletions(-)
> 
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -265,8 +265,8 @@ int irq_startup(struct irq_desc *desc, b
>

Re: [PATCH] scsi: use set_host_byte instead of open-coding it

2017-10-10 Thread Bart Van Assche

On Tue, 2017-10-10 at 17:29 +0200, Johannes Thumshirn wrote:
> Call set_host_byte() instead of open-coding it.
> 
> Converted using this simple Coccinelle spatch
> 
> 
> @@
> local idexpression struct scsi_cmnd *c;
> expression E1;
> @@
> 
> - c->result = E1 << 16;
> + set_host_byte(c, E1);
> 

This is useful but I think we should take this a step further. How
about the following approach:
- Introduce four enumeration types - one for the msg byte codes, one
  for the host byte codes, one for the driver byte codes and one for
  SCSI sense codes.
- Update the signatures of functions that store or extract these four
  bytes.
- Change the data type of scsi_cmnd.result into something that makes
  regular assignments trigger a compiler error.

I'm proposing this because there are several drivers that do not assign the
SCSI host byte correctly:
$ git grep -nH '>result = DID_' | grep -v '<< *16' | grep -v ': \*'
drivers/scsi/ips.c:953: scsi_cmd->result = DID_ERROR;
drivers/scsi/ips.c:2657:SC->result = DID_OK;
drivers/scsi/libiscsi.c:1731:   sc->result = DID_REQUEUE;
drivers/scsi/qla2xxx/qla_bsg.c:152: bsg_reply->result = 
DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:167: 
bsg_reply->result = DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:184: bsg_reply->result = DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:236: bsg_reply->result = DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:976: bsg_reply->result = DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:1078:bsg_reply->result = DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:1261:bsg_reply->result = DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:1375:bsg_reply->result = DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:1480:bsg_reply->result = DID_OK;
drivers/scsi/qla2xxx/qla_bsg.c:1516:bsg_reply->result = DID_OK;
drivers/scsi/qlogicpti.c:1049:  Cmnd->result = DID_BUS_BUSY;
drivers/scsi/stex.c:615:cmd->result = DID_NO_CONNECT;

Bart.

[PATCH v3] scsi: fc: check for rport presence in fc_block_scsi_eh

2017-10-10 Thread Johannes Thumshirn

Coverity-scan recently found a possible NULL pointer dereference in
fc_block_scsi_eh() as starget_to_rport() either returns the rport for
the startget or NULL.

While it is rather unlikely to have fc_block_scsi_eh() called without
an rport associated it's a good idea to catch potential misuses of the
API gracefully.

Signed-off-by: Johannes Thumshirn 
Reviewed-by: Bart Van Assche 
Reviewed-by: Hannes Reinecke 
---
Martin, I'm not sure if I have already sent it out or not if I already did
send this v3 please ignore it.

Changes since v2:
- return FAST_IO_FAIL instead of 0 (Steffen)

Changes since v1:
- s/WARN_ON/WARN_ON_ONCE/ (Bart)
---
 drivers/scsi/scsi_transport_fc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index ba9d70f8a6a1..18f56a124b6c 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3328,6 +3328,9 @@ int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
 {
struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device));
 
+   if (WARN_ON_ONCE(!rport))
+   return FAST_IO_FAIL;
+
return fc_block_rport(rport);
 }
 EXPORT_SYMBOL(fc_block_scsi_eh);
-- 
2.13.5

[PATCH] scsi: use set_host_byte instead of open-coding it

2017-10-10 Thread Johannes Thumshirn

Call set_host_byte() instead of open-coding it.

Converted using this simple Coccinelle spatch


@@
local idexpression struct scsi_cmnd *c;
expression E1;
@@

- c->result = E1 << 16;
+ set_host_byte(c, E1);


Signed-off-by: Johannes Thumshirn 
Cc: Steffen Maier 
Cc: Tejun Heo 
Cc: Bart Van Assche 
Cc: Oliver Neukum 
Cc: Alan Stern 
---
Martin, this path should be applied on top of
http://lkml.kernel.org/r/20171009113319.9505-1-jthumsh...@suse.de
the above is a bug fix and possibly stable material.
---
 drivers/ata/libata-scsi.c   |  2 +-
 drivers/infiniband/ulp/srp/ib_srp.c |  2 +-
 drivers/message/fusion/mptfc.c  |  4 +-
 drivers/message/fusion/mptsas.c |  2 +-
 drivers/message/fusion/mptscsih.c   | 53 +++---
 drivers/message/fusion/mptspi.c |  4 +-
 drivers/scsi/53c700.c   |  2 +-
 drivers/scsi/BusLogic.c | 12 ++---
 drivers/scsi/NCR5380.c  | 16 +++
 drivers/scsi/aacraid/aachba.c   | 12 ++---
 drivers/scsi/aacraid/linit.c|  4 +-
 drivers/scsi/aic7xxx/aic79xx_osm.c  |  2 +-
 drivers/scsi/aic7xxx/aic7xxx_osm.c  |  2 +-
 drivers/scsi/arcmsr/arcmsr_hba.c|  2 +-
 drivers/scsi/arm/acornscsi.c|  4 +-
 drivers/scsi/bfa/bfad_im.c  |  4 +-
 drivers/scsi/bnx2fc/bnx2fc_io.c |  6 +--
 drivers/scsi/dc395x.c   |  8 ++--
 drivers/scsi/eata.c |  4 +-
 drivers/scsi/eata_pio.c |  4 +-
 drivers/scsi/esas2r/esas2r_main.c   |  8 ++--
 drivers/scsi/esp_scsi.c |  6 +--
 drivers/scsi/fnic/fnic_scsi.c   | 12 ++---
 drivers/scsi/hpsa.c | 46 ++--
 drivers/scsi/hptiop.c   |  2 +-
 drivers/scsi/ibmvscsi/ibmvfc.c  |  2 +-
 drivers/scsi/imm.c  |  2 +-
 drivers/scsi/ips.c  | 16 +++
 drivers/scsi/libfc/fc_fcp.c | 10 ++---
 drivers/scsi/libiscsi.c | 22 +-
 drivers/scsi/libsas/sas_scsi_host.c |  6 +--
 drivers/scsi/megaraid/megaraid_sas_base.c   | 12 ++---
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++---
 drivers/scsi/mesh.c |  4 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c| 58 -
 drivers/scsi/nsp32.c| 32 +++---
 drivers/scsi/pcmcia/nsp_cs.c| 10 ++---
 drivers/scsi/pcmcia/sym53c500_cs.c  | 10 ++---
 drivers/scsi/ppa.c  |  4 +-
 drivers/scsi/ps3rom.c   |  4 +-
 drivers/scsi/qedf/qedf_io.c |  8 ++--
 drivers/scsi/qla2xxx/qla_iocb.c |  4 +-
 drivers/scsi/qla2xxx/qla_isr.c  |  2 +-
 drivers/scsi/qla2xxx/qla_os.c   | 16 +++
 drivers/scsi/qla4xxx/ql4_isr.c  | 22 +-
 drivers/scsi/qla4xxx/ql4_os.c   |  6 +--
 drivers/scsi/qlogicfas408.c |  2 +-
 drivers/scsi/qlogicpti.c|  2 +-
 drivers/scsi/scsi_lib.c |  4 +-
 drivers/scsi/snic/snic_scsi.c   |  2 +-
 drivers/scsi/stex.c |  2 +-
 drivers/scsi/storvsc_drv.c  |  2 +-
 drivers/scsi/wd33c93.c  |  8 ++--
 drivers/scsi/wd719x.c   |  2 +-
 drivers/scsi/xen-scsifront.c|  2 +-
 drivers/staging/rts5208/rtsx.c  |  2 +-
 drivers/staging/rts5208/rtsx_transport.c|  4 +-
 drivers/staging/unisys/visorhba/visorhba_main.c |  8 ++--
 drivers/usb/image/microtek.c|  4 +-
 drivers/usb/storage/isd200.c| 12 ++---
 drivers/usb/storage/scsiglue.c  |  2 +-
 drivers/usb/storage/transport.c | 14 +++---
 drivers/usb/storage/uas.c   | 12 ++---
 63 files changed, 285 insertions(+), 284 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 44ba292f2cd7..37253d5f1502 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -4331,7 +4331,7 @@ static inline int __ata_scsi_queuecmd(struct scsi_cmnd 
*scmd,
  bad_cdb_len:
DPRINTK("bad CDB len=%u, scsi_op=0x%02x, max=%u\n",
scmd->cmd_len, scsi_op, dev->cdb_len);
-   scmd->result = DID_ERROR << 16;
+   set_host_byte(scmd, DID_ERROR);
scmd->scsi_done(scmd);
return 0;
 }
diff --git

Re: [PATCH v7 9/9] block, scsi: Make SCSI quiesce and resume work reliably

2017-10-10 Thread Bart Van Assche

On Tue, 2017-10-10 at 09:57 +0200, Martin Steigerwald wrote:
> Bart Van Assche - 09.10.17, 16:14:
> > The contexts from which a SCSI device can be quiesced or resumed are:
> > [ ... ]
> 
> Does this as reliably fix the issue as the patches from Ming? I mean in *real 
> world* scenarios? Or is it just about the same approach as Ming has taken.
> 
> I ask cause I don´t see any Tested-By:´s here? I know I tested Ming´s patch 
> series and I know it fixes the hang after resume from suspend with blk-mq + 
> BFQ 
> issue for me. I have an uptime of 7 days and I didn´t see any uptime even 
> remotely like that in a long time (before that issue Intel gfx drivers caused 
> hangs, but thankfully that seems fixed meanwhile).
> 
> I´d be willing to test. Do you have a 4.14.x tree available with these 
> patches 
> applied I can just add as a remote and fetch from?

Hello Martin,

A previous version of this patch series passed Oleksandr's tests. This series is
close to that previous version so I think it is safe to assume that this version
will also pass Oleksandr's tests. Anyway, more testing is definitely welcome so
if you want to verify this series please start from the following tree (Jens'
for-next branch + this series): 
https://github.com/bvanassche/linux/tree/blk-mq-pm-v7

Thanks,

Bart.

Re: [PATCH] scsi: libiscsi: fix shifting of DID_REQUEUE host byte

2017-10-10 Thread Hannes Reinecke

On 10/09/2017 01:33 PM, Johannes Thumshirn wrote:
> The SCSI host byte should be shifted left by 16 in order to have
> scsi_decide_disposition() do the right thing (.i.e. requeue the command).
> 
> Signed-off-by: Johannes Thumshirn 
> Fixes: 661134ad3765 ("[SCSI] libiscsi, bnx2i: make bound ep check common")
> Cc: Lee Duncan 
> Cc: Hannes Reinecke 
> Cc: Bart Van Assche 
> Cc: Chris Leech 
> ---
>  drivers/scsi/libiscsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index bd4605a34f54..9cba4913b43c 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1728,7 +1728,7 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct 
> scsi_cmnd *sc)
>  
>   if (test_bit(ISCSI_SUSPEND_BIT, >suspend_tx)) {
>   reason = FAILURE_SESSION_IN_RECOVERY;
> - sc->result = DID_REQUEUE;
> + sc->result = DID_REQUEUE << 16;
>   goto fault;
>   }
>  
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

Re: [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1)

2017-10-10 Thread John Garry

On 10/10/2017 14:45, Ming Lei wrote:

Hi John,

All change in V6.2 is blk-mq/scsi-mq only, which shouldn't
affect non SCSI_MQ, so I suggest you to compare the perf
between deadline and mq-deadline, like Johannes mentioned.

>
> V6.2 series with default SCSI_MQ
> read, rw, write IOPS   
> 700K, 130K/128K, 640K

If possible, could you provide your fio script and log on both
non SCSI_MQ(deadline) and SCSI_MQ(mq_deadline)? Maybe some clues
can be figured out.

Also, I just put another patch on V6.2 branch, which may improve
a bit too. You may try that in your test.

https://github.com/ming1/linux/commit/e31e2eec46c9b5ae7cfa181e9b77adad2c6a97ce

-- Ming .

Hi Ming Lei,

OK, I have tested deadline vs mq-deadline for your v6.2 branch and 
4.12-rc2. Unfortunately I don't have time now to test your experimental 
patches.

4.14-rc2 without default SCSI_MQ, deadline scheduler
read, rw, write IOPS
920K, 115K/115K, 806K

4.14-rc2 with default SCSI_MQ, mq-deadline scheduler
read, rw, write IOPS
280K, 99K/99K, 300K

V6.2 series without default SCSI_MQ, deadline scheduler
read, rw, write IOPS
919K, 117K/117K, 806K

V6.2 series with default SCSI_MQ, mq-deadline scheduler
read, rw, write IOPS
688K, 128K/128K, 630K

I think that the non-mq results look a bit more sensible - that is, 
consistent results.

Here's my script sample:
[global]
rw=rW
direct=1
ioengine=libaio
iodepth=2048
numjobs=1
bs=4k
;size=1024m
;zero_buffers=1
group_reporting=1
group_reporting=1
;ioscheduler=noop
cpumask=0xff
;cpus_allowed=0-3
;gtod_reduce=1
;iodepth_batch=2
;iodepth_batch_complete=2
runtime=1
;thread
loops = 1

[job1]
filename=/dev/sdb:
[job1]
filename=/dev/sdc:
[job1]
filename=/dev/sdd:
[job1]
filename=/dev/sde:
[job1]
filename=/dev/sdf:
[job1]
filename=/dev/sdg:

John

[PATCH] scsi: libcxgbi: in case of vlan pass 0 as ifindex to find route

2017-10-10 Thread Varun Prakash

In case of vlan pass 0 as ifindex to find route instead of
passing real_dev ifindex, if we pass real_dev ifindex
then ip_route_output_ports() and ip6_route_output() will
check for route through real_dev not through vlan interface.

Signed-off-by: Varun Prakash 
---
 drivers/scsi/cxgbi/libcxgbi.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 512c8f1..8503ac9 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -2556,7 +2556,10 @@ struct iscsi_endpoint *cxgbi_ep_connect(struct Scsi_Host 
*shost,
goto err_out;
}
 
-   ifindex = hba->ndev->ifindex;
+   rtnl_lock();
+   if (!vlan_uses_dev(hba->ndev))
+   ifindex = hba->ndev->ifindex;
+   rtnl_unlock();
}
 
if (dst_addr->sa_family == AF_INET) {
-- 
2.0.2

Re: [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1)

2017-10-10 Thread Ming Lei

On Tue, Oct 10, 2017 at 01:24:52PM +0100, John Garry wrote:
> On 10/10/2017 02:46, Ming Lei wrote:
> > > > > > I tested this series for the SAS controller on HiSilicon hip07 
> > > > > > platform as I
> > > > > > am interested in enabling MQ for this driver. Driver is
> > > > > > ./drivers/scsi/hisi_sas/.
> > > > > >
> > > > > > So I found that that performance is improved when enabling default 
> > > > > > SCSI_MQ
> > > > > > with this series vs baseline. However, it is still not as a good as 
> > > > > > when
> > > > > > default SCSI_MQ is disabled.
> > > > > >
> > > > > > Here are some figures I got with fio:
> > > > > > 4.14-rc2 without default SCSI_MQ
> > > > > > read, rw, write IOPS
> > > > > > 952K, 133K/133K, 800K
> > > > > >
> > > > > > 4.14-rc2 with default SCSI_MQ
> > > > > > read, rw, write IOPS
> > > > > > 311K, 117K/117K, 320K
> > > > > >
> > > > > > This series* without default SCSI_MQ
> > > > > > read, rw, write IOPS
> > > > > > 975K, 132K/132K, 790K
> > > > > >
> > > > > > This series* with default SCSI_MQ
> > > > > > read, rw, write IOPS
> > > > > > 770K, 164K/164K, 594K
> > > >
> > > > Thanks for testing this patchset!
> > > >
> > > > Looks there is big improvement, but the gap compared with
> > > > block legacy is not small too.
> > > >
> > > > > >
> > > > > > Please note that hisi_sas driver does not enable mq by exposing 
> > > > > > multiple
> > > > > > queues to upper layer (even though it has multiple queues). I have 
> > > > > > been
> > > > > > playing with enabling it, but my performance is always worse...
> > > > > >
> > > > > > * I'm using
> > > > > > https://github.com/ming1/linux/commits/blk_mq_improve_scsi_mpath_perf_V5.1,
> > > > > > as advised by Ming Lei.
> > > >
> > > > Could you test on the following branch and see if it makes a
> > > > difference?
> > > >
> > > > 
> > > > https://github.com/ming1/linux/commits/blk_mq_improve_scsi_mpath_perf_V6.1_test
> > Hi John,
> > 
> > Please test the following branch directly:
> > 
> > https://github.com/ming1/linux/tree/blk_mq_improve_scsi_mpath_perf_V6.2_test
> > 
> > And code is simplified and cleaned up much in V6.2, then only two extra
> > patches(top 2) are needed against V6 which was posted yesterday.
> > 
> > Please test SCSI_MQ with mq-deadline, which should be the default
> > mq scheduler on your HiSilicon SAS.
> 
> Hi Ming Lei,
> 
> It's using cfq (for non-mq) and mq-deadline (obviously for mq).
> 
> root@(none)$ pwd
> /sys/devices/platform/HISI0162:01/host0/port-0:0/expander-0:0/port-0:0:7/end_device-0:0:7
> root@(none)$ more ./target0:0:3/0:0:3:0/block/sdd/queue/scheduler
> noop [cfq]
> 
> and
> 
> root@(none)$ more ./target0:0:3/0:0:3:0/block/sdd/queue/scheduler
> [mq-deadline] kyber none
> 
> Unfortunately my setup has changed since yeterday, and the absolute figures
> are not the exact same (I retested 4.14-rc2). However, we still see that
> drop when mq is enabled.
> 
> Here's the results:
> 4.14-rc4 without default SCSI_MQ
> read, rw, write IOPS  
> 860K, 112K/112K, 800K
> 
> 4.14-rc2 without default SCSI_MQ
> read, rw, write IOPS  
> 880K, 113K/113K, 808K
> 
> V6.2 series without default SCSI_MQ
> read, rw, write IOPS  
> 820K, 114/114K, 790K

Hi John,

All change in V6.2 is blk-mq/scsi-mq only, which shouldn't
affect non SCSI_MQ, so I suggest you to compare the perf
between deadline and mq-deadline, like Johannes mentioned.

> 
> V6.2 series with default SCSI_MQ
> read, rw, write IOPS  
> 700K, 130K/128K, 640K

If possible, could you provide your fio script and log on both
non SCSI_MQ(deadline) and SCSI_MQ(mq_deadline)? Maybe some clues
can be figured out.

Also, I just put another patch on V6.2 branch, which may improve
a bit too. You may try that in your test.


https://github.com/ming1/linux/commit/e31e2eec46c9b5ae7cfa181e9b77adad2c6a97ce

-- 
Ming

[PATCH 01/10] mpt3sas: Processing of Cable Exception events

2017-10-10 Thread Sreekanth Reddy

Earlier Active Cable Exception event with reason code
"Cable Degraded (0x02))" was added only for Active Cable,
Now this event is extended to Passive cable too.
So re-arranged display message accordingly.

Also added Cable Exception Event even for SAS3008 & SAS3108 HBAs
(i.e. MPI 2.5 spec supporting HBAs) earlier this event was
enabled only for MPI 2.6 spec supporting HBA devices.

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  |  5 ++---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 20 +++-
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 870..844e29c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -655,7 +655,7 @@ _base_display_event_data(struct MPT3SAS_ADAPTER *ioc,
desc = "Temperature Threshold";
break;
case MPI2_EVENT_ACTIVE_CABLE_EXCEPTION:
-   desc = "Active cable exception";
+   desc = "Cable Event";
break;
}
 
@@ -5517,8 +5517,7 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
_base_unmask_events(ioc, MPI2_EVENT_IR_OPERATION_STATUS);
_base_unmask_events(ioc, MPI2_EVENT_LOG_ENTRY_ADDED);
_base_unmask_events(ioc, MPI2_EVENT_TEMP_THRESHOLD);
-   if (ioc->hba_mpi_version_belonged == MPI26_VERSION)
-   _base_unmask_events(ioc, MPI2_EVENT_ACTIVE_CABLE_EXCEPTION);
+   _base_unmask_events(ioc, MPI2_EVENT_ACTIVE_CABLE_EXCEPTION);
 
r = _base_make_ioc_operational(ioc);
if (r)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 22998cb..9594166 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -8056,19 +8056,21 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER 
*ioc, u8 msix_index,
(Mpi26EventDataActiveCableExcept_t *) mpi_reply->EventData;
switch (ActiveCableEventData->ReasonCode) {
case MPI26_EVENT_ACTIVE_CABLE_INSUFFICIENT_POWER:
-   pr_notice(MPT3SAS_FMT "Receptacle ID %d: This active 
cable"
- " requires %d mW of power\n", ioc->name,
-ActiveCableEventData->ReceptacleID,
+   pr_notice(MPT3SAS_FMT
+   "Currently an active cable with ReceptacleID %d\n",
+   ioc->name, ActiveCableEventData->ReceptacleID);
+   pr_notice("cannot be powered and devices connected\n");
+   pr_notice("to this active cable will not be seen\n");
+   pr_notice("This active cable require %d mW of power\n",
 ActiveCableEventData->ActiveCablePowerRequirement);
-   pr_notice(MPT3SAS_FMT "Receptacle ID %d: Devices 
connected"
- " to this active cable will not be seen\n",
-ioc->name, ActiveCableEventData->ReceptacleID);
break;
 
case MPI26_EVENT_ACTIVE_CABLE_DEGRADED:
-   pr_notice(MPT3SAS_FMT "ReceptacleID %d: This cable",
-   ioc->name, ActiveCableEventData->ReceptacleID);
-   pr_notice(" is not running at an optimal speed(12 
Gb/s)\n");
+   pr_notice(MPT3SAS_FMT
+   "Currently a cable with ReceptacleID %d\n",
+   ioc->name, ActiveCableEventData->ReceptacleID);
+   pr_notice(
+   "is not running at optimal speed(12 Gb/s rate)\n");
break;
}
 
-- 
2.4.3

[PATCH 00/10] [SCSI] mpt3sas: Phase15 driver enhancements and fixes

2017-10-10 Thread Sreekanth Reddy

Phase15 driver enhancements and fixes.

Sreekanth Reddy (10):
  mpt3sas: Processing of Cable Exception events
  mpt3sas: Fixed memory leaks in driver
  mpt3sas: Reduce memory footprints in kdump kernel
  mpt3sas: Fix removal and addition of vSES device during host reset
  mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume
created on two SATA drive
  mpt3sas: Updated MPI headers to v2.00.48
  mpt3sas: Display chassis slot information of the drive
  mpt3sas: Fix possibility of using invalid Enclosure Handles for SAS
device after host reset
  mpt3sas: Adding support for SAS3616 HBA device
  mpt3sas: Bump mpt3sas driver version to v16.100.00.00

 drivers/scsi/mpt3sas/mpi/mpi2.h  |  43 ++-
 drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h | 564 +--
 drivers/scsi/mpt3sas/mpi/mpi2_init.h |  11 +-
 drivers/scsi/mpt3sas/mpi/mpi2_ioc.h  | 282 +-
 drivers/scsi/mpt3sas/mpi/mpi2_pci.h  | 112 +++
 drivers/scsi/mpt3sas/mpi/mpi2_tool.h |  14 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c  |  19 +-
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  10 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 383 ++--
 9 files changed, 1241 insertions(+), 197 deletions(-)
 create mode 100644 drivers/scsi/mpt3sas/mpi/mpi2_pci.h

-- 
2.4.3

[PATCH 04/10] mpt3sas: Fix removal and addition of vSES device during host reset

2017-10-10 Thread Sreekanth Reddy

For Dev Handles who value is less than hba's phys count number
 driver will return HBA sas address value as a sas address.
So for Virtual SES device also driver was returning HBA sas address instead
 of Virtual SES sas address. So now updated the driver to return
 Virtual SES's sas address for Virtual SES device instead of
 HBA's sas address.

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 600e8ef..cc78ce4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -406,11 +406,6 @@ _scsih_get_sas_address(struct MPT3SAS_ADAPTER *ioc, u16 
handle,
 
*sas_address = 0;
 
-   if (handle <= ioc->sas_hba.num_phys) {
-   *sas_address = ioc->sas_hba.sas_address;
-   return 0;
-   }
-
if ((mpt3sas_config_get_sas_device_pg0(ioc, _reply, _device_pg0,
MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle))) {
pr_err(MPT3SAS_FMT "failure at %s:%d/%s()!\n", ioc->name,
@@ -420,7 +415,15 @@ _scsih_get_sas_address(struct MPT3SAS_ADAPTER *ioc, u16 
handle,
 
ioc_status = le16_to_cpu(mpi_reply.IOCStatus) & MPI2_IOCSTATUS_MASK;
if (ioc_status == MPI2_IOCSTATUS_SUCCESS) {
-   *sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
+   /* For HBA vSES don't return hba sas address instead return
+* vSES's sas address.
+*/
+   if ((handle <= ioc->sas_hba.num_phys) &&
+  (!(le32_to_cpu(sas_device_pg0.DeviceInfo) &
+  MPI2_SAS_DEVICE_INFO_SEP)))
+   *sas_address = ioc->sas_hba.sas_address;
+   else
+   *sas_address = le64_to_cpu(sas_device_pg0.SASAddress);
return 0;
}
 
-- 
2.4.3

[PATCH 03/10] mpt3sas: Reduce memory footprints in kdump kernel

2017-10-10 Thread Sreekanth Reddy

To reduce the memory footprints of the driver in kdump kernel,
 we have made below driver setting when system boots in to kdump kernel,

1. Used single MSI-x vector.
2. Disable RDPQ mode.
3. Set sg_table_size to 32 by default.
4) Set SCSI IO Queue depth to 200.

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 14 +++---
 drivers/scsi/mpt3sas/mpt3sas_base.h |  2 ++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 844e29c..11c6afe 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -1990,7 +1990,7 @@ _base_enable_msix(struct MPT3SAS_ADAPTER *ioc)
  ioc->cpu_count, max_msix_vectors);
 
if (!ioc->rdpq_array_enable && max_msix_vectors == -1)
-   local_max_msix_vectors = 8;
+   local_max_msix_vectors = (reset_devices) ? 1 : 8;
else
local_max_msix_vectors = max_msix_vectors;
 
@@ -3308,6 +3308,11 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc)
sg_tablesize = MPT3SAS_SG_DEPTH;
}
 
+   /* max sgl entries <= MPT_KDUMP_MIN_PHYS_SEGMENTS in KDUMP mode */
+   if (reset_devices)
+   sg_tablesize = min_t(unsigned short, sg_tablesize,
+  MPT_KDUMP_MIN_PHYS_SEGMENTS);
+
if (sg_tablesize < MPT_MIN_PHYS_SEGMENTS)
sg_tablesize = MPT_MIN_PHYS_SEGMENTS;
else if (sg_tablesize > MPT_MAX_PHYS_SEGMENTS) {
@@ -3340,7 +3345,10 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc)
ioc->internal_depth, facts->RequestCredit);
if (max_request_credit > MAX_HBA_QUEUE_DEPTH)
max_request_credit =  MAX_HBA_QUEUE_DEPTH;
-   } else
+   } else if (reset_devices)
+   max_request_credit = min_t(u16, facts->RequestCredit,
+   (MPT3SAS_KDUMP_SCSI_IO_DEPTH + ioc->internal_depth));
+   else
max_request_credit = min_t(u16, facts->RequestCredit,
MAX_HBA_QUEUE_DEPTH);
 
@@ -4446,7 +4454,7 @@ _base_get_ioc_facts(struct MPT3SAS_ADAPTER *ioc)
if ((facts->IOCCapabilities & MPI2_IOCFACTS_CAPABILITY_INTEGRATED_RAID))
ioc->ir_firmware = 1;
if ((facts->IOCCapabilities &
- MPI2_IOCFACTS_CAPABILITY_RDPQ_ARRAY_CAPABLE))
+ MPI2_IOCFACTS_CAPABILITY_RDPQ_ARRAY_CAPABLE) && (!reset_devices))
ioc->rdpq_array_capable = 1;
if (facts->IOCCapabilities & MPI26_IOCFACTS_CAPABILITY_ATOMIC_REQ)
ioc->atomic_desc_capable = 1;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index a77bb7d..95ee1c6 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -92,6 +92,7 @@
  */
 #define MPT_MAX_PHYS_SEGMENTS  SG_CHUNK_SIZE
 #define MPT_MIN_PHYS_SEGMENTS  16
+#define MPT_KDUMP_MIN_PHYS_SEGMENTS32
 
 #ifdef CONFIG_SCSI_MPT3SAS_MAX_SGE
 #define MPT3SAS_SG_DEPTH   CONFIG_SCSI_MPT3SAS_MAX_SGE
@@ -111,6 +112,7 @@
 #define MPT3SAS_SATA_QUEUE_DEPTH   32
 #define MPT3SAS_SAS_QUEUE_DEPTH254
 #define MPT3SAS_RAID_QUEUE_DEPTH   128
+#define MPT3SAS_KDUMP_SCSI_IO_DEPTH200
 
 #define MPT3SAS_RAID_MAX_SECTORS   8192
 
-- 
2.4.3

[PATCH 02/10] mpt3sas: Fixed memory leaks in driver

2017-10-10 Thread Sreekanth Reddy

Fixed below memory leak in driver,

* While removing Expander devices - we are removing expander
 device entry from the list before freeing it's child devices,
 so while freeing child device we are finding its parent device
 node as NULL and so we are not freeing the child device's
 allocated data structures.
 Updated the driver to remove the expander device from the list
 only after freeing all its child devices.

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 9594166..600e8ef 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -5274,8 +5274,6 @@ mpt3sas_expander_remove(struct MPT3SAS_ADAPTER *ioc, u64 
sas_address)
spin_lock_irqsave(>sas_node_lock, flags);
sas_expander = mpt3sas_scsih_expander_find_by_sas_address(ioc,
sas_address);
-   if (sas_expander)
-   list_del(_expander->list);
spin_unlock_irqrestore(>sas_node_lock, flags);
if (sas_expander)
_scsih_expander_node_remove(ioc, sas_expander);
@@ -7476,7 +7474,6 @@ _scsih_remove_unresponding_sas_devices(struct 
MPT3SAS_ADAPTER *ioc)
spin_unlock_irqrestore(>sas_node_lock, flags);
list_for_each_entry_safe(sas_expander, sas_expander_next, _list,
list) {
-   list_del(_expander->list);
_scsih_expander_node_remove(ioc, sas_expander);
}
 
@@ -8102,7 +8099,6 @@ mpt3sas_scsih_event_callback(struct MPT3SAS_ADAPTER *ioc, 
u8 msix_index,
  * _scsih_expander_node_remove - removing expander device from list.
  * @ioc: per adapter object
  * @sas_expander: the sas_device object
- * Context: Calling function should acquire ioc->sas_node_lock.
  *
  * Removing object and freeing associated memory from the
  * ioc->sas_expander_list.
@@ -8114,6 +8110,7 @@ _scsih_expander_node_remove(struct MPT3SAS_ADAPTER *ioc,
struct _sas_node *sas_expander)
 {
struct _sas_port *mpt3sas_port, *next;
+   unsigned long flags;
 
/* remove sibling ports attached to this expander */
list_for_each_entry_safe(mpt3sas_port, next,
@@ -8141,6 +8138,10 @@ _scsih_expander_node_remove(struct MPT3SAS_ADAPTER *ioc,
sas_expander->handle, (unsigned long long)
sas_expander->sas_address);
 
+   spin_lock_irqsave(>sas_node_lock, flags);
+   list_del(_expander->list);
+   spin_unlock_irqrestore(>sas_node_lock, flags);
+
kfree(sas_expander->phy);
kfree(sas_expander);
 }
-- 
2.4.3

[PATCH 05/10] mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive

2017-10-10 Thread Sreekanth Reddy

Whenever IO for raid volume fails with IOCStatus
 "MPI2_IOCSTATUS_SCSI_IOC_TERMINATED"and SCSIStatus equal to
 "(MPI2_SCSI_STATE_TERMINATED | MPI2_SCSI_STATE_NO_SCSI_STATUS)"
 then return the IO to SML with "DID_RESET"
 (i.e. retry the IO infinite times) host bytes.

Earlier driver is returning the IO with "DID_SOFT_ERROR"
 that reties the IO quickly for five times but still
 firmware needed some more time and hence IOs were failing.

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index cc78ce4..fa948ab 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4807,6 +4807,11 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
} else if (log_info == VIRTUAL_IO_FAILED_RETRY) {
scmd->result = DID_RESET << 16;
break;
+   } else if ((scmd->device->channel == RAID_CHANNEL) &&
+  (scsi_state == (MPI2_SCSI_STATE_TERMINATED |
+  MPI2_SCSI_STATE_NO_SCSI_STATUS))) {
+   scmd->result = DID_RESET << 16;
+   break;
}
scmd->result = DID_SOFT_ERROR << 16;
break;
-- 
2.4.3

[PATCH 07/10] mpt3sas: Display chassis slot information of the drive

2017-10-10 Thread Sreekanth Reddy

Display chassis slot information along with other drive location parameters
 such as slot number and connector name in the logs; if chassis slot
 validity bit is set in 'SAS Enclosure Page 0' while adding the drive.

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_base.h  |   4 +
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 266 ++-
 2 files changed, 143 insertions(+), 127 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 95ee1c6..75d90f2 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -469,6 +469,8 @@ struct _internal_cmd {
  * @pfa_led_on: flag for PFA LED status
  * @pend_sas_rphy_add: flag to check if device is in sas_rphy_add()
  * addition routine.
+ * @chassis_slot: chassis slot
+ * @is_chassis_slot_valid: chassis slot valid or not
  */
 struct _sas_device {
struct list_head list;
@@ -491,6 +493,8 @@ struct _sas_device {
u8  pfa_led_on;
u8  pend_sas_rphy_add;
u8  enclosure_level;
+   u8  chassis_slot;
+   u8  is_chassis_slot_valid;
u8  connector_name[5];
struct kref refcount;
 };
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index fa948ab..17b934b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -653,6 +653,69 @@ mpt3sas_get_sdev_by_handle(struct MPT3SAS_ADAPTER *ioc, 
u16 handle)
 }
 
 /**
+ * _scsih_display_enclosure_chassis_info - display device location info
+ * @ioc: per adapter object
+ * @sas_device: per sas device object
+ * @sdev: scsi device struct
+ * @starget: scsi target struct
+ *
+ * Returns nothing.
+ */
+static void
+_scsih_display_enclosure_chassis_info(struct MPT3SAS_ADAPTER *ioc,
+   struct _sas_device *sas_device, struct scsi_device *sdev,
+   struct scsi_target *starget)
+{
+   if (sdev) {
+   if (sas_device->enclosure_handle != 0)
+   sdev_printk(KERN_INFO, sdev,
+   "enclosure logical id (0x%016llx), slot(%d) \n",
+   (unsigned long long)
+   sas_device->enclosure_logical_id,
+   sas_device->slot);
+   if (sas_device->connector_name[0] != '\0')
+   sdev_printk(KERN_INFO, sdev,
+   "enclosure level(0x%04x), connector name( %s)\n",
+   sas_device->enclosure_level,
+   sas_device->connector_name);
+   if (sas_device->is_chassis_slot_valid)
+   sdev_printk(KERN_INFO, sdev, "chassis slot(0x%04x)\n",
+   sas_device->chassis_slot);
+   } else if (starget) {
+   if (sas_device->enclosure_handle != 0)
+   starget_printk(KERN_INFO, starget,
+   "enclosure logical id(0x%016llx), slot(%d) \n",
+   (unsigned long long)
+   sas_device->enclosure_logical_id,
+   sas_device->slot);
+   if (sas_device->connector_name[0] != '\0')
+   starget_printk(KERN_INFO, starget,
+   "enclosure level(0x%04x), connector name( %s)\n",
+   sas_device->enclosure_level,
+   sas_device->connector_name);
+   if (sas_device->is_chassis_slot_valid)
+   starget_printk(KERN_INFO, starget,
+   "chassis slot(0x%04x)\n",
+   sas_device->chassis_slot);
+   } else {
+   if (sas_device->enclosure_handle != 0)
+   pr_info(MPT3SAS_FMT
+   "enclosure logical id(0x%016llx), slot(%d) \n",
+   ioc->name, (unsigned long long)
+   sas_device->enclosure_logical_id,
+   sas_device->slot);
+   if (sas_device->connector_name[0] != '\0')
+   pr_info(MPT3SAS_FMT
+   "enclosure level(0x%04x), connector name( %s)\n",
+   ioc->name, sas_device->enclosure_level,
+   sas_device->connector_name);
+   if (sas_device->is_chassis_slot_valid)
+   pr_info(MPT3SAS_FMT "chassis slot(0x%04x)\n",
+   ioc->name, sas_device->chassis_slot);
+   }
+}
+
+/**
  * _scsih_sas_device_remove - remove sas_device from list.
  * @ioc: per adapter object
  * @sas_device: the sas_device object
@@ -673,17 +736,7 @@ _scsih_sas_device_remove(struct MPT3SAS_ADAPTER *ioc,
ioc->name, sas_device->handle,
(unsigned long long) sas_device->sas_address);
 
-   if (sas_device->enclosure_handle != 0)
-

[PATCH 06/10] mpt3sas: Updated MPI headers to v2.00.48

2017-10-10 Thread Sreekanth Reddy

Updated MPI headers to v2.00.48

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpi/mpi2.h  |  43 ++-
 drivers/scsi/mpt3sas/mpi/mpi2_cnfg.h | 564 +--
 drivers/scsi/mpt3sas/mpi/mpi2_init.h |  11 +-
 drivers/scsi/mpt3sas/mpi/mpi2_ioc.h  | 282 +-
 drivers/scsi/mpt3sas/mpi/mpi2_pci.h  | 112 +++
 drivers/scsi/mpt3sas/mpi/mpi2_tool.h |  14 +-
 6 files changed, 992 insertions(+), 34 deletions(-)
 create mode 100644 drivers/scsi/mpt3sas/mpi/mpi2_pci.h

diff --git a/drivers/scsi/mpt3sas/mpi/mpi2.h b/drivers/scsi/mpt3sas/mpi/mpi2.h
index a9a659f..bc59058 100644
--- a/drivers/scsi/mpt3sas/mpi/mpi2.h
+++ b/drivers/scsi/mpt3sas/mpi/mpi2.h
@@ -8,7 +8,7 @@
  * scatter/gather formats.
  * Creation Date:  June 21, 2006
  *
- * mpi2.h Version:  02.00.42
+ * mpi2.h Version:  02.00.48
  *
  * NOTE: Names (typedefs, defines, etc.) beginning with an MPI25 or Mpi25
  *   prefix are for use only on MPI v2.5 products, and must not be used
@@ -103,6 +103,16 @@
  * 08-25-15  02.00.40  Bumped MPI2_HEADER_VERSION_UNIT.
  * 12-15-15  02.00.41  Bumped MPI_HEADER_VERSION_UNIT
  * 01-01-16  02.00.42  Bumped MPI_HEADER_VERSION_UNIT
+ * 04-05-16  02.00.43  Modified  MPI26_DIAG_BOOT_DEVICE_SELECT defines
+ * to be unique within first 32 characters.
+ * Removed AHCI support.
+ * Removed SOP support.
+ * Bumped MPI2_HEADER_VERSION_UNIT.
+ * 04-10-16  02.00.44  Bumped MPI2_HEADER_VERSION_UNIT.
+ * 07-06-16  02.00.45  Bumped MPI2_HEADER_VERSION_UNIT.
+ * 09-02-16  02.00.46  Bumped MPI2_HEADER_VERSION_UNIT.
+ * 11-23-16  02.00.47  Bumped MPI2_HEADER_VERSION_UNIT.
+ * 02-03-17  02.00.48  Bumped MPI2_HEADER_VERSION_UNIT.
  * --
  */
 
@@ -142,7 +152,7 @@
 #define MPI2_VERSION_02_06 (0x0206)
 
 /*Unit and Dev versioning for this MPI header set */
-#define MPI2_HEADER_VERSION_UNIT(0x2A)
+#define MPI2_HEADER_VERSION_UNIT(0x30)
 #define MPI2_HEADER_VERSION_DEV (0x00)
 #define MPI2_HEADER_VERSION_UNIT_MASK   (0xFF00)
 #define MPI2_HEADER_VERSION_UNIT_SHIFT  (8)
@@ -249,6 +259,12 @@ typedef volatile struct _MPI2_SYSTEM_INTERFACE_REGS {
 #define MPI2_DIAG_BOOT_DEVICE_SELECT_DEFAULT(0x)
 #define MPI2_DIAG_BOOT_DEVICE_SELECT_HCDW   (0x0800)
 
+/* Defines for V7A/V7R HostDiagnostic Register */
+#define MPI26_DIAG_BOOT_DEVICE_SEL_64FLASH  (0x)
+#define MPI26_DIAG_BOOT_DEVICE_SEL_64HCDW   (0x0800)
+#define MPI26_DIAG_BOOT_DEVICE_SEL_32FLASH  (0x1000)
+#define MPI26_DIAG_BOOT_DEVICE_SEL_32HCDW   (0x1800)
+
 #define MPI2_DIAG_CLEAR_FLASH_BAD_SIG   (0x0400)
 #define MPI2_DIAG_FORCE_HCB_ON_RESET(0x0200)
 #define MPI2_DIAG_HCB_MODE  (0x0100)
@@ -367,6 +383,7 @@ typedef struct _MPI2_DEFAULT_REQUEST_DESCRIPTOR {
 #define MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE(0x08)
 #define MPI2_REQ_DESCRIPT_FLAGS_RAID_ACCELERATOR(0x0A)
 #define MPI25_REQ_DESCRIPT_FLAGS_FAST_PATH_SCSI_IO  (0x0C)
+#define MPI26_REQ_DESCRIPT_FLAGS_PCIE_ENCAPSULATED  (0x10)
 
 #define MPI2_REQ_DESCRIPT_FLAGS_IOC_FIFO_MARKER (0x01)
 
@@ -425,6 +442,13 @@ typedef MPI2_SCSI_IO_REQUEST_DESCRIPTOR
Mpi25FastPathSCSIIORequestDescriptor_t,
*pMpi25FastPathSCSIIORequestDescriptor_t;
 
+/*PCIe Encapsulated Request Descriptor */
+typedef MPI2_SCSI_IO_REQUEST_DESCRIPTOR
+   MPI26_PCIE_ENCAPSULATED_REQUEST_DESCRIPTOR,
+   *PTR_MPI26_PCIE_ENCAPSULATED_REQUEST_DESCRIPTOR,
+   Mpi26PCIeEncapsulatedRequestDescriptor_t,
+   *pMpi26PCIeEncapsulatedRequestDescriptor_t;
+
 /*union of Request Descriptors */
 typedef union _MPI2_REQUEST_DESCRIPTOR_UNION {
MPI2_DEFAULT_REQUEST_DESCRIPTOR Default;
@@ -433,6 +457,7 @@ typedef union _MPI2_REQUEST_DESCRIPTOR_UNION {
MPI2_SCSI_TARGET_REQUEST_DESCRIPTOR SCSITarget;
MPI2_RAID_ACCEL_REQUEST_DESCRIPTOR RAIDAccelerator;
MPI25_FP_SCSI_IO_REQUEST_DESCRIPTOR FastPathSCSIIO;
+   MPI26_PCIE_ENCAPSULATED_REQUEST_DESCRIPTOR PCIeEncapsulated;
U64 Words;
 } MPI2_REQUEST_DESCRIPTOR_UNION,
*PTR_MPI2_REQUEST_DESCRIPTOR_UNION,
@@ -450,6 +475,7 @@ typedef union _MPI2_REQUEST_DESCRIPTOR_UNION {
  *  Atomic SCSI Target Request Descriptor
  *  Atomic RAID Accelerator Request Descriptor
  *  Atomic Fast Path SCSI IO Request Descriptor
+ *  Atomic PCIe Encapsulated Request Descriptor
  */
 
 /*Atomic Request Descriptor */
@@ -487,6 +513,7 @@ typedef struct _MPI2_DEFAULT_REPLY_DESCRIPTOR {
 #define MPI2_RPY_DESCRIPT_FLAGS_TARGET_COMMAND_BUFFER   (0x03)
 #define MPI2_RPY_DESCRIPT_FLAGS_RAID_ACCELERATOR_SUCCESS(0x05)
 #define MPI25_RPY_DESCRIPT_FLAGS_FAST_PATH_SCSI_IO_SUCCESS  (0x06)
+#define

[PATCH 09/10] mpt3sas: Adding support for SAS3616 HBA device

2017-10-10 Thread Sreekanth Reddy

Adding PNP ID of Mercator i.e. SAS3616 HBA device.
Its device ID is 0xD1 and vendor ID is 0x1000.

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index b819914..3f640ba 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -8808,6 +8808,7 @@ _scsih_determine_hba_mpi_version(struct pci_dev *pdev)
case MPI26_MFGPAGE_DEVID_SAS3516:
case MPI26_MFGPAGE_DEVID_SAS3516_1:
case MPI26_MFGPAGE_DEVID_SAS3416:
+   case MPI26_MFGPAGE_DEVID_SAS3616:
return MPI26_VERSION;
}
return 0;
@@ -8885,6 +8886,7 @@ _scsih_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
case MPI26_MFGPAGE_DEVID_SAS3516:
case MPI26_MFGPAGE_DEVID_SAS3516_1:
case MPI26_MFGPAGE_DEVID_SAS3416:
+   case MPI26_MFGPAGE_DEVID_SAS3616:
ioc->is_gen35_ioc = 1;
break;
default:
@@ -9341,6 +9343,9 @@ static const struct pci_device_id mpt3sas_pci_table[] = {
PCI_ANY_ID, PCI_ANY_ID },
{ MPI2_MFGPAGE_VENDORID_LSI, MPI26_MFGPAGE_DEVID_SAS3416,
PCI_ANY_ID, PCI_ANY_ID },
+   /* Mercator ~ 3616*/
+   { MPI2_MFGPAGE_VENDORID_LSI, MPI26_MFGPAGE_DEVID_SAS3616,
+   PCI_ANY_ID, PCI_ANY_ID },
{0} /* Terminating entry */
 };
 MODULE_DEVICE_TABLE(pci, mpt3sas_pci_table);
-- 
2.4.3

[PATCH 08/10] mpt3sas: Fix possibility of using invalid Enclosure Handles for SAS device after host reset

2017-10-10 Thread Sreekanth Reddy

Enclosure handles are not updated after host reset.
As a result, driver device structure is holding previously
 assigned enclosure handle which is different from the
 enclosure handle populated in the corresponding device page.

Modified the driver to update devices enclosure handles after
 host reset to current value, by referring the enclosure handles
 from corresponding device pages

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 117 ---
 1 file changed, 81 insertions(+), 36 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 17b934b..b819914 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -5383,6 +5383,52 @@ _scsih_check_access_status(struct MPT3SAS_ADAPTER *ioc, 
u64 sas_address,
 }
 
 /**
+ * _scsih_get_enclosure_logicalid_chassis_slot - get device's
+ * EnclosureLogicalID and ChassisSlot information.
+ * @ioc: per adapter object
+ * @sas_device_pg0: SAS device page0
+ * @sas_device: per sas device object
+ *
+ * Returns nothing.
+ */
+static void
+_scsih_get_enclosure_logicalid_chassis_slot(struct MPT3SAS_ADAPTER *ioc,
+   Mpi2SasDevicePage0_t *sas_device_pg0, struct _sas_device *sas_device)
+{
+   Mpi2ConfigReply_t mpi_reply;
+   Mpi2SasEnclosurePage0_t enclosure_pg0;
+
+   if (!sas_device_pg0 || !sas_device)
+   return;
+
+   sas_device->enclosure_handle =
+   le16_to_cpu(sas_device_pg0->EnclosureHandle);
+   sas_device->is_chassis_slot_valid = 0;
+
+   if (!le16_to_cpu(sas_device_pg0->EnclosureHandle))
+   return;
+
+   if (mpt3sas_config_get_enclosure_pg0(ioc, _reply,
+   _pg0, MPI2_SAS_ENCLOS_PGAD_FORM_HANDLE,
+   le16_to_cpu(sas_device_pg0->EnclosureHandle))) {
+   pr_err(MPT3SAS_FMT
+   "Enclosure Pg0 read failed for handle(0x%04x)\n",
+   ioc->name, le16_to_cpu(sas_device_pg0->EnclosureHandle));
+   return;
+   }
+
+   sas_device->enclosure_logical_id =
+   le64_to_cpu(enclosure_pg0.EnclosureLogicalID);
+
+   if (le16_to_cpu(enclosure_pg0.Flags) &
+   MPI2_SAS_ENCLS0_FLAGS_CHASSIS_SLOT_VALID) {
+   sas_device->is_chassis_slot_valid = 1;
+   sas_device->chassis_slot = enclosure_pg0.ChassisSlot;
+   }
+}
+
+
+/**
  * _scsih_check_device - checking device responsiveness
  * @ioc: per adapter object
  * @parent_sas_address: sas address of parent expander or sas host
@@ -5398,7 +5444,6 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
 {
Mpi2ConfigReply_t mpi_reply;
Mpi2SasDevicePage0_t sas_device_pg0;
-   Mpi2SasEnclosurePage0_t enclosure_pg0;
struct _sas_device *sas_device;
u32 ioc_status;
unsigned long flags;
@@ -5407,7 +5452,6 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
struct MPT3SAS_TARGET *sas_target_priv_data;
u32 device_info;
 
-
if ((mpt3sas_config_get_sas_device_pg0(ioc, _reply, _device_pg0,
MPI2_SAS_DEVICE_PGAD_FORM_HANDLE, handle)))
return;
@@ -5454,18 +5498,9 @@ _scsih_check_device(struct MPT3SAS_ADAPTER *ioc,
sas_device->enclosure_level = 0;
sas_device->connector_name[0] = '\0';
}
-   sas_device->is_chassis_slot_valid = 0;
-   if (sas_device->enclosure_handle &&
-   !(mpt3sas_config_get_enclosure_pg0(ioc, _reply,
-   _pg0, MPI2_SAS_ENCLOS_PGAD_FORM_HANDLE,
-   sas_device->enclosure_handle))) {
-   if (le16_to_cpu(enclosure_pg0.Flags) &
-   MPI2_SAS_ENCLS0_FLAGS_CHASSIS_SLOT_VALID) {
-   sas_device->is_chassis_slot_valid = 1;
-   sas_device->chassis_slot =
-   enclosure_pg0.ChassisSlot;
-   }
-   }
+
+   _scsih_get_enclosure_logicalid_chassis_slot(ioc,
+   _device_pg0, sas_device);
}
 
/* check if device is present */
@@ -7088,10 +7123,8 @@ Mpi2SasDevicePage0_t *sas_device_pg0)
 {
struct MPT3SAS_TARGET *sas_target_priv_data = NULL;
struct scsi_target *starget;
-   struct _sas_device *sas_device;
+   struct _sas_device *sas_device = NULL;
unsigned long flags;
-   Mpi2SasEnclosurePage0_t enclosure_pg0;
-   Mpi2ConfigReply_t mpi_reply;
 
spin_lock_irqsave(>sas_device_lock, flags);
list_for_each_entry(sas_device, >sas_device_list, list) {
@@ -7131,18 +7164,8 @@ Mpi2SasDevicePage0_t *sas_device_pg0)
sas_device->connector_name[0] = '\0';
}
 
-   sas_device->is_chassis_slot_valid = 0;
-   if

[PATCH 10/10] mpt3sas: Bump mpt3sas driver version to v16.100.00.00

2017-10-10 Thread Sreekanth Reddy

Bump mpt3sas driver version to v16.100.00.00

Signed-off-by: Sreekanth Reddy 
---
 drivers/scsi/mpt3sas/mpt3sas_base.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 75d90f2..0fe3969 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -73,8 +73,8 @@
 #define MPT3SAS_DRIVER_NAME"mpt3sas"
 #define MPT3SAS_AUTHOR "Avago Technologies "
 #define MPT3SAS_DESCRIPTION"LSI MPT Fusion SAS 3.0 Device Driver"
-#define MPT3SAS_DRIVER_VERSION "15.100.00.00"
-#define MPT3SAS_MAJOR_VERSION  15
+#define MPT3SAS_DRIVER_VERSION "16.100.00.00"
+#define MPT3SAS_MAJOR_VERSION  16
 #define MPT3SAS_MINOR_VERSION  100
 #define MPT3SAS_BUILD_VERSION  0
 #define MPT3SAS_RELEASE_VERSION00
-- 
2.4.3

Re: [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1)

2017-10-10 Thread Paolo Valente


> Il giorno 10 ott 2017, alle ore 14:34, Johannes Thumshirn 
>  ha scritto:
> 
> Hi John,
> 
> On Tue, Oct 10, 2017 at 01:24:52PM +0100, John Garry wrote:
>> It's using cfq (for non-mq) and mq-deadline (obviously for mq).
> 
> Please be aware that cfq and mq-deadline are _not_ comparable, for a realistic
> comparasion please use deadline and mq-deadline or cfq and bfq.
> 

Please set low_latency=0 for bfq if yours is just a maximum-throughput test.

Thanks,
Paolo

>> root@(none)$ pwd
>> /sys/devices/platform/HISI0162:01/host0/port-0:0/expander-0:0/port-0:0:7/end_device-0:0:7
>> root@(none)$ more ./target0:0:3/0:0:3:0/block/sdd/queue/scheduler
>> noop [cfq]
> 
> Maybe missing CONFIG_IOSCHED_DEADLINE?
> 
> Thanks,
>   Johannes
> 
> -- 
> Johannes Thumshirn  Storage
> jthumsh...@suse.de+49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1)

2017-10-10 Thread Johannes Thumshirn

Hi John,

On Tue, Oct 10, 2017 at 01:24:52PM +0100, John Garry wrote:
> It's using cfq (for non-mq) and mq-deadline (obviously for mq).

Please be aware that cfq and mq-deadline are _not_ comparable, for a realistic
comparasion please use deadline and mq-deadline or cfq and bfq.

> root@(none)$ pwd
> /sys/devices/platform/HISI0162:01/host0/port-0:0/expander-0:0/port-0:0:7/end_device-0:0:7
> root@(none)$ more ./target0:0:3/0:0:3:0/block/sdd/queue/scheduler
> noop [cfq]

Maybe missing CONFIG_IOSCHED_DEADLINE?

Thanks,
Johannes

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH V5 00/14] blk-mq-sched: improve sequential I/O performance(part 1)

2017-10-10 Thread John Garry


On 10/10/2017 02:46, Ming Lei wrote:

> > I tested this series for the SAS controller on HiSilicon hip07 platform as I
> > am interested in enabling MQ for this driver. Driver is
> > ./drivers/scsi/hisi_sas/.
> >
> > So I found that that performance is improved when enabling default SCSI_MQ
> > with this series vs baseline. However, it is still not as a good as when
> > default SCSI_MQ is disabled.
> >
> > Here are some figures I got with fio:
> > 4.14-rc2 without default SCSI_MQ
> > read, rw, write IOPS  
> > 952K, 133K/133K, 800K
> >
> > 4.14-rc2 with default SCSI_MQ
> > read, rw, write IOPS  
> > 311K, 117K/117K, 320K
> >
> > This series* without default SCSI_MQ
> > read, rw, write IOPS  
> > 975K, 132K/132K, 790K
> >
> > This series* with default SCSI_MQ
> > read, rw, write IOPS  
> > 770K, 164K/164K, 594K

>
> Thanks for testing this patchset!
>
> Looks there is big improvement, but the gap compared with
> block legacy is not small too.
>

> >
> > Please note that hisi_sas driver does not enable mq by exposing multiple
> > queues to upper layer (even though it has multiple queues). I have been
> > playing with enabling it, but my performance is always worse...
> >
> > * I'm using
> > https://github.com/ming1/linux/commits/blk_mq_improve_scsi_mpath_perf_V5.1,
> > as advised by Ming Lei.

>
> Could you test on the following branch and see if it makes a
> difference?
>
>
https://github.com/ming1/linux/commits/blk_mq_improve_scsi_mpath_perf_V6.1_test

Hi John,

Please test the following branch directly:

https://github.com/ming1/linux/tree/blk_mq_improve_scsi_mpath_perf_V6.2_test

And code is simplified and cleaned up much in V6.2, then only two extra
patches(top 2) are needed against V6 which was posted yesterday.

Please test SCSI_MQ with mq-deadline, which should be the default
mq scheduler on your HiSilicon SAS.


Hi Ming Lei,

It's using cfq (for non-mq) and mq-deadline (obviously for mq).

root@(none)$ pwd
/sys/devices/platform/HISI0162:01/host0/port-0:0/expander-0:0/port-0:0:7/end_device-0:0:7
root@(none)$ more ./target0:0:3/0:0:3:0/block/sdd/queue/scheduler
noop [cfq]

and

root@(none)$ more ./target0:0:3/0:0:3:0/block/sdd/queue/scheduler
[mq-deadline] kyber none

Unfortunately my setup has changed since yeterday, and the absolute 
figures are not the exact same (I retested 4.14-rc2). However, we still 
see that drop when mq is enabled.


Here's the results:
4.14-rc4 without default SCSI_MQ
read, rw, write IOPS
860K, 112K/112K, 800K

4.14-rc2 without default SCSI_MQ
read, rw, write IOPS
880K, 113K/113K, 808K

V6.2 series without default SCSI_MQ
read, rw, write IOPS
820K, 114/114K, 790K

V6.2 series with default SCSI_MQ
read, rw, write IOPS
700K, 130K/128K, 640K

Cheers,
John



-- Ming .

Re: [PATCH v7 9/9] block, scsi: Make SCSI quiesce and resume work reliably

2017-10-10 Thread Ming Lei

On Mon, Oct 09, 2017 at 04:14:00PM -0700, Bart Van Assche wrote:
> The contexts from which a SCSI device can be quiesced or resumed are:
> * Writing into /sys/class/scsi_device/*/device/state.
> * SCSI parallel (SPI) domain validation.
> * The SCSI device power management methods. See also scsi_bus_pm_ops.
> 
> It is essential during suspend and resume that neither the filesystem
> state nor the filesystem metadata in RAM changes. This is why while
> the hibernation image is being written or restored that SCSI devices
> are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
> and scsi_device_resume(). In the SDEV_QUIESCE state execution of
> non-preempt requests is deferred. This is realized by returning
> BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
> devices. Avoid that a full queue prevents power management requests
> to be submitted by deferring allocation of non-preempt requests for
> devices in the quiesced state. This patch has been tested by running
> the following commands and by verifying that after resume the fio job
> is still running:
> 
> for d in /sys/class/block/sd*[a-z]; do
>   hcil=$(readlink "$d/device")
>   hcil=${hcil#../../../}
>   echo 4 > "$d/queue/nr_requests"
>   echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
> done
> bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
> bdev=${bdev#../../}
> hcil=$(readlink "/sys/block/$bdev/device")
> hcil=${hcil#../../../}
> fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 
> --rw=randread \
>   --ioengine=libaio --numjobs=4 --iodepth=16 --iodepth_batch=1 --thread \
>   --loops=$((2**31)) &
> pid=$!
> sleep 1
> systemctl hibernate
> sleep 10
> kill $pid
> 
> Reported-by: Oleksandr Natalenko 
> References: "I/O hangs after resuming from suspend-to-ram" 
> (https://marc.info/?l=linux-block=150340235201348).
> Signed-off-by: Bart Van Assche 
> Cc: Martin K. Petersen 
> Cc: Ming Lei 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 
> ---
>  block/blk-core.c| 42 +++---
>  block/blk-mq.c  |  4 ++--
>  block/blk-timeout.c |  2 +-
>  drivers/scsi/scsi_lib.c | 28 ++--
>  fs/block_dev.c  |  4 ++--
>  include/linux/blkdev.h  |  2 +-
>  6 files changed, 59 insertions(+), 23 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ed992cbd107f..3847ea42e341 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -372,6 +372,7 @@ void blk_clear_preempt_only(struct request_queue *q)
>  
>   spin_lock_irqsave(q->queue_lock, flags);
>   queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
> + wake_up_all(>mq_freeze_wq);
>   spin_unlock_irqrestore(q->queue_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
> @@ -793,15 +794,40 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(blk_alloc_queue);
>  
> -int blk_queue_enter(struct request_queue *q, bool nowait)
> +/**
> + * blk_queue_enter() - try to increase q->q_usage_counter
> + * @q: request queue pointer
> + * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
> + */
> +int blk_queue_enter(struct request_queue *q, unsigned int flags)
>  {
> + const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
> +
>   while (true) {
> + bool success = false;
>   int ret;
>  
> - if (percpu_ref_tryget_live(>q_usage_counter))
> + rcu_read_lock_sched();
> + if (percpu_ref_tryget_live(>q_usage_counter)) {
> + /*
> +  * The code that sets the PREEMPT_ONLY flag is
> +  * responsible for ensuring that that flag is globally
> +  * visible before the queue is unfrozen.
> +  */
> + if (preempt || !blk_queue_preempt_only(q)) {
> + success = true;
> + } else {
> + percpu_ref_put(>q_usage_counter);
> + WARN_ONCE("%s: Attempt to allocate non-preempt 
> request in preempt-only mode.\n",
> +   kobject_name(q->kobj.parent));
> + }
> + }
> + rcu_read_unlock_sched();
> +
> + if (success)
>   return 0;
>  
> - if (nowait)
> + if (flags & BLK_MQ_REQ_NOWAIT)
>   return -EBUSY;
>  
>   /*
> @@ -814,7 +840,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
>   smp_rmb();
>  
>   ret = wait_event_interruptible(q->mq_freeze_wq,
> - !atomic_read(>mq_freeze_depth) ||
> +

[PATCH 08/10] be2iscsi: Fix misc static analysis errors

2017-10-10 Thread Jitendra Bhivare

The patch fixes errors reported by tools like smatch:
 - removes unused structure fields
 - removes dead code
 - fixes code identation

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be.h  | 17 
 drivers/scsi/be2iscsi/be_cmds.c |  1 -
 drivers/scsi/be2iscsi/be_cmds.h | 14 ++---
 drivers/scsi/be2iscsi/be_main.c | 40 
 drivers/scsi/be2iscsi/be_main.h | 45 ++---
 drivers/scsi/be2iscsi/be_mgmt.h |  6 --
 6 files changed, 29 insertions(+), 94 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be.h b/drivers/scsi/be2iscsi/be.h
index 55e3f8b..1310fbf 100644
--- a/drivers/scsi/be2iscsi/be.h
+++ b/drivers/scsi/be2iscsi/be.h
@@ -81,12 +81,12 @@ static inline void queue_tail_inc(struct be_queue_info *q)
 /*ISCSI */
 
 struct be_aic_obj {/* Adaptive interrupt coalescing (AIC) info */
-   u32 min_eqd;/* in usecs */
-   u32 max_eqd;/* in usecs */
-   u32 prev_eqd;   /* in usecs */
-   u32 et_eqd; /* configured val when aic is off */
-   ulong jiffies;
-   u64 eq_prev;/* Used to calculate eqe */
+   unsigned long jiffies;
+   u32 eq_prev;/* Used to calculate eqe */
+   u32 prev_eqd;
+#define BEISCSI_EQ_DELAY_MIN   0
+#define BEISCSI_EQ_DELAY_DEF   32
+#define BEISCSI_EQ_DELAY_MAX   128
 };
 
 struct be_eq_obj {
@@ -148,9 +148,8 @@ struct be_ctrl_info {
 /* TAG is from 1...MAX_MCC_CMD, MASK includes MAX_MCC_CMD */
 #define MCC_Q_CMD_TAG_MASK ((MAX_MCC_CMD << 1) - 1)
 
-#define PAGE_SHIFT_4K 12
-#define PAGE_SIZE_4K (1 << PAGE_SHIFT_4K)
-#define mcc_timeout12 /* 12s timeout */
+#define PAGE_SHIFT_4K  12
+#define PAGE_SIZE_4K   (1 << PAGE_SHIFT_4K)
 
 /* Returns number of pages spanned by the data starting at the given addr */
 #define PAGES_4K_SPANNED(_address, size)   \
diff --git a/drivers/scsi/be2iscsi/be_cmds.c b/drivers/scsi/be2iscsi/be_cmds.c
index 0499666..7d633dd 100644
--- a/drivers/scsi/be2iscsi/be_cmds.c
+++ b/drivers/scsi/be2iscsi/be_cmds.c
@@ -947,7 +947,6 @@ int beiscsi_cmd_q_destroy(struct be_ctrl_info *ctrl, struct 
be_queue_info *q,
default:
mutex_unlock(>mbox_lock);
BUG();
-   return -ENXIO;
}
be_cmd_hdr_prepare(>hdr, subsys, opcode, sizeof(*req));
if (queue_type != QTYPE_SGL)
diff --git a/drivers/scsi/be2iscsi/be_cmds.h b/drivers/scsi/be2iscsi/be_cmds.h
index 5d5e8fb..fc252de 100644
--- a/drivers/scsi/be2iscsi/be_cmds.h
+++ b/drivers/scsi/be2iscsi/be_cmds.h
@@ -1298,19 +1298,9 @@ struct be_cmd_get_port_name {
 * a read command
 */
 #define TGT_CTX_UPDT_CMD   7   /* Target context update */
-#define TGT_STS_CMD8   /* Target R2T and other BHS
-* where only the status number
-* need to be updated
-*/
-#define TGT_DATAIN_CMD 9   /* Target Data-Ins in response
-* to read command
-*/
-#define TGT_SOS_PDU10  /* Target:standalone status
-* response
-*/
 #define TGT_DM_CMD 11  /* Indicates that the bhs
-*  preparedby
-* driver should not be touched
+* prepared by driver should not
+* be touched.
 */
 
 /* Returns the number of items in the field array. */
diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 02144c5..f8123ad5 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -455,14 +455,12 @@ static int beiscsi_map_pci_bars(struct beiscsi_hba *phba,
return -ENOMEM;
phba->ctrl.csr = addr;
phba->csr_va = addr;
-   phba->csr_pa.u.a64.address = pci_resource_start(pcidev, 2);
 
addr = ioremap_nocache(pci_resource_start(pcidev, 4), 128 * 1024);
if (addr == NULL)
goto pci_map_err;
phba->ctrl.db = addr;
phba->db_va = addr;
-   phba->db_pa.u.a64.address =  pci_resource_start(pcidev, 4);
 
if (phba->generation == BE_GEN2)
pcicfg_reg = 1;
@@ -476,7 +474,6 @@ static int beiscsi_map_pci_bars(struct beiscsi_hba *phba,
goto pci_map_err;
phba->ctrl.pcicfg = addr;

[PATCH 10/10] scsi: be2iscsi: Update driver version

2017-10-10 Thread Jitendra Bhivare

Version 11.4.0.1

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/be2iscsi/be_main.h b/drivers/scsi/be2iscsi/be_main.h
index 3d21849..9bc7ef9 100644
--- a/drivers/scsi/be2iscsi/be_main.h
+++ b/drivers/scsi/be2iscsi/be_main.h
@@ -31,7 +31,7 @@
 #include 
 
 #define DRV_NAME   "be2iscsi"
-#define BUILD_STR  "11.4.0.0"
+#define BUILD_STR  "11.4.0.1"
 #define BE_NAME"Emulex OneConnect" \
"Open-iSCSI Driver version" BUILD_STR
 #define DRV_DESC   BE_NAME " " "Driver"
-- 
2.7.4

[PATCH 07/10] be2iscsi: Add cmd to set host data

2017-10-10 Thread Jitendra Bhivare

Provide driver version in host data to FW.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_cmds.c | 46 +
 drivers/scsi/be2iscsi/be_cmds.h | 26 +++
 drivers/scsi/be2iscsi/be_main.c |  2 ++
 3 files changed, 74 insertions(+)

diff --git a/drivers/scsi/be2iscsi/be_cmds.c b/drivers/scsi/be2iscsi/be_cmds.c
index 6af448d..0499666 100644
--- a/drivers/scsi/be2iscsi/be_cmds.c
+++ b/drivers/scsi/be2iscsi/be_cmds.c
@@ -1522,6 +1522,52 @@ int beiscsi_get_port_name(struct be_ctrl_info *ctrl, 
struct beiscsi_hba *phba)
return ret;
 }
 
+int beiscsi_set_host_data(struct beiscsi_hba *phba)
+{
+   struct be_ctrl_info *ctrl = >ctrl;
+   struct be_cmd_set_host_data *ioctl;
+   struct be_mcc_wrb *wrb;
+   int ret = 0;
+
+   if (is_chip_be2_be3r(phba))
+   return ret;
+
+   mutex_lock(>mbox_lock);
+   wrb = wrb_from_mbox(>mbox_mem);
+   memset(wrb, 0, sizeof(*wrb));
+   ioctl = embedded_payload(wrb);
+
+   be_wrb_hdr_prepare(wrb, sizeof(*ioctl), true, 0);
+   be_cmd_hdr_prepare(>h.req_hdr, CMD_SUBSYSTEM_COMMON,
+  OPCODE_COMMON_SET_HOST_DATA,
+  EMBED_MBX_MAX_PAYLOAD_SIZE);
+   ioctl->param.req.param_id = BE_CMD_SET_HOST_PARAM_ID;
+   ioctl->param.req.param_len =
+   snprintf((char *)ioctl->param.req.param_data,
+sizeof(ioctl->param.req.param_data),
+"Linux iSCSI v%s", BUILD_STR);
+   ioctl->param.req.param_len = ALIGN(ioctl->param.req.param_len, 4);
+   if (ioctl->param.req.param_len > BE_CMD_MAX_DRV_VERSION)
+   ioctl->param.req.param_len = BE_CMD_MAX_DRV_VERSION;
+   ret = be_mbox_notify(ctrl);
+   if (!ret) {
+   beiscsi_log(phba, KERN_INFO, BEISCSI_LOG_INIT,
+   "BG_%d : HBA set host driver version\n");
+   } else {
+   /**
+* Check "MCC_STATUS_INVALID_LENGTH" for SKH.
+* Older FW versions return this error.
+*/
+   if (ret == MCC_STATUS_ILLEGAL_REQUEST ||
+   ret == MCC_STATUS_INVALID_LENGTH)
+   __beiscsi_log(phba, KERN_INFO,
+ "BG_%d : HBA failed to set host driver 
version\n");
+   }
+
+   mutex_unlock(>mbox_lock);
+   return ret;
+}
+
 int beiscsi_set_uer_feature(struct beiscsi_hba *phba)
 {
struct be_ctrl_info *ctrl = >ctrl;
diff --git a/drivers/scsi/be2iscsi/be_cmds.h b/drivers/scsi/be2iscsi/be_cmds.h
index 4f1ac97..5d5e8fb 100644
--- a/drivers/scsi/be2iscsi/be_cmds.h
+++ b/drivers/scsi/be2iscsi/be_cmds.h
@@ -230,6 +230,7 @@ struct be_mcc_mailbox {
 #define OPCODE_COMMON_QUERY_FIRMWARE_CONFIG58
 #define OPCODE_COMMON_FUNCTION_RESET   61
 #define OPCODE_COMMON_GET_PORT_NAME77
+#define OPCODE_COMMON_SET_HOST_DATA93
 #define OPCODE_COMMON_SET_FEATURES 191
 
 /**
@@ -737,6 +738,30 @@ struct be_cmd_hba_name {
u8 initiator_alias[BE_INI_ALIAS_LEN];
 } __packed;
 
+/ COMMON SET HOST DATA ***/
+#define BE_CMD_SET_HOST_PARAM_ID   0x2
+#define BE_CMD_MAX_DRV_VERSION 0x30
+struct be_sethost_req {
+   u32 param_id;
+   u32 param_len;
+   u32 param_data[32];
+};
+
+struct be_sethost_resp {
+   u32 rsvd0;
+};
+
+struct be_cmd_set_host_data {
+   union {
+   struct be_cmd_req_hdr req_hdr;
+   struct be_cmd_resp_hdr resp_hdr;
+   } h;
+   union {
+   struct be_sethost_req req;
+   struct be_sethost_resp resp;
+   } param;
+} __packed;
+
 / COMMON SET Features ***/
 #define BE_CMD_SET_FEATURE_UER 0x10
 #define BE_CMD_UER_SUPP_BIT0x1
@@ -845,6 +870,7 @@ int beiscsi_get_fw_config(struct be_ctrl_info *ctrl, struct 
beiscsi_hba *phba);
 int beiscsi_get_port_name(struct be_ctrl_info *ctrl, struct beiscsi_hba *phba);
 
 int beiscsi_set_uer_feature(struct beiscsi_hba *phba);
+int beiscsi_set_host_data(struct beiscsi_hba *phba);
 
 struct be_default_pdu_context {
u32 dw[4];
diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 8f7e394..02144c5 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -5325,6 +5325,7 @@ static int beiscsi_enable_port(struct beiscsi_hba *phba)
be2iscsi_enable_msix(phba);
 
beiscsi_get_params(phba);
+   beiscsi_set_host_data(phba);
/* Re-enable UER. If different TPE occurs then it is recoverable. */
beiscsi_set_uer_feature(phba);
 
@@ -5623,6 +5624,7 @@ static int beiscsi_dev_probe(struct pci_dev *pcidev,
}
beiscsi_get_port_name(>ctrl, phba);
beiscsi_get_params(phba);
+   beiscsi_set_host_data(phba);

[PATCH 09/10] be2iscsi: Remove A-circumflex character in copyright marking

2017-10-10 Thread Jitendra Bhivare

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be.h   | 2 +-
 drivers/scsi/be2iscsi/be_cmds.c  | 2 +-
 drivers/scsi/be2iscsi/be_cmds.h  | 2 +-
 drivers/scsi/be2iscsi/be_iscsi.c | 2 +-
 drivers/scsi/be2iscsi/be_iscsi.h | 2 +-
 drivers/scsi/be2iscsi/be_main.c  | 2 +-
 drivers/scsi/be2iscsi/be_main.h  | 2 +-
 drivers/scsi/be2iscsi/be_mgmt.c  | 2 +-
 drivers/scsi/be2iscsi/be_mgmt.h  | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be.h b/drivers/scsi/be2iscsi/be.h
index 1310fbf..e035acf 100644
--- a/drivers/scsi/be2iscsi/be.h
+++ b/drivers/scsi/be2iscsi/be.h
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
diff --git a/drivers/scsi/be2iscsi/be_cmds.c b/drivers/scsi/be2iscsi/be_cmds.c
index 7d633dd..2eb66df 100644
--- a/drivers/scsi/be2iscsi/be_cmds.c
+++ b/drivers/scsi/be2iscsi/be_cmds.c
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
diff --git a/drivers/scsi/be2iscsi/be_cmds.h b/drivers/scsi/be2iscsi/be_cmds.h
index fc252de..6f05d1d 100644
--- a/drivers/scsi/be2iscsi/be_cmds.h
+++ b/drivers/scsi/be2iscsi/be_cmds.h
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
diff --git a/drivers/scsi/be2iscsi/be_iscsi.c b/drivers/scsi/be2iscsi/be_iscsi.c
index aef9764..a398c54 100644
--- a/drivers/scsi/be2iscsi/be_iscsi.c
+++ b/drivers/scsi/be2iscsi/be_iscsi.c
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
diff --git a/drivers/scsi/be2iscsi/be_iscsi.h b/drivers/scsi/be2iscsi/be_iscsi.h
index b9d459a..f41dfda 100644
--- a/drivers/scsi/be2iscsi/be_iscsi.h
+++ b/drivers/scsi/be2iscsi/be_iscsi.h
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index f8123ad5..7561e13 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
diff --git a/drivers/scsi/be2iscsi/be_main.h b/drivers/scsi/be2iscsi/be_main.h
index 6d8f819..3d21849 100644
--- a/drivers/scsi/be2iscsi/be_main.h
+++ b/drivers/scsi/be2iscsi/be_main.h
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
diff --git a/drivers/scsi/be2iscsi/be_mgmt.c b/drivers/scsi/be2iscsi/be_mgmt.c
index 713c7de..66ca967 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.c
+++ b/drivers/scsi/be2iscsi/be_mgmt.c
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
diff --git a/drivers/scsi/be2iscsi/be_mgmt.h b/drivers/scsi/be2iscsi/be_mgmt.h
index b310c24..0b22c99 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.h
+++ b/drivers/scsi/be2iscsi/be_mgmt.h
@@ -1,5 +1,5 @@
 /*
- * Copyright 2017 Broadcom. All Rights Reserved.
+ * Copyright 2017 Broadcom. All Rights Reserved.
  * The term "Broadcom" refers to Broadcom Limited and/or its subsidiaries.
  *
  * This program is free software; you can redistribute it and/or
-- 
2.7.4

[PATCH 06/10] be2iscsi: Modify IOCTL to fetch user configured IQN

2017-10-10 Thread Jitendra Bhivare

Add version 1 of GET_HBA_NAME to fetch port specific IQN first.
If it fails use version 0 to get the IQN.

To use this old IQN names of interfaces needs to be cleared from
the iscsiadm database.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_iscsi.c | 12 
 drivers/scsi/be2iscsi/be_mgmt.c  |  7 ++-
 drivers/scsi/be2iscsi/be_mgmt.h  |  2 +-
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_iscsi.c b/drivers/scsi/be2iscsi/be_iscsi.c
index 512c52a..aef9764 100644
--- a/drivers/scsi/be2iscsi/be_iscsi.c
+++ b/drivers/scsi/be2iscsi/be_iscsi.c
@@ -762,11 +762,15 @@ int beiscsi_get_host_param(struct Scsi_Host *shost,
}
break;
case ISCSI_HOST_PARAM_INITIATOR_NAME:
-   status = beiscsi_get_initiator_name(phba, buf);
+   /* try fetching user configured name first */
+   status = beiscsi_get_initiator_name(phba, buf, true);
if (status < 0) {
-   beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
-   "BS_%d : Retreiving Initiator Name 
Failed\n");
-   return 0;
+   status = beiscsi_get_initiator_name(phba, buf, false);
+   if (status < 0) {
+   beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
+   "BS_%d : Retreiving Initiator Name 
Failed\n");
+   status = 0;
+   }
}
break;
case ISCSI_HOST_PARAM_PORT_STATE:
diff --git a/drivers/scsi/be2iscsi/be_mgmt.c b/drivers/scsi/be2iscsi/be_mgmt.c
index 0c25c10..713c7de 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.c
+++ b/drivers/scsi/be2iscsi/be_mgmt.c
@@ -339,12 +339,14 @@ int beiscsi_modify_eq_delay(struct beiscsi_hba *phba,
  * beiscsi_get_initiator_name - read initiator name from flash
  * @phba: device priv structure
  * @name: buffer pointer
+ * @cfg: fetch user configured
  *
  */
-int beiscsi_get_initiator_name(struct beiscsi_hba *phba, char *name)
+int beiscsi_get_initiator_name(struct beiscsi_hba *phba, char *name, bool cfg)
 {
struct be_dma_mem nonemb_cmd;
struct be_cmd_hba_name resp;
+   struct be_cmd_hba_name *req;
int rc;
 
rc = beiscsi_prep_nemb_cmd(phba, _cmd, CMD_SUBSYSTEM_ISCSI_INI,
@@ -352,6 +354,9 @@ int beiscsi_get_initiator_name(struct beiscsi_hba *phba, 
char *name)
if (rc)
return rc;
 
+   req = nonemb_cmd.va;
+   if (cfg)
+   req->hdr.version = 1;
rc = beiscsi_exec_nemb_cmd(phba, _cmd, NULL,
   , sizeof(resp));
if (rc) {
diff --git a/drivers/scsi/be2iscsi/be_mgmt.h b/drivers/scsi/be2iscsi/be_mgmt.h
index 665fd89..8d886f8 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.h
+++ b/drivers/scsi/be2iscsi/be_mgmt.h
@@ -178,7 +178,7 @@ int beiscsi_mgmt_invalidate_icds(struct beiscsi_hba *phba,
 struct invldt_cmd_tbl *inv_tbl,
 unsigned int nents);
 
-int beiscsi_get_initiator_name(struct beiscsi_hba *phba, char *name);
+int beiscsi_get_initiator_name(struct beiscsi_hba *phba, char *name, bool cfg);
 
 int beiscsi_if_en_dhcp(struct beiscsi_hba *phba, u32 ip_type);
 
-- 
2.7.4

[PATCH 02/10] be2iscsi: Fix return value in mgmt_open_connection

2017-10-10 Thread Jitendra Bhivare

mgmt_open_connection is expected to return tag not errno.

In error case, just return invalid tag 0.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_mgmt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/be2iscsi/be_mgmt.c b/drivers/scsi/be2iscsi/be_mgmt.c
index c737753..af6ee43 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.c
+++ b/drivers/scsi/be2iscsi/be_mgmt.c
@@ -156,7 +156,7 @@ int mgmt_open_connection(struct beiscsi_hba *phba,
beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
"BG_%d : unknown addr family %d\n",
dst_addr->sa_family);
-   return -EINVAL;
+   return 0;
}
 
phwi_ctrlr = phba->phwi_ctrlr;
-- 
2.7.4

[PATCH 01/10] be2iscsi: Fix boot flags in sysfs

2017-10-10 Thread Jitendra Bhivare

The boot flags exported through sysfs was wrongly reverted to 2.
Use boot flag 3 required per spec.
Bit 0 Block valid flag
Bit 1 Firmware booting selected

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index b4542e7..56ae0f4 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -4917,6 +4917,13 @@ void beiscsi_start_boot_work(struct beiscsi_hba *phba, 
unsigned int s_handle)
schedule_work(>boot_work);
 }
 
+/**
+ * Boot flag info for iscsi-utilities
+ * Bit 0 Block valid flag
+ * Bit 1 Firmware booting selected
+ */
+#define BEISCSI_SYSFS_ISCSI_BOOT_FLAGS 3
+
 static ssize_t beiscsi_show_boot_tgt_info(void *data, int type, char *buf)
 {
struct beiscsi_hba *phba = data;
@@ -4972,7 +4979,7 @@ static ssize_t beiscsi_show_boot_tgt_info(void *data, int 
type, char *buf)
 auth_data.chap.intr_secret);
break;
case ISCSI_BOOT_TGT_FLAGS:
-   rc = sprintf(str, "2\n");
+   rc = sprintf(str, "%d\n", BEISCSI_SYSFS_ISCSI_BOOT_FLAGS);
break;
case ISCSI_BOOT_TGT_NIC_ASSOC:
rc = sprintf(str, "0\n");
@@ -5004,7 +5011,7 @@ static ssize_t beiscsi_show_boot_eth_info(void *data, int 
type, char *buf)
 
switch (type) {
case ISCSI_BOOT_ETH_FLAGS:
-   rc = sprintf(str, "2\n");
+   rc = sprintf(str, "%d\n", BEISCSI_SYSFS_ISCSI_BOOT_FLAGS);
break;
case ISCSI_BOOT_ETH_INDEX:
rc = sprintf(str, "0\n");
-- 
2.7.4

[PATCH 04/10] be2iscsi: Fix _modify_eq_delay buffer overflow

2017-10-10 Thread Jitendra Bhivare

beiscsi_modify_eq_delay is using embedded command to send request of
788 bytes in 236 bytes buffer. Non-embedded command needs to be used
in such cases.

Use mgmt_alloc_cmd_data fn modified to allow passing of subsystem.
Use mgmt_exec_nonemb_cmd fn modified to allow setting of callback.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_cmds.c |   6 +-
 drivers/scsi/be2iscsi/be_cmds.h |   4 +-
 drivers/scsi/be2iscsi/be_mgmt.c | 212 ++--
 3 files changed, 124 insertions(+), 98 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_cmds.c b/drivers/scsi/be2iscsi/be_cmds.c
index a79a5e7..6af448d 100644
--- a/drivers/scsi/be2iscsi/be_cmds.c
+++ b/drivers/scsi/be2iscsi/be_cmds.c
@@ -675,8 +675,8 @@ static int be_mbox_notify(struct be_ctrl_info *ctrl)
return status;
 }
 
-void be_wrb_hdr_prepare(struct be_mcc_wrb *wrb, int payload_len,
-   bool embedded, u8 sge_cnt)
+void be_wrb_hdr_prepare(struct be_mcc_wrb *wrb, u32 payload_len,
+   bool embedded, u8 sge_cnt)
 {
if (embedded)
wrb->emb_sgecnt_special |= MCC_WRB_EMBEDDED_MASK;
@@ -688,7 +688,7 @@ void be_wrb_hdr_prepare(struct be_mcc_wrb *wrb, int 
payload_len,
 }
 
 void be_cmd_hdr_prepare(struct be_cmd_req_hdr *req_hdr,
-   u8 subsystem, u8 opcode, int cmd_len)
+   u8 subsystem, u8 opcode, u32 cmd_len)
 {
req_hdr->opcode = opcode;
req_hdr->subsystem = subsystem;
diff --git a/drivers/scsi/be2iscsi/be_cmds.h b/drivers/scsi/be2iscsi/be_cmds.h
index d9b6773..22ddfe9 100644
--- a/drivers/scsi/be2iscsi/be_cmds.h
+++ b/drivers/scsi/be2iscsi/be_cmds.h
@@ -1444,9 +1444,9 @@ struct be_cmd_get_port_name {
 * the cxn
 */
 
-void be_wrb_hdr_prepare(struct be_mcc_wrb *wrb, int payload_len,
+void be_wrb_hdr_prepare(struct be_mcc_wrb *wrb, u32 payload_len,
bool embedded, u8 sge_cnt);
 
 void be_cmd_hdr_prepare(struct be_cmd_req_hdr *req_hdr,
-   u8 subsystem, u8 opcode, int cmd_len);
+   u8 subsystem, u8 opcode, u32 cmd_len);
 #endif /* !BEISCSI_CMDS_H */
diff --git a/drivers/scsi/be2iscsi/be_mgmt.c b/drivers/scsi/be2iscsi/be_mgmt.c
index af6ee43..2117ac0 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.c
+++ b/drivers/scsi/be2iscsi/be_mgmt.c
@@ -19,43 +19,6 @@
 #include "be_iscsi.h"
 #include "be_main.h"
 
-int beiscsi_modify_eq_delay(struct beiscsi_hba *phba,
-   struct be_set_eqd *set_eqd,
-   int num)
-{
-   struct be_ctrl_info *ctrl = >ctrl;
-   struct be_mcc_wrb *wrb;
-   struct be_cmd_req_modify_eq_delay *req;
-   unsigned int tag;
-   int i;
-
-   mutex_lock(>mbox_lock);
-   wrb = alloc_mcc_wrb(phba, );
-   if (!wrb) {
-   mutex_unlock(>mbox_lock);
-   return 0;
-   }
-
-   req = embedded_payload(wrb);
-   be_wrb_hdr_prepare(wrb, sizeof(*req), true, 0);
-   be_cmd_hdr_prepare(>hdr, CMD_SUBSYSTEM_COMMON,
-  OPCODE_COMMON_MODIFY_EQ_DELAY, sizeof(*req));
-
-   req->num_eq = cpu_to_le32(num);
-   for (i = 0; i < num; i++) {
-   req->delay[i].eq_id = cpu_to_le32(set_eqd[i].eq_id);
-   req->delay[i].phase = 0;
-   req->delay[i].delay_multiplier =
-   cpu_to_le32(set_eqd[i].delay_multiplier);
-   }
-
-   /* ignore the completion of this mbox command */
-   set_bit(MCC_TAG_STATE_IGNORE, >ptag_state[tag].tag_state);
-   be_mcc_notify(phba, tag);
-   mutex_unlock(>mbox_lock);
-   return tag;
-}
-
 unsigned int mgmt_vendor_specific_fw_cmd(struct be_ctrl_info *ctrl,
 struct beiscsi_hba *phba,
 struct bsg_job *job,
@@ -236,16 +199,19 @@ int mgmt_open_connection(struct beiscsi_hba *phba,
 }
 
 /*
- * mgmt_exec_nonemb_cmd()- Execute Non Embedded MBX Cmd
- * @phba: Driver priv structure
- * @nonemb_cmd: Address of the MBX command issued
- * @resp_buf: Buffer to copy the MBX cmd response
- * @resp_buf_len: respone lenght to be copied
+ * beiscsi_exec_nemb_cmd()- execute non-embedded MBX cmd
+ * @phba: driver priv structure
+ * @nonemb_cmd: DMA address of the MBX command to be issued
+ * @cbfn: callback func on MCC completion
+ * @resp_buf: buffer to copy the MBX cmd response
+ * @resp_buf_len: response length to be copied
  *
  **/
-static int mgmt_exec_nonemb_cmd(struct beiscsi_hba *phba,
-   struct be_dma_mem *nonemb_cmd, void *resp_buf,
-   int resp_buf_len)
+static int beiscsi_exec_nemb_cmd(struct beiscsi_hba *phba,
+struct be_dma_mem *nonemb_cmd,
+void (*cbfn)(struct beiscsi_hba *,
+

[PATCH 05/10] be2iscsi: Fix _get_initname buffer overflow

2017-10-10 Thread Jitendra Bhivare

be_cmd_get_initname pulls GET_HBA_NAME response of 276 bytes in embedded
WRB buffer of 236 bytes.

Use non-embedded functions to issue the IOCTL.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_cmds.h  |  2 --
 drivers/scsi/be2iscsi/be_iscsi.c | 44 +++
 drivers/scsi/be2iscsi/be_mgmt.c  | 57 
 drivers/scsi/be2iscsi/be_mgmt.h  |  2 ++
 4 files changed, 35 insertions(+), 70 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_cmds.h b/drivers/scsi/be2iscsi/be_cmds.h
index 22ddfe9..4f1ac97 100644
--- a/drivers/scsi/be2iscsi/be_cmds.h
+++ b/drivers/scsi/be2iscsi/be_cmds.h
@@ -793,8 +793,6 @@ int beiscsi_cmd_mccq_create(struct beiscsi_hba *phba,
struct be_queue_info *mccq,
struct be_queue_info *cq);
 
-unsigned int be_cmd_get_initname(struct beiscsi_hba *phba);
-
 void free_mcc_wrb(struct be_ctrl_info *ctrl, unsigned int tag);
 
 int beiscsi_modify_eq_delay(struct beiscsi_hba *phba, struct be_set_eqd *,
diff --git a/drivers/scsi/be2iscsi/be_iscsi.c b/drivers/scsi/be2iscsi/be_iscsi.c
index 43a80ce..512c52a 100644
--- a/drivers/scsi/be2iscsi/be_iscsi.c
+++ b/drivers/scsi/be2iscsi/be_iscsi.c
@@ -684,41 +684,6 @@ int beiscsi_set_param(struct iscsi_cls_conn *cls_conn,
 }
 
 /**
- * beiscsi_get_initname - Read Initiator Name from flash
- * @buf: buffer bointer
- * @phba: The device priv structure instance
- *
- * returns number of bytes
- */
-static int beiscsi_get_initname(char *buf, struct beiscsi_hba *phba)
-{
-   int rc;
-   unsigned int tag;
-   struct be_mcc_wrb *wrb;
-   struct be_cmd_hba_name *resp;
-
-   tag = be_cmd_get_initname(phba);
-   if (!tag) {
-   beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
-   "BS_%d : Getting Initiator Name Failed\n");
-
-   return -EBUSY;
-   }
-
-   rc = beiscsi_mccq_compl_wait(phba, tag, , NULL);
-   if (rc) {
-   beiscsi_log(phba, KERN_ERR,
-   BEISCSI_LOG_CONFIG | BEISCSI_LOG_MBOX,
-   "BS_%d : Initiator Name MBX Failed\n");
-   return rc;
-   }
-
-   resp = embedded_payload(wrb);
-   rc = sprintf(buf, "%s\n", resp->initiator_name);
-   return rc;
-}
-
-/**
  * beiscsi_get_port_state - Get the Port State
  * @shost : pointer to scsi_host structure
  *
@@ -772,7 +737,6 @@ static void beiscsi_get_port_speed(struct Scsi_Host *shost)
  * @param: parameter type identifier
  * @buf: buffer pointer
  *
- * returns host parameter
  */
 int beiscsi_get_host_param(struct Scsi_Host *shost,
   enum iscsi_host_param param, char *buf)
@@ -783,7 +747,7 @@ int beiscsi_get_host_param(struct Scsi_Host *shost,
if (!beiscsi_hba_is_online(phba)) {
beiscsi_log(phba, KERN_INFO, BEISCSI_LOG_CONFIG,
"BS_%d : HBA in error 0x%lx\n", phba->state);
-   return -EBUSY;
+   return 0;
}
beiscsi_log(phba, KERN_INFO, BEISCSI_LOG_CONFIG,
"BS_%d : In beiscsi_get_host_param, param = %d\n", param);
@@ -794,15 +758,15 @@ int beiscsi_get_host_param(struct Scsi_Host *shost,
if (status < 0) {
beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
"BS_%d : beiscsi_get_macaddr Failed\n");
-   return status;
+   return 0;
}
break;
case ISCSI_HOST_PARAM_INITIATOR_NAME:
-   status = beiscsi_get_initname(buf, phba);
+   status = beiscsi_get_initiator_name(phba, buf);
if (status < 0) {
beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_CONFIG,
"BS_%d : Retreiving Initiator Name 
Failed\n");
-   return status;
+   return 0;
}
break;
case ISCSI_HOST_PARAM_PORT_STATE:
diff --git a/drivers/scsi/be2iscsi/be_mgmt.c b/drivers/scsi/be2iscsi/be_mgmt.c
index 2117ac0..0c25c10 100644
--- a/drivers/scsi/be2iscsi/be_mgmt.c
+++ b/drivers/scsi/be2iscsi/be_mgmt.c
@@ -335,6 +335,35 @@ int beiscsi_modify_eq_delay(struct beiscsi_hba *phba,
 __beiscsi_eq_delay_compl, NULL, 0);
 }
 
+/**
+ * beiscsi_get_initiator_name - read initiator name from flash
+ * @phba: device priv structure
+ * @name: buffer pointer
+ *
+ */
+int beiscsi_get_initiator_name(struct beiscsi_hba *phba, char *name)
+{
+   struct be_dma_mem nonemb_cmd;
+   struct be_cmd_hba_name resp;
+   int rc;
+
+   rc = beiscsi_prep_nemb_cmd(phba, _cmd, CMD_SUBSYSTEM_ISCSI_INI,
+   OPCODE_ISCSI_INI_CFG_GET_HBA_NAME, sizeof(resp));
+   if (rc)
+   return rc;
+
+   rc = beiscsi_exec_nemb_cmd(phba, _cmd, NULL,
+

[PATCH 00/10] be2iscsi: driver update 11.4.0.1

2017-10-10 Thread Jitendra Bhivare

This patch is generated against for-next branch.

Jitendra Bhivare (10):
  be2iscsi: Fix boot flags in sysfs
  be2iscsi: Fix return value in mgmt_open_connection
  be2iscsi: Free msi_name and disable HW intr
  be2iscsi: Fix _modify_eq_delay buffer overflow
  be2iscsi: Fix _get_initname buffer overflow
  be2iscsi: Modify IOCTL to fetch user configured IQN
  be2iscsi: Add cmd to set host data
  be2iscsi: Fix misc static analysis errors
  be2iscsi: Remove A-circumflex character in copyright marking
  scsi: be2iscsi: Update driver version

 drivers/scsi/be2iscsi/be.h   |  19 ++-
 drivers/scsi/be2iscsi/be_cmds.c  |  55 +++-
 drivers/scsi/be2iscsi/be_cmds.h  |  48 ---
 drivers/scsi/be2iscsi/be_iscsi.c |  54 ++--
 drivers/scsi/be2iscsi/be_iscsi.h |   2 +-
 drivers/scsi/be2iscsi/be_main.c  | 102 --
 drivers/scsi/be2iscsi/be_main.h  |  49 +--
 drivers/scsi/be2iscsi/be_mgmt.c  | 278 ++-
 drivers/scsi/be2iscsi/be_mgmt.h  |  10 +-
 9 files changed, 323 insertions(+), 294 deletions(-)

-- 
2.7.4

[PATCH 03/10] be2iscsi: Free msi_name and disable HW intr

2017-10-10 Thread Jitendra Bhivare

In beiscsi_dev_probe, allocated msi_name does not get freed and enabled
HW interrupts are not disabled in iscsi_host_add error case.

Add beiscsi_free_irqs fn to handle the cleanup in probe and disable port.

Signed-off-by: Jitendra Bhivare 
---
 drivers/scsi/be2iscsi/be_main.c | 47 ++---
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 56ae0f4..8f7e394 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -790,6 +790,24 @@ static irqreturn_t be_isr(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static void beiscsi_free_irqs(struct beiscsi_hba *phba)
+{
+   struct hwi_context_memory *phwi_context;
+   int i;
+
+   if (!phba->pcidev->msix_enabled) {
+   if (phba->pcidev->irq)
+   free_irq(phba->pcidev->irq, phba);
+   return;
+   }
+
+   phwi_context = phba->phwi_ctrlr->phwi_ctxt;
+   for (i = 0; i <= phba->num_cpus; i++) {
+   free_irq(pci_irq_vector(phba->pcidev, i),
+_context->be_eq[i]);
+   kfree(phba->msi_name[i]);
+   }
+}
 
 static int beiscsi_init_irqs(struct beiscsi_hba *phba)
 {
@@ -5396,15 +5414,7 @@ static void beiscsi_disable_port(struct beiscsi_hba 
*phba, int unload)
phwi_ctrlr = phba->phwi_ctrlr;
phwi_context = phwi_ctrlr->phwi_ctxt;
hwi_disable_intr(phba);
-   if (phba->pcidev->msix_enabled) {
-   for (i = 0; i <= phba->num_cpus; i++) {
-   free_irq(pci_irq_vector(phba->pcidev, i),
-   _context->be_eq[i]);
-   kfree(phba->msi_name[i]);
-   }
-   } else
-   if (phba->pcidev->irq)
-   free_irq(phba->pcidev->irq, phba);
+   beiscsi_free_irqs(phba);
pci_free_irq_vectors(phba->pcidev);
 
for (i = 0; i < phba->num_cpus; i++) {
@@ -5595,12 +5605,12 @@ static int beiscsi_dev_probe(struct pci_dev *pcidev,
if (ret) {
beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_INIT,
"BM_%d : be_ctrl_init failed\n");
-   goto hba_free;
+   goto free_hba;
}
 
ret = beiscsi_init_sliport(phba);
if (ret)
-   goto hba_free;
+   goto free_hba;
 
spin_lock_init(>io_sgl_lock);
spin_lock_init(>mgmt_sgl_lock);
@@ -5680,13 +5690,13 @@ static int beiscsi_dev_probe(struct pci_dev *pcidev,
beiscsi_log(phba, KERN_ERR, BEISCSI_LOG_INIT,
"BM_%d : beiscsi_dev_probe-"
"Failed to beiscsi_init_irqs\n");
-   goto free_blkenbld;
+   goto disable_iopoll;
}
hwi_enable_intr(phba);
 
ret = iscsi_host_add(phba->shost, >pcidev->dev);
if (ret)
-   goto free_blkenbld;
+   goto free_irqs;
 
/* set online bit after port is operational */
set_bit(BEISCSI_HBA_ONLINE, >state);
@@ -5724,12 +5734,15 @@ static int beiscsi_dev_probe(struct pci_dev *pcidev,
"\n\n\n BM_%d : SUCCESS - DRIVER LOADED\n\n\n");
return 0;
 
-free_blkenbld:
-   destroy_workqueue(phba->wq);
+free_irqs:
+   hwi_disable_intr(phba);
+   beiscsi_free_irqs(phba);
+disable_iopoll:
for (i = 0; i < phba->num_cpus; i++) {
pbe_eq = _context->be_eq[i];
irq_poll_disable(_eq->iopoll);
}
+   destroy_workqueue(phba->wq);
 free_twq:
hwi_cleanup_port(phba);
beiscsi_cleanup_port(phba);
@@ -5738,9 +5751,9 @@ static int beiscsi_dev_probe(struct pci_dev *pcidev,
pci_free_consistent(phba->pcidev,
phba->ctrl.mbox_mem_alloced.size,
phba->ctrl.mbox_mem_alloced.va,
-  phba->ctrl.mbox_mem_alloced.dma);
+   phba->ctrl.mbox_mem_alloced.dma);
beiscsi_unmap_pci_function(phba);
-hba_free:
+free_hba:
pci_disable_msix(phba->pcidev);
pci_dev_put(phba->pcidev);
iscsi_host_free(phba->shost);
-- 
2.7.4

Re: [PATCH v7 9/9] block, scsi: Make SCSI quiesce and resume work reliably

2017-10-10 Thread Martin Steigerwald

Bart Van Assche - 09.10.17, 16:14:
> The contexts from which a SCSI device can be quiesced or resumed are:
> * Writing into /sys/class/scsi_device/*/device/state.
> * SCSI parallel (SPI) domain validation.
> * The SCSI device power management methods. See also scsi_bus_pm_ops.
> 
> It is essential during suspend and resume that neither the filesystem
> state nor the filesystem metadata in RAM changes. This is why while
> the hibernation image is being written or restored that SCSI devices
> are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
> and scsi_device_resume(). In the SDEV_QUIESCE state execution of
> non-preempt requests is deferred. This is realized by returning
> BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
> devices. Avoid that a full queue prevents power management requests
> to be submitted by deferring allocation of non-preempt requests for
> devices in the quiesced state. This patch has been tested by running
> the following commands and by verifying that after resume the fio job
> is still running:
> 
> for d in /sys/class/block/sd*[a-z]; do
>   hcil=$(readlink "$d/device")
>   hcil=${hcil#../../../}
>   echo 4 > "$d/queue/nr_requests"
>   echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
> done
> bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
> bdev=${bdev#../../}
> hcil=$(readlink "/sys/block/$bdev/device")
> hcil=${hcil#../../../}
> fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512
> --rw=randread \ --ioengine=libaio --numjobs=4 --iodepth=16
> --iodepth_batch=1 --thread \ --loops=$((2**31)) &
> pid=$!
> sleep 1
> systemctl hibernate
> sleep 10
> kill $pid
> 
> Reported-by: Oleksandr Natalenko 
> References: "I/O hangs after resuming from suspend-to-ram"
> (https://marc.info/?l=linux-block=150340235201348). Signed-off-by: Bart
> Van Assche 
> Cc: Martin K. Petersen 
> Cc: Ming Lei 
> Cc: Christoph Hellwig 
> Cc: Hannes Reinecke 
> Cc: Johannes Thumshirn 

Does this as reliably fix the issue as the patches from Ming? I mean in *real 
world* scenarios? Or is it just about the same approach as Ming has taken.

I ask cause I don´t see any Tested-By:´s here? I know I tested Ming´s patch 
series and I know it fixes the hang after resume from suspend with blk-mq + BFQ 
issue for me. I have an uptime of 7 days and I didn´t see any uptime even 
remotely like that in a long time (before that issue Intel gfx drivers caused 
hangs, but thankfully that seems fixed meanwhile).

I´d be willing to test. Do you have a 4.14.x tree available with these patches 
applied I can just add as a remote and fetch from?

Thanks,
Martin

> ---
>  block/blk-core.c| 42 +++---
>  block/blk-mq.c  |  4 ++--
>  block/blk-timeout.c |  2 +-
>  drivers/scsi/scsi_lib.c | 28 ++--
>  fs/block_dev.c  |  4 ++--
>  include/linux/blkdev.h  |  2 +-
>  6 files changed, 59 insertions(+), 23 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ed992cbd107f..3847ea42e341 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -372,6 +372,7 @@ void blk_clear_preempt_only(struct request_queue *q)
> 
>   spin_lock_irqsave(q->queue_lock, flags);
>   queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
> + wake_up_all(>mq_freeze_wq);
>   spin_unlock_irqrestore(q->queue_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
> @@ -793,15 +794,40 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(blk_alloc_queue);
> 
> -int blk_queue_enter(struct request_queue *q, bool nowait)
> +/**
> + * blk_queue_enter() - try to increase q->q_usage_counter
> + * @q: request queue pointer
> + * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
> + */
> +int blk_queue_enter(struct request_queue *q, unsigned int flags)
>  {
> + const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
> +
>   while (true) {
> + bool success = false;
>   int ret;
> 
> - if (percpu_ref_tryget_live(>q_usage_counter))
> + rcu_read_lock_sched();
> + if (percpu_ref_tryget_live(>q_usage_counter)) {
> + /*
> +  * The code that sets the PREEMPT_ONLY flag is
> +  * responsible for ensuring that that flag is globally
> +  * visible before the queue is unfrozen.
> +  */
> + if (preempt || !blk_queue_preempt_only(q)) {
> + success = true;
> + } else {
> + percpu_ref_put(>q_usage_counter);
> + WARN_ONCE("%s: Attempt to allocate non-preempt 
> request in 
preempt-only
> mode.\n", +

Re: [PATCH v7 4/9] block: Make q_usage_counter also track legacy requests

2017-10-10 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v7 2/9] md: Introduce md_stop_all_writes()

2017-10-10 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Re: [PATCH v7 1/9] md: Rename md_notifier into md_reboot_notifier

2017-10-10 Thread Johannes Thumshirn

Looks good,
Reviewed-by: Johannes Thumshirn 
-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

72 matches

Mail list logo