RE: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

2018-03-12 Thread Don Brace
> -----Original Message-----
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Thursday, March 08, 2018 9:32 PM
> To: James Bottomley ; Jens Axboe
> ; Martin K. Petersen 
> Cc: Christoph Hellwig ; linux-s...@vger.kernel.org; linux-
> bl...@vger.kernel.org; Meelis Roos ; Don Brace
> ; Kashyap Desai
> ; Laurence Oberman
> ; Mike Snitzer ; Ming Lei
> ; Hannes Reinecke ; James Bottomley
> ; Artem Bityutskiy
> 
> Subject: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
> 
> Since commit 84676c1f21 ("genirq/affinity: assign vectors to all possible
> CPUs"), one MSI-X vector can be created without any online CPU mapped to
> it, so a command's completion may never be notified.
> 
> This patch sets up a mapping between CPU and reply queue according to the
> IRQ affinity info retrieved by pci_irq_get_affinity(), and uses this
> mapping table to choose the reply queue when queuing a command.
> 
> The chosen reply queue is then guaranteed to be active, which fixes the
> IO hang caused by using an inactive reply queue that has no online CPU
> mapped.
> 
> Cc: Hannes Reinecke 
> Cc: "Martin K. Petersen" ,
> Cc: James Bottomley ,
> Cc: Christoph Hellwig ,
> Cc: Don Brace 
> Cc: Kashyap Desai 
> Cc: Laurence Oberman 
> Cc: Meelis Roos 
> Cc: Artem Bityutskiy 
> Cc: Mike Snitzer 
> Tested-by: Laurence Oberman 
> Tested-by: Don Brace 
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> Signed-off-by: Ming Lei 
> ---

Acked-by: Don Brace 
Tested-by: Don Brace 
* Rebuilt test rig: applied the following patches to Linus's tree 4.16.0-rc4+:
  - [PATCH V4 1_4] scsi: hpsa: fix selection of reply queue - Ming Lei - 2018-03-08 2132.eml
  - [PATCH V4 3_4] scsi: introduce force_blk_mq - Ming Lei - 2018-03-08 2132.eml
* fio tests on 6 LVs on P441 controller (fw 6.59), 5 days.
* fio tests on 10 HBA disks on P431 controller (fw 4.54), 3 days (concurrent with P441 tests).

>  drivers/scsi/hpsa.c | 73 ++++++++++++++++++++++++++++++++++---------------
>  drivers/scsi/hpsa.h |  1 +
>  2 files changed, 55 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 5293e6827ce5..3a9eca163db8 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1045,11 +1045,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
>  		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
>  		if (unlikely(!h->msix_vectors))
>  			return;
> -		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -			c->Header.ReplyQueue =
> -				raw_smp_processor_id() % h->nreply_queues;
> -		else
> -			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
> +		c->Header.ReplyQueue = reply_queue;
>  	}
>  }
> 
> @@ -1063,10 +1059,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
>  	 * Tell the controller to post the reply to the queue for this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->ReplyQueue = reply_queue % h->nreply_queues;
> +	cp->ReplyQueue = reply_queue;
>  	/*
>  	 * Set the bits in the address sent down to include:
>  	 *  - performant mode bit (bit 0)
> 
> @@ -1087,10 +1080,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
>  	/* Tell the controller to post the reply to the queue for this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>  	/* Set the bits in the address sent down to include:
>  	 *  - performant mode bit not used in ioaccel mode 2
>  	 *  - pull count (bits 0-3)
> 
> @@ -1109,10 +1099,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
> 
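The net effect of these hunks: the modulo-on-current-CPU computation done at
submission time is replaced by a reply queue index precomputed from IRQ
affinity, so the chosen queue always has an online CPU behind it. A minimal
sketch of the selection path under that scheme (the helper name is
hypothetical, not necessarily what the patch uses):

	/*
	 * Sketch only.  h->reply_map[] is built once from
	 * pci_irq_get_affinity() (see hpsa_setup_reply_map() quoted later
	 * in this thread), so the returned index always belongs to a
	 * vector that has at least one online CPU mapped.
	 */
	static u32 hpsa_pick_reply_queue(struct ctlr_info *h, int reply_queue)
	{
		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
			return h->reply_map[raw_smp_processor_id()];
		return reply_queue % h->nreply_queues;
	}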

Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

2018-03-12 Thread Ming Lei
On Mon, Mar 12, 2018 at 08:52:02AM +0100, Christoph Hellwig wrote:
> On Sat, Mar 10, 2018 at 11:01:43PM +0800, Ming Lei wrote:
> > > I really dislike this being open coded in drivers.  It really should
> > > be a helper shared with the blk-mq map building that drivers just use.
> > > 
> > > For now just have a low-level blk_pci_map_queues that
> > > blk_mq_pci_map_queues, hpsa and megaraid can share.  In the long run
> > > it might make sense to change the blk-mq callout to that low-level
> > > prototype as well.
> > 
> > This way of selecting the reply queue is needed for non-scsi_mq too.
> 
> Which still doesn't prevent you from using a common helper.

The only common code is the following part:

+   for (queue = 0; queue < instance->msix_vectors; queue++) {
+   mask = pci_irq_get_affinity(instance->pdev, queue);
+   if (!mask)
+   goto fallback;
+
+   for_each_cpu(cpu, mask)
+   instance->reply_map[cpu] = queue;
+   }

For megaraid_sas, the fallback code needs to handle the mapping in the
following way for legacy vectors:

   for_each_possible_cpu(cpu)
   instance->reply_map[cpu] = cpu % instance->msix_vectors;


So I am not sure it is worth a common helper, given there may not be
other potential users of it.

Thanks,
Ming
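
For concreteness, here is a sketch of what such a shared helper might look
like (hypothetical code, not an API that exists in the tree): the
pci_irq_get_affinity() walk is factored out, and each driver keeps its own
fallback for the no-affinity case.

	/*
	 * Hypothetical shared helper -- a sketch of the idea discussed
	 * above, not merged code.  Needs <linux/pci.h> and
	 * <linux/cpumask.h>.  Fills reply_map[cpu] with the index of the
	 * vector whose affinity mask contains that CPU.
	 */
	static int pci_build_reply_map(struct pci_dev *pdev, unsigned int nvecs,
				       unsigned int *reply_map)
	{
		const struct cpumask *mask;
		unsigned int queue, cpu;

		for (queue = 0; queue < nvecs; queue++) {
			mask = pci_irq_get_affinity(pdev, queue);
			if (!mask)
				return -ENOENT;	/* caller applies its own fallback */

			for_each_cpu(cpu, mask)
				reply_map[cpu] = queue;
		}
		return 0;
	}

On error, hpsa would zero the whole map while megaraid_sas would apply the
cpu % msix_vectors mapping quoted above, which is exactly the per-driver
difference being pointed out here.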


Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

2018-03-12 Thread Christoph Hellwig
On Sat, Mar 10, 2018 at 11:01:43PM +0800, Ming Lei wrote:
> > I really dislike this being open coded in drivers.  It really should
> > be a helper shared with the blk-mq map building that drivers just use.
> > 
> > For now just have a low-level blk_pci_map_queues that
> > blk_mq_pci_map_queues, hpsa and megaraid can share.  In the long run
> > it might make sense to change the blk-mq callout to that low-level
> > prototype as well.
> 
> This way of selecting the reply queue is needed for non-scsi_mq too.

Which still doesn't prevent you from using a common helper.


Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

2018-03-12 Thread Bityutskiy, Artem
Linux-Regression-ID: lr#15a115

On Fri, 2018-03-09 at 11:32 +0800, Ming Lei wrote:
> Since commit 84676c1f21 ("genirq/affinity: assign vectors to all possible
> CPUs"), one MSI-X vector can be created without any online CPU mapped to
> it, so a command's completion may never be notified.
> 
> This patch sets up a mapping between CPU and reply queue according to the
> IRQ affinity info retrieved by pci_irq_get_affinity(), and uses this
> mapping table to choose the reply queue when queuing a command.
> 
> The chosen reply queue is then guaranteed to be active, which fixes the
> IO hang caused by using an inactive reply queue that has no online CPU
> mapped.
> 
> Cc: Hannes Reinecke 
> Cc: "Martin K. Petersen" ,
> Cc: James Bottomley ,
> Cc: Christoph Hellwig ,
> Cc: Don Brace 
> Cc: Kashyap Desai 
> Cc: Laurence Oberman 
> Cc: Meelis Roos 
> Cc: Artem Bityutskiy 
> Cc: Mike Snitzer 
> Tested-by: Laurence Oberman 
> Tested-by: Don Brace 
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> Signed-off-by: Ming Lei 

Tested-by: Artem Bityutskiy 
Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com

These 2 patches make the Dell R640 regression that I reported go away.
Tested on top of v4.16-rc5, thanks!

-- 
Best Regards,
Artem Bityutskiy


Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

2018-03-10 Thread Ming Lei
On Sat, Mar 10, 2018 at 11:09:59AM +0100, Christoph Hellwig wrote:
> > +static void hpsa_setup_reply_map(struct ctlr_info *h)
> > +{
> > +   const struct cpumask *mask;
> > +   unsigned int queue, cpu;
> > +
> > +   for (queue = 0; queue < h->msix_vectors; queue++) {
> > +   mask = pci_irq_get_affinity(h->pdev, queue);
> > +   if (!mask)
> > +   goto fallback;
> > +
> > +   for_each_cpu(cpu, mask)
> > +   h->reply_map[cpu] = queue;
> > +   }
> > +   return;
> > +
> > +fallback:
> > +   for_each_possible_cpu(cpu)
> > +   h->reply_map[cpu] = 0;
> > +}
> 
> > +   h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
> > +   if (!h->reply_map) {
> > +   kfree(h);
> > +   return NULL;
> > +   }
> > +   return h;
> 
> I really dislike this being open coded in drivers.  It really should
> be a helper shared with the blk-mq map building that drivers just use.
> 
> For now just have a low-level blk_pci_map_queues that
> blk_mq_pci_map_queues, hpsa and megaraid can share.  In the long run
> it might make sense to change the blk-mq callout to that low-level
> prototype as well.

This way of selecting the reply queue is needed for non-scsi_mq too.

Thanks,
Ming
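
Whether or not a common helper materializes, the driver-side wiring stays
small. A sketch of the call sequence (placement is illustrative; the patch
does this during controller init, after the managed MSI-X vectors are
allocated):

	/*
	 * Sketch: allocate MSI-X vectors with managed affinity, then build
	 * the CPU -> reply queue table once.  Error handling elided.
	 */
	rc = pci_alloc_irq_vectors(h->pdev, 1, MAX_REPLY_QUEUES,
				   PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (rc > 0)
		h->msix_vectors = rc;

	hpsa_setup_reply_map(h);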


Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

2018-03-10 Thread Christoph Hellwig
> +static void hpsa_setup_reply_map(struct ctlr_info *h)
> +{
> + const struct cpumask *mask;
> + unsigned int queue, cpu;
> +
> + for (queue = 0; queue < h->msix_vectors; queue++) {
> + mask = pci_irq_get_affinity(h->pdev, queue);
> + if (!mask)
> + goto fallback;
> +
> + for_each_cpu(cpu, mask)
> + h->reply_map[cpu] = queue;
> + }
> + return;
> +
> +fallback:
> + for_each_possible_cpu(cpu)
> + h->reply_map[cpu] = 0;
> +}

> + h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
> + if (!h->reply_map) {
> + kfree(h);
> + return NULL;
> + }
> + return h;

I really dislike this being open coded in drivers.  It really should
be a helper shared with the blk-mq map building that drivers just use.

For now just have a low-level blk_pci_map_queues that
blk_mq_pci_map_queues, hpsa and megaraid can share.  In the long run
it might make sense to change the blk-mq callout to that low-level
prototype as well.
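
For reference, blk_mq_pci_map_queues() in block/blk-mq-pci.c (roughly as it
stands in this era) already performs the same affinity walk, just writing
into the tag set's mq_map; the low-level helper suggested above would
essentially lift this body out so it can target a plain per-CPU array as
well:

	int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
	{
		const struct cpumask *mask;
		unsigned int queue, cpu;

		for (queue = 0; queue < set->nr_hw_queues; queue++) {
			mask = pci_irq_get_affinity(pdev, queue);
			if (!mask)
				goto fallback;

			for_each_cpu(cpu, mask)
				set->mq_map[cpu] = queue;
		}

		return 0;

	fallback:
		WARN_ON_ONCE(set->nr_hw_queues > 1);
		for_each_possible_cpu(cpu)
			set->mq_map[cpu] = 0;
		return 0;
	}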


Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

2018-03-09 Thread Don Brace


From: Ming Lei 
Sent: Thursday, March 8, 2018 7:32 PM
To: James Bottomley; Jens Axboe; Martin K. Petersen
Cc: Christoph Hellwig; linux-s...@vger.kernel.org; linux-block@vger.kernel.org; 
Meelis Roos; Don Brace; Kashyap Desai; Laurence Oberman; Mike Snitzer; Ming 
Lei; Hannes Reinecke; James Bottomley; Artem Bityutskiy
Subject: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue


Since commit 84676c1f21 ("genirq/affinity: assign vectors to all possible
CPUs"), one MSI-X vector can be created without any online CPU mapped to
it, so a command's completion may never be notified.

This patch sets up a mapping between CPU and reply queue according to the
IRQ affinity info retrieved by pci_irq_get_affinity(), and uses this
mapping table to choose the reply queue when queuing a command.

The chosen reply queue is then guaranteed to be active, which fixes the
IO hang caused by using an inactive reply queue that has no online CPU
mapped.

Cc: Hannes Reinecke 
Cc: "Martin K. Petersen" ,
Cc: James Bottomley ,
Cc: Christoph Hellwig ,
Cc: Don Brace 
Cc: Kashyap Desai 
Cc: Laurence Oberman 
Cc: Meelis Roos 
Cc: Artem Bityutskiy 
Cc: Mike Snitzer 
Tested-by: Laurence Oberman 
Tested-by: Don Brace 
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei 

I got the following stack trace while testing. I need to pop off the patches
and re-test for a baseline.


[root@cyflym ~]# [18564.263896] XFS (dm-2): _xfs_buf_find: daddr 0x282084848 
out of range, EOFS 0x7298000
[18564.301491] WARNING: CPU: 51 PID: 18275 at fs/xfs/xfs_buf.c:591 
_xfs_buf_find+0x3f0/0x530 [xfs]
[18564.342614] Modules linked in: sg ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 
nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 
nf_defrag_ipv4 xt_conntrack nf_conntrack cfg80211 rfkill ebtable_nat 
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle 
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle 
iptable_security iptable_raw iptable_filter ip_tables sb_edac 
x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt aesni_intel iTCO_vendor_support 
crypto_simd glue_helper cryptd pcspkr ipmi_si hpilo hpwdt lpc_ich ioatdma 
pcc_cpufreq shpchp dca mfd_core wmi ipmi_msghandler acpi_power_meter uinput xfs 
libcrc32c mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[18564.678017]  sd_mod fb_sys_fops ttm drm crc32c_intel tg3 hpsa i2c_core 
scsi_transport_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod dax
[18564.739543] CPU: 51 PID: 18275 Comm: bash Not tainted 4.16.0-rc4+ #14
[18564.769923] Hardware name: HP ProLiant DL580 Gen8, BIOS P79 08/18/2016
[18564.80] RIP: 0010:_xfs_buf_find+0x3f0/0x530 [xfs]
[18564.825121] RSP: 0018:9f0aaabaf6b8 EFLAGS: 00010246
[18564.849811] RAX:  RBX: 9f0aaabaf808 RCX: 
[18564.883604] RDX: 9f0aaabaf5d8 RSI: 000a RDI: c046ad77
[18564.917315] RBP: 0001 R08:  R09: 0021
[18564.951211] R10:  R11: 000a R12: 8ade9c88dbc0
[18564.984925] R13: 8ade9c88dbc0 R14: 0001 R15: 9f0aaabaf808
[18565.018846] FS:  7f423c899740() GS:8aee9ef8() 
knlGS:
[18565.057473] CS:  0010 DS:  ES:  CR0: 80050033
[18565.084562] CR2: 7ffc480f8070 CR3: 00105b8ce006 CR4: 001606e0
[18565.118377] Call Trace:
[18565.129851]  ? _cond_resched+0x15/0x30
[18565.147590]  xfs_buf_get_map+0x23/0x260 [xfs]
[18565.168557]  xfs_buf_read_map+0x29/0x180 [xfs]
[18565.189845]  xfs_trans_read_buf_map+0xec/0x300 [xfs]
[18565.213354]  xfs_btree_read_buf_block.constprop.36+0x77/0xd0 [xfs]
[18565.242721]  xfs_btree_lookup_get_block+0x82/0x170 [xfs]
[18565.268117]  xfs_btree_lookup+0xce/0x3c0 [xfs]
[18565.289218]  ? kmem_zone_alloc+0x95/0x100 [xfs]
[18565.310659]  xfs_free_ag_extent+0x93/0x830 [xfs]
[18565.332491]  xfs_free_extent+0xb6/0x150 [xfs]
[18565.353187]  xfs_trans_free_extent+0x4f/0x110 [xfs]
[18565.376544]  ? xfs_trans_add_item+0x50/0x80 [xfs]
[18565.399174]  xfs_extent_free_finish_item+0x21/0x30 [xfs]
[18565.424638]  xfs_defer_finish+0x13d/0x400 [xfs]
[18565.446007]  xfs_itruncate_extents+0x11d/0x2d0 [xfs]
[18565.469501]  xfs_setattr_size+0x275/0x300 [xfs]
[18565.490808]  xfs_vn_setattr+0x40/0x60 [xfs]
[18565.510577]  notify_change+0x269/0x440
[18565.529105]  do_truncate+0x72/0xc0
[18565.545982]  path_openat+0x5ed/0x1210
[18565.563916]  ? xfs_iext_lookup_extent+0x60/0x140 [xfs]
[18565.588955]  ?