RE: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
> -----Original Message-----
> From: Ming Lei [mailto:ming@redhat.com]
> Sent: Thursday, March 08, 2018 9:32 PM
> To: James Bottomley; Jens Axboe; Martin K. Petersen
> Cc: Christoph Hellwig; linux-s...@vger.kernel.org; linux-bl...@vger.kernel.org;
> Meelis Roos; Don Brace; Kashyap Desai; Laurence Oberman; Mike Snitzer;
> Ming Lei; Hannes Reinecke; James Bottomley; Artem Bityutskiy
> Subject: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
>
> EXTERNAL EMAIL
>
> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> one msix vector can be created without any online CPU mapped, so one
> command's completion may not be notified.
>
> This patch sets up a mapping between CPU and reply queue according to the
> irq affinity info retrieved by pci_irq_get_affinity(), and uses this mapping
> table to choose the reply queue when queuing one command.
>
> The chosen reply queue is then guaranteed to be active, which fixes the IO
> hang caused by using an inactive reply queue that doesn't have any online
> CPU mapped.
>
> Cc: Hannes Reinecke
> Cc: "Martin K. Petersen"
> Cc: James Bottomley
> Cc: Christoph Hellwig
> Cc: Don Brace
> Cc: Kashyap Desai
> Cc: Laurence Oberman
> Cc: Meelis Roos
> Cc: Artem Bityutskiy
> Cc: Mike Snitzer
> Tested-by: Laurence Oberman
> Tested-by: Don Brace
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> Signed-off-by: Ming Lei
> ---

Acked-by: Don Brace
Tested-by: Don Brace

* Rebuilt test rig: applied the following patches to Linus's tree 4.16.0-rc4+:
  [PATCH V4 1_4] scsi: hpsa: fix selection of reply queue - Ming Lei - 2018-03-08 2132.eml
  [PATCH V4 3_4] scsi: introduce force_blk_mq - Ming Lei - 2018-03-08 2132.eml
* fio tests on 6 LVs on P441 controller (fw 6.59), 5 days.
* fio tests on 10 HBA disks on P431 (fw 4.54) controller, 3 days
  (concurrent with P441 tests).

>  drivers/scsi/hpsa.c | 73 +++--
>  drivers/scsi/hpsa.h |  1 +
>  2 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 5293e6827ce5..3a9eca163db8 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1045,11 +1045,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
>  		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
>  		if (unlikely(!h->msix_vectors))
>  			return;
> -		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -			c->Header.ReplyQueue =
> -				raw_smp_processor_id() % h->nreply_queues;
> -		else
> -			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
> +		c->Header.ReplyQueue = reply_queue;
>  	}
>  }
>
> @@ -1063,10 +1059,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
>  	 * Tell the controller to post the reply to the queue for this
>  	 * processor. This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->ReplyQueue = reply_queue % h->nreply_queues;
> +	cp->ReplyQueue = reply_queue;
>  	/*
>  	 * Set the bits in the address sent down to include:
>  	 *  - performant mode bit (bit 0)
> @@ -1087,10 +1080,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
>  	/* Tell the controller to post the reply to the queue for this
>  	 * processor. This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>  	/* Set the bits in the address sent down to include:
>  	 *  - performant mode bit not used in ioaccel mode 2
>  	 *  - pull count (bits 0-3)
> @@ -1109,10 +1099,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
On Mon, Mar 12, 2018 at 08:52:02AM +0100, Christoph Hellwig wrote:
> On Sat, Mar 10, 2018 at 11:01:43PM +0800, Ming Lei wrote:
> > > I really dislike this being open coded in drivers. It really should
> > > be a helper shared with the blk-mq map building that drivers just use.
> > >
> > > For now just have a low-level blk_pci_map_queues that
> > > blk_mq_pci_map_queues, hpsa and megaraid can share. In the long run
> > > it might make sense to change the blk-mq callout to that low-level
> > > prototype as well.
> >
> > The way of selecting the reply queue is needed for non-scsi_mq too.
>
> Which still doesn't prevent you from using a common helper.

The only common code is the following part:

+	for (queue = 0; queue < instance->msix_vectors; queue++) {
+		mask = pci_irq_get_affinity(instance->pdev, queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			instance->reply_map[cpu] = queue;
+	}

For megaraid_sas, the fallback code needs to handle the mapping in the
following way for legacy vectors:

	for_each_possible_cpu(cpu)
		instance->reply_map[cpu] = cpu % instance->msix_vectors;

So I am not sure a common helper is worthwhile, given there may not be
other potential users of it.

Thanks,
Ming
Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
On Sat, Mar 10, 2018 at 11:01:43PM +0800, Ming Lei wrote:
> > I really dislike this being open coded in drivers. It really should
> > be a helper shared with the blk-mq map building that drivers just use.
> >
> > For now just have a low-level blk_pci_map_queues that
> > blk_mq_pci_map_queues, hpsa and megaraid can share. In the long run
> > it might make sense to change the blk-mq callout to that low-level
> > prototype as well.
>
> The way of selecting the reply queue is needed for non-scsi_mq too.

Which still doesn't prevent you from using a common helper.
Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
Linux-Regression-ID: lr#15a115

On Fri, 2018-03-09 at 11:32 +0800, Ming Lei wrote:
> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> one msix vector can be created without any online CPU mapped, so one
> command's completion may not be notified.
>
> This patch sets up a mapping between CPU and reply queue according to the
> irq affinity info retrieved by pci_irq_get_affinity(), and uses this mapping
> table to choose the reply queue when queuing one command.
>
> The chosen reply queue is then guaranteed to be active, which fixes the IO
> hang caused by using an inactive reply queue that doesn't have any online
> CPU mapped.
>
> Cc: Hannes Reinecke
> Cc: "Martin K. Petersen"
> Cc: James Bottomley
> Cc: Christoph Hellwig
> Cc: Don Brace
> Cc: Kashyap Desai
> Cc: Laurence Oberman
> Cc: Meelis Roos
> Cc: Artem Bityutskiy
> Cc: Mike Snitzer
> Tested-by: Laurence Oberman
> Tested-by: Don Brace
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> Signed-off-by: Ming Lei

Tested-by: Artem Bityutskiy
Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com

These 2 patches make the Dell R640 regression that I reported go away.
Tested on top of v4.16-rc5, thanks!

--
Best Regards,
Artem Bityutskiy
Intel Finland Oy
Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
On Sat, Mar 10, 2018 at 11:09:59AM +0100, Christoph Hellwig wrote:
> > +static void hpsa_setup_reply_map(struct ctlr_info *h)
> > +{
> > +	const struct cpumask *mask;
> > +	unsigned int queue, cpu;
> > +
> > +	for (queue = 0; queue < h->msix_vectors; queue++) {
> > +		mask = pci_irq_get_affinity(h->pdev, queue);
> > +		if (!mask)
> > +			goto fallback;
> > +
> > +		for_each_cpu(cpu, mask)
> > +			h->reply_map[cpu] = queue;
> > +	}
> > +	return;
> > +
> > +fallback:
> > +	for_each_possible_cpu(cpu)
> > +		h->reply_map[cpu] = 0;
> > +}
>
> > +	h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
> > +	if (!h->reply_map) {
> > +		kfree(h);
> > +		return NULL;
> > +	}
> > +	return h;
>
> I really dislike this being open coded in drivers. It really should
> be a helper shared with the blk-mq map building that drivers just use.
>
> For now just have a low-level blk_pci_map_queues that
> blk_mq_pci_map_queues, hpsa and megaraid can share. In the long run
> it might make sense to change the blk-mq callout to that low-level
> prototype as well.

The way of selecting the reply queue is needed for non-scsi_mq too.

Thanks,
Ming
Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
> +static void hpsa_setup_reply_map(struct ctlr_info *h)
> +{
> +	const struct cpumask *mask;
> +	unsigned int queue, cpu;
> +
> +	for (queue = 0; queue < h->msix_vectors; queue++) {
> +		mask = pci_irq_get_affinity(h->pdev, queue);
> +		if (!mask)
> +			goto fallback;
> +
> +		for_each_cpu(cpu, mask)
> +			h->reply_map[cpu] = queue;
> +	}
> +	return;
> +
> +fallback:
> +	for_each_possible_cpu(cpu)
> +		h->reply_map[cpu] = 0;
> +}

> +	h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
> +	if (!h->reply_map) {
> +		kfree(h);
> +		return NULL;
> +	}
> +	return h;

I really dislike this being open coded in drivers. It really should
be a helper shared with the blk-mq map building that drivers just use.

For now just have a low-level blk_pci_map_queues that
blk_mq_pci_map_queues, hpsa and megaraid can share. In the long run
it might make sense to change the blk-mq callout to that low-level
prototype as well.
Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
From: Ming Lei
Sent: Thursday, March 8, 2018 7:32 PM
To: James Bottomley; Jens Axboe; Martin K. Petersen
Cc: Christoph Hellwig; linux-s...@vger.kernel.org; linux-block@vger.kernel.org;
Meelis Roos; Don Brace; Kashyap Desai; Laurence Oberman; Mike Snitzer;
Ming Lei; Hannes Reinecke; James Bottomley; Artem Bityutskiy
Subject: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

EXTERNAL EMAIL

From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
one msix vector can be created without any online CPU mapped, so one
command's completion may not be notified.

This patch sets up a mapping between CPU and reply queue according to the
irq affinity info retrieved by pci_irq_get_affinity(), and uses this mapping
table to choose the reply queue when queuing one command.

The chosen reply queue is then guaranteed to be active, which fixes the IO
hang caused by using an inactive reply queue that doesn't have any online
CPU mapped.

Cc: Hannes Reinecke
Cc: "Martin K. Petersen"
Cc: James Bottomley
Cc: Christoph Hellwig
Cc: Don Brace
Cc: Kashyap Desai
Cc: Laurence Oberman
Cc: Meelis Roos
Cc: Artem Bityutskiy
Cc: Mike Snitzer
Tested-by: Laurence Oberman
Tested-by: Don Brace
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei

I got the following stack trace while testing.
I need to pop off the patches and re-test for a baseline.
[root@cyflym ~]#
[18564.263896] XFS (dm-2): _xfs_buf_find: daddr 0x282084848 out of range, EOFS 0x7298000
[18564.301491] WARNING: CPU: 51 PID: 18275 at fs/xfs/xfs_buf.c:591 _xfs_buf_find+0x3f0/0x530 [xfs]
[18564.342614] Modules linked in: sg ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt aesni_intel iTCO_vendor_support crypto_simd glue_helper cryptd pcspkr ipmi_si hpilo hpwdt lpc_ich ioatdma pcc_cpufreq shpchp dca mfd_core wmi ipmi_msghandler acpi_power_meter uinput xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[18564.678017]  sd_mod fb_sys_fops ttm drm crc32c_intel tg3 hpsa i2c_core scsi_transport_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod dax
[18564.739543] CPU: 51 PID: 18275 Comm: bash Not tainted 4.16.0-rc4+ #14
[18564.769923] Hardware name: HP ProLiant DL580 Gen8, BIOS P79 08/18/2016
[18564.80] RIP: 0010:_xfs_buf_find+0x3f0/0x530 [xfs]
[18564.825121] RSP: 0018:9f0aaabaf6b8 EFLAGS: 00010246
[18564.849811] RAX:  RBX: 9f0aaabaf808 RCX:
[18564.883604] RDX: 9f0aaabaf5d8 RSI: 000a RDI: c046ad77
[18564.917315] RBP: 0001 R08:  R09: 0021
[18564.951211] R10:  R11: 000a R12: 8ade9c88dbc0
[18564.984925] R13: 8ade9c88dbc0 R14: 0001 R15: 9f0aaabaf808
[18565.018846] FS: 7f423c899740() GS:8aee9ef8() knlGS:
[18565.057473] CS: 0010 DS: ES: CR0: 80050033
[18565.084562] CR2: 7ffc480f8070 CR3: 00105b8ce006 CR4: 001606e0
[18565.118377] Call Trace:
[18565.129851]  ? _cond_resched+0x15/0x30
[18565.147590]  xfs_buf_get_map+0x23/0x260 [xfs]
[18565.168557]  xfs_buf_read_map+0x29/0x180 [xfs]
[18565.189845]  xfs_trans_read_buf_map+0xec/0x300 [xfs]
[18565.213354]  xfs_btree_read_buf_block.constprop.36+0x77/0xd0 [xfs]
[18565.242721]  xfs_btree_lookup_get_block+0x82/0x170 [xfs]
[18565.268117]  xfs_btree_lookup+0xce/0x3c0 [xfs]
[18565.289218]  ? kmem_zone_alloc+0x95/0x100 [xfs]
[18565.310659]  xfs_free_ag_extent+0x93/0x830 [xfs]
[18565.332491]  xfs_free_extent+0xb6/0x150 [xfs]
[18565.353187]  xfs_trans_free_extent+0x4f/0x110 [xfs]
[18565.376544]  ? xfs_trans_add_item+0x50/0x80 [xfs]
[18565.399174]  xfs_extent_free_finish_item+0x21/0x30 [xfs]
[18565.424638]  xfs_defer_finish+0x13d/0x400 [xfs]
[18565.446007]  xfs_itruncate_extents+0x11d/0x2d0 [xfs]
[18565.469501]  xfs_setattr_size+0x275/0x300 [xfs]
[18565.490808]  xfs_vn_setattr+0x40/0x60 [xfs]
[18565.510577]  notify_change+0x269/0x440
[18565.529105]  do_truncate+0x72/0xc0
[18565.545982]  path_openat+0x5ed/0x1210
[18565.563916]  ? xfs_iext_lookup_extent+0x60/0x140 [xfs]
[18565.588955]  ?