Re: disk-io lockup in 4.14.13 kernel
Hi Bart,

Does the following go with your theory?

[452545.945561] sysrq: SysRq : Show backtrace of all active CPUs
[452545.946182] NMI backtrace for cpu 5
[452545.946185] CPU: 5 PID: 31921 Comm: bash Tainted: G          I     4.14.13-uls #2
[452545.946186] Hardware name: Supermicro SSG-5048R-E1CR36L/X10SRH-CLN4F, BIOS T20140520103247 05/20/2014
[452545.946187] Call Trace:
[452545.946196]  dump_stack+0x46/0x5a
[452545.946200]  nmi_cpu_backtrace+0xb3/0xc0
[452545.946205]  ? irq_force_complete_move+0xd0/0xd0
[452545.946208]  nmi_trigger_cpumask_backtrace+0x8f/0xc0
[452545.946212]  __handle_sysrq+0xec/0x140
[452545.946216]  write_sysrq_trigger+0x26/0x30
[452545.946219]  proc_reg_write+0x38/0x60
[452545.946222]  __vfs_write+0x1e/0x130
[452545.946225]  vfs_write+0xab/0x190
[452545.946228]  SyS_write+0x3d/0xa0
[452545.946233]  entry_SYSCALL_64_fastpath+0x13/0x6c
[452545.946236] RIP: 0033:0x7f6b85db52d0
[452545.946238] RSP: 002b:7fff6f9479e8 EFLAGS: 0246
[452545.946241] Sending NMI from CPU 5 to CPUs 0-4:
[452545.946272] NMI backtrace for cpu 0 skipped: idling at pc 0x8162b0a0
[452545.946275] NMI backtrace for cpu 3 skipped: idling at pc 0x8162b0a0
[452545.946279] NMI backtrace for cpu 4 skipped: idling at pc 0x8162b0a0
[452545.946283] NMI backtrace for cpu 2 skipped: idling at pc 0x8162b0a0
[452545.946287] NMI backtrace for cpu 1 skipped: idling at pc 0x8162b0a0

I'm not sure how to link that address back to a function, and I had to
reboot, so I'm not sure whether that can still be done.

Kind Regards,
Jaco

On 13/03/2018 19:24, Bart Van Assche wrote:
> On Tue, 2018-03-13 at 19:16 +0200, Jaco Kroon wrote:
>> The server in question is the destination of numerous rsync/ssh cases
>> (used primarily for backups) and is not intended as a real-time system.
>> I'm happy to enable the options below that you would indicate would be
>> helpful in pinpointing the problem (assuming we're not looking at an 8x
>> more-CPU-required kind of degradation, as I recently saw with asterisk
>> lock debugging enabled). I've marked in bold below what I assume would
>> be helpful. If you don't mind confirming for me, I'll enable them and
>> schedule a reboot.
>
> Hello Jaco,
>
> My recommendation is to wait until the mpt3sas maintainers post a fix
> for what I reported yesterday on the linux-scsi mailing list. Enabling
> CONFIG_DEBUG_ATOMIC_SLEEP namely has a very annoying consequence for the
> mpt3sas driver: the first process that hits the "sleep in atomic context"
> bug gets killed. I don't think that you want this kind of behavior on a
> production setup.
>
> Bart.
Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory
> That would be very nice but many devices do not support the internal
> route.

But Logan, in the NVMe case we are discussing movement within a single
function (i.e. from an NVMe namespace to an NVMe CMB on the same
function). Bjorn is discussing movement between two functions (PFs or
VFs) in the same PCIe EP. In the case of multi-function endpoints I
think the standard requires those devices to support internal DMAs for
transfers between those functions (but does not require it within a
function).

So I think the summary is:

1. There is no requirement for a single function to support internal
   DMAs, but in the case of NVMe we do have a protocol-specific way for
   an NVMe function to indicate that it supports them, via the CMB BAR.
   Other protocols may also have such methods, but I am not aware of
   them at this time.

2. For multi-function endpoints I think it is a requirement that DMAs
   *between* functions are supported via an internal path, but this can
   be overridden by ACS when supported in the EP.

3. For multi-function endpoints there is no requirement to support
   internal DMA within each individual function (i.e. a la point 1, but
   extended to each function in a MF device).

Based on my review of the specification, I concur with Bjorn that p2pdma
between functions in a MF endpoint should be assured to be supported via
the standard. However, if the p2pdma involves only a single function in
a MF device then we can only support NVMe CMBs for now. Let's review and
see what the options are for supporting this in the next respin.

Stephen
Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues
Hi Keith,

Thanks for your time and patch for this.

On 03/24/2018 06:19 AM, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0. This patch adds an offset parameter so blk-mq may find
> the intended affinity mask. The default value is 0 so existing drivers
> that don't care about this parameter don't need to change.
>
> Signed-off-by: Keith Busch
> ---
>  block/blk-mq-pci.c         | 12 ++++++++++--
>  include/linux/blk-mq-pci.h |  2 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..1040a7705c13 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
>   * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
>   * @set:	tagset to provide the mapping for
>   * @pdev:	PCI device associated with @set.
> + * @offset:	PCI irq starting vector offset
>   *
>   * This function assumes the PCI device @pdev has at least as many available
>   * interrupt vectors as @set has queues. It will then query the vector
> @@ -28,13 +29,14 @@
>   * that maps a queue to the CPUs that have irq affinity for the corresponding
>   * vector.
>   */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> +			    int offset)
>  {
>  	const struct cpumask *mask;
>  	unsigned int queue, cpu;
>
>  	for (queue = 0; queue < set->nr_hw_queues; queue++) {
> -		mask = pci_irq_get_affinity(pdev, queue);
> +		mask = pci_irq_get_affinity(pdev, queue + offset);
>  		if (!mask)
>  			goto fallback;
>

Maybe we could provide a callback parameter for __blk_mq_pci_map_queues
which gives the mapping from hctx queue number to device-relative
interrupt vector index.

Thanks
Jianchao