Re: disk-io lockup in 4.14.13 kernel

2018-03-24 Thread Jaco Kroon
Hi Bart,

Does the following go with your theory:

[452545.945561] sysrq: SysRq : Show backtrace of all active CPUs
[452545.946182] NMI backtrace for cpu 5
[452545.946185] CPU: 5 PID: 31921 Comm: bash Tainted: G  I 4.14.13-uls #2
[452545.946186] Hardware name: Supermicro SSG-5048R-E1CR36L/X10SRH-CLN4F, BIOS T20140520103247 05/20/2014
[452545.946187] Call Trace:
[452545.946196]  dump_stack+0x46/0x5a
[452545.946200]  nmi_cpu_backtrace+0xb3/0xc0
[452545.946205]  ? irq_force_complete_move+0xd0/0xd0
[452545.946208]  nmi_trigger_cpumask_backtrace+0x8f/0xc0
[452545.946212]  __handle_sysrq+0xec/0x140
[452545.946216]  write_sysrq_trigger+0x26/0x30
[452545.946219]  proc_reg_write+0x38/0x60
[452545.946222]  __vfs_write+0x1e/0x130
[452545.946225]  vfs_write+0xab/0x190
[452545.946228]  SyS_write+0x3d/0xa0
[452545.946233]  entry_SYSCALL_64_fastpath+0x13/0x6c
[452545.946236] RIP: 0033:0x7f6b85db52d0
[452545.946238] RSP: 002b:7fff6f9479e8 EFLAGS: 0246
[452545.946241] Sending NMI from CPU 5 to CPUs 0-4:
[452545.946272] NMI backtrace for cpu 0 skipped: idling at pc 0x8162b0a0
[452545.946275] NMI backtrace for cpu 3 skipped: idling at pc 0x8162b0a0
[452545.946279] NMI backtrace for cpu 4 skipped: idling at pc 0x8162b0a0
[452545.946283] NMI backtrace for cpu 2 skipped: idling at pc 0x8162b0a0
[452545.946287] NMI backtrace for cpu 1 skipped: idling at pc 0x8162b0a0

I'm not sure how to map that address back to a function name, and I've since
had to reboot, so I'm not sure whether that can still be done.

Kind Regards,
Jaco

On 13/03/2018 19:24, Bart Van Assche wrote:
> On Tue, 2018-03-13 at 19:16 +0200, Jaco Kroon wrote:
>> The server in question is the destination of numerous rsync/ssh sessions
>> (used primarily for backups) and is not intended as a real-time system.
>> I'm happy to enable whichever of the options below you indicate would be
>> helpful in pinpointing the problem (assuming we're not looking at the kind
>> of 8x increase in CPU usage I recently saw with asterisk lock debugging
>> enabled). I've marked in bold below what I assume would be helpful.  If
>> you don't mind confirming for me, I'll enable them and schedule a reboot.
> Hello Jaco,
>
> My recommendation is to wait until the mpt3sas maintainers post a fix
> for what I reported yesterday on the linux-scsi mailing list. Enabling
> CONFIG_DEBUG_ATOMIC_SLEEP has a very annoying consequence for the
> mpt3sas driver: the first process that hits the "sleep in atomic context"
> bug gets killed. I don't think you want this kind of behavior on a
> production setup.
>
> Bart.



Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-24 Thread Stephen Bates
> That would be very nice but many devices do not support the internal
> route. 

But Logan, in the NVMe case we are discussing movement within a single function
(i.e. from an NVMe namespace to an NVMe CMB on the same function). Bjorn is
discussing movement between two functions (PFs or VFs) in the same PCIe EP. In
the case of multi-function endpoints I think the standard requires those
devices to support internal DMAs for transfers between those functions (but
does not require it within a function).

So I think the summary is:

1. There is no requirement for a single function to support internal DMAs, but
in the case of NVMe we do have a protocol-specific way for an NVMe function to
indicate that it supports them, via the CMB BAR. Other protocols may also have
such methods but I am not aware of them at this time.

2. For multi-function endpoints I think it is a requirement that DMAs
*between* functions are supported via an internal path, but this can be
overridden by ACS when the EP supports it (see the sketch right after this
list).

3. For multi-function endpoints there is no requirement to support internal
DMA within each individual function (i.e. a la point 1, but extended to each
function in an MF device).
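
To make the ACS caveat in point 2 concrete, here is a rough, illustrative
check (not part of this series, helper name made up, using the standard
<linux/pci.h> helpers) of whether ACS on a function is redirecting
peer-to-peer requests upstream, which would defeat the internal route:

static bool acs_redirects_p2p(struct pci_dev *pdev)
{
        u16 ctrl;
        int pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);

        if (!pos)
                return false;   /* No ACS capability, nothing is redirected. */

        pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);

        /* Request/Completion Redirect send P2P TLPs up to the root complex. */
        return ctrl & (PCI_ACS_RR | PCI_ACS_CR);
}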

Based on my review of the specification, I concur with Bjorn that p2pdma
between functions in an MF endpoint should be guaranteed by the standard.
However, if the p2pdma involves only a single function in an MF device, then
we can only support NVMe CMBs for now. Let's review and see what the options
are for supporting this in the next respin.
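
For reference, a minimal sketch of the protocol-specific hint mentioned in
point 1: whether an NVMe function exposes a CMB at all can be read from the
standard CMBSZ controller register (NVMe 1.2+). This is not from the series,
and the helper name is made up; NVME_REG_CMBSZ comes from <linux/nvme.h>.

static bool nvme_function_has_cmb(void __iomem *bar)
{
        u32 cmbsz = readl(bar + NVME_REG_CMBSZ);

        /* A CMBSZ of zero means no Controller Memory Buffer is implemented. */
        return cmbsz != 0;
}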

Stephen




Re: [PATCH 1/3] blk-mq: Allow PCI vector offset for mapping queues

2018-03-24 Thread jianchao.wang
Hi Keith

Thanks for your time and patch for this.

On 03/24/2018 06:19 AM, Keith Busch wrote:
> The PCI interrupt vectors intended to be associated with a queue may
> not start at 0. This patch adds an offset parameter so blk-mq may find
> the intended affinity mask. The default value is 0 so existing drivers
> that don't care about this parameter don't need to change.
> 
> Signed-off-by: Keith Busch 
> ---
>  block/blk-mq-pci.c | 12 ++--
>  include/linux/blk-mq-pci.h |  2 ++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c
> index 76944e3271bf..1040a7705c13 100644
> --- a/block/blk-mq-pci.c
> +++ b/block/blk-mq-pci.c
> @@ -21,6 +21,7 @@
>   * blk_mq_pci_map_queues - provide a default queue mapping for PCI device
>   * @set: tagset to provide the mapping for
>   * @pdev:PCI device associated with @set.
> + * @offset:  PCI irq starting vector offset
>   *
>   * This function assumes the PCI device @pdev has at least as many available
>   * interrupt vectors as @set has queues.  It will then query the vector
> @@ -28,13 +29,14 @@
>   * that maps a queue to the CPUs that have irq affinity for the corresponding
>   * vector.
>   */
> -int blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev)
> +int __blk_mq_pci_map_queues(struct blk_mq_tag_set *set, struct pci_dev *pdev,
> + int offset)
>  {
>   const struct cpumask *mask;
>   unsigned int queue, cpu;
>  
>   for (queue = 0; queue < set->nr_hw_queues; queue++) {
> - mask = pci_irq_get_affinity(pdev, queue);
> + mask = pci_irq_get_affinity(pdev, queue + offset);
>   if (!mask)
>   goto fallback;
>  
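
Just to make sure I understand the intended usage: a driver that reserves its
first vector for non-queue interrupts would then do something like the below
(hypothetical caller, the foo_* names are made up and not part of this patch),
so that hctx 0 maps to vector 1, hctx 1 to vector 2, and so on.

static int foo_map_queues(struct blk_mq_tag_set *set)
{
        struct foo_dev *foo = set->driver_data;

        /* Skip vector 0, which foo reserved for its admin interrupt. */
        return __blk_mq_pci_map_queues(set, foo->pdev, 1);
}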

Maybe we could provide a callback parameter for __blk_mq_pci_map_queues that
gives the mapping from the hctx queue number to the device-relative interrupt
vector index.
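
Something along these lines, a rough and untested sketch only, with invented
names, just to illustrate the idea:

typedef int (*blk_mq_pci_q2vec_fn)(struct blk_mq_tag_set *set,
                                   unsigned int queue);

int blk_mq_pci_map_queues_cb(struct blk_mq_tag_set *set, struct pci_dev *pdev,
                             blk_mq_pci_q2vec_fn q2vec)
{
        const struct cpumask *mask;
        unsigned int queue, cpu;

        for (queue = 0; queue < set->nr_hw_queues; queue++) {
                /* Let the driver translate hctx number -> IRQ vector index. */
                mask = pci_irq_get_affinity(pdev, q2vec(set, queue));
                if (!mask)
                        goto fallback;

                for_each_cpu(cpu, mask)
                        set->mq_map[cpu] = queue;
        }

        return 0;

fallback:
        /* Same fallback as the existing helper: map everything to queue 0. */
        WARN_ON_ONCE(set->nr_hw_queues > 1);
        for_each_possible_cpu(cpu)
                set->mq_map[cpu] = 0;
        return 0;
}

The offset approach in your patch covers the common "vectors start at N" case
with less churn, though, so this is just a thought.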

Thanks
Jianchao