Re: source-id verification failures
On Thu, 11 Oct 2018 12:09:16 -0700
Jerry Snitselaar wrote:

> On Fri Oct 05 18, Jacob Pan wrote:
> >On Thu, 4 Oct 2018 13:57:24 -0700
> >Jerry Snitselaar wrote:
> >
> >> >
> >> >On Tue, Oct 02, 2018 at 10:25:29AM -0700, Jerry Snitselaar
> >> >wrote:
> >> >> I've been trying to track down a problem where an hp dl380 gen8
> >> >> with a Cavium QLogic BR-1860 Fabric Adapter is getting source-id
> >> >> verification failures when running dhclient against that
> >> >> interface. This started showing up when I backported the iova
> >> >> deferred flushing patches. So far this has only been seen on
> >> >> this one system, but I'm trying to understand why it appears
> >> >> with the new deferred flushing code. I also see it with both
> >> >> 4.18.10, and 4.19.0-rc6 kernels.
> >> >>
> >Hi Jerry,
> >Could you confirm that you see this failure in v4.19-rc6 kernel only
> >without "strict" mode? I don't see a connection between deferred
> >flushing and IR here, AFAIK deferred flush only affects DMA
> >remapping.
> >
> >Also, does the SID failure occur on other devices under the same
> >IOMMU?
> >
> >
> >Thanks,
> >
> >Jacob
> >
>
> Confirmed the system doesn't see the problem with intel_iommu=strict.
> We've only seen SID failures occur for the 2 ports on the brocade
> device.
>
Thanks for the data. I am puzzled about how an IR fault could be related
to strict mode. We don't defer flushing for IR. The only connection is
that the queued invalidation (QI) interface is shared, so non-strict mode
may hold up the QI longer from time to time. Perhaps add a printk in
modify_irte() and see if the fault happens around IRTE updates, e.g.:
@@ -170,10 +171,15 @@ static int modify_irte(struct irq_2_iommu *irq_iommu,
 	index = irq_iommu->irte_index + irq_iommu->sub_handle;
 	irte = &iommu->ir_table->base[index];
 
+	pr_debug("irte h:%llx l:%llx, sid:%x\n",
+		 irte_modified->high, irte_modified->low, irte_modified->sid);
+

> Another data point is that there is a dl388 gen8 with the same card,
> and we don't see any problems there. I'd say it is something with
> this system, but it is odd that the problem starts showing itself
> when I add those patches.
>
All I can say is that the following IRTE dump looks sane. You might be
able to relax the SID check from the full 16 bits to 13 bits, which
excludes the three function bits (bits 80-81, SQ=0b11). But I doubt that
is the issue.

Jacob

>
> # cat ir_translation_struct
> Remapped Interrupt supported on IOMMU: dmar0
> IR table address:42f20
> Entry SrcID   DstID    Vct IRTE_high IRTE_low
> 2     24:00.0 00020001 21  00042400  000200010021000d
> 3     24:00.0 00020200 28  00042400  00020228000d
> 4     24:00.0 00020800 28  00042400  00020828000d
> 5     24:00.0 0001     ef  00042400  000100ef000d
> 6     24:00.0 0001     ef  00042400  000100ef000d
> 7     24:00.0 0001     ef  00042400  000100ef000d
> 8     24:00.0 0001     ef  00042400  000100ef000d
> 9     24:00.0 0001     ef  00042400  000100ef000d
> 10    24:00.0 0001     ef  00042400  000100ef000d
> 11    24:00.0 0001     ef  00042400  000100ef000d
> 12    24:00.0 0001     ef  00042400  000100ef000d
> 13    24:00.0 0001     ef  00042400  000100ef000d
> 14    24:00.0 0001     ef  00042400  000100ef000d
> 15    24:00.0 0001     ef  00042400  000100ef000d
> 16    24:00.0 0001     ef  00042400  000100ef000d
> 17    24:00.0 0001     ef  00042400  000100ef000d
> 18    24:00.0 0001     ef  00042400  000100ef000d
> 19    24:00.0 0001     ef  00042400  000100ef000d
> 20    24:00.1 00020004 25  00042401  000200040025000d
> 21    24:00.1 00020001 29  00042401  000200010029000d
> 22    24:00.1 00020004 29  00042401  000200040029000d
> 23    24:00.1 0001     ef  00042401  000100ef000d
> 24    24:00.1 0001     ef  00042401  000100ef000d
> 25    24:00.1 0001     ef  00042401  000100ef000d
> 26    24:00.1 0001     ef  00042401  000100ef000d
> 27    24:00.1 0001     ef  00042401  000100ef000d
> 28    24:00.1 0001     ef  00042401  000100ef000d
> 29    24:00.1 0001     ef  00042401  000100ef000d
> 30    24:00.1 0001     ef  00042401  000100ef000d
> 31    24:00.1 0001     ef  00042401  000100ef000d
> 32    24
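For reference, the source-id check that Jacob describes can be sketched in standalone C. This is only an illustration under stated assumptions: it assumes the IRTE layout from the VT-d specification (SID in bits 64-79, SQ in bits 80-81, SVT in bits 82-83) and that the IRTE_high column of the dump carries bits 64-95. The names `irte_decode_high()` and `sid_matches()` are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Source-validation fields of the upper IRTE word (layout assumed from the VT-d spec). */
struct irte_src {
	uint16_t sid; /* bits 64-79: source-id, bus[15:8] dev[7:3] fn[2:0] */
	uint8_t  sq;  /* bits 80-81: source-id qualifier */
	uint8_t  svt; /* bits 82-83: source validation type */
};

/* Hypothetical helper: split an IRTE_high value as printed in the dump. */
static struct irte_src irte_decode_high(uint32_t high)
{
	struct irte_src s = {
		.sid = (uint16_t)(high & 0xffff),
		.sq  = (uint8_t)((high >> 16) & 0x3),
		.svt = (uint8_t)((high >> 18) & 0x3),
	};
	return s;
}

/*
 * Hypothetical model of the check for a given requester-id:
 *   SVT=0: no verification, every request passes
 *   SVT=1: compare SID and requester-id, ignoring function bits per SQ
 *          SQ=0 all 16 bits, SQ=1 ignore bit 2, SQ=2 ignore bits 2:1,
 *          SQ=3 ignore bits 2:0 (upper 13 bits only)
 *   SVT=2: bus-range check, not modelled here (treated as a mismatch)
 */
static bool sid_matches(struct irte_src s, uint16_t rid)
{
	static const uint16_t ignore[4] = { 0x0, 0x4, 0x6, 0x7 };

	if (s.svt == 0)
		return true;
	if (s.svt != 1)
		return false;
	return ((s.sid ^ rid) & (uint16_t)~ignore[s.sq]) == 0;
}
```

Under these assumptions, the dump value 00042400 decodes to SID 0x2400 (24:00.0) with SQ=0 and SVT=1, so only requests from 24:00.0 pass. Setting SQ to 3, as suggested above, would also accept 24:00.1 through 24:00.7.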
Re: source-id verification failures
On Fri Oct 05 18, Jacob Pan wrote:
>On Thu, 4 Oct 2018 13:57:24 -0700
>Jerry Snitselaar wrote:
>
>> >On Tue, Oct 02, 2018 at 10:25:29AM -0700, Jerry Snitselaar wrote:
>> >> I've been trying to track down a problem where an hp dl380 gen8
>> >> with a Cavium QLogic BR-1860 Fabric Adapter is getting source-id
>> >> verification failures when running dhclient against that
>> >> interface. This started showing up when I backported the iova
>> >> deferred flushing patches. So far this has only been seen on this
>> >> one system, but I'm trying to understand why it appears with the
>> >> new deferred flushing code. I also see it with both 4.18.10, and
>> >> 4.19.0-rc6 kernels.
>> >>
>Hi Jerry,
>Could you confirm that you see this failure in v4.19-rc6 kernel only
>without "strict" mode? I don't see a connection between deferred
>flushing and IR here, AFAIK deferred flush only affects DMA
>remapping.
>
>Also, does the SID failure occur on other devices under the same
>IOMMU?
>
>Thanks,
>
>Jacob

Confirmed the system doesn't see the problem with intel_iommu=strict.
We've only seen SID failures occur for the 2 ports on the brocade
device.

Another data point is that there is a dl388 gen8 with the same card,
and we don't see any problems there. I'd say it is something with
this system, but it is odd that the problem starts showing itself
when I add those patches.

# cat ir_translation_struct
Remapped Interrupt supported on IOMMU: dmar0
IR table address:42f20
Entry SrcID   DstID    Vct IRTE_high IRTE_low
2     24:00.0 00020001 21  00042400  000200010021000d
3     24:00.0 00020200 28  00042400  00020228000d
4     24:00.0 00020800 28  00042400  00020828000d
5     24:00.0 0001     ef  00042400  000100ef000d
6     24:00.0 0001     ef  00042400  000100ef000d
7     24:00.0 0001     ef  00042400  000100ef000d
8     24:00.0 0001     ef  00042400  000100ef000d
9     24:00.0 0001     ef  00042400  000100ef000d
10    24:00.0 0001     ef  00042400  000100ef000d
11    24:00.0 0001     ef  00042400  000100ef000d
12    24:00.0 0001     ef  00042400  000100ef000d
13    24:00.0 0001     ef  00042400  000100ef000d
14    24:00.0 0001     ef  00042400  000100ef000d
15    24:00.0 0001     ef  00042400  000100ef000d
16    24:00.0 0001     ef  00042400  000100ef000d
17    24:00.0 0001     ef  00042400  000100ef000d
18    24:00.0 0001     ef  00042400  000100ef000d
19    24:00.0 0001     ef  00042400  000100ef000d
20    24:00.1 00020004 25  00042401  000200040025000d
21    24:00.1 00020001 29  00042401  000200010029000d
22    24:00.1 00020004 29  00042401  000200040029000d
23    24:00.1 0001     ef  00042401  000100ef000d
24    24:00.1 0001     ef  00042401  000100ef000d
25    24:00.1 0001     ef  00042401  000100ef000d
26    24:00.1 0001     ef  00042401  000100ef000d
27    24:00.1 0001     ef  00042401  000100ef000d
28    24:00.1 0001     ef  00042401  000100ef000d
29    24:00.1 0001     ef  00042401  000100ef000d
30    24:00.1 0001     ef  00042401  000100ef000d
31    24:00.1 0001     ef  00042401  000100ef000d
32    24:00.1 0001     ef  00042401  000100ef000d
33    24:00.1 0001     ef  00042401  000100ef000d
34    24:00.1 0001     ef  00042401  000100ef000d
35    24:00.1 0001     ef  00042401  000100ef000d
36    24:00.1 0001     ef  00042401  000100ef000d
37    24:00.1 0001     ef  00042401  000100ef000d
39    20:04.0 00020010 28  00042020  000200100028000d
41    20:04.1 00020040 28  00042021  000200400028000d
42    20:04.2 00020100 28  00042022  00020128000d
43    20:04.3 00020400 28  00042023  00020428000d
44    20:04.4 00020002 28  00042024  000200020028000d
45    20:04.5 00020008 28  00042025  000200080028000d
46    20:04.6 00020020 28  00042026  000200200028000d
47    20:04.7 00020080 28  00042027  000200800028000d

Remapped Interrupt supported on IOMMU: dmar1
IR table address:42e80
Entry SrcID   DstID    Vct IRTE_high IRTE_low
0     00:1e.1 00020020 2a  000400f1  00020020002a000d
1     00:1e.1 0001     30  000400f1  0001003d
2     00:1e.1 00020200 2a  000400f1  0002022a000d
7     00:1e.1 00020004 2b  000400f1  00020004002b000d
8     00:1e.1 00020040 2b  000400f1  00020040002b000d
11    00:1e.1 00020100 2b  000400f1  0002012b000d
17    00:1e.1 00020010 29  000
Re: source-id verification failures
On Fri Oct 05 18, Raj, Ashok wrote:
>On Thu, Oct 04, 2018 at 03:07:46PM -0700, Jacob Pan wrote:
>> On Thu, 4 Oct 2018 13:57:24 -0700
>> Jerry Snitselaar wrote:
>>
>> > On Thu Oct 04 18, Joerg Roedel wrote:
>> > >Hi Jerry,
>> > >
>> > >thanks for the report.
>> > >
>> > >On Tue, Oct 02, 2018 at 10:25:29AM -0700, Jerry Snitselaar wrote:
>> > >> I've been trying to track down a problem where an hp dl380 gen8
>> > >> with a Cavium QLogic BR-1860 Fabric Adapter is getting source-id
>> > >> verification failures when running dhclient against that
>> > >> interface. This started showing up when I backported the iova
>> > >> deferred flushing patches. So far this has only been seen on this
>> > >> one system, but I'm trying to understand why it appears with the
>> > >> new deferred flushing code. I also see it with both 4.18.10, and
>> > >> 4.19.0-rc6 kernels.
>
>Weird. IIRC, these were there to accommodate phantom functions. I
>thought PCIe allowed an 8-bit tag, so if a device needs more than 256
>outstanding transactions, it could use the extra function numbers to
>account for them. I assumed Linux didn't enable phantom functions. If
>that's the case, we also need to ensure all the DMA is aliased properly.
>I'm assuming that if interrupts are generated by other aliases, we could
>block them. Is this device one such?
>
>Cheers,
>Ashok
>
>> > >>
>> > >> [35645.282021] bna :24:00.1 ens5f1: link down
>> > >> [35645.298396] bna :24:00.0 ens5f0: link down
>> > >> [35650.313210] DMAR: DRHD: handling fault status reg 2
>> > >> [35650.332477] DMAR: [INTR-REMAP] Request device [24:00.0] fault index 14 [fault reason 38] Blocked an interrupt request due to source-id verification failure
>> > >> [35655.137667] bna :24:00.0 ens5f0: link up
>> > >> [35657.532454] bna :24:00.1 ens5f1: link up
>> > >> [35664.281563] bna :24:00.1 ens5f1: link down
>> > >> [35664.298103] bna :24:00.0 ens5f0: link down
>> > >> [35669.313568] DMAR: DRHD: handling fault status reg 102
>> > >> [35669.333198] DMAR: [INTR-REMAP] Request device [24:00.0] fault index 14 [fault reason 38] Blocked an interrupt request due to source-id verification failure
>> > >> [35674.081212] bna :24:00.0 ens5f0: link up
>> > >> [35674.981280] bna :24:00.1 ens5f1: link up
>> > >>
>> > >>
>> > >> Any ideas?
>> > >
>> > >No, not yet. Can you please post the output of lspci -vvv?
>> > >
>> > >Jacob, can you or someone from your team please also have a look into
>> > >this problem report?
>> >
>> > yep.
>> +Ashok
>>
>> Jerry,
>> Could you also dump the interrupt remapping table with this patchset?
>> https://lkml.org/lkml/2018/9/12/44
>>
>> Thanks,

Sorry, I've been on dad duty the past few days. I should be back working
on this tonight or tomorrow.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: source-id verification failures
On Thu, Oct 04, 2018 at 03:07:46PM -0700, Jacob Pan wrote:
> On Thu, 4 Oct 2018 13:57:24 -0700
> Jerry Snitselaar wrote:
>
> > On Thu Oct 04 18, Joerg Roedel wrote:
> > >Hi Jerry,
> > >
> > >thanks for the report.
> > >
> > >On Tue, Oct 02, 2018 at 10:25:29AM -0700, Jerry Snitselaar wrote:
> > >> I've been trying to track down a problem where an hp dl380 gen8
> > >> with a Cavium QLogic BR-1860 Fabric Adapter is getting source-id
> > >> verification failures when running dhclient against that
> > >> interface. This started showing up when I backported the iova
> > >> deferred flushing patches. So far this has only been seen on this
> > >> one system, but I'm trying to understand why it appears with the
> > >> new deferred flushing code. I also see it with both 4.18.10, and
> > >> 4.19.0-rc6 kernels.

Weird. IIRC, these were there to accommodate phantom functions. I
thought PCIe allowed an 8-bit tag, so if a device needs more than 256
outstanding transactions, it could use the extra function numbers to
account for them. I assumed Linux didn't enable phantom functions. If
that's the case, we also need to ensure all the DMA is aliased properly.
I'm assuming that if interrupts are generated by other aliases, we could
block them. Is this device one such?

Cheers,
Ashok

> > >>
> > >> [35645.282021] bna :24:00.1 ens5f1: link down
> > >> [35645.298396] bna :24:00.0 ens5f0: link down
> > >> [35650.313210] DMAR: DRHD: handling fault status reg 2
> > >> [35650.332477] DMAR: [INTR-REMAP] Request device [24:00.0] fault index 14 [fault reason 38] Blocked an interrupt request due to source-id verification failure
> > >> [35655.137667] bna :24:00.0 ens5f0: link up
> > >> [35657.532454] bna :24:00.1 ens5f1: link up
> > >> [35664.281563] bna :24:00.1 ens5f1: link down
> > >> [35664.298103] bna :24:00.0 ens5f0: link down
> > >> [35669.313568] DMAR: DRHD: handling fault status reg 102
> > >> [35669.333198] DMAR: [INTR-REMAP] Request device [24:00.0] fault index 14 [fault reason 38] Blocked an interrupt request due to source-id verification failure
> > >> [35674.081212] bna :24:00.0 ens5f0: link up
> > >> [35674.981280] bna :24:00.1 ens5f1: link up
> > >>
> > >>
> > >> Any ideas?
> > >
> > >No, not yet. Can you please post the output of lspci -vvv?
> > >
> > >Jacob, can you or someone from your team please also have a look into
> > >this problem report?
> >
> > yep.
> +Ashok
>
> Jerry,
> Could you also dump the interrupt remapping table with this patchset?
> https://lkml.org/lkml/2018/9/12/44
>
> Thanks,
>
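Ashok's point about phantom functions can be illustrated with a little arithmetic. The sketch below is standalone C and not kernel code; it assumes, per the PCIe specification, that a device may borrow up to three function-number bits of its requester-id to extend the transaction tag. The helper names `max_outstanding()` and `make_rid()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of outstanding requests one function can track when the tag has
 * tag_bits bits and phantom_bits function-number bits (0 to 3) are borrowed
 * as phantom functions.
 */
static uint32_t max_outstanding(unsigned tag_bits, unsigned phantom_bits)
{
	return (uint32_t)1 << (tag_bits + phantom_bits);
}

/* Build a 16-bit requester-id: bus[15:8], device[7:3], function[2:0]. */
static uint16_t make_rid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}
```

With an 8-bit tag this gives 256 outstanding requests, or 2048 when all three function bits are borrowed. In that case requests from one physical function would carry requester-ids that differ only in the low three bits, which a full 16-bit source-id comparison would reject.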
Re: source-id verification failures
On Thu, 4 Oct 2018 13:57:24 -0700
Jerry Snitselaar wrote:

> >
> >On Tue, Oct 02, 2018 at 10:25:29AM -0700, Jerry Snitselaar wrote:
> >> I've been trying to track down a problem where an hp dl380 gen8
> >> with a Cavium QLogic BR-1860 Fabric Adapter is getting source-id
> >> verification failures when running dhclient against that
> >> interface. This started showing up when I backported the iova
> >> deferred flushing patches. So far this has only been seen on this
> >> one system, but I'm trying to understand why it appears with the
> >> new deferred flushing code. I also see it with both 4.18.10, and
> >> 4.19.0-rc6 kernels.
> >>
Hi Jerry,
Could you confirm that you see this failure in v4.19-rc6 kernel only
without "strict" mode? I don't see a connection between deferred
flushing and IR here, AFAIK deferred flush only affects DMA
remapping.

Also, does the SID failure occur on other devices under the same
IOMMU?

Thanks,

Jacob
Re: source-id verification failures
Hi Jerry,

thanks for the report.

On Tue, Oct 02, 2018 at 10:25:29AM -0700, Jerry Snitselaar wrote:
> I've been trying to track down a problem where an hp dl380 gen8 with a
> Cavium QLogic BR-1860 Fabric Adapter is getting source-id verification
> failures when running dhclient against that interface. This started
> showing up when I backported the iova deferred flushing patches. So far
> this has only been seen on this one system, but I'm trying to understand
> why it appears with the new deferred flushing code. I also see it with
> both 4.18.10, and 4.19.0-rc6 kernels.
>
> [35645.282021] bna :24:00.1 ens5f1: link down
> [35645.298396] bna :24:00.0 ens5f0: link down
> [35650.313210] DMAR: DRHD: handling fault status reg 2
> [35650.332477] DMAR: [INTR-REMAP] Request device [24:00.0] fault index 14 [fault reason 38] Blocked an interrupt request due to source-id verification failure
> [35655.137667] bna :24:00.0 ens5f0: link up
> [35657.532454] bna :24:00.1 ens5f1: link up
> [35664.281563] bna :24:00.1 ens5f1: link down
> [35664.298103] bna :24:00.0 ens5f0: link down
> [35669.313568] DMAR: DRHD: handling fault status reg 102
> [35669.333198] DMAR: [INTR-REMAP] Request device [24:00.0] fault index 14 [fault reason 38] Blocked an interrupt request due to source-id verification failure
> [35674.081212] bna :24:00.0 ens5f0: link up
> [35674.981280] bna :24:00.1 ens5f1: link up
>
>
> Any ideas?

No, not yet. Can you please post the output of lspci -vvv?

Jacob, can you or someone from your team please also have a look into
this problem report?

Thanks,

Joerg
source-id verification failures
I've been trying to track down a problem where an hp dl380 gen8 with a
Cavium QLogic BR-1860 Fabric Adapter is getting source-id verification
failures when running dhclient against that interface. This started
showing up when I backported the iova deferred flushing patches. So far
this has only been seen on this one system, but I'm trying to understand
why it appears with the new deferred flushing code. I also see it with
both 4.18.10, and 4.19.0-rc6 kernels.

[35645.282021] bna :24:00.1 ens5f1: link down
[35645.298396] bna :24:00.0 ens5f0: link down
[35650.313210] DMAR: DRHD: handling fault status reg 2
[35650.332477] DMAR: [INTR-REMAP] Request device [24:00.0] fault index 14 [fault reason 38] Blocked an interrupt request due to source-id verification failure
[35655.137667] bna :24:00.0 ens5f0: link up
[35657.532454] bna :24:00.1 ens5f1: link up
[35664.281563] bna :24:00.1 ens5f1: link down
[35664.298103] bna :24:00.0 ens5f0: link down
[35669.313568] DMAR: DRHD: handling fault status reg 102
[35669.333198] DMAR: [INTR-REMAP] Request device [24:00.0] fault index 14 [fault reason 38] Blocked an interrupt request due to source-id verification failure
[35674.081212] bna :24:00.0 ens5f0: link up
[35674.981280] bna :24:00.1 ens5f1: link up

Any ideas?

Regards,
Jerry
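As an aid to reading the "handling fault status reg" lines in the log, here is a small standalone C sketch that decodes such a value. It assumes the value is printed in hexadecimal and follows the VT-d Fault Status Register layout (bit 0 primary fault overflow, bit 1 primary pending fault, bits 8-15 fault record index). The macro and function names are hypothetical and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FSTS_PFO (1u << 0) /* primary fault overflow (assumed bit position) */
#define FSTS_PPF (1u << 1) /* primary pending fault (assumed bit position) */

/* True if at least one fault record is waiting to be processed. */
static bool fsts_fault_pending(uint32_t fsts)
{
	return (fsts & FSTS_PPF) != 0;
}

/* True if fault records were lost because the recording registers were full. */
static bool fsts_overflow(uint32_t fsts)
{
	return (fsts & FSTS_PFO) != 0;
}

/* Index of the first pending fault recording register (bits 8-15). */
static unsigned fsts_record_index(uint32_t fsts)
{
	return (fsts >> 8) & 0xff;
}
```

Under these assumptions, the two values in the log (0x2 and 0x102) both indicate a pending fault without overflow, recorded in fault record 0 and fault record 1 respectively.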