iommu/io-pgtable-arm-v7s: About pagetable 33bit PA
Hi Robin,

After commit ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32") we no longer have ZONE_DMA on arm64, but __arm_v7s_alloc_table[1] uses GFP_DMA to ensure the physical address of the pagetable fits in 32 bits. We have now hit an issue where the level-1 pagetable PA is 0x1_3ab6_, above the 4GB boundary, so IOMMU initialization fails. (The issue goes away if we revert ad67f5a6545f.)

I planned to add GFP_DMA32 for the pagetable allocations. Level 1 was then fine, but level 2 failed: slab does not accept GFP_DMA32[2]. This is the warning log:

	Unexpected gfp: 0x4 (GFP_DMA32). Fixing up to gfp: 0x488020 (GFP_ATOMIC|__GFP_ZERO). Fix your code!

I don't know why slab disallows GFP_DMA32; the gfp flags seem to be passed straight through to alloc_page. Would it be possible to let slab accept GFP_DMA32 on arm64? We noticed this has been discussed before[3], but if we use alloc_page for the level-2 pagetables it may waste a lot of memory.

Any suggestion is appreciated.

Reported-by: Nicolas Boichat

[1] https://elixir.bootlin.com/linux/v4.20-rc1/source/drivers/iommu/io-pgtable-arm-v7s.c#L200
[2] https://elixir.bootlin.com/linux/v4.20-rc1/source/mm/internal.h#L37
[3] https://patchwork.kernel.org/patch/10474289/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 2:14 PM > > Hi, > > On 11/8/18 1:45 PM, Liu, Yi L wrote: > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Thursday, November 8, 2018 1:25 PM > >> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation > >> descriptor support > >> > >> Hi, > >> > >> On 11/8/18 11:49 AM, Liu, Yi L wrote: > >>> Hi, > >>> > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 10:17 AM > Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation > descriptor support > > Hi Yi, > > On 11/7/18 2:07 PM, Liu, Yi L wrote: > > Hi Baolu, > > > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Monday, November 5, 2018 1:32 PM > > [...] > > > >> --- > >> drivers/iommu/dmar.c| 83 > >> +++-- > >> drivers/iommu/intel-svm.c | 76 -- > >> drivers/iommu/intel_irq_remapping.c | 6 ++- > >> include/linux/intel-iommu.h | 9 +++- > >> 4 files changed, 115 insertions(+), 59 deletions(-) > >> > >> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index > >> d9c748b6f9e4..ec10427b98ac 100644 > >> --- a/drivers/iommu/dmar.c > >> +++ b/drivers/iommu/dmar.c > >> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct > >> intel_iommu *iommu, int > >> index) > >>int head, tail; > >>struct q_inval *qi = iommu->qi; > >>int wait_index = (index + 1) % QI_LENGTH; > >> + int shift = qi_shift(iommu); > >> > >>if (qi->desc_status[wait_index] == QI_ABORT) > >>return -EAGAIN; > >> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct > >> intel_iommu *iommu, int index) > >> */ > >>if (fault & DMA_FSTS_IQE) { > >>head = readl(iommu->reg + DMAR_IQH_REG); > >> - if ((head >> DMAR_IQ_SHIFT) == index) { > >> + if ((head >> shift) == index) { > >> + struct qi_desc *desc = qi->desc + head; > >> + > >>pr_err("VT-d detected invalid descriptor: " > >>"low=%llx, high=%llx\n", > >> - (unsigned long long)qi->desc[index].low, > >> - (unsigned long > >> 
long)qi->desc[index].high); > >> - memcpy(>desc[index], >desc[wait_index], > >> - sizeof(struct qi_desc)); > >> + (unsigned long long)desc->qw0, > >> + (unsigned long long)desc->qw1); > > Still missing qw2 and qw3. May make the print differ based on if smts is > configed. > qw2 and qw3 are reserved from software point of view. We don't need > to print it for information. > >>> But for Scalable mode, it should be valid? > >> No. It's reserved for software. > > No, I don’t think so. PRQ response would also be queued to hardware by > > QI. For such QI descriptors, the high bits are not reserved. > > > > Do you mean the private data fields of a page request descriptor or a page > group > response descriptor? Those fields contains software defined private data > (might a > kernel pointer?). We should avoid leaking such information in the generic > kernel > message for security consideration. > Or anything I missed? yes, I'm not sure what kind of data it may be in the private data field. From software point of view, it may be helpful to show the full content of the QI descriptor for error triage. Personally, I'm fine if you keep it on this point. Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
On 11/8/18 1:48 PM, Liu, Yi L wrote:

From: Liu, Yi L
Sent: Thursday, November 8, 2018 1:45 PM

	+ memcpy(desc, qi->desc + (wait_index << shift),

Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << shift)," be safer?

Can that be compiled? memcpy() requires a "const void *" for the second parameter. By the way, why would it be safer with this cast?

This is just an example. My point is the possibility that "qi->desc + (wait_index << shift)" would be treated as "qi->desc plus (wait_index << shift) * sizeof(*qi->desc)". Is that possible in a kernel build?

qi->desc is of type "void *".

No, I don't think so... Refer to the code below. Even if it has no correctness issue here, that is not because qi->desc is of "void *" type...

struct qi_desc {
-	u64 low, high;
+	u64 qw0;
+	u64 qw1;
+	u64 qw2;
+	u64 qw3;
};

Oops, I just saw that you modified it to be "void *" in this patch. OK, then this is fair enough.

Yes. :-)

Best regards,
Lu Baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi, On 11/8/18 1:45 PM, Liu, Yi L wrote: From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Thursday, November 8, 2018 1:25 PM Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support Hi, On 11/8/18 11:49 AM, Liu, Yi L wrote: Hi, From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Thursday, November 8, 2018 10:17 AM Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support Hi Yi, On 11/7/18 2:07 PM, Liu, Yi L wrote: Hi Baolu, From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Monday, November 5, 2018 1:32 PM [...] --- drivers/iommu/dmar.c| 83 +++-- drivers/iommu/intel-svm.c | 76 -- drivers/iommu/intel_irq_remapping.c | 6 ++- include/linux/intel-iommu.h | 9 +++- 4 files changed, 115 insertions(+), 59 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index d9c748b6f9e4..ec10427b98ac 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index) int head, tail; struct q_inval *qi = iommu->qi; int wait_index = (index + 1) % QI_LENGTH; + int shift = qi_shift(iommu); if (qi->desc_status[wait_index] == QI_ABORT) return -EAGAIN; @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu *iommu, int index) */ if (fault & DMA_FSTS_IQE) { head = readl(iommu->reg + DMAR_IQH_REG); - if ((head >> DMAR_IQ_SHIFT) == index) { + if ((head >> shift) == index) { + struct qi_desc *desc = qi->desc + head; + pr_err("VT-d detected invalid descriptor: " "low=%llx, high=%llx\n", - (unsigned long long)qi->desc[index].low, - (unsigned long long)qi->desc[index].high); - memcpy(>desc[index], >desc[wait_index], - sizeof(struct qi_desc)); + (unsigned long long)desc->qw0, + (unsigned long long)desc->qw1); Still missing qw2 and qw3. May make the print differ based on if smts is configed. qw2 and qw3 are reserved from software point of view. We don't need to print it for information. 
But for Scalable mode, it should be valid? No. It's reserved for software. No, I don’t think so. PRQ response would also be queued to hardware by QI. For such QI descriptors, the high bits are not reserved. Do you mean the private data fields of a page request descriptor or a page group response descriptor? Those fields contains software defined private data (might a kernel pointer?). We should avoid leaking such information in the generic kernel message for security consideration. Or anything I missed? Best regards, Lu Baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH -next] iommu/amd: remove set but not used variable 'tag'
From: Yue Haibing

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/iommu/amd_iommu.c: In function 'iommu_print_event':
drivers/iommu/amd_iommu.c:550:33: warning: variable 'tag' set but not used [-Wunused-but-set-variable]

It has never been used since its introduction in commit e7f63ffc1bf1 ("iommu/amd: Update logging information for new event type").

Signed-off-by: Yue Haibing
---
 drivers/iommu/amd_iommu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 4e04fff..58a7834 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -547,7 +547,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 {
 	struct device *dev = iommu->iommu.dev;
-	int type, devid, pasid, flags, tag;
+	int type, devid, pasid, flags;
 	volatile u32 *event = __evt;
 	int count = 0;
 	u64 address;
@@ -615,7 +615,6 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 	case EVENT_TYPE_INV_PPR_REQ:
 		pasid = ((event[0] >> 16) & 0xFFFF)
			| ((event[1] << 6) & 0xF0000);
-		tag = event[1] & 0x03FF;
 		dev_err(dev, "INVALID_PPR_REQUEST device=%02x:%02x.%x pasid=0x%05x address=0x%016llx flags=0x%04x]\n",
 			PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
 			pasid, address, flags);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
> From: Liu, Yi L > Sent: Thursday, November 8, 2018 1:45 PM > > + memcpy(desc, qi->desc + (wait_index << shift), > > >>> > > >>> Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index > > >>> << shift)," be more safe? > > >> > > >> Can that be compiled? memcpy() requires a "const void *" for the > > >> second > > parameter. > > >> By the way, why it's safer with this casting? > > > > > > This is just an example. My point is the possibility that "qi->desc > > > + (wait_index << > > shift)" > > > would be treated as "qi->desc plus (wait_index << > > > shift)*sizeof(*qi->desc)". Is it possible for kernel build? > > > > qi->desc is of type of "void *". > > no, I don’t think so... Refer to the code below. Even it has no correctness > issue her, > It's not due to qi->desc is "void *" type... > > struct qi_desc { > - u64 low, high; > + u64 qw0; > + u64 qw1; > + u64 qw2; > + u64 qw3; > }; Oops, just see you modified it to be "void *" in this patch. Ok, then this is fair enough. Thanks, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
> From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 1:25 PM > Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor > support > > Hi, > > On 11/8/18 11:49 AM, Liu, Yi L wrote: > > Hi, > > > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Thursday, November 8, 2018 10:17 AM > >> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation > >> descriptor support > >> > >> Hi Yi, > >> > >> On 11/7/18 2:07 PM, Liu, Yi L wrote: > >>> Hi Baolu, > >>> > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Monday, November 5, 2018 1:32 PM > >>> > >>> [...] > >>> > --- > drivers/iommu/dmar.c| 83 +++-- > drivers/iommu/intel-svm.c | 76 -- > drivers/iommu/intel_irq_remapping.c | 6 ++- > include/linux/intel-iommu.h | 9 +++- > 4 files changed, 115 insertions(+), 59 deletions(-) > > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index > d9c748b6f9e4..ec10427b98ac 100644 > --- a/drivers/iommu/dmar.c > +++ b/drivers/iommu/dmar.c > @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu > *iommu, int > index) > int head, tail; > struct q_inval *qi = iommu->qi; > int wait_index = (index + 1) % QI_LENGTH; > +int shift = qi_shift(iommu); > > if (qi->desc_status[wait_index] == QI_ABORT) > return -EAGAIN; > @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct > intel_iommu *iommu, int index) > */ > if (fault & DMA_FSTS_IQE) { > head = readl(iommu->reg + DMAR_IQH_REG); > -if ((head >> DMAR_IQ_SHIFT) == index) { > +if ((head >> shift) == index) { > +struct qi_desc *desc = qi->desc + head; > + > pr_err("VT-d detected invalid descriptor: " > "low=%llx, high=%llx\n", > -(unsigned long long)qi->desc[index].low, > -(unsigned long > long)qi->desc[index].high); > -memcpy(>desc[index], >desc[wait_index], > -sizeof(struct qi_desc)); > +(unsigned long long)desc->qw0, > +(unsigned long long)desc->qw1); > >>> > >>> Still missing qw2 and qw3. 
May make the print differ based on if smts is > >>> configed. > >> > >> qw2 and qw3 are reserved from software point of view. We don't need > >> to print it for information. > > > > But for Scalable mode, it should be valid? > > No. It's reserved for software. No, I don’t think so. PRQ response would also be queued to hardware by QI. For such QI descriptors, the high bits are not reserved. > >> > >>> > +memcpy(desc, qi->desc + (wait_index << shift), > >>> > >>> Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << > >>> shift)," be more safe? > >> > >> Can that be compiled? memcpy() requires a "const void *" for the second > parameter. > >> By the way, why it's safer with this casting? > > > > This is just an example. My point is the possibility that "qi->desc + > > (wait_index << > shift)" > > would be treated as "qi->desc plus (wait_index << > > shift)*sizeof(*qi->desc)". Is it possible for kernel build? > > qi->desc is of type of "void *". no, I don’t think so... Refer to the code below. Even it has no correctness issue her, It's not due to qi->desc is "void *" type... struct qi_desc { - u64 low, high; + u64 qw0; + u64 qw1; + u64 qw2; + u64 qw3; }; Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi, On 11/8/18 11:49 AM, Liu, Yi L wrote: Hi, From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Thursday, November 8, 2018 10:17 AM Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support Hi Yi, On 11/7/18 2:07 PM, Liu, Yi L wrote: Hi Baolu, From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Monday, November 5, 2018 1:32 PM [...] --- drivers/iommu/dmar.c| 83 +++-- drivers/iommu/intel-svm.c | 76 -- drivers/iommu/intel_irq_remapping.c | 6 ++- include/linux/intel-iommu.h | 9 +++- 4 files changed, 115 insertions(+), 59 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index d9c748b6f9e4..ec10427b98ac 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index) int head, tail; struct q_inval *qi = iommu->qi; int wait_index = (index + 1) % QI_LENGTH; + int shift = qi_shift(iommu); if (qi->desc_status[wait_index] == QI_ABORT) return -EAGAIN; @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu *iommu, int index) */ if (fault & DMA_FSTS_IQE) { head = readl(iommu->reg + DMAR_IQH_REG); - if ((head >> DMAR_IQ_SHIFT) == index) { + if ((head >> shift) == index) { + struct qi_desc *desc = qi->desc + head; + pr_err("VT-d detected invalid descriptor: " "low=%llx, high=%llx\n", - (unsigned long long)qi->desc[index].low, - (unsigned long long)qi->desc[index].high); - memcpy(>desc[index], >desc[wait_index], - sizeof(struct qi_desc)); + (unsigned long long)desc->qw0, + (unsigned long long)desc->qw1); Still missing qw2 and qw3. May make the print differ based on if smts is configed. qw2 and qw3 are reserved from software point of view. We don't need to print it for information. But for Scalable mode, it should be valid? No. It's reserved for software. + memcpy(desc, qi->desc + (wait_index << shift), Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << shift)," be more safe? Can that be compiled? 
memcpy() requires a "const void *" for the second parameter. By the way, why it's safer with this casting? This is just an example. My point is the possibility that "qi->desc + (wait_index << shift)" would be treated as "qi->desc plus (wait_index << shift)*sizeof(*qi->desc)". Is it possible for kernel build? qi->desc is of type of "void *". Best regards, Lu Baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 06/12] iommu/vt-d: Add second level page table interface
Hi, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 10:28 AM > Subject: Re: [PATCH v4 06/12] iommu/vt-d: Add second level page table > interface > > Hi, > > On 11/7/18 3:13 PM, Liu, Yi L wrote: > > Hi Baolu, > > > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Monday, November 5, 2018 1:32 PM > >> > >> This adds the interfaces to setup or tear down the structures for > >> second level page table translations. This includes types of second > >> level only translation and pass through. > > > > A little bit refining to the description:) "This patch adds interfaces > > for setup or tear down second level translation in PASID granularity. > > Translation type includes second level only type and pass-through > > type." > > > >> Cc: Ashok Raj > >> Cc: Jacob Pan > >> Cc: Kevin Tian > >> Cc: Liu Yi L > >> Signed-off-by: Sanjay Kumar > > > > [...] > > > >> + > >> +void intel_pasid_tear_down_entry(struct intel_iommu *iommu, > >> + struct device *dev, int pasid) > >> +{ > >> + struct pasid_entry *pte; > > > > pte is confusing as it is similar with pte in paging structures. may > > use pt_entry or just pasid_entry. This comment applies to other "pte"s > > in this patch. > > "pte" in this file means "pasid table entry", not "page table entry". > This file holds code to handle pasid table related staff. It has nothing to > do with > paging structure. I think there should be no confusion here. > :-) I see. Then up to you. :) It's just my feeling when reading the patch, it leads me to believe it is paging structure. Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi, > From: Lu Baolu [mailto:baolu...@linux.intel.com] > Sent: Thursday, November 8, 2018 10:17 AM > Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor > support > > Hi Yi, > > On 11/7/18 2:07 PM, Liu, Yi L wrote: > > Hi Baolu, > > > >> From: Lu Baolu [mailto:baolu...@linux.intel.com] > >> Sent: Monday, November 5, 2018 1:32 PM > > > > [...] > > > >> --- > >> drivers/iommu/dmar.c| 83 +++-- > >> drivers/iommu/intel-svm.c | 76 -- > >> drivers/iommu/intel_irq_remapping.c | 6 ++- > >> include/linux/intel-iommu.h | 9 +++- > >> 4 files changed, 115 insertions(+), 59 deletions(-) > >> > >> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index > >> d9c748b6f9e4..ec10427b98ac 100644 > >> --- a/drivers/iommu/dmar.c > >> +++ b/drivers/iommu/dmar.c > >> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu > >> *iommu, int > >> index) > >>int head, tail; > >>struct q_inval *qi = iommu->qi; > >>int wait_index = (index + 1) % QI_LENGTH; > >> + int shift = qi_shift(iommu); > >> > >>if (qi->desc_status[wait_index] == QI_ABORT) > >>return -EAGAIN; > >> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu > >> *iommu, int index) > >> */ > >>if (fault & DMA_FSTS_IQE) { > >>head = readl(iommu->reg + DMAR_IQH_REG); > >> - if ((head >> DMAR_IQ_SHIFT) == index) { > >> + if ((head >> shift) == index) { > >> + struct qi_desc *desc = qi->desc + head; > >> + > >>pr_err("VT-d detected invalid descriptor: " > >>"low=%llx, high=%llx\n", > >> - (unsigned long long)qi->desc[index].low, > >> - (unsigned long long)qi->desc[index].high); > >> - memcpy(>desc[index], >desc[wait_index], > >> - sizeof(struct qi_desc)); > >> + (unsigned long long)desc->qw0, > >> + (unsigned long long)desc->qw1); > > > > Still missing qw2 and qw3. May make the print differ based on if smts is > > configed. > > qw2 and qw3 are reserved from software point of view. We don't need to print > it for > information. But for Scalable mode, it should be valid? 
> > > > >> + memcpy(desc, qi->desc + (wait_index << shift), > > > > Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << > > shift)," be more safe? > > Can that be compiled? memcpy() requires a "const void *" for the second > parameter. > By the way, why it's safer with this casting? This is just an example. My point is the possibility that "qi->desc + (wait_index << shift)" would be treated as "qi->desc plus (wait_index << shift)*sizeof(*qi->desc)". Is it possible for kernel build? Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping
Hi, On 11/7/18 3:25 PM, Liu, Yi L wrote: Hi Baolu, From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Monday, November 5, 2018 1:32 PM Subject: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping So that the pasid related info, such as the pasid table and the maximum of pasid could be used during setting up scalable mode context. A little bit refine. Wish it helps. :) "This patch passes the pasid related info(e.g. the pasid table and the maximum of pasid) to context mapping, so that pasid related fields can be setup accordingly in scalable mode context entry." Yeah, thanks. Best regards, Lu Baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 06/12] iommu/vt-d: Add second level page table interface
Hi, On 11/7/18 3:13 PM, Liu, Yi L wrote: Hi Baolu, From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Monday, November 5, 2018 1:32 PM This adds the interfaces to setup or tear down the structures for second level page table translations. This includes types of second level only translation and pass through. A little bit refining to the description:) "This patch adds interfaces for setup or tear down second level translation in PASID granularity. Translation type includes second level only type and pass-through type." Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Signed-off-by: Sanjay Kumar [...] + +void intel_pasid_tear_down_entry(struct intel_iommu *iommu, +struct device *dev, int pasid) +{ + struct pasid_entry *pte; pte is confusing as it is similar with pte in paging structures. may use pt_entry or just pasid_entry. This comment applies to other "pte"s in this patch. "pte" in this file means "pasid table entry", not "page table entry". This file holds code to handle pasid table related staff. It has nothing to do with paging structure. I think there should be no confusion here. :-) Best regards, Lu Baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes
Hi, On 11/7/18 2:55 PM, Liu, Yi L wrote: Hi Baolu, From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Monday, November 5, 2018 1:32 PM Subject: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid entry for first-level or pass-through translation should be programmed with a domain id different from those used for second-level or nested translation. It is recommended that software could use a same domain id for all first-only and pass-through translations. This reserves a domain id for first-level and pass-through translations. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Liu Yi L Cc: Sanjay Kumar Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 10 ++ drivers/iommu/intel-pasid.h | 6 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 9331240c70b8..2f7455ee4e7a 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1618,6 +1618,16 @@ static int iommu_init_domains(struct intel_iommu *iommu) */ set_bit(0, iommu->domain_ids); + /* +* Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid +* entry for first-level or pass-through translation modes should +* be programmed with a domain id different from those used for +* second-level or nested translation. We reserve a domain id for +* this purpose. +*/ + if (sm_supported(iommu)) + set_bit(FLPT_DEFAULT_DID, iommu->domain_ids); "FLPT_DEFAULT_DID" looks very likely for first level translation. How about "PT_FL_DEFAULT_DID"? We have comments above it, so people won't be confused. 
 	return 0;
 }
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 12f480c2bb8b..03c1612d173c 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -17,6 +17,12 @@
 #define PDE_PFN_MASK	PAGE_MASK
 #define PASID_PDE_SHIFT	6

+/*
+ * Domain ID reserved for pasid entries programmed for first-level
+ * only and pass-through transfer modes.
+ */
+#define FLPT_DEFAULT_DID	1

Would be helpful to elaborate why DID 1 is selected in the patch description.

Yeah. DID 0 has been carved out for caching mode, so we start from 1 for this.

Best regards,
Lu Baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support
Hi Yi, On 11/7/18 2:07 PM, Liu, Yi L wrote: Hi Baolu, From: Lu Baolu [mailto:baolu...@linux.intel.com] Sent: Monday, November 5, 2018 1:32 PM [...] --- drivers/iommu/dmar.c| 83 +++-- drivers/iommu/intel-svm.c | 76 -- drivers/iommu/intel_irq_remapping.c | 6 ++- include/linux/intel-iommu.h | 9 +++- 4 files changed, 115 insertions(+), 59 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index d9c748b6f9e4..ec10427b98ac 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index) int head, tail; struct q_inval *qi = iommu->qi; int wait_index = (index + 1) % QI_LENGTH; + int shift = qi_shift(iommu); if (qi->desc_status[wait_index] == QI_ABORT) return -EAGAIN; @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu *iommu, int index) */ if (fault & DMA_FSTS_IQE) { head = readl(iommu->reg + DMAR_IQH_REG); - if ((head >> DMAR_IQ_SHIFT) == index) { + if ((head >> shift) == index) { + struct qi_desc *desc = qi->desc + head; + pr_err("VT-d detected invalid descriptor: " "low=%llx, high=%llx\n", - (unsigned long long)qi->desc[index].low, - (unsigned long long)qi->desc[index].high); - memcpy(>desc[index], >desc[wait_index], - sizeof(struct qi_desc)); + (unsigned long long)desc->qw0, + (unsigned long long)desc->qw1); Still missing qw2 and qw3. May make the print differ based on if smts is configed. qw2 and qw3 are reserved from software point of view. We don't need to print it for information. + memcpy(desc, qi->desc + (wait_index << shift), Would "memcpy(desc, (unsigned long long) (qi->desc + (wait_index << shift)," be more safe? Can that be compiled? memcpy() requires a "const void *" for the second parameter. By the way, why it's safer with this casting? 
+ 1 << shift); writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG); return -EINVAL; } @@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu *iommu, int index) */ if (fault & DMA_FSTS_ITE) { head = readl(iommu->reg + DMAR_IQH_REG); - head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH; + head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH; head |= 1; tail = readl(iommu->reg + DMAR_IQT_REG); - tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH; + tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH; writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG); @@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu) { int rc; struct q_inval *qi = iommu->qi; - struct qi_desc *hw, wait_desc; + int offset, shift, length; + struct qi_desc wait_desc; int wait_index, index; unsigned long flags; if (!qi) return 0; - hw = qi->desc; - restart: rc = 0; @@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu) index = qi->free_head; wait_index = (index + 1) % QI_LENGTH; + shift = qi_shift(iommu); + length = 1 << shift; qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE; - hw[index] = *desc; - - wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) | + offset = index << shift; + memcpy(qi->desc + offset, desc, length); + wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) | QI_IWD_STATUS_WRITE | QI_IWD_TYPE; - wait_desc.high = virt_to_phys(>desc_status[wait_index]); + wait_desc.qw1 = virt_to_phys(>desc_status[wait_index]); + wait_desc.qw2 = 0; + wait_desc.qw3 = 0; - hw[wait_index] = wait_desc; + offset = wait_index << shift; + memcpy(qi->desc + offset, _desc, length); same question with above one. Ditto. Best regards, Lu Baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC] iommu/vt-d: Group and domain relationship
Hi, On 11/8/18 1:55 AM, James Sewart wrote: Hey, On 7 Nov 2018, at 02:10, Lu Baolu wrote: Hi, On 11/6/18 6:40 PM, James Sewart wrote: Hey Lu, Would you be able to go into more detail about the issues with allowing IOMMU_DOMAIN_DMA to be allocated via domain_alloc? This door is closed because intel iommu driver does everything for IOMMU_DOMAIN_DMA: allocating a domain and setup the context entries for the domain. As far as I can tell, attach_device in the intel driver will handle cleaning up any old domain context mapping and ensure the new domain is mapped with calls to dmar_remove_one_dev_info and domain_add_dev_info. That's only for domains of IOMMU_DOMAIN_UNMANAGED, right? Why do we want to open this door? Probably we want the generic iommu layer to handle these things (it's called default domain). I’d like to allocate a domain and attach it to multiple devices in a group/multiple groups so that they share address translation, but still allow drivers for devices in those groups to use the dma_map_ops api. Just out of curiosity, why do you want to share a single domain across multiple groups? By default, the groups and DMA domains are normally 1-1 mapped, right? So we can't just open the door but not cleanup the things right? A user of domain_alloc and attach_device is responsible for detaching a domain if it is no longer needed and calling domain_free. Currently DMA API calls get_valid_domain_for_dev() to retrieve a DMA domain. If the domain has already been allocated, return directly. Otherwise, allocate and initialize a new one for the device. Let's call domains allocated by get_valid_domain_for_dev() as "A". If we open the door and allow another component to manage the DMA domains through domain iommu_domain_alloc/free(). Let's call domains allocated through iommu_domain_alloc() as "B". So how can we sync between A and B? Need to go through the code to find out more. Best regards, Lu Baolu Cheers, James. I haven't spent time on details. 
So I cc'ed Jacob for corrections. Best regards, Lu Baolu Cheers, James. On Fri, Nov 2, 2018 at 2:43 AM Lu Baolu wrote: Hi, On 10/30/18 10:18 PM, James Sewart via iommu wrote: Hey, I’ve been investigating the relationship between iommu groups and domains on our systems and have a few question. Why does the intel iommu code not allow allocating IOMMU_DOMAIN_DMA? Returning NULL when given this domain type has the side effect that the default_domain for an iommu group is not set, which, when using for e.g. dma_map_ops.map_page, means a domain is allocated per device. Intel vt-d driver doesn't implement the default domain and allocates domain only on demanded. There are lots of things to do before we allow iommu API to allocate domains other than IOMMU_DOMAIN_UNMANAGED. Best regards, Lu Baolu This seems to be the opposite behaviour to the AMD iommu code which supports allocating an IOMMU_DOMAIN_DMA and will only look to the iommu group if a domain is not attached to the device rather than allocating a new one. On AMD every device in an iommu group will share the same domain. Appended is what I think a patch to implement domain_alloc for IOMMU_DOMAIN_DMA and also IOMMU_DOMAIN_IDENTITY would look like. Testing shows each device in a group will share a domain by default, it also allows allocating a new dma domain that can be successfully attached to a group with iommu_attach_group. Looking for comment on why the behaviour is how it is currently and if there are any issues with the solution I’ve been testing. Cheers, James. 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bff2abd6..3a58389f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5170,10 +5170,15 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 	struct dmar_domain *dmar_domain;
 	struct iommu_domain *domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type == IOMMU_DOMAIN_UNMANAGED)
+		dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
+	else if (type == IOMMU_DOMAIN_DMA)
+		dmar_domain = alloc_domain(0);
+	else if (type == IOMMU_DOMAIN_IDENTITY)
+		dmar_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
+	else
 		return NULL;
 
-	dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
 	if (!dmar_domain) {
 		pr_err("Can't allocate dmar_domain\n");
 		return NULL;
@@ -5186,9 +5191,12 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 	domain_update_iommu_cap(dmar_domain);
 	domain = &dmar_domain->domain;
-	domain->geometry.aperture_start = 0;
-	domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
-	domain->geometry.force_aperture = true;
+
+	if (type == IOMMU_DOMAIN_UNMANAGED) {
+		domain->geometry.aperture_start = 0;
+		domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
+		domain->geometry.force_aperture = true;
+	}
 
 	return domain;
 }
Re: [PATCH v2] iommu/vt-d: respect max guest address width in agaw
On Wed, 7 Nov 2018 17:04:28 +0100 Joerg Roedel wrote:
> On Tue, Nov 06, 2018 at 02:47:15PM -0800, Jacob Pan wrote:
> > drivers/iommu/intel-iommu.c | 6 +-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
>
> Applied, thanks.

Hi Joerg,

We have found some issues with this patch on some platforms, so please
disregard it. The intent of this patch was to fix IOMMUs with a maximum
guest address width of 48 bits but a SAGAW field advertising both 48-
and 57-bit widths. We need to do more investigation into the page-table
level calculation and the context address width setup. Sorry about that.

Jacob

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
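The failure mode Jacob describes, hardware whose maximum guest address width is 48 bits while SAGAW advertises both 4- and 5-level tables, comes down to picking a page-table level that respects both capability fields. Below is a standalone toy model of that selection; the helper names are invented for illustration and are not the driver's actual code:

```c
/* Toy model (all names made up) of VT-d AGAW selection: SAGAW is a
 * bitmask where bit 1 = 39-bit (3-level), bit 2 = 48-bit (4-level)
 * and bit 3 = 57-bit (5-level) adjusted guest address widths. */
static int width_to_agaw(int width) { return (width - 30) / 9; }
static int agaw_to_width(int agaw)  { return 30 + agaw * 9; }

/* Pick the smallest supported AGAW that covers 'gaw' without exceeding
 * the hardware's maximum guest address width 'mgaw'. */
static int pick_agaw(unsigned int sagaw, int gaw, int mgaw)
{
    int agaw;

    for (agaw = width_to_agaw(gaw); agaw_to_width(agaw) <= mgaw; agaw++)
        if (sagaw & (1u << agaw))
            return agaw;
    return -1;  /* no supported width fits within mgaw */
}
```

With `sagaw = (1u << 2) | (1u << 3)` and `mgaw = 48`, this selects AGAW 2 (4-level, 48-bit) rather than the 57-bit width the hardware also advertises, which is the constraint the reverted patch was trying to enforce.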
Re: [PATCH v2] of/device: Really only set bus DMA mask when appropriate
On Wed, Nov 07, 2018 at 04:30:32PM +, Robin Murphy wrote: > of_dma_configure() was *supposed* to be following the same logic as > acpi_dma_configure() and only setting bus_dma_mask if some range was > specified by the firmware. However, it seems that subtlety got lost in > the process of fitting it into the differently-shaped control flow, and > as a result the force_dma==true case ends up always setting the bus mask > to the 32-bit default, which is not what anyone wants. > > Make sure we only touch it if the DT actually said so. > > Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate") > Reported-by: Aaro Koskinen > Reported-by: Jean-Philippe Brucker > Tested-by: Aaro Koskinen > Tested-by: John Stultz > Tested-by: Geert Uytterhoeven > Tested-by: Robert Richter > Signed-off-by: Robin Murphy > --- > > v2: Add comment, collect tested-by tags > > drivers/of/device.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) Applied, thanks. Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC] iommu/vt-d: Group and domain relationship
Hey,

> On 7 Nov 2018, at 02:10, Lu Baolu wrote:
>
> Hi,
>
> On 11/6/18 6:40 PM, James Sewart wrote:
>> Hey Lu,
>> Would you be able to go into more detail about the issues with
>> allowing IOMMU_DOMAIN_DMA to be allocated via domain_alloc?
>
> This door is closed because intel iommu driver does everything for
> IOMMU_DOMAIN_DMA: allocating a domain and setup the context entries
> for the domain.

As far as I can tell, attach_device in the intel driver will handle
cleaning up any old domain context mapping and ensure the new domain is
mapped with calls to dmar_remove_one_dev_info and domain_add_dev_info.

> Why do we want to open this door? Probably we want the generic iommu
> layer to handle these things (it's called default domain).

I'd like to allocate a domain and attach it to multiple devices in a
group/multiple groups so that they share address translation, but still
allow drivers for devices in those groups to use the dma_map_ops api.

> So we can't just open the door but not cleanup the things right?

A user of domain_alloc and attach_device is responsible for detaching a
domain if it is no longer needed and calling domain_free.

Cheers,
James.

> I haven't spent time on details. So I cc'ed Jacob for corrections.
>
> Best regards,
> Lu Baolu
>
>> Cheers,
>> James.
>> On Fri, Nov 2, 2018 at 2:43 AM Lu Baolu wrote:
>>> Hi,
>>>
>>> On 10/30/18 10:18 PM, James Sewart via iommu wrote:
>>>> Hey,
>>>>
>>>> I've been investigating the relationship between iommu groups and
>>>> domains on our systems and have a few questions.
>>>>
>>>> Why does the intel iommu code not allow allocating
>>>> IOMMU_DOMAIN_DMA? Returning NULL when given this domain type has the
>>>> side effect that the default_domain for an iommu group is not set,
>>>> which, when using e.g. dma_map_ops.map_page, means a domain is
>>>> allocated per device.
>>>
>>> Intel vt-d driver doesn't implement the default domain and allocates
>>> domains only on demand.
>>> There are lots of things to do before we allow the iommu API to
>>> allocate domains other than IOMMU_DOMAIN_UNMANAGED.
>>>
>>> Best regards,
>>> Lu Baolu
>>>
>>>> This seems to be the opposite behaviour to the AMD iommu code, which
>>>> supports allocating an IOMMU_DOMAIN_DMA and will only look to the
>>>> iommu group if a domain is not attached to the device rather than
>>>> allocating a new one. On AMD, every device in an iommu group will
>>>> share the same domain.
>>>>
>>>> Appended is what I think a patch to implement domain_alloc for
>>>> IOMMU_DOMAIN_DMA and also IOMMU_DOMAIN_IDENTITY would look like.
>>>> Testing shows each device in a group will share a domain by default;
>>>> it also allows allocating a new dma domain that can be successfully
>>>> attached to a group with iommu_attach_group.
>>>>
>>>> Looking for comment on why the behaviour is how it is currently and
>>>> if there are any issues with the solution I've been testing.
>>>>
>>>> Cheers,
>>>> James.

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bff2abd6..3a58389f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5170,10 +5170,15 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 	struct dmar_domain *dmar_domain;
 	struct iommu_domain *domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type == IOMMU_DOMAIN_UNMANAGED)
+		dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
+	else if (type == IOMMU_DOMAIN_DMA)
+		dmar_domain = alloc_domain(0);
+	else if (type == IOMMU_DOMAIN_IDENTITY)
+		dmar_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
+	else
 		return NULL;
 
-	dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
 	if (!dmar_domain) {
 		pr_err("Can't allocate dmar_domain\n");
 		return NULL;
@@ -5186,9 +5191,12 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 	domain_update_iommu_cap(dmar_domain);
 	domain = &dmar_domain->domain;
-	domain->geometry.aperture_start = 0;
-	domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
-	domain->geometry.force_aperture = true;
+
+	if (type == IOMMU_DOMAIN_UNMANAGED) {
+		domain->geometry.aperture_start = 0;
+		domain->geometry.aperture_end = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
+		domain->geometry.force_aperture = true;
+	}
 
 	return domain;
 }

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3
On Wed, 7 Nov 2018 17:43:23 +0100 "j...@8bytes.org" wrote:
> Hi,
>
> On Wed, Nov 07, 2018 at 11:40:30AM +0800, Lu Baolu wrote:
> > Hi Joerg,
> >
> > On 11/7/18 12:25 AM, j...@8bytes.org wrote:
> > Nowadays, we can find PASID-granular DMA translation on both ARM and
> > x86 platforms. With PASID-granular DMA translation supported in the
> > system iommu, if a device divides itself into multiple subsets and
> > tags the DMA transfers of each subset with a unique PASID, each
> > subset becomes assignable. We call such an assignable subset an ADI
> > (Assignable Device Interface). As a result, each ADI could be
> > attached to a domain.
>
> Yeah, I know the background. The point is, the IOMMU-API as of today
> implements a strict one-to-one relationship between domains and
> devices; every device can only have one domain assigned at a given
> time. If we assign a new domain to a device, the old one gets
> unassigned.
>
> If we allow multiple domains to be attached to a single device, we
> fundamentally break that semantic.
>
> Therefore I think it is better to support the ADI devices with
> subdomains and a new set of functions in the API to handle only these
> sub-domains.
>
> > Furthermore, a single domain might be attached to an ADI of device A,
> > while attached to another legacy device B which doesn't support PASID
> > features. In this case, we say "Domain 4" is attached to ADI (PASID
> > #x) in aux mode and attached to device B in primary mode.
>
> This will of course not work with subdomains, but is that really
> necessary? VFIO already allocates a separate domain for each device
> anyway (iirc), so it is unlikely that it uses the same domain for a
> legacy and an ADI device.

VFIO will attempt to use the same domain for all groups within a
container, only falling back to separate domains if there's an
incompatibility.
Multiple domains means that we need to mirror mapping and unmapping across each domain, so there's more work to do and theoretically more overhead in the IOMMU implementation assuming those domains land on the same IOMMU driver. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
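The mirroring cost Alex describes can be sketched in a standalone toy model; none of these names are VFIO code, they are made up purely to illustrate how the work scales with the number of domains in a container:

```c
#define MAX_DOMAINS 4

/* Toy stand-in for an IOMMU domain: just counts live mappings. */
struct toy_domain { unsigned long nr_mappings; };

struct toy_container {
    struct toy_domain *domains[MAX_DOMAINS];
    int nr_domains;
};

/* Mirror one DMA map request into every domain in the container. A real
 * implementation would program each domain's page tables with the same
 * iova->phys mapping, so page-table work scales with nr_domains. */
static int toy_container_map(struct toy_container *c,
                             unsigned long iova, unsigned long size)
{
    int i;

    (void)iova;
    (void)size;
    for (i = 0; i < c->nr_domains; i++)
        c->domains[i]->nr_mappings++;
    return c->nr_domains;  /* number of domains that had to be updated */
}
```

This is why sharing one domain across compatible groups is attractive: with a single domain the loop body runs once per map/unmap instead of once per domain.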
Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free
On Wed, Nov 07, 2018 at 04:17:09PM +, Robin Murphy wrote: > FWIW it looks like it *has* always been possible to hit this crash by > allocating a domain and freeing it again without attaching any devices, it's > just highly improbable for any sane code to do that explicitly, so the real > latent triggers are failure paths in external callers (which in this case > are themselves only being reached thanks to my bug elsewhere). Okay, given this, it probably makes more sense to change the fixes tag. This is what I just committed: Fixes: d25a2a16f0889 ('iommu: Add driver for Renesas VMSA-compatible IPMMU') Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3
Hi,

On Wed, Nov 07, 2018 at 11:40:30AM +0800, Lu Baolu wrote:
> Hi Joerg,
>
> On 11/7/18 12:25 AM, j...@8bytes.org wrote:
> Nowadays, we can find PASID-granular DMA translation on both ARM and
> x86 platforms. With PASID-granular DMA translation supported in the
> system iommu, if a device divides itself into multiple subsets and tags
> the DMA transfers of each subset with a unique PASID, each subset
> becomes assignable. We call such an assignable subset an ADI
> (Assignable Device Interface). As a result, each ADI could be attached
> to a domain.

Yeah, I know the background. The point is, the IOMMU-API as of today
implements a strict one-to-one relationship between domains and devices;
every device can only have one domain assigned at a given time. If we
assign a new domain to a device, the old one gets unassigned.

If we allow multiple domains to be attached to a single device, we
fundamentally break that semantic.

Therefore I think it is better to support the ADI devices with
subdomains and a new set of functions in the API to handle only these
sub-domains.

> Furthermore, a single domain might be attached to an ADI of device A,
> while attached to another legacy device B which doesn't support PASID
> features. In this case, we say "Domain 4" is attached to ADI (PASID #x)
> in aux mode and attached to device B in primary mode.

This will of course not work with subdomains, but is that really
necessary? VFIO already allocates a separate domain for each device
anyway (iirc), so it is unlikely that it uses the same domain for a
legacy and an ADI device.

Regards,
Joerg

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] of/device: Really only set bus DMA mask when appropriate
of_dma_configure() was *supposed* to be following the same logic as
acpi_dma_configure() and only setting bus_dma_mask if some range was
specified by the firmware. However, it seems that subtlety got lost in
the process of fitting it into the differently-shaped control flow, and
as a result the force_dma==true case ends up always setting the bus mask
to the 32-bit default, which is not what anyone wants.

Make sure we only touch it if the DT actually said so.

Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
Reported-by: Aaro Koskinen
Reported-by: Jean-Philippe Brucker
Tested-by: Aaro Koskinen
Tested-by: John Stultz
Tested-by: Geert Uytterhoeven
Tested-by: Robert Richter
Signed-off-by: Robin Murphy
---

v2: Add comment, collect tested-by tags

 drivers/of/device.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 0f27fad9fe94..5592437bb3d1 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -149,9 +149,11 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
 	 * set by the driver.
 	 */
 	mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
-	dev->bus_dma_mask = mask;
 	dev->coherent_dma_mask &= mask;
 	*dev->dma_mask &= mask;
+	/* ...but only set bus mask if we found valid dma-ranges earlier */
+	if (!ret)
+		dev->bus_dma_mask = mask;
 
 	coherent = of_dma_is_coherent(np);
 	dev_dbg(dev, "device is%sdma coherent\n",
--
2.19.1.dirty

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
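The mask computation and the `!ret` guard in the patch above can be modeled in a standalone userspace sketch. `bus_mask_after()` and `have_ranges` are made-up names standing in for the kernel logic (`have_ranges` plays the role of `!ret`, i.e. a dma-ranges property was actually parsed); this is an illustration, not the kernel code:

```c
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

static int ilog2_u64(uint64_t v)
{
    int l = -1;

    while (v) {
        l++;
        v >>= 1;
    }
    return l;
}

/* Model of the fixed control flow: the firmware-derived mask only
 * constrains bus_dma_mask when the DT actually described a range;
 * otherwise the bus mask is left untouched (0 here meaning "unset"). */
static uint64_t bus_mask_after(uint64_t dma_addr, uint64_t size,
                               int have_ranges, uint64_t bus_dma_mask)
{
    uint64_t mask = DMA_BIT_MASK(ilog2_u64(dma_addr + size - 1) + 1);

    if (have_ranges)
        bus_dma_mask = mask;  /* the DT said so: honour the limit */
    return bus_dma_mask;      /* force_dma default: no bus limit */
}
```

For a 1 GiB window starting at 0x40000000, the computed mask is 0x7fffffff (a 31-bit limit), but without dma-ranges the bus mask stays unset rather than collapsing to a 32-bit default.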
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
On 2018-11-07 3:52 pm, Rob Herring wrote:
> On Wed, Nov 07, 2018 at 12:56:49PM +, Robin Murphy wrote:
>> On 2018-11-07 8:03 am, Christoph Hellwig wrote:
>>> On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:
>>>> of_dma_configure() was *supposed* to be following the same logic as
>>>> acpi_dma_configure() and only setting bus_dma_mask if some range was
>>>> specified by the firmware. However, it seems that subtlety got lost
>>>> in the process of fitting it into the differently-shaped control
>>>> flow, and as a result the force_dma==true case ends up always
>>>> setting the bus mask to the 32-bit default, which is not what anyone
>>>> wants.
>>>>
>>>> Make sure we only touch it if the DT actually said so.
>>>
>>> This looks good, but I think it could really use a comment as the use
>>> of ret all the way down the function isn't exactly obvious.
>>
>> Fair point.
>>
>>> Let me know if you want this picked up through the OF or DMA trees.
>>
>> I don't mind either way; I figure I'll wait a bit longer to see if Rob
>> has any preference, then resend with the comment and the tags picked
>> up so it can hopefully make rc2.
>
> I have other fixes to send, so I can take it.

Cheers Rob, I'll send you that updated version momentarily.

Robin.

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free
On 2018-11-07 4:03 pm, Joerg Roedel wrote:
> On Wed, Nov 07, 2018 at 04:50:40PM +0100, Geert Uytterhoeven wrote:
>> It only got triggered by the combination of commits 6c2fb2ea76361da9
>> ("of/device: Set bus DMA mask as appropriate") and b4ebe6063204da58
>> ("dma-direct: implement complete bus_dma_mask handling"), which is
>> being fixed by "of/device: Really only set bus DMA mask when
>> appropriate" (https://patchwork.kernel.org/patch/10670177/).
>
> Okay, but the bug is triggered since 6c2fb2ea76361da9, so I took this
> one for the fixes-tag.

FWIW it looks like it *has* always been possible to hit this crash by
allocating a domain and freeing it again without attaching any devices;
it's just highly improbable for any sane code to do that explicitly, so
the real latent triggers are failure paths in external callers (which in
this case are themselves only being reached thanks to my bug elsewhere).

Robin.

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu/vt-d: respect max guest address width in agaw
On Tue, Nov 06, 2018 at 02:47:15PM -0800, Jacob Pan wrote: > drivers/iommu/intel-iommu.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free
On Wed, Nov 07, 2018 at 02:18:50PM +0100, Geert Uytterhoeven wrote: > Fix this by checking if the domain's context already exists, before > trying to destroy it. > > Signed-off-by: Geert Uytterhoeven > --- > drivers/iommu/ipmmu-vmsa.c | 3 +++ > 1 file changed, 3 insertions(+) Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free
On Wed, Nov 07, 2018 at 04:50:40PM +0100, Geert Uytterhoeven wrote: > It only got triggered by the combination of commits 6c2fb2ea76361da9 > ("of/device: Set bus DMA mask as appropriate") and b4ebe6063204da58 > ("dma-direct: implement complete bus_dma_mask handling"), which is being > fixed by "of/device: Really only set bus DMA mask when > appropriate" (https://patchwork.kernel.org/patch/10670177/). Okay, but the bug is triggered since 6c2fb2ea76361da9, so I took this one for the fixes-tag. Thanks, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
On Wed, Nov 07, 2018 at 12:56:49PM +, Robin Murphy wrote:
> On 2018-11-07 8:03 am, Christoph Hellwig wrote:
> > On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:
> > > of_dma_configure() was *supposed* to be following the same logic as
> > > acpi_dma_configure() and only setting bus_dma_mask if some range was
> > > specified by the firmware. However, it seems that subtlety got lost in
> > > the process of fitting it into the differently-shaped control flow, and
> > > as a result the force_dma==true case ends up always setting the bus mask
> > > to the 32-bit default, which is not what anyone wants.
> > >
> > > Make sure we only touch it if the DT actually said so.
> >
> > This looks good, but I think it could really use a comment as the use
> > of ret all the way down the function isn't exactly obvious.
>
> Fair point.
>
> > Let me know if you want this picked up through the OF or DMA trees.
>
> I don't mind either way; I figure I'll wait a bit longer to see if Rob has
> any preference, then resend with the comment and the tags picked up so it
> can hopefully make rc2.

I have other fixes to send, so I can take it.

Rob

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free
Hi Jörg, On Wed, Nov 7, 2018 at 4:34 PM Joerg Roedel wrote: > On Wed, Nov 07, 2018 at 01:22:52PM +, Robin Murphy wrote: > > On 2018-11-07 1:18 pm, Geert Uytterhoeven wrote: > > > Fix this by checking if the domain's context already exists, before > > > trying to destroy it. > > > > Reviewed-by: Robin Murphy > > Does this need a Fixes-tag? If so, which patch should be in that tag? I think this bug has been present since the initial version of the driver, but this failure mode has probably never been tested before. It only got triggered by the combination of commits 6c2fb2ea76361da9 ("of/device: Set bus DMA mask as appropriate") and b4ebe6063204da58 ("dma-direct: implement complete bus_dma_mask handling"), which is being fixed by "of/device: Really only set bus DMA mask when appropriate" (https://patchwork.kernel.org/patch/10670177/). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free
On Wed, Nov 07, 2018 at 01:22:52PM +, Robin Murphy wrote: > On 2018-11-07 1:18 pm, Geert Uytterhoeven wrote: > > Fix this by checking if the domain's context already exists, before > > trying to destroy it. > > Reviewed-by: Robin Murphy Does this need a Fixes-tag? If so, which patch should be in that tag? Thanks, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free
On 2018-11-07 1:18 pm, Geert Uytterhoeven wrote:

If iommu_ops.add_device() fails, iommu_ops.domain_free() is still
called, leading to a crash, as the domain was only partially
initialized:

    ipmmu-vmsa e67b.mmu: Cannot accommodate DMA translation for IOMMU page tables
    sata_rcar ee30.sata: Unable to initialize IPMMU context
    iommu: Failed to add device ee30.sata to group 0: -22
    Unable to handle kernel NULL pointer dereference at virtual address 0038
    ...
    Call trace:
     ipmmu_domain_free+0x1c/0xa0
     iommu_group_release+0x48/0x68
     kobject_put+0x74/0xe8
     kobject_del.part.0+0x3c/0x50
     kobject_put+0x60/0xe8
     iommu_group_get_for_dev+0xa8/0x1f0
     ipmmu_add_device+0x1c/0x40
     of_iommu_configure+0x118/0x190

Fix this by checking if the domain's context already exists, before
trying to destroy it.

Reviewed-by: Robin Murphy

Signed-off-by: Geert Uytterhoeven
---
 drivers/iommu/ipmmu-vmsa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index c4114b92652eb0c9..a8b2c649c1d1f1b9 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -498,6 +498,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 
 static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
 {
+	if (!domain->mmu)
+		return;
+
 	/*
 	 * Disable the context. Flush the TLB as required when modifying the
 	 * context registers.

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free
If iommu_ops.add_device() fails, iommu_ops.domain_free() is still
called, leading to a crash, as the domain was only partially
initialized:

    ipmmu-vmsa e67b.mmu: Cannot accommodate DMA translation for IOMMU page tables
    sata_rcar ee30.sata: Unable to initialize IPMMU context
    iommu: Failed to add device ee30.sata to group 0: -22
    Unable to handle kernel NULL pointer dereference at virtual address 0038
    ...
    Call trace:
     ipmmu_domain_free+0x1c/0xa0
     iommu_group_release+0x48/0x68
     kobject_put+0x74/0xe8
     kobject_del.part.0+0x3c/0x50
     kobject_put+0x60/0xe8
     iommu_group_get_for_dev+0xa8/0x1f0
     ipmmu_add_device+0x1c/0x40
     of_iommu_configure+0x118/0x190

Fix this by checking if the domain's context already exists, before
trying to destroy it.

Signed-off-by: Geert Uytterhoeven
---
 drivers/iommu/ipmmu-vmsa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index c4114b92652eb0c9..a8b2c649c1d1f1b9 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -498,6 +498,9 @@ static int ipmmu_domain_init_context(struct ipmmu_vmsa_domain *domain)
 
 static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
 {
+	if (!domain->mmu)
+		return;
+
 	/*
 	 * Disable the context. Flush the TLB as required when modifying the
 	 * context registers.
--
2.17.1

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
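The general pattern behind this fix is that a destructor must tolerate an object whose initialization never completed. A standalone toy model (all names made up, not the ipmmu driver code) of the guard:

```c
#include <stddef.h>

/* Toy model: a domain's hardware context ("mmu") is only bound at
 * attach time, so free() can legitimately run before any device was
 * ever attached, i.e. with mmu still NULL. */
struct toy_ipmmu { int ctx_enabled; };
struct toy_vmsa_domain { struct toy_ipmmu *mmu; };

static void toy_destroy_context(struct toy_vmsa_domain *domain)
{
    if (!domain->mmu)  /* early free: context never initialized */
        return;
    /* A real driver would disable the context and flush the TLB here. */
    domain->mmu->ctx_enabled = 0;
}
```

Without the NULL check, the early-free path dereferences `domain->mmu`, which is exactly the NULL pointer dereference in the crash log above.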
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
On 2018-11-07 8:03 am, Christoph Hellwig wrote:
> On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:
>> of_dma_configure() was *supposed* to be following the same logic as
>> acpi_dma_configure() and only setting bus_dma_mask if some range was
>> specified by the firmware. However, it seems that subtlety got lost in
>> the process of fitting it into the differently-shaped control flow,
>> and as a result the force_dma==true case ends up always setting the
>> bus mask to the 32-bit default, which is not what anyone wants.
>>
>> Make sure we only touch it if the DT actually said so.
>
> This looks good, but I think it could really use a comment as the use
> of ret all the way down the function isn't exactly obvious.

Fair point.

> Let me know if you want this picked up through the OF or DMA trees.

I don't mind either way; I figure I'll wait a bit longer to see if Rob
has any preference, then resend with the comment and the tags picked up
so it can hopefully make rc2.

Robin.

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
On 2018-11-07 12:08 pm, Richter, Robert wrote: On 07.11.18 11:31:50, Robert Richter wrote: On 06.11.18 11:54:15, Robin Murphy wrote: of_dma_configure() was *supposed* to be following the same logic as acpi_dma_configure() and only setting bus_dma_mask if some range was specified by the firmware. However, it seems that subtlety got lost in the process of fitting it into the differently-shaped control flow, and as a result the force_dma==true case ends up always setting the bus mask to the 32-bit default, which is not what anyone wants. Make sure we only touch it if the DT actually said so. Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate") Reported-by: Aaro Koskinen Reported-by: Jean-Philippe Brucker Signed-off-by: Robin Murphy Tested-by: Robert Richter This patch makes the bad page state message on Cavium ThunderX below disappear. Note: The Fixes: tag suggests the issue was already in 4.19, but I have seen it in 4.20-rc1 first and it was pulled into mainline with: cff229491af5 Merge tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping I have bisected it and the issue was seen first with: b4ebe6063204 dma-direct: implement complete bus_dma_mask handling Right, that was the point at which the underlying problem started interacting with SWIOTLB and making arm64 unhappy - the prior effect in in 4.19 was to inadvertently break DT-based dma_direct_ops users (like MIPS), whom the above commit actually partially unbroke again. Thanks for testing, Robin. 
-Robert [ 37.634555] BUG: Bad page state in process swapper/5 pfn:f3ebb [ 37.640483] page:7e0003cfaec0 count:0 mapcount:0 mapping: index:0x0 [ 37.648493] flags: 0x0001000(reserved) [ 37.652942] raw: 00001000 7e0003cfaec8 7e0003cfaec8 [ 37.660691] raw: [ 37.668438] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set [ 37.674880] bad because of flags: 0x1000(reserved) [ 37.679672] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat nf_nat_ipv6 ip6table_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle ip6table_filter ip6_tables iptable_filter crct10dif_ce cavium_rng_vf rng_core ipmi_ssif thunderx_zip thunderx_edac ipmi_devintf cavium_rng ipmi_msghandler ip_tables x_tables xfs libcrc32c nicvf nicpf thunder_bgx thunder_xcv i2c_thunderx mdio_thunder mdio_cavium ipv6 [ 37.723866] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc1-00012-gc106b1cbe843 #1 [ 37.731874] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012 [ 37.740228] Call trace: [ 37.742677] dump_backtrace+0x0/0x148 [ 37.746334] show_stack+0x14/0x1c [ 37.749643] dump_stack+0x84/0xa8 [ 37.752954] bad_page+0xe4/0x144 [ 37.756178] free_pages_check_bad+0x7c/0x84 [ 37.760355] __free_pages_ok+0x22c/0x284 [ 37.764272] page_frag_free+0x64/0x68 [ 37.767930] skb_free_head+0x24/0x2c [ 37.771500] skb_release_data+0x130/0x148 [ 37.775504] skb_release_all+0x24/0x30 [ 37.779246] kfree_skb+0x2c/0x54 [ 37.782471] ip_error+0x70/0x1d4 [ 37.785693] ip_rcv_finish+0x3c/0x50 [ 37.789262] ip_rcv+0xc0/0xd0 [ 37.792225] __netif_receive_skb_one_core+0x4c/0x70 [ 37.797099] __netif_receive_skb+0x2c/0x70 [ 37.801190] netif_receive_skb_internal+0x64/0x160 [ 37.805976] napi_gro_receive+0xa0/0xc4 [ 37.809815] nicvf_cq_intr_handler+0x930/0xc1c [nicvf] [ 37.814950] nicvf_poll+0x30/0xb0 [nicvf] [ 37.818954] net_rx_action+0x140/0x2f8 [ 37.822697] __do_softirq+0x11c/0x228 [ 37.826354] irq_exit+0xbc/0xd0 [ 
37.829491] __handle_domain_irq+0x6c/0xb4 [ 37.833581] gic_handle_irq+0x8c/0x1a0 [ 37.837324] el1_irq+0xb0/0x128 [ 37.840459] arch_cpu_idle+0x10/0x18 [ 37.844031] default_idle_call+0x18/0x28 [ 37.847948] do_idle+0x194/0x258 [ 37.851169] cpu_startup_entry+0x20/0x24 [ 37.855089] secondary_start_kernel+0x144/0x168 --- Sorry about that... I guess I only have test setups that either have dma-ranges or where a 32-bit bus mask goes unnoticed :( The Octeon and SMMU issues sound like they're purely down to this, and it's probably related to at least one of John's Hikey woes. Robin. drivers/of/device.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/of/device.c b/drivers/of/device.c index 0f27fad9fe94..757ae867674f 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma) * set by the driver. */ mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1); - dev->bus_dma_mask = mask; + if (!ret) + dev->bus_dma_mask = mask; dev->coherent_dma_mask &= mask; *dev->dma_mask &= mask; -- 2.19.1.dirty
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
On 07.11.18 11:31:50, Robert Richter wrote:
> On 06.11.18 11:54:15, Robin Murphy wrote:
> > of_dma_configure() was *supposed* to be following the same logic as
> > acpi_dma_configure() and only setting bus_dma_mask if some range was
> > specified by the firmware. However, it seems that subtlety got lost in
> > the process of fitting it into the differently-shaped control flow, and
> > as a result the force_dma==true case ends up always setting the bus mask
> > to the 32-bit default, which is not what anyone wants.
> >
> > Make sure we only touch it if the DT actually said so.
> >
> > Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> > Reported-by: Aaro Koskinen
> > Reported-by: Jean-Philippe Brucker
> > Signed-off-by: Robin Murphy
>
> Tested-by: Robert Richter
>
> This patch makes the bad page state message on Cavium ThunderX below
> disappear.
>
> Note: The Fixes: tag suggests the issue was already in 4.19, but I
> have seen it in 4.20-rc1 first and it was pulled into mainline with:
>
>   cff229491af5 Merge tag 'dma-mapping-4.20' of
>   git://git.infradead.org/users/hch/dma-mapping

I have bisected it and the issue was seen first with:

  b4ebe6063204 dma-direct: implement complete bus_dma_mask handling

> -Robert
>
> [ 37.634555] BUG: Bad page state in process swapper/5 pfn:f3ebb
> [ 37.640483] page:7e0003cfaec0 count:0 mapcount:0 mapping: index:0x0
> [ 37.648493] flags: 0x0001000(reserved)
> [ 37.652942] raw: 00001000 7e0003cfaec8 7e0003cfaec8
> [ 37.660691] raw:
> [ 37.668438] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [ 37.674880] bad because of flags: 0x1000(reserved)
> [ 37.679672] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_tcpudp
>   ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat nf_nat_ipv6
>   ip6table_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6
>   nf_defrag_ipv4 iptable_mangle ip6table_filter ip6_tables iptable_filter
>   crct10dif_ce cavium_rng_vf rng_core ipmi_ssif thunderx_zip thunderx_edac
>   ipmi_devintf cavium_rng ipmi_msghandler ip_tables x_tables xfs libcrc32c
>   nicvf nicpf thunder_bgx thunder_xcv i2c_thunderx mdio_thunder mdio_cavium ipv6
> [ 37.723866] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc1-00012-gc106b1cbe843 #1
> [ 37.731874] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
> [ 37.740228] Call trace:
> [ 37.742677]  dump_backtrace+0x0/0x148
> [ 37.746334]  show_stack+0x14/0x1c
> [ 37.749643]  dump_stack+0x84/0xa8
> [ 37.752954]  bad_page+0xe4/0x144
> [ 37.756178]  free_pages_check_bad+0x7c/0x84
> [ 37.760355]  __free_pages_ok+0x22c/0x284
> [ 37.764272]  page_frag_free+0x64/0x68
> [ 37.767930]  skb_free_head+0x24/0x2c
> [ 37.771500]  skb_release_data+0x130/0x148
> [ 37.775504]  skb_release_all+0x24/0x30
> [ 37.779246]  kfree_skb+0x2c/0x54
> [ 37.782471]  ip_error+0x70/0x1d4
> [ 37.785693]  ip_rcv_finish+0x3c/0x50
> [ 37.789262]  ip_rcv+0xc0/0xd0
> [ 37.792225]  __netif_receive_skb_one_core+0x4c/0x70
> [ 37.797099]  __netif_receive_skb+0x2c/0x70
> [ 37.801190]  netif_receive_skb_internal+0x64/0x160
> [ 37.805976]  napi_gro_receive+0xa0/0xc4
> [ 37.809815]  nicvf_cq_intr_handler+0x930/0xc1c [nicvf]
> [ 37.814950]  nicvf_poll+0x30/0xb0 [nicvf]
> [ 37.818954]  net_rx_action+0x140/0x2f8
> [ 37.822697]  __do_softirq+0x11c/0x228
> [ 37.826354]  irq_exit+0xbc/0xd0
> [ 37.829491]  __handle_domain_irq+0x6c/0xb4
> [ 37.833581]  gic_handle_irq+0x8c/0x1a0
> [ 37.837324]  el1_irq+0xb0/0x128
> [ 37.840459]  arch_cpu_idle+0x10/0x18
> [ 37.844031]  default_idle_call+0x18/0x28
> [ 37.847948]  do_idle+0x194/0x258
> [ 37.851169]  cpu_startup_entry+0x20/0x24
> [ 37.855089]  secondary_start_kernel+0x144/0x168
>
> > ---
> >
> > Sorry about that... I guess I only have test setups that either have
> > dma-ranges or where a 32-bit bus mask goes unnoticed :(
> >
> > The Octeon and SMMU issues sound like they're purely down to this, and
> > it's probably related to at least one of John's Hikey woes.
> >
> > Robin.
> >
> >  drivers/of/device.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index 0f27fad9fe94..757ae867674f 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
> >  	 * set by the driver.
> >  	 */
> >  	mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
> > -	dev->bus_dma_mask = mask;
> > +	if (!ret)
> > +		dev->bus_dma_mask = mask;
> >  	dev->coherent_dma_mask &= mask;
> >  	*dev->dma_mask &= mask;
> >
> > --
> > 2.19.1.dirty
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
On 06.11.18 11:54:15, Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
>
> Make sure we only touch it if the DT actually said so.
>
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen
> Reported-by: Jean-Philippe Brucker
> Signed-off-by: Robin Murphy

Tested-by: Robert Richter

This patch makes the bad page state message on Cavium ThunderX below
disappear.

Note: The Fixes: tag suggests the issue was already in 4.19, but I have
seen it in 4.20-rc1 first and it was pulled into mainline with:

  cff229491af5 Merge tag 'dma-mapping-4.20' of
  git://git.infradead.org/users/hch/dma-mapping

-Robert

[ 37.634555] BUG: Bad page state in process swapper/5 pfn:f3ebb
[ 37.640483] page:7e0003cfaec0 count:0 mapcount:0 mapping: index:0x0
[ 37.648493] flags: 0x0001000(reserved)
[ 37.652942] raw: 00001000 7e0003cfaec8 7e0003cfaec8
[ 37.660691] raw:
[ 37.668438] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[ 37.674880] bad because of flags: 0x1000(reserved)
[ 37.679672] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_tcpudp
  ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat nf_nat_ipv6
  ip6table_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6
  nf_defrag_ipv4 iptable_mangle ip6table_filter ip6_tables iptable_filter
  crct10dif_ce cavium_rng_vf rng_core ipmi_ssif thunderx_zip thunderx_edac
  ipmi_devintf cavium_rng ipmi_msghandler ip_tables x_tables xfs libcrc32c
  nicvf nicpf thunder_bgx thunder_xcv i2c_thunderx mdio_thunder mdio_cavium ipv6
[ 37.723866] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc1-00012-gc106b1cbe843 #1
[ 37.731874] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
[ 37.740228] Call trace:
[ 37.742677]  dump_backtrace+0x0/0x148
[ 37.746334]  show_stack+0x14/0x1c
[ 37.749643]  dump_stack+0x84/0xa8
[ 37.752954]  bad_page+0xe4/0x144
[ 37.756178]  free_pages_check_bad+0x7c/0x84
[ 37.760355]  __free_pages_ok+0x22c/0x284
[ 37.764272]  page_frag_free+0x64/0x68
[ 37.767930]  skb_free_head+0x24/0x2c
[ 37.771500]  skb_release_data+0x130/0x148
[ 37.775504]  skb_release_all+0x24/0x30
[ 37.779246]  kfree_skb+0x2c/0x54
[ 37.782471]  ip_error+0x70/0x1d4
[ 37.785693]  ip_rcv_finish+0x3c/0x50
[ 37.789262]  ip_rcv+0xc0/0xd0
[ 37.792225]  __netif_receive_skb_one_core+0x4c/0x70
[ 37.797099]  __netif_receive_skb+0x2c/0x70
[ 37.801190]  netif_receive_skb_internal+0x64/0x160
[ 37.805976]  napi_gro_receive+0xa0/0xc4
[ 37.809815]  nicvf_cq_intr_handler+0x930/0xc1c [nicvf]
[ 37.814950]  nicvf_poll+0x30/0xb0 [nicvf]
[ 37.818954]  net_rx_action+0x140/0x2f8
[ 37.822697]  __do_softirq+0x11c/0x228
[ 37.826354]  irq_exit+0xbc/0xd0
[ 37.829491]  __handle_domain_irq+0x6c/0xb4
[ 37.833581]  gic_handle_irq+0x8c/0x1a0
[ 37.837324]  el1_irq+0xb0/0x128
[ 37.840459]  arch_cpu_idle+0x10/0x18
[ 37.844031]  default_idle_call+0x18/0x28
[ 37.847948]  do_idle+0x194/0x258
[ 37.851169]  cpu_startup_entry+0x20/0x24
[ 37.855089]  secondary_start_kernel+0x144/0x168

> ---
>
> Sorry about that... I guess I only have test setups that either have
> dma-ranges or where a 32-bit bus mask goes unnoticed :(
>
> The Octeon and SMMU issues sound like they're purely down to this, and
> it's probably related to at least one of John's Hikey woes.
>
> Robin.
>
>  drivers/of/device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index 0f27fad9fe94..757ae867674f 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
>  	 * set by the driver.
>  	 */
>  	mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
> -	dev->bus_dma_mask = mask;
> +	if (!ret)
> +		dev->bus_dma_mask = mask;
>  	dev->coherent_dma_mask &= mask;
>  	*dev->dma_mask &= mask;
>
> --
> 2.19.1.dirty
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
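For readers following along, the effect of the one-line change can be modelled outside the kernel. The sketch below is a simplified userspace model of the tail of the patched of_dma_configure(), not kernel code: `struct fake_device`, `configure_masks()`, and `ilog2_u64()` are illustrative stand-ins (the kernel's `dev->dma_mask` is a pointer, and `ret` is the result of `of_dma_get_range()` — 0 only when the DT actually supplied dma-ranges).

```c
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Illustrative stand-in for struct device's DMA fields. */
struct fake_device {
	uint64_t bus_dma_mask;      /* 0 == "no firmware-described limit" */
	uint64_t coherent_dma_mask;
	uint64_t dma_mask;
};

/* Integer log2, standing in for the kernel's ilog2(). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * ret models of_dma_get_range()'s return value: 0 means the DT really
 * specified dma-ranges; non-zero means dma_addr/size are just the
 * 32-bit defaults, so bus_dma_mask must be left untouched.
 */
static void configure_masks(struct fake_device *dev, int ret,
			    uint64_t dma_addr, uint64_t size)
{
	uint64_t mask = DMA_BIT_MASK(ilog2_u64(dma_addr + size - 1) + 1);

	if (!ret)			/* the one-line fix */
		dev->bus_dma_mask = mask;
	dev->coherent_dma_mask &= mask;
	dev->dma_mask &= mask;
}
```

With the `if (!ret)` guard, a device whose DT has no dma-ranges keeps `bus_dma_mask == 0`, so a later dma_set_mask(DMA_BIT_MASK(64)) is not silently capped at 32 bits by a bus mask the firmware never described; before the fix the bus mask was unconditionally set to the 32-bit default.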
Re: IOMMU breakage on arm64
Hi Robin,

On Tue, Nov 6, 2018 at 9:20 PM Robin Murphy wrote:
> On 2018-11-06 7:44 pm, Geert Uytterhoeven wrote:
> > On Tue, Oct 23, 2018 at 1:40 AM Linux Kernel Mailing List wrote:
> >> Commit: b4ebe6063204da58e48600b810a97c29ae9e5d12
> >> Parent: 7d21ee4c719f00896767ce19c4c01a56374c2ced
> >> Refname: refs/heads/master
> >> Web: https://git.kernel.org/torvalds/c/b4ebe6063204da58e48600b810a97c29ae9e5d12
> >> Author: Christoph Hellwig
> >> AuthorDate: Thu Sep 20 14:04:08 2018 +0200
> >> Committer: Christoph Hellwig
> >> CommitDate: Mon Oct 1 07:28:03 2018 -0700
> >>
> >> dma-direct: implement complete bus_dma_mask handling
> >>
> >> Instead of rejecting devices with a too small bus_dma_mask we can
> >> handle by taking the bus dma_mask into account for allocations and
> >> bounce buffering decisions.
> >>
> >> Signed-off-by: Christoph Hellwig
> >
> > I have bisected the following crash to the above commit:
>
> I think that commit mostly just changes the presentation of my
> underlying cockup - see here for what should fix it:
> https://patchwork.kernel.org/patch/10670177/

Thanks, that fixed the issue.

> I have a feeling we've ironed out crash-on-early-domain-free bugs in the
> SMMU drivers already - arm-smmu certainly has an early return in
> arm_smmu_destroy_domain_context() which should behave exactly like your
> diff below, while I think arm-smmu-v3 gets away with it by virtue of
> smmu_domain->cfg being unset, but I'll double-check that when I'm fresh
> tomorrow (Jean-Philippe reported SMMUv3 hitting the DMA thing to me
> internally, but didn't mention any crash).

OK, I'll send a proper patch for the ipmmu-vmsa driver.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
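The bisected commit's behavioural change can be sketched in a few lines. This is a loose userspace model, not the kernel's actual helper: `dma_direct_capable()` is an illustrative stand-in for the idea that, after b4ebe6063204, dma-direct treats the effective addressing limit as the *smaller* of the device mask and bus_dma_mask instead of rejecting devices with a too-small bus mask outright.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative stand-in: can this buffer be DMA-mapped directly?
 * A non-zero bus_dma_mask further restricts the device's own mask,
 * which is why a spuriously 32-bit bus mask (the of/device bug)
 * suddenly mattered once dma-direct started honouring it.
 */
static bool dma_direct_capable(uint64_t dev_mask, uint64_t bus_dma_mask,
			       uint64_t addr, uint64_t size)
{
	uint64_t limit = dev_mask;

	if (bus_dma_mask && bus_dma_mask < limit)
		limit = bus_dma_mask;	/* bus mask wins if it is smaller */

	return addr + size - 1 <= limit;
}
```

Under this model, a buffer above 4GB passes for a 64-bit device with no bus limit, but fails (and so gets bounced or rejected) the moment a 32-bit bus_dma_mask is wrongly in place — matching the symptom the thread describes.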
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
Hi Robin,

On Tue, Nov 6, 2018 at 12:54 PM Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
>
> Make sure we only touch it if the DT actually said so.
>
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen
> Reported-by: Jean-Philippe Brucker
> Signed-off-by: Robin Murphy

Thanks, this fixes the problem I saw with IPMMU on Salvator-X(S).

Tested-by: Geert Uytterhoeven

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Re: [PATCH] of/device: Really only set bus DMA mask when appropriate
On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
>
> Make sure we only touch it if the DT actually said so.

This looks good, but I think it could really use a comment, as the use
of ret all the way down the function isn't exactly obvious.

Let me know if you want this picked up through the OF or DMA trees.