iommu/io-pgtable-arm-v7s: About pagetable 33bit PA

2018-11-07 Thread Yong Wu
Hi Robin,

After commit ad67f5a6545f ("arm64: replace ZONE_DMA with
ZONE_DMA32"), we no longer have ZONE_DMA on arm64, but
__arm_v7s_alloc_table[1] uses GFP_DMA, expecting the physical address
of the pagetable to fit in 32 bits.

Right now we hit an issue where the lvl1 pagetable PA is 0x1_3ab6_,
i.e. above the 4GB boundary, and then the IOMMU initialization fails.
(This issue can be fixed if we revert ad67f5a6545f.)

I planned to add GFP_DMA32 for the pagetable allocation. Level 1 was
OK, but level 2 failed. It looks like slab doesn't accept GFP_DMA32[2].
This is the warning log:
=
Unexpected gfp: 0x4 (GFP_DMA32). Fixing up to gfp: 0x488020 (GFP_ATOMIC|
__GFP_ZERO). Fix your code!
=
I don't know why slab doesn't allow GFP_DMA32; the gfp flags seem to
simply be passed through to alloc_page. Would it be possible to let slab
allow GFP_DMA32 on aarch64?

We noticed that this has been discussed before[3], but if we use
alloc_page for the level2 pagetable, it may waste a lot of memory.
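
For reference, the allocation path in question looks roughly like the
sketch below (a simplified version of __arm_v7s_alloc_table() around
v4.20, trimmed to the allocation calls only; the real driver also does
DMA mapping and error handling):

	static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
					   struct arm_v7s_io_pgtable *data)
	{
		size_t size = ARM_V7S_TABLE_SIZE(lvl);
		void *table = NULL;

		if (lvl == 1)
			/* lvl1 is a raw page allocation, so GFP_DMA32 works here */
			table = (void *)__get_free_pages(__GFP_ZERO | gfp,
							 get_order(size));
		else if (lvl == 2)
			/*
			 * lvl2 tables come from a kmem_cache; slab rejects
			 * GFP_DMA32, which is the warning quoted above.
			 */
			table = kmem_cache_zalloc(data->l2_tables, gfp | GFP_DMA);

		return table;
	}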

Any suggestion is appreciated.


Reported-by: Nicolas Boichat 

[1]
https://elixir.bootlin.com/linux/v4.20-rc1/source/drivers/iommu/io-pgtable-arm-v7s.c#L200
[2] https://elixir.bootlin.com/linux/v4.20-rc1/source/mm/internal.h#L37
[3] https://patchwork.kernel.org/patch/10474289/



RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-07 Thread Liu, Yi L
Hi,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, November 8, 2018 2:14 PM
> 
> Hi,
> 
> On 11/8/18 1:45 PM, Liu, Yi L wrote:
> >> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> >> Sent: Thursday, November 8, 2018 1:25 PM
> >> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation
> >> descriptor support
> >>
> >> Hi,
> >>
> >> On 11/8/18 11:49 AM, Liu, Yi L wrote:
> >>> Hi,
> >>>
>  From: Lu Baolu [mailto:baolu...@linux.intel.com]
>  Sent: Thursday, November 8, 2018 10:17 AM
>  Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation
>  descriptor support
> 
>  Hi Yi,
> 
>  On 11/7/18 2:07 PM, Liu, Yi L wrote:
> > Hi Baolu,
> >
> >> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> >> Sent: Monday, November 5, 2018 1:32 PM
> > [...]
> >
> >> ---
> >> drivers/iommu/dmar.c| 83 
> >> +++--
> >> drivers/iommu/intel-svm.c   | 76 --
> >> drivers/iommu/intel_irq_remapping.c |  6 ++-
> >> include/linux/intel-iommu.h |  9 +++-
> >> 4 files changed, 115 insertions(+), 59 deletions(-)
> >>
> >> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
> >> d9c748b6f9e4..ec10427b98ac 100644
> >> --- a/drivers/iommu/dmar.c
> >> +++ b/drivers/iommu/dmar.c
> >> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct
> >> intel_iommu *iommu, int
> >> index)
> >>int head, tail;
> >>struct q_inval *qi = iommu->qi;
> >>int wait_index = (index + 1) % QI_LENGTH;
> >> +  int shift = qi_shift(iommu);
> >>
> >>if (qi->desc_status[wait_index] == QI_ABORT)
> >>return -EAGAIN;
> >> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct
> >> intel_iommu *iommu, int index)
> >> */
> >>if (fault & DMA_FSTS_IQE) {
> >>head = readl(iommu->reg + DMAR_IQH_REG);
> >> -  if ((head >> DMAR_IQ_SHIFT) == index) {
> >> +  if ((head >> shift) == index) {
> >> +  struct qi_desc *desc = qi->desc + head;
> >> +
> >>pr_err("VT-d detected invalid descriptor: "
> >>"low=%llx, high=%llx\n",
> >> -  (unsigned long long)qi->desc[index].low,
> >> -  (unsigned long 
> >> long)qi->desc[index].high);
> >> -  memcpy(&qi->desc[index], &qi->desc[wait_index],
> >> -  sizeof(struct qi_desc));
> >> +  (unsigned long long)desc->qw0,
> >> +  (unsigned long long)desc->qw1);
> > Still missing qw2 and qw3. May make the print differ based on if smts is
> configed.
>  qw2 and qw3 are reserved from software point of view. We don't need
>  to print it for information.
> >>> But for Scalable mode, it should be valid?
> >> No. It's reserved for software.
> > No, I don’t think so. PRQ response would also be queued to hardware by
> > QI. For such QI descriptors, the high bits are not reserved.
> >
> 
> Do you mean the private data fields of a page request descriptor or a page 
> group
> response descriptor? Those fields contains software defined private data 
> (might a
> kernel pointer?). We should avoid leaking such information in the generic 
> kernel
> message for security consideration.
> Or anything I missed?

Yes, I'm not sure what kind of data may end up in the private data field. From
a software point of view, it may be helpful to show the full content of the QI
descriptor for error triage. Personally, I'm fine if you keep it as it is on
this point.

Regards,
Yi Liu

Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-07 Thread Lu Baolu



On 11/8/18 1:48 PM, Liu, Yi L wrote:

From: Liu, Yi L
Sent: Thursday, November 8, 2018 1:45 PM

+   memcpy(desc, qi->desc + (wait_index << shift),


Would "memcpy(desc, (unsigned long long) (qi->desc +  (wait_index
<< shift)," be more safe?


Can that be compiled? memcpy() requires a "const void *" for the
second

parameter.

By the way, why it's safer with this casting?


This is just an example. My point is the possibility that "qi->desc
+ (wait_index <<

shift)"

would be treated as "qi->desc plus (wait_index <<
shift)*sizeof(*qi->desc)". Is it possible for kernel build?


qi->desc is of type "void *".


No, I don't think so... Refer to the code below. Even though it has no
correctness issue here, that's not because qi->desc is of "void *" type...

struct qi_desc {
-   u64 low, high;
+   u64 qw0;
+   u64 qw1;
+   u64 qw2;
+   u64 qw3;
};


Oops, I just saw that you modified it to be "void *" in this patch. OK, then
this is fair enough.


Yes. :-)

Best regards,
Lu Baolu

Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-07 Thread Lu Baolu

Hi,

On 11/8/18 1:45 PM, Liu, Yi L wrote:

From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, November 8, 2018 1:25 PM
Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor
support

Hi,

On 11/8/18 11:49 AM, Liu, Yi L wrote:

Hi,


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, November 8, 2018 10:17 AM
Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation
descriptor support

Hi Yi,

On 11/7/18 2:07 PM, Liu, Yi L wrote:

Hi Baolu,


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Monday, November 5, 2018 1:32 PM

[...]


---
drivers/iommu/dmar.c| 83 +++--
drivers/iommu/intel-svm.c   | 76 --
drivers/iommu/intel_irq_remapping.c |  6 ++-
include/linux/intel-iommu.h |  9 +++-
4 files changed, 115 insertions(+), 59 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
d9c748b6f9e4..ec10427b98ac 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu
*iommu, int
index)
int head, tail;
struct q_inval *qi = iommu->qi;
int wait_index = (index + 1) % QI_LENGTH;
+   int shift = qi_shift(iommu);

if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
@@ -1173,13 +1174,15 @@ static int qi_check_fault(struct
intel_iommu *iommu, int index)
 */
if (fault & DMA_FSTS_IQE) {
head = readl(iommu->reg + DMAR_IQH_REG);
-   if ((head >> DMAR_IQ_SHIFT) == index) {
+   if ((head >> shift) == index) {
+   struct qi_desc *desc = qi->desc + head;
+
pr_err("VT-d detected invalid descriptor: "
"low=%llx, high=%llx\n",
-   (unsigned long long)qi->desc[index].low,
-   (unsigned long long)qi->desc[index].high);
-   memcpy(&qi->desc[index], &qi->desc[wait_index],
-   sizeof(struct qi_desc));
+   (unsigned long long)desc->qw0,
+   (unsigned long long)desc->qw1);

Still missing qw2 and qw3. May make the print differ based on if smts is 
configed.

qw2 and qw3 are reserved from software point of view. We don't need
to print it for information.

But for Scalable mode, it should be valid?

No. It's reserved for software.

No, I don’t think so. PRQ response would also be queued to hardware by QI. For 
such
QI descriptors, the high bits are not reserved.



Do you mean the private data fields of a page request descriptor or
a page group response descriptor? Those fields contain software-defined
private data (might be a kernel pointer?). We should avoid leaking such
information in a generic kernel message for security reasons.
Or is there anything I missed?

Best regards,
Lu Baolu

[PATCH -next] iommu/amd: remove set but not used variable 'tag'

2018-11-07 Thread YueHaibing
From: Yue Haibing 

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/iommu/amd_iommu.c: In function 'iommu_print_event':
drivers/iommu/amd_iommu.c:550:33: warning:
 variable 'tag' set but not used [-Wunused-but-set-variable]

It has never been used since its introduction in
commit e7f63ffc1bf1 ("iommu/amd: Update logging information for new event type").

Signed-off-by: Yue Haibing 
---
 drivers/iommu/amd_iommu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 4e04fff..58a7834 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -547,7 +547,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 
domain_id,
 static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 {
struct device *dev = iommu->iommu.dev;
-   int type, devid, pasid, flags, tag;
+   int type, devid, pasid, flags;
volatile u32 *event = __evt;
int count = 0;
u64 address;
@@ -615,7 +615,6 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
case EVENT_TYPE_INV_PPR_REQ:
pasid = ((event[0] >> 16) & 0xFFFF)
| ((event[1] << 6) & 0xF0000);
-   tag = event[1] & 0x03FF;
dev_err(dev, "INVALID_PPR_REQUEST device=%02x:%02x.%x 
pasid=0x%05x address=0x%016llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);





RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-07 Thread Liu, Yi L
> From: Liu, Yi L
> Sent: Thursday, November 8, 2018 1:45 PM
> >  +  memcpy(desc, qi->desc + (wait_index << shift),
> > >>>
> > >>> Would "memcpy(desc, (unsigned long long) (qi->desc +  (wait_index
> > >>> << shift)," be more safe?
> > >>
> > >> Can that be compiled? memcpy() requires a "const void *" for the
> > >> second
> > parameter.
> > >> By the way, why it's safer with this casting?
> > >
> > > This is just an example. My point is the possibility that "qi->desc
> > > + (wait_index <<
> > shift)"
> > > would be treated as "qi->desc plus (wait_index <<
> > > shift)*sizeof(*qi->desc)". Is it possible for kernel build?
> >
> > qi->desc is of type of "void *".
> 
> no, I don’t think so... Refer to the code below. Even it has no correctness 
> issue her,
> It's not due to qi->desc is "void *" type...
> 
> struct qi_desc {
> - u64 low, high;
> + u64 qw0;
> + u64 qw1;
> + u64 qw2;
> + u64 qw3;
> };

Oops, I just saw that you modified it to be "void *" in this patch. OK, then
this is fair enough.

Thanks,
Yi Liu

RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-07 Thread Liu, Yi L

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, November 8, 2018 1:25 PM
> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor
> support
> 
> Hi,
> 
> On 11/8/18 11:49 AM, Liu, Yi L wrote:
> > Hi,
> >
> >> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> >> Sent: Thursday, November 8, 2018 10:17 AM
> >> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation
> >> descriptor support
> >>
> >> Hi Yi,
> >>
> >> On 11/7/18 2:07 PM, Liu, Yi L wrote:
> >>> Hi Baolu,
> >>>
>  From: Lu Baolu [mailto:baolu...@linux.intel.com]
>  Sent: Monday, November 5, 2018 1:32 PM
> >>>
> >>> [...]
> >>>
>  ---
> drivers/iommu/dmar.c| 83 +++--
> drivers/iommu/intel-svm.c   | 76 --
> drivers/iommu/intel_irq_remapping.c |  6 ++-
> include/linux/intel-iommu.h |  9 +++-
> 4 files changed, 115 insertions(+), 59 deletions(-)
> 
>  diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
>  d9c748b6f9e4..ec10427b98ac 100644
>  --- a/drivers/iommu/dmar.c
>  +++ b/drivers/iommu/dmar.c
>  @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu
>  *iommu, int
>  index)
>   int head, tail;
>   struct q_inval *qi = iommu->qi;
>   int wait_index = (index + 1) % QI_LENGTH;
>  +int shift = qi_shift(iommu);
> 
>   if (qi->desc_status[wait_index] == QI_ABORT)
>   return -EAGAIN;
>  @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct
>  intel_iommu *iommu, int index)
>    */
>   if (fault & DMA_FSTS_IQE) {
>   head = readl(iommu->reg + DMAR_IQH_REG);
>  -if ((head >> DMAR_IQ_SHIFT) == index) {
>  +if ((head >> shift) == index) {
>  +struct qi_desc *desc = qi->desc + head;
>  +
>   pr_err("VT-d detected invalid descriptor: "
>   "low=%llx, high=%llx\n",
>  -(unsigned long long)qi->desc[index].low,
>  -(unsigned long 
>  long)qi->desc[index].high);
>  -memcpy(&qi->desc[index], &qi->desc[wait_index],
>  -sizeof(struct qi_desc));
>  +(unsigned long long)desc->qw0,
>  +(unsigned long long)desc->qw1);
> >>>
> >>> Still missing qw2 and qw3. May make the print differ based on if smts is 
> >>> configed.
> >>
> >> qw2 and qw3 are reserved from software point of view. We don't need
> >> to print it for information.
> >
> > But for Scalable mode, it should be valid?
> 
> No. It's reserved for software.

No, I don't think so. PRQ responses would also be queued to the hardware by QI.
For such QI descriptors, the high bits are not reserved.

> >>
> >>>
>  +memcpy(desc, qi->desc + (wait_index << shift),
> >>>
> >>> Would "memcpy(desc, (unsigned long long) (qi->desc +  (wait_index <<
> >>> shift)," be more safe?
> >>
> >> Can that be compiled? memcpy() requires a "const void *" for the second
> parameter.
> >> By the way, why it's safer with this casting?
> >
> > This is just an example. My point is the possibility that "qi->desc + 
> > (wait_index <<
> shift)"
> > would be treated as "qi->desc plus (wait_index <<
> > shift)*sizeof(*qi->desc)". Is it possible for kernel build?
> 
> qi->desc is of type of "void *".

No, I don't think so... Refer to the code below. Even though it has no
correctness issue here, that's not because qi->desc is of "void *" type...

struct qi_desc {
-   u64 low, high;
+   u64 qw0;
+   u64 qw1;
+   u64 qw2;
+   u64 qw3;
};

Regards,
Yi Liu

Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-07 Thread Lu Baolu

Hi,

On 11/8/18 11:49 AM, Liu, Yi L wrote:

Hi,


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, November 8, 2018 10:17 AM
Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor
support

Hi Yi,

On 11/7/18 2:07 PM, Liu, Yi L wrote:

Hi Baolu,


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Monday, November 5, 2018 1:32 PM


[...]


---
   drivers/iommu/dmar.c| 83 +++--
   drivers/iommu/intel-svm.c   | 76 --
   drivers/iommu/intel_irq_remapping.c |  6 ++-
   include/linux/intel-iommu.h |  9 +++-
   4 files changed, 115 insertions(+), 59 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
d9c748b6f9e4..ec10427b98ac 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu
*iommu, int
index)
int head, tail;
struct q_inval *qi = iommu->qi;
int wait_index = (index + 1) % QI_LENGTH;
+   int shift = qi_shift(iommu);

if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
@@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu
*iommu, int index)
 */
if (fault & DMA_FSTS_IQE) {
head = readl(iommu->reg + DMAR_IQH_REG);
-   if ((head >> DMAR_IQ_SHIFT) == index) {
+   if ((head >> shift) == index) {
+   struct qi_desc *desc = qi->desc + head;
+
pr_err("VT-d detected invalid descriptor: "
"low=%llx, high=%llx\n",
-   (unsigned long long)qi->desc[index].low,
-   (unsigned long long)qi->desc[index].high);
-   memcpy(&qi->desc[index], &qi->desc[wait_index],
-   sizeof(struct qi_desc));
+   (unsigned long long)desc->qw0,
+   (unsigned long long)desc->qw1);


Still missing qw2 and qw3. May make the print differ based on if smts is 
configed.


qw2 and qw3 are reserved from software point of view. We don't need to print it 
for
information.


But for Scalable mode, it should be valid?


No. It's reserved for software.








+   memcpy(desc, qi->desc + (wait_index << shift),


Would "memcpy(desc, (unsigned long long) (qi->desc +  (wait_index <<
shift)," be more safe?


Can that be compiled? memcpy() requires a "const void *" for the second 
parameter.
By the way, why it's safer with this casting?


This is just an example. My point is the possibility that "qi->desc + (wait_index << 
shift)"
would be treated as "qi->desc plus (wait_index << shift)*sizeof(*qi->desc)". Is 
it possible
for kernel build?


qi->desc is of type "void *".
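
For what it's worth, the concern above boils down to how C pointer arithmetic
scales with the pointee type; here is a minimal stand-alone illustration (a
hypothetical example, not code from the patch):

	#include <stdio.h>

	struct qi_desc {
		unsigned long long qw0, qw1, qw2, qw3;	/* 32 bytes = 256 bits */
	};

	int main(void)
	{
		struct qi_desc ring[4];
		struct qi_desc *typed = ring;
		void *untyped = ring;	/* like qi->desc after this patch */
		int shift = 5;		/* 256-bit descriptor: 1 << 5 == 32 bytes */
		int wait_index = 2;

		/* Typed arithmetic scales by sizeof(struct qi_desc). */
		printf("typed:   %p\n", (void *)(typed + wait_index));
		/*
		 * void * arithmetic (a GNU C extension used by the kernel)
		 * advances by bytes, so the byte offset must be computed
		 * explicitly, which is exactly what (wait_index << shift) does.
		 * Both printf calls therefore show the same address.
		 */
		printf("untyped: %p\n", untyped + (wait_index << shift));
		return 0;
	}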

Best regards,
Lu Baolu


RE: [PATCH v4 06/12] iommu/vt-d: Add second level page table interface

2018-11-07 Thread Liu, Yi L
Hi,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, November 8, 2018 10:28 AM
> Subject: Re: [PATCH v4 06/12] iommu/vt-d: Add second level page table 
> interface
> 
> Hi,
> 
> On 11/7/18 3:13 PM, Liu, Yi L wrote:
> > Hi Baolu,
> >
> >> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> >> Sent: Monday, November 5, 2018 1:32 PM
> >>
> >> This adds the interfaces to setup or tear down the structures for
> >> second level page table translations. This includes types of second
> >> level only translation and pass through.
> >
> > A little bit refining to the description:) "This patch adds interfaces
> > for setup or tear down second level translation in PASID granularity.
> > Translation type includes second level only type and pass-through
> > type."
> >
> >> Cc: Ashok Raj 
> >> Cc: Jacob Pan 
> >> Cc: Kevin Tian 
> >> Cc: Liu Yi L 
> >> Signed-off-by: Sanjay Kumar 
> >
> > [...]
> >
> >> +
> >> +void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> >> +   struct device *dev, int pasid)
> >> +{
> >> +  struct pasid_entry *pte;
> >
> > pte is confusing as it is similar with pte in paging structures. may
> > use pt_entry or just pasid_entry. This comment applies to other "pte"s
> > in this patch.
> 
> "pte" in this file means "pasid table entry", not "page table entry".
> This file holds code to handle pasid table related staff. It has nothing to 
> do with
> paging structure. I think there should be no confusion here.
> :-)

I see. Then it's up to you. :) It's just my feeling when reading the patch; it
led me to believe it was a paging structure.

Regards,
Yi Liu


RE: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-07 Thread Liu, Yi L
Hi,

> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, November 8, 2018 10:17 AM
> Subject: Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor
> support
> 
> Hi Yi,
> 
> On 11/7/18 2:07 PM, Liu, Yi L wrote:
> > Hi Baolu,
> >
> >> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> >> Sent: Monday, November 5, 2018 1:32 PM
> >
> > [...]
> >
> >> ---
> >>   drivers/iommu/dmar.c| 83 +++--
> >>   drivers/iommu/intel-svm.c   | 76 --
> >>   drivers/iommu/intel_irq_remapping.c |  6 ++-
> >>   include/linux/intel-iommu.h |  9 +++-
> >>   4 files changed, 115 insertions(+), 59 deletions(-)
> >>
> >> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
> >> d9c748b6f9e4..ec10427b98ac 100644
> >> --- a/drivers/iommu/dmar.c
> >> +++ b/drivers/iommu/dmar.c
> >> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu
> >> *iommu, int
> >> index)
> >>int head, tail;
> >>struct q_inval *qi = iommu->qi;
> >>int wait_index = (index + 1) % QI_LENGTH;
> >> +  int shift = qi_shift(iommu);
> >>
> >>if (qi->desc_status[wait_index] == QI_ABORT)
> >>return -EAGAIN;
> >> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu
> >> *iommu, int index)
> >> */
> >>if (fault & DMA_FSTS_IQE) {
> >>head = readl(iommu->reg + DMAR_IQH_REG);
> >> -  if ((head >> DMAR_IQ_SHIFT) == index) {
> >> +  if ((head >> shift) == index) {
> >> +  struct qi_desc *desc = qi->desc + head;
> >> +
> >>pr_err("VT-d detected invalid descriptor: "
> >>"low=%llx, high=%llx\n",
> >> -  (unsigned long long)qi->desc[index].low,
> >> -  (unsigned long long)qi->desc[index].high);
> >> -  memcpy(&qi->desc[index], &qi->desc[wait_index],
> >> -  sizeof(struct qi_desc));
> >> +  (unsigned long long)desc->qw0,
> >> +  (unsigned long long)desc->qw1);
> >
> > Still missing qw2 and qw3. May make the print differ based on if smts is 
> > configed.
> 
> qw2 and qw3 are reserved from software point of view. We don't need to print 
> it for
> information.

But for Scalable mode, it should be valid?

> 
> >
> >> +  memcpy(desc, qi->desc + (wait_index << shift),
> >
> > Would "memcpy(desc, (unsigned long long) (qi->desc +  (wait_index <<
> > shift)," be more safe?
> 
> Can that be compiled? memcpy() requires a "const void *" for the second 
> parameter.
> By the way, why it's safer with this casting?

This is just an example. My point is the possibility that "qi->desc +
(wait_index << shift)" would be treated as "qi->desc plus
(wait_index << shift) * sizeof(*qi->desc)". Could that happen in a kernel build?

Regards,
Yi Liu


Re: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping

2018-11-07 Thread Lu Baolu

Hi,

On 11/7/18 3:25 PM, Liu, Yi L wrote:

Hi Baolu,


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Monday, November 5, 2018 1:32 PM
Subject: [PATCH v4 08/12] iommu/vt-d: Pass pasid table to context mapping

So that the pasid related info, such as the pasid table and the
maximum of pasid could be used during setting up scalable mode
context.


A little bit refine. Wish it helps. :)
"This patch passes the pasid related info(e.g. the pasid table and the
maximum of pasid) to context mapping, so that pasid related fields
can be setup accordingly in scalable mode context entry."


Yeah, thanks.

Best regards,
Lu Baolu


Re: [PATCH v4 06/12] iommu/vt-d: Add second level page table interface

2018-11-07 Thread Lu Baolu

Hi,

On 11/7/18 3:13 PM, Liu, Yi L wrote:

Hi Baolu,


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Monday, November 5, 2018 1:32 PM

This adds the interfaces to setup or tear down the structures
for second level page table translations. This includes types
of second level only translation and pass through.


A little bit refining to the description:)
"This patch adds interfaces for setup or tear down second level
translation in PASID granularity. Translation type includes second
level only type and pass-through type."


Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Sanjay Kumar 


[...]


+
+void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
+struct device *dev, int pasid)
+{
+   struct pasid_entry *pte;


pte is confusing as it is similar with pte in paging structures. may use
pt_entry or just pasid_entry. This comment applies to other "pte"s in
this patch.


"pte" in this file means "pasid table entry", not "page table entry".
This file holds code to handle pasid table related stuff. It has nothing
to do with paging structures. I think there should be no confusion here.
:-)

Best regards,
Lu Baolu


Re: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes

2018-11-07 Thread Lu Baolu

Hi,

On 11/7/18 2:55 PM, Liu, Yi L wrote:

Hi Baolu,


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Monday, November 5, 2018 1:32 PM
Subject: [PATCH v4 05/12] iommu/vt-d: Reserve a domain id for FL and PT modes

Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid
entry for first-level or pass-through translation should be
programmed with a domain id different from those used for
second-level or nested translation. It is recommended that
software could use a same domain id for all first-only and
pass-through translations.

This reserves a domain id for first-level and pass-through
translations.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Cc: Sanjay Kumar 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel-iommu.c | 10 ++
  drivers/iommu/intel-pasid.h |  6 ++
  2 files changed, 16 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9331240c70b8..2f7455ee4e7a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1618,6 +1618,16 @@ static int iommu_init_domains(struct intel_iommu
*iommu)
 */
set_bit(0, iommu->domain_ids);

+   /*
+* Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid
+* entry for first-level or pass-through translation modes should
+* be programmed with a domain id different from those used for
+* second-level or nested translation. We reserve a domain id for
+* this purpose.
+*/
+   if (sm_supported(iommu))
+   set_bit(FLPT_DEFAULT_DID, iommu->domain_ids);


"FLPT_DEFAULT_DID" looks very likely for first level translation. How about
"PT_FL_DEFAULT_DID"?


We have comments above it, so people won't be confused.




return 0;
  }

diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 12f480c2bb8b..03c1612d173c 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -17,6 +17,12 @@
  #define PDE_PFN_MASK  PAGE_MASK
  #define PASID_PDE_SHIFT   6

+/*
+ * Domain ID reserved for pasid entries programmed for first-level
+ * only and pass-through transfer modes.
+ */
+#define FLPT_DEFAULT_DID   1


Would be helpful to elaborate why DID 1 is selected in the patch
description.


Yeah. DID 0 has been carved out for caching mode, so we start from 1 for
this.

Best regards,
Lu Baolu


Re: [PATCH v4 04/12] iommu/vt-d: Add 256-bit invalidation descriptor support

2018-11-07 Thread Lu Baolu

Hi Yi,

On 11/7/18 2:07 PM, Liu, Yi L wrote:

Hi Baolu,


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Monday, November 5, 2018 1:32 PM


[...]


---
  drivers/iommu/dmar.c| 83 +++--
  drivers/iommu/intel-svm.c   | 76 --
  drivers/iommu/intel_irq_remapping.c |  6 ++-
  include/linux/intel-iommu.h |  9 +++-
  4 files changed, 115 insertions(+), 59 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index d9c748b6f9e4..ec10427b98ac 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int
index)
int head, tail;
struct q_inval *qi = iommu->qi;
int wait_index = (index + 1) % QI_LENGTH;
+   int shift = qi_shift(iommu);

if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
@@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu *iommu,
int index)
 */
if (fault & DMA_FSTS_IQE) {
head = readl(iommu->reg + DMAR_IQH_REG);
-   if ((head >> DMAR_IQ_SHIFT) == index) {
+   if ((head >> shift) == index) {
+   struct qi_desc *desc = qi->desc + head;
+
pr_err("VT-d detected invalid descriptor: "
"low=%llx, high=%llx\n",
-   (unsigned long long)qi->desc[index].low,
-   (unsigned long long)qi->desc[index].high);
-   memcpy(&qi->desc[index], &qi->desc[wait_index],
-   sizeof(struct qi_desc));
+   (unsigned long long)desc->qw0,
+   (unsigned long long)desc->qw1);


Still missing qw2 and qw3. May make the print differ based on if smts is 
configed.


qw2 and qw3 are reserved from software point of view. We don't need to
print it for information.




+   memcpy(desc, qi->desc + (wait_index << shift),


Would "memcpy(desc, (unsigned long long) (qi->desc +  (wait_index << shift)," be
more safe?


Can that be compiled? memcpy() requires a "const void *" for the second
parameter. By the way, why it's safer with this casting?




+  1 << shift);
writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG);
return -EINVAL;
}
@@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu *iommu,
int index)
 */
if (fault & DMA_FSTS_ITE) {
head = readl(iommu->reg + DMAR_IQH_REG);
-   head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
+   head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
head |= 1;
tail = readl(iommu->reg + DMAR_IQT_REG);
-   tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
+   tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;

writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);

@@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct
intel_iommu *iommu)
  {
int rc;
struct q_inval *qi = iommu->qi;
-   struct qi_desc *hw, wait_desc;
+   int offset, shift, length;
+   struct qi_desc wait_desc;
int wait_index, index;
unsigned long flags;

if (!qi)
return 0;

-   hw = qi->desc;
-
  restart:
rc = 0;

@@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct
intel_iommu *iommu)

index = qi->free_head;
wait_index = (index + 1) % QI_LENGTH;
+   shift = qi_shift(iommu);
+   length = 1 << shift;

qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;

-   hw[index] = *desc;
-
-   wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
+   offset = index << shift;
+   memcpy(qi->desc + offset, desc, length);
+   wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
-   wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
+   wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
+   wait_desc.qw2 = 0;
+   wait_desc.qw3 = 0;

-   hw[wait_index] = wait_desc;
+   offset = wait_index << shift;
+   memcpy(qi->desc + offset, &wait_desc, length);


same question with above one.



Ditto.
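
For readers following along: legacy invalidation descriptors are 128 bits
(16 bytes) while scalable-mode descriptors are 256 bits (32 bytes), so the
patch computes queue offsets as index << shift rather than indexing a typed
array. The helper presumably looks something like this (an illustrative
sketch, not copied from the patch):

	static inline int qi_shift(struct intel_iommu *iommu)
	{
		/* 16-byte descriptors: shift of 4; 32-byte descriptors: 5 */
		return DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);
	}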

Best regards,
Lu Baolu


Re: [RFC] iommu/vt-d: Group and domain relationship

2018-11-07 Thread Lu Baolu

Hi,

On 11/8/18 1:55 AM, James Sewart wrote:

Hey,


On 7 Nov 2018, at 02:10, Lu Baolu  wrote:

Hi,

On 11/6/18 6:40 PM, James Sewart wrote:

Hey Lu,
Would you be able to go into more detail about the issues with
allowing IOMMU_DOMAIN_DMA to be allocated via domain_alloc?


This door is closed because intel iommu driver does everything for
IOMMU_DOMAIN_DMA: allocating a domain and setup the context entries
for the domain.


As far as I can tell, attach_device in the intel driver will handle
cleaning up any old domain context mapping and ensure the new domain is
mapped with calls to dmar_remove_one_dev_info and domain_add_dev_info.


That's only for domains of IOMMU_DOMAIN_UNMANAGED, right?





Why do we want to open this door? Probably we want the generic iommu
layer to handle these things (it's called default domain).


I’d like to allocate a domain and attach it to multiple devices in a
group/multiple groups so that they share address translation, but still
allow drivers for devices in those groups to use the dma_map_ops api.


Just out of curiosity, why do you want to share a single domain across
multiple groups? By default, the groups and DMA domains are normally
1-1 mapped, right?




So we can't just open the door but not cleanup the things right?


A user of domain_alloc and attach_device is responsible for detaching a
domain if it is no longer needed and calling domain_free.


Currently the DMA API calls get_valid_domain_for_dev() to retrieve a DMA
domain. If the domain has already been allocated, it is returned directly;
otherwise, a new one is allocated and initialized for the device. Let's call
domains allocated by get_valid_domain_for_dev() "A".

Now suppose we open the door and allow another component to manage DMA
domains through iommu_domain_alloc/free(). Let's call domains allocated
through iommu_domain_alloc() "B".

So how can we keep A and B in sync?

Need to go through the code to find out more.

Best regards,
Lu Baolu


Cheers,
James.




I haven't spent time on details. So I cc'ed Jacob for corrections.

Best regards,
Lu Baolu


Cheers,
James.
On Fri, Nov 2, 2018 at 2:43 AM Lu Baolu  wrote:


Hi,

On 10/30/18 10:18 PM, James Sewart via iommu wrote:

Hey,

I’ve been investigating the relationship between iommu groups and domains
on our systems and have a few question. Why does the intel iommu code not
allow allocating IOMMU_DOMAIN_DMA? Returning NULL when given this domain
type has the side effect that the default_domain for an iommu group is not
set, which, when using for e.g. dma_map_ops.map_page, means a domain is
allocated per device.


The Intel VT-d driver doesn't implement the default domain and allocates
domains only on demand. There are lots of things to do before we allow the
iommu API to allocate domains other than IOMMU_DOMAIN_UNMANAGED.

Best regards,
Lu Baolu



This seems to be the opposite behaviour to the AMD iommu code which
supports allocating an IOMMU_DOMAIN_DMA and will only look to the iommu
group if a domain is not attached to the device rather than allocating a
new one. On AMD every device in an iommu group will share the same domain.

Appended is what I think a patch to implement domain_alloc for
IOMMU_DOMAIN_DMA and also IOMMU_DOMAIN_IDENTITY would look like. Testing
shows each device in a group will share a domain by default, it also
allows allocating a new dma domain that can be successfully attached to a
group with iommu_attach_group.

Looking for comment on why the behaviour is how it is currently and if
there are any issues with the solution I’ve been testing.

Cheers,
James.


diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bff2abd6..3a58389f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5170,10 +5170,15 @@ static struct iommu_domain 
*intel_iommu_domain_alloc(unsigned type)
   struct dmar_domain *dmar_domain;
   struct iommu_domain *domain;

- if (type != IOMMU_DOMAIN_UNMANAGED)
+ if (type == IOMMU_DOMAIN_UNMANAGED)
+ dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
+ else if(type == IOMMU_DOMAIN_DMA)
+ dmar_domain = alloc_domain(0);
+ else if(type == IOMMU_DOMAIN_IDENTITY)
+ dmar_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
+ else
   return NULL;

- dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
   if (!dmar_domain) {
   pr_err("Can't allocate dmar_domain\n");
   return NULL;
@@ -5186,9 +5191,12 @@ static struct iommu_domain 
*intel_iommu_domain_alloc(unsigned type)
   domain_update_iommu_cap(dmar_domain);

   domain = &dmar_domain->domain;
- domain->geometry.aperture_start = 0;
- domain->geometry.aperture_end   = __DOMAIN_MAX_ADDR(dmar_domain->gaw);
- domain->geometry.force_aperture = true;
+
+ if (type == IOMMU_DOMAIN_UNMANAGED) {
+ domain->geometry.aperture_start = 0;
+ domain->geometry.aperture_end   = 

Re: [PATCH v2] iommu/vt-d: respect max guest address width in agaw

2018-11-07 Thread Jacob Pan
On Wed, 7 Nov 2018 17:04:28 +0100
Joerg Roedel  wrote:

> On Tue, Nov 06, 2018 at 02:47:15PM -0800, Jacob Pan wrote:
> >  drivers/iommu/intel-iommu.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)  
> 
> Applied, thanks.

Hi Joerg,
We have found some issues with this patch on some platforms. Please
disregard this patch.

The intent of this patch was to fix IOMMUs with a max address width of 48
bits but a SAGAW reporting both 48 and 57 bits. We need to do more
investigation into the page table level calculation and context address
width setup. Sorry about that.

Jacob


Re: [PATCH v2] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Rob Herring
On Wed, Nov 07, 2018 at 04:30:32PM +, Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
> 
> Make sure we only touch it if the DT actually said so.
> 
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen 
> Reported-by: Jean-Philippe Brucker 
> Tested-by: Aaro Koskinen 
> Tested-by: John Stultz 
> Tested-by: Geert Uytterhoeven 
> Tested-by: Robert Richter 
> Signed-off-by: Robin Murphy 
> ---
> 
>  v2: Add comment, collect tested-by tags
> 
>  drivers/of/device.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Applied, thanks.

Rob


Re: [RFC] iommu/vt-d: Group and domain relationship

2018-11-07 Thread James Sewart via iommu
Hey,

> On 7 Nov 2018, at 02:10, Lu Baolu  wrote:
> 
> Hi,
> 
> On 11/6/18 6:40 PM, James Sewart wrote:
>> Hey Lu,
>> Would you be able to go into more detail about the issues with
>> allowing IOMMU_DOMAIN_DMA to be allocated via domain_alloc?
> 
> This door is closed because intel iommu driver does everything for
> IOMMU_DOMAIN_DMA: allocating a domain and setup the context entries
> for the domain.

As far as I can tell, attach_device in the intel driver will handle 
cleaning up any old domain context mapping and ensure the new domain is 
mapped with calls to dmar_remove_one_dev_info and domain_add_dev_info.

> 
> Why do we want to open this door? Probably we want the generic iommu
> layer to handle these things (it's called default domain).

I’d like to allocate a domain and attach it to multiple devices in a 
group/multiple groups so that they share address translation, but still 
allow drivers for devices in those groups to use the dma_map_ops api.

> So we can't just open the door but not cleanup the things right?

A user of domain_alloc and attach_device is responsible for detaching a 
domain if it is no longer needed and calling domain_free.
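
To make the intended usage concrete, the pattern being described looks
roughly like the sketch below, using the generic IOMMU API (illustrative
only: error handling is trimmed and the helper name is made up):

	#include <linux/iommu.h>

	/* Attach one caller-managed domain to several devices' groups. */
	static struct iommu_domain *share_domain(struct device **devs, int ndev)
	{
		struct iommu_domain *domain;
		int i;

		/* Allocates an IOMMU_DOMAIN_UNMANAGED domain today; with the
		 * proposed change a DMA domain could be requested similarly. */
		domain = iommu_domain_alloc(devs[0]->bus);
		if (!domain)
			return NULL;

		for (i = 0; i < ndev; i++) {
			struct iommu_group *group = iommu_group_get(devs[i]);

			if (group) {
				iommu_attach_group(domain, group);
				iommu_group_put(group);
			}
		}
		/* Caller must later detach the groups and call iommu_domain_free(). */
		return domain;
	}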

Cheers,
James.


> 
> I haven't spent time on details. So I cc'ed Jacob for corrections.
> 
> Best regards,
> Lu Baolu
> 
>> Cheers,
>> James.
>> On Fri, Nov 2, 2018 at 2:43 AM Lu Baolu  wrote:
>>> 
>>> Hi,
>>> 
>>> On 10/30/18 10:18 PM, James Sewart via iommu wrote:
 Hey,
 
 I’ve been investigating the relationship between iommu groups and domains
 on our systems and have a few question. Why does the intel iommu code not
 allow allocating IOMMU_DOMAIN_DMA? Returning NULL when given this domain
 type has the side effect that the default_domain for an iommu group is not
 set, which, when using for e.g. dma_map_ops.map_page, means a domain is
 allocated per device.
>>> 
>>> Intel vt-d driver doesn't implement the default domain and allocates
>>> domain only on demanded. There are lots of things to do before we allow
>>> iommu API to allocate domains other than IOMMU_DOMAIN_UNMANAGED.
>>> 
>>> Best regards,
>>> Lu Baolu
>>> 
 
 This seems to be the opposite behaviour to the AMD iommu code which
 supports allocating an IOMMU_DOMAIN_DMA and will only look to the iommu
 group if a domain is not attached to the device rather than allocating a
 new one. On AMD every device in an iommu group will share the same domain.
 
 Appended is what I think a patch to implement domain_alloc for
 IOMMU_DOMAIN_DMA and also IOMMU_DOMAIN_IDENTITY would look like. Testing
 shows each device in a group will share a domain by default, it also
 allows allocating a new dma domain that can be successfully attached to a
 group with iommu_attach_group.
 
 Looking for comment on why the behaviour is how it is currently and if
 there are any issues with the solution I’ve been testing.
 
 Cheers,
 James.
 
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index bff2abd6..3a58389f 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -5170,10 +5170,15 @@ static struct iommu_domain 
 *intel_iommu_domain_alloc(unsigned type)
   struct dmar_domain *dmar_domain;
   struct iommu_domain *domain;
 
 - if (type != IOMMU_DOMAIN_UNMANAGED)
 + if (type == IOMMU_DOMAIN_UNMANAGED)
 + dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
 + else if(type == IOMMU_DOMAIN_DMA)
 + dmar_domain = alloc_domain(0);
 + else if(type == IOMMU_DOMAIN_IDENTITY)
 + dmar_domain = alloc_domain(DOMAIN_FLAG_STATIC_IDENTITY);
 + else
   return NULL;
 
 - dmar_domain = alloc_domain(DOMAIN_FLAG_VIRTUAL_MACHINE);
   if (!dmar_domain) {
   pr_err("Can't allocate dmar_domain\n");
   return NULL;
 @@ -5186,9 +5191,12 @@ static struct iommu_domain 
 *intel_iommu_domain_alloc(unsigned type)
   domain_update_iommu_cap(dmar_domain);
 
   domain = &dmar_domain->domain;
 - domain->geometry.aperture_start = 0;
 - domain->geometry.aperture_end   = 
 __DOMAIN_MAX_ADDR(dmar_domain->gaw);
 - domain->geometry.force_aperture = true;
 +
 + if (type == IOMMU_DOMAIN_UNMANAGED) {
 + domain->geometry.aperture_start = 0;
 + domain->geometry.aperture_end   = 
 __DOMAIN_MAX_ADDR(dmar_domain->gaw);
 + domain->geometry.force_aperture = true;
 + }
 
   return domain;
   }
 


Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-11-07 Thread Alex Williamson
On Wed, 7 Nov 2018 17:43:23 +0100
"j...@8bytes.org"  wrote:

> Hi,
> 
> On Wed, Nov 07, 2018 at 11:40:30AM +0800, Lu Baolu wrote:
> > Hi Joerg,
> > 
> > On 11/7/18 12:25 AM, j...@8bytes.org wrote:
> > Nowadays, we can find PASID granular DMA translation on both ARM and x86
> > platforms. With PASID granular DMA translation supported in system iommu, if
> > a device divides itself into multiple subsets and tag the DMA
> > transfers of each subset with a unique PASID, each subset become
> > assignable. We call the assignable subset as an ADI (Assignable Device
> > Interface). As the result, each ADI could be attached with a domain.  
> 
> Yeah, I know the background. The point is, the IOMMU-API as of today
> implements a strict one-to-one relationship between domains and devices,
> every device can only have one domain assigned at a given time. If we
> assign a new domain to a device, the old gets unassigned.
> 
> If we allow to attach multiple domains to a single device we
> fundamentally break that semantic.
> 
> Therefore I think it is better is support the ADI devices with
> subdomains and a new set of functions in the API to handle only these
> sub-domains.
> 
> > Further more, a single domain might be attached to an ADI of device A,
> > while attached to another legacy device B which doesn't support PASID
> > features. In this case, we say "Domain 4" is attached to ADI(PASID#x) in
> > aux mode and attached to device B in primary mode.  
> 
> This will of course not work with subdomains, but is that really
> necessary? VFIO already allocates a separate domain for each device
> anyway (iirc), so it is unlikely that is uses the same domain for a
> legacy and an ADI device.

VFIO will attempt to use the same domain for all groups within a
container, only falling back to separate domains if there's an
incompatibility.  Multiple domains means that we need to mirror mapping
and unmapping across each domain, so there's more work to do and
theoretically more overhead in the IOMMU implementation assuming those
domains land on the same IOMMU driver.  Thanks,

Alex
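
As an aside for readers, "mirroring" here simply means replaying every map
and unmap on each domain in the container; schematically it looks like the
following (a hypothetical helper, loosely modelled on what vfio_iommu_type1
does, not actual VFIO code):

	#include <linux/iommu.h>
	#include <linux/list.h>

	struct mirrored_domain {
		struct iommu_domain *domain;
		struct list_head next;
	};

	/* Replay one mapping on every domain in the container. */
	static int map_on_all_domains(struct list_head *domains, unsigned long iova,
				      phys_addr_t paddr, size_t size, int prot)
	{
		struct mirrored_domain *d;
		int ret;

		list_for_each_entry(d, domains, next) {
			ret = iommu_map(d->domain, iova, paddr, size, prot);
			if (ret)
				return ret;	/* real code would unwind the earlier maps */
		}
		return 0;
	}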


Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free

2018-11-07 Thread Joerg Roedel
On Wed, Nov 07, 2018 at 04:17:09PM +, Robin Murphy wrote:
> FWIW it looks like it *has* always been possible to hit this crash by
> allocating a domain and freeing it again without attaching any devices, it's
> just highly improbable for any sane code to do that explicitly, so the real
> latent triggers are failure paths in external callers (which in this case
> are themselves only being reached thanks to my bug elsewhere).

Okay, given this, it probably makes more sense to change the fixes tag.
This is what I just committed:

Fixes: d25a2a16f0889 ('iommu: Add driver for Renesas VMSA-compatible IPMMU')

Regards,

Joerg


Re: [RFC PATCH 0/6] Auxiliary IOMMU domains and Arm SMMUv3

2018-11-07 Thread j...@8bytes.org
Hi,

On Wed, Nov 07, 2018 at 11:40:30AM +0800, Lu Baolu wrote:
> Hi Joerg,
> 
> On 11/7/18 12:25 AM, j...@8bytes.org wrote:
> Nowadays, we can find PASID granular DMA translation on both ARM and x86
> platforms. With PASID granular DMA translation supported in system iommu, if
> a device divides itself into multiple subsets and tag the DMA
> transfers of each subset with a unique PASID, each subset become
> assignable. We call the assignable subset as an ADI (Assignable Device
> Interface). As the result, each ADI could be attached with a domain.

Yeah, I know the background. The point is, the IOMMU-API as of today
implements a strict one-to-one relationship between domains and devices,
every device can only have one domain assigned at a given time. If we
assign a new domain to a device, the old gets unassigned.

If we allow attaching multiple domains to a single device, we
fundamentally break that semantic.

Therefore I think it is better to support the ADI devices with
sub-domains and a new set of functions in the API to handle only these
sub-domains.

> Further more, a single domain might be attached to an ADI of device A,
> while attached to another legacy device B which doesn't support PASID
> features. In this case, we say "Domain 4" is attached to ADI(PASID#x) in
> aux mode and attached to device B in primary mode.

This will of course not work with subdomains, but is that really
necessary? VFIO already allocates a separate domain for each device
anyway (iirc), so it is unlikely that it uses the same domain for a
legacy and an ADI device.


Regards,

Joerg


[PATCH v2] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Robin Murphy
of_dma_configure() was *supposed* to be following the same logic as
acpi_dma_configure() and only setting bus_dma_mask if some range was
specified by the firmware. However, it seems that subtlety got lost in
the process of fitting it into the differently-shaped control flow, and
as a result the force_dma==true case ends up always setting the bus mask
to the 32-bit default, which is not what anyone wants.

Make sure we only touch it if the DT actually said so.

Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
Reported-by: Aaro Koskinen 
Reported-by: Jean-Philippe Brucker 
Tested-by: Aaro Koskinen 
Tested-by: John Stultz 
Tested-by: Geert Uytterhoeven 
Tested-by: Robert Richter 
Signed-off-by: Robin Murphy 
---

 v2: Add comment, collect tested-by tags

 drivers/of/device.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 0f27fad9fe94..5592437bb3d1 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -149,9 +149,11 @@ int of_dma_configure(struct device *dev, struct 
device_node *np, bool force_dma)
 * set by the driver.
 */
mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
-   dev->bus_dma_mask = mask;
dev->coherent_dma_mask &= mask;
*dev->dma_mask &= mask;
+   /* ...but only set bus mask if we found valid dma-ranges earlier */
+   if (!ret)
+   dev->bus_dma_mask = mask;
 
coherent = of_dma_is_coherent(np);
dev_dbg(dev, "device is%sdma coherent\n",
-- 
2.19.1.dirty



Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Robin Murphy

On 2018-11-07 3:52 pm, Rob Herring wrote:

On Wed, Nov 07, 2018 at 12:56:49PM +, Robin Murphy wrote:

On 2018-11-07 8:03 am, Christoph Hellwig wrote:

On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:

of_dma_configure() was *supposed* to be following the same logic as
acpi_dma_configure() and only setting bus_dma_mask if some range was
specified by the firmware. However, it seems that subtlety got lost in
the process of fitting it into the differently-shaped control flow, and
as a result the force_dma==true case ends up always setting the bus mask
to the 32-bit default, which is not what anyone wants.

Make sure we only touch it if the DT actually said so.


This looks good, but I think it could really use a comment as the use
of ret all the way down the function isn't exactly obvious.


Fair point.


Let me know if you want this picked up through the OF or DMA trees.


I don't mind either way; I figure I'll wait a bit longer to see if Rob has
any preference, then resend with the comment and the tags picked up so it
can hopefully make rc2.


I have other fixes to send, so I can take it.


Cheers Rob, I'll send you that updated version momentarily.

Robin.


Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free

2018-11-07 Thread Robin Murphy

On 2018-11-07 4:03 pm, Joerg Roedel wrote:

On Wed, Nov 07, 2018 at 04:50:40PM +0100, Geert Uytterhoeven wrote:

It only got triggered by the combination of commits 6c2fb2ea76361da9
("of/device: Set bus DMA mask as appropriate") and b4ebe6063204da58
("dma-direct: implement complete bus_dma_mask handling"), which is being
fixed by "of/device: Really only set bus DMA mask when
appropriate" (https://patchwork.kernel.org/patch/10670177/).


Okay, but the bug is triggered since 6c2fb2ea76361da9, so I took this
one for the fixes-tag.


FWIW it looks like it *has* always been possible to hit this crash by 
allocating a domain and freeing it again without attaching any devices, 
it's just highly improbable for any sane code to do that explicitly, so 
the real latent triggers are failure paths in external callers (which in 
this case are themselves only being reached thanks to my bug elsewhere).


Robin.


Re: [PATCH v2] iommu/vt-d: respect max guest address width in agaw

2018-11-07 Thread Joerg Roedel
On Tue, Nov 06, 2018 at 02:47:15PM -0800, Jacob Pan wrote:
>  drivers/iommu/intel-iommu.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free

2018-11-07 Thread Joerg Roedel
On Wed, Nov 07, 2018 at 02:18:50PM +0100, Geert Uytterhoeven wrote:
 
> Fix this by checking if the domain's context already exists, before
> trying to destroy it.
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
>  drivers/iommu/ipmmu-vmsa.c | 3 +++
>  1 file changed, 3 insertions(+)

Applied, thanks.


Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free

2018-11-07 Thread Joerg Roedel
On Wed, Nov 07, 2018 at 04:50:40PM +0100, Geert Uytterhoeven wrote:
> It only got triggered by the combination of commits 6c2fb2ea76361da9
> ("of/device: Set bus DMA mask as appropriate") and b4ebe6063204da58
> ("dma-direct: implement complete bus_dma_mask handling"), which is being
> fixed by "of/device: Really only set bus DMA mask when
> appropriate" (https://patchwork.kernel.org/patch/10670177/).

Okay, but the bug is triggered since 6c2fb2ea76361da9, so I took this
one for the fixes-tag.

Thanks,

Joerg


Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Rob Herring
On Wed, Nov 07, 2018 at 12:56:49PM +, Robin Murphy wrote:
> On 2018-11-07 8:03 am, Christoph Hellwig wrote:
> > On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:
> > > of_dma_configure() was *supposed* to be following the same logic as
> > > acpi_dma_configure() and only setting bus_dma_mask if some range was
> > > specified by the firmware. However, it seems that subtlety got lost in
> > > the process of fitting it into the differently-shaped control flow, and
> > > as a result the force_dma==true case ends up always setting the bus mask
> > > to the 32-bit default, which is not what anyone wants.
> > > 
> > > Make sure we only touch it if the DT actually said so.
> > 
> > This looks good, but I think it could really use a comment as the use
> > of ret all the way down the function isn't exactly obvious.
> 
> Fair point.
> 
> > Let me now if you want this picked up through the OF or DMA trees.
> 
> I don't mind either way; I figure I'll wait a bit longer to see if Rob has
> any preference, then resend with the comment and the tags picked up so it
> can hopefully make rc2.

I have other fixes to send, so I can take it.

Rob


Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free

2018-11-07 Thread Geert Uytterhoeven
Hi Jörg,

On Wed, Nov 7, 2018 at 4:34 PM Joerg Roedel  wrote:
> On Wed, Nov 07, 2018 at 01:22:52PM +, Robin Murphy wrote:
> > On 2018-11-07 1:18 pm, Geert Uytterhoeven wrote:
> > > Fix this by checking if the domain's context already exists, before
> > > trying to destroy it.
> >
> > Reviewed-by: Robin Murphy 
>
> Does this need a Fixes-tag? If so, which patch should be in that tag?

I think this bug has been present since the initial version of the driver, but
this failure mode has probably never been tested before.

It only got triggered by the combination of commits 6c2fb2ea76361da9
("of/device: Set bus DMA mask as appropriate") and b4ebe6063204da58
("dma-direct: implement complete bus_dma_mask handling"), which is being
fixed by "of/device: Really only set bus DMA mask when
appropriate" (https://patchwork.kernel.org/patch/10670177/).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free

2018-11-07 Thread Joerg Roedel
On Wed, Nov 07, 2018 at 01:22:52PM +, Robin Murphy wrote:
> On 2018-11-07 1:18 pm, Geert Uytterhoeven wrote:
> > Fix this by checking if the domain's context already exists, before
> > trying to destroy it.
> 
> Reviewed-by: Robin Murphy 

Does this need a Fixes-tag? If so, which patch should be in that tag?


Thanks,

Joerg



Re: [PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free

2018-11-07 Thread Robin Murphy

On 2018-11-07 1:18 pm, Geert Uytterhoeven wrote:

If iommu_ops.add_device() fails, iommu_ops.domain_free() is still
called, leading to a crash, as the domain was only partially
initialized:

 ipmmu-vmsa e67b.mmu: Cannot accommodate DMA translation for IOMMU page 
tables
 sata_rcar ee30.sata: Unable to initialize IPMMU context
 iommu: Failed to add device ee30.sata to group 0: -22
 Unable to handle kernel NULL pointer dereference at virtual address 
0038
 ...
 Call trace:
  ipmmu_domain_free+0x1c/0xa0
  iommu_group_release+0x48/0x68
  kobject_put+0x74/0xe8
  kobject_del.part.0+0x3c/0x50
  kobject_put+0x60/0xe8
  iommu_group_get_for_dev+0xa8/0x1f0
  ipmmu_add_device+0x1c/0x40
  of_iommu_configure+0x118/0x190

Fix this by checking if the domain's context already exists, before
trying to destroy it.


Reviewed-by: Robin Murphy 


Signed-off-by: Geert Uytterhoeven 
---
  drivers/iommu/ipmmu-vmsa.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index c4114b92652eb0c9..a8b2c649c1d1f1b9 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -498,6 +498,9 @@ static int ipmmu_domain_init_context(struct 
ipmmu_vmsa_domain *domain)
  
  static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)

  {
+   if (!domain->mmu)
+   return;
+
/*
 * Disable the context. Flush the TLB as required when modifying the
 * context registers.




[PATCH] iommu/ipmmu-vmsa: Fix crash on early domain free

2018-11-07 Thread Geert Uytterhoeven
If iommu_ops.add_device() fails, iommu_ops.domain_free() is still
called, leading to a crash, as the domain was only partially
initialized:

ipmmu-vmsa e67b.mmu: Cannot accommodate DMA translation for IOMMU page tables
sata_rcar ee30.sata: Unable to initialize IPMMU context
iommu: Failed to add device ee30.sata to group 0: -22
Unable to handle kernel NULL pointer dereference at virtual address 0038
...
Call trace:
 ipmmu_domain_free+0x1c/0xa0
 iommu_group_release+0x48/0x68
 kobject_put+0x74/0xe8
 kobject_del.part.0+0x3c/0x50
 kobject_put+0x60/0xe8
 iommu_group_get_for_dev+0xa8/0x1f0
 ipmmu_add_device+0x1c/0x40
 of_iommu_configure+0x118/0x190

Fix this by checking if the domain's context already exists, before
trying to destroy it.

Signed-off-by: Geert Uytterhoeven 
---
 drivers/iommu/ipmmu-vmsa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index c4114b92652eb0c9..a8b2c649c1d1f1b9 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -498,6 +498,9 @@ static int ipmmu_domain_init_context(struct 
ipmmu_vmsa_domain *domain)
 
 static void ipmmu_domain_destroy_context(struct ipmmu_vmsa_domain *domain)
 {
+   if (!domain->mmu)
+   return;
+
/*
 * Disable the context. Flush the TLB as required when modifying the
 * context registers.
-- 
2.17.1
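
For illustration, a minimal, self-contained C sketch of the guard pattern the patch adds - this is not the ipmmu-vmsa driver code; fake_domain, fake_mmu and fake_destroy_context are placeholder names. A domain that failed in add_device() never got a hardware context, so its back-pointer is still NULL and destroy must simply return:

#include <stdio.h>
#include <stdlib.h>

/* Placeholder types standing in for the driver's domain/MMU structures. */
struct fake_mmu {
    int context_id;
};

struct fake_domain {
    struct fake_mmu *mmu;   /* stays NULL until a hardware context is set up */
};

/* Same shape as the fix: bail out early if no context was ever created. */
static void fake_destroy_context(struct fake_domain *domain)
{
    if (!domain->mmu)
        return;             /* partially-initialized domain: nothing to disable */

    /* ...disable the context and flush the TLB here... */
    printf("destroying context %d\n", domain->mmu->context_id);
}

int main(void)
{
    /* add_device() failed early: the domain was allocated but never attached. */
    struct fake_domain *partial = calloc(1, sizeof(*partial));

    fake_destroy_context(partial);  /* safe; without the guard this dereferences NULL */
    free(partial);
    return 0;
}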



Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Robin Murphy

On 2018-11-07 8:03 am, Christoph Hellwig wrote:

On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:

of_dma_configure() was *supposed* to be following the same logic as
acpi_dma_configure() and only setting bus_dma_mask if some range was
specified by the firmware. However, it seems that subtlety got lost in
the process of fitting it into the differently-shaped control flow, and
as a result the force_dma==true case ends up always setting the bus mask
to the 32-bit default, which is not what anyone wants.

Make sure we only touch it if the DT actually said so.


This looks good, but I think it could really use a comment as the use
of ret all the way down the function isn't exactly obvious.


Fair point.


Let me know if you want this picked up through the OF or DMA trees.


I don't mind either way; I figure I'll wait a bit longer to see if Rob 
has any preference, then resend with the comment and the tags picked up 
so it can hopefully make rc2.


Robin.


Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Robin Murphy

On 2018-11-07 12:08 pm, Richter, Robert wrote:

On 07.11.18 11:31:50, Robert Richter wrote:

On 06.11.18 11:54:15, Robin Murphy wrote:

of_dma_configure() was *supposed* to be following the same logic as
acpi_dma_configure() and only setting bus_dma_mask if some range was
specified by the firmware. However, it seems that subtlety got lost in
the process of fitting it into the differently-shaped control flow, and
as a result the force_dma==true case ends up always setting the bus mask
to the 32-bit default, which is not what anyone wants.

Make sure we only touch it if the DT actually said so.

Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
Reported-by: Aaro Koskinen 
Reported-by: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 


Tested-by: Robert Richter 

This patch makes the bad page state message on Cavium ThunderX below
disappear.

Note: The Fixes: tag suggests the issue was already in 4.19, but I
first saw it in 4.20-rc1, and it was pulled into mainline with:

  cff229491af5 Merge tag 'dma-mapping-4.20' of 
git://git.infradead.org/users/hch/dma-mapping


I have bisected it and the issue was seen first with:

  b4ebe6063204 dma-direct: implement complete bus_dma_mask handling


Right, that was the point at which the underlying problem started 
interacting with SWIOTLB and making arm64 unhappy - the prior effect in 
4.19 was to inadvertently break DT-based dma_direct_ops users (like 
MIPS), which the above commit actually partially unbroke again.
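
The interaction is roughly this: a streaming mapping gets bounced through SWIOTLB whenever the buffer falls outside the smaller of the device's dma_mask and the bus_dma_mask, so a stray 32-bit bus mask forces bouncing even for a 64-bit-capable device. A self-contained sketch under that assumption - addr_reachable() and min_not_zero() here are made-up stand-ins, not the actual dma-direct code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in helpers, not kernel code: they only illustrate the mask interaction. */
static uint64_t min_not_zero(uint64_t a, uint64_t b)
{
    if (!a)
        return b;
    if (!b)
        return a;
    return a < b ? a : b;
}

static bool addr_reachable(uint64_t dev_mask, uint64_t bus_mask,
                           uint64_t addr, uint64_t size)
{
    /* Bounce if the end of the buffer is beyond the effective mask. */
    return addr + size - 1 <= min_not_zero(dev_mask, bus_mask);
}

int main(void)
{
    uint64_t addr = 1ULL << 32;     /* buffer just above 4GB */
    uint64_t dev_mask = ~0ULL;      /* the device itself is 64-bit capable */

    /* Stray 32-bit bus mask: the buffer looks unreachable, so it gets bounced. */
    printf("bus mask 32-bit: reachable=%d\n",
           addr_reachable(dev_mask, 0xffffffffULL, addr, 0x1000));
    /* No bus restriction recorded: direct mapping is used as intended. */
    printf("bus mask unset:  reachable=%d\n",
           addr_reachable(dev_mask, 0, addr, 0x1000));
    return 0;
}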


Thanks for testing,
Robin.





-Robert


[   37.634555] BUG: Bad page state in process swapper/5  pfn:f3ebb
[   37.640483] page:7e0003cfaec0 count:0 mapcount:0 
mapping: index:0x0
[   37.648493] flags: 0x0001000(reserved)
[   37.652942] raw: 00001000 7e0003cfaec8 7e0003cfaec8 

[   37.660691] raw:    

[   37.668438] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[   37.674880] bad because of flags: 0x1000(reserved)
[   37.679672] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_tcpudp 
ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat nf_nat_ipv6 ip6table_mangle 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
iptable_mangle ip6table_filter ip6_tables iptable_filter crct10dif_ce 
cavium_rng_vf rng_core ipmi_ssif thunderx_zip thunderx_edac ipmi_devintf 
cavium_rng ipmi_msghandler ip_tables x_tables xfs libcrc32c nicvf nicpf 
thunder_bgx thunder_xcv i2c_thunderx mdio_thunder mdio_cavium ipv6
[   37.723866] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 
4.20.0-rc1-00012-gc106b1cbe843 #1
[   37.731874] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 
5.11 12/12/2012
[   37.740228] Call trace:
[   37.742677]  dump_backtrace+0x0/0x148
[   37.746334]  show_stack+0x14/0x1c
[   37.749643]  dump_stack+0x84/0xa8
[   37.752954]  bad_page+0xe4/0x144
[   37.756178]  free_pages_check_bad+0x7c/0x84
[   37.760355]  __free_pages_ok+0x22c/0x284
[   37.764272]  page_frag_free+0x64/0x68
[   37.767930]  skb_free_head+0x24/0x2c
[   37.771500]  skb_release_data+0x130/0x148
[   37.775504]  skb_release_all+0x24/0x30
[   37.779246]  kfree_skb+0x2c/0x54
[   37.782471]  ip_error+0x70/0x1d4
[   37.785693]  ip_rcv_finish+0x3c/0x50
[   37.789262]  ip_rcv+0xc0/0xd0
[   37.792225]  __netif_receive_skb_one_core+0x4c/0x70
[   37.797099]  __netif_receive_skb+0x2c/0x70
[   37.801190]  netif_receive_skb_internal+0x64/0x160
[   37.805976]  napi_gro_receive+0xa0/0xc4
[   37.809815]  nicvf_cq_intr_handler+0x930/0xc1c [nicvf]
[   37.814950]  nicvf_poll+0x30/0xb0 [nicvf]
[   37.818954]  net_rx_action+0x140/0x2f8
[   37.822697]  __do_softirq+0x11c/0x228
[   37.826354]  irq_exit+0xbc/0xd0
[   37.829491]  __handle_domain_irq+0x6c/0xb4
[   37.833581]  gic_handle_irq+0x8c/0x1a0
[   37.837324]  el1_irq+0xb0/0x128
[   37.840459]  arch_cpu_idle+0x10/0x18
[   37.844031]  default_idle_call+0x18/0x28
[   37.847948]  do_idle+0x194/0x258
[   37.851169]  cpu_startup_entry+0x20/0x24
[   37.855089]  secondary_start_kernel+0x144/0x168



---

Sorry about that... I guess I only have test setups that either have
dma-ranges or where a 32-bit bus mask goes unnoticed :(

The Octeon and SMMU issues sound like they're purely down to this, and
it's probably related to at least one of John's Hikey woes.

Robin.

  drivers/of/device.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index 0f27fad9fe94..757ae867674f 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct device_node 
*np, bool force_dma)
 * set by the driver.
 */
mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
-   dev->bus_dma_mask = mask;
+   if (!ret)
+   dev->bus_dma_mask = mask;
dev->coherent_dma_mask &= mask;
*dev->dma_mask &= mask;
  
--

2.19.1.dirty



Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Richter, Robert
On 07.11.18 11:31:50, Robert Richter wrote:
> On 06.11.18 11:54:15, Robin Murphy wrote:
> > of_dma_configure() was *supposed* to be following the same logic as
> > acpi_dma_configure() and only setting bus_dma_mask if some range was
> > specified by the firmware. However, it seems that subtlety got lost in
> > the process of fitting it into the differently-shaped control flow, and
> > as a result the force_dma==true case ends up always setting the bus mask
> > to the 32-bit default, which is not what anyone wants.
> > 
> > Make sure we only touch it if the DT actually said so.
> > 
> > Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> > Reported-by: Aaro Koskinen 
> > Reported-by: Jean-Philippe Brucker 
> > Signed-off-by: Robin Murphy 
> 
> Tested-by: Robert Richter 
> 
> This patch makes the bad page state message on Cavium ThunderX below
> disappear.
> 
> Note: The Fixes: tag suggests the issue was already in 4.19, but I
> first saw it in 4.20-rc1, and it was pulled into mainline with:
> 
>  cff229491af5 Merge tag 'dma-mapping-4.20' of 
> git://git.infradead.org/users/hch/dma-mapping

I have bisected it and the issue was seen first with:

 b4ebe6063204 dma-direct: implement complete bus_dma_mask handling

> 
> -Robert
> 
> 
> [   37.634555] BUG: Bad page state in process swapper/5  pfn:f3ebb
> [   37.640483] page:7e0003cfaec0 count:0 mapcount:0 
> mapping: index:0x0
> [   37.648493] flags: 0x0001000(reserved)
> [   37.652942] raw: 00001000 7e0003cfaec8 7e0003cfaec8 
> 
> [   37.660691] raw:    
> 
> [   37.668438] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> [   37.674880] bad because of flags: 0x1000(reserved)
> [   37.679672] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_tcpudp 
> ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat nf_nat_ipv6 
> ip6table_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 
> nf_defrag_ipv4 iptable_mangle ip6table_filter ip6_tables iptable_filter 
> crct10dif_ce cavium_rng_vf rng_core ipmi_ssif thunderx_zip thunderx_edac 
> ipmi_devintf cavium_rng ipmi_msghandler ip_tables x_tables xfs libcrc32c 
> nicvf nicpf thunder_bgx thunder_xcv i2c_thunderx mdio_thunder mdio_cavium ipv6
> [   37.723866] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 
> 4.20.0-rc1-00012-gc106b1cbe843 #1
> [   37.731874] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., 
> BIOS 5.11 12/12/2012
> [   37.740228] Call trace:
> [   37.742677]  dump_backtrace+0x0/0x148
> [   37.746334]  show_stack+0x14/0x1c
> [   37.749643]  dump_stack+0x84/0xa8
> [   37.752954]  bad_page+0xe4/0x144
> [   37.756178]  free_pages_check_bad+0x7c/0x84
> [   37.760355]  __free_pages_ok+0x22c/0x284
> [   37.764272]  page_frag_free+0x64/0x68
> [   37.767930]  skb_free_head+0x24/0x2c
> [   37.771500]  skb_release_data+0x130/0x148
> [   37.775504]  skb_release_all+0x24/0x30
> [   37.779246]  kfree_skb+0x2c/0x54
> [   37.782471]  ip_error+0x70/0x1d4
> [   37.785693]  ip_rcv_finish+0x3c/0x50
> [   37.789262]  ip_rcv+0xc0/0xd0
> [   37.792225]  __netif_receive_skb_one_core+0x4c/0x70
> [   37.797099]  __netif_receive_skb+0x2c/0x70
> [   37.801190]  netif_receive_skb_internal+0x64/0x160
> [   37.805976]  napi_gro_receive+0xa0/0xc4
> [   37.809815]  nicvf_cq_intr_handler+0x930/0xc1c [nicvf]
> [   37.814950]  nicvf_poll+0x30/0xb0 [nicvf]
> [   37.818954]  net_rx_action+0x140/0x2f8
> [   37.822697]  __do_softirq+0x11c/0x228
> [   37.826354]  irq_exit+0xbc/0xd0
> [   37.829491]  __handle_domain_irq+0x6c/0xb4
> [   37.833581]  gic_handle_irq+0x8c/0x1a0
> [   37.837324]  el1_irq+0xb0/0x128
> [   37.840459]  arch_cpu_idle+0x10/0x18
> [   37.844031]  default_idle_call+0x18/0x28
> [   37.847948]  do_idle+0x194/0x258
> [   37.851169]  cpu_startup_entry+0x20/0x24
> [   37.855089]  secondary_start_kernel+0x144/0x168
> 
> 
> > ---
> > 
> > Sorry about that... I guess I only have test setups that either have
> > dma-ranges or where a 32-bit bus mask goes unnoticed :(
> > 
> > The Octeon and SMMU issues sound like they're purely down to this, and
> > it's probably related to at least one of John's Hikey woes.
> > 
> > Robin.
> > 
> >  drivers/of/device.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index 0f27fad9fe94..757ae867674f 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct 
> > device_node *np, bool force_dma)
> >  * set by the driver.
> >  */
> > mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
> > -   dev->bus_dma_mask = mask;
> > +   if (!ret)
> > +   dev->bus_dma_mask = mask;
> > dev->coherent_dma_mask &= mask;
> > *dev->dma_mask &= mask;
> >  
> > -- 
> > 2.19.1.dirty
> > 
> > 

Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Richter, Robert
On 06.11.18 11:54:15, Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
> 
> Make sure we only touch it if the DT actually said so.
> 
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen 
> Reported-by: Jean-Philippe Brucker 
> Signed-off-by: Robin Murphy 

Tested-by: Robert Richter 

This patch makes the bad page state message on Cavium ThunderX below
disappear.

Note: The Fixes: tag suggests the issue was already in 4.19, but I
first saw it in 4.20-rc1, and it was pulled into mainline with:

 cff229491af5 Merge tag 'dma-mapping-4.20' of 
git://git.infradead.org/users/hch/dma-mapping

-Robert


[   37.634555] BUG: Bad page state in process swapper/5  pfn:f3ebb
[   37.640483] page:7e0003cfaec0 count:0 mapcount:0 
mapping: index:0x0
[   37.648493] flags: 0x0001000(reserved)
[   37.652942] raw: 00001000 7e0003cfaec8 7e0003cfaec8 

[   37.660691] raw:    

[   37.668438] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
[   37.674880] bad because of flags: 0x1000(reserved)
[   37.679672] Modules linked in: ip6t_REJECT nf_reject_ipv6 xt_tcpudp 
ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat nf_nat_ipv6 ip6table_mangle 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
iptable_mangle ip6table_filter ip6_tables iptable_filter crct10dif_ce 
cavium_rng_vf rng_core ipmi_ssif thunderx_zip thunderx_edac ipmi_devintf 
cavium_rng ipmi_msghandler ip_tables x_tables xfs libcrc32c nicvf nicpf 
thunder_bgx thunder_xcv i2c_thunderx mdio_thunder mdio_cavium ipv6
[   37.723866] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 
4.20.0-rc1-00012-gc106b1cbe843 #1
[   37.731874] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 
5.11 12/12/2012
[   37.740228] Call trace:
[   37.742677]  dump_backtrace+0x0/0x148
[   37.746334]  show_stack+0x14/0x1c
[   37.749643]  dump_stack+0x84/0xa8
[   37.752954]  bad_page+0xe4/0x144
[   37.756178]  free_pages_check_bad+0x7c/0x84
[   37.760355]  __free_pages_ok+0x22c/0x284
[   37.764272]  page_frag_free+0x64/0x68
[   37.767930]  skb_free_head+0x24/0x2c
[   37.771500]  skb_release_data+0x130/0x148
[   37.775504]  skb_release_all+0x24/0x30
[   37.779246]  kfree_skb+0x2c/0x54
[   37.782471]  ip_error+0x70/0x1d4
[   37.785693]  ip_rcv_finish+0x3c/0x50
[   37.789262]  ip_rcv+0xc0/0xd0
[   37.792225]  __netif_receive_skb_one_core+0x4c/0x70
[   37.797099]  __netif_receive_skb+0x2c/0x70
[   37.801190]  netif_receive_skb_internal+0x64/0x160
[   37.805976]  napi_gro_receive+0xa0/0xc4
[   37.809815]  nicvf_cq_intr_handler+0x930/0xc1c [nicvf]
[   37.814950]  nicvf_poll+0x30/0xb0 [nicvf]
[   37.818954]  net_rx_action+0x140/0x2f8
[   37.822697]  __do_softirq+0x11c/0x228
[   37.826354]  irq_exit+0xbc/0xd0
[   37.829491]  __handle_domain_irq+0x6c/0xb4
[   37.833581]  gic_handle_irq+0x8c/0x1a0
[   37.837324]  el1_irq+0xb0/0x128
[   37.840459]  arch_cpu_idle+0x10/0x18
[   37.844031]  default_idle_call+0x18/0x28
[   37.847948]  do_idle+0x194/0x258
[   37.851169]  cpu_startup_entry+0x20/0x24
[   37.855089]  secondary_start_kernel+0x144/0x168


> ---
> 
> Sorry about that... I guess I only have test setups that either have
> dma-ranges or where a 32-bit bus mask goes unnoticed :(
> 
> The Octeon and SMMU issues sound like they're purely down to this, and
> it's probably related to at least one of John's Hikey woes.
> 
> Robin.
> 
>  drivers/of/device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index 0f27fad9fe94..757ae867674f 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -149,7 +149,8 @@ int of_dma_configure(struct device *dev, struct 
> device_node *np, bool force_dma)
>* set by the driver.
>*/
>   mask = DMA_BIT_MASK(ilog2(dma_addr + size - 1) + 1);
> - dev->bus_dma_mask = mask;
> + if (!ret)
> + dev->bus_dma_mask = mask;
>   dev->coherent_dma_mask &= mask;
>   *dev->dma_mask &= mask;
>  
> -- 
> 2.19.1.dirty
> 
> 


Re: IOMMU breakage on arm64

2018-11-07 Thread Geert Uytterhoeven
Hi Robin,

On Tue, Nov 6, 2018 at 9:20 PM Robin Murphy  wrote:
> On 2018-11-06 7:44 pm, Geert Uytterhoeven wrote:
> > On Tue, Oct 23, 2018 at 1:40 AM Linux Kernel Mailing List
> >  wrote:
> >> Commit: b4ebe6063204da58e48600b810a97c29ae9e5d12
> >> Parent: 7d21ee4c719f00896767ce19c4c01a56374c2ced
> >> Refname:refs/heads/master
> >> Web:
> >> https://git.kernel.org/torvalds/c/b4ebe6063204da58e48600b810a97c29ae9e5d12
> >> Author: Christoph Hellwig 
> >> AuthorDate: Thu Sep 20 14:04:08 2018 +0200
> >> Committer:  Christoph Hellwig 
> >> CommitDate: Mon Oct 1 07:28:03 2018 -0700
> >>
> >>  dma-direct: implement complete bus_dma_mask handling
> >>
> >>  Instead of rejecting devices with a too small bus_dma_mask we can 
> >> handle
> >>  by taking the bus dma_mask into account for allocations and bounce
> >>  buffering decisions.
> >>
> >>  Signed-off-by: Christoph Hellwig 
> >
> > I  have bisected the following crash to the above commit:
>
> I think that commit mostly just changes the presentation of my
> underlying cockup - see here for what should fix it:
> https://patchwork.kernel.org/patch/10670177/

Thanks, that fixed the issue.

> I have a feeling we've ironed out crash-on-early-domain-free bugs in the
> SMMU drivers already - arm-smmu certainly has an early return in
> arm_smmu_destroy_domain_context() which should behave exactly like your
> diff below, while I think arm-smmu-v3 gets away with it by virtue of
> smmu_domain->cfg being unset, but I'll double-check that when I'm fresh
> tomorrow (Jean-Philippe reported SMMUv3 hitting the DMA thing to me
> internally, but didn't mention any crash).

OK, I'll send a proper patch for the ipmmu-vmsa driver.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Geert Uytterhoeven
Hi Robin,

On Tue, Nov 6, 2018 at 12:54 PM Robin Murphy  wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
>
> Make sure we only touch it if the DT actually said so.
>
> Fixes: 6c2fb2ea7636 ("of/device: Set bus DMA mask as appropriate")
> Reported-by: Aaro Koskinen 
> Reported-by: Jean-Philippe Brucker 
> Signed-off-by: Robin Murphy 

Thanks, this fixes the problem I saw with IPMMU on Salvator-X(S).

Tested-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH] of/device: Really only set bus DMA mask when appropriate

2018-11-07 Thread Christoph Hellwig
On Tue, Nov 06, 2018 at 11:54:15AM +, Robin Murphy wrote:
> of_dma_configure() was *supposed* to be following the same logic as
> acpi_dma_configure() and only setting bus_dma_mask if some range was
> specified by the firmware. However, it seems that subtlety got lost in
> the process of fitting it into the differently-shaped control flow, and
> as a result the force_dma==true case ends up always setting the bus mask
> to the 32-bit default, which is not what anyone wants.
> 
> Make sure we only touch it if the DT actually said so.

This looks good, but I think it could really use a comment as the use
of ret all the way down the function isn't exactly obvious.

Let me know if you want this picked up through the OF or DMA trees.