Re: panic in dmar_remove_one_dev_info

2019-12-16 Thread Jerry Snitselaar

On Mon Dec 16 19, Jerry Snitselaar wrote:

HP is seeing a panic on gen9 dl360 and dl560 while testing these other
changes we've been working on. I just took an initial look, but have
to run to a dentist appointment so couldn't dig too deep. It looks
like intel_iommu_add_device sets dev->archdata.iommu to
DEFER_DEVICE_DOMAIN_INFO, and then when the device needs a private
domain dmar_remove_one_dev_info gets called. That code path ends up
trying to use DEFER_DEVICE_DOMAIN_INFO as a pointer. I don't know if
there just needs to be a check in there to bail out if it sees
DEFER_DEVICE_DOMAIN_INFO, or if something more is needed. I'll look
at it some more when I get back home.

Regards,
Jerry


Hi Baolu,

Does this look sane?

--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5163,7 +5163,8 @@ static void dmar_remove_one_dev_info(struct device *dev)
 
spin_lock_irqsave(&device_domain_lock, flags);

info = dev->archdata.iommu;
-   if (info)
+   if (info && info != DEFER_DEVICE_DOMAIN_INFO
+   && info != DUMMY_DEVICE_DOMAIN_INFO)
__dmar_remove_one_dev_info(info);
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
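
For reference, the two sentinels being filtered out here are magic values
rather than real pointers (their definitions in intel-iommu.c at the time,
quoted for context):

	#define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1))
	#define DEFER_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-2))

so any dereference of info inside __dmar_remove_one_dev_info() is a wild
access.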




Regards,
Jerry



Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level

2019-12-16 Thread Lu Baolu

Hi,

On 2019/12/17 10:36, Liu, Yi L wrote:

From: Liu, Yi L 
Sent: Tuesday, December 17, 2019 10:26 AM
To: Lu Baolu ; Joerg Roedel ; David
Woodhouse ; Alex Williamson

Subject: RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level


From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Tuesday, December 17, 2019 9:37 AM
To: Liu, Yi L ; Joerg Roedel ; David
Woodhouse ; Alex Williamson

Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level

Hi again,

On 12/17/19 9:19 AM, Lu Baolu wrote:

Hi Yi,

On 12/15/19 5:22 PM, Liu, Yi L wrote:

Ok, let me explain more... default pasid is meaningful only when
the domain has been attached to a device as an aux-domain. right?

Not exactly. Each domain has a specific default pasid, no matter normal
domain (RID based) or aux-domain (PASID based). The difference is that for
a normal domain the RID2PASID value is used, while for an aux-domain the
pasid is allocated from a global pool.

The same concept is used in VT-d 3.x scalable mode. For RID based DMA
translation the RID2PASID value is used when walking the tables; for PASID
based DMA translation the real pasid in the transaction is used.


If a domain only has one device, and it is attached to this device as a
normal domain (normal domain means non aux-domain here), then you should
flush the cache with the domain-id and the RID2PASID value. If a domain
has one device, and it is attached to this device as an aux-domain, then
you may want to flush the cache with the domain-id and the default pasid.
right?

A domain's counterpart is an IOMMU group. So we say attach/detach domain
to/from devices in a group. We don't allow devices with different
default pasids sitting in the same group, right?


Then let's come to the case I mentioned in the previous email: an mdev
and another device assigned to a single VM. In the host, you will have
a domain which has two devices, one device (devA) is attached as

No. We will have two IOMMU groups and two domains. Correct me if my
understanding is not right.

Reconsidered this. Unfortunately, my understanding is not right. :-(

A single domain could be attached to multiple IOMMU groups. So it
comes to the issue you were concerned about. Do I understand it right?

yes. Devices within the same group have no such issue since such
devices are not able to enable aux-domain. Now our understandings
are aligned. :-)


normal domain, another one (devB) is attached as aux-domain. Then
which pasid should be used when the mapping in the IOVA page table is
modified? RID2PASID or default pasid? I think both should be used
since the domain means different things to the two devices. If you just
use the default pasid, then devA may still be able to use stale caches.

You are right. I will change it accordingly. The logic should look
like:

if (domain attached to physical device)
flush_piotlb_with_RID2PASID()
else if (domain_attached_to_mdev_device)
flush_piotlb_with_default_pasid()

Does this work for you? Thanks for catching this!

If no else, it would work for scalable mode. ^_^ I noticed you've
already corrected this yourself in another reply. :-) Looking forward to
your next version.

BTW. The discussion in this thread may apply to the other cache flushes
in your series. Please have a check. At least, there are two places which
need to be updated in this single patch.


Sure. I will.

Best regards,

baolu
  
Regards,

Yi Liu



RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level

2019-12-16 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, December 17, 2019 9:39 AM
> To: Liu, Yi L ; Joerg Roedel ; David
> Woodhouse ; Alex Williamson
> 
> Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
> 
> Hi,
> 
> On 12/17/19 9:37 AM, Lu Baolu wrote:
> > You are right. I will change it accordingly. The logic should look
> > like:
> >
> > if (domain attached to physical device)
> >  flush_piotlb_with_RID2PASID()
> > else if (domain_attached_to_mdev_device)
> >  flush_piotlb_with_default_pasid()
> >
> 
> Both! so no "else" here.

aha, if we want to flush more precisely, we may check whether
domain->default_pasid is allocated. If not, it means no mdev is
involved, then we may save a p_iotlb_inv_dsc submission and also
save VT-d hardware cycles from doing a useless flush. This is just an
optimization, it's up to you to pick it or not in this patchset. I'm
fine with flushing both since it guarantees correctness.
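
As a sketch of the optimization (qi_flush_piotlb standing in for the
PASID-based flush helper this patch introduces — an assumption; did, addr,
npages and ih are placeholders, and a zero domain->default_pasid is taken to
mean "never allocated"):

	/* The RID-based attachment always needs flushing */
	qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
	/* Only spend a descriptor on the aux path if an mdev was ever attached */
	if (domain->default_pasid)
		qi_flush_piotlb(iommu, did, domain->default_pasid, addr, npages, ih);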

Regards,
Yi Liu

RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level

2019-12-16 Thread Liu, Yi L
> From: Liu, Yi L 
> Sent: Tuesday, December 17, 2019 10:26 AM
> To: Lu Baolu ; Joerg Roedel ; David
> Woodhouse ; Alex Williamson
> 
> Subject: RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
> 
> > From: Lu Baolu [mailto:baolu...@linux.intel.com]
> > Sent: Tuesday, December 17, 2019 9:37 AM
> > To: Liu, Yi L ; Joerg Roedel ; David
> > Woodhouse ; Alex Williamson
> > 
> > Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
> >
> > Hi again,
> >
> > On 12/17/19 9:19 AM, Lu Baolu wrote:
> > > Hi Yi,
> > >
> > > On 12/15/19 5:22 PM, Liu, Yi L wrote:
> > >> Ok, let me explain more... default pasid is meaningful only when
> > >> the domain has been attached to a device as an aux-domain. right?
> > >
> > > Not exactly. Each domain has a specific default pasid, no matter normal
> > > domain (RID based) or aux-domain (PASID based). The difference is that for
> > > a normal domain the RID2PASID value is used, while for an aux-domain the
> > > pasid is allocated from a global pool.
> > >
> > > The same concept is used in VT-d 3.x scalable mode. For RID based DMA
> > > translation the RID2PASID value is used when walking the tables; for PASID
> > > based DMA translation the real pasid in the transaction is used.
> > >
> > >> If a domain only has one device, and it is attached to this device as a
> > >> normal domain (normal domain means non aux-domain here), then you should
> > >> flush the cache with the domain-id and the RID2PASID value. If a domain
> > >> has one device, and it is attached to this device as an aux-domain, then
> > >> you may want to flush the cache with the domain-id and the default pasid.
> > >> right?
> > >
> > > A domain's counterpart is an IOMMU group. So we say attach/detach domain
> > > to/from devices in a group. We don't allow devices with different
> > > default pasids sitting in the same group, right?
> > >
> > >> Then let's come to the case I mentioned in the previous email: an mdev
> > >> and another device assigned to a single VM. In the host, you will have
> > >> a domain which has two devices, one device (devA) is attached as
> > >
> > > No. We will have two IOMMU groups and two domains. Correct me if my
> > > understanding is not right.
> >
> > Reconsidered this. Unfortunately, my understanding is not right. :-(
> >
> > A single domain could be attached to multiple IOMMU groups. So it
> > comes to the issue you were concerned about. Do I understand it right?
> 
> yes. Devices within the same group have no such issue since such
> devices are not able to enable aux-domain. Now our understandings
> are aligned. :-)
> 
> > >
> > >> normal domain, another one (devB) is attached as aux-domain. Then
> > >> which pasid should be used when the mapping in the IOVA page table is
> > >> modified? RID2PASID or default pasid? I think both should be used
> > >> since the domain means different things to the two devices. If you just
> > >> use the default pasid, then devA may still be able to use stale caches.
> >
> > You are right. I will change it accordingly. The logic should look
> > like:
> >
> > if (domain attached to physical device)
> > flush_piotlb_with_RID2PASID()
> > else if (domain_attached_to_mdev_device)
> > flush_piotlb_with_default_pasid()
> >
> > Does this work for you? Thanks for catching this!
> 
> If no else, it would work for scalable mode. ^_^ I noticed you've
> already corrected this yourself in another reply. :-) Looking forward to
> your next version.

BTW. The discussion in this thread may apply to the other cache flushes
in your series. Please have a check. At least, there are two places which
need to be updated in this single patch.
 
Regards,
Yi Liu


RE: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level

2019-12-16 Thread Liu, Yi L
> From: Lu Baolu < baolu...@linux.intel.com >
> Sent: Tuesday, December 17, 2019 10:04 AM
> To: Liu, Yi L ; Joerg Roedel ; David
> Woodhouse ; Alex Williamson
> 
> Subject: Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
> 
> Hi Yi,
> 
> On 12/15/19 5:37 PM, Liu, Yi L wrote:
> >> XD (bit 63) is only for the first level, and SNP (bit 11) is only for
> >> second level, right? I think we need to always set XD bit for IOVA over FL
> >> case. thoughts?
> > Oops, I made a mistake here. Please forget the SNP bit, there is no way to
> > control SNP with a first level page table. :-)
> >
> > Actually, it is execute (bit 1) of the second level page table which I
> > wanted to say. If software sets R/W/X permission on an IOVA, with IOVA over
> > a second level page table, it will set bit 1. However, if the IOVA is over
> > a first level page table, it may need to clear the XD bit. This is what I
> > want to say here. If an IOVA doesn't allow execute permission, it's ok to
> > always set the XD bit for the IOVA over FL case. But I would like to do it
> > just as what we did for R/W permission. R/W permission relies on the
> > permission configured by the page map caller. right?
> 
> Got your point.
> 
> The current driver always clears X (bit 2) in the second level page table.
> So we will always set the XD bit (bit 63) in the first level page table.

yes, I also noticed X (bit 2) is not used in the intel-iommu driver. So I
know why you set XD for the IOVA over FL case. But it's a little bit weird
to hard code it. That's why I suggested relaying the page map caller's
permission input.

> If we decide to use the X permission, we need a separate patch, right?

sure, it would be a separate patch since the current code doesn't apply
the X permission.
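
As a rough sketch of what I mean (DMA_FL_PTE_XD being the bit-63 define from
this series; the execute-permission flag name is hypothetical):

	/* mirror the R/W handling: honor the map caller's permission */
	if (!(prot & DMA_PTE_EXEC))	/* flag name hypothetical */
		pteval |= DMA_FL_PTE_XD;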

Regards,
Yi Liu

RE: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level

2019-12-16 Thread Liu, Yi L
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Tuesday, December 17, 2019 9:37 AM
> To: Liu, Yi L ; Joerg Roedel ; David
> Woodhouse ; Alex Williamson
> 
> Subject: Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level
> 
> Hi again,
> 
> On 12/17/19 9:19 AM, Lu Baolu wrote:
> > Hi Yi,
> >
> > On 12/15/19 5:22 PM, Liu, Yi L wrote:
> >> Ok, let me explain more... default pasid is meaningful only when
> >> the domain has been attached to a device as an aux-domain. right?
> >
> > Not exactly. Each domain has a specific default pasid, no matter normal
> > domain (RID based) or aux-domain (PASID based). The difference is that for
> > a normal domain the RID2PASID value is used, while for an aux-domain the
> > pasid is allocated from a global pool.
> >
> > The same concept is used in VT-d 3.x scalable mode. For RID based DMA
> > translation the RID2PASID value is used when walking the tables; for PASID
> > based DMA translation the real pasid in the transaction is used.
> >
> >> If a domain only has one device, and it is attached to this device as a
> >> normal domain (normal domain means non aux-domain here), then you should
> >> flush the cache with the domain-id and the RID2PASID value. If a domain
> >> has one device, and it is attached to this device as an aux-domain, then
> >> you may want to flush the cache with the domain-id and the default pasid.
> >> right?
> >
> > A domain's counterpart is an IOMMU group. So we say attach/detach domain
> > to/from devices in a group. We don't allow devices with different
> > default pasids sitting in the same group, right?
> >
> >> Then let's come to the case I mentioned in the previous email: an mdev
> >> and another device assigned to a single VM. In the host, you will have
> >> a domain which has two devices, one device (devA) is attached as
> >
> > No. We will have two IOMMU groups and two domains. Correct me if my
> > understanding is not right.
> 
> Reconsidered this. Unfortunately, my understanding is not right. :-(
> 
> A single domain could be attached to multiple IOMMU groups. So it
> comes to the issue you were concerned about. Do I understand it right?

yes. Devices within the same group have no such issue since such
devices are not able to enable aux-domain. Now our understandings
are aligned. :-)

> >
> >> normal domain, another one (devB) is attached as aux-domain. Then
> >> which pasid should be used when the mapping in the IOVA page table is
> >> modified? RID2PASID or default pasid? I think both should be used
> >> since the domain means different things to the two devices. If you just
> >> use the default pasid, then devA may still be able to use stale caches.
> 
> You are right. I will change it accordingly. The logic should look
> like:
> 
> if (domain attached to physical device)
>   flush_piotlb_with_RID2PASID()
> else if (domain_attached_to_mdev_device)
>   flush_piotlb_with_default_pasid()
> 
> Does this work for you? Thanks for catching this!

If no else, it would work for scalable mode. ^_^ I noticed you've
already corrected this yourself in another reply. :-) Looking forward to
your next version.

Regards,
Yi Liu


Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level

2019-12-16 Thread Lu Baolu

Hi Yi,

On 12/15/19 5:37 PM, Liu, Yi L wrote:

XD (bit 63) is only for the first level, and SNP (bit 11) is only for second
level, right? I think we need to always set XD bit for IOVA over FL case.
thoughts?

Oops, I made a mistake here. Please forget the SNP bit, there is no way to
control SNP with a first level page table. :-)

Actually, it is execute (bit 1) of the second level page table which I wanted
to say. If software sets R/W/X permission on an IOVA, with IOVA over a second
level page table, it will set bit 1. However, if the IOVA is over a first
level page table, it may need to clear the XD bit. This is what I want to say
here. If an IOVA doesn't allow execute permission, it's ok to always set the
XD bit for the IOVA over FL case. But I would like to do it just as what we
did for R/W permission. R/W permission relies on the permission configured by
the page map caller. right?


Got your point.

The current driver always clears X (bit 2) in the second level page table.
So we will always set the XD bit (bit 63) in the first level page table.
If we decide to use the X permission, we need a separate patch, right?

Best regards,
baolu

Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level

2019-12-16 Thread Lu Baolu

Hi,

On 12/17/19 9:37 AM, Lu Baolu wrote:

You are right. I will change it accordingly. The logic should look
like:

if (domain attached to physical device)
 flush_piotlb_with_RID2PASID()
else if (domain_attached_to_mdev_device)
 flush_piotlb_with_default_pasid()



Both! so no "else" here.
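
Roughly, i.e. (a sketch only; the two attachment checks are the pseudo-helpers
from the snippet above, and qi_flush_piotlb stands for the PASID-based flush
routine this patch introduces — did, addr, npages and ih are placeholders):

	if (domain_attached_to_physical_device(domain))
		qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
	if (domain_attached_to_mdev_device(domain))
		qi_flush_piotlb(iommu, did, domain->default_pasid, addr, npages, ih);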

Best regards,
baolu


Does this work for you? Thanks for catching this!


Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level

2019-12-16 Thread Lu Baolu

Hi again,

On 12/17/19 9:19 AM, Lu Baolu wrote:

Hi Yi,

On 12/15/19 5:22 PM, Liu, Yi L wrote:

Ok, let me explain more... default pasid is meaningful only when
the domain has been attached to a device as an aux-domain. right?


Not exactly. Each domain has a specific default pasid, no matter normal
domain (RID based) or aux-domain (PASID based). The difference is that for
a normal domain the RID2PASID value is used, while for an aux-domain the
pasid is allocated from a global pool.

The same concept is used in VT-d 3.x scalable mode. For RID based DMA
translation the RID2PASID value is used when walking the tables; for PASID
based DMA translation the real pasid in the transaction is used.


If a domain only has one device, and it is attached to this device as a
normal domain (normal domain means non aux-domain here), then you should
flush the cache with the domain-id and the RID2PASID value. If a domain
has one device, and it is attached to this device as an aux-domain, then
you may want to flush the cache with the domain-id and the default pasid.
right?


A domain's counterpart is an IOMMU group. So we say attach/detach domain
to/from devices in a group. We don't allow devices with different
default pasids sitting in the same group, right?


Then let's come to the case I mentioned in the previous email: an mdev
and another device assigned to a single VM. In the host, you will have
a domain which has two devices, one device (devA) is attached as


No. We will have two IOMMU groups and two domains. Correct me if my
understanding is not right.


Reconsidered this. Unfortunately, my understanding is not right. :-(

A single domain could be attached to multiple IOMMU groups. So it
comes to the issue you were concerned about. Do I understand it right?




normal domain, another one (devB) is attached as aux-domain. Then
which pasid should be used when the mapping in the IOVA page table is
modified? RID2PASID or default pasid? I think both should be used
since the domain means different things to the two devices. If you just
use the default pasid, then devA may still be able to use stale caches.


You are right. I will change it accordingly. The logic should look
like:

if (domain attached to physical device)
flush_piotlb_with_RID2PASID()
else if (domain_attached_to_mdev_device)
flush_piotlb_with_default_pasid()

Does this work for you? Thanks for catching this!

Best regards,
baolu


Re: [PATCH v3 5/6] iommu/vt-d: Flush PASID-based iotlb for iova over first level

2019-12-16 Thread Lu Baolu

Hi Yi,

On 12/15/19 5:22 PM, Liu, Yi L wrote:

Ok, let me explain more... default pasid is meaningful only when
the domain has been attached to a device as an aux-domain. right?


Not exactly. Each domain has a specific default pasid, no matter normal
domain (RID based) or aux-domain (PASID based). The difference is that for
a normal domain the RID2PASID value is used, while for an aux-domain the
pasid is allocated from a global pool.

The same concept is used in VT-d 3.x scalable mode. For RID based DMA
translation the RID2PASID value is used when walking the tables; for PASID
based DMA translation the real pasid in the transaction is used.


If a domain only has one device, and it is attached to this device as a
normal domain (normal domain means non aux-domain here), then you should
flush the cache with the domain-id and the RID2PASID value. If a domain
has one device, and it is attached to this device as an aux-domain, then
you may want to flush the cache with the domain-id and the default pasid.
right?


A domain's counterpart is an IOMMU group. So we say attach/detach domain
to/from devices in a group. We don't allow devices with different
default pasids sitting in the same group, right?


Then let's come to the case I mentioned in the previous email: an mdev
and another device assigned to a single VM. In the host, you will have
a domain which has two devices, one device (devA) is attached as


No. We will have two IOMMU groups and two domains. Correct me if my
understanding is not right.

Best regards,
baolu


normal domain, another one (devB) is attached as aux-domain. Then
which pasid should be used when the mapping in the IOVA page table is
modified? RID2PASID or default pasid? I think both should be used
since the domain means different things to the two devices. If you just
use the default pasid, then devA may still be able to use stale caches.

Regards,
Yi Liu



panic in dmar_remove_one_dev_info

2019-12-16 Thread Jerry Snitselaar

HP is seeing a panic on gen9 dl360 and dl560 while testing these other
changes we've been working on. I just took an initial look, but have
to run to a dentist appointment so couldn't dig too deep. It looks
like intel_iommu_add_device sets dev->archdata.iommu to
DEFER_DEVICE_DOMAIN_INFO, and then when the device needs a private
domain dmar_remove_one_dev_info gets called. That code path ends up
trying to use DEFER_DEVICE_DOMAIN_INFO as a pointer. I don't know if
there just needs to be a check in there to bail out if it sees
DEFER_DEVICE_DOMAIN_INFO, or if something more is needed. I'll look
at it some more when I get back home.

Regards,
Jerry



Re: [PATCH 1/3] iommu/vt-d: skip RMRR entries that fail the sanity check

2019-12-16 Thread Barret Rhoden via iommu

On 12/16/19 2:07 PM, Chen, Yian wrote:



On 12/11/2019 11:46 AM, Barret Rhoden wrote:

RMRR entries describe memory regions that are DMA targets for devices
outside the kernel's control.

RMRR entries that fail the sanity check are pointing to regions of
memory that the firmware did not tell the kernel are reserved or
otherwise should not be used.

Instead of aborting DMAR processing, this commit skips these RMRR
entries.  They will not be mapped into the IOMMU, but the IOMMU can
still be utilized.  If anything, when the IOMMU is on, those devices
will not be able to clobber RAM that the kernel has allocated from those
regions.

Signed-off-by: Barret Rhoden 
---
  drivers/iommu/intel-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f168cd8ee570..f7e09244c9e4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4316,7 +4316,7 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)

  rmrr = (struct acpi_dmar_reserved_memory *)header;
  ret = arch_rmrr_sanity_check(rmrr);
  if (ret)
-    return ret;
+    return 0;
  rmrru = kzalloc(sizeof(*rmrru), GFP_KERNEL);
  if (!rmrru)
The RMRR parsing function should report the error to the caller. How to
respond to the error can be chosen by the caller in the calling stack, for
example, dmar_walk_remapping_entries(). A concern is that ignoring a
detected firmware bug might have a potential side impact, though it seems
safe for your case.


That's a little difficult given the current code.  Once we are in
dmar_walk_remapping_entries(), the specific function (parse_one_rmrr) is 
called via callback:


ret = cb->cb[iter->type](iter, cb->arg[iter->type]);
if (ret)
return ret;

If there's an error of any sort, it aborts the walk.  Handling the 
specific errors here is difficult, since we don't know what the errors 
mean to the specific callback.  Is there some errno we can use that 
means "there was a problem, but it's not so bad that you have to abort, 
but I figured you ought to know"?  Not that I think that's a good idea.


Whether or not a specific error is worth aborting all DMAR functionality
for is best known inside the specific callback.  The only handling to do
is print a warning and either skip it or abort.
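
For instance, a warn-and-skip variant of the hunk above could look like
(sketch only; field names per struct acpi_dmar_reserved_memory):

	ret = arch_rmrr_sanity_check(rmrr);
	if (ret) {
		pr_warn(FW_BUG "Ignoring broken RMRR [%#018Lx-%#018Lx]\n",
			rmrr->base_address, rmrr->end_address);
		return 0;
	}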


I think skipping the entry for a bad RMRR is better than aborting 
completely, though I understand if people don't like that.  It's 
debatable.  By aborting, we lose the ability to use the IOMMU at all, 
but we are still in a situation where the devices using the RMRR regions 
might be clobbering kernel memory, right?  Using the IOMMU (with no 
mappings for the bad RMRRs) would stop those devices from clobbering memory.


Regardless, I have two other patches in this series that could resolve 
the problem for me and probably other people.  I'd just like at least 
one of the three patches to get merged so that my machine boots when the 
original commit f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region in 
BIOS is reported as reserved") gets released.


Thanks,

Barret


[PATCH v8 03/10] iommu/vt-d: Add bind guest PASID support

2019-12-16 Thread Jacob Pan
When supporting guest SVA with emulated IOMMU, the guest PASID
table is shadowed in VMM. Updates to guest vIOMMU PASID table
will result in PASID cache flush which will be passed down to
the host as bind guest PASID calls.

The SL page tables will be harvested from the device's default domain
(request w/o PASID), or from the aux domain in case of a mediated
device.

.-------------.  .---------------------------.
|   vIOMMU    |  | Guest process CR3, FL only|
|             |  '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------'                       |
|             |                       V
|             |                CR3 in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
.-------------.  .----------------------.
|   pIOMMU    |  | Bind FL for GVA-GPA  |
|             |  '----------------------'
.----------------/  |
| PASID Entry | V (Nested xlate)
'----------------\.------------------------------.
|             |   |SL for GPA-HPA, default domain|
|             |   '------------------------------'
'-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
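
For illustration, a VMM-side caller (e.g. VFIO) would fill the generic bind
structure and invoke the new op roughly like this (a sketch; the field values
are placeholders, the struct and entry point come from the already-merged
common uAPI code):

	struct iommu_gpasid_bind_data data = {
		.version    = IOMMU_GPASID_BIND_VERSION_1,
		.format     = IOMMU_PASID_FORMAT_INTEL_VTD,
		.flags      = IOMMU_SVA_GPASID_VAL,
		.gpgd       = guest_cr3_gpa,	/* FL page table root, in GPA */
		.hpasid     = hpasid,		/* host PASID from the IOASID allocator */
		.gpasid     = gpasid,		/* guest PASID from the vIOMMU */
		.addr_width = 48,		/* guest paging mode */
	};

	ret = iommu_sva_bind_gpasid(domain, dev, &data);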

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-iommu.c |   4 +
 drivers/iommu/intel-svm.c   | 214 
 include/linux/intel-iommu.h |   8 +-
 include/linux/intel-svm.h   |  17 
 4 files changed, 242 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index cc89791d807c..304654dbc622 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5993,6 +5993,10 @@ const struct iommu_ops intel_iommu_ops = {
.dev_disable_feat   = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
.pgsize_bitmap  = INTEL_IOMMU_PGSIZES,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+   .sva_bind_gpasid= intel_svm_bind_gpasid,
+   .sva_unbind_gpasid  = intel_svm_unbind_gpasid,
+#endif
 };
 
 static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 0fcbe631cd5f..f580b7be63c5 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -230,6 +230,220 @@ static LIST_HEAD(global_svm_list);
list_for_each_entry((sdev), &(svm)->devs, list) \
if ((d) != (sdev)->dev) {} else
 
+int intel_svm_bind_gpasid(struct iommu_domain *domain,
+   struct device *dev,
+   struct iommu_gpasid_bind_data *data)
+{
+   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+   struct dmar_domain *ddomain;
+   struct intel_svm_dev *sdev;
+   struct intel_svm *svm;
+   int ret = 0;
+
+   if (WARN_ON(!iommu) || !data)
+   return -EINVAL;
+
+   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
+   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   return -EINVAL;
+
+   if (dev_is_pci(dev)) {
+   /* VT-d supports devices with full 20 bit PASIDs only */
+   if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
+   return -EINVAL;
+   } else {
+   return -ENOTSUPP;
+   }
+
+   /*
+* We only check host PASID range, we have no knowledge to check
+* guest PASID range nor do we use the guest PASID.
+*/
+   if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
+   return -EINVAL;
+
+   ddomain = to_dmar_domain(domain);
+
+   /* Sanity check paging mode support match between host and guest */
+   if (data->addr_width == ADDR_WIDTH_5LEVEL &&
+   !cap_5lp_support(iommu->cap)) {
+   pr_err("Cannot support 5 level paging requested by guest!\n");
+   return -EINVAL;
+   }
+
+   mutex_lock(&pasid_mutex);
+   svm = ioasid_find(NULL, data->hpasid, NULL);
+   if (IS_ERR(svm)) {
+   ret = PTR_ERR(svm);
+   goto out;
+   }
+
+   if (svm) {
+   /*
+* If we found svm for the PASID, there must be at
+* least one device bond, otherwise svm should be freed.
+*/
+   if (WARN_ON(list_empty(&svm->devs)))
+   return -EINVAL;
+
+   if (svm->mm == get_task_mm(current) &&
+   data->hpasid == svm->pasid &&
+   data->gpasid == svm->gpasid) {
+   pr_warn("Cannot bind the same guest-host PASID for the 
same process\n");
+   mmput(svm->mm);
+   return -EINVAL;
+   }
+
+   for_each_svm_dev(sdev, svm, dev) {
+   /* In case of multiple sub-devices of the same 

[PATCH v8 02/10] iommu/vt-d: Add nested translation helper function

2019-12-16 Thread Jacob Pan
Nested translation mode is supported in the VT-d 3.0 spec, chapter 3.8.
With the PASID granular translation type set to 011b (nested), the
translation result from the first level (FL) is also subject to a second
level (SL) page table translation. This mode is used for SVA virtualization,
where FL performs guest virtual to guest physical translation and
SL performs guest physical to host physical translation.
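
For reference, nested mode is selected via the PGTT field (bits 8:6) of the
scalable mode PASID entry, in the style of the helpers below (the 011b
encoding is from the spec; the exact helper lives elsewhere in intel-pasid.c):

	#define PASID_ENTRY_PGTT_NESTED	(3)	/* 011b */

	static inline void
	pasid_set_translation_type(struct pasid_entry *pe, u64 value)
	{
		pasid_set_bits(&pe->val[0], GENMASK_ULL(8, 6), value << 6);
	}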

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-pasid.c | 213 
 drivers/iommu/intel-pasid.h |  12 +++
 include/linux/intel-iommu.h |   3 +
 include/uapi/linux/iommu.h  |   5 +-
 4 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 3cb569e76642..b178ad9e47ae 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -359,6 +359,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Memory Type(EMT) field (Bits 91-93)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_emt(struct pasid_entry *pe, u64 value)
+{
+   pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
+}
+
+/*
+ * Setup the Page Attribute Table (PAT) field (Bits 96-127)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pat(struct pasid_entry *pe, u64 value)
+{
+   pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 32);
+}
+
+/*
+ * Setup the Cache Disable (CD) field (Bit 89)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_cd(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[1], 1 << 25, 1 << 25);
+}
+
+/*
+ * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_emte(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[1], 1 << 26, 1 << 26);
+}
+
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_eafe(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
+/*
+ * Setup the Page-level Cache Disable (PCD) field (Bit 95)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pcd(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[1], 1 << 31, 1 << 31);
+}
+
+/*
+ * Setup the Page-level Write-Through (PWT) field (Bit 94)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pwt(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[1], 1 << 30, 1 << 30);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
u16 did, int pasid)
@@ -599,3 +669,146 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 
return 0;
 }
+
+static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
+   struct pasid_entry *pte,
+   struct iommu_gpasid_bind_data_vtd *pasid_data)
+{
+   /*
+* Not all guest PASID table entry fields are passed down during bind,
+* here we only set up the ones that are dependent on guest settings.
+* Execution related bits such as NXE, SMEP are not meaningful to IOMMU,
+* therefore not set. Other fields, such as snoop related, are set based
+* on host needs regardless of guest settings.
+*/
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
+   if (!ecap_srs(iommu->ecap)) {
+   pr_err("No supervisor request support on %s\n",
+  iommu->name);
+   return -EINVAL;
+   }
+   pasid_set_sre(pte);
+   }
+
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
+   if (!ecap_eafs(iommu->ecap)) {
+   pr_err("No extended access flag support on %s\n",
+   iommu->name);
+   return -EINVAL;
+   }
+   pasid_set_eafe(pte);
+   }
+
+   /*
+* Memory type is only applicable to devices inside processor coherent
+* domain. PCIe devices are not included. We can skip the rest of the
+* flags if IOMMU does not support MTS.
+*/
+   if (ecap_mts(iommu->ecap)) {
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
+   pasid_set_emte(pte);
+   pasid_set_emt(pte, pasid_data->emt);
+   }
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
+   pasid_set_pcd(pte);
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
+   pasid_set_pwt(pte);
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
+   pasid_set_cd(pte);
+   pasid_set_pat(pte, pasid_data->pat);
+   } else if (pasid_data->flags & 

[PATCH v8 05/10] iommu/vt-d: Add svm/sva invalidate function

2019-12-16 Thread Jacob Pan
When Shared Virtual Address (SVA) is enabled for a guest OS via
vIOMMU, we need to provide invalidation support at IOMMU API and driver
level. This patch adds Intel VT-d specific function to implement
iommu passdown invalidate API for shared virtual address.

The use case is for supporting caching structure invalidation
of assigned SVM capable devices. Emulated IOMMU exposes queue
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that guest to host device ID mapping should be
resolved prior to calling IOMMU driver. Based on the device handle,
host IOMMU driver can replace certain fields before submit to the
invalidation queue.
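
As an illustration of the conversion helpers added below, a guest
page-selective PASID-IOTLB invalidation covering eight contiguous 4KB pages
would be passed down roughly as (a sketch; did, pasid, addr and ih are
placeholders, and qi_flush_iotlb_pasid comes from an earlier patch in this
series):

	u64 granu;
	u64 size = to_vtd_size(SZ_4K, 8);	/* 8 pages -> size order 3 */

	if (!to_vtd_granularity(IOMMU_CACHE_INV_TYPE_IOTLB,
				IOMMU_INV_GRANU_ADDR, &granu))
		qi_flush_iotlb_pasid(iommu, did, addr, pasid, size, granu, ih);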

Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
Signed-off-by: Liu, Yi L 
---
 drivers/iommu/intel-iommu.c | 170 
 1 file changed, 170 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 304654dbc622..e90102c7540d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5458,6 +5458,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
aux_domain_remove_dev(to_dmar_domain(domain), dev);
 }
 
+/*
+ * 2D array for converting and sanitizing IOMMU generic TLB granularity to
+ * VT-d granularity. Invalidation is typically included in the unmap operation
+ * as a result of a DMA or VFIO unmap. However, for an assigned device the
+ * guest could own the first level page tables without them being shadowed
+ * by QEMU. In this case there is no pass-down unmap to the host IOMMU as a
+ * result of an unmap in the guest. Only invalidations are trapped and passed down.
+ * In all cases, only first level TLB invalidation (request with PASID) can be
+ * passed down, therefore we do not include IOTLB granularity for request
+ * without PASID (second level).
+ *
+ * For an example, to find the VT-d granularity encoding for IOTLB
+ * type and page selective granularity within PASID:
+ * X: indexed by iommu cache type
+ * Y: indexed by enum iommu_inv_granularity
+ * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
+ *
+ * Granu_map array indicates validity of the table. 1: valid, 0: invalid
+ *
+ */
+const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+   /* PASID based IOTLB, support PASID selective and page selective */
+   {0, 1, 1},
+   /* PASID based dev TLBs, only support all PASIDs or single PASID */
+   {1, 1, 0},
+   /* PASID cache */
+   {1, 1, 0}
+};
+
+const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+   /* PASID based IOTLB */
+   {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
+   /* PASID based dev TLBs */
+   {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
+   /* PASID cache */
+   {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
+};
+
+static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
+{
+   if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
+   !inv_type_granu_map[type][granu])
+   return -EINVAL;
+
+   *vtd_granu = inv_type_granu_table[type][granu];
+
+   return 0;
+}
+
+static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
+{
+   u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
+
+   /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
+* IOMMU cache invalidate API passes granu_size in bytes, and number of
+* granu size in contiguous memory.
+*/
+   return order_base_2(nr_pages);
+}
+
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct iommu_cache_invalidate_info *inv_info)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct device_domain_info *info;
+   struct intel_iommu *iommu;
+   unsigned long flags;
+   int cache_type;
+   u8 bus, devfn;
+   u16 did, sid;
+   int ret = 0;
+   u64 size;
+
+   if (!inv_info || !dmar_domain ||
+   inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   spin_lock(&iommu->lock);
+   info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
+   if (!info) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+   did = dmar_domain->iommu_did[iommu->seq_id];
+   sid = PCI_DEVID(bus, devfn);
+   size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules);
+
+   for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
+   u64 granu = 0;
+   u64 

[PATCH v8 00/10] Nested Shared Virtual Address (SVA) VT-d support

2019-12-16 Thread Jacob Pan
Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on Intel
platforms allow address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to enable SVA virtualization, i.e. enable use of SVA
within a guest user application.

Only IOMMU portion of the changes are included in this series. Additional
support is needed in VFIO and QEMU (will be submitted separately) to complete
this functionality.

To make incremental changes and reduce the size of each patchset, this series
does not include support for page request services.

In VT-d implementation, PASID table is per device and maintained in the host.
Guest PASID table is shadowed in VMM where virtual IOMMU is emulated.

.-------------.  .---------------------------.
|   vIOMMU    |  | Guest process CR3, FL only|
|             |  '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------'                       |
|             |                       V
|             |                CR3 in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
.-------------.  .----------------------.
|   pIOMMU    |  | Bind FL for GVA-GPA  |
|             |  '----------------------'
.----------------/  |
| PASID Entry | V (Nested xlate)
'----------------\.------------------------------.
|             |   |SL for GPA-HPA, default domain|
|             |   '------------------------------'
'-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables

This is the remaining VT-d only portion of V5 since the uAPIs and IOASID common
code have been applied to Joerg's IOMMU core branch.
(https://lkml.org/lkml/2019/10/2/833)

The complete set with VFIO patches are here:
https://github.com/jacobpan/linux.git:siov_sva

The complete nested SVA upstream patches are divided into three phases:
1. Common APIs and PCI device direct assignment
2. Page Request Services (PRS) support
3. Mediated device assignment

With this set and the accompanied VFIO code, we will achieve phase #1.

Thanks,

Jacob

ChangeLog:
- v8
  - Extracted cleanup patches from V7 and accepted into maintainer's
tree (https://lkml.org/lkml/2019/12/2/514).
  - Added IOASID notifier and VT-d handler for termination of PASID
IOMMU context upon free. This will ensure success of VFIO IOASID
free API regardless PASID is in use.

(https://lore.kernel.org/linux-iommu/1571919983-3231-1-git-send-email-yi.l@intel.com/)

- V7
  - Respect vIOMMU PASID range in virtual command PASID/IOASID allocator
  - Caching virtual command capabilities to avoid runtime checks that
could cause vmexits.

- V6
  - Rebased on top of Joerg's core branch
  (git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core)
  - Adapt to new uAPIs and IOASID allocators

- V5
  Rebased on v5.3-rc4 which has some of the IOMMU fault APIs merged.
  Addressed v4 review comments from Eric Auger, Baolu Lu, and
Jonathan Cameron. Specific changes are as follows:
  - Refined custom IOASID allocator to support multiple vIOMMU, hotplug
cases.
  - Extracted vendor data from IOMMU guest PASID bind data, for VT-d
will support all necessary guest PASID entry fields for PASID
bind.
  - Support non-identity host-guest PASID mapping
  - Exception handling in various cases

- V4
  - Redesigned IOASID allocator such that it can support custom
  allocators with shared helper functions. Use separate XArray
  to store IOASIDs per allocator. Took advice from Eric Auger to
  have default allocator use the generic allocator structure.
  Combined into one patch in that the default allocator is just
  "another" allocator now. Can be built as a module in case of
  driver use without IOMMU.
  - Extended bind guest PASID data to support SMMU and non-identity
  guest to host PASID mapping https://lkml.org/lkml/2019/5/21/802
  - Rebased on Jean's sva/api common tree, new patches starts with
   [PATCH v4 10/22]

- V3
  - Addressed thorough review comments from Eric Auger (Thank you!)
  - Moved IOASID allocator from driver core to IOMMU code per
suggestion by Christoph Hellwig
(https://lkml.org/lkml/2019/4/26/462)
  - Rebased on top of Jean's SVA API branch and Eric's v7[1]
(git://linux-arm.org/linux-jpb.git sva/api)
  - All IOMMU APIs are unmodified (except the new bind guest PASID
call in patch 9/16)

- V2
  - 

[PATCH v8 06/10] iommu/vt-d: Cache virtual command capability register

2019-12-16 Thread Jacob Pan
Virtual command registers are used in the guest only. To avoid the vmexit
cost, we cache the capability register during initialization.
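
Later patches in the series can then test the cached value instead of
re-reading the MMIO register (which would trap in a guest), e.g. (a sketch;
register_pasid_allocator() is added later in this series):

	if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
		register_pasid_allocator(iommu);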

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dmar.c| 1 +
 include/linux/intel-iommu.h | 4 
 2 files changed, 5 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index f2f5d75da94a..3f98dd9ad004 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -953,6 +953,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
warn_invalid_dmar(phys_addr, " returns all ones");
goto unmap;
}
+   iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
 
/* the registers might be more than one page */
map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ee26989df008..4d25141ec3df 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -189,6 +189,9 @@
 #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
 #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */
 
+/* Virtual command interface capabilities */
+#define vccap_pasid(v) ((v & DMA_VCS_PAS)) /* PASID allocation */
+
 /* IOTLB_REG */
 #define DMA_TLB_FLUSH_GRANU_OFFSET  60
 #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
@@ -531,6 +534,7 @@ struct intel_iommu {
u64 reg_size; /* size of hw register set */
u64 cap;
u64 ecap;
+   u64 vccap;
u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
raw_spinlock_t  register_lock; /* protect register handling */
int seq_id; /* sequence id of the iommu */
-- 
2.7.4



[PATCH v8 07/10] iommu/vt-d: Enlightened PASID allocation

2019-12-16 Thread Jacob Pan
From: Lu Baolu 

Enabling IOMMU in a guest requires communication with the host
driver for certain aspects. Use of PASIDs to enable Shared Virtual
Addressing (SVA) requires managing PASIDs in the host. The VT-d 3.0 spec
provides a Virtual Command Register (VCMD) to facilitate this.
Writes to this register in the guest are trapped by QEMU which
proxies the call to the host driver.

This virtual command interface consists of a capability register,
a virtual command register, and a virtual response register. Refer
to section 10.4.42, 10.4.43, 10.4.44 for more information.

This patch adds the enlightened PASID allocation/free interfaces
via the virtual command interface.
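
Typical use of the pair looks like this (a sketch; the calling context is
illustrative and error handling is trimmed):

	unsigned int pasid;
	int ret = vcmd_alloc_pasid(iommu, &pasid);

	if (ret)
		return ret;
	/* ... program the PASID entry, run the workload ... */
	vcmd_free_pasid(iommu, pasid);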

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Liu Yi L 
Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
---
 drivers/iommu/intel-pasid.c | 57 +
 drivers/iommu/intel-pasid.h | 13 ++-
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 10f8c7564118..0a135a5937f3 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -27,6 +27,63 @@
 static DEFINE_SPINLOCK(pasid_lock);
 u32 intel_pasid_max_id = PASID_MAX;
 
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
+{
+   unsigned long flags;
+   u8 status_code;
+   int ret = 0;
+   u64 res;
+
+   raw_spin_lock_irqsave(&iommu->register_lock, flags);
+   dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
+   IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+ !(res & VCMD_VRSP_IP), res);
+   raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+   status_code = VCMD_VRSP_SC(res);
+   switch (status_code) {
+   case VCMD_VRSP_SC_SUCCESS:
+   *pasid = VCMD_VRSP_RESULT_PASID(res);
+   break;
+   case VCMD_VRSP_SC_NO_PASID_AVAIL:
+   pr_info("IOMMU: %s: No PASID available\n", iommu->name);
+   ret = -ENOSPC;
+   break;
+   default:
+   ret = -ENODEV;
+   pr_warn("IOMMU: %s: Unexpected error code %d\n",
+   iommu->name, status_code);
+   }
+
+   return ret;
+}
+
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
+{
+   unsigned long flags;
+   u8 status_code;
+   u64 res;
+
+   raw_spin_lock_irqsave(&iommu->register_lock, flags);
+   dmar_writeq(iommu->reg + DMAR_VCMD_REG,
+   VCMD_CMD_OPERAND(pasid) | VCMD_CMD_FREE);
+   IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+ !(res & VCMD_VRSP_IP), res);
+   raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+   status_code = VCMD_VRSP_SC(res);
+   switch (status_code) {
+   case VCMD_VRSP_SC_SUCCESS:
+   break;
+   case VCMD_VRSP_SC_INVALID_PASID:
+   pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
+   break;
+   default:
+   pr_warn("IOMMU: %s: Unexpected error code %d\n",
+   iommu->name, status_code);
+   }
+}
+
 /*
  * Per device pasid table management:
  */
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 95ed160b1947..b4c8aece979f 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -23,6 +23,16 @@
#define is_pasid_enabled(entry)		(((entry)->lo >> 3) & 0x1)
#define get_pasid_dir_size(entry)	(1 << ((((entry)->lo >> 9) & 0x7) + 7))
 
+/* Virtual command interface for enlightened pasid management. */
+#define VCMD_CMD_ALLOC 0x1
+#define VCMD_CMD_FREE  0x2
+#define VCMD_VRSP_IP   0x1
#define VCMD_VRSP_SC(e)			(((e) >> 1) & 0x3)
#define VCMD_VRSP_SC_SUCCESS		0
#define VCMD_VRSP_SC_NO_PASID_AVAIL	1
#define VCMD_VRSP_SC_INVALID_PASID	1
#define VCMD_VRSP_RESULT_PASID(e)	(((e) >> 8) & 0xfffff)
#define VCMD_CMD_OPERAND(e)		((e) << 8)
 /*
  * Domain ID reserved for pasid entries programmed for first-level
  * only and pass-through transfer modes.
@@ -107,5 +117,6 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu,
int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 struct device *dev, int pasid);
-
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
 #endif /* __INTEL_PASID_H */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4d25141ec3df..1e11560b0e59 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -164,6 +164,7 @@
 #define ecap_smpwc(e)  (((e) >> 48) & 0x1)
 #define ecap_flts(e)   (((e) >> 47) & 0x1)
 #define ecap_slts(e)   (((e) >> 46) 

[PATCH v8 01/10] iommu/vt-d: Move domain helper to header

2019-12-16 Thread Jacob Pan
Move domain helper to header to be used by SVA code.

Signed-off-by: Jacob Pan 
Reviewed-by: Eric Auger 
---
 drivers/iommu/intel-iommu.c | 6 --
 include/linux/intel-iommu.h | 6 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index fdb688983fec..cc89791d807c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -428,12 +428,6 @@ static void init_translation_status(struct intel_iommu *iommu)
iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
 }
 
-/* Convert generic 'struct iommu_domain to private struct dmar_domain */
-static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
-{
-   return container_of(dom, struct dmar_domain, domain);
-}
-
 static int __init intel_iommu_setup(char *str)
 {
if (!str)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index aaece25c055f..74b79e2e6a73 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -589,6 +589,12 @@ static inline void __iommu_flush_cache(
clflush_cache_range(addr, size);
 }
 
+/* Convert generic struct iommu_domain to private struct dmar_domain */
+static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct dmar_domain, domain);
+}
+
 /*
  * 0: readable
  * 1: writable
-- 
2.7.4



[PATCH v8 08/10] iommu/vt-d: Add custom allocator for IOASID

2019-12-16 Thread Jacob Pan
When the VT-d driver runs in the guest, PASID allocation must be
performed via the virtual command interface. This patch registers a
custom IOASID allocator which takes precedence over the default
XArray based allocator. The resulting IOASID allocation will always
come from the host. This ensures that the PASID namespace is system-
wide.
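
Once registered, callers keep using the generic IOASID API and transparently
get host-allocated PASIDs, e.g. (a sketch; the set/private arguments are
illustrative):

	ioasid_t pasid = ioasid_alloc(NULL, PASID_MIN, intel_pasid_max_id - 1, svm);

	if (pasid == INVALID_IOASID)
		return -ENOSPC;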

Signed-off-by: Lu Baolu 
Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 75 +
 include/linux/intel-iommu.h |  2 ++
 2 files changed, 77 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e90102c7540d..b0c0bb6f740e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1700,6 +1700,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
}
+   if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
+   ioasid_unregister_allocator(&iommu->pasid_allocator);
+
 #endif
 }
 
@@ -3181,6 +3184,75 @@ static int copy_translation_tables(struct intel_iommu *iommu)
return ret;
 }
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
+{
+   struct intel_iommu *iommu = data;
+   ioasid_t ioasid;
+
+   /*
+* VT-d virtual command interface always uses the full 20 bit
+* PASID range. Host can partition guest PASID range based on
+* policies but it is out of guest's control.
+*/
+   if (min < PASID_MIN || max > intel_pasid_max_id)
+   return INVALID_IOASID;
+
+   if (vcmd_alloc_pasid(iommu, &ioasid))
+   return INVALID_IOASID;
+
+   return ioasid;
+}
+
+static void intel_ioasid_free(ioasid_t ioasid, void *data)
+{
+   struct intel_iommu *iommu = data;
+
+   if (!iommu)
+   return;
+   /*
+* Sanity check the ioasid owner is done at upper layer, e.g. VFIO
+* We can only free the PASID when all the devices are unbound.
+*/
+   if (ioasid_find(NULL, ioasid, NULL)) {
+   pr_alert("Cannot free active IOASID %d\n", ioasid);
+   return;
+   }
+   vcmd_free_pasid(iommu, ioasid);
+}
+
+static void register_pasid_allocator(struct intel_iommu *iommu)
+{
+   if (!intel_iommu_sm) {
+   pr_warn("VT-d scalable mode not enabled\n");
+   return;
+   }
+
+   /*
+* Register a custom PASID allocator if we are running in a guest,
+* guest PASID must be obtained via virtual command interface.
+* There can be multiple vIOMMUs in each guest but only one allocator
+* is active. All vIOMMU allocators will eventually be calling the same
+* host allocator.
+*/
+   if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
+   pr_info("Register custom PASID allocator\n");
+   iommu->pasid_allocator.alloc = intel_ioasid_alloc;
+   iommu->pasid_allocator.free = intel_ioasid_free;
+   iommu->pasid_allocator.pdata = (void *)iommu;
+   if (ioasid_register_allocator(&iommu->pasid_allocator)) {
+   pr_warn("Custom PASID allocator failed, scalable mode disabled\n");
+   /*
+* Disable scalable mode on this IOMMU if there
+* is no custom allocator. Mixing SM capable vIOMMU
+* and non-SM vIOMMU are not supported.
+*/
+   intel_iommu_sm = 0;
+   }
+   }
+}
+#endif
+
 static int __init init_dmars(void)
 {
struct dmar_drhd_unit *drhd;
@@ -3298,6 +3370,9 @@ static int __init init_dmars(void)
 */
for_each_active_iommu(iommu, drhd) {
iommu_flush_write_buffer(iommu);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+   register_pasid_allocator(iommu);
+#endif
iommu_set_root_entry(iommu);
iommu->flush.flush_context(iommu, 0, 0, 0, 
DMA_CCMD_GLOBAL_INVL);
iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 1e11560b0e59..8c30b23bd838 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -557,6 +558,7 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
struct page_req_dsc *prq;
unsigned char prq_name[16];/* Name for PRQ interrupt */
+   struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
 #endif
struct q_inval  *qi;/* Queued invalidation info */
u32 *iommu_state; /* Store iommu states between suspend and resume.*/
-- 
2.7.4


[PATCH v8 04/10] iommu/vt-d: Support flushing more translation cache types

2019-12-16 Thread Jacob Pan
When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
IOTLB invalidation may be passed down from outside IOMMU subsystems.
This patch adds invalidation functions that can be used for additional
translation cache types.
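
For example, tearing down a single guest PASID would use the new PASID cache
flush below with PASID-selective granularity (a sketch; did/pasid values are
placeholders):

	qi_flush_pasid_cache(iommu, did, QI_PC_PASID_SEL, pasid);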

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dmar.c| 46 +
 drivers/iommu/intel-pasid.c |  3 ++-
 include/linux/intel-iommu.h | 21 +
 3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 3acfa6a25fa2..f2f5d75da94a 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1348,6 +1348,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based IOTLB Invalidate */
+void qi_flush_iotlb_pasid(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
+   unsigned int size_order, u64 granu, int ih)
+{
+   struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
+
+   desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+   QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
+   desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
+   QI_EIOTLB_AM(size_order);
+
+   qi_submit_sync(&desc, iommu);
+}
+
 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
u16 qdep, u64 addr, unsigned mask)
 {
@@ -1371,6 +1385,38 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
qi_submit_sync(, iommu);
 }
 
+/* PASID-based device IOTLB Invalidate */
+void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+   u32 pasid,  u16 qdep, u64 addr, unsigned size_order, u64 granu)
+{
+   struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
+
+   desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+   QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+   QI_DEV_IOTLB_PFSID(pfsid);
+   desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
+
+   /* If S bit is 0, we only flush a single page. If S bit is set,
+* The least significant zero bit indicates the invalidation address
+* range. VT-d spec 6.5.2.6.
+* e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
+*/
+   if (!size_order) {
+   desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
+   } else {
+   unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order);
+   desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
+   }
+   qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+   struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
+
+   desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
+   qi_submit_sync(&desc, iommu);
+}
 /*
  * Disable Queued Invalidation interface.
  */
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index b178ad9e47ae..10f8c7564118 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -435,7 +435,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 {
struct qi_desc desc;
 
-   desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+   desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
+   QI_PC_PASID(pasid) | QI_PC_TYPE;
desc.qw1 = 0;
desc.qw2 = 0;
desc.qw3 = 0;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 412a90cb1738..ee26989df008 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -329,7 +329,7 @@ enum {
 #define QI_IOTLB_GRAN(gran)(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
 #define QI_IOTLB_ADDR(addr)(((u64)addr) & VTD_PAGE_MASK)
 #define QI_IOTLB_IH(ih)(((u64)ih) << 6)
-#define QI_IOTLB_AM(am)(((u8)am))
+#define QI_IOTLB_AM(am)(((u8)am) & 0x3f)
 
 #define QI_CC_FM(fm)   (((u64)fm) << 48)
 #define QI_CC_SID(sid) (((u64)sid) << 32)
@@ -348,16 +348,21 @@ enum {
 #define QI_PC_DID(did) (((u64)did) << 16)
 #define QI_PC_GRAN(gran)   (((u64)gran) << 4)
 
-#define QI_PC_ALL_PASIDS   (QI_PC_TYPE | QI_PC_GRAN(0))
-#define QI_PC_PASID_SEL    (QI_PC_TYPE | QI_PC_GRAN(1))
+/* PASID cache invalidation granu */
+#define QI_PC_ALL_PASIDS   0
+#define QI_PC_PASID_SEL    1
 
 #define QI_EIOTLB_ADDR(addr)   ((u64)(addr) & VTD_PAGE_MASK)
 #define QI_EIOTLB_IH(ih)   (((u64)ih) << 6)
-#define QI_EIOTLB_AM(am)   (((u64)am))
+#define QI_EIOTLB_AM(am)   (((u64)am) & 0x3f)
 #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
 #define QI_EIOTLB_DID(did) (((u64)did) << 16)
 #define QI_EIOTLB_GRAN(gran)   (((u64)gran) << 4)
 
+/* QI Dev-IOTLB inv granu */
+#define QI_DEV_IOTLB_GRAN_ALL  1
+#define QI_DEV_IOTLB_GRAN_PASID_SEL    0
+
 #define 

[PATCH v8 09/10] iommu/ioasid: Add notifier for status change

2019-12-16 Thread Jacob Pan
IOASIDs are system resources that can be shared by multiple drivers or
subsystems. When the status of an IOASID changes at runtime, there is a need
to notify all current users such that proper actions can be taken.

For example, an IOASID can be used by IOMMU subsystem for guest SVM as
well as KVM. When the guest is terminating unexpectedly, both KVM and
IOMMU need to perform clean up action before the IOASID is reclaimed.

This patch adds a per IOASID notifier that can be registered by
interesting parties.
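
As an illustration (not in this patch), a user such as KVM might
subscribe roughly as below; the callback and notifier_block names are
hypothetical:

	static int my_ioasid_event(struct notifier_block *nb,
				   unsigned long code, void *data)
	{
		ioasid_t ioasid = *(ioasid_t *)data;

		if (code == IOASID_FREE)
			pr_debug("clean up state bound to PASID %u\n", ioasid);
		return NOTIFY_OK;
	}

	static struct notifier_block my_nb = {
		.notifier_call = my_ioasid_event,
	};

	/* after obtaining the PASID: */
	ioasid_add_notifier(pasid, &my_nb);
	/* ... and before dropping it: */
	ioasid_remove_notifier(pasid, &my_nb);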

Signed-off-by: Jacob Pan 
---
 drivers/iommu/ioasid.c | 43 +++
 include/linux/ioasid.h | 20 
 2 files changed, 63 insertions(+)

diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 0f8dd377aada..53a2ab287f7d 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -15,6 +15,7 @@ struct ioasid_data {
struct ioasid_set *set;
void *private;
struct rcu_head rcu;
+   struct atomic_notifier_head notifications;
 };
 
 /*
@@ -314,6 +315,7 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, 
ioasid_t max,
 
data->set = set;
data->private = private;
+   ATOMIC_INIT_NOTIFIER_HEAD(&data->notifications);
 
/*
 * Custom allocator needs allocator data to perform platform specific
@@ -360,6 +362,9 @@ void ioasid_free(ioasid_t ioasid)
goto exit_unlock;
}
 
+   /* Notify all users that this IOASID is being freed */
+   atomic_notifier_call_chain(&ioasid_data->notifications, IOASID_FREE,
+  &ioasid);
active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
/* Custom allocator needs additional steps to free the xa element */
if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
@@ -416,6 +421,44 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 }
 EXPORT_SYMBOL_GPL(ioasid_find);
 
+int ioasid_add_notifier(ioasid_t ioasid, struct notifier_block *nb)
+{
+   struct ioasid_allocator_data *idata;
+   struct ioasid_data *data;
+   int ret = 0;
+
+   rcu_read_lock();
+   idata = rcu_dereference(active_allocator);
+   data = xa_load(&idata->xa, ioasid);
+   if (!data) {
+   ret = -ENOENT;
+   goto unlock;
+   }
+   ret = atomic_notifier_chain_register(&data->notifications, nb);
+unlock:
+   rcu_read_unlock();
+   return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_add_notifier);
+
+void ioasid_remove_notifier(ioasid_t ioasid, struct notifier_block *nb)
+{
+   struct ioasid_allocator_data *idata;
+   struct ioasid_data *data;
+
+   rcu_read_lock();
+   idata = rcu_dereference(active_allocator);
+   data = xa_load(&idata->xa, ioasid);
+   rcu_read_unlock();
+   if (!data) {
+   pr_err("IOASID %d not found\n", ioasid);
+   return;
+   }
+   /* Unregister can sleep, called outside RCU critical section. */
+   atomic_notifier_chain_unregister(&data->notifications, nb);
+}
+EXPORT_SYMBOL_GPL(ioasid_remove_notifier);
+
 MODULE_AUTHOR("Jean-Philippe Brucker ");
 MODULE_AUTHOR("Jacob Pan ");
 MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index 6f000d7a0ddc..4517c4be4088 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #define INVALID_IOASID ((ioasid_t)-1)
 typedef unsigned int ioasid_t;
@@ -29,6 +30,12 @@ struct ioasid_allocator_ops {
void *pdata;
 };
 
+/* Notification data when IOASID status changed */
+enum ioasid_notify_val {
+   IOASID_FREE = 1,
+   IOASID_SUSPEND,
+};
+
 #define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
 
 #if IS_ENABLED(CONFIG_IOASID)
@@ -40,6 +47,8 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
 void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
 int ioasid_set_data(ioasid_t ioasid, void *data);
+int ioasid_add_notifier(ioasid_t ioasid, struct notifier_block *nb);
+void ioasid_remove_notifier(ioasid_t ioasid, struct notifier_block *nb);
 
 #else /* !CONFIG_IOASID */
 static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
@@ -58,6 +67,17 @@ static inline void *ioasid_find(struct ioasid_set *set, 
ioasid_t ioasid,
return NULL;
 }
 
+static inline int ioasid_add_notifier(ioasid_t ioasid,
+ struct notifier_block *nb)
+{
+   return -ENOTSUPP;
+}
+
+static inline void ioasid_remove_notifier(ioasid_t ioasid,
+  struct notifier_block *nb)
+{
+}
+
 static inline int ioasid_register_allocator(struct ioasid_allocator_ops 
*allocator)
 {
return -ENOTSUPP;
-- 
2.7.4


[PATCH v8 10/10] iommu/vt-d: Handle IOASID notifications

2019-12-16 Thread Jacob Pan
IOASID/PASID are shared system resources that can be freed by software
components outside IOMMU subsystem. When status of an IOASID changes,
e.g. freed or suspended, notifications will be available to its users to
take proper action.

This patch adds a notification block such that when IOASID is freed by
other components such as VFIO, associated software and hardware context
can be cleaned.
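
For illustration (not in this patch), the teardown then needs nothing
more than the free on the VFIO side; the per-IOASID chain added in
patch 09 does the fan-out:

	/* Hypothetical VFIO-side release path */
	ioasid_free(pasid);
	/*
	 * -> atomic_notifier_call_chain(..., IOASID_FREE, &pasid)
	 * -> ioasid_status_change() below unbinds every device and
	 *    frees the svm before the PASID is reclaimed.
	 */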

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-svm.c   | 52 +
 include/linux/intel-iommu.h |  2 +-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index f580b7be63c5..a660e741551c 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -230,6 +230,48 @@ static LIST_HEAD(global_svm_list);
list_for_each_entry((sdev), &(svm)->devs, list) \
if ((d) != (sdev)->dev) {} else
 
+static int ioasid_status_change(struct notifier_block *nb,
+   unsigned long code, void *data)
+{
+   ioasid_t ioasid = *(ioasid_t *)data;
+   struct intel_svm_dev *sdev;
+   struct intel_svm *svm;
+
+   if (code == IOASID_FREE) {
+   /*
+* Unbind all devices associated with this PASID which is
+* being freed by other users such as VFIO.
+*/
+   svm = ioasid_find(NULL, ioasid, NULL);
+   if (!svm || !svm->iommu)
+   return NOTIFY_DONE;
+
+   if (IS_ERR(svm))
+   return NOTIFY_BAD;
+
+   list_for_each_entry(sdev, &svm->devs, list) {
+   list_del_rcu(>list);
+   intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
+   svm->pasid);
+   kfree_rcu(sdev, rcu);
+
+   if (list_empty(>devs)) {
+   list_del(>list);
+   ioasid_set_data(ioasid, NULL);
+   kfree(svm);
+   }
+   }
+
+   return NOTIFY_OK;
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block svm_ioasid_notifier = {
+   .notifier_call = ioasid_status_change,
+};
+
 int intel_svm_bind_gpasid(struct iommu_domain *domain,
struct device *dev,
struct iommu_gpasid_bind_data *data)
@@ -319,6 +361,13 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
svm->gpasid = data->gpasid;
svm->flags |= SVM_FLAG_GUEST_PASID;
}
+   /* Get notified when IOASID is freed by others, e.g. VFIO */
+   ret = ioasid_add_notifier(data->hpasid, &svm_ioasid_notifier);
+   if (ret) {
+   mmput(svm->mm);
+   kfree(svm);
+   goto out;
+   }
ioasid_set_data(data->hpasid, svm);
INIT_LIST_HEAD_RCU(>devs);
INIT_LIST_HEAD(>list);
@@ -432,6 +481,9 @@ int intel_svm_unbind_gpasid(struct device *dev, int pasid)
 * that PASID allocated by one guest cannot be
 * used by another.
 */
+   ioasid_remove_notifier(pasid,
+  &svm_ioasid_notifier);
+
ioasid_set_data(pasid, NULL);
kfree(svm);
}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8c30b23bd838..e2a33c794e8d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -711,7 +711,7 @@ struct intel_svm_dev {
 struct intel_svm {
struct mmu_notifier notifier;
struct mm_struct *mm;
-
+   struct notifier_block *nb;
struct intel_iommu *iommu;
int flags;
int pasid;
-- 
2.7.4



Re: [PATCH 0/3] iommu/vt-d bad RMRR workarounds

2019-12-16 Thread Chen, Yian



On 12/13/2019 5:52 PM, Lu Baolu wrote:


On 12/13/19 10:31 PM, Barret Rhoden wrote:

On 12/11/19 9:43 PM, Lu Baolu wrote:
The VT-d spec defines the BIOS considerations about RMRR in section 8.4:


"
BIOS must report the RMRR reported memory addresses as reserved (or as
EFI runtime) in the system memory map returned through methods such as
INT15, EFI GetMemoryMap etc.
"

So we should treat it as a firmware bug if the RMRR range is not mapped
as RESERVED in the system memory map table.

As for how should the driver handle this case, ignoring buggy RMRR with
a warning message might be a possible choice.


Agreed, firmware should not be doing this.  My first patch just skips 
those entries, instead of aborting DMAR processing, and keeps the 
warning.




Hi Yian,

Does this work for you?

Best regards,
baolu


I made a comment in the patch email "[PATCH 1/3] iommu/vt-d: skip 
RMRR entries that fail the sanity check "

thanks,
Yian

So long as the machine still boots in a safe manner, I'm reasonably 
happy.


Thanks,

Barret





Re: [PATCH 1/3] iommu/vt-d: skip RMRR entries that fail the sanity check

2019-12-16 Thread Chen, Yian



On 12/11/2019 11:46 AM, Barret Rhoden wrote:

RMRR entries describe memory regions that are DMA targets for devices
outside the kernel's control.

RMRR entries that fail the sanity check are pointing to regions of
memory that the firmware did not tell the kernel are reserved or
otherwise should not be used.

Instead of aborting DMAR processing, this commit skips these RMRR
entries.  They will not be mapped into the IOMMU, but the IOMMU can
still be utilized.  If anything, when the IOMMU is on, those devices
will not be able to clobber RAM that the kernel has allocated from those
regions.

Signed-off-by: Barret Rhoden 
---
  drivers/iommu/intel-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f168cd8ee570..f7e09244c9e4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4316,7 +4316,7 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header 
*header, void *arg)
rmrr = (struct acpi_dmar_reserved_memory *)header;
ret = arch_rmrr_sanity_check(rmrr);
if (ret)
-   return ret;
+   return 0;
  
  	rmrru = kzalloc(sizeof(*rmrru), GFP_KERNEL);

if (!rmrru)
The RMRR parsing function should report the error to its caller. The
response to the error can then be chosen by the caller in the calling
stack, for example dmar_walk_remapping_entries().
A concern is that ignoring a detected firmware bug might have a
potential side impact, though it seems safe for your case.
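
For the record, a rough sketch of that alternative (hypothetical; the
real dmar_walk_remapping_entries() differs): keep the error return in
dmar_parse_one_rmrr() and let the walker tolerate only this entry type:

	ret = cb->cb[entry->type](entry, arg);
	if (ret && entry->type == ACPI_DMAR_TYPE_RESERVED_MEMORY) {
		pr_warn("skipping RMRR that failed the sanity check\n");
		ret = 0;	/* ignore this entry, keep walking */
	}
	if (ret)
		return ret;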

Thanks,
Yian

Re: [PATCH v3 5/5] drm/msm/a6xx: Support split pagetables

2019-12-16 Thread Rob Clark
On Mon, Dec 16, 2019 at 8:38 AM Jordan Crouse  wrote:
>
> Attempt to enable split pagetables if the arm-smmu driver supports it.
> This will move the default address space from the default region to
> the address range assigned to TTBR1. The behavior should be transparent
> to the driver for now but it gets the default buffers out of the way
> when we want to start swapping TTBR0 for context-specific pagetables.
>
> Signed-off-by: Jordan Crouse 

Reviewed-by: Rob Clark 

(my previous r-b's on the other patches from v2 carries over to v3)

> ---
>
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 
> ++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
> b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 5dc0b2c..1c6da93 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -811,6 +811,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
> return (unsigned long)busy_time;
>  }
>
> +static struct msm_gem_address_space *
> +a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
> +{
> +   struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
> +   struct msm_gem_address_space *aspace;
> +   struct msm_mmu *mmu;
> +   u64 start, size;
> +   u32 val = 1;
> +   int ret;
> +
> +   if (!iommu)
> +   return ERR_PTR(-ENOMEM);
> +
> +   /*
> +* Try to request split pagetables - the request has to be made before
> +* the domain is attached
> +*/
> +   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
> +
> +   mmu = msm_iommu_new(>dev, iommu);
> +   if (IS_ERR(mmu)) {
> +   iommu_domain_free(iommu);
> +   return ERR_CAST(mmu);
> +   }
> +
> +   /*
> +* After the domain is attached, see if the split tables were actually
> +* successful.
> +*/
> +   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
> +   if (!ret && val) {
> +   /*
> +* The aperture start will be at the beginning of the TTBR1
> +* space so use that as a base
> +*/
> +   start = iommu->geometry.aperture_start;
> +   size = 0x;
> +   } else {
> +   /* Otherwise use the legacy 32 bit region */
> +   start = SZ_16M;
> +   size = 0x - SZ_16M;
> +   }
> +
> +   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
> +   if (IS_ERR(aspace))
> +   iommu_domain_free(iommu);
> +
> +   return aspace;
> +}
> +
>  static const struct adreno_gpu_funcs funcs = {
> .base = {
> .get_param = adreno_get_param,
> @@ -832,7 +882,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #if defined(CONFIG_DRM_MSM_GPU_STATE)
> .gpu_state_get = a6xx_gpu_state_get,
> .gpu_state_put = a6xx_gpu_state_put,
> -   .create_address_space = adreno_iommu_create_address_space,
> +   .create_address_space = a6xx_create_address_space,
>  #endif
> },
> .get_timestamp = a6xx_get_timestamp,
> --
> 2.7.4


Re: [PATCH v3] iommu: fix KASAN use-after-free in iommu_insert_resv_region

2019-12-16 Thread Linus Torvalds
On Mon, Dec 16, 2019 at 6:36 AM Auger Eric  wrote:
>On 12/16/19 3:24 PM, Qian Cai wrote:
> >
> > Looks like Joerg is away for a few weeks. Could Andrew or Linus pick up this
> > use-after-free?

I've taken it.

> Note the right version to pick up is the v4, reviewed by Jerry:
> https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg36226.html

Btw, please use lore.kernel.org, it's much more convenient for me
(faster, and a single interface for me so that I don't have to always
figure out "what's the incantation to get the original raw email" that
so many archives make it hard to get).

   Linus


[PATCH v3 5/5] drm/msm/a6xx: Support split pagetables

2019-12-16 Thread Jordan Crouse
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to
the address range assigned to TTBR1. The behavior should be transparent
to the driver for now but it gets the default buffers out of the way
when we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 ++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5dc0b2c..1c6da93 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -811,6 +811,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+   struct msm_gem_address_space *aspace;
+   struct msm_mmu *mmu;
+   u64 start, size;
+   u32 val = 1;
+   int ret;
+
+   if (!iommu)
+   return ERR_PTR(-ENOMEM);
+
+   /*
+* Try to request split pagetables - the request has to be made before
+* the domain is attached
+*/
+   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+   mmu = msm_iommu_new(>dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return ERR_CAST(mmu);
+   }
+
+   /*
+* After the domain is attached, see if the split tables were actually
+* successful.
+*/
+   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+   if (!ret && val) {
+   /*
+* The aperture start will be at the beginning of the TTBR1
+* space so use that as a base
+*/
+   start = iommu->geometry.aperture_start;
+   size = 0x;
+   } else {
+   /* Otherwise use the legacy 32 bit region */
+   start = SZ_16M;
+   size = 0x - SZ_16M;
+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+   if (IS_ERR(aspace))
+   iommu_domain_free(iommu);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -832,7 +882,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
-   .create_address_space = adreno_iommu_create_address_space,
+   .create_address_space = a6xx_create_address_space,
 #endif
},
.get_timestamp = a6xx_get_timestamp,
-- 
2.7.4


[PATCH v3 4/5] drm/msm: Refactor address space initialization

2019-12-16 Thread Jordan Crouse
Refactor how address space initialization works. Instead of having the
address space function create the MMU object (and thus require separate but
equal functions for gpummu and iommu) use a single function and pass the
MMU struct. Make the generic code cleaner by using target specific
functions to create the address space so a2xx can do its own thing in its
own space.  For all the other targets use a generic helper to initialize
IOMMU but leave the door open for newer targets to use customization
if they need it.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 10 +++---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 +--
 drivers/gpu/drm/msm/msm_drv.h|  8 ++---
 drivers/gpu/drm/msm/msm_gem_vma.c| 52 +---
 drivers/gpu/drm/msm/msm_gpu.c| 40 ++--
 drivers/gpu/drm/msm/msm_gpu.h|  4 +--
 drivers/gpu/drm/msm/msm_iommu.c  |  3 ++
 16 files changed, 83 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc1..60f6472 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct msm_gpu *gpu)
return state;
 }
 
+static struct msm_gem_address_space *
+a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct msm_mmu *mmu = msm_gpummu_new(>dev, gpu);
+   struct msm_gem_address_space *aspace;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
+   SZ_16M + 0xfff * SZ_64K);
+
+   if (IS_ERR(aspace) && !IS_ERR(mmu))
+   mmu->funcs->destroy(mmu);
+
+   return aspace;
+}
+
 /* Register offset defines for A2XX - copy of A3XX */
 static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
@@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a2xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = a2xx_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 7ad1493..41e51e0 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a3xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index b01388a..3655440 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a4xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a4xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index b02e204..0f5db72 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1432,6 +1432,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_busy = a5xx_gpu_busy,
.gpu_state_get = a5xx_gpu_state_get,
.gpu_state_put = a5xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a5xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index dc8ec2c..5dc0b2c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -832,6 +832,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
 #endif
},
.get_timestamp = a6xx_get_timestamp,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 

[PATCH v3 2/5] iommu/arm-smmu: Add support for split pagetables

2019-12-16 Thread Jordan Crouse
Add support to enable split pagetables (TTBR1) if the supporting driver
requests it via the DOMAIN_ATTR_SPLIT_TABLES flag. When enabled, the driver
will set up the TTBR0 and TTBR1 regions and program the default domain
pagetable on TTBR1.

After attaching the device, the value of the domain attribute can
be queried to see if the split pagetables were successfully programmed.
Furthermore the domain geometry will be updated so that the caller can
determine the active region for the pagetable that was programmed.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu.c | 40 +++-
 drivers/iommu/arm-smmu.h | 45 +++--
 2 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c106406..7b59116 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -538,9 +538,17 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
-   cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[1] |=
+   FIELD_PREP(TTBRn_ASID, cfg->asid);
+   } else {
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |=
+   FIELD_PREP(TTBRn_ASID, cfg->asid);
+   cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+   }
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -651,6 +659,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
enum io_pgtable_fmt fmt;
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   u32 quirks = 0;
 
mutex_lock(&smmu_domain->init_mutex);
if (smmu_domain->smmu)
@@ -719,6 +728,8 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
oas = smmu->ipa_size;
if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
fmt = ARM_64_LPAE_S1;
+   if (smmu_domain->split_pagetables)
+   quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
} else if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
fmt = ARM_32_LPAE_S1;
ias = min(ias, 32UL);
@@ -788,6 +799,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.coherent_walk  = smmu->features & ARM_SMMU_FEAT_COHERENT_WALK,
.tlb= smmu_domain->flush_ops,
.iommu_dev  = smmu->dev,
+   .quirks = quirks,
};
 
if (smmu_domain->non_strict)
@@ -801,8 +813,15 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
-   domain->geometry.force_aperture = true;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~((1ULL << ias) - 1);
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   domain->geometry.force_aperture = true;
+   smmu_domain->split_pagetables = false;
+   }
 
/* Initialise the context bank with our page table cfg */
arm_smmu_init_context_bank(smmu_domain, &pgtbl_cfg);
@@ -1484,6 +1503,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == 
ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_SPLIT_TABLES:
+   *(int *)data = smmu_domain->split_pagetables;
+   return 0;
default:
return -ENODEV;
}
@@ -1524,6 +1546,14 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
else
smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
break;
+   case DOMAIN_ATTR_SPLIT_TABLES:
+ 

[PATCH v3 3/5] drm/msm: Attach the IOMMU device during initialization

2019-12-16 Thread Jordan Crouse
Everywhere an IOMMU object is created by msm_gpu_create_address_space
the IOMMU device is attached immediately after. Instead of carrying around
the infrastructure to do the attach from the device specific code do it
directly in the msm_iommu_init() function. This gets it out of the way for
more aggressive cleanups that follow.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c |  4 
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  7 ---
 drivers/gpu/drm/msm/msm_gem_vma.c| 23 +++
 drivers/gpu/drm/msm/msm_gpu.c| 11 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 --
 drivers/gpu/drm/msm/msm_iommu.c  | 15 +++
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 8 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index 6c92f0f..b082b23 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -704,7 +704,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 {
struct iommu_domain *domain;
struct msm_gem_address_space *aspace;
-   int ret;
 
domain = iommu_domain_alloc(&platform_bus_type);
if (!domain)
@@ -720,13 +719,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
return PTR_ERR(aspace);
}
 
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DPU_ERROR("failed to attach iommu %d\n", ret);
-   msm_gem_address_space_put(aspace);
-   return ret;
-   }
-
dpu_kms->base.aspace = aspace;
return 0;
 }
diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c 
b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
index dda0543..9dba37c 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
@@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret)
-   goto fail;
} else {
DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys "
"contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c 
b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index e43ecd4..653dab2 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -736,13 +736,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DRM_DEV_ERROR(&pdev->dev, "failed to attach iommu: %d\n",
-   ret);
-   goto fail;
-   }
} else {
DRM_DEV_INFO(&pdev->dev,
 "no iommu, fallback to phys contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
b/drivers/gpu/drm/msm/msm_gem_vma.c
index 1af5354..91d993a 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
const char *name)
 {
struct msm_gem_address_space *aspace;
-   u64 size = domain->geometry.aperture_end -
-   domain->geometry.aperture_start;
+   u64 start = domain->geometry.aperture_start;
+   u64 size = domain->geometry.aperture_end - start;
 
aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
if (!aspace)
@@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_iommu_new(dev, domain);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
 
-   drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT),
-   size >> PAGE_SHIFT);
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
+
+   /*
+* Attaching the IOMMU device changes the aperture values so use the
+* cached values instead
+*/
+   drm_mm_init(&aspace->mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 
kref_init(>kref);
 
@@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, 
struct msm_gpu *gpu,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_gpummu_new(dev, gpu);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
+
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
 
drm_mm_init(&aspace->mm, (va_start >> PAGE_SHIFT),
size >> PAGE_SHIFT);
diff --git 

[PATCH v3 1/5] iommu: Add DOMAIN_ATTR_SPLIT_TABLES

2019-12-16 Thread Jordan Crouse
Add a new attribute to enable and query the state of split pagetables
for the domain.

Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f2223cb..18c861e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_SPLIT_TABLES,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4


[PATCH v3 0/5] iommu/arm-smmu: Split pagetable support for arm-smmu-v2

2019-12-16 Thread Jordan Crouse
Another refresh to support split pagetables for Adreno GPUs as part of an
incremental process to enable per-context pagetables.

In order to support per-context pagetables the GPU needs to enable split tables
so that we can store global buffers in the TTBR1 space leaving the GPU free to
program the TTBR0 register with the address of a context specific pagetable.

This patchset adds split pagetable support if requested by the domain owner
via the DOMAIN_ATTR_SPLIT_TABLES attribute. If the attribute is non-zero at
attach time, the implementation will set up the TTBR0 and TTBR1 spaces with
identical configurations and program the domain pagetable into the TTBR1
register. The TTBR0 register will be unused.

The driver can determine if split pagetables were programmed by querying
DOMAIN_ATTR_SPLIT_TABLES after attaching. The domain geometry will also be
updated to reflect the virtual address space for the TTBR1 range.
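
Condensed from the patches, the expected calling sequence for a domain
owner looks like this (error handling omitted):

	u32 val = 1;

	/* request split tables before attaching */
	iommu_domain_set_attr(domain, DOMAIN_ATTR_SPLIT_TABLES, &val);
	/* ... attach the device ... */
	/* confirm after attaching; geometry now reflects the TTBR1 range */
	iommu_domain_get_attr(domain, DOMAIN_ATTR_SPLIT_TABLES, &val);
	if (val)
		start = domain->geometry.aperture_start;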

These patches are based on top of linux-next-20191216 with [1], [2], and [3]
from Robin on the iommu list.

Change log:

v3: Remove the implementation-specific code and make split pagetable
support part of the generic configuration

[1] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039718.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039719.html
[3] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039720.html


Jordan Crouse (5):
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Add support for split pagetables
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 51 
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 18 ---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 18 +--
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 18 +--
 drivers/gpu/drm/msm/msm_drv.h|  8 ++---
 drivers/gpu/drm/msm/msm_gem_vma.c| 37 +--
 drivers/gpu/drm/msm/msm_gpu.c| 49 ++
 drivers/gpu/drm/msm/msm_gpu.h|  4 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 
 drivers/gpu/drm/msm/msm_iommu.c  | 18 ++-
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 drivers/iommu/arm-smmu.c | 40 +
 drivers/iommu/arm-smmu.h | 45 
 include/linux/iommu.h|  1 +
 21 files changed, 215 insertions(+), 153 deletions(-)

-- 
2.7.4


Re: [PATCH v3] iommu: fix KASAN use-after-free in iommu_insert_resv_region

2019-12-16 Thread Auger Eric
Hi,
On 12/16/19 3:24 PM, Qian Cai wrote:
> 
> 
>> On Nov 26, 2019, at 5:27 AM, Eric Auger  wrote:
>>
>> In case the new region gets merged into another one, the nr
>> list node is freed. Checking its type while completing the
>> merge algorithm leads to a use-after-free. Use new->type
>> instead.
>>
>> Fixes: 4dbd258ff63e ("iommu: Revisit iommu_insert_resv_region()
>> implementation")
>> Signed-off-by: Eric Auger 
>> Reported-by: Qian Cai 
>> Cc: Stable  #v5.3+
> 
> 
> Looks like Joerg is away for a few weeks. Could Andrew or Linus pick up this 
> use-after-free?
Thanks for the heads up. Indeed I was wondering why it hasn't been taken
yet.

Note the right version to pick up is the v4, reviewed by Jerry:
https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg36226.html

Thanks

Eric
> 
>>
>> ---
>>
>> v2 -> v3:
>> - directly use new->type
>>
>> v1 -> v2:
>> - remove spurious new line
>> ---
>> drivers/iommu/iommu.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index d658c7c6a2ab..285ad4a4c7f2 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -313,7 +313,7 @@ int iommu_insert_resv_region(struct iommu_resv_region *new,
>>  phys_addr_t top_end, iter_end = iter->start + iter->length - 1;
>>
>>  /* no merge needed on elements of different types than @nr */
>> -if (iter->type != nr->type) {
>> +if (iter->type != new->type) {
>> list_move_tail(&iter->list, &stack);
>>  continue;
>>  }
>> -- 
>> 2.20.1
>>
> 



Re: [PATCH v3] iommu: fix KASAN use-after-free in iommu_insert_resv_region

2019-12-16 Thread Qian Cai



> On Nov 26, 2019, at 5:27 AM, Eric Auger  wrote:
> 
> In case the new region gets merged into another one, the nr
> list node is freed. Checking its type while completing the
> merge algorithm leads to a use-after-free. Use new->type
> instead.
> 
> Fixes: 4dbd258ff63e ("iommu: Revisit iommu_insert_resv_region()
> implementation")
> Signed-off-by: Eric Auger 
> Reported-by: Qian Cai 
> Cc: Stable  #v5.3+


Looks like Joerg is away for a few weeks. Could Andrew or Linus pick up this 
use-after-free?

> 
> ---
> 
> v2 -> v3:
> - directly use new->type
> 
> v1 -> v2:
> - remove spurious new line
> ---
> drivers/iommu/iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d658c7c6a2ab..285ad4a4c7f2 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -313,7 +313,7 @@ int iommu_insert_resv_region(struct iommu_resv_region *new,
>   phys_addr_t top_end, iter_end = iter->start + iter->length - 1;
> 
>   /* no merge needed on elements of different types than @nr */
> - if (iter->type != nr->type) {
> + if (iter->type != new->type) {
>   list_move_tail(&iter->list, &stack);
>   continue;
>   }
> -- 
> 2.20.1
> 



Re: [RESEND,PATCH 02/13] iommu/mediatek: Add mt6779 IOMMU basic support

2019-12-16 Thread Yong Wu
On Mon, 2019-11-04 at 19:52 +0800, Chao Hao wrote:
> 1. Add the mt6779 register definitions for the iommu.
> 2. Add the mt6779_data definition to support mt6779 iommu HW init.
> 3. There are two iommus: one is MM_IOMMU, the other is VPU_IOMMU.
> MM_IOMMU is connected to smi_larb to let the multimedia engine
> access DRAM, and VPU_IOMMU is connected to the APU_bus to let
> VPU, MDLA and EDMA access DRAM. MM_IOMMU and VPU_IOMMU use the same
> page table to simplify the design, via "mtk_iommu_get_m4u_data".
> 4. Smi_larb6 doesn't use mm_iommu, so we can distinguish
> vpu_iommu by it when iommu_probe executes.
> 5. For mt6779, the APU_IOMMU fault id is irregular, so it is treated
> specially.
> 
> Signed-off-by: Chao Hao 
> ---
>  drivers/iommu/mtk_iommu.c | 91 +--
>  drivers/iommu/mtk_iommu.h | 10 -
>  2 files changed, 87 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 8ca2e99964fe..f2847e661137 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -38,12 +38,24 @@
>  #define REG_MMU_INVLD_END_A  0x028
>  
>  #define REG_MMU_INV_SEL  0x038
> +#define REG_MMU_INV_SEL_MT6779   0x02c
>  #define F_INVLD_EN0  BIT(0)
>  #define F_INVLD_EN1  BIT(1)
>  
>  #define REG_MMU_STANDARD_AXI_MODE0x048
> +
> +#define REG_MMU_MISC_CRTL_MT6779 0x048

Defining two registers at the same offset looks strange. See below.

> +#define REG_MMU_STANDARD_AXI_MODE_MT6779 (BIT(3) | BIT(19))
> +#define REG_MMU_COHERENCE_EN (BIT(0) | BIT(16))
> +#define REG_MMU_IN_ORDER_WR_EN   (BIT(1) | BIT(17))
> +#define F_MMU_HALF_ENTRY_MODE_L  (BIT(5) | BIT(21))
> +#define F_MMU_BLOCKING_MODE_L(BIT(4) | BIT(20))

The last four are not used. Please remove.

> +
>  #define REG_MMU_DCM_DIS  0x050
>  
> +#define REG_MMU_WR_LEN   0x054
> +#define F_MMU_WR_THROT_DIS   (BIT(5) |  BIT(21))
> +
>  #define REG_MMU_CTRL_REG 0x110
>  #define F_MMU_TF_PROT_TO_PROGRAM_ADDR(2 << 4)
>  #define F_MMU_PREFETCH_RT_REPLACE_MODBIT(4)
> @@ -88,10 +100,14 @@
>  #define REG_MMU1_INVLD_PA0x148
>  #define REG_MMU0_INT_ID  0x150
>  #define REG_MMU1_INT_ID  0x154
> +#define F_MMU_INT_ID_COMM_ID(a)  (((a) >> 9) & 0x7)
> +#define F_MMU_INT_ID_SUB_COMM_ID(a)  (((a) >> 7) & 0x3)
>  #define F_MMU_INT_ID_LARB_ID(a)  (((a) >> 7) & 0x7)
>  #define F_MMU_INT_ID_PORT_ID(a)  (((a) >> 2) & 0x1f)
> +#define F_MMU_INT_ID_COMM_APU_ID(a)  ((a) & 0x3)
> +#define F_MMU_INT_ID_SUB_APU_ID(a)   (((a) >> 2) & 0x3)
>  
> -#define MTK_PROTECT_PA_ALIGN 128
> +#define MTK_PROTECT_PA_ALIGN 256
>  
>  /*
>   * Get the local arbiter ID and the portid within the larb arbiter
> @@ -165,7 +181,7 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
>  
>   for_each_m4u(data) {
>   writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
> -data->base + REG_MMU_INV_SEL);
> +data->base + data->plat_data->inv_sel_reg);
>   writel_relaxed(F_ALL_INVLD, data->base + REG_MMU_INVALIDATE);
>   wmb(); /* Make sure the tlb flush all done */
>   }
> @@ -182,7 +198,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
>   for_each_m4u(data) {
>   spin_lock_irqsave(&data->tlb_lock, flags);
>   writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
> -data->base + REG_MMU_INV_SEL);
> +data->base + data->plat_data->inv_sel_reg);
>  
>   writel_relaxed(iova, data->base + REG_MMU_INVLD_START_A);
>   writel_relaxed(iova + size - 1,
> @@ -226,7 +242,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
>   struct mtk_iommu_data *data = dev_id;
>   struct mtk_iommu_domain *dom = data->m4u_dom;
>   u32 int_state, regval, fault_iova, fault_pa;
> - unsigned int fault_larb, fault_port;
> + unsigned int fault_larb, fault_port, sub_comm = 0;
>   bool layer, write;
>  
>   /* Read error info from registers */
> @@ -242,17 +258,30 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
>   }
>   layer = fault_iova & F_MMU_FAULT_VA_LAYER_BIT;
>   write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
> - fault_larb = F_MMU_INT_ID_LARB_ID(regval);
>   fault_port = F_MMU_INT_ID_PORT_ID(regval);
> + if (data->plat_data->has_sub_comm[data->m4u_id]) {
> + /* m4u1 is VPU in mt6779.*/
> + if (data->m4u_id && data->plat_data->m4u_plat == 

Re: [RESEND,PATCH 03/13] iommu/mediatek: Add mtk_iommu_pgtable structure

2019-12-16 Thread Yong Wu
On Mon, 2019-11-04 at 19:52 +0800, Chao Hao wrote:
> Starting with this patch, we will change the SW architecture
> to support multiple domains. The SW architecture will change a lot,
> so we need to modify it step by step across more than one patch.
> The new SW overall architecture is as below:
> 
>   iommu0   iommu1
> | |
> ---
>   |
>   mtk_iommu_pgtable
>   |
>   --
>   ||   |
>   mtk_iommu_domain1   mtk_iommu_domain2  mtk_iommu_domain3
>   ||   |
>   iommu_group1 iommu_group2   iommu_group3
>   ||   |
>   iommu_domain1   iommu_domain2   iommu_domain3
>   ||   |
>   iova region1(normal)  iova region2(CCU)iova region3(VPU)
> 
> In the current structure, no matter how many iommus there are,
> they use the same page table to simplify the usage of the module.
> In order to make the software architecture more explicit, this
> patch creates a global mtk_iommu_pgtable structure to describe
> the page table, and all the iommus use it.

Thanks for the hard work on this file. Actually this patch and the later
ones confuse me. Why do you make this flow change?
For making the code "more explicit", or for adding multi-domain support
in 13/13?

IMHO, the change is unnecessary.
a) For me, this change has no improvement. Currently we use a global
mtk_iommu_get_m4u_data to get the M4U data. I would be very glad if you
could get rid of it. But in this patchset you use another global,
mtk_iommu_pgtable, instead. For me, it is no improvement.

b) This patchset breaks the original flow. device_group gives you a
software hook for initializing, and you move the pagetable allocation
code into it. But that isn't device_group's job.

I cannot decide if your flow is right. But if you only want to add
multi-domain support, I guess you could extend the current "m4u_group"
to an array "m4u_group[N]", as sketched below. It may be simpler. To make
mt6779 progress easily, I suggest supporting multi-domain this way
first. Then you could send this new mtk_iommu_pgtable patchset to make
the code "more explicit" if you insist.

> The diagram is as below:
> 
>   mtk_iommu_data1(MM)   mtk_iommu_data2(APU)
>   |  |
>   |  |
>   --mtk_iommu_pgtable-
> 
> We need to create the global mtk_iommu_pgtable including all the iova
> regions first, with the special iova regions divided out of it,
> so the pgtable information needs to be created in device_group.
> 
> Signed-off-by: Chao Hao 
> ---
>  drivers/iommu/mtk_iommu.c | 84 +++
>  drivers/iommu/mtk_iommu.h |  1 +
>  2 files changed, 85 insertions(+)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index f2847e661137..fcbde6b0f58d 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -123,6 +123,12 @@ struct mtk_iommu_domain {
>   struct iommu_domain domain;
>  };
>  
> +struct mtk_iommu_pgtable {
> + struct io_pgtable_cfg   cfg;
> + struct io_pgtable_ops   *iop;
> +};
> +
> +static struct mtk_iommu_pgtable *share_pgtable;
>  static const struct iommu_ops mtk_iommu_ops;
>  
>  /*
> @@ -170,6 +176,11 @@ static struct mtk_iommu_data 
> *mtk_iommu_get_m4u_data(void)
>   return NULL;
>  }
>  
> +static struct mtk_iommu_pgtable *mtk_iommu_get_pgtable(void)
> +{
> + return share_pgtable;
> +}
> +
>  static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
>  {
>   return container_of(dom, struct mtk_iommu_domain, domain);
> @@ -322,6 +333,13 @@ static int mtk_iommu_domain_finalise(struct 
> mtk_iommu_domain *dom)
>  {
>   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
>  
> + if (data->pgtable) {
> + dom->cfg = data->pgtable->cfg;
> + dom->iop = data->pgtable->iop;
> + dom->domain.pgsize_bitmap = data->pgtable->cfg.pgsize_bitmap;
> + return 0;
> + }
> +
>   dom->cfg = (struct io_pgtable_cfg) {
>   .quirks = IO_PGTABLE_QUIRK_ARM_NS |
>   IO_PGTABLE_QUIRK_NO_PERMS |
> @@ -345,6 +363,61 @@ static int mtk_iommu_domain_finalise(struct 
> mtk_iommu_domain *dom)
>   return 0;
>  }
>  
> +static struct mtk_iommu_pgtable *create_pgtable(struct mtk_iommu_data *data)
> +{
> + struct mtk_iommu_pgtable *pgtable;
> +
> + pgtable = kzalloc(sizeof(*pgtable), GFP_KERNEL);
> + if (!pgtable)
> + return 

Re: [RESEND,PATCH 01/13] dt-bindings: mediatek: Add bindings for MT6779

2019-12-16 Thread Yong Wu
On Mon, 2019-11-04 at 19:52 +0800, Chao Hao wrote:
> This patch adds description for MT6779 IOMMU.
> 
> MT6779 has two iommus, they are MM_IOMMU and APU_IOMMU which
> use ARM Short-Descriptor translation format.
> 
> The MT6779 IOMMU hardware diagram is below. It is only a brief
> diagram of the iommu and doesn't focus on the smi_larb part, so
> I don't describe smi_larb in detail.
> 
>EMI
> |
>  --
>  ||
> MM_IOMMUAPU_IOMMU
>  ||
>SMI_COMMOM---   APU_BUS
>   |  ||
> SMI_LARB(0~11)  SMI_LARB12(FAKE)  SMI_LARB13(FAKE)
> |||
> ||   --
> ||   | |  |
>Multimedia engineCCU VPU   MDLA   EMDA
> 
> All the connections are hardware fixed, software can not adjust it.
> 
> From the diagram above, MM_IOMMU provides mapping for the multimedia engine,
> but CCU is connected to smi_common directly; we can take it as larb12.
> APU_IOMMU provides mapping for the APU engine; we can take it as larb13.
> Larb12 and Larb13 are fake larbs.
> 
> Signed-off-by: Chao Hao 
> ---
>  .../bindings/iommu/mediatek,iommu.txt |   2 +
>  include/dt-bindings/memory/mt6779-larb-port.h | 217 ++
>  2 files changed, 219 insertions(+)
>  create mode 100644 include/dt-bindings/memory/mt6779-larb-port.h
> 
> diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt 
> b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> index ce59a505f5a4..c1ccd8582eb2 100644
> --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
> @@ -58,6 +58,7 @@ Required properties:
>  - compatible : must be one of the following string:
>   "mediatek,mt2701-m4u" for mt2701 which uses generation one m4u HW.
>   "mediatek,mt2712-m4u" for mt2712 which uses generation two m4u HW.
> + "mediatek,mt6779-m4u" for mt6779 which uses generation two m4u HW.
>   "mediatek,mt7623-m4u", "mediatek,mt2701-m4u" for mt7623 which uses
>generation one m4u HW.
>   "mediatek,mt8173-m4u" for mt8173 which uses generation two m4u HW.
> @@ -78,6 +79,7 @@ Required properties:
>   Specifies the mtk_m4u_id as defined in
>   dt-binding/memory/mt2701-larb-port.h for mt2701, mt7623
>   dt-binding/memory/mt2712-larb-port.h for mt2712,
> + dt-binding/memory/mt6779-larb-port.h for mt6779,
>   dt-binding/memory/mt8173-larb-port.h for mt8173, and
>   dt-binding/memory/mt8183-larb-port.h for mt8183.
>  
> diff --git a/include/dt-bindings/memory/mt6779-larb-port.h 
> b/include/dt-bindings/memory/mt6779-larb-port.h
> new file mode 100644
> index ..8b7f2d2446ea
> --- /dev/null
> +++ b/include/dt-bindings/memory/mt6779-larb-port.h
> @@ -0,0 +1,217 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2019 MediaTek Inc.
> + * Author: Chao Hao 
> + */
> +
> +#ifndef _DTS_IOMMU_PORT_MT6779_H_
> +#define _DTS_IOMMU_PORT_MT6779_H_
> +
> +#define MTK_M4U_ID(larb, port)(((larb) << 5) | (port))
> +
> +#define M4U_LARB0_ID  0
> +#define M4U_LARB1_ID  1
> +#define M4U_LARB2_ID  2
> +#define M4U_LARB3_ID  3
> +#define M4U_LARB4_ID  4
> +#define M4U_LARB5_ID  5
> +#define M4U_LARB6_ID  6
> +#define M4U_LARB7_ID  7
> +#define M4U_LARB8_ID  8
> +#define M4U_LARB9_ID  9
> +#define M4U_LARB10_ID 10
> +#define M4U_LARB11_ID 11
> +#define M4U_LARB12_ID 12
> +#define M4U_LARB13_ID 13
> +
> +/* larb0 */
> +#define M4U_PORT_DISP_POSTMASK0   MTK_M4U_ID(M4U_LARB0_ID, 0)
> +#define M4U_PORT_DISP_OVL0_HDRMTK_M4U_ID(M4U_LARB0_ID, 1)
> +#define M4U_PORT_DISP_OVL1_HDRMTK_M4U_ID(M4U_LARB0_ID, 2)
> +#define M4U_PORT_DISP_OVL0MTK_M4U_ID(M4U_LARB0_ID, 3)
> +#define M4U_PORT_DISP_OVL1MTK_M4U_ID(M4U_LARB0_ID, 4)
> +#define M4U_PORT_DISP_PVRIC0  MTK_M4U_ID(M4U_LARB0_ID, 5)
> +#define M4U_PORT_DISP_RDMA0   MTK_M4U_ID(M4U_LARB0_ID, 6)
> +#define M4U_PORT_DISP_WDMA0   MTK_M4U_ID(M4U_LARB0_ID, 7)
> +#define M4U_PORT_DISP_FAKE0   MTK_M4U_ID(M4U_LARB0_ID, 8)
> +
> +/* larb1 */
> +#define M4U_PORT_DISP_OVL0_2L_HDR MTK_M4U_ID(M4U_LARB1_ID, 0)
> +#define M4U_PORT_DISP_OVL1_2L_HDR MTK_M4U_ID(M4U_LARB1_ID, 1)
> +#define M4U_PORT_DISP_OVL0_2L MTK_M4U_ID(M4U_LARB1_ID, 2)
>