RE: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Tian, Kevin
> From: David Gibson 
> Sent: Tuesday, June 8, 2021 8:50 AM
> 
> On Thu, Jun 03, 2021 at 06:49:20AM +, Tian, Kevin wrote:
> > > From: David Gibson
> > > Sent: Thursday, June 3, 2021 1:09 PM
> > [...]
> > > > > In this way the SW mode is the same as a HW mode with an infinite
> > > > > cache.
> > > > >
> > > > > The collapsed shadow page table is really just a cache.
> > > > >
> > > >
> > > > OK. One additional thing is that we may need a "caching_mode"
> > > > thing reported by /dev/ioasid, indicating whether invalidation is
> > > > required when changing non-present to present. For hardware
> > > > nesting it's not reported as the hardware IOMMU will walk the
> > > > guest page table in cases of iotlb miss. For software nesting
> > > > caching_mode is reported so the user must issue invalidation
> > > > upon any change in guest page table so the kernel can update
> > > > the shadow page table timely.
> > >
> > > For the first cut, I'd have the API assume that invalidates are
> > > *always* required.  Some bypass to avoid them in cases where they're
> > > not needed can be an additional extension.
> > >
> >
> > Isn't the typical TLB semantic that non-present entries are not
> > cached, thus invalidation is not required when making a non-present
> > entry present?
> 
> Usually, but not necessarily.
> 
> > It's true of both the CPU TLB and the IOMMU TLB.
> 
> I don't think it's entirely true of the CPU TLB on all ppc MMU models
> (of which there are far too many).
> 
> > In reality
> > I feel there are more usages built on hardware nesting than on software
> > nesting, thus making the default follow hardware TLB behavior makes
> > more sense...
> 
> I'm arguing for always-require-invalidate because it's strictly more
> general.  Requiring the invalidate will support models that don't
> require it in all cases; we just make the invalidate a no-op.  The
> reverse is not true, so we should tackle the general case first, then
> optimize.
> 

It makes sense. Will adopt this approach.
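
For concreteness, a userspace sketch of the always-invalidate flow being
agreed here. Since /dev/ioasid is only an RFC, the ioctl name and struct
below (IOASID_INVALIDATE_CACHE, struct ioasid_inv_info) are hypothetical
placeholders rather than settled uAPI; the point is only that userspace
invalidates after every guest page-table change, and a hardware-nesting
kernel is free to treat the call as a no-op.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Hypothetical invalidation descriptor and ioctl number. */
struct ioasid_inv_info {
	uint32_t ioasid;
	uint64_t iova;
	uint64_t size;
};
#define IOASID_INVALIDATE_CACHE _IOW('i', 0x40, struct ioasid_inv_info)

/* Call after *any* guest IO page-table change, including
 * non-present -> present.  With hardware nesting the kernel can make
 * this a no-op; with software nesting it resynchronizes the shadow
 * page table. */
static int ioasid_sync_mapping(int ioasid_fd, uint32_t ioasid,
			       uint64_t iova, uint64_t size)
{
	struct ioasid_inv_info info = {
		.ioasid = ioasid, .iova = iova, .size = size,
	};

	return ioctl(ioasid_fd, IOASID_INVALIDATE_CACHE, &info);
}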


RE: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Wednesday, June 9, 2021 2:47 AM
> 
> On Tue, 8 Jun 2021 15:44:26 +0200
> Paolo Bonzini  wrote:
> 
> > On 08/06/21 15:15, Jason Gunthorpe wrote:
> > > On Tue, Jun 08, 2021 at 09:56:09AM +0200, Paolo Bonzini wrote:
> > >
> > >>>> Alternatively you can add a KVM_DEV_IOASID_{ADD,DEL} pair of ioctls.
> > >>>> But it seems useless complication compared to just using what we have
> > >>>> now, at least while VMs only use IOASIDs via VFIO.
> > >>>
> > >>> The simplest is KVM_ENABLE_WBINVD() and be done
> > >>> with it.
> 
> Even if we were to relax wbinvd access to any device (capable of
> no-snoop or not) in any IOMMU configuration (blocking no-snoop or not),
> I think as soon as we say "proof" is required to gain this access then
> that proof should be ongoing for the life of the access.
> 
> That alone makes this more than a "I want this feature, here's my
> proof", one-shot ioctl.  Like the groupfd enabling a path for KVM to
> ask if that group is non-coherent and holding a group reference to
> prevent the group from being used to authorize multiple KVM instances,
> the ioasidfd proof would need to minimally validate that devices are
> present and provide some reference to enforce that model as ongoing, or
> notifier to indicate an end of that authorization.  I don't think we
> can simplify that out of the equation or we've essentially invalidated
> that the proof is really required.
> 
> > >>
> > >> The simplest one is KVM_DEV_VFIO_GROUP_ADD/DEL, that already exists
> > >> and also covers hot-unplug.  The second simplest one is
> > >> KVM_DEV_IOASID_ADD/DEL.
> > >
> > > This isn't the same thing, this is back to trying to have the kernel
> > > set policy for userspace.
> >
> > If you want a userspace policy then there would be three states:
> >
> > * WBINVD enabled because a WBINVD-enabled VFIO device is attached.
> >
> > * WBINVD potentially enabled but no WBINVD-enabled VFIO device attached
> >
> > * WBINVD forcefully disabled
> >
> > KVM_DEV_VFIO_GROUP_ADD/DEL can still be used to distinguish the first
> > two.  Due to backwards compatibility, those two describe the default
> > behavior; disabling wbinvd can be done easily with a new sub-ioctl of
> > KVM_ENABLE_CAP and doesn't require any security proof.
> 
> That seems like a good model, use the kvm-vfio device for the default
> behavior and extend an existing KVM ioctl if QEMU still needs a way to
> tell KVM to assume all DMA is coherent, regardless of what the kvm-vfio
> device reports.
> 
> It feels like we should be able to support a backwards compatibility
> mode using the vfio group, but I expect long term we'll want to
> transition the kvm-vfio device from a groupfd to an ioasidfd.
> 
> > The meaning of WBINVD-enabled is "won't return -ENXIO for the wbinvd
> > ioctl", nothing more nothing less.  If all VFIO devices are going to be
> > WBINVD-enabled, then that will reflect on KVM as well, and I won't have
> > anything to object if there's consensus on the device assignment side of
> > things that the wbinvd ioctl won't ever fail.
> 
> If we create the IOMMU vs device coherency matrix:
> 
>                  \ Device supports
>   IOMMU blocks    \    no-snoop
>     no-snoop       \  yes | no  |
>   ------------------+-----+-----+
>          yes        |  1  |  2  |
>   ------------------+-----+-----+
>          no         |  3  |  4  |
>   ------------------+-----+-----+
> 
> DMA is always coherent in boxes {1,2,4} (wbinvd emulation is not
> needed).  VFIO will currently always configure the IOMMU for {1,2} when
> the feature is supported.  Boxes {3,4} are where we'll currently
> emulate wbinvd.  The best we could do, without knowing the guest or having
> insight into the guest driver, would be to only emulate wbinvd for {3}.
> 
> The majority of devices appear to report no-snoop support {1,3}, but
> the claim is that it's mostly unused outside of GPUs, effectively {2,4}.
> I'll speculate that most IOMMUs support enforcing coherency (amd, arm,
> fsl unconditionally, intel conditionally) {1,2}.
> 
> I think that means we're currently operating primarily in Box {1},
> which does not seem to lean towards unconditional wbinvd access with
> device ownership.
> 
> I think we have a desire with IOASID to allow user policy to operate
> certain devices in {3} and I think the argument there is that a
> specific software enforced coherence sync is more efficient on the bus
> than the constant coherence enforcement by the IOMMU.
> 
> I think that the disable mode Jason proposed is essentially just a way
> to move a device from {3} to {4}, ie. the IOASID support or
> configuration does not block no-snoop and the device claims to support
> no-snoop, but doesn't use it.  How we'd determine this to be true for
> a device, without a crystal ball of driver development or hardware
> errata indicating that no-snoop transactions are not possible regardless
> of the behavior of the enable bit, I'm not sure.  If we're operating a
> device in {3}, but the device does not generate no-snoop transactions,
> then presumably the guest driver isn't generating wbinvd instructions
> for us to emulate, so where are the wbinvd instructions that this
> feature would prevent being emulated coming from?  Thanks,
> 
> Alex

Re: [PATCH v2 1/1] iommu: Clear a lot of spelling mistakes

2021-06-08 Thread Leizhen (ThunderTown)



On 2021/6/8 17:38, Will Deacon wrote:
> On Mon, May 10, 2021 at 11:54:25AM +0800, Zhen Lei wrote:
>> All spelling mistakes are in the comments, no functional change.
> 
> [...]
> 
>> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
>> index e06b8a0e2b56bdd..b9e4be35c8dcad2 100644
>> --- a/drivers/iommu/mtk_iommu.c
>> +++ b/drivers/iommu/mtk_iommu.c
>> @@ -161,7 +161,7 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data);
>>   * The Region 'A'(I/O) can NOT be mapped by M4U; For Region 'B'/'C'/'D', the
>>   * bit32 of the CPU physical address always is needed to set, and for Region
>>   * 'E', the CPU physical address keep as is.
>> - * Additionally, The iommu consumers always use the CPU phyiscal address.
>> + * Additionally, The iommu consumers always use the CPU physical address.
> 
> If we're fixing typos, then I think the 'T' in "The iommu" should be
> lowercase here.

OK, I will change it.

In addition, the current linux-next has several new spelling errors.

> 
> Anyway, all the other fixes look correct to me. I'll leave it up to Joerg as
> to whether he wants to apply this or not, given the impact on git blame and
> the potential for conflicts.

Currently, there are no conflicts in linux-next.

> 
> Will
> 
> 



Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 04:04:26PM +1000, David Gibson wrote:

> > What I would like is that the /dev/iommu side managing the IOASID
> > doesn't really care much, but the device driver has to tell
> > drivers/iommu what it is going to do when it attaches.
> 
> By the device driver, do you mean the userspace or guest device
> driver?  Or do you mean the vfio_pci or mdev "shim" device driver?

I mean vfio_pci, mdev "shim", vdpa, etc. Some kernel driver that is
allowing qemu access to a HW resource.

> > It makes sense, in PCI terms, only the driver knows what TLPs the
> > device will generate. The IOMMU needs to know what TLPs it will
> > receive to configure properly.
> > 
> > PASID or not is major device specific variation, as is the ENQCMD/etc
> > 
> > Having the device be explicit when it tells the IOMMU what it is going
> > to be sending is a major plus to me. I actually don't want to see this
> > part of the interface be made less strong.
> 
> Ok, if I'm understanding this right a PASID capable IOMMU will be able
> to process *both* transactions with just a RID and transactions with a
> RID+PASID.

Yes

> So if we're thinking of this notional 84ish-bit address space, then
> that includes "no PASID" as well as all the possible PASID values.
> Yes?  Or am I confused?

Right, though I expect how to model 'no pasid' vs all the pasids is
some micro-detail someone would need to work out for a real vIOMMU
implementation to decide..
 
> > /dev/iommu is concerned with setting up the IOAS and filling the IO
> > page tables with information
> > 
> > The driver behind "struct vfio_device" is responsible to "route" its
> > HW into that IOAS.
> > 
> > They are two halves of the problem: one is only the IO page table, and the
> > other is the connection of a PCI TLP to a specific IO page table.
> > 
> > Only the driver knows what format of TLPs the device will generate so
> > only the driver can specify the "route"
> 
> Ok.  I'd really like if we can encode this in a way that doesn't build
> PCI-specific structure into the API, though.

I think we should at least have bus-specific "convenience" APIs for
the popular cases. It is complicated enough already; trying to force
people to figure out the kernel synonym for a PCI standard name gets
pretty rough... Plus the RID is inherently a hardware-specific
concept.

> > Inability to match the RID is rare, certainly I would expect any IOMMU
> > HW that can do PCIe PASID matching can also do RID matching.
> 
> It's not just up to the IOMMU.  The obvious case is a PCIe-to-PCI
> bridge.  

Yes.. but PCI is *really* old at this point. Even PCI-X sustains the
originating RID.

The general case here is that each device can route to its own
IOAS. The specialty case is that only one IOAS in a group can be
used. We should make room in the API for the special case without
destroying the general case.

> > Oh, I hadn't spent time thinking about any of those.. It is messy but
> > it can still be forced to work, I guess. A device centric model means
> > all the devices using the same routing ID have to be connected to the
> > same IOASID by userspace. So some of the connections will be NOPs.
> 
> See, that's exactly what I thought the group checks were enforcing.
> I'm really hoping we don't need two levels of granularity here: groups
> of devices that can't be identified from each other, and then groups
> of those that can't be isolated from each other.  That introduces a
> huge amount of extra conceptual complexity.

We've got this far with groups that mean all those things together, I
wouldn't propose to do a bunch of kernel work to change that
significantly.

I just want to have a device centric uAPI so we are not trapped
forever in groups being 1:1 with an IOASID model, which is clearly not
accurately modeling what today's systems are actually able to do,
especially with PASID.

We can report some fixed info to userspace ('all these devices share
one ioasid') and leave it for now/ever

Jason


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 10:53:02AM +1000, David Gibson wrote:
> On Thu, Jun 03, 2021 at 08:52:24AM -0300, Jason Gunthorpe wrote:
> > On Thu, Jun 03, 2021 at 03:13:44PM +1000, David Gibson wrote:
> > 
> > > > We can still consider it a single "address space" from the IOMMU
> > > > perspective. What has happened is that the address table is not just a
> > > > 64 bit IOVA, but an extended ~80 bit IOVA formed by "PASID, IOVA".
> > > 
> > > True.  This does complicate how we represent what IOVA ranges are
> > > valid, though.  I'll bet you most implementations don't actually
> > > implement a full 64-bit IOVA, which means we effectively have a large
> > > number of windows from (0..max IOVA) for each valid pasid.  This adds
> > > another reason I don't think my concept of IOVA windows is just a
> > > power specific thing.
> > 
> > Yes
> > 
> > Things rapidly get into weird hardware specific stuff though, the
> > request will be for things like:
> >   "ARM PASID page table format from SMMU IP block vXX"
> 
> So, I'm happy enough for picking a user-managed pagetable format to
> imply the set of valid IOVA ranges (though a query might be nice).

I think a query is mandatory, and optionally asking for ranges seems
generally useful as a HW property.

The danger is things can get really tricky as the app can ask for
ranges some HW needs but other HW can't provide. 

I would encourage a flow where "generic" apps like DPDK can somehow
just ignore this, or at least be very, very simplified: "I want around
XX GB of IOVA space".

DPDK-type apps vs qemu apps are really quite different and we should
be careful that the needs of HW-accelerated vIOMMU emulation do not
trump the needs of simple universal control over a DMA map.

Jason


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 12:47:00PM -0600, Alex Williamson wrote:
> On Tue, 8 Jun 2021 15:44:26 +0200
> Paolo Bonzini  wrote:
> 
> > On 08/06/21 15:15, Jason Gunthorpe wrote:
> > > On Tue, Jun 08, 2021 at 09:56:09AM +0200, Paolo Bonzini wrote:
> > >   
> > >>>> Alternatively you can add a KVM_DEV_IOASID_{ADD,DEL} pair of ioctls.
> > >>>> But it seems useless complication compared to just using what we have
> > >>>> now, at least while VMs only use IOASIDs via VFIO.
> > >>>
> > >>> The simplest is KVM_ENABLE_WBINVD() and be done
> > >>> with it.
> 
> Even if we were to relax wbinvd access to any device (capable of
> no-snoop or not) in any IOMMU configuration (blocking no-snoop or not),
> I think as soon as we say "proof" is required to gain this access then
> that proof should be ongoing for the life of the access.

This idea is not entirely consistent with the usual Unix access
control model..

E.g. I can do open() on a file and I get to keep that FD. I get to keep
that FD even if someone later does chmod() on that file so that I can't
open it again.

There are lots of examples where a one-time access control check
provides continuing access to a resource. I feel the ongoing proof is
the rare case in Unix.. 'revoke' is an uncommon concept in Unix..
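
A minimal, self-contained C illustration of that one-time-check behavior
(demo.txt is an arbitrary scratch file):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int fd = open("demo.txt", O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return 1;

	/* Drop all permissions: a second open() now fails... */
	chmod("demo.txt", 0000);
	if (open("demo.txt", O_RDWR) < 0)
		perror("re-open");	/* EACCES (as non-root) */

	/* ...but the already-open FD keeps working: the access control
	 * check happened once, at open() time. */
	if (write(fd, "still writable\n", 15) == 15)
		printf("existing FD unaffected by chmod\n");

	close(fd);
	return 0;
}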

That said, I don't feel strongly either way, would just like to see
something implementable. Even the various options to change the
feature are more thought explorations to try to understand how to
model the feature than any requirements I am aware of.

> notifier to indicate an end of that authorization.  I don't think we
> can simplify that out of the equation or we've essentially invalidated
> that the proof is really required.

The proof is like the chmod in the above open() example. Once kvm is
authorized it keeps working even if a new authorization could not be
obtained. It is not very different from chmod'ing a file after
something opened it.

Inability to revoke doesn't invalidate the value of the initial
one-time access control check.

Generally agree on the rest of your message

Regards,
Jason


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Alex Williamson
On Tue, 8 Jun 2021 15:44:26 +0200
Paolo Bonzini  wrote:

> On 08/06/21 15:15, Jason Gunthorpe wrote:
> > On Tue, Jun 08, 2021 at 09:56:09AM +0200, Paolo Bonzini wrote:
> >   
> >>>> Alternatively you can add a KVM_DEV_IOASID_{ADD,DEL} pair of ioctls.
> >>>> But it seems useless complication compared to just using what we have
> >>>> now, at least while VMs only use IOASIDs via VFIO.
> >>>
> >>> The simplest is KVM_ENABLE_WBINVD() and be done
> >>> with it.

Even if we were to relax wbinvd access to any device (capable of
no-snoop or not) in any IOMMU configuration (blocking no-snoop or not),
I think as soon as we say "proof" is required to gain this access then
that proof should be ongoing for the life of the access.

That alone makes this more than a "I want this feature, here's my
proof", one-shot ioctl.  Like the groupfd enabling a path for KVM to
ask if that group is non-coherent and holding a group reference to
prevent the group from being used to authorize multiple KVM instances,
the ioasidfd proof would need to minimally validate that devices are
present and provide some reference to enforce that model as ongoing, or
notifier to indicate an end of that authorization.  I don't think we
can simplify that out of the equation or we've essentially invalidated
that the proof is really required.

> >>
> >> The simplest one is KVM_DEV_VFIO_GROUP_ADD/DEL, that already exists and
> >> also covers hot-unplug.  The second simplest one is KVM_DEV_IOASID_ADD/DEL.
> > 
> > This isn't the same thing, this is back to trying to have the kernel
> > set policy for userspace.  
> 
> If you want a userspace policy then there would be three states:
> 
> * WBINVD enabled because a WBINVD-enabled VFIO device is attached.
> 
> * WBINVD potentially enabled but no WBINVD-enabled VFIO device attached
> 
> * WBINVD forcefully disabled
> 
> KVM_DEV_VFIO_GROUP_ADD/DEL can still be used to distinguish the first 
> two.  Due to backwards compatibility, those two describe the default 
> behavior; disabling wbinvd can be done easily with a new sub-ioctl of 
> KVM_ENABLE_CAP and doesn't require any security proof.

That seems like a good model, use the kvm-vfio device for the default
behavior and extend an existing KVM ioctl if QEMU still needs a way to
tell KVM to assume all DMA is coherent, regardless of what the kvm-vfio
device reports.

It feels like we should be able to support a backwards compatibility
mode using the vfio group, but I expect long term we'll want to
transition the kvm-vfio device from a groupfd to an ioasidfd.

> The meaning of WBINVD-enabled is "won't return -ENXIO for the wbinvd 
> ioctl", nothing more nothing less.  If all VFIO devices are going to be 
> WBINVD-enabled, then that will reflect on KVM as well, and I won't have 
> anything to object if there's consensus on the device assignment side of 
> things that the wbinvd ioctl won't ever fail.

If we create the IOMMU vs device coherency matrix:

                 \ Device supports
  IOMMU blocks    \    no-snoop
    no-snoop       \  yes | no  |
  ------------------+-----+-----+
         yes        |  1  |  2  |
  ------------------+-----+-----+
         no         |  3  |  4  |
  ------------------+-----+-----+

DMA is always coherent in boxes {1,2,4} (wbinvd emulation is not
needed).  VFIO will currently always configure the IOMMU for {1,2} when
the feature is supported.  Boxes {3,4} are where we'll currently
emulate wbinvd.  The best we could do, without knowing the guest or having
insight into the guest driver, would be to only emulate wbinvd for {3}.

The majority of devices appear to report no-snoop support {1,3}, but
the claim is that it's mostly unused outside of GPUs, effectively {2,4}.
I'll speculate that most IOMMUs support enforcing coherency (amd, arm,
fsl unconditionally, intel conditionally) {1,2}.

I think that means we're currently operating primarily in Box {1},
which does not seem to lean towards unconditional wbinvd access with
device ownership.

I think we have a desire with IOASID to allow user policy to operate
certain devices in {3} and I think the argument there is that a
specific software enforced coherence sync is more efficient on the bus
than the constant coherence enforcement by the IOMMU.

I think that the disable mode Jason proposed is essentially just a way
to move a device from {3} to {4}, ie. the IOASID support or
configuration does not block no-snoop and the device claims to support
no-snoop, but doesn't use it.  How we'd determine this to be true for
a device, without a crystal ball of driver development or hardware
errata indicating that no-snoop transactions are not possible regardless
of the behavior of the enable bit, I'm not sure.  If we're operating a device
in {3}, but the device does not generate no-snoop transactions, then
presumably the guest driver isn't generating wbinvd instructions for us
to emulate, so where are the wbinvd instructions that this feature
would prevent being emulated coming from?  Thanks,

Alex
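
As a hedged sketch, moving a device from {3} toward {4} could look like
the following at the PCIe level: clearing the "Enable No Snoop" bit in
the Device Control register forces a well-behaved device to emit only
snooped (coherent) TLPs even though it advertises no-snoop capability.
The helper name is invented for illustration, and whether a given device
honours the bit for all traffic is exactly the crystal-ball problem
described above.

#include <linux/pci.h>

static int force_coherent_tlps(struct pci_dev *pdev)
{
	if (!pci_is_pcie(pdev))
		return -ENODEV;

	/* Clear Enable No Snoop in the Device Control register. */
	return pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
					  PCI_EXP_DEVCTL_NOSNOOP_EN);
}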


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 10:44:31AM +1000, David Gibson wrote:

> When you say "not using a drivers/iommu IOMMU interface" do you
> basically mean the device doesn't do DMA? 

No, I mean the device doesn't use iommu_map() to manage the DMA
mappings.

vfio_iommu_type1 has a special code path that mdev triggers that
doesn't allocate an IOMMU domain and doesn't call iommu_map() or
anything related to that.

Instead an mdev driver calls vfio_pin_pages() which "reads" a fake page
table and returns back the CPU pages for the mdev to DMA map however
it likes.
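
A hedged sketch of that mdev path, written against the vfio_pin_pages()
signature as of v5.13; mdev_map_one_page() and the "fake page table"
walk that produced user_pfn are illustrative inventions, not code from
any particular driver:

#include <linux/iommu.h>
#include <linux/mdev.h>
#include <linux/vfio.h>

static int mdev_map_one_page(struct mdev_device *mdev,
			     unsigned long user_pfn)
{
	unsigned long phys_pfn;
	int ret;

	/* Pin the guest page and get the host PFN backing it... */
	ret = vfio_pin_pages(mdev_dev(mdev), &user_pfn, 1,
			     IOMMU_READ | IOMMU_WRITE, &phys_pfn);
	if (ret != 1)
		return ret < 0 ? ret : -EFAULT;

	/* ...then the mdev driver programs its own DMA engine with
	 * phys_pfn, never going through iommu_map(). */
	return 0;
}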

> Now, we could represent those different sorts of isolation separately,
> but at the time our thinking was that we should group together devices
> that can't be safely isolated for *any* reason, since the practical
> upshot is the same: you can't safely split those devices between
> different owners.

It is fine, but the direction is going the other way: devices have
perfect isolation and rely on special interactions with the iommu to
get it.
 
> > What I don't like is forcing certain things depending on how the
> > vfio_device was created - for instance forcing a IOMMU group as part
> > and forcing an ugly "SW IOMMU" mode in the container only as part of
> > mdev_device.
> 
> I don't really see how this is depending on how the device is
> created.

static int vfio_iommu_type1_attach_group(void *iommu_data,
					 struct iommu_group *iommu_group)
{
	if (vfio_bus_is_mdev(bus)) {

What the iommu code does depends on how the device was created. This
is really ugly.

This is happening because the three objects in the model:
driver/group/domain are not being linked together in a way that
reflects the modern world.

The group has no idea what the driver wants but is in charge of
creating the domain on behalf of the device.

And so people have created complicated hackery to pass
information from the driver to the group, through the device, so that
the group can create the right domain.

I want to see the driver simply create the right domain directly. It
is much simpler and scales to more domain complexity.

Jason


Re: [PATCHv4 0/2] iommu/arm-smmu-qcom: Add SC7280 support

2021-06-08 Thread Will Deacon
On Tue, 8 Jun 2021 18:00:05 +0530, Sai Prakash Ranjan wrote:
> Patch 1 adds the sc7280 smmu compatible.
> Patch 2 moves the adreno smmu check before apss smmu to enable
> adreno smmu specific implementation.
> 
> Note that dt-binding for sc7280 is already merged.
> 
> Changes in v4:
>  * Rebased on top of arm-smmu/updates with acpi changes.
> 
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/2] iommu/arm-smmu-qcom: Add SC7280 SMMU compatible
  https://git.kernel.org/will/c/0b779f562b14
[2/2] iommu/arm-smmu-qcom: Move the adreno smmu specific impl
  https://git.kernel.org/will/c/ab9a77a141cc

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-06-08 Thread Will Deacon
On Tue, 8 Jun 2021 22:15:59 +0530, Amey Narkhede wrote:
> If device registration fails, remove sysfs attribute
> and if setting bus callbacks fails, unregister the device
> and cleanup the sysfs attribute.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm: Cleanup resources in case of probe error path
  https://git.kernel.org/will/c/249c9dc6aa0d

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH v2 1/5] dt-bindings: reserved-memory: Document memory region specifier

2021-06-08 Thread Thierry Reding
On Fri, May 28, 2021 at 06:54:55PM +0200, Thierry Reding wrote:
> On Thu, May 20, 2021 at 05:03:06PM -0500, Rob Herring wrote:
> > On Fri, Apr 23, 2021 at 06:32:30PM +0200, Thierry Reding wrote:
> > > From: Thierry Reding 
> > > 
> > > Reserved memory region phandle references can be accompanied by a
> > > specifier that provides additional information about how that specific
> > > reference should be treated.
> > > 
> > > One use-case is to mark a memory region as needing an identity mapping
> > > in the system's IOMMU for the device that references the region. This is
> > > needed for example when the bootloader has set up hardware (such as a
> > > display controller) to actively access a memory region (e.g. a boot
> > > splash screen framebuffer) during boot. The operating system can use the
> > > identity mapping flag from the specifier to make sure an IOMMU identity
> > > mapping is set up for the framebuffer before IOMMU translations are
> > > enabled for the display controller.
> > > 
> > > Signed-off-by: Thierry Reding 
> > > ---
> > >  .../reserved-memory/reserved-memory.txt   | 21 +++++++++++++++++++++
> > >  include/dt-bindings/reserved-memory.h     |  8 ++++++++
> > >  2 files changed, 29 insertions(+)
> > >  create mode 100644 include/dt-bindings/reserved-memory.h
> > 
> > Sorry for being slow on this. I have 2 concerns.
> > 
> > First, this creates an ABI issue. A DT with cells in 'memory-region' 
> > will not be understood by an existing OS. I'm less concerned about this 
> > if we address that with a stable fix. (Though I'm pretty sure we've 
> > naively added #?-cells in the past ignoring this issue.)
> 
> A while ago I had proposed adding memory-region*s* as an alternative
> name for memory-region to make the naming more consistent with other
> types of properties (think clocks, resets, gpios, ...). If we added
> that, we could easily differentiate between the "legacy" cases where
> no #memory-region-cells was allowed and the new cases where it was.
> 
> > Second, it could be the bootloader setting up the reserved region. If a 
> > node already has 'memory-region', then adding more regions is more 
> > complicated compared to adding new properties. And defining what each 
> > memory-region entry is or how many in schemas is impossible.
> 
> It's true that updating the property gets a bit complicated, but it's
> not exactly rocket science. We really just need to splice the array. I
> have a working implementation for this in U-Boot.
> 
> For what it's worth, we could run into the same issue with any new
> property that we add. Even if we renamed this to iommu-memory-region,
> it's still possible that a bootloader may have to update this property
> if it already exists (it could be hard-coded in DT, or it could have
> been added by some earlier bootloader or firmware).
> 
> > Both could be addressed with a new property. Perhaps something like 
> > 'iommu-memory-region = <>;'. I think the 'iommu' prefix is 
> > appropriate given this is entirely because of the IOMMU being in the 
> > mix. I might feel differently if we had other uses for cells, but I 
> > don't really see it in this case. 
> 
> I'm afraid that down the road we'll end up with other cases and then we
> might proliferate a number of *-memory-region properties with varying
> prefixes.
> 
> I am aware of one other case where we might need something like this: on
> some Tegra SoCs we have audio processors that will access memory buffers
> using a DMA engine. These processors are booted by early firmware,
> using firmware images from system memory. In order to avoid trashing the
> firmware, we need to reserve memory. We can do this using reserved
> memory nodes. However, the audio DMA engine also uses the SMMU, so we
> need to make sure that the firmware memory is marked as reserved within
> the SMMU. This is similar to the identity mapping case, but not exactly
> the same. Instead of creating a 1:1 mapping, we just want that IOVA
> region to be reserved (i.e. IOMMU_RESV_RESERVED instead of
> IOMMU_RESV_DIRECT{,_RELAXABLE}).
> 
> That would also fall into the IOMMU domain, but we can't reuse the
> iommu-memory-region property for that because then we don't have enough
> information to decide which type of reservation we need.
> 
> We could obviously make iommu-memory-region take a specifier, but we
> could just as well use memory-regions in that case since we have
> something more generic anyway.
> 
> With the #memory-region-cells proposal, we can easily extend the cell in
> the specifier with an additional MEMORY_REGION_IOMMU_RESERVE flag to
> take that other use case into account. If we then also change to the new
> memory-regions property name, we avoid the ABI issue (and we gain a bit
> of consistency while at it).

Ping? Rob, do you want me to add this second use-case to the patch
series to make it more obvious that this isn't just a one-off thing? Or
how do we proceed?

Thierry



Re: [GIT PULL] memory: Tegra memory controller for v5.14

2021-06-08 Thread Will Deacon
On Tue, Jun 08, 2021 at 04:38:48PM +0200, Thierry Reding wrote:
> On Tue, Jun 08, 2021 at 01:01:29PM +0100, Will Deacon wrote:
> > On Mon, Jun 07, 2021 at 10:49:10AM +0200, Krzysztof Kozlowski wrote:
> > > This is the pull for you to base further SMMU patches (prevent early SMMU
> > > faults).
> > 
> > This is a tonne of code for me to pull into the SMMU tree given that I only
> > want one patch!
> > 
> > Thierry, if I just stick:
> > 
> > https://lore.kernel.org/r/20210603164632.1000458-4-thierry.red...@gmail.com
> > 
> > on its own branch, can you stitch together whatever you need?
> 
> I'm not sure I understand what you're proposing. For reference, here's
> the set of patches that I sent out:
> 
>   1. memory: tegra: Implement SID override programming
>   2. dt-bindings: arm-smmu: Add Tegra186 compatible string
>   3. iommu/arm-smmu: Implement ->probe_finalize()
>   4. iommu/arm-smmu: tegra: Detect number of instances at runtime
>   5. iommu/arm-smmu: tegra: Implement SID override programming
>   6. iommu/arm-smmu: Use Tegra implementation on Tegra186
>   7. arm64: tegra: Use correct compatible string for Tegra186 SMMU
>   8. arm64: tegra: Hook up memory controller to SMMU on Tegra186
>   9. arm64: tegra: Enable SMMU support on Tegra194
> 
> Krzysztof already picked up patch 1 and I was assuming that you'd pick
> up 2-6 because they are to the ARM SMMU driver. However, if you're
> primarily interested in just patch 3, which is more "core" ARM SMMU than
> the rest, which are Tegra-specific, then I suppose what we could do is
> for you to give an Acked-by on the rest (2, 4-6) and then Krzysztof or I
> can pick them up and take them via ARM SoC, based on the stable branch
> from your tree that only has patch 3.

I think you previously said that patch 5 depends on patch 1, so I can't
take 2-6 without also pulling in the memory controller queue.

> Patch 6 touches arm-smmu-impl.c, though it's a two-line change that
> touches only the Tegra-specific matching bit in arm_smmu_impl_init(), so
> the likelihood of that conflicting with anything else is fairly small.
> 
> Is that what you were proposing?

I can queue as much or as little of 2-6 as you like, but I would like to
avoid pulling in the memory controller queue into the arm smmu tree. But
yes, whichever of those I take, I can put them on a separate branch so
that you're not blocked for the later patches.

You have a better handle on the dependencies, so please tell me what works
for you. I just want to make sure that at least patch 3 lands in my tree,
so we don't get late conflicts with other driver changes.

Will


[PATCH] iommu/arm: Cleanup resources in case of probe error path

2021-06-08 Thread Amey Narkhede
If device registration fails, remove sysfs attribute
and if setting bus callbacks fails, unregister the device
and cleanup the sysfs attribute.

Signed-off-by: Amey Narkhede 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 ++++++++++++--
 drivers/iommu/arm/arm-smmu/arm-smmu.c       | 15 ++++++++++++---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c     | 13 +++++++++++--
 3 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 54b2f27b81d4..de2499754025 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3669,10 +3669,20 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	ret = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev);
 	if (ret) {
 		dev_err(dev, "Failed to register iommu\n");
-		return ret;
+		goto err_sysfs_remove;
 	}
 
-	return arm_smmu_set_bus_ops(&arm_smmu_ops);
+	ret = arm_smmu_set_bus_ops(&arm_smmu_ops);
+	if (ret)
+		goto err_unregister_device;
+
+	return 0;
+
+err_unregister_device:
+	iommu_device_unregister(&smmu->iommu);
+err_sysfs_remove:
+	iommu_device_sysfs_remove(&smmu->iommu);
+	return ret;
 }
 
 static int arm_smmu_device_remove(struct platform_device *pdev)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d208ca..88a3023676ce 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2164,7 +2164,7 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	err = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev);
 	if (err) {
 		dev_err(dev, "Failed to register iommu\n");
-		return err;
+		goto err_sysfs_remove;
 	}
 
 	platform_set_drvdata(pdev, smmu);
@@ -2187,10 +2187,19 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	 * any device which might need it, so we want the bus ops in place
 	 * ready to handle default domain setup as soon as any SMMU exists.
 	 */
-	if (!using_legacy_binding)
-		return arm_smmu_bus_init(&arm_smmu_ops);
+	if (!using_legacy_binding) {
+		err = arm_smmu_bus_init(&arm_smmu_ops);
+		if (err)
+			goto err_unregister_device;
+	}
 
 	return 0;
+
+err_unregister_device:
+	iommu_device_unregister(&smmu->iommu);
+err_sysfs_remove:
+	iommu_device_sysfs_remove(&smmu->iommu);
+	return err;
 }
 
 static int arm_smmu_device_remove(struct platform_device *pdev)
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 4294abe389b2..b785d9fb7602 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -850,10 +850,12 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
 	ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
 	if (ret) {
 		dev_err(dev, "Failed to register iommu\n");
-		return ret;
+		goto err_sysfs_remove;
 	}
 
-	bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
+	ret = bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
+	if (ret)
+		goto err_unregister_device;
 
 	if (qcom_iommu->local_base) {
 		pm_runtime_get_sync(dev);
@@ -862,6 +864,13 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
 	}
 
 	return 0;
+
+err_unregister_device:
+	iommu_device_unregister(&qcom_iommu->iommu);
+
+err_sysfs_remove:
+	iommu_device_sysfs_remove(&qcom_iommu->iommu);
+	return ret;
 }
 
 static int qcom_iommu_device_remove(struct platform_device *pdev)

 static int qcom_iommu_device_remove(struct platform_device *pdev)
--
2.31.1


Re: [PATCH] iommu/qcom: Cleanup resources in case of probe error path

2021-06-08 Thread Amey Narkhede
On 21/06/08 10:29AM, Will Deacon wrote:
> On Thu, Apr 22, 2021 at 03:40:30AM +0530, Amey Narkhede wrote:
> > If device registration fails, remove sysfs attribute
> > and if setting bus callbacks fails, unregister the device
> > and cleanup the sysfs attribute.
> >
> > Signed-off-by: Amey Narkhede 
> > ---
> >  drivers/iommu/arm/arm-smmu/qcom_iommu.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> > index 4294abe389b2..5fa128a1f7f0 100644
> > --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> > +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> > @@ -850,10 +850,12 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
> >  	ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
> >  	if (ret) {
> >  		dev_err(dev, "Failed to register iommu\n");
> > -		return ret;
> > +		goto err_sysfs_remove;
> >  	}
> >
> > -	bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
> > +	ret = bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
> > +	if (ret)
> > +		goto err_unregister_device;
> >
> >  	if (qcom_iommu->local_base) {
> >  		pm_runtime_get_sync(dev);
> > @@ -862,6 +864,14 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
> >  	}
> >
> >  	return 0;
> > +
> > +err_unregister_device:
> > +	iommu_device_unregister(&qcom_iommu->iommu);
> > +
> > +err_sysfs_remove:
> > +	iommu_device_sysfs_remove(&qcom_iommu->iommu);
> > +
> > +	return ret;
>
> It looks like we're also missing this logic in arm-smmu/arm-smmu.c and
> arm-smmu-v3/arm-smmu-v3.c. Would you be able to fix those up too, please?
>
> Will

Sure, I will send a v2 with this and new patches.

Thanks,
Amey


Re: [GIT PULL] memory: Tegra memory controller for v5.14

2021-06-08 Thread Thierry Reding
On Tue, Jun 08, 2021 at 01:01:29PM +0100, Will Deacon wrote:
> Hi Krzysztof, Thierry,
> 
> On Mon, Jun 07, 2021 at 10:49:10AM +0200, Krzysztof Kozlowski wrote:
> > Hi Thierry and Will,
> > 
> > This is the pull for you to base further SMMU patches (prevent early SMMU
> > faults).
> 
> This is a tonne of code for me to pull into the SMMU tree given that I only
> want one patch!
> 
> Thierry, if I just stick:
> 
> https://lore.kernel.org/r/20210603164632.1000458-4-thierry.red...@gmail.com
> 
> on its own branch, can you stitch together whatever you need?

I'm not sure I understand what you're proposing. For reference, here's
the set of patches that I sent out:

  1. memory: tegra: Implement SID override programming
  2. dt-bindings: arm-smmu: Add Tegra186 compatible string
  3. iommu/arm-smmu: Implement ->probe_finalize()
  4. iommu/arm-smmu: tegra: Detect number of instances at runtime
  5. iommu/arm-smmu: tegra: Implement SID override programming
  6. iommu/arm-smmu: Use Tegra implementation on Tegra186
  7. arm64: tegra: Use correct compatible string for Tegra186 SMMU
  8. arm64: tegra: Hook up memory controller to SMMU on Tegra186
  9. arm64: tegra: Enable SMMU support on Tegra194

Krzysztof already picked up patch 1 and I was assuming that you'd pick
up 2-6 because they are to the ARM SMMU driver. However, if you're
primarily interested in just patch 3, which is more "core" ARM SMMU than
the rest, which are Tegra-specific, then I suppose what we could do is
for you to give an Acked-by on the rest (2, 4-6) and then Krzysztof or I
can pick them up and take them via ARM SoC, based on the stable branch
from your tree that only has patch 3.

Patch 6 touches arm-smmu-impl.c, though it's a two-line change that
touches only the Tegra-specific matching bit in arm_smmu_impl_init(), so
the likelihood of that conflicting with anything else is fairly small.

Is that what you were proposing?

Thierry



Re: [Linuxarm] Re: [PATCH 0/4] Free cached iovas when rmmod the driver of the last device in group

2021-06-08 Thread Robin Murphy

On 2021-06-08 08:50, chenxiang (M) wrote:

Hi Robin,


On 2021/6/7 19:23, Robin Murphy wrote:

On 2021-06-07 03:42, chenxiang wrote:

From: Xiang Chen 

When rmmod'ing the driver of the last device in the group, cached iovas
are no longer used, and it is better to free them to save memory. Also
export the functions free_rcache_cached_iovas() and iommu_domain_to_iova().


How common is it to use a device a significant amount, then unbind its 
driver and not use it for long enough to care about? Off-hand I can't 
think of a particularly realistic use-case which looks like that - the 
closest I can imagine is unbinding a default kernel driver to replace 
it with VFIO, but I would expect the set of devices intended for 
assignment to be distinct from the set of devices used by the host 
itself, and thus the host driver wouldn't have actually done much to 
generate a lot of DMA mappings in that initial period. Is my 
expectation there wrong?


It is possible that a user uses the driver for a while and then rmmods it
(for example, the SAS card is the driver we always use, but sometimes we
need to compare its performance with a raid card, so we insmod the raid
driver and rmmod it after the test). In this situation, the rcache iovas
of the raid card remain even after the driver is rmmod'ed (users usually
rmmod the driver rather than removing the device, which would release the
group and all its resources).


OK, so my question is how many end users are doing that on production 
systems with distro kernels, such that this feature deserves being 
supported and maintained in mainline?
If there is such a case, how much memory does this actually save in 
practice? The theoretical worst-case should be roughly 4 * 128 * 6 * 
sizeof(struct iova) bytes per CPU, which is around 192KB assuming 
64-byte kmem cache alignment. 


There are 128 CPUs on our ARM64 kunpeng920 board, and I added a debugfs
interface, drop_rcache, for every group in local code, which calls
free_rcache_cached_iovas(); we can see that it saves 1~2M of memory
after freeing the cached iovas.


estuary:/$ free
              total       used        free     shared    buffers     cached
Mem:      526776720    1336216   525440504     177664          0     177664
-/+ buffers/cache:      1158552  525618168
Swap:             0          0           0
estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ echo 1 > drop_rcache
estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ free
              total       used        free     shared    buffers     cached
Mem:      526776720    1334672   525442048     177664          0     177664
-/+ buffers/cache:      1157008  525619712
Swap:             0          0           0
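
For reference, a sketch of the kind of local debugfs hook being described
here, assuming the interfaces proposed by this series;
free_rcache_cached_iovas() and iommu_domain_to_iova() are not in mainline,
and their exact signatures are assumptions:

#include <linux/debugfs.h>
#include <linux/iommu.h>
#include <linux/iova.h>

static int drop_rcache_set(void *data, u64 val)
{
	struct iommu_domain *domain = data;

	if (val != 1)
		return -EINVAL;

	/* Purge the per-CPU and depot rcaches for this domain. */
	free_rcache_cached_iovas(iommu_domain_to_iova(domain));
	return 0;
}
DEFINE_DEBUGFS_ATTRIBUTE(drop_rcache_fops, NULL, drop_rcache_set, "%llu\n");

/* registered with something like:
 *	debugfs_create_file("drop_rcache", 0200, group_dir, domain,
 *			    &drop_rcache_fops);
 */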


Erm, isn't 12KB from 4GB (per CPU) basically just noise? :/

Sure, it might start to look a bit more significant on a quad-core TV 
box with 1GB of RAM total, but then something like that isn't ever going 
to be driving a big multi-queue SAS adapter to churn that many DMA 
mappings in the first place.


Consider that a "memory saving" feature which only has some minimal 
benefit for niche use-cases on very large systems, but costs a constant 
amount of memory in terms of code and symbol bloat in the kernel image, 
even on actually-memory-constrained systems, is rather counterproductive 
overall. As I said before, I do like the general idea *if* there's a 
significant saving to be had from it, but what you're showing us here 
suggests that there isn't :(


Note also that on arm64 systems, commit 65688d2a05de ("arm64: cache: 
Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)") queued in linux-next is 
already likely to cut everything in half here.


The other reason I want to free rcache iovas in a suitable place is the
IOVA longterm issue[0] (which is unrecoverable): if we free cached iovas
when rmmod'ing the driver, or add a debugfs hook to do that, I think we
can recover from it by rmmod'ing and re-insmod'ing the driver, or by
calling the debugfs drop_rcache, though that doesn't solve the underlying
issue.


Or perhaps we could, y'know, actually solve that problem, rather than 
invent hacks to bodge around it. I've lost track of where we got to with 
that thread (sorry!), but I think it still needs someone to look into 
the relative timing of all the flush queue interactions if we're to 
really understand what's going on, and if a simple obvious fix doesn't 
fall out of that then it probably makes sense to convert the rcache 
depot to a garbage-collected list (per the original concept).


Much as I think that purging the rcaches automatically to paper over 
that issue isn't a great idea, making the user do it manually is 
certainly even worse.


Robin.

[0] 
https://lore.kernel.org/linux-iommu/1607538189-237944-4-git-send-email-john.ga...@huawei.com/ 



However it seems rather unlikely in practice to have every possible 
cache entry of every size used, so if saving smaller amounts of memory 
is a concern wouldn't you also want to explicitly drain the flush 
queues (16KB per CPU) and maybe look at trying to 

RE: [PATCH] iommu/amd: Fix section mismatch warning for detect_ivrs()

2021-06-08 Thread Deucher, Alexander via iommu

> -Original Message-
> From: Joerg Roedel 
> Sent: Tuesday, June 8, 2021 8:29 AM
> To: Joerg Roedel ; Will Deacon 
> Cc: Deucher, Alexander ;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; Joerg
> Roedel 
> Subject: [PATCH] iommu/amd: Fix section mismatch warning for
> detect_ivrs()
> 
> From: Joerg Roedel 
> 
> A recent commit introduced this section mismatch warning:
> 
>   WARNING: modpost: vmlinux.o(.text.unlikely+0x22a1f): Section
> mismatch in reference from the function detect_ivrs() to the variable
> .init.data:amd_iommu_force_enable
> 
> The reason is that detect_ivrs() is not marked __init while it should be,
> because it is only called from another __init function. Mark
> detect_ivrs() __init to get rid of the warning.
> 
> Fixes: b1e650db2cc4 ("iommu/amd: Add amd_iommu=force_enable
> option")
> Signed-off-by: Joerg Roedel 

Acked-by: Alex Deucher 

> ---
>  drivers/iommu/amd/init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 4e4fb0f4e412..46280e6e1535 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -2817,7 +2817,7 @@ static int amd_iommu_enable_interrupts(void)
>   return ret;
>  }
> 
> -static bool detect_ivrs(void)
> +static bool __init detect_ivrs(void)
>  {
>   struct acpi_table_header *ivrs_base;
>   acpi_status status;
> --
> 2.31.1


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Paolo Bonzini

On 08/06/21 15:15, Jason Gunthorpe wrote:

On Tue, Jun 08, 2021 at 09:56:09AM +0200, Paolo Bonzini wrote:


Alternatively you can add a KVM_DEV_IOASID_{ADD,DEL} pair of ioctls. But it
seems useless complication compared to just using what we have now, at least
while VMs only use IOASIDs via VFIO.


The simplest is KVM_ENABLE_WBINVD() and be done
with it.


The simplest one is KVM_DEV_VFIO_GROUP_ADD/DEL, that already exists and also
covers hot-unplug.  The second simplest one is KVM_DEV_IOASID_ADD/DEL.


This isn't the same thing, this is back to trying to have the kernel
set policy for userspace.


If you want a userspace policy then there would be three states:

* WBINVD enabled because a WBINVD-enabled VFIO device is attached.

* WBINVD potentially enabled but no WBINVD-enabled VFIO device attached

* WBINVD forcefully disabled

KVM_DEV_VFIO_GROUP_ADD/DEL can still be used to distinguish the first 
two.  Due to backwards compatibility, those two describe the default 
behavior; disabling wbinvd can be done easily with a new sub-ioctl of 
KVM_ENABLE_CAP and doesn't require any security proof.


The meaning of WBINVD-enabled is "won't return -ENXIO for the wbinvd 
ioctl", nothing more nothing less.  If all VFIO devices are going to be 
WBINVD-enabled, then that will reflect on KVM as well, and I won't have 
anything to object if there's consensus on the device assignment side of 
things that the wbinvd ioctl won't ever fail.


Paolo
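
As a sketch of the shape such a sub-ioctl could take from QEMU's side:
KVM_ENABLE_CAP and struct kvm_enable_cap are existing KVM uAPI, while
KVM_CAP_DISABLE_WBINVD is a hypothetical capability name for the
"forcefully disabled" state; no such capability exists today.

#include <linux/kvm.h>
#include <sys/ioctl.h>

#define KVM_CAP_DISABLE_WBINVD 999	/* hypothetical cap number */

static int vm_force_disable_wbinvd(int vm_fd)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DISABLE_WBINVD,
	};

	/* VM-scoped KVM_ENABLE_CAP; no security proof required. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}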



[PATCH v11 1/3] iommu: Enhance IOMMU default DMA mode build options

2021-06-08 Thread John Garry
From: Zhen Lei 

First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as default at build time. Then put
the two config options in a choice, as they are mutually exclusive.

[jpg: Make choice between strict and lazy only (and not passthrough)]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
---
 drivers/iommu/Kconfig | 35 +++
 drivers/iommu/iommu.c |  3 ++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..369a3af9e5bf 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -90,6 +90,41 @@ config IOMMU_DEFAULT_PASSTHROUGH
 
  If unsure, say N here.
 
+choice
+   prompt "IOMMU default DMA mode"
+   depends on IOMMU_API
+   depends on X86 || IA64 || X86_64 || ARM || ARM64
+
+   default IOMMU_DEFAULT_STRICT
+   help
+ This option allows an IOMMU DMA mode to be chosen at build time, to
+ override the default DMA mode of each ARCH, removing the need to
+ pass in kernel parameters through command line. It is still possible
+ to provide ARCH-specific or common boot options to override this
+ option.
+
+ If unsure, keep the default.
+
+config IOMMU_DEFAULT_LAZY
+   bool "lazy"
+   help
+ Support lazy mode, where for every IOMMU DMA unmap operation, the
+ flush operation of IOTLB and the free operation of IOVA are deferred.
+ They are only guaranteed to be done before the related IOVA will be
+ reused.
+
+config IOMMU_DEFAULT_STRICT
+   bool "strict"
+   help
+ For every IOMMU DMA unmap operation, the flush operation of IOTLB and
+ the free operation of IOVA are guaranteed to be done in the unmap
+ function.
+
+ This mode is safer than lazy mode, but it may be slower in some high
+ performance scenarios.
+
+endchoice
+
 config OF_IOMMU
def_bool y
depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 966426a96520..177b0dafc535 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,7 +29,8 @@ static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
 
 static unsigned int iommu_def_domain_type __read_mostly;
-static bool iommu_dma_strict __read_mostly = true;
+static bool iommu_dma_strict __read_mostly =
+   IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
 static u32 iommu_cmd_line __read_mostly;
 
 struct iommu_group {
-- 
2.26.2
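
For context, the build-time default chosen by this series interacts with
the existing boot-time overrides; the exact semantics are per-driver (see
kernel-parameters.txt), but roughly:

iommu.strict=1          # core parameter: strict TLB invalidation
iommu.strict=0          # core parameter: lazy (deferred) invalidation
intel_iommu=strict      # VT-d specific strict mode
amd_iommu=fullflush     # AMD specific: flush on every unmap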



[PATCH v11 3/3] iommu/amd: Add support for IOMMU default DMA mode build options

2021-06-08 Thread John Garry
From: Zhen Lei 

The default DMA mode is lazy, but allow this to be set to strict mode at
build time. It may still be overridden by the relevant boot option.

Also make IOMMU_DEFAULT_LAZY default for when AMD_IOMMU config is set.

[jpg: Rebase for relocated file]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
---
 drivers/iommu/Kconfig| 2 +-
 drivers/iommu/amd/init.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index b68ec7ed23c0..6d99150050b9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -95,7 +95,7 @@ choice
depends on IOMMU_API
depends on X86 || IA64 || X86_64 || ARM || ARM64
 
-   default IOMMU_DEFAULT_LAZY if INTEL_IOMMU
+   default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU)
default IOMMU_DEFAULT_STRICT
help
  This option allows an IOMMU DMA mode to be chosen at build time, to
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d006724f4dc2..af662fc37cbe 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -160,7 +160,8 @@
 u16 amd_iommu_last_bdf;			/* largest PCI device id we have
 					   to handle */
 LIST_HEAD(amd_iommu_unity_map);		/* a list of required unity mappings
 					   we find in ACPI */
-bool amd_iommu_unmap_flush;		/* if true, flush on every unmap */
+/* if true, flush on every unmap */
+bool amd_iommu_unmap_flush = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
 
 LIST_HEAD(amd_iommu_list);		/* list of all AMD IOMMUs in the
 					   system */
-- 
2.26.2



[PATCH v11 2/3] iommu/vt-d: Add support for IOMMU default DMA mode build options

2021-06-08 Thread John Garry
From: Zhen Lei 

Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set,
and make intel_iommu_strict initially hold value of
IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT).

Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
---
 drivers/iommu/Kconfig   | 1 +
 drivers/iommu/intel/iommu.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 369a3af9e5bf..b68ec7ed23c0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -95,6 +95,7 @@ choice
depends on IOMMU_API
depends on X86 || IA64 || X86_64 || ARM || ARM64
 
+   default IOMMU_DEFAULT_LAZY if INTEL_IOMMU
default IOMMU_DEFAULT_STRICT
help
  This option allows an IOMMU DMA mode to be chosen at build time, to
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284a2016..fe33347f4a1e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -360,7 +360,7 @@ int intel_iommu_enabled = 0;
 EXPORT_SYMBOL_GPL(intel_iommu_enabled);
 
 static int dmar_map_gfx = 1;
-static int intel_iommu_strict;
+static int intel_iommu_strict = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
 static int intel_iommu_superpage = 1;
 static int iommu_identity_mapping;
 static int iommu_skip_te_disable;
-- 
2.26.2



[PATCH v11 0/3] Enhance IOMMU default DMA mode build options

2021-06-08 Thread John Garry
This is a reboot of Zhen Lei's series from a couple of years ago, which
never made it across the line.

I still think that it has some value, so taking up the mantle.

Motivation:
Allow lazy mode to be the default mode for DMA domains for all ARCHs, and
not only those which hardcode it (to be lazy). For ARM64, currently we must
use a kernel command line parameter to get lazy mode, which is less than
ideal.

Differences to v10:
- Rebase to v5.13-rc4
- Correct comment and typo in Kconfig (Randy)
- Make Kconfig choice depend on relevant architectures

Differences to v9:
https://lore.kernel.org/linux-iommu/20190613084240.16768-1-thunder.leiz...@huawei.com/#t
- Rebase to v5.13-rc2
- Remove CONFIG_IOMMU_DEFAULT_PASSTHROUGH from choice:
  Since we can dynamically change default domain of group, lazy or strict and
  passthrough are not mutually exclusive
- Drop ia64 patch, which I don't think was ever required
- Drop "x86/dma: use IS_ENABLED() to simplify the code", which is no
  longer required
- Drop s390/pci patch, as this arch does not use CONFIG_IOMMU_API or even
  already honour CONFIG_IOMMU_DEFAULT_PASSTHROUGH
  
https://lore.kernel.org/linux-iommu/20190613084240.16768-4-thunder.leiz...@huawei.com/
- Drop powernv/iommu patch, as I no longer think that it is relevant
  
https://lore.kernel.org/linux-iommu/20190613084240.16768-5-thunder.leiz...@huawei.com/
- Some tidying

Zhen Lei (3):
  iommu: Enhance IOMMU default DMA mode build options
  iommu/vt-d: Add support for IOMMU default DMA mode build options
  iommu/amd: Add support for IOMMU default DMA mode build options

 drivers/iommu/Kconfig   | 36 
 drivers/iommu/amd/init.c|  3 ++-
 drivers/iommu/intel/iommu.c |  2 +-
 drivers/iommu/iommu.c   |  3 ++-
 4 files changed, 41 insertions(+), 3 deletions(-)

-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
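
Patch 1/3 of the series is not reproduced in this digest, but per the
diffstat it touches drivers/iommu/iommu.c; presumably the core-side
change amounts to seeding the strict/lazy policy from the new Kconfig
choice rather than hardcoding it, along these lines (assumed shape,
not the actual hunk):

#include <linux/kernel.h>

static bool iommu_dma_strict __read_mostly =
		IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);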


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 09:10:42AM +0800, Jason Wang wrote:

> Well, this sounds like a re-invention of io_uring which has already worked
> for multifds.

How so? io_uring is about sending work to the kernel, not getting
structured events back?

It is more like one of the perf rings

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
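
For readers unfamiliar with the "perf ring" shape Jason refers to: a
producer appends fixed-size records to a shared ring and advances a
head counter, while the consumer drains records up to head and then
publishes its tail. A minimal userspace-style sketch; all names here
are illustrative, no such uAPI exists:

#include <stdatomic.h>
#include <stdint.h>

struct fault_event { uint64_t iova; uint32_t flags; uint32_t pad; };

struct event_ring {
	_Atomic uint64_t head;	/* producer (kernel) advances this    */
	_Atomic uint64_t tail;	/* consumer (userspace) advances this */
	uint8_t data[4096];	/* power-of-two sized record area     */
};

static void drain(struct event_ring *r, void (*handle)(struct fault_event *))
{
	uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	while (tail != head) {
		struct fault_event *ev =
			(void *)&r->data[tail & (sizeof(r->data) - 1)];
		handle(ev);	/* consume one structured event */
		tail += sizeof(*ev);
	}
	atomic_store_explicit(&r->tail, tail, memory_order_release);
}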


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 12:37:04PM +1000, David Gibson wrote:

> > The PPC/SPAPR support allows KVM to associate a vfio group to an IOMMU
> > page table so that it can handle iotlb programming from pre-registered
> > memory without trapping out to userspace.
> 
> To clarify that's a guest side logical vIOMMU page table which is
> partially managed by KVM.  This is an optimization - things can work
> without it, but it means guest iomap/unmap becomes a hot path because
> each map/unmap hypercall has to go
>   guest -> KVM -> qemu -> VFIO
> 
> So there are multiple context transitions.

Isn't this overhead true of many of the vIOMMUs? Can the fast path be
generalized?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 09:56:09AM +0200, Paolo Bonzini wrote:

> > > Alternatively you can add a KVM_DEV_IOASID_{ADD,DEL} pair of ioctls. But 
> > > it
> > > seems useless complication compared to just using what we have now, at 
> > > least
> > > while VMs only use IOASIDs via VFIO.
> > 
> > The simplest is KVM_ENABLE_WBINVD() and be done
> > with it.
> 
> The simplest one is KVM_DEV_VFIO_GROUP_ADD/DEL, that already exists and also
> covers hot-unplug.  The second simplest one is KVM_DEV_IOASID_ADD/DEL.

This isn't the same thing, this is back to trying to have the kernel
set policy for userspace.

qemu needs to be in direct control of this specific KVM emulation
feature if it is going to support the full range of options.

IMHO obfuscating it with some ADD/DEL doesn't achieve that.

Especially since even today GROUP_ADD/DEL is not just about
controlling wbinvd but also about linking mdevs to the kvm struct - it
is both not optional to call from qemu and triggers behavior that is
against the userspace policy.

This is why I prefer a direct and obvious KVM_ENABLE_WBINVD approach

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
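
To make the "direct and obvious" option concrete, a sketch of what
such an interface might look like from the qemu side. Everything below
is hypothetical - KVM_ENABLE_WBINVD does not exist, and the ioctl
number and struct are invented for illustration:

#include <err.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

struct kvm_enable_wbinvd {
	__u32 fd;	/* security proof: fd carrying a no-snoop device */
	__u32 flags;
};
#define KVM_ENABLE_WBINVD _IOW(KVMIO, 0xe0, struct kvm_enable_wbinvd)

static void enable_wbinvd(int vm_fd, int proof_fd)
{
	struct kvm_enable_wbinvd arg = { .fd = proof_fd };

	/* called only when qemu's own emulation policy wants WBINVD */
	if (ioctl(vm_fd, KVM_ENABLE_WBINVD, &arg) < 0)
		err(1, "KVM_ENABLE_WBINVD");
}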


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 12:43:38PM +0200, Enrico Weigelt, metux IT consult 
wrote:

> > It is two devices, thus two files.
> 
> Two separate real (hardware) devices or just two logical device nodes ?

A real PCI device and a real IOMMU block integrated into the CPU

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Jason Gunthorpe
On Tue, Jun 08, 2021 at 10:54:59AM +0200, Enrico Weigelt, metux IT consult 
wrote:

> Maybe the device as well as the transport could announce their
> capability (which IMHO should go via the virtio protocol), and if both
> are capable, the (guest's) virtio subsys tells the driver whether it's
> usable for a specific device. Perhaps we should also have a mechanism
> to tell the device that it's actually used.

The usage should be extremely narrow, like 

"If the driver issues a GPU command with flag X then the resulting
DMAs will be no-snoop and the driver must re-establish coherency at
the right moment"

It is not a general idea, but something baked directly into the device
protocol that virtio carries.

The general notion of no-snoop should ideally be carried in the PCIe
extended config space flag. If 0 then no-snoop should never be issued,
expected, or requested.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
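
The config space flag mentioned exists today as the Enable No Snoop
bit in the PCIe Device Control register. A sketch of how a host driver
can clear it (note the bit only grants permission to set the no-snoop
TLP attribute; a device that hardwires no-snoop can still emit it,
which is part of the problem being discussed):

#include <linux/pci.h>

static void disable_nosnoop(struct pci_dev *pdev)
{
	/* forbid the device from setting the no-snoop TLP attribute */
	pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
				   PCI_EXP_DEVCTL_NOSNOOP_EN);
}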


Re: [PATCHv3 0/2] iommu/arm-smmu-qcom: Add SC7280 support

2021-06-08 Thread Sai Prakash Ranjan

On 2021-06-08 17:31, Will Deacon wrote:
> On Tue, Apr 20, 2021 at 11:34:55AM +0530, Sai Prakash Ranjan wrote:
> > Patch 1 adds the sc7280 smmu compatible.
> > Patch 2 moves the adreno smmu check before apss smmu to enable
> > adreno smmu specific implementation.
> > 
> > Note that dt-binding for sc7280 is already merged.
> 
> This conflicts with what I've already got queued at [1]. Please can you
> send an updated version, as I wasn't sure about the initialisation order
> you need here wrt the ACPI parts.
> 
> Thanks,
> 
> Will
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

Sure, have rebased and sent the updated patch [1] after testing for the
order.


Thanks,
Sai

[1] 
https://lore.kernel.org/lkml/cover.1623155117.git.saiprakash.ran...@codeaurora.org/


-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCHv4 1/2] iommu/arm-smmu-qcom: Add SC7280 SMMU compatible

2021-06-08 Thread Sai Prakash Ranjan
Add compatible for SC7280 SMMU to use the Qualcomm Technologies, Inc.
specific implementation.

Signed-off-by: Sai Prakash Ranjan 
Reviewed-by: Bjorn Andersson 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 6f70f0e57c64..e93b5dbda7de 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -178,6 +178,7 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 	{ .compatible = "qcom,mdss" },
 	{ .compatible = "qcom,sc7180-mdss" },
 	{ .compatible = "qcom,sc7180-mss-pil" },
+	{ .compatible = "qcom,sc7280-mdss" },
 	{ .compatible = "qcom,sc8180x-mdss" },
 	{ .compatible = "qcom,sdm845-mdss" },
 	{ .compatible = "qcom,sdm845-mss-pil" },
@@ -342,6 +343,7 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
 static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = {
 	{ .compatible = "qcom,msm8998-smmu-v2" },
 	{ .compatible = "qcom,sc7180-smmu-500" },
+	{ .compatible = "qcom,sc7280-smmu-500" },
{ .compatible = "qcom,sc8180x-smmu-500" },
{ .compatible = "qcom,sdm630-smmu-v2" },
{ .compatible = "qcom,sdm845-smmu-500" },
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCHv4 2/2] iommu/arm-smmu-qcom: Move the adreno smmu specific impl

2021-06-08 Thread Sai Prakash Ranjan
Adreno (GPU) SMMU and APSS (Application Processor SubSystem) SMMU
both implement "arm,mmu-500" on some QTI SoCs. To run through the
adreno smmu specific implementation, such as enabling split pagetables
support, we need to match the "qcom,adreno-smmu" compatible before the
apss smmu; otherwise we will run the apss smmu implementation for the
adreno smmu and the additional features for the adreno smmu will never
be set. For example, we have the "qcom,sc7280-smmu-500" compatible for
both the apps and adreno smmu implementing "arm,mmu-500", so the adreno
smmu specific implementation is never reached because the current
sequence checks for the apps smmu compatible (qcom,sc7280-smmu-500)
first and runs that impl.

Suggested-by: Akhil P Oommen 
Signed-off-by: Sai Prakash Ranjan 
Reviewed-by: Bjorn Andersson 
Acked-by: Jordan Crouse 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index e93b5dbda7de..83c32566bf64 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -370,11 +370,17 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 		return qcom_smmu_create(smmu, &qcom_smmu_impl);
 	}
 
-	if (of_match_node(qcom_smmu_impl_of_match, np))
-		return qcom_smmu_create(smmu, &qcom_smmu_impl);
-
+	/*
+	 * Do not change this order of implementation, i.e., first adreno
+	 * smmu impl and then apss smmu since we can have both implementing
+	 * arm,mmu-500 in which case we will miss setting adreno smmu specific
+	 * features if the order is changed.
+	 */
 	if (of_device_is_compatible(np, "qcom,adreno-smmu"))
 		return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
 
+	if (of_match_node(qcom_smmu_impl_of_match, np))
+		return qcom_smmu_create(smmu, &qcom_smmu_impl);
+
 	return smmu;
 }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCHv4 0/2] iommu/arm-smmu-qcom: Add SC7280 support

2021-06-08 Thread Sai Prakash Ranjan
Patch 1 adds the sc7280 smmu compatible.
Patch 2 moves the adreno smmu check before apss smmu to enable
adreno smmu specific implementation.

Note that dt-binding for sc7280 is already merged.

Changes in v4:
 * Rebased on top of arm-smmu/updates with acpi changes.

Changes in v3:
 * Collect acks and reviews
 * Rebase on top of for-joerg/arm-smmu/updates

Changes in v2:
 * Add a comment to make sure this order is not changed in future (Jordan)

Sai Prakash Ranjan (2):
  iommu/arm-smmu-qcom: Add SC7280 SMMU compatible
  iommu/arm-smmu-qcom: Move the adreno smmu specific impl

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/amd: Fix section mismatch warning for detect_ivrs()

2021-06-08 Thread Joerg Roedel
From: Joerg Roedel 

A recent commit introduced this section mismatch warning:

WARNING: modpost: vmlinux.o(.text.unlikely+0x22a1f): Section mismatch 
in reference from the function detect_ivrs() to the variable 
.init.data:amd_iommu_force_enable

The reason is that detect_ivrs() is not marked __init while it should
be, because it is only called from another __init function. Mark
detect_ivrs() __init to get rid of the warning.

Fixes: b1e650db2cc4 ("iommu/amd: Add amd_iommu=force_enable option")
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 4e4fb0f4e412..46280e6e1535 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2817,7 +2817,7 @@ static int amd_iommu_enable_interrupts(void)
return ret;
 }
 
-static bool detect_ivrs(void)
+static bool __init detect_ivrs(void)
 {
struct acpi_table_header *ivrs_base;
acpi_status status;
-- 
2.31.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
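
A minimal illustration of this class of warning, reusing the names from
the report (not the actual driver code): a plain .text function
referencing __initdata trips modpost, because the reference could be
followed after init memory has been freed, while an __init-to-__init
reference is fine.

#include <linux/init.h>
#include <linux/errno.h>

static bool __initdata amd_iommu_force_enable;

static bool detect_ivrs(void)		/* not __init: modpost warns  */
{
	return amd_iommu_force_enable;	/* .text -> .init.data access */
}

static int __init early_amd_iommu_init(void)
{
	return detect_ivrs() ? 0 : -ENODEV;	/* __init caller is fine */
}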


Re: [PATCH 2/2] firmware: QCOM_SCM: Allow qcom_scm driver to be loadable as a permanent module

2021-06-08 Thread Will Deacon
On Tue, May 18, 2021 at 09:19:22PM +, John Stultz wrote:
> Allow the qcom_scm driver to be loadable as a permanent module.
> 
> This still uses the "depends on QCOM_SCM || !QCOM_SCM" bit to
> ensure that drivers that call into the qcom_scm driver are
> also built as modules. While not ideal in some cases it's the
> only safe way I can find to avoid build errors without having
> those drivers select QCOM_SCM and have to force it on (as
> QCOM_SCM=n can be valid for those drivers).
> 
> Reviving this now that Saravana's fw_devlink defaults to on,
> which should avoid loading troubles seen before.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Andy Gross 
> Cc: Bjorn Andersson 
> Cc: Joerg Roedel 
> Cc: Thomas Gleixner 
> Cc: Jason Cooper 
> Cc: Marc Zyngier 
> Cc: Linus Walleij 
> Cc: Vinod Koul 
> Cc: Kalle Valo 
> Cc: Maulik Shah 
> Cc: Lina Iyer 
> Cc: Saravana Kannan 
> Cc: Todd Kjos 
> Cc: Greg Kroah-Hartman 
> Cc: linux-arm-...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: linux-g...@vger.kernel.org
> Acked-by: Kalle Valo 
> Acked-by: Greg Kroah-Hartman 
> Reviewed-by: Bjorn Andersson 
> Signed-off-by: John Stultz 
> ---
> v3:
> * Fix __arm_smccc_smc build issue reported by
>   kernel test robot 
> v4:
> * Add "depends on QCOM_SCM || !QCOM_SCM" bit to ath10k
>   config that requires it.
> v5:
> * Fix QCOM_QCM typo in Kconfig, it should be QCOM_SCM
> ---
>  drivers/firmware/Kconfig| 2 +-
>  drivers/firmware/Makefile   | 3 ++-
>  drivers/firmware/qcom_scm.c | 4 
>  drivers/iommu/Kconfig   | 2 ++
>  drivers/net/wireless/ath/ath10k/Kconfig | 1 +
>  5 files changed, 10 insertions(+), 2 deletions(-)

[...]

> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 1f111b399bcab..38f7b7a8e2843 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -253,6 +253,7 @@ config SPAPR_TCE_IOMMU
>  config ARM_SMMU
>   tristate "ARM Ltd. System MMU (SMMU) Support"
>   depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)
> + depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
>   select IOMMU_API
>   select IOMMU_IO_PGTABLE_LPAE
>   select ARM_DMA_USE_IOMMU if ARM
> @@ -382,6 +383,7 @@ config QCOM_IOMMU
>   # Note: iommu drivers cannot (yet?) be built as modules
>   bool "Qualcomm IOMMU Support"
>   depends on ARCH_QCOM || (COMPILE_TEST && !GENERIC_ATOMIC64)
> + depends on QCOM_SCM=y
>   select IOMMU_API
>   select IOMMU_IO_PGTABLE_LPAE
>   select ARM_DMA_USE_IOMMU

For this part:

Acked-by: Will Deacon 

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [GIT PULL] memory: Tegra memory controller for v5.14

2021-06-08 Thread Will Deacon
Hi Krzysztof, Thierry,

On Mon, Jun 07, 2021 at 10:49:10AM +0200, Krzysztof Kozlowski wrote:
> Hi Thierry and Will,
> 
> This is the pull for you to base further SMMU aptches (prevent early SMMU
> faults).

This is a tonne of code for me to pull into the SMMU tree given that I only
want one patch!

Thierry, if I just stick:

https://lore.kernel.org/r/20210603164632.1000458-4-thierry.red...@gmail.com

on its own branch, can you stitch together whatever you need?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv3 0/2] iommu/arm-smmu-qcom: Add SC7280 support

2021-06-08 Thread Will Deacon
On Tue, Apr 20, 2021 at 11:34:55AM +0530, Sai Prakash Ranjan wrote:
> Patch 1 adds the sc7280 smmu compatible.
> Patch 2 moves the adreno smmu check before apss smmu to enable
> adreno smmu specific implementation.
> 
> Note that dt-binding for sc7280 is already merged.

This conflicts with what I've already got queued at [1]. Please can you
send an updated version, as I wasn't sure about the initialisation order
you need here wrt the ACPI parts.

Thanks,

Will

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/arm-smmu-v3: Decrease the queue size of evtq and priq

2021-06-08 Thread Will Deacon
On Mon, 31 May 2021 20:35:53 +0800, Zhen Lei wrote:
> Commit d25f6ead162e ("iommu/arm-smmu-v3: Increase maximum size of queues")
> expands the cmdq queue size to improve the success rate of concurrent
> command queue space allocation by multiple cores. However, this extension
> does not apply to evtq and priq, because for both of them, the SMMU driver
> is the consumer. Instead, memory resources are wasted. Therefore, the
> queue size of evtq and priq is restored to the original setting, one page.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu-v3: Decrease the queue size of evtq and priq
  https://git.kernel.org/will/c/f115f3c0d5d8

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V2] iommu: arm-smmu-qcom: Add sm6125 compatible

2021-06-08 Thread Will Deacon
On Sun, 23 May 2021 23:25:33 +0200, Martin Botka wrote:
> Add compatible for SM6125 SoC

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu: arm-smmu-qcom: Add sm6125 compatible
  https://git.kernel.org/will/c/6321484d1c24

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v15 0/3] iommu/arm-smmu-v3: Add stall support

2021-06-08 Thread Will Deacon
On Wed, 26 May 2021 18:19:25 +0200, Jean-Philippe Brucker wrote:
> Add stall support for SMMUv3, enabling I/O page faults and SVA for
> compatible devices. No change since last version [1], but I'd still like
> this to be considered for upstream, because there exists hardware and
> applications.
> 
> Stall is implemented by the Kunpeng 920 processor for its compression
> and crypto accelerators, with which I tested the SVA infrastructure.
> Using the userspace accelerator API [2], a program can obtain a queue
> from one of these devices and submit compression or encryption work
> within the program's address space. UADK [3] provides a library to do
> this, and there is an openssl plugin [4] to use it.
> 
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/3] dt-bindings: document stall property for IOMMU masters
  https://git.kernel.org/will/c/ed1d08b9d0c9
[2/3] ACPI/IORT: Enable stall support for platform devices
  https://git.kernel.org/will/c/6522b1e0c78f
[3/3] iommu/arm-smmu-v3: Add stall support for platform devices
  https://git.kernel.org/will/c/395ad89d11fd

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
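
For context, a sketch of the consumer side that stall support unlocks:
a driver for a stall-capable accelerator binds the current process's
address space and tags its queue submissions with the returned PASID.
The device specifics are illustrative; the iommu_sva_* calls are the
existing kernel API:

#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/sched.h>

static int submit_with_sva(struct device *dev)
{
	struct iommu_sva *handle;
	u32 pasid;

	handle = iommu_sva_bind_device(dev, current->mm, NULL);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	pasid = iommu_sva_get_pasid(handle);
	/* ... program 'pasid' into the device queue and run the job ... */

	iommu_sva_unbind_device(handle);
	return 0;
}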


Re: [PATCH] iommu/arm-smmu-v3: Ratelimit event dump

2021-06-08 Thread Will Deacon
On Mon, 31 May 2021 11:56:50 +0200, Jean-Philippe Brucker wrote:
> When a device or driver misbehaves, it is possible to receive DMA fault
> events much faster than we can print them out, causing a lock up of the
> system and inability to cancel the source of the problem. Ratelimit
> printing of events to help recovery.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu-v3: Ratelimit event dump
  https://git.kernel.org/will/c/9cff922bba42

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
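
The mechanism is the kernel's standard ratelimit infrastructure; a
sketch of the approach (simplified, not the applied hunk):

#include <linux/device.h>
#include <linux/ratelimit.h>

static DEFINE_RATELIMIT_STATE(evt_rs, DEFAULT_RATELIMIT_INTERVAL,
			      DEFAULT_RATELIMIT_BURST);

static void dump_event(struct device *dev, u64 *evt)
{
	if (!__ratelimit(&evt_rs))
		return;		/* drop the print, keep the system alive */

	dev_info(dev, "event 0x%02llx received:\n", evt[0] & 0xff);
}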


Re: [RESEND PATCH v4 1/6] iommu/arm-smmu: Add support for driver IOMMU fault handlers

2021-06-08 Thread Will Deacon
On Wed, Jun 02, 2021 at 09:52:44AM -0700, Rob Clark wrote:
> From: Jordan Crouse 
> 
> Call report_iommu_fault() to allow upper-level drivers to register their
> own fault handlers.
> 
> Signed-off-by: Jordan Crouse 
> Signed-off-by: Rob Clark 
> ---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 6f72c4d208ca..b4b32d31fc06 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -408,6 +408,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	int idx = smmu_domain->cfg.cbndx;
> +	int ret;
>  
>  	fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
>  	if (!(fsr & ARM_SMMU_FSR_FAULT))
> @@ -417,8 +418,12 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
>  	iova = arm_smmu_cb_readq(smmu, idx, ARM_SMMU_CB_FAR);
>  	cbfrsynra = arm_smmu_gr1_read(smmu, ARM_SMMU_GR1_CBFRSYNRA(idx));
>  
> -	dev_err_ratelimited(smmu->dev,
> -	"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
> +	ret = report_iommu_fault(domain, NULL, iova,
> +		fsynr & ARM_SMMU_FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ);
> +
> +	if (ret == -ENOSYS)
> +		dev_err_ratelimited(smmu->dev,
> +		"Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
>  			    fsr, iova, fsynr, cbfrsynra, idx);

Acked-by: Will Deacon 

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
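
For context, the consumer side of this hook is the long-standing
fault-handler API that report_iommu_fault() dispatches to. A sketch of
an upper-level driver using it; the GPU structure and names are
illustrative, not from the series:

#include <linux/device.h>
#include <linux/iommu.h>

struct my_gpu { struct device *dev; };	/* illustrative consumer */

static int gpu_iommu_fault(struct iommu_domain *domain, struct device *dev,
			   unsigned long iova, int flags, void *token)
{
	struct my_gpu *gpu = token;

	dev_warn(gpu->dev, "gpu fault at iova %#lx (%s)\n", iova,
		 (flags & IOMMU_FAULT_WRITE) ? "write" : "read");
	return 0;	/* anything but -ENOSYS suppresses the SMMU print */
}

/* at init time: iommu_set_fault_handler(domain, gpu_iommu_fault, gpu); */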


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Enrico Weigelt, metux IT consult

On 07.06.21 20:01, Jason Gunthorpe wrote:

> it is what it is, select has a fixed size bitmap of FD #s and
> a hard upper bound on that size as part of the glibc ABI - can't be
> fixed.

in glibc ABI ? Uuuuh!


--mtx

--
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Enrico Weigelt, metux IT consult

On 04.06.21 14:30, Jason Gunthorpe wrote:

Hi,

> Containers already needed to do this today. Container orchestration is
> hard.

Yes, but I hate to see even more work upcoming here.

> Yes, /dev/ioasid shouldn't do anything unless you have a device to
> connect it with. In this way it is probably safe to stuff it into
> every container.

Okay, if we can guarantee that, I'm completely fine.

> > > Having FDs spawn other FDs is pretty ugly, it defeats the "everything
> > > is a file" model of UNIX.
> >
> > Unfortunately, this is already defeated in many other places :(
> > (I'd even claim that ioctls already break it :p)
>
> I think you are reaching a bit :)
>
> > It seems your approach also breaks this, since we now need to open two
> > files in order to talk to one device.
>
> It is two devices, thus two files.

Two separate real (hardware) devices or just two logical device nodes ?


--mtx

--
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4] iommu/of: Fix pci_request_acs() before enumerating PCI devices

2021-06-08 Thread John Garry

On 07/06/2021 13:58, Xingang Wang wrote:
> On 2021/6/5 3:04, Bjorn Helgaas wrote:
> > [+cc John, who tested 6bf6c24720d3]


I think that whoever committed this patch just blanket applied my
Tested-by tag. I would not have tested anything supporting ACS or even DT:

https://lore.kernel.org/linux-iommu/c8eb97b1-dab5-cc25-7669-2819f64a8...@huawei.com/

Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu: Delete a duplicate check in iommu_change_dev_def_domain()

2021-06-08 Thread Will Deacon
On Thu, May 13, 2021 at 03:58:15PM +0800, Zhen Lei wrote:
> Function iommu_group_store_type() is the only caller of the static
> function iommu_change_dev_def_domain() and has performed
> "if (WARN_ON(!group))" detection before calling it. So the one here is
> redundant.
> 
> Signed-off-by: Zhen Lei 
> ---
>  drivers/iommu/iommu.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 971068da67cb91d..8cdf6a1c4bfd773 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3059,9 +3059,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>  	int ret, dev_def_dom;
>  	struct device *dev;
>  
> -	if (!group)
> -		return -EINVAL;
> -
>  	mutex_lock(&group->mutex);
>  
>  	if (group->default_domain != group->domain) {

Acked-by: Will Deacon 

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 08/16] media: mtk-mdp: Get rid of mtk_smi_larb_get/put

2021-06-08 Thread houlong wei
On Sat, 2021-04-10 at 17:11 +0800, Yong Wu wrote:
> MediaTek IOMMU has already added the device_link between the consumer
> and smi-larb device. If the mdp device call the pm_runtime_get_sync,
> the smi-larb's pm_runtime_get_sync also be called automatically.
> 
> CC: Minghsiu Tsai 
> CC: Houlong Wei 
> Signed-off-by: Yong Wu 
> Reviewed-by: Evan Green 
> ---
>  drivers/media/platform/mtk-mdp/mtk_mdp_comp.c | 40 ---
>  drivers/media/platform/mtk-mdp/mtk_mdp_comp.h |  2 -
>  drivers/media/platform/mtk-mdp/mtk_mdp_core.c |  1 -
>  3 files changed, 43 deletions(-)
> 
> diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c 
> b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c
> index b3426a551bea..1e3833f1c9ae 100644
> --- a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c
> +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.c
> @@ -9,7 +9,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "mtk_mdp_comp.h"
>  
> @@ -18,14 +17,6 @@ void mtk_mdp_comp_clock_on(struct device *dev, struct 
> mtk_mdp_comp *comp)
>  {
>   int i, err;
>  
> - if (comp->larb_dev) {
> - err = mtk_smi_larb_get(comp->larb_dev);
> - if (err)
> - dev_err(dev,
> - "failed to get larb, err %d. type:%d\n",
> - err, comp->type);
> - }
> -
>   for (i = 0; i < ARRAY_SIZE(comp->clk); i++) {
>   if (IS_ERR(comp->clk[i]))
>   continue;
> @@ -46,17 +37,12 @@ void mtk_mdp_comp_clock_off(struct device *dev, struct 
> mtk_mdp_comp *comp)
>   continue;
>   clk_disable_unprepare(comp->clk[i]);
>   }
> -
> - if (comp->larb_dev)
> - mtk_smi_larb_put(comp->larb_dev);
>  }
>  
>  int mtk_mdp_comp_init(struct device *dev, struct device_node *node,
> struct mtk_mdp_comp *comp,
> enum mtk_mdp_comp_type comp_type)
>  {
> - struct device_node *larb_node;
> - struct platform_device *larb_pdev;
>   int ret;
>   int i;
>  
> @@ -77,32 +63,6 @@ int mtk_mdp_comp_init(struct device *dev, struct 
> device_node *node,
>   break;
>   }
>  
> - /* Only DMA capable components need the LARB property */
> - comp->larb_dev = NULL;
> - if (comp->type != MTK_MDP_RDMA &&
> - comp->type != MTK_MDP_WDMA &&
> - comp->type != MTK_MDP_WROT)
> - return 0;
> -
> - larb_node = of_parse_phandle(node, "mediatek,larb", 0);
> - if (!larb_node) {
> - dev_err(dev,
> - "Missing mediadek,larb phandle in %pOF node\n", node);
> - ret = -EINVAL;
> - goto put_dev;
> - }
> -
> - larb_pdev = of_find_device_by_node(larb_node);
> - if (!larb_pdev) {
> - dev_warn(dev, "Waiting for larb device %pOF\n", larb_node);
> - of_node_put(larb_node);
> - ret = -EPROBE_DEFER;
> - goto put_dev;
> - }
> - of_node_put(larb_node);
> -
> -	comp->larb_dev = &larb_pdev->dev;
> -
>   return 0;
>  
>  put_dev:
> diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h 
> b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h
> index 1bf0242cce46..36bc1b8f6222 100644
> --- a/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h
> +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_comp.h
> @@ -27,14 +27,12 @@ enum mtk_mdp_comp_type {
>   * @node:list node to track sibing MDP components
>   * @dev_node:component device node
>   * @clk: clocks required for component
> - * @larb_dev:SMI device required for component
>   * @type:component type
>   */
>  struct mtk_mdp_comp {
>   struct list_headnode;
>   struct device_node  *dev_node;
>   struct clk  *clk[2];
> - struct device   *larb_dev;
>   enum mtk_mdp_comp_type  type;
>  };
>  
> diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c 
> b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
> index 976aa1f4829b..70a8eab16863 100644
> --- a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
> +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
> @@ -17,7 +17,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "mtk_mdp_core.h"
>  #include "mtk_mdp_m2m.h"

Reviewed-by: Houlong Wei 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
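
The mechanism the commit message relies on is a device link between the
consumer and the smi-larb supplier, so that a runtime-PM get on the
consumer powers the larb automatically. A sketch of that pattern (the
actual call lives in the MediaTek IOMMU driver; flags may differ):

#include <linux/device.h>
#include <linux/errno.h>

static int link_to_larb(struct device *consumer, struct device *larb)
{
	struct device_link *link;

	link = device_link_add(consumer, larb,
			       DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);
	return link ? 0 : -ENODEV;
}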


Re: [PATCH v2 1/1] iommu: Clear a lot of spelling mistakes

2021-06-08 Thread Will Deacon
On Mon, May 10, 2021 at 11:54:25AM +0800, Zhen Lei wrote:
> All spelling mistakes are in the comments, no functional change.

[...]

> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index e06b8a0e2b56bdd..b9e4be35c8dcad2 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -161,7 +161,7 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
> *data);
>   * The Region 'A'(I/O) can NOT be mapped by M4U; For Region 'B'/'C'/'D', the
>   * bit32 of the CPU physical address always is needed to set, and for Region
>   * 'E', the CPU physical address keep as is.
> - * Additionally, The iommu consumers always use the CPU phyiscal address.
> + * Additionally, The iommu consumers always use the CPU physical address.

If we're fixing typos, then I think the 'T' in "The iommu" should be
lowercase here.

Anyway, all the other fixes look correct to me. I'll leave it up to Joerg as
to whether he wants to apply this or not, given the impact on git blame and
the potential for conflicts.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/vt-d: Select PCI_ATS explicitly

2021-06-08 Thread Will Deacon
On Wed, May 12, 2021 at 02:53:13PM +0800, Lu Baolu wrote:
> The Intel VT-d implementation supports device TLB management. Select
> PCI_ATS explicitly so that the pci_ats helpers are always available.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
> index 28a3d1596c76..7e5b240b801d 100644
> --- a/drivers/iommu/intel/Kconfig
> +++ b/drivers/iommu/intel/Kconfig
> @@ -14,6 +14,7 @@ config INTEL_IOMMU
>   select SWIOTLB
>   select IOASID
>   select IOMMU_DMA
> + select PCI_ATS
>   help
> DMA remapping (DMAR) devices support enables independent address
> translations for Direct Memory Access (DMA) from devices.

Acked-by: Will Deacon 

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
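
For reference, a sketch of the helpers this makes unconditionally
available to the driver (illustrative usage, not a hunk from the
patch):

#include <linux/pci.h>
#include <linux/pci-ats.h>

static int enable_dev_tlb(struct pci_dev *pdev)
{
	if (!pci_ats_supported(pdev))
		return -ENODEV;

	/* enable the device TLB; second argument is the page shift */
	return pci_enable_ats(pdev, PAGE_SHIFT);
}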


Re: [PATCH] iommu/qcom: Cleanup resources in case of probe error path

2021-06-08 Thread Will Deacon
On Thu, Apr 22, 2021 at 03:40:30AM +0530, Amey Narkhede wrote:
> If device registration fails, remove sysfs attribute
> and if setting bus callbacks fails, unregister the device
> and cleanup the sysfs attribute.
> 
> Signed-off-by: Amey Narkhede 
> ---
>  drivers/iommu/arm/arm-smmu/qcom_iommu.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> index 4294abe389b2..5fa128a1f7f0 100644
> --- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> +++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
> @@ -850,10 +850,12 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
>  	ret = iommu_device_register(&qcom_iommu->iommu, &qcom_iommu_ops, dev);
>  	if (ret) {
>  		dev_err(dev, "Failed to register iommu\n");
> -		return ret;
> +		goto err_sysfs_remove;
>  	}
> 
> -	bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
> +	ret = bus_set_iommu(&platform_bus_type, &qcom_iommu_ops);
> +	if (ret)
> +		goto err_unregister_device;
> 
>  	if (qcom_iommu->local_base) {
>  		pm_runtime_get_sync(dev);
> @@ -862,6 +864,14 @@ static int qcom_iommu_device_probe(struct platform_device *pdev)
>  	}
> 
>  	return 0;
> +
> +err_unregister_device:
> +	iommu_device_unregister(&qcom_iommu->iommu);
> +
> +err_sysfs_remove:
> +	iommu_device_sysfs_remove(&qcom_iommu->iommu);
> +
> +	return ret;

It looks like we're also missing this logic in arm-smmu/arm-smmu.c and
arm-smmu-v3/arm-smmu-v3.c. Would you be able to fix those up too, please?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Enrico Weigelt, metux IT consult

On 08.06.21 03:00, Jason Wang wrote:

Hi folks,

> Just to make sure we are in the same page. What I meant is, if the DMA
> behavior like (no-snoop) is device specific. There's no need to mandate
> a virtio general attributes. We can describe it per device. The devices
> implemented in the current spec does not use non-coherent DMA doesn't
> mean any future devices won't do that. The driver could choose to use
> transport (e.g PCI), platform (ACPI) or device specific (general virtio
> command) way to detect and flush cache when necessary.


Maybe I've totally misunderstood the whole issue, but what I've learned
to far:

* it's a performance improvement for certain scenarios
* whether it can be used depends on the devices as well as the
  underlying transport (combination of both)
* whether it should be used (when possible) can only be decided by the
  driver

Correct ?

I tend to believe that's something that virtio infrastructure should
handle in a generic way.

Maybe the device as well as the transport could announce their
capability (which IMHO should go via the virtio protocol), and if both
are capable, the (guest's) virtio subsys tells the driver whether it's
usable for a specific device. Perhaps we should also have a mechanism
to tell the device that it's actually used.


Sorry, if i'm completely on the wrong page and just talking junk here :o


--mtx

--
---
Note: unencrypted e-mails can easily be intercepted and manipulated!
For confidential communication, please send your GPG/PGP key.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
i...@metux.net -- +49-151-27565287
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
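
A sketch of the negotiation mechanism being suggested, using the
standard virtio feature-bit machinery. The feature bit itself is
hypothetical - no such bit is defined in the virtio spec:

#include <linux/virtio.h>
#include <linux/virtio_config.h>

#define VIRTIO_F_NONCOHERENT_DMA	41	/* hypothetical bit */

static bool use_noncoherent(struct virtio_device *vdev)
{
	/* true only if the device offered the bit and feature
	 * negotiation over the transport accepted it */
	return virtio_has_feature(vdev, VIRTIO_F_NONCOHERENT_DMA);
}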

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Paolo Bonzini

On 07/06/21 19:59, Jason Gunthorpe wrote:
> > The KVM interface is the same kvm-vfio device that exists already.  The
> > userspace API does not need to change at all: adding one VFIO file
> > descriptor with WBINVD enabled to the kvm-vfio device lets the VM use WBINVD
> > functionality (see kvm_vfio_update_coherency).
> 
> The problem is we are talking about adding a new /dev/ioasid FD and it
> won't fit into the existing KVM VFIO FD interface. There are lots of
> options here, one is to add new ioctls that specifically use the new
> FD, the other is to somehow use VFIO as a proxy to carry things to the
> /dev/ioasid FD code.

Exactly.

> > Alternatively you can add a KVM_DEV_IOASID_{ADD,DEL} pair of ioctls. But it
> > seems useless complication compared to just using what we have now, at least
> > while VMs only use IOASIDs via VFIO.
> 
> The simplest is KVM_ENABLE_WBINVD() and be done
> with it.

The simplest one is KVM_DEV_VFIO_GROUP_ADD/DEL, that already exists and
also covers hot-unplug.  The second simplest one is KVM_DEV_IOASID_ADD/DEL.

It need not be limited to wbinvd support, it's just a generic "let VMs
do what userspace can do if it has access to this file descriptor".
That it enables guest WBINVD is an implementation detail.

> > Either way, there should be no policy attached to the add/delete operations.
> > KVM users want to add the VFIO (or IOASID) file descriptors to the device
> > independent of WBINVD.  If userspace wants/needs to apply its own policy on
> > whether to enable WBINVD or not, they can do it on the VFIO/IOASID side:
> 
> Why does KVM need to know about IOASIDs? I don't think it can do
> anything with this general information.

Indeed, it only uses them as the security proofs---either VFIO or IOASID
file descriptors can be used as such.

Paolo

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
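
For reference, the existing flow Paolo describes, sketched from the
documented kvm-vfio device uAPI: userspace creates the pseudo-device
and adds each VFIO group fd to it, and KVM derives WBINVD behaviour
from whether any added group has non-coherent DMA
(kvm_vfio_update_coherency):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int kvm_vfio_add_group(int vm_fd, int32_t group_fd)
{
	struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
	struct kvm_device_attr attr = {
		.group = KVM_DEV_VFIO_GROUP,
		.attr  = KVM_DEV_VFIO_GROUP_ADD,
		.addr  = (uint64_t)(unsigned long)&group_fd,
	};

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
		return -1;

	/* cd.fd is the kvm-vfio device; hand it the group fd */
	return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
}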


[PATCH 1/1] dma-mapping: remove trailing spaces and tabs

2021-06-08 Thread Zhen Lei
Run the following command to find and remove the trailing spaces and tabs:

find kernel/dma/ -type f | xargs sed -r -i 's/[ \t]+$//'

Signed-off-by: Zhen Lei 
---
 kernel/dma/coherent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c
index 5b5b6c7ec7f2..794e76b03b34 100644
--- a/kernel/dma/coherent.c
+++ b/kernel/dma/coherent.c
@@ -111,7 +111,7 @@ static int dma_assign_coherent_memory(struct device *dev,
  * Declare a region of memory to be handed out by dma_alloc_coherent() when it
  * is asked for coherent memory for this device.  This shall only be used
  * from platform code, usually based on the device tree description.
- * 
+ *
  * phys_addr is the CPU physical address to which the memory is currently
  * assigned (this will be ioremapped so the CPU can access the region).
  *
-- 
2.25.1


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Linuxarm] Re: [PATCH 0/4] Free cached iovas when rmmod the driver of the last device in group

2021-06-08 Thread chenxiang (M)

Hi Robin,


On 2021/6/7 19:23, Robin Murphy wrote:
> On 2021-06-07 03:42, chenxiang wrote:
> > From: Xiang Chen 
> > 
> > When rmmod the driver of the last device in the group, cached iovas
> > are not used, and it is better to free them to save memory. And also
> > export function free_rcache_cached_iovas() and iommu_domain_to_iova().


> How common is it to use a device a significant amount, then unbind its
> driver and not use it for long enough to care about? Off-hand I can't
> think of a particularly realistic use-case which looks like that - the
> closest I can imagine is unbinding a default kernel driver to replace
> it with VFIO, but I would expect the set of devices intended for
> assignment to be distinct from the set of devices used by the host
> itself, and thus the host driver wouldn't have actually done much to
> generate a lot of DMA mappings in that initial period. Is my
> expectation there wrong?


It is possible that the user uses the driver for a while and then
rmmods it (for example, the SAS card driver is the one we always use,
but sometimes we need to compare performance with a RAID card, so we
insmod the RAID driver and rmmod it after the test). In that situation,
the rcache iovas of the RAID card still remain even after the driver is
rmmoded (the user always rmmods the driver rather than removing the
device, which would release the group and all its resources).




> If there is such a case, how much memory does this actually save in
> practice? The theoretical worst-case should be roughly 4 * 128 * 6 *
> sizeof(struct iova) bytes per CPU, which is around 192KB assuming
> 64-byte kmem cache alignment.


There are 128 CPUs on our ARM64 Kunpeng 920 board, and I added a debugfs
interface drop_rcache for every group in local code which calls
free_rcache_cached_iovas(); we can see that it saves 1~2M of memory
after freeing cached iovas (in the output below, 'used' drops by
1336216 - 1334672 = 1544 kB).


estuary:/$ free
 total   used   free sharedbuffers cached
Mem: 5267767201336216  525440504 177664  0 177664
-/+ buffers/cache:1158552  525618168
Swap:0  0  0
estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ echo 1 > drop_rcache
estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ free
 total   used   free sharedbuffers cached
Mem: 5267767201334672  525442048 177664  0 177664
-/+ buffers/cache:1157008  525619712
Swap:0  0  0

The other reason I want to free rcache iovas at a suitable place is the
IOVA longterm issue [0] (which is unrecoverable). If we free cached
iovas when rmmoding the driver, or add a debugfs entry to do that, I
think we can recover by rmmoding and re-insmoding the driver, or by
calling the debugfs drop_rcache, even though that doesn't solve the
underlying issue.

[0] 
https://lore.kernel.org/linux-iommu/1607538189-237944-4-git-send-email-john.ga...@huawei.com/


> However it seems rather unlikely in practice to have every possible
> cache entry of every size used, so if saving smaller amounts of memory
> is a concern wouldn't you also want to explicitly drain the flush
> queues (16KB per CPU) and maybe look at trying to reclaim the unused
> pagetable pages from the domain itself - that ~192KB worth of cached
> IOVAs represents ~32K pages worth of IOVA space, which for an
> implementation like io-pgtable-arm with the 4KB granule means ~256KB
> worth of non-leaf pagetables left behind.


Ok, we may consider about those.



> I'm not against the idea of having a mechanism to "compact" an idle
> DMA domain if there are convincing numbers to back it up, but the
> actual implementation would need to be better than this as well -
> having the IOMMU core code poking directly into the internals of the
> iommu-dma abstraction is not the way to go, and exporting anything to
> modules, which the IOMMU core is not, smells suspicious.
> 
> Robin.


> > Xiang Chen (4):
> >    iommu/iova: add a function to free all rcached iovas and export it
> >    iommu/iova: use function free_rcache_cached_iovas() to free all
> >  rcached iovas
> >    dma-iommu: add a interface to get iova_domain from iommu domain
> >    iommu: free cached iovas when rmmod the driver of the last device in
> >  the group
> > 
> >   drivers/iommu/dma-iommu.c |  7 +++
> >   drivers/iommu/iommu.c |  7 +++
> >   drivers/iommu/iova.c  | 17 -
> >   include/linux/dma-iommu.h |  6 ++
> >   include/linux/iova.h  |  5 +
> >   5 files changed, 37 insertions(+), 5 deletions(-)
> 
> ___
> Linuxarm mailing list -- linux...@openeuler.org
> To unsubscribe send an email to linuxarm-le...@openeuler.org



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread David Gibson
On Thu, Jun 03, 2021 at 09:11:05AM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 03, 2021 at 03:45:09PM +1000, David Gibson wrote:
> > On Wed, Jun 02, 2021 at 01:58:38PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 02, 2021 at 04:48:35PM +1000, David Gibson wrote:
> > > > > > /* Bind guest I/O page table  */
> > > > > > bind_data = {
> > > > > > .ioasid = gva_ioasid;
> > > > > > .addr   = gva_pgtable1;
> > > > > > // and format information
> > > > > > };
> > > > > > ioctl(ioasid_fd, IOASID_BIND_PGTABLE, _data);
> > > > > 
> > > > > Again I do wonder if this should just be part of alloc_ioasid. Is
> > > > > there any reason to split these things? The only advantage to the
> > > > > split is the device is known, but the device shouldn't impact
> > > > > anything..
> > > > 
> > > > I'm pretty sure the device(s) could matter, although they probably
> > > > won't usually. 
> > > 
> > > It is a bit subtle, but the /dev/iommu fd itself is connected to the
> > > devices first. This prevents wildly incompatible devices from being
> > > joined together, and allows some "get info" to report the capability
> > > union of all devices if we want to do that.
> > 
> > Right.. but I've not been convinced that having a /dev/iommu fd
> > instance be the boundary for these types of things actually makes
> > sense.  For example if we were doing the preregistration thing
> > (whether by child ASes or otherwise) then that still makes sense
> > across wildly different devices, but we couldn't share that layer if
> > we have to open different instances for each of them.
> 
> It is something that still seems up in the air.. What seems clear for
> /dev/iommu is that it
>  - holds a bunch of IOASID's organized into a tree
>  - holds a bunch of connected devices

Right, and it's still not really clear to me what devices connected to
the same /dev/iommu instance really need to have in common, as
distinct from what devices connected to the same specific ioasid need
to have in common.

>  - holds a pinned memory cache
> 
> One thing it must do is enforce IOMMU group security. A device cannot
> be attached to an IOASID unless all devices in its IOMMU group are
> part of the same /dev/iommu FD.

Well, you can't attach a device to an individual IOASID unless all
devices in its group are attached to the same individual IOASID
either, so I'm not clear what benefit there is to enforcing it at the
/dev/iommu instance as well as at the individual ioasid level.

> The big open question is what parameters govern allowing devices to
> connect to the /dev/iommu:
>  - all devices can connect and we model the differences inside the API
>somehow.
>  - Only sufficiently "similar" devices can be connected
>  - The FD's capability is the minimum of all the connected devices
> 
> There are some practical problems here, when an IOASID is created the
> kernel does need to allocate a page table for it, and that has to be
> in some definite format.
> 
> It may be that we had a false start thinking the FD container should
> be limited. Perhaps creating an IOASID should pass in a list
> of the "device labels" that the IOASID will be used with and that can
> guide the kernel what to do?
> 
> > Right, but at this stage I'm just not seeing a really clear (across
> > platforms and device typpes) boundary for what things have to be per
> > IOASID container and what have to be per IOASID, so I'm just not sure
> > the /dev/iommu instance grouping makes any sense.
> 
> I would push as much stuff as possible to be per-IOASID..

I agree.  But the question is what's *not* possible to be per-IOASID,
so what's the semantic boundary that defines when things have to be in
the same /dev/iommu instance, but not the same IOASID.

> > > I don't know if that small advantage is worth the extra complexity
> > > though.
> > > 
> > > > But it would certainly be possible for a system to have two
> > > > different host bridges with two different IOMMUs with different
> > > > pagetable formats.  Until you know which devices (and therefore
> > > > which host bridge) you're talking about, you don't know what formats
> > > > of pagetable to accept.  And if you have devices from *both* bridges
> > > > you can't bind a page table at all - you could theoretically support
> > > > a kernel managed pagetable by mirroring each MAP and UNMAP to tables
> > > > in both formats, but it would be pretty reasonable not to support
> > > > that.
> > > 
> > > The basic process for a user space owned pgtable mode would be:
> > > 
> > >  1) qemu has to figure out what format of pgtable to use
> > > 
> > > Presumably it uses query functions using the device label.
> > 
> > No... in the qemu case it would always select the page table format
> > that it needs to present to the guest.  That's part of the
> > guest-visible platform that's selected by qemu's configuration.
> 
> I should have said "vfio user" here because apps like DPDK might use
> this 

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread David Gibson
On Thu, Jun 03, 2021 at 07:17:23AM +, Tian, Kevin wrote:
> > From: David Gibson 
> > Sent: Wednesday, June 2, 2021 2:15 PM
> > 
> [...] 
> > > An I/O address space takes effect in the IOMMU only after it is attached
> > > to a device. The device in the /dev/ioasid context always refers to a
> > > physical one or 'pdev' (PF or VF).
> > 
> > What you mean by "physical" device here isn't really clear - VFs
> > aren't really physical devices, and the PF/VF terminology also doesn't
> > extent to non-PCI devices (which I think we want to consider for the
> > API, even if we're not implemenenting it any time soon).
> 
> Yes, it's not very clear, and more in PCI context to simplify the 
> description. A "physical" one here means a PCI endpoint function
> which has a unique RID. It's more to differentiate with later mdev/
> subdevice which uses both RID+PASID. Naming is always a hard
> exercise to me... Possibly I'll just use device vs. subdevice in future
> versions.
> 
> > 
> > Now, it's clear that we can't program things into the IOMMU before
> > attaching a device - we might not even know which IOMMU to use.
> 
> yes
> 
> > However, I'm not sure if its wise to automatically make the AS "real"
> > as soon as we attach a device:
> > 
> >  * If we're going to attach a whole bunch of devices, could we (for at
> >least some IOMMU models) end up doing a lot of work which then has
> >to be re-done for each extra device we attach?
> 
> which extra work did you specifically refer to? each attach just implies
> writing the base address of the I/O page table to the IOMMU structure
> corresponding to this device (either being a per-device entry, or per
> device+PASID entry).
> 
> and generally device attach should not be in a hot path.
> 
> > 
> >  * With kernel managed IO page tables could attaching a second device
> >(at least on some IOMMU models) require some operation which would
> >require discarding those tables?  e.g. if the second device somehow
> >forces a different IO page size
> 
> Then the attach should fail and the user should create another IOASID
> for the second device.

Couldn't this make things weirdly order dependent though?  If device A
has strictly more capabilities than device B, then attaching A then B
will be fine, but B then A will trigger a new ioasid fd.

> > For that reason I wonder if we want some sort of explicit enable or
> > activate call.  Device attaches would only be valid before, map or
> > attach pagetable calls would only be valid after.
> 
> I'm interested in learning a real example requiring explicit enable...
> 
> > 
> > > One I/O address space could be attached to multiple devices. In this case,
> > > /dev/ioasid uAPI applies to all attached devices under the specified 
> > > IOASID.
> > >
> > > Based on the underlying IOMMU capability one device might be allowed
> > > to attach to multiple I/O address spaces, with DMAs accessing them by
> > > carrying different routing information. One of them is the default I/O
> > > address space routed by PCI Requestor ID (RID) or ARM Stream ID. The
> > > remaining are routed by RID + Process Address Space ID (PASID) or
> > > Stream+Substream ID. For simplicity the following context uses RID and
> > > PASID when talking about the routing information for I/O address spaces.
> > 
> > I'm not really clear on how this interacts with nested ioasids.  Would
> > you generally expect the RID+PASID IOASes to be children of the base
> > RID IOAS, or not?
> 
> No. With Intel SIOV both parent/children could be RID+PASID, e.g.
> when one enables vSVA on a mdev.

Hm, ok.  I really haven't understood how the PASIDs fit into this
then.  I'll try again on v2.

> > If the PASID ASes are children of the RID AS, can we consider this not
> > as the device explicitly attaching to multiple IOASIDs, but instead
> > attaching to the parent IOASID with awareness of the child ones?
> > 
> > > Device attachment is initiated through passthrough framework uAPI (use
> > > VFIO for simplicity in following context). VFIO is responsible for 
> > > identifying
> > > the routing information and registering it to the ioasid driver when 
> > > calling
> > > ioasid attach helper function. It could be RID if the assigned device is
> > > pdev (PF/VF) or RID+PASID if the device is mediated (mdev). In addition,
> > > user might also provide its view of virtual routing information (vPASID) 
> > > in
> > > the attach call, e.g. when multiple user-managed I/O address spaces are
> > > attached to the vfio_device. In this case VFIO must figure out whether
> > > vPASID should be directly used (for pdev) or converted to a kernel-
> > > allocated one (pPASID, for mdev) for physical routing (see section 4).
> > >
> > > Device must be bound to an IOASID FD before attach operation can be
> > > conducted. This is also through VFIO uAPI. In this proposal one device
> > > should not be bound to multiple FD's. Not sure about the gain of
> > > allowing it except adding 

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread David Gibson
On Fri, Jun 04, 2021 at 12:24:08PM +0200, Jean-Philippe Brucker wrote:
> On Thu, Jun 03, 2021 at 03:45:09PM +1000, David Gibson wrote:
> > > > But it would certainly be possible for a system to have two
> > > > different host bridges with two different IOMMUs with different
> > > > pagetable formats.  Until you know which devices (and therefore
> > > > which host bridge) you're talking about, you don't know what formats
> > > > of pagetable to accept.  And if you have devices from *both* bridges
> > > > you can't bind a page table at all - you could theoretically support
> > > > a kernel managed pagetable by mirroring each MAP and UNMAP to tables
> > > > in both formats, but it would be pretty reasonable not to support
> > > > that.
> > > 
> > > The basic process for a user space owned pgtable mode would be:
> > > 
> > >  1) qemu has to figure out what format of pgtable to use
> > > 
> > > Presumably it uses query functions using the device label.
> > 
> > No... in the qemu case it would always select the page table format
> > that it needs to present to the guest.  That's part of the
> > guest-visible platform that's selected by qemu's configuration.
> > 
> > There's no negotiation here: either the kernel can supply what qemu
> > needs to pass to the guest, or it can't.  If it can't qemu, will have
> > to either emulate in SW (if possible, probably using a kernel-managed
> > IOASID to back it) or fail outright.
> > 
> > > The
> > > kernel code should look at the entire device path through all the
> > > IOMMU HW to determine what is possible.
> > > 
> > > Or it already knows because the VM's vIOMMU is running in some
> > > fixed page table format, or the VM's vIOMMU already told it, or
> > > something.
> > 
> > Again, I think you have the order a bit backwards.  The user selects
> > the capabilities that the vIOMMU will present to the guest as part of
> > the qemu configuration.  Qemu then requests that of the host kernel,
> > and either the host kernel supplies it, qemu emulates it in SW, or
> > qemu fails to start.
> 
> Hm, how fine a capability are we talking about?  If it's just "give me
> VT-d capabilities" or "give me Arm capabilities" that would work, but
> probably isn't useful. Anything finer will be awkward because userspace
> will have to try combinations of capabilities to see what sticks, and
> supporting new hardware will drop compatibility for older one.

For the qemu case, I would imagine a two stage fallback:

1) Ask for the exact IOMMU capabilities (including pagetable
   format) that the vIOMMU has.  If the host can supply, you're
   good

2) If not, ask for a kernel managed IOAS.  Verify that it can map
   all the IOVA ranges the guest vIOMMU needs, and has an equal or
   smaller pagesize than the guest vIOMMU presents.  If so,
   software emulate the vIOMMU by shadowing guest io pagetable
   updates into the kernel managed IOAS.

3) You're out of luck, don't start.

For both (1) and (2) I'd expect it to be asking this question *after*
saying what devices are attached to the IOAS, based on the virtual
hardware configuration.  That doesn't cover hotplug, of course, for
that you have to just fail the hotplug if the new device isn't
supportable with the IOAS you already have.

One can imagine optimizations where for certain intermediate cases you
could do a lighter SW emu if the host supports a model that's close to
the vIOMMU one, and you're able to trap and emulate the differences.
In practice I doubt anyone's going to have time to look for such cases
and implement the logic for it.
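
To make the probe order concrete, a sketch in C. Every struct, ioctl
and helper below is hypothetical, loosely following the RFC's naming;
attach_guest_pgtable() and shadow_guest_mappings() stand in for the
two qemu backends described above, and the page-size test is a
stricter simplification of the "equal or smaller pagesize" condition:

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/types.h>

struct ioasid_hw_info { __u32 pgtable_fmt; __u64 pagesize_bitmap; };
#define IOASID_GET_INFO 0	/* placeholder ioctl number */

int attach_guest_pgtable(int fd);	/* stage 1: HW-backed vIOMMU */
int shadow_guest_mappings(int fd);	/* stage 2: SW shadowing     */

static int setup_viommu(int ioasid_fd, const struct ioasid_hw_info *want)
{
	struct ioasid_hw_info got = { 0 };

	if (ioctl(ioasid_fd, IOASID_GET_INFO, &got))
		return -ENODEV;

	if (got.pgtable_fmt == want->pgtable_fmt)
		return attach_guest_pgtable(ioasid_fd);	/* stage 1 */

	/* host must cover every page size the vIOMMU promises the guest */
	if (!(want->pagesize_bitmap & ~got.pagesize_bitmap))
		return shadow_guest_mappings(ioasid_fd);	/* stage 2 */

	return -ENODEV;					/* stage 3 */
}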

> For example depending whether the hardware IOMMU is SMMUv2 or SMMUv3, that
> completely changes the capabilities offered to the guest (some v2
> implementations support nesting page tables, but never PASID nor PRI
> unlike v3.) The same vIOMMU could support either, presenting different
> capabilities to the guest, even multiple page table formats if we wanted
> to be exhaustive (SMMUv2 supports the older 32-bit descriptor), but it
> needs to know early on what the hardware is precisely. Then some new page
> table format shows up and, although the vIOMMU can support that in
> addition to older ones, QEMU will have to pick a single one, that it
> assumes the guest knows how to drive?
> 
> I think once it binds a device to an IOASID fd, QEMU will want to probe
> what hardware features are available before going further with the vIOMMU
> setup (is there PASID, PRI, which page table formats are supported,
> address size, page granule, etc). Obtaining precise information about the
> hardware would be less awkward than trying different configurations until
> one succeeds. Binding an additional device would then fail if its pIOMMU
> doesn't support exactly the features supported for the first device,
> because we don't know which ones the guest will choose. QEMU will have to
> open a new IOASID fd for that device.

No, this fundamentally 

Re: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread David Gibson
On Thu, Jun 03, 2021 at 09:28:32AM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 03, 2021 at 03:23:17PM +1000, David Gibson wrote:
> > On Wed, Jun 02, 2021 at 01:37:53PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jun 02, 2021 at 04:57:52PM +1000, David Gibson wrote:
> > > 
> > > > I don't think presence or absence of a group fd makes a lot of
> > > > difference to this design.  Having a group fd just means we attach
> > > > groups to the ioasid instead of individual devices, and we no longer
> > > > need the bookkeeping of "partial" devices.
> > > 
> > > Oh, I think we really don't want to attach the group to an ioasid, or
> > > at least not as a first-class idea.
> > > 
> > > The fundamental problem that got us here is we now live in a world
> > > where there are many ways to attach a device to an IOASID:
> > 
> > I'm not seeing that that's necessarily a problem.
> > 
> > >  - A RID binding
> > >  - A RID,PASID binding
> > >  - A RID,PASID binding for ENQCMD
> > 
> > I have to admit I haven't fully grasped the differences between these
> > modes.  I'm hoping we can consolidate at least some of them into the
> > same sort of binding onto different IOASIDs (which may be linked in
> > parent/child relationships).
> 
> What I would like is that the /dev/iommu side managing the IOASID
> doesn't really care much, but the device driver has to tell
> drivers/iommu what it is going to do when it attaches.

By the device driver, do you mean the userspace or guest device
driver?  Or do you mean the vfio_pci or mdev "shim" device driver?

> It makes sense, in PCI terms, only the driver knows what TLPs the
> device will generate. The IOMMU needs to know what TLPs it will
> receive to configure properly.
> 
> PASID or not is major device specific variation, as is the ENQCMD/etc
> 
> Having the device be explicit when it tells the IOMMU what it is going
> to be sending is a major plus to me. I actually don't want to see this
> part of the interface be made less strong.

Ok, if I'm understanding this right, a PASID-capable IOMMU will be able
to process *both* transactions with just a RID and transactions with a
RID+PASID.

So if we're thinking of this notional 84ish-bit address space, then
that includes "no PASID" as well as all the possible PASID values.
Yes?  Or am I confused?
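
(For concreteness: a PCIe PASID is 20 bits, so a 64-bit address
qualified by a PASID gives the 84 bits, plus one extra state for
bare-RID traffic.  A purely illustrative way to write the key, with
invented names:)

/* Illustration only -- invented struct, not proposed uAPI.
 * 64-bit IOVA + 20-bit PASID = the "84ish-bit" space, with
 * pasid_valid = 0 covering the "no PASID" transactions. */
struct iommu_match_key {
	unsigned long long iova;       /* 64-bit DMA address */
	unsigned int pasid : 20;       /* PCIe PASID         */
	unsigned int pasid_valid : 1;  /* 0 = bare-RID TLP   */
};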

> 
> > > The selection of which mode to use is based on the specific
> > > driver/device operation. Ie the thing that implements the 'struct
> > > vfio_device' is the thing that has to select the binding mode.
> > 
> > I thought userspace selected the binding mode - although not all modes
> > will be possible for all devices.
> 
> /dev/iommu is concerned with setting up the IOAS and filling the IO
> page tables with information
> 
> The driver behind "struct vfio_device" is responsible to "route" its
> HW into that IOAS.
> 
> They are two halves of the problem: one is only the io page table, and the
> other is the connection of a PCI TLP to a specific io page table.
> 
> Only the driver knows what format of TLPs the device will generate so
> only the driver can specify the "route"

Ok.  I'd really like if we can encode this in a way that doesn't build
PCI-specific structure into the API, though.
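
Something along these lines, say, where the "route" is an explicit
mode rather than anything PCI-shaped.  All names below are invented
for illustration, not proposed API:

/* Sketch only: a hypothetical kernel-internal attach interface.
 * The vfio_device driver states what kind of transactions it will
 * send; /dev/iommu just fills IO pagetables and doesn't care. */
struct ioas;      /* the object behind /dev/iommu, hypothetical */
struct device;

enum ioas_attach_mode {
	IOAS_ATTACH_DEV,          /* all DMA from the device (RID, on PCI)  */
	IOAS_ATTACH_DEV_CTX,      /* one substream of it (PASID, on PCI)    */
	IOAS_ATTACH_DEV_CTX_USER, /* substream picked by CPU state (ENQCMD) */
};

struct ioas_attach_args {
	enum ioas_attach_mode mode;
	unsigned int ctx;         /* substream id; encoding is bus-specific */
};

/* called by whatever implements struct vfio_device */
int ioas_attach_device(struct ioas *ioas, struct device *dev,
		       const struct ioas_attach_args *args);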

>  
> > > eg if two PCI devices are in a group then it is perfectly fine that
> > > one device uses RID binding and the other device uses RID,PASID
> > > binding.
> > 
> > Uh... I don't see how that can be.  They could well be in the same
> > group because their RIDs cannot be distinguished from each other.
> 
> Inability to match the RID is rare; certainly I would expect that any IOMMU
> HW that can do PCIe PASID matching can also do RID matching.

It's not just up to the IOMMU.  The obvious case is a PCIe-to-PCI
bridge.  All transactions show the RID of the bridge, because vanilla
PCI doesn't have RIDs.  Same situation with a buggy multifunction
device which uses function 0's RID for all functions.

It may be rare, but we still have to deal with it one way or another.

I really don't think we want to support multiple binding types for a
single group.

> With
> such HW the above is perfectly fine - the group may not be secure
> between members (eg !ACS), but the TLPs still carry valid RIDs and
> PASID and the IOMMU can still discriminate.

They carry RIDs; whether they're valid depends on how buggy your
hardware is.

> I think you are talking about really old IOMMUs that could only
> isolate based on ingress port or something... I suppose modern PCIe has
> some cases like this in the NTB stuff too.

Depends what you mean by really old.  They may seem really old to
those working on new fancy IOMMU technology.  But I hit problems in
practice not long ago with awkward multi-device groups because it
was on a particular Dell server without an ACS implementation.  Likewise
I strongly suspect non-PASID IOMMUs will remain common on low end
hardware (like peoples' laptops) for some time.

> Oh, I hadn't spent time thinking about any of those.. It is messy but
> it can 

RE: [RFC] /dev/ioasid uAPI proposal

2021-06-08 Thread Parav Pandit
Hi Jacob,

Sorry for the late response. Was on PTO on Friday last week.
Please see comments below.

> From: Jacob Pan 
> Sent: Friday, June 4, 2021 2:28 AM
> 
> Hi Parav,
> 
> On Tue, 1 Jun 2021 17:30:51 +, Parav Pandit  wrote:
> 
> > > From: Tian, Kevin 
> > > Sent: Thursday, May 27, 2021 1:28 PM
> >
> > > 5.6. I/O page fault
> > > +++
> > >
> > > (uAPI is TBD. This is just the high-level flow from host IOMMU
> > > driver to guest IOMMU driver and back.)
> > >
> > > -   Host IOMMU driver receives a page request with raw fault_data {rid,
> > > pasid, addr};
> > >
> > > -   Host IOMMU driver identifies the faulting I/O page table according
> > > to information registered by IOASID fault handler;
> > >
> > > -   IOASID fault handler is called with raw fault_data (rid, pasid,
> > > addr), which is saved in ioasid_data->fault_data (used for
> > > response);
> > >
> > > -   IOASID fault handler generates a user fault_data (ioasid, addr),
> > > links it to the shared ring buffer and triggers eventfd to
> > > userspace;
> > >
> > > -   Upon receiving the event, Qemu needs to find the virtual routing
> > > information (v_rid + v_pasid) of the device attached to the faulting
> > > ioasid. If there are multiple, pick a random one. This should be
> > > fine since the purpose is to fix the I/O page table on the guest;
> > >
> > > -   Qemu generates a virtual I/O page fault through vIOMMU into guest,
> > > carrying the virtual fault data (v_rid, v_pasid, addr);
> > >
> > Why does it have to be through vIOMMU?
> I think this flow is for a fully emulated IOMMU, where the same IOMMU and
> device drivers run in the host and guest. The page request interrupt is
> reported by the IOMMU, and is thus reported to the vIOMMU in the guest.
In the non-emulated case, how will the guest's page fault be handled?
If I take the Intel example, I thought FL page table entries still need to be
handled by the guest, which in turn fills up the second-level page table
entries.
No?
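
(To make the quoted flow concrete, the user-visible records could look
something like the sketch below.  All field names are invented; the
raw rid/pasid stays in the kernel, and userspace sees only the
ioasid-based view:)

/* Sketch only: hypothetical layout of what the IOASID fault handler
 * pushes to the shared ring buffer before signalling the eventfd. */
struct ioasid_fault_event {
	unsigned int ioasid;
	unsigned long long addr;
	unsigned int flags;      /* read/write/exec, etc.              */
	unsigned int cookie;     /* pairs this fault with its response */
};

/* ...and what Qemu hands back via the completion ioctl after the
 * guest responds with (v_rid, v_pasid, response_code). */
struct ioasid_fault_response {
	unsigned int cookie;
	unsigned int response_code;  /* success / invalid / failure */
};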

> 
> > For a VFIO PCI device, have you considered to reuse the same PRI
> > interface to inject page fault in the guest? This eliminates any new
> > v_rid. It will also route the page fault request and response through
> > the right vfio device.
> >
> I am curious how PCI PRI can be used to inject faults. Are you talking
> about the PCI config PRI extended capability structure?
The PCI PRI capability only exposes page fault support.
Page fault injection/response cannot happen through the PCI cap anyway;
this requires a side channel.
I was suggesting emulating the hypervisor's
pci_endpoint->rc->iommu->iommu_irq path as

vmm->guest_emulated_pri_device->pri_req/rsp queue(s).


> The control is very
> limited, only enable and reset. Can you explain how page faults would be
> handled in a generic PCI cap?
Not via the PCI cap.
Through a more generic interface, without attaching to the vIOMMU.

> Some devices may have a device-specific way to handle page faults, but I
> guess this is not the PCI PRI method you are referring to?
This was my next question: if the page fault reporting and response interface
is generic, it will be more scalable, given that PCI PRI is limited to
single-page requests.
Additionally, VT-d seems to funnel all the page fault interrupts through a
single IRQ.
And thirdly, PRI requires everything to come through the hypervisor as an
intermediary.

Having a generic mechanism will help overcome the above limitations; as Jean
already pointed out, page fault handling is a hot path.
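
(Purely as illustration of such a generic per-device channel -- every
name below is invented, and nothing here is proposed uAPI:)

/* Sketch only: a per-device fault req/rsp queue pair, decoupled from
 * any vIOMMU model. */
struct pf_request {
	unsigned long long addr;
	unsigned int nr_pages;   /* not limited to one page, unlike PRI */
	unsigned int ctx;        /* PASID-like context id, if any       */
	unsigned int tag;        /* pairs a request with its response   */
};

struct pf_response {
	unsigned int tag;
	unsigned int code;       /* success / invalid / failure         */
};

/* One req ring + one rsp ring per device, each with its own doorbell
 * or interrupt, so faults neither funnel through a single IOMMU IRQ
 * nor have to detour through vIOMMU emulation. */
struct pf_queue_pair {
	struct pf_request  *req;
	struct pf_response *rsp;
	unsigned int req_head, req_tail;
	unsigned int rsp_head, rsp_tail;
};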

> 
> > > -   Guest IOMMU driver fixes up the fault, updates the I/O page table,
> > > and then sends a page response with virtual completion data (v_rid,
> > > v_pasid, response_code) to vIOMMU;
> > >
> > What about fixing up the fault in the mmu page table as well in the guest?
> > Or did you mean both when you said "updates the I/O page table" above?
> >
> > It is unclear to me whether there is a single nested page table
> > maintained or two (one for cr3 references and the other for the iommu).
> > Can you please clarify?
> >
> I think it is just one, at least for VT-d; the guest cr3 in GPA is stored in
> the host iommu. The guest iommu driver calls handle_mm_fault to fix the mmu
> page tables, which are shared with the iommu.
> 
So if the guest has touched the page data, the FL and SL entries of the mmu
should be populated, and the IOMMU side should not even reach the point of
raising a PRI (ATS should be enough), because the IOMMU side shares the same
FL and SL table entries referred to by the scalable-mode PASID-table entry
format described in Section 9.6.
Is that correct?

> > > -   Qemu finds the pending fault event, converts virtual completion data
> > > into (ioasid, response_code), and then calls a /dev/ioasid ioctl to
> > > complete the pending fault;
> > >
> > For a VFIO PCI device a virtual PRI request/response interface is done;
> > it can be a generic interface among multiple vIOMMUs.
> >
> same question as above; not sure how this works in terms of interrupts and
> response queuing, etc.
> 
Citing "VFIO PCI device" was wrong on my part.
Was