Re: [PATCH v7 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-07-05 Thread Jason Gunthorpe
On Tue, Jul 05, 2022 at 06:12:40PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 05, 2022 at 10:51:02AM -0300, Jason Gunthorpe wrote:
> > > In fact I'm not even sure this should be a character device, it seems
> > > to fit it way better with the PCI sysfs hierarchy, just like how we
> > > map MMIO resources, which these are anyway.  And once it is on sysfs
> > > we do have a unique inode and need none of the pseudofs stuff, and
> > > don't need all the glue code in nvme either.
> > 
> > Shouldn't there be an allocator here? It feels a bit weird that the
> > entire CMB is given to a single process, it is a sharable resource,
> > isn't it?
> 
> Making the entire area given by the device to the p2p allocator available
> to user space seems sensible to me.  That is what the current series does,
> and what a sysfs interface would do as well.

That makes opening the mmap exclusive with the in-kernel allocator -
so it means opening the mmap fails if something else is using a P2P
page and once the mmap is open all kernel side P2P allocations will
fail?

Which seems inelegant, I would expect the mmap operation to
request some pages from the P2P allocator and provide them to
userspace so user and kernel workflows can co-exist using the same
CMB.
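Concretely, the co-existence model argued for here is just a shared pool with per-request accounting: each mmap open and each kernel user draws its own pages, and neither takes the whole CMB exclusively. A minimal user-space sketch under assumed names (`cmb_pool`, `cmb_alloc_pages` are illustrative, not the kernel's p2pdma API):

```c
/* Toy model of the suggested flow: userspace mmaps and in-kernel P2P
 * users both allocate from one shared CMB page pool, so neither path
 * has to take exclusive ownership of the whole region. */
#include <stddef.h>

struct cmb_pool {
	size_t free_pages;
};

/* Grant the request only if it fits; partial grants are not attempted. */
static size_t cmb_alloc_pages(struct cmb_pool *pool, size_t want)
{
	if (want > pool->free_pages)
		return 0;
	pool->free_pages -= want;
	return want;
}

static void cmb_free_pages(struct cmb_pool *pool, size_t n)
{
	pool->free_pages += n;
}
```

With this shape, an mmap that asks for more than is free simply fails, without shutting out kernel-side allocations that are already running.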

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-07-05 Thread Jason Gunthorpe
On Tue, Jul 05, 2022 at 09:51:08AM +0200, Christoph Hellwig wrote:

> But what also really matters here:  I don't want every user that
> wants to be able to mmap a character device to do all this work.
> The layering is simply wrong, it needs some character device
> based helpers, not open coding everywhere.

I think a lot of (all?) cases would be happy if the inode was 1:1 with the
cdev struct device. I suppose the cdev code would still have to create a
pseudo fs, but at least that is hidden.

> In fact I'm not even sure this should be a character device, it seems
> to fit it way better with the PCI sysfs hierarchy, just like how we
> map MMIO resources, which these are anyway.  And once it is on sysfs
> we do have a unique inode and need none of the pseudofs stuff, and
> don't need all the glue code in nvme either.

Shouldn't there be an allocator here? It feels a bit weird that the
entire CMB is given to a single process, it is a sharable resource,
isn't it?

Jason


Re: [PATCH kernel] powerpc/iommu: Add simple iommu_ops to report capabilities

2022-07-05 Thread Jason Gunthorpe via iommu
On Tue, Jul 05, 2022 at 04:22:35PM +1000, Alexey Kardashevskiy wrote:

> I have not looked into the domains for ages, what is missing here? With this
> on top of 5.19-rc1 VFIO works again on my POWER9 box. Thanks,

Does this solve all the problems or just coherency? It seems like it
should solve everything now as there will be an IOMMU_DOMAIN_BLOCKED
and the ref logic will succeed to assign it?

> +static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
> +{
> + struct iommu_domain *domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> +
> + if (!domain)
> + return NULL;

This should only succeed if type is IOMMU_DOMAIN_BLOCKED
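The shape being asked for can be sketched as below; `TOY_DOMAIN_*` and `toy_tce_domain_alloc` are illustrative stand-ins, not the kernel's definitions:

```c
/* Sketch of the requested behaviour: the allocation hook refuses any
 * domain type other than the blocked one, since that is the only kind
 * this driver can actually enforce. */
#include <stdlib.h>

enum toy_domain_type { TOY_DOMAIN_BLOCKED, TOY_DOMAIN_UNMANAGED, TOY_DOMAIN_DMA };

struct toy_domain { enum toy_domain_type type; };

static struct toy_domain *toy_tce_domain_alloc(enum toy_domain_type type)
{
	if (type != TOY_DOMAIN_BLOCKED)
		return NULL;		/* only blocking domains are supported */

	struct toy_domain *d = calloc(1, sizeof(*d));
	if (d)
		d->type = type;
	return d;
}
```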

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> + struct iommu_group *grp = dev->iommu_group;
> +
> + if (!grp)
> + grp = ERR_PTR(-ENODEV);

It looks like this should just always fail since the core code already
checks iommu_group before calling this? (Arguably ppc should be
refactored to use the normal probe_device and device_group ops to
create groups, but that doesn't seem critical for this.)

Thanks
Jason


Re: [PATCH v3 4/4] vfio: Require that devices support DMA cache coherence

2022-07-04 Thread Jason Gunthorpe via iommu
On Mon, Jul 04, 2022 at 09:21:26PM +0800, chenxiang (M) wrote:
> Hi,
> 
> We encounter an issue with the patch: our platform is ARM64, and we run DPDK
> with smmu disable on VM (without iommu=smmuv3 etc),
> 
> so we use noiommu mode with enable_unsafe_noiommu_mode=1 to passthrough the
> device to VM with following steps (those steps are on VM) :
> 
> insmod vfio.ko enable_unsafe_noiommu_mode=1
> insmod vfio_virqfd.ko
> insmod vfio-pci-core.ko
> insmod vfio-pci.ko
> insmod vfio_iommu_type1.ko
> 
> echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
> echo 0000:00:02.0 > /sys/bus/pci/drivers_probe -- failed
> 
> I find that vfio-pci device is not probed because of the additional check.
> It works well without this patch.
> 
> Do we need to skip the check if enable_unsafe_noiommu_mode=1?

Yes, that is definitely an unintended mistake.

Thanks,
Jason


Re: [PATCH v7 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-06-29 Thread Jason Gunthorpe
On Wed, Jun 29, 2022 at 10:00:09AM -0600, Logan Gunthorpe wrote:
> 
> 
> 
> On 2022-06-29 00:48, Christoph Hellwig wrote:
> > On Wed, Jun 15, 2022 at 10:12:32AM -0600, Logan Gunthorpe wrote:
> >> A pseudo mount is used to allocate an inode for each PCI device. The
> >> inode's address_space is used in the file doing the mmap so that all
> >> VMAs are collected and can be unmapped if the PCI device is unbound.
> >> After unmapping, the VMAs are iterated through and their pages are
> >> put so the device can continue to be unbound. An active flag is used
> >> to signal to VMAs not to allocate any further P2P memory once the
> >> removal process starts. The flag is synchronized with concurrent
> >> access with an RCU lock.
> > 
> > Can't we come up with a way of doing this without all the pseudo-fs
> > garbagage?  I really hate all the overhead for that in the next
> > nvme patch as well.
> 
> I assume you still want to be able to unmap the VMAs on unbind and not
> just hang?
> 
> I'll see if I can come up with something to do the a similar thing using
> vm_private data or some such.

I've tried in the past, this is not a good idea. There is no way to
handle failures when a VMA is dup'd and if you rely on private_data
you almost certainly have to alloc here.

Then there is the issue of making the locking work on invalidation
which is crazy ugly.

> I was not a fan of the extra code for this either, but I was given to
> understand that it was the standard way to collect and cleanup VMAs.

Christoph, you tried to clean this up once globally, what happened to
that?

All that is needed here is a way to get a unique inode for the PCI
memory.

Jason


Re: [PATCH v3 2/2] vfio: Use device_iommu_capable()

2022-06-24 Thread Jason Gunthorpe via iommu
On Fri, Jun 24, 2022 at 06:59:35PM +0100, Robin Murphy wrote:
> Use the new interface to check the capabilities for our device
> specifically.
> 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/vfio/vfio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH v3 1/2] vfio/type1: Simplify bus_type determination

2022-06-24 Thread Jason Gunthorpe via iommu
On Fri, Jun 24, 2022 at 06:51:44PM +0100, Robin Murphy wrote:
> Since IOMMU groups are mandatory for drivers to support, it stands to
> reason that any device which has been successfully added to a group
> must be on a bus supported by that IOMMU driver, and therefore a domain
> viable for any device in the group must be viable for all devices in
> the group. This already has to be the case for the IOMMU API's internal
> default domain, for instance. Thus even if the group contains devices on
> different buses, that can only mean that the IOMMU driver actually
> supports such an odd topology, and so without loss of generality we can
> expect the bus type of any device in a group to be suitable for IOMMU
> API calls.
> 
> Furthermore, scrutiny reveals a lack of protection for the bus being
> removed while vfio_iommu_type1_attach_group() is using it; the reference
> that VFIO holds on the iommu_group ensures that data remains valid, but
> does not prevent the group's membership changing underfoot.
> 
> We can address both concerns by recycling vfio_bus_type() into some
> superficially similar logic to indirect the IOMMU API calls themselves.
> Each call is thus protected from races by the IOMMU group's own locking,
> and we no longer need to hold group-derived pointers beyond that scope.
> It also gives us an easy path for the IOMMU API's migration of bus-based
> interfaces to device-based, of which we can already take the first step
> with device_iommu_capable(). As with domains, any capability must in
> practice be consistent for devices in a given group - and after all it's
> still the same capability which was expected to be consistent across an
> entire bus! - so there's no need for any complicated validation.
> 
> Signed-off-by: Robin Murphy 
> ---
> 
> v3: Complete rewrite yet again, and finally it doesn't feel like we're
> stretching any abstraction boundaries the wrong way, and the diffstat
> looks right too. I did think about embedding IOMMU_CAP_INTR_REMAP
> directly in the callback, but decided I like the consistency of minimal
> generic wrappers. And yes, if the capability isn't supported then it
> does end up getting tested for the whole group, but meh, it's harmless.

I think this is really weird, but it works and isn't worth debating
further, so OK:

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH v3 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-24 Thread Jason Gunthorpe via iommu
On Fri, Jun 24, 2022 at 03:28:33PM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 23, 2022 at 01:00:27PM -0700, Nicolin Chen wrote:
> > The domain->ops validation was added, as a precaution, for mixed-driver
> > systems.
> > 
> > Per Robin's remarks,
> > * While bus_set_iommu() still exists, the core code prevents multiple
> >   drivers from registering, so we can't really run into a situation of
> >   having a mixed-driver system:
> >   https://lore.kernel.org/kvm/6e1280c5-4b22-ebb3-3912-6c72bc169...@arm.com/
> > 
> > * And there's plenty more significant problems than this to fix; in future
> >   when many can be permitted, we will rely on the IOMMU core code to check
> >   the domain->ops:
> >   https://lore.kernel.org/kvm/6575de6d-94ba-c427-5b1e-967750ddf...@arm.com/
> > 
> > So remove the check in VFIO for simplicity.
> 
> As discussed we can't remove this check, multiple iommus on different
> busses are allowed today and VFIO does not restrict its attempts to
> attach a pre-existing domain to a single group or bus.
> 
> Please go back to the original version with the ops check in the core
> code.

Robin convinced me, so never mind :)

Jason


Re: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-24 Thread Jason Gunthorpe via iommu
On Fri, Jun 24, 2022 at 07:31:47PM +0100, Robin Murphy wrote:

> > > Oh, physical platforms with mixed IOMMUs definitely exist already. The 
> > > main
> > > point is that while bus_set_iommu still exists, the core code effectively
> > > *does* prevent multiple drivers from registering - even in emulated cases
> > > like the example above, virtio-iommu and VT-d would both try to
> > > bus_set_iommu(&pci_bus_type), and one of them will lose. The aspect which
> > > might warrant clarification is that there's no combination of supported
> > > drivers which claim non-overlapping buses *and* could appear in the same
> > > system - even if you tried to contrive something by emulating, say, VT-d
> > > (PCI) alongside rockchip-iommu (platform), you could still only describe 
> > > one
> > > or the other due to ACPI vs. Devicetree.
> > 
> > Right, and that is still something we need to protect against with
> > this ops check. VFIO is not checking that the bus's are the same
> > before attempting to re-use a domain.
> > 
> > So it is actually functional and does protect against systems with
> > multiple iommu drivers on different busses.
> 
> But as above, which systems *are* those? 

IDK, it seems wrong that the system today will allow different buses to
have different IOMMU drivers and not provide a trivial protection
check.

> FWIW my iommu/bus dev branch has got as far as the final bus ops removal and
> allowing multiple driver registrations, and before it allows that, it does
> now have the common attach check that I sketched out in the previous
> discussion of this.

If you want to put the check in your series that seems fine too, as
long as we get it in the end.

> It's probably also noteworthy that domain->ops is no longer the same
> domain->ops that this code was written to check, and may now be different
> between domains from the same driver.

Yes, the vfio check is not good anymore.

Jason


Re: [PATCH v3 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-24 Thread Jason Gunthorpe via iommu
On Thu, Jun 23, 2022 at 01:00:27PM -0700, Nicolin Chen wrote:
> The domain->ops validation was added, as a precaution, for mixed-driver
> systems.
> 
> Per Robin's remarks,
> * While bus_set_iommu() still exists, the core code prevents multiple
>   drivers from registering, so we can't really run into a situation of
>   having a mixed-driver system:
>   https://lore.kernel.org/kvm/6e1280c5-4b22-ebb3-3912-6c72bc169...@arm.com/
> 
> * And there's plenty more significant problems than this to fix; in future
>   when many can be permitted, we will rely on the IOMMU core code to check
>   the domain->ops:
>   https://lore.kernel.org/kvm/6575de6d-94ba-c427-5b1e-967750ddf...@arm.com/
> 
> So remove the check in VFIO for simplicity.

As discussed we can't remove this check, multiple iommus on different
busses are allowed today and VFIO does not restrict its attempts to
attach a pre-existing domain to a single group or bus.

Please go back to the original version with the ops check in the core
code.

Jason


Re: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-24 Thread Jason Gunthorpe via iommu
On Fri, Jun 24, 2022 at 06:35:49PM +0800, Yong Wu wrote:

> > > It's not used in VFIO context. "return 0" just satisfy the iommu
> > > framework to go ahead. and yes, here we only allow the shared
> > > "mapping-domain" (All the devices share a domain created
> > > internally).

What part of the iommu framework is trying to attach a domain and
wants to see success when the domain was not actually attached?

> > What prevents this driver from being used in VFIO context?
> 
> Nothing prevents this. I just didn't test it.

This is why it is wrong to return success here.

Jason


Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-24 Thread Jason Gunthorpe via iommu
On Fri, Jun 24, 2022 at 08:28:31AM -0600, Alex Williamson wrote:

> > > That's essentially what I'm suggesting, the vfio_group is passed as an
> > > opaque pointer which type1 can use for a
> > > vfio_group_for_each_vfio_device() type call.  Thanks,  
> > 
> > I don't want to add a whole vfio_group_for_each_vfio_device()
> > machinery that isn't actually needed by anything. This is all
> > internal, we don't need to design more than exactly what is needed.
> > 
> > At this point if we change the signature of the attach then we may as
> > well just pass in the representative vfio_device, that is probably
> > less LOC overall.
> 
> That means that vfio core still needs to pick an arbitrary
> representative device, which I find in fundamental conflict to the
> nature of groups.

Well, this is where iommu is going, I think Robin has explained this
view well enough.

Ideally we'd move VFIO away from trying to attach groups and attach
when the device FD is opened, I view this as a micro step in that
direction.

> Type1 is the interface to the IOMMU API, if through the IOMMU API we
> can make an assumption that all devices within the group are
> equivalent for a given operation, that should be done in type1 code,
> not in vfio core.

iommu_group is part of the core code, if the representative device
assumption stems from the iommu_group then the core code can safely
make it.

> A for-each interface is commonplace and not significantly more code
> or design than already proposed.

Except that someone else might get the idea to use it for something
completely inappropriate.

Jason


Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-24 Thread Jason Gunthorpe via iommu
On Fri, Jun 24, 2022 at 08:11:59AM -0600, Alex Williamson wrote:
> On Thu, 23 Jun 2022 22:50:30 -0300
> Jason Gunthorpe  wrote:
> 
> > On Thu, Jun 23, 2022 at 05:00:44PM -0600, Alex Williamson wrote:
> > 
> > > > >> +struct vfio_device *vfio_device_get_from_iommu(struct iommu_group 
> > > > >> *iommu_group)
> > > > >> +{
> > > > >> +struct vfio_group *group = 
> > > > >> vfio_group_get_from_iommu(iommu_group);
> > > > >> +struct vfio_device *device;
> > > > > 
> > > > > Check group for NULL.
> > > > 
> > > > OK - FWIW in context this should only ever make sense to call with an 
> > > > iommu_group which has already been derived from a vfio_group, and I did 
> > > > initially consider a check with a WARN_ON(), but then decided that the 
> > > > unguarded dereference would be a sufficiently strong message. No 
> > > > problem 
> > > > with bringing that back to make it more defensive if that's what you 
> > > > prefer.  
> > > 
> > > A while down the road, that's a bit too much implicit knowledge of the
> > > intent and single purpose of this function just to simply avoid a test.  
> > 
> > I think we should just pass the 'struct vfio_group *' into the
> > attach_group op and have this API take that type in and forget the
> > vfio_group_get_from_iommu().
> 
> That's essentially what I'm suggesting, the vfio_group is passed as an
> opaque pointer which type1 can use for a
> vfio_group_for_each_vfio_device() type call.  Thanks,

I don't want to add a whole vfio_group_for_each_vfio_device()
machinery that isn't actually needed by anything. This is all
internal, we don't need to design more than exactly what is needed.

At this point if we change the signature of the attach then we may as
well just pass in the representative vfio_device, that is probably
less LOC overall.

Jason


Re: [PATCH 1/2] vfio/type1: Simplify bus_type determination

2022-06-24 Thread Jason Gunthorpe via iommu
On Tue, Jun 21, 2022 at 08:09:20PM +0100, Robin Murphy wrote:

> Quick consensus then: does anyone have a particular preference between
> changing the .attach_group signature vs. adding a helper based on
> vfio_group_get_from_iommu() for type1 to call from within its callback? They
> seem about equal (but opposite) in terms of the simplicity vs. impact
> tradeoff to me, so I can't quite decide conclusively...

Ah, I've been away and only just saw this..

Given Alex's remarks and the need to re-get the vfio_group in the
helper, it may be very slightly nicer to pass in the vfio_device as an
argument. But either works for me.

Jason


Re: [PATCH 1/3] iommu: Add Visconti5 IOMMU driver

2022-06-24 Thread Jason Gunthorpe via iommu
On Mon, Jun 20, 2022 at 05:49:13AM +0000, nobuhiro1.iwama...@toshiba.co.jp 
wrote:
> Hi,
> 
> Thanks for your review.
> 
> > -Original Message-----
> > From: Jason Gunthorpe 
> > Sent: Thursday, May 26, 2022 3:27 AM
> > To: Baolu Lu 
> > Cc: iwamatsu nobuhiro(岩松 信洋 □SWC◯ACT)
> > ; Joerg Roedel ; Will
> > Deacon ; Rob Herring ;
> > devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> > iommu@lists.linux-foundation.org; ishikawa yuji(石川 悠司 ○RDC□AIT
> > C○EA開) ;
> > linux-arm-ker...@lists.infradead.org
> > Subject: Re: [PATCH 1/3] iommu: Add Visconti5 IOMMU driver
> > 
> > On Wed, May 25, 2022 at 02:26:37PM +0800, Baolu Lu wrote:
> > > On 2022/5/25 09:31, Nobuhiro Iwamatsu wrote:
> > > > +static const struct iommu_ops visconti_atu_ops = {
> > > > +   .domain_alloc = visconti_atu_domain_alloc,
> > > > +   .probe_device = visconti_atu_probe_device,
> > > > +   .release_device = visconti_atu_release_device,
> > > > +   .device_group = generic_device_group,
> > > > +   .of_xlate = visconti_atu_of_xlate,
> > > > +   .pgsize_bitmap = ATU_IOMMU_PGSIZE_BITMAP,
> > > > +   .default_domain_ops = &(const struct iommu_domain_ops) {
> > > > +   .attach_dev = visconti_atu_attach_device,
> > > > +   .detach_dev = visconti_atu_detach_device,
> > >
> > > The detach_dev callback is about to be deprecated. The new drivers
> > > should implement the default domain and blocking domain instead.
> > 
> > Yes please, new drivers need to use default_domains.
> > 
> > It is very strange that visconti_atu_detach_device() does nothing.  It is 
> > not
> > required that a domain is fully unmapped before being destructed, I think
> > detach should set ATU_AT_EN to 0.
> 
> I see, I rethink implementation.
> 
> > 
> > What behavior does the HW have when ATU_AT_ENTRY_EN == 0? If DMA is
> > rejected then this driver should have a IOMMU_DOMAIN_BLOCKING and
> > return that from ops->def_domain_type().
> 
> If ATU_AT_ENTRY_EN is 0, nothing happens. It does not work with IOMMU,
> it works with the memory space set in device tree.

So that would be an assignment to DOMAIN_IDENTITY
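In driver terms, the observation above maps to a def_domain_type-style hook that reports an identity default domain when the ATU is disabled, since DMA then hits raw memory space untranslated. A hedged toy sketch with stand-in names (not the real iommu_ops plumbing):

```c
/* Translation disabled => DMA passes through untranslated, which is
 * exactly what an identity default domain describes. */
enum toy_domain_type { TOY_DOMAIN_IDENTITY = 1, TOY_DOMAIN_DMA = 2 };

static enum toy_domain_type toy_def_domain_type(int atu_at_entry_en)
{
	return atu_at_entry_en ? TOY_DOMAIN_DMA : TOY_DOMAIN_IDENTITY;
}
```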

> > I'm feeling like these "special" drivers need some kind of handshake with 
> > their
> > only users because they don't work with things like VFIO..
> 
> Since the devices that utilize this IOMMU function are fixed, I do
> not think that a special handshake is required.  Could you you tell
> me where you thought you needed a handshake?

In this case the iommu driver is so limited that it will not work with
VFIO - it is only meant to be used with the fixed drivers that are
paired with it.

Ideally we'd prevent VFIO from connecting and only allow drivers that
know the limitations of the IOMMU to use the unmanaged domain.

Jason

Re: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-24 Thread Jason Gunthorpe via iommu
On Wed, Jun 22, 2022 at 08:54:45AM +0100, Robin Murphy wrote:
> On 2022-06-16 23:23, Nicolin Chen wrote:
> > On Thu, Jun 16, 2022 at 06:40:14AM +0000, Tian, Kevin wrote:
> > 
> > > > The domain->ops validation was added, as a precaution, for mixed-driver
> > > > systems. However, at this moment only one iommu driver is possible. So
> > > > remove it.
> > > 
> > > It's true on a physical platform. But I'm not sure whether a virtual 
> > > platform
> > > is allowed to include multiple e.g. one virtio-iommu alongside a virtual 
> > > VT-d
> > > or a virtual smmu. It might be clearer to claim that (as Robin pointed 
> > > out)
> > > there is plenty more significant problems than this to solve instead of 
> > > simply
> > > saying that only one iommu driver is possible if we don't have explicit 
> > > code
> > > to reject such configuration. 
> > 
> > Will edit this part. Thanks!
> 
> Oh, physical platforms with mixed IOMMUs definitely exist already. The main
> point is that while bus_set_iommu still exists, the core code effectively
> *does* prevent multiple drivers from registering - even in emulated cases
> like the example above, virtio-iommu and VT-d would both try to
> bus_set_iommu(&pci_bus_type), and one of them will lose. The aspect which
> might warrant clarification is that there's no combination of supported
> drivers which claim non-overlapping buses *and* could appear in the same
> system - even if you tried to contrive something by emulating, say, VT-d
> (PCI) alongside rockchip-iommu (platform), you could still only describe one
> or the other due to ACPI vs. Devicetree.

Right, and that is still something we need to protect against with
this ops check. VFIO is not checking that the bus's are the same
before attempting to re-use a domain.

So it is actually functional and does protect against systems with
multiple iommu drivers on different busses.

Jason

Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Jason Gunthorpe via iommu
On Wed, Jun 22, 2022 at 01:04:11PM +0100, Robin Murphy wrote:

> +struct vfio_device *vfio_device_get_from_iommu(struct iommu_group 
> *iommu_group)
> +{
> + struct vfio_group *group = vfio_group_get_from_iommu(iommu_group);
> + struct vfio_device *device;
> +
> + mutex_lock(&group->device_lock);
> + list_for_each_entry(device, &group->device_list, group_next) {
> + if (vfio_device_try_get(device)) {
> + mutex_unlock(&group->device_lock);
> + return device;
> + }
> + }
> + mutex_unlock(&group->device_lock);
> + return NULL;
> +}

FWIW, I have no objection to this general approach, and I don't think
we should make any broader API just for this.

Though I might call it something like
'vfio_get_group_representor_device()' which more strongly suggests
what it is only used for.

Thanks,
Jason


Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Jason Gunthorpe via iommu
On Thu, Jun 23, 2022 at 05:00:44PM -0600, Alex Williamson wrote:

> > >> +struct vfio_device *vfio_device_get_from_iommu(struct iommu_group 
> > >> *iommu_group)
> > >> +{
> > >> +struct vfio_group *group = 
> > >> vfio_group_get_from_iommu(iommu_group);
> > >> +struct vfio_device *device;  
> > > 
> > > Check group for NULL.  
> > 
> > OK - FWIW in context this should only ever make sense to call with an 
> > iommu_group which has already been derived from a vfio_group, and I did 
> > initially consider a check with a WARN_ON(), but then decided that the 
> > unguarded dereference would be a sufficiently strong message. No problem 
> > with bringing that back to make it more defensive if that's what you prefer.
> 
> A while down the road, that's a bit too much implicit knowledge of the
> intent and single purpose of this function just to simply avoid a test.

I think we should just pass the 'struct vfio_group *' into the
attach_group op and have this API take that type in and forget the
vfio_group_get_from_iommu().

At this point there is little justification for
vfio_group_get_from_iommu() existing at all, it should be folded into
the one use in vfio_group_find_or_alloc() and the locking widened so
we don't have the unlock/alloc/lock race that requires it to be called
twice.
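The unlock/alloc/lock race mentioned here is the classic find-or-alloc pattern: the lock is dropped around the (sleeping) allocation, so the lookup must be repeated after relocking because another caller may have inserted the group meanwhile. A toy user-space sketch, with names that are illustrative rather than the real VFIO symbols:

```c
/* find-or-alloc with the lock dropped around allocation: the second
 * toy_find() after relocking is what resolves the race. */
#include <pthread.h>
#include <stdlib.h>

struct toy_group { int id; struct toy_group *next; };

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static struct toy_group *reg_head;

static struct toy_group *toy_find(int id)	/* caller holds reg_lock */
{
	for (struct toy_group *g = reg_head; g; g = g->next)
		if (g->id == id)
			return g;
	return NULL;
}

struct toy_group *toy_group_find_or_alloc(int id)
{
	pthread_mutex_lock(&reg_lock);
	struct toy_group *g = toy_find(id);
	pthread_mutex_unlock(&reg_lock);
	if (g)
		return g;

	/* Allocate without the lock held (mirrors a sleeping allocation). */
	struct toy_group *newg = calloc(1, sizeof(*newg));
	if (!newg)
		return NULL;
	newg->id = id;

	pthread_mutex_lock(&reg_lock);
	g = toy_find(id);		/* re-check: we may have lost a race */
	if (!g) {
		newg->next = reg_head;
		reg_head = newg;
		g = newg;
		newg = NULL;
	}
	pthread_mutex_unlock(&reg_lock);
	free(newg);			/* lost the race: drop our copy */
	return g;
}
```

Widening the lock over the whole sequence, as suggested, removes the need for the second lookup entirely.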

> I'd lean towards Kevin's idea that we could store bus_type on the
> vfio_group and pass that to type1, with the same assumptions we're
> making in the commit log that it's consistent, but that doesn't get us
> closer to the long term plan of dropping the bus_type interfaces
> AIUI.

Right, the point is to get a representative struct device here to use.

Jason


Re: [PATCH v2 1/2] vfio/type1: Simplify bus_type determination

2022-06-23 Thread Jason Gunthorpe via iommu
On Thu, Jun 23, 2022 at 01:23:05PM +0100, Robin Murphy wrote:

> So yes, technically we could implement an iommu_group_capable() and an
> iommu_group_domain_alloc(), which would still just internally resolve the
> IOMMU ops and instance data from a member device to perform the driver-level
> call, but once again it would be for the benefit of precisely one
> user. 

Benefit one user and come with a fairly complex locking situation to
boot.

Alex, I'd rather think about moving the type 1 code so that the iommu
attach happens during device FD creation (then we have a concrete
non-fake device), not during group FD opening.

That is the model we need for iommufd anyhow.

Jason


Re: New IOMMU mailing list coming

2022-06-23 Thread Jason Gunthorpe via iommu
On Tue, Jun 14, 2022 at 10:30:20AM +0200, Jörg Rödel wrote:
> Hi all,
> 
> after several problems with the current IOMMU mailing list (no DKIM
> support, poor b4 interoperability) we have decided to move the IOMMU
> development discussions to a new list hosted at lists.linux.dev.
> 
> The new list is up and running already, to subscribe please send an
> email to iommu+subscr...@lists.linux.dev. It is not yet archived, but
> there will be a hard switch between the old and the new list on
> 
>   July 5th, 2022
> 
> After that date the old list will no longer work and the archive will
> switch to the new mailing list.

Nice! Thanks for doing it

Jason

Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support

2022-06-19 Thread Jason Gunthorpe via iommu
On Thu, Jun 16, 2022 at 10:27:02AM +0800, Baolu Lu wrote:
> On 2022/6/15 23:40, Jason Gunthorpe wrote:
> > On Fri, Jun 10, 2022 at 01:37:20PM +0800, Baolu Lu wrote:
> > > On 2022/6/9 20:49, Jason Gunthorpe wrote:
> > > > > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > > > > + struct list_head *pages)
> > > > > +{
> > > > > + struct page *page, *next;
> > > > > +
> > > > > + if (!domain->concurrent_traversal) {
> > > > > + put_pages_list(pages);
> > > > > + return;
> > > > > + }
> > > > > +
> > > > > + list_for_each_entry_safe(page, next, pages, lru) {
> > > > > + list_del(&page->lru);
> > > > > + call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> > > > > + }
> > > > It seems OK, but I wonder if there is benefit to using
> > > > put_pages_list() from the rcu callback
> > > 
> > > The price is that we need to allocate a "struct list_head" and free it
> > > in the rcu callback as well. Currently the list_head is sitting in the
> > > stack.
> > 
> > You'd have to use a different struct page layout so that the list_head
> > was in the struct page and didn't overlap with the rcu_head
> 
> Okay, let me head this direction in the next version.

I'm not sure it is worth it performance wise, would be interesting to
know perhaps. But see my prior email about how slab has a custom
struct page.

Jason


Re: [PATCH v2 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-19 Thread Jason Gunthorpe via iommu
On Fri, Jun 17, 2022 at 04:07:20PM -0700, Nicolin Chen wrote:

> > > > > + vfio_iommu_aper_expand(iommu, &iova_copy);
> > > >
> > > > but now it's done for every group detach. The aperture is decided
> > > > by domain geometry which is not affected by attached groups.
> > >
> > > Yea, I've noticed this part. Actually Jason did this change for
> > > simplicity, and I think it'd be safe to do so?
> > 
> > Perhaps detach_destroy() can return a Boolean to indicate whether
> > a domain is destroyed.
> 
> It could be a solution but doesn't feel that common for a clean
> function to have a return value indicating a special case. Maybe
> passing in "" so that we can check if it's NULL after?

It is harmless to do every time, it just burns a few CPU cycles on a
slow path. We don't need complexity to optimize it.

Jason


Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support

2022-06-15 Thread Jason Gunthorpe via iommu
On Fri, Jun 10, 2022 at 01:37:20PM +0800, Baolu Lu wrote:
> On 2022/6/9 20:49, Jason Gunthorpe wrote:
> > > +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> > > + struct list_head *pages)
> > > +{
> > > + struct page *page, *next;
> > > +
> > > + if (!domain->concurrent_traversal) {
> > > + put_pages_list(pages);
> > > + return;
> > > + }
> > > +
> > > + list_for_each_entry_safe(page, next, pages, lru) {
> > > + list_del(&page->lru);
> > > + call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> > > + }
> > It seems OK, but I wonder if there is benefit to using
> > put_pages_list() from the rcu callback
> 
> The price is that we need to allocate a "struct list_head" and free it
> in the rcu callback as well. Currently the list_head is sitting in the
> stack.

You'd have to use a different struct page layout so that the list_head
was in the struct page and didn't overlap with the rcu_head

Jason


Re: [PATCH 1/2] vfio/type1: Simplify bus_type determination

2022-06-09 Thread Jason Gunthorpe via iommu
On Wed, Jun 08, 2022 at 03:25:49PM +0100, Robin Murphy wrote:
> Since IOMMU groups are mandatory for drivers to support, it stands to
> reason that any device which has been successfully added to a group
> must be on a bus supported by that IOMMU driver, and therefore a domain
> viable for any device in the group must be viable for all devices in
> the group. This already has to be the case for the IOMMU API's internal
> default domain, for instance. Thus even if the group contains devices
> on different buses, that can only mean that the IOMMU driver actually
> supports such an odd topology, and so without loss of generality we can
> expect the bus type of any arbitrary device in a group to be suitable
> for IOMMU API calls.
> 
> Replace vfio_bus_type() with a trivial callback that simply returns any
> device from which to then derive a usable bus type. This is also a step
> towards removing the vague bus-based interfaces from the IOMMU API.
> 
> Furthermore, scrutiny reveals a lack of protection for the bus and/or
> device being removed while .attach_group is inspecting them; the
> reference we hold on the iommu_group ensures that data remains valid,
> but does not prevent the group's membership changing underfoot. Holding
> the vfio_group's device_lock should be sufficient to block any relevant
> device's VFIO driver from unregistering, and thus block unbinding and
> any further stages of removal for the duration of the attach operation.

The device_lock only protects devices that are on the device_list from
concurrent unregistration, the device returned by
iommu_group_for_each_dev() is not guaranteed to be on the device
list.

> @@ -760,8 +760,11 @@ static int __vfio_container_attach_groups(struct 
> vfio_container *container,
>   int ret = -ENODEV;
>  
>   list_for_each_entry(group, &container->group_list, container_next) {
> + /* Prevent devices unregistering during attach */
> + mutex_lock(&group->device_lock);
>   ret = driver->ops->attach_group(data, group->iommu_group,
>   group->type);
> + mutex_unlock(&group->device_lock);

I still prefer the version where we pass in an arbitrary vfio_device
from the list the group maintains:

   list_first_entry(group->device_list)

And don't call iommu_group_for_each_dev(), it is much simpler to
reason about how it works.

Jason


Re: [PATCH v8 01/11] iommu: Add max_pasids field in struct iommu_device

2022-06-09 Thread Jason Gunthorpe via iommu
On Thu, Jun 09, 2022 at 05:25:42PM +, Raj, Ashok wrote:
> 
> On Tue, Jun 07, 2022 at 09:49:32AM +0800, Lu Baolu wrote:
> > Use this field to keep the number of supported PASIDs that an IOMMU
> > hardware is able to support. This is a generic attribute of an IOMMU
> > and lifting it into the per-IOMMU device structure makes it possible
> 
> There is also a per-device attribute that tells what width is supported on
> each device. When a device enables SVM, for simplicity we were proposing to
> disable SVM on devices that don't support the full width, since it adds
> additional complexity on the allocation interface. Is that factored into
> this?

I would like to see the concept of a 'global PASID' and this is the
only place we'd union all the per-device limits together. SVM can
optionally use a global PASID and ENQCMD requires it, but I don't want
to see the core code assuming everything is ENQCMD.

Jason


Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support

2022-06-09 Thread Jason Gunthorpe via iommu
On Thu, Jun 09, 2022 at 02:19:06PM +0100, Robin Murphy wrote:

> Is there a significant benefit to keeping both paths, or could we get away
> with just always using RCU? Realistically, pagetable pages aren't likely to
> be freed all that frequently, except perhaps at domain teardown, but that
> shouldn't really be performance-critical, and I guess we could stick an RCU
> sync point in iommu_domain_free() if we're really worried about releasing
> larger quantities of pages back to the allocator ASAP?

I think you are right, anything that uses the iommu_iotlb_gather may
as well use RCU too.

IIRC the allocators already know that RCU is often sitting on
freed-memory and have some contingency to flush it out before OOMing,
so nothing special should be needed.

Jason


Re: [RFC PATCHES 1/2] iommu: Add RCU-protected page free support

2022-06-09 Thread Jason Gunthorpe via iommu
On Thu, Jun 09, 2022 at 03:08:10PM +0800, Lu Baolu wrote:
> The IOMMU page tables are updated using iommu_map/unmap() interfaces.
> Currently, there is no mandatory requirement for drivers to use locks
> to ensure concurrent updates to page tables, because it's assumed that
> overlapping IOVA ranges do not have concurrent updates. Therefore the
> IOMMU drivers only need to take care of concurrent updates to level
> page table entries.
> 
> But enabling new features challenges this assumption. For example, the
> hardware assisted dirty page tracking feature requires scanning page
> tables in interfaces other than mapping and unmapping. This might result
> in a use-after-free scenario in which a level page table has been freed
> by the unmap() interface, while another thread is scanning the next level
> page table.
> 
> This adds RCU-protected page free support so that the pages are really
> freed and reused after a RCU grace period. Hence, the page tables are
> safe for scanning within a rcu_read_lock critical region. Considering
> that scanning the page table is a rare case, this also adds a domain
> flag and the RCU-protected page free is only used when this flag is set.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/iommu.h |  9 +
>  drivers/iommu/iommu.c | 23 +++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5e1afe169549..6f68eabb8567 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -95,6 +95,7 @@ struct iommu_domain {
>   void *handler_token;
>   struct iommu_domain_geometry geometry;
>   struct iommu_dma_cookie *iova_cookie;
> + unsigned long concurrent_traversal:1;
>  };
>  
>  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> @@ -657,6 +658,12 @@ static inline void dev_iommu_priv_set(struct device 
> *dev, void *priv)
>   dev->iommu->priv = priv;
>  }
>  
> +static inline void domain_set_concurrent_traversal(struct iommu_domain 
> *domain,
> +bool value)
> +{
> + domain->concurrent_traversal = value;
> +}

?? If you want it to be a driver opt-in I would just add a flag to
the domain ops. "DOMAIN_FLAG_RCU_FREE_PAGES"

> +void iommu_free_pgtbl_pages(struct iommu_domain *domain,
> + struct list_head *pages)
> +{
> + struct page *page, *next;
> +
> + if (!domain->concurrent_traversal) {
> + put_pages_list(pages);
> + return;
> + }
> +
> + list_for_each_entry_safe(page, next, pages, lru) {
> + list_del(&page->lru);
> + call_rcu(&page->rcu_head, pgtble_page_free_rcu);
> + }

It seems OK, but I wonder if there is benefit to using
put_pages_list() from the rcu callback

Jason


Re: [PATCH 3/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-08 Thread Jason Gunthorpe via iommu
On Wed, Jun 08, 2022 at 08:28:03AM +, Tian, Kevin wrote:
> > From: Nicolin Chen
> > Sent: Monday, June 6, 2022 2:19 PM
> > 
> > From: Jason Gunthorpe 
> > 
> > The KVM mechanism for controlling wbinvd is only triggered during
> > kvm_vfio_group_add(), meaning it is a one-shot test done once the devices
> > are setup.
> 
> It's not one-shot. kvm_vfio_update_coherency() is called in both
> group_add() and group_del(). Then the coherency property is
> checked dynamically in wbinvd emulation:

From the perspective of managing the domains that is still
one-shot. It doesn't get updated when individual devices are
added/removed to domains.

> given that I'm fine with the change in this patch. Even more probably
> we really want an explicit one-shot model so KVM can lock down
> the property once it starts to consume it then further adding a new
> group which would change the coherency is explicitly rejected and
> removing an existing group leaves it intact.

Why? Once wbinvd is enabled it is compatible with all domain
configurations, so just leave it on and ignore everything at that
point.

Jason


Re: [PATCH 0/5] Simplify vfio_iommu_type1 attach/detach routine

2022-06-07 Thread Jason Gunthorpe via iommu
On Tue, Jun 07, 2022 at 03:44:43PM +0800, Baolu Lu wrote:
> On 2022/6/6 14:19, Nicolin Chen wrote:
> > Worths mentioning the exact match for enforce_cache_coherency is removed
> > with this series, since there's very less value in doing that since KVM
> > won't be able to take advantage of it -- this just wastes domain memory.
> > Instead, we rely on Intel IOMMU driver taking care of that internally.
> 
> After reading this series, I don't see that Intel IOMMU driver needs any
> further change to support the new scheme. Did I miss anything?

You already did it :)

Jason


Re: [PATCH 2/5] iommu: Ensure device has the same iommu_ops as the domain

2022-06-06 Thread Jason Gunthorpe via iommu
On Mon, Jun 06, 2022 at 11:28:49AM -0700, Nicolin Chen wrote:

> > Well, as before I'd prefer to make the code match the commit message -
> > if I really need to spell it out, see below - since I can't imagine that
> > we should ever have need to identify a set of iommu_domain_ops in
> > isolation, therefore I think it's considerably clearer to use the
> > iommu_domain itself. However, either way we really don't need this yet,
> > so we may as well just go ahead and remove the redundant test from VFIO
> > anyway, and I can add some form of this patch to my dev branch for now.
> 
> I see. The version below is much cleaner. Yet, it'd become having a
> common pointer per iommu_domain vs. one pointer per driver. Jason
> pointed it out to me earlier that doing so would waste memory
> unnecessarily on platforms that have considerable numbers of masters.

I had meant that using struct iommu_domain when there is a simple
solution using rodata doesn't seem ideal.

I don't quite understand the reluctance to make small changes to
drivers, it was the same logic with the default_domain_ops thing too.

> Since we know that it'd be safe to exclude this single change from
> this series, I can drop it in next version, if you don't like the
> change.

But this is fine too, that really is half the point of this series..

Jason


Re: [PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-06-02 Thread Jason Gunthorpe
On Thu, Jun 02, 2022 at 10:49:15AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2022-06-02 10:30, Jason Gunthorpe wrote:
> > On Thu, Jun 02, 2022 at 10:16:10AM -0600, Logan Gunthorpe wrote:
> > 
> >>> Just stuff the pages into the mmap, and your driver unprobe will
> >>> automatically block until all the mmaps are closed - no different than
> >>> having an open file descriptor or something.
> >>
> >> Oh is that what we want?
> > 
> > Yes, it is the typical case - eg if you have a sysfs file open unbind
> > hangs indefinitely. Many drivers can't unbind while they have open file
> > descriptors/etc.
> > 
> > A couple drivers go out of their way to allow unbinding while a live
> > userspace exists but this can get complicated. Usually there should be
> > a good reason.
> > 
> > The module will already be refcounted anyhow because the mmap points
> > to a char file which holds a module reference - meaning a simple rmmod
> > of the driver shouldn't work already..
> 
> Also, I just tried it... If I open a sysfs file for an nvme device (ie.
> /sys/class/nvme/nvme4/cntlid) and unbind the device, it does not block.
> A subsequent read on that file descriptor returns ENODEV. Which is what
> I would have expected.

Oh interesting, this has changed since years ago when I last
looked; the kernfs_get_active() is now narrower than it once
was. So maybe sysfs isn't the same concern it used to be!

Thanks,
Jason


Re: [PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-06-02 Thread Jason Gunthorpe
On Thu, Jun 02, 2022 at 10:45:55AM -0600, Logan Gunthorpe wrote:
> 
> 
> 
> On 2022-06-02 10:30, Jason Gunthorpe wrote:
> > On Thu, Jun 02, 2022 at 10:16:10AM -0600, Logan Gunthorpe wrote:
> > 
> >>> Just stuff the pages into the mmap, and your driver unprobe will
> >>> automatically block until all the mmaps are closed - no different than
> >>> having an open file descriptor or something.
> >>
> >> Oh is that what we want?
> > 
> > Yes, it is the typical case - eg if you have a sysfs file open unbind
> > hangs indefinitely. Many drivers can't unbind while they have open file
> > descriptors/etc.
> > 
> > A couple drivers go out of their way to allow unbinding while a live
> > userspace exists but this can get complicated. Usually there should be
> > a good reason.
> 
> This is not my experience. All the drivers I've worked with do not block
> unbind with open file descriptors (at least for char devices). I know,
> for example, that having a file descriptor open of /dev/nvmeX does not
> cause unbinding to block.

So there are lots of bugs in the kernel, and I've seen many drivers
that think calling cdev_device_del() is all they need to do - and then
happily allow cdev ioctl's/etc on a de-initialized driver struct.

Drivers that do take care of this usually have to put a lock around
all their fops to serialize against unbind. RDMA uses SRCU, iirc TPM
used a rwlock. But this is tricky and hurts fops performance.

I don't know what nvme did to protect against this, I didn't notice
an obvious lock.

> I figured this was the expectation as the userspace process doing
> the unbind won't be able to be interrupted seeing there's no way to
> fail on that path. Though, it certainly would make things a lot
> easier if the unbind can block indefinitely as it usually requires
> some complicated locking.

As I said, this is what sysfs does today and I don't see that ever
changing. If you userspace has a sysfs file open then the driver
unbind hangs until the file is closed.

So, doing as bad as sysfs seems like a reasonable baseline to me.

> Do you have an example of this? What mechanisms are developers using to
> block unbind with open file descriptors?

Sysfs maintains a refcount with a bias that is basically a fancy
rwlock. Most places use some kind of refcount triggering a
completion. Sleep on the completion until refcount is 0 on unbind kind
of thing.

Jason


Re: [PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-06-02 Thread Jason Gunthorpe
On Thu, Jun 02, 2022 at 10:16:10AM -0600, Logan Gunthorpe wrote:

> > Just stuff the pages into the mmap, and your driver unprobe will
> > automatically block until all the mmaps are closed - no different than
> > having an open file descriptor or something.
> 
> Oh is that what we want?

Yes, it is the typical case - eg if you have a sysfs file open unbind
hangs indefinitely. Many drivers can't unbind while they have open file
descriptors/etc.

A couple drivers go out of their way to allow unbinding while a live
userspace exists but this can get complicated. Usually there should be
a good reason.

The module will already be refcounted anyhow because the mmap points
to a char file which holds a module reference - meaning a simple rmmod
of the driver shouldn't work already..

Jason


Re: [PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-06-01 Thread Jason Gunthorpe
On Fri, May 27, 2022 at 04:41:08PM -0600, Logan Gunthorpe wrote:
> > 
> > IIRC this is the last part:
> > 
> > https://lore.kernel.org/linux-mm/20220524190632.3304-1-alex.sie...@amd.com/
> > 
> > And the earlier bit with Christoph's pieces looks like it might get
> > merged to v5.19..
> > 
> > The general idea is once pte_devmap is not set then all the
> > refcounting works the way it should. This is what all new ZONE_DEVICE
> > users should do..
> 
> Ok, I don't actually follow how those patches relate to this.
> 
> Based on your description I guess I don't need to set PFN_DEV and

Yes

> perhaps not use vmf_insert_mixed()? And then just use vm_normal_page()?

I'm not sure ATM the best function to use, but yes, a function that
doesn't set PFN_DEV is needed here.
 
> But the refcounting of the pages seemed like it was already sane to me,
> unless you mean that the code no longer has to synchronize_rcu() before
> returning the pages... 

Right. It also doesn't need to call unmap range or keep track of the
inode, or do any of that stuff unless it really needs mmap revokation
semantics (which I doubt this use case does)

unmap range was only necessary because the refcounting is wrong -
since the pte's don't hold a ref on the page in PFN_DEV mode it is
necessary to wipe all the PTE explicitly before going ahead to
decrement the refcount on this path.

Just stuff the pages into the mmap, and your driver unprobe will
automatically block until all the mmaps are closed - no different than
having an open file descriptor or something.

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Jason Gunthorpe via iommu
On Wed, Jun 01, 2022 at 02:52:05PM +0100, Joao Martins wrote:
> On 6/1/22 13:33, Jason Gunthorpe wrote:
> > On Wed, Jun 01, 2022 at 01:18:52PM +0100, Joao Martins wrote:
> > 
> >>> So having safe racy reading in the kernel is probably best, and so RCU
> >>> would be good here too.
> >>
> >> Reading dirties ought to be similar to map/unmap but slightly simpler as
> >> I supposedly don't need to care about the pte changing under the hood (or
> >> so I initially thought). I was wrestling at some point if test-and-clear
> >> was enough or whether I switch back cmpxchg to detect the pte has changed
> >> and only mark dirty based on the old value[*]. The latter would align with
> >> how map/unmap performs the pte updates.
> > 
> > test-and-clear should be fine, but this all needs to be done under a
> > RCU context while the page tables themsevles are freed by RCU. Then
> > you can safely chase the page table pointers down to each level
> > without fear of UAF.
> > 
> 
> I was actually thinking more towards holding the same IOVA range lock to
> align with the rest of map/unmap/demote/etc? All these IO page table
> manip have all have the same performance requirements.

IMHO ideally we would not make read dirty use the IOVA range lock because
we want to read dirty big swaths of IOVA at a time and it shouldn't be
forced to chunk based on the arbitrary area construction.

But yes this could also be an option.

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-06-01 Thread Jason Gunthorpe via iommu
On Wed, Jun 01, 2022 at 01:18:52PM +0100, Joao Martins wrote:

> > So having safe racy reading in the kernel is probably best, and so RCU
> > would be good here too.
> 
> Reading dirties ought to be similar to map/unmap but slightly simpler as
> I supposedly don't need to care about the pte changing under the hood (or
> so I initially thought). I was wrestling at some point if test-and-clear
> was enough or whether to switch back to cmpxchg to detect the pte has changed
> and only mark dirty based on the old value[*]. The latter would align with
> how map/unmap performs the pte updates.

test-and-clear should be fine, but this all needs to be done under a
RCU context while the page tables themsevles are freed by RCU. Then
you can safely chase the page table pointers down to each level
without fear of UAF.

> I am not sure yet on dynamic demote/promote of page sizes if it changes this.

For this kind of primitive the caller must provide the locking, just
like map/unmap.

Effectively you can consider the iommu_domain has having externally
provided range-locks over the IOVA space. map/unmap/demote/promote
must run serially over intersecting IOVA ranges.

In terms of iommufd this means we always have to hold a lock related
to the area (which is the IOVA range) before issuing any iommu call on
the domain.

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 10:22:32PM +0100, Robin Murphy wrote:

> There are only 3 instances where we'll free a table while the domain is
> live. The first is the one legitimate race condition, where two map requests
> targeting relatively nearby PTEs both go to fill in an intermediate level of
> table; whoever loses that race frees the table they allocated, but it was
> never visible to anyone else so that's definitely fine. The second is if
> we're mapping a block entry, and find that there's already a table entry
> there, wherein we assume the table must be empty, clear the entry,
> invalidate any walk caches, install the block entry, then free the orphaned
> table; since we're mapping the entire IOVA range covered by that table,
> there should be no other operations on that IOVA range attempting to walk
> the table at the same time, so it's fine. 

I saw these two in the Intel driver

> The third is effectively the inverse, if we get a block-sized unmap
> but find a table entry rather than a block at that point (on the
> assumption that it's de-facto allowed for a single unmap to cover
> multiple adjacent mappings as long as it does so exactly); similarly
> we assume that the table must be full, and no other operations
> should be racing because we're unmapping its whole IOVA range, so we
> remove the table entry, invalidate, and free as before.

Not sure I noticed this one though

This all makes sense though.

> Although we don't have debug dumping for io-pgtable-arm, it's good to be
> thinking about this, since it's made me realise that dirty-tracking sweeps
> per that proposal might pose a similar kind of concern, so we might still
> need to harden these corners for the sake of that.

Lets make sure Joao sees this..

It is interesting because we probably don't want the big latency
spikes that would come from using locking to block map/unmap while
dirty reading is happening - eg at the iommfd level.

From a consistency POV the only case that matters is unmap and unmap
should already be doing a dedicated dirty read directly prior to the
unmap (as per that other long thread)

So having safe racy reading in the kernel is probably best, and so RCU
would be good here too.

> that somewhere I have some half-finished patches making io-pgtable-arm use
> the iommu_iotlb_gather freelist, so maybe I'll tackle both concerns at once
> (perhaps we might even be able to RCU-ify the freelist generically? I'll see
> how it goes when I get there).

This is all very similar to how the mm's tlb gather stuff works,
especially on PPC which requires RCU protection of page tables instead
of the IPI trick.

Currently the rcu_head and the lru overlap in the struct page. To do
this I'd suggest following what was done for slab - see ca1a46d6f506
("Merge tag 'slab-for-5.17'..) and create a 'struct page' sub-class
like 'struct slab' for use with iommu page tables and put the
list_head and rcu_head sequentially in the struct iommu_pt_page.

Continue to use put_pages_list() just with the list_head in the new
struct not the lru.

If we need it for dirty tracking then it makes sense to start
progressing parts at least..

Jason


Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 10:29:55AM -0700, Jacob Pan wrote:

> The reason why I store PASID at IOMMU domain is for IOTLB flush within the
> domain. Device driver is not aware of domain level IOTLB flush. We also
> have iova_cookie for each domain which essentially is for RIDPASID.

You need to make the PASID stuff work generically.

The domain needs to hold a list of all the places it needs to flush
and that list needs to be maintained during attach/detach.

A single PASID on the domain is obviously never going to work
generically.

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 07:07:32PM +0100, Robin Murphy wrote:

> > And we expect the iommu driver to be unable to free page table levels
> > that have IOVA boundaries in them?
> 
> I'm not entirely sure what you mean there, but in general an unmap request
> is expected to match some previous map request 

atomic cmpxchg is OK for inserting new page table levels but it can't
protect you against concurrent freeing of page table levels. So
without locks it means that page tables can't usually be freed. Which
seems to match what the Intel driver does - at least from a cursory
look.

This is one of the reasons the mm has the mmap/etc lock and spinlocks
because we do expect page table levels to get wiped out when VMA's are
zap'd - all the different locks provide the protection against page
tables disappearing under from something manipulating them.

Basically every "lockless" walk in (process) MM land is actually
protected by some kind of lock that blocks zap_page_range() from
removing the page table levels themselves.

> They might either unmap the entire region originally mapped, or just
> the requested part, or might fail entirely (IIRC there was some
> nasty code in VFIO for detecting a particular behaviour).

This is something I did differently in iommufd. It always generates
unmaps that are strict supersets of the maps it issued. So it would be
a kernel bug if the driver unmaps more or less than requested.

> Oh, I've spent the last couple of weeks hacking up horrible things
> manipulating entries in init_mm, and never realised that that was actually
> the special case. Oh well, live and learn.

The init_mm is sort of different, it doesn't have zap in quite the
same way, for example. I was talking about the typical process mm.

Anyhow, the right solution is to use RCU as I described before, Baolu
do you want to try?

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 05:01:46PM +0100, Robin Murphy wrote:

> The DMA API doesn't need locking, partly since it can trust itself not to do
> stupid things, and mostly because it's DMA API performance that's
> fundamentally incompatible with serialisation anyway. Why do you think we
> have a complicated per-CPU IOVA caching mechanism, if not to support big
> multi-queue devices with multiple CPU threads mapping/unmapping in different
> parts of the same DMA domain concurrently?

Well, per-CPU is a form of locking.

So what are the actual locking rules here? We can call map/unmap
concurrently but not if ... ?

IOVA overlaps?

And we expect the iommu driver to be unable to free page table levels
that have IOVA boundaries in them?

> The simpler drivers already serialise on a per-domain lock internally, while
> the more performance-focused ones implement lock-free atomic pagetable
> management in a similar style to CPU arch code; either way it should work
> fine as-is.

The mm has page table locks at every level and generally expects them
to be held for a lot of manipulations. There are some lockless cases,
but it is not as aggressive as this sounds.

> The difference with debugfs is that it's a completely orthogonal
> side-channel - an iommu_domain user like VFIO or iommu-dma can make sure its
> *own* API usage is sane, but can't be aware of the user triggering some
> driver-internal introspection of that domain in a manner that could race
> more harmfully. 

The mm solution to this problem is to RCU free the page table
levels. This way something like debugfs can read a page table under
RCU completely safely, though incoherently, and there is no
performance cost on the map/unmap fast path side.

Today struct page has a rcu_head that can be used to rcu free it, so
it costs nothing.

Jason


Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 08:45:28PM +0800, Baolu Lu wrote:

> My understanding is that device driver owns its PASID policy.

I'm not sure this is actually useful, the core code should provide the
allocator as I can't think of any device that actually needs a special
allocator.

> If ENQCMD is supported on the device, the PASIDs should be allocated
> through ioasid_alloc(). Otherwise, the whole PASID pool is managed
> by the device driver.

This is OK only if we know in advance that the device needs a global
PASID. ENQCMD is one reason; maybe there are others down the road.

Better to make this core code policy.

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 02:52:28PM +0100, Robin Murphy wrote:

> > +    break;
> > +    pgtable_walk_level(m, phys_to_virt(phys_addr),
> 
> Also, obligatory reminder that pfn_valid() only means that pfn_to_page()
> gets you a valid struct page. Whether that page is direct-mapped kernel
> memory or not is a different matter.

Even though this is debugfs, if the operation is sketchy like that and
can theoretically crash the kernel, the driver should test capabilities;
CAP_SYS_RAWIO or something similar may be appropriate. I don't think we
have a better cap for 'userspace may crash the kernel'.

Jason

Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 04:01:04PM +0100, Robin Murphy wrote:
> On 2022-05-31 15:53, Jason Gunthorpe wrote:
> > On Tue, May 31, 2022 at 10:11:18PM +0800, Baolu Lu wrote:
> > > On 2022/5/31 21:10, Jason Gunthorpe wrote:
> > > > On Tue, May 31, 2022 at 11:02:06AM +0800, Baolu Lu wrote:
> > > > 
> > > > > For case 2, it is a bit weird. I tried to add a rwsem lock to make the
> > > > > iommu_unmap() and dumping tables in debugfs exclusive. This does not
> > > > > work because debugfs may depend on the DMA of the devices to work. It
> > > > > seems that what we can do is to allow this race, but when we traverse
> > > > > the page table in debugfs, we will check the validity of the physical
> > > > > address retrieved from the page table entry. Then, the worst case is to
> > > > > print some useless information.
> > > > 
> > > > Sounds horrible, don't you have locking around the IOPTEs of some
> > > > kind? How does updating them work reliably?
> > > 
> > > There's no locking around updating the IOPTEs. The basic assumption is
> > > that at any time, there's only a single thread manipulating the mappings
> > > of the range specified in iommu_map/unmap() APIs. Therefore, the race
> > > only exists when multiple ranges share some high-level IOPTEs. The IOMMU
> > > driver updates those IOPTEs using the compare-and-exchange atomic
> > > operation.
> > 
> > Oh? Did I miss where that was documented as part of the iommu API?
> > 
> > Daniel posted patches for VFIO to multi-thread iommu_domain mapping.
> > 
> > iommufd goes out of its way to avoid this kind of serialization so
> > that userspace can map IOVA in parallel.
> > 
> > I think if this is the requirement then the iommu API needs to
> > provide a lock around the domain for the driver..
> 
> Eww, no, we can't kill performance by forcing serialisation on the entire
> API just for one silly driver-internal debugfs corner :(

I'm not worried about debugfs, I'm worried about these efforts to
speed up VFIO VM booting by parallel domain loading:

https://lore.kernel.org/kvm/20220106004656.126790-1-daniel.m.jor...@oracle.com/

The DMA API should maintain its own external lock, but the general
domain interface to the rest of the kernel should be safe, IMHO.

Or at least it should be documented..

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 10:11:18PM +0800, Baolu Lu wrote:
> On 2022/5/31 21:10, Jason Gunthorpe wrote:
> > On Tue, May 31, 2022 at 11:02:06AM +0800, Baolu Lu wrote:
> > 
> > > For case 2, it is a bit weird. I tried to add a rwsem lock to make the
> > > iommu_unmap() and dumping tables in debugfs exclusive. This does not
> > > work because debugfs may depend on the DMA of the devices to work. It
> > > seems that what we can do is to allow this race, but when we traverse
> > > the page table in debugfs, we will check the validity of the physical
> > > address retrieved from the page table entry. Then, the worst case is to
> > > print some useless information.
> > 
> > Sounds horrible, don't you have locking around the IOPTEs of some
> > kind? How does updating them work reliably?
> 
> There's no locking around updating the IOPTEs. The basic assumption is
> that at any time, there's only a single thread manipulating the mappings
> of the range specified in iommu_map/unmap() APIs. Therefore, the race
> only exists when multiple ranges share some high-level IOPTEs. The IOMMU
> driver updates those IOPTEs using the compare-and-exchange atomic
> operation.

Oh? Did I miss where that was documented as part of the iommu API?

Daniel posted patches for VFIO to multi-thread iommu_domain mapping.

iommufd goes out of its way to avoid this kind of serialization so
that userspace can map IOVA in parallel.

I think if this is the requirement then the iommu API needs to
provide a lock around the domain for the driver..

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-31 Thread Jason Gunthorpe via iommu
On Tue, May 31, 2022 at 11:02:06AM +0800, Baolu Lu wrote:

> For case 2, it is a bit weird. I tried to add a rwsem lock to make the
> iommu_unmap() and dumping tables in debugfs exclusive. This does not
> work because debugfs may depend on the DMA of the devices to work. It
> seems that what we can do is to allow this race, but when we traverse
> the page table in debugfs, we will check the validity of the physical
> address retrieved from the page table entry. Then, the worst case is to
> print some useless information.

Sounds horrible, don't you have locking around the IOPTEs of some
kind? How does updating them work reliably?

It is just debugfs, so maybe it is not the end of the world, but
still..

Jason


Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-30 Thread Jason Gunthorpe via iommu
On Tue, May 24, 2022 at 08:17:27AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 24 May 2022 10:50:34 -0300, Jason Gunthorpe  wrote:
> 
> > On Wed, May 18, 2022 at 11:21:15AM -0700, Jacob Pan wrote:
> > > DMA requests tagged with PASID can target individual IOMMU domains.
> > > Introduce a domain-wide PASID for the DMA API; it will be used on the same
> > > mapping as legacy DMA without PASID. Let it be IOVA or PA in case of
> > > identity domain.  
> > 
> > Huh? I can't understand what this is trying to say or why this patch
> > makes sense.
> > 
> > We really should not have pasid's like this attached to the domains..
> > 
> This is the same "DMA API global PASID" you reviewed in v3, I just
> singled it out as a standalone patch and renamed it. Here is your previous
> review comment.
> 
> > +++ b/include/linux/iommu.h
> > @@ -105,6 +105,8 @@ struct iommu_domain {
> > enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
> >   void *data);
> > void *fault_data;
> > +   ioasid_t pasid; /* Used for DMA requests with PASID */
> > +   atomic_t pasid_users;  
> 
> These are poorly named, this is really the DMA API global PASID and
> shouldn't be used for other things.
> 
> 
> 
> Perhaps I misunderstood, do you mind explaining more?

You still haven't really explained what this is for in this patch,
maybe it just needs a better commit message, or maybe something is
wrong.

I keep saying the DMA API usage is not special, so why do we need to
create a new global pasid and refcount? Realistically this is only
going to be used by IDXD, why can't we just allocate a PASID and
return it to the driver every time a driver asks for DMA API in PASID
mode? Why does the core need to do anything special?

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-30 Thread Jason Gunthorpe via iommu
On Sun, May 29, 2022 at 01:14:46PM +0800, Baolu Lu wrote:

> From 1e87b5df40c6ce9414cdd03988c3b52bfb17af5f Mon Sep 17 00:00:00 2001
> From: Lu Baolu 
> Date: Sun, 29 May 2022 10:18:56 +0800
> Subject: [PATCH 1/1] iommu/vt-d: debugfs: Remove device_domain_lock usage
> 
> The domain_translation_struct debugfs node is used to dump static
> mappings of PCI devices. It potentially races with setting new
> domains to devices and the iommu_map/unmap() interfaces. The existing
> code tries to use the global spinlock device_domain_lock to avoid the
> races, but this is problematic as this lock is only used to protect
> the device tracking lists of the domains.
> 
> Instead of using an immature lock to cover up the problem, it's better
> to explicitly restrict the use of this debugfs node. This also makes
> device_domain_lock static.

What does "explicitly restrict" mean?

Jason


Re: [PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-05-27 Thread Jason Gunthorpe
On Fri, May 27, 2022 at 09:35:07AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2022-05-27 06:55, Jason Gunthorpe wrote:
> > On Thu, Apr 07, 2022 at 09:47:16AM -0600, Logan Gunthorpe wrote:
> >> +static void pci_p2pdma_unmap_mappings(void *data)
> >> +{
> >> +  struct pci_dev *pdev = data;
> >> +  struct pci_p2pdma *p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
> >> +
> >> +  /* Ensure no new pages can be allocated in mappings */
> >> +  p2pdma->active = false;
> >> +  synchronize_rcu();
> >> +
> >> +  unmap_mapping_range(p2pdma->inode->i_mapping, 0, 0, 1);
> >> +
> >> +  /*
> >> +   * On some architectures, TLB flushes are done with call_rcu()
> >> +   * so to ensure GUP fast is done with the pages, call synchronize_rcu()
> >> +   * before freeing them.
> >> +   */
> >> +  synchronize_rcu();
> >> +  pci_p2pdma_free_mappings(p2pdma->inode->i_mapping);
> > 
> > With the series from Felix getting close this should get updated to
> > not set pte_devmap and use proper natural refcounting without any of
> > this stuff.
> 
> Can you send a link? I'm not sure what you are referring to.

IIRC this is the last part:

https://lore.kernel.org/linux-mm/20220524190632.3304-1-alex.sie...@amd.com/

And the earlier bit with Christoph's pieces looks like it might get
merged to v5.19..

The general idea is once pte_devmap is not set then all the
refcounting works the way it should. This is what all new ZONE_DEVICE
users should do..

Jason


Re: [PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path

2022-05-27 Thread Jason Gunthorpe via iommu
On Fri, May 27, 2022 at 02:30:16PM +0800, Lu Baolu wrote:
> When the IOMMU domain is about to be freed, it should not be set on any
> device. Instead of silently dealing with some buggy cases, it's better to
> trigger a warning so that any potential bugs are reported and fixed early.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 17 ++---
>  1 file changed, 2 insertions(+), 15 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH 04/12] iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()

2022-05-27 Thread Jason Gunthorpe via iommu
On Fri, May 27, 2022 at 02:30:11PM +0800, Lu Baolu wrote:
> Use pci_get_domain_bus_and_slot() instead of searching the global list
> to retrieve the pci device pointer. This removes device_domain_list
> global list as there are no consumers anymore.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.h |  1 -
>  drivers/iommu/intel/iommu.c | 33 ++---
>  2 files changed, 6 insertions(+), 28 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH 03/12] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-05-27 Thread Jason Gunthorpe via iommu
On Fri, May 27, 2022 at 02:30:10PM +0800, Lu Baolu wrote:
> The disable_dmar_iommu() is called when IOMMU initialization fails or
> the IOMMU is hot-removed from the system. In both cases, there is no
> need to clear the IOMMU translation data structures for devices.
> 
> On the initialization path, the device probing only happens after the
> IOMMU is initialized successfully, hence there're no translation data
> structures.
> 
> On the hot-remove path, there is no real use case where the IOMMU is
> hot-removed, but the devices that it manages are still alive in the
> system. The translation data structures were torn down during device
> release, hence there's no need to repeat it in IOMMU hot-remove path
> either.

Can you leave behind a one-statement WARN_ON of some kind to check this?

Jason


Re: [PATCH 02/12] iommu/vt-d: Remove for_each_device_domain()

2022-05-27 Thread Jason Gunthorpe via iommu
On Fri, May 27, 2022 at 02:30:09PM +0800, Lu Baolu wrote:
> The per-device device_domain_info data could be retrieved from the
> device itself. There's no need to search a global list.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.h |  2 --
>  drivers/iommu/intel/iommu.c | 25 -
>  drivers/iommu/intel/pasid.c | 37 +++--
>  3 files changed, 11 insertions(+), 53 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-27 Thread Jason Gunthorpe via iommu
On Fri, May 27, 2022 at 02:30:08PM +0800, Lu Baolu wrote:
> Retrieve the attached domain for a device through the generic interface
> exposed by the iommu core. This also makes device_domain_lock static.
> 
> Signed-off-by: Lu Baolu 
>  drivers/iommu/intel/iommu.h   |  1 -
>  drivers/iommu/intel/debugfs.c | 20 
>  drivers/iommu/intel/iommu.c   |  2 +-
>  3 files changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index a22adfbdf870..8a6d64d726c0 100644
> +++ b/drivers/iommu/intel/iommu.h
> @@ -480,7 +480,6 @@ enum {
>  #define VTD_FLAG_SVM_CAPABLE (1 << 2)
>  
>  extern int intel_iommu_sm;
> -extern spinlock_t device_domain_lock;
>  
>  #define sm_supported(iommu)  (intel_iommu_sm && ecap_smts((iommu)->ecap))
>  #define pasid_supported(iommu)   (sm_supported(iommu) && 
> \
> diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
> index d927ef10641b..eea8727aa7bc 100644
> +++ b/drivers/iommu/intel/debugfs.c
> @@ -344,19 +344,21 @@ static void pgtable_walk_level(struct seq_file *m, 
> struct dma_pte *pde,
>  
>  static int show_device_domain_translation(struct device *dev, void *data)
>  {
> - struct device_domain_info *info = dev_iommu_priv_get(dev);
> - struct dmar_domain *domain = info->domain;
> + struct dmar_domain *dmar_domain;
> + struct iommu_domain *domain;
>   struct seq_file *m = data;
>   u64 path[6] = { 0 };
>  
> + domain = iommu_get_domain_for_dev(dev);
>   if (!domain)
>   return 0;

The iommu_get_domain_for_dev() API should be called something like
'iommu_get_dma_api_domain()' and clearly documented that it is safe to
call only so long as a DMA-API-using driver is attached to the device,
which is the case for most of the current callers.

This use in random debugfs inside the iommu driver is not OK because it
doesn't have any locking protecting the domain from concurrent free.

Jason


Re: [PATCH] drivers: iommu: Directly use ida_alloc()/free()

2022-05-27 Thread Jason Gunthorpe via iommu
On Fri, May 27, 2022 at 07:03:07AM +, keliu wrote:
> Use ida_alloc()/ida_free() instead of the deprecated
> ida_simple_get()/ida_simple_remove().
> 
> Signed-off-by: keliu 
> ---
>  drivers/iommu/iommu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH v6 20/21] PCI/P2PDMA: Introduce pci_mmap_p2pmem()

2022-05-27 Thread Jason Gunthorpe
On Thu, Apr 07, 2022 at 09:47:16AM -0600, Logan Gunthorpe wrote:
> +static void pci_p2pdma_unmap_mappings(void *data)
> +{
> + struct pci_dev *pdev = data;
> + struct pci_p2pdma *p2pdma = rcu_dereference_protected(pdev->p2pdma, 1);
> +
> + /* Ensure no new pages can be allocated in mappings */
> + p2pdma->active = false;
> + synchronize_rcu();
> +
> + unmap_mapping_range(p2pdma->inode->i_mapping, 0, 0, 1);
> +
> + /*
> +  * On some architectures, TLB flushes are done with call_rcu()
> +  * so to ensure GUP fast is done with the pages, call synchronize_rcu()
> +  * before freeing them.
> +  */
> + synchronize_rcu();
> + pci_p2pdma_free_mappings(p2pdma->inode->i_mapping);

With the series from Felix getting close this should get updated to
not set pte_devmap and use proper natural refcounting without any of
this stuff.

Jason


Re: [PATCH 1/3] iommu: Add Visconti5 IOMMU driver

2022-05-25 Thread Jason Gunthorpe via iommu
On Thu, May 26, 2022 at 12:43:25AM +, Tian, Kevin wrote:

> iiuc then this suggests there should be only one iommu group per atu,
> instead of using generic_device_group() to create one group per
> device.

Sounds like it

> > rejected then this driver should have a IOMMU_DOMAIN_BLOCKING and
> > return that from ops->def_domain_type().
> 
> BLOCKING should not be used as a default domain type for DMA API
> which needs either a DMA or IDENTITY type.

New drivers should not have a NULL group->default_domain. 

IMHO this driver does not support the DMA API so the default_domain
should be assigned to blocking and the DMA API disabled. We might need
some core changes to accommodate this.

The alternative would be to implement the identity domain, assuming
the ATU thing can store that kind of translation.

Jason


Re: [PATCH 1/3] iommu: Add Visconti5 IOMMU driver

2022-05-25 Thread Jason Gunthorpe via iommu
On Wed, May 25, 2022 at 02:26:37PM +0800, Baolu Lu wrote:
> On 2022/5/25 09:31, Nobuhiro Iwamatsu wrote:
> > +static const struct iommu_ops visconti_atu_ops = {
> > +   .domain_alloc = visconti_atu_domain_alloc,
> > +   .probe_device = visconti_atu_probe_device,
> > +   .release_device = visconti_atu_release_device,
> > +   .device_group = generic_device_group,
> > +   .of_xlate = visconti_atu_of_xlate,
> > +   .pgsize_bitmap = ATU_IOMMU_PGSIZE_BITMAP,
> > +   .default_domain_ops = &(const struct iommu_domain_ops) {
> > +   .attach_dev = visconti_atu_attach_device,
> > +   .detach_dev = visconti_atu_detach_device,
> 
> The detach_dev callback is about to be deprecated. The new drivers
> should implement the default domain and blocking domain instead.

Yes please, new drivers need to use default_domains.

It is very strange that visconti_atu_detach_device() does nothing. It
is not required that a domain be fully unmapped before being
destructed; I think detach should set ATU_AT_ENTRY_EN to 0.

What behavior does the HW have when ATU_AT_ENTRY_EN == 0? If DMA is
rejected then this driver should have a IOMMU_DOMAIN_BLOCKING and
return that from ops->def_domain_type().

Attaching a the blocking domain should set ATU_AT_ENTRY_EN = 0

Also, if I surmise correctly how this works, it is not following the
iommu API in that it halts all DMA during map/unmap operations. This
should at least be documented, with an explanation of why it is OK..

I'm feeling like these "special" drivers need some kind of handshake
with their only users because they don't work with things like VFIO..

Jason


Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-25 Thread Jason Gunthorpe via iommu
On Wed, May 25, 2022 at 01:19:08PM +0800, Baolu Lu wrote:
> Hi Jason,
> 
> On 2022/5/24 21:44, Jason Gunthorpe wrote:
> > > diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> > > index 106506143896..210c376f6043 100644
> > > +++ b/drivers/iommu/iommu-sva-lib.c
> > > @@ -69,3 +69,51 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
> > >   return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
> > >   }
> > >   EXPORT_SYMBOL_GPL(iommu_sva_find);
> > > +
> > > +/*
> > > + * IOMMU SVA driver-oriented interfaces
> > > + */
> > > +struct iommu_domain *
> > > +iommu_sva_alloc_domain(struct bus_type *bus, struct mm_struct *mm)
> > This should return the proper type
> > 
> 
> Can you please elaborate a bit on "return the proper type"? Did you mean
> return iommu_sva_domain instead? That's a wrapper of iommu_domain and
> only for iommu internal usage.

If you are exposing special SVA APIs then it is not internal usage
only anymore, so expose the type.

Jason


Re: [PATCH v4 3/6] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-24 Thread Jason Gunthorpe via iommu
On Tue, May 24, 2022 at 01:45:26PM -0700, Jacob Pan wrote:

> > The idea that there is only one PASID per domain per device is not
> > right.
> > 
> Got you, I was under the impression that there is no use case yet for
> multiple PASIDs per device-domain based on our early discussion.

The key words there are "in-kernel" and "DMA API" - iommufd userspace
will do whatever it likes.

I wish you guys would organize your work so that adding generic PASID
support to the IOMMU was its own series with its own purpose, so
everything stops getting confused with SVA and DMA API ideas that are
not general.

Jason


Re: [PATCH v4 3/6] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-24 Thread Jason Gunthorpe via iommu
On Tue, May 24, 2022 at 09:12:35AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Tue, 24 May 2022 10:51:35 -0300, Jason Gunthorpe  wrote:
> 
> > On Wed, May 18, 2022 at 11:21:17AM -0700, Jacob Pan wrote:
> > > On VT-d platforms with scalable mode enabled, devices issuing DMA
> > > requests with PASID need to attach PASIDs to given IOMMU domains. The attach
> > > operation involves the following:
> > > - Programming the PASID into the device's PASID table
> > > - Tracking device domain and the PASID relationship
> > > - Managing IOTLB and device TLB invalidations
> > > 
> > > This patch adds attach_dev_pasid functions to the default domain ops
> > > which is used by DMA and identity domain types. It could be extended to
> > > support other domain types whenever necessary.
> > > 
> > > Signed-off-by: Lu Baolu 
> > > Signed-off-by: Jacob Pan 
> > >  drivers/iommu/intel/iommu.c | 72 +++--
> > >  1 file changed, 70 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > > index 1c2c92b657c7..75615c105fdf 100644
> > > +++ b/drivers/iommu/intel/iommu.c
> > > @@ -1556,12 +1556,18 @@ static void __iommu_flush_dev_iotlb(struct
> > > device_domain_info *info, u64 addr, unsigned int mask)
> > >  {
> > >   u16 sid, qdep;
> > > + ioasid_t pasid;
> > >  
> > >   if (!info || !info->ats_enabled)
> > >   return;
> > >  
> > >   sid = info->bus << 8 | info->devfn;
> > >   qdep = info->ats_qdep;
> > > + pasid = iommu_get_pasid_from_domain(info->dev,
> > > +  &info->domain->domain);
> > 
> > No, a single domain can be attached to multiple PASIDs; all need to
> > be flushed.
> > 
> This is the device TLB flush; why would I want to flush PASIDs other
> than those attached to my own device?

Again, a domain can be attached to multiple PASIDs *on the same
device*.

The idea that there is only one PASID per domain per device is not
right.

Jason


Re: [PATCH v4 3/6] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-24 Thread Jason Gunthorpe via iommu
On Wed, May 18, 2022 at 11:21:17AM -0700, Jacob Pan wrote:
> On VT-d platforms with scalable mode enabled, devices issuing DMA requests
> with PASID need to attach PASIDs to given IOMMU domains. The attach
> operation involves the following:
> - Programming the PASID into the device's PASID table
> - Tracking device domain and the PASID relationship
> - Managing IOTLB and device TLB invalidations
> 
> This patch adds attach_dev_pasid functions to the default domain ops which
> is used by DMA and identity domain types. It could be extended to support
> other domain types whenever necessary.
> 
> Signed-off-by: Lu Baolu 
> Signed-off-by: Jacob Pan 
>  drivers/iommu/intel/iommu.c | 72 +++--
>  1 file changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 1c2c92b657c7..75615c105fdf 100644
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1556,12 +1556,18 @@ static void __iommu_flush_dev_iotlb(struct 
> device_domain_info *info,
>   u64 addr, unsigned int mask)
>  {
>   u16 sid, qdep;
> + ioasid_t pasid;
>  
>   if (!info || !info->ats_enabled)
>   return;
>  
>   sid = info->bus << 8 | info->devfn;
>   qdep = info->ats_qdep;
> + pasid = iommu_get_pasid_from_domain(info->dev, &info->domain->domain);

No, a single domain can be attached to multiple PASIDs; all need to
be flushed.

This whole API isn't suitable.

Jason


Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-24 Thread Jason Gunthorpe via iommu
On Wed, May 18, 2022 at 11:21:15AM -0700, Jacob Pan wrote:
> DMA requests tagged with PASID can target individual IOMMU domains.
> Introduce a domain-wide PASID for the DMA API; it will be used on the same
> mapping as legacy DMA without PASID. Let it be IOVA or PA in case of
> identity domain.

Huh? I can't understand what this is trying to say or why this patch
makes sense.

We really should not have pasid's like this attached to the domains..

Jason


Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Jason Gunthorpe via iommu
On Thu, May 19, 2022 at 03:20:40PM +0800, Lu Baolu wrote:
> The iommu_sva_domain represents a hardware pagetable that the IOMMU
> hardware could use for SVA translation. This adds some infrastructure
> to support SVA domain in the iommu common layer. It includes:
> 
> - Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA domain
>   type.
> - Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
>   support SVA should provide the callbacks.
> - Add helpers to allocate and free an SVA domain.
> - Add helpers to set an SVA domain to a device and the reverse
>   operation.
> 
> Some buses, like PCI, route packets without considering the PASID value.
> Thus a DMA target address with PASID might be treated as P2P if the
> address falls into the MMIO BAR of other devices in the group. To make
> things simple, the attach/detach interfaces only apply to devices
> belonging to the singleton groups, and the singleton is immutable in
> fabric i.e. not affected by hotplug.
> 
> The iommu_set/block_device_pasid() can be used for other purposes,
> such as kernel DMA with pasid, mediation device, etc. Hence, it is put
> in the iommu.c.
> 
> Suggested-by: Jean-Philippe Brucker 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Lu Baolu 
>  include/linux/iommu.h | 51 +
>  drivers/iommu/iommu-sva-lib.h | 15 
>  drivers/iommu/iommu-sva-lib.c | 48 +++
>  drivers/iommu/iommu.c | 71 +++
>  4 files changed, 185 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0c358b7c583b..e8cf82d46ce1 100644
> +++ b/include/linux/iommu.h
> @@ -64,6 +64,9 @@ struct iommu_domain_geometry {
>  #define __IOMMU_DOMAIN_PT(1U << 2)  /* Domain is identity mapped   */
>  #define __IOMMU_DOMAIN_DMA_FQ(1U << 3)  /* DMA-API uses flush queue  
>   */
>  
> +#define __IOMMU_DOMAIN_SHARED(1U << 4)  /* Page table shared from 
> CPU  */
> +#define __IOMMU_DOMAIN_HOST_VA   (1U << 5)  /* Host CPU virtual address 
> */
> +
>  /*
>   * This are the possible domain-types
>   *
> @@ -86,6 +89,8 @@ struct iommu_domain_geometry {
>  #define IOMMU_DOMAIN_DMA_FQ  (__IOMMU_DOMAIN_PAGING |\
>__IOMMU_DOMAIN_DMA_API |   \
>__IOMMU_DOMAIN_DMA_FQ)
> +#define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SHARED |\
> +  __IOMMU_DOMAIN_HOST_VA)
>  
>  struct iommu_domain {
>   unsigned type;
> @@ -254,6 +259,7 @@ struct iommu_ops {
>   int (*def_domain_type)(struct device *dev);
>  
>   const struct iommu_domain_ops *default_domain_ops;
> + const struct iommu_domain_ops *sva_domain_ops;
>   unsigned long pgsize_bitmap;
>   struct module *owner;
>  };
> @@ -262,6 +268,8 @@ struct iommu_ops {
>   * struct iommu_domain_ops - domain specific operations
>   * @attach_dev: attach an iommu domain to a device
>   * @detach_dev: detach an iommu domain from a device
> + * @set_dev_pasid: set an iommu domain to a pasid of device
> + * @block_dev_pasid: block pasid of device from using iommu domain
>   * @map: map a physically contiguous memory region to an iommu domain
>   * @map_pages: map a physically contiguous set of pages of the same size to
>   * an iommu domain.
> @@ -282,6 +290,10 @@ struct iommu_ops {
>  struct iommu_domain_ops {
>   int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
>   void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
> + int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
> +  ioasid_t pasid);
> + void (*block_dev_pasid)(struct iommu_domain *domain, struct device *dev,
> + ioasid_t pasid);
>  
>   int (*map)(struct iommu_domain *domain, unsigned long iova,
>  phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
> @@ -677,6 +689,10 @@ int iommu_group_claim_dma_owner(struct iommu_group 
> *group, void *owner);
>  void iommu_group_release_dma_owner(struct iommu_group *group);
>  bool iommu_group_dma_owner_claimed(struct iommu_group *group);
>  
> +int iommu_set_device_pasid(struct iommu_domain *domain, struct device *dev,
> +ioasid_t pasid);
> +void iommu_block_device_pasid(struct iommu_domain *domain, struct device 
> *dev,
> +   ioasid_t pasid);
>  #else /* CONFIG_IOMMU_API */
>  
>  struct iommu_ops {};
> @@ -1050,6 +1066,17 @@ static inline bool 
> iommu_group_dma_owner_claimed(struct iommu_group *group)
>  {
>   return false;
> 

Re: [PATCH v7 03/10] iommu/sva: Add iommu_sva_domain support

2022-05-24 Thread Jason Gunthorpe via iommu
On Tue, May 24, 2022 at 09:39:52AM +, Tian, Kevin wrote:
> > From: Lu Baolu 
> > Sent: Thursday, May 19, 2022 3:21 PM
> > 
> > The iommu_sva_domain represents a hardware pagetable that the IOMMU
> > hardware could use for SVA translation. This adds some infrastructure
> > to support SVA domain in the iommu common layer. It includes:
> > 
> > - Add a new struct iommu_sva_domain and new IOMMU_DOMAIN_SVA
> > domain
> >   type.
> > - Add a new domain ops pointer in iommu_ops. The IOMMU drivers that
> >   support SVA should provide the callbacks.
> > - Add helpers to allocate and free an SVA domain.
> > - Add helpers to set an SVA domain to a device and the reverse
> >   operation.
> > 
> > Some buses, like PCI, route packets without considering the PASID value.
> > Thus a DMA target address with PASID might be treated as P2P if the
> > address falls into the MMIO BAR of other devices in the group. To make
> > things simple, the attach/detach interfaces only apply to devices
> > belonging to the singleton groups, and the singleton is immutable in
> > fabric i.e. not affected by hotplug.
> > 
> > The iommu_set/block_device_pasid() can be used for other purposes,
> > such as kernel DMA with pasid, mediation device, etc. Hence, it is put
> > in the iommu.c.
> 
> usually we have 'set/clear' pair or 'allow/block'. Having 'set' paired
> with 'block' doesn't read very clearly.

I thought we agreed we'd use the blocking domain for this? Why did it
go back to an op?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-24 Thread Jason Gunthorpe via iommu
On Mon, May 23, 2022 at 04:02:22PM +1000, Alexey Kardashevskiy wrote:

> Which means the guest RAM does not need to be all mapped in that base IOAS
> suggested down this thread as that would mean all memory is pinned and
> powervm won't be able to swap it out (yeah, it can do such thing now!). Not
> sure if we really want to support this or stick to a simpler design.

Huh? How can it swap? Calling GUP is not optional. Either you call GUP
at the start and there is no swap, or you call GUP for each vIOMMU
hypercall.

Since everyone says PPC doesn't call GUP during the hypercall - how is
it working?

Jason


Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-20 Thread Jason Gunthorpe via iommu
On Fri, May 20, 2022 at 01:00:32PM +0100, Will Deacon wrote:
> On Fri, May 20, 2022 at 12:02:23PM +0100, Robin Murphy wrote:
> > On 2022-05-10 17:55, Jason Gunthorpe via iommu wrote:
> > > This control causes the ARM SMMU drivers to choose a stage 2
> > > implementation for the IO pagetable (vs the stage 1 usual default),
> > > however this choice has no visible impact to the VFIO user.
> > 
> > Oh, I should have read more carefully... this isn't entirely true. Stage 2
> > has a different permission model from stage 1, so although it's arguably
> > undocumented behaviour, VFIO users that know enough about the underlying
> > system could use this to get write-only mappings if they so wish.
> 
> There's also an impact on combining memory attributes, but it's not hugely
> clear how that impacts userspace via things like VFIO_DMA_CC_IOMMU.

VFIO_DMA_CC_IOMMU has been clarified now to only be about the ability
to reject device requested no-snoop transactions. ie ignore the
NoSnoop bit in PCIe TLPs.

AFAICT SMMU can't implement this currently, and maybe doesn't need to.

VFIO requires the IO page tables be set up for cache coherent operation
for ordinary device DMA, otherwise users like DPDK will not work.

It is surprising to hear you say that VFIO_TYPE1_NESTING_IOMMU might
also cause the IO to become non-coherent..

Jason


Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-20 Thread Jason Gunthorpe via iommu
On Fri, May 20, 2022 at 12:02:23PM +0100, Robin Murphy wrote:
> On 2022-05-10 17:55, Jason Gunthorpe via iommu wrote:
> > This control causes the ARM SMMU drivers to choose a stage 2
> > implementation for the IO pagetable (vs the stage 1 usual default),
> > however this choice has no visible impact to the VFIO user.
> 
> Oh, I should have read more carefully... this isn't entirely true. Stage 2
> has a different permission model from stage 1, so although it's arguably
> undocumented behaviour, VFIO users that know enough about the underlying
> system could use this to get write-only mappings if they so wish.

I think this is getting very pedantic, the intention of this was to
enable nesting, not to expose subtle differences in ARM
architecture. I'm very doubtful something exists here that uses this
without also using the other parts.

> > Just in-case there is some userspace using this continue to treat
> > requesting it as a NOP, but do not advertise support any more.
> 
> This also seems a bit odd, especially given that it's not actually a no-op;
> surely either it's supported and functional or it isn't?

Indeed, I was loath to completely break possibly existing broken
userspace, but I agree we could just fail here as well. Normal
userspace should see the missing capability and not call the API at
all. But maybe someone hardwired their VMM or something, IDK.

> In all honesty, I'm not personally attached to this code either way. If this
> patch had come 5 years ago, when the interface already looked like a bit of
> a dead end, I'd probably have agreed more readily. But now, when we're
> possibly mere months away from implementing the functional equivalent for
> IOMMUFD, which if done right might be able to support a trivial compat layer
> for this anyway,

There won't be compat for this - the generic code is not going to know
about the driver specific details of how nesting works.  The plan is
to have some new op like:

   alloc_user_domain(struct device *dev, void *params, size_t params_len, ..)

And all the special driver-specific iommu domains will be allocated
opaquely this way. The stage 1 or stage 2 choice will be specified in
the params along with any other HW specific parameters required to
hook it up properly.
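
For illustration only, such a driver-specific allocation op might look
roughly like this (every name, the params struct, and the s1/s2 helpers
are hypothetical - the point is just that the parameter blob is opaque
to the core and only the owning driver can decode it):

```c
/* Hypothetical sketch, not real uAPI. The core passes the user's
 * parameter blob through untouched; the driver validates and decodes
 * it, including HW-specific choices like the translation stage.
 */
struct arm_smmu_user_domain_params {
	__u32 flags;
#define ARM_SMMU_PARAM_STAGE2	(1 << 0)	/* else stage 1 */
	__u32 s2vmid;				/* illustrative HW detail */
};

static struct iommu_domain *
arm_smmu_alloc_user_domain(struct device *dev, void *params,
			   size_t params_len)
{
	struct arm_smmu_user_domain_params p;

	if (params_len < sizeof(p))
		return ERR_PTR(-EINVAL);
	memcpy(&p, params, sizeof(p));

	/* The stage 1 vs stage 2 choice lives here, not in generic code */
	if (p.flags & ARM_SMMU_PARAM_STAGE2)
		return arm_smmu_domain_alloc_s2(dev, &p);
	return arm_smmu_domain_alloc_s1(dev, &p);
}
```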

So, the core code can't just do what VFIO_TYPE1_NESTING_IOMMU is doing
now without also being hardwired to make it work with ARM.

This is why I want to remove it now to be clear we are not going to be
accommodating this API.

> I just don't see what we gain from not at least waiting to see where
> that ends up. The given justification reads as "get rid of this code
> that we already know we'll need to bring back in some form, and
> half-break an unpopular VFIO ABI because it doesn't do *everything*
> that its name might imply", which just isn't convincing me.

No it wont come back. The code that we will need is the driver code in
SMMU, and that wasn't touched here.

We can defer this, as long as we all agree the uAPI is going away.

But I don't see a strong argument why we should wait either.

Jason


Re: [PATCH] iommu/s390: tolerate repeat attach_dev calls

2022-05-19 Thread Jason Gunthorpe via iommu
On Thu, May 19, 2022 at 02:29:29PM -0400, Matthew Rosato wrote:
> Since commit 0286300e6045 ("iommu: iommu_group_claim_dma_owner() must
> always assign a domain") s390-iommu will get called to allocate multiple
> unmanaged iommu domains for a vfio-pci device -- however the current
> s390-iommu logic tolerates only one.  Recognize that multiple domains can
> be allocated and handle switching between DMA or different iommu domain
> tables during attach_dev.
> 
> Signed-off-by: Matthew Rosato 
> ---
>  drivers/iommu/s390-iommu.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)

Makes sense, thanks

Reviewed-by: Jason Gunthorpe 

Jason


[PATCH] vfio: Do not manipulate iommu dma_owner for fake iommu groups

2022-05-19 Thread Jason Gunthorpe via iommu
Since asserting dma ownership now causes the group to have its DMA blocked
the iommu layer requires a working iommu. This means the dma_owner APIs
cannot be used on the fake groups that VFIO creates. Test for this and
avoid calling them.

Otherwise asserting dma ownership will fail for VFIO mdev devices as a
BLOCKING iommu_domain cannot be allocated due to the NULL iommu ops.

Fixes: 0286300e6045 ("iommu: iommu_group_claim_dma_owner() must always assign a 
domain")
Reported-by: Eric Farman 
Tested-by: Eric Farman 
Signed-off-by: Jason Gunthorpe 
---
 drivers/vfio/vfio.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

Sort of a v2 from here:

https://lore.kernel.org/all/20220518191446.gu1343...@nvidia.com/

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index cfcff7764403fc..f5ed03897210c3 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -927,7 +927,8 @@ static void __vfio_group_unset_container(struct vfio_group 
*group)
driver->ops->detach_group(container->iommu_data,
  group->iommu_group);
 
-   iommu_group_release_dma_owner(group->iommu_group);
+   if (group->type == VFIO_IOMMU)
+   iommu_group_release_dma_owner(group->iommu_group);
 
group->container = NULL;
group->container_users = 0;
@@ -1001,9 +1002,11 @@ static int vfio_group_set_container(struct vfio_group 
*group, int container_fd)
goto unlock_out;
}
 
-   ret = iommu_group_claim_dma_owner(group->iommu_group, f.file);
-   if (ret)
-   goto unlock_out;
+   if (group->type == VFIO_IOMMU) {
+   ret = iommu_group_claim_dma_owner(group->iommu_group, f.file);
+   if (ret)
+   goto unlock_out;
+   }
 
driver = container->iommu_driver;
if (driver) {
@@ -1011,7 +1014,9 @@ static int vfio_group_set_container(struct vfio_group 
*group, int container_fd)
group->iommu_group,
group->type);
if (ret) {
-   iommu_group_release_dma_owner(group->iommu_group);
+   if (group->type == VFIO_IOMMU)
+   iommu_group_release_dma_owner(
+   group->iommu_group);
goto unlock_out;
}
}

base-commit: 7ab5e10eda02da1d9562ffde562c51055d368e9c
-- 
2.36.0



Re: [PATCH v3] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-19 Thread Jason Gunthorpe via iommu
On Thu, May 19, 2022 at 10:51:47AM -0600, Alex Williamson wrote:
> On Thu, 19 May 2022 09:32:05 +0200
> Joerg Roedel  wrote:
> 
> > On Wed, May 18, 2022 at 04:14:46PM -0300, Jason Gunthorpe wrote:
> > > Fixes: 0286300e6045 ("iommu: iommu_group_claim_dma_owner() must always 
> > > assign a domain")
> > > Reported-by: Eric Farman 
> > > Signed-off-by: Jason Gunthorpe 
> > >  drivers/vfio/vfio.c | 15 ++-
> > >  1 file changed, 10 insertions(+), 5 deletions(-)  
> > 
> > That fix will be taken care of by Alex? Or does it need to go through
> > the IOMMU tree?
> 
> The comment suggested my tree.  Jason, were you planning to post this
> on its own instead of buried in a reply or shall we take it from
> here?

Let me repost it with a proper cc list, I think your tree is best.

Thanks,
Jason


Re: [PATCH RFC 10/12] iommufd: Add kAPI toward external drivers

2022-05-19 Thread Jason Gunthorpe via iommu
On Thu, May 19, 2022 at 05:45:06PM +0800, Yi Liu wrote:
> Hi Jason,
> 
> On 2022/3/19 01:27, Jason Gunthorpe wrote:
> 
> > +/**
> > + * iommufd_device_attach - Connect a device to an iommu_domain
> > + * @idev: device to attach
> > + * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
> > + * Output the IOMMUFD_OBJ_HW_PAGETABLE ID
> > + * @flags: Optional flags
> > + *
> > + * This connects the device to an iommu_domain, either automatically or 
> > manually
> > + * selected. Once this completes the device could do DMA.
> > + *
> > + * The caller should return the resulting pt_id back to userspace.
> > + * This function is undone by calling iommufd_device_detach().
> > + */
> > +int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id,
> > + unsigned int flags)
> > +{

> Just double check here.
> This API doesn't prevent caller from calling this API multiple times with
> the same @idev and @pt_id. right? Note that idev has only one device_item
> list head. If caller does do multiple callings, then there should be
> problem. right? If so, this API assumes caller should take care of it and
> not do such bad function call. Is this the design here?

Yes, caller must ensure strict pairing, we don't have an assertion to
check it.

Jason


Re: [PATCH v3] iommu: iommu_group_claim_dma_owner() must always assign a domain

2022-05-18 Thread Jason Gunthorpe via iommu
On Wed, May 18, 2022 at 02:50:36PM -0400, Eric Farman wrote:

> I got a heads up from Matt about the s390 KVM vfio- variants failing on
> linux-next.
> 
> For vfio-ap and vfio-ccw, they fail on the above error. Both calls to
> __iommu_domain_alloc fail because while dev->dev->bus is non-NULL (it
> points to the mdev bus_type registered in mdev_init()), the bus-
> >iommu_ops pointer is NULL. Which makes sense; the iommu_group is vfio-
> noiommu, via vfio_register_emulated_iommu_dev(), and mdev didn't
> establish an iommu_ops for its bus.

Oh, I think this is a VFIO problem, the iommu layer should not have to
deal with these fake non-iommu groups.

From 9884850a5ceac957e6715beab0888294d4088877 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe 
Date: Wed, 18 May 2022 16:03:34 -0300
Subject: [PATCH] vfio: Do not manipulate iommu dma_owner for fake iommu groups

Since asserting dma ownership now causes the group to have its DMA blocked
the iommu layer requires a working iommu. This means the dma_owner APIs
cannot be used on the fake groups that VFIO creates. Test for this and
avoid calling them.

Otherwise asserting dma ownership will fail for VFIO mdev devices as a
BLOCKING iommu_domain cannot be allocated due to the NULL iommu ops.

Fixes: 0286300e6045 ("iommu: iommu_group_claim_dma_owner() must always assign a 
domain")
Reported-by: Eric Farman 
Signed-off-by: Jason Gunthorpe 
---
 drivers/vfio/vfio.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

I think this will have to go through Alex's tree due to all the other rework
in this area.

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index cfcff7764403fc..f5ed03897210c3 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -927,7 +927,8 @@ static void __vfio_group_unset_container(struct vfio_group 
*group)
driver->ops->detach_group(container->iommu_data,
  group->iommu_group);
 
-   iommu_group_release_dma_owner(group->iommu_group);
+   if (group->type == VFIO_IOMMU)
+   iommu_group_release_dma_owner(group->iommu_group);
 
group->container = NULL;
group->container_users = 0;
@@ -1001,9 +1002,11 @@ static int vfio_group_set_container(struct vfio_group 
*group, int container_fd)
goto unlock_out;
}
 
-   ret = iommu_group_claim_dma_owner(group->iommu_group, f.file);
-   if (ret)
-   goto unlock_out;
+   if (group->type == VFIO_IOMMU) {
+   ret = iommu_group_claim_dma_owner(group->iommu_group, f.file);
+   if (ret)
+   goto unlock_out;
+   }
 
driver = container->iommu_driver;
if (driver) {
@@ -1011,7 +1014,9 @@ static int vfio_group_set_container(struct vfio_group 
*group, int container_fd)
group->iommu_group,
group->type);
if (ret) {
-   iommu_group_release_dma_owner(group->iommu_group);
+   if (group->type == VFIO_IOMMU)
+   iommu_group_release_dma_owner(
+   group->iommu_group);
goto unlock_out;
}
}
-- 
2.36.0



Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-18 Thread Jason Gunthorpe via iommu
On Wed, May 18, 2022 at 11:42:04AM -0700, Jacob Pan wrote:

> > Yes.. It seems inefficient to iterate over that xarray multiple times
> > on the flush hot path, but maybe there is little choice. Try to use
> > use the xas iterators under the xa_lock spinlock..
> > 
> xas_for_each takes a max range, here we don't really have one. So I posted
> v4 w/o using the xas advanced API. Please let me know if you have
> suggestions.

You are supposed to use ULONG_MAX in cases like that.

> xa_for_each takes RCU read lock, it should be fast for tlb flush, right? The
> worst case maybe over flush when we have stale data but should be very rare.

Not really, xa_for_each walks the tree from the root on every
iteration, so it is slower than a linked-list walk in any case where
the xarray is multi-node. xas_for_each is able to retain a pointer to
where it is in the tree, so each iteration is usually just a pointer
increment.

The downside is you cannot sleep while doing xas_for_each

> > The challenge will be accessing the group xa in the first place, but
> > maybe the core code can gain a function call to return a pointer to
> > that XA or something..
 
> I added a helper function to find the matching DMA API PASID in v4.

Again, why are we focused on DMA API? Nothing you build here should be
DMA API beyond the fact that the iommu_domain being attached is the
default domain.

Jason


Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-17 Thread Jason Gunthorpe via iommu
On Tue, May 17, 2022 at 01:43:03PM +0100, Robin Murphy wrote:

> FWIW from my point of view I'm happy with having a .detach_dev_pasid op
> meaning implicitly-blocked access for now. 

If this is the path then lets not call it attach/detach
please. 'set_dev_pasid' and 'set_dev_blocking_pasid' are clearer
names.

> On SMMUv3, PASIDs don't mix with our current notion of
> IOMMU_DOMAIN_IDENTITY (nor the potential one for
> IOMMU_DOMAIN_BLOCKED), so giving PASIDs functional symmetry with
> devices would need significantly more work anyway.

There is no extra work in the driver, beyond SMMU having to implement
a blocking domain. The blocking domain's set_dev_pasid op simply is
whatever set_dev_blocking_pasid would have done on the unmanaged
domain.

identity doesn't come into this, identity domains should have a NULL
set_dev_pasid op if the driver can't support using it on a PASID.

IMHO blocking_domain->ops->set_dev_pasid() is just a more logical name
than domain->ops->set_dev_blocking_pasid() - especially since VFIO
would like drivers to implement blocking domain anyhow.

Jason


Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-17 Thread Jason Gunthorpe via iommu
On Tue, May 17, 2022 at 10:37:55AM +0800, Baolu Lu wrote:
> Hi Jason,
> 
> On 2022/5/16 21:57, Jason Gunthorpe wrote:
> > On Mon, May 16, 2022 at 12:22:08PM +0100, Robin Murphy wrote:
> > > On 2022-05-16 02:57, Lu Baolu wrote:
> > > > Each IOMMU driver must provide a blocking domain ops. If the hardware
> > > > supports detaching domain from device, setting blocking domain equals
> > > > detaching the existing domain from the deivce. Otherwise, an UNMANAGED
> > > > domain without any mapping will be used instead.
> > > Unfortunately that's backwards - most of the implementations of 
> > > .detach_dev
> > > are disabling translation entirely, meaning the device ends up effectively
> > > in passthrough rather than blocked.
> > Ideally we'd convert the detach_dev of every driver into either
> > a blocking or identity domain. The trick is knowing which is which..
> 
> I am still a bit puzzled about how the blocking_domain should be used when
> it is extended to support ->set_dev_pasid.
> 
> If it's a blocking domain, the IOMMU driver knows that setting the
> blocking domain to device pasid means detaching the existing one.

> But if it's an identity domain, how could the IOMMU driver choose
> between:

The identity domain would never be used for detaching a PASID. The
reason it is in this explanation is because every detach_dev op should
be deleted - and the code that it was doing can remain in the driver
as either a blocking or identity domain.

> > So, the approach here should be to go driver by driver and convert
> > detach_dev to either identity, blocking or just delete it entirely,
> > excluding the above 7 that don't support default domains. And get acks
> > from the driver owners.
 
> Agreed. There seems to be a long way to go. I am wondering if we could
> decouple this refactoring from my new SVA API work? We can easily switch
> .detach_dev_pasid to using blocking domain later.

Well, PASID has nothing to do with this. PASID enabled drivers would
just need:
 - Must be using default domains and must have a NULL detach_dev op
 - Must provide a blocking_domain
 - Must provide set_dev_pasid for the unmanaged and blocking domains
   it creates

That is it.
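
Concretely, the shape of a driver meeting those three rules might look
like this (the foo_* names are hypothetical; the point is that
detaching a PASID is just setting the blocking domain on it):

```c
/* Sketch only. A PASID-capable driver provides a blocking domain and
 * set_dev_pasid ops; it has no detach_dev op at all.
 */
static const struct iommu_domain_ops foo_blocking_ops = {
	.set_dev_pasid = foo_block_pasid,	/* tears down the PASID entry */
};

static const struct iommu_domain_ops foo_paging_ops = {
	.attach_dev = foo_attach_dev,
	.set_dev_pasid = foo_set_pasid,	/* installs this domain on a PASID */
	.map_pages = foo_map_pages,
	.unmap_pages = foo_unmap_pages,
};
```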

Realigning the existing drivers for their base RID support is another
task.

The appeal of this is that it matches how base RID support should look
and doesn't continue the confusing appearance that there is a strictly
paired attach/detach operation.

Any domain can be set on any RID/PASID at any time.

Drivers intended to be used with VFIO really want a blocking_domain
anyhow for efficiency.

Jason


Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU

2022-05-17 Thread Jason Gunthorpe via iommu
On Tue, May 17, 2022 at 10:05:43AM +0800, Baolu Lu wrote:
> Hi Jason,
> 
> On 2022/5/17 02:06, Jason Gunthorpe wrote:
> > > +static __init int tboot_force_iommu(void)
> > > +{
> > > + if (!tboot_enabled())
> > > + return 0;
> > > +
> > > + if (no_iommu || dmar_disabled)
> > > + pr_warn("Forcing Intel-IOMMU to enabled\n");
> > Unrelated, but when we are in the special secure IOMMU modes, do we
> > force ATS off? Specifically does the IOMMU reject TLPs that are marked
> > as translated?
> 
> Good question. From IOMMU point of view, I don't see a point to force
> ATS off, but trust boot involves lots of other things that I am not
> familiar with. Anybody else could help to answer?

ATS is inherently not secure: if a rogue device can issue a TLP with
the translated bit set then it has unlimited access to host memory.

Many of these trusted iommu scenarios rely on the idea that a rogue
device cannot DMA to arbitrary system memory.

Jason


Re: [PATCH 7/7] iommu/vt-d: Move include/linux/intel_iommu.h under iommu

2022-05-16 Thread Jason Gunthorpe via iommu
On Sat, May 14, 2022 at 09:43:22AM +0800, Lu Baolu wrote:
> This header file is private to the Intel IOMMU driver. Move it to the
> driver folder.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/intel-iommu.h => drivers/iommu/intel/iommu.h | 0
>  drivers/iommu/intel/trace.h| 3 ++-
>  drivers/iommu/intel/cap_audit.c| 2 +-
>  drivers/iommu/intel/debugfs.c  | 2 +-
>  drivers/iommu/intel/dmar.c | 2 +-
>  drivers/iommu/intel/iommu.c| 2 +-
>  drivers/iommu/intel/irq_remapping.c| 2 +-
>  drivers/iommu/intel/pasid.c| 2 +-
>  drivers/iommu/intel/perf.c | 2 +-
>  drivers/iommu/intel/svm.c  | 2 +-
>  MAINTAINERS| 1 -
>  11 files changed, 10 insertions(+), 10 deletions(-)
>  rename include/linux/intel-iommu.h => drivers/iommu/intel/iommu.h (100%)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU

2022-05-16 Thread Jason Gunthorpe via iommu
On Sat, May 14, 2022 at 09:43:21AM +0800, Lu Baolu wrote:
> tboot_force_iommu() is only called by the Intel IOMMU driver. Move the
> helper into that driver. No functional change intended.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/tboot.h   |  2 --
>  arch/x86/kernel/tboot.c | 15 ---
>  drivers/iommu/intel/iommu.c | 14 ++
>  3 files changed, 14 insertions(+), 17 deletions(-)

Reviewed-by: Jason Gunthorpe 

> +static __init int tboot_force_iommu(void)
> +{
> + if (!tboot_enabled())
> + return 0;
> +
> + if (no_iommu || dmar_disabled)
> + pr_warn("Forcing Intel-IOMMU to enabled\n");

Unrelated, but when we are in the special secure IOMMU modes, do we
force ATS off? Specifically does the IOMMU reject TLPs that are marked
as translated?

Jason


Re: [PATCH 5/7] KVM: x86: Remove unnecessary include

2022-05-16 Thread Jason Gunthorpe via iommu
On Sat, May 14, 2022 at 09:43:20AM +0800, Lu Baolu wrote:
> intel-iommu.h is not needed in kvm/x86 anymore. Remove its include.
> 
> Signed-off-by: Lu Baolu 
> ---
>  arch/x86/kvm/x86.c | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH 4/7] drm/i915: Remove unnecessary include

2022-05-16 Thread Jason Gunthorpe via iommu
On Sat, May 14, 2022 at 09:43:19AM +0800, Lu Baolu wrote:
> intel-iommu.h is not needed in drm/i915 anymore. Remove its include.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/gpu/drm/i915/i915_drv.h| 1 -
>  drivers/gpu/drm/i915/display/intel_display.c   | 1 -
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 1 -
>  3 files changed, 3 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH 3/7] iommu/vt-d: Remove unnecessary exported symbol

2022-05-16 Thread Jason Gunthorpe via iommu
On Sat, May 14, 2022 at 09:43:18AM +0800, Lu Baolu wrote:
> The exported symbol intel_iommu_gfx_mapped is not used anywhere in the
> tree. Remove it to avoid dead code.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/intel-iommu.h | 1 -
>  drivers/iommu/intel/iommu.c | 6 --
>  2 files changed, 7 deletions(-)

Reviewed-by: Jason Gunthorpe 

Maybe could squash to the prior patch

Jason


Re: [PATCH 2/7] agp/intel: Use per device iommu check

2022-05-16 Thread Jason Gunthorpe via iommu
On Sat, May 14, 2022 at 09:43:17AM +0800, Lu Baolu wrote:
> The IOMMU subsystem has already provided an interface to query whether
> the IOMMU hardware is enabled for a specific device. This changes the
> check from Intel specific intel_iommu_gfx_mapped (globally exported by
> the Intel IOMMU driver) to probing the presence of IOMMU on a specific
> device using the generic device_iommu_mapped().
> 
> This follows commit cca084692394a ("drm/i915: Use per device iommu check")
> which converted drm/i915 driver to use device_iommu_mapped().
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/char/agp/intel-gtt.c | 17 +++--
>  1 file changed, 7 insertions(+), 10 deletions(-)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH 1/7] iommu/vt-d: Move trace/events/intel_iommu.h under iommu

2022-05-16 Thread Jason Gunthorpe via iommu
On Sat, May 14, 2022 at 09:43:16AM +0800, Lu Baolu wrote:
> This header file is private to the Intel IOMMU driver. Move it to the
> driver folder.
> 
> Signed-off-by: Lu Baolu 
> ---
>  .../trace/events/intel_iommu.h => drivers/iommu/intel/trace.h | 4 
>  drivers/iommu/intel/dmar.c| 2 +-
>  drivers/iommu/intel/svm.c | 2 +-
>  drivers/iommu/intel/trace.c   | 2 +-
>  4 files changed, 7 insertions(+), 3 deletions(-)
>  rename include/trace/events/intel_iommu.h => drivers/iommu/intel/trace.h 
> (94%)

Reviewed-by: Jason Gunthorpe 

Jason


Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-16 Thread Jason Gunthorpe via iommu
On Mon, May 16, 2022 at 12:22:08PM +0100, Robin Murphy wrote:
> On 2022-05-16 02:57, Lu Baolu wrote:
> > Each IOMMU driver must provide a blocking domain ops. If the hardware
> > supports detaching domain from device, setting blocking domain equals
> > detaching the existing domain from the deivce. Otherwise, an UNMANAGED
> > domain without any mapping will be used instead.
> 
> Unfortunately that's backwards - most of the implementations of .detach_dev
> are disabling translation entirely, meaning the device ends up effectively
> in passthrough rather than blocked.

Ideally we'd convert the detach_dev of every driver into either
a blocking or identity domain. The trick is knowing which is which..

Guessing going down the list:
 apple dart - blocking, detach_dev calls apple_dart_hw_disable_dma() same as
  IOMMU_DOMAIN_BLOCKED
  [I wonder if this driver is wrong in other ways though because
   I don't see a remove_streams in attach_dev]
 exynos - this seems to disable the 'sysmmu' so I'm guessing this is
  identity
 iommu-vmsa - Comment says 'disable mmu translaction' so I'm guessing
  this is identity
 mtk_v1 - Code looks similar to mtk, which is probably identity.
 rkt - No idea
 sprd - No idea
 sun50i - This driver confusingly treats identity the same as
  unmanaged, seems wrong, no idea.
 amd - Not sure, clear_dte_entry() seems to set translation on but points
   the PTE to 0 ? Based on the spec table 8 I would have expected
   TV to be clear which would be blocking. Maybe a bug??
 arm smmu qcomm - not sure
 intel - blocking

These doesn't support default domains, so detach_dev should return 
back to DMA API ownership, which is either identity or something weird:
 fsl_pamu - identity due to the PPC use of dma direct
 msm
 mtk
 omap
 s390 - platform DMA ops
 tegra-gart - Usually something called a GART would be 0 length once
  disabled, guessing blocking?
 tegra-smmu

So, the approach here should be to go driver by driver and convert
detach_dev to either identity, blocking or just delete it entirely,
excluding the above 7 that don't support default domains. And get acks
from the driver owners.

> Conversely, at least arm-smmu and arm-smmu-v3 could implement
> IOMMU_DOMAIN_BLOCKED properly with fault-type S2CRs and STEs
> respectively, it just needs a bit of wiring up.

Given that vfio now uses them it seems worthwhile to do..

Jason


Re: [PATCH 2/5] iommu: Add blocking_domain_ops field in iommu_ops

2022-05-16 Thread Jason Gunthorpe via iommu
On Mon, May 16, 2022 at 12:27:41AM -0700, Christoph Hellwig wrote:
> On Mon, May 16, 2022 at 09:57:56AM +0800, Lu Baolu wrote:
> > Each IOMMU driver must provide a blocking domain ops. If the hardware
> > supports detaching domain from device, setting blocking domain equals
> > detaching the existing domain from the deivce. Otherwise, an UNMANAGED
> > domain without any mapping will be used instead.
> 
> blocking in this case means not allowing any access?  The naming
> sounds a bit odd to me as blocking in the kernel has a specific
> meaning.  Maybe something like noaccess ops might be a better name?

It is because of this:

include/linux/iommu.h: * IOMMU_DOMAIN_BLOCKED - All DMA is blocked, can
be used to isolate
include/linux/iommu.h:#define IOMMU_DOMAIN_BLOCKED  (0U)

noaccess might be clearer

Jason


Re: [RESEND PATCH v8 00/11] Fix BUG_ON in vfio_iommu_group_notifier()

2022-05-13 Thread Jason Gunthorpe via iommu
On Fri, May 13, 2022 at 10:25:48AM -0600, Alex Williamson wrote:
> On Fri, 13 May 2022 17:49:44 +0200
> Joerg Roedel  wrote:
> 
> > Hi Alex,
> > 
> > On Wed, May 04, 2022 at 10:29:56AM -0600, Alex Williamson wrote:
> > > Done, and thanks for the heads-up.  Please try to cc me when the
> > > vfio-notifier-fix branch is merged back into your next branch.  Thanks,  
> > 
> > This has happened now, the vfio-notifier-fix branch got the fix and is
> > merged back into my next branch.
> 
> Thanks, Joerg!
> 
> Jason, I'll push a merge of this with
> 
> Subject: [PATCH] vfio: Delete container_q
> 0-v1-a1e8791d795b+6b-vfio_container_q_...@nvidia.com
> 
> and
> 
> Subject: [PATCH v3 0/8] Remove vfio_group from the struct file facing VFIO API
> 0-v3-f7729924a7ea+25e33-vfio_kvm_no_group_...@nvidia.com
> 
> as soon as my sanity build finishes.  Thanks,

Thanks, I'll rebase and repost the remaining vfio series.

Jason


Re: [PATCH] iommu/vt-d: Try info->iommu in device_to_iommu()

2022-05-13 Thread Jason Gunthorpe via iommu
On Fri, May 13, 2022 at 11:32:11AM +0800, Baolu Lu wrote:
> On 2022/5/13 08:32, Nicolin Chen wrote:
> > Local boot test and VFIO sanity test show that info->iommu can be
> > used in device_to_iommu() as a fast path. So this patch adds it.
> > 
> > Signed-off-by: Nicolin Chen 
> >   drivers/iommu/intel/iommu.c | 5 +
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 2990f80c5e08..412fca5ab9cd 100644
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -777,6 +777,7 @@ static bool iommu_is_dummy(struct intel_iommu *iommu, 
> > struct device *dev)
> >   struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 
> > *devfn)
> >   {
> > struct dmar_drhd_unit *drhd = NULL;
> > +   struct device_domain_info *info;
> > struct pci_dev *pdev = NULL;
> > struct intel_iommu *iommu;
> > struct device *tmp;
> > @@ -786,6 +787,10 @@ struct intel_iommu *device_to_iommu(struct device 
> > *dev, u8 *bus, u8 *devfn)
> > if (!dev)
> > return NULL;
> > +   info = dev_iommu_priv_get(dev);
> > +   if (info)
> > +   return info->iommu;
> 
> device_to_iommu() also returns device source id (@bus, @devfn).

Those are in the info too..

> Perhaps, we can make device_to_iommu() only for probe_device() where the
> per-device info data is not initialized yet. After probe_device(), iommu
> and sid are retrieved through other helpers by looking up the device
> info directly?

This design makes the most sense to me... Nicolin you said there was a
case where attach was happening before probe though??

Jason
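The fast path under discussion can be modeled in plain C. This is a hedged illustration, not the real intel-iommu code: the structure layouts and the slow path are invented stand-ins, and only the idea comes from the patch, namely that the cached per-device info supplies both the iommu pointer and the source id (bus/devfn), with the slow scan used only before the info is set.

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-ins for the kernel structures referenced in the patch. */
struct intel_iommu { int id; };
struct device_domain_info { struct intel_iommu *iommu; unsigned char bus, devfn; };
struct device { struct device_domain_info *priv; };

static struct device_domain_info *dev_iommu_priv_get(struct device *dev)
{
	return dev->priv;
}

/* Slow path: scan DRHD units (modeled as a single global here). */
static struct intel_iommu slow_iommu = { .id = 99 };
static struct intel_iommu *slow_lookup(struct device *dev,
				       unsigned char *bus, unsigned char *devfn)
{
	(void)dev;
	*bus = 0;
	*devfn = 0;
	return &slow_iommu;
}

/* Fast path: when per-device info exists it already carries the iommu
 * pointer and the source id, so no scan is needed. */
static struct intel_iommu *device_to_iommu(struct device *dev,
					   unsigned char *bus, unsigned char *devfn)
{
	struct device_domain_info *info = dev_iommu_priv_get(dev);

	if (info) {
		*bus = info->bus;
		*devfn = info->devfn;
		return info->iommu;
	}
	return slow_lookup(dev, bus, devfn);
}
```

This also illustrates Baolu's point: the info covers everything device_to_iommu() returns, so after probe the scan is never required.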


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-12 Thread Jason Gunthorpe via iommu
On Thu, May 12, 2022 at 07:59:41PM +0800, Baolu Lu wrote:
> On 2022/5/12 19:48, Jason Gunthorpe wrote:
> > On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote:
> > > On 2022/5/12 13:01, Tian, Kevin wrote:
> > > > > From: Baolu Lu 
> > > > > Sent: Thursday, May 12, 2022 11:03 AM
> > > > > 
> > > > > On 2022/5/11 22:53, Jason Gunthorpe wrote:
> > > > > > > > Also, given the current arrangement it might make sense to have 
> > > > > > > > a
> > > > > > > > struct iommu_domain_sva given that no driver is wrappering this 
> > > > > > > > in
> > > > > > > > something else.
> > > > > > > Fair enough. How about below wrapper?
> > > > > > > 
> > > > > > > +struct iommu_sva_domain {
> > > > > > > +   /*
> > > > > > > +* Common iommu domain header, *must* be put at the top
> > > > > > > +* of the structure.
> > > > > > > +*/
> > > > > > > +   struct iommu_domain domain;
> > > > > > > +   struct mm_struct *mm;
> > > > > > > +   struct iommu_sva bond;
> > > > > > > +}
> > > > > > > 
> > > > > > > The refcount is wrapped in bond.
> > > > > > I'm still not sure that bond is necessary
> > > > > 
> > > > > "bond" is the sva handle that the device drivers get through calling
> > > > > iommu_sva_bind().
> > > > > 
> > > > 
> > > > 'bond' was required before because we didn't have a domain to wrap
> > > > the page table at that time.
> > > > 
> > > > Now we have a domain and it is 1:1 associated to bond. Probably
> > > > make sense now by just returning the domain as the sva handle
> > > > instead?
> > > 
> > > It also includes the device information that the domain has been
> > > attached. So the sva_unbind() looks like this:
> > > 
> > > /**
> > >   * iommu_sva_unbind_device() - Remove a bond created with
> > > iommu_sva_bind_device
> > >   * @handle: the handle returned by iommu_sva_bind_device()
> > >   *
> > >   * Put reference to a bond between device and address space. The device
> > > should
> > >   * not be issuing any more transaction for this PASID. All outstanding 
> > > page
> > >   * requests for this PASID must have been flushed to the IOMMU.
> > >   */
> > > void iommu_sva_unbind_device(struct iommu_sva *handle)
> > > 
> > > It's fine to replace the iommu_sva with iommu_sva_domain for sva handle,
> > > if we can include the device in the unbind() interface.
> > 
> > Why would we have a special unbind for SVA?
> 
> It's about SVA kAPI for device drivers. The existing kAPIs include:
> 
> struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> struct mm_struct *mm,
> void *drvdata);
> void iommu_sva_unbind_device(struct iommu_sva *handle);
> u32 iommu_sva_get_pasid(struct iommu_sva *handle);

This is not what we agreed the API should be. We agreed:

 iommu_sva_domain_alloc()
 iommu_attach_device_pasid()
 iommu_detach_device_pasid()

Again, SVA should not be different from normal domain stuff.
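The agreed flow can be sketched as a minimal userspace model. Everything inside the structures below is invented for illustration; only the three function names come from the mail, and the point is simply that SVA attach/detach uses the same domain-based calls as everything else:

```c
#include <assert.h>
#include <stdlib.h>

enum { MAX_PASID = 8 };

struct iommu_domain { int type; };
struct mm_struct { int dummy; };
struct device { struct iommu_domain *pasid_tbl[MAX_PASID]; };

/* SVA domain wrapper with the common header first, as in the thread. */
struct iommu_sva_domain {
	struct iommu_domain domain;
	struct mm_struct *mm;
};

static struct iommu_domain *iommu_sva_domain_alloc(struct mm_struct *mm)
{
	struct iommu_sva_domain *sva = calloc(1, sizeof(*sva));

	if (!sva)
		return NULL;
	sva->domain.type = 1;	/* marks "SVA" in this toy model */
	sva->mm = mm;
	return &sva->domain;
}

static int iommu_attach_device_pasid(struct iommu_domain *domain,
				     struct device *dev, int pasid)
{
	if (dev->pasid_tbl[pasid])
		return -1;	/* PASID already has a translation */
	dev->pasid_tbl[pasid] = domain;
	return 0;
}

static void iommu_detach_device_pasid(struct iommu_domain *domain,
				      struct device *dev, int pasid)
{
	if (dev->pasid_tbl[pasid] == domain)
		dev->pasid_tbl[pasid] = NULL;
}
```

A driver would call alloc once per mm, then attach/detach per device and PASID, with no SVA-specific bind/unbind entry points.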

Jason


Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-12 Thread Jason Gunthorpe via iommu
On Thu, May 12, 2022 at 07:57:26PM +0800, zhangfei@foxmail.com wrote:
> 
> 
> > On 2022/5/12 7:32 PM, Jason Gunthorpe via iommu wrote:
> > On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:
> > > > > I can't help feeling a little wary about removing this until IOMMUFD
> > > > > can actually offer a functional replacement - is it in the way of
> > > > > anything upcoming?
> > > >   From an upstream perspective if someone has a patched kernel to
> > > > complete the feature, then they can patch this part in as well, we
> > > > should not carry dead code like this in the kernel and in the uapi.
> > > > 
> > > > It is not directly in the way, but this needs to get done at some
> > > > point, I'd rather just get it out of the way.
> > > We are using this interface for nested mode.
> > How are you using it? It doesn't do anything. Do you have out of tree
> > patches as well?
> 
> Yes, there are some patches out of tree, since they are pending for almost
> one year.
> 
> By the way, I am trying to test nesting mode based on iommufd, still
> requires iommu_enable_nesting,
> which is removed in this patch as well.

This is temporary.

> So can we wait for alternative patch then remove them?

Do you see a problem with patching this along with all the other
patches you need?

Jason

Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-12 Thread Jason Gunthorpe via iommu
On Thu, May 12, 2022 at 02:22:03PM +0800, Baolu Lu wrote:
> Hi Jason,
> 
> On 2022/5/12 01:00, Jason Gunthorpe wrote:
> > > Consolidate pasid programming into dev_set_pasid() then called by both
> > > intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?
> > I was only suggesting that really dev_attach_pasid() op is misnamed,
> > it should be called set_dev_pasid() and act like a set, not a paired
> > attach/detach - same as the non-PASID ops.
> 
> So,
> 
>   "set_dev_pasid(domain, device, pasid)" equals to dev_attach_pasid()
> 
> and
> 
>   "set_dev_pasid(NULL, device, pasid)" equals to dev_detach_pasid()?
> 
> do I understand it right?

blocking_domain should be passed, not null

Jason


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-12 Thread Jason Gunthorpe via iommu
On Thu, May 12, 2022 at 11:02:39AM +0800, Baolu Lu wrote:
> > > +   mutex_lock(&group->mutex);
> > > +   domain = xa_load(&group->pasid_array, pasid);
> > > +   if (domain && domain->type != type)
> > > +   domain = NULL;
> > > +   mutex_unlock(&group->mutex);
> > > +   iommu_group_put(group);
> > > +
> > > +   return domain;
> > This is bad locking, group->pasid_array values cannot be taken outside
> > the lock.
> 
> It's not iommu core, but SVA (or other feature components) that manage
> the life cycle of a domain. The iommu core only provides a place to
> store the domain pointer. The feature components are free to fetch their
> domain pointers from iommu core as long as they are sure that the domain
> is alive during use.

I'm not convinced.

Jason


Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-12 Thread Jason Gunthorpe via iommu
On Thu, May 12, 2022 at 08:00:59AM +0100, Jean-Philippe Brucker wrote:

> > It is not "missing" it is just renamed to 
> > blocking_domain->ops->set_dev_pasid()
> > 
> > The implementation of that function would be identical to
> > detach_dev_pasid.
> 
>   attach(dev, pasid, sva_domain)
>   detach(dev, pasid, sva_domain)
> 
> versus
> 
>   set_dev_pasid(dev, pasid, sva_domain)
>   set_dev_pasid(dev, pasid, blocking)
> 
> we lose the information of the domain previously attached, and the SMMU
> driver has to retrieve it to find the ASID corresponding to the mm. 

It would be easy to have the old domain be an input as well - the core
code knows it.

Jason


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-12 Thread Jason Gunthorpe via iommu
On Thu, May 12, 2022 at 01:17:08PM +0800, Baolu Lu wrote:
> On 2022/5/12 13:01, Tian, Kevin wrote:
> > > From: Baolu Lu 
> > > Sent: Thursday, May 12, 2022 11:03 AM
> > > 
> > > On 2022/5/11 22:53, Jason Gunthorpe wrote:
> > > > > > Also, given the current arrangement it might make sense to have a
> > > > > > struct iommu_domain_sva given that no driver is wrappering this in
> > > > > > something else.
> > > > > Fair enough. How about below wrapper?
> > > > > 
> > > > > +struct iommu_sva_domain {
> > > > > +   /*
> > > > > +* Common iommu domain header, *must* be put at the top
> > > > > +* of the structure.
> > > > > +*/
> > > > > +   struct iommu_domain domain;
> > > > > +   struct mm_struct *mm;
> > > > > +   struct iommu_sva bond;
> > > > > +}
> > > > > 
> > > > > The refcount is wrapped in bond.
> > > > I'm still not sure that bond is necessary
> > > 
> > > "bond" is the sva handle that the device drivers get through calling
> > > iommu_sva_bind().
> > > 
> > 
> > 'bond' was required before because we didn't have a domain to wrap
> > the page table at that time.
> > 
> > Now we have a domain and it is 1:1 associated to bond. Probably
> > make sense now by just returning the domain as the sva handle
> > instead?
> 
> It also includes the device information that the domain has been
> attached. So the sva_unbind() looks like this:
> 
> /**
>  * iommu_sva_unbind_device() - Remove a bond created with
> iommu_sva_bind_device
>  * @handle: the handle returned by iommu_sva_bind_device()
>  *
>  * Put reference to a bond between device and address space. The device
> should
>  * not be issuing any more transaction for this PASID. All outstanding page
>  * requests for this PASID must have been flushed to the IOMMU.
>  */
> void iommu_sva_unbind_device(struct iommu_sva *handle)
> 
> It's fine to replace the iommu_sva with iommu_sva_domain for sva handle,
> if we can include the device in the unbind() interface.

Why would we have a special unbind for SVA?

SVA should not be different from normal domains; it should use the
normal detach flow too.

Jason


Re: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-12 Thread Jason Gunthorpe via iommu
On Thu, May 12, 2022 at 03:07:09PM +0800, Zhangfei Gao wrote:
> > > I can't help feeling a little wary about removing this until IOMMUFD
> > > can actually offer a functional replacement - is it in the way of
> > > anything upcoming?
> >  From an upstream perspective if someone has a patched kernel to
> > complete the feature, then they can patch this part in as well, we
> > should not carry dead code like this in the kernel and in the uapi.
> > 
> > It is not directly in the way, but this needs to get done at some
> > point, I'd rather just get it out of the way.
> 
> We are using this interface for nested mode.

How are you using it? It doesn't do anything. Do you have out of tree
patches as well?

Jason


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 10:25:21AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 11 May 2022 14:00:25 -0300, Jason Gunthorpe  wrote:
> 
> > On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:
> > > > > If not global, perhaps we could have a list of pasids (e.g. xarray)
> > > > > attached to the device_domain_info. The TLB flush logic would just
> > > > > go through the list w/o caring what the PASIDs are for. Does it
> > > > > make sense to you?
> > > > 
> > > > Sort of, but we shouldn't duplicate xarrays - the group already has
> > > > this xarray - need to find some way to allow access to it from the
> > > > driver.
> > > >   
> > > I am not following,  here are the PASIDs for devTLB flush which is per
> > > device. Why group?  
> > 
> > Because group is where the core code stores it.
> I see, with singleton group. I guess I can let dma-iommu code call
> 
> iommu_attach_dma_pasid {
>   iommu_attach_device_pasid();
> Then the PASID will be stored in the group xa.

Yes, again, the dma-iommu should not be any different from the normal
unmanaged path. At this point there is no longer any difference, we
should not invent new ones.

> The flush code can retrieve PASIDs from device_domain_info.device ->
> group -> pasid_array.  Thanks for pointing it out, I missed the new
> pasid_array.

Yes.. It seems inefficient to iterate over that xarray multiple times
on the flush hot path, but maybe there is little choice. Try to use
the xas iterators under the xa_lock spinlock..

The challenge will be accessing the group xa in the first place, but
maybe the core code can gain a function call to return a pointer to
that XA or something..
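The suggested flush-path pattern, walking the group's pasid_array while the lock is held, can be modeled like this. A plain array and a lock stub stand in for the kernel xarray, xa_lock and xas_for_each(), so this is an illustration of the pattern only, not kernel code:

```c
#include <assert.h>

enum { MAX_PASID = 16 };

/* Stand-in for xa_lock; a real driver would hold the xarray spinlock. */
static int xa_locked;
static void xa_lock(void)   { xa_locked = 1; }
static void xa_unlock(void) { xa_locked = 0; }

static int pasid_array[MAX_PASID];	/* nonzero => PASID has a translation */

static int flushed;
static void flush_one(int pasid)	/* stands in for a devTLB invalidation */
{
	assert(xa_locked);		/* must only run under the lock */
	(void)pasid;
	flushed++;
}

/* Walk every live PASID once, under the lock, analogous to
 * iterating the group's pasid_array with the xas iterators. */
static void flush_dev_pasids(void)
{
	xa_lock();
	for (int p = 0; p < MAX_PASID; p++)
		if (pasid_array[p])
			flush_one(p);
	xa_unlock();
}
```

The cost Jason notes is visible here: every flush walks the whole structure, which is why doing it once under the lock with the iterators matters on a hot path.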

Jason


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:
> > > If not global, perhaps we could have a list of pasids (e.g. xarray)
> > > attached to the device_domain_info. The TLB flush logic would just go
> > > through the list w/o caring what the PASIDs are for. Does it make sense
> > > to you?  
> > 
> > Sort of, but we shouldn't duplicate xarrays - the group already has
> > this xarray - need to find some way to allow access to it from the
> > driver.
> > 
> I am not following,  here are the PASIDs for devTLB flush which is per
> device. Why group?

Because group is where the core code stores it.

> We could retrieve PASIDs from the device PASID table but xa would be more
> efficient.
> 
> > > > > Are you suggesting the dma-iommu API should be called
> > > > > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?
> > > > 
> > > > No that API is Ok - the driver ops API should be 'set' not
> > > > attach/detach 
> > > Sounds good, this operation has little in common with
> > > domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new
> > > domain_ops.dev_set_pasid()  
> > 
> > What? No, their should only be one operation, 'dev_set_pasid' and it
> > is exactly the same as the SVA operation. It configures things so that
> > any existing translation on the PASID is removed and the PASID
> > translates according to the given domain.
> > 
> > SVA given domain or UNMANAGED given domain doesn't matter to the
> > higher level code. The driver should implement per-domain ops as
> > required to get the different behaviors.
> Perhaps some code to clarify, we have
> sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid;
> default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid;

Yes, keep that structure
 
> Consolidate pasid programming into dev_set_pasid() then called by both
> intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?

I was only suggesting that really dev_attach_pasid() op is misnamed,
it should be called set_dev_pasid() and act like a set, not a paired
attach/detach - same as the non-PASID ops.

Jason


Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 03:15:22AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, May 11, 2022 3:00 AM
> > 
> > On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> > > Ok... here's a revised version of my proposal which I think addresses
> > > your concerns and simplifies things.
> > >
> > > - No new operations, but IOAS_MAP gets some new flags (and IOAS_COPY
> > >   will probably need matching changes)
> > >
> > > - By default the IOVA given to IOAS_MAP is a hint only, and the IOVA
> > >   is chosen by the kernel within the aperture(s).  This is closer to
> > >   how mmap() operates, and DPDK and similar shouldn't care about
> > >   having specific IOVAs, even at the individual mapping level.
> > >
> > > - IOAS_MAP gets an IOMAP_FIXED flag, analogous to mmap()'s MAP_FIXED,
> > >   for when you really do want to control the IOVA (qemu, maybe some
> > >   special userspace driver cases)
> > 
> > We already did both of these, the flag is called
> > IOMMU_IOAS_MAP_FIXED_IOVA - if it is not specified then kernel will
> > select the IOVA internally.
> > 
> > > - ATTACH will fail if the new device would shrink the aperture to
> > >   exclude any already established mappings (I assume this is already
> > >   the case)
> > 
> > Yes
> > 
> > > - IOAS_MAP gets an IOMAP_RESERVE flag, which operates a bit like a
> > >   PROT_NONE mmap().  It reserves that IOVA space, so other (non-FIXED)
> > >   MAPs won't use it, but doesn't actually put anything into the IO
> > >   pagetables.
> > > - Like a regular mapping, ATTACHes that are incompatible with an
> > >   IOMAP_RESERVEed region will fail
> > > - An IOMAP_RESERVEed area can be overmapped with an IOMAP_FIXED
> > >   mapping
> > 
> > Yeah, this seems OK, I'm thinking a new API might make sense because
> > you don't really want mmap replacement semantics but a permanent
> > record of what IOVA must always be valid.
> > 
> > IOMMU_IOA_REQUIRE_IOVA perhaps, similar signature to
> > IOMMUFD_CMD_IOAS_IOVA_RANGES:
> > 
> > struct iommu_ioas_require_iova {
> > __u32 size;
> > __u32 ioas_id;
> > __u32 num_iovas;
> > __u32 __reserved;
> > struct iommu_required_iovas {
> > __aligned_u64 start;
> > __aligned_u64 last;
> > } required_iovas[];
> > };
> 
> As a permanent record do we want to enforce that once the required
> range list is set all FIXED and non-FIXED allocations must be within the
> list of ranges?

No, I would just use this as a guarantee that going forward any
get_ranges will always return ranges that cover the listed required
ranges. Ie any narrowing of the ranges will be refused.

map/unmap should only be restricted to the get_ranges output.

Wouldn't burn CPU cycles to nanny userspace here.

> If yes we can take the end of the last range as the max size of the iova
> address space to optimize the page table layout.

I think this API should not interact with the driver. Its only job is
to prevent devices from attaching that would narrow the ranges.

If we also use it to adjust the aperture of the created iommu_domain
then it loses its usefulness as a guard since something like qemu
would have to leave room for hotplug as well.

I suppose optimizing the created iommu_domains should be some other
API, with a different set of ranges and the '# of bytes of IOVA' hint
as well.
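The guard semantics described here can be sketched in a few lines: refuse any attach whose resulting apertures no longer cover every previously required IOVA range. The types below are invented for illustration; only the range semantics come from the thread.

```c
#include <assert.h>
#include <stdbool.h>

struct iova_range { unsigned long long start, last; };	/* inclusive */

/* Is a required range fully inside at least one aperture? */
static bool range_covered(const struct iova_range *req,
			  const struct iova_range *ap, int nap)
{
	for (int i = 0; i < nap; i++)
		if (ap[i].start <= req->start && req->last <= ap[i].last)
			return true;
	return false;
}

/* True if the aperture set a new attach would produce still satisfies
 * every required range; otherwise the attach must be refused. */
static bool attach_allowed(const struct iova_range *req, int nreq,
			   const struct iova_range *ap, int nap)
{
	for (int i = 0; i < nreq; i++)
		if (!range_covered(&req[i], ap, nap))
			return false;
	return true;
}
```

Note this only gates attaches; per the discussion, it does not constrain individual maps, which are checked against the get_ranges output instead.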

> > > For (unoptimized) qemu it would be:
> > >
> > > 1. Create IOAS
> > > 2. IOAS_MAP(IOMAP_FIXED|IOMAP_RESERVE) the valid IOVA regions of
> > the
> > >guest platform
> > > 3. ATTACH devices (this will fail if they're not compatible with the
> > >reserved IOVA regions)
> > > 4. Boot the guest
> 
> I suppose above is only the sample flow for PPC vIOMMU. For non-PPC
> vIOMMUs regular mappings are required before booting the guest and
> reservation might be done but not mandatory (at least not what current
> Qemu vfio can afford as it simply replays valid ranges in the CPU address
> space).

I think qemu can always do it, it feels like it would simplify error
cases around aperture mismatches.

Jason


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 08:35:18AM -0700, Jacob Pan wrote:

> > Huh? The intel driver shares the same ops between UNMANAGED and DMA -
> > and in general I do not think we should be putting special knowledge
> > about the DMA domains in the drivers. Drivers should continue to treat
> > them identically to UNMANAGED.
> > 
> OK, other than SVA domain, the rest domain types share the same default ops.
> I agree that the default ops should be the same for UNMANAGED, IDENTITY, and
> DMA domain types. Minor detail is that we need to treat IDENTITY domain
> slightly different when it comes down to PASID entry programming.

I would be happy if IDENTITY had its own ops, if that makes sense

> If not global, perhaps we could have a list of pasids (e.g. xarray) attached
> to the device_domain_info. The TLB flush logic would just go through the
> list w/o caring what the PASIDs are for. Does it make sense to you?

Sort of, but we shouldn't duplicate xarrays - the group already has
this xarray - need to find some way to allow access to it from the
driver.

> > > Are you suggesting the dma-iommu API should be called
> > > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?  
> > 
> > No that API is Ok - the driver ops API should be 'set' not attach/detach
> > 
> Sounds good, this operation has little in common with
> domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new
> domain_ops.dev_set_pasid()

What? No, their should only be one operation, 'dev_set_pasid' and it
is exactly the same as the SVA operation. It configures things so that
any existing translation on the PASID is removed and the PASID
translates according to the given domain.

SVA given domain or UNMANAGED given domain doesn't matter to the
higher level code. The driver should implement per-domain ops as
required to get the different behaviors.
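The "set, not paired attach/detach" semantics can be sketched as a single op. The types are invented; real drivers would also tear down the old translation and flush, which is elided here:

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_PASID = 8 };

struct iommu_domain { const char *name; };
struct device { struct iommu_domain *pasid_tbl[MAX_PASID]; };

/* A domain that translates nothing; "detach" is setting this. */
static struct iommu_domain blocking_domain = { .name = "blocked" };

/* One op for every domain type: whatever the PASID translated to
 * before is replaced, and it now translates per the given domain. */
static int set_dev_pasid(struct iommu_domain *domain,
			 struct device *dev, int pasid)
{
	dev->pasid_tbl[pasid] = domain;
	return 0;
}
```

SVA and UNMANAGED domains both go through the same call; only the per-domain ops behind it differ.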

Jason


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 03:21:31PM +0800, Baolu Lu wrote:
> On 2022/5/10 23:23, Jason Gunthorpe wrote:
> > On Tue, May 10, 2022 at 02:17:34PM +0800, Lu Baolu wrote:
> > 
> > > +/**
> > > + * iommu_sva_bind_device() - Bind a process address space to a device
> > > + * @dev: the device
> > > + * @mm: the mm to bind, caller must hold a reference to mm_users
> > > + * @drvdata: opaque data pointer to pass to bind callback
> > > + *
> > > + * Create a bond between device and address space, allowing the device 
> > > to access
> > > + * the mm using the returned PASID. If a bond already exists between 
> > > @device and
> > > + * @mm, it is returned and an additional reference is taken. Caller must 
> > > call
> > > + * iommu_sva_unbind_device() to release each reference.
> > > + *
> > > + * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called 
> > > first, to
> > > + * initialize the required SVA features.
> > > + *
> > > + * On error, returns an ERR_PTR value.
> > > + */
> > > +struct iommu_sva *
> > > +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void 
> > > *drvdata)
> > > +{
> > > + int ret = -EINVAL;
> > > + struct iommu_sva *handle;
> > > + struct iommu_domain *domain;
> > > +
> > > + /*
> > > +  * TODO: Remove the drvdata parameter after kernel PASID support is
> > > +  * enabled for the idxd driver.
> > > +  */
> > > + if (drvdata)
> > > + return ERR_PTR(-EOPNOTSUPP);
> > 
> > Why is this being left behind? Clean up the callers too please.
> 
> Okay, let me try to.
> 
> > 
> > > + /* Allocate mm->pasid if necessary. */
> > > + ret = iommu_sva_alloc_pasid(mm, 1, (1U << dev->iommu->pasid_bits) - 1);
> > > + if (ret)
> > > + return ERR_PTR(ret);
> > > +
> > > + mutex_lock(&iommu_sva_lock);
> > > + /* Search for an existing bond. */
> > > + handle = xa_load(&dev->iommu->sva_bonds, mm->pasid);
> > > + if (handle) {
> > > + refcount_inc(>users);
> > > + goto out_success;
> > > + }
> > 
> > How can there be an existing bond?
> > 
> > dev->iommu is per-device
> > 
> > The device_group_immutable_singleton() insists on a single device
> > group
> > 
> > Basically 'sva_bonds' is the same thing as the group->pasid_array.
> 
> Yes, really.
> 
> > 
> > Assuming we leave room for multi-device groups this logic should just
> > be
> > 
> > group = iommu_group_get(dev);
> > if (!group)
> > return -ENODEV;
> > 
> > mutex_lock(&group->mutex);
> > domain = xa_load(&group->pasid_array, mm->pasid);
> > if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm)
> > domain = iommu_sva_alloc_domain(dev, mm);
> > 
> > ?
> 
> Agreed. As a helper in iommu core, how about making it more generic like
> below?

IDK, is there more users of this? AFAIK SVA is the only place that
will be auto-sharing?

> +   mutex_lock(&group->mutex);
> +   domain = xa_load(&group->pasid_array, pasid);
> +   if (domain && domain->type != type)
> +   domain = NULL;
> +   mutex_unlock(&group->mutex);
> +   iommu_group_put(group);
> +
> +   return domain;

This is bad locking, group->pasid_array values cannot be taken outside
the lock.
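One conventional way to make such a lookup safe is to take a reference while the lock is still held, so the entry cannot be freed between lookup and use. Sketched below with invented types and a lock stub standing in for group->mutex; this is an illustration of the pattern, not the proposed kernel code:

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_PASID = 8 };

struct iommu_domain { int type; int users; };

/* Stand-in for group->mutex. */
static int group_locked;
static void group_lock(void)   { group_locked = 1; }
static void group_unlock(void) { group_locked = 0; }

static struct iommu_domain *pasid_array[MAX_PASID];

static struct iommu_domain *get_pasid_domain(int pasid, int type)
{
	struct iommu_domain *d;

	group_lock();
	d = pasid_array[pasid];
	if (d && d->type == type)
		d->users++;	/* reference taken while still locked */
	else
		d = NULL;
	group_unlock();
	return d;		/* caller now owns a reference */
}

static void put_pasid_domain(struct iommu_domain *d)
{
	group_lock();
	d->users--;		/* a real put would free at zero */
	group_unlock();
}
```

Returning the raw pointer after dropping the lock, as in the quoted snippet, leaves a window where the domain could be freed before the caller uses it.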

> > And stick the refcount in the sva_domain
> > 
> > Also, given the current arrangement it might make sense to have a
> > struct iommu_domain_sva given that no driver is wrappering this in
> > something else.
> 
> Fair enough. How about below wrapper?
> 
> +struct iommu_sva_domain {
> +   /*
> +* Common iommu domain header, *must* be put at the top
> +* of the structure.
> +*/
> +   struct iommu_domain domain;
> +   struct mm_struct *mm;
> +   struct iommu_sva bond;
> +}
>
> The refcount is wrapped in bond.

I'm still not sure that bond is necessary

But yes, something like that

Jason


Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 08:54:39AM +0100, Jean-Philippe Brucker wrote:
> > > > Then 'detach pasid' is:
> > > >
> > > > iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev,
> > > pasid);
> > > >
> > > > And we move away from the notion of 'detach' and in the direction that
> > > > everything continuously has a domain set. PASID would logically
> > > > default to blocking_domain, though we wouldn't track this anywhere.
> > > 
> > > I am not sure whether we still need to keep the blocking domain concept
> > > when we are entering the new PASID world. Please allow me to wait and
> > > listen to more opinions.
> > > 
> > 
> > I'm with Jason on this direction. In concept after a PASID is detached it's
> > essentially blocked. Implementation-wise it doesn't prevent the iommu
> > driver from marking the PASID entry as non-present as doing in this
> > series instead of actually pointing to the empty page table of the block
> > domain. But api-wise it does make the entire semantics more consistent.
> 
> This is all internal to IOMMU so I don't think we should be concerned
> about API consistency. I prefer a straighforward detach() operation
> because that way IOMMU drivers don't have to keep track of which domain is
> attached to which PASID. That code can be factored into the IOMMU core.

Why would a driver need to keep additional tracking?

> In addition to clearing contexts, detach() also needs to invalidate TLBs,
> and for that the SMMU driver needs to know the old ASID (!= PASID) that
> was used by the context descriptor. We can certainly work around a missing
> detach() to implement this, but it will be convoluted.

It is not "missing" it is just renamed to blocking_domain->ops->set_dev_pasid()

The implementation of that function would be identical to
detach_dev_pasid.

Jason

