Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jason Gunthorpe via iommu
On Thu, Oct 07, 2021 at 12:11:27PM -0700, Jacob Pan wrote:
> Hi Barry,
> 
> On Thu, 7 Oct 2021 18:43:33 +1300, Barry Song <21cn...@gmail.com> wrote:
> 
> > > > Security-wise, KVA respects kernel mapping. So permissions are better
> > > > enforced than pass-through and identity mapping.  
> > >
> > > Is this meaningful? Isn't the entire physical map still in the KVA and
> > > isn't it entirely RW ?  
> > 
> > Some areas are RX; for example, ARM64 supports KERNEL_TEXT_RDONLY.
> > But the difference is really minor.
> That brings up a good point: if we were to use the DMA API to give out KVA
> as dma_addr for trusted devices, we cannot satisfy DMA direction
> requirements since we can't change the kernel mapping. It would be similar
> to DMA direct, where dir is ignored, AFAICT.

Right.

Using the DMA API to DMA to read only kernel memory is a bug in the
first place.

> Or we are saying if the device is trusted, using pass-through is allowed.
> i.e. physical address.

I don't see trusted being relevant here beyond the usual decision to
use the trusted map or not.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jacob Pan
Hi Barry,

On Thu, 7 Oct 2021 18:43:33 +1300, Barry Song <21cn...@gmail.com> wrote:

> > > Security-wise, KVA respects kernel mapping. So permissions are better
> > > enforced than pass-through and identity mapping.  
> >
> > Is this meaningful? Isn't the entire physical map still in the KVA and
> > isn't it entirely RW ?  
> 
> Some areas are RX; for example, ARM64 supports KERNEL_TEXT_RDONLY.
> But the difference is really minor.
That brings up a good point: if we were to use the DMA API to give out KVA
as dma_addr for trusted devices, we cannot satisfy DMA direction
requirements since we can't change the kernel mapping. It would be similar
to DMA direct, where dir is ignored, AFAICT.

Or we are saying if the device is trusted, using pass-through is allowed.
i.e. physical address.

Thoughts?

Thanks,

Jacob


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jacob Pan
Hi Jason,

On Thu, 7 Oct 2021 14:48:22 -0300, Jason Gunthorpe  wrote:

> On Thu, Oct 07, 2021 at 10:50:10AM -0700, Jacob Pan wrote:
> 
> > On platforms that are DMA snooped, this barrier is not needed. But I
> > think your point is that once we convert to DMA API, the sync/barrier
> > is covered by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are
> > good.  
> 
> No.. my point is that a CPU store release is not necessarily a
> DMA-visible event on all platforms, and things like dma_wmb()/dma_rmb()
> may still be necessary. This all needs to be architected before anyone
> starts writing drivers that assume a coherent DMA model without using
> a coherent DMA allocation.
> 
Why is that specific to SVA? Or are you talking about things in general?

Can we ensure coherency at the API level where the SVA bind device call
happens? I.e., fail the bind if the device does not pass a coherency check.

Thanks,

Jacob
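A minimal sketch of the check proposed above: refuse a kernel SVA bind when the device is not DMA-coherent. dev_is_dma_coherent() is a real kernel helper (linux/dma-map-ops.h), but the struct device and bind function below are simplified, hypothetical stand-ins so the idea is self-contained:

```c
#include <errno.h>
#include <stdbool.h>

struct device {
	bool dma_coherent;	/* stand-in for the arch-managed flag */
};

static bool dev_is_dma_coherent(struct device *dev)
{
	return dev->dma_coherent;
}

/* Hypothetical kernel SVA bind entry point: fail early rather than
 * letting a non-coherent device share the kernel page table. */
static int kernel_sva_bind(struct device *dev)
{
	if (!dev_is_dma_coherent(dev))
		return -EOPNOTSUPP;	/* caller falls back to the DMA API */
	/* ... allocate a supervisor PASID and attach the page table ... */
	return 0;
}
```

With this shape, the coherency question is answered once at bind time instead of per DMA operation.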


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jason Gunthorpe via iommu
On Thu, Oct 07, 2021 at 10:50:10AM -0700, Jacob Pan wrote:

> On platforms that are DMA snooped, this barrier is not needed. But I think
> your point is that once we convert to DMA API, the sync/barrier is covered
> by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are good.

No.. my point is that a CPU store release is not necessarily a
DMA-visible event on all platforms, and things like dma_wmb()/dma_rmb()
may still be necessary. This all needs to be architected before anyone
starts writing drivers that assume a coherent DMA model without using
a coherent DMA allocation.

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jacob Pan
Hi Jason,

On Thu, 7 Oct 2021 08:59:18 -0300, Jason Gunthorpe  wrote:

> On Fri, Oct 08, 2021 at 12:54:52AM +1300, Barry Song wrote:
> > On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe  wrote:
> >  
> > >
> > > On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> > >  
> > > > So do we have a case where devices can directly access the kernel's
> > > > data structure such as a list/graph/tree with pointers to a kernel
> > > > virtual address? Then devices don't need to translate the addresses
> > > > of pointers in a structure. I assume this is one of the most useful
> > > > features userspace SVA can provide.
> > >
> > > AFAICT that is the only good case for KVA, but it is also completely
> > > against the endianness, word size and DMA portability design of the
> > > kernel.
> > >
> > > Going there requires some new set of portable APIs for globally
> > > coherent KVA DMA.
> > 
> > Yep, I agree. It would be very weird if accelerators/GPUs shared the
> > kernel's data structs but, for each "DMA" operation - reading or writing
> > the data struct - we had to call dma_map_single/sg or
> > dma_sync_single_for_cpu/device etc. It seems that once devices and CPUs
> > share virtual addresses (SVA), code shouldn't need to do an explicit
> > map/sync each time.
> 
That is what we have today with sva_bind_device.
> No, it still needs to do something to manage visibility from the
> current CPU to the DMA - it might not be flushing a cache, but it is
> probably an arch-specific CPU barrier instruction.
> 
Are you talking about iommu_dma_sync_single_for_cpu()? That is not SVA
specific, right?

On platforms that are DMA snooped, this barrier is not needed. But I think
your point is that once we convert to DMA API, the sync/barrier is covered
by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are good.

We could also add a check for dev_is_dma_coherent(dev) before using SVA.

> Jason


Thanks,

Jacob


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jason Gunthorpe via iommu
On Fri, Oct 08, 2021 at 12:54:52AM +1300, Barry Song wrote:
> On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe  wrote:
> >
> > On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> >
> > > So do we have a case where devices can directly access the kernel's data
> > > structure such as a list/graph/tree with pointers to a kernel virtual
> > > address? Then devices don't need to translate the addresses of pointers
> > > in a structure. I assume this is one of the most useful features
> > > userspace SVA can provide.
> >
> > AFAICT that is the only good case for KVA, but it is also completely
> > against the endianness, word size and DMA portability design of the
> > kernel.
> >
> > Going there requires some new set of portable APIs for globally
> > coherent KVA DMA.
> 
> Yep, I agree. It would be very weird if accelerators/GPUs shared the
> kernel's data structs but, for each "DMA" operation - reading or writing
> the data struct - we had to call dma_map_single/sg or
> dma_sync_single_for_cpu/device etc. It seems that once devices and CPUs
> share virtual addresses (SVA), code shouldn't need to do an explicit
> map/sync each time.

No, it still needs to do something to manage visibility from the
current CPU to the DMA - it might not be flushing a cache, but it is
probably an arch-specific CPU barrier instruction.

Jason
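Jason's point can be sketched as follows: even when device and CPU share a page table (SVA), handing a descriptor to the device still needs dma_wmb() so the payload stores become visible before the ownership bit flips. dma_wmb() is the real kernel barrier; here it is stubbed with a C11 release fence, and the descriptor layout is invented for illustration, so the sketch compiles in userspace:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stub: in the kernel this would be the arch-specific dma_wmb(). */
#define dma_wmb() atomic_thread_fence(memory_order_release)

struct desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;		/* bit 0: owned by the device */
};

static void post_desc(struct desc *d, uint64_t addr, uint32_t len)
{
	d->addr = addr;		/* 1. fill the payload */
	d->len = len;
	dma_wmb();		/* 2. order payload before ownership flip */
	d->flags |= 1;		/* 3. hand the descriptor to the device */
}
```

Nothing about SVA removes step 2; only a cache flush (for non-coherent DMA) is avoided.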


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Barry Song
On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe  wrote:
>
> On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
>
> > So do we have a case where devices can directly access the kernel's data
> > structure such as a list/graph/tree with pointers to a kernel virtual
> > address? Then devices don't need to translate the addresses of pointers in
> > a structure. I assume this is one of the most useful features userspace
> > SVA can provide.
>
> AFAICT that is the only good case for KVA, but it is also completely
> against the endianness, word size and DMA portability design of the
> kernel.
>
> Going there requires some new set of portable APIs for globally
> coherent KVA DMA.

Yep, I agree. It would be very weird if accelerators/GPUs shared the
kernel's data structs but, for each "DMA" operation - reading or writing
the data struct - we had to call dma_map_single/sg or
dma_sync_single_for_cpu/device etc. It seems that once devices and CPUs
share virtual addresses (SVA), code shouldn't need to do an explicit
map/sync each time.

>
> Jason

Thanks
barry


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-07 Thread Jason Gunthorpe via iommu
On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:

> So do we have a case where devices can directly access the kernel's data
> structure such as a list/graph/tree with pointers to a kernel virtual
> address? Then devices don't need to translate the addresses of pointers in
> a structure. I assume this is one of the most useful features userspace SVA
> can provide.

AFAICT that is the only good case for KVA, but it is also completely
against the endianness, word size and DMA portability design of the
kernel.

Going there requires some new set of portable APIs for globally
coherent KVA DMA.

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-06 Thread Barry Song
On Tue, Oct 5, 2021 at 7:21 AM Jason Gunthorpe  wrote:
>
> On Mon, Oct 04, 2021 at 09:40:03AM -0700, Jacob Pan wrote:
> > Hi Barry,
> >
> > On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cn...@gmail.com> wrote:
> >
> > > >
> > > > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > > > the page table of the kernel and sharing the whole kernel space. But
> > > > > will users be glad to accept this mode?
> > > >
> > > > You can avoid the lock by identity mapping the physical address space
> > > > of the kernel and making map/unmap a NOP.
> > > >
> > > > KVA is just a different way to achieve this identity map with slightly
> > > > different security properties than the normal way, but it doesn't
> > > > reach the same security level as proper map/unmap.
> > > >
> > > > I'm not sure anyone who cares about DMA security would see value in
> > > > the slight difference between KVA and a normal identity map.
> > >
> > > Yes, this is an important question. If users want a high security level,
> > > KVA might not be their choice; if users don't want the security, they
> > > are using iommu passthrough. So when will users choose KVA?
> > Right, KVA sits in the middle in terms of performance and security.
> > Performance is better than IOVA because it avoids the IOTLB flush you
> > mentioned; it is also not too far behind pass-through.
>
> The IOTLB flush is not on a DMA path but on a vmap path, so it is very
> hard to compare the two things. Maybe vmap could be made to do a lazy
> IOTLB flush or something and then it could be closer.
>
> > Security-wise, KVA respects kernel mapping. So permissions are better
> > enforced than pass-through and identity mapping.
>
> Is this meaningful? Isn't the entire physical map still in the KVA and
> isn't it entirely RW ?

Some areas are RX; for example, ARM64 supports KERNEL_TEXT_RDONLY.
But the difference is really minor.

So do we have a case where devices can directly access the kernel's data
structure such as a list/graph/tree with pointers to a kernel virtual
address? Then devices don't need to translate the addresses of pointers in a
structure. I assume this is one of the most useful features userspace SVA
can provide.

But do we have a case where accelerators/GPU want to use the complex data
structures of kernel drivers?

>
> Jason

Thanks
barry


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-04 Thread Jason Gunthorpe via iommu
On Mon, Oct 04, 2021 at 09:40:03AM -0700, Jacob Pan wrote:
> Hi Barry,
> 
> On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cn...@gmail.com> wrote:
> 
> > >  
> > > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > > the page table of the kernel and sharing the whole kernel space. But
> > > > will users be glad to accept this mode?  
> > >
> > > You can avoid the lock by identity mapping the physical address space
> > > of the kernel and making map/unmap a NOP.
> > >
> > > KVA is just a different way to achieve this identity map with slightly
> > > different security properties than the normal way, but it doesn't
> > > reach the same security level as proper map/unmap.
> > >
> > > I'm not sure anyone who cares about DMA security would see value in
> > > the slight difference between KVA and a normal identity map.
> > 
> > Yes, this is an important question. If users want a high security level,
> > KVA might not be their choice; if users don't want the security, they are
> > using iommu passthrough. So when will users choose KVA?
> Right, KVA sits in the middle in terms of performance and security.
> Performance is better than IOVA because it avoids the IOTLB flush you
> mentioned; it is also not too far behind pass-through.

The IOTLB flush is not on a DMA path but on a vmap path, so it is very
hard to compare the two things. Maybe vmap could be made to do a lazy
IOTLB flush or something and then it could be closer.

> Security-wise, KVA respects kernel mapping. So permissions are better
> enforced than pass-through and identity mapping.

Is this meaningful? Isn't the entire physical map still in the KVA and
isn't it entirely RW ?

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-04 Thread Jacob Pan
Hi Barry,

On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cn...@gmail.com> wrote:

> >  
> > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > the page table of the kernel and sharing the whole kernel space. But
> > > will users be glad to accept this mode?  
> >
> > You can avoid the lock by identity mapping the physical address space
> > of the kernel and making map/unmap a NOP.
> >
> > KVA is just a different way to achieve this identity map with slightly
> > different security properties than the normal way, but it doesn't
> > reach the same security level as proper map/unmap.
> >
> > I'm not sure anyone who cares about DMA security would see value in
> > the slight difference between KVA and a normal identity map.
> 
> Yes, this is an important question. If users want a high security level,
> KVA might not be their choice; if users don't want the security, they are
> using iommu passthrough. So when will users choose KVA?
Right, KVA sits in the middle in terms of performance and security.
Performance is better than IOVA because it avoids the IOTLB flush you
mentioned; it is also not too far behind pass-through.

Security-wise, KVA respects kernel mapping. So permissions are better
enforced than pass-through and identity mapping.
To balance performance and security, we are proposing that KVA be supported
only on trusted devices. On an Intel platform, this would be based on the
ACPI SATC (SoC Integrated Address Translation Cache) reporting structure,
VT-d spec 8.2. I am also adding a kernel iommu parameter to allow user
override.


Thanks,

Jacob


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-01 Thread Barry Song
On Wed, Sep 22, 2021 at 5:14 PM Jacob Pan  wrote:
>
> Hi Joerg/Jason/Christoph et all,
>
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate that the binding of the device and a page table should be
> performed on init_mm instead of the mm of the current process. Page
> requests and other differences between user and kernel SVA are handled as
> special cases.
>
> This unrestricted binding with the kernel page table is being challenged
> for security and the convention that in-kernel DMA must be compatible with
> DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table 
> updates.
>
> This patchset is trying to address these concerns by having an explicit DMA
> API-compatible model while continuing to support in-kernel use of DMA
> requests with PASID. Specifically, the following DMA-IOMMU APIs are
> introduced:
>
> int iommu_dma_pasid_enable/disable(struct device *dev,
>struct iommu_domain **domain,
>enum iommu_dma_pasid_mode mode);
> int iommu_map/unmap_kva(struct iommu_domain *domain,
> void *cpu_addr,size_t size, int prot);
>
> The following three addressing modes are supported with example API usages
> by device drivers.
>
> 1. Physical address (bypass) mode. Similar to DMA direct where trusted devices
> can DMA pass through IOMMU on a per PASID basis.
> Example:
> pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
> /* Use the returning PASID and PA for work submission */
>
> 2. IOVA mode. DMA API compatible. Map a supervisor PASID the same way as the
> PCI requester ID (RID)
> Example:
> pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
> /* Use the PASID and DMA API allocated IOVA for work submission */

Hi Jacob,
This might be a stupid question, but what is the performance benefit of
this IOVA mode compared with the current dma_map/unmap_single/sg APIs,
which already have IOMMU backends like drivers/iommu/arm/arm-smmu-v3? Do we
still need to flush the IOTLB by sending commands to the IOMMU each time we
do dma_unmap?

>
> 3. KVA mode. New kva map/unmap APIs. Support fast and strict sub-modes
> transparently based on device trustworthiness.
> Example:
> pasid = iommu_dma_pasid_enable(dev, , IOMMU_DMA_PASID_KVA);
> iommu_map_kva(domain, , size, prot);
> /* Use the returned PASID and KVA to submit work */
> Where:
> Fast mode: Shared CPU page tables for trusted devices only
> Strict mode: IOMMU domain returned for the untrusted device to
> replicate KVA-PA mapping in IOMMU page tables.

A huge IOMMU bottleneck we have seen before is that dma_unmap requires an
IOTLB flush. For example, in arm_smmu_cmdq_issue_cmdlist() we see serious
contention on acquiring the lock, and delays waiting for IOTLB flush
completion in arm_smmu_cmdq_poll_until_sync(), when multiple threads run.

I assume KVA mode can avoid this IOTLB flush, as the device is using the
kernel's page table and sharing the whole kernel space. But will users be
glad to accept this mode?
It seems users endure the performance decrease of IOVA mapping and
unmapping because it has better security: DMA operations can only touch
the specific DMA buffers that have been mapped by the current dma_map/unmap
with an IOMMU backend.
Some drivers use a bounce buffer to overcome the performance loss of
dma_map/unmap, as copying is faster than unmapping:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f

BTW, we have been debugging dma_map/unmap performance with this benchmark:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/map_benchmark.c
You might be able to use it for your benchmarking as well :-)

>
> On a per device basis, DMA address and performance modes are enabled by the
> device drivers. Platform information such as trustability, user command line
> input (not included in this set) could also be taken into consideration (not
> implemented in this RFC).
>
> This RFC is intended to communicate the API directions. Little testing is done
> outside IDXD and DMA engine tests.
>
> For PA and IOVA modes, the implementation is straightforward and tested with
> Intel IDXD driver. But several opens remain in KVA fast mode thus not tested:
> 1. Lack of IOTLB synchronization, kernel direct map alias can be updated as a
> result of module loading/eBPF load. Adding kernel mmu notifier?
> 2. The use of the auxiliary domain for KVA map, will aux domain stay in the
> long term? Is there another way to represent sub-device granu isolation?
> 3. Is limiting the KVA sharing to the direct map range reasonable and
> practical for all architectures?
>
>
> Many thanks to Ashok Raj, Kevin Tian, and 
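The RFC's proposed IOVA-mode flow quoted above can be sketched from a driver's point of view. iommu_dma_pasid_enable() and IOMMU_DMA_PASID_IOVA are the RFC's proposals, not merged kernel API; the stub below returns a fixed PASID only so the sketch is self-contained:

```c
#include <stddef.h>

enum iommu_dma_pasid_mode {
	IOMMU_DMA_PASID_BYPASS,
	IOMMU_DMA_PASID_IOVA,
	IOMMU_DMA_PASID_KVA,
};

struct iommu_domain;

/* Stub of the RFC's proposed call; a real implementation would
 * allocate a supervisor PASID sharing the RID's IO page table. */
static int iommu_dma_pasid_enable(void *dev, struct iommu_domain **domain,
				  enum iommu_dma_pasid_mode mode)
{
	(void)dev; (void)domain; (void)mode;
	return 1;		/* illustrative PASID value */
}

static int driver_setup_pasid(void *dev)
{
	int pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);

	if (pasid < 0)
		return pasid;	/* fall back to plain RID-based DMA */
	/* Tag work descriptors with 'pasid'; existing DMA-API mappings
	 * (IOVAs) keep working unchanged, per the RFC's IOVA mode. */
	return pasid;
}
```

The point of IOVA mode is exactly that everything after the enable call looks like today's DMA API.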

Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-01 Thread Barry Song
On Sat, Oct 2, 2021 at 1:36 AM Jason Gunthorpe  wrote:
>
> On Sat, Oct 02, 2021 at 01:24:54AM +1300, Barry Song wrote:
>
> > I assume KVA mode can avoid this iotlb flush as the device is using
> > the page table of the kernel and sharing the whole kernel space. But
> > will users be glad to accept this mode?
>
> You can avoid the lock by identity mapping the physical address space
> of the kernel and making map/unmap a NOP.
>
> KVA is just a different way to achieve this identity map with slightly
> different security properties than the normal way, but it doesn't
> reach the same security level as proper map/unmap.
>
> I'm not sure anyone who cares about DMA security would see value in
> the slight difference between KVA and a normal identity map.

Yes, this is an important question. If users want a high security level,
KVA might not be their choice; if users don't want the security, they are
using iommu passthrough. So when will users choose KVA?

>
> > which have been mapped in the current dma-map/unmap with IOMMU backend.
> > some drivers are using bouncing buffer to overcome the performance loss of
> > dma_map/unmap as copying is faster than unmapping:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f
>
> It is pretty unfortunate that drivers are hard coding behaviors based
> on assumptions of what the portable API is doing under the covers.

Not really, when the driver has a tx_copybreak, which can be set by ethtool
or similar userspace tools. If users are using iommu passthrough, copying
won't happen with the default tx_copybreak. If users are using strict iommu
mode, socket buffers are copied into buffers allocated and mapped in the
driver, so this doesn't require mapping and unmapping socket buffers
frequently.

>
> Jason

Thanks
barry
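The tx_copybreak behaviour described above reduces to a single decision: in strict IOMMU mode, copy small packets into a pre-mapped bounce buffer because the copy is cheaper than the IOTLB flush that dma_unmap would need; with passthrough there is no unmap cost, so copying buys nothing. The threshold name matches ethtool's tx-copybreak tunable; the function itself is a simplified stand-in for driver logic:

```c
#include <stdbool.h>
#include <stddef.h>

static size_t tx_copybreak = 256;	/* set via ethtool in a real driver */

static bool tx_should_copy(size_t skb_len, bool iommu_strict)
{
	/* Passthrough / identity map: unmap is already free, don't copy. */
	if (!iommu_strict)
		return false;
	/* Strict mode: copying a small packet beats an IOTLB flush. */
	return skb_len <= tx_copybreak;
}
```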


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-10-01 Thread Jason Gunthorpe via iommu
On Sat, Oct 02, 2021 at 01:24:54AM +1300, Barry Song wrote:

> I assume KVA mode can avoid this iotlb flush as the device is using
> the page table of the kernel and sharing the whole kernel space. But
> will users be glad to accept this mode?

You can avoid the lock by identity mapping the physical address space
of the kernel and making map/unmap a NOP.

KVA is just a different way to achieve this identity map with slightly
different security properties than the normal way, but it doesn't
reach the same security level as proper map/unmap.

I'm not sure anyone who cares about DMA security would see value in
the slight difference between KVA and a normal identity map.

> which have been mapped in the current dma-map/unmap with IOMMU backend.
> some drivers are using bouncing buffer to overcome the performance loss of
> dma_map/unmap as copying is faster than unmapping:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f

It is pretty unfortunate that drivers are hard coding behaviors based
on assumptions of what the portable API is doing under the covers.

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-30 Thread Jacob Pan
Hi Mike,

On Thu, 30 Sep 2021 14:22:34 +, "Campin, Mike" 
wrote:

> I need support for mixed user PASID, kernel PASID and non-PASID use cases
> in the driver.
> 
This specific RFC is for kernel PASID only. User PASID native use is
supported under SVA lib kernel API and /dev/uacce UAPI or driver specific
char dev. Guest PASID is being developed under the new /dev/iommu framework.
Non-PASID kernel use should be under DMA API unchanged from the driver's
POV. In fact, this proposal will map non-PASID and PASID DMA identically.

Thanks,

Jacob

> -Original Message-
> From: Jason Gunthorpe  
> Sent: Wednesday, September 29, 2021 4:43 PM
> To: Jacob Pan 
> Cc: iommu@lists.linux-foundation.org; LKML
> ; Joerg Roedel ; Christoph
> Hellwig ; Tian, Kevin ; Luck,
> Tony ; Jiang, Dave ; Raj,
> Ashok ; Kumar, Sanjay K ;
> Campin, Mike ; Thomas Gleixner
>  Subject: Re: [RFC 0/7] Support in-kernel DMA with
> PASID and SVA
> 
> On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe 
> > wrote: 
> > > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> > >
> > > > For #2, it seems we can store the kernel PASID in struct device. 
> > > > This will preserve the DMA API interface while making it PASID
> > > > capable. Essentially, each PASID capable device would have two
> > > > special global
> > > > PASIDs: 
> > > > - PASID 0 for DMA request w/o PASID, aka RID2PASID
> > > > - PASID 1 (randomly selected) for in-kernel DMA request w/
> > > > PASID  
> > > 
> > > This seems reasonable, I had the same thought. Basically just have 
> > > the driver issue some trivial call:
> > >   pci_enable_pasid_dma(pdev, )  
> > That would work, but I guess it needs to be an iommu_ call instead of
> > pci_?  
> 
> Whichever makes sense. The API should take in a struct pci_device and
> return a PCI PASID - at least as a wrapper around a more generic iommu API.
> 
> > I think your suggestion is more precise, in case the driver does not 
> > want to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).  
> 
> Since it is odd, and it may create overhead, I would do it only when
> asked to do it
> 
> > > Having multiple RID's pointing at the same IO page table is 
> > > something we expect iommufd to require so the whole thing should 
> > > ideally fall out naturally.  
> 
> > That would be the equivalent of attaching multiple devices to the same 
> > IOMMU domain. right?  
> 
> Effectively..
> 
> Jason


Thanks,

Jacob


RE: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-30 Thread Campin, Mike
I need support for mixed user PASID, kernel PASID and non-PASID use cases in 
the driver.

-Original Message-
From: Jason Gunthorpe  
Sent: Wednesday, September 29, 2021 4:43 PM
To: Jacob Pan 
Cc: iommu@lists.linux-foundation.org; LKML ; 
Joerg Roedel ; Christoph Hellwig ; Tian, 
Kevin ; Luck, Tony ; Jiang, Dave 
; Raj, Ashok ; Kumar, Sanjay K 
; Campin, Mike ; Thomas 
Gleixner 
Subject: Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe  wrote:
> 
> > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> >  
> > > For #2, it seems we can store the kernel PASID in struct device. 
> > > This will preserve the DMA API interface while making it PASID capable.
> > > Essentially, each PASID capable device would have two special 
> > > global
> > > PASIDs: 
> > >   - PASID 0 for DMA request w/o PASID, aka RID2PASID
> > >   - PASID 1 (randomly selected) for in-kernel DMA request w/ PASID
> > 
> > This seems reasonable, I had the same thought. Basically just have 
> > the driver issue some trivial call:
> >   pci_enable_pasid_dma(pdev, )
> That would work, but I guess it needs to be an iommu_ call instead of pci_?

Whichever makes sense. The API should take in a struct pci_device and return
a PCI PASID - at least as a wrapper around a more generic iommu API.

> I think your suggestion is more precise, in case the driver does not 
> want to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).

Since it is odd, and it may create overhead, I would do it only when asked to 
do it

> > Having multiple RID's pointing at the same IO page table is 
> > something we expect iommufd to require so the whole thing should 
> > ideally fall out naturally.

> That would be the equivalent of attaching multiple devices to the same 
> IOMMU domain. right?

Effectively..

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-29 Thread Jason Gunthorpe via iommu
On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe  wrote:
> 
> > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> >  
> > > For #2, it seems we can store the kernel PASID in struct device. This
> > > will preserve the DMA API interface while making it PASID capable.
> > > Essentially, each PASID capable device would have two special global
> > > PASIDs: 
> > >   - PASID 0 for DMA request w/o PASID, aka RID2PASID
> > >   - PASID 1 (randomly selected) for in-kernel DMA request w/
> > > PASID  
> > 
> > This seems reasonable, I had the same thought. Basically just have the
> > driver issue some trivial call:
> >   pci_enable_pasid_dma(pdev, )
> That would work, but I guess it needs to be an iommu_ call instead of pci_?

Whichever makes sense. The API should take in a struct pci_device
and return a PCI PASID - at least as a wrapper around a more generic
iommu API.

> I think your suggestion is more precise, in case the driver does not want
> to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).

Since it is odd, and it may create overhead, I would do it only when
asked to do it

> > Having multiple RID's pointing at the same IO page table is something
> > we expect iommufd to require so the whole thing should ideally fall
> > out naturally.

> That would be the equivalent of attaching multiple devices to the same
> IOMMU domain. right?

Effectively..

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-29 Thread Jacob Pan
Hi Jason,

On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe  wrote:

> On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
>  
> > For #2, it seems we can store the kernel PASID in struct device. This
> > will preserve the DMA API interface while making it PASID capable.
> > Essentially, each PASID capable device would have two special global
> > PASIDs: 
> > - PASID 0 for DMA request w/o PASID, aka RID2PASID
> > - PASID 1 (randomly selected) for in-kernel DMA request w/
> > PASID  
> 
> This seems reasonable, I had the same thought. Basically just have the
> driver issue some trivial call:
>   pci_enable_pasid_dma(pdev, )
That would work, but I guess it needs to be an iommu_ call instead of pci_?

Or, it can be done by the platform IOMMU code where system PASID is
automatically enabled for PASID capable devices during boot and stored in
struct device. Device drivers can retrieve the PASID from struct device.

I think your suggestion is more precise; in case the driver does not want
to do DMA w/ PASID, we can do fewer IOTLB flushes (PASID 0 only).

> And then DMA tagged with the PASID will be handled equivalently to
> untagged DMA. Basically PASID and no PASID point to the exact same IO
> page table and the DMA API manipulates that single page table.
> 
> Having multiple RIDs pointing at the same IO page table is something
> we expect iommufd to require so the whole thing should ideally fall
> out naturally.
That would be the equivalent of attaching multiple devices to the same
IOMMU domain, right?

> Jason


Thanks,

Jacob


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-29 Thread Jason Gunthorpe via iommu
On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
 
> For #2, it seems we can store the kernel PASID in struct device. This will
> preserve the DMA API interface while making it PASID capable. Essentially,
> each PASID capable device would have two special global PASIDs: 
>   - PASID 0 for DMA request w/o PASID, aka RID2PASID
>   - PASID 1 (randomly selected) for in-kernel DMA request w/ PASID

This seems reasonable, I had the same thought. Basically just have the
driver issue some trivial call:
  pci_enable_pasid_dma(pdev, &pasid)

And then DMA tagged with the PASID will be handled equivalently to
untagged DMA. Basically PASID and no PASID point to the exact same IO
page table and the DMA API manipulates that single page table.

Having multiple RIDs pointing at the same IO page table is something
we expect iommufd to require so the whole thing should ideally fall
out naturally.

Jason


Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-29 Thread Jacob Pan
Hi,

Just to follow up on what we discussed during LPC VFIO/IOMMU/PCI MC.
https://linuxplumbersconf.org/event/11/contributions/1021/
The key takeaways are:
1. Addressing mode selections (PA, IOVA, and KVA) should be a policy
decision *not* to be made by device drivers. This implies that it is up to
the platform code or user (via some sysfs knobs) to decide what is best
for each device. Drivers should not be aware of which addressing mode is
returned by the DMA API.

2. DMA APIs can be extended to support DMA request with PASID. 

3. The performance benefit of using KVA (shared) should be demonstrated,
though the saving in IOTLB flushes over IOVA is conceivable.

#1 could be done in platform IOMMU code when devices are attached to their
default domains. E.g. if the device is trusted, it can operate in shared
KVA mode.

For #2, it seems we can store the kernel PASID in struct device. This will
preserve the DMA API interface while making it PASID capable. Essentially,
each PASID capable device would have two special global PASIDs: 
- PASID 0 for DMA request w/o PASID, aka RID2PASID
- PASID 1 (randomly selected) for in-kernel DMA request w/ PASID

Both PASID 0 and 1 will always point to the same page table. I.e. same
addressing mode, IOVA or KVA.

For devices that do not support PASID, there is no change. For devices that
can do both DMA w/ and w/o PASID, the IOTLB invalidation would include both
PASIDs.

By embedding the PASID in struct device, it also avoids changes in upper
level APIs. The DMA engine API can continue to give out channels without
knowing whether PASID is used or not. Accelerator drivers that do work
submission can retrieve the PASID from struct device.

Thoughts?

Thanks for the review and feedback at LPC!

Jacob

On Tue, 21 Sep 2021 13:29:34 -0700, Jacob Pan
 wrote:

> Hi Joerg/Jason/Christoph et all,
> 
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate that the binding of the device and a page table should be
> performed on init_mm instead of the mm of the current process. Page
> requests and other differences between user and kernel SVA are handled
> as special cases.
> 
> This unrestricted binding with the kernel page table is being challenged
> for security and the convention that in-kernel DMA must be compatible with
> DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table
> updates.
> 
> This patchset is trying to address these concerns by having an explicit
> DMA API compatible model while continuing to support in-kernel use of DMA
> requests with PASID. Specifically, the following DMA-IOMMU APIs are
> introduced:
> 
> int iommu_dma_pasid_enable/disable(struct device *dev,
>  struct iommu_domain **domain,
>  enum iommu_dma_pasid_mode mode);
> int iommu_map/unmap_kva(struct iommu_domain *domain,
>   void *cpu_addr,size_t size, int prot);
> 
> The following three addressing modes are supported with example API usages
> by device drivers.
> 
> 1. Physical address (bypass) mode. Similar to DMA direct where trusted
> devices can DMA pass through IOMMU on a per PASID basis.
> Example:
>   pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
>   /* Use the returned PASID and PA for work submission */
> 
> 2. IOVA mode. DMA API compatible. Map a supervisor PASID the same way as
> the PCI requester ID (RID)
> Example:
>   pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
>   /* Use the PASID and DMA API allocated IOVA for work submission */
> 
> 3. KVA mode. New kva map/unmap APIs. Support fast and strict sub-modes
> transparently based on device trustworthiness.
> Example:
>   pasid = iommu_dma_pasid_enable(dev, &domain, IOMMU_DMA_PASID_KVA);
>   iommu_map_kva(domain, buf, size, prot);
>   /* Use the returned PASID and KVA to submit work */
> Where:
>   Fast mode: Shared CPU page tables for trusted devices only
>   Strict mode: IOMMU domain returned for the untrusted device to
>   replicate KVA-PA mapping in IOMMU page tables.
> 
> On a per device basis, DMA address and performance modes are enabled by
> the device drivers. Platform information such as trustworthiness, user
> command line input (not included in this set) could also be taken into
> consideration (not implemented in this RFC).
> 
> This RFC is intended to communicate the API directions. Little testing is
> done outside IDXD and DMA engine tests.
> 
> For PA and IOVA modes, the implementation is straightforward and tested
> with Intel IDXD driver. But several opens remain in KVA fast mode thus
> not tested: 1. Lack of IOTLB synchronization, kernel direct map alias can
> be updated as a result of module loading/eBPF load. Adding kernel mmu
> notifier? 2. The use of the auxiliary domain for 

Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-22 Thread Jason Gunthorpe via iommu
On Tue, Sep 21, 2021 at 01:29:34PM -0700, Jacob Pan wrote:
> Hi Joerg/Jason/Christoph et all,
> 
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate that the binding of the device and a page table should be
> performed on init_mm instead of the mm of the current process. Page
> requests and other differences between user and kernel SVA are handled
> as special cases.
> 
> This unrestricted binding with the kernel page table is being challenged
> for security and the convention that in-kernel DMA must be compatible with
> DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table 
> updates.
> 
> This patchset is trying to address these concerns by having an explicit DMA
> API compatible model while continuing to support in-kernel use of DMA requests
> with PASID. Specifically, the following DMA-IOMMU APIs are introduced:
> 
> int iommu_dma_pasid_enable/disable(struct device *dev,
>  struct iommu_domain **domain,
>  enum iommu_dma_pasid_mode mode);
> int iommu_map/unmap_kva(struct iommu_domain *domain,
>   void *cpu_addr,size_t size, int prot);

I'm not convinced this is going in the right direction..

You should create/find a 'struct device' for the PASID and use the
normal DMA API, not try to create a parallel DMA API under the iommu
framework.

Again, there should no driver in Linux doing DMA without going through
the normal DMA API.

> The following three addressing modes are supported with example API usages
> by device drivers.
> 
> 1. Physical address (bypass) mode. Similar to DMA direct where trusted devices
> can DMA pass through IOMMU on a per PASID basis.
> Example:
>   pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
>   /* Use the returned PASID and PA for work submission */

And why should this even be a choice given to drivers?

Drivers do not get to self-declare their "trustiness" - this is only
set by the admin.

PASID tagged DMA is no different than any other DMA and needs to
follow the global admin set IOMMU modes - without any driver knob to
change behaviors.

The API design should look more like this:

   u32 hw_pasid;
   struct device *pasid_dev = iommu_get_pasid_device_handle(pci_device, &hw_pasid);
   dma_addr_t addr = dma_map_XX(pasid_dev, buf, size)

   'tell HW to do DMA'(hw_pasid, addr, size)

   dma_unmap_XX(pasid_dev, addr, size);

If there is any performance tunable around how the IO page table is
constructed then the IOMMU layer will handle it transparently from
global config, just as it does for every other DMA out there.

> 1. Lack of IOTLB synchronization, kernel direct map alias can be updated as a
> result of module loading/eBPF load. Adding kernel mmu notifier?

I'm deeply skeptical we should even have "KSVA" and would want to see
a lot of performance justification to introduce something like
this.

Given that basically only vmalloc memory could truly benefit from it,
I don't expect to see much win, especially when balanced with
burdening all vmalloc users with an IO page table synchronization.

Certainly it should not be part of a patch series fixing kPASID
support for basic DMA, and still doesn't excuse skipping the DMA API -
that is still mandatory for portability to support cache flushing.

Jason


[RFC 0/7] Support in-kernel DMA with PASID and SVA

2021-09-21 Thread Jacob Pan
Hi Joerg/Jason/Christoph et all,

The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in sva-lib. Kernel SVA is achieved by extending a special flag
to indicate that the binding of the device and a page table should be
performed on init_mm instead of the mm of the current process. Page
requests and other differences between user and kernel SVA are handled
as special cases.

This unrestricted binding with the kernel page table is being challenged
for security and the convention that in-kernel DMA must be compatible with
DMA APIs.
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
There is also the lack of IOTLB synchronization upon kernel page table updates.

This patchset is trying to address these concerns by having an explicit DMA
API compatible model while continuing to support in-kernel use of DMA requests
with PASID. Specifically, the following DMA-IOMMU APIs are introduced:

int iommu_dma_pasid_enable/disable(struct device *dev,
   struct iommu_domain **domain,
   enum iommu_dma_pasid_mode mode);
int iommu_map/unmap_kva(struct iommu_domain *domain,
void *cpu_addr,size_t size, int prot);

The following three addressing modes are supported with example API usages
by device drivers.

1. Physical address (bypass) mode. Similar to DMA direct where trusted devices
can DMA pass through IOMMU on a per PASID basis.
Example:
pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
/* Use the returned PASID and PA for work submission */

2. IOVA mode. DMA API compatible. Map a supervisor PASID the same way as the
PCI requester ID (RID)
Example:
pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
/* Use the PASID and DMA API allocated IOVA for work submission */

3. KVA mode. New kva map/unmap APIs. Support fast and strict sub-modes
transparently based on device trustworthiness.
Example:
pasid = iommu_dma_pasid_enable(dev, &domain, IOMMU_DMA_PASID_KVA);
iommu_map_kva(domain, buf, size, prot);
/* Use the returned PASID and KVA to submit work */
Where:
Fast mode: Shared CPU page tables for trusted devices only
Strict mode: IOMMU domain returned for the untrusted device to
replicate KVA-PA mapping in IOMMU page tables.

On a per device basis, DMA address and performance modes are enabled by the
device drivers. Platform information such as trustworthiness, user command line
input (not included in this set) could also be taken into consideration (not
implemented in this RFC).

This RFC is intended to communicate the API directions. Little testing is done
outside IDXD and DMA engine tests.

For PA and IOVA modes, the implementation is straightforward and tested with
Intel IDXD driver. But several opens remain in KVA fast mode thus not tested:
1. Lack of IOTLB synchronization, kernel direct map alias can be updated as a
result of module loading/eBPF load. Adding kernel mmu notifier?
2. The use of the auxiliary domain for the KVA map: will the aux domain stay
in the long term? Is there another way to represent sub-device granularity
isolation?
3. Is limiting the KVA sharing to the direct map range reasonable and
practical for all architectures?


Many thanks to Ashok Raj, Kevin Tian, and Baolu who provided feedback and
many ideas in this set.

Thanks,

Jacob

Jacob Pan (7):
  ioasid: reserve special PASID for in-kernel DMA
  dma-iommu: Add API for DMA request with PASID
  iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
  dma-iommu: Add support for DMA w/ PASID in KVA
  iommu/vt-d: Add support for KVA PASID mode
  iommu: Add KVA map API
  dma/idxd: Use dma-iommu PASID API instead of SVA lib

 drivers/dma/idxd/idxd.h   |   4 +-
 drivers/dma/idxd/init.c   |  36 ++--
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/dma-iommu.c | 123 +-
 drivers/iommu/intel/iommu.c   | 154 +-
 drivers/iommu/ioasid.c|   2 +
 drivers/iommu/iommu-sva-lib.c |   1 +
 drivers/iommu/iommu.c |  63 +++
 include/linux/dma-iommu.h |  14 ++
 include/linux/intel-iommu.h   |   7 +-
 include/linux/ioasid.h|   4 +
 include/linux/iommu.h |  13 ++
 12 files changed, 390 insertions(+), 33 deletions(-)

-- 
2.25.1
