Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Thu, Oct 07, 2021 at 12:11:27PM -0700, Jacob Pan wrote:
> Hi Barry,
>
> On Thu, 7 Oct 2021 18:43:33 +1300, Barry Song <21cn...@gmail.com> wrote:
> > > > Security-wise, KVA respects kernel mapping. So permissions are better
> > > > enforced than pass-through and identity mapping.
> > >
> > > Is this meaningful? Isn't the entire physical map still in the KVA and
> > > isn't it entirely RW ?
> >
> > Some areas are RX; for example, ARM64 supports KERNEL_TEXT_RDONLY.
> > But the difference is really minor.
>
> That brought up a good point: if we were to use the DMA API to give out KVA
> as dma_addr for trusted devices, we cannot satisfy DMA direction
> requirements since we can't change the kernel mapping. It would be similar
> to DMA direct, where dir is ignored AFAICT.

Right. Using the DMA API to DMA to read-only kernel memory is a bug in the
first place.

> Or we are saying that if the device is trusted, using pass-through, i.e. a
> physical address, is allowed.

I don't see trusted being relevant here beyond the usual decision to use the
trusted map or not.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Barry,

On Thu, 7 Oct 2021 18:43:33 +1300, Barry Song <21cn...@gmail.com> wrote:
> > > Security-wise, KVA respects kernel mapping. So permissions are better
> > > enforced than pass-through and identity mapping.
> >
> > Is this meaningful? Isn't the entire physical map still in the KVA and
> > isn't it entirely RW ?
>
> Some areas are RX; for example, ARM64 supports KERNEL_TEXT_RDONLY.
> But the difference is really minor.

That brought up a good point: if we were to use the DMA API to give out KVA
as dma_addr for trusted devices, we cannot satisfy DMA direction requirements
since we can't change the kernel mapping. It would be similar to DMA direct,
where dir is ignored AFAICT.

Or we are saying that if the device is trusted, using pass-through, i.e. a
physical address, is allowed.

Thoughts?

Thanks,
Jacob
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Jason,

On Thu, 7 Oct 2021 14:48:22 -0300, Jason Gunthorpe wrote:
> On Thu, Oct 07, 2021 at 10:50:10AM -0700, Jacob Pan wrote:
> > On platforms that are DMA snooped, this barrier is not needed. But I
> > think your point is that once we convert to the DMA API, the
> > sync/barrier is covered by DMA APIs if !dev_is_dma_coherent(dev). Then
> > all archs are good.
>
> No.. my point is that a CPU store release is not necessarily a DMA-visible
> event on all platforms, and things like dma_wmb/rmb() may still be
> necessary. This all needs to be architected before anyone starts writing
> drivers that assume a coherent DMA model without using a coherent DMA
> allocation.

Why is that specific to SVA? Or are you talking about things in general?

Can we ensure coherency at the API level where SVA bind device is happening?
i.e. fail the bind if it does not pass a coherency check.

Thanks,
Jacob
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Thu, Oct 07, 2021 at 10:50:10AM -0700, Jacob Pan wrote:
> On platforms that are DMA snooped, this barrier is not needed. But I think
> your point is that once we convert to the DMA API, the sync/barrier is
> covered by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are good.

No.. my point is that a CPU store release is not necessarily a DMA-visible
event on all platforms, and things like dma_wmb/rmb() may still be necessary.
This all needs to be architected before anyone starts writing drivers that
assume a coherent DMA model without using a coherent DMA allocation.

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Jason,

On Thu, 7 Oct 2021 08:59:18 -0300, Jason Gunthorpe wrote:
> On Fri, Oct 08, 2021 at 12:54:52AM +1300, Barry Song wrote:
> > On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe wrote:
> > > On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> > >
> > > > So do we have a case where devices can directly access the kernel's
> > > > data structure such as a list/graph/tree with pointers to a kernel
> > > > virtual address? Then devices don't need to translate the address
> > > > of pointers in a structure. I assume this is one of the most useful
> > > > features userspace SVA can provide.
> > >
> > > AFAICT that is the only good case for KVA, but it is also completely
> > > against the endianness, word size and DMA portability design of the
> > > kernel.
> > >
> > > Going there requires some new set of portable APIs for globally
> > > coherent KVA DMA.
> >
> > yep. I agree. it would be very weird if accelerators/GPUs are sharing a
> > kernel data struct, but for each "DMA" operation - reading or writing
> > the data struct - we have to call dma_map_single/sg or
> > dma_sync_single_for_cpu/device etc. It seems once devices and CPUs
> > are sharing a virtual address (SVA), code doesn't need to do an
> > explicit map/sync each time.

That is what we have today with sva_bind_device.

> No, it still needs to do something to manage visibility from the
> current CPU to the DMA - it might not be flushing a cache, but it is
> probably an arch-specific CPU barrier instruction.

Are you talking about iommu_dma_sync_single_for_cpu()? This is not SVA
specific, right?

On platforms that are DMA snooped, this barrier is not needed. But I think
your point is that once we convert to the DMA API, the sync/barrier is
covered by DMA APIs if !dev_is_dma_coherent(dev). Then all archs are good.
We could also add a check for dev_is_dma_coherent(dev) before using SVA.

> Jason

Thanks,
Jacob
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Fri, Oct 08, 2021 at 12:54:52AM +1300, Barry Song wrote:
> On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe wrote:
> > On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> >
> > > So do we have a case where devices can directly access the kernel's
> > > data structure such as a list/graph/tree with pointers to a kernel
> > > virtual address? Then devices don't need to translate the address of
> > > pointers in a structure. I assume this is one of the most useful
> > > features userspace SVA can provide.
> >
> > AFAICT that is the only good case for KVA, but it is also completely
> > against the endianness, word size and DMA portability design of the
> > kernel.
> >
> > Going there requires some new set of portable APIs for globally
> > coherent KVA DMA.
>
> yep. I agree. it would be very weird if accelerators/GPUs are sharing a
> kernel data struct, but for each "DMA" operation - reading or writing
> the data struct - we have to call dma_map_single/sg or
> dma_sync_single_for_cpu/device etc. It seems once devices and CPUs
> are sharing a virtual address (SVA), code doesn't need to do an explicit
> map/sync each time.

No, it still needs to do something to manage visibility from the
current CPU to the DMA - it might not be flushing a cache, but it is
probably an arch-specific CPU barrier instruction.

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Fri, Oct 8, 2021 at 12:32 AM Jason Gunthorpe wrote:
> On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
>
> > So do we have a case where devices can directly access the kernel's data
> > structure such as a list/graph/tree with pointers to a kernel virtual
> > address? Then devices don't need to translate the address of pointers in
> > a structure. I assume this is one of the most useful features userspace
> > SVA can provide.
>
> AFAICT that is the only good case for KVA, but it is also completely
> against the endianness, word size and DMA portability design of the
> kernel.
>
> Going there requires some new set of portable APIs for globally
> coherent KVA DMA.

yep. I agree. it would be very weird if accelerators/GPUs are sharing a
kernel data struct, but for each "DMA" operation - reading or writing the
data struct - we have to call dma_map_single/sg or
dma_sync_single_for_cpu/device etc. It seems once devices and CPUs are
sharing a virtual address (SVA), code doesn't need to do an explicit
map/sync each time.

> Jason

Thanks
barry
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Thu, Oct 07, 2021 at 06:43:33PM +1300, Barry Song wrote:
> So do we have a case where devices can directly access the kernel's data
> structure such as a list/graph/tree with pointers to a kernel virtual
> address? Then devices don't need to translate the address of pointers in a
> structure. I assume this is one of the most useful features userspace SVA
> can provide.

AFAICT that is the only good case for KVA, but it is also completely against
the endianness, word size and DMA portability design of the kernel.

Going there requires some new set of portable APIs for globally coherent
KVA DMA.

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Tue, Oct 5, 2021 at 7:21 AM Jason Gunthorpe wrote:
> On Mon, Oct 04, 2021 at 09:40:03AM -0700, Jacob Pan wrote:
> > Hi Barry,
> >
> > On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cn...@gmail.com> wrote:
> > > > > I assume KVA mode can avoid this iotlb flush as the device is
> > > > > using the page table of the kernel and sharing the whole kernel
> > > > > space. But will users be glad to accept this mode?
> > > >
> > > > You can avoid the lock by identity mapping the physical address
> > > > space of the kernel and making map/unmap a NOP.
> > > >
> > > > KVA is just a different way to achieve this identity map with
> > > > slightly different security properties than the normal way, but it
> > > > doesn't reach the same security level as proper map/unmap.
> > > >
> > > > I'm not sure anyone who cares about DMA security would see value in
> > > > the slight difference between KVA and a normal identity map.
> > >
> > > yes. This is an important question. if users want a high security
> > > level, KVA might not be their choice; if users don't want the
> > > security, they are using iommu passthrough. So when will users
> > > choose KVA?
> >
> > Right, KVA sits in the middle in terms of performance and security.
> > Performance is better than IOVA due to the IOTLB flush as you
> > mentioned. Also not too far behind pass-through.
>
> The IOTLB flush is not on a DMA path but on a vmap path, so it is very
> hard to compare the two things.. Maybe vmap can be made to do lazy
> IOTLB flush or something and it could be closer
>
> > Security-wise, KVA respects kernel mapping. So permissions are better
> > enforced than pass-through and identity mapping.
>
> Is this meaningful? Isn't the entire physical map still in the KVA and
> isn't it entirely RW ?

Some areas are RX; for example, ARM64 supports KERNEL_TEXT_RDONLY.
But the difference is really minor.

So do we have a case where devices can directly access the kernel's data
structure such as a list/graph/tree with pointers to a kernel virtual
address? Then devices don't need to translate the address of pointers in a
structure. I assume this is one of the most useful features userspace SVA
can provide. But do we have a case where accelerators/GPUs want to use the
complex data structures of kernel drivers?

> Jason

Thanks
barry
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Mon, Oct 04, 2021 at 09:40:03AM -0700, Jacob Pan wrote:
> Hi Barry,
>
> On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cn...@gmail.com> wrote:
> > > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > > the page table of the kernel and sharing the whole kernel space.
> > > > But will users be glad to accept this mode?
> > >
> > > You can avoid the lock by identity mapping the physical address space
> > > of the kernel and making map/unmap a NOP.
> > >
> > > KVA is just a different way to achieve this identity map with
> > > slightly different security properties than the normal way, but it
> > > doesn't reach the same security level as proper map/unmap.
> > >
> > > I'm not sure anyone who cares about DMA security would see value in
> > > the slight difference between KVA and a normal identity map.
> >
> > yes. This is an important question. if users want a high security
> > level, KVA might not be their choice; if users don't want the security,
> > they are using iommu passthrough. So when will users choose KVA?
>
> Right, KVA sits in the middle in terms of performance and security.
> Performance is better than IOVA due to the IOTLB flush as you mentioned.
> Also not too far behind pass-through.

The IOTLB flush is not on a DMA path but on a vmap path, so it is very hard
to compare the two things.. Maybe vmap can be made to do lazy IOTLB flush or
something and it could be closer

> Security-wise, KVA respects kernel mapping. So permissions are better
> enforced than pass-through and identity mapping.

Is this meaningful? Isn't the entire physical map still in the KVA and isn't
it entirely RW ?

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Barry,

On Sat, 2 Oct 2021 01:45:59 +1300, Barry Song <21cn...@gmail.com> wrote:
> > > I assume KVA mode can avoid this iotlb flush as the device is using
> > > the page table of the kernel and sharing the whole kernel space. But
> > > will users be glad to accept this mode?
> >
> > You can avoid the lock by identity mapping the physical address space
> > of the kernel and making map/unmap a NOP.
> >
> > KVA is just a different way to achieve this identity map with slightly
> > different security properties than the normal way, but it doesn't
> > reach the same security level as proper map/unmap.
> >
> > I'm not sure anyone who cares about DMA security would see value in
> > the slight difference between KVA and a normal identity map.
>
> yes. This is an important question. if users want a high security level,
> KVA might not be their choice; if users don't want the security, they are
> using iommu passthrough. So when will users choose KVA?

Right, KVA sits in the middle in terms of performance and security.
Performance is better than IOVA due to the IOTLB flush as you mentioned.
Also not too far behind pass-through.

Security-wise, KVA respects kernel mapping. So permissions are better
enforced than pass-through and identity mapping.

To balance performance and security, we are proposing that KVA only be
supported on trusted devices. On an Intel platform, that would be based on
ACPI SATC (the SoC Integrated Address Translation Cache reporting
structure, VT-d spec 8.2). I am also adding a kernel iommu parameter to
allow user override.

Thanks,
Jacob
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Wed, Sep 22, 2021 at 5:14 PM Jacob Pan wrote:
> Hi Joerg/Jason/Christoph et al,
>
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate that the binding of the device and a page table should be
> performed on init_mm instead of the mm of the current process. Page
> requests and other differences between user and kernel SVA are handled
> as special cases.
>
> This unrestricted binding with the kernel page table is being challenged
> for security and the convention that in-kernel DMA must be compatible
> with DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table
> updates.
>
> This patchset is trying to address these concerns by having an explicit
> DMA API compatible model while continuing to support in-kernel use of
> DMA requests with PASID. Specifically, the following DMA-IOMMU APIs are
> introduced:
>
> int iommu_dma_pasid_enable/disable(struct device *dev,
>struct iommu_domain **domain,
>enum iommu_dma_pasid_mode mode);
> int iommu_map/unmap_kva(struct iommu_domain *domain,
> void *cpu_addr, size_t size, int prot);
>
> The following three addressing modes are supported, with example API
> usages by device drivers.
>
> 1. Physical address (bypass) mode. Similar to DMA direct where trusted
> devices can DMA pass through the IOMMU on a per-PASID basis.
> Example:
> pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
> /* Use the returned PASID and PA for work submission */
>
> 2. IOVA mode. DMA API compatible. Map a supervisor PASID the same way as
> the PCI requester ID (RID)
> Example:
> pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
> /* Use the PASID and DMA API allocated IOVA for work submission */

Hi Jacob, might be a stupid question: what is the performance benefit of
this IOVA mode compared with the current dma_map/unmap_single/sg APIs,
which have IOMMU-enabled drivers like drivers/iommu/arm/arm-smmu-v3? Do we
still need to flush the IOTLB by sending commands to the IOMMU each time
while doing dma_unmap?

> 3. KVA mode. New kva map/unmap APIs. Support fast and strict sub-modes
> transparently based on device trustfulness.
> Example:
> pasid = iommu_dma_pasid_enable(dev, , IOMMU_DMA_PASID_KVA);
> iommu_map_kva(domain, , size, prot);
> /* Use the returned PASID and KVA to submit work */
> Where:
> Fast mode: Shared CPU page tables for trusted devices only
> Strict mode: IOMMU domain returned for the untrusted device to
> replicate KVA-PA mapping in IOMMU page tables.

A huge bottleneck of the IOMMU we have seen before is that dma_unmap
requires an IOTLB flush. For example, in arm_smmu_cmdq_issue_cmdlist(), we
see serious contention on acquiring the lock, and delay waiting for IOTLB
flush completion in arm_smmu_cmdq_poll_until_sync(), while multiple threads
run.

I assume KVA mode can avoid this IOTLB flush as the device is using the
page table of the kernel and sharing the whole kernel space. But will users
be glad to accept this mode? It seems users are enduring the performance
decrease of IOVA mapping and unmapping because it has better security. DMA
operations can only run on the specific DMA buffers which have been mapped
in the current dma-map/unmap with the IOMMU backend.

Some drivers are using a bounce buffer to overcome the performance loss of
dma_map/unmap, as copying is faster than unmapping:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f

BTW, we have been debugging dma_map/unmap performance with this benchmark:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/dma/map_benchmark.c
you might be able to use it for your benchmarking as well :-)

> On a per device basis, DMA address and performance modes are enabled by
> the device drivers. Platform information such as trustability and user
> command line input (not included in this set) could also be taken into
> consideration (not implemented in this RFC).
>
> This RFC is intended to communicate the API directions. Little testing
> is done outside IDXD and DMA engine tests.
>
> For PA and IOVA modes, the implementation is straightforward and tested
> with the Intel IDXD driver. But several opens remain in KVA fast mode,
> thus it is not tested:
> 1. Lack of IOTLB synchronization; kernel direct map aliases can be
> updated as a result of module loading/eBPF load. Adding a kernel mmu
> notifier?
> 2. The use of the auxiliary domain for KVA map; will the aux domain stay
> in the long term? Is there another way to represent sub-device granu
> isolation?
> 3. Is limiting the KVA sharing to the direct map range reasonable and
> practical for all architectures?
>
> Many thanks to Ashok Raj, Kevin Tian, and
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Sat, Oct 2, 2021 at 1:36 AM Jason Gunthorpe wrote:
> On Sat, Oct 02, 2021 at 01:24:54AM +1300, Barry Song wrote:
> > I assume KVA mode can avoid this iotlb flush as the device is using
> > the page table of the kernel and sharing the whole kernel space. But
> > will users be glad to accept this mode?
>
> You can avoid the lock by identity mapping the physical address space
> of the kernel and making map/unmap a NOP.
>
> KVA is just a different way to achieve this identity map with slightly
> different security properties than the normal way, but it doesn't
> reach the same security level as proper map/unmap.
>
> I'm not sure anyone who cares about DMA security would see value in
> the slight difference between KVA and a normal identity map.

yes. This is an important question. if users want a high security level,
KVA might not be their choice; if users don't want the security, they are
using iommu passthrough. So when will users choose KVA?

> > which have been mapped in the current dma-map/unmap with IOMMU backend.
> > some drivers are using a bounce buffer to overcome the performance
> > loss of dma_map/unmap as copying is faster than unmapping:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f
>
> It is pretty unfortunate that drivers are hard coding behaviors based
> on assumptions of what the portable API is doing under the covers.

Not really - it has a tx_copybreak which can be set by ethtool or similar
userspace tools. If users are using iommu passthrough, copying won't happen
with the default tx_copybreak. If users are using strict iommu mode, socket
buffers are copied into buffers allocated and mapped in the driver, so this
doesn't require mapping and unmapping socket buffers frequently.

> Jason

Thanks
barry
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Sat, Oct 02, 2021 at 01:24:54AM +1300, Barry Song wrote:
> I assume KVA mode can avoid this iotlb flush as the device is using
> the page table of the kernel and sharing the whole kernel space. But
> will users be glad to accept this mode?

You can avoid the lock by identity mapping the physical address space of the
kernel and making map/unmap a NOP.

KVA is just a different way to achieve this identity map with slightly
different security properties than the normal way, but it doesn't reach the
same security level as proper map/unmap.

I'm not sure anyone who cares about DMA security would see value in the
slight difference between KVA and a normal identity map.

> which have been mapped in the current dma-map/unmap with IOMMU backend.
> some drivers are using a bounce buffer to overcome the performance loss
> of dma_map/unmap as copying is faster than unmapping:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=907676b130711fd1f

It is pretty unfortunate that drivers are hard coding behaviors based on
assumptions of what the portable API is doing under the covers.

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Mike,

On Thu, 30 Sep 2021 14:22:34 +, "Campin, Mike" wrote:
> I need support for mixed user PASID, kernel PASID and non-PASID use cases
> in the driver.

This specific RFC is for kernel PASID only. User PASID native use is
supported under the SVA lib kernel API and the /dev/uacce UAPI or a driver
specific char dev. Guest PASID is being developed under the new /dev/iommu
framework.

Non-PASID kernel use should be under the DMA API, unchanged from the
driver's POV. In fact, this proposal will map non-PASID and PASID DMA
identically.

Thanks,
Jacob

> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Wednesday, September 29, 2021 4:43 PM
> To: Jacob Pan
> Cc: iommu@lists.linux-foundation.org; LKML; Joerg Roedel; Christoph
> Hellwig; Tian, Kevin; Luck, Tony; Jiang, Dave; Raj, Ashok; Kumar,
> Sanjay K; Campin, Mike; Thomas Gleixner
> Subject: Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
>
> On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe wrote:
> > > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> > > > For #2, it seems we can store the kernel PASID in struct device.
> > > > This will preserve the DMA API interface while making it PASID
> > > > capable. Essentially, each PASID capable device would have two
> > > > special global PASIDs:
> > > > - PASID 0 for DMA requests w/o PASID, aka RID2PASID
> > > > - PASID 1 (randomly selected) for in-kernel DMA requests w/ PASID
> > >
> > > This seems reasonable, I had the same thought. Basically just have
> > > the driver issue some trivial call:
> > > pci_enable_pasid_dma(pdev, )
> >
> > That would work, but I guess it needs to be an iommu_ call instead of
> > pci_?
>
> Whichever makes sense.. The API should take in a struct pci_device and
> return a PCI PASID - at least as a wrapper around a more generic iommu
> API.
>
> > I think your suggestion is more precise; in case the driver does not
> > want to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).
>
> Since it is odd, and it may create overhead, I would do it only when
> asked to do it
>
> > > Having multiple RIDs pointing at the same IO page table is something
> > > we expect iommufd to require so the whole thing should ideally fall
> > > out naturally.
> >
> > That would be the equivalent of attaching multiple devices to the same
> > IOMMU domain. right?
>
> Effectively..
>
> Jason
RE: [RFC 0/7] Support in-kernel DMA with PASID and SVA
I need support for mixed user PASID, kernel PASID and non-PASID use cases in
the driver.

-----Original Message-----
From: Jason Gunthorpe
Sent: Wednesday, September 29, 2021 4:43 PM
To: Jacob Pan
Cc: iommu@lists.linux-foundation.org; LKML; Joerg Roedel; Christoph Hellwig;
Tian, Kevin; Luck, Tony; Jiang, Dave; Raj, Ashok; Kumar, Sanjay K; Campin,
Mike; Thomas Gleixner
Subject: Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA

On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> Hi Jason,
>
> On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> > > For #2, it seems we can store the kernel PASID in struct device.
> > > This will preserve the DMA API interface while making it PASID
> > > capable. Essentially, each PASID capable device would have two
> > > special global PASIDs:
> > > - PASID 0 for DMA requests w/o PASID, aka RID2PASID
> > > - PASID 1 (randomly selected) for in-kernel DMA requests w/ PASID
> >
> > This seems reasonable, I had the same thought. Basically just have
> > the driver issue some trivial call:
> > pci_enable_pasid_dma(pdev, )
>
> That would work, but I guess it needs to be an iommu_ call instead of
> pci_?

Whichever makes sense.. The API should take in a struct pci_device and
return a PCI PASID - at least as a wrapper around a more generic iommu API.

> I think your suggestion is more precise; in case the driver does not want
> to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).

Since it is odd, and it may create overhead, I would do it only when asked
to do it

> > Having multiple RIDs pointing at the same IO page table is something
> > we expect iommufd to require so the whole thing should ideally fall
> > out naturally.
>
> That would be the equivalent of attaching multiple devices to the same
> IOMMU domain. right?

Effectively..

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Wed, Sep 29, 2021 at 03:57:20PM -0700, Jacob Pan wrote:
> Hi Jason,
>
> On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> > > For #2, it seems we can store the kernel PASID in struct device.
> > > This will preserve the DMA API interface while making it PASID
> > > capable. Essentially, each PASID capable device would have two
> > > special global PASIDs:
> > > - PASID 0 for DMA requests w/o PASID, aka RID2PASID
> > > - PASID 1 (randomly selected) for in-kernel DMA requests w/ PASID
> >
> > This seems reasonable, I had the same thought. Basically just have the
> > driver issue some trivial call:
> > pci_enable_pasid_dma(pdev, )
>
> That would work, but I guess it needs to be an iommu_ call instead of
> pci_?

Whichever makes sense.. The API should take in a struct pci_device and
return a PCI PASID - at least as a wrapper around a more generic iommu API.

> I think your suggestion is more precise; in case the driver does not want
> to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).

Since it is odd, and it may create overhead, I would do it only when asked
to do it

> > Having multiple RIDs pointing at the same IO page table is something
> > we expect iommufd to require so the whole thing should ideally fall
> > out naturally.
>
> That would be the equivalent of attaching multiple devices to the same
> IOMMU domain. right?

Effectively..

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Jason,

On Wed, 29 Sep 2021 16:39:53 -0300, Jason Gunthorpe wrote:
> On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> > For #2, it seems we can store the kernel PASID in struct device. This
> > will preserve the DMA API interface while making it PASID capable.
> > Essentially, each PASID capable device would have two special global
> > PASIDs:
> > - PASID 0 for DMA requests w/o PASID, aka RID2PASID
> > - PASID 1 (randomly selected) for in-kernel DMA requests w/ PASID
>
> This seems reasonable, I had the same thought. Basically just have the
> driver issue some trivial call:
> pci_enable_pasid_dma(pdev, )

That would work, but I guess it needs to be an iommu_ call instead of pci_?

Or it can be done by the platform IOMMU code, where a system PASID is
automatically enabled for PASID capable devices during boot and stored in
struct device. Device drivers can retrieve the PASID from struct device.

I think your suggestion is more precise; in case the driver does not want
to do DMA w/ PASID, we can do less IOTLB flush (PASID 0 only).

> And then DMA tagged with the PASID will be handled equivalent to
> untagged DMA. Basically PASID and no PASID point to the exact same IO
> page table and the DMA API manipulates that single page table.
>
> Having multiple RIDs pointing at the same IO page table is something
> we expect iommufd to require so the whole thing should ideally fall
> out naturally.

That would be the equivalent of attaching multiple devices to the same
IOMMU domain. right?

> Jason

Thanks,
Jacob
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Wed, Sep 29, 2021 at 12:37:19PM -0700, Jacob Pan wrote:
> For #2, it seems we can store the kernel PASID in struct device. This
> will preserve the DMA API interface while making it PASID capable.
> Essentially, each PASID capable device would have two special global
> PASIDs:
> - PASID 0 for DMA requests w/o PASID, aka RID2PASID
> - PASID 1 (randomly selected) for in-kernel DMA requests w/ PASID

This seems reasonable, I had the same thought. Basically just have the
driver issue some trivial call:
pci_enable_pasid_dma(pdev, )

And then DMA tagged with the PASID will be handled equivalent to untagged
DMA. Basically PASID and no PASID point to the exact same IO page table and
the DMA API manipulates that single page table.

Having multiple RIDs pointing at the same IO page table is something we
expect iommufd to require so the whole thing should ideally fall out
naturally.

Jason
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi,

Just to follow up on what we discussed during the LPC VFIO/IOMMU/PCI MC.
https://linuxplumbersconf.org/event/11/contributions/1021/

The key takeaways are:

1. Addressing mode selection (PA, IOVA, and KVA) should be a policy
   decision *not* made by device drivers. This implies that it is up to
   the platform code or the user (via some sysfs knobs) to decide what is
   best for each device. Drivers should not be aware of which addressing
   mode is returned by the DMA API.
2. DMA APIs can be extended to support DMA requests with PASID.
3. The performance benefit of using KVA (shared) should be demonstrated,
   though the saving in IOTLB flushes over IOVA is conceivable.

#1 could be done in platform IOMMU code when devices are attached to
their default domains. E.g. if the device is trusted, it can operate in
shared KVA mode.

For #2, it seems we can store the kernel PASID in struct device. This
will preserve the DMA API interface while making it PASID capable.
Essentially, each PASID capable device would have two special global
PASIDs:
	- PASID 0 for DMA requests w/o PASID, aka RID2PASID
	- PASID 1 (randomly selected) for in-kernel DMA requests w/ PASID

Both PASID 0 and 1 will always point to the same page table, i.e. the
same addressing mode, IOVA or KVA. For devices that do not support PASID,
there is no change. For devices that can do DMA both w/ and w/o PASID,
the IOTLB invalidation would include both PASIDs.

By embedding the PASID in struct device, we also avoid changes in upper
level APIs. The DMA engine API can continue to give out channels without
knowing whether PASID is used or not. Accelerator drivers that do work
submission can retrieve the PASID from struct device.

Thoughts?

Thanks for the review and feedback at LPC!

Jacob

On Tue, 21 Sep 2021 13:29:34 -0700, Jacob Pan wrote:
> Hi Joerg/Jason/Christoph et al.,
>
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate that the binding of the device and a page table should be
> performed on init_mm instead of the mm of the current process. Page
> requests and other differences between user and kernel SVA are handled
> as special cases.
>
> This unrestricted binding with the kernel page table is being challenged
> for security and for the convention that in-kernel DMA must be
> compatible with DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table
> updates.
>
> This patchset tries to address these concerns by having an explicit
> DMA API compatible model while continuing to support in-kernel use of
> DMA requests with PASID. Specifically, the following DMA-IOMMU APIs are
> introduced:
>
>   int iommu_dma_pasid_enable/disable(struct device *dev,
>                                      struct iommu_domain **domain,
>                                      enum iommu_dma_pasid_mode mode);
>   int iommu_map/unmap_kva(struct iommu_domain *domain,
>                           void *cpu_addr, size_t size, int prot);
>
> The following three addressing modes are supported, with example API
> usages by device drivers.
>
> 1. Physical address (bypass) mode. Similar to DMA direct, where trusted
>    devices can DMA pass through the IOMMU on a per-PASID basis.
>    Example:
>      pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
>      /* Use the returned PASID and PA for work submission */
>
> 2. IOVA mode. DMA API compatible. Maps a supervisor PASID the same way
>    as the PCI requester ID (RID).
>    Example:
>      pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
>      /* Use the PASID and DMA API allocated IOVA for work submission */
>
> 3. KVA mode. New kva map/unmap APIs. Supports fast and strict sub-modes
>    transparently based on device trustworthiness.
>    Example:
>      pasid = iommu_dma_pasid_enable(dev, , IOMMU_DMA_PASID_KVA);
>      iommu_map_kva(domain, , size, prot);
>      /* Use the returned PASID and KVA to submit work */
>    Where:
>      Fast mode: shared CPU page tables, for trusted devices only
>      Strict mode: an IOMMU domain is returned for the untrusted device
>      to replicate the KVA-PA mapping in IOMMU page tables.
>
> On a per device basis, DMA address and performance modes are enabled by
> the device drivers. Platform information such as trustworthiness and
> user command line input could also be taken into consideration (not
> implemented in this RFC).
>
> This RFC is intended to communicate the API directions. Little testing
> has been done outside IDXD and DMA engine tests.
>
> For PA and IOVA modes, the implementation is straightforward and tested
> with the Intel IDXD driver. But several open issues remain in KVA fast
> mode, thus it is not tested:
> 1. Lack of IOTLB synchronization; kernel direct map aliases can be
>    updated as a result of module loading/eBPF load. Adding a kernel mmu
>    notifier?
> 2. The use of the auxiliary domain for KVA map, will the aux domain
>    stay in the long term?
Re: [RFC 0/7] Support in-kernel DMA with PASID and SVA
On Tue, Sep 21, 2021 at 01:29:34PM -0700, Jacob Pan wrote:

> Hi Joerg/Jason/Christoph et al.,
>
> The current in-kernel supervisor PASID support is based on the SVM/SVA
> machinery in sva-lib. Kernel SVA is achieved by extending a special flag
> to indicate that the binding of the device and a page table should be
> performed on init_mm instead of the mm of the current process. Page
> requests and other differences between user and kernel SVA are handled
> as special cases.
>
> This unrestricted binding with the kernel page table is being challenged
> for security and for the convention that in-kernel DMA must be
> compatible with DMA APIs.
> (https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
> There is also the lack of IOTLB synchronization upon kernel page table
> updates.
>
> This patchset tries to address these concerns by having an explicit DMA
> API compatible model while continuing to support in-kernel use of DMA
> requests with PASID. Specifically, the following DMA-IOMMU APIs are
> introduced:
>
>   int iommu_dma_pasid_enable/disable(struct device *dev,
>                                      struct iommu_domain **domain,
>                                      enum iommu_dma_pasid_mode mode);
>   int iommu_map/unmap_kva(struct iommu_domain *domain,
>                           void *cpu_addr, size_t size, int prot);

I'm not convinced this is going in the right direction. You should
create/find a 'struct device' for the PASID and use the normal DMA API,
not try to create a parallel DMA API under the iommu framework.

Again, there should be no driver in Linux doing DMA without going through
the normal DMA API.

> The following three addressing modes are supported, with example API
> usages by device drivers.
>
> 1. Physical address (bypass) mode. Similar to DMA direct, where trusted
>    devices can DMA pass through the IOMMU on a per-PASID basis.
>    Example:
>      pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
>      /* Use the returned PASID and PA for work submission */

And why should this even be a choice given to drivers? Drivers do not get
to self declare their "trustiness" - this is only set by the admin.

PASID tagged DMA is no different than any other DMA and needs to follow
the global admin set IOMMU modes - without any driver knob to change
behaviors.

The API design should look more like this:

  u32 hw_pasid;
  struct device *pasid_dev = iommu_get_pasid_device_handle(pci_device, _pasid);

  dma_addr_t addr = dma_map_XX(pasid_dev, buf, size)

  'tell HW to do DMA'(hw_pasid, addr, size)

  dma_unmap_XX(pasid_dev, addr, size);

If there is any performance tunable around how the IO page table is
constructed then the IOMMU layer will handle it transparently from global
config, just as it does for every other DMA out there.

> 1. Lack of IOTLB synchronization; kernel direct map aliases can be
>    updated as a result of module loading/eBPF load. Adding a kernel mmu
>    notifier?

I'm deeply skeptical we should even have "KSVA" and would want to see a
lot of performance justification to introduce something like this. Given
that basically only vmalloc memory could truly benefit from it, I don't
expect to see much win, especially when balanced against burdening all
vmalloc users with IO page table synchronization.

Certainly it should not be part of a patch series fixing kPASID support
for basic DMA, and still doesn't excuse skipping the DMA API - that is
still mandatory for portability to support cache flushing.

Jason
[RFC 0/7] Support in-kernel DMA with PASID and SVA
Hi Joerg/Jason/Christoph et al.,

The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in sva-lib. Kernel SVA is achieved by extending a special flag
to indicate that the binding of the device and a page table should be
performed on init_mm instead of the mm of the current process. Page
requests and other differences between user and kernel SVA are handled as
special cases.

This unrestricted binding with the kernel page table is being challenged
for security and for the convention that in-kernel DMA must be compatible
with DMA APIs.
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
There is also the lack of IOTLB synchronization upon kernel page table
updates.

This patchset tries to address these concerns by having an explicit DMA
API compatible model while continuing to support in-kernel use of DMA
requests with PASID. Specifically, the following DMA-IOMMU APIs are
introduced:

  int iommu_dma_pasid_enable/disable(struct device *dev,
                                     struct iommu_domain **domain,
                                     enum iommu_dma_pasid_mode mode);
  int iommu_map/unmap_kva(struct iommu_domain *domain,
                          void *cpu_addr, size_t size, int prot);

The following three addressing modes are supported, with example API
usages by device drivers.

1. Physical address (bypass) mode. Similar to DMA direct, where trusted
   devices can DMA pass through the IOMMU on a per-PASID basis.
   Example:
     pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_BYPASS);
     /* Use the returned PASID and PA for work submission */

2. IOVA mode. DMA API compatible. Maps a supervisor PASID the same way as
   the PCI requester ID (RID).
   Example:
     pasid = iommu_dma_pasid_enable(dev, NULL, IOMMU_DMA_PASID_IOVA);
     /* Use the PASID and DMA API allocated IOVA for work submission */

3. KVA mode. New kva map/unmap APIs. Supports fast and strict sub-modes
   transparently based on device trustworthiness.
   Example:
     pasid = iommu_dma_pasid_enable(dev, , IOMMU_DMA_PASID_KVA);
     iommu_map_kva(domain, , size, prot);
     /* Use the returned PASID and KVA to submit work */
   Where:
     Fast mode: shared CPU page tables, for trusted devices only
     Strict mode: an IOMMU domain is returned for the untrusted device to
     replicate the KVA-PA mapping in IOMMU page tables.

On a per device basis, DMA address and performance modes are enabled by
the device drivers. Platform information such as trustworthiness and user
command line input could also be taken into consideration (not
implemented in this RFC).

This RFC is intended to communicate the API directions. Little testing
has been done outside IDXD and DMA engine tests.

For PA and IOVA modes, the implementation is straightforward and tested
with the Intel IDXD driver. But several open issues remain in KVA fast
mode, thus it is not tested:
1. Lack of IOTLB synchronization; kernel direct map aliases can be
   updated as a result of module loading/eBPF load. Adding a kernel mmu
   notifier?
2. The use of the auxiliary domain for KVA map, will the aux domain stay
   in the long term? Is there another way to represent sub-device
   granularity isolation?
3. Is limiting the KVA sharing to the direct map range reasonable and
   practical for all architectures?

Many thanks to Ashok Raj, Kevin Tian, and Baolu who provided feedback and
many ideas in this set.
Thanks,

Jacob

Jacob Pan (7):
  ioasid: reserve special PASID for in-kernel DMA
  dma-iommu: Add API for DMA request with PASID
  iommu/vt-d: Add DMA w/ PASID support for PA and IOVA
  dma-iommu: Add support for DMA w/ PASID in KVA
  iommu/vt-d: Add support for KVA PASID mode
  iommu: Add KVA map API
  dma/idxd: Use dma-iommu PASID API instead of SVA lib

 drivers/dma/idxd/idxd.h                         |   4 +-
 drivers/dma/idxd/init.c                         |  36 ++--
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c     |   2 +-
 drivers/iommu/dma-iommu.c                       | 123 +-
 drivers/iommu/intel/iommu.c                     | 154 +-
 drivers/iommu/ioasid.c                          |   2 +
 drivers/iommu/iommu-sva-lib.c                   |   1 +
 drivers/iommu/iommu.c                           |  63 +++
 include/linux/dma-iommu.h                       |  14 ++
 include/linux/intel-iommu.h                     |   7 +-
 include/linux/ioasid.h                          |   4 +
 include/linux/iommu.h                           |  13 ++
 12 files changed, 390 insertions(+), 33 deletions(-)

-- 
2.25.1