RE: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

2021-10-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Friday, October 22, 2021 7:31 AM
> 
> On Thu, Oct 21, 2021 at 02:26:00AM +, Tian, Kevin wrote:
> 
> > But in reality only Intel integrated GPUs have this special no-snoop
> > trick (fixed knowledge), with a dedicated IOMMU which doesn't
> > support enforce-snoop format at all. In this case there is no choice
> > that the user can further make.
> 
> huh? That is not true at all. no-snoop is a PCIe spec behavior, any
> device can trigger it

yes, I should say Intel GPU 'drivers'.

> 
> What is true today is that only Intel GPU drivers are crazy enough to
> use it on Linux without platform support.
> 
> > Also per Christoph's comment no-snoop is not an encouraged
> > usage overall.
> 
> I wouldn't say that, I think Christoph said using it without API
> support through the DMA layer is very wrong.

ok, sounds like I drew the wrong impression from the previous discussion.

> 
> DMA layer support could be added if there was interest, all the pieces
> are there to do it.
> 
> > Given that I wonder whether the current vfio model better suits
> > this corner case, i.e. just let the kernel handle it instead of
> > exposing it in uAPI. The simple policy (as vfio does) is to
> > automatically set enforce-snoop when the target IOMMU supports it,
> > otherwise enable vfio/kvm contract to handle no-snoop requirement.
> 
> IMHO you need to model it as the KVM people said - if KVM can execute
> a real wbinvd in a VM then an ioctl should be available to normal
> userspace to run the same instruction.
> 
> So, figure out some rules to add a wbinvd ioctl to iommufd that makes
> some kind of sense and logically kvm is just triggering that ioctl,
> including whatever security model protects it.

The wbinvd instruction is x86-specific. Here we'd want a generic cache 
invalidation ioctl and then some form of arch callback, though x86 is 
the only platform of concern for now. 

> 
> I have no idea what security model makes sense for wbinvd, that is the
> major question you have to answer.

wbinvd flushes the entire cache on the local CPU. It's more of a performance
isolation problem, but nothing can prevent it once the user is allowed
to call this ioctl. This is the main reason why wbinvd is a privileged 
instruction and is emulated by kvm as a nop unless an assigned device
has a no-snoop requirement. Alternatively the user may call clflush,
which is unprivileged and can invalidate a specific cache line, though 
it is not efficient for flushing a big buffer.
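(For illustration only, not from this thread: a userspace sketch of the
clflush alternative; flush_buffer() and the hard-coded 64-byte cache line
size are assumptions, real code should query the line size e.g. via CPUID.)

/* Hedged userspace sketch: flush a buffer line by line with the
 * unprivileged clflush instruction instead of wbinvd. */
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence */
#include <stddef.h>

static void flush_buffer(const void *buf, size_t len)
{
	const char *p = buf;

	for (size_t off = 0; off < len; off += 64)
		_mm_clflush(p + off);
	_mm_mfence();   /* order the flushes against later accesses */
}

For a multi-megabyte buffer this is many thousands of clflush instructions,
which is the inefficiency referred to above.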

One tricky thing is that the process might be scheduled to different 
CPUs between writing buffers and calling the wbinvd ioctl. Since wbinvd 
only has local behavior, the ioctl must issue wbinvd on all CPUs that
this process has previously been scheduled on.

kvm maintains a dirty cpu mask in its preempt notifier (see 
kvm_sched_in/out).

Is there any concern if iommufd also follows the same mechanism?
Currently it looks like the preempt notifier is only used by kvm. Not sure
whether there are strong criteria around using it. Also, this local
behavior may not apply to all platforms (so perhaps better hidden behind
an arch callback?)
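(To make the question concrete, here is a minimal sketch, not from the
series, of how iommufd could follow the KVM approach with a preempt
notifier plus a per-context dirty CPU mask. struct iommufd_ctx, the
wbinvd_dirty_mask field and all function names below are hypothetical;
it assumes x86 and CONFIG_PREEMPT_NOTIFIERS, which today only KVM selects.)

#include <linux/cpumask.h>
#include <linux/kernel.h>
#include <linux/preempt.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <asm/special_insns.h>  /* wbinvd() */

struct iommufd_ctx {
	struct preempt_notifier pn;     /* registered on first wbinvd use */
	cpumask_t wbinvd_dirty_mask;
};

static void iommufd_sched_in(struct preempt_notifier *pn, int cpu)
{
	struct iommufd_ctx *ictx = container_of(pn, struct iommufd_ctx, pn);

	/* remember every CPU this context has been scheduled on */
	cpumask_set_cpu(cpu, &ictx->wbinvd_dirty_mask);
}

static void iommufd_sched_out(struct preempt_notifier *pn,
			      struct task_struct *next)
{
}

static struct preempt_ops iommufd_preempt_ops = {
	.sched_in  = iommufd_sched_in,
	.sched_out = iommufd_sched_out,
};

static void iommufd_do_wbinvd(void *unused)
{
	wbinvd();       /* flushes the entire cache of the local CPU only */
}

static int iommufd_ioctl_wbinvd(struct iommufd_ctx *ictx)
{
	/* run wbinvd on every dirty CPU and wait for completion */
	on_each_cpu_mask(&ictx->wbinvd_dirty_mask, iommufd_do_wbinvd,
			 NULL, true);
	cpumask_clear(&ictx->wbinvd_dirty_mask);
	return 0;
}

The notifier would be set up with preempt_notifier_init(&ictx->pn,
&iommufd_preempt_ops) and preempt_notifier_register(), much like
kvm_sched_in/out; whatever security model gates the ioctl would sit in
front of iommufd_ioctl_wbinvd().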

> 
> And obviously none of this should be hidden behind a private API to
> KVM.
> 
> > I don't see any interest in implementing an Intel GPU driver fully
> > in userspace. If just talking about possibility, a separate uAPI can
> > be still introduced to allow the userspace to issue wbinvd as Paolo
> > suggested.
> >
> > One side-effect of doing so is that then we may have to support
> > multiple domains per IOAS when Intel GPU and other devices are
> > attached to the same IOAS.
> 
> I think we already said the IOAS should represent a single IO page
> table layout?

yes. I was just talking about a possible tradeoff if the aforementioned
option were feasible. Based on the discussion above we will go back to
the one-ioas-one-layout model.

> 
> So if there is a need for incompatible layouts then the IOAS should be
> duplicated.
> 
> Otherwise, I also think the iommu core code should eventually learn to
> share the io page table across HW instances. Eg ARM has a similar
> efficiency issue if there are multiple SMMU HW blocks.
> 

Or we may introduce an alias-ioas concept, where any change on one 
ioas is automatically replayed on its alias, if two ioas's are created 
just because of incompatible layouts.

Thanks
Kevin


Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device

2021-10-21 Thread Lu Baolu

On 10/21/21 4:10 PM, Marc Zyngier wrote:
> On Thu, 21 Oct 2021 03:22:30 +0100,
> Lu Baolu  wrote:
>>
>> On 10/20/21 10:22 PM, Marc Zyngier wrote:
>>> On Wed, 20 Oct 2021 06:21:44 +0100,
>>> Lu Baolu  wrote:
>>>>
>>>> On 2021/10/20 0:37, Sven Peter via iommu wrote:
>>>>> +	/*
>>>>> +	 * Check that CPU pages can be represented by the IOVA granularity.
>>>>> +	 * This has to be done after ops->attach_dev since many IOMMU drivers
>>>>> +	 * only limit domain->pgsize_bitmap after having attached the first
>>>>> +	 * device.
>>>>> +	 */
>>>>> +	ret = iommu_check_page_size(domain);
>>>>> +	if (ret) {
>>>>> +		__iommu_detach_device(domain, dev);
>>>>> +		return ret;
>>>>> +	}
>>>>
>>>> It looks odd. __iommu_attach_device() attaches an I/O page table for a
>>>> device. How does it relate to CPU pages? Why is it a failure case if CPU
>>>> page size is not covered?
>>>
>>> If you allocate a CPU PAGE_SIZE'd region, and point it at a device
>>> that now can DMA to more than what you have allocated because the
>>> IOMMU's own page size is larger, the device has now access to data it
>>> shouldn't see. In my book, that's a pretty bad thing.
>>
>> But even if you enforce the CPU page size check here, this problem still
>> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?
>
> Let me take a CPU analogy: you have a page that contains some user
> data *and* a kernel secret. How do you map this page into userspace
> without leaking the kernel secret?
>
> PAGE_SIZE allocations are the unit of isolation, and this applies to
> both CPU and IOMMU. If you have allocated a DMA buffer that is less
> than a page, you then have to resort to bounce buffering, or accept
> that your data isn't safe.

I can understand the problems when the IOMMU page size is larger than
the CPU page size. But the code itself is not clean. The vendor iommu
drivers know more hardware details than the iommu core. It looks odd
that the vendor iommu driver says "okay, I can attach this I/O page
table to the device", but the iommu core says "no, you can't" and rolls
everything back.

Best regards,
baolu


Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

2021-10-21 Thread Jason Gunthorpe via iommu
On Thu, Oct 21, 2021 at 02:26:00AM +, Tian, Kevin wrote:

> But in reality only Intel integrated GPUs have this special no-snoop 
> trick (fixed knowledge), with a dedicated IOMMU which doesn't
> support enforce-snoop format at all. In this case there is no choice
> that the user can further make. 

huh? That is not true at all. no-snoop is a PCIe spec behavior, any
device can trigger it

What is true today is that only Intel GPU drivers are crazy enough to
use it on Linux without platform support.

> Also per Christoph's comment no-snoop is not an encouraged 
> usage overall.

I wouldn't say that, I think Christoph said using it without API
support through the DMA layer is very wrong.

DMA layer support could be added if there was interest, all the pieces
are there to do it.

> Given that I wonder whether the current vfio model better suits
> this corner case, i.e. just let the kernel handle it instead of
> exposing it in uAPI. The simple policy (as vfio does) is to
> automatically set enforce-snoop when the target IOMMU supports it,
> otherwise enable vfio/kvm contract to handle no-snoop requirement.

IMHO you need to model it as the KVM people said - if KVM can execute
a real wbinvd in a VM then an ioctl should be available to normal
userspace to run the same instruction.

So, figure out some rules to add a wbinvd ioctl to iommufd that makes
some kind of sense and logically kvm is just triggering that ioctl,
including whatever security model protects it.

I have no idea what security model makes sense for wbinvd, that is the
major question you have to answer.

And obviously none of this should be hidden behind a private API to
KVM.

> I don't see any interest in implementing an Intel GPU driver fully
> in userspace. If just talking about possibility, a separate uAPI can 
> be still introduced to allow the userspace to issue wbinvd as Paolo
> suggested.
> 
> One side-effect of doing so is that then we may have to support
> multiple domains per IOAS when Intel GPU and other devices are
> attached to the same IOAS.

I think we already said the IOAS should represent a single IO page
table layout?

So if there is a need for incompatible layouts then the IOAS should be
duplicated.

Otherwise, I also think the iommu core code should eventually learn to
share the io page table across HW instances. Eg ARM has a similar
efficiency issue if there are multiple SMMU HW blocks.

Jason


Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

2021-10-21 Thread Jason Gunthorpe via iommu
On Thu, Oct 21, 2021 at 03:58:02PM +0100, Jean-Philippe Brucker wrote:
> On Thu, Oct 21, 2021 at 02:26:00AM +, Tian, Kevin wrote:
> > > I'll leave it to Jean to confirm. If only coherent DMA can be used in
> > > the guest on other platforms, suppose VFIO should not blindly set
> > > IOMMU_CACHE and in concept it should deny assigning a non-coherent
> > > device since no co-ordination with guest exists today.
> > 
> > Jean, what's your opinion?
> 
> Yes a sanity check to prevent assigning non-coherent devices would be
> good, though I'm not particularly worried about non-coherent devices. PCIe
> on Arm should be coherent (according to the Base System Architecture). So
> vfio-pci devices should be coherent, but vfio-platform and mdev are
> case-by-case (hopefully all coherent since it concerns newer platforms).
> 
> More worrying, I thought we disabled No-Snoop for VFIO but I was wrong,
> it's left enabled. On Arm I don't think userspace can perform the right
> cache maintenance operations to maintain coherency with a device that
> issues No-Snoop writes. Userspace can issue clean+invalidate but not
> invalidate alone, so there is no equivalent to
> arch_sync_dma_for_cpu().

So what happens in a VM? Does a VM know that arch_sync_dma_for_cpu()
is not available?

And how does this work with the nested IOMMU translation? I thought I
read in the SMMU spec that the io page table entries could control
cachability including in nesting cases?

> I think the worst that can happen is the device owner shooting itself in
> the foot by using No-Snoop, but would it hurt to disable it?

No, the worst is the same as Intel - a driver running in the guest VM
assumes it can use arch_sync_dma_for_cpu() and acts accordingly,
resulting in a broken VM.

Jason


Re: [PATCH][next] iommu/dma: Use kvcalloc() instead of kvzalloc()

2021-10-21 Thread Robin Murphy

On 2021-09-28 23:22, Gustavo A. R. Silva wrote:

Use 2-factor argument form kvcalloc() instead of kvzalloc().


If we have a thing for that now, then sure, why not. FWIW this can't 
ever overflow due to where "count" comes from, but it has no reason to 
be special.
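(Side note, not Robin's: the point of the 2-factor form is that the
multiplication is overflow-checked inside the allocator. A minimal sketch,
with alloc_page_array() as a made-up name:)

/* kvcalloc() checks count * size for overflow and returns NULL, whereas
 * an open-coded count * sizeof(*pages) handed to kvzalloc() would
 * silently wrap for a huge count. */
#include <linux/mm.h>
#include <linux/slab.h>

static struct page **alloc_page_array(size_t count)
{
	return kvcalloc(count, sizeof(struct page *), GFP_KERNEL);
}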


Acked-by: Robin Murphy 


Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Gustavo A. R. Silva 
---
  drivers/iommu/dma-iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 896bea04c347..18c6edbe5fbf 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -616,7 +616,7 @@ static struct page **__iommu_dma_alloc_pages(struct device 
*dev,
if (!order_mask)
return NULL;
  
-	pages = kvzalloc(count * sizeof(*pages), GFP_KERNEL);

+   pages = kvcalloc(count, sizeof(*pages), GFP_KERNEL);
if (!pages)
return NULL;
  




Re: [PATCH][next] iommu/dma: Use kvcalloc() instead of kvzalloc()

2021-10-21 Thread Kees Cook
On Tue, Sep 28, 2021 at 05:22:29PM -0500, Gustavo A. R. Silva wrote:
> Use 2-factor argument form kvcalloc() instead of kvzalloc().
> 
> Link: https://github.com/KSPP/linux/issues/162
> Signed-off-by: Gustavo A. R. Silva 

Looks right.

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: [RFC 10/20] iommu/iommufd: Add IOMMU_DEVICE_GET_INFO

2021-10-21 Thread Jean-Philippe Brucker
On Thu, Oct 21, 2021 at 02:26:00AM +, Tian, Kevin wrote:
> > I'll leave it to Jean to confirm. If only coherent DMA can be used in
> > the guest on other platforms, suppose VFIO should not blindly set
> > IOMMU_CACHE and in concept it should deny assigning a non-coherent
> > device since no co-ordination with guest exists today.
> 
> Jean, what's your opinion?

Yes a sanity check to prevent assigning non-coherent devices would be
good, though I'm not particularly worried about non-coherent devices. PCIe
on Arm should be coherent (according to the Base System Architecture). So
vfio-pci devices should be coherent, but vfio-platform and mdev are
case-by-case (hopefully all coherent since it concerns newer platforms).

More worrying, I thought we disabled No-Snoop for VFIO but I was wrong,
it's left enabled. On Arm I don't think userspace can perform the right
cache maintenance operations to maintain coherency with a device that
issues No-Snoop writes. Userspace can issue clean+invalidate but not
invalidate alone, so there is no equivalent to arch_sync_dma_for_cpu().
I think the worst that can happen is the device owner shooting itself in
the foot by using No-Snoop, but would it hurt to disable it?

Thanks,
Jean



[PATCH 10/10] dma-direct: add a dma_direct_use_pool helper

2021-10-21 Thread Christoph Hellwig
Add a helper to check if a potentially blocking operation should
dip into the atomic pools.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f2ec40f5733fc..babf79c16c041 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -151,6 +151,15 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
return page;
 }
 
+/*
+ * Check if a potentially blocking operations needs to dip into the atomic
+ * pools for the given device/gfp.
+ */
+static bool dma_direct_use_pool(struct device *dev, gfp_t gfp)
+{
+   return gfpflags_allow_blocking(gfp) && !is_swiotlb_for_alloc(dev);
+}
+
 static void *dma_direct_alloc_from_pool(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
 {
@@ -229,8 +238,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 * instead have to dip into the atomic pools.
 */
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) {
-   if (!gfpflags_allow_blocking(gfp) &&
-   !is_swiotlb_for_alloc(dev))
+   if (dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size,
dma_handle, gfp);
remap = true;
@@ -241,8 +249,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 * Decrypting memory may block, so allocate the memory from the atomic
 * pools if we can't block.
 */
-   if (force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
-   !is_swiotlb_for_alloc(dev))
+   if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
/* we always manually zero the memory once we are done */
@@ -359,8 +366,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
struct page *page;
void *ret;
 
-   if (force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
-   !is_swiotlb_for_alloc(dev))
+   if (force_dma_unencrypted(dev) && dma_direct_use_pool(dev, gfp))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
page = __dma_direct_alloc_pages(dev, size, gfp);
-- 
2.30.2



[PATCH 09/10] dma-direct: factor the swiotlb code out of __dma_direct_alloc_pages

2021-10-21 Thread Christoph Hellwig
Add a new helper to deal with the swiotlb case.  This keeps the code
nicely bundled and removes the unneeded call to
dma_direct_optimal_gfp_mask for the swiotlb case.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f4ac2e1cdf469..f2ec40f5733fc 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -97,6 +97,18 @@ static void __dma_direct_free_pages(struct device *dev, 
struct page *page,
dma_free_contiguous(dev, page, size);
 }
 
+static struct page *dma_direct_alloc_swiotlb(struct device *dev, size_t size)
+{
+   struct page *page = swiotlb_alloc(dev, size);
+
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   swiotlb_free(dev, page, size);
+   return NULL;
+   }
+
+   return page;
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -106,17 +118,11 @@ static struct page *__dma_direct_alloc_pages(struct 
device *dev, size_t size,
 
WARN_ON_ONCE(!PAGE_ALIGNED(size));
 
+   if (is_swiotlb_for_alloc(dev))
+   return dma_direct_alloc_swiotlb(dev, size);
+
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   _limit);
-   if (is_swiotlb_for_alloc(dev)) {
-   page = swiotlb_alloc(dev, size);
-   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
-   __dma_direct_free_pages(dev, page, size);
-   return NULL;
-   }
-   return page;
-   }
-
page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
-- 
2.30.2



[PATCH 08/10] dma-direct: drop two CONFIG_DMA_RESTRICTED_POOL conditionals

2021-10-21 Thread Christoph Hellwig
swiotlb_alloc and swiotlb_free are properly stubbed out if
CONFIG_DMA_RESTRICTED_POOL is not set, so skip the extra checks.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 680fe10410645..f4ac2e1cdf469 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -92,8 +92,7 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, 
size_t size)
 static void __dma_direct_free_pages(struct device *dev, struct page *page,
size_t size)
 {
-   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
-   swiotlb_free(dev, page, size))
+   if (swiotlb_free(dev, page, size))
return;
dma_free_contiguous(dev, page, size);
 }
@@ -109,8 +108,7 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   _limit);
-   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
-   is_swiotlb_for_alloc(dev)) {
+   if (is_swiotlb_for_alloc(dev)) {
page = swiotlb_alloc(dev, size);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
__dma_direct_free_pages(dev, page, size);
-- 
2.30.2



[PATCH 07/10] dma-direct: warn if there is no pool for force unencrypted allocations

2021-10-21 Thread Christoph Hellwig
Instead of blindly running into a blocking operation for a non-blocking gfp,
return NULL and spew an error.  Note that Kconfig prevents this for all
currently relevant platforms, and this is just a debug check.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index d66f37f34ba71..680fe10410645 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -154,6 +154,9 @@ static void *dma_direct_alloc_from_pool(struct device *dev, 
size_t size,
u64 phys_mask;
void *ret;
 
+   if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_DMA_COHERENT_POOL)))
+   return NULL;
+
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   _mask);
page = dma_alloc_from_pool(dev, size, , gfp, dma_coherent_ok);
@@ -234,8 +237,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 * Decrypting memory may block, so allocate the memory from the atomic
 * pools if we can't block.
 */
-   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
-   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+   if (force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
!is_swiotlb_for_alloc(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
@@ -353,8 +355,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
struct page *page;
void *ret;
 
-   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
-   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+   if (force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
!is_swiotlb_for_alloc(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
-- 
2.30.2



[PATCH 06/10] dma-direct: refactor the !coherent checks in dma_direct_alloc

2021-10-21 Thread Christoph Hellwig
Add a big central !dev_is_dma_coherent(dev) block to deal with as much
of the uncached allocation schemes as possible, and document the schemes
a bit better.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 58 -
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4ffdb524942a1..d66f37f34ba71 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -196,29 +196,46 @@ void *dma_direct_alloc(struct device *dev, size_t size,
!force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev))
return dma_direct_alloc_no_mapping(dev, size, dma_handle, gfp);
 
-   if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
-   !dev_is_dma_coherent(dev) &&
-   !is_swiotlb_for_alloc(dev))
-   return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
+   if (!dev_is_dma_coherent(dev)) {
+   /*
+* Fallback to the arch handler if it exists.  This should
+* eventually go away.
+*/
+   if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+   !IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
+   !is_swiotlb_for_alloc(dev))
+   return arch_dma_alloc(dev, size, dma_handle, gfp,
+ attrs);
 
-   if (IS_ENABLED(CONFIG_DMA_GLOBAL_POOL) &&
-   !dev_is_dma_coherent(dev))
-   return dma_alloc_from_global_coherent(dev, size, dma_handle);
+   /*
+* If there is a global pool, always allocate from it for
+* non-coherent devices.
+*/
+   if (IS_ENABLED(CONFIG_DMA_GLOBAL_POOL))
+   return dma_alloc_from_global_coherent(dev, size,
+   dma_handle);
+
+   /*
+* Otherwise remap if the architecture is asking for it.  But
+* given that remapping memory is a blocking operation we'll
+* instead have to dip into the atomic pools.
+*/
+   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) {
+   if (!gfpflags_allow_blocking(gfp) &&
+   !is_swiotlb_for_alloc(dev))
+   return dma_direct_alloc_from_pool(dev, size,
+   dma_handle, gfp);
+   remap = true;
+   }
+   }
 
/*
-* Remapping or decrypting memory may block. If either is required and
-* we can't block, allocate the memory from the atomic pools.
-* If restricted DMA (i.e., is_swiotlb_for_alloc) is required, one must
-* set up another device coherent pool by shared-dma-pool and use
-* dma_alloc_from_dev_coherent instead.
+* Decrypting memory may block, so allocate the memory from the atomic
+* pools if we can't block.
 */
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
-   !gfpflags_allow_blocking(gfp) &&
-   (force_dma_unencrypted(dev) ||
-(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
- !dev_is_dma_coherent(dev))) &&
+   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
!is_swiotlb_for_alloc(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
@@ -226,10 +243,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
-
-   if (!dev_is_dma_coherent(dev) && IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) {
-   remap = true;
-   } else if (PageHighMem(page)) {
+   if (PageHighMem(page)) {
/*
 * Depending on the cma= arguments and per-arch setup,
 * dma_alloc_contiguous could return highmem pages.
-- 
2.30.2



[PATCH 05/10] dma-direct: factor out a helper for DMA_ATTR_NO_KERNEL_MAPPING allocations

2021-10-21 Thread Christoph Hellwig
Split the code for DMA_ATTR_NO_KERNEL_MAPPING allocations into a separate
helper to make dma_direct_alloc a little more readable.

Signed-off-by: Christoph Hellwig 
Acked-by: David Rientjes 
---
 kernel/dma/direct.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a6b6fe72af4d1..4ffdb524942a1 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -163,6 +163,24 @@ static void *dma_direct_alloc_from_pool(struct device 
*dev, size_t size,
return ret;
 }
 
+static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp)
+{
+   struct page *page;
+
+   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
+   if (!page)
+   return NULL;
+
+   /* remove any dirty cache lines on the kernel alias */
+   if (!PageHighMem(page))
+   arch_dma_prep_coherent(page, size);
+
+   /* return the page pointer as the opaque cookie */
+   *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
+   return page;
+}
+
 void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
@@ -175,17 +193,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
-   page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
-   if (!page)
-   return NULL;
-   /* remove any dirty cache lines on the kernel alias */
-   if (!PageHighMem(page))
-   arch_dma_prep_coherent(page, size);
-   *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
-   /* return the page pointer as the opaque cookie */
-   return page;
-   }
+   !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev))
+   return dma_direct_alloc_no_mapping(dev, size, dma_handle, gfp);
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
!IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-- 
2.30.2



[PATCH 04/10] dma-direct: clean up the remapping checks in dma_direct_alloc

2021-10-21 Thread Christoph Hellwig
Add a local variable to track if we want to remap the returned address
using vmap and use that to simplify the code flow.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 47 +++--
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 60cb75aa6778e..a6b6fe72af4d1 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -166,6 +166,7 @@ static void *dma_direct_alloc_from_pool(struct device *dev, 
size_t size,
 void *dma_direct_alloc(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
+   bool remap = false;
struct page *page;
void *ret;
 
@@ -217,9 +218,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
if (!page)
return NULL;
 
-   if ((IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-!dev_is_dma_coherent(dev)) ||
-   (IS_ENABLED(CONFIG_DMA_REMAP) && PageHighMem(page))) {
+   if (!dev_is_dma_coherent(dev) && IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) {
+   remap = true;
+   } else if (PageHighMem(page)) {
+   /*
+* Depending on the cma= arguments and per-arch setup,
+* dma_alloc_contiguous could return highmem pages.
+* Without remapping there is no way to return them here, so
+* log an error and fail.
+*/
+   if (!IS_ENABLED(CONFIG_DMA_REMAP)) {
+   dev_info(dev, "Rejecting highmem page from CMA.\n");
+   goto out_free_pages;
+   }
+   remap = true;
+   }
+
+   if (remap) {
/* remove any dirty cache lines on the kernel alias */
arch_dma_prep_coherent(page, size);
 
@@ -229,36 +244,22 @@ void *dma_direct_alloc(struct device *dev, size_t size,
__builtin_return_address(0));
if (!ret)
goto out_free_pages;
-   if (dma_set_decrypted(dev, ret, size))
-   goto out_unmap_pages;
-   memset(ret, 0, size);
-   goto done;
-   }
-
-   if (PageHighMem(page)) {
-   /*
-* Depending on the cma= arguments and per-arch setup
-* dma_alloc_contiguous could return highmem pages.
-* Without remapping there is no way to return them here,
-* so log an error and fail.
-*/
-   dev_info(dev, "Rejecting highmem page from CMA.\n");
-   goto out_free_pages;
+   } else {
+   ret = page_address(page);
}
 
-   ret = page_address(page);
if (dma_set_decrypted(dev, ret, size))
-   goto out_free_pages;
+   goto out_unmap_pages;
memset(ret, 0, size);
 
-   if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !dev_is_dma_coherent(dev)) {
+   if (!dev_is_dma_coherent(dev) && !remap &&
+   IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED)) {
arch_dma_prep_coherent(page, size);
ret = arch_dma_set_uncached(ret, size);
if (IS_ERR(ret))
goto out_encrypt_pages;
}
-done:
+
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return ret;
 
-- 
2.30.2



[PATCH 03/10] dma-direct: leak memory that can't be re-encrypted

2021-10-21 Thread Christoph Hellwig
We must never let unencrypted memory go back into the general page pool.
So if we fail to set it back to encrypted when freeing DMA memory, leak
the memory instead and warn the user.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 2fef8dd401fe9..60cb75aa6778e 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -263,9 +263,11 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return ret;
 
 out_encrypt_pages:
-   /* If memory cannot be re-encrypted, it must be leaked */
-   if (dma_set_encrypted(dev, page_address(page), size))
+   if (dma_set_encrypted(dev, page_address(page), size)) {
+   pr_warn_ratelimited(
+   "leaking DMA memory that can't be re-encrypted\n");
return NULL;
+   }
 out_unmap_pages:
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(ret))
vunmap(ret);
@@ -307,7 +309,11 @@ void dma_direct_free(struct device *dev, size_t size,
dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
return;
 
-   dma_set_encrypted(dev, cpu_addr, 1 << page_order);
+   if (dma_set_encrypted(dev, cpu_addr, 1 << page_order)) {
+   pr_warn_ratelimited(
+   "leaking DMA memory that can't be re-encrypted\n");
+   return;
+   }
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))
vunmap(cpu_addr);
@@ -365,7 +371,11 @@ void dma_direct_free_pages(struct device *dev, size_t size,
dma_free_from_pool(dev, vaddr, size))
return;
 
-   dma_set_encrypted(dev, vaddr, 1 << page_order);
+   if (dma_set_encrypted(dev, vaddr, 1 << page_order)) {
+   pr_warn_ratelimited(
+   "leaking DMA memory that can't be re-encrypted\n");
+   return;
+   }
__dma_direct_free_pages(dev, page, size);
 }
 
-- 
2.30.2



[PATCH 02/10] dma-direct: unmap remapped pages when dma_set_decrypted fails

2021-10-21 Thread Christoph Hellwig
When dma_set_decrypted fails for the remapping case in dma_direct_alloc
we also need to unmap the pages before freeing them.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index d4d54af31a341..2fef8dd401fe9 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -230,7 +230,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
if (!ret)
goto out_free_pages;
if (dma_set_decrypted(dev, ret, size))
-   goto out_free_pages;
+   goto out_unmap_pages;
memset(ret, 0, size);
goto done;
}
@@ -266,6 +266,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
/* If memory cannot be re-encrypted, it must be leaked */
if (dma_set_encrypted(dev, page_address(page), size))
return NULL;
+out_unmap_pages:
+   if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(ret))
+   vunmap(ret);
 out_free_pages:
__dma_direct_free_pages(dev, page, size);
return NULL;
-- 
2.30.2



[PATCH 01/10] dma-direct: factor out dma_set_{de,en}crypted helpers

2021-10-21 Thread Christoph Hellwig
Factor out helpers that make dealing with memory encryption a little less
cumbersome.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/direct.c | 56 -
 1 file changed, 25 insertions(+), 31 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4c6c5e0635e34..d4d54af31a341 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,20 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
+{
+   if (!force_dma_unencrypted(dev))
+   return 0;
+   return set_memory_decrypted((unsigned long)vaddr, 1 << get_order(size));
+}
+
+static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
+{
+   if (!force_dma_unencrypted(dev))
+   return 0;
+   return set_memory_encrypted((unsigned long)vaddr, 1 << get_order(size));
+}
+
 static void __dma_direct_free_pages(struct device *dev, struct page *page,
size_t size)
 {
@@ -154,7 +168,6 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 {
struct page *page;
void *ret;
-   int err;
 
size = PAGE_ALIGN(size);
if (attrs & DMA_ATTR_NO_WARN)
@@ -216,12 +229,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
__builtin_return_address(0));
if (!ret)
goto out_free_pages;
-   if (force_dma_unencrypted(dev)) {
-   err = set_memory_decrypted((unsigned long)ret,
-  1 << get_order(size));
-   if (err)
-   goto out_free_pages;
-   }
+   if (dma_set_decrypted(dev, ret, size))
+   goto out_free_pages;
memset(ret, 0, size);
goto done;
}
@@ -238,13 +247,8 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
ret = page_address(page);
-   if (force_dma_unencrypted(dev)) {
-   err = set_memory_decrypted((unsigned long)ret,
-  1 << get_order(size));
-   if (err)
-   goto out_free_pages;
-   }
-
+   if (dma_set_decrypted(dev, ret, size))
+   goto out_free_pages;
memset(ret, 0, size);
 
if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
@@ -259,13 +263,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return ret;
 
 out_encrypt_pages:
-   if (force_dma_unencrypted(dev)) {
-   err = set_memory_encrypted((unsigned long)page_address(page),
-  1 << get_order(size));
-   /* If memory cannot be re-encrypted, it must be leaked */
-   if (err)
-   return NULL;
-   }
+   /* If memory cannot be re-encrypted, it must be leaked */
+   if (dma_set_encrypted(dev, page_address(page), size))
+   return NULL;
 out_free_pages:
__dma_direct_free_pages(dev, page, size);
return NULL;
@@ -304,8 +304,7 @@ void dma_direct_free(struct device *dev, size_t size,
dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
return;
 
-   if (force_dma_unencrypted(dev))
-   set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
+   dma_set_encrypted(dev, cpu_addr, 1 << page_order);
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))
vunmap(cpu_addr);
@@ -341,11 +340,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
}
 
ret = page_address(page);
-   if (force_dma_unencrypted(dev)) {
-   if (set_memory_decrypted((unsigned long)ret,
-   1 << get_order(size)))
-   goto out_free_pages;
-   }
+   if (dma_set_decrypted(dev, ret, size))
+   goto out_free_pages;
memset(ret, 0, size);
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return page;
@@ -366,9 +362,7 @@ void dma_direct_free_pages(struct device *dev, size_t size,
dma_free_from_pool(dev, vaddr, size))
return;
 
-   if (force_dma_unencrypted(dev))
-   set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
-
+   dma_set_encrypted(dev, vaddr, 1 << page_order);
__dma_direct_free_pages(dev, page, size);
 }
 
-- 
2.30.2



dma-direct fixes and cleanups v2

2021-10-21 Thread Christoph Hellwig
Hi all,

Linus complained about the complex flow in dma_direct_alloc, so this
tries to simplify it a bit, and while I was at it I also made sure that
unencrypted pages never leak back into the page allocator.

Changes since v1:
 - fix a missing return
 - add a new patch to fix a pre-existing missing unmap
 - various additional cleanups
 
Diffstat:
 direct.c |  234 +--
 1 file changed, 138 insertions(+), 96 deletions(-)


Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device

2021-10-21 Thread Sven Peter via iommu



On Wed, Oct 20, 2021, at 07:21, Lu Baolu wrote:
> On 2021/10/20 0:37, Sven Peter via iommu wrote:
>> The iova allocator is capable of handling any granularity which is a power
>> of two. Remove the much stronger condition that the granularity must be
>> smaller or equal to the CPU page size from a BUG_ON there.
>> Instead, check this condition during __iommu_attach_device and fail
>> gracefully.
>> 
>> Signed-off-by: Sven Peter
>> ---
>>   drivers/iommu/iommu.c | 35 ---
>>   drivers/iommu/iova.c  |  7 ---
>>   include/linux/iommu.h |  5 +
>>   3 files changed, 41 insertions(+), 6 deletions(-)
>> 
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index dd7863e453a5..28896739964b 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -80,6 +80,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
>> bus_type *bus,
>>   unsigned type);
>>   static int __iommu_attach_device(struct iommu_domain *domain,
>>   struct device *dev);
>> +static void __iommu_detach_device(struct iommu_domain *domain,
>> +  struct device *dev);
>>   static int __iommu_attach_group(struct iommu_domain *domain,
>>  struct iommu_group *group);
>>   static void __iommu_detach_group(struct iommu_domain *domain,
>> @@ -1974,6 +1976,19 @@ void iommu_domain_free(struct iommu_domain *domain)
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_domain_free);
>>   
>> +static int iommu_check_page_size(struct iommu_domain *domain)
>> +{
>> +if (!iommu_is_paging_domain(domain))
>> +return 0;
>> +
>> +if (!(domain->pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1 {
>> +pr_warn("IOMMU pages cannot exactly represent CPU pages.\n");
>> +return -EFAULT;
>> +}
>> +
>> +return 0;
>> +}
>> +
>>   static int __iommu_attach_device(struct iommu_domain *domain,
>>   struct device *dev)
>>   {
>> @@ -1983,9 +1998,23 @@ static int __iommu_attach_device(struct iommu_domain 
>> *domain,
>>  return -ENODEV;
>>   
>>  ret = domain->ops->attach_dev(domain, dev);
>> -if (!ret)
>> -trace_attach_device_to_domain(dev);
>> -return ret;
>> +if (ret)
>> +return ret;
>> +
>> +/*
>> + * Check that CPU pages can be represented by the IOVA granularity.
>> + * This has to be done after ops->attach_dev since many IOMMU drivers
>> + * only limit domain->pgsize_bitmap after having attached the first
>> + * device.
>> + */
>> +ret = iommu_check_page_size(domain);
>> +if (ret) {
>> +__iommu_detach_device(domain, dev);
>> +return ret;
>> +}
>
> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> device. How does it relate to CPU pages? Why is it a failure case if CPU
> page size is not covered?

Ideally, I'd only allow allocating DMA domains (which are going to be able
to handle larger IOMMU page sizes) while disallowing UNMANAGED domains
(which can theoretically read the granule but I doubt any client right now
considers this situation and will just run into odd issues) when the I/O
page size is bigger than the CPU page size. There was a brief previous
discussion about this [1,2,3].

Unfortunately, Apple's DART IOMMU is hardwired to either support 4K or
16K pages but never both. And to make things worse there was at least one
SoC used in the iPhones that mixed 4K and 16K DARTs on the same bus. Ugh.
That's why this awkward check is here because this is the earliest
place where I know which I/O page size will be used.
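(Illustrative sketch, not from the patch: the proposed check reduces to a
bitmask test that rejects a 16K-only DART when PAGE_SIZE is 4K;
granule_can_represent_cpu_pages() is a made-up name.)

/* With 4K CPU pages, PAGE_SIZE | (PAGE_SIZE - 1) == 0x1fff covers every
 * IOMMU page size <= PAGE_SIZE, so a pgsize_bitmap that only offers a
 * 16K granule makes the AND evaluate to zero and the attach is refused. */
#include <linux/mm.h>
#include <linux/sizes.h>

static bool granule_can_represent_cpu_pages(unsigned long pgsize_bitmap)
{
	return pgsize_bitmap & (PAGE_SIZE | (PAGE_SIZE - 1));
}

/* granule_can_represent_cpu_pages(SZ_4K | SZ_2M) -> true
 * granule_can_represent_cpu_pages(SZ_16K)        -> false (rejected) */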


But I guess I could just limit the DART driver to 16K pages for now
(since every instance on the M1 is hard wired for that anyway) and then
just disallow allocating UNMANAGED domains when granule > PAGE_SIZE.

I'd still need a similar check here (at least for now) to prevent attaching
untrusted devices since I haven't changed swiotlb yet to support aligning
buffers correctly to more than PAGE_SIZE. That's possible but the interaction
with min_align_mask is a bit tricky to get right.
If there really shouldn't be any check here I can also do that for the next
version but I'd really like to keep that as a separate series - especially
since Thunderbolt support is still far away.


Thanks,


Sven

[1] 
https://lore.kernel.org/linux-iommu/7261df01-34a9-4e53-37cd-ae1aa15b1...@arm.com/
[2] 
https://lore.kernel.org/linux-iommu/CAK8P3a18XK2mfMGbZ+M32Mbabhbkd+=DNrnzampOah_j=rw...@mail.gmail.com/
[3] https://lore.kernel.org/linux-iommu/yo%2fbmuaolrgoj...@8bytes.org/


Re: [PATCH v3 4/6] iommu: Move IOMMU pagesize check to attach_device

2021-10-21 Thread Marc Zyngier
On Thu, 21 Oct 2021 03:22:30 +0100,
Lu Baolu  wrote:
> 
> On 10/20/21 10:22 PM, Marc Zyngier wrote:
> > On Wed, 20 Oct 2021 06:21:44 +0100,
> > Lu Baolu  wrote:
> >> 
> >> On 2021/10/20 0:37, Sven Peter via iommu wrote:
> >>> + /*
> >>> +  * Check that CPU pages can be represented by the IOVA granularity.
> >>> +  * This has to be done after ops->attach_dev since many IOMMU drivers
> >>> +  * only limit domain->pgsize_bitmap after having attached the first
> >>> +  * device.
> >>> +  */
> >>> + ret = iommu_check_page_size(domain);
> >>> + if (ret) {
> >>> + __iommu_detach_device(domain, dev);
> >>> + return ret;
> >>> + }
> >> 
> >> It looks odd. __iommu_attach_device() attaches an I/O page table for a
> >> device. How does it relate to CPU pages? Why is it a failure case if CPU
> >> page size is not covered?
> > 
> > If you allocate a CPU PAGE_SIZE'd region, and point it at a device
> > that now can DMA to more than what you have allocated because the
> > IOMMU's own page size is larger, the device has now access to data it
> > shouldn't see. In my book, that's a pretty bad thing.
> 
> But even if you enforce the CPU page size check here, this problem still
> exists unless all DMA buffers are PAGE_SIZE aligned and sized, right?

Let me take a CPU analogy: you have a page that contains some user
data *and* a kernel secret. How do you map this page into userspace
without leaking the kernel secret?

PAGE_SIZE allocations are the unit of isolation, and this applies to
both CPU and IOMMU. If you have allocated a DMA buffer that is less
than a page, you then have to resort to bounce buffering, or accept
that your data isn't safe.

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/4] dma-direct: leak memory that can't be re-encrypted

2021-10-21 Thread Christoph Hellwig
On Tue, Oct 19, 2021 at 12:56:36PM -0700, David Rientjes wrote:
> > -   dma_set_encrypted(dev, vaddr, 1 << page_order);
> > +   if (dma_set_encrypted(dev, vaddr, 1 << page_order)) {
> > +   pr_warn_ratelimited(
> > +   "leaking DMA memory that can't be re-encrypted\n");
> > +   }
> 
> Is this actually leaking the memory?

Yes, it should return here.


Re: [PATCH 1/4] dma-direct: factor out dma_set_{de,en}crypted helpers

2021-10-21 Thread Christoph Hellwig
On Tue, Oct 19, 2021 at 12:54:54PM -0700, David Rientjes wrote:
> > -  1 << get_order(size));
> > -   if (err)
> > -   goto out_free_pages;
> > -   }
> > +   err = dma_set_decrypted(dev, ret, size);
> 
> Should be
> 
>   if (dma_set_decrypted(dev, ret, size))
>   goto out_free_pages;
> 
> ?  Otherwise we ignore the value?

Yes.  Except that we also need to undo the remap as well.


RE: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-21 Thread Tian, Kevin
> From: Jean-Philippe Brucker 
> Sent: Wednesday, October 13, 2021 8:11 PM
> 
> Support identity domains, allowing to only enable IOMMU protection for a
> subset of endpoints (those assigned to userspace, for example). Users
> may enable identity domains at compile time
> (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time
> (iommu.passthrough=1) or
> runtime (/sys/kernel/iommu_groups/*/type = identity).
> 
> Patches 1-2 support identity domains using the optional
> VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in the
> spec, see [1] for the latest proposal.
> 
> Patches 3-5 add a fallback to identity mappings, when the feature is not
> supported.
> 
> Note that this series doesn't touch the global bypass bit added by
> VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the IOMMU
> should
> be attached to a domain, so global bypass isn't in use after endpoints
> are probed. Before that, the global bypass policy is decided by the
> hypervisor and firmware. So I don't think Linux needs to touch the
> global bypass bit, but there are some patches available on my
> virtio-iommu/bypass branch [2] to test it.
> 
> QEMU patches are on my virtio-iommu/bypass branch [3] (and the list)
> 
> [1] https://www.mail-archive.com/virtio-dev@lists.oasis-
> open.org/msg07898.html
> [2] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/bypass
> [3] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/bypass
> 
> Jean-Philippe Brucker (5):
>   iommu/virtio: Add definitions for VIRTIO_IOMMU_F_BYPASS_CONFIG
>   iommu/virtio: Support bypass domains
>   iommu/virtio: Sort reserved regions
>   iommu/virtio: Pass end address to viommu_add_mapping()
>   iommu/virtio: Support identity-mapped domains
> 
>  include/uapi/linux/virtio_iommu.h |   8 ++-
>  drivers/iommu/virtio-iommu.c  | 113 +-
>  2 files changed, 101 insertions(+), 20 deletions(-)
> 

For this series:

Reviewed-by: Kevin Tian 


RE: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-21 Thread Tian, Kevin
> From: Jean-Philippe Brucker 
> Sent: Monday, October 18, 2021 11:24 PM
> 
> On Thu, Oct 14, 2021 at 03:00:38AM +, Tian, Kevin wrote:
> > > From: Jean-Philippe Brucker 
> > > Sent: Wednesday, October 13, 2021 8:11 PM
> > >
> > > Support identity domains, allowing to only enable IOMMU protection for
> a
> > > subset of endpoints (those assigned to userspace, for example). Users
> > > may enable identity domains at compile time
> > > (CONFIG_IOMMU_DEFAULT_PASSTHROUGH), boot time
> > > (iommu.passthrough=1) or
> > > runtime (/sys/kernel/iommu_groups/*/type = identity).
> >
> > Do we want to use consistent terms between spec (bypass domain)
> > and code (identity domain)?
> 
> I don't think we have to. Linux uses "identity" domains and "passthrough"
> IOMMU. The old virtio-iommu feature was "bypass" so we should keep that
> for the new one, to be consistent. And then I've used "bypass" for domains
> as well, in the spec, because it would look strange to use a different
> term for the same concept. I find that it sort of falls into place: Linux'
> identity domains can be implemented either with bypass or identity-mapped
> virtio-iommu domains.

make sense.

> 
> > >
> > > Patches 1-2 support identity domains using the optional
> > > VIRTIO_IOMMU_F_BYPASS_CONFIG feature. The feature bit is not yet in
> the
> > > spec, see [1] for the latest proposal.
> > >
> > > Patches 3-5 add a fallback to identity mappings, when the feature is not
> > > supported.
> > >
> > > Note that this series doesn't touch the global bypass bit added by
> > > VIRTIO_IOMMU_F_BYPASS_CONFIG. All endpoints managed by the
> IOMMU
> > > should
> > > be attached to a domain, so global bypass isn't in use after endpoints
> >
> > I saw a concept of deferred attach in iommu core. See
> > iommu_is_attach_deferred(). Currently this is vendor specific and I
> > haven't looked into the exact reason why some vendor sets it now. Just
> > be curious whether the same reason might be applied to virtio-iommu.
> >
> > > are probed. Before that, the global bypass policy is decided by the
> > > hypervisor and firmware. So I don't think Linux needs to touch the
> >
> > This reminds me one thing. The spec says that the global bypass
> > bit is sticky and not affected by reset.
> 
> The spec talks about *device* reset, triggered by software writing 0 to
> the status register, but it doesn't mention system reset. Would be good to
> clarify that. Something like:
> 
> If the device offers the VIRTIO_IOMMU_F_BYPASS_CONFIG feature, it MAY
> initialize the \field{bypass} field to 1. Field \field{bypass} SHOULD
> NOT change on device reset, but SHOULD be restored to its initial
> value on system reset.

looks good to me

> 
> > This implies that in the case
> > of rebooting the VM into a different OS, the previous OS actually
> > has the right to override this setting for the next OS. Is it a right
> > design? Even the firmware itself is unable to identify the original
> > setting enforced by the hypervisor after reboot. I feel the hypervisor
> > setting should be recovered after reset since it reflects the
> > security measure enforced by the virtual platform?
> 
> So I think clarifying system reset should address your questions.
> I believe we should leave bypass sticky across device reset, so a FW->OS
> transition, where the OS resets the device, does not open a vulnerability
> (if bypass was enabled at boot and then left disabled by FW.)
> 
> It's still a good idea for the OS to restore on shutdown the bypass value
> it was booted with. So it can kexec into an OS that doesn't support
> virtio-iommu, for example.
> 

Thanks
Kevin


RE: [PATCH 0/5] iommu/virtio: Add identity domains

2021-10-21 Thread Tian, Kevin
> From: j...@8bytes.org 
> Sent: Monday, October 18, 2021 7:38 PM
> 
> On Thu, Oct 14, 2021 at 03:00:38AM +, Tian, Kevin wrote:
> > I saw a concept of deferred attach in iommu core. See
> > iommu_is_attach_deferred(). Currently this is vendor specific and I
> > haven't looked into the exact reason why some vendor sets it now. Just
> > be curious whether the same reason might be applied to virtio-iommu.
> 
> The reason for attach_deferred is kdump support, where the IOMMU driver
> needs to keep the mappings from the old kernel until the device driver
> of the new kernel takes over.
> 

ok. Then there is no problem with the original statement:

All endpoints managed by the IOMMU should be attached to a 
domain, so global bypass isn't in use after endpoints are probed.

Thanks
Kevin