Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/5/3 15:49, Jean-Philippe Brucker wrote: On Sat, Apr 30, 2022 at 03:33:17PM +0800, Baolu Lu wrote: Jean, another quick question about the iommu_sva_bind_device() /** * iommu_sva_bind_device() - Bind a process address space to a device * @dev: the device * @mm: the mm to bind, caller must hold a reference to it * @drvdata: opaque data pointer to pass to bind callback This interface requires the caller to take a reference to mm. Which reference should it take, mm->mm_count or mm->mm_users? It's better to make it explicit in this comment. Agreed, it's mm_users as required by mmu_notifier_register() Thanks! I will add this in my refactoring patch. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Sat, Apr 30, 2022 at 03:33:17PM +0800, Baolu Lu wrote: > Jean, another quick question about the iommu_sva_bind_device() > > /** > * iommu_sva_bind_device() - Bind a process address space to a device > * @dev: the device > * @mm: the mm to bind, caller must hold a reference to it > * @drvdata: opaque data pointer to pass to bind callback > > This interface requires the caller to take a reference to mm. Which > reference should it take, mm->mm_count or mm->mm_users? It's better to > make it explicit in this comment. Agreed, it's mm_users as required by mmu_notifier_register() Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/30 06:19, Fenghua Yu wrote: Hi, Jean and Baolu, On Fri, Apr 29, 2022 at 03:34:36PM +0100, Jean-Philippe Brucker wrote: On Fri, Apr 29, 2022 at 06:51:17AM -0700, Fenghua Yu wrote: Hi, Baolu, On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote: On 2022/4/28 16:39, Jean-Philippe Brucker wrote: The address space is what the OOM killer is after. That gets refcounted with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the page freeing done in __mmput()->exit_mmap(). Also, all the VMAs should be gone after exit_mmap(). So, even if vma->vm_file was holding a reference to a device driver, that reference should be gone by the time __mmdrop() is actually freeing the PASID. I agree with all that. The concern was about tearing down the PASID in the IOMMU and device from the release() MMU notifier, which would happen in exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And since the IOMMU drivers need to hold mm->mm_count anyway between bind() and unbind(), I think Fenghua's fix works. But I didn't find mmgrab()/mmdrop() get called in both arm and intel IOMMU drivers. $ git grep mmgrab drivers/iommu/ [no output] Do we need to add these in a separated fix patch, or I missed anything here? On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop(). Yes, although for Arm I realized the mmu_notifier grab wasn't sufficient so I sent a separate fix that should go in 5.18 as well https://lore.kernel.org/linux-iommu/20220426130444.300556-1-jean-phili...@linaro.org/ The Arm driver still touches the arch mm context after mmu_notifier_put(). I don't think X86 has that problem. I think so too. On X86, the mm is not used after mmu_notifier_unregister(). In case of supervisor mode SVM (i.e. svm is bound to init_mm), the code is right too because init_mm and its PASID cannot be dropped and mmu_notifier_register()/mmu_notifier_unregister() are not called. So I think no extra fix patch is needed on X86. Thanks, Fenghua and Jean. It's clear to me now. Jean, another quick question about the iommu_sva_bind_device() /** * iommu_sva_bind_device() - Bind a process address space to a device * @dev: the device * @mm: the mm to bind, caller must hold a reference to it * @drvdata: opaque data pointer to pass to bind callback This interface requires the caller to take a reference to mm. Which reference should it take, mm->mm_count or mm->mm_users? It's better to make it explicit in this comment. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jean and Baolu, On Fri, Apr 29, 2022 at 03:34:36PM +0100, Jean-Philippe Brucker wrote: > On Fri, Apr 29, 2022 at 06:51:17AM -0700, Fenghua Yu wrote: > > Hi, Baolu, > > > > On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote: > > > On 2022/4/28 16:39, Jean-Philippe Brucker wrote: > > > > > The address space is what the OOM killer is after. That gets > > > > > refcounted > > > > > with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the > > > > > page freeing done in __mmput()->exit_mmap(). > > > > > > > > > > Also, all the VMAs should be gone after exit_mmap(). So, even if > > > > > vma->vm_file was holding a reference to a device driver, that > > > > > reference > > > > > should be gone by the time __mmdrop() is actually freeing the PASID. > > > > > > > > I agree with all that. The concern was about tearing down the PASID in > > > > the > > > > IOMMU and device from the release() MMU notifier, which would happen in > > > > exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And > > > > since the IOMMU drivers need to hold mm->mm_count anyway between bind() > > > > and unbind(), I think Fenghua's fix works. > > > > > > But I didn't find mmgrab()/mmdrop() get called in both arm and intel > > > IOMMU drivers. > > > > > > $ git grep mmgrab drivers/iommu/ > > > [no output] > > > > > > Do we need to add these in a separated fix patch, or I missed anything > > > here? > > > > On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and > > sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop(). > > Yes, although for Arm I realized the mmu_notifier grab wasn't sufficient > so I sent a separate fix that should go in 5.18 as well > https://lore.kernel.org/linux-iommu/20220426130444.300556-1-jean-phili...@linaro.org/ > The Arm driver still touches the arch mm context after mmu_notifier_put(). > I don't think X86 has that problem. I think so too. On X86, the mm is not used after mmu_notifier_unregister(). In case of supervisor mode SVM (i.e. svm is bound to init_mm), the code is right too because init_mm and its PASID cannot be dropped and mmu_notifier_register()/mmu_notifier_unregister() are not called. So I think no extra fix patch is needed on X86. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Fri, Apr 29, 2022 at 06:51:17AM -0700, Fenghua Yu wrote: > Hi, Baolu, > > On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote: > > On 2022/4/28 16:39, Jean-Philippe Brucker wrote: > > > > The address space is what the OOM killer is after. That gets refcounted > > > > with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the > > > > page freeing done in __mmput()->exit_mmap(). > > > > > > > > Also, all the VMAs should be gone after exit_mmap(). So, even if > > > > vma->vm_file was holding a reference to a device driver, that reference > > > > should be gone by the time __mmdrop() is actually freeing the PASID. > > > > > > I agree with all that. The concern was about tearing down the PASID in the > > > IOMMU and device from the release() MMU notifier, which would happen in > > > exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And > > > since the IOMMU drivers need to hold mm->mm_count anyway between bind() > > > and unbind(), I think Fenghua's fix works. > > > > But I didn't find mmgrab()/mmdrop() get called in both arm and intel > > IOMMU drivers. > > > > $ git grep mmgrab drivers/iommu/ > > [no output] > > > > Do we need to add these in a separated fix patch, or I missed anything > > here? > > On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and > sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop(). Yes, although for Arm I realized the mmu_notifier grab wasn't sufficient so I sent a separate fix that should go in 5.18 as well https://lore.kernel.org/linux-iommu/20220426130444.300556-1-jean-phili...@linaro.org/ The Arm driver still touches the arch mm context after mmu_notifier_put(). I don't think X86 has that problem. Thanks, Jean > So mm->mm_count are already counted in existing ARM and X86 binding and > unbinding. The fix patch just frees the PASID in __mmdrop() after > no more mm->mm_count. > > There is no need to add extra mmgrab() and mmdrop() pair. > > Thanks. > > -Fenghua > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Baolu, On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote: > On 2022/4/28 16:39, Jean-Philippe Brucker wrote: > > > The address space is what the OOM killer is after. That gets refcounted > > > with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the > > > page freeing done in __mmput()->exit_mmap(). > > > > > > Also, all the VMAs should be gone after exit_mmap(). So, even if > > > vma->vm_file was holding a reference to a device driver, that reference > > > should be gone by the time __mmdrop() is actually freeing the PASID. > > > > I agree with all that. The concern was about tearing down the PASID in the > > IOMMU and device from the release() MMU notifier, which would happen in > > exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And > > since the IOMMU drivers need to hold mm->mm_count anyway between bind() > > and unbind(), I think Fenghua's fix works. > > But I didn't find mmgrab()/mmdrop() get called in both arm and intel > IOMMU drivers. > > $ git grep mmgrab drivers/iommu/ > [no output] > > Do we need to add these in a separated fix patch, or I missed anything > here? On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop(). So mm->mm_count are already counted in existing ARM and X86 binding and unbinding. The fix patch just frees the PASID in __mmdrop() after no more mm->mm_count. There is no need to add extra mmgrab() and mmdrop() pair. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/28 16:39, Jean-Philippe Brucker wrote: On Tue, Apr 26, 2022 at 04:31:57PM -0700, Dave Hansen wrote: On 4/26/22 09:48, Jean-Philippe Brucker wrote: On Tue, Apr 26, 2022 at 08:27:00AM -0700, Dave Hansen wrote: On 4/25/22 09:40, Jean-Philippe Brucker wrote: The problem is that we'd have to request the device driver to stop DMA before we can destroy the context and free the PASID. We did consider doing this in the release() MMU notifier, but there were concerns about blocking mmput() for too long (for example https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/ though I think there was a more recent discussion). We also need to drain the PRI and fault queues to get rid of all references to that PASID. Is the concern truly about blocking mmput() itself? Or, is it about releasing the resources associated with the mm? The latter I think, this one was about releasing pages as fast as possible if the process is picked by the OOM killer. We're tying the PASID to the life of the mm itself, not the mm's address space. That means the PASID should be tied to mmgrab()/mmdrop()/mm->mm_count. The address space is what the OOM killer is after. That gets refcounted with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the page freeing done in __mmput()->exit_mmap(). Also, all the VMAs should be gone after exit_mmap(). So, even if vma->vm_file was holding a reference to a device driver, that reference should be gone by the time __mmdrop() is actually freeing the PASID. I agree with all that. The concern was about tearing down the PASID in the IOMMU and device from the release() MMU notifier, which would happen in exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And since the IOMMU drivers need to hold mm->mm_count anyway between bind() and unbind(), I think Fenghua's fix works. But I didn't find mmgrab()/mmdrop() get called in both arm and intel IOMMU drivers. $ git grep mmgrab drivers/iommu/ [no output] Do we need to add these in a separated fix patch, or I missed anything here? Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/28/22 09:01, Jean-Philippe Brucker wrote: >> But, this misses an important point: even after the address space is >> gone, the PASID will still be programmed into a device. Device drivers >> might, for instance, still need to flush operations that are outstanding >> and need to use that PASID. They do this at ->release() time. > It's not really clear which release() this is. For us it's file descriptor > release() (not MMU notifier release(), which is how I initially understood > this sentence) OK, maybe that should be: "file->release() time" to make it more clear. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Thu, Apr 28, 2022 at 08:09:04AM -0700, Dave Hansen wrote: > On 4/25/22 21:20, Fenghua Yu wrote: > >>From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001 > > From: Fenghua Yu > > Date: Fri, 15 Apr 2022 00:51:33 -0700 > > Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue > > > > A PASID might be still used on ARM after it is freed in __mmput(). > > Is it really just ARM? > > > process: > > open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm > > exit(); > > exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1 > > exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure > > > > To avoid the use-after-free issue, free the PASID after no device uses it, > > i.e. after all devices are unbound from the mm. > > > > sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count. > > __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID > > in __mmdrop() guarantees the PASID is safely freed only after no device > > is bound to the mm. > > Does this changelog work for everyone? > > == > > tl;dr: The PASID is being freed too early. It needs to stay around > until after device drivers that might be using it have had a chance to > clear it out of the hardware. > > -- > > As a reminder: > > mmget() /mmput() refcount the mm's address space > mmgrab()/mmdrop() refcount the mm itself > > The PASID is currently tied to the life of the mm's address space and > freed in __mmput(). This makes logical sense because the PASID can't be > used once the address space is gone. > > But, this misses an important point: even after the address space is > gone, the PASID will still be programmed into a device. Device drivers > might, for instance, still need to flush operations that are outstanding > and need to use that PASID. They do this at ->release() time. It's not really clear which release() this is. For us it's file descriptor release() (not MMU notifier release(), which is how I initially understood this sentence) > > Device drivers hold a reference on the mm itself and drop it at > ->release() time. But, the device driver holds a reference mm itself, "to the mm" (To be pendantic it's the IOMMU driver that holds this reference, and the device driver calls the IOMMU driver to release it, but the idea is the same.) > not the address space. The address space (and the PASID) is long gone > by the time the driver tries to clean up. This is effectively a > use-after-free bug on the PASID. > > To fix this, move the PASID free operation from __mmput() to __mmdrop(). > This ensures that the device drivers' existing mmgrab() keeps the PASID > allocated until they drop their mm reference. Good changelog otherwise Thanks, Jean > > > kernel/fork.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 9796897560ab..35a3beff140b 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm) > > mmu_notifier_subscriptions_destroy(mm); > > check_mm(mm); > > put_user_ns(mm->user_ns); > > + mm_pasid_drop(mm); > > free_mm(mm); > > } > > EXPORT_SYMBOL_GPL(__mmdrop); > > @@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm) > > } > > if (mm->binfmt) > > module_put(mm->binfmt->module); > > - mm_pasid_drop(mm); > > mmdrop(mm); > > } > > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/28/22 08:28, Fenghua Yu wrote: > Do you want me to change the changlog to add both this paragraph and the > following paragraph? Yes, as long as everyone agrees that it captures the issue well. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Dave, On Thu, Apr 28, 2022 at 08:09:04AM -0700, Dave Hansen wrote: > On 4/25/22 21:20, Fenghua Yu wrote: > >>From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001 > > From: Fenghua Yu > > Date: Fri, 15 Apr 2022 00:51:33 -0700 > > Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue > > > > A PASID might be still used on ARM after it is freed in __mmput(). > > Is it really just ARM? Actually it should happen on X86 as well. I will remove "on ARM" in the changelog. > > > process: > > open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm > > exit(); > > exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1 > > exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure > > > > To avoid the use-after-free issue, free the PASID after no device uses it, > > i.e. after all devices are unbound from the mm. > > > > sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count. > > __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID > > in __mmdrop() guarantees the PASID is safely freed only after no device > > is bound to the mm. > > Does this changelog work for everyone? > > == > > tl;dr: The PASID is being freed too early. It needs to stay around > until after device drivers that might be using it have had a chance to > clear it out of the hardware. > Do you want me to change the changlog to add both this paragraph and the following paragraph? > -- > > As a reminder: > > mmget() /mmput() refcount the mm's address space > mmgrab()/mmdrop() refcount the mm itself > > The PASID is currently tied to the life of the mm's address space and > freed in __mmput(). This makes logical sense because the PASID can't be > used once the address space is gone. > > But, this misses an important point: even after the address space is > gone, the PASID will still be programmed into a device. Device drivers > might, for instance, still need to flush operations that are outstanding > and need to use that PASID. They do this at ->release() time. > > Device drivers hold a reference on the mm itself and drop it at > ->release() time. But, the device driver holds a reference mm itself, > not the address space. The address space (and the PASID) is long gone > by the time the driver tries to clean up. This is effectively a > use-after-free bug on the PASID. > > To fix this, move the PASID free operation from __mmput() to __mmdrop(). > This ensures that the device drivers' existing mmgrab() keeps the PASID > allocated until they drop their mm reference. > Thank you very much! -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/25/22 21:20, Fenghua Yu wrote: >>From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001 > From: Fenghua Yu > Date: Fri, 15 Apr 2022 00:51:33 -0700 > Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue > > A PASID might be still used on ARM after it is freed in __mmput(). Is it really just ARM? > process: > open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm > exit(); > exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1 > exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure > > To avoid the use-after-free issue, free the PASID after no device uses it, > i.e. after all devices are unbound from the mm. > > sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count. > __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID > in __mmdrop() guarantees the PASID is safely freed only after no device > is bound to the mm. Does this changelog work for everyone? == tl;dr: The PASID is being freed too early. It needs to stay around until after device drivers that might be using it have had a chance to clear it out of the hardware. -- As a reminder: mmget() /mmput() refcount the mm's address space mmgrab()/mmdrop() refcount the mm itself The PASID is currently tied to the life of the mm's address space and freed in __mmput(). This makes logical sense because the PASID can't be used once the address space is gone. But, this misses an important point: even after the address space is gone, the PASID will still be programmed into a device. Device drivers might, for instance, still need to flush operations that are outstanding and need to use that PASID. They do this at ->release() time. Device drivers hold a reference on the mm itself and drop it at ->release() time. But, the device driver holds a reference mm itself, not the address space. The address space (and the PASID) is long gone by the time the driver tries to clean up. This is effectively a use-after-free bug on the PASID. To fix this, move the PASID free operation from __mmput() to __mmdrop(). This ensures that the device drivers' existing mmgrab() keeps the PASID allocated until they drop their mm reference. > kernel/fork.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/fork.c b/kernel/fork.c > index 9796897560ab..35a3beff140b 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm) > mmu_notifier_subscriptions_destroy(mm); > check_mm(mm); > put_user_ns(mm->user_ns); > + mm_pasid_drop(mm); > free_mm(mm); > } > EXPORT_SYMBOL_GPL(__mmdrop); > @@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm) > } > if (mm->binfmt) > module_put(mm->binfmt->module); > - mm_pasid_drop(mm); > mmdrop(mm); > } > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Wed, Apr 27, 2022 at 05:54:57PM -0700, Fenghua Yu wrote: > Hi, Dave and Jean, > > On Tue, Apr 26, 2022 at 01:04:45PM +0800, Zhangfei Gao wrote: > > > > > > On 2022/4/26 下午12:20, Fenghua Yu wrote: > > > Hi, Jean and Zhangfei, > > > > > > On Mon, Apr 25, 2022 at 05:13:02PM +0100, Jean-Philippe Brucker wrote: > > > > Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For > > > > Arm > > > > we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is > > > > also part of Lu's rework [1]. > > > Is this a right fix for the issue? Could you please test it on ARM? > > > I don't have an ARM machine. > > > > > > Thanks. > > > > > > -Fenghua > > > > > > From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001 > > > From: Fenghua Yu > > > Date: Fri, 15 Apr 2022 00:51:33 -0700 > > > Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue > > > > > > A PASID might be still used on ARM after it is freed in __mmput(). > > > > > > process: > > > open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm > > > exit(); > > > exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1 > > > exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure > > > > > > To avoid the use-after-free issue, free the PASID after no device uses it, > > > i.e. after all devices are unbound from the mm. > > > > > > sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count. > > > __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID > > > in __mmdrop() guarantees the PASID is safely freed only after no device > > > is bound to the mm. > > > > > > Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation > > > and free it on mm exit") > > > > > > Reported-by: Zhangfei Gao > > > Suggested-by: Jean-Philippe Brucker > > > Suggested-by: Jacob Pan > > > Signed-off-by: Fenghua Yu > > Thanks for the fix. > > > > Tested-by: Zhangfei Gao > > > > > > > --- > > > kernel/fork.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > > index 9796897560ab..35a3beff140b 100644 > > > --- a/kernel/fork.c > > > +++ b/kernel/fork.c > > > @@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm) > > > mmu_notifier_subscriptions_destroy(mm); > > > check_mm(mm); > > > put_user_ns(mm->user_ns); > > > + mm_pasid_drop(mm); > > > free_mm(mm); > > > } > > > EXPORT_SYMBOL_GPL(__mmdrop); > > > @@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm) > > > } > > > if (mm->binfmt) > > > module_put(mm->binfmt->module); > > > - mm_pasid_drop(mm); > > > mmdrop(mm); > > > } > > > > Is this patch a good fix? Will you help push the fix into upstream? Yes, I think it's the right thing to do for now. Could you resend it separately so it gets visibility from the maintainers? Reviewed-by: Jean-Philippe Brucker ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Tue, Apr 26, 2022 at 04:31:57PM -0700, Dave Hansen wrote: > On 4/26/22 09:48, Jean-Philippe Brucker wrote: > > On Tue, Apr 26, 2022 at 08:27:00AM -0700, Dave Hansen wrote: > >> On 4/25/22 09:40, Jean-Philippe Brucker wrote: > >>> The problem is that we'd have to request the device driver to stop DMA > >>> before we can destroy the context and free the PASID. We did consider > >>> doing this in the release() MMU notifier, but there were concerns about > >>> blocking mmput() for too long (for example > >>> https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/ > >>> though I think there was a more recent discussion). We also need to drain > >>> the PRI and fault queues to get rid of all references to that PASID. > >> Is the concern truly about blocking mmput() itself? Or, is it about > >> releasing the resources associated with the mm? > > The latter I think, this one was about releasing pages as fast as possible > > if the process is picked by the OOM killer. > > We're tying the PASID to the life of the mm itself, not the mm's address > space. That means the PASID should be tied to > mmgrab()/mmdrop()/mm->mm_count. > > The address space is what the OOM killer is after. That gets refcounted > with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the > page freeing done in __mmput()->exit_mmap(). > > Also, all the VMAs should be gone after exit_mmap(). So, even if > vma->vm_file was holding a reference to a device driver, that reference > should be gone by the time __mmdrop() is actually freeing the PASID. I agree with all that. The concern was about tearing down the PASID in the IOMMU and device from the release() MMU notifier, which would happen in exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And since the IOMMU drivers need to hold mm->mm_count anyway between bind() and unbind(), I think Fenghua's fix works. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Dave and Jean, On Tue, Apr 26, 2022 at 01:04:45PM +0800, Zhangfei Gao wrote: > > > On 2022/4/26 下午12:20, Fenghua Yu wrote: > > Hi, Jean and Zhangfei, > > > > On Mon, Apr 25, 2022 at 05:13:02PM +0100, Jean-Philippe Brucker wrote: > > > Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm > > > we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is > > > also part of Lu's rework [1]. > > Is this a right fix for the issue? Could you please test it on ARM? > > I don't have an ARM machine. > > > > Thanks. > > > > -Fenghua > > > > From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001 > > From: Fenghua Yu > > Date: Fri, 15 Apr 2022 00:51:33 -0700 > > Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue > > > > A PASID might be still used on ARM after it is freed in __mmput(). > > > > process: > > open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm > > exit(); > > exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1 > > exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure > > > > To avoid the use-after-free issue, free the PASID after no device uses it, > > i.e. after all devices are unbound from the mm. > > > > sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count. > > __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID > > in __mmdrop() guarantees the PASID is safely freed only after no device > > is bound to the mm. > > > > Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation > > and free it on mm exit") > > > > Reported-by: Zhangfei Gao > > Suggested-by: Jean-Philippe Brucker > > Suggested-by: Jacob Pan > > Signed-off-by: Fenghua Yu > Thanks for the fix. > > Tested-by: Zhangfei Gao > > > > --- > > kernel/fork.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 9796897560ab..35a3beff140b 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm) > > mmu_notifier_subscriptions_destroy(mm); > > check_mm(mm); > > put_user_ns(mm->user_ns); > > + mm_pasid_drop(mm); > > free_mm(mm); > > } > > EXPORT_SYMBOL_GPL(__mmdrop); > > @@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm) > > } > > if (mm->binfmt) > > module_put(mm->binfmt->module); > > - mm_pasid_drop(mm); > > mmdrop(mm); > > } > Is this patch a good fix? Will you help push the fix into upstream? Thank you very much! -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/26/22 09:48, Jean-Philippe Brucker wrote: > On Tue, Apr 26, 2022 at 08:27:00AM -0700, Dave Hansen wrote: >> On 4/25/22 09:40, Jean-Philippe Brucker wrote: >>> The problem is that we'd have to request the device driver to stop DMA >>> before we can destroy the context and free the PASID. We did consider >>> doing this in the release() MMU notifier, but there were concerns about >>> blocking mmput() for too long (for example >>> https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/ >>> though I think there was a more recent discussion). We also need to drain >>> the PRI and fault queues to get rid of all references to that PASID. >> Is the concern truly about blocking mmput() itself? Or, is it about >> releasing the resources associated with the mm? > The latter I think, this one was about releasing pages as fast as possible > if the process is picked by the OOM killer. We're tying the PASID to the life of the mm itself, not the mm's address space. That means the PASID should be tied to mmgrab()/mmdrop()/mm->mm_count. The address space is what the OOM killer is after. That gets refcounted with mmget()/mmput()/mm->mm_users. The OOM killer is satiated by the page freeing done in __mmput()->exit_mmap(). Also, all the VMAs should be gone after exit_mmap(). So, even if vma->vm_file was holding a reference to a device driver, that reference should be gone by the time __mmdrop() is actually freeing the PASID. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Tue, Apr 26, 2022 at 08:27:00AM -0700, Dave Hansen wrote: > On 4/25/22 09:40, Jean-Philippe Brucker wrote: > > The problem is that we'd have to request the device driver to stop DMA > > before we can destroy the context and free the PASID. We did consider > > doing this in the release() MMU notifier, but there were concerns about > > blocking mmput() for too long (for example > > https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/ > > though I think there was a more recent discussion). We also need to drain > > the PRI and fault queues to get rid of all references to that PASID. > > Is the concern truly about blocking mmput() itself? Or, is it about > releasing the resources associated with the mm? The latter I think, this one was about releasing pages as fast as possible if the process is picked by the OOM killer. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/25/22 09:40, Jean-Philippe Brucker wrote: > The problem is that we'd have to request the device driver to stop DMA > before we can destroy the context and free the PASID. We did consider > doing this in the release() MMU notifier, but there were concerns about > blocking mmput() for too long (for example > https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/ > though I think there was a more recent discussion). We also need to drain > the PRI and fault queues to get rid of all references to that PASID. Is the concern truly about blocking mmput() itself? Or, is it about releasing the resources associated with the mm? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/26 下午12:36, Fenghua Yu wrote: On Tue, Apr 26, 2022 at 12:28:00PM +0800, Zhangfei Gao wrote: Hi, Jean On 2022/4/26 上午12:13, Jean-Philippe Brucker wrote: Hi Jacob, On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote: Hi Jean-Philippe, On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker wrote: On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote: On 4/25/22 06:53, Jean-Philippe Brucker wrote: On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote: On 5.17 fops_release is called automatically, as well as iommu_sva_unbind_device. On 5.18-rc1. fops_release is not called, have to manually call close(fd) Right that's weird Looks it is caused by the fix patch, via mmget, which may add refcount of fd. Yes indirectly I think: when the process mmaps the queue, mmap_region() takes a reference to the uacce fd. That reference is released either by explicit close() or munmap(), or by exit_mmap() (which is triggered by mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm dependency, so no mmget()/mmput() in bind()/unbind(). I guess we should go back to refcounted PASIDs instead, to avoid freeing them until unbind(). Yeah, this is a bit gnarly for -rc4. Let's just make sure there's nothing else simple we can do. How does the IOMMU hardware know that all activity to a given PASID is finished? That activity should, today, be independent of an mm or a fd's lifetime. In the case of uacce, it's tied to the fd lifetime: opening an accelerator queue calls iommu_sva_bind_device(), which sets up the PASID context in the IOMMU. Closing the queue calls iommu_sva_unbind_device() which destroys the PASID context (after the device driver stopped all DMA for this PASID). For VT-d, it is essentially the same flow except managed by the individual drivers such as DSA. If free() happens before unbind(), we deactivate the PASIDs and suppress faults from the device. When the unbind finally comes, we finalize the PASID teardown. It seems we have a need for an intermediate state where PASID is "pending free"? Yes we do have that state, though I'm not sure we need to make it explicit in the ioasid allocator. Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is also part of Lu's rework [1]. Move mm_pasid_drop to __mmdrop looks workable. The nginx works since ioasid is not freed when master exit until nginx stop. The ioasid does not free immediately when fops_release->unbind finished. Instead, __mmdrop happens a bit lazy, which has no issue though I passed 1 times exit without unbind test, the pasid allocation is ok. Thanks diff --git a/kernel/fork.c b/kernel/fork.c index 9796897560ab..60f417f69367 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -792,6 +792,8 @@ void __mmdrop(struct mm_struct *mm) mmu_notifier_subscriptions_destroy(mm); check_mm(mm); put_user_ns(mm->user_ns); + mm_pasid_drop(mm); free_mm(mm); } EXPORT_SYMBOL_GPL(__mmdrop); @@ -1190,7 +1192,6 @@ static inline void __mmput(struct mm_struct *mm) } if (mm->binfmt) module_put(mm->binfmt->module); - mm_pasid_drop(mm); mmdrop(mm); } Thank you very much, Zhangfei! I just now sent out an identical patch. It works on X86 as well. So seems the patch is the right fix. Either you can send out the patch or I add your Signed-off-by? Either way is OK for me. Thanks Fenghua, It does not matter. Have added tested-by. I was in stress test for checking the pasid free, since it was freed lazily. Thank all for the help, a bit nervous, since it is rc4 now. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/26 下午12:20, Fenghua Yu wrote: Hi, Jean and Zhangfei, On Mon, Apr 25, 2022 at 05:13:02PM +0100, Jean-Philippe Brucker wrote: Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is also part of Lu's rework [1]. Is this a right fix for the issue? Could you please test it on ARM? I don't have an ARM machine. Thanks. -Fenghua From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Fri, 15 Apr 2022 00:51:33 -0700 Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue A PASID might be still used on ARM after it is freed in __mmput(). process: open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm exit(); exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1 exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure To avoid the use-after-free issue, free the PASID after no device uses it, i.e. after all devices are unbound from the mm. sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count. __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID in __mmdrop() guarantees the PASID is safely freed only after no device is bound to the mm. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Reported-by: Zhangfei Gao Suggested-by: Jean-Philippe Brucker Suggested-by: Jacob Pan Signed-off-by: Fenghua Yu Thanks for the fix. Tested-by: Zhangfei Gao --- kernel/fork.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/fork.c b/kernel/fork.c index 9796897560ab..35a3beff140b 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm) mmu_notifier_subscriptions_destroy(mm); check_mm(mm); put_user_ns(mm->user_ns); + mm_pasid_drop(mm); free_mm(mm); } EXPORT_SYMBOL_GPL(__mmdrop); @@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm) } if (mm->binfmt) module_put(mm->binfmt->module); - mm_pasid_drop(mm); mmdrop(mm); } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Tue, Apr 26, 2022 at 12:28:00PM +0800, Zhangfei Gao wrote: > Hi, Jean > > On 2022/4/26 上午12:13, Jean-Philippe Brucker wrote: > > Hi Jacob, > > > > On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote: > > > Hi Jean-Philippe, > > > > > > On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker > > > wrote: > > > > > > > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote: > > > > > On 4/25/22 06:53, Jean-Philippe Brucker wrote: > > > > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com > > > > > > wrote: > > > > > > > > > On 5.17 > > > > > > > > > fops_release is called automatically, as well as > > > > > > > > > iommu_sva_unbind_device. On 5.18-rc1. > > > > > > > > > fops_release is not called, have to manually call close(fd) > > > > > > > > Right that's weird > > > > > > > Looks it is caused by the fix patch, via mmget, which may add > > > > > > > refcount of fd. > > > > > > Yes indirectly I think: when the process mmaps the queue, > > > > > > mmap_region() takes a reference to the uacce fd. That reference is > > > > > > released either by explicit close() or munmap(), or by exit_mmap() > > > > > > (which is triggered by mmput()). Since there is an mm->fd > > > > > > dependency, > > > > > > we cannot add a fd->mm dependency, so no mmget()/mmput() in > > > > > > bind()/unbind(). > > > > > > > > > > > > I guess we should go back to refcounted PASIDs instead, to avoid > > > > > > freeing them until unbind(). > > > > > Yeah, this is a bit gnarly for -rc4. Let's just make sure there's > > > > > nothing else simple we can do. > > > > > > > > > > How does the IOMMU hardware know that all activity to a given PASID is > > > > > finished? That activity should, today, be independent of an mm or a > > > > > fd's lifetime. > > > > In the case of uacce, it's tied to the fd lifetime: opening an > > > > accelerator > > > > queue calls iommu_sva_bind_device(), which sets up the PASID context in > > > > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which > > > > destroys the PASID context (after the device driver stopped all DMA for > > > > this PASID). > > > > > > > For VT-d, it is essentially the same flow except managed by the individual > > > drivers such as DSA. > > > If free() happens before unbind(), we deactivate the PASIDs and suppress > > > faults from the device. When the unbind finally comes, we finalize the > > > PASID teardown. It seems we have a need for an intermediate state where > > > PASID is "pending free"? > > Yes we do have that state, though I'm not sure we need to make it explicit > > in the ioasid allocator. > > > > Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm > > we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is > > also part of Lu's rework [1]. > > Move mm_pasid_drop to __mmdrop looks workable. > > The nginx works since ioasid is not freed when master exit until nginx stop. > > The ioasid does not free immediately when fops_release->unbind finished. > Instead, __mmdrop happens a bit lazy, which has no issue though > I passed 1 times exit without unbind test, the pasid allocation is ok. > > Thanks > > > diff --git a/kernel/fork.c b/kernel/fork.c > index 9796897560ab..60f417f69367 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -792,6 +792,8 @@ void __mmdrop(struct mm_struct *mm) > mmu_notifier_subscriptions_destroy(mm); > check_mm(mm); > put_user_ns(mm->user_ns); > + mm_pasid_drop(mm); > free_mm(mm); > } > EXPORT_SYMBOL_GPL(__mmdrop); > @@ -1190,7 +1192,6 @@ static inline void __mmput(struct mm_struct *mm) > } > if (mm->binfmt) > module_put(mm->binfmt->module); > - mm_pasid_drop(mm); > mmdrop(mm); > } Thank you very much, Zhangfei! I just now sent out an identical patch. It works on X86 as well. So seems the patch is the right fix. Either you can send out the patch or I add your Signed-off-by? Either way is OK for me. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jean On 2022/4/26 上午12:13, Jean-Philippe Brucker wrote: Hi Jacob, On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote: Hi Jean-Philippe, On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker wrote: On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote: On 4/25/22 06:53, Jean-Philippe Brucker wrote: On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote: On 5.17 fops_release is called automatically, as well as iommu_sva_unbind_device. On 5.18-rc1. fops_release is not called, have to manually call close(fd) Right that's weird Looks it is caused by the fix patch, via mmget, which may add refcount of fd. Yes indirectly I think: when the process mmaps the queue, mmap_region() takes a reference to the uacce fd. That reference is released either by explicit close() or munmap(), or by exit_mmap() (which is triggered by mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm dependency, so no mmget()/mmput() in bind()/unbind(). I guess we should go back to refcounted PASIDs instead, to avoid freeing them until unbind(). Yeah, this is a bit gnarly for -rc4. Let's just make sure there's nothing else simple we can do. How does the IOMMU hardware know that all activity to a given PASID is finished? That activity should, today, be independent of an mm or a fd's lifetime. In the case of uacce, it's tied to the fd lifetime: opening an accelerator queue calls iommu_sva_bind_device(), which sets up the PASID context in the IOMMU. Closing the queue calls iommu_sva_unbind_device() which destroys the PASID context (after the device driver stopped all DMA for this PASID). For VT-d, it is essentially the same flow except managed by the individual drivers such as DSA. If free() happens before unbind(), we deactivate the PASIDs and suppress faults from the device. When the unbind finally comes, we finalize the PASID teardown. It seems we have a need for an intermediate state where PASID is "pending free"? Yes we do have that state, though I'm not sure we need to make it explicit in the ioasid allocator. Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is also part of Lu's rework [1]. Move mm_pasid_drop to __mmdrop looks workable. The nginx works since ioasid is not freed when master exit until nginx stop. The ioasid does not free immediately when fops_release->unbind finished. Instead, __mmdrop happens a bit lazy, which has no issue though I passed 1 times exit without unbind test, the pasid allocation is ok. Thanks diff --git a/kernel/fork.c b/kernel/fork.c index 9796897560ab..60f417f69367 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -792,6 +792,8 @@ void __mmdrop(struct mm_struct *mm) mmu_notifier_subscriptions_destroy(mm); check_mm(mm); put_user_ns(mm->user_ns); + mm_pasid_drop(mm); free_mm(mm); } EXPORT_SYMBOL_GPL(__mmdrop); @@ -1190,7 +1192,6 @@ static inline void __mmput(struct mm_struct *mm) } if (mm->binfmt) module_put(mm->binfmt->module); - mm_pasid_drop(mm); mmdrop(mm); } Thanks, Jean [1] https://lore.kernel.org/linux-iommu/20220421052121.3464100-9-baolu...@linux.intel.com/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jean and Zhangfei, On Mon, Apr 25, 2022 at 05:13:02PM +0100, Jean-Philippe Brucker wrote: > Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm > we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is > also part of Lu's rework [1]. Is this a right fix for the issue? Could you please test it on ARM? I don't have an ARM machine. Thanks. -Fenghua >From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Fri, 15 Apr 2022 00:51:33 -0700 Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue A PASID might be still used on ARM after it is freed in __mmput(). process: open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm exit(); exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1 exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure To avoid the use-after-free issue, free the PASID after no device uses it, i.e. after all devices are unbound from the mm. sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count. __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID in __mmdrop() guarantees the PASID is safely freed only after no device is bound to the mm. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Reported-by: Zhangfei Gao Suggested-by: Jean-Philippe Brucker Suggested-by: Jacob Pan Signed-off-by: Fenghua Yu --- kernel/fork.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/fork.c b/kernel/fork.c index 9796897560ab..35a3beff140b 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm) mmu_notifier_subscriptions_destroy(mm); check_mm(mm); put_user_ns(mm->user_ns); + mm_pasid_drop(mm); free_mm(mm); } EXPORT_SYMBOL_GPL(__mmdrop); @@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm) } if (mm->binfmt) module_put(mm->binfmt->module); - mm_pasid_drop(mm); mmdrop(mm); } -- 2.32.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi Jean-Philippe, On Mon, 25 Apr 2022 17:13:02 +0100, Jean-Philippe Brucker wrote: > Hi Jacob, > > On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote: > > Hi Jean-Philippe, > > > > On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker > > wrote: > > > > > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote: > > > > On 4/25/22 06:53, Jean-Philippe Brucker wrote: > > > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com > > > > > wrote: > > > > On 5.17 > > > > fops_release is called automatically, as well as > > > > iommu_sva_unbind_device. On 5.18-rc1. > > > > fops_release is not called, have to manually call close(fd) > > > > >>> Right that's weird > > > > >> Looks it is caused by the fix patch, via mmget, which may add > > > > >> refcount of fd. > > > > > Yes indirectly I think: when the process mmaps the queue, > > > > > mmap_region() takes a reference to the uacce fd. That reference is > > > > > released either by explicit close() or munmap(), or by exit_mmap() > > > > > (which is triggered by mmput()). Since there is an mm->fd > > > > > dependency, we cannot add a fd->mm dependency, so no > > > > > mmget()/mmput() in bind()/unbind(). > > > > > > > > > > I guess we should go back to refcounted PASIDs instead, to avoid > > > > > freeing them until unbind(). > > > > > > > > Yeah, this is a bit gnarly for -rc4. Let's just make sure there's > > > > nothing else simple we can do. > > > > > > > > How does the IOMMU hardware know that all activity to a given PASID > > > > is finished? That activity should, today, be independent of an mm > > > > or a fd's lifetime. > > > > > > In the case of uacce, it's tied to the fd lifetime: opening an > > > accelerator queue calls iommu_sva_bind_device(), which sets up the > > > PASID context in the IOMMU. Closing the queue calls > > > iommu_sva_unbind_device() which destroys the PASID context (after the > > > device driver stopped all DMA for this PASID). > > > > > For VT-d, it is essentially the same flow except managed by the > > individual drivers such as DSA. > > If free() happens before unbind(), we deactivate the PASIDs and suppress > > faults from the device. When the unbind finally comes, we finalize the > > PASID teardown. It seems we have a need for an intermediate state where > > PASID is "pending free"? > > Yes we do have that state, though I'm not sure we need to make it explicit > in the ioasid allocator. > IMHO, making it explicit would fail ioasid_get() on a "pending free" PASID. Making free a one-way trip and prevent further complications. > Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm > we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is > also part of Lu's rework [1]. > Yes, I would agree. IIRC, Fenghua's early patch was doing pasid drop in mmdrop. Maybe I missed something. > Thanks, > Jean > > [1] > https://lore.kernel.org/linux-iommu/20220421052121.3464100-9-baolu...@linux.intel.com/ Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Mon, Apr 25, 2022 at 08:55:46AM -0700, Dave Hansen wrote: > On 4/25/22 07:26, Jean-Philippe Brucker wrote: > >> > >> How does the IOMMU hardware know that all activity to a given PASID is > >> finished? That activity should, today, be independent of an mm or a > >> fd's lifetime. > > In the case of uacce, it's tied to the fd lifetime: opening an accelerator > > queue calls iommu_sva_bind_device(), which sets up the PASID context in > > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which > > destroys the PASID context (after the device driver stopped all DMA for > > this PASID). > > Could this PASID context destruction move from being "fd-based" to > happening under mm_pasid_drop()? Logically, it seems like that should > work because mm_pasid_drop() happens after exit_mmap() where the VMAs > (which hold references to 'struct file' via vma->vm_file) are torn down. The problem is that we'd have to request the device driver to stop DMA before we can destroy the context and free the PASID. We did consider doing this in the release() MMU notifier, but there were concerns about blocking mmput() for too long (for example https://lore.kernel.org/linux-iommu/4d68da96-0ad5-b412-5987-2f7a6aa79...@amd.com/ though I think there was a more recent discussion). We also need to drain the PRI and fault queues to get rid of all references to that PASID. At the moment we disable (but not destroy) the PASID context in release(), so when the process gets killed pending DMA transactions are silently ignored. Then the device driver informs us through unbind() that no DMA is active anymore and we can finish cleaning up, then reuse the PASID. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi Jacob, On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote: > Hi Jean-Philippe, > > On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker > wrote: > > > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote: > > > On 4/25/22 06:53, Jean-Philippe Brucker wrote: > > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com > > > > wrote: > > > On 5.17 > > > fops_release is called automatically, as well as > > > iommu_sva_unbind_device. On 5.18-rc1. > > > fops_release is not called, have to manually call close(fd) > > > >>> Right that's weird > > > >> Looks it is caused by the fix patch, via mmget, which may add > > > >> refcount of fd. > > > > Yes indirectly I think: when the process mmaps the queue, > > > > mmap_region() takes a reference to the uacce fd. That reference is > > > > released either by explicit close() or munmap(), or by exit_mmap() > > > > (which is triggered by mmput()). Since there is an mm->fd dependency, > > > > we cannot add a fd->mm dependency, so no mmget()/mmput() in > > > > bind()/unbind(). > > > > > > > > I guess we should go back to refcounted PASIDs instead, to avoid > > > > freeing them until unbind(). > > > > > > Yeah, this is a bit gnarly for -rc4. Let's just make sure there's > > > nothing else simple we can do. > > > > > > How does the IOMMU hardware know that all activity to a given PASID is > > > finished? That activity should, today, be independent of an mm or a > > > fd's lifetime. > > > > In the case of uacce, it's tied to the fd lifetime: opening an accelerator > > queue calls iommu_sva_bind_device(), which sets up the PASID context in > > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which > > destroys the PASID context (after the device driver stopped all DMA for > > this PASID). > > > For VT-d, it is essentially the same flow except managed by the individual > drivers such as DSA. > If free() happens before unbind(), we deactivate the PASIDs and suppress > faults from the device. When the unbind finally comes, we finalize the > PASID teardown. It seems we have a need for an intermediate state where > PASID is "pending free"? Yes we do have that state, though I'm not sure we need to make it explicit in the ioasid allocator. Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()? For Arm we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is also part of Lu's rework [1]. Thanks, Jean [1] https://lore.kernel.org/linux-iommu/20220421052121.3464100-9-baolu...@linux.intel.com/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/25/22 07:26, Jean-Philippe Brucker wrote: >> >> How does the IOMMU hardware know that all activity to a given PASID is >> finished? That activity should, today, be independent of an mm or a >> fd's lifetime. > In the case of uacce, it's tied to the fd lifetime: opening an accelerator > queue calls iommu_sva_bind_device(), which sets up the PASID context in > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which > destroys the PASID context (after the device driver stopped all DMA for > this PASID). Could this PASID context destruction move from being "fd-based" to happening under mm_pasid_drop()? Logically, it seems like that should work because mm_pasid_drop() happens after exit_mmap() where the VMAs (which hold references to 'struct file' via vma->vm_file) are torn down. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi Jean-Philippe, On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker wrote: > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote: > > On 4/25/22 06:53, Jean-Philippe Brucker wrote: > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com > > > wrote: > > On 5.17 > > fops_release is called automatically, as well as > > iommu_sva_unbind_device. On 5.18-rc1. > > fops_release is not called, have to manually call close(fd) > > >>> Right that's weird > > >> Looks it is caused by the fix patch, via mmget, which may add > > >> refcount of fd. > > > Yes indirectly I think: when the process mmaps the queue, > > > mmap_region() takes a reference to the uacce fd. That reference is > > > released either by explicit close() or munmap(), or by exit_mmap() > > > (which is triggered by mmput()). Since there is an mm->fd dependency, > > > we cannot add a fd->mm dependency, so no mmget()/mmput() in > > > bind()/unbind(). > > > > > > I guess we should go back to refcounted PASIDs instead, to avoid > > > freeing them until unbind(). > > > > Yeah, this is a bit gnarly for -rc4. Let's just make sure there's > > nothing else simple we can do. > > > > How does the IOMMU hardware know that all activity to a given PASID is > > finished? That activity should, today, be independent of an mm or a > > fd's lifetime. > > In the case of uacce, it's tied to the fd lifetime: opening an accelerator > queue calls iommu_sva_bind_device(), which sets up the PASID context in > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which > destroys the PASID context (after the device driver stopped all DMA for > this PASID). > For VT-d, it is essentially the same flow except managed by the individual drivers such as DSA. If free() happens before unbind(), we deactivate the PASIDs and suppress faults from the device. When the unbind finally comes, we finalize the PASID teardown. It seems we have a need for an intermediate state where PASID is "pending free"? > Thanks, > Jean > ___ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote: > On 4/25/22 06:53, Jean-Philippe Brucker wrote: > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote: > On 5.17 > fops_release is called automatically, as well as iommu_sva_unbind_device. > On 5.18-rc1. > fops_release is not called, have to manually call close(fd) > >>> Right that's weird > >> Looks it is caused by the fix patch, via mmget, which may add refcount of > >> fd. > > Yes indirectly I think: when the process mmaps the queue, mmap_region() > > takes a reference to the uacce fd. That reference is released either by > > explicit close() or munmap(), or by exit_mmap() (which is triggered by > > mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm > > dependency, so no mmget()/mmput() in bind()/unbind(). > > > > I guess we should go back to refcounted PASIDs instead, to avoid freeing > > them until unbind(). > > Yeah, this is a bit gnarly for -rc4. Let's just make sure there's > nothing else simple we can do. > > How does the IOMMU hardware know that all activity to a given PASID is > finished? That activity should, today, be independent of an mm or a > fd's lifetime. In the case of uacce, it's tied to the fd lifetime: opening an accelerator queue calls iommu_sva_bind_device(), which sets up the PASID context in the IOMMU. Closing the queue calls iommu_sva_unbind_device() which destroys the PASID context (after the device driver stopped all DMA for this PASID). Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/25/22 06:53, Jean-Philippe Brucker wrote: > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote: On 5.17 fops_release is called automatically, as well as iommu_sva_unbind_device. On 5.18-rc1. fops_release is not called, have to manually call close(fd) >>> Right that's weird >> Looks it is caused by the fix patch, via mmget, which may add refcount of >> fd. > Yes indirectly I think: when the process mmaps the queue, mmap_region() > takes a reference to the uacce fd. That reference is released either by > explicit close() or munmap(), or by exit_mmap() (which is triggered by > mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm > dependency, so no mmget()/mmput() in bind()/unbind(). > > I guess we should go back to refcounted PASIDs instead, to avoid freeing > them until unbind(). Yeah, this is a bit gnarly for -rc4. Let's just make sure there's nothing else simple we can do. How does the IOMMU hardware know that all activity to a given PASID is finished? That activity should, today, be independent of an mm or a fd's lifetime. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com wrote: > > > On 5.17 > > > fops_release is called automatically, as well as iommu_sva_unbind_device. > > > On 5.18-rc1. > > > fops_release is not called, have to manually call close(fd) > > Right that's weird > Looks it is caused by the fix patch, via mmget, which may add refcount of > fd. Yes indirectly I think: when the process mmaps the queue, mmap_region() takes a reference to the uacce fd. That reference is released either by explicit close() or munmap(), or by exit_mmap() (which is triggered by mmput()). Since there is an mm->fd dependency, we cannot add a fd->mm dependency, so no mmget()/mmput() in bind()/unbind(). I guess we should go back to refcounted PASIDs instead, to avoid freeing them until unbind(). Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Dave On 2022/4/12 下午11:35, Dave Hansen wrote: On 4/12/22 08:10, Jean-Philippe Brucker wrote: I wonder if the Intel and ARM IOMMU code differ in the way they keep references to the mm, or if this affects Intel as well, but we just haven't tested the code enough. The Arm code was written expecting the PASID to be freed on unbind(), not mm exit. I missed the change of behavior, sorry (I thought your plan was to extend PASID lifetime, not shorten it?) but as is it seems very broken. For example in the iommu_sva_unbind_device(), we have arm_smmu_mmu_notifier_put() clearing the PASID table entry for "mm->pasid", which is going to end badly if the PASID has been cleared or reallocated. We can't clear the PASID entry in mm exit because at that point the device may still be issuing DMA for that PASID and we need to quiesce the entry rather than deactivate it. I think we ended up flipping some of this around on the Intel side. Instead of having to quiesce the device on mm exit, we don't let the mm exit until the device is done. When you program the pasid into the device, it's a lot like when you create a thread. We bump the reference count on the mm when we program the page table pointer into a CPU. We drop the thread's reference to the mm when the thread exits and will no longer be using the page tables. Same thing with pasids. We bump the refcount on the mm when the pasid is programmed into the device. Once the device is done with the mm, we drop the mm. Basically, instead of recounting the pasid itself, we just refcount the mm. This has issue, since refcount the mm will block fops_release to be called, where unbind may really happen. For example, user driver are ended unexpectedly, usually system will end all applications via close fd -> fops_release -> unbind may happen here. Now mmget is called, fops_release -> unbind has NO chance to be called, so ioasid can NOT be freed. Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jean & Fenghua The issue of "iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit" is mm_pasid_drop in __mmput only ioasid_free(mm->pasid), but keep all related resources, like cd tables etc. This introduces many strange issues. For example, if application does not use close fd but exit directly, mm_pasid_drop is called first, then fops_release->unbind is called, when mm->pasid=-1, arm_smmu_write_ctx_desc will get error. And in nginx case, pasid is freed when fork daemon, but cd table is still there, then next time, same pasid is allocated. So either __mmput free pasid as well as all related resources like cd table, or convert back to rely on the original unbind method to free pasid and resources together. Since SVA is main feature on ARM, which has been developed for years, It is already used in the product. It will be horrible if SVA is broke on 5.18. Any suggestion? Thanks On 2022/4/24 上午10:58, Zhangfei Gao wrote: On Sat, 23 Apr 2022 at 19:13, zhangfei@foxmail.com wrote: Hi, Jean On 2022/4/22 下午11:50, Jean-Philippe Brucker wrote: On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei@foxmail.com wrote: I'm trying to piece together what happens from the kernel point of view. * master process with mm A opens a queue fd through uacce, which calls iommu_sva_bind_device(dev, A) -> PASID 1 * master forks and exits. Child (daemon) gets mm B, inherits the queue fd. The device is still bound to mm A with PASID 1, since the queue fd is still open. We discussed this before, but I don't remember where we left off. The child can't use the queue because its mappings are not copied on fork(), and the queue is still bound to the parent mm A. The child either needs to open a new queue or take ownership of the old one with a new uacce ioctl. Yes, currently nginx aligned with the case. Child process (worker process) reopen uacce, Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID 1 Master process fork Child (daemon) and exit. Child (daemon) does not use PASID 1 any more, only fork and manage worker process. Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2 So it is expected. Yes, that's fine Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of tree? This operation should unbind from A before binding to B, no? Otherwise we leak PASID 1. In 5.16 PASID 1 from master is hold until nginx service stop. nginx start master: iommu_sva_alloc_pasid mm->pasid=1 // master process lynx https start: iommu_sva_alloc_pasid mm->pasid=2//worker process nginx stop: from fops_release iommu_sva_free_pasid mm->pasid=2 // worker process iommu_sva_free_pasid mm->pasid=1 // master process That's the expected behavior (master could close its fd before forking, in order to free things up earlier, but it's not mandatory) Currently we unbind in fops_release, so the ioasid allocated in master can only be freed when nginx stop, when all forked fd are closed. Have one silly question. kerne driver fops_open iommu_sva_bind_device fops_release iommu_sva_unbind_device application main() fd = open return; Application exit but not close(fd), is it expected fops_release will be called automatically by system? Yes, the application doesn't have to call close() explicitly, the file descriptor is closed automatically on exit. Note that the fd is copied on fork(), so it is only released once parent and all child processes exit. Yes, in case the application ended unexpected, like ctrl+c On 5.17 fops_release is called automatically, as well as iommu_sva_unbind_device. On 5.18-rc1. fops_release is not called, have to manually call close(fd) Right that's weird Looks it is caused by the fix patch, via mmget, which may add refcount of fd. Some experiments 1. 5.17, everything works well. 2. 5.17 + patchset of "iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit" Test application, exit without close uacce fd First time: fops_release can be called automatically. log: ioasid_alloc ioasid=1 iommu_sva_alloc_pasid pasid=1 iommu_sva_bind_device handle=263a2ee8 ioasid_free ioasid=1 uacce_fops_release q=55ca3cdf iommu_sva_unbind_device handle=263a2ee8 Second time: hardware reports error uacce_fops_open q=8e4d6f78 ioasid_alloc ioasid=1 iommu_sva_alloc_pasid pasid=1 iommu_sva_bind_device handle=cfd11788 // Haredware reports error hisi_sec2 :b6:00.0: qm_acc_do_task_timeout [error status=0x20] found hisi_sec2 :b6:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found hisi_sec2 :b6:00.0: sec_fsm_hbeat_rint [error status=0x20] found hisi_sec2 :b6:00.0: Controller resetting... hisi_sec2 :b6:00.0: QM mailbox operation timeout! hisi_sec2 :b6:00.0: Failed to dump sqc! hisi_sec2 :b6:00.0: Failed to drain out data for stopping! hisi_sec2 :b6:00.0: Bus lock! Please reset system. hisi_sec2 :b6:00.0: Controller
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Sat, 23 Apr 2022 at 19:13, zhangfei@foxmail.com wrote: > > Hi, Jean > > On 2022/4/22 下午11:50, Jean-Philippe Brucker wrote: > > On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei@foxmail.com wrote: > >>> I'm trying to piece together what happens from the kernel point of view. > >>> > >>> * master process with mm A opens a queue fd through uacce, which calls > >>> iommu_sva_bind_device(dev, A) -> PASID 1 > >>> > >>> * master forks and exits. Child (daemon) gets mm B, inherits the queue fd. > >>> The device is still bound to mm A with PASID 1, since the queue fd is > >>> still open. > >>> We discussed this before, but I don't remember where we left off. The > >>> child can't use the queue because its mappings are not copied on fork(), > >>> and the queue is still bound to the parent mm A. The child either needs to > >>> open a new queue or take ownership of the old one with a new uacce ioctl. > >> Yes, currently nginx aligned with the case. > >> Child process (worker process) reopen uacce, > >> > >> Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID > >> 1 > >> Master process fork Child (daemon) and exit. > >> > >> Child (daemon) does not use PASID 1 any more, only fork and manage worker > >> process. > >> Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2 > >> > >> So it is expected. > > Yes, that's fine > > > >>> Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of > >>> tree? This operation should unbind from A before binding to B, no? > >>> Otherwise we leak PASID 1. > >> In 5.16 PASID 1 from master is hold until nginx service stop. > >> nginx start > >> master: > >> iommu_sva_alloc_pasid mm->pasid=1 // master process > >> > >> lynx https start: > >> iommu_sva_alloc_pasid mm->pasid=2//worker process > >> > >> nginx stop: from fops_release > >> iommu_sva_free_pasid mm->pasid=2 // worker process > >> iommu_sva_free_pasid mm->pasid=1 // master process > > That's the expected behavior (master could close its fd before forking, in > > order to free things up earlier, but it's not mandatory) > Currently we unbind in fops_release, so the ioasid allocated in master > can only be freed when nginx stop, > when all forked fd are closed. > > > > >> Have one silly question. > >> > >> kerne driver > >> fops_open > >> iommu_sva_bind_device > >> > >> fops_release > >> iommu_sva_unbind_device > >> > >> application > >> main() > >> fd = open > >> return; > >> > >> Application exit but not close(fd), is it expected fops_release will be > >> called automatically by system? > > Yes, the application doesn't have to call close() explicitly, the file > > descriptor is closed automatically on exit. Note that the fd is copied on > > fork(), so it is only released once parent and all child processes exit. > Yes, in case the application ended unexpected, like ctrl+c > > > >> On 5.17 > >> fops_release is called automatically, as well as iommu_sva_unbind_device. > >> On 5.18-rc1. > >> fops_release is not called, have to manually call close(fd) > > Right that's weird > Looks it is caused by the fix patch, via mmget, which may add refcount > of fd. > > Some experiments > 1. 5.17, everything works well. > > 2. 5.17 + patchset of "iommu/sva: Assign a PASID to mm on PASID > allocation and free it on mm exit" > > Test application, exit without close uacce fd > First time: fops_release can be called automatically. > > log: > ioasid_alloc ioasid=1 > iommu_sva_alloc_pasid pasid=1 > iommu_sva_bind_device handle=263a2ee8 > ioasid_free ioasid=1 > uacce_fops_release q=55ca3cdf > iommu_sva_unbind_device handle=263a2ee8 > > Second time: hardware reports error > > uacce_fops_open q=8e4d6f78 > ioasid_alloc ioasid=1 > iommu_sva_alloc_pasid pasid=1 > iommu_sva_bind_device handle=cfd11788 > // Haredware reports error > hisi_sec2 :b6:00.0: qm_acc_do_task_timeout [error status=0x20] found > hisi_sec2 :b6:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] > found > hisi_sec2 :b6:00.0: sec_fsm_hbeat_rint [error status=0x20] found > hisi_sec2 :b6:00.0: Controller resetting... > hisi_sec2 :b6:00.0: QM mailbox operation timeout! > hisi_sec2 :b6:00.0: Failed to dump sqc! > hisi_sec2 :b6:00.0: Failed to drain out data for stopping! > hisi_sec2 :b6:00.0: Bus lock! Please reset system. > hisi_sec2 :b6:00.0: Controller reset failed (-110) > hisi_sec2 :b6:00.0: controller reset failed (-110) > > 3. Add the fix patch of using mmget in bind. > Test application, exit without close uacce fd > > fops_release can NOT be called automatically, looks mmget adds refcount > of fd. Test application, exit without closing fd. > >> kernel driver > >> fops_open > >> iommu_sva_bind_device > >> > >> fops_release > >> iommu_sva_unbind_device 1. 5.17 kernel, no mmget & mmput wd_release_queue no close Compress bz=512000 nb=1×10, speed=139.5 MB/s (±0.0% N=1) overall=122.9 MB/s (±0.0%) [ 16.052989]
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jean On 2022/4/22 下午11:50, Jean-Philippe Brucker wrote: On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei@foxmail.com wrote: I'm trying to piece together what happens from the kernel point of view. * master process with mm A opens a queue fd through uacce, which calls iommu_sva_bind_device(dev, A) -> PASID 1 * master forks and exits. Child (daemon) gets mm B, inherits the queue fd. The device is still bound to mm A with PASID 1, since the queue fd is still open. We discussed this before, but I don't remember where we left off. The child can't use the queue because its mappings are not copied on fork(), and the queue is still bound to the parent mm A. The child either needs to open a new queue or take ownership of the old one with a new uacce ioctl. Yes, currently nginx aligned with the case. Child process (worker process) reopen uacce, Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID 1 Master process fork Child (daemon) and exit. Child (daemon) does not use PASID 1 any more, only fork and manage worker process. Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2 So it is expected. Yes, that's fine Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of tree? This operation should unbind from A before binding to B, no? Otherwise we leak PASID 1. In 5.16 PASID 1 from master is hold until nginx service stop. nginx start master: iommu_sva_alloc_pasid mm->pasid=1 // master process lynx https start: iommu_sva_alloc_pasid mm->pasid=2 //worker process nginx stop: from fops_release iommu_sva_free_pasid mm->pasid=2 // worker process iommu_sva_free_pasid mm->pasid=1 // master process That's the expected behavior (master could close its fd before forking, in order to free things up earlier, but it's not mandatory) Currently we unbind in fops_release, so the ioasid allocated in master can only be freed when nginx stop, when all forked fd are closed. Have one silly question. kerne driver fops_open iommu_sva_bind_device fops_release iommu_sva_unbind_device application main() fd = open return; Application exit but not close(fd), is it expected fops_release will be called automatically by system? Yes, the application doesn't have to call close() explicitly, the file descriptor is closed automatically on exit. Note that the fd is copied on fork(), so it is only released once parent and all child processes exit. Yes, in case the application ended unexpected, like ctrl+c On 5.17 fops_release is called automatically, as well as iommu_sva_unbind_device. On 5.18-rc1. fops_release is not called, have to manually call close(fd) Right that's weird Looks it is caused by the fix patch, via mmget, which may add refcount of fd. Some experiments 1. 5.17, everything works well. 2. 5.17 + patchset of "iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit" Test application, exit without close uacce fd First time: fops_release can be called automatically. log: ioasid_alloc ioasid=1 iommu_sva_alloc_pasid pasid=1 iommu_sva_bind_device handle=263a2ee8 ioasid_free ioasid=1 uacce_fops_release q=55ca3cdf iommu_sva_unbind_device handle=263a2ee8 Second time: hardware reports error uacce_fops_open q=8e4d6f78 ioasid_alloc ioasid=1 iommu_sva_alloc_pasid pasid=1 iommu_sva_bind_device handle=cfd11788 // Haredware reports error hisi_sec2 :b6:00.0: qm_acc_do_task_timeout [error status=0x20] found hisi_sec2 :b6:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found hisi_sec2 :b6:00.0: sec_fsm_hbeat_rint [error status=0x20] found hisi_sec2 :b6:00.0: Controller resetting... hisi_sec2 :b6:00.0: QM mailbox operation timeout! hisi_sec2 :b6:00.0: Failed to dump sqc! hisi_sec2 :b6:00.0: Failed to drain out data for stopping! hisi_sec2 :b6:00.0: Bus lock! Please reset system. hisi_sec2 :b6:00.0: Controller reset failed (-110) hisi_sec2 :b6:00.0: controller reset failed (-110) 3. Add the fix patch of using mmget in bind. Test application, exit without close uacce fd fops_release can NOT be called automatically, looks mmget adds refcount of fd. So the fix method of using mmget blocks fops_release to be called once fd is closed without unbind. Since nginx may have a issue, it does not call close(fd) when nginx -s quit. And you're sure that none of the processes are still alive or in zombie state? Just to cover every possibility. It can also reproduced by a simple application exit without close(uacce_fd) Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Fri, Apr 22, 2022 at 09:15:01PM +0800, zhangfei@foxmail.com wrote: > > I'm trying to piece together what happens from the kernel point of view. > > > > * master process with mm A opens a queue fd through uacce, which calls > >iommu_sva_bind_device(dev, A) -> PASID 1 > > > > * master forks and exits. Child (daemon) gets mm B, inherits the queue fd. > >The device is still bound to mm A with PASID 1, since the queue fd is > >still open. > > > We discussed this before, but I don't remember where we left off. The > > child can't use the queue because its mappings are not copied on fork(), > > and the queue is still bound to the parent mm A. The child either needs to > > open a new queue or take ownership of the old one with a new uacce ioctl. > Yes, currently nginx aligned with the case. > Child process (worker process) reopen uacce, > > Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID > 1 > Master process fork Child (daemon) and exit. > > Child (daemon) does not use PASID 1 any more, only fork and manage worker > process. > Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2 > > So it is expected. Yes, that's fine > > Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of > > tree? This operation should unbind from A before binding to B, no? > > Otherwise we leak PASID 1. > In 5.16 PASID 1 from master is hold until nginx service stop. > nginx start > master: > iommu_sva_alloc_pasid mm->pasid=1 // master process > > lynx https start: > iommu_sva_alloc_pasid mm->pasid=2 //worker process > > nginx stop: from fops_release > iommu_sva_free_pasid mm->pasid=2 // worker process > iommu_sva_free_pasid mm->pasid=1 // master process That's the expected behavior (master could close its fd before forking, in order to free things up earlier, but it's not mandatory) > Have one silly question. > > kerne driver > fops_open > iommu_sva_bind_device > > fops_release > iommu_sva_unbind_device > > application > main() > fd = open > return; > > Application exit but not close(fd), is it expected fops_release will be > called automatically by system? Yes, the application doesn't have to call close() explicitly, the file descriptor is closed automatically on exit. Note that the fd is copied on fork(), so it is only released once parent and all child processes exit. > On 5.17 > fops_release is called automatically, as well as iommu_sva_unbind_device. > On 5.18-rc1. > fops_release is not called, have to manually call close(fd) Right that's weird > Since nginx may have a issue, it does not call close(fd) when nginx -s quit. And you're sure that none of the processes are still alive or in zombie state? Just to cover every possibility. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jean On 2022/4/22 下午6:11, Jean-Philippe Brucker wrote: On Fri, Apr 22, 2022 at 05:03:10PM +0800, zhangfei@foxmail.com wrote: [...] Have tested, still got some issue with our openssl-engine. 1. If openssl-engine does not register rsa, nginx works well. 2. If openssl-engine register rsa, nginx also works, but ioasid is not freed when nginx stop. IMPLEMENT_DYNAMIC_BIND_FN(bind_fn) bind_fn ENGINE_set_RSA(e, rsa_methods()) destroy_fn If ENGINE_set_RSA is set, nginx start and stop will NOT call destroy_fn. Even rsa_methods is almost new via RSA_meth_new. In 5.18-rcx, this caused ioasid not freed in nginx start and stop. In 5.17, though destroy_fn is not called, but ioasid is freed when nginx stop, so not noticed this issue before. 1. uacce_fops_release In 5.16 or 5.17 In fact, we aslo has the issue: openssl engine does not call destroy_fn -> close(uacce_fd) But system will automatically close all opened fd, so uacce_fops_release is also called and free ioasid. Have one experiment, not call close fd log: open uacce fd but no close [ 2583.471225] dump_backtrace+0x0/0x1a0 [ 2583.474876] show_stack+0x20/0x30 [ 2583.478178] dump_stack_lvl+0x8c/0xb8 [ 2583.481825] dump_stack+0x18/0x34 [ 2583.485126] uacce_fops_release+0x44/0xdc [ 2583.489117] __fput+0x78/0x240 [ 2583.492159] fput+0x18/0x28 [ 2583.495288] task_work_run+0x88/0x160 [ 2583.498936] do_notify_resume+0x214/0x490 [ 2583.502927] el0_svc+0x58/0x70 [ 2583.505968] el0t_64_sync_handler+0xb0/0xb8 [ 2583.510132] el0t_64_sync+0x1a0/0x1a4 [ 2583.582292] uacce_fops_release q=d6674128 In 5.18, since refcount was add. The opened uacce fd was not closed automatically by system So we see the issue. log: open uacce fd but no close [ 106.360140] uacce_fops_open q=ccc38d74 [ 106.364929] ioasid_alloc ioasid=1 [ 106.368585] iommu_sva_alloc_pasid pasid=1 [ 106.372943] iommu_sva_bind_device handle=6cca298a // ioasid is not free I'm trying to piece together what happens from the kernel point of view. * master process with mm A opens a queue fd through uacce, which calls iommu_sva_bind_device(dev, A) -> PASID 1 * master forks and exits. Child (daemon) gets mm B, inherits the queue fd. The device is still bound to mm A with PASID 1, since the queue fd is still open. We discussed this before, but I don't remember where we left off. The child can't use the queue because its mappings are not copied on fork(), and the queue is still bound to the parent mm A. The child either needs to open a new queue or take ownership of the old one with a new uacce ioctl. Yes, currently nginx aligned with the case. Child process (worker process) reopen uacce, Master process (do init) open uacce, iommu_sva_bind_device(dev, A) -> PASID 1 Master process fork Child (daemon) and exit. Child (daemon) does not use PASID 1 any more, only fork and manage worker process. Worker process reopen uacce, iommu_sva_bind_device(dev, B) PASID 2 So it is expected. Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of tree? This operation should unbind from A before binding to B, no? Otherwise we leak PASID 1. In 5.16 PASID 1 from master is hold until nginx service stop. nginx start master: iommu_sva_alloc_pasid mm->pasid=1 // master process lynx https start: iommu_sva_alloc_pasid mm->pasid=2 //worker process nginx stop: from fops_release iommu_sva_free_pasid mm->pasid=2 // worker process iommu_sva_free_pasid mm->pasid=1 // master process Have one silly question. kerne driver fops_open iommu_sva_bind_device fops_release iommu_sva_unbind_device application main() fd = open return; Application exit but not close(fd), is it expected fops_release will be called automatically by system? On 5.17 fops_release is called automatically, as well as iommu_sva_unbind_device. On 5.18-rc1. fops_release is not called, have to manually call close(fd) Since nginx may have a issue, it does not call close(fd) when nginx -s quit. Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Fri, Apr 22, 2022 at 05:03:10PM +0800, zhangfei@foxmail.com wrote: [...] > > Have tested, still got some issue with our openssl-engine. > > > > 1. If openssl-engine does not register rsa, nginx works well. > > > > 2. If openssl-engine register rsa, nginx also works, but ioasid is not > > freed when nginx stop. > > > > IMPLEMENT_DYNAMIC_BIND_FN(bind_fn) > > bind_fn > > ENGINE_set_RSA(e, rsa_methods()) > > > > destroy_fn > > > > If ENGINE_set_RSA is set, nginx start and stop will NOT call destroy_fn. > > Even rsa_methods is almost new via RSA_meth_new. > > > > In 5.18-rcx, this caused ioasid not freed in nginx start and stop. > > In 5.17, though destroy_fn is not called, but ioasid is freed when nginx > > stop, so not noticed this issue before. > > 1. uacce_fops_release > In 5.16 or 5.17 > In fact, we aslo has the issue: openssl engine does not call destroy_fn -> > close(uacce_fd) > But system will automatically close all opened fd, > so uacce_fops_release is also called and free ioasid. > > Have one experiment, not call close fd > > log: open uacce fd but no close > [ 2583.471225] dump_backtrace+0x0/0x1a0 > [ 2583.474876] show_stack+0x20/0x30 > [ 2583.478178] dump_stack_lvl+0x8c/0xb8 > [ 2583.481825] dump_stack+0x18/0x34 > [ 2583.485126] uacce_fops_release+0x44/0xdc > [ 2583.489117] __fput+0x78/0x240 > [ 2583.492159] fput+0x18/0x28 > [ 2583.495288] task_work_run+0x88/0x160 > [ 2583.498936] do_notify_resume+0x214/0x490 > [ 2583.502927] el0_svc+0x58/0x70 > [ 2583.505968] el0t_64_sync_handler+0xb0/0xb8 > [ 2583.510132] el0t_64_sync+0x1a0/0x1a4 > [ 2583.582292] uacce_fops_release q=d6674128 > > In 5.18, since refcount was add. > The opened uacce fd was not closed automatically by system > So we see the issue. > > log: open uacce fd but no close > [ 106.360140] uacce_fops_open q=ccc38d74 > [ 106.364929] ioasid_alloc ioasid=1 > [ 106.368585] iommu_sva_alloc_pasid pasid=1 > [ 106.372943] iommu_sva_bind_device handle=6cca298a > // ioasid is not free I'm trying to piece together what happens from the kernel point of view. * master process with mm A opens a queue fd through uacce, which calls iommu_sva_bind_device(dev, A) -> PASID 1 * master forks and exits. Child (daemon) gets mm B, inherits the queue fd. The device is still bound to mm A with PASID 1, since the queue fd is still open. We discussed this before, but I don't remember where we left off. The child can't use the queue because its mappings are not copied on fork(), and the queue is still bound to the parent mm A. The child either needs to open a new queue or take ownership of the old one with a new uacce ioctl. Is that the "IMPLEMENT_DYNAMIC_BIND_FN()" you mention, something out of tree? This operation should unbind from A before binding to B, no? Otherwise we leak PASID 1. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jean On 2022/4/21 下午2:47, zhangfei@foxmail.com wrote: On 2022/4/21 上午12:45, Jean-Philippe Brucker wrote: Hi, On Fri, Apr 15, 2022 at 02:51:08AM -0700, Fenghua Yu wrote: From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Fri, 15 Apr 2022 00:51:33 -0700 Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue A PASID might be still used even though it is freed on mm exit. process A: sva_bind(); ioasid_alloc() = N; // Get PASID N for the mm fork(): // spawn process B exit(); ioasid_free(N); process B: device uses PASID N -> failure sva_unbind(); Dave Hansen suggests to take a refcount on the mm whenever binding the PASID to a device and drop the refcount on unbinding. The mm won't be dropped if the PASID is still bound to it. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Reported-by: Zhangfei Gao Suggested-by: Dave Hansen" Signed-off-by: Fenghua Yu --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++ drivers/iommu/intel/svm.c | 4 2 files changed, 10 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..3fcb842a0df0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -7,6 +7,7 @@ #include #include #include +#include #include "arm-smmu-v3.h" #include "../../iommu-sva-lib.h" @@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) mutex_lock(_lock); handle = __arm_smmu_sva_bind(dev, mm); + /* Take an mm refcount on a successful bind. */ + if (!IS_ERR(handle)) + mmget(mm); mutex_unlock(_lock); return handle; } @@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) struct arm_smmu_bond *bond = sva_to_bond(handle); mutex_lock(_lock); + /* Drop an mm refcount. */ + mmput(bond->mm); I do like the idea because it will simplify the driver. We can't call mmput() here, though, because it may call the release() MMU notifier which will try to grab sva_lock, already held. I also found another use-after-free in arm_smmu_free_shared_cd(), where we call arm64_mm_context_put() when the mm could already be freed. There used to be an mmgrab() preventing this but it went away during a rewrite. To fix both we could just move mmput() at the end of unbind() but I'd rather do a proper cleanup removing the release() notifier right away. Zhangfei, could you try the patch below? Thanks, Jean --- 8< --- From 4e09c0d71dfb35fc90915bd1e36545027fbf8a03 Mon Sep 17 00:00:00 2001 From: Jean-Philippe Brucker Date: Wed, 20 Apr 2022 10:19:24 +0100 Subject: [PATCH] iommu/arm-smmu-v3-sva: Fix PASID and mm use-after-free issues Commit 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") frees the PASID earlier than what the SMMUv3 driver expects. At the moment the SMMU driver handles mm exit in the release() MMU notifier by quiescing the context descriptor. The context descriptor is only made invalid in unbind(), after the device driver ensured the PASID is not used anymore. Releasing the PASID on mm exit may cause it to be reallocated while it is still used by the context descriptor. There is another use-after-free, present since the beginning, where we call arm64_mm_context_put() without a guarantee that mm_count is held. Dave Hansen suggests to grab mm_users whenever binding the mm to a device and drop it on unbinding. With that we can fix both issues and simplify the driver by removing the release() notifier. Fixes: 32784a9562fb ("iommu/arm-smmu-v3: Implement iommu_sva_bind/unbind()") Reported-by: Zhangfei Gao Suggested-by: Dave Hansen Signed-off-by: Fenghua Yu Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 49 +-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 +- 3 files changed, 15 insertions(+), 49 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index cd48590ada30..d50d215d946c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -735,7 +735,6 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock; -extern struct arm_smmu_ctx_desc quiet_cd; int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, struct arm_smmu_ctx_desc *cd); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..f9dff0f6cdd4 100644 ---
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/21 上午12:45, Jean-Philippe Brucker wrote: Hi, On Fri, Apr 15, 2022 at 02:51:08AM -0700, Fenghua Yu wrote: From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Fri, 15 Apr 2022 00:51:33 -0700 Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue A PASID might be still used even though it is freed on mm exit. process A: sva_bind(); ioasid_alloc() = N; // Get PASID N for the mm fork(): // spawn process B exit(); ioasid_free(N); process B: device uses PASID N -> failure sva_unbind(); Dave Hansen suggests to take a refcount on the mm whenever binding the PASID to a device and drop the refcount on unbinding. The mm won't be dropped if the PASID is still bound to it. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Reported-by: Zhangfei Gao Suggested-by: Dave Hansen" Signed-off-by: Fenghua Yu --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++ drivers/iommu/intel/svm.c | 4 2 files changed, 10 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..3fcb842a0df0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -7,6 +7,7 @@ #include #include #include +#include #include "arm-smmu-v3.h" #include "../../iommu-sva-lib.h" @@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) mutex_lock(_lock); handle = __arm_smmu_sva_bind(dev, mm); + /* Take an mm refcount on a successful bind. */ + if (!IS_ERR(handle)) + mmget(mm); mutex_unlock(_lock); return handle; } @@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) struct arm_smmu_bond *bond = sva_to_bond(handle); mutex_lock(_lock); + /* Drop an mm refcount. */ + mmput(bond->mm); I do like the idea because it will simplify the driver. We can't call mmput() here, though, because it may call the release() MMU notifier which will try to grab sva_lock, already held. I also found another use-after-free in arm_smmu_free_shared_cd(), where we call arm64_mm_context_put() when the mm could already be freed. There used to be an mmgrab() preventing this but it went away during a rewrite. To fix both we could just move mmput() at the end of unbind() but I'd rather do a proper cleanup removing the release() notifier right away. Zhangfei, could you try the patch below? Thanks, Jean --- 8< --- From 4e09c0d71dfb35fc90915bd1e36545027fbf8a03 Mon Sep 17 00:00:00 2001 From: Jean-Philippe Brucker Date: Wed, 20 Apr 2022 10:19:24 +0100 Subject: [PATCH] iommu/arm-smmu-v3-sva: Fix PASID and mm use-after-free issues Commit 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") frees the PASID earlier than what the SMMUv3 driver expects. At the moment the SMMU driver handles mm exit in the release() MMU notifier by quiescing the context descriptor. The context descriptor is only made invalid in unbind(), after the device driver ensured the PASID is not used anymore. Releasing the PASID on mm exit may cause it to be reallocated while it is still used by the context descriptor. There is another use-after-free, present since the beginning, where we call arm64_mm_context_put() without a guarantee that mm_count is held. Dave Hansen suggests to grab mm_users whenever binding the mm to a device and drop it on unbinding. With that we can fix both issues and simplify the driver by removing the release() notifier. Fixes: 32784a9562fb ("iommu/arm-smmu-v3: Implement iommu_sva_bind/unbind()") Reported-by: Zhangfei Gao Suggested-by: Dave Hansen Signed-off-by: Fenghua Yu Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 49 +-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 +- 3 files changed, 15 insertions(+), 49 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index cd48590ada30..d50d215d946c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -735,7 +735,6 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock; -extern struct arm_smmu_ctx_desc quiet_cd; int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, struct arm_smmu_ctx_desc *cd); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..f9dff0f6cdd4 100644 ---
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, On Fri, Apr 15, 2022 at 02:51:08AM -0700, Fenghua Yu wrote: > From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001 > From: Fenghua Yu > Date: Fri, 15 Apr 2022 00:51:33 -0700 > Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue > > A PASID might be still used even though it is freed on mm exit. > > process A: > sva_bind(); > ioasid_alloc() = N; // Get PASID N for the mm > fork(): // spawn process B > exit(); > ioasid_free(N); > > process B: > device uses PASID N -> failure > sva_unbind(); > > Dave Hansen suggests to take a refcount on the mm whenever binding the > PASID to a device and drop the refcount on unbinding. The mm won't be > dropped if the PASID is still bound to it. > > Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and > free it on mm exit") > > Reported-by: Zhangfei Gao > Suggested-by: Dave Hansen" > Signed-off-by: Fenghua Yu > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++ > drivers/iommu/intel/svm.c | 4 > 2 files changed, 10 insertions(+) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > index 22ddd05bbdcd..3fcb842a0df0 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include "arm-smmu-v3.h" > #include "../../iommu-sva-lib.h" > @@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct > *mm, void *drvdata) > > mutex_lock(_lock); > handle = __arm_smmu_sva_bind(dev, mm); > + /* Take an mm refcount on a successful bind. */ > + if (!IS_ERR(handle)) > + mmget(mm); > mutex_unlock(_lock); > return handle; > } > @@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) > struct arm_smmu_bond *bond = sva_to_bond(handle); > > mutex_lock(_lock); > + /* Drop an mm refcount. */ > + mmput(bond->mm); I do like the idea because it will simplify the driver. We can't call mmput() here, though, because it may call the release() MMU notifier which will try to grab sva_lock, already held. I also found another use-after-free in arm_smmu_free_shared_cd(), where we call arm64_mm_context_put() when the mm could already be freed. There used to be an mmgrab() preventing this but it went away during a rewrite. To fix both we could just move mmput() at the end of unbind() but I'd rather do a proper cleanup removing the release() notifier right away. Zhangfei, could you try the patch below? Thanks, Jean --- 8< --- >From 4e09c0d71dfb35fc90915bd1e36545027fbf8a03 Mon Sep 17 00:00:00 2001 From: Jean-Philippe Brucker Date: Wed, 20 Apr 2022 10:19:24 +0100 Subject: [PATCH] iommu/arm-smmu-v3-sva: Fix PASID and mm use-after-free issues Commit 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") frees the PASID earlier than what the SMMUv3 driver expects. At the moment the SMMU driver handles mm exit in the release() MMU notifier by quiescing the context descriptor. The context descriptor is only made invalid in unbind(), after the device driver ensured the PASID is not used anymore. Releasing the PASID on mm exit may cause it to be reallocated while it is still used by the context descriptor. There is another use-after-free, present since the beginning, where we call arm64_mm_context_put() without a guarantee that mm_count is held. Dave Hansen suggests to grab mm_users whenever binding the mm to a device and drop it on unbinding. With that we can fix both issues and simplify the driver by removing the release() notifier. Fixes: 32784a9562fb ("iommu/arm-smmu-v3: Implement iommu_sva_bind/unbind()") Reported-by: Zhangfei Gao Suggested-by: Dave Hansen Signed-off-by: Fenghua Yu Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 - .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 49 +-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 +- 3 files changed, 15 insertions(+), 49 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index cd48590ada30..d50d215d946c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -735,7 +735,6 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) extern struct xarray arm_smmu_asid_xa; extern struct mutex arm_smmu_asid_lock; -extern struct arm_smmu_ctx_desc quiet_cd; int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, struct arm_smmu_ctx_desc *cd); diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..f9dff0f6cdd4 100644 ---
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/19 上午2:14, Jacob Pan wrote: Hi zhangfei@foxmail.com, On Sat, 16 Apr 2022 09:43:07 +0800, "zhangfei@foxmail.com" wrote: On 2022/4/16 上午5:00, Jacob Pan wrote: Hi zhangfei@foxmail.com, On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com" wrote: A PASID might be still used even though it is freed on mm exit. process A: sva_bind(); ioasid_alloc() = N; // Get PASID N for the mm fork(): // spawn process B exit(); ioasid_free(N); process B: device uses PASID N -> failure sva_unbind(); Dave Hansen suggests to take a refcount on the mm whenever binding the PASID to a device and drop the refcount on unbinding. The mm won't be dropped if the PASID is still bound to it. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Is process A's mm intended to be used by process B? Or you really should use PASID N on process B's mm? If the latter, it may work for a while until B changes mapping. It seems you are just extending the life of a defunct mm? From nginx code, the master process init resources, then fork daemon process to take over, then master process exit by itself. src/core/nginx.c main ngx_ssl_init(log); -> openssl engine -> bind_fn -> sva_bind() ngx_daemon(cycle->log) src/os/unix/ngx_daemon.c ngx_daemon(ngx_log_t *log) { int fd; switch (fork()) { case -1: ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed"); return NGX_ERROR; case 0: // the fork daemon process break; Does this child process call sva_bind() again to get another PASID? Or it will keep using the parent's PASID for DMA? The master process call sva_bind (PASID A), fork daemon process, then exit. The daemon process does not call sva_bind again, only for managing worker processes. The worker process will call sva_bind for new PASID (B), for real transaction. The worker process will free the PASID (B) when worker process exit like nginx quit. nginx -s quit does not free PASID A via callback, which may should be freed by signal handler in engine itself, still in check. Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi zhangfei@foxmail.com, On Sat, 16 Apr 2022 09:43:07 +0800, "zhangfei@foxmail.com" wrote: > On 2022/4/16 上午5:00, Jacob Pan wrote: > > Hi zhangfei@foxmail.com, > > > > On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com" > > wrote: > > > > A PASID might be still used even though it is freed on mm exit. > > > > process A: > > sva_bind(); > > ioasid_alloc() = N; // Get PASID N for the mm > > fork(): // spawn process B > > exit(); > > ioasid_free(N); > > > > process B: > > device uses PASID N -> failure > > sva_unbind(); > > > > Dave Hansen suggests to take a refcount on the mm whenever binding > > the PASID to a device and drop the refcount on unbinding. The mm > > won't be dropped if the PASID is still bound to it. > > > > Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID > > allocation and free it on mm exit") > > > > Is process A's mm intended to be used by process B? Or you really should > > use PASID N on process B's mm? If the latter, it may work for a while > > until B changes mapping. > > > > It seems you are just extending the life of a defunct mm? > > From nginx code, the master process init resources, then fork daemon > process to take over, > then master process exit by itself. > > src/core/nginx.c > main > ngx_ssl_init(log); -> openssl engine -> bind_fn -> sva_bind() > ngx_daemon(cycle->log) > > src/os/unix/ngx_daemon.c > ngx_daemon(ngx_log_t *log) > { > int fd; > > switch (fork()) { > case -1: > ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed"); > return NGX_ERROR; > > case 0: > // the fork daemon process > break; > Does this child process call sva_bind() again to get another PASID? Or it will keep using the parent's PASID for DMA? > default: > // master process directly exit, and release mm as well as ioasid > exit(0); > } > > // only daemon process > > Thanks > > > > > Thanks, > > > > Jacob > Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi Kevin, On Mon, 18 Apr 2022 06:34:19 +, "Tian, Kevin" wrote: > > From: Jacob Pan > > Sent: Saturday, April 16, 2022 5:00 AM > > > > Hi zhangfei@foxmail.com, > > > > On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com" > > wrote: > > > > > >>> A PASID might be still used even though it is freed on mm exit. > > > >>> > > > >>> process A: > > > >>> sva_bind(); > > > >>> ioasid_alloc() = N; // Get PASID N for the mm > > > >>> fork(): // spawn process B > > > >>> exit(); > > > >>> ioasid_free(N); > > > >>> > > > >>> process B: > > > >>> device uses PASID N -> failure > > > >>> sva_unbind(); > > > >>> > > > >>> Dave Hansen suggests to take a refcount on the mm whenever > > > >>> binding > > the > > > >>> PASID to a device and drop the refcount on unbinding. The mm > > > >>> won't > > be > > > >>> dropped if the PASID is still bound to it. > > > >>> > > > >>> Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID > > > >>> allocation and free it on mm exit") > > > >>> > > Is process A's mm intended to be used by process B? Or you really should > > use PASID N on process B's mm? If the latter, it may work for a while > > until B changes mapping. > > > > It seems you are just extending the life of a defunct mm? > > > > IMHO the intention is not to allow B to access A's mm. > > The problem is that PASID N is released on exit() of A and then > reallocated to B before iommu driver gets the chance to quiesce > the device and clear the PASID entry. According to the discussion > the quiesce operation must be done when driver calls unbind() > instead of in mm exit. In this case a failure is reported when > B tries to call bind() on PASID N due to an already-present entry. > > Dave's patch extending the life of A's mm until unbind() is called. > With it B either gets a different PASID before A's unbind() is > completed or same PASID N pointing to B's mm after A's unbind(). > As long as B gets a different PASID, that is fine. It seems PASID N has no use then. > Thanks > Kevin Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
> From: Jacob Pan > Sent: Saturday, April 16, 2022 5:00 AM > > Hi zhangfei@foxmail.com, > > On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com" > wrote: > > > >>> A PASID might be still used even though it is freed on mm exit. > > >>> > > >>> process A: > > >>> sva_bind(); > > >>> ioasid_alloc() = N; // Get PASID N for the mm > > >>> fork(): // spawn process B > > >>> exit(); > > >>> ioasid_free(N); > > >>> > > >>> process B: > > >>> device uses PASID N -> failure > > >>> sva_unbind(); > > >>> > > >>> Dave Hansen suggests to take a refcount on the mm whenever binding > the > > >>> PASID to a device and drop the refcount on unbinding. The mm won't > be > > >>> dropped if the PASID is still bound to it. > > >>> > > >>> Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID > > >>> allocation and free it on mm exit") > > >>> > Is process A's mm intended to be used by process B? Or you really should > use PASID N on process B's mm? If the latter, it may work for a while until > B changes mapping. > > It seems you are just extending the life of a defunct mm? > IMHO the intention is not to allow B to access A's mm. The problem is that PASID N is released on exit() of A and then reallocated to B before iommu driver gets the chance to quiesce the device and clear the PASID entry. According to the discussion the quiesce operation must be done when driver calls unbind() instead of in mm exit. In this case a failure is reported when B tries to call bind() on PASID N due to an already-present entry. Dave's patch extending the life of A's mm until unbind() is called. With it B either gets a different PASID before A's unbind() is completed or same PASID N pointing to B's mm after A's unbind(). Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/16 上午5:00, Jacob Pan wrote: Hi zhangfei@foxmail.com, On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com" wrote: A PASID might be still used even though it is freed on mm exit. process A: sva_bind(); ioasid_alloc() = N; // Get PASID N for the mm fork(): // spawn process B exit(); ioasid_free(N); process B: device uses PASID N -> failure sva_unbind(); Dave Hansen suggests to take a refcount on the mm whenever binding the PASID to a device and drop the refcount on unbinding. The mm won't be dropped if the PASID is still bound to it. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Is process A's mm intended to be used by process B? Or you really should use PASID N on process B's mm? If the latter, it may work for a while until B changes mapping. It seems you are just extending the life of a defunct mm? From nginx code, the master process init resources, then fork daemon process to take over, then master process exit by itself. src/core/nginx.c main ngx_ssl_init(log); -> openssl engine -> bind_fn -> sva_bind() ngx_daemon(cycle->log) src/os/unix/ngx_daemon.c ngx_daemon(ngx_log_t *log) { int fd; switch (fork()) { case -1: ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed"); return NGX_ERROR; case 0: // the fork daemon process break; default: // master process directly exit, and release mm as well as ioasid exit(0); } // only daemon process Thanks Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/15 下午8:37, Fenghua Yu wrote: Hi, Zhangfei, On Fri, Apr 15, 2022 at 07:52:03PM +0800, zhangfei@foxmail.com wrote: On 2022/4/15 下午6:50, Fenghua Yu wrote: Hi, Zhangfei, On Fri, Apr 15, 2022 at 06:14:09PM +0800, zhangfei@foxmail.com wrote: I download this patch from: https://lore.kernel.org/lkml/ylladl6umolll...@fyu1.sc.intel.com/raw git am to either v5.18-rc2 or the latest upstream without any issue. It is my copy paste issue. I have tested, nginx woks well. Great! Other than the following issue, Each time /sbin/nginx will alloc ioasid but not free. which I think it maybe nginx issue or the mis-usage, will ask there. Which nginx/openssl function is supposed to call kernel sva_unbind? I couldn't find the function in nginx tree. If nginx doesn't free ioasid, it will cause ioasid leak and memory leak. Yes In my case, sva_bind/unbind is from openssl_engine, bind_fn, not in nginx itself nginx will use openssl -> openssl engine. nginx: src/core/nginx.c main ngx_ssl_init(log); OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL) openssl_engine IMPLEMENT_DYNAMIC_BIND_FN(bind_fn) bind_fn sva_bind destroy sva_unbind But destroy seems not called in sbin/nginx -s quit. Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi zhangfei@foxmail.com, On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com" wrote: > >>> A PASID might be still used even though it is freed on mm exit. > >>> > >>> process A: > >>> sva_bind(); > >>> ioasid_alloc() = N; // Get PASID N for the mm > >>> fork(): // spawn process B > >>> exit(); > >>> ioasid_free(N); > >>> > >>> process B: > >>> device uses PASID N -> failure > >>> sva_unbind(); > >>> > >>> Dave Hansen suggests to take a refcount on the mm whenever binding the > >>> PASID to a device and drop the refcount on unbinding. The mm won't be > >>> dropped if the PASID is still bound to it. > >>> > >>> Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID > >>> allocation and free it on mm exit") > >>> Is process A's mm intended to be used by process B? Or you really should use PASID N on process B's mm? If the latter, it may work for a while until B changes mapping. It seems you are just extending the life of a defunct mm? Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Zhangfei, On Fri, Apr 15, 2022 at 07:52:03PM +0800, zhangfei@foxmail.com wrote: > > On my X86 machine, nginx doesn't trigger the kernel sva binding function > > to allocate ioasid. I tried pre- nstalled nginx/openssl and also tried my > > built > > a few versions of nginx/openssl. nginx does call OPENSSL_init_ssl() but > > doesn't go to the binding function. Don't know if it's my configuration > > issue. > > Maybe you can give me some advice? > I am using openssl engine, which use crypto driver and using sva via uacce. > nginx -> openssl -> openssl engine -> sva related. uacce is not used on X86. That's why I cannot test IOASID/PASID by nginx on X86. I only can test the RFC patch by other test tools via IDXD driver which uses PASID on X86. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Zhangfei, On Fri, Apr 15, 2022 at 07:52:03PM +0800, zhangfei@foxmail.com wrote: > > > On 2022/4/15 下午6:50, Fenghua Yu wrote: > > Hi, Zhangfei, > > > > On Fri, Apr 15, 2022 at 06:14:09PM +0800, zhangfei@foxmail.com wrote: > > I download this patch from: > > https://lore.kernel.org/lkml/ylladl6umolll...@fyu1.sc.intel.com/raw > > git am to either v5.18-rc2 or the latest upstream without any issue. > It is my copy paste issue. > > I have tested, nginx woks well. Great! > > Other than the following issue, > Each time /sbin/nginx will alloc ioasid but not free. > which I think it maybe nginx issue or the mis-usage, will ask there. Which nginx/openssl function is supposed to call kernel sva_unbind? I couldn't find the function in nginx tree. If nginx doesn't free ioasid, it will cause ioasid leak and memory leak. > > Tested-by: Zhangfei Gao Thank you for your testing! > > > > > > It should work for arm. > > > > > > In fact I have a similar patch at hand but pending since I found an issue. > > > > > > I start & stop nginx via this cmd. > > > //start > > > sudo sbin/nginx // this alloc an ioasid=1 > > > //stop > > > sudo sbin/nginx -s quit // this does not free ioasid=1, but still alloc > > > ioasid=2. > > > So ioasid will keep allocated but not freed if continue start/stop nginx, > > > though not impact the nginx function. > > > > > > stop nginx with -s quit still calls > > > src/core/nginx.c > > > main -> ngx_ssl_init -> openssl engine: bind_fn -> ... -> alloc asid > > > But openssl engine: ENGINE_free is not called > > > > > > Still in checking nginx code. > > > > > > Or do you test with nginx? > > On my X86 machine, nginx doesn't trigger the kernel sva binding function > > to allocate ioasid. I tried pre- nstalled nginx/openssl and also tried my > > built > > a few versions of nginx/openssl. nginx does call OPENSSL_init_ssl() but > > doesn't go to the binding function. Don't know if it's my configuration > > issue. > > Maybe you can give me some advice? > I am using openssl engine, which use crypto driver and using sva via uacce. > nginx -> openssl -> openssl engine -> sva related. I'll do more nginx experiments. > > > > > I test the patch with a few internal test tools and observe mmget()/mmput() > > works fine in various cases. > OK, thanks Thank you very much! -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/15 下午6:50, Fenghua Yu wrote: Hi, Zhangfei, On Fri, Apr 15, 2022 at 06:14:09PM +0800, zhangfei@foxmail.com wrote: On 2022/4/15 下午5:51, Fenghua Yu wrote: On Thu, Apr 14, 2022 at 06:08:09PM +0800, zhangfei@foxmail.com wrote: On 2022/4/12 下午11:35, zhangfei@foxmail.com wrote: Hi, Fenghua On 2022/4/12 下午9:41, Fenghua Yu wrote: From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Fri, 15 Apr 2022 00:51:33 -0700 Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue A PASID might be still used even though it is freed on mm exit. process A: sva_bind(); ioasid_alloc() = N; // Get PASID N for the mm fork(): // spawn process B exit(); ioasid_free(N); process B: device uses PASID N -> failure sva_unbind(); Dave Hansen suggests to take a refcount on the mm whenever binding the PASID to a device and drop the refcount on unbinding. The mm won't be dropped if the PASID is still bound to it. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Reported-by: Zhangfei Gao Suggested-by: Dave Hansen" Signed-off-by: Fenghua Yu --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++ drivers/iommu/intel/svm.c | 4 2 files changed, 10 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..3fcb842a0df0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -7,6 +7,7 @@ #include #include #include +#include #include "arm-smmu-v3.h" #include "../../iommu-sva-lib.h" @@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) mutex_lock(_lock); handle = __arm_smmu_sva_bind(dev, mm); + /* Take an mm refcount on a successful bind. */ + if (!IS_ERR(handle)) + mmget(mm); mutex_unlock(_lock); return handle; } @@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) struct arm_smmu_bond *bond = sva_to_bond(handle); mutex_lock(_lock); + /* Drop an mm refcount. */ + mmput(bond->mm); if (refcount_dec_and_test(>refs)) { list_del(>list); arm_smmu_mmu_notifier_put(bond->smmu_mn); diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 23a38763c1d1..345a0d5d7922 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -403,6 +403,8 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu, goto free_sdev; list_add_rcu(>list, >devs); + /* Take an mm refcount on binding mm. */ + mmget(mm); success: return >sva; @@ -465,6 +467,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 pasid) kfree(svm); } } + /* Drop an mm reference on unbinding mm. */ + mmput(mm); } out: return ret; This patch can not be applied on 5.18-rc2 for intel part. What error do you see? Could you please send to me errors? I download this patch from: https://lore.kernel.org/lkml/ylladl6umolll...@fyu1.sc.intel.com/raw git am to either v5.18-rc2 or the latest upstream without any issue. It is my copy paste issue. I have tested, nginx woks well. Other than the following issue, Each time /sbin/nginx will alloc ioasid but not free. which I think it maybe nginx issue or the mis-usage, will ask there. Tested-by: Zhangfei Gao It should work for arm. In fact I have a similar patch at hand but pending since I found an issue. I start & stop nginx via this cmd. //start sudo sbin/nginx // this alloc an ioasid=1 //stop sudo sbin/nginx -s quit // this does not free ioasid=1, but still alloc ioasid=2. So ioasid will keep allocated but not freed if continue start/stop nginx, though not impact the nginx function. stop nginx with -s quit still calls src/core/nginx.c main -> ngx_ssl_init -> openssl engine: bind_fn -> ... -> alloc asid But openssl engine: ENGINE_free is not called Still in checking nginx code. Or do you test with nginx? On my X86 machine, nginx doesn't trigger the kernel sva binding function to allocate ioasid. I tried pre- nstalled nginx/openssl and also tried my built a few versions of nginx/openssl. nginx does call OPENSSL_init_ssl() but doesn't go to the binding function. Don't know if it's my configuration issue. Maybe you can give me some advice? I am using openssl engine, which use crypto driver and using sva via uacce. nginx -> openssl -> openssl engine -> sva related. I test the patch with a few internal test tools and observe mmget()/mmput() works fine in various cases. OK, thanks ___
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Zhangfei, On Fri, Apr 15, 2022 at 06:14:09PM +0800, zhangfei@foxmail.com wrote: > > > On 2022/4/15 下午5:51, Fenghua Yu wrote: > > On Thu, Apr 14, 2022 at 06:08:09PM +0800, zhangfei@foxmail.com wrote: > > > On 2022/4/12 下午11:35, zhangfei@foxmail.com wrote: > > > > Hi, Fenghua > > > > > > > > On 2022/4/12 下午9:41, Fenghua Yu wrote: > > From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001 > > From: Fenghua Yu > > Date: Fri, 15 Apr 2022 00:51:33 -0700 > > Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue > > > > A PASID might be still used even though it is freed on mm exit. > > > > process A: > > sva_bind(); > > ioasid_alloc() = N; // Get PASID N for the mm > > fork(): // spawn process B > > exit(); > > ioasid_free(N); > > > > process B: > > device uses PASID N -> failure > > sva_unbind(); > > > > Dave Hansen suggests to take a refcount on the mm whenever binding the > > PASID to a device and drop the refcount on unbinding. The mm won't be > > dropped if the PASID is still bound to it. > > > > Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation > > and free it on mm exit") > > > > Reported-by: Zhangfei Gao > > Suggested-by: Dave Hansen" > > Signed-off-by: Fenghua Yu > > --- > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++ > > drivers/iommu/intel/svm.c | 4 > > 2 files changed, 10 insertions(+) > > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > > index 22ddd05bbdcd..3fcb842a0df0 100644 > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > > @@ -7,6 +7,7 @@ > > #include > > #include > > #include > > +#include > > #include "arm-smmu-v3.h" > > #include "../../iommu-sva-lib.h" > > @@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct > > *mm, void *drvdata) > > mutex_lock(_lock); > > handle = __arm_smmu_sva_bind(dev, mm); > > + /* Take an mm refcount on a successful bind. */ > > + if (!IS_ERR(handle)) > > + mmget(mm); > > mutex_unlock(_lock); > > return handle; > > } > > @@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) > > struct arm_smmu_bond *bond = sva_to_bond(handle); > > mutex_lock(_lock); > > + /* Drop an mm refcount. */ > > + mmput(bond->mm); > > if (refcount_dec_and_test(>refs)) { > > list_del(>list); > > arm_smmu_mmu_notifier_put(bond->smmu_mn); > > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c > > index 23a38763c1d1..345a0d5d7922 100644 > > --- a/drivers/iommu/intel/svm.c > > +++ b/drivers/iommu/intel/svm.c > > @@ -403,6 +403,8 @@ static struct iommu_sva *intel_svm_bind_mm(struct > > intel_iommu *iommu, > > goto free_sdev; > > list_add_rcu(>list, >devs); > > + /* Take an mm refcount on binding mm. */ > > + mmget(mm); > > success: > > return >sva; > > @@ -465,6 +467,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 > > pasid) > > kfree(svm); > > } > > } > > + /* Drop an mm reference on unbinding mm. */ > > + mmput(mm); > > } > > out: > > return ret; > This patch can not be applied on 5.18-rc2 for intel part. What error do you see? Could you please send to me errors? I download this patch from: https://lore.kernel.org/lkml/ylladl6umolll...@fyu1.sc.intel.com/raw git am to either v5.18-rc2 or the latest upstream without any issue. > It should work for arm. > > In fact I have a similar patch at hand but pending since I found an issue. > > I start & stop nginx via this cmd. > //start > sudo sbin/nginx // this alloc an ioasid=1 > //stop > sudo sbin/nginx -s quit // this does not free ioasid=1, but still alloc > ioasid=2. > So ioasid will keep allocated but not freed if continue start/stop nginx, > though not impact the nginx function. > > stop nginx with -s quit still calls > src/core/nginx.c > main -> ngx_ssl_init -> openssl engine: bind_fn -> ... -> alloc asid > But openssl engine: ENGINE_free is not called > > Still in checking nginx code. > > Or do you test with nginx? On my X86 machine, nginx doesn't trigger the kernel sva binding function to allocate ioasid. I tried pre- nstalled nginx/openssl and also tried my built a few versions of nginx/openssl. nginx does call OPENSSL_init_ssl() but doesn't go to the binding function. Don't know if it's my configuration issue. Maybe you can give me some advice? I test the patch with a few internal test tools and observe mmget()/mmput() works fine in various cases. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/15 下午5:51, Fenghua Yu wrote: On Thu, Apr 14, 2022 at 06:08:09PM +0800, zhangfei@foxmail.com wrote: On 2022/4/12 下午11:35, zhangfei@foxmail.com wrote: Hi, Fenghua On 2022/4/12 下午9:41, Fenghua Yu wrote: Hi, Zhangfei, On Tue, Apr 12, 2022 at 03:04:09PM +0800, zhangfei@foxmail.com wrote: On 2022/4/11 下午10:52, Dave Hansen wrote: On 4/11/22 07:44, zhangfei@foxmail.com wrote: On 2022/4/11 下午10:36, Dave Hansen wrote: On 4/11/22 07:20, zhangfei@foxmail.com wrote: Agree with Dave, I think user space should not be broken. Thanks Any plan about this regression? Currently I need this patch to workaround the issue. diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..2d74ac53d11c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -4,6 +4,7 @@ */ #include +#include #include #include #include @@ -363,6 +364,7 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) mutex_lock(_lock); handle = __arm_smmu_sva_bind(dev, mm); + mmget(mm); mutex_unlock(_lock); return handle; } @@ -377,6 +379,7 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) arm_smmu_mmu_notifier_put(bond->smmu_mn); kfree(bond); } + mmput(bond->mm); mutex_unlock(_lock); } Could you please review and/or test this patch? It's supposed to fix the PASID issue on both ARM and X86. From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Fri, 15 Apr 2022 00:51:33 -0700 Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue A PASID might be still used even though it is freed on mm exit. process A: sva_bind(); ioasid_alloc() = N; // Get PASID N for the mm fork(): // spawn process B exit(); ioasid_free(N); process B: device uses PASID N -> failure sva_unbind(); Dave Hansen suggests to take a refcount on the mm whenever binding the PASID to a device and drop the refcount on unbinding. The mm won't be dropped if the PASID is still bound to it. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Reported-by: Zhangfei Gao Suggested-by: Dave Hansen" Signed-off-by: Fenghua Yu --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++ drivers/iommu/intel/svm.c | 4 2 files changed, 10 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..3fcb842a0df0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -7,6 +7,7 @@ #include #include #include +#include #include "arm-smmu-v3.h" #include "../../iommu-sva-lib.h" @@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) mutex_lock(_lock); handle = __arm_smmu_sva_bind(dev, mm); + /* Take an mm refcount on a successful bind. */ + if (!IS_ERR(handle)) + mmget(mm); mutex_unlock(_lock); return handle; } @@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) struct arm_smmu_bond *bond = sva_to_bond(handle); mutex_lock(_lock); + /* Drop an mm refcount. */ + mmput(bond->mm); if (refcount_dec_and_test(>refs)) { list_del(>list); arm_smmu_mmu_notifier_put(bond->smmu_mn); diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 23a38763c1d1..345a0d5d7922 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -403,6 +403,8 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu, goto free_sdev; list_add_rcu(>list, >devs); + /* Take an mm refcount on binding mm. */ + mmget(mm); success: return >sva; @@ -465,6 +467,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 pasid) kfree(svm); } } + /* Drop an mm reference on unbinding mm. */ + mmput(mm); } out: return ret; This patch can not be applied on 5.18-rc2 for intel part. It should work for arm. In fact I have a similar patch at hand but pending since I found an issue. I start & stop nginx via this cmd. //start sudo sbin/nginx // this alloc an ioasid=1 //stop sudo sbin/nginx -s quit // this does not free ioasid=1, but still alloc ioasid=2. So ioasid will keep allocated but not freed if continue start/stop nginx, though not impact the nginx function. stop nginx with -s quit still calls src/core/nginx.c main -> ngx_ssl_init -> openssl engine: bind_fn -> ... -> alloc
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Dave, On Tue, Apr 12, 2022 at 07:39:10AM -0700, Dave Hansen wrote: > On 4/12/22 06:41, Fenghua Yu wrote: > >> master process quit, mmput -> mm_pasid_drop->ioasid_free > >> But this ignore driver's iommu_sva_unbind_device function, > >> iommu_sva_bind_device and iommu_sva_unbind_device are not pair, So driver > >> does not know ioasid is freed. > >> > >> Any suggestion? > > ioasid is per process or per mm. A daemon process shouldn't share the same > > ioasid with any other process with even its parent process. Its parent gets > > an ioasid and frees it on exit. The ioasid is gone and shouldn't be used > > by its child process. > > > > Each daemon process should call driver -> iommu_sva_bind_device -> > > ioasid_alloc > > to get its own ioasid/PASID. On daemon quit, the ioasid is freed. > > > > That means nqnix needs to be changed. > > Fenghua, please step back for a second and look at what you are saying. > Your patch caused userspace to break. Now, you're telling someone that > they need to go change that userspace to work around something that your > patch. How, exactly, are you suggesting that nginx could change to fix > this? What, specifically, was it doing with *fork()* that was wrong? > > It sounds to me like you're saying that it's OK to break userspace. You are right. The patch should not break userspace. I follow your suggestion to fix the issue by mmget() in binding and mmput() in unbinding. The RFC patch was sent out in another thread. Please review it. Thank you very much for your advice. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Thu, Apr 14, 2022 at 06:08:09PM +0800, zhangfei@foxmail.com wrote: > > On 2022/4/12 下午11:35, zhangfei@foxmail.com wrote: > > Hi, Fenghua > > > > On 2022/4/12 下午9:41, Fenghua Yu wrote: > > > Hi, Zhangfei, > > > > > > On Tue, Apr 12, 2022 at 03:04:09PM +0800, zhangfei@foxmail.com > > > wrote: > > > > > > > > On 2022/4/11 下午10:52, Dave Hansen wrote: > > > > > On 4/11/22 07:44, zhangfei@foxmail.com wrote: > > > > > > On 2022/4/11 下午10:36, Dave Hansen wrote: > > > > > > > On 4/11/22 07:20, zhangfei@foxmail.com wrote: > > Agree with Dave, I think user space should not be broken. > > > > Thanks > > Any plan about this regression? > Currently I need this patch to workaround the issue. > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > index 22ddd05bbdcd..2d74ac53d11c 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c > @@ -4,6 +4,7 @@ > */ > > #include > +#include > #include > #include > #include > @@ -363,6 +364,7 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct > *mm, void *drvdata) > > mutex_lock(_lock); > handle = __arm_smmu_sva_bind(dev, mm); > + mmget(mm); > mutex_unlock(_lock); > return handle; > } > @@ -377,6 +379,7 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) > arm_smmu_mmu_notifier_put(bond->smmu_mn); > kfree(bond); > } > + mmput(bond->mm); > mutex_unlock(_lock); > } Could you please review and/or test this patch? It's supposed to fix the PASID issue on both ARM and X86. From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Fri, 15 Apr 2022 00:51:33 -0700 Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue A PASID might be still used even though it is freed on mm exit. process A: sva_bind(); ioasid_alloc() = N; // Get PASID N for the mm fork(): // spawn process B exit(); ioasid_free(N); process B: device uses PASID N -> failure sva_unbind(); Dave Hansen suggests to take a refcount on the mm whenever binding the PASID to a device and drop the refcount on unbinding. The mm won't be dropped if the PASID is still bound to it. Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit") Reported-by: Zhangfei Gao Suggested-by: Dave Hansen" Signed-off-by: Fenghua Yu --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++ drivers/iommu/intel/svm.c | 4 2 files changed, 10 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..3fcb842a0df0 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -7,6 +7,7 @@ #include #include #include +#include #include "arm-smmu-v3.h" #include "../../iommu-sva-lib.h" @@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata) mutex_lock(_lock); handle = __arm_smmu_sva_bind(dev, mm); + /* Take an mm refcount on a successful bind. */ + if (!IS_ERR(handle)) + mmget(mm); mutex_unlock(_lock); return handle; } @@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) struct arm_smmu_bond *bond = sva_to_bond(handle); mutex_lock(_lock); + /* Drop an mm refcount. */ + mmput(bond->mm); if (refcount_dec_and_test(>refs)) { list_del(>list); arm_smmu_mmu_notifier_put(bond->smmu_mn); diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 23a38763c1d1..345a0d5d7922 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -403,6 +403,8 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu, goto free_sdev; list_add_rcu(>list, >devs); + /* Take an mm refcount on binding mm. */ + mmget(mm); success: return >sva; @@ -465,6 +467,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 pasid) kfree(svm); } } + /* Drop an mm reference on unbinding mm. */ + mmput(mm); } out: return ret; -- 2.32.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/12 下午11:35, zhangfei@foxmail.com wrote: Hi, Fenghua On 2022/4/12 下午9:41, Fenghua Yu wrote: Hi, Zhangfei, On Tue, Apr 12, 2022 at 03:04:09PM +0800, zhangfei@foxmail.com wrote: On 2022/4/11 下午10:52, Dave Hansen wrote: On 4/11/22 07:44, zhangfei@foxmail.com wrote: On 2022/4/11 下午10:36, Dave Hansen wrote: On 4/11/22 07:20, zhangfei@foxmail.com wrote: Is there nothing before this call trace? Usually there will be at least some warning text. I added dump_stack() in ioasid_free. Hold on a sec, though... What's the *problem* here? Did something break or are you just saying that something looks weird to _you_? After this, nginx is not working at all, and hardware reports error. Suppose the the master use the ioasid for init, but got freed. hardware reports: [ 152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error status=0x20] found [ 152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found [ 152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error status=0x20] found [ 152.755340] hisi_sec2 :76:00.0: Controller resetting... [ 152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout! [ 152.768198] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.773490] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.787468] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.792753] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.806730] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.812017] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.825992] hisi_sec2 :76:00.0: Failed to dump sqc! That would have been awfully handy information to have in an initial bug report. :) Is there a chance you could dump out that ioasid alloc *and* free information in ioasid_alloc/free()? This could be some kind of problem with the allocator, or with copying the ioasid at fork. The issue is nginx master process init resource, start daemon process, then master process quit and free ioasid. The daemon nginx process is not the original master process. master process: init resource driver -> iommu_sva_bind_device -> ioasid_alloc Which code in the master process/daemon calls driver->iommu_sva_unbind_device? Our calling sequence is nginx -> openssl -> openssl engine -> kernel driver The calling entrence should be ngx_ssl_init : OPENSSL_config(NULL); nginx: src/event/ngx_event_openssl.c ngx_ssl_init if (OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL) == 0) I add some print. /usr/local/nginx$ sudo sbin/nginx ngx_ssl_init pid=2361 bind_fn ngx_openssl_create_conf pid=2361 hisi sec init Kunpeng920! ngx_ssl_create pid=2361 ngx_ssl_certificates pid=2361 ngx_ssl_certificate pid=2361 uadk_e_wd_digest_init hisi sec init Kunpeng920! ngx_ssl_ciphers pid=2361 ngx_daemon pid=2361 fork daemon master pid=2361 will exit // here master process is exit fork return 0 pid=2364 // here daemon process started ngx_daemon fork ngx_pid=2364, ngx_parent=2361 $ ps -aux | grep nginx root 2364 0.0 0.0 31324 15380 ? Ssl 15:21 0:00 nginx: master process sbin/nginx nobody 2366 0.0 0.0 32304 16448 ? Sl 15:21 0:00 nginx: worker process linaro 2371 0.0 0.0 7696 2048 pts/0 S+ 15:22 0:00 grep --color=auto nginx nginx src/os/unix/ngx_daemon.c ngx_daemon(ngx_log_t *log) { int fd; switch (fork()) { case -1: ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed"); return NGX_ERROR; case 0: // here fork daemon process break; default: // master process directly exit, and release mm as well as ioasid exit(0); } // only daemon process ngx_parent = ngx_pid; ngx_pid = ngx_getpid(); Any suggestion? ioasid is per process or per mm. A daemon process shouldn't share the same ioasid with any other process with even its parent process. Its parent gets an ioasid and frees it on exit. The ioasid is gone and shouldn't be used by its child process. Each daemon process should call driver -> iommu_sva_bind_device -> ioasid_alloc to get its own ioasid/PASID. On daemon quit, the ioasid is freed. That means nqnix needs to be changed. Agree with Dave, I think user space should not be broken. Thanks Any plan about this regression? Currently I need this patch to workaround the issue. diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index 22ddd05bbdcd..2d74ac53d11c 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -4,6 +4,7 @@ */ #include
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi Dave, On 2022/4/12 23:35, Dave Hansen wrote: On 4/12/22 08:10, Jean-Philippe Brucker wrote: I wonder if the Intel and ARM IOMMU code differ in the way they keep references to the mm, or if this affects Intel as well, but we just haven't tested the code enough. The Arm code was written expecting the PASID to be freed on unbind(), not mm exit. I missed the change of behavior, sorry (I thought your plan was to extend PASID lifetime, not shorten it?) but as is it seems very broken. For example in the iommu_sva_unbind_device(), we have arm_smmu_mmu_notifier_put() clearing the PASID table entry for "mm->pasid", which is going to end badly if the PASID has been cleared or reallocated. We can't clear the PASID entry in mm exit because at that point the device may still be issuing DMA for that PASID and we need to quiesce the entry rather than deactivate it. I think we ended up flipping some of this around on the Intel side. Instead of having to quiesce the device on mm exit, we don't let the mm exit until the device is done. The Intel IOMMU code doesn't quiesce the device on mm exit. It only tears down the PASID entry so that the subsequent device accesses to mm is dropped silently. Just like ARM, Intel IOMMU code also expects that PASID should be freed and reused after device is done (i.e. after iommu_sva_unbind_device()) so that the PASID could be drained in both hardware and software before reusing it for other purpose. When you program the pasid into the device, it's a lot like when you create a thread. We bump the reference count on the mm when we program the page table pointer into a CPU. We drop the thread's reference to the mm when the thread exits and will no longer be using the page tables. Same thing with pasids. We bump the refcount on the mm when the pasid is programmed into the device. Once the device is done with the mm, we drop the mm. Basically, instead of recounting the pasid itself, we just refcount the mm. Above makes sense to me. It guarantees that the mm->pasid could only be freed and reused after the device is done. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Fenghua On 2022/4/12 下午9:41, Fenghua Yu wrote: Hi, Zhangfei, On Tue, Apr 12, 2022 at 03:04:09PM +0800, zhangfei@foxmail.com wrote: On 2022/4/11 下午10:52, Dave Hansen wrote: On 4/11/22 07:44, zhangfei@foxmail.com wrote: On 2022/4/11 下午10:36, Dave Hansen wrote: On 4/11/22 07:20, zhangfei@foxmail.com wrote: Is there nothing before this call trace? Usually there will be at least some warning text. I added dump_stack() in ioasid_free. Hold on a sec, though... What's the *problem* here? Did something break or are you just saying that something looks weird to _you_? After this, nginx is not working at all, and hardware reports error. Suppose the the master use the ioasid for init, but got freed. hardware reports: [ 152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error status=0x20] found [ 152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found [ 152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error status=0x20] found [ 152.755340] hisi_sec2 :76:00.0: Controller resetting... [ 152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout! [ 152.768198] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.773490] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.787468] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.792753] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.806730] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.812017] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.825992] hisi_sec2 :76:00.0: Failed to dump sqc! That would have been awfully handy information to have in an initial bug report. :) Is there a chance you could dump out that ioasid alloc *and* free information in ioasid_alloc/free()? This could be some kind of problem with the allocator, or with copying the ioasid at fork. The issue is nginx master process init resource, start daemon process, then master process quit and free ioasid. The daemon nginx process is not the original master process. master process: init resource driver -> iommu_sva_bind_device -> ioasid_alloc Which code in the master process/daemon calls driver->iommu_sva_unbind_device? Our calling sequence is nginx -> openssl -> openssl engine -> kernel driver The calling entrence should be ngx_ssl_init : OPENSSL_config(NULL); nginx: src/event/ngx_event_openssl.c ngx_ssl_init if (OPENSSL_init_ssl(OPENSSL_INIT_LOAD_CONFIG, NULL) == 0) I add some print. /usr/local/nginx$ sudo sbin/nginx ngx_ssl_init pid=2361 bind_fn ngx_openssl_create_conf pid=2361 hisi sec init Kunpeng920! ngx_ssl_create pid=2361 ngx_ssl_certificates pid=2361 ngx_ssl_certificate pid=2361 uadk_e_wd_digest_init hisi sec init Kunpeng920! ngx_ssl_ciphers pid=2361 ngx_daemon pid=2361 fork daemon master pid=2361 will exit // here master process is exit fork return 0 pid=2364 // here daemon process started ngx_daemon fork ngx_pid=2364, ngx_parent=2361 $ ps -aux | grep nginx root 2364 0.0 0.0 31324 15380 ? Ssl 15:21 0:00 nginx: master process sbin/nginx nobody 2366 0.0 0.0 32304 16448 ? Sl 15:21 0:00 nginx: worker process linaro 2371 0.0 0.0 7696 2048 pts/0 S+ 15:22 0:00 grep --color=auto nginx nginx src/os/unix/ngx_daemon.c ngx_daemon(ngx_log_t *log) { int fd; switch (fork()) { case -1: ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed"); return NGX_ERROR; case 0: // here fork daemon process break; default: // master process directly exit, and release mm as well as ioasid exit(0); } // only daemon process ngx_parent = ngx_pid; ngx_pid = ngx_getpid(); nginx : ngx_daemon fork daemon, without add mm's refcount. src/os/unix/ngx_daemon.c ngx_daemon(ngx_log_t *log) { int fd; switch (fork()) { case -1: ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed"); return NGX_ERROR; case 0: // here master process is quit directly and will be released. break; default: exit(0); } // here daemon process take control. ngx_parent = ngx_pid; ngx_pid = ngx_getpid(); fork.c copy_mm if (clone_flags & CLONE_VM) { mmget(oldmm); mm = oldmm; } else { mm = dup_mm(tsk, current->mm); // here daemon process handling without mmget. master process quit, mmput -> mm_pasid_drop->ioasid_free But this ignore driver's iommu_sva_unbind_device function, iommu_sva_bind_device and iommu_sva_unbind_device are not pair, So driver does not know ioasid is freed. Any suggestion?
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/12/22 08:10, Jean-Philippe Brucker wrote: >> I wonder if the Intel and ARM IOMMU code differ in the way they keep >> references to the mm, or if this affects Intel as well, but we just >> haven't tested the code enough. > The Arm code was written expecting the PASID to be freed on unbind(), not > mm exit. I missed the change of behavior, sorry (I thought your plan was > to extend PASID lifetime, not shorten it?) but as is it seems very broken. > For example in the iommu_sva_unbind_device(), we have > arm_smmu_mmu_notifier_put() clearing the PASID table entry for > "mm->pasid", which is going to end badly if the PASID has been cleared or > reallocated. We can't clear the PASID entry in mm exit because at that > point the device may still be issuing DMA for that PASID and we need to > quiesce the entry rather than deactivate it. I think we ended up flipping some of this around on the Intel side. Instead of having to quiesce the device on mm exit, we don't let the mm exit until the device is done. When you program the pasid into the device, it's a lot like when you create a thread. We bump the reference count on the mm when we program the page table pointer into a CPU. We drop the thread's reference to the mm when the thread exits and will no longer be using the page tables. Same thing with pasids. We bump the refcount on the mm when the pasid is programmed into the device. Once the device is done with the mm, we drop the mm. Basically, instead of recounting the pasid itself, we just refcount the mm. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, On Tue, Apr 12, 2022 at 07:36:21AM -0700, Dave Hansen wrote: > On 4/12/22 00:04, zhangfei@foxmail.com wrote: > > master process quit, mmput -> mm_pasid_drop->ioasid_free > > But this ignore driver's iommu_sva_unbind_device function, > > iommu_sva_bind_device and iommu_sva_unbind_device are not pair, So > > driver does not know ioasid is freed. > > > > Any suggestion? > > It sounds like you're saying that the device is still abound to the > PASID even though the mm has exited and freed the PASID. This is > essentially a use-after-free for the PASID. Right? > > The right thing to do here is to have the PASID code hold a reference on > the mm. The mm "owns" the PASID for its entire lifetime and if anything > needs the PASID to live longer, its only recourse for doing that is via > an mmget(). I _think_ mmget() is the right thing as opposed to mmgrab() > because the PASID users actually need the page tables to be around. > > This would still be nice to confirm with some traces of fork()/exit() > and the iommu_sva_{bind,unbind} and ioasid_{alloc,free} functions. > > I wonder if the Intel and ARM IOMMU code differ in the way they keep > references to the mm, or if this affects Intel as well, but we just > haven't tested the code enough. The Arm code was written expecting the PASID to be freed on unbind(), not mm exit. I missed the change of behavior, sorry (I thought your plan was to extend PASID lifetime, not shorten it?) but as is it seems very broken. For example in the iommu_sva_unbind_device(), we have arm_smmu_mmu_notifier_put() clearing the PASID table entry for "mm->pasid", which is going to end badly if the PASID has been cleared or reallocated. We can't clear the PASID entry in mm exit because at that point the device may still be issuing DMA for that PASID and we need to quiesce the entry rather than deactivate it. We can only deactivate it once the device driver has properly stopped the device, at which point it can call unbind(). There may be other issues but I can't check it thoroughly until next week. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/12/22 06:41, Fenghua Yu wrote: >> master process quit, mmput -> mm_pasid_drop->ioasid_free >> But this ignore driver's iommu_sva_unbind_device function, >> iommu_sva_bind_device and iommu_sva_unbind_device are not pair, So driver >> does not know ioasid is freed. >> >> Any suggestion? > ioasid is per process or per mm. A daemon process shouldn't share the same > ioasid with any other process with even its parent process. Its parent gets > an ioasid and frees it on exit. The ioasid is gone and shouldn't be used > by its child process. > > Each daemon process should call driver -> iommu_sva_bind_device -> > ioasid_alloc > to get its own ioasid/PASID. On daemon quit, the ioasid is freed. > > That means nqnix needs to be changed. Fenghua, please step back for a second and look at what you are saying. Your patch caused userspace to break. Now, you're telling someone that they need to go change that userspace to work around something that your patch. How, exactly, are you suggesting that nginx could change to fix this? What, specifically, was it doing with *fork()* that was wrong? It sounds to me like you're saying that it's OK to break userspace. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/12/22 00:04, zhangfei@foxmail.com wrote: > master process quit, mmput -> mm_pasid_drop->ioasid_free > But this ignore driver's iommu_sva_unbind_device function, > iommu_sva_bind_device and iommu_sva_unbind_device are not pair, So > driver does not know ioasid is freed. > > Any suggestion? It sounds like you're saying that the device is still abound to the PASID even though the mm has exited and freed the PASID. This is essentially a use-after-free for the PASID. Right? The right thing to do here is to have the PASID code hold a reference on the mm. The mm "owns" the PASID for its entire lifetime and if anything needs the PASID to live longer, its only recourse for doing that is via an mmget(). I _think_ mmget() is the right thing as opposed to mmgrab() because the PASID users actually need the page tables to be around. This would still be nice to confirm with some traces of fork()/exit() and the iommu_sva_{bind,unbind} and ioasid_{alloc,free} functions. I wonder if the Intel and ARM IOMMU code differ in the way they keep references to the mm, or if this affects Intel as well, but we just haven't tested the code enough. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Zhangfei, On Tue, Apr 12, 2022 at 03:04:09PM +0800, zhangfei@foxmail.com wrote: > > > On 2022/4/11 下午10:52, Dave Hansen wrote: > > On 4/11/22 07:44, zhangfei@foxmail.com wrote: > > > On 2022/4/11 下午10:36, Dave Hansen wrote: > > > > On 4/11/22 07:20, zhangfei@foxmail.com wrote: > > > > > > Is there nothing before this call trace? Usually there will be at > > > > > > least > > > > > > some warning text. > > > > > I added dump_stack() in ioasid_free. > > > > Hold on a sec, though... > > > > > > > > What's the *problem* here? Did something break or are you just saying > > > > that something looks weird to _you_? > > > After this, nginx is not working at all, and hardware reports error. > > > Suppose the the master use the ioasid for init, but got freed. > > > > > > hardware reports: > > > [ 152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error > > > status=0x20] found > > > [ 152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error > > > status=0x40] found > > > [ 152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error > > > status=0x20] found > > > [ 152.755340] hisi_sec2 :76:00.0: Controller resetting... > > > [ 152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout! > > > [ 152.768198] hisi_sec2 :76:00.0: Failed to dump sqc! > > > [ 152.773490] hisi_sec2 :76:00.0: Failed to drain out data for > > > stopping! > > > [ 152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start! > > > [ 152.787468] hisi_sec2 :76:00.0: Failed to dump sqc! > > > [ 152.792753] hisi_sec2 :76:00.0: Failed to drain out data for > > > stopping! > > > [ 152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start! > > > [ 152.806730] hisi_sec2 :76:00.0: Failed to dump sqc! > > > [ 152.812017] hisi_sec2 :76:00.0: Failed to drain out data for > > > stopping! > > > [ 152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start! > > > [ 152.825992] hisi_sec2 :76:00.0: Failed to dump sqc! > > That would have been awfully handy information to have in an initial bug > > report. :) > > Is there a chance you could dump out that ioasid alloc *and* free > > information in ioasid_alloc/free()? This could be some kind of problem > > with the allocator, or with copying the ioasid at fork. > The issue is nginx master process init resource, start daemon process, then > master process quit and free ioasid. > The daemon nginx process is not the original master process. > > master process: init resource > driver -> iommu_sva_bind_device -> ioasid_alloc Which code in the master process/daemon calls driver->iommu_sva_unbind_device? > > nginx : ngx_daemon > fork daemon, without add mm's refcount. > > src/os/unix/ngx_daemon.c > ngx_daemon(ngx_log_t *log) > { > int fd; > > switch (fork()) { > case -1: > ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed"); > return NGX_ERROR; > > case 0: // here master process is quit directly and will be > released. > break; > > default: > exit(0); > } > // here daemon process take control. > ngx_parent = ngx_pid; > ngx_pid = ngx_getpid(); > > > fork.c > copy_mm > if (clone_flags & CLONE_VM) { > mmget(oldmm); > mm = oldmm; > } else { > mm = dup_mm(tsk, current->mm); // here daemon process > handling without mmget. > > master process quit, mmput -> mm_pasid_drop->ioasid_free > But this ignore driver's iommu_sva_unbind_device function, > iommu_sva_bind_device and iommu_sva_unbind_device are not pair, So driver > does not know ioasid is freed. > > Any suggestion? ioasid is per process or per mm. A daemon process shouldn't share the same ioasid with any other process with even its parent process. Its parent gets an ioasid and frees it on exit. The ioasid is gone and shouldn't be used by its child process. Each daemon process should call driver -> iommu_sva_bind_device -> ioasid_alloc to get its own ioasid/PASID. On daemon quit, the ioasid is freed. That means nqnix needs to be changed. > Or can we still use the original ioasid refcount mechanism? > Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/11 下午10:52, Dave Hansen wrote: On 4/11/22 07:44, zhangfei@foxmail.com wrote: On 2022/4/11 下午10:36, Dave Hansen wrote: On 4/11/22 07:20, zhangfei@foxmail.com wrote: Is there nothing before this call trace? Usually there will be at least some warning text. I added dump_stack() in ioasid_free. Hold on a sec, though... What's the *problem* here? Did something break or are you just saying that something looks weird to _you_? After this, nginx is not working at all, and hardware reports error. Suppose the the master use the ioasid for init, but got freed. hardware reports: [ 152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error status=0x20] found [ 152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found [ 152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error status=0x20] found [ 152.755340] hisi_sec2 :76:00.0: Controller resetting... [ 152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout! [ 152.768198] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.773490] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.787468] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.792753] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.806730] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.812017] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.825992] hisi_sec2 :76:00.0: Failed to dump sqc! That would have been awfully handy information to have in an initial bug report. :) Is there a chance you could dump out that ioasid alloc *and* free information in ioasid_alloc/free()? This could be some kind of problem with the allocator, or with copying the ioasid at fork. The issue is nginx master process init resource, start daemon process, then master process quit and free ioasid. The daemon nginx process is not the original master process. master process: init resource driver -> iommu_sva_bind_device -> ioasid_alloc nginx : ngx_daemon fork daemon, without add mm's refcount. src/os/unix/ngx_daemon.c ngx_daemon(ngx_log_t *log) { int fd; switch (fork()) { case -1: ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed"); return NGX_ERROR; case 0: // here master process is quit directly and will be released. break; default: exit(0); } // here daemon process take control. ngx_parent = ngx_pid; ngx_pid = ngx_getpid(); fork.c copy_mm if (clone_flags & CLONE_VM) { mmget(oldmm); mm = oldmm; } else { mm = dup_mm(tsk, current->mm); // here daemon process handling without mmget. master process quit, mmput -> mm_pasid_drop->ioasid_free But this ignore driver's iommu_sva_unbind_device function, iommu_sva_bind_device and iommu_sva_unbind_device are not pair, So driver does not know ioasid is freed. Any suggestion? Or can we still use the original ioasid refcount mechanism? Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/11 下午10:52, Dave Hansen wrote: On 4/11/22 07:44, zhangfei@foxmail.com wrote: On 2022/4/11 下午10:36, Dave Hansen wrote: On 4/11/22 07:20, zhangfei@foxmail.com wrote: Is there nothing before this call trace? Usually there will be at least some warning text. I added dump_stack() in ioasid_free. Hold on a sec, though... What's the *problem* here? Did something break or are you just saying that something looks weird to _you_? After this, nginx is not working at all, and hardware reports error. Suppose the the master use the ioasid for init, but got freed. hardware reports: [ 152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error status=0x20] found [ 152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found [ 152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error status=0x20] found [ 152.755340] hisi_sec2 :76:00.0: Controller resetting... [ 152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout! [ 152.768198] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.773490] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.787468] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.792753] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.806730] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.812017] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.825992] hisi_sec2 :76:00.0: Failed to dump sqc! That would have been awfully handy information to have in an initial bug report. :) Is there a chance you could dump out that ioasid alloc *and* free information in ioasid_alloc/free()? This could be some kind of problem with the allocator, or with copying the ioasid at fork. Sure, will add some trace tomorrow. Thanks Dave ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/11/22 07:44, zhangfei@foxmail.com wrote: > On 2022/4/11 下午10:36, Dave Hansen wrote: >> On 4/11/22 07:20, zhangfei@foxmail.com wrote: Is there nothing before this call trace? Usually there will be at least some warning text. >>> I added dump_stack() in ioasid_free. >> Hold on a sec, though... >> >> What's the *problem* here? Did something break or are you just saying >> that something looks weird to _you_? > > After this, nginx is not working at all, and hardware reports error. > Suppose the the master use the ioasid for init, but got freed. > > hardware reports: > [ 152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error > status=0x20] found > [ 152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error > status=0x40] found > [ 152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error status=0x20] > found > [ 152.755340] hisi_sec2 :76:00.0: Controller resetting... > [ 152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout! > [ 152.768198] hisi_sec2 :76:00.0: Failed to dump sqc! > [ 152.773490] hisi_sec2 :76:00.0: Failed to drain out data for stopping! > [ 152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start! > [ 152.787468] hisi_sec2 :76:00.0: Failed to dump sqc! > [ 152.792753] hisi_sec2 :76:00.0: Failed to drain out data for stopping! > [ 152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start! > [ 152.806730] hisi_sec2 :76:00.0: Failed to dump sqc! > [ 152.812017] hisi_sec2 :76:00.0: Failed to drain out data for stopping! > [ 152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start! > [ 152.825992] hisi_sec2 :76:00.0: Failed to dump sqc! That would have been awfully handy information to have in an initial bug report. :) Is there a chance you could dump out that ioasid alloc *and* free information in ioasid_alloc/free()? This could be some kind of problem with the allocator, or with copying the ioasid at fork. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/11 下午10:36, Dave Hansen wrote: On 4/11/22 07:20, zhangfei@foxmail.com wrote: Is there nothing before this call trace? Usually there will be at least some warning text. I added dump_stack() in ioasid_free. Hold on a sec, though... What's the *problem* here? Did something break or are you just saying that something looks weird to _you_? After this, nginx is not working at all, and hardware reports error. Suppose the the master use the ioasid for init, but got freed. hardware reports: [ 152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error status=0x20] found [ 152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found [ 152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error status=0x20] found [ 152.755340] hisi_sec2 :76:00.0: Controller resetting... [ 152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout! [ 152.768198] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.773490] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.787468] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.792753] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.806730] hisi_sec2 :76:00.0: Failed to dump sqc! [ 152.812017] hisi_sec2 :76:00.0: Failed to drain out data for stopping! [ 152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start! [ 152.825992] hisi_sec2 :76:00.0: Failed to dump sqc! Thanks For instance, if we have: nginx: ioasid_alloc()==1 fork(); // spawn the daemon exit(); ioasid_free(1); and then a moment later: lynx: ioasid_alloc()==1 fork(); exit(); ioasid_free(1); There's no problem. The original nginx process with ioasid==1 exited. The fact that *some* process called nginx is still running doesn't mean that it still has a reference to asid==1. Are you sure the nginx process that allocated the ASID is the same process you see in ps? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2022/4/11 下午10:10, Dave Hansen wrote: On 4/11/22 07:00, Zhangfei Gao wrote: with this patchset, each time after sbin/nginx, ioasid is freed immediately. lynx test will alloc the same ioasid=1. That doesn't seem right. Isn't 'sbin/nginx' still running when lynx runs? How can they get the same ioasid? Yes, sbin/nginx is still running, root 3228 0.0 0.0 31316 13476 ? Ssl 12:50 0:00 nginx: master process sbin/nginx nobody 3230 0.0 0.0 32296 16456 ? Sl 12:50 0:00 nginx: worker process Since ioasid is freed, so lynx can get the same ioasid. This sounds like a refcounting problem, like that the ioasid wasn't properly refcounted as nginx forked into the background. Yes, in checking, thanks and this patchset removed the old refcount. To verify, hack comment mm_pasid_drop in __mmput will make the issue disappear. log: after sbin/nginx. [ 96.526730] Call trace: [ 96.526732] dump_backtrace+0xe4/0xf0 [ 96.526741] show_stack+0x20/0x70 [ 96.526744] dump_stack_lvl+0x8c/0xb8 [ 96.526751] dump_stack+0x18/0x34 [ 96.526754] ioasid_free+0xdc/0xfc [ 96.526757] mmput+0x138/0x160 [ 96.526760] do_exit+0x284/0x9d0 [ 96.526765] do_group_exit+0x3c/0xa8 [ 96.526767] __wake_up_parent+0x0/0x38 [ 96.526770] invoke_syscall+0x4c/0x110 [ 96.526775] el0_svc_common.constprop.0+0x68/0x128 [ 96.526778] do_el0_svc+0x2c/0x90 [ 96.526781] el0_svc+0x30/0x98 [ 96.526783] el0t_64_sync_handler+0xb0/0xb8 [ 96.526785] el0t_64_sync+0x18c/0x190 Is there nothing before this call trace? Usually there will be at least some warning text. I added dump_stack() in ioasid_free. Thanks ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/11/22 07:20, zhangfei@foxmail.com wrote: >> Is there nothing before this call trace? Usually there will be at least >> some warning text. > I added dump_stack() in ioasid_free. Hold on a sec, though... What's the *problem* here? Did something break or are you just saying that something looks weird to _you_? For instance, if we have: nginx: ioasid_alloc()==1 fork(); // spawn the daemon exit(); ioasid_free(1); and then a moment later: lynx: ioasid_alloc()==1 fork(); exit(); ioasid_free(1); There's no problem. The original nginx process with ioasid==1 exited. The fact that *some* process called nginx is still running doesn't mean that it still has a reference to asid==1. Are you sure the nginx process that allocated the ASID is the same process you see in ps? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 4/11/22 07:00, Zhangfei Gao wrote: > with this patchset, each time after sbin/nginx, ioasid is freed > immediately. lynx test will alloc the same ioasid=1. That doesn't seem right. Isn't 'sbin/nginx' still running when lynx runs? How can they get the same ioasid? This sounds like a refcounting problem, like that the ioasid wasn't properly refcounted as nginx forked into the background. > To verify, hack comment mm_pasid_drop in __mmput will make the issue > disappear. > > log: after sbin/nginx. > [ 96.526730] Call trace: > [ 96.526732] dump_backtrace+0xe4/0xf0 > [ 96.526741] show_stack+0x20/0x70 > [ 96.526744] dump_stack_lvl+0x8c/0xb8 > [ 96.526751] dump_stack+0x18/0x34 > [ 96.526754] ioasid_free+0xdc/0xfc > [ 96.526757] mmput+0x138/0x160 > [ 96.526760] do_exit+0x284/0x9d0 > [ 96.526765] do_group_exit+0x3c/0xa8 > [ 96.526767] __wake_up_parent+0x0/0x38 > [ 96.526770] invoke_syscall+0x4c/0x110 > [ 96.526775] el0_svc_common.constprop.0+0x68/0x128 > [ 96.526778] do_el0_svc+0x2c/0x90 > [ 96.526781] el0_svc+0x30/0x98 > [ 96.526783] el0t_64_sync_handler+0xb0/0xb8 > [ 96.526785] el0t_64_sync+0x18c/0x190 Is there nothing before this call trace? Usually there will be at least some warning text. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Fenghua I got an issue when testing nginx with 5.18-rc1/2 on aarch64 system, After debugging I found it is caused by this patchset. In the test, when accessing https, crypto driver will be used with sva feature, and nginx is testing multi-threads. Test cmd: sudo sbin/nginx lynx https://localhost sudo sbin/nginx -s quit Before this patchset, ioasid is kept ioasid=1 after sbin/nginx until quit. And new ioasid is alloc ioasid=2 when testing lynx. ioasid=1 is freed when sbin/nginx -s quit. with this patchset, each time after sbin/nginx, ioasid is freed immediately. lynx test will alloc the sameioasid=1. To verify, hack commentmm_pasid_drop in__mmput will make the issue disappear. log: after sbin/nginx. [ 96.526730] Call trace: [ 96.526732] dump_backtrace+0xe4/0xf0 [ 96.526741] show_stack+0x20/0x70 [ 96.526744] dump_stack_lvl+0x8c/0xb8 [ 96.526751] dump_stack+0x18/0x34 [ 96.526754] ioasid_free+0xdc/0xfc [ 96.526757] mmput+0x138/0x160 [ 96.526760] do_exit+0x284/0x9d0 [ 96.526765] do_group_exit+0x3c/0xa8 [ 96.526767] __wake_up_parent+0x0/0x38 [ 96.526770] invoke_syscall+0x4c/0x110 [ 96.526775] el0_svc_common.constprop.0+0x68/0x128 [ 96.526778] do_el0_svc+0x2c/0x90 [ 96.526781] el0_svc+0x30/0x98 [ 96.526783] el0t_64_sync_handler+0xb0/0xb8 [ 96.526785] el0t_64_sync+0x18c/0x190 I am still in checking. Thanks --Original-- From: "Joerg Roedel" https://lists.linuxfoundation.org/mailman/listinfo/iommu___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Mon, Feb 07, 2022 at 03:02:48PM -0800, Fenghua Yu wrote: > PASIDs are process wide. It was attempted to use refcounted PASIDs to > free them when the last thread drops the refcount. This turned out to > be complex and error prone. Given the fact that the PASID space is 20 > bits, which allows up to 1M processes to have a PASID associated > concurrently, PASID resource exhaustion is not a realistic concern. > > Therefore it was decided to simplify the approach and stick with lazy > on demand PASID allocation, but drop the eager free approach and make > a allocated PASID lifetime bound to the life time of the process. > > Get rid of the refcounting mechanisms and replace/rename the interfaces > to reflect this new approach. > > Suggested-by: Dave Hansen > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Acked-by: Joerg Roedel ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Mon, Feb 07 2022 at 15:02, Fenghua Yu wrote: > PASIDs are process wide. It was attempted to use refcounted PASIDs to > free them when the last thread drops the refcount. This turned out to > be complex and error prone. Given the fact that the PASID space is 20 > bits, which allows up to 1M processes to have a PASID associated > concurrently, PASID resource exhaustion is not a realistic concern. > > Therefore it was decided to simplify the approach and stick with lazy > on demand PASID allocation, but drop the eager free approach and make > a allocated PASID lifetime bound to the life time of the process. > > Get rid of the refcounting mechanisms and replace/rename the interfaces > to reflect this new approach. > > Suggested-by: Dave Hansen > Signed-off-by: Fenghua Yu > Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2/7/22 15:02, Fenghua Yu wrote: ... > Get rid of the refcounting mechanisms and replace/rename the interfaces > to reflect this new approach. ... > .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 5 +-- > drivers/iommu/intel/iommu.c | 4 +- > drivers/iommu/intel/svm.c | 9 - > drivers/iommu/ioasid.c| 39 ++- > drivers/iommu/iommu-sva-lib.c | 39 ++- > drivers/iommu/iommu-sva-lib.h | 1 - > include/linux/ioasid.h| 12 +- > include/linux/sched/mm.h | 16 > kernel/fork.c | 1 + > 9 files changed, 38 insertions(+), 88 deletions(-) Given the heavily non-x86 diffstat here, I was hoping to see some acks from folks that this might affect, especially on the ARM side. Is everyone OK with this? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Tony, On Thu, Feb 10, 2022 at 10:31:42AM -0800, Fenghua Yu wrote: > > On Thu, Feb 10, 2022 at 09:24:50AM -0800, Luck, Tony wrote: > > On Thu, Feb 10, 2022 at 08:27:50AM -0800, Fenghua Yu wrote: > > > Hi, Jacob, > > > > > > On Wed, Feb 09, 2022 at 07:16:14PM -0800, Jacob Pan wrote: > > > > Hi Fenghua, > > > > > > > > On Mon, 7 Feb 2022 15:02:48 -0800, Fenghua Yu > > > > wrote: > > > > > > > > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device > > > > > *dev, struct mm_struct *mm, void } > > > > > > > > > > sva = intel_svm_bind_mm(iommu, dev, mm, flags); > > > > > - if (IS_ERR_OR_NULL(sva)) > > > > > - intel_svm_free_pasid(mm); > > > > If bind fails, the PASID has no IOMMU nor CPU context. It should be > > > > safe to > > > > free here. > > > > > > The PASID can not be freed even if bind fails. The PASID allocated earlier > > > (either in this thread or in another thread) might be populated to other > > > threads already and being used now. > > > > > > Without freeing the PASID on bind failure, the worst case is the PASID > > > might > > > not be used in the process (and will be freed on process exit anyway). > > > > > > This all matches with the PASID life time described in the commit message. > > > > But what does this mean for the user that failed that intel_svm_bind_mm()? > > > > That means the user may have a PASID but there is no PASID entry for the > device. Then ENQCMD cannot be executed successfully. > > > Here's a scenario: > > > > Process sets up to use PASID capable device #1. Everything works, > > so the process has mm->pasid, and the IOMMU has the tables to map > > virtual addresses coming from device #1 using that PASID. > > > > Now the same process asks to start using PASID capable device #2, > > but there is a failure at intel_svm_bind_mm(). > > > > Fenghua is right that we shouldn't free the PASID. It is in use > > by at least one thread of the process to access device #1. > > > > But what happens with device #2? Does the caller of intel_svm_bind() > > do the right thing with the IS_ERR_OR_NULL return value to let the > > user know that device #2 isn't accessible? > > A driver caller of intel_svm_bind() should handle this failure, i.e. fail > the binding and let the user know the failure. > > Even if the driver doesn't do the right thing to handle the failure, > intel_svm_bind() doesn't set up a PASID entry for device #2. > > One example is IDXD driver. User calls open()->IDXD driver idxd_cdev_open() > ->intel_svm_bind()->intel_svm_bind_mm(). idxd_cdev_open() gets failed "sva" > and passes the PTR_ERR(sva) to the user and the user cannot open the device. > Due to the failure, no PASID entry is set up for the device. > > Even if the user ignores the open() failure and tries to run ENQCMD on > device #2, a PASID table fault will be generated due to no PASID entry > for the device and the ENQCMD execution will fail. Plus, the above behavior of handling intel_svm_bind_mm() failure is expected right behavior and the current series doesn't need to be changed. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Thu, Feb 10, 2022 at 10:49:04AM -0800, Jacob Pan wrote: > > On Wed, 9 Feb 2022 19:16:14 -0800, Jacob Pan > wrote: > > > Hi Fenghua, > > > > On Mon, 7 Feb 2022 15:02:48 -0800, Fenghua Yu > > wrote: > > > > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device > > > *dev, struct mm_struct *mm, void } > > > > > > sva = intel_svm_bind_mm(iommu, dev, mm, flags); > > > - if (IS_ERR_OR_NULL(sva)) > > > - intel_svm_free_pasid(mm); > > If bind fails, the PASID has no IOMMU nor CPU context. It should be safe > > to free here. > > > Actually, here we cannot tell if the bind is the first of the mm so I think > this is fine. > > Reviewed-by: Jacob Pan Thank you very much for your review, Jacob! -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Wed, 9 Feb 2022 19:16:14 -0800, Jacob Pan wrote: > Hi Fenghua, > > On Mon, 7 Feb 2022 15:02:48 -0800, Fenghua Yu > wrote: > > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device > > *dev, struct mm_struct *mm, void } > > > > sva = intel_svm_bind_mm(iommu, dev, mm, flags); > > - if (IS_ERR_OR_NULL(sva)) > > - intel_svm_free_pasid(mm); > If bind fails, the PASID has no IOMMU nor CPU context. It should be safe > to free here. > Actually, here we cannot tell if the bind is the first of the mm so I think this is fine. Reviewed-by: Jacob Pan > Thanks, > > Jacob Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Tony, On Thu, Feb 10, 2022 at 09:24:50AM -0800, Luck, Tony wrote: > On Thu, Feb 10, 2022 at 08:27:50AM -0800, Fenghua Yu wrote: > > Hi, Jacob, > > > > On Wed, Feb 09, 2022 at 07:16:14PM -0800, Jacob Pan wrote: > > > Hi Fenghua, > > > > > > On Mon, 7 Feb 2022 15:02:48 -0800, Fenghua Yu > > > wrote: > > > > > > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device > > > > *dev, struct mm_struct *mm, void } > > > > > > > > sva = intel_svm_bind_mm(iommu, dev, mm, flags); > > > > - if (IS_ERR_OR_NULL(sva)) > > > > - intel_svm_free_pasid(mm); > > > If bind fails, the PASID has no IOMMU nor CPU context. It should be safe > > > to > > > free here. > > > > The PASID can not be freed even if bind fails. The PASID allocated earlier > > (either in this thread or in another thread) might be populated to other > > threads already and being used now. > > > > Without freeing the PASID on bind failure, the worst case is the PASID might > > not be used in the process (and will be freed on process exit anyway). > > > > This all matches with the PASID life time described in the commit message. > > But what does this mean for the user that failed that intel_svm_bind_mm()? > That means the user may have a PASID but there is no PASID entry for the device. Then ENQCMD cannot be executed successfully. > Here's a scenario: > > Process sets up to use PASID capable device #1. Everything works, > so the process has mm->pasid, and the IOMMU has the tables to map > virtual addresses coming from device #1 using that PASID. > > Now the same process asks to start using PASID capable device #2, > but there is a failure at intel_svm_bind_mm(). > > Fenghua is right that we shouldn't free the PASID. It is in use > by at least one thread of the process to access device #1. > > But what happens with device #2? Does the caller of intel_svm_bind() > do the right thing with the IS_ERR_OR_NULL return value to let the > user know that device #2 isn't accessible? A driver caller of intel_svm_bind() should handle this failure, i.e. fail the binding and let the user know the failure. Even if the driver doesn't do the right thing to handle the failure, intel_svm_bind() doesn't set up a PASID entry for device #2. One example is IDXD driver. User calls open()->IDXD driver idxd_cdev_open() ->intel_svm_bind()->intel_svm_bind_mm(). idxd_cdev_open() gets failed "sva" and passes the PTR_ERR(sva) to the user and the user cannot open the device. Due to the failure, no PASID entry is set up for the device. Even if the user ignores the open() failure and tries to run ENQCMD on device #2, a PASID table fault will be generated due to no PASID entry for the device and the ENQCMD execution will fail. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Thu, Feb 10, 2022 at 08:27:50AM -0800, Fenghua Yu wrote: > Hi, Jacob, > > On Wed, Feb 09, 2022 at 07:16:14PM -0800, Jacob Pan wrote: > > Hi Fenghua, > > > > On Mon, 7 Feb 2022 15:02:48 -0800, Fenghua Yu wrote: > > > > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device > > > *dev, struct mm_struct *mm, void } > > > > > > sva = intel_svm_bind_mm(iommu, dev, mm, flags); > > > - if (IS_ERR_OR_NULL(sva)) > > > - intel_svm_free_pasid(mm); > > If bind fails, the PASID has no IOMMU nor CPU context. It should be safe to > > free here. > > The PASID can not be freed even if bind fails. The PASID allocated earlier > (either in this thread or in another thread) might be populated to other > threads already and being used now. > > Without freeing the PASID on bind failure, the worst case is the PASID might > not be used in the process (and will be freed on process exit anyway). > > This all matches with the PASID life time described in the commit message. But what does this mean for the user that failed that intel_svm_bind_mm()? Here's a scenario: Process sets up to use PASID capable device #1. Everything works, so the process has mm->pasid, and the IOMMU has the tables to map virtual addresses coming from device #1 using that PASID. Now the same process asks to start using PASID capable device #2, but there is a failure at intel_svm_bind_mm(). Fenghua is right that we shouldn't free the PASID. It is in use by at least one thread of the process to access device #1. But what happens with device #2? Does the caller of intel_svm_bind() do the right thing with the IS_ERR_OR_NULL return value to let the user know that device #2 isn't accessible? -Tony ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi, Jacob, On Wed, Feb 09, 2022 at 07:16:14PM -0800, Jacob Pan wrote: > Hi Fenghua, > > On Mon, 7 Feb 2022 15:02:48 -0800, Fenghua Yu wrote: > > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device > > *dev, struct mm_struct *mm, void } > > > > sva = intel_svm_bind_mm(iommu, dev, mm, flags); > > - if (IS_ERR_OR_NULL(sva)) > > - intel_svm_free_pasid(mm); > If bind fails, the PASID has no IOMMU nor CPU context. It should be safe to > free here. The PASID can not be freed even if bind fails. The PASID allocated earlier (either in this thread or in another thread) might be populated to other threads already and being used now. Without freeing the PASID on bind failure, the worst case is the PASID might not be used in the process (and will be freed on process exit anyway). This all matches with the PASID life time described in the commit message. Thanks. -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
Hi Fenghua, On Mon, 7 Feb 2022 15:02:48 -0800, Fenghua Yu wrote: > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device > *dev, struct mm_struct *mm, void } > > sva = intel_svm_bind_mm(iommu, dev, mm, flags); > - if (IS_ERR_OR_NULL(sva)) > - intel_svm_free_pasid(mm); If bind fails, the PASID has no IOMMU nor CPU context. It should be safe to free here. Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On Tue, Feb 08, 2022 at 10:41:39AM +0800, Lu Baolu wrote: > On 2/8/22 7:02 AM, Fenghua Yu wrote: > > PASIDs are process wide. It was attempted to use refcounted PASIDs to > > free them when the last thread drops the refcount. This turned out to > > be complex and error prone. Given the fact that the PASID space is 20 > > bits, which allows up to 1M processes to have a PASID associated > > concurrently, PASID resource exhaustion is not a realistic concern. > > > > Therefore it was decided to simplify the approach and stick with lazy > > on demand PASID allocation, but drop the eager free approach and make > > a allocated PASID lifetime bound to the life time of the process. > > > > Get rid of the refcounting mechanisms and replace/rename the interfaces > > to reflect this new approach. > > > > Suggested-by: Dave Hansen > > Signed-off-by: Fenghua Yu > > Reviewed-by: Tony Luck > > --- > > v4: > > - Update the commit message (Thomas). > > Reviewed-by: Lu Baolu Thank you very much for your review, Baolu! -Fenghua ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit
On 2/8/22 7:02 AM, Fenghua Yu wrote: PASIDs are process wide. It was attempted to use refcounted PASIDs to free them when the last thread drops the refcount. This turned out to be complex and error prone. Given the fact that the PASID space is 20 bits, which allows up to 1M processes to have a PASID associated concurrently, PASID resource exhaustion is not a realistic concern. Therefore it was decided to simplify the approach and stick with lazy on demand PASID allocation, but drop the eager free approach and make a allocated PASID lifetime bound to the life time of the process. Get rid of the refcounting mechanisms and replace/rename the interfaces to reflect this new approach. Suggested-by: Dave Hansen Signed-off-by: Fenghua Yu Reviewed-by: Tony Luck --- v4: - Update the commit message (Thomas). v3: - Rename mm_pasid_get() to mm_pasid_set() (Thomas). - Remove ioasid_get() because it's not used any more when the IOASID is freed on mm exit (Thomas). - Remove PASID's refcount exercise in ioasid_put() and rename ioasid_put() to ioasid_free() (Thomas). v2: - Free PASID on mm exit instead of in exit(2) or unbind() (Thomas, AndyL, PeterZ) - Add mm_pasid_init(), mm_pasid_get(), and mm_pasid_drop() functions in mm. So the mm's PASID operations are generic for both X86 and ARM (Dave Hansen) .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 5 +-- drivers/iommu/intel/iommu.c | 4 +- drivers/iommu/intel/svm.c | 9 - drivers/iommu/ioasid.c| 39 ++- drivers/iommu/iommu-sva-lib.c | 39 ++- drivers/iommu/iommu-sva-lib.h | 1 - include/linux/ioasid.h| 12 +- include/linux/sched/mm.h | 16 kernel/fork.c | 1 + 9 files changed, 38 insertions(+), 88 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index a737ba5f727e..22ddd05bbdcd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -340,14 +340,12 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm) bond->smmu_mn = arm_smmu_mmu_notifier_get(smmu_domain, mm); if (IS_ERR(bond->smmu_mn)) { ret = PTR_ERR(bond->smmu_mn); - goto err_free_pasid; + goto err_free_bond; } list_add(>list, >bonds); return >sva; -err_free_pasid: - iommu_sva_free_pasid(mm); err_free_bond: kfree(bond); return ERR_PTR(ret); @@ -377,7 +375,6 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle) if (refcount_dec_and_test(>refs)) { list_del(>list); arm_smmu_mmu_notifier_put(bond->smmu_mn); - iommu_sva_free_pasid(bond->mm); kfree(bond); } mutex_unlock(_lock); diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 92fea3fbbb11..ef03b2176bbd 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4781,7 +4781,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain, link_failed: spin_unlock_irqrestore(_domain_lock, flags); if (list_empty(>subdevices) && domain->default_pasid > 0) - ioasid_put(domain->default_pasid); + ioasid_free(domain->default_pasid); return ret; } @@ -4811,7 +4811,7 @@ static void aux_domain_remove_dev(struct dmar_domain *domain, spin_unlock_irqrestore(_domain_lock, flags); if (list_empty(>subdevices) && domain->default_pasid > 0) - ioasid_put(domain->default_pasid); + ioasid_free(domain->default_pasid); } static int prepare_domain_attach_device(struct iommu_domain *domain, diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 5b5d69b04fcc..51ac2096b3da 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -514,11 +514,6 @@ static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm, return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1); } -static void intel_svm_free_pasid(struct mm_struct *mm) -{ - iommu_sva_free_pasid(mm); -} - static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu, struct device *dev, struct mm_struct *mm, @@ -662,8 +657,6 @@ static int intel_svm_unbind_mm(struct device *dev, u32 pasid) kfree(svm); } } - /* Drop a PASID reference and free it if no reference. */ - intel_svm_free_pasid(mm); } out: return ret; @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void }