Re: [PATCH v2 00/33] Separate struct slab from struct page
On Tue, Dec 21, 2021 at 12:58:14AM +0100, Vlastimil Babka wrote:
> On 12/16/21 16:00, Hyeonggon Yoo wrote:
> > On Tue, Dec 14, 2021 at 01:57:22PM +0100, Vlastimil Babka wrote:
> >> On 12/1/21 19:14, Vlastimil Babka wrote:
> >> > Folks from non-slab subsystems are Cc'd only to patches affecting
> >> > them, and this cover letter.
> >> >
> >> > Series also available in git, based on 5.16-rc3:
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
> >>
> >> Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and
> >> small tweaks and a new patch from Hyeonggon Yoo on top. To avoid too
> >> much spam, here's a range diff:
> >
> > Reviewing the whole patch series takes longer than I thought.
> > I'll try to review and test the rest of the patches when I have time.
> >
> > I added Tested-by if the kernel builds okay and kselftests
> > do not break the kernel on my machine
> > (with CONFIG_SLAB/SLUB/SLOB depending on the patch).
>
> Thanks!

:)

> > Let me know if you know a better way to test a patch.
>
> Testing on your machine is just fine.

Good!

> > # mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only
> > when enabled
> >
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> >
> > Comment:
> > Works on both SLUB_CPU_PARTIAL and !SLUB_CPU_PARTIAL.
> > btw, do we need the slabs_cpu_partial attribute when we don't use
> > cpu partials? (!SLUB_CPU_PARTIAL)
>
> The sysfs attribute? Yeah, we should be consistent to userspace
> expecting to read it (even with zeroes), regardless of config.

I thought entirely disabling the attribute would be simpler, but okay
if it should be exposed even when it's always zero.

> > # mm/slub: Simplify struct slab slabs field definition
> >
> > Comment:
> >
> > This is how struct page looks on top of the v3r3 branch:
> >
> > struct page {
> > [...]
> >         struct {        /* slab, slob and slub */
> >                 union {
> >                         struct list_head slab_list;
> >                         struct {        /* Partial pages */
> >                                 struct page *next;
> > #ifdef CONFIG_64BIT
> >                                 int pages;      /* Nr of pages left */
> > #else
> >                                 short int pages;
> > #endif
> >                         };
> >                 };
> > [...]
> >
> > It's not consistent with struct slab.
>
> Hm right. But as we don't actually use the struct page version anymore,
> and it's not one of the fields checked by SLAB_MATCH(), we can ignore
> this.

Yeah, this is not a big problem. I just mentioned it because it looked
weird and I didn't know when the patch "mm: Remove slab from struct
page" would come back.

> > I think this is because "mm: Remove slab from struct page" was
> > dropped.
>
> That was just postponed until the iommu changes are in. Matthew
> mentioned those might be merged too, so that final cleanup will happen
> too and take care of the discrepancy above, so no need for extra churn
> to address it specifically.

Okay, it seems no extra work is needed until the iommu changes are in!

BTW, the patch I sent ("mm/slob: Remove unnecessary
page_mapcount_reset() function call") refers to commit 4525180926f9
("mm/sl*b: Differentiate struct slab fields by sl*b implementations"),
but the commit hash 4525180926f9 changed after the tree was changed.
It would be nice to have a script to handle situations like this.

> > Would you update some of the patches?
> >
> > # mm/sl*b: Differentiate struct slab fields by sl*b implementations
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Works with SL[AUO]B on my machine and makes the code much better.
> >
> > # mm/slob: Convert SLOB to use struct slab and struct folio
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > It still works fine on SLOB.
> >
> > # mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> >
> > # mm/slub: Convert __free_slab() to use struct slab
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> >
> > Thanks,
> > Hyeonggon.
>
> Thanks again,
> Vlastimil

Have a nice day, thanks!
Hyeonggon.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
On 12/22/21 12:22 PM, Lu Baolu wrote:
> void iommu_detach_device_shared(struct iommu_domain *domain,
>                                 struct device *dev)

Sorry for the typo. Please ignore the _shared postfix.

Best regards,
baolu
Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
On 12/22/21 2:46 AM, Jason Gunthorpe wrote:
> > It's worth taking a step back and realising that overall, this is
> > really just a more generalised and finer-grained extension of what
> > 426a273834ea already did for non-group-aware code, so it makes little
> > sense *not* to integrate it into the existing interfaces.
>
> This is taking 426a to its logical conclusion and *removing* the group
> API from the drivers entirely. This is desirable because drivers cannot
> do anything sane with the group.
>
> The drivers have struct devices, and so we provide APIs that work in
> terms of struct devices to cover both driver use cases today, and do so
> more safely than what is already implemented.
>
> Do not mix up VFIO with the driver interface, these are different
> things. It is better VFIO stay on its own and not complicate the
> driver world.

Per Joerg's previous comments:

https://lore.kernel.org/linux-iommu/2029150612.jhsvsbzisvux2...@8bytes.org/

the commit 426a273834ea came only in order to disallow attaching a
single device within a group to a different iommu_domain. So it's
reasonable to improve the existing iommu_attach/detach_device() to
cover all cases. How about the code below? Did I miss anything?

int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
{
	struct iommu_group *group;
	int ret = 0;

	group = iommu_group_get(dev);
	if (!group)
		return -ENODEV;

	mutex_lock(&group->mutex);
	if (group->attach_cnt) {
		if (group->domain != domain) {
			ret = -EBUSY;
			goto unlock_out;
		}
	} else {
		ret = __iommu_attach_group(domain, group);
		if (ret)
			goto unlock_out;
	}
	group->attach_cnt++;

unlock_out:
	mutex_unlock(&group->mutex);
	iommu_group_put(group);

	return ret;
}
EXPORT_SYMBOL_GPL(iommu_attach_device);

void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
{
	struct iommu_group *group;

	group = iommu_group_get(dev);
	if (WARN_ON(!group))
		return;

	mutex_lock(&group->mutex);
	if (WARN_ON(!group->attach_cnt || group->domain != domain))
		goto unlock_out;

	if (--group->attach_cnt == 0)
		__iommu_detach_group(domain, group);

unlock_out:
	mutex_unlock(&group->mutex);
	iommu_group_put(group);
}
EXPORT_SYMBOL_GPL(iommu_detach_device);

Best regards,
baolu
[PATCH] Swiotlb: remove a duplicate include
Remove a duplicate "#include". The deleted one in line 43 is under
"#ifdef CONFIG_DMA_RESTRICTED_POOL". However, there is already one in
line 53 with no conditional compilation.

Signed-off-by: Guo Zhengkui
---
 kernel/dma/swiotlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1b0501fd3e0e..8c091626ca35 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -33,21 +33,20 @@
 #include
 #include
 #include
 #include
 #include
 #include
 #ifdef CONFIG_DEBUG_FS
 #include
 #endif
 #ifdef CONFIG_DMA_RESTRICTED_POOL
-#include
 #include
 #include
 #include
 #include
 #endif
 #include
 #include
 #include
--
2.20.1
Re: [PATCH v3 4/5] iommu/arm-smmu-v3: Add host support for NVIDIA Grace CMDQ-V
On Tue, Dec 21, 2021 at 06:55:20PM +, Robin Murphy wrote:
> On 2021-12-20 19:27, Nicolin Chen wrote:
> > Hi Robin,
> >
> > Thank you for the reply!
> >
> > On Mon, Dec 20, 2021 at 06:42:26PM +, Robin Murphy wrote:
> > > On 2021-11-19 07:19, Nicolin Chen wrote:
> > > > From: Nate Watterson
> > > >
> > > > NVIDIA's Grace SoC has CMDQ-Virtualization (CMDQV) hardware,
> > > > which extends the standard ARM SMMU v3 IP to support multiple
> > > > VCMDQs with virtualization capabilities. In the host OS kernel,
> > > > they're used to reduce contention on a single queue. In terms
> > > > of command queue, they are very like the standard CMDQ/ECMDQs,
> > > > but only support CS_NONE in the CS field of CMD_SYNC commands.
> > > >
> > > > This patch adds a new nvidia-grace-cmdqv file and inserts its
> > > > structure pointer into the existing arm_smmu_device, and then
> > > > adds related function calls in the arm-smmu-v3 driver.
> > > >
> > > > In the CMDQV driver itself, this patch only adds the minimal
> > > > part for host kernel support. Upon probe(), VINTF0 is reserved
> > > > for in-kernel use, and some of the VCMDQs are assigned to
> > > > VINTF0. The driver will then select one of the VCMDQs in VINTF0,
> > > > based on the CPU currently executing, to issue commands.
> > >
> > > Is there a tangible difference to DMA API or VFIO performance?
> >
> > Our testing environment is currently running on a single-core
> > CPU, so unfortunately we don't have perf data at this point.
>
> OK, as for the ECMDQ patches I think we'll need some investigation
> with real workloads to judge whether we can benefit from these things
> enough to justify the complexity, and whether the design is right.
>
> My gut feeling is that if these multi-queue schemes really can live up
> to their promise of making contention negligible, then they should
> further stand to benefit from bypassing the complex lock-free command
> batching in favour of something more lightweight, which could change
> the direction of much of the refactoring.

Makes sense. We will share our perf data once we have a certain level
of support in our test environment.

> > [...]
> > > > +struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
> > > > +{
> > > > +	struct nvidia_grace_cmdqv *cmdqv = smmu->nvidia_grace_cmdqv;
> > > > +	struct nvidia_grace_cmdqv_vintf *vintf0 = &cmdqv->vintf0;
> > > > +	u16 qidx;
> > > > +
> > > > +	/* Check error status of vintf0 */
> > > > +	if (!FIELD_GET(VINTF_STATUS, vintf0->status))
> > > > +		return &smmu->cmdq;
> > > > +
> > > > +	/*
> > > > +	 * Select a vcmdq to use. Here we use a temporal solution to
> > > > +	 * balance out traffic on cmdq issuing: each cmdq has its own
> > > > +	 * lock, if all cpus issue cmdlist using the same cmdq, only
> > > > +	 * one CPU at a time can enter the process, while the others
> > > > +	 * will be spinning at the same lock.
> > > > +	 */
> > > > +	qidx = smp_processor_id() % cmdqv->num_vcmdqs_per_vintf;
> > >
> > > How does ordering work between queues? Do they follow a global
> > > order such that a sync on any queue is guaranteed to complete all
> > > prior commands on all queues?
> >
> > The CMDQV internal scheduler would insert a SYNC when (for example)
> > switching from VCMDQ0 to VCMDQ1 while the last command in VCMDQ0 is
> > not a SYNC. HW has a configuration bit in the register to disable
> > this feature, which is enabled by default.
>
> Interesting, thanks. So it sounds like this is something you can get
> away with for the moment, but may need to revisit once people chasing
> real-world performance start wanting to turn that bit off.

Yea, we have limitations on both the testing setup and available
clients for an in-depth perf measurement at this moment. But we surely
will do as you mentioned. Anyway, this is just for initial support.

> > > The challenge to make ECMDQ useful to Linux is how to make sure
> > > that all the commands expected to be within scope of a future
> > > CMND_SYNC plus that sync itself all get issued on the same queue,
> > > so I'd be mildly surprised if you didn't have the same problem.
> >
> > PATCH-3 in this series actually helps align the command queues,
> > between issued commands and SYNC, if bool sync == true. Yet, if
> > doing something like issue->issue->issue_with_sync, it could be
> > trickier.
>
> Indeed between the iommu_iotlb_gather mechanism and low-level command
> batching things are already a lot more concentrated than they could
> be, but arm_smmu_cmdq_batch_add() and its callers stand out as
> examples of where we'd still be vulnerable to preemption. What I
> haven't even tried to reason about yet is assumptions in the
> higher-level APIs, e.g. if io-pgtable might chuck out a TLBI during an
> iommu_unmap() which we implicitly expect a later iommu_iotlb_sync() to
> cover. Though I might have
Re: [PATCH v3 4/5] iommu/arm-smmu-v3: Add host support for NVIDIA Grace CMDQ-V
On 2021-12-20 19:27, Nicolin Chen wrote:
> Hi Robin,
>
> Thank you for the reply!
>
> On Mon, Dec 20, 2021 at 06:42:26PM +, Robin Murphy wrote:
>> On 2021-11-19 07:19, Nicolin Chen wrote:
>>> From: Nate Watterson
>>>
>>> NVIDIA's Grace SoC has CMDQ-Virtualization (CMDQV) hardware,
>>> which extends the standard ARM SMMU v3 IP to support multiple
>>> VCMDQs with virtualization capabilities. In the host OS kernel,
>>> they're used to reduce contention on a single queue. In terms
>>> of command queue, they are very like the standard CMDQ/ECMDQs,
>>> but only support CS_NONE in the CS field of CMD_SYNC commands.
>>>
>>> This patch adds a new nvidia-grace-cmdqv file and inserts its
>>> structure pointer into the existing arm_smmu_device, and then
>>> adds related function calls in the arm-smmu-v3 driver.
>>>
>>> In the CMDQV driver itself, this patch only adds the minimal part
>>> for host kernel support. Upon probe(), VINTF0 is reserved for
>>> in-kernel use, and some of the VCMDQs are assigned to VINTF0.
>>> The driver will then select one of the VCMDQs in VINTF0, based
>>> on the CPU currently executing, to issue commands.
>>
>> Is there a tangible difference to DMA API or VFIO performance?
>
> Our testing environment is currently running on a single-core
> CPU, so unfortunately we don't have perf data at this point.

OK, as for the ECMDQ patches I think we'll need some investigation with
real workloads to judge whether we can benefit from these things enough
to justify the complexity, and whether the design is right.

My gut feeling is that if these multi-queue schemes really can live up
to their promise of making contention negligible, then they should
further stand to benefit from bypassing the complex lock-free command
batching in favour of something more lightweight, which could change
the direction of much of the refactoring.

> [...]
>>> +struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
>>> +{
>>> +	struct nvidia_grace_cmdqv *cmdqv = smmu->nvidia_grace_cmdqv;
>>> +	struct nvidia_grace_cmdqv_vintf *vintf0 = &cmdqv->vintf0;
>>> +	u16 qidx;
>>> +
>>> +	/* Check error status of vintf0 */
>>> +	if (!FIELD_GET(VINTF_STATUS, vintf0->status))
>>> +		return &smmu->cmdq;
>>> +
>>> +	/*
>>> +	 * Select a vcmdq to use. Here we use a temporal solution to
>>> +	 * balance out traffic on cmdq issuing: each cmdq has its own
>>> +	 * lock, if all cpus issue cmdlist using the same cmdq, only
>>> +	 * one CPU at a time can enter the process, while the others
>>> +	 * will be spinning at the same lock.
>>> +	 */
>>> +	qidx = smp_processor_id() % cmdqv->num_vcmdqs_per_vintf;
>>
>> How does ordering work between queues? Do they follow a global order
>> such that a sync on any queue is guaranteed to complete all prior
>> commands on all queues?
>
> The CMDQV internal scheduler would insert a SYNC when (for example)
> switching from VCMDQ0 to VCMDQ1 while the last command in VCMDQ0 is
> not a SYNC. HW has a configuration bit in the register to disable
> this feature, which is enabled by default.

Interesting, thanks. So it sounds like this is something you can get
away with for the moment, but may need to revisit once people chasing
real-world performance start wanting to turn that bit off.

>> The challenge to make ECMDQ useful to Linux is how to make sure that
>> all the commands expected to be within scope of a future CMND_SYNC
>> plus that sync itself all get issued on the same queue, so I'd be
>> mildly surprised if you didn't have the same problem.
>
> PATCH-3 in this series actually helps align the command queues,
> between issued commands and SYNC, if bool sync == true. Yet, if
> doing something like issue->issue->issue_with_sync, it could be
> trickier.

Indeed between the iommu_iotlb_gather mechanism and low-level command
batching things are already a lot more concentrated than they could be,
but arm_smmu_cmdq_batch_add() and its callers stand out as examples of
where we'd still be vulnerable to preemption.

What I haven't even tried to reason about yet is assumptions in the
higher-level APIs, e.g. if io-pgtable might chuck out a TLBI during an
iommu_unmap() which we implicitly expect a later iommu_iotlb_sync() to
cover.

I've been thinking that in many ways per-domain queues make quite a bit
of sense and would be easier to manage than per-CPU ones - plus that's
pretty much the usage model once we get to VMs anyway - but that fails
to help the significant cases like networking and storage where many
CPUs are servicing a big monolithic device in a single domain :(

Robin.
Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
On Tue, Dec 21, 2021 at 04:50:56PM +, Robin Murphy wrote:
> this proposal is the worst of both worlds, in that drivers still have
> to be just as aware of groups in order to know whether to call the
> _shared interface or not, except it's now entirely implicit and
> non-obvious.

Drivers are not aware of groups, where did you see that?

Drivers have to indicate their intention, based entirely on their own
internal design. Whether groups are present or not is irrelevant to the
driver.

If the driver uses a single struct device (which is most) then it uses
iommu_attach_device().

If the driver uses multiple struct devices and intends to connect them
all to the same domain then it uses the _shared variant. The only
difference between the two is that the _shared variant lacks some of
the protections against driver abuse of the API.

Nothing uses the group interface except for VFIO and stuff inside
drivers/iommu. VFIO has a uAPI tied to the group interface and it is
stuck with it.

> Otherwise just add the housekeeping stuff to
> iommu_{attach,detach}_group() - there's no way we want *three*
> attach/detach interfaces all with different semantics.

I'm not sure why you think three APIs is a bad thing. Three APIs with
clearly intended purposes are a lot better than one giant API with a
bunch of parameters that tries to do everything.

In this case, it is not simple to 'add the housekeeping' to
iommu_attach_group() in a way that is useful to both tegra and VFIO.
What tegra wants is what the _shared API implements, and that logic
should not be open coded in drivers.

VFIO does not want exactly that; it has its own logic to deal directly
with groups tied to its uAPI. Due to the uAPI it doesn't even have a
struct device, unfortunately.

The reason there are three APIs is because there are three different
use-cases. It is not a bad thing to have APIs designed for the use
cases they serve.
> It's worth taking a step back and realising that overall, this is
> really just a more generalised and finer-grained extension of what
> 426a273834ea already did for non-group-aware code, so it makes little
> sense *not* to integrate it into the existing interfaces.

This is taking 426a to its logical conclusion and *removing* the group
API from the drivers entirely. This is desirable because drivers cannot
do anything sane with the group.

The drivers have struct devices, and so we provide APIs that work in
terms of struct devices to cover both driver use cases today, and do so
more safely than what is already implemented.

Do not mix up VFIO with the driver interface, these are different
things. It is better VFIO stay on its own and not complicate the driver
world.

Jason
Re: [PATCH v4 21/23] mm: use custom page_free for P2PDMA pages
On 2021-12-21 2:06 a.m., Christoph Hellwig wrote:
> On Wed, Nov 17, 2021 at 02:54:08PM -0700, Logan Gunthorpe wrote:
>> When P2PDMA pages are passed to userspace, they will need to be
>> reference counted properly and returned to their genalloc after their
>> reference count returns to 1. This is accomplished with the existing
>> DEV_PAGEMAP_OPS and the .page_free() operation.
>>
>> Change CONFIG_P2PDMA to select CONFIG_DEV_PAGEMAP_OPS and add
>> MEMORY_DEVICE_PCI_P2PDMA to page_is_devmap_managed(),
>> devmap_managed_enable_[put|get]() and free_devmap_managed_page().
>
> Uuuh. We are trying hard to kill off this magic free at refcount 1
> behavior in the amdgpu device coherent series. We really should not
> add more of this.

Ah, ok. I found Ralph's patch that cleans this up and I can try to
rebase this onto it for future postings until it gets merged.

Your other comments I can address for the next time I post this series.

Thanks for the review!

Logan
Re: [PATCH v2 00/33] Separate struct slab from struct page
On 2021-12-20 23:58, Vlastimil Babka wrote:
> On 12/16/21 16:00, Hyeonggon Yoo wrote:
>> On Tue, Dec 14, 2021 at 01:57:22PM +0100, Vlastimil Babka wrote:
>>> On 12/1/21 19:14, Vlastimil Babka wrote:
>>>> Folks from non-slab subsystems are Cc'd only to patches affecting
>>>> them, and this cover letter.
>>>>
>>>> Series also available in git, based on 5.16-rc3:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
>>>
>>> Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and
>>> small tweaks and a new patch from Hyeonggon Yoo on top. To avoid too
>>> much spam, here's a range diff:
>>
>> Reviewing the whole patch series takes longer than I thought.
>> I'll try to review and test the rest of the patches when I have time.
>>
>> I added Tested-by if the kernel builds okay and kselftests
>> do not break the kernel on my machine
>> (with CONFIG_SLAB/SLUB/SLOB depending on the patch).
>
> Thanks!
>
>> Let me know if you know a better way to test a patch.
>
> Testing on your machine is just fine.
>
>> # mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only
>> when enabled
>>
>> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
>> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
>>
>> Comment:
>> Works on both SLUB_CPU_PARTIAL and !SLUB_CPU_PARTIAL.
>> btw, do we need the slabs_cpu_partial attribute when we don't use
>> cpu partials? (!SLUB_CPU_PARTIAL)
>
> The sysfs attribute? Yeah, we should be consistent to userspace
> expecting to read it (even with zeroes), regardless of config.
>
>> # mm/slub: Simplify struct slab slabs field definition
>>
>> Comment:
>>
>> This is how struct page looks on top of the v3r3 branch:
>>
>> struct page {
>> [...]
>>         struct {        /* slab, slob and slub */
>>                 union {
>>                         struct list_head slab_list;
>>                         struct {        /* Partial pages */
>>                                 struct page *next;
>> #ifdef CONFIG_64BIT
>>                                 int pages;      /* Nr of pages left */
>> #else
>>                                 short int pages;
>> #endif
>>                         };
>>                 };
>> [...]
>>
>> It's not consistent with struct slab.
>
> Hm right. But as we don't actually use the struct page version anymore,
> and it's not one of the fields checked by SLAB_MATCH(), we can ignore
> this.
>
>> I think this is because "mm: Remove slab from struct page" was
>> dropped.
>
> That was just postponed until the iommu changes are in. Matthew
> mentioned those might be merged too, so that final cleanup will happen
> too and take care of the discrepancy above, so no need for extra churn
> to address it specifically.

FYI the IOMMU changes are now queued in linux-next, so if all goes well
you might be able to sneak that final patch in too.

Robin.

>> Would you update some of the patches?
>>
>> # mm/sl*b: Differentiate struct slab fields by sl*b implementations
>> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
>> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
>> Works with SL[AUO]B on my machine and makes the code much better.
>>
>> # mm/slob: Convert SLOB to use struct slab and struct folio
>> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
>> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
>> It still works fine on SLOB.
>>
>> # mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
>> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
>> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
>>
>> # mm/slub: Convert __free_slab() to use struct slab
>> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
>> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
>>
>> Thanks,
>> Hyeonggon.
>
> Thanks again,
> Vlastimil
Re: [PATCH v4 01/23] lib/scatterlist: cleanup macros into static inline functions
On 2021-12-21 2:00 a.m., Christoph Hellwig wrote:
> On Wed, Nov 17, 2021 at 02:53:48PM -0700, Logan Gunthorpe wrote:
>> Convert the sg_is_chain(), sg_is_last() and sg_chain_ptr() macros
>> into static inline functions. There's no reason for these to be
>> macros, and static inline functions are generally preferred these
>> days.
>>
>> Also introduce the SG_PAGE_LINK_MASK define so the P2PDMA work, which
>> is adding another bit to this mask, can do so more easily.
>>
>> Suggested-by: Jason Gunthorpe
>> Signed-off-by: Logan Gunthorpe
>
> Looks fine:
>
> Reviewed-by: Christoph Hellwig
>
> scatterlist.h doesn't have a real maintainer, do you want me to pick
> this up through the DMA tree?

Sure, that would be great!

Thanks,

Logan
Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups
On 2021-12-17 06:37, Lu Baolu wrote:
> The iommu_attach/detach_device() interfaces were exposed for the
> device drivers to attach/detach their own domains. The commit
> <426a273834eae> ("iommu: Limit iommu_attach/detach_device to device
> with their own group") restricted them to singleton groups to avoid
> different devices in a group attaching different domains.
>
> As we've introduced device DMA ownership into the iommu core, we can
> now introduce interfaces for multiple-device groups, and "all devices
> are in the same address space" is still guaranteed. The
> iommu_attach/detach_device_shared() interfaces could be used when
> multiple drivers sharing the group claim the DMA_OWNER_PRIVATE_DOMAIN
> ownership. The first call of iommu_attach_device_shared() attaches the
> domain to the group. Other drivers could join it later. The domain
> will be detached from the group after all drivers unjoin it.

I don't see the point of this at all - if you really want to hide the
concept of IOMMU groups away from drivers then just make
iommu_{attach,detach}_device() do the right thing. At least the
iommu_group_get_for_dev() plus iommu_{attach,detach}_group() API is
clear - this proposal is the worst of both worlds, in that drivers
still have to be just as aware of groups in order to know whether to
call the _shared interface or not, except it's now entirely implicit
and non-obvious.

Otherwise just add the housekeeping stuff to
iommu_{attach,detach}_group() - there's no way we want *three*
attach/detach interfaces all with different semantics.

It's worth taking a step back and realising that overall, this is
really just a more generalised and finer-grained extension of what
426a273834ea already did for non-group-aware code, so it makes little
sense *not* to integrate it into the existing interfaces.

Robin.
> Signed-off-by: Jason Gunthorpe
> Signed-off-by: Lu Baolu
> Tested-by: Dmitry Osipenko
> ---
>  include/linux/iommu.h | 13 +++
>  drivers/iommu/iommu.c | 79 +++
>  2 files changed, 92 insertions(+)
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5ad4cf13370d..1bc03118dfb3 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -703,6 +703,8 @@ int iommu_group_set_dma_owner(struct iommu_group *group, enum iommu_dma_owner ow
>  void *owner_cookie);
>  void iommu_group_release_dma_owner(struct iommu_group *group, enum iommu_dma_owner owner);
>  bool iommu_group_dma_owner_unclaimed(struct iommu_group *group);
> +int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev);
> +void iommu_detach_device_shared(struct iommu_domain *domain, struct device *dev);
>
>  #else /* CONFIG_IOMMU_API */
>
> @@ -743,11 +745,22 @@ static inline int iommu_attach_device(struct iommu_domain *domain,
>  	return -ENODEV;
>  }
>
> +static inline int iommu_attach_device_shared(struct iommu_domain *domain,
> +					     struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
>  static inline void iommu_detach_device(struct iommu_domain *domain,
>  				       struct device *dev)
>  {
>  }
>
> +static inline void iommu_detach_device_shared(struct iommu_domain *domain,
> +					      struct device *dev)
> +{
> +}
> +
>  static inline struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
>  {
>  	return NULL;
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 8bec71b1cc18..3ad66cb9bedc 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -50,6 +50,7 @@ struct iommu_group {
>  	struct list_head entry;
>  	enum iommu_dma_owner dma_owner;
>  	unsigned int owner_cnt;
> +	unsigned int attach_cnt;
>  	void *owner_cookie;
>  };
>
> @@ -3512,3 +3513,81 @@ void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner own
>  	iommu_group_put(group);
>  }
>  EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
> +
> +/**
> + * iommu_attach_device_shared() - Attach shared domain to a device
> + * @domain: The shared domain.
> + * @dev: The device.
> + *
> + * Similar to iommu_attach_device(), but allowed for shared-group
> + * devices and guarantees that all devices in an iommu group could
> + * only be attached by the same iommu domain. The caller should
> + * explicitly set the dma ownership of DMA_OWNER_PRIVATE_DOMAIN or
> + * DMA_OWNER_PRIVATE_DOMAIN_USER type before calling it and use the
> + * paired helper iommu_detach_device_shared() for cleanup.
> + */
> +int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev)
> +{
> +	struct iommu_group *group;
> +	int ret = 0;
> +
> +	group = iommu_group_get(dev);
> +	if (!group)
> +		return -ENODEV;
> +
> +	mutex_lock(&group->mutex);
> +	if (group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN &&
> +	    group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER) {
> +		ret = -EPERM;
> +		goto unlock_out;
> +	}
> +
> +	if
Re: [PATCH v4 23/23] nvme-pci: allow mmaping the CMB in userspace
> 	file->private_data = ctrl;
> +
> +	if (ctrl->ops->mmap_file_open)
> +		ctrl->ops->mmap_file_open(ctrl, file);
> +

The callout doesn't really have anything to do with mmap, that is just
how you use it.
Re: [PATCH v4 21/23] mm: use custom page_free for P2PDMA pages
On Wed, Nov 17, 2021 at 02:54:08PM -0700, Logan Gunthorpe wrote:
> When P2PDMA pages are passed to userspace, they will need to be
> reference counted properly and returned to their genalloc after their
> reference count returns to 1. This is accomplished with the existing
> DEV_PAGEMAP_OPS and the .page_free() operation.
>
> Change CONFIG_P2PDMA to select CONFIG_DEV_PAGEMAP_OPS and add
> MEMORY_DEVICE_PCI_P2PDMA to page_is_devmap_managed(),
> devmap_managed_enable_[put|get]() and free_devmap_managed_page().

Uuuh. We are trying hard to kill off this magic free at refcount 1
behavior in the amdgpu device coherent series. We really should not
add more of this.
Re: [PATCH v4 17/23] block: add check when merging zone device pages
> +/*
> + * Consecutive zone device pages should not be merged into the same
> + * sgl or bvec segment with other types of pages or if they belong to
> + * different pgmaps. Otherwise getting the pgmap of a given segment is
> + * not possible without scanning the entire segment. This helper
> + * returns true either if both pages are not zone device pages or both
> + * pages are zone device pages with the same pgmap.
> + */
> +static inline bool zone_device_pages_are_mergeable(const struct page *a,
> +						   const struct page *b)

Merging is only really a use case here. This really checks if they
belong to the same pgmap, so I suspect that should be in the name.
Re: [PATCH v4 16/23] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()
All these new helpers should be _GPL exports, keeping the existing ones
(which should never have been non-GPL exports as tiny wrappers around
GUP-fast) as out of line wrappers.
Re: [PATCH v4 02/23] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL
> +	#
> +	# The need for the scatterlist DMA bus address flag means PCI P2PDMA
> +	# requires 64bit
> +	#
> +	select NEED_SG_DMA_BUS_ADDR_FLAG
> +
> +config NEED_SG_DMA_BUS_ADDR_FLAG
> +	depends on 64BIT
> +	bool

depends does not work for symbols that are selected using select.
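The review comment reflects how Kconfig works: `select` forces a symbol on regardless of that symbol's own `depends on`, so a dependency on the selected side can be silently violated. One way the fix is usually shaped — a hedged sketch, not the actual patch (the real PCI_P2PDMA option body differs):

```kconfig
# Illustrative sketch only: put the 64BIT requirement on the selecting
# symbol, and leave the selected symbol dependency-free.
config PCI_P2PDMA
	bool "PCI peer-to-peer transfer support"
	depends on 64BIT
	select NEED_SG_DMA_BUS_ADDR_FLAG

# With no `depends on` of its own, this symbol can never be selected
# into an invalid state.
config NEED_SG_DMA_BUS_ADDR_FLAG
	bool
```

The alternative is a conditional select (`select FOO if 64BIT`), but hoisting the dependency onto the selector is the more common idiom when the selected symbol is a pure helper flag.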
Re: [PATCH v4 01/23] lib/scatterlist: cleanup macros into static inline functions
On Wed, Nov 17, 2021 at 02:53:48PM -0700, Logan Gunthorpe wrote:
> Convert the sg_is_chain(), sg_is_last() and sg_chain_ptr() macros
> into static inline functions. There's no reason for these to be
> macros, and static inline functions are generally preferred these
> days.
>
> Also introduce the SG_PAGE_LINK_MASK define so the P2PDMA work, which
> is adding another bit to this mask, can do so more easily.
>
> Suggested-by: Jason Gunthorpe
> Signed-off-by: Logan Gunthorpe

Looks fine:

Reviewed-by: Christoph Hellwig

scatterlist.h doesn't have a real maintainer, do you want me to pick
this up through the DMA tree?