Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-21 Thread Hyeonggon Yoo
On Tue, Dec 21, 2021 at 12:58:14AM +0100, Vlastimil Babka wrote:
> On 12/16/21 16:00, Hyeonggon Yoo wrote:
> > On Tue, Dec 14, 2021 at 01:57:22PM +0100, Vlastimil Babka wrote:
> >> On 12/1/21 19:14, Vlastimil Babka wrote:
> >> > Folks from non-slab subsystems are Cc'd only to patches affecting them,
> >> > and this cover letter.
> >> > 
> >> > Series also available in git, based on 5.16-rc3:
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
> >> 
> >> Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small
> >> tweaks and a new patch from Hyeonggon Yoo on top. To avoid too much spam,
> >> here's a range diff:
> > 
> > Reviewing the whole patch series is taking longer than I thought.
> > I'll try to review and test the rest of the patches when I have time.
> > 
> > I added Tested-by if the kernel builds okay and kselftests
> > do not break the kernel on my machine
> > (with CONFIG_SLAB/SLUB/SLOB depending on the patch).
> 
> Thanks!
>

:)

> > Let me know if you know a better way to test a patch.
> 
> Testing on your machine is just fine.
>

Good!

> > # mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when 
> > enabled
> > 
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > 
> > Comment:
> > Works on both SLUB_CPU_PARTIAL and !SLUB_CPU_PARTIAL.
> > btw, do we need slabs_cpu_partial attribute when we don't use
> > cpu partials? (!SLUB_CPU_PARTIAL)
> 
> The sysfs attribute? Yeah we should be consistent to userspace expecting to
> read it (even with zeroes), regardless of config.
> 

I thought entirely disabling the attribute would be simpler,
but okay if it should be exposed even when it's always zero.
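For what it's worth, a rough sketch of the "keep the file, report zero"
approach (an editorial illustration, not the actual mm/slub.c handler,
which also reports per-cpu object counts):

static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
{
    int slabs = 0;

#ifdef CONFIG_SLUB_CPU_PARTIAL
    int cpu;

    for_each_online_cpu(cpu) {
        struct slab *slab = slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu));

        if (slab)
            slabs += slab->slabs;
    }
#endif
    /* With cpu partials compiled out, userspace simply reads zero. */
    return sysfs_emit(buf, "%d\n", slabs);
}
SLAB_ATTR_RO(slabs_cpu_partial);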

> > # mm/slub: Simplify struct slab slabs field definition
> > Comment:
> > 
> > This is how struct page looks on top of the v3r3 branch:
> > struct page {
> > [...]
> > 	struct {	/* slab, slob and slub */
> > 		union {
> > 			struct list_head slab_list;
> > 			struct {	/* Partial pages */
> > 				struct page *next;
> > #ifdef CONFIG_64BIT
> > 				int pages;	/* Nr of pages left */
> > #else
> > 				short int pages;
> > #endif
> > 			};
> > 		};
> > [...]
> > 
> > It's not consistent with struct slab.
> 
> Hm right. But as we don't actually use the struct page version anymore, and
> it's not one of the fields checked by SLAB_MATCH(), we can ignore this.
>

Yeah, this is not a big problem. I just mentioned it because it looked weird
and I didn't know when the patch "mm: Remove slab from struct page" would
come back.
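For context, an approximate reconstruction (not a quote from the branch) of
the struct slab counterpart after "mm/slub: Simplify struct slab slabs field
definition", which is what the struct page layout above no longer matches:

struct slab {
    unsigned long __page_flags;
    union {
        struct list_head slab_list;
        struct {	/* Partial slabs */
            struct slab *next;
            int slabs;	/* Nr of slabs left; no CONFIG_64BIT split */
        };
    };
    /* [...] */
};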

> > I think this is because "mm: Remove slab from struct page" was dropped.
>
> That was just postponed until iommu changes are in. Matthew mentioned those
> might be merged too, so that final cleanup will happen too and take care of
> the discrepancy above, so no need for extra churn to address it specifically.
> 

Okay, it seems no extra work is needed until the iommu changes are in!

BTW, the patch I sent ("mm/slob: Remove unnecessary page_mapcount_reset()
function call") refers to commit 4525180926f9 ("mm/sl*b: Differentiate struct
slab fields by sl*b implementations"). But the commit hash 4525180926f9
changed after the tree was rebased.

It would be nice to have a script to handle situations like this.

> > Would you update some of patches?
> > 
> > # mm/sl*b: Differentiate struct slab fields by sl*b implementations
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Works with SL[AUO]B on my machine and makes the code much better.
> > 
> > # mm/slob: Convert SLOB to use struct slab and struct folio
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > It still works fine on SLOB.
> > 
> > # mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> >
> > # mm/slub: Convert __free_slab() to use struct slab
> > Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> > 
> > Thanks,
> > Hyeonggon.
> 
> Thanks again,
> Vlastimil

Have a nice day, thanks!
Hyeonggon.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups

2021-12-21 Thread Lu Baolu

On 12/22/21 12:22 PM, Lu Baolu wrote:
void iommu_detach_device_shared(struct iommu_domain *domain, struct 
device *dev)


Sorry for the typo. Please ignore the _shared postfix.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups

2021-12-21 Thread Lu Baolu

On 12/22/21 2:46 AM, Jason Gunthorpe wrote:

It's worth taking a step back and realising that overall, this is really
just a more generalised and finer-grained extension of what 426a273834ea
already did for non-group-aware code, so it makes little sense *not* to
integrate it into the existing interfaces.

This is taking 426a to its logical conclusion and *removing* the
group API from the drivers entirely. This is desirable because drivers
cannot do anything sane with the group.

The drivers have struct devices, and so we provide APIs that work in
terms of struct devices to cover both driver use cases today, and do
so more safely than what is already implemented.

Do not mix up VFIO with the driver interface, these are different
things. It is better VFIO stay on its own and not complicate the
driver world.


Per Joerg's previous comments:

https://lore.kernel.org/linux-iommu/2029150612.jhsvsbzisvux2...@8bytes.org/

Commit 426a273834ea was introduced only to disallow attaching a
single device within a group to a different iommu_domain. So it's
reasonable to improve the existing iommu_attach/detach_device() to cover
all cases. How about the code below? Did I miss anything?

int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
{
    struct iommu_group *group;
    int ret = 0;

    group = iommu_group_get(dev);
    if (!group)
        return -ENODEV;

    mutex_lock(&group->mutex);
    if (group->attach_cnt) {
        if (group->domain != domain) {
            ret = -EBUSY;
            goto unlock_out;
        }
    } else {
        ret = __iommu_attach_group(domain, group);
        if (ret)
            goto unlock_out;
    }

    group->attach_cnt++;
unlock_out:
    mutex_unlock(&group->mutex);
    iommu_group_put(group);

    return ret;
}
EXPORT_SYMBOL_GPL(iommu_attach_device);

void iommu_detach_device_shared(struct iommu_domain *domain, struct device *dev)
{
    struct iommu_group *group;

    group = iommu_group_get(dev);
    if (WARN_ON(!group))
        return;

    mutex_lock(&group->mutex);
    if (WARN_ON(!group->attach_cnt || group->domain != domain))
        goto unlock_out;

    if (--group->attach_cnt == 0)
        __iommu_detach_group(domain, group);

unlock_out:
    mutex_unlock(&group->mutex);
    iommu_group_put(group);
}
EXPORT_SYMBOL_GPL(iommu_detach_device);

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] Swiotlb: remove a duplicate include

2021-12-21 Thread Guo Zhengkui
Remove a duplicate "#include ". The deleted one in line 43
is under "#ifdef CONFIG_DMA_RESTRICTED_POOL". However, there is already
one in line 53 with no conditional compile.

Signed-off-by: Guo Zhengkui 
---
 kernel/dma/swiotlb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1b0501fd3e0e..8c091626ca35 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -33,21 +33,20 @@
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
 #ifdef CONFIG_DMA_RESTRICTED_POOL
-#include 
 #include 
 #include 
 #include 
 #include 
 #endif
 
 #include 
 #include 
 
 #include 
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 4/5] iommu/arm-smmu-v3: Add host support for NVIDIA Grace CMDQ-V

2021-12-21 Thread Nicolin Chen via iommu
On Tue, Dec 21, 2021 at 06:55:20PM +, Robin Murphy wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2021-12-20 19:27, Nicolin Chen wrote:
> > Hi Robin,
> > 
> > Thank you for the reply!
> > 
> > On Mon, Dec 20, 2021 at 06:42:26PM +, Robin Murphy wrote:
> > > On 2021-11-19 07:19, Nicolin Chen wrote:
> > > > From: Nate Watterson 
> > > > 
> > > > NVIDIA's Grace SoC has CMDQ-Virtualization (CMDQV) hardware,
> > > > which extends the standard ARM SMMU v3 IP to support multiple
> > > > VCMDQs with virtualization capabilities. In the host OS kernel,
> > > > they're used to reduce contention on a single queue. In terms
> > > > of command queue, they are very much like the standard CMDQ/ECMDQs,
> > > > but only support CS_NONE in the CS field of the CMD_SYNC command.
> > > > 
> > > > This patch adds a new nvidia-grace-cmdqv file and inserts its
> > > > structure pointer into the existing arm_smmu_device, and then
> > > > adds related function calls in the arm-smmu-v3 driver.
> > > > 
> > > > In the CMDQV driver itself, this patch only adds the minimal parts
> > > > for host kernel support. Upon probe(), VINTF0 is reserved for
> > > > in-kernel use, and some of the VCMDQs are assigned to VINTF0.
> > > > Then the driver will select one of the VCMDQs in VINTF0, based
> > > > on the CPU currently executing, to issue commands.
> > > 
> > > Is there a tangible difference to DMA API or VFIO performance?
> > 
> > Our testing environment is currently running on a single-core
> > CPU, so unfortunately we don't have perf data at this point.
> 
> OK, as for the ECMDQ patches I think we'll need some investigation with
> real workloads to judge whether we can benefit from these things enough
> to justify the complexity, and whether the design is right.
> 
> My gut feeling is that if these multi-queue schemes really can live up
> to their promise of making contention negligible, then they should
> further stand to benefit from bypassing the complex lock-free command
> batching in favour of something more lightweight, which could change the
> direction of much of the refactoring.

Makes sense. We will share our perf data once we have a certain
level of support in our test environment.

> > > [...]
> > > > +struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
> > > > +{
> > > > + struct nvidia_grace_cmdqv *cmdqv = smmu->nvidia_grace_cmdqv;
> > > > + struct nvidia_grace_cmdqv_vintf *vintf0 = &cmdqv->vintf0;
> > > > + u16 qidx;
> > > > +
> > > > + /* Check error status of vintf0 */
> > > > + if (!FIELD_GET(VINTF_STATUS, vintf0->status))
> > > > + return &smmu->cmdq;
> > > > +
> > > > + /*
> > > > +  * Select a vcmdq to use. Here we use a temporary solution to
> > > > +  * balance out traffic on cmdq issuing: each cmdq has its own
> > > > +  * lock, if all cpus issue cmdlist using the same cmdq, only
> > > > +  * one CPU at a time can enter the process, while the others
> > > > +  * will be spinning at the same lock.
> > > > +  */
> > > > + qidx = smp_processor_id() % cmdqv->num_vcmdqs_per_vintf;
> > > 
> > > How does ordering work between queues? Do they follow a global order
> > > such that a sync on any queue is guaranteed to complete all prior
> > > commands on all queues?
> > 
> > CMDQV internal scheduler would insert a SYNC when (for example)
> > switching from VCMDQ0 to VCMDQ1 while the last command in VCMDQ0 is
> > not a SYNC. HW has a configuration bit in the register to disable
> > this feature, which is enabled by default.
> 
> Interesting, thanks. So it sounds like this is something you can get
> away with for the moment, but may need to revisit once people chasing
> real-world performance start wanting to turn that bit off.

Yeah, we have limitations on both the testing setup and the available
clients for an in-depth perf measurement at the moment. But
we will surely do as you mentioned. Anyway, this is just for
initial support.

> > > The challenge to make ECMDQ useful to Linux is how to make sure that all
> > > the commands expected to be within scope of a future CMND_SYNC plus that
> > > sync itself all get issued on the same queue, so I'd be mildly surprised
> > > if you didn't have the same problem.
> > 
> > PATCH-3 in this series actually helps align the command queues,
> > between issued commands and SYNC, if bool sync == true. Yet, if
> > doing something like issue->issue->issue_with_sync, it could be
> > trickier.
> 
> Indeed between the iommu_iotlb_gather mechanism and low-level command
> batching things are already a lot more concentrated than they could be,
> but arm_smmu_cmdq_batch_add() and its callers stand out as examples of
> where we'd still be vulnerable to preemption. What I haven't even tried
> to reason about yet is assumptions in the higher-level APIs, e.g. if
> io-pgtable might chuck out a TLBI during an iommu_unmap() which we
> implicitly expect a later iommu_iotlb_sync() to cover.

Though I might have 

Re: [PATCH v3 4/5] iommu/arm-smmu-v3: Add host support for NVIDIA Grace CMDQ-V

2021-12-21 Thread Robin Murphy

On 2021-12-20 19:27, Nicolin Chen wrote:

Hi Robin,

Thank you for the reply!

On Mon, Dec 20, 2021 at 06:42:26PM +, Robin Murphy wrote:

On 2021-11-19 07:19, Nicolin Chen wrote:

From: Nate Watterson 

NVIDIA's Grace SoC has CMDQ-Virtualization (CMDQV) hardware,
which extends the standard ARM SMMU v3 IP to support multiple
VCMDQs with virtualization capabilities. In the host OS kernel,
they're used to reduce contention on a single queue. In terms
of command queue, they are very much like the standard CMDQ/ECMDQs,
but only support CS_NONE in the CS field of the CMD_SYNC command.

This patch adds a new nvidia-grace-cmdqv file and inserts its
structure pointer into the existing arm_smmu_device, and then
adds related function calls in the arm-smmu-v3 driver.

In the CMDQV driver itself, this patch only adds the minimal parts
for host kernel support. Upon probe(), VINTF0 is reserved for
in-kernel use, and some of the VCMDQs are assigned to VINTF0.
Then the driver will select one of the VCMDQs in VINTF0, based
on the CPU currently executing, to issue commands.


Is there a tangible difference to DMA API or VFIO performance?


Our testing environment is currently running on a single-core
CPU, so unfortunately we don't have perf data at this point.


OK, as for the ECMDQ patches I think we'll need some investigation with 
real workloads to judge whether we can benefit from these things enough 
to justify the complexity, and whether the design is right.


My gut feeling is that if these multi-queue schemes really can live up 
to their promise of making contention negligible, then they should 
further stand to benefit from bypassing the complex lock-free command 
batching in favour of something more lightweight, which could change the 
direction of much of the refactoring.



[...]

+struct arm_smmu_cmdq *nvidia_grace_cmdqv_get_cmdq(struct arm_smmu_device *smmu)
+{
+ struct nvidia_grace_cmdqv *cmdqv = smmu->nvidia_grace_cmdqv;
+ struct nvidia_grace_cmdqv_vintf *vintf0 = &cmdqv->vintf0;
+ u16 qidx;
+
+ /* Check error status of vintf0 */
+ if (!FIELD_GET(VINTF_STATUS, vintf0->status))
+ return &smmu->cmdq;
+
+ /*
+  * Select a vcmdq to use. Here we use a temporary solution to
+  * balance out traffic on cmdq issuing: each cmdq has its own
+  * lock, if all cpus issue cmdlist using the same cmdq, only
+  * one CPU at a time can enter the process, while the others
+  * will be spinning at the same lock.
+  */
+ qidx = smp_processor_id() % cmdqv->num_vcmdqs_per_vintf;


How does ordering work between queues? Do they follow a global order
such that a sync on any queue is guaranteed to complete all prior
commands on all queues?


CMDQV internal scheduler would insert a SYNC when (for example)
switching from VCMDQ0 to VCMDQ1 while the last command in VCMDQ0 is
not a SYNC. HW has a configuration bit in the register to disable
this feature, which is enabled by default.


Interesting, thanks. So it sounds like this is something you can get 
away with for the moment, but may need to revisit once people chasing 
real-world performance start wanting to turn that bit off.



The challenge to make ECMDQ useful to Linux is how to make sure that all
the commands expected to be within scope of a future CMND_SYNC plus that
sync itself all get issued on the same queue, so I'd be mildly surprised
if you didn't have the same problem.


PATCH-3 in this series actually helps align the command queues,
between issued commands and SYNC, if bool sync == true. Yet, if
doing something like issue->issue->issue_with_sync, it could be
trickier.


Indeed between the iommu_iotlb_gather mechanism and low-level command 
batching things are already a lot more concentrated than they could be, 
but arm_smmu_cmdq_batch_add() and its callers stand out as examples of 
where we'd still be vulnerable to preemption. What I haven't even tried 
to reason about yet is assumptions in the higher-level APIs, e.g. if 
io-pgtable might chuck out a TLBI during an iommu_unmap() which we 
implicitly expect a later iommu_iotlb_sync() to cover.
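As a rough editorial sketch of that hazard (the calls are the generic IOMMU
API; the CPU/VCMDQ annotations describe the hypothetical preemption scenario,
not anything this series already does):

static void example_unmap(struct iommu_domain *domain, unsigned long iova,
                          size_t size)
{
    struct iommu_iotlb_gather gather;

    iommu_iotlb_gather_init(&gather);

    /* io-pgtable may queue TLBIs here, batched on the VCMDQ picked for CPU A */
    iommu_unmap_fast(domain, iova, size, &gather);

    /* ...if the task is preempted and migrates to CPU B in between... */

    /* ...this CMD_SYNC may end up on a different VCMDQ than the TLBIs
     * it is implicitly expected to fence. */
    iommu_iotlb_sync(domain, &gather);
}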


I've been thinking that in many ways per-domain queues make quite a bit 
of sense and would be easier to manage than per-CPU ones - plus that's 
pretty much the usage model once we get to VMs anyway - but that fails 
to help the significant cases like networking and storage where many 
CPUs are servicing a big monolithic device in a single domain :(


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups

2021-12-21 Thread Jason Gunthorpe via iommu
On Tue, Dec 21, 2021 at 04:50:56PM +, Robin Murphy wrote:

> this proposal is the worst of both worlds, in that drivers still have to be
> just as aware of groups in order to know whether to call the _shared
> interface or not, except it's now entirely implicit and non-obvious.

Drivers are not aware of groups, where did you see that?

Drivers have to indicate their intention, based entirely on their own
internal design. If groups are present, or not is irrelevant to the
driver.

If the driver uses a single struct device (which is most) then it uses
iommu_attach_device().

If the driver uses multiple struct devices and intends to connect them
all to the same domain then it uses the _shared variant. The only
difference between the two is that the _shared variant lacks some of the
protections against driver abuse of the API.
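To make the split concrete, a minimal usage sketch (hypothetical driver,
error handling trimmed; only the attach/detach function names come from this
series):

/* Hypothetical multi-device driver: both devices share one domain. */
static int mydrv_attach_pair(struct iommu_domain *domain,
                             struct device *dev_a, struct device *dev_b)
{
    int ret;

    ret = iommu_attach_device_shared(domain, dev_a); /* first call attaches the group */
    if (ret)
        return ret;

    ret = iommu_attach_device_shared(domain, dev_b); /* later callers simply join */
    if (ret)
        iommu_detach_device_shared(domain, dev_a);

    return ret;
}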

Nothing uses the group interface except for VFIO and stuff inside
drivers/iommu. VFIO has a uAPI tied to the group interface and it
is stuck with it.

> Otherwise just add the housekeeping stuff to iommu_{attach,detach}_group() -
> there's no way we want *three* attach/detach interfaces all with different
> semantics.

I'm not sure why you think 3 APIs is a bad thing. Three APIs, with
clearly intended purposes, are a lot better than one giant API with a
bunch of parameters that tries to do everything.

In this case, it is not simple to 'add the housekeeping' to
iommu_attach_group() in a way that is useful to both tegra and
VFIO. What tegra wants is what the _shared API implements, and that
logic should not be open coded in drivers.

VFIO does not want exactly that, it has its own logic to deal directly
with groups tied to its uAPI. Due to the uAPI it doesn't even have a
struct device, unfortunately.

The reason there are three APIs is because there are three different
use-cases. It is not a bad thing to have APIs designed for the use cases
they serve.

> It's worth taking a step back and realising that overall, this is really
> just a more generalised and finer-grained extension of what 426a273834ea
> already did for non-group-aware code, so it makes little sense *not* to
> integrate it into the existing interfaces.

This is taking 426a to its logical conclusion and *removing* the
group API from the drivers entirely. This is desirable because drivers
cannot do anything sane with the group.

The drivers have struct devices, and so we provide APIs that work in
terms of struct devices to cover both driver use cases today, and do
so more safely than what is already implemented.

Do not mix up VFIO with the driver interface, these are different
things. It is better VFIO stay on its own and not complicate the
driver world.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 21/23] mm: use custom page_free for P2PDMA pages

2021-12-21 Thread Logan Gunthorpe



On 2021-12-21 2:06 a.m., Christoph Hellwig wrote:
> On Wed, Nov 17, 2021 at 02:54:08PM -0700, Logan Gunthorpe wrote:
>> When P2PDMA pages are passed to userspace, they will need to be
>> reference counted properly and returned to their genalloc after their
>> reference count returns to 1. This is accomplished with the existing
>> DEV_PAGEMAP_OPS and the .page_free() operation.
>>
>> Change CONFIG_P2PDMA to select CONFIG_DEV_PAGEMAP_OPS and add
>> MEMORY_DEVICE_PCI_P2PDMA to page_is_devmap_managed(),
>> devmap_managed_enable_[put|get]() and free_devmap_managed_page().
> 
> Uuuh.  We are trying hard to kill off this magic free at refcount 1
> behavior in the amdgpu device coherent series.  We really should not
> add more of this.

Ah, ok. I found Ralph's patch that cleans this up and I can try to
rebase this onto it for future postings until it gets merged.

Your other comments I can address for the next time I post this series.

Thanks for the review!

Logan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-21 Thread Robin Murphy

On 2021-12-20 23:58, Vlastimil Babka wrote:

On 12/16/21 16:00, Hyeonggon Yoo wrote:

On Tue, Dec 14, 2021 at 01:57:22PM +0100, Vlastimil Babka wrote:

On 12/1/21 19:14, Vlastimil Babka wrote:

Folks from non-slab subsystems are Cc'd only to patches affecting them, and
this cover letter.

Series also available in git, based on 5.16-rc3:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2


Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small
tweaks and a new patch from Hyeonggon Yoo on top. To avoid too much spam,
here's a range diff:


Reviewing the whole patch series is taking longer than I thought.
I'll try to review and test the rest of the patches when I have time.

I added Tested-by if the kernel builds okay and kselftests
do not break the kernel on my machine
(with CONFIG_SLAB/SLUB/SLOB depending on the patch).


Thanks!


Let me know if you know a better way to test a patch.


Testing on your machine is just fine.


# mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when 
enabled

Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>

Comment:
Works on both SLUB_CPU_PARTIAL and !SLUB_CPU_PARTIAL.
btw, do we need slabs_cpu_partial attribute when we don't use
cpu partials? (!SLUB_CPU_PARTIAL)


The sysfs attribute? Yeah we should be consistent to userspace expecting to
read it (even with zeroes), regardless of config.


# mm/slub: Simplify struct slab slabs field definition
Comment:

This is how struct page looks on top of the v3r3 branch:
struct page {
[...]
	struct {	/* slab, slob and slub */
		union {
			struct list_head slab_list;
			struct {	/* Partial pages */
				struct page *next;
#ifdef CONFIG_64BIT
				int pages;	/* Nr of pages left */
#else
				short int pages;
#endif
			};
		};
[...]

It's not consistent with struct slab.


Hm right. But as we don't actually use the struct page version anymore, and
it's not one of the fields checked by SLAB_MATCH(), we can ignore this.


I think this is because "mm: Remove slab from struct page" was dropped.


That was just postponed until iommu changes are in. Matthew mentioned those
might be merged too, so that final cleanup will happen too and take care of
the discrepancy above, so no need for extra churn to address it specifically.


FYI the IOMMU changes are now queued in linux-next, so if all goes well 
you might be able to sneak that final patch in too.


Robin.




Would you update some of patches?

# mm/sl*b: Differentiate struct slab fields by sl*b implementations
Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
Works with SL[AUO]B on my machine and makes the code much better.

# mm/slob: Convert SLOB to use struct slab and struct folio
Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
It still works fine on SLOB.

# mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>

# mm/slub: Convert __free_slab() to use struct slab
Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>

Thanks,
Hyeonggon.


Thanks again,
Vlastimil
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 01/23] lib/scatterlist: cleanup macros into static inline functions

2021-12-21 Thread Logan Gunthorpe



On 2021-12-21 2:00 a.m., Christoph Hellwig wrote:
> On Wed, Nov 17, 2021 at 02:53:48PM -0700, Logan Gunthorpe wrote:
>> Convert the sg_is_chain(), sg_is_last() and sg_chain_ptr() macros
>> into static inline functions. There's no reason for these to be macros
>> and static inline are generally preferred these days.
>>
>> Also introduce the SG_PAGE_LINK_MASK define so the P2PDMA work, which is
>> adding another bit to this mask, can do so more easily.
>>
>> Suggested-by: Jason Gunthorpe 
>> Signed-off-by: Logan Gunthorpe 
> 
> Looks fine:
> 
> Reviewed-by: Christoph Hellwig 
> 
> scatterlist.h doesn't have a real maintainer, do you want me to pick
> this up through the DMA tree?

Sure, that would be great!

Thanks,

Logan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 07/13] iommu: Add iommu_at[de]tach_device_shared() for multi-device groups

2021-12-21 Thread Robin Murphy

On 2021-12-17 06:37, Lu Baolu wrote:

The iommu_attach/detach_device() interfaces were exposed for the device
drivers to attach/detach their own domains. The commit <426a273834eae>
("iommu: Limit iommu_attach/detach_device to device with their own group")
restricted them to singleton groups to avoid different device in a group
attaching different domain.

As we've introduced device DMA ownership into the iommu core, we can now
introduce interfaces for multiple-device groups, and "all devices are in the
same address space" is still guaranteed.

The iommu_attach/detach_device_shared() could be used when multiple drivers
sharing the group claim the DMA_OWNER_PRIVATE_DOMAIN ownership. The first
call of iommu_attach_device_shared() attaches the domain to the group.
Other drivers could join it later. The domain will be detached from the
group after all drivers unjoin it.


I don't see the point of this at all - if you really want to hide the 
concept of IOMMU groups away from drivers then just make 
iommu_{attach,detach}_device() do the right thing. At least the 
iommu_group_get_for_dev() plus iommu_{attach,detach}_group() API is 
clear - this proposal is the worst of both worlds, in that drivers still 
have to be just as aware of groups in order to know whether to call the 
_shared interface or not, except it's now entirely implicit and non-obvious.


Otherwise just add the housekeeping stuff to 
iommu_{attach,detach}_group() - there's no way we want *three* 
attach/detach interfaces all with different semantics.


It's worth taking a step back and realising that overall, this is really 
just a more generalised and finer-grained extension of what 426a273834ea 
already did for non-group-aware code, so it makes little sense *not* to 
integrate it into the existing interfaces.


Robin.


Signed-off-by: Jason Gunthorpe 
Signed-off-by: Lu Baolu 
Tested-by: Dmitry Osipenko 
---
  include/linux/iommu.h | 13 +++
  drivers/iommu/iommu.c | 79 +++
  2 files changed, 92 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5ad4cf13370d..1bc03118dfb3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -703,6 +703,8 @@ int iommu_group_set_dma_owner(struct iommu_group *group, 
enum iommu_dma_owner ow
  void *owner_cookie);
  void iommu_group_release_dma_owner(struct iommu_group *group, enum 
iommu_dma_owner owner);
  bool iommu_group_dma_owner_unclaimed(struct iommu_group *group);
+int iommu_attach_device_shared(struct iommu_domain *domain, struct device 
*dev);
+void iommu_detach_device_shared(struct iommu_domain *domain, struct device 
*dev);
  
  #else /* CONFIG_IOMMU_API */
  
@@ -743,11 +745,22 @@ static inline int iommu_attach_device(struct iommu_domain *domain,

return -ENODEV;
  }
  
+static inline int iommu_attach_device_shared(struct iommu_domain *domain,

+struct device *dev)
+{
+   return -ENODEV;
+}
+
  static inline void iommu_detach_device(struct iommu_domain *domain,
   struct device *dev)
  {
  }
  
+static inline void iommu_detach_device_shared(struct iommu_domain *domain,

+ struct device *dev)
+{
+}
+
  static inline struct iommu_domain *iommu_get_domain_for_dev(struct device 
*dev)
  {
return NULL;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8bec71b1cc18..3ad66cb9bedc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -50,6 +50,7 @@ struct iommu_group {
struct list_head entry;
enum iommu_dma_owner dma_owner;
unsigned int owner_cnt;
+   unsigned int attach_cnt;
void *owner_cookie;
  };
  
@@ -3512,3 +3513,81 @@ void iommu_device_release_dma_owner(struct device *dev, enum iommu_dma_owner own

iommu_group_put(group);
  }
  EXPORT_SYMBOL_GPL(iommu_device_release_dma_owner);
+
+/**
+ * iommu_attach_device_shared() - Attach shared domain to a device
+ * @domain: The shared domain.
+ * @dev: The device.
+ *
+ * Similar to iommu_attach_device(), but allowed for shared-group devices
+ * and guarantees that all devices in an iommu group could only be attached
+ * by a same iommu domain. The caller should explicitly set the dma ownership
+ * of DMA_OWNER_PRIVATE_DOMAIN or DMA_OWNER_PRIVATE_DOMAIN_USER type before
+ * calling it and use the paired helper iommu_detach_device_shared() for
+ * cleanup.
+ */
+int iommu_attach_device_shared(struct iommu_domain *domain, struct device *dev)
+{
+   struct iommu_group *group;
+   int ret = 0;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return -ENODEV;
+
+   mutex_lock(&group->mutex);
+   if (group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN &&
+   group->dma_owner != DMA_OWNER_PRIVATE_DOMAIN_USER) {
+   ret = -EPERM;
+   goto unlock_out;
+   }
+
+   if 

Re: [PATCH v4 23/23] nvme-pci: allow mmaping the CMB in userspace

2021-12-21 Thread Christoph Hellwig
>   file->private_data = ctrl;
> +
> + if (ctrl->ops->mmap_file_open)
> + ctrl->ops->mmap_file_open(ctrl, file);
> +

The callout doesn't really have anything to do with mmap, that is just
how you use it.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 21/23] mm: use custom page_free for P2PDMA pages

2021-12-21 Thread Christoph Hellwig
On Wed, Nov 17, 2021 at 02:54:08PM -0700, Logan Gunthorpe wrote:
> When P2PDMA pages are passed to userspace, they will need to be
> reference counted properly and returned to their genalloc after their
> reference count returns to 1. This is accomplished with the existing
> DEV_PAGEMAP_OPS and the .page_free() operation.
> 
> Change CONFIG_P2PDMA to select CONFIG_DEV_PAGEMAP_OPS and add
> MEMORY_DEVICE_PCI_P2PDMA to page_is_devmap_managed(),
> devmap_managed_enable_[put|get]() and free_devmap_managed_page().

Uuuh.  We are trying hard to kill off this magic free at refcount 1
behavior in the amdgpu device coherent series.  We really should not
add more of this.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 17/23] block: add check when merging zone device pages

2021-12-21 Thread Christoph Hellwig
> +/*
> + * Consecutive zone device pages should not be merged into the same sgl
> + * or bvec segment with other types of pages or if they belong to different
> + * pgmaps. Otherwise getting the pgmap of a given segment is not possible
> + * without scanning the entire segment. This helper returns true either if
> + * both pages are not zone device pages or both pages are zone device pages
> + * with the same pgmap.
> + */
> +static inline bool zone_device_pages_are_mergeable(const struct page *a,
> +const struct page *b)

Merging is only really a use case here.  This really checks if they
belong to the same pgmap, so I suspect that should be in the name.
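Something along these lines, presumably (illustrative only: the name is a
suggestion matching the comment, and the body just mirrors the semantics
described above):

static inline bool zone_device_pages_have_same_pgmap(const struct page *a,
                                                     const struct page *b)
{
    if (is_zone_device_page(a) != is_zone_device_page(b))
        return false;

    if (!is_zone_device_page(a))
        return true;

    return a->pgmap == b->pgmap;
}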
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 16/23] iov_iter: introduce iov_iter_get_pages_[alloc_]flags()

2021-12-21 Thread Christoph Hellwig
All these new helpers should be _GPL exports, keeping the existing
ones (which should never have been non-GPL exports as tiny wrappers
around GUP-fast) as out of line wrappers.
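In other words, something of this shape (a sketch: the gup_flags argument of
the new helper is inferred from the patch title, and the helper is assumed to
be defined earlier in lib/iov_iter.c):

/* New flags-taking helper gets the GPL-only export... */
EXPORT_SYMBOL_GPL(iov_iter_get_pages_flags);

/* ...while the historical entry point stays exported, now as a thin
 * out-of-line wrapper that passes no extra GUP flags. */
ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
                           size_t maxsize, unsigned maxpages, size_t *start)
{
    return iov_iter_get_pages_flags(i, pages, maxsize, maxpages, start, 0);
}
EXPORT_SYMBOL(iov_iter_get_pages);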
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 02/23] lib/scatterlist: add flag for indicating P2PDMA segments in an SGL

2021-12-21 Thread Christoph Hellwig
> + #
> + # The need for the scatterlist DMA bus address flag means PCI P2PDMA
> + # requires 64bit
> + #
> + select NEED_SG_DMA_BUS_ADDR_FLAG

> +config NEED_SG_DMA_BUS_ADDR_FLAG
> + depends on 64BIT
> + bool

depends does not work for symbols that are selected using select.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 01/23] lib/scatterlist: cleanup macros into static inline functions

2021-12-21 Thread Christoph Hellwig
On Wed, Nov 17, 2021 at 02:53:48PM -0700, Logan Gunthorpe wrote:
> Convert the sg_is_chain(), sg_is_last() and sg_chain_ptr() macros
> into static inline functions. There's no reason for these to be macros
> and static inline are generally preferred these days.
> 
> Also introduce the SG_PAGE_LINK_MASK define so the P2PDMA work, which is
> adding another bit to this mask, can do so more easily.
> 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Logan Gunthorpe 

Looks fine:

Reviewed-by: Christoph Hellwig 

scatterlist.h doesn't have a real maintainer, do you want me to pick
this up through the DMA tree?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu