Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-07-01 Thread Robin Murphy

On 2022-07-01 12:33, John Garry wrote:

On 01/07/2022 04:56, Feng Tang wrote:

inclination.


ok, what you are saying sounds reasonable. I just remember that when we
analyzed the longterm aging issue, we concluded that the FQ size and its
relation to the magazine size was a factor, and this change makes me a little
worried about new issues. Better the devil you know and all that...

Anyway, if I get some time I might do some testing to see if this change has
any influence.

Another thought is whether we even need to store the size in the iova_magazine?
Mags in the depot are always full. As such, we only need to worry about mags
loaded in the cpu rcache and their sizes, so maybe we could have something like
this:

struct iova_magazine {
-   unsigned long size;
    unsigned long pfns[IOVA_MAG_SIZE];
};

@@ -631,6 +630,8 @@ struct iova_cpu_rcache {
    spinlock_t lock;
    struct iova_magazine *loaded;
    struct iova_magazine *prev;
+   int loaded_size;
+   int prev_size;
};

I haven't tried to implement it though..

I have very little knowledge of iova, so you can choose whichever is the better
solution. I just wanted to raise the problem and will be happy to see
it solved :)


I quickly tested your patch for performance and saw no noticeable 
difference, which is no surprise.


But I'll defer to Robin if he thinks that your patch is a better 
solution - I would guess that he does. For me personally I would prefer 
that this value was not changed, as I mentioned before.


This idea is interesting, but it would mean a bit more fiddly work to 
keep things in sync when magazines are allocated, freed and swapped 
around. It seems like the kind of non-obvious thing that might make 
sense if it gave a significant improvement in cache locality or 
something like that, but for simply fixing an allocation size it feels a 
bit too wacky.


From my perspective, indeed I'd rather do the simple thing for now to 
address the memory wastage issue directly, then we can do the deeper 
performance analysis on top to see if further tweaking of magazine sizes 
and/or design is justified.


Cheers,
Robin.

Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-07-01 Thread John Garry via iommu

On 01/07/2022 04:56, Feng Tang wrote:

inclination.


ok, what you are saying sounds reasonable. I just remember that when we
analyzed the longterm aging issue, we concluded that the FQ size and its
relation to the magazine size was a factor, and this change makes me a little
worried about new issues. Better the devil you know and all that...

Anyway, if I get some time I might do some testing to see if this change has
any influence.

Another thought is whether we even need to store the size in the iova_magazine?
Mags in the depot are always full. As such, we only need to worry about mags
loaded in the cpu rcache and their sizes, so maybe we could have something like
this:

struct iova_magazine {
-   unsigned long size;
    unsigned long pfns[IOVA_MAG_SIZE];
};

@@ -631,6 +630,8 @@ struct iova_cpu_rcache {
    spinlock_t lock;
    struct iova_magazine *loaded;
    struct iova_magazine *prev;
+   int loaded_size;
+   int prev_size;
};

I haven't tried to implement it though..

I have very little knowledge of iova, so you can choose whichever is the better
solution. I just wanted to raise the problem and will be happy to see
it solved :)


I quickly tested your patch for performance and saw no noticeable 
difference, which is no surprise.


But I'll defer to Robin if he thinks that your patch is a better 
solution - I would guess that he does. For me personally I would prefer 
that this value was not changed, as I mentioned before.


thanks,
John


Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread Feng Tang
Hi John,

On Thu, Jun 30, 2022 at 11:52:18AM +0100, John Garry wrote:
> > > > 
> > > > >    [    4.319253] iommu: Adding device :06:00.2 to group 5
> > > > >    [    4.325869] iommu: Adding device :20:01.0 to group 15
> > > > >    [    4.332648] iommu: Adding device :20:02.0 to group 16
> > > > >    [    4.338946] swapper/0 invoked oom-killer:
> > > > > gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null),
> > > > > order=0, oom_score_adj=0
> > > > >    [    4.350251] swapper/0 cpuset=/ mems_allowed=0
> > > > >    [    4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not
> > > > > tainted 4.19.57.mx64.282 #1
> > > > >    [    4.355612] Hardware name: Dell Inc. PowerEdge
> > > > > R7425/08V001, BIOS 1.9.3 06/25/2019
> > > > >    [    4.355612] Call Trace:
> > > > >    [    4.355612]  dump_stack+0x46/0x5b
> > > > >    [    4.355612]  dump_header+0x6b/0x289
> > > > >    [    4.355612]  out_of_memory+0x470/0x4c0
> > > > >    [    4.355612]  __alloc_pages_nodemask+0x970/0x1030
> > > > >    [    4.355612]  cache_grow_begin+0x7d/0x520
> > > > >    [    4.355612]  fallback_alloc+0x148/0x200
> > > > >    [    4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
> > > > >    [    4.355612]  init_iova_domain+0x112/0x170
> 
> Note for Feng Tang: This callchain does not exist anymore since we separated
> out the rcache init from the IOVA domain init. Indeed, not so much memory is
> wasted for unused rcaches now.
 
Thanks for the info. I also planned to remove the callstack, as Robin suggested.
 

> My point really is that it would be nicer to see a modern callchain - but
> don't read that as me saying that the change in this patch is bad.
> 
> > > > >    [    4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
> > > > >    [    4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
> > > > >    [    4.355612]  amd_iommu_add_device+0x13a/0x610
> > > > >    [    4.355612]  add_iommu_group+0x20/0x30
> > > > >    [    4.355612]  bus_for_each_dev+0x76/0xc0
> > > > >    [    4.355612]  bus_set_iommu+0xb6/0xf0
> > > > >    [    4.355612]  amd_iommu_init_api+0x112/0x132
> > > > >    [    4.355612]  state_next+0xfb1/0x1165
> > > > >    [    4.355612]  amd_iommu_init+0x1f/0x67
> > > > >    [    4.355612]  pci_iommu_init+0x16/0x3f
> > > > >    ...
> > > > >    [    4.670295] Unreclaimable slab info:
> > > > >    ...
> > > > >    [    4.857565] kmalloc-2048   59164KB  59164KB
> > > > > 
> > > > > Change IOVA_MAG_SIZE from 128 to 127 to make size of 'iova_magazine'
> > > > > 1024 bytes so that no memory will be wasted.
> > > > > 
> > > > > [1]. https://lkml.org/lkml/2019/8/12/266
> > > > > 
> > > > > Signed-off-by: Feng Tang 
> > > > > ---
> > > > >   drivers/iommu/iova.c | 7 ++-
> > > > >   1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> > > > > index db77aa675145b..27634ddd9b904 100644
> > > > > --- a/drivers/iommu/iova.c
> > > > > +++ b/drivers/iommu/iova.c
> > > > > @@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
> > > > >    * dynamic size tuning described in the paper.
> > > > >    */
> > > > > -#define IOVA_MAG_SIZE 128
> > > > > +/*
> > > > > + * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
> > > > > + * assure size of 'iova_magzine' to be 1024 bytes, so that no memory
> > > > 
> > > > Typo: iova_magazine
> > > > 
> > > > > + * will be wasted.
> > > > > + */
> > > > > +#define IOVA_MAG_SIZE 127
> > > 
> > > I do wonder if we will see some strange new behaviour since
> > > IOVA_FQ_SIZE % IOVA_MAG_SIZE != 0 now...
> > 
> > I doubt it - even if a flush queue does happen to be entirely full of
> > equal-sized IOVAs, a CPU's loaded magazines also both being perfectly
> > empty when it comes to dump a full fq seem further unlikely, so in
> > practice I don't see this making any appreciable change to the
> > likelihood of spilling back to the depot or not. In fact the smaller the
> > magazines get, the less time would be spent flushing the depot back to
> > the rbtree, where your interesting workload falls off the cliff and
> > never catches back up with the fq timer, so at some point it might even
> > improve (unless it's also already close to the point where smaller
> > caches would bottleneck allocation)... might be interesting to
> > experiment with a wider range of magazine sizes if you had the time and
> > inclination.
> > 
> 
> ok, what you are saying sounds reasonable. I just remember that when we
> analyzed the longterm aging issue, we concluded that the FQ size and its
> relation to the magazine size was a factor, and this change makes me a little
> worried about new issues. Better the devil you know and all that...
> 
> Anyway, if I get some time I might do some testing to see if this change has
> any influence.
> 
> Another thought is whether we even need to store the size in the iova_magazine?
> Mags in the depot are always full. As such, we only need to worry about mags
> loaded in the cpu rcache and their sizes, 

Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread Feng Tang
Hi Robin,

On Thu, Jun 30, 2022 at 10:02:00AM +0100, Robin Murphy wrote:
> On 2022-06-30 08:33, Feng Tang wrote:
> > kmalloc will round up the request size to power of 2, and current
> > iova_magazine's size is 1032 (1024+8) bytes, so each instance
> > allocated will get 2048 bytes from kmalloc, causing around 1KB
> > waste.
> > 
> > And in some exstreme case, the memory wasted can trigger OOM as
> > reported in 2019 on a crash kernel with 256 MB memory [1].
> 
> I don't think it really needs pointing out that excessive memory consumption
> can cause OOM. Especially not in the particularly silly context of a system
> with only 2MB of RAM per CPU - that's pretty much guaranteed to be doomed
> one way or another.

Yes, 256MB for 128 CPUs is kind of extreme; I will remove this.

> >[4.319253] iommu: Adding device :06:00.2 to group 5
> >[4.325869] iommu: Adding device :20:01.0 to group 15
> >[4.332648] iommu: Adding device :20:02.0 to group 16
> >[4.338946] swapper/0 invoked oom-killer: 
> > gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
> > oom_score_adj=0
> >[4.350251] swapper/0 cpuset=/ mems_allowed=0
> >[4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> > 4.19.57.mx64.282 #1
> >[4.355612] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 
> > 1.9.3 06/25/2019
> >[4.355612] Call Trace:
> >[4.355612]  dump_stack+0x46/0x5b
> >[4.355612]  dump_header+0x6b/0x289
> >[4.355612]  out_of_memory+0x470/0x4c0
> >[4.355612]  __alloc_pages_nodemask+0x970/0x1030
> >[4.355612]  cache_grow_begin+0x7d/0x520
> >[4.355612]  fallback_alloc+0x148/0x200
> >[4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
> >[4.355612]  init_iova_domain+0x112/0x170
> >[4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
> >[4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
> >[4.355612]  amd_iommu_add_device+0x13a/0x610
> >[4.355612]  add_iommu_group+0x20/0x30
> >[4.355612]  bus_for_each_dev+0x76/0xc0
> >[4.355612]  bus_set_iommu+0xb6/0xf0
> >[4.355612]  amd_iommu_init_api+0x112/0x132
> >[4.355612]  state_next+0xfb1/0x1165
> >[4.355612]  amd_iommu_init+0x1f/0x67
> >[4.355612]  pci_iommu_init+0x16/0x3f
> >...
> >[4.670295] Unreclaimable slab info:
> >...
> >[4.857565] kmalloc-2048   59164KB  59164KB
> > 
> > Change IOVA_MAG_SIZE from 128 to 127 to make size of 'iova_magazine'
> > 1024 bytes so that no memory will be wasted.
> > 
> > [1]. https://lkml.org/lkml/2019/8/12/266
> > 
> > Signed-off-by: Feng Tang 
> > ---
> >   drivers/iommu/iova.c | 7 ++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> > index db77aa675145b..27634ddd9b904 100644
> > --- a/drivers/iommu/iova.c
> > +++ b/drivers/iommu/iova.c
> > @@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
> >* dynamic size tuning described in the paper.
> >*/
> > -#define IOVA_MAG_SIZE 128
> > +/*
> > + * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
> > + * assure size of 'iova_magzine' to be 1024 bytes, so that no memory
> 
> Typo: iova_magazine

will fix.

> > + * will be wasted.
> > + */
> > +#define IOVA_MAG_SIZE 127
> 
> The change itself seems perfectly reasonable, though.
> 
> Acked-by: Robin Murphy 
 
Thanks for the review!

- Feng

> >   #define MAX_GLOBAL_MAGS 32/* magazines per bin */
> >   struct iova_magazine {


[PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread Feng Tang
kmalloc will round up the request size to power of 2, and current
iova_magazine's size is 1032 (1024+8) bytes, so each instance
allocated will get 2048 bytes from kmalloc, causing around 1KB
waste.
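
For reference, the arithmetic can be checked with a trivial userspace sketch
(not part of the patch; the struct names are made up and a 64-bit unsigned
long is assumed):

  #include <stdio.h>

  /* layout with the current IOVA_MAG_SIZE of 128 */
  struct mag_128 {
          unsigned long size;             /* 8 bytes */
          unsigned long pfns[128];        /* 128 * 8 = 1024 bytes */
  };                                      /* 1032 total -> kmalloc rounds up to 2048 */

  /* layout with the proposed IOVA_MAG_SIZE of 127 */
  struct mag_127 {
          unsigned long size;             /* 8 bytes */
          unsigned long pfns[127];        /* 127 * 8 = 1016 bytes */
  };                                      /* 1024 total -> fits the kmalloc-1024 slab exactly */

  int main(void)
  {
          printf("128 entries: %zu bytes\n", sizeof(struct mag_128));  /* 1032 */
          printf("127 entries: %zu bytes\n", sizeof(struct mag_127));  /* 1024 */
          return 0;
  }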

And in some exstreme case, the memory wasted can trigger OOM as
reported in 2019 on a crash kernel with 256 MB memory [1].

  [4.319253] iommu: Adding device :06:00.2 to group 5
  [4.325869] iommu: Adding device :20:01.0 to group 15
  [4.332648] iommu: Adding device :20:02.0 to group 16
  [4.338946] swapper/0 invoked oom-killer: 
gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
oom_score_adj=0
  [4.350251] swapper/0 cpuset=/ mems_allowed=0
  [4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.57.mx64.282 #1
  [4.355612] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.9.3 
06/25/2019
  [4.355612] Call Trace:
  [4.355612]  dump_stack+0x46/0x5b
  [4.355612]  dump_header+0x6b/0x289
  [4.355612]  out_of_memory+0x470/0x4c0
  [4.355612]  __alloc_pages_nodemask+0x970/0x1030
  [4.355612]  cache_grow_begin+0x7d/0x520
  [4.355612]  fallback_alloc+0x148/0x200
  [4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
  [4.355612]  init_iova_domain+0x112/0x170
  [4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
  [4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
  [4.355612]  amd_iommu_add_device+0x13a/0x610
  [4.355612]  add_iommu_group+0x20/0x30
  [4.355612]  bus_for_each_dev+0x76/0xc0
  [4.355612]  bus_set_iommu+0xb6/0xf0
  [4.355612]  amd_iommu_init_api+0x112/0x132
  [4.355612]  state_next+0xfb1/0x1165
  [4.355612]  amd_iommu_init+0x1f/0x67
  [4.355612]  pci_iommu_init+0x16/0x3f
  ...
  [4.670295] Unreclaimable slab info:
  ...
  [4.857565] kmalloc-2048   59164KB  59164KB

Change IOVA_MAG_SIZE from 128 to 127 to make size of 'iova_magazine'
1024 bytes so that no memory will be wasted.

[1]. https://lkml.org/lkml/2019/8/12/266

Signed-off-by: Feng Tang 
---
 drivers/iommu/iova.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145b..27634ddd9b904 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
  * dynamic size tuning described in the paper.
  */
 
-#define IOVA_MAG_SIZE 128
+/*
+ * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
+ * assure size of 'iova_magzine' to be 1024 bytes, so that no memory
+ * will be wasted.
+ */
+#define IOVA_MAG_SIZE 127
 #define MAX_GLOBAL_MAGS 32 /* magazines per bin */
 
 struct iova_magazine {
-- 
2.27.0



Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread John Garry via iommu



   [    4.319253] iommu: Adding device :06:00.2 to group 5
   [    4.325869] iommu: Adding device :20:01.0 to group 15
   [    4.332648] iommu: Adding device :20:02.0 to group 16
   [    4.338946] swapper/0 invoked oom-killer: 
gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
oom_score_adj=0

   [    4.350251] swapper/0 cpuset=/ mems_allowed=0
   [    4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.19.57.mx64.282 #1
   [    4.355612] Hardware name: Dell Inc. PowerEdge R7425/08V001, 
BIOS 1.9.3 06/25/2019

   [    4.355612] Call Trace:
   [    4.355612]  dump_stack+0x46/0x5b
   [    4.355612]  dump_header+0x6b/0x289
   [    4.355612]  out_of_memory+0x470/0x4c0
   [    4.355612]  __alloc_pages_nodemask+0x970/0x1030
   [    4.355612]  cache_grow_begin+0x7d/0x520
   [    4.355612]  fallback_alloc+0x148/0x200
   [    4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
   [    4.355612]  init_iova_domain+0x112/0x170


Note for Feng Tang: This callchain does not exist anymore since we 
separated out the rcache init from the IOVA domain init. Indeed, not so 
much memory is wasted for unused rcaches now.


My point really is that it would be nicer to see a modern callchain -
but don't read that as me saying that the change in this patch is bad.



   [    4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
   [    4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
   [    4.355612]  amd_iommu_add_device+0x13a/0x610
   [    4.355612]  add_iommu_group+0x20/0x30
   [    4.355612]  bus_for_each_dev+0x76/0xc0
   [    4.355612]  bus_set_iommu+0xb6/0xf0
   [    4.355612]  amd_iommu_init_api+0x112/0x132
   [    4.355612]  state_next+0xfb1/0x1165
   [    4.355612]  amd_iommu_init+0x1f/0x67
   [    4.355612]  pci_iommu_init+0x16/0x3f
   ...
   [    4.670295] Unreclaimable slab info:
   ...
   [    4.857565] kmalloc-2048   59164KB  59164KB

Change IOVA_MAG_SIZE from 128 to 127 to make size of 'iova_magazine'
1024 bytes so that no memory will be wasted.

[1]. https://lkml.org/lkml/2019/8/12/266

Signed-off-by: Feng Tang 
---
  drivers/iommu/iova.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145b..27634ddd9b904 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
   * dynamic size tuning described in the paper.
   */
-#define IOVA_MAG_SIZE 128
+/*
+ * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
+ * assure size of 'iova_magzine' to be 1024 bytes, so that no memory


Typo: iova_magazine


+ * will be wasted.
+ */
+#define IOVA_MAG_SIZE 127


I do wonder if we will see some strange new behaviour since 
IOVA_FQ_SIZE % IOVA_MAG_SIZE != 0 now...


I doubt it - even if a flush queue does happen to be entirely full of 
equal-sized IOVAs, a CPU's loaded magazines also both being perfectly 
empty when it comes to dump a full fq seem further unlikely, so in 
practice I don't see this making any appreciable change to the 
likelihood of spilling back to the depot or not. In fact the smaller the 
magazines get, the less time would be spent flushing the depot back to 
the rbtree, where your interesting workload falls off the cliff and 
never catches back up with the fq timer, so at some point it might even 
improve (unless it's also already close to the point where smaller 
caches would bottleneck allocation)... might be interesting to 
experiment with a wider range of magazine sizes if you had the time and 
inclination.




ok, what you are saying sounds reasonable. I just remember that when we
analyzed the longterm aging issue, we concluded that the FQ size and its
relation to the magazine size was a factor, and this change makes me a little
worried about new issues. Better the devil you know and all that...


Anyway, if I get some time I might do some testing to see if this change 
has any influence.


Another thought is whether we even need to store the size in the iova_magazine?
Mags in the depot are always full. As such, we only need to worry about mags
loaded in the cpu rcache and their sizes, so maybe we could have something like
this:


struct iova_magazine {
-   unsigned long size;
   unsigned long pfns[IOVA_MAG_SIZE];
};

@@ -631,6 +630,8 @@ struct iova_cpu_rcache {
   spinlock_t lock;
   struct iova_magazine *loaded;
   struct iova_magazine *prev;
+   int loaded_size;
+   int prev_size;
};

I haven't tried to implement it though..
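
A rough sketch of what that bookkeeping might look like (purely illustrative
and untested; the helper names are invented and locking is left to the
existing cpu_rcache->lock):

  struct iova_magazine {
          unsigned long pfns[IOVA_MAG_SIZE];      /* no per-magazine 'size' any more */
  };

  struct iova_cpu_rcache {
          spinlock_t lock;
          struct iova_magazine *loaded;
          struct iova_magazine *prev;
          int loaded_size;        /* fill level of 'loaded' */
          int prev_size;          /* fill level of 'prev'   */
  };

  /* Stash a pfn in the loaded magazine; caller holds cpu_rcache->lock. */
  static bool cpu_rcache_try_push(struct iova_cpu_rcache *cpu_rcache,
                                  unsigned long pfn)
  {
          if (cpu_rcache->loaded_size == IOVA_MAG_SIZE)
                  return false;   /* caller swaps with 'prev' or spills to the depot */
          cpu_rcache->loaded->pfns[cpu_rcache->loaded_size++] = pfn;
          return true;
  }

  /* Swapping 'loaded' and 'prev' now has to keep the counters in sync too -
   * the extra bookkeeping a per-magazine size field avoids. */
  static void cpu_rcache_swap_mags(struct iova_cpu_rcache *cpu_rcache)
  {
          swap(cpu_rcache->loaded, cpu_rcache->prev);
          swap(cpu_rcache->loaded_size, cpu_rcache->prev_size);
  }

Mags in the depot would stay size-free since, as above, they are always full.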

Thanks,
John

Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread Robin Murphy

On 2022-06-30 10:37, John Garry wrote:

On 30/06/2022 10:02, Robin Murphy wrote:

On 2022-06-30 08:33, Feng Tang wrote:

kmalloc will round up the request size to power of 2, and current
iova_magazine's size is 1032 (1024+8) bytes, so each instance
allocated will get 2048 bytes from kmalloc, causing around 1KB
waste.

And in some exstreme case, the memory wasted can trigger OOM as
reported in 2019 on a crash kernel with 256 MB memory [1].


I don't think it really needs pointing out that excessive memory 
consumption can cause OOM. Especially not in the particularly silly 
context of a system with only 2MB of RAM per CPU - that's pretty much 
guaranteed to be doomed one way or another.



   [    4.319253] iommu: Adding device :06:00.2 to group 5
   [    4.325869] iommu: Adding device :20:01.0 to group 15
   [    4.332648] iommu: Adding device :20:02.0 to group 16
   [    4.338946] swapper/0 invoked oom-killer: 
gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
oom_score_adj=0

   [    4.350251] swapper/0 cpuset=/ mems_allowed=0
   [    4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.19.57.mx64.282 #1
   [    4.355612] Hardware name: Dell Inc. PowerEdge R7425/08V001, 
BIOS 1.9.3 06/25/2019

   [    4.355612] Call Trace:
   [    4.355612]  dump_stack+0x46/0x5b
   [    4.355612]  dump_header+0x6b/0x289
   [    4.355612]  out_of_memory+0x470/0x4c0
   [    4.355612]  __alloc_pages_nodemask+0x970/0x1030
   [    4.355612]  cache_grow_begin+0x7d/0x520
   [    4.355612]  fallback_alloc+0x148/0x200
   [    4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
   [    4.355612]  init_iova_domain+0x112/0x170
   [    4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
   [    4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
   [    4.355612]  amd_iommu_add_device+0x13a/0x610
   [    4.355612]  add_iommu_group+0x20/0x30
   [    4.355612]  bus_for_each_dev+0x76/0xc0
   [    4.355612]  bus_set_iommu+0xb6/0xf0
   [    4.355612]  amd_iommu_init_api+0x112/0x132
   [    4.355612]  state_next+0xfb1/0x1165
   [    4.355612]  amd_iommu_init+0x1f/0x67
   [    4.355612]  pci_iommu_init+0x16/0x3f
   ...
   [    4.670295] Unreclaimable slab info:
   ...
   [    4.857565] kmalloc-2048   59164KB  59164KB

Change IOVA_MAG_SIZE from 128 to 127 to make size of 'iova_magazine'
1024 bytes so that no memory will be wasted.

[1]. https://lkml.org/lkml/2019/8/12/266

Signed-off-by: Feng Tang 
---
  drivers/iommu/iova.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145b..27634ddd9b904 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
   * dynamic size tuning described in the paper.
   */
-#define IOVA_MAG_SIZE 128
+/*
+ * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
+ * assure size of 'iova_magzine' to be 1024 bytes, so that no memory


Typo: iova_magazine


+ * will be wasted.
+ */
+#define IOVA_MAG_SIZE 127


I do wonder if we will see some strange new behaviour since IOVA_FQ_SIZE 
% IOVA_MAG_SIZE != 0 now...


I doubt it - even if a flush queue does happen to be entirely full of 
equal-sized IOVAs, a CPU's loaded magazines also both being perfectly 
empty when it comes to dump a full fq seem further unlikely, so in 
practice I don't see this making any appreciable change to the 
likelihood of spilling back to the depot or not. In fact the smaller the 
magazines get, the less time would be spent flushing the depot back to 
the rbtree, where your interesting workload falls off the cliff and 
never catches back up with the fq timer, so at some point it might even 
improve (unless it's also already close to the point where smaller 
caches would bottleneck allocation)... might be interesting to 
experiment with a wider range of magazine sizes if you had the time and 
inclination.
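
To put a rough number on that (assuming IOVA_FQ_SIZE is 256, as in the
current code - treat the exact constant as an assumption):

  256 % 128 == 0  ->  a completely full fq could drain into exactly two
                      empty magazines
  256 % 127 == 2  ->  two full magazines plus 2 leftover entries, i.e. at
                      worst one extra, barely-used magazine per fq drain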


Cheers,
Robin.





The change itself seems perfectly reasonable, though.

Acked-by: Robin Murphy 




Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread John Garry via iommu

On 30/06/2022 10:02, Robin Murphy wrote:

On 2022-06-30 08:33, Feng Tang wrote:

kmalloc will round up the request size to power of 2, and current
iova_magazine's size is 1032 (1024+8) bytes, so each instance
allocated will get 2048 bytes from kmalloc, causing around 1KB
waste.

And in some exstreme case, the memory wasted can trigger OOM as
reported in 2019 on a crash kernel with 256 MB memory [1].


I don't think it really needs pointing out that excessive memory 
consumption can cause OOM. Especially not in the particularly silly 
context of a system with only 2MB of RAM per CPU - that's pretty much 
guaranteed to be doomed one way or another.



   [    4.319253] iommu: Adding device :06:00.2 to group 5
   [    4.325869] iommu: Adding device :20:01.0 to group 15
   [    4.332648] iommu: Adding device :20:02.0 to group 16
   [    4.338946] swapper/0 invoked oom-killer: 
gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
oom_score_adj=0

   [    4.350251] swapper/0 cpuset=/ mems_allowed=0
   [    4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.19.57.mx64.282 #1
   [    4.355612] Hardware name: Dell Inc. PowerEdge R7425/08V001, 
BIOS 1.9.3 06/25/2019

   [    4.355612] Call Trace:
   [    4.355612]  dump_stack+0x46/0x5b
   [    4.355612]  dump_header+0x6b/0x289
   [    4.355612]  out_of_memory+0x470/0x4c0
   [    4.355612]  __alloc_pages_nodemask+0x970/0x1030
   [    4.355612]  cache_grow_begin+0x7d/0x520
   [    4.355612]  fallback_alloc+0x148/0x200
   [    4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
   [    4.355612]  init_iova_domain+0x112/0x170
   [    4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
   [    4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
   [    4.355612]  amd_iommu_add_device+0x13a/0x610
   [    4.355612]  add_iommu_group+0x20/0x30
   [    4.355612]  bus_for_each_dev+0x76/0xc0
   [    4.355612]  bus_set_iommu+0xb6/0xf0
   [    4.355612]  amd_iommu_init_api+0x112/0x132
   [    4.355612]  state_next+0xfb1/0x1165
   [    4.355612]  amd_iommu_init+0x1f/0x67
   [    4.355612]  pci_iommu_init+0x16/0x3f
   ...
   [    4.670295] Unreclaimable slab info:
   ...
   [    4.857565] kmalloc-2048   59164KB  59164KB

Change IOVA_MAG_SIZE from 128 to 127 to make size of 'iova_magazine'
1024 bytes so that no memory will be wasted.

[1]. https://lkml.org/lkml/2019/8/12/266

Signed-off-by: Feng Tang 
---
  drivers/iommu/iova.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145b..27634ddd9b904 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
   * dynamic size tuning described in the paper.
   */
-#define IOVA_MAG_SIZE 128
+/*
+ * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
+ * assure size of 'iova_magzine' to be 1024 bytes, so that no memory


Typo: iova_magazine


+ * will be wasted.
+ */
+#define IOVA_MAG_SIZE 127


I do wonder if we will see some strange new behaviour since IOVA_FQ_SIZE 
% IOVA_MAG_SIZE != 0 now...




The change itself seems perfectly reasonable, though.

Acked-by: Robin Murphy 



Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-06-30 Thread Robin Murphy

On 2022-06-30 08:33, Feng Tang wrote:

kmalloc will round up the request size to power of 2, and current
iova_magazine's size is 1032 (1024+8) bytes, so each instance
allocated will get 2048 bytes from kmalloc, causing around 1KB
waste.

And in some exstreme case, the memory wasted can trigger OOM as
reported in 2019 on a crash kernel with 256 MB memory [1].


I don't think it really needs pointing out that excessive memory 
consumption can cause OOM. Especially not in the particularly silly 
context of a system with only 2MB of RAM per CPU - that's pretty much 
guaranteed to be doomed one way or another.



   [4.319253] iommu: Adding device :06:00.2 to group 5
   [4.325869] iommu: Adding device :20:01.0 to group 15
   [4.332648] iommu: Adding device :20:02.0 to group 16
   [4.338946] swapper/0 invoked oom-killer: 
gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
oom_score_adj=0
   [4.350251] swapper/0 cpuset=/ mems_allowed=0
   [4.354618] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.57.mx64.282 #1
   [4.355612] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.9.3 
06/25/2019
   [4.355612] Call Trace:
   [4.355612]  dump_stack+0x46/0x5b
   [4.355612]  dump_header+0x6b/0x289
   [4.355612]  out_of_memory+0x470/0x4c0
   [4.355612]  __alloc_pages_nodemask+0x970/0x1030
   [4.355612]  cache_grow_begin+0x7d/0x520
   [4.355612]  fallback_alloc+0x148/0x200
   [4.355612]  kmem_cache_alloc_trace+0xac/0x1f0
   [4.355612]  init_iova_domain+0x112/0x170
   [4.355612]  amd_iommu_domain_alloc+0x138/0x1a0
   [4.355612]  iommu_group_get_for_dev+0xc4/0x1a0
   [4.355612]  amd_iommu_add_device+0x13a/0x610
   [4.355612]  add_iommu_group+0x20/0x30
   [4.355612]  bus_for_each_dev+0x76/0xc0
   [4.355612]  bus_set_iommu+0xb6/0xf0
   [4.355612]  amd_iommu_init_api+0x112/0x132
   [4.355612]  state_next+0xfb1/0x1165
   [4.355612]  amd_iommu_init+0x1f/0x67
   [4.355612]  pci_iommu_init+0x16/0x3f
   ...
   [4.670295] Unreclaimable slab info:
   ...
   [4.857565] kmalloc-2048   59164KB  59164KB

Change IOVA_MAG_SIZE from 128 to 127 to make size of 'iova_magazine'
1024 bytes so that no memory will be wasted.

[1]. https://lkml.org/lkml/2019/8/12/266

Signed-off-by: Feng Tang 
---
  drivers/iommu/iova.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145b..27634ddd9b904 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -614,7 +614,12 @@ EXPORT_SYMBOL_GPL(reserve_iova);
   * dynamic size tuning described in the paper.
   */
  
-#define IOVA_MAG_SIZE 128

+/*
+ * As kmalloc's buffer size is fixed to power of 2, 127 is chosen to
+ * assure size of 'iova_magzine' to be 1024 bytes, so that no memory


Typo: iova_magazine


+ * will be wasted.
+ */
+#define IOVA_MAG_SIZE 127


The change itself seems perfectly reasonable, though.

Acked-by: Robin Murphy 


  #define MAX_GLOBAL_MAGS 32/* magazines per bin */
  
  struct iova_magazine {
