Re: [PATCH v2 00/33] Separate struct slab from struct page

2022-01-03 Thread Vlastimil Babka
On 12/29/21 12:22, Hyeonggon Yoo wrote:
> On Wed, Dec 22, 2021 at 05:56:50PM +0100, Vlastimil Babka wrote:
>> On 12/14/21 13:57, Vlastimil Babka wrote:
>> > On 12/1/21 19:14, Vlastimil Babka wrote:
>> >> Folks from non-slab subsystems are Cc'd only to patches affecting them, 
>> >> and
>> >> this cover letter.
>> >>
>> >> Series also available in git, based on 5.16-rc3:
>> >> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
>> > 
>> > Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small 
>> > tweaks
>> > and a new patch from Hyeonggon Yoo on top. To avoid too much spam, here's 
>> > a range diff:
>> 
>> Hi, I've pushed another update branch slab-struct_slab-v4r1, and also to
>> -next. I've shortened git commit log lines to make checkpatch happier,
>> so no range-diff as it would be too long. I believe it would be useless
>> spam to post the whole series now, shortly before xmas, so I will do it
>> at rc8 time, to hopefully collect remaining reviews. But if anyone wants
>> a mailed version, I can do that.
>>
> 
> Hello Matthew and Vlastimil,
> this is part 3 of my review.
> 
> # mm: Convert struct page to struct slab in functions used by other subsystems
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> 
> 
> # mm/slub: Convert most struct page to struct slab by spatch
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> with a question below.
> 
> -static int check_slab(struct kmem_cache *s, struct page *page)
> +static int check_slab(struct kmem_cache *s, struct slab *slab)
>  {
> int maxobj;
>  
> -   if (!PageSlab(page)) {
> -   slab_err(s, page, "Not a valid slab page");
> +   if (!folio_test_slab(slab_folio(slab))) {
> +   slab_err(s, slab, "Not a valid slab page");
> return 0;
> }
> 
> Can't we guarantee that struct slab * always points to a slab?

Normally, yes.

> For a struct page *, !PageSlab(page) is possible because a struct page *
> can refer to something other than a slab, but a struct slab * can only
> be a slab, unlike struct page. The code would be simpler if we
> guaranteed that a struct slab * always points to a slab (or NULL).

That's what the code does, indeed. But check_slab() is called as part of
various consistency checks, so there we question all assumptions on purpose,
in order to find a bug (e.g. memory corruption) - such as a page that's
still on the list of slabs while it was already freed and reused, and thus
e.g. lacks the slab page flag.

But it's nice how using struct slab makes such a check immediately stand out
as suspicious, right?

> # mm/slub: Convert pfmemalloc_match() to take a struct slab
> It's confusing to me because the original pfmemalloc_match() was removed,
> pfmemalloc_match_unsafe() was renamed to pfmemalloc_match(), and it was
> converted to use the slab_test_pfmemalloc() helper.
> 
> But I agree with the resulting code, so:
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> 
> 
> # mm/slub: Convert alloc_slab_page() to return a struct slab
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> 
> 
> # mm/slub: Convert print_page_info() to print_slab_info()
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> 
> I hope to review rest of patches in a week.

Thanks for your reviews/tests!
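
Re the pfmemalloc_match() rename above, for reference: the resulting function
is short enough to quote in full (from the converted mm/slub.c, give or take
details):

static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags)
{
        if (unlikely(slab_test_pfmemalloc(slab)))
                return gfp_pfmemalloc_allowed(gfpflags);

        return true;
}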


Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-22 Thread Vlastimil Babka
On 12/14/21 13:57, Vlastimil Babka wrote:
> On 12/1/21 19:14, Vlastimil Babka wrote:
>> Folks from non-slab subsystems are Cc'd only to patches affecting them, and
>> this cover letter.
>>
>> Series also available in git, based on 5.16-rc3:
>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
> 
> Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small 
> tweaks
> and a new patch from Hyeonggon Yoo on top. To avoid too much spam, here's a 
> range diff:

Hi, I've pushed another update branch slab-struct_slab-v4r1, and also to
-next. I've shortened git commit log lines to make checkpatch happier,
so no range-diff as it would be too long. I believe it would be useless
spam to post the whole series now, shortly before xmas, so I will do it
at rc8 time, to hopefully collect remaining reviews. But if anyone wants
a mailed version, I can do that.

Changes in v4:
- rebase to 5.16-rc6 to avoid a conflict with mainline
- collect acks/reviews/tested-by from Johannes, Roman, Hyeonggon Yoo -
thanks!
- in patch "mm/slub: Convert detached_freelist to use a struct slab"
renamed free_nonslab_page() to free_large_kmalloc() and use folio there,
as suggested by Roman
- in "mm/memcg: Convert slab objcgs from struct page to struct slab"
change one caller of slab_objcgs_check() to slab_objcgs() as suggested
by Johannes, realize the other caller should be also changed, and remove
slab_objcgs_check() completely.


Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-20 Thread Vlastimil Babka
On 12/16/21 16:00, Hyeonggon Yoo wrote:
> On Tue, Dec 14, 2021 at 01:57:22PM +0100, Vlastimil Babka wrote:
>> On 12/1/21 19:14, Vlastimil Babka wrote:
>> > Folks from non-slab subsystems are Cc'd only to patches affecting them, and
>> > this cover letter.
>> > 
>> > Series also available in git, based on 5.16-rc3:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
>> 
>> Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small 
>> tweaks
>> and a new patch from Hyeonggon Yoo on top. To avoid too much spam, here's a 
>> range diff:
> 
> Reviewing the whole patch series takes longer than I thought.
> I'll try to review and test the rest of the patches when I have time.
> 
> I added Tested-by if the kernel builds okay and kselftests
> do not break the kernel on my machine
> (with CONFIG_SLAB/SLUB/SLOB depending on the patch).

Thanks!

> Let me know if you know a better way to test a patch.

Testing on your machine is just fine.

> # mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when 
> enabled
> 
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> 
> Comment:
> Works on both SLUB_CPU_PARTIAL and !SLUB_CPU_PARTIAL.
> btw, do we need the slabs_cpu_partial attribute when we don't use
> cpu partials? (!SLUB_CPU_PARTIAL)

The sysfs attribute? Yeah, we should stay consistent for userspace, which
expects to be able to read it (even if with zeroes), regardless of config.
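
A minimal sketch of that pattern (illustrative names, not the exact mm/slub.c
code): the attribute stays readable and simply reports zero when the feature
is compiled out.

static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
{
        int slabs = 0;
#ifdef CONFIG_SLUB_CPU_PARTIAL
        int cpu;

        for_each_online_cpu(cpu) {
                struct slab *slab = per_cpu_ptr(s->cpu_slab, cpu)->partial;

                if (slab)
                        slabs += slab->slabs;
        }
#endif
        /* always emit a value, so userspace parsers work on any config */
        return sysfs_emit(buf, "%d\n", slabs);
}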

> # mm/slub: Simplify struct slab slabs field definition
> Comment:
> 
> This is how struct page looks on the top of v3r3 branch:
> struct page {
> [...]
> struct {/* slab, slob and slub */
> union {
> struct list_head slab_list;
> struct {/* Partial pages */
> struct page *next;
> #ifdef CONFIG_64BIT
> int pages;  /* Nr of pages left */
> #else
> short int pages;
> #endif
> };
> };
> [...]
> 
> It's not consistent with struct slab.

Hm right. But as we don't actually use the struct page version anymore, and
it's not one of the fields checked by SLAB_MATCH(), we can ignore this.

> I think this is because "mm: Remove slab from struct page" was dropped.

That was just postponed until the iommu changes are in. Matthew mentioned those
might be merged too, so that final cleanup will happen as well and take care of
the discrepancy above, so there's no need for extra churn to address it
specifically.

> Would you update some of patches?
> 
> # mm/sl*b: Differentiate struct slab fields by sl*b implementations
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> Works with SL[AUO]B on my machine and makes the code much better.
> 
> # mm/slob: Convert SLOB to use struct slab and struct folio
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> It still works fine on SLOB.
> 
> # mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> 
> # mm/slub: Convert __free_slab() to use struct slab
> Reviewed-by: Hyeonggon Yoo <42.hye...@gmail.com>
> Tested-by: Hyeonggon Yoo <42.hye...@gmail.com>
> 
> Thanks,
> Hyeonggon.

Thanks again,
Vlastimil


Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-19 Thread Vlastimil Babka
On 12/16/21 00:38, Roman Gushchin wrote:
> Part 2:
> 
> * mm: Convert check_heap_object() to use struct slab
> Reviewed-by: Roman Gushchin 
> 
> * mm/slub: Convert detached_freelist to use a struct slab
> How about to convert free_nonslab_page() to free_nonslab_folio()?
> And maybe rename it to something like free_large_kmalloc()?
> If I'm not missing something, large kmallocs are the only way we can end up
> there with a !slab folio/page.

Good point, thanks! But I did that as part of the following patch, where it fits
logically better.
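
For reference, here's roughly what free_large_kmalloc() ends up looking like
in that following patch (simplified; accounting details may differ):

static void free_large_kmalloc(struct folio *folio, void *object)
{
        unsigned int order = folio_order(folio);

        if (WARN_ON_ONCE(order == 0))
                pr_warn_once("object pointer: 0x%p\n", object);

        kfree_hook(object);
        /* large kmallocs are counted as unreclaimable slab memory */
        mod_lruvec_page_state(folio_page(folio, 0), NR_SLAB_UNRECLAIMABLE_B,
                              -(PAGE_SIZE << order));
        __free_pages(folio_page(folio, 0), order);
}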

> * mm/slub: Convert kfree() to use a struct slab
> Reviewed-by: Roman Gushchin 

Didn't add your tag because of the addition of free_large_kmalloc() change.

> * mm/slub: Convert __slab_lock() and __slab_unlock() to struct slab
> Reviewed-by: Roman Gushchin 
> 
> * mm/slub: Convert print_page_info() to print_slab_info()
> Do we really need to explicitly convert slab_folio()'s result to (struct 
> folio *)?

Unfortunately yes, as long as folio_flags() doesn't take a const struct folio *,
which will need some yak shaving.
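
The conversion in question, roughly (the explicit cast being the workaround
discussed above):

static void print_slab_info(const struct slab *slab)
{
        struct folio *folio = (struct folio *)slab_folio(slab);

        pr_err("Slab 0x%p objects=%u used=%u fp=0x%p flags=%pGp\n",
               slab, slab->objects, slab->inuse, slab->freelist,
               folio_flags(folio, 0));
}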

> Reviewed-by: Roman Gushchin 
> 
> * mm/slub: Convert alloc_slab_page() to return a struct slab
> Reviewed-by: Roman Gushchin 
> 
> * mm/slub: Convert __free_slab() to use struct slab
> Reviewed-by: Roman Gushchin 
> 
> * mm/slub: Convert pfmemalloc_match() to take a struct slab
> Cool! Removing pfmemalloc_match_unsafe() is really nice.
> Reviewed-by: Roman Gushchin 
> 
> * mm/slub: Convert most struct page to struct slab by spatch
> Reviewed-by: Roman Gushchin 
> 
> * mm/slub: Finish struct page to struct slab conversion
> Reviewed-by: Roman Gushchin 
> 
> * mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
> Reviewed-by: Roman Gushchin 

Thanks again!

> * mm/slab: Convert most struct page to struct slab by spatch
> 
> Another patch with the same title? Rebase error?
> 
> * mm/slab: Finish struct page to struct slab conversion
> 
> And this one too?
> 
> 
> Thanks!
> 
> Roman



Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-19 Thread Vlastimil Babka
On 12/15/21 02:03, Roman Gushchin wrote:
> On Tue, Dec 14, 2021 at 01:57:22PM +0100, Vlastimil Babka wrote:
>> On 12/1/21 19:14, Vlastimil Babka wrote:
>> > Folks from non-slab subsystems are Cc'd only to patches affecting them, and
>> > this cover letter.
>> > 
>> > Series also available in git, based on 5.16-rc3:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
>> 
>> Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small 
>> tweaks
>> and a new patch from Hyeonggon Yoo on top. To avoid too much spam, here's a 
>> range diff:
> 
> Hi Vlastimil!
> 
> I've started to review this patchset (btw, a really nice work, I like
> the resulting code way more). Because I'm looking at v3 and I don't have
> the whole v2 in my mailbox, here is what I've now:

Thanks a lot, Roman!

> * mm: add virt_to_folio() and folio_address()
> Reviewed-by: Roman Gushchin 
> 
> * mm/slab: Dissolve slab_map_pages() in its caller
> Reviewed-by: Roman Gushchin 
> 
> * mm/slub: Make object_err() static
> Reviewed-by: Roman Gushchin 
> 
> * mm: Split slab into its own type
> 1) Shouldn't SLAB_MATCH() macro use struct folio instead of struct page for 
> the
> comparison?

struct folio doesn't define most of the fields, and matching some against page
and others against folio seems like an unnecessary complication. Maybe as part
of the final struct page cleanup, when the slab fields are gone from struct
page, the rest could all be in folio - I'll check once we get there.
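
For reference, SLAB_MATCH() is just a compile-time offset check between the
two overlaid types; a representative excerpt (field list abridged):

#define SLAB_MATCH(pg, sl)                                              \
        static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))

SLAB_MATCH(flags, __page_flags);
SLAB_MATCH(_refcount, __page_refcount);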

> 2) page_slab() is used only in kasan and only in one place, so maybe it's 
> better
> to not introduce it as a generic helper?

Yeah that's the case after the series, but as part of the incremental steps,
page_slab() gets used in many places. I'll consider removing it on top though.

> Other than that
> Reviewed-by: Roman Gushchin 
> 
> * mm: Add account_slab() and unaccount_slab()
> 1) maybe change the title to convert/replace instead of add?

Done.

> 2) maybe move later changes to memcg_alloc_page_obj_cgroups() to this patch?

Maybe possible, but that would distort the series more than I'd like to at
this rc6 time.

> Reviewed-by: Roman Gushchin 
> 
> * mm: Convert virt_to_cache() to use struct slab
> Reviewed-by: Roman Gushchin 
> 
> * mm: Convert __ksize() to struct slab
> It looks like certain parts of __ksize() can be merged between slab, slub and 
> slob?
> Reviewed-by: Roman Gushchin 
> 
> * mm: Use struct slab in kmem_obj_info()
> Reviewed-by: Roman Gushchin 
> 
> 
> I'll try to finish reviewing the patchset by the end of the week.
> 
> Thanks!
> 
> Roman



Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-16 Thread Vlastimil Babka
On 12/16/21 00:38, Roman Gushchin wrote:
> On Tue, Dec 14, 2021 at 05:03:12PM -0800, Roman Gushchin wrote:
>> On Tue, Dec 14, 2021 at 01:57:22PM +0100, Vlastimil Babka wrote:
>> > On 12/1/21 19:14, Vlastimil Babka wrote:
>> > > Folks from non-slab subsystems are Cc'd only to patches affecting them, 
>> > > and
>> > > this cover letter.
>> > > 
>> > > Series also available in git, based on 5.16-rc3:
>> > > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
>> > 
>> > Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small 
>> > tweaks
>> > and a new patch from Hyeonggon Yoo on top. To avoid too much spam, here's 
>> > a range diff:
>> 
>> Hi Vlastimil!
>> 
>> I've started to review this patchset (btw, a really nice work, I like
>> the resulting code way more). Because I'm looking at v3 and I don't have

Thanks a lot, Roman!

...

> 
> * mm/slab: Convert most struct page to struct slab by spatch
> 
> Another patch with the same title? Rebase error?
> 
> * mm/slab: Finish struct page to struct slab conversion
> 
> And this one too?

No, these are for mm/slab.c, the previous were for mm/slub.c :)

> 
> Thanks!
> 
> Roman



Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-14 Thread Vlastimil Babka
On 12/14/21 15:38, Hyeonggon Yoo wrote:
> On Tue, Dec 14, 2021 at 01:57:22PM +0100, Vlastimil Babka wrote:
>> On 12/1/21 19:14, Vlastimil Babka wrote:
>> > Folks from non-slab subsystems are Cc'd only to patches affecting them, and
>> > this cover letter.
>> > 
>> > Series also available in git, based on 5.16-rc3:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2
>> 
>> Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small 
>> tweaks
>> and a new patch from Hyeonggon Yoo on top. To avoid too much spam, here's a 
>> range diff:
>> 
> 
> Hello Vlastimil, thank you for the nice work.
> I'm going to review and test the new version soon in my free time.

Thanks!

> Btw, I gave you some review and test tags and they seem to be missing in the
> new series. Did I do the review/test process wrongly? It's my first time
> reviewing patches, so please let me know if I did something wrong.

You did it right, sorry! I didn't include them because those were for patches
that I was additionally changing after your review/test, and the decision
whether a change is substantial enough to need a new test/review is often
fuzzy. So if you can recheck the new versions, that would be great, and then I
will pick the tags up, thanks!

> --
> Thank you.
> Hyeonggon.



Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-14 Thread Vlastimil Babka
On 12/1/21 19:14, Vlastimil Babka wrote:
> Folks from non-slab subsystems are Cc'd only to patches affecting them, and
> this cover letter.
> 
> Series also available in git, based on 5.16-rc3:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2

Pushed a new branch slab-struct-slab-v3r3 with accumulated fixes and small 
tweaks
and a new patch from Hyeonggon Yoo on top. To avoid too much spam, here's a 
range diff:

 1:  10b656f9eb1e =  1:  10b656f9eb1e mm: add virt_to_folio() and 
folio_address()
 2:  5e6ad846acf1 =  2:  5e6ad846acf1 mm/slab: Dissolve slab_map_pages() in its 
caller
 3:  48d4e9407aa0 =  3:  48d4e9407aa0 mm/slub: Make object_err() static
 4:  fe1e19081321 =  4:  fe1e19081321 mm: Split slab into its own type
 5:  af7fd46fbb9b =  5:  af7fd46fbb9b mm: Add account_slab() and 
unaccount_slab()
 6:  7ed088d601d9 =  6:  7ed088d601d9 mm: Convert virt_to_cache() to use struct 
slab
 7:  1d41188b9401 =  7:  1d41188b9401 mm: Convert __ksize() to struct slab
 8:  5d9d1231461f !  8:  8fd22e0b086e mm: Use struct slab in kmem_obj_info()
@@ Commit message
 slab type instead of the page type, we make it obvious that this can
 only be called for slabs.
 
+[ vba...@suse.cz: also convert the related kmem_valid_obj() to folios ]
+
 Signed-off-by: Matthew Wilcox (Oracle) 
 Signed-off-by: Vlastimil Babka 
 
@@ mm/slab.h: struct kmem_obj_info {
  #endif /* MM_SLAB_H */
 
  ## mm/slab_common.c ##
+@@ mm/slab_common.c: bool slab_is_available(void)
+  */
+ bool kmem_valid_obj(void *object)
+ {
+-  struct page *page;
++  struct folio *folio;
+ 
+   /* Some arches consider ZERO_SIZE_PTR to be a valid address. */
+   if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
+   return false;
+-  page = virt_to_head_page(object);
+-  return PageSlab(page);
++  folio = virt_to_folio(object);
++  return folio_test_slab(folio);
+ }
+ EXPORT_SYMBOL_GPL(kmem_valid_obj);
+ 
 @@ mm/slab_common.c: void kmem_dump_obj(void *object)
  {
char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc";
@@ mm/slub.c: int __kmem_cache_shutdown(struct kmem_cache *s)
objp = base + s->size * objnr;
kpp->kp_objp = objp;
 -  if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size 
|| (objp - base) % s->size) ||
-+  if (WARN_ON_ONCE(objp < base || objp >= base + slab->objects * s->size 
|| (objp - base) % s->size) ||
++  if (WARN_ON_ONCE(objp < base || objp >= base + slab->objects * s->size
++   || (objp - base) % s->size) ||
!(s->flags & SLAB_STORE_USER))
return;
  #ifdef CONFIG_SLUB_DEBUG
 9:  3aef771be335 !  9:  c97e73c3b6c2 mm: Convert check_heap_object() to use 
struct slab
@@ mm/slab.h: struct kmem_obj_info {
 +#else
 +static inline
 +void __check_heap_object(const void *ptr, unsigned long n,
-+   const struct slab *slab, bool to_user) { }
++   const struct slab *slab, bool to_user)
++{
++}
 +#endif
 +
  #endif /* MM_SLAB_H */
10:  2253e45e6bef = 10:  da05e0f7179c mm/slub: Convert detached_freelist to use 
a struct slab
11:  f28202bc27ba = 11:  383887e77104 mm/slub: Convert kfree() to use a struct 
slab
12:  31b58b1e914f = 12:  c46be093c637 mm/slub: Convert __slab_lock() and 
__slab_unlock() to struct slab
13:  636406a3ad59 = 13:  49dbbf917052 mm/slub: Convert print_page_info() to 
print_slab_info()
14:  3b49efda3b6f = 14:  4bb0c932156a mm/slub: Convert alloc_slab_page() to 
return a struct slab
15:  61a195526d3b ! 15:  4b9761b5cfab mm/slub: Convert __free_slab() to use 
struct slab
@@ mm/slub.c: static struct page *new_slab(struct kmem_cache *s, gfp_t 
flags, int n
  
 -  __ClearPageSlabPfmemalloc(page);
 -  __ClearPageSlab(page);
+-  /* In union with page->mapping where page allocator expects NULL */
+-  page->slab_cache = NULL;
 +  __slab_clear_pfmemalloc(slab);
 +  __folio_clear_slab(folio);
-   /* In union with page->mapping where page allocator expects NULL */
--  page->slab_cache = NULL;
-+  slab->slab_cache = NULL;
++  folio->mapping = NULL;
if (current->reclaim_state)
current->reclaim_state->reclaimed_slab += pages;
 -  unaccount_slab(page_slab(page), order, s);
16:  987c7ed31580 = 16:  f384ec918065 mm/slub: Convert pfmemalloc_match() to 
take a struct slab
17:  cc742564237e ! 17:  06738ade4e17 mm/slub: Convert most struct page to 
struct slab by spatch
@@ Commit message
 
 // Options: --include-headers --no-includes --smpl-spacing 
include/linux/slub_def.h mm/slub.c
 // Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and 
ocaml 

Re: [PATCH v2 00/33] Separate struct slab from struct page

2021-12-02 Thread Vlastimil Babka
On 12/1/21 19:14, Vlastimil Babka wrote:
> Folks from non-slab subsystems are Cc'd only to patches affecting them, and
> this cover letter.
> 
> Series also available in git, based on 5.16-rc3:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2

I have pushed a v3, but not going to resent immediately to avoid unnecessary
spamming, the differences is just that some patches are removed and other
reordered, so the current v2 posting should be still sufficient for on-list
review:

https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v3r1

patch 29/33 iommu: Use put_pages_list
- removed as this version is broken and Robin Murphy has meanwhile
incorporated it partially to his series:
https://lore.kernel.org/lkml/cover.1637671820.git.robin.mur...@arm.com/

patch 30/33 mm: Remove slab from struct page
- removed and postponed for later as this can be only be applied after the
iommu use of page.freelist is resolved

patch 27/33 zsmalloc: Stop using slab fields in struct page
patch 28/33 bootmem: Use page->index instead of page->freelist
- moved towards the end of series, to further separate the part that adjusts
non-slab users of slab fields towards removing those fields from struct page.


[PATCH v2 29/33] iommu: Use put_pages_list

2021-12-01 Thread Vlastimil Babka
From: "Matthew Wilcox (Oracle)" 

page->freelist is for the use of slab.  We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the iommu code over to using put_pages_list().

Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Vlastimil Babka 
Cc: Joerg Roedel 
Cc: Suravee Suthikulpanit 
Cc: Will Deacon 
Cc: David Woodhouse 
Cc: Lu Baolu 
Cc: 
---
 drivers/iommu/amd/io_pgtable.c | 59 +-
 drivers/iommu/dma-iommu.c  | 11 +
 drivers/iommu/intel/iommu.c| 89 --
 include/linux/iommu.h  |  3 +-
 4 files changed, 57 insertions(+), 105 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 182c93a43efd..08ea6a02cda9 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -74,27 +74,15 @@ static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
  *
 ****************************************************************************/
 
-static void free_page_list(struct page *freelist)
-{
-   while (freelist != NULL) {
-   unsigned long p = (unsigned long)page_address(freelist);
-
-   freelist = freelist->freelist;
-   free_page(p);
-   }
-}
-
-static struct page *free_pt_page(unsigned long pt, struct page *freelist)
+static void free_pt_page(unsigned long pt, struct list_head *list)
 {
struct page *p = virt_to_page((void *)pt);
 
-   p->freelist = freelist;
-
-   return p;
+   list_add(&p->lru, list);
 }
 
 #define DEFINE_FREE_PT_FN(LVL, FN)                                            \
-static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist) \
+static void free_pt_##LVL (unsigned long __pt, struct list_head *list)        \
 {                                                                             \
        unsigned long p;                                                       \
        u64 *pt;                                                               \
@@ -113,10 +101,10 @@ static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist) \
                continue;                                               \
                                                                        \
                p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);               \
-               freelist = FN(p, freelist);                             \
+               FN(p, list);                                            \
        }                                                               \
                                                                        \
-       return free_pt_page((unsigned long)pt, freelist);               \
+       free_pt_page((unsigned long)pt, list);                          \
 }
 
 DEFINE_FREE_PT_FN(l2, free_pt_page)
@@ -125,36 +113,33 @@ DEFINE_FREE_PT_FN(l4, free_pt_l3)
 DEFINE_FREE_PT_FN(l5, free_pt_l4)
 DEFINE_FREE_PT_FN(l6, free_pt_l5)
 
-static struct page *free_sub_pt(unsigned long root, int mode,
-   struct page *freelist)
+static void free_sub_pt(unsigned long root, int mode, struct list_head *list)
 {
switch (mode) {
case PAGE_MODE_NONE:
case PAGE_MODE_7_LEVEL:
break;
case PAGE_MODE_1_LEVEL:
-   freelist = free_pt_page(root, freelist);
+   free_pt_page(root, list);
break;
case PAGE_MODE_2_LEVEL:
-   freelist = free_pt_l2(root, freelist);
+   free_pt_l2(root, list);
break;
case PAGE_MODE_3_LEVEL:
-   freelist = free_pt_l3(root, freelist);
+   free_pt_l3(root, list);
break;
case PAGE_MODE_4_LEVEL:
-   freelist = free_pt_l4(root, freelist);
+   free_pt_l4(root, list);
break;
case PAGE_MODE_5_LEVEL:
-   freelist = free_pt_l5(root, freelist);
+   free_pt_l5(root, list);
break;
case PAGE_MODE_6_LEVEL:
-   freelist = free_pt_l6(root, freelist);
+   free_pt_l6(root, list);
break;
default:
BUG();
}
-
-   return freelist;
 }
 
 void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
@@ -362,7 +347,7 @@ static u64 *fetch_pte(struct amd_io_pgtable *pgtable,
return pte;
 }
 
-static struct page *free_clear_pte(u64 *pte, u64 pteval, struct page *freelist)
+static void free_clear_pte(u64 *pte, u64 pteval, struct list_head *list)
 {
unsigned long pt;
int mode;
@@ -373,12 +358,12 @@ static struct page *free_clear_pte(u64 *pte, u64 pteval, struct page *freelist)
}
 
if (
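
The whole conversion boils down to one pattern; a minimal sketch, with p
standing for a page-table page being torn down:

LIST_HEAD(freelist);

/* collect pages by chaining them through page->lru ... */
list_add(&p->lru, &freelist);

/* ... then free the entire list in one call, instead of walking a
 * hand-rolled chain through page->freelist */
put_pages_list(&freelist);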

[PATCH v2 00/33] Separate struct slab from struct page

2021-12-01 Thread Vlastimil Babka
Folks from non-slab subsystems are Cc'd only to patches affecting them, and
this cover letter.

Series also available in git, based on 5.16-rc3:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v2r2

The plan: as my SLUB PREEMPT_RT series in 5.15, I would prefer to go again with
the git pull request way of eventually merging this, as it's also not a small
series. I will thus reply to this mail with asking to include my branch in
linux-next.

As stated in the v1/RFC cover letter, I wouldn't mind to then continue with
maintaining a git tree for all slab patches in general. It was apparently
already done that way before, by Pekka:
https://lore.kernel.org/linux-mm/alpine.DEB.2.00.1107221108190.2996@tiger/

Changes from v1/RFC:
https://lore.kernel.org/all/2026001628.24216-1-vba...@suse.cz/
- Added virt_to_folio() and folio_address() in the new Patch 1.
- Addressed feedback from Andrey Konovalov and Matthew Wilcox (Thanks!)
- Added Tested-by: Marco Elver for the KFENCE parts (Thanks!)

Previous version from Matthew Wilcox:
https://lore.kernel.org/all/20211004134650.4031813-1-wi...@infradead.org/

LWN coverage of the above:
https://lwn.net/Articles/871982/

This is originally an offshoot of the folio work by Matthew. One of the more
complex parts of the struct page definition is the set of fields used by the slab
allocators. It would be good for the MM in general if struct slab were its own
data type, and it also helps to prevent tail pages from slipping in anywhere.
As Matthew requested in his proof of concept series, I have taken over the
development of this series, so it's a mix of patches from him (often modified
by me) and my own.

One big difference is the use of coccinelle to perform the relatively trivial
parts of the conversions automatically and at once, instead of a larger number
of smaller incremental reviewable steps. Thanks to Julia Lawall and Luis
Chamberlain for all their help!

Another notable difference is that (based also on review feedback) I don't
represent with a struct slab the large kmalloc allocations, which are not
really slabs but use the page allocator directly. When going from an object
address to a struct slab, the code first tests the folio slab flag, and only
if it's set does it convert to struct slab. This makes the struct slab type
stronger.

Finally, although Matthew's version didn't use any of the folio work, the
initial support has been merged in the meanwhile, so my version builds on top
of it where appropriate. This eliminates some of the redundant compound_head()
calls performed e.g. when testing the slab flag.
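
A sketch of the resulting lookup pattern - this is roughly what the series'
virt_to_slab() helper does:

static inline struct slab *virt_to_slab(const void *addr)
{
        /* virt_to_folio() resolves compound_head(), so no tail pages here */
        struct folio *folio = virt_to_folio(addr);

        /* large kmallocs come straight from the page allocator and are
         * not represented by a struct slab */
        if (!folio_test_slab(folio))
                return NULL;

        return folio_slab(folio);
}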

To sum up, after this series, struct page fields used by slab allocators are
moved from struct page to a new struct slab, that uses the same physical
storage. The availability of the fields is further distinguished by the
selected slab allocator implementation. The advantages include:

- Similar to folios, if the slab is of order > 0, struct slab is always
  guaranteed to be the head page. Additionally it's guaranteed to be an actual
  slab page, not a large kmalloc. This removes uncertainty and potential for
  bugs.
- It's not possible to accidentally use fields of the slab implementation that's
  not configured.
- Other subsystems cannot use slab's fields in struct page anymore (some
  existing non-slab usages had to be adjusted in this series), so slab
  implementations have more freedom in rearranging them in the struct slab.

Matthew Wilcox (Oracle) (16):
  mm: Split slab into its own type
  mm: Add account_slab() and unaccount_slab()
  mm: Convert virt_to_cache() to use struct slab
  mm: Convert __ksize() to struct slab
  mm: Use struct slab in kmem_obj_info()
  mm: Convert check_heap_object() to use struct slab
  mm/slub: Convert detached_freelist to use a struct slab
  mm/slub: Convert kfree() to use a struct slab
  mm/slub: Convert print_page_info() to print_slab_info()
  mm/slub: Convert pfmemalloc_match() to take a struct slab
  mm/slob: Convert SLOB to use struct slab
  mm/kasan: Convert to struct folio and struct slab
  zsmalloc: Stop using slab fields in struct page
  bootmem: Use page->index instead of page->freelist
  iommu: Use put_pages_list
  mm: Remove slab from struct page

Vlastimil Babka (17):
  mm: add virt_to_folio() and folio_address()
  mm/slab: Dissolve slab_map_pages() in its caller
  mm/slub: Make object_err() static
  mm/slub: Convert __slab_lock() and __slab_unlock() to struct slab
  mm/slub: Convert alloc_slab_page() to return a struct slab
  mm/slub: Convert __free_slab() to use struct slab
  mm/slub: Convert most struct page to struct slab by spatch
  mm/slub: Finish struct page to struct slab conversion
  mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
  mm/slab: Convert most struct page to struct slab by spatch
  mm/slab: Finish struct page to struct slab conversion
  mm: Convert struct page to struct slab in functions used by other
subsystems
  mm/memcg: Convert slab objcgs from struct page to struct slab

Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.

2021-11-30 Thread Vlastimil Babka
On 11/29/21 23:08, Zi Yan wrote:
> On 23 Nov 2021, at 12:32, Vlastimil Babka wrote:
> 
>> On 11/23/21 17:35, Zi Yan wrote:
>>> On 19 Nov 2021, at 10:15, Zi Yan wrote:
>>>>>> From my understanding, cma required an alignment of
>>>>>> max(MAX_ORDER - 1, pageblock_order) because, when MIGRATE_CMA was
>>>>>> introduced, __free_one_page() did not prevent merging two different
>>>>>> pageblocks when MAX_ORDER - 1 > pageblock_order. But the current
>>>>>> __free_one_page() implementation does prevent that.
>>>>>
>>>>> But it does prevent that only for isolated pageblock, not CMA, and your
>>>>> patchset doesn't seem to expand that to CMA? Or am I missing something?
>>>>
>>>> Yeah, you are right. Originally, I thought preventing merging isolated 
>>>> pageblock
>>>> with other types of pageblocks is sufficient, since MIGRATE_CMA is always
>>>> converted from MIGRATE_ISOLATE. But that is not true. I will rework the 
>>>> code.
>>>> Thanks for pointing this out.
>>>>
>>>
>>> I find that two pageblocks with different migratetypes, like 
>>> MIGRATE_RECLAIMABLE
>>> and MIGRATE_MOVABLE can be merged into a single free page after I checked
>>> __free_one_page() in detail and printed pageblock information during buddy 
>>> page
>>> merging.
>>
>> Yes, that can happen.
>>
>> I am not sure what consequence it will cause. Do you have any idea?
>>
>> For MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE or even MIGRATE_UNMOVABLE it's
>> absolutely fine. As long as these pageblocks are fully free (and they are if
>> it's a single free page spanning 2 pageblocks), they can be of any of these
>> type, as they can be reused as needed without causing fragmentation.
>>
>> But in case of MIGRATE_CMA and MIGRATE_ISOLATE, uncontrolled merging would
>> break the specifics of those types. That's why the code is careful for
>> MIGRATE_ISOLATE, and MIGRATE_CMA was until now done in MAX_ORDER granularity.
> 
> Thanks for the explanation. Basically migratetypes that can fall back to each
> other can be merged into a single free page, right?

Yes.

> How about MIGRATE_HIGHATOMIC? It should not be merged with other migratetypes
> from my understanding.

Hmm, it shouldn't be, minimally because it has an accounting that would become
broken. So it should either prevent merging or make sure the reservations are
done with MAX_ORDER granularity, but it seems that neither is true? CCing Mel.

> --
> Best Regards,
> Yan, Zi
> 



Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.

2021-11-23 Thread Vlastimil Babka
On 11/23/21 17:35, Zi Yan wrote:
> On 19 Nov 2021, at 10:15, Zi Yan wrote:
>>>> From my understanding, cma required an alignment of
>>>> max(MAX_ORDER - 1, pageblock_order) because, when MIGRATE_CMA was
>>>> introduced, __free_one_page() did not prevent merging two different
>>>> pageblocks when MAX_ORDER - 1 > pageblock_order. But the current
>>>> __free_one_page() implementation does prevent that.
>>>
>>> But it does prevent that only for isolated pageblock, not CMA, and your
>>> patchset doesn't seem to expand that to CMA? Or am I missing something?
>>
>> Yeah, you are right. Originally, I thought preventing merging isolated 
>> pageblock
>> with other types of pageblocks is sufficient, since MIGRATE_CMA is always
>> converted from MIGRATE_ISOLATE. But that is not true. I will rework the code.
>> Thanks for pointing this out.
>>
> 
> I find that two pageblocks with different migratetypes, like 
> MIGRATE_RECLAIMABLE
> and MIGRATE_MOVABLE can be merged into a single free page after I checked
> __free_one_page() in detail and printed pageblock information during buddy 
> page
> merging.

Yes, that can happen.

I am not sure what consequence it will cause. Do you have any idea?

For MIGRATE_RECLAIMABLE or MIGRATE_MOVABLE or even MIGRATE_UNMOVABLE it's
absolutely fine. As long as these pageblocks are fully free (and they are if
it's a single free page spanning 2 pageblocks), they can be of any of these
type, as they can be reused as needed without causing fragmentation.

But in case of MIGRATE_CMA and MIGRATE_ISOLATE, uncontrolled merging would
break the specifics of those types. That's why the code is careful for
MIGRATE_ISOLATE, and MIGRATE_CMA was until now done in MAX_ORDER granularity.
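
For reference, the careful part in __free_one_page() looks roughly like this
(merging is refused when either side is isolated):

        if (unlikely(has_isolate_pageblock(zone))) {
                int buddy_mt = get_pageblock_migratetype(buddy);

                if (migratetype != buddy_mt &&
                    (is_migrate_isolate(migratetype) ||
                     is_migrate_isolate(buddy_mt)))
                        goto done_merging;
        }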


Re: [RFC PATCH 0/3] Use pageblock_order for cma and alloc_contig_range alignment.

2021-11-19 Thread Vlastimil Babka
On 11/15/21 20:37, Zi Yan wrote:
> From: Zi Yan 
> 
> Hi David,
> 
> You suggested to make alloc_contig_range() deal with pageblock_order instead 
> of
> MAX_ORDER - 1 and get rid of MAX_ORDER - 1 dependency in virtio_mem[1]. This
> patchset is my attempt to achieve that. Please take a look and let me know if
> I am doing it correctly or not.
> 
> From my understanding, cma required an alignment of
> max(MAX_ORDER - 1, pageblock_order) because, when MIGRATE_CMA was introduced,
> __free_one_page() did not prevent merging two different pageblocks when
> MAX_ORDER - 1 > pageblock_order. But the current __free_one_page()
> implementation does prevent that.

But it does prevent that only for isolated pageblock, not CMA, and your
patchset doesn't seem to expand that to CMA? Or am I missing something?


> It should be OK to just align cma to pageblock_order.
> alloc_contig_range() relies on MIGRATE_CMA to get free pages, so it can use
> pageblock_order as alignment too.
> 
> In terms of virtio_mem, if I understand correctly, it relies on
> alloc_contig_range() to obtain contiguous free pages and offlines them to 
> reduce
> guest memory size. As the result of alloc_contig_range() alignment change,
> virtio_mem should be able to just align PFNs to pageblock_order.
> 
> Thanks.
> 
> 
> [1] 
> https://lore.kernel.org/linux-mm/28b57903-fae6-47ac-7e1b-a1dd41421...@redhat.com/
> 
> Zi Yan (3):
>   mm: cma: alloc_contig_range: use pageblock_order as the single
> alignment.
>   drivers: virtio_mem: use pageblock size as the minimum virtio_mem
> size.
>   arch: powerpc: adjust fadump alignment to be pageblock aligned.
> 
>  arch/powerpc/include/asm/fadump-internal.h |  4 +---
>  drivers/virtio/virtio_mem.c|  6 ++
>  include/linux/mmzone.h |  5 +
>  kernel/dma/contiguous.c|  2 +-
>  mm/cma.c   |  6 ++
>  mm/page_alloc.c| 12 +---
>  6 files changed, 12 insertions(+), 23 deletions(-)
> 



[RFC PATCH 28/32] iommu: Use put_pages_list

2021-11-15 Thread Vlastimil Babka
From: "Matthew Wilcox (Oracle)" 

page->freelist is for the use of slab.  We already have the ability
to free a list of pages in the core mm, but it requires the use of a
list_head and for the pages to be chained together through page->lru.
Switch the iommu code over to using put_pages_list().

Signed-off-by: Matthew Wilcox (Oracle) 
Signed-off-by: Vlastimil Babka 
Cc: Joerg Roedel 
Cc: Suravee Suthikulpanit 
Cc: Will Deacon 
Cc: David Woodhouse 
Cc: Lu Baolu 
Cc: 
---
 drivers/iommu/amd/io_pgtable.c | 59 +-
 drivers/iommu/dma-iommu.c  | 11 +
 drivers/iommu/intel/iommu.c| 89 --
 include/linux/iommu.h  |  3 +-
 4 files changed, 57 insertions(+), 105 deletions(-)

diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 182c93a43efd..d4f83e1b633f 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -74,27 +74,15 @@ static u64 *first_pte_l7(u64 *pte, unsigned long *page_size,
  *
 ****************************************************************************/
 
-static void free_page_list(struct page *freelist)
-{
-   while (freelist != NULL) {
-   unsigned long p = (unsigned long)page_address(freelist);
-
-   freelist = freelist->freelist;
-   free_page(p);
-   }
-}
-
-static struct page *free_pt_page(unsigned long pt, struct page *freelist)
+static void free_pt_page(unsigned long pt, struct list_head *list)
 {
struct page *p = virt_to_page((void *)pt);
 
-   p->freelist = freelist;
-
-   return p;
+   list_add_tail(&p->lru, list);
 }
 
 #define DEFINE_FREE_PT_FN(LVL, FN)                                            \
-static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist) \
+static void free_pt_##LVL (unsigned long __pt, struct list_head *list)        \
 {                                                                             \
        unsigned long p;                                                       \
        u64 *pt;                                                               \
@@ -113,10 +101,10 @@ static struct page *free_pt_##LVL (unsigned long __pt, struct page *freelist) \
                continue;                                               \
                                                                        \
                p = (unsigned long)IOMMU_PTE_PAGE(pt[i]);               \
-               freelist = FN(p, freelist);                             \
+               FN(p, list);                                            \
        }                                                               \
                                                                        \
-       return free_pt_page((unsigned long)pt, freelist);               \
+       free_pt_page((unsigned long)pt, list);                          \
 }
 
 DEFINE_FREE_PT_FN(l2, free_pt_page)
@@ -125,36 +113,33 @@ DEFINE_FREE_PT_FN(l4, free_pt_l3)
 DEFINE_FREE_PT_FN(l5, free_pt_l4)
 DEFINE_FREE_PT_FN(l6, free_pt_l5)
 
-static struct page *free_sub_pt(unsigned long root, int mode,
-   struct page *freelist)
+static void free_sub_pt(unsigned long root, int mode, struct list_head *list)
 {
switch (mode) {
case PAGE_MODE_NONE:
case PAGE_MODE_7_LEVEL:
break;
case PAGE_MODE_1_LEVEL:
-   freelist = free_pt_page(root, freelist);
+   free_pt_page(root, list);
break;
case PAGE_MODE_2_LEVEL:
-   freelist = free_pt_l2(root, freelist);
+   free_pt_l2(root, list);
break;
case PAGE_MODE_3_LEVEL:
-   freelist = free_pt_l3(root, freelist);
+   free_pt_l3(root, list);
break;
case PAGE_MODE_4_LEVEL:
-   freelist = free_pt_l4(root, freelist);
+   free_pt_l4(root, list);
break;
case PAGE_MODE_5_LEVEL:
-   freelist = free_pt_l5(root, freelist);
+   free_pt_l5(root, list);
break;
case PAGE_MODE_6_LEVEL:
-   freelist = free_pt_l6(root, freelist);
+   free_pt_l6(root, list);
break;
default:
BUG();
}
-
-   return freelist;
 }
 
 void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
@@ -362,7 +347,7 @@ static u64 *fetch_pte(struct amd_io_pgtable *pgtable,
return pte;
 }
 
-static struct page *free_clear_pte(u64 *pte, u64 pteval, struct page *freelist)
+static void free_clear_pte(u64 *pte, u64 pteval, struct list_head *list)
 {
unsigned long pt;
int mode;
@@ -373,12 +358,12 @@ static struct page *free_clear_pte(u64 *pte, u64 pteval, struct page *freelist)
}
 
if (

[RFC PATCH 00/32] Separate struct slab from struct page

2021-11-15 Thread Vlastimil Babka
Folks from non-slab subsystems are Cc'd only to patches affecting them, and
this cover letter.

Series also available in git, based on 5.16-rc1:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-struct_slab-v1r13

Side note: as my SLUB PREEMPT_RT series in 5.15, I would prefer to repeat the
git pull request way of eventually merging this, as it's also not a small
series. Also I wouldn't mind to then continue with a git tree for all slab
patches in general. It was apparently even done that way before:
https://lore.kernel.org/linux-mm/alpine.DEB.2.00.1107221108190.2996@tiger/
What do other slab maintainers think?

Previous version from Matthew Wilcox:
https://lore.kernel.org/all/20211004134650.4031813-1-wi...@infradead.org/

LWN coverage of the above:
https://lwn.net/Articles/871982/

This is originally an offshoot of the folio work by Matthew. One of the more
complex parts of the struct page definition is the set of fields used by the slab
allocators. It would be good for the MM in general if struct slab were its own
data type, and it also helps to prevent tail pages from slipping in anywhere.
As Matthew requested in his proof of concept series, I have taken over the
development of this series, so it's a mix of patches from him (often modified
by me) and my own.

One big difference is the use of coccinelle to perform the less interesting
parts of the conversions automatically and at once, instead of a larger number
of smaller incremental reviewable steps. Thanks to Julia Lawall and Luis
Chamberlain for all their help!

Another notable difference is that (based also on review feedback) I don't
represent with a struct slab the large kmalloc allocations, which are not
really slabs but use the page allocator directly. When going from an object
address to a struct slab, the code first tests the folio slab flag, and only
if it's set does it convert to struct slab. This makes the struct slab type
stronger.

Finally, although Matthew's version didn't use any of the folio work, the
initial support has been merged in the meanwhile, so my version builds on top
of it where appropriate. This eliminates some of the redundant compound_head()
calls, e.g. when testing the slab flag.

To sum up, after this series, struct page fields used by slab allocators are
moved from struct page to a new struct slab, that uses the same physical
storage. The availability of the fields is further distinguished by the
selected slab allocator implementation. The advantages include:

- Similar to a plain folio, if the slab is of order > 0, struct slab is always
guaranteed to be the head page. Additionally it's guaranteed to be an actual
slab page, not a large kmalloc. This removes uncertainty and potential for
bugs.
- It's not possible to accidentally use fields of slab implementation that's
not actually selected.
- Other subsystems cannot use slab's fields in struct page anymore (some
existing non-slab usages had to be adjusted in this series), so slab
implementations have more freedom in rearranging them in the struct slab.

Matthew Wilcox (Oracle) (16):
  mm: Split slab into its own type
  mm: Add account_slab() and unaccount_slab()
  mm: Convert virt_to_cache() to use struct slab
  mm: Convert __ksize() to struct slab
  mm: Use struct slab in kmem_obj_info()
  mm: Convert check_heap_object() to use struct slab
  mm/slub: Convert detached_freelist to use a struct slab
  mm/slub: Convert kfree() to use a struct slab
  mm/slub: Convert print_page_info() to print_slab_info()
  mm/slub: Convert pfmemalloc_match() to take a struct slab
  mm/slob: Convert SLOB to use struct slab
  mm/kasan: Convert to struct slab
  zsmalloc: Stop using slab fields in struct page
  bootmem: Use page->index instead of page->freelist
  iommu: Use put_pages_list
  mm: Remove slab from struct page

Vlastimil Babka (16):
  mm/slab: Dissolve slab_map_pages() in its caller
  mm/slub: Make object_err() static
  mm/slub: Convert __slab_lock() and __slab_unlock() to struct slab
  mm/slub: Convert alloc_slab_page() to return a struct slab
  mm/slub: Convert __free_slab() to use struct slab
  mm/slub: Convert most struct page to struct slab by spatch
  mm/slub: Finish struct page to struct slab conversion
  mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
  mm/slab: Convert most struct page to struct slab by spatch
  mm/slab: Finish struct page to struct slab conversion
  mm: Convert struct page to struct slab in functions used by other
subsystems
  mm/memcg: Convert slab objcgs from struct page to struct slab
  mm/kfence: Convert kfence_guarded_alloc() to struct slab
  mm/sl*b: Differentiate struct slab fields by sl*b implementations
  mm/slub: Simplify struct slab slabs field definition
  mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only
when enabled

 arch/x86/mm/init_64.c  |2 +-
 drivers/iommu/amd/io_pgtable.c |   59 +-
 drivers/iommu/dma-iommu.c  |   11 +-
 drivers/iommu/intel/iommu.c|   89 +--
 incl

Re: [PATCH v6 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2019-02-13 Thread Vlastimil Babka
On 1/22/19 11:51 PM, Nicolas Boichat wrote:
> Hi Andrew,
> 
> On Fri, Jan 11, 2019 at 6:21 PM Joerg Roedel  wrote:
>>
>> On Wed, Jan 02, 2019 at 01:51:45PM +0800, Nicolas Boichat wrote:
>> > Does anyone have any further comment on this series? If not, which
>> > maintainer is going to pick this up? I assume Andrew Morton?
>>
>> Probably, yes. I don't like to carry the mm-changes in iommu-tree, so
>> this should go through mm.
> 
> Gentle ping on this series, it seems like it's better if it goes
> through your tree.
> 
> Series still applies cleanly on linux-next, but I'm happy to resend if
> that helps.

Ping, Andrew?

> Thanks!
> 
>> Regards,
>>
>> Joerg
> 



Re: [PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-07 Thread Vlastimil Babka
On 12/7/18 7:16 AM, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> (level 1 and 2) to be allocated within the first 4GB of RAM, even
> on 64-bit systems.
> 
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 platforms).
> 
> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. Note
> that we do not explicitly pass GFP_DMA[32] to kmem_cache_zalloc,
> as this is not strictly necessary, and would cause a warning
> in mm/sl*b.c, as we did not update GFP_SLAB_BUG_MASK.
> 
> Also, print an error when the physical address does not fit in
> 32-bit, to make debugging easier in the future.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")

Also, CC stable?

> Signed-off-by: Nicolas Boichat 


Re: [PATCH v5 1/3] mm: Add support for kmem caches in DMA32 zone

2018-12-06 Thread Vlastimil Babka
On 12/7/18 7:16 AM, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> to be allocated within the first 4GB of RAM, even on 64-bit systems.
> On arm64, this is done by passing GFP_DMA32 flag to memory allocation
> functions.
> 
> For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
> a full page using get_free_pages, so we considered 3 approaches:
>  1. This patch, adding support for GFP_DMA32 slab caches.
>  2. genalloc, which requires pre-allocating the maximum number of L2
> page tables (4096, so 4MB of memory).
>  3. page_frag, which is not very memory-efficient as it is unable
> to reuse freed fragments until the whole page is freed.
> 
> This change makes it possible to create a custom cache in DMA32 zone
> using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> 
> We do not create a DMA32 kmalloc cache array, as there are currently
> no users of kmalloc(..., GFP_DMA32). These calls will continue to
> trigger a warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.
> 
> This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
> kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
> unnecessary).
> 
> Signed-off-by: Nicolas Boichat 

Acked-by: Vlastimil Babka 



Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-06 Thread Vlastimil Babka
On 12/6/18 4:49 AM, Nicolas Boichat wrote:
>> So it would be fine even unchanged. The check would anyway need some
>> more love to catch the same with __GFP_DMA to be consistent and cover
>> all corner cases.
> Yes, the test is not complete. If we really wanted this to be
> accurate, we'd need to check that GFP_* exactly matches SLAB_CACHE_*.
> 
> The only problem with dropping this is test that we should restore
> GFP_DMA32 warning/errors somewhere else (as Christopher pointed out
> here: https://lkml.org/lkml/2018/11/22/430), especially for kmalloc
> case.

I meant just dropping that patch hunk, not the whole test. Then the test
stays as it is and will keep warning anyone calling kmalloc(GFP_DMA32).
It would also warn anyone calling kmem_cache_alloc(GFP_DMA32) on a
SLAB_CACHE_DMA32 cache, but since the gfp can simply be dropped, and you
as the only user of this so far will do that, it's fine?

> Maybe this can be done in kmalloc_slab.


Re: [PATCH v4 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-05 Thread Vlastimil Babka
On 12/5/18 6:48 AM, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> (level 1 and 2) to be allocated within the first 4GB of RAM, even
> on 64-bit systems.
> 
> For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
> is defined (e.g. on arm64 platforms).
> 
> For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32.
> 
> Also, print an error when the physical address does not fit in
> 32-bit, to make debugging easier in the future.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> Signed-off-by: Nicolas Boichat 
> ---
> 
> Changes since v2:
>  - Commit message
> 
> (v3 used the page_frag approach)
> 
>  drivers/iommu/io-pgtable-arm-v7s.c | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index 445c3bde04800c..996f7b6d00b44a 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -161,6 +161,14 @@
>  
>  #define ARM_V7S_TCR_PD1  BIT(5)
>  
> +#ifdef CONFIG_ZONE_DMA32
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
> +#else
> +#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
> +#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
> +#endif
> +
>  typedef u32 arm_v7s_iopte;
>  
>  static bool selftest_running;
> @@ -198,13 +206,17 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
>   void *table = NULL;
>  
>   if (lvl == 1)
> - table = (void *)__get_dma_pages(__GFP_ZERO, get_order(size));
> + table = (void *)__get_free_pages(
> + __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size));
>   else if (lvl == 2)
> - table = kmem_cache_zalloc(data->l2_tables, gfp | GFP_DMA);
> + table = kmem_cache_zalloc(data->l2_tables,
> +   gfp | ARM_V7S_TABLE_GFP_DMA);

So as I've explained in 2/3, you don't need ARM_V7S_TABLE_GFP_DMA here
(and then you don't need to adjust the slab warnings).

>   phys = virt_to_phys(table);
> - if (phys != (arm_v7s_iopte)phys)
> + if (phys != (arm_v7s_iopte)phys) {
>   /* Doesn't fit in PTE */
> + dev_err(dev, "Page table does not fit in PTE: %pa", &phys);
>   goto out_free;
> + }
>   if (table && !(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)) {
>   dma = dma_map_single(dev, table, size, DMA_TO_DEVICE);
>   if (dma_mapping_error(dev, dma))
> @@ -737,7 +749,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
> io_pgtable_cfg *cfg,
>   data->l2_tables = kmem_cache_create("io-pgtable_armv7s_l2",
>   ARM_V7S_TABLE_SIZE(2),
>   ARM_V7S_TABLE_SIZE(2),
> - SLAB_CACHE_DMA, NULL);
> + ARM_V7S_TABLE_SLAB_CACHE, NULL);
>   if (!data->l2_tables)
>   goto out_free_data;
>  
> 



Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-05 Thread Vlastimil Babka
On 12/5/18 6:48 AM, Nicolas Boichat wrote:
> In some cases (e.g. IOMMU ARMv7s page allocator), we need to allocate
> data structures smaller than a page with GFP_DMA32 flag.
> 
> This change makes it possible to create a custom cache in DMA32 zone
> using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> 
> We do not create a DMA32 kmalloc cache array, as there are currently
> no users of kmalloc(..., GFP_DMA32). The new test in check_slab_flags
> ensures that such calls still fail (as they do before this change).
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")

Same as my comment for 1/3.

> Signed-off-by: Nicolas Boichat 

In general,
Acked-by: Vlastimil Babka 

Some comments below:

> ---
> 
> Changes since v2:
>  - Clarified commit message
>  - Add entry in sysfs-kernel-slab to document the new sysfs file
> 
> (v3 used the page_frag approach)
> 
> Documentation/ABI/testing/sysfs-kernel-slab |  9 +
>  include/linux/slab.h|  2 ++
>  mm/internal.h   |  8 ++--
>  mm/slab.c   |  4 +++-
>  mm/slab.h   |  3 ++-
>  mm/slab_common.c|  2 +-
>  mm/slub.c   | 18 +-
>  7 files changed, 40 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-slab 
> b/Documentation/ABI/testing/sysfs-kernel-slab
> index 29601d93a1c2ea..d742c6cfdffbe9 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-slab
> +++ b/Documentation/ABI/testing/sysfs-kernel-slab
> @@ -106,6 +106,15 @@ Description:
>   are from ZONE_DMA.
>   Available when CONFIG_ZONE_DMA is enabled.
>  
> +What:/sys/kernel/slab/cache/cache_dma32
> +Date:December 2018
> +KernelVersion:   4.21
> +Contact: Nicolas Boichat 
> +Description:
> + The cache_dma32 file is read-only and specifies whether objects
> + are from ZONE_DMA32.
> + Available when CONFIG_ZONE_DMA32 is enabled.

I don't have a strong opinion. It's a new file, yeah, but consistent
with already existing ones. I'd leave the decision with SL*B maintainers.

>  What:/sys/kernel/slab/cache/cpu_slabs
>  Date:May 2007
>  KernelVersion:   2.6.22
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 11b45f7ae4057c..9449b19c5f107a 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -32,6 +32,8 @@
>  #define SLAB_HWCACHE_ALIGN   ((slab_flags_t __force)0x2000U)
>  /* Use GFP_DMA memory */
>  #define SLAB_CACHE_DMA   ((slab_flags_t __force)0x4000U)
> +/* Use GFP_DMA32 memory */
> +#define SLAB_CACHE_DMA32 ((slab_flags_t __force)0x8000U)
>  /* DEBUG: Store the last owner for bug hunting */
>  #define SLAB_STORE_USER  ((slab_flags_t __force)0x0001U)
>  /* Panic if kmem_cache_create() fails */
> diff --git a/mm/internal.h b/mm/internal.h
> index a2ee82a0cd44ae..fd244ad716eaf8 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  /*
> @@ -34,9 +35,12 @@
>  #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
>  
>  /* Check for flags that must not be used with a slab allocator */
> -static inline gfp_t check_slab_flags(gfp_t flags)
> +static inline gfp_t check_slab_flags(gfp_t flags, slab_flags_t slab_flags)
>  {
> - gfp_t bug_mask = __GFP_DMA32 | __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> + gfp_t bug_mask = __GFP_HIGHMEM | ~__GFP_BITS_MASK;
> +
> + if (!IS_ENABLED(CONFIG_ZONE_DMA32) || !(slab_flags & SLAB_CACHE_DMA32))
> + bug_mask |= __GFP_DMA32;

I'll point out that this is not even strictly needed AFAICS, as only
flags passed to kmem_cache_alloc() are checked - the cache->allocflags
derived from SLAB_CACHE_DMA32 are appended only after check_slab_flags()
(in both SLAB and SLUB AFAICS). And for a cache created with
SLAB_CACHE_DMA32, the caller of kmem_cache_alloc() doesn't need to also
include __GFP_DMA32, the allocation will be from ZONE_DMA32 regardless.
So it would be fine even unchanged. The check would anyway need some
more love to catch the same with __GFP_DMA to be consistent and cover
all corner cases.

>  
>   if (unlikely(flags & bug_mask)) {
>   gfp_t invalid_mask = flags & bug_mask;
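To make the behaviour concrete, a sketch (hypothetical cache name; kernel
context assumed):

/* A cache created with SLAB_CACHE_DMA32 returns ZONE_DMA32 memory for a
 * plain GFP_KERNEL allocation, since cache->allocflags are appended only
 * after check_slab_flags() has run: */
struct kmem_cache *c = kmem_cache_create("dma32-example", 1024, 1024,
                                         SLAB_CACHE_DMA32, NULL);
void *obj = kmem_cache_alloc(c, GFP_KERNEL);   /* from ZONE_DMA32 */

/* kmalloc() has no DMA32 cache array, so __GFP_DMA32 stays in bug_mask
 * and this still warns, exactly as before the series: */
void *bad = kmalloc(1024, GFP_KERNEL | __GFP_DMA32);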


Re: [PATCH v4 1/3] mm: slab/slub: Add check_slab_flags function to check for valid flags

2018-12-05 Thread Vlastimil Babka
On 12/5/18 6:48 AM, Nicolas Boichat wrote:
> Remove duplicated code between slab and slub. This will also make it
> easier to extend the test in the next commits.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")

Well, not really. Patch 3 does that, and yeah, this will be a prerequisite
for a clean stable backport, but we don't tag all prerequisites like
this, I think?

> Signed-off-by: Nicolas Boichat 

Acked-by: Vlastimil Babka 


Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-12-04 Thread Vlastimil Babka
On 12/4/18 10:37 AM, Nicolas Boichat wrote:
> On Sun, Nov 11, 2018 at 5:04 PM Nicolas Boichat  wrote:
>>
>> This is a follow-up to the discussion in [1], to make sure that the page
>> tables allocated by iommu/io-pgtable-arm-v7s are contained within 32-bit
>> physical address space.
>>
>> [1] 
>> https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
> 
> Hi everyone,
> 
> Let's try to summarize here.
> 
> First, we confirmed that this is a regression, and IOMMU errors happen
> on 4.19 and linux-next/master on MT8173 (elm, Acer Chromebook R13).
> The issue most likely starts from ad67f5a6545f ("arm64: replace
> ZONE_DMA with ZONE_DMA32"), i.e. 4.15, and presumably breaks a number
> of Mediatek platforms (and maybe others?).
> 
> We have a few options here:
> 1. This series [2], that adds support for GFP_DMA32 slab caches,
> _without_ adding kmalloc caches (since there are no users of
> kmalloc(..., GFP_DMA32)). I think I've addressed all the comments on
> the 3 patches, and AFAICT this solution works fine.
> 2. genalloc. That works, but unless we preallocate 4MB for L2 tables
> (which is wasteful as we usually only need a handful of L2 tables),
> we'll need changes in the core (use GFP_ATOMIC) to allow allocating on
> demand, and as it stands we'd have no way to shrink the allocation.
> 3. page_frag [3]. That works fine, and the code is quite simple. One
> drawback is that fragments in partially freed pages cannot be reused
> (from limited experiments, I see that IOMMU L2 tables are rarely
> freed, so it's unlikely a whole page would get freed). But given the
> low number of L2 tables, maybe we can live with that.
> 
> I think 2 is out. Any preference between 1 and 3? I think 1 makes
> better use of the memory, so that'd be my preference. But I'm probably
> missing something.

I would prefer 1 as well. IIRC you already confirmed that alignment
requirements are not broken for custom kmem caches even in the presence
of SLUB debug options (and I would say it would be a bug to be fixed if
they were). I just asked (and didn't get a reply, I think) about your
ability to handle the GFP_ATOMIC allocation failures. They should be
rare when only single page allocations are needed for the kmem cache.
But in case they are not an option, then preallocating would be needed,
thus probably option 2.
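For option 1, the failure handling in question might look like this in the
v7s driver (a sketch with assumed control flow, not the actual patch):

/* In atomic context the DMA32 kmem cache allocation can fail; the map
 * request must then be unwound cleanly by the caller. */
table = kmem_cache_zalloc(data->l2_tables, GFP_ATOMIC);
if (!table)
        return -ENOMEM; /* caller unmaps whatever was already mapped */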

> [2] https://patchwork.kernel.org/cover/10677529/, 3 patches
> [3] https://patchwork.codeaurora.org/patch/671639/
> 
> Thanks,
> 
> Nicolas
> 



Re: [PATCH v2 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-11-23 Thread Vlastimil Babka
On 11/22/18 9:23 AM, Christoph Hellwig wrote:
> On Wed, Nov 21, 2018 at 10:26:26PM +, Robin Murphy wrote:
>> TBH, if this DMA32 stuff is going to be contentious we could possibly just
>> rip out the offending kmem_cache - it seemed like good practice for the
>> use-case, but provided kzalloc(SZ_1K, gfp | GFP_DMA32) can be relied upon to
>> give the same 1KB alignment and chance of succeeding as the equivalent
>> kmem_cache_alloc(), then we could quite easily make do with that instead.
> 
> Neither is the slab support for kmalloc, nor do kmalloc allocations
> have useful alignment apparently (at least if you use slub debug).

Is this also true for caches created by kmem_cache_create(), that
debugging options can result in not respecting the alignment passed to
kmem_cache_create()? That would be rather bad, IMHO.

> But I do agree with the sentiment of not wanting to spread GFP_DMA32
> futher into the slab allocator.

I don't see a problem with GFP_DMA32 for custom caches. Generic
kmalloc() would be worse, since it would have to create a new array of
kmalloc caches. But that's already ruled out due to the alignment.

> I think you want a simple genalloc allocator for this rather special
> use case.

I would prefer if slab could support it, as it doesn't have to
preallocate. OTOH if the allocations are GFP_ATOMIC as suggested later
in the thread, and need to always succeed, then preallocation could be
better, and thus maybe genalloc.
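For comparison, the genalloc variant with preallocation would look roughly
like this (a sketch under the stated assumptions; error handling omitted):

#include <linux/genalloc.h>

/* Preallocate a GFP_DMA32 chunk up front and carve 1KB L2 tables out of
 * it; gen_pool_alloc() never calls into the page allocator, so even
 * GFP_ATOMIC callers cannot fail until the pool is exhausted. */
struct gen_pool *pool = gen_pool_create(ilog2(SZ_1K), -1);
unsigned long chunk = __get_free_pages(GFP_KERNEL | GFP_DMA32, 2);

gen_pool_add(pool, chunk, PAGE_SIZE << 2, -1);
table = (void *)gen_pool_alloc(pool, SZ_1K);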


Re: [PATCH v2 3/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-11-23 Thread Vlastimil Babka
On 11/22/18 2:20 AM, Nicolas Boichat wrote:
> On Thu, Nov 22, 2018 at 2:02 AM Michal Hocko  wrote:
>>
>> On Wed 21-11-18 16:46:38, Will Deacon wrote:
>>> On Sun, Nov 11, 2018 at 05:03:41PM +0800, Nicolas Boichat wrote:
>>>
>>> It's a bit grotty that GFP_DMA32 doesn't just map to GFP_DMA on 32-bit
>>> architectures, since then we wouldn't need this #ifdeffery afaict.
>>
>> But GFP_DMA32 should map to GFP_KERNEL on 32b, no? Or what exactly is
>> going on in here?
> 
> GFP_DMA32 will fail due to check_slab_flags (aka GFP_SLAB_BUG_MASK
> before patch 1/3 of this series)... But yes, it may be neater if there
> was transparent remapping of GFP_DMA32/SLAB_CACHE_DMA32 to
> GFP_DMA/SLAB_CACHE_DMA on 32-bit arch...

I don't know about ARM, but AFAIK on x86 DMA means within the first 16MB of
physical memory, and DMA32 means within the first 4GB. It doesn't matter if
the CPU is running in 32bit or 64bit mode. But, when it runs 32bit, the
kernel can direct map less than 4GB anyway, which means it doesn't need
the extra DMA32 zone, i.e. GFP_KERNEL can only get you memory that's
also acceptable for GFP_DMA32.
But, DMA is still DMA, i.e. the first 16MB. Remapping GFP_DMA32 to GFP_DMA on
x86 wouldn't work, as the GFP_DMA32 allocations would then only use
those 16MB and exhaust them very fast.
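In code terms, the zone selection being described is roughly (illustrative
comment only, for an x86_64 config with both zones enabled):

/*
 * gfp_zone(GFP_DMA)    -> ZONE_DMA     (first 16MB, ISA-style devices)
 * gfp_zone(GFP_DMA32)  -> ZONE_DMA32   (first 4GB)
 * gfp_zone(GFP_KERNEL) -> ZONE_NORMAL  (any direct-mapped memory)
 */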

>> --
>> Michal Hocko
>> SUSE Labs



Re: [PATCH RFC 1/3] mm: When CONFIG_ZONE_DMA32 is set, use DMA32 for SLAB_CACHE_DMA

2018-11-09 Thread Vlastimil Babka
On 11/9/18 12:57 PM, Nicolas Boichat wrote:
> On Fri, Nov 9, 2018 at 6:43 PM Vlastimil Babka  wrote:
>> Also I'm probably missing the point of this all. In patch 3 you use
>> __get_dma32_pages() thus __get_free_pages(__GFP_DMA32), which uses
>> alloc_pages, thus the page allocator directly, and there's no slab
>> caches involved.
> 
> __get_dma32_pages fixes level 1 page allocations in patch 3.
> 
> This change fixes level 2 page allocations
> (kmem_cache_zalloc(data->l2_tables, gfp | GFP_DMA)), by transparently
> remapping GFP_DMA to an underlying ZONE_DMA32.
> 
> The alternative would be to create a new SLAB_CACHE_DMA32 when
> CONFIG_ZONE_DMA32 is defined, but then I'm concerned that the callers
> would need to choose between the 2 (GFP_DMA or GFP_DMA32...), and also
> need to use some ifdefs (but maybe that's not a valid concern?).
> 
>> It makes little sense to involve slab for page table
>> allocations anyway, as those tend to be aligned to a page size (or
>> high-order page size). So what am I missing?
> 
> Level 2 tables are ARM_V7S_TABLE_SIZE(2) => 1KB, so we'd waste 3KB if
> we allocated a full page.

Oh, I see.

Well, I think indeed the most transparent would be to support
SLAB_CACHE_DMA32. The callers of kmem_cache_zalloc() would then need not
add anything special to gfp, as that's stored internally upon
kmem_cache_create(). Of course SLAB_BUG_MASK would no longer have to
treat __GFP_DMA32 as unexpected. It would be unexpected when passed to
kmalloc() which doesn't have special dma32 caches, but for a cache
explicitly created to allocate from ZONE_DMA32, I don't see why not. I'm
somewhat surprised that there wouldn't be a need for this earlier, so
maybe I'm still missing something.

> Thanks,
> 



Re: [PATCH RFC 1/3] mm: When CONFIG_ZONE_DMA32 is set, use DMA32 for SLAB_CACHE_DMA

2018-11-09 Thread Vlastimil Babka
On 11/9/18 9:24 AM, Nicolas Boichat wrote:
> Some callers, namely iommu/io-pgtable-arm-v7s, expect the physical
> address returned by kmem_cache_alloc with GFP_DMA parameter to be
> a 32-bit address.
> 
> Instead of adding a separate SLAB_CACHE_DMA32 (and then audit
> all the calls to check if they require memory from DMA or DMA32
> zone), we simply allocate SLAB_CACHE_DMA cache in DMA32 region,
> if CONFIG_ZONE_DMA32 is set.
> 
> Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
> Signed-off-by: Nicolas Boichat 
> ---
>  include/linux/slab.h | 13 ++++++++++++-
>  mm/slab.c            |  2 +-
>  mm/slub.c            |  2 +-
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 918f374e7156f4..390afe90c5dec0 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -30,7 +30,7 @@
>  #define SLAB_POISON  ((slab_flags_t __force)0x0800U)
>  /* Align objs on cache lines */
>  #define SLAB_HWCACHE_ALIGN   ((slab_flags_t __force)0x2000U)
> -/* Use GFP_DMA memory */
> +/* Use GFP_DMA or GFP_DMA32 memory */
>  #define SLAB_CACHE_DMA   ((slab_flags_t __force)0x4000U)
>  /* DEBUG: Store the last owner for bug hunting */
>  #define SLAB_STORE_USER  ((slab_flags_t __force)0x0001U)
> @@ -126,6 +126,17 @@
>  #define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
>   (unsigned long)ZERO_SIZE_PTR)
>  
> +/*
> + * When ZONE_DMA32 is defined, have SLAB_CACHE_DMA allocate memory with
> + * GFP_DMA32 instead of GFP_DMA, as this is what some of the callers
> + * require (instead of duplicating cache for DMA and DMA32 zones).
> + */
> +#ifdef CONFIG_ZONE_DMA32
> +#define SLAB_CACHE_DMA_GFP GFP_DMA32
> +#else
> +#define SLAB_CACHE_DMA_GFP GFP_DMA
> +#endif

AFAICS this will break e.g. x86, which can have both ZONE_DMA and
ZONE_DMA32, and now you would make kmalloc(__GFP_DMA) return objects
from ZONE_DMA32 instead of ZONE_DMA, which can break something.
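Concretely (hypothetical driver, illustrative only):

/* A driver for a device that can only address the first 16MB relies on
 * GFP_DMA handing back ZONE_DMA memory: */
buf = kmalloc(64, GFP_KERNEL | GFP_DMA);
/* With SLAB_CACHE_DMA silently remapped to GFP_DMA32, this object could
 * land anywhere below 4GB -- unreachable for such a device. */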

Also I'm probably missing the point of this all. In patch 3 you use
__get_dma32_pages() thus __get_free_pages(__GFP_DMA32), which uses
alloc_pages, thus the page allocator directly, and there's no slab
caches involved. It makes little sense to involve slab for page table
allocations anyway, as those tend to be aligned to a page size (or
high-order page size). So what am I missing?


Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-16 Thread Vlastimil Babka
On 07/09/2018 02:19 PM, Marek Szyprowski wrote:
> cma_alloc() function doesn't really support gfp flags other than
> __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter.
> 
> This will help to avoid giving false feeling that this function supports
> standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer,
> what has already been an issue: see commit dd65a941f6ba ("arm64:
> dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").
> 
> Signed-off-by: Marek Szyprowski 

Acked-by: Vlastimil Babka 
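The converted interface described in the commit message would read as
follows (a sketch based on the description above, not the verbatim patch):

/* gfp_mask is replaced by the only thing it effectively controlled: */
struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
                       bool no_warn);

/* Since __GFP_ZERO was never honoured, callers needing zeroed memory
 * must clear it themselves (lowmem assumed here), as the dd65a941f6ba
 * fix referenced above did: */
page = cma_alloc(cma, count, align, false);
if (page)
        memset(page_address(page), 0, count << PAGE_SHIFT);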


Re: [PATCH 2/2] dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()

2018-07-16 Thread Vlastimil Babka
On 07/09/2018 02:19 PM, Marek Szyprowski wrote:
> The CMA memory allocator doesn't support standard gfp flags for memory
> allocation, so there is no point having it as a parameter for
> dma_alloc_from_contiguous() function. Replace it by a boolean no_warn
> argument, which covers all that the underlying cma_alloc() function supports.
> 
> This will help to avoid giving false feeling that this function supports
> standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer,
> what has already been an issue: see commit dd65a941f6ba ("arm64:
> dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").
> 
> Signed-off-by: Marek Szyprowski 

Acked-by: Vlastimil Babka 


Re: [PATCH v2 1/3] mm: alloc_contig_range: allow to specify GFP mask

2017-01-28 Thread Vlastimil Babka

On 01/27/2017 06:23 PM, Lucas Stach wrote:

Currently alloc_contig_range assumes that the compaction should
be done with the default GFP_KERNEL flags. This is probably
right for all current uses of this interface, but may change as
CMA is used in more use-cases (including being the default DMA
memory allocator on some platforms).

Change the function prototype, to allow for passing through the
GFP mask set by upper layers.

Also respect global restrictions by applying memalloc_noio_flags
to the passed in flags.

Signed-off-by: Lucas Stach <l.st...@pengutronix.de>
Acked-by: Michal Hocko <mho...@suse.com>



Acked-by: Vlastimil Babka <vba...@suse.cz>


---
v2: add memalloc_noio restriction
---
 include/linux/gfp.h | 2 +-
 mm/cma.c            | 3 ++-
 mm/hugetlb.c        | 3 ++-
 mm/page_alloc.c     | 5 +++--
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4175dca4ac39..1efa221e0e1d 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -549,7 +549,7 @@ static inline bool pm_suspended_storage(void)
 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || 
defined(CONFIG_CMA)
 /* The below functions must be run on a range from a single zone. */
 extern int alloc_contig_range(unsigned long start, unsigned long end,
- unsigned migratetype);
+ unsigned migratetype, gfp_t gfp_mask);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 #endif

diff --git a/mm/cma.c b/mm/cma.c
index c960459eda7e..fbd67d866f67 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -407,7 +407,8 @@ struct page *cma_alloc(struct cma *cma, size_t count, 
unsigned int align)

pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
mutex_lock(&cma_mutex);
-   ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+   ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
+GFP_KERNEL);
mutex_unlock(&cma_mutex);
if (ret == 0) {
page = pfn_to_page(pfn);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3edb759c5c7d..6ed8b160fc0d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1051,7 +1051,8 @@ static int __alloc_gigantic_page(unsigned long start_pfn,
unsigned long nr_pages)
 {
unsigned long end_pfn = start_pfn + nr_pages;
-   return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+   return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
+ GFP_KERNEL);
 }

 static bool pfn_range_valid_gigantic(struct zone *z,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index eced9fee582b..c5a745b521c0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7230,6 +7230,7 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
  * #MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
  * in range must have the same migratetype and it must
  * be either of the two.
+ * @gfp_mask:  GFP mask to use during compaction
  *
  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  * aligned, however it's the caller's responsibility to guarantee that
@@ -7243,7 +7244,7 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
  * need to be freed with free_contig_range().
  */
 int alloc_contig_range(unsigned long start, unsigned long end,
-  unsigned migratetype)
+  unsigned migratetype, gfp_t gfp_mask)
 {
unsigned long outer_start, outer_end;
unsigned int order;
@@ -7255,7 +7256,7 @@ int alloc_contig_range(unsigned long start, unsigned long 
end,
.zone = page_zone(pfn_to_page(start)),
.mode = MIGRATE_SYNC,
.ignore_skip_hint = true,
-   .gfp_mask = GFP_KERNEL,
+   .gfp_mask = memalloc_noio_flags(gfp_mask),
};
INIT_LIST_HEAD(&cc.migratepages);






Re: [PATCH 3/3] mm: wire up GFP flag passing in dma_alloc_from_contiguous

2017-01-20 Thread Vlastimil Babka
On 01/19/2017 06:07 PM, Lucas Stach wrote:
> The callers of the DMA alloc functions already provide the proper
> context GFP flags. Make sure to pass them through to the CMA
> allocator, to make the CMA compaction context aware.
> 
> Signed-off-by: Lucas Stach <l.st...@pengutronix.de>

Acked-by: Vlastimil Babka <vba...@suse.cz>
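In other words (a sketch; exact signature per this series), the caller's gfp
context now reaches CMA's compaction instead of a hardcoded GFP_KERNEL:

/* Whatever restrictions the DMA caller established (e.g. __GFP_NOWARN,
 * or a noio context) are passed down to alloc_contig_range() and thus
 * to the compaction control: */
page = dma_alloc_from_contiguous(dev, count, align,
                                 GFP_KERNEL | __GFP_NOWARN);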



Re: [PATCH 2/3] mm: cma_alloc: allow to specify GFP mask

2017-01-20 Thread Vlastimil Babka
On 01/19/2017 06:07 PM, Lucas Stach wrote:
> Most users of this interface just want to use it with the default
> GFP_KERNEL flags, but for cases where DMA memory is allocated it may
> be called from a different context.
> 
> No functional change yet, just passing through the flag to the
> underlying alloc_contig_range function.
> 
> Signed-off-by: Lucas Stach <l.st...@pengutronix.de>

Acked-by: Vlastimil Babka <vba...@suse.cz>

> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 3 ++-
>  drivers/base/dma-contiguous.c        | 2 +-
>  include/linux/cma.h                  | 3 ++-
>  mm/cma.c                             | 5 +++--
>  mm/cma_debug.c                       | 2 +-
>  5 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 5bb24be0b346..56a62d97ab2d 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -56,7 +56,8 @@ struct page *kvm_alloc_hpt(unsigned long nr_pages)
>  {
>   VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);
>  
> - return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES));
> + return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES),
> +  GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
>  
> diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c
> index e167a1e1bccb..d1a9cbabc627 100644
> --- a/drivers/base/dma-contiguous.c
> +++ b/drivers/base/dma-contiguous.c
> @@ -193,7 +193,7 @@ struct page *dma_alloc_from_contiguous(struct device 
> *dev, size_t count,
>   if (align > CONFIG_CMA_ALIGNMENT)
>   align = CONFIG_CMA_ALIGNMENT;
>  
> - return cma_alloc(dev_get_cma_area(dev), count, align);
> + return cma_alloc(dev_get_cma_area(dev), count, align, GFP_KERNEL);
>  }
>  
>  /**
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index 6f0a91b37f68..03f32d0bd1d8 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -29,6 +29,7 @@ extern int __init cma_declare_contiguous(phys_addr_t base,
>  extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
>   unsigned int order_per_bit,
>   struct cma **res_cma);
> -extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int 
> align);
> +extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int 
> align,
> +   gfp_t gfp_mask);
>  extern bool cma_release(struct cma *cma, const struct page *pages, unsigned 
> int count);
>  #endif
> diff --git a/mm/cma.c b/mm/cma.c
> index fbd67d866f67..a33ddfde315d 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -362,7 +362,8 @@ int __init cma_declare_contiguous(phys_addr_t base,
>   * This function allocates part of contiguous memory on specific
>   * contiguous memory area.
>   */
> -struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align)
> +struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
> +gfp_t gfp_mask)
>  {
>   unsigned long mask, offset;
>   unsigned long pfn = -1;
> @@ -408,7 +409,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, 
> unsigned int align)
>   pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
>   mutex_lock(&cma_mutex);
>   ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
> -  GFP_KERNEL);
> +  gfp_mask);
>   mutex_unlock(&cma_mutex);
>   if (ret == 0) {
>   page = pfn_to_page(pfn);
> diff --git a/mm/cma_debug.c b/mm/cma_debug.c
> index f8e4b60db167..ffc0c3d0ae64 100644
> --- a/mm/cma_debug.c
> +++ b/mm/cma_debug.c
> @@ -138,7 +138,7 @@ static int cma_alloc_mem(struct cma *cma, int count)
>   if (!mem)
>   return -ENOMEM;
>  
> - p = cma_alloc(cma, count, 0);
> + p = cma_alloc(cma, count, 0, GFP_KERNEL);
>   if (!p) {
>   kfree(mem);
>   return -ENOMEM;
> 



Re: [PATCH 1/3] mm: alloc_contig_range: allow to specify GFP mask

2017-01-20 Thread Vlastimil Babka
On 01/19/2017 06:07 PM, Lucas Stach wrote:
> Currently alloc_contig_range assumes that the compaction should
> be done with the default GFP_KERNEL flags. This is probably
> right for all current uses of this interface, but may change as
> CMA is used in more use-cases (including being the default DMA
> memory allocator on some platforms).
> 
> Change the function prototype, to allow for passing through the
> GFP mask set by upper layers. No functional change in this patch,
> just making the assumptions a bit more obvious.
> 
> Signed-off-by: Lucas Stach 

[...]

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index eced9fee582b..6d392d8dee36 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7230,6 +7230,7 @@ static int __alloc_contig_migrate_range(struct 
> compact_control *cc,
>   *   #MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
>   *   in range must have the same migratetype and it must
>   *   be either of the two.
> + * @gfp_mask:GFP mask to use during compaction
>   *
>   * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
>   * aligned, however it's the caller's responsibility to guarantee that
> @@ -7243,7 +7244,7 @@ static int __alloc_contig_migrate_range(struct 
> compact_control *cc,
>   * need to be freed with free_contig_range().
>   */
>  int alloc_contig_range(unsigned long start, unsigned long end,
> -unsigned migratetype)
> +unsigned migratetype, gfp_t gfp_mask)
>  {
>   unsigned long outer_start, outer_end;
>   unsigned int order;
> @@ -7255,7 +7256,7 @@ int alloc_contig_range(unsigned long start, unsigned 
> long end,
>   .zone = page_zone(pfn_to_page(start)),
>   .mode = MIGRATE_SYNC,
>   .ignore_skip_hint = true,
> - .gfp_mask = GFP_KERNEL,
> + .gfp_mask = gfp_mask,

I think you should apply memalloc_noio_flags() here (and Michal should
then convert it to the new name in his scoped gfp_nofs series). Note
that then it's technically a functional change, but it's needed.
Otherwise looks good.

>   };
>   INIT_LIST_HEAD(&cc.migratepages);
>  
> 
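The suggested change, which v2 of this patch (acked earlier in this archive)
indeed adopts, is a one-liner:

/* Strip __GFP_IO/__GFP_FS when the task is in a memalloc_noio scope: */
.gfp_mask = memalloc_noio_flags(gfp_mask),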
