David Hildenbrand wrote:
[..]
> > Maybe there is something missing in ZONE_DEVICE freeing/splitting code
> > of large folios, where we should do the same, to make sure that all
> > page->memcg_data is actually 0?
> >
> > I assume so. Let me dig.
> >
> > I suspect this should do the trick:
>
> diff --git a/fs/dax.c b/fs/dax.c
> index af5045b0f476e..8dffffef70d21 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -397,6 +397,10 @@ static inline unsigned long dax_folio_put(struct folio
> *folio)
>  	if (!order)
>  		return 0;
>
> +#ifdef NR_PAGES_IN_LARGE_FOLIO
> +	folio->_nr_pages = 0;
> +#endif
I assume this new fs/dax.c instance of this pattern motivates a
folio_set_nr_pages() helper to hide the ifdef? It is concerning that
fs/dax.c misses common expectations like this, but I think that is the
nature of bypassing the page allocator to get folios. However, it raises
the question of whether fixing it here is sufficient for the other
ZONE_DEVICE folio cases. I did not immediately find a place where other
ZONE_DEVICE users might be calling prep_compound_page() and leaving stale
tail page metadata lying around. Alistair?

> +
>  	for (i = 0; i < (1UL << order); i++) {
>  		struct dev_pagemap *pgmap = page_pgmap(&folio->page);
>  		struct page *page = folio_page(folio, i);
>
>
> Alternatively (in the style of fa23a338de93aa03eb0b6146a0440f5762309f85)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index af5045b0f476e..a1e354b748522 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -412,6 +412,9 @@ static inline unsigned long dax_folio_put(struct folio
> *folio)
>  		 */
>  		new_folio->pgmap = pgmap;
>  		new_folio->share = 0;
> +#ifdef CONFIG_MEMCG
> +		new_folio->memcg_data = 0;
> +#endif

This looks correct, but I like the first option because I would never
expect a dax-page to need to worry about being part of a memcg.

>  		WARN_ON_ONCE(folio_ref_count(new_folio));
>  	}
>
>
> --
> Cheers,
>
> David / dhildenb

Thanks for the help, David!