Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount
On Wed, Feb 09, 2022 at 02:53:51PM +0100, Christoph Hellwig wrote: > On Wed, Feb 09, 2022 at 08:29:56AM -0400, Jason Gunthorpe wrote: > > It is nice, but the other series are still impacted by the fsdax mess > > - they still stuff pages into ptes without proper refcounts and have > > to carry nonsense to dance around this problem. > > > > I certainly would be unhappy if the amd driver, for instance, gained > > the fsdax problem as well and started pushing 4k pages into PMDs. > > As said before: I think this all needs to be fixed. But I'd rather > fix it gradually and I think this series is a nice step forward. > After that we can look at the pte mappings. Right, I agree with this Jason
Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount
On Wed, Feb 09, 2022 at 07:23:45AM +0100, Christoph Hellwig wrote: > On Tue, Feb 08, 2022 at 07:30:11PM -0800, Dan Williams wrote: > > Interesting. I had expected that to really fix the refcount problem > > that fs/dax.c would need to start taking real page references as pages > > were added to a mapping, just like page cache. > > I think we should do that eventually. But I think this series that > just attacks the device private type and extends to the device coherent > and p2p enhacements is a good first step to stop the proliferation of > the one off refcount and to allow to deal with the fsdax pages in another > more focuessed series. It is nice, but the other series are still impacted by the fsdax mess - they still stuff pages into ptes without proper refcounts and have to carry nonsense to dance around this problem. I certainly would be unhappy if the amd driver, for instance, gained the fsdax problem as well and started pushing 4k pages into PMDs. Jason
Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig wrote: [..] > @@ -500,28 +482,27 @@ void free_devmap_managed_page(struct page *page) > */ > page->mapping = NULL; > page->pgmap->ops->page_free(page); > + > + /* > +* Reset the page count to 1 to prepare for handing out the page > again. > +*/ > + set_page_count(page, 1); Interesting. I had expected that to really fix the refcount problem that fs/dax.c would need to start taking real page references as pages were added to a mapping, just like page cache. This looks ok to me, and passes my tests. So given I'm still working my way back to fixing the references properly I'm ok for this hack to replace the more broken hack that is there presently. Reviewed-by: Dan Williams
Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount
On 2/6/22 22:32, Christoph Hellwig wrote: ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE pages. Note that this excludes the special idle page wakeup for fsdax pages, which still happens at refcount 1. This is a separate issue and will be sorted out later. Given that only fsdax pages require the notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig symbol can go away and be replaced with a FS_DAX check for this hook in the put_page fastpath. Based on an earlier patch from Ralph Campbell . Signed-off-by: Christoph Hellwig Thanks for working on this, definite step forward. Reviewed-by: Ralph Campbell
Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount
On Mon, Feb 07, 2022 at 07:32:48AM +0100, Christoph Hellwig wrote: > ZONE_DEVICE struct pages have an extra reference count that complicates > the code for put_page() and several places in the kernel that need to > check the reference count to see that a page is not being used (gup, > compaction, migration, etc.). Clean up the code so the reference count > doesn't need to be treated specially for ZONE_DEVICE pages. > > Note that this excludes the special idle page wakeup for fsdax pages, > which still happens at refcount 1. This is a separate issue and will > be sorted out later. Given that only fsdax pages require the > notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig > symbol can go away and be replaced with a FS_DAX check for this hook > in the put_page fastpath. > > Based on an earlier patch from Ralph Campbell . > > Signed-off-by: Christoph Hellwig > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 1 - > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 - > drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 - > fs/Kconfig | 1 - > include/linux/memremap.h | 12 +++-- > include/linux/mm.h | 6 +-- > lib/test_hmm.c | 1 - > mm/Kconfig | 4 -- > mm/internal.h| 2 + > mm/memcontrol.c | 11 ++--- > mm/memremap.c| 57 > mm/migrate.c | 6 --- > mm/swap.c| 16 ++- > 13 files changed, 36 insertions(+), 83 deletions(-) It looks like a good next step to me Reviewed-by: Jason Gunthorpe > struct dev_pagemap_ops { > /* > - * Called once the page refcount reaches 1. (ZONE_DEVICE pages never > - * reach 0 refcount unless there is a refcount bug. This allows the > - * device driver to implement its own memory management.) > + * Called once the page refcount reaches 0. The reference count will be > + * reset to one by the core code after the method is called to prepare > + * for handing out the page again. I did prefer Ralph's version of this that kept the refcount at 0 while the page was on the free-list. I hope we can get there again after later series :) Jason