Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-09 Thread Jason Gunthorpe
On Wed, Feb 09, 2022 at 02:53:51PM +0100, Christoph Hellwig wrote:
> On Wed, Feb 09, 2022 at 08:29:56AM -0400, Jason Gunthorpe wrote:
> > It is nice, but the other series are still impacted by the fsdax mess
> > - they still stuff pages into ptes without proper refcounts and have
> > to carry nonsense to dance around this problem.
> > 
> > I certainly would be unhappy if the amd driver, for instance, gained
> > the fsdax problem as well and started pushing 4k pages into PMDs.
> 
> As said before: I think this all needs to be fixed.  But I'd rather
> fix it gradually and I think this series is a nice step forward.
> After that we can look at the pte mappings.

Right, I agree with this

Jason


Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-09 Thread Jason Gunthorpe
On Wed, Feb 09, 2022 at 07:23:45AM +0100, Christoph Hellwig wrote:
> On Tue, Feb 08, 2022 at 07:30:11PM -0800, Dan Williams wrote:
> > Interesting. I had expected that to really fix the refcount problem
> > that fs/dax.c would need to start taking real page references as pages
> > were added to a mapping, just like page cache.
> 
> I think we should do that eventually.  But I think this series that
> just attacks the device private type and extends to the device coherent
> and p2p enhacements is a good first step to stop the proliferation of
> the one off refcount and to allow to deal with the fsdax pages in another
> more focuessed series.

It is nice, but the other series are still impacted by the fsdax mess
- they still stuff pages into ptes without proper refcounts and have
to carry nonsense to dance around this problem.

I certainly would be unhappy if the amd driver, for instance, gained
the fsdax problem as well and started pushing 4k pages into PMDs.

Jason


Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-08 Thread Dan Williams
On Sun, Feb 6, 2022 at 10:33 PM Christoph Hellwig  wrote:
[..]
> @@ -500,28 +482,27 @@ void free_devmap_managed_page(struct page *page)
>  */
> page->mapping = NULL;
> page->pgmap->ops->page_free(page);
> +
> +   /*
> +* Reset the page count to 1 to prepare for handing out the page 
> again.
> +*/
> +   set_page_count(page, 1);

Interesting. I had expected that to really fix the refcount problem
that fs/dax.c would need to start taking real page references as pages
were added to a mapping, just like page cache.

This looks ok to me, and passes my tests. So given I'm still working
my way back to fixing the references properly I'm ok for this hack to
replace the more broken hack that is there presently.

Reviewed-by: Dan Williams 


Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-07 Thread Ralph Campbell

On 2/6/22 22:32, Christoph Hellwig wrote:

ZONE_DEVICE struct pages have an extra reference count that complicates
the code for put_page() and several places in the kernel that need to
check the reference count to see that a page is not being used (gup,
compaction, migration, etc.). Clean up the code so the reference count
doesn't need to be treated specially for ZONE_DEVICE pages.

Note that this excludes the special idle page wakeup for fsdax pages,
which still happens at refcount 1.  This is a separate issue and will
be sorted out later.  Given that only fsdax pages require the
notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig
symbol can go away and be replaced with a FS_DAX check for this hook
in the put_page fastpath.

Based on an earlier patch from Ralph Campbell .

Signed-off-by: Christoph Hellwig 


Thanks for working on this, definite step forward.

Reviewed-by: Ralph Campbell 



Re: [PATCH 7/8] mm: remove the extra ZONE_DEVICE struct page refcount

2022-02-07 Thread Jason Gunthorpe
On Mon, Feb 07, 2022 at 07:32:48AM +0100, Christoph Hellwig wrote:
> ZONE_DEVICE struct pages have an extra reference count that complicates
> the code for put_page() and several places in the kernel that need to
> check the reference count to see that a page is not being used (gup,
> compaction, migration, etc.). Clean up the code so the reference count
> doesn't need to be treated specially for ZONE_DEVICE pages.
> 
> Note that this excludes the special idle page wakeup for fsdax pages,
> which still happens at refcount 1.  This is a separate issue and will
> be sorted out later.  Given that only fsdax pages require the
> notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig
> symbol can go away and be replaced with a FS_DAX check for this hook
> in the put_page fastpath.
> 
> Based on an earlier patch from Ralph Campbell .
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c   |  1 -
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  1 -
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   |  1 -
>  fs/Kconfig   |  1 -
>  include/linux/memremap.h | 12 +++--
>  include/linux/mm.h   |  6 +--
>  lib/test_hmm.c   |  1 -
>  mm/Kconfig   |  4 --
>  mm/internal.h|  2 +
>  mm/memcontrol.c  | 11 ++---
>  mm/memremap.c| 57 
>  mm/migrate.c |  6 ---
>  mm/swap.c| 16 ++-
>  13 files changed, 36 insertions(+), 83 deletions(-)

It looks like a good next step to me

Reviewed-by: Jason Gunthorpe 

>  struct dev_pagemap_ops {
>   /*
> -  * Called once the page refcount reaches 1.  (ZONE_DEVICE pages never
> -  * reach 0 refcount unless there is a refcount bug. This allows the
> -  * device driver to implement its own memory management.)
> +  * Called once the page refcount reaches 0.  The reference count will be
> +  * reset to one by the core code after the method is called to prepare
> +  * for handing out the page again.

I did prefer Ralph's version of this that kept the refcount at 0 while
the page was on the free-list. I hope we can get there again after
later series :)

Jason