Re: [PATCH] mm: fix missing wake-up event for FSDAX pages

2022-07-21 Thread Dan Williams
Jason Gunthorpe wrote:
> On Mon, Jul 11, 2022 at 07:39:17PM -0700, Dan Williams wrote:
> > Muchun Song wrote:
> > > On Mon, Jul 04, 2022 at 11:38:16AM +0100, Matthew Wilcox wrote:
> > > > On Mon, Jul 04, 2022 at 03:40:54PM +0800, Muchun Song wrote:
> > > > > FSDAX page refcounts are 1-based, rather than 0-based: if refcount is
> > > > > 1, then the page is freed.  The FSDAX pages can be pinned through GUP,
> > > > > then they will be unpinned via unpin_user_page() using a folio variant
> > > > > to put the page, however, folio variants did not consider this special
> > > > > case, the result will be to miss a wakeup event (like the user of
> > > > > __fuse_dax_break_layouts()).
> > > > 
> > > > Argh, no.  The 1-based refcounts are a blight on the entire kernel.
> > > > They need to go away, not be pushed into folios as well.  I think
> > > 
> > > I would be happy if this could go away.
> > 
> > Continue to agree that this blight needs to end.
> > 
> > One of the pre-requisites to getting back to normal accounting of FSDAX
> > page pin counts was to first drop the usage of get_dev_pagemap() in the
> > GUP path:
> > 
> > https://lore.kernel.org/linux-mm/161604048257.1463742.1374527716381197629.st...@dwillia2-desk3.amr.corp.intel.com/
> > 
> > That work stalled on notifying mappers of surprise removal events of FSDAX 
> > pfns.
> 
> We already talked about this - once we have proper refcounting the
> above is protected naturally by the proper refcounting. The reason it
> is there is only because the refcount goes to 1 and the page is
> re-used so the natural protection in GUP doesn't work.
> 
> We don't need surprise removal events to fix this, we need the FS side
> to hold a reference when it puts the pages into the PTEs..

Ah, true. Once the FS reference can make device removal hang on the open
references then that is good enough for fixing up the dax-page reference
count problem.

The notification to force the FS to drop its references is just a
behaviour improvment at that point.

> 
> > So, once I dig out from a bit of CXL backlog and review that effort the
> > next step that I see will be convert the FSDAX path to take typical
> > references vmf_insert() time. Unless I am missing a shorter path to get
> > this fixed up?
> 
> Yeah, just do this. IIRC Christoph already did all the infrastructure now,
> just take the correct references and remove the special cases that
> turn off the new infrastructure for fsdax.
> 
> When I looked at it a long while ago it seemd to require some
> understanding of the fsdax code and exactly what the lifecycle model
> was supposed to be there.

CXL development has reached a good break point for me to hop over and
take a look at this now.

Speaking of CXL, if you have any heartburn on that rework of
devm_request_free_mem_region(), let me know:

https://lore.kernel.org/all/62d97a89d66a1_17f3e829...@dwillia2-xfh.jf.intel.com.notmuch/



Re: [PATCH] fs: Call kmap_local_page() in copy_string_kernel()

2022-07-21 Thread Ira Weiny
On Sun, Jul 10, 2022 at 12:01:36PM +0200, Fabio M. De Francesco wrote:
> The use of kmap_atomic() is being deprecated in favor of kmap_local_page().
> 
> With kmap_local_page(), the mappings are per thread, CPU local, not
> globally visible and can take page faults. Furthermore, the mappings can be
> acquired from any context (including interrupts).
> 
> Therefore, use kmap_local_page() in copy_string_kernel() instead of
> kmap_atomic().
> 
> Tested with xfstests on a QEMU + KVM 32-bits VM booting a kernel with
> HIGHMEM64GB enabled.
> 
> Suggested-by: Ira Weiny 
> Signed-off-by: Fabio M. De Francesco 
> ---
> 
> I sent a first patch to fs/exec.c for converting kmap() and kmap_atomic()
> to kmap_local_page():
> https://lore.kernel.org/lkml/20220630163527.9776-1-fmdefrance...@gmail.com/
> 
> Some days ago, Ira Weiny, while he was reviewing that patch, made me notice
> that I had overlooked a second kmap_atomic() in the same file (thanks):
> https://lore.kernel.org/lkml/YsiQptk19txHrG4c@iweiny-desk3/
> 
> I've been asked to send this as an additional change. This is why there will
> not be any second version of that previous patch.
> 
>  fs/exec.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 4a2129c0d422..5fa652ca5823 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -639,11 +639,11 @@ int copy_string_kernel(const char *arg, struct 
> linux_binprm *bprm)
>   page = get_arg_page(bprm, pos, 1);
>   if (!page)
>   return -E2BIG;
> - kaddr = kmap_atomic(page);
> + kaddr = kmap_local_page(page);
>   flush_arg_page(bprm, pos & PAGE_MASK, page);

I really question why we can't use memcpy_to_page() here and move the
flush_arg_page() prior to the mapping?

flush_arg_page() only calls flush_cache_page() which does not need the
mapping to work correctly AFAICT.

Ira

>   memcpy(kaddr + offset_in_page(pos), arg, bytes_to_copy);
>   flush_dcache_page(page);
> - kunmap_atomic(kaddr);
> + kunmap_local(kaddr);
>   put_arg_page(page);
>   }
>  
> -- 
> 2.36.1
> 



Re: [PATCH] mm: fix missing wake-up event for FSDAX pages

2022-07-21 Thread Jason Gunthorpe
On Mon, Jul 11, 2022 at 07:39:17PM -0700, Dan Williams wrote:
> Muchun Song wrote:
> > On Mon, Jul 04, 2022 at 11:38:16AM +0100, Matthew Wilcox wrote:
> > > On Mon, Jul 04, 2022 at 03:40:54PM +0800, Muchun Song wrote:
> > > > FSDAX page refcounts are 1-based, rather than 0-based: if refcount is
> > > > 1, then the page is freed.  The FSDAX pages can be pinned through GUP,
> > > > then they will be unpinned via unpin_user_page() using a folio variant
> > > > to put the page, however, folio variants did not consider this special
> > > > case, the result will be to miss a wakeup event (like the user of
> > > > __fuse_dax_break_layouts()).
> > > 
> > > Argh, no.  The 1-based refcounts are a blight on the entire kernel.
> > > They need to go away, not be pushed into folios as well.  I think
> > 
> > I would be happy if this could go away.
> 
> Continue to agree that this blight needs to end.
> 
> One of the pre-requisites to getting back to normal accounting of FSDAX
> page pin counts was to first drop the usage of get_dev_pagemap() in the
> GUP path:
> 
> https://lore.kernel.org/linux-mm/161604048257.1463742.1374527716381197629.st...@dwillia2-desk3.amr.corp.intel.com/
> 
> That work stalled on notifying mappers of surprise removal events of FSDAX 
> pfns.

We already talked about this - once we have proper refcounting the
above is protected naturally by the proper refcounting. The reason it
is there is only because the refcount goes to 1 and the page is
re-used so the natural protection in GUP doesn't work.

We don't need surprise removal events to fix this, we need the FS side
to hold a reference when it puts the pages into the PTEs..

> So, once I dig out from a bit of CXL backlog and review that effort the
> next step that I see will be convert the FSDAX path to take typical
> references vmf_insert() time. Unless I am missing a shorter path to get
> this fixed up?

Yeah, just do this. IIRC Christoph already did all the infrastructure now,
just take the correct references and remove the special cases that
turn off the new infrastructure for fsdax.

When I looked at it a long while ago it seemd to require some
understanding of the fsdax code and exactly what the lifecycle model
was supposed to be there.

Jason



Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-07-21 Thread Darrick J. Wong
On Thu, Jul 21, 2022 at 02:06:10PM +, ruansy.f...@fujitsu.com wrote:
> 在 2022/7/1 8:31, Darrick J. Wong 写道:
> > On Thu, Jun 09, 2022 at 10:34:35PM +0800, Shiyang Ruan wrote:
> >> Failure notification is not supported on partitions.  So, when we mount
> >> a reflink enabled xfs on a partition with dax option, let it fail with
> >> -EINVAL code.
> >>
> >> Signed-off-by: Shiyang Ruan 
> > 
> > Looks good to me, though I think this patch applies to ... wherever all
> > those rmap+reflink+dax patches went.  I think that's akpm's tree, right?
> > 
> > Ideally this would go in through there to keep the pieces together, but
> > I don't mind tossing this in at the end of the 5.20 merge window if akpm
> > is unwilling.
> 
> BTW, since these patches (dax&reflink&rmap + THIS + pmem-unbind) are 
> waiting to be merged, is it time to think about "removing the 
> experimental tag" again?  :)

It's probably time to take up that question again.

Yesterday I tried running generic/470 (aka the MAP_SYNC test) and it
didn't succeed because it sets up dmlogwrites atop dmthinp atop pmem,
and at least one of those dm layers no longer allows fsdax pass-through,
so XFS silently turned mount -o dax into -o dax=never. :(

I'm not sure how to fix that...

--D

> 
> --
> Thanks,
> Ruan.
> 
> > 
> > Reviewed-by: Darrick J. Wong 
> > 
> > --D
> > 
> >> ---
> >>   fs/xfs/xfs_super.c | 6 --
> >>   1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> >> index 8495ef076ffc..a3c221841fa6 100644
> >> --- a/fs/xfs/xfs_super.c
> >> +++ b/fs/xfs/xfs_super.c
> >> @@ -348,8 +348,10 @@ xfs_setup_dax_always(
> >>goto disable_dax;
> >>}
> >>   
> >> -  if (xfs_has_reflink(mp)) {
> >> -  xfs_alert(mp, "DAX and reflink cannot be used together!");
> >> +  if (xfs_has_reflink(mp) &&
> >> +  bdev_is_partition(mp->m_ddev_targp->bt_bdev)) {
> >> +  xfs_alert(mp,
> >> +  "DAX and reflink cannot work with multi-partitions!");
> >>return -EINVAL;
> >>}
> >>   
> >> -- 
> >> 2.36.1
> >>
> >>
> >>



Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-07-21 Thread ruansy.f...@fujitsu.com
在 2022/7/1 8:31, Darrick J. Wong 写道:
> On Thu, Jun 09, 2022 at 10:34:35PM +0800, Shiyang Ruan wrote:
>> Failure notification is not supported on partitions.  So, when we mount
>> a reflink enabled xfs on a partition with dax option, let it fail with
>> -EINVAL code.
>>
>> Signed-off-by: Shiyang Ruan 
> 
> Looks good to me, though I think this patch applies to ... wherever all
> those rmap+reflink+dax patches went.  I think that's akpm's tree, right?
> 
> Ideally this would go in through there to keep the pieces together, but
> I don't mind tossing this in at the end of the 5.20 merge window if akpm
> is unwilling.

BTW, since these patches (dax&reflink&rmap + THIS + pmem-unbind) are 
waiting to be merged, is it time to think about "removing the 
experimental tag" again?  :)


--
Thanks,
Ruan.

> 
> Reviewed-by: Darrick J. Wong 
> 
> --D
> 
>> ---
>>   fs/xfs/xfs_super.c | 6 --
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>> index 8495ef076ffc..a3c221841fa6 100644
>> --- a/fs/xfs/xfs_super.c
>> +++ b/fs/xfs/xfs_super.c
>> @@ -348,8 +348,10 @@ xfs_setup_dax_always(
>>  goto disable_dax;
>>  }
>>   
>> -if (xfs_has_reflink(mp)) {
>> -xfs_alert(mp, "DAX and reflink cannot be used together!");
>> +if (xfs_has_reflink(mp) &&
>> +bdev_is_partition(mp->m_ddev_targp->bt_bdev)) {
>> +xfs_alert(mp,
>> +"DAX and reflink cannot work with multi-partitions!");
>>  return -EINVAL;
>>  }
>>   
>> -- 
>> 2.36.1
>>
>>
>>


Re: [PATCH] xfs: fail dax mount if reflink is enabled on a partition

2022-07-21 Thread ruansy.f...@fujitsu.com
Hi Andrew,

I am not sure if you had seen this.  So make a ping.

在 2022/7/1 13:14, Shiyang Ruan 写道:
> 
> 
> 在 2022/7/1 8:31, Darrick J. Wong 写道:
>> On Thu, Jun 09, 2022 at 10:34:35PM +0800, Shiyang Ruan wrote:
>>> Failure notification is not supported on partitions.  So, when we mount
>>> a reflink enabled xfs on a partition with dax option, let it fail with
>>> -EINVAL code.
>>>
>>> Signed-off-by: Shiyang Ruan 
>>
>> Looks good to me, though I think this patch applies to ... wherever all
>> those rmap+reflink+dax patches went.  I think that's akpm's tree, right?
> 
> Yes, they are in his tree, still in mm-unstable now.
> 
>>
>> Ideally this would go in through there to keep the pieces together, but
>> I don't mind tossing this in at the end of the 5.20 merge window if akpm
>> is unwilling.

What's your opinion on this?  I found that the rmap+reflink+dax patches have 
been merged into mm-stable,  so maybe it's a bit late to merge this one.  ( I'm 
not pushing, just to know the situation. )


--
Thanks,
Ruan. 

> 
> Both are fine to me.  Thanks!
> 
> 
> -- 
> Ruan.
> 
>>
>> Reviewed-by: Darrick J. Wong 
>>
>> --D
>>
>>> ---
>>>   fs/xfs/xfs_super.c | 6 --
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>>> index 8495ef076ffc..a3c221841fa6 100644
>>> --- a/fs/xfs/xfs_super.c
>>> +++ b/fs/xfs/xfs_super.c
>>> @@ -348,8 +348,10 @@ xfs_setup_dax_always(
>>>   goto disable_dax;
>>>   }
>>> -    if (xfs_has_reflink(mp)) {
>>> -    xfs_alert(mp, "DAX and reflink cannot be used together!");
>>> +    if (xfs_has_reflink(mp) &&
>>> +    bdev_is_partition(mp->m_ddev_targp->bt_bdev)) {
>>> +    xfs_alert(mp,
>>> +    "DAX and reflink cannot work with multi-partitions!");
>>>   return -EINVAL;
>>>   }
>>> -- 
>>> 2.36.1
>>>
>>>
>>>
> 
> 
>