Re: [PATCH] xfs: drop experimental warning for FSDAX

2024-02-23 Thread Darrick J. Wong
On Fri, Sep 15, 2023 at 02:38:54PM +0800, Shiyang Ruan wrote:
> FSDAX and reflink can work together now, let's drop this warning.
> 
> Signed-off-by: Shiyang Ruan 

Chandan: Can we get this queued up for 6.8, please?  This has been a
loong time coming.

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/xfs/xfs_super.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 1f77014c6e1a..faee773fa026 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -371,7 +371,6 @@ xfs_setup_dax_always(
>   return -EINVAL;
>   }
>  
> - xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own 
> risk");
>   return 0;
>  
>  disable_dax:
> -- 
> 2.42.0
> 



Re: [PATCH] xfs: drop experimental warning for FSDAX

2024-01-11 Thread Darrick J. Wong
On Thu, Jan 11, 2024 at 10:59:21AM -0600, Bill O'Donnell wrote:
> On Fri, Sep 15, 2023 at 02:38:54PM +0800, Shiyang Ruan wrote:
> > FSDAX and reflink can work together now, let's drop this warning.
> > 
> > Signed-off-by: Shiyang Ruan 
> 
> Are there any updates on this?
 
Remind us to slip this in for 6.8-rc7 if nobody complains about the new
dax functionality. :)

--D

> Thanks-
> Bill
> 
> 
> > ---
> >  fs/xfs/xfs_super.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 1f77014c6e1a..faee773fa026 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -371,7 +371,6 @@ xfs_setup_dax_always(
> > return -EINVAL;
> > }
> >  
> > -   xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own 
> > risk");
> > return 0;
> >  
> >  disable_dax:
> > -- 
> > 2.42.0
> > 
> 
> 



Re: [PATCH v15] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind

2023-10-20 Thread Darrick J. Wong
On Fri, Oct 20, 2023 at 03:26:32PM +0530, Chandan Babu R wrote:
> On Thu, Sep 28, 2023 at 06:32:27 PM +0800, Shiyang Ruan wrote:
> > 
> > Changes since v14:
> >  1. added/fixed code comments per Dan's comments
> > 
> >
> > Now, if we suddenly remove a PMEM device(by calling unbind) which
> > contains FSDAX while programs are still accessing data in this device,
> > e.g.:
> > ```
> >  $FSSTRESS_PROG -d $SCRATCH_MNT -n 9 -p 4 &
> >  # $FSX_PROG -N 100 -o 8192 -l 50 $SCRATCH_MNT/t001 &
> >  echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
> > ```
> > it could come into an unacceptable state:
> >   1. device has gone but mount point still exists, and umount will fail
> >with "target is busy"
> >   2. programs will hang and cannot be killed
> >   3. may crash with NULL pointer dereference
> >
> > To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we
> > are going to remove the whole device, and make sure all related processes
> > could be notified so that they could end up gracefully.
> >
> > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> > on it to unmap all files in use, and notify processes who are using
> > those files.
> >
> > Call trace:
> > trigger unbind
> >  -> unbind_store()
> >   -> ... (skip)
> >-> devres_release_all()
> > -> kill_dax()
> >  -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> >   -> xfs_dax_notify_failure()
> >   `-> freeze_super() // freeze (kernel call)
> >   `-> do xfs rmap
> >   ` -> mf_dax_kill_procs()
> >   `  -> collect_procs_fsdax()// all associated processes
> >   `  -> unmap_and_kill()
> >   ` -> invalidate_inode_pages2_range() // drop file's cache
> >   `-> thaw_super()   // thaw (both kernel & user call)
> >
> > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
> > new dax mapping from being created.  Do not shutdown filesystem directly
> > if configuration is not supported, or if failure range includes metadata
> > area.  Make sure all files and processes(not only the current progress)
> > are handled correctly.  Also drop the cache of associated files before
> > pmem is removed.
> >
> > [1]: 
> > https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> > [2]: 
> > https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/
> >
> > Signed-off-by: Shiyang Ruan 
> > Reviewed-by: Darrick J. Wong 
> > Acked-by: Dan Williams 
> 
> Hi Andrew,
> 
> Shiyang had indicated that this patch has been added to
> akpm/mm-hotfixes-unstable branch. However, I don't see the patch listed in
> that branch.
> 
> I am about to start collecting XFS patches for v6.7 cycle. Please let me know
> if you have any objections with me taking this patch via the XFS tree.

V15 was dropped from his tree on 28 Sept., you might as well pull it
into your own tree for 6.7.  It's been testing fine on my trees for the
past 3 weeks.

https://lore.kernel.org/mm-commits/20230928172815.ee6afc43...@smtp.kernel.org/

--D

> 
> -- 
> Chandan



Re: [PATCH] xfs: drop experimental warning for FSDAX

2023-10-10 Thread Darrick J. Wong
On Tue, Oct 10, 2023 at 11:53:02AM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/10/10 0:47, Darrick J. Wong 写道:
> > On Mon, Oct 09, 2023 at 10:14:12PM +0800, Shiyang Ruan wrote:
> > > 
> > > 
> > > 在 2023/10/6 0:05, Darrick J. Wong 写道:
> > > > On Thu, Oct 05, 2023 at 04:53:12PM +0800, Shiyang Ruan wrote:
> > > > > 
> > > > > 
> > > > > 在 2023/10/5 8:08, Darrick J. Wong 写道:
> > > > > > > > 
> > > > > > > > Sorry, I sent the list below to Chandan, didn't cc the maillist
> > > > > > > > because it's just a rough list rather than a PR:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 1. subject: [v3]  xfs: correct calculation for agend and 
> > > > > > > > blockcount
> > > > > > > >   url:
> > > > > > > >   
> > > > > > > > https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.f...@fujitsu.com/
> > > > > > > >   note:This one is a fix patch for commit: 5cf32f63b0f4 
> > > > > > > > ("xfs:
> > > > > > > >   fix the calculation for "end" and "length"").
> > > > > > > >It can solve the fail of xfs/55[0-2]: the 
> > > > > > > > programs
> > > > > > > >accessing the DAX file may not be notified as 
> > > > > > > > expected,
> > > > > > > >   because the length always 1 block less than 
> > > > > > > > actual.  Then
> > > > > > > >  this patch fixes this.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 2. subject: [v15] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE 
> > > > > > > > for unbind
> > > > > > > >   url:
> > > > > > > >   
> > > > > > > > https://lore.kernel.org/linux-xfs/20230928103227.250550-1-ruansy.f...@fujitsu.com/T/#u
> > > > > > > >   note:This is a feature patch.  It handles the 
> > > > > > > > pre-remove event
> > > > > > > >   of DAX device, by notifying kernel/user space before 
> > > > > > > > actually
> > > > > > > >  removing.
> > > > > > > >It has been picked by Andrew in his
> > > > > > > >mm-hotfixes-unstable. I am not sure whether you 
> > > > > > > > or he will
> > > > > > > >   merge this one.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 3. subject: [v1]  xfs: drop experimental warning for FSDAX
> > > > > > > >   url:
> > > > > > > >   
> > > > > > > > https://lore.kernel.org/linux-xfs/20230915063854.1784918-1-ruansy.f...@fujitsu.com/
> > > > > > > >   note:With the patches mentioned above, I did a lot of 
> > > > > > > > tests,
> > > > > > > >   including xfstests and blackbox tests, the FSDAX function 
> > > > > > > > looks
> > > > > > > >  good now.  So I think the experimental warning could be 
> > > > > > > > dropped.
> > > > > > > 
> > > > > > > Darrick/Dave, Could you please review the above patch and let us 
> > > > > > > know if you
> > > > > > > have any objections?
> > > > > > 
> > > > > > The first two patches are ok.  The third one ... well I was about 
> > > > > > to say
> > > > > > ok but then this happened with generic/269 on a 6.6-rc4 kernel and 
> > > > > > those
> > > > > > two patches applied:
> > > > > 
> > > > > Hi Darrick,
> > > > > 
> > > > > Thanks for testing.  I just tested this case (generic/269) on 
> > > > > v6.6-rc4 with
> > > > > my 3 patches again, but it didn't fail.  Such WARNING message didn't 
> > > > > show in
> > > > > dmesg too.
> > > > > 
> > > > > My local.config is show

Re: [PATCH] xfs: drop experimental warning for FSDAX

2023-10-09 Thread Darrick J. Wong
On Mon, Oct 09, 2023 at 10:14:12PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/10/6 0:05, Darrick J. Wong 写道:
> > On Thu, Oct 05, 2023 at 04:53:12PM +0800, Shiyang Ruan wrote:
> > > 
> > > 
> > > 在 2023/10/5 8:08, Darrick J. Wong 写道:
> > > > > > 
> > > > > > Sorry, I sent the list below to Chandan, didn't cc the maillist
> > > > > > because it's just a rough list rather than a PR:
> > > > > > 
> > > > > > 
> > > > > > 1. subject: [v3]  xfs: correct calculation for agend and blockcount
> > > > > >  url:
> > > > > >  
> > > > > > https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.f...@fujitsu.com/
> > > > > >  note:This one is a fix patch for commit: 5cf32f63b0f4 
> > > > > > ("xfs:
> > > > > >  fix the calculation for "end" and "length"").
> > > > > >   It can solve the fail of xfs/55[0-2]: the programs
> > > > > >   accessing the DAX file may not be notified as 
> > > > > > expected,
> > > > > >  because the length always 1 block less than actual.  
> > > > > > Then
> > > > > > this patch fixes this.
> > > > > > 
> > > > > > 
> > > > > > 2. subject: [v15] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for 
> > > > > > unbind
> > > > > >  url:
> > > > > >  
> > > > > > https://lore.kernel.org/linux-xfs/20230928103227.250550-1-ruansy.f...@fujitsu.com/T/#u
> > > > > >  note:This is a feature patch.  It handles the pre-remove 
> > > > > > event
> > > > > >  of DAX device, by notifying kernel/user space before actually
> > > > > > removing.
> > > > > >   It has been picked by Andrew in his
> > > > > >   mm-hotfixes-unstable. I am not sure whether you or he 
> > > > > > will
> > > > > >  merge this one.
> > > > > > 
> > > > > > 
> > > > > > 3. subject: [v1]  xfs: drop experimental warning for FSDAX
> > > > > >  url:
> > > > > >  
> > > > > > https://lore.kernel.org/linux-xfs/20230915063854.1784918-1-ruansy.f...@fujitsu.com/
> > > > > >  note:With the patches mentioned above, I did a lot of 
> > > > > > tests,
> > > > > >  including xfstests and blackbox tests, the FSDAX function looks
> > > > > > good now.  So I think the experimental warning could be dropped.
> > > > > 
> > > > > Darrick/Dave, Could you please review the above patch and let us know 
> > > > > if you
> > > > > have any objections?
> > > > 
> > > > The first two patches are ok.  The third one ... well I was about to say
> > > > ok but then this happened with generic/269 on a 6.6-rc4 kernel and those
> > > > two patches applied:
> > > 
> > > Hi Darrick,
> > > 
> > > Thanks for testing.  I just tested this case (generic/269) on v6.6-rc4 
> > > with
> > > my 3 patches again, but it didn't fail.  Such WARNING message didn't show 
> > > in
> > > dmesg too.
> > > 
> > > My local.config is shown as below:
> > > [nodax_reflink]
> > > export FSTYP=xfs
> > > export TEST_DEV=/dev/pmem0
> > > export TEST_DIR=/mnt/test
> > > export SCRATCH_DEV=/dev/pmem1
> > > export SCRATCH_MNT=/mnt/scratch
> > > export MKFS_OPTIONS="-m reflink=1,rmapbt=1"
> > > 
> > > [dax_reflink]
> > > export FSTYP=xfs
> > > export TEST_DEV=/dev/pmem0
> > > export TEST_DIR=/mnt/test
> > > export SCRATCH_DEV=/dev/pmem1
> > > export SCRATCH_MNT=/mnt/scratch
> > > export MKFS_OPTIONS="-m reflink=1,rmapbt=1"
> > > export MOUNT_OPTIONS="-o dax"
> > > export TEST_FS_MOUNT_OPTS="-o dax"
> > > 
> > > And tools version are:
> > >   - xfstests (v2023.09.03)
> > 
> > Same here.
> > 
> > >   - xfsprogs (v6.4.0)
> > 
> > I have a newer branch, though it only contains resyncs with newer kernel
> > versions and bugfixes.
> > 
> > > Could you sho

Re: [PATCH] xfs: drop experimental warning for FSDAX

2023-10-05 Thread Darrick J. Wong
On Thu, Oct 05, 2023 at 04:53:12PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/10/5 8:08, Darrick J. Wong 写道:
> > > > 
> > > > Sorry, I sent the list below to Chandan, didn't cc the maillist
> > > > because it's just a rough list rather than a PR:
> > > > 
> > > > 
> > > > 1. subject: [v3]  xfs: correct calculation for agend and blockcount
> > > > url:
> > > > 
> > > > https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.f...@fujitsu.com/
> > > > note:This one is a fix patch for commit: 5cf32f63b0f4 ("xfs:
> > > > fix the calculation for "end" and "length"").
> > > >  It can solve the fail of xfs/55[0-2]: the programs
> > > >  accessing the DAX file may not be notified as expected,
> > > > because the length always 1 block less than actual.  Then
> > > >this patch fixes this.
> > > > 
> > > > 
> > > > 2. subject: [v15] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind
> > > > url:
> > > > 
> > > > https://lore.kernel.org/linux-xfs/20230928103227.250550-1-ruansy.f...@fujitsu.com/T/#u
> > > > note:This is a feature patch.  It handles the pre-remove event
> > > > of DAX device, by notifying kernel/user space before actually
> > > >removing.
> > > >  It has been picked by Andrew in his
> > > >  mm-hotfixes-unstable. I am not sure whether you or he will
> > > > merge this one.
> > > > 
> > > > 
> > > > 3. subject: [v1]  xfs: drop experimental warning for FSDAX
> > > > url:
> > > > 
> > > > https://lore.kernel.org/linux-xfs/20230915063854.1784918-1-ruansy.f...@fujitsu.com/
> > > > note:With the patches mentioned above, I did a lot of tests,
> > > > including xfstests and blackbox tests, the FSDAX function looks
> > > >good now.  So I think the experimental warning could be dropped.
> > > 
> > > Darrick/Dave, Could you please review the above patch and let us know if 
> > > you
> > > have any objections?
> > 
> > The first two patches are ok.  The third one ... well I was about to say
> > ok but then this happened with generic/269 on a 6.6-rc4 kernel and those
> > two patches applied:
> 
> Hi Darrick,
> 
> Thanks for testing.  I just tested this case (generic/269) on v6.6-rc4 with
> my 3 patches again, but it didn't fail.  Such WARNING message didn't show in
> dmesg too.
> 
> My local.config is shown as below:
> [nodax_reflink]
> export FSTYP=xfs
> export TEST_DEV=/dev/pmem0
> export TEST_DIR=/mnt/test
> export SCRATCH_DEV=/dev/pmem1
> export SCRATCH_MNT=/mnt/scratch
> export MKFS_OPTIONS="-m reflink=1,rmapbt=1"
> 
> [dax_reflink]
> export FSTYP=xfs
> export TEST_DEV=/dev/pmem0
> export TEST_DIR=/mnt/test
> export SCRATCH_DEV=/dev/pmem1
> export SCRATCH_MNT=/mnt/scratch
> export MKFS_OPTIONS="-m reflink=1,rmapbt=1"
> export MOUNT_OPTIONS="-o dax"
> export TEST_FS_MOUNT_OPTS="-o dax"
> 
> And tools version are:
>  - xfstests (v2023.09.03)

Same here.

>  - xfsprogs (v6.4.0)

I have a newer branch, though it only contains resyncs with newer kernel
versions and bugfixes.

> Could you show me more info (such as kernel config, local.config) ?  So that
> I can find out what exactly is going wrong.

The full xml output from fstests is here:

https://djwong.org/fstests/output/.fa9f295c6a2dd4426aa26b4d74e8e0299ad2307507547d5444c157f0e883df92/.2e718425eda716ad848ae05dfab82a670af351f314e26b3cb658a929331bf2eb/result.xml

I think the key difference between your setup and mine is that
MKFS_OPTIONS includes '-d daxinherit=1' and MOUNT_OPTIONS do not include
-o dax.  That shouldn't make any difference, though.

Also: In the weeks leading up to me adding the PREREMOVE patches a
couple of days ago, no test (generic/269 or otherwise) hit that ASSERT.
I'm wondering if that means that the preremove code isn't shooting down
a page mapping or something?

Grepping through the result.xml reveals:

$ grep -E '(generic.269|xfs.55[012])' /tmp/result.xml
563:
910:
1685:   
1686:   
1689:[ 6046.844058] run fstests generic/269 at 2023-10-04 15:26:57
2977:   

So it's possible that 550 or 552 messed this up for us. :/

See attached kconfig.

--D

> 
> 
> --
> Thanks,
> Ruan.
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 6.6.0-rc4 djw:djwx Kernel
#
CONFIG_CC_VERSION_TEXT=

Re: [PATCH] xfs: drop experimental warning for FSDAX

2023-10-04 Thread Darrick J. Wong
On Mon, Oct 02, 2023 at 06:09:56PM +0530, Chandan Babu R wrote:
> On Mon, Oct 02, 2023 at 08:15:57 PM +0800, Shiyang Ruan wrote:
> > 在 2023/9/30 2:34, Dan Williams 写道:
> >> Shiyang Ruan wrote:
> >>>
> >>>
> >>> 在 2023/9/29 1:13, Darrick J. Wong 写道:
> >>>> On Thu, Sep 28, 2023 at 09:20:52AM -0700, Andrew Morton wrote:
> >>>>> On Thu, 28 Sep 2023 16:44:00 +0800 Shiyang Ruan 
> >>>>>  wrote:
> >>>>>
> >>>>>> But please pick the following patch[1] as well, which fixes failures of
> >>>>>> xfs55[0-2] cases.
> >>>>>>
> >>>>>> [1]
> >>>>>> https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.f...@fujitsu.com
> >>>>>
> >>>>> I guess I can take that xfs patch, as it fixes a DAX patch.  I hope the 
> >>>>> xfs team
> >>>>> are watching.
> >>>>>
> >>>>> But
> >>>>>
> >>>>> a) I'm not subscribed to linux-xfs and
> >>>>>
> >>>>> b) the changelog fails to describe the userspace-visible effects of
> >>>>>  the bug, so I (and others) are unable to determine which kernel
> >>>>>  versions should be patched.
> >>>>>
> >>>>> Please update that changelog and resend?
> >>>>
> >>>> That's a purely xfs patch anyways.  The correct maintainer is Chandan,
> >>>> not Andrew.
> >>>>
> >>>> /me notes that post-reorg, patch authors need to ask the release manager
> >>>> (Chandan) directly to merge their patches after they've gone through
> >>>> review.  Pull requests of signed tags are encouraged strongly.
> >>>>
> >>>> Shiyang, could you please send Chandan pull requests with /all/ the
> >>>> relevant pmem patches incorporated?  I think that's one PR for the
> >>>> "xfs: correct calculation for agend and blockcount" for 6.6; and a
> >>>> second PR with all the non-bugfix stuff (PRE_REMOVE and whatnot) for
> >>>> 6.7.
> >>>
> >>> OK.  Though I don't know how to send the PR by email, I have sent a list
> >>> of the patches and added description for each one.
> >> If you want I can create a signed pull request from a git.kernel.org
> >> tree.
> >> Where is that list of patches? I see v15 of preremove.
> >
> > Sorry, I sent the list below to Chandan, didn't cc the maillist
> > because it's just a rough list rather than a PR:
> >
> >
> > 1. subject: [v3]  xfs: correct calculation for agend and blockcount
> >url:
> >
> > https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.f...@fujitsu.com/
> >note:This one is a fix patch for commit: 5cf32f63b0f4 ("xfs:
> >fix the calculation for "end" and "length"").
> > It can solve the fail of xfs/55[0-2]: the programs
> > accessing the DAX file may not be notified as expected,
> >because the length always 1 block less than actual.  Then
> >   this patch fixes this.
> >
> >
> > 2. subject: [v15] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind
> >url:
> >
> > https://lore.kernel.org/linux-xfs/20230928103227.250550-1-ruansy.f...@fujitsu.com/T/#u
> >note:This is a feature patch.  It handles the pre-remove event
> >of DAX device, by notifying kernel/user space before actually
> >   removing.
> > It has been picked by Andrew in his
> > mm-hotfixes-unstable. I am not sure whether you or he will
> >merge this one.
> >
> >
> > 3. subject: [v1]  xfs: drop experimental warning for FSDAX
> >url:
> >
> > https://lore.kernel.org/linux-xfs/20230915063854.1784918-1-ruansy.f...@fujitsu.com/
> >note:With the patches mentioned above, I did a lot of tests,
> >including xfstests and blackbox tests, the FSDAX function looks
> >   good now.  So I think the experimental warning could be dropped.
> 
> Darrick/Dave, Could you please review the above patch and let us know if you
> have any objections?

The first two patches are ok.  The third one ... well I was about to say
ok but then this happened with generic/269 on a 6.6-rc4 kernel and those
two patches applied:

[ 6046.844058] run fstests generic/269 at 2023-10-04 15:26:57
[ 6047.479351] XFS (pmem0): Mounting V5 Fil

Re: [PATCH] xfs: drop experimental warning for FSDAX

2023-10-04 Thread Darrick J. Wong
On Fri, Sep 29, 2023 at 11:28:02AM -0700, Dan Williams wrote:
> Eric Sandeen wrote:
> > On 9/29/23 9:17 AM, Chandan Babu R wrote:
> > > On Thu, Sep 28, 2023 at 09:20:52 AM -0700, Andrew Morton wrote:
> > >> On Thu, 28 Sep 2023 16:44:00 +0800 Shiyang Ruan 
> > >>  wrote:
> > >>
> > >>> But please pick the following patch[1] as well, which fixes failures of 
> > >>> xfs55[0-2] cases.
> > >>>
> > >>> [1] 
> > >>> https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.f...@fujitsu.com
> > >>
> > >> I guess I can take that xfs patch, as it fixes a DAX patch.  I hope the 
> > >> xfs team
> > >> are watching.
> > >>
> > >> But
> > >>
> > >> a) I'm not subscribed to linux-xfs and
> > >>
> > >> b) the changelog fails to describe the userspace-visible effects of
> > >>the bug, so I (and others) are unable to determine which kernel
> > >>versions should be patched.
> > >>
> > >> Please update that changelog and resend?
> > > 
> > > I will apply "xfs: correct calculation for agend and blockcount" patch to
> > > xfs-linux Git tree and include it for the next v6.6 pull request to Linus.
> > > 
> > > At the outset, It looks like I can pick "mm, pmem, xfs: Introduce
> > > MF_MEM_PRE_REMOVE for unbind"
> > > (i.e. 
> > > https://lore.kernel.org/linux-xfs/20230928103227.250550-1-ruansy.f...@fujitsu.com/T/#u)
> > > patch for v6.7 as well. But that will require your Ack. Please let me know
> > > your opinion.
> > > 
> > > Also, I will pick "xfs: drop experimental warning for FSDAX" patch for 
> > > v6.7.
> > 
> > While I hate to drag it out even longer, it seems slightly optimistic to
> > drop experimental at the same time as the "last" fix, in case it's not
> > really the last fix.
> > 
> > But I don't have super strong feelings about it, and I would be happy to
> > finally see experimental go away. So if those who are more tuned into
> > the details are comfortable with that 6.7 plan, I'll defer to them on
> > the question.
> 
> The main blockage of "experimental" was the inability to specify
> dax+reflink, and the concern that resolving that conflict would end up
> breaking MAP_SYNC semantics or some other regression.
> 
> The dax_notify_failure() work has resolved that conflict without
> regressing semantics.
> 
> Ultimately this is an XFS filesystem maintainer decision, but my
> perspective is that v6.7-rc1 starts the clock on experimental going away
> and if the bug reports stay quiet that state can persist into
> v6.7-final.  If new reports crop up, revert the experimental removal and
> try again for v6.8.

I'm ok with this.  Let's merge the PRE_REMOVE patch (and the arithematic
fix) for 6.7-rc1.  If nobody screams during 6.7, send a patch to Linus
removing EXPERIMENTAL after (say) 6.7-rc8.  DAX will no longer be
experimental for the 2024 LTS.

--D



Re: [PATCH] xfs: drop experimental warning for FSDAX

2023-09-28 Thread Darrick J. Wong
On Thu, Sep 28, 2023 at 09:20:52AM -0700, Andrew Morton wrote:
> On Thu, 28 Sep 2023 16:44:00 +0800 Shiyang Ruan  
> wrote:
> 
> > But please pick the following patch[1] as well, which fixes failures of 
> > xfs55[0-2] cases.
> > 
> > [1] 
> > https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.f...@fujitsu.com
> 
> I guess I can take that xfs patch, as it fixes a DAX patch.  I hope the xfs 
> team
> are watching.
> 
> But
> 
> a) I'm not subscribed to linux-xfs and
> 
> b) the changelog fails to describe the userspace-visible effects of
>the bug, so I (and others) are unable to determine which kernel
>versions should be patched.
> 
> Please update that changelog and resend?

That's a purely xfs patch anyways.  The correct maintainer is Chandan,
not Andrew.

/me notes that post-reorg, patch authors need to ask the release manager
(Chandan) directly to merge their patches after they've gone through
review.  Pull requests of signed tags are encouraged strongly.

Shiyang, could you please send Chandan pull requests with /all/ the
relevant pmem patches incorporated?  I think that's one PR for the
"xfs: correct calculation for agend and blockcount" for 6.6; and a
second PR with all the non-bugfix stuff (PRE_REMOVE and whatnot) for
6.7.

--D



Re: [PATCH] xfs: drop experimental warning for FSDAX

2023-09-26 Thread Darrick J. Wong
On Wed, Sep 27, 2023 at 11:18:42AM +1000, Dave Chinner wrote:
> On Tue, Sep 26, 2023 at 07:55:19AM -0700, Darrick J. Wong wrote:
> > On Thu, Sep 21, 2023 at 04:33:04PM +0800, Shiyang Ruan wrote:
> > > Hi,
> > > 
> > > Any comments?
> > 
> > I notice that xfs/55[0-2] still fail on my fakepmem machine:
> > 
> > --- /tmp/fstests/tests/xfs/550.out  2023-09-23 09:40:47.839521305 -0700
> > +++ /var/tmp/fstests/xfs/550.out.bad2023-09-24 20:00:23.4 
> > -0700
> > @@ -3,7 +3,6 @@ Format and mount
> >  Create the original files
> >  Inject memory failure (1 page)
> >  Inject poison...
> > -Process is killed by signal: 7
> >  Inject memory failure (2 pages)
> >  Inject poison...
> > -Process is killed by signal: 7
> > +Memory failure didn't kill the process
> > 
> > (yes, rmap is enabled)
> 
> Yes, I see the same failures, too. I've just been ignoring them
> because I thought that all the memory failure code was still not
> complete

Oh, I bet we were supposed to have merged this

https://lore.kernel.org/linux-xfs/20230828065744.1446462-1-ruansy.f...@fujitsu.com/

to complete the pmem media failure handling code.  Should we (by which I
mostly mean Shiyang) ask Chandan to merge these two patches for 6.7?

--D

> -Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com



Re: [PATCH] xfs: drop experimental warning for FSDAX

2023-09-26 Thread Darrick J. Wong
On Thu, Sep 21, 2023 at 04:33:04PM +0800, Shiyang Ruan wrote:
> Hi,
> 
> Any comments?

I notice that xfs/55[0-2] still fail on my fakepmem machine:

--- /tmp/fstests/tests/xfs/550.out  2023-09-23 09:40:47.839521305 -0700
+++ /var/tmp/fstests/xfs/550.out.bad2023-09-24 20:00:23.4 -0700
@@ -3,7 +3,6 @@ Format and mount
 Create the original files
 Inject memory failure (1 page)
 Inject poison...
-Process is killed by signal: 7
 Inject memory failure (2 pages)
 Inject poison...
-Process is killed by signal: 7
+Memory failure didn't kill the process

(yes, rmap is enabled)

Not sure what that's about?

--D

> 
> 
> --
> Thanks,
> Ruan.
> 
> 
> 在 2023/9/15 14:38, Shiyang Ruan 写道:
> > FSDAX and reflink can work together now, let's drop this warning.
> > 
> > Signed-off-by: Shiyang Ruan 
> > ---
> >   fs/xfs/xfs_super.c | 1 -
> >   1 file changed, 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 1f77014c6e1a..faee773fa026 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -371,7 +371,6 @@ xfs_setup_dax_always(
> > return -EINVAL;
> > }
> > -   xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own 
> > risk");
> > return 0;
> >   disable_dax:



Re: [PATCH v14] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind

2023-08-30 Thread Darrick J. Wong
On Mon, Aug 28, 2023 at 02:57:44PM +0800, Shiyang Ruan wrote:
> 
> Changes since v13:
>  1. don't return error if _freeze(FREEZE_HOLDER_KERNEL) got other error
> 
> 
> Now, if we suddenly remove a PMEM device(by calling unbind) which
> contains FSDAX while programs are still accessing data in this device,
> e.g.:
> ```
>  $FSSTRESS_PROG -d $SCRATCH_MNT -n 9 -p 4 &
>  # $FSX_PROG -N 100 -o 8192 -l 50 $SCRATCH_MNT/t001 &
>  echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
> ```
> it could come into an unacceptable state:
>   1. device has gone but mount point still exists, and umount will fail
>with "target is busy"
>   2. programs will hang and cannot be killed
>   3. may crash with NULL pointer dereference
> 
> To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we
> are going to remove the whole device, and make sure all related processes
> could be notified so that they could end up gracefully.
> 
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1].  With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> on it to unmap all files in use, and notify processes who are using
> those files.
> 
> Call trace:
> trigger unbind
>  -> unbind_store()
>   -> ... (skip)
>-> devres_release_all()
> -> kill_dax()
>  -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>   -> xfs_dax_notify_failure()
>   `-> freeze_super() // freeze (kernel call)
>   `-> do xfs rmap
>   ` -> mf_dax_kill_procs()
>   `  -> collect_procs_fsdax()// all associated processes
>   `  -> unmap_and_kill()
>   ` -> invalidate_inode_pages2_range() // drop file's cache
>   `-> thaw_super()   // thaw (both kernel & user call)
> 
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
> new dax mapping from being created.  Do not shutdown filesystem directly
> if configuration is not supported, or if failure range includes metadata
> area.  Make sure all files and processes(not only the current progress)
> are handled correctly.  Also drop the cache of associated files before
> pmem is removed.
> 
> [1]: 
> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> [2]: 
> https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/
> 
> Signed-off-by: Shiyang Ruan 

Looks good, now who wants to take this patch?

Reviewed-by: Darrick J. Wong 

--D

> ---
>  drivers/dax/super.c |  3 +-
>  fs/xfs/xfs_notify_failure.c | 99 ++---
>  include/linux/mm.h  |  1 +
>  mm/memory-failure.c | 17 +--
>  4 files changed, 109 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 0da9232ea175..f4b635526345 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -326,7 +326,8 @@ void kill_dax(struct dax_device *dax_dev)
>   return;
>  
>   if (dax_dev->holder_data != NULL)
> - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> + dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> + MF_MEM_PRE_REMOVE);
>  
>   clear_bit(DAXDEV_ALIVE, _dev->flags);
>   synchronize_srcu(_srcu);
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index 4a9bbd3fe120..79586abc75bf 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -22,6 +22,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  struct xfs_failure_info {
>   xfs_agblock_t   startblock;
> @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
>   struct xfs_mount*mp = cur->bc_mp;
>   struct xfs_inode*ip;
>   struct xfs_failure_info *notify = data;
> + struct address_space*mapping;
> + pgoff_t pgoff;
> + unsigned long   pgcnt;
>   int error = 0;
>  
>   if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>   (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> + /* Continue the query because this isn't a failure. */
> + if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> + return 0;
>   notify->want_shutdown = true;
>   

Re: [PATCH v13] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind

2023-08-25 Thread Darrick J. Wong
On Fri, Aug 25, 2023 at 11:52:35AM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/8/25 7:57, Darrick J. Wong 写道:
> > On Thu, Aug 24, 2023 at 05:41:50PM +0800, Shiyang Ruan wrote:
> > > 
> > > 
> > > 在 2023/8/24 7:36, Darrick J. Wong 写道:
> > > > On Wed, Aug 23, 2023 at 04:17:06PM +0800, Shiyang Ruan wrote:
> > > > > 
> > > > > Changes since v12:
> > > > >1. correct flag name in subject (MF_MEM_REMOVE => 
> > > > > MF_MEM_PRE_REMOVE)
> > > > >2. complete the behavior when fs has already frozen by kernel call
> > > > > NOTICE: Instead of "call notify_failure() again w/o 
> > > > > PRE_REMOVE",
> > > > > I tried this proposal[0].
> > > > >3. call xfs_dax_notify_failure_freeze() and _thaw() in same 
> > > > > function
> > > > >4. rebase on: xfs/xfs-linux.git vfs-for-next
> > > > > 
> > > > > 
> > > > > Now, if we suddenly remove a PMEM device(by calling unbind) which
> > > > > contains FSDAX while programs are still accessing data in this device,
> > > > > e.g.:
> > > > > ```
> > > > >$FSSTRESS_PROG -d $SCRATCH_MNT -n 9 -p 4 &
> > > > ># $FSX_PROG -N 100 -o 8192 -l 50 $SCRATCH_MNT/t001 &
> > > > >echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
> > > > > ```
> > > > > it could come into an unacceptable state:
> > > > > 1. device has gone but mount point still exists, and umount will 
> > > > > fail
> > > > >  with "target is busy"
> > > > > 2. programs will hang and cannot be killed
> > > > > 3. may crash with NULL pointer dereference
> > > > > 
> > > > > To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know 
> > > > > that we
> > > > > are going to remove the whole device, and make sure all related 
> > > > > processes
> > > > > could be notified so that they could end up gracefully.
> > > > > 
> > > > > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > > > > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > > > > ->notify_failure() mechanism, the pmem driver is able to ask 
> > > > > filesystem
> > > > > on it to unmap all files in use, and notify processes who are using
> > > > > those files.
> > > > > 
> > > > > Call trace:
> > > > > trigger unbind
> > > > >-> unbind_store()
> > > > > -> ... (skip)
> > > > >  -> devres_release_all()
> > > > >   -> kill_dax()
> > > > >-> dax_holder_notify_failure(dax_dev, 0, U64_MAX, 
> > > > > MF_MEM_PRE_REMOVE)
> > > > > -> xfs_dax_notify_failure()
> > > > > `-> freeze_super() // freeze (kernel call)
> > > > > `-> do xfs rmap
> > > > > ` -> mf_dax_kill_procs()
> > > > > `  -> collect_procs_fsdax()// all associated processes
> > > > > `  -> unmap_and_kill()
> > > > > ` -> invalidate_inode_pages2_range() // drop file's cache
> > > > > `-> thaw_super()   // thaw (both kernel & user 
> > > > > call)
> > > > > 
> > > > > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > > > > event.  Use the exclusive freeze/thaw[2] to lock the filesystem to 
> > > > > prevent
> > > > > new dax mapping from being created.  Do not shutdown filesystem 
> > > > > directly
> > > > > if configuration is not supported, or if failure range includes 
> > > > > metadata
> > > > > area.  Make sure all files and processes(not only the current 
> > > > > progress)
> > > > > are handled correctly.  Also drop the cache of associated files before
> > > > > pmem is removed.
> > > > > 
> > > > > [0]: 
> > > > > https://lore.kernel.org/linux-xfs/25cf6700-4db0-a346-632c-ec9fc2917...@fujitsu.com/
> > > > > [1]: 
> > > > > https://lore.kernel.org/linux-mm/161604050314.1463742.141516651400357955

Re: [PATCH v13] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind

2023-08-24 Thread Darrick J. Wong
On Thu, Aug 24, 2023 at 05:41:50PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/8/24 7:36, Darrick J. Wong 写道:
> > On Wed, Aug 23, 2023 at 04:17:06PM +0800, Shiyang Ruan wrote:
> > > 
> > > Changes since v12:
> > >   1. correct flag name in subject (MF_MEM_REMOVE => MF_MEM_PRE_REMOVE)
> > >   2. complete the behavior when fs has already frozen by kernel call
> > >NOTICE: Instead of "call notify_failure() again w/o PRE_REMOVE",
> > >I tried this proposal[0].
> > >   3. call xfs_dax_notify_failure_freeze() and _thaw() in same function
> > >   4. rebase on: xfs/xfs-linux.git vfs-for-next
> > > 
> > > 
> > > Now, if we suddenly remove a PMEM device(by calling unbind) which
> > > contains FSDAX while programs are still accessing data in this device,
> > > e.g.:
> > > ```
> > >   $FSSTRESS_PROG -d $SCRATCH_MNT -n 9 -p 4 &
> > >   # $FSX_PROG -N 100 -o 8192 -l 50 $SCRATCH_MNT/t001 &
> > >   echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
> > > ```
> > > it could come into an unacceptable state:
> > >1. device has gone but mount point still exists, and umount will fail
> > > with "target is busy"
> > >2. programs will hang and cannot be killed
> > >3. may crash with NULL pointer dereference
> > > 
> > > To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we
> > > are going to remove the whole device, and make sure all related processes
> > > could be notified so that they could end up gracefully.
> > > 
> > > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > > ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> > > on it to unmap all files in use, and notify processes who are using
> > > those files.
> > > 
> > > Call trace:
> > > trigger unbind
> > >   -> unbind_store()
> > >-> ... (skip)
> > > -> devres_release_all()
> > >  -> kill_dax()
> > >   -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> > >-> xfs_dax_notify_failure()
> > >`-> freeze_super() // freeze (kernel call)
> > >`-> do xfs rmap
> > >` -> mf_dax_kill_procs()
> > >`  -> collect_procs_fsdax()// all associated processes
> > >`  -> unmap_and_kill()
> > >` -> invalidate_inode_pages2_range() // drop file's cache
> > >`-> thaw_super()   // thaw (both kernel & user call)
> > > 
> > > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > > event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
> > > new dax mapping from being created.  Do not shutdown filesystem directly
> > > if configuration is not supported, or if failure range includes metadata
> > > area.  Make sure all files and processes(not only the current progress)
> > > are handled correctly.  Also drop the cache of associated files before
> > > pmem is removed.
> > > 
> > > [0]: 
> > > https://lore.kernel.org/linux-xfs/25cf6700-4db0-a346-632c-ec9fc2917...@fujitsu.com/
> > > [1]: 
> > > https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> > > [2]: 
> > > https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/
> > > 
> > > Signed-off-by: Shiyang Ruan 
> > > ---
> > >   drivers/dax/super.c |  3 +-
> > >   fs/xfs/xfs_notify_failure.c | 99 ++---
> > >   include/linux/mm.h  |  1 +
> > >   mm/memory-failure.c | 17 +--
> > >   4 files changed, 109 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > index c4c4728a36e4..2e1a35e82fce 100644
> > > --- a/drivers/dax/super.c
> > > +++ b/drivers/dax/super.c
> > > @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
> > >   return;
> > >   if (dax_dev->holder_data != NULL)
> > > - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> > > + dax_holder_notify_failure(dax_dev, 0, U64_MAX,
>

Re: [PATCH v13] mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind

2023-08-23 Thread Darrick J. Wong
On Wed, Aug 23, 2023 at 04:17:06PM +0800, Shiyang Ruan wrote:
> 
> Changes since v12:
>  1. correct flag name in subject (MF_MEM_REMOVE => MF_MEM_PRE_REMOVE)
>  2. complete the behavior when fs has already frozen by kernel call
>   NOTICE: Instead of "call notify_failure() again w/o PRE_REMOVE",
>   I tried this proposal[0].
>  3. call xfs_dax_notify_failure_freeze() and _thaw() in same function
>  4. rebase on: xfs/xfs-linux.git vfs-for-next
> 
> 
> Now, if we suddenly remove a PMEM device(by calling unbind) which
> contains FSDAX while programs are still accessing data in this device,
> e.g.:
> ```
>  $FSSTRESS_PROG -d $SCRATCH_MNT -n 9 -p 4 &
>  # $FSX_PROG -N 100 -o 8192 -l 50 $SCRATCH_MNT/t001 &
>  echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
> ```
> it could come into an unacceptable state:
>   1. device has gone but mount point still exists, and umount will fail
>with "target is busy"
>   2. programs will hang and cannot be killed
>   3. may crash with NULL pointer dereference
> 
> To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we
> are going to remove the whole device, and make sure all related processes
> could be notified so that they could end up gracefully.
> 
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1].  With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> on it to unmap all files in use, and notify processes who are using
> those files.
> 
> Call trace:
> trigger unbind
>  -> unbind_store()
>   -> ... (skip)
>-> devres_release_all()
> -> kill_dax()
>  -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>   -> xfs_dax_notify_failure()
>   `-> freeze_super() // freeze (kernel call)
>   `-> do xfs rmap
>   ` -> mf_dax_kill_procs()
>   `  -> collect_procs_fsdax()// all associated processes
>   `  -> unmap_and_kill()
>   ` -> invalidate_inode_pages2_range() // drop file's cache
>   `-> thaw_super()   // thaw (both kernel & user call)
> 
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
> new dax mapping from being created.  Do not shutdown filesystem directly
> if configuration is not supported, or if failure range includes metadata
> area.  Make sure all files and processes(not only the current progress)
> are handled correctly.  Also drop the cache of associated files before
> pmem is removed.
> 
> [0]: 
> https://lore.kernel.org/linux-xfs/25cf6700-4db0-a346-632c-ec9fc2917...@fujitsu.com/
> [1]: 
> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> [2]: 
> https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/
> 
> Signed-off-by: Shiyang Ruan 
> ---
>  drivers/dax/super.c |  3 +-
>  fs/xfs/xfs_notify_failure.c | 99 ++---
>  include/linux/mm.h  |  1 +
>  mm/memory-failure.c | 17 +--
>  4 files changed, 109 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index c4c4728a36e4..2e1a35e82fce 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>   return;
>  
>   if (dax_dev->holder_data != NULL)
> - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> + dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> + MF_MEM_PRE_REMOVE);
>  
>   clear_bit(DAXDEV_ALIVE, _dev->flags);
>   synchronize_srcu(_srcu);
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index 4a9bbd3fe120..6496c32a9172 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -22,6 +22,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  struct xfs_failure_info {
>   xfs_agblock_t   startblock;
> @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
>   struct xfs_mount*mp = cur->bc_mp;
>   struct xfs_inode*ip;
>   struct xfs_failure_info *notify = data;
> + struct address_space*mapping;
> + pgoff_t pgoff;
> + unsigned long   pgcnt;
>   int error = 0;
>  
>   if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>   (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> + /* Continue the query because this isn't a failure. */
> + if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> + return 0;
>   notify->want_shutdown = true;
>   return 0;
>   }
> @@ -92,14 +99,60 @@ xfs_dax_failure_fn(
>   return 0;
>   }
>  
> - error = 

Re: [PATCH v12 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2023-07-31 Thread Darrick J. Wong
On Mon, Jul 31, 2023 at 05:36:36PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/7/29 23:15, Darrick J. Wong 写道:
> > On Thu, Jun 29, 2023 at 04:16:51PM +0800, Shiyang Ruan wrote:
> > > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > > ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> > > on it to unmap all files in use, and notify processes who are using
> > > those files.
> > > 
> > > Call trace:
> > > trigger unbind
> > >   -> unbind_store()
> > >-> ... (skip)
> > > -> devres_release_all()
> > >  -> kill_dax()
> > >   -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> > >-> xfs_dax_notify_failure()
> > >`-> freeze_super() // freeze (kernel call)
> > >`-> do xfs rmap
> > >` -> mf_dax_kill_procs()
> > >`  -> collect_procs_fsdax()// all associated processes
> > >`  -> unmap_and_kill()
> > >` -> invalidate_inode_pages2_range() // drop file's cache
> > >`-> thaw_super()   // thaw (both kernel & user call)
> > > 
> > > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > > event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
> > > new dax mapping from being created.  Do not shutdown filesystem directly
> > > if configuration is not supported, or if failure range includes metadata
> > > area.  Make sure all files and processes(not only the current progress)
> > > are handled correctly.  Also drop the cache of associated files before
> > > pmem is removed.
> > > 
> > > [1]: 
> > > https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> > > [2]: 
> > > https://lore.kernel.org/linux-xfs/168688010689.860947.1788875898367401950.stgit@frogsfrogsfrogs/
> > > 
> > > Signed-off-by: Shiyang Ruan 
> > > ---
> > >   drivers/dax/super.c |  3 +-
> > >   fs/xfs/xfs_notify_failure.c | 86 ++---
> > >   include/linux/mm.h  |  1 +
> > >   mm/memory-failure.c | 17 ++--
> > >   4 files changed, 96 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > index c4c4728a36e4..2e1a35e82fce 100644
> > > --- a/drivers/dax/super.c
> > > +++ b/drivers/dax/super.c
> > > @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
> > >   return;
> > >   if (dax_dev->holder_data != NULL)
> > > - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> > > + dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> > > + MF_MEM_PRE_REMOVE);
> > >   clear_bit(DAXDEV_ALIVE, _dev->flags);
> > >   synchronize_srcu(_srcu);
> > > diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> > > index 4a9bbd3fe120..f6ec56b76db6 100644
> > > --- a/fs/xfs/xfs_notify_failure.c
> > > +++ b/fs/xfs/xfs_notify_failure.c
> > > @@ -22,6 +22,7 @@
> > >   #include 
> > >   #include 
> > > +#include 
> > >   struct xfs_failure_info {
> > >   xfs_agblock_t   startblock;
> > > @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
> > >   struct xfs_mount*mp = cur->bc_mp;
> > >   struct xfs_inode*ip;
> > >   struct xfs_failure_info *notify = data;
> > > + struct address_space*mapping;
> > > + pgoff_t pgoff;
> > > + unsigned long   pgcnt;
> > >   int error = 0;
> > >   if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
> > >   (rec->rm_flags & (XFS_RMAP_ATTR_FORK | 
> > > XFS_RMAP_BMBT_BLOCK))) {
> > > + /* Continue the query because this isn't a failure. */
> > > + if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> > > + return 0;
> > >   notify->want_shutdown = true;
> > >   return 0;
> > >   }
> > > @@ -92,14 +99,55 @@ xfs_dax_failure_fn(
&g

Re: [PATCH v12 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2023-07-29 Thread Darrick J. Wong
On Sat, Jul 29, 2023 at 06:01:00PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/7/20 9:50, Shiyang Ruan 写道:
> > 
> > 
> > 在 2023/7/14 22:18, Darrick J. Wong 写道:
> > > On Fri, Jul 14, 2023 at 05:07:58PM +0800, Shiyang Ruan wrote:
> > > > Hi Darrick,
> > > > 
> > > > Thanks for applying the 1st patch.
> > > > 
> > > > Now, since this patch is based on the new freeze_super()/thaw_super()
> > > > api[1], I'd like to ask what's the plan for this api?  It seems to have
> > > > missed the v6.5-rc1.
> > > > 
> > > > [1] 
> > > > https://lore.kernel.org/linux-xfs/168688010689.860947.1788875898367401950.stgit@frogsfrogsfrogs/
> > > 
> > > 6.6.  I intend to push the XFS UBSAN fixes to the list today for review.
> > > Early next week I'll resend the 6.5 rebase of the kernelfreeze series
> > > and push it to vfs-for-next.  Some time after that will come large folio
> > > writes.
> > 
> > Got it.  Thanks for your information!
> 
> A small request:  If you have time to give some comments, I would appreciate
> it because I hope we can make the most out of this period(before freeze api
> be merged in 6.6).

Done.

--D


> 
> --
> Thanks,
> Ruan.
> 
> > 
> > 
> > -- 
> > Ruan.
> > 
> > > 
> > > --D
> > > 
> > > > 
> > > > -- 
> > > > Thanks,
> > > > Ruan.
> > > > 
> > > > 
> > > > 在 2023/6/29 16:16, Shiyang Ruan 写道:
> > > > > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > > > > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > > > > ->notify_failure() mechanism, the pmem driver is able to ask 
> > > > > filesystem
> > > > > on it to unmap all files in use, and notify processes who are using
> > > > > those files.
> > > > > 
> > > > > Call trace:
> > > > > trigger unbind
> > > > >    -> unbind_store()
> > > > >     -> ... (skip)
> > > > >  -> devres_release_all()
> > > > >   -> kill_dax()
> > > > >    -> dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> > > > > MF_MEM_PRE_REMOVE)
> > > > >     -> xfs_dax_notify_failure()
> > > > >     `-> freeze_super() // freeze (kernel call)
> > > > >     `-> do xfs rmap
> > > > >     ` -> mf_dax_kill_procs()
> > > > >     `  -> collect_procs_fsdax()    // all associated processes
> > > > >     `  -> unmap_and_kill()
> > > > >     ` -> invalidate_inode_pages2_range() // drop file's cache
> > > > >     `-> thaw_super()   // thaw (both kernel
> > > > > & user call)
> > > > > 
> > > > > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > > > > event.  Use the exclusive freeze/thaw[2] to lock the
> > > > > filesystem to prevent
> > > > > new dax mapping from being created.  Do not shutdown
> > > > > filesystem directly
> > > > > if configuration is not supported, or if failure range
> > > > > includes metadata
> > > > > area.  Make sure all files and processes(not only the current 
> > > > > progress)
> > > > > are handled correctly.  Also drop the cache of associated files before
> > > > > pmem is removed.
> > > > > 
> > > > > [1]: 
> > > > > https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> > > > > [2]: 
> > > > > https://lore.kernel.org/linux-xfs/168688010689.860947.1788875898367401950.stgit@frogsfrogsfrogs/
> > > > > 
> > > > > Signed-off-by: Shiyang Ruan 
> > > > > ---
> > > > >    drivers/dax/super.c |  3 +-
> > > > >    fs/xfs/xfs_notify_failure.c | 86
> > > > > ++---
> > > > >    include/linux/mm.h  |  1 +
> > > > >    mm/memory-failure.c | 17 ++--
> > > > >    4 files changed, 96 insertions(+), 11 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > > > index c4c4728a36e4..2e1a35e82f

Re: [PATCH v12 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2023-07-29 Thread Darrick J. Wong
On Thu, Jun 29, 2023 at 04:16:51PM +0800, Shiyang Ruan wrote:
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1].  With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> on it to unmap all files in use, and notify processes who are using
> those files.
> 
> Call trace:
> trigger unbind
>  -> unbind_store()
>   -> ... (skip)
>-> devres_release_all()
> -> kill_dax()
>  -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>   -> xfs_dax_notify_failure()
>   `-> freeze_super() // freeze (kernel call)
>   `-> do xfs rmap
>   ` -> mf_dax_kill_procs()
>   `  -> collect_procs_fsdax()// all associated processes
>   `  -> unmap_and_kill()
>   ` -> invalidate_inode_pages2_range() // drop file's cache
>   `-> thaw_super()   // thaw (both kernel & user call)
> 
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
> new dax mapping from being created.  Do not shutdown filesystem directly
> if configuration is not supported, or if failure range includes metadata
> area.  Make sure all files and processes(not only the current progress)
> are handled correctly.  Also drop the cache of associated files before
> pmem is removed.
> 
> [1]: 
> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> [2]: 
> https://lore.kernel.org/linux-xfs/168688010689.860947.1788875898367401950.stgit@frogsfrogsfrogs/
> 
> Signed-off-by: Shiyang Ruan 
> ---
>  drivers/dax/super.c |  3 +-
>  fs/xfs/xfs_notify_failure.c | 86 ++---
>  include/linux/mm.h  |  1 +
>  mm/memory-failure.c | 17 ++--
>  4 files changed, 96 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index c4c4728a36e4..2e1a35e82fce 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>   return;
>  
>   if (dax_dev->holder_data != NULL)
> - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> + dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> + MF_MEM_PRE_REMOVE);
>  
>   clear_bit(DAXDEV_ALIVE, _dev->flags);
>   synchronize_srcu(_srcu);
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index 4a9bbd3fe120..f6ec56b76db6 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -22,6 +22,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  struct xfs_failure_info {
>   xfs_agblock_t   startblock;
> @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
>   struct xfs_mount*mp = cur->bc_mp;
>   struct xfs_inode*ip;
>   struct xfs_failure_info *notify = data;
> + struct address_space*mapping;
> + pgoff_t pgoff;
> + unsigned long   pgcnt;
>   int error = 0;
>  
>   if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>   (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> + /* Continue the query because this isn't a failure. */
> + if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> + return 0;
>   notify->want_shutdown = true;
>   return 0;
>   }
> @@ -92,14 +99,55 @@ xfs_dax_failure_fn(
>   return 0;
>   }
>  
> - error = mf_dax_kill_procs(VFS_I(ip)->i_mapping,
> -   xfs_failure_pgoff(mp, rec, notify),
> -   xfs_failure_pgcnt(mp, rec, notify),
> -   notify->mf_flags);
> + mapping = VFS_I(ip)->i_mapping;
> + pgoff = xfs_failure_pgoff(mp, rec, notify);
> + pgcnt = xfs_failure_pgcnt(mp, rec, notify);
> +
> + /* Continue the rmap query if the inode isn't a dax file. */
> + if (dax_mapping(mapping))
> + error = mf_dax_kill_procs(mapping, pgoff, pgcnt,
> +   notify->mf_flags);
> +
> + /* Invalidate the cache in dax pages. */
> + if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> + invalidate_inode_pages2_range(mapping, pgoff,
> +   pgoff + pgcnt - 1);
> +
>   xfs_irele(ip);
>   return error;
>  }
>  
> +static void
> +xfs_dax_notify_failure_freeze(
> + struct xfs_mount*mp)
> +{
> + struct super_block  *sb = mp->m_super;

Nit: extra space right^ here.

> +
> + /* Wait until no one is holding the FREEZE_HOLDER_KERNEL. */
> + while (freeze_super(sb, FREEZE_HOLDER_KERNEL) != 0) {
> + // Shall we just wait, or print warning then return -EBUSY?

Hm.  PRE_REMOVE gets 

Re: [PATCH v12 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2023-07-14 Thread Darrick J. Wong
On Fri, Jul 14, 2023 at 05:07:58PM +0800, Shiyang Ruan wrote:
> Hi Darrick,
> 
> Thanks for applying the 1st patch.
> 
> Now, since this patch is based on the new freeze_super()/thaw_super()
> api[1], I'd like to ask what's the plan for this api?  It seems to have
> missed the v6.5-rc1.
> 
> [1] 
> https://lore.kernel.org/linux-xfs/168688010689.860947.1788875898367401950.stgit@frogsfrogsfrogs/

6.6.  I intend to push the XFS UBSAN fixes to the list today for review.
Early next week I'll resend the 6.5 rebase of the kernelfreeze series
and push it to vfs-for-next.  Some time after that will come large folio
writes.

--D

> 
> --
> Thanks,
> Ruan.
> 
> 
> 在 2023/6/29 16:16, Shiyang Ruan 写道:
> > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> > on it to unmap all files in use, and notify processes who are using
> > those files.
> > 
> > Call trace:
> > trigger unbind
> >   -> unbind_store()
> >-> ... (skip)
> > -> devres_release_all()
> >  -> kill_dax()
> >   -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> >-> xfs_dax_notify_failure()
> >`-> freeze_super() // freeze (kernel call)
> >`-> do xfs rmap
> >` -> mf_dax_kill_procs()
> >`  -> collect_procs_fsdax()// all associated processes
> >`  -> unmap_and_kill()
> >` -> invalidate_inode_pages2_range() // drop file's cache
> >`-> thaw_super()   // thaw (both kernel & user call)
> > 
> > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > event.  Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
> > new dax mapping from being created.  Do not shutdown filesystem directly
> > if configuration is not supported, or if failure range includes metadata
> > area.  Make sure all files and processes(not only the current progress)
> > are handled correctly.  Also drop the cache of associated files before
> > pmem is removed.
> > 
> > [1]: 
> > https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> > [2]: 
> > https://lore.kernel.org/linux-xfs/168688010689.860947.1788875898367401950.stgit@frogsfrogsfrogs/
> > 
> > Signed-off-by: Shiyang Ruan 
> > ---
> >   drivers/dax/super.c |  3 +-
> >   fs/xfs/xfs_notify_failure.c | 86 ++---
> >   include/linux/mm.h  |  1 +
> >   mm/memory-failure.c | 17 ++--
> >   4 files changed, 96 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index c4c4728a36e4..2e1a35e82fce 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
> > return;
> > if (dax_dev->holder_data != NULL)
> > -   dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> > +   dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> > +   MF_MEM_PRE_REMOVE);
> > clear_bit(DAXDEV_ALIVE, _dev->flags);
> > synchronize_srcu(_srcu);
> > diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> > index 4a9bbd3fe120..f6ec56b76db6 100644
> > --- a/fs/xfs/xfs_notify_failure.c
> > +++ b/fs/xfs/xfs_notify_failure.c
> > @@ -22,6 +22,7 @@
> >   #include 
> >   #include 
> > +#include 
> >   struct xfs_failure_info {
> > xfs_agblock_t   startblock;
> > @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
> > struct xfs_mount*mp = cur->bc_mp;
> > struct xfs_inode*ip;
> > struct xfs_failure_info *notify = data;
> > +   struct address_space*mapping;
> > +   pgoff_t pgoff;
> > +   unsigned long   pgcnt;
> > int error = 0;
> > if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
> > (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> > +   /* Continue the query because this isn't a failure. */
> > +   if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> > +   return 0;
> > notify->want_shutdown = true;
> > return 0;
> > }
> > @@ -92,14 +99,55 @@ xfs_dax_failure_fn(
> > return 0;
> > }
> > -   error = mf_dax_kill_procs(VFS_I(ip)->i_mapping,
> > - xfs_failure_pgoff(mp, rec, notify),
> > - xfs_failure_pgcnt(mp, rec, notify),
> > - notify->mf_flags);
> > +   mapping = VFS_I(ip)->i_mapping;
> > +   pgoff = xfs_failure_pgoff(mp, rec, notify);
> > +   pgcnt = xfs_failure_pgcnt(mp, rec, notify);
> > +
> > +   /* Continue the rmap query if the inode isn't a dax file. */
> > +   if (dax_mapping(mapping))
> > +   error = mf_dax_kill_procs(mapping, pgoff, pgcnt,
> > +

Re: [RFC PATCH v11.1 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2023-04-25 Thread Darrick J. Wong
On Wed, Apr 26, 2023 at 10:27:43AM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/4/25 23:18, Darrick J. Wong 写道:
> > On Tue, Apr 25, 2023 at 03:23:15PM +0200, Jan Kara wrote:
> > > On Tue 25-04-23 20:47:35, Shiyang Ruan wrote:
> > > > 
> > > > 
> > > > 在 2023/4/20 20:09, Jan Kara 写道:
> > > > > On Thu 20-04-23 10:07:39, Shiyang Ruan wrote:
> > > > > > 在 2023/4/12 18:52, Shiyang Ruan 写道:
> > > > > > > This is a RFC HOTFIX.
> > > > > > > 
> > > > > > > This hotfix adds a exclusive forzen state to make sure any others 
> > > > > > > won't
> > > > > > > thaw the fs during xfs_dax_notify_failure():
> > > > > > > 
> > > > > > >  #define SB_FREEZE_EXCLUSIVE  (SB_FREEZE_COMPLETE + 2)
> > > > > > > Using +2 here is because Darrick's patch[0] is using +1.  So, 
> > > > > > > should we
> > > > > > > make these definitions global?
> > > > > > > 
> > > > > > > Another thing I can't make up my mind is: when another freezer 
> > > > > > > has freeze
> > > > > > > the fs, should we wait unitl it finish, or print a warning in 
> > > > > > > dmesg and
> > > > > > > return -EBUSY?
> > > > > > > 
> > > > > > > Since there are at least 2 places needs exclusive forzen state, I 
> > > > > > > think
> > > > > > > we can refactor helper functions of freeze/thaw for them.  e.g.
> > > > > > >  int freeze_super_exclusive(struct super_block *sb, int 
> > > > > > > frozen);
> > > > > > >  int thaw_super_exclusive(struct super_block *sb, int frozen);
> > > > > > > 
> > > > > > > [0] 
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=repair-fscounters=c3a0d1de4d54ffb565dbc7092dfe1fb851940669
> > > > > 
> > > > > I'm OK with the idea of new freeze state that does not allow 
> > > > > userspace to
> > > > > thaw the filesystem. But I don't really like the guts of filesystem
> > > > > freezing being replicated inside XFS. It is bad enough that they are
> > > > > replicated in [0], replicating them *once more* in another XFS file 
> > > > > shows
> > > > > we are definitely doing something wrong. And Luis will need yet 
> > > > > another
> > > > > incantation of the exlusive freeze for suspend-to-disk. So please 
> > > > > guys get
> > > > > together and reorganize the generic freezing code so that it supports
> > > > > exclusive freeze (for in-kernel users) and works for your usecases 
> > > > > instead
> > > > > of replicating it inside XFS...
> > > > 
> > > > I agree that too much replicating code is not good.  It's necessary to
> > > > create a generic exclusive freeze/thaw for all users.  But for me, I 
> > > > don't
> > > > have the confidence to do it well, because it requires good design and 
> > > > code
> > > > changes will involve other filesystems.  It's diffcult.
> > > > 
> > > > However, I hope to be able to make progress on this unbind feature. 
> > > > Thus, I
> > > > tend to refactor a common helper function for xfs first, and update the 
> > > > code
> > > > later when the generic freeze is done.
> > > 
> > > I think Darrick was thinking about working on a proper generic interface.
> > > So please coordinate with him.
> > 
> > I'll post a vfs generic kernelfreeze series later today.
> > 
> > One thing I haven't figured out yet is what's supposed to happen when
> > PREREMOVE is called on a frozen filesystem.
> 
> call PREREMOVE when:
> 1. freezed by kernel:we wait unitl kernel thaws -> not sure
> 2. freezed by userspace: we take over the control of freeze state:
>  a. userspace can't thaw before PREREMOVE is done
>  b. kernel keeps freeze state after PREREMOVE is done and before
> userspace thaws
> 
> Since the unbind interface doesn't return any other errcode except -ENODEV,
> the only thing I can think of to do is wait for the other one done?  If
> another one doesn't thaw after a long time waitting, we print a "waitting
> too long" warning 

Re: [RFC PATCH v11.1 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2023-04-25 Thread Darrick J. Wong
On Tue, Apr 25, 2023 at 03:23:15PM +0200, Jan Kara wrote:
> On Tue 25-04-23 20:47:35, Shiyang Ruan wrote:
> > 
> > 
> > 在 2023/4/20 20:09, Jan Kara 写道:
> > > On Thu 20-04-23 10:07:39, Shiyang Ruan wrote:
> > > > 在 2023/4/12 18:52, Shiyang Ruan 写道:
> > > > > This is a RFC HOTFIX.
> > > > > 
> > > > > This hotfix adds a exclusive forzen state to make sure any others 
> > > > > won't
> > > > > thaw the fs during xfs_dax_notify_failure():
> > > > > 
> > > > > #define SB_FREEZE_EXCLUSIVE   (SB_FREEZE_COMPLETE + 2)
> > > > > Using +2 here is because Darrick's patch[0] is using +1.  So, should 
> > > > > we
> > > > > make these definitions global?
> > > > > 
> > > > > Another thing I can't make up my mind is: when another freezer has 
> > > > > freeze
> > > > > the fs, should we wait unitl it finish, or print a warning in dmesg 
> > > > > and
> > > > > return -EBUSY?
> > > > > 
> > > > > Since there are at least 2 places needs exclusive forzen state, I 
> > > > > think
> > > > > we can refactor helper functions of freeze/thaw for them.  e.g.
> > > > > int freeze_super_exclusive(struct super_block *sb, int frozen);
> > > > > int thaw_super_exclusive(struct super_block *sb, int frozen);
> > > > > 
> > > > > [0] 
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=repair-fscounters=c3a0d1de4d54ffb565dbc7092dfe1fb851940669
> > > 
> > > I'm OK with the idea of new freeze state that does not allow userspace to
> > > thaw the filesystem. But I don't really like the guts of filesystem
> > > freezing being replicated inside XFS. It is bad enough that they are
> > > replicated in [0], replicating them *once more* in another XFS file shows
> > > we are definitely doing something wrong. And Luis will need yet another
> > > incantation of the exlusive freeze for suspend-to-disk. So please guys get
> > > together and reorganize the generic freezing code so that it supports
> > > exclusive freeze (for in-kernel users) and works for your usecases instead
> > > of replicating it inside XFS...
> > 
> > I agree that too much replicating code is not good.  It's necessary to
> > create a generic exclusive freeze/thaw for all users.  But for me, I don't
> > have the confidence to do it well, because it requires good design and code
> > changes will involve other filesystems.  It's diffcult.
> > 
> > However, I hope to be able to make progress on this unbind feature. Thus, I
> > tend to refactor a common helper function for xfs first, and update the code
> > later when the generic freeze is done.
> 
> I think Darrick was thinking about working on a proper generic interface.
> So please coordinate with him.

I'll post a vfs generic kernelfreeze series later today.

One thing I haven't figured out yet is what's supposed to happen when
PREREMOVE is called on a frozen filesystem.  We don't want userspace to
be able to thaw the fs while PREREMOVE is running, so I /guess/ that
means we need some method for the kernel to take over a userspace
freeze and then put it back when we're done?

--D

>   Honza
> 
> -- 
> Jan Kara 
> SUSE Labs, CR



Re: [PATCH v11 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2023-04-06 Thread Darrick J. Wong
On Thu, Apr 06, 2023 at 06:50:22PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/4/5 1:45, Darrick J. Wong 写道:
> > On Tue, Mar 28, 2023 at 09:41:46AM +, Shiyang Ruan wrote:
> > > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > > ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> > > (or mapped device) on it to unmap all files in use and notify processes
> > > who are using those files.
> > > 
> > > Call trace:
> > > trigger unbind
> > >   -> unbind_store()
> > >-> ... (skip)
> > > -> devres_release_all()
> > >  -> kill_dax()
> > >   -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> > >-> xfs_dax_notify_failure()
> > >`-> freeze_super()
> > >`-> do xfs rmap
> > >` -> mf_dax_kill_procs()
> > >`  -> collect_procs_fsdax()// all associated
> > >`  -> unmap_and_kill()
> > >` -> invalidate_inode_pages2() // drop file's cache
> > >`-> thaw_super()
> > > 
> > > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > > event.  Freeze the filesystem to prevent new dax mapping being created.
> > > And do not shutdown filesystem directly if something not supported, or
> > > if failure range includes metadata area.  Make sure all files and
> > > processes are handled correctly.  Also drop the cache of associated
> > > files before pmem is removed.
> > > 
> > > [1]: 
> > > https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> > > 
> > > Signed-off-by: Shiyang Ruan 
> > > ---
> > >   drivers/dax/super.c |  3 +-
> > >   fs/xfs/xfs_notify_failure.c | 56 +
> > >   include/linux/mm.h  |  1 +
> > >   mm/memory-failure.c | 17 ---
> > >   4 files changed, 67 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > index c4c4728a36e4..2e1a35e82fce 100644
> > > --- a/drivers/dax/super.c
> > > +++ b/drivers/dax/super.c
> > > @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
> > >   return;
> > >   if (dax_dev->holder_data != NULL)
> > > - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> > > + dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> > > + MF_MEM_PRE_REMOVE);
> > >   clear_bit(DAXDEV_ALIVE, _dev->flags);
> > >   synchronize_srcu(_srcu);
> > > diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> > > index 1e2eddb8f90f..1b4eff43f9b5 100644
> > > --- a/fs/xfs/xfs_notify_failure.c
> > > +++ b/fs/xfs/xfs_notify_failure.c
> > > @@ -22,6 +22,7 @@
> > >   #include 
> > >   #include 
> > > +#include 
> > >   struct xfs_failure_info {
> > >   xfs_agblock_t   startblock;
> > > @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
> > >   struct xfs_mount*mp = cur->bc_mp;
> > >   struct xfs_inode*ip;
> > >   struct xfs_failure_info *notify = data;
> > > + struct address_space*mapping;
> > > + pgoff_t pgoff;
> > > + unsigned long   pgcnt;
> > >   int error = 0;
> > >   if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
> > >   (rec->rm_flags & (XFS_RMAP_ATTR_FORK | 
> > > XFS_RMAP_BMBT_BLOCK))) {
> > > + /* The device is about to be removed.  Not a really failure. */
> > > + if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> > > + return 0;
> > >   notify->want_shutdown = true;
> > >   return 0;
> > >   }
> > > @@ -92,10 +99,18 @@ xfs_dax_failure_fn(
> > >   return 0;
> > >   }
> > > - error = mf_dax_kill_procs(VFS_I(ip)->i_mapping,
> > > -   xfs_failure_pgoff(mp, rec, notify),
> > > -   xfs_failure_pgcnt(mp, rec, notify),
> > > - 

Re: [PATCH v11 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

2023-04-04 Thread Darrick J. Wong
On Tue, Mar 28, 2023 at 09:41:46AM +, Shiyang Ruan wrote:
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1].  With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> (or mapped device) on it to unmap all files in use and notify processes
> who are using those files.
> 
> Call trace:
> trigger unbind
>  -> unbind_store()
>   -> ... (skip)
>-> devres_release_all()
> -> kill_dax()
>  -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>   -> xfs_dax_notify_failure()
>   `-> freeze_super()
>   `-> do xfs rmap
>   ` -> mf_dax_kill_procs()
>   `  -> collect_procs_fsdax()// all associated
>   `  -> unmap_and_kill()
>   ` -> invalidate_inode_pages2() // drop file's cache
>   `-> thaw_super()
> 
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> event.  Freeze the filesystem to prevent new dax mapping being created.
> And do not shutdown filesystem directly if something not supported, or
> if failure range includes metadata area.  Make sure all files and
> processes are handled correctly.  Also drop the cache of associated
> files before pmem is removed.
> 
> [1]: 
> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.st...@dwillia2-desk3.amr.corp.intel.com/
> 
> Signed-off-by: Shiyang Ruan 
> ---
>  drivers/dax/super.c |  3 +-
>  fs/xfs/xfs_notify_failure.c | 56 +
>  include/linux/mm.h  |  1 +
>  mm/memory-failure.c | 17 ---
>  4 files changed, 67 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index c4c4728a36e4..2e1a35e82fce 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>   return;
>  
>   if (dax_dev->holder_data != NULL)
> - dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> + dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> + MF_MEM_PRE_REMOVE);
>  
>   clear_bit(DAXDEV_ALIVE, _dev->flags);
>   synchronize_srcu(_srcu);
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index 1e2eddb8f90f..1b4eff43f9b5 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -22,6 +22,7 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  struct xfs_failure_info {
>   xfs_agblock_t   startblock;
> @@ -73,10 +74,16 @@ xfs_dax_failure_fn(
>   struct xfs_mount*mp = cur->bc_mp;
>   struct xfs_inode*ip;
>   struct xfs_failure_info *notify = data;
> + struct address_space*mapping;
> + pgoff_t pgoff;
> + unsigned long   pgcnt;
>   int error = 0;
>  
>   if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>   (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> + /* The device is about to be removed.  Not a really failure. */
> + if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> + return 0;
>   notify->want_shutdown = true;
>   return 0;
>   }
> @@ -92,10 +99,18 @@ xfs_dax_failure_fn(
>   return 0;
>   }
>  
> - error = mf_dax_kill_procs(VFS_I(ip)->i_mapping,
> -   xfs_failure_pgoff(mp, rec, notify),
> -   xfs_failure_pgcnt(mp, rec, notify),
> -   notify->mf_flags);
> + mapping = VFS_I(ip)->i_mapping;
> + pgoff = xfs_failure_pgoff(mp, rec, notify);
> + pgcnt = xfs_failure_pgcnt(mp, rec, notify);
> +
> + /* Continue the rmap query if the inode isn't a dax file. */
> + if (dax_mapping(mapping))
> + error = mf_dax_kill_procs(mapping, pgoff, pgcnt,
> + notify->mf_flags);
> +
> + /* Invalidate the cache anyway. */
> + invalidate_inode_pages2_range(mapping, pgoff, pgoff + pgcnt - 1);
> +
>   xfs_irele(ip);
>   return error;
>  }
> @@ -164,11 +179,25 @@ xfs_dax_notify_ddev_failure(
>   }
>  
>   xfs_trans_cancel(tp);
> +
> + /* Unfreeze filesystem anyway if it is freezed before. */
> + if (mf_flags & MF_MEM_PRE_REMOVE) {
> + error = thaw_super(mp->m_super);
> + if (error)
> + return error;

If someone *else* wanders in and thaws the fs, you'll get EINVAL here.

I guess that's useful for knowing if someone's screwed up the freeze
state on us, but ... really, don't you want to make sure you've gotten
the freeze and nobody else can take it away?

I think you want the kernel-initiated freeze proposed by Luis here:
https://lore.kernel.org/linux-fsdevel/20230114003409.1168311-4-mcg...@kernel.org/

Also: Is Fujitsu still pursuing pmem products?  Even 

Re: [PATCH] fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN

2023-03-23 Thread Darrick J. Wong
On Thu, Mar 23, 2023 at 02:50:38PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2023/3/23 7:03, Andrew Morton 写道:
> > On Wed, 22 Mar 2023 11:11:09 + Shiyang Ruan  
> > wrote:
> > 
> > > unshare copies data from source to destination. But if the source is
> > > HOLE or UNWRITTEN extents, we should zero the destination, otherwise the
> > > result will be unexpectable.
> > 
> > Please provide much more detail on the user-visible effects of the bug.
> > For example, are we leaking kernel memory contents to userspace?
> 
> This fixes fail of generic/649.

Please make a note of easy(ish) reproducers when you know about them,
e.g.

"Found by running generic/649 while mounting with -o dax=always on
pmem."

So that anyone scanning around for fix backports has an easier time
doing an A/B comparison to assess whether or not they need the fix.
Especially if there's any chance that person is you circa 2024. ;)

--D

> 
> --
> Thanks,
> Ruan.
> 
> > 
> > Thanks.
> > 
> > 



Re: [PATCH] fsdax: dedupe should compare the min of two iters' length

2023-03-22 Thread Darrick J. Wong
On Wed, Mar 22, 2023 at 07:25:58AM +, Shiyang Ruan wrote:
> In an dedupe corporation iter loop, the length of iomap_iter decreases
> because it implies the remaining length after each iteration.  The
> compare function should use the min length of the current iters, not the
> total length.
> 
> Fixes: 0e79e3736d54 ("fsdax: dedupe: iter two files at the same time")
> Signed-off-by: Shiyang Ruan 

Makese sense,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/dax.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 3e457a16c7d1..9800b93ee14d 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -2022,8 +2022,8 @@ int dax_dedupe_file_range_compare(struct inode *src, 
> loff_t srcoff,
>  
>   while ((ret = iomap_iter(_iter, ops)) > 0 &&
>  (ret = iomap_iter(_iter, ops)) > 0) {
> - compared = dax_range_compare_iter(_iter, _iter, len,
> -   same);
> + compared = dax_range_compare_iter(_iter, _iter,
> + min(src_iter.len, dst_iter.len), same);
>   if (compared < 0)
>   return ret;
>   src_iter.processed = dst_iter.processed = compared;
> -- 
> 2.39.2
> 



Re: [GIT PULL] iomap for 5.17

2022-01-11 Thread Darrick J. Wong
On Mon, Jan 10, 2022 at 09:42:15PM +, Matthew Wilcox wrote:
> I know these requests usually come from Darrick, and we had intended
> that they would come that route.  Between the holidays and various
> things which Darrick needed to work on, he asked if I could send the
> pull request directly.  There weren't any other iomap patches pending
> for this release, which probably also played a role.

Just to confirm this explicitly -- yes, it really did transpire that
Matthew submitted the only iomap patches for 5.17, so I told him to go
ahead and send a pull request straight to Linus.

--D

> There is a conflict between this tree and the nvdimm tree.  We've done
> our best to make the resolution easy for you with the last patch in this
> series ("Inline __iomap_zero_iter into its caller").  If you'd rather just
> resolve the entire conflict yourself, feel free to drop that last patch.
> 
> The resolution Stephen has been carrying is here:
> https://lore.kernel.org/all/20211224172421.3f009...@canb.auug.org.au/
> 
> The following changes since commit 2585cf9dfaaddf00b069673f27bb3f8530e2039c:
> 
>   Linux 5.16-rc5 (2021-12-12 14:53:01 -0800)
> 
> are available in the Git repository at:
> 
>   git://git.infradead.org/users/willy/linux.git tags/iomap-5.17
> 
> for you to fetch changes up to 4d7bd0eb72e5831ddb1288786a96448b48440825:
> 
>   iomap: Inline __iomap_zero_iter into its caller (2021-12-21 13:51:08 -0500)
> 
> 
> Convert xfs/iomap to use folios
> 
> This should be all that is needed for XFS to use large folios.
> There is no code in this pull request to create large folios, but
> no additional changes should be needed to XFS or iomap once they
> are created.
> 
> 
> Matthew Wilcox (Oracle) (26):
>   block: Add bio_add_folio()
>   block: Add bio_for_each_folio_all()
>   fs/buffer: Convert __block_write_begin_int() to take a folio
>   iomap: Convert to_iomap_page to take a folio
>   iomap: Convert iomap_page_create to take a folio
>   iomap: Convert iomap_page_release to take a folio
>   iomap: Convert iomap_releasepage to use a folio
>   iomap: Add iomap_invalidate_folio
>   iomap: Pass the iomap_page into iomap_set_range_uptodate
>   iomap: Convert bio completions to use folios
>   iomap: Use folio offsets instead of page offsets
>   iomap: Convert iomap_read_inline_data to take a folio
>   iomap: Convert readahead and readpage to use a folio
>   iomap: Convert iomap_page_mkwrite to use a folio
>   iomap: Allow iomap_write_begin() to be called with the full length
>   iomap: Convert __iomap_zero_iter to use a folio
>   iomap: Convert iomap_write_begin() and iomap_write_end() to folios
>   iomap: Convert iomap_write_end_inline to take a folio
>   iomap,xfs: Convert ->discard_page to ->discard_folio
>   iomap: Simplify iomap_writepage_map()
>   iomap: Simplify iomap_do_writepage()
>   iomap: Convert iomap_add_to_ioend() to take a folio
>   iomap: Convert iomap_migrate_page() to use folios
>   iomap: Support large folios in invalidatepage
>   xfs: Support large folios
>   iomap: Inline __iomap_zero_iter into its caller
> 
>  Documentation/core-api/kernel-api.rst |   1 +
>  block/bio.c   |  22 ++
>  fs/buffer.c   |  23 +-
>  fs/internal.h |   2 +-
>  fs/iomap/buffered-io.c| 551 
> +-
>  fs/xfs/xfs_aops.c |  24 +-
>  fs/xfs/xfs_icache.c   |   2 +
>  include/linux/bio.h   |  56 +++-
>  include/linux/iomap.h |   3 +-
>  9 files changed, 389 insertions(+), 295 deletions(-)
> 



Re: [PATCH] iomap: turn the byte variable in iomap_zero_iter into a ssize_t

2021-12-09 Thread Darrick J. Wong
On Thu, Dec 09, 2021 at 07:11:19AM +0100, Christoph Hellwig wrote:
> On Wed, Dec 08, 2021 at 04:55:59PM -0800, Darrick J. Wong wrote:
> > Ok, so ... I don't know what I'm supposed to apply this to?  Is this
> > something that should go in Christoph's development branch?
> 
> This is against the decouple DAX from block devices series, which also
> decouples DAX zeroing from iomap buffered I/O zeroing.  It is in nvdimm.git
> and has been reviewed by you as well:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=libnvdimm-pending

Ah, ok.  IOWs, this is a 5.17 thing, not a critical fix for 5.16-rcX.

--D



Re: [PATCH] iomap: turn the byte variable in iomap_zero_iter into a ssize_t

2021-12-08 Thread Darrick J. Wong
On Wed, Dec 08, 2021 at 04:48:46PM -0800, Darrick J. Wong wrote:
> On Wed, Dec 08, 2021 at 10:12:03AM +0100, Christoph Hellwig wrote:
> > bytes also hold the return value from iomap_write_end, which can contain
> > a negative error value.  As bytes is always less than the page size even
> > the signed type can hold the entire possible range.
> > 
> > Fixes: c6f40468657d ("fsdax: decouple zeroing from the iomap buffered I/O 
> > code")

...waitaminute, ^^ in what tree is this commit?  Did Linus merge
the dax decoupling series into upstream without telling me?

/me checks... no?

Though I searched for it on gitweb and came up with this bizarre plot
twist:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c6f40468657d16e4010ef84bf32a761feb3469ea

(Is this the same as that github thing a few months ago where
everybody's commits get deduplicated into the same realm and hence
anyone can make trick the frontend into sort of making it look like
their rando commits end up in Linus' tree?  Or did it get merged and
push -f reverted?)

Ok, so ... I don't know what I'm supposed to apply this to?  Is this
something that should go in Christoph's development branch?



On the plus side, that means I /can/ go test-merge willy's iomap folios
for 5.17 stuff tonight.

--D

> > Reported-by: Dan Carpenter 
> > Signed-off-by: Christoph Hellwig 
> 
> Looks good,
> Reviewed-by: Darrick J. Wong 
> 
> --D
> 
> > ---
> >  fs/iomap/buffered-io.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index b1511255b4df8..ac040d607f4fe 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -883,7 +883,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, 
> > bool *did_zero)
> >  
> > do {
> > unsigned offset = offset_in_page(pos);
> > -   size_t bytes = min_t(u64, PAGE_SIZE - offset, length);
> > +   ssize_t bytes = min_t(u64, PAGE_SIZE - offset, length);
> > struct page *page;
> > int status;
> >  
> > -- 
> > 2.30.2
> > 



Re: [PATCH] iomap: turn the byte variable in iomap_zero_iter into a ssize_t

2021-12-08 Thread Darrick J. Wong
On Wed, Dec 08, 2021 at 10:12:03AM +0100, Christoph Hellwig wrote:
> bytes also hold the return value from iomap_write_end, which can contain
> a negative error value.  As bytes is always less than the page size even
> the signed type can hold the entire possible range.
> 
> Fixes: c6f40468657d ("fsdax: decouple zeroing from the iomap buffered I/O 
> code")
> Reported-by: Dan Carpenter 
> Signed-off-by: Christoph Hellwig 

Looks good,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index b1511255b4df8..ac040d607f4fe 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -883,7 +883,7 @@ static loff_t iomap_zero_iter(struct iomap_iter *iter, 
> bool *did_zero)
>  
>   do {
>   unsigned offset = offset_in_page(pos);
> - size_t bytes = min_t(u64, PAGE_SIZE - offset, length);
> + ssize_t bytes = min_t(u64, PAGE_SIZE - offset, length);
>   struct page *page;
>   int status;
>  
> -- 
> 2.30.2
> 



Re: [regression] fs dax xfstests panic

2021-09-27 Thread Darrick J. Wong
On Tue, Sep 28, 2021 at 07:17:00AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 27, 2021 at 10:16:10PM -0700, Darrick J. Wong wrote:
> > > > My test machinse all hit this when writeback throttling is enabled, so
> > > > 
> > > > Tested-by: Darrick J. Wong 
> > > 
> > > Do you mean the series fixed it for you?
> > 
> > Yes.
> 
> Thanks!  I was just a little confused this came in in this thread.

Yeah, I was too incoherent after arguing on #xfs to be able to form
complete sentences, sorry about that.

Also thank you for fixing this problem. :)

--D



Re: [regression] fs dax xfstests panic

2021-09-27 Thread Darrick J. Wong
On Tue, Sep 28, 2021 at 06:34:26AM +0200, Christoph Hellwig wrote:
> On Mon, Sep 27, 2021 at 04:02:59PM -0700, Darrick J. Wong wrote:
> > > > Looping xfstests generic/108 or xfs/279 on mountpoints with fsdax
> > > > enabled can lead to panic like this:
> > > 
> > > Does this still happen with this series:
> > > 
> > > https://lore.kernel.org/linux-block/2021092217.2453343-1-...@lst.de/T/#m8dc646a4dfc40f443227da6bb1c77d9daec524db
> > > 
> > > ?
> > 
> > My test machinse all hit this when writeback throttling is enabled, so
> > 
> > Tested-by: Darrick J. Wong 
> 
> Do you mean the series fixed it for you?

Yes.

--D



Re: [regression] fs dax xfstests panic

2021-09-27 Thread Darrick J. Wong
On Mon, Sep 27, 2021 at 01:51:16PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 27, 2021 at 02:17:47PM +0800, Murphy Zhou wrote:
> > Hi folks,
> > 
> > Since this commit:
> > 
> > commit edb0872f44ec9976ea6d052cb4b93cd2d23ac2ba
> > Author: Christoph Hellwig 
> > Date:   Mon Aug 9 16:17:43 2021 +0200
> > 
> > block: move the bdi from the request_queue to the gendisk
> > 
> > 
> > Looping xfstests generic/108 or xfs/279 on mountpoints with fsdax
> > enabled can lead to panic like this:
> 
> Does this still happen with this series:
> 
> https://lore.kernel.org/linux-block/2021092217.2453343-1-...@lst.de/T/#m8dc646a4dfc40f443227da6bb1c77d9daec524db
> 
> ?

My test machinse all hit this when writeback throttling is enabled, so

Tested-by: Darrick J. Wong 

--D




[PATCH] dax: remove silly single-page limitation in dax_zero_page_range

2021-09-22 Thread Darrick J. Wong
From: Darrick J. Wong 

It's totally silly that the dax zero_page_range implementations are
required to accept a page count, but one of the four implementations
silently ignores the page count and the wrapper itself errors out if you
try to do more than one page.

Fix the nvdimm implementation to loop over the page count and remove the
artificial limitation.

Signed-off-by: Darrick J. Wong 
---
 drivers/dax/super.c   |7 ---
 drivers/nvdimm/pmem.c |   14 +++---
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index fc89e91beea7..ca61a01f9ccd 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -353,13 +353,6 @@ int dax_zero_page_range(struct dax_device *dax_dev, 
pgoff_t pgoff,
 {
if (!dax_alive(dax_dev))
return -ENXIO;
-   /*
-* There are no callers that want to zero more than one page as of now.
-* Once users are there, this check can be removed after the
-* device mapper code has been updated to split ranges across targets.
-*/
-   if (nr_pages != 1)
-   return -EIO;
 
return dax_dev->ops->zero_page_range(dax_dev, pgoff, nr_pages);
 }
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 72de88ff0d30..3ef40bf74168 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -288,10 +288,18 @@ static int pmem_dax_zero_page_range(struct dax_device 
*dax_dev, pgoff_t pgoff,
size_t nr_pages)
 {
struct pmem_device *pmem = dax_get_private(dax_dev);
+   int ret = 0;
 
-   return blk_status_to_errno(pmem_do_write(pmem, ZERO_PAGE(0), 0,
-  PFN_PHYS(pgoff) >> SECTOR_SHIFT,
-  PAGE_SIZE));
+   for (; nr_pages > 0 && ret == 0; pgoff++, nr_pages--) {
+   blk_status_t status;
+
+   status = pmem_do_write(pmem, ZERO_PAGE(0), 0,
+  PFN_PHYS(pgoff) >> SECTOR_SHIFT,
+  PAGE_SIZE);
+   ret = blk_status_to_errno(status);
+   }
+
+   return ret;
 }
 
 static long pmem_dax_direct_access(struct dax_device *dax_dev,



Re: [PATCH 8/9] xfs: factor out a xfs_buftarg_is_dax helper

2021-08-26 Thread Darrick J. Wong
On Thu, Aug 26, 2021 at 07:37:29AM -0700, Dan Williams wrote:
> [ add Darrick ]
> 
> 
> On Thu, Aug 26, 2021 at 7:07 AM Christoph Hellwig  wrote:
> >
> > Refactor the DAX setup code in preparation of removing
> > bdev_dax_supported.
> >
> > Signed-off-by: Christoph Hellwig 
> > Reviewed-by: Dan Williams 
> > ---
> >  fs/xfs/xfs_super.c | 15 +++
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> Darrick, any concerns with me taking this through the dax tree?

Nope.

Reviewed-by: Darrick J. Wong 

--D

> 
> >
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 2c9e26a44546..5a89bf601d97 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -314,6 +314,14 @@ xfs_set_inode_alloc(
> > return (mp->m_flags & XFS_MOUNT_32BITINODES) ? maxagi : agcount;
> >  }
> >
> > +static bool
> > +xfs_buftarg_is_dax(
> > +   struct super_block  *sb,
> > +   struct xfs_buftarg  *bt)
> > +{
> > +   return bdev_dax_supported(bt->bt_bdev, sb->s_blocksize);
> > +}
> > +
> >  STATIC int
> >  xfs_blkdev_get(
> > xfs_mount_t *mp,
> > @@ -1549,11 +1557,10 @@ xfs_fs_fill_super(
> > xfs_warn(mp,
> > "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
> >
> > -   datadev_is_dax = 
> > bdev_dax_supported(mp->m_ddev_targp->bt_bdev,
> > -   sb->s_blocksize);
> > +   datadev_is_dax = xfs_buftarg_is_dax(sb, mp->m_ddev_targp);
> > if (mp->m_rtdev_targp)
> > -   rtdev_is_dax = bdev_dax_supported(
> > -   mp->m_rtdev_targp->bt_bdev, 
> > sb->s_blocksize);
> > +   rtdev_is_dax = xfs_buftarg_is_dax(sb,
> > +   mp->m_rtdev_targp);
> > if (!rtdev_is_dax && !datadev_is_dax) {
> > xfs_alert(mp,
> > "DAX unsupported by block device. Turning off 
> > DAX.");
> > --
> > 2.30.2
> >
> >



Re: [PATCH RESEND v6 1/9] pagemap: Introduce ->memory_failure()

2021-08-18 Thread Darrick J. Wong
On Tue, Aug 17, 2021 at 11:08:40PM -0700, Jane Chu wrote:
> 
> On 8/17/2021 10:43 PM, Jane Chu wrote:
> > More information -
> > 
> > On 8/16/2021 10:20 AM, Jane Chu wrote:
> > > Hi, ShiYang,
> > > 
> > > So I applied the v6 patch series to my 5.14-rc3 as it's what you
> > > indicated is what v6 was based at, and injected a hardware poison.
> > > 
> > > I'm seeing the same problem that was reported a while ago after the
> > > poison was consumed - in the SIGBUS payload, the si_addr is missing:
> > > 
> > > ** SIGBUS(7): canjmp=1, whichstep=0, **
> > > ** si_addr(0x(nil)), si_lsb(0xC), si_code(0x4, BUS_MCEERR_AR) **
> > > 
> > > The si_addr ought to be 0x7f656800 - the vaddr of the first page
> > > in this case.
> > 
> > The failure came from here :
> > 
> > [PATCH RESEND v6 6/9] xfs: Implement ->notify_failure() for XFS
> > 
> > +static int
> > +xfs_dax_notify_failure(
> > ...
> > +    if (!xfs_sb_version_hasrmapbt(>m_sb)) {
> > +    xfs_warn(mp, "notify_failure() needs rmapbt enabled!");
> > +    return -EOPNOTSUPP;
> > +    }
> > 
> > I am not familiar with XFS, but I have a few questions I hope to get
> > answers -
> > 
> > 1) What does it take and cost to make
> >     xfs_sb_version_hasrmapbt(>m_sb) to return true?

mkfs.xfs -m rmapbt=1

> > 2) For a running environment that fails the above check, is it
> >     okay to leave the poison handle in limbo and why?
> > 
> > 3) If the above regression is not acceptable, any potential remedy?
> 
> How about moving the check to prior to the notifier registration?
> And register only if the check is passed?  This seems better
> than an alternative which is to fall back to the legacy memory_failure
> handling in case the filesystem returns -EOPNOTSUPP.

"return -EOPNOTSUPP;" is the branching point where a future patch could
probe the (DRAM) buffer cache to bwrite the contents to restore the pmem
contents.  Right now the focus should be on landing the core code
changes without drawing any more NAKs from Dan.

--D

> thanks,
> -jane
> 
> > 
> > thanks!
> > -jane
> > 
> > 
> > > 
> > > Something is not right...
> > > 
> > > thanks,
> > > -jane
> > > 
> > > 
> > > On 8/5/2021 6:17 PM, Jane Chu wrote:
> > > > The filesystem part of the pmem failure handling is at minimum built
> > > > on PAGE_SIZE granularity - an inheritance from general
> > > > memory_failure handling.  However, with Intel's DCPMEM
> > > > technology, the error blast
> > > > radius is no more than 256bytes, and might get smaller with future
> > > > hardware generation, also advanced atomic 64B write to clear the poison.
> > > > But I don't see any of that could be incorporated in, given that the
> > > > filesystem is notified a corruption with pfn, rather than an exact
> > > > address.
> > > > 
> > > > So I guess this question is also for Dan: how to avoid unnecessarily
> > > > repairing a PMD range for a 256B corrupt range going forward?
> > > > 
> > > > thanks,
> > > > -jane
> > > > 
> > > > 
> > > > On 7/30/2021 3:01 AM, Shiyang Ruan wrote:
> > > > > When memory-failure occurs, we call this function which is implemented
> > > > > by each kind of devices.  For the fsdax case, pmem device driver
> > > > > implements it.  Pmem device driver will find out the
> > > > > filesystem in which
> > > > > the corrupted page located in.  And finally call filesystem handler to
> > > > > deal with this error.
> > > > > 
> > > > > The filesystem will try to recover the corrupted data if necessary.
> > > > 
> > > 
> > 



Re: [PATCH 11/30] iomap: add the new iomap_iter model

2021-08-12 Thread Darrick J. Wong
On Thu, Aug 12, 2021 at 08:49:14AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 11, 2021 at 12:17:08PM -0700, Darrick J. Wong wrote:
> > > iter.c is also my preference, but in the end I don't care too much.
> > 
> > Ok.  My plan for this is to change this patch to add the new iter code
> > to apply.c, and change patch 24 to remove iomap_apply.  I'll add a patch
> > on the end to rename apply.c to iter.c, which will avoid breaking the
> > history.
> 
> What history?  There is no shared code, so no shared history and.

The history of the gluecode that enables us to walk a bunch of extent
mappings.  In the beginning it was the _apply function, but now in our
spectre-weary world, you've switched it to a direct loop to reduce the
number of indirect calls in the hot path by 30-50%.

As you correctly point out, there's no /code/ shared by the two
implementations, but Dave and I would like to preserve the continuity
from one to the next.

> > I'll send the updated patches as replies to this series to avoid
> > spamming the list, since I also have a patchset of bugfixes to send out
> > and don't want to overwhelm everyone.
> 
> Just as a clear statement:  I think this dance is obsfucation and doesn't
> help in any way.  But if that's what it takes..

I /would/ appreciate it if you'd rvb (or at least ack) patch 31 so I can
get the 5.15 iomap changes finalized next week.  Pretty please? :)

--D



[PATCH 31/30] iomap: move iomap iteration code to iter.c

2021-08-11 Thread Darrick J. Wong
From: Darrick J. Wong 

Now that we've moved iomap to the iterator model, rename this file to be
in sync with the functions contained inside of it.

Signed-off-by: Darrick J. Wong 
---
 fs/iomap/Makefile |2 +-
 fs/iomap/iter.c   |0 
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename fs/iomap/{apply.c => iter.c} (100%)

diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index e46f936dde81..bb64215ae256 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -26,9 +26,9 @@ ccflags-y += -I $(srctree)/$(src) # needed for 
trace events
 obj-$(CONFIG_FS_IOMAP) += iomap.o
 
 iomap-y+= trace.o \
-  apply.o \
   buffered-io.o \
   direct-io.o \
   fiemap.o \
+  iter.o \
   seek.o
 iomap-$(CONFIG_SWAP)   += swapfile.o
diff --git a/fs/iomap/apply.c b/fs/iomap/iter.c
similarity index 100%
rename from fs/iomap/apply.c
rename to fs/iomap/iter.c



[PATCH v2.1 24/30] iomap: remove iomap_apply

2021-08-11 Thread Darrick J. Wong
From: Christoph Hellwig 

iomap_apply is unused now, so remove it.

Signed-off-by: Christoph Hellwig 
[djwong: rebase this patch to preserve git history of iomap loop control]
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
---
 fs/iomap/apply.c  |   91 -
 fs/iomap/trace.h  |   40 --
 include/linux/iomap.h |   10 -
 3 files changed, 141 deletions(-)

diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
index e82647aef7ea..a1c7592d2ade 100644
--- a/fs/iomap/apply.c
+++ b/fs/iomap/apply.c
@@ -3,101 +3,10 @@
  * Copyright (C) 2010 Red Hat, Inc.
  * Copyright (c) 2016-2021 Christoph Hellwig.
  */
-#include 
-#include 
 #include 
 #include 
 #include "trace.h"
 
-/*
- * Execute a iomap write on a segment of the mapping that spans a
- * contiguous range of pages that have identical block mapping state.
- *
- * This avoids the need to map pages individually, do individual allocations
- * for each page and most importantly avoid the need for filesystem specific
- * locking per page. Instead, all the operations are amortised over the entire
- * range of pages. It is assumed that the filesystems will lock whatever
- * resources they require in the iomap_begin call, and release them in the
- * iomap_end call.
- */
-loff_t
-iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
-   const struct iomap_ops *ops, void *data, iomap_actor_t actor)
-{
-   struct iomap iomap = { .type = IOMAP_HOLE };
-   struct iomap srcmap = { .type = IOMAP_HOLE };
-   loff_t written = 0, ret;
-   u64 end;
-
-   trace_iomap_apply(inode, pos, length, flags, ops, actor, _RET_IP_);
-
-   /*
-* Need to map a range from start position for length bytes. This can
-* span multiple pages - it is only guaranteed to return a range of a
-* single type of pages (e.g. all into a hole, all mapped or all
-* unwritten). Failure at this point has nothing to undo.
-*
-* If allocation is required for this range, reserve the space now so
-* that the allocation is guaranteed to succeed later on. Once we copy
-* the data into the page cache pages, then we cannot fail otherwise we
-* expose transient stale data. If the reserve fails, we can safely
-* back out at this point as there is nothing to undo.
-*/
-   ret = ops->iomap_begin(inode, pos, length, flags, , );
-   if (ret)
-   return ret;
-   if (WARN_ON(iomap.offset > pos)) {
-   written = -EIO;
-   goto out;
-   }
-   if (WARN_ON(iomap.length == 0)) {
-   written = -EIO;
-   goto out;
-   }
-
-   trace_iomap_apply_dstmap(inode, );
-   if (srcmap.type != IOMAP_HOLE)
-   trace_iomap_apply_srcmap(inode, );
-
-   /*
-* Cut down the length to the one actually provided by the filesystem,
-* as it might not be able to give us the whole size that we requested.
-*/
-   end = iomap.offset + iomap.length;
-   if (srcmap.type != IOMAP_HOLE)
-   end = min(end, srcmap.offset + srcmap.length);
-   if (pos + length > end)
-   length = end - pos;
-
-   /*
-* Now that we have guaranteed that the space allocation will succeed,
-* we can do the copy-in page by page without having to worry about
-* failures exposing transient data.
-*
-* To support COW operations, we read in data for partially blocks from
-* the srcmap if the file system filled it in.  In that case we the
-* length needs to be limited to the earlier of the ends of the iomaps.
-* If the file system did not provide a srcmap we pass in the normal
-* iomap into the actors so that they don't need to have special
-* handling for the two cases.
-*/
-   written = actor(inode, pos, length, data, ,
-   srcmap.type != IOMAP_HOLE ?  : );
-
-out:
-   /*
-* Now the data has been copied, commit the range we've copied.  This
-* should not fail unless the filesystem has had a fatal error.
-*/
-   if (ops->iomap_end) {
-   ret = ops->iomap_end(inode, pos, length,
-written > 0 ? written : 0,
-flags, );
-   }
-
-   return written ? written : ret;
-}
-
 static inline int iomap_iter_advance(struct iomap_iter *iter)
 {
/* handle the previous iteration (if any) */
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 1012d7af6b68..f1519f9a1403 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -138,49 +138,9 @@ DECLARE_EVENT_CLASS(iomap_class,
 DEFINE_EVENT(iomap_class, name,\
TP_PROTO(struct inode *inode, struct iomap *iomap), \
TP_ARGS(inode, iomap))
-DEFINE_IOMAP_EVENT(iomap_apply_

[PATCH v2.1 19/30] iomap: switch iomap_bmap to use iomap_iter

2021-08-11 Thread Darrick J. Wong
From: Christoph Hellwig 

Rewrite the ->bmap implementation based on iomap_iter.

Signed-off-by: Christoph Hellwig 
[djwong: restructure the loop to make its behavior a little clearer]
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
---
 fs/iomap/fiemap.c |   31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
index acad09a8c188..66cf267c68ae 100644
--- a/fs/iomap/fiemap.c
+++ b/fs/iomap/fiemap.c
@@ -92,37 +92,32 @@ int iomap_fiemap(struct inode *inode, struct 
fiemap_extent_info *fi,
 }
 EXPORT_SYMBOL_GPL(iomap_fiemap);
 
-static loff_t
-iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
-   void *data, struct iomap *iomap, struct iomap *srcmap)
-{
-   sector_t *bno = data, addr;
-
-   if (iomap->type == IOMAP_MAPPED) {
-   addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits;
-   *bno = addr;
-   }
-   return 0;
-}
-
 /* legacy ->bmap interface.  0 is the error return (!) */
 sector_t
 iomap_bmap(struct address_space *mapping, sector_t bno,
const struct iomap_ops *ops)
 {
-   struct inode *inode = mapping->host;
-   loff_t pos = bno << inode->i_blkbits;
-   unsigned blocksize = i_blocksize(inode);
+   struct iomap_iter iter = {
+   .inode  = mapping->host,
+   .pos= (loff_t)bno << mapping->host->i_blkbits,
+   .len= i_blocksize(mapping->host),
+   .flags  = IOMAP_REPORT,
+   };
+   const unsigned int blkshift = mapping->host->i_blkbits - SECTOR_SHIFT;
int ret;
 
if (filemap_write_and_wait(mapping))
return 0;
 
bno = 0;
-   ret = iomap_apply(inode, pos, blocksize, 0, ops, ,
- iomap_bmap_actor);
+   while ((ret = iomap_iter(, ops)) > 0) {
+   if (iter.iomap.type == IOMAP_MAPPED)
+   bno = iomap_sector(, iter.pos) >> blkshift;
+   /* leave iter.processed unset to abort loop */
+   }
if (ret)
return 0;
+
return bno;
 }
 EXPORT_SYMBOL_GPL(iomap_bmap);



[PATCH v2.1 11/30] iomap: add the new iomap_iter model

2021-08-11 Thread Darrick J. Wong
From: Christoph Hellwig 

The iomap_iter struct provides a convenient way to package up and
maintain all the arguments to the various mapping and operation
functions.  It is operated on using the iomap_iter() function that
is called in loop until the whole range has been processed.  Compared
to the existing iomap_apply() function this avoid an indirect call
for each iteration.

For now iomap_iter() calls back into the existing ->iomap_begin and
->iomap_end methods, but in the future this could be further optimized
to avoid indirect calls entirely.

Based on an earlier patch from Matthew Wilcox .

Signed-off-by: Christoph Hellwig 
[djwong: add to apply.c to preserve git history of iomap loop control]
Reviewed-by: Darrick J. Wong 
Signed-off-by: Darrick J. Wong 
---
 fs/iomap/apply.c  |   74 -
 fs/iomap/trace.h  |   37 -
 include/linux/iomap.h |   56 +
 3 files changed, 165 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
index 26ab6563181f..e82647aef7ea 100644
--- a/fs/iomap/apply.c
+++ b/fs/iomap/apply.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (C) 2010 Red Hat, Inc.
- * Copyright (c) 2016-2018 Christoph Hellwig.
+ * Copyright (c) 2016-2021 Christoph Hellwig.
  */
 #include 
 #include 
@@ -97,3 +97,75 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, 
unsigned flags,
 
return written ? written : ret;
 }
+
+static inline int iomap_iter_advance(struct iomap_iter *iter)
+{
+   /* handle the previous iteration (if any) */
+   if (iter->iomap.length) {
+   if (iter->processed <= 0)
+   return iter->processed;
+   if (WARN_ON_ONCE(iter->processed > iomap_length(iter)))
+   return -EIO;
+   iter->pos += iter->processed;
+   iter->len -= iter->processed;
+   if (!iter->len)
+   return 0;
+   }
+
+   /* clear the state for the next iteration */
+   iter->processed = 0;
+   memset(>iomap, 0, sizeof(iter->iomap));
+   memset(>srcmap, 0, sizeof(iter->srcmap));
+   return 1;
+}
+
+static inline void iomap_iter_done(struct iomap_iter *iter)
+{
+   WARN_ON_ONCE(iter->iomap.offset > iter->pos);
+   WARN_ON_ONCE(iter->iomap.length == 0);
+   WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos);
+
+   trace_iomap_iter_dstmap(iter->inode, >iomap);
+   if (iter->srcmap.type != IOMAP_HOLE)
+   trace_iomap_iter_srcmap(iter->inode, >srcmap);
+}
+
+/**
+ * iomap_iter - iterate over a ranges in a file
+ * @iter: iteration structue
+ * @ops: iomap ops provided by the file system
+ *
+ * Iterate over filesystem-provided space mappings for the provided file range.
+ *
+ * This function handles cleanup of resources acquired for iteration when the
+ * filesystem indicates there are no more space mappings, which means that this
+ * function must be called in a loop that continues as long it returns a
+ * positive value.  If 0 or a negative value is returned, the caller must not
+ * return to the loop body.  Within a loop body, there are two ways to break 
out
+ * of the loop body:  leave @iter.processed unchanged, or set it to a negative
+ * errno.
+ */
+int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
+{
+   int ret;
+
+   if (iter->iomap.length && ops->iomap_end) {
+   ret = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
+   iter->processed > 0 ? iter->processed : 0,
+   iter->flags, >iomap);
+   if (ret < 0 && !iter->processed)
+   return ret;
+   }
+
+   trace_iomap_iter(iter, ops, _RET_IP_);
+   ret = iomap_iter_advance(iter);
+   if (ret <= 0)
+   return ret;
+
+   ret = ops->iomap_begin(iter->inode, iter->pos, iter->len, iter->flags,
+  >iomap, >srcmap);
+   if (ret < 0)
+   return ret;
+   iomap_iter_done(iter);
+   return 1;
+}
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index e9cd5cc0d6ba..1012d7af6b68 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * Copyright (c) 2009-2019 Christoph Hellwig
+ * Copyright (c) 2009-2021 Christoph Hellwig
  *
  * NOTE: none of these tracepoints shall be considered a stable kernel ABI
  * as they can change at any time.
@@ -140,6 +140,8 @@ DEFINE_EVENT(iomap_class, name, \
TP_ARGS(inode, iomap))
 DEFINE_IOMAP_EVENT(iomap_apply_dstmap);
 DEFINE_IOMAP_EVENT(iomap_apply_srcmap);
+DEFINE_IOMAP_EVENT(iomap_iter_dstmap);
+DEFI

Re: [PATCH 11/30] iomap: add the new iomap_iter model

2021-08-11 Thread Darrick J. Wong
On Wed, Aug 11, 2021 at 07:38:56AM +0200, Christoph Hellwig wrote:
> On Tue, Aug 10, 2021 at 05:31:18PM -0700, Darrick J. Wong wrote:
> > > +static inline void iomap_iter_done(struct iomap_iter *iter)
> > 
> > I wonder why this is a separate function, since it only has debugging
> > warnings and tracepoints?
> 
> The reason for these two sub-helpers was to force me to structure the
> code so that Matthews original idea of replacing ->iomap_begin and
> ->iomap_end with a single next callback so that iomap_iter could
> be inlined into callers and the indirect calls could be elided is
> still possible.  This would only be useful for a few specific
> methods (probably dax and direct I/O) where we care so much, but it
> seemed like a nice idea conceptually so I would not want to break it.
> 
> OTOH we could just remove this function for now and do that once needed.



> > Modulo the question about iomap_iter_done, I guess this looks all right
> > to me.  As far as apply.c vs. core.c, I'm not wildly passionate about
> > either naming choice (I would have called it iter.c) but ... fmeh.
> 
> iter.c is also my preference, but in the end I don't care too much.

Ok.  My plan for this is to change this patch to add the new iter code
to apply.c, and change patch 24 to remove iomap_apply.  I'll add a patch
on the end to rename apply.c to iter.c, which will avoid breaking the
history.

I'll send the updated patches as replies to this series to avoid
spamming the list, since I also have a patchset of bugfixes to send out
and don't want to overwhelm everyone.

--D



Re: [PATCH 11/30] iomap: add the new iomap_iter model

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:25AM +0200, Christoph Hellwig wrote:
> The iomap_iter struct provides a convenient way to package up and
> maintain all the arguments to the various mapping and operation
> functions.  It is operated on using the iomap_iter() function that
> is called in loop until the whole range has been processed.  Compared
> to the existing iomap_apply() function this avoid an indirect call
> for each iteration.
> 
> For now iomap_iter() calls back into the existing ->iomap_begin and
> ->iomap_end methods, but in the future this could be further optimized
> to avoid indirect calls entirely.
> 
> Based on an earlier patch from Matthew Wilcox .
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/iomap/Makefile |  1 +
>  fs/iomap/core.c   | 79 +++
>  fs/iomap/trace.h  | 37 +++-
>  include/linux/iomap.h | 56 ++
>  4 files changed, 172 insertions(+), 1 deletion(-)
>  create mode 100644 fs/iomap/core.c
> 
> diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> index eef2722d93a183..6b56b10ded347a 100644
> --- a/fs/iomap/Makefile
> +++ b/fs/iomap/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_FS_IOMAP)  += iomap.o
>  
>  iomap-y  += trace.o \
>  apply.o \
> +core.o \
>  buffered-io.o \
>  direct-io.o \
>  fiemap.o \
> diff --git a/fs/iomap/core.c b/fs/iomap/core.c
> new file mode 100644
> index 00..89a87a1654e8e6
> --- /dev/null
> +++ b/fs/iomap/core.c
> @@ -0,0 +1,79 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 Christoph Hellwig.
> + */
> +#include 
> +#include 
> +#include "trace.h"
> +
> +static inline int iomap_iter_advance(struct iomap_iter *iter)
> +{
> + /* handle the previous iteration (if any) */
> + if (iter->iomap.length) {
> + if (iter->processed <= 0)
> + return iter->processed;
> + if (WARN_ON_ONCE(iter->processed > iomap_length(iter)))
> + return -EIO;
> + iter->pos += iter->processed;
> + iter->len -= iter->processed;
> + if (!iter->len)
> + return 0;
> + }
> +
> + /* clear the state for the next iteration */
> + iter->processed = 0;
> + memset(>iomap, 0, sizeof(iter->iomap));
> + memset(>srcmap, 0, sizeof(iter->srcmap));
> + return 1;
> +}
> +
> +static inline void iomap_iter_done(struct iomap_iter *iter)

I wonder why this is a separate function, since it only has debugging
warnings and tracepoints?

> +{
> + WARN_ON_ONCE(iter->iomap.offset > iter->pos);
> + WARN_ON_ONCE(iter->iomap.length == 0);
> + WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos);
> +
> + trace_iomap_iter_dstmap(iter->inode, >iomap);
> + if (iter->srcmap.type != IOMAP_HOLE)
> + trace_iomap_iter_srcmap(iter->inode, >srcmap);
> +}
> +
> +/**
> + * iomap_iter - iterate over a ranges in a file
> + * @iter: iteration structue
> + * @ops: iomap ops provided by the file system
> + *
> + * Iterate over filesystem-provided space mappings for the provided file 
> range.
> + *
> + * This function handles cleanup of resources acquired for iteration when the
> + * filesystem indicates there are no more space mappings, which means that 
> this
> + * function must be called in a loop that continues as long it returns a
> + * positive value.  If 0 or a negative value is returned, the caller must not
> + * return to the loop body.  Within a loop body, there are two ways to break 
> out
> + * of the loop body:  leave @iter.processed unchanged, or set it to a 
> negative
> + * errno.

Thanks for improving the documentation.

Modulo the question about iomap_iter_done, I guess this looks all right
to me.  As far as apply.c vs. core.c, I'm not wildly passionate about
either naming choice (I would have called it iter.c) but ... fmeh.

Reviewed-by: Darrick J. Wong 

--D

> + */
> +int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> +{
> + int ret;
> +
> + if (iter->iomap.length && ops->iomap_end) {
> + ret = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
> + iter->processed > 0 ? iter->processed : 0,
> + iter->flags, >iomap);
> + if (ret

Re: [PATCH 17/30] iomap: switch __iomap_dio_rw to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:31AM +0200, Christoph Hellwig wrote:
> Switch __iomap_dio_rw to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

I like the reduction in ->submit_io arguments.  The conversion seems
straightforward enough.

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/btrfs/inode.c  |   5 +-
>  fs/iomap/direct-io.c  | 164 +-
>  include/linux/iomap.h |   4 +-
>  3 files changed, 86 insertions(+), 87 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0117d867ecf876..3b0595e8bdd929 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8194,9 +8194,10 @@ static struct btrfs_dio_private 
> *btrfs_create_dio_private(struct bio *dio_bio,
>   return dip;
>  }
>  
> -static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap,
> +static blk_qc_t btrfs_submit_direct(const struct iomap_iter *iter,
>   struct bio *dio_bio, loff_t file_offset)
>  {
> + struct inode *inode = iter->inode;
>   const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
>   struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
> @@ -8212,7 +8213,7 @@ static blk_qc_t btrfs_submit_direct(struct inode 
> *inode, struct iomap *iomap,
>   int ret;
>   blk_status_t status;
>   struct btrfs_io_geometry geom;
> - struct btrfs_dio_data *dio_data = iomap->private;
> + struct btrfs_dio_data *dio_data = iter->iomap.private;
>   struct extent_map *em = NULL;
>  
>   dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 41ccbfc9dc820a..4ecd255e0511ce 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (C) 2010 Red Hat, Inc.
> - * Copyright (c) 2016-2018 Christoph Hellwig.
> + * Copyright (c) 2016-2021 Christoph Hellwig.
>   */
>  #include 
>  #include 
> @@ -59,19 +59,17 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
>  }
>  EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
>  
> -static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
> - struct bio *bio, loff_t pos)
> +static void iomap_dio_submit_bio(const struct iomap_iter *iter,
> + struct iomap_dio *dio, struct bio *bio, loff_t pos)
>  {
>   atomic_inc(>ref);
>  
>   if (dio->iocb->ki_flags & IOCB_HIPRI)
>   bio_set_polled(bio, dio->iocb);
>  
> - dio->submit.last_queue = bdev_get_queue(iomap->bdev);
> + dio->submit.last_queue = bdev_get_queue(iter->iomap.bdev);
>   if (dio->dops && dio->dops->submit_io)
> - dio->submit.cookie = dio->dops->submit_io(
> - file_inode(dio->iocb->ki_filp),
> - iomap, bio, pos);
> + dio->submit.cookie = dio->dops->submit_io(iter, bio, pos);
>   else
>   dio->submit.cookie = submit_bio(bio);
>  }
> @@ -181,24 +179,23 @@ static void iomap_dio_bio_end_io(struct bio *bio)
>   }
>  }
>  
> -static void
> -iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos,
> - unsigned len)
> +static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio 
> *dio,
> + loff_t pos, unsigned len)
>  {
>   struct page *page = ZERO_PAGE(0);
>   int flags = REQ_SYNC | REQ_IDLE;
>   struct bio *bio;
>  
>   bio = bio_alloc(GFP_KERNEL, 1);
> - bio_set_dev(bio, iomap->bdev);
> - bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
> + bio_set_dev(bio, iter->iomap.bdev);
> + bio->bi_iter.bi_sector = iomap_sector(>iomap, pos);
>   bio->bi_private = dio;
>   bio->bi_end_io = iomap_dio_bio_end_io;
>  
>   get_page(page);
>   __bio_add_page(bio, page, len, 0);
>   bio_set_op_attrs(bio, REQ_OP_WRITE, flags);
> - iomap_dio_submit_bio(dio, iomap, bio, pos);
> + iomap_dio_submit_bio(iter, dio, bio, pos);
>  }
>  
>  /*
> @@ -206,8 +203,8 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap 
> *iomap, loff_t pos,
>   * mapping, and whether or not we want FUA.  Note that we can end up
>   * clearing the WRITE_FUA flag in the dio request.
>   */
> -static inline unsigned int
> -iomap_dio_bio_opflags(struct iomap_dio *dio, struct iomap *iomap, bool 
> use_fua)
> +static inline unsigned int iomap_dio_bio_opflags(struct iomap_dio *dio,
> + const struct iomap *iomap, bool use

Re: [PATCH 20/30] iomap: switch iomap_seek_hole to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:34AM +0200, Christoph Hellwig wrote:
> Rewrite iomap_seek_hole to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

Looks good to me,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/seek.c | 51 +
>  1 file changed, 26 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index ce6fb810854fec..fed8f9005f9e46 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (C) 2017 Red Hat, Inc.
> - * Copyright (c) 2018 Christoph Hellwig.
> + * Copyright (c) 2018-2021 Christoph Hellwig.
>   */
>  #include 
>  #include 
> @@ -10,21 +10,20 @@
>  #include 
>  #include 
>  
> -static loff_t
> -iomap_seek_hole_actor(struct inode *inode, loff_t start, loff_t length,
> -   void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_seek_hole_iter(const struct iomap_iter *iter,
> + loff_t *hole_pos)
>  {
> - loff_t offset = start;
> + loff_t length = iomap_length(iter);
>  
> - switch (iomap->type) {
> + switch (iter->iomap.type) {
>   case IOMAP_UNWRITTEN:
> - offset = mapping_seek_hole_data(inode->i_mapping, start,
> - start + length, SEEK_HOLE);
> - if (offset == start + length)
> + *hole_pos = mapping_seek_hole_data(iter->inode->i_mapping,
> + iter->pos, iter->pos + length, SEEK_HOLE);
> + if (*hole_pos == iter->pos + length)
>   return length;
> - fallthrough;
> + return 0;
>   case IOMAP_HOLE:
> - *(loff_t *)data = offset;
> + *hole_pos = iter->pos;
>   return 0;
>   default:
>   return length;
> @@ -32,26 +31,28 @@ iomap_seek_hole_actor(struct inode *inode, loff_t start, 
> loff_t length,
>  }
>  
>  loff_t
> -iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops 
> *ops)
> +iomap_seek_hole(struct inode *inode, loff_t pos, const struct iomap_ops *ops)
>  {
>   loff_t size = i_size_read(inode);
> - loff_t ret;
> + struct iomap_iter iter = {
> + .inode  = inode,
> + .pos= pos,
> + .flags  = IOMAP_REPORT,
> + };
> + int ret;
>  
>   /* Nothing to be found before or beyond the end of the file. */
> - if (offset < 0 || offset >= size)
> + if (pos < 0 || pos >= size)
>   return -ENXIO;
>  
> - while (offset < size) {
> - ret = iomap_apply(inode, offset, size - offset, IOMAP_REPORT,
> -   ops, , iomap_seek_hole_actor);
> - if (ret < 0)
> - return ret;
> - if (ret == 0)
> - break;
> - offset += ret;
> - }
> -
> - return offset;
> + iter.len = size - pos;
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_seek_hole_iter(, );
> + if (ret < 0)
> + return ret;
> + if (iter.len) /* found hole before EOF */
> + return pos;
> + return size;
>  }
>  EXPORT_SYMBOL_GPL(iomap_seek_hole);
>  
> -- 
> 2.30.2
> 



Re: [PATCH 21/30] iomap: switch iomap_seek_data to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:35AM +0200, Christoph Hellwig wrote:
> Rewrite iomap_seek_data to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

Nice cleanup,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/seek.c | 47 ---
>  1 file changed, 24 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index fed8f9005f9e46..a845c012b50c67 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -56,47 +56,48 @@ iomap_seek_hole(struct inode *inode, loff_t pos, const 
> struct iomap_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(iomap_seek_hole);
>  
> -static loff_t
> -iomap_seek_data_actor(struct inode *inode, loff_t start, loff_t length,
> -   void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_seek_data_iter(const struct iomap_iter *iter,
> + loff_t *hole_pos)
>  {
> - loff_t offset = start;
> + loff_t length = iomap_length(iter);
>  
> - switch (iomap->type) {
> + switch (iter->iomap.type) {
>   case IOMAP_HOLE:
>   return length;
>   case IOMAP_UNWRITTEN:
> - offset = mapping_seek_hole_data(inode->i_mapping, start,
> - start + length, SEEK_DATA);
> - if (offset < 0)
> + *hole_pos = mapping_seek_hole_data(iter->inode->i_mapping,
> + iter->pos, iter->pos + length, SEEK_DATA);
> + if (*hole_pos < 0)
>   return length;
> - fallthrough;
> + return 0;
>   default:
> - *(loff_t *)data = offset;
> + *hole_pos = iter->pos;
>   return 0;
>   }
>  }
>  
>  loff_t
> -iomap_seek_data(struct inode *inode, loff_t offset, const struct iomap_ops 
> *ops)
> +iomap_seek_data(struct inode *inode, loff_t pos, const struct iomap_ops *ops)
>  {
>   loff_t size = i_size_read(inode);
> - loff_t ret;
> + struct iomap_iter iter = {
> + .inode  = inode,
> + .pos= pos,
> + .flags  = IOMAP_REPORT,
> + };
> + int ret;
>  
>   /* Nothing to be found before or beyond the end of the file. */
> - if (offset < 0 || offset >= size)
> + if (pos < 0 || pos >= size)
>   return -ENXIO;
>  
> - while (offset < size) {
> - ret = iomap_apply(inode, offset, size - offset, IOMAP_REPORT,
> -   ops, , iomap_seek_data_actor);
> - if (ret < 0)
> - return ret;
> - if (ret == 0)
> - return offset;
> - offset += ret;
> - }
> -
> + iter.len = size - pos;
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_seek_data_iter(, );
> + if (ret < 0)
> + return ret;
> + if (iter.len) /* found data before EOF */
> + return pos;
>   /* We've reached the end of the file without finding data */
>   return -ENXIO;
>  }
> -- 
> 2.30.2
> 



Re: [PATCH 22/30] iomap: switch iomap_swapfile_activate to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:36AM +0200, Christoph Hellwig wrote:
> Switch iomap_swapfile_activate to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

Smooth
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/swapfile.c | 38 --
>  1 file changed, 16 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
> index 6250ca6a1f851d..7069606eca85b2 100644
> --- a/fs/iomap/swapfile.c
> +++ b/fs/iomap/swapfile.c
> @@ -88,13 +88,9 @@ static int iomap_swapfile_fail(struct iomap_swapfile_info 
> *isi, const char *str)
>   * swap only cares about contiguous page-aligned physical extents and makes 
> no
>   * distinction between written and unwritten extents.
>   */
> -static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos,
> - loff_t count, void *data, struct iomap *iomap,
> - struct iomap *srcmap)
> +static loff_t iomap_swapfile_iter(const struct iomap_iter *iter,
> + struct iomap *iomap, struct iomap_swapfile_info *isi)
>  {
> - struct iomap_swapfile_info *isi = data;
> - int error;
> -
>   switch (iomap->type) {
>   case IOMAP_MAPPED:
>   case IOMAP_UNWRITTEN:
> @@ -125,12 +121,12 @@ static loff_t iomap_swapfile_activate_actor(struct 
> inode *inode, loff_t pos,
>   isi->iomap.length += iomap->length;
>   } else {
>   /* Otherwise, add the retained iomap and store this one. */
> - error = iomap_swapfile_add_extent(isi);
> + int error = iomap_swapfile_add_extent(isi);
>   if (error)
>   return error;
>   memcpy(>iomap, iomap, sizeof(isi->iomap));
>   }
> - return count;
> + return iomap_length(iter);
>  }
>  
>  /*
> @@ -141,16 +137,19 @@ int iomap_swapfile_activate(struct swap_info_struct 
> *sis,
>   struct file *swap_file, sector_t *pagespan,
>   const struct iomap_ops *ops)
>  {
> + struct inode *inode = swap_file->f_mapping->host;
> + struct iomap_iter iter = {
> + .inode  = inode,
> + .pos= 0,
> + .len= ALIGN_DOWN(i_size_read(inode), PAGE_SIZE),
> + .flags  = IOMAP_REPORT,
> + };
>   struct iomap_swapfile_info isi = {
>   .sis = sis,
>   .lowest_ppage = (sector_t)-1ULL,
>   .file = swap_file,
>   };
> - struct address_space *mapping = swap_file->f_mapping;
> - struct inode *inode = mapping->host;
> - loff_t pos = 0;
> - loff_t len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE);
> - loff_t ret;
> + int ret;
>  
>   /*
>* Persist all file mapping metadata so that we won't have any
> @@ -160,15 +159,10 @@ int iomap_swapfile_activate(struct swap_info_struct 
> *sis,
>   if (ret)
>   return ret;
>  
> - while (len > 0) {
> - ret = iomap_apply(inode, pos, len, IOMAP_REPORT,
> - ops, , iomap_swapfile_activate_actor);
> - if (ret <= 0)
> - return ret;
> -
> - pos += ret;
> - len -= ret;
> - }
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_swapfile_iter(, , );
> + if (ret < 0)
> + return ret;
>  
>   if (isi.iomap.length) {
>   ret = iomap_swapfile_add_extent();
> -- 
> 2.30.2
> 



Re: [PATCH 23/30] fsdax: switch dax_iomap_rw to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:37AM +0200, Christoph Hellwig wrote:
> Switch the dax_iomap_rw implementation to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

/me gets excited about this file getting cleaned up
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/dax.c | 49 -
>  1 file changed, 24 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 4d63040fd71f56..51da45301350a6 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1103,20 +1103,21 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct 
> iomap *iomap)
>   return size;
>  }
>  
> -static loff_t
> -dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap, struct iomap *srcmap)
> +static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
> + struct iov_iter *iter)
>  {
> + const struct iomap *iomap = >iomap;
> + loff_t length = iomap_length(iomi);
> + loff_t pos = iomi->pos;
>   struct block_device *bdev = iomap->bdev;
>   struct dax_device *dax_dev = iomap->dax_dev;
> - struct iov_iter *iter = data;
>   loff_t end = pos + length, done = 0;
>   ssize_t ret = 0;
>   size_t xfer;
>   int id;
>  
>   if (iov_iter_rw(iter) == READ) {
> - end = min(end, i_size_read(inode));
> + end = min(end, i_size_read(iomi->inode));
>   if (pos >= end)
>   return 0;
>  
> @@ -1133,7 +1134,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t 
> length, void *data,
>* written by write(2) is visible in mmap.
>*/
>   if (iomap->flags & IOMAP_F_NEW) {
> - invalidate_inode_pages2_range(inode->i_mapping,
> + invalidate_inode_pages2_range(iomi->inode->i_mapping,
> pos >> PAGE_SHIFT,
> (end - 1) >> PAGE_SHIFT);
>   }
> @@ -1209,31 +1210,29 @@ ssize_t
>  dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
>   const struct iomap_ops *ops)
>  {
> - struct address_space *mapping = iocb->ki_filp->f_mapping;
> - struct inode *inode = mapping->host;
> - loff_t pos = iocb->ki_pos, ret = 0, done = 0;
> - unsigned flags = 0;
> + struct iomap_iter iomi = {
> + .inode  = iocb->ki_filp->f_mapping->host,
> + .pos= iocb->ki_pos,
> + .len= iov_iter_count(iter),
> + };
> + loff_t done = 0;
> + int ret;
>  
>   if (iov_iter_rw(iter) == WRITE) {
> - lockdep_assert_held_write(>i_rwsem);
> - flags |= IOMAP_WRITE;
> + lockdep_assert_held_write(>i_rwsem);
> + iomi.flags |= IOMAP_WRITE;
>   } else {
> - lockdep_assert_held(>i_rwsem);
> + lockdep_assert_held(>i_rwsem);
>   }
>  
>   if (iocb->ki_flags & IOCB_NOWAIT)
> - flags |= IOMAP_NOWAIT;
> + iomi.flags |= IOMAP_NOWAIT;
>  
> - while (iov_iter_count(iter)) {
> - ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
> - iter, dax_iomap_actor);
> - if (ret <= 0)
> - break;
> - pos += ret;
> - done += ret;
> - }
> + while ((ret = iomap_iter(, ops)) > 0)
> + iomi.processed = dax_iomap_iter(, iter);
>  
> - iocb->ki_pos += done;
> + done = iomi.pos - iocb->ki_pos;
> + iocb->ki_pos = iomi.pos;
>   return done ? done : ret;
>  }
>  EXPORT_SYMBOL_GPL(dax_iomap_rw);
> @@ -1307,7 +1306,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault 
> *vmf, pfn_t *pfnp,
>   }
>  
>   /*
> -  * Note that we don't bother to use iomap_apply here: DAX required
> +  * Note that we don't bother to use iomap_iter here: DAX required
>* the file system block size to be equal the page size, which means
>* that we never have to deal with more than a single extent here.
>*/
> @@ -1561,7 +1560,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault 
> *vmf, pfn_t *pfnp,
>   }
>  
>   /*
> -  * Note that we don't use iomap_apply here.  We aren't doing I/O, only
> +  * Note that we don't use iomap_iter here.  We aren't doing I/O, only
>* setting up a mapping, so really we're using iomap_begin() as a way
>* to look up our filesystem block.
>*/
> -- 
> 2.30.2
> 



Re: [PATCH 18/30] iomap: switch iomap_fiemap to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:32AM +0200, Christoph Hellwig wrote:
> Rewrite the ->fiemap implementation based on iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

Nice cleanups!
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/fiemap.c | 70 ---
>  1 file changed, 29 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index aab070df4a2175..acad09a8c188df 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -1,6 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
> - * Copyright (c) 2016-2018 Christoph Hellwig.
> + * Copyright (c) 2016-2021 Christoph Hellwig.
>   */
>  #include 
>  #include 
> @@ -8,13 +8,8 @@
>  #include 
>  #include 
>  
> -struct fiemap_ctx {
> - struct fiemap_extent_info *fi;
> - struct iomap prev;
> -};
> -
>  static int iomap_to_fiemap(struct fiemap_extent_info *fi,
> - struct iomap *iomap, u32 flags)
> + const struct iomap *iomap, u32 flags)
>  {
>   switch (iomap->type) {
>   case IOMAP_HOLE:
> @@ -43,24 +38,22 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
>   iomap->length, flags);
>  }
>  
> -static loff_t
> -iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void 
> *data,
> - struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_fiemap_iter(const struct iomap_iter *iter,
> + struct fiemap_extent_info *fi, struct iomap *prev)
>  {
> - struct fiemap_ctx *ctx = data;
> - loff_t ret = length;
> + int ret;
>  
> - if (iomap->type == IOMAP_HOLE)
> - return length;
> + if (iter->iomap.type == IOMAP_HOLE)
> + return iomap_length(iter);
>  
> - ret = iomap_to_fiemap(ctx->fi, >prev, 0);
> - ctx->prev = *iomap;
> + ret = iomap_to_fiemap(fi, prev, 0);
> + *prev = iter->iomap;
>   switch (ret) {
>   case 0: /* success */
> - return length;
> + return iomap_length(iter);
>   case 1: /* extent array full */
>   return 0;
> - default:
> + default:/* error */
>   return ret;
>   }
>  }
> @@ -68,38 +61,33 @@ iomap_fiemap_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi,
>   u64 start, u64 len, const struct iomap_ops *ops)
>  {
> - struct fiemap_ctx ctx;
> - loff_t ret;
> -
> - memset(, 0, sizeof(ctx));
> - ctx.fi = fi;
> - ctx.prev.type = IOMAP_HOLE;
> + struct iomap_iter iter = {
> + .inode  = inode,
> + .pos= start,
> + .len= len,
> + .flags  = IOMAP_REPORT,
> + };
> + struct iomap prev = {
> + .type   = IOMAP_HOLE,
> + };
> + int ret;
>  
> - ret = fiemap_prep(inode, fi, start, , 0);
> + ret = fiemap_prep(inode, fi, start, , 0);
>   if (ret)
>   return ret;
>  
> - while (len > 0) {
> - ret = iomap_apply(inode, start, len, IOMAP_REPORT, ops, ,
> - iomap_fiemap_actor);
> - /* inode with no (attribute) mapping will give ENOENT */
> - if (ret == -ENOENT)
> - break;
> - if (ret < 0)
> - return ret;
> - if (ret == 0)
> - break;
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_fiemap_iter(, fi, );
>  
> - start += ret;
> - len -= ret;
> - }
> -
> - if (ctx.prev.type != IOMAP_HOLE) {
> - ret = iomap_to_fiemap(fi, , FIEMAP_EXTENT_LAST);
> + if (prev.type != IOMAP_HOLE) {
> + ret = iomap_to_fiemap(fi, , FIEMAP_EXTENT_LAST);
>   if (ret < 0)
>   return ret;
>   }
>  
> + /* inode with no (attribute) mapping will give ENOENT */
> + if (ret < 0 && ret != -ENOENT)
> + return ret;
>   return 0;
>  }
>  EXPORT_SYMBOL_GPL(iomap_fiemap);
> -- 
> 2.30.2
> 



Re: [PATCH 16/30] iomap: switch iomap_page_mkwrite to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:30AM +0200, Christoph Hellwig wrote:
> Switch iomap_page_mkwrite to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 39 +--
>  1 file changed, 17 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 3a23f7346938fb..5ab464937d4886 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -961,15 +961,15 @@ iomap_truncate_page(struct inode *inode, loff_t pos, 
> bool *did_zero,
>  }
>  EXPORT_SYMBOL_GPL(iomap_truncate_page);
>  
> -static loff_t
> -iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_page_mkwrite_iter(struct iomap_iter *iter,
> + struct page *page)
>  {
> - struct page *page = data;
> + loff_t length = iomap_length(iter);
>   int ret;
>  
> - if (iomap->flags & IOMAP_F_BUFFER_HEAD) {
> - ret = __block_write_begin_int(page, pos, length, NULL, iomap);
> + if (iter->iomap.flags & IOMAP_F_BUFFER_HEAD) {
> + ret = __block_write_begin_int(page, iter->pos, length, NULL,
> +   >iomap);
>   if (ret)
>   return ret;
>   block_commit_write(page, 0, length);
> @@ -983,29 +983,24 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t 
> pos, loff_t length,
>  
>  vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops 
> *ops)
>  {
> + struct iomap_iter iter = {
> + .inode  = file_inode(vmf->vma->vm_file),
> + .flags  = IOMAP_WRITE | IOMAP_FAULT,
> + };
>   struct page *page = vmf->page;
> - struct inode *inode = file_inode(vmf->vma->vm_file);
> - unsigned long length;
> - loff_t offset;
>   ssize_t ret;
>  
>   lock_page(page);
> - ret = page_mkwrite_check_truncate(page, inode);
> + ret = page_mkwrite_check_truncate(page, iter.inode);
>   if (ret < 0)
>   goto out_unlock;
> - length = ret;
> -
> - offset = page_offset(page);
> - while (length > 0) {
> - ret = iomap_apply(inode, offset, length,
> - IOMAP_WRITE | IOMAP_FAULT, ops, page,
> - iomap_page_mkwrite_actor);
> - if (unlikely(ret <= 0))
> - goto out_unlock;
> - offset += ret;
> - length -= ret;
> - }
> + iter.pos = page_offset(page);
> + iter.len = ret;
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_page_mkwrite_iter(, page);
>  
> + if (ret < 0)
> + goto out_unlock;
>   wait_for_stable_page(page);
>   return VM_FAULT_LOCKED;
>  out_unlock:
> -- 
> 2.30.2
> 



Re: [PATCH 15/30] iomap: switch iomap_zero_range to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:29AM +0200, Christoph Hellwig wrote:
> Switch iomap_zero_range to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

Straightforward.
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 36 ++--
>  1 file changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4f525727462f33..3a23f7346938fb 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -896,11 +896,12 @@ static s64 iomap_zero(struct inode *inode, loff_t pos, 
> u64 length,
>   return iomap_write_end(inode, pos, bytes, bytes, page, iomap, srcmap);
>  }
>  
> -static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos,
> - loff_t length, void *data, struct iomap *iomap,
> - struct iomap *srcmap)
> +static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero)
>  {
> - bool *did_zero = data;
> + struct iomap *iomap = >iomap;
> + struct iomap *srcmap = iomap_iter_srcmap(iter);
> + loff_t pos = iter->pos;
> + loff_t length = iomap_length(iter);
>   loff_t written = 0;
>  
>   /* already zeroed?  we're done. */
> @@ -910,10 +911,11 @@ static loff_t iomap_zero_range_actor(struct inode 
> *inode, loff_t pos,
>   do {
>   s64 bytes;
>  
> - if (IS_DAX(inode))
> + if (IS_DAX(iter->inode))
>   bytes = dax_iomap_zero(pos, length, iomap);
>   else
> - bytes = iomap_zero(inode, pos, length, iomap, srcmap);
> + bytes = iomap_zero(iter->inode, pos, length, iomap,
> +srcmap);
>   if (bytes < 0)
>   return bytes;
>  
> @@ -931,19 +933,17 @@ int
>  iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
>   const struct iomap_ops *ops)
>  {
> - loff_t ret;
> -
> - while (len > 0) {
> - ret = iomap_apply(inode, pos, len, IOMAP_ZERO,
> - ops, did_zero, iomap_zero_range_actor);
> - if (ret <= 0)
> - return ret;
> -
> - pos += ret;
> - len -= ret;
> - }
> + struct iomap_iter iter = {
> + .inode  = inode,
> + .pos= pos,
> + .len= len,
> + .flags  = IOMAP_ZERO,
> + };
> + int ret;
>  
> - return 0;
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_zero_iter(, did_zero);
> + return ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_zero_range);
>  
> -- 
> 2.30.2
> 



Re: [PATCH 14/30] iomap: switch iomap_file_unshare to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:28AM +0200, Christoph Hellwig wrote:
> Switch iomap_file_unshare to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 35 ++-
>  1 file changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 4c7e82928cc546..4f525727462f33 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -817,10 +817,12 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
> iov_iter *i,
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
> -static loff_t
> -iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void 
> *data,
> - struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  {
> + struct iomap *iomap = >iomap;
> + struct iomap *srcmap = iomap_iter_srcmap(iter);
> + loff_t pos = iter->pos;
> + loff_t length = iomap_length(iter);
>   long status = 0;
>   loff_t written = 0;
>  
> @@ -836,12 +838,12 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>   unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
>   struct page *page;
>  
> - status = iomap_write_begin(inode, pos, bytes,
> + status = iomap_write_begin(iter->inode, pos, bytes,
>   IOMAP_WRITE_F_UNSHARE, , iomap, srcmap);
>   if (unlikely(status))
>   return status;
>  
> - status = iomap_write_end(inode, pos, bytes, bytes, page, iomap,
> + status = iomap_write_end(iter->inode, pos, bytes, bytes, page, 
> iomap,
>   srcmap);
>   if (WARN_ON_ONCE(status == 0))
>   return -EIO;
> @@ -852,7 +854,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>   written += status;
>   length -= status;
>  
> - balance_dirty_pages_ratelimited(inode->i_mapping);
> + balance_dirty_pages_ratelimited(iter->inode->i_mapping);
>   } while (length);
>  
>   return written;
> @@ -862,18 +864,17 @@ int
>  iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
>   const struct iomap_ops *ops)
>  {
> - loff_t ret;
> -
> - while (len) {
> - ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
> - iomap_unshare_actor);
> - if (ret <= 0)
> - return ret;
> - pos += ret;
> - len -= ret;
> - }
> + struct iomap_iter iter = {
> + .inode  = inode,
> + .pos= pos,
> + .len= len,
> + .flags  = IOMAP_WRITE,
> + };
> + int ret;
>  
> - return 0;
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_unshare_iter();
> + return ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_unshare);
>  
> -- 
> 2.30.2
> 



Re: [PATCH 13/30] iomap: switch iomap_file_buffered_write to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:27AM +0200, Christoph Hellwig wrote:
> Switch iomap_file_buffered_write to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 

Seems pretty straightforward.
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 49 +-
>  1 file changed, 25 insertions(+), 24 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 9cda461887afad..4c7e82928cc546 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -726,13 +726,14 @@ static size_t iomap_write_end(struct inode *inode, 
> loff_t pos, size_t len,
>   return ret;
>  }
>  
> -static loff_t
> -iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> - struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
>  {
> - struct iov_iter *i = data;
> - long status = 0;
> + struct iomap *srcmap = iomap_iter_srcmap(iter);
> + struct iomap *iomap = >iomap;
> + loff_t length = iomap_length(iter);
> + loff_t pos = iter->pos;
>   ssize_t written = 0;
> + long status = 0;
>  
>   do {
>   struct page *page;
> @@ -758,18 +759,18 @@ iomap_write_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>   break;
>   }
>  
> - status = iomap_write_begin(inode, pos, bytes, 0, , iomap,
> - srcmap);
> + status = iomap_write_begin(iter->inode, pos, bytes, 0, ,
> +iomap, srcmap);
>   if (unlikely(status))
>   break;
>  
> - if (mapping_writably_mapped(inode->i_mapping))
> + if (mapping_writably_mapped(iter->inode->i_mapping))
>   flush_dcache_page(page);
>  
>   copied = copy_page_from_iter_atomic(page, offset, bytes, i);
>  
> - status = iomap_write_end(inode, pos, bytes, copied, page, iomap,
> - srcmap);
> + status = iomap_write_end(iter->inode, pos, bytes, copied, page,
> +  iomap, srcmap);
>  
>   if (unlikely(copied != status))
>   iov_iter_revert(i, copied - status);
> @@ -790,29 +791,29 @@ iomap_write_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>   written += status;
>   length -= status;
>  
> - balance_dirty_pages_ratelimited(inode->i_mapping);
> + balance_dirty_pages_ratelimited(iter->inode->i_mapping);
>   } while (iov_iter_count(i) && length);
>  
>   return written ? written : status;
>  }
>  
>  ssize_t
> -iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
> +iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
>   const struct iomap_ops *ops)
>  {
> - struct inode *inode = iocb->ki_filp->f_mapping->host;
> - loff_t pos = iocb->ki_pos, ret = 0, written = 0;
> -
> - while (iov_iter_count(iter)) {
> - ret = iomap_apply(inode, pos, iov_iter_count(iter),
> - IOMAP_WRITE, ops, iter, iomap_write_actor);
> - if (ret <= 0)
> - break;
> - pos += ret;
> - written += ret;
> - }
> + struct iomap_iter iter = {
> + .inode  = iocb->ki_filp->f_mapping->host,
> + .pos= iocb->ki_pos,
> + .len= iov_iter_count(i),
> + .flags  = IOMAP_WRITE,
> + };
> + int ret;
>  
> - return written ? written : ret;
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_write_iter(, i);
> + if (iter.pos == iocb->ki_pos)
> + return ret;
> + return iter.pos - iocb->ki_pos;
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  
> -- 
> 2.30.2
> 



Re: [PATCH 12/30] iomap: switch readahead and readpage to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:26AM +0200, Christoph Hellwig wrote:
> Switch the page cache read functions to use iomap_iter instead of
> iomap_apply.
> 
> Signed-off-by: Christoph Hellwig 

Looks reasonable,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 80 +++---
>  1 file changed, 37 insertions(+), 43 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 26e16cc9d44931..9cda461887afad 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -241,11 +241,12 @@ static inline bool iomap_block_needs_zeroing(struct 
> inode *inode,
>   pos >= i_size_read(inode);
>  }
>  
> -static loff_t
> -iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void 
> *data,
> - struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_readpage_iter(struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx, loff_t offset)
>  {
> - struct iomap_readpage_ctx *ctx = data;
> + struct iomap *iomap = >iomap;
> + loff_t pos = iter->pos + offset;
> + loff_t length = iomap_length(iter) - offset;
>   struct page *page = ctx->cur_page;
>   struct iomap_page *iop;
>   loff_t orig_pos = pos;
> @@ -253,15 +254,16 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>   sector_t sector;
>  
>   if (iomap->type == IOMAP_INLINE)
> - return min(iomap_read_inline_data(inode, page, iomap), length);
> + return min(iomap_read_inline_data(iter->inode, page, iomap),
> +   length);
>  
>   /* zero post-eof blocks as the page may be mapped */
> - iop = iomap_page_create(inode, page);
> - iomap_adjust_read_range(inode, iop, , length, , );
> + iop = iomap_page_create(iter->inode, page);
> + iomap_adjust_read_range(iter->inode, iop, , length, , );
>   if (plen == 0)
>   goto done;
>  
> - if (iomap_block_needs_zeroing(inode, iomap, pos)) {
> + if (iomap_block_needs_zeroing(iter->inode, iomap, pos)) {
>   zero_user(page, poff, plen);
>   iomap_set_range_uptodate(page, poff, plen);
>   goto done;
> @@ -313,23 +315,23 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>  int
>  iomap_readpage(struct page *page, const struct iomap_ops *ops)
>  {
> - struct iomap_readpage_ctx ctx = { .cur_page = page };
> - struct inode *inode = page->mapping->host;
> - unsigned poff;
> - loff_t ret;
> + struct iomap_iter iter = {
> + .inode  = page->mapping->host,
> + .pos= page_offset(page),
> + .len= PAGE_SIZE,
> + };
> + struct iomap_readpage_ctx ctx = {
> + .cur_page   = page,
> + };
> + int ret;
>  
>   trace_iomap_readpage(page->mapping->host, 1);
>  
> - for (poff = 0; poff < PAGE_SIZE; poff += ret) {
> - ret = iomap_apply(inode, page_offset(page) + poff,
> - PAGE_SIZE - poff, 0, ops, ,
> - iomap_readpage_actor);
> - if (ret <= 0) {
> - WARN_ON_ONCE(ret == 0);
> - SetPageError(page);
> - break;
> - }
> - }
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_readpage_iter(, , 0);
> +
> + if (ret < 0)
> + SetPageError(page);
>  
>   if (ctx.bio) {
>   submit_bio(ctx.bio);
> @@ -348,15 +350,14 @@ iomap_readpage(struct page *page, const struct 
> iomap_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(iomap_readpage);
>  
> -static loff_t
> -iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_readahead_iter(struct iomap_iter *iter,
> + struct iomap_readpage_ctx *ctx)
>  {
> - struct iomap_readpage_ctx *ctx = data;
> + loff_t length = iomap_length(iter);
>   loff_t done, ret;
>  
>   for (done = 0; done < length; done += ret) {
> - if (ctx->cur_page && offset_in_page(pos + done) == 0) {
> + if (ctx->cur_page && offset_in_page(iter->pos + done) == 0) {
>   if (!ctx->cur_page_in_bio)
>   unlock_page(ctx->cur_page);
>   put_page(ctx->cur_page);
> @@ -366,8 +367,7 @

Re: [PATCH 10/30] iomap: fix the iomap_readpage_actor return value for inline data

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:24AM +0200, Christoph Hellwig wrote:
> The actor should never return a larger value than the length that was
> passed in.  The current code handles this gracefully, but the opcoming
> iter model will be more picky.

s/opcoming/upcoming/

With that fixed,
Reviewed-by: Darrick J. Wong 

--D

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/iomap/buffered-io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 44587209e6d7c7..26e16cc9d44931 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -205,7 +205,7 @@ struct iomap_readpage_ctx {
>   struct readahead_control *rac;
>  };
>  
> -static int iomap_read_inline_data(struct inode *inode, struct page *page,
> +static loff_t iomap_read_inline_data(struct inode *inode, struct page *page,
>   const struct iomap *iomap)
>  {
>   size_t size = i_size_read(inode) - iomap->offset;
> @@ -253,7 +253,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
> loff_t length, void *data,
>   sector_t sector;
>  
>   if (iomap->type == IOMAP_INLINE)
> - return iomap_read_inline_data(inode, page, iomap);
> + return min(iomap_read_inline_data(inode, page, iomap), length);
>  
>   /* zero post-eof blocks as the page may be mapped */
>   iop = iomap_page_create(inode, page);
> -- 
> 2.30.2
> 



Re: [PATCH 11/30] iomap: add the new iomap_iter model

2021-08-10 Thread Darrick J. Wong
On Tue, Aug 10, 2021 at 08:10:47AM +1000, Dave Chinner wrote:
> On Mon, Aug 09, 2021 at 08:12:25AM +0200, Christoph Hellwig wrote:
> > The iomap_iter struct provides a convenient way to package up and
> > maintain all the arguments to the various mapping and operation
> > functions.  It is operated on using the iomap_iter() function that
> > is called in loop until the whole range has been processed.  Compared
> > to the existing iomap_apply() function this avoid an indirect call
> > for each iteration.
> > 
> > For now iomap_iter() calls back into the existing ->iomap_begin and
> > ->iomap_end methods, but in the future this could be further optimized
> > to avoid indirect calls entirely.
> > 
> > Based on an earlier patch from Matthew Wilcox .
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  fs/iomap/Makefile |  1 +
> >  fs/iomap/core.c   | 79 +++
> >  fs/iomap/trace.h  | 37 +++-
> >  include/linux/iomap.h | 56 ++
> >  4 files changed, 172 insertions(+), 1 deletion(-)
> >  create mode 100644 fs/iomap/core.c
> > 
> > diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> > index eef2722d93a183..6b56b10ded347a 100644
> > --- a/fs/iomap/Makefile
> > +++ b/fs/iomap/Makefile
> > @@ -10,6 +10,7 @@ obj-$(CONFIG_FS_IOMAP)+= iomap.o
> >  
> >  iomap-y+= trace.o \
> >apply.o \
> > +  core.o \
> 
> This creates a discontinuity in the iomap git history. Can you add
> these new functions to iomap/apply.c, then when the old apply code
> is removed later in the series rename the file to core.c? At least
> that way 'git log --follow fs/iomap/core.c' will walk back into the
> current history of fs/iomap/apply.c and the older pre-disaggregation
> fs/iomap.c without having to take the tree back in time to find
> those files...

...or put the new code in apply.c, remove iomap_apply, and don't bother
with the renaming at all?

I don't see much reason to break the git history.  This is effectively a
new epoch in iomap, but that is plainly obvious from the function
declarations.

I'll wander through the rest of the unreviewed patches tomorrow morning,
these are merely my off-the-cuff impressions.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com



Re: [PATCH 19/30] iomap: switch iomap_bmap to use iomap_iter

2021-08-10 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:33AM +0200, Christoph Hellwig wrote:
> Rewrite the ->bmap implementation based on iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/iomap/fiemap.c | 31 +--
>  1 file changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index acad09a8c188df..60daadba16c149 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -92,35 +92,30 @@ int iomap_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fi,
>  }
>  EXPORT_SYMBOL_GPL(iomap_fiemap);
>  
> -static loff_t
> -iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap, struct iomap *srcmap)
> -{
> - sector_t *bno = data, addr;
> -
> - if (iomap->type == IOMAP_MAPPED) {
> - addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits;
> - *bno = addr;
> - }
> - return 0;
> -}
> -
>  /* legacy ->bmap interface.  0 is the error return (!) */
>  sector_t
>  iomap_bmap(struct address_space *mapping, sector_t bno,
>   const struct iomap_ops *ops)
>  {
> - struct inode *inode = mapping->host;
> - loff_t pos = bno << inode->i_blkbits;
> - unsigned blocksize = i_blocksize(inode);
> + struct iomap_iter iter = {
> + .inode  = mapping->host,
> + .pos= (loff_t)bno << mapping->host->i_blkbits,
> + .len= i_blocksize(mapping->host),
> + .flags  = IOMAP_REPORT,
> + };
>   int ret;
>  
>   if (filemap_write_and_wait(mapping))
>   return 0;
>  
>   bno = 0;
> - ret = iomap_apply(inode, pos, blocksize, 0, ops, ,
> -   iomap_bmap_actor);
> + while ((ret = iomap_iter(, ops)) > 0) {
> + if (iter.iomap.type != IOMAP_MAPPED)
> + continue;

I still feel uncomfortable about this use of "continue" here, because it
really means "call iomap_iter again to clean up and exit even though we
know it won't even look for more iomaps to iterate".

To me that feels subtly broken (I usually associate 'continue' with
'go run the loop body again'), and even though bmap has been a quirky
hot mess for 45 years, we don't need to make it even moreso.

Can't this at least be rephrased as:

const uint bno_shift = (mapping->host->i_blkbits - SECTOR_SHIFT);

while ((ret = iomap_iter(, ops)) > 0) {
if (iter.iomap.type == IOMAP_MAPPED)
bno = iomap_sector(iomap, iter.pos) << bno_shift;
/* leave iter.processed unset to stop iteration */
}

to make the loop exit more explicit?

--D

> + bno = (iter.pos - iter.iomap.offset + iter.iomap.addr) >>
> + mapping->host->i_blkbits;
> + }
> +
>   if (ret)
>   return 0;
>   return bno;
> -- 
> 2.30.2
> 



Re: [PATCH 05/30] iomap: mark the iomap argument to iomap_inline_data_valid const

2021-08-09 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:19AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 

Looks ok,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  include/linux/iomap.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 560247130357b5..76bfc5d16ef49d 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -109,7 +109,7 @@ static inline void *iomap_inline_data(const struct iomap 
> *iomap, loff_t pos)
>   * This is used to guard against accessing data beyond the page inline_data
>   * points at.
>   */
> -static inline bool iomap_inline_data_valid(struct iomap *iomap)
> +static inline bool iomap_inline_data_valid(const struct iomap *iomap)
>  {
>   return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data);
>  }
> -- 
> 2.30.2
> 



Re: [PATCH 04/30] iomap: mark the iomap argument to iomap_inline_data const

2021-08-09 Thread Darrick J. Wong
On Mon, Aug 09, 2021 at 08:12:18AM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 

Looks ok,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  include/linux/iomap.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 8030483331d17f..560247130357b5 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -99,7 +99,7 @@ static inline sector_t iomap_sector(const struct iomap 
> *iomap, loff_t pos)
>  /*
>   * Returns the inline data pointer for logical offset @pos.
>   */
> -static inline void *iomap_inline_data(struct iomap *iomap, loff_t pos)
> +static inline void *iomap_inline_data(const struct iomap *iomap, loff_t pos)
>  {
>   return iomap->inline_data + pos - iomap->offset;
>  }
> -- 
> 2.30.2
> 



Re: RFC: switch iomap to an iterator model

2021-07-29 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:34:53PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series replies the existing callback-based iomap_apply to an iter based
> model.  The prime aim here is to simply the DAX reflink support, which

Jan Kara pointed out that recent gcc and clang support a magic attribute
that causes a cleanup function to be called when an automatic variable
goes out of scope.  I've ported the XFS for_each_perag* macros to use
it, but I think this would be roughly (totally untested) what you'd do
for iomap iterators:

/* automatic iteration cleanup via macro hell */
struct iomap_iter_cleanup {
struct iomap_ops*ops;
struct iomap_iter   *iter;
loff_t  *ret;
};

static inline void iomap_iter_cleanup(struct iomap_iter_cleanup *ic)
{
struct iomap_iter *iter = ic->iter;
int ret2 = 0;

if (!iter->iomap.length || !ic->ops->iomap_end)
return;

ret2 = ops->iomap_end(iter->inode, iter->pos,
iomap_length(iter), 0, iter->flags,
>iomap);

if (ret2 && *ic->ret == 0)
*ic->ret = ret2;

iter->iomap.length = 0;
}

#define IOMAP_ITER_CLEANUP(pag) \
struct iomap_iter_cleanup __iomap_iter_cleanup \
__attribute__((__cleanup__(iomap_iter_cleanup))) = \
{ .iter = (iter), .ops = (ops), .ret = &(ret) }

#define for_each_iomap(iter, ops, ret) \
(ret) = iomap_iter((iter), (ops)); \
for (IOMAP_ITER_CLEANUP(iter, ops, ret); \
(ret) > 0; \
(ret) = iomap_iter((iter), (ops)) \

Then we actually /can/ write our iteration loops in the normal C style:

struct iomap_iter iter = {
.inode = ...,
.pos = 0,
.length = 32768,
};
loff_t ret = 0;

for_each_iomap(, ops, ret) {
if (iter.iomap.type != WHAT_I_WANT)
break;

ret = am_i_pissed_off(...);
if (ret)
return ret;
}

return ret;

and ->iomap_end will always get called.  There are a few sharp edges:

I can't figure out how far back clang and gcc support this attribute.
The gcc docs mention it at least far back as 3.3.6.  clang (afaict) docs
don't reference it directly, but the clang 4 docs claim that it can be
as pedantic as gcc w.r.t. attribute use.  That's more than new enough
for upstream, which requires gcc 4.9 or clang 10.

The /other/ problem is that gcc gets fussy about defining variables
inside the for loop parentheses, which means that any code using it has
to compile with -std=gnu99, which is /not/ the usual c89 that the kernel
uses.  OTOH, it's been 22 years since C99 was ratified, c'mon...

--D



Re: [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter

2021-07-27 Thread Darrick J. Wong
On Tue, Jul 27, 2021 at 08:31:38AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 26, 2021 at 09:39:22AM -0700, Darrick J. Wong wrote:
> > The documentation needs to be much more explicit about the fact that you
> > cannot "break;" your way out of an iomap_iter loop.  I think the comment
> > should be rewritten along these lines:
> > 
> > "Iterate over filesystem-provided space mappings for the provided file
> > range.  This function handles cleanup of resources acquired for
> > iteration when the filesystem indicates there are no more space
> > mappings, which means that this function must be called in a loop that
> > continues as long it returns a positive value.  If 0 or a negative value
> > is returned, the caller must not return to the loop body.  Within a loop
> > body, there are two ways to break out of the loop body: leave
> > @iter.processed unchanged, or set it to the usual negative errno."
> > 
> > Hm.
> 
> Yes, I'll update the documentation.

Ok, thanks!

> > Clunky, for sure, but at least we still get to use break as the language
> > designers intended.
> 
> I can't see any advantage there over just proper documentation.  If you
> are totally attached to a working break we might have to come up with
> a nasty for_each macro that ensures we have a final iomap_apply, but I
> doubt it is worth the effort.

I was pushing the explicit _break() function as a means to avoid an even
fuglier loop macro.

--D



Re: [PATCH 17/27] iomap: switch iomap_seek_hole to use iomap_iter

2021-07-26 Thread Darrick J. Wong
On Mon, Jul 26, 2021 at 10:22:36AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 19, 2021 at 10:22:47AM -0700, Darrick J. Wong wrote:
> > > -static loff_t
> > > -iomap_seek_hole_actor(struct inode *inode, loff_t start, loff_t length,
> > > -   void *data, struct iomap *iomap, struct iomap *srcmap)
> > > +static loff_t iomap_seek_hole_iter(const struct iomap_iter *iter, loff_t 
> > > *pos)
> > 
> > /me wonders if @pos should be named hole_pos (here and in the caller) to
> > make it a little easier to read...
> 
> Sure.
> 
> > ...because what we're really saying here is that if seek_hole_iter found
> > a hole (and returned zero, thereby terminating the loop before iter.len
> > could reach zero), we want to return the position of the hole.
> 
> Yes.
> 
> > > + return size;
> > 
> > Not sure why we return size here...?  Oh, because there's an implicit
> > hole at EOF, so we return i_size.  Uh, does this do the right thing if
> > ->iomap_begin returns posteof mappings?  I don't see anything in
> > iomap_iter_advance that would stop iteration at EOF.
> 
> Nothing in ->iomap_begin checks that, iomap_seek_hole initializes
> iter.len so that it stops at EOF.

Oh, right.  Sorry, I forgot that. :(

--D



Re: [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter

2021-07-26 Thread Darrick J. Wong
On Mon, Jul 26, 2021 at 10:19:42AM +0200, Christoph Hellwig wrote:
> On Mon, Jul 19, 2021 at 10:05:45AM -0700, Darrick J. Wong wrote:
> > >   bno = 0;
> > > - ret = iomap_apply(inode, pos, blocksize, 0, ops, ,
> > > -   iomap_bmap_actor);
> > > + while ((ret = iomap_iter(, ops)) > 0) {
> > > + if (iter.iomap.type != IOMAP_MAPPED)
> > > + continue;
> > 
> > There isn't a mapped extent, so return 0 here, right?
> 
> We can't just return 0, we always need the final iomap_iter() call
> to clean up in case a ->iomap_end method is supplied.  No for bmap
> having and needing one is rather theoretical, but people will copy
> and paste that once we start breaking the rules.

Oh, right, I forgot that someone might want to ->iomap_end.  The
"continue" works because we only asked for one block, therefore we know
that we'll never get to the loop body a second time; and we ignore
iter.processed, which also means we never revisit the loop body.

This "continue without setting iter.processed to break out of loop"
pattern is a rather indirect subtlety, since C programmers are taught
that they can break out of a loop using break;.  This new iomap_iter
pattern fubars that longstanding language feature, and the language
around it is soft:

> /**
>  * iomap_iter - iterate over a ranges in a file
>  * @iter: iteration structue
>  * @ops: iomap ops provided by the file system
>  *
>  * Iterate over file system provided contiguous ranges of blocks with the same
>  * state.  Should be called in a loop that continues as long as this function
>  * returns a positive value.  If 0 or a negative value is returned the caller
>  * should break out of the loop - a negative value is an error either from the
>  * file system or from the last iteration stored in @iter.copied.
>  */

The documentation needs to be much more explicit about the fact that you
cannot "break;" your way out of an iomap_iter loop.  I think the comment
should be rewritten along these lines:

"Iterate over filesystem-provided space mappings for the provided file
range.  This function handles cleanup of resources acquired for
iteration when the filesystem indicates there are no more space
mappings, which means that this function must be called in a loop that
continues as long it returns a positive value.  If 0 or a negative value
is returned, the caller must not return to the loop body.  Within a loop
body, there are two ways to break out of the loop body: leave
@iter.processed unchanged, or set it to the usual negative errno."

Hm.

What if we provide an explicit loop break function?  That would be clear
overkill for bmap, but somebody else wanting to break out of a more
complex loop body ought to be able to say "break" to do that, not
"continue with subtleties".

static inline int
iomap_iter_break(struct iomap_iter *iter, int ret)
{
int ret2;

if (!iter->iomap.length || !ops->iomap_end)
return ret;

ret2 = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
0, iter->flags, >iomap);
return ret ? ret : ret2;
}

And then then theoretical loop body becomes:

while ((ret = iomap_iter(, ops)) > 0) {
if (iter.iomap.type != WHAT_I_WANT) {
ret = iomap_iter_break(, 0);
break;
}



ret = vfs_do_some_risky_thing(...);
if (ret) {
ret = iomap_iter_break(, ret);
break;
}



iter.processed = iter.iomap.length;
}
return ret;

Clunky, for sure, but at least we still get to use break as the language
designers intended.

--D



Re: RFC: switch iomap to an iterator model

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:34:53PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series replies the existing callback-based iomap_apply to an iter based
> model.  The prime aim here is to simply the DAX reflink support, which
> requires iterating through two inodes, something that is rather painful
> with the apply model.  It also helps to kill an indirect call per segment
> as-is.  Compared to the earlier patchset from Matthew Wilcox that this
> series is based upon it does not eliminate all indirect calls, but as the
> upside it does not change the file systems at all (except for the btrfs
> and gfs2 hooks which have slight prototype changes).

FWIW patches 9-20 look ok to me, modulo the discussion I started in
patch 8 about defining a distinct type for iomap byte lengths instead of
the combination of int/ssize_t/u64 that we use now.

> This passes basic testing on XFS for block based file systems.  The DAX
> changes are entirely untested as I haven't managed to get pmem work in
> recent qemu.

This gets increasingly difficult as time goes by.

Right now I have the following bits of libvirt xml in the vm
definitions:

  1073741824
  

  
/run/g.mem
  
  
10487808
0
  
  

  

Which seems to translate to:

-machine pc-q35-4.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off,nvdimm=on
-object 
memory-backend-file,id=memnvdimm0,prealloc=no,mem-path=/run/g.mem,share=yes,size=10739515392,align=128M
-device nvdimm,memdev=memnvdimm0,id=nvdimm0,slot=0,label-size=2M

Evidently something was added to the pmem code(?) that makes it fussy if
the memory region doesn't align to a 128M boundary or the label isn't
big enough for ... whatever gets written into them.

The file /run/g.mem is intended to provide 10GB of pmem to the VM, with
an additional 2M allocated for the label.

--D

> Diffstat:
>  b/fs/btrfs/inode.c   |5 
>  b/fs/buffer.c|4 
>  b/fs/dax.c   |  578 
> ++-
>  b/fs/gfs2/bmap.c |5 
>  b/fs/internal.h  |4 
>  b/fs/iomap/Makefile  |2 
>  b/fs/iomap/buffered-io.c |  344 +--
>  b/fs/iomap/direct-io.c   |  162 ++---
>  b/fs/iomap/fiemap.c  |  101 +++-
>  b/fs/iomap/iter.c|   74 ++
>  b/fs/iomap/seek.c|   88 +++
>  b/fs/iomap/swapfile.c|   38 +--
>  b/fs/iomap/trace.h   |   35 +-
>  b/include/linux/iomap.h  |   73 -
>  fs/iomap/apply.c |   99 
>  15 files changed, 777 insertions(+), 835 deletions(-)



Re: [PATCH 21/27] iomap: remove iomap_apply

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:14PM +0200, Christoph Hellwig wrote:
> iomap_apply is unused now, so remove it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/iomap/Makefile |  1 -
>  fs/iomap/apply.c  | 99 ---
>  fs/iomap/trace.h  | 40 -
>  include/linux/iomap.h | 10 -
>  4 files changed, 150 deletions(-)

mmm, negative LOC delta ;)
Reviewed-by: Darrick J. Wong 

--D

>  delete mode 100644 fs/iomap/apply.c
> 
> diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> index 85034deb5a2f19..ebd9866d80ae90 100644
> --- a/fs/iomap/Makefile
> +++ b/fs/iomap/Makefile
> @@ -9,7 +9,6 @@ ccflags-y += -I $(srctree)/$(src) # needed for 
> trace events
>  obj-$(CONFIG_FS_IOMAP)   += iomap.o
>  
>  iomap-y  += trace.o \
> -apply.o \
>  iter.o \
>  buffered-io.o \
>  direct-io.o \
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> deleted file mode 100644
> index 26ab6563181fc6..00
> --- a/fs/iomap/apply.c
> +++ /dev/null
> @@ -1,99 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * Copyright (C) 2010 Red Hat, Inc.
> - * Copyright (c) 2016-2018 Christoph Hellwig.
> - */
> -#include 
> -#include 
> -#include 
> -#include 
> -#include "trace.h"
> -
> -/*
> - * Execute a iomap write on a segment of the mapping that spans a
> - * contiguous range of pages that have identical block mapping state.
> - *
> - * This avoids the need to map pages individually, do individual allocations
> - * for each page and most importantly avoid the need for filesystem specific
> - * locking per page. Instead, all the operations are amortised over the 
> entire
> - * range of pages. It is assumed that the filesystems will lock whatever
> - * resources they require in the iomap_begin call, and release them in the
> - * iomap_end call.
> - */
> -loff_t
> -iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
> - const struct iomap_ops *ops, void *data, iomap_actor_t actor)
> -{
> - struct iomap iomap = { .type = IOMAP_HOLE };
> - struct iomap srcmap = { .type = IOMAP_HOLE };
> - loff_t written = 0, ret;
> - u64 end;
> -
> - trace_iomap_apply(inode, pos, length, flags, ops, actor, _RET_IP_);
> -
> - /*
> -  * Need to map a range from start position for length bytes. This can
> -  * span multiple pages - it is only guaranteed to return a range of a
> -  * single type of pages (e.g. all into a hole, all mapped or all
> -  * unwritten). Failure at this point has nothing to undo.
> -  *
> -  * If allocation is required for this range, reserve the space now so
> -  * that the allocation is guaranteed to succeed later on. Once we copy
> -  * the data into the page cache pages, then we cannot fail otherwise we
> -  * expose transient stale data. If the reserve fails, we can safely
> -  * back out at this point as there is nothing to undo.
> -  */
> - ret = ops->iomap_begin(inode, pos, length, flags, , );
> - if (ret)
> - return ret;
> - if (WARN_ON(iomap.offset > pos)) {
> - written = -EIO;
> - goto out;
> - }
> - if (WARN_ON(iomap.length == 0)) {
> - written = -EIO;
> - goto out;
> - }
> -
> - trace_iomap_apply_dstmap(inode, );
> - if (srcmap.type != IOMAP_HOLE)
> - trace_iomap_apply_srcmap(inode, );
> -
> - /*
> -  * Cut down the length to the one actually provided by the filesystem,
> -  * as it might not be able to give us the whole size that we requested.
> -  */
> - end = iomap.offset + iomap.length;
> - if (srcmap.type != IOMAP_HOLE)
> - end = min(end, srcmap.offset + srcmap.length);
> - if (pos + length > end)
> - length = end - pos;
> -
> - /*
> -  * Now that we have guaranteed that the space allocation will succeed,
> -  * we can do the copy-in page by page without having to worry about
> -  * failures exposing transient data.
> -  *
> -  * To support COW operations, we read in data for partially blocks from
> -  * the srcmap if the file system filled it in.  In that case we the
> -  * length needs to be limited to the earlier of the ends of the iomaps.
> -  * If the file system did not provide a srcmap we pass in the normal
> -  * iomap into the actors so that they don't need to have special
> -  * handling for the two cases

Re: [PATCH 22/27] iomap: pass an iomap_iter to various buffered I/O helpers

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:15PM +0200, Christoph Hellwig wrote:
> Pass the iomap_iter structure instead of individual parameters to
> various internal helpers for buffered I/O.
> 
> Signed-off-by: Christoph Hellwig 

Looks ok,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 117 -
>  1 file changed, 56 insertions(+), 61 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index c273b5d88dd8a8..daabbe8d7edfb5 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -226,12 +226,14 @@ iomap_read_inline_data(struct inode *inode, struct page 
> *page,
>   SetPageUptodate(page);
>  }
>  
> -static inline bool iomap_block_needs_zeroing(struct inode *inode,
> - struct iomap *iomap, loff_t pos)
> +static inline bool iomap_block_needs_zeroing(struct iomap_iter *iter,
> + loff_t pos)
>  {
> - return iomap->type != IOMAP_MAPPED ||
> - (iomap->flags & IOMAP_F_NEW) ||
> - pos >= i_size_read(inode);
> + struct iomap *srcmap = iomap_iter_srcmap(iter);
> +
> + return srcmap->type != IOMAP_MAPPED ||
> + (srcmap->flags & IOMAP_F_NEW) ||
> + pos >= i_size_read(iter->inode);
>  }
>  
>  static loff_t iomap_readpage_iter(struct iomap_iter *iter,
> @@ -259,7 +261,7 @@ static loff_t iomap_readpage_iter(struct iomap_iter *iter,
>   if (plen == 0)
>   goto done;
>  
> - if (iomap_block_needs_zeroing(iter->inode, iomap, pos)) {
> + if (iomap_block_needs_zeroing(iter, pos)) {
>   zero_user(page, poff, plen);
>   iomap_set_range_uptodate(page, poff, plen);
>   goto done;
> @@ -541,12 +543,12 @@ iomap_read_page_sync(loff_t block_start, struct page 
> *page, unsigned poff,
>   return submit_bio_wait();
>  }
>  
> -static int
> -__iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, int flags,
> - struct page *page, struct iomap *srcmap)
> +static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> + unsigned len, int flags, struct page *page)
>  {
> - struct iomap_page *iop = iomap_page_create(inode, page);
> - loff_t block_size = i_blocksize(inode);
> + struct iomap *srcmap = iomap_iter_srcmap(iter);
> + struct iomap_page *iop = iomap_page_create(iter->inode, page);
> + loff_t block_size = i_blocksize(iter->inode);
>   loff_t block_start = round_down(pos, block_size);
>   loff_t block_end = round_up(pos + len, block_size);
>   unsigned from = offset_in_page(pos), to = from + len, poff, plen;
> @@ -556,7 +558,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, 
> unsigned len, int flags,
>   ClearPageError(page);
>  
>   do {
> - iomap_adjust_read_range(inode, iop, _start,
> + iomap_adjust_read_range(iter->inode, iop, _start,
>   block_end - block_start, , );
>   if (plen == 0)
>   break;
> @@ -566,7 +568,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, 
> unsigned len, int flags,
>   (to <= poff || to >= poff + plen))
>   continue;
>  
> - if (iomap_block_needs_zeroing(inode, srcmap, block_start)) {
> + if (iomap_block_needs_zeroing(iter, block_start)) {
>   if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
>   return -EIO;
>   zero_user_segments(page, poff, from, to, poff + plen);
> @@ -582,41 +584,40 @@ __iomap_write_begin(struct inode *inode, loff_t pos, 
> unsigned len, int flags,
>   return 0;
>  }
>  
> -static int
> -iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned 
> flags,
> - struct page **pagep, struct iomap *iomap, struct iomap *srcmap)
> +static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned 
> len,
> + unsigned flags, struct page **pagep)
>  {
> - const struct iomap_page_ops *page_ops = iomap->page_ops;
> + const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> + struct iomap *srcmap = iomap_iter_srcmap(iter);
>   struct page *page;
>   int status = 0;
>  
> - BUG_ON(pos + len > iomap->offset + iomap->length);
> - if (srcmap != iomap)
> + BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
> + if (srcmap != >iomap)
>   BUG_ON(pos + len > srcmap->offset + srcmap->length);
>  
>   if (fatal

Re: [PATCH 23/27] iomap: rework unshare flag

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:16PM +0200, Christoph Hellwig wrote:
> Instead of another internal flags namespace inside of buffered-io.c,
> just pass a UNSHARE hint in the main iomap flags field.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/iomap/buffered-io.c | 23 +--
>  include/linux/iomap.h  |  1 +
>  2 files changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index daabbe8d7edfb5..eb5d742b5bf8b7 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -511,10 +511,6 @@ iomap_migrate_page(struct address_space *mapping, struct 
> page *newpage,
>  EXPORT_SYMBOL_GPL(iomap_migrate_page);
>  #endif /* CONFIG_MIGRATION */
>  
> -enum {
> - IOMAP_WRITE_F_UNSHARE   = (1 << 0),
> -};

Oh good, this finally dies.
Reviewed-by: Darrick J. Wong 

--D

> -
>  static void
>  iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
>  {
> @@ -544,7 +540,7 @@ iomap_read_page_sync(loff_t block_start, struct page 
> *page, unsigned poff,
>  }
>  
>  static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> - unsigned len, int flags, struct page *page)
> + unsigned len, struct page *page)
>  {
>   struct iomap *srcmap = iomap_iter_srcmap(iter);
>   struct iomap_page *iop = iomap_page_create(iter->inode, page);
> @@ -563,13 +559,13 @@ static int __iomap_write_begin(struct iomap_iter *iter, 
> loff_t pos,
>   if (plen == 0)
>   break;
>  
> - if (!(flags & IOMAP_WRITE_F_UNSHARE) &&
> + if (!(iter->flags & IOMAP_UNSHARE) &&
>   (from <= poff || from >= poff + plen) &&
>   (to <= poff || to >= poff + plen))
>   continue;
>  
>   if (iomap_block_needs_zeroing(iter, block_start)) {
> - if (WARN_ON_ONCE(flags & IOMAP_WRITE_F_UNSHARE))
> + if (WARN_ON_ONCE(iter->flags & IOMAP_UNSHARE))
>   return -EIO;
>   zero_user_segments(page, poff, from, to, poff + plen);
>   } else {
> @@ -585,7 +581,7 @@ static int __iomap_write_begin(struct iomap_iter *iter, 
> loff_t pos,
>  }
>  
>  static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned 
> len,
> - unsigned flags, struct page **pagep)
> + struct page **pagep)
>  {
>   const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
>   struct iomap *srcmap = iomap_iter_srcmap(iter);
> @@ -617,7 +613,7 @@ static int iomap_write_begin(struct iomap_iter *iter, 
> loff_t pos, unsigned len,
>   else if (iter->iomap.flags & IOMAP_F_BUFFER_HEAD)
>   status = __block_write_begin_int(page, pos, len, NULL, srcmap);
>   else
> - status = __iomap_write_begin(iter, pos, len, flags, page);
> + status = __iomap_write_begin(iter, pos, len, page);
>  
>   if (unlikely(status))
>   goto out_unlock;
> @@ -748,7 +744,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, 
> struct iov_iter *i)
>   break;
>   }
>  
> - status = iomap_write_begin(iter, pos, bytes, 0, );
> + status = iomap_write_begin(iter, pos, bytes, );
>   if (unlikely(status))
>   break;
>  
> @@ -825,8 +821,7 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>   unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length);
>   struct page *page;
>  
> - status = iomap_write_begin(iter, pos, bytes,
> - IOMAP_WRITE_F_UNSHARE, );
> + status = iomap_write_begin(iter, pos, bytes, );
>   if (unlikely(status))
>   return status;
>  
> @@ -854,7 +849,7 @@ iomap_file_unshare(struct inode *inode, loff_t pos, 
> loff_t len,
>   .inode  = inode,
>   .pos= pos,
>   .len= len,
> - .flags  = IOMAP_WRITE,
> + .flags  = IOMAP_WRITE | IOMAP_UNSHARE,
>   };
>   int ret;
>  
> @@ -871,7 +866,7 @@ static s64 __iomap_zero_iter(struct iomap_iter *iter, 
> loff_t pos, u64 length)
>   unsigned offset = offset_in_page(pos);
>   unsigned bytes = min_t(u64, PAGE_SIZE - offset, length);
>  
> - status = iomap_write_begin(iter, pos, bytes, 0, );
> + status = iomap_write_begin(iter, pos, bytes, );
>   if (status)
>   r

Re: [PATCH 27/27] iomap: constify iomap_iter_srcmap

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:20PM +0200, Christoph Hellwig wrote:
> The srcmap returned from iomap_iter_srcmap is never modified, so mark
> the iomap returned from it const and constify a lot of code that never
> modifies the iomap.
> 
> Signed-off-by: Christoph Hellwig 

LGTM!
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 32 
>  include/linux/iomap.h  |  2 +-
>  2 files changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index eb5d742b5bf8b7..a2dd42f3115cfa 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -226,20 +226,20 @@ iomap_read_inline_data(struct inode *inode, struct page 
> *page,
>   SetPageUptodate(page);
>  }
>  
> -static inline bool iomap_block_needs_zeroing(struct iomap_iter *iter,
> +static inline bool iomap_block_needs_zeroing(const struct iomap_iter *iter,
>   loff_t pos)
>  {
> - struct iomap *srcmap = iomap_iter_srcmap(iter);
> + const struct iomap *srcmap = iomap_iter_srcmap(iter);
>  
>   return srcmap->type != IOMAP_MAPPED ||
>   (srcmap->flags & IOMAP_F_NEW) ||
>   pos >= i_size_read(iter->inode);
>  }
>  
> -static loff_t iomap_readpage_iter(struct iomap_iter *iter,
> +static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
>   struct iomap_readpage_ctx *ctx, loff_t offset)
>  {
> - struct iomap *iomap = >iomap;
> + const struct iomap *iomap = >iomap;
>   loff_t pos = iter->pos + offset;
>   loff_t length = iomap_length(iter) - offset;
>   struct page *page = ctx->cur_page;
> @@ -355,7 +355,7 @@ iomap_readpage(struct page *page, const struct iomap_ops 
> *ops)
>  }
>  EXPORT_SYMBOL_GPL(iomap_readpage);
>  
> -static loff_t iomap_readahead_iter(struct iomap_iter *iter,
> +static loff_t iomap_readahead_iter(const struct iomap_iter *iter,
>   struct iomap_readpage_ctx *ctx)
>  {
>   loff_t length = iomap_length(iter);
> @@ -539,10 +539,10 @@ iomap_read_page_sync(loff_t block_start, struct page 
> *page, unsigned poff,
>   return submit_bio_wait();
>  }
>  
> -static int __iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> +static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
>   unsigned len, struct page *page)
>  {
> - struct iomap *srcmap = iomap_iter_srcmap(iter);
> + const struct iomap *srcmap = iomap_iter_srcmap(iter);
>   struct iomap_page *iop = iomap_page_create(iter->inode, page);
>   loff_t block_size = i_blocksize(iter->inode);
>   loff_t block_start = round_down(pos, block_size);
> @@ -580,11 +580,11 @@ static int __iomap_write_begin(struct iomap_iter *iter, 
> loff_t pos,
>   return 0;
>  }
>  
> -static int iomap_write_begin(struct iomap_iter *iter, loff_t pos, unsigned 
> len,
> - struct page **pagep)
> +static int iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
> + unsigned len, struct page **pagep)
>  {
>   const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> - struct iomap *srcmap = iomap_iter_srcmap(iter);
> + const struct iomap *srcmap = iomap_iter_srcmap(iter);
>   struct page *page;
>   int status = 0;
>  
> @@ -655,10 +655,10 @@ static size_t __iomap_write_end(struct inode *inode, 
> loff_t pos, size_t len,
>   return copied;
>  }
>  
> -static size_t iomap_write_end_inline(struct iomap_iter *iter, struct page 
> *page,
> - loff_t pos, size_t copied)
> +static size_t iomap_write_end_inline(const struct iomap_iter *iter,
> + struct page *page, loff_t pos, size_t copied)
>  {
> - struct iomap *iomap = >iomap;
> + const struct iomap *iomap = >iomap;
>   void *addr;
>  
>   WARN_ON_ONCE(!PageUptodate(page));
> @@ -678,7 +678,7 @@ static size_t iomap_write_end(struct iomap_iter *iter, 
> loff_t pos, size_t len,
>   size_t copied, struct page *page)
>  {
>   const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> - struct iomap *srcmap = iomap_iter_srcmap(iter);
> + const struct iomap *srcmap = iomap_iter_srcmap(iter);
>   loff_t old_size = iter->inode->i_size;
>   size_t ret;
>  
> @@ -803,7 +803,7 @@ EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>  static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  {
>   struct iomap *iomap = >iomap;
> - struct iomap *srcmap = iomap_iter_srcmap(iter);
> + const struct iomap *srcmap = iomap_iter_srcmap(iter);
>   loff_t pos = iter->pos;
>

Re: [PATCH 07/27] iomap: mark the iomap argument to iomap_read_page_sync const

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:00PM +0200, Christoph Hellwig wrote:
> iomap_read_page_sync never modifies the passed in iomap, so mark
> it const.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index e47380259cf7e1..8c26cf7cbd72b0 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -535,7 +535,7 @@ iomap_write_failed(struct inode *inode, loff_t pos, 
> unsigned len)
>  
>  static int
>  iomap_read_page_sync(loff_t block_start, struct page *page, unsigned poff,
> - unsigned plen, struct iomap *iomap)
> + unsigned plen, const struct iomap *iomap)
>  {
>   struct bio_vec bvec;
>   struct bio bio;
> -- 
> 2.30.2
> 



Re: [PATCH 06/27] iomap: mark the iomap argument to iomap_read_inline_data const

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:34:59PM +0200, Christoph Hellwig wrote:
> iomap_read_inline_data never modifies the passed in iomap, so mark
> it const.
> 
> Signed-off-by: Christoph Hellwig 

Looks good,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/buffered-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 75310f6fcf8401..e47380259cf7e1 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -207,7 +207,7 @@ struct iomap_readpage_ctx {
>  
>  static void
>  iomap_read_inline_data(struct inode *inode, struct page *page,
> - struct iomap *iomap)
> + const struct iomap *iomap)
>  {
>   size_t size = i_size_read(inode);
>   void *addr;
> -- 
> 2.30.2
> 



Re: [PATCH 05/27] fsdax: mark the iomap argument to dax_iomap_sector as const

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:34:58PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 

LGTM
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/dax.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index da41f9363568e0..4d63040fd71f56 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1005,7 +1005,7 @@ int dax_writeback_mapping_range(struct address_space 
> *mapping,
>  }
>  EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
>  
> -static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
> +static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
>  {
>   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
>  }
> -- 
> 2.30.2
> 



Re: [PATCH 04/27] fs: mark the iomap argument to __block_write_begin_int const

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:34:57PM +0200, Christoph Hellwig wrote:
> __block_write_begin_int never modifies the passed in iomap, so mark it
> const.
> 
> Signed-off-by: Christoph Hellwig 

Looks ok,
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/buffer.c   | 4 ++--
>  fs/internal.h | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 6290c3afdba488..bd6a9e9fbd64c9 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1912,7 +1912,7 @@ EXPORT_SYMBOL(page_zero_new_buffers);
>  
>  static void
>  iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
> - struct iomap *iomap)
> + const struct iomap *iomap)
>  {
>   loff_t offset = block << inode->i_blkbits;
>  
> @@ -1966,7 +1966,7 @@ iomap_to_bh(struct inode *inode, sector_t block, struct 
> buffer_head *bh,
>  }
>  
>  int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
> - get_block_t *get_block, struct iomap *iomap)
> + get_block_t *get_block, const struct iomap *iomap)
>  {
>   unsigned from = pos & (PAGE_SIZE - 1);
>   unsigned to = from + len;
> diff --git a/fs/internal.h b/fs/internal.h
> index 3ce8edbaa3ca2f..9ad6b5157584b8 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -48,8 +48,8 @@ static inline int emergency_thaw_bdev(struct super_block 
> *sb)
>  /*
>   * buffer.c
>   */
> -extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned 
> len,
> - get_block_t *get_block, struct iomap *iomap);
> +int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
> + get_block_t *get_block, const struct iomap *iomap);
>  
>  /*
>   * char_dev.c
> -- 
> 2.30.2
> 



Re: [PATCH 26/27] fsdax: switch the fault handlers to use iomap_iter

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:19PM +0200, Christoph Hellwig wrote:
> Avoid the open coded calls to ->iomap_begin and ->iomap_end and call
> iomap_iter instead.
> 
> Signed-off-by: Christoph Hellwig 

Finally this nightmare is over...
Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/dax.c | 193 +--
>  1 file changed, 75 insertions(+), 118 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 6d0c6d28be83b1..118c9e2923f5f8 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1010,7 +1010,7 @@ static sector_t dax_iomap_sector(const struct iomap 
> *iomap, loff_t pos)
>   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
>  }
>  
> -static int dax_iomap_pfn(struct iomap *iomap, loff_t pos, size_t size,
> +static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
>pfn_t *pfnp)
>  {
>   const sector_t sector = dax_iomap_sector(iomap, pos);
> @@ -1068,7 +1068,7 @@ static vm_fault_t dax_load_hole(struct xa_state *xas,
>  
>  #ifdef CONFIG_FS_DAX_PMD
>  static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault 
> *vmf,
> - struct iomap *iomap, void **entry)
> + const struct iomap *iomap, void **entry)
>  {
>   struct address_space *mapping = vmf->vma->vm_file->f_mapping;
>   unsigned long pmd_addr = vmf->address & PMD_MASK;
> @@ -1120,7 +1120,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state 
> *xas, struct vm_fault *vmf,
>  }
>  #else
>  static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault 
> *vmf,
> - struct iomap *iomap, void **entry)
> + const struct iomap *iomap, void **entry)
>  {
>   return VM_FAULT_FALLBACK;
>  }
> @@ -1309,7 +1309,7 @@ static vm_fault_t dax_fault_return(int error)
>   * flushed on write-faults (non-cow), but not read-faults.
>   */
>  static bool dax_fault_is_synchronous(unsigned long flags,
> - struct vm_area_struct *vma, struct iomap *iomap)
> + struct vm_area_struct *vma, const struct iomap *iomap)
>  {
>   return (flags & IOMAP_WRITE) && (vma->vm_flags & VM_SYNC)
>   && (iomap->flags & IOMAP_F_DIRTY);
> @@ -1329,22 +1329,22 @@ static vm_fault_t dax_fault_synchronous_pfnp(pfn_t 
> *pfnp, pfn_t pfn)
>   return VM_FAULT_NEEDDSYNC;
>  }
>  
> -static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf, struct iomap 
> *iomap,
> - loff_t pos)
> +static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf,
> + const struct iomap_iter *iter)
>  {
> - sector_t sector = dax_iomap_sector(iomap, pos);
> + sector_t sector = dax_iomap_sector(>iomap, iter->pos);
>   unsigned long vaddr = vmf->address;
>   vm_fault_t ret;
>   int error = 0;
>  
> - switch (iomap->type) {
> + switch (iter->iomap.type) {
>   case IOMAP_HOLE:
>   case IOMAP_UNWRITTEN:
>   clear_user_highpage(vmf->cow_page, vaddr);
>   break;
>   case IOMAP_MAPPED:
> - error = copy_cow_page_dax(iomap->bdev, iomap->dax_dev, sector,
> -   vmf->cow_page, vaddr);
> + error = copy_cow_page_dax(iter->iomap.bdev, iter->iomap.dax_dev,
> +   sector, vmf->cow_page, vaddr);
>   break;
>   default:
>   WARN_ON_ONCE(1);
> @@ -1363,29 +1363,31 @@ static vm_fault_t dax_fault_cow_page(struct vm_fault 
> *vmf, struct iomap *iomap,
>  }
>  
>  /**
> - * dax_fault_actor - Common actor to handle pfn insertion in PTE/PMD fault.
> + * dax_fault_iter - Common actor to handle pfn insertion in PTE/PMD fault.
>   * @vmf: vm fault instance
> + * @iter:iomap iter
>   * @pfnp:pfn to be returned
>   * @xas: the dax mapping tree of a file
>   * @entry:   an unlocked dax entry to be inserted
>   * @pmd: distinguish whether it is a pmd fault
> - * @flags:   iomap flags
> - * @iomap:   from iomap_begin()
> - * @srcmap:  from iomap_begin(), not equal to iomap if it is a CoW
>   */
> -static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
> - struct xa_state *xas, void **entry, bool pmd,
> - unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
> +static vm_fault_t dax_fault_iter(struct vm_fault *vmf,
> + const struct iomap_iter *iter, pfn_t *pfnp,
> + struct xa_state *xas, void **entry, bool pmd)
>  {
>   struct address_space *mapping = vmf->vma->vm_file->f_mapping;
&g

Re: [PATCH 17/27] iomap: switch iomap_seek_hole to use iomap_iter

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:10PM +0200, Christoph Hellwig wrote:
> Rewrite iomap_seek_hole to use iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/iomap/seek.c | 46 +++---
>  1 file changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c
> index ce6fb810854fec..7d6ed9af925e96 100644
> --- a/fs/iomap/seek.c
> +++ b/fs/iomap/seek.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (C) 2017 Red Hat, Inc.
> - * Copyright (c) 2018 Christoph Hellwig.
> + * Copyright (c) 2018-2021 Christoph Hellwig.
>   */
>  #include 
>  #include 
> @@ -10,21 +10,19 @@
>  #include 
>  #include 
>  
> -static loff_t
> -iomap_seek_hole_actor(struct inode *inode, loff_t start, loff_t length,
> -   void *data, struct iomap *iomap, struct iomap *srcmap)
> +static loff_t iomap_seek_hole_iter(const struct iomap_iter *iter, loff_t 
> *pos)

/me wonders if @pos should be named hole_pos (here and in the caller) to
make it a little easier to read...

>  {
> - loff_t offset = start;
> + loff_t length = iomap_length(iter);
>  
> - switch (iomap->type) {
> + switch (iter->iomap.type) {
>   case IOMAP_UNWRITTEN:
> - offset = mapping_seek_hole_data(inode->i_mapping, start,
> - start + length, SEEK_HOLE);
> - if (offset == start + length)
> + *pos = mapping_seek_hole_data(iter->inode->i_mapping,
> + iter->pos, iter->pos + length, SEEK_HOLE);
> + if (*pos == iter->pos + length)
>   return length;
> - fallthrough;
> + return 0;
>   case IOMAP_HOLE:
> - *(loff_t *)data = offset;
> + *pos = iter->pos;
>   return 0;
>   default:
>   return length;
> @@ -35,23 +33,25 @@ loff_t
>  iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops 
> *ops)
>  {
>   loff_t size = i_size_read(inode);
> - loff_t ret;
> + struct iomap_iter iter = {
> + .inode  = inode,
> + .pos= offset,
> + .flags  = IOMAP_REPORT,
> + };
> + int ret;
>  
>   /* Nothing to be found before or beyond the end of the file. */
>   if (offset < 0 || offset >= size)
>   return -ENXIO;
>  
> - while (offset < size) {
> - ret = iomap_apply(inode, offset, size - offset, IOMAP_REPORT,
> -   ops, , iomap_seek_hole_actor);
> - if (ret < 0)
> - return ret;
> - if (ret == 0)
> - break;
> - offset += ret;
> - }
> -
> - return offset;
> + iter.len = size - offset;
> + while ((ret = iomap_iter(, ops)) > 0)
> + iter.processed = iomap_seek_hole_iter(, );
> + if (ret < 0)
> + return ret;
> + if (iter.len)
> + return offset;

...because what we're really saying here is that if seek_hole_iter found
a hole (and returned zero, thereby terminating the loop before iter.len
could reach zero), we want to return the position of the hole.

> + return size;

Not sure why we return size here...?  Oh, because there's an implicit
hole at EOF, so we return i_size.  Uh, does this do the right thing if
->iomap_begin returns posteof mappings?  I don't see anything in
iomap_iter_advance that would stop iteration at EOF.

--D

>  }
>  EXPORT_SYMBOL_GPL(iomap_seek_hole);
>  
> -- 
> 2.30.2
> 



Re: [PATCH 16/27] iomap: switch iomap_bmap to use iomap_iter

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:09PM +0200, Christoph Hellwig wrote:
> Rewrite the ->bmap implementation based on iomap_iter.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/iomap/fiemap.c | 31 +--
>  1 file changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c
> index acad09a8c188df..60daadba16c149 100644
> --- a/fs/iomap/fiemap.c
> +++ b/fs/iomap/fiemap.c
> @@ -92,35 +92,30 @@ int iomap_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fi,
>  }
>  EXPORT_SYMBOL_GPL(iomap_fiemap);
>  
> -static loff_t
> -iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length,
> - void *data, struct iomap *iomap, struct iomap *srcmap)
> -{
> - sector_t *bno = data, addr;
> -
> - if (iomap->type == IOMAP_MAPPED) {
> - addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits;
> - *bno = addr;
> - }
> - return 0;
> -}
> -
>  /* legacy ->bmap interface.  0 is the error return (!) */
>  sector_t
>  iomap_bmap(struct address_space *mapping, sector_t bno,
>   const struct iomap_ops *ops)
>  {
> - struct inode *inode = mapping->host;
> - loff_t pos = bno << inode->i_blkbits;
> - unsigned blocksize = i_blocksize(inode);
> + struct iomap_iter iter = {
> + .inode  = mapping->host,
> + .pos= (loff_t)bno << mapping->host->i_blkbits,
> + .len= i_blocksize(mapping->host),
> + .flags  = IOMAP_REPORT,
> + };
>   int ret;
>  
>   if (filemap_write_and_wait(mapping))
>   return 0;
>  
>   bno = 0;
> - ret = iomap_apply(inode, pos, blocksize, 0, ops, ,
> -   iomap_bmap_actor);
> + while ((ret = iomap_iter(, ops)) > 0) {
> + if (iter.iomap.type != IOMAP_MAPPED)
> + continue;

There isn't a mapped extent, so return 0 here, right?

--D

> + bno = (iter.pos - iter.iomap.offset + iter.iomap.addr) >>
> + mapping->host->i_blkbits;
> + }
> +
>   if (ret)
>   return 0;
>   return bno;
> -- 
> 2.30.2
> 



Re: [PATCH 08/27] iomap: add the new iomap_iter model

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:35:01PM +0200, Christoph Hellwig wrote:
> The iomap_iter struct provides a convenient way to package up and
> maintain all the arguments to the various mapping and operation
> functions.  It is operated on using the iomap_iter() function that
> is called in loop until the whole range has been processed.  Compared
> to the existing iomap_apply() function this avoid an indirect call
> for each iteration.
> 
> For now iomap_iter() calls back into the existing ->iomap_begin and
> ->iomap_end methods, but in the future this could be further optimized
> to avoid indirect calls entirely.
> 
> Based on an earlier patch from Matthew Wilcox .
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/iomap/Makefile |  1 +
>  fs/iomap/iter.c   | 74 +++
>  fs/iomap/trace.h  | 37 +-
>  include/linux/iomap.h | 56 
>  4 files changed, 167 insertions(+), 1 deletion(-)
>  create mode 100644 fs/iomap/iter.c
> 
> diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
> index eef2722d93a183..85034deb5a2f19 100644
> --- a/fs/iomap/Makefile
> +++ b/fs/iomap/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_FS_IOMAP)  += iomap.o
>  
>  iomap-y  += trace.o \
>  apply.o \
> +iter.o \
>  buffered-io.o \
>  direct-io.o \
>  fiemap.o \
> diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
> new file mode 100644
> index 00..b21e2489700b7c
> --- /dev/null
> +++ b/fs/iomap/iter.c
> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2021 Christoph Hellwig.
> + */
> +#include 
> +#include 
> +#include "trace.h"
> +
> +static inline int iomap_iter_advance(struct iomap_iter *iter)
> +{
> + /* handle the previous iteration (if any) */
> + if (iter->iomap.length) {
> + if (iter->processed <= 0)
> + return iter->processed;

Hmm, converting ssize_t to int here... I suppose that's fine since we're
merely returning "the usual negative errno code", but read on.

> + WARN_ON_ONCE(iter->processed > iomap_length(iter));
> + iter->pos += iter->processed;
> + iter->len -= iter->processed;
> + if (!iter->len)
> + return 0;
> + }
> +
> + /* clear the state for the next iteration */
> + iter->processed = 0;
> + memset(>iomap, 0, sizeof(iter->iomap));
> + memset(>srcmap, 0, sizeof(iter->srcmap));
> + return 1;
> +}
> +
> +static inline void iomap_iter_done(struct iomap_iter *iter)
> +{
> + WARN_ON_ONCE(iter->iomap.offset > iter->pos);
> + WARN_ON_ONCE(iter->iomap.length == 0);
> + WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos);
> +
> + trace_iomap_iter_dstmap(iter->inode, >iomap);
> + if (iter->srcmap.type != IOMAP_HOLE)
> + trace_iomap_iter_srcmap(iter->inode, >srcmap);
> +}
> +
> +/**
> + * iomap_iter - iterate over a ranges in a file
> + * @iter: iteration structue
> + * @ops: iomap ops provided by the file system
> + *
> + * Iterate over file system provided contiguous ranges of blocks with the 
> same
> + * state.  Should be called in a loop that continues as long as this function
> + * returns a positive value.  If 0 or a negative value is returned the caller
> + * should break out of the loop - a negative value is an error either from 
> the
> + * file system or from the last iteration stored in @iter.copied.
> + */
> +int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
> +{
> + int ret;
> +
> + if (iter->iomap.length && ops->iomap_end) {
> + ret = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
> + iter->processed > 0 ? iter->processed : 0,
> + iter->flags, >iomap);
> + if (ret < 0 && !iter->processed)
> + return ret;
> + }
> +
> + trace_iomap_iter(iter, ops, _RET_IP_);
> + ret = iomap_iter_advance(iter);
> + if (ret <= 0)
> + return ret;
> +
> + ret = ops->iomap_begin(iter->inode, iter->pos, iter->len, iter->flags,
> +>iomap, >srcmap);
> + if (ret < 0)
> + return ret;
> + iomap_iter_done(iter);
> + return 1;
> +}



> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index f9c36df6a3061b..a9f3f736017989 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -143,6 +143,62 @@ struct iomap_ops {
>   ssize_t written, unsigned flags, struct iomap *iomap);
>  };
>  
> +/**
> + * struct iomap_iter - Iterate through a range of a file
> + * @inode: Set at the start of the iteration and should not change.
> + * @pos: The current file position we are operating on.  It is 

Re: [PATCH 03/27] iomap: mark the iomap argument to iomap_sector const

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:34:56PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 

/me wonders, does this have any significant effect on the generated
code?

It's probably a good idea to feed the optimizer as much usage info as we
can, though I would imagine that for such a simple function it can
probably tell that we don't change *iomap.

IMHO, constifiying functions is a good way to signal to /programmers/
that they're not intended to touch the arguments, so

Reviewed-by: Darrick J. Wong 

--D

> ---
>  include/linux/iomap.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 093519d91cc9cc..f9c36df6a3061b 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -91,8 +91,7 @@ struct iomap {
>   const struct iomap_page_ops *page_ops;
>  };
>  
> -static inline sector_t
> -iomap_sector(struct iomap *iomap, loff_t pos)
> +static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
>  {
>   return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
>  }
> -- 
> 2.30.2
> 



Re: [PATCH 02/27] iomap: remove the iomap arguments to ->page_{prepare,done}

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:34:55PM +0200, Christoph Hellwig wrote:
> These aren't actually used by the only instance implementing the methods.
> 
> Signed-off-by: Christoph Hellwig 

/me finds it kind of amusing that we still don't have any ->page_prepare
use cases for actually passing the page in, but if nobody /else/ has any
objection or imminently wants to use the iomap argument, then...

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/gfs2/bmap.c | 5 ++---
>  fs/iomap/buffered-io.c | 6 +++---
>  include/linux/iomap.h  | 5 ++---
>  3 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index ed8b67b2171817..5414c2c3358092 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1002,7 +1002,7 @@ static void gfs2_write_unlock(struct inode *inode)
>  }
>  
>  static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
> -unsigned len, struct iomap *iomap)
> +unsigned len)
>  {
>   unsigned int blockmask = i_blocksize(inode) - 1;
>   struct gfs2_sbd *sdp = GFS2_SB(inode);
> @@ -1013,8 +1013,7 @@ static int gfs2_iomap_page_prepare(struct inode *inode, 
> loff_t pos,
>  }
>  
>  static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
> -  unsigned copied, struct page *page,
> -  struct iomap *iomap)
> +  unsigned copied, struct page *page)
>  {
>   struct gfs2_trans *tr = current->journal_info;
>   struct gfs2_inode *ip = GFS2_I(inode);
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 87ccb3438becd9..75310f6fcf8401 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -605,7 +605,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, 
> unsigned len, unsigned flags,
>   return -EINTR;
>  
>   if (page_ops && page_ops->page_prepare) {
> - status = page_ops->page_prepare(inode, pos, len, iomap);
> + status = page_ops->page_prepare(inode, pos, len);
>   if (status)
>   return status;
>   }
> @@ -638,7 +638,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, 
> unsigned len, unsigned flags,
>  
>  out_no_page:
>   if (page_ops && page_ops->page_done)
> - page_ops->page_done(inode, pos, 0, NULL, iomap);
> + page_ops->page_done(inode, pos, 0, NULL);
>   return status;
>  }
>  
> @@ -714,7 +714,7 @@ static size_t iomap_write_end(struct inode *inode, loff_t 
> pos, size_t len,
>   if (old_size < pos)
>   pagecache_isize_extended(inode, old_size, pos);
>   if (page_ops && page_ops->page_done)
> - page_ops->page_done(inode, pos, ret, page, iomap);
> + page_ops->page_done(inode, pos, ret, page);
>   put_page(page);
>  
>   if (ret < len)
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 479c1da3e2211e..093519d91cc9cc 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -108,10 +108,9 @@ iomap_sector(struct iomap *iomap, loff_t pos)
>   * associated page could not be obtained.
>   */
>  struct iomap_page_ops {
> - int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
> - struct iomap *iomap);
> + int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
>   void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
> - struct page *page, struct iomap *iomap);
> + struct page *page);
>  };
>  
>  /*
> -- 
> 2.30.2
> 



Re: [PATCH 01/27] iomap: fix a trivial comment typo in trace.h

2021-07-19 Thread Darrick J. Wong
On Mon, Jul 19, 2021 at 12:34:54PM +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Darrick J. Wong 

--D

> ---
>  fs/iomap/trace.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index fdc7ae388476f5..e9cd5cc0d6ba40 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -2,7 +2,7 @@
>  /*
>   * Copyright (c) 2009-2019 Christoph Hellwig
>   *
> - * NOTE: none of these tracepoints shall be consider a stable kernel ABI
> + * NOTE: none of these tracepoints shall be considered a stable kernel ABI
>   * as they can change at any time.
>   */
>  #undef TRACE_SYSTEM
> -- 
> 2.30.2
> 



Re: [RFC PATCH v1.1 2/2] erofs: dax support for non-tailpacking regular file

2021-07-08 Thread Darrick J. Wong
On Mon, Jul 05, 2021 at 09:21:53PM +0800, Gao Xiang wrote:
> DAX is quite useful for some VM use cases in order to save guest
> memory extremely with minimal lightweight EROFS.
> 
> In order to prepare for such use cases, add preliminary dax support
> for non-tailpacking regular files for now.
> 
> Tested with the DRAM-emulated PMEM and the EROFS image generated by
> "mkfs.erofs -Enoinline_data enwik9.fsdax.img enwik9"
> 
> Cc: nvdimm@lists.linux.dev
> Cc: linux-fsde...@vger.kernel.org
> Signed-off-by: Gao Xiang 
> ---
> change since v1:
>  - update missing hunks due to patch spliting...
> bdev_dax_supported(...)
> erofs_file_mmap(...)   
> 
>  fs/erofs/data.c | 43 +--
>  fs/erofs/inode.c|  5 +
>  fs/erofs/internal.h |  2 ++
>  fs/erofs/super.c| 26 --
>  4 files changed, 72 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 0f82b4cb474c..c188c629be45 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -6,7 +6,7 @@
>  #include "internal.h"
>  #include 
>  #include 
> -
> +#include 
>  #include 
>  
>  static void erofs_readendio(struct bio *bio)
> @@ -323,6 +323,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t 
> offset, loff_t length,
>   return ret;
>  
>   iomap->bdev = inode->i_sb->s_bdev;
> + iomap->dax_dev = EROFS_I_SB(inode)->dax_dev;
>   iomap->offset = map.m_la;
>   iomap->length = map.m_llen;
>  
> @@ -382,6 +383,11 @@ static ssize_t erofs_file_read_iter(struct kiocb *iocb, 
> struct iov_iter *to)
>   if (!iov_iter_count(to))
>   return 0;
>  
> +#ifdef CONFIG_FS_DAX
> + if (IS_DAX(iocb->ki_filp->f_mapping->host))
> + return dax_iomap_rw(iocb, to, _iomap_ops);
> +#endif
> +
>   if (iocb->ki_flags & IOCB_DIRECT) {
>   int err = erofs_prepare_dio(iocb, to);
>  
> @@ -410,9 +416,42 @@ const struct address_space_operations 
> erofs_raw_access_aops = {
>   .direct_IO = noop_direct_IO,
>  };
>  
> +#ifdef CONFIG_FS_DAX
> +static vm_fault_t erofs_dax_huge_fault(struct vm_fault *vmf,
> + enum page_entry_size pe_size)
> +{
> + return dax_iomap_fault(vmf, pe_size, NULL, NULL, _iomap_ops);
> +}
> +
> +static vm_fault_t erofs_dax_fault(struct vm_fault *vmf)
> +{
> + return erofs_dax_huge_fault(vmf, PE_SIZE_PTE);
> +}
> +
> +static const struct vm_operations_struct erofs_dax_vm_ops = {
> + .fault  = erofs_dax_fault,
> + .huge_fault = erofs_dax_huge_fault,
> +};
> +
> +static int erofs_file_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + if (!IS_DAX(file_inode(file)))
> + return generic_file_readonly_mmap(file, vma);
> +
> + if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
> + return -EINVAL;
> +
> + vma->vm_ops = _dax_vm_ops;
> + vma->vm_flags |= VM_HUGEPAGE;
> + return 0;
> +}
> +#else
> +#define erofs_file_mmap  generic_file_readonly_mmap
> +#endif
> +
>  const struct file_operations erofs_file_fops = {
>   .llseek = generic_file_llseek,
>   .read_iter  = erofs_file_read_iter,
> - .mmap   = generic_file_readonly_mmap,
> + .mmap   = erofs_file_mmap,
>   .splice_read= generic_file_splice_read,
>  };
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index 00edb7562fea..695b97acb9a6 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -174,6 +174,11 @@ static struct page *erofs_read_inode(struct inode *inode,
>   inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec;
>   inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec;
>  
> + inode->i_flags &= ~S_DAX;
> + if (test_opt(>ctx, DAX) && S_ISREG(inode->i_mode) &&
> + vi->datalayout == EROFS_INODE_FLAT_PLAIN)
> + inode->i_flags |= S_DAX;
> +
>   if (!nblks)
>   /* measure inode.i_blocks as generic filesystems */
>   inode->i_blocks = roundup(inode->i_size, EROFS_BLKSIZ) >> 9;
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index 2669c785d548..8b0542d35148 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -83,6 +83,7 @@ struct erofs_sb_info {
>  
>   struct erofs_sb_lz4_info lz4;
>  #endif   /* CONFIG_EROFS_FS_ZIP */
> + struct dax_device *dax_dev;
>   u32 blocks;
>   u32 meta_blkaddr;
>  #ifdef CONFIG_EROFS_FS_XATTR
> @@ -115,6 +116,7 @@ struct erofs_sb_info {
>  /* Mount flags set via mount options or defaults */
>  #define EROFS_MOUNT_XATTR_USER   0x0010
>  #define EROFS_MOUNT_POSIX_ACL0x0020
> +#define EROFS_MOUNT_DAX  0x0040
>  
>  #define clear_opt(ctx, option)   ((ctx)->mount_opt &= 
> ~EROFS_MOUNT_##option)
>  #define set_opt(ctx, option) ((ctx)->mount_opt |= EROFS_MOUNT_##option)
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 8fc6c04b54f4..b44a964ab24f 100644
> --- 

Re: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write() path

2021-06-29 Thread Darrick J. Wong
On Tue, Jun 29, 2021 at 11:25:37AM +, ruansy.f...@fujitsu.com wrote:
> 
> 
> > -Original Message-
> > From: Darrick J. Wong 
> > Subject: Re: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write() path
> > 
> > On Mon, Jun 28, 2021 at 02:55:03AM +, ruansy.f...@fujitsu.com wrote:
> > > > -Original Message-
> > > > Subject: Re: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write()
> > > > path
> > > >
> > > > On Thu, Jun 24, 2021 at 08:49:17AM +, ruansy.f...@fujitsu.com wrote:
> > > > > Hi Darrick,
> > > > >
> > > > > Do you have any comment on this?
> > > >
> > > > Sorry, was on vacation.
> > > >
> > > > > Thanks,
> > > > > Ruan.
> > > > >
> > > > > > -Original Message-
> > > > > > From: Shiyang Ruan 
> > > > > > Subject: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write()
> > > > > > path
> > > > > >
> > > > > > Hi Darrick,
> > > > > >
> > > > > > Since other patches looks good, I post this RFC patch singly to
> > > > > > hot-fix the problem in xfs_dax_write_iomap_ops->iomap_end() of
> > > > > > v6 that the error code was ingored. I will split this in two
> > > > > > patches(changes in iomap and xfs
> > > > > > respectively) in next formal version if it looks ok.
> > > > > >
> > > > > > 
> > > > > >
> > > > > > Introduce a new interface called "iomap_post_actor()" in iomap_ops.
> > > > > > And call it between ->actor() and ->iomap_end().  It is mean to
> > > > > > handle the error code returned from ->actor().  In this
> > > > > > patchset, it is used to remap or cancel the CoW extents according 
> > > > > > to the
> > error code.
> > > > > >
> > > > > > Signed-off-by: Shiyang Ruan 
> > > > > > ---
> > > > > >  fs/dax.c   | 27 ++-
> > > > > >  fs/iomap/apply.c   |  4 
> > > > > >  fs/xfs/xfs_bmap_util.c |  3 +--
> > > > > >  fs/xfs/xfs_file.c  |  5 +++--
> > > > > >  fs/xfs/xfs_iomap.c | 33 -
> > > > > >  fs/xfs/xfs_iomap.h | 24 
> > > > > >  fs/xfs/xfs_iops.c  |  7 +++
> > > > > >  fs/xfs/xfs_reflink.c   |  3 +--
> > > > > >  include/linux/iomap.h  |  8 
> > > > > >  9 files changed, 94 insertions(+), 20 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/dax.c b/fs/dax.c index
> > > > > > 93f16210847b..0740c2610b6f 100644
> > > > > > --- a/fs/dax.c
> > > > > > +++ b/fs/dax.c
> > > > > > @@ -1537,7 +1537,7 @@ static vm_fault_t
> > > > > > dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > > > > > struct iomap iomap = { .type = IOMAP_HOLE };
> > > > > > struct iomap srcmap = { .type = IOMAP_HOLE };
> > > > > > unsigned flags = IOMAP_FAULT;
> > > > > > -   int error;
> > > > > > +   int error, copied = PAGE_SIZE;
> > > > > > bool write = vmf->flags & FAULT_FLAG_WRITE;
> > > > > > vm_fault_t ret = 0, major = 0;
> > > > > > void *entry;
> > > > > > @@ -1598,7 +1598,7 @@ static vm_fault_t
> > > > > > dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
> > > > > > ret = dax_fault_actor(vmf, pfnp, , , false, flags,
> > > > > >   , );
> > > > > > if (ret == VM_FAULT_SIGBUS)
> > > > > > -   goto finish_iomap;
> > > > > > +   goto finish_iomap_actor;
> > > > > >
> > > > > > /* read/write MAPPED, CoW UNWRITTEN */
> > > > > > if (iomap.flags & IOMAP_F_NEW) { @@ -1607,10 +1607,16 @@
> > > > > > static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf,
> > > > > > pfn_t *pfnp,
> > > > > > major = VM_FAULT_MAJOR;
> > > > > > }
> > > > > >
> > > > > > + f

Re: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write() path

2021-06-27 Thread Darrick J. Wong
On Mon, Jun 28, 2021 at 02:55:03AM +, ruansy.f...@fujitsu.com wrote:
> > -Original Message-
> > Subject: Re: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write() path
> > 
> > On Thu, Jun 24, 2021 at 08:49:17AM +, ruansy.f...@fujitsu.com wrote:
> > > Hi Darrick,
> > >
> > > Do you have any comment on this?
> > 
> > Sorry, was on vacation.
> > 
> > > Thanks,
> > > Ruan.
> > >
> > > > -Original Message-
> > > > From: Shiyang Ruan 
> > > > Subject: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write() path
> > > >
> > > > Hi Darrick,
> > > >
> > > > Since other patches looks good, I post this RFC patch singly to
> > > > hot-fix the problem in xfs_dax_write_iomap_ops->iomap_end() of v6
> > > > that the error code was ingored. I will split this in two
> > > > patches(changes in iomap and xfs
> > > > respectively) in next formal version if it looks ok.
> > > >
> > > > 
> > > >
> > > > Introduce a new interface called "iomap_post_actor()" in iomap_ops.
> > > > And call it between ->actor() and ->iomap_end().  It is mean to
> > > > handle the error code returned from ->actor().  In this patchset, it
> > > > is used to remap or cancel the CoW extents according to the error code.
> > > >
> > > > Signed-off-by: Shiyang Ruan 
> > > > ---
> > > >  fs/dax.c   | 27 ++-
> > > >  fs/iomap/apply.c   |  4 
> > > >  fs/xfs/xfs_bmap_util.c |  3 +--
> > > >  fs/xfs/xfs_file.c  |  5 +++--
> > > >  fs/xfs/xfs_iomap.c | 33 -
> > > >  fs/xfs/xfs_iomap.h | 24 
> > > >  fs/xfs/xfs_iops.c  |  7 +++
> > > >  fs/xfs/xfs_reflink.c   |  3 +--
> > > >  include/linux/iomap.h  |  8 
> > > >  9 files changed, 94 insertions(+), 20 deletions(-)
> > > >
> > > > diff --git a/fs/dax.c b/fs/dax.c
> > > > index 93f16210847b..0740c2610b6f 100644
> > > > --- a/fs/dax.c
> > > > +++ b/fs/dax.c
> > > > @@ -1537,7 +1537,7 @@ static vm_fault_t dax_iomap_pte_fault(struct
> > > > vm_fault *vmf, pfn_t *pfnp,
> > > > struct iomap iomap = { .type = IOMAP_HOLE };
> > > > struct iomap srcmap = { .type = IOMAP_HOLE };
> > > > unsigned flags = IOMAP_FAULT;
> > > > -   int error;
> > > > +   int error, copied = PAGE_SIZE;
> > > > bool write = vmf->flags & FAULT_FLAG_WRITE;
> > > > vm_fault_t ret = 0, major = 0;
> > > > void *entry;
> > > > @@ -1598,7 +1598,7 @@ static vm_fault_t dax_iomap_pte_fault(struct
> > > > vm_fault *vmf, pfn_t *pfnp,
> > > > ret = dax_fault_actor(vmf, pfnp, , , false, flags,
> > > >   , );
> > > > if (ret == VM_FAULT_SIGBUS)
> > > > -   goto finish_iomap;
> > > > +   goto finish_iomap_actor;
> > > >
> > > > /* read/write MAPPED, CoW UNWRITTEN */
> > > > if (iomap.flags & IOMAP_F_NEW) {
> > > > @@ -1607,10 +1607,16 @@ static vm_fault_t dax_iomap_pte_fault(struct
> > > > vm_fault *vmf, pfn_t *pfnp,
> > > > major = VM_FAULT_MAJOR;
> > > > }
> > > >
> > > > + finish_iomap_actor:
> > > > +   if (ops->iomap_post_actor) {
> > > > +   if (ret & VM_FAULT_ERROR)
> > > > +   copied = 0;
> > > > +   ops->iomap_post_actor(inode, pos, PMD_SIZE, copied, 
> > > > flags,
> > > > + , );
> > > > +   }
> > > > +
> > > >  finish_iomap:
> > > > if (ops->iomap_end) {
> > > > -   int copied = PAGE_SIZE;
> > > > -
> > > > if (ret & VM_FAULT_ERROR)
> > > > copied = 0;
> > > > /*
> > > > @@ -1677,7 +1683,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> > > > vm_fault *vmf, pfn_t *pfnp,
> > > > pgoff_t max_pgoff;
> > > > void *entry;
> > > > loff_t pos;
> > > > -   int error;
> > > > +   int error, copied = PMD_SIZE;
> > > >
> > > > /*
> > > >  * Check whether offset isn't beyond end of file now. Caller is 
> > > > @@
> > > > -1736,12
> > > > +1742,15 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault
> > > > +*vmf,
> > > > pfn_t *pfnp,
> > > > ret = dax_fault_actor(vmf, pfnp, , , true, flags,
> > > >   , );
> > > >
> > > > +   if (ret == VM_FAULT_FALLBACK)
> > > > +   copied = 0;
> > > > +   if (ops->iomap_post_actor) {
> > > > +   ops->iomap_post_actor(inode, pos, PMD_SIZE, copied, 
> > > > flags,
> > > > + , );
> > > > +   }
> > > > +
> > > >  finish_iomap:
> > > > if (ops->iomap_end) {
> > > > -   int copied = PMD_SIZE;
> > > > -
> > > > -   if (ret == VM_FAULT_FALLBACK)
> > > > -   copied = 0;
> > > > /*
> > > >  * The fault is done by now and there's no way back 
> > > > (other
> > > >  * thread may be already happily using PMD 

Re: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write() path

2021-06-25 Thread Darrick J. Wong
On Thu, Jun 24, 2021 at 08:49:17AM +, ruansy.f...@fujitsu.com wrote:
> Hi Darrick,
> 
> Do you have any comment on this?

Sorry, was on vacation.

> Thanks,
> Ruan.
> 
> > -Original Message-
> > From: Shiyang Ruan 
> > Subject: [PATCH v6.1 6/7] fs/xfs: Handle CoW for fsdax write() path
> > 
> > Hi Darrick,
> > 
> > Since other patches looks good, I post this RFC patch singly to hot-fix the
> > problem in xfs_dax_write_iomap_ops->iomap_end() of v6 that the error code
> > was ingored. I will split this in two patches(changes in iomap and xfs
> > respectively) in next formal version if it looks ok.
> > 
> > 
> > 
> > Introduce a new interface called "iomap_post_actor()" in iomap_ops.  And 
> > call
> > it between ->actor() and ->iomap_end().  It is mean to handle the error code
> > returned from ->actor().  In this patchset, it is used to remap or cancel 
> > the
> > CoW extents according to the error code.
> > 
> > Signed-off-by: Shiyang Ruan 
> > ---
> >  fs/dax.c   | 27 ++-
> >  fs/iomap/apply.c   |  4 
> >  fs/xfs/xfs_bmap_util.c |  3 +--
> >  fs/xfs/xfs_file.c  |  5 +++--
> >  fs/xfs/xfs_iomap.c | 33 -
> >  fs/xfs/xfs_iomap.h | 24 
> >  fs/xfs/xfs_iops.c  |  7 +++
> >  fs/xfs/xfs_reflink.c   |  3 +--
> >  include/linux/iomap.h  |  8 
> >  9 files changed, 94 insertions(+), 20 deletions(-)
> > 
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 93f16210847b..0740c2610b6f 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1537,7 +1537,7 @@ static vm_fault_t dax_iomap_pte_fault(struct
> > vm_fault *vmf, pfn_t *pfnp,
> > struct iomap iomap = { .type = IOMAP_HOLE };
> > struct iomap srcmap = { .type = IOMAP_HOLE };
> > unsigned flags = IOMAP_FAULT;
> > -   int error;
> > +   int error, copied = PAGE_SIZE;
> > bool write = vmf->flags & FAULT_FLAG_WRITE;
> > vm_fault_t ret = 0, major = 0;
> > void *entry;
> > @@ -1598,7 +1598,7 @@ static vm_fault_t dax_iomap_pte_fault(struct
> > vm_fault *vmf, pfn_t *pfnp,
> > ret = dax_fault_actor(vmf, pfnp, , , false, flags,
> >   , );
> > if (ret == VM_FAULT_SIGBUS)
> > -   goto finish_iomap;
> > +   goto finish_iomap_actor;
> > 
> > /* read/write MAPPED, CoW UNWRITTEN */
> > if (iomap.flags & IOMAP_F_NEW) {
> > @@ -1607,10 +1607,16 @@ static vm_fault_t dax_iomap_pte_fault(struct
> > vm_fault *vmf, pfn_t *pfnp,
> > major = VM_FAULT_MAJOR;
> > }
> > 
> > + finish_iomap_actor:
> > +   if (ops->iomap_post_actor) {
> > +   if (ret & VM_FAULT_ERROR)
> > +   copied = 0;
> > +   ops->iomap_post_actor(inode, pos, PMD_SIZE, copied, flags,
> > + , );
> > +   }
> > +
> >  finish_iomap:
> > if (ops->iomap_end) {
> > -   int copied = PAGE_SIZE;
> > -
> > if (ret & VM_FAULT_ERROR)
> > copied = 0;
> > /*
> > @@ -1677,7 +1683,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct
> > vm_fault *vmf, pfn_t *pfnp,
> > pgoff_t max_pgoff;
> > void *entry;
> > loff_t pos;
> > -   int error;
> > +   int error, copied = PMD_SIZE;
> > 
> > /*
> >  * Check whether offset isn't beyond end of file now. Caller is @@ 
> > -1736,12
> > +1742,15 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf,
> > pfn_t *pfnp,
> > ret = dax_fault_actor(vmf, pfnp, , , true, flags,
> >   , );
> > 
> > +   if (ret == VM_FAULT_FALLBACK)
> > +   copied = 0;
> > +   if (ops->iomap_post_actor) {
> > +   ops->iomap_post_actor(inode, pos, PMD_SIZE, copied, flags,
> > + , );
> > +   }
> > +
> >  finish_iomap:
> > if (ops->iomap_end) {
> > -   int copied = PMD_SIZE;
> > -
> > -   if (ret == VM_FAULT_FALLBACK)
> > -   copied = 0;
> > /*
> >  * The fault is done by now and there's no way back (other
> >  * thread may be already happily using PMD we have installed).
> > diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c index
> > 0493da5286ad..26a54ded184f 100644
> > --- a/fs/iomap/apply.c
> > +++ b/fs/iomap/apply.c
> > @@ -84,6 +84,10 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t 
> > length,
> > unsigned flags,
> > written = actor(inode, pos, length, data, ,
> > srcmap.type != IOMAP_HOLE ?  : );
> > 
> > +   if (ops->iomap_post_actor) {
> > +   written = ops->iomap_post_actor(inode, pos, length, written,
> > +   flags, , );

How many operations actually need an iomap_post_actor?  It's just the
dax ones, right?  Which is ... iomap_truncate_page, iomap_zero_range,
dax_iomap_fault, and dax_iomap_rw, right?  We don't need a post_actor
for other iomap functionality (like FIEMAP, SEEK_DATA/SEEK_HOLE, etc.)
so adding a new function