Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write

2023-09-13 Thread Christoph Hellwig
> direct_write_fallback(): on error revert the ->ki_pos update from buffered 
> write

Al, Christian: can you send this fix on to Linus?



Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write

2023-08-28 Thread Christoph Hellwig
On Mon, Aug 28, 2023 at 02:56:15PM +0100, Al Viro wrote:
> The first failure exit does not need any work - the caller had not bumped
> ->ki_pos; the second one (after that 'if (err < 0) {' line) does and that's
> where the patch upthread adds iocb->ki_pos -= buffered_written.
> 
> Or am I completely misparsing what you've written?

No, I misread the patch.  Looks good:

Acked-by: Christoph Hellwig 
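
For readers following the two failure exits discussed above, here is a minimal
userspace sketch of the helper with the ->ki_pos revert applied. It is a model
under stated assumptions, not the kernel code: struct kiocb_model, the function
name and the sync_err parameter (standing in for the return value of
filemap_write_and_wait_range()) are invented for illustration.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct kiocb; only the file position matters. */
struct kiocb_model {
	long long ki_pos;
};

/*
 * Model of direct_write_fallback() with the revert discussed in this thread.
 * sync_err stands in for the result of filemap_write_and_wait_range().
 * Assumption: the buffered write already advanced ki_pos by buffered_written
 * when it succeeded, and left ki_pos alone when it failed.
 */
static ssize_t direct_write_fallback_model(struct kiocb_model *iocb,
		ssize_t direct_written, ssize_t buffered_written, int sync_err)
{
	assert(direct_written >= 0);

	/* First failure exit: ki_pos was never bumped, nothing to undo. */
	if (buffered_written < 0)
		return direct_written ? direct_written : buffered_written;

	/* Second failure exit: undo the buffered ki_pos update first. */
	if (sync_err < 0) {
		iocb->ki_pos -= buffered_written;
		return direct_written ? direct_written : sync_err;
	}

	return direct_written + buffered_written;
}
```

With a 4096-byte direct write followed by a 12288-byte buffered fallback whose
sync fails, the model returns 4096 and leaves the position at 4096, so the
return value and the position advance agree again.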



Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write

2023-08-28 Thread Christoph Hellwig
On Sun, Aug 27, 2023 at 08:41:22PM +0100, Al Viro wrote:
> That part is somewhat fishy - there's a case where you return a positive value
> and advance ->ki_pos by more than that amount.  I really wonder if all callers
> of ->write_iter() are OK with that.  Consider e.g. this:

This should not exist in the latest version merged by Jens.  Can you
check if you still see issues in the version in the block tree or
linux-next?

> Suppose ->write_iter() ends up returning a positive value smaller than
> the increment of kiocb.ki_pos.  What do we get?  ret is positive, so
> kiocb.ki_pos gets copied into *ppos, which is ksys_write's pos and there
> we copy it into file->f_pos.
> 
> Is it really OK to have write() return 4096 and advance the file position
> by 16K?  AFAICS, userland wouldn't get any indication of something
> odd going on - just a short write to a regular file, with followup write
> of remaining 12K getting quietly written in the range 16K..28K.
> 
> I don't remember what POSIX says about that, but it would qualify as
> nasty surprise for any userland program - sure, one can check fsync()
> results before closing the sucker and see if everything looks fine,
> but the way it's usually discussed could easily lead to assumption that
> (synchronous) O_DIRECT writes would not be affected by anything of that
> sort.

ki_pos should always be updated by the write return value.  Everything
else is a bug.
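
The invariant stated above (the file position advances by exactly the write
return value) can be checked from userland. The following is a minimal sketch
using only POSIX write() and lseek(); the helper name is made up for this
example and it only covers a plain synchronous write on a seekable file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if write() advanced the file offset by exactly its return
 * value, 0 if the offset moved by a different amount, -1 on any error.
 */
static int write_advances_offset_by_ret(int fd, const void *buf, size_t len)
{
	off_t before, after;
	ssize_t ret;

	assert(buf != NULL);

	before = lseek(fd, 0, SEEK_CUR);
	if (before < 0)
		return -1;

	ret = write(fd, buf, len);
	if (ret < 0)
		return -1;

	after = lseek(fd, 0, SEEK_CUR);
	if (after < 0)
		return -1;

	return (after - before) == (off_t)ret;
}
```

A result of 0 from this helper would correspond to the scenario described in
the quoted mail: a short write whose position advance does not match what the
caller was told.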



Re: [Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write

2023-08-28 Thread Christoph Hellwig
On Sun, Aug 27, 2023 at 10:45:18PM +0100, Al Viro wrote:
> IOW, I suspect that the right thing to do would be something along the lines
> of

The idea looks sensible to me, but we'll also need to do it for the
filemap_write_and_wait_range failure case.



Re: [Cluster-devel] allow building a kernel without buffer_heads

2023-07-21 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 09:51:30AM -0500, Bob Peterson wrote:
> Gfs2 still uses buffer_heads to manage the metadata being pushed through 
> its journals. We've been reducing our dependency on them but eliminating 
> them altogether is a large and daunting task. We can still work toward that 
> goal, but it will take time.

That's fine - gfs2 selects CONFIG_BUFFER_HEAD after this series and
will be perfectly fine.



Re: [Cluster-devel] [PATCH 16/17] block: use iomap for writes to block devices

2023-07-20 Thread Christoph Hellwig
On Wed, May 24, 2023 at 02:33:13PM +0100, Matthew Wilcox wrote:
> As you can see, do_page_cache_ra() does limit readahead to i_size.
> Is ractl->mapping->host the correct way to find the inode?  I always
> get confused.

As far as I can tell it is the right inode, the indirection through
file->f_mapping ensures it actually points to the backing inode.



Re: [Cluster-devel] [PATCH 16/17] block: use iomap for writes to block devices

2023-07-20 Thread Christoph Hellwig
On Fri, May 19, 2023 at 04:22:01PM +0200, Hannes Reinecke wrote:
> I'm hitting this during booting:
> [5.016324]  
> [5.030256]  iomap_iter+0x11a/0x350
> [5.030264]  iomap_readahead+0x1eb/0x2c0
> [5.030272]  read_pages+0x5d/0x220
> [5.030279]  page_cache_ra_unbounded+0x131/0x180
> [5.030284]  filemap_get_pages+0xff/0x5a0
> [5.030292]  filemap_read+0xca/0x320
> [5.030296]  ? aa_file_perm+0x126/0x500
> [5.040216]  ? touch_atime+0xc8/0x150
> [5.040224]  blkdev_read_iter+0xb0/0x150
> [5.040228]  vfs_read+0x226/0x2d0
> [5.040234]  ksys_read+0xa5/0xe0
> [5.040238]  do_syscall_64+0x5b/0x80
>
> Maybe we should consider this patch:

As willy said this should be taken care of by the i_size check.
Did you run with just this patch set or some of the large block
size experiments on top which might change the variables?

I'll repost the series today without any changes in the area, and
if you can reproduce it with just that series we need to root
cause it, so please send your kernel and VM config along for the
next report.



Re: [Cluster-devel] [LTP] [linus:master] [iomap] 219580eea1: ltp.writev07.fail

2023-07-13 Thread Christoph Hellwig
On Thu, Jul 13, 2023 at 05:34:55PM +0200, Cyril Hrubis wrote:
> iter.processed = iomap_write_iter(&iter, i);
> 
> +   iocb->ki_pos += iter.pos - iocb->ki_pos;
> +
> if (unlikely(ret < 0))
> return ret;
> -   ret = iter.pos - iocb->ki_pos;
> -   iocb->ki_pos += ret;
> -   return ret;
> +
> +   return iter.pos - iocb->ki_pos;

I don't think this works, as iocb->ki_pos has been updated above.
What you want is probably the version below.  But so far I can't
reproduce anything yet..

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index adb92cdb24b009..02aea0174ddbcf 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -872,7 +872,7 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
 
-   if (unlikely(ret < 0))
+   if (iter.pos == iocb->ki_pos)
return ret;
ret = iter.pos - iocb->ki_pos;
iocb->ki_pos += ret;
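
To see why the quoted variant cannot work and what the diff above returns
instead, here is a small userspace model of the tail of
iomap_file_buffered_write(). The function names and the plain long long
position are assumptions made for this sketch; iter_pos plays the role of
iter.pos and ret the final iomap_iter() result.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Quoted variant: ki_pos is bumped first, so the later subtraction
 * iter_pos - *ki_pos is always 0 and the byte count is lost.
 */
static ssize_t tail_quoted_variant(long long *ki_pos, long long iter_pos,
				   int ret)
{
	*ki_pos += iter_pos - *ki_pos;
	if (ret < 0)
		return ret;
	return iter_pos - *ki_pos;
}

/*
 * Variant from the diff: return the error only when no progress was made,
 * otherwise report the bytes written and advance ki_pos by that amount.
 */
static ssize_t tail_fixed_variant(long long *ki_pos, long long iter_pos,
				  int ret)
{
	ssize_t written;

	if (iter_pos == *ki_pos)
		return ret;
	written = iter_pos - *ki_pos;
	*ki_pos += written;
	return written;
}
```

In the model, a write that made 4096 bytes of progress before hitting an error
is reported as a short write of 4096 by the fixed variant, while the error is
returned only when nothing was written.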



[Cluster-devel] [PATCH] gfs2: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method

2023-06-11 Thread Christoph Hellwig
Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag") file
systems can just set the FMODE_CAN_ODIRECT flag at open time instead of
wiring up a dummy direct_IO method to indicate support for direct I/O.

Remove .direct_IO from gfs2_aops, and set FMODE_CAN_ODIRECT in
gfs2_open_common for regular files that do not use data journalling.

Signed-off-by: Christoph Hellwig 
---
 fs/gfs2/aops.c | 1 -
 fs/gfs2/file.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a5f4be6b9213ed..d95125714ebb38 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -750,7 +750,6 @@ static const struct address_space_operations gfs2_aops = {
.release_folio = iomap_release_folio,
.invalidate_folio = iomap_invalidate_folio,
.bmap = gfs2_bmap,
-   .direct_IO = noop_direct_IO,
.migrate_folio = filemap_migrate_folio,
.is_partially_uptodate = iomap_is_partially_uptodate,
.error_remove_page = generic_error_remove_page,
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 300844f50dcd28..dcb2b7dd2269cf 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -630,6 +630,9 @@ int gfs2_open_common(struct inode *inode, struct file *file)
ret = generic_file_open(inode, file);
if (ret)
return ret;
+
+   if (!gfs2_is_jdata(GFS2_I(inode)))
+   file->f_mode |= FMODE_CAN_ODIRECT;
}
 
fp = kzalloc(sizeof(struct gfs2_file), GFP_NOFS);
-- 
2.39.2
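
As a rough illustration of the effect of this patch, the sketch below models
the open-time decision in userspace. Everything in it is hypothetical: the
MODEL_* flag values, struct file_model and both function names are invented,
and the VFS-side check is an assumption about how an O_DIRECT open is refused
when the capability flag is missing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only. */
#define MODEL_O_DIRECT		0x1u
#define MODEL_FMODE_CAN_ODIRECT	0x2u

struct file_model {
	unsigned int f_flags;	/* flags requested by the opener */
	unsigned int f_mode;	/* capabilities set by the filesystem */
};

/* Models the new hunk: only files without data journalling get the flag. */
static void gfs2_open_common_model(struct file_model *file, bool is_jdata)
{
	if (!is_jdata)
		file->f_mode |= MODEL_FMODE_CAN_ODIRECT;
}

/* Assumed VFS-side check: O_DIRECT without the capability is refused. */
static int check_odirect_model(const struct file_model *file)
{
	if ((file->f_flags & MODEL_O_DIRECT) &&
	    !(file->f_mode & MODEL_FMODE_CAN_ODIRECT))
		return -EINVAL;
	return 0;
}
```

Under these assumptions an O_DIRECT open of a jdata file fails with -EINVAL,
while regular non-jdata files and opens without O_DIRECT are unaffected.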



Re: [Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper

2023-06-06 Thread Christoph Hellwig
On Mon, Jun 05, 2023 at 05:04:14PM -0700, Darrick J. Wong wrote:
> On Thu, Jun 01, 2023 at 04:59:01PM +0200, Christoph Hellwig wrote:
> > Add a helper dealing with handling the syncing of a buffered write fallback
> > for direct I/O.
> > 
> > Signed-off-by: Christoph Hellwig 
> > Reviewed-by: Damien Le Moal 
> > Reviewed-by: Miklos Szeredi 
> 
> Looks good to me; whose tree do you want this to go through?

Andrew has already picked them up.



[Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback

2023-06-01 Thread Christoph Hellwig
Use the generic direct_write_fallback helper instead of duplicating the
logic.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
 fs/fuse/file.c | 24 ++--
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b4e272a65fdd25..3a7c7d7181ccb9 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1340,7 +1340,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
ssize_t written = 0;
-   ssize_t written_buffered = 0;
struct inode *inode = mapping->host;
ssize_t err;
struct fuse_conn *fc = get_fuse_conn(inode);
@@ -1377,30 +1376,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb 
*iocb, struct iov_iter *from)
goto out;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos, endbyte;
-
written = generic_file_direct_write(iocb, from);
if (written < 0 || !iov_iter_count(from))
goto out;
-
-   written_buffered = fuse_perform_write(iocb, from);
-   if (written_buffered < 0) {
-   err = written_buffered;
-   goto out;
-   }
-   pos = iocb->ki_pos - written_buffered;
-   endbyte = iocb->ki_pos - 1;
-
-   err = filemap_write_and_wait_range(file->f_mapping, pos,
-  endbyte);
-   if (err)
-   goto out;
-
-   invalidate_mapping_pages(file->f_mapping,
-pos >> PAGE_SHIFT,
-endbyte >> PAGE_SHIFT);
-
-   written += written_buffered;
+   written = direct_write_fallback(iocb, from, written,
+   fuse_perform_write(iocb, from));
} else {
written = fuse_perform_write(iocb, from);
}
-- 
2.39.2
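
The open-coded logic removed here derives the byte range to sync and
invalidate from the already advanced ->ki_pos. A worked example of that
arithmetic, as a userspace sketch: MODEL_PAGE_SHIFT is assumed to be 12
(4 KiB pages), and struct fallback_range and the function name are invented
for illustration.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct fallback_range {
	long long pos;			/* first byte of the buffered write */
	long long end;			/* last byte (inclusive) */
	unsigned long first_page;	/* page index of pos */
	unsigned long last_page;	/* page index of end */
};

/*
 * ki_pos is the position after the buffered write has advanced it by
 * buffered_written, so the written range is [ki_pos - written, ki_pos - 1].
 */
static struct fallback_range fallback_range_model(long long ki_pos,
						  long long buffered_written)
{
	struct fallback_range r;

	assert(buffered_written > 0 && ki_pos >= buffered_written);

	r.pos = ki_pos - buffered_written;
	r.end = ki_pos - 1;
	r.first_page = (unsigned long)(r.pos >> MODEL_PAGE_SHIFT);
	r.last_page = (unsigned long)(r.end >> MODEL_PAGE_SHIFT);
	return r;
}
```

For a position of 16384 after a 12288-byte buffered write, the range is bytes
4096 to 16383, which covers page indices 1 through 3.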



[Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info

2023-06-01 Thread Christoph Hellwig
The last user of current->backing_dev_info disappeared in commit
b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
functions").  Remove the field and all assignments to it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Christian Brauner 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Darrick J. Wong 
Acked-by: Theodore Ts'o 
---
 fs/btrfs/file.c   | 6 +-
 fs/ceph/file.c| 4 
 fs/ext4/file.c| 2 --
 fs/f2fs/file.c| 2 --
 fs/fuse/file.c| 4 
 fs/gfs2/file.c| 2 --
 fs/nfs/file.c | 5 +
 fs/ntfs/file.c| 2 --
 fs/ntfs3/file.c   | 3 ---
 fs/xfs/xfs_file.c | 4 
 include/linux/sched.h | 3 ---
 mm/filemap.c  | 3 ---
 12 files changed, 2 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f649647392e0e4..ecd43ab66fa6c7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1145,7 +1145,6 @@ static int btrfs_write_check(struct kiocb *iocb, struct 
iov_iter *from,
!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | 
BTRFS_INODE_PREALLOC)))
return -EAGAIN;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = file_remove_privs(file);
if (ret)
return ret;
@@ -1165,10 +1164,8 @@ static int btrfs_write_check(struct kiocb *iocb, struct 
iov_iter *from,
loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
 
ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
-   if (ret) {
-   current->backing_dev_info = NULL;
+   if (ret)
return ret;
-   }
}
 
return 0;
@@ -1689,7 +1686,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct 
iov_iter *from,
if (sync)
atomic_dec(&inode->sync_writers);
 
-   current->backing_dev_info = NULL;
return num_written;
 }
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f4d8bf7dec88a8..c8ef72f723badd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1791,9 +1791,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct 
iov_iter *from)
else
ceph_start_io_write(inode);
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
if (iocb->ki_flags & IOCB_APPEND) {
err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
if (err < 0)
@@ -1940,7 +1937,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct 
iov_iter *from)
ceph_end_io_write(inode);
 out_unlocked:
ceph_free_cap_flush(prealloc_cf);
-   current->backing_dev_info = NULL;
return written ? written : err;
 }
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d101b3b0c7dad8..bc430270c23c19 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -285,9 +285,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
if (ret <= 0)
goto out;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
-   current->backing_dev_info = NULL;
 
 out:
inode_unlock(inode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5ac53d2627d20d..4f423d367a44b9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4517,9 +4517,7 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb 
*iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
-   current->backing_dev_info = NULL;
 
if (ret > 0) {
iocb->ki_pos += ret;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89d97f6188e05e..97d435874b14aa 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1362,9 +1362,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
 writethrough:
inode_lock(inode);
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
err = generic_write_checks(iocb, from);
if (err <= 0)
goto out;
@@ -1409,7 +1406,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
iocb->ki_pos += written;
}
 out:
-   current->backing_dev_info = NULL;
inode_unlock(inode);
if (written > 0)
written = generic_write_sync(iocb, written);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 300844f50dcd28..904a0d6ac1a1a9 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1041,11 +1041,9 @@ static ssize_t gfs2_file_buffered_write(struct kiocb 
*iocb,
goto out_unlock;
}
 
-  

[Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper

2023-06-01 Thread Christoph Hellwig
Add a helper dealing with handling the syncing of a buffered write fallback
for direct I/O.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Miklos Szeredi 
---
 fs/libfs.c | 41 
 include/linux/fs.h |  2 ++
 mm/filemap.c   | 66 +++---
 3 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 89cf614a327158..5b851315eeed03 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1613,3 +1613,44 @@ u64 inode_query_iversion(struct inode *inode)
return cur >> I_VERSION_QUERIED_SHIFT;
 }
 EXPORT_SYMBOL(inode_query_iversion);
+
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+   ssize_t direct_written, ssize_t buffered_written)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos - buffered_written;
+   loff_t end = iocb->ki_pos - 1;
+   int err;
+
+   /*
+* If the buffered write fallback returned an error, we want to return
+* the number of bytes which were written by direct I/O, or the error
+* code if that was zero.
+*
+* Note that this differs from normal direct-io semantics, which will
+* return -EFOO even if some bytes were written.
+*/
+   if (unlikely(buffered_written < 0)) {
+   if (direct_written)
+   return direct_written;
+   return buffered_written;
+   }
+
+   /*
+* We need to ensure that the page cache pages are written to disk and
+* invalidated to preserve the expected O_DIRECT semantics.
+*/
+   err = filemap_write_and_wait_range(mapping, pos, end);
+   if (err < 0) {
+   /*
+* We don't know how much we wrote, so just return the number of
+* bytes which were direct-written
+*/
+   if (direct_written)
+   return direct_written;
+   return err;
+   }
+   invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+   return direct_written + buffered_written;
+}
+EXPORT_SYMBOL_GPL(direct_write_fallback);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 91021b4e1f6f48..6af25137543824 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2738,6 +2738,8 @@ extern ssize_t __generic_file_write_iter(struct kiocb *, 
struct iov_iter *);
 extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
 ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+   ssize_t direct_written, ssize_t buffered_written);
 
 ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
rwf_t flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index ddb6f8aa86d6ca..137508da5525b6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4006,23 +4006,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
 {
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
-   struct inode*inode = mapping->host;
-   ssize_t written = 0;
-   ssize_t err;
-   ssize_t status;
+   struct inode *inode = mapping->host;
+   ssize_t ret;
 
-   err = file_remove_privs(file);
-   if (err)
-   goto out;
+   ret = file_remove_privs(file);
+   if (ret)
+   return ret;
 
-   err = file_update_time(file);
-   if (err)
-   goto out;
+   ret = file_update_time(file);
+   if (ret)
+   return ret;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos, endbyte;
-
-   written = generic_file_direct_write(iocb, from);
+   ret = generic_file_direct_write(iocb, from);
/*
 * If the write stopped short of completing, fall back to
 * buffered writes.  Some filesystems do this for writes to
@@ -4030,45 +4026,13 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
 * not succeed (even if it did, DAX does not handle dirty
 * page-cache pages correctly).
 */
-   if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
-   goto out;
-
-   pos = iocb->ki_pos;
-   status = generic_perform_write(iocb, from);
-   /*
-* If generic_perform_write() returned a synchronous error
-* then we want to return the number of bytes which were
-* direct-written, or the error code if that was zero.  Note
-  

[Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write

2023-06-01 Thread Christoph Hellwig
Both callers of fuse_perform_write need to update ki_pos, so move the update
into common code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
 fs/fuse/file.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97d435874b14aa..d5902506cdcc65 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1329,7 +1329,10 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
fuse_write_update_attr(inode, pos, res);
clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
 
-   return res > 0 ? res : err;
+   if (!res)
+   return err;
+   iocb->ki_pos += res;
+   return res;
 }
 
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
@@ -1341,7 +1344,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
struct inode *inode = mapping->host;
ssize_t err;
struct fuse_conn *fc = get_fuse_conn(inode);
-   loff_t endbyte = 0;
 
if (fc->writeback_cache) {
/* Update size (EOF optimization) and mode (SUID clearing) */
@@ -1375,19 +1377,20 @@ static ssize_t fuse_cache_write_iter(struct kiocb 
*iocb, struct iov_iter *from)
goto out;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos = iocb->ki_pos;
+   loff_t pos, endbyte;
+
written = generic_file_direct_write(iocb, from);
if (written < 0 || !iov_iter_count(from))
goto out;
 
-   pos += written;
-
-   written_buffered = fuse_perform_write(iocb, mapping, from, pos);
+   written_buffered = fuse_perform_write(iocb, mapping, from,
+ iocb->ki_pos);
if (written_buffered < 0) {
err = written_buffered;
goto out;
}
-   endbyte = pos + written_buffered - 1;
+   pos = iocb->ki_pos - written_buffered;
+   endbyte = iocb->ki_pos - 1;
 
err = filemap_write_and_wait_range(file->f_mapping, pos,
   endbyte);
@@ -1399,11 +1402,8 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
 endbyte >> PAGE_SHIFT);
 
written += written_buffered;
-   iocb->ki_pos = pos + written_buffered;
} else {
written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
-   if (written >= 0)
-   iocb->ki_pos += written;
}
 out:
inode_unlock(inode);
-- 
2.39.2



[Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write

2023-06-01 Thread Christoph Hellwig
All callers of iomap_file_buffered_write need to update ki_pos, so move the
update into common code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Andreas Gruenbacher 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
Acked-by: Damien Le Moal 
---
 fs/gfs2/file.c | 4 +---
 fs/iomap/buffered-io.c | 9 ++---
 fs/xfs/xfs_file.c  | 2 --
 fs/zonefs/file.c   | 4 +---
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 904a0d6ac1a1a9..c6a7555d5ad8bb 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1044,10 +1044,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb 
*iocb,
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
-   if (ret > 0) {
-   iocb->ki_pos += ret;
+   if (ret > 0)
written += ret;
-   }
 
if (inode == sdp->sd_rindex)
gfs2_glock_dq_uninit(statfs_gh);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f49e..550525a525c45c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
iov_iter *i,
.len= iov_iter_count(i),
.flags  = IOMAP_WRITE,
};
-   int ret;
+   ssize_t ret;
 
if (iocb->ki_flags & IOCB_NOWAIT)
iter.flags |= IOMAP_NOWAIT;
 
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
-   if (iter.pos == iocb->ki_pos)
+
+   if (unlikely(ret < 0))
return ret;
-   return iter.pos - iocb->ki_pos;
+   ret = iter.pos - iocb->ki_pos;
+   iocb->ki_pos += ret;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 431c3fd0e2b598..d57443db633637 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -720,8 +720,6 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
-   if (likely(ret >= 0))
-   iocb->ki_pos += ret;
 
/*
 * If we hit a space limit, try to free up some lingering preallocated
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f14..e212d0636f848e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -643,9 +643,7 @@ static ssize_t zonefs_file_buffered_write(struct kiocb 
*iocb,
goto inode_unlock;
 
ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops);
-   if (ret > 0)
-   iocb->ki_pos += ret;
-   else if (ret == -EIO)
+   if (ret == -EIO)
zonefs_io_error(inode, true);
 
 inode_unlock:
-- 
2.39.2



[Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper

2023-06-01 Thread Christoph Hellwig
Factor out a helper that calls filemap_write_and_wait_range and
invalidate_inode_pages2_range for the range covered by a write kiocb or
returns -EAGAIN if the kiocb is marked as nowait and there would be pages
to write or invalidate.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 48 -
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36fc2cea13ce20..6e4c9ee40baa99 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode 
*inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 5fcd5227f9cae2..a1cb01a4b8046a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2777,6 +2777,33 @@ int kiocb_write_and_wait(struct kiocb *iocb, size_t 
count)
return filemap_write_and_wait_range(mapping, pos, end);
 }
 
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+   int ret;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   /* we could block if there are any pages in the range */
+   if (filemap_range_has_page(mapping, pos, end))
+   return -EAGAIN;
+   } else {
+   ret = filemap_write_and_wait_range(mapping, pos, end);
+   if (ret)
+   return ret;
+   }
+
+   /*
+* After a write we want buffered reads to be sure to go to disk to get
+* the new data.  We invalidate clean cached page from the region we're
+* about to write.  We do this *before* the write so that we can return
+* without clobbering -EIOCBQUEUED from ->direct_IO().
+*/
+   return invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
+end >> PAGE_SHIFT);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -3820,30 +3847,11 @@ generic_file_direct_write(struct kiocb *iocb, struct 
iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   /* If there are pages to writeback, return */
-   if (filemap_range_has_page(file->f_mapping, pos,
-  pos + write_len - 1))
-   return -EAGAIN;
-   } else {
-   written = filemap_write_and_wait_range(mapping, pos,
-   pos + write_len - 1);
-   if (written)
-   goto out;
-   }
-
-   /*
-* After a write we want buffered reads to be sure to go to disk to get
-* the new data.  We invalidate clean cached page from the region we're
-* about to write.  We do this *before* the write so that we can return
-* without clobbering -EIOCBQUEUED from ->direct_IO().
-*/
-   written = invalidate_inode_pages2_range(mapping,
-   pos >> PAGE_SHIFT, end);
/*
 * If a page can not be invalidated, return 0 to fall back
 * to buffered write.
 */
+   written = kiocb_invalidate_pages(iocb, write_len);
if (written) {
if (written == -EBUSY)
return 0;
-- 
2.39.2
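
The control flow of the new helper can be summarized with a small userspace
model. This is a sketch only: struct mapping_model and its fields replace the
real page cache queries (filemap_range_has_page(),
filemap_write_and_wait_range(), invalidate_inode_pages2_range()) with canned
results, and all names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Canned answers standing in for the real page cache calls. */
struct mapping_model {
	bool has_pages;		/* would filemap_range_has_page() be true? */
	int writeback_err;	/* result of the write-and-wait step */
	int invalidate_err;	/* result of the invalidation step */
};

/*
 * Mirrors the helper's shape: a nowait request bails out with -EAGAIN
 * if any page is cached in the range, a blocking request writes the
 * range back first, and both paths finish with the invalidation.
 */
static int kiocb_invalidate_pages_model(bool nowait,
					const struct mapping_model *m)
{
	if (nowait) {
		if (m->has_pages)
			return -EAGAIN;
	} else {
		if (m->writeback_err)
			return m->writeback_err;
	}
	return m->invalidate_err;
}
```

In generic_file_direct_write() the caller then treats -EBUSY from this step as
a request to fall back to a buffered write, as the last hunk above shows.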



[Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper

2023-06-01 Thread Christoph Hellwig
Add a helper to invalidate page cache after a dio write.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 fs/direct-io.c  | 10 ++
 fs/iomap/direct-io.c| 12 ++--
 include/linux/fs.h  |  5 -
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 37 -
 5 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0b380bb8a81e11..4f9069aee0fe19 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -285,14 +285,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, 
unsigned int flags)
 * zeros from unwritten extents.
 */
if (flags & DIO_COMPLETE_INVALIDATE &&
-   ret > 0 && dio_op == REQ_OP_WRITE &&
-   dio->inode->i_mapping->nrpages) {
-   err = invalidate_inode_pages2_range(dio->inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + ret - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(dio->iocb->ki_filp);
-   }
+   ret > 0 && dio_op == REQ_OP_WRITE)
+   kiocb_invalidate_post_direct_write(dio->iocb, ret);
 
inode_dio_end(dio->inode);
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6207a59d2162e1..0795c54a745bca 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -81,7 +81,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
const struct iomap_dio_ops *dops = dio->dops;
struct kiocb *iocb = dio->iocb;
-   struct inode *inode = file_inode(iocb->ki_filp);
loff_t offset = iocb->ki_pos;
ssize_t ret = dio->error;
 
@@ -108,15 +107,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 * ->end_io() when necessary, otherwise a racing buffer read would cache
 * zeros from unwritten extents.
 */
-   if (!dio->error && dio->size &&
-   (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
-   int err;
-   err = invalidate_inode_pages2_range(inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + dio->size - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(iocb->ki_filp);
-   }
+   if (!dio->error && dio->size && (dio->flags & IOMAP_DIO_WRITE))
+   kiocb_invalidate_post_direct_write(iocb, dio->size);
 
inode_dio_end(file_inode(iocb->ki_filp));
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb2411..91021b4e1f6f48 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2837,11 +2837,6 @@ static inline void inode_dio_end(struct inode *inode)
wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
 }
 
-/*
- * Warn about a page cache invalidation failure diring a direct I/O write.
- */
-void dio_warn_stale_pagecache(struct file *filp);
-
 extern void inode_set_flags(struct inode *inode, unsigned int flags,
unsigned int mask);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e4c9ee40baa99..6ecc4aaf5e3d51 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -31,6 +31,7 @@ int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
 int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index a1cb01a4b8046a..ddb6f8aa86d6ca 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3816,7 +3816,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
 /*
  * Warn about a page cache invalidation failure during a direct I/O write.
  */
-void dio_warn_stale_pagecache(struct file *filp)
+static void dio_warn_stale_pagecache(struct file *filp)
 {
static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
char pathname[128];
@@ -3833,19 +3833,23 @@ void dio_warn_stale_pagecache(struct file *filp)
}
 }
 
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+   if (mapping->nrpages &&
+   invalidate_inode_pages2_range(mapping,
+   iocb->ki_pos >> PAGE_SHIFT,
+   (iocb->ki_pos + count - 1) >> PAGE_SHIFT))
+   dio_warn_stale_pagecache(iocb->ki_filp)

[Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper

2023-06-01 Thread Christoph Hellwig
Factor out a helper that does filemap_write_and_wait_range for the range
covered by a read kiocb, or returns -EAGAIN if the kiocb is marked as
nowait and there would be pages to write.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 block/fops.c| 18 +++---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c| 30 ++
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 58d0aebc7313a8..575171049c5d83 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -576,21 +576,9 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct 
iov_iter *to)
goto reexpand; /* skip atime */
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, pos,
- pos + count - 1)) {
-   ret = -EAGAIN;
-   goto reexpand;
-   }
-   } else {
-   ret = filemap_write_and_wait_range(mapping, pos,
-  pos + count - 1);
-   if (ret < 0)
-   goto reexpand;
-   }
-
+   ret = kiocb_write_and_wait(iocb, count);
+   if (ret < 0)
+   goto reexpand;
file_accessed(iocb->ki_filp);
 
ret = blkdev_direct_IO(iocb, to);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a450..36fc2cea13ce20 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
 int filemap_flush(struct address_space *);
@@ -54,6 +55,7 @@ int filemap_check_errors(struct address_space *mapping);
 void __filemap_set_wb_err(struct address_space *mapping, int err);
 int filemap_fdatawrite_wbc(struct address_space *mapping,
   struct writeback_control *wbc);
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count);
 
 static inline int filemap_write_and_wait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 15907af4a57ff5..5fcd5227f9cae2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2762,6 +2762,21 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 }
 EXPORT_SYMBOL_GPL(filemap_read);
 
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (filemap_range_needs_writeback(mapping, pos, end))
+   return -EAGAIN;
+   return 0;
+   }
+
+   return filemap_write_and_wait_range(mapping, pos, end);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -2797,18 +2812,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
-   iocb->ki_pos + count - 1))
-   return -EAGAIN;
-   } else {
-   retval = filemap_write_and_wait_range(mapping,
-   iocb->ki_pos,
-   iocb->ki_pos + count - 1);
-   if (retval < 0)
-   return retval;
-   }
-
+   retval = kiocb_write_and_wait(iocb, count);
+   if (retval < 0)
+   return retval;
file_accessed(file);
 
retval = mapping->a_ops->direct_IO(iocb, iter);
-- 
2.39.2



[Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write

2023-06-01 Thread Christoph Hellwig
pos is always equal to iocb->ki_pos, and mapping is always equal to
iocb->ki_filp->f_mapping.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Miklos Szeredi 
---
 fs/fuse/file.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d5902506cdcc65..b4e272a65fdd25 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1280,13 +1280,13 @@ static inline unsigned int fuse_wr_pages(loff_t pos, size_t len,
 max_pages);
 }
 
-static ssize_t fuse_perform_write(struct kiocb *iocb,
- struct address_space *mapping,
- struct iov_iter *ii, loff_t pos)
+static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
 {
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
+   loff_t pos = iocb->ki_pos;
int err = 0;
ssize_t res = 0;
 
@@ -1383,8 +1383,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (written < 0 || !iov_iter_count(from))
goto out;
 
-   written_buffered = fuse_perform_write(iocb, mapping, from,
- iocb->ki_pos);
+   written_buffered = fuse_perform_write(iocb, from);
if (written_buffered < 0) {
err = written_buffered;
goto out;
@@ -1403,7 +1402,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
written += written_buffered;
} else {
-   written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
+   written = fuse_perform_write(iocb, from);
}
 out:
inode_unlock(inode);
-- 
2.39.2



[Cluster-devel] cleanup the filemap / direct I/O interaction v4

2023-06-01 Thread Christoph Hellwig
Hi all,

this series cleans up some of the generic write helper calling
conventions and the page cache writeback / invalidation for
direct I/O.  This is a spinoff from the no-bufferhead kernel
project, for which we'll want to use an iomap based buffered
write path in the block layer.

Changes since v3:
 - fix a generic_write_sync that got lost in fuse
 - fix fuse to call fuse_perform_write and not generic_perform_write

Changes since v2:
 - stick to the existing behavior of returning a short write
   if the buffered fallback write or sync fails
 - bring back "fuse: use direct_write_fallback" which accidentally
   got lost in v2

Changes since v1:
 - remove current->backing_dev_info entirely
 - fix the pos/end calculation in direct_write_fallback
 - rename kiocb_invalidate_post_write to
   kiocb_invalidate_post_direct_write
 - typo fixes

diffstat:
 block/fops.c|   18 
 fs/btrfs/file.c |6 -
 fs/ceph/file.c  |6 -
 fs/direct-io.c  |   10 --
 fs/ext4/file.c  |   11 --
 fs/f2fs/file.c  |3 
 fs/fuse/file.c  |   45 ++-
 fs/gfs2/file.c  |6 -
 fs/iomap/buffered-io.c  |9 +-
 fs/iomap/direct-io.c|   88 -
 fs/libfs.c  |   41 ++
 fs/nfs/file.c   |6 -
 fs/ntfs/file.c  |2 
 fs/ntfs3/file.c |3 
 fs/xfs/xfs_file.c   |6 -
 fs/zonefs/file.c|4 
 include/linux/fs.h  |7 -
 include/linux/pagemap.h |4 
 include/linux/sched.h   |3 
 mm/filemap.c|  194 +---
 20 files changed, 194 insertions(+), 278 deletions(-)



[Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write

2023-06-01 Thread Christoph Hellwig
All callers of generic_perform_write need to update ki_pos, so move the
update into common code.
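
The new return convention at the end of generic_perform_write can be
illustrated with a small userspace model. This is a sketch, not the kernel
code: struct mock_kiocb and mock_perform_write_tail() are hypothetical names,
and the copy loop is reduced to its two results, the byte count already
written and the last status.

```c
#include <sys/types.h>

/* Hypothetical stand-in for struct kiocb; only the position is modelled. */
struct mock_kiocb {
	long long ki_pos;
};

/*
 * Models the tail of generic_perform_write() after this patch:
 *  - nothing written: return the status (0 or a negative errno) and
 *    leave ki_pos alone;
 *  - something written: advance ki_pos by exactly that amount and
 *    return it, even if a later iteration failed (short write).
 */
ssize_t mock_perform_write_tail(struct mock_kiocb *iocb,
				ssize_t written, ssize_t status)
{
	if (!written)
		return status;
	iocb->ki_pos += written;
	return written;
}
```

The point of the model is that the position advances by exactly the returned
byte count, so callers no longer need their own "ki_pos += ret" after the
call.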

Signed-off-by: Christoph Hellwig 
Reviewed-by: Xiubo Li 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Theodore Ts'o 
Acked-by: Darrick J. Wong 
---
 fs/ceph/file.c | 2 --
 fs/ext4/file.c | 9 +++--
 fs/f2fs/file.c | 1 -
 fs/nfs/file.c  | 1 -
 mm/filemap.c   | 8 
 5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c8ef72f723badd..767f4dfe7def64 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 * can not run at the same time
 */
written = generic_perform_write(iocb, from);
-   if (likely(written >= 0))
-   iocb->ki_pos = pos + written;
ceph_end_io_write(inode);
}
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc430270c23c19..ea0ada3985cba2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 
 out:
inode_unlock(inode);
-   if (likely(ret > 0)) {
-   iocb->ki_pos += ret;
-   ret = generic_write_sync(iocb, ret);
-   }
-
-   return ret;
+   if (unlikely(ret <= 0))
+   return ret;
+   return generic_write_sync(iocb, ret);
 }
 
 static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4f423d367a44b9..7134fe8bd008cb 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
ret = generic_perform_write(iocb, from);
 
if (ret > 0) {
-   iocb->ki_pos += ret;
f2fs_update_iostat(F2FS_I_SB(inode), inode,
APP_BUFFERED_IO, ret);
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 665ce3fc62eaf4..e8bb4c48a3210a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
goto out;
 
written = result;
-   iocb->ki_pos += written;
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
 
if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 33b54660ad2b39..15907af4a57ff5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
balance_dirty_pages_ratelimited(mapping);
} while (iov_iter_count(i));
 
-   return written ? written : status;
+   if (!written)
+   return status;
+   iocb->ki_pos += written;
+   return written;
 }
 EXPORT_SYMBOL(generic_perform_write);
 
@@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
endbyte = pos + status - 1;
err = filemap_write_and_wait_range(mapping, pos, endbyte);
if (err == 0) {
-   iocb->ki_pos = endbyte + 1;
written += status;
invalidate_mapping_pages(mapping,
 pos >> PAGE_SHIFT,
@@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
}
} else {
written = generic_perform_write(iocb, from);
-   if (likely(written > 0))
-   iocb->ki_pos += written;
}
 out:
return written ? written : err;
-- 
2.39.2



[Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages

2023-06-01 Thread Christoph Hellwig
Use the common helpers for direct I/O page invalidation instead of
open coding the logic.  This leads to a slight reordering of checks
in __iomap_dio_rw to keep the logic straight.
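
One detail of the write path in this patch is how the result of
kiocb_invalidate_pages is translated before bailing out. A minimal sketch of
that mapping, assuming a Linux errno.h that defines ENOTBLK;
map_invalidate_result() is a hypothetical helper name used only for
illustration, and the tracepoint call is omitted:

```c
#include <errno.h>

/*
 * Models the error handling around kiocb_invalidate_pages() in the
 * write branch of __iomap_dio_rw() as changed by this patch:
 *  - 0 means the invalidation worked and direct I/O proceeds;
 *  - -EAGAIN is passed through unchanged (nowait request);
 *  - any other failure becomes -ENOTBLK, which tells the caller to
 *    fall back to buffered I/O.
 */
int map_invalidate_result(int ret)
{
	if (ret && ret != -EAGAIN)
		return -ENOTBLK;
	return ret;
}
```

Keeping -EAGAIN distinct means a nowait caller can retry from a blocking
context, while every other invalidation failure takes the buffered fallback.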

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/direct-io.c | 55 
 1 file changed, 20 insertions(+), 35 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0795c54a745bca..6bd14691f96e07 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -472,7 +472,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
unsigned int dio_flags, void *private, size_t done_before)
 {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = file_inode(iocb->ki_filp);
struct iomap_iter iomi = {
.inode  = inode,
@@ -481,11 +480,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
.flags  = IOMAP_DIRECT,
.private= private,
};
-   loff_t end = iomi.pos + iomi.len - 1, ret = 0;
bool wait_for_completion =
is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
struct blk_plug plug;
struct iomap_dio *dio;
+   loff_t ret = 0;
 
trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before);
 
@@ -509,31 +508,29 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->submit.waiter = current;
dio->submit.poll_bio = NULL;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   iomi.flags |= IOMAP_NOWAIT;
+
if (iov_iter_rw(iter) == READ) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iomi.pos,
-   end)) {
-   ret = -EAGAIN;
-   goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
-   }
-
if (user_backed_iter(iter))
dio->flags |= IOMAP_DIO_DIRTY;
+
+   ret = kiocb_write_and_wait(iocb, iomi.len);
+   if (ret)
+   goto out_free_dio;
} else {
iomi.flags |= IOMAP_WRITE;
dio->flags |= IOMAP_DIO_WRITE;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_has_page(mapping, iomi.pos, end)) {
-   ret = -EAGAIN;
+   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
+   ret = -EAGAIN;
+   if (iomi.pos >= dio->i_size ||
+   iomi.pos + iomi.len > dio->i_size)
goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
+   iomi.flags |= IOMAP_OVERWRITE_ONLY;
}
 
/* for data sync or sync, we need sync completion processing */
@@ -549,31 +546,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (!(iocb->ki_flags & IOCB_SYNC))
dio->flags |= IOMAP_DIO_WRITE_FUA;
}
-   }
-
-   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
-   ret = -EAGAIN;
-   if (iomi.pos >= dio->i_size ||
-   iomi.pos + iomi.len > dio->i_size)
-   goto out_free_dio;
-   iomi.flags |= IOMAP_OVERWRITE_ONLY;
-   }
 
-   ret = filemap_write_and_wait_range(mapping, iomi.pos, end);
-   if (ret)
-   goto out_free_dio;
-
-   if (iov_iter_rw(iter) == WRITE) {
/*
 * Try to invalidate cache pages for the range we are writing.
 * If this invalidation fails, let the caller fall back to
 * buffered I/O.
 */
-   if (invalidate_inode_pages2_range(mapping,
-   iomi.pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
-   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
-   iomi.len);
-   ret = -ENOTBLK;
+   ret = kiocb_invalidate_pages(iocb, iomi.len);
+   if (ret) {
+   if (ret != -EAGAIN) {
+   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
+   iomi.len);
+   ret = -ENOTBLK;
+   }
goto out_free_dio;
}
 
-- 
2.39.2



[Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete

2023-06-01 Thread Christoph Hellwig
Move the ki_pos update down a bit to prepare for a better common
helper that invalidates pages based on an iocb.
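
The reordered tail of iomap_dio_complete can be modelled in userspace as
follows. This is a sketch only: struct mock_dio and mock_dio_complete_tail()
are hypothetical, and sync_err stands in for generic_write_sync() failing
(0 means the sync succeeds and the byte count is kept).

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the state iomap_dio_complete() works on. */
struct mock_dio {
	long long ki_pos;     /* models iocb->ki_pos */
	bool need_sync;       /* models IOMAP_DIO_NEED_SYNC */
	int sync_err;         /* 0, or the errno a failed sync would return */
	ssize_t done_before;  /* bytes completed before this dio */
};

/*
 * Models the end of iomap_dio_complete() after this patch: on success
 * the position is advanced first, then the optional sync runs, and
 * done_before is only added if the sync did not fail.
 */
ssize_t mock_dio_complete_tail(struct mock_dio *dio, ssize_t ret)
{
	if (ret > 0) {
		dio->ki_pos += ret;
		if (dio->need_sync && dio->sync_err < 0)
			ret = dio->sync_err;
		if (ret > 0)
			ret += dio->done_before;
	}
	return ret;
}
```

In this model, as in the patch, ki_pos still holds the start offset of the
I/O while the code above the final "if (ret > 0)" block runs, which is what a
helper that derives the invalidation range from the iocb needs.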

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/direct-io.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 019cc87d0fb339..6207a59d2162e1 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -94,7 +94,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
if (offset + ret > dio->i_size &&
!(dio->flags & IOMAP_DIO_WRITE))
ret = dio->i_size - offset;
-   iocb->ki_pos += ret;
}
 
/*
@@ -120,19 +119,21 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
}
 
inode_dio_end(file_inode(iocb->ki_filp));
-   /*
-* If this is a DSYNC write, make sure we push it to stable storage now
-* that we've written data.
-*/
-   if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
-   ret = generic_write_sync(iocb, ret);
 
-   if (ret > 0)
-   ret += dio->done_before;
+   if (ret > 0) {
+   iocb->ki_pos += ret;
 
+   /*
+* If this is a DSYNC write, make sure we push it to stable
+* storage now that we've written data.
+*/
+   if (dio->flags & IOMAP_DIO_NEED_SYNC)
+   ret = generic_write_sync(iocb, ret);
+   if (ret > 0)
+   ret += dio->done_before;
+   }
trace_iomap_dio_complete(iocb, dio->error, ret);
kfree(dio);
-
return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
-- 
2.39.2



Re: [Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write

2023-06-01 Thread Christoph Hellwig
On Wed, May 31, 2023 at 11:11:13AM +0200, Miklos Szeredi wrote:
> Why remove generic_write_sync()?  Definitely doesn't belong in this
> patch even if there's a good reason.

Yes, this shouldn't have happened.  I think this was a bad merge
resolution after the current->backing_dev_info removal.



Re: [Cluster-devel] [PATCH v7 19/20] fs: iomap: use bio_add_folio_nofail where possible

2023-05-31 Thread Christoph Hellwig
On Thu, Jun 01, 2023 at 08:36:59AM +1000, Dave Chinner wrote:
> We lose adjacent page merging with this change.

This is only used for adding the first folio to a brand new bio,
so there is nothing to merge with yet at this point.



Re: [Cluster-devel] [PATCH v7 20/20] block: mark bio_add_folio as __must_check

2023-05-31 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v7 17/20] block: mark bio_add_page as __must_check

2023-05-31 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v7 18/20] block: add bio_add_folio_nofail

2023-05-31 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



[Cluster-devel] [PATCH 3/8] filemap: update ki_pos in generic_perform_write

2023-05-31 Thread Christoph Hellwig
All callers of generic_perform_write need to update ki_pos, so move the
update into common code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Xiubo Li 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 fs/ceph/file.c | 2 --
 fs/ext4/file.c | 9 +++--
 fs/f2fs/file.c | 1 -
 fs/nfs/file.c  | 1 -
 mm/filemap.c   | 8 
 5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c8ef72f723badd..767f4dfe7def64 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 * can not run at the same time
 */
written = generic_perform_write(iocb, from);
-   if (likely(written >= 0))
-   iocb->ki_pos = pos + written;
ceph_end_io_write(inode);
}
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc430270c23c19..ea0ada3985cba2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 
 out:
inode_unlock(inode);
-   if (likely(ret > 0)) {
-   iocb->ki_pos += ret;
-   ret = generic_write_sync(iocb, ret);
-   }
-
-   return ret;
+   if (unlikely(ret <= 0))
+   return ret;
+   return generic_write_sync(iocb, ret);
 }
 
 static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4f423d367a44b9..7134fe8bd008cb 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
ret = generic_perform_write(iocb, from);
 
if (ret > 0) {
-   iocb->ki_pos += ret;
f2fs_update_iostat(F2FS_I_SB(inode), inode,
APP_BUFFERED_IO, ret);
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 665ce3fc62eaf4..e8bb4c48a3210a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
goto out;
 
written = result;
-   iocb->ki_pos += written;
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
 
if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 33b54660ad2b39..15907af4a57ff5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
balance_dirty_pages_ratelimited(mapping);
} while (iov_iter_count(i));
 
-   return written ? written : status;
+   if (!written)
+   return status;
+   iocb->ki_pos += written;
+   return written;
 }
 EXPORT_SYMBOL(generic_perform_write);
 
@@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
endbyte = pos + status - 1;
err = filemap_write_and_wait_range(mapping, pos, endbyte);
if (err == 0) {
-   iocb->ki_pos = endbyte + 1;
written += status;
invalidate_mapping_pages(mapping,
 pos >> PAGE_SHIFT,
@@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
}
} else {
written = generic_perform_write(iocb, from);
-   if (likely(written > 0))
-   iocb->ki_pos += written;
}
 out:
return written ? written : err;
-- 
2.39.2



[Cluster-devel] [PATCH 01/12] backing_dev: remove current->backing_dev_info

2023-05-31 Thread Christoph Hellwig
The last user of current->backing_dev_info disappeared in commit
b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
functions").  Remove the field and all assignments to it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
---
 fs/btrfs/file.c   | 6 +-
 fs/ceph/file.c| 4 
 fs/ext4/file.c| 2 --
 fs/f2fs/file.c| 2 --
 fs/fuse/file.c| 4 
 fs/gfs2/file.c| 2 --
 fs/nfs/file.c | 5 +
 fs/ntfs/file.c| 2 --
 fs/ntfs3/file.c   | 3 ---
 fs/xfs/xfs_file.c | 4 
 include/linux/sched.h | 3 ---
 mm/filemap.c  | 3 ---
 12 files changed, 2 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f649647392e0e4..ecd43ab66fa6c7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1145,7 +1145,6 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
return -EAGAIN;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = file_remove_privs(file);
if (ret)
return ret;
@@ -1165,10 +1164,8 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
 
ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
-   if (ret) {
-   current->backing_dev_info = NULL;
+   if (ret)
return ret;
-   }
}
 
return 0;
@@ -1689,7 +1686,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
if (sync)
atomic_dec(&inode->sync_writers);
 
-   current->backing_dev_info = NULL;
return num_written;
 }
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f4d8bf7dec88a8..c8ef72f723badd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1791,9 +1791,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
else
ceph_start_io_write(inode);
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
if (iocb->ki_flags & IOCB_APPEND) {
err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
if (err < 0)
@@ -1940,7 +1937,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
ceph_end_io_write(inode);
 out_unlocked:
ceph_free_cap_flush(prealloc_cf);
-   current->backing_dev_info = NULL;
return written ? written : err;
 }
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d101b3b0c7dad8..bc430270c23c19 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -285,9 +285,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
if (ret <= 0)
goto out;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
-   current->backing_dev_info = NULL;
 
 out:
inode_unlock(inode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5ac53d2627d20d..4f423d367a44b9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4517,9 +4517,7 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
-   current->backing_dev_info = NULL;
 
if (ret > 0) {
iocb->ki_pos += ret;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89d97f6188e05e..97d435874b14aa 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1362,9 +1362,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 writethrough:
inode_lock(inode);
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
err = generic_write_checks(iocb, from);
if (err <= 0)
goto out;
@@ -1409,7 +1406,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
iocb->ki_pos += written;
}
 out:
-   current->backing_dev_info = NULL;
inode_unlock(inode);
if (written > 0)
written = generic_write_sync(iocb, written);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 300844f50dcd28..904a0d6ac1a1a9 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1041,11 +1041,9 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
goto out_unlock;
}
 
-   current->backing_dev_info = inode_to_bdi(inode);
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, _ioma

[Cluster-devel] cleanup the filemap / direct I/O interaction v3

2023-05-31 Thread Christoph Hellwig
Hi all,

this series cleans up some of the generic write helper calling
conventions and the page cache writeback / invalidation for
direct I/O.  This is a spinoff from the no-bufferhead kernel
project, for which we'll want to use an iomap based buffered
write path in the block layer.

Changes since v2:
 - stick to the existing behavior of returning a short write
   if the buffered fallback write or sync fails
 - bring back "fuse: use direct_write_fallback" which accidentally
   got lost in v2

Changes since v1:
 - remove current->backing_dev_info entirely
 - fix the pos/end calculation in direct_write_fallback
 - rename kiocb_invalidate_post_write to
   kiocb_invalidate_post_direct_write
 - typo fixes

diffstat:
 block/fops.c|   18 +-
 fs/btrfs/file.c |6 --
 fs/ceph/file.c  |6 --
 fs/direct-io.c  |   10 ---
 fs/ext4/file.c  |   11 +---
 fs/f2fs/file.c  |3 -
 fs/fuse/file.c  |4 -
 fs/gfs2/file.c  |6 --
 fs/iomap/buffered-io.c  |9 ++-
 fs/iomap/direct-io.c|   88 -
 fs/nfs/file.c   |6 --
 fs/ntfs/file.c  |2 
 fs/ntfs3/file.c |3 -
 fs/xfs/xfs_file.c   |6 --
 fs/zonefs/file.c|4 -
 include/linux/fs.h  |5 -
 include/linux/pagemap.h |4 +
 include/linux/sched.h   |3 -
 mm/filemap.c|  126 ++--
 19 files changed, 125 insertions(+), 195 deletions(-)



[Cluster-devel] [PATCH 2/8] iomap: update ki_pos a little later in iomap_dio_complete

2023-05-31 Thread Christoph Hellwig
Move the ki_pos update down a bit to prepare for a better common
helper that invalidates pages based on an iocb.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/direct-io.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 019cc87d0fb339..6207a59d2162e1 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -94,7 +94,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
if (offset + ret > dio->i_size &&
!(dio->flags & IOMAP_DIO_WRITE))
ret = dio->i_size - offset;
-   iocb->ki_pos += ret;
}
 
/*
@@ -120,19 +119,21 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
}
 
inode_dio_end(file_inode(iocb->ki_filp));
-   /*
-* If this is a DSYNC write, make sure we push it to stable storage now
-* that we've written data.
-*/
-   if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
-   ret = generic_write_sync(iocb, ret);
 
-   if (ret > 0)
-   ret += dio->done_before;
+   if (ret > 0) {
+   iocb->ki_pos += ret;
 
+   /*
+* If this is a DSYNC write, make sure we push it to stable
+* storage now that we've written data.
+*/
+   if (dio->flags & IOMAP_DIO_NEED_SYNC)
+   ret = generic_write_sync(iocb, ret);
+   if (ret > 0)
+   ret += dio->done_before;
+   }
trace_iomap_dio_complete(iocb, dio->error, ret);
kfree(dio);
-
return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
-- 
2.39.2



[Cluster-devel] [PATCH 08/12] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages

2023-05-31 Thread Christoph Hellwig
Use the common helpers for direct I/O page invalidation instead of
open coding the logic.  This leads to a slight reordering of checks
in __iomap_dio_rw to keep the logic straight.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/direct-io.c | 55 
 1 file changed, 20 insertions(+), 35 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0795c54a745bca..6bd14691f96e07 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -472,7 +472,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
unsigned int dio_flags, void *private, size_t done_before)
 {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = file_inode(iocb->ki_filp);
struct iomap_iter iomi = {
.inode  = inode,
@@ -481,11 +480,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
.flags  = IOMAP_DIRECT,
.private= private,
};
-   loff_t end = iomi.pos + iomi.len - 1, ret = 0;
bool wait_for_completion =
is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
struct blk_plug plug;
struct iomap_dio *dio;
+   loff_t ret = 0;
 
trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before);
 
@@ -509,31 +508,29 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->submit.waiter = current;
dio->submit.poll_bio = NULL;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   iomi.flags |= IOMAP_NOWAIT;
+
if (iov_iter_rw(iter) == READ) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iomi.pos,
-   end)) {
-   ret = -EAGAIN;
-   goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
-   }
-
if (user_backed_iter(iter))
dio->flags |= IOMAP_DIO_DIRTY;
+
+   ret = kiocb_write_and_wait(iocb, iomi.len);
+   if (ret)
+   goto out_free_dio;
} else {
iomi.flags |= IOMAP_WRITE;
dio->flags |= IOMAP_DIO_WRITE;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_has_page(mapping, iomi.pos, end)) {
-   ret = -EAGAIN;
+   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
+   ret = -EAGAIN;
+   if (iomi.pos >= dio->i_size ||
+   iomi.pos + iomi.len > dio->i_size)
goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
+   iomi.flags |= IOMAP_OVERWRITE_ONLY;
}
 
/* for data sync or sync, we need sync completion processing */
@@ -549,31 +546,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (!(iocb->ki_flags & IOCB_SYNC))
dio->flags |= IOMAP_DIO_WRITE_FUA;
}
-   }
-
-   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
-   ret = -EAGAIN;
-   if (iomi.pos >= dio->i_size ||
-   iomi.pos + iomi.len > dio->i_size)
-   goto out_free_dio;
-   iomi.flags |= IOMAP_OVERWRITE_ONLY;
-   }
 
-   ret = filemap_write_and_wait_range(mapping, iomi.pos, end);
-   if (ret)
-   goto out_free_dio;
-
-   if (iov_iter_rw(iter) == WRITE) {
/*
 * Try to invalidate cache pages for the range we are writing.
 * If this invalidation fails, let the caller fall back to
 * buffered I/O.
 */
-   if (invalidate_inode_pages2_range(mapping,
-   iomi.pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
-   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
-   iomi.len);
-   ret = -ENOTBLK;
+   ret = kiocb_invalidate_pages(iocb, iomi.len);
+   if (ret) {
+   if (ret != -EAGAIN) {
+   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
+   iomi.len);
+   ret = -ENOTBLK;
+   }
goto out_free_dio;
}
 
-- 
2.39.2



[Cluster-devel] [PATCH 6/8] filemap: add a kiocb_invalidate_post_direct_write helper

2023-05-31 Thread Christoph Hellwig
Add a helper to invalidate page cache after a dio write.
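
The helper turns the byte range [ki_pos, ki_pos + count - 1] into an
inclusive range of page indices before invalidating. A worked sketch of that
arithmetic, assuming 4 KiB pages; MOCK_PAGE_SHIFT, struct page_range and
byte_range_to_pages() are hypothetical names, not kernel API:

```c
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Inclusive range of page indices covered by a byte range. */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Mirrors the index computation in the helper: the first page is
 * pos >> PAGE_SHIFT and the last page is derived from the final byte
 * written, (pos + count - 1), not from the byte after it.
 * count must be non-zero.
 */
struct page_range byte_range_to_pages(uint64_t pos, uint64_t count)
{
	struct page_range r = {
		.first = pos >> MOCK_PAGE_SHIFT,
		.last = (pos + count - 1) >> MOCK_PAGE_SHIFT,
	};
	return r;
}
```

The "- 1" matters: a page-aligned 4096-byte write at offset 8192 covers only
page 2, while a 2-byte write at offset 4095 straddles pages 0 and 1.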

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 fs/direct-io.c  | 10 ++
 fs/iomap/direct-io.c| 12 ++--
 include/linux/fs.h  |  5 -
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 37 -
 5 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0b380bb8a81e11..4f9069aee0fe19 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -285,14 +285,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
 * zeros from unwritten extents.
 */
if (flags & DIO_COMPLETE_INVALIDATE &&
-   ret > 0 && dio_op == REQ_OP_WRITE &&
-   dio->inode->i_mapping->nrpages) {
-   err = invalidate_inode_pages2_range(dio->inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + ret - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(dio->iocb->ki_filp);
-   }
+   ret > 0 && dio_op == REQ_OP_WRITE)
+   kiocb_invalidate_post_direct_write(dio->iocb, ret);
 
inode_dio_end(dio->inode);
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6207a59d2162e1..0795c54a745bca 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -81,7 +81,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
const struct iomap_dio_ops *dops = dio->dops;
struct kiocb *iocb = dio->iocb;
-   struct inode *inode = file_inode(iocb->ki_filp);
loff_t offset = iocb->ki_pos;
ssize_t ret = dio->error;
 
@@ -108,15 +107,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 * ->end_io() when necessary, otherwise a racing buffer read would cache
 * zeros from unwritten extents.
 */
-   if (!dio->error && dio->size &&
-   (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
-   int err;
-   err = invalidate_inode_pages2_range(inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + dio->size - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(iocb->ki_filp);
-   }
+   if (!dio->error && dio->size && (dio->flags & IOMAP_DIO_WRITE))
+   kiocb_invalidate_post_direct_write(iocb, dio->size);
 
inode_dio_end(file_inode(iocb->ki_filp));
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb2411..91021b4e1f6f48 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2837,11 +2837,6 @@ static inline void inode_dio_end(struct inode *inode)
wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
 }
 
-/*
- * Warn about a page cache invalidation failure diring a direct I/O write.
- */
-void dio_warn_stale_pagecache(struct file *filp);
-
 extern void inode_set_flags(struct inode *inode, unsigned int flags,
unsigned int mask);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e4c9ee40baa99..6ecc4aaf5e3d51 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -31,6 +31,7 @@ int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
 int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index a1cb01a4b8046a..ddb6f8aa86d6ca 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3816,7 +3816,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
 /*
  * Warn about a page cache invalidation failure during a direct I/O write.
  */
-void dio_warn_stale_pagecache(struct file *filp)
+static void dio_warn_stale_pagecache(struct file *filp)
 {
static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
char pathname[128];
@@ -3833,19 +3833,23 @@ void dio_warn_stale_pagecache(struct file *filp)
}
 }
 
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+   if (mapping->nrpages &&
+   invalidate_inode_pages2_range(mapping,
+   iocb->ki_pos >> PAGE_SHIFT,
+   (iocb->ki_pos + count - 1) >> PAGE_SHIFT))
+   dio_warn_stale_pagecache(iocb->ki_filp)

[Cluster-devel] [PATCH 06/12] filemap: add a kiocb_invalidate_post_direct_write helper

2023-05-31 Thread Christoph Hellwig
Add a helper to invalidate page cache after a dio write.
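
For readers following the arithmetic: the helper turns the byte range of the completed write into an inclusive range of page-cache indices before invalidating. The following is a minimal user-space sketch of that mapping only, assuming 4 KiB pages. MODEL_PAGE_SHIFT, struct page_range and byte_range_to_pages() are illustrative names, not kernel symbols.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages.  The kernel's PAGE_SHIFT depends on the arch. */
#define MODEL_PAGE_SHIFT 12

/* Inclusive range of page indices (hypothetical type for this sketch). */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Map the byte range [pos, pos + count - 1] to page indices, mirroring
 * the "pos >> PAGE_SHIFT" and "(pos + count - 1) >> PAGE_SHIFT"
 * expressions in the patch.  count must be non-zero.
 */
struct page_range byte_range_to_pages(uint64_t pos, uint64_t count)
{
	struct page_range r;

	r.first = pos >> MODEL_PAGE_SHIFT;
	r.last = (pos + count - 1) >> MODEL_PAGE_SHIFT;
	return r;
}
```

A two-byte write at offset 4095 straddles a page boundary, so it covers pages 0 and 1, while a page-aligned 4096-byte write touches exactly one page.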

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 fs/direct-io.c  | 10 ++
 fs/iomap/direct-io.c| 12 ++--
 include/linux/fs.h  |  5 -
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 37 -
 5 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0b380bb8a81e11..4f9069aee0fe19 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -285,14 +285,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
 * zeros from unwritten extents.
 */
if (flags & DIO_COMPLETE_INVALIDATE &&
-   ret > 0 && dio_op == REQ_OP_WRITE &&
-   dio->inode->i_mapping->nrpages) {
-   err = invalidate_inode_pages2_range(dio->inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + ret - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(dio->iocb->ki_filp);
-   }
+   ret > 0 && dio_op == REQ_OP_WRITE)
+   kiocb_invalidate_post_direct_write(dio->iocb, ret);
 
inode_dio_end(dio->inode);
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6207a59d2162e1..0795c54a745bca 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -81,7 +81,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
const struct iomap_dio_ops *dops = dio->dops;
struct kiocb *iocb = dio->iocb;
-   struct inode *inode = file_inode(iocb->ki_filp);
loff_t offset = iocb->ki_pos;
ssize_t ret = dio->error;
 
@@ -108,15 +107,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 * ->end_io() when necessary, otherwise a racing buffer read would cache
 * zeros from unwritten extents.
 */
-   if (!dio->error && dio->size &&
-   (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
-   int err;
-   err = invalidate_inode_pages2_range(inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + dio->size - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(iocb->ki_filp);
-   }
+   if (!dio->error && dio->size && (dio->flags & IOMAP_DIO_WRITE))
+   kiocb_invalidate_post_direct_write(iocb, dio->size);
 
inode_dio_end(file_inode(iocb->ki_filp));
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb2411..91021b4e1f6f48 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2837,11 +2837,6 @@ static inline void inode_dio_end(struct inode *inode)
wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
 }
 
-/*
- * Warn about a page cache invalidation failure diring a direct I/O write.
- */
-void dio_warn_stale_pagecache(struct file *filp);
-
 extern void inode_set_flags(struct inode *inode, unsigned int flags,
unsigned int mask);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e4c9ee40baa99..6ecc4aaf5e3d51 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -31,6 +31,7 @@ int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
 int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index a1cb01a4b8046a..ddb6f8aa86d6ca 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3816,7 +3816,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
 /*
  * Warn about a page cache invalidation failure during a direct I/O write.
  */
-void dio_warn_stale_pagecache(struct file *filp)
+static void dio_warn_stale_pagecache(struct file *filp)
 {
static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
char pathname[128];
@@ -3833,19 +3833,23 @@ void dio_warn_stale_pagecache(struct file *filp)
}
 }
 
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+   if (mapping->nrpages &&
+   invalidate_inode_pages2_range(mapping,
+   iocb->ki_pos >> PAGE_SHIFT,
+   (iocb->ki_pos + count - 1) >> PAGE_SHIFT))
+   dio_warn_stale_pagecache(iocb->ki_filp)

[Cluster-devel] [PATCH 4/8] filemap: add a kiocb_write_and_wait helper

2023-05-31 Thread Christoph Hellwig
Factor out a helper that does filemap_write_and_wait_range for the range
covered by a read kiocb, or returns -EAGAIN if the kiocb is marked as
nowait and there would be pages to write.
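
The control flow described above can be modelled in user space as follows. This is only a sketch: struct range_state and model_write_and_wait() are hypothetical stand-ins, with the page-cache queries replaced by precomputed fields.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the page cache over the range of one read. */
struct range_state {
	bool needs_writeback;	/* dirty or writeback pages overlap the range */
	int flush_result;	/* what a blocking write-and-wait would return */
};

/*
 * Model of the helper's decision: a nowait request must not block, so it
 * bails out with -EAGAIN when a flush would be needed; otherwise the
 * range is flushed and the flush result is returned.
 */
int model_write_and_wait(bool nowait, const struct range_state *rs)
{
	if (nowait)
		return rs->needs_writeback ? -EAGAIN : 0;
	return rs->flush_result;
}
```

Note that the nowait branch never reports a flush error, because it never attempts the flush.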

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 block/fops.c| 18 +++---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c| 30 ++
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 58d0aebc7313a8..575171049c5d83 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -576,21 +576,9 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
goto reexpand; /* skip atime */
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, pos,
- pos + count - 1)) {
-   ret = -EAGAIN;
-   goto reexpand;
-   }
-   } else {
-   ret = filemap_write_and_wait_range(mapping, pos,
-  pos + count - 1);
-   if (ret < 0)
-   goto reexpand;
-   }
-
+   ret = kiocb_write_and_wait(iocb, count);
+   if (ret < 0)
+   goto reexpand;
file_accessed(iocb->ki_filp);
 
ret = blkdev_direct_IO(iocb, to);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a450..36fc2cea13ce20 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
 int filemap_flush(struct address_space *);
@@ -54,6 +55,7 @@ int filemap_check_errors(struct address_space *mapping);
 void __filemap_set_wb_err(struct address_space *mapping, int err);
 int filemap_fdatawrite_wbc(struct address_space *mapping,
   struct writeback_control *wbc);
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count);
 
 static inline int filemap_write_and_wait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 15907af4a57ff5..5fcd5227f9cae2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2762,6 +2762,21 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 }
 EXPORT_SYMBOL_GPL(filemap_read);
 
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (filemap_range_needs_writeback(mapping, pos, end))
+   return -EAGAIN;
+   return 0;
+   }
+
+   return filemap_write_and_wait_range(mapping, pos, end);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -2797,18 +2812,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
-   iocb->ki_pos + count - 1))
-   return -EAGAIN;
-   } else {
-   retval = filemap_write_and_wait_range(mapping,
-   iocb->ki_pos,
-   iocb->ki_pos + count - 1);
-   if (retval < 0)
-   return retval;
-   }
-
+   retval = kiocb_write_and_wait(iocb, count);
+   if (retval < 0)
+   return retval;
file_accessed(file);
 
retval = mapping->a_ops->direct_IO(iocb, iter);
-- 
2.39.2



[Cluster-devel] [PATCH 05/12] filemap: add a kiocb_invalidate_pages helper

2023-05-31 Thread Christoph Hellwig
Factor out a helper that calls filemap_write_and_wait_range and
invalidate_inode_pages2_range for the range covered by a write kiocb or
returns -EAGAIN if the kiocb is marked as nowait and there would be pages
to write or invalidate.
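
A user-space model of this sequence, together with the -EBUSY handling in the caller, might look like the sketch below. All names here (struct cache_state, model_invalidate_pages(), model_direct_write_precheck()) are hypothetical. The "return 1 to proceed" convention belongs to the sketch, and passing other errors up unchanged is an assumption about the part of the caller not shown in this diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the page cache over the range of one write. */
struct cache_state {
	bool has_pages;		/* any cached page overlaps the range */
	int flush_result;	/* result of a blocking write-and-wait */
	int invalidate_result;	/* result of invalidation, e.g. 0 or -EBUSY */
};

/* Model of the helper: check (nowait) or flush first, then invalidate. */
int model_invalidate_pages(bool nowait, const struct cache_state *cs)
{
	if (nowait) {
		/* any cached page in the range could make us block */
		if (cs->has_pages)
			return -EAGAIN;
	} else if (cs->flush_result) {
		return cs->flush_result;
	}
	return cs->invalidate_result;
}

/*
 * Model of the caller: -EBUSY from the invalidation becomes a return of
 * 0, which makes the write fall back to buffered I/O.  Other errors are
 * assumed to be passed up as-is.  Returns 1 when direct I/O may proceed.
 */
int model_direct_write_precheck(bool nowait, const struct cache_state *cs)
{
	int ret = model_invalidate_pages(nowait, cs);

	if (ret == -EBUSY)
		return 0;
	if (ret)
		return ret;
	return 1;
}
```

The nowait check here is deliberately stricter than the read-side one: any cached page, clean or dirty, is enough to return -EAGAIN, since invalidating it could block.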

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 48 -
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36fc2cea13ce20..6e4c9ee40baa99 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 5fcd5227f9cae2..a1cb01a4b8046a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2777,6 +2777,33 @@ int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
return filemap_write_and_wait_range(mapping, pos, end);
 }
 
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+   int ret;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   /* we could block if there are any pages in the range */
+   if (filemap_range_has_page(mapping, pos, end))
+   return -EAGAIN;
+   } else {
+   ret = filemap_write_and_wait_range(mapping, pos, end);
+   if (ret)
+   return ret;
+   }
+
+   /*
+* After a write we want buffered reads to be sure to go to disk to get
+* the new data.  We invalidate clean cached page from the region we're
+* about to write.  We do this *before* the write so that we can return
+* without clobbering -EIOCBQUEUED from ->direct_IO().
+*/
+   return invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
+end >> PAGE_SHIFT);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -3820,30 +3847,11 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   /* If there are pages to writeback, return */
-   if (filemap_range_has_page(file->f_mapping, pos,
-  pos + write_len - 1))
-   return -EAGAIN;
-   } else {
-   written = filemap_write_and_wait_range(mapping, pos,
-   pos + write_len - 1);
-   if (written)
-   goto out;
-   }
-
-   /*
-* After a write we want buffered reads to be sure to go to disk to get
-* the new data.  We invalidate clean cached page from the region we're
-* about to write.  We do this *before* the write so that we can return
-* without clobbering -EIOCBQUEUED from ->direct_IO().
-*/
-   written = invalidate_inode_pages2_range(mapping,
-   pos >> PAGE_SHIFT, end);
/*
 * If a page can not be invalidated, return 0 to fall back
 * to buffered write.
 */
+   written = kiocb_invalidate_pages(iocb, write_len);
if (written) {
if (written == -EBUSY)
return 0;
-- 
2.39.2



[Cluster-devel] [PATCH 12/12] fuse: use direct_write_fallback

2023-05-31 Thread Christoph Hellwig
Use the generic direct_write_fallback helper instead of duplicating the
logic.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
 fs/fuse/file.c | 24 ++--
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 025973ad813e05..7a72dc0a691201 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1340,7 +1340,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
ssize_t written = 0;
-   ssize_t written_buffered = 0;
struct inode *inode = mapping->host;
ssize_t err;
struct fuse_conn *fc = get_fuse_conn(inode);
@@ -1377,30 +1376,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos, endbyte;
-
written = generic_file_direct_write(iocb, from);
if (written < 0 || !iov_iter_count(from))
goto out;
-
-   written_buffered = fuse_perform_write(iocb, from);
-   if (written_buffered < 0) {
-   err = written_buffered;
-   goto out;
-   }
-   pos = iocb->ki_pos - written_buffered;
-   endbyte = iocb->ki_pos - 1;
-
-   err = filemap_write_and_wait_range(file->f_mapping, pos,
-  endbyte);
-   if (err)
-   goto out;
-
-   invalidate_mapping_pages(file->f_mapping,
-pos >> PAGE_SHIFT,
-endbyte >> PAGE_SHIFT);
-
-   written += written_buffered;
+   written = direct_write_fallback(iocb, from, written,
+   generic_perform_write(iocb, from));
} else {
written = fuse_perform_write(iocb, from);
}
-- 
2.39.2



[Cluster-devel] [PATCH 1/8] backing_dev: remove current->backing_dev_info

2023-05-31 Thread Christoph Hellwig
The last user of current->backing_dev_info disappeared in commit
b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
functions").  Remove the field and all assignments to it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
---
 fs/btrfs/file.c   | 6 +-
 fs/ceph/file.c| 4 
 fs/ext4/file.c| 2 --
 fs/f2fs/file.c| 2 --
 fs/fuse/file.c| 4 
 fs/gfs2/file.c| 2 --
 fs/nfs/file.c | 5 +
 fs/ntfs/file.c| 2 --
 fs/ntfs3/file.c   | 3 ---
 fs/xfs/xfs_file.c | 4 
 include/linux/sched.h | 3 ---
 mm/filemap.c  | 3 ---
 12 files changed, 2 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f649647392e0e4..ecd43ab66fa6c7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1145,7 +1145,6 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
return -EAGAIN;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = file_remove_privs(file);
if (ret)
return ret;
@@ -1165,10 +1164,8 @@ static int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from,
loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
 
ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
-   if (ret) {
-   current->backing_dev_info = NULL;
+   if (ret)
return ret;
-   }
}
 
return 0;
@@ -1689,7 +1686,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
if (sync)
atomic_dec(&inode->sync_writers);
 
-   current->backing_dev_info = NULL;
return num_written;
 }
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f4d8bf7dec88a8..c8ef72f723badd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1791,9 +1791,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
else
ceph_start_io_write(inode);
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
if (iocb->ki_flags & IOCB_APPEND) {
err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
if (err < 0)
@@ -1940,7 +1937,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
ceph_end_io_write(inode);
 out_unlocked:
ceph_free_cap_flush(prealloc_cf);
-   current->backing_dev_info = NULL;
return written ? written : err;
 }
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d101b3b0c7dad8..bc430270c23c19 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -285,9 +285,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
if (ret <= 0)
goto out;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
-   current->backing_dev_info = NULL;
 
 out:
inode_unlock(inode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5ac53d2627d20d..4f423d367a44b9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4517,9 +4517,7 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
-   current->backing_dev_info = NULL;
 
if (ret > 0) {
iocb->ki_pos += ret;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89d97f6188e05e..97d435874b14aa 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1362,9 +1362,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 writethrough:
inode_lock(inode);
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
err = generic_write_checks(iocb, from);
if (err <= 0)
goto out;
@@ -1409,7 +1406,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
iocb->ki_pos += written;
}
 out:
-   current->backing_dev_info = NULL;
inode_unlock(inode);
if (written > 0)
written = generic_write_sync(iocb, written);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 300844f50dcd28..904a0d6ac1a1a9 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1041,11 +1041,9 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
goto out_unlock;
}
 
-   current->backing_dev_info = inode_to_bdi(inode);
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, _ioma

[Cluster-devel] [PATCH 09/12] fs: factor out a direct_write_fallback helper

2023-05-31 Thread Christoph Hellwig
Add a helper that handles syncing after a buffered write fallback
for direct I/O.
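
The return-value rules the helper applies can be summarised with a small user-space model. This is a sketch only: model_direct_write_fallback() is a hypothetical name, and flush_err stands in for the result of filemap_write_and_wait_range(). The sketch covers the return value and leaves ki_pos out.

```c
#include <sys/types.h>	/* ssize_t */

/*
 * Model of the return-value selection.  flush_err is 0 on success or a
 * negative errno, as filemap_write_and_wait_range() would return.
 */
ssize_t model_direct_write_fallback(ssize_t direct_written,
				    ssize_t buffered_written, int flush_err)
{
	/* buffered fallback failed: report the direct bytes, else the error */
	if (buffered_written < 0)
		return direct_written ? direct_written : buffered_written;

	/* flush failed: the buffered bytes cannot be counted */
	if (flush_err < 0)
		return direct_written ? direct_written : flush_err;

	return direct_written + buffered_written;
}
```

In both failure cases a partial direct write wins over the error code, which is the deviation from normal direct I/O semantics that the comment in the helper points out.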

Signed-off-by: Christoph Hellwig 
---
 fs/libfs.c | 41 
 include/linux/fs.h |  2 ++
 mm/filemap.c   | 66 +++---
 3 files changed, 58 insertions(+), 51 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 89cf614a327158..5b851315eeed03 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1613,3 +1613,44 @@ u64 inode_query_iversion(struct inode *inode)
return cur >> I_VERSION_QUERIED_SHIFT;
 }
 EXPORT_SYMBOL(inode_query_iversion);
+
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+   ssize_t direct_written, ssize_t buffered_written)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos - buffered_written;
+   loff_t end = iocb->ki_pos - 1;
+   int err;
+
+   /*
+* If the buffered write fallback returned an error, we want to return
+* the number of bytes which were written by direct I/O, or the error
+* code if that was zero.
+*
+* Note that this differs from normal direct-io semantics, which will
+* return -EFOO even if some bytes were written.
+*/
+   if (unlikely(buffered_written < 0)) {
+   if (direct_written)
+   return direct_written;
+   return buffered_written;
+   }
+
+   /*
+* We need to ensure that the page cache pages are written to disk and
+* invalidated to preserve the expected O_DIRECT semantics.
+*/
+   err = filemap_write_and_wait_range(mapping, pos, end);
+   if (err < 0) {
+   /*
+* We don't know how much we wrote, so just return the number of
+* bytes which were direct-written
+*/
+   if (direct_written)
+   return direct_written;
+   return err;
+   }
+   invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+   return direct_written + buffered_written;
+}
+EXPORT_SYMBOL_GPL(direct_write_fallback);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 91021b4e1f6f48..6af25137543824 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2738,6 +2738,8 @@ extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
 ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+   ssize_t direct_written, ssize_t buffered_written);
 
 ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
rwf_t flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index ddb6f8aa86d6ca..137508da5525b6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4006,23 +4006,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
-   struct inode*inode = mapping->host;
-   ssize_t written = 0;
-   ssize_t err;
-   ssize_t status;
+   struct inode *inode = mapping->host;
+   ssize_t ret;
 
-   err = file_remove_privs(file);
-   if (err)
-   goto out;
+   ret = file_remove_privs(file);
+   if (ret)
+   return ret;
 
-   err = file_update_time(file);
-   if (err)
-   goto out;
+   ret = file_update_time(file);
+   if (ret)
+   return ret;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos, endbyte;
-
-   written = generic_file_direct_write(iocb, from);
+   ret = generic_file_direct_write(iocb, from);
/*
 * If the write stopped short of completing, fall back to
 * buffered writes.  Some filesystems do this for writes to
@@ -4030,45 +4026,13 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 * not succeed (even if it did, DAX does not handle dirty
 * page-cache pages correctly).
 */
-   if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
-   goto out;
-
-   pos = iocb->ki_pos;
-   status = generic_perform_write(iocb, from);
-   /*
-* If generic_perform_write() returned a synchronous error
-* then we want to return the number of bytes which were
-* direct-written, or the error code if that was zero.  Note
-* that this differs from normal direct-io seman

[Cluster-devel] [PATCH 02/12] iomap: update ki_pos a little later in iomap_dio_complete

2023-05-31 Thread Christoph Hellwig
Move the ki_pos update down a bit to prepare for a better common
helper that invalidates pages based on an iocb.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/direct-io.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 019cc87d0fb339..6207a59d2162e1 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -94,7 +94,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
if (offset + ret > dio->i_size &&
!(dio->flags & IOMAP_DIO_WRITE))
ret = dio->i_size - offset;
-   iocb->ki_pos += ret;
}
 
/*
@@ -120,19 +119,21 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
}
 
inode_dio_end(file_inode(iocb->ki_filp));
-   /*
-* If this is a DSYNC write, make sure we push it to stable storage now
-* that we've written data.
-*/
-   if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
-   ret = generic_write_sync(iocb, ret);
 
-   if (ret > 0)
-   ret += dio->done_before;
+   if (ret > 0) {
+   iocb->ki_pos += ret;
 
+   /*
+* If this is a DSYNC write, make sure we push it to stable
+* storage now that we've written data.
+*/
+   if (dio->flags & IOMAP_DIO_NEED_SYNC)
+   ret = generic_write_sync(iocb, ret);
+   if (ret > 0)
+   ret += dio->done_before;
+   }
trace_iomap_dio_complete(iocb, dio->error, ret);
kfree(dio);
-
return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
-- 
2.39.2



[Cluster-devel] [PATCH 5/8] filemap: add a kiocb_invalidate_pages helper

2023-05-31 Thread Christoph Hellwig
Factor out a helper that calls filemap_write_and_wait_range and
invalidate_inode_pages2_range for the range covered by a write kiocb or
returns -EAGAIN if the kiocb is marked as nowait and there would be pages
to write or invalidate.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 48 -
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36fc2cea13ce20..6e4c9ee40baa99 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 5fcd5227f9cae2..a1cb01a4b8046a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2777,6 +2777,33 @@ int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
return filemap_write_and_wait_range(mapping, pos, end);
 }
 
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+   int ret;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   /* we could block if there are any pages in the range */
+   if (filemap_range_has_page(mapping, pos, end))
+   return -EAGAIN;
+   } else {
+   ret = filemap_write_and_wait_range(mapping, pos, end);
+   if (ret)
+   return ret;
+   }
+
+   /*
+* After a write we want buffered reads to be sure to go to disk to get
+* the new data.  We invalidate clean cached page from the region we're
+* about to write.  We do this *before* the write so that we can return
+* without clobbering -EIOCBQUEUED from ->direct_IO().
+*/
+   return invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
+end >> PAGE_SHIFT);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -3820,30 +3847,11 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   /* If there are pages to writeback, return */
-   if (filemap_range_has_page(file->f_mapping, pos,
-  pos + write_len - 1))
-   return -EAGAIN;
-   } else {
-   written = filemap_write_and_wait_range(mapping, pos,
-   pos + write_len - 1);
-   if (written)
-   goto out;
-   }
-
-   /*
-* After a write we want buffered reads to be sure to go to disk to get
-* the new data.  We invalidate clean cached page from the region we're
-* about to write.  We do this *before* the write so that we can return
-* without clobbering -EIOCBQUEUED from ->direct_IO().
-*/
-   written = invalidate_inode_pages2_range(mapping,
-   pos >> PAGE_SHIFT, end);
/*
 * If a page can not be invalidated, return 0 to fall back
 * to buffered write.
 */
+   written = kiocb_invalidate_pages(iocb, write_len);
if (written) {
if (written == -EBUSY)
return 0;
-- 
2.39.2



[Cluster-devel] [PATCH 03/12] filemap: update ki_pos in generic_perform_write

2023-05-31 Thread Christoph Hellwig
All callers of generic_perform_write need to update ki_pos, so move the
update into common code.
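
The new tail of generic_perform_write() can be modelled in user space as below. This is a sketch under stated assumptions: struct model_kiocb and model_perform_write_tail() are hypothetical names, and only the ki_pos field is represented.

```c
#include <sys/types.h>	/* ssize_t, off_t */

/* Hypothetical stand-in for the one kiocb field this sketch needs. */
struct model_kiocb {
	off_t ki_pos;
};

/*
 * Model of the tail of generic_perform_write() after the patch: with no
 * bytes copied the status (0 or a negative errno) is returned and the
 * position is left alone; otherwise the position advances by exactly
 * the number of bytes written, and a late error is dropped.
 */
ssize_t model_perform_write_tail(struct model_kiocb *iocb, ssize_t written,
				 ssize_t status)
{
	if (!written)
		return status;
	iocb->ki_pos += written;
	return written;
}
```

With the update done here, callers that previously adjusted ki_pos themselves must drop their own increment, which is what the per-filesystem hunks in this patch do.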

Signed-off-by: Christoph Hellwig 
Reviewed-by: Xiubo Li 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 fs/ceph/file.c | 2 --
 fs/ext4/file.c | 9 +++--
 fs/f2fs/file.c | 1 -
 fs/nfs/file.c  | 1 -
 mm/filemap.c   | 8 
 5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c8ef72f723badd..767f4dfe7def64 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 * can not run at the same time
 */
written = generic_perform_write(iocb, from);
-   if (likely(written >= 0))
-   iocb->ki_pos = pos + written;
ceph_end_io_write(inode);
}
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc430270c23c19..ea0ada3985cba2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 
 out:
inode_unlock(inode);
-   if (likely(ret > 0)) {
-   iocb->ki_pos += ret;
-   ret = generic_write_sync(iocb, ret);
-   }
-
-   return ret;
+   if (unlikely(ret <= 0))
+   return ret;
+   return generic_write_sync(iocb, ret);
 }
 
 static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4f423d367a44b9..7134fe8bd008cb 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
ret = generic_perform_write(iocb, from);
 
if (ret > 0) {
-   iocb->ki_pos += ret;
f2fs_update_iostat(F2FS_I_SB(inode), inode,
APP_BUFFERED_IO, ret);
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 665ce3fc62eaf4..e8bb4c48a3210a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
goto out;
 
written = result;
-   iocb->ki_pos += written;
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
 
if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 33b54660ad2b39..15907af4a57ff5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
balance_dirty_pages_ratelimited(mapping);
} while (iov_iter_count(i));
 
-   return written ? written : status;
+   if (!written)
+   return status;
+   iocb->ki_pos += written;
+   return written;
 }
 EXPORT_SYMBOL(generic_perform_write);
 
@@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
endbyte = pos + status - 1;
err = filemap_write_and_wait_range(mapping, pos, endbyte);
if (err == 0) {
-   iocb->ki_pos = endbyte + 1;
written += status;
invalidate_mapping_pages(mapping,
 pos >> PAGE_SHIFT,
@@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
}
} else {
written = generic_perform_write(iocb, from);
-   if (likely(written > 0))
-   iocb->ki_pos += written;
}
 out:
return written ? written : err;
-- 
2.39.2



[Cluster-devel] [PATCH 8/8] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages

2023-05-31 Thread Christoph Hellwig
Use the common helpers for direct I/O page invalidation instead of
open coding the logic.  This leads to a slight reordering of checks
in __iomap_dio_rw to keep the logic straight.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/direct-io.c | 55 
 1 file changed, 20 insertions(+), 35 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0795c54a745bca..6bd14691f96e07 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -472,7 +472,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
unsigned int dio_flags, void *private, size_t done_before)
 {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = file_inode(iocb->ki_filp);
struct iomap_iter iomi = {
.inode  = inode,
@@ -481,11 +480,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
.flags  = IOMAP_DIRECT,
.private= private,
};
-   loff_t end = iomi.pos + iomi.len - 1, ret = 0;
bool wait_for_completion =
is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
struct blk_plug plug;
struct iomap_dio *dio;
+   loff_t ret = 0;
 
trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before);
 
@@ -509,31 +508,29 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->submit.waiter = current;
dio->submit.poll_bio = NULL;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   iomi.flags |= IOMAP_NOWAIT;
+
if (iov_iter_rw(iter) == READ) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iomi.pos,
-   end)) {
-   ret = -EAGAIN;
-   goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
-   }
-
if (user_backed_iter(iter))
dio->flags |= IOMAP_DIO_DIRTY;
+
+   ret = kiocb_write_and_wait(iocb, iomi.len);
+   if (ret)
+   goto out_free_dio;
} else {
iomi.flags |= IOMAP_WRITE;
dio->flags |= IOMAP_DIO_WRITE;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_has_page(mapping, iomi.pos, end)) {
-   ret = -EAGAIN;
+   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
+   ret = -EAGAIN;
+   if (iomi.pos >= dio->i_size ||
+   iomi.pos + iomi.len > dio->i_size)
goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
+   iomi.flags |= IOMAP_OVERWRITE_ONLY;
}
 
/* for data sync or sync, we need sync completion processing */
@@ -549,31 +546,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (!(iocb->ki_flags & IOCB_SYNC))
dio->flags |= IOMAP_DIO_WRITE_FUA;
}
-   }
-
-   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
-   ret = -EAGAIN;
-   if (iomi.pos >= dio->i_size ||
-   iomi.pos + iomi.len > dio->i_size)
-   goto out_free_dio;
-   iomi.flags |= IOMAP_OVERWRITE_ONLY;
-   }
 
-   ret = filemap_write_and_wait_range(mapping, iomi.pos, end);
-   if (ret)
-   goto out_free_dio;
-
-   if (iov_iter_rw(iter) == WRITE) {
/*
 * Try to invalidate cache pages for the range we are writing.
 * If this invalidation fails, let the caller fall back to
 * buffered I/O.
 */
-   if (invalidate_inode_pages2_range(mapping,
-   iomi.pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
-   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
-   iomi.len);
-   ret = -ENOTBLK;
+   ret = kiocb_invalidate_pages(iocb, iomi.len);
+   if (ret) {
+   if (ret != -EAGAIN) {
+   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
+   iomi.len);
+   ret = -ENOTBLK;
+   }
goto out_free_dio;
}
 
-- 
2.39.2



[Cluster-devel] cleanup the filemap / direct I/O interaction v3 (full series now)

2023-05-31 Thread Christoph Hellwig
[Sorry for the previous attempt that stopped at patch 8]

Hi all,

this series cleans up some of the generic write helper calling
conventions and the page cache writeback / invalidation for
direct I/O.  This is a spinoff from the no-bufferhead kernel
project, for which we'll want to use an iomap based buffered
write path in the block layer.

Changes since v2:
 - stick to the existing behavior of returning a short write
   if the buffer fallback write or sync fails
 - bring back "fuse: use direct_write_fallback" which accidentally
   got lost in v2

Changes since v1:
 - remove current->backing_dev_info entirely
 - fix the pos/end calculation in direct_write_fallback
 - rename kiocb_invalidate_post_write to
   kiocb_invalidate_post_direct_write
 - typo fixes

diffstat:
 block/fops.c|   18 +-
 fs/btrfs/file.c |6 --
 fs/ceph/file.c  |6 --
 fs/direct-io.c  |   10 ---
 fs/ext4/file.c  |   11 +---
 fs/f2fs/file.c  |3 -
 fs/fuse/file.c  |4 -
 fs/gfs2/file.c  |6 --
 fs/iomap/buffered-io.c  |9 ++-
 fs/iomap/direct-io.c|   88 -
 fs/nfs/file.c   |6 --
 fs/ntfs/file.c  |2 
 fs/ntfs3/file.c |3 -
 fs/xfs/xfs_file.c   |6 --
 fs/zonefs/file.c|4 -
 include/linux/fs.h  |5 -
 include/linux/pagemap.h |4 +
 include/linux/sched.h   |3 -
 mm/filemap.c|  126 ++--
 19 files changed, 125 insertions(+), 195 deletions(-)



[Cluster-devel] [PATCH 07/12] iomap: update ki_pos in iomap_file_buffered_write

2023-05-31 Thread Christoph Hellwig
All callers of iomap_file_buffered_write need to update ki_pos, so move it
into common code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Andreas Gruenbacher 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
Acked-by: Damien Le Moal 
---
 fs/gfs2/file.c | 4 +---
 fs/iomap/buffered-io.c | 9 ++---
 fs/xfs/xfs_file.c  | 2 --
 fs/zonefs/file.c   | 4 +---
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 904a0d6ac1a1a9..c6a7555d5ad8bb 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1044,10 +1044,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb 
*iocb,
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
-   if (ret > 0) {
-   iocb->ki_pos += ret;
+   if (ret > 0)
written += ret;
-   }
 
if (inode == sdp->sd_rindex)
gfs2_glock_dq_uninit(statfs_gh);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f49e..550525a525c45c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
iov_iter *i,
.len= iov_iter_count(i),
.flags  = IOMAP_WRITE,
};
-   int ret;
+   ssize_t ret;
 
if (iocb->ki_flags & IOCB_NOWAIT)
iter.flags |= IOMAP_NOWAIT;
 
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
-   if (iter.pos == iocb->ki_pos)
+
+   if (unlikely(ret < 0))
return ret;
-   return iter.pos - iocb->ki_pos;
+   ret = iter.pos - iocb->ki_pos;
+   iocb->ki_pos += ret;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 431c3fd0e2b598..d57443db633637 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -720,8 +720,6 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
-   if (likely(ret >= 0))
-   iocb->ki_pos += ret;
 
/*
 * If we hit a space limit, try to free up some lingering preallocated
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f14..e212d0636f848e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -643,9 +643,7 @@ static ssize_t zonefs_file_buffered_write(struct kiocb 
*iocb,
goto inode_unlock;
 
ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops);
-   if (ret > 0)
-   iocb->ki_pos += ret;
-   else if (ret == -EIO)
+   if (ret == -EIO)
zonefs_io_error(inode, true);
 
 inode_unlock:
-- 
2.39.2
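
For readers following along, the calling convention this patch introduces can
be modelled in userspace roughly as below.  This is only a sketch: struct
model_kiocb, model_buffered_write and its iter_pos/iter_ret parameters are
made-up stand-ins for the kiocb, the helper and the state left behind by the
iomap_iter() loop.  They are not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the one kiocb field that matters here. */
struct model_kiocb {
	int64_t ki_pos;
};

/*
 * Model of the tail of the helper after this patch.  iter_pos is where the
 * iteration stopped and iter_ret is the last value the loop condition saw
 * (0 on a clean finish, negative on error).  Both are assumptions standing
 * in for iter.pos and the iomap_iter() return value.
 */
static int64_t model_buffered_write(struct model_kiocb *iocb,
				    int64_t iter_pos, int64_t iter_ret)
{
	int64_t ret;

	assert(iter_ret <= 0);
	assert(iter_pos >= iocb->ki_pos);

	/* Errors are passed through and the position is left alone. */
	if (iter_ret < 0)
		return iter_ret;

	/* Otherwise the helper itself advances ki_pos by the bytes written. */
	ret = iter_pos - iocb->ki_pos;
	iocb->ki_pos += ret;
	return ret;
}
```

With the update done inside the helper, a caller only looks at the return
value and no longer touches ki_pos itself, which is what the gfs2, xfs and
zonefs hunks above drop.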



[Cluster-devel] [PATCH 7/8] iomap: update ki_pos in iomap_file_buffered_write

2023-05-31 Thread Christoph Hellwig
All callers of iomap_file_buffered_write need to update ki_pos, so move it
into common code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Andreas Gruenbacher 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Darrick J. Wong 
Acked-by: Damien Le Moal 
---
 fs/gfs2/file.c | 4 +---
 fs/iomap/buffered-io.c | 9 ++---
 fs/xfs/xfs_file.c  | 2 --
 fs/zonefs/file.c   | 4 +---
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 904a0d6ac1a1a9..c6a7555d5ad8bb 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1044,10 +1044,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb 
*iocb,
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
-   if (ret > 0) {
-   iocb->ki_pos += ret;
+   if (ret > 0)
written += ret;
-   }
 
if (inode == sdp->sd_rindex)
gfs2_glock_dq_uninit(statfs_gh);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f49e..550525a525c45c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
iov_iter *i,
.len= iov_iter_count(i),
.flags  = IOMAP_WRITE,
};
-   int ret;
+   ssize_t ret;
 
if (iocb->ki_flags & IOCB_NOWAIT)
iter.flags |= IOMAP_NOWAIT;
 
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
-   if (iter.pos == iocb->ki_pos)
+
+   if (unlikely(ret < 0))
return ret;
-   return iter.pos - iocb->ki_pos;
+   ret = iter.pos - iocb->ki_pos;
+   iocb->ki_pos += ret;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 431c3fd0e2b598..d57443db633637 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -720,8 +720,6 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
-   if (likely(ret >= 0))
-   iocb->ki_pos += ret;
 
/*
 * If we hit a space limit, try to free up some lingering preallocated
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f14..e212d0636f848e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -643,9 +643,7 @@ static ssize_t zonefs_file_buffered_write(struct kiocb 
*iocb,
goto inode_unlock;
 
ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops);
-   if (ret > 0)
-   iocb->ki_pos += ret;
-   else if (ret == -EIO)
+   if (ret == -EIO)
zonefs_io_error(inode, true);
 
 inode_unlock:
-- 
2.39.2



[Cluster-devel] [PATCH 04/12] filemap: add a kiocb_write_and_wait helper

2023-05-31 Thread Christoph Hellwig
Factor out a helper that does filemap_write_and_wait_range for the range
covered by a read kiocb, or returns -EAGAIN if the kiocb is marked as
nowait and there would be pages to write.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Darrick J. Wong 
---
 block/fops.c| 18 +++---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c| 30 ++
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 58d0aebc7313a8..575171049c5d83 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -576,21 +576,9 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct 
iov_iter *to)
goto reexpand; /* skip atime */
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, pos,
- pos + count - 1)) {
-   ret = -EAGAIN;
-   goto reexpand;
-   }
-   } else {
-   ret = filemap_write_and_wait_range(mapping, pos,
-  pos + count - 1);
-   if (ret < 0)
-   goto reexpand;
-   }
-
+   ret = kiocb_write_and_wait(iocb, count);
+   if (ret < 0)
+   goto reexpand;
file_accessed(iocb->ki_filp);
 
ret = blkdev_direct_IO(iocb, to);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a450..36fc2cea13ce20 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode 
*inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
 int filemap_flush(struct address_space *);
@@ -54,6 +55,7 @@ int filemap_check_errors(struct address_space *mapping);
 void __filemap_set_wb_err(struct address_space *mapping, int err);
 int filemap_fdatawrite_wbc(struct address_space *mapping,
   struct writeback_control *wbc);
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count);
 
 static inline int filemap_write_and_wait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 15907af4a57ff5..5fcd5227f9cae2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2762,6 +2762,21 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter 
*iter,
 }
 EXPORT_SYMBOL_GPL(filemap_read);
 
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (filemap_range_needs_writeback(mapping, pos, end))
+   return -EAGAIN;
+   return 0;
+   }
+
+   return filemap_write_and_wait_range(mapping, pos, end);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -2797,18 +2812,9 @@ generic_file_read_iter(struct kiocb *iocb, struct 
iov_iter *iter)
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
-   iocb->ki_pos + count - 1))
-   return -EAGAIN;
-   } else {
-   retval = filemap_write_and_wait_range(mapping,
-   iocb->ki_pos,
-   iocb->ki_pos + count - 1);
-   if (retval < 0)
-   return retval;
-   }
-
+   retval = kiocb_write_and_wait(iocb, count);
+   if (retval < 0)
+   return retval;
file_accessed(file);
 
retval = mapping->a_ops->direct_IO(iocb, iter);
-- 
2.39.2



[Cluster-devel] [PATCH 10/12] fuse: update ki_pos in fuse_perform_write

2023-05-31 Thread Christoph Hellwig
Both callers of fuse_perform_write need to update ki_pos, so move it into
common code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
 fs/fuse/file.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97d435874b14aa..e60e48bf392d49 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1329,7 +1329,10 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
fuse_write_update_attr(inode, pos, res);
clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
 
-   return res > 0 ? res : err;
+   if (!res)
+   return err;
+   iocb->ki_pos += res;
+   return res;
 }
 
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
@@ -1341,7 +1344,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
struct inode *inode = mapping->host;
ssize_t err;
struct fuse_conn *fc = get_fuse_conn(inode);
-   loff_t endbyte = 0;
 
if (fc->writeback_cache) {
/* Update size (EOF optimization) and mode (SUID clearing) */
@@ -1375,19 +1377,20 @@ static ssize_t fuse_cache_write_iter(struct kiocb 
*iocb, struct iov_iter *from)
goto out;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos = iocb->ki_pos;
+   loff_t pos, endbyte;
+
written = generic_file_direct_write(iocb, from);
if (written < 0 || !iov_iter_count(from))
goto out;
 
-   pos += written;
-
-   written_buffered = fuse_perform_write(iocb, mapping, from, pos);
+   written_buffered = fuse_perform_write(iocb, mapping, from,
+ iocb->ki_pos);
if (written_buffered < 0) {
err = written_buffered;
goto out;
}
-   endbyte = pos + written_buffered - 1;
+   pos = iocb->ki_pos - written_buffered;
+   endbyte = iocb->ki_pos - 1;
 
err = filemap_write_and_wait_range(file->f_mapping, pos,
   endbyte);
@@ -1399,17 +1402,11 @@ static ssize_t fuse_cache_write_iter(struct kiocb 
*iocb, struct iov_iter *from)
 endbyte >> PAGE_SHIFT);
 
written += written_buffered;
-   iocb->ki_pos = pos + written_buffered;
} else {
written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
-   if (written >= 0)
-   iocb->ki_pos += written;
}
 out:
inode_unlock(inode);
-   if (written > 0)
-   written = generic_write_sync(iocb, written);
-
return written ? written : err;
 }
 
-- 
2.39.2



[Cluster-devel] [PATCH 11/12] fuse: drop redundant arguments to fuse_perform_write

2023-05-31 Thread Christoph Hellwig
pos is always equal to iocb->ki_pos, and mapping is always equal to
iocb->ki_filp->f_mapping.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Acked-by: Miklos Szeredi 
---
 fs/fuse/file.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index e60e48bf392d49..025973ad813e05 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1280,13 +1280,13 @@ static inline unsigned int fuse_wr_pages(loff_t pos, 
size_t len,
 max_pages);
 }
 
-static ssize_t fuse_perform_write(struct kiocb *iocb,
- struct address_space *mapping,
- struct iov_iter *ii, loff_t pos)
+static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
 {
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
+   loff_t pos = iocb->ki_pos;
int err = 0;
ssize_t res = 0;
 
@@ -1383,8 +1383,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
if (written < 0 || !iov_iter_count(from))
goto out;
 
-   written_buffered = fuse_perform_write(iocb, mapping, from,
- iocb->ki_pos);
+   written_buffered = fuse_perform_write(iocb, from);
if (written_buffered < 0) {
err = written_buffered;
goto out;
@@ -1403,7 +1402,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
 
written += written_buffered;
} else {
-   written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
+   written = fuse_perform_write(iocb, from);
}
 out:
inode_unlock(inode);
-- 
2.39.2



Re: [Cluster-devel] [PATCH v6 08/20] jfs: logmgr: use __bio_add_page to add single page to bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 20/20] block: mark bio_add_folio as __must_check

2023-05-30 Thread Christoph Hellwig
On Tue, May 30, 2023 at 08:49:23AM -0700, Johannes Thumshirn wrote:
> +bool __must_check bio_add_folio(struct bio *, struct folio *, size_t len, 
> size_t off);

Please spell out the parameters and avoid the overly long line.



Re: [Cluster-devel] [PATCH v6 19/20] fs: iomap: use __bio_add_folio where possible

2023-05-30 Thread Christoph Hellwig
On Tue, May 30, 2023 at 08:49:22AM -0700, Johannes Thumshirn wrote:
> When the iomap buffered-io code can't add a folio to a bio, it allocates a
> new bio and adds the folio to that one. This is done using bio_add_folio(),
> but doesn't check for errors.
> 
> As adding a folio to a newly created bio can't fail, use the newly
> introduced __bio_add_folio() function.

Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 18/20] block: add __bio_add_folio

2023-05-30 Thread Christoph Hellwig
On Tue, May 30, 2023 at 08:49:21AM -0700, Johannes Thumshirn wrote:
> Just like for bio_add_pages() add a no-fail variant for bio_add_folio().

Can we call this bio_add_folio_nofail?  I really regret the __ prefix for
bio_add_page these days - it wasn't really intended to be used as widely
originally..

> +void __bio_add_folio(struct bio *, struct folio *, size_t len, size_t off);

.. and please spell out the parameters.



Re: [Cluster-devel] [PATCH v6 17/20] block: mark bio_add_page as __must_check

2023-05-30 Thread Christoph Hellwig
> +int __must_check bio_add_page(struct bio *, struct page *, unsigned len, 
> unsigned off);

Please spell out all parameters while you touch this, and also avoid the
overly long line.



Re: [Cluster-devel] [PATCH v6 15/20] md: raid1: check if adding pages to resync bio fails

2023-05-30 Thread Christoph Hellwig
To me these look like __bio_add_page candidates, but I guess Song
preferred it this way?  It'll add a bit of pointless boilerplate code,
but I'm ok with that.



Re: [Cluster-devel] [PATCH v6 10/20] zonefs: use __bio_add_page for adding single page to bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 14/20] md: raid1: use __bio_add_page for adding single page to bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 13/20] md: check for failure when adding pages in alloc_behind_master_bio

2023-05-30 Thread Christoph Hellwig
On Tue, May 30, 2023 at 08:49:16AM -0700, Johannes Thumshirn wrote:
> alloc_behind_master_bio() can possibly add multiple pages to a bio, but it
> is not checking for the return value of bio_add_page() if adding really
> succeeded.
> 
> Check if the page adding succeeded and if not bail out.
> 
> Reviewed-by: Damien Le Moal 
> Signed-off-by: Johannes Thumshirn 

Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 12/20] floppy: use __bio_add_page for adding single page to bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 11/20] zram: use __bio_add_page for adding single page to bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 09/20] gfs2: use __bio_add_page for adding single page to bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 07/20] md: raid5: use __bio_add_page to add single page to new bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 06/20] md: raid5-log: use __bio_add_page to add single page

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 04/20] fs: buffer: use __bio_add_page to add single page to bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 05/20] md: use __bio_add_page to add single page

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH v6 01/20] swap: use __bio_add_page to add page to bio

2023-05-30 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 



Re: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping

2023-05-26 Thread Christoph Hellwig
On Thu, May 25, 2023 at 04:50:39PM -0400, Kent Overstreet wrote:
> A cache that isn't actually consistent is a _bug_. You're being
> Obsequious. And any time this has come up in previous discussions
> (including at LSF), that was never up for debate, the only question has
> been whether it was even possible to practically fix it.

That is not my impression.  But again, if you think it is useful,
go ahead and sell people on the idea.  But please prepare a series
that includes the rationale, performance tradeoffs and real-life
implications for it.  And do it on the existing code that people use
and not just your shiny new thing.



Re: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping

2023-05-26 Thread Christoph Hellwig
On Thu, May 25, 2023 at 07:20:46PM -0400, Kent Overstreet wrote:
> > > I'm absolutely not in favour to add workarounds for thes kind of locking
> > > problems to the core kernel.  I already feel bad for allowing the
> > > small workaround in iomap for btrfs, as just fixing the locking back
> > > then would have avoid massive ratholing.
> > 
> > Please let me know when those btrfs changes are in a presentable shape ...
> 
> I would also be curious to know what btrfs needs and what the approach
> is there.

btrfs has the extent locked, where "extent locked" is a somewhat magic
range lock that actually includes different lock bits.  It does so
because it clears the page writeback bit when the data made it to the
media, but before the metadata required to find it is committed, and
the extent lock prevents it from trying to do a readpage on something
that has actually very recently been written back but not fully
committed.  Once btrfs is changed to only clear the page writeback bit
once the write is fully committed like in other file systems this extra
level of locking can go away, and there are no more locks in the
readpage path that are also taken by the direct I/O code.  With that
a lot of code in btrfs working around this can go away, including the
no fault direct I/O code.



Re: [Cluster-devel] [PATCH 09/11] fs: factor out a direct_write_fallback helper

2023-05-25 Thread Christoph Hellwig
On Wed, May 24, 2023 at 09:00:36AM +0200, Miklos Szeredi wrote:
> > +ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
> > +   ssize_t direct_written, ssize_t buffered_written)
> > +{
> > +   struct address_space *mapping = iocb->ki_filp->f_mapping;
> > +   loff_t pos = iocb->ki_pos - buffered_written;
> > +   loff_t end = iocb->ki_pos - 1;
> > +   int err;
> > +
> > +   /*
> > +* If the buffered write fallback returned an error, we want to 
> > return
> > +* the number of bytes which were written by direct I/O, or the 
> > error
> > +* code if that was zero.
> > +*
> > +* Note that this differs from normal direct-io semantics, which 
> > will
> > +* return -EFOO even if some bytes were written.
> > +*/
> > +   if (unlikely(buffered_written < 0))
> > +   return buffered_written;
> 
> Comment/code mismatch.   The comment says:
> 
> if (buffered_written < 0)
> return direct_written ?: buffered_written;

Yeah.  And the old code matches the comment, so I'll update to that.
I'm really wondering how I could come up with a good test case for
this..
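
As a starting point for such a test, the return value rule from the comment
can be written down as a tiny userspace model.  This is a sketch only:
fallback_result is a made-up name, it leaves out the writeback and
invalidation that the real helper does after a successful buffered write, and
it simply assumes that step succeeds.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Hypothetical model of the return value only.  direct_written is what the
 * direct I/O attempt managed, buffered_written is the result of the
 * buffered fallback (negative errno on failure).
 */
static ssize_t fallback_result(ssize_t direct_written, ssize_t buffered_written)
{
	assert(direct_written >= 0);

	/*
	 * Buffered fallback failed: report the bytes direct I/O wrote, or
	 * the error if direct I/O wrote nothing.
	 */
	if (buffered_written < 0)
		return direct_written ? direct_written : buffered_written;

	/* Assumed success path: both parts count towards the write. */
	return direct_written + buffered_written;
}
```

The interesting case for a real test is the first branch with a non-zero
direct_written, i.e. a direct write that only partially succeeds followed by a
buffered write that fails.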



Re: [Cluster-devel] [PATCH 10/11] fuse: update ki_pos in fuse_perform_write

2023-05-25 Thread Christoph Hellwig
On Wed, May 24, 2023 at 09:07:22AM +0200, Miklos Szeredi wrote:
> > -   endbyte = pos + written_buffered - 1;
> > +   endbyte = iocb->ki_pos + written_buffered - 1;
> 
> Wrong endpos.
> 
> >
> > -   err = filemap_write_and_wait_range(file->f_mapping, pos,
> > +   err = filemap_write_and_wait_range(file->f_mapping,
> > +  iocb->ki_pos,
> 
> Wrong startpos.

Yeah, fixed for the next version.
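
For reference, once fuse_perform_write has already advanced ki_pos, the range
to write back has to be derived backwards from the new position.  A minimal
userspace sketch of that arithmetic, where struct byte_range and
buffered_range are illustrative names and not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive byte range, mirroring the pos/endbyte pair in the patch. */
struct byte_range {
	int64_t start;
	int64_t end;
};

/*
 * ki_pos_after is the file position after the buffered write has advanced
 * it, written_buffered the number of bytes that write covered.
 */
static struct byte_range buffered_range(int64_t ki_pos_after,
					int64_t written_buffered)
{
	struct byte_range r;

	assert(written_buffered > 0);
	assert(ki_pos_after >= written_buffered);

	r.start = ki_pos_after - written_buffered;	/* first byte written */
	r.end = ki_pos_after - 1;			/* last byte written */
	return r;
}
```

Using ki_pos itself as the start, or ki_pos + written_buffered - 1 as the end,
would describe the range just past the data that was actually written.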



Re: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping

2023-05-25 Thread Christoph Hellwig
On Wed, May 24, 2023 at 04:09:02AM -0400, Kent Overstreet wrote:
> > Well, it seems like you are talking about something else than the
> > existing cases in gfs2 and btrfs, that is you want full consistency
> > between direct I/O and buffered I/O.  That's something nothing in the
> > kernel has ever provided, so I'd be curious why you think you need it
> > and want different semantics from everyone else?
> 
> Because I like code that is correct.

Well, start with explaining your definition of correctness, why everyone
else is "not correct", and how you can help fix this correctness
problem in the existing kernel.  Thanks for your cooperation!



[Cluster-devel] [PATCH 07/11] iomap: update ki_pos in iomap_file_buffered_write

2023-05-24 Thread Christoph Hellwig
All callers of iomap_file_buffered_write need to update ki_pos, so move it
into common code.

Signed-off-by: Christoph Hellwig 
Acked-by: Damien Le Moal 
Reviewed-by: Darrick J. Wong 
---
 fs/gfs2/file.c | 4 +---
 fs/iomap/buffered-io.c | 9 ++---
 fs/xfs/xfs_file.c  | 2 --
 fs/zonefs/file.c   | 4 +---
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 904a0d6ac1a1a9..c6a7555d5ad8bb 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1044,10 +1044,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb 
*iocb,
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
-   if (ret > 0) {
-   iocb->ki_pos += ret;
+   if (ret > 0)
written += ret;
-   }
 
if (inode == sdp->sd_rindex)
gfs2_glock_dq_uninit(statfs_gh);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f49e..550525a525c45c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
iov_iter *i,
.len= iov_iter_count(i),
.flags  = IOMAP_WRITE,
};
-   int ret;
+   ssize_t ret;
 
if (iocb->ki_flags & IOCB_NOWAIT)
iter.flags |= IOMAP_NOWAIT;
 
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
-   if (iter.pos == iocb->ki_pos)
+
+   if (unlikely(ret < 0))
return ret;
-   return iter.pos - iocb->ki_pos;
+   ret = iter.pos - iocb->ki_pos;
+   iocb->ki_pos += ret;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 431c3fd0e2b598..d57443db633637 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -720,8 +720,6 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
-   if (likely(ret >= 0))
-   iocb->ki_pos += ret;
 
/*
 * If we hit a space limit, try to free up some lingering preallocated
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f14..e212d0636f848e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -643,9 +643,7 @@ static ssize_t zonefs_file_buffered_write(struct kiocb 
*iocb,
goto inode_unlock;
 
ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops);
-   if (ret > 0)
-   iocb->ki_pos += ret;
-   else if (ret == -EIO)
+   if (ret == -EIO)
zonefs_io_error(inode, true);
 
 inode_unlock:
-- 
2.39.2



[Cluster-devel] [PATCH 02/11] iomap: update ki_pos a little later in iomap_dio_complete

2023-05-24 Thread Christoph Hellwig
Move the ki_pos update down a bit to prepare for a better common
helper that invalidates pages based on an iocb.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/direct-io.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 019cc87d0fb339..6207a59d2162e1 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -94,7 +94,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
if (offset + ret > dio->i_size &&
!(dio->flags & IOMAP_DIO_WRITE))
ret = dio->i_size - offset;
-   iocb->ki_pos += ret;
}
 
/*
@@ -120,19 +119,21 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
}
 
inode_dio_end(file_inode(iocb->ki_filp));
-   /*
-* If this is a DSYNC write, make sure we push it to stable storage now
-* that we've written data.
-*/
-   if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
-   ret = generic_write_sync(iocb, ret);
 
-   if (ret > 0)
-   ret += dio->done_before;
+   if (ret > 0) {
+   iocb->ki_pos += ret;
 
+   /*
+* If this is a DSYNC write, make sure we push it to stable
+* storage now that we've written data.
+*/
+   if (dio->flags & IOMAP_DIO_NEED_SYNC)
+   ret = generic_write_sync(iocb, ret);
+   if (ret > 0)
+   ret += dio->done_before;
+   }
trace_iomap_dio_complete(iocb, dio->error, ret);
kfree(dio);
-
return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
-- 
2.39.2



[Cluster-devel] [PATCH 08/11] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages

2023-05-24 Thread Christoph Hellwig
Use the common helpers for direct I/O page invalidation instead of
open coding the logic.  This leads to a slight reordering of checks
in __iomap_dio_rw to keep the logic straight.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Reviewed-by: Darrick J. Wong 
---
 fs/iomap/direct-io.c | 55 
 1 file changed, 20 insertions(+), 35 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0795c54a745bca..6bd14691f96e07 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -472,7 +472,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
unsigned int dio_flags, void *private, size_t done_before)
 {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = file_inode(iocb->ki_filp);
struct iomap_iter iomi = {
.inode  = inode,
@@ -481,11 +480,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
.flags  = IOMAP_DIRECT,
.private= private,
};
-   loff_t end = iomi.pos + iomi.len - 1, ret = 0;
bool wait_for_completion =
is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
struct blk_plug plug;
struct iomap_dio *dio;
+   loff_t ret = 0;
 
trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before);
 
@@ -509,31 +508,29 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->submit.waiter = current;
dio->submit.poll_bio = NULL;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   iomi.flags |= IOMAP_NOWAIT;
+
if (iov_iter_rw(iter) == READ) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iomi.pos,
-   end)) {
-   ret = -EAGAIN;
-   goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
-   }
-
if (user_backed_iter(iter))
dio->flags |= IOMAP_DIO_DIRTY;
+
+   ret = kiocb_write_and_wait(iocb, iomi.len);
+   if (ret)
+   goto out_free_dio;
} else {
iomi.flags |= IOMAP_WRITE;
dio->flags |= IOMAP_DIO_WRITE;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_has_page(mapping, iomi.pos, end)) {
-   ret = -EAGAIN;
+   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
+   ret = -EAGAIN;
+   if (iomi.pos >= dio->i_size ||
+   iomi.pos + iomi.len > dio->i_size)
goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
+   iomi.flags |= IOMAP_OVERWRITE_ONLY;
}
 
/* for data sync or sync, we need sync completion processing */
@@ -549,31 +546,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (!(iocb->ki_flags & IOCB_SYNC))
dio->flags |= IOMAP_DIO_WRITE_FUA;
}
-   }
-
-   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
-   ret = -EAGAIN;
-   if (iomi.pos >= dio->i_size ||
-   iomi.pos + iomi.len > dio->i_size)
-   goto out_free_dio;
-   iomi.flags |= IOMAP_OVERWRITE_ONLY;
-   }
 
-   ret = filemap_write_and_wait_range(mapping, iomi.pos, end);
-   if (ret)
-   goto out_free_dio;
-
-   if (iov_iter_rw(iter) == WRITE) {
/*
 * Try to invalidate cache pages for the range we are writing.
 * If this invalidation fails, let the caller fall back to
 * buffered I/O.
 */
-   if (invalidate_inode_pages2_range(mapping,
-   iomi.pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
-   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
-   iomi.len);
-   ret = -ENOTBLK;
+   ret = kiocb_invalidate_pages(iocb, iomi.len);
+   if (ret) {
+   if (ret != -EAGAIN) {
+   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
+   iomi.len);
+   ret = -ENOTBLK;
+   }
goto out_free_dio;
}
 
-- 
2.39.2
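
The error handling around the new kiocb_invalidate_pages() call in
__iomap_dio_rw can be modelled in user space.  This is only a sketch of the
mapping visible in the hunk above: map_invalidate_error() is a hypothetical
name, not kernel API, and the tracepoint is left out.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the return-code mapping after a failed
 * page cache invalidation in a direct write:
 *   0        -> carry on with the direct write
 *   -EAGAIN  -> pass through (nowait request would have blocked)
 *   other    -> -ENOTBLK, asking the caller to fall back to buffered I/O
 */
static int map_invalidate_error(int ret)
{
	assert(ret <= 0);	/* helpers return 0 or a negative errno */

	if (ret == 0 || ret == -EAGAIN)
		return ret;
	return -ENOTBLK;
}
```

Under these assumptions an -EBUSY from the invalidation turns into -ENOTBLK,
while -EAGAIN reaches the caller unchanged.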



[Cluster-devel] [PATCH 04/11] filemap: add a kiocb_write_and_wait helper

2023-05-24 Thread Christoph Hellwig
Factor out a helper that does filemap_write_and_wait_range for the range
covered by a read kiocb, or returns -EAGAIN if the kiocb is marked as
nowait and there would be pages to write.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Acked-by: Darrick J. Wong 
---
 block/fops.c| 18 +++---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c| 30 ++
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 58d0aebc7313a8..575171049c5d83 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -576,21 +576,9 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct 
iov_iter *to)
goto reexpand; /* skip atime */
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, pos,
- pos + count - 1)) {
-   ret = -EAGAIN;
-   goto reexpand;
-   }
-   } else {
-   ret = filemap_write_and_wait_range(mapping, pos,
-  pos + count - 1);
-   if (ret < 0)
-   goto reexpand;
-   }
-
+   ret = kiocb_write_and_wait(iocb, count);
+   if (ret < 0)
+   goto reexpand;
file_accessed(iocb->ki_filp);
 
ret = blkdev_direct_IO(iocb, to);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a450..36fc2cea13ce20 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode 
*inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
 int filemap_flush(struct address_space *);
@@ -54,6 +55,7 @@ int filemap_check_errors(struct address_space *mapping);
 void __filemap_set_wb_err(struct address_space *mapping, int err);
 int filemap_fdatawrite_wbc(struct address_space *mapping,
   struct writeback_control *wbc);
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count);
 
 static inline int filemap_write_and_wait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 15907af4a57ff5..5fcd5227f9cae2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2762,6 +2762,21 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter 
*iter,
 }
 EXPORT_SYMBOL_GPL(filemap_read);
 
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (filemap_range_needs_writeback(mapping, pos, end))
+   return -EAGAIN;
+   return 0;
+   }
+
+   return filemap_write_and_wait_range(mapping, pos, end);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -2797,18 +2812,9 @@ generic_file_read_iter(struct kiocb *iocb, struct 
iov_iter *iter)
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
-   iocb->ki_pos + count - 1))
-   return -EAGAIN;
-   } else {
-   retval = filemap_write_and_wait_range(mapping,
-   iocb->ki_pos,
-   iocb->ki_pos + count - 1);
-   if (retval < 0)
-   return retval;
-   }
-
+   retval = kiocb_write_and_wait(iocb, count);
+   if (retval < 0)
+   return retval;
file_accessed(file);
 
retval = mapping->a_ops->direct_IO(iocb, iter);
-- 
2.39.2
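
The decision the new helper makes can be modelled in user space.  This is a
rough sketch, not the kernel code: struct mini_mapping and
mini_write_and_wait() are hypothetical stand-ins, with a single "dirty" flag
standing in for filemap_range_needs_writeback() and a counter standing in
for filemap_write_and_wait_range().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address_space; not a kernel type. */
struct mini_mapping {
	bool dirty;		/* models "the range needs writeback" */
	int flush_calls;	/* counts blocking write-and-wait calls */
};

/*
 * Model of the helper's control flow: a nowait request never blocks and
 * reports -EAGAIN if there is anything to write back; a normal request
 * writes the range back and waits for it.
 */
static int mini_write_and_wait(struct mini_mapping *m, bool nowait)
{
	assert(m != NULL);

	if (nowait)
		return m->dirty ? -EAGAIN : 0;

	m->flush_calls++;	/* blocking path */
	m->dirty = false;
	return 0;
}
```

In this model a nowait read against a dirty range gets -EAGAIN without
touching the mapping, and succeeds once a blocking caller has flushed it.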



[Cluster-devel] [PATCH 09/11] fs: factor out a direct_write_fallback helper

2023-05-24 Thread Christoph Hellwig
Add a helper that handles the syncing of a buffered write fallback
for direct I/O.

Signed-off-by: Christoph Hellwig 
---
 fs/libfs.c | 36 +
 include/linux/fs.h |  2 ++
 mm/filemap.c   | 66 +++---
 3 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 89cf614a327158..ad37a49e2ecfb7 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1613,3 +1613,39 @@ u64 inode_query_iversion(struct inode *inode)
return cur >> I_VERSION_QUERIED_SHIFT;
 }
 EXPORT_SYMBOL(inode_query_iversion);
+
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+   ssize_t direct_written, ssize_t buffered_written)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos - buffered_written;
+   loff_t end = iocb->ki_pos - 1;
+   int err;
+
+   /*
+* If the buffered write fallback returned an error, we want to return
+* the number of bytes which were written by direct I/O, or the error
+* code if that was zero.
+*
+* Note that this differs from normal direct-io semantics, which will
+* return -EFOO even if some bytes were written.
+*/
+   if (unlikely(buffered_written < 0))
+   return buffered_written;
+
+   /*
+* We need to ensure that the page cache pages are written to disk and
+* invalidated to preserve the expected O_DIRECT semantics.
+*/
+   err = filemap_write_and_wait_range(mapping, pos, end);
+   if (err < 0) {
+   /*
+* We don't know how much we wrote, so just return the number of
+* bytes which were direct-written
+*/
+   return err;
+   }
+   invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+   return direct_written + buffered_written;
+}
+EXPORT_SYMBOL_GPL(direct_write_fallback);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e4efc1792a877a..576a945db178ef 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2738,6 +2738,8 @@ extern ssize_t __generic_file_write_iter(struct kiocb *, 
struct iov_iter *);
 extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
 ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+   ssize_t direct_written, ssize_t buffered_written);
 
 ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
rwf_t flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index ddb6f8aa86d6ca..137508da5525b6 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4006,23 +4006,19 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
 {
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
-   struct inode*inode = mapping->host;
-   ssize_t written = 0;
-   ssize_t err;
-   ssize_t status;
+   struct inode *inode = mapping->host;
+   ssize_t ret;
 
-   err = file_remove_privs(file);
-   if (err)
-   goto out;
+   ret = file_remove_privs(file);
+   if (ret)
+   return ret;
 
-   err = file_update_time(file);
-   if (err)
-   goto out;
+   ret = file_update_time(file);
+   if (ret)
+   return ret;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos, endbyte;
-
-   written = generic_file_direct_write(iocb, from);
+   ret = generic_file_direct_write(iocb, from);
/*
 * If the write stopped short of completing, fall back to
 * buffered writes.  Some filesystems do this for writes to
@@ -4030,45 +4026,13 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
 * not succeed (even if it did, DAX does not handle dirty
 * page-cache pages correctly).
 */
-   if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
-   goto out;
-
-   pos = iocb->ki_pos;
-   status = generic_perform_write(iocb, from);
-   /*
-* If generic_perform_write() returned a synchronous error
-* then we want to return the number of bytes which were
-* direct-written, or the error code if that was zero.  Note
-* that this differs from normal direct-io semantics, which
-* will return -EFOO even if some bytes were written.
-*/
-   if (unlikely(status < 0)) {
- 

[Cluster-devel] [PATCH 03/11] filemap: update ki_pos in generic_perform_write

2023-05-24 Thread Christoph Hellwig
All callers of generic_perform_write need to update ki_pos, so move that into
common code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Xiubo Li 
Reviewed-by: Damien Le Moal 
Acked-by: Darrick J. Wong 
---
 fs/ceph/file.c | 2 --
 fs/ext4/file.c | 9 +++--
 fs/f2fs/file.c | 1 -
 fs/nfs/file.c  | 1 -
 mm/filemap.c   | 8 
 5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c8ef72f723badd..767f4dfe7def64 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1891,8 +1891,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct 
iov_iter *from)
 * can not run at the same time
 */
written = generic_perform_write(iocb, from);
-   if (likely(written >= 0))
-   iocb->ki_pos = pos + written;
ceph_end_io_write(inode);
}
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index bc430270c23c19..ea0ada3985cba2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -289,12 +289,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 
 out:
inode_unlock(inode);
-   if (likely(ret > 0)) {
-   iocb->ki_pos += ret;
-   ret = generic_write_sync(iocb, ret);
-   }
-
-   return ret;
+   if (unlikely(ret <= 0))
+   return ret;
+   return generic_write_sync(iocb, ret);
 }
 
 static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4f423d367a44b9..7134fe8bd008cb 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4520,7 +4520,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb 
*iocb,
ret = generic_perform_write(iocb, from);
 
if (ret > 0) {
-   iocb->ki_pos += ret;
f2fs_update_iostat(F2FS_I_SB(inode), inode,
APP_BUFFERED_IO, ret);
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 665ce3fc62eaf4..e8bb4c48a3210a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -655,7 +655,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter 
*from)
goto out;
 
written = result;
-   iocb->ki_pos += written;
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
 
if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index 33b54660ad2b39..15907af4a57ff5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct 
iov_iter *i)
balance_dirty_pages_ratelimited(mapping);
} while (iov_iter_count(i));
 
-   return written ? written : status;
+   if (!written)
+   return status;
+   iocb->ki_pos += written;
+   return written;
 }
 EXPORT_SYMBOL(generic_perform_write);
 
@@ -4034,7 +4037,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
endbyte = pos + status - 1;
err = filemap_write_and_wait_range(mapping, pos, endbyte);
if (err == 0) {
-   iocb->ki_pos = endbyte + 1;
written += status;
invalidate_mapping_pages(mapping,
 pos >> PAGE_SHIFT,
@@ -4047,8 +4049,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
}
} else {
written = generic_perform_write(iocb, from);
-   if (likely(written > 0))
-   iocb->ki_pos += written;
}
 out:
return written ? written : err;
-- 
2.39.2
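
The return convention at the end of generic_perform_write after this patch
can be modelled in user space.  This is a sketch only: struct mini_kiocb and
finish_write() are hypothetical names, not kernel API, and the copy loop
that produces "written" and "status" is omitted.

```c
#include <assert.h>

/* Hypothetical miniature of struct kiocb: just the file position. */
struct mini_kiocb {
	long long ki_pos;
};

/*
 * Model of the tail of the write helper: "written" is the number of
 * bytes copied so far, "status" is the last error (or 0).
 */
static long long finish_write(struct mini_kiocb *iocb, long long written,
			      long long status)
{
	assert(written >= 0);

	if (!written)
		return status;		/* nothing copied: report status, leave ki_pos alone */
	iocb->ki_pos += written;	/* the helper advances the position itself */
	return written;			/* a short write takes precedence over a later error */
}
```

Callers in this model no longer touch ki_pos: a failed call that copied
nothing leaves the position where it was, and any positive return has
already advanced it by exactly that amount.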



[Cluster-devel] [PATCH 06/11] filemap: add a kiocb_invalidate_post_direct_write helper

2023-05-24 Thread Christoph Hellwig
Add a helper to invalidate page cache after a dio write.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Acked-by: Darrick J. Wong 
---
 fs/direct-io.c  | 10 ++
 fs/iomap/direct-io.c| 12 ++--
 include/linux/fs.h  |  5 -
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 37 -
 5 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0b380bb8a81e11..4f9069aee0fe19 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -285,14 +285,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, 
unsigned int flags)
 * zeros from unwritten extents.
 */
if (flags & DIO_COMPLETE_INVALIDATE &&
-   ret > 0 && dio_op == REQ_OP_WRITE &&
-   dio->inode->i_mapping->nrpages) {
-   err = invalidate_inode_pages2_range(dio->inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + ret - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(dio->iocb->ki_filp);
-   }
+   ret > 0 && dio_op == REQ_OP_WRITE)
+   kiocb_invalidate_post_direct_write(dio->iocb, ret);
 
inode_dio_end(dio->inode);
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6207a59d2162e1..0795c54a745bca 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -81,7 +81,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
const struct iomap_dio_ops *dops = dio->dops;
struct kiocb *iocb = dio->iocb;
-   struct inode *inode = file_inode(iocb->ki_filp);
loff_t offset = iocb->ki_pos;
ssize_t ret = dio->error;
 
@@ -108,15 +107,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 * ->end_io() when necessary, otherwise a racing buffer read would cache
 * zeros from unwritten extents.
 */
-   if (!dio->error && dio->size &&
-   (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
-   int err;
-   err = invalidate_inode_pages2_range(inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + dio->size - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(iocb->ki_filp);
-   }
+   if (!dio->error && dio->size && (dio->flags & IOMAP_DIO_WRITE))
+   kiocb_invalidate_post_direct_write(iocb, dio->size);
 
inode_dio_end(file_inode(iocb->ki_filp));
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a98168085641..e4efc1792a877a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2837,11 +2837,6 @@ static inline void inode_dio_end(struct inode *inode)
wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
 }
 
-/*
- * Warn about a page cache invalidation failure diring a direct I/O write.
- */
-void dio_warn_stale_pagecache(struct file *filp);
-
 extern void inode_set_flags(struct inode *inode, unsigned int flags,
unsigned int mask);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e4c9ee40baa99..6ecc4aaf5e3d51 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -31,6 +31,7 @@ int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
 int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index a1cb01a4b8046a..ddb6f8aa86d6ca 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3816,7 +3816,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
 /*
  * Warn about a page cache invalidation failure during a direct I/O write.
  */
-void dio_warn_stale_pagecache(struct file *filp)
+static void dio_warn_stale_pagecache(struct file *filp)
 {
static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
char pathname[128];
@@ -3833,19 +3833,23 @@ void dio_warn_stale_pagecache(struct file *filp)
}
 }
 
+void kiocb_invalidate_post_direct_write(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+   if (mapping->nrpages &&
+   invalidate_inode_pages2_range(mapping,
+   iocb->ki_pos >> PAGE_SHIFT,
+   (iocb->ki_pos + count - 1) >> PAGE_SHIFT))
+   dio_warn_stale_pagecache(iocb->ki_filp);
+}
+
 ssize_t
 generic_file_direct

[Cluster-devel] [PATCH 11/11] fuse: drop redundant arguments to fuse_perform_write

2023-05-24 Thread Christoph Hellwig
pos is always equal to iocb->ki_pos, and mapping is always equal to
iocb->ki_filp->f_mapping.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
 fs/fuse/file.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 90d587a7bdf813..bf48aae49daf56 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1280,13 +1280,13 @@ static inline unsigned int fuse_wr_pages(loff_t pos, 
size_t len,
 max_pages);
 }
 
-static ssize_t fuse_perform_write(struct kiocb *iocb,
- struct address_space *mapping,
- struct iov_iter *ii, loff_t pos)
+static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
 {
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
+   loff_t pos = iocb->ki_pos;
int err = 0;
ssize_t res = 0;
 
@@ -1382,8 +1382,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
if (written < 0 || !iov_iter_count(from))
goto out;
 
-   written_buffered = fuse_perform_write(iocb, mapping, from,
- iocb->ki_pos);
+   written_buffered = fuse_perform_write(iocb, from);
if (written_buffered < 0) {
err = written_buffered;
goto out;
@@ -1403,7 +1402,7 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
written += written_buffered;
iocb->ki_pos += written_buffered;
} else {
-   written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
+   written = fuse_perform_write(iocb, from);
}
 out:
inode_unlock(inode);
-- 
2.39.2



[Cluster-devel] cleanup the filemap / direct I/O interaction v2

2023-05-24 Thread Christoph Hellwig
Hi all,

this series cleans up some of the generic write helper calling
conventions and the page cache writeback / invalidation for
direct I/O.  This is a spinoff from the no-bufferhead kernel
project, for which we'll want to use an iomap based buffered
write path in the block layer.

Changes since v1:
 - remove current->backing_dev_info entirely
 - fix the pos/end calculation in direct_write_fallback
 - rename kiocb_invalidate_post_write to
   kiocb_invalidate_post_direct_write
 - typo fixes

diffstat:
 block/fops.c|   18 
 fs/btrfs/file.c |6 -
 fs/ceph/file.c  |6 -
 fs/direct-io.c  |   10 --
 fs/ext4/file.c  |   11 --
 fs/f2fs/file.c  |3 
 fs/fuse/file.c  |   36 +++-
 fs/gfs2/file.c  |6 -
 fs/iomap/buffered-io.c  |9 +-
 fs/iomap/direct-io.c|   88 -
 fs/libfs.c  |   36 
 fs/nfs/file.c   |6 -
 fs/ntfs/file.c  |2 
 fs/ntfs3/file.c |3 
 fs/xfs/xfs_file.c   |6 -
 fs/zonefs/file.c|4 
 include/linux/fs.h  |7 -
 include/linux/pagemap.h |4 
 include/linux/sched.h   |3 
 mm/filemap.c|  194 +---
 20 files changed, 193 insertions(+), 265 deletions(-)



[Cluster-devel] [PATCH 10/11] fuse: update ki_pos in fuse_perform_write

2023-05-24 Thread Christoph Hellwig
Both callers of fuse_perform_write need to update ki_pos, so move that into
common code.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
---
 fs/fuse/file.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97d435874b14aa..90d587a7bdf813 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1329,7 +1329,10 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
fuse_write_update_attr(inode, pos, res);
clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
 
-   return res > 0 ? res : err;
+   if (!res)
+   return err;
+   iocb->ki_pos += res;
+   return res;
 }
 
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
@@ -1375,41 +1378,35 @@ static ssize_t fuse_cache_write_iter(struct kiocb 
*iocb, struct iov_iter *from)
goto out;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos = iocb->ki_pos;
written = generic_file_direct_write(iocb, from);
if (written < 0 || !iov_iter_count(from))
goto out;
 
-   pos += written;
-
-   written_buffered = fuse_perform_write(iocb, mapping, from, pos);
+   written_buffered = fuse_perform_write(iocb, mapping, from,
+ iocb->ki_pos);
if (written_buffered < 0) {
err = written_buffered;
goto out;
}
-   endbyte = pos + written_buffered - 1;
+   endbyte = iocb->ki_pos + written_buffered - 1;
 
-   err = filemap_write_and_wait_range(file->f_mapping, pos,
+   err = filemap_write_and_wait_range(file->f_mapping,
+  iocb->ki_pos,
   endbyte);
if (err)
goto out;
 
invalidate_mapping_pages(file->f_mapping,
-pos >> PAGE_SHIFT,
+iocb->ki_pos >> PAGE_SHIFT,
 endbyte >> PAGE_SHIFT);
 
written += written_buffered;
-   iocb->ki_pos = pos + written_buffered;
+   iocb->ki_pos += written_buffered;
} else {
written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
-   if (written >= 0)
-   iocb->ki_pos += written;
}
 out:
inode_unlock(inode);
-   if (written > 0)
-   written = generic_write_sync(iocb, written);
-
return written ? written : err;
 }
 
-- 
2.39.2



[Cluster-devel] [PATCH 01/11] backing_dev: remove current->backing_dev_info

2023-05-24 Thread Christoph Hellwig
The last user of current->backing_dev_info disappeared in commit
b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
functions").  Remove the field and all assignments to it.

Signed-off-by: Christoph Hellwig 
---
 fs/btrfs/file.c   | 6 +-
 fs/ceph/file.c| 4 
 fs/ext4/file.c| 2 --
 fs/f2fs/file.c| 2 --
 fs/fuse/file.c| 4 
 fs/gfs2/file.c| 2 --
 fs/nfs/file.c | 5 +
 fs/ntfs/file.c| 2 --
 fs/ntfs3/file.c   | 3 ---
 fs/xfs/xfs_file.c | 4 
 include/linux/sched.h | 3 ---
 mm/filemap.c  | 3 ---
 12 files changed, 2 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f649647392e0e4..ecd43ab66fa6c7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1145,7 +1145,6 @@ static int btrfs_write_check(struct kiocb *iocb, struct 
iov_iter *from,
!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | 
BTRFS_INODE_PREALLOC)))
return -EAGAIN;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = file_remove_privs(file);
if (ret)
return ret;
@@ -1165,10 +1164,8 @@ static int btrfs_write_check(struct kiocb *iocb, struct 
iov_iter *from,
loff_t end_pos = round_up(pos + count, fs_info->sectorsize);
 
ret = btrfs_cont_expand(BTRFS_I(inode), oldsize, end_pos);
-   if (ret) {
-   current->backing_dev_info = NULL;
+   if (ret)
return ret;
-   }
}
 
return 0;
@@ -1689,7 +1686,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct 
iov_iter *from,
if (sync)
atomic_dec(&inode->sync_writers);
 
-   current->backing_dev_info = NULL;
return num_written;
 }
 
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f4d8bf7dec88a8..c8ef72f723badd 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1791,9 +1791,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct 
iov_iter *from)
else
ceph_start_io_write(inode);
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
if (iocb->ki_flags & IOCB_APPEND) {
err = ceph_do_getattr(inode, CEPH_STAT_CAP_SIZE, false);
if (err < 0)
@@ -1940,7 +1937,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct 
iov_iter *from)
ceph_end_io_write(inode);
 out_unlocked:
ceph_free_cap_flush(prealloc_cf);
-   current->backing_dev_info = NULL;
return written ? written : err;
 }
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d101b3b0c7dad8..bc430270c23c19 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -285,9 +285,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
if (ret <= 0)
goto out;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
-   current->backing_dev_info = NULL;
 
 out:
inode_unlock(inode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5ac53d2627d20d..4f423d367a44b9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4517,9 +4517,7 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb 
*iocb,
if (iocb->ki_flags & IOCB_NOWAIT)
return -EOPNOTSUPP;
 
-   current->backing_dev_info = inode_to_bdi(inode);
ret = generic_perform_write(iocb, from);
-   current->backing_dev_info = NULL;
 
if (ret > 0) {
iocb->ki_pos += ret;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89d97f6188e05e..97d435874b14aa 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1362,9 +1362,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
 writethrough:
inode_lock(inode);
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
err = generic_write_checks(iocb, from);
if (err <= 0)
goto out;
@@ -1409,7 +1406,6 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
iocb->ki_pos += written;
}
 out:
-   current->backing_dev_info = NULL;
inode_unlock(inode);
if (written > 0)
written = generic_write_sync(iocb, written);
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 300844f50dcd28..904a0d6ac1a1a9 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1041,11 +1041,9 @@ static ssize_t gfs2_file_buffered_write(struct kiocb 
*iocb,
goto out_unlock;
}
 
-   current->backing_dev_info = inode_to_bdi(inode);
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
-  

[Cluster-devel] [PATCH 05/11] filemap: add a kiocb_invalidate_pages helper

2023-05-24 Thread Christoph Hellwig
Factor out a helper that calls filemap_write_and_wait_range and
invalidate_inode_pages2_range for the range covered by a write kiocb or
returns -EAGAIN if the kiocb is marked as nowait and there would be pages
to write or invalidate.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Acked-by: Darrick J. Wong 
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 48 -
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36fc2cea13ce20..6e4c9ee40baa99 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode 
*inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 5fcd5227f9cae2..a1cb01a4b8046a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2777,6 +2777,33 @@ int kiocb_write_and_wait(struct kiocb *iocb, size_t 
count)
return filemap_write_and_wait_range(mapping, pos, end);
 }
 
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+   int ret;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   /* we could block if there are any pages in the range */
+   if (filemap_range_has_page(mapping, pos, end))
+   return -EAGAIN;
+   } else {
+   ret = filemap_write_and_wait_range(mapping, pos, end);
+   if (ret)
+   return ret;
+   }
+
+   /*
+* After a write we want buffered reads to be sure to go to disk to get
+* the new data.  We invalidate clean cached page from the region we're
+* about to write.  We do this *before* the write so that we can return
+* without clobbering -EIOCBQUEUED from ->direct_IO().
+*/
+   return invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
+end >> PAGE_SHIFT);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -3820,30 +3847,11 @@ generic_file_direct_write(struct kiocb *iocb, struct 
iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   /* If there are pages to writeback, return */
-   if (filemap_range_has_page(file->f_mapping, pos,
-  pos + write_len - 1))
-   return -EAGAIN;
-   } else {
-   written = filemap_write_and_wait_range(mapping, pos,
-   pos + write_len - 1);
-   if (written)
-   goto out;
-   }
-
-   /*
-* After a write we want buffered reads to be sure to go to disk to get
-* the new data.  We invalidate clean cached page from the region we're
-* about to write.  We do this *before* the write so that we can return
-* without clobbering -EIOCBQUEUED from ->direct_IO().
-*/
-   written = invalidate_inode_pages2_range(mapping,
-   pos >> PAGE_SHIFT, end);
/*
 * If a page can not be invalidated, return 0 to fall back
 * to buffered write.
 */
+   written = kiocb_invalidate_pages(iocb, write_len);
if (written) {
if (written == -EBUSY)
return 0;
-- 
2.39.2



Re: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping

2023-05-24 Thread Christoph Hellwig
On Tue, May 23, 2023 at 12:35:35PM -0400, Kent Overstreet wrote:
> No, this is fundamentally because userspace controls the ordering of
> locking because the buffer passed to dio can point into any address
> space. You can't solve this by changing the locking hierarchy.
> 
> If you want to be able to have locking around adding things to the
> pagecache so that things that bypass the pagecache can prevent
> inconsistencies (and we do, the big one is fcollapse), and if you want
> dio to be able to use that same locking (because otherwise dio will also
> cause page cache inconsistency), this is the way to do it.

Well, it seems like you are talking about something else than the
existing cases in gfs2 and btrfs, that is you want full consistency
between direct I/O and buffered I/O.  That's something nothing in the
kernel has ever provided, so I'd be curious why you think you need it
and want different semantics from everyone else?



Re: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping

2023-05-23 Thread Christoph Hellwig
On Tue, May 23, 2023 at 03:34:31PM +0200, Jan Kara wrote:
> I've checked the code and AFAICT it is all indeed handled. BTW, I've now
> remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25
> ("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different
> way (by prefaulting pages from the iter before grabbing the problematic
> lock and then disabling page faults for the iomap_dio_rw() call). I guess
> we should somehow unify these schemes so that we don't have two mechanisms
> for avoiding exactly the same deadlock. Adding GFS2 guys to CC.
> 
> Also good that you've written a fstest for this, that is definitely a useful
> addition, although I suspect GFS2 guys added a test for this not so long
> ago when testing their stuff. Maybe they have a pointer handy?

generic/708 is the btrfs version of this.

But I think all of the file systems that have this deadlock are actually
fundamentally broken because they have a messed-up locking hierarchy
where page faults take the same lock that is held over the direct I/O
operation.  And the right thing is to fix this.  I have work in progress
for btrfs, and something similar should apply to gfs2, with the added
complication that it probably means a revision to their network
protocol.

I'm absolutely not in favour of adding workarounds for these kinds of
locking problems to the core kernel.  I already feel bad for allowing the
small workaround in iomap for btrfs, as just fixing the locking back
then would have avoided massive ratholing.



Re: [Cluster-devel] [PATCH 10/13] fs: factor out a direct_write_fallback helper

2023-05-23 Thread Christoph Hellwig
On Mon, May 22, 2023 at 04:19:38PM +0200, Miklos Szeredi wrote:
> > +   ssize_t direct_written, ssize_t buffered_written)
> > +{
> > +   struct address_space *mapping = iocb->ki_filp->f_mapping;
> > +   loff_t pos = iocb->ki_pos, end;
> 
> At this point pos will point after the end of the buffered write (as
> per earlier patches), yes?

Yes.  I'll fix the pos and end calculation.
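
A minimal user-space sketch of what the corrected range calculation could
look like, assuming (as Miklos notes) that ki_pos already points past the
buffered write when the helper runs.  The struct and function names are
hypothetical, and the fix that lands in the series may look different:

```c
#include <stdint.h>

/* Hypothetical stand-in for the byte range covered by a buffered write. */
struct byte_range {
	int64_t pos;	/* first byte written */
	int64_t end;	/* last byte written (inclusive) */
};

/*
 * ki_pos_after is the file position after the buffered write has already
 * advanced it; buffered_written must be greater than zero.
 */
static struct byte_range buffered_range(int64_t ki_pos_after,
					int64_t buffered_written)
{
	struct byte_range r;

	r.pos = ki_pos_after - buffered_written;	/* step back to the start */
	r.end = ki_pos_after - 1;			/* inclusive end */
	return r;
}
```

For a 4096-byte buffered write that left ki_pos at 16384, this gives the
inclusive range 12288..16383, which is the range that would need to be
written back and invalidated.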



Re: [Cluster-devel] [PATCH 08/13] iomap: assign current->backing_dev_info in iomap_file_buffered_write

2023-05-23 Thread Christoph Hellwig
On Tue, May 23, 2023 at 04:30:51AM +0100, Matthew Wilcox wrote:
> AFAICT (the code went through some metamorphoses in the intervening
> twenty years), the last use of it ended up in current_may_throttle(),
> and it was removed in March 2022 by Neil Brown in commit b9b1335e6403.
> Since then, there have been no users of task->backing_dev_info, and I'm
> pretty sure it can go away.

Oh, nice.  I hadn't noticed it finally went away.  The next iteration
of the series will just remove it.



Re: [Cluster-devel] [PATCH 07/13] iomap: update ki_pos in iomap_file_buffered_write

2023-05-23 Thread Christoph Hellwig
On Mon, May 22, 2023 at 09:01:05AM +0900, Damien Le Moal wrote:
> > -   int ret;
> > +   ssize_t ret;
> >  
> > if (iocb->ki_flags & IOCB_NOWAIT)
> > iter.flags |= IOMAP_NOWAIT;
> >  
> > while ((ret = iomap_iter(&iter, ops)) > 0)
> > iter.processed = iomap_write_iter(&iter, i);
> > -   if (iter.pos == iocb->ki_pos)
> > +
> > +   if (unlikely(ret < 0))
> 
> Nit: This could be if (unlikely(ret <= 0)), no ?

No.  iomap_iter does not return the number of bytes written.
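
To illustrate why a zero return is not an error here, a toy user-space
model of the loop follows.  It assumes an iterator that returns 1 while
work remains, 0 when it is done and a negative errno on failure.  All
toy_* names are hypothetical and only mimic the shape of iomap_iter();
this is not the kernel implementation:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct iomap_iter. */
struct toy_iter {
	long long pos;		/* current file offset */
	size_t len;		/* bytes still to write */
	size_t processed;	/* bytes handled by the last loop body */
	int fail;		/* non-zero: simulate an I/O error */
};

/* Returns 1 while there is more to do, 0 when done, -EIO on failure. */
static int toy_iter_advance(struct toy_iter *it)
{
	if (it->fail)
		return -EIO;
	it->pos += it->processed;
	it->len -= it->processed;
	it->processed = 0;
	return it->len > 0;
}

/* chunk must be non-zero, otherwise the loop never makes progress. */
static ssize_t toy_buffered_write(long long *ki_pos, size_t count,
				  size_t chunk, int fail)
{
	struct toy_iter it = { .pos = *ki_pos, .len = count, .fail = fail };
	ssize_t written;
	int ret;

	while ((ret = toy_iter_advance(&it)) > 0)
		it.processed = it.len < chunk ? it.len : chunk;

	if (ret < 0)		/* only a negative value is an error */
		return ret;

	/* ret == 0 means "finished"; the byte count comes from pos. */
	written = it.pos - *ki_pos;
	*ki_pos += written;
	return written;
}
```

In this model a successful write always ends with the iterator returning
0, so treating ret <= 0 as the error path would drop the byte count of
every completed write.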



Re: [Cluster-devel] [PATCH 06/13] filemap: add a kiocb_invalidate_post_write helper

2023-05-23 Thread Christoph Hellwig
On Mon, May 22, 2023 at 08:56:34AM +0900, Damien Le Moal wrote:
> On 5/19/23 18:35, Christoph Hellwig wrote:
> > Add a helper to invalidate page cache after a dio write.
> > 
> > Signed-off-by: Christoph Hellwig 
> 
> Nit: kiocb_invalidate_post_dio_write() may be a better name to be explicit
> about the fact that this is for DIOs only ?

I've renamed it to kiocb_invalidate_post_direct_write, thanks.



[Cluster-devel] [PATCH 10/13] fs: factor out a direct_write_fallback helper

2023-05-19 Thread Christoph Hellwig
Add a helper that handles the syncing of a buffered write fallback
for direct I/O.

Signed-off-by: Christoph Hellwig 
---
 fs/libfs.c | 36 
 include/linux/fs.h |  2 ++
 mm/filemap.c   | 59 ++
 3 files changed, 50 insertions(+), 47 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 89cf614a327158..9f3791fc6e0715 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1613,3 +1613,39 @@ u64 inode_query_iversion(struct inode *inode)
return cur >> I_VERSION_QUERIED_SHIFT;
 }
 EXPORT_SYMBOL(inode_query_iversion);
+
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+   ssize_t direct_written, ssize_t buffered_written)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos, end;
+   int err;
+
+   /*
+* If the buffered write fallback returned an error, we want to return
+* the number of bytes which were written by direct I/O, or the error
+* code if that was zero.
+*
+* Note that this differs from normal direct-io semantics, which will
+* return -EFOO even if some bytes were written.
+*/
+   if (unlikely(buffered_written < 0))
+   return buffered_written;
+
+   /*
+* We need to ensure that the page cache pages are written to disk and
+* invalidated to preserve the expected O_DIRECT semantics.
+*/
+   end = pos + buffered_written - 1;
+   err = filemap_write_and_wait_range(mapping, pos, end);
+   if (err < 0) {
+   /*
+* We don't know how much we wrote, so just return the number of
+* bytes which were direct-written
+*/
+   return err;
+   }
+   invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
+   return direct_written + buffered_written;
+}
+EXPORT_SYMBOL_GPL(direct_write_fallback);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e4efc1792a877a..576a945db178ef 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2738,6 +2738,8 @@ extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *);
 extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *);
 ssize_t generic_perform_write(struct kiocb *, struct iov_iter *);
+ssize_t direct_write_fallback(struct kiocb *iocb, struct iov_iter *iter,
+   ssize_t direct_written, ssize_t buffered_written);
 
 ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos,
rwf_t flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index c1b988199aece5..875b2108d0a05f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4008,25 +4008,21 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
-   struct inode*inode = mapping->host;
-   ssize_t written = 0;
-   ssize_t err;
-   ssize_t status;
+   struct inode *inode = mapping->host;
+   ssize_t ret;
 
/* We can write back this queue in page reclaim */
current->backing_dev_info = inode_to_bdi(inode);
-   err = file_remove_privs(file);
-   if (err)
+   ret = file_remove_privs(file);
+   if (ret)
goto out;
 
-   err = file_update_time(file);
-   if (err)
+   ret = file_update_time(file);
+   if (ret)
goto out;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos, endbyte;
-
-   written = generic_file_direct_write(iocb, from);
+   ret = generic_file_direct_write(iocb, from);
/*
 * If the write stopped short of completing, fall back to
 * buffered writes.  Some filesystems do this for writes to
@@ -4034,46 +4030,15 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 * not succeed (even if it did, DAX does not handle dirty
 * page-cache pages correctly).
 */
-   if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
-   goto out;
-
-   pos = iocb->ki_pos;
-   status = generic_perform_write(iocb, from);
-   /*
-* If generic_perform_write() returned a synchronous error
-* then we want to return the number of bytes which were
-* direct-written, or the error code if that was zero.  Note
-* that this differs from normal direct-io semantics, which
-* will return -EFOO even if some bytes were written.
-*/
-   if 

[Cluster-devel] [PATCH 07/13] iomap: update ki_pos in iomap_file_buffered_write

2023-05-19 Thread Christoph Hellwig
All callers of iomap_file_buffered_write need to update ki_pos, so move it
into common code.

Signed-off-by: Christoph Hellwig 
---
 fs/gfs2/file.c | 4 +---
 fs/iomap/buffered-io.c | 9 ++---
 fs/xfs/xfs_file.c  | 2 --
 fs/zonefs/file.c   | 4 +---
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 300844f50dcd28..499ef174dec138 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1046,10 +1046,8 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
current->backing_dev_info = NULL;
-   if (ret > 0) {
-   iocb->ki_pos += ret;
+   if (ret > 0)
written += ret;
-   }
 
if (inode == sdp->sd_rindex)
gfs2_glock_dq_uninit(statfs_gh);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 063133ec77f49e..550525a525c45c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -864,16 +864,19 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
.len= iov_iter_count(i),
.flags  = IOMAP_WRITE,
};
-   int ret;
+   ssize_t ret;
 
if (iocb->ki_flags & IOCB_NOWAIT)
iter.flags |= IOMAP_NOWAIT;
 
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
-   if (iter.pos == iocb->ki_pos)
+
+   if (unlikely(ret < 0))
return ret;
-   return iter.pos - iocb->ki_pos;
+   ret = iter.pos - iocb->ki_pos;
+   iocb->ki_pos += ret;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index aede746541f8ae..bfba10e0b0f3c2 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -723,8 +723,6 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
-   if (likely(ret >= 0))
-   iocb->ki_pos += ret;
 
/*
 * If we hit a space limit, try to free up some lingering preallocated
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 132f01d3461f14..e212d0636f848e 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -643,9 +643,7 @@ static ssize_t zonefs_file_buffered_write(struct kiocb *iocb,
goto inode_unlock;
 
ret = iomap_file_buffered_write(iocb, from, &zonefs_write_iomap_ops);
-   if (ret > 0)
-   iocb->ki_pos += ret;
-   else if (ret == -EIO)
+   if (ret == -EIO)
zonefs_io_error(inode, true);
 
 inode_unlock:
-- 
2.39.2



[Cluster-devel] [PATCH 06/13] filemap: add a kiocb_invalidate_post_write helper

2023-05-19 Thread Christoph Hellwig
Add a helper to invalidate page cache after a dio write.

Signed-off-by: Christoph Hellwig 
---
 fs/direct-io.c  | 10 ++
 fs/iomap/direct-io.c| 12 ++--
 include/linux/fs.h  |  5 -
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 37 -
 5 files changed, 25 insertions(+), 40 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0b380bb8a81e11..c25d68eabf4281 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -285,14 +285,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
 * zeros from unwritten extents.
 */
if (flags & DIO_COMPLETE_INVALIDATE &&
-   ret > 0 && dio_op == REQ_OP_WRITE &&
-   dio->inode->i_mapping->nrpages) {
-   err = invalidate_inode_pages2_range(dio->inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + ret - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(dio->iocb->ki_filp);
-   }
+   ret > 0 && dio_op == REQ_OP_WRITE)
+   kiocb_invalidate_post_write(dio->iocb, ret);
 
inode_dio_end(dio->inode);
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6207a59d2162e1..45accd98344e79 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -81,7 +81,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
const struct iomap_dio_ops *dops = dio->dops;
struct kiocb *iocb = dio->iocb;
-   struct inode *inode = file_inode(iocb->ki_filp);
loff_t offset = iocb->ki_pos;
ssize_t ret = dio->error;
 
@@ -108,15 +107,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 * ->end_io() when necessary, otherwise a racing buffer read would cache
 * zeros from unwritten extents.
 */
-   if (!dio->error && dio->size &&
-   (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
-   int err;
-   err = invalidate_inode_pages2_range(inode->i_mapping,
-   offset >> PAGE_SHIFT,
-   (offset + dio->size - 1) >> PAGE_SHIFT);
-   if (err)
-   dio_warn_stale_pagecache(iocb->ki_filp);
-   }
+   if (!dio->error && dio->size && (dio->flags & IOMAP_DIO_WRITE))
+   kiocb_invalidate_post_write(iocb, dio->size);
 
inode_dio_end(file_inode(iocb->ki_filp));
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a98168085641..e4efc1792a877a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2837,11 +2837,6 @@ static inline void inode_dio_end(struct inode *inode)
wake_up_bit(&inode->i_state, __I_DIO_WAKEUP);
 }
 
-/*
- * Warn about a page cache invalidation failure diring a direct I/O write.
- */
-void dio_warn_stale_pagecache(struct file *filp);
-
 extern void inode_set_flags(struct inode *inode, unsigned int flags,
unsigned int mask);
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e4c9ee40baa99..9695730ea86a98 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -31,6 +31,7 @@ int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
 int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
+void kiocb_invalidate_post_write(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 8607220e20eae3..c1b988199aece5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3816,7 +3816,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
 /*
  * Warn about a page cache invalidation failure during a direct I/O write.
  */
-void dio_warn_stale_pagecache(struct file *filp)
+static void dio_warn_stale_pagecache(struct file *filp)
 {
static DEFINE_RATELIMIT_STATE(_rs, 86400 * HZ, DEFAULT_RATELIMIT_BURST);
char pathname[128];
@@ -3833,19 +3833,23 @@ void dio_warn_stale_pagecache(struct file *filp)
}
 }
 
+void kiocb_invalidate_post_write(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+
+   if (mapping->nrpages &&
+   invalidate_inode_pages2_range(mapping,
+   iocb->ki_pos >> PAGE_SHIFT,
+   (iocb->ki_pos + count - 1) >> PAGE_SHIFT))
+   dio_warn_stale_pagecache(iocb->ki_filp);
+}
+
 ssize_t
 generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
-   struct file

[Cluster-devel] cleanup the filemap / direct I/O interaction

2023-05-19 Thread Christoph Hellwig
Hi all,

this series cleans up some of the generic write helper calling
conventions and the page cache writeback / invalidation for
direct I/O.  This is a spinoff from the no-bufferhead kernel
project, for which we'll want to use an iomap-based buffered
write path in the block layer.

diffstat:
 block/fops.c|   18 
 fs/ceph/file.c  |6 -
 fs/direct-io.c  |   10 --
 fs/ext4/file.c  |   12 ---
 fs/f2fs/file.c  |3 
 fs/fuse/file.c  |   47 ++--
 fs/gfs2/file.c  |7 -
 fs/iomap/buffered-io.c  |   12 ++-
 fs/iomap/direct-io.c|   88 --
 fs/libfs.c  |   36 +
 fs/nfs/file.c   |6 -
 fs/xfs/xfs_file.c   |7 -
 fs/zonefs/file.c|4 -
 include/linux/fs.h  |7 -
 include/linux/pagemap.h |4 +
 mm/filemap.c|  184 +---
 16 files changed, 190 insertions(+), 261 deletions(-)



[Cluster-devel] [PATCH 11/13] fuse: update ki_pos in fuse_perform_write

2023-05-19 Thread Christoph Hellwig
Both callers of fuse_perform_write need to update ki_pos, so move it into
common code.

Signed-off-by: Christoph Hellwig 
---
 fs/fuse/file.c | 25 +++--
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 89d97f6188e05e..fd2f27f2144750 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1329,7 +1329,10 @@ static ssize_t fuse_perform_write(struct kiocb *iocb,
fuse_write_update_attr(inode, pos, res);
clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
 
-   return res > 0 ? res : err;
+   if (!res)
+   return err;
+   iocb->ki_pos += res;
+   return res;
 }
 
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
@@ -1378,42 +1381,36 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
goto out;
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   loff_t pos = iocb->ki_pos;
written = generic_file_direct_write(iocb, from);
if (written < 0 || !iov_iter_count(from))
goto out;
 
-   pos += written;
-
-   written_buffered = fuse_perform_write(iocb, mapping, from, pos);
+   written_buffered = fuse_perform_write(iocb, mapping, from,
+ iocb->ki_pos);
if (written_buffered < 0) {
err = written_buffered;
goto out;
}
-   endbyte = pos + written_buffered - 1;
+   endbyte = iocb->ki_pos + written_buffered - 1;
 
-   err = filemap_write_and_wait_range(file->f_mapping, pos,
+   err = filemap_write_and_wait_range(file->f_mapping,
+  iocb->ki_pos,
   endbyte);
if (err)
goto out;
 
invalidate_mapping_pages(file->f_mapping,
-pos >> PAGE_SHIFT,
+iocb->ki_pos >> PAGE_SHIFT,
 endbyte >> PAGE_SHIFT);
 
written += written_buffered;
-   iocb->ki_pos = pos + written_buffered;
+   iocb->ki_pos += written_buffered;
} else {
written = fuse_perform_write(iocb, mapping, from, iocb->ki_pos);
-   if (written >= 0)
-   iocb->ki_pos += written;
}
 out:
current->backing_dev_info = NULL;
inode_unlock(inode);
-   if (written > 0)
-   written = generic_write_sync(iocb, written);
-
return written ? written : err;
 }
 
-- 
2.39.2



[Cluster-devel] [PATCH 05/13] filemap: add a kiocb_invalidate_pages helper

2023-05-19 Thread Christoph Hellwig
Factor out a helper that calls filemap_write_and_wait_range and
invalidate_inode_pages2_range for the range covered by a write kiocb, or
returns -EAGAIN if the kiocb is marked as nowait and there would be pages
to write or invalidate.

Signed-off-by: Christoph Hellwig 
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c| 48 -
 2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 36fc2cea13ce20..6e4c9ee40baa99 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count);
 
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
diff --git a/mm/filemap.c b/mm/filemap.c
index 2d7712b13b95c9..8607220e20eae3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2777,6 +2777,33 @@ int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
return filemap_write_and_wait_range(mapping, pos, end);
 }
 
+int kiocb_invalidate_pages(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+   int ret;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   /* we could block if there are any pages in the range */
+   if (filemap_range_has_page(mapping, pos, end))
+   return -EAGAIN;
+   } else {
+   ret = filemap_write_and_wait_range(mapping, pos, end);
+   if (ret)
+   return ret;
+   }
+
+   /*
+* After a write we want buffered reads to be sure to go to disk to get
+* the new data.  We invalidate clean cached page from the region we're
+* about to write.  We do this *before* the write so that we can return
+* without clobbering -EIOCBQUEUED from ->direct_IO().
+*/
+   return invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
+end >> PAGE_SHIFT);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -3820,30 +3847,11 @@ generic_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
write_len = iov_iter_count(from);
end = (pos + write_len - 1) >> PAGE_SHIFT;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   /* If there are pages to writeback, return */
-   if (filemap_range_has_page(file->f_mapping, pos,
-  pos + write_len - 1))
-   return -EAGAIN;
-   } else {
-   written = filemap_write_and_wait_range(mapping, pos,
-   pos + write_len - 1);
-   if (written)
-   goto out;
-   }
-
-   /*
-* After a write we want buffered reads to be sure to go to disk to get
-* the new data.  We invalidate clean cached page from the region we're
-* about to write.  We do this *before* the write so that we can return
-* without clobbering -EIOCBQUEUED from ->direct_IO().
-*/
-   written = invalidate_inode_pages2_range(mapping,
-   pos >> PAGE_SHIFT, end);
/*
 * If a page can not be invalidated, return 0 to fall back
 * to buffered write.
 */
+   written = kiocb_invalidate_pages(iocb, write_len);
if (written) {
if (written == -EBUSY)
return 0;
-- 
2.39.2
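
The helper above works on an inclusive byte range and converts it to page
indices for the invalidation call.  A small user-space sketch of that
arithmetic, assuming 4 KiB pages; TOY_PAGE_SHIFT and the toy_* names are
hypothetical and stand in for PAGE_SHIFT and the kernel types:

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical pair of page indices, both inclusive. */
struct toy_page_span {
	unsigned long first;
	unsigned long last;
};

/* Map the byte range [pos, pos + count - 1] to page indices; count > 0. */
static struct toy_page_span toy_span_of(long long pos, size_t count)
{
	long long end = pos + (long long)count - 1;	/* inclusive last byte */
	struct toy_page_span s = {
		.first = (unsigned long)(pos >> TOY_PAGE_SHIFT),
		.last = (unsigned long)(end >> TOY_PAGE_SHIFT),
	};

	return s;
}
```

A two-byte write at offset 4095 straddles a page boundary and so touches
pages 0 and 1, while a page-aligned 4096-byte write at offset 4096 touches
only page 1.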



[Cluster-devel] [PATCH 02/13] filemap: update ki_pos in generic_perform_write

2023-05-19 Thread Christoph Hellwig
All callers of generic_perform_write need to update ki_pos, so move it into
common code.

Signed-off-by: Christoph Hellwig 
---
 fs/ceph/file.c | 2 --
 fs/ext4/file.c | 9 +++--
 fs/f2fs/file.c | 1 -
 fs/nfs/file.c  | 1 -
 mm/filemap.c   | 8 
 5 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index f4d8bf7dec88a8..feeb9882ef635a 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1894,8 +1894,6 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from)
 * can not run at the same time
 */
written = generic_perform_write(iocb, from);
-   if (likely(written >= 0))
-   iocb->ki_pos = pos + written;
ceph_end_io_write(inode);
}
 
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index d101b3b0c7dad8..50824831d31def 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -291,12 +291,9 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb,
 
 out:
inode_unlock(inode);
-   if (likely(ret > 0)) {
-   iocb->ki_pos += ret;
-   ret = generic_write_sync(iocb, ret);
-   }
-
-   return ret;
+   if (unlikely(ret <= 0))
+   return ret;
+   return generic_write_sync(iocb, ret);
 }
 
 static ssize_t ext4_handle_inode_extension(struct inode *inode, loff_t offset,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5ac53d2627d20d..9e3855e43a7a63 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4522,7 +4522,6 @@ static ssize_t f2fs_buffered_write_iter(struct kiocb *iocb,
current->backing_dev_info = NULL;
 
if (ret > 0) {
-   iocb->ki_pos += ret;
f2fs_update_iostat(F2FS_I_SB(inode), inode,
APP_BUFFERED_IO, ret);
}
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index f0edf5a36237d1..3cc87ae8473356 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -658,7 +658,6 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
goto out;
 
written = result;
-   iocb->ki_pos += written;
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
 
if (mntflags & NFS_MOUNT_WRITE_EAGER) {
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e58..4d0ec2fa1c7070 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3957,7 +3957,10 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
balance_dirty_pages_ratelimited(mapping);
} while (iov_iter_count(i));
 
-   return written ? written : status;
+   if (!written)
+   return status;
+   iocb->ki_pos += written;
+   return written;
 }
 EXPORT_SYMBOL(generic_perform_write);
 
@@ -4036,7 +4039,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
endbyte = pos + status - 1;
err = filemap_write_and_wait_range(mapping, pos, endbyte);
if (err == 0) {
-   iocb->ki_pos = endbyte + 1;
written += status;
invalidate_mapping_pages(mapping,
 pos >> PAGE_SHIFT,
@@ -4049,8 +4051,6 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
}
} else {
written = generic_perform_write(iocb, from);
-   if (likely(written > 0))
-   iocb->ki_pos += written;
}
 out:
current->backing_dev_info = NULL;
-- 
2.39.2
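
The return convention this patch moves into the helper can be modelled in
user space as below.  This is only a sketch: toy_finish_write and its
parameters are hypothetical names, and a plain long long stands in for
iocb->ki_pos:

```c
#include <errno.h>
#include <sys/types.h>

/*
 * Model of the tail of the write helper: partial progress wins over a
 * later error, and the position only moves when bytes were written.
 */
static ssize_t toy_finish_write(long long *ki_pos, ssize_t written,
				int status)
{
	if (!written)
		return status;		/* nothing written: report the status */
	*ki_pos += written;		/* advance once, in the common helper */
	return written;
}
```

With the update done here, a caller no longer needs its own
"if (ret > 0) iocb->ki_pos += ret;" step, which is what the per-filesystem
hunks above remove.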



[Cluster-devel] [PATCH 08/13] iomap: assign current->backing_dev_info in iomap_file_buffered_write

2023-05-19 Thread Christoph Hellwig
Move the assignment to current->backing_dev_info from the callers into
iomap_file_buffered_write to reduce boilerplate code and reduce the
scope to just around the page dirtying loop.

Note that zonefs was missing this assignment before.

Signed-off-by: Christoph Hellwig 
---
 fs/gfs2/file.c | 3 ---
 fs/iomap/buffered-io.c | 3 +++
 fs/xfs/xfs_file.c  | 5 -
 3 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 499ef174dec138..261897fcfbc495 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "gfs2.h"
@@ -1041,11 +1040,9 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
goto out_unlock;
}
 
-   current->backing_dev_info = inode_to_bdi(inode);
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
pagefault_enable();
-   current->backing_dev_info = NULL;
if (ret > 0)
written += ret;
 
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 550525a525c45c..b2779bd1f10611 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2010 Red Hat, Inc.
  * Copyright (C) 2016-2019 Christoph Hellwig.
  */
+#include 
 #include 
 #include 
 #include 
@@ -869,8 +870,10 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
if (iocb->ki_flags & IOCB_NOWAIT)
iter.flags |= IOMAP_NOWAIT;
 
+   current->backing_dev_info = inode_to_bdi(iter.inode);
while ((ret = iomap_iter(&iter, ops)) > 0)
iter.processed = iomap_write_iter(&iter, i);
+   current->backing_dev_info = NULL;
 
if (unlikely(ret < 0))
return ret;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index bfba10e0b0f3c2..98d763cc3b114c 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -27,7 +27,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -717,9 +716,6 @@ xfs_file_buffered_write(
if (ret)
goto out;
 
-   /* We can write back this queue in page reclaim */
-   current->backing_dev_info = inode_to_bdi(inode);
-
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
@@ -751,7 +747,6 @@ xfs_file_buffered_write(
goto write_retry;
}
 
-   current->backing_dev_info = NULL;
 out:
if (iolock)
xfs_iunlock(ip, iolock);
-- 
2.39.2



[Cluster-devel] [PATCH 09/13] iomap: use kiocb_write_and_wait and kiocb_invalidate_pages

2023-05-19 Thread Christoph Hellwig
Use the common helpers for direct I/O page invalidation instead of
open coding the logic.  This leads to a slight reordering of checks
in __iomap_dio_rw to keep the logic straight.

Signed-off-by: Christoph Hellwig 
---
 fs/iomap/direct-io.c | 55 
 1 file changed, 20 insertions(+), 35 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 45accd98344e79..ccf51d57619721 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -472,7 +472,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
unsigned int dio_flags, void *private, size_t done_before)
 {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = file_inode(iocb->ki_filp);
struct iomap_iter iomi = {
.inode  = inode,
@@ -481,11 +480,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
.flags  = IOMAP_DIRECT,
.private= private,
};
-   loff_t end = iomi.pos + iomi.len - 1, ret = 0;
bool wait_for_completion =
is_sync_kiocb(iocb) || (dio_flags & IOMAP_DIO_FORCE_WAIT);
struct blk_plug plug;
struct iomap_dio *dio;
+   loff_t ret = 0;
 
trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before);
 
@@ -509,31 +508,29 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->submit.waiter = current;
dio->submit.poll_bio = NULL;
 
+   if (iocb->ki_flags & IOCB_NOWAIT)
+   iomi.flags |= IOMAP_NOWAIT;
+
if (iov_iter_rw(iter) == READ) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iomi.pos,
-   end)) {
-   ret = -EAGAIN;
-   goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
-   }
-
if (user_backed_iter(iter))
dio->flags |= IOMAP_DIO_DIRTY;
+
+   ret = kiocb_write_and_wait(iocb, iomi.len);
+   if (ret)
+   goto out_free_dio;
} else {
iomi.flags |= IOMAP_WRITE;
dio->flags |= IOMAP_DIO_WRITE;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_has_page(mapping, iomi.pos, end)) {
-   ret = -EAGAIN;
+   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
+   ret = -EAGAIN;
+   if (iomi.pos >= dio->i_size ||
+   iomi.pos + iomi.len > dio->i_size)
goto out_free_dio;
-   }
-   iomi.flags |= IOMAP_NOWAIT;
+   iomi.flags |= IOMAP_OVERWRITE_ONLY;
}
 
/* for data sync or sync, we need sync completion processing */
@@ -549,31 +546,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (!(iocb->ki_flags & IOCB_SYNC))
dio->flags |= IOMAP_DIO_WRITE_FUA;
}
-   }
-
-   if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
-   ret = -EAGAIN;
-   if (iomi.pos >= dio->i_size ||
-   iomi.pos + iomi.len > dio->i_size)
-   goto out_free_dio;
-   iomi.flags |= IOMAP_OVERWRITE_ONLY;
-   }
 
-   ret = filemap_write_and_wait_range(mapping, iomi.pos, end);
-   if (ret)
-   goto out_free_dio;
-
-   if (iov_iter_rw(iter) == WRITE) {
/*
 * Try to invalidate cache pages for the range we are writing.
 * If this invalidation fails, let the caller fall back to
 * buffered I/O.
 */
-   if (invalidate_inode_pages2_range(mapping,
-   iomi.pos >> PAGE_SHIFT, end >> PAGE_SHIFT)) {
-   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
-   iomi.len);
-   ret = -ENOTBLK;
+   ret = kiocb_invalidate_pages(iocb, iomi.len);
+   if (ret) {
+   if (ret != -EAGAIN) {
+   trace_iomap_dio_invalidate_fail(inode, iomi.pos,
+   iomi.len);
+   ret = -ENOTBLK;
+   }
goto out_free_dio;
}
 
-- 
2.39.2
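
The error handling in the last hunk can be summarised by a small mapping
function.  The sketch below is a user-space model with a hypothetical
name; it only restates the branch shown in the patch and leaves out the
tracepoint:

```c
#include <errno.h>

/*
 * Model of how the invalidation result is passed on: success and -EAGAIN
 * are returned unchanged, every other failure becomes -ENOTBLK so that
 * the caller can fall back to buffered I/O.
 */
static int toy_map_invalidate_error(int ret)
{
	if (ret && ret != -EAGAIN)
		return -ENOTBLK;
	return ret;
}
```

Keeping -EAGAIN distinct lets a nowait caller retry later, while any other
invalidation failure is reported as the fallback signal.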



[Cluster-devel] [PATCH 04/13] filemap: add a kiocb_write_and_wait helper

2023-05-19 Thread Christoph Hellwig
Factor out a helper that does filemap_write_and_wait_range for the
range covered by a read kiocb, or returns -EAGAIN if the kiocb
is marked as nowait and there would be pages to write.

Signed-off-by: Christoph Hellwig 
---
 block/fops.c| 18 +++---
 include/linux/pagemap.h |  2 ++
 mm/filemap.c| 30 ++
 3 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c7d..c194939b851cfb 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -576,21 +576,9 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
goto reexpand; /* skip atime */
 
if (iocb->ki_flags & IOCB_DIRECT) {
-   struct address_space *mapping = iocb->ki_filp->f_mapping;
-
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, pos,
- pos + count - 1)) {
-   ret = -EAGAIN;
-   goto reexpand;
-   }
-   } else {
-   ret = filemap_write_and_wait_range(mapping, pos,
-  pos + count - 1);
-   if (ret < 0)
-   goto reexpand;
-   }
-
+   ret = kiocb_write_and_wait(iocb, count);
+   if (ret < 0)
+   goto reexpand;
file_accessed(iocb->ki_filp);
 
ret = blkdev_direct_IO(iocb, to);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a450..36fc2cea13ce20 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -30,6 +30,7 @@ static inline void invalidate_remote_inode(struct inode *inode)
 int invalidate_inode_pages2(struct address_space *mapping);
 int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
+
 int write_inode_now(struct inode *, int sync);
 int filemap_fdatawrite(struct address_space *);
 int filemap_flush(struct address_space *);
@@ -54,6 +55,7 @@ int filemap_check_errors(struct address_space *mapping);
 void __filemap_set_wb_err(struct address_space *mapping, int err);
 int filemap_fdatawrite_wbc(struct address_space *mapping,
   struct writeback_control *wbc);
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count);
 
 static inline int filemap_write_and_wait(struct address_space *mapping)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index bf693ad1da1ece..2d7712b13b95c9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2762,6 +2762,21 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
 }
 EXPORT_SYMBOL_GPL(filemap_read);
 
+int kiocb_write_and_wait(struct kiocb *iocb, size_t count)
+{
+   struct address_space *mapping = iocb->ki_filp->f_mapping;
+   loff_t pos = iocb->ki_pos;
+   loff_t end = pos + count - 1;
+
+   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (filemap_range_needs_writeback(mapping, pos, end))
+   return -EAGAIN;
+   return 0;
+   }
+
+   return filemap_write_and_wait_range(mapping, pos, end);
+}
+
 /**
  * generic_file_read_iter - generic filesystem read routine
  * @iocb:  kernel I/O control block
@@ -2797,18 +2812,9 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
 
-   if (iocb->ki_flags & IOCB_NOWAIT) {
-   if (filemap_range_needs_writeback(mapping, iocb->ki_pos,
-   iocb->ki_pos + count - 1))
-   return -EAGAIN;
-   } else {
-   retval = filemap_write_and_wait_range(mapping,
-   iocb->ki_pos,
-   iocb->ki_pos + count - 1);
-   if (retval < 0)
-   return retval;
-   }
-
+   retval = kiocb_write_and_wait(iocb, count);
+   if (retval < 0)
+   return retval;
file_accessed(file);
 
retval = mapping->a_ops->direct_IO(iocb, iter);
-- 
2.39.2
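
The decision made by the new helper can be modelled in user space as
follows.  This is a sketch under stated assumptions: TOY_NOWAIT stands in
for IOCB_NOWAIT, range_dirty for the result of the "needs writeback"
check, and flush_result for whatever the blocking writeback would return;
none of these names exist in the kernel:

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_NOWAIT 0x1	/* hypothetical stand-in for IOCB_NOWAIT */

/*
 * A nowait request never blocks: it either finds nothing to write back
 * or gives up with -EAGAIN.  Any other request waits for the writeback
 * and returns its result.
 */
static int toy_write_and_wait(int flags, bool range_dirty, int flush_result)
{
	if (flags & TOY_NOWAIT)
		return range_dirty ? -EAGAIN : 0;
	return flush_result;
}
```

Both callers converted above only test for a negative return, which fits
this model: 0 means the direct read may proceed.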


