Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-26 Thread Andreas Gruenbacher
On Wed, 26 Jun 2019 at 08:04, Christoph Hellwig  wrote:
> On Tue, Jun 25, 2019 at 08:13:29PM +0200, Andreas Gruenbacher wrote:
> > On Tue, 25 Jun 2019 at 12:50, Christoph Hellwig  wrote:
> > >
> > > > That seems way more complicated.  I'd much rather go with something
> > > > like may patch plus maybe a big fat comment explaining that persisting
> > > > the size update is the file systems job.  Note that a lot of the modern
> > > > file systems don't use the VFS inode tracking for that, besides XFS
> > > > that includes at least btrfs and ocfs2 as well.
> > >
> > > I'd suggest something like this as the baseline:
> > >
> > > http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/iomap-i_size
> >
> > Alright, can we change this as follows?
> >
> > [Also, I'm not really sure why we check for (pos + ret > inode->i_size)
> > when we have already read inode->i_size into old_size.]
>
> Yeah, you probably want to change that to old_size.  Your changes look
> good to me,
>
> Can you just take the patch over from here as you've clearly done more
> work on it and resend the whole series?

Ok, done. It's just the two patches; passes xfstests for xfs and gfs2,
the current users.

Darrick, can you please push this in the next merge window as "usual"?

Thanks,
Andreas



Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-26 Thread Christoph Hellwig
On Tue, Jun 25, 2019 at 08:13:29PM +0200, Andreas Gruenbacher wrote:
> On Tue, 25 Jun 2019 at 12:50, Christoph Hellwig  wrote:
> >
> > > That seems way more complicated.  I'd much rather go with something
> > > like may patch plus maybe a big fat comment explaining that persisting
> > > the size update is the file systems job.  Note that a lot of the modern
> > > file systems don't use the VFS inode tracking for that, besides XFS
> > > that includes at least btrfs and ocfs2 as well.
> >
> > I'd suggest something like this as the baseline:
> >
> > http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/iomap-i_size
> 
> Alright, can we change this as follows?
> 
> [Also, I'm not really sure why we check for (pos + ret > inode->i_size)
> when we have already read inode->i_size into old_size.]

Yeah, you probably want to change that to old_size.  Your changes look
good to me,

Can you just take the patch over from here as you've clearly done more
work on it and resend the whole series?



Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-25 Thread Andreas Gruenbacher
On Tue, 25 Jun 2019 at 12:50, Christoph Hellwig  wrote:
>
> > That seems way more complicated.  I'd much rather go with something
> > like may patch plus maybe a big fat comment explaining that persisting
> > the size update is the file systems job.  Note that a lot of the modern
> > file systems don't use the VFS inode tracking for that, besides XFS
> > that includes at least btrfs and ocfs2 as well.
>
> I'd suggest something like this as the baseline:
>
> http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/iomap-i_size

Alright, can we change this as follows?

[Also, I'm not really sure why we check for (pos + ret > inode->i_size)
when we have already read inode->i_size into old_size.]

Thanks,
Andreas

--

iomap: don't mark the inode dirty in iomap_write_end

Marking the inode dirty for each page copied into the page cache
can be very ineffcient for file systems that use the VFS dirty
inode tracking, and is completely pointless for those that don't
use the VFS dirty tracking.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/bmap.c|  2 ++
 fs/iomap.c| 15 ++-
 include/linux/iomap.h |  1 +
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 93ea1d5..f4b895f 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1182,6 +1182,8 @@ static int gfs2_iomap_end(struct inode *inode, loff_t 
pos, loff_t length,
 
if (ip->i_qadata && ip->i_qadata->qa_qd_num)
gfs2_quota_unlock(ip);
+   if (iomap->flags & IOMAP_F_SIZE_CHANGED)
+   mark_inode_dirty(inode);
gfs2_write_unlock(inode);
 
 out:
diff --git a/fs/iomap.c b/fs/iomap.c
index 12654c2..cd24bd1 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -777,6 +777,7 @@ struct iomap_readpage_ctx {
unsigned copied, struct page *page, struct iomap *iomap)
 {
const struct iomap_page_ops *page_ops = iomap->page_ops;
+   loff_t old_size = inode->i_size;
int ret;
 
if (iomap->type == IOMAP_INLINE) {
@@ -788,7 +789,19 @@ struct iomap_readpage_ctx {
ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
}
 
-   __generic_write_end(inode, pos, ret, page);
+   /*
+* Update the in-memory inode size after copying the data into the page
+* cache.  It's up to the file system to write the updated size to disk,
+* preferably after I/O completion so that no stale data is exposed.
+*/
+   if (pos + ret > inode->i_size) {
+   i_size_write(inode, pos + ret);
+   iomap->flags |= IOMAP_F_SIZE_CHANGED;
+   }
+   unlock_page(page);
+
+   if (old_size < pos)
+   pagecache_isize_extended(inode, old_size, pos);
if (page_ops && page_ops->page_done)
page_ops->page_done(inode, pos, copied, page, iomap);
put_page(page);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 2103b94..1df9ea1 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -35,6 +35,7 @@
 #define IOMAP_F_NEW0x01/* blocks have been newly allocated */
 #define IOMAP_F_DIRTY  0x02/* uncommitted metadata */
 #define IOMAP_F_BUFFER_HEAD0x04/* file system requires buffer heads */
+#define IOMAP_F_SIZE_CHANGED   0x08/* file size has changed */
 
 /*
  * Flags that only need to be reported for IOMAP_REPORT requests:
-- 
1.8.3.1



Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-25 Thread Andreas Gruenbacher
On Tue, 25 Jun 2019 at 11:57, Christoph Hellwig  wrote:
> On Mon, Jun 24, 2019 at 08:22:43PM +0200, Andreas Gruenbacher wrote:
> > That would work, but I don't like how this leaves us with a vfs function
> > that updates i_size without bothering to dirty the inode very much.
>
> This isn't a VFS function, it is a helper library.

Well, let's call it that if this suits you better.

> > How about if we move the __generic_write_end call into the page_done
> > callback and leave special handling to the filesystem code if needed
> > instead?  The below patch seems to work for gfs2.
>
> That seems way more complicated.  I'd much rather go with something
> like may patch plus maybe a big fat comment explaining that persisting
> the size update is the file systems job.  Note that a lot of the modern
> file systems don't use the VFS inode tracking for that, besides XFS
> that includes at least btrfs and ocfs2 as well.

Andreas



Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-25 Thread Christoph Hellwig
> That seems way more complicated.  I'd much rather go with something
> like may patch plus maybe a big fat comment explaining that persisting
> the size update is the file systems job.  Note that a lot of the modern
> file systems don't use the VFS inode tracking for that, besides XFS
> that includes at least btrfs and ocfs2 as well.

I'd suggest something like this as the baseline:

http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/iomap-i_size



Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-24 Thread Andreas Gruenbacher
On Mon, 24 Jun 2019 at 08:55, Christoph Hellwig  wrote:
> At least for xfs we don't need the mark_inode_dirty at all.  Can you
> solve your gfs2 requirements on top of something like the patch below?
> Note that in general it seems like you should try to only update the
> on-disk inode size in writeback completion anyway, otherwise you can
> have a stale i_size update before the data was actually written.
>
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index c98107a6bf81..fcf2cbd39114 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -785,6 +785,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned 
> len,
> unsigned copied, struct page *page, struct iomap *iomap)
>  {
> const struct iomap_page_ops *page_ops = iomap->page_ops;
> +   loff_t old_size = inode->i_size;
> int ret;
>
> if (iomap->type == IOMAP_INLINE) {
> @@ -796,7 +797,12 @@ iomap_write_end(struct inode *inode, loff_t pos, 
> unsigned len,
> ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
> }
>
> -   __generic_write_end(inode, pos, ret, page);
> +   if (pos + ret > inode->i_size)
> +   i_size_write(inode, pos + ret);
> +   unlock_page(page);
> +
> +   if (old_size < pos)
> +   pagecache_isize_extended(inode, old_size, pos);
> if (page_ops && page_ops->page_done)
> page_ops->page_done(inode, pos, copied, page, iomap);
> put_page(page);

That would work, but I don't like how this leaves us with a vfs function
that updates i_size without bothering to dirty the inode very much.

How about if we move the __generic_write_end call into the page_done
callback and leave special handling to the filesystem code if needed
instead?  The below patch seems to work for gfs2.

Thanks,
Andreas

---
 fs/gfs2/bmap.c   | 42 --
 fs/gfs2/incore.h |  1 +
 fs/iomap.c   |  5 +++--
 3 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 93ea1d529aa3..7569770e6871 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -991,10 +991,13 @@ static void gfs2_write_unlock(struct inode *inode)
 static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
   unsigned len, struct iomap *iomap)
 {
-   unsigned int blockmask = i_blocksize(inode) - 1;
+   struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);
-   unsigned int blocks;
+   unsigned int blockmask, blocks;
 
+   if (!(gfs2_is_stuffed(ip) || gfs2_is_jdata(ip)))
+   return 0;
+   blockmask = i_blocksize(inode) - 1;
blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
 }
@@ -1005,10 +1008,33 @@ static void gfs2_iomap_page_done(struct inode *inode, 
loff_t pos,
 {
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);
+   loff_t old_size;
+
+   if (!page)
+   goto out;
 
-   if (page && !gfs2_is_stuffed(ip))
+   /*
+* Avoid calling __generic_write_end here to prevent mark_inode_dirty
+* from being called for each page: it's relatively expensive on gfs2,
+* so we defer that to gfs2_iomap_end.
+*/
+   old_size = inode->i_size;
+   if (pos + copied > old_size) {
+   i_size_write(inode, pos + copied);
+   set_bit(GIF_SIZE_CHANGED, >i_flags);
+   }
+
+   unlock_page(page);
+
+   if (old_size < pos)
+   pagecache_isize_extended(inode, old_size, pos);
+
+   if (gfs2_is_jdata(ip) && !gfs2_is_stuffed(ip))
gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
-   gfs2_trans_end(sdp);
+
+out:
+   if (current->journal_info)
+   gfs2_trans_end(sdp);
 }
 
 static const struct iomap_page_ops gfs2_iomap_page_ops = {
@@ -1106,8 +1132,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, 
loff_t pos,
gfs2_trans_end(sdp);
}
 
-   if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
-   iomap->page_ops = _iomap_page_ops;
+   iomap->page_ops = _iomap_page_ops;
return 0;
 
 out_trans_end:
@@ -1160,6 +1185,11 @@ static int gfs2_iomap_end(struct inode *inode, loff_t 
pos, loff_t length,
if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
goto out;
 
+   if (test_bit(GIF_SIZE_CHANGED, >i_flags)) {
+   clear_bit(GIF_SIZE_CHANGED, >i_flags);
+   mark_inode_dirty(inode);
+   }
+
if (!gfs2_is_stuffed(ip))
gfs2_ordered_add_inode(ip);
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index c9af93ac6c73..9f620807b396 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -396,6 +396,7 @@ enum {
GIF_ORDERED = 4,
GIF_FREE_VFS_INODE  = 5,

Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-24 Thread Christoph Hellwig
At least for xfs we don't need the mark_inode_dirty at all.  Can you
solve your gfs2 requirements on top of something like the patch below?
Note that in general it seems like you should try to only update the
on-disk inode size in writeback completion anyway, otherwise you can
have a stale i_size update before the data was actually written.


diff --git a/fs/iomap.c b/fs/iomap.c
index c98107a6bf81..fcf2cbd39114 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -785,6 +785,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned 
len,
unsigned copied, struct page *page, struct iomap *iomap)
 {
const struct iomap_page_ops *page_ops = iomap->page_ops;
+   loff_t old_size = inode->i_size;
int ret;
 
if (iomap->type == IOMAP_INLINE) {
@@ -796,7 +797,12 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned 
len,
ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
}
 
-   __generic_write_end(inode, pos, ret, page);
+   if (pos + ret > inode->i_size)
+   i_size_write(inode, pos + ret);
+   unlock_page(page);
+
+   if (old_size < pos)
+   pagecache_isize_extended(inode, old_size, pos);
if (page_ops && page_ops->page_done)
page_ops->page_done(inode, pos, copied, page, iomap);
put_page(page);



Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-19 Thread Dave Chinner
On Tue, Jun 18, 2019 at 04:47:16PM +0200, Andreas Gruenbacher wrote:
> Remove the mark_inode_dirty call from __generic_write_end and add it to
> generic_write_end and the high-level iomap functions where necessary.
> That way, large writes will only dirty inodes at the end instead of
> dirtying them once per page.  This fixes a performance bottleneck on
> gfs2.
> 
> Signed-off-by: Andreas Gruenbacher 
> ---
>  fs/buffer.c | 26 ++
>  fs/iomap.c  | 42 ++
>  2 files changed, 56 insertions(+), 12 deletions(-)



> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 23ef63fd1669..9454568a7f5e 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -881,6 +881,13 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
> iov_iter *iter,
>  {
>   struct inode *inode = iocb->ki_filp->f_mapping->host;
>   loff_t pos = iocb->ki_pos, ret = 0, written = 0;
> + loff_t old_size;
> +
> +/*
> +  * No need to use i_size_read() here, the i_size cannot change under us
> +  * because we hold i_rwsem.
> +  */
> + old_size = inode->i_size;
>  
>   while (iov_iter_count(iter)) {
>   ret = iomap_apply(inode, pos, iov_iter_count(iter),
> @@ -891,6 +898,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
> iov_iter *iter,
>   written += ret;
>   }
>  
> + if (old_size != inode->i_size)
> + mark_inode_dirty(inode);
> +
>   return written ? written : ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
> @@ -961,18 +971,30 @@ int
>  iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>   const struct iomap_ops *ops)
>  {
> + loff_t old_size;
>   loff_t ret;
>  
> +/*
> +  * No need to use i_size_read() here, the i_size cannot change under us
> +  * because we hold i_rwsem.
> +  */
> + old_size = inode->i_size;
> +
>   while (len) {
>   ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
>   iomap_dirty_actor);
>   if (ret <= 0)
> - return ret;
> + goto out;
>   pos += ret;
>   len -= ret;
>   }
> + ret = 0;
>  
> - return 0;
> +out:
> + if (old_size != inode->i_size)
> + mark_inode_dirty(inode);


I don't think we want to do this.

The patches I have that add range locking for XFS allow buffered
writes to run concurrently with operations that change the inode
size as long as the ranges don't overlap. To do this, XFS will not
hold the i_rwsem over any iomap call it makes in future - it will
hold a range lock instead. Hence we can have writes and other IO
operations occurring at the same time some other operation is
changing the size of the file, and that means this code no longer
does what you are intending it to do because the inode->i_size is no
longer constant across these operations...

Hence I think adding code that depends on i_rwsem to be held to
function correctly is the wrong direction to be taking the iomap
infrastructure.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com



Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-19 Thread Jan Kara
On Tue 18-06-19 16:47:16, Andreas Gruenbacher wrote:
> Remove the mark_inode_dirty call from __generic_write_end and add it to
> generic_write_end and the high-level iomap functions where necessary.
> That way, large writes will only dirty inodes at the end instead of
> dirtying them once per page.  This fixes a performance bottleneck on
> gfs2.
> 
> Signed-off-by: Andreas Gruenbacher 

So the patch looks correct but honestly I don't like how we duplicate inode
dirtying over several places. I was wondering whether we could not move the
logic to iomap_apply() or something like that.

Honza

> ---
>  fs/buffer.c | 26 ++
>  fs/iomap.c  | 42 ++
>  2 files changed, 56 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index e450c55f6434..1b51003bc9d2 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2089,8 +2089,7 @@ EXPORT_SYMBOL(block_write_begin);
>  void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
>   struct page *page)
>  {
> - loff_t old_size = inode->i_size;
> - bool i_size_changed = false;
> + loff_t old_size;
>  
>   /*
>* No need to use i_size_read() here, the i_size cannot change under us
> @@ -2099,23 +2098,21 @@ void __generic_write_end(struct inode *inode, loff_t 
> pos, unsigned copied,
>* But it's important to update i_size while still holding page lock:
>* page writeout could otherwise come in and zero beyond i_size.
>*/
> - if (pos + copied > inode->i_size) {
> + old_size = inode->i_size;
> + if (pos + copied > old_size)
>   i_size_write(inode, pos + copied);
> - i_size_changed = true;
> - }
>  
>   unlock_page(page);
>  
>   if (old_size < pos)
>   pagecache_isize_extended(inode, old_size, pos);
> +
>   /*
>* Don't mark the inode dirty under page lock. First, it unnecessarily
>* makes the holding time of page lock longer. Second, it forces lock
>* ordering of page lock and transaction start for journaling
>* filesystems.
>*/
> - if (i_size_changed)
> - mark_inode_dirty(inode);
>  }
>  
>  int block_write_end(struct file *file, struct address_space *mapping,
> @@ -2158,9 +2155,22 @@ int generic_write_end(struct file *file, struct 
> address_space *mapping,
>   loff_t pos, unsigned len, unsigned copied,
>   struct page *page, void *fsdata)
>  {
> + struct inode *inode = mapping->host;
> + loff_t old_size;
> +
> + /*
> +  * No need to use i_size_read() here, the i_size cannot change under us
> +  * because we hold i_rwsem.
> +  */
> + old_size = inode->i_size;
> +
>   copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
> - __generic_write_end(mapping->host, pos, copied, page);
> + __generic_write_end(inode, pos, copied, page);
>   put_page(page);
> +
> + if (old_size != inode->i_size)
> + mark_inode_dirty(inode);
> +
>   return copied;
>  }
>  EXPORT_SYMBOL(generic_write_end);
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 23ef63fd1669..9454568a7f5e 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -881,6 +881,13 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
> iov_iter *iter,
>  {
>   struct inode *inode = iocb->ki_filp->f_mapping->host;
>   loff_t pos = iocb->ki_pos, ret = 0, written = 0;
> + loff_t old_size;
> +
> +/*
> +  * No need to use i_size_read() here, the i_size cannot change under us
> +  * because we hold i_rwsem.
> +  */
> + old_size = inode->i_size;
>  
>   while (iov_iter_count(iter)) {
>   ret = iomap_apply(inode, pos, iov_iter_count(iter),
> @@ -891,6 +898,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
> iov_iter *iter,
>   written += ret;
>   }
>  
> + if (old_size != inode->i_size)
> + mark_inode_dirty(inode);
> +
>   return written ? written : ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
> @@ -961,18 +971,30 @@ int
>  iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
>   const struct iomap_ops *ops)
>  {
> + loff_t old_size;
>   loff_t ret;
>  
> +/*
> +  * No need to use i_size_read() here, the i_size cannot change under us
> +  * because we hold i_rwsem.
> +  */
> + old_size = inode->i_size;
> +
>   while (len) {
>   ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
>   iomap_dirty_actor);
>   if (ret <= 0)
> - return ret;
> + goto out;
>   pos += ret;
>   len -= ret;
>   }
> + ret = 0;
>  
> - return 0;
> +out:
> + if (old_size != inode->i_size)
> + 

[Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-18 Thread Andreas Gruenbacher
Remove the mark_inode_dirty call from __generic_write_end and add it to
generic_write_end and the high-level iomap functions where necessary.
That way, large writes will only dirty inodes at the end instead of
dirtying them once per page.  This fixes a performance bottleneck on
gfs2.

Signed-off-by: Andreas Gruenbacher 
---
 fs/buffer.c | 26 ++
 fs/iomap.c  | 42 ++
 2 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index e450c55f6434..1b51003bc9d2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2089,8 +2089,7 @@ EXPORT_SYMBOL(block_write_begin);
 void __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
struct page *page)
 {
-   loff_t old_size = inode->i_size;
-   bool i_size_changed = false;
+   loff_t old_size;
 
/*
 * No need to use i_size_read() here, the i_size cannot change under us
@@ -2099,23 +2098,21 @@ void __generic_write_end(struct inode *inode, loff_t 
pos, unsigned copied,
 * But it's important to update i_size while still holding page lock:
 * page writeout could otherwise come in and zero beyond i_size.
 */
-   if (pos + copied > inode->i_size) {
+   old_size = inode->i_size;
+   if (pos + copied > old_size)
i_size_write(inode, pos + copied);
-   i_size_changed = true;
-   }
 
unlock_page(page);
 
if (old_size < pos)
pagecache_isize_extended(inode, old_size, pos);
+
/*
 * Don't mark the inode dirty under page lock. First, it unnecessarily
 * makes the holding time of page lock longer. Second, it forces lock
 * ordering of page lock and transaction start for journaling
 * filesystems.
 */
-   if (i_size_changed)
-   mark_inode_dirty(inode);
 }
 
 int block_write_end(struct file *file, struct address_space *mapping,
@@ -2158,9 +2155,22 @@ int generic_write_end(struct file *file, struct 
address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
 {
+   struct inode *inode = mapping->host;
+   loff_t old_size;
+
+   /*
+* No need to use i_size_read() here, the i_size cannot change under us
+* because we hold i_rwsem.
+*/
+   old_size = inode->i_size;
+
copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-   __generic_write_end(mapping->host, pos, copied, page);
+   __generic_write_end(inode, pos, copied, page);
put_page(page);
+
+   if (old_size != inode->i_size)
+   mark_inode_dirty(inode);
+
return copied;
 }
 EXPORT_SYMBOL(generic_write_end);
diff --git a/fs/iomap.c b/fs/iomap.c
index 23ef63fd1669..9454568a7f5e 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -881,6 +881,13 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
iov_iter *iter,
 {
struct inode *inode = iocb->ki_filp->f_mapping->host;
loff_t pos = iocb->ki_pos, ret = 0, written = 0;
+   loff_t old_size;
+
+/*
+* No need to use i_size_read() here, the i_size cannot change under us
+* because we hold i_rwsem.
+*/
+   old_size = inode->i_size;
 
while (iov_iter_count(iter)) {
ret = iomap_apply(inode, pos, iov_iter_count(iter),
@@ -891,6 +898,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct 
iov_iter *iter,
written += ret;
}
 
+   if (old_size != inode->i_size)
+   mark_inode_dirty(inode);
+
return written ? written : ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
@@ -961,18 +971,30 @@ int
 iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
const struct iomap_ops *ops)
 {
+   loff_t old_size;
loff_t ret;
 
+/*
+* No need to use i_size_read() here, the i_size cannot change under us
+* because we hold i_rwsem.
+*/
+   old_size = inode->i_size;
+
while (len) {
ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
iomap_dirty_actor);
if (ret <= 0)
-   return ret;
+   goto out;
pos += ret;
len -= ret;
}
+   ret = 0;
 
-   return 0;
+out:
+   if (old_size != inode->i_size)
+   mark_inode_dirty(inode);
+
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_file_dirty);
 
@@ -1039,19 +1061,31 @@ int
 iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
const struct iomap_ops *ops)
 {
+   loff_t old_size;
loff_t ret;
 
+/*
+* No need to use i_size_read() here, the i_size cannot change under us
+* because we hold i_rwsem.
+*/
+   old_size =