Re: [Cluster-devel] [PATCH v4 06/11] iomap: Add write_{begin, end} iomap operations

2018-05-28 Thread Christoph Hellwig
On Fri, May 25, 2018 at 07:58:14PM +0200, Andreas Grünbacher wrote:
> > The right way to deal with that is to make
> > inline data an explicit iomap type (done in my next posting of the
> > buffer head removal series), and then have iomap_begin find the data,
> > kmap it and return it in the iomap, with iomap_end doing any fixups.
> 
> I see what you mean. How would iomap_write_actor deal with
> IOMAP_INLINE -- just skip iomap_write_begin and iomap_write_end and
> grab the page from the iomap?

I would add a data pointer to struct iomap and pass the data back
from there.  We'll still need most of iomap_write_begin / iomap_write_end,
but just copy the data from the iomap instead of reading it from disk.
Note that the block_write_begin code already handles that case for
the get_blocks path.
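
Roughly (sketch only; the field name "inline_data" and the copy-in details
below are illustrative, not the final interface):

	struct iomap {
		...
		u16	type;		/* gains an IOMAP_INLINE case */
		void	*inline_data;	/* kmapped inline data, set up by ->iomap_begin
					   and torn down again in ->iomap_end */
		...
	};

	/*
	 * iomap_write_begin() would then skip __block_write_begin_int() for
	 * an IOMAP_INLINE extent and copy from the iomap instead of reading
	 * from disk:
	 */
	if (iomap->type == IOMAP_INLINE && !PageUptodate(page)) {
		loff_t isize = i_size_read(inode);
		void *addr = kmap_atomic(page);

		memcpy(addr, iomap->inline_data, isize);
		memset(addr + isize, 0, PAGE_SIZE - isize);
		kunmap_atomic(addr);
		SetPageUptodate(page);
	}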



Re: [Cluster-devel] [PATCH v4 06/11] iomap: Add write_{begin, end} iomap operations

2018-05-25 Thread Andreas Grünbacher
2018-05-18 18:04 GMT+02:00 Christoph Hellwig :
> On Tue, May 15, 2018 at 10:16:52AM +0200, Andreas Gruenbacher wrote:
>> > Yes.  And from a quick look we can easily handle "stuffed" aka inline
>> > data with the existing architecture.  We just need a new IOMAP_INLINE
>> > flag for the extent type. Details of how to pass that data will need to
>> > be worked out; either a struct page or a virtual address should probably
>> > do it.
>>
>> This is about data journaling; it has nothing to do with "stuffed"
>> files per se. With journaled data writes, we need to do some special
>> processing for each page so that the page ends up in the journal
>> before it gets written back to its proper on-disk location. It just
>> happens that since we need these per-page operations anyway, the
>> "unstuffing" and "re-stuffing" fit nicely into those operations as
>> well.
>
> Your series makes two uses of the callbacks - one use case is to
> deal with inline data.

Yes, it's not perfect.

> The right way to deal with that is to make
> inline data an explicit iomap type (done in my next posting of the
> buffer head removal series), and then have iomap_begin find the data,
> kmap it and return it in the iomap, with iomap_end doing any fixups.

I see what you mean. How would iomap_write_actor deal with
IOMAP_INLINE -- just skip iomap_write_begin and iomap_write_end and
grab the page from the iomap? I'm a bit worried about reintroducing
the deadlock that iomap_write_actor so carefully tries to avoid.
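
The pattern I mean is roughly this (abridged from the current
iomap_write_actor): the user buffer is faulted in before the destination
page is locked, and the copy then runs with page faults disabled, precisely
so that a source page that is also the destination page can never be
faulted in while that page is locked.

	/* abridged from iomap_write_actor() */
	if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
		status = -EFAULT;	/* fault in the user pages *before* locking */
		break;
	}

	status = iomap_write_begin(inode, pos, bytes, flags, &page, iomap);
	if (unlikely(status))
		break;

	/* the page is locked now; copy with page faults disabled */
	copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
	flush_dcache_page(page);

	status = iomap_write_end(inode, pos, bytes, copied, page);
	/* a short atomic copy just retries with fewer bytes, no deadlock */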

> For the data journaling I agree that we need a post-write per-page
> callout, but I think this should just be a simple optional callout
> for data journaling, which doesn't override the remaining
> write_end handling.
>
>> Been there. An earlier version of the patches did have a separate
>> stuffed write path and the unstuffing and re-stuffing didn't happen in
>> those per-page operations. The result was much messier overall.
>
> Pointer, please.

It's in this patch:

  https://patchwork.kernel.org/patch/10191007/

from this posting:

https://www.spinics.net/lists/linux-fsdevel/msg121318.html

Thanks,
Andreas



Re: [Cluster-devel] [PATCH v4 06/11] iomap: Add write_{begin, end} iomap operations

2018-05-18 Thread Christoph Hellwig
On Tue, May 15, 2018 at 10:16:52AM +0200, Andreas Gruenbacher wrote:
> > Yes.  And from a quick look we can easily handle "stuffed" aka inline
> > data with the existing architecture.  We just need a new IOMAP_INLINE
> > flag for the extent type. Details of how to pass that data will need to
> > be worked out; either a struct page or a virtual address should probably
> > do it.
> 
> This is about data journaling; it has nothing to do with "stuffed"
> files per se. With journaled data writes, we need to do some special
> processing for each page so that the page ends up in the journal
> before it gets written back to its proper on-disk location. It just
> happens that since we need these per-page operations anyway, the
> "unstuffing" and "re-stuffing" fit nicely into those operations as
> well.

Your series makes two uses of the callbacks - one use case is to
deal with inline data.  The right way to deal with that is to make
inline data an explicit iomap type (done in my next posting of the
buffer head removal series), and then have iomap_begin find the data,
kmap it and return it in the iomap, with iomap_end doing any fixups.

For the data journaling I agree that we need a post-write per-page
callout, but I think this should just be a simple optional callout
for data journaling, which doesn't override the remaining
write_end handling.
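
Something like an optional hook next to the existing ops would do, e.g.
(sketch only; the name "page_done" and the exact arguments are illustrative):

	struct iomap_ops {
		int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
				unsigned flags, struct iomap *iomap);
		int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
				ssize_t written, unsigned flags, struct iomap *iomap);
		/* optional, data journaling only; does not replace write_end */
		void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
				struct page *page, struct iomap *iomap);
	};

	/* hook point somewhere at the end of the generic iomap_write_end() path: */
	if (ops->page_done)
		ops->page_done(inode, pos, copied, page, iomap);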

> Been there. An earlier version of the patches did have a separate
> stuffed write path and the unstuffing and re-stuffing didn't happen in
> those per-page operations. The result was much messier overall.

Pointer, please.



Re: [Cluster-devel] [PATCH v4 06/11] iomap: Add write_{begin, end} iomap operations

2018-05-15 Thread Andreas Gruenbacher
On 15 May 2018 at 09:22, Christoph Hellwig  wrote:
> On Tue, May 15, 2018 at 11:11:47AM +1000, Dave Chinner wrote:
>> I'm not sure adding a per-page filesystem callout abstraction is
>> what we want in the iomap code.  The commit message does not explain
>> why gfs2 needs per-page callouts, nor why the per-page work cannot
>> be done/avoided by running code in the ->iomap_begin callout
>> (e.g. by unstuffing the inode so nothing needs to be done per-page).
>
> Agreed.
>
>> My main concern is that the callouts to the filesystem in iomap are
>> - by design - per IO, not per page. The whole point of using iomap
>> was to get away from doing per-page filesystem operations on
>> multi-page IOs - it's highly inefficient when we only need to call
>> into the filesystem on a per-extent basis, and we simplified the
>> code a *lot* by avoiding such behaviours.
>
> Yes.  And from a quick look we can easily handle "stuffed" aka inline
> data with the existing architecture.  We just need a new IOMAP_INLINE
> flag for the extent type. Details of how to pass that data will need to
> be worked out; either a struct page or a virtual address should probably
> do it.

This is about data journaling; it has nothing to do with "stuffed"
files per se. With journaled data writes, we need to do some special
processing for each page so that the page ends up in the journal
before it gets written back to its proper on-disk location. It just
happens that since we need these per-page operations anyway, the
"unstuffing" and "re-stuffing" fit nicely into those operations as
well.
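
Concretely, the per-page work amounts to something like the following
(sketch only; the function name is made up, and gfs2_page_add_databufs
stands for gfs2's existing buffer-to-transaction helper): every buffer the
write touched has to be added to the running transaction so that it goes
through the journal before ordinary writeback.

	static int gfs2_journaled_write_end(struct inode *inode, loff_t pos,
			unsigned len, unsigned copied, struct page *page)
	{
		struct gfs2_inode *ip = GFS2_I(inode);
		unsigned int from = pos & (PAGE_SIZE - 1);

		if (!page_has_buffers(page))
			create_empty_buffers(page, inode->i_sb->s_blocksize, 0);
		/* add the buffers touched by this write to the transaction */
		gfs2_page_add_databufs(ip, page, from, from + copied);

		/* then do the normal generic completion */
		return iomap_write_end(inode, pos, len, copied, page);
	}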

> The other thing gfs2 seems to be doing is handling of journaled
> data.  I'm not quite sure how to fit that into the grand overall scheme
> of things, but at least that would only be after the write.

Not sure what you mean by "but at least that would only be after the write".

>> That is, Christoph's patchset pushes us further away from
>> filesystems doing their own per-page processing of page state. It
>> converts the iomap IO path to store its own per-page state tracking
>> information in page->private, greatly reducing memory overhead (16
>> bytes on a 4k page instead of 104 bytes per bufferhead) and
>
> Or in fact zero bytes for block size == PAGE_SIZE
>
>> streamlining the code.  This, however, essentially makes the
>> buffered IO path through the iomap infrastructure incompatible with
>> existing bufferhead based filesystems.
>>
>> Despite this, we can still provide limited support for buffered
>> writes on bufferhead based filesystems through the iomap
>> infrastructure, but I only see that as a mechanism for filesystems
>> moving away from dependencies on bufferheads. It would require a
>> different refactoring of the iomap code - I'd much prefer to keep
>> bufferhead dependent IO paths completely separate (e.g. via a new
>> actor function) to minimise impact and dependencies on the internal
>> bufferhead-free iomap IO path.
>>
>> Let's see what Christoph thinks first, though
>
> gfs2 seems to indeed depend pretty heavily on buffer_heads.  This is
> a bit of a bummer, but I'd need to take a deeper look at how to offer
> iomap functionality to them without them moving away from buffer_heads,
> which isn't really going to be a quick project.  In a way the
> write_begin/write_end ops from Andreas facilitate that, as the
> rest of iomap_file_buffered_write and iomap_write_actor are untouched.
>
> So at least temporarily his callouts are what we need, even if the
> actual inline data functionality should be moved out of them for sure,

Been there. An earlier version of the patches did have a separate
stuffed write path and the unstuffing and re-stuffing didn't happen in
those per-page operations. The result was much messier overall.

> but only with a long term plan to move gfs2 away from buffer heads.

I hope we'll get rid of buffer heads in additional parts of gfs2, but
that's not going to happen soon. The main point of those per-page
operations is still to support data journaling, though.

> Btw, does gfs2 support block sizes smaller than PAGE_SIZE?

Yes.

Thanks,
Andreas



Re: [Cluster-devel] [PATCH v4 06/11] iomap: Add write_{begin, end} iomap operations

2018-05-15 Thread Christoph Hellwig
On Tue, May 15, 2018 at 11:11:47AM +1000, Dave Chinner wrote:
> I'm not sure adding a per-page filesystem callout abstraction is
> what we want in the iomap code.  The commit message does not explain
> why gfs2 needs per-page callouts, nor why the per-page work cannot
> be done/avoided by running code in the ->iomap_begin callout
> (e.g. by unstuffing the inode so nothing needs to be done per-page).

Agreed.
> 
> My main concern is that the callouts to the filesystem in iomap are
> - by design - per IO, not per page. The whole point of using iomap
> was to get away from doing per-page filesystem operations on
> multi-page IOs - it's highly inefficient when we only need to call
> into the filesystem on a per-extent basis, and we simplified the
> code a *lot* by avoiding such behaviours.

Yes.  And from a quick look we can easily handle "stuffed" aka inline
data with the existing architecture.  We just need a new IOMAP_INLINE
flag for the extent type.  Details of how to pass that data will need to
be worked out; either a struct page or a virtual address should probably
do it.  The other thing gfs2 seems to be doing is handling of journaled
data.  I'm not quite sure how to fit that into the grand overall scheme
of things, but at least that would only be after the write.

> That is, Christoph's patchset pushes us further away from
> filesystems doing their own per-page processing of page state. It
> converts the iomap IO path to store its own per-page state tracking
> information in page->private, greatly reducing memory overhead (16
> bytes on a 4k page instead of 104 bytes per bufferhead) and

Or in fact zero bytes for block size == PAGE_SIZE

> streamlining the code.  This, however, essentially makes the
> buffered IO path through the iomap infrastructure incompatible with
> existing bufferhead based filesystems.
> 
> Despite this, we can still provide limited support for buffered
> writes on bufferhead based filesystems through the iomap
> infrastructure, but I only see that as a mechanism for filesystems
> moving away from dependencies on bufferheads. It would require a
> different refactoring of the iomap code - I'd much prefer to keep
> bufferhead dependent IO paths completely separate (e.g. via a new
> actor function) to minimise impact and dependencies on the internal
> bufferhead-free iomap IO path.
> 
> Let's see what Christoph thinks first, though

gfs2 seems to indeed depend pretty heavily on buffer_heads.  This is 
a bit of a bummer, but I'd need to take a deeper look at how to offer
iomap functionality to them without them moving away from buffer_heads,
which isn't really going to be a quick project.  In a way the
write_begin/write_end ops from Andreas facilitate that, as the
rest of iomap_file_buffered_write and iomap_write_actor are untouched.

So at least temporarily his callouts are what we need, even if the
actual inline data functionality should be moved out of them for sure,
but only with a long term plan to move gfs2 away from buffer heads.

Btw, does gfs2 support block sizes smaller than PAGE_SIZE?

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com
---end quoted text---



Re: [Cluster-devel] [PATCH v4 06/11] iomap: Add write_{begin, end} iomap operations

2018-05-14 Thread Dave Chinner
On Mon, May 14, 2018 at 05:36:19PM +0200, Andreas Gruenbacher wrote:
> Add write_begin and write_end operations to struct iomap_ops to provide
> a way of overriding the default behavior of iomap_write_begin and
> iomap_write_end.  This is needed for implementing data journaling: in
> the data journaling case, pages are written into the journal before
> being written back to their proper on-disk locations.

I'm not sure adding a per-page filesystem callout abstraction is
what we want in the iomap code.  The commit message does not explain
why gfs2 needs per-page callouts, nor why the per-page work cannot
be done/avoided by running code in the ->iomap_begin callout
(e.g. by unstuffing the inode so nothing needs to be done per-page).

My main concern is that the callouts to the filesystem in iomap are
- by design - per IO, not per page. The whole point of using iomap
was to get away from doing per-page filesystem operations on
multi-page IOs - it's highly inefficient when we only need to call
into the filesystem on a per-extent basis, and we simplified the
code a *lot* by avoiding such behaviours.
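
For reference, the shape of the current machinery (abridged from
iomap_apply()): the filesystem is consulted exactly once per extent, and
the actor then walks all the pages covered by that extent without calling
back into the filesystem.

	/* abridged from iomap_apply() */
	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);	/* once per extent */
	if (ret)
		return ret;

	/* trim to the extent the filesystem actually returned */
	if (iomap.offset + iomap.length < pos + length)
		length = iomap.offset + iomap.length - pos;

	/* the actor (e.g. iomap_write_actor) loops over the pages itself */
	written = actor(inode, pos, length, data, &iomap);

	if (ops->iomap_end)
		ret = ops->iomap_end(inode, pos, length,
				     written > 0 ? written : 0, flags, &iomap);
	return written ? written : ret;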

And to that point: this change has serious conflicts with the next
step for the iomap infrastructure: Christoph's recent
"remove bufferheads from iomap" series. Apart from the obvious code
conflicts, there's a fairly major architectural/functional conflict.

https://marc.info/?l=linux-fsdevel&m=152585212000411&w=2

That is, Christoph's patchset pushes us further away from
filesystems doing their own per-page processing of page state. It
converts the iomap IO path to store its own per-page state tracking
information in page->private, greatly reducing memory overhead (16
bytes on a 4k page instead of 104 bytes per bufferhead) and
streamlining the code.  This, however, essentially makes the
buffered IO path through the iomap infrastructure incompatible with
existing bufferhead based filesystems.
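
The per-page state in that series is a single small structure hung off
page->private, roughly of this shape (the exact layout in Christoph's
posting may differ), which is where the quoted 16 bytes per 4k page come
from:

	struct iomap_page {
		atomic_t	read_count;	/* read I/Os in flight on this page */
		atomic_t	write_count;	/* write I/Os in flight on this page */
		DECLARE_BITMAP(uptodate, PAGE_SIZE / 512);	/* per-block uptodate bits */
	};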

Despite this, we can still provide limited support for buffered
writes on bufferhead based filesystems through the iomap
infrastructure, but I only see that as a mechanism for filesystems
moving away from dependencies on bufferheads. It would require a
different refactoring of the iomap code - I'd much prefer to keep
bufferhead dependent IO paths completely separate (e.g. via a new
actor function) to minimise impact and dependencies on the internal
bufferhead-free iomap IO path.

Let's see what Christoph thinks first, though

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com



[Cluster-devel] [PATCH v4 06/11] iomap: Add write_{begin, end} iomap operations

2018-05-14 Thread Andreas Gruenbacher
Add write_begin and write_end operations to struct iomap_ops to provide
a way of overriding the default behavior of iomap_write_begin and
iomap_write_end.  This is needed for implementing data journaling: in
the data journaling case, pages are written into the journal before
being written back to their proper on-disk locations.

Signed-off-by: Andreas Gruenbacher 
Cc: Dave Chinner 
---
 fs/ext2/inode.c   |  2 ++
 fs/ext4/inode.c   |  2 ++
 fs/iomap.c| 62 ++-
 fs/xfs/xfs_iomap.c|  2 ++
 include/linux/iomap.h | 22 +++
 5 files changed, 71 insertions(+), 19 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 1e01fabef130..9598ea936405 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -848,6 +848,8 @@ ext2_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 
 const struct iomap_ops ext2_iomap_ops = {
.iomap_begin= ext2_iomap_begin,
+   .write_begin= iomap_write_begin,
+   .write_end  = iomap_write_end,
.iomap_end  = ext2_iomap_end,
 };
 #else
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1e50c5efae67..05c8e8015f13 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3604,6 +3604,8 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 
 const struct iomap_ops ext4_iomap_ops = {
.iomap_begin= ext4_iomap_begin,
+   .write_begin= iomap_write_begin,
+   .write_end  = iomap_write_end,
.iomap_end  = ext4_iomap_end,
 };
 
diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..27d97a290623 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -108,7 +108,7 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
truncate_pagecache_range(inode, max(pos, i_size), pos + len);
 }
 
-static int
+int
 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
struct page **pagep, struct iomap *iomap)
 {
@@ -137,8 +137,9 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
*pagep = page;
return status;
 }
+EXPORT_SYMBOL_GPL(iomap_write_begin);
 
-static int
+int
 iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
unsigned copied, struct page *page)
 {
@@ -150,12 +151,19 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
iomap_write_failed(inode, pos, len);
return ret;
 }
+EXPORT_SYMBOL_GPL(iomap_write_end);
+
+struct iomap_write_args {
+   struct iov_iter *iter;
+   const struct iomap_ops *ops;
+};
 
 static loff_t
 iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
 {
-   struct iov_iter *i = data;
+   struct iomap_write_args *args = data;
+   struct iov_iter *i = args->iter;
long status = 0;
ssize_t written = 0;
unsigned int flags = AOP_FLAG_NOFS;
@@ -188,7 +196,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
break;
}
 
-   status = iomap_write_begin(inode, pos, bytes, flags, &page,
+   status = args->ops->write_begin(inode, pos, bytes, flags, &page,
iomap);
if (unlikely(status))
break;
@@ -200,7 +208,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 
flush_dcache_page(page);
 
-   status = iomap_write_end(inode, pos, bytes, copied, page);
+   status = args->ops->write_end(inode, pos, bytes, copied, page);
if (unlikely(status < 0))
break;
copied = status;
@@ -237,10 +245,14 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
 {
struct inode *inode = iocb->ki_filp->f_mapping->host;
loff_t pos = iocb->ki_pos, ret = 0, written = 0;
+   struct iomap_write_args args = {
+   .iter = iter,
+   .ops = ops,
+   };
 
while (iov_iter_count(iter)) {
ret = iomap_apply(inode, pos, iov_iter_count(iter),
-   IOMAP_WRITE, ops, iter, iomap_write_actor);
+   IOMAP_WRITE, ops, &args, iomap_write_actor);
if (ret <= 0)
break;
pos += ret;
@@ -271,6 +283,7 @@ static loff_t
 iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
 {
+   const struct iomap_ops *ops = data;
long status = 0;
ssize_t written = 0;
 
@@ -286,15 +299,15 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
if (IS_ERR(rpage))
return