Re: [PATCH v4 15/27] fs: retrofit old error reporting API onto new infrastructure

2017-05-15 Thread Jeff Layton
On Mon, 2017-05-15 at 12:42 +0200, Jan Kara wrote:
> On Tue 09-05-17 11:49:18, Jeff Layton wrote:
> > Now that we have a better way to store and report errors that occur
> > during writeback, we need to convert the existing codebase to use it. We
> > could just adapt all of the filesystem code and related infrastructure
> > to the new API, but that's a lot of churn.
> > 
> > When it comes to setting errors in the mapping, filemap_set_wb_error is
> > a drop-in replacement for mapping_set_error. Turn that function into a
> > simple wrapper around the new one.
> > 
> > Because we want to ensure that writeback errors are always reported at
> > fsync time, inject filemap_report_wb_error calls much closer to the
> > syscall boundary, in call_fsync.
> > 
> > For fsync calls (and things like the nfsd equivalent), we either return
> > the error that the fsync operation returns, or the one returned by
> > filemap_report_wb_error. In both cases, we advance the file->f_wb_err to
> > the latest value. This allows us to provide new fsync semantics that
> > will return errors that may have occurred previously and been viewed
> > via other file descriptors.
> > 
> > The final piece of the puzzle is what to do about filemap_check_errors
> > calls that are being called directly or via filemap_* functions. Here,
> > we must take a little "creative license".
> > 
> > Since we now handle advancing the file->f_wb_err value at the generic
> > filesystem layer, we no longer need those callers to clear errors out
> > of the mapping or advance an errseq_t.
> > 
> > A lot of the existing codebase relies on being getting an error back
> > from those functions when there is a writeback problem, so we do still
> > want to have them report writeback errors somehow.
> > 
> > When reporting writeback errors, we will always report errors that have
> > occurred since a particular point in time. With the old writeback error
> > reporting, the time we used was "since it was last tested/cleared" which
> > is entirely arbitrary and potentially racy. Now, we can at least report
> > the latest error that has occurred since an arbitrary point in time
> > (represented as a sampled errseq_t value).
> > 
> > In the case where we don't have a struct file to work with, this patch
> > just has the wrappers sample the current mapping->wb_err value, and use
> > that as an arbitrary point from which to check for errors.
> 
> I think this is really dangerous and we shouldn't do this. You are quite
> likely to lose IO errors in such calls because you will ignore all errors
> that happened during previous background writeback or even for IO that
> managed to complete before we called filemap_fdatawait(). Maybe we need to
> keep the original set-clear-bit IO error reporting for these cases, until
> we can convert them to fdatawait_range_since()?
> 

Yes that is a danger.

In fact, late last week I was finally able to make my xfstest work with
btrfs, and started hitting oopses with it because of the way the error
reporting changed. I rolled up a patch to fix that (and it simplifies
the code some, IMO), but other callers of filemap_fdatawait may have
similar problems here.

I'll pick up working on this again in a more piecemeal way. I had
originally looked at doing that, but there are some problematic places
(e.g. buffer.c), where the code is shared by a bunch of different
filesystems that makes it difficult.

We may end up needing some sort of FS_WB_ERRSEQ flag to do this
correctly, at least until the transition to errseq_t is done.

> > That's probably not "correct" in all cases, particularly in the case of
> > something like filemap_fdatawait, but I'm not sure it's any worse than
> > what we already have, and this gives us a basis from which to work.
> > 
> > A lot of those callers will likely want to change to a model where they
> > sample the errseq_t much earlier (perhaps when starting a transaction),
> > store it in an appropriate place and then use that value later when
> > checking to see if an error occurred.
> > 
> > That will almost certainly take some involvement from other subsystem
> > maintainers. I'm quite open to adding new API functions to help enable
> > this if that would be helpful, but I don't really want to do that until
> > I better understand what's needed.
> > 
> > Signed-off-by: Jeff Layton 
> 
> ...
> 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 5f7317875a67..7ce13281925f 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -187,6 +187,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
> > start, loff_t end,
> > .nr_to_write = LONG_MAX,
> > .for_reclaim = 0,
> > };
> > +   errseq_t since = READ_ONCE(file->f_wb_err);
> >  
> > if (unlikely(f2fs_readonly(inode->i_sb)))
> > return 0;
> > @@ -265,6 +266,8 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
> > start, loff_t end,
> > }
> >  
> > ret = 

Re: [PATCH v4 15/27] fs: retrofit old error reporting API onto new infrastructure

2017-05-15 Thread Jan Kara
On Tue 09-05-17 11:49:18, Jeff Layton wrote:
> Now that we have a better way to store and report errors that occur
> during writeback, we need to convert the existing codebase to use it. We
> could just adapt all of the filesystem code and related infrastructure
> to the new API, but that's a lot of churn.
> 
> When it comes to setting errors in the mapping, filemap_set_wb_error is
> a drop-in replacement for mapping_set_error. Turn that function into a
> simple wrapper around the new one.
> 
> Because we want to ensure that writeback errors are always reported at
> fsync time, inject filemap_report_wb_error calls much closer to the
> syscall boundary, in call_fsync.
> 
> For fsync calls (and things like the nfsd equivalent), we either return
> the error that the fsync operation returns, or the one returned by
> filemap_report_wb_error. In both cases, we advance the file->f_wb_err to
> the latest value. This allows us to provide new fsync semantics that
> will return errors that may have occurred previously and been viewed
> via other file descriptors.
> 
> The final piece of the puzzle is what to do about filemap_check_errors
> calls that are being called directly or via filemap_* functions. Here,
> we must take a little "creative license".
> 
> Since we now handle advancing the file->f_wb_err value at the generic
> filesystem layer, we no longer need those callers to clear errors out
> of the mapping or advance an errseq_t.
> 
> A lot of the existing codebase relies on being getting an error back
> from those functions when there is a writeback problem, so we do still
> want to have them report writeback errors somehow.
> 
> When reporting writeback errors, we will always report errors that have
> occurred since a particular point in time. With the old writeback error
> reporting, the time we used was "since it was last tested/cleared" which
> is entirely arbitrary and potentially racy. Now, we can at least report
> the latest error that has occurred since an arbitrary point in time
> (represented as a sampled errseq_t value).
> 
> In the case where we don't have a struct file to work with, this patch
> just has the wrappers sample the current mapping->wb_err value, and use
> that as an arbitrary point from which to check for errors.

I think this is really dangerous and we shouldn't do this. You are quite
likely to lose IO errors in such calls because you will ignore all errors
that happened during previous background writeback or even for IO that
managed to complete before we called filemap_fdatawait(). Maybe we need to
keep the original set-clear-bit IO error reporting for these cases, until
we can convert them to fdatawait_range_since()?

> That's probably not "correct" in all cases, particularly in the case of
> something like filemap_fdatawait, but I'm not sure it's any worse than
> what we already have, and this gives us a basis from which to work.
> 
> A lot of those callers will likely want to change to a model where they
> sample the errseq_t much earlier (perhaps when starting a transaction),
> store it in an appropriate place and then use that value later when
> checking to see if an error occurred.
> 
> That will almost certainly take some involvement from other subsystem
> maintainers. I'm quite open to adding new API functions to help enable
> this if that would be helpful, but I don't really want to do that until
> I better understand what's needed.
> 
> Signed-off-by: Jeff Layton 

...

> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 5f7317875a67..7ce13281925f 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -187,6 +187,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
> start, loff_t end,
>   .nr_to_write = LONG_MAX,
>   .for_reclaim = 0,
>   };
> + errseq_t since = READ_ONCE(file->f_wb_err);
>  
>   if (unlikely(f2fs_readonly(inode->i_sb)))
>   return 0;
> @@ -265,6 +266,8 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
> start, loff_t end,
>   }
>  
>   ret = wait_on_node_pages_writeback(sbi, ino);
> + if (ret == 0)
> + ret = filemap_check_wb_error(NODE_MAPPING(sbi), since);
>   if (ret)
>   goto out;

So this conversion looks wrong and actually points to a larger issue with
the scheme. The problem is there are two mappings that come into play here
- file_inode(file)->i_mapping which is the data mapping and
NODE_MAPPING(sbi) which is the metadata mapping (and this is not a problem
specific to f2fs. For example ext2 also uses this scheme where block
devices' mapping is the metadata mapping). And we need to merge error
information from these two mappings so for the stamping scheme to work,
we'd need two stamps stored in struct file. One for data mapping and one
for metadata mapping. Or maybe there's some more clever scheme but for now
I don't see one...

Honza
-- 
Jan Kara 

[PATCH v4 15/27] fs: retrofit old error reporting API onto new infrastructure

2017-05-09 Thread Jeff Layton
Now that we have a better way to store and report errors that occur
during writeback, we need to convert the existing codebase to use it. We
could just adapt all of the filesystem code and related infrastructure
to the new API, but that's a lot of churn.

When it comes to setting errors in the mapping, filemap_set_wb_error is
a drop-in replacement for mapping_set_error. Turn that function into a
simple wrapper around the new one.

Because we want to ensure that writeback errors are always reported at
fsync time, inject filemap_report_wb_error calls much closer to the
syscall boundary, in call_fsync.

For fsync calls (and things like the nfsd equivalent), we either return
the error that the fsync operation returns, or the one returned by
filemap_report_wb_error. In both cases, we advance the file->f_wb_err to
the latest value. This allows us to provide new fsync semantics that
will return errors that may have occurred previously and been viewed
via other file descriptors.

The final piece of the puzzle is what to do about filemap_check_errors
calls that are being called directly or via filemap_* functions. Here,
we must take a little "creative license".

Since we now handle advancing the file->f_wb_err value at the generic
filesystem layer, we no longer need those callers to clear errors out
of the mapping or advance an errseq_t.

A lot of the existing codebase relies on being getting an error back
from those functions when there is a writeback problem, so we do still
want to have them report writeback errors somehow.

When reporting writeback errors, we will always report errors that have
occurred since a particular point in time. With the old writeback error
reporting, the time we used was "since it was last tested/cleared" which
is entirely arbitrary and potentially racy. Now, we can at least report
the latest error that has occurred since an arbitrary point in time
(represented as a sampled errseq_t value).

In the case where we don't have a struct file to work with, this patch
just has the wrappers sample the current mapping->wb_err value, and use
that as an arbitrary point from which to check for errors.

That's probably not "correct" in all cases, particularly in the case of
something like filemap_fdatawait, but I'm not sure it's any worse than
what we already have, and this gives us a basis from which to work.

A lot of those callers will likely want to change to a model where they
sample the errseq_t much earlier (perhaps when starting a transaction),
store it in an appropriate place and then use that value later when
checking to see if an error occurred.

That will almost certainly take some involvement from other subsystem
maintainers. I'm quite open to adding new API functions to help enable
this if that would be helpful, but I don't really want to do that until
I better understand what's needed.

Signed-off-by: Jeff Layton 
---
 Documentation/filesystems/vfs.txt |  9 +++
 fs/btrfs/file.c   | 10 ++--
 fs/btrfs/tree-log.c   |  9 ++-
 fs/f2fs/file.c|  3 +++
 fs/f2fs/node.c|  6 +
 fs/fuse/file.c|  7 --
 fs/libfs.c|  6 +++--
 include/linux/fs.h| 19 +--
 include/linux/pagemap.h   |  8 ++
 mm/filemap.c  | 51 +--
 10 files changed, 57 insertions(+), 71 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index ed06fb39822b..f201a77873f7 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -577,7 +577,7 @@ written at any point after PG_Dirty is clear.  Once it is 
known to be
 safe, PG_Writeback is cleared.
 
 If there is an error during writeback, then the address_space should be
-marked with an error (typically using filemap_set_wb_error), in order to
+marked with an error (typically using mapping_set_error), in order to
 ensure that the error can later be reported to the application when an
 fsync is issued.
 
@@ -893,10 +893,9 @@ otherwise noted.
 
   release: called when the last reference to an open file is closed
 
-  fsync: called by the fsync(2) system call. Filesystems that use the
-   pagecache should call filemap_report_wb_error before returning
-   to ensure that any errors that occurred during writeback are
-   reported and the file's error sequence advanced.
+  fsync: called by the fsync(2) system call. Errors that were previously
+recorded using mapping_set_error will automatically be returned to
+the application and the file's error sequence advanced.
 
   fasync: called by the fcntl(2) system call when asynchronous
(non-blocking) mode is enabled for a file
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 520cb7230b2d..e15faf240b51 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1962,6 +1962,7 @@ int