Re: [PATCH v7 11/22] fs: new infrastructure for writeback error handling and reporting

2017-06-20 Thread Jeff Layton
On Tue, 2017-06-20 at 05:34 -0700, Christoph Hellwig wrote:
> > @@ -393,6 +394,7 @@ struct address_space {
> > 	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
> > 	struct list_head	private_list;	/* ditto */
> > 	void			*private_data;	/* ditto */
> > +	errseq_t		wb_err;
> >  } __attribute__((aligned(sizeof(long))));
> > 	/*
> > 	 * On most architectures that alignment is already the case; but
> > @@ -847,6 +849,7 @@ struct file {
> > 	 * Must not be taken from IRQ context.
> > 	 */
> > 	spinlock_t		f_lock;
> > +	errseq_t		f_wb_err;
> > 	atomic_long_t		f_count;
> > 	unsigned int		f_flags;
> > 	fmode_t			f_mode;
> 
> Did you check the sizes of the structure before and after?
> These places don't look like holes in the packing, but there probably
> are some available.
> 

Yes. That one actually plugs a 4 byte hole in struct file on x86_64.

> > +static inline int filemap_check_wb_err(struct address_space *mapping, errseq_t since)
> 
> Overly long line here (the patch has a few more)
> 

Ok, I'll fix those up.

-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 11/22] fs: new infrastructure for writeback error handling and reporting

2017-06-20 Thread Christoph Hellwig
> @@ -393,6 +394,7 @@ struct address_space {
> 	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
> 	struct list_head	private_list;	/* ditto */
> 	void			*private_data;	/* ditto */
> +	errseq_t		wb_err;
>  } __attribute__((aligned(sizeof(long))));
> 	/*
> 	 * On most architectures that alignment is already the case; but
> @@ -847,6 +849,7 @@ struct file {
> 	 * Must not be taken from IRQ context.
> 	 */
> 	spinlock_t		f_lock;
> +	errseq_t		f_wb_err;
> 	atomic_long_t		f_count;
> 	unsigned int		f_flags;
> 	fmode_t			f_mode;

Did you check the sizes of the structure before and after?
These places don't look like holes in the packing, but there probably
are some available.

> +static inline int filemap_check_wb_err(struct address_space *mapping, errseq_t since)

Overly long line here (the patch has a few more)



[PATCH v7 11/22] fs: new infrastructure for writeback error handling and reporting

2017-06-16 Thread Jeff Layton
Most filesystems currently use mapping_set_error and
filemap_check_errors for setting and reporting/clearing writeback errors
at the mapping level. filemap_check_errors is indirectly called from
most of the filemap_fdatawait_* functions and from
filemap_write_and_wait*. These functions are called from all sorts of
contexts to wait on writeback to finish -- mostly in fsync, but also in
truncate calls, getattr, etc.

The non-fsync callers are problematic. We should be reporting writeback
errors during fsync, but many places spread over the tree clear out
errors before they can be properly reported, or report errors at
nonsensical times.

If I get -EIO on a stat() call, there is no reason for me to assume that
it is because some previous writeback failed. The fact that it also
clears out the error such that a subsequent fsync returns 0 is a bug,
and a nasty one since that's potentially silent data corruption.

This patch adds a small bit of new infrastructure for setting and
reporting errors during address_space writeback. While the above was my
original impetus for adding this, I think it's also the case that
current fsync semantics are just problematic for userland. Most
applications that call fsync do so to ensure that the data they wrote
has hit the backing store.
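
For reference, the userland pattern in question looks roughly like the
sketch below. write_and_sync() is a hypothetical helper name used only for
illustration; the calls it makes (open, write, fsync, close) are the
ordinary POSIX ones. It returns 0 only if every step, including fsync,
reported success.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of this patch): write len bytes from buf
 * to path, then fsync. Returns 0 on success or a negative errno.
 */
static int write_and_sync(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int ret = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -errno;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			ret = -errno;
			goto out;
		}
		p += n;
		len -= (size_t)n;
	}

	/* The caller relies on this return value to learn about writeback errors. */
	if (fsync(fd) < 0)
		ret = -errno;
out:
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

A caller that gets 0 back from such a helper assumes its data is on stable
storage, which is why a swallowed writeback error matters so much.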

In the case where there are multiple writers to the file at the same
time, this is really hard to determine. The first one to call fsync will
see any stored error, and the rest get back 0. The processes with open
fds may not be associated with one another in any way. They could even
be in different containers, so ensuring coordination between all fsync
callers is not really an option.
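
To make the problem concrete, here is a minimal userspace model of a
test-and-clear error flag. The names (flag_mapping, flag_set_error,
flag_check_and_clear) are made up for this sketch and only loosely mirror
mapping_set_error and filemap_check_errors; this is not the kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mapping holding one sticky error flag. */
struct flag_mapping {
	bool eio;	/* set when writeback fails */
};

/* Record a writeback failure. */
static void flag_set_error(struct flag_mapping *m)
{
	m->eio = true;
}

/*
 * Report the error and clear it. Only the first caller after a failure
 * sees -EIO; every later caller gets 0, whoever they are.
 */
static int flag_check_and_clear(struct flag_mapping *m)
{
	if (m->eio) {
		m->eio = false;
		return -EIO;
	}
	return 0;
}
```

With two independent callers, the first flag_check_and_clear() after a
failure returns -EIO and the second returns 0, even though neither has any
way of knowing about the other.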

One way to remedy this would be to track what file descriptor was used
to dirty the file, but that's rather cumbersome and would likely be
slow. However, there is a simpler way to improve the semantics here
without incurring too much overhead.

This set adds an errseq_t to struct address_space, and a corresponding
one is added to struct file. Writeback errors are recorded in the
mapping's errseq_t, and the one in struct file is used as the "since"
value.

This changes the semantics of the Linux fsync implementation such that
applications can now use it to determine whether there were any
writeback errors since fsync(fd) was last called (or since the file was
opened in the case of fsync having never been called).
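
A rough userspace model of the per-file cursor idea follows. Everything
named model_* is hypothetical and deliberately simplified: the error and
the sequence counter are kept in separate fields, and no attempt is made
to reproduce how errseq_t actually encodes or updates its value.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of the mapping-side error state. */
struct model_mapping {
	int      err;	/* most recent negative errno, 0 if none */
	uint32_t seq;	/* bumped each time an error is recorded */
};

/* Hypothetical model of an open file with its "since" cursor. */
struct model_file {
	struct model_mapping *mapping;
	uint32_t since;	/* sequence value sampled at open or last fsync */
};

/* Record a writeback error in the mapping. */
static void model_set_wb_err(struct model_mapping *m, int err)
{
	m->err = err;
	m->seq++;
}

/* Sample the current sequence when the file is opened. */
static void model_open(struct model_file *f, struct model_mapping *m)
{
	f->mapping = m;
	f->since = m->seq;
}

/*
 * Report an error only if one was recorded since this file's cursor,
 * then advance the cursor. Other files' cursors are left alone.
 */
static int model_fsync(struct model_file *f)
{
	if (f->mapping->seq == f->since)
		return 0;
	f->since = f->mapping->seq;
	return f->mapping->err;
}
```

In this model, two files opened before a failure each see -EIO once from
model_fsync(), and a file opened afterwards sees 0.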

Note that those writeback errors may have occurred when writing data
that was dirtied via an entirely different fd, but that's the case now
with the current mapping_set_error/filemap_check_errors infrastructure.
This will at least prevent you from getting a false report of success.

The new behavior is still consistent with the POSIX spec, and is more
reliable for application developers. This patch just adds some basic
infrastructure for doing this, and ensures that the f_wb_err "cursor"
is properly set when a file is opened. Later patches will change the
existing code to use this new infrastructure for reporting errors at
fsync time.

Signed-off-by: Jeff Layton 
Reviewed-by: Jan Kara 
---
 drivers/dax/device.c |  1 +
 fs/block_dev.c       |  1 +
 fs/file_table.c      |  1 +
 fs/open.c            |  3 +++
 include/linux/fs.h   | 53
 mm/filemap.c         | 38
 6 files changed, 97 insertions(+)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 006e657dfcb9..12943d19bfc4 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -499,6 +499,7 @@ static int dax_open(struct inode *inode, struct file *filp)
 	inode->i_mapping = __dax_inode->i_mapping;
 	inode->i_mapping->host = __dax_inode;
 	filp->f_mapping = inode->i_mapping;
+	filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
 	filp->private_data = dev_dax;
 	inode->i_flags = S_DAX;
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index bcd8e16a34e1..dc839f8f0ba5 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1746,6 +1746,7 @@ static int blkdev_open(struct inode * inode, struct file * filp)
 		return -ENOMEM;
 
 	filp->f_mapping = bdev->bd_inode->i_mapping;
+	filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
 
 	return blkdev_get(bdev, filp->f_mode, filp);
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index 954d510b765a..72e861a35a7f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -168,6 +168,7 @@ struct file *alloc_file(const struct path *path, fmode_t mode,
 	file->f_path = *path;
 	file->f_inode = path->dentry->d_inode;
 	file->f_mapping = path->dentry->d_inode->i_mapping;
+	file->f_wb_err = filemap_sample_wb_err(file->f_mapping);
 	if ((mode & FMODE_READ) &&
 	     likely(fop->read || fop->read_iter))
 		mode |= FMODE_CAN_READ;
diff --git a/fs/open.c b/fs/open.c
index cd0c5be8d012..280d4a963791 100644
---