Re: [PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-05-01 Thread Dave Chinner
On Tue, Apr 24, 2018 at 10:34:44AM -0700, Christoph Hellwig wrote:
> On Sat, Apr 21, 2018 at 02:54:05PM +0200, Jan Kara wrote:
> > > - if (iocb->ki_flags & IOCB_DSYNC)
> > > + if (iocb->ki_flags & IOCB_DSYNC) {
> > >   dio->flags |= IOMAP_DIO_NEED_SYNC;
> > > + /*
> > > +  * We optimistically try using FUA for this IO.  Any
> > > +  * non-FUA write that occurs will clear this flag, hence
> > > +  * we know before completion whether a cache flush is
> > > +  * necessary.
> > > +  */
> > > + dio->flags |= IOMAP_DIO_WRITE_FUA;
> > > + }
> > 
> > So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC
> > writes (in that case we also set IOCB_SYNC). And for those we cannot use
> > the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe
> > indicator of a need of full fsync for O_SYNC). Other than that the patch
> > looks good to me.
> 
> Oops, good catch. I think the above if should just be
> 
>   if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) {
> 
> and we are fine. 

Ah, not exactly. IOMAP_DIO_NEED_SYNC needs to be set for either
DYSNC or SYNC writes, while IOMAP_DIO_WRITE_FUA should only be set
for DSYNC.

I'll fix this up appropriately.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-04-25 Thread Jan Kara
On Wed 25-04-18 00:07:07, Holger Hoffstätte wrote:
> On 04/24/18 19:34, Christoph Hellwig wrote:
> > On Sat, Apr 21, 2018 at 02:54:05PM +0200, Jan Kara wrote:
> > > > -   if (iocb->ki_flags & IOCB_DSYNC)
> > > > +   if (iocb->ki_flags & IOCB_DSYNC) {
> > > > dio->flags |= IOMAP_DIO_NEED_SYNC;
> > > > +   /*
> > > > +* We optimistically try using FUA for this IO. 
> > > >  Any
> > > > +* non-FUA write that occurs will clear this 
> > > > flag, hence
> > > > +* we know before completion whether a cache 
> > > > flush is
> > > > +* necessary.
> > > > +*/
> > > > +   dio->flags |= IOMAP_DIO_WRITE_FUA;
> > > > +   }
> > > 
> > > So I don't think this is quite correct. IOCB_DSYNC gets set also for 
> > > O_SYNC
> > > writes (in that case we also set IOCB_SYNC). And for those we cannot use
> > > the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe
> > > indicator of a need of full fsync for O_SYNC). Other than that the patch
> > > looks good to me.
> > 
> > Oops, good catch. I think the above if should just be
> > 
> > if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) {
> > 
> > and we are fine.
> 
> The above line just gives parenthesis salad errors, so why not compromise
> on:
> 
>   if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC) {
> 
> Unless my bit twiddling has completely left me I think this is what was
> intended, and it actually compiles too.

Yup, I agree this is what needs to happen.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-04-24 Thread Christoph Hellwig
On Wed, Apr 25, 2018 at 12:07:07AM +0200, Holger Hoffstätte wrote:
> The above line just gives parenthesis salad errors, so why not compromise
> on:
> 
>   if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC) {
> 
> Unless my bit twiddling has completely left me I think this is what was
> intended, and it actually compiles too.

Yes, that is what was intended :)


Re: [PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-04-24 Thread Holger Hoffstätte

On 04/24/18 19:34, Christoph Hellwig wrote:

On Sat, Apr 21, 2018 at 02:54:05PM +0200, Jan Kara wrote:

-   if (iocb->ki_flags & IOCB_DSYNC)
+   if (iocb->ki_flags & IOCB_DSYNC) {
dio->flags |= IOMAP_DIO_NEED_SYNC;
+   /*
+* We optimistically try using FUA for this IO.  Any
+* non-FUA write that occurs will clear this flag, hence
+* we know before completion whether a cache flush is
+* necessary.
+*/
+   dio->flags |= IOMAP_DIO_WRITE_FUA;
+   }


So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC
writes (in that case we also set IOCB_SYNC). And for those we cannot use
the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe
indicator of a need of full fsync for O_SYNC). Other than that the patch
looks good to me.


Oops, good catch. I think the above if should just be

if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) {

and we are fine.


The above line just gives parenthesis salad errors, so why not compromise
on:

if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC) {

Unless my bit twiddling has completely left me I think this is what was
intended, and it actually compiles too.

cheers
Holger


Re: [PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-04-24 Thread Christoph Hellwig
On Sat, Apr 21, 2018 at 02:54:05PM +0200, Jan Kara wrote:
> > -   if (iocb->ki_flags & IOCB_DSYNC)
> > +   if (iocb->ki_flags & IOCB_DSYNC) {
> > dio->flags |= IOMAP_DIO_NEED_SYNC;
> > +   /*
> > +* We optimistically try using FUA for this IO.  Any
> > +* non-FUA write that occurs will clear this flag, hence
> > +* we know before completion whether a cache flush is
> > +* necessary.
> > +*/
> > +   dio->flags |= IOMAP_DIO_WRITE_FUA;
> > +   }
> 
> So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC
> writes (in that case we also set IOCB_SYNC). And for those we cannot use
> the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe
> indicator of a need of full fsync for O_SYNC). Other than that the patch
> looks good to me.

Oops, good catch. I think the above if should just be

if (iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC) == IOCB_DSYNC)) {

and we are fine. 

> 
>   Honza
> -- 
> Jan Kara 
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---


Re: [PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-04-21 Thread Jan Kara
On Wed 18-04-18 14:08:28, Dave Chinner wrote:
> @@ -1012,8 +1035,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>   dio->flags |= IOMAP_DIO_DIRTY;
>   } else {
>   dio->flags |= IOMAP_DIO_WRITE;
> - if (iocb->ki_flags & IOCB_DSYNC)
> + if (iocb->ki_flags & IOCB_DSYNC) {
>   dio->flags |= IOMAP_DIO_NEED_SYNC;
> + /*
> +  * We optimistically try using FUA for this IO.  Any
> +  * non-FUA write that occurs will clear this flag, hence
> +  * we know before completion whether a cache flush is
> +  * necessary.
> +  */
> + dio->flags |= IOMAP_DIO_WRITE_FUA;
> + }

So I don't think this is quite correct. IOCB_DSYNC gets set also for O_SYNC
writes (in that case we also set IOCB_SYNC). And for those we cannot use
the FUA optimization AFAICT (definitely IOMAP_F_DIRTY isn't a safe
indicator of a need of full fsync for O_SYNC). Other than that the patch
looks good to me.

Honza
-- 
Jan Kara 
SUSE Labs, CR


[PATCH 4/4] iomap: Use FUA for pure data O_DSYNC DIO writes

2018-04-17 Thread Dave Chinner
From: Dave Chinner 

If we are doing direct IO writes with datasync semantics, we often
have to flush metadata changes along with the data write. However,
if we are overwriting existing data, there are no metadata changes
that we need to flush. In this case, optimising the IO by using
FUA write makes sense.

We know from the IOMAP_F_DIRTY flag as to whether a specific inode
requires a metadata flush - this is currently used by DAX to ensure
extent modification as stable in page fault operations. For direct
IO writes, we can use it to determine if we need to flush metadata
or not once the data is on disk.

Hence if we have been returned a mapped extent that is not new and
the IO mapping is not dirty, then we can use a FUA write to provide
datasync semantics. This allows us to short-cut the
generic_write_sync() call in IO completion and hence avoid
unnecessary operations. This makes pure direct IO data write
behaviour identical to the way block devices use REQ_FUA to provide
datasync semantics.

On a FUA enabled device, a synchronous direct IO write workload
(sequential 4k overwrites in 32MB file) had the following results:

# xfs_io -fd -c "pwrite -V 1 -D 0 32m" /mnt/scratch/boo

kernel  timewrite()swrite iops  Write b/w
--  --  -
(no dsync)   4s 2173/s  21738.5MB/s
vanilla 22s  370/s   7501.4MB/s
patched 19s  420/s   4201.6MB/s

The patched code clearly doesn't send cache flushes anymore, but
instead uses FUA (confirmed via blktrace), and performance improves
a bit as a result. However, the benefits will be higher on workloads
that mix O_DSYNC overwrites with other write IO as we won't be
flushing the entire device cache on every DSYNC overwrite IO
anymore.

Signed-Off-By: Dave Chinner 
Reviewed-by: Christoph Hellwig 
---
 fs/iomap.c | 48 +++-
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 1f59c2d9ade6..62f1f8256da2 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -685,6 +685,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_data);
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_WRITE_FUA(1 << 28)
 #define IOMAP_DIO_NEED_SYNC(1 << 29)
 #define IOMAP_DIO_WRITE(1 << 30)
 #define IOMAP_DIO_DIRTY(1 << 31)
@@ -860,6 +861,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
struct iov_iter iter;
struct bio *bio;
bool need_zeroout = false;
+   bool use_fua = false;
int nr_pages, ret;
size_t copied = 0;
 
@@ -883,8 +885,20 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
case IOMAP_MAPPED:
if (iomap->flags & IOMAP_F_SHARED)
dio->flags |= IOMAP_DIO_COW;
-   if (iomap->flags & IOMAP_F_NEW)
+   if (iomap->flags & IOMAP_F_NEW) {
need_zeroout = true;
+   } else {
+   /*
+* Use a FUA write if we need datasync semantics, this
+* is a pure data IO that doesn't require any metadata
+* updates and the underlying device supports FUA. This
+* allows us to avoid cache flushes on IO completion.
+*/
+   if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
+   (dio->flags & IOMAP_DIO_WRITE_FUA) &&
+   blk_queue_fua(bdev_get_queue(iomap->bdev)))
+   use_fua = true;
+   }
break;
default:
WARN_ON_ONCE(1);
@@ -932,10 +946,14 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
 
n = bio->bi_iter.bi_size;
if (dio->flags & IOMAP_DIO_WRITE) {
-   bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC | 
REQ_IDLE);
+   bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
+   if (use_fua)
+   bio->bi_opf |= REQ_FUA;
+   else
+   dio->flags &= ~IOMAP_DIO_WRITE_FUA;
task_io_account_write(n);
} else {
-   bio_set_op_attrs(bio, REQ_OP_READ, 0);
+   bio->bi_opf = REQ_OP_READ;
if (dio->flags & IOMAP_DIO_DIRTY)
bio_set_pages_dirty(bio);
}
@@ -965,7 +983,12 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t 
length,
 
 /*
  * iomap_dio_rw() always completes O_[D]SYNC writes regardless of whether the 
IO
- * is being issued as AIO or not.
+ * is being issued as AIO or not.  This allows us to optimise pure data write