Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O

2018-05-20 Thread Goldwyn Rodrigues


On 05/19/2018 08:29 PM, Theodore Y. Ts'o wrote:
> On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote:
>> From: Goldwyn Rodrigues 
>>
>> In case direct I/O encounters an error midway, it returns the error.
>> Instead it should be returning the number of bytes transferred so far.
>>
>> Test case for filesystems (with ENOSPC):
>> 1. Create an almost full filesystem
>> 2. Create a file, say /mnt/lastfile, until the filesystem is full.
>> 3. Direct write() with count > sizeof /mnt/lastfile.
>>
>> Result: write() returns -ENOSPC. However, file content has data written
>> in step 3.
>>
>> Added a sysctl entry: dio_short_writes which is on by default. This is
>> to support applications which expect either and error or the bytes submitted
>> as a return value for the write calls.
>>
>> This fixes fstest generic/472.
>>
>> Signed-off-by: Goldwyn Rodrigues 
> 
> Hi,
> 
> I was wondering if you could give an update regarding what's up with
> this patch?
> 
> There doesn't seem to be any movement on this patch in a while, and so
> I still have xfstests generic/472 suppressed in {kvm,gce}-xfstests.
> 

>From earlier discussions, In between errors of a direct I/O cannot be
handled correctly and may need a lot of tracking that it is not worth
performing.

It would be better to drop this test case and add in the documentation
that a direct I/O error could mean that the write() may or may not have
occurred and underlying data may be inconsistent. I am proposing:

diff --git a/man2/write.2 b/man2/write.2
index f8a94f3ff..86f655e26 100644
--- a/man2/write.2
+++ b/man2/write.2
@@ -274,6 +274,14 @@ On Linux,
 returning the number of bytes actually transferred.
 .\" commit e28cc71572da38a5a12c1cfe4d7032017adccf69
 (This is true on both 32-bit and 64-bit systems.)
+.PP
+While performing
+.BR write()
+using direct I/O, an error returned does not mean the
+entire write has failed. Partial data may be written
+and the file offset to length on which the
+.BR write()
+was attempted should be considered inconsistent.
 .SH BUGS
 According to POSIX.1-2008/SUSv4 Section XSI 2.9.7
 ("Thread Interactions with Regular File Operations"):


-- 
Goldwyn


Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O

2018-05-19 Thread Theodore Y. Ts'o
On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues 
> 
> In case direct I/O encounters an error midway, it returns the error.
> Instead it should be returning the number of bytes transferred so far.
> 
> Test case for filesystems (with ENOSPC):
> 1. Create an almost full filesystem
> 2. Create a file, say /mnt/lastfile, until the filesystem is full.
> 3. Direct write() with count > sizeof /mnt/lastfile.
> 
> Result: write() returns -ENOSPC. However, file content has data written
> in step 3.
> 
> Added a sysctl entry: dio_short_writes which is on by default. This is
> to support applications which expect either and error or the bytes submitted
> as a return value for the write calls.
> 
> This fixes fstest generic/472.
> 
> Signed-off-by: Goldwyn Rodrigues 

Hi,

I was wondering if you could give an update regarding what's up with
this patch?

There doesn't seem to be any movement on this patch in a while, and so
I still have xfstests generic/472 suppressed in {kvm,gce}-xfstests.

Thanks,

- Ted


Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O

2018-03-08 Thread Darrick J. Wong
On Thu, Mar 08, 2018 at 09:35:48AM -0600, Goldwyn Rodrigues wrote:
> 
> 
> On 03/07/2018 06:53 PM, Darrick J. Wong wrote:
> > On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote:
> >> From: Goldwyn Rodrigues 
> >>
> >> In case direct I/O encounters an error midway, it returns the error.
> >> Instead it should be returning the number of bytes transferred so far.
> >>
> >> Test case for filesystems (with ENOSPC):
> >> 1. Create an almost full filesystem
> >> 2. Create a file, say /mnt/lastfile, until the filesystem is full.
> >> 3. Direct write() with count > sizeof /mnt/lastfile.
> >>
> >> Result: write() returns -ENOSPC. However, file content has data written
> >> in step 3.
> >>
> >> Added a sysctl entry: dio_short_writes which is on by default. This is
> >> to support applications which expect either and error or the bytes 
> >> submitted
> >> as a return value for the write calls.
> >>
> >> This fixes fstest generic/472.
> >>
> >> Signed-off-by: Goldwyn Rodrigues 
> >> ---
> >>  Documentation/sysctl/fs.txt | 14 ++
> >>  fs/block_dev.c  |  2 +-
> >>  fs/direct-io.c  |  7 +--
> >>  fs/iomap.c  | 23 ---
> >>  include/linux/fs.h  |  1 +
> >>  kernel/sysctl.c |  9 +
> >>  6 files changed, 42 insertions(+), 14 deletions(-)
> >>
> >> Changes since v1:
> >>  - incorporated iomap and block devices
> >>
> >> Changes since v2:
> >>  - realized that file size was not increasing when performing a (partial)
> >>direct I/O because end_io function was receiving the error instead of
> >>size. Fixed.
> >>
> >> Changes since v3:
> >>  - [hch] initialize transferred with dio->size and use transferred instead
> >>of dio->size.
> >>
> >> Changes since v4:
> >>  - Refreshed to v4.14
> >>
> >> Changes since v5:
> >>  - Added /proc/sys/fs/dio_short_writes (default 1) to guard older 
> >> applications
> >>which expect write(fd, buf, count) returns either count or error.
> >>
> >> Changes since v6:
> >>  - Corrected documentation
> >>  - Re-ordered patch
> >>
> >> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> >> index 6c00c1e2743f..21582f675985 100644
> >> --- a/Documentation/sysctl/fs.txt
> >> +++ b/Documentation/sysctl/fs.txt
> >> @@ -22,6 +22,7 @@ Currently, these files are in /proc/sys/fs:
> >>  - aio-max-nr
> >>  - aio-nr
> >>  - dentry-state
> >> +- dio_short_writes
> >>  - dquot-max
> >>  - dquot-nr
> >>  - file-max
> >> @@ -76,6 +77,19 @@ dcache isn't pruned yet.
> >>  
> >>  ==
> >>  
> >> +dio_short_writes:
> >> +
> >> +In case Direct I/O encounters a transient error, it returns
> >> +the error code, even if it has performed part of the write.
> >> +This flag, if on (default), will return the number of bytes written
> >> +so far, as the write(2) semantics are. However, some older applications
> >> +still consider a direct write as an error if all of the I/O
> >> +submitted is not complete. I.e. write(file, count, buf) != count.
> >> +This option can be disabled on systems in order to support
> >> +existing applications which do not expect short writes.
> >> +
> >> +==
> >> +
> >>  dquot-max & dquot-nr:
> >>  
> >>  The file dquot-max shows the maximum number of cached disk
> >> diff --git a/fs/block_dev.c b/fs/block_dev.c
> >> index 4a181fcb5175..49d94360bb51 100644
> >> --- a/fs/block_dev.c
> >> +++ b/fs/block_dev.c
> >> @@ -409,7 +409,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
> >> *iter, int nr_pages)
> >>  
> >>if (!ret)
> >>ret = blk_status_to_errno(dio->bio.bi_status);
> >> -  if (likely(!ret))
> >> +  if (likely(dio->size))
> >>ret = dio->size;
> >>  
> >>bio_put(>bio);
> >> diff --git a/fs/direct-io.c b/fs/direct-io.c
> >> index 3aafb3343a65..9bd15be64c25 100644
> >> --- a/fs/direct-io.c
> >> +++ b/fs/direct-io.c
> >> @@ -151,6 +151,7 @@ struct dio {
> >>  } cacheline_aligned_in_smp;
> >>  
> >>  static struct kmem_cache *dio_cache __read_mostly;
> >> +unsigned int sysctl_dio_short_writes = 1;
> >>  
> >>  /*
> >>   * How many pages are in the queue?
> >> @@ -262,7 +263,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t 
> >> ret, unsigned int flags)
> >>ret = dio->page_errors;
> >>if (ret == 0)
> >>ret = dio->io_error;
> >> -  if (ret == 0)
> >> +  if (!sysctl_dio_short_writes && (ret == 0))
> >>ret = transferred;
> >>  
> >>if (dio->end_io) {
> >> @@ -310,7 +311,9 @@ static ssize_t dio_complete(struct dio *dio, ssize_t 
> >> ret, unsigned int flags)
> >>}
> >>  
> >>kmem_cache_free(dio_cache, dio);
> >> -  return ret;
> >> +  if (!sysctl_dio_short_writes)
> >> +  return ret;
> >> +  return transferred ? transferred : ret;
> >>  }
> >>  
> >>  static void 

Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O

2018-03-08 Thread Goldwyn Rodrigues


On 03/07/2018 06:53 PM, Darrick J. Wong wrote:
> On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote:
>> From: Goldwyn Rodrigues 
>>
>> In case direct I/O encounters an error midway, it returns the error.
>> Instead it should be returning the number of bytes transferred so far.
>>
>> Test case for filesystems (with ENOSPC):
>> 1. Create an almost full filesystem
>> 2. Create a file, say /mnt/lastfile, until the filesystem is full.
>> 3. Direct write() with count > sizeof /mnt/lastfile.
>>
>> Result: write() returns -ENOSPC. However, file content has data written
>> in step 3.
>>
>> Added a sysctl entry: dio_short_writes which is on by default. This is
>> to support applications which expect either and error or the bytes submitted
>> as a return value for the write calls.
>>
>> This fixes fstest generic/472.
>>
>> Signed-off-by: Goldwyn Rodrigues 
>> ---
>>  Documentation/sysctl/fs.txt | 14 ++
>>  fs/block_dev.c  |  2 +-
>>  fs/direct-io.c  |  7 +--
>>  fs/iomap.c  | 23 ---
>>  include/linux/fs.h  |  1 +
>>  kernel/sysctl.c |  9 +
>>  6 files changed, 42 insertions(+), 14 deletions(-)
>>
>> Changes since v1:
>>  - incorporated iomap and block devices
>>
>> Changes since v2:
>>  - realized that file size was not increasing when performing a (partial)
>>direct I/O because end_io function was receiving the error instead of
>>size. Fixed.
>>
>> Changes since v3:
>>  - [hch] initialize transferred with dio->size and use transferred instead
>>of dio->size.
>>
>> Changes since v4:
>>  - Refreshed to v4.14
>>
>> Changes since v5:
>>  - Added /proc/sys/fs/dio_short_writes (default 1) to guard older 
>> applications
>>which expect write(fd, buf, count) returns either count or error.
>>
>> Changes since v6:
>>  - Corrected documentation
>>  - Re-ordered patch
>>
>> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
>> index 6c00c1e2743f..21582f675985 100644
>> --- a/Documentation/sysctl/fs.txt
>> +++ b/Documentation/sysctl/fs.txt
>> @@ -22,6 +22,7 @@ Currently, these files are in /proc/sys/fs:
>>  - aio-max-nr
>>  - aio-nr
>>  - dentry-state
>> +- dio_short_writes
>>  - dquot-max
>>  - dquot-nr
>>  - file-max
>> @@ -76,6 +77,19 @@ dcache isn't pruned yet.
>>  
>>  ==
>>  
>> +dio_short_writes:
>> +
>> +In case Direct I/O encounters a transient error, it returns
>> +the error code, even if it has performed part of the write.
>> +This flag, if on (default), will return the number of bytes written
>> +so far, as the write(2) semantics are. However, some older applications
>> +still consider a direct write as an error if all of the I/O
>> +submitted is not complete. I.e. write(file, count, buf) != count.
>> +This option can be disabled on systems in order to support
>> +existing applications which do not expect short writes.
>> +
>> +==
>> +
>>  dquot-max & dquot-nr:
>>  
>>  The file dquot-max shows the maximum number of cached disk
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 4a181fcb5175..49d94360bb51 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -409,7 +409,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
>> *iter, int nr_pages)
>>  
>>  if (!ret)
>>  ret = blk_status_to_errno(dio->bio.bi_status);
>> -if (likely(!ret))
>> +if (likely(dio->size))
>>  ret = dio->size;
>>  
>>  bio_put(>bio);
>> diff --git a/fs/direct-io.c b/fs/direct-io.c
>> index 3aafb3343a65..9bd15be64c25 100644
>> --- a/fs/direct-io.c
>> +++ b/fs/direct-io.c
>> @@ -151,6 +151,7 @@ struct dio {
>>  } cacheline_aligned_in_smp;
>>  
>>  static struct kmem_cache *dio_cache __read_mostly;
>> +unsigned int sysctl_dio_short_writes = 1;
>>  
>>  /*
>>   * How many pages are in the queue?
>> @@ -262,7 +263,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t 
>> ret, unsigned int flags)
>>  ret = dio->page_errors;
>>  if (ret == 0)
>>  ret = dio->io_error;
>> -if (ret == 0)
>> +if (!sysctl_dio_short_writes && (ret == 0))
>>  ret = transferred;
>>  
>>  if (dio->end_io) {
>> @@ -310,7 +311,9 @@ static ssize_t dio_complete(struct dio *dio, ssize_t 
>> ret, unsigned int flags)
>>  }
>>  
>>  kmem_cache_free(dio_cache, dio);
>> -return ret;
>> +if (!sysctl_dio_short_writes)
>> +return ret;
>> +return transferred ? transferred : ret;
>>  }
>>  
>>  static void dio_aio_complete_work(struct work_struct *work)
>> diff --git a/fs/iomap.c b/fs/iomap.c
>> index 47d29ccffaef..a8d6908dc0de 100644
>> --- a/fs/iomap.c
>> +++ b/fs/iomap.c
>> @@ -716,23 +716,24 @@ static ssize_t iomap_dio_complete(struct iomap_dio 
>> *dio)
>>  struct kiocb *iocb = dio->iocb;
>>  struct inode *inode = 

Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O

2018-03-07 Thread Darrick J. Wong
On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues 
> 
> In case direct I/O encounters an error midway, it returns the error.
> Instead it should be returning the number of bytes transferred so far.
> 
> Test case for filesystems (with ENOSPC):
> 1. Create an almost full filesystem
> 2. Create a file, say /mnt/lastfile, until the filesystem is full.
> 3. Direct write() with count > sizeof /mnt/lastfile.
> 
> Result: write() returns -ENOSPC. However, file content has data written
> in step 3.
> 
> Added a sysctl entry: dio_short_writes which is on by default. This is
> to support applications which expect either and error or the bytes submitted
> as a return value for the write calls.
> 
> This fixes fstest generic/472.
> 
> Signed-off-by: Goldwyn Rodrigues 
> ---
>  Documentation/sysctl/fs.txt | 14 ++
>  fs/block_dev.c  |  2 +-
>  fs/direct-io.c  |  7 +--
>  fs/iomap.c  | 23 ---
>  include/linux/fs.h  |  1 +
>  kernel/sysctl.c |  9 +
>  6 files changed, 42 insertions(+), 14 deletions(-)
> 
> Changes since v1:
>  - incorporated iomap and block devices
> 
> Changes since v2:
>  - realized that file size was not increasing when performing a (partial)
>direct I/O because end_io function was receiving the error instead of
>size. Fixed.
> 
> Changes since v3:
>  - [hch] initialize transferred with dio->size and use transferred instead
>of dio->size.
> 
> Changes since v4:
>  - Refreshed to v4.14
> 
> Changes since v5:
>  - Added /proc/sys/fs/dio_short_writes (default 1) to guard older applications
>which expect write(fd, buf, count) returns either count or error.
> 
> Changes since v6:
>  - Corrected documentation
>  - Re-ordered patch
> 
> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
> index 6c00c1e2743f..21582f675985 100644
> --- a/Documentation/sysctl/fs.txt
> +++ b/Documentation/sysctl/fs.txt
> @@ -22,6 +22,7 @@ Currently, these files are in /proc/sys/fs:
>  - aio-max-nr
>  - aio-nr
>  - dentry-state
> +- dio_short_writes
>  - dquot-max
>  - dquot-nr
>  - file-max
> @@ -76,6 +77,19 @@ dcache isn't pruned yet.
>  
>  ==
>  
> +dio_short_writes:
> +
> +In case Direct I/O encounters a transient error, it returns
> +the error code, even if it has performed part of the write.
> +This flag, if on (default), will return the number of bytes written
> +so far, as the write(2) semantics are. However, some older applications
> +still consider a direct write as an error if all of the I/O
> +submitted is not complete. I.e. write(file, count, buf) != count.
> +This option can be disabled on systems in order to support
> +existing applications which do not expect short writes.
> +
> +==
> +
>  dquot-max & dquot-nr:
>  
>  The file dquot-max shows the maximum number of cached disk
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fcb5175..49d94360bb51 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -409,7 +409,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter 
> *iter, int nr_pages)
>  
>   if (!ret)
>   ret = blk_status_to_errno(dio->bio.bi_status);
> - if (likely(!ret))
> + if (likely(dio->size))
>   ret = dio->size;
>  
>   bio_put(>bio);
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 3aafb3343a65..9bd15be64c25 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -151,6 +151,7 @@ struct dio {
>  } cacheline_aligned_in_smp;
>  
>  static struct kmem_cache *dio_cache __read_mostly;
> +unsigned int sysctl_dio_short_writes = 1;
>  
>  /*
>   * How many pages are in the queue?
> @@ -262,7 +263,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, 
> unsigned int flags)
>   ret = dio->page_errors;
>   if (ret == 0)
>   ret = dio->io_error;
> - if (ret == 0)
> + if (!sysctl_dio_short_writes && (ret == 0))
>   ret = transferred;
>  
>   if (dio->end_io) {
> @@ -310,7 +311,9 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, 
> unsigned int flags)
>   }
>  
>   kmem_cache_free(dio_cache, dio);
> - return ret;
> + if (!sysctl_dio_short_writes)
> + return ret;
> + return transferred ? transferred : ret;
>  }
>  
>  static void dio_aio_complete_work(struct work_struct *work)
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 47d29ccffaef..a8d6908dc0de 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -716,23 +716,24 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
>   struct kiocb *iocb = dio->iocb;
>   struct inode *inode = file_inode(iocb->ki_filp);
>   loff_t offset = iocb->ki_pos;
> - ssize_t ret;
> + ssize_t err;
> + ssize_t transferred = dio->size;

I'm sorry to bring