Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O
On 05/19/2018 08:29 PM, Theodore Y. Ts'o wrote: > On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote: >> From: Goldwyn Rodrigues >> >> In case direct I/O encounters an error midway, it returns the error. >> Instead it should be returning the number of bytes transferred so far. >> >> Test case for filesystems (with ENOSPC): >> 1. Create an almost full filesystem >> 2. Create a file, say /mnt/lastfile, until the filesystem is full. >> 3. Direct write() with count > sizeof /mnt/lastfile. >> >> Result: write() returns -ENOSPC. However, file content has data written >> in step 3. >> >> Added a sysctl entry: dio_short_writes which is on by default. This is >> to support applications which expect either and error or the bytes submitted >> as a return value for the write calls. >> >> This fixes fstest generic/472. >> >> Signed-off-by: Goldwyn Rodrigues > > Hi, > > I was wondering if you could give an update regarding what's up with > this patch? > > There doesn't seem to be any movement on this patch in a while, and so > I still have xfstests generic/472 suppressed in {kvm,gce}-xfstests. > >From earlier discussions, In between errors of a direct I/O cannot be handled correctly and may need a lot of tracking that it is not worth performing. It would be better to drop this test case and add in the documentation that a direct I/O error could mean that the write() may or may not have occurred and underlying data may be inconsistent. I am proposing: diff --git a/man2/write.2 b/man2/write.2 index f8a94f3ff..86f655e26 100644 --- a/man2/write.2 +++ b/man2/write.2 @@ -274,6 +274,14 @@ On Linux, returning the number of bytes actually transferred. .\" commit e28cc71572da38a5a12c1cfe4d7032017adccf69 (This is true on both 32-bit and 64-bit systems.) +.PP +While performing +.BR write() +using direct I/O, an error returned does not mean the +entire write has failed. Partial data may be written +and the file offset to length on which the +.BR write() +was attempted should be considered inconsistent. .SH BUGS According to POSIX.1-2008/SUSv4 Section XSI 2.9.7 ("Thread Interactions with Regular File Operations"): -- Goldwyn
Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O
On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues > > In case direct I/O encounters an error midway, it returns the error. > Instead it should be returning the number of bytes transferred so far. > > Test case for filesystems (with ENOSPC): > 1. Create an almost full filesystem > 2. Create a file, say /mnt/lastfile, until the filesystem is full. > 3. Direct write() with count > sizeof /mnt/lastfile. > > Result: write() returns -ENOSPC. However, file content has data written > in step 3. > > Added a sysctl entry: dio_short_writes which is on by default. This is > to support applications which expect either and error or the bytes submitted > as a return value for the write calls. > > This fixes fstest generic/472. > > Signed-off-by: Goldwyn Rodrigues Hi, I was wondering if you could give an update regarding what's up with this patch? There doesn't seem to be any movement on this patch in a while, and so I still have xfstests generic/472 suppressed in {kvm,gce}-xfstests. Thanks, - Ted
Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O
On Thu, Mar 08, 2018 at 09:35:48AM -0600, Goldwyn Rodrigues wrote: > > > On 03/07/2018 06:53 PM, Darrick J. Wong wrote: > > On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote: > >> From: Goldwyn Rodrigues > >> > >> In case direct I/O encounters an error midway, it returns the error. > >> Instead it should be returning the number of bytes transferred so far. > >> > >> Test case for filesystems (with ENOSPC): > >> 1. Create an almost full filesystem > >> 2. Create a file, say /mnt/lastfile, until the filesystem is full. > >> 3. Direct write() with count > sizeof /mnt/lastfile. > >> > >> Result: write() returns -ENOSPC. However, file content has data written > >> in step 3. > >> > >> Added a sysctl entry: dio_short_writes which is on by default. This is > >> to support applications which expect either and error or the bytes > >> submitted > >> as a return value for the write calls. > >> > >> This fixes fstest generic/472. > >> > >> Signed-off-by: Goldwyn Rodrigues > >> --- > >> Documentation/sysctl/fs.txt | 14 ++ > >> fs/block_dev.c | 2 +- > >> fs/direct-io.c | 7 +-- > >> fs/iomap.c | 23 --- > >> include/linux/fs.h | 1 + > >> kernel/sysctl.c | 9 + > >> 6 files changed, 42 insertions(+), 14 deletions(-) > >> > >> Changes since v1: > >> - incorporated iomap and block devices > >> > >> Changes since v2: > >> - realized that file size was not increasing when performing a (partial) > >>direct I/O because end_io function was receiving the error instead of > >>size. Fixed. > >> > >> Changes since v3: > >> - [hch] initialize transferred with dio->size and use transferred instead > >>of dio->size. > >> > >> Changes since v4: > >> - Refreshed to v4.14 > >> > >> Changes since v5: > >> - Added /proc/sys/fs/dio_short_writes (default 1) to guard older > >> applications > >>which expect write(fd, buf, count) returns either count or error. > >> > >> Changes since v6: > >> - Corrected documentation > >> - Re-ordered patch > >> > >> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt > >> index 6c00c1e2743f..21582f675985 100644 > >> --- a/Documentation/sysctl/fs.txt > >> +++ b/Documentation/sysctl/fs.txt > >> @@ -22,6 +22,7 @@ Currently, these files are in /proc/sys/fs: > >> - aio-max-nr > >> - aio-nr > >> - dentry-state > >> +- dio_short_writes > >> - dquot-max > >> - dquot-nr > >> - file-max > >> @@ -76,6 +77,19 @@ dcache isn't pruned yet. > >> > >> == > >> > >> +dio_short_writes: > >> + > >> +In case Direct I/O encounters a transient error, it returns > >> +the error code, even if it has performed part of the write. > >> +This flag, if on (default), will return the number of bytes written > >> +so far, as the write(2) semantics are. However, some older applications > >> +still consider a direct write as an error if all of the I/O > >> +submitted is not complete. I.e. write(file, count, buf) != count. > >> +This option can be disabled on systems in order to support > >> +existing applications which do not expect short writes. > >> + > >> +== > >> + > >> dquot-max & dquot-nr: > >> > >> The file dquot-max shows the maximum number of cached disk > >> diff --git a/fs/block_dev.c b/fs/block_dev.c > >> index 4a181fcb5175..49d94360bb51 100644 > >> --- a/fs/block_dev.c > >> +++ b/fs/block_dev.c > >> @@ -409,7 +409,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter > >> *iter, int nr_pages) > >> > >>if (!ret) > >>ret = blk_status_to_errno(dio->bio.bi_status); > >> - if (likely(!ret)) > >> + if (likely(dio->size)) > >>ret = dio->size; > >> > >>bio_put(&dio->bio); > >> diff --git a/fs/direct-io.c b/fs/direct-io.c > >> index 3aafb3343a65..9bd15be64c25 100644 > >> --- a/fs/direct-io.c > >> +++ b/fs/direct-io.c > >> @@ -151,6 +151,7 @@ struct dio { > >> } cacheline_aligned_in_smp; > >> > >> static struct kmem_cache *dio_cache __read_mostly; > >> +unsigned int sysctl_dio_short_writes = 1; > >> > >> /* > >> * How many pages are in the queue? > >> @@ -262,7 +263,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t > >> ret, unsigned int flags) > >>ret = dio->page_errors; > >>if (ret == 0) > >>ret = dio->io_error; > >> - if (ret == 0) > >> + if (!sysctl_dio_short_writes && (ret == 0)) > >>ret = transferred; > >> > >>if (dio->end_io) { > >> @@ -310,7 +311,9 @@ static ssize_t dio_complete(struct dio *dio, ssize_t > >> ret, unsigned int flags) > >>} > >> > >>kmem_cache_free(dio_cache, dio); > >> - return ret; > >> + if (!sysctl_dio_short_writes) > >> + return ret; > >> + return transferred ? transferred : ret; > >> } > >> > >> static void dio_aio_complete_work(struct work_struct *work) > >> diff
Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O
On 03/07/2018 06:53 PM, Darrick J. Wong wrote: > On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote: >> From: Goldwyn Rodrigues >> >> In case direct I/O encounters an error midway, it returns the error. >> Instead it should be returning the number of bytes transferred so far. >> >> Test case for filesystems (with ENOSPC): >> 1. Create an almost full filesystem >> 2. Create a file, say /mnt/lastfile, until the filesystem is full. >> 3. Direct write() with count > sizeof /mnt/lastfile. >> >> Result: write() returns -ENOSPC. However, file content has data written >> in step 3. >> >> Added a sysctl entry: dio_short_writes which is on by default. This is >> to support applications which expect either and error or the bytes submitted >> as a return value for the write calls. >> >> This fixes fstest generic/472. >> >> Signed-off-by: Goldwyn Rodrigues >> --- >> Documentation/sysctl/fs.txt | 14 ++ >> fs/block_dev.c | 2 +- >> fs/direct-io.c | 7 +-- >> fs/iomap.c | 23 --- >> include/linux/fs.h | 1 + >> kernel/sysctl.c | 9 + >> 6 files changed, 42 insertions(+), 14 deletions(-) >> >> Changes since v1: >> - incorporated iomap and block devices >> >> Changes since v2: >> - realized that file size was not increasing when performing a (partial) >>direct I/O because end_io function was receiving the error instead of >>size. Fixed. >> >> Changes since v3: >> - [hch] initialize transferred with dio->size and use transferred instead >>of dio->size. >> >> Changes since v4: >> - Refreshed to v4.14 >> >> Changes since v5: >> - Added /proc/sys/fs/dio_short_writes (default 1) to guard older >> applications >>which expect write(fd, buf, count) returns either count or error. >> >> Changes since v6: >> - Corrected documentation >> - Re-ordered patch >> >> diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt >> index 6c00c1e2743f..21582f675985 100644 >> --- a/Documentation/sysctl/fs.txt >> +++ b/Documentation/sysctl/fs.txt >> @@ -22,6 +22,7 @@ Currently, these files are in /proc/sys/fs: >> - aio-max-nr >> - aio-nr >> - dentry-state >> +- dio_short_writes >> - dquot-max >> - dquot-nr >> - file-max >> @@ -76,6 +77,19 @@ dcache isn't pruned yet. >> >> == >> >> +dio_short_writes: >> + >> +In case Direct I/O encounters a transient error, it returns >> +the error code, even if it has performed part of the write. >> +This flag, if on (default), will return the number of bytes written >> +so far, as the write(2) semantics are. However, some older applications >> +still consider a direct write as an error if all of the I/O >> +submitted is not complete. I.e. write(file, count, buf) != count. >> +This option can be disabled on systems in order to support >> +existing applications which do not expect short writes. >> + >> +== >> + >> dquot-max & dquot-nr: >> >> The file dquot-max shows the maximum number of cached disk >> diff --git a/fs/block_dev.c b/fs/block_dev.c >> index 4a181fcb5175..49d94360bb51 100644 >> --- a/fs/block_dev.c >> +++ b/fs/block_dev.c >> @@ -409,7 +409,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter >> *iter, int nr_pages) >> >> if (!ret) >> ret = blk_status_to_errno(dio->bio.bi_status); >> -if (likely(!ret)) >> +if (likely(dio->size)) >> ret = dio->size; >> >> bio_put(&dio->bio); >> diff --git a/fs/direct-io.c b/fs/direct-io.c >> index 3aafb3343a65..9bd15be64c25 100644 >> --- a/fs/direct-io.c >> +++ b/fs/direct-io.c >> @@ -151,6 +151,7 @@ struct dio { >> } cacheline_aligned_in_smp; >> >> static struct kmem_cache *dio_cache __read_mostly; >> +unsigned int sysctl_dio_short_writes = 1; >> >> /* >> * How many pages are in the queue? >> @@ -262,7 +263,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t >> ret, unsigned int flags) >> ret = dio->page_errors; >> if (ret == 0) >> ret = dio->io_error; >> -if (ret == 0) >> +if (!sysctl_dio_short_writes && (ret == 0)) >> ret = transferred; >> >> if (dio->end_io) { >> @@ -310,7 +311,9 @@ static ssize_t dio_complete(struct dio *dio, ssize_t >> ret, unsigned int flags) >> } >> >> kmem_cache_free(dio_cache, dio); >> -return ret; >> +if (!sysctl_dio_short_writes) >> +return ret; >> +return transferred ? transferred : ret; >> } >> >> static void dio_aio_complete_work(struct work_struct *work) >> diff --git a/fs/iomap.c b/fs/iomap.c >> index 47d29ccffaef..a8d6908dc0de 100644 >> --- a/fs/iomap.c >> +++ b/fs/iomap.c >> @@ -716,23 +716,24 @@ static ssize_t iomap_dio_complete(struct iomap_dio >> *dio) >> struct kiocb *iocb = dio->iocb; >> struct inode *inode = file_inode(iocb->ki_filp); >> loff
Re: [PATCH v7 2/2] Return bytes transferred for partial direct I/O
On Thu, Feb 08, 2018 at 12:59:48PM -0600, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues > > In case direct I/O encounters an error midway, it returns the error. > Instead it should be returning the number of bytes transferred so far. > > Test case for filesystems (with ENOSPC): > 1. Create an almost full filesystem > 2. Create a file, say /mnt/lastfile, until the filesystem is full. > 3. Direct write() with count > sizeof /mnt/lastfile. > > Result: write() returns -ENOSPC. However, file content has data written > in step 3. > > Added a sysctl entry: dio_short_writes which is on by default. This is > to support applications which expect either and error or the bytes submitted > as a return value for the write calls. > > This fixes fstest generic/472. > > Signed-off-by: Goldwyn Rodrigues > --- > Documentation/sysctl/fs.txt | 14 ++ > fs/block_dev.c | 2 +- > fs/direct-io.c | 7 +-- > fs/iomap.c | 23 --- > include/linux/fs.h | 1 + > kernel/sysctl.c | 9 + > 6 files changed, 42 insertions(+), 14 deletions(-) > > Changes since v1: > - incorporated iomap and block devices > > Changes since v2: > - realized that file size was not increasing when performing a (partial) >direct I/O because end_io function was receiving the error instead of >size. Fixed. > > Changes since v3: > - [hch] initialize transferred with dio->size and use transferred instead >of dio->size. > > Changes since v4: > - Refreshed to v4.14 > > Changes since v5: > - Added /proc/sys/fs/dio_short_writes (default 1) to guard older applications >which expect write(fd, buf, count) returns either count or error. > > Changes since v6: > - Corrected documentation > - Re-ordered patch > > diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt > index 6c00c1e2743f..21582f675985 100644 > --- a/Documentation/sysctl/fs.txt > +++ b/Documentation/sysctl/fs.txt > @@ -22,6 +22,7 @@ Currently, these files are in /proc/sys/fs: > - aio-max-nr > - aio-nr > - dentry-state > +- dio_short_writes > - dquot-max > - dquot-nr > - file-max > @@ -76,6 +77,19 @@ dcache isn't pruned yet. > > == > > +dio_short_writes: > + > +In case Direct I/O encounters a transient error, it returns > +the error code, even if it has performed part of the write. > +This flag, if on (default), will return the number of bytes written > +so far, as the write(2) semantics are. However, some older applications > +still consider a direct write as an error if all of the I/O > +submitted is not complete. I.e. write(file, count, buf) != count. > +This option can be disabled on systems in order to support > +existing applications which do not expect short writes. > + > +== > + > dquot-max & dquot-nr: > > The file dquot-max shows the maximum number of cached disk > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 4a181fcb5175..49d94360bb51 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -409,7 +409,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter > *iter, int nr_pages) > > if (!ret) > ret = blk_status_to_errno(dio->bio.bi_status); > - if (likely(!ret)) > + if (likely(dio->size)) > ret = dio->size; > > bio_put(&dio->bio); > diff --git a/fs/direct-io.c b/fs/direct-io.c > index 3aafb3343a65..9bd15be64c25 100644 > --- a/fs/direct-io.c > +++ b/fs/direct-io.c > @@ -151,6 +151,7 @@ struct dio { > } cacheline_aligned_in_smp; > > static struct kmem_cache *dio_cache __read_mostly; > +unsigned int sysctl_dio_short_writes = 1; > > /* > * How many pages are in the queue? > @@ -262,7 +263,7 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, > unsigned int flags) > ret = dio->page_errors; > if (ret == 0) > ret = dio->io_error; > - if (ret == 0) > + if (!sysctl_dio_short_writes && (ret == 0)) > ret = transferred; > > if (dio->end_io) { > @@ -310,7 +311,9 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, > unsigned int flags) > } > > kmem_cache_free(dio_cache, dio); > - return ret; > + if (!sysctl_dio_short_writes) > + return ret; > + return transferred ? transferred : ret; > } > > static void dio_aio_complete_work(struct work_struct *work) > diff --git a/fs/iomap.c b/fs/iomap.c > index 47d29ccffaef..a8d6908dc0de 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -716,23 +716,24 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio) > struct kiocb *iocb = dio->iocb; > struct inode *inode = file_inode(iocb->ki_filp); > loff_t offset = iocb->ki_pos; > - ssize_t ret; > + ssize_t err; > + ssize_t transferred = dio->size; I'm sorry to bring this up again, but there's somethin