bug#60489: Programs should exit after EIO from FICLONE or copy_file_range()

2023-01-06 Thread Pádraig Brady

On 02/01/2023 16:22, Pádraig Brady wrote:

On 02/01/2023 13:28, Andreas Schwab wrote:

On Jan 02 2023, Pádraig Brady wrote:


+  /* Note error set is consistent with copy_file_range().  */
+  bool no_clone_attempted = errno == ENOSYS || is_ENOTSUP (errno)
+|| errno == EINVAL || errno == EBADF
+|| errno == EXDEV || errno == ETXTBSY;


Should this be refactored to avoid duplication?


Yes good call.
We should also refactor the handling of clone failure
to also apply to the usage of fclonefileat() on macOS.

Updated patch attached.


Pushed.

Marking this as done.

thanks,
Pádraig






bug#60489: Programs should exit after EIO from FICLONE or copy_file_range()

2023-01-02 Thread Pádraig Brady

On 02/01/2023 13:28, Andreas Schwab wrote:

On Jan 02 2023, Pádraig Brady wrote:


+  /* Note error set is consistent with copy_file_range().  */
+  bool no_clone_attempted = errno == ENOSYS || is_ENOTSUP (errno)
+|| errno == EINVAL || errno == EBADF
+|| errno == EXDEV || errno == ETXTBSY;


Should this be refactored to avoid duplication?


Yes good call.
We should also refactor the handling of clone failure
to also apply to the usage of fclonefileat() on macOS.

Updated patch attached.

cheers,
Pádraig
From d13bde46a48036f17c3fb651977d75b8d3eb7eda Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Mon, 2 Jan 2023 13:07:41 +
Subject: [PATCH] copy: immediately fail with transient errors from FICLONE

* src/copy.c (handle_clone_fail): A new function refactored
from copy_reg() to handle failures from FICLONE or fclonefileat().
(copy_reg): Fail with all errors from FICLONE,
unless they're from the set indicating the file system
or file do not support the clone operation.
Also fail with errors from fclonefileat() if they're
from the set indicating a transient failure for the file.
(sparse_copy): Call the refactored is_CLONENOTSUP()
which is now also used by the new handle_clone_fail() function.
* NEWS: Mention the bug fix.  Also mention explicitly
the older --reflink=auto default change to aid searching.
* cfg.mk: Adjust old_NEWS_hash with `make update-NEWS-hash`.
Fixes https://bugs.gnu.org/60489
---
 NEWS   |  8 +
 cfg.mk |  2 +-
 src/copy.c | 94 --
 3 files changed, 79 insertions(+), 25 deletions(-)

diff --git a/NEWS b/NEWS
index f43182757..0fbfe7b09 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,13 @@ GNU coreutils NEWS-*- outline -*-
   'cp -rx / /mnt' no longer complains "cannot create directory /mnt/".
   [bug introduced in coreutils-9.1]
 
+  cp, mv, and install, now immediately acknowledge transient errors
+  when creating copy-on-write or cloned reflink files, on supporting
+  file systems like XFS, BTRFS, APFS, etc.
+  Previously they would have tried again with other copy methods
+  which may have resulted in data corruption.
+  [bug introduced in coreutils-7.5 and enabled by default in coreutils-9.0]
+
   'mv --backup=simple f d/' no longer mistakenly backs up d/f to f~.
   [bug introduced in coreutils-9.1]
 
@@ -283,6 +290,7 @@ GNU coreutils NEWS-*- outline -*-
 ** Changes in behavior
 
   cp and install now default to copy-on-write (COW) if available.
+  I.e., cp now uses --reflink=auto mode by default.
 
   cp, install and mv now use the copy_file_range syscall if available.
   Also, they use lseek+SEEK_HOLE rather than ioctl+FS_IOC_FIEMAP on sparse
diff --git a/cfg.mk b/cfg.mk
index 179a51d1c..c9b2d4326 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -49,7 +49,7 @@ export VERBOSE = yes
 # 4914152 9e
 export XZ_OPT = -8e
 
-old_NEWS_hash = f365e0cf2c7890c617df6abe70615fce
+old_NEWS_hash = 49269d8d99f4888a4e955d28cda50091
 
 # Add an exemption for sc_makefile_at_at_check.
 _makefile_at_at_check_exceptions = ' && !/^cu_install_prog/ && !/dynamic-dep/'
diff --git a/src/copy.c b/src/copy.c
index 310c8bf14..1058b08ec 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -224,6 +224,18 @@ create_hole (int fd, char const *name, bool punch_holes, off_t size)
 }
 
 
+/* Whether the errno from FICLONE, or copy_file_range
+   indicates operation is not supported for this file or file system.  */
+
+static bool
+is_CLONENOTSUP (int err)
+{
+  return err == ENOSYS || is_ENOTSUP (err)
+ || err == EINVAL || err == EBADF
+ || err == EXDEV || err == ETXTBSY;
+}
+
+
 /* Copy the regular file open on SRC_FD/SRC_NAME to DST_FD/DST_NAME,
honoring the MAKE_HOLES setting and using the BUF_SIZE-byte buffer
*ABUF for temporary storage, allocating it lazily if *ABUF is null.
@@ -267,9 +279,7 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
   }
 if (n_copied < 0)
   {
-if (errno == ENOSYS || is_ENOTSUP (errno)
-|| errno == EINVAL || errno == EBADF
-|| errno == EXDEV || errno == ETXTBSY)
+if (is_CLONENOTSUP (errno))
   break;
 
 /* copy_file_range might not be enabled in seccomp filters,
@@ -1064,6 +1074,42 @@ infer_scantype (int fd, struct stat const *sb,
 }
 
 
+/* Handle failure from FICLONE or fclonefileat.
+   Return FALSE if it's a terminal failure for this file.  */
+
+static bool
+handle_clone_fail (int dst_dirfd, char const* dst_relname,
+   char const* src_name, char const* dst_name,
+   int dest_desc, bool new_dst, enum Reflink_type reflink_mode)
+{
+  /* If the clone operation is creating the destination,
+ then don't try and cater for all non transient file system errors,
+ and instead only cater for 

bug#60489: Programs should exit after EIO from FICLONE or copy_file_range()

2023-01-02 Thread Andreas Schwab
On Jan 02 2023, Pádraig Brady wrote:

> +  /* Note error set is consistent with copy_file_range().  */
> +  bool no_clone_attempted = errno == ENOSYS || is_ENOTSUP (errno)
> +|| errno == EINVAL || errno == EBADF
> +|| errno == EXDEV || errno == ETXTBSY;

Should this be refactored to avoid duplication?

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





bug#60489: Programs should exit after EIO from FICLONE or copy_file_range()

2023-01-02 Thread Pádraig Brady

On 02/01/2023 06:36, Noah Misch wrote:

Because Debian coreutils 9.1-1 "cp" silently falls back to copy_file_range()
after FICLONE reports EIO, "cp" can transfer incorrect bytes.  Linux syscalls
having a file descriptor parameter report EIO after a fault in the underlying
device.  The affected file is not recoverable in the general case, but syscall
outcomes after the EIO don't reflect that.  For example, consider FICLONE
returning EIO for a fault during source file writeback.  The kernel will mark
"clean" the affected page cache entries and clear the EIO state.  If the page
cache evicts those pages, their file offsets revert to the last written-back
values if any, else zeros.  If userspace issues a syscall that bypasses the
page cache, like copy_file_range() or another FICLONE, that syscall clones the
last written-back state or zeroes.  See
https://lore.kernel.org/linux-xfs/20221108172436.ga3613...@rfd.leadboat.com
for a "cp" and "cat" test script, background, and discussion.

I recommend instead reporting the EIO and terminating when FICLONE or
copy_file_range() fails with EIO.  One could argue that ENOSPC also warrants
termination, since no fallback reduces space usage.  For other errno values,
fallback to the next transfer strategy, like today.  An alternative would be
to fallback from FICLONE to copy_file_range() only after known-appropriate
errors EBADF, EINVAL, EOPNOTSUPP, ETXTBSY, and EXDEV.  That alternative wins
if future FICLONE reports an additional termination-deserving errno value.
Since just EIO needs termination today, I bet new errno values are more likely
than not to deserve fallback.  What do you think?


Yes we should have an allow list that we retry for,
and otherwise fail for EIO, ENOSPC, ENOMEM, ...

We already do this for copy_file_range(),
but should treat FICLONE similarly,
as done in the attached.

thanks!
Pádraig
From 4041d93fa895222064d6c20d380e05158bd939e7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Mon, 2 Jan 2023 13:07:41 +
Subject: [PATCH] copy: immediately fail with transient errors from FICLONE

* src/copy.c (copy_reg): Fail with all errors from FICLONE,
unless they're from the set indicating the file system
or file do not support the clone operation.
* NEWS: Mention the bug fix.  Also mention explicitly
the --reflink=auto default change to aid searching.
* cfg.mk: Adjust old_NEWS_hash with `make update-NEWS-hash`.
Fixes https://bugs.gnu.org/60489
---
 NEWS   |  7 +++
 cfg.mk |  2 +-
 src/copy.c | 28 ++--
 3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index f43182757..94f3b9086 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,12 @@ GNU coreutils NEWS-*- outline -*-
   'cp -rx / /mnt' no longer complains "cannot create directory /mnt/".
   [bug introduced in coreutils-9.1]
 
+  cp, install, mv now immediately acknowledge transient errors from
+  the FICLONE ioctl, used to create copy-on-write or reflinked clones.
+  Previously they would have tried again with other copy methods
+  which may have resulted in data corruption.
+  [bug introduced in coreutils-7.5 and enabled by default in coreutils-9.0]
+
   'mv --backup=simple f d/' no longer mistakenly backs up d/f to f~.
   [bug introduced in coreutils-9.1]
 
@@ -283,6 +289,7 @@ GNU coreutils NEWS-*- outline -*-
 ** Changes in behavior
 
   cp and install now default to copy-on-write (COW) if available.
+  I.e., cp now uses --reflink=auto mode by default.
 
   cp, install and mv now use the copy_file_range syscall if available.
   Also, they use lseek+SEEK_HOLE rather than ioctl+FS_IOC_FIEMAP on sparse
diff --git a/cfg.mk b/cfg.mk
index 179a51d1c..c9b2d4326 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -49,7 +49,7 @@ export VERBOSE = yes
 # 4914152 9e
 export XZ_OPT = -8e
 
-old_NEWS_hash = f365e0cf2c7890c617df6abe70615fce
+old_NEWS_hash = 49269d8d99f4888a4e955d28cda50091
 
 # Add an exemption for sc_makefile_at_at_check.
 _makefile_at_at_check_exceptions = ' && !/^cu_install_prog/ && !/dynamic-dep/'
diff --git a/src/copy.c b/src/copy.c
index 310c8bf14..0f599459b 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -267,6 +267,7 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
   }
 if (n_copied < 0)
   {
+/* Note error set is consistent with clone_file().  */
 if (errno == ENOSYS || is_ENOTSUP (errno)
 || errno == EINVAL || errno == EBADF
 || errno == EXDEV || errno == ETXTBSY)
@@ -1275,23 +1276,30 @@ copy_reg (char const *src_name, char const *dst_name,
 {
   if (clone_file (dest_desc, source_desc) == 0)
 data_copy_required = false;
-  else if (x->reflink_mode == REFLINK_ALWAYS)
+  else
 {
-  error (0, errno, _("failed to clone %s from %s"),
- quoteaf_n (0, dst_name), quoteaf_n (1, src_name));
+  /* Note error set 

bug#60489: Programs should exit after EIO from FICLONE or copy_file_range()

2023-01-02 Thread Noah Misch
Because Debian coreutils 9.1-1 "cp" silently falls back to copy_file_range()
after FICLONE reports EIO, "cp" can transfer incorrect bytes.  Linux syscalls
having a file descriptor parameter report EIO after a fault in the underlying
device.  The affected file is not recoverable in the general case, but syscall
outcomes after the EIO don't reflect that.  For example, consider FICLONE
returning EIO for a fault during source file writeback.  The kernel will mark
"clean" the affected page cache entries and clear the EIO state.  If the page
cache evicts those pages, their file offsets revert to the last written-back
values if any, else zeros.  If userspace issues a syscall that bypasses the
page cache, like copy_file_range() or another FICLONE, that syscall clones the
last written-back state or zeroes.  See
https://lore.kernel.org/linux-xfs/20221108172436.ga3613...@rfd.leadboat.com
for a "cp" and "cat" test script, background, and discussion.

I recommend instead reporting the EIO and terminating when FICLONE or
copy_file_range() fails with EIO.  One could argue that ENOSPC also warrants
termination, since no fallback reduces space usage.  For other errno values,
fallback to the next transfer strategy, like today.  An alternative would be
to fallback from FICLONE to copy_file_range() only after known-appropriate
errors EBADF, EINVAL, EOPNOTSUPP, ETXTBSY, and EXDEV.  That alternative wins
if future FICLONE reports an additional termination-deserving errno value.
Since just EIO needs termination today, I bet new errno values are more likely
than not to deserve fallback.  What do you think?

Thanks,
nm