bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-14 Thread Paul Eggert

On 2023-02-14 04:12, Pádraig Brady wrote:


I suspect it's a similar issue to the one that openzfs had:
https://github.com/openzfs/zfs/issues/11900


Yes, quite likely.



Given how closed / uncommunicative Apple are in general
and specifically for this already reported bug,
I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.


If there's an imminent release that's the only way to go. However, if we 
do that we should also disable fclonefileat since the latest edition of 
the patch isn't right either. It messes up the modes in some cases and 
there's a good reason we don't otherwise allow writing through dangling 
symlinks (see the length email discussions when that was put in) so that 
would have to be addressed too.


I'll see if I can come up with an improved version of the fclonefileat 
patch that would handle these issues.




I've attached the 3 latest patches I'm considering in this area.
I presume you're OK with your amended fclonefileat() improvement one?


Not yet, because it doesn't work correctly in some cases (e.g., 
x->explicit_no_preserve_mode) and it mishandles the dangling symlink cases.


The dangling symlink patch might be OK; depends on how fclonefileat 
patch ends up. It's OK to copy through a dangling symlink with 
fclonefileat only if we don't need to fix up permissions or timestamps 
afterwards (e.g., cp -p).


The disabling SEEK_HOLE on Apple is OK if a release is imminent. 
Otherwise I'd like to get to the bottom of it first. This can be done by 
having George use dtrace or (if that's not possible) adding debugging 
printfs.






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-14 Thread Paul Eggert

On 2023-02-13 09:16, George Valkov wrote:

1. We apply my original patch to disable sparse-copy on macOS. Otherwise since 
fclonefileat is not used whenever we overwrite a file, corruption will still 
occur.


I'm not entirely sold on this patch, because I still don't understand 
what's happening. The original bug report in 
 basically says only "it 
doesn't work", and I'd like to know more.


Part of the reason I'm hesitating is that there are multiple ways of 
interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just 
that Apple has come up with yet another way to interpret it, which we 
need to know about.


Another reason to hesitate is that GNU coreutils is not the only core 
program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic 
Apple problem we need to know which Apple releases have it, so that we 
can program around it at the Gnulib level instead of mucking about with 
each individual program.


Yet another reason is that perhaps the bug was introduced by this pair 
of changes introduced to Gnulib in November 2021 to try to address 
:


https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4db8db34112b86ddf8bac48f16b5acff732b5fa9
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=1a268176fbb184e393c98575e61fe692264c7d91

These Gnulib patches are attached. If you revert these, does the problem 
go away?


And yet another reason is that perhaps the bug was introduced by this 
pair of Coreutils changes made in October 2021 to try to address 
:


https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a9e31457bf6a63072b54a9324cdf59a09441a21f
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=1753012b8d8cdd0817155f4756f5bf9bfe347816

These Coreutils patches are also attached. Does reverting them help?

All things considered, I'd like a copy of the dtruss output for 
unmodified coreutils 9.1 cp on the failing case. This is really helpful 
for debugging this sort of thing. It's what helped us debug Bug#51857.From 4db8db34112b86ddf8bac48f16b5acff732b5fa9 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 15 Nov 2021 15:08:25 -0800
Subject: [PATCH 1/2] lseek: port around macOS SEEK_DATA glitch

Problem reported by Sudhip Nashi (Bug#51857).
* doc/posix-functions/lseek.texi (lseek): Mention macOS SEEK_DATA
issue.
* lib/lseek.c (rpl_lseek): Work around macOS portability glitch.
* m4/lseek.m4 (gl_FUNC_LSEEK): Replace lseek on Darwin.
* modules/lseek (Depends-on): Depend on msvc-nothrow
and fstat only if needed.
---
 ChangeLog  | 11 +++
 doc/posix-functions/lseek.texi |  4 
 lib/lseek.c| 16 
 m4/lseek.m4| 10 --
 modules/lseek  |  4 ++--
 5 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index f47071a72d..71a2265706 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2021-11-15  Paul Eggert  
+
+	lseek: port around macOS SEEK_DATA glitch
+	Problem reported by Sudhip Nashi (Bug#51857).
+	* doc/posix-functions/lseek.texi (lseek): Mention macOS SEEK_DATA
+	issue.
+	* lib/lseek.c (rpl_lseek): Work around macOS portability glitch.
+	* m4/lseek.m4 (gl_FUNC_LSEEK): Replace lseek on Darwin.
+	* modules/lseek (Depends-on): Depend on msvc-nothrow
+	and fstat only if needed.
+
 2021-11-11  Fabrice Fontaine(tiny change)
 
 	sigsegv: fix builds on microblazeel, or1k
diff --git a/doc/posix-functions/lseek.texi b/doc/posix-functions/lseek.texi
index 4a9d55dcf7..2f8e2b5877 100644
--- a/doc/posix-functions/lseek.texi
+++ b/doc/posix-functions/lseek.texi
@@ -9,6 +9,10 @@ Gnulib module: lseek
 Portability problems fixed by Gnulib:
 @itemize
 @item
+On some platforms, @code{lseek (fd, offset, SEEK_DATA)} returns a value
+greater than @code{offset} even when @code{offset} addresses data:
+macOS 12
+@item
 This function is declared in a different header file (namely, @code{})
 on some platforms:
 MSVC 14.
diff --git a/lib/lseek.c b/lib/lseek.c
index 0042546a8e..7dcd6c9da7 100644
--- a/lib/lseek.c
+++ b/lib/lseek.c
@@ -52,6 +52,22 @@ rpl_lseek (int fd, off_t offset, int whence)
   errno = ESPIPE;
   return -1;
 }
+#elif defined __APPLE__ && defined __MACH__ && defined SEEK_DATA
+  if (whence == SEEK_DATA)
+{
+  /* If OFFSET points to data, macOS lseek+SEEK_DATA returns the
+ start S of the first data region that begins *after* OFFSET,
+ where the region from OFFSET to S consists of possibly-empty
+ data followed by a possibly-empty hole.  To work around this
+ portability glitch, check whether OFFSET is within data by
+ using lseek+SEEK_HOLE, and if so return to OFFSET by using
+ lseek+SEEK_SET.  */
+  off_t next_hole = lseek (fd, offset, SEEK_HOLE);
+  if (next_hole < 0)
+return next_hole;
+  if (next_hole != offset)
+whence = 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-14 Thread George Valkov


> On 2023-02-14, at 8:12 AM, Paul Eggert  wrote:
> 
> On 2023-02-13 09:16, George Valkov wrote:
>> 1. We apply my original patch to disable sparse-copy on macOS. Otherwise 
>> since fclonefileat is not used whenever we overwrite a file, corruption will 
>> still occur.
> 
> I'm not entirely sold on this patch, because I still don't understand what's 
> happening. The original bug report in 
>  basically says only "it 
> doesn't work", and I'd like to know more.
> 
> Part of the reason I'm hesitating is that there are multiple ways of 
> interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just that 
> Apple has come up with yet another way to interpret it, which we need to know 
> about.
> 
> Another reason to hesitate is that GNU coreutils is not the only core program 
> that uses SEEK_HOLE and SEEK_DATA. If this really is a generic Apple problem 
> we need to know which Apple releases have it, so that we can program around 
> it at the Gnulib level instead of mucking about with each individual program.
> 
> Yet another reason is that perhaps the bug was introduced by this pair of 
> changes introduced to Gnulib in November 2021 to try to address 
> :
> 
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4db8db34112b86ddf8bac48f16b5acff732b5fa9
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=1a268176fbb184e393c98575e61fe692264c7d91
> 
> These Gnulib patches are attached. If you revert these, does the problem go 
> away?
> 
> And yet another reason is that perhaps the bug was introduced by this pair of 
> Coreutils changes made in October 2021 to try to address 
> :
> 
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a9e31457bf6a63072b54a9324cdf59a09441a21f
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=1753012b8d8cdd0817155f4756f5bf9bfe347816
> 
> These Coreutils patches are also attached. Does reverting them help?
> 
> All things considered, I'd like a copy of the dtruss output for unmodified 
> coreutils 9.1 cp on the failing case. This is really helpful for debugging 
> this sort of thing. It's what helped us debug 
> Bug#51857.<0001-lseek-port-around-macOS-SEEK_DATA-glitch.patch><0002-lseek-port-around-macOS-SEEK_HOLE-glitch.patch><0001-cp-defend-better-against-FreeBSD-9.1-zfs-bug.patch><0002-cp-revert-unnecessary-FreeBSD-workaround.patch>


Hi Paul, I can check later today. One thing I notices is that the sparse-copy 
corruption has been present on 2021-10-06. (October 6th). As I recall it was 
not present on coreutils-8, it was introduced as soon as coreutils-9 came, 
precisely ever since sparse-copy was enabled. I will have to check if I can 
find any old logs or investigate again. But I don’t think reverting these 
patches will make any difference, since the issue was present before them.

Can we bring someone from the gcc team here? They know best how the build 
system for gcc works, and might be able to create a simple PoC to craft a 
sparse file that gets corrupted on copy. Currently I need to rebuild gcc, which 
takes 2-3 minutes and we have no clue how it is crafted.

Any idea how to make Apple aware? Last time I contacted the 
product-secur...@apple.com team on 2021-10-06, the replayed two days later that 
they are investigating, but did not bother fixing the underlaying issue. 
Perhaps if you contact them as a member of GNU/coreutils, they might be more 
open to work? If you do, refer them to Follow-up: 782620861.

I’ll come back to test your requests, at a late point today.

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-14 Thread Pádraig Brady

On 14/02/2023 06:12, Paul Eggert wrote:

On 2023-02-13 09:16, George Valkov wrote:

1. We apply my original patch to disable sparse-copy on macOS. Otherwise since 
fclonefileat is not used whenever we overwrite a file, corruption will still 
occur.


I'm not entirely sold on this patch, because I still don't understand
what's happening. The original bug report in
 basically says only "it
doesn't work", and I'd like to know more.

Part of the reason I'm hesitating is that there are multiple ways of
interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just
that Apple has come up with yet another way to interpret it, which we
need to know about.


I agree it would be good to know more.
However given it works on HFS but fails on APFS suggests a file system specific 
issue,
rather than an API mismatch issue (over what we're already catering for on 
macOS).
I suspect it's a similar issue to the one that openzfs had:
https://github.com/openzfs/zfs/issues/11900

Given how closed / uncommunicative Apple are in general
and specifically for this already reported bug,
I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.

Also we've mitigated the impact somewhat by enabling fclonefileat() in more 
cases.


Another reason to hesitate is that GNU coreutils is not the only core
program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic
Apple problem we need to know which Apple releases have it, so that we
can program around it at the Gnulib level instead of mucking about with
each individual program.


Yes that would be best if possible.

I've attached the 3 latest patches I'm considering in this area.
I presume you're OK with your amended fclonefileat() improvement one?

thanks,
PádraigFrom 0a590853b159d4eda90b1d3a83fc1859356727ac Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Fri, 10 Feb 2023 13:34:54 -0800
Subject: [PATCH 1/3] copy: improve use of fclonefileat

* src/copy.c (copy_reg): Use CLONE_ACL if available and working.
If the only problem with fclonefileat is that it would create the
file with the wrong timestamp, or with too few permissions,
do that but fix the timestamp and permissions afterwards,
rather than falling back on a traditional copy.
(CLONE_ACL) [HAVE_FCLONEFILEAT && !USE_XATTR]: Default to 0.
---
 NEWS   |  3 +++
 src/copy.c | 53 +
 2 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/NEWS b/NEWS
index 3d0ede150..cbc614c2e 100644
--- a/NEWS
+++ b/NEWS
@@ -124,6 +124,9 @@ GNU coreutils NEWS-*- outline -*-
   and possibly employing copy offloading or reflinking,
   for the non sparse portion of such sparse files.
 
+  On macOS, cp creates a copy-on-write clone in more cases.
+  Previously cp would only do this when preserving mode and timestamps.
+
   date --debug now diagnoses if multiple --date or --set options are
   specified, as only the last specified is significant in that case.
 
diff --git a/src/copy.c b/src/copy.c
index dfbb557de..8370f55bd 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1146,6 +1146,7 @@ copy_reg (char const *src_name, char const *dst_name,
   union scan_inference scan_inference;
   bool return_val = true;
   bool data_copy_required = x->data_copy_required;
+  bool mode_already_preserved = false;
   bool preserve_xattr = USE_XATTR & x->preserve_xattr;
 
   source_desc = open (src_name,
@@ -1244,17 +1245,48 @@ copy_reg (char const *src_name, char const *dst_name,
   if (*new_dst)
 {
 #if HAVE_FCLONEFILEAT && !USE_XATTR
-/* CLONE_NOOWNERCOPY only available on macos >= 10.13.  */
+# ifndef CLONE_ACL
+#  define CLONE_ACL 0 /* Added in macOS 12.6.  */
+# endif
 # ifndef CLONE_NOOWNERCOPY
-#  define CLONE_NOOWNERCOPY 0
+#  define CLONE_NOOWNERCOPY 0 /* Added in macOS 10.13.  */
 # endif
-  int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
+  mode_t cloned_mode_bits = S_ISVTX | S_IRWXUGO;
+  mode_t cloned_mode = src_mode & cloned_mode_bits;
+  int fc_flags = ((x->preserve_mode ? CLONE_ACL : 0)
+  | (x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY));
   if (data_copy_required && x->reflink_mode
-  && x->preserve_mode && x->preserve_timestamps
+  && (x->preserve_mode || ! (cloned_mode & ~dst_mode))
   && (x->preserve_ownership || CLONE_NOOWNERCOPY))
 {
-  if (fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags) == 0)
-goto close_src_desc;
+  int s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
+  if (s != 0 && (fc_flags & CLONE_ACL) != 0 && errno == EINVAL)
+{
+  fc_flags &= ~CLONE_ACL;
+  s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
+}
+  if (s == 0)
+{
+  if (!x->preserve_timestamps)
+{
+  struct timespec