bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-21 Thread George Valkov


> On 2023-03-21, at 8:21 PM, Pádraig Brady  wrote:
> 
> On 21/03/2023 18:08, George Valkov wrote:
>>> On 2023-03-21, at 7:32 PM, Paul Eggert  wrote:
>>> 
>>> On 3/21/23 08:27, George Valkov wrote:
 However since the old m4-1.4.19 in under tools/m4, coreutils builds with 
 the legacy version:
>>> 
>>> If you're building from that tarball, coreutils doesn't need m4. I just now 
>>> double-checked that, in an environment that didn't have m4. "./configure && 
>>> make check" succeeded. So the m4 version should be irrelevant.
>> Yes, I can confirm that a stand-alone build using your instructions works. I 
>> do not have enough experience with the OpenWRT build system, but what I can 
>> see is that at some point it tries to run staging_dir/host/bin/m4
>> And that’s why I had the impression that the legacy version of m4 is used. I 
>> might be wrong, and maybe it just needs that executable, since when tools/m4 
>> is present, tools/coreutils builds and works. If the m4 directory from the 
>> tarball is used, then everything’s fine, we can keep tools/m4 as it is.
>> staging_dir/host/bin/m4 is compiled from tools/m4 and does not exist without 
>> tools/m4, so the build fails if I remove it. If I create a link to 
>> /usr/local/Cellar/m4/1.4.19/bin/m4, that part of the build succeeds, but 
>> then we also need to apply the following patch:
>> https://github.com/openwrt/openwrt/blob/master/tools/coreutils/patches/001-m4.patch
>> Do you know if the patch is applied upstream in coreutils?
> 
> Yes if that m4 file is patched then m4 will need to run to regen stuff.
> Though the existing m4 tool should be fine for that.
> I would look at figuring out why that m4 tool is not available in your env.
> Perhaps you need to define an M4 env variable or something.
> 
> The openwrt m4 patch originates from
> https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=c5608227
> and that pattern seems to have originated from:
> https://github.com/openwrt/packages/issues/14120#issuecomment-749438742
> For the patch not to be needed it would need to be applied to gnulib,
> but I don't know how general it is.

Thanks, Pádraig!

The OpenWRT build system is designed to work on many platforms. To achieve 
this, some helper host tools are compiled and used later. These tools are 
installed in staging_dir/host/bin, and they are used during the build instead 
of whatever is installed on the system. On macOS for example the default 
version of m4 is old and does not support the --gnu switch, so it is not 
suitable for the build. Hence OpenWRT compiles a working version and uses it 
during the build.

Without the patch, the OpenWRT tools/coreutils build fails. Build log:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/2023-03-21-openwrt-build-coreutils-9.2-no-patch.txt

With the patch it builds correctly. The target cross-platform build for 
coreutils does not use the patch and builds correctly. A regular build outside 
of OpenWRT also does not need the patch. This seems like improperly configured 
include search path or environment on OpenWRT side. So I guess it would be best 
if we can fix that and go without the patch.

At this point I believe that toos/m4 is only used to produce the executable, 
and so it does not require any changes since the m4 directory that comes with 
coreutils is used.


Cheers mate!

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-21 Thread Pádraig Brady

On 21/03/2023 18:08, George Valkov wrote:



On 2023-03-21, at 7:32 PM, Paul Eggert  wrote:

On 3/21/23 08:27, George Valkov wrote:

However since the old m4-1.4.19 in under tools/m4, coreutils builds with the 
legacy version:


If you're building from that tarball, coreutils doesn't need m4. I just now double-checked that, 
in an environment that didn't have m4. "./configure && make check" succeeded. 
So the m4 version should be irrelevant.


Yes, I can confirm that a stand-alone build using your instructions works. I do 
not have enough experience with the OpenWRT build system, but what I can see is 
that at some point it tries to run staging_dir/host/bin/m4
And that’s why I had the impression that the legacy version of m4 is used. I 
might be wrong, and maybe it just needs that executable, since when tools/m4 is 
present, tools/coreutils builds and works. If the m4 directory from the tarball 
is used, then everything’s fine, we can keep tools/m4 as it is.
staging_dir/host/bin/m4 is compiled from tools/m4 and does not exist without 
tools/m4, so the build fails if I remove it. If I create a link to 
/usr/local/Cellar/m4/1.4.19/bin/m4, that part of the build succeeds, but then 
we also need to apply the following patch:
https://github.com/openwrt/openwrt/blob/master/tools/coreutils/patches/001-m4.patch
Do you know if the patch is applied upstream in coreutils?


Yes if that m4 file is patched then m4 will need to run to regen stuff.
Though the existing m4 tool should be fine for that.
I would look at figuring out why that m4 tool is not available in your env.
Perhaps you need to define an M4 env variable or something.

The openwrt m4 patch originates from
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=c5608227
and that pattern seems to have originated from:
https://github.com/openwrt/packages/issues/14120#issuecomment-749438742
For the patch not to be needed it would need to be applied to gnulib,
but I don't know how general it is.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-21 Thread George Valkov


> On 2023-03-21, at 7:32 PM, Paul Eggert  wrote:
> 
> On 3/21/23 08:27, George Valkov wrote:
>> However since the old m4-1.4.19 in under tools/m4, coreutils builds with the 
>> legacy version:
> 
> If you're building from that tarball, coreutils doesn't need m4. I just now 
> double-checked that, in an environment that didn't have m4. "./configure && 
> make check" succeeded. So the m4 version should be irrelevant.

Yes, I can confirm that a stand-alone build using your instructions works. I do 
not have enough experience with the OpenWRT build system, but what I can see is 
that at some point it tries to run staging_dir/host/bin/m4
And that’s why I had the impression that the legacy version of m4 is used. I 
might be wrong, and maybe it just needs that executable, since when tools/m4 is 
present, tools/coreutils builds and works. If the m4 directory from the tarball 
is used, then everything’s fine, we can keep tools/m4 as it is.
staging_dir/host/bin/m4 is compiled from tools/m4 and does not exist without 
tools/m4, so the build fails if I remove it. If I create a link to 
/usr/local/Cellar/m4/1.4.19/bin/m4, that part of the build succeeds, but then 
we also need to apply the following patch:
https://github.com/openwrt/openwrt/blob/master/tools/coreutils/patches/001-m4.patch
Do you know if the patch is applied upstream in coreutils?

Thanks Paul!

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-21 Thread Paul Eggert

On 3/21/23 08:27, George Valkov wrote:

However since the old m4-1.4.19 in under tools/m4, coreutils builds with the 
legacy version:


If you're building from that tarball, coreutils doesn't need m4. I just 
now double-checked that, in an environment that didn't have m4. 
"./configure && make check" succeeded. So the m4 version should be 
irrelevant.






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-21 Thread George Valkov


> On 2023-03-21, at 4:49 PM, Paul Eggert  wrote:
> Easiest might be to build from the latest release:
> https://ftp.gnu.org/pub/gnu/coreutils/coreutils-9.2.tar.xz
> If you do that, you shouldn't need to run m4, and all the gnulib files will 
> be there already.


Thanks, Paul!

Yes, this is what we do, if you look at tools/coreutils/Makefile (link below), 
I changed PKG_VERSION and PKG_HASH.

make tools/coreutils/{clean,compile} V=sc
downloads the file you mentioned from one of the @GNU mirrors, the SHA2 hash 
matches 6885ff47b9cdb211de47d368c17853f406daaf98b148aaecdf10de29cc04b0b3, so it 
proceeds with the build.
https://github.com/httpstorm/openwrt/tree/coreutils-9.2/tools/coreutils
https://github.com/openwrt/openwrt/pull/12233/files

However since the old m4-1.4.19 in under tools/m4, coreutils builds with the 
legacy version:
https://github.com/httpstorm/openwrt/tree/coreutils-9.2/tools/m4

I tried removing tools/m4, and I get the following error
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/2023-03-21-openwrt-build-coreutils-9.2.txt

make tools/coreutils/{clean,compile} V=sc
…
config.status: creating po/POTFILES
config.status: creating po/Makefile
touch /Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2/.configured
CFLAGS="-O2 -I/Volumes/wrt3200/openwrt/staging_dir/host/include " 
CPPFLAGS="-I/Volumes/wrt3200/openwrt/staging_dir/host/include " CXXFLAGS="" 
LDFLAGS="-L/Volumes/wrt3200/openwrt/staging_dir/host/lib " make  -C 
/Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2 PROGRAMS="src/date 
src/readlink src/touch src/ln src/chown src/ginstall" LIBRARIES= MANS= 
SUBDIRS=. 
make[3]: Entering directory 
'/Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2'
CDPATH="${ZSH_VERSION+.}:" && cd . && /usr/bin/env bash 
'/Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2/build-aux/missing' 
aclocal-1.16 -I m4
autom4te: error: need GNU m4 1.4 or later: 
/Volumes/wrt3200/openwrt/staging_dir/host/bin/m4
aclocal.real: error: autom4te failed with exit status: 1
make[3]: *** [Makefile:8458: aclocal.m4] Error 1
make[3]: Leaving directory 
'/Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2'
make[2]: *** [Makefile:45: 
/Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2/.built] Error 2
make[2]: Leaving directory '/Volumes/wrt3200/openwrt/tools/coreutils'
time: tools/coreutils/compile#58.69#31.58#154.08
ERROR: tools/coreutils failed to build.
make[1]: *** [tools/Makefile:219: tools/coreutils/compile] Error 1
make[1]: Leaving directory '/Volumes/wrt3200/openwrt'
make: *** [/Volumes/wrt3200/openwrt/include/toplevel.mk:231: 
tools/coreutils/compile] Error 2

Any ideas?


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-21 Thread Paul Eggert

On 3/21/23 06:55, George Valkov wrote:

Is there a source download link or should be clone from the Savanna git repo?


Easiest might be to build from the latest release:

https://ftp.gnu.org/pub/gnu/coreutils/coreutils-9.2.tar.xz

If you do that, you shouldn't need to run m4, and all the gnulib files 
will be there already.







bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-21 Thread George Valkov


> On 2023-03-11, at 4:28 PM, Pádraig Brady  wrote:
> 
> On 11/03/2023 07:29, George Valkov wrote:
>> Hi Paul!
>> Here are the latest test results on macOS 12.6.3:
>> threshold.sh: skipped test: block size of a directory is smaller than 4 bytes
>> SKIP: tests/du/threshold.sh
>> It was also skipped on macOS before the patch, so no change.
>> https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-10-16000805eb1bcdf25360471b8bbc8ec3f025e035-ori.txt
>> The folks at OpenWRT asked for an update on the conreutils-9.2 release date?
>> I will let them know the sparse copy fix is already on Savannah master.
> 
> Best guess is 9.2 release is 5 days away.

Thank you for your hard work on coreutils, Pádraig!

I opened a PR for OpenWRT here:
https://github.com/openwrt/openwrt/pull/12233

I remember you made changes to gnulib and m4, but I do not see a new release 
tag for m4, and I see that OpenWRT builds tools/m4 as a prerequisite for 
tools/coreutils. The version of m4 currently in use seems outdated. It is 
downloaded from
https://ftp.gnu.org/gnu/m4/m4-1.4.19.tar.xz
or one of the many @GNU mirrors. The latest version is from 2021-05-28 17:55. 
What is the current status for m4? I suppose we should instead clone and build 
gnulib? Is there a source download link or should be clone from the Savanna git 
repo? Is there a specific release of gnulib or should we checkout the commit 
selected by coreutils? Or perhaps we should automate the build using:
./bootstrap && ./configure && make

coreutils-9.2 test results on macOS 12.6.3:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-20-9.2-release.txt

Cheers, mate!

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-11 Thread Pádraig Brady

On 11/03/2023 07:29, George Valkov wrote:

Hi Paul!
Here are the latest test results on macOS 12.6.3:
threshold.sh: skipped test: block size of a directory is smaller than 4 bytes
SKIP: tests/du/threshold.sh
It was also skipped on macOS before the patch, so no change.

https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-10-16000805eb1bcdf25360471b8bbc8ec3f025e035-ori.txt

The folks at OpenWRT asked for an update on the conreutils-9.2 release date?
I will let them know the sparse copy fix is already on Savannah master.


Best guess is 9.2 release is 5 days away.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-10 Thread George Valkov


> On 2023-03-07, at 11:53 PM, Paul Eggert  wrote:
> 
> On 3/6/23 16:31, Pádraig Brady wrote:
> 
>> syntax check now looks good thanks.
> 
> You're welcome. I'm getting a failure on tests/du/threshold.sh, though, on 
> Fedora 37 x86-64. Are you seeing that? See attached (compressed) log.
> 
> 
>> I see this as a more fundamental operational thing than a limit issue.
>> `split -n` needing the size up front alludes to the operation,
> 
> To ameliorate a bit, we can document this (done in the attached patch).
> 
> I don't see this as being significantly more serious than the other utilities 
> that read all their input (possibly from a pipe) before outputting anything. 
> They all need to deal with exhausting temporary space, and 'split' is merely 
> joining the company of 'sort' and 'tac'.
> 
> 
>> given split is often used with large data,
>> explicit control is often desired over where and how it's persisted.
> 
> Anybody needing that control can copy the data themselves to a temp file, 
> before calling 'split'. (Like 'sort' and 'tac'.)
> 
> Since the change doesn't invalidate existing uses, it sounds like you're 
> worried that users will want 'split' to work even on enormous pipe inputs, 
> inputs so large that /tmp fills up. I think this unlikely in practice, but if 
> it turns into a real concern I can work around most of the problem by using 
> the first output file as the temporary. However, I'd prefer to avoid doing 
> this unless it's necessary, as it seems overkill.
> 
> I installed the attached patch so that 'split' uses a temp file in this 
> situation, so it is no longer limited to the 128 KiB size that it was before 
> the attached patch. Hope this is good enough; if not please let me 
> know.<0001-split-support-split-n-on-larger-pipe-input.patch>

Hi Paul!
Here are the latest test results on macOS 12.6.3:
threshold.sh: skipped test: block size of a directory is smaller than 4 bytes
SKIP: tests/du/threshold.sh
It was also skipped on macOS before the patch, so no change.

https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-10-16000805eb1bcdf25360471b8bbc8ec3f025e035-ori.txt

The folks at OpenWRT asked for an update on the conreutils-9.2 release date?
I will let them know the sparse copy fix is already on Savannah master.


Good luck!

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-07 Thread Pádraig Brady

On 07/03/2023 21:53, Paul Eggert wrote:

On 3/6/23 16:31, Pádraig Brady wrote:


syntax check now looks good thanks.


You're welcome. I'm getting a failure on tests/du/threshold.sh, though,
on Fedora 37 x86-64. Are you seeing that? See attached (compressed) log.



I see this as a more fundamental operational thing than a limit issue.
`split -n` needing the size up front alludes to the operation,


To ameliorate a bit, we can document this (done in the attached patch).

I don't see this as being significantly more serious than the other
utilities that read all their input (possibly from a pipe) before
outputting anything. They all need to deal with exhausting temporary
space, and 'split' is merely joining the company of 'sort' and 'tac'.



given split is often used with large data,
explicit control is often desired over where and how it's persisted.


Anybody needing that control can copy the data themselves to a temp
file, before calling 'split'. (Like 'sort' and 'tac'.)

Since the change doesn't invalidate existing uses, it sounds like you're
worried that users will want 'split' to work even on enormous pipe
inputs, inputs so large that /tmp fills up. I think this unlikely in
practice, but if it turns into a real concern I can work around most of
the problem by using the first output file as the temporary. However,
I'd prefer to avoid doing this unless it's necessary, as it seems overkill.

I installed the attached patch so that 'split' uses a temp file in this
situation, so it is no longer limited to the 128 KiB size that it was
before the attached patch. Hope this is good enough; if not please let
me know.


Thanks for doing that Paul.
The implementation looks good.

I'll look at the du test later.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-07 Thread Paul Eggert

On 3/6/23 16:31, Pádraig Brady wrote:


syntax check now looks good thanks.


You're welcome. I'm getting a failure on tests/du/threshold.sh, though, 
on Fedora 37 x86-64. Are you seeing that? See attached (compressed) log.




I see this as a more fundamental operational thing than a limit issue.
`split -n` needing the size up front alludes to the operation,


To ameliorate a bit, we can document this (done in the attached patch).

I don't see this as being significantly more serious than the other 
utilities that read all their input (possibly from a pipe) before 
outputting anything. They all need to deal with exhausting temporary 
space, and 'split' is merely joining the company of 'sort' and 'tac'.




given split is often used with large data,
explicit control is often desired over where and how it's persisted.


Anybody needing that control can copy the data themselves to a temp 
file, before calling 'split'. (Like 'sort' and 'tac'.)


Since the change doesn't invalidate existing uses, it sounds like you're 
worried that users will want 'split' to work even on enormous pipe 
inputs, inputs so large that /tmp fills up. I think this unlikely in 
practice, but if it turns into a real concern I can work around most of 
the problem by using the first output file as the temporary. However, 
I'd prefer to avoid doing this unless it's necessary, as it seems overkill.


I installed the attached patch so that 'split' uses a temp file in this 
situation, so it is no longer limited to the 128 KiB size that it was 
before the attached patch. Hope this is good enough; if not please let 
me know.

make-check-log.txt.gz
Description: application/gzip
From bb9dbcbbfd5c3159f975f39336a33319c6c5df04 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Tue, 7 Mar 2023 12:58:12 -0800
Subject: [PATCH] split: support split -n on larger pipe input

* bootstrap.conf (gnulib_modules): Add free-posix, tmpfile.
* src/split.c (copy_to_tmpfile): New function.
(input_file_size): Use it to split larger files when sizes cannot
easily be determined via fstat or lseek.  See Bug#61386#235.
* tests/split/l-chunk.sh: Mark tests of /dev/zero as
very expensive since they exhaust /tmp.
---
 NEWS   |  3 ++
 bootstrap.conf |  2 +
 doc/coreutils.texi |  6 ++-
 src/split.c| 89 +-
 tests/split/l-chunk.sh | 15 ---
 5 files changed, 72 insertions(+), 43 deletions(-)

diff --git a/NEWS b/NEWS
index 5b0dc939c..3df17d3b3 100644
--- a/NEWS
+++ b/NEWS
@@ -145,6 +145,9 @@ GNU coreutils NEWS-*- outline -*-
   split now accepts options like '-n SIZE' that exceed machine integer
   range, when they can be implemented as if they were infinity.
 
+  split -n now accepts piped input even when not in round-robin mode,
+  by first copying input to a temporary file to determine its size.
+
   wc now accepts the --total={auto,never,always,only} option
   to give explicit control over when the total is output.
 
diff --git a/bootstrap.conf b/bootstrap.conf
index 01430429b..c122354a1 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -103,6 +103,7 @@ gnulib_modules="
   fnmatch-gnu
   fopen-safer
   fprintftime
+  free-posix
   freopen
   freopen-safer
   fseeko
@@ -270,6 +271,7 @@ gnulib_modules="
   time_rz
   timer-time
   timespec
+  tmpfile
   tzset
   uname
   unicodeio
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index f0e46b9ee..7852e9f8a 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3409,8 +3409,10 @@ are not split even if they overlap a partition, the files written
 can be larger or smaller than the partition size, and even empty
 if a line/record is so long as to completely overlap the partition.
 
-For @samp{r} mode, the size of @var{input} is irrelevant,
-and so can be a pipe for example.
+When the input is a pipe or some other special file where the size
+cannot easily be determined, there is no trouble for @samp{r} mode
+because the size of the input is irrelevant.  For other modes, such an
+input is first copied to a temporary to determine its size.
 
 @item -a @var{length}
 @itemx --suffix-length=@var{length}
diff --git a/src/split.c b/src/split.c
index 95d174a8b..d872ec56a 100644
--- a/src/split.c
+++ b/src/split.c
@@ -275,6 +275,39 @@ CHUNKS may be:\n\
   exit (status);
 }
 
+/* Copy the data in FD to a temporary file, then make that file FD.
+   Use BUF, of size BUFSIZE, to copy.  Return the number of
+   bytes copied, or -1 (setting errno) on error.  */
+static off_t
+copy_to_tmpfile (int fd, char *buf, idx_t bufsize)
+{
+  FILE *tmp = tmpfile ();
+  if (!tmp)
+return -1;
+  off_t copied = 0;
+  off_t r;
+
+  while (0 < (r = read (fd, buf, bufsize)))
+{
+  if (fwrite (buf, 1, r, tmp) != r)
+return -1;
+  if (INT_ADD_WRAPV (copied, r, ))
+{
+  errno = EOVERFLOW;
+  return -1;
+}
+}
+
+  if (r < 0)
+return r;
+  r = dup2 (fileno (tmp), fd);
+  

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-06 Thread Pádraig Brady

On 06/03/2023 23:48, Paul Eggert wrote:

On 3/6/23 03:13, Pádraig Brady wrote:

On 06/03/2023 07:37, Paul Eggert wrote:
The lseek /dev/null issue was only in your undef SEEK_DATA test patch,
and already addressed in your final gnulib patch in the area, as
discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2023-02/msg00081.html


Sorry,  I got confused about that issue versus other 'split' issues.



Also immediately rejecting input where we can't determine the size is a
feature.
I.e. the following is the expected behavior:

    $ : | split -n l/1
    split: -: cannot determine file size


But 'split' can easily determine the size there: it's zero. 'split'
doesn't need to use lseek to do that; a single 'read' will do.



With the changes we now have:

    $ : | split -n l/1  # Creates an empty file
    $ yes | split -n l/1
    split: -: cannot determine file size

This is inconsistent, and an insidious issue that users may introduce to
scripts,
that will only fail once input data hits a certain size.


That's OK, as lots of standard utilities already have that issue and
users are OK with this. Users can't reasonably expect 'split -n' to work
on infinitely-sized input such as the second example. At some point
there is a limit.

It's a bit like 'sort' if you feed 'sort' longer and longer input lines,
eventually it will fail and give you a diagnostic.

It'd be nicer if the limit of 'split' were larger than what's in current
Git. If you'd prefer that we raise the limit further, by copying stdin
into a temporary file first, I can write a patch to do that.



Also there are a few `make syntax-check` issues in the new split code.


Ouch, sorry, I'm always forgetting that. Fixed by the attached patch.
(Two of the issues were in 'split', one was elsewhere.)


syntax check now looks good thanks.


Would it be possible to revert this change in isolation?


We could revert the patch's effect (not sure if it's a simple revert,
but I could easily generate such a patch). I'm hoping, though, that we
can reach consensus on extending split's functionality instead, perhaps
along the abovementioned lines.

[dropping bug-gnulib since this is no longer relevant to Gnulib.]


I see this as a more fundamental operational thing than a limit issue.
`split -n` needing the size up front alludes to the operation,
so it's obvious that `yes | split -n l/4 ...` doesn't do round robin for example
and fails immediately.

Also the main point of only supporting a small amount of data
before failing is a bad gotcha for users.

So options I see are to persist implicitly with a temp file patch,
or leave as is and have users explicitly persist the data to split.
Advantages of leaving as is, is it's obvious to users that
all data needs to be read before splitting starts.
Also given split is often used with large data,
explicit control is often desired over where and how it's persisted.

So yes a temp file patch would be better / required,
but I'm thinking a revert would still be better again.

cheers,
Pádraig.





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-06 Thread Paul Eggert

On 3/6/23 03:13, Pádraig Brady wrote:

On 06/03/2023 07:37, Paul Eggert wrote:
The lseek /dev/null issue was only in your undef SEEK_DATA test patch,
and already addressed in your final gnulib patch in the area, as 
discussed at:

https://lists.gnu.org/archive/html/bug-coreutils/2023-02/msg00081.html


Sorry,  I got confused about that issue versus other 'split' issues.


Also immediately rejecting input where we can't determine the size is a 
feature.

I.e. the following is the expected behavior:

   $ : | split -n l/1
   split: -: cannot determine file size


But 'split' can easily determine the size there: it's zero. 'split' 
doesn't need to use lseek to do that; a single 'read' will do.




With the changes we now have:

   $ : | split -n l/1  # Creates an empty file
   $ yes | split -n l/1
   split: -: cannot determine file size

This is inconsistent, and an insidious issue that users may introduce to 
scripts,

that will only fail once input data hits a certain size.


That's OK, as lots of standard utilities already have that issue and 
users are OK with this. Users can't reasonably expect 'split -n' to work 
on infinitely-sized input such as the second example. At some point 
there is a limit.


It's a bit like 'sort' if you feed 'sort' longer and longer input lines, 
eventually it will fail and give you a diagnostic.


It'd be nicer if the limit of 'split' were larger than what's in current 
Git. If you'd prefer that we raise the limit further, by copying stdin 
into a temporary file first, I can write a patch to do that.




Also there are a few `make syntax-check` issues in the new split code.


Ouch, sorry, I'm always forgetting that. Fixed by the attached patch. 
(Two of the issues were in 'split', one was elsewhere.)




Would it be possible to revert this change in isolation?


We could revert the patch's effect (not sure if it's a simple revert, 
but I could easily generate such a patch). I'm hoping, though, that we 
can reach consensus on extending split's functionality instead, perhaps 
along the abovementioned lines.


[dropping bug-gnulib since this is no longer relevant to Gnulib.]From a4778006c8f2b669afcc45456acf0d21f228208d Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 6 Mar 2023 15:37:45 -0800
Subject: [PATCH] =?UTF-8?q?maint:=20pacify=20=E2=80=98make=20syntax-check?=
 =?UTF-8?q?=E2=80=99?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Problem reported by Pádraig Brady (Bug#61386#226).
* src/split.c (parse_chunk): Use die instead of error.
(main): Quote a string.
* tests/local.mk (all_root_tests): Move du/apparent.sh from here ...
(all_tests): ... to here.
---
 src/split.c| 6 +++---
 tests/local.mk | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/split.c b/src/split.c
index cc581b6c6..95d174a8b 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1325,8 +1325,8 @@ parse_chunk (intmax_t *k_units, intmax_t *n_units, char const *arg)
   *n_units = parse_n_units (argend + 1, "",
 N_("invalid number of chunks"));
   if (! (0 < *k_units && *k_units <= *n_units))
-error (EXIT_FAILURE, 0, "%s: %s", _("invalid chunk number"),
-   quote_mem (arg, argend - arg));
+die (EXIT_FAILURE, 0, "%s: %s", _("invalid chunk number"),
+ quote_mem (arg, argend - arg));
 }
   else if (! (e <= OVERFLOW_OK && 0 < *n_units))
 strtoint_die (N_("invalid number of chunks"), arg);
@@ -1561,7 +1561,7 @@ main (int argc, char **argv)
 
   if (n_units == 0)
 {
-  error (0, 0, _("invalid number of lines: %s"), "0");
+  error (0, 0, _("invalid number of lines: %s"), quote ("0"));
   usage (EXIT_FAILURE);
 }
 
diff --git a/tests/local.mk b/tests/local.mk
index 1fe04235d..1e93290d0 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -118,7 +118,6 @@ all_root_tests =\
   tests/dd/skip-seek-past-dev.sh		\
   tests/df/problematic-chars.sh			\
   tests/df/over-mount-device.sh			\
-  tests/du/apparent.sh\
   tests/du/bind-mount-dir-cycle.sh		\
   tests/du/bind-mount-dir-cycle-v2.sh		\
   tests/id/setgid.sh\
@@ -557,6 +556,7 @@ all_tests =	\
   tests/df/total-verify.sh			\
   tests/du/2g.sh\
   tests/du/8gb.sh\
+  tests/du/apparent.sh\
   tests/du/basic.sh\
   tests/du/bigtime.sh\
   tests/du/deref.sh\
-- 
2.39.2



bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-06 Thread George Valkov


> On 2023-03-06, at 9:37 AM, Paul Eggert  wrote:
> 
> I recall reading somewhere in this thread that a 'split' test was failing on 
> macOS because it doesn't let you lseek on /dev/null. I fixed that problem 
> here:
> 
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=aa266f1b3dc4e12acdc46cc0f562adc03c2c0b8f
> 
> and fixed some other 'split' issues in adjacent patches, while I was in the 
> neighborhood.

I would assume the test are irrelevant, but since I already ran then, here you 
go:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-09-e30af1e5840fdcabe3856f9ffcd6354b94d0a7ee-ori.txt


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-06 Thread Pádraig Brady

On 06/03/2023 07:37, Paul Eggert wrote:

I recall reading somewhere in this thread that a 'split' test was
failing on macOS because it doesn't let you lseek on /dev/null. I fixed
that problem here:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=aa266f1b3dc4e12acdc46cc0f562adc03c2c0b8f

and fixed some other 'split' issues in adjacent patches, while I was in
the neighborhood.


The lseek /dev/null issue was only in your undef SEEK_DATA test patch,
and already addressed in your final gnulib patch in the area, as discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2023-02/msg00081.html

Also immediately rejecting input where we can't determine the size is a feature.
I.e. the following is the expected behavior:

  $ : | split -n l/1
  split: -: cannot determine file size

With the changes we now have:

  $ : | split -n l/1  # Creates an empty file
  $ yes | split -n l/1
  split: -: cannot determine file size

This is inconsistent, and an insidious issue that users may introduce to 
scripts,
that will only fail once input data hits a certain size.

Also there are a few `make syntax-check` issues in the new split code.

Would it be possible to revert this change in isolation?

thanks,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-03-05 Thread Paul Eggert
I recall reading somewhere in this thread that a 'split' test was 
failing on macOS because it doesn't let you lseek on /dev/null. I fixed 
that problem here:


https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=aa266f1b3dc4e12acdc46cc0f562adc03c2c0b8f

and fixed some other 'split' issues in adjacent patches, while I was in 
the neighborhood.






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-25 Thread Pádraig Brady

On 24/02/2023 23:23, George Valkov wrote:

Are we in a state where I can update OpenWRT to the latest coreutils master or 
perhaps it would be better to wait until you fix the tests and coreutils-9.2 is 
released? You mentioned the release is coming soon?


Probably best to wait for the 9.2 release at this stage.
Best guess is 1.5 weeks away.

cheers,
Pádraig.





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-24 Thread George Valkov


> On 2023-02-25, at 12:47 AM, Pádraig Brady  wrote:
> 
> On 24/02/2023 22:06, George Valkov wrote:
>> If I revert a0803c4bad6f8e11bb05effcc42ef433f4fc3f9b, the requirement to 
>> press enter after PASS: tests/rm/isatty.sh is fixed.
> 
> Ah very useful info.
> I'll test this on my macOS 13.2 system.
> 
>> Sometimes it might randomly hang: gdb after
>> PASS: tests/rm/r-4.sh
>> ^Cmake[1]: *** Deleting file 'tests/rm/r-root.log'
>> ^C^C^C
> 
> This is the same gdb issue, which I'll skip similarly
> to the tail-2/inotify tests.
> 
> 
> Since all issues related to sparse copy on macOS are now addressed,
> I'm marking this bug as done.
> 
> The above bugs are unrelated, and I'll take them from here.
> 
> thanks again for all the testing.


Thank you Pádraig! You and Paul did a lot of work, and I’m happy the issue is 
resolved. I’ll let you know if I get any update from Apple.

Are we in a state where I can update OpenWRT to the latest coreutils master or 
perhaps it would be better to wait until you fix the tests and coreutils-9.2 is 
released? You mentioned the release is coming soon?

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-24 Thread Pádraig Brady

On 24/02/2023 22:06, George Valkov wrote:

If I revert a0803c4bad6f8e11bb05effcc42ef433f4fc3f9b, the requirement to press 
enter after PASS: tests/rm/isatty.sh is fixed.



Ah very useful info.
I'll test this on my macOS 13.2 system.


Sometimes it might randomly hang: gdb after
PASS: tests/rm/r-4.sh
^Cmake[1]: *** Deleting file 'tests/rm/r-root.log'
^C^C^C


This is the same gdb issue, which I'll skip similarly
to the tail-2/inotify tests.


Since all issues related to sparse copy on macOS are now addressed,
I'm marking this bug as done.

The above bugs are unrelated, and I'll take them from here.

thanks again for all the testing.

Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-24 Thread George Valkov
If I revert a0803c4bad6f8e11bb05effcc42ef433f4fc3f9b, the requirement to press 
enter after PASS: tests/rm/isatty.sh is fixed.

Sometimes it might randomly hang: gdb after
PASS: tests/rm/r-4.sh
^Cmake[1]: *** Deleting file 'tests/rm/r-root.log'
^C^C^C

ps -A |grep gdb
55051 ttys0200:00.09 gdb -nx --batch-silent -return-child-result 
--eval-command=set exec-wrapper\011\011\011\011\011   env 
'LD_PRELOAD=:./k.so'  --eval-command=break 
'/Volumes/coreutils/coreutils-2023-02-24.d/src/remove.c:377' 
--eval-command=source bp.py --eval-command=run -rv --one-file-system dir 
--eval-command=quit rm

even though I have applied macos-gdb-hang.patch. Other times it goes past that:
r-root.sh: skipped test: internal test failure: maybe LD_PRELOAD or gdb doesn't 
work?
and eventually the tests complete.

If it hangs and I killall -9 gdb, it continues
r-root.sh: skipped test: internal test failure: maybe LD_PRELOAD or gdb doesn't 
work?
SKIP: tests/rm/r-root.sh
PASS: tests/rm/readdir-bug.sh
…


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-24 Thread George Valkov


> On 2023-02-24, at 5:43 PM, Pádraig Brady  wrote:
> 
> On 24/02/2023 14:33, George Valkov wrote:
>>> On 2023-02-24, at 12:23 AM, Paul Eggert  wrote:
>>> 
>>> On 2/20/23 13:14, Pádraig Brady wrote:
 Since https://github.com/coreutils/gnulib/commit/4db8db34
 gnulib has been unconditionally replacing lseek() on macos.
 Now with SEEK_DATA undefined that replaced lseek()
 will run the code intended for BeOS.
 So either the lseek.m4 or lseek.c needs adjusting accordingly.
>>> 
>>> Good catch, thanks. I updated the Gnulib patch accordingly; see attached.
>> Nice: I just downloaded a fresh copy of Savannah, and it already includes 
>> the attached patch, so no action needed on my side.
>> The copy created by cp is good. I noticed that you have added a —debug 
>> switch, so I gave it a test:
>> 1. If the target exists I get this output. I would assume zeroes means that 
>> all data is read from the source, but pages containing only zeroes are 
>> skipped, while pages containing data are written to the target, which is 
>> fine. I would guess 4 KB page granularity or some form of detection takes 
>> place.
> 
> Exactly.
> 
>> ./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
>> 'cc1-mmap' -> 'cc1-ori'
>> copy offload: avoided, reflink: unsupported, sparse detection: zeros
>> 2. If the target does not exist, the file is cloned. You might want to 
>> report something about that in the debug output. Otherwise the clone is good.
>> ./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
>> 'cc1-mmap' -> 'cc1-ori’
> 
> Yes that was a mistake.
> Should be fixed with
> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=65bb27656

Looks good
commit 65bb2765646df18488b266e6c1851593d3f9c966

./coreutils-2023-02-24.b/src/cp --debug cc1-mmap cc1-ori 
'cc1-mmap' -> 'cc1-ori'
copy offload: unknown, reflink: yes, sparse detection: unknown


>> My first attempt to run the tests stopped here, so I had to interrupt it 10 
>> minutes later with Control+C.
> 
>> ^Cmake[1]: *** Deleting file 'tests/tail-2/inotify-race.log'
>> ^C^C^C^C^C^C
>> killall gdb
>> ps -A |grep gdb
>> 29584 ?? 0:00.09 gdb -nx --batch-silent --eval-command=break 1475 
>> --eval-command=run --pid=29583 -f file --eval-command=quit tail
>> 23269 ttys0100:00.09 gdb -nx --batch-silent --eval-command=break 1475 
>> --eval-command=run --pid=23268 -f file --eval-command=quit tail
>> killall -9 gdb
>> ps -A |grep gdb
>> Killing gdb allowed the tests to continue, I had to do it twice:
> 
> That's awkward.
> That test documents the issue with protecting the gdb invocation with 
> timeout(1).
> I suppose we could restrict this test to inotify capable systems,
> which would have the side effect of avoiding its use on non linux systems.
> The attached patch does that.

Still hanging out there after
PASS: tests/rm/isatty.sh

There were no gdb instances to kill, but if I press enter, it continues

empty-inacc.sh: set-up failure: 
ERROR: tests/rm/empty-inacc.sh
PASS: tests/rm/empty-name.pl
…

https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-07-65bb2765646df18488b266e6c1851593d3f9c966-patch.txt


coreutils: git checkout cf80f988eeb97cc3f8c65ae58e735d36f865277b
gnulib: git checkout 32c16c45d7378b014d9aac6130104c4d02a9acdb

./bootstrap && ./configure && make clean && make -j 16
make check-TESTS # still hangs: gdb

git checkout -b cf80-macos-gdb-hang
git am < macos-gdb-hang.patch
make clean && make -j 16
make check-TESTS # completes successfully

Clone and configure another fresh copy; gnulib at master; test various 
coreutils commits, applying macos-gdb-hang.patch on top of them; make clean && 
make -j 16

cf80f988eeb97cc3f8c65ae58e735d36f865277b hangs:  gdb

I would suspect either some change in gnulib or some of the other repositories 
is causing these. I cannot checkout random gnulib commits, since the build 
breaks completely.

Check various commits of gnulib
gnulib 7352d9fec59398c76c3bb8a2ef86ba58818f0342 master hangs: ENTER
gnulib bb3fd10e6309f017618a12b2c10d3bfb813bfc08 hangs: ENTER
gnulib f77a31de60963c994cd9b42c8088be0e734962d7 aclocal-1.16: error: aclocal: 
file 'm4/build-to-host.m4' does not exist

Trying to revert some commits and go back in time:
git revert f77a31de60963c994cd9b42c8088be0e734962d7 fails
git revert 1e29238e40d118d4f769f7516700dd4fc494bfcd fails


> thanks for all the testing.

Look Pádraig, I’m glad to help, but this is really taking a lot of energy, 
these tests took another full day, and I’d be thankful if we can make 
everything work sooner. I’ve been spending many hours each day for a few weeks 
now. I need to finish my own tasks and find a job.

In the old build directory if I ./bootstrap && ./configure and make -j 16, the 
tests complete. It is using these checkpoints:
coreutils: git checkout cf80f988eeb97cc3f8c65ae58e735d36f865277b
gnulib: git checkout 32c16c45d7378b014d9aac6130104c4d02a9acdb

However if I clone a fresh copy ./bootstrap && ./configure, then check the same 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-24 Thread Pádraig Brady

On 24/02/2023 14:33, George Valkov wrote:



On 2023-02-24, at 12:23 AM, Paul Eggert  wrote:

On 2/20/23 13:14, Pádraig Brady wrote:

Since https://github.com/coreutils/gnulib/commit/4db8db34
gnulib has been unconditionally replacing lseek() on macos.
Now with SEEK_DATA undefined that replaced lseek()
will run the code intended for BeOS.
So either the lseek.m4 or lseek.c needs adjusting accordingly.


Good catch, thanks. I updated the Gnulib patch accordingly; see attached.


Nice: I just downloaded a fresh copy of Savannah, and it already includes the 
attached patch, so no action needed on my side.

The copy created by cp is good. I noticed that you have added a —debug switch, 
so I gave it a test:
1. If the target exists I get this output. I would assume zeroes means that all 
data is read from the source, but pages containing only zeroes are skipped, 
while pages containing data are written to the target, which is fine. I would 
guess 4 KB page granularity or some form of detection takes place.


Exactly.


./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
'cc1-mmap' -> 'cc1-ori'
copy offload: avoided, reflink: unsupported, sparse detection: zeros

2. If the target does not exist, the file is cloned. You might want to report 
something about that in the debug output. Otherwise the clone is good.

./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
'cc1-mmap' -> 'cc1-ori’


Yes that was a mistake.
Should be fixed with
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=65bb27656


My first attempt to run the tests stopped here, so I had to interrupt it 10 
minutes later with Control+C.



^Cmake[1]: *** Deleting file 'tests/tail-2/inotify-race.log'
^C^C^C^C^C^C

killall gdb
ps -A |grep gdb
29584 ?? 0:00.09 gdb -nx --batch-silent --eval-command=break 1475 
--eval-command=run --pid=29583 -f file --eval-command=quit tail
23269 ttys0100:00.09 gdb -nx --batch-silent --eval-command=break 1475 
--eval-command=run --pid=23268 -f file --eval-command=quit tail

killall -9 gdb
ps -A |grep gdb

Killing gdb allowed the tests to continue, I had to do it twice:


That's awkward.
That test documents the issue with protecting the gdb invocation with 
timeout(1).
I suppose we could restrict this test to inotify capable systems,
which would have the side effect of avoiding its use on non linux systems.
The attached patch does that.

thanks for all the testing.

PádraigFrom 952c8bad4c297ed48cddbb81f1030a35812ca980 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Fri, 24 Feb 2023 15:40:37 +
Subject: [PATCH] tests: avoid gdb hang on macOS

* tests/tail-2/inotify-race.sh: Restrict the test to
inotify capable systems to avoid the hang with some gdbs.
* tests/tail-2/inotify-race.sh: Likewise.
---
 tests/tail-2/inotify-race.sh  | 3 +++
 tests/tail-2/inotify-race2.sh | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/tests/tail-2/inotify-race.sh b/tests/tail-2/inotify-race.sh
index c722fb9fa..63f906536 100755
--- a/tests/tail-2/inotify-race.sh
+++ b/tests/tail-2/inotify-race.sh
@@ -23,6 +23,9 @@
 . "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
 print_ver_ tail sleep
 
+grep '^#define HAVE_INOTIFY 1' "$CONFIG_HEADER" >/dev/null && is_local_dir_ . \
+  || skip_ 'inotify is not supported'
+
 # Terminate any background gdb/tail process
 cleanup_() {
   kill $pid 2>/dev/null && wait $pid
diff --git a/tests/tail-2/inotify-race2.sh b/tests/tail-2/inotify-race2.sh
index 89b02c6cf..19219b72e 100755
--- a/tests/tail-2/inotify-race2.sh
+++ b/tests/tail-2/inotify-race2.sh
@@ -22,6 +22,9 @@
 . "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
 print_ver_ tail sleep
 
+grep '^#define HAVE_INOTIFY 1' "$CONFIG_HEADER" >/dev/null && is_local_dir_ . \
+  || skip_ 'inotify is not supported'
+
 # Terminate any background gdb/tail process
 cleanup_() {
   kill $pid 2>/dev/null && wait $pid
-- 
2.26.2



bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-24 Thread George Valkov


> On 2023-02-24, at 12:23 AM, Paul Eggert  wrote:
> 
> On 2/20/23 13:14, Pádraig Brady wrote:
>> Since https://github.com/coreutils/gnulib/commit/4db8db34
>> gnulib has been unconditionally replacing lseek() on macos.
>> Now with SEEK_DATA undefined that replaced lseek()
>> will run the code intended for BeOS.
>> So either the lseek.m4 or lseek.c needs adjusting accordingly.
> 
> Good catch, thanks. I updated the Gnulib patch accordingly; see attached.

Nice: I just downloaded a fresh copy of Savannah, and it already includes the 
attached patch, so no action needed on my side.

The copy created by cp is good. I noticed that you have added a —debug switch, 
so I gave it a test:
1. If the target exists I get this output. I would assume zeroes means that all 
data is read from the source, but pages containing only zeroes are skipped, 
while pages containing data are written to the target, which is fine. I would 
guess 4 KB page granularity or some form of detection takes place.

./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
'cc1-mmap' -> 'cc1-ori'
copy offload: avoided, reflink: unsupported, sparse detection: zeros

2. If the target does not exist, the file is cloned. You might want to report 
something about that in the debug output. Otherwise the clone is good.

./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
'cc1-mmap' -> 'cc1-ori’


My first attempt to run the tests stopped here, so I had to interrupt it 10 
minutes later with Control+C.
make check-TESTS
make[1]: Entering directory '/Volumes/coreutils/coreutils-2023-02-24'
PASS: tests/misc/help-version.sh
PASS: tests/misc/help-version-getopt.sh

Then I ran it again, and I can see the tests run very slowly now. This time it 
hanged after:

PASS: tests/rm/isatty.sh
^Cmake[1]: *** Deleting file 'tests/rm/empty-inacc.log'
make[1]: *** [Makefile:21399: tests/rm/empty-inacc.log] Error 130
make: *** [Makefile:21381: check-TESTS] Interrupt: 2

make check-TESTS

make[1]: Entering directory '/Volumes/coreutils/coreutils-2023-02-24'
PASS: tests/misc/help-version.sh
PASS: tests/misc/help-version-getopt.sh
^Cmake[1]: *** Deleting file 'tests/tail-2/inotify-race.log'
^C^C^C^C

Now it hangs here and I cannot interrupt it. I don’t see any CPU usage related 
to the tests. My laptop is idle. Attempting to close the Terminal window gave 
me this warning:

Do you want to terminate running processes in this tab?
Closing this tab will terminate the running processes: gdb, bash, make (2), sh 
(4).

That’s a good culprit. Two days ago brew updated gdb to version 13.1 and there 
was a message asking me to sign it with some entitlements to make it more 
functional. So, I signed gdb. At first though make check-TESTS does not play 
nice when gdb is signed, so I removed the signature, but that did not help. 
Next I restored version 12.1_2 and ran make check-TESTS again, which still 
hangs at the same points. So gdb signature and version have nothing to do with 
this new issue.

Next I ran the tests on top of my previous unpatched build directory coreutils 
cf80f988eeb97cc3f8c65ae58e735d36f865277b, gnulib 
32c16c45d7378b014d9aac6130104c4d02a9acdb and it works fine, so I would assume 
something related to the tests has been broken recently in coreutils or gnulib. 
I restored gdb 13.1 signed and the tests completed again.

Back to latest unmodified coreutils 5c8c2a5161c0b6f84212778f694c526105f10dab, 
gnulib 7352d9fec59398c76c3bb8a2ef86ba58818f0342, the tests hang.

make check-TESTS
make[1]: Entering directory '/Volumes/coreutils/coreutils-2023-02-24'
PASS: tests/misc/help-version.sh
PASS: tests/misc/help-version-getopt.sh
^Cmake[1]: *** Deleting file 'tests/tail-2/inotify-race.log'
^C^C^C^C^C^C

killall gdb
ps -A |grep gdb
29584 ?? 0:00.09 gdb -nx --batch-silent --eval-command=break 1475 
--eval-command=run --pid=29583 -f file --eval-command=quit tail
23269 ttys0100:00.09 gdb -nx --batch-silent --eval-command=break 1475 
--eval-command=run --pid=23268 -f file --eval-command=quit tail

killall -9 gdb
ps -A |grep gdb

Killing gdb allowed the tests to continue, I had to do it twice:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-06-5c8c2a5161c0b6f84212778f694c526105f10dab-ori.txt


> On 2/20/23 02:21, George Valkov wrote:
> > What is the correct way to apply your patch?
> 
> Sounds like patching is a bit of a hassle, so to make things easier I 
> installed the attached patch into Gnulib, and propagated this into Coreutils.

I’m starting to think I made a mistake by trying to apply your patch inside 
coreutils, since it already had links to the affected files. I guess I should 
have applied the patch inside coreutils/gnulib?

So my question was: what command do you use to apply your patch and in which 
directory do you run it?


> You should be able to get the latest version of Coreutils from 
> savannah.gnu.org, and then run './bootstrap; ./configure; make' if you have 
> suitable development tools like Autoconf. You can run 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-23 Thread Pádraig Brady

On 23/02/2023 22:23, Paul Eggert wrote:

On 2/20/23 13:14, Pádraig Brady wrote:

Since https://github.com/coreutils/gnulib/commit/4db8db34
gnulib has been unconditionally replacing lseek() on macos.
Now with SEEK_DATA undefined that replaced lseek()
will run the code intended for BeOS.
So either the lseek.m4 or lseek.c needs adjusting accordingly.


Good catch, thanks. I updated the Gnulib patch accordingly; see attached.


Cool.


On 2/20/23 02:21, George Valkov wrote:
  > What is the correct way to apply your patch?

Sounds like patching is a bit of a hassle, so to make things easier I
installed the attached patch into Gnulib, and propagated this into
Coreutils.


The latest coreutils that should include all fixes for this issue is at
https://www.pixelbeat.org/cu/coreutils-9.1.160-5c8c2.tar.gz
That should be buildable with the standard ./configure && make combo

thanks!,
Pádraig






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-23 Thread Paul Eggert

On 2/20/23 13:14, Pádraig Brady wrote:

Since https://github.com/coreutils/gnulib/commit/4db8db34
gnulib has been unconditionally replacing lseek() on macos.
Now with SEEK_DATA undefined that replaced lseek()
will run the code intended for BeOS.
So either the lseek.m4 or lseek.c needs adjusting accordingly.


Good catch, thanks. I updated the Gnulib patch accordingly; see attached.

On 2/20/23 02:21, George Valkov wrote:
> What is the correct way to apply your patch?

Sounds like patching is a bit of a hassle, so to make things easier I 
installed the attached patch into Gnulib, and propagated this into 
Coreutils.


You should be able to get the latest version of Coreutils from 
savannah.gnu.org, and then run './bootstrap; ./configure; make' if you 
have suitable development tools like Autoconf. You can run ./bootstrap 
on a GNU/Linux platform and then copy the directory to macOS and run 
'./configure; make' on macOS. Please give it a try.


> If you know what conditions are required to reproduce the issue on 
FreeBSD,


I don't. I'm relying on FreeBSD bug report 256205. I cited that in the 
attached patch.


> Is it ok if I continue testing on FreeBSD 13.1

Sure, that's fine. Possibly the bug is fixed in FreeBSD 13.1; if so, 
perhaps we can improve performance on 13.1 by changing "__FreeBSD__ < 
14" to something else.


On 2/20/23 13:25, George Valkov wrote:
> there is no need for that hack in lseek.c.

Yes, we don't need any new hack in lseek.c. And strictly speaking, even 
the existing macOS-specific hack in lseek.c (look for __APPLE__) is no 
longer needed, since (with the changes I just installed) lseek.c is no 
longer compiled for macOS.


However, assuming Apple fixes the macOS bug in (say) macOS 14, so that 
we change the "" to "14" in Gnulib's lib/unistd.in.h and 
m4/lseek.m4, we'll likely need that hack back because it works around an 
incompatibility between GNU-or-FreeBSD-or-Solaris SEEK_DATA and macOS 
SEEK_DATA; see 
. So I 
thought it nicer to leave it in for now; it has zero runtime cost on all 
platforms.



From 7352d9fec59398c76c3bb8a2ef86ba58818f0342 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 19 Feb 2023 00:05:24 -0600
Subject: [PATCH] lseek: avoid SEEK_HOLE bugs in FreeBSD, macOS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This attempts to fix , a bug in GNU cp
caused by a serious data corruption bug in FreeBSD and macOS.
* doc/posix-functions/lseek.texi: Mention the bug.
* lib/unistd.in.h (SEEK_DATA, SEEK_HOLE): Undef in macOS < 13 and
FreeBSD < 14.  FreeBSD fixed the bug sometime during FreeBSD 13
, so the "FreeBSD < 14" is
conservative.  It’s unknown when Apple will fix macOS so use
macOS "" as a placeholder.
* m4/lseek.m4 (gl_FUNC_LSEEK): Replace lseek if on one of the
above platforms.
---
 ChangeLog  | 14 ++
 doc/posix-functions/lseek.texi |  5 +
 lib/unistd.in.h| 18 ++
 m4/lseek.m4| 32 ++--
 4 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index c1ca610548..5e436f5b3a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,17 @@
+2023-02-23  Paul Eggert  
+
+	lseek: avoid SEEK_HOLE bugs in FreeBSD, macOS
+	This attempts to fix , a bug in GNU cp
+	caused by a serious data corruption bug in FreeBSD and macOS.
+	* doc/posix-functions/lseek.texi: Mention the bug.
+	* lib/unistd.in.h (SEEK_DATA, SEEK_HOLE): Undef in macOS < 13 and
+	FreeBSD < 14.  FreeBSD fixed the bug sometime during FreeBSD 13
+	, so the "FreeBSD < 14" is
+	conservative.  It’s unknown when Apple will fix macOS so use
+	macOS "" as a placeholder.
+	* m4/lseek.m4 (gl_FUNC_LSEEK): Replace lseek if on one of the
+	above platforms.
+
 2023-02-18  Bruno Haible  
 
 	configmake: Add support for $build_os != $host_os.
diff --git a/doc/posix-functions/lseek.texi b/doc/posix-functions/lseek.texi
index 2f8e2b5877..3470524b12 100644
--- a/doc/posix-functions/lseek.texi
+++ b/doc/posix-functions/lseek.texi
@@ -37,4 +37,9 @@ IRIX 6.5.
 @item
 Some systems do not support @code{SEEK_DATA} and @code{SEEK_HOLE}:
 AIX, HP-UX, Microsoft Windows, NetBSD, OpenBSD.
+@item
+Some systems have a buggy @code{SEEK_DATA} and @code{SEEK_HOLE},
+and Gnulib works around the problem via @code{#undef SEEK_DATA}
+and @code{#undef SEEK_HOLE}:
+FreeBSD 13, macOS 12.
 @end itemize
diff --git a/lib/unistd.in.h b/lib/unistd.in.h
index bfc501e5a7..8ba9867894 100644
--- a/lib/unistd.in.h
+++ b/lib/unistd.in.h
@@ -40,6 +40,24 @@
 # undef _GL_INCLUDING_UNISTD_H
 #endif
 
+/* Avoid lseek bugs in FreeBSD, macOS .
+   This bug is fixed after FreeBSD 13; see .
+   Use macOS "" to stand for a 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-22 Thread George Valkov
I received an update from Apple
> We reproduced the issue and are investigating.
> Our engineers are investigating the root cause of the issue you reported. If 
> we need more information from you, we’ll add a comment and send you an email.


Georgi Valkov
httpstorm.com
nano RTOS





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-20 Thread George Valkov




> On 2023-02-20, at 11:14 PM, Pádraig Brady  wrote:
> 
> On 20/02/2023 19:35, George Valkov wrote:
>>> On 2023-02-20, at 7:49 PM, Pádraig Brady  wrote:
>>> 
>>> On 20/02/2023 15:02, George Valkov wrote:
 Hi Paul, the following tests fail after your patch:
 FAIL: tests/split/filter.sh
 FAIL: tests/split/b-chunk.sh
 FAIL: tests/split/l-chunk.sh
>>> 
>>> These look unrelated and may be due to some other change in your 
>>> environment.
>>> Specifically lseek() is failing on /dev/null and returning:
>>> split: /dev/null: cannot determine file size
>> I deleted the lines that were introduced by the patch in unistd.in.h, then 
>> make clean; make -j 16 and ran all tests: back to 5 failed. Then I restored 
>> the deleted lines, rebuild and I got 9 failed tests again.
> 
> Oh right sorry.
> I think I see what's happening.
> Since https://github.com/coreutils/gnulib/commit/4db8db34
> gnulib has been unconditionally replacing lseek() on macos.
> Now with SEEK_DATA undefined that replaced lseek()
> will run the code intended for BeOS.
> So either the lseek.m4 or lseek.c needs adjusting accordingly.

As I reported in one of my previous comments, there is no need for that hack in 
lseek.c. Either use the original lseek from macOS and make sure it doesn't 
return cached data, or disable sparse support until it gets fixed by Apple. The 
more people report it to them the higher the chance for them to take any action.
Let me know when you fix it.

Good luck!


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-20 Thread Pádraig Brady

On 20/02/2023 19:35, George Valkov wrote:



On 2023-02-20, at 7:49 PM, Pádraig Brady  wrote:

On 20/02/2023 15:02, George Valkov wrote:

Hi Paul, the following tests fail after your patch:
FAIL: tests/split/filter.sh
FAIL: tests/split/b-chunk.sh
FAIL: tests/split/l-chunk.sh


These look unrelated and may be due to some other change in your environment.
Specifically lseek() is failing on /dev/null and returning:
split: /dev/null: cannot determine file size


I deleted the lines that were introduced by the patch in unistd.in.h, then make 
clean; make -j 16 and ran all tests: back to 5 failed. Then I restored the 
deleted lines, rebuild and I got 9 failed tests again.


Oh right sorry.
I think I see what's happening.
Since https://github.com/coreutils/gnulib/commit/4db8db34
gnulib has been unconditionally replacing lseek() on macos.
Now with SEEK_DATA undefined that replaced lseek()
will run the code intended for BeOS.
So either the lseek.m4 or lseek.c needs adjusting accordingly.

cheers,
Pádraig






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-20 Thread George Valkov


> On 2023-02-20, at 7:49 PM, Pádraig Brady  wrote:
> 
> On 20/02/2023 15:02, George Valkov wrote:
>> Hi Paul, the following tests fail after your patch:
>> FAIL: tests/split/filter.sh
>> FAIL: tests/split/b-chunk.sh
>> FAIL: tests/split/l-chunk.sh
> 
> These look unrelated and may be due to some other change in your environment.
> Specifically lseek() is failing on /dev/null and returning:
> split: /dev/null: cannot determine file size

I deleted the lines that were introduced by the patch in unistd.in.h, then make 
clean; make -j 16 and ran all tests: back to 5 failed. Then I restored the 
deleted lines, rebuild and I got 9 failed tests again.


>> FAIL: tests/cp/sparse-perf.sh
> 
> As expected.
> I've a fix for this locally
> which leverages another patch to determine
> dynamically if we support SEEK_HOLE.

All right! Cheers mate!


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-20 Thread Pádraig Brady

On 20/02/2023 15:02, George Valkov wrote:

Hi Paul, the following tests fail after your patch:
FAIL: tests/split/filter.sh
FAIL: tests/split/b-chunk.sh
FAIL: tests/split/l-chunk.sh


These look unrelated and may be due to some other change in your environment.
Specifically lseek() is failing on /dev/null and returning:
split: /dev/null: cannot determine file size


FAIL: tests/cp/sparse-perf.sh


As expected.
I've a fix for this locally
which leverages another patch to determine
dynamically if we support SEEK_HOLE.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-20 Thread George Valkov
Hi Paul, the following tests fail after your patch:
FAIL: tests/split/filter.sh
FAIL: tests/split/b-chunk.sh
FAIL: tests/split/l-chunk.sh
FAIL: tests/cp/sparse-perf.sh

Before patch
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-05-95f4ee0577dd836de523f46999777fbbbe9d2772-ori.txt

After patch
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-05-95f4ee0577dd836de523f46999777fbbbe9d2772-patch.txt


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-20 Thread George Valkov

> On 2023-02-19, at 8:08 AM, Paul Eggert  wrote:
> 
> George, given what you've written I suppose we should give up the idea of 
> copying sparse files efficiently on macOS (and on FreeBSD 13.0-RELEASE, as it 
> has a similar bug with SEEK_HOLE and SEEK_DATA), in cases where fclonefileat 
> does not work.

I agree, Paul. I just ran some benchmarks: the time it takes to call msync 
unconditionally hurts the performance: 502 us if the memory view is already 
synced seems fine, however 28177-36585 us whenever a sync is required is more 
than the time it takes to performing a full-copy 7042-21528 us.


> Please try the attached Gnulib patch; it uses your idea of disabling 
> SEEK_HOLE and SEEK_DATA on macOS, except on steroids because it disables 
> these two macros everywhere, for all Gnulib-using applications, and it also 
> disables them on FreeBSD < 14. For coreutils you need only this patch's 
> change to lib/unistd.in.h. If this works for you I suppose we can install it 
> into Gnulib and propagate that into coreutils etc.

What is the correct way to apply your patch? I tried
git apply 0001-lseek-avoid-SEEK_HOLE-bugs-in-FreeBSD-macOS.patch 
error: patch failed: ChangeLog:1
error: ChangeLog: patch does not apply
error: doc/posix-functions/lseek.texi: No such file or directory
error: lib/unistd.in.h: wrong type

Then I made the changes to lib/unistd.in.h manually, and ran the tests:
* before: corrupted
* after: fixed

Regarding FreeBSD, I have a FreeBSD 13.1-RELEASE VM running on Hyper-V, so I 
compiled and ran my samples: m.c, d.c: the copy was correct, however d.c 
reports that there are no segments to skip, which means that the file created 
by m.c using mmap is not sparse even though it should contain one blank segment 
and be sparse. Next I installed a fresh copy of FreeBSD 13.0-RELEASE and 
observed the same behaviour. Both installations use ZFS and the tests were ran 
as root.
If you know what conditions are required to reproduce the issue on FreeBSD, let 
me know and I will test it for you. Otherwise I am not sure if we need to make 
any changes to FreeBSD, since I did not observe anything wrong there.
Is it ok if I continue testing on FreeBSD 13.1, because the 13.0 I just 
installed lacks any configuration and is quite uncomfortable to use. It also 
rejects my password over SSH, so I’m forced to use the console. I can test 
pretty much anything that runs on Hyper-V.


> Also, you reported the bug against macOS 12.6. Can you check whether the same 
> bug occurs on macOS 13? If it's still there I suppose the attached patch will 
> need to be updated since it guesses the bug is fixed in macOS 13.

Yes it does. I just ran my samples under macOS Internet recovery, which reports 
as Ventura with kernel compiled on the 30th of January 2023. See the attached 
log:
uname -a
Darwin gMac.lan 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 
2023; root:xnu-8792.81.3~2/RELEASE_X86_64 x86_64

./m 
src 3  dst 4
1000 skip
total bytes copied 1a45640 / 1a46640 1a46640  100535 us

/gfc cc1-mmap 
16d835378ab973a114082a585cc76958bdbccec0  cc1-mmap

./d
src 3  dst 4
c a46640  p 1a46640  h 1a46640  d 100
total bytes copied a46640 / 1a46640  7117 us

./gfc cc1-sparse 
75e6f6cb303cb5d3909c6d7830c417fc6ed658c3  cc1-sparse

./n 
dst 3  MS_ASYNC 1  MS_INVALIDATE 2  MS_SYNC 16
size bytes 1a46640  msync ok  43770 us

./d 
src 3  dst 4
c 1000  p 1000  h 1000  d 0
c 1a44640  p 1a46640  h 1a46640  d 2000
total bytes copied 1a45640 / 1a46640  41322 us

./gfc cc1-sparse 
16d835378ab973a114082a585cc76958bdbccec0  cc1-sparse

./m 
src 3  dst 4
1000 skip
total bytes copied 1a45640 / 1a46640 1a46640  59198 us

./d 
src 3  dst 4
c a46640  p 1a46640  h 1a46640  d 100
total bytes copied a46640 / 1a46640  12144 us

./gfc cc1-sparse 
75e6f6cb303cb5d3909c6d7830c417fc6ed658c3  cc1-sparse

./n 
dst 3  MS_ASYNC 1  MS_INVALIDATE 2  MS_SYNC 16
size bytes 1a46640  msync ok  27462 us

./d 
src 3  dst 4
c 1000  p 1000  h 1000  d 0
c 1a44640  p 1a46640  h 1a46640  d 2000
total bytes copied 1a45640 / 1a46640  32125 us

./gfc cc1-sparse 
16d835378ab973a114082a585cc76958bdbccec0  cc1-sparse





> I'll cc this email to bug-gnulib to give them a heads-up. (Gnulib readers can 
> see  for context.)

Thanks, I’ll check it.


> Really, Apple should fix this serious data corruption bug in APFS and macOS.
> <0001-lseek-avoid-SEEK_HOLE-bugs-in-FreeBSD-macOS.patch>

I agree, I opened a new ticket with Apple on 2023-02-16. Both via e-mail, and 
https://security.apple.com/reports/OE1928608366012

This was at the same time I reported about mmap and msync here. I got this 
replay on the following day:
> Thanks for the additional information. We are reviewing it and will follow up 
> if we have any questions or need additional details.

Feel free to write them at product-secur...@apple.com and include this line
OE0928602070811 - please include this ID in replies to this thread.

Introduce your self as part of 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-19 Thread Pádraig Brady

On 19/02/2023 06:08, Paul Eggert wrote:

George, given what you've written I suppose we should give up the idea
of copying sparse files efficiently on macOS (and on FreeBSD
13.0-RELEASE, as it has a similar bug with SEEK_HOLE and SEEK_DATA), in
cases where fclonefileat does not work.

Please try the attached Gnulib patch; it uses your idea of disabling
SEEK_HOLE and SEEK_DATA on macOS, except on steroids because it disables
these two macros everywhere, for all Gnulib-using applications, and it
also disables them on FreeBSD < 14. For coreutils you need only this
patch's change to lib/unistd.in.h. If this works for you I suppose we
can install it into Gnulib and propagate that into coreutils etc.

Also, you reported the bug against macOS 12.6. Can you check whether the
same bug occurs on macOS 13? If it's still there I suppose the attached
patch will need to be updated since it guesses the bug is fixed in macOS 13.

I'll cc this email to bug-gnulib to give them a heads-up. (Gnulib
readers can see  for context.)

Really, Apple should fix this serious data corruption bug in APFS and macOS.


Yes I concur.
I'll adjust the coreutils tests to avoid
using SEEK_DATA if not available in a more dynamic manner.

thank you,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-18 Thread Paul Eggert
George, given what you've written I suppose we should give up the idea 
of copying sparse files efficiently on macOS (and on FreeBSD 
13.0-RELEASE, as it has a similar bug with SEEK_HOLE and SEEK_DATA), in 
cases where fclonefileat does not work.


Please try the attached Gnulib patch; it uses your idea of disabling 
SEEK_HOLE and SEEK_DATA on macOS, except on steroids because it disables 
these two macros everywhere, for all Gnulib-using applications, and it 
also disables them on FreeBSD < 14. For coreutils you need only this 
patch's change to lib/unistd.in.h. If this works for you I suppose we 
can install it into Gnulib and propagate that into coreutils etc.


Also, you reported the bug against macOS 12.6. Can you check whether the 
same bug occurs on macOS 13? If it's still there I suppose the attached 
patch will need to be updated since it guesses the bug is fixed in macOS 13.


I'll cc this email to bug-gnulib to give them a heads-up. (Gnulib 
readers can see  for context.)


Really, Apple should fix this serious data corruption bug in APFS and macOS.
From 0024b8460dfe5e133f9e0d63ed4f75314dea2f82 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sun, 19 Feb 2023 00:05:24 -0600
Subject: [PATCH] lseek: avoid SEEK_HOLE bugs in FreeBSD, macOS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This attempts to fix , a bug in GNU cp
caused by a serious data corruption bug in FreeBSD and macOS.
* doc/posix-functions/lseek.texi: Mention the bug.
* lib/unistd.in.h (SEEK_DATA, SEEK_HOLE): Undef in macOS < 13 and
FreeBSD < 14.  FreeBSD fixed the bug sometime during FreeBSD 13
, so the "FreeBSD < 14" is
conservative.  It’s unknown when (or indeed whether) Apple fixed
macOS but this patch guesses macOS 13.
---
 ChangeLog  | 12 
 doc/posix-functions/lseek.texi |  5 +
 lib/unistd.in.h| 12 
 3 files changed, 29 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index c1ca610548..c8e9726916 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2023-02-18  Paul Eggert  
+
+	lseek: avoid SEEK_HOLE bugs in FreeBSD, macOS
+	This attempts to fix , a bug in GNU cp
+	caused by a serious data corruption bug in FreeBSD and macOS.
+	* doc/posix-functions/lseek.texi: Mention the bug.
+	* lib/unistd.in.h (SEEK_DATA, SEEK_HOLE): Undef in macOS < 13 and
+	FreeBSD < 14.  FreeBSD fixed the bug sometime during FreeBSD 13
+	, so the "FreeBSD < 14" is
+	conservative.  It’s unknown when (or indeed whether) Apple fixed
+	macOS but this patch guesses macOS 13.
+
 2023-02-18  Bruno Haible  
 
 	configmake: Add support for $build_os != $host_os.
diff --git a/doc/posix-functions/lseek.texi b/doc/posix-functions/lseek.texi
index 2f8e2b5877..3470524b12 100644
--- a/doc/posix-functions/lseek.texi
+++ b/doc/posix-functions/lseek.texi
@@ -37,4 +37,9 @@ IRIX 6.5.
 @item
 Some systems do not support @code{SEEK_DATA} and @code{SEEK_HOLE}:
 AIX, HP-UX, Microsoft Windows, NetBSD, OpenBSD.
+@item
+Some systems have a buggy @code{SEEK_DATA} and @code{SEEK_HOLE},
+and Gnulib works around the problem via @code{#undef SEEK_DATA}
+and @code{#undef SEEK_HOLE}:
+FreeBSD 13, macOS 12.
 @end itemize
diff --git a/lib/unistd.in.h b/lib/unistd.in.h
index bfc501e5a7..47b32eb445 100644
--- a/lib/unistd.in.h
+++ b/lib/unistd.in.h
@@ -40,6 +40,18 @@
 # undef _GL_INCLUDING_UNISTD_H
 #endif
 
+/* Avoid lseek bugs in FreeBSD, macOS .  */
+#if defined __FreeBSD__ && __FreeBSD__ < 14
+# undef SEEK_DATA
+# undef SEEK_HOLE
+#elif (defined __APPLE__ && defined __MACH__ \
+   && (!defined MAC_OS_X_VERSION_MIN_REQUIRED \
+   || MAC_OS_X_VERSION_MIN_REQUIRED < 13))
+# include  /* It also defines the two macros.  */
+# undef SEEK_DATA
+# undef SEEK_HOLE
+#endif
+
 /* Get all possible declarations of gethostname().  */
 #if @GNULIB_GETHOSTNAME@ && @UNISTD_H_HAVE_WINSOCK2_H@ \
   && !defined _GL_INCLUDING_WINSOCK2_H
-- 
2.39.2



bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-16 Thread Paul Eggert

On 2/16/23 13:32, Pádraig Brady wrote:

+  /* any desired permissions that weren't cloned.  */
    extra_permissions = desired_mode & ~cloned_mode;


That comment is a tad redundant, but I did add some commentary along 
those lines and installed the patch into Savannah master.


I haven't had time to look into the SEEK_HOLE stuff but hope to find time.





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-16 Thread Pádraig Brady

On 15/02/2023 10:56, Paul Eggert wrote:

Attached is an updated proposal for the fclonefileat patch.

CVE-2021-30995 confirmed my suspicion that coreutils 9.1 and the current
bleeding-edge coreutils on Savannah both have an exploitable security
bug on macOS. Although I hope this patch fixes the bug, it could use
more pairs of eyes, given that Apple had so many problems fixing its own
security bug involving fclonefileat, and given that the coreutils code
has so many flags and conditions.


The logic looks good to me, and all tests pass here on macOS 13.2

I see the added `if (! (cloned_mode & ~desired_mode))` restriction I presume
for security reasons, so that we only _add_ permission bits to those copied by 
fclonefileat(),
and never create an overly accessible file and then remove permission bits.
That wasn't obvious to me at least so I'd suggest adding comments from the diff 
below.

For the record this restriction will mean we can't reflink in some cases.
For example the following would fail with ENOSUP on macOS:

  touch file;
  chmod g+w file
  umask 0022
  cp --reflink=always file file.copy

Using -p in the above cp command would work though.

thanks,
Pádraig.

diff --git a/src/copy.c b/src/copy.c
index c5f2cc186..7c866e4fb 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1284,6 +1284,7 @@ copy_reg (char const *src_name, char const *dst_name,
: x->set_mode ? x->mode
: ((x->explicit_no_preserve_mode ? MODE_RW_UGO : dst_mode)
   & ~ cached_umask ()));
+  /* If fclonefileat() would not set undesired extra permissions.  */
   if (! (cloned_mode & ~desired_mode))
 {
   int fc_flags
@@ -1318,7 +1319,9 @@ copy_reg (char const *src_name, char const *dst_name,
 }
 }

+  /* any desired permissions that weren't cloned.  */
   extra_permissions = desired_mode & ~cloned_mode;
+
   if (!extra_permissions
   && (!x->preserve_mode || (fc_flags & CLONE_ACL)
   || !fd_has_acl (source_desc)))






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-16 Thread George Valkov
Gentlemen, I found a solution, as well as the reason why lseek(SEEK_DATA) 
returns invalid data.

I found a way to craft a file that gets corrupted when we try to make a sparse 
copy. It’s based on mmap:
1. Open src and dst.
2. Obtain the size of src, and truncate dst to the same size.
3. Use mmap to map dst to RAM.
4. Read src and memcpy all non-blank pages to dst.
5. Use munmap and then close src and dst.
6. If src contained any blank pages, dst is now sparse.
7. Since we did not call msync, the changes are still cached in RAM and 
lseek(SEEK_DATA) is not aware that some of the pages contain valid data.

Workaround:
1. open(dst, O_RDONLY)
2. Obtain the file size = lseek(dst, 0, SEEK_END)
3. Use mmap to map dst to RAM.
4. Call msync.
5. Use munmap and then close dst.
6. The memory view of dst is now synchronised with the underlaying file-system.
7. lseek(SEEK_DATA) returns valid data.

Proof of Concept:
1. Start by extracting cc1
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/cc1.tgz

2. The extracted file is not sparse, we need to craft a sparse copy cc1-mmap 
without synchronising the filesystem with the memory view
gcc m.c && ./a.out
src 3  dst 4
1000 skip
total bytes copied 1a45640 / 1a46640 1a46640

3. Now sparse copy cc1-mmap to cc1-sparse
gcc d.c && ./a.out
src 3  dst 4
c a46640  p 1a46640  h 1a46640  d 100
total bytes copied a46640 / 1a46640

4. cc1-sparse is corrupted
sha1sum cc1*
16d835378ab973a114082a585cc76958bdbccec0  cc1
16d835378ab973a114082a585cc76958bdbccec0  cc1-mmap
75e6f6cb303cb5d3909c6d7830c417fc6ed658c3  cc1-sparse

5. Use msync to update the filesystem
gcc n.c && ./a.out
dst 3  MS_ASYNC 1  MS_INVALIDATE 2  MS_SYNC 16
size bytes 1a46640  msync ok

6. Make sparse copy again cc1-mmap to cc1-sparse
gcc d.c && ./a.out
src 3  dst 4
c 1000  p 1000  h 1000  d 0
c 1a44640  p 1a46640  h 1a46640  d 2000
total bytes copied 1a45640 / 1a46640

7. cc1-sparse is good
sha1sum cc1*
16d835378ab973a114082a585cc76958bdbccec0  cc1
16d835378ab973a114082a585cc76958bdbccec0  cc1-mmap
16d835378ab973a114082a585cc76958bdbccec0  cc1-sparse


PoC samples


d.c
Description: Binary data


m.c
Description: Binary data


n.c
Description: Binary data



Cheers!

Georgi Valkov
httpstorm.com
nano RTOS



bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-15 Thread George Valkov


> On 2023-02-15, at 9:48 PM, Paul Eggert  wrote:
> 
> On 2023-02-15 07:26, George Valkov wrote:
>> I tested your patch: both overwrite existing and clone to new produce a 
>> working copy. Here are the test results:
>> https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-04-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt
> 
> I see some test failures there, involving cp. Do you get the same set of test 
> failures without the patch?

Yes. Note that Pádraig disabled two tests which fail after the patch:
tests/cp/thru-dangling.sh, tests/seek-data-capable (the actual test that failed 
was related to sparse, I think sparse-extents.sh)

https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-01-cf80f988eeb97cc3f8c65ae58e735d36f865277b-ori.txt
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-04-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt


>> In the case when a dangling symlink is involved and depending on the 
>> arguments passed to cp, would it be possible to use CLONE_NOFOLLOW with 
>> fclonefileat or fall-back to a normal copy? Does dangling refer to source or 
>> destination?
> 
> Dangling refers to the destination. The latest proposed patch does use 
> CLONE_NOFOLLOW, which means fclonefileat should not follow symlinks to the 
> destination. (This behavior is not documented, unfortunately, but it's the 
> only behavior that makes sense.)

Ok, thank you for clarifying!
Good luck!


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-15 Thread George Valkov

> On 2023-02-15, at 9:26 PM, Paul Eggert  wrote:
> 
> On 2023-02-15 06:05, George Valkov wrote:
>> gcc d.c && ./a.out
>> src 3  dst 4
>> c 14053376  p 20344832  h 20344832  d 6291456
>> total bytes copied 14053376 / 27551296
> 
> Thanks, this is due to a known incompatibility in macOS lseek that coreutils 
> is supposed to work around. See 
> , which 
> says, "On some platforms, lseek (fd, offset, SEEK_DATA) returns a value 
> greater than offset even when offset addresses data: macOS 12".

Hi Paul! There are two possible solutions here:
1. Do not use lseek(SEEK_DATA) on macOS.
2. Update the cached list of valid sectors which is used by lseek. See §§§ 
below. There could be some API to achieve this. A workaround is to fclonefileat 
to a temporary file and then remove it. Internally fclonefileat updates that 
cache. We might be able to find how it’s done.

manpage
If whence is SEEK_DATA, the offset is set to the start of the next non-hole 
file region greater than or equal to the supplied offset.

I believe lseek operates correctly, however the underlaying cache it uses is 
not current after certain conditions occur.


> I guess that somehow, the way you're building coreutils defeats the 
> workaround. If so, we'll need to change how coreutils is built in your 
> environment, or fix coreutils 'configure' so that the workaround isn't 
> defeated in your environment.

Here are the build commands I use:
git clone https://github.com/coreutils/coreutils.git

cd
 coreutils
git submodule foreach git pull origin master
./bootstrap
./configure
make -j 16

Here is a tarball of coreutils after building:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/coreutils-cf80f988eeb97cc3f8c65ae58e735d36f865277b.tgz


> Although in  Pádraig was dubious about this 
> guess, his reasoning that the bug is likely specific to APFS rather than an 
> API mismatch could be wrong, as I think HFS doesn't support SEEK_DATA at all 
> or reports trivial answers, so coreutils is not likely to run into the 
> problem on HFS even if the bug is an API issue.



> Here are some things we can do to test this guess.
> 
> 1. Please try the attached program e.c in place of your d.c program. e.c is 
> like d.c, except it attempts to use the coreutils workaround. What symptoms 
> do you observe? If e.c works then it's almost surely a problem in how 
> coreutils is built (compiler options or whatnot), not in the coreutils 
> workaround. If e.c does not work it's likely that the Gnulib workaround does 
> not suffice on your macOS platform, in which case we need to improve the 
> workaround by hacking further on e.c and porting the result back to Gnulib. 
> (There are other possibilities.)

It produces the same corrupted copy as the original sample:
gcc e.c && ./a.out 
__APPLE__ 1  __MACH__ 1  SEEK_DATA 4
src 3  dst 4
c 14053376  p 20344832  h 20344832  d 6291456
total bytes copied 14053376 / 27551296

In hex
c d67000  p 1367000  h 1367000  d 60
total bytes copied d67000 / 1a46640

419494dd4527ebb22c7bf2b388316a4beb5c73d2  cc1
7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-sparse

The after I use clone to update the cached list of valid sectors for cc1
./coreutils-2023-02-15/src/cp cc1 cc1-clone

It returns a bunch of errors, but produces a valid copy. There is no point in 
using this workaround on macOS, it does not offer any improvements. I would 
recommend making sure that lseek sees an updated list of valid sectors.
gcc e.c && ./a.out 
__APPLE__ 1  __MACH__ 1  SEEK_DATA 4
src 3  dst 4
c 1000  p 1000  h 1000  d 0
c 1a44640  p 1a46640  h 1a46640  d 2000
lseek failed  ENXIO 6  EBADF 9  EINVAL 22  EOVERFLOW 84  ESPIPE 29
  p 1a46640
  d 1a46640  6 Device not configured
  h   6 Device not configured
  a 1a46640  6 Device not configured
  b 1a46640  6 Device not configured
total bytes copied 1a45640 / 1a46640

419494dd4527ebb22c7bf2b388316a4beb5c73d2  cc1
419494dd4527ebb22c7bf2b388316a4beb5c73d2  cc1-clone
419494dd4527ebb22c7bf2b388316a4beb5c73d2  cc1-sparse


> 2. Please verify that coreutils cp is using the Gnulib workaround. In the src 
> directory, the shell command "nm -o *.o | grep lseek" should output only 
> lines containing "rpl_lseek"; there shouldn't be any lines saying just 
> "lseek". Also, please run the command "objdump -d lib/libcoreutils_a-lseek.o" 
> and verify that the replacement lseek is actually doing something nontrivial 
> (you should get maybe three dozen lines of assembly language; if it's much 
> less then this is the problem).

nm -o *.o | grep lseek
cat.o:  U _rpl_lseek
copy.o:  U _rpl_lseek
dd.o:  U _rpl_lseek
head.o:  U _rpl_lseek
selinux.o: no symbols
shred.o:  U _rpl_lseek
shuf.o:  U _rpl_lseek
split.o:  U _rpl_lseek
tac.o:  U _rpl_lseek
tail.o:  U 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-15 Thread Paul Eggert

On 2023-02-15 07:26, George Valkov wrote:

I tested your patch: both overwrite existing and clone to new produce a working 
copy. Here are the test results:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-04-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt


I see some test failures there, involving cp. Do you get the same set of 
test failures without the patch?



In the case when a dangling symlink is involved and depending on the arguments 
passed to cp, would it be possible to use CLONE_NOFOLLOW with fclonefileat or 
fall-back to a normal copy? Does dangling refer to source or destination?


Dangling refers to the destination. The latest proposed patch does use 
CLONE_NOFOLLOW, which means fclonefileat should not follow symlinks to 
the destination. (This behavior is not documented, unfortunately, but 
it's the only behavior that makes sense.)






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-15 Thread Paul Eggert

On 2023-02-15 06:05, George Valkov wrote:

gcc d.c && ./a.out
src 3  dst 4
c 14053376  p 20344832  h 20344832  d 6291456
total bytes copied 14053376 / 27551296


Thanks, this is due to a known incompatibility in macOS lseek that 
coreutils is supposed to work around. See 
, which 
says, "On some platforms, lseek (fd, offset, SEEK_DATA) returns a value 
greater than offset even when offset addresses data: macOS 12".


I guess that somehow, the way you're building coreutils defeats the 
workaround. If so, we'll need to change how coreutils is built in your 
environment, or fix coreutils 'configure' so that the workaround isn't 
defeated in your environment. Although in 
 Pádraig was dubious about this guess, 
his reasoning that the bug is likely specific to APFS rather than an API 
mismatch could be wrong, as I think HFS doesn't support SEEK_DATA at all 
or reports trivial answers, so coreutils is not likely to run into the 
problem on HFS even if the bug is an API issue.


Here are some things we can do to test this guess.

1. Please try the attached program e.c in place of your d.c program. e.c 
is like d.c, except it attempts to use the coreutils workaround. What 
symptoms do you observe? If e.c works then it's almost surely a problem 
in how coreutils is built (compiler options or whatnot), not in the 
coreutils workaround. If e.c does not work it's likely that the Gnulib 
workaround does not suffice on your macOS platform, in which case we 
need to improve the workaround by hacking further on e.c and porting the 
result back to Gnulib. (There are other possibilities.)


2. Please verify that coreutils cp is using the Gnulib workaround. In 
the src directory, the shell command "nm -o *.o | grep lseek" should 
output only lines containing "rpl_lseek"; there shouldn't be any lines 
saying just "lseek". Also, please run the command "objdump -d 
lib/libcoreutils_a-lseek.o" and verify that the replacement lseek is 
actually doing something nontrivial (you should get maybe three dozen 
lines of assembly language; if it's much less then this is the problem).


3. Please confirm that _DARWIN_C_SOURCE is defined to 1 in lib/config.h.

4. What is the output of the following commands, in the coreutils build 
directory?


rm lib/libcoreutils_a-lseek.o
make V=1 lib/libcoreutils_a-lseek.o
gcc -E -Ilib lib/lseek.c#define _DARWIN_C_SOURCE 1
#define _GNU_SOURCE 1
#include 
#include 
#include 
#include 
#include 
#include 

off_t
rpl_lseek (int fd, off_t offset, int whence)
{
#if defined __APPLE__ && defined __MACH__ && defined SEEK_DATA
  if (whence == SEEK_DATA)
{
  /* If OFFSET points to data, macOS lseek+SEEK_DATA returns the
 start S of the first data region that begins *after* OFFSET,
 where the region from OFFSET to S consists of possibly-empty
 data followed by a possibly-empty hole.  To work around this
 portability glitch, check whether OFFSET is within data by
 using lseek+SEEK_HOLE, and if so return to OFFSET by using
 lseek+SEEK_SET.  Also, contrary to the macOS documentation,
 lseek+SEEK_HOLE can fail with ENXIO if there are no holes on
 or after OFFSET.  What a mess!  */
  off_t next_hole = lseek (fd, offset, SEEK_HOLE);
  if (next_hole < 0)
return errno == ENXIO ? offset : next_hole;
  if (next_hole != offset)
whence = SEEK_SET;
}
#endif
  return lseek (fd, offset, whence);
}
#define lseek rpl_lseek

#define MAX_SIZE (1024 * 1024 * 100)
char msg[MAX_SIZE];

int main(int argc, char ** argv)
{
	// sparse copy test
	int src = open("cc1", O_RDONLY);
	int dst = open("cc1-sparse", O_CREAT | O_RDWR, 0700);
	printf("src %i  dst %i\n", src, dst);
	//- printf("SET %i  CUR %i  END %i  HOLE %i  DATA %i\n", SEEK_SET, SEEK_CUR, SEEK_END, SEEK_HOLE, SEEK_DATA);

	long long a = 0;
	long long b = 0;
	long long d = 0;
	long long e = 0;
	long long h = 0;
	long long p = 0;
	long long s = -1;
	long long t = 0;
	ssize_t c = 0;
	ssize_t i = 0;
	ssize_t r = 0;
	ssize_t w = 0;
	int ea = 0;
	int eb = 0;
	int ed = 0;
	int ee = 0;
	int eh = 0;
	int ep = 0;

	do
	{
		if (++i >= 10)
		{
			printf("LOOP %zi\n", i);
			break;
		}

		errno = 0;
		d = lseek(src, d, SEEK_DATA);
		ed = errno;
		h = lseek(src, d, SEEK_HOLE);
		eh = errno;
		a = lseek(src, d, SEEK_SET);
		ea = errno;
		b = lseek(dst, d, SEEK_SET);
		eb = errno;
		c = h - d;

		if ((a == -1) || (b == -1) || (d == -1) || (h == -1))
		{
			int handled = 0;

			if ((d == -1) && (ed == ENXIO))
			{
p = lseek(src, 0, SEEK_END);
ep = errno;

if (p == -1)
{
	printf(
		"lseek(SEEK_END) failed  p %lli %2i %s\n",
		p, ep, strerror(ep)
	);
}
else
{
	e = ftruncate(dst, p);
	ee = errno;

	if (e == -1)
	{
		printf(
			"ftruncate(%lli) failed  %lli %2i %s\n",
			p, e, ee, strerror(ee)
		);
	}
	else
	{
		

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-15 Thread George Valkov


> On 2023-02-15, at 12:56 PM, Paul Eggert  wrote:
> 
> The disabling SEEK_HOLE on Apple is OK if a release is imminent. Otherwise 
> I'd like to get to the bottom of it first. This can be done by having George 
> use dtrace or (if that's not possible) adding debugging printfs.

Hi Paul! Sure, send me some code, describe the tests you need and I will come 
back with the results. Since we are tracing regular programs and not system 
components, dtruss can be used. Here is a trace of the sparse copy sample I 
wrote earlier, when it tries to copy cc1 to cc1-sparse, resulting in a 
corrupted copy:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/dtruss-sparse-copy.txt


> Attached is an updated proposal for the fclonefileat patch.
> 
> CVE-2021-30995 confirmed my suspicion that coreutils 9.1 and the current 
> bleeding-edge coreutils on Savannah both have an exploitable security bug on 
> macOS. Although I hope this patch fixes the bug, it could use more pairs of 
> eyes, given that Apple had so many problems fixing its own security bug 
> involving fclonefileat, and given that the coreutils code has so many flags 
> and conditions.
> <0001-cp-fclonefileat-security-fix-CLONE_ACL-fixups.patch>

I tested your patch: both overwrite existing and clone to new produce a working 
copy. Here are the test results:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-04-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt

In the case when a dangling symlink is involved and depending on the arguments 
passed to cp, would it be possible to use CLONE_NOFOLLOW with fclonefileat or 
fall-back to a normal copy? Does dangling refer to source or destination?


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-15 Thread George Valkov

> On 2023-02-14, at 2:12 PM, Pádraig Brady  wrote:
> 
> On 14/02/2023 06:12, Paul Eggert wrote:
>> On 2023-02-13 09:16, George Valkov wrote:
>>> 1. We apply my original patch to disable sparse-copy on macOS. Otherwise 
>>> since fclonefileat is not used whenever we overwrite a file, corruption 
>>> will still occur.
>> I'm not entirely sold on this patch, because I still don't understand
>> what's happening. The original bug report in
>>  basically says only "it
>> doesn't work", and I'd like to know more.
>> Part of the reason I'm hesitating is that there are multiple ways of
>> interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just
>> that Apple has come up with yet another way to interpret it, which we
>> need to know about.

I created a sparse copy sample that traces all calls. It relays only on the 
manpage for lseek. I should note that I am not familiar with the implementation 
used by coreutils, which mages it a good ground for independent research on the 
topic. The code is attached as d.c Here are my observations:
1. Given a non-sparse file A, it produces an exact copy B.
gcc d.c && ./a.out
src 3  dst 4
c 553  p 553  h 553  d 0
total bytes copied 553 / 553

a2d8f888d5c88f6eef83a3bf4ef2434a85a64792  A
a2d8f888d5c88f6eef83a3bf4ef2434a85a64792  B

2. Given cc1, it produces a corrupted copy cc1-sparse, that matches the 
checksum of the copy cc-ori created by coreutils/src/cp. So unless you can find 
a flaw in my implementation, we can safely assume that SEEK_DATA skips valid 
data blocks and hence it should not be used on macOS because it is broken:
./coreutils/src/cp cc1 cc1-ori
gcc d.c && ./a.out
src 3  dst 4
c 14053376  p 20344832  h 20344832  d 6291456
total bytes copied 14053376 / 27551296
e8eb21ec118ff3a8fce3ad85d5f9a72d47a257c6  cc1
7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-ori
7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-sparse
-rwxr-xr-x 1 g staff  27551296 Feb 15 09:59 cc1*
-rwxr-xr-x 1 g staff  27551296 Feb 15 14:29 cc1-ori*
-rwx-- 1 g staff  27551296 Feb 15 14:29 cc1-sparse*

Further research: it would be interesting to find how my sparse copy sample, as 
well as coreutils/src/cp handle custom crafted sparse files. My first idea 
would be a file with the same size as cc1 (27 551 296 bytes) with a couple of 
data blocks in the middle. My second idea is to read the entire cc1, then call 
ftruncate to set the size of the copy and write only those bytes that are not 0.


> I agree it would be good to know more.
> However given it works on HFS but fails on APFS suggests a file system 
> specific issue,
> rather than an API mismatch issue (over what we're already catering for on 
> macOS).
> I suspect it's a similar issue to the one that openzfs had:
> https://github.com/openzfs/zfs/issues/11900

According to Wikipedia:
> Apple's HFS+ does not provide support for sparse files, but in OS X, the 
> virtual file system layer supports storing them in any supported file system, 
> including HFS+.
https://en.wikipedia.org/wiki/Sparse_file

Hence the reason we don’t observe any issues on HFS+ is most likely that the 
files we try to copy from it are not sparse in first place.


> Given how closed / uncommunicative Apple are in general
> and specifically for this already reported bug,
> I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.
> 
> Also we've mitigated the impact somewhat by enabling fclonefileat() in more 
> cases.

I agree. We can try to send them one last message with a link to this group, 
and be done with it if they don’t relay. I just found a feedback ticket from 
2022-09-16, where they replayed asking me to test macOS 13 Beta 8. At that time 
Finder was still broken, and now it isn’t. So they might have at least 
partially addressed the issue.
https://feedbackassistant.apple.com/feedback/11522527


>> Another reason to hesitate is that GNU coreutils is not the only core
>> program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic
>> Apple problem we need to know which Apple releases have it, so that we
>> can program around it at the Gnulib level instead of mucking about with
>> each individual program.

We need to start somewhere. Fixing cp and install will certainly be helpful to 
users, until you address the issue globally.
It might be possible to install VMs with various versions of macOS, though I 
suspect it will take time.

Pádraig, did your try cp on
build_dir/toolchain-arm_cortex-a9+vfpv3-d16_gcc-12.2.0_musl_eabi/gcc-12.2.0-initial/gcc/cc1
after building gcc in OpenWRT build root?
make toolchain/gcc/initial/{clean,compile} -j 16

Here is a step by step article how to setup the build environment:
https://httpstorm.com/share/.openwrt/help/OpenWRT%20build%20on%20macOS.htm


> I've attached the 3 latest patches I'm considering in this area.
> I presume you're OK with your amended fclonefileat() improvement one?

Thank you! Yes, I’m all for the 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-15 Thread Paul Eggert

Attached is an updated proposal for the fclonefileat patch.

CVE-2021-30995 confirmed my suspicion that coreutils 9.1 and the current 
bleeding-edge coreutils on Savannah both have an exploitable security 
bug on macOS. Although I hope this patch fixes the bug, it could use 
more pairs of eyes, given that Apple had so many problems fixing its own 
security bug involving fclonefileat, and given that the coreutils code 
has so many flags and conditions.
From 9bb98016eb3a369c3ae195b71fcc1010e5aa6e53 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Fri, 10 Feb 2023 13:34:54 -0800
Subject: [PATCH] cp: fclonefileat security fix + CLONE_ACL + fixups

* src/copy.c: Some changes if HAVE_FCLONEFILEAT && !USE_XATTR.
(fd_has_acl): New function.
(CLONE_ACL): Default to 0.
(copy_reg): Use CLONE_NOFOLLOW to avoid races like CVE-2021-30995
.
Use CLONE_ACL if available and working, falling back to cloning
without it if it fails due to EINVAL.
If the only problem with fclonefileat is that it would create the
file with the wrong timestamp, or with too few permissions,
do that but fix the timestamp and permissions afterwards,
rather than falling back on a traditional copy.
---
 NEWS   |   7 
 src/copy.c | 109 ++---
 2 files changed, 103 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index 3d0ede150..d66ea0f6f 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,10 @@ GNU coreutils NEWS-*- outline -*-
   total line in this case.
   [bug introduced with the --total option in coreutils-8.26]
 
+  'cp -p' no longer has a security hole when cloning into a dangling
+  symbolic link on macOS 10.12 and later.
+  [bug introduced in coreutils-9.1]
+
   'cp -rx / /mnt' no longer complains "cannot create directory /mnt/".
   [bug introduced in coreutils-9.1]
 
@@ -124,6 +128,9 @@ GNU coreutils NEWS-*- outline -*-
   and possibly employing copy offloading or reflinking,
   for the non sparse portion of such sparse files.
 
+  On macOS, cp creates a copy-on-write clone in more cases.
+  Previously cp would only do this when preserving mode and timestamps.
+
   date --debug now diagnoses if multiple --date or --set options are
   specified, as only the last specified is significant in that case.
 
diff --git a/src/copy.c b/src/copy.c
index dfbb557de..c5f2cc186 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1076,6 +1076,25 @@ infer_scantype (int fd, struct stat const *sb,
   return ZERO_SCANTYPE;
 }
 
+#if HAVE_FCLONEFILEAT && !USE_XATTR
+# include 
+/* Return true if FD has a nontrivial ACL.  */
+static bool
+fd_has_acl (int fd)
+{
+  /* Every platform with fclonefileat (macOS 10.12 or later) also has
+ acl_get_fd_np.  */
+  bool has_acl = false;
+  acl_t acl = acl_get_fd_np (fd, ACL_TYPE_EXTENDED);
+  if (acl)
+{
+  acl_entry_t ace;
+  has_acl = 0 <= acl_get_entry (acl, ACL_FIRST_ENTRY, );
+  acl_free (acl);
+}
+  return has_acl;
+}
+#endif
 
 /* Handle failure from FICLONE or fclonefileat.
Return FALSE if it's a terminal failure for this file.  */
@@ -1244,24 +1263,82 @@ copy_reg (char const *src_name, char const *dst_name,
   if (*new_dst)
 {
 #if HAVE_FCLONEFILEAT && !USE_XATTR
-/* CLONE_NOOWNERCOPY only available on macos >= 10.13.  */
+# ifndef CLONE_ACL
+#  define CLONE_ACL 0 /* Added in macOS 12.6.  */
+# endif
 # ifndef CLONE_NOOWNERCOPY
-#  define CLONE_NOOWNERCOPY 0
+#  define CLONE_NOOWNERCOPY 0 /* Added in macOS 10.13.  */
 # endif
-  int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
+  /* Try fclonefileat if copying data in reflink mode.
+ Use CLONE_NOFOLLOW to avoid security issues that could occur
+ if writing through dangling symlinks.  Although the circa
+ 2023 macOS documentation doesn't say so, CLONE_NOFOLLOW
+ affects the destination file too.  */
   if (data_copy_required && x->reflink_mode
-  && x->preserve_mode && x->preserve_timestamps
-  && (x->preserve_ownership || CLONE_NOOWNERCOPY))
+  && (CLONE_NOOWNERCOPY || x->preserve_ownership))
 {
-  if (fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags) == 0)
-goto close_src_desc;
-  else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,
-dst_name,
--1, false /* We didn't create dst  */,
-x->reflink_mode))
+  mode_t cloned_mode_bits = S_ISVTX | S_IRWXUGO;
+  mode_t cloned_mode = src_mode & cloned_mode_bits;
+  mode_t desired_mode
+= (x->preserve_mode ? src_mode & CHMOD_MODE_BITS
+   : x->set_mode ? x->mode
+   : ((x->explicit_no_preserve_mode ? MODE_RW_UGO : dst_mode)
+  & ~ 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-15 Thread Pádraig Brady

On 14/02/2023 19:02, Paul Eggert wrote:

On 2023-02-14 04:12, Pádraig Brady wrote:


I suspect it's a similar issue to the one that openzfs had:
https://github.com/openzfs/zfs/issues/11900


Yes, quite likely.



Given how closed / uncommunicative Apple are in general
and specifically for this already reported bug,
I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.


If there's an imminent release that's the only way to go. However, if we
do that we should also disable fclonefileat since the latest edition of
the patch isn't right either. It messes up the modes in some cases and
there's a good reason we don't otherwise allow writing through dangling
symlinks (see the length email discussions when that was put in) so that
would have to be addressed too.

I'll see if I can come up with an improved version of the fclonefileat
patch that would handle these issues.



I've attached the 3 latest patches I'm considering in this area.
I presume you're OK with your amended fclonefileat() improvement one?


Not yet, because it doesn't work correctly in some cases (e.g.,
x->explicit_no_preserve_mode) and it mishandles the dangling symlink cases.

The dangling symlink patch might be OK; depends on how fclonefileat
patch ends up. It's OK to copy through a dangling symlink with
fclonefileat only if we don't need to fix up permissions or timestamps
afterwards (e.g., cp -p).

The disabling SEEK_HOLE on Apple is OK if a release is imminent.
Otherwise I'd like to get to the bottom of it first. This can be done by
having George use dtrace or (if that's not possible) adding debugging
printfs.


Cool.
The release is 1 - 2 weeks away anyway.

thanks for looking at this Paul.

Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-14 Thread Paul Eggert

On 2023-02-14 04:12, Pádraig Brady wrote:


I suspect it's a similar issue to the one that openzfs had:
https://github.com/openzfs/zfs/issues/11900


Yes, quite likely.



Given how closed / uncommunicative Apple are in general
and specifically for this already reported bug,
I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.


If there's an imminent release that's the only way to go. However, if we 
do that we should also disable fclonefileat since the latest edition of 
the patch isn't right either. It messes up the modes in some cases and 
there's a good reason we don't otherwise allow writing through dangling 
symlinks (see the length email discussions when that was put in) so that 
would have to be addressed too.


I'll see if I can come up with an improved version of the fclonefileat 
patch that would handle these issues.




I've attached the 3 latest patches I'm considering in this area.
I presume you're OK with your amended fclonefileat() improvement one?


Not yet, because it doesn't work correctly in some cases (e.g., 
x->explicit_no_preserve_mode) and it mishandles the dangling symlink cases.


The dangling symlink patch might be OK; depends on how fclonefileat 
patch ends up. It's OK to copy through a dangling symlink with 
fclonefileat only if we don't need to fix up permissions or timestamps 
afterwards (e.g., cp -p).


The disabling SEEK_HOLE on Apple is OK if a release is imminent. 
Otherwise I'd like to get to the bottom of it first. This can be done by 
having George use dtrace or (if that's not possible) adding debugging 
printfs.






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-14 Thread Paul Eggert

On 2023-02-13 09:16, George Valkov wrote:

1. We apply my original patch to disable sparse-copy on macOS. Otherwise since 
fclonefileat is not used whenever we overwrite a file, corruption will still 
occur.


I'm not entirely sold on this patch, because I still don't understand 
what's happening. The original bug report in 
 basically says only "it 
doesn't work", and I'd like to know more.


Part of the reason I'm hesitating is that there are multiple ways of 
interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just 
that Apple has come up with yet another way to interpret it, which we 
need to know about.


Another reason to hesitate is that GNU coreutils is not the only core 
program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic 
Apple problem we need to know which Apple releases have it, so that we 
can program around it at the Gnulib level instead of mucking about with 
each individual program.


Yet another reason is that perhaps the bug was introduced by this pair 
of changes introduced to Gnulib in November 2021 to try to address 
:


https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4db8db34112b86ddf8bac48f16b5acff732b5fa9
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=1a268176fbb184e393c98575e61fe692264c7d91

These Gnulib patches are attached. If you revert these, does the problem 
go away?


And yet another reason is that perhaps the bug was introduced by this 
pair of Coreutils changes made in October 2021 to try to address 
:


https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a9e31457bf6a63072b54a9324cdf59a09441a21f
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=1753012b8d8cdd0817155f4756f5bf9bfe347816

These Coreutils patches are also attached. Does reverting them help?

All things considered, I'd like a copy of the dtruss output for 
unmodified coreutils 9.1 cp on the failing case. This is really helpful 
for debugging this sort of thing. It's what helped us debug Bug#51857.From 4db8db34112b86ddf8bac48f16b5acff732b5fa9 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 15 Nov 2021 15:08:25 -0800
Subject: [PATCH 1/2] lseek: port around macOS SEEK_DATA glitch

Problem reported by Sudhip Nashi (Bug#51857).
* doc/posix-functions/lseek.texi (lseek): Mention macOS SEEK_DATA
issue.
* lib/lseek.c (rpl_lseek): Work around macOS portability glitch.
* m4/lseek.m4 (gl_FUNC_LSEEK): Replace lseek on Darwin.
* modules/lseek (Depends-on): Depend on msvc-nothrow
and fstat only if needed.
---
 ChangeLog  | 11 +++
 doc/posix-functions/lseek.texi |  4 
 lib/lseek.c| 16 
 m4/lseek.m4| 10 --
 modules/lseek  |  4 ++--
 5 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index f47071a72d..71a2265706 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2021-11-15  Paul Eggert  
+
+	lseek: port around macOS SEEK_DATA glitch
+	Problem reported by Sudhip Nashi (Bug#51857).
+	* doc/posix-functions/lseek.texi (lseek): Mention macOS SEEK_DATA
+	issue.
+	* lib/lseek.c (rpl_lseek): Work around macOS portability glitch.
+	* m4/lseek.m4 (gl_FUNC_LSEEK): Replace lseek on Darwin.
+	* modules/lseek (Depends-on): Depend on msvc-nothrow
+	and fstat only if needed.
+
 2021-11-11  Fabrice Fontaine(tiny change)
 
 	sigsegv: fix builds on microblazeel, or1k
diff --git a/doc/posix-functions/lseek.texi b/doc/posix-functions/lseek.texi
index 4a9d55dcf7..2f8e2b5877 100644
--- a/doc/posix-functions/lseek.texi
+++ b/doc/posix-functions/lseek.texi
@@ -9,6 +9,10 @@ Gnulib module: lseek
 Portability problems fixed by Gnulib:
 @itemize
 @item
+On some platforms, @code{lseek (fd, offset, SEEK_DATA)} returns a value
+greater than @code{offset} even when @code{offset} addresses data:
+macOS 12
+@item
 This function is declared in a different header file (namely, @code{})
 on some platforms:
 MSVC 14.
diff --git a/lib/lseek.c b/lib/lseek.c
index 0042546a8e..7dcd6c9da7 100644
--- a/lib/lseek.c
+++ b/lib/lseek.c
@@ -52,6 +52,22 @@ rpl_lseek (int fd, off_t offset, int whence)
   errno = ESPIPE;
   return -1;
 }
+#elif defined __APPLE__ && defined __MACH__ && defined SEEK_DATA
+  if (whence == SEEK_DATA)
+{
+  /* If OFFSET points to data, macOS lseek+SEEK_DATA returns the
+ start S of the first data region that begins *after* OFFSET,
+ where the region from OFFSET to S consists of possibly-empty
+ data followed by a possibly-empty hole.  To work around this
+ portability glitch, check whether OFFSET is within data by
+ using lseek+SEEK_HOLE, and if so return to OFFSET by using
+ lseek+SEEK_SET.  */
+  off_t next_hole = lseek (fd, offset, SEEK_HOLE);
+  if (next_hole < 0)
+return next_hole;
+  if (next_hole != offset)
+whence = 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-14 Thread George Valkov


> On 2023-02-14, at 8:12 AM, Paul Eggert  wrote:
> 
> On 2023-02-13 09:16, George Valkov wrote:
>> 1. We apply my original patch to disable sparse-copy on macOS. Otherwise 
>> since fclonefileat is not used whenever we overwrite a file, corruption will 
>> still occur.
> 
> I'm not entirely sold on this patch, because I still don't understand what's 
> happening. The original bug report in 
>  basically says only "it 
> doesn't work", and I'd like to know more.
> 
> Part of the reason I'm hesitating is that there are multiple ways of 
> interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just that 
> Apple has come up with yet another way to interpret it, which we need to know 
> about.
> 
> Another reason to hesitate is that GNU coreutils is not the only core program 
> that uses SEEK_HOLE and SEEK_DATA. If this really is a generic Apple problem 
> we need to know which Apple releases have it, so that we can program around 
> it at the Gnulib level instead of mucking about with each individual program.
> 
> Yet another reason is that perhaps the bug was introduced by this pair of 
> changes introduced to Gnulib in November 2021 to try to address 
> :
> 
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4db8db34112b86ddf8bac48f16b5acff732b5fa9
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=1a268176fbb184e393c98575e61fe692264c7d91
> 
> These Gnulib patches are attached. If you revert these, does the problem go 
> away?
> 
> And yet another reason is that perhaps the bug was introduced by this pair of 
> Coreutils changes made in October 2021 to try to address 
> :
> 
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a9e31457bf6a63072b54a9324cdf59a09441a21f
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=1753012b8d8cdd0817155f4756f5bf9bfe347816
> 
> These Coreutils patches are also attached. Does reverting them help?
> 
> All things considered, I'd like a copy of the dtruss output for unmodified 
> coreutils 9.1 cp on the failing case. This is really helpful for debugging 
> this sort of thing. It's what helped us debug 
> Bug#51857.<0001-lseek-port-around-macOS-SEEK_DATA-glitch.patch><0002-lseek-port-around-macOS-SEEK_HOLE-glitch.patch><0001-cp-defend-better-against-FreeBSD-9.1-zfs-bug.patch><0002-cp-revert-unnecessary-FreeBSD-workaround.patch>


Hi Paul, I can check later today. One thing I notices is that the sparse-copy 
corruption has been present on 2021-10-06. (October 6th). As I recall it was 
not present on coreutils-8, it was introduced as soon as coreutils-9 came, 
precisely ever since sparse-copy was enabled. I will have to check if I can 
find any old logs or investigate again. But I don’t think reverting these 
patches will make any difference, since the issue was present before them.

Can we bring someone from the gcc team here? They know best how the build 
system for gcc works, and might be able to create a simple PoC to craft a 
sparse file that gets corrupted on copy. Currently I need to rebuild gcc, which 
takes 2-3 minutes and we have no clue how it is crafted.

Any idea how to make Apple aware? Last time I contacted the 
product-secur...@apple.com team on 2021-10-06, the replayed two days later that 
they are investigating, but did not bother fixing the underlaying issue. 
Perhaps if you contact them as a member of GNU/coreutils, they might be more 
open to work? If you do, refer them to Follow-up: 782620861.

I’ll come back to test your requests, at a late point today.

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-14 Thread Pádraig Brady

On 14/02/2023 06:12, Paul Eggert wrote:

On 2023-02-13 09:16, George Valkov wrote:

1. We apply my original patch to disable sparse-copy on macOS. Otherwise since 
fclonefileat is not used whenever we overwrite a file, corruption will still 
occur.


I'm not entirely sold on this patch, because I still don't understand
what's happening. The original bug report in
 basically says only "it
doesn't work", and I'd like to know more.

Part of the reason I'm hesitating is that there are multiple ways of
interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just
that Apple has come up with yet another way to interpret it, which we
need to know about.


I agree it would be good to know more.
However given it works on HFS but fails on APFS suggests a file system specific 
issue,
rather than an API mismatch issue (over what we're already catering for on 
macOS).
I suspect it's a similar issue to the one that openzfs had:
https://github.com/openzfs/zfs/issues/11900

Given how closed / uncommunicative Apple are in general
and specifically for this already reported bug,
I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.

Also we've mitigated the impact somewhat by enabling fclonefileat() in more 
cases.


Another reason to hesitate is that GNU coreutils is not the only core
program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic
Apple problem we need to know which Apple releases have it, so that we
can program around it at the Gnulib level instead of mucking about with
each individual program.


Yes that would be best if possible.

I've attached the 3 latest patches I'm considering in this area.
I presume you're OK with your amended fclonefileat() improvement one?

thanks,
PádraigFrom 0a590853b159d4eda90b1d3a83fc1859356727ac Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Fri, 10 Feb 2023 13:34:54 -0800
Subject: [PATCH 1/3] copy: improve use of fclonefileat

* src/copy.c (copy_reg): Use CLONE_ACL if available and working.
If the only problem with fclonefileat is that it would create the
file with the wrong timestamp, or with too few permissions,
do that but fix the timestamp and permissions afterwards,
rather than falling back on a traditional copy.
(CLONE_ACL) [HAVE_FCLONEFILEAT && !USE_XATTR]: Default to 0.
---
 NEWS   |  3 +++
 src/copy.c | 53 +
 2 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/NEWS b/NEWS
index 3d0ede150..cbc614c2e 100644
--- a/NEWS
+++ b/NEWS
@@ -124,6 +124,9 @@ GNU coreutils NEWS-*- outline -*-
   and possibly employing copy offloading or reflinking,
   for the non sparse portion of such sparse files.
 
+  On macOS, cp creates a copy-on-write clone in more cases.
+  Previously cp would only do this when preserving mode and timestamps.
+
   date --debug now diagnoses if multiple --date or --set options are
   specified, as only the last specified is significant in that case.
 
diff --git a/src/copy.c b/src/copy.c
index dfbb557de..8370f55bd 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1146,6 +1146,7 @@ copy_reg (char const *src_name, char const *dst_name,
   union scan_inference scan_inference;
   bool return_val = true;
   bool data_copy_required = x->data_copy_required;
+  bool mode_already_preserved = false;
   bool preserve_xattr = USE_XATTR & x->preserve_xattr;
 
   source_desc = open (src_name,
@@ -1244,17 +1245,48 @@ copy_reg (char const *src_name, char const *dst_name,
   if (*new_dst)
 {
 #if HAVE_FCLONEFILEAT && !USE_XATTR
-/* CLONE_NOOWNERCOPY only available on macos >= 10.13.  */
+# ifndef CLONE_ACL
+#  define CLONE_ACL 0 /* Added in macOS 12.6.  */
+# endif
 # ifndef CLONE_NOOWNERCOPY
-#  define CLONE_NOOWNERCOPY 0
+#  define CLONE_NOOWNERCOPY 0 /* Added in macOS 10.13.  */
 # endif
-  int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
+  mode_t cloned_mode_bits = S_ISVTX | S_IRWXUGO;
+  mode_t cloned_mode = src_mode & cloned_mode_bits;
+  int fc_flags = ((x->preserve_mode ? CLONE_ACL : 0)
+  | (x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY));
   if (data_copy_required && x->reflink_mode
-  && x->preserve_mode && x->preserve_timestamps
+  && (x->preserve_mode || ! (cloned_mode & ~dst_mode))
   && (x->preserve_ownership || CLONE_NOOWNERCOPY))
 {
-  if (fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags) == 0)
-goto close_src_desc;
+  int s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
+  if (s != 0 && (fc_flags & CLONE_ACL) != 0 && errno == EINVAL)
+{
+  fc_flags &= ~CLONE_ACL;
+  s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
+}
+  if (s == 0)
+{
+  if (!x->preserve_timestamps)
+{
+  struct timespec 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-13 Thread Pádraig Brady

On 13/02/2023 17:16, George Valkov wrote:



On 2023-02-13, at 4:05 PM, Pádraig Brady  wrote:
This amendment seems to make the treatment of umask and setuid consistent
between using fclonefileat() and not.

diff --git a/src/copy.c b/src/copy.c
index 782fab98d..8370f55bd 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1280,10 +1280,11 @@ copy_reg (char const *src_name, char const *dst_name,
   return_val = false;
 }
 }
-  mode_already_preserved = (fc_flags & CLONE_ACL) != 0;
+  mode_already_preserved = (fc_flags & CLONE_ACL) != 0
+   && cloned_mode == src_mode;
   dest_desc = -1;
   omitted_permissions = dst_mode & ~cloned_mode;
-  extra_permissions = 0;
+  extra_permissions = dst_mode & ~ cached_umask ();
   goto set_dest_mode;
 }
   else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,

With this, all tests pass (apart from thru-dangling.sh as mentioned above
which I'll skip in another patch).
Paul I'm happy to merge this amendment, mention the improvement in NEWS,
and push in your name if you're OK with it.


To summarise:
1. We apply my original patch to disable sparse-copy on macOS. Otherwise since 
fclonefileat is not used whenever we overwrite a file, corruption will still 
occur. An updated copy of the patch is attached below:
By doing this we fail the following test so you have to disable it on macOS:
FAIL: tests/cp/sparse-perf.sh


Right. I'll adjust your patch to also get
tests/seek-data-capable to return false on Darwin,
which will then skip that test.


2. Next we add support for fclonefileat using cp: improve use of fclonefileat
A few more tests fail.

3. We fix those tests using the patch from your last mail. Then we have to 
disable the following test on macOS:
FAIL: tests/cp/thru-dangling.sh

Final results: copying sparse-files on macOS no longer produces corrupted 
copies. Performance is greatly optimised by cloning instead of copying, where 
possible. The following two tests now fail on macOS and will be disabled:
FAIL: tests/cp/sparse-perf.sh


handled above


FAIL: tests/cp/thru-dangling.sh


I'll fix this


Looks good to me. Yes, please add me to the commit history. Also feel free to 
add my web site if appropriate. Finally please let me know when the patches are 
pushed, so that I can update OpenWRT to use the latest version of coreutils.

Pádraig and Paul, thank you very much for your kind help! It was nice working 
with you and learning new experience. Cheers, mates!


Summary sounds good.

Thanks for all the testing.

Pádraig






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-13 Thread George Valkov
Test results on macOS 12.6.3
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-02-d374d32ccf12f8cf33c8f823d573498b7c8b27a4-clone.txt


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-13 Thread George Valkov

> On 2023-02-13, at 4:05 PM, Pádraig Brady  wrote:
> 
> On 12/02/2023 20:36, Pádraig Brady wrote:
>> On 12/02/2023 14:15, Pádraig Brady wrote:
>>> On 11/02/2023 21:53, Paul Eggert wrote:
 On 2023-02-10 17:21, George Valkov wrote:
 
> remember to trace all code paths errors.
 
 Yes, I've been doing that. However, I don't use macOS and the two of us
 are debugging remotely where I write the code and you debug it but
 neither of us know how macOS works. Of course it would be much more
 efficient if we weren't operating with these obstacles, but here we are.
 
 Because the macOS behavior is poorly documented and we lack the source,
 if we can't figure this out fairly quickly coreutils should simply stop
 using fclonefileat because it's unreliable for users and a time sink for
 us. I'm hoping, though, we can get something working with another round
 or two.
 
 
> With CLONE_ACL, I get 22 Invalid argument.
 
 OK, this means that although Apple has documented CLONE_ACL and it's in
 your include files, it doesn't work in your version of macOS, because
 either it's not supported by the kernel, or by the filesystem code, or
 by the particular filesystem instance, or for some other reason. EINVAL
 hints that it's the kernel but that's by no means certain.
 
 One possible way to defend against this macOS misbehavior is for cp to
 try to use CLONE_ACL, and if that fails with errno == EINVAL try again
 without CLONE_ACL. I attempted to implement this in the attached patch;
 please give it a try.
>>> 
>>> I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) 
>>> succeeds.
>>> I tested with various umask, --no-preserve=mode, -p, ... combinations
>>> and didn't find any inconsistencies in permissions with --reflink=never 
>>> copies.
>>> Actually scratch that. There was an older xcode installed so CLONE_ACL was 
>>> 0.
>>> I've not got access at the moment, so will need to retest later with newer 
>>> xcode.
>> Good news is that with newer xcode supporting macOS 13.1
>> CLONE_ACL is set and that all works as expected, with the
>> first fclonefileat() succeeding.
>> I checked with actual ACLs (set with `chmod +a "user:$USER allow readattr" 
>> file`),
>> and they were copied or dropped as expected. Note ls -l on macOS will list
>> ACLs with a "+" rather than a "@", as was mentioned elsewhere in this thread.
>>> One divergence with fclonefileat() is that it writes through a dangling 
>>> symlink,
>>> where a standard --reflink=never copy will fail.
>>> I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
>>> Now there is nothing wrong with writing through the dangling symlink,
>>> and in fact standard cp will also with POSIXLY_CORRECT set in the 
>>> environment.
>>> So I may just need need to adjust the test to work / skip in this case.
>> Another divergence I noticed is wrt umask and setuid handling.
>> The following transcript shows that with fclonefileat() we do _not_
>> preserve setuid when it is preserved without.
>> Also with fclonefileat() the umask is not honored,
>> while it is honored without.
>> # id -u
>> 0
>> # echo create setuid file
>> # rm src/cp.s; src/cp src/cp src/cp.s; chmod u+s src/cp.s
>> # umask; for reflink in always never; do
>>  >  for preserve in mode timestamps; do
>>  >   rm -f src/cp.s.$reflink.$preserve
>>  >   src/cp --reflink=$reflink --preserve=$preserve src/cp.s 
>> src/cp.s.$reflink.$preserve
>>  >  done
>>  > done
>>  > ls -le src/cp.s*
>> 0022
>> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s
>> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.mode
>> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.timestamps
>> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.mode
>> -rwxr-xr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.timestamps
> 
> This amendment seems to make the treatment of umask and setuid consistent
> between using fclonefileat() and not.
> 
> diff --git a/src/copy.c b/src/copy.c
> index 782fab98d..8370f55bd 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -1280,10 +1280,11 @@ copy_reg (char const *src_name, char const *dst_name,
>   return_val = false;
> }
> }
> -  mode_already_preserved = (fc_flags & CLONE_ACL) != 0;
> +  mode_already_preserved = (fc_flags & CLONE_ACL) != 0
> +   && cloned_mode == src_mode;
>   dest_desc = -1;
>   omitted_permissions = dst_mode & ~cloned_mode;
> -  extra_permissions = 0;
> +  extra_permissions = dst_mode & ~ cached_umask ();
>   goto set_dest_mode;
> }
>   else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,
> 
> With this, all tests pass (apart from thru-dangling.sh as mentioned above
> which I'll skip in another patch).
> Paul I'm 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-13 Thread Pádraig Brady

On 12/02/2023 23:37, George Valkov wrote:


I can confirm that. CLONE_ACL is 4 when building on macOS 12.6.3. I rand the 
compiled sample on the recovery environment, which is equivalent to macOS 13.3. 
The flag is supported, though I did not observe any difference with or without 
it. The sample was ran as root, while the source and destination were set to 
g:staff 600 @.


I guess the CLONE_ACL value is more dependent on xcode rather than macOS 
version.
I confirmed again on macOS 13.2 that CLONE_ACL is honored and behaves as 
expected.
I guess on 12.6.3 it's failing with EINVAL and falling back to the CLONE_ACL==0 
case.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-13 Thread Pádraig Brady

On 12/02/2023 20:36, Pádraig Brady wrote:

On 12/02/2023 14:15, Pádraig Brady wrote:

On 11/02/2023 21:53, Paul Eggert wrote:

On 2023-02-10 17:21, George Valkov wrote:


remember to trace all code paths errors.


Yes, I've been doing that. However, I don't use macOS and the two of us
are debugging remotely where I write the code and you debug it but
neither of us know how macOS works. Of course it would be much more
efficient if we weren't operating with these obstacles, but here we are.

Because the macOS behavior is poorly documented and we lack the source,
if we can't figure this out fairly quickly coreutils should simply stop
using fclonefileat because it's unreliable for users and a time sink for
us. I'm hoping, though, we can get something working with another round
or two.



With CLONE_ACL, I get 22 Invalid argument.


OK, this means that although Apple has documented CLONE_ACL and it's in
your include files, it doesn't work in your version of macOS, because
either it's not supported by the kernel, or by the filesystem code, or
by the particular filesystem instance, or for some other reason. EINVAL
hints that it's the kernel but that's by no means certain.

One possible way to defend against this macOS misbehavior is for cp to
try to use CLONE_ACL, and if that fails with errno == EINVAL try again
without CLONE_ACL. I attempted to implement this in the attached patch;
please give it a try.


I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) succeeds.
I tested with various umask, --no-preserve=mode, -p, ... combinations
and didn't find any inconsistencies in permissions with --reflink=never copies.
Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
I've not got access at the moment, so will need to retest later with newer 
xcode.


Good news is that with newer xcode supporting macOS 13.1
CLONE_ACL is set and that all works as expected, with the
first fclonefileat() succeeding.
I checked with actual ACLs (set with `chmod +a "user:$USER allow readattr" 
file`),
and they were copied or dropped as expected. Note ls -l on macOS will list
ACLs with a "+" rather than a "@", as was mentioned elsewhere in this thread.


One divergence with fclonefileat() is that it writes through a dangling symlink,
where a standard --reflink=never copy will fail.
I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
Now there is nothing wrong with writing through the dangling symlink,
and in fact standard cp will also with POSIXLY_CORRECT set in the environment.
So I may just need need to adjust the test to work / skip in this case.


Another divergence I noticed is wrt umask and setuid handling.
The following transcript shows that with fclonefileat() we do _not_
preserve setuid when it is preserved without.
Also with fclonefileat() the umask is not honored,
while it is honored without.

# id -u
0

# echo create setuid file
# rm src/cp.s; src/cp src/cp src/cp.s; chmod u+s src/cp.s

# umask; for reflink in always never; do
  >  for preserve in mode timestamps; do
  >   rm -f src/cp.s.$reflink.$preserve
  >   src/cp --reflink=$reflink --preserve=$preserve src/cp.s 
src/cp.s.$reflink.$preserve
  >  done
  > done
  > ls -le src/cp.s*
0022
-rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s
-rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.mode
-rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.timestamps
-rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.mode
-rwxr-xr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.timestamps


This amendment seems to make the treatment of umask and setuid consistent
between using fclonefileat() and not.

diff --git a/src/copy.c b/src/copy.c
index 782fab98d..8370f55bd 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1280,10 +1280,11 @@ copy_reg (char const *src_name, char const *dst_name,
   return_val = false;
 }
 }
-  mode_already_preserved = (fc_flags & CLONE_ACL) != 0;
+  mode_already_preserved = (fc_flags & CLONE_ACL) != 0
+   && cloned_mode == src_mode;
   dest_desc = -1;
   omitted_permissions = dst_mode & ~cloned_mode;
-  extra_permissions = 0;
+  extra_permissions = dst_mode & ~ cached_umask ();
   goto set_dest_mode;
 }
   else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,

With this, all tests pass (apart from thru-dangling.sh as mentioned above
which I'll skip in another patch).
Paul I'm happy to merge this amendment, mention the improvement in NEWS,
and push in your name if you're OK with it.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-12 Thread George Valkov


> On 2023-02-12, at 10:36 PM, Pádraig Brady  wrote:
> 
> On 12/02/2023 14:15, Pádraig Brady wrote:
>> On 11/02/2023 21:53, Paul Eggert wrote:
>>> On 2023-02-10 17:21, George Valkov wrote:
>>> 
 remember to trace all code paths errors.
>>> 
>>> Yes, I've been doing that. However, I don't use macOS and the two of us
>>> are debugging remotely where I write the code and you debug it but
>>> neither of us know how macOS works. Of course it would be much more
>>> efficient if we weren't operating with these obstacles, but here we are.
>>> 
>>> Because the macOS behavior is poorly documented and we lack the source,
>>> if we can't figure this out fairly quickly coreutils should simply stop
>>> using fclonefileat because it's unreliable for users and a time sink for
>>> us. I'm hoping, though, we can get something working with another round
>>> or two.
>>> 
>>> 
 With CLONE_ACL, I get 22 Invalid argument.
>>> 
>>> OK, this means that although Apple has documented CLONE_ACL and it's in
>>> your include files, it doesn't work in your version of macOS, because
>>> either it's not supported by the kernel, or by the filesystem code, or
>>> by the particular filesystem instance, or for some other reason. EINVAL
>>> hints that it's the kernel but that's by no means certain.
>>> 
>>> One possible way to defend against this macOS misbehavior is for cp to
>>> try to use CLONE_ACL, and if that fails with errno == EINVAL try again
>>> without CLONE_ACL. I attempted to implement this in the attached patch;
>>> please give it a try.
>> I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) 
>> succeeds.
>> I tested with various umask, --no-preserve=mode, -p, ... combinations
>> and didn't find any inconsistencies in permissions with --reflink=never 
>> copies.
>> Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
>> I've not got access at the moment, so will need to retest later with newer 
>> xcode.
> 
> Good news is that with newer xcode supporting macOS 13.1
> CLONE_ACL is set and that all works as expected, with the
> first fclonefileat() succeeding.

I can confirm that. CLONE_ACL is 4 when building on macOS 12.6.3. I rand the 
compiled sample on the recovery environment, which is equivalent to macOS 13.3. 
The flag is supported, though I did not observe any difference with or without 
it. The sample was ran as root, while the source and destination were set to 
g:staff 600 @.


> I checked with actual ACLs (set with `chmod +a "user:$USER allow readattr" 
> file`),
> and they were copied or dropped as expected. Note ls -l on macOS will list
> ACLs with a "+" rather than a "@", as was mentioned elsewhere in this thread.

@ is shown on files that have extended attributes. I can remove them with
xattr -cr . # recursive
xattr -cr file # single item

+ means ACL

ll -e -@
drwx--@   3 g staff 96 May 16  2021 Applications/
com.apple.quarantine   21
drwx--+   8 g staff256 May 16  2021 Music/
 0: group:everyone deny delete


>> One divergence with fclonefileat() is that it writes through a dangling 
>> symlink,
>> where a standard --reflink=never copy will fail.
>> I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
>> Now there is nothing wrong with writing through the dangling symlink,
>> and in fact standard cp will also with POSIXLY_CORRECT set in the 
>> environment.
>> So I may just need need to adjust the test to work / skip in this case.

The same test fails on macOS 12.6.3. See test-01*.txt.
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/

I ran your test, followed by make check-TESTS for both original (8 failed) and 
patched (12 filed).


> Another divergence I noticed is wrt umask and setuid handling.
> The following transcript shows that with fclonefileat() we do _not_
> preserve setuid when it is preserved without.
> Also with fclonefileat() the umask is not honored,
> while it is honored without.
> 
> # id -u
> 0
> 
> # echo create setuid file
> # rm src/cp.s; src/cp src/cp src/cp.s; chmod u+s src/cp.s
> 
> # umask; for reflink in always never; do
> >  for preserve in mode timestamps; do
> >   rm -f src/cp.s.$reflink.$preserve
> >   src/cp --reflink=$reflink --preserve=$preserve src/cp.s 
> > src/cp.s.$reflink.$preserve
> >  done
> > done
> > ls -le src/cp.s*
> 0022
> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s
> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.mode
> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.timestamps
> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.mode
> -rwxr-xr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.timestamps

The manpage is clonefile.2.txt, use the link above.


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-12 Thread Pádraig Brady

On 12/02/2023 14:15, Pádraig Brady wrote:

On 11/02/2023 21:53, Paul Eggert wrote:

On 2023-02-10 17:21, George Valkov wrote:


remember to trace all code paths errors.


Yes, I've been doing that. However, I don't use macOS and the two of us
are debugging remotely where I write the code and you debug it but
neither of us know how macOS works. Of course it would be much more
efficient if we weren't operating with these obstacles, but here we are.

Because the macOS behavior is poorly documented and we lack the source,
if we can't figure this out fairly quickly coreutils should simply stop
using fclonefileat because it's unreliable for users and a time sink for
us. I'm hoping, though, we can get something working with another round
or two.



With CLONE_ACL, I get 22 Invalid argument.


OK, this means that although Apple has documented CLONE_ACL and it's in
your include files, it doesn't work in your version of macOS, because
either it's not supported by the kernel, or by the filesystem code, or
by the particular filesystem instance, or for some other reason. EINVAL
hints that it's the kernel but that's by no means certain.

One possible way to defend against this macOS misbehavior is for cp to
try to use CLONE_ACL, and if that fails with errno == EINVAL try again
without CLONE_ACL. I attempted to implement this in the attached patch;
please give it a try.


I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) succeeds.
I tested with various umask, --no-preserve=mode, -p, ... combinations
and didn't find any inconsistencies in permissions with --reflink=never copies.
Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
I've not got access at the moment, so will need to retest later with newer 
xcode.


Good news is that with newer xcode supporting macOS 13.1
CLONE_ACL is set and that all works as expected, with the
first fclonefileat() succeeding.
I checked with actual ACLs (set with `chmod +a "user:$USER allow readattr" 
file`),
and they were copied or dropped as expected. Note ls -l on macOS will list
ACLs with a "+" rather than a "@", as was mentioned elsewhere in this thread.


One divergence with fclonefileat() is that it writes through a dangling symlink,
where a standard --reflink=never copy will fail.
I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
Now there is nothing wrong with writing through the dangling symlink,
and in fact standard cp will also with POSIXLY_CORRECT set in the environment.
So I may just need need to adjust the test to work / skip in this case.


Another divergence I noticed is wrt umask and setuid handling.
The following transcript shows that with fclonefileat() we do _not_
preserve setuid when it is preserved without.
Also with fclonefileat() the umask is not honored,
while it is honored without.

# id -u
0

# echo create setuid file
# rm src/cp.s; src/cp src/cp src/cp.s; chmod u+s src/cp.s

# umask; for reflink in always never; do
>  for preserve in mode timestamps; do
>   rm -f src/cp.s.$reflink.$preserve
>   src/cp --reflink=$reflink --preserve=$preserve src/cp.s 
src/cp.s.$reflink.$preserve
>  done
> done
> ls -le src/cp.s*
0022
-rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s
-rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.mode
-rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.timestamps
-rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.mode
-rwxr-xr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.timestamps

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-12 Thread Pádraig Brady

On 11/02/2023 21:53, Paul Eggert wrote:

On 2023-02-10 17:21, George Valkov wrote:


remember to trace all code paths errors.


Yes, I've been doing that. However, I don't use macOS and the two of us
are debugging remotely where I write the code and you debug it but
neither of us know how macOS works. Of course it would be much more
efficient if we weren't operating with these obstacles, but here we are.

Because the macOS behavior is poorly documented and we lack the source,
if we can't figure this out fairly quickly coreutils should simply stop
using fclonefileat because it's unreliable for users and a time sink for
us. I'm hoping, though, we can get something working with another round
or two.



With CLONE_ACL, I get 22 Invalid argument.


OK, this means that although Apple has documented CLONE_ACL and it's in
your include files, it doesn't work in your version of macOS, because
either it's not supported by the kernel, or by the filesystem code, or
by the particular filesystem instance, or for some other reason. EINVAL
hints that it's the kernel but that's by no means certain.

One possible way to defend against this macOS misbehavior is for cp to
try to use CLONE_ACL, and if that fails with errno == EINVAL try again
without CLONE_ACL. I attempted to implement this in the attached patch;
please give it a try.


I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) succeeds.
I tested with various umask, --no-preserve=mode, -p, ... combinations
and didn't find any inconsistencies in permissions with --reflink=never copies.
Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
I've not got access at the moment, so will need to retest later with newer 
xcode.

One divergence with fclonefileat() is that it writes through a dangling symlink,
where a standard --reflink=never copy will fail.
I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
Now there is nothing wrong with writing through the dangling symlink,
and in fact standard cp will also with POSIXLY_CORRECT set in the environment.
So I may just need need to adjust the test to work / skip in this case.

cheers,
Pádraig






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-11 Thread George Valkov


> On 2023-02-12, at 2:47 AM, Paul Eggert  wrote:
> 
> On 2023-02-11 16:38, George Valkov wrote:
>> This might help:
>> https://github.com/apple/darwin-xnu/blob/main/bsd/sys/clonefile.h
> 
> It doesn't help, because it doesn't mention CLONE_ACL.

Here is what I found: The version of vfs_syscalls.c on that repository is 3 
years old and does not support CLONE_ACL. Still it should provide a good idea 
about the implementation before this flags was introduced.

https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/bsd/vfs/vfs_syscalls.c#L8201

if (uap->flags & ~(CLONE_NOFOLLOW | CLONE_NOOWNERCOPY)) {
return EINVAL;
}

I was able to run my sample on macOS 13 recovery environment. The CLONE_ACL 
flag is supported there. I don’t see any difference in the final result with or 
without the flag. Both clones have UNIX permissions, extended attributes and 
time stamp from the source.

Darwin gMac.lan 22.3.0 Darwin Kernel Version 22.3.0: Thu Jan  5 20:53:49 PST 
2023; root:xnu-8792.81.2~2/RELEASE_X86_64 x86_64

fd 3  dir 4
fclonefileat  0   0 Undefined error: 0
fclonefileat  0   0 Undefined error: 0 CLONE_ACL

-rw---@  1 501   staff553 12 Feb 00:50 A
-rw---@  1 501   staff553 12 Feb 00:50 B
-rw---@  1 501   staff553 12 Feb 00:50 CLONE_ACL


I tried running cp with your patch there, but it depends on a dynamic library 
and fails to run. My attempt to use chroot failed, probably due to file 
signatures: Killed 9.


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-11 Thread Paul Eggert

On 2023-02-11 16:38, George Valkov wrote:

This might help:
https://github.com/apple/darwin-xnu/blob/main/bsd/sys/clonefile.h


It doesn't help, because it doesn't mention CLONE_ACL.


The kernel should be open source.


No, the macOS kernel is not free software. Nor is iOS's kernel.





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-11 Thread George Valkov


> On 2023-02-11, at 11:53 PM, Paul Eggert  wrote:
> 
> On 2023-02-10 17:21, George Valkov wrote:
> 
>> remember to trace all code paths errors.
> 
> Yes, I've been doing that. However, I don't use macOS and the two of us are 
> debugging remotely where I write the code and you debug it but neither of us 
> know how macOS works. Of course it would be much more efficient if we weren't 
> operating with these obstacles, but here we are.

Yes, it’s true. It takes a lot of energy from both of us. Especially the time 
to document observations in hope to be helpful. There is a saying that two 
people working together is more than each on their own.
Let’s work together! I have a good experience with mutiplatform development: 
macOS, Linux, BSD, Windows. For the most part it’s the same as Linux with minor 
differences, so pretty easy to support according to my experience. Build 
environment and debugging tools are all set. That will keep you comfortable and 
productive. Just drop me a private mail and we can set a conference call. Then 
test for standard compliance according to manpage.


> Because the macOS behavior is poorly documented and we lack the source, if we 
> can't figure this out fairly quickly coreutils should simply stop using 
> fclonefileat because it's unreliable for users and a time sink for us. I'm 
> hoping, though, we can get something working with another round or two.

Sad but true. This might help:
https://github.com/apple/darwin-xnu/blob/main/bsd/sys/clonefile.h
The kernel should be open source. We might be able to find the implementation 
there.
I hope we can keep fclonefileat.


>> With CLONE_ACL, I get 22 Invalid argument.
> 
> OK, this means that although Apple has documented CLONE_ACL and it's in your 
> include files, it doesn't work in your version of macOS, because either it's 
> not supported by the kernel, or by the filesystem code, or by the particular 
> filesystem instance, or for some other reason. EINVAL hints that it's the 
> kernel but that's by no means certain.
> 
> One possible way to defend against this macOS misbehavior is for cp to try to 
> use CLONE_ACL, and if that fails with errno == EINVAL try again without 
> CLONE_ACL. I attempted to implement this in the attached patch; please give 
> it a try.

Good approach. Tour patch confirms it works. copy.c says it has been introduced 
in 12.6, I’m using 12.6.3. That suggests it is available in headers, but not 
available in the kernel yet. I will check if it is implemented in the recovery 
environment for macOS 13.

I added some code for tracing. The check works as intended.

int s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
int a = s ? errno : 0;
if (s != 0 && (fc_flags & CLONE_ACL) != 0 && errno == EINVAL)
  {
printf("s %2i  errno %2i %s  fc_flags %x !!\n", s, a, strerror(a), 
fc_flags);
fc_flags &= ~CLONE_ACL;
s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
a = s ? errno : 0;
  }
  printf("s %2i  errno %2i %s  fc_flags %x\n", s, a, strerror(a), fc_flags);

./coreutils-clone/src/cp A B
s  0  errno  0 Success  fc_flags 2

./coreutils-clone/src/cp -p A C
s -1  errno 22 Invalid argument  fc_flags 4 !!
s  0  errno  0 Success  fc_flags 0

Extended attributes are preserved in both copies: note the @ flag.
-rw---@1 g staff   415 Feb 11 02:10 A
-rw---@1 g staff   415 Feb 12 01:30 B
-rw---@1 g staff   415 Feb 11 02:10 C


Good luck, Paul!

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-11 Thread Paul Eggert

On 2023-02-10 17:21, George Valkov wrote:


remember to trace all code paths errors.


Yes, I've been doing that. However, I don't use macOS and the two of us 
are debugging remotely where I write the code and you debug it but 
neither of us know how macOS works. Of course it would be much more 
efficient if we weren't operating with these obstacles, but here we are.


Because the macOS behavior is poorly documented and we lack the source, 
if we can't figure this out fairly quickly coreutils should simply stop 
using fclonefileat because it's unreliable for users and a time sink for 
us. I'm hoping, though, we can get something working with another round 
or two.




With CLONE_ACL, I get 22 Invalid argument.


OK, this means that although Apple has documented CLONE_ACL and it's in 
your include files, it doesn't work in your version of macOS, because 
either it's not supported by the kernel, or by the filesystem code, or 
by the particular filesystem instance, or for some other reason. EINVAL 
hints that it's the kernel but that's by no means certain.


One possible way to defend against this macOS misbehavior is for cp to 
try to use CLONE_ACL, and if that fails with errno == EINVAL try again 
without CLONE_ACL. I attempted to implement this in the attached patch; 
please give it a try.




do we want to go the quick route of unlinking the destination and creating a 
fresh clone?


No, POSIX is clear that plain cp should not unlink and re-create the 
destination. Of course we can add a non-POSIX flag to do that, and we 
already have: cp --remove-destination.From ea8904606e8e749e2f6d22c1abb0310fac3722f5 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Fri, 10 Feb 2023 13:34:54 -0800
Subject: [PATCH] cp: improve use of fclonefileat

* src/copy.c (copy_reg): Use CLONE_ACL if available and working.
If the only problem with fclonefileat is that it would create the
file with the wrong timestamp, or with too few permissions,
do that but fix the timestamp and permissions afterwards,
rather than falling back on a traditional copy.
(CLONE_ACL) [HAVE_FCLONEFILEAT && !USE_XATTR]: Default to 0.
---
 src/copy.c | 52 
 1 file changed, 44 insertions(+), 8 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index dfbb557de..782fab98d 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1146,6 +1146,7 @@ copy_reg (char const *src_name, char const *dst_name,
   union scan_inference scan_inference;
   bool return_val = true;
   bool data_copy_required = x->data_copy_required;
+  bool mode_already_preserved = false;
   bool preserve_xattr = USE_XATTR & x->preserve_xattr;
 
   source_desc = open (src_name,
@@ -1244,17 +1245,47 @@ copy_reg (char const *src_name, char const *dst_name,
   if (*new_dst)
 {
 #if HAVE_FCLONEFILEAT && !USE_XATTR
-/* CLONE_NOOWNERCOPY only available on macos >= 10.13.  */
+# ifndef CLONE_ACL
+#  define CLONE_ACL 0 /* Added in macOS 12.6.  */
+# endif
 # ifndef CLONE_NOOWNERCOPY
-#  define CLONE_NOOWNERCOPY 0
+#  define CLONE_NOOWNERCOPY 0 /* Added in macOS 10.13.  */
 # endif
-  int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
+  mode_t cloned_mode_bits = S_ISVTX | S_IRWXUGO;
+  mode_t cloned_mode = src_mode & cloned_mode_bits;
+  int fc_flags = ((x->preserve_mode ? CLONE_ACL : 0)
+  | (x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY));
   if (data_copy_required && x->reflink_mode
-  && x->preserve_mode && x->preserve_timestamps
+  && (x->preserve_mode || ! (cloned_mode & ~dst_mode))
   && (x->preserve_ownership || CLONE_NOOWNERCOPY))
 {
-  if (fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags) == 0)
-goto close_src_desc;
+  int s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
+  if (s != 0 && (fc_flags & CLONE_ACL) != 0 && errno == EINVAL)
+{
+  fc_flags &= ~CLONE_ACL;
+  s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
+}
+  if (s == 0)
+{
+  if (!x->preserve_timestamps)
+{
+  struct timespec timespec[2];
+  timespec[0].tv_nsec = timespec[1].tv_nsec = UTIME_NOW;
+  if (utimensat (dst_dirfd, dst_relname, timespec,
+ AT_SYMLINK_NOFOLLOW)
+  != 0)
+{
+  error (0, errno, _("updating times for %s"),
+ quoteaf (dst_name));
+  return_val = false;
+}
+}
+  mode_already_preserved = (fc_flags & CLONE_ACL) != 0;
+  dest_desc = -1;
+  omitted_permissions = dst_mode & ~cloned_mode;
+  extra_permissions = 0;
+  goto set_dest_mode;
+}
   else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,
   

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-11 Thread George Valkov


> On 2023-02-11, at 8:19 PM, Paul Eggert  wrote:
> 
> On 2023-02-10 17:27, George Valkov wrote:
>> I’d prefer a timestamp consistent with source.
> 
> You can get that with 'cp -p' (or with 'cp --preserve=timestamps' etc.). So 
> this is not an issue of whether you can get what you prefer. It's merely an 
> issue about what cp's default behavior is, when the user doesn't specify 
> options.

Thanks!


> POSIX is reasonably clear that if you use plain cp to create a file, the 
> file's last-modified time should be the current time. And it's not just a 
> question of standards conformance: lots of real-world uses of 'cp' expect 
> this behavior. For example, people use plain 'cp' in Makefiles, and 'make' 
> relies on the last-modified time being the current time. If cp changed this 
> behavior, it'd break a lot of builds.

Ah, yes, make relays on timestamps. So we should update timestamp after clone 
by default.
Let me know when you have a patch without CLONE_ACL and I will test it.


Georgi Valkov
httpstorm.com
nano RTOS





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-11 Thread Paul Eggert

On 2023-02-10 17:27, George Valkov wrote:

I’d prefer a timestamp consistent with source.


You can get that with 'cp -p' (or with 'cp --preserve=timestamps' etc.). 
So this is not an issue of whether you can get what you prefer. It's 
merely an issue about what cp's default behavior is, when the user 
doesn't specify options.


POSIX is reasonably clear that if you use plain cp to create a file, the 
file's last-modified time should be the current time. And it's not just 
a question of standards conformance: lots of real-world uses of 'cp' 
expect this behavior. For example, people use plain 'cp' in Makefiles, 
and 'make' relies on the last-modified time being the current time. If 
cp changed this behavior, it'd break a lot of builds.






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread George Valkov


> On 2023-02-11, at 12:02 AM, Pádraig Brady  wrote:
> 
> On 10/02/2023 21:50, Paul Eggert wrote:
>> On 2/10/23 13:35, George Valkov wrote:
>>> Since the source and it’s clone have separate metadata,
>>> it should be possible to change it on the clone, to comply with standards.
>> Attached is a hacky patch to do that. It also uses the new CLONE_ACL
>> flag. The old code apparently mishandled ACLs; the new code tries to do
>> a better job.
> 
> Yes this is looking much better.
> I'm not fully convinced that we should also update the times,
> but not against it either.

I’d prefer a timestamp consistent with source. This is the default when
we clone a file. So no action needed. If we fall-back to copy, we need to set 
it.


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread George Valkov


> On 2023-02-11, at 2:48 AM, Paul Eggert  wrote:
> 
> I want to emphasize the fact that we need to get to the bottom of the 
> SEEK_HOLE / SEEK_DATA situation on macOS. Core programs other than 'cp' use 
> these options, and working around the bug in 'cp' won't fix the bug elsewhere.
> 
> One possibility is to modify Gnulib so that SEEK_HOLE and SEEK_DATA are never 
> used by any Gnulib-using application. This will help somewhat. But not every 
> program uses Gnulib.

Would it help, if I provide access to macOS?


> The bug is really at the macOS level and it needs to get fixed there. Has 
> anyone filed a bug report? (I'm not a macOS user and so can't file one.)

I wrote to product-secur...@apple.com on 2021-10-08.
> I have reviewed your report, and we are investigating.
No further replay. I also tried phone and development support.
They are extremely hard to work with.
They always keep us in the dark and never admit to anything.
30 months of repair services, and erasing my data every time.
Perhaps we can try in public here? https://github.com/apple

It helps me value and appreciate working with OpenWRT and you kind people.


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread George Valkov




> On 2023-02-10, at 11:50 PM, Paul Eggert  wrote:
> 
> On 2/10/23 13:35, George Valkov wrote:
> 
>> Since the source and it’s clone have separate metadata,
>> it should be possible to change it on the clone, to comply with standards.
> 
> Attached is a hacky patch to do that. It also uses the new CLONE_ACL flag. 
> The old code apparently mishandled ACLs; the new code tries to do a better 
> job.

Hint: remember to trace all code paths errors.
The program may and does steer in directions other than what you expected.

Without CLONE_ACL, UNIX permissions and macOS extended attributes
from the source file are preserved, timestamp is updated:
cp A B
-rw---@   1 g staff   332 Feb 11 01:57 A
-rw---@   1 g staff   332 Feb 11 02:19 B

With CLONE_ACL, I get 22 Invalid argument. Then falls-back to something else,
UNIX permissions and timestamp are preserved, extended attributes are lost:
cp -p A B
-rw---@   1 g staff   332 Feb 11 01:57 A
-rw---1 g staff   332 Feb 11 01:57 B

fclonefileat also requires that the target does not exist, if it does we have 
to remove it,
otherwise it fails: 17 File exists. With your patch, cp does not print anything.

Now the question here is:
do we want to go the quick route of unlinking the destination and creating a 
fresh clone?
Or do we consider the possibility of multiple hard-links in which case if we 
want to update
a hard-link, we must overwrite the entire file and not use unlink, fclonefileat.
If we are absolutely certain that only one link exists, the first approach is 
safe
and more efficient.
If more than one link exist, we must always overwrite the existing data, to keep
hard-links consistent.


> This whole area is a mess, but the patch should improve correctness on macOS, 
> while also being efficient in the usual case.

We need to resolve the issues described above first.


>> It feels more natural when the metadata is kept intact for the two files.
> 
> That depends on the application. The longstanding tradition for cp is to 
> preserve metadata if you use 'cp -p', and to not preserve with plain cp. For 
> interactive use you can use an alias or a little script if you prefer -p to 
> be enabled by default.

Thank you for clarifying!
I used the sample I wrote in a previous message to test fclonefileat 
independently
of coreutils: the default behaviour is to preserve everything.
The manpage installed on macOS is from June 3, 2021. While it does describe the
behaviour of CLONE_ACL, reality shows it is not supported. CLONE_ACL fails with
22 Invalid argument. Which means additional code would be required to
create what cp thinks as default metadata, which might be prone to errors


>> Here is a good example: I create a tarball on my OpenWRT router. Extract on 
>> the Mac.
> 
> That's fine, because 'tar' historically has preserved the timestamp and (if 
> you're root) permissions. Plain cp has not.

Fair enough. For comparison, here is the behaviour of coreutils 9.1 cp release:

copy UNIX permissions, without extended attributes or timestamp
cp A B
-rw---@   1 g staff   332 Feb 11 01:57 A
-rw---1 g staff   332 Feb 11 02:40 B

copy UNIX permissions, extended attributes and timestamp
cp -p A B
-rw---@   1 g staff   332 Feb 11 01:57 A
-rw---@   1 g staff   332 Feb 11 01:57 B


Good luck!

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Paul Eggert
I want to emphasize the fact that we need to get to the bottom of the 
SEEK_HOLE / SEEK_DATA situation on macOS. Core programs other than 'cp' 
use these options, and working around the bug in 'cp' won't fix the bug 
elsewhere.


One possibility is to modify Gnulib so that SEEK_HOLE and SEEK_DATA are 
never used by any Gnulib-using application. This will help somewhat. But 
not every program uses Gnulib.


The bug is really at the macOS level and it needs to get fixed there. 
Has anyone filed a bug report? (I'm not a macOS user and so can't file one.)






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread George Valkov



> On 2023-02-10, at 11:46 PM, Pádraig Brady  wrote:
> 
> On 10/02/2023 20:45, Paul Eggert wrote:
>> On 2/10/23 10:58, Pádraig Brady wrote:
>>> I was considering "touch"ing the timestamps after also,
>>> but it's better to just maintain them as we're
>>> pointing to the same data after all.
>> For POSIX conformance we must touch if the user has specified only POSIX
>> options (and has not specified -p).
>> And it's not just a POSIX conformance issue. Ordinary users will be
>> surprised if plain 'cp A B' creates a file B with a timestamp from last
>> year.
> 
> Maybe. Though POSIX says cp "shall copy" and we're not making a copy, we're 
> making a reflink.
> So technically we're violating POSIX already in that regard.

A hard link is when we have two or more names for the same file.
We can read or write and it affects the same disk content.
That would violate POSIX shall copy, but we are not doing a hard link.
A clone behaves exactly like a copy. Reads and writes affect only
the selected file. It’s rather an optimised copy.


> One might take the view that the fact we write no new data
> means we should not update the data modification time etc. by default,
> and this may be more signal to a user that new data has not in fact been 
> written.

We can describe the changes in the documentation. From a user perspective,
everything works exactly as before, only faster. Some users might get surprised
the first time they copy several gigabytes and we finish instantly. If will 
check:
the copy is there and it works. So they’ll get used to it and be happy about 
the change.
I always wanted a tool that does CoW. I knew APFS supports it.

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Paul Eggert

On 2/10/23 13:46, Pádraig Brady wrote:

Maybe. Though POSIX says cp "shall copy" and we're not making a copy, 
we're making a reflink.


If that were an important reason not to clone, then cp should not have 
made --reflink=auto the default, as clones would not be considered to be 
copies.


However, I'm comfortable with the idea that clones satisfy the POSIX 
definition of copying. The ordinary interpretation of "clone" means a 
copy, and at the POSIX level users can't tell the difference so the 
as-if rule applies. And if the file system does deduplication+CoW at the 
block level, with something like VDO say, there's no good way GNU 'cp' 
can force a copy rather than cloning regardless.






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Pádraig Brady

On 10/02/2023 21:50, Paul Eggert wrote:

On 2/10/23 13:35, George Valkov wrote:


Since the source and it’s clone have separate metadata,
it should be possible to change it on the clone, to comply with standards.


Attached is a hacky patch to do that. It also uses the new CLONE_ACL
flag. The old code apparently mishandled ACLs; the new code tries to do
a better job.


Yes this is looking much better.
I'm not fully convinced that we should also update the times,
but not against it either.


This whole area is a mess, but the patch should improve correctness on
macOS, while also being efficient in the usual case.


Yes it's a pity there is no fd (data) only interface for this,
rather than dealing with names (metadata) too.

thanks,
Pádraig.





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Paul Eggert

On 2/10/23 13:35, George Valkov wrote:


Since the source and it’s clone have separate metadata,
it should be possible to change it on the clone, to comply with standards.


Attached is a hacky patch to do that. It also uses the new CLONE_ACL 
flag. The old code apparently mishandled ACLs; the new code tries to do 
a better job.


This whole area is a mess, but the patch should improve correctness on 
macOS, while also being efficient in the usual case.




It feels more natural when the metadata is kept intact for the two files.


That depends on the application. The longstanding tradition for cp is to 
preserve metadata if you use 'cp -p', and to not preserve with plain cp. 
For interactive use you can use an alias or a little script if you 
prefer -p to be enabled by default.




Here is a good example: I create a tarball on my OpenWRT router. Extract on the 
Mac.


That's fine, because 'tar' historically has preserved the timestamp and 
(if you're root) permissions. Plain cp has not.




I found this link in one of the mailing lists you sent, it explains what was 
wrong
with the original SEEK_DATA and SEEK_HOLE approach on macOS


GNU cp does a pass over the file with SEEK_HOLE and SEEK_DATA, so if I 
understand things correctly it shouldn't run into the problem mentioned 
there.


In other words, that email doesn't appear to explain our current problem.From 9f2c0ab64925a70e287a92e9928100c7cd6368c2 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Fri, 10 Feb 2023 13:34:54 -0800
Subject: [PATCH] cp: improve use of fclonefileat

* src/copy.c (copy_reg): Use CLONE_ACL if available.
If the only problem with fclonefileat is that it would generate
the wrong timestamp, or would create the file with too few
permissions, use it and fix the mode and timestamp afterwards,
rather than falling back on a traditional copy.
---
 src/copy.c | 48 +---
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index dfbb557de..04f71eb2f 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1146,6 +1146,7 @@ copy_reg (char const *src_name, char const *dst_name,
   union scan_inference scan_inference;
   bool return_val = true;
   bool data_copy_required = x->data_copy_required;
+  bool mode_already_preserved = false;
   bool preserve_xattr = USE_XATTR & x->preserve_xattr;
 
   source_desc = open (src_name,
@@ -1244,17 +1245,45 @@ copy_reg (char const *src_name, char const *dst_name,
   if (*new_dst)
 {
 #if HAVE_FCLONEFILEAT && !USE_XATTR
-/* CLONE_NOOWNERCOPY only available on macos >= 10.13.  */
+# ifndef CLONE_ACL
+#  define CLONE_ACL 0 /* Added in macOS 12.6.  */
+# endif
 # ifndef CLONE_NOOWNERCOPY
-#  define CLONE_NOOWNERCOPY 0
+#  define CLONE_NOOWNERCOPY 0 /* Added in macOS 10.13.  */
 # endif
-  int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
+  mode_t cloned_mode_bits = S_ISVTX | S_IRWXUGO;
+  mode_t cloned_mode = src_mode & cloned_mode_bits;
+  int fc_flags = ((x->preserve_mode ? CLONE_ACL : 0)
+  | (x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY));
   if (data_copy_required && x->reflink_mode
-  && x->preserve_mode && x->preserve_timestamps
+  && (x->preserve_mode || ! (cloned_mode & ~dst_mode))
   && (x->preserve_ownership || CLONE_NOOWNERCOPY))
 {
+  fprintf(stderr, "calling fclonefileat...\n");
   if (fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags) == 0)
-goto close_src_desc;
+{
+  fprintf(stderr, "fclonefileat succeeded\n");
+  if (!x->preserve_timestamps)
+{
+  struct timespec timespec[2];
+  timespec[0].tv_nsec = timespec[1].tv_nsec = UTIME_NOW;
+  fprintf(stderr, "calling utimensat...\n");
+  if (utimensat (dst_dirfd, dst_relname, timespec,
+ AT_SYMLINK_NOFOLLOW)
+  != 0)
+{
+  error (0, errno, _("updating times for %s"),
+ quoteaf (dst_name));
+  return_val = false;
+}
+  fprintf(stderr, "utimensat succeeded\n");
+}
+  mode_already_preserved = (fc_flags & CLONE_ACL) != 0;
+  dest_desc = -1;
+  omitted_permissions = dst_mode & ~cloned_mode;
+  extra_permissions = 0;
+  goto set_dest_mode;
+}
   else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,
 dst_name,
 -1, false /* We didn't create dst  */,
@@ -1485,9 +1514,14 @@ copy_reg (char const *src_name, char const *dst_name,
 
   set_author (dst_name, dest_desc, src_sb);
 
+#if HAVE_FCLONEFILEAT && !USE_XATTR
+set_dest_mode:
+#endif
   if (x->preserve_mode || x->move_mode)
 {
-  if 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Pádraig Brady

On 10/02/2023 20:45, Paul Eggert wrote:

On 2/10/23 10:58, Pádraig Brady wrote:

I was considering "touch"ing the timestamps after also,
but it's better to just maintain them as we're
pointing to the same data after all.


For POSIX conformance we must touch if the user has specified only POSIX
options (and has not specified -p).

And it's not just a POSIX conformance issue. Ordinary users will be
surprised if plain 'cp A B' creates a file B with a timestamp from last
year.


Maybe. Though POSIX says cp "shall copy" and we're not making a copy, we're 
making a reflink.
So technically we're violating POSIX already in that regard.
One might take the view that the fact we write no new data
means we should not update the data modification time etc. by default,
and this may be more signal to a user that new data has not in fact been 
written.


Likewise for B's modes.


Yes agreed on this. We'll have to honor umask etc.


There's another complication: recent macOS versions have CLONE_ACL, and
we're not using that.


Oh right, we should consider that too.

thanks,
Pádraig






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread George Valkov


> On 2023-02-10, at 10:45 PM, Paul Eggert  wrote:
> 
> On 2/10/23 10:58, Pádraig Brady wrote:
>> I was considering "touch"ing the timestamps after also,
>> but it's better to just maintain them as we're
>> pointing to the same data after all.
> 
> For POSIX conformance we must touch if the user has specified only POSIX 
> options (and has not specified -p).
> 
> And it's not just a POSIX conformance issue. Ordinary users will be surprised 
> if plain 'cp A B' creates a file B with a timestamp from last year.
> 
> Likewise for B's modes.
> 
> There's another complication: recent macOS versions have CLONE_ACL, and we're 
> not using that.

I personally prefer using CoW as default (no parameters), because it is way 
more efficient:
could save a lot of time and space, especially when long builds or large files 
are involved.
Of course I also understand that coreutils are expected to meet a certain 
standard behaviour
among all architectures. Since the source and it’s clone have separate metadata,
it should be possible to change it on the clone, to comply with standards. On 
the other hand
It feels more natural when the metadata is kept intact for the two files. This 
is a good
indication that no changes have occurred since the clone or copy, and they are 
the same.
This approach is also consistent with how macOS Finder copies files and 
directories:
all metadata is preserved.

Here is a good example: I create a tarball on my OpenWRT router. Extract on the 
Mac.
Copy to the build directory and create new firmware. This preserves all 
permissions and
metadata. So there is no need for any special tools or command arguments, it 
just works.
In this scenario, the permissions are critical to preserve. Timestamp is a good 
indication of
when the file was last changed, setting a different value on the copy is 
confusing and makes
it very hard to track which files in a copied directory has changed since.

That’s just my personal opinion for default operation and may not match 
standards,
but it makes work natural and productive. Command line arguments should be 
respected.

I found this link in one of the mailing lists you sent, it explains what was 
wrong
with the original SEEK_DATA and SEEK_HOLE approach on macOS, and why
we can’t relay on them:
https://lists.gnu.org/archive/html/bug-gnulib/2018-09/msg00054.html

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Paul Eggert

On 2/10/23 10:58, Pádraig Brady wrote:

I was considering "touch"ing the timestamps after also,
but it's better to just maintain them as we're
pointing to the same data after all.


For POSIX conformance we must touch if the user has specified only POSIX 
options (and has not specified -p).


And it's not just a POSIX conformance issue. Ordinary users will be 
surprised if plain 'cp A B' creates a file B with a timestamp from last 
year.


Likewise for B's modes.

There's another complication: recent macOS versions have CLONE_ACL, and 
we're not using that.







bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread George Valkov

> On 2023-02-10, at 8:58 PM, Pádraig Brady  wrote:
> 
> On 10/02/2023 17:24, George Valkov wrote:
>>> On 2023-02-10, at 4:02 PM, Pádraig Brady  wrote:
>>> 
>>> On 10/02/2023 12:13, George Valkov wrote:
> On 2023-02-10, at 11:18 AM, Pádraig Brady  wrote:
> 
> I'll apply the simple patch later I think.
 I have an interesting idea: If I copy a large file, say the 16 GB disk 
 image
 where I compiled OpenWRT. So I copy this on the same filesystem, and check
 disk usage before an after: no change. Also the copy is instant. That’s 
 because
 APFS supports copy-on-write. Initially both files share the same sectors 
 on disk,
 any changes made after that are copied to new sectors.
 This is the most efficient way to copy files on APFS, and should produce 
 no corruption.
 Let’s implement it. I would assume there is a system cal that does the 
 entire copy,
 And we don’t need to read and write any data.
 Does the trace contain any interesting calls that might be related to that?
>>> 
>>> When you say "I copy a large file", is that with gcp or something else?
>> Finder: option + drag in the same directory
>>> Since coreutils 9.1 we try the CoW by default with fclonefileat()
>>> which is available since macOS 10.12 (2016).
>> I can confirms fclonefileat works on macOS 12.6.3 and solves the issue.
>> File attributes are also preserved. To clarify: these observations are from
>> the sample below. I don’t have internal experience with cloreutils, but I can
>> test it if you provide some description.
>>> Note that works only when src and dest are within the same file system,
>>> but that is the case for you if I'm reading your original report correctly.
>> Correct. fclonefileat only works on the same file system.
>> If we attempt to clone to another volume, fclonefileat fails 18 Cross-device 
>> link.
>> When building OpenWRT everything works on the same file system, so
>> fclonefileat is applicable.
>>> When I mentioned that earlier I thought your macOS version was too old to 
>>> support that,
>>> but in fact your 12.6.3 should support fclonefileat() fine.
>> Thankfully it is supported. The latest security update came in January 2023.
>>> So that's another variable. Is that call failing completely for you?
>>> You might be quicker to add a couple of printfs around the
>>> fclonefileat() calls in the coreutils from latest git you compliled.
>>> Note there is new error handling related to that call in the latest git,
>>> compared to what was released in coreutils 9.1.
>> Coreutils tests on master
>> git clone git://git.savannah.gnu.org/coreutils.git
>> mv coreutils coreutils-clone
>> cd coreutils-clone
>> git submodule foreach git pull origin master
>> ./bootstrap
>> ./configure
>> make -j 16
>> git log
>> commit d374d32ccf12f8cf33c8f823d573498b7c8b27a4 (HEAD -> clone, 
>> origin/master, origin/HEAD, master)
>> cd ..
>> ./coreutils-clone/src/cp cc1 cc1-test
>> printf("HAVE_FCLONEFILEAT %u  USE_XATTR %u\n", HAVE_FCLONEFILEAT, USE_XATTR);
>> HAVE_FCLONEFILEAT 1  USE_XATTR 0
>>   int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
>>  printf(
>>  "data_copy_required  %u  x->reflink_mode
>> %u\n"
>>  "x->preserve_mode%u  x->preserve_timestamps 
>> %u\n"
>>  "x->preserve_ownership   %u  CLONE_NOOWNERCOPY  
>> %u\n",
>>  data_copy_required, x->reflink_mode,
>>  x->preserve_mode, x->preserve_timestamps,
>>  x->preserve_ownership, CLONE_NOOWNERCOPY
>>  );
>>   if (data_copy_required && x->reflink_mode
>>   && x->preserve_mode && x->preserve_timestamps
>>   && (x->preserve_ownership || CLONE_NOOWNERCOPY))
>> {
>>  int a = fclonefileat (source_desc, dst_dirfd, 
>> dst_relname, fc_flags);
>>  int e = errno;
>>  printf("fclonefileat %i %i\n", a, e);
>>   if (a == 0)
>> goto close_src_desc;
>> data_copy_required  1  x->reflink_mode1
>> x->preserve_mode0  x->preserve_timestamps 0
>> x->preserve_ownership   0  CLONE_NOOWNERCOPY  2
>> fclonefileat is not used because x->preserve_mode = 0, 
>> x->preserve_timestamps = 0
>> That design seems wrong. The fact than no one requested to preserve that 
>> data,
>> doesn’t mean preserving it via fclonefileat is a bad thing. I think we should
>> always call fclonefileat and proceed to other means if it fails. It is more 
>> efficient, since
>> it requires no additional space or data to be copied. And always produces a 
>> valid copy.
>>> Note also I don't see any fclonefileat() syscalls in your Finder logs at 
>>> least.
>> Great news: fclonefileat works on the same file system, and if we
>> need to copy outside, then just disable sparse copy according to my patch.
>> One option is to start blindly with fclonefileat, and 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Pádraig Brady

On 10/02/2023 17:24, George Valkov wrote:



On 2023-02-10, at 4:02 PM, Pádraig Brady  wrote:

On 10/02/2023 12:13, George Valkov wrote:

On 2023-02-10, at 11:18 AM, Pádraig Brady  wrote:

I'll apply the simple patch later I think.

I have an interesting idea: If I copy a large file, say the 16 GB disk image
where I compiled OpenWRT. So I copy this on the same filesystem, and check
disk usage before an after: no change. Also the copy is instant. That’s because
APFS supports copy-on-write. Initially both files share the same sectors on 
disk,
any changes made after that are copied to new sectors.
This is the most efficient way to copy files on APFS, and should produce no 
corruption.
Let’s implement it. I would assume there is a system cal that does the entire 
copy,
And we don’t need to read and write any data.
Does the trace contain any interesting calls that might be related to that?


When you say "I copy a large file", is that with gcp or something else?


Finder: option + drag in the same directory


Since coreutils 9.1 we try the CoW by default with fclonefileat()
which is available since macOS 10.12 (2016).


I can confirms fclonefileat works on macOS 12.6.3 and solves the issue.
File attributes are also preserved. To clarify: these observations are from
the sample below. I don’t have internal experience with cloreutils, but I can
test it if you provide some description.



Note that works only when src and dest are within the same file system,
but that is the case for you if I'm reading your original report correctly.


Correct. fclonefileat only works on the same file system.
If we attempt to clone to another volume, fclonefileat fails 18 Cross-device 
link.
When building OpenWRT everything works on the same file system, so
fclonefileat is applicable.



When I mentioned that earlier I thought your macOS version was too old to 
support that,
but in fact your 12.6.3 should support fclonefileat() fine.


Thankfully it is supported. The latest security update came in January 2023.



So that's another variable. Is that call failing completely for you?
You might be quicker to add a couple of printfs around the
fclonefileat() calls in the coreutils from latest git you compliled.
Note there is new error handling related to that call in the latest git,
compared to what was released in coreutils 9.1.


Coreutils tests on master
git clone git://git.savannah.gnu.org/coreutils.git
mv coreutils coreutils-clone
cd coreutils-clone
git submodule foreach git pull origin master
./bootstrap
./configure
make -j 16
git log
commit d374d32ccf12f8cf33c8f823d573498b7c8b27a4 (HEAD -> clone, origin/master, 
origin/HEAD, master)
cd ..

./coreutils-clone/src/cp cc1 cc1-test

printf("HAVE_FCLONEFILEAT %u  USE_XATTR %u\n", HAVE_FCLONEFILEAT, USE_XATTR);
HAVE_FCLONEFILEAT 1  USE_XATTR 0

   int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
printf(
"data_copy_required  %u  x->reflink_mode
%u\n"
"x->preserve_mode%u  x->preserve_timestamps 
%u\n"
"x->preserve_ownership   %u  CLONE_NOOWNERCOPY  
%u\n",
data_copy_required, x->reflink_mode,
x->preserve_mode, x->preserve_timestamps,
x->preserve_ownership, CLONE_NOOWNERCOPY
);
   if (data_copy_required && x->reflink_mode
   && x->preserve_mode && x->preserve_timestamps
   && (x->preserve_ownership || CLONE_NOOWNERCOPY))
 {
int a = fclonefileat (source_desc, dst_dirfd, 
dst_relname, fc_flags);
int e = errno;
printf("fclonefileat %i %i\n", a, e);
   if (a == 0)
 goto close_src_desc;

data_copy_required  1  x->reflink_mode1
x->preserve_mode0  x->preserve_timestamps 0
x->preserve_ownership   0  CLONE_NOOWNERCOPY  2

fclonefileat is not used because x->preserve_mode = 0, x->preserve_timestamps = 0
That design seems wrong. The fact than no one requested to preserve that data,
doesn’t mean preserving it via fclonefileat is a bad thing. I think we should
always call fclonefileat and proceed to other means if it fails. It is more 
efficient, since
it requires no additional space or data to be copied. And always produces a 
valid copy.



Note also I don't see any fclonefileat() syscalls in your Finder logs at least.


Great news: fclonefileat works on the same file system, and if we
need to copy outside, then just disable sparse copy according to my patch.
One option is to start blindly with fclonefileat, and if that fails, fall back 
to
normal copy. If you can use the trace to find what Finder is doing, we can try 
it.
It might bring some improvements. Else, disable sparse and do a regular copy.

It was easier for me to write a small sample:

// clone.c
#include 
#include 
#include 
#include 

int main(int argc, char ** argv)
{

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread George Valkov


> On 2023-02-10, at 4:02 PM, Pádraig Brady  wrote:
> 
> On 10/02/2023 12:13, George Valkov wrote:
>>> On 2023-02-10, at 11:18 AM, Pádraig Brady  wrote:
>>> 
>>> I'll apply the simple patch later I think.
>> I have an interesting idea: If I copy a large file, say the 16 GB disk image
>> where I compiled OpenWRT. So I copy this on the same filesystem, and check
>> disk usage before an after: no change. Also the copy is instant. That’s 
>> because
>> APFS supports copy-on-write. Initially both files share the same sectors on 
>> disk,
>> any changes made after that are copied to new sectors.
>> This is the most efficient way to copy files on APFS, and should produce no 
>> corruption.
>> Let’s implement it. I would assume there is a system cal that does the 
>> entire copy,
>> And we don’t need to read and write any data.
>> Does the trace contain any interesting calls that might be related to that?
> 
> When you say "I copy a large file", is that with gcp or something else?

Finder: option + drag in the same directory

> Since coreutils 9.1 we try the CoW by default with fclonefileat()
> which is available since macOS 10.12 (2016).

I can confirms fclonefileat works on macOS 12.6.3 and solves the issue.
File attributes are also preserved. To clarify: these observations are from
the sample below. I don’t have internal experience with cloreutils, but I can
test it if you provide some description.


> Note that works only when src and dest are within the same file system,
> but that is the case for you if I'm reading your original report correctly.

Correct. fclonefileat only works on the same file system.
If we attempt to clone to another volume, fclonefileat fails 18 Cross-device 
link.
When building OpenWRT everything works on the same file system, so
fclonefileat is applicable.


> When I mentioned that earlier I thought your macOS version was too old to 
> support that,
> but in fact your 12.6.3 should support fclonefileat() fine.

Thankfully it is supported. The latest security update came in January 2023.


> So that's another variable. Is that call failing completely for you?
> You might be quicker to add a couple of printfs around the
> fclonefileat() calls in the coreutils from latest git you compliled.
> Note there is new error handling related to that call in the latest git,
> compared to what was released in coreutils 9.1.

Coreutils tests on master
git clone git://git.savannah.gnu.org/coreutils.git
mv coreutils coreutils-clone
cd coreutils-clone
git submodule foreach git pull origin master
./bootstrap
./configure
make -j 16
git log
commit d374d32ccf12f8cf33c8f823d573498b7c8b27a4 (HEAD -> clone, origin/master, 
origin/HEAD, master)
cd ..

./coreutils-clone/src/cp cc1 cc1-test

printf("HAVE_FCLONEFILEAT %u  USE_XATTR %u\n", HAVE_FCLONEFILEAT, USE_XATTR);
HAVE_FCLONEFILEAT 1  USE_XATTR 0

  int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
printf(
"data_copy_required  %u  x->reflink_mode
%u\n"
"x->preserve_mode%u  x->preserve_timestamps 
%u\n"
"x->preserve_ownership   %u  CLONE_NOOWNERCOPY  
%u\n",
data_copy_required, x->reflink_mode,
x->preserve_mode, x->preserve_timestamps,
x->preserve_ownership, CLONE_NOOWNERCOPY
);
  if (data_copy_required && x->reflink_mode
  && x->preserve_mode && x->preserve_timestamps
  && (x->preserve_ownership || CLONE_NOOWNERCOPY))
{
int a = fclonefileat (source_desc, dst_dirfd, 
dst_relname, fc_flags);
int e = errno;
printf("fclonefileat %i %i\n", a, e);
  if (a == 0)
goto close_src_desc;

data_copy_required  1  x->reflink_mode1
x->preserve_mode0  x->preserve_timestamps 0
x->preserve_ownership   0  CLONE_NOOWNERCOPY  2

fclonefileat is not used because x->preserve_mode = 0, x->preserve_timestamps = 0
That design seems wrong. The fact than no one requested to preserve that data,
doesn’t mean preserving it via fclonefileat is a bad thing. I think we should
always call fclonefileat and proceed to other means if it fails. It is more 
efficient, since
it requires no additional space or data to be copied. And always produces a 
valid copy.


> Note also I don't see any fclonefileat() syscalls in your Finder logs at 
> least.

Great news: fclonefileat works on the same file system, and if we
need to copy outside, then just disable sparse copy according to my patch.
One option is to start blindly with fclonefileat, and if that fails, fall back 
to
normal copy. If you can use the trace to find what Finder is doing, we can try 
it.
It might bring some improvements. Else, disable sparse and do a regular copy.

It was easier for me to write a small sample:

// clone.c
#include 
#include 
#include 
#include 

int main(int argc, 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Pádraig Brady

On 10/02/2023 12:13, George Valkov wrote:



On 2023-02-10, at 11:18 AM, Pádraig Brady  wrote:

I'll apply the simple patch later I think.


I have an interesting idea: If I copy a large file, say the 16 GB disk image
where I compiled OpenWRT. So I copy this on the same filesystem, and check
disk usage before an after: no change. Also the copy is instant. That’s because
APFS supports copy-on-write. Initially both files share the same sectors on 
disk,
any changes made after that are copied to new sectors.
This is the most efficient way to copy files on APFS, and should produce no 
corruption.
Let’s implement it. I would assume there is a system cal that does the entire 
copy,
And we don’t need to read and write any data.
Does the trace contain any interesting calls that might be related to that?


When you say "I copy a large file", is that with gcp or something else?
Since coreutils 9.1 we try the CoW by default with fclonefileat()
which is available since macOS 10.12 (2016).
Note that works only when src and dest are within the same file system,
but that is the case for you if I'm reading your original report correctly.
When I mentioned that earlier I thought your macOS version was too old to 
support that,
but in fact your 12.6.3 should support fclonefileat() fine.
So that's another variable. Is that call failing completely for you?
You might be quicker to add a couple of printfs around the
fclonefileat() calls in the coreutils from latest git you compliled.
Note there is new error handling related to that call in the latest git,
compared to what was released in coreutils 9.1.
Note also I don't see any fclonefileat() syscalls in your Finder logs at least.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread George Valkov


> On 2023-02-10, at 11:18 AM, Pádraig Brady  wrote:
> 
> On 10/02/2023 03:57, Paul Eggert wrote:
>> On 2/9/23 01:20, George Valkov wrote:
>>> -#ifdef SEEK_HOLE
>>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>> Instead of always disabling the SEEK_HOLE optimization, how about doing
>> it only on APFS files? Something like the attached, perhaps (this is
>> against Savannah master).
>> If APFS is pretty much universal now on Apple platforms, this patch is
>> overkill and we should use your much-simpler patch.

Hi Paul,
I compiled your patch against git://git.savannah.gnu.org/coreutils.git
and it fixes the problem. But as I mention later, I think we should not use it.
Thank you!

> Thanks for doing that Paul.
> I think the simpler patch is best though
> since APFS is the default macOS file system since 2017,
> and upon upgrade earlier file systems are upgraded to that.

I agree. macOS supports HFS+ and APFS.
According to Wikipedia, HFS+ does not support sparse files.
On APFS we should not use SEEK_DATA since it is broken.
So the simple patch is better.


> Also SEEK_DATA doesn't seem to be a well supported feature
> on macOS anyway as there was also this issue:
> https://bugs.gnu.org/51857
> 
> I'll apply the simple patch later I think.

I have an interesting idea: If I copy a large file, say the 16 GB disk image
where I compiled OpenWRT. So I copy this on the same filesystem, and check
disk usage before an after: no change. Also the copy is instant. That’s because
APFS supports copy-on-write. Initially both files share the same sectors on 
disk,
any changes made after that are copied to new sectors.
This is the most efficient way to copy files on APFS, and should produce no 
corruption.
Let’s implement it. I would assume there is a system cal that does the entire 
copy,
And we don’t need to read and write any data.
Does the trace contain any interesting calls that might be related to that?


Regarding dtruss, on macOS there is integrity protection which prevents you
from debugging applications that came with macOS such as fined. I read
that in order to do this, integrity protection has to be disabled from recovery,
then copy the application to a non-system location, e.g. ~/ and finally
remove code signature. That’s why the traces I provided are limited, and it gets
tricky to do all of this, considering Finder runs all the time and the system
will restart it if it gets killed. But we need to run the copy. With that said, 
if you want,
I will give it a try.

dtruss [-acdefholLs] [-t syscall] { -p PID | -n name | command | -W name }
-p PID  # examine this PID
-n name # examine this process name
-t syscall  # examine this syscall only
-W name # wait for a process matching this name
-a  # print all details
-c  # print syscall counts
-d  # print relative times (us)
-e  # print elapsed times (us)
-f  # follow children
-l  # force printing pid/lwpid
-o  # print on cpu times
-s  # print stack backtraces
-L  # don't print pid/lwpid
-b bufsize  # dynamic variable buf size


Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Pádraig Brady

On 10/02/2023 03:57, Paul Eggert wrote:

On 2/9/23 01:20, George Valkov wrote:

-#ifdef SEEK_HOLE
+#if defined(SEEK_HOLE) && !defined(__APPLE__)


Instead of always disabling the SEEK_HOLE optimization, how about doing
it only on APFS files? Something like the attached, perhaps (this is
against Savannah master).

If APFS is pretty much universal now on Apple platforms, this patch is
overkill and we should use your much-simpler patch.


Thanks for doing that Paul.
I think the simpler patch is best though
since APFS is the default macOS file system since 2017,
and upon upgrade earlier file systems are upgraded to that.

Also SEEK_DATA doesn't seem to be a well supported feature
on macOS anyway as there was also this issue:
https://bugs.gnu.org/51857

I'll apply the simple patch later I think.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-10 Thread Pádraig Brady

On 10/02/2023 01:03, George Valkov wrote:




On 2023-02-09, at 11:35 PM, Pádraig Brady  wrote:

On 09/02/2023 17:23, George Valkov wrote:

On 2023-02-09, at 6:32 PM, Pádraig Brady  wrote:

On 09/02/2023 15:57, George Valkov wrote:

On 2023-02-09, at 1:56 PM, Pádraig Brady  wrote:

On 09/02/2023 09:20, George Valkov wrote:

Due to a bug in macOS, sparse copies are corrupted on virtual disks
formatted with APFS. HFS is not affected. Affected are coreutils
install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
While reading the entire file returns valid data, scanning for
allocated segments may return holes where valid data is present.
In this case a sparse copy does not contain these segments and returns
zeroes instead. Once the virtual disk is dismounted and then
mounted again, a sparse copy produces correct results.
This breaks OpenWRT build on macOS. Details:
https://github.com/openwrt/openwrt/pull/11960
https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
Signed-off-by: Georgi Valkov 
---
  src/copy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/copy.c b/src/copy.c
index e16fedb28..4f56138a6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
return PLAIN_SCANTYPE;
  }
  -#ifdef SEEK_HOLE
+#if defined(SEEK_HOLE) && !defined(__APPLE__)
scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
if (0 <= scan_inference->ext_start || errno == ENXIO)
  return LSEEK_SCANTYPE;



Thanks for the detailed report.
The patch might very well be appropriate.

Hi! Let’s test the ideas you have first, and fall-back to the patch.
In October 2021 macOS Finder was also affected, and that points directly at 
Apple.
I tested again today, they have fixed Finder. After performing a copy in Finder,
coreutils cp produces a good copy. I have to run
make toolchain/gcc/initial/{clean,compile} -j 16
before I can reproduce it again with the same file.
So while Apple didn't fix the underlaying issue with APFS, they did
provide a solution for Finder. And we can make coreutils work correctly too.


That suggests that Finder may have sync'd the file.
Now syncing has overheads of course, so not an option to take lightly.

#include "stdio.h"
#include "unistd.h"
#include "fcntl.h"
int main(int argc, char ** argv)
{
 sync();
 int fd = open("cc1", O_RDWR);
 //int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
 int a = fdatasync(fd);
 int b = fsync(fd);
 int c = close(fd);
 printf("fdatasync %u  fsync %u  close %u\n", a, b, c);
 return 0;
}
fdatasync 0  fsync 0  close 0
I agree. It takes about one second to run this code.
All calls are successful. The sparse copy after that is still corrupted.
I also tried doing this on the sparse image while it is mounted.


Thanks for confirming the sync doesn't help.


We may be safer just doing the normal copy on __APPLE__ as per your orig patch.
A dtruss of Finder might be instructive BTW.

Yes, sync is definitely not good for performance. What is ‘dtruss'?


That would show the system calls that Finder is using to perform the copy.
Perhaps Finder is also just avoiding the SEEK_DATA on APFS?


I ran 3 traces, let me know if they are useful? If not, I will have to disable
integrity protection, and collect new data tomorrow. I got errors:
dtruss: system integrity protection is on, some features will not be available
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/dtruss/

You did not mention which flags you want with dtruss?



Without further info, I'll apply your original avoidance patch for __APPLE__.


It’s up to you, as long as you have ideas, I can test them.
I can also invite you on Parsec or Anydesk. Working together is more efficient.


I've only used dtruss a couple of times (with sudo)
as I'm not a macOS user, sorry.

Those 3 traces only had a single lseek() each,
so I'm assuming for now that Finder is not using SEEK_DATA.

Will apply your patch later.

thanks,
Pádraig






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-09 Thread Paul Eggert

On 2/9/23 01:20, George Valkov wrote:

-#ifdef SEEK_HOLE
+#if defined(SEEK_HOLE) && !defined(__APPLE__)


Instead of always disabling the SEEK_HOLE optimization, how about doing 
it only on APFS files? Something like the attached, perhaps (this is 
against Savannah master).


If APFS is pretty much universal now on Apple platforms, this patch is 
overkill and we should use your much-simpler patch.From bd8c931fade7fcf96c02e4492912f27a2d1c795b Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Thu, 9 Feb 2023 19:53:59 -0800
Subject: [PATCH] cp: work around APFS bug

See .
* src/copy.c (SEEK_HOLE && __APPLE__):
Include , for statfs.
(infer_scantype) [SEEK_HOLE && __APPLE__):
Don't use SEEK_HOLE on APFS, as APFS is buggy.
---
 src/copy.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/copy.c b/src/copy.c
index dfbb557de..95151fac0 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -65,6 +65,10 @@
 #include "yesno.h"
 #include "selinux.h"
 
+#if defined SEEK_HOLE && defined __APPLE__
+# include 
+#endif
+
 #ifndef USE_XATTR
 # define USE_XATTR false
 #endif
@@ -1063,6 +1067,26 @@ infer_scantype (int fd, struct stat const *sb,
 return PLAIN_SCANTYPE;
 
 #ifdef SEEK_HOLE
+# ifdef __APPLE__
+  /* Work around APFS bug  by not using
+ SEEK_DATA on APFS files.  To avoid fstatfs calls in common cases,
+ APFS_DEV[0] caches the most recent non-APFS st_dev, and APFS_DEV[1]
+ caches the most recent APFS st_dev.  Zero caches are invalid.  */
+  static dev_t apfs_dev[2];
+  if (! (sb->st_dev && sb->st_dev == apfs_dev[0]))
+{
+  if (sb->st_dev && sb->st_dev == apfs_dev[1])
+return ZERO_SCANTYPE;
+  struct statfs fs;
+  if (fstatfs (fd, ) != 0)
+return ERROR_SCANTYPE;
+  bool apfs = strcmp (fs.f_fstypename, "apfs") == 0;
+  if (sb->st_dev)
+apfs_dev[apfs] = sb->st_dev;
+  if (apfs)
+return ZERO_SCANTYPE;
+}
+# endif
   off_t ext_start = lseek (fd, 0, SEEK_DATA);
   if (0 <= ext_start || errno == ENXIO)
 {
-- 
2.39.1



bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-09 Thread George Valkov



> On 2023-02-09, at 11:35 PM, Pádraig Brady  wrote:
> 
> On 09/02/2023 17:23, George Valkov wrote:
>>> On 2023-02-09, at 6:32 PM, Pádraig Brady  wrote:
>>> 
>>> On 09/02/2023 15:57, George Valkov wrote:
> On 2023-02-09, at 1:56 PM, Pádraig Brady  wrote:
> 
> On 09/02/2023 09:20, George Valkov wrote:
>> Due to a bug in macOS, sparse copies are corrupted on virtual disks
>> formatted with APFS. HFS is not affected. Affected are coreutils
>> install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
>> While reading the entire file returns valid data, scanning for
>> allocated segments may return holes where valid data is present.
>> In this case a sparse copy does not contain these segments and returns
>> zeroes instead. Once the virtual disk is dismounted and then
>> mounted again, a sparse copy produces correct results.
>> This breaks OpenWRT build on macOS. Details:
>> https://github.com/openwrt/openwrt/pull/11960
>> https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
>> Signed-off-by: Georgi Valkov 
>> ---
>>  src/copy.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> diff --git a/src/copy.c b/src/copy.c
>> index e16fedb28..4f56138a6 100644
>> --- a/src/copy.c
>> +++ b/src/copy.c
>> @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
>>return PLAIN_SCANTYPE;
>>  }
>>  -#ifdef SEEK_HOLE
>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>>scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>if (0 <= scan_inference->ext_start || errno == ENXIO)
>>  return LSEEK_SCANTYPE;
> 
> 
> Thanks for the detailed report.
> The patch might very well be appropriate.
 Hi! Let’s test the ideas you have first, and fall-back to the patch.
 In October 2021 macOS Finder was also affected, and that points directly 
 at Apple.
 I tested again today, they have fixed Finder. After performing a copy in 
 Finder,
 coreutils cp produces a good copy. I have to run
 make toolchain/gcc/initial/{clean,compile} -j 16
 before I can reproduce it again with the same file.
 So while Apple didn't fix the underlaying issue with APFS, they did
 provide a solution for Finder. And we can make coreutils work correctly 
 too.
>>> 
>>> That suggests that Finder may have sync'd the file.
>>> Now syncing has overheads of course, so not an option to take lightly.
>> #include "stdio.h"
>> #include "unistd.h"
>> #include "fcntl.h"
>> int main(int argc, char ** argv)
>> {
>> sync();
>> int fd = open("cc1", O_RDWR);
>> //int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
>> int a = fdatasync(fd);
>> int b = fsync(fd);
>> int c = close(fd);
>> printf("fdatasync %u  fsync %u  close %u\n", a, b, c);
>> return 0;
>> }
>> fdatasync 0  fsync 0  close 0
>> I agree. It takes about one second to run this code.
>> All calls are successful. The sparse copy after that is still corrupted.
>> I also tried doing this on the sparse image while it is mounted.
> 
> Thanks for confirming the sync doesn't help.
> 
>>> We may be safer just doing the normal copy on __APPLE__ as per your orig 
>>> patch.
>>> A dtruss of Finder might be instructive BTW.
>> Yes, sync is definitely not good for performance. What is ‘dtruss'?
> 
> That would show the system calls that Finder is using to perform the copy.
> Perhaps Finder is also just avoiding the SEEK_DATA on APFS?

I ran 3 traces, let me know if they are useful? If not, I will have to disable
integrity protection, and collect new data tomorrow. I got errors:
dtruss: system integrity protection is on, some features will not be available
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/dtruss/

You did not mention which flags you want with dtruss?


> Without further info, I'll apply your original avoidance patch for __APPLE__.

It’s up to you, as long as you have ideas, I can test them.
I can also invite you on Parsec or Anydesk. Working together is more efficient.

Cheers, mate!

Georgi Valkov
httpstorm.com
nano RTOS






bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-09 Thread Pádraig Brady

On 09/02/2023 17:23, George Valkov wrote:




On 2023-02-09, at 6:32 PM, Pádraig Brady  wrote:

On 09/02/2023 15:57, George Valkov wrote:

On 2023-02-09, at 1:56 PM, Pádraig Brady  wrote:

On 09/02/2023 09:20, George Valkov wrote:

Due to a bug in macOS, sparse copies are corrupted on virtual disks
formatted with APFS. HFS is not affected. Affected are coreutils
install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
While reading the entire file returns valid data, scanning for
allocated segments may return holes where valid data is present.
In this case a sparse copy does not contain these segments and returns
zeroes instead. Once the virtual disk is dismounted and then
mounted again, a sparse copy produces correct results.
This breaks OpenWRT build on macOS. Details:
https://github.com/openwrt/openwrt/pull/11960
https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
Signed-off-by: Georgi Valkov 
---
  src/copy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/copy.c b/src/copy.c
index e16fedb28..4f56138a6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
return PLAIN_SCANTYPE;
  }
  -#ifdef SEEK_HOLE
+#if defined(SEEK_HOLE) && !defined(__APPLE__)
scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
if (0 <= scan_inference->ext_start || errno == ENXIO)
  return LSEEK_SCANTYPE;



Thanks for the detailed report.
The patch might very well be appropriate.

Hi! Let’s test the ideas you have first, and fall-back to the patch.
In October 2021 macOS Finder was also affected, and that points directly at 
Apple.
I tested again today, they have fixed Finder. After performing a copy in Finder,
coreutils cp produces a good copy. I have to run
make toolchain/gcc/initial/{clean,compile} -j 16
before I can reproduce it again with the same file.
So while Apple didn't fix the underlaying issue with APFS, they did
provide a solution for Finder. And we can make coreutils work correctly too.


That suggests that Finder may have sync'd the file.
Now syncing has overheads of course, so not an option to take lightly.


#include "stdio.h"
#include "unistd.h"
#include "fcntl.h"

int main(int argc, char ** argv)
{
 sync();

 int fd = open("cc1", O_RDWR);
 //int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
 int a = fdatasync(fd);
 int b = fsync(fd);
 int c = close(fd);

 printf("fdatasync %u  fsync %u  close %u\n", a, b, c);

 return 0;
}

fdatasync 0  fsync 0  close 0

I agree. It takes about one second to run this code.
All calls are successful. The sparse copy after that is still corrupted.
I also tried doing this on the sparse image while it is mounted.


Thanks for confirming the sync doesn't help.


We may be safer just doing the normal copy on __APPLE__ as per your orig patch.
A dtruss of Finder might be instructive BTW.


Yes, sync is definitely not good for performance. What is ‘dtruss'?


That would show the system calls that Finder is using to perform the copy.
Perhaps Finder is also just avoiding the SEEK_DATA on APFS?

Without further info, I'll apply your original avoidance patch for __APPLE__.

cheers,
Pádraig.





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-09 Thread George Valkov



> On 2023-02-09, at 6:32 PM, Pádraig Brady  wrote:
> 
> On 09/02/2023 15:57, George Valkov wrote:
>>> On 2023-02-09, at 1:56 PM, Pádraig Brady  wrote:
>>> 
>>> On 09/02/2023 09:20, George Valkov wrote:
 Due to a bug in macOS, sparse copies are corrupted on virtual disks
 formatted with APFS. HFS is not affected. Affected are coreutils
 install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
 While reading the entire file returns valid data, scanning for
 allocated segments may return holes where valid data is present.
 In this case a sparse copy does not contain these segments and returns
 zeroes instead. Once the virtual disk is dismounted and then
 mounted again, a sparse copy produces correct results.
 This breaks OpenWRT build on macOS. Details:
 https://github.com/openwrt/openwrt/pull/11960
 https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
 Signed-off-by: Georgi Valkov 
 ---
  src/copy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 diff --git a/src/copy.c b/src/copy.c
 index e16fedb28..4f56138a6 100644
 --- a/src/copy.c
 +++ b/src/copy.c
 @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
return PLAIN_SCANTYPE;
  }
  -#ifdef SEEK_HOLE
 +#if defined(SEEK_HOLE) && !defined(__APPLE__)
scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
if (0 <= scan_inference->ext_start || errno == ENXIO)
  return LSEEK_SCANTYPE;
>>> 
>>> 
>>> Thanks for the detailed report.
>>> The patch might very well be appropriate.
>> Hi! Let’s test the ideas you have first, and fall-back to the patch.
>> In October 2021 macOS Finder was also affected, and that points directly at 
>> Apple.
>> I tested again today, they have fixed Finder. After performing a copy in 
>> Finder,
>> coreutils cp produces a good copy. I have to run
>> make toolchain/gcc/initial/{clean,compile} -j 16
>> before I can reproduce it again with the same file.
>> So while Apple didn't fix the underlaying issue with APFS, they did
>> provide a solution for Finder. And we can make coreutils work correctly too.
> 
> That suggests that Finder may have sync'd the file.
> Now syncing has overheads of course, so not an option to take lightly.

#include "stdio.h"
#include "unistd.h"
#include "fcntl.h"

int main(int argc, char ** argv)
{
sync();

int fd = open("cc1", O_RDWR);
//int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
int a = fdatasync(fd);
int b = fsync(fd);
int c = close(fd);

printf("fdatasync %u  fsync %u  close %u\n", a, b, c);

return 0;
}

fdatasync 0  fsync 0  close 0

I agree. It takes about one second to run this code.
All calls are successful. The sparse copy after that is still corrupted.
I also tried doing this on the sparse image while it is mounted.

> We may be safer just doing the normal copy on __APPLE__ as per your orig 
> patch.
> A dtruss of Finder might be instructive BTW.

Yes, sync is definitely not good for performance. What is ‘dtruss'?

> Why I suspect syncing is that old fiemap code we used to have in copy
> needed the FIEMAP_FLAG_SYNC flag to avoid bugs like this on some file systems.

I’m not familiar with the coreutils code. Check an old commit.


>>> We avoided copy_file_range() for a long time on Linux for example
>>> until it became usable.
>>> 
>>> Thanks for confirming the latest git is also impacted.
>> Yes, I tested on master today.
>>> I see you're on macOS 12.
>>> macOS 13 supports fclonefileat() which may
>>> avoid the issue, or also be impacted?
>> Yes, I use macOS 12. There are too many complains about 13.
>> I can boot macOS 13 recovery, so that might be an option,
>> if I can setup a working build environment.
>> Can you invite someone with macOS 13 to test the new API?
>> Github has hosted runners. But I’ve never used one. And I think they use 
>> macOS 12.
> 
> OK let's assume fclonefileat() on macOS 13 is fine.
> That's just a thing to consider if one is testing on macOS 13,
> rather than jumping through hoops to test it.
> 
>>> I wonder might an fdatasync(fd) avoid your issue,
>>> which we might do if defined(__APPLE__)?
>> Send me patches, and I will test them.
>> #ifdef SEEK_HOLE
>>   fdatasync(fd); // this did not work, is fd writable? Do we need to call it 
>> on the disk image?
>>scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>if (0 <= scan_inference->ext_start || errno == ENXIO)
>>  return LSEEK_SCANTYPE;
> 
> 
> Oh right interesting. When you say doesn't work, do you mean the sync
> fails, or just doesn't help.  Note the separate coreutils sync util
> only opens with write mode on AIX and Cygwin.

fdatasync returns 0, which means success
The sparse copy after that is still corrupted.


> Relatedly you could just try that external `sync file && cp file dest` also 
> to test the sync hypothesis.

I tried these on cc1 and on the sparse image 

bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-09 Thread Pádraig Brady

On 09/02/2023 15:57, George Valkov wrote:




On 2023-02-09, at 1:56 PM, Pádraig Brady  wrote:

On 09/02/2023 09:20, George Valkov wrote:

Due to a bug in macOS, sparse copies are corrupted on virtual disks
formatted with APFS. HFS is not affected. Affected are coreutils
install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
While reading the entire file returns valid data, scanning for
allocated segments may return holes where valid data is present.
In this case a sparse copy does not contain these segments and returns
zeroes instead. Once the virtual disk is dismounted and then
mounted again, a sparse copy produces correct results.
This breaks OpenWRT build on macOS. Details:
https://github.com/openwrt/openwrt/pull/11960
https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
Signed-off-by: Georgi Valkov 
---
  src/copy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/copy.c b/src/copy.c
index e16fedb28..4f56138a6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
return PLAIN_SCANTYPE;
  }
  -#ifdef SEEK_HOLE
+#if defined(SEEK_HOLE) && !defined(__APPLE__)
scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
if (0 <= scan_inference->ext_start || errno == ENXIO)
  return LSEEK_SCANTYPE;



Thanks for the detailed report.
The patch might very well be appropriate.


Hi! Let’s test the ideas you have first, and fall-back to the patch.

In October 2021 macOS Finder was also affected, and that points directly at 
Apple.
I tested again today, they have fixed Finder. After performing a copy in Finder,
coreutils cp produces a good copy. I have to run
make toolchain/gcc/initial/{clean,compile} -j 16
before I can reproduce it again with the same file.

So while Apple didn't fix the underlaying issue with APFS, they did
provide a solution for Finder. And we can make coreutils work correctly too.


That suggests that Finder may have sync'd the file.
Now syncing has overheads of course, so not an option to take lightly.
We may be safer just doing the normal copy on __APPLE__ as per your orig patch.
A dtruss of Finder might be instructive BTW.
Why I suspect syncing is that old fiemap code we used to have in copy
needed the FIEMAP_FLAG_SYNC flag to avoid bugs like this on some file systems.


We avoided copy_file_range() for a long time on Linux for example
until it became usable.

Thanks for confirming the latest git is also impacted.


Yes, I tested on master today.


I see you're on macOS 12.
macOS 13 supports fclonefileat() which may
avoid the issue, or also be impacted?


Yes, I use macOS 12. There are too many complains about 13.
I can boot macOS 13 recovery, so that might be an option,
if I can setup a working build environment.
Can you invite someone with macOS 13 to test the new API?
Github has hosted runners. But I’ve never used one. And I think they use macOS 
12.


OK let's assume fclonefileat() on macOS 13 is fine.
That's just a thing to consider if one is testing on macOS 13,
rather than jumping through hoops to test it.


I wonder might an fdatasync(fd) avoid your issue,
which we might do if defined(__APPLE__)?


Send me patches, and I will test them.

#ifdef SEEK_HOLE
   fdatasync(fd); // this did not work, is fd writable? Do we need to call it 
on the disk image?
scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
if (0 <= scan_inference->ext_start || errno == ENXIO)
  return LSEEK_SCANTYPE;



Oh right interesting. When you say doesn't work, do you mean the sync
fails, or just doesn't help.  Note the separate coreutils sync util
only opens with write mode on AIX and Cygwin.

Relatedly you could just try that external `sync file && cp file dest` also to 
test the sync hypothesis.

If the sync doesn't work, or requires write mode
then that might be enough to decide to just avoid SEEK_DATA on __APPLE__ for 
now.


It would be best if we test on an Intel Mac with T2 chip. I use MBP 2019 16”.
That security chip and the encrypted physical disk with APFS might also affect 
our results.

Georgi Valkov
httpstorm.com
nano RTOS

off-topic: can you please point me at API to take a disk offline or lock it for 
exclusive access?
I wrote a disk cloning tool and I have that functionality on Windows, I’d like 
to add it to Linux, BSD and macOS.



Note it's best to keep this on list.

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-09 Thread Pádraig Brady

On 09/02/2023 09:20, George Valkov wrote:

Due to a bug in macOS, sparse copies are corrupted on virtual disks
formatted with APFS. HFS is not affected. Affected are coreutils
install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.

While reading the entire file returns valid data, scanning for
allocated segments may return holes where valid data is present.
In this case a sparse copy does not contain these segments and returns
zeroes instead. Once the virtual disk is dismounted and then
mounted again, a sparse copy produces correct results.

This breaks OpenWRT build on macOS. Details:
https://github.com/openwrt/openwrt/pull/11960
https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579

Signed-off-by: Georgi Valkov 
---
  src/copy.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/copy.c b/src/copy.c
index e16fedb28..4f56138a6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
return PLAIN_SCANTYPE;
  }
  
-#ifdef SEEK_HOLE

+#if defined(SEEK_HOLE) && !defined(__APPLE__)
scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
if (0 <= scan_inference->ext_start || errno == ENXIO)
  return LSEEK_SCANTYPE;



Thanks for the detailed report.
The patch might very well be appropriate.
We avoided copy_file_range() for a long time on Linux for example
until it became usable.

Thanks for confirming the latest git is also impacted.

I see you're on macOS 12.
macOS 13 supports fclonefileat() which may
avoid the issue, or also be impacted?

I wonder might an fdatasync(fd) avoid your issue,
which we might do if defined(__APPLE__)?

cheers,
Pádraig





bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS

2023-02-09 Thread George Valkov
Due to a bug in macOS, sparse copies are corrupted on virtual disks
formatted with APFS. HFS is not affected. Affected are coreutils
install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.

While reading the entire file returns valid data, scanning for
allocated segments may return holes where valid data is present.
In this case a sparse copy does not contain these segments and returns
zeroes instead. Once the virtual disk is dismounted and then
mounted again, a sparse copy produces correct results.

This breaks OpenWRT build on macOS. Details:
https://github.com/openwrt/openwrt/pull/11960
https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579

Signed-off-by: Georgi Valkov 
---
 src/copy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/copy.c b/src/copy.c
index e16fedb28..4f56138a6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
   return PLAIN_SCANTYPE;
 }
 
-#ifdef SEEK_HOLE
+#if defined(SEEK_HOLE) && !defined(__APPLE__)
   scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
   if (0 <= scan_inference->ext_start || errno == ENXIO)
 return LSEEK_SCANTYPE;
-- 
2.39.1


The issue is most likely within APFS, So we need to bring it to Apple's 
attention.
Since it is triggered by building `gcc`, it would be helpful
if they can shine some light how their tools are build
and what trigger this, in order to create PoC code.

**steps to reproduce**

_Create a sparse disk image with APFS_
- open Disk Utility
- press Command-N to create a new virtual disk
- Image Format: sparse disk image
- save as: ~/vhd/test
- Name: test
- Size: 50 GB
- Format: APFS (Case-sensitive)
- Encryption: none
- Partitions: Single partition - GUID Partition Map
- Save
- the image is mounted under `/Volumes/test`

_clone and build coreutils_
``` bash
cd /Volumes/test
# patched with sparse support disabled on macOS
git clone https://github.com/httpstorm/coreutils.git
mv coreutils coreutils-no-sparse
cd coreutils-no-sparse
git checkout macos_disable-sparse-copy
git submodule foreach git pull origin master
./bootstrap
make -j 16
cd ..

# original code
git clone https://github.com/coreutils/coreutils.git
cd coreutils
git submodule foreach git pull origin master
./bootstrap
make -j 16
```

_download OpenWRT and build gcc_
``` bash
cd /Volumes/test
git clone https://github.com/openwrt/openwrt.git
cd openwrt
./scripts/feeds update -a
./scripts/feeds install -a
make defconfig
make menuconfig
# Target System (Marvell EBU Armada)
# Target Profile (Linksys WRT3200ACM)
# Save, Exit
make tools/compile -j 16
make toolchain/gcc/initial/compile -j 16
```

proof of concept
``` bash
cd /Volumes/test/openwrt
cd 
build_dir/toolchain-arm_cortex-a9+vfpv3-d16_gcc-12.2.0_musl_eabi/gcc-12.2.0-initial/gcc

/Volumes/test/coreutils/src/cp cc1 cc1-ori
./cc1-ori --help
-bash: ./cc1-ori: cannot execute binary file: Exec format error

/Volumes/test/coreutils-no-sparse/src/cp cc1 cc1-fix
./cc1-fix --help
# works correctly

sha1sum cc1
# fcf5d9458f54d5588efc5a76a89388925ca04eda  cc1
sha1sum cc1-fix 
# fcf5d9458f54d5588efc5a76a89388925ca04eda  cc1-fix
sha1sum cc1-ori
# 7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-ori
```


Kind regards,

Georgi Valkov
httpstorm.com
nano RTOS