bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-08 Thread Paul Eggert

On 6/8/23 04:28, Arsen Arsenović wrote:

Please reconsider dropping the configure-time version check.


I already dropped it, so this discussion is more about the general 
principle than this particular case.



It is entirely reasonable to expect that a user will roll back a kernel
update, as there are many reasons one might have to do so.


Sure, but if they do so, and if programs use configure-time checks whose 
results depend on the kernel version, users will have to rebuild and 
reconfigure these programs.


That's just a fact of life. This is not just a Gnulib issue. Many 
programs do lots of configure-time checks. Nobody that I know of has 
time to audit them all.


*Usually* rolling back the kernel will work, because *usually* it's a 
small rollback and programs built for a newer kernel won't exercise the 
small area where the kernels differ in a way that will cause 
user-visible symptoms.


But in general it does not work, and cannot reasonably be expected to
work.

So: when in doubt, rebuild. Of course if you've carefully audited the 
program and know that its configure-time tests are valid for the older 
kernel, you can skip the rebuild. Or if building speed is more important 
to you than reliable applications, you can also skip the rebuild.







bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-08 Thread Arsen Arsenović via GNU coreutils Bug Reports

Paul Eggert  writes:

> That's not surprising, as this sort of problem arises only when building for a
> newer platform yields a package that will run incorrectly on an older
> platform. Problems like these are relatively rare if the only such mismatch is
> the Linux kernel version, because current glibc explicitly supports building
> for Linux kernel>=3.2, even when glibc is built on newer kernels, these days
> nobody doing this sort of thing cares about kernels older than 3.2, and most
> packages rely entirely on glibc to access the kernel. There are exceptional
> packages, though, and you may run into problems with those exceptions.
>
> If users build for newer platforms and it usually works on older platforms,
> that's great! But you might want to document that it might not work. Because 
> it
> might not.

In this instance, we have a configure-time test that makes a false
assumption (namely, that linux-headers version matches the linux
version), preventing a very valid runtime test from being emitted.  The
cost for enabling this check is removing a dozen or so lines of code in
copy-file-range.m4 (and the subsequent addition of a few instructions
and a single syscall that is executed once).

It is entirely reasonable to expect that a user will roll back a kernel
update, as there are many reasons one might have to do so.

This is not quite comparable to downgrading libraries (which will,
reasonably, break as a result), because the kernel<->userspace boundary
is of very different nature to the library<->* boundary (namely, the
former is far more 'loose', in the sense that failure to implement
something manifests as a runtime failure rather than a build-time, or
even rtld-time failure).  It is no coincidence that glibc explicitly
supports kernels older than the linux-headers it's being built with.

I'd understand not wanting to support this use-case in the cases where
doing so is difficult, but here, it's more difficult to not support this
use-case than to support it.

Please reconsider dropping the configure-time version check.

Thanks in advance, have a lovely day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-07 Thread Paul Eggert

On 2023-06-07 12:21, Sam James wrote:


I don't see how this is compatible with what I explained, given you
can't have parallel linux-headers installs.

Nobody uses glibc in a vacuum, either, so it's not particularly
meaningful to say you can use glibc but nothing else in that way.

If you object to this, please bring it up on the glibc mailing list
to discuss revising the guidance.


There must be some miscommunication here. I'm not objecting to how glibc 
is built, and I don't see why the glibc mailing list would be interested 
in this problem as it's not their problem.




We can't enforce
what kernels users are running, and we can't simultaneously make the
users install the headers for the kernel they just built manually while
also having things owned and controlled by the package manager.

We're also very much not the only distribution doing this - Arch
usually ships the latest linux-headers, and Alpine does as well.
And we've done it for years with very, very few issues.


That's not surprising, as this sort of problem arises only when building 
for a newer platform yields a package that will run incorrectly on an 
older platform. Problems like these are relatively rare if the only such 
mismatch is the Linux kernel version, because current glibc explicitly 
supports building for Linux kernel>=3.2, even when glibc is built on 
newer kernels, these days nobody doing this sort of thing cares about 
kernels older than 3.2, and most packages rely entirely on glibc to 
access the kernel. There are exceptional packages, though, and you may 
run into problems with those exceptions.


If users build for newer platforms and it usually works on older 
platforms, that's great! But you might want to document that it might 
not work. Because it might not.






bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-07 Thread Sam James

Paul Eggert  writes:

> On 2023-06-06 03:02, Sam James wrote:
>> Thanks Paul. Do you know if there's any other cases of this in gnulib?
>
> There's no other use of linux/version.h in Gnulib. However, this
> doesn't mean there are no other instances of the problem, as it's
> common to assume that you'll run on a host no older than the build
> host. Every use of Autoconf macros like AC_RUN_IFELSE assume this sort
> of thing, and I would not be surprised if there are other examples of
> this more-general problem when some builders (unwisely) build apps
> intended to run on platforms older than the build platform. (For what
> it's worth, I count 331 uses of AC_RUN_IFELSE in Gnulib; it'd be a
> pain to audit them all.)
>
> It's OK to build glibc for Linux kernels somewhat older than the build
> kernel, since glibc explicitly exports that. But it's not OK to build
> other packages that way, since they don't support it. You'll get away
> with it in many cases, but when it doesn't work then the problem is on
> the builder, not on the packages being built.

I don't see how this is compatible with what I explained, given you
can't have parallel linux-headers installs.

Nobody uses glibc in a vacuum, either, so it's not particularly
meaningful to say you can use glibc but nothing else in that way.

If you object to this, please bring it up on the glibc mailing list
to discuss revising the guidance. But again, I genuinely don't see
how anything other than the status quo could work. We can't enforce
what kernels users are running, and we can't simultaneously make the
users install the headers for the kernel they just built manually while
also having things owned and controlled by the package manager.

We're also very much not the only distribution doing this - Arch
usually ships the latest linux-headers, and Alpine does as well.
And we've done it for years with very, very few issues.

>
>
>> (or whether, by chance, if you know off the top of your head if any
>> other gnulib consumers definitely use copy_file_range?)
>
> Other coreutils programs like 'install' use copy_file_range.
>
> Emacs uses it. I just now installed the following patch into
> bleeding-edge Emacs to work around the problem, by updating to current
> Gnulib. This workaround should appear in Emacs 30, whenever that
> happens (not for a year or two I'd think, as Emacs 29 isn't out yet).
>
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=a902156068ab071f93cc9bbd34cd320919b74064
>

Thanks!

> PS. I'm boldly marking the coreutils bug as fixed.

Works for me, thanks.



signature.asc
Description: PGP signature


bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-07 Thread Paul Eggert

On 2023-06-06 03:02, Sam James wrote:

Thanks Paul. Do you know if there's any other cases of this in gnulib?


There's no other use of linux/version.h in Gnulib. However, this doesn't 
mean there are no other instances of the problem, as it's common to 
assume that you'll run on a host no older than the build host. Every use 
of Autoconf macros like AC_RUN_IFELSE assume this sort of thing, and I 
would not be surprised if there are other examples of this more-general 
problem when some builders (unwisely) build apps intended to run on 
platforms older than the build platform. (For what it's worth, I count 
331 uses of AC_RUN_IFELSE in Gnulib; it'd be a pain to audit them all.)


It's OK to build glibc for Linux kernels somewhat older than the build 
kernel, since glibc explicitly exports that. But it's not OK to build 
other packages that way, since they don't support it. You'll get away 
with it in many cases, but when it doesn't work then the problem is on 
the builder, not on the packages being built.




(or whether, by chance, if you know off the top of your head if any
other gnulib consumers definitely use copy_file_range?)


Other coreutils programs like 'install' use copy_file_range.

Emacs uses it. I just now installed the following patch into 
bleeding-edge Emacs to work around the problem, by updating to current 
Gnulib. This workaround should appear in Emacs 30, whenever that happens 
(not for a year or two I'd think, as Emacs 29 isn't out yet).


https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=a902156068ab071f93cc9bbd34cd320919b74064

PS. I'm boldly marking the coreutils bug as fixed.





bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-06 Thread Pádraig Brady

On 06/06/2023 11:02, Sam James wrote:


Paul Eggert  writes:


On 2023-06-05 22:26, Sam James wrote:

It's just that linux-headers is
a special case


Indeed it is. And apparently glibc avoids the copy_file_range bugs by
never, ever using copy_file_range internally - which explains why
glibc hasn't run into this backward compatibility issue. Presumably
once glibc starts assuming kernel 5.3 or later, it can start using
copy_file_range internally.

Anyway, thanks for explaining. I installed the patch I mentioned into
Gnulib and have updated coreutils accordingly on Savannah
master. Please give it a try.


Thanks Paul. Do you know if there's any other cases of this in gnulib?

(or whether, by chance, if you know off the top of your head if any
other gnulib consumers definitely use copy_file_range?)


Both google and cs.github.com suggest only coreutils uses copy-file-range from 
gnulib:
https://www.google.com/search?q=filetype%3Aconf+%22copy-file-range%22

cheers,
Pádraig





bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-06 Thread Sam James

Paul Eggert  writes:

> On 2023-06-05 22:26, Sam James wrote:
>> It's just that linux-headers is
>> a special case
>
> Indeed it is. And apparently glibc avoids the copy_file_range bugs by
> never, ever using copy_file_range internally - which explains why
> glibc hasn't run into this backward compatibility issue. Presumably
> once glibc starts assuming kernel 5.3 or later, it can start using
> copy_file_range internally.
>
> Anyway, thanks for explaining. I installed the patch I mentioned into
> Gnulib and have updated coreutils accordingly on Savannah
> master. Please give it a try.

Thanks Paul. Do you know if there's any other cases of this in gnulib?

(or whether, by chance, if you know off the top of your head if any
other gnulib consumers definitely use copy_file_range?)


signature.asc
Description: PGP signature


bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-06 Thread Pádraig Brady

On 06/06/2023 06:48, Paul Eggert wrote:

On 2023-06-05 22:26, Sam James wrote:

It's just that linux-headers is
a special case


Indeed it is. And apparently glibc avoids the copy_file_range bugs by
never, ever using copy_file_range internally - which explains why glibc
hasn't run into this backward compatibility issue. Presumably once glibc
starts assuming kernel 5.3 or later, it can start using copy_file_range
internally.


Yes, glibc has the --enable-kernel configure option
to allow setting the compatibility range independently
from the build host (kernel headers).
In my experience a distro needs to support quite a range of kernel versions
mainly due to certain hardware platforms being tied to older kernels.


Anyway, thanks for explaining. I installed the patch I mentioned into
Gnulib and have updated coreutils accordingly on Savannah master. Please
give it a try.


Thanks for making the change Paul.

Pádraig





bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-05 Thread Paul Eggert

On 2023-06-05 22:26, Sam James wrote:

It's just that linux-headers is
a special case


Indeed it is. And apparently glibc avoids the copy_file_range bugs by 
never, ever using copy_file_range internally - which explains why glibc 
hasn't run into this backward compatibility issue. Presumably once glibc 
starts assuming kernel 5.3 or later, it can start using copy_file_range 
internally.


Anyway, thanks for explaining. I installed the patch I mentioned into 
Gnulib and have updated coreutils accordingly on Savannah master. Please 
give it a try.






bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-05 Thread Sam James

Sam James  writes:

> [[PGP Signed Part:Undecided]]
>
> Paul Eggert  writes:
>
>> On 2023-06-03 06:54, Mike Gilbert wrote:
>>> In this case, headers from linux-6.1 are being used at build time.
>>> However, the code is being run on a linux-4.19 kernel.
>>
>> Gnulib doesn't support that. If you build with headers from a
>> particular version of the operating system, you can't necessarily run
>> on older versions. The reasons for this sort of restriction should be
>> obvious.
>>
>
> This is a principle that other core parts of userland have no issue
> with. For example, util-linux has various fallbacks based on the
> *runtime* kernel version.
>
> This doesn't square with reality, anyway: if I install linux-6.1
> and its headers, then I downgrade, I need to then rebuild every
> piece of the userland I built against the new headers. Tracking
> that as a user is nontrivial.
>
>> If Gentoo builds are regularly targeting older kernels or libraries
>> than the platform they are building on, then surely that's a problem
>> in general, not just here.
>
> Now, continuing from what I said above, it's not feasible to *require*
> users to use a kernel from the package manager. Not only do users want
> to be able to run their own kernel (sometimes even just for a quick
> test), but this is completely incompatible with having multiple kernels
> installed in parallel, as you can't have multiple versions of the
> same linux-headers in /usr/include.
>
> Going further: are we really suggesting that someone who was using
> say, Linux 6.1 for a few days, then downgrades to Linux 5.15 to test
> something is in an unsupported configuration?
>
> This isn't a practical position to have. This assumption *barely*
> holds for binary distributions where you "upgrade the world" all
> at once, and as I said, it's questionable there. And it's completely
> incompatible with source-based distributions.

Just to be clear: I totally agree this isn't feasible for general
libraries (and their headers). It's just that linux-headers is
a special case, because you can't easily juggle different versions of
it, and it's expected that users may vary their kernel version.

Also, glibc says the latest should always be fine:
https://sourceware.org/glibc/wiki/FAQ#What_version_of_the_Linux_kernel_headers_should_be_used.3F.

Of course, gnulib isn't glibc, but I'm pointing out that this is
established practice.


signature.asc
Description: PGP signature


bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-05 Thread Sam James

Paul Eggert  writes:

> On 2023-06-03 06:54, Mike Gilbert wrote:
>> In this case, headers from linux-6.1 are being used at build time.
>> However, the code is being run on a linux-4.19 kernel.
>
> Gnulib doesn't support that. If you build with headers from a
> particular version of the operating system, you can't necessarily run
> on older versions. The reasons for this sort of restriction should be
> obvious.
>

This is a principle that other core parts of userland have no issue
with. For example, util-linux has various fallbacks based on the
*runtime* kernel version.

This doesn't square with reality, anyway: if I install linux-6.1
and its headers, then I downgrade, I need to then rebuild every
piece of the userland I built against the new headers. Tracking
that as a user is nontrivial.

> If Gentoo builds are regularly targeting older kernels or libraries
> than the platform they are building on, then surely that's a problem
> in general, not just here.

Now, continuing from what I said above, it's not feasible to *require*
users to use a kernel from the package manager. Not only do users want
to be able to run their own kernel (sometimes even just for a quick
test), but this is completely incompatible with having multiple kernels
installed in parallel, as you can't have multiple versions of the
same linux-headers in /usr/include.

Going further: are we really suggesting that someone who was using
say, Linux 6.1 for a few days, then downgrades to Linux 5.15 to test
something is in an unsupported configuration?

This isn't a practical position to have. This assumption *barely*
holds for binary distributions where you "upgrade the world" all
at once, and as I said, it's questionable there. And it's completely
incompatible with source-based distributions.

>
> I'll cc this to bug-gnulib to give them a heads-up about the
> issue. For gnulib readers, the original bug report is here:
>
> https://bugs.gnu.org/63850



signature.asc
Description: PGP signature


bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-05 Thread Paul Eggert
If we really need to cater to the practice of building for an older 
system than the build host (which to my mind is asking for trouble), 
then instead of having Gnulib try to work around the bugs in 
copy_file_range in Linux kernels before 5.3, it should simply not use 
copy_file_range in these older kernels.


In other words, something like the attached patch should be more 
reliable than the other band-aids proposed.From 87b95c17dc8611f9483b966d052eefc930f43927 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 5 Jun 2023 22:04:37 -0700
Subject: [PATCH] copy-file-range: support building for older kernels

* m4/copy-file-range.m4 (gl_FUNC_COPY_FILE_RANGE):
Remove static check, to support the dubious practice of
building for platforms that predate the build platform.
On working kernels this adds an extra syscall the first time
that copy_file_range is used.  Problem reported for Gentoo by
Sam James .
---
 ChangeLog |  8 
 m4/copy-file-range.m4 | 18 +++---
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d8f04f737d..a917eb63a6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,13 @@
 2023-06-05  Paul Eggert  
 
+	copy-file-range: support building for older kernels
+	* m4/copy-file-range.m4 (gl_FUNC_COPY_FILE_RANGE):
+	Remove static check, to support the dubious practice of
+	building for platforms that predate the build platform.
+	On working kernels this adds an extra syscall the first time
+	that copy_file_range is used.  Problem reported for Gentoo by
+	Sam James .
+
 	manywarnings: more nuance about optimization
 	* doc/manywarnings.texi (manywarnings): Suggest compiling with the
 	optimization flags commonly used, as opposed to -O2 and -O0
diff --git a/m4/copy-file-range.m4 b/m4/copy-file-range.m4
index d41f2c4831..fa6ab34109 100644
--- a/m4/copy-file-range.m4
+++ b/m4/copy-file-range.m4
@@ -39,21 +39,9 @@ AC_DEFUN([gl_FUNC_COPY_FILE_RANGE],
 
 case $host_os in
   linux*)
-AC_CACHE_CHECK([whether copy_file_range is known to work],
-  [gl_cv_copy_file_range_known_to_work],
-  [AC_COMPILE_IFELSE(
- [AC_LANG_PROGRAM(
-[[#include 
-]],
-[[#if LINUX_VERSION_CODE < KERNEL_VERSION (5, 3, 0)
-   #error "copy_file_range is buggy"
-  #endif
-]])],
- [gl_cv_copy_file_range_known_to_work=yes],
- [gl_cv_copy_file_range_known_to_work=no])])
-if test "$gl_cv_copy_file_range_known_to_work" = no; then
-  REPLACE_COPY_FILE_RANGE=1
-fi;;
+# See copy-file-range.c comment re pre-5.3 Linux kernel bugs.
+# We should be able to remove this hack in 2025.
+REPLACE_COPY_FILE_RANGE=1;;
 esac
   fi
 ])
-- 
2.39.2



bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-05 Thread Paul Eggert

On 2023-06-03 06:54, Mike Gilbert wrote:

In this case, headers from linux-6.1 are being used at build time.
However, the code is being run on a linux-4.19 kernel.


Gnulib doesn't support that. If you build with headers from a particular 
version of the operating system, you can't necessarily run on older 
versions. The reasons for this sort of restriction should be obvious.


If Gentoo builds are regularly targeting older kernels or libraries than 
the platform they are building on, then surely that's a problem in 
general, not just here.


I'll cc this to bug-gnulib to give them a heads-up about the issue. For 
gnulib readers, the original bug report is here:


https://bugs.gnu.org/63850





bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-03 Thread Pádraig Brady

On 03/06/2023 07:05, Paul Eggert wrote:

On 2023-06-02 09:31, Pádraig Brady wrote:

I'm not sure it was working correctly before 9.3 either.
Before 9.3 we would have switched from copy_file_range() to read()/write()


Actually, cp shouldn't have been using copy_file_range at all, as the
code is supposed to never use copy_file_range unless the Linux kernel
version is 5.3 or later. See m4/copy-file-range.m4 and
lib/copy-file-range.c.

Since the bug is being reported against kernel 4.19, someone needs to
investigate why the Gentoo build is using the copy_file_range syscall on
that kernel. Either the Gentoo build isn't properly compiling the
replacement function in coreutils/lib/copy-file-range.c, or the
replacement function is incorrectly deciding that the kernel is new
enough, or something like that.

We shouldn't need to fiddle with src/copy.c on this.


Yes good call on the kernel version checks.

Though I think we should be a bit more accommodating for edge cases
in code as fundamental as coreutils copy routines.

Personally I would have a preference for accommodating the partial failure
as originally proposed in the patch at https://bugs.gnu.org/62404#11

Personally I would definitely reinstate the _runtime_ kernel version check,
for this bug and all the other copy_file_range() issues on older kernels.
The runtime to build time change was originally discussed at:
https://lists.gnu.org/archive/html/bug-gnulib/2022-01/msg00118.html

cheers,
Pádraig





bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-03 Thread Arsen Arsenović via GNU coreutils Bug Reports

Mike Gilbert  writes:

> The macro in copy-file-range.m4 performs a build time version check
> against the installed linux headers (/usr/include/linux).
>
> In this case, headers from linux-6.1 are being used at build time.
> However, the code is being run on a linux-4.19 kernel.
>
> Generally speaking, syscall checks must be done at run time on Linux,
> not build time.

Right, the replacement should always be emitted (perhaps with glibc stub
detection, and omitted if one is found, though).  linux-headers is a
weird library, in that it makes no implication about the compatibility
level of your resulting executable, so, no assumption about runtime
versions or behavior can be made from static checks.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-03 Thread Mike Gilbert
On Fri, Jun 02, 2023 at 11:05:24PM -0700, Paul Eggert wrote:
> On 2023-06-02 09:31, Pádraig Brady wrote:
> > I'm not sure it was working correctly before 9.3 either.
> > Before 9.3 we would have switched from copy_file_range() to read()/write()
> 
> Actually, cp shouldn't have been using copy_file_range at all, as the 
> code is supposed to never use copy_file_range unless the Linux kernel 
> version is 5.3 or later. See m4/copy-file-range.m4 and 
> lib/copy-file-range.c.
> 
> Since the bug is being reported against kernel 4.19, someone needs to 
> investigate why the Gentoo build is using the copy_file_range syscall on 
> that kernel. Either the Gentoo build isn't properly compiling the 
> replacement function in coreutils/lib/copy-file-range.c, or the 
> replacement function is incorrectly deciding that the kernel is new 
> enough, or something like that.
> 
> We shouldn't need to fiddle with src/copy.c on this.

The macro in copy-file-range.m4 performs a build time version check
against the installed linux headers (/usr/include/linux).

In this case, headers from linux-6.1 are being used at build time.
However, the code is being run on a linux-4.19 kernel.

Generally speaking, syscall checks must be done at run time on Linux,
not build time.





bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-03 Thread Mike Gilbert
On Fri, Jun 02, 2023 at 05:31:50PM +0100, Pádraig Brady wrote:
> I'm not sure it was working correctly before 9.3 either.
> Before 9.3 we would have switched from copy_file_range() to read()/write()
> upon receiving the EINVAL, which might have worked, but also I'm not sure
> the file offsets would be correct in that case. Could you show the output 
> with:
> 
> diff --git a/src/copy.c b/src/copy.c
> index 0dd059d2e..35c54b905 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -363,7 +363,16 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, 
> size_t buf_size,
>  edge case where the file is made immutable after creating,
>  in which case the (more accurate) error is still shown.  */
>   if (*total_n_read == 0 && is_CLONENOTSUP (errno))
> -  break;
> +  {
> +if (*total_n_read != 0)
> +  {
> +off_t clone_read_offset = lseek (src_fd, 0, SEEK_CUR);
> +off_t clone_write_offset = lseek (dest_fd, 0, SEEK_CUR);
> +printf ("switching to standard copy at :%"PRIdMAX" 
> read=%"PRIdMAX" write=%"PRIdMAX"\n",
> +*total_n_read, clone_read_offset, 
> clone_write_offset);
> +  }
> +break;
> +  }
> 
>   /* ENOENT was seen sometimes across CIFS shares, resulting in
>  no data being copied, but subsequent standard copies 
> succeed.  */

I don't think this patch will do anything useful: *total_n_read cannot be 0
and not 0 simultaneously, so the new block of code will never be
executed. Maybe you meant to insert this block somewhere else?





bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-03 Thread Paul Eggert

On 2023-06-02 09:31, Pádraig Brady wrote:

I'm not sure it was working correctly before 9.3 either.
Before 9.3 we would have switched from copy_file_range() to read()/write()


Actually, cp shouldn't have been using copy_file_range at all, as the 
code is supposed to never use copy_file_range unless the Linux kernel 
version is 5.3 or later. See m4/copy-file-range.m4 and 
lib/copy-file-range.c.


Since the bug is being reported against kernel 4.19, someone needs to 
investigate why the Gentoo build is using the copy_file_range syscall on 
that kernel. Either the Gentoo build isn't properly compiling the 
replacement function in coreutils/lib/copy-file-range.c, or the 
replacement function is incorrectly deciding that the kernel is new 
enough, or something like that.


We shouldn't need to fiddle with src/copy.c on this.





bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-02 Thread Pádraig Brady

On 03/06/2023 02:02, Mike Gilbert wrote:

On Fri, Jun 02, 2023 at 05:31:50PM +0100, Pádraig Brady wrote:

I'm not sure it was working correctly before 9.3 either.
Before 9.3 we would have switched from copy_file_range() to read()/write()
upon receiving the EINVAL, which might have worked, but also I'm not sure
the file offsets would be correct in that case. Could you show the output with:

diff --git a/src/copy.c b/src/copy.c
index 0dd059d2e..35c54b905 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -363,7 +363,16 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t 
buf_size,
  edge case where the file is made immutable after creating,
  in which case the (more accurate) error is still shown.  */
   if (*total_n_read == 0 && is_CLONENOTSUP (errno))
-  break;
+  {
+if (*total_n_read != 0)
+  {
+off_t clone_read_offset = lseek (src_fd, 0, SEEK_CUR);
+off_t clone_write_offset = lseek (dest_fd, 0, SEEK_CUR);
+printf ("switching to standard copy at :%"PRIdMAX" read=%"PRIdMAX" 
write=%"PRIdMAX"\n",
+*total_n_read, clone_read_offset, 
clone_write_offset);
+  }
+break;
+  }

   /* ENOENT was seen sometimes across CIFS shares, resulting in
  no data being copied, but subsequent standard copies succeed. 
 */


I don't think this patch will do anything useful: *total_n_read cannot be 0
and not 0 simultaneously, so the new block of code will never be
executed. Maybe you meant to insert this block somewhere else?


Sorry was rushing.

Yes I meant to remove the first total_n_read check
as done in the following.

In any case an external check would be as useful.
I suppose we would have heard at this stage
but it would be good to have verification with md5sum or similar
on the source and destination files that the copy worked fine
on such a file >2G.

cheers,
Pádraig.

diff --git a/src/copy.c b/src/copy.c
index 0dd059d2e..296707c39 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -362,8 +362,17 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t 
buf_size,
also occur for immutable files, but that would only be in the
edge case where the file is made immutable after creating,
in which case the (more accurate) error is still shown.  */
-if (*total_n_read == 0 && is_CLONENOTSUP (errno))
-  break;
+if (is_CLONENOTSUP (errno))
+  {
+if (*total_n_read != 0)
+  {
+off_t clone_read_offset = lseek (src_fd, 0, SEEK_CUR);
+off_t clone_write_offset = lseek (dest_fd, 0, SEEK_CUR);
+printf ("switching to standard copy at :%"PRIdMAX" read=%"PRIdMAX" 
write=%"PRIdMAX"\n",
+*total_n_read, clone_read_offset, 
clone_write_offset);
+  }
+break;
+  }

 /* ENOENT was seen sometimes across CIFS shares, resulting in
no data being copied, but subsequent standard copies succeed.  
*/






bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-02 Thread Pádraig Brady

On 02/06/2023 16:44, Sam James wrote:

Hello,

Forwarding a downstream report of a behaviour change between
coreutils-9.1 and coreutils-9.3 from https://bugs.gentoo.org/907474.

The reporter bisected it to 093a8b4bfaba60005f14493ce7ef11ed665a0176
("copy: fix --reflink=auto to fallback in more cases", see bug#62404)
and gave strace output showing:
```
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 2147479552
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = -1 EINVAL
(Invalid argument)
```

"""
When I try to copy a large file (> 2 GB) like so:

cp --debug file_a file_b

output looks like this:

'file_a' -> 'file_b'
cp: error copying 'file_a' to 'file_b': Invalid argument
copy offload: unsupported, reflink: unsupported, sparse detection: no

Afterwards file_b has a size of 2147479552 bytes (= 2G - 4K).

On another system (with newer kernel version) it looks like this:

cp --debug file_a file_b
'file_a' -> 'file_b'
copy offload: yes, reflink: unsupported, sparse detection: no
"""

Let me know if you need further information, although there's
some more on the downstream Gentoo bug I linked.

Apparently this is only happening w/ the 4.19.x kernels.


I'm not sure it was working correctly before 9.3 either.
Before 9.3 we would have switched from copy_file_range() to read()/write()
upon receiving the EINVAL, which might have worked, but also I'm not sure
the file offsets would be correct in that case. Could you show the output with:

diff --git a/src/copy.c b/src/copy.c
index 0dd059d2e..35c54b905 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -363,7 +363,16 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t 
buf_size,
edge case where the file is made immutable after creating,
in which case the (more accurate) error is still shown.  */
 if (*total_n_read == 0 && is_CLONENOTSUP (errno))
-  break;
+  {
+if (*total_n_read != 0)
+  {
+off_t clone_read_offset = lseek (src_fd, 0, SEEK_CUR);
+off_t clone_write_offset = lseek (dest_fd, 0, SEEK_CUR);
+printf ("switching to standard copy at :%"PRIdMAX" read=%"PRIdMAX" 
write=%"PRIdMAX"\n",
+*total_n_read, clone_read_offset, 
clone_write_offset);
+  }
+break;
+  }

 /* ENOENT was seen sometimes across CIFS shares, resulting in
no data being copied, but subsequent standard copies succeed.  
*/







bug#63850: cp fails for files > 2 GB if copy offload is unsupported

2023-06-02 Thread Sam James
Hello,

Forwarding a downstream report of a behaviour change between
coreutils-9.1 and coreutils-9.3 from https://bugs.gentoo.org/907474.

The reporter bisected it to 093a8b4bfaba60005f14493ce7ef11ed665a0176
("copy: fix --reflink=auto to fallback in more cases", see bug#62404)
and gave strace output showing:
```
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 2147479552
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = -1 EINVAL
(Invalid argument)
```

"""
When I try to copy a large file (> 2 GB) like so:

cp --debug file_a file_b

output looks like this:

'file_a' -> 'file_b'
cp: error copying 'file_a' to 'file_b': Invalid argument
copy offload: unsupported, reflink: unsupported, sparse detection: no

Afterwards file_b has a size of 2147479552 bytes (= 2G - 4K).

On another system (with newer kernel version) it looks like this:

cp --debug file_a file_b
'file_a' -> 'file_b'
copy offload: yes, reflink: unsupported, sparse detection: no
"""

Let me know if you need further information, although there's
some more on the downstream Gentoo bug I linked.

Apparently this is only happening w/ the 4.19.x kernels.

Thanks!



signature.asc
Description: PGP signature