bug#59382: cp(1) tries to allocate too much memory if filesystem blocksizes are unusual

2022-11-20 Thread Korn Andras
On Sat, Nov 19, 2022 at 07:50:06PM -0800, Paul Eggert wrote:

> > > The block size for filesystems can also be quite large (currently,
> > > up to 16M).
> 
> It seems ZFS tries to "help" apps by reporting misinformation (namely a
> smaller block size than actually preferred) when the file is small. This is

Just a nit: this isn't actually misinformation. ZFS uses variable "block"
sizes (it calls these blocks "records"). There is a configurable
per-filesystem maximum, which records of new writes will not exceed (but may
not reach); but existing files may use larger record sizes than what is
currently configured for the fs.

The currently-configured recordsize of the filesystem and the recordsize a
particular file was written with are not necessarily related. Depending on
the write pattern and whether the recordsize of the fs was changed during
the lifetime of the file, the same file can contain records of different
sizes. Reductio ad absurdum: the "optimal" blocksize for reading may in fact
depend on the position within the file (and only apply to the next read).

If a file fits into a single record, then, IIUC, it is actually optimal to
read it in a single operation; this is the case even if the currently
configured recordsize is smaller than what the allowed maximum was when the
file was written. If the file is highly fragmented and chunks of it are
stored on different physical media (this can easily happen if the zfs pool
was expanded with a new "vdev" during the lifetime of the file), it will in
fact be fastest to issue reads for several chunks in parallel, with read
speed possibly scaling almost linearly with the number of parallel requests.
(Not that I'm proposing cp(1) should try to figure this out, presumably in a
zfs-specific way, and actually do it.)

Since the file may contain records of various sizes that bear no relation to
the current per-fs recordsize setting, it's not immediately obvious (at
least to me) what st_blocksize zfs should report that can't be construed as
misinformation.

If you have strong opinions on the matter, you may want to explain them in
the pertinent OpenZFS issue: https://github.com/openzfs/zfs/issues/14195.

András

-- 
 I was once thrown out of a mental hospital for depressing the other patients.





bug#59382: cp(1) tries to allocate too much memory if filesystem blocksizes are unusual

2022-11-19 Thread Korn Andras
Hi,

on zfs, newfstatat() can return an st_blksize that is approximately equal to 
the file size in bytes (if the file fit into a single zfs record).

The block size for filesystems can also be quite large (currently, up to 16M).

The code at https://github.com/coreutils/coreutils/blob/master/src/copy.c#L1343 
will try to compute the least common multiple of the input and output block 
sizes, then allocate a buffer of that size using mmap(). With such unusual 
block sizes, the least common multiple can be very large, causing the mmap() to 
return ENOMEM and cp to abort with "memory exhausted".

Example:

openat(AT_FDCWD, "tmp", O_RDONLY|O_PATH|O_DIRECTORY) = 3
newfstatat(AT_FDCWD, "usr/src/zfs-2.1.6/configure", {st_dev=makedev(0, 0x30), 
st_ino=29267, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, 
st_blksize=2647552, st_blocks=5177, st_size=2647046, st_atime=1668786119 /* 
2022-11-18T16:41:59.760703544+0100 */, st_atime_nsec=760703544, 
st_mtime=1667705386 /* 2022-11-06T04:29:46+0100 */, st_mtime_nsec=0, 
st_ctime=1668785062 /* 2022-11-18T16:24:22.866737598+0100 */, 
st_ctime_nsec=866737598}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(3, "configure", {st_dev=makedev(0, 0x32), st_ino=3838, 
st_mode=S_IFREG|0700, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4194304, 
st_blocks=1, st_size=0, st_atime=1668788205 /* 
2022-11-18T17:16:45.416328293+0100 */, st_atime_nsec=416328293, 
st_mtime=1668788255 /* 2022-11-18T17:17:35.809758585+0100 */, 
st_mtime_nsec=809758585, st_ctime=1668788255 /* 
2022-11-18T17:17:35.809758585+0100 */, st_ctime_nsec=809758585}, 0) = 0
openat(AT_FDCWD, "usr/src/zfs-2.1.6/configure", O_RDONLY|O_NOFOLLOW) = 
4
newfstatat(4, "", {st_dev=makedev(0, 0x30), 
st_ino=29267, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, 
st_blksize=2647552, st_blocks=5177, st_size=2647046, st_atime=1668786119 /* 
2022-11-18T16:41:59.760703544+0100 */, st_atime_nsec=760703544, 
st_mtime=1667705386 /* 2022-11-06T04:29:46+0100 */, st_mtime_nsec=0, 
st_ctime=1668785062 /* 2022-11-18T16:24:22.866737598+0100 */, 
st_ctime_nsec=866737598}, AT_EMPTY_PATH) = 0
openat(3, "configure", O_WRONLY|O_TRUNC) = 5
ioctl(5, BTRFS_IOC_CLONE or FICLONE, 4) = -1 EXDEV (Invalid 
cross-device link)
newfstatat(5, "", {st_dev=makedev(0, 0x32), st_ino=3838, 
st_mode=S_IFREG|0700, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4194304, 
st_blocks=1, st_size=0, st_atime=1668788205 /* 
2022-11-18T17:16:45.416328293+0100 */, st_atime_nsec=416328293, 
st_mtime=1668788266 /* 2022-11-18T17:17:46.326056957+0100 */, 
st_mtime_nsec=326056957, st_ctime=1668788266 /* 
2022-11-18T17:17:46.326056957+0100 */, st_ctime_nsec=326056957}, AT_EMPTY_PATH) 
= 0
fadvise64(4, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
copy_file_range(4, NULL, 5, 
NULL, 9223372035781033984, 0) = -1 EXDEV (Invalid cross-device link)
mmap(NULL, 21688754176, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
= -1 ENOMEM (Cannot allocate memory)
brk(0x55a7fa4b) = 0x55a2ed8ab000
mmap(NULL, 21689794560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
= -1 ENOMEM (Cannot allocate memory)
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) 
= -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", 
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = 
-1 ENOENT (No such file or directory)
write(2>, "cp: ", 4) = 4
write(2>, "memory exhausted", 16) = 16
write(2>, "\n", 1) = 1
lseek(0>, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
close(0>)= 0
close(1>)= 0
close(2>)= 0
exit_group(1)   = ?

I think cp(1) shouldn't insist on using the lcm if it's very large; 
additionally, it should never try to allocate a buffer that is (much) larger 
than the source file.

Perhaps you could also try to use successively smaller buffers if allocating a 
large one fails?

András

-- 
   Write your complaints in this box:  -> []