bug#56710: ls vs. stat display of st_size

2022-07-23 Thread Paul Eggert

On 7/23/22 05:17, Pádraig Brady wrote:


BTW I see we've code in cache_fstatat() that assumes
st_size can't have such large values, which contradicts a bit.


Good catch. I installed the first attached patch.


> This is only a real consideration for virtual files I think
> since off_t is signed, and so impractical for a real file system
> to support files > OFF_T_MAX.

Yes, that sounds right.

You've convinced me that 'ls' should switch to the way 'stat' behaves 
rather than vice versa; that's more useful anyway. How about the 
attached second patch, which I haven't installed? (I was actually 
inclined this way originally but got lazy.)From c2056a320b38126bf5566c2ce94e2c2b25243f66 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 23 Jul 2022 12:11:49 -0700
Subject: [PATCH 1/2] =?UTF-8?q?rm:=20don=E2=80=99t=20assume=20st=5Fsize=20?=
 =?UTF-8?q?is=20nonnegative?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* src/remove.c: Include stat-time.h.
(cache_fstatat, cache_stat_init): Use negative st->st_atim.tv_sec to
determine whether the stat is cached, not negative st->st_size.
On non-POSIX platforms that lack st_atim.tv_sec, don’t bother to cache.
---
 src/remove.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/src/remove.c b/src/remove.c
index b5d1ea8a2..e2f27ca4f 100644
--- a/src/remove.c
+++ b/src/remove.c
@@ -28,6 +28,7 @@
 #include "ignore-value.h"
 #include "remove.h"
 #include "root-dev-ino.h"
+#include "stat-time.h"
 #include "write-any-file.h"
 #include "xfts.h"
 #include "yesno.h"
@@ -62,29 +63,37 @@ enum Prompt_action
 # define DT_LNK 2
 #endif
 
-/* Like fstatat, but cache the result.  If ST->st_size is -1, the
-   status has not been gotten yet.  If less than -1, fstatat failed
-   with errno == ST->st_ino.  Otherwise, the status has already
-   been gotten, so return 0.  */
+/* Like fstatat, but cache on POSIX-compatible systems.  */
 static int
 cache_fstatat (int fd, char const *file, struct stat *st, int flag)
 {
-  if (st->st_size == -1 && fstatat (fd, file, st, flag) != 0)
+#if HAVE_STRUCT_STAT_ST_ATIM_TV_NSEC
+  /* If ST->st_atim.tv_nsec is -1, the status has not been gotten yet.
+ If less than -1, fstatat failed with errno == ST->st_ino.
+ Otherwise, the status has already been gotten, so return 0.  */
+  if (0 <= st->st_atim.tv_nsec)
+return 0;
+  if (st->st_atim.tv_nsec == -1)
 {
-  st->st_size = -2;
+  if (fstatat (fd, file, st, flag) == 0)
+return 0;
+  st->st_atim.tv_nsec = -2;
   st->st_ino = errno;
 }
-  if (0 <= st->st_size)
-return 0;
-  errno = (int) st->st_ino;
+  errno = st->st_ino;
   return -1;
+#else
+  return fstatat (fd, file, st, flag);
+#endif
 }
 
 /* Initialize a fstatat cache *ST.  Return ST for convenience.  */
 static inline struct stat *
 cache_stat_init (struct stat *st)
 {
-  st->st_size = -1;
+#if HAVE_STRUCT_STAT_ST_ATIM_TV_NSEC
+  st->st_atim.tv_nsec = -1;
+#endif
   return st;
 }
 
-- 
2.34.1

From 03cc716cb1d6d69dfdb9038a6889035ab957f201 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Sat, 23 Jul 2022 11:00:33 -0700
Subject: [PATCH 2/2] ls: print negative file sizes as negative

This is more useful in practice (Bug#56710).
However, if POSIXLY_CORRECT is set, print them as positive.
* src/ls.c (abs_file_size, human_file_size): New functions.
(gobble_file, print_long_format): Use them.
* src/stat.c: Revert previous change, so that stat and ls agree.
---
 NEWS   |  6 --
 src/ls.c   | 53 +++--
 src/stat.c | 10 +-
 3 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/NEWS b/NEWS
index 816025255..d76946eb8 100644
--- a/NEWS
+++ b/NEWS
@@ -21,12 +21,14 @@ GNU coreutils NEWS-*- outline -*-
   'cp --reflink=always A B' no longer leaves behind a newly created
   empty file B merely because copy-on-write clones are not supported.
 
+  Unless POSIXLY_CORRECT is set, 'ls -l' no longer prints negative
+  file sizes as huge positive numbers.  This is more consistent with
+  how 'stat -c %s' treats virtual files like /proc/kcore.
+
   'ls -v' and 'sort -V' go back to sorting ".0" before ".A",
   reverting to the behavior in coreutils-9.0 and earlier.
   This behavior is now documented.
 
-  ’stat -c %s' now prints sizes as unsigned, consistent with 'ls'.
-
 ** New Features
 
   factor now accepts the --exponents (-h) option to print factors
diff --git a/src/ls.c b/src/ls.c
index d48892be7..475fb2719 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -3142,6 +3142,19 @@ file_ignored (char const *name)
   || patterns_match (ignore_patterns, name));
 }
 
+/* The following functions assumes typical implementations
+   where off_t is no wider than uintmax_t.  */
+verify (OFF_T_MAX <= UINTMAX_MAX / 2);
+
+/* Return abs (SIZE) as an unsigned integer.  */
+
+static uintmax_t
+abs_file_size (off_t size)
+{
+  uintmax_t s = 

bug#56710: ls vs. stat display of st_size

2022-07-23 Thread Pádraig Brady

On 22/07/2022 21:52, Paul Eggert wrote:

Thanks for reporting that. I installed the attached.


Playing devil's advocate, this takes the stance that
st_size should always be treated as unsigned
(given that stat(1) is a lower level util than ls(1)).

This is only a real consideration for virtual files I think
since off_t is signed, and so impractical for a real file system
to support files > OFF_T_MAX.
In this case /proc/kcore is a virtual file, with the
size representing the VM size (guessing riscv64 in this case).
But other virtual files may set st_size = -1,
to represent an unknown file size, which with the change,
scripts using stat(1) can no longer rely on?
Perhaps the "-1" case could be specialized for this.

BTW I see we've code in cache_fstatat() that assumes
st_size can't have such large values, which contradicts a bit.

BTW assuming that st_size is unsigned, reminds me of this change where
we cast all st_size to unsigned, which also allowed us to enable -Wsign-compare:
https://lists.gnu.org/archive/html/bug-coreutils/2009-01/msg00050.html

cheers,
Pádraig