bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
On 10/08/2014 02:12 AM, Jim Meyering wrote: > On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady wrote: >> On 10/08/2014 12:51 AM, Paul Eggert wrote: >>> Paul Eggert wrote: >>> The attached patch still needs a changelog entry and test cases. >>> >>> I wrote those up and pushed the attached patch; this should fix the bug so >>> I'm closing the bug report. >> >> I was just going through the patch as it happens, and I found no issues. >> I see the tac changes should also avoid erroneous errors if a file was >> truncated while reading for example. >> I pushed a follow up to avoid some new syntax check errors. >> >> thanks! >> Pádraig. > > I pushed a patch to avoid a new failiure of the > split/b-chunk.sh test on non-Linux. The attached is needed to avoid a new warning on 32 bit. thanks, Pádraig. >From 16c7267d7425fe59b6919e77fa572d104d72c2bf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= Date: Wed, 8 Oct 2014 12:35:36 +0100 Subject: [PATCH] maint: avoid new signed overflow warning on 32 bit Prompted by http://hydra.nixos.org/build/15682577 with GCC 4.8.3 on i686 src/tac.c:557:6: error: assuming signed overflow does not occur when simplifying conditional to constant [-Werror=strict-overflow] if (bytes_copied < 0) This happens because copy_to_temp() is inlined in tac_nonseekable(), thus reducing the comparison to the bytes_copied variable in copy_to_temp. Now this can't overflow on either 32 or 64 bit due to the protection of the preceding fwrite(). We could use a guard like "if (bytes_copied <= OFF_T_MAX - bytes_read)" to avoid the warning, but rather than a runtime branch, just use an unsigned type to avoid apparent signed overflow on systems where the accumulation is not promoted to unsigned (32 bit size_t, 64 bit off_t). * src/tac.c (copy_to_temp): Increment an unsigned type to avoid the subsequent signed overflow warning. --- src/tac.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/src/tac.c b/src/tac.c index 248afa9..777ec91 100644 --- a/src/tac.c +++ b/src/tac.c @@ -506,7 +506,7 @@ copy_to_temp (FILE **g_tmp, char **g_tempfile, int input_fd, char const *file) { FILE *fp; char *file_name; - off_t bytes_copied = 0; + uintmax_t bytes_copied = 0; if (!temp_stream (&fp, &file_name)) return -1; @@ -527,6 +527,9 @@ copy_to_temp (FILE **g_tmp, char **g_tempfile, int input_fd, char const *file) goto Fail; } + /* Implicitly <= OFF_T_MAX due to preceding fwrite(), + but unsigned type used to avoid compiler warnings + not aware of this fact. */ bytes_copied += bytes_read; } -- 1.7.7.6
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady wrote: > On 10/08/2014 12:51 AM, Paul Eggert wrote: >> Paul Eggert wrote: >> >>> The attached patch still needs a changelog entry and test cases. >> >> I wrote those up and pushed the attached patch; this should fix the bug so >> I'm closing the bug report. > > I was just going through the patch as it happens, and I found no issues. > I see the tac changes should also avoid erroneous errors if a file was > truncated while reading for example. > I pushed a follow up to avoid some new syntax check errors. > > thanks! > Pádraig. I pushed a patch to avoid a new failiure of the split/b-chunk.sh test on non-Linux.
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
On 10/08/2014 12:51 AM, Paul Eggert wrote: > Paul Eggert wrote: > >> The attached patch still needs a changelog entry and test cases. > > I wrote those up and pushed the attached patch; this should fix the bug so > I'm closing the bug report. I was just going through the patch as it happens, and I found no issues. I see the tac changes should also avoid erroneous errors if a file was truncated while reading for example. I pushed a follow up to avoid some new syntax check errors. thanks! Pádraig.
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Paul Eggert wrote: The attached patch still needs a changelog entry and test cases. I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the bug report. From c3c87a92ba016495c20d13b80a42d750f3e0fba0 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 7 Oct 2014 16:46:08 -0700 Subject: [PATCH] wc: don't miscount /sys and similar file systems Fix similar problems in head, od, split, tac, and tail. Reported by George Shuklin in: http://bugs.gnu.org/18621 * NEWS: Document this. * src/head.c (elseek): Move up. (elide_tail_bytes_pipe, elide_tail_lines_pipe): New arg CURRENT_POS. All uses changed. (elide_tail_bytes_file, elide_tail_lines_file): New arg ST and remove arg SIZE. All uses changed. * src/head.c (elide_tail_bytes_file): * src/od.c (skip): Avoid optimization for /sys files, where st_size is bogus and st_size == st_blksize. Don't report error at EOF when not optimizing. * src/head.c, src/od.c, src/tail.c: Include "stat-size.h". * src/split.c (input_file_size): New function. (bytes_split, lines_chunk_split, bytes_chunk_extract): New arg INITIAL_READ. All uses changed. Use it to double-check st_size. * src/tac.c (tac_seekable): New arg FILE_POS. All uses changed. (copy_to_temp): Return size of temp file. All uses changed. * src/tac.c (tac_seekable): * src/tail.c (tail_bytes): * src/wc.c (wc): Don't trust st_size; double-check by reading. * src/wc.c (wc): New arg CURRENT_POS. All uses changed. * tests/local.mk (all_tests): Add tests/misc/wc-proc.sh, tests/misc/od-j.sh, tests/tail-2/tail-c.sh. * tests/misc/head-c.sh: * tests/misc/tac-2-nonseekable.sh: * tests/split/b-chunk.sh: Add tests for problems with /proc and /sys files. * tests/misc/od-j.sh, tests/misc/wc-proc.sh, tests/tail-2/tail-c.sh: New files. --- NEWS| 3 + src/head.c | 147 src/od.c| 23 +-- src/split.c | 146 ++- src/tac.c | 70 --- src/tail.c | 43 +--- src/wc.c| 45 ++-- tests/local.mk | 3 + tests/misc/head-c.sh| 12 tests/misc/od-j.sh | 39 +++ tests/misc/tac-2-nonseekable.sh | 14 +++- tests/misc/wc-proc.sh | 32 + tests/split/b-chunk.sh | 39 +++ tests/tail-2/tail-c.sh | 35 ++ 14 files changed, 463 insertions(+), 188 deletions(-) create mode 100755 tests/misc/od-j.sh create mode 100755 tests/misc/wc-proc.sh create mode 100755 tests/tail-2/tail-c.sh diff --git a/NEWS b/NEWS index 1811ae4..a323b0c 100644 --- a/NEWS +++ b/NEWS @@ -12,6 +12,9 @@ GNU coreutils NEWS-*- outline -*- file types, a warning is issued for source directories with duplicate names, or with -H the directory is copied again using the symlink name. + head, od, split, tac, tail, and wc no longer mishandle input from files in + /proc and /sys file systems that report somewhat-incorrect file sizes. + ** New features chroot accepts the new --skip-chdir option to not change the working directory diff --git a/src/head.c b/src/head.c index d2f1fce..2782f8e 100644 --- a/src/head.c +++ b/src/head.c @@ -36,6 +36,7 @@ #include "quote.h" #include "quotearg.h" #include "safe-read.h" +#include "stat-size.h" #include "xfreopen.h" #include "xstrtol.h" @@ -206,13 +207,42 @@ copy_fd (int src_fd, uintmax_t n_bytes) return COPY_FD_OK; } -/* Print all but the last N_ELIDE bytes from the input available via - the non-seekable file descriptor FD. Return true upon success. +/* Call lseek (FD, OFFSET, WHENCE), where file descriptor FD + corresponds to the file FILENAME. WHENCE must be SEEK_SET or + SEEK_CUR. Return the resulting offset. Give a diagnostic and + return -1 if lseek fails. */ + +static off_t +elseek (int fd, off_t offset, int whence, char const *filename) +{ + off_t new_offset = lseek (fd, offset, whence); + char buf[INT_BUFSIZE_BOUND (offset)]; + + if (new_offset < 0) +error (0, errno, + _(whence == SEEK_SET + ? N_("%s: cannot seek to offset %s") + : N_("%s: cannot seek to relative offset %s")), + quotearg_colon (filename), + offtostr (offset, buf)); + + return new_offset; +} + +/* For an input file with name FILENAME and descriptor FD, + output all but the last N_ELIDE_0 bytes. + If CURRENT_POS is nonnegative, assume that the input file is + positioned at CURRENT_POS and that it should be repositioned to + just before the elided bytes before returning. + Return true upon success. Give a diagnostic and return false upon error. */ static bool -elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0) +elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Paul Eggert wrote: How about if we change usable_st_size to return false for these proc files Attached is a better idea, I hope. I audited the coreutils code to look for problematic uses of SEEK_END or st_size when reading files (I didn't look at writing; one can of worms at a time). The attached patch still needs a changelog entry and test cases. The basic idea is to not trust st_size when it's <= ST_BLKSIZE. This fixes bugs in 'head', 'od', 'split', 'tac', 'tail', and 'wc' when applied to input files in proc or sysfs file systems. Here's an example bug fixed by this patch: $ cat /sys/kernel/profiling 0 $ tail -2c /sys/kernel/profiling $ This should be: $ cat /sys/kernel/profiling 0 $ tail -2c /sys/kernel/profiling 0 $ diff --git a/src/head.c b/src/head.c index d2f1fce..f6f890a 100644 --- a/src/head.c +++ b/src/head.c @@ -36,6 +36,7 @@ #include "quote.h" #include "quotearg.h" #include "safe-read.h" +#include "stat-size.h" #include "xfreopen.h" #include "xstrtol.h" @@ -206,13 +207,42 @@ copy_fd (int src_fd, uintmax_t n_bytes) return COPY_FD_OK; } -/* Print all but the last N_ELIDE bytes from the input available via - the non-seekable file descriptor FD. Return true upon success. +/* Call lseek (FD, OFFSET, WHENCE), where file descriptor FD + corresponds to the file FILENAME. WHENCE must be SEEK_SET or + SEEK_CUR. Return the resulting offset. Give a diagnostic and + return -1 if lseek fails. */ + +static off_t +elseek (int fd, off_t offset, int whence, char const *filename) +{ + off_t new_offset = lseek (fd, offset, whence); + char buf[INT_BUFSIZE_BOUND (offset)]; + + if (new_offset < 0) +error (0, errno, + _(whence == SEEK_SET + ? N_("%s: cannot seek to offset %s") + : N_("%s: cannot seek to relative offset %s")), + quotearg_colon (filename), + offtostr (offset, buf)); + + return new_offset; +} + +/* For an input file with name FILENAME and descriptor FD, + output all but the last N_ELIDE_0 bytes. + If CURRENT_POS is nonnegative, assume that the input file is + positioned at CURRENT_POS and that it should be repositioned to + just before the elided bytes before returning. + Return true upon success. Give a diagnostic and return false upon error. */ static bool -elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0) +elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0, + off_t current_pos) { size_t n_elide = n_elide_0; + uintmax_t desired_pos = current_pos; + bool ok = true; #ifndef HEAD_TAIL_PIPE_READ_BUFSIZE # define HEAD_TAIL_PIPE_READ_BUFSIZE BUFSIZ @@ -251,7 +281,6 @@ elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0) if (n_elide <= HEAD_TAIL_PIPE_BYTECOUNT_THRESHOLD) { - bool ok = true; bool first = true; bool eof = false; size_t n_to_read = READ_BUFSIZE + n_elide; @@ -293,22 +322,26 @@ elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0) /* Output any (but maybe just part of the) elided data from the previous round. */ if (! first) -xwrite_stdout (b[!i] + READ_BUFSIZE, n_elide - delta); +{ + desired_pos += n_elide - delta; + xwrite_stdout (b[!i] + READ_BUFSIZE, n_elide - delta); +} first = false; if (n_elide < n_read) -xwrite_stdout (b[i], n_read - n_elide); +{ + desired_pos += n_read - n_elide; + xwrite_stdout (b[i], n_read - n_elide); +} } free (b[0]); - return ok; } else { /* Read blocks of size READ_BUFSIZE, until we've read at least n_elide bytes. Then, for each new buffer we read, also write an old one. */ - bool ok = true; bool eof = false; size_t n_read; bool buffered_enough; @@ -357,7 +390,10 @@ elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0) buffered_enough = true; if (buffered_enough) -xwrite_stdout (b[i_next], n_read); +{ + desired_pos += n_read; + xwrite_stdout (b[i_next], n_read); +} } /* Output any remainder: rem bytes from b[i] + n_read. */ @@ -366,6 +402,7 @@ elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0) if (buffered_enough) { size_t n_bytes_left_in_b_i = READ_BUFSIZE - n_read; + desired_pos += rem; if (rem < n_bytes_left_in_b_i) { xwrite_stdout (b[i] + n_read, rem); @@ -392,6 +429,7 @@ elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0) */ size_t y = READ_BUFSIZE - rem; size_t x = n_read - y; + desired_pos
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
On 10/03/2014 07:47 PM, Paul Eggert wrote: > On 10/03/2014 11:26 AM, Jim Meyering wrote: >> That looks like a fine fix. > > Unfortunately that fix would make 'wc -c' way slower for a file that > consists entirely of a big hole. True, which you could avoid by deferring to read() for empty files: diff --git a/src/wc.c b/src/wc.c index 1ff007d..f8176cc 100644 --- a/src/wc.c +++ b/src/wc.c @@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus) fstatus->failed = fstat (fd, &fstatus->st); if (! fstatus->failed && S_ISREG (fstatus->st.st_mode) + && fstatus->st.st_blocks && fstatus->st.st_size && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1 && (end_pos = lseek (fd, 0, SEEK_END)) != -1) { > How about if we change usable_st_size to return false for these proc files, > with a heuristic as tight as we can make it, and to have coreutils check > usable_st_size in more places. Something like this, perhaps: > > /* Return a boolean indicating whether SB->st_size is correct. */ > static inline bool > usable_st_size (struct stat const *sb) > { > if (S_ISREG (sb->st_mode)) > { > /* proc files like /sys/kernel/vmcoreinfo are weird: their >st_size values do not reflect what's actually in them. >The following heuristic attempts to catch proc files without >catching many regular files that just happen to have the same >signature. */ > return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0 > && sb->st_size == ST_BLKSIZE (*sb)); > } > return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb)); > } > > and then review every place where coreutils currently uses st_size and > prepend a check for usable_st_size if needed. That would be usefult, and that check matches this case, however many are not distinguishable. Consider: `stat /proc/$$/io` which has st_size and st_blocks = 0 Note my adjusted patch above handles both cases. thanks, Pádraig
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
On 10/03/2014 11:26 AM, Jim Meyering wrote: That looks like a fine fix. Unfortunately that fix would make 'wc -c' way slower for a file that consists entirely of a big hole. How about if we change usable_st_size to return false for these proc files, with a heuristic as tight as we can make it, and to have coreutils check usable_st_size in more places. Something like this, perhaps: /* Return a boolean indicating whether SB->st_size is correct. */ static inline bool usable_st_size (struct stat const *sb) { if (S_ISREG (sb->st_mode)) { /* proc files like /sys/kernel/vmcoreinfo are weird: their st_size values do not reflect what's actually in them. The following heuristic attempts to catch proc files without catching many regular files that just happen to have the same signature. */ return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0 && sb->st_size == ST_BLKSIZE (*sb)); } return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb)); } and then review every place where coreutils currently uses st_size and prepend a check for usable_st_size if needed.
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
On Fri, Oct 3, 2014 at 9:48 AM, Pádraig Brady wrote: > On 10/03/2014 03:47 PM, George Shuklin wrote: ... > I'm not sure where the above code comes from, > by coreutils trunk has the same behavior with these files. > We could avoid it with the following patch. > Note in the case where "real" small files don't > take up space in the file system, this will involve a redundant read, > however that will only be the case for small files so shouldn't > be problematic. > > thanks, > Pádraig. > > diff --git a/src/wc.c b/src/wc.c > index 1ff007d..bf1ce76 100644 > --- a/src/wc.c > +++ b/src/wc.c > @@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus) > fstatus->failed = fstat (fd, &fstatus->st); > >if (! fstatus->failed && S_ISREG (fstatus->st.st_mode) > + && fstatus->st.st_blocks >&& (current_pos = lseek (fd, 0, SEEK_CUR)) != -1 >&& (end_pos = lseek (fd, 0, SEEK_END)) != -1) > { That looks like a fine fix. However, a similar issue affects tac, when its lseek-SEEK_END fails: $ tac /sys/kernel/vmcoreinfo tac: /sys/kernel/vmcoreinfo: read error: Inappropriate ioctl for device
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
On 10/03/2014 03:47 PM, George Shuklin wrote: > There is many sysfs (linux) attributes which reported as '4k files' but > contains just a few bytes. > > wc file and wc -c shows different sizes. > > Example: > > $cat /sys/kernel/vmcoreinfo > 1b74c00 1024 > > $hexdump -Cv /sys/kernel/vmcoreinfo > 31 62 37 34 63 30 30 20 31 30 32 34 0a |1b74c00 1024.| > 000d > > $ls -la /sys/kernel/vmcoreinfo > -r--r--r-- 1 root root 4096 Oct 3 17:40 /sys/kernel/vmcoreinfo > > > Here wc output: > > $ wc /sys/kernel/vmcoreinfo >12 13 /sys/kernel/vmcoreinfo > > and wc -c: > > $ wc -c /sys/kernel/vmcoreinfo > 4096 /sys/kernel/vmcoreinfo > > 4096 is not 13, and manual page for wc says that third number is byte count. > > I think problem is in cnt(const char *file) function: > > if (dochar || domulti) { > if (fstat(fd, &sb)) { > warn("%s: fstat", file); > (void)close(fd); > return (1); > } > if (S_ISREG(sb.st_mode)) { > (void)printf(" %7lld", (long long)sb.st_size); > tcharct += sb.st_size; > (void)close(fd); > return (0); > } > } I'm not sure where the above code comes from, by coreutils trunk has the same behavior with these files. We could avoid it with the following patch. Note in the case where "real" small files don't take up space in the file system, this will involve a redundant read, however that will only be the case for small files so shouldn't be problematic. thanks, Pádraig. diff --git a/src/wc.c b/src/wc.c index 1ff007d..bf1ce76 100644 --- a/src/wc.c +++ b/src/wc.c @@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus) fstatus->failed = fstat (fd, &fstatus->st); if (! fstatus->failed && S_ISREG (fstatus->st.st_mode) + && fstatus->st.st_blocks && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1 && (end_pos = lseek (fd, 0, SEEK_END)) != -1) {
bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
There is many sysfs (linux) attributes which reported as '4k files' but contains just a few bytes. wc file and wc -c shows different sizes. Example: $cat /sys/kernel/vmcoreinfo 1b74c00 1024 $hexdump -Cv /sys/kernel/vmcoreinfo 31 62 37 34 63 30 30 20 31 30 32 34 0a |1b74c00 1024.| 000d $ls -la /sys/kernel/vmcoreinfo -r--r--r-- 1 root root 4096 Oct 3 17:40 /sys/kernel/vmcoreinfo Here wc output: $ wc /sys/kernel/vmcoreinfo 12 13 /sys/kernel/vmcoreinfo and wc -c: $ wc -c /sys/kernel/vmcoreinfo 4096 /sys/kernel/vmcoreinfo 4096 is not 13, and manual page for wc says that third number is byte count. I think problem is in cnt(const char *file) function: if (dochar || domulti) { if (fstat(fd, &sb)) { warn("%s: fstat", file); (void)close(fd); return (1); } if (S_ISREG(sb.st_mode)) { (void)printf(" %7lld", (long long)sb.st_size); tcharct += sb.st_size; (void)close(fd); return (0); } }