bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-08 Thread Pádraig Brady
On 10/08/2014 02:12 AM, Jim Meyering wrote:
> On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady  wrote:
>> On 10/08/2014 12:51 AM, Paul Eggert wrote:
>>> Paul Eggert wrote:
>>>
 The attached patch still needs a changelog entry and test cases.
>>>
>>> I wrote those up and pushed the attached patch; this should fix the bug so 
>>> I'm closing the bug report.
>>
>> I was just going through the patch as it happens, and I found no issues.
>> I see the tac changes should also avoid erroneous errors if a file was 
>> truncated while reading for example.
>> I pushed a follow up to avoid some new syntax check errors.
>>
>> thanks!
>> Pádraig.
> 
> I pushed a patch to avoid a new failiure of the
> split/b-chunk.sh test on non-Linux.

The attached is needed to avoid a new warning on 32 bit.

thanks,
Pádraig.

>From 16c7267d7425fe59b6919e77fa572d104d72c2bf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Wed, 8 Oct 2014 12:35:36 +0100
Subject: [PATCH] maint: avoid new signed overflow warning on 32 bit

Prompted by http://hydra.nixos.org/build/15682577
with GCC 4.8.3 on i686

  src/tac.c:557:6: error: assuming signed overflow does not occur
  when simplifying conditional to constant [-Werror=strict-overflow]
 if (bytes_copied < 0)

This happens because copy_to_temp() is inlined in tac_nonseekable(),
thus reducing the comparison to the bytes_copied variable in
copy_to_temp.  Now this can't overflow on either 32 or 64 bit
due to the protection of the preceding fwrite().  We could use a
guard like "if (bytes_copied <= OFF_T_MAX - bytes_read)" to avoid
the warning, but rather than a runtime branch, just use an unsigned
type to avoid apparent signed overflow on systems where the accumulation
is not promoted to unsigned (32 bit size_t, 64 bit off_t).

* src/tac.c (copy_to_temp): Increment an unsigned type to
avoid the subsequent signed overflow warning.
---
 src/tac.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/src/tac.c b/src/tac.c
index 248afa9..777ec91 100644
--- a/src/tac.c
+++ b/src/tac.c
@@ -506,7 +506,7 @@ copy_to_temp (FILE **g_tmp, char **g_tempfile, int input_fd, char const *file)
 {
   FILE *fp;
   char *file_name;
-  off_t bytes_copied = 0;
+  uintmax_t bytes_copied = 0;
   if (!temp_stream (&fp, &file_name))
 return -1;
 
@@ -527,6 +527,9 @@ copy_to_temp (FILE **g_tmp, char **g_tempfile, int input_fd, char const *file)
   goto Fail;
 }
 
+  /* Implicitly <= OFF_T_MAX due to preceding fwrite(),
+ but unsigned type used to avoid compiler warnings
+ not aware of this fact.  */
   bytes_copied += bytes_read;
 }
 
-- 
1.7.7.6



bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-07 Thread Jim Meyering
On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady  wrote:
> On 10/08/2014 12:51 AM, Paul Eggert wrote:
>> Paul Eggert wrote:
>>
>>> The attached patch still needs a changelog entry and test cases.
>>
>> I wrote those up and pushed the attached patch; this should fix the bug so 
>> I'm closing the bug report.
>
> I was just going through the patch as it happens, and I found no issues.
> I see the tac changes should also avoid erroneous errors if a file was 
> truncated while reading for example.
> I pushed a follow up to avoid some new syntax check errors.
>
> thanks!
> Pádraig.

I pushed a patch to avoid a new failiure of the
split/b-chunk.sh test on non-Linux.





bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-07 Thread Pádraig Brady
On 10/08/2014 12:51 AM, Paul Eggert wrote:
> Paul Eggert wrote:
> 
>> The attached patch still needs a changelog entry and test cases.
> 
> I wrote those up and pushed the attached patch; this should fix the bug so 
> I'm closing the bug report.

I was just going through the patch as it happens, and I found no issues.
I see the tac changes should also avoid erroneous errors if a file was 
truncated while reading for example.
I pushed a follow up to avoid some new syntax check errors.

thanks!
Pádraig.





bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-07 Thread Paul Eggert

Paul Eggert wrote:


The attached patch still needs a changelog entry and test cases.


I wrote those up and pushed the attached patch; this should fix the bug so I'm 
closing the bug report.


From c3c87a92ba016495c20d13b80a42d750f3e0fba0 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Tue, 7 Oct 2014 16:46:08 -0700
Subject: [PATCH] wc: don't miscount /sys and similar file systems

Fix similar problems in head, od, split, tac, and tail.
Reported by George Shuklin in: http://bugs.gnu.org/18621
* NEWS: Document this.
* src/head.c (elseek): Move up.
(elide_tail_bytes_pipe, elide_tail_lines_pipe): New arg
CURRENT_POS.  All uses changed.
(elide_tail_bytes_file, elide_tail_lines_file):
New arg ST and remove arg SIZE.  All uses changed.
* src/head.c (elide_tail_bytes_file):
* src/od.c (skip): Avoid optimization for /sys files, where
st_size is bogus and st_size == st_blksize.
Don't report error at EOF when not optimizing.
* src/head.c, src/od.c, src/tail.c: Include "stat-size.h".
* src/split.c (input_file_size): New function.
(bytes_split, lines_chunk_split, bytes_chunk_extract): New arg
INITIAL_READ.  All uses changed. Use it to double-check st_size.
* src/tac.c (tac_seekable): New arg FILE_POS.  All uses changed.
(copy_to_temp): Return size of temp file.  All uses changed.
* src/tac.c (tac_seekable):
* src/tail.c (tail_bytes):
* src/wc.c (wc):
Don't trust st_size; double-check by reading.
* src/wc.c (wc): New arg CURRENT_POS.  All uses changed.
* tests/local.mk (all_tests): Add tests/misc/wc-proc.sh,
tests/misc/od-j.sh, tests/tail-2/tail-c.sh.
* tests/misc/head-c.sh:
* tests/misc/tac-2-nonseekable.sh:
* tests/split/b-chunk.sh:
Add tests for problems with /proc and /sys files.
* tests/misc/od-j.sh, tests/misc/wc-proc.sh, tests/tail-2/tail-c.sh:
New files.
---
 NEWS|   3 +
 src/head.c  | 147 
 src/od.c|  23 +--
 src/split.c | 146 ++-
 src/tac.c   |  70 ---
 src/tail.c  |  43 +---
 src/wc.c|  45 ++--
 tests/local.mk  |   3 +
 tests/misc/head-c.sh|  12 
 tests/misc/od-j.sh  |  39 +++
 tests/misc/tac-2-nonseekable.sh |  14 +++-
 tests/misc/wc-proc.sh   |  32 +
 tests/split/b-chunk.sh  |  39 +++
 tests/tail-2/tail-c.sh  |  35 ++
 14 files changed, 463 insertions(+), 188 deletions(-)
 create mode 100755 tests/misc/od-j.sh
 create mode 100755 tests/misc/wc-proc.sh
 create mode 100755 tests/tail-2/tail-c.sh

diff --git a/NEWS b/NEWS
index 1811ae4..a323b0c 100644
--- a/NEWS
+++ b/NEWS
@@ -12,6 +12,9 @@ GNU coreutils NEWS-*- 
outline -*-
   file types, a warning is issued for source directories with duplicate names,
   or with -H the directory is copied again using the symlink name.
 
+  head, od, split, tac, tail, and wc no longer mishandle input from files in
+  /proc and /sys file systems that report somewhat-incorrect file sizes.
+
 ** New features
 
   chroot accepts the new --skip-chdir option to not change the working 
directory
diff --git a/src/head.c b/src/head.c
index d2f1fce..2782f8e 100644
--- a/src/head.c
+++ b/src/head.c
@@ -36,6 +36,7 @@
 #include "quote.h"
 #include "quotearg.h"
 #include "safe-read.h"
+#include "stat-size.h"
 #include "xfreopen.h"
 #include "xstrtol.h"
 
@@ -206,13 +207,42 @@ copy_fd (int src_fd, uintmax_t n_bytes)
   return COPY_FD_OK;
 }
 
-/* Print all but the last N_ELIDE bytes from the input available via
-   the non-seekable file descriptor FD.  Return true upon success.
+/* Call lseek (FD, OFFSET, WHENCE), where file descriptor FD
+   corresponds to the file FILENAME.  WHENCE must be SEEK_SET or
+   SEEK_CUR.  Return the resulting offset.  Give a diagnostic and
+   return -1 if lseek fails.  */
+
+static off_t
+elseek (int fd, off_t offset, int whence, char const *filename)
+{
+  off_t new_offset = lseek (fd, offset, whence);
+  char buf[INT_BUFSIZE_BOUND (offset)];
+
+  if (new_offset < 0)
+error (0, errno,
+   _(whence == SEEK_SET
+ ? N_("%s: cannot seek to offset %s")
+ : N_("%s: cannot seek to relative offset %s")),
+   quotearg_colon (filename),
+   offtostr (offset, buf));
+
+  return new_offset;
+}
+
+/* For an input file with name FILENAME and descriptor FD,
+   output all but the last N_ELIDE_0 bytes.
+   If CURRENT_POS is nonnegative, assume that the input file is
+   positioned at CURRENT_POS and that it should be repositioned to
+   just before the elided bytes before returning.
+   Return true upon success.
Give a diagnostic and return false upon error.  */
 static bool
-elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0)
+elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-07 Thread Paul Eggert

Paul Eggert wrote:

How about if we change usable_st_size to return false for these proc files


Attached is a better idea, I hope.  I audited the coreutils code to look for 
problematic uses of SEEK_END or st_size when reading files (I didn't look at 
writing; one can of worms at a time).


The attached patch still needs a changelog entry and test cases.  The basic idea 
is to not trust st_size when it's <= ST_BLKSIZE.  This fixes bugs in 'head', 
'od', 'split', 'tac', 'tail', and 'wc' when applied to input files in proc or 
sysfs file systems.


Here's an example bug fixed by this patch:

$ cat /sys/kernel/profiling
0
$ tail -2c /sys/kernel/profiling
$

This should be:

$ cat /sys/kernel/profiling
0
$ tail -2c /sys/kernel/profiling
0
$

diff --git a/src/head.c b/src/head.c
index d2f1fce..f6f890a 100644
--- a/src/head.c
+++ b/src/head.c
@@ -36,6 +36,7 @@
 #include "quote.h"
 #include "quotearg.h"
 #include "safe-read.h"
+#include "stat-size.h"
 #include "xfreopen.h"
 #include "xstrtol.h"
 
@@ -206,13 +207,42 @@ copy_fd (int src_fd, uintmax_t n_bytes)
   return COPY_FD_OK;
 }
 
-/* Print all but the last N_ELIDE bytes from the input available via
-   the non-seekable file descriptor FD.  Return true upon success.
+/* Call lseek (FD, OFFSET, WHENCE), where file descriptor FD
+   corresponds to the file FILENAME.  WHENCE must be SEEK_SET or
+   SEEK_CUR.  Return the resulting offset.  Give a diagnostic and
+   return -1 if lseek fails.  */
+
+static off_t
+elseek (int fd, off_t offset, int whence, char const *filename)
+{
+  off_t new_offset = lseek (fd, offset, whence);
+  char buf[INT_BUFSIZE_BOUND (offset)];
+
+  if (new_offset < 0)
+error (0, errno,
+   _(whence == SEEK_SET
+ ? N_("%s: cannot seek to offset %s")
+ : N_("%s: cannot seek to relative offset %s")),
+   quotearg_colon (filename),
+   offtostr (offset, buf));
+
+  return new_offset;
+}
+
+/* For an input file with name FILENAME and descriptor FD,
+   output all but the last N_ELIDE_0 bytes.
+   If CURRENT_POS is nonnegative, assume that the input file is
+   positioned at CURRENT_POS and that it should be repositioned to
+   just before the elided bytes before returning.
+   Return true upon success.
Give a diagnostic and return false upon error.  */
 static bool
-elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0)
+elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0,
+   off_t current_pos)
 {
   size_t n_elide = n_elide_0;
+  uintmax_t desired_pos = current_pos;
+  bool ok = true;
 
 #ifndef HEAD_TAIL_PIPE_READ_BUFSIZE
 # define HEAD_TAIL_PIPE_READ_BUFSIZE BUFSIZ
@@ -251,7 +281,6 @@ elide_tail_bytes_pipe (const char *filename, int fd, 
uintmax_t n_elide_0)
 
   if (n_elide <= HEAD_TAIL_PIPE_BYTECOUNT_THRESHOLD)
 {
-  bool ok = true;
   bool first = true;
   bool eof = false;
   size_t n_to_read = READ_BUFSIZE + n_elide;
@@ -293,22 +322,26 @@ elide_tail_bytes_pipe (const char *filename, int fd, 
uintmax_t n_elide_0)
   /* Output any (but maybe just part of the) elided data from
  the previous round.  */
   if (! first)
-xwrite_stdout (b[!i] + READ_BUFSIZE, n_elide - delta);
+{
+  desired_pos += n_elide - delta;
+  xwrite_stdout (b[!i] + READ_BUFSIZE, n_elide - delta);
+}
   first = false;
 
   if (n_elide < n_read)
-xwrite_stdout (b[i], n_read - n_elide);
+{
+  desired_pos += n_read - n_elide;
+  xwrite_stdout (b[i], n_read - n_elide);
+}
 }
 
   free (b[0]);
-  return ok;
 }
   else
 {
   /* Read blocks of size READ_BUFSIZE, until we've read at least n_elide
  bytes.  Then, for each new buffer we read, also write an old one.  */
 
-  bool ok = true;
   bool eof = false;
   size_t n_read;
   bool buffered_enough;
@@ -357,7 +390,10 @@ elide_tail_bytes_pipe (const char *filename, int fd, 
uintmax_t n_elide_0)
 buffered_enough = true;
 
   if (buffered_enough)
-xwrite_stdout (b[i_next], n_read);
+{
+  desired_pos += n_read;
+  xwrite_stdout (b[i_next], n_read);
+}
 }
 
   /* Output any remainder: rem bytes from b[i] + n_read.  */
@@ -366,6 +402,7 @@ elide_tail_bytes_pipe (const char *filename, int fd, 
uintmax_t n_elide_0)
   if (buffered_enough)
 {
   size_t n_bytes_left_in_b_i = READ_BUFSIZE - n_read;
+  desired_pos += rem;
   if (rem < n_bytes_left_in_b_i)
 {
   xwrite_stdout (b[i] + n_read, rem);
@@ -392,6 +429,7 @@ elide_tail_bytes_pipe (const char *filename, int fd, 
uintmax_t n_elide_0)
*/
   size_t y = READ_BUFSIZE - rem;
   size_t x = n_read - y;
+  desired_pos

bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread Pádraig Brady
On 10/03/2014 07:47 PM, Paul Eggert wrote:
> On 10/03/2014 11:26 AM, Jim Meyering wrote:
>> That looks like a fine fix.
> 
> Unfortunately that fix would make 'wc -c' way slower for a file that 
> consists entirely of a big hole.

True, which you could avoid by deferring to read() for empty files:

diff --git a/src/wc.c b/src/wc.c
index 1ff007d..f8176cc 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
 fstatus->failed = fstat (fd, &fstatus->st);

   if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
+  && fstatus->st.st_blocks && fstatus->st.st_size
   && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
   && (end_pos = lseek (fd, 0, SEEK_END)) != -1)
 {

> How about if we change usable_st_size to return false for these proc files, 
> with a heuristic as tight as we can make it, and to have coreutils check 
> usable_st_size in more places.  Something like this, perhaps:
> 
>   /* Return a boolean indicating whether SB->st_size is correct. */
>   static inline bool
>   usable_st_size (struct stat const *sb)
>   {
> if (S_ISREG (sb->st_mode))
>   {
> /* proc files like /sys/kernel/vmcoreinfo are weird: their
>st_size values do not reflect what's actually in them.
>The following heuristic attempts to catch proc files without
>catching many regular files that just happen to have the same
>signature.  */
> return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0
>   && sb->st_size == ST_BLKSIZE (*sb));
>   }
> return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb));
>   }
>
> and then review every place where coreutils currently uses st_size and 
> prepend a check for usable_st_size if needed.

That would be usefult, and that check matches this case, however many are not
distinguishable. Consider: `stat /proc/$$/io` which has st_size and st_blocks = 0
Note my adjusted patch above handles both cases.

thanks,
Pádraig





bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread Paul Eggert

On 10/03/2014 11:26 AM, Jim Meyering wrote:

That looks like a fine fix.


Unfortunately that fix would make 'wc -c' way slower for a file that 
consists entirely of a big hole.


How about if we change usable_st_size to return false for these proc 
files, with a heuristic as tight as we can make it, and to have 
coreutils check usable_st_size in more places.  Something like this, 
perhaps:


  /* Return a boolean indicating whether SB->st_size is correct. */
  static inline bool
  usable_st_size (struct stat const *sb)
  {
if (S_ISREG (sb->st_mode))
  {
/* proc files like /sys/kernel/vmcoreinfo are weird: their
   st_size values do not reflect what's actually in them.
   The following heuristic attempts to catch proc files without
   catching many regular files that just happen to have the same
   signature.  */
return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0
  && sb->st_size == ST_BLKSIZE (*sb));
  }
return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb));
  }


and then review every place where coreutils currently uses st_size and 
prepend a check for usable_st_size if needed.






bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread Jim Meyering
On Fri, Oct 3, 2014 at 9:48 AM, Pádraig Brady  wrote:
> On 10/03/2014 03:47 PM, George Shuklin wrote:
...
> I'm not sure where the above code comes from,
> by coreutils trunk has the same behavior with these files.
> We could avoid it with the following patch.
> Note in the case where "real" small files don't
> take up space in the file system, this will involve a redundant read,
> however that will only be the case for small files so shouldn't
> be problematic.
>
> thanks,
> Pádraig.
>
> diff --git a/src/wc.c b/src/wc.c
> index 1ff007d..bf1ce76 100644
> --- a/src/wc.c
> +++ b/src/wc.c
> @@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
>  fstatus->failed = fstat (fd, &fstatus->st);
>
>if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
> +  && fstatus->st.st_blocks
>&& (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
>&& (end_pos = lseek (fd, 0, SEEK_END)) != -1)
>  {

That looks like a fine fix.
However, a similar issue affects tac, when its lseek-SEEK_END fails:

  $ tac /sys/kernel/vmcoreinfo
  tac: /sys/kernel/vmcoreinfo: read error: Inappropriate ioctl for device





bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread Pádraig Brady
On 10/03/2014 03:47 PM, George Shuklin wrote:
> There is many sysfs (linux) attributes which reported as '4k files' but 
> contains just a few bytes.
> 
> wc file and wc -c shows different sizes.
> 
> Example:
> 
> $cat /sys/kernel/vmcoreinfo
> 1b74c00 1024
> 
> $hexdump -Cv /sys/kernel/vmcoreinfo
>   31 62 37 34 63 30 30 20  31 30 32 34 0a   |1b74c00 1024.|
> 000d
> 
> $ls -la /sys/kernel/vmcoreinfo
> -r--r--r-- 1 root root 4096 Oct  3 17:40 /sys/kernel/vmcoreinfo
> 
> 
> Here wc output:
> 
> $ wc /sys/kernel/vmcoreinfo
>12   13 /sys/kernel/vmcoreinfo
> 
> and wc -c:
> 
> $ wc -c /sys/kernel/vmcoreinfo
> 4096 /sys/kernel/vmcoreinfo
> 
> 4096 is not 13, and manual page for wc says that third number is byte count.
> 
> I think problem is in  cnt(const char *file)  function:
> 
> if (dochar || domulti) {
> if (fstat(fd, &sb)) {
> warn("%s: fstat", file);
> (void)close(fd);
> return (1);
> }
> if (S_ISREG(sb.st_mode)) {
> (void)printf(" %7lld", (long long)sb.st_size);
> tcharct += sb.st_size;
> (void)close(fd);
> return (0);
> }
> }

I'm not sure where the above code comes from,
by coreutils trunk has the same behavior with these files.
We could avoid it with the following patch.
Note in the case where "real" small files don't
take up space in the file system, this will involve a redundant read,
however that will only be the case for small files so shouldn't
be problematic.

thanks,
Pádraig.

diff --git a/src/wc.c b/src/wc.c
index 1ff007d..bf1ce76 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
 fstatus->failed = fstat (fd, &fstatus->st);

   if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
+  && fstatus->st.st_blocks
   && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
   && (end_pos = lseek (fd, 0, SEEK_END)) != -1)
 {






bug#18621: [BUG] wc -c incorrectly counts bytes in /sys

2014-10-03 Thread George Shuklin
There is many sysfs (linux) attributes which reported as '4k files' but 
contains just a few bytes.


wc file and wc -c shows different sizes.

Example:

$cat /sys/kernel/vmcoreinfo
1b74c00 1024

$hexdump -Cv /sys/kernel/vmcoreinfo
  31 62 37 34 63 30 30 20  31 30 32 34 0a   |1b74c00 1024.|
000d

$ls -la /sys/kernel/vmcoreinfo
-r--r--r-- 1 root root 4096 Oct  3 17:40 /sys/kernel/vmcoreinfo


Here wc output:

$ wc /sys/kernel/vmcoreinfo
   12   13 /sys/kernel/vmcoreinfo

and wc -c:

$ wc -c /sys/kernel/vmcoreinfo
4096 /sys/kernel/vmcoreinfo

4096 is not 13, and manual page for wc says that third number is byte count.

I think problem is in  cnt(const char *file)  function:

if (dochar || domulti) {
if (fstat(fd, &sb)) {
warn("%s: fstat", file);
(void)close(fd);
return (1);
}
if (S_ISREG(sb.st_mode)) {
(void)printf(" %7lld", (long long)sb.st_size);
tcharct += sb.st_size;
(void)close(fd);
return (0);
}
}