bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-20 Thread Bernhard Voelker
On 12/20/2016 02:12 PM, Pádraig Brady wrote:
> Right!
> 
> While st_size would have been incorrect for subsequent
> files since v7.1, it was only used since v8.24.
> 
> Fixed with:
> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=94d2c68

Thanks!

Have a nice day,
Berny





bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-20 Thread Pádraig Brady
On 20/12/16 01:50, Bernhard Voelker wrote:
> On 12/19/2016 08:00 PM, Pádraig Brady wrote:
>> +  [bug introduced in coreutils-7.1]
> 
> FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0:
> I had a working 8.23 on a system here, so I took the time to search deeper.
> I found the reason to be the wrong value of the 'hi_pos' parameter passed
> to lseek():
> 
>   open("wc.big", O_RDONLY)= 3
>   fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
>   lseek(3, 1073741824, SEEK_CUR)  = 1073741824
>   read(3, "", 16384)  = 0
>   fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
>   write(1, "1073741824 wc.big\n", 181073741824 wc.big) = 18
>   close(3)= 0
>   open("wc.small", O_RDONLY)  = 3
>   lseek(3, 1073741824, SEEK_CUR)  = 1073741824
>   read(3, "", 16384)  = 0
>   write(1, "1073741824 wc.small\n", 201073741824 wc.small
> 
> This took me directly to v8.23-47-g2662702 (which a quick test
> against v8.23-47-g2662702^ confirmed).
> 
> Therefore, I think it's worth to do the following amendment:
> 
> -  [bug introduced in coreutils-7.1]
> +  [bug introduced in coreutils-8.24]

Right!

While st_size would have been incorrect for subsequent
files since v7.1, it was only used since v8.24.

Fixed with:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=94d2c68

thanks,
Pádraig





bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread Bernhard Voelker
On 12/19/2016 08:00 PM, Pádraig Brady wrote:
> +  [bug introduced in coreutils-7.1]

FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0:
I had a working 8.23 on a system here, so I took the time to search deeper.
I found the reason to be the wrong value of the 'hi_pos' parameter passed
to lseek():

  open("wc.big", O_RDONLY)= 3
  fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
  lseek(3, 1073741824, SEEK_CUR)  = 1073741824
  read(3, "", 16384)  = 0
  fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 8), ...}) = 0
  write(1, "1073741824 wc.big\n", 181073741824 wc.big) = 18
  close(3)= 0
  open("wc.small", O_RDONLY)  = 3
  lseek(3, 1073741824, SEEK_CUR)  = 1073741824
  read(3, "", 16384)  = 0
  write(1, "1073741824 wc.small\n", 201073741824 wc.small

This took me directly to v8.23-47-g2662702 (which a quick test
against v8.23-47-g2662702^ confirmed).

Therefore, I think it's worth to do the following amendment:

-  [bug introduced in coreutils-7.1]
+  [bug introduced in coreutils-8.24]

Thanks & have a nice day,
Berny





bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread William R. Fraser
Looks good :)

On Mon, Dec 19, 2016 at 11:00 AM, Pádraig Brady  wrote:

> On 21/03/16 15:16, Pádraig Brady wrote:
> > On 21/03/16 00:59, William R. Fraser wrote:
> >> When wc gets its list of files by reading from stdin, using the argument
> >> '--from-files0=-', it reuses the same fstatus struct for each file.
> >>
> >> The problem is that the 'wc' function checks the 'failed' member of this
> >> struct and if it is <=0, it skips doing fstat on the file. The main loop
> >> doesn't reset this value between files, so only the first file has fstat
> >> done on it.
> >>
> >> This can result in the 'wc' function seeking past the end of
> >> subsequent files and then over-reporting their byte counts.
> >>
> >> See the attached patch, which resets the fstatus struct in between files
> >> when reading the file list from stdin.
> >
> > Ouch. This seems to be since v7.0-96-gc2e56e0
> > It would also mean there would be a lot of redundant reading
> > if the initial file was significantly smaller than any other file.
> >
> > $ truncate -s1G wc.big
> > $ touch wc.small
> > $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> > 1073741824 wc.big
> > 1073741760 wc.small
> > 2147483584 total
>
> Sorry for the delay.
> I didn't go far enough back in my TODO list so missed this.
> Proposed patch attached.
>
> thanks,
> Pádraig
>
>


-- 
Bill Fraser { http://www.codewise.org/wrf }



bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread Pádraig Brady
On 21/03/16 15:16, Pádraig Brady wrote:
> On 21/03/16 00:59, William R. Fraser wrote:
>> When wc gets its list of files by reading from stdin, using the argument
>> '--from-files0=-', it reuses the same fstatus struct for each file.
>>
>> The problem is that the 'wc' function checks the 'failed' member of this
>> struct and if it is <=0, it skips doing fstat on the file. The main loop
>> doesn't reset this value between files, so only the first file has fstat
>> done on it.
>>
>> This can result in the 'wc' function seeking past the end of
>> subsequent files and then over-reporting their byte counts.
>>
>> See the attached patch, which resets the fstatus struct in between files
>> when reading the file list from stdin.
> 
> Ouch. This seems to be since v7.0-96-gc2e56e0
> It would also mean there would be a lot of redundant reading
> if the initial file was significantly smaller than any other file.
> 
> $ truncate -s1G wc.big
> $ touch wc.small
> $ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
> 1073741824 wc.big
> 1073741760 wc.small
> 2147483584 total

Sorry for the delay.
I didn't go far enough back in my TODO list so missed this.
Proposed patch attached.

thanks,
Pádraig

>From 31bba96cb7dc565719c396b5973085e644939fd5 Mon Sep 17 00:00:00 2001
From: "William R. Fraser" 
Date: Sun, 20 Mar 2016 17:44:09 -0700
Subject: [PATCH] wc: fix wrong byte counts when using --files-from0

* src/wc.c (main): Reset fstatus[0].failed between files when reusing
the fstatus[0] entry in --files-from0 mode.  This ensures a stat() is
done for each file, avoid incorrect counts and redundant reading.
* NEWS: Mention the bug fix.
* tests/misc/wc-files0.sh: Add a test case.
Fixes http://bugs.gnu.org/23073
---
 NEWS|  5 +
 src/wc.c|  3 +++
 tests/misc/wc-files0.sh | 13 -
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index 179c19b..1ed5bd9 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,11 @@ GNU coreutils NEWS-*- outline -*-
   158909489063877810457 and 222087527029934481871.
   [bug introduced in coreutils-8.20]
 
+  wc --bytes --files0-from now correctly reports byte counts.
+  Previously it may have returned values that were too large,
+  depending on the size of the first file processed.
+  [bug introduced in coreutils-7.1]
+
 
 * Noteworthy changes in release 8.26 (2016-11-30) [stable]
 
diff --git a/src/wc.c b/src/wc.c
index 412bda0..64df50c 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -807,6 +807,9 @@ main (int argc, char **argv)
 ok = false;
   else
 ok &= wc_file (file_name, [nfiles ? i : 0]);
+
+  if (! nfiles)
+fstatus[0].failed = 1;
 }
  argv_iter_done:
 
diff --git a/tests/misc/wc-files0.sh b/tests/misc/wc-files0.sh
index 12b7d6a..d92a010 100755
--- a/tests/misc/wc-files0.sh
+++ b/tests/misc/wc-files0.sh
@@ -25,7 +25,7 @@ printf '2b\n2w\n' |tr '\n' '\0' > names || framework_failure_
 
 
 wc --files0-from=names > out || fail=1
-cat <<\EOF > exp || fail=1
+cat <<\EOF > exp || framework_failure_
  1  1  2 2b
  1  2  8 2w
  2  3 10 total
@@ -48,4 +48,15 @@ printf '%s\0' "$nlname" | wc --files0-from=- > out || fail=1
 printf '%s\n' "0 0 0 '1'$'\\n''2'" > exp || framework_failure_
 compare exp out || fail=1
 
+# Ensure correct byte counts, which fails between v7.1 and v8.26 inclusive
+truncate -s1G wc.big || framework_failure_
+touch wc.small || framework_failure_
+printf '%s\0' wc.big wc.small | wc -c --files0-from=- >out || fail=1
+cat <<\EOF > exp || framework_failure_
+1073741824 wc.big
+0 wc.small
+1073741824 total
+EOF
+compare exp out || fail=1
+
 Exit $fail
-- 
2.5.5



bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Jim Meyering
On Sun, Mar 20, 2016 at 5:59 PM, William R. Fraser  wrote:
> When wc gets its list of files by reading from stdin, using the argument
> '--from-files0=-', it reuses the same fstatus struct for each file.
>
> The problem is that the 'wc' function checks the 'failed' member of this
> struct and if it is <=0, it skips doing fstat on the file. The main loop
> doesn't reset this value between files, so only the first file has fstat
> done on it.
>
> This can result in the 'wc' function seeking past the end of subsequent
> files and then over-reporting their byte counts.
>
> See the attached patch, which resets the fstatus struct in between files
> when reading the file list from stdin.

Thank you for the patch and report for what looks like
a bug in code I wrote.





bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Bernhard Voelker

On 03/21/2016 04:16 PM, Pádraig Brady wrote:

On 21/03/16 00:59, William R. Fraser wrote:

When wc gets its list of files by reading from stdin, using the argument
'--from-files0=-', it reuses the same fstatus struct for each file.

The problem is that the 'wc' function checks the 'failed' member of this
struct and if it is <=0, it skips doing fstat on the file. The main loop
doesn't reset this value between files, so only the first file has fstat
done on it.

This can result in the 'wc' function seeking past the end of
subsequent files and then over-reporting their byte counts.

See the attached patch, which resets the fstatus struct in between files
when reading the file list from stdin.


Ouch. This seems to be since v7.0-96-gc2e56e0
It would also mean there would be a lot of redundant reading
if the initial file was significantly smaller than any other file.

$ truncate -s1G wc.big
$ touch wc.small
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
1073741824 wc.big
1073741760 wc.small
2147483584 total

We'll submit a full patch in your name.


Interesting enough, there seems to be a threshold to trigger the bug:

$ touch wc.small

$ seq 1000 > wc.big
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
3893 wc.big
0 wc.small
3893 total

$ seq 1 > wc.big
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
48894 wc.big
45067 wc.small
93961 total

That's why I couldn't reproduce it this morning.

Have a nice day,
Berny





bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Pádraig Brady

On 21/03/16 00:59, William R. Fraser wrote:

When wc gets its list of files by reading from stdin, using the argument
'--from-files0=-', it reuses the same fstatus struct for each file.

The problem is that the 'wc' function checks the 'failed' member of this
struct and if it is <=0, it skips doing fstat on the file. The main loop
doesn't reset this value between files, so only the first file has fstat
done on it.

This can result in the 'wc' function seeking past the end of
subsequent files and then over-reporting their byte counts.

See the attached patch, which resets the fstatus struct in between files
when reading the file list from stdin.


Ouch. This seems to be since v7.0-96-gc2e56e0
It would also mean there would be a lot of redundant reading
if the initial file was significantly smaller than any other file.

$ truncate -s1G wc.big
$ touch wc.small
$ printf '%s\0' wc.big wc.small | wc -c --files0-from=-
1073741824 wc.big
1073741760 wc.small
2147483584 total

We'll submit a full patch in your name.

thanks!
Pádraig.





bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-20 Thread William R. Fraser
When wc gets its list of files by reading from stdin, using the argument 
'--from-files0=-', it reuses the same fstatus struct for each file.


The problem is that the 'wc' function checks the 'failed' member of this 
struct and if it is <=0, it skips doing fstat on the file. The main loop 
doesn't reset this value between files, so only the first file has fstat 
done on it.


This can result in the 'wc' function seeking past the end of 
subsequent files and then over-reporting their byte counts.


See the attached patch, which resets the fstatus struct in between files 
when reading the file list from stdin.


Thanks,
- Bill FraserFrom 3be80bf2a90c5d9d227fa9c920e7a5e0e70f473b Mon Sep 17 00:00:00 2001
From: "William R. Fraser" 
Date: Sun, 20 Mar 2016 17:44:09 -0700
Subject: [PATCH] wc: fix wrong byte counts when using --files-from0

When using '--files-from0=-', wc reuses the same fstatus struct for each
file. However, it doesn't reset the fstatus.failed member between files,
which causes wc to skip doing a fstat on all files after the first.

This can result in wc seeking past the end of subsequent files, and then
over-reporting the byte count.
---
 src/wc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/wc.c b/src/wc.c
index 94cbaff..4548235 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -806,6 +806,9 @@ main (int argc, char **argv)
 ok = false;
   else
 ok &= wc_file (file_name, [nfiles ? i : 0]);
+
+  if (!nfiles)
+fstatus[0].failed = 1;
 }
  argv_iter_done:
 
-- 
2.7.3