bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-10 Thread Jim Meyering
Paul Eggert wrote:

 On 05/09/2012 03:14 AM, Jim Meyering wrote:
 I think Pádraig's question about dd's skip seeking to EOF on an
 actual tape device is the most important to address.

 Yes indeed, I think that part of my change should be backed out
 at least for now (so close before a release).

 Part of the issue is that it's not clear what lseek (fd, N, SEEK_SET)
 does on tape devices where N exceeds the length of the file.  My
 vague recollection is that such lseeks return L, where L is the
 file length, but I can't cite chapter and verse on that right now.
 This would suggest further fixes in this area -- but later.

 Your stat.c change is actually a bug fix, so I'd prefer to
 put it in a separate commit.  I did that for you.  Let me know
 if the change below is ok and I'll push it -- or you're welcome
 to make any change you'd like and push it yourself.

 Yes, please push that.  I'll try to squeeze some time free
 to look at this in the next day or two.

I see you pushed it.
Thanks for adding that line to NEWS:

 stat no longer reports a negative file size as a huge positive number.
+[bug present since 'stat' was introduced in fileutils-4.1.9]





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-10 Thread Paul Eggert

On 05/09/2012 11:36 PM, Jim Meyering wrote:

I see you pushed it.
Thanks for adding that line to NEWS:


You're welcome.  I shook free some time to adjust the st_size patch,
and here's a new version that I think incorporates all the comments
so far:

From 0c9661af1b29e2999030cb93003d98673faabf83 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Wed, 9 May 2012 23:53:16 -0700
Subject: [PATCH] maint: handle file sizes more reliably

Problem reported by Samuel Thibault in http://debbugs.gnu.org/11424.
* NEWS: Document this.
* src/dd.c (skip): Handle skipping past EOF on shared or typed
memory objects the same way as with regular files.
(dd_copy): It's OK to truncate shared memory objects.
* src/du.c (duinfo_add): Check for overflow.
(print_only_size): Report overflow.
(process_file): Ignore negative file sizes in the --apparent-size case.
* src/od.c (skip): Fix comment about st_size.
* src/split.c (main):
* src/truncate.c (do_ftruncate, main):
On files where st_size is not portable, fall back on using lseek
with SEEK_END to determine the size.  Although strictly speaking
POSIX says the behavior is implementation-defined, in practice
if lseek returns a nonnegative value it's a reasonable one to
use for the file size.
* src/system.h (usable_st_size): Symlinks have reliable st_size too.
* tests/misc/truncate-dir-fail: Don't assume that getting the size
of a dir is not allowed, as it's now allowed on many platforms,
e.g., GNU/Linux.
---
 NEWS |3 ++
 src/dd.c |8 +++---
 src/du.c |   12 ++---
 src/od.c |3 +-
 src/split.c  |   14 +++---
 src/system.h |3 +-
 src/truncate.c   |   56 ++---
 tests/misc/truncate-dir-fail |3 --
 8 files changed, 69 insertions(+), 33 deletions(-)

diff --git a/NEWS b/NEWS
index 7ef2f54..e56f8fa 100644
--- a/NEWS
+++ b/NEWS
@@ -27,6 +27,9 @@ GNU coreutils NEWS-*- 
outline -*-
 
 ** New features
 
+  split and truncate now allow any seekable files in situations where

+  the file size is needed, instead of insisting on regular files.
+
   fmt now accepts the --goal=WIDTH (-g) option.
 
 ** Changes in behavior

diff --git a/src/dd.c b/src/dd.c
index 4626de2..163d514 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -1544,7 +1544,7 @@ skip (int fdesc, char const *file, uintmax_t records, 
size_t blocksize,
struct stat st;
if (fstat (STDIN_FILENO, st) != 0)
  error (EXIT_FAILURE, errno, _(cannot fstat %s), quote (file));
-   if (S_ISREG (st.st_mode)  st.st_size  (input_offset + offset))
+   if (usable_st_size (st)  st.st_size  input_offset + offset)
  {
/* When skipping past EOF, return the number of _full_ blocks
 * that are not skipped, and set offset to EOF, so the caller
@@ -2104,8 +2104,8 @@ dd_copy (void)
 }
 }
 
-  /* If the last write was converted to a seek, then for a regular file,

- ftruncate to extend the size.  */
+  /* If the last write was converted to a seek, then for a regular file
+ or shared memory object, ftruncate to extend the size.  */
   if (final_op_was_seek)
 {
   struct stat stdout_stat;
@@ -2114,7 +2114,7 @@ dd_copy (void)
   error (0, errno, _(cannot fstat %s), quote (output_file));
   return EXIT_FAILURE;
 }
-  if (S_ISREG (stdout_stat.st_mode))
+  if (S_ISREG (stdout_stat.st_mode) || S_TYPEISSHM (stdout_stat))
 {
   off_t output_offset = lseek (STDOUT_FILENO, 0, SEEK_CUR);
   if (output_offset  stdout_stat.st_size)
diff --git a/src/du.c b/src/du.c
index 41c9535..7333941 100644
--- a/src/du.c
+++ b/src/du.c
@@ -99,7 +99,8 @@ duinfo_set (struct duinfo *a, uintmax_t size, struct timespec 
tmax)
 static inline void
 duinfo_add (struct duinfo *a, struct duinfo const *b)
 {
-  a-size += b-size;
+  uintmax_t sum = a-size + b-size;
+  a-size = a-size = sum ? sum : UINTMAX_MAX;
   if (timespec_cmp (a-tmax, b-tmax)  0)
 a-tmax = b-tmax;
 }
@@ -370,8 +371,11 @@ static void
 print_only_size (uintmax_t n_bytes)
 {
   char buf[LONGEST_HUMAN_READABLE + 1];
-  fputs (human_readable (n_bytes, buf, human_output_opts,
- 1, output_block_size), stdout);
+  fputs ((n_bytes == UINTMAX_MAX
+  ? _(Infinity)
+  : human_readable (n_bytes, buf, human_output_opts,
+1, output_block_size)),
+ stdout);
 }
 
 /* Print size (and optionally time) indicated by *PDUI, followed by STRING.  */

@@ -495,7 +499,7 @@ process_file (FTS *fts, FTSENT *ent)
 
   duinfo_set (dui,

   (apparent_size
-   ? sb-st_size
+   ? MAX (0, sb-st_size)
: (uintmax_t) ST_NBLOCKS (*sb) * ST_NBLOCKSIZE),
   (time_type == time_mtime ? get_stat_mtime (sb)
: 

bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-10 Thread Jim Meyering
Jim Meyering wrote:
 Paul Eggert wrote:
 On 05/09/2012 11:36 PM, Jim Meyering wrote:
 I see you pushed it.
 Thanks for adding that line to NEWS:

 You're welcome.  I shook free some time to adjust the st_size patch,
 and here's a new version that I think incorporates all the comments
 so far:

 From 0c9661af1b29e2999030cb93003d98673faabf83 Mon Sep 17 00:00:00 2001
 From: Paul Eggert egg...@cs.ucla.edu
 Date: Wed, 9 May 2012 23:53:16 -0700
 Subject: [PATCH] maint: handle file sizes more reliably
 ...

 Thanks for sending that.
 Unfortunately, I am unable to apply it because
 something has mangled the patch.

No need to resend.  I figured it out.
Filtering your patch through sed 's/^  / /'
was enough to undo the offending transformation.
Then it applied with git am FILE.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-10 Thread Pádraig Brady
On 05/10/2012 07:55 AM, Paul Eggert wrote:
 diff --git a/src/truncate.c b/src/truncate.c

 @@ -348,15 +361,28 @@ main (int argc, char **argv)
  {
/* FIXME: Maybe support getting size of block devices.  */

Can the above be removed, as I think SEEK_END should handle devices.

Otherwise looks good.

thanks,
Pádraig.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-10 Thread Jim Meyering
Pádraig Brady wrote:
 On 05/10/2012 07:55 AM, Paul Eggert wrote:
 diff --git a/src/truncate.c b/src/truncate.c

 @@ -348,15 +361,28 @@ main (int argc, char **argv)
  {
/* FIXME: Maybe support getting size of block devices.  */

 Can the above be removed, as I think SEEK_END should handle devices.

Thanks.
I'll amend to remove that comment.

 Otherwise looks good.

I reached the same conclusion.
The only other change is to the bug URL in the log message: s/debbugs/bugs/





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-09 Thread Jim Meyering
Paul Eggert wrote:
 On 05/08/2012 01:39 AM, Jim Meyering wrote:
 I went ahead and pushed the less-invasive fix.

 Hmm, I don't see this on Savannah; is this part
 of the problem where usable_st_size got pushed?

 Your behavior-changing one is more than welcome, too.

 I came up with a better idea, and propose this patch
 instead.  The idea is to fall back on lseek if
 st_size is not reliable.  This allows the programs
 to work in more cases, including the case in question.
 One test case needs to be removed because it assumes
 a command must fail, where it now typically works.

From bba6f199f621fb434c5df09a0b0278a081be87e2 Mon Sep 17 00:00:00 2001
 From: Paul Eggert egg...@cs.ucla.edu
 Date: Tue, 8 May 2012 09:22:22 -0700
 Subject: [PATCH] maint: handle file sizes more reliably

 Problem reported by Samuel Thibault in http://debbugs.gnu.org/11424.
 * NEWS: Document this.
 * src/dd.c (skip):
 * src/split.c (main):
 * src/truncate.c (do_ftruncate, main):
 On files where st_size is not portable, fall back on using lseek
 with SEEK_END to determine the size.  Although strictly speaking
 POSIX says the behavior is implementation-defined, in practice
 if lseek returns a nonnegative value it's a reasonable one to
 use for the file size.
 * src/dd.c (dd_copy): It's OK to truncate shared memory objects.
 * src/du.c (duinfo_add): Check for overflow.
 (print_only_size): Report overflow.
 (process_file): Ignore negative file sizes in the --apparent-size case.
 * src/od.c (skip): Fix comment about st_size.
 * src/stat.c (print_stat): Don't report negative sizes as huge
 positive ones.
 * src/system.h (usable_st_size): Symlinks have reliable st_size too.
 * tests/misc/truncate-dir-fail: Don't assume that getting the size
 of a dir is not allowed, as it's now allowed on many platforms,
 e.g., GNU/Linux.
 ---
...
 diff --git a/src/stat.c b/src/stat.c
 index b2e1030..d001cda 100644
 --- a/src/stat.c
 +++ b/src/stat.c
 @@ -954,7 +954,7 @@ print_stat (char *pformat, size_t prefix_len, unsigned 
 int m,
out_uint_x (pformat, prefix_len, minor (statbuf-st_rdev));
break;
  case 's':
 -  out_uint (pformat, prefix_len, statbuf-st_size);
 +  out_int (pformat, prefix_len, statbuf-st_size);
break;
  case 'B':
out_uint (pformat, prefix_len, ST_NBLOCKSIZE);

Thanks for all of that.
I think Pádraig's question about dd's skip seeking to EOF on an
actual tape device is the most important to address.

Your stat.c change is actually a bug fix, so I'd prefer to
put it in a separate commit.  I did that for you.  Let me know
if the change below is ok and I'll push it -- or you're welcome
to make any change you'd like and push it yourself.

From 0b816e06d0b3d0cc9b7d2f92b095145bfe7c5476 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Wed, 9 May 2012 12:07:57 +0200
Subject: [PATCH] stat: don't report negative file size as huge positive
 number

* src/stat.c (print_stat): Use out_int, not out_uint for stat.st_size.
* NEWS (Bug fixes): Mention it.
---
 NEWS   | 2 ++
 src/stat.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index eb95404..2935276 100644
--- a/NEWS
+++ b/NEWS
@@ -22,6 +22,8 @@ GNU coreutils NEWS-*- 
outline -*-
   split --number=C /dev/null no longer appears to infloop on GNU/Hurd
   [bug introduced in coreutils-8.8]

+  stat no longer reports a negative file size as a huge positive number
+
 ** New features

   fmt now accepts the --goal=WIDTH (-g) option.
diff --git a/src/stat.c b/src/stat.c
index b2e1030..d001cda 100644
--- a/src/stat.c
+++ b/src/stat.c
@@ -954,7 +954,7 @@ print_stat (char *pformat, size_t prefix_len, unsigned int 
m,
   out_uint_x (pformat, prefix_len, minor (statbuf-st_rdev));
   break;
 case 's':
-  out_uint (pformat, prefix_len, statbuf-st_size);
+  out_int (pformat, prefix_len, statbuf-st_size);
   break;
 case 'B':
   out_uint (pformat, prefix_len, ST_NBLOCKSIZE);
--
1.7.10.1.487.ga3935e6





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-09 Thread Pádraig Brady
On 05/09/2012 11:14 AM, Jim Meyering wrote:
 Paul Eggert wrote:
 On 05/08/2012 01:39 AM, Jim Meyering wrote:
 I went ahead and pushed the less-invasive fix.

 Hmm, I don't see this on Savannah; is this part
 of the problem where usable_st_size got pushed?

 Your behavior-changing one is more than welcome, too.

 I came up with a better idea, and propose this patch
 instead.  The idea is to fall back on lseek if
 st_size is not reliable.  This allows the programs
 to work in more cases, including the case in question.
 One test case needs to be removed because it assumes
 a command must fail, where it now typically works.

 From bba6f199f621fb434c5df09a0b0278a081be87e2 Mon Sep 17 00:00:00 2001
 From: Paul Eggert egg...@cs.ucla.edu
 Date: Tue, 8 May 2012 09:22:22 -0700
 Subject: [PATCH] maint: handle file sizes more reliably

 Problem reported by Samuel Thibault in http://debbugs.gnu.org/11424.
 * NEWS: Document this.
 * src/dd.c (skip):
 * src/split.c (main):
 * src/truncate.c (do_ftruncate, main):
 On files where st_size is not portable, fall back on using lseek
 with SEEK_END to determine the size.  Although strictly speaking
 POSIX says the behavior is implementation-defined, in practice
 if lseek returns a nonnegative value it's a reasonable one to
 use for the file size.
 * src/dd.c (dd_copy): It's OK to truncate shared memory objects.
 * src/du.c (duinfo_add): Check for overflow.
 (print_only_size): Report overflow.
 (process_file): Ignore negative file sizes in the --apparent-size case.
 * src/od.c (skip): Fix comment about st_size.
 * src/stat.c (print_stat): Don't report negative sizes as huge
 positive ones.
 * src/system.h (usable_st_size): Symlinks have reliable st_size too.
 * tests/misc/truncate-dir-fail: Don't assume that getting the size
 of a dir is not allowed, as it's now allowed on many platforms,
 e.g., GNU/Linux.
 ---
 ...
 diff --git a/src/stat.c b/src/stat.c
 index b2e1030..d001cda 100644
 --- a/src/stat.c
 +++ b/src/stat.c
 @@ -954,7 +954,7 @@ print_stat (char *pformat, size_t prefix_len, unsigned 
 int m,
out_uint_x (pformat, prefix_len, minor (statbuf-st_rdev));
break;
  case 's':
 -  out_uint (pformat, prefix_len, statbuf-st_size);
 +  out_int (pformat, prefix_len, statbuf-st_size);
break;
  case 'B':
out_uint (pformat, prefix_len, ST_NBLOCKSIZE);
 
 Thanks for all of that.
 I think Pádraig's question about dd's skip seeking to EOF on an
 actual tape device is the most important to address.
 
 Your stat.c change is actually a bug fix, so I'd prefer to
 put it in a separate commit.  I did that for you.  Let me know
 if the change below is ok and I'll push it -- or you're welcome
 to make any change you'd like and push it yourself.
 
From 0b816e06d0b3d0cc9b7d2f92b095145bfe7c5476 Mon Sep 17 00:00:00 2001
 From: Paul Eggert egg...@cs.ucla.edu
 Date: Wed, 9 May 2012 12:07:57 +0200
 Subject: [PATCH] stat: don't report negative file size as huge positive
  number
 
 * src/stat.c (print_stat): Use out_int, not out_uint for stat.st_size.
 * NEWS (Bug fixes): Mention it.
 ---
  NEWS   | 2 ++
  src/stat.c | 2 +-
  2 files changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/NEWS b/NEWS
 index eb95404..2935276 100644
 --- a/NEWS
 +++ b/NEWS
 @@ -22,6 +22,8 @@ GNU coreutils NEWS-*- 
 outline -*-
split --number=C /dev/null no longer appears to infloop on GNU/Hurd
[bug introduced in coreutils-8.8]
 
 +  stat no longer reports a negative file size as a huge positive number
 +
  ** New features
 
fmt now accepts the --goal=WIDTH (-g) option.
 diff --git a/src/stat.c b/src/stat.c
 index b2e1030..d001cda 100644
 --- a/src/stat.c
 +++ b/src/stat.c
 @@ -954,7 +954,7 @@ print_stat (char *pformat, size_t prefix_len, unsigned 
 int m,
out_uint_x (pformat, prefix_len, minor (statbuf-st_rdev));
break;
  case 's':
 -  out_uint (pformat, prefix_len, statbuf-st_size);
 +  out_int (pformat, prefix_len, statbuf-st_size);
break;
  case 'B':
out_uint (pformat, prefix_len, ST_NBLOCKSIZE);
 --
 1.7.10.1.487.ga3935e6

For the record, stat(1) has output st_size as unsigned since the
initial implementation in fileutils-4.1.10.

I noticed that st_size is unsigned for 32 bit linux kernels
according to /usr/include/asm/stat.h, however my modern 32 kernel
gives EOVERFLOW for files in the 2-4G range, and thus shouldn't
put negative numbers in those fields. That used not be the case I think:
http://lkml.indiana.edu/hypermail/linux/kernel/0004.1/0343.html
Also other 32 bit environments might overflow here.

So I think this change is a net improvement.

cheers,
Pádraig.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-09 Thread Jim Meyering
Pádraig Brady wrote:
...
From 0b816e06d0b3d0cc9b7d2f92b095145bfe7c5476 Mon Sep 17 00:00:00 2001
 From: Paul Eggert egg...@cs.ucla.edu
 Date: Wed, 9 May 2012 12:07:57 +0200
 Subject: [PATCH] stat: don't report negative file size as huge positive
  number

 * src/stat.c (print_stat): Use out_int, not out_uint for stat.st_size.
 * NEWS (Bug fixes): Mention it.
...

 For the record, stat(1) has output st_size as unsigned since the
 initial implementation in fileutils-4.1.10.

 I noticed that st_size is unsigned for 32 bit linux kernels
 according to /usr/include/asm/stat.h, however my modern 32 kernel
 gives EOVERFLOW for files in the 2-4G range, and thus shouldn't
 put negative numbers in those fields. That used not be the case I think:
 http://lkml.indiana.edu/hypermail/linux/kernel/0004.1/0343.html
 Also other 32 bit environments might overflow here.

Thanks for investigating that.
It is reassuring to know that interpreting it as unsigned
used to be correct, and it's only in not adapting to the 32-bit
kernel ABI change did we let the error sneak in.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-09 Thread Paul Eggert
On 05/09/2012 03:14 AM, Jim Meyering wrote:
 I think Pádraig's question about dd's skip seeking to EOF on an
 actual tape device is the most important to address.

Yes indeed, I think that part of my change should be backed out
at least for now (so close before a release).

Part of the issue is that it's not clear what lseek (fd, N, SEEK_SET)
does on tape devices where N exceeds the length of the file.  My
vague recollection is that such lseeks return L, where L is the
file length, but I can't cite chapter and verse on that right now.
This would suggest further fixes in this area -- but later.

 Your stat.c change is actually a bug fix, so I'd prefer to
 put it in a separate commit.  I did that for you.  Let me know
 if the change below is ok and I'll push it -- or you're welcome
 to make any change you'd like and push it yourself.

Yes, please push that.  I'll try to squeeze some time free
to look at this in the next day or two.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-08 Thread Jim Meyering
Jim Meyering wrote:
 Paul Eggert wrote:
 On 05/07/2012 12:46 AM, Jim Meyering wrote:
 +
 +  /* stat.st_size is valid only for regular files.  For others, use 0.  */
 +  file_size = S_ISREG (stat_buf.st_mode) ? stat_buf.st_size : 0;

 Is it right to use 0 there, for non-regular files?
 Won't later code compute incorrect sizes in that case?

 Hi Paul,

 I agree that more change is required and do prefer the direction your
 patches suggest.  However, to fix the Hurd/infloop with minimal
 impact elsewhere, I have a slight preference for my small change.
 I.e. continuing to operate on non-regular files with --number we
 don't have to change the split --number tests that operate on /dev/zero.
 Then, introducing the behavior change (with your follow-on patch) can
 be independent of the bug fix commit.

 I do admit that without being able to determine a size up front, there's
 little point in using that option, so your patch (reject files with
 unusable stat.st_size) is required.

 With or without my patch on Linux/GNU, if you split /dev/zero,
 it sets file_size = 0, so at least for the tested cases
 I don't think that patch introduces a regression.

 Also, as a nit, stat.st_size is also valid for
 SHM and TMO files (this was in the patch I just sent).

 Good point.
 Do you feel like adding something like this to system.h
 and completing your patch?

 /* Return a boolean indicating whether sb-st_size is defined.  */
 static inline bool
 usable_st_size (struct stat const *sb)
 {
   return S_ISREG (sb-st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb);
 }

I went ahead and pushed the less-invasive fix.
Your behavior-changing one is more than welcome, too.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-08 Thread Jim Meyering
Jim Meyering wrote:
 Paul Eggert wrote:
...
 I went ahead and pushed the less-invasive fix.
 Your behavior-changing one is more than welcome, too.

Samuel confirmed that the fix solved his problem, so
I've marked this as closed.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-08 Thread Paul Eggert
On 05/08/2012 01:39 AM, Jim Meyering wrote:
 I went ahead and pushed the less-invasive fix.

Hmm, I don't see this on Savannah; is this part
of the problem where usable_st_size got pushed?

 Your behavior-changing one is more than welcome, too.

I came up with a better idea, and propose this patch
instead.  The idea is to fall back on lseek if
st_size is not reliable.  This allows the programs
to work in more cases, including the case in question.
One test case needs to be removed because it assumes
a command must fail, where it now typically works.


From bba6f199f621fb434c5df09a0b0278a081be87e2 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
Date: Tue, 8 May 2012 09:22:22 -0700
Subject: [PATCH] maint: handle file sizes more reliably

Problem reported by Samuel Thibault in http://debbugs.gnu.org/11424.
* NEWS: Document this.
* src/dd.c (skip):
* src/split.c (main):
* src/truncate.c (do_ftruncate, main):
On files where st_size is not portable, fall back on using lseek
with SEEK_END to determine the size.  Although strictly speaking
POSIX says the behavior is implementation-defined, in practice
if lseek returns a nonnegative value it's a reasonable one to
use for the file size.
* src/dd.c (dd_copy): It's OK to truncate shared memory objects.
* src/du.c (duinfo_add): Check for overflow.
(print_only_size): Report overflow.
(process_file): Ignore negative file sizes in the --apparent-size case.
* src/od.c (skip): Fix comment about st_size.
* src/stat.c (print_stat): Don't report negative sizes as huge
positive ones.
* src/system.h (usable_st_size): Symlinks have reliable st_size too.
* tests/misc/truncate-dir-fail: Don't assume that getting the size
of a dir is not allowed, as it's now allowed on many platforms,
e.g., GNU/Linux.
---
 NEWS |3 ++
 src/dd.c |   17 +---
 src/du.c |   12 ++---
 src/od.c |3 +-
 src/split.c  |7 +
 src/stat.c   |2 +-
 src/system.h |3 +-
 src/truncate.c   |   56 ++---
 tests/misc/truncate-dir-fail |3 --
 9 files changed, 76 insertions(+), 30 deletions(-)

diff --git a/NEWS b/NEWS
index 2dc6531..c911d52 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,9 @@ GNU coreutils NEWS-*- 
outline -*-
 
 ** New features
 
+  dd, split, and truncate now allow any seekable files in situations
+  where the file size is needed, instead of insisting on regular files.
+
   fmt now accepts the --goal=WIDTH (-g) option.
 
 ** Changes in behavior
diff --git a/src/dd.c b/src/dd.c
index 4626de2..75109bc 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -1542,9 +1542,18 @@ skip (int fdesc, char const *file, uintmax_t records, 
size_t blocksize,
   if (fdesc == STDIN_FILENO)
 {
struct stat st;
+   off_t file_size;
if (fstat (STDIN_FILENO, st) != 0)
  error (EXIT_FAILURE, errno, _(cannot fstat %s), quote (file));
-   if (S_ISREG (st.st_mode)  st.st_size  (input_offset + offset))
+   if (usable_st_size (st))
+ file_size = st.st_size;
+   else
+ {
+   file_size = skip_via_lseek (file, fdesc, 0, SEEK_END);
+   if (skip_via_lseek (file, fdesc, offset, SEEK_CUR)  0)
+ error (EXIT_FAILURE, errno, _(%s: cannot skip), quote 
(file));
+ }
+   if (0 = file_size  file_size  input_offset + offset)
  {
/* When skipping past EOF, return the number of _full_ blocks
 * that are not skipped, and set offset to EOF, so the caller
@@ -2104,8 +2113,8 @@ dd_copy (void)
 }
 }
 
-  /* If the last write was converted to a seek, then for a regular file,
- ftruncate to extend the size.  */
+  /* If the last write was converted to a seek, then for a regular file
+ or shared memory object, ftruncate to extend the size.  */
   if (final_op_was_seek)
 {
   struct stat stdout_stat;
@@ -2114,7 +2123,7 @@ dd_copy (void)
   error (0, errno, _(cannot fstat %s), quote (output_file));
   return EXIT_FAILURE;
 }
-  if (S_ISREG (stdout_stat.st_mode))
+  if (S_ISREG (stdout_stat.st_mode) || S_TYPEISSHM (stdout_stat))
 {
   off_t output_offset = lseek (STDOUT_FILENO, 0, SEEK_CUR);
   if (output_offset  stdout_stat.st_size)
diff --git a/src/du.c b/src/du.c
index 41c9535..7333941 100644
--- a/src/du.c
+++ b/src/du.c
@@ -99,7 +99,8 @@ duinfo_set (struct duinfo *a, uintmax_t size, struct timespec 
tmax)
 static inline void
 duinfo_add (struct duinfo *a, struct duinfo const *b)
 {
-  a-size += b-size;
+  uintmax_t sum = a-size + b-size;
+  a-size = a-size = sum ? sum : UINTMAX_MAX;
   if (timespec_cmp (a-tmax, b-tmax)  0)
 a-tmax = b-tmax;
 }
@@ -370,8 +371,11 @@ static void
 print_only_size (uintmax_t n_bytes)
 {
   

bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-08 Thread Jim Meyering
Paul Eggert wrote:
 On 05/08/2012 01:39 AM, Jim Meyering wrote:
 I went ahead and pushed the less-invasive fix.

 Hmm, I don't see this on Savannah; is this part
 of the problem where usable_st_size got pushed?

Ahh... I think I know what happened.
I had both the usable_st_size and split-hang-fix patches on a temporary
branch and ran git rebase -i HEAD~2 intending to delete the
usable_st_size change set just before pushing.  Obviously I removed
the other instead.

I've just pushed the split-hang-fix patch, along with a
gnulib-updating patch that also pulls in the latest
init.sh and bootstrap scripts.

 Your behavior-changing one is more than welcome, too.

 I came up with a better idea, and propose this patch
 instead.  The idea is to fall back on lseek if
 st_size is not reliable.  This allows the programs
 to work in more cases, including the case in question.
 One test case needs to be removed because it assumes
 a command must fail, where it now typically works.

Thanks!  I'll look at it this evening.

...
 Subject: [PATCH] maint: handle file sizes more reliably

 Problem reported by Samuel Thibault in http://debbugs.gnu.org/11424.
 * NEWS: Document this.
 * src/dd.c (skip):
 * src/split.c (main):
 * src/truncate.c (do_ftruncate, main):
 On files where st_size is not portable, fall back on using lseek
 with SEEK_END to determine the size.  Although strictly speaking
 POSIX says the behavior is implementation-defined, in practice
 if lseek returns a nonnegative value it's a reasonable one to
 use for the file size.
 * src/dd.c (dd_copy): It's OK to truncate shared memory objects.
 * src/du.c (duinfo_add): Check for overflow.
 (print_only_size): Report overflow.
 (process_file): Ignore negative file sizes in the --apparent-size case.
 * src/od.c (skip): Fix comment about st_size.
 * src/stat.c (print_stat): Don't report negative sizes as huge
 positive ones.
 * src/system.h (usable_st_size): Symlinks have reliable st_size too.
 * tests/misc/truncate-dir-fail: Don't assume that getting the size
 of a dir is not allowed, as it's now allowed on many platforms,
 e.g., GNU/Linux.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-08 Thread Pádraig Brady
On 05/08/2012 05:27 PM, Paul Eggert wrote:
 On 05/08/2012 01:39 AM, Jim Meyering wrote:
 I went ahead and pushed the less-invasive fix.
 
 Hmm, I don't see this on Savannah; is this part
 of the problem where usable_st_size got pushed?
 
 Your behavior-changing one is more than welcome, too.
 
 I came up with a better idea, and propose this patch
 instead.  The idea is to fall back on lseek if
 st_size is not reliable.  This allows the programs
 to work in more cases, including the case in question.
 One test case needs to be removed because it assumes
 a command must fail, where it now typically works.
 
 
From bba6f199f621fb434c5df09a0b0278a081be87e2 Mon Sep 17 00:00:00 2001
 From: Paul Eggert egg...@cs.ucla.edu
 Date: Tue, 8 May 2012 09:22:22 -0700
 Subject: [PATCH] maint: handle file sizes more reliably
 
 Problem reported by Samuel Thibault in http://debbugs.gnu.org/11424.
 * NEWS: Document this.
 * src/dd.c (skip):
 * src/split.c (main):
 * src/truncate.c (do_ftruncate, main):
 On files where st_size is not portable, fall back on using lseek
 with SEEK_END to determine the size.  Although strictly speaking
 POSIX says the behavior is implementation-defined, in practice
 if lseek returns a nonnegative value it's a reasonable one to
 use for the file size.
 * src/dd.c (dd_copy): It's OK to truncate shared memory objects.
 * src/du.c (duinfo_add): Check for overflow.
 (print_only_size): Report overflow.
 (process_file): Ignore negative file sizes in the --apparent-size case.
 * src/od.c (skip): Fix comment about st_size.
 * src/stat.c (print_stat): Don't report negative sizes as huge
 positive ones.
 * src/system.h (usable_st_size): Symlinks have reliable st_size too.
 * tests/misc/truncate-dir-fail: Don't assume that getting the size
 of a dir is not allowed, as it's now allowed on many platforms,
 e.g., GNU/Linux.
 ---
  NEWS |3 ++
  src/dd.c |   17 +---
  src/du.c |   12 ++---
  src/od.c |3 +-
  src/split.c  |7 +
  src/stat.c   |2 +-
  src/system.h |3 +-
  src/truncate.c   |   56 ++---
  tests/misc/truncate-dir-fail |3 --
  9 files changed, 76 insertions(+), 30 deletions(-)
 
 diff --git a/NEWS b/NEWS
 index 2dc6531..c911d52 100644
 --- a/NEWS
 +++ b/NEWS
 @@ -21,6 +21,9 @@ GNU coreutils NEWS-*- 
 outline -*-
  
  ** New features
  
 +  dd, split, and truncate now allow any seekable files in situations
 +  where the file size is needed, instead of insisting on regular files.
 +
fmt now accepts the --goal=WIDTH (-g) option.
  
  ** Changes in behavior
 diff --git a/src/dd.c b/src/dd.c
 index 4626de2..75109bc 100644
 --- a/src/dd.c
 +++ b/src/dd.c
 @@ -1542,9 +1542,18 @@ skip (int fdesc, char const *file, uintmax_t records, 
 size_t blocksize,
if (fdesc == STDIN_FILENO)
  {
 struct stat st;
 +   off_t file_size;
 if (fstat (STDIN_FILENO, st) != 0)
   error (EXIT_FAILURE, errno, _(cannot fstat %s), quote (file));
 -   if (S_ISREG (st.st_mode)  st.st_size  (input_offset + offset))
 +   if (usable_st_size (st))
 + file_size = st.st_size;
 +   else
 + {
 +   file_size = skip_via_lseek (file, fdesc, 0, SEEK_END);
 +   if (skip_via_lseek (file, fdesc, offset, SEEK_CUR)  0)
 + error (EXIT_FAILURE, errno, _(%s: cannot skip), quote 
 (file));
 + }
 +   if (0 = file_size  file_size  input_offset + offset)
   {
 /* When skipping past EOF, return the number of _full_ blocks
  * that are not skipped, and set offset to EOF, so the caller

Might SEEK_END have implications for tape devices,
where we might go to the end even if only operating on the start of the device?

This thread reminded me of an old one where I previously wondered if
one could refactor out a stat_size() that would return unsigned and
thus simplify many callers?
http://lists.gnu.org/archive/html/bug-coreutils/2009-01/msg00069.html

I think it was suggested in that thread that POSIX will
return a positive st_size for those st_modes?

Something like:

uintmax_t stat_size (struct stat const *statp, int fd)
{
  uintmax_t size = UINTMAX_MAX; /* Error.  */

  struct stat sb;
  if (! statp)
{
  if (fd  0)
{
  if (fstat (fd, sb) != 0)
  return size;
  statp = sb;
}
  else
return size;
}

  if (S_ISREG (statp-st_mode) || S_ISLNK (statp-st_mode)
  || S_TYPEISSHM (stat_buf) || S_TYPEISTMO (stat_buf))
size = statp-st_size;
  else if (fd  0)
; /* Maybe do something with SEEK_END or BLKGETSIZE64 ? */
}

cheers,
Pádraig.





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-07 Thread Jim Meyering
Samuel Thibault wrote:
 Since some time, coreutils fails in split tests on GNU/Hurd. More
 precisely, these two:

 split/filter:55
 split --filter=head -c1 /dev/null -n 1 /dev/zero

 split/l-chunk:39
 split -n l/2 /dev/zero

 It seems that these two tests assume that split will stop by itself when
 given /dev/zero as input.  It does so on Linux, because /dev/zero there
 has stat_buf.st_size equal to 0, and thus split does just one loop, but
 on GNU/Hurd /dev/zero has stat_buf.st_size equal to LONG_MAX, and thus
 split just loops for a very long time.  Posix doesn't seem to talk much
 about what should be done here, but it seems to me that coreutils tests
 should not assume size being zero, and for instance use dd to fetch only
 the required bytes from /dev/zero.

Hi Samuel,
The real problem is that split was using stat.st_size from non-regular
files: that is not defined.

Here is a patch:

From 0a63df4b669faf0585beab09f4b177c39d557b21 Mon Sep 17 00:00:00 2001
From: Jim Meyering meyer...@redhat.com
Date: Mon, 7 May 2012 09:32:00 +0200
Subject: [PATCH] split: avoid apparent infloop when splitting /dev/zero w/-n
 on the Hurd

* src/split.c (main): Use stat.st_size only for regular files.
Samuel Thibault reported in http://bugs.gnu.org/11424 that the
/dev/zero-splitting tests would appear to infloop on GNU/Hurd,
because /dev/zero's st_size is LONG_MAX.  It was only a problem
when using the --number (-n) option.
* NEWS (Bug fixes): Mention it.
This bug was introduced with the --number option, via
commit v8.7-25-gbe10739
---
 NEWS| 3 +++
 src/split.c | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index fd563c0..7563646 100644
--- a/NEWS
+++ b/NEWS
@@ -19,6 +19,9 @@ GNU coreutils NEWS-*- 
outline -*-
   was particularly easy to trigger, since there, the removal of D could
   precede the initial stat.  [This bug was present in the beginning.]

+  split --number=C /dev/null no longer appears to infloop on GNU/Hurd
+  [bug introduced in coreutils-8.8]
+
 ** New features

   fmt now accepts the --goal=WIDTH (-g) option.
diff --git a/src/split.c b/src/split.c
index 99f6390..062aede 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1339,7 +1339,9 @@ main (int argc, char **argv)
 error (EXIT_FAILURE, errno, %s, infile);
   if (in_blk_size == 0)
 in_blk_size = io_blksize (stat_buf);
-  file_size = stat_buf.st_size;
+
+  /* stat.st_size is valid only for regular files.  For others, use 0.  */
+  file_size = S_ISREG (stat_buf.st_mode) ? stat_buf.st_size : 0;

   if (split_type == type_chunk_bytes || split_type == type_chunk_lines)
 {
--
1.7.10.1.457.g8275905





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-07 Thread Paul Eggert

Thanks, here's a quick cut at a patch to 'split' to fix the problem.
Also, the test cases also need to be adjusted, so as not to attempt to
split -n on a device.

diff --git a/src/split.c b/src/split.c
index 99f6390..8b966bc 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1344,7 +1344,9 @@ main (int argc, char **argv)
   if (split_type == type_chunk_bytes || split_type == type_chunk_lines)
 {
   off_t input_offset = lseek (STDIN_FILENO, 0, SEEK_CUR);
-  if (input_offset  0)
+  if (input_offset  0
+  || ! (S_ISREG (stat_buf.st_mode)
+|| S_TYPEISSHM (stat_buf) || S_TYPEISTMO (stat_buf)))
 error (EXIT_FAILURE, 0, _(%s: cannot determine file size),
quote (infile));
   file_size -= input_offset;


I audited coreutils for other misuses of st_size and found some;
here are additional quick cuts at patches that look like they're needed.
I didn't attempt to look for missed optimizations; just errors.

diff --git a/src/dd.c b/src/dd.c
index 4626de2..147b69d 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -1544,7 +1544,8 @@ skip (int fdesc, char const *file, uintmax_t records, 
size_t blocksize,
struct stat st;
if (fstat (STDIN_FILENO, st) != 0)
  error (EXIT_FAILURE, errno, _(cannot fstat %s), quote (file));
-   if (S_ISREG (st.st_mode)  st.st_size  (input_offset + offset))
+   if ((S_ISREG (st.st_mode) || S_TYPEISSHM (st) || S_TYPEISTMO (st))
+st.st_size  input_offset + offset)
  {
/* When skipping past EOF, return the number of _full_ blocks
 * that are not skipped, and set offset to EOF, so the caller
@@ -2104,8 +2105,8 @@ dd_copy (void)
 }
 }
 
-  /* If the last write was converted to a seek, then for a regular file,

- ftruncate to extend the size.  */
+  /* If the last write was converted to a seek, then for a regular file
+ or shared memory object, ftruncate to extend the size.  */
   if (final_op_was_seek)
 {
   struct stat stdout_stat;
@@ -2114,7 +2115,7 @@ dd_copy (void)
   error (0, errno, _(cannot fstat %s), quote (output_file));
   return EXIT_FAILURE;
 }
-  if (S_ISREG (stdout_stat.st_mode))
+  if (S_ISREG (stdout_stat.st_mode) || S_TYPEISSHM (stdout_stat))
 {
   off_t output_offset = lseek (STDOUT_FILENO, 0, SEEK_CUR);
   if (output_offset  stdout_stat.st_size)
diff --git a/src/du.c b/src/du.c
index 41c9535..7333941 100644
--- a/src/du.c
+++ b/src/du.c
@@ -99,7 +99,8 @@ duinfo_set (struct duinfo *a, uintmax_t size, struct timespec 
tmax)
 static inline void
 duinfo_add (struct duinfo *a, struct duinfo const *b)
 {
-  a-size += b-size;
+  uintmax_t sum = a-size + b-size;
+  a-size = a-size = sum ? sum : UINTMAX_MAX;
   if (timespec_cmp (a-tmax, b-tmax)  0)
 a-tmax = b-tmax;
 }
@@ -370,8 +371,11 @@ static void
 print_only_size (uintmax_t n_bytes)
 {
   char buf[LONGEST_HUMAN_READABLE + 1];
-  fputs (human_readable (n_bytes, buf, human_output_opts,
- 1, output_block_size), stdout);
+  fputs ((n_bytes == UINTMAX_MAX
+  ? _(Infinity)
+  : human_readable (n_bytes, buf, human_output_opts,
+1, output_block_size)),
+ stdout);
 }
 
 /* Print size (and optionally time) indicated by *PDUI, followed by STRING.  */

@@ -495,7 +499,7 @@ process_file (FTS *fts, FTSENT *ent)
 
   duinfo_set (dui,

   (apparent_size
-   ? sb-st_size
+   ? MAX (0, sb-st_size)
: (uintmax_t) ST_NBLOCKS (*sb) * ST_NBLOCKSIZE),
   (time_type == time_mtime ? get_stat_mtime (sb)
: time_type == time_atime ? get_stat_atime (sb)
diff --git a/src/od.c b/src/od.c
index 7593796..a25f965 100644
--- a/src/od.c
+++ b/src/od.c
@@ -983,8 +983,7 @@ skip (uintmax_t n_skip)
 
   if (fstat (fileno (in_stream), file_stats) == 0)

 {
-  /* The st_size field is valid only for regular files
- (and for symbolic links, which cannot occur here).
+  /* The st_size field is valid for regular files.
  If the number of bytes left to skip is larger than
  the size of the current file, we can decrement n_skip
  and go on to the next file.  Skip this optimization also
diff --git a/src/stat.c b/src/stat.c
index b2e1030..d001cda 100644
--- a/src/stat.c
+++ b/src/stat.c
@@ -954,7 +954,7 @@ print_stat (char *pformat, size_t prefix_len, unsigned int 
m,
   out_uint_x (pformat, prefix_len, minor (statbuf-st_rdev));
   break;
 case 's':
-  out_uint (pformat, prefix_len, statbuf-st_size);
+  out_int (pformat, prefix_len, statbuf-st_size);
   break;
 case 'B':
   out_uint (pformat, prefix_len, ST_NBLOCKSIZE);
diff --git a/src/truncate.c b/src/truncate.c
index 9b847d2..eaef598 100644
--- a/src/truncate.c
+++ b/src/truncate.c
@@ -161,7 +161,8 @@ do_ftruncate (int fd, 

bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-07 Thread Paul Eggert

On 05/07/2012 12:46 AM, Jim Meyering wrote:

+
+  /* stat.st_size is valid only for regular files.  For others, use 0.  */
+  file_size = S_ISREG (stat_buf.st_mode) ? stat_buf.st_size : 0;


Is it right to use 0 there, for non-regular files?
Won't later code compute incorrect sizes in that case?

Also, as a nit, stat.st_size is also valid for
SHM and TMO files (this was in the patch I just sent).





bug#11424: coreutils: split tests hang on /dev/zero on GNU/Hurd

2012-05-07 Thread Jim Meyering
Paul Eggert wrote:
 On 05/07/2012 12:46 AM, Jim Meyering wrote:
 +
 +  /* stat.st_size is valid only for regular files.  For others, use 0.  */
 +  file_size = S_ISREG (stat_buf.st_mode) ? stat_buf.st_size : 0;

 Is it right to use 0 there, for non-regular files?
 Won't later code compute incorrect sizes in that case?

Hi Paul,

I agree that more change is required and do prefer the direction your
patches suggest.  However, to fix the Hurd/infloop with minimal
impact elsewhere, I have a slight preference for my small change.
I.e. continuing to operate on non-regular files with --number we
don't have to change the split --number tests that operate on /dev/zero.
Then, introducing the behavior change (with your follow-on patch) can
be independent of the bug fix commit.

I do admit that without being able to determine a size up front, there's
little point in using that option, so your patch (reject files with
unusable stat.st_size) is required.

With or without my patch on Linux/GNU, if you split /dev/zero,
it sets file_size = 0, so at least for the tested cases
I don't think that patch introduces a regression.

 Also, as a nit, stat.st_size is also valid for
 SHM and TMO files (this was in the patch I just sent).

Good point.
Do you feel like adding something like this to system.h
and completing your patch?

/* Return a boolean indicating whether sb-st_size is defined.  */
static inline bool
usable_st_size (struct stat const *sb)
{
  return S_ISREG (sb-st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb);
}