bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

2012-02-01 Thread Paul Eggert
On 01/30/2012 07:01 PM, JONES, BILL wrote:
 those are vxfs.

Thanks, can you please try the attached patch?
You can apply it by running the shell commands:

  cd lib
  patch  vxfs-patch.txt

I'll CC: this to bug-gnulib as it appears to be a Gnulib bug.

Thanks.
acl: fix infinite loop on Solaris 10 + vxfs
Problem reported by Bill Jones in
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10639.
* lib/acl-internal.h: Include limits.h.
(MIN): New macro.
* lib/copy-acl.c (qcopy_acl): Don't object if we get fewer ACL
entries than we originally counted; perhaps some were removed
as we were running, or perhaps we can count but not get.
Check for size-calculation overflow.
Avoid need for dynamic allocation if possible.
* lib/file-has-acl.c (file_has_acl): Likewise.
* lib/set-mode-acl.c (qset_acl): Likewise.
diff --git a/lib/acl-internal.h b/lib/acl-internal.h
index 88e5e45..64701b2 100644
--- a/lib/acl-internal.h
+++ b/lib/acl-internal.h
@@ -55,6 +55,12 @@ extern int aclsort (int, int, struct acl *);
 # define ENOTSUP (-1)
 #endif

+#include limits.h
+
+#ifndef MIN
+# define MIN(a,b) ((a)  (b) ? (a) : (b))
+#endif
+
 #ifndef HAVE_FCHMOD
 # define HAVE_FCHMOD false
 # define fchmod(fd, mode) (-1)
diff --git a/lib/copy-acl.c b/lib/copy-acl.c
index 9b9f033..e9f4826 100644
--- a/lib/copy-acl.c
+++ b/lib/copy-acl.c
@@ -181,10 +181,10 @@ qcopy_acl (const char *src_name, int source_desc, const 
char *dst_name,
  of Unixware.  The acl() call returns the access and default ACL both
  at once.  */
 # ifdef ACE_GETACL
-  int ace_count;
+  int ace_count0, ace_count;
   ace_t *ace_entries;
 # endif
-  int count;
+  int count0, count;
   aclent_t *entries;
   int did_chmod;
   int saved_errno;
@@ -228,19 +228,21 @@ qcopy_acl (const char *src_name, int source_desc, const 
char *dst_name,
   break;
 }

-  ace_entries = (ace_t *) malloc (ace_count * sizeof (ace_t));
-  if (ace_entries == NULL)
+  if (! (ace_count  MIN (INT_MAX, (size_t) -1 / sizeof (ace_t))
+  (ace_entries
+ = (ace_t *) malloc ((ace_count + 1) * sizeof (ace_t)
 {
   errno = ENOMEM;
   return -2;
 }

-  if ((source_desc != -1
-   ? facl (source_desc, ACE_GETACL, ace_count, ace_entries)
-   : acl (src_name, ACE_GETACL, ace_count, ace_entries))
-  == ace_count)
+  ace_count0 = ace_count;
+  ace_count = (source_desc != -1
+   ? facl (source_desc, ACE_GETACL, ace_count + 1, ace_entries)
+   : acl (src_name, ACE_GETACL, ace_count + 1, ace_entries));
+  if (ace_count = ace_count0)
 break;
-  /* Huh? The number of ACL entries changed since the last call.
+  /* Huh? The number of ACL entries grew since the last call.
  Repeat.  */
 }
 # endif
@@ -269,19 +271,21 @@ qcopy_acl (const char *src_name, int source_desc, const 
char *dst_name,
   break;
 }

-  entries = (aclent_t *) malloc (count * sizeof (aclent_t));
-  if (entries == NULL)
+  if (! (count  MIN (INT_MAX, (size_t) -1 / sizeof (aclent_t))
+  (entries
+ = (aclent_t *) malloc ((count + 1) * sizeof (aclent_t)
 {
   errno = ENOMEM;
   return -2;
 }

-  if ((source_desc != -1
-   ? facl (source_desc, GETACL, count, entries)
-   : acl (src_name, GETACL, count, entries))
-  == count)
+  count0 = count;
+  count = (source_desc != -1
+   ? facl (source_desc, GETACL, count + 1, entries)
+   : acl (src_name, GETACL, count + 1, entries));
+  if (count = count0)
 break;
-  /* Huh? The number of ACL entries changed since the last call.
+  /* Huh? The number of ACL entries grew since the last call.
  Repeat.  */
 }

@@ -366,77 +370,44 @@ qcopy_acl (const char *src_name, int source_desc, const 
char *dst_name,
 #elif USE_ACL  HAVE_GETACL /* HP-UX */

   int count;
-  struct acl_entry entries[NACLENTRIES];
+  struct acl_entry entries[NACLENTRIES + 1];
 # if HAVE_ACLV_H
   int aclv_count;
-  struct acl aclv_entries[NACLVENTRIES];
+  struct acl aclv_entries[NACLVENTRIES + 1];
 # endif
   int did_chmod;
   int saved_errno;
   int ret;

-  for (;;)
-{
-  count = (source_desc != -1
-   ? fgetacl (source_desc, 0, NULL)
-   : getacl (src_name, 0, NULL));
-
-  if (count  0)
-{
-  if (errno == ENOSYS || errno == EOPNOTSUPP || errno == ENOTSUP)
-{
-  count = 0;
-  break;
-}
-  else
-return -2;
-}
+  count = (source_desc != -1
+   ? fgetacl (source_desc, NACLENTRIES + 1, entries)
+   : getacl (src_name, NACLENTRIES + 1, entries));

-  if (count == 0)
-break;
-
-  if (count  NACLENTRIES)
-/* If NACLENTRIES cannot be trusted, use dynamic memory allocation.  */
-abort ();
-
-  if ((source_desc != 

bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

2012-02-01 Thread Bruno Haible
Paul Eggert wrote:
 Thanks, can you please try the attached patch?

I find it overkill to change the code for HP-UX and NonStop systems
when the report is about Solaris. Also I think the structure of the loop
is not the problem; it is the code's reaction to
  acl(x, ACE_GETACL, 4, 0x3432A16E0)Err#-1

Please, can you also try this patch?

2012-02-01  Bruno Haible  br...@clisp.org

* lib/file-has-acl.c (file_has_acl) [Solaris]: Treat a failing
acl()/facl() call for ACE_GETACL like a failing call for ACE_GETACLCNT.
* lib/set-mode-acl.c (qset_acl) [Solaris]: Likewise.
* lib/copy-acl.c (qcopy_acl)[Solaris]: Likewise.

--- lib/copy-acl.c.orig	Wed Feb  1 11:15:51 2012
+++ lib/copy-acl.c	Wed Feb  1 11:10:55 2012
@@ -235,10 +235,22 @@
   return -2;
 }
 
-  if ((source_desc != -1
-   ? facl (source_desc, ACE_GETACL, ace_count, ace_entries)
-   : acl (src_name, ACE_GETACL, ace_count, ace_entries))
-  == ace_count)
+  ret = (source_desc != -1
+ ? facl (source_desc, ACE_GETACL, ace_count, ace_entries)
+ : acl (src_name, ACE_GETACL, ace_count, ace_entries);
+  if (ret  0)
+{
+  free (ace_entries);
+  if (errno == ENOSYS || errno == EINVAL)
+{
+  ace_count = 0;
+  ace_entries = NULL;
+  break;
+}
+  else
+return -2;
+}
+  if (ret == ace_count)
 break;
   /* Huh? The number of ACL entries changed since the last call.
  Repeat.  */
--- lib/file-has-acl.c.orig	Wed Feb  1 11:15:51 2012
+++ lib/file-has-acl.c	Wed Feb  1 11:13:17 2012
@@ -588,6 +588,8 @@
 
 for (;;)
   {
+int ret;
+
 count = acl (name, ACE_GETACLCNT, 0, NULL);
 
 if (count  0)
@@ -618,7 +620,16 @@
 errno = ENOMEM;
 return -1;
   }
-if (acl (name, ACE_GETACL, count, entries) == count)
+ret = acl (name, ACE_GETACL, count, entries);
+if (ret  0)
+  {
+free (entries);
+if (errno == ENOSYS || errno == EINVAL)
+  break;
+else
+  return -1;
+  }
+if (ret == count)
   {
 if (acl_ace_nontrivial (count, entries))
   {
--- lib/set-mode-acl.c.orig	Wed Feb  1 11:15:51 2012
+++ lib/set-mode-acl.c	Wed Feb  1 11:15:26 2012
@@ -219,6 +219,8 @@
 
 for (;;)
   {
+int ret;
+
 if (desc != -1)
   count = facl (desc, ACE_GETACLCNT, 0, NULL);
 else
@@ -234,10 +236,16 @@
 errno = ENOMEM;
 return -1;
   }
-if ((desc != -1
- ? facl (desc, ACE_GETACL, count, entries)
- : acl (name, ACE_GETACL, count, entries))
-== count)
+ret = (desc != -1
+   ? facl (desc, ACE_GETACL, count, entries)
+   : acl (name, ACE_GETACL, count, entries));
+if (ret  0)
+  {
+free (entries);
+convention = -1;
+break;
+  }
+if (ret == count)
   {
 int i;
 


bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

2012-02-01 Thread Paul Eggert
On 02/01/2012 02:22 AM, Bruno Haible wrote:
 I find it overkill to change the code for HP-UX and NonStop systems
 when the report is about Solaris. Also I think the structure of the loop
 is not the problem; it is the code's reaction to
   acl(x, ACE_GETACL, 4, 0x3432A16E0)Err#-1

The patch you sent looks like it'll fix that particular bug, but while
looking into this I discovered so many bugs in this area, mostly
hard-to-test race conditions and overflows, that I thought it better
to rewrite the affected code than to try to fix each bug one at a time.
I didn't write this up very well, and my first cut at doing this could
stand some improvements of its own.  Here's a revised patch that tries
to do a better job at all this.  (If it's any consolation, this patch
makes the code simpler -- 186 lines shorter)


diff --git a/ChangeLog b/ChangeLog
index f80c7dd..ec16bef 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,89 @@
+2012-02-01  Paul Eggert  egg...@cs.ucla.edu
+
+   acl: fix several problems with ACLs
+   Problem with infinite loop on Solaris 10 + vxfs reported by Bill
+   Jones in http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10639.
+   Looking into the code, I saw several closely related issues:
+
+- There's a race condition in this kind of code:
+
+n = acl (f, GETACLCNT, 0, NULL);
+[ allocate an array A of size N ]
+if (acl (f, GETACL, n, a) == n)
+  return ok;
+
+  The ACL can grow in size between the first and second calls to
+  'acl', which means that the second 'acl' returns a truncated
+  ACL but thinks that it has the whole thing.  To avoid this, the
+  following pattern should be used instead:
+
+n = acl (f, GETACLCNT, 0, NULL);
+[ allocate an array A of size N + 1 ]
+n1 = acl (f, GETACL, n + 1, a);
+if (0  n1  n1 = n)
+  return ok;
+
+- There were several instances of this pattern:
+
+for (;;) {
+  n = acl (f, GETACLCNT, 0, NULL);
+  [ allocate an array A of size N ]
+  if (acl (f, GETACL, n, a) == n)
+break;
+}
+
+  This loop might never terminate if some other process is constantly
+  manipulating the file's ACL.  The loop should be rewritten to
+  terminate.
+
+- The acl (... GETACLNT ...) call is merely an optimization; its value
+  is merely a hint as to how big to make the array.  A better
+  optimization is to avoid the acl (... GETACLNT ...)  call entirely,
+  and just guess a reasonably-big size, growing the size and trying
+  again if it's not large enough.  This guarantees termination, and
+  saves a system call.
+
+- With this approach, for ports like HP-UX that have an upper bound
+  for the ACL length, there's no longer any need to loop reading the
+  ACL.  Just read it once and you're done reading it.
+
+- For ports like Solaris that do need to loop (because the ACL length
+  is not known a priori), it's faster to allocate a reasonably-sized
+  buffer on the stack and use that, and to allocate something on the
+  heap only if the ACL is unusually large.  This avoids malloc in the
+  usual case.
+
+- The code that calculated sizes of these arrays did not check for
+  overflow in size calculations; it should.
+
+- There are some memory leaks.  For example, in qcopy_acl, if acl
+  (src_name, GETACLCNT, 0, NULL)  0  errno == EPERM, then the
+  storage for ace_entries leaks.  Similarly, if acl (src_name,
+  ACE_GETACL, ace_count, ace_entries) returns -1, the storage for
+  ace_entries leaks.
+
+- In qset_acl, there's sometimes no need to read the entire ACL
+  to determine the convention; this can save system calls when
+  looking at very large ACLs.
+
+   Rather than fix these bugs one at a time I thought it more efficient
+   to rewrite the affected code, as follows:
+   * lib/acl-internal.h (GETACLCNT): Remove; no longer needed.
+   Include limits.h.
+   (MIN): New macro.
+   * lib/copy-acl.c (qcopy_acl): Don't bother to count ACL entries
+   before getting them.  Instead, just get them, and make sure that
+   the number of entries gotten is less than the number requested.
+   This avoids a race condition and omits a system call in the usual
+   case.  Before, it could have been the case that we asked for and
+   received N entries, but these were the first N of more than N
+   actual entries, if the ACL was modified while we were getting it.
+   With this change, there is no need to use GETACLCNT or similar ops.
+   Check for size-calculation overflow.
+   Avoid need for dynamic allocation if possible.
+   * lib/file-has-acl.c (file_has_acl): 

bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

2012-01-30 Thread JONES, BILL
Paul,

SUN Solaris 10

The FS are a mix.  Some vxfs and some ufs.

$ mount
/ on /dev/vx/dsk/bootdg/rootvol 
read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=434
 on Sat Jan 21 05:16:53 2012
/devices on /devices read/write/setuid/devices/dev=578 on Sat Jan 21 
05:13:16 2012
/system/contract on ctfs read/write/setuid/devices/dev=57c0001 on Sat Jan 21 
05:13:16 2012
/proc on proc read/write/setuid/devices/dev=580 on Sat Jan 21 05:13:16 2012
/etc/mnttab on mnttab read/write/setuid/devices/dev=5840001 on Sat Jan 21 
05:13:16 2012
/etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=5880001 on Sat 
Jan 21 05:13:16 2012
/system/object on objfs read/write/setuid/devices/dev=58c0001 on Sat Jan 21 
05:13:16 2012
/etc/dfs/sharetab on sharefs read/write/setuid/devices/dev=591 on Sat Jan 
21 05:13:17 2012
/platform/sun4u-us3/lib/libc_psr.so.1 on 
/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap2.so.1 
read/write/setuid/devices/dev=434 on Sat Jan 21 05:16:50 2012
/platform/sun4u-us3/lib/sparcv9/libc_psr.so.1 on 
/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1 
read/write/setuid/devices/dev=434 on Sat Jan 21 05:16:50 2012
/dev/fd on fd read/write/setuid/devices/dev=5a80001 on Sat Jan 21 05:16:53 2012
/var on /dev/vx/dsk/bootdg/var 
read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=4356764
 on Sat Jan 21 05:16:53 2012
/tmp on swap read/write/setuid/devices/xattr/dev=5880002 on Sat Jan 21 05:16:53 
2012
/var/run on swap read/write/setuid/devices/xattr/dev=5880003 on Sat Jan 21 
05:16:53 2012
/dev/vx/dmp on swap read/write/setuid/devices/xattr/dev=5880004 on Sat Jan 21 
05:16:55 2012
/dev/vx/rdmp on swap read/write/setuid/devices/xattr/dev=5880005 on Sat Jan 21 
05:16:55 2012
/extra on /dev/vx/dsk/bootdg/extra 
read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=4356762
 on Sat Jan 21 05:17:12 2012
/export/home on /dev/vx/dsk/bootdg/home 
read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=4356763
 on Sat Jan 21 05:17:12 2012
/appl/logs on /dev/vx/dsk/appldg/vol07 
read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459de on 
Sat Jan 21 05:17:13 2012
/appl/repos on /dev/vx/dsk/appldg/vol02 
read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459d9 on 
Sat Jan 21 05:17:13 2012
/appl/archives on /dev/vx/dsk/appldg/vol06 
read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459dd on 
Sat Jan 21 05:17:13 2012
/appl/gfpip on /dev/vx/dsk/appldg/vol01 
read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459d8 on 
Sat Jan 21 05:17:14 2012
/appl/var on /dev/vx/dsk/appldg/vol05 
read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459dc on 
Sat Jan 21 05:17:15 2012
/appl/data on /dev/vx/dsk/appldg/vol04 
read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459db on 
Sat Jan 21 05:17:15 2012
/appl/IBM on /dev/vx/dsk/appldg/lvol13 
read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459e4 on 
Wed Jan 25 15:56:39 2012
/nas/usr/sbc on r07748fb1n1-nas:/vol/v00_fs01_admin/q00_fs01_admin/sun 
remote/read only/setuid/devices/soft/bg/noquota/proto=tcp/xattr/dev=5b005a6 on 
Mon Jan 30 05:19:08 2012

SunOS gfpmtipb 5.10 Generic_142900-06 sun4u sparc SUNW,Netra-T12 Solaris

Bill

From: Paul Eggert [egg...@cs.ucla.edu]
Sent: Sunday, January 29, 2012 11:55 PM
To: JONES, BILL
Cc: 10...@debbugs.gnu.org
Subject: Re: bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

Thanks for the bug report.

What kind of file system was the test run on, and
what mount options were used for it?  E.g., what's
the output of the mount command?

Also, what's the output of uname -a, and which
compiler did you use to build coreutils?




bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

2012-01-30 Thread Paul Eggert
On 01/29/2012 09:25 PM, JONES, BILL wrote:
 The FS are a mix.  Some vxfs and some ufs.

Thanks; which file system was the test actually run on?
(I have a Solaris 10 sparc box here to test with, just
wanna have something as close to yours as I can.)





bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

2012-01-30 Thread JONES, BILL
I am not sure how the test was run...if it was run in the current directory as 
the compile it would be /appl/... those are vxfs.

Bill


From: Paul Eggert [egg...@cs.ucla.edu]
Sent: Monday, January 30, 2012 5:34 PM
To: JONES, BILL
Cc: 10...@debbugs.gnu.org
Subject: Re: bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

On 01/29/2012 09:25 PM, JONES, BILL wrote:
 The FS are a mix.  Some vxfs and some ufs.

Thanks; which file system was the test actually run on?
(I have a Solaris 10 sparc box here to test with, just
wanna have something as close to yours as I can.)




bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit

2012-01-29 Thread Paul Eggert
Thanks for the bug report.

What kind of file system was the test run on, and
what mount options were used for it?  E.g., what's
the output of the mount command?

Also, what's the output of uname -a, and which
compiler did you use to build coreutils?