bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit
On 01/30/2012 07:01 PM, JONES, BILL wrote: those are vxfs. Thanks, can you please try the attached patch? You can apply it by running the shell commands: cd lib patch vxfs-patch.txt I'll CC: this to bug-gnulib as it appears to be a Gnulib bug. Thanks. acl: fix infinite loop on Solaris 10 + vxfs Problem reported by Bill Jones in http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10639. * lib/acl-internal.h: Include limits.h. (MIN): New macro. * lib/copy-acl.c (qcopy_acl): Don't object if we get fewer ACL entries than we originally counted; perhaps some were removed as we were running, or perhaps we can count but not get. Check for size-calculation overflow. Avoid need for dynamic allocation if possible. * lib/file-has-acl.c (file_has_acl): Likewise. * lib/set-mode-acl.c (qset_acl): Likewise. diff --git a/lib/acl-internal.h b/lib/acl-internal.h index 88e5e45..64701b2 100644 --- a/lib/acl-internal.h +++ b/lib/acl-internal.h @@ -55,6 +55,12 @@ extern int aclsort (int, int, struct acl *); # define ENOTSUP (-1) #endif +#include limits.h + +#ifndef MIN +# define MIN(a,b) ((a) (b) ? (a) : (b)) +#endif + #ifndef HAVE_FCHMOD # define HAVE_FCHMOD false # define fchmod(fd, mode) (-1) diff --git a/lib/copy-acl.c b/lib/copy-acl.c index 9b9f033..e9f4826 100644 --- a/lib/copy-acl.c +++ b/lib/copy-acl.c @@ -181,10 +181,10 @@ qcopy_acl (const char *src_name, int source_desc, const char *dst_name, of Unixware. The acl() call returns the access and default ACL both at once. */ # ifdef ACE_GETACL - int ace_count; + int ace_count0, ace_count; ace_t *ace_entries; # endif - int count; + int count0, count; aclent_t *entries; int did_chmod; int saved_errno; @@ -228,19 +228,21 @@ qcopy_acl (const char *src_name, int source_desc, const char *dst_name, break; } - ace_entries = (ace_t *) malloc (ace_count * sizeof (ace_t)); - if (ace_entries == NULL) + if (! (ace_count MIN (INT_MAX, (size_t) -1 / sizeof (ace_t)) + (ace_entries + = (ace_t *) malloc ((ace_count + 1) * sizeof (ace_t) { errno = ENOMEM; return -2; } - if ((source_desc != -1 - ? facl (source_desc, ACE_GETACL, ace_count, ace_entries) - : acl (src_name, ACE_GETACL, ace_count, ace_entries)) - == ace_count) + ace_count0 = ace_count; + ace_count = (source_desc != -1 + ? facl (source_desc, ACE_GETACL, ace_count + 1, ace_entries) + : acl (src_name, ACE_GETACL, ace_count + 1, ace_entries)); + if (ace_count = ace_count0) break; - /* Huh? The number of ACL entries changed since the last call. + /* Huh? The number of ACL entries grew since the last call. Repeat. */ } # endif @@ -269,19 +271,21 @@ qcopy_acl (const char *src_name, int source_desc, const char *dst_name, break; } - entries = (aclent_t *) malloc (count * sizeof (aclent_t)); - if (entries == NULL) + if (! (count MIN (INT_MAX, (size_t) -1 / sizeof (aclent_t)) + (entries + = (aclent_t *) malloc ((count + 1) * sizeof (aclent_t) { errno = ENOMEM; return -2; } - if ((source_desc != -1 - ? facl (source_desc, GETACL, count, entries) - : acl (src_name, GETACL, count, entries)) - == count) + count0 = count; + count = (source_desc != -1 + ? facl (source_desc, GETACL, count + 1, entries) + : acl (src_name, GETACL, count + 1, entries)); + if (count = count0) break; - /* Huh? The number of ACL entries changed since the last call. + /* Huh? The number of ACL entries grew since the last call. Repeat. */ } @@ -366,77 +370,44 @@ qcopy_acl (const char *src_name, int source_desc, const char *dst_name, #elif USE_ACL HAVE_GETACL /* HP-UX */ int count; - struct acl_entry entries[NACLENTRIES]; + struct acl_entry entries[NACLENTRIES + 1]; # if HAVE_ACLV_H int aclv_count; - struct acl aclv_entries[NACLVENTRIES]; + struct acl aclv_entries[NACLVENTRIES + 1]; # endif int did_chmod; int saved_errno; int ret; - for (;;) -{ - count = (source_desc != -1 - ? fgetacl (source_desc, 0, NULL) - : getacl (src_name, 0, NULL)); - - if (count 0) -{ - if (errno == ENOSYS || errno == EOPNOTSUPP || errno == ENOTSUP) -{ - count = 0; - break; -} - else -return -2; -} + count = (source_desc != -1 + ? fgetacl (source_desc, NACLENTRIES + 1, entries) + : getacl (src_name, NACLENTRIES + 1, entries)); - if (count == 0) -break; - - if (count NACLENTRIES) -/* If NACLENTRIES cannot be trusted, use dynamic memory allocation. */ -abort (); - - if ((source_desc !=
bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit
Paul Eggert wrote: Thanks, can you please try the attached patch? I find it overkill to change the code for HP-UX and NonStop systems when the report is about Solaris. Also I think the structure of the loop is not the problem; it is the code's reaction to acl(x, ACE_GETACL, 4, 0x3432A16E0)Err#-1 Please, can you also try this patch? 2012-02-01 Bruno Haible br...@clisp.org * lib/file-has-acl.c (file_has_acl) [Solaris]: Treat a failing acl()/facl() call for ACE_GETACL like a failing call for ACE_GETACLCNT. * lib/set-mode-acl.c (qset_acl) [Solaris]: Likewise. * lib/copy-acl.c (qcopy_acl)[Solaris]: Likewise. --- lib/copy-acl.c.orig Wed Feb 1 11:15:51 2012 +++ lib/copy-acl.c Wed Feb 1 11:10:55 2012 @@ -235,10 +235,22 @@ return -2; } - if ((source_desc != -1 - ? facl (source_desc, ACE_GETACL, ace_count, ace_entries) - : acl (src_name, ACE_GETACL, ace_count, ace_entries)) - == ace_count) + ret = (source_desc != -1 + ? facl (source_desc, ACE_GETACL, ace_count, ace_entries) + : acl (src_name, ACE_GETACL, ace_count, ace_entries); + if (ret 0) +{ + free (ace_entries); + if (errno == ENOSYS || errno == EINVAL) +{ + ace_count = 0; + ace_entries = NULL; + break; +} + else +return -2; +} + if (ret == ace_count) break; /* Huh? The number of ACL entries changed since the last call. Repeat. */ --- lib/file-has-acl.c.orig Wed Feb 1 11:15:51 2012 +++ lib/file-has-acl.c Wed Feb 1 11:13:17 2012 @@ -588,6 +588,8 @@ for (;;) { +int ret; + count = acl (name, ACE_GETACLCNT, 0, NULL); if (count 0) @@ -618,7 +620,16 @@ errno = ENOMEM; return -1; } -if (acl (name, ACE_GETACL, count, entries) == count) +ret = acl (name, ACE_GETACL, count, entries); +if (ret 0) + { +free (entries); +if (errno == ENOSYS || errno == EINVAL) + break; +else + return -1; + } +if (ret == count) { if (acl_ace_nontrivial (count, entries)) { --- lib/set-mode-acl.c.orig Wed Feb 1 11:15:51 2012 +++ lib/set-mode-acl.c Wed Feb 1 11:15:26 2012 @@ -219,6 +219,8 @@ for (;;) { +int ret; + if (desc != -1) count = facl (desc, ACE_GETACLCNT, 0, NULL); else @@ -234,10 +236,16 @@ errno = ENOMEM; return -1; } -if ((desc != -1 - ? facl (desc, ACE_GETACL, count, entries) - : acl (name, ACE_GETACL, count, entries)) -== count) +ret = (desc != -1 + ? facl (desc, ACE_GETACL, count, entries) + : acl (name, ACE_GETACL, count, entries)); +if (ret 0) + { +free (entries); +convention = -1; +break; + } +if (ret == count) { int i;
bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit
On 02/01/2012 02:22 AM, Bruno Haible wrote: I find it overkill to change the code for HP-UX and NonStop systems when the report is about Solaris. Also I think the structure of the loop is not the problem; it is the code's reaction to acl(x, ACE_GETACL, 4, 0x3432A16E0)Err#-1 The patch you sent looks like it'll fix that particular bug, but while looking into this I discovered so many bugs in this area, mostly hard-to-test race conditions and overflows, that I thought it better to rewrite the affected code than to try to fix each bug one at a time. I didn't write this up very well, and my first cut at doing this could stand some improvements of its own. Here's a revised patch that tries to do a better job at all this. (If it's any consolation, this patch makes the code simpler -- 186 lines shorter) diff --git a/ChangeLog b/ChangeLog index f80c7dd..ec16bef 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,89 @@ +2012-02-01 Paul Eggert egg...@cs.ucla.edu + + acl: fix several problems with ACLs + Problem with infinite loop on Solaris 10 + vxfs reported by Bill + Jones in http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10639. + Looking into the code, I saw several closely related issues: + +- There's a race condition in this kind of code: + +n = acl (f, GETACLCNT, 0, NULL); +[ allocate an array A of size N ] +if (acl (f, GETACL, n, a) == n) + return ok; + + The ACL can grow in size between the first and second calls to + 'acl', which means that the second 'acl' returns a truncated + ACL but thinks that it has the whole thing. To avoid this, the + following pattern should be used instead: + +n = acl (f, GETACLCNT, 0, NULL); +[ allocate an array A of size N + 1 ] +n1 = acl (f, GETACL, n + 1, a); +if (0 n1 n1 = n) + return ok; + +- There were several instances of this pattern: + +for (;;) { + n = acl (f, GETACLCNT, 0, NULL); + [ allocate an array A of size N ] + if (acl (f, GETACL, n, a) == n) +break; +} + + This loop might never terminate if some other process is constantly + manipulating the file's ACL. The loop should be rewritten to + terminate. + +- The acl (... GETACLNT ...) call is merely an optimization; its value + is merely a hint as to how big to make the array. A better + optimization is to avoid the acl (... GETACLNT ...) call entirely, + and just guess a reasonably-big size, growing the size and trying + again if it's not large enough. This guarantees termination, and + saves a system call. + +- With this approach, for ports like HP-UX that have an upper bound + for the ACL length, there's no longer any need to loop reading the + ACL. Just read it once and you're done reading it. + +- For ports like Solaris that do need to loop (because the ACL length + is not known a priori), it's faster to allocate a reasonably-sized + buffer on the stack and use that, and to allocate something on the + heap only if the ACL is unusually large. This avoids malloc in the + usual case. + +- The code that calculated sizes of these arrays did not check for + overflow in size calculations; it should. + +- There are some memory leaks. For example, in qcopy_acl, if acl + (src_name, GETACLCNT, 0, NULL) 0 errno == EPERM, then the + storage for ace_entries leaks. Similarly, if acl (src_name, + ACE_GETACL, ace_count, ace_entries) returns -1, the storage for + ace_entries leaks. + +- In qset_acl, there's sometimes no need to read the entire ACL + to determine the convention; this can save system calls when + looking at very large ACLs. + + Rather than fix these bugs one at a time I thought it more efficient + to rewrite the affected code, as follows: + * lib/acl-internal.h (GETACLCNT): Remove; no longer needed. + Include limits.h. + (MIN): New macro. + * lib/copy-acl.c (qcopy_acl): Don't bother to count ACL entries + before getting them. Instead, just get them, and make sure that + the number of entries gotten is less than the number requested. + This avoids a race condition and omits a system call in the usual + case. Before, it could have been the case that we asked for and + received N entries, but these were the first N of more than N + actual entries, if the ACL was modified while we were getting it. + With this change, there is no need to use GETACLCNT or similar ops. + Check for size-calculation overflow. + Avoid need for dynamic allocation if possible. + * lib/file-has-acl.c (file_has_acl):
bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit
Paul, SUN Solaris 10 The FS are a mix. Some vxfs and some ufs. $ mount / on /dev/vx/dsk/bootdg/rootvol read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=434 on Sat Jan 21 05:16:53 2012 /devices on /devices read/write/setuid/devices/dev=578 on Sat Jan 21 05:13:16 2012 /system/contract on ctfs read/write/setuid/devices/dev=57c0001 on Sat Jan 21 05:13:16 2012 /proc on proc read/write/setuid/devices/dev=580 on Sat Jan 21 05:13:16 2012 /etc/mnttab on mnttab read/write/setuid/devices/dev=5840001 on Sat Jan 21 05:13:16 2012 /etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=5880001 on Sat Jan 21 05:13:16 2012 /system/object on objfs read/write/setuid/devices/dev=58c0001 on Sat Jan 21 05:13:16 2012 /etc/dfs/sharetab on sharefs read/write/setuid/devices/dev=591 on Sat Jan 21 05:13:17 2012 /platform/sun4u-us3/lib/libc_psr.so.1 on /platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap2.so.1 read/write/setuid/devices/dev=434 on Sat Jan 21 05:16:50 2012 /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1 on /platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap2.so.1 read/write/setuid/devices/dev=434 on Sat Jan 21 05:16:50 2012 /dev/fd on fd read/write/setuid/devices/dev=5a80001 on Sat Jan 21 05:16:53 2012 /var on /dev/vx/dsk/bootdg/var read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=4356764 on Sat Jan 21 05:16:53 2012 /tmp on swap read/write/setuid/devices/xattr/dev=5880002 on Sat Jan 21 05:16:53 2012 /var/run on swap read/write/setuid/devices/xattr/dev=5880003 on Sat Jan 21 05:16:53 2012 /dev/vx/dmp on swap read/write/setuid/devices/xattr/dev=5880004 on Sat Jan 21 05:16:55 2012 /dev/vx/rdmp on swap read/write/setuid/devices/xattr/dev=5880005 on Sat Jan 21 05:16:55 2012 /extra on /dev/vx/dsk/bootdg/extra read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=4356762 on Sat Jan 21 05:17:12 2012 /export/home on /dev/vx/dsk/bootdg/home read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=4356763 on Sat Jan 21 05:17:12 2012 /appl/logs on /dev/vx/dsk/appldg/vol07 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459de on Sat Jan 21 05:17:13 2012 /appl/repos on /dev/vx/dsk/appldg/vol02 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459d9 on Sat Jan 21 05:17:13 2012 /appl/archives on /dev/vx/dsk/appldg/vol06 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459dd on Sat Jan 21 05:17:13 2012 /appl/gfpip on /dev/vx/dsk/appldg/vol01 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459d8 on Sat Jan 21 05:17:14 2012 /appl/var on /dev/vx/dsk/appldg/vol05 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459dc on Sat Jan 21 05:17:15 2012 /appl/data on /dev/vx/dsk/appldg/vol04 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459db on Sat Jan 21 05:17:15 2012 /appl/IBM on /dev/vx/dsk/appldg/lvol13 read/write/setuid/devices/delaylog/largefiles/ioerror=mwdisable/dev=43459e4 on Wed Jan 25 15:56:39 2012 /nas/usr/sbc on r07748fb1n1-nas:/vol/v00_fs01_admin/q00_fs01_admin/sun remote/read only/setuid/devices/soft/bg/noquota/proto=tcp/xattr/dev=5b005a6 on Mon Jan 30 05:19:08 2012 SunOS gfpmtipb 5.10 Generic_142900-06 sun4u sparc SUNW,Netra-T12 Solaris Bill From: Paul Eggert [egg...@cs.ucla.edu] Sent: Sunday, January 29, 2012 11:55 PM To: JONES, BILL Cc: 10...@debbugs.gnu.org Subject: Re: bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit Thanks for the bug report. What kind of file system was the test run on, and what mount options were used for it? E.g., what's the output of the mount command? Also, what's the output of uname -a, and which compiler did you use to build coreutils?
bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit
On 01/29/2012 09:25 PM, JONES, BILL wrote: The FS are a mix. Some vxfs and some ufs. Thanks; which file system was the test actually run on? (I have a Solaris 10 sparc box here to test with, just wanna have something as close to yours as I can.)
bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit
I am not sure how the test was run...if it was run in the current directory as the compile it would be /appl/... those are vxfs. Bill From: Paul Eggert [egg...@cs.ucla.edu] Sent: Monday, January 30, 2012 5:34 PM To: JONES, BILL Cc: 10...@debbugs.gnu.org Subject: Re: bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit On 01/29/2012 09:25 PM, JONES, BILL wrote: The FS are a mix. Some vxfs and some ufs. Thanks; which file system was the test actually run on? (I have a Solaris 10 sparc box here to test with, just wanna have something as close to yours as I can.)
bug#10639: BUG REPORT coreutils-8.15 Solaris 10 64bit
Thanks for the bug report. What kind of file system was the test run on, and what mount options were used for it? E.g., what's the output of the mount command? Also, what's the output of uname -a, and which compiler did you use to build coreutils?