Re: du counting files in hard-linked directories twice
According to Matthew Woehlke on 8/27/2009 5:09 PM:
> Hard-linked *directories*? I wasn't aware those were allowed... (If I
> read 'man 3 link' right, they aren't allowed on Linux.)

POSIX permits, but does not require, them (and Linux intentionally does not permit them). Solaris has them (for superuser only), but they are annoying; they break some optimizations that would otherwise be possible due to unlink() vs. rmdir() semantics. Therefore, coreutils already intentionally drops privileges on Solaris in order to remove the ability to unlink a hard-linked directory.

--
Don't work too hard, make some time for fun as well!
Eric Blake e...@byu.net
Re: du counting files in hard-linked directories twice
On 2009-08-28 at 01:09, Matthew Woehlke wrote:
> Hard-linked *directories*? I wasn't aware those were allowed... (If I
> read 'man 3 link' right, they aren't allowed on Linux.)

That got added in Mac OS X 10.5. It allows Time Machine to create a backup in O(number of changed files) time instead of O(number of files), as unchanged subtrees can be hard-linked from a previous backup.

This is also where I saw this bug, when doing a du -hs * in the backup dir to get a rundown of how much new data is in each backup.

Michael
Re: [PATCH] open: introduce O_NOSTD
On Thu, Aug 27, 2009 at 15:55, Davide Libenzi wrote:
> Can't the handling be done on close(), like (modulo some errno save/restore):

No. Any file descriptor may already be closed when the process is started; in that case no close() ever happens inside the process, so a special close() wrapper would never get a chance to run.
Re: [PATCH] open: introduce O_NOSTD
On Thu, 27 Aug 2009, Eric Blake wrote:
> [long quote of the open_safer variants trimmed; see Eric's full message
> in this thread]
>
> I hope that my example shows why doing it in the kernel is desirable -
> there is no safe way to keep the pre-O_CLOEXEC efficiency using just the
> library, but there IS a way to do it with kernel support:
>
> int open_safer (const char *name, int flags, int mode)
> {
>   return open (name, flags | O_NOSTD, mode);
> }

Can't the handling be done on close(), like (modulo some errno save/restore):

int safer_close(int fd)
{
    int error = close(fd);

    if (fd < 3 && fd >= 0) {
        if ((fd = open("/dev/null", O_RDWR)) > 2)
            close(fd);
    }
    return error;
}

- Davide
Re: du counting files in hard-linked directories twice
Michael Schwarz wrote:
> The Problem I see as a bug is that, in some circumstances, du is
> counting the same files multiple times when they appear inside of
> hard-linked directories.

Hard-linked *directories*? I wasn't aware those were allowed... (If I read 'man 3 link' right, they aren't allowed on Linux.)

...Aside, of course, from 'd' and all 'd/*/..' being the same.

--
Matthew
Please do not quote my e-mail address unobfuscated in message bodies.
--
For great justice!! -- Captain (Zero Wing)
Re: Correct semantics in rm -rf ?
On Thu, Aug 27, 2009 at 12:40 PM, Jim Meyering wrote:
> shailesh jain wrote:
> > Here it is.
> > Outcome is that Child gets deleted but parent does not get deleted.
> > (Now you will say it's a filesystem issue ;) )
>
> Let's start with the command you used here.
> It must not have been rm -rf, because the ENOENT is not being ignored:

Huh? I used rm -rf. If ENOENT should have been ignored, then there you see another bug ;).

> [strace output quoted in full in Jim's message; trimmed]
>
> POSIX allows readdir to work that way.

Sorry, I do not follow your comment. In what way does POSIX allow readdir to work?

> I already suggested that making this change might help you:
>
> -CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 10
> +CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 1

I did not try the above option. I will try it later tonight.

Shailesh Jain
Re: Correct semantics in rm -rf ?
shailesh jain wrote:
> Here it is.
> Outcome is that Child gets deleted but parent does not get deleted.
> (Now you will say it's a filesystem issue ;) )

Let's start with the command you used here.
It must not have been rm -rf, because the ENOENT is not being ignored:

> fstatat64(AT_FDCWD, "parent", {st_mode=S_IFDIR|0755, st_size=72, ...}, AT_SYMLINK_NOFOLLOW) = 0
> unlinkat(AT_FDCWD, "parent", 0) = -1 EISDIR (Is a directory)
> openat(AT_FDCWD, "parent", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW) = 3
> fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
> fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
> fcntl64(3, F_GETFL) = 0x28800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW)
> fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
> fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
> getdents64(3, /* 3 entries */, 1024) = 80
> openat(3, "Child", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW) = 4
> fstat64(4, {st_mode=S_IFDIR|0755, st_size=48, ...}) = 0
> fstat64(4, {st_mode=S_IFDIR|0755, st_size=48, ...}) = 0
> fcntl64(4, F_GETFL) = 0x28800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW)
> fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
> close(3) = 0
> fcntl64(4, F_GETFD) = 0x1 (flags FD_CLOEXEC)
> fstat64(4, {st_mode=S_IFDIR|0755, st_size=48, ...}) = 0
> getdents64(4, /* 2 entries */, 1024) = 48
> getdents64(4, /* 0 entries */, 1024) = 0
> fcntl64(4, F_GETFD) = 0x1 (flags FD_CLOEXEC)
> openat(4, "..", O_RDONLY|O_LARGEFILE) = 3
> close(4) = 0
> fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
> unlinkat(3, "Child", AT_REMOVEDIR) = 0
> fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
> fcntl64(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
> fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
> fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
> getdents64(3, /* 3 entries */, 1024) = 80
> openat(3, "Child", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW) = -1 ENOENT (No such file or directory)
> getdents64(3, /* 0 entries */, 1024) = 0
> fcntl64(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)

POSIX allows readdir to work that way.

I already suggested that making this change might help you:

-CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 10
+CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 1
Re: Correct semantics in rm -rf ?
Here it is.
Outcome is that Child gets deleted but parent does not get deleted.
(Now you will say it's a filesystem issue ;) )

fstatat64(AT_FDCWD, "parent", {st_mode=S_IFDIR|0755, st_size=72, ...}, AT_SYMLINK_NOFOLLOW) = 0
unlinkat(AT_FDCWD, "parent", 0) = -1 EISDIR (Is a directory)
openat(AT_FDCWD, "parent", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
fcntl64(3, F_GETFL) = 0x28800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW)
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
getdents64(3, /* 3 entries */, 1024) = 80
openat(3, "Child", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW) = 4
fstat64(4, {st_mode=S_IFDIR|0755, st_size=48, ...}) = 0
fstat64(4, {st_mode=S_IFDIR|0755, st_size=48, ...}) = 0
fcntl64(4, F_GETFL) = 0x28800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW)
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
close(3) = 0
fcntl64(4, F_GETFD) = 0x1 (flags FD_CLOEXEC)
fstat64(4, {st_mode=S_IFDIR|0755, st_size=48, ...}) = 0
getdents64(4, /* 2 entries */, 1024) = 48
getdents64(4, /* 0 entries */, 1024) = 0
fcntl64(4, F_GETFD) = 0x1 (flags FD_CLOEXEC)
openat(4, "..", O_RDONLY|O_LARGEFILE) = 3
close(4) = 0
fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
unlinkat(3, "Child", AT_REMOVEDIR) = 0
fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
fcntl64(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
fstat64(3, {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
getdents64(3, /* 3 entries */, 1024) = 80
openat(3, "Child", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW) = -1 ENOENT (No such file or directory)
getdents64(3, /* 0 entries */, 1024) = 0
fcntl64(3, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(3) = 0
close(0) = 0
close(1) = 0
close(2) = 0
exit_group(1) = ?

Shailesh Jain

On Thu, Aug 27, 2009 at 3:44 AM, Jim Meyering wrote:
> [Jim's reply quoted in full in the original; trimmed here -- see his
> message elsewhere in this thread]

[attachment: log (binary data)]
du counting files in hard-linked directories twice
Hello everybody

The problem I see as a bug is that, in some circumstances, du counts the same files multiple times when they appear inside of hard-linked directories.

I have tested this with GNU coreutils 7.4 (MacPorts) and 7.5 (vanilla) on Mac OS X 10.5.8 on x86-64. I did all tests on an HFS+ volume, formatted as non-case-sensitive.

Judging from du.c, the problem seems to be that du only inserts a file into its table of visited files if its link count is greater than one (du.c, lines 513-517):

  if (skip
      || (!opt_count_all
          && ! S_ISDIR (sb->st_mode)
          && 1 < sb->st_nlink
          && ! hash_ins (sb->st_ino, sb->st_dev)))

Replacing the 1 in the condition above with a 0 solves the problem, although it would probably be a much better idea to insert hard-linked directories into the hash table instead. This would also prevent du from counting the size of directory inodes twice.

Appended is a simple script which runs three tests:

* One where there is a single file with two links.
* One where there is a directory with two links containing one file.
* One where there is a directory with two links containing two links to a single file.

After the script is its output on my machine and the output of ls -ilR for reference. In all the tests there is exactly one file, 1 MiB in size, and only for the second test does du report a total size of 2 MiB.

I hope you agree that this is a bug in du and that I did not miss a problem on my side in evaluating this. I will gladly be of any help I can on fixing this problem, e.g. testing patches for people who do not have a setup that may be used to reproduce this problem.

Thank you
Michael

$ cat test.sh
#! /usr/bin/env bash

set -e -o pipefail

# Test with a single file with two links.
(
    echo 'Test with file hard links:'
    rm -rf 'test-d1-f2'
    mkdir 'test-d1-f2'
    cd 'test-d1-f2'
    mkdir 'top-1'
    mkdir 'top-2'
    dd bs=1k count=1k < /dev/zero > 'top-1/file' 2> /dev/null
    link 'top-1/file' 'top-2/file'
    du -h
    echo
)

# Test with a directory with two links containing a single file with one link.
(
    echo 'Test with directory hard links:'
    rm -rf 'test-d2-f1'
    mkdir 'test-d2-f1'
    cd 'test-d2-f1'
    # Because of a restriction in HFS+ or the Mac OS, we need to put the
    # two directory entries for the hard-linked directory into separate
    # directories.
    mkdir 'top-1'
    mkdir 'top-2'
    mkdir 'top-1/dir'
    link 'top-1/dir' 'top-2/dir'
    dd bs=1k count=1k < /dev/zero > 'top-1/dir/file' 2> /dev/null
    du -h
    echo
)

# Test with a directory with two links containing a single file with two links.
(
    echo 'Test with file and directory hard links:'
    rm -rf 'test-d2-f2'
    mkdir 'test-d2-f2'
    cd 'test-d2-f2'
    mkdir 'top-1'
    mkdir 'top-2'
    mkdir 'top-1/dir'
    link 'top-1/dir' 'top-2/dir'
    dd bs=1k count=1k < /dev/zero > 'top-1/dir/file-1' 2> /dev/null
    link 'top-1/dir/file-1' 'top-1/dir/file-2'
    du -h
    echo
)

$ ./test.sh
Test with file hard links:
1.0M    ./top-1
0       ./top-2
1.0M    .

Test with directory hard links:
1.0M    ./top-1/dir
1.0M    ./top-1
1.0M    ./top-2/dir
1.0M    ./top-2
2.0M    .

Test with file and directory hard links:
1.0M    ./top-1/dir
1.0M    ./top-1
0       ./top-2/dir
0       ./top-2
1.0M    .

$ ls -ilR
.:
total 4
12382224 drwxr-xr-x 4 michi michi     136 Aug 27 18:06 test-d1-f2
12382230 drwxr-xr-x 4 michi michi     136 Aug 27 18:06 test-d2-f1
12382237 drwxr-xr-x 4 michi michi     136 Aug 27 18:06 test-d2-f2
12382194 -rwxr-xr-x 1 michi michi    1207 Aug 27 18:06 test.sh

./test-d1-f2:
total 0
12382225 drwxr-xr-x 3 michi michi     102 Aug 27 18:06 top-1
12382226 drwxr-xr-x 3 michi michi     102 Aug 27 18:06 top-2

./test-d1-f2/top-1:
total 1024
12382227 -rw-r--r-- 2 michi michi 1048576 Aug 27 18:06 file

./test-d1-f2/top-2:
total 1024
12382227 -rw-r--r-- 2 michi michi 1048576 Aug 27 18:06 file

./test-d2-f1:
total 0
12382231 drwxr-xr-x 3 michi michi     102 Aug 27 18:06 top-1
12382232 drwxr-xr-x 3 michi michi     102 Aug 27 18:06 top-2

./test-d2-f1/top-1:
total 0
12382233 drwxr-xr-x 3 michi michi     102 Aug 27 18:06 dir

./test-d2-f1/top-1/dir:
total 1024
12382236 -rw-r--r-- 1 michi michi 1048576 Aug 27 18:06 file

./test-d2-f1/top-2:
total 0
12382233 drwxr-xr-x 3 michi michi     102 Aug 27 18:06 dir

./test-d2-f1/top-2/dir:
total 1024
12382236 -rw-r--r-- 1 michi michi 1048576 Aug 27 18:06 file

./test-d2-f2:
total 0
12382238 drwxr-xr-x 3 michi michi     102 Aug 27 18:06 top-1
12382239 drwxr-xr-x 3 michi michi     102 Aug 27 18:06 top-2

./test-d2-f2/top-1:
total 0
12382240 drwxr-xr-x 4 michi michi     136 Aug 27 18:06 dir

./test-d2-f2/top-1/dir:
total 2048
12382243 -rw-r--r-- 2 michi michi 1048576 Aug 27 18:06 file-1
12382243 -rw-r--r--
Re: [PATCH] open: introduce O_NOSTD
* Eric Blake:
> [Eric's third open_safer version trimmed; quoted in full in his message
> in this thread]
>
> This solves the fd leak,

It's still buggy. You need something like this:

int open_safer(const char *name, int flags, int mode)
{
    int opened_fd[3] = {0, 0, 0};
    int fd, errno_saved;

    while (1) {
        fd = open(name, flags | O_CLOEXEC, mode);
        if (fd < 0 || fd > 2) {
            break;
        }
        opened_fd[fd] = 1;
    }
    for (int i = 0; i <= 2; ++i) {
        if (opened_fd[i]) {
            errno_saved = errno;
            close(i);
            errno = errno_saved;
        }
    }
    return fd;
}

It's untested, so it's probably still buggy. (O_CLOEXEC should have been a thread attribute, like the base path in the *_at functions. *sigh*)

--
Florian Weimer            BFK edv-consulting GmbH
http://www.bfk.de/        Kriegsstraße 100
tel: +49-721-96201-1      D-76133 Karlsruhe
fax: +49-721-96201-99
Re: [PATCH] open: introduce O_NOSTD
On 08/27/2009 06:54 AM, Eric Blake wrote:
> I hope that my example shows why doing it in the kernel is desirable -
> there is no safe way to keep the pre-O_CLOEXEC efficiency using just the
> library, but there IS a way to do it with kernel support:

You're describing a very special case where the performance implications are really minimal, and you try to argue that that is a good enough reason? I don't think so.

If a program really has to do thousands of these safe open calls, then it can invest the time to open /dev/null for any of the unallocated descriptors < 3. You can even embed this logic in the safer_open function.

-- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
Re: [PATCH] open: introduce O_NOSTD
According to Davide Libenzi on 8/25/2009 3:53 PM:
>> Another solution is for the application to sanitize all newly-created
>> fds: GNU coreutils provides a wrapper open_safer, which does nothing
>> extra in the common case that open() returned 3 or larger, but calls
>> fcntl(n,F_DUPFD,3)/close(n) before returning if n was less than 3.
>> However, this leads to triple the syscall cost for every open() call
>> if the process starts life with a std fd closed; and if O_CLOEXEC is
>> not used, still leaves a window of time where the fd can be leaked
>> through another thread's use of fork/exec.
>
> I think we can say that the vast majority of the software is not going to
> notice the proposed open_safer(), performance-wise, since the first three
> fds are always filled. So IMO the performance impact argument is a weak one.
> If CLOEXEC semantics are needed in the open operation, F_DUPFD_CLOEXEC can
> be used to match it.

The current gnulib implementation of open_safer (pre-O_CLOEXEC support) is (roughly):

/* Wrap open, to guarantee that a successful return value is >= 3. */
int open_safer (const char *name, int flags, int mode)
{
  int fd = open (name, flags, mode);
  if (0 <= fd && fd <= 2)
    {
      int dup = fcntl (fd, F_DUPFD, 3);
      int saved_errno = errno;
      close (fd);
      errno = saved_errno;
      fd = dup;
    }
  return fd;
}

which has the desired property of no overhead in the common case of all standard fds open. But it obviously mishandles the O_CLOEXEC flag. Here's a first cut at supporting it:

int open_safer (const char *name, int flags, int mode)
{
  int fd = open (name, flags, mode);
  if (0 <= fd && fd <= 2)
    {
      int dup = fcntl (fd, ((flags & O_CLOEXEC)
                            ? F_DUPFD_CLOEXEC : F_DUPFD), 3);
      int saved_errno = errno;
      close (fd);
      errno = saved_errno;
      fd = dup;
    }
  return fd;
}

If the user requested open_safer(O_CLOEXEC), then we still have the desired property of no syscall overhead and no fd leak. But if the user intentionally does not pass O_CLOEXEC because they wanted to create an inheritable fd, but without stomping on standard fds, then this version still has a window for an fd leak. So let's try this version, which guarantees no fd leak, while still keeping the semantics of giving the user an inheritable fd outside the std fd range:

int open_safer (const char *name, int flags, int mode)
{
  int fd = open (name, flags | O_CLOEXEC, mode);
  if (0 <= fd && fd <= 2)
    {
      int dup = fcntl (fd, ((flags & O_CLOEXEC)
                            ? F_DUPFD_CLOEXEC : F_DUPFD), 3);
      int saved_errno = errno;
      close (fd);
      errno = saved_errno;
      fd = dup;
    }
  else if (!(flags & O_CLOEXEC))
    {
      if ((flags = fcntl (fd, F_GETFD)) < 0
          || fcntl (fd, F_SETFD, flags & ~FD_CLOEXEC) == -1)
        {
          int saved_errno = errno;
          close (fd);
          fd = -1;
          errno = saved_errno;
        }
    }
  return fd;
}

This solves the fd leak, and open_safer(O_CLOEXEC) is still cheap in the common case. But now the case of open_safer without O_CLOEXEC costs 3 syscalls, regardless of whether the standard fds were already open (if we assumed there were no possibility of new FD_* flags, we could cut the common-case penalty from three to two by skipping the fcntl(fd,F_GETFD) and just using fcntl(fd,F_SETFD,0), but that's asking for problems).

> While the patch is simple, IMO this is something that can be easily taken
> care of in glibc layers w/out huge drawbacks.

I hope that my example shows why doing it in the kernel is desirable - there is no safe way to keep the pre-O_CLOEXEC efficiency using just the library, but there IS a way to do it with kernel support:

int open_safer (const char *name, int flags, int mode)
{
  return open (name, flags | O_NOSTD, mode);
}

Also, remember this statement from Ulrich in the series that first introduced O_CLOEXEC/O_NONBLOCK support across all the fd creation APIs: "In future there will be other uses. Like a O_* flag to enable the use of non-sequential descriptors."

http://marc.info/?l=linux-kernel&m=121010907127190&w=2

--
Don't work too hard, make some time for fun as well!
Eric Blake e...@byu.net
Re: sort -R not working
Barty Pleydell-Bouverie wrote:
> Hi
>
> sort -R fails to randomly sort files. I am using fedora8 - any known reason
> for this?

I vaguely remember there being a bug where sort -R was not working in some locales, but the git history is not helping me out. Can you retry with `LANG=C sort -R`? I can't reproduce it, though:

$ sudo chroot /f8
# rpm -q coreutils
coreutils-6.9-12.fc8
# echo $LANG
en_IE.UTF-8
# seq 5 | sort -R
3
2
1
4
5

cheers,
Pádraig.
Re: sort -R not working
On Thursday 27 of August 2009 12:46:51 Barty Pleydell-Bouverie wrote: > Hi > > sort -R fails to randomly sort files. I am using fedora8 - any known reason > for this? Please define "not working". What is the command you have used? What is the output? What is the expected output? Fedora 8 is EOL [1]. Are you able to reproduce the bug on a newer Fedora? Kamil [1] http://fedoraproject.org/wiki/LifeCycle/EOL
Re: sort -R not working
Barty Pleydell-Bouverie wrote: > sort -R fails to randomly sort files. I am using fedora8 - any known reason > for this? Fedora 8 is rather old. I suggest you upgrade. Please show us what command you ran, and the inputs and outputs. And what sort --version prints.
Re: sort -R not working
On Thu, 27 Aug 2009, Barty Pleydell-Bouverie wrote: sort -R fails to randomly sort files. I am using fedora8 - any known reason for this? Fedora 8 was declared end-of-life (i.e. unsupported) on January 7, 2009. Can you reproduce this problem either with a current Fedora release, or with the latest stable coreutils source? Also, when filing bug reports please state with as much detail as possible: - What you did - What happened (this is lacking in detail above) - What you expected to happen Cheers, Phil
sort -R not working
Hi sort -R fails to randomly sort files. I am using fedora8 - any known reason for this?
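For anyone triaging a report like this, a minimal reproduction plus a sanity check that the output is a permutation (rather than sorted or truncated) looks like the following; the locale override is a guess at the usual culprit, not a confirmed diagnosis:

```shell
# Shuffle known input; -R (--random-sort) should emit a random
# permutation of the input lines.
seq 5 | sort -R

# Rule out locale-dependent misbehavior by forcing the C locale.
seq 5 | LC_ALL=C sort -R

# Sanity check: re-sorting the shuffled output numerically must give
# back the original sequence, i.e. no lines were lost or duplicated.
seq 5 | sort -R | sort -n
```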
[PATCH] build: prefix a few rules with $(AM_V_GEN)
FYI,

From de619c8fa5ea4e5cd3ca4a9632af81a487c33c0b Mon Sep 17 00:00:00 2001
From: Jim Meyering
Date: Thu, 27 Aug 2009 10:17:55 +0200
Subject: [PATCH] build: prefix a few rules with $(AM_V_GEN)

* Makefile.am (.version, dist-hook, gen-ChangeLog): Use $(AM_V_GEN)
and $(AM_V_at), so that automake's silent-rules option (make V=1/V=0)
now controls whether the commands are printed at build time.
(THANKS-to-translators, check-ls-dircolors): Likewise.
---
 Makefile.am | 12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 57b3e33..603a444 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -98,20 +98,20 @@ rm_subst = \
 BUILT_SOURCES = .version
 .version:
-	echo $(VERSION) > $@-t && mv $@-t $@
+	$(AM_V_GEN)echo $(VERSION) > $@-t && mv $@-t $@
 
 # Arrange so that .tarball-version appears only in the distribution
 # tarball, and never in a checked-out repository.
 # The perl substitution is to change some key uses of "rm" to "/bin/rm".
 # See the rm_subst comment for details.
 dist-hook: gen-ChangeLog
-	echo $(VERSION) > $(distdir)/.tarball-version
-	perl -pi -e '$(rm_subst)' $(distdir)/src/Makefile.in
+	$(AM_V_GEN)echo $(VERSION) > $(distdir)/.tarball-version
+	$(AM_V_at)perl -pi -e '$(rm_subst)' $(distdir)/src/Makefile.in
 
 gen_start_date = 2008-02-08
 .PHONY: gen-ChangeLog
 gen-ChangeLog:
-	if test -d .git; then				\
+	$(AM_V_GEN)if test -d .git; then		\
 	  $(top_srcdir)/build-aux/gitlog-to-changelog	\
 	    --since=$(gen_start_date) > $(distdir)/cl-t;	\
 	  rm -f $(distdir)/ChangeLog;			\
@@ -125,7 +125,7 @@ distcheck-hook: check-ls-dircolors
 DISTCLEANFILES = VERSION
 MAINTAINERCLEANFILES = THANKS-to-translators
 THANKS-to-translators: po/LINGUAS THANKStt.in
-	(						\
+	$(AM_V_GEN)(					\
 	  cat $(srcdir)/THANKStt.in;			\
 	  for lang in `cat $(srcdir)/po/LINGUAS`; do	\
 	    echo http://translationproject.org/team/$$lang.html; \
@@ -136,7 +136,7 @@ THANKS-to-translators: po/LINGUAS THANKStt.in
 # remain in sync.
 .PHONY: check-ls-dircolors
 check-ls-dircolors:
-	dc=$$(sed -n '/static.*ls_codes\[/,/};'/p	\
+	$(AM_V_GEN)dc=$$(sed -n '/static.*ls_codes\[/,/};'/p \
 	      $(srcdir)/src/dircolors.c			\
 	    |sed -n '/^  *"/p'|tr , '\n'|sed 's/^ *//'	\
 	    |sed -n 's/^"\(..\)"/\1/p'|sort -u);	\
-- 
1.6.4.1.359.g4fc77
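For readers unfamiliar with automake's silent-rules machinery, the pattern the patch applies looks like this in a generic Makefile.am; the rule and file names below are illustrative, not from coreutils.

```make
# With the silent-rules option enabled (AM_SILENT_RULES([yes]) in
# configure.ac), $(AM_V_GEN) prints a terse "  GEN    target" line and
# hides the command unless the user runs "make V=1".
# $(AM_V_at) silences a command without printing anything at all.
generated.h: gen-header.sh
	$(AM_V_GEN)./gen-header.sh > $@-t
	$(AM_V_at)mv $@-t $@
```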
Re: Correct semantics in rm -rf ?
shailesh jain wrote:
...
>>> /* Now get directory entries of 'parent'. There are no guarantees that
>>> updates should have been reflected here, but rm -rf assumes this! Updates
>>> are guaranteed to be reflected only on the next openat() */
>>
>> Or rewinddir. In practice, only a few unusual (I'd say crufty) file
>> systems have had problems here. Hence the periodic rewinddir call,
>> when deemed necessary. I think one was hfs on MacOS of a few
>> years ago. That version of that OS appears no longer to be relevant.
>
> Yes, that is what I expected: to have either a rewinddir() or a
> close()/open() call. I agree that 'rm -rf' will work on most (if not all)
> filesystems, but clearly just because it works on most filesystems does
> not make it compliant.
>
> And thus I do not understand when you say "If you see behavior from rm that
> *is* not POSIX compliant, given compliant infrastructure, then I'll be happy
> to investigate it."
>
> It clearly isn't.

How can it be clear when you have not demonstrated a failure?
Did rm -rf fail to remove a hierarchy that POSIX specifies it must?
Your strace output did not show any failure.

While rm -rf may do extra work on a file system like you describe,
I would be surprised if rm -rf fails to conform, since
with -rf, rm ignores any failure due to e.g., ENOENT.

> Underlying file-systems are not required by POSIX to
> respect the assumption that 'rm' is making. Why not have a rewinddir()
> call there?

Try it, and you'll see that performance is affected.

This might do it:

-CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 10
+CONSECUTIVE_READDIR_UNLINK_THRESHOLD = 1

> Given that you plan to remove remove.c, you might not want to do it. But

I have rewritten the core of remove.c. I'm not removing the file.

> really it's just one call to rewinddir() :) which will make
> POSIX-conformers (like me) happy as well.

IMHO, you still haven't shown how rm fails to conform.
Did you demonstrate how rm -rf fails?
Did you demonstrate how some other invocation of rm fails?