Bug#577642: mv deletes files created while moving large directories
chrysn wrote:
> Package: coreutils
> Version: 8.4-2
> Severity: important
>
> when files are created inside a directory during a `mv` of the
> directory, those files are deleted at the end of the move. this
> typically happens when moving large (in terms of amount of data)
> directories across file systems.
>
> to reproduce, you need a current directory on a different partition
> than your `/tmp`, then do
>
>   $ mkdir /tmp/movetest
>   $ mkdir movetest
>   $ dd if=/dev/zero of=movetest/bigfile bs=1024 count=300000
>   300000+0 records in
>   300000+0 records out
>   307200000 bytes (307 MB) copied, 11.4315 s, 26.9 MB/s
>
> (it is important that the file is big enough to take some seconds to
> create and read)
>
>   $ mv movetest /tmp/movetest/ & sleep 3; echo foobar > movetest/latefile
>   [1] + done mv movetest /tmp/movetest/
>   $ ls -la movetest
>   ls: cannot access movetest: No such file or directory
>   $ ls -la /tmp/movetest
>   total 300348
>   drwxr-xr-x   2 chrysn chrysn      4096 Apr 13 12:21 .
>   drwxrwxrwt 158 root   root       45056 Apr 13 12:21 ..
>   -rw-r--r--   1 chrysn chrysn 307200000 Apr 13 12:21 bigfile
>
> you see that the `latefile` has vanished. no such behavior is
> documented in the man page.
>
> i suggest that `mv` should only delete the files it has successfully
> moved, and then should behave like `rmdir` for removing the
> directories. i'm indifferent on whether it should just print a warning
> that some folders could not be removed because they are not empty and
> return successfully or set a non-zero exit status. (there could be
> flags to modify the behavior, eg one to restore the old behavior and
> one to make mv fail if the directories can't be removed.)

Thanks for the report.

In some sense, the behavior you've noticed is inevitable.
Imagine that after copying, mv were to go back and check again:
then it spots the new file (your latefile) and copies it.
Do we continue iterating and looking for new files in each and
every directory being copied? At some point we have to stop
and then begin the removal process (which requires removal of
each entire tree/argument).
Between when we stop looking for new files and when the removal gets
to any given directory, there will always be an interval during which
someone can create a file/dir there that will silently be removed.

Also consider this: what if a file we've already copied is removed
before the copy completes? Should mv perform another iteration to
detect that, and then remove it also in the destination tree?

If we were to try to make mv remove source files only if we've
copied them, not only would that introduce a significant amount of
overhead, but it would change mv's semantics.

If you want to pursue this, I suggest that you bring it up with
the Austin Group (they define the POSIX standard).

    http://www.opengroup.org/austin/
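The window described above can be made visible without a second partition. The following is only a sketch: it assumes that a cross-filesystem move behaves roughly like a recursive copy followed by a recursive removal (as later messages in this thread describe), and it emulates that with `cp -a` and `rm -r` rather than calling mv. The temporary directory and the file names are made up for the illustration.

```shell
#!/bin/sh
# Sketch only: emulates "copy the tree, then remove the whole source tree".
# This is an approximation of the behavior discussed here, not mv's code.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/src" "$work/dst"
echo data > "$work/src/bigfile"

# Phase 1: copy the tree as it exists right now.
cp -a "$work/src" "$work/dst/"

# A late writer creates a file after the copy pass has finished.
echo foobar > "$work/src/latefile"

# Phase 2: remove the entire source tree, including the late file.
rm -r "$work/src"

# Lists only bigfile: latefile was never copied and is now gone.
ls "$work/dst/src"
```

However late the copy pass re-checks the source, a writer can still slip in between the last check and the removal, which is the point made above.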
Bug#577642: mv deletes files created while moving large directories
On Wed, Apr 14, 2010 at 08:32:39AM +0200, Jim Meyering wrote:
> In some sense, the behavior you've noticed is inevitable.
> Imagine that after copying, mv were to go back and check again:
> then it spots the new file (your latefile) and copies it.
> Do we continue iterating and looking for new files in each and
> every directory being copied? At some point we have to stop
> and then begin the removal process (which requires removal of
> each entire tree/argument). Between when we stop looking for new
> files and when the removal gets to any given directory, there will
> always be an interval during which someone can create a file/dir
> there that will silently be removed.
>
> Also consider this: what if a file we've already copied is removed
> before the copy completes? Should mv perform another iteration to
> detect that, and then remove it also in the destination tree?

i am aware that it is impossible to atomically move all files and
remove the directory on posix semantics, that's why i rather suggest
leaving left-over files where they are and not removing the directory.

for sake of completeness, there is even the problem with open file
handles: assume a process has just written a file that is now being
moved and still has a file handle. when move completes, the file is
unlinked, leaving the program with a write handle on a deleted file, to
which it can, to my knowledge, continue writing, but on close(), all is
lost -- in the typical case originally described, this is not the case,
though, and people who operate on files currently being written to
usually know that there can be issues.

> If we were to try to make mv remove source files only if we've
> copied them, not only would that introduce a significant amount of
> overhead, but [...]

i've now had a look at the implementation -- current coreutils really
does the equivalent of 'rm -r' if there were no errors when copying.
only removing the files moved would mean tracking all of them, while
the current theoretical memory requirement amounts to the maximum path
depth.

> [...] overhead, but it would change mv's semantics.
>
> If you want to pursue this, I suggest that you bring it up with
> the Austin Group (they define the POSIX standard).
>
>     http://www.opengroup.org/austin/

for what i looked up on posix specs, there are no statements about what
to do in case of EXDEV (rename didn't work) [1]. do you think the austin
group would bother to specify previously unspecified behavior?

[1] http://www.opengroup.org/onlinepubs/9699919799/utilities/mv.html

a solution that goes even deeper into the semantics but has no memory
overhead issues would be to delete files immediately after moving them.
this has a deeper effect on the semantics because its effect is not
limited to the case described above, but also affects cases in which
some files can't be read, which would be the only files left in this
solution (while originally, in that case there would be a copy of
readable files in the destination, but all unreadable files would be
left untouched).

in case we stick to the current semantics (or implement others but
leave the old as default), i suggest the following section to be
inserted in the man page:

    CAVEATS
        When directories are moved across file systems, the source is
        removed completely after successfully having copied all files
        to the destination with the equivalent of `rm -r`, regardless
        of files written while mv was running.
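The "track what was moved, then behave like rmdir" idea can be sketched in shell. This is a hypothetical illustration, not a patch to coreutils: `snapshot_files` and `remove_recorded` are invented helper names, the list is kept in a file rather than in memory, file names are assumed to contain no newlines, and `cp -a` stands in for mv's copy pass.

```shell
#!/bin/sh
# Hypothetical helpers, not part of coreutils.
set -eu

# Record every non-directory entry under $1 (relative paths) in file $2.
# Assumption: no file name contains a newline.
snapshot_files() {
    (cd "$1" && find . ! -type d -print) > "$2"
}

# Remove only the recorded entries from $1, then prune directories with
# rmdir, which refuses to delete a directory that is not empty.
remove_recorded() {
    while IFS= read -r rel; do
        rm -f -- "$1/$rel"
    done < "$2"
    find "$1" -depth -type d -exec rmdir {} \; 2>/dev/null || true
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/sub" "$work/dst"
echo data > "$work/src/bigfile"
echo more > "$work/src/sub/other"

snapshot_files "$work/src" "$work/list"
cp -a "$work/src" "$work/dst/"
echo foobar > "$work/src/latefile"    # late writer, after the copy
remove_recorded "$work/src" "$work/list"

# Only latefile is left in the source; the emptied sub/ was pruned.
ls -R "$work/src"
```

Because the list is taken before the copy, a file created between the snapshot and the copy may end up in both trees, which is untidy but loses nothing. The list itself is the tracking overhead mentioned above: it grows with the number of files rather than with the path depth.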
Bug#577642: mv deletes files created while moving large directories
chrysn wrote:
> On Wed, Apr 14, 2010 at 08:32:39AM +0200, Jim Meyering wrote:
>> In some sense, the behavior you've noticed is inevitable.
>> Imagine that after copying, mv were to go back and check again:
>> then it spots the new file (your latefile) and copies it.
>> Do we continue iterating and looking for new files in each and
>> every directory being copied? At some point we have to stop
>> and then begin the removal process (which requires removal of
>> each entire tree/argument). Between when we stop looking for new
>> files and when the removal gets to any given directory, there will
>> always be an interval during which someone can create a file/dir
>> there that will silently be removed.
>>
>> Also consider this: what if a file we've already copied is removed
>> before the copy completes? Should mv perform another iteration to
>> detect that, and then remove it also in the destination tree?
>
> i am aware that it is impossible to atomically move all files and
> remove the directory on posix semantics, that's why i rather suggest
> leaving left-over files where they are and not removing the directory.

What if I'm manually doing cp -a dir/ dest/, then run rm -rf dir ?
The same thing can arise if someone copies a file into dir while
cp is running. Depending on the timing, it may or may not be copied,
and then my subsequent rm will delete it.

> for sake of completeness, there is even the problem with open file
> handles: assume a process has just written a file that is now being
> moved and still has a file handle. when move completes, the file is
> unlinked, leaving the program with a write handle on a deleted file, to
> which it can, to my knowledge, continue writing, but on close(), all is
> lost -- in the typical case originally described, this is not the case,
> though, and people who operate on files currently being written to
> usually know that there can be issues.
>
>> If we were to try to make mv remove source files only if we've
>> copied them, not only would that introduce a significant amount of
>> overhead, but [...]
> i've now had a look at the implementation -- current coreutils really
> does the equivalent of 'rm -r' if there were no errors when copying.
> only removing the files moved would mean tracking all of them, while
> the current theoretical memory requirement amounts to the maximum path
> depth.
>
>> [...] overhead, but it would change mv's semantics.
>>
>> If you want to pursue this, I suggest that you bring it up with
>> the Austin Group (they define the POSIX standard).
>>
>>     http://www.opengroup.org/austin/
>
> for what i looked up on posix specs, there are no statements about what
> to do in case of EXDEV (rename didn't work) [1]. do you think the austin
> group would bother to specify previously unspecified behavior?
>
> [1] http://www.opengroup.org/onlinepubs/9699919799/utilities/mv.html

I think your scenario is unlikely enough that we can compare it to the
classical "Doctor, it hurts when I do this..." one.
"Well, then don't do that."

However, if you find that some other implementation of rm (*BSD,
opensolaris, etc.) handles this in a better manner either by default,
or via an option, please let us know.

> a solution that goes even deeper into the semantics but has no memory
> overhead issues would be to delete files immediately after moving them.
> this has a deeper effect on the semantics because its effect is not
> limited to the case described above, but also affects cases in which
> some files can't be read, which would be the only files left in this
> solution (while originally, in that case there would be a copy of
> readable files in the destination, but all unreadable files would be
> left untouched).
>
> in case we stick to the current semantics (or implement others but
> leave the old as default), i suggest the following section to be
> inserted in the man page:
>
>     CAVEATS
>         When directories are moved across file systems, the source is
>         removed completely after successfully having copied all files
>         to the destination with the equivalent of `rm -r`, regardless
>         of files written while mv was running.

Thanks for the suggestion.
The man page is generated from mv --help, so a note like that belongs
in the more thorough info documentation. Would you like to reword that
so it doesn't sound like we're using rm to copy, and present it as
a patch to doc/coreutils.texi, per the contribution guidelines?

    http://git.sv.gnu.org/cgit/coreutils.git/tree/HACKING

Consider whether cp would need a similar note.
Maybe rm, too. It may or may not delete something you write
into a tree that is in the process of being removed.
And chown, chmod, chgrp (when using -R) and du.

Perhaps this is something that is too basic to be attached
to any particular tool. The behavior of hierarchy-traversing tools is usually
Bug#577642: mv deletes files created while moving large directories
Package: coreutils
Version: 8.4-2
Severity: important

when files are created inside a directory during a `mv` of the
directory, those files are deleted at the end of the move. this
typically happens when moving large (in terms of amount of data)
directories across file systems.

to reproduce, you need a current directory on a different partition
than your `/tmp`, then do

  $ mkdir /tmp/movetest
  $ mkdir movetest
  $ dd if=/dev/zero of=movetest/bigfile bs=1024 count=300000
  300000+0 records in
  300000+0 records out
  307200000 bytes (307 MB) copied, 11.4315 s, 26.9 MB/s

(it is important that the file is big enough to take some seconds to
create and read)

  $ mv movetest /tmp/movetest/ & sleep 3; echo foobar > movetest/latefile
  [1] + done mv movetest /tmp/movetest/
  $ ls -la movetest
  ls: cannot access movetest: No such file or directory
  $ ls -la /tmp/movetest
  total 300348
  drwxr-xr-x   2 chrysn chrysn      4096 Apr 13 12:21 .
  drwxrwxrwt 158 root   root       45056 Apr 13 12:21 ..
  -rw-r--r--   1 chrysn chrysn 307200000 Apr 13 12:21 bigfile

you see that the `latefile` has vanished. no such behavior is
documented in the man page.

i suggest that `mv` should only delete the files it has successfully
moved, and then should behave like `rmdir` for removing the
directories. i'm indifferent on whether it should just print a warning
that some folders could not be removed because they are not empty and
return successfully or set a non-zero exit status. (there could be
flags to modify the behavior, eg one to restore the old behavior and
one to make mv fail if the directories can't be removed.)
-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.33-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages coreutils depends on:
ii  libacl1      2.2.49-2  Access control list shared library
ii  libc6        2.10.2-6  Embedded GNU C Library: Shared lib
ii  libselinux1  2.0.94-1  SELinux runtime shared libraries

coreutils recommends no packages.

coreutils suggests no packages.

-- debconf-show failed

-- 
To use raw power is to make yourself infinitely vulnerable to greater powers.
  -- Bene Gesserit axiom
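The reproduction in this report only triggers when the current directory and `/tmp` are on different file systems. A quick way to check that precondition is to compare device numbers. The sketch below assumes GNU `stat` (its `-c %d` format prints the device number); `same_filesystem` is a made-up helper name, and equal device numbers are only a strong hint, since setups such as bind mounts can still make a rename fail.

```shell
#!/bin/sh
# Sketch: report whether two paths appear to live on the same filesystem.
# Assumes GNU stat from coreutils; `stat -c %d` prints the device number.
same_filesystem() {
    [ "$(stat -c %d -- "$1")" = "$(stat -c %d -- "$2")" ]
}

if same_filesystem . /tmp; then
    echo "same device: mv can most likely rename, the report will not reproduce"
else
    echo "different devices: mv has to copy and then remove the source"
fi
```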