Bug#577642: mv deletes files created while moving large directories

2010-04-14 Thread Jim Meyering
chrysn wrote:
 Package: coreutils
 Version: 8.4-2
 Severity: important

 when files are created inside a directory during a `mv` of the
 directory, those files are deleted at the end of the move.

 this typically happens when moving large (in terms of amount of data)
 directories across file systems.

 to reproduce, you need a current directory on a different partition than
 your `/tmp`, then do

 $ mkdir /tmp/movetest
 $ mkdir movetest
 $ dd if=/dev/zero of=movetest/bigfile bs=1024 count=300000
 300000+0 records in
 300000+0 records out
 307200000 bytes (307 MB) copied, 11.4315 s, 26.9 MB/s

 (it is important that the file is big enough to take some seconds to
 create and read)

 $ mv movetest /tmp/movetest/ & sleep 3; echo foobar > movetest/latefile
 [1]  + done   mv movetest /tmp/movetest/
 $ ls -la movetest
 ls: cannot access movetest: No such file or directory
 $ ls -la /tmp/movetest
 total 300348
 drwxr-xr-x   2 chrysn chrysn      4096 Apr 13 12:21 .
 drwxrwxrwt 158 root   root       45056 Apr 13 12:21 ..
 -rw-r--r--   1 chrysn chrysn 307200000 Apr 13 12:21 bigfile

 you see that the `latefile` has vanished.

 no such behavior is documented in the man page.

 i suggest that `mv` should only delete the files it has successfully
 moved, and then should behave like `rmdir` for removing the directories.

 i'm indifferent as to whether it should just print a warning that some
 folders could not be removed because they are not empty and exit
 successfully, or set a non-zero exit status.

 (there could be flags to modify the behavior, e.g. one to restore the old
 behavior and one to make mv fail if the directories can't be removed.)

Thanks for the report.
In some sense, the behavior you've noticed is inevitable.
Imagine that after copying, mv were to go back and check again:
then it spots the new file (your latefile) and copies it.
Do we continue iterating and looking for new files in each
and every directory being copied?  At some point we have to
stop and then begin the removal process (which requires removal
of each entire tree/argument).  Between when we stop looking for
new files and when the removal gets to any given directory, there
will always be an interval during which someone can create a file/dir
there that will silently be removed.
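The interval described above can be reproduced without any timing tricks. The Python sketch below assumes that a cross-file-system directory move can be modeled as "copy the whole tree, then remove the whole source tree"; it uses `shutil.copytree` and `shutil.rmtree` as stand-ins and is not coreutils' code. The names `copy_then_remove`, `demo` and the `between` hook are hypothetical; the hook plays the part of `echo foobar > movetest/latefile` landing after the copy but before the removal.

```python
import os
import shutil
import tempfile

def copy_then_remove(src, dst, between=None):
    """Hypothetical model of a cross-file-system directory move: copy, then rm -r.

    `between` runs after the copy and before the removal, standing in for
    another process writing into `src` at that moment.
    """
    shutil.copytree(src, dst, symlinks=True)
    if between is not None:
        between()
    shutil.rmtree(src)  # removes everything under src, copied or not

def demo():
    base = tempfile.mkdtemp()
    try:
        src = os.path.join(base, "movetest")
        dst = os.path.join(base, "tmp", "movetest")
        os.mkdir(src)
        with open(os.path.join(src, "bigfile"), "wb") as f:
            f.write(b"\0" * 1024)

        def late_writer():
            # Plays the role of `echo foobar > movetest/latefile`.
            with open(os.path.join(src, "latefile"), "w") as f:
                f.write("foobar\n")

        copy_then_remove(src, dst, between=late_writer)
        return sorted(os.listdir(dst)), os.path.exists(src)
    finally:
        shutil.rmtree(base)

print(demo())  # (['bigfile'], False): latefile exists in neither tree
```

The same loss happens with a manual `cp -a` followed by `rm -rf`, since the removal step does not know which names the copy step saw.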

Also consider this: what if a file we've already copied is removed before
the copy completes?  Should mv perform another iteration to detect that,
and then remove it also in the destination tree?

If we were to try to make mv remove source files only if we've copied
them, not only would that introduce a significant amount of overhead,
but it would change mv's semantics.

If you want to pursue this, I suggest that you bring it up with the
Austin Group (they define the POSIX standard).
http://www.opengroup.org/austin/



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#577642: mv deletes files created while moving large directories

2010-04-14 Thread chrysn
On Wed, Apr 14, 2010 at 08:32:39AM +0200, Jim Meyering wrote:
 In some sense, the behavior you've noticed is inevitable.
 Imagine that after copying, mv were to go back and check again:
 then it spots the new file (your latefile) and copies it.
 Do we continue iterating and looking for new files in each
 and every directory being copied?  At some point we have to
 stop and then begin the removal process (which requires removal
 of each entire tree/argument).  Between when we stop looking for
 new files and when the removal gets to any given directory, there
 will always be an interval during which someone can create a file/dir
 there that will silently be removed.
 
 Also consider this: what if a file we've already copied is removed before
 the copy completes?  Should mv perform another iteration to detect that,
 and then remove it also in the destination tree?

i am aware that it is impossible to atomically move all files and remove
the directory under posix semantics; that's why i suggest instead leaving
left-over files where they are and not removing the directory.

for the sake of completeness, there is also the problem of open file
handles: assume a process has just written a file that is now being
moved and still holds a file handle to it. when the move completes, the
file is unlinked, leaving the program with a write handle on a deleted
file, to which it can, to my knowledge, continue writing, but on close()
all is lost. that is not the situation originally described, though, and
people who operate on files that are still being written to usually
know that there can be issues.
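The open-handle case can be checked directly. This sketch assumes Linux or a similar POSIX system (Windows does not normally allow unlinking an open file); `write_after_unlink` is a hypothetical helper that writes through a descriptor, removes the file's only name, and shows that further writes still succeed until the descriptor is closed.

```python
import os
import tempfile

def write_after_unlink():
    """Hypothetical helper: keep writing after the file's only name is removed."""
    fd, path = tempfile.mkstemp()  # opened read/write
    try:
        os.write(fd, b"before ")
        os.unlink(path)                # what the removal pass does to the source name
        os.write(fd, b"after")         # still succeeds: the descriptor keeps the inode alive
        nlink = os.fstat(fd).st_nlink  # typically 0: no directory entry refers to it
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)
    finally:
        os.close(fd)                   # last reference dropped: the data is freed
    return os.path.exists(path), nlink, data

print(write_after_unlink())
```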


 If we were to try to make mv remove source files only if we've copied
 them, not only would that introduce a significant amount of overhead,
 but [...]

i've now had a look at the implementation -- current coreutils really
does the equivalent of 'rm -r' if there were no errors when copying.
removing only the files that were moved would mean tracking all of them,
whereas the current theoretical memory requirement only grows with the
maximum path depth.
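To make the trade-off concrete, here is a Python sketch of the original proposal (unlink only what was copied, then remove directories the way rmdir does), assuming regular files and directories only. The helpers `copy_tree_tracking` and `remove_copied` are hypothetical and are not how coreutils works; the `copied` set is the extra bookkeeping, and it grows with the number of files rather than with path depth.

```python
import errno
import os
import shutil

def copy_tree_tracking(src, dst):
    """Hypothetical: copy regular files from src to dst; return relative paths copied."""
    copied = set()
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            shutil.copy2(os.path.join(dirpath, name), os.path.join(target, name))
            copied.add(os.path.normpath(os.path.join(rel, name)))  # one entry per file
    return copied

def remove_copied(src, copied):
    """Hypothetical: unlink only files in `copied`, then rmdir bottom-up.

    Returns the directories left behind because they were not empty, so a
    caller could either warn and succeed or exit with a non-zero status.
    """
    left = []
    for dirpath, _dirnames, filenames in os.walk(src, topdown=False):
        rel = os.path.relpath(dirpath, src)
        for name in filenames:
            if os.path.normpath(os.path.join(rel, name)) in copied:
                os.unlink(os.path.join(dirpath, name))
        try:
            os.rmdir(dirpath)  # like rmdir(1): refuses a non-empty directory
        except OSError as e:
            if e.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise
            left.append(dirpath)
    return left
```

A file created after the copy is not in `copied`, so it survives, and so does every directory above it.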

   [...] overhead,
 but it would change mv's semantics.
 
 If you want to pursue this, I suggest that you bring it up with the
 Austin Group (they define the POSIX standard).
 http://www.opengroup.org/austin/

as far as i can tell from the posix spec, it says nothing about what
to do in case of EXDEV (rename() failing because source and destination
are on different file systems) [1]. do you think the austin group would
bother to specify previously unspecified behavior?

[1] http://www.opengroup.org/onlinepubs/9699919799/utilities/mv.html
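EXDEV is the error rename(2) reports when source and destination are on different file systems, which is exactly where mv has to fall back to copying. The Python sketch below models that fallback, assuming a whole-tree copy followed by removal of the whole source; it is not coreutils' implementation, and the function `move` with its injectable `rename` parameter is hypothetical (the parameter only exists so the cross-device branch can be exercised on a single file system). Python's own `shutil.move` follows a similar rename-then-copy strategy.

```python
import errno
import os
import shutil

def move(src, dst, rename=os.rename):
    """Hypothetical: try rename(); on EXDEV, copy and then remove the source."""
    try:
        rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise  # any other failure is a real error
        if os.path.isdir(src) and not os.path.islink(src):
            shutil.copytree(src, dst, symlinks=True)
            shutil.rmtree(src)  # the rm -r step: removes the whole source tree
        else:
            shutil.copy2(src, dst, follow_symlinks=False)
            os.unlink(src)
```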


a solution that goes even deeper into the semantics, but has no memory
overhead issues, would be to delete each file immediately after moving
it. this changes the semantics more deeply because its effect is not
limited to the case described above: it also affects cases in which
some files can't be read. those would be the only files left behind by
this solution (whereas currently, in that case, a copy of the readable
files would end up in the destination and all unreadable files would be
left untouched).
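The file-by-file alternative can be sketched as follows, assuming regular files and directories only; `move_file_by_file` is a hypothetical name, not an existing tool or API. Each file is unlinked as soon as its copy succeeds, so no set of copied paths has to be kept, and a file that cannot be copied (for example an unreadable one) simply stays in the source.

```python
import errno
import os
import shutil

def move_file_by_file(src, dst):
    """Hypothetical: copy each regular file and unlink it once its copy succeeds.

    Returns relative paths that could not be copied; they stay in src.
    Directories are removed bottom-up only when they end up empty, so a file
    created after its directory was listed also survives in src.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src, topdown=False):
        rel = os.path.relpath(dirpath, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            try:
                shutil.copy2(os.path.join(dirpath, name), os.path.join(target, name))
            except OSError:
                failed.append(os.path.normpath(os.path.join(rel, name)))
                continue
            os.unlink(os.path.join(dirpath, name))  # removed right away, nothing tracked
        try:
            os.rmdir(dirpath)
        except OSError as e:
            if e.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise
    return failed
```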


in case we stick to the current semantics (or implement others but leave
the old as default), i suggest the following section to be inserted in
the man page:



CAVEATS
   When directories are moved across file systems, the source is
   removed completely after successfully having copied all files to
   the destination with the equivalent of `rm -r`, regardless of
   files written while mv was running.






Bug#577642: mv deletes files created while moving large directories

2010-04-14 Thread Jim Meyering
chrysn wrote:
 On Wed, Apr 14, 2010 at 08:32:39AM +0200, Jim Meyering wrote:
 In some sense, the behavior you've noticed is inevitable.
 Imagine that after copying, mv were to go back and check again:
 then it spots the new file (your latefile) and copies it.
 Do we continue iterating and looking for new files in each
 and every directory being copied?  At some point we have to
 stop and then begin the removal process (which requires removal
 of each entire tree/argument).  Between when we stop looking for
 new files and when the removal gets to any given directory, there
 will always be an interval during which someone can create a file/dir
 there that will silently be removed.

 Also consider this: what if a file we've already copied is removed before
 the copy completes?  Should mv perform another iteration to detect that,
 and then remove it also in the destination tree?

 i am aware that it is impossible to atomically move all files and remove
 the directory under posix semantics; that's why i suggest instead leaving
 left-over files where they are and not removing the directory.

What if I manually run cp -a dir/ dest/
and then rm -rf dir?
The same thing can happen if someone copies a file
into dir while cp is running.  Depending on the
timing, it may or may not be copied, and then
my subsequent rm will delete it.

 for the sake of completeness, there is also the problem of open file
 handles: assume a process has just written a file that is now being
 moved and still holds a file handle to it. when the move completes, the
 file is unlinked, leaving the program with a write handle on a deleted
 file, to which it can, to my knowledge, continue writing, but on close()
 all is lost. that is not the situation originally described, though, and
 people who operate on files that are still being written to usually
 know that there can be issues.


 If we were to try to make mv remove source files only if we've copied
 them, not only would that introduce a significant amount of overhead,
 but [...]

 i've now had a look at the implementation -- current coreutils really
 does the equivalent of 'rm -r' if there were no errors when copying.
 removing only the files that were moved would mean tracking all of them,
 whereas the current theoretical memory requirement only grows with the
 maximum path depth.

   [...] overhead,
 but it would change mv's semantics.

 If you want to pursue this, I suggest that you bring it up with the
 Austin Group (they define the POSIX standard).
 http://www.opengroup.org/austin/

 as far as i can tell from the posix spec, it says nothing about what
 to do in case of EXDEV (rename() failing because source and destination
 are on different file systems) [1]. do you think the austin group would
 bother to specify previously unspecified behavior?

 [1] http://www.opengroup.org/onlinepubs/9699919799/utilities/mv.html

I think your scenario is unlikely enough that we can compare
it to the classic "Doctor, it hurts when I do this." one:
"Well, then don't do that."

However, if you find that some other implementation of rm
(*BSD, OpenSolaris, etc.) handles this in a better manner,
either by default or via an option, please let us know.

 a solution that goes even deeper into the semantics, but has no memory
 overhead issues, would be to delete each file immediately after moving
 it. this changes the semantics more deeply because its effect is not
 limited to the case described above: it also affects cases in which
 some files can't be read. those would be the only files left behind by
 this solution (whereas currently, in that case, a copy of the readable
 files would end up in the destination and all unreadable files would be
 left untouched).


 in case we stick to the current semantics (or implement others but leave
 the old as default), i suggest the following section to be inserted in
 the man page:

 

 CAVEATS
    When directories are moved across file systems, the source is
    removed completely after successfully having copied all files to
    the destination with the equivalent of `rm -r`, regardless of
    files written while mv was running.
 

Thanks for the suggestion.
The man page is generated from mv --help, so a note like that belongs
in the more thorough info documentation.

Would you like to reword that so it doesn't sound like we're using rm
to copy, and present it as a patch to doc/coreutils.texi, per the
contribution guidelines?

  http://git.sv.gnu.org/cgit/coreutils.git/tree/HACKING

Consider whether cp would need a similar note.

Maybe rm, too.  It may or may not delete something you write into a
tree that is in the process of being removed.
And chown, chmod, chgrp (when using -R) and du.

Perhaps this is something that is too basic to be attached to
any particular tool.  The behavior of hierarchy-traversing tools
is usually 

Bug#577642: mv deletes files created while moving large directories

2010-04-13 Thread chrysn
Package: coreutils
Version: 8.4-2
Severity: important

when files are created inside a directory during a `mv` of the
directory, those files are deleted at the end of the move.

this typically happens when moving large (in terms of amount of data)
directories across file systems.

to reproduce, you need a current directory on a different partition than
your `/tmp`, then do

$ mkdir /tmp/movetest
$ mkdir movetest
$ dd if=/dev/zero of=movetest/bigfile bs=1024 count=300000
300000+0 records in
300000+0 records out
307200000 bytes (307 MB) copied, 11.4315 s, 26.9 MB/s

(it is important that the file is big enough to take some seconds to
create and read)

$ mv movetest /tmp/movetest/ & sleep 3; echo foobar > movetest/latefile
[1]  + done   mv movetest /tmp/movetest/
$ ls -la movetest
ls: cannot access movetest: No such file or directory
$ ls -la /tmp/movetest
total 300348
drwxr-xr-x   2 chrysn chrysn      4096 Apr 13 12:21 .
drwxrwxrwt 158 root   root       45056 Apr 13 12:21 ..
-rw-r--r--   1 chrysn chrysn 307200000 Apr 13 12:21 bigfile

you see that the `latefile` has vanished.

no such behavior is documented in the man page.


i suggest that `mv` should only delete the files it has successfully
moved, and then should behave like `rmdir` for removing the directories.

i'm indifferent as to whether it should just print a warning that some
folders could not be removed because they are not empty and exit
successfully, or set a non-zero exit status.

(there could be flags to modify the behavior, e.g. one to restore the old
behavior and one to make mv fail if the directories can't be removed.)


-- System Information:
Debian Release: squeeze/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.33-2-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages coreutils depends on:
ii  libacl1   2.2.49-2   Access control list shared library
ii  libc6 2.10.2-6   Embedded GNU C Library: Shared lib
ii  libselinux1   2.0.94-1   SELinux runtime shared libraries

coreutils recommends no packages.

coreutils suggests no packages.

-- debconf-show failed

-- 
To use raw power is to make yourself infinitely vulnerable to greater powers.
  -- Bene Gesserit axiom

