Re: [PATCH 00/13] overlay filesystem v22
On Thu, May 29, 2014 at 12:26 PM, David Howells wrote:
> Miklos Szeredi wrote:
>
>> Perfect solution would be an invisible temp directory.  This needs filesystem
>> support, but perhaps not so difficult.  Again could be done later without
>> backward compatibility issues.
>
> Maybe make a tempfile and hardlink it into place when complete.  That's what
> unionmount is doing.

That doesn't work with RENAME_EXCHANGE, which is what overlayfs uses.  I think
that's a small price to pay for not needing to add whiteout support to every
single directory operation.

We could also implement RENAME_EXCHANGE with a tmpfile, but then again, I
think that may be too much complexity for too little gain.

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 00/13] overlay filesystem v22
Miklos Szeredi wrote:

> Perfect solution would be an invisible temp directory.  This needs filesystem
> support, but perhaps not so difficult.  Again could be done later without
> backward compatibility issues.

Maybe make a tempfile and hardlink it into place when complete.  That's what
unionmount is doing.

David

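For readers unfamiliar with the tempfile-and-hardlink idea mentioned above, here is a rough user-space sketch. It is only an illustration under assumptions: the helper name publish_via_hardlink and the ".tmp.XXXXXX" naming are hypothetical, and this is not how unionmount or overlayfs do it inside the kernel.

```shell
#!/bin/bash
# Sketch: prepare a file under a temporary name, then hardlink it into place.
# publish_via_hardlink is a hypothetical helper, not part of any patch.
publish_via_hardlink() {
	local src=$1 dest=$2 tmp
	# The temp file must be on the same filesystem as the destination,
	# otherwise link(2) fails with EXDEV.
	tmp=$(mktemp "$(dirname "$dest")/.tmp.XXXXXX") || return 1
	if ! cp -- "$src" "$tmp"; then
		rm -f -- "$tmp"
		return 1
	fi
	# ln without -f refuses to replace an existing name, so the final
	# name shows up with complete contents or not at all.
	if ln -- "$tmp" "$dest" 2>/dev/null; then
		rm -f -- "$tmp"
		return 0
	fi
	rm -f -- "$tmp"
	return 1
}
```

Since hard links to directories are not permitted, a scheme like this can only cover non-directories; the sketch handles regular files only.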
Re: [PATCH 00/13] overlay filesystem v22
On Mon, May 26, 2014 at 10:56:42AM +0900, J. R. Okajima wrote:
>
> Here are some comments.

Thanks for the review.

>
> - I have no objection about the 0:0 char-dev whiteout, but you don't
>   have to have the inode for each whiteout. The hardlink is better.
>   In this version, you have <workdir> now. How about creating a "base"
>   whiteout under workdir at the mount-time? Maybe it is possible by
>   user-space "mount.overlayfs" or kernel-space. When the whiteout meets
>   EMLINK, create a non-hardlink for that target synchronously and
>   re-create the "base" asynchronously.

The reason I don't do this is complexity.  If ever this becomes a problem, then
we could add hardlinked whiteouts in the future.

>
> - Is <workdir>/work always visible to users? If a user accidentally
>   removes it or its children, then some operations will fail. And he
>   cannot recover it anymore. I know it cannot easily happen since its
>   mode is 0. But I am afraid it will be a source of troubles. For
>   example, find(1) or "ls -R /overlayfs" will almost always fail.

Perfect solution would be an invisible temp directory.  This needs filesystem
support, but perhaps not so difficult.  Again could be done later without
backward compatibility issues.

>
> - If I remember correctly, the length of the dir mutex is held time has
>   been pointed out. This version has still a long mutex held time, a whole
>   copy-up operation including lookup, create, copy filedata, copy
>   xattr/attr, and then rename. How about unlock the dir before copying
>   filedata and re-lock and confirm after copying attr?

Possibly doable, but again this would add complexity and I'd rather leave it
until somebody complains.

>
> - When two processes copy-up a similar dir hierarchy, for example
>   /dirA/dirB/fileC and /dirA/dirB/dirC/fileD, may a race condition
>   happen? Two processes begin copying-up dirA, first processA succeeds
>   it and second processB fails and returns EIO?

No, we check the state with the parent lock held and skip the copy up if
somebody else won the race.

>
> - All copy-up operations will be serialized due to <workdir> lock.

Yes.  Trivially fixable by creating a separate dir for each temp file.

>
> - In fstat(2) for a dir, is nlink set to 1 even it is removed?

Probably.  I think right fix is to check if dentry is hashed and set nlink to
zero otherwise.  Will look into it.

>
> - In readdir, it lookup or getattr to determine whether the found char
>   dev entry is a whiteout or not. I know a char dev is not so many, so
>   this overhead won't be large. But if its name represented "I am a
>   whiteout", then the extra lookup or getattr would be unnecessary.

At the cost of namespace issues.  I wouldn't consider that a good trade.

Thanks,
Miklos

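To make the readdir point concrete: deciding whether an entry is a whiteout means fetching its attributes and checking for a character device with device number 0:0. A rough user-space sketch of that check follows. The helper names are hypothetical, it assumes GNU stat(1), and it only illustrates the test itself, not the kernel readdir code.

```shell
#!/bin/bash
# Sketch: classify an entry as a whiteout, i.e. a character device with
# device number 0:0.  Both helper names are hypothetical.

# Pure check on already-fetched attributes (file type, major, minor).
is_whiteout_attrs() {
	[ "$1" = "character special file" ] && [ "$2" = 0 ] && [ "$3" = 0 ]
}

# Fetch the attributes with GNU stat: %F is the file type, %t and %T are
# the major and minor device numbers (in hex) for device nodes.
is_whiteout() {
	local out ftype rest major minor
	out=$(stat -c '%F|%t|%T' -- "$1" 2>/dev/null) || return 1
	ftype=${out%%|*}
	rest=${out#*|}
	major=${rest%%|*}
	minor=${rest#*|}
	is_whiteout_attrs "$ftype" "$major" "$minor"
}
```

The extra stat call per character device is the user-space analogue of the extra lookup or getattr discussed above; encoding "I am a whiteout" in the name would avoid it, at the cost of reserving part of the namespace.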
Re: [PATCH 00/13] overlay filesystem v22
Thanks for CC-ing me.
Here are some comments.

- I have no objection about the 0:0 char-dev whiteout, but you don't
  have to have the inode for each whiteout. The hardlink is better.
  In this version, you have <workdir> now. How about creating a "base"
  whiteout under workdir at the mount-time? Maybe it is possible by
  user-space "mount.overlayfs" or kernel-space. When the whiteout meets
  EMLINK, create a non-hardlink for that target synchronously and
  re-create the "base" asynchronously.

- Is <workdir>/work always visible to users? If a user accidentally
  removes it or its children, then some operations will fail. And he
  cannot recover it anymore. I know it cannot easily happen since its
  mode is 0. But I am afraid it will be a source of troubles. For
  example, find(1) or "ls -R /overlayfs" will almost always fail.

- If I remember correctly, the length of the dir mutex is held time has
  been pointed out. This version has still a long mutex held time, a whole
  copy-up operation including lookup, create, copy filedata, copy
  xattr/attr, and then rename. How about unlock the dir before copying
  filedata and re-lock and confirm after copying attr?

- When two processes copy-up a similar dir hierarchy, for example
  /dirA/dirB/fileC and /dirA/dirB/dirC/fileD, may a race condition
  happen? Two processes begin copying-up dirA, first processA succeeds
  it and second processB fails and returns EIO?

- All copy-up operations will be serialized due to <workdir> lock.

- In fstat(2) for a dir, is nlink set to 1 even it is removed?

- In readdir, it lookup or getattr to determine whether the found char
  dev entry is a whiteout or not. I know a char dev is not so many, so
  this overhead won't be large. But if its name represented "I am a
  whiteout", then the extra lookup or getattr would be unnecessary.


My personal impression for overall is overlayfs starts growing.
Also several parts look like towarding aufs. For example,
- a <workdir> means an overlayfs specific work.
  Aufs has such special dir for copying-up an unlinked file and a
  pseudo-link. Both are unnecessary for overlayfs because overlayfs
  copies-up a file in open(2) time, and doesn't support the hardlink
  between layers.
- many small wrapper functions for VFS helpers resemble to aufs too.
  In aufs, all they have lockdep_off/on.
- the internal cache for readdir allocating extra memory.
  Aufs adopts a simple hashing, while overlayfs uses rbtree.

But of course the fundamental design differences between overlayfs and
aufs are kept, such as
- a name-based union .vs. an inode-aware union
- multiple layers
- allow users to access the layers directly
- etc...

If LKML people consider merging overlayfs, then I'd ask to consider aufs
too. The basic design is unchanged since when I posted a few years ago.
http://marc.info/?l=linux-kernel&m=123934927611907&w=2
And latest one is
http://aufs.sourceforge.net


J. R. Okajima

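On the copy-up race raised in this message: Miklos's reply in this thread says the state is checked with the parent lock held and the copy up is skipped if somebody else won the race. That check-under-lock pattern can be sketched in user space as below. This is an analogy under assumptions, not the kernel code: copy_up_once and the ".lock" file are hypothetical, and the lock is taken with flock(1) from util-linux.

```shell
#!/bin/bash
# Sketch: serialize a "copy-up" under a per-parent lock and re-check the
# state once the lock is held, so the loser of a race skips the copy.
# copy_up_once and the .lock file name are hypothetical.
copy_up_once() {
	local lower=$1 upper=$2 parent tmp
	parent=$(dirname "$upper")
	(
		# Take the parent lock (file descriptor 9 is the lock file).
		flock 9 || exit 2
		# Somebody else already did the copy-up: nothing to do.
		if [ -e "$upper" ]; then
			echo skipped
			exit 0
		fi
		# Prepare under a temporary name, then rename into place.
		tmp=$(mktemp "$parent/.copyup.XXXXXX") || exit 2
		if cp -- "$lower" "$tmp" && mv -- "$tmp" "$upper"; then
			echo copied
		else
			rm -f -- "$tmp"
			exit 2
		fi
	) 9>"$parent/.lock"
}
```

The whole copy runs with the lock held, which mirrors the long hold time questioned above; releasing the lock during the data copy and re-checking afterwards would shorten it at the price of more complexity.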
[PATCH 00/13] overlay filesystem v22
I'd like to propose this for 3.16.

Changes in v22:

 - Whiteout is now a special char device instead of a symlink, this breaks
   compatibility with previous versions.  See attached conversion script
   (takes upperdir as argument).

 - Uses cross-rename to make operations atomic: copy-up, unlink, rename, etc...

 - Added "workdir=" mount option.  Work directory is used to prepare files
   before atomically switching with destination and needs to be on the same
   filesystem as upperdir.

Git tree is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.current

Thanks,
Miklos

---
Andy Whitcroft (1):
      overlayfs: add statfs support

Erez Zadok (1):
      overlayfs: implement show_options

Miklos Szeredi (9):
      vfs: add i_op->dentry_open()
      vfs: export do_splice_direct() to modules
      vfs: export __inode_permission() to modules
      vfs: introduce clone_private_mount()
      vfs: export check_sticky()
      vfs: add whiteout support
      vfs: add RENAME_WHITEOUT
      overlay filesystem
      fs: limit filesystem stacking depth

Neil Brown (1):
      overlay: overlay filesystem documentation

Sedat Dilek (1):
      vfs: dcache: Export d_ancestor to modules

---
 Documentation/filesystems/Locking       |   2 +
 Documentation/filesystems/overlayfs.txt | 198 +++
 Documentation/filesystems/vfs.txt       |   7 +
 MAINTAINERS                             |   7 +
 fs/Kconfig                              |   1 +
 fs/Makefile                             |   1 +
 fs/btrfs/ioctl.c                        |  20 +-
 fs/dcache.c                             |   1 +
 fs/ecryptfs/main.c                      |   7 +
 fs/ext4/namei.c                         |  99 +++-
 fs/internal.h                           |   7 -
 fs/namei.c                              |  41 +-
 fs/namespace.c                          |  27 +
 fs/open.c                               |  23 +-
 fs/overlayfs/Kconfig                    |  10 +
 fs/overlayfs/Makefile                   |   7 +
 fs/overlayfs/copy_up.c                  | 427 +++
 fs/overlayfs/dir.c                      | 886
 fs/overlayfs/inode.c                    | 372 ++
 fs/overlayfs/overlayfs.h                | 185 +++
 fs/overlayfs/readdir.c                  | 518 +++
 fs/overlayfs/super.c                    | 776
 fs/splice.c                             |   1 +
 include/linux/fs.h                      |  44 ++
 include/linux/mount.h                   |   3 +
 include/uapi/linux/fs.h                 |   1 +
 26 files changed, 3613 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c

--- overlayfs-fixup.sh ---
#! /bin/bash

# Convert old-style symlink whiteouts under the given upperdir into
# 0:0 character device whiteouts.  Needs root for mknod.
upper=$1
tmpdir=$(mktemp -d)
tmp=$tmpdir/wh

# Walk all symlinks; NUL-separated so odd file names are handled.
find "$upper" -type l -print0 | while IFS= read -r -d $'\0' name; do
	# Old whiteouts are symlinks tagged with this xattr set to "y".
	iswh=$(getfattr -h -ntrusted.overlay.whiteout --only-values "$name" 2> /dev/null)
	if test "$iswh" = y; then
		echo "changing whiteout <$name> from symlink to chardev"
		mknod -m0 "$tmp" c 0 0
		mv -f "$tmp" "$name"
	fi
done
rmdir "$tmpdir"

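The "workdir=" constraint from the cover letter (work directory on the same filesystem as upperdir) can be checked before mounting. The following is a rough sketch under assumptions: the helper names are hypothetical, GNU stat(1) is assumed, comparing st_dev is only an approximation of "same filesystem", and the filesystem type name "overlayfs" is assumed to be the one this series registers.

```shell
#!/bin/bash
# Sketch: validate the workdir constraint and build the mount option
# string.  Helper names are hypothetical.

# True if both paths report the same device number (st_dev).
same_filesystem() {
	local a b
	a=$(stat -c %d -- "$1" 2>/dev/null) || return 2
	b=$(stat -c %d -- "$2" 2>/dev/null) || return 2
	[ "$a" = "$b" ]
}

# Print "lowerdir=...,upperdir=...,workdir=..." or fail if the work
# directory is not on the same filesystem as the upper directory.
overlay_mount_opts() {
	local lower=$1 upper=$2 work=$3
	same_filesystem "$upper" "$work" || return 1
	printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$upper" "$work"
}

# Usage (needs root; paths are placeholders):
#   opts=$(overlay_mount_opts /lower /upper /work) &&
#       mount -t overlayfs overlayfs -o "$opts" /merged
```

Checking up front gives a clearer error than a failed mount, since copy-up relies on renaming between the work directory and the upper directory.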