Re: [PATCH 00/13] overlay filesystem v22

2014-05-29 Thread Miklos Szeredi
On Thu, May 29, 2014 at 12:26 PM, David Howells <dhowe...@redhat.com> wrote:
> Miklos Szeredi <mik...@szeredi.hu> wrote:
>
>> Perfect solution would be an invisible temp directory.  This needs filesystem
>> support, but perhaps not so difficult.  Again could be done later without
>> backward compatibility issues.
>
> Maybe make a tempfile and hardlink it into place when complete.  That's what
> unionmount is doing.

That doesn't work with RENAME_EXCHANGE, which is what overlayfs uses.

I think that's a small price to pay for not needing to add whiteout
support to every single directory operation.

We could also implement RENAME_EXCHANGE with a tmpfile, but then
again, I think that may be too much complexity for too little gain.
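
For readers following the exchange, the tempfile-and-hardlink pattern can be sketched from user space. This is only an illustration in Python, not code from unionmount or overlayfs; the helper name publish_via_hardlink and the ".tmp-" prefix are invented for the example. One property worth noting is that link(2) fails with EEXIST when the destination already exists, so the pattern by itself can create a new name atomically but cannot swap with an existing entry the way an exchange-style rename does.

```python
import os
import tempfile


def publish_via_hardlink(directory, name, data):
    """Write data to a temp file, then hardlink it to its final name.

    Hypothetical helper for illustration only. Returns True if the name
    was created, False if it already existed (os.link never overwrites).
    """
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        try:
            os.link(tmp_path, os.path.join(directory, name))
            return True
        except FileExistsError:
            return False
    finally:
        os.unlink(tmp_path)  # drop the temporary name either way


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(publish_via_hardlink(d, "file", b"first"))   # name is new
        print(publish_via_hardlink(d, "file", b"second"))  # name exists
```

The second call leaves the first file's contents untouched, which is the behaviour that makes the pattern safe for creation but unsuitable for replacing something in place.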

Thanks,
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 00/13] overlay filesystem v22

2014-05-29 Thread David Howells
Miklos Szeredi <mik...@szeredi.hu> wrote:

> Perfect solution would be an invisible temp directory.  This needs filesystem
> support, but perhaps not so difficult.  Again could be done later without
> backward compatibility issues.

Maybe make a tempfile and hardlink it into place when complete.  That's what
unionmount is doing.

David


Re: [PATCH 00/13] overlay filesystem v22

2014-05-28 Thread Miklos Szeredi
On Mon, May 26, 2014 at 10:56:42AM +0900, J. R. Okajima wrote:

> 
> Here are some comments.

Thanks for the review.

> 
> - I have no objection about the 0:0 char-dev whiteout, but you don't
>   have to have the inode for each whiteout. The hardlink is better.
>   In this version, you have <workdir> now. How about creating a "base"
>   whiteout under workdir at the mount-time? Maybe it is possible by
>   user-space "mount.overlayfs" or kernel-space. When the whiteout meets
>   EMLINK, create a non-hardlink for that target synchronously and
>   re-create the "base" asynchronously.

The reason I don't do this is complexity.  If ever this becomes a problem, then
we could add hardlinked whiteouts in the future.
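
As a rough user-space illustration of the hardlinked-whiteout idea discussed above (not something the patch set implements), the sketch below links each new whiteout to a shared "base" and falls back to a standalone one when the link count is exhausted. The function name link_whiteout and the create_whiteout callback are assumptions made for the example; in practice the callback would be something like mknod of a 0:0 char device, which needs privileges. The asynchronous re-creation of the base mentioned in the proposal is left out.

```python
import errno
import os


def link_whiteout(base, target, create_whiteout, link=os.link):
    """Hardlink target to a shared base whiteout; fall back on EMLINK.

    Hypothetical sketch. create_whiteout(path) stands in for whatever
    creates a standalone whiteout. Returns "linked" or "created".
    """
    try:
        link(base, target)
        return "linked"
    except OSError as e:
        if e.errno != errno.EMLINK:
            raise
    # Link count on base is exhausted: give this target its own inode.
    create_whiteout(target)
    return "created"
```

The link argument is injectable only so the EMLINK path can be exercised without actually creating tens of thousands of links.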

> 
> - Is <workdir>/work always visible to users? If a user accidentally
>   removes it or its children, then some operations will fail. And he
>   cannot recover it anymore. I know it cannot easily happen since its
>   mode is 0. But I am afraid it will be a source of troubles. For
>   example, find(1) or "ls -R /overlayfs" will almost always fail.

Perfect solution would be an invisible temp directory.  This needs filesystem
support, but perhaps not so difficult.  Again could be done later without
backward compatibility issues.

> 
> - If I remember correctly, the length of time the dir mutex is held has
>   been pointed out before. This version still holds the mutex for a long
>   time: a whole copy-up operation including lookup, create, copy filedata,
>   copy xattr/attr, and then rename. How about unlocking the dir before
>   copying filedata, then re-locking and confirming after copying attr?

Possibly doable, but again this would add complexity and I'd rather leave it
until somebody complains.

> 
> - When two processes copy-up a similar dir hierarchy, for example
>   /dirA/dirB/fileC and /dirA/dirB/dirC/fileD, may a race condition
>   happen? Two processes begin copying-up dirA, first processA succeeds
>   it and second processB fails and returns EIO?

No, we check the state with the parent lock held and skip the copy up if somebody
else won the race.
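
The "check the state with the lock held, skip if somebody else won" pattern can be shown with a toy user-space analogy. This is not the kernel code; the class CopyUpState and its fields are invented for illustration, and a threading.Lock stands in for the parent directory lock.

```python
import threading


class CopyUpState:
    """Toy stand-in for a directory whose copy-up must happen once."""

    def __init__(self):
        self.parent_lock = threading.Lock()
        self.copied_up = False
        self.copies_done = 0

    def copy_up(self):
        """Return True if this caller did the copy, False if it skipped."""
        with self.parent_lock:
            if self.copied_up:      # somebody else won the race
                return False
            self.copies_done += 1   # the real work would go here
            self.copied_up = True
            return True
```

Because the check and the work happen under the same lock, the loser of the race sees the finished state and returns without an error instead of failing.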

> 
> - All copy-up operations will be serialized due to <workdir> lock.

Yes.  Trivially fixable by creating a separate dir for each temp file.

> 
> - In fstat(2) for a dir, is nlink set to 1 even if it is removed?

Probably.  I think the right fix is to check if dentry is hashed and set nlink to
zero otherwise.  Will look into it.

> 
> - In readdir, it does a lookup or getattr to determine whether a found
>   char dev entry is a whiteout or not. I know char devs are rare, so
>   this overhead won't be large. But if its name represented "I am a
>   whiteout", then the extra lookup or getattr would be unnecessary.

At the cost of namespace issues.  I wouldn't consider that a good trade.
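
To make the trade-off concrete, here is a small user-space sketch of identifying a whiteout by its attributes rather than by its name. It assumes the v22 convention stated in this thread (a char device with device number 0:0); the function names are invented for the example. Unlike the readdir code being discussed, which only needs the extra call for entries that are char devices, this sketch simply does one lstat per entry.

```python
import os
import stat


def is_whiteout(path):
    """True if path is a character device with device number 0:0."""
    st = os.lstat(path)
    return (stat.S_ISCHR(st.st_mode)
            and os.major(st.st_rdev) == 0
            and os.minor(st.st_rdev) == 0)


def visible_entries(directory):
    """List names in directory, hiding whiteouts (one lstat per entry)."""
    return sorted(name for name in os.listdir(directory)
                  if not is_whiteout(os.path.join(directory, name)))
```

No file name is reserved with this approach, so any name a user picks stays usable; the cost is the extra attribute lookup.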


Thanks,
Miklos


Re: [PATCH 00/13] overlay filesystem v22

2014-05-25 Thread J. R. Okajima

Thanks for CC-ing me.

Here are some comments.

- I have no objection about the 0:0 char-dev whiteout, but you don't
  have to have the inode for each whiteout. The hardlink is better.
  In this version, you have <workdir> now. How about creating a "base"
  whiteout under workdir at the mount-time? Maybe it is possible by
  user-space "mount.overlayfs" or kernel-space. When the whiteout meets
  EMLINK, create a non-hardlink for that target synchronously and
  re-create the "base" asynchronously.

- Is <workdir>/work always visible to users? If a user accidentally
  removes it or its children, then some operations will fail. And he
  cannot recover it anymore. I know it cannot easily happen since its
  mode is 0. But I am afraid it will be a source of troubles. For
  example, find(1) or "ls -R /overlayfs" will almost always fail.

- If I remember correctly, the length of time the dir mutex is held has
  been pointed out before. This version still holds the mutex for a long
  time: a whole copy-up operation including lookup, create, copy filedata,
  copy xattr/attr, and then rename. How about unlocking the dir before
  copying filedata, then re-locking and confirming after copying attr?

- When two processes copy-up a similar dir hierarchy, for example
  /dirA/dirB/fileC and /dirA/dirB/dirC/fileD, may a race condition
  happen? Two processes begin copying-up dirA, first processA succeeds
  it and second processB fails and returns EIO?

- All copy-up operations will be serialized due to <workdir> lock.

- In fstat(2) for a dir, is nlink set to 1 even if it is removed?

- In readdir, it does a lookup or getattr to determine whether a found
  char dev entry is a whiteout or not. I know char devs are rare, so
  this overhead won't be large. But if its name represented "I am a
  whiteout", then the extra lookup or getattr would be unnecessary.


My overall impression is that overlayfs is starting to grow.
Also, several parts look like they are heading toward aufs. For example,
- a <workdir> means overlayfs-specific work. Aufs has such a special
  dir for copying-up an unlinked file and a pseudo-link. Both are
  unnecessary for overlayfs because overlayfs copies-up a file in
  open(2) time, and doesn't support the hardlink between layers.
- many small wrapper functions for VFS helpers resemble aufs
  too. In aufs, they all have lockdep_off/on.
- the internal cache for readdir allocating extra memory. Aufs adopts
  a simple hashing, while overlayfs uses rbtree.

But of course the fundamental design differences between overlayfs and
aufs are kept, such as
- a name-based union vs. an inode-aware union
- multiple layers
- allow users to access the layers directly
- etc...

If LKML people consider merging overlayfs, then I'd ask them to consider aufs
too. The basic design is unchanged since I posted it a few years ago.
http://marc.info/?l=linux-kernel&m=123934927611907&w=2

And the latest one is
http://aufs.sourceforge.net


J. R. Okajima


[PATCH 00/13] overlay filesystem v22

2014-05-23 Thread Miklos Szeredi
I'd like to propose this for 3.16.

Changes in v22:

 - Whiteout is now a special char device instead of a symlink; this breaks
   compatibility with previous versions.  See attached conversion script (takes
   upperdir as argument).

 - Uses cross-rename to make operations atomic: copy-up, unlink, rename, etc...

 - Added "workdir=" mount option.  Work directory is used to prepare files
   before atomically switching with destination and needs to be on the same
   filesystem as upperdir.
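
A quick way to sanity-check the "same filesystem" requirement before mounting is to compare device IDs. The sketch below is a user-space heuristic only, and the function name same_filesystem is made up for the example; comparing st_dev can give a different answer from the kernel's own check in some setups (for instance, separate subvolumes of one filesystem may report different device IDs).

```python
import os


def same_filesystem(upperdir, workdir):
    """True if both directories report the same device ID (st_dev)."""
    return os.stat(upperdir).st_dev == os.stat(workdir).st_dev
```

A rename between two directories can only be atomic when both live on one filesystem, which is why the work directory has to sit next to upperdir.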

Git tree is here:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.current

Thanks,
Miklos

---
Andy Whitcroft (1):
  overlayfs: add statfs support

Erez Zadok (1):
  overlayfs: implement show_options

Miklos Szeredi (9):
  vfs: add i_op->dentry_open()
  vfs: export do_splice_direct() to modules
  vfs: export __inode_permission() to modules
  vfs: introduce clone_private_mount()
  vfs: export check_sticky()
  vfs: add whiteout support
  vfs: add RENAME_WHITEOUT
  overlay filesystem
  fs: limit filesystem stacking depth

Neil Brown (1):
  overlay: overlay filesystem documentation

Sedat Dilek (1):
  vfs: dcache: Export d_ancestor to modules

---
 Documentation/filesystems/Locking       |   2 +
 Documentation/filesystems/overlayfs.txt | 198 +++
 Documentation/filesystems/vfs.txt       |   7 +
 MAINTAINERS                             |   7 +
 fs/Kconfig                              |   1 +
 fs/Makefile                             |   1 +
 fs/btrfs/ioctl.c                        |  20 +-
 fs/dcache.c                             |   1 +
 fs/ecryptfs/main.c                      |   7 +
 fs/ext4/namei.c                         |  99 +++-
 fs/internal.h                           |   7 -
 fs/namei.c                              |  41 +-
 fs/namespace.c                          |  27 +
 fs/open.c                               |  23 +-
 fs/overlayfs/Kconfig                    |  10 +
 fs/overlayfs/Makefile                   |   7 +
 fs/overlayfs/copy_up.c                  | 427 +++
 fs/overlayfs/dir.c                      | 886
 fs/overlayfs/inode.c                    | 372 ++
 fs/overlayfs/overlayfs.h                | 185 +++
 fs/overlayfs/readdir.c                  | 518 +++
 fs/overlayfs/super.c                    | 776
 fs/splice.c                             |   1 +
 include/linux/fs.h                      |  44 ++
 include/linux/mount.h                   |   3 +
 include/uapi/linux/fs.h                 |   1 +
 26 files changed, 3613 insertions(+), 58 deletions(-)
 create mode 100644 Documentation/filesystems/overlayfs.txt
 create mode 100644 fs/overlayfs/Kconfig
 create mode 100644 fs/overlayfs/Makefile
 create mode 100644 fs/overlayfs/copy_up.c
 create mode 100644 fs/overlayfs/dir.c
 create mode 100644 fs/overlayfs/inode.c
 create mode 100644 fs/overlayfs/overlayfs.h
 create mode 100644 fs/overlayfs/readdir.c
 create mode 100644 fs/overlayfs/super.c


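--- overlayfs-fixup.sh ---
#! /bin/bash

# Convert old symlink whiteouts under upperdir ($1) to 0:0 char devices.
upper=$1
tmpdir=`mktemp -d`
tmp=$tmpdir/wh
find "$upper" -type l -print0 | while IFS= read -r -d $'\0' name; do
	# Old-style whiteouts are symlinks carrying this xattr with value "y".
	iswh=`getfattr -h -ntrusted.overlay.whiteout --only-values "$name" 2> /dev/null`
	if test "$iswh" = y; then
		echo "changing whiteout <$name> from symlink to chardev"
		# Create the char device aside, then move it over the symlink.
		mknod -m0 "$tmp" c 0 0
		mv -f "$tmp" "$name"
	fi
done
rmdir "$tmpdir"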