Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes

2015-04-01 Thread Timo Kokkonen

Hi,

On 01.04.2015 10:03, Omar Sandoval wrote:

On Tue, Mar 31, 2015 at 10:54:55PM -0500, Eric W. Biederman wrote:

Omar Sandoval  writes:


On Mon, Mar 30, 2015 at 02:30:34PM +0200, David Sterba wrote:

On Mon, Mar 30, 2015 at 02:02:17AM -0700, Omar Sandoval wrote:

Before commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
d_invalidate() could return -EBUSY when a dentry for a directory had
more than one reference to it. This is what prevented a mounted
subvolume from being deleted, as struct vfsmount holds a reference to
the subvolume dentry. However, that commit removed that case, and later
commits in that patch series removed the return code from d_invalidate()
completely, so we don't get that check for free anymore. So, reintroduce
it in btrfs_ioctl_snap_destroy().



This applies to 4.0-rc6. To be honest, I'm not sure that this is the most
correct fix for this bug, but it's equivalent to the pre-3.18 behavior and it's
the best that I could come up with. Thoughts?



+   spin_lock(&dentry->d_lock);
+   err = dentry->d_lockref.count > 1 ? -EBUSY : 0;
+   spin_unlock(&dentry->d_lock);


The fix restores the original behaviour, but I don't think opencoding and
using internals is fine. Either there should be a vfs api for that or
there's an existing one that can be used instead.


I have a problem with restoring the original behavior as is.

In some sense it re-introduces the security issue that the d_invalidate
changes were built to fix.

Any user in the system can create a user namespace, create a mount
namespace and keep any subvolume pinned forever.  Which at the very
least could make a very nice DOS attack.  I am not familiar enough with
how people use subvolumes and

So let me ask.  How can userspace not know that a subvolume that they
want to delete is already mounted?



Currently, the entry in /proc/mounts doesn't tell you which subvolume is
mounted. The fix for that could be as simple as:


diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 05fef19..9492d83 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1024,6 +1024,10 @@ static int btrfs_show_options(struct seq_file *seq, 
struct dentry *dentry)
struct btrfs_root *root = info->tree_root;
char *compress_type;

+   if (dentry != dentry->d_sb->s_root) {
+   seq_puts(seq, ",subvol=");
+   seq_dentry(seq, dentry, " \t\n\\");
+   }
if (btrfs_test_opt(root, DEGRADED))
seq_puts(seq, ",degraded");
if (btrfs_test_opt(root, NODATASUM))


Then, maybe this policy could be pushed up to userspace. It feels
awkward to do it in the kernel, but users are apparently depending on
this behavior. Timo, do you mind sharing some more details about how
your scripts ran into the bug?



We are choosing the active subvolume via kernel command line parameter, eg:

root=/dev/mmcblk0p3 rw rootwait rootflags=subvol=/foobar

In the user space we run a script that does some upgrades on the root 
file system and in the process creates new subvolumes and deletes old 
ones. For this to work we mount the root subvolume "eg. /" onto 
somewhere so we can take a new snapshot from the currently active 
subvolume, eg.


mount /dev/mmcblk0p3 /mnt/root
btrfs subvolume snapshot /mnt/root/foobar /mnt/root/new_foobar
btrfs subvolume delete /mnt/root/old_foobar

and such. But if it happens that user gives faulty names for the 
subvolumes to operate with, the script might delete the subvolume that 
is the same where the device root file system is mounted from. That is, 
there is not anything anymore that prevents user from running this command:


btrfs subvolume delete /mnt/root/foobar

Once you delete the this subvolume, things obviously collapse into halt 
pretty soon as the userspace expects "/" to be present on the system.


That is something that is obviously wrong thing for the user to do to 
begin with, easily avoidable with more careful scripting. But I can 
think this is very bad if the user is doing root file system snapshot 
management by hand and might easily delete his mounted root file system 
by accident. And I can't think of any reason why kernel should allow 
user to do this.


I hope this clears up things a bit.

-Timo


I can see having something like is_local_mount_root and denying the
subvolume destruction if the mount that is pinning it is in your local
mount namespace.



The bug here seems defined up to the point that we're trying to delete a
subvolume that's a mountpoint. My next guess is that a check

if (d_mountpoint(&dentry)) { ... }

could work.


That was my first instinct as well, but d_mountpoint() is true for
dentries that have a filesystem mounted on them (e.g., after mount
/dev/sda1 /mnt, the dentry for "/mnt"), not the dentry that is mounted.

I poked around the mount code for awhile and couldn't come up with
anything using the existing interface. Mounting subvolumes bubbles down
to mount_subtree(), which doesn't really le

Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes

2015-03-30 Thread Timo Kokkonen

On 30.03.2015 12:02, Omar Sandoval wrote:

Before commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
d_invalidate() could return -EBUSY when a dentry for a directory had
more than one reference to it. This is what prevented a mounted
subvolume from being deleted, as struct vfsmount holds a reference to
the subvolume dentry. However, that commit removed that case, and later
commits in that patch series removed the return code from d_invalidate()
completely, so we don't get that check for free anymore. So, reintroduce
it in btrfs_ioctl_snap_destroy().

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93021
Reported-by: Timo Kokkonen 
Fixes: bafc9b754f75 ("vfs: More precise tests in d_invalidate")
Signed-off-by: Omar Sandoval 


Tested-by: Timo Kokkonen 

Thank you

-Timo


---
This applies to 4.0-rc6. To be honest, I'm not sure that this is the most
correct fix for this bug, but it's equivalent to the pre-3.18 behavior and it's
the best that I could come up with. Thoughts?

  fs/btrfs/ioctl.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 74609b9..39b0538 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2384,6 +2384,12 @@ static noinline int btrfs_ioctl_snap_destroy(struct file 
*file,
goto out_dput;
}

+   spin_lock(&dentry->d_lock);
+   err = dentry->d_lockref.count > 1 ? -EBUSY : 0;
+   spin_unlock(&dentry->d_lock);
+   if (err)
+   goto out_dput;
+
mutex_lock(&inode->i_mutex);

/*



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html