[PATCH 0/5] btrfs: Readonly snapshots
(Cc: Sage Weil s...@newdream.net for changes in async snapshots) This patchset adds readonly-snapshots support. You can create a readonly snapshot, and you can also set a snapshot readonly/writable on the fly. A few readonly checks are added in setattr, permission, remove_xattr and set_xattr callbacks, as well as in some ioctls. You can also try it out by pulling (based on the master branch of Chris' tree): git://repo.or.cz/linux-btrfs-devel.git readonly-snapshots Note: I changed the async snapshot ABI. So if the patchset is acceptable, the first patch has to be merged into .37 to avoid ABI breakage. --- fs/btrfs/ctree.h |3 + fs/btrfs/disk-io.c | 36 +++- fs/btrfs/inode.c |8 +++ fs/btrfs/ioctl.c | 147 +--- fs/btrfs/ioctl.h | 16 - fs/btrfs/transaction.c |8 +++ fs/btrfs/transaction.h |1 + fs/btrfs/xattr.c | 18 ++ 8 files changed, 196 insertions(+), 41 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] btrfs: Make async snapshot ioctl more generic
So we don't have to add new structures as we create more ioctls for snapshots. Now to create async snapshot, set BTRFS_SNAPSHOT_CREATE_ASYNC bit of vol_arg_v2-flags, and then call ioctl(BTRFS_IOCT_SNAP_CREATE_V2). Note: this changes the async snapshot ioctl ABI, which was merged in .37 merge window, so we have to make this change into .37. Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/ioctl.c | 34 +- fs/btrfs/ioctl.h | 12 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 463d91b..d3f1a60 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -935,23 +935,31 @@ out: static noinline int btrfs_ioctl_snap_create(struct file *file, void __user *arg, int subvol, - int async) + bool v2) { struct btrfs_ioctl_vol_args *vol_args = NULL; - struct btrfs_ioctl_async_vol_args *async_vol_args = NULL; + struct btrfs_ioctl_vol_args_v2 *vol_args_v2 = NULL; char *name; u64 fd; u64 transid = 0; + bool async = false; int ret; - if (async) { - async_vol_args = memdup_user(arg, sizeof(*async_vol_args)); - if (IS_ERR(async_vol_args)) - return PTR_ERR(async_vol_args); + if (v2) { + vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2)); + if (IS_ERR(vol_args_v2)) + return PTR_ERR(vol_args_v2); - name = async_vol_args-name; - fd = async_vol_args-fd; - async_vol_args-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; + if (vol_args_v2-flags ~BTRFS_SNAPSHOT_CREATE_ASYNC) { + ret = -EINVAL; + goto out; + } + + name = vol_args_v2-name; + fd = vol_args_v2-fd; + vol_args_v2-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; + if (vol_args_v2-flags BTRFS_SNAPSHOT_CREATE_ASYNC) + async = true; } else { vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) @@ -966,13 +974,13 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, if (!ret async) { if (copy_to_user(arg + - offsetof(struct btrfs_ioctl_async_vol_args, + offsetof(struct btrfs_ioctl_vol_args_v2, transid), transid, sizeof(transid))) return -EFAULT; } - +out: kfree(vol_args); - kfree(async_vol_args); + kfree(vol_args_v2); return ret; } @@ -2235,7 +2243,7 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_getversion(file, argp); case BTRFS_IOC_SNAP_CREATE: return btrfs_ioctl_snap_create(file, argp, 0, 0); - case BTRFS_IOC_SNAP_CREATE_ASYNC: + case BTRFS_IOC_SNAP_CREATE_V2: return btrfs_ioctl_snap_create(file, argp, 0, 1); case BTRFS_IOC_SUBVOL_CREATE: return btrfs_ioctl_snap_create(file, argp, 1, 0); diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 17c99eb..bc70584 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -30,10 +30,14 @@ struct btrfs_ioctl_vol_args { char name[BTRFS_PATH_NAME_MAX + 1]; }; -#define BTRFS_SNAPSHOT_NAME_MAX 4079 -struct btrfs_ioctl_async_vol_args { +#define BTRFS_SNAPSHOT_CREATE_ASYNC(1ULL 0) + +#define BTRFS_SNAPSHOT_NAME_MAX 4039 +struct btrfs_ioctl_vol_args_v2 { __s64 fd; __u64 transid; + __u64 flags; + __u64 unused[4]; char name[BTRFS_SNAPSHOT_NAME_MAX + 1]; }; @@ -187,6 +191,6 @@ struct btrfs_ioctl_space_args { struct btrfs_ioctl_space_args) #define BTRFS_IOC_START_SYNC _IOR(BTRFS_IOCTL_MAGIC, 24, __u64) #define BTRFS_IOC_WAIT_SYNC _IOW(BTRFS_IOCTL_MAGIC, 22, __u64) -#define BTRFS_IOC_SNAP_CREATE_ASYNC _IOW(BTRFS_IOCTL_MAGIC, 23, \ - struct btrfs_ioctl_async_vol_args) +#define BTRFS_IOC_SNAP_CREATE_V2 _IOW(BTRFS_IOCTL_MAGIC, 23, \ + struct btrfs_ioctl_vol_args_v2) #endif -- 1.6.3 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] btrfs: Fix memory leak in a failure path
In btrfs_ioctl_snap_create(), vol_args_v2 is not freed if copy_to_user() returns failure. Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/ioctl.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index d3f1a60..ba437ad 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -976,7 +976,7 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, if (copy_to_user(arg + offsetof(struct btrfs_ioctl_vol_args_v2, transid), transid, sizeof(transid))) - return -EFAULT; + ret = -EFAULT; } out: kfree(vol_args); -- 1.6.3 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] btrfs: Add helper __setup_root_post()
This eliminates duplicate code in find_and_setup_root() and btrfs_read_fs_root_no_radix(). (Prepare for the next patch) Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/disk-io.c | 31 +++ 1 files changed, 15 insertions(+), 16 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index b40dfe4..fb650e0 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -959,14 +959,25 @@ static int __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize, return 0; } +static void __setup_root_post(struct btrfs_root *root) +{ + u32 blocksize; + u64 generation; + + generation = btrfs_root_generation(root-root_item); + blocksize = btrfs_level_size(root, btrfs_root_level(root-root_item)); + root-node = read_tree_block(root, btrfs_root_bytenr(root-root_item), +blocksize, generation); + BUG_ON(!root-node); + root-commit_root = btrfs_root_node(root); +} + static int find_and_setup_root(struct btrfs_root *tree_root, struct btrfs_fs_info *fs_info, u64 objectid, struct btrfs_root *root) { int ret; - u32 blocksize; - u64 generation; __setup_root(tree_root-nodesize, tree_root-leafsize, tree_root-sectorsize, tree_root-stripesize, @@ -977,12 +988,7 @@ static int find_and_setup_root(struct btrfs_root *tree_root, return -ENOENT; BUG_ON(ret); - generation = btrfs_root_generation(root-root_item); - blocksize = btrfs_level_size(root, btrfs_root_level(root-root_item)); - root-node = read_tree_block(root, btrfs_root_bytenr(root-root_item), -blocksize, generation); - BUG_ON(!root-node); - root-commit_root = btrfs_root_node(root); + __setup_root_post(root); return 0; } @@ -1083,8 +1089,6 @@ struct btrfs_root *btrfs_read_fs_root_no_radix(struct btrfs_root *tree_root, struct btrfs_fs_info *fs_info = tree_root-fs_info; struct btrfs_path *path; struct extent_buffer *l; - u64 generation; - u32 blocksize; int ret = 0; root = kzalloc(sizeof(*root), GFP_NOFS); @@ -1121,12 +1125,7 @@ struct btrfs_root *btrfs_read_fs_root_no_radix(struct btrfs_root *tree_root, return ERR_PTR(ret); } - generation = btrfs_root_generation(root-root_item); - blocksize = btrfs_level_size(root, btrfs_root_level(root-root_item)); - root-node = read_tree_block(root, btrfs_root_bytenr(root-root_item), -blocksize, generation); - root-commit_root = btrfs_root_node(root); - BUG_ON(!root-node); + __setup_root_post(root); out: if (location-objectid != BTRFS_TREE_LOG_OBJECTID) root-ref_cows = 1; -- 1.6.3 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] btrfs: Add readonly snapshots support
Usage: Set BTRFS_SNAPSHOT_CREATE_RDONLY of btrfs_ioctl_vol_arg_v2-flags, and call ioctl(BTRFS_I0CTL_SNAP_CREATE_V2). Implementation: - In disk set readonly bit of btrfs_root_item-flags, and in memory set btrfs_root-readonly. - Add readonly checks in btrfs_permission (inode_permission), btrfs_setattr, btrfs_set/remove_xattr and some ioctls. Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/ctree.h |3 +++ fs/btrfs/disk-io.c |5 + fs/btrfs/inode.c |8 fs/btrfs/ioctl.c | 33 + fs/btrfs/ioctl.h |1 + fs/btrfs/transaction.c |8 fs/btrfs/transaction.h |1 + fs/btrfs/xattr.c | 18 ++ 8 files changed, 69 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 8db9234..4b263ed 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -597,6 +597,8 @@ struct btrfs_dir_item { u8 type; } __attribute__ ((__packed__)); +#define BTRFS_ROOT_SNAP_RDONLY (1ULL 0) + struct btrfs_root_item { struct btrfs_inode_item inode; __le64 generation; @@ -1116,6 +1118,7 @@ struct btrfs_root { int defrag_running; char *name; int in_sysfs; + bool readonly; /* the dirty list is only used by non-reference counted roots */ struct list_head dirty_list; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index fb650e0..5b88712 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -963,6 +963,7 @@ static void __setup_root_post(struct btrfs_root *root) { u32 blocksize; u64 generation; + u64 flags; generation = btrfs_root_generation(root-root_item); blocksize = btrfs_level_size(root, btrfs_root_level(root-root_item)); @@ -970,6 +971,10 @@ static void __setup_root_post(struct btrfs_root *root) blocksize, generation); BUG_ON(!root-node); root-commit_root = btrfs_root_node(root); + + flags = btrfs_root_flags(root-root_item); + if (flags BTRFS_ROOT_SNAP_RDONLY) + root-readonly = true; } static int find_and_setup_root(struct btrfs_root *tree_root, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5132c9a..08c3075 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3671,8 +3671,12 @@ static int btrfs_setattr_size(struct inode *inode, struct iattr *attr) static int btrfs_setattr(struct dentry *dentry, struct iattr *attr) { struct inode *inode = dentry-d_inode; + struct btrfs_root *root = BTRFS_I(inode)-root; int err; + if (root-readonly) + return -EROFS; + err = inode_change_ok(inode, attr); if (err) return err; @@ -7028,6 +7032,10 @@ static int btrfs_set_page_dirty(struct page *page) static int btrfs_permission(struct inode *inode, int mask) { + struct btrfs_root *root = BTRFS_I(inode)-root; + + if (root-readonly (mask MAY_WRITE)) + return -EROFS; if ((BTRFS_I(inode)-flags BTRFS_INODE_READONLY) (mask MAY_WRITE)) return -EACCES; return generic_permission(inode, mask, btrfs_check_acl); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index ba437ad..7f9c571 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -147,6 +147,9 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg) unsigned int flags, oldflags; int ret; + if (root-readonly) + return -EROFS; + if (copy_from_user(flags, arg, sizeof(flags))) return -EFAULT; @@ -351,7 +354,8 @@ fail: } static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, - char *name, int namelen, u64 *async_transid) + char *name, int namelen, u64 *async_transid, + bool readonly) { struct inode *inode; struct btrfs_pending_snapshot *pending_snapshot; @@ -368,6 +372,7 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, btrfs_init_block_rsv(pending_snapshot-block_rsv); pending_snapshot-dentry = dentry; pending_snapshot-root = root; + pending_snapshot-readonly = readonly; trans = btrfs_start_transaction(root-fs_info-extent_root, 5); if (IS_ERR(trans)) { @@ -497,7 +502,7 @@ static inline int btrfs_may_create(struct inode *dir, struct dentry *child) static noinline int btrfs_mksubvol(struct path *parent, char *name, int namelen, struct btrfs_root *snap_src, - u64 *async_transid) + u64 *async_transid, bool readonly) { struct inode *dir = parent-dentry-d_inode; struct dentry *dentry; @@ -529,7 +534,7 @@ static noinline int btrfs_mksubvol(struct path *parent, if (snap_src)
[PATCH 5/5] btrfs: Add ioctl to set snapshot readonly/writable
This allows us to set a snapshot readonly or writable on the fly. Usage: Set BTRFS_SNAPSHOT_RDONLY/WRITABLE of btrfs_ioctl_vol_arg_v2-flags, and then call ioctl(BTRFS_IOCTL_SNAP_SETFLAGS); Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/ioctl.c | 88 +++-- fs/btrfs/ioctl.h |3 ++ 2 files changed, 87 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 7f9c571..34d8683 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -939,6 +939,24 @@ out: return ret; } +static int snap_check_flags(u64 flags, bool create) +{ + u64 valid_flags = BTRFS_SNAPSHOT_RDONLY | BTRFS_SNAPSHOT_WRITABLE; + + if (create) + valid_flags |= BTRFS_SNAPSHOT_CREATE_ASYNC; + + if (flags valid_flags) + return -EINVAL; + + /* readonly and writable are mutually exclusive */ + if ((flags BTRFS_SNAPSHOT_RDONLY) + (flags BTRFS_SNAPSHOT_WRITABLE)) + return -EINVAL; + + return 0; +} + static noinline int btrfs_ioctl_snap_create(struct file *file, void __user *arg, int subvol, bool v2) @@ -957,11 +975,9 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, if (IS_ERR(vol_args_v2)) return PTR_ERR(vol_args_v2); - if (vol_args_v2-flags - ~(BTRFS_SNAPSHOT_CREATE_ASYNC | BTRFS_SNAPSHOT_RDONLY)) { - ret = -EINVAL; + ret = snap_check_flags(vol_args_v2-flags, true); + if (ret) goto out; - } name = vol_args_v2-name; fd = vol_args_v2-fd; @@ -995,6 +1011,65 @@ out: return ret; } +static noinline int btrfs_ioctl_snap_setflags(struct file *file, + void __user *arg) +{ + struct inode *inode = fdentry(file)-d_inode; + struct btrfs_root *root = BTRFS_I(inode)-root; + struct btrfs_trans_handle *trans; + struct btrfs_ioctl_vol_args_v2 *vol_args_v2; + u64 root_flags; + u64 flags; + int err; + + if (root-fs_info-sb-s_flags MS_RDONLY) + return -EROFS; + + vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2)); + if (IS_ERR(vol_args_v2)) + return PTR_ERR(vol_args_v2); + flags = vol_args_v2-flags; + + err = snap_check_flags(flags, false); + if (err) + goto out; + + down_write(root-fs_info-subvol_sem); + + /* nothing to do */ + if ((BTRFS_SNAPSHOT_RDONLY root-readonly) || + (BTRFS_SNAPSHOT_WRITABLE !root-readonly)) + goto out_unlock; + + root_flags = btrfs_root_flags(root-root_item); + if (flags BTRFS_SNAPSHOT_RDONLY) { + btrfs_set_root_flags(root-root_item, +root_flags | BTRFS_ROOT_SNAP_RDONLY); + root-readonly = true; + } + if (flags BTRFS_SNAPSHOT_WRITABLE) { + btrfs_set_root_flags(root-root_item, +root_flags ~BTRFS_ROOT_SNAP_RDONLY); + root-readonly = false; + } + + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + err = PTR_ERR(trans); + goto out_unlock; + } + + err = btrfs_update_root(trans, root, + root-root_key, root-root_item); + + btrfs_commit_transaction(trans, root); +out_unlock: + up_write(root-fs_info-subvol_sem); +out: + kfree(vol_args_v2); + return err; +} + /* * helper to check if the subvolume references other subvolumes */ @@ -1503,6 +1578,9 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp) struct btrfs_ioctl_defrag_range_args *range; int ret; + if (root-readonly) + return -EROFS; + ret = mnt_want_write(file-f_path.mnt); if (ret) return ret; @@ -2266,6 +2344,8 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_snap_create(file, argp, 1, 0); case BTRFS_IOC_SNAP_DESTROY: return btrfs_ioctl_snap_destroy(file, argp); + case BTRFS_IOC_SNAP_SETFLAGS: + return btrfs_ioctl_snap_setflags(file, argp); case BTRFS_IOC_DEFAULT_SUBVOL: return btrfs_ioctl_default_subvol(file, argp); case BTRFS_IOC_DEFRAG: diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index ff15fb2..559fd27 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -32,6 +32,7 @@ struct btrfs_ioctl_vol_args { #define BTRFS_SNAPSHOT_CREATE_ASYNC(1ULL 0) #define BTRFS_SNAPSHOT_RDONLY (1ULL 1) +#define BTRFS_SNAPSHOT_WRITABLE(1ULL 2) #define BTRFS_SNAPSHOT_NAME_MAX 4039
Re: [next-rc] Compile Error fs/btrfs/diskio.c
Excerpts from Mitch Harder's message of 2010-11-27 17:53:23 -0500: I've been getting a compile error when building the 'next-rc' branch of btrfs-unstable. CC fs/btrfs/disk-io.o fs/btrfs/disk-io.c: In function ‘btree_migratepage’: fs/btrfs/disk-io.c:716: error: called object ‘0u’ is not a function make[2]: *** [fs/btrfs/disk-io.o] Error 1 make[1]: *** [fs/btrfs] Error 2 make: *** [fs] Error 2 Line 716 of fs/btrfs/disk-io.c is: return migrate_page(mapping, newpage, page); This is related to the Btrfs: add migrate page for metadata inode patch (the first patch in the set of patches added to the next-rc branch). I've pushed out a similar fix in the next-rc branch and the master branch. Waiting on korg mirrors to update before I send the pull request. Thanks! -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] Btrfs updates for 2.6.37-rc
Hi everyone, The master branch of the btrfs unstable tree: git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git master Has a collection of btrfs bug fixes. The three most important fixes here address crashes in the btrfs O_DIRECT code, add a migrate_page operation to avoid metadata corruption as btree pages go through migration, and fix up our NFS support. Otherwise we have assorted correctness fixes, many of which were kicked out by xfsqa. -chris Josef Bacik (11) commits (+241/-54): Btrfs: make btrfs_add_nondir take parent inode as an argument (+16/-22) Btrfs: fix typo in fallocate to make it honor actual size (+5/-4) Btrfs: hold i_mutex when calling btrfs_log_dentry_safe (+7/-0) Btrfs: setup blank root and fs_info for mount time (+33/-7) Btrfs: make sure new inode size is ok in fallocate (+4/-0) Btrfs: use dget_parent where we can UPDATED (+43/-12) Btrfs: handle the space_cache option properly (+1/-0) Btrfs: update inode ctime when using links (+1/-0) Btrfs: fix more ESTALE problems with NFS (+1/-0) Btrfs: handle NFS lookups properly (+76/-0) Btrfs: fix fiemap (+54/-9) Chris Mason (4) commits (+124/-9): Btrfs: deal with DIO bios that span more than one ordered extent (+89/-4) Btrfs: avoid NULL pointer deref in try_release_extent_buffer (+4/-2) Btrfs: don't use migrate page without CONFIG_MIGRATION (+6/-1) Btrfs: add migrate page for metadata inode (+25/-2) Li Zefan (3) commits (+6/-6): btrfs: Check if dest_offset is block-size aligned before cloning file (+3/-4) btrfs: Show device attr correctly for symlinks (+1/-0) btrfs: Set file size correctly in file clone (+2/-2) Miao Xie (3) commits (+195/-45): btrfs: cleanup duplicate bio allocating functions (+8/-18) btrfs: fix free dip and dip-csums twice (+3/-6) btrfs: fix panic caused by direct IO (+184/-21) Mariusz Kozlowski (1) commits (+3/-3): btrfs: make 1-bit signed fileds unsigned Arne Jansen (1) commits (+1/-1): btrfs: Fix early enospc because 'unused' calculated with wrong sign. Ian Kent (1) commits (+6/-0): Btrfs - fix race between btrfs_get_sb() and umount Total: (24) commits (+576/-118) fs/btrfs/compression.c | 15 +--- fs/btrfs/ctree.h|6 +- fs/btrfs/disk-io.c | 38 +- fs/btrfs/export.c | 76 fs/btrfs/extent-tree.c |2 +- fs/btrfs/extent_io.c| 77 ++--- fs/btrfs/extent_io.h|3 + fs/btrfs/file.c |7 + fs/btrfs/inode.c| 294 ++- fs/btrfs/ioctl.c| 31 -- fs/btrfs/ordered-data.c | 67 +++ fs/btrfs/ordered-data.h |3 + fs/btrfs/super.c| 41 ++- fs/btrfs/transaction.c |5 +- fs/btrfs/tree-log.c | 21 +++- 15 files changed, 572 insertions(+), 114 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC-PATCH] Re: mounting arbitrary directories
On Sun, Nov 28, 2010 at 4:07 AM, C Anthony Risinger anth...@extof.me wrote: On Nov 27, 2010, at 10:22 PM, Calvin Walton calvin.wal...@gmail.com wrote: On Sat, 2010-11-27 at 21:19 -0600, C Anthony Risinger wrote: eg. if i have a regular directory (not a subvolume) in the top- level: /__boot i can mount it with: mount -o subvol=__boot /dev/sda /mnt The 'subvol' option actually works using the same mechanism as a bind mount. The fact that it works using a directory is purely a coincidence - I would not expect it to be officially supported, and you shouldn't rely on it. Hrrrm... Well I thought I'd be clever and use this little trick one time to update my kernel... Anyways I oops out like 3 times in a row (last func was something.clone in each IIRC) and now I'm stuck with only my mobile since my server isn't set up yet and my SSD just failed on my netbook yesterday :-( So, if anyone can help me recover this partition long enough to grab a few things... I would be most grateful :-) I think I have everything critical, but I'd rather take another look :-) Right now I can't mount; mount is failing w/bad superblock. /me promises to never recklessly sabotage himself after being warned 6 hrs earlier any suggestions for me? i dd'ed the corrupt partition to an external disk because i need the machine back up and running, but if possible i'd like to get the image running again. i mounted it via loopback and as expected got the same errors: (will post the dmesg output next message -- left at home) open_ctree_fd failed wanted found instead the mount command itself fails with bad superblock or ... i have seen others withe similar crash reports and IIRC correctly were able to recover it (seems like something just didn't get updated right before it locked...) any ideas? C Anthony -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL][PATCH v2 0/6] btrfs: Add lzo compression support
On Wed, Nov 17, 2010 at 8:08 PM, Li Zefan l...@cn.fujitsu.com wrote: Hi Chris, Here's the updated patchset. As I still haven't got a kernel.org account, I have set up a git tree in another public git repository, and I'll use it for now. You can pull from: git://repo.or.cz/linux-btrfs-devel.git lzo-support Lzo is a much faster compression algorithm than gzib, so would allow more users to enable transparent compression, and some users can choose from compression ratio and compression speed. Usage: # mount -t btrfs -o compress[=zlib,lzo] dev /mnt or # mount -t btrfs -o compress-force[=zlib,lzo] dev /mnt -o compress without argument is still allowed for compatability. Compatibility: If we mount a filesystem with lzo compression, it will not be able be mounted in old kernels. One reason is, otherwise btrfs will directly dump compressed data, which sits in inline extent, to user. Performance: The test copied a linux source tarball (~400M) from an ext4 partition to the btrfs partition, and then extracted the tarball. (time in second) lzo zlib nocompress copy: 10.6 21.7 14.9 extract: 70.1 94.4 66.6 (data size in MB) lzo zlib nocompress copy: 185.87 108.69 394.49 extract: 193.80 132.36 381.21 Test: Mitch has tested the patchset, and provided some positive feedback. According to him, the patchset works as expected and nothing bad has he experienced. Other: The defrag ioctl is also updated, so one can choose lzo or zlib when turning on compression in defrag operation. Main change from v1: - Add incompat flag. - Fix build issue by selecting kernel lzo module. - Check compression type in defrag ioctl. Li Zefan (6): btrfs: Fix bugs in zlib workspace btrfs: Fix error handling in zlib btrfs: Allow to add new compression algorithm btrfs: Add lzo compression support btrfs: Allow to specify compress method when defrag btrfs: Extract duplicate decompress code fs/btrfs/Makefile | 2 +- fs/btrfs/btrfs_inode.h | 2 +- fs/btrfs/compression.c | 329 +- fs/btrfs/compression.h | 72 ++-- fs/btrfs/ctree.h | 11 +- fs/btrfs/extent_io.c | 5 +- fs/btrfs/extent_io.h | 17 ++- fs/btrfs/extent_map.c | 2 + fs/btrfs/extent_map.h | 3 +- fs/btrfs/file.c | 2 + fs/btrfs/inode.c | 82 ++ fs/btrfs/ioctl.c | 10 +- fs/btrfs/ioctl.h | 9 +- fs/btrfs/lzo.c | 409 +++ fs/btrfs/ordered-data.c | 18 ++- fs/btrfs/ordered-data.h | 8 +- fs/btrfs/super.c | 47 +- fs/btrfs/zlib.c | 361 +++-- 18 files changed, 1013 insertions(+), 376 deletions(-) is this in a branch somewhere? or for inclusion in .37/.38? this is a very attractive feature. what's the proper place (repo/branch) to see what is pending? C Anthony -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Root fs on raid1
On Monday, 29 November, 2010, you (Erik Jensen) wrote: I have a similar setup on one of my computers. I set it up a while ago using an initramfs, but I could not get the device scan to work properly, for whatever reason. I ended up having it just trying to mount every device in the array in turn. When it gets to the last device, the array mounts since btrfs now knows about all of the other ones. Really ugly, I know, but it works. I also had a lot of problem to get a btrfs-aware initramfs working properly. One of the biggest proble was to find the right list of modules. Btrfs depends by crc32c, and I need a lot of retries to discover it. The main problem was that if the module is not available (it is the case of a initramfs) btrfs doesn't start... When you say it just trying to mount every device in the array in turn, do you mean in the initarmfs ? -- Erik Jensen On Wed, Nov 17, 2010 at 10:59 PM, Goffredo Baroncelli kreij...@libero.itwrote: On Thursday, 18 November, 2010, ad...@prnet.org wrote: Hi, I have read that for using raid1 btrfs device scan must be run before mounting the fs. How do I proceed if root filesystem is mounted from btrfs ? See a my previous post http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04709.html Thanks in advance Bye, David Arendt -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] btrfs: Make async snapshot ioctl more generic
Hi Li, great work, but I have some suggestions: On Monday, 29 November, 2010, Li Zefan wrote: So we don't have to add new structures as we create more ioctls for snapshots. Now to create async snapshot, set BTRFS_SNAPSHOT_CREATE_ASYNC bit of vol_arg_v2-flags, and then call ioctl(BTRFS_IOCT_SNAP_CREATE_V2). Note: this changes the async snapshot ioctl ABI, which was merged in .37 merge window, so we have to make this change into .37. Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/ioctl.c | 34 +- fs/btrfs/ioctl.h | 12 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 463d91b..d3f1a60 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -935,23 +935,31 @@ out: static noinline int btrfs_ioctl_snap_create(struct file *file, void __user *arg, int subvol, - int async) + bool v2) This is a aesthetic suggestion: instead of passing a flag called v2 I suggest to add two wrapper functions, which extract the parameters and passes all available parameter to the generic function. Something like: static inline btrfs_ioctl_snap_create_v1(struct file *file, void __user *arg, int subvol) { vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) return PTR_ERR(vol_args); name = vol_args-name; fd = vol_args-fd; vol_args-name[BTRFS_PATH_NAME_MAX] = '\0'; btrfs_ioctl_snap_create_generic(file, subvol, name, fd, 0, 0); } static inline btrfs_ioctl_snap_create_v2(struct file *file, void __user *arg, int subvol, ) { vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2)); if (IS_ERR(vol_args_v2)) return PTR_ERR(vol_args_v2); ret = snap_check_flags(vol_args_v2-flags, true); if (ret) goto out; name = vol_args_v2-name; fd = vol_args_v2-fd; vol_args_v2-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; if (vol_args_v2-flags BTRFS_SNAPSHOT_CREATE_ASYNC) async = true; if (vol_args_v2-flags BTRFS_SNAPSHOT_RDONLY) readonly = true; btrfs_ioctl_snap_create_generic(file, subvol, name, fd, async, readonly); } and case BTRFS_IOC_SNAP_CREATE: return btrfs_ioctl_snap_create_v1(file, argp, 0); case BTRFS_IOC_SNAP_CREATE_V2: return btrfs_ioctl_snap_create_v2(file, argp, 0); Frankly speaking, we could get rid of the subvol parameter adding another wrapper function like btrfs_ioctl_subvol_create( ), but this would be another story :) { struct btrfs_ioctl_vol_args *vol_args = NULL; - struct btrfs_ioctl_async_vol_args *async_vol_args = NULL; + struct btrfs_ioctl_vol_args_v2 *vol_args_v2 = NULL; char *name; u64 fd; u64 transid = 0; + bool async = false; int ret; - if (async) { - async_vol_args = memdup_user(arg, sizeof(*async_vol_args)); - if (IS_ERR(async_vol_args)) - return PTR_ERR(async_vol_args); + if (v2) { + vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2)); + if (IS_ERR(vol_args_v2)) + return PTR_ERR(vol_args_v2); - name = async_vol_args-name; - fd = async_vol_args-fd; - async_vol_args-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; + if (vol_args_v2-flags ~BTRFS_SNAPSHOT_CREATE_ASYNC) { + ret = -EINVAL; + goto out; + } + + name = vol_args_v2-name; + fd = vol_args_v2-fd; + vol_args_v2-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; + if (vol_args_v2-flags BTRFS_SNAPSHOT_CREATE_ASYNC) + async = true; } else { vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) @@ -966,13 +974,13 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, if (!ret async) { if (copy_to_user(arg + - offsetof(struct btrfs_ioctl_async_vol_args, + offsetof(struct btrfs_ioctl_vol_args_v2, transid), transid, sizeof(transid))) return -EFAULT; } - +out: kfree(vol_args); - kfree(async_vol_args); + kfree(vol_args_v2); return ret; } @@ -2235,7
Re: [PATCH 5/5] btrfs: Add ioctl to set snapshot readonly/writable
Hi Li, On Monday, 29 November, 2010, Li Zefan wrote: This allows us to set a snapshot readonly or writable on the fly. Usage: Set BTRFS_SNAPSHOT_RDONLY/WRITABLE of btrfs_ioctl_vol_arg_v2-flags, and then call ioctl(BTRFS_IOCTL_SNAP_SETFLAGS); I really appreciate your work, but I have some doubt about this interface. In particolar: - how get the flags of a subvolume ? I suggest to implement a pair of ioctls: - subvolume_setflags - get the flags - subvolume_getflags - set the flags These ioctls would be more generic (there are a lot of flags which may be interested to put in the root of a subvolume: think about compress/nocompress, (no)datasum...) - For the reason abowe, I suggest to replace SNAPSHOT with SUBVOLUME - Finally, with a pair of get/set_flags functions we can avoid the use of the flags BTRFS_SNAPSHOT_WRITABLE. Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/ioctl.c | 88 +++-- fs/btrfs/ioctl.h |3 ++ 2 files changed, 87 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 7f9c571..34d8683 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -939,6 +939,24 @@ out: return ret; } +static int snap_check_flags(u64 flags, bool create) +{ + u64 valid_flags = BTRFS_SNAPSHOT_RDONLY | BTRFS_SNAPSHOT_WRITABLE; + + if (create) + valid_flags |= BTRFS_SNAPSHOT_CREATE_ASYNC; + + if (flags valid_flags) + return -EINVAL; + + /* readonly and writable are mutually exclusive */ + if ((flags BTRFS_SNAPSHOT_RDONLY) + (flags BTRFS_SNAPSHOT_WRITABLE)) + return -EINVAL; + + return 0; +} + static noinline int btrfs_ioctl_snap_create(struct file *file, void __user *arg, int subvol, bool v2) @@ -957,11 +975,9 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, if (IS_ERR(vol_args_v2)) return PTR_ERR(vol_args_v2); - if (vol_args_v2-flags - ~(BTRFS_SNAPSHOT_CREATE_ASYNC | BTRFS_SNAPSHOT_RDONLY)) { - ret = -EINVAL; + ret = snap_check_flags(vol_args_v2-flags, true); + if (ret) goto out; - } name = vol_args_v2-name; fd = vol_args_v2-fd; @@ -995,6 +1011,65 @@ out: return ret; } +static noinline int btrfs_ioctl_snap_setflags(struct file *file, + void __user *arg) +{ + struct inode *inode = fdentry(file)-d_inode; + struct btrfs_root *root = BTRFS_I(inode)-root; + struct btrfs_trans_handle *trans; + struct btrfs_ioctl_vol_args_v2 *vol_args_v2; + u64 root_flags; + u64 flags; + int err; + + if (root-fs_info-sb-s_flags MS_RDONLY) + return -EROFS; + + vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2)); + if (IS_ERR(vol_args_v2)) + return PTR_ERR(vol_args_v2); + flags = vol_args_v2-flags; + + err = snap_check_flags(flags, false); + if (err) + goto out; + + down_write(root-fs_info-subvol_sem); + + /* nothing to do */ + if ((BTRFS_SNAPSHOT_RDONLY root-readonly) || + (BTRFS_SNAPSHOT_WRITABLE !root-readonly)) + goto out_unlock; + + root_flags = btrfs_root_flags(root-root_item); + if (flags BTRFS_SNAPSHOT_RDONLY) { + btrfs_set_root_flags(root-root_item, + root_flags | BTRFS_ROOT_SNAP_RDONLY); + root-readonly = true; + } + if (flags BTRFS_SNAPSHOT_WRITABLE) { + btrfs_set_root_flags(root-root_item, + root_flags ~BTRFS_ROOT_SNAP_RDONLY); + root-readonly = false; + } + + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + err = PTR_ERR(trans); + goto out_unlock; + } + + err = btrfs_update_root(trans, root, + root-root_key, root-root_item); + + btrfs_commit_transaction(trans, root); +out_unlock: + up_write(root-fs_info-subvol_sem); +out: + kfree(vol_args_v2); + return err; +} + /* * helper to check if the subvolume references other subvolumes */ @@ -1503,6 +1578,9 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp) struct btrfs_ioctl_defrag_range_args *range; int ret; + if (root-readonly) + return -EROFS; + ret = mnt_want_write(file-f_path.mnt); if (ret) return ret; @@ -2266,6 +2344,8 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_snap_create(file, argp, 1, 0); case BTRFS_IOC_SNAP_DESTROY: return
Re: [PATCH 1/5] btrfs: Make async snapshot ioctl more generic
On Monday, 29 November, 2010, Goffredo Baroncelli wrote: Why the unused fields ? What happens if you use a more recent btrfs-tools which take advantage of these fields but the kernel is an old one ? At the minimum please check the flags so (flags^(BTRFS_SNAPSHOT_CREATE_ASYNC|BTRFS_SNAPSHOT_RDONLY)) == 0 and unused[0..3] == 0 Sorry, the check should be (flags (~(BTRFS_SNAPSHOT_CREATE_ASYNC|BTRFS_SNAPSHOT_RDONLY))) == 0 For future expansion I suggest to use a different ioctl. To me it seems a more robust API. -- gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreij...@inwind.it Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] btrfs: Make async snapshot ioctl more generic
Hi Li, On Mon, 29 Nov 2010, Li Zefan wrote: So we don't have to add new structures as we create more ioctls for snapshots. Now to create async snapshot, set BTRFS_SNAPSHOT_CREATE_ASYNC bit of vol_arg_v2-flags, and then call ioctl(BTRFS_IOCT_SNAP_CREATE_V2). Note: this changes the async snapshot ioctl ABI, which was merged in .37 merge window, so we have to make this change into .37. These changes all look good to me. Thanks! sage Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/ioctl.c | 34 +- fs/btrfs/ioctl.h | 12 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 463d91b..d3f1a60 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -935,23 +935,31 @@ out: static noinline int btrfs_ioctl_snap_create(struct file *file, void __user *arg, int subvol, - int async) + bool v2) { struct btrfs_ioctl_vol_args *vol_args = NULL; - struct btrfs_ioctl_async_vol_args *async_vol_args = NULL; + struct btrfs_ioctl_vol_args_v2 *vol_args_v2 = NULL; char *name; u64 fd; u64 transid = 0; + bool async = false; int ret; - if (async) { - async_vol_args = memdup_user(arg, sizeof(*async_vol_args)); - if (IS_ERR(async_vol_args)) - return PTR_ERR(async_vol_args); + if (v2) { + vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2)); + if (IS_ERR(vol_args_v2)) + return PTR_ERR(vol_args_v2); - name = async_vol_args-name; - fd = async_vol_args-fd; - async_vol_args-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; + if (vol_args_v2-flags ~BTRFS_SNAPSHOT_CREATE_ASYNC) { + ret = -EINVAL; + goto out; + } + + name = vol_args_v2-name; + fd = vol_args_v2-fd; + vol_args_v2-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; + if (vol_args_v2-flags BTRFS_SNAPSHOT_CREATE_ASYNC) + async = true; } else { vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) @@ -966,13 +974,13 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, if (!ret async) { if (copy_to_user(arg + - offsetof(struct btrfs_ioctl_async_vol_args, + offsetof(struct btrfs_ioctl_vol_args_v2, transid), transid, sizeof(transid))) return -EFAULT; } - +out: kfree(vol_args); - kfree(async_vol_args); + kfree(vol_args_v2); return ret; } @@ -2235,7 +2243,7 @@ long btrfs_ioctl(struct file *file, unsigned int return btrfs_ioctl_getversion(file, argp); case BTRFS_IOC_SNAP_CREATE: return btrfs_ioctl_snap_create(file, argp, 0, 0); - case BTRFS_IOC_SNAP_CREATE_ASYNC: + case BTRFS_IOC_SNAP_CREATE_V2: return btrfs_ioctl_snap_create(file, argp, 0, 1); case BTRFS_IOC_SUBVOL_CREATE: return btrfs_ioctl_snap_create(file, argp, 1, 0); diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h index 17c99eb..bc70584 100644 --- a/fs/btrfs/ioctl.h +++ b/fs/btrfs/ioctl.h @@ -30,10 +30,14 @@ struct btrfs_ioctl_vol_args { char name[BTRFS_PATH_NAME_MAX + 1]; }; -#define BTRFS_SNAPSHOT_NAME_MAX 4079 -struct btrfs_ioctl_async_vol_args { +#define BTRFS_SNAPSHOT_CREATE_ASYNC (1ULL 0) + +#define BTRFS_SNAPSHOT_NAME_MAX 4039 +struct btrfs_ioctl_vol_args_v2 { __s64 fd; __u64 transid; + __u64 flags; + __u64 unused[4]; char name[BTRFS_SNAPSHOT_NAME_MAX + 1]; }; @@ -187,6 +191,6 @@ struct btrfs_ioctl_space_args { struct btrfs_ioctl_space_args) #define BTRFS_IOC_START_SYNC _IOR(BTRFS_IOCTL_MAGIC, 24, __u64) #define BTRFS_IOC_WAIT_SYNC _IOW(BTRFS_IOCTL_MAGIC, 22, __u64) -#define BTRFS_IOC_SNAP_CREATE_ASYNC _IOW(BTRFS_IOCTL_MAGIC, 23, \ -struct btrfs_ioctl_async_vol_args) +#define BTRFS_IOC_SNAP_CREATE_V2 _IOW(BTRFS_IOCTL_MAGIC, 23, \ +struct btrfs_ioctl_vol_args_v2) #endif -- 1.6.3 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
On Mon, Nov 29, 2010 at 12:02 AM, Li Zefan l...@cn.fujitsu.com wrote: (Cc: Sage Weil s...@newdream.net for changes in async snapshots) This patchset adds readonly-snapshots support. You can create a readonly snapshot, and you can also set a snapshot readonly/writable on the fly. A few readonly checks are added in setattr, permission, remove_xattr and set_xattr callbacks, as well as in some ioctls. Great work! I have a suggestion on defaults when snapshots are created. I think they should default to being read-only and if they are meant to be read-write a flag can be set at creation time (and changable at a later time as well of course). This way user/admin preconceptions of a snapshot being read-only can be enforced by default, and the exception when you want a read-write snapshot can be available with a switch at the cli level (and probably a flag at the ioctl level). It gives one more natural distinction between a snapshot and a subvolume at the user conceptual level. What do you think? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase
On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote: Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic. Meanwhile, they are very ugly and should be handled more propriately. There are mainly two ways to deal with these BUG_ON()s. 1. For those errors which can be handled well by callers, we just return their error number to callers. 2. For others, We can force the filesystem readonly when it hits errors, which is what this patchset has done. Replaced BUG_ON() with the interface provided in this patchset, we will get error infomation via dmesg. Since btrfs is now readonly, we can save our data safely and umount it, then a btrfsck is recommended. By these ways, we can protect our filesystem from panic caused by those BUG_ONs. --- fs/btrfs/ctree.h | 21 ++ fs/btrfs/disk-io.c | 23 +++ fs/btrfs/super.c | 100 ++- fs/btrfs/transaction.c |7 +++ 4 files changed, 148 insertions(+), 3 deletions(-) Overall seems sane, but what about kernels that don't make these checks? I'm ok with well sucks for them as an answer, just want to make sure we've at least though about it. Also I'm not sure marking the fs as broken is the right move here. Ext3/4 don't do this, they just mount read-only, as long as you can still unmount the filesystem everything comes out ok. Think of the case where we just get a spurious EIO, the fs should be fine the next time around, there's reason to force the user to run fsck in this case. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
On 11/29/10 21:02, Mike Fedyk wrote: On Mon, Nov 29, 2010 at 12:02 AM, Li Zefanl...@cn.fujitsu.com wrote: (Cc: Sage Weils...@newdream.net for changes in async snapshots) This patchset adds readonly-snapshots support. You can create a readonly snapshot, and you can also set a snapshot readonly/writable on the fly. A few readonly checks are added in setattr, permission, remove_xattr and set_xattr callbacks, as well as in some ioctls. Great work! I have a suggestion on defaults when snapshots are created. I think they should default to being read-only and if they are meant to be read-write a flag can be set at creation time (and changable at a later time as well of course). This way user/admin preconceptions of a snapshot being read-only can be enforced by default, and the exception when you want a read-write snapshot can be available with a switch at the cli level (and probably a flag at the ioctl level). It gives one more natural distinction between a snapshot and a subvolume at the user conceptual level. What do you think? -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Hi, I completely agree with you. I think lots of people use snapshots for backup purposes and these ones shouldn't be writable. Bye, David Arendt -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Errors during defragmentation
Hello, I decided to test the 'defragment' feature on my system (after a huge number of system updates and prelinking): find /bin /sbin /lib /usr/lib /usr/bin /usr/sbin -type d -exec btrfs filesystem defragment '{}' '+' I have already defragmented a couple of (very large) directories with no errors at all, so this was expected to work somehow. Surprisingly, this time there were thousands of messages like this: ioctl failed on directory name ret -1 errno 28 Most of the reported directories had zero files/subdirectories. However, *most* of them were *not* empty... What does this error message mean? Could someone shed more light on this, please? Should I get ready for a bad crash? ;-) (There seems to be no data loss so far. No new messages in dmesg, no unexpected system behavior.) Andrej smime.p7s Description: S/MIME Cryptographic Signature
Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
On Mon, Nov 29, 2010 at 12:41 PM, David Arendt ad...@prnet.org wrote: On 11/29/10 21:02, Mike Fedyk wrote: On Mon, Nov 29, 2010 at 12:02 AM, Li Zefanl...@cn.fujitsu.com wrote: (Cc: Sage Weils...@newdream.net for changes in async snapshots) This patchset adds readonly-snapshots support. You can create a readonly snapshot, and you can also set a snapshot readonly/writable on the fly. A few readonly checks are added in setattr, permission, remove_xattr and set_xattr callbacks, as well as in some ioctls. Great work! I have a suggestion on defaults when snapshots are created. I think they should default to being read-only and if they are meant to be read-write a flag can be set at creation time (and changable at a later time as well of course). This way user/admin preconceptions of a snapshot being read-only can be enforced by default, and the exception when you want a read-write snapshot can be available with a switch at the cli level (and probably a flag at the ioctl level). It gives one more natural distinction between a snapshot and a subvolume at the user conceptual level. What do you think? I completely agree with you. I think lots of people use snapshots for backup purposes and these ones shouldn't be writable. by default. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase
On Mon, Nov 29, 2010 at 12:10 PM, Josef Bacik jo...@redhat.com wrote: On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote: Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic. Meanwhile, they are very ugly and should be handled more propriately. There are mainly two ways to deal with these BUG_ON()s. 1. For those errors which can be handled well by callers, we just return their error number to callers. 2. For others, We can force the filesystem readonly when it hits errors, which is what this patchset has done. Replaced BUG_ON() with the interface provided in this patchset, we will get error infomation via dmesg. Since btrfs is now readonly, we can save our data safely and umount it, then a btrfsck is recommended. By these ways, we can protect our filesystem from panic caused by those BUG_ONs. --- fs/btrfs/ctree.h | 21 ++ fs/btrfs/disk-io.c | 23 +++ fs/btrfs/super.c | 100 ++- fs/btrfs/transaction.c | 7 +++ 4 files changed, 148 insertions(+), 3 deletions(-) Overall seems sane, but what about kernels that don't make these checks? I'm ok with well sucks for them as an answer, just want to make sure we've at least though about it. Also I'm not sure marking the fs as broken is the right move here. Ext3/4 don't do this, they just mount read-only, as long as you can still unmount the filesystem everything comes out ok. Think of the case where we just get a spurious EIO, the fs should be fine the next time around, there's reason to force the user to run fsck in this case. Did you mean there's no reason to? Also I guess you mean this in the case when there is no redundancy (single and raid0) as the other cases should recover from spurious EIO at run time. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Errors during defragmentation
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Mon, Nov 29, 2010 at 10:02:56PM +0100, Andrej Podzimek wrote: Hello, I decided to test the 'defragment' feature on my system (after a huge number of system updates and prelinking): find /bin /sbin /lib /usr/lib /usr/bin /usr/sbin -type d -exec btrfs filesystem defragment '{}' '+' I have already defragmented a couple of (very large) directories with no errors at all, so this was expected to work somehow. Surprisingly, this time there were thousands of messages like this: ioctl failed on directory name ret -1 errno 28 errno 28 is ENOSPC You've run out of disk space. (Or at least, btrfs thinks so). Most of the reported directories had zero files/subdirectories. However, *most* of them were *not* empty... What does this error message mean? Could someone shed more light on this, please? Should I get ready for a bad crash? ;-) (There seems to be no data loss so far. No new messages in dmesg, no unexpected system behavior.) Hugo. - -- === Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I believe that it's closely correlated with --- the aeroswine coefficient. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.10 (GNU/Linux) iD8DBQFM9BoxIKyzvlFcI40RAv5SAJkB13ClPuTeRElrN1ARFhvDJ2C76gCghC/d zFczJesxQGbd2jC2ildNNI0= =i+tI -END PGP SIGNATURE- signature.asc Description: Digital signature
Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
This may sound excessive as any new concept introduction that late in development, but readonly/writable snapshots could be further differentiated by naming the latter clones. This way end-user would naturally perceive snapsot as read-only PIT fs image, while clone would naturally refer to (writable) head fork. Regards, Andrey On Tue, Nov 30, 2010 at 12:08 AM, Mike Fedyk mfe...@mikefedyk.com wrote: On Mon, Nov 29, 2010 at 12:41 PM, David Arendt ad...@prnet.org wrote: On 11/29/10 21:02, Mike Fedyk wrote: On Mon, Nov 29, 2010 at 12:02 AM, Li Zefanl...@cn.fujitsu.com wrote: (Cc: Sage Weils...@newdream.net for changes in async snapshots) This patchset adds readonly-snapshots support. You can create a readonly snapshot, and you can also set a snapshot readonly/writable on the fly. A few readonly checks are added in setattr, permission, remove_xattr and set_xattr callbacks, as well as in some ioctls. Great work! I have a suggestion on defaults when snapshots are created. I think they should default to being read-only and if they are meant to be read-write a flag can be set at creation time (and changable at a later time as well of course). This way user/admin preconceptions of a snapshot being read-only can be enforced by default, and the exception when you want a read-write snapshot can be available with a switch at the cli level (and probably a flag at the ioctl level). It gives one more natural distinction between a snapshot and a subvolume at the user conceptual level. What do you think? I completely agree with you. I think lots of people use snapshots for backup purposes and these ones shouldn't be writable. by default. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
On Mon, Nov 29, 2010 at 1:31 PM, Andrey Kuzmin andrey.v.kuz...@gmail.com wrote: This may sound excessive as any new concept introduction that late in development, but readonly/writable snapshots could be further differentiated by naming the latter clones. This way end-user would naturally perceive snapsot as read-only PIT fs image, while clone would naturally refer to (writable) head fork. I'm not sure we want to take all of the terminology that zfs uses as it may also bring the percieved drawbacks as well. Isn't there some additional overhead for a zfs clone compared to a snapshot? I'm not very familiar with zfs so that's why I ask. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Errors during defragmentation
Hello, I decided to test the 'defragment' feature on my system (after a huge number of system updates and prelinking): find /bin /sbin /lib /usr/lib /usr/bin /usr/sbin -type d -exec btrfs filesystem defragment '{}' '+' I have already defragmented a couple of (very large) directories with no errors at all, so this was expected to work somehow. Surprisingly, this time there were thousands of messages like this: ioctl failed ondirectory name ret -1 errno 28 errno 28 is ENOSPC You've run out of disk space. (Or at least, btrfs thinks so). Pleased to hear that this is not a fatal error. :-) The filesystem still has quite a lot of free space. New files can be created. I have just tried to add about 10 GB of data, which worked fine. The output from 'df' indicates that only 71% of the partition (177 GB out of 250 GB) is used. The built-in df shows similar numbers -- if I understand it well, there is plenty of free space left. # btrfs filesystem df / Data: total=175.01GB, used=169.57GB Metadata: total=6.51GB, used=3.64GB System: total=12.00MB, used=32.00KB # btrfs filesystem show octopus failed to read /dev/sdb failed to read /dev/sr0 Label: 'octopus' uuid: 8576b57b-b934-424e-9a8a-04abc780c963 Total devices 1 FS bytes used 173.21GB devid1 size 249.50GB used 188.04GB path /dev/dm-2 Btrfs Btrfs v0.19 Does defragmentation have any unexpected (and not yet documented) free space requirements? (Most of the files I was attempting to defragment were smaller than 10 MB, as the directory names suggest.) Is there a workaround for this issue? Or should I just leave the defragmentation feature alone for the time being? Andrej smime.p7s Description: S/MIME Cryptographic Signature
Re: [patch] fs: fix deadlocks in writeback_if_idle
On Thu, 25 Nov 2010 14:53:56 +1100 Nick Piggin npig...@kernel.dk wrote: On Wed, Nov 24, 2010 at 02:10:28PM +0100, Jan Kara wrote: On Wed 24-11-10 12:03:43, Nick Piggin wrote: For the _nr variant that btrfs uses, it's worse for the filesystems that don't have a 1:1 bdi-sb mapping. It might not actually write any of the pages from the SB that is out of space. That's true, but it might not write anything anyway (and after we check whether writeout is in progress, the writeout thread could go to sleep and not do anything anyway). So it's a pretty hacky interface anyway. If you want to do anything deterministic, you obviously need real coupling between producer and consumer. This should only be a performance tweak (or a workaround hack in worst case). Yes, the current interface is a band aid for the problem and better interface is welcome. But it's not trivial to do better... It makes no further guarantees, and anyway the sb has to compete for normal writeback within this bdi. I think Christoph is right because filesystems should not really know about how bdi writeback queueing works. But I don't know if it's worth doing anything more complex for this functionality? I think we should make a writeback_inodes_sb_unlocked() that doesn't warn when the semaphore isn't held. That should be enough given where btrfs and ext4 are calling it from. It doesn't solve the bugs -- calling and waiting for writeback is still broken because completion requires i_mutex and it is called from under i_mutex. Well, as I wrote in my previous email, only ext4 has the problem with i_mutex and I personally view it as a bug. But ultimately it's Ted's call to decide. Well, for now, the easiest and simplest fix is my patch, I think. The objection is that we may not write out anything for the specified sb, but the current implementation provides no such guarantees at all anyway, so I don't think it's a big issue. Well yes. We take something which will fail occasionally and with your patch replace it with something which will fail a bit more often. Why don't we go all the way and do something which will fail *even more often*. Namely, just delete the damn function in the hope that the resulting failures will provoke the ext4 crew into doing something sane this time? Guys, this delalloc thing *sucks*. And here we are just sticking new bandaids on top of the old bandaids. And the btrfs approach isn't exactly a thing of glory, either. So... nope. I won't be applying Nick's patch. Please fix this thing properly - you have a whole month! -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] fs: fix deadlocks in writeback_if_idle
On Mon, Nov 29, 2010 at 02:26:03PM -0800, Andrew Morton wrote: On Thu, 25 Nov 2010 14:53:56 +1100 Nick Piggin npig...@kernel.dk wrote: On Wed, Nov 24, 2010 at 02:10:28PM +0100, Jan Kara wrote: Well, for now, the easiest and simplest fix is my patch, I think. The objection is that we may not write out anything for the specified sb, but the current implementation provides no such guarantees at all anyway, so I don't think it's a big issue. Well yes. We take something which will fail occasionally and with your patch replace it with something which will fail a bit more often. Why don't we go all the way and do something which will fail *even more often*. Namely, just delete the damn function in the hope that the resulting failures will provoke the ext4 crew into doing something sane this time? I just need it fixed because the deadlocks are constantly hanging my tests and/or switching off lockdep. Guys, this delalloc thing *sucks*. And here we are just sticking new bandaids on top of the old bandaids. And the btrfs approach isn't exactly a thing of glory, either. So... nope. I won't be applying Nick's patch. Please fix this thing properly - you have a whole month! Testers have less. It would be better to fix it now and rip it out at the start of the next merge window if you're that way inclined :) -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
On Nov 29, 2010, at 3:48 PM, Andrey Kuzmin andrey.v.kuz...@gmail.com wrote: I'm not sure why zfs came up, they don't own the term :). As to zfs/overhead topic, I doubt there's any difference between clone and writable shapshot (there should be none, of course, it's just two different names for the same concept). Regards, Andrey On Tue, Nov 30, 2010 at 12:43 AM, Mike Fedyk mfe...@mikefedyk.com wrote: On Mon, Nov 29, 2010 at 1:31 PM, Andrey Kuzmin andrey.v.kuz...@gmail.com wrote: This may sound excessive as any new concept introduction that late in development, but readonly/writable snapshots could be further differentiated by naming the latter clones. This way end-user would naturally perceive snapsot as read-only PIT fs image, while clone would naturally refer to (writable) head fork. I'm not sure we want to take all of the terminology that zfs uses as it may also bring the percieved drawbacks as well. Isn't there some additional overhead for a zfs clone compared to a snapshot? I'm not very familiar with zfs so that's why I ask. -- To unsubscribe from this list: send the line unsubscribe linux- btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html I don't like the idea of readonly by default, or further changes to terminology, for several reasons: ) readonly by default offers no real enhancement whatsoever other than breaking _anything_ that's written right now ) btrfs readonly is not even really readonly; as superuser could simply flip a flag to enable writes, readonly merely prevents accidental writes or misbehaving apps... ie. protecting you from yourself ) backups are the simple/obvious use case; I personally use btrfs heavily for LXC containers, in which case nearly every single snapshot is intended to be writable -- usually cloning a template into a new domain ) I also use an initramfs hook to provide system rollbacks, also writable; the hook also provides multiple versions of the branch... all writable ) adding new terms is not a good idea imo; I've already spewed out many sentences explaining the difference between subvolumes and snapshots, ie. that there is none... adding another term only adds to this problem; they each describe the same thing, but differentiate based on origin or current state, neither of which actually describe what it _is_-- a new named pointer to a tree, like a git branch -- a subvolume. I think a better solution/compromise would be to leave snapshots writeable by default, since that's more true to what's happening internally anyway, but maybe introduce a mount option controlling the default action for that mount point. C Anthony [mobile] -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] btrfs: Make async snapshot ioctl more generic
Goffredo Baroncelli wrote: Hi Li, great work, but I have some suggestions: On Monday, 29 November, 2010, Li Zefan wrote: So we don't have to add new structures as we create more ioctls for snapshots. Now to create async snapshot, set BTRFS_SNAPSHOT_CREATE_ASYNC bit of vol_arg_v2-flags, and then call ioctl(BTRFS_IOCT_SNAP_CREATE_V2). Note: this changes the async snapshot ioctl ABI, which was merged in .37 merge window, so we have to make this change into .37. Signed-off-by: Li Zefan l...@cn.fujitsu.com --- fs/btrfs/ioctl.c | 34 +- fs/btrfs/ioctl.h | 12 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 463d91b..d3f1a60 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -935,23 +935,31 @@ out: static noinline int btrfs_ioctl_snap_create(struct file *file, void __user *arg, int subvol, -int async) +bool v2) This is a aesthetic suggestion: instead of passing a flag called v2 I suggest to add two wrapper functions, which extract the parameters and passes all available parameter to the generic function. Something like: Sure, as long as it won't result in code duplication and can improve readability. static inline btrfs_ioctl_snap_create_v1(struct file *file, void __user *arg, int subvol) { vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) return PTR_ERR(vol_args); name = vol_args-name; fd = vol_args-fd; vol_args-name[BTRFS_PATH_NAME_MAX] = '\0'; btrfs_ioctl_snap_create_generic(file, subvol, name, fd, 0, 0); } static inline btrfs_ioctl_snap_create_v2(struct file *file, void __user *arg, int subvol, ) { vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2)); if (IS_ERR(vol_args_v2)) return PTR_ERR(vol_args_v2); ret = snap_check_flags(vol_args_v2-flags, true); if (ret) goto out; name = vol_args_v2-name; fd = vol_args_v2-fd; vol_args_v2-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; if (vol_args_v2-flags BTRFS_SNAPSHOT_CREATE_ASYNC) async = true; if (vol_args_v2-flags BTRFS_SNAPSHOT_RDONLY) readonly = true; btrfs_ioctl_snap_create_generic(file, subvol, name, fd, async, readonly); } and case BTRFS_IOC_SNAP_CREATE: return btrfs_ioctl_snap_create_v1(file, argp, 0); case BTRFS_IOC_SNAP_CREATE_V2: return btrfs_ioctl_snap_create_v2(file, argp, 0); Frankly speaking, we could get rid of the subvol parameter adding another wrapper function like btrfs_ioctl_subvol_create( ), but this would be another story :) I thought about that too. { struct btrfs_ioctl_vol_args *vol_args = NULL; -struct btrfs_ioctl_async_vol_args *async_vol_args = NULL; +struct btrfs_ioctl_vol_args_v2 *vol_args_v2 = NULL; char *name; u64 fd; u64 transid = 0; +bool async = false; int ret; -if (async) { -async_vol_args = memdup_user(arg, sizeof(*async_vol_args)); -if (IS_ERR(async_vol_args)) -return PTR_ERR(async_vol_args); +if (v2) { +vol_args_v2 = memdup_user(arg, sizeof(*vol_args_v2)); +if (IS_ERR(vol_args_v2)) +return PTR_ERR(vol_args_v2); -name = async_vol_args-name; -fd = async_vol_args-fd; -async_vol_args-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; +if (vol_args_v2-flags ~BTRFS_SNAPSHOT_CREATE_ASYNC) { +ret = -EINVAL; +goto out; +} + +name = vol_args_v2-name; +fd = vol_args_v2-fd; +vol_args_v2-name[BTRFS_SNAPSHOT_NAME_MAX] = '\0'; +if (vol_args_v2-flags BTRFS_SNAPSHOT_CREATE_ASYNC) +async = true; } else { vol_args = memdup_user(arg, sizeof(*vol_args)); if (IS_ERR(vol_args)) @@ -966,13 +974,13 @@ static noinline int btrfs_ioctl_snap_create(struct file *file, if (!ret async) { if (copy_to_user(arg + -offsetof(struct btrfs_ioctl_async_vol_args, +offsetof(struct btrfs_ioctl_vol_args_v2, transid), transid, sizeof(transid))) return -EFAULT;
Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase
On 11/30/2010 04:10 AM, Josef Bacik wrote: On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote: Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic. Meanwhile, they are very ugly and should be handled more propriately. There are mainly two ways to deal with these BUG_ON()s. 1. For those errors which can be handled well by callers, we just return their error number to callers. 2. For others, We can force the filesystem readonly when it hits errors, which is what this patchset has done. Replaced BUG_ON() with the interface provided in this patchset, we will get error infomation via dmesg. Since btrfs is now readonly, we can save our data safely and umount it, then a btrfsck is recommended. By these ways, we can protect our filesystem from panic caused by those BUG_ONs. --- fs/btrfs/ctree.h | 21 ++ fs/btrfs/disk-io.c | 23 +++ fs/btrfs/super.c | 100 ++- fs/btrfs/transaction.c |7 +++ 4 files changed, 148 insertions(+), 3 deletions(-) Overall seems sane, but what about kernels that don't make these checks? I'm ok with well sucks for them as an answer, just want to make sure we've at least though about it. You mean those code that does nothing on ret-checks? IMO, if the code really needs ret-check, we should deal with them seriously, or just leave it alone. And this is a step-by-step job. Also I'm not sure marking the fs as broken is the right move here. Ext3/4 don't do this, they just mount read-only, as long as you can still unmount the filesystem everything comes out ok. Think of the case where we just get a spurious EIO, the fs should be fine the next time around, there's reason to force the user to run fsck in this case. Yes, I agree on this. For spurious EIO, it mainly depends on coders, returning the errno to caller may work on bypassing fsck. While I'm working on this readonly stuff, it is difficult to solve the potential deadlock when we write the super block to disk. Since btrfs supports multi-device, before write-super, we must get the device lock device_list_mutex first, and this has puzzled me a lot. BTW, I've tried another way to bypass deadlock. I made the write-super stuff into umount, which can make us free from deadlock, however, while testing this, it seemes that umount cannot work due to a ext3/4 jbd oops, I'm digging on this oops... So, any ideas about free from deadlock? Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Default to read-only on snapshot creation and have a flag if snapshot should be writable (was: [PATCH 0/5] btrfs: Readonly snapshots)
C Anthony Risinger wrote: On Nov 29, 2010, at 3:48 PM, Andrey Kuzmin andrey.v.kuz...@gmail.com wrote: I'm not sure why zfs came up, they don't own the term :). As to zfs/overhead topic, I doubt there's any difference between clone and writable shapshot (there should be none, of course, it's just two different names for the same concept). Regards, Andrey On Tue, Nov 30, 2010 at 12:43 AM, Mike Fedyk mfe...@mikefedyk.com wrote: On Mon, Nov 29, 2010 at 1:31 PM, Andrey Kuzmin andrey.v.kuz...@gmail.com wrote: This may sound excessive as any new concept introduction that late in development, but readonly/writable snapshots could be further differentiated by naming the latter clones. This way end-user would naturally perceive snapsot as read-only PIT fs image, while clone would naturally refer to (writable) head fork. I'm not sure we want to take all of the terminology that zfs uses as it may also bring the percieved drawbacks as well. Isn't there some additional overhead for a zfs clone compared to a snapshot? I'm not very familiar with zfs so that's why I ask. -- To unsubscribe from this list: send the line unsubscribe linux- btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html I don't like the idea of readonly by default, or further changes to terminology, for several reasons: I quite agree with you. LVM2 also defaults to read/write for snapshots. ) readonly by default offers no real enhancement whatsoever other than breaking _anything_ that's written right now This was the first thing that came to my mind. ) btrfs readonly is not even really readonly; as superuser could simply flip a flag to enable writes, readonly merely prevents accidental writes or misbehaving apps... ie. protecting you from yourself ) backups are the simple/obvious use case; I personally use btrfs heavily for LXC containers, in which case nearly every single snapshot is intended to be writable -- usually cloning a template into a new domain ) I also use an initramfs hook to provide system rollbacks, also writable; the hook also provides multiple versions of the branch... all writable ) adding new terms is not a good idea imo; I've already spewed out many sentences explaining the difference between subvolumes and snapshots, ie. that there is none... adding another term only adds to this problem; they each describe the same thing, but differentiate based on origin or current state, neither of which actually describe what it _is_-- a new named pointer to a tree, like a git branch -- a subvolume. I think a better solution/compromise would be to leave snapshots writeable by default, since that's more true to what's happening internally anyway, but maybe introduce a mount option controlling the default action for that mount point. C Anthony [mobile] -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase
On Tue, Nov 30, 2010 at 10:03:58AM +0800, liubo wrote: On 11/30/2010 04:10 AM, Josef Bacik wrote: On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote: Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic. Meanwhile, they are very ugly and should be handled more propriately. There are mainly two ways to deal with these BUG_ON()s. 1. For those errors which can be handled well by callers, we just return their error number to callers. 2. For others, We can force the filesystem readonly when it hits errors, which is what this patchset has done. Replaced BUG_ON() with the interface provided in this patchset, we will get error infomation via dmesg. Since btrfs is now readonly, we can save our data safely and umount it, then a btrfsck is recommended. By these ways, we can protect our filesystem from panic caused by those BUG_ONs. --- fs/btrfs/ctree.h | 21 ++ fs/btrfs/disk-io.c | 23 +++ fs/btrfs/super.c | 100 ++- fs/btrfs/transaction.c |7 +++ 4 files changed, 148 insertions(+), 3 deletions(-) Overall seems sane, but what about kernels that don't make these checks? I'm ok with well sucks for them as an answer, just want to make sure we've at least though about it. You mean those code that does nothing on ret-checks? IMO, if the code really needs ret-check, we should deal with them seriously, or just leave it alone. And this is a step-by-step job. Sorry I mean for older kernels that don't know about these hey your fs is screwed flags. It seems like they'll just get ignored, are we sure thats what we want to happen? I'm fine with that, but if we don't want that to happen it may be good to have a incompat flag. Also I'm not sure marking the fs as broken is the right move here. Ext3/4 don't do this, they just mount read-only, as long as you can still unmount the filesystem everything comes out ok. Think of the case where we just get a spurious EIO, the fs should be fine the next time around, there's reason to force the user to run fsck in this case. Yes, I agree on this. For spurious EIO, it mainly depends on coders, returning the errno to caller may work on bypassing fsck. Right I'm worried about the flipping read only stuff being kicked by EIO, which happens with ext* and could happen with btrfs in the right cases. I'm not saying thats wrong, its what should happen, I'm just saying we need to be able to unmount the filesystem and mount it back up without needing to run an fsck in between. While I'm working on this readonly stuff, it is difficult to solve the potential deadlock when we write the super block to disk. Since btrfs supports multi-device, before write-super, we must get the device lock device_list_mutex first, and this has puzzled me a lot. BTW, I've tried another way to bypass deadlock. I made the write-super stuff into umount, which can make us free from deadlock, however, while testing this, it seemes that umount cannot work due to a ext3/4 jbd oops, I'm digging on this oops... So, any ideas about free from deadlock? None :). The best thing I can think of is do like we're doing with the read only stuff and only write out the super right before we flip read only, and then make umount make sure that if we're mounted read only to not do anything. Truth be told I hate this mark the fs as broken idea. We don't know if the error we got means the filesystem is broken (for example the EIO case). If we do hit actual corruption maybe it would be good, and in that case we should write out the super at the point we find that corruption and then flip read only and have that be the only time we have to worry about writing out the super. So I guess that's 2 options 1) Ditch the the fs is broken flag. This makes things nice and simple since on-disk is already consistent, all we have to do is drop anything thats dirty and we're home free. 2) Keep the flag, but only worry about writing it out on a case by case basis. So we have a btrfs_corrupt_fs() function that writes out the super with the appropriate flag, and then flips the fs read only. Then we don't have to do anything special in the common paths, just the normal hey is this fs read only? things, so for all other cases we can just flip the fs read only and everything works. I hope that makes sense, if not feel free to ignore me and just keep doing what you've been doing :). Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase
On 11/30/2010 10:30 AM, Josef Bacik wrote: On Tue, Nov 30, 2010 at 10:03:58AM +0800, liubo wrote: On 11/30/2010 04:10 AM, Josef Bacik wrote: On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote: Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic. Meanwhile, they are very ugly and should be handled more propriately. There are mainly two ways to deal with these BUG_ON()s. 1. For those errors which can be handled well by callers, we just return their error number to callers. 2. For others, We can force the filesystem readonly when it hits errors, which is what this patchset has done. Replaced BUG_ON() with the interface provided in this patchset, we will get error infomation via dmesg. Since btrfs is now readonly, we can save our data safely and umount it, then a btrfsck is recommended. By these ways, we can protect our filesystem from panic caused by those BUG_ONs. --- fs/btrfs/ctree.h | 21 ++ fs/btrfs/disk-io.c | 23 +++ fs/btrfs/super.c | 100 ++- fs/btrfs/transaction.c |7 +++ 4 files changed, 148 insertions(+), 3 deletions(-) Overall seems sane, but what about kernels that don't make these checks? I'm ok with well sucks for them as an answer, just want to make sure we've at least though about it. You mean those code that does nothing on ret-checks? IMO, if the code really needs ret-check, we should deal with them seriously, or just leave it alone. And this is a step-by-step job. Sorry I mean for older kernels that don't know about these hey your fs is screwed flags. It seems like they'll just get ignored, are we sure thats what we want to happen? I'm fine with that, but if we don't want that to happen it may be good to have a incompat flag. Ohh, got it, thanks for pointing it out. Will do it later. Also I'm not sure marking the fs as broken is the right move here. Ext3/4 don't do this, they just mount read-only, as long as you can still unmount the filesystem everything comes out ok. Think of the case where we just get a spurious EIO, the fs should be fine the next time around, there's reason to force the user to run fsck in this case. Yes, I agree on this. For spurious EIO, it mainly depends on coders, returning the errno to caller may work on bypassing fsck. Right I'm worried about the flipping read only stuff being kicked by EIO, which happens with ext* and could happen with btrfs in the right cases. I'm not saying thats wrong, its what should happen, I'm just saying we need to be able to unmount the filesystem and mount it back up without needing to run an fsck in between. hm, this really makes sense. Since it is difficult to tell whether a fake corruption it is, what about just implementing readonly stuff like this and making it more friendly to EIO in future? While I'm working on this readonly stuff, it is difficult to solve the potential deadlock when we write the super block to disk. Since btrfs supports multi-device, before write-super, we must get the device lock device_list_mutex first, and this has puzzled me a lot. BTW, I've tried another way to bypass deadlock. I made the write-super stuff into umount, which can make us free from deadlock, however, while testing this, it seemes that umount cannot work due to a ext3/4 jbd oops, I'm digging on this oops... So, any ideas about free from deadlock? None :). The best thing I can think of is do like we're doing with the read only stuff and only write out the super right before we flip read only, and then make umount make sure that if we're mounted read only to not do anything. Truth be told I hate this mark the fs as broken idea. We don't know if the error we got means the filesystem is broken (for example the EIO case). If we do hit actual corruption maybe it would be good, and in that case we should write out the super at the point we find that corruption and then flip read only and have that be the only time we have to worry about writing out the super. So I guess that's 2 options 1) Ditch the the fs is broken flag. This makes things nice and simple since on-disk is already consistent, all we have to do is drop anything thats dirty and we're home free. 2) Keep the flag, but only worry about writing it out on a case by case basis. So we have a btrfs_corrupt_fs() function that writes out the super with the appropriate flag, and then flips the fs read only. Then we don't have to do anything special in the common paths, just the normal hey is this fs read only? things, so for all other cases we can just flip the fs read only and everything works. The 2) is what I have just done. :) I hope that makes sense, if not feel free to ignore me and just keep doing what you've been doing :). Thanks, They are very helpful. Thanks, Liu Bo Josef -- To unsubscribe
R: Re: [PATCH 5/5] btrfs: Add ioctl to set snapshot readonly/writable
Hi Li, Messaggio originale Da: l...@cn.fujitsu.com Data: 30/11/2010 8.03 A: kreij...@libero.it Cc: linux-btrfs@vger.kernel.org Ogg: Re: [PATCH 5/5] btrfs: Add ioctl to set snapshot readonly/writable Goffredo Baroncelli wrote: Hi Li, On Monday, 29 November, 2010, Li Zefan wrote: This allows us to set a snapshot readonly or writable on the fly. Usage: Set BTRFS_SNAPSHOT_RDONLY/WRITABLE of btrfs_ioctl_vol_arg_v2-flags, and then call ioctl(BTRFS_IOCTL_SNAP_SETFLAGS); I really appreciate your work, but I have some doubt about this interface. In particolar: It's the interface that I would like to be discussed. Thanks! - how get the flags of a subvolume ? I suggest to implement a pair of ioctls: - subvolume_setflags - get the flags - subvolume_getflags - set the flags These ioctls would be more generic (there are a lot of flags which may be interested to put in the root of a subvolume: think about compress/nocompress, (no)datasum...) - For the reason abowe, I suggest to replace SNAPSHOT with SUBVOLUME - Finally, with a pair of get/set_flags functions we can avoid the use of the flags BTRFS_SNAPSHOT_WRITABLE. There are some reasons that I created this interface: - set/getflags should set/get root flags which reflect in struct btrfs_root_item-flags. - btrfs_root_item-flags was not used at all before this patch, so (no)compress and (no)datasum is not reflect in -flags. - _CREATE_ASYNC flag is to create snapshot asynchronously, so it's not a flag of tree root. Of course I never mind about _CREATE_ASYNC to be set in btrfs_root_item- flags. _CREATE_ASYNC is not a snapshot properties but a way of creating a subvolume. But other flags may make sense to live in btrfs_root_item-flags. So I am suggesting to develop a more general interface for future improvement. These pair of functions (*_set/get) should be use to set the subvolume/snapshot properties. And the RDONLY is one of them. In the detail to set an attributue an user should be: - get the subvolume flags (ioctl(fd, BTRFS_IOC_SNAP_GETFLAGS) ) - compute the new flags ( flags |= BTRFS_FLAGS_ or flags = ~BTRFS_FLAG_) - set the subvolume flags (ioctl(fd, BTRFS_IOC_SNAP_SGETFLAGS) ) - It seems to me there's no user requirement for getflags ioctl to return _RDONLY/_WRITABLE flags of a tree root? And how an user/admin can understand that a snapshot/subvolume is readonly ? How - By suggesting BTRFS_SUBVOL_RDONLY, does it impliy not only snapshot but also a subvolume can be made readonly? IIRC a snapshot is a subvolume already filled from the beginning. Why doesn't share the capability of make a subvolume RO ? Finally I have another suggestion: make sense to check that the file descriptor is referring to the root of a subvolume instead of a the tree. I highlight that because the other ioctls suffer the same problem and confused the user sometime. For example a lot of people tough that was possible to snapshot a directory, because the ioctl doesn't return any error. But instead of the directory the snapshot was of the full subvolume. Goffredo [...] -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html