[PATCH] Btrfs: update inode flags when renaming
A user reported some weird behaviour: if we move a file with the noCow
flag to a directory without the noCow flag, the file loses the flag, but
after a remount the file's noCow flag comes back.

This is because we missed a proper inode update after inheriting the
parent directory's flags.

Reported-by: Marios Titas <redneb8...@gmail.com>
Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/inode.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d9984fa..d2e3352 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7478,8 +7478,6 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 					old_dentry->d_inode,
 					old_dentry->d_name.name,
 					old_dentry->d_name.len);
-		if (!ret)
-			ret = btrfs_update_inode(trans, root, old_inode);
 	}
 	if (ret) {
 		btrfs_abort_transaction(trans, root, ret);
@@ -7514,6 +7512,11 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 	}

 	fixup_inode_flags(new_dir, old_inode);
+	ret = btrfs_update_inode(trans, root, old_inode);
+	if (ret) {
+		btrfs_abort_transaction(trans, root, ret);
+		goto out_fail;
+	}

 	ret = btrfs_add_link(trans, new_dir, old_inode,
 			     new_dentry->d_name.name,
--
1.7.7.6

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
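The symptom described in the patch (the flag "comes back" after a remount) can be pictured with a small userspace model. This is only a sketch under stated assumptions: the structs, the flag value and the helper names below are hypothetical stand-ins, not kernel code. It assumes that the flag fixup changes only the in-memory inode and that a separate update step writes it to the on-disk item.

```c
#include <assert.h>

/* Hypothetical flag bit for illustration; not the kernel's definition. */
#define FLAG_NODATACOW 0x1u

struct disk_inode { unsigned flags; };  /* state that survives a remount */
struct mem_inode  { unsigned flags; struct disk_inode *disk; };

/* Stand-in for fixup_inode_flags(): take the bit from the new parent. */
static void inherit_flags(struct mem_inode *inode, unsigned parent_flags)
{
    inode->flags = (inode->flags & ~FLAG_NODATACOW) |
                   (parent_flags & FLAG_NODATACOW);
}

/* Stand-in for btrfs_update_inode(): persist the in-memory flags. */
static void update_inode(struct mem_inode *inode)
{
    inode->disk->flags = inode->flags;
}

/* Stand-in for a remount: in-memory state is rebuilt from disk. */
static void remount(struct mem_inode *inode)
{
    inode->flags = inode->disk->flags;
}

/*
 * Rename into a directory with parent_flags, then remount.
 * Returns the flags visible after the remount.
 */
static unsigned rename_then_remount(unsigned start_flags,
                                    unsigned parent_flags, int persist)
{
    struct disk_inode disk = { start_flags };
    struct mem_inode inode = { start_flags, &disk };

    inherit_flags(&inode, parent_flags);
    if (persist)
        update_inode(&inode);
    /* Right after the rename the inode has the bit only if the parent has. */
    assert(!(inode.flags & FLAG_NODATACOW) ||
           (parent_flags & FLAG_NODATACOW));
    remount(&inode);
    return inode.flags;
}
```

With persist set to 0 the model reproduces the report: the flag is gone right after the rename and returns after the remount. With persist set to 1 the result is the same before and after.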
Re: [PATCH] Btrfs: update inode flags when renaming
Sorry, but the bug persists even with the above patch.

touch test
chattr +C test
lsattr test
mv test test2
lsattr test2

In the above scenario test2 will not have the C flag.

On Fri, Feb 22, 2013 at 3:11 AM, Liu Bo <bo.li@oracle.com> wrote:
> A user reported some weird behaviours, if we move a file with the noCow
> flag to a directory without the noCow flag, the file is now without the
> flag, but after remount, we'll find the file's noCow flag comes back.
>
> This is because we missed a proper inode update after inheriting parent
> directory's flags,
>
> Reported-by: Marios Titas <redneb8...@gmail.com>
> Signed-off-by: Liu Bo <bo.li@oracle.com>
> [...]
Re: [PATCH] Btrfs: use reserved space for creating a snapshot
On Fri, 22 Feb 2013 12:33:36 +0800, Liu Bo wrote:
> While inserting dir index and updating inode for a snapshot, we'd add
> delayed items which consume trans->block_rsv, if we don't have any space
> reserved in this trans handle, we either just return or reserve space
> again.
>
> But before creating pending snapshots during committing transaction,
> we've done a release on this trans handle, so we don't have space
> reserved in it at this stage.
>
> What we're using is block_rsv of pending snapshots which has already
> reserved well enough space for both inserting dir index and updating
> inode, so we need to set trans handle to indicate that we have space
> now.
>
> Signed-off-by: Liu Bo <bo.li@oracle.com>

Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>

> ---
>  fs/btrfs/transaction.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index fc03aa6..5878bb4 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1063,6 +1063,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>
>  	rsv = trans->block_rsv;
>  	trans->block_rsv = &pending->block_rsv;
> +	trans->bytes_reserved = trans->block_rsv->reserved;
>
>  	dentry = pending->dentry;
>  	parent = dget_parent(dentry);
> @@ -1216,6 +1217,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  fail:
>  	dput(parent);
>  	trans->block_rsv = rsv;
> +	trans->bytes_reserved = 0;
>  no_free_objectid:
>  	kfree(new_root_item);
>  root_item_alloc_fail:
Re: [PATCH] Btrfs: update inode flags when renaming
Wouldn't inheriting, though, create all sorts of problems? For instance,
check the example that I give in my other response [1].

[1] http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg22396.html

On Fri, Feb 22, 2013 at 4:34 AM, Miao Xie <mi...@cn.fujitsu.com> wrote:
> On Fri, 22 Feb 2013 16:40:35 +0800, Liu Bo wrote:
>> On Fri, Feb 22, 2013 at 03:32:50AM -0500, Marios Titas wrote:
>>> Sorry, but the bug persists even with the above patch.
>>>
>>> touch test
>>> chattr +C test
>>> lsattr test
>>> mv test test2
>>> lsattr test2
>>>
>>> In the above scenario test2 will not have the C flag.
>>
>> What do you expect? IMO it's right that test2 does not have the C flag.
>
> No, it's not right. Users expect the C flag not to be lost, because they
> only did a rename. But fixup_inode_flags() resets the flags from the
> parent directory's flags.
>
> I think we should inherit the flags from the parent only when we create
> a new file/directory; in the other cases, just give the users an option.
>
> What do you think?
>
> Thanks
> Miao
>
>> This patch ensure that we get the same result after we remount, no more
>> the C flag coming back :)
>>
>> thanks,
>> liubo
>>
>>> On Fri, Feb 22, 2013 at 3:11 AM, Liu Bo <bo.li@oracle.com> wrote:
>>>> A user reported some weird behaviours, if we move a file with the
>>>> noCow flag to a directory without the noCow flag, the file is now
>>>> without the flag, but after remount, we'll find the file's noCow flag
>>>> comes back.
>>>>
>>>> This is because we missed a proper inode update after inheriting
>>>> parent directory's flags,
>>>> [...]
Re: [PATCH] Btrfs: update inode flags when renaming
On Fri, Feb 22, 2013 at 04:10:37AM -0500, Marios Titas wrote:
> You are right, your patch does improve the situation a bit. But it
> still does not address the main issue. To illustrate that, consider
> the following scenario:

Sorry for so much confusion for users. Please let me explain the
following scenario.

> touch test
> chattr +C test
> head -c 1048576 /dev/zero > test
> mv test test2
>
> now test2 lost the C flag because it was renamed. But the data in
> test2 was written before it lost the C flag and so the extents do not
> have checksums! Also try to clone it with BTRFS_IOC_CLONE. It fails as
> if it had the C flag:
>
> cp --reflink test2 test3

We don't clone a file when the src (test2) file has NODATASUM and the
dest (test3) file does not have NODATASUM, or vice versa. This ensures
our checksums stay valid.

Here,
* test2 does have NODATASUM because test had NODATASUM, while
* test3 is a newly created file, and we're not using the '-o nodatasum'
  or '-o nodatacow' mount options, nor did we chattr test3, so test3 does
  not have the NODATASUM flag set.

So 'cp' ends up 'INVALID'.

> OTOH, if you try to clone over a file with NODATACOW then it works:
>
> touch test3
> chattr +C test3
> cp --reflink test2 test3

Now test3 has NODATACOW, so the above 'cp' works.

> so the file is in an inconsistent state: it sometimes behaves as if it
> had the NODATACOW flag and sometimes as if it didn't.

The C flag refers to NODATACOW; NODATACOW tells btrfs whether the file's
data is written in COW mode. So a failing 'clone' does not mean that the
file is NODATACOW.

Feel free to correct me.

thanks,
liubo

> Thanks
>
> On Fri, Feb 22, 2013 at 3:40 AM, Liu Bo <bo.li@oracle.com> wrote:
>> On Fri, Feb 22, 2013 at 03:32:50AM -0500, Marios Titas wrote:
>>> Sorry, but the bug persists even with the above patch.
>>> [...]
>>
>> What do you expect? IMO it's right that test2 does not have the C flag.
>>
>> This patch ensure that we get the same result after we remount, no more
>> the C flag coming back :)
>>
>> thanks,
>> liubo
>> [...]
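The clone rule Liu Bo describes above (source and destination must agree on NODATASUM) can be sketched as a small predicate. This is an illustrative model only: the bit values and function names are assumptions made for the sketch, and flags_after_chattr_C() encodes what the thread's examples imply (that chattr +C on an empty file leaves it with both NODATACOW and NODATASUM), not a documented guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values, assumed for this sketch. */
#define INODE_NODATASUM (1u << 0)
#define INODE_NODATACOW (1u << 1)

/* Assumption taken from the thread's examples: +C implies no checksums. */
static unsigned flags_after_chattr_C(unsigned flags)
{
    return flags | INODE_NODATACOW | INODE_NODATASUM;
}

/* Model of the rename in the thread: only the NODATACOW bit is replaced. */
static unsigned flags_after_rename(unsigned flags, unsigned parent_flags)
{
    return (flags & ~INODE_NODATACOW) | (parent_flags & INODE_NODATACOW);
}

/* Clone is refused when exactly one side has NODATASUM. */
static bool clone_allowed(unsigned src_flags, unsigned dst_flags)
{
    return (src_flags & INODE_NODATASUM) == (dst_flags & INODE_NODATASUM);
}
```

In this model test2 keeps NODATASUM after the rename while lsattr no longer shows C, which is why cloning onto a fresh file is refused and cloning onto a file that had chattr +C applied is accepted.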
Re: Snapshot Cleaner not Working with inode_cache
On 20.02.2013 at 02:14, Liu Bo <bo.li@oracle.com> wrote:
> I think I know why inode_cache keeps us from freeing space:
> inode_cache adds a cache_inode in each btrfs root, and this cache_inode
> will be iput at the very last stage of umount, i.e. after we do cleanup
> work on old snapshots/subvols, where we free the space.
>
> A remount will force btrfs to do cleanup work on old snapshots during
> mount.
>
> This may explain the situation.
>
> thanks,
> liubo

I don't know how long the code has behaved that way, but this is exactly
what I see here on Debian kernel 3.2.35.

Norbert
Re: Snapshot Cleaner not Working with inode_cache
On Fri, Feb 22, 2013 at 11:16:22AM +0100, Norbert Scheibner wrote:
> On 20.02.2013 at 02:14, Liu Bo <bo.li@oracle.com> wrote:
>> I think I know why inode_cache keeps us from freeing space,
>> inode_cache adds a cache_inode in each btrfs root, and this
>> cache_inode will be iput at the very last of stage during umount, ie.
>> after we do cleanup work on old snapshot/subvols, where we free the
>> space.
>>
>> A remount will force btrfs to do cleanup work on old snapshots during
>> mount.
>>
>> This may explain the situation.
>>
>> thanks,
>> liubo
>
> I don't know how long the code behaves that way, but this is exactly
> what I see here on debian kernel 3.2.35.

A patch to fix it is now in btrfs-next, so we may not be bitten any more.

thanks,
liubo
[PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use
From: Wang Shilong <wangsl-f...@cn.fujitsu.com>

This patch stops users from creating/destroying level 0 qgroups; users
can only create/destroy qgroups whose level is greater than 0.

The reasoning: a subvolume/snapshot qgroup is created automatically when
the subvolume/snapshot is created, so a level 0 qgroup created by hand
can't be a subvolume/snapshot qgroup. The only way to use it would be to
assign subvolume/snapshot qgroups to it, and the point is that we don't
want a parent qgroup whose level is 0.

So we want to force users to use qgroups with clear relations, which
means a parent qgroup's level > child qgroup's level. For example:

                  2/0
                  / \
                 /   \
                /     \
              1/0     1/1
              / \       \
             /   \       \
            /     \       \
        0/256   0/257   0/258

This pattern of quota is natural and easy for users to understand;
otherwise the quota configuration becomes confusing and difficult to
maintain.

Signed-off-by: Wang Shilong <wangsl-f...@cn.fujitsu.com>
Acked-by: Miao Xie <mi...@cn.fujitsu.com>
Cc: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/ioctl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a31cd93..3590c21 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3755,7 +3755,7 @@ static long btrfs_ioctl_qgroup_create(struct file *file, void __user *arg)
 		goto drop_write;
 	}

-	if (!sa->qgroupid) {
+	if (!(sa->qgroupid >> 48)) {
 		ret = -EINVAL;
 		goto out;
 	}
--
1.7.7.6
[RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use
From: Wang Shilong <wangsl-f...@cn.fujitsu.com>

This patch stops users from creating/destroying level 0 qgroups; users
can only create/destroy qgroups whose level is greater than 0.

The reasoning: a subvolume/snapshot qgroup is created automatically when
the subvolume/snapshot is created, so a level 0 qgroup created by hand
can't be a subvolume/snapshot qgroup. The only way to use it would be to
assign subvolume/snapshot qgroups to it, and the point is that we don't
want a parent qgroup whose level is 0.

So we want to force users to use qgroups with clear relations, which
means a parent qgroup's level > child qgroup's level. For example:

                  2/0
                  / \
                 /   \
                /     \
              1/0     1/1
              / \       \
             /   \       \
            /     \       \
        0/256   0/257   0/258

This pattern of quota is natural and easy for users to understand;
otherwise the quota configuration becomes confusing and difficult to
maintain.

Signed-off-by: Wang Shilong <wangsl-f...@cn.fujitsu.com>
Acked-by: Miao Xie <mi...@cn.fujitsu.com>
Cc: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/ioctl.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a31cd93..3590c21 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3755,7 +3755,7 @@ static long btrfs_ioctl_qgroup_create(struct file *file, void __user *arg)
 		goto drop_write;
 	}

-	if (!sa->qgroupid) {
+	if (!(sa->qgroupid >> 48)) {
 		ret = -EINVAL;
 		goto out;
 	}
--
1.7.7.6
Re: [PATCH 1/2] Btrfs: create the qgroup that limits root subvolume automatically
On 02/22/13 13:02, Wang Shilong wrote:
> From: Wang Shilong <wangsl-f...@cn.fujitsu.com>
>
> Creating the root subvolume qgroup when enabling quota,with

Why only create a qgroup for the root subvolume and not for every
existing subvolume?

> this patch,it will be ok to limit the whole filesystem size.

This will not limit the whole filesystem, but only the root subvolume.
To limit the whole filesystem you'd have to create a level 1 qgroup
and add all subvolumes to it.

-Arne

> Signed-off-by: Wang Shilong <wangsl-f...@cn.fujitsu.com>
> Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
> Cc: Arne Jansen <sensi...@gmx.net>
> ---
>  fs/btrfs/qgroup.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index a5c8562..c409096 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -777,6 +777,7 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
>  	struct extent_buffer *leaf;
>  	struct btrfs_key key;
>  	int ret = 0;
> +	struct btrfs_qgroup *qgroup = NULL;
>
>  	spin_lock(&fs_info->qgroup_lock);
>  	if (fs_info->quota_root) {
> @@ -823,7 +824,18 @@ int btrfs_quota_enable(struct btrfs_trans_handle *trans,
>
>  	btrfs_mark_buffer_dirty(leaf);
>
> +	btrfs_release_path(path);
> +	ret = add_qgroup_item(trans, quota_root, BTRFS_FS_TREE_OBJECTID);
> +	if (ret)
> +		goto out;
> +
>  	spin_lock(&fs_info->qgroup_lock);
> +	qgroup = add_qgroup_rb(fs_info, BTRFS_FS_TREE_OBJECTID);
> +	if (IS_ERR(qgroup)) {
> +		spin_unlock(&fs_info->qgroup_lock);
> +		ret = PTR_ERR(qgroup);
> +		goto out;
> +	}
>  	fs_info->quota_root = quota_root;
>  	fs_info->pending_quota_state = 1;
>  	spin_unlock(&fs_info->qgroup_lock);
Re: [RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use
On 02/22/13 13:09, Wang Shilong wrote:
> From: Wang Shilong <wangsl-f...@cn.fujitsu.com>
>
> This patch tries to stop users to create/destroy qgroup level 0,
> users can only create/destroy qgroup level more than 0.
>
> See the fact:
> a subvolume/snapshot qgroup was created automatically
> when creating subvolume/snapshot, so creating a qgroup level 0 can't
> be a subvolume/snapshot qgroup, the only way to use it is that
> assigning subvolume/snapshot qgroup to it, the point is that we don't
> want to have a parent qgroup whose level is 0.
>
> So we want to force users to use qgroup with clear relations
> which means a parent qgroup's level > child qgroup's level. For example:
>
>                   2/0
>                   / \
>                  /   \
>                 /     \
>               1/0     1/1
>               / \       \
>              /   \       \
>             /     \       \
>         0/256   0/257   0/258
>
> This pattern of quota is nature and easy for users to understand,
> otherwise it will make the quota configuration confusing and
> difficult to maintain.

I agree that a strict hierarchy of the levels should be enforced.
Currently the kernel has no idea of 'level', it's just an artificial
concept that lives in userspace. This patch would be the first place
to add that magic shift '48' to the kernel. In my opinion it would be
sufficient to do the enforcement in user space, as it is of no
technical nature.

-Arne

> Signed-off-by: Wang Shilong <wangsl-f...@cn.fujitsu.com>
> Acked-by: Miao Xie <mi...@cn.fujitsu.com>
> Cc: Arne Jansen <sensi...@gmx.net>
> [...]
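The "magic shift 48" discussed above refers to the level/id notation used in the thread (0/256, 1/0, 2/0): the level sits in the upper 16 bits of the 64-bit qgroupid and the id in the lower 48. A minimal sketch of that encoding and of the two checks debated here follows. The helper names are hypothetical and chosen for illustration; only the shift by 48 comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define QGROUP_LEVEL_SHIFT 48  /* the shift used in the proposed check */

/* Level part of a qgroupid, e.g. 1 for "1/0". */
static uint64_t qgroup_level(uint64_t qgroupid)
{
    return qgroupid >> QGROUP_LEVEL_SHIFT;
}

/* Id part of a qgroupid, e.g. 256 for "0/256". */
static uint64_t qgroup_subid(uint64_t qgroupid)
{
    return qgroupid & ((1ULL << QGROUP_LEVEL_SHIFT) - 1);
}

/* Build a qgroupid from the "level/id" notation. */
static uint64_t make_qgroupid(uint64_t level, uint64_t subid)
{
    assert(level < (1ULL << 16));
    assert(subid < (1ULL << QGROUP_LEVEL_SHIFT));
    return (level << QGROUP_LEVEL_SHIFT) | subid;
}

/* The proposed rule: userspace may only create qgroups above level 0. */
static int may_create_from_userspace(uint64_t qgroupid)
{
    return qgroup_level(qgroupid) != 0;
}

/* The strict hierarchy: a parent must sit on a higher level than its child. */
static int valid_parent(uint64_t parent, uint64_t child)
{
    return qgroup_level(parent) > qgroup_level(child);
}
```

Note that the check being replaced, `!sa->qgroupid`, only rejects the id 0/0. A value such as 0/256 is nonzero, so it passes the old test and is rejected only by the shifted one. The same helpers could equally live in userspace tooling, which is what Arne suggests.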
WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
https://bugzilla.redhat.com/show_bug.cgi?id=906142

With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following
error. It can take hours of use before I get a reoccurrence, and I need
to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the
filesystem back after a reboot. No data appears to be lost, and a scrub
runs to completion with no errors.

[14691.074991] WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
[14691.074993] Hardware name: C2SEA
[14691.074995] btrfs bad mapping eb start 645984256 len 4096, wanted 4096 8
[14691.074997] Modules linked in: vfat fat usb_storage fuse rfcomm bnep nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables w83627ehf hwmon_vid snd_hda_codec_realtek snd_hda_intel snd_hda_codec uvcvideo videobuf2_vmalloc snd_hwdep snd_seq snd_seq_device videobuf2_memops btusb videobuf2_core videodev snd_pcm bluetooth iTCO_wdt snd_page_alloc media rfkill coretemp snd_timer iTCO_vendor_support i2c_i801 snd lpc_ich mfd_core soundcore microcode r8169 mii vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc i2c_dev uinput btrfs zlib_deflate libcrc32c ata_generic pata_acpi i915 video firewire_ohci i2c_algo_bit firewire_core drm_kms_helper pata_it8213 crc_itu_t drm i2c_core
[14691.075070] Pid: 1926, comm: encfs Not tainted 3.8.0-0.rc7.git0.1.fc19.x86_64 #1
[14691.075072] Call Trace:
[14691.075093] [a01a7c00] ? map_private_extent_buffer+0xb0/0xe0 [btrfs]
[14691.075099] [8105c210] warn_slowpath_common+0x70/0xa0
[14691.075102] [8105c28c] warn_slowpath_fmt+0x4c/0x50
[14691.075121] [a01a7c24] map_private_extent_buffer+0xd4/0xe0 [btrfs]
[14691.075139] [a019da30] btrfs_set_token_64+0x60/0xf0 [btrfs]
[14691.075159] [a01be264] btrfs_log_changed_extents+0x384/0x600 [btrfs]
[14691.075178] [a01c05b8] btrfs_log_inode+0x3b8/0x660 [btrfs]
[14691.075196] [a01c1519] btrfs_log_inode_parent+0x169/0x450 [btrfs]
[14691.075216] [a01c183a] btrfs_log_dentry_safe+0x3a/0x60 [btrfs]
[14691.075234] [a0198400] btrfs_sync_file+0x150/0x1f0 [btrfs]
[14691.075239] [811c48c6] do_fsync+0x56/0x80
[14691.075242] [811c4b50] sys_fsync+0x10/0x20
[14691.075247] [8163e419] system_call_fastpath+0x16/0x1b
[14691.075253] ---[ end trace 0c19c78181b4038d ]---
[14691.075261] BUG: unable to handle kernel NULL pointer dereference at (null)
[14691.075311] IP: [a01a7e23] write_extent_buffer+0xd3/0x150 [btrfs]
[14691.075364] PGD 208a79067 PUD 2089a6067 PMD 0
[14691.075400] Oops: [#1] SMP
[14691.075425] Modules linked in: vfat fat usb_storage fuse rfcomm bnep nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter nf_conntrack ip6_tables w83627ehf hwmon_vid snd_hda_codec_realtek snd_hda_intel snd_hda_codec uvcvideo videobuf2_vmalloc snd_hwdep snd_seq snd_seq_device videobuf2_memops btusb videobuf2_core videodev snd_pcm bluetooth iTCO_wdt snd_page_alloc media rfkill coretemp snd_timer iTCO_vendor_support i2c_i801 snd lpc_ich mfd_core soundcore microcode r8169 mii vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc i2c_dev uinput btrfs zlib_deflate libcrc32c ata_generic pata_acpi i915 video firewire_ohci i2c_algo_bit firewire_core drm_kms_helper pata_it8213 crc_itu_t drm i2c_core
[14691.076012] CPU 2
[14691.076012] Pid: 1926, comm: encfs Tainted: GW 3.8.0-0.rc7.git0.1.fc19.x86_64 #1 Supermicro C2SEA/C2SEA
[14691.076012] RIP: 0010:[a01a7e23] [a01a7e23] write_extent_buffer+0xd3/0x150 [btrfs]
[14691.076012] RSP: 0018:88020b653c20 EFLAGS: 00010202
[14691.076012] RAX: RBX: 0008 RCX: 0008
[14691.076012] RDX: 1008 RSI: 2681 RDI: 8801316cf988
[14691.076012] RBP: 88020b653c50 R08: 000a R09: 03ea
[14691.076012] R10: R11: 88020b6538d6 R12: 88020b653c80
[14691.076012] R13: 8801316cf988 R14: R15: 0008
[14691.076012] FS: 7fd04462b800() GS:880237d0() knlGS:
[14691.076012] CS: 0010 DS: ES: CR0: 80050033
[14691.076012] CR2: CR3: 0001e7e39000 CR4: 07e0
[14691.076012] DR0: DR1: DR2:
[14691.076012] DR3: DR6: 0ff0 DR7: 0400
[14691.076012] Process encfs (pid: 1926, threadinfo 88020b652000, task 8801f0c44620)
[14691.076012] Stack:
[14691.076012] 1000 88020b653d70 8801316cf988 1000
[14691.076012] 0025 0fdb 88020b653cb0 a019dab0
[14691.076012] 880106418000 1000 1000
[14691.076012] Call Trace:
[14691.076012] [a019dab0] btrfs_set_token_64+0xe0/0xf0 [btrfs]
Re: collapse concurrent forced allocations (was: Re: clear chunk_alloc flag on retryable failure)
On Thu, Feb 21, 2013 at 06:15:49PM -0700, Alexandre Oliva wrote:
> On Feb 21, 2013, Alexandre Oliva <ol...@gnu.org> wrote:
>
>> What I saw in that function also happens to explain why in some cases
>> I see filesystems allocate a huge number of chunks that remain unused
>> (leading to the scenario above, of not having more chunks to
>> allocate). It happens for data and metadata, but not necessarily
>> both. I'm guessing some thread sets the force_alloc flag on the
>> corresponding space_info, and then several threads trying to get disk
>> space end up attempting to allocate a new chunk concurrently. All of
>> them will see the force_alloc flag and bump their local copy of force
>> up to the level they see first, and they won't clear it even if
>> another thread succeeds in allocating a chunk, thus clearing the
>> force flag. Then each thread that observed the force flag will, on
>> its turn, force the allocation of a new chunk. And any threads that
>> come in while it does that will see the force flag still set and pick
>> it up, and so on.
>>
>> This sounds like a problem to me, but... what should the correct
>> behavior be? Clear force_flag once we copy it to a local force?
>> Reset force to the incoming value on every loop?
>
> I think a slight variant of the following makes the most sense, so I
> implemented it in the patch below.
>
>> Set the flag to our incoming force if we have it at first, clear our
>> local flag, and move it from the space_info when we determined that
>> we are the thread that's going to perform the allocation?
>
> From: Alexandre Oliva <ol...@gnu.org>
>
> btrfs: consume force_alloc in the first thread to chunk_alloc
>
> Even if multiple threads in do_chunk_alloc look at force_alloc and see
> a force flag, it suffices that one of them consumes the flag. Arrange
> for an incoming force argument to make it to force_alloc in case of
> concurrent calls, so that it is used only by the first thread to get
> to allocation after the initial request.
>
> Signed-off-by: Alexandre Oliva <ol...@gnu.org>
> ---
>  fs/btrfs/extent-tree.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 6ee89d5..66283f7 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3574,8 +3574,12 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
>
>  again:
>  	spin_lock(&space_info->lock);
> +
> +	/* Bring force_alloc to force and tentatively consume it.  */
>  	if (force < space_info->force_alloc)
>  		force = space_info->force_alloc;
> +	space_info->force_alloc = CHUNK_ALLOC_NO_FORCE;
> +
>  	if (space_info->full) {
>  		spin_unlock(&space_info->lock);
>  		return 0;
> @@ -3586,6 +3590,10 @@ again:
>  		return 0;
>  	} else if (space_info->chunk_alloc) {
>  		wait_for_alloc = 1;
> +		/* Reset force_alloc so that it's consumed by the
> +		   first thread that completes the allocation.  */
> +		space_info->force_alloc = force;
> +		force = CHUNK_ALLOC_NO_FORCE;

So I understand what you are getting at, but I think you are doing it
wrong. If we're calling with CHUNK_ALLOC_FORCE, but somebody has already
started to allocate with CHUNK_ALLOC_NO_FORCE, we'll reset
space_info->force_alloc to our original caller's CHUNK_ALLOC_FORCE. We
only really care about making sure a chunk is actually allocated, so
instead of doing this flag shuffling we should just do

	if (space_info->chunk_alloc) {
		spin_unlock(&space_info->lock);
		wait_event(!space_info->chunk_alloc);
		return 0;
	}

and that way we don't allocate more chunks than normal. Thanks,

Josef
Re: clear chunk_alloc flag on retryable failure
On Thu, Feb 21, 2013 at 02:15:14PM -0700, Alexandre Oliva wrote:
> I've experienced filesystem freezes with permanent spikes in the active
> process count for quite a while, particularly on filesystems whose
> available raw space has already been fully allocated to chunks.
>
> While looking into this, I found a pretty obvious error in
> do_chunk_alloc: it sets space_info->chunk_alloc, but if
> btrfs_alloc_chunk returns an error other than ENOSPC, it returns
> leaving that flag set, which causes any other threads waiting for
> space_info->chunk_alloc to become zero to spin indefinitely.
>
> I haven't double-checked that this patch fixes the failure I've
> observed fully (it's not exactly trivial to trigger), but it surely is
> a bug and the fix is trivial, so... Please put it in :-)

Yup putting in btrfs-next, thanks.

Josef
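The bug Alexandre describes is a "busy" flag that is set on entry but not cleared on one of the error exits. A single-threaded userspace model of the corrected shape is sketched below. It is an illustration under assumptions, not the kernel function: the struct, the function-pointer allocator and all names here are made up for the sketch, and locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the two space_info fields involved. */
struct space_info_model {
    int chunk_alloc;  /* nonzero while an allocation is in progress */
    int full;         /* nonzero once no more chunks can be allocated */
};

/* Allocator stub: returns 0, -ENOSPC, or another negative errno. */
typedef int (*alloc_fn)(void);

static int do_chunk_alloc_model(struct space_info_model *si, alloc_fn alloc)
{
    int ret;

    assert(!si->chunk_alloc);  /* model: no allocation already running */
    si->chunk_alloc = 1;
    ret = alloc();
    if (ret == -ENOSPC)
        si->full = 1;          /* remember that the space is exhausted */
    /*
     * Clear the flag on every exit path. Leaving it set after an error
     * other than ENOSPC is what makes waiters spin forever.
     */
    si->chunk_alloc = 0;
    return ret;
}

/* Stubs standing in for the possible allocator outcomes. */
static int alloc_ok(void)     { return 0; }
static int alloc_enospc(void) { return -ENOSPC; }
static int alloc_eio(void)    { return -EIO; }
```

Whatever the allocator returns, a later caller in this model finds chunk_alloc back at zero, which is the property the fix restores.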
Re: [PATCH 1/2] Btrfs: create the qgroup that limits root subvolume automatically
2013/2/22 Arne Jansen <sensi...@gmx.net>:
> On 02/22/13 13:02, Wang Shilong wrote:
>> From: Wang Shilong <wangsl-f...@cn.fujitsu.com>
>>
>> Creating the root subvolume qgroup when enabling quota,with
>
> Why only create a qgroup for the root subvolume and not for every
> existing subvolume?

Yes, you are right. Creating qgroups for all existing subvolumes is
necessary when enabling quota, since we are trying to prevent users from
creating level 0 qgroups; the subvolume/snapshot qgroups should be
managed automatically. After this work, I think it is also necessary to
delete the subvolume/snapshot qgroup when the subvolume/snapshot is
deleted.

BTW, there is one thing to think about: while quota is being enabled, no
new subvolume should be created until the enabling is done.

I will try to implement this.

>> this patch,it will be ok to limit the whole filesystem size.
>
> This will not limit the whole filesystem, but only the root subvolume.
> To limit the whole filesystem you'd have to create a level 1 qgroup
> and add all subvolumes to it.

Right, thanks for correcting it.

Thanks,
Wang

> -Arne
>
>> Signed-off-by: Wang Shilong <wangsl-f...@cn.fujitsu.com>
>> Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
>> Cc: Arne Jansen <sensi...@gmx.net>
>> [...]
Re: [RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use
Hello,

2013/2/22 Arne Jansen sensi...@gmx.net:
> On 02/22/13 13:09, Wang Shilong wrote:
> > From: Wang Shilong wangsl-f...@cn.fujitsu.com
> >
> > This patch stops users from creating or destroying level 0 qgroups; users can only create or destroy qgroups whose level is greater than 0.
> >
> > The reasoning: a subvolume/snapshot qgroup is created automatically when the subvolume/snapshot is created, so a manually created level 0 qgroup can never be a subvolume/snapshot qgroup. The only way to use it would be to assign subvolume/snapshot qgroups to it, and the point is that we don't want a parent qgroup whose level is 0.
> >
> > So we want to force users to use qgroups with clear relations, which means a parent qgroup's level > child qgroup's level. For example:
> >
> >             2/0
> >            /   \
> >          1/0   1/1
> >         /   \     \
> >     0/256  0/257  0/258
> >
> > This quota layout is natural and easy for users to understand; anything else makes the quota configuration confusing and difficult to maintain.
>
> I agree that a strict hierarchy of the levels should be enforced. Currently the kernel has no idea of 'level'; it's just an artificial concept that lives in userspace. This patch would be the first place to add that magic shift '48' to the kernel. In my opinion it would be sufficient to do the enforcement in user space, as it is of no technical nature.

I have made some patches for this work in btrfs-progs, but they have not been merged yet. I will pick up those patches and do the other necessary work.

Thanks,
Wang

> -Arne
>
> > Signed-off-by: Wang Shilong wangsl-f...@cn.fujitsu.com
> > Acked-by: Miao Xie mi...@cn.fujitsu.com
> > Cc: Arne Jansen sensi...@gmx.net
> > ---
> >  fs/btrfs/ioctl.c | 2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index a31cd93..3590c21 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -3755,7 +3755,7 @@ static long btrfs_ioctl_qgroup_create(struct file *file, void __user *arg)
> >  		goto drop_write;
> >  	}
> >
> > -	if (!sa->qgroupid) {
> > +	if (!(sa->qgroupid >> 48)) {
> >  		ret = -EINVAL;
> >  		goto out;
> >  	}
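For readers unfamiliar with the "magic shift '48'" discussed above, here is a rough userspace sketch of the arithmetic. It assumes, as the quoted `qgroupid >> 48` check implies, that the level occupies the bits above bit 48 of the 64-bit qgroupid and that the remaining low 48 bits carry the per-level number. The function names are hypothetical helpers made up for this illustration; they are not part of btrfs-progs or the kernel.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the 48-bit shift; not a btrfs-progs API.

# Level: whatever remains after shifting out the low 48 bits.
qgroup_level() {
    echo $(( $1 >> 48 ))
}

# Low 48 bits: the per-level number (e.g. the 256 in 0/256).
qgroup_number() {
    echo $(( $1 & ((1 << 48) - 1) ))
}

# Build a numeric id from "level/number" notation such as 1/0 or 0/256.
qgroup_make_id() {
    level=${1%%/*}
    number=${1##*/}
    echo $(( (level << 48) | number ))
}

# Mirrors the check in the quoted patch: an id whose level is 0 is rejected.
qgroup_create_allowed() {
    [ "$(qgroup_level "$1")" -ne 0 ]
}
```

With these helpers, `qgroup_make_id 1/0` gives a value whose level is 1, so the check passes, while `qgroup_make_id 0/256` is simply 256 and would be refused. Whether that check belongs in the kernel or in userspace is exactly what the thread is debating.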
Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=906142
>
> With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following error. It can take hours of use before I get a reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the filesystem back after a reboot. No data appears to be lost, and a scrub runs to completion with no errors.

Could you do

    gdb btrfs.ko
    list *(btrfs_log_inode+0x3b8)

and tell me what it says?

Thanks,
Josef
copy on write misconception
I think I have a misconception of what copy on write in btrfs means for individual files. I had originally thought that I could create a large file:

    time dd if=/dev/zero of=10G bs=1G count=10
    10+0 records in
    10+0 records out
    10737418240 bytes (11 GB) copied, 100.071 s, 107 MB/s

    real    1m41.082s
    user    0m0.000s
    sys     0m7.792s

Then, if I copied this file, no blocks would be copied until they were written, so the two files would use the same blocks underneath. More specifically, the copy would be fast, since it would only need to write some metadata. But when I copy the file:

    time cp 10G 10G2

    real    3m38.790s
    user    0m0.124s
    sys     0m10.709s

Oddly enough, it actually takes longer than the initial file creation. So I am guessing that the long duration of the copy is expected and that a fast copy is not one of the virtues of btrfs copy on write. Does that sound right?

I was looking at a virtual machine solution and thought btrfs would be great if I could copy the VM disk to a new file at low cost, then launch that VM and customize it to my needs.

OS: Ubuntu 12.10

Mike Power
Re: copy on write misconception
On Fri, Feb 22, 2013 at 09:11:28AM -0800, Mike Power wrote:
> I think I have a misconception of what copy on write in btrfs means for individual files. I had originally thought that I could create a large file:
>
>     time dd if=/dev/zero of=10G bs=1G count=10
>     10+0 records in
>     10+0 records out
>     10737418240 bytes (11 GB) copied, 100.071 s, 107 MB/s
>
>     real    1m41.082s
>     user    0m0.000s
>     sys     0m7.792s
>
> Then if I copied this file no blocks would be copied until they are written. Hence the two files would use the same blocks underneath. But specifically that copy would be fast. Since it would only need to write some metadata. But when I copy the file:
>
>     time cp 10G 10G2
>
>     real    3m38.790s
>     user    0m0.124s
>     sys     0m10.709s
>
> Oddly enough it actually takes longer then the initial file creation. So I am guessing that the long duration copy of the file is expected and that is not one of the virtues of btrfs copy on write. Does that sound right?

You probably want cp --reflink=always, which makes a CoW copy of the file's metadata only. The resulting files have the semantics of two different files, but share their blocks until a part of one of them is modified (at which point, the modified blocks are no longer shared).

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I don't like the look of it, I tell you. Well, stop looking at it, then. ---
Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
On Fri, Feb 22, 2013 at 11:53 AM, Josef Bacik jba...@fusionio.com wrote:
> On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote:
> > https://bugzilla.redhat.com/show_bug.cgi?id=906142
> >
> > With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following error. It can take hours of use before I get a reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the filesystem back after a reboot. No data appears to be lost, and a scrub runs to completion with no errors.
>
> Could you do
>
>     gdb btrfs.ko
>     list *(btrfs_log_inode+0x3b8)
>
> and tell me what it says?
>
> Thanks,
> Josef

# uname -r
3.8.0-0.rc7.git0.1.fc19.x86_64
# gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko
(gdb) list *(btrfs_log_inode+0x3b8)
0x675b8 is in btrfs_log_inode (fs/btrfs/tree-log.c:3633).
3628
3629    log_extents:
3630            if (fast_search) {
3631                    btrfs_release_path(dst_path);
3632                    ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
3633                    if (ret) {
3634                            err = ret;
3635                            goto out_unlock;
3636                    }
3637            } else {
(gdb)
Re: copy on write misconception
> Then if I copied this file no blocks would be copied until they are written. Hence the two files would use the same blocks underneath. But specifically that copy would be fast. Since it would only need to write some metadata. But when I copy the file:
>
>     time cp 10G 10G2

cp without options still does a regular copy; btrfs does nothing to de-duplicate writes.

    cp --reflink 10G 10G2

will give you the results you expect.
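To make the reflink behaviour concrete, here is a small sketch. It assumes GNU coreutils `cp`; the file names and the 1 MiB size are arbitrary choices for illustration. It uses `--reflink=auto` rather than `--reflink=always` so that it also runs on filesystems without reflink support, where cp falls back to an ordinary copy.

```shell
#!/bin/sh
# Illustrative only: names and sizes are made up for this example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small source file (1 MiB of zeros).
dd if=/dev/zero of=src bs=1024 count=1024 2>/dev/null

# Reflink copy: on btrfs this only writes metadata and shares the data
# extents; with =auto, cp falls back to a regular copy elsewhere.
cp --reflink=auto src reflink_copy

# The two names still behave as separate files. Overwriting the first
# bytes of the copy does not touch the source; on btrfs only the
# modified blocks stop being shared.
printf 'changed' | dd of=reflink_copy conv=notrunc 2>/dev/null

ls -l src reflink_copy
```

Both files report the same size afterwards, but only `reflink_copy` starts with the new bytes. On btrfs the copy step should be close to instant regardless of file size, which is the effect the original poster was expecting from a plain cp.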
Re: copy on write misconception
On 02/22/2013 09:16 AM, Hugo Mills wrote:
> On Fri, Feb 22, 2013 at 09:11:28AM -0800, Mike Power wrote:
> > I think I have a misconception of what copy on write in btrfs means for individual files. I had originally thought that I could create a large file:
> >
> >     time dd if=/dev/zero of=10G bs=1G count=10
> >     10+0 records in
> >     10+0 records out
> >     10737418240 bytes (11 GB) copied, 100.071 s, 107 MB/s
> >
> >     real    1m41.082s
> >     user    0m0.000s
> >     sys     0m7.792s
> >
> > Then if I copied this file no blocks would be copied until they are written. Hence the two files would use the same blocks underneath. But specifically that copy would be fast. Since it would only need to write some metadata. But when I copy the file:
> >
> >     time cp 10G 10G2
> >
> >     real    3m38.790s
> >     user    0m0.124s
> >     sys     0m10.709s
> >
> > Oddly enough it actually takes longer then the initial file creation. So I am guessing that the long duration copy of the file is expected and that is not one of the virtues of btrfs copy on write. Does that sound right?
>
> You probably want cp --reflink=always, which makes a CoW copy of the file's metadata only. The resulting files have the semantics of two different files, but share their blocks until a part of one of them is modified (at which point, the modified blocks are no longer shared).
>
> Hugo.

I see, and it works great:

    time cp --reflink=always 10G 10G3

    real    0m0.028s
    user    0m0.000s
    sys     0m0.000s

So from the user perspective I might say I want to opt out of this feature, not opt in. I want all copies by all applications done as copy on write. But if my understanding is correct, that is up to the application being called (in this case cp) and how it in turn makes calls to the system. In short, I can't remount the btrfs filesystem with some new args that say "always copy on write files", because that is what it already does.

Mike Power
Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
On Fri, Feb 22, 2013 at 10:22:04AM -0700, Mace Moneta wrote:
> On Fri, Feb 22, 2013 at 11:53 AM, Josef Bacik jba...@fusionio.com wrote:
> > On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=906142
> > >
> > > With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following error. It can take hours of use before I get a reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the filesystem back after a reboot. No data appears to be lost, and a scrub runs to completion with no errors.
> >
> > Could you do
> >
> >     gdb btrfs.ko
> >     list *(btrfs_log_inode+0x3b8)
> >
> > and tell me what it says?
> >
> > Thanks,
> > Josef
>
> # uname -r
> 3.8.0-0.rc7.git0.1.fc19.x86_64
> # gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko

Sigh, sorry, I missed the other line because of line wrapping. Can you do

    list *(btrfs_log_changed_extents+0x384)

Thanks,
Josef
Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
On Fri, Feb 22, 2013 at 12:44 PM, Josef Bacik jba...@fusionio.com wrote:
> On Fri, Feb 22, 2013 at 10:22:04AM -0700, Mace Moneta wrote:
> > On Fri, Feb 22, 2013 at 11:53 AM, Josef Bacik jba...@fusionio.com wrote:
> > > On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote:
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=906142
> > > >
> > > > With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following error. It can take hours of use before I get a reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the filesystem back after a reboot. No data appears to be lost, and a scrub runs to completion with no errors.
> > >
> > > Could you do
> > >
> > >     gdb btrfs.ko
> > >     list *(btrfs_log_inode+0x3b8)
> > >
> > > and tell me what it says?
> > >
> > > Thanks,
> > > Josef
> >
> > # uname -r
> > 3.8.0-0.rc7.git0.1.fc19.x86_64
> > # gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko
>
> Sigh sorry, I miseed the other line because of line wrapping, can you do
>
>     list *(btrfs_log_changed_extents+0x384)
>
> Thanks,
> Josef

(gdb) list *(btrfs_log_changed_extents+0x384)
0x65264 is in btrfs_log_changed_extents (fs/btrfs/ctree.h:2731).
2726                       generation, 64);
2727    BTRFS_SETGET_FUNCS(file_extent_disk_num_bytes, struct btrfs_file_extent_item,
2728                       disk_num_bytes, 64);
2729    BTRFS_SETGET_FUNCS(file_extent_offset, struct btrfs_file_extent_item,
2730                       offset, 64);
2731    BTRFS_SETGET_FUNCS(file_extent_num_bytes, struct btrfs_file_extent_item,
2732                       num_bytes, 64);
2733    BTRFS_SETGET_FUNCS(file_extent_ram_bytes, struct btrfs_file_extent_item,
2734                       ram_bytes, 64);
2735    BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item,
(gdb)
Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
On Fri, Feb 22, 2013 at 10:52:19AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 12:44 PM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 10:22:04AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 11:53 AM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote: https://bugzilla.redhat.com/show_bug.cgi?id=906142 With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following error. It can take hours of use before I get a reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the filesystem back after a reboot. No data appears to be lost, and a scrub runs to completion with no errors. Could you do gdb btrfs.ko list *(btrfs_log_inode+0x3b8) and tell me what it says? Thanks, Josef # uname -r 3.8.0-0.rc7.git0.1.fc19.x86_64 # gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko Sigh sorry, I miseed the other line because of line wrapping, can you do list *(btrfs_log_changed_extents+0x384) Thanks, Josef (gdb) list *(btrfs_log_changed_extents+0x384) 0x65264 is in btrfs_log_changed_extents (fs/btrfs/ctree.h:2731). 2726 generation, 64); 2727BTRFS_SETGET_FUNCS(file_extent_disk_num_bytes, struct btrfs_file_extent_item, 2728 disk_num_bytes, 64); 2729BTRFS_SETGET_FUNCS(file_extent_offset, struct btrfs_file_extent_item, 2730 offset, 64); 2731BTRFS_SETGET_FUNCS(file_extent_num_bytes, struct btrfs_file_extent_item, 2732 num_bytes, 64); 2733BTRFS_SETGET_FUNCS(file_extent_ram_bytes, struct btrfs_file_extent_item, 2734 ram_bytes, 64); 2735BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item, (gdb) Ok nothing obvious is jumping out at me, anything specifc to your btrfs setup? Mount options, raid etc. I'm going to setup encfs up here and hammer it with fsstress and see if I can reproduce. 
Thanks,
Josef
Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
On Fri, Feb 22, 2013 at 1:10 PM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 10:52:19AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 12:44 PM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 10:22:04AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 11:53 AM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote: https://bugzilla.redhat.com/show_bug.cgi?id=906142 With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following error. It can take hours of use before I get a reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the filesystem back after a reboot. No data appears to be lost, and a scrub runs to completion with no errors. Could you do gdb btrfs.ko list *(btrfs_log_inode+0x3b8) and tell me what it says? Thanks, Josef # uname -r 3.8.0-0.rc7.git0.1.fc19.x86_64 # gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko Sigh sorry, I miseed the other line because of line wrapping, can you do list *(btrfs_log_changed_extents+0x384) Thanks, Josef (gdb) list *(btrfs_log_changed_extents+0x384) 0x65264 is in btrfs_log_changed_extents (fs/btrfs/ctree.h:2731). 2726 generation, 64); 2727BTRFS_SETGET_FUNCS(file_extent_disk_num_bytes, struct btrfs_file_extent_item, 2728 disk_num_bytes, 64); 2729BTRFS_SETGET_FUNCS(file_extent_offset, struct btrfs_file_extent_item, 2730 offset, 64); 2731BTRFS_SETGET_FUNCS(file_extent_num_bytes, struct btrfs_file_extent_item, 2732 num_bytes, 64); 2733BTRFS_SETGET_FUNCS(file_extent_ram_bytes, struct btrfs_file_extent_item, 2734 ram_bytes, 64); 2735BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item, (gdb) Ok nothing obvious is jumping out at me, anything specifc to your btrfs setup? Mount options, raid etc. I'm going to setup encfs up here and hammer it with fsstress and see if I can reproduce. 
> Thanks,
> Josef

The btrfs mount options I'm using are:

    subvol=home,noatime,autodefrag

The encfs is mounted with default options.
Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
On Fri, Feb 22, 2013 at 1:16 PM, Mace Moneta moneta.m...@gmail.com wrote: On Fri, Feb 22, 2013 at 1:10 PM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 10:52:19AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 12:44 PM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 10:22:04AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 11:53 AM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote: https://bugzilla.redhat.com/show_bug.cgi?id=906142 With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following error. It can take hours of use before I get a reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the filesystem back after a reboot. No data appears to be lost, and a scrub runs to completion with no errors. Could you do gdb btrfs.ko list *(btrfs_log_inode+0x3b8) and tell me what it says? Thanks, Josef # uname -r 3.8.0-0.rc7.git0.1.fc19.x86_64 # gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko Sigh sorry, I miseed the other line because of line wrapping, can you do list *(btrfs_log_changed_extents+0x384) Thanks, Josef (gdb) list *(btrfs_log_changed_extents+0x384) 0x65264 is in btrfs_log_changed_extents (fs/btrfs/ctree.h:2731). 2726 generation, 64); 2727BTRFS_SETGET_FUNCS(file_extent_disk_num_bytes, struct btrfs_file_extent_item, 2728 disk_num_bytes, 64); 2729BTRFS_SETGET_FUNCS(file_extent_offset, struct btrfs_file_extent_item, 2730 offset, 64); 2731BTRFS_SETGET_FUNCS(file_extent_num_bytes, struct btrfs_file_extent_item, 2732 num_bytes, 64); 2733BTRFS_SETGET_FUNCS(file_extent_ram_bytes, struct btrfs_file_extent_item, 2734 ram_bytes, 64); 2735BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item, (gdb) Ok nothing obvious is jumping out at me, anything specifc to your btrfs setup? Mount options, raid etc. I'm going to setup encfs up here and hammer it with fsstress and see if I can reproduce. 
> > Thanks,
> > Josef
>
> The btrfs mount options I'm using are:
>
>     subvol=home,noatime,autodefrag
>
> The encfs is mounted with default options.

Oh, and there's no raid data, just a single drive. I don't do heavy I/O to the encfs, which may explain why it takes minutes to hours to recreate. I have my google-chrome config directory (cache, profile, passwords, etc.) in the encfs, so it's getting read/written as I browse.

# btrfs fi show
failed to read /dev/sr0
Label: 'btrfs'  uuid: 057239ee-1cc7-44b2-8fa3-714661dfa7fe
        Total devices 1 FS bytes used 39.06GB
        devid    1 size 455.58GB used 77.04GB path /dev/sda3

Btrfs Btrfs v0.19
# btrfs fi df /home
Data: total=58.01GB, used=38.46GB
System, DUP: total=8.00MB, used=16.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=9.50GB, used=611.59MB
Metadata: total=8.00MB, used=0.00
Btrfs Btrfs v0.19
Re: copy on write misconception
On Fri, Feb 22, 2013 at 11:41 AM, Mike Power dodts...@gmail.com wrote:
> On 02/22/2013 09:16 AM, Hugo Mills wrote:
> > On Fri, Feb 22, 2013 at 09:11:28AM -0800, Mike Power wrote:
> > > I think I have a misconception of what copy on write in btrfs means for individual files. I had originally thought that I could create a large file:
> > >
> > >     time dd if=/dev/zero of=10G bs=1G count=10
> > >     10+0 records in
> > >     10+0 records out
> > >     10737418240 bytes (11 GB) copied, 100.071 s, 107 MB/s
> > >
> > >     real    1m41.082s
> > >     user    0m0.000s
> > >     sys     0m7.792s
> > >
> > > Then if I copied this file no blocks would be copied until they are written. Hence the two files would use the same blocks underneath. But specifically that copy would be fast. Since it would only need to write some metadata. But when I copy the file:
> > >
> > >     time cp 10G 10G2
> > >
> > >     real    3m38.790s
> > >     user    0m0.124s
> > >     sys     0m10.709s
> > >
> > > Oddly enough it actually takes longer then the initial file creation. So I am guessing that the long duration copy of the file is expected and that is not one of the virtues of btrfs copy on write. Does that sound right?
> >
> > You probably want cp --reflink=always, which makes a CoW copy of the file's metadata only. The resulting files have the semantics of two different files, but share their blocks until a part of one of them is modified (at which point, the modified blocks are no longer shared).
> >
> > Hugo.
>
> I see, and it works great:
>
>     time cp --reflink=always 10G 10G3
>
>     real    0m0.028s
>     user    0m0.000s
>     sys     0m0.000s
>
> So from the user perspective I might say I want to opt out of this feature not optin. I want all copies by all applications done as a copy on write. But if my understanding is correct that is up to the application being called (in this case cp) and how it in turns makes calls to the system. In short I can't remount the btrfs filesystem with some new args that says always copy on write files because that is what it already.

There's no "copy a file" syscall; when a program copies a file, it opens a new file and writes all the bytes from the old one to the new one. Converting this to a reflink would require btrfs to implement full de-dup (which is rather expensive), and still wouldn't prevent the program from reading and writing all 10 GB (and so wouldn't be any faster).

You can set an alias in your shell to make cp --reflink=auto the default, but that won't affect other programs, nor other users.
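As a sketch of that last suggestion: the alias form is shown as a comment, followed by an equivalent shell function. The function form is an addition made here, not something proposed in the thread; it is used because bash does not expand aliases in non-interactive shells unless `expand_aliases` is set. GNU coreutils `cp` is assumed, and the file names are arbitrary.

```shell
#!/bin/sh
# Interactive use, e.g. in ~/.bashrc (the alias suggested above):
#   alias cp='cp --reflink=auto'

# Function form, which also works inside scripts. `command cp` bypasses
# this function and runs the real cp, so there is no recursion.
cp() {
    command cp --reflink=auto "$@"
}

# Demo with throwaway files (names are arbitrary).
demo_dir=$(mktemp -d)
printf 'disk image contents\n' > "$demo_dir/vm.img"
cp "$demo_dir/vm.img" "$demo_dir/vm-clone.img"
```

Either form only changes what this one shell does when `cp` is typed. As the message says, other programs and other users still copy byte by byte.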
Re: [bug] mkfs.btrfs reports device busy for ext4 mounted disk
> Next, since previously we had btrfs on sdb and mkfs.ext4 does not overwrite super-block mirror 1.. so btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr) finds btrfs on sdb.

btrfs-progs shouldn't be unconditionally trusting the backup superblocks if the primary is garbage. It should only check the backups if the user specifically asks it to.

> unless I am missing something. wipefs (along with the below patch) [PATCH][v2] Btrfs: wipe all the superblock [redhat bugzilla 889888] seems to be only solution as of now.

This is good practice and will work around the bug in btrfs-progs for now.

- z
Re: WARNING: at fs/btrfs/extent_io.c:4718 map_private_extent_buffer+0xd4/0xe0 [btrfs]()
On Fri, Feb 22, 2013 at 11:31:07AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 1:16 PM, Mace Moneta moneta.m...@gmail.com wrote: On Fri, Feb 22, 2013 at 1:10 PM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 10:52:19AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 12:44 PM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 10:22:04AM -0700, Mace Moneta wrote: On Fri, Feb 22, 2013 at 11:53 AM, Josef Bacik jba...@fusionio.com wrote: On Fri, Feb 22, 2013 at 07:46:16AM -0700, Mace Moneta wrote: https://bugzilla.redhat.com/show_bug.cgi?id=906142 With 3.8 kernels in Fedora 18, using encfs on btrfs I get the following error. It can take hours of use before I get a reoccurrence, and I need to btrfsck, btrfs-zero-log, and/or mount with '-o recovery' to get the filesystem back after a reboot. No data appears to be lost, and a scrub runs to completion with no errors. Could you do gdb btrfs.ko list *(btrfs_log_inode+0x3b8) and tell me what it says? Thanks, Josef # uname -r 3.8.0-0.rc7.git0.1.fc19.x86_64 # gdb /usr/lib/modules/3.8.0-0.rc7.git0.1.fc19.x86_64/kernel/fs/btrfs/btrfs.ko Sigh sorry, I miseed the other line because of line wrapping, can you do list *(btrfs_log_changed_extents+0x384) Thanks, Josef (gdb) list *(btrfs_log_changed_extents+0x384) 0x65264 is in btrfs_log_changed_extents (fs/btrfs/ctree.h:2731). 2726 generation, 64); 2727BTRFS_SETGET_FUNCS(file_extent_disk_num_bytes, struct btrfs_file_extent_item, 2728 disk_num_bytes, 64); 2729BTRFS_SETGET_FUNCS(file_extent_offset, struct btrfs_file_extent_item, 2730 offset, 64); 2731BTRFS_SETGET_FUNCS(file_extent_num_bytes, struct btrfs_file_extent_item, 2732 num_bytes, 64); 2733BTRFS_SETGET_FUNCS(file_extent_ram_bytes, struct btrfs_file_extent_item, 2734 ram_bytes, 64); 2735BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item, (gdb) Ok nothing obvious is jumping out at me, anything specifc to your btrfs setup? Mount options, raid etc. 
> > > I'm going to setup encfs up here and hammer it with fsstress and see if I can reproduce.
> > >
> > > Thanks,
> > > Josef
> >
> > The btrfs mount options I'm using are:
> >
> >     subvol=home,noatime,autodefrag
> >
> > The encfs is mounted with default options.
>
> Oh, and there's no raid data, just a single drive. I don't do heavy I/O to the encfs, which may explain why it takes minutes to hours to recreate. I have my google-chrome config directory (cache, profile, passwords, etc.) in the encfs, so it's getting read/written as I browse.

So in case I can't reproduce it, can you build btrfs-next and see if it reproduces there? If it does, perfect: I can send you debug patches to apply and such.

Thanks,
Josef
Re: [PATCH] Btrfs: update inode flags when renaming
On Fri, Feb 22, 2013 at 05:34:47PM +0800, Miao Xie wrote:
> On Fri, 22 Feb 2013 16:40:35 +0800, Liu Bo wrote:
> > On Fri, Feb 22, 2013 at 03:32:50AM -0500, Marios Titas wrote:
> > > Sorry, but the bug persists even with the above patch.
> > >
> > >     touch test
> > >     chattr +C test
> > >     lsattr test
> > >     mv test test2
> > >     lsattr test2
> > >
> > > In the above scenario test2 will not have the C flag.
> >
> > What do you expect? IMO it's right that test2 does not have the C flag.
>
> No, it's not right. Users expect the C flag not to be lost, because they are just doing a rename operation, but fixup_inode_flags() re-sets the flags from the parent directory's flags. I think we should inherit the flags from the parent only when we create a new file/directory; in the other cases, just give the users an option. What do you think?

I agree with that. The COW status of a file should not be changed at all when it is renamed. The typical users are database files and VM images; losing the NOCOW flag just from moving a file somewhere and back is quite unexpected.

david
Re: [PATCH] Btrfs: update inode flags when renaming
On Fri, Feb 22, 2013 at 04:19:27PM -0500, Marios Titas wrote:
> I think that many end users will find all this very confusing. They will never expect that renaming a file will cause it to suddenly lose one flag (NODATACOW) while preserving the other (NODATASUM). Especially since they cannot explicitly control the NODATASUM flag on a per file basis. I think that renaming a file should preserve all flags no matter if it's done in the same directory or not. Just like it preserves permissions, ownership and inode number.

I agree. For completeness, the other inherited flags/attributes are the compression statuses. Silently changing them on rename may be wrong in case the file gains the 'never try to compress' flag after some clever heuristic (which we do not have yet) decides so. So I think inheriting the flags from the parent on rename is not a good idea either.

> Interestingly enough, files don't lose any of the two flags if instead of renaming you link and then unlink the original.

Link does not take the same codepath as new file or rename. A new directory entry is created and the link count is increased.

david
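The two ways of giving a file a new name that are compared above can be sketched from the shell as follows. This only demonstrates the part that is true on any POSIX filesystem, namely that both paths keep the same inode. The flag behaviour itself needs a btrfs mount and is therefore only indicated in comments. File names and the `inode_of` helper are made up for this illustration.

```shell
#!/bin/sh
# Illustrative sketch; file names are arbitrary.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical helper: the first field of `ls -i` is the inode number.
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

# Path 1: rename (the code path discussed in this thread).
touch a
ino_a=$(inode_of a)
mv a a_renamed
ino_a_renamed=$(inode_of a_renamed)

# Path 2: add a second directory entry, then remove the original one.
touch b
ino_b=$(inode_of b)
ln b b_linked
rm b
ino_b_linked=$(inode_of b_linked)

# On btrfs one could additionally run `chattr +C` on a and b right after
# creating them and `lsattr` at the end of each path to compare the
# flags; that is omitted here because it requires a btrfs mount.
echo "rename:      $ino_a -> $ino_a_renamed"
echo "link+unlink: $ino_b -> $ino_b_linked"
```

From the user's point of view both sequences end with the same inode under a new name, which is why the difference in flag handling reported in the thread is surprising.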
Re: [bug] mkfs.btrfs reports device busy for ext4 mounted disk
On Fri, Feb 22, 2013 at 11:03:25AM -0800, Zach Brown wrote:
> > Next, since previously we had btrfs on sdb and mkfs.ext4 does not overwrite super-block mirror 1.. so btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr) finds btrfs on sdb.
>
> btrfs-progs shouldn't be unconditionally trusting the backup superblocks if the primary is garbage. It should only check the backups if the user specifically asks it to.

Agreed. Let me add that all the rescue tools should accept a parameter to pick the backup superblocks. Currently these are fsck -s, select-super -s and restore -u (though I'd like to see all the option names unified; 'S' is my candidate that would not break compatibility).

david
Re: [RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use
On Sat, Feb 23, 2013 at 12:39:24AM +0800, Shilong Wang wrote:
> Hello,
>
> 2013/2/22 Arne Jansen sensi...@gmx.net:
> > On 02/22/13 13:09, Wang Shilong wrote:
> > > From: Wang Shilong wangsl-f...@cn.fujitsu.com
> > >
> > > This patch tries to stop users to create/destroy qgroup level 0, users can only create/destroy qgroup level more than 0.
> > >
> > > See the fact: a subvolume/snapshot qgroup was created automatically when creating subvolume/snapshot, so creating a qgroup level 0 can't be a subvolume/snapshot qgroup, the only way to use it is that assigning subvolume/snapshot qgroup to it, the point is that we don't want to have a parent qgroup whose level is 0.
> > >
> > > So we want to force users to use qgroup with clear relations which means a parent qgroup's level > child qgroup's level. For example:
> > >
> > >             2/0
> > >            /   \
> > >          1/0   1/1
> > >         /   \     \
> > >     0/256  0/257  0/258
> > >
> > > This pattern of quota is natural and easy for users to understand, otherwise it will make the quota configuration confusing and difficult to maintain.
> >
> > I agree that a strict hierarchy of the levels should be enforced. Currently the kernel has no idea of 'level', it's just an artificial concept that lives in userspace. This patch would be the first place to add that magic shift '48' to the kernel. In my opinion it would be sufficient to do the enforcement in user space, as it is of no technical nature.
>
> ...i have made some patches about these work in btrfs-prog, but it has been not merged... I will pick up thoses patches and do the other necessary work..

This one?

https://patchwork.kernel.org/patch/2008591/

It went through the integration branch into progs' master.

david
Re: [RFC][PATCH] btrfs: clean snapshots one by one
On Sun, Feb 17, 2013 at 09:55:23PM +0200, Alex Lyakas wrote:
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -1635,15 +1635,17 @@ static int cleaner_kthread(void *arg)
> >  	struct btrfs_root *root = arg;
> >
> >  	do {
> > +		int again = 0;
> > +
> >  		if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
> >  		    mutex_trylock(&root->fs_info->cleaner_mutex)) {
> >  			btrfs_run_delayed_iputs(root);
> > -			btrfs_clean_old_snapshots(root);
> > +			again = btrfs_clean_one_deleted_snapshot(root);
> >  			mutex_unlock(&root->fs_info->cleaner_mutex);
> >  			btrfs_run_defrag_inodes(root->fs_info);
> >  		}
> >
> > -		if (!try_to_freeze()) {
> > +		if (!try_to_freeze() && !again) {
> >  			set_current_state(TASK_INTERRUPTIBLE);
> >  			if (!kthread_should_stop())
> >  				schedule();
> > @@ -3301,8 +3303,8 @@ int btrfs_commit_super(struct btrfs_root *root)
> >
> >  	mutex_lock(&root->fs_info->cleaner_mutex);
> >  	btrfs_run_delayed_iputs(root);
> > -	btrfs_clean_old_snapshots(root);
> >  	mutex_unlock(&root->fs_info->cleaner_mutex);
> > +	wake_up_process(root->fs_info->cleaner_kthread);
>
> I am probably missing something, but if the cleaner wakes up here, won't it attempt cleaning the next snap? Because I don't see the cleaner checking anywhere that we are unmounting. Or at this point dead_roots is supposed to be empty?

No, you're right, the check of umount semaphore is missing (was in the dusted patchset and was titled 'avoid cleaner deadlock' which we solve now in another way, so I did not realize the patch is actually needed). So, this hunk should do it:

--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1627,11 +1627,13 @@ static int cleaner_kthread(void *arg)
 		int again = 0;

 		if (!(root->fs_info->sb->s_flags & MS_RDONLY) &&
+		    down_read_trylock(&root->fs_info->sb->s_umount) &&
 		    mutex_trylock(&root->fs_info->cleaner_mutex)) {
 			btrfs_run_delayed_iputs(root);
 			again = btrfs_clean_one_deleted_snapshot(root);
 			mutex_unlock(&root->fs_info->cleaner_mutex);
 			btrfs_run_defrag_inodes(root->fs_info);
+			up_read(&root->fs_info->sb->s_umount);
 		}

 		if (!try_to_freeze() && !again) {
---

Seems that also checking for btrfs_fs_closing != 0 would help here.
And to the second part, no dead_roots is not supposed to be empty. @@ -1783,31 +1783,50 @@ cleanup_transaction: } /* - * interface function to delete all the snapshots we have scheduled for deletion + * return 0 if error + * 0 if there are no more dead_roots at the time of call + * 1 there are more to be processed, call me again + * + * The return value indicates there are certainly more snapshots to delete, but + * if there comes a new one during processing, it may return 0. We don't mind, + * because btrfs_commit_super will poke cleaner thread and it will process it a + * few seconds later. */ -int btrfs_clean_old_snapshots(struct btrfs_root *root) +int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root) { - LIST_HEAD(list); + int ret; + int run_again = 1; struct btrfs_fs_info *fs_info = root-fs_info; + if (root-fs_info-sb-s_flags MS_RDONLY) { + pr_debug(G btrfs: cleaner called for RO fs!\n); + return 0; + } + spin_lock(fs_info-trans_lock); - list_splice_init(fs_info-dead_roots, list); + if (list_empty(fs_info-dead_roots)) { + spin_unlock(fs_info-trans_lock); + return 0; + } + root = list_first_entry(fs_info-dead_roots, + struct btrfs_root, root_list); + list_del(root-root_list); spin_unlock(fs_info-trans_lock); - while (!list_empty(list)) { - int ret; - - root = list_entry(list.next, struct btrfs_root, root_list); - list_del(root-root_list); + pr_debug(btrfs: cleaner removing %llu\n, + (unsigned long long)root-objectid); - btrfs_kill_all_delayed_nodes(root); + btrfs_kill_all_delayed_nodes(root); - if (btrfs_header_backref_rev(root-node) - BTRFS_MIXED_BACKREF_REV) - ret = btrfs_drop_snapshot(root, NULL, 0, 0); - else - ret =btrfs_drop_snapshot(root, NULL, 1, 0); - BUG_ON(ret 0); - } - return 0; + if (btrfs_header_backref_rev(root-node) + BTRFS_MIXED_BACKREF_REV) + ret = btrfs_drop_snapshot(root, NULL, 0,
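The "clean one, then ask to be called again" contract discussed in this thread can be sketched in plain userspace C. This is only an illustration, not the kernel code: `struct dead_root`, `struct fake_fs`, `clean_one_deleted_snapshot()`, `cleaner_pass()` and `cleaner_run()` are hypothetical stand-ins, the `unmounting` flag merely models a failed `down_read_trylock()` on `s_umount`, and error returns are left out. It also assumes the function returns 1 whenever it processed a root, which is one possible reading of the comment in the (truncated) patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a snapshot root queued for deletion. */
struct dead_root {
	unsigned long long objectid;
	struct dead_root *next;
};

/* Hypothetical stand-in for the filesystem state the cleaner looks at. */
struct fake_fs {
	struct dead_root *dead_roots;	/* singly linked queue of dead roots */
	int read_only;			/* models MS_RDONLY */
	int unmounting;			/* models a failed trylock on s_umount */
	int dropped;			/* number of roots processed so far */
};

/*
 * Process at most one queued root.
 * Returns 0 if there was nothing to do at the time of the call,
 * 1 if one root was processed and the caller should call again.
 */
int clean_one_deleted_snapshot(struct fake_fs *fs)
{
	struct dead_root *root;

	assert(fs != NULL);
	if (fs->read_only || fs->dead_roots == NULL)
		return 0;

	/* Unlink only the first entry; the rest stays queued. */
	root = fs->dead_roots;
	fs->dead_roots = root->next;
	root->next = NULL;
	fs->dropped++;
	return 1;
}

/*
 * One iteration of the cleaner loop body. A nonzero result means
 * "loop again right away", zero means "go to sleep".
 */
int cleaner_pass(struct fake_fs *fs)
{
	assert(fs != NULL);
	if (fs->unmounting)
		return 0;	/* never start new work during unmount */
	return clean_one_deleted_snapshot(fs);
}

/* Keep looping while there is work; return how many passes did work. */
int cleaner_run(struct fake_fs *fs)
{
	int passes = 0;

	while (cleaner_pass(fs))
		passes++;
	return passes;
}
```

With three queued roots, `cleaner_run()` makes three working passes and then stops; with `unmounting` set it does nothing and leaves the queue untouched, which is the behaviour the added `s_umount` check is meant to guarantee.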
Re: Rebalancing RAID1
On Mon, 18 Feb 2013, Stefan Behrens wrote:
> On Fri, 15 Feb 2013 22:56:19 +0100 (CET), Fredrik Tolf wrote:
> > The oops cut can be found here:
> > http://www.dolda2000.com/~fredrik/tmp/btrfs-oops
>
> This scrub issue is fixed since Linux 3.8-rc1 with commit
> 4ded4f6 Btrfs: fix BUG() in scrub when first superblock reading gives EIO

I see, thanks!

Rebooting the system did get me running again, allowing me to remove the missing device from the filesystem. However, I encountered a couple of somewhat strange happenings as I did that. I don't know if they're considered bugs or not, but I thought I had best report them.

To begin with, the act of removing the missing device from the filesystem itself caused the resynchronization to the new device to happen in blocking mode, so the "btrfs device delete missing" operation took about a day to finish. My expectation would have been that the device removal would have been a fast operation and that I would have had to scrub the filesystem or something in order to resynchronize, but I can see how this would be intended behavior.

However, what's weirder is that while the resynchronization was underway, I couldn't mount subvolumes on other mountpoints. The mount commands blocked (disk-slept) until the entire synchronization was done, and I don't think this was intended behavior, because I had the kernel saying the following while it happened:

Feb 16 06:01:27 nerv kernel: [ 3482.512106] INFO: task mount:3525 blocked for more than 120 seconds.
Feb 16 06:01:28 nerv kernel: [ 3482.518484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 16 06:01:28 nerv kernel: [ 3482.526324] mount D 88003e220e40 0 3525 3524 0x
Feb 16 06:01:28 nerv kernel: [ 3482.533587] 88003e220e40 0082 a0067470 88003e2300c0
Feb 16 06:01:28 nerv kernel: [ 3482.541088] 00013b40 88001126dfd8 00013b40 88001126dfd8
Feb 16 06:01:28 nerv kernel: [ 3482.548584] 00013b40 88003e220e40 00013b40 88001126c010
Feb 16 06:01:28 nerv kernel: [ 3482.556280] Call Trace:
Feb 16 06:01:28 nerv kernel: [ 3482.558776] [81396132] ? __mutex_lock_common+0x10d/0x175
Feb 16 06:01:28 nerv kernel: [ 3482.565078] [81396260] ? mutex_lock+0x1a/0x2c
Feb 16 06:01:28 nerv kernel: [ 3482.570661] [a05a38c2] ? btrfs_scan_one_device+0x40/0x133 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.577752] [a0564e8b] ? btrfs_mount+0x1c4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.584080] [810e56cb] ? pcpu_next_pop+0x37/0x43
Feb 16 06:01:28 nerv kernel: [ 3482.589709] [810e52c0] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.595226] [811012aa] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.601345] [81113778] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.606595] [811291f7] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.612292] [a056516b] ? btrfs_mount+0x4a4/0x4d8 [btrfs]
Feb 16 06:01:28 nerv kernel: [ 3482.618673] [810e52c0] ? cpumask_next+0x18/0x1a
Feb 16 06:01:28 nerv kernel: [ 3482.624178] [811012aa] ? alloc_pages_current+0xbb/0xd8
Feb 16 06:01:28 nerv kernel: [ 3482.630347] [81113778] ? mount_fs+0x6c/0x149
Feb 16 06:01:28 nerv kernel: [ 3482.635580] [811291f7] ? vfs_kern_mount+0x67/0xdd
Feb 16 06:01:28 nerv kernel: [ 3482.641258] [811292e0] ? do_kern_mount+0x49/0xd6
Feb 16 06:01:29 nerv kernel: [ 3482.646855] [81129a98] ? do_mount+0x72b/0x791
Feb 16 06:01:29 nerv kernel: [ 3482.652186] [81129b86] ? sys_mount+0x88/0xc3
Feb 16 06:01:29 nerv kernel: [ 3482.657464] [8139d229] ? system_call_fastpath+0x16/0x1b

Furthermore, it struck me that the consequences of having to mount a filesystem with missing devices using -o degraded can be a bit strange. I realize what the intentions of the behavior are, of course, but I think it might cause quite some difficulties when trying to mount a degraded btrfs filesystem as root on a system that you don't have physical access to, like a hosted server, because it might be hard to manipulate the boot process so as to pass that mount flag to the initrd.

Note that this is not a problem with md-raid; it will simply assemble its arrays in degraded mode automatically, without intervention. I'm not necessarily saying that's better, but I thought I should bring up the point.

--
Fredrik Tolf
Changing allocation mode
Dear list,

I'm still in the process of transferring all the data I have to the btrfs filesystem that you helped me debug in a previous thread, and I have a slight question, if you will humour me.

I have the data I want to transfer on an old ReiserFS partition, consisting of 2 mdraid mirrors, one of which consists of two 1.5 TB disks, and the other of two 3 TB disks. The btrfs I'm copying the data to consists of only two 3 TB disks that I have put in RAID-1 mode, and the data on the old filesystem is only slightly larger than 3 TB. I am now at the point where I have transferred just under 3 TB.

If I were transferring the data to a new filesystem on mdraid, the procedure I would use for that last portion of the data would be to remove one disk from either of the old mdraid mirror arrays (putting that array in degraded mode), and then create a new mirror in degraded mode with only that disk, add that mirror to the new filesystem, expand it, copy the last data, and then delete the old mirrors, moving the rest of the disks to the new filesystem.

Is there a way to mirror this procedure in btrfs? I'm not yet familiar enough with all btrfs concepts to know quite what I'm talking about, but I'm guessing that what I want to do is to temporarily set the allocator to allocate new extents on a single disk only, and then add a single disk to the filesystem. Then I would copy the rest of the data, abandon the old filesystem, add another disk and rebalance those singly-allocated extents to RAID-1 mode.

Have I described a conceivable idea in saying so? And if so, how does one actually do that? I don't know if I'm just blind, but I haven't found any btrfs command to change the allocation algorithm without having to rebalance the existing data, which seems a bit unnecessary in this case.

Thanks for any help you can offer!
--
Fredrik Tolf
Re: [Tests] xfs test[298]: Btrfs Quota testing
On Fri, Feb 22, 2013 at 11:24:04AM +0530, Hemanth Kumar wrote:
> Signed-off-by: Hemanth Kumar <hemanthkuma...@gmail.com>

Description?

> ---
>  298     | 37 +
>  298.out | 12
>  2 files changed, 49 insertions(+)
>  create mode 100644 298
>  create mode 100644 298.out

You didn't actually run this in the xfstests harness, did you? i.e.

$ ./check 298

because it won't run as there is no entry in the group file for this test.

> diff --git a/298 b/298
> new file mode 100644
> index 000..d699fb7
> --- /dev/null
> +++ b/298
> @@ -0,0 +1,37 @@
> +
> +#! /bin/bash
> +# FS QA Test No. 298

Newline at top of patch.

> +#
> +# Test btrfs's quotas
> +#
> +#--
> +#
> +# creator
> +#owner=hemanthkuma...@gmail.com

Copyright statement?

> +
> +
> +seq=`basename $0`
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1	# failure is the default!
> +
> +_cleanup()
> +{
> +    rm -rf $tmp.*
> +}
> +
> +trap "_cleanup ; exit \$status" 0 1 2 3 15
> +
> +#Enabeling btrfs qutas

Where are all the usual _require() statements?

> +btrfs quota enable $TEST_DIR
> +echo quota enabled on $TEST_DEV

That won't work - you're not allowed to output actual device names into the golden output. We always filter them to TEST_DEV/TEST_DIR or SCRATCH_DEV/SCRATCH_MNT.

> +btrfs subvolume create $TEST_DIR/vol1
> +echo vol1 created
> +btrfs qgroup show $TEST_DIR

That output will change for different filesystem configurations. Needs filtering, or dropping.

> +btrfs qgroup limit 2m $TEST_DIR/vol1
> +echo qgroup limited to 2mb
> +dd if=$TEST_DEV of=$TEST_DIR/vol1/file1 bs=3M count=1

You need to filter the output of dd to remove all variable data. However, it is preferable to use xfs_io for doing IO. i.e:

$XFS_IO_PROG -f -c "pwrite 0 3m" $TEST_DIR/vol1/file1 | filter_io

> +echo tried to write 3m worth data
> +exit

You never set status=0, so the test will always fail.

Also, you don't undo any of the modifications you made to the TEST_DEV, which means that it will affect all subsequent tests. If you are doing specific configuration tests, you should be using the SCRATCH_DEV/SCRATCH_MNT.

> diff --git a/298.out b/298.out
> new file mode 100644
> index 000..344ab7f
> --- /dev/null
> +++ b/298.out
> @@ -0,0 +1,12 @@
> +QA output created by 298
> +quota enabled on /dev/sdb5
> +Create subvolume '/test/vol1'
> +vol1 created
> +0/257 4096 4096
> +qgroup limited to 2mb
> +dd: writing ‘/test/vol1/file1’: Disk quota exceeded

You've got environment specific characters in your output.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
Re: [RESEND RFC PATCH 2/2] Btrfs: disable the qgroup level 0 for userspace use
Hello, David

2013/2/23 David Sterba <dste...@suse.cz>:
> On Sat, Feb 23, 2013 at 12:39:24AM +0800, Shilong Wang wrote:
> > Hello,
> >
> > 2013/2/22 Arne Jansen <sensi...@gmx.net>:
> > > On 02/22/13 13:09, Wang Shilong wrote:
> > > > From: Wang Shilong <wangsl-f...@cn.fujitsu.com>
> > > >
> > > > This patch tries to stop users from creating/destroying qgroups at
> > > > level 0; users can only create/destroy qgroups at a level greater
> > > > than 0.
> > > >
> > > > See the fact: a subvolume/snapshot qgroup is created automatically
> > > > when creating a subvolume/snapshot, so a manually created level 0
> > > > qgroup can't be a subvolume/snapshot qgroup. The only way to use it
> > > > is to assign a subvolume/snapshot qgroup to it, and the point is
> > > > that we don't want to have a parent qgroup whose level is 0.
> > > >
> > > > So we want to force users to use qgroups with clear relations, which
> > > > means a parent qgroup's level > child qgroup's level. For example:
> > > >
> > > >           2/0
> > > >           / \
> > > >          /   \
> > > >         /     \
> > > >       1/0     1/1
> > > >       / \       \
> > > >      /   \       \
> > > >     /     \       \
> > > >  0/256   0/257   0/258
> > > >
> > > > This pattern of quota is natural and easy for users to understand,
> > > > otherwise it will make the quota configuration confusing and
> > > > difficult to maintain.
> > >
> > > I agree that a strict hierarchy of the levels should be enforced.
> > > Currently the kernel has no idea of 'level', it's just an artificial
> > > concept that lives in userspace. This patch would be the first place
> > > to add that magic shift '48' to the kernel. In my opinion it would be
> > > sufficient to do the enforcement in user space, as it is of no
> > > technical nature.
> >
> > ...I have made some patches for this work in btrfs-progs, but they
> > have not been merged... I will pick up those patches and do the other
> > necessary work..
>
> This one? https://patchwork.kernel.org/patch/2008591/
> went through integration branch into progs' master.

Yes, it is. However, more work needs to be done to make it work well. I'll continue my work based on integration-20130126.

Thanks,
Wang

> david
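For readers unfamiliar with the 'shift 48' mentioned in the thread, here is a minimal userspace sketch in C of how a "level/id" qgroup identifier could be packed into one 64-bit value and how the strict "parent level > child level" rule could be checked in user space. It assumes the level lives in the top 16 bits and the subvolume id in the low 48 bits; the names `qgroup_make_id()`, `qgroup_level()`, `qgroup_subid()` and `qgroup_assign_allowed()` are hypothetical and are not the btrfs-progs API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: level in the top 16 bits, id in the low 48 bits. */
#define QGROUP_LEVEL_SHIFT 48
#define QGROUP_SUBID_MASK ((UINT64_C(1) << QGROUP_LEVEL_SHIFT) - 1)

/* Pack "level/subid" (for example 1/0 or 0/256) into one 64-bit id. */
uint64_t qgroup_make_id(uint16_t level, uint64_t subid)
{
	assert(subid <= QGROUP_SUBID_MASK);	/* id must fit in 48 bits */
	return ((uint64_t)level << QGROUP_LEVEL_SHIFT) | subid;
}

/* Extract the level part of a packed qgroup id. */
unsigned int qgroup_level(uint64_t qgroupid)
{
	return (unsigned int)(qgroupid >> QGROUP_LEVEL_SHIFT);
}

/* Extract the id part of a packed qgroup id. */
uint64_t qgroup_subid(uint64_t qgroupid)
{
	return qgroupid & QGROUP_SUBID_MASK;
}

/*
 * Strict hierarchy check: an assignment is allowed only if the parent's
 * level is strictly greater than the child's level. Returns 1 if allowed,
 * 0 otherwise.
 */
int qgroup_assign_allowed(uint64_t parent, uint64_t child)
{
	return qgroup_level(parent) > qgroup_level(child);
}
```

Under this rule, assigning 0/256 to 1/0 is accepted, while assigning 0/256 to another level 0 qgroup such as 0/257 is rejected, which is the hierarchy the diagram in the patch description shows.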