Re: [PATCH v2] Btrfs: fix tree mod logging
Thanks a lot, Filipe!

I have been testing this patch for 5 days now, and it fixed this annoying problem, present since 3.11.0, on 3 NFS servers here. This is also a candidate for backporting, as it fixes crashes.

Just for the information of others here: your previous patch, "Btrfs: return immediately if tree log mod is not necessary", is also needed to make it apply cleanly.

Ahmet
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] Btrfs: introduce lock_ref/unlock_ref
On Wed, Dec 18, 2013 at 04:07:27PM -0500, Josef Bacik wrote:
> qgroups need to have a consistent view of the references for a particular
> extent record. Currently they do this through sequence numbers on delayed
> refs, but this is no longer acceptable. So instead introduce
> lock_ref/unlock_ref. This will provide the qgroup code with a consistent
> view of the reference while it does its accounting calculations without
> interfering with the delayed ref code.
> Thanks,
>
> Signed-off-by: Josef Bacik
> ---
>  fs/btrfs/ctree.h | 11 ++
>  fs/btrfs/delayed-ref.c | 2 +
>  fs/btrfs/delayed-ref.h | 1 +
>  fs/btrfs/extent-tree.c | 102 +++--
>  4 files changed, 113 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index a924274..8b3fd61 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1273,6 +1273,9 @@ struct btrfs_block_group_cache {
>
>  	/* For delayed block group creation */
>  	struct list_head new_bg_list;
> +
> +	/* For locking reference modifications */
> +	struct extent_io_tree ref_lock;
>  };
>
>  /* delayed seq elem */
> @@ -3319,6 +3322,14 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
>  int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
>  					 struct btrfs_fs_info *fs_info);
>  int __get_raid_index(u64 flags);
> +int lock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr,
> +	     u64 num_bytes, int for_cow,
> +	     struct btrfs_block_group_cache **block_group,
> +	     struct extent_state **cached_state);
> +int unlock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr,
> +	       u64 num_bytes, int for_cow,
> +	       struct btrfs_block_group_cache *block_group,
> +	       struct extent_state **cached_state);

Please namespace these - they are far too similar to the generic
struct lockref name and manipulation functions.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
[PATCH] btrfs-progs: sync-up with newly introduced ioctl number
For now, this manually syncs up the new ioctls introduced in the btrfs kernel code, for which there wasn't any btrfs-progs patch. However, we might come up with a better idea for the long run.

Signed-off-by: Anand Jain
---
 ioctl.h | 6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index 29575d8..cd90afe 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -623,6 +623,12 @@ struct btrfs_ioctl_clone_range_args {
 					 struct btrfs_ioctl_dedup_args)
 #define BTRFS_IOC_GET_FSLIST _IOWR(BTRFS_IOCTL_MAGIC, 56, \
 					struct btrfs_ioctl_fslist_args)
+#define BTRFS_IOC_GET_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 57, \
+					struct btrfs_ioctl_feature_flags)
+#define BTRFS_IOC_SET_FEATURES _IOW(BTRFS_IOCTL_MAGIC, 58, \
+					struct btrfs_ioctl_feature_flags[2])
+#define BTRFS_IOC_GET_SUPPORTED_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 59, \
+					struct btrfs_ioctl_feature_flags[3])
 #ifdef __cplusplus
 }
 #endif
--
1.7.1
[PATCH] btrfs: ioctls would need unique id
BTRFS_IOC_SET_FEATURES and BTRFS_IOC_GET_SUPPORTED_FEATURES conflict with BTRFS_IOC_GET_FEATURES: all three are defined with ioctl number 57. Give each of them a unique number.

Signed-off-by: Anand Jain
---
 include/uapi/linux/btrfs.h | 4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 7d7f776..0fe736e 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -634,9 +634,9 @@ struct btrfs_ioctl_fslist_args {
 					struct btrfs_ioctl_fslist_args)
 #define BTRFS_IOC_GET_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 57, \
 					struct btrfs_ioctl_feature_flags)
-#define BTRFS_IOC_SET_FEATURES _IOW(BTRFS_IOCTL_MAGIC, 57, \
+#define BTRFS_IOC_SET_FEATURES _IOW(BTRFS_IOCTL_MAGIC, 58, \
 					struct btrfs_ioctl_feature_flags[2])
-#define BTRFS_IOC_GET_SUPPORTED_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 57, \
+#define BTRFS_IOC_GET_SUPPORTED_FEATURES _IOR(BTRFS_IOCTL_MAGIC, 59, \
 					struct btrfs_ioctl_feature_flags[3])

 #endif /* _UAPI_LINUX_BTRFS_H */
--
1.7.1
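For readers unfamiliar with how the _IOR/_IOW macros build a command value, here is a minimal userspace sketch of the renumbering. It is not taken from the patch: the struct layout, the DEMO_* names and the magic value are assumptions made only for illustration. It shows that the three old definitions share one sequence number, while the direction and size fields encoded into the command still differ; whether the shared number is a problem in practice depends on how the handler dispatches.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>	/* _IOR, _IOW, _IOC_NR, _IOC_SIZE */

/*
 * Stand-in for struct btrfs_ioctl_feature_flags. The field layout is an
 * assumption, used only so that sizeof() yields something concrete.
 */
struct demo_feature_flags {
	uint64_t compat_flags;
	uint64_t compat_ro_flags;
	uint64_t incompat_flags;
};

#define DEMO_IOCTL_MAGIC 0x94	/* magic value assumed for this demo */

/* Before the patch: all three commands use sequence number 57. */
#define DEMO_OLD_GET \
	_IOR(DEMO_IOCTL_MAGIC, 57, struct demo_feature_flags)
#define DEMO_OLD_SET \
	_IOW(DEMO_IOCTL_MAGIC, 57, struct demo_feature_flags[2])
#define DEMO_OLD_SUPPORTED \
	_IOR(DEMO_IOCTL_MAGIC, 57, struct demo_feature_flags[3])

/* After the patch: SET and GET_SUPPORTED get their own numbers. */
#define DEMO_NEW_SET \
	_IOW(DEMO_IOCTL_MAGIC, 58, struct demo_feature_flags[2])
#define DEMO_NEW_SUPPORTED \
	_IOR(DEMO_IOCTL_MAGIC, 59, struct demo_feature_flags[3])

/* Returns 1 if both commands carry the same sequence number. */
static int same_nr(unsigned int a, unsigned int b)
{
	return _IOC_NR(a) == _IOC_NR(b);
}
```

With these definitions, same_nr(DEMO_OLD_GET, DEMO_OLD_SET) is true even though DEMO_OLD_GET and DEMO_OLD_SET are different values, because the transfer direction and the argument size are part of the encoding as well.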
[PATCH v3] btrfs: add framework to read fs info from btrfs-control
This adds ioctl BTRFS_IOC_GET_FSIDS which reads the fs info through the btrfs-control, needed to optimize heavily used btrfs-progs function check_mounted() plus few other minor uses. Signed-off-by: Anand Jain --- v3: rebase and update commit v2: accepts Zach suggested and now holds uuid_mutex fs/btrfs/super.c | 66 fs/btrfs/volumes.c | 39 ++ fs/btrfs/volumes.h |2 + include/uapi/linux/btrfs.h | 19 4 files changed, 120 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 0d4b1c3..13884c5 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1645,38 +1645,92 @@ static struct file_system_type btrfs_fs_type = { }; MODULE_ALIAS_FS("btrfs"); +static int btrfs_ioc_get_fslist(void __user *arg) +{ + int ret = 0; + u64 sz_fslist_arg; + u64 sz_fslist; + u64 sz_out; + struct btrfs_ioctl_fslist_args *fslist_arg; + struct btrfs_ioctl_fslist_args *fslist_arg_tmp; + struct btrfs_ioctl_fslist *fslist; + + u64 cnt = 0, ucnt; + + sz_fslist_arg = sizeof(*fslist_arg); + sz_fslist = sizeof(*fslist); + if (copy_from_user(&ucnt, + (struct btrfs_ioctl_fslist_args __user *)(arg + + offsetof(struct btrfs_ioctl_fslist_args, count)), + sizeof(ucnt))) + return -EFAULT; + + cnt = btrfs_get_fslist_cnt(); + + if (cnt > ucnt) { + if (copy_to_user(arg + + offsetof(struct btrfs_ioctl_fslist_args, count), + &cnt, sizeof(cnt))) + return -EFAULT; + return 1; + } + + sz_out = sz_fslist_arg + sz_fslist * cnt; + fslist_arg_tmp = fslist_arg = memdup_user(arg, sz_out); + if (IS_ERR(fslist_arg)) + return PTR_ERR(fslist_arg); + fslist = (struct btrfs_ioctl_fslist *) (++fslist_arg_tmp); + cnt = btrfs_get_fslist(fslist, cnt); + fslist_arg->count = cnt; + if (copy_to_user(arg, fslist_arg, sz_out)) { + ret = -EFAULT; + goto out; + } + ret = 0; +out: + kfree(fslist_arg); + return ret; +} + /* * used by btrfsctl to scan devices when no FS is mounted */ static long btrfs_control_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { - struct btrfs_ioctl_vol_args *vol; + 
struct btrfs_ioctl_vol_args *vol = NULL; struct btrfs_fs_devices *fs_devices; int ret = -ENOTTY; + void __user *argp = (void __user *)arg; if (!capable(CAP_SYS_ADMIN)) return -EPERM; - vol = memdup_user((void __user *)arg, sizeof(*vol)); - if (IS_ERR(vol)) - return PTR_ERR(vol); - switch (cmd) { case BTRFS_IOC_SCAN_DEV: + vol = memdup_user((void __user *)arg, sizeof(*vol)); + if (IS_ERR(vol)) + return PTR_ERR(vol); ret = btrfs_scan_one_device(vol->name, FMODE_READ, &btrfs_fs_type, &fs_devices); + kfree(vol); break; case BTRFS_IOC_DEVICES_READY: + vol = memdup_user((void __user *)arg, sizeof(*vol)); + if (IS_ERR(vol)) + return PTR_ERR(vol); ret = btrfs_scan_one_device(vol->name, FMODE_READ, &btrfs_fs_type, &fs_devices); + kfree(vol); if (ret) break; ret = !(fs_devices->num_devices == fs_devices->total_devices); break; + case BTRFS_IOC_GET_FSLIST: + ret = btrfs_ioc_get_fslist(argp); + break; } - kfree(vol); return ret; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 92303f4..debd619 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6284,3 +6284,42 @@ int btrfs_scratch_superblock(struct btrfs_device *device) return 0; } + +int btrfs_get_fslist_cnt(void) +{ + int cnt = 0; + struct btrfs_fs_devices *fs_devices; + + mutex_lock(&uuid_mutex); + list_for_each_entry(fs_devices, &fs_uuids, list) + cnt++; + mutex_unlock(&uuid_mutex); + + return cnt; +} + +u64 btrfs_get_fslist(struct btrfs_ioctl_fslist *fslist, u64 ucnt) +{ + u64 cnt = 0; + struct btrfs_fs_devices *fs_devices; + + mutex_lock(&uuid_mutex); + list_for_each_entry(fs_devices, &fs_uuids, list) { + if (!(cnt < ucnt)) + break; + memcpy(fslist->fsid, fs_devices->fsid, + BTRFS_FSID_SIZE); + fslist->num_devices = fs_devices->num_devices; + fslist->missing_devices = fs_devices->missing_device
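The count handling in btrfs_ioc_get_fslist() above follows a "report the needed size, then fill" convention: if the caller's buffer is too small, the required count is written back and 1 is returned. The sketch below mimics that convention in plain userspace C so the caller's side is easier to picture. Every name in it (demo_fslist, demo_get_fslist, DEMO_FSID_SIZE) is hypothetical, and it is not the kernel implementation: there is no copy_from_user() and no locking here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FSID_SIZE 16	/* assumed size, for the demo only */

/* Hypothetical stand-in for one entry of the returned list. */
struct demo_fslist {
	uint8_t fsid[DEMO_FSID_SIZE];
	uint64_t num_devices;
};

/*
 * known/known_cnt: the filesystems the "kernel side" knows about.
 * count: in = capacity of out[], out = number of entries needed/written.
 *
 * Returns 1 if the caller's buffer is too small (and stores the needed
 * count), 0 after copying all entries.
 */
static int demo_get_fslist(const struct demo_fslist *known,
			   uint64_t known_cnt, uint64_t *count,
			   struct demo_fslist *out)
{
	if (known_cnt > *count) {
		*count = known_cnt;	/* tell the caller how much to allocate */
		return 1;
	}
	if (known_cnt > 0)
		memcpy(out, known, known_cnt * sizeof(*known));
	*count = known_cnt;
	return 0;
}
```

A caller would typically start with a count of 0, receive 1 together with the required count, allocate that many entries and call again.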
[PATCH] btrfs: fix the warning in prepare_pages
Fix the compile warning below:

fs/btrfs/file.c: In function ‘prepare_pages’:
fs/btrfs/file.c:1247: warning: ‘err’ may be used uninitialized in this function

Signed-off-by: Anand Jain
---
 fs/btrfs/file.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 740ae8c..35bf838 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1244,7 +1244,7 @@ static noinline int prepare_pages(struct inode *inode, struct page **pages,
 	int i;
 	unsigned long index = pos >> PAGE_CACHE_SHIFT;
 	gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping);
-	int err;
+	int err = 0;
 	int faili;

 	for (i = 0; i < num_pages; i++) {
--
1.7.1
Re: Rework qgroup accounting
On Wed, Dec 18, 2013 at 04:07:26PM -0500, Josef Bacik wrote: > People have been complaining about autodefrag/defrag killing their box with > OOM. > This is because the snapshot aware defrag stuff super sucks if you have lots > of > snapshots, and so that needs to be reworked. The problem is once that is > fixed > you start to hit horrible lock contention on the delayed refs lock because we > have thousands of like entries that can't be merged until when we go to > actually > run the delayed ref. This problem exists because of the delayed ref sequence > number. > > The major user of the delayed ref sequence number is the qgroup code. It uses > it to pass into btrfs_find_all_roots to see what roots pointed to a particular > bytenr either before or including the current operation. It needs this > information to know if we were removing the last ref or an just the last ref > for > this particular root. The problem with this is that it has made the delayed > ref > code incredibly fragile and has forced us to do things like > btrfs_merge_delayed_refs which is what is causing us so much pain when we have > thousands of ref updates for the same block. > > In order to fix this I'm introducing a new way of adjusting quota counts. > I've > called them qgroup operations, and we apply them in very specific situations. > We only add these when we add or remove the only ref for a particular root. > Obviously we have to account for shared refs as well so there is some extra > code > for these special cases, but basically we make the qgroup accounting only > happen > when we know there was a real change (or likely a real change in the case of > shared refs). > > In order to do this I've also introduced lock/unlock_ref. This only gets used > if we actually have qgroups enabled, but it will be relatively low cost even > if > we have qgroups enabled as it only locks the bytenr for reference updates. 
So > delayed ref updates will not trip over this since we only do one at a time > anyway, so we'll only have contention if we have delayed refs running at the > same time as a qgroup operation update. > > Then all we need to account for is the fact that we will get the full view of > the roots at the time we run the operations, not what they were when our > particular operation occurred. This is ok because we will either ignore our > root in the case of add or not ignore it in case of remove when calculating > the > ref counts. We use the same ref counting scheme that Arne developed as it's > pretty freaking awesome, and just adjust how we count the ref counts based on > our operations. > > In addition to all of this new code I've added a big set of sanity tests to > make > sure everything is working right. Between this and the qgroups xfstests I'm > pretty certain I haven't broken anything obvious with qgroups. This is just > the > first step in getting rid of the delayed ref sequence number and fixing the > defrag OOM mess but it is the biggest part. Thanks, I'd say I love the idea, will look at it closer. -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] Btrfs: convert printk to btrfs_ and fix BTRFS prefix
On 12/12/2013 02:57 PM, Frank Holton wrote:
> Convert all applicable cases of printk and pr_* to the btrfs_* macros.
>
> Fix all uses of the BTRFS prefix.
>
> Signed-off-by: Frank Holton

There is trailing whitespace everywhere. Please run this through checkpatch.pl so you can fix it all up, or use

let c_space_errors = 1

in your .vimrc so you can see it.

Thanks,

Josef
Re: [PATCH] Btrfs: improve the performance fluctuating of the fsync
On 12/18/2013 05:52 AM, Miao Xie wrote:
> In order to improve the performance of fsync, we use the outstanding
> ordered extents to avoid looking up the checksum from the csum tree.
> But we didn't filter out the ordered extents whose csum is still being
> calculated, when we got those ordered extents, we had to wait for the
> csum calculation. It made the performance dropped down suddenly. (On my
> box, it drop down from 56MB/s to 4-10MB/s)
>
> But actually, the csum calculation of the ordered extents which were
> introduced by the current fsync had already completed. Those ordered
> extents whose csum was being calculated didn't belong to the current
> fsync, we can ignore them.

This isn't true, because we will just start IO, carry on, and wait later on. So we could very well have ordered extents that we started for this fsync without their csums ready, which is why this code exists.

Thanks,

Josef
[PATCH 1/3] Btrfs: introduce lock_ref/unlock_ref
qgroups need to have a consistent view of the references for a particular extent record. Currently they do this through sequence numbers on delayed refs, but this is no longer acceptable. So instead introduce lock_ref/unlock_ref. This will provide the qgroup code with a consistent view of the reference while it does its accounting calculations without interfering with the delayed ref code. Thanks, Signed-off-by: Josef Bacik --- fs/btrfs/ctree.h | 11 ++ fs/btrfs/delayed-ref.c | 2 + fs/btrfs/delayed-ref.h | 1 + fs/btrfs/extent-tree.c | 102 +++-- 4 files changed, 113 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index a924274..8b3fd61 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1273,6 +1273,9 @@ struct btrfs_block_group_cache { /* For delayed block group creation */ struct list_head new_bg_list; + + /* For locking reference modifications */ + struct extent_io_tree ref_lock; }; /* delayed seq elem */ @@ -3319,6 +3322,14 @@ int btrfs_init_space_info(struct btrfs_fs_info *fs_info); int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int __get_raid_index(u64 flags); +int lock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr, +u64 num_bytes, int for_cow, +struct btrfs_block_group_cache **block_group, +struct extent_state **cached_state); +int unlock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr, + u64 num_bytes, int for_cow, + struct btrfs_block_group_cache *block_group, + struct extent_state **cached_state); /* ctree.c */ int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key, int level, int *slot); diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index fab60c1..ee1c29d 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -680,6 +680,7 @@ static noinline void add_delayed_tree_ref(struct btrfs_fs_info *fs_info, ref->action = action; ref->is_head = 0; ref->in_tree = 1; + ref->for_cow = for_cow; if 
(need_ref_seq(for_cow, ref_root)) seq = btrfs_get_tree_mod_seq(fs_info, &trans->delayed_ref_elem); @@ -739,6 +740,7 @@ static noinline void add_delayed_data_ref(struct btrfs_fs_info *fs_info, ref->action = action; ref->is_head = 0; ref->in_tree = 1; + ref->for_cow = for_cow; if (need_ref_seq(for_cow, ref_root)) seq = btrfs_get_tree_mod_seq(fs_info, &trans->delayed_ref_elem); diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index a54c9d4..db71a37 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -52,6 +52,7 @@ struct btrfs_delayed_ref_node { unsigned int action:8; unsigned int type:8; + unsigned int for_cow:1; /* is this node still in the rbtree? */ unsigned int is_head:1; unsigned int in_tree:1; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index cd4d9ca..03b536c 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -672,6 +672,79 @@ struct btrfs_block_group_cache *btrfs_lookup_block_group( return cache; } + +/* This is used to lock the modification to an extent ref. This only does + * something if the reference is a fs tree. + * + * @fs_info: the fs_info for this filesystem. + * @root_objectid: the root objectid that we are modifying for this extent. + * @bytenr: the byte we are modifying the reference for + * @num_bytes: the number of bytes we are locking. + * @for_cow: if this operation is for cow then we don't need to lock + * @block_group: we will store the block group we looked up so that the unlock + * doesn't have to do another search. + * @cached_state: this is for caching our location so when we unlock we don't + * have to do a tree search. + * + * This can return -ENOMEM if we cannot allocate our extent state. 
+ */ +int lock_ref(struct btrfs_fs_info *fs_info, u64 root_objectid, u64 bytenr, +u64 num_bytes, int for_cow, +struct btrfs_block_group_cache **block_group, +struct extent_state **cached_state) +{ + struct btrfs_block_group_cache *cache; + int ret; + + if (!fs_info->quota_enabled || !need_ref_seq(for_cow, root_objectid)) + return 0; + + cache = btrfs_lookup_block_group(fs_info, bytenr); + ASSERT(cache); + ASSERT(cache->key.objectid <= bytenr && + (cache->key.objectid + cache->key.offset >= + bytenr + num_bytes)); + ret = lock_extent_bits(&cache->ref_lock, bytenr, + bytenr + num_bytes - 1, 0, cached_state); + if (!ret) + *block_group = cache; + else + btrfs_put_block_group(cache)
Rework qgroup accounting
People have been complaining about autodefrag/defrag killing their box with OOM. This is because the snapshot aware defrag stuff super sucks if you have lots of snapshots, and so that needs to be reworked. The problem is once that is fixed you start to hit horrible lock contention on the delayed refs lock because we have thousands of like entries that can't be merged until we go to actually run the delayed ref. This problem exists because of the delayed ref sequence number.

The major user of the delayed ref sequence number is the qgroup code. It passes it into btrfs_find_all_roots to see what roots pointed to a particular bytenr either before or including the current operation. It needs this information to know if we were removing the last ref or just the last ref for this particular root. The problem with this is that it has made the delayed ref code incredibly fragile and has forced us to do things like btrfs_merge_delayed_refs, which is what is causing us so much pain when we have thousands of ref updates for the same block.

In order to fix this I'm introducing a new way of adjusting quota counts. I've called them qgroup operations, and we apply them in very specific situations. We only add these when we add or remove the only ref for a particular root. Obviously we have to account for shared refs as well so there is some extra code for these special cases, but basically we make the qgroup accounting only happen when we know there was a real change (or likely a real change in the case of shared refs).

In order to do this I've also introduced lock/unlock_ref. This only gets used if we actually have qgroups enabled, but it will be relatively low cost even if we have qgroups enabled as it only locks the bytenr for reference updates. So delayed ref updates will not trip over this since we only do one at a time anyway, so we'll only have contention if we have delayed refs running at the same time as a qgroup operation update.

Then all we need to account for is the fact that we will get the full view of the roots at the time we run the operations, not what they were when our particular operation occurred. This is ok because we will either ignore our root in the case of add or not ignore it in case of remove when calculating the ref counts. We use the same ref counting scheme that Arne developed as it's pretty freaking awesome, and just adjust how we count the ref counts based on our operations.

In addition to all of this new code I've added a big set of sanity tests to make sure everything is working right. Between this and the qgroups xfstests I'm pretty certain I haven't broken anything obvious with qgroups. This is just the first step in getting rid of the delayed ref sequence number and fixing the defrag OOM mess but it is the biggest part.

Thanks,

Josef
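To make the "only account when a root gains its first ref or loses its last ref" idea a bit more concrete, here is a deliberately tiny toy model. It is an assumption-laden sketch, not the code from this series: it tracks one extent, keeps a per-root ref count in an array, and updates hypothetical "referenced" (rfer) and "exclusive" (excl) byte counters immediately, whereas the real code defers the work and looks the roots up when the operation runs. All demo_* names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_ROOTS 4

/* One extent and how many refs each root holds on it (toy model). */
struct demo_extent {
	uint64_t num_bytes;
	unsigned int refs[DEMO_MAX_ROOTS];
};

/* Per-root counters: bytes referenced and bytes held exclusively. */
struct demo_qgroup {
	uint64_t rfer;
	uint64_t excl;
};

/* Number of roots that hold at least one ref on the extent. */
static unsigned int roots_with_refs(const struct demo_extent *e)
{
	unsigned int i, n = 0;

	for (i = 0; i < DEMO_MAX_ROOTS; i++)
		n += e->refs[i] != 0;
	return n;
}

/* Index of the only root holding refs, or -1 if there are 0 or 2+ roots. */
static int sole_owner(const struct demo_extent *e)
{
	int i, owner = -1;

	for (i = 0; i < DEMO_MAX_ROOTS; i++) {
		if (!e->refs[i])
			continue;
		if (owner >= 0)
			return -1;
		owner = i;
	}
	return owner;
}

static void demo_add_ref(struct demo_extent *e, struct demo_qgroup *qg,
			 int root)
{
	int old_owner = sole_owner(e);

	if (e->refs[root]++ != 0)
		return;		/* not this root's first ref: no accounting */

	qg[root].rfer += e->num_bytes;
	if (old_owner >= 0)	/* was exclusive to another root, now shared */
		qg[old_owner].excl -= e->num_bytes;
	else if (roots_with_refs(e) == 1)	/* we are the only root */
		qg[root].excl += e->num_bytes;
}

static void demo_drop_ref(struct demo_extent *e, struct demo_qgroup *qg,
			  int root)
{
	int owner;

	assert(e->refs[root] > 0);
	if (--e->refs[root] != 0)
		return;		/* root still holds refs: no accounting */

	qg[root].rfer -= e->num_bytes;
	owner = sole_owner(e);
	if (owner >= 0)		/* one root left: it becomes exclusive */
		qg[owner].excl += e->num_bytes;
	else if (roots_with_refs(e) == 0)	/* we were the exclusive owner */
		qg[root].excl -= e->num_bytes;
}
```

In this model a second ref from the same root changes nothing, while a ref from a second root (for example a snapshot) moves the extent from exclusive to shared, which is the kind of "real change" the cover letter refers to.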
[PATCH 3/3] Btrfs: add sanity tests for new qgroup accounting code
This exercises the various parts of the new qgroup accounting code. We do some basic stuff and do some things with the shared refs to make sure all that code works. I had to add a bunch of infrastructure because I needed to be able to insert items into a fake tree without having to do all the hard work myself, hopefully this will be usefull in the future. Thanks, Signed-off-by: Josef Bacik --- fs/btrfs/Makefile | 2 +- fs/btrfs/ctree.c | 4 + fs/btrfs/ctree.h | 3 + fs/btrfs/disk-io.c| 18 +- fs/btrfs/disk-io.h| 1 + fs/btrfs/extent-tree.c| 25 ++ fs/btrfs/extent_io.c | 47 fs/btrfs/extent_io.h | 2 + fs/btrfs/qgroup.c | 23 ++ fs/btrfs/super.c | 3 + fs/btrfs/tests/btrfs-tests.c | 91 +++ fs/btrfs/tests/btrfs-tests.h | 9 + fs/btrfs/tests/inode-tests.c | 35 +-- fs/btrfs/tests/qgroup-tests.c | 617 ++ 14 files changed, 843 insertions(+), 37 deletions(-) create mode 100644 fs/btrfs/tests/qgroup-tests.c diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 1a44e42..e6df2dd 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -16,4 +16,4 @@ btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o btrfs-$(CONFIG_BTRFS_FS_RUN_SANITY_TESTS) += tests/free-space-tests.o \ tests/extent-buffer-tests.o tests/btrfs-tests.o \ - tests/extent-io-tests.o tests/inode-tests.o + tests/extent-io-tests.o tests/inode-tests.o tests/qgroup-tests.o diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index a57507a..38ef590 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1344,6 +1344,10 @@ static inline int should_cow_block(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *buf) { +#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS + if (unlikely(root->dummy_root)) + return 0; +#endif /* ensure we can see the force_cow */ smp_rmb(); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 944c916..8ad5adb 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1783,6 +1783,7 @@ struct btrfs_root { int in_radix; #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS int dummy_root; + 
u64 alloc_bytenr; #endif u64 defrag_trans_start; struct btrfs_key defrag_progress; @@ -4094,6 +4095,8 @@ static inline int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info) /* Sanity test specific functions */ #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS void btrfs_test_destroy_inode(struct inode *inode); +int btrfs_verify_qgroup_counts(struct btrfs_fs_info *fs_info, u64 qgroupid, + u64 rfer, u64 excl); #endif #endif diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3eb27b9..c30de3d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1095,6 +1095,11 @@ struct extent_buffer *btrfs_find_tree_block(struct btrfs_root *root, struct extent_buffer *btrfs_find_create_tree_block(struct btrfs_root *root, u64 bytenr, u32 blocksize) { +#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS + if (unlikely(root->dummy_root)) + return alloc_test_extent_buffer(root->fs_info, bytenr, + blocksize); +#endif return alloc_extent_buffer(root->fs_info, bytenr, blocksize); } @@ -1245,6 +1250,7 @@ struct btrfs_root *btrfs_alloc_dummy_root(void) return ERR_PTR(-ENOMEM); __setup_root(4096, 4096, 4096, 4096, root, NULL, 1); root->dummy_root = 1; + root->alloc_bytenr = 0; return root; } @@ -2034,7 +2040,7 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root) free_root_extent_buffers(info->chunk_root); } -static void del_fs_roots(struct btrfs_fs_info *fs_info) +void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info) { int ret; struct btrfs_root *gang[8]; @@ -2929,7 +2935,7 @@ fail_qgroup: fail_trans_kthread: kthread_stop(fs_info->transaction_kthread); btrfs_cleanup_transaction(fs_info->tree_root); - del_fs_roots(fs_info); + btrfs_free_fs_roots(fs_info); fail_cleaner: kthread_stop(fs_info->cleaner_kthread); @@ -3454,8 +3460,10 @@ void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info, btrfs_free_log_root_tree(NULL, fs_info); } - __btrfs_remove_free_space_cache(root->free_ino_pinned); - __btrfs_remove_free_space_cache(root->free_ino_ctl); + if 
(root->free_ino_pinned) + __btrfs_remove_free_space_cache(root->free_ino_pinned); + if (root->free_ino_ctl) + __btrfs_remove_free_space_cache(root->free_ino_ctl); free_fs_root(root); } @@ -3580,7 +3588,7 @@ int close_ctree(struct btrfs_root *root)
Re: Backporting bugfixes
Pavel Roskin posted on Wed, 18 Dec 2013 14:31:53 -0500 as excerpted: > On Wed, 18 Dec 2013 19:23:08 + Chris Mason wrote: > >> We do tag some commits for stable, but Dave Sterba actually just sent a >> request to the stable tree to pull in a few more. > > That's great news! Thank you for a quick reply! In fact, here's the stable-queue request.
Re: [PATCH] Btrfs-progs: receive: fix the case that we can not find subvolume
Wang Shilong writes:

> On 12/18/2013 12:06 PM, Michael Welsh Duggan wrote:
>> Wang Shilong writes:
>>
>>> It seems that you use older kernel version but use the latest
>>> btrfs-progs, new btrfs-progs use uuid tree to search but
>>> this tree did not exist yet.
>>>
>>> Can you try to upgrade your kernel?
>> What version is necessary? (I am currently on 3.11.10.)
> 3.12 is ok, btw, can you run for 3.11.10
>
> #dmesg
>
> Let's see if it output somthing like:
>
> btrfs: can not found root: 9

Indeed.

$ dmesg | grep "root 9"
[305770.945287] could not find root 9
[305770.945300] could not find root 9
[305770.945369] could not find root 9
[305770.945398] could not find root 9
[305915.405421] could not find root 9
[305915.405483] could not find root 9
[305962.927150] could not find root 9
[305962.927222] could not find root 9
[399096.924559] could not find root 9
[399096.924617] could not find root 9
[399195.585768] could not find root 9
[399195.585823] could not find root 9

Looks like I'll be rebooting to a new kernel when I get home tonight.

--
Michael Welsh Duggan (m...@md5i.com)
Re: Btrfs stable updates for 3.12
On Wed, Dec 18, 2013 at 04:14:02PM +0100, David Sterba wrote:
> Hi,
>
> please queue the following patches to 3.12 stable. They fix a few
> crashes or lockups that were reported by users.
>
> The patch "stop using vfs_read in send" may seem big for stable, but
> without it the send/receive ioctl hits the global open file limit sooner
> or later, depending on the ram size.
>
> Subjects:
> Btrfs: do a full search everytime in btrfs_search_old_slot
> Btrfs: reset intwrite on transaction abort
> Btrfs: fix memory leak of chunks' extent map
> Btrfs: fix hole check in log_one_extent [bug 1]
> Btrfs: fix incorrect inode acl reset
> Btrfs: stop using vfs_read in send
> Btrfs: take ordered root lock when removing ordered operations inode
> Btrfs: do not run snapshot-aware defragment on error
> Btrfs: fix a crash when running balance and defrag concurrently
> Btrfs: fix lockdep error in async commit
> Commits:
> d4b4087c43cc00a196c5be57fac41f41309f1d56
> e0228285a8cad70e4b7b4833cc650e36ecd8de89
> 7d3d1744f8a7d62e4875bd69cc2192a939813880
> ed9e8af88e2551aaa6bf51d8063a2493e2d71597
> 8185554d3eb09d23a805456b6fa98dcbb34aa518
> ed2590953bd06b892f0411fc94e19175d32f197a
> 93858769172c4e3678917810e9d5de360eb991cc
> 6f519564d7d978c00351d9ab6abac3deeac31621
> 48ec47364b6d493f0a9cdc116977bf3f34e5c3ec
> b1a06a4b574996692b72b742bf6e6aa0c711a948
>
> all apply cleanly on top of 3.12.5.

all now applied, along with 4 of these that seem to be applicable to 3.10-stable.

thanks,

greg k-h
Re: Backporting bugfixes
On Wed, 18 Dec 2013 19:23:08 + Chris Mason wrote:
> We do tag some commits for stable, but Dave Sterba actually just sent
> a request to the stable tree to pull in a few more.

That's great news! Thank you for a quick reply!

--
Regards,
Pavel Roskin
Re: Backporting bugfixes
On Wed, 2013-12-18 at 14:06 -0500, Pavel Roskin wrote:
> Hello!
>
> I have noticed that there have been important fixes for btrfs in the
> mainline Linux git repository. However, there is just one btrfs fix
> in Linux 3.12.5 after 3.12.
>
> I think it's important to submit all serious bugfixes to the stable
> kernels. It would protect users against data corruption and improve
> the image of btrfs as a serious filesystem that can be trusted at
> least with semi-important data.
>
> This post was inspired by http://lwn.net/Articles/577218/ and
> https://bugzilla.redhat.com/show_bug.cgi?id=1028750
>

Hi Pavel,

We do tag some commits for stable, but Dave Sterba actually just sent a request to the stable tree to pull in a few more.

-chris
Backporting bugfixes
Hello!

I have noticed that there have been important fixes for btrfs in the mainline Linux git repository. However, there is just one btrfs fix in Linux 3.12.5 after 3.12.

I think it's important to submit all serious bugfixes to the stable kernels. It would protect users against data corruption and improve the image of btrfs as a serious filesystem that can be trusted at least with semi-important data.

This post was inspired by http://lwn.net/Articles/577218/ and https://bugzilla.redhat.com/show_bug.cgi?id=1028750

--
Regards,
Pavel Roskin
btrfs on bcache
I've recently set up a system (kernel 3.12.5-1-ARCH) which is layered as follows:

/dev/sdb3 - cache0 (80 GB Intel SSD)
/dev/sdc1 - backing device (2 TB WD HDD)
sdb3+sdc1 => /dev/bcache0

On /dev/bcache0, there's a btrfs filesystem with 2 subvolumes, mounted as / and /home.

What's been bothering me are the following entries in my kernel log:

[13811.845540] incomplete page write in btrfs with offset 1536 and length 2560
[13870.326639] incomplete page write in btrfs with offset 3072 and length 1024

The offset/length values are always either 1536/2560 or 3072/1024; in both cases they sum to exactly 4K. There are 607 of these in the log as I write this; the machine has been up 18 hours and has been under no particular I/O strain (it's a desktop).

Trying to fix this, I detached the cache (still using /dev/bcache0, but without /dev/sdb3 attached), and the messages disappeared. As soon as I re-attached /dev/sdb3 they started again, so I am fairly sure it's an unfavorable interaction between bcache and btrfs.

Is this something I should be worried about (the messages are only emitted at KERN_INFO), or is it just an alignment problem? The underlying HDD uses 4K sectors, while the block_size of bcache seems to be 512; could that be the issue here?

I've also encountered incomplete reads and a few csum errors, but I have not been able to trigger these regularly.

I have a feeling that the error is more likely to be on the bcache end (I've mailed that list as well); however, any insight into the matter would be much appreciated.

Thanks,
- eb
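The arithmetic behind the "they sum to 4K" observation can be checked with a short sketch. It assumes a 4096-byte page and the 512-byte bcache block_size mentioned in the message; the names `PAGE_SIZE`, `BCACHE_BLOCK` and `describe` are made up for illustration. It only confirms that every reported write ends on a page boundary and starts on a 512-byte boundary; it does not establish the cause.

```python
PAGE_SIZE = 4096     # assumed page size, matching the "4K" in the report
BCACHE_BLOCK = 512   # bcache block_size as stated in the message

# (offset, length) pairs copied from the two kernel log lines
reports = [(1536, 2560), (3072, 1024)]


def describe(offset, length):
    """Summarise how one partial write sits inside a page."""
    return {
        "ends_on_page_boundary": offset + length == PAGE_SIZE,
        "offset_in_512_blocks": offset // BCACHE_BLOCK,
        "offset_is_512_aligned": offset % BCACHE_BLOCK == 0,
        "length_is_512_aligned": length % BCACHE_BLOCK == 0,
    }


results = [describe(offset, length) for offset, length in reports]
for (offset, length), info in zip(reports, results):
    print(offset, length, info)
```

Both writes start at a multiple of 512 bytes that is not a multiple of 4096, which is at least consistent with the 512-versus-4K block size question raised above.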
Btrfs stable updates for 3.12
Hi,

please queue the following patches to 3.12 stable. They fix a few crashes or lockups that were reported by users. The patch "stop using vfs_read in send" may seem big for stable, but without it the send/receive ioctl hits the global open file limit sooner or later, depending on the RAM size.

Subjects:

Btrfs: do a full search everytime in btrfs_search_old_slot
Btrfs: reset intwrite on transaction abort
Btrfs: fix memory leak of chunks' extent map
Btrfs: fix hole check in log_one_extent [bug 1]
Btrfs: fix incorrect inode acl reset
Btrfs: stop using vfs_read in send
Btrfs: take ordered root lock when removing ordered operations inode
Btrfs: do not run snapshot-aware defragment on error
Btrfs: fix a crash when running balance and defrag concurrently
Btrfs: fix lockdep error in async commit

Commits:

d4b4087c43cc00a196c5be57fac41f41309f1d56
e0228285a8cad70e4b7b4833cc650e36ecd8de89
7d3d1744f8a7d62e4875bd69cc2192a939813880
ed9e8af88e2551aaa6bf51d8063a2493e2d71597
8185554d3eb09d23a805456b6fa98dcbb34aa518
ed2590953bd06b892f0411fc94e19175d32f197a
93858769172c4e3678917810e9d5de360eb991cc
6f519564d7d978c00351d9ab6abac3deeac31621
48ec47364b6d493f0a9cdc116977bf3f34e5c3ec
b1a06a4b574996692b72b742bf6e6aa0c711a948

All apply cleanly on top of 3.12.5.

Thanks,
david
Re: no space left, metadata usage almost full?
On Wed, Dec 18, 2013 at 11:54:39PM +0900, Tomasz Chmielewski wrote:
> On Wed, 18 Dec 2013 12:46:52 +
> Hugo Mills wrote:
>
> > > # btrfs fi df /home
> > > Data, RAID1: total=2.51TiB, used=1.58TiB
> > > System, RAID1: total=32.00MiB, used=372.00KiB
> > > Metadata, RAID1: total=48.00GiB, used=47.23GiB
>
> > > # btrfs fi balance start -dusage=5 /home
>
> >    Currently, yes, it is the only approach.
>
> >    Hope the above helps,
>
> So the balance finished, and metadata is still almost full:
>
> # btrfs fi df /home
> Data, RAID1: total=1.60TiB, used=1.58TiB
> System, RAID1: total=32.00MiB, used=248.00KiB
> Metadata, RAID1: total=49.00GiB, used=47.24GiB
>
> Confused about the output - does it actually look any better?

   Yes, because...

> # btrfs fi show /home
> Label: crawler-btrfs uuid: 60f1759c-45f6-4484-9f60-66a4e9bbf2b6
>  Total devices 2 FS bytes used 1.63TiB
>  devid 3 size 2.56TiB used 1.66TiB path /dev/sdb4
>  devid 4 size 2.56TiB used 1.66TiB path /dev/sda4

   ... you have unallocated space here, so the FS can now allocate more metadata as it needs to.

> Does it mean that data/system/metadata will be able to grow now,
> until their size in total is 2.56TiB?

   Yes. Although note that where btrfs fi df reports space, that's _usable_ space (i.e. how much data you can fit in it), but where btrfs fi show reports space, it's disk bytes (i.e. how much of the disk has useful content on it). With RAID-1, the first figure is half the second figure. In your case, that's simple, but with different RAID levels for data and metadata the calculation becomes a little bit more complicated.

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- What part of "gestalt" don't you understand? ---
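The usable-versus-raw relationship described above can be worked through with the numbers from this thread. This is a rough sketch, assuming two-copy RAID1 across exactly two equally sized devices; the variable names are illustrative, and the inputs are the rounded figures printed by the tools, so the result only approximately matches the 1.66TiB that btrfs fi show reports per device.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3
MIB = 1024 ** 2

# "total=" values from `btrfs fi df` after the balance (usable bytes, one copy)
df_totals = {
    "Data": 1.60 * TIB,
    "System": 32 * MIB,
    "Metadata": 49 * GIB,
}

RAID1_COPIES = 2          # RAID1 keeps two copies of every chunk
NUM_DEVICES = 2           # devid 3 and devid 4
DEVICE_SIZE = 2.56 * TIB  # "size" column from `btrfs fi show`

usable_allocated = sum(df_totals.values())
raw_allocated = usable_allocated * RAID1_COPIES          # disk bytes over all devices
per_device_allocated = raw_allocated / NUM_DEVICES       # one copy lands on each device
per_device_unallocated = DEVICE_SIZE - per_device_allocated

print(f"allocated per device:   {per_device_allocated / TIB:.2f} TiB")
print(f"unallocated per device: {per_device_unallocated / TIB:.2f} TiB")
```

Under these assumptions roughly 0.9 TiB per device is still unallocated, which is the space new metadata chunks can be carved from.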
Re: no space left, metadata usage almost full?
On Wed, 18 Dec 2013 12:46:52 +
Hugo Mills wrote:

> > # btrfs fi df /home
> > Data, RAID1: total=2.51TiB, used=1.58TiB
> > System, RAID1: total=32.00MiB, used=372.00KiB
> > Metadata, RAID1: total=48.00GiB, used=47.23GiB

> > # btrfs fi balance start -dusage=5 /home

>    Currently, yes, it is the only approach.

>    Hope the above helps,

So the balance finished, and metadata is still almost full:

# btrfs fi df /home
Data, RAID1: total=1.60TiB, used=1.58TiB
System, RAID1: total=32.00MiB, used=248.00KiB
Metadata, RAID1: total=49.00GiB, used=47.24GiB

Confused about the output - does it actually look any better?

# btrfs fi show /home
Label: crawler-btrfs uuid: 60f1759c-45f6-4484-9f60-66a4e9bbf2b6
 Total devices 2 FS bytes used 1.63TiB
 devid 3 size 2.56TiB used 1.66TiB path /dev/sdb4
 devid 4 size 2.56TiB used 1.66TiB path /dev/sda4

Btrfs v3.12

Does it mean that data/system/metadata will be able to grow now, until their size in total is 2.56TiB?

--
Tomasz Chmielewski
http://wpkg.org
Re: no space left, metadata usage almost full?
On Wed, Dec 18, 2013 at 09:37:28PM +0900, Tomasz Chmielewski wrote:
> I have a btrfs filesystem which has plenty of free space left, yet it's
> hitting out of space regularly.
>
> Here is what it looks like:
>
> # btrfs fi df /home
> Data, RAID1: total=2.51TiB, used=1.58TiB
> System, RAID1: total=32.00MiB, used=372.00KiB
> Metadata, RAID1: total=48.00GiB, used=47.23GiB
>
>
> What I read from it, is we're almost full on metadata usage, and that
> might be causing out of space issues.

   This is highly likely.

> Reading past posts on this group, I can see it's recommended to run
> this if I hit out of space and the fs is low on metadata space:
>
> # btrfs fi balance start -dusage=5 /home
>
> Is it really the only workaround? Shouldn't the filesystem be more
> intelligent and be able to grab some more metadata space if it's
> running low?

   Currently, yes, it is the only approach. The automatic reclamation of unused chunks (or barely-used chunks) is on the projects list. Nobody's got round to implementing it yet.

> I'd appreciate some clarifications on this (FYI, it was running
> 3.11.4, upgraded to the latest rc now).

   Hope the above helps,

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- If it's December 1941 in Casablanca, what time is it ---
in New York?
no space left, metadata usage almost full?
I have a btrfs filesystem which has plenty of free space left, yet it's hitting out of space regularly.

Here is what it looks like:

# btrfs fi df /home
Data, RAID1: total=2.51TiB, used=1.58TiB
System, RAID1: total=32.00MiB, used=372.00KiB
Metadata, RAID1: total=48.00GiB, used=47.23GiB

What I read from it, is we're almost full on metadata usage, and that might be causing out of space issues.

Reading past posts on this group, I can see it's recommended to run this if I hit out of space and the fs is low on metadata space:

# btrfs fi balance start -dusage=5 /home

Is it really the only workaround? Shouldn't the filesystem be more intelligent and be able to grab some more metadata space if it's running low?

I'd appreciate some clarifications on this (FYI, it was running 3.11.4, upgraded to the latest rc now).

--
Tomasz Chmielewski
http://wpkg.org
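To put a number on "almost full", the btrfs fi df output above can be turned into per-type usage percentages. The following is a minimal sketch that parses only the line format shown in this message; the regular expression and the `parse_fi_df` helper are illustrative assumptions, not part of btrfs-progs, and other btrfs-progs versions print slightly different formats.

```python
import re

# Binary unit suffixes as printed in the sample output; "" covers a bare "0.00"
UNITS = {"": 1, "B": 1, "KiB": 1024, "MiB": 1024 ** 2,
         "GiB": 1024 ** 3, "TiB": 1024 ** 4}

LINE_RE = re.compile(
    r"^(?P<kind>\w+),\s*(?P<profile>\w+):\s*"
    r"total=(?P<total>[\d.]+)(?P<tunit>[A-Za-z]*),\s*"
    r"used=(?P<used>[\d.]+)(?P<uunit>[A-Za-z]*)$"
)

SAMPLE = """\
Data, RAID1: total=2.51TiB, used=1.58TiB
System, RAID1: total=32.00MiB, used=372.00KiB
Metadata, RAID1: total=48.00GiB, used=47.23GiB
"""


def parse_fi_df(text):
    """Return a list of (kind, profile, total_bytes, used_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not in the expected format
        total = float(m["total"]) * UNITS[m["tunit"]]
        used = float(m["used"]) * UNITS[m["uunit"]]
        rows.append((m["kind"], m["profile"], total, used))
    return rows


# Percentage of each allocated chunk type that is actually in use
usage = {kind: used / total * 100
         for kind, _profile, total, used in parse_fi_df(SAMPLE)}
for kind, pct in usage.items():
    print(f"{kind}: {pct:.1f}% of allocated space used")
```

With the figures above, the allocated metadata chunks are over 98% full while the data chunks are only about 63% full, which matches the reading that metadata is the bottleneck.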
Re: btrfs balance on single device
On Wed, Dec 18, 2013 at 11:05:29AM +, Hugo Mills wrote:
> On Wed, Dec 18, 2013 at 10:44:43AM +, Leonidas Spyropoulos wrote:
> > I'm using the same subject as it might be relevant; feel free to change it.
> >
> > I'm trying to do some maintenance on a system running on a btrfs root (/)
> > file system. I started a balance on the '/' partition and it failed
> > with the information below:
> > $ sudo btrfs balance start /
> > [sudo] password for inglor:
> > ERROR: error during balancing '/' - No space left on device
> > There may be more info in syslog - try dmesg | tail
> > $ dmesg | tail
> > [93827.115887] btrfs: found 29461 extents
> > [93827.481849] btrfs: relocating block group 29855055872 flags 1
> > [93841.646011] btrfs: found 33171 extents
> > [93851.421207] btrfs: found 33171 extents
> > [93851.782054] btrfs: relocating block group 28781314048 flags 1
> > [93866.815342] btrfs: found 52535 extents
> > [93877.159354] btrfs: found 52534 extents
> > [93877.356805] btrfs: relocating block group 28747759616 flags 34
> > [93880.287185] btrfs: found 1 extents
> > [93880.608798] btrfs: 1 enospc errors during balance
>
>    You don't specify your kernel version, but if it's older than 3.11
> or so, you should probably upgrade -- 3.10 and earlier had occasional
> bugs where the block reserve system never kept enough blocks free to
> add a new metadata chunk when it was needed, which led to exactly this
> kind of symptom.

You are right, apologies. It is an up-to-date Arch Linux box with this kernel:

$ uname -a
Linux tiamat 3.12.5-1-ARCH #1 SMP PREEMPT Thu Dec 12 12:57:31 CET 2013 x86_64 GNU/Linux

>
>    Alternatively, and this is a bit of a long shot given that the
> error seems to have been while relocating your system chunk (which
> argues against this particular diagnosis), but:
>
>    Do you have a large file on that filesystem (larger than 1 GiB)?

Unlikely, since the btrfs file system in question is '/' excluding the /opt and /media directories (these are other partitions):

$ sudo find / -type f -size +1048576k -and -not -path "/media*" -print
/proc/kcore
find: `/proc/27221/task/27221/fd/5': No such file or directory
find: `/proc/27221/task/27221/fdinfo/5': No such file or directory
find: `/proc/27221/fd/5': No such file or directory
find: `/proc/27221/fdinfo/5': No such file or directory
find: `/run/user/1000/gvfs': Permission denied
inglor@tiamat ~$

>
>    If so, I would recommend switching to a 3.12 kernel, and running a
> defrag on the file. There's a known and now-fixed bug where you can
> get ENOSPC while balancing, if a file has an extent larger than 1 GiB
> in size. (The bug being that there's an extent over 1 GiB in size in
> the first place).

I might try the defrag option anyway and restart the balance operation, to see if it helps.

Thanks,
Leonidas

>
>    Hugo.
>
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
> PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
> --- I'd make a joke about UDP, but I don't know if ---
> anyone's actually listening...
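The search for files larger than 1 GiB can also be sketched in Python, which makes it easier to skip pseudo-filesystems such as /proc that produced the noise in the find output above. This is a rough equivalent and not a drop-in replacement for find: the `large_files` helper and its `exclude` parameter are hypothetical names, and exclusion is by path prefix rather than by device number, since btrfs subvolumes report their own device IDs.

```python
import os
import stat

GIB = 1024 ** 3


def large_files(root, min_bytes=GIB, exclude=()):
    """Yield (path, size) for regular files larger than min_bytes under root.

    `exclude` is a tuple of path prefixes to skip entirely, similar in
    spirit to `-not -path "/media*"` in the find command.
    """
    exclude = tuple(exclude)
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        # Prune excluded directories in place so os.walk never descends into them
        dirnames[:] = [d for d in dirnames
                       if not os.path.join(dirpath, d).startswith(exclude)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path.startswith(exclude):
                continue
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable, as with /proc entries
            if stat.S_ISREG(st.st_mode) and st.st_size > min_bytes:
                yield path, st.st_size
```

A call such as `list(large_files("/", exclude=("/proc", "/sys", "/media", "/opt")))` would approximate the search in the message, assuming it is run with enough privileges to read the tree.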
Re: [PATCH] Btrfs: improve the performance fluctuating of the fsync
On Wed, Dec 18, 2013 at 06:52:44PM +0800, Miao Xie wrote:
> In order to improve the performance of fsync, we use the outstanding
> ordered extents to avoid looking up the checksum from the csum tree.
> But we didn't filter out the ordered extents whose csum is still being
> calculated, when we got those ordered extents, we had to wait for the
> csum calculation. It made the performance dropped down suddenly. (On
> my box, it drop down from 56MB/s to 4-10MB/s)
>
> But actually, the csum calculation of the ordered extents which were
> introduced by the current fsync had already completed. Those ordered
> extents whose csum was being calculated didn't belong to the current
> fsync, we can ignore them.
>
> By this patch, the performance fluctuating doesn't happen, and the average
> performance grows up by ~2%.
>
[..]

Will this help with apt-get performance on a btrfs file system? As far as I understand, the slowness there is caused by multiple fsync calls.
Re: btrfs balance on single device
On Wed, Dec 18, 2013 at 10:44:43AM +, Leonidas Spyropoulos wrote:
> I'm using the same subject as it might be relevant; feel free to change it.
>
> I'm trying to do some maintenance on a system running on a btrfs root (/)
> file system. I started a balance on the '/' partition and it failed
> with the information below:
> $ sudo btrfs balance start /
> [sudo] password for inglor:
> ERROR: error during balancing '/' - No space left on device
> There may be more info in syslog - try dmesg | tail
> $ dmesg | tail
> [93827.115887] btrfs: found 29461 extents
> [93827.481849] btrfs: relocating block group 29855055872 flags 1
> [93841.646011] btrfs: found 33171 extents
> [93851.421207] btrfs: found 33171 extents
> [93851.782054] btrfs: relocating block group 28781314048 flags 1
> [93866.815342] btrfs: found 52535 extents
> [93877.159354] btrfs: found 52534 extents
> [93877.356805] btrfs: relocating block group 28747759616 flags 34
> [93880.287185] btrfs: found 1 extents
> [93880.608798] btrfs: 1 enospc errors during balance

   You don't specify your kernel version, but if it's older than 3.11 or so, you should probably upgrade -- 3.10 and earlier had occasional bugs where the block reserve system never kept enough blocks free to add a new metadata chunk when it was needed, which led to exactly this kind of symptom.

   Alternatively, and this is a bit of a long shot given that the error seems to have been while relocating your system chunk (which argues against this particular diagnosis), but:

   Do you have a large file on that filesystem (larger than 1 GiB)?

   If so, I would recommend switching to a 3.12 kernel, and running a defrag on the file. There's a known and now-fixed bug where you can get ENOSPC while balancing, if a file has an extent larger than 1 GiB in size. (The bug being that there's an extent over 1 GiB in size in the first place).

   Hugo.

> $ df |grep sda2
> /dev/sda2 20971520 13980396 5797124 71% /
>
>
> $ sudo btrfs fi show
> [sudo] password for inglor:
> Label: none uuid: 699d671b-7064-441d-95ec-c616049fe287
>  Total devices 1 FS bytes used 12.75GB
>  devid 1 size 20.00GB used 15.31GB path /dev/sda2
>
> Btrfs v0.20-rc1-358-g194aa4a-dirty
>
> $ sudo btrfs fi df /
> [sudo] password for inglor:
> Data: total=13.00GB, used=12.16GB
> System, DUP: total=32.00MB, used=4.00KB
> Metadata, DUP: total=1.12GB, used=601.54MB
>
> Does it really need more than 5.7GB to do a balance? I thought it was
> supposed to move chunks one by one, and considering that the chunk size is
> 1GB for Data and 512MB for Metadata (256MB x2 for duplication), that should
> be more than enough. Also, I had less free space before, and dmesg reported
> 7 enospc errors. After removing some installed packages it now reports only
> 1 enospc error. Is that relevant at all?
>
> Thanks,
> Leonidas

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I'd make a joke about UDP, but I don't know if ---
anyone's actually listening...
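The remark that the failure happened while relocating the system chunk can be read off the "flags" value in the dmesg lines. As a small illustration, the sketch below decodes those values using the block group type and profile bits from the kernel's btrfs headers (only the bits up to RAID10 are listed); the `decode_flags` helper is a made-up name for this example.

```python
# Block group flag bits as defined in the kernel's btrfs headers
# (type bits first, then profile bits; later additions are omitted here)
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
}


def decode_flags(flags):
    """Return the names of all known bits set in a block group flags value."""
    return [name for bit, name in sorted(BLOCK_GROUP_FLAGS.items())
            if flags & bit]


for value in (1, 34):
    print(value, decode_flags(value))
```

So "flags 1" is a plain data block group, and "flags 34" (32 + 2) is a DUP system block group, the last one being relocated before the enospc error was logged.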
[PATCH] Btrfs: improve the performance fluctuating of the fsync
In order to improve the performance of fsync, we use the outstanding
ordered extents to avoid looking up the checksum from the csum tree.
But we didn't filter out the ordered extents whose csum is still being
calculated, when we got those ordered extents, we had to wait for the
csum calculation. It made the performance dropped down suddenly. (On
my box, it drop down from 56MB/s to 4-10MB/s)

But actually, the csum calculation of the ordered extents which were
introduced by the current fsync had already completed. Those ordered
extents whose csum was being calculated didn't belong to the current
fsync, we can ignore them.

By this patch, the performance fluctuating doesn't happen, and the average
performance grows up by ~2%.

Test Environment:
CPU:		2CPU * 2Cores
Memory:		4GB
Partition:	20GB(HDD)

Test Command:
 # sysbench --num-threads=8 --test=fileio --file-num=1 \
 > --file-total-size=8G --file-block-size=32768 \
 > --file-io-mode=sync --file-fsync-freq=100 \
 > --file-fsync-end=no --max-requests=1 \
 > --file-test-mode=rndwr run

Signed-off-by: Miao Xie
---
 fs/btrfs/ordered-data.c | 3 +++
 fs/btrfs/tree-log.c     | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index b8c2ded..df87ed5 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -433,6 +433,9 @@ void btrfs_get_logged_extents(struct btrfs_root *log, struct inode *inode)
 	spin_lock_irq(&tree->lock);
 	for (n = rb_first(&tree->tree); n; n = rb_next(n)) {
 		ordered = rb_entry(n, struct btrfs_ordered_extent, rb_node);
+		if (ordered->csum_bytes_left)
+			continue;
+
 		spin_lock(&log->log_extents_lock[index]);
 		if (list_empty(&ordered->log_list)) {
 			list_add_tail(&ordered->log_list, &log->logged_list[index]);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index ba2f151..3eae2eb 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3631,8 +3631,6 @@ again:
 		 * start over after this.
 		 */
 
-		wait_event(ordered->wait, ordered->csum_bytes_left == 0);
-
 		list_for_each_entry(sum, &ordered->list, list) {
 			ret = btrfs_csum_file_blocks(trans, log, sum);
 			if (ret) {
--
1.8.3.1
Re: btrfs balance on single device
I'm using the same subject as it might be relevant; feel free to change it.

I'm trying to do some maintenance on a system running on a btrfs root (/) file system. I started a balance on the '/' partition and it failed with the information below:

$ sudo btrfs balance start /
[sudo] password for inglor:
ERROR: error during balancing '/' - No space left on device
There may be more info in syslog - try dmesg | tail
$ dmesg | tail
[93827.115887] btrfs: found 29461 extents
[93827.481849] btrfs: relocating block group 29855055872 flags 1
[93841.646011] btrfs: found 33171 extents
[93851.421207] btrfs: found 33171 extents
[93851.782054] btrfs: relocating block group 28781314048 flags 1
[93866.815342] btrfs: found 52535 extents
[93877.159354] btrfs: found 52534 extents
[93877.356805] btrfs: relocating block group 28747759616 flags 34
[93880.287185] btrfs: found 1 extents
[93880.608798] btrfs: 1 enospc errors during balance

$ df |grep sda2
/dev/sda2 20971520 13980396 5797124 71% /

$ sudo btrfs fi show
[sudo] password for inglor:
Label: none uuid: 699d671b-7064-441d-95ec-c616049fe287
 Total devices 1 FS bytes used 12.75GB
 devid 1 size 20.00GB used 15.31GB path /dev/sda2

Btrfs v0.20-rc1-358-g194aa4a-dirty

$ sudo btrfs fi df /
[sudo] password for inglor:
Data: total=13.00GB, used=12.16GB
System, DUP: total=32.00MB, used=4.00KB
Metadata, DUP: total=1.12GB, used=601.54MB

Does it really need more than 5.7GB to do a balance? I thought it was supposed to move chunks one by one, and considering that the chunk size is 1GB for Data and 512MB for Metadata (256MB x2 for duplication), that should be more than enough. Also, I had less free space before, and dmesg reported 7 enospc errors. After removing some installed packages it now reports only 1 enospc error. Is that relevant at all?

Thanks,
Leonidas
[PATCH v2] Btrfs-progs: receive: fix the case that we can not find the subvolume
If we change our default subvolume, btrfs receive will fail to find
subvolume. To fix this problem, we have three ideas:

1.make btrfs snapshot ioctl support passing source subvolume's objectid.
2.when we want to using interval subvolume path, we mount it other place
  that use subvolume 5 as its default subvolume.
3.tell the user to mount the toplevel subvol by himself and run receive

We's better use the third approach because first patch will bother kernel
change and the second approach is not very good for power users. So give
this option to users.

Reported-by: Michael Welsh Duggan
Signed-off-by: Wang Shilong
Signed-off-by: Miao Xie
---
Changelog:
v1->v2: addressed david's comments and use the third approach to fix
the problem
---
 cmds-receive.c | 11 +--
 man/btrfs.8.in | 15 ++-
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index ed44107..cce37a7 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -257,8 +257,15 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
 			O_RDONLY | O_NOATIME);
 	if (args_v2.fd < 0) {
 		ret = -errno;
-		fprintf(stderr, "ERROR: open %s failed. %s\n",
-			parent_subvol->path, strerror(-ret));
+		if (errno != ENOENT)
+			fprintf(stderr, "ERROR: open %s failed. %s\n",
+				parent_subvol->path, strerror(-ret));
+		else
+			fprintf(stderr,
+				"It seems that you have changed your default "
+				"subvolume or you specify other subvolume to\n"
+				"mount btrfs, try to remount this btrfs filesystem "
+				"with fs tree, and run btrfs receive again!\n");
 		goto out;
 	}
 
diff --git a/man/btrfs.8.in b/man/btrfs.8.in
index 901caa5..ece6a5a 100644
--- a/man/btrfs.8.in
+++ b/man/btrfs.8.in
@@ -668,11 +668,16 @@ Receive subvolumes from stdin.
 Receives one or more subvolumes that were previously
 sent with btrfs send. The received subvolumes are stored
 into \fI\fP.
-btrfs receive will fail in case a receiving subvolume
-already exists. It will also fail in case a previously
-received subvolume was changed after it was received.
-After receiving a subvolume, it is immediately set to
-read only.
+btrfs receive will fail with the following case:
+
+1.a receiving subvolume already exists.
+
+2.a previously received subvolume was changed after it was received.
+
+3.default subvolume is changed or you don't mount btrfs filesystem with
+fs tree.
+
+After receiving a subvolume, it is immediately set to read only.
 .RS
 
 \fIOptions\fR
--
1.8.3.1
Re: Btrfs RAID1 File System Grew Something Extra
Garry T. Williams posted on Tue, 17 Dec 2013 23:12:25 -0500 as excerpted:

> On 12-18-13 10:46:29 Anand Jain wrote:
>> On 12/18/2013 10:03 AM, Garry T. Williams wrote:
>> > I have been using btrfs for my /home partition on my home machine for
>> > a few years now. I created the file system RAID1 using two disk
>> > partitions. Recently I noticed btrfs fi df shows extra Data, System,
>> > and Metadata allocations.
>>
>> this is a known bug in mkfs.btrfs, the workaround for now is to run
>> balance on FS having some data. so that unused group-profile
>> will go away.
>
> Thanks.
>
> garry@vfr$ sudo btrfs balance start /home
> Done, had to relocate 50 out of 50 chunks
> garry@vfr$ sudo btrfs filesystem df /home
> Data, RAID1: total=22.00GiB, used=21.02GiB
> System, RAID1: total=32.00MiB, used=12.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, RAID1: total=1.00GiB, used=419.60MiB
>
> Hmmm.
>
> Well, it's better, but the extra allocation for System is baffling. I
> believe that this happened sometime after creating the file system.

Keep in mind that btrfs remains under development, still improving old features and growing new ones as well as bugfixing (and of course unfortunately still adding new bugs with the new code occasionally, it comes with the development filesystem territory).

Having seen the same thing happen here, I think the extra allocations were there all the time, but simply weren't originally reported. After some improvements in btrfs fi df, it more accurately reported the empty chunk-stub relics of mkfs.btrfs where it didn't before, so they appeared to be new even if they'd been there all the time. But a balance normally does remove them.

Tho that doesn't explain why the balance didn't remove that 4 MiB single-mode system stub. It did on all my btrfs here. But I run gentoo and build/install gentoo's live-git btrfs-progs, and build/run the mainline development kernel from live-git as well, so I'm well into the 3.13-rcs by now, while you haven't even upgraded to 3.12 yet and are still on 3.11-stable series, which might account for that.

Or perhaps another balance would kill the system-stub as well? I don't know.

> Also balance on a RAID1 file system with exactly two drives doesn't make
> much sense to me. Why would any "chunks" have to be relocated? I'm
> clearly missing something here.

You haven't read up on how btrfs balance works at the wiki, have you? Which means you're probably missing other information that might be helpful in administering your btrfs as well. It'll likely be worth your while to spend some time reading the user documentation there (and to bookmark it for further reference, too =:^) :

https://btrfs.wiki.kernel.org/

For balance in particular, see:

https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F

Also of interest:

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

Of course, the btrfs manpage is also useful.

Basically, unless you limit it with the -d/-m/-s switches and/or filters, balance blindly rewrites/relocates every chunk on the filesystem, cleaning up and if applicable converting between redundancy types as it does so. So all chunks are relocated/rewritten.

But the above documentation should also suggest trying this to see if it addresses that remaining single-mode system chunk stub:

btrfs balance start -fsconvert=raid1 /home

Especially if you're on spinning rust (not SSD), that should take quite a bit less time than a full balance as well, because you're only rebalancing the few MiB of system chunks, not the GiBs of data and metadata.

Hopefully that'll kill the single-mode system stub-chunk. If not, you've probably hit a bug and should report it as such, tho you might wish to try it with the latest 3.12 stable or 3.13-rc first, in case the bug has already been fixed.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Btrfs RAID1 File System Grew Something Extra
On Tue, Dec 17, 2013 at 11:12:25PM -0500, Garry T. Williams wrote:
> On 12-18-13 10:46:29 Anand Jain wrote:
> > On 12/18/2013 10:03 AM, Garry T. Williams wrote:
> > > I have been using btrfs for my /home partition on my home machine for
> > > a few years now. I created the file system RAID1 using two disk
> > > partitions. Recently I noticed btrfs fi df shows extra Data, System,
> > > and Metadata allocations. And btrfs fi show indicates extra
> > > allocations on one of my disk drives accounting for the 20 MiB
> > > allocation in the df display.
> >
> > this is a known bug in mkfs.btrfs, the workaround for now is
> > to run balance on FS having some data. so that unused group-profile
> > will go away.
>
> Thanks.
>
> garry@vfr$ sudo btrfs balance start /home
> Done, had to relocate 50 out of 50 chunks
> garry@vfr$ sudo btrfs filesystem df /home
> Data, RAID1: total=22.00GiB, used=21.02GiB
> System, RAID1: total=32.00MiB, used=12.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, RAID1: total=1.00GiB, used=419.60MiB
>
> Hmmm.
>
> Well, it's better, but the extra allocation for System is baffling. I
> believe that this happened sometime after creating the file system.

   It won't be spontaneously created -- it'll have been there since the beginning. The first system chunk is "special" and is skipped during balances, so it won't get cleaned up like this.

> Also balance on a RAID1 file system with exactly two drives doesn't
> make much sense to me. Why would any "chunks" have to be relocated?
> I'm clearly missing something here.

   That's what balance does -- it rewrites every single piece of data on the filesystem. In this case, you could have used a filter to balance (and hence remove) only the single chunks:

btrfs balance start -mprofiles=single -dprofiles=single -sprofiles=single /mountpoint

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- One of these days, I'll catch that man without a quotation, ---
and he'll look undressed.
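Spotting leftover chunks of a second profile, like the "System, single" line in the quoted output, can be automated. The sketch below is an illustration only: it assumes the `Kind, profile: total=...` line format shown in this thread, and the `profiles_by_kind` and `mixed_profiles` helpers are hypothetical names, not part of btrfs-progs.

```python
import re
from collections import defaultdict

# Output of `btrfs filesystem df /home` as quoted in the message
SAMPLE = """\
Data, RAID1: total=22.00GiB, used=21.02GiB
System, RAID1: total=32.00MiB, used=12.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=1.00GiB, used=419.60MiB
"""

# Only the chunk type and profile are needed, so the sizes are not parsed
PROFILE_RE = re.compile(r"^(\w+),\s*(\w+):\s*total=")


def profiles_by_kind(text):
    """Map each chunk type (Data/System/Metadata) to the profiles listed for it."""
    found = defaultdict(list)
    for line in text.splitlines():
        m = PROFILE_RE.match(line.strip())
        if m:
            found[m.group(1)].append(m.group(2))
    return dict(found)


def mixed_profiles(text):
    """Return only the chunk types that appear with more than one profile."""
    return {kind: profiles
            for kind, profiles in profiles_by_kind(text).items()
            if len(set(profiles)) > 1}


leftovers = mixed_profiles(SAMPLE)
print(leftovers)
```

For this output only the System type is reported, with both RAID1 and single profiles present, which is the case the profiles=single balance filter above is meant to target.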