Re: [PATCH v2 1/4] btrfs: Remove userspace transaction ioctls
These ioctl are originally introduced by Sage Weil for Ceph use? Not sure whether it still useful, Cc Sage just in case. 在 2018年2月5日,下午5:41,Nikolay Borisov写道: Commit 3558d4f88ec8 ("btrfs: Deprecate userspace transaction ioctls") marked the beginning of the end of userspace transaction. This commit finishes the job! Signed-off-by: Nikolay Borisov --- V2: * Also remove the usage of btrfs_ioctl_trans_end from btrfs_release_file so that the patch compiles on its own as well. fs/btrfs/ctree.h | 1 - fs/btrfs/file.c | 8 - fs/btrfs/ioctl.c | 95 3 files changed, 104 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1a462ab85c49..6a4752177ad8 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3193,7 +3193,6 @@ void btrfs_destroy_inode(struct inode *inode); int btrfs_drop_inode(struct inode *inode); int __init btrfs_init_cachep(void); void btrfs_destroy_cachep(void); -long btrfs_ioctl_trans_end(struct file *file); struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, struct btrfs_root *root, int *was_new); struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index cba2ac371ce0..101e0c7fea92 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1996,8 +1996,6 @@ int btrfs_release_file(struct inode *inode, struct file *filp) { struct btrfs_file_private *private = filp->private_data; - if (private && private->trans) - btrfs_ioctl_trans_end(filp); if (private && private->filldir_buf) kfree(private->filldir_buf); kfree(private); @@ -2189,12 +2187,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) } /* -* ok we haven't committed the transaction yet, lets do a commit -*/ - if (file->private_data) - btrfs_ioctl_trans_end(file); - - /* * We use start here because we will need to wait on the IO to complete * in btrfs_sync_log, which could require joining a transaction (for * example checking cross references in the nocow path). If we use join diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f573cad72b7e..3094e079fc4f 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3935,73 +3935,6 @@ int btrfs_clone_file_range(struct file *src_file, loff_t off, return btrfs_clone_files(dst_file, src_file, off, len, destoff); } -/* - * there are many ways the trans_start and trans_end ioctls can lead - * to deadlocks. They should only be used by applications that - * basically own the machine, and have a very in depth understanding - * of all the possible deadlocks and enospc problems. - */ -static long btrfs_ioctl_trans_start(struct file *file) -{ - struct inode *inode = file_inode(file); - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_root *root = BTRFS_I(inode)->root; - struct btrfs_trans_handle *trans; - struct btrfs_file_private *private; - int ret; - static bool warned = false; - - ret = -EPERM; - if (!capable(CAP_SYS_ADMIN)) - goto out; - - if (!warned) { - btrfs_warn(fs_info, - "Userspace transaction mechanism is considered " - "deprecated and slated to be removed in 4.17. " - "If you have a valid use case please " - "speak up on the mailing list"); - WARN_ON(1); - warned = true; - } - - ret = -EINPROGRESS; - private = file->private_data; - if (private && private->trans) - goto out; - if (!private) { - private = kzalloc(sizeof(struct btrfs_file_private), - GFP_KERNEL); - if (!private) - return -ENOMEM; - file->private_data = private; - } - - ret = -EROFS; - if (btrfs_root_readonly(root)) - goto out; - - ret = mnt_want_write_file(file); - if (ret) - goto out; - - atomic_inc(_info->open_ioctl_trans); - - ret = -ENOMEM; - trans = btrfs_start_ioctl_transaction(root); - if (IS_ERR(trans)) - goto out_drop; - - private->trans = trans; - return 0; - -out_drop: - atomic_dec(_info->open_ioctl_trans); - mnt_drop_write_file(file); -out: - return ret; -} - static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp) { struct inode *inode = file_inode(file); @@ -4243,30 +4176,6 @@ static long btrfs_ioctl_space_info(struct btrfs_fs_info *fs_info, return ret; } -/* - * there are many ways the trans_start and trans_end ioctls can lead - * to deadlocks. They should only be used by
Re: [PATCH 0/5] btrfs-progs: resolve property and enhance defragment
在 2017年11月2日,上午11:44,Su Yue写道: Sorry, the patchset does not work as expected. Please ignore it. On 11/02/2017 11:23 AM, Su Yue wrote: > The patchset adds an option '--compress-force' to work with > 'btrfs fi defrag -c'. Then no-compression files will be set with > compression property specified by '-c' (zlib default). > patch[1-4] divide property handler to setter, getter, and printer. > Then patch[5] could enhance defragment easier. > Su Yue (5): > btrfs-progs: property: divide prop_handler_t into > setter,getter,printer > btrfs-progs: property: set/get/print ro property > btrfs-progs: property: set/get/print label property > btrfs-progs: property: set/get/print compression property > btrfs-progs: fi defrag: extend -c to drop nocompress flag on files > cmds-filesystem.c | 94 +++-- > cmds-property.c | 7 +- > props.c | 207 > +- > props.h | 19 +++-- > 4 files changed, 283 insertions(+), 44 deletions(-) FYI, i think it better to add some tests to btrfs-progs if new interface or function like this is added. Thanks, Shilong -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
Hi Qu, On Fri, Aug 4, 2017 at 10:05 PM, Qu Wenruowrote: > > > On 2017年08月02日 16:38, Brendan Hide wrote: >> >> The title seems alarmist to me - and I suspect it is going to be >> misconstrued. :-/ >> >> From the release notes at >> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html >> >> "Btrfs has been deprecated >> >> The Btrfs file system has been in Technology Preview state since the >> initial release of Red Hat Enterprise Linux 6. Red Hat will not be moving >> Btrfs to a fully supported feature and it will be removed in a future major >> release of Red Hat Enterprise Linux. >> >> The Btrfs file system did receive numerous updates from the upstream in >> Red Hat Enterprise Linux 7.4 and will remain available in the Red Hat >> Enterprise Linux 7 series. However, this is the last planned update to this >> feature. >> >> Red Hat will continue to invest in future technologies to address the use >> cases of our customers, specifically those related to snapshots, >> compression, NVRAM, and ease of use. We encourage feedback through your Red >> Hat representative on features and requirements you have for file systems >> and storage technology." > > > Personally speaking, unlike most of the btrfs supporters, I think Red Hat is > doing the correct thing for their enterprise use case. > > (To clarify, I'm not going to Red Hat, just in case anyone wonders why I'm > not supporting btrfs) > > [Good things of btrfs] > Btrfs is indeed a technic pioneer in a lot of aspects (at least in linux > world): > > 1) Metadata CoW instead of traditional journal > 2) Snapshot and delta-backup > I think this is the killer feature of Btrfs, and why SUSE is using it > for root fs. > 3) Default data CoW > 4) Data checksum and scrubbing > 5) Multi-device management > 6) Online resize/balancing > And a lot of more. > > [Bad things of btrfs] > But for enterprise usage, it's too advanced and has several problems > preventing them being widely applied: > > 1) Low performance from metadata/data CoW > This is a little complicated dilemma. > Although Btrfs can disable data CoW, nodatacow also disables data > checksum, which is another main feature for btrfs. > So Btrfs can't default to nodatacow, unlike XFS. > > And metadata CoW causes extra metadata write along with superblock > update (FUA), further degrading the performance. > > Such pioneered design makes traditional performance-intense use case > very unhappy. > Especially for almost all kind of databases. (Note that nodatacow can't > always solve the performance problem). > Most performance intense usage is still based on tradtional fs design > (journal with no CoW) > > 2) Low concurrency caused by tree design. > Unlike traditional one-tree-for-one-inode design, btrfs uses > one-tree-for-one-subvolume. > The design makes snapshot implementation very easy, while makes tree > very hot when a lot of modifiers are trying to modify any metadata. > > Btrfs has a lot of different way to solve it. > For extent tree (the most busy tree), we are using delayed-ref to speed > up extent tree update. > For fs tree fsync, we have log tree to speed things up. > These approaches work, at the cost of complexity and bugs, and we still > have slow fs tree modification speed. > > 3) Low code reusage of device-mapper. > I totally understand that, due to the unique support for data csum, > btrfs can't use device-mapper directly, as we must verify the data read out > from device before passing it to higher level. > So Btrfs uses its own device-mapper like implementation to handle > multi-device management. > > The result is mixed. For easy to handle case like RAID0/1/10 btrfs is > doing well. > While for RAID5/6, everyone knows the result. > > Such btrfs *enhanced* re-implementation not only makes btrfs larger but > also more complex and bug-prune. > > In short, btrfs is too advanced for generic use cases (performance) and > developers (bugs), unfortunately. > > And even SUSE is just pushing btrfs as root fs, mainly for the snapshot > feature. > But still ext4/xfs for data or performance intense use case. > > > [Other solution on the table] > On the other hand, I think RedHat is pushing storage technology based on LVM > (thin) and Xfs. > > For traditional LVM, it's stable but its snapshot design is old-fashion and > low-performance. > While new thin-provision LVM solves the problem using a method just like > Btrfs, but at block level. > > And for XFS, it's still traditional designed, journal based, > one-tree-for-one-inode. > But with fancy new features like data CoW. > > Even XFS + LVM-thin lacks ability to shrink fs or scrub data or delta > backup, it can do a lot of things just like Btrfs. > From snapshot to multi-device management. > > And more importantly, has better
Re: [PATCH] Btrfs: avoid unnecessarily locking inode when clearing a range
On Thu, Aug 3, 2017 at 11:00 PM, Chris Mason <c...@fb.com> wrote: > > > On 07/27/2017 02:52 PM, fdman...@kernel.org wrote: >> >> From: Filipe Manana <fdman...@suse.com> >> >> If the range being cleared was not marked for defrag and we are not >> about to clear the range from the defrag status, we don't need to >> lock and unlock the inode. >> >> Signed-off-by: Filipe Manana <fdman...@suse.com> > > > Thanks Filipe, looks like it goes all the way back to: > > commit 47059d930f0e002ff851beea87d738146804726d > Author: Wang Shilong <wangsl.f...@cn.fujitsu.com> > Date: Thu Jul 3 18:22:07 2014 +0800 > > Btrfs: make defragment work with nodatacow option > > I can't see how the inode lock is required here. This blames to me, thanks for fixing it. Reviewed-by: Wang Shilong <wangshilong1...@gmail.com> > > Reviewed-by: Chris Mason <c...@fb.com> > > -chris > >> --- >> fs/btrfs/inode.c | 7 --- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index eb495e956d53..51c45c0a8553 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -1797,10 +1797,11 @@ static void btrfs_clear_bit_hook(void >> *private_data, >> u64 len = state->end + 1 - state->start; >> u32 num_extents = count_max_extents(len); >> - spin_lock(>lock); >> - if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG)) >> + if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG)) { >> + spin_lock(>lock); >> inode->defrag_bytes -= len; >> - spin_unlock(>lock); >> + spin_unlock(>lock); >> + } >> /* >> * set_bit and clear bit hooks normally require _irqsave/restore >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?
I haven't seen active btrfs developers from some time, Redhat looks put most of their efforts on XFS, It is time to switch to SLES/opensuse! On Wed, Aug 2, 2017 at 4:38 PM, Brendan Hidewrote: > The title seems alarmist to me - and I suspect it is going to be > misconstrued. :-/ > > From the release notes at > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/chap-Red_Hat_Enterprise_Linux-7.4_Release_Notes-Deprecated_Functionality.html > > "Btrfs has been deprecated > > The Btrfs file system has been in Technology Preview state since the initial > release of Red Hat Enterprise Linux 6. Red Hat will not be moving Btrfs to a > fully supported feature and it will be removed in a future major release of > Red Hat Enterprise Linux. > > The Btrfs file system did receive numerous updates from the upstream in Red > Hat Enterprise Linux 7.4 and will remain available in the Red Hat Enterprise > Linux 7 series. However, this is the last planned update to this feature. > > Red Hat will continue to invest in future technologies to address the use > cases of our customers, specifically those related to snapshots, > compression, NVRAM, and ease of use. We encourage feedback through your Red > Hat representative on features and requirements you have for file systems > and storage technology." > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PULL] Btrfs for 4.13, part 1
Sorry for bikeshedding. On Fri, Jun 23, 2017 at 11:16 PM, David Sterbawrote: > Hi, > > this is the main batch for 4.13. There are some user visible changes, see > below. The core updates improve error handling (mostly related to bios), with > the usual incremental work on the GFP_NOFS (mis)use removal. All patches have > > Fabian Frederick (1): > btrfs: kmap() can't fail > <..SNIP..> > > Sargun Dhillon (2): > btrfs: add quota override flag to enable quota override for > CAP_SYS_RESOURCE > btrfs: Add quota_override knob into sysfs > > Su Yue (9): > btrfs: Introduce btrfs_is_name_len_valid to avoid reading beyond > boundary > btrfs: Check name_len with boundary in verify dir_item > btrfs: Check name_len on add_inode_ref call path > btrfs: Verify dir_item in replay_xattr_deletes > btrfs: Check name_len in btrfs_check_ref_name_override > btrfs: Check name_len before read in iterate_dir_item > btrfs: Check name_len before reading btrfs_get_name > btrfs: Check name_len before in btrfs_del_root_ref > btrfs: Verify dir_item in iterate_object_props Hmm..add those check might be expensive for metadata operations, especially in hot path, i could see similar behavior in Ext4 for ext4 dentry check. Could we run some metadata tests to confirm?it makes sense to add check but with min affects to performace. Thanks, Shilong > Timofey Titovets (3): > Btrfs: lzo: fix typo in error message after failed deflate > Btrfs: lzo: compressed data size must be less then input size > Btrfs: compression must free at least one sector size > > Yonghong Song (1): > Btrfs: add statx support > > fs/btrfs/backref.c | 10 +- > fs/btrfs/check-integrity.c | 53 ++--- > fs/btrfs/compression.c | 94 ++-- > fs/btrfs/compression.h | 44 +++- > fs/btrfs/ctree.c | 42 ++-- > fs/btrfs/ctree.h | 84 --- > fs/btrfs/delayed-ref.c | 29 ++- > fs/btrfs/delayed-ref.h | 6 +- > fs/btrfs/dir-item.c | 94 +++- > fs/btrfs/disk-io.c | 179 +++ > fs/btrfs/disk-io.h | 8 +- > fs/btrfs/export.c| 5 + > fs/btrfs/extent-tree.c | 481 > +-- > fs/btrfs/extent_io.c | 217 -- > fs/btrfs/extent_io.h | 82 +-- > fs/btrfs/file-item.c | 31 ++- > fs/btrfs/file.c | 46 ++-- > fs/btrfs/free-space-tree.c | 38 ++-- > fs/btrfs/inode-map.c | 4 +- > fs/btrfs/inode.c | 449 > fs/btrfs/ioctl.c | 16 +- > fs/btrfs/lzo.c | 33 +-- > fs/btrfs/print-tree.c| 7 +- > fs/btrfs/props.c | 7 + > fs/btrfs/qgroup.c| 223 +- > fs/btrfs/qgroup.h| 9 +- > fs/btrfs/raid56.c| 16 +- > fs/btrfs/reada.c | 1 - > fs/btrfs/relocation.c| 15 +- > fs/btrfs/root-tree.c | 7 + > fs/btrfs/scrub.c | 209 +++-- > fs/btrfs/send.c | 112 ++--- > fs/btrfs/super.c | 72 +- > fs/btrfs/sysfs.c | 41 > fs/btrfs/tests/extent-io-tests.c | 2 +- > fs/btrfs/transaction.c | 23 +- > fs/btrfs/tree-log.c | 44 +++- > fs/btrfs/volumes.c | 74 +++--- > fs/btrfs/volumes.h | 7 + > fs/btrfs/xattr.c | 2 +- > fs/btrfs/zlib.c | 20 +- > include/trace/events/btrfs.h | 36 --- > include/uapi/linux/btrfs.h | 63 +++-- > 43 files changed, 1682 insertions(+), 1353 deletions(-) > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] btrfs-progs: du: fix to skip not btrfs dir/file
'btrfs file du' is a very useful tool to watch my system file usage information with snapshot aware. when trying to run following commands: [root@localhost btrfs-progs]# btrfs file du / Total Exclusive Set shared Filename ERROR: Failed to lookup root id - Inappropriate ioctl for device ERROR: cannot check space of '/': Unknown error -1 and My Filesystem looks like this: [root@localhost btrfs-progs]# df -Th Filesystem Type Size Used Avail Use% Mounted on devtmpfs devtmpfs 16G 0 16G 0% /dev tmpfs tmpfs 16G 368K 16G 1% /dev/shm tmpfs tmpfs 16G 1.4M 16G 1% /run tmpfs tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/sda3 btrfs 60G 19G 40G 33% / tmpfs tmpfs 16G 332K 16G 1% /tmp /dev/sdc btrfs 2.8T 166G 1.7T 9% /data /dev/sda2 xfs 2.0G 452M 1.6G 23% /boot /dev/sda1 vfat 1.9G 11M 1.9G 1% /boot/efi tmpfs tmpfs 3.2G 24K 3.2G 1% /run/user/1000 So I installed Btrfs as my root partition, but boot partition can be other fs. We can Let btrfs tool aware of this is not a btrfs file or directory and skip those files, so that someone like me could just run 'btrfs file du /' to scan all btrfs filesystems. After patch, it will look like: Total Exclusive Set shared Filename 0.00B 0.00B - //root/.bash_logout 0.00B 0.00B - //root/.bash_profile 0.00B 0.00B - //root/.bashrc 0.00B 0.00B - //root/.cshrc 0.00B 0.00B - //root/.tcshrc This works for me to analysis system usage and analysis performaces. Signed-off-by: Wang Shilong <wangshilong1...@gmail.com> --- v1->v2: remove extra unnecessary messages output --- cmds-fi-du.c | 8 +++- cmds-inspect.c | 2 +- utils.c| 8 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/cmds-fi-du.c b/cmds-fi-du.c index 12855a5..6d5bf35 100644 --- a/cmds-fi-du.c +++ b/cmds-fi-du.c @@ -389,8 +389,14 @@ static int du_walk_dir(struct du_dir_ctxt *ctxt, struct rb_root *shared_extents) dirfd(dirstream), shared_extents, , , 0); - if (ret) + if (ret == -ENOTTY) { + continue; + } else if (ret) { + fprintf(stderr, + "failed to walk dir/file: %s :%s\n", + entry->d_name, strerror(-ret)); break; + } ctxt->bytes_total += tot; ctxt->bytes_shared += shr; diff --git a/cmds-inspect.c b/cmds-inspect.c index dd7b9dd..2ae44be 100644 --- a/cmds-inspect.c +++ b/cmds-inspect.c @@ -323,7 +323,7 @@ static int cmd_inspect_rootid(int argc, char **argv) ret = lookup_ino_rootid(fd, ); if (ret) { - error("rootid failed with ret=%d", ret); + error("failed to lookup root id: %s", strerror(-ret)); goto out; } diff --git a/utils.c b/utils.c index 578fdb0..f73b048 100644 --- a/utils.c +++ b/utils.c @@ -2815,6 +2815,8 @@ path: if (fd < 0) goto err; ret = lookup_ino_rootid(fd, ); + if (ret) + error("failed to lookup root id: %s", strerror(-ret)); close(fd); if (ret < 0) goto err; @@ -3497,10 +3499,8 @@ int lookup_ino_rootid(int fd, u64 *rootid) args.objectid = BTRFS_FIRST_FREE_OBJECTID; ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, ); - if (ret < 0) { - error("failed to lookup root id: %s", strerror(errno)); - return ret; - } + if (ret < 0) + return -errno; *rootid = args.treeid; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: du: fix to skip not btrfs dir/file
On Wed, Jul 6, 2016 at 10:35 PM, Holger Hoffstätte <hol...@applied-asynchrony.com> wrote: > On 07/06/16 14:25, Wang Shilong wrote: >> 'btrfs file du' is a very useful tool to watch my system >> file usage with snapshot aware. >> >> when trying to run following commands: >> [root@localhost btrfs-progs]# btrfs file du / >> Total Exclusive Set shared Filename >> ERROR: Failed to lookup root id - Inappropriate ioctl for device >> ERROR: cannot check space of '/': Unknown error -1 >> >> and My Filesystem looks like this: >> [root@localhost btrfs-progs]# df -Th >> Filesystem Type Size Used Avail Use% Mounted on >> devtmpfs devtmpfs 16G 0 16G 0% /dev >> tmpfs tmpfs 16G 368K 16G 1% /dev/shm >> tmpfs tmpfs 16G 1.4M 16G 1% /run >> tmpfs tmpfs 16G 0 16G 0% /sys/fs/cgroup >> /dev/sda3 btrfs 60G 19G 40G 33% / >> tmpfs tmpfs 16G 332K 16G 1% /tmp >> /dev/sdc btrfs 2.8T 166G 1.7T 9% /data >> /dev/sda2 xfs 2.0G 452M 1.6G 23% /boot >> /dev/sda1 vfat 1.9G 11M 1.9G 1% /boot/efi >> tmpfs tmpfs 3.2G 24K 3.2G 1% /run/user/1000 >> >> So I installed Btrfs as my root partition, but boot partition >> can be other fs. >> >> We can Let btrfs tool aware of this is not a btrfs file or >> directory and skip those files, so that someone like me >> could just run 'btrfs file du /' to scan all btrfs filesystems. >> >> After patch, it will look like: >>Total Exclusive Set shared Filename >> skipping not btrfs dir/file: boot >> skipping not btrfs dir/file: dev >> skipping not btrfs dir/file: proc >> skipping not btrfs dir/file: run >> skipping not btrfs dir/file: sys >> 0.00B 0.00B - //root/.bash_logout >> 0.00B 0.00B - //root/.bash_profile >> 0.00B 0.00B - //root/.bashrc >> 0.00B 0.00B - //root/.cshrc >> 0.00B 0.00B - //root/.tcshrc >> >> This works for me to analysis system usage and analysis >> performaces. > > This is great, but can we please skip the "skipping .." messages? > Maybe it's just me but I really don't see the value of printing them > when they don't contribute to the result. > They also mess up the display. :) I don't have a taste whether it needed or not, because it is somehow useful to let users know some files/directories skipped Wait some other guys opinion for this... thanks, Shilong > > thanks, > Holger > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: du: fix to skip not btrfs dir/file
'btrfs file du' is a very useful tool to watch my system file usage with snapshot aware. when trying to run following commands: [root@localhost btrfs-progs]# btrfs file du / Total Exclusive Set shared Filename ERROR: Failed to lookup root id - Inappropriate ioctl for device ERROR: cannot check space of '/': Unknown error -1 and My Filesystem looks like this: [root@localhost btrfs-progs]# df -Th Filesystem Type Size Used Avail Use% Mounted on devtmpfs devtmpfs 16G 0 16G 0% /dev tmpfs tmpfs 16G 368K 16G 1% /dev/shm tmpfs tmpfs 16G 1.4M 16G 1% /run tmpfs tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/sda3 btrfs 60G 19G 40G 33% / tmpfs tmpfs 16G 332K 16G 1% /tmp /dev/sdc btrfs 2.8T 166G 1.7T 9% /data /dev/sda2 xfs 2.0G 452M 1.6G 23% /boot /dev/sda1 vfat 1.9G 11M 1.9G 1% /boot/efi tmpfs tmpfs 3.2G 24K 3.2G 1% /run/user/1000 So I installed Btrfs as my root partition, but boot partition can be other fs. We can Let btrfs tool aware of this is not a btrfs file or directory and skip those files, so that someone like me could just run 'btrfs file du /' to scan all btrfs filesystems. After patch, it will look like: Total Exclusive Set shared Filename skipping not btrfs dir/file: boot skipping not btrfs dir/file: dev skipping not btrfs dir/file: proc skipping not btrfs dir/file: run skipping not btrfs dir/file: sys 0.00B 0.00B - //root/.bash_logout 0.00B 0.00B - //root/.bash_profile 0.00B 0.00B - //root/.bashrc 0.00B 0.00B - //root/.cshrc 0.00B 0.00B - //root/.tcshrc This works for me to analysis system usage and analysis performaces. Signed-off-by: Wang Shilong <wangshilong1...@gmail.com> --- cmds-fi-du.c | 11 ++- cmds-inspect.c | 2 +- utils.c| 8 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/cmds-fi-du.c b/cmds-fi-du.c index 12855a5..bf0e62c 100644 --- a/cmds-fi-du.c +++ b/cmds-fi-du.c @@ -389,8 +389,17 @@ static int du_walk_dir(struct du_dir_ctxt *ctxt, struct rb_root *shared_extents) dirfd(dirstream), shared_extents, , , 0); - if (ret) + if (ret == -ENOTTY) { + fprintf(stdout, + "skipping not btrfs dir/file: %s\n", + entry->d_name); + continue; + } else if (ret) { + fprintf(stderr, + "failed to walk dir/file: %s :%s\n", + entry->d_name, strerror(-ret)); break; + } ctxt->bytes_total += tot; ctxt->bytes_shared += shr; diff --git a/cmds-inspect.c b/cmds-inspect.c index dd7b9dd..2ae44be 100644 --- a/cmds-inspect.c +++ b/cmds-inspect.c @@ -323,7 +323,7 @@ static int cmd_inspect_rootid(int argc, char **argv) ret = lookup_ino_rootid(fd, ); if (ret) { - error("rootid failed with ret=%d", ret); + error("failed to lookup root id: %s", strerror(-ret)); goto out; } diff --git a/utils.c b/utils.c index 578fdb0..f73b048 100644 --- a/utils.c +++ b/utils.c @@ -2815,6 +2815,8 @@ path: if (fd < 0) goto err; ret = lookup_ino_rootid(fd, ); + if (ret) + error("failed to lookup root id: %s", strerror(-ret)); close(fd); if (ret < 0) goto err; @@ -3497,10 +3499,8 @@ int lookup_ino_rootid(int fd, u64 *rootid) args.objectid = BTRFS_FIRST_FREE_OBJECTID; ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, ); - if (ret < 0) { - error("failed to lookup root id: %s", strerror(errno)); - return ret; - } + if (ret < 0) + return -errno; *rootid = args.treeid; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: About in-band dedupe for v4.7
Hello Guys, I am commenting not because of Qu's patch, of course, Qu and Mark Fasheh Do a really good thing for Btrfs contributions. Just my two cents! 1) I think Currently, we should really focus on Btrfs stability, there are still many bugs hidden inside Btrfs, please See Filipe flighting patches here and there. Unfortunately, I have not seen Btrfs's Bug is reducing for these two years even we have frozen new features here. we are adopting Btrfs internally sometime before, but it is really unstable unfortunately. So Why not sit down and make it stable firstly? 2)I am not against new feature, but for a new feature, I think we should be really careful now, Especially if a new feature affect normal write/read path, I think following things can be done to make things better: ->Give your design and documents(maybe put it in wiki or somewhere else) So that other guys can really review your design instead of looking a bunch of codes firstly. And we really need understand pros and cons of design, also if there is TODO, please do clarity out how we can solve this problem(or it is possible?). ->We need reallly a lot of testing and benchmarking if it affects normal write/read path. ->I think Chris is nice to merge patches, but I really argue next time if we want to merge new feautres Please make sure at least two other guys review for patches. Thank you! Shilong On Wed, May 11, 2016 at 6:11 AM, Mark Fashehwrote: > On Tue, May 10, 2016 at 03:19:52PM +0800, Qu Wenruo wrote: >> Hi, Chris, Josef and David, >> >> As merge window for v4.7 is coming, it would be good to hear your >> ideas about the inband dedupe. >> >> We are addressing the ENOSPC problem which Josef pointed out, and we >> believe the final fix patch would come out at the beginning of the >> merge window.(Next week) > > How about the fiemap performance problem you referenced before? My guess is > that it happens because you don't coalesce writes into anything larger than > a page so you're stuck deduping at some silly size like 4k. This in turn > fragments the files so much that fiemap has a hard time walking backrefs. > > I have to check the patches to be sure but perhaps you can tell me whether > my hunch is correct or not. > > > In fact, I actually asked privately for time to review your dedupe patches, > but I've been literally so busy cleaning up after the mess you left in your > last qgroups rewrite I haven't had time. > > You literally broke qgroups in almost every spot that matters. In some cases > (drop_snapshot) you tore out working code and left in a /* TODO */ comment > for someone else to complete. Snapshot create was so trivially and > completely broken by your changes that weeks later, I'm still hunting a > solution which doesn't involve adding an extra _commit_ to our commit. This > is a MASSIVE regression from where we were before. > > IMHO, you should not be trusted with large features or rewrites until you > can demonstrate: > > - A willingness to *completely* solve the problem you are trying to 'fix', >not do half the job which someone else will have to complete for you. > > - Actual testing. The snapshot bug I reference above exists purely because >nobody created a snapshot inside of one and checked the qgroup numbers! > > Sorry to be so harsh. >--Mark > > -- > Mark Fasheh > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: compression disk space saving - what are your results?
On Wed, Dec 2, 2015 at 9:53 PM, Tomasz Chmielewskiwrote: > On 2015-12-02 22:03, Austin S Hemmelgarn wrote: > >>> From these numbers (124 GB used where data size is 153 GB), it appears >>> that we save around 20% with zlib compression enabled. >>> Is 20% reasonable saving for zlib? Typically text compresses much better >>> with that algorithm, although I understand that we have several >>> limitations when applying that on a filesystem level. >> >> >> This is actually an excellent question. A couple of things to note >> before I share what I've seen: >> 1. Text compresses better with any compression algorithm. It is by >> nature highly patterned and moderately redundant data, which is what >> benefits the most from compression. > > > It looks that compress=zlib does not compress very well. Following Duncan's > suggestion, I've changed it to compress-force=zlib, and re-copied the data > to make sure the file are compressed. > > Compression ratio is much much better now (on a slightly changed data set): > > # df -h > /dev/xvdb 200G 24G 176G 12% /var/log/remote > > > # du -sh /var/log/remote/ > 138G/var/log/remote/ > > > So, 138 GB files use just 24 GB on disk - nice! > > However, I would still expect that compress=zlib has almost the same effect > as compress-force=zlib, for 100% text files/logs. btw, what is your kernel version? there was a bug that detected inode compression ration wrong. http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=68bb462d42a963169bf7acbe106aae08c17129a5 http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=4bcbb33255131adbe481c0467df26d654ce3bc78 Regards, Shilong > > > Tomasz Chmielewski > http://wpkg.org > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: shall distros run btrfsck on boot?
On Tue, Nov 24, 2015 at 12:02 PM, Christoph Anton Mittererwrote: > Hey. > > Short question since that came up on debian-devel. > > Now that btrfs check get's more and more useful, are the developers > going to recommend running it periodically on boot (of course that > wouldn't work right now, as it would *always* check)? > > Plus... is btrfs check (without any arguments) non-desctructive, or are > there corner-cases where it may lead to any changes on the devices? As far as i know, 'btrfs check' did not repair anything in default. only output errors if repair options not specified. Regards, Shilong > > Thanks, > Chris. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix wrong check for btrfs_force_chunk_alloc()
On Sun, Apr 12, 2015 at 7:35 AM, Wang Shilong wangshilong1...@gmail.com wrote: btrfs_force_chunk_alloc() return 1 for allocation chunk successfully. This problem exists since commit c87f08ca4. With this patch, we might fix some enospc problems for balances. Signed-off-by: Wang Shilong wangshilong1...@gmail.com Reviewed-by: Filipe Manana fdman...@suse.com Tested-by: Filipe Manana fdman...@suse.com Thanks Filipe! Regards, Shilong --- fs/btrfs/relocation.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index d830853..c453464 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4037,7 +4037,7 @@ restart: if (trans progress err == -ENOSPC) { ret = btrfs_force_chunk_alloc(trans, rc-extent_root, rc-block_group-flags); - if (ret == 0) { + if (ret == 1) { err = 0; progress = 0; goto restart; -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix wrong check for btrfs_force_chunk_alloc()
btrfs_force_chunk_alloc() return 1 for allocation chunk successfully. This problem exists since commit c87f08ca4. With this patch, we might fix some enospc problems for balances. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/relocation.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index d830853..c453464 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4037,7 +4037,7 @@ restart: if (trans progress err == -ENOSPC) { ret = btrfs_force_chunk_alloc(trans, rc-extent_root, rc-block_group-flags); - if (ret == 0) { + if (ret == 1) { err = 0; progress = 0; goto restart; -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs Send/Receive WARN_ON Cannot allocate memory And no such file or directory error
Hi, Hi, I have been testing btrfs send/receive recently. I got an error send ioctl failed with -12: Cannot allocate memory on send side. WARN_ON happened on len PATH_MAX in fs_path_ensure_buf. I got an error rename failed: no such file or directory on receive side. The followings are simple reproduced steps and related information, Is there any idea about what this might be or how to fix it? uanme -a Linux ubuntu 4.0.0-rc1-custom #2 SMP Fri Feb 27 22:20:12 CST 2015 x86_64 x86_64 x86_64 GNU/Linux btrfs --version Btrfs v3.14.1 I don’t know exact commit that fix this problem. But there are some bug fixes since v3.14. Could you try latest kernel and see if problem still exist? Steps to reproduce: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt2 $ mkdir /mnt/data $ mkdir /mnt/data/n1 $ mkdir /mnt/data/n1/n2 $ mkdir /mnt/data/n4 $ mkdir /mnt/data/n1/n2/p1 $ mkdir /mnt/data/n1/n2/p1/p2 $ mkdir /mnt/data/t6 $ mkdir /mnt/data/t7 $ mkdir -p /mnt/data/t5/t7 $ mkdir /mnt/data/t2 $ mkdir /mnt/data/t4 $ mkdir -p /mnt/data/t1/t3 $ mkdir /mnt/data/p1 $ mv /mnt/data/t1 /mnt/data/p1 $ mkdir -p /mnt/data/p1/p2 $ mv /mnt/data/t4 /mnt/data/p1/p2/t1 $ mv /mnt/data/t5 /mnt/data/n4/t5 $ mv /mnt/data/n1/n2/p1/p2 /mnt/data/n4/t5/p2 $ mv /mnt/data/t7 /mnt/data/n4/t5/p2/t7 $ mv /mnt/data/t2 /mnt/data/n4/t1 $ mv /mnt/data/p1 /mnt/data/n4/t5/p2/p1 $ mv /mnt/data/n1/n2 /mnt/data/n4/t5/p2/p1/p2/n2 $ mv /mnt/data/n4/t5/p2/p1/p2/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1 $ mv /mnt/data/n4/t5/t7 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7 $ mv /mnt/data/n4/t5/p2/p1/t1/t3 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3 $ mv /mnt/data/n4/t5/p2/p1/p2/n2/p1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1 $ mv /mnt/data/t6 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t5 $ mv /mnt/data/n4/t5/p2/p1/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t1 $ mv /mnt/data/n1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/n1 $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ mv /mnt/data/n4/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/t1 $ mv /mnt/data/n4/t5/p2/p1/p2/n2/t1 /mnt/data/n4/ $ mv /mnt/data/n4/t5/p2/p1/p2/n2 /mnt/data/n4/t1/n2 $ mv /mnt/data/n4/t1/t7/p1 /mnt/data/n4/t1/n2/p1 $ mv /mnt/data/n4/t1/t3/t1 /mnt/data/n4/t1/n2/t1 $ mv /mnt/data/n4/t1/t3 /mnt/data/n4/t1/n2/t1/t3 $ mv /mnt/data/n4/t5/p2/p1/p2 /mnt/data/n4/t1/n2/p1/p2 $ mv /mnt/data/n4/t1/t7 /mnt/data/n4/t1/n2/p1/t7 $ mv /mnt/data/n4/t5/p2/p1 /mnt/data/n4/t1/n2/p1/p2/p1 $ mv /mnt/data/n4/t1/n2/t1/t3/t5 /mnt/data/n4/t1/n2/p1/p2/t5 $ mv /mnt/data/n4/t5 /mnt/data/n4/t1/n2/p1/p2/p1/t5 $ mv /mnt/data/n4/t1/n2/p1/p2/p1/t5/p2 /mnt/data/n4/t1/n2/p1/p2/p1/p2 $ mv /mnt/data/n4/t1/n2/p1/p2/p1/p2/t7 /mnt/data/n4/t1/t7 $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send /mnt/snap1 | btrfs receive /mnt2 $ btrfs send -p /mnt/snap1 /mnt/snap2 | btrfs receive -vv /mnt2 ERROR: send ioctl failed with -12: Cannot allocate memory rename data/n4/t5/p2/p1/p2 - o272-10-0 rename data/n4/t5/p2/p1/p2/n2/t1/t7 - o266-9-0 ERROR: rename data/n4/t5/p2/p1/p2/n2/t1/t7 - o266-9-0 failed. No such file or directory It seems that data/n4/t5/p2/p1/p2 had been renamed but the file path had not been updated to o272-10-0. Thus it makes the next rename failed. Thanks. robbieko -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs Send/Receive WARN_ON Cannot allocate memory And no such file or directory error
On Tue, Mar 17, 2015 at 9:34 AM, Wang Shilong wangshilong1...@gmail.com wrote: Hi, Hi, I have been testing btrfs send/receive recently. I got an error send ioctl failed with -12: Cannot allocate memory on send side. WARN_ON happened on len PATH_MAX in fs_path_ensure_buf. I got an error rename failed: no such file or directory on receive side. The followings are simple reproduced steps and related information, Is there any idea about what this might be or how to fix it? uanme -a Linux ubuntu 4.0.0-rc1-custom #2 SMP Fri Feb 27 22:20:12 CST 2015 x86_64 x86_64 x86_64 GNU/Linux btrfs --version Btrfs v3.14.1 I don’t know exact commit that fix this problem. But there are some bug fixes since v3.14. Could you try latest kernel and see if problem still exist? There's isn't any change between 4.0-rc1 and 4.0-rc4 that will fix this issue (iow, futile to try 4.0-rc4 or Chris' integration branch). It's another case for an infinite path build loop, likely due to inverting the ancestor-descendant relationship of 2 directories. I'll take a look at it when I get the time, can't promise how soon however. Oops, you are right, i was only seeing Btrfs v3.14.1 which i was thinking it was kernel version… thanks Steps to reproduce: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt2 $ mkdir /mnt/data $ mkdir /mnt/data/n1 $ mkdir /mnt/data/n1/n2 $ mkdir /mnt/data/n4 $ mkdir /mnt/data/n1/n2/p1 $ mkdir /mnt/data/n1/n2/p1/p2 $ mkdir /mnt/data/t6 $ mkdir /mnt/data/t7 $ mkdir -p /mnt/data/t5/t7 $ mkdir /mnt/data/t2 $ mkdir /mnt/data/t4 $ mkdir -p /mnt/data/t1/t3 $ mkdir /mnt/data/p1 $ mv /mnt/data/t1 /mnt/data/p1 $ mkdir -p /mnt/data/p1/p2 $ mv /mnt/data/t4 /mnt/data/p1/p2/t1 $ mv /mnt/data/t5 /mnt/data/n4/t5 $ mv /mnt/data/n1/n2/p1/p2 /mnt/data/n4/t5/p2 $ mv /mnt/data/t7 /mnt/data/n4/t5/p2/t7 $ mv /mnt/data/t2 /mnt/data/n4/t1 $ mv /mnt/data/p1 /mnt/data/n4/t5/p2/p1 $ mv /mnt/data/n1/n2 /mnt/data/n4/t5/p2/p1/p2/n2 $ mv /mnt/data/n4/t5/p2/p1/p2/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1 $ mv /mnt/data/n4/t5/t7 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7 $ mv /mnt/data/n4/t5/p2/p1/t1/t3 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3 $ mv /mnt/data/n4/t5/p2/p1/p2/n2/p1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1 $ mv /mnt/data/t6 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t5 $ mv /mnt/data/n4/t5/p2/p1/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t3/t1 $ mv /mnt/data/n1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/n1 $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ mv /mnt/data/n4/t1 /mnt/data/n4/t5/p2/p1/p2/n2/t1/t7/p1/t1 $ mv /mnt/data/n4/t5/p2/p1/p2/n2/t1 /mnt/data/n4/ $ mv /mnt/data/n4/t5/p2/p1/p2/n2 /mnt/data/n4/t1/n2 $ mv /mnt/data/n4/t1/t7/p1 /mnt/data/n4/t1/n2/p1 $ mv /mnt/data/n4/t1/t3/t1 /mnt/data/n4/t1/n2/t1 $ mv /mnt/data/n4/t1/t3 /mnt/data/n4/t1/n2/t1/t3 $ mv /mnt/data/n4/t5/p2/p1/p2 /mnt/data/n4/t1/n2/p1/p2 $ mv /mnt/data/n4/t1/t7 /mnt/data/n4/t1/n2/p1/t7 $ mv /mnt/data/n4/t5/p2/p1 /mnt/data/n4/t1/n2/p1/p2/p1 $ mv /mnt/data/n4/t1/n2/t1/t3/t5 /mnt/data/n4/t1/n2/p1/p2/t5 $ mv /mnt/data/n4/t5 /mnt/data/n4/t1/n2/p1/p2/p1/t5 $ mv /mnt/data/n4/t1/n2/p1/p2/p1/t5/p2 /mnt/data/n4/t1/n2/p1/p2/p1/p2 $ mv /mnt/data/n4/t1/n2/p1/p2/p1/p2/t7 /mnt/data/n4/t1/t7 $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs send /mnt/snap1 | btrfs receive /mnt2 $ btrfs send -p /mnt/snap1 /mnt/snap2 | btrfs receive -vv /mnt2 ERROR: send ioctl failed with -12: Cannot allocate memory rename data/n4/t5/p2/p1/p2 - o272-10-0 rename data/n4/t5/p2/p1/p2/n2/t1/t7 - o266-9-0 ERROR: rename data/n4/t5/p2/p1/p2/n2/t1/t7 - o266-9-0 failed. No such file or directory It seems that data/n4/t5/p2/p1/p2 had been renamed but the file path had not been updated to o272-10-0. Thus it makes the next rename failed. Thanks. robbieko -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men. Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is it safe or useful to use NOCOW flag and autodefrag mount option at same time?
Hello, Hi, first of all a great thank you to all developers and contributors. The Btrfs filesystem ist a great filesystem! Following common recommendations [1], I use these mount options on my main developing machine: noatime,autodefrag. This is desktop machine and it works well so far. Now, I'm also going to install several KVM virtual machines on this system. I want to use qcow2 files stored on SSD with a btrfs on it. In order to avoid bad performance with the VMs, I want to disable the Copy-On-Write mechanism on the storage directory of my VM images as for example described in [2]. My question now is: Is it safe to use the autodefrag mount option and to disable CoW for single directories which contain VM files at the same time? Is it even useful or may it harm? Thank you very much for insights and comments in advance. Btrfs use COW feautr to do defrag, acutally there was bug in the past (nocow and defrag).. But make sure your kernel include following commit: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=47059d930f0e002ff851beea87d738146804726d With above commit, i think autodefrag should work well with nocow. Best regards, Wang Shilong Best regards, jploz As requested in the wiki basic information about my fresh setup attached: uname -a Linux hostname 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) i686 GNU/Linux btrfs --version Btrfs v3.17 btrfs fi show Label: none uuid: cc6492e7-a7a7-43f9-bba2-a1abe8635d36 Total devices 1 FS bytes used 4.00GiB devid1 size 210.00GiB used 8.02GiB path /dev/sdb2 Label: 'DataHddBtr' uuid: aa3f41c0-5a60-4fd7-9365-8c99284e032c Total devices 1 FS bytes used 384.00KiB devid1 size 480.00GiB used 2.04GiB path /dev/sda1 Btrfs v3.17 mount | grep btrfs /dev/sdb2 on / type btrfs (rw,noatime,ssd,space_cache,autodefrag) /dev/sda1 on /data/btr type btrfs (rw,noatime,space_cache,autodefrag) btrfs fi df / Data, single: total=7.01GiB, used=3.89GiB System, single: total=4.00MiB, used=16.00KiB Metadata, single: total=1.01GiB, used=114.39MiB GlobalReserve, single: total=48.00MiB, used=0.00B btrfs fi df /data/btr/ Data, single: total=8.00MiB, used=256.00KiB System, DUP: total=8.00MiB, used=16.00KiB System, single: total=4.00MiB, used=0.00B Metadata, DUP: total=1.00GiB, used=112.00KiB Metadata, single: total=8.00MiB, used=0.00B GlobalReserve, single: total=16.00MiB, used=0.00B dmesg | grep -i btrfs [2.693408] Btrfs loaded [2.756178] BTRFS: device fsid cc6492e7-a7a7-43f9-bba2-a1abe8635d36 devid 1 transid 393 /dev/sdb2 [2.756843] BTRFS: device label DataHddBtr devid 1 transid 9 /dev/sda1 [2.775447] BTRFS info (device sdb2): disk space caching is enabled [2.779944] BTRFS: detected SSD devices, enabling SSD mode [3.430997] BTRFS info (device sdb2): enabling auto defrag [3.431002] BTRFS info (device sdb2): disk space caching is enabled [3.548937] BTRFS info (device sda1): enabling auto defrag [3.548939] BTRFS info (device sda1): disk space caching is enabled [ 183.341398] BTRFS info (device sda1): enabling auto defrag [ 183.341403] BTRFS info (device sda1): disk space caching is enabled References: [1] https://btrfs.wiki.kernel.org/index.php/Gotchas [2] https://wiki.archlinux.org/index.php/Btrfs#Copy-On-Write_.28CoW.29 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs performance, sudden drop to 0 IOPs
Hello guys, On Thu, Feb 12, 2015 at 05:33:41AM +0100, Kai Krakow wrote: Duncan 1i5t5.dun...@cox.net schrieb: P. Remek posted on Tue, 10 Feb 2015 18:44:33 +0100 as excerpted: In the test, I use --direct=1 parameter for fio which basically does O_DIRECT on target file. The O_DIRECT should guarantee that the filesystem cache is bypassed and IO is sent directly to the underlaying storage. Are you saying that btrfs buffers writes despite of O_DIRECT? I'm out of my (admin, no claims at developer) league on that. I see someone else replied, and would defer to them on this. I don't think that O_DIRECT can work efficiently on COW filesystems. It probably has a negative effect and cannot be faster as normal access. Linus itself said one time that O_DIRECT is broken and should go away, and instead cache hinting should be used. Think of this: For the _unbuffered_ direct-io request to be fulfilled the file system has to go through its COW logic first which it otherwise had buffered and done in background. Bypassing the cache is probably only a side-effect of O_DIRECT, not its purpose. Hmm, not true in btrfs, the COW logic mentioned above is nothing but to allocate a NEW extent, and it's not done in background. Comparing to nocow logic, the main difference comes from a) COW files' calculating checksums of the dirty data in DIO pages which nocow files don't need to. b) their endio handlers. Or am I missing something? We did benchmark Btrfs aio/dio performance before, we noticed one big differences from COW and nocow is not only checksum but checksum cost more metadata, which will make Btrfs performance drop suddenly for a while, because of metadata reservation. Thanks, -liubo At least I'd try with a nocow-file for the benchmark if you still have to use O_DIRECT. -- Replies to list only preferred. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] btrfs: qgroup: destroy related qgroup when removing subvolume if needed.
Hello, When removing a subvol, if the related qgroup has no child qgroup, we should destroy it at the same time. Also remove the TODO entry in qgroup.c. My two cents here: Take a quick look at this, i am not sure whether this is right way to do it. Actually, i think we can only remove a subvolume qgroup safely when both refer and excl are 0, because for subvolume removal case, to make qgroup accounting right, we need walk tree. At this time, qgroup walking might have not finished, if we remove this qgroup directly, accounting for this qgroup will be skipped. Maybe we could remove such qgroup when at qgroup accounting, we find an qgroup’s refer and excl are 0, we could remove it, or we do it at the background.. Also, there is another risk, that is to say, if qgroup accounting not work right, So we might need gurantee that subvolume it going to be deleted or have been deleted at the same time, then we could remove qgroup safely etc. Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com --- fs/btrfs/inode.c | 4 fs/btrfs/qgroup.c | 1 - 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e687bb0..31de211 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -59,6 +59,7 @@ #include backref.h #include hash.h #include props.h +#include qgroup.h struct btrfs_iget_args { struct btrfs_key *location; @@ -4029,6 +4030,9 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans, ret = btrfs_update_inode_fallback(trans, root, dir); if (ret) btrfs_abort_transaction(trans, root, ret); + ret = btrfs_remove_qgroup(trans, root-fs_info, objectid); + if (ret == -EBUSY) + ret = 0; out: btrfs_free_path(path); return ret; diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index c3b1e4f..303c078 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -35,7 +35,6 @@ #include qgroup.h /* TODO XXX FIXME - * - subvol delete - delete when ref goes to 0? delete limits also? * - reorganize keys * - compressed * - sync -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] btrfs-progs: make btrfs qgroups show human readable sizes
(struct btrfs_qgroup *qgroup, { BUG_ON(column = BTRFS_QGROUP_ALL || column 0); int len; + int unit_mode = btrfs_qgroup_columns[column].unit_mode; + int max_len = btrfs_qgroup_columns[column].max_len; switch (column) { @@ -203,24 +223,20 @@ static void print_qgroup_column(struct btrfs_qgroup *qgroup, print_qgroup_column_add_blank(BTRFS_QGROUP_QGROUPID, len); break; case BTRFS_QGROUP_RFER: - len = printf(%llu, qgroup-rfer); - print_qgroup_column_add_blank(BTRFS_QGROUP_RFER, len); + len = printf(%*s, max_len, pretty_size_mode(qgroup-rfer, unit_mode)); break; case BTRFS_QGROUP_EXCL: - len = printf(%llu, qgroup-excl); - print_qgroup_column_add_blank(BTRFS_QGROUP_EXCL, len); + len = printf(%*s, max_len, pretty_size_mode(qgroup-excl, unit_mode)); break; case BTRFS_QGROUP_PARENT: len = print_parent_column(qgroup); print_qgroup_column_add_blank(BTRFS_QGROUP_PARENT, len); break; case BTRFS_QGROUP_MAX_RFER: - len = printf(%llu, qgroup-max_rfer); - print_qgroup_column_add_blank(BTRFS_QGROUP_MAX_RFER, len); + len = printf(%*s, max_len, pretty_size_mode(qgroup-max_rfer, unit_mode)); break; case BTRFS_QGROUP_MAX_EXCL: - len = printf(%llu, qgroup-max_excl); - print_qgroup_column_add_blank(BTRFS_QGROUP_MAX_EXCL, len); + len = printf(%*s, max_len, pretty_size_mode(qgroup-max_excl, unit_mode)); break; case BTRFS_QGROUP_CHILD: len = print_child_column(qgroup); @@ -250,30 +266,41 @@ static void print_table_head() { int i; int len; + int max_len; for (i = 0; i BTRFS_QGROUP_ALL; i++) { + max_len = btrfs_qgroup_columns[i].max_len; if (!btrfs_qgroup_columns[i].need_print) continue; - printf(%s, btrfs_qgroup_columns[i].name); - len = btrfs_qgroup_columns[i].max_len - - strlen(btrfs_qgroup_columns[i].name); - while (len--) - printf( ); + if ((i == BTRFS_QGROUP_QGROUPID) | (i == BTRFS_QGROUP_PARENT) | + (i == BTRFS_QGROUP_CHILD)) + printf(%-*s, max_len, btrfs_qgroup_columns[i].name); + else + printf(%*s, max_len, btrfs_qgroup_columns[i].name); printf( ); } printf(\n); for (i = 0; i BTRFS_QGROUP_ALL; i++) { + max_len = btrfs_qgroup_columns[i].max_len; if (!btrfs_qgroup_columns[i].need_print) continue; - - len = strlen(btrfs_qgroup_columns[i].name); - while (len--) - printf(-); - len = btrfs_qgroup_columns[i].max_len - - strlen(btrfs_qgroup_columns[i].name); + if ((i == BTRFS_QGROUP_QGROUPID) | (i == BTRFS_QGROUP_PARENT) | + (i == BTRFS_QGROUP_CHILD)) { + len = strlen(btrfs_qgroup_columns[i].name); + while (len--) + printf(-); + len = max_len - strlen(btrfs_qgroup_columns[i].name); + while (len--) + printf( ); + } else { + len = max_len - strlen(btrfs_qgroup_columns[i].name); + while (len--) + printf( ); + len = strlen(btrfs_qgroup_columns[i].name); + while (len--) + printf(-); + } printf( ); - while (len--) - printf( ); } printf(\n); } diff --git a/qgroup.h b/qgroup.h index 653cf1c..09070b6 100644 --- a/qgroup.h +++ b/qgroup.h @@ -83,6 +83,7 @@ u64 btrfs_get_path_rootid(int fd); int btrfs_show_qgroups(int fd, struct btrfs_qgroup_filter_set *, struct btrfs_qgroup_comparer_set *); void btrfs_qgroup_setup_print_column(enum btrfs_qgroup_column_enum column); +void btrfs_qgroup_setup_human_readable(unsigned unit_mode); struct btrfs_qgroup_filter_set *btrfs_qgroup_alloc_filter_set(void); void btrfs_qgroup_free_filter_set(struct btrfs_qgroup_filter_set *filter_set); int btrfs_qgroup_setup_filter(struct btrfs_qgroup_filter_set **filter_set, -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo
Re: btrfs performance - ssd array
Hello, Hello, Could you check how many extents with BTRFS and Ext4: # filefrag test1 So my findings are odd: On BTRFS when I run fio with a single worker thread (target file is 12GB large,and its 100% random write of 4kb blocks), then number of extents reported by filefrag is around 3. However when I do the same with 4 worker threads, I get some crazy number of extents - test1: 3141866 extents found. Also when running with 4 threads when I check CPU, the sys% utilization takes 80% of CPU ( in the top output I see that all is consumed by kworker processes). On the EXT4 I get only 13 extents even when running with 4 worker threads. (note that I created RAID10 using mdadm before setting up ext4 there in order to get comparable storage solution to what we test with BTRFS). Another odd thing is, that it takes very long time for the filefrag utility to return the result on the BTRFS and not only for the case where I got 3 milions of extents but also for the first case where I ran single worker and the number of extents was only 3. Filefrag on EXT4 returns immediately. So looks like Btrfs lock contention problem. Take a look at look btrfs_drop_extents(), even for nodatacow , there will be still many items removal for FS tree. Unfortunately, btrfs lock contention problems seem very sensible to item removal operations. And, here Your SSD is very fast? hm, i am not very sensible to Your IOPS number. You could verify this problem by reduing threads number for example compare 1 thread results. aslo i guess btrfs seq write formace should be less serious here… To see if this is because bad fragments for BTRFS. I am still not sure how fio will test randwrite here, so here is possibilities: case1: if fio don’t repeat write same position for several time, i think you could add --overite=0, and retest to see if it helps. Not sure what parameter do you mean here. I mean ‘--overwrite' is an option for fio. case2: if fio randwrite did write same position for several time, i think you could use ‘-o nodatacow’ mount option to verify if this is because BTRFS COW caused serious fragments. It seems that mounting it with this option does have some effect but not very significant and it is not very deterministic. The IOPs are slightly higher at the beginning (~25 000 IOPs) but IOPs perfromance is very spiky and I can still see that CPU sys% is very high. As soon as the kworker threads start consuming CPU, the IOPs performance goes down again to some ~15 000 IOPs. Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs performance - ssd array
Hello, Hello, we are currently investigating possiblities and performance limits of the Btrfs filesystem. Now it seems we are getting pretty poor performance for the writes and I would like to ask, if our results makes sense and if it is a result of some well known performance bottleneck. Our setup: Server: CPU: dual socket: E5-2630 v2 RAM: 32 GB ram OS: Ubuntu server 14.10 Kernel: 3.19.0-031900rc2-generic btrfs tools: Btrfs v3.14.1 2x LSI 9300 HBAs - SAS3 12/Gbs 8x SSD Ultrastar SSD1600MM 400GB SAS3 12/Gbs Both HBAs see all 8 disks and we have set up multipathing using multipath command and device mapper. Then we using this command to create the filesystem: mkfs.btrfs -f -d raid10 /dev/mapper/prm-0 /dev/mapper/prm-1 /dev/mapper/prm-2 /dev/mapper/prm-3 /dev/mapper/prm-4 /dev/mapper/prm-5 /dev/mapper/prm-6 /dev/mapper/prm-7 We run performance test using following command: fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test1 --filename=test1 --bs=4k --iodepth=32 --size=12G --numjobs=24 --readwrite=randwrite Could you check how many extents with BTRFS and Ext4: # filefrag test1 To see if this is because bad fragments for BTRFS. I am still not sure how fio will test randwrite here, so here is possibilities: case1: if fio don’t repeat write same position for several time, i think you could add --overite=0, and retest to see if it helps. case2: if fio randwrite did write same position for several time, i think you could use ‘-o nodatacow’ mount option to verify if this is because BTRFS COW caused serious fragments. The results for the random read are more or less comparable with the performance of EXT4 filesystem, we get approximately 300 000 IOPs for random read. For random write however, we are getting only about 15 000 IOPs, which is much lower than for ESX4 (~200 000 IOPs for RAID10). Regards, Premek -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: should I use btrfs on Centos 7 for a new production server?
Hello, I have a well tested and working fine Centos5-Xen system. Accumulated cruft from various development efforts make it desirable to redo the install. Currently a RAID-10 ext4 filesystem with LVM and 750G of storage. There's a hot spare 750 drive in the system. I'm thinking of migrating the web sites (almost the only use of the server) to a spare then installing Centos-7 and btrfs, then migrating the sites back. I see RH marks btrfs in C7 as a technology preview but don't understand what that implies for future support and a suitably stable basis for storage. The demand on the system is low and not likely to change in the near future, storage access speeds are not likely to be dealbreakers and it would be nice to not need to use LVM, btrfs seems to have a better feature set and more intuitive command set. But I'm uncertain about stability. Anyone have an opinion? I used CentOS7 btrfs myself, just doing some tests..it crashed easily. I don’t know how much efforts that Redhat do on btrfs for 7 series. Dave -- As long as politics is the shadow cast on society by big business, the attenuation of the shadow will not change the substance. -- John Dewey -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: should I use btrfs on Centos 7 for a new production server?
Hello, I have a well tested and working fine Centos5-Xen system. Accumulated cruft from various development efforts make it desirable to redo the install. Currently a RAID-10 ext4 filesystem with LVM and 750G of storage. There's a hot spare 750 drive in the system. I'm thinking of migrating the web sites (almost the only use of the server) to a spare then installing Centos-7 and btrfs, then migrating the sites back. I see RH marks btrfs in C7 as a technology preview but don't understand what that implies for future support and a suitably stable basis for storage. The demand on the system is low and not likely to change in the near future, storage access speeds are not likely to be dealbreakers and it would be nice to not need to use LVM, btrfs seems to have a better feature set and more intuitive command set. But I'm uncertain about stability. Anyone have an opinion? I used CentOS7 btrfs myself, just doing some tests..it crashed easily. I don’t know how much efforts that Redhat do on btrfs for 7 series. Maybe use SUSE enterprise for btrfs will be a better choice, they offered better support for btrfs as far as i know. Dave -- As long as politics is the shadow cast on society by big business, the attenuation of the shadow will not change the substance. -- John Dewey -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: call inode_dec_link_count() on mkdir error path
From: Wang Shilong wangshilong1...@gmail.com In btrfs_mkdir(), if it fails to create dir, we should clean up existed items, setting inode's link properly to make sure it could be cleaned up properly. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8de2335..8a036ed 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6255,8 +6255,10 @@ static int btrfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) out_fail: btrfs_end_transaction(trans, root); - if (drop_on_err) + if (drop_on_err) { + inode_dec_link_count(inode); iput(inode); + } btrfs_balance_delayed_items(root); btrfs_btree_balance_dirty(root); return err; -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BTRFS-PROGS][PATCH][CLEANUP] Remove some gotos
Hello, Hi all, before sending this patch I thought twice if it is really worth. I know the coding style is really a hot topic. But the world is already too ugly, to allow even that. And it is Merry Christmas, so let me to make me a gift: a small clean up. The culprit are some goto found in 'cmds-filesystem.c' inside the function 'cmd_show()'. These are used instead of an 'if' statement. I wish you all a Happy Christmas and a successful start to 2015. I think Nice cleanups are always good for Btrfs..Just take a look at Btrfs kernel codes etc which really scared a lot of code reviewers and newbies.. Merry Chirismas guys. ^_^ -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4] Btrfs: deal with all 'subvol=xxx' options once
Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount -t btrfs /dev/sdb /mnt # btrfs sub create /mnt/dir # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir It fails with: mount: mount(2) failed: No such file or directory Btrfs deal with subvolume mounting in a recursive way, to avoid looping, it will stripe out 'subvol=' string, then next loop will stop.Problem here is it only deal one string once, if users specify mount option multiple times. It will loop several times which is not good, and above reproducing steps will also return confusing results. Fix this problem by striping out all 'subvol=xxx' options, only last is valid. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- v3-v4: zero buffer to make sure strncat() work... --- fs/btrfs/super.c | 46 ++ 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 21c60ee..cf3faf5 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1117,44 +1117,50 @@ static inline int is_subvolume_inode(struct inode *inode) */ static char *setup_root_args(char *args) { - unsigned len = strlen(args) + 2 + 1; - char *src, *dst, *buf; + unsigned len; + char *src = args; + char *p = args; + char *dst, *buf; /* * We need the same args as before, but with this substitution: * s!subvol=[^,]+!subvolid=0! * * Since the replacement string is up to 2 bytes longer than the -* original, allocate strlen(args) + 2 + 1 bytes. +* original, allocate strlen(args) + 2 * N + 1 bytes. */ - - src = strstr(args, subvol=); + p = strstr(src, subvol=); /* This shouldn't happen, but just in case.. */ - if (!src) + if (!p) return NULL; - buf = dst = kmalloc(len, GFP_NOFS); + len = strlen(args) + 1; + while (p) { + len += 2; + p = strchr(p, ','); + if (!p) + break; + p = strstr(p, subvol=); + } + + buf = dst = kzalloc(len, GFP_NOFS); if (!buf) return NULL; - /* -* If the subvol= arg is not at the start of the string, -* copy whatever precedes it into buf. -*/ - if (src != args) { - *src++ = '\0'; - strcpy(buf, args); - dst += strlen(args); + /* loop and replace all subvol=xxx string */ + while ((p = strstr(src, subvol=))) { + strncat(dst, src, p - src); + dst += p - src; + strcpy(dst, subvolid=0); + dst += strlen(subvolid=0); + src = p = strchr(p, ','); + if (!src) + break; } - - strcpy(dst, subvolid=0); - dst += strlen(subvolid=0); - /* * If there is a , after the original subvol=... string, * copy that suffix into our buffer. Otherwise, we're done. */ - src = strchr(src, ','); if (src) strcpy(dst, src); -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 00/10] Implement device scrub/replace for RAID56
On Tue, Dec 2, 2014 at 7:39 AM, Miao Xie mi...@cn.fujitsu.com wrote: This patchset implement the device scrub/replace function for RAID56, the most implementation of the common data is similar to the other RAID type. The differentia or difficulty is the parity process. The basic idea is reading and check the data which has checksum out of the raid56 stripe lock, if the data is right, then lock the raid56 stripe, read out the other data in the same stripe, if no IO error happens, calculate the parity and check the original one, if the original parity is right, the scrub parity passes. or write out the new one. But if the common data(not parity) that we read out is wrong, we will try to recover it, and then check and repair the parity. And in order to avoid making the code more and more complex, we copy some code of common data process for the parity, the cleanup work is in my TODO list. We have done some test, the patchset worked well. Of course, more tests are welcome. If you are interesting to use it or test it, you can pull the patchset from https://github.com/miaoxie/linux-btrfs.git raid56-scrub-replace Changelog v3 - v4: - Fix the problem that the scrub's raid bio was cached, which was reported by Chris. - Remove the 10st patch, the deadlock that was described in that patch doesn't exist on the current kernel. - Rebase the patchset to the top of integration branch Thanks, I'll try this today. I need to rebase in a new version of the RCU patches, can you please cook one on top of v3.18-rc6 instead? BTW, Chris could you please replace with this newer version: https://patchwork.kernel.org/patch/5359251/ Fengguang reported a build failure regarding this. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: deal with all 'subvol=xxx' options once
Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount -t btrfs /dev/sdb /mnt # btrfs sub create /mnt/dir # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir It fails with: mount: mount(2) failed: No such file or directory Btrfs deal with subvolume mounting in a recursive way, to avoid looping, it will stripe out 'subvol=' string, then next loop will stop.Problem here is it only deal one string once, if users specify mount option multiple times. It will loop several times which is not good, and above reproducing steps will also return confusing results. Fix this problem by striping out all 'subvol=xxx' options, only last is valid. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- v1-v2: error handling and comment update --- fs/btrfs/super.c | 26 +++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 54bd91e..42f3176 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1115,7 +1115,7 @@ static inline int is_subvolume_inode(struct inode *inode) * subvolid=0 to make sure we get the actual tree root for path walking to the * subvol we want. */ -static char *setup_root_args(char *args) +static char *__setup_root_args(char *args) { unsigned len = strlen(args) + 2 + 1; char *src, *dst, *buf; @@ -1129,13 +1129,12 @@ static char *setup_root_args(char *args) */ src = strstr(args, subvol=); - /* This shouldn't happen, but just in case.. */ if (!src) return NULL; buf = dst = kmalloc(len, GFP_NOFS); if (!buf) - return NULL; + return ERR_PTR(-ENOMEM); /* * If the subvol= arg is not at the start of the string, @@ -1161,6 +1160,27 @@ static char *setup_root_args(char *args) return buf; } +static char *setup_root_args(char *args) +{ + char *p, *new_args; + + p = new_args = __setup_root_args(args); + /* in case users specify subvol=xxx option multiple times */ + while (p) { + p = __setup_root_args(new_args); + if (!p) + break; + if (!IS_ERR(p)) { + kfree(new_args); + new_args = p; + } else { + kfree(new_args); + return NULL; + } + } + return new_args; +} + static struct dentry *mount_subvol(const char *subvol_name, int flags, const char *device_name, char *data) { -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: deal with all 'subvol=xxx' options once
On Tue, Nov 25, 2014 at 04:20:11PM +0800, Wang Shilong wrote: Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount -t btrfs /dev/sdb /mnt # btrfs sub create /mnt/dir # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir It fails with: mount: mount(2) failed: No such file or directory The bug is real, but I don't like the fix. The mount path is hard to read already, and I'm afraid your fix adds another unobvious step to the whole processing. setup_root_args replaces subvol= with subvolid=0 once. I suggest to replace all occurences of subvol= here and not rely on the recursive behaviour of the mount callbacks. ok, if you like this way, i will do it. The (buggy) way how it works now is that the first occurence of subvol will get parsed and passed as newroot = vfs_kern_mount(,subvol=second,...,subvolid=0) and this will call back again to btrfs_mount and will try to mount the subvol 'second' but now relative to 'newroot'. Try this: # mkfs.btrfs -f /dev/sdb # mount -t btrfs /dev/sdb /mnt # btrfs sub create /mnt/dir # btrfs sub create /mnt/dir/dir2 # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir2 mount succeeds and the mounted subvolume is dir2. Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] Extent tree rebuilding with snapshots patches
I did send these patches a long while ago, but due to some reasons, they were not merged, these are important fixes for fsck, without these patches, extent tree rebuilding did not work with snapshots. Also, /tests/fsck-tests.sh's extent tree rebuild test could always detect failure without these patches, unluckily, it need extra enviroment setting, so i supposed it was always 'NOTRUN'! last patch fix a regression for root rebuilding, root rebuilding should be at first, because if root(extent root eg) corrupted, root items also won't succeed. patches are based on David's integration-20141125 Wang Shilong (3): Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots Btrfs-progs, fsck: move root items repair after root rebuilding cmds-check.c | 412 +++ 1 file changed, 303 insertions(+), 109 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH resend 2/3] Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots
From: Wang Shilong wangsl.f...@cn.fujitsu.com This patch makes us to rebuild a really corrupt extent tree with snapshots. To implement this, we have to verify whether a block is FULL BACKREF. This idea come from Josef Bacik: 1) We walk down the original tree, every eb we encounter has btrfs_header_owner(eb) == root-objectid. We add normal references for this root (BTRFS_TREE_BLOCK_REF_KEY) for this root. World peace is achieved. 2) We walk down the snapshotted tree. Say we didn't change anything at all, it was just a clean snapshot and then boom. So the btrfs_header_owner(root-node) == root-objectid, so normal backref. We walk down to the next level, where btrfs_header_owner(eb) != root-objectid, but the level above did, so we add normal refs for all of these blocks. We go down the next level, now our btrfs_header_owner(parent) != root-objectid and btrfs_header_owner(eb) != root-objectid. This is where we need to now go back and see if btrfs_header_owner(eb) currently has a ref on eb. If it does we are done, move on to the next block in this same level, we don't have to go further down. 3) Harder case, we snapshotted and then changed things in the original root. Do the same thing as in step 2, but now we get down to btrfs_header_owner(eb) != root-objectid btrfs_header_owner(parent) != root-objectid. We lookup the references we have for eb and notice that btrfs_header_owner(eb) no longer refers to eb. So now we must set FULL_BACKREF on this extent reference and add a SHARED_BLOCK_REF_KEY for this eb using the parent-start as the offset. And we need to keep walking down and doing the same thing until we either hit level 0 or btrfs_header_owner(eb) has a ref on the block. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- cmds-check.c | 132 +-- 1 file changed, 129 insertions(+), 3 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index ff2795d..a102752 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -112,6 +112,7 @@ struct extent_record { unsigned int owner_ref_checked:1; unsigned int is_root:1; unsigned int metadata:1; + unsigned int flag_block_full_backref:1; }; struct inode_backref { @@ -4608,6 +4609,127 @@ static int is_dropped_key(struct btrfs_key *key, return 0; } +static int calc_extent_flag(struct btrfs_root *root, + struct cache_tree *extent_cache, + struct extent_buffer *buf, + struct root_item_record *ri, + u64 *flags) +{ + int i; + int nritems = btrfs_header_nritems(buf); + struct btrfs_key key; + struct extent_record *rec; + struct cache_extent *cache; + struct data_backref *dback; + struct tree_backref *tback; + struct extent_buffer *new_buf; + u64 owner = 0; + u64 bytenr; + u64 offset; + u64 ptr; + int size; + int ret; + u8 level; + + /* +* Except file/reloc tree, we can not have +* FULL BACKREF MODE +*/ + if (ri-objectid BTRFS_FIRST_FREE_OBJECTID) + goto normal; + /* +* root node +*/ + if (buf-start == ri-bytenr) + goto normal; + if (btrfs_is_leaf(buf)) { + /* +* we are searching from original root, world +* peace is achieved, we use normal backref. +*/ + owner = btrfs_header_owner(buf); + if (owner == ri-objectid) + goto normal; + /* +* we check every eb here, and if any of +* eb dosen't have original root refers +* to this eb, we set full backref flag for +* this extent, otherwise normal backref. +*/ + for (i = 0; i nritems; i++) { + struct btrfs_file_extent_item *fi; + btrfs_item_key_to_cpu(buf, key, i); + + if (key.type != BTRFS_EXTENT_DATA_KEY) + continue; + fi = btrfs_item_ptr(buf, i, + struct btrfs_file_extent_item); + if (btrfs_file_extent_type(buf, fi) == + BTRFS_FILE_EXTENT_INLINE) + continue; + if (btrfs_file_extent_disk_bytenr(buf, fi) == 0) + continue; + bytenr = btrfs_file_extent_disk_bytenr(buf, fi); + cache = lookup_cache_extent(extent_cache, bytenr, 1); + if (!cache) + goto full_backref; + offset = btrfs_file_extent_offset(buf, fi); + rec = container_of(cache, struct
[PATCH resend 1/3] Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree
From: Wang Shilong wangsl.f...@cn.fujitsu.com Previously, we deal with node block firstly and then leaf block which can maximize readahead. However, to rebuild extent tree, we need deal with snapshot one by one. This patch makes us deal with snapshot one by one if we need rebuild extent tree otherwise we drop into previous way. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- cmds-check.c | 248 +-- 1 file changed, 158 insertions(+), 90 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index 24b6b59..ff2795d 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -128,10 +128,14 @@ struct inode_backref { char name[0]; }; -struct dropping_root_item_record { +struct root_item_record { struct list_head list; - struct btrfs_root_item ri; - struct btrfs_key found_key; + u64 objectid; + u64 bytenr; + u8 level; + u8 drop_level; + int level_size; + struct btrfs_key drop_key; }; #define REF_ERR_NO_DIR_ITEM(1 0) @@ -4618,7 +4622,7 @@ static int run_next_block(struct btrfs_trans_handle *trans, struct rb_root *dev_cache, struct block_group_tree *block_group_cache, struct device_extent_tree *dev_extent_cache, - struct btrfs_root_item *ri) + struct root_item_record *ri) { struct extent_buffer *buf; u64 bytenr; @@ -4851,11 +4855,8 @@ static int run_next_block(struct btrfs_trans_handle *trans, size = btrfs_level_size(root, level - 1); btrfs_node_key_to_cpu(buf, key, i); if (ri != NULL) { - struct btrfs_key drop_key; - btrfs_disk_key_to_cpu(drop_key, - ri-drop_progress); if ((level == ri-drop_level) -is_dropped_key(key, drop_key)) { +is_dropped_key(key, ri-drop_key)) { continue; } } @@ -4896,7 +4897,7 @@ static int add_root_to_pending(struct extent_buffer *buf, struct cache_tree *pending, struct cache_tree *seen, struct cache_tree *nodes, - struct btrfs_key *root_key) + u64 objectid) { if (btrfs_header_level(buf) 0) add_pending(nodes, seen, buf-start, buf-len); @@ -4905,13 +4906,12 @@ static int add_root_to_pending(struct extent_buffer *buf, add_extent_rec(extent_cache, NULL, 0, buf-start, buf-len, 0, 1, 1, 0, 1, 0, buf-len); - if (root_key-objectid == BTRFS_TREE_RELOC_OBJECTID || + if (objectid == BTRFS_TREE_RELOC_OBJECTID || btrfs_header_backref_rev(buf) BTRFS_MIXED_BACKREF_REV) add_tree_backref(extent_cache, buf-start, buf-start, 0, 1); else - add_tree_backref(extent_cache, buf-start, 0, -root_key-objectid, 1); + add_tree_backref(extent_cache, buf-start, 0, objectid, 1); return 0; } @@ -6481,6 +6481,99 @@ static int check_devices(struct rb_root *dev_cache, return ret; } +static int add_root_item_to_list(struct list_head *head, + u64 objectid, u64 bytenr, + u8 level, u8 drop_level, + int level_size, struct btrfs_key *drop_key) +{ + + struct root_item_record *ri_rec; + ri_rec = malloc(sizeof(*ri_rec)); + if (!ri_rec) + return -ENOMEM; + ri_rec-bytenr = bytenr; + ri_rec-objectid = objectid; + ri_rec-level = level; + ri_rec-level_size = level_size; + ri_rec-drop_level = drop_level; + if (drop_key) + memcpy(ri_rec-drop_key, drop_key, sizeof(*drop_key)); + list_add_tail(ri_rec-list, head); + + return 0; +} + +static int deal_root_from_list(struct list_head *list, + struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct block_info *bits, + int bits_nr, + struct cache_tree *pending, + struct cache_tree *seen, + struct cache_tree *reada, + struct cache_tree *nodes, + struct cache_tree *extent_cache, + struct cache_tree *chunk_cache
[PATCH 3/3] Btrfs-progs, fsck: move root items repair after root rebuilding
If some critical roots are corrupt, reapr_root_items() will fail, this is detected by fsck_tests.sh's extent rebuilding tests. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- cmds-check.c | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index a102752..ae9005e 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -7987,22 +7987,6 @@ int cmd_check(int argc, char **argv) root = info-fs_root; - ret = repair_root_items(info); - if (ret 0) - goto close_out; - if (repair) { - fprintf(stderr, Fixed %d roots.\n, ret); - ret = 0; - } else if (ret 0) { - fprintf(stderr, - Found %d roots with an outdated root item.\n, - ret); - fprintf(stderr, - Please run a filesystem check with the option --repair to fix them.\n); - ret = 1; - goto close_out; - } - /* * repair mode will force us to commit transaction which * will make us fail to load log tree when mounting. @@ -8101,6 +8085,22 @@ int cmd_check(int argc, char **argv) if (ret) fprintf(stderr, Errors found in extent allocation tree or chunk allocation\n); + ret = repair_root_items(info); + if (ret 0) + goto close_out; + if (repair) { + fprintf(stderr, Fixed %d roots.\n, ret); + ret = 0; + } else if (ret 0) { + fprintf(stderr, + Found %d roots with an outdated root item.\n, + ret); + fprintf(stderr, + Please run a filesystem check with the option --repair to fix them.\n); + ret = 1; + goto close_out; + } + fprintf(stderr, checking free space cache\n); ret = check_space_cache(root); if (ret) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] Btrfs-progs, fsck: move root items repair after root rebuilding
If some critical roots are corrupt, reapr_root_items() will fail, this is detected by fsck_tests.sh's extent rebuilding tests. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- cmds-check.c | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index a102752..ae9005e 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -7987,22 +7987,6 @@ int cmd_check(int argc, char **argv) root = info-fs_root; - ret = repair_root_items(info); - if (ret 0) - goto close_out; - if (repair) { - fprintf(stderr, Fixed %d roots.\n, ret); - ret = 0; - } else if (ret 0) { - fprintf(stderr, - Found %d roots with an outdated root item.\n, - ret); - fprintf(stderr, - Please run a filesystem check with the option --repair to fix them.\n); - ret = 1; - goto close_out; - } - /* * repair mode will force us to commit transaction which * will make us fail to load log tree when mounting. @@ -8101,6 +8085,22 @@ int cmd_check(int argc, char **argv) if (ret) fprintf(stderr, Errors found in extent allocation tree or chunk allocation\n); + ret = repair_root_items(info); + if (ret 0) + goto close_out; + if (repair) { + fprintf(stderr, Fixed %d roots.\n, ret); + ret = 0; + } else if (ret 0) { + fprintf(stderr, + Found %d roots with an outdated root item.\n, + ret); + fprintf(stderr, + Please run a filesystem check with the option --repair to fix them.\n); + ret = 1; + goto close_out; + } + fprintf(stderr, checking free space cache\n); ret = check_space_cache(root); if (ret) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH resend 1/3] Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree
From: Wang Shilong wangsl.f...@cn.fujitsu.com Previously, we deal with node block firstly and then leaf block which can maximize readahead. However, to rebuild extent tree, we need deal with snapshot one by one. This patch makes us deal with snapshot one by one if we need rebuild extent tree otherwise we drop into previous way. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- cmds-check.c | 248 +-- 1 file changed, 158 insertions(+), 90 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index 24b6b59..ff2795d 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -128,10 +128,14 @@ struct inode_backref { char name[0]; }; -struct dropping_root_item_record { +struct root_item_record { struct list_head list; - struct btrfs_root_item ri; - struct btrfs_key found_key; + u64 objectid; + u64 bytenr; + u8 level; + u8 drop_level; + int level_size; + struct btrfs_key drop_key; }; #define REF_ERR_NO_DIR_ITEM(1 0) @@ -4618,7 +4622,7 @@ static int run_next_block(struct btrfs_trans_handle *trans, struct rb_root *dev_cache, struct block_group_tree *block_group_cache, struct device_extent_tree *dev_extent_cache, - struct btrfs_root_item *ri) + struct root_item_record *ri) { struct extent_buffer *buf; u64 bytenr; @@ -4851,11 +4855,8 @@ static int run_next_block(struct btrfs_trans_handle *trans, size = btrfs_level_size(root, level - 1); btrfs_node_key_to_cpu(buf, key, i); if (ri != NULL) { - struct btrfs_key drop_key; - btrfs_disk_key_to_cpu(drop_key, - ri-drop_progress); if ((level == ri-drop_level) -is_dropped_key(key, drop_key)) { +is_dropped_key(key, ri-drop_key)) { continue; } } @@ -4896,7 +4897,7 @@ static int add_root_to_pending(struct extent_buffer *buf, struct cache_tree *pending, struct cache_tree *seen, struct cache_tree *nodes, - struct btrfs_key *root_key) + u64 objectid) { if (btrfs_header_level(buf) 0) add_pending(nodes, seen, buf-start, buf-len); @@ -4905,13 +4906,12 @@ static int add_root_to_pending(struct extent_buffer *buf, add_extent_rec(extent_cache, NULL, 0, buf-start, buf-len, 0, 1, 1, 0, 1, 0, buf-len); - if (root_key-objectid == BTRFS_TREE_RELOC_OBJECTID || + if (objectid == BTRFS_TREE_RELOC_OBJECTID || btrfs_header_backref_rev(buf) BTRFS_MIXED_BACKREF_REV) add_tree_backref(extent_cache, buf-start, buf-start, 0, 1); else - add_tree_backref(extent_cache, buf-start, 0, -root_key-objectid, 1); + add_tree_backref(extent_cache, buf-start, 0, objectid, 1); return 0; } @@ -6481,6 +6481,99 @@ static int check_devices(struct rb_root *dev_cache, return ret; } +static int add_root_item_to_list(struct list_head *head, + u64 objectid, u64 bytenr, + u8 level, u8 drop_level, + int level_size, struct btrfs_key *drop_key) +{ + + struct root_item_record *ri_rec; + ri_rec = malloc(sizeof(*ri_rec)); + if (!ri_rec) + return -ENOMEM; + ri_rec-bytenr = bytenr; + ri_rec-objectid = objectid; + ri_rec-level = level; + ri_rec-level_size = level_size; + ri_rec-drop_level = drop_level; + if (drop_key) + memcpy(ri_rec-drop_key, drop_key, sizeof(*drop_key)); + list_add_tail(ri_rec-list, head); + + return 0; +} + +static int deal_root_from_list(struct list_head *list, + struct btrfs_trans_handle *trans, + struct btrfs_root *root, + struct block_info *bits, + int bits_nr, + struct cache_tree *pending, + struct cache_tree *seen, + struct cache_tree *reada, + struct cache_tree *nodes, + struct cache_tree *extent_cache, + struct cache_tree *chunk_cache
[PATCH 0/3] Extent tree rebuilding with snapshots patches
I did send these patches a long while ago, but due to some reasons, they were not merged, these are important fixes for fsck, without these patches, extent tree rebuilding did not work with snapshots. Also, /tests/fsck-tests.sh's extent tree rebuild test could always detect failure without these patches, unluckily, it need extra enviroment setting, so i supposed it was always 'NOTRUN'! last patch fix a regression for root rebuilding, root rebuilding should be at first, because if root(extent root eg) corrupted, root items also won't succeed. patches are based on David's integration-20141125 Wang Shilong (3): Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots Btrfs-progs, fsck: move root items repair after root rebuilding cmds-check.c | 412 +++ 1 file changed, 303 insertions(+), 109 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH resend 2/3] Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots
From: Wang Shilong wangsl.f...@cn.fujitsu.com This patch makes us to rebuild a really corrupt extent tree with snapshots. To implement this, we have to verify whether a block is FULL BACKREF. This idea come from Josef Bacik: 1) We walk down the original tree, every eb we encounter has btrfs_header_owner(eb) == root-objectid. We add normal references for this root (BTRFS_TREE_BLOCK_REF_KEY) for this root. World peace is achieved. 2) We walk down the snapshotted tree. Say we didn't change anything at all, it was just a clean snapshot and then boom. So the btrfs_header_owner(root-node) == root-objectid, so normal backref. We walk down to the next level, where btrfs_header_owner(eb) != root-objectid, but the level above did, so we add normal refs for all of these blocks. We go down the next level, now our btrfs_header_owner(parent) != root-objectid and btrfs_header_owner(eb) != root-objectid. This is where we need to now go back and see if btrfs_header_owner(eb) currently has a ref on eb. If it does we are done, move on to the next block in this same level, we don't have to go further down. 3) Harder case, we snapshotted and then changed things in the original root. Do the same thing as in step 2, but now we get down to btrfs_header_owner(eb) != root-objectid btrfs_header_owner(parent) != root-objectid. We lookup the references we have for eb and notice that btrfs_header_owner(eb) no longer refers to eb. So now we must set FULL_BACKREF on this extent reference and add a SHARED_BLOCK_REF_KEY for this eb using the parent-start as the offset. And we need to keep walking down and doing the same thing until we either hit level 0 or btrfs_header_owner(eb) has a ref on the block. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- cmds-check.c | 132 +-- 1 file changed, 129 insertions(+), 3 deletions(-) diff --git a/cmds-check.c b/cmds-check.c index ff2795d..a102752 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -112,6 +112,7 @@ struct extent_record { unsigned int owner_ref_checked:1; unsigned int is_root:1; unsigned int metadata:1; + unsigned int flag_block_full_backref:1; }; struct inode_backref { @@ -4608,6 +4609,127 @@ static int is_dropped_key(struct btrfs_key *key, return 0; } +static int calc_extent_flag(struct btrfs_root *root, + struct cache_tree *extent_cache, + struct extent_buffer *buf, + struct root_item_record *ri, + u64 *flags) +{ + int i; + int nritems = btrfs_header_nritems(buf); + struct btrfs_key key; + struct extent_record *rec; + struct cache_extent *cache; + struct data_backref *dback; + struct tree_backref *tback; + struct extent_buffer *new_buf; + u64 owner = 0; + u64 bytenr; + u64 offset; + u64 ptr; + int size; + int ret; + u8 level; + + /* +* Except file/reloc tree, we can not have +* FULL BACKREF MODE +*/ + if (ri-objectid BTRFS_FIRST_FREE_OBJECTID) + goto normal; + /* +* root node +*/ + if (buf-start == ri-bytenr) + goto normal; + if (btrfs_is_leaf(buf)) { + /* +* we are searching from original root, world +* peace is achieved, we use normal backref. +*/ + owner = btrfs_header_owner(buf); + if (owner == ri-objectid) + goto normal; + /* +* we check every eb here, and if any of +* eb dosen't have original root refers +* to this eb, we set full backref flag for +* this extent, otherwise normal backref. +*/ + for (i = 0; i nritems; i++) { + struct btrfs_file_extent_item *fi; + btrfs_item_key_to_cpu(buf, key, i); + + if (key.type != BTRFS_EXTENT_DATA_KEY) + continue; + fi = btrfs_item_ptr(buf, i, + struct btrfs_file_extent_item); + if (btrfs_file_extent_type(buf, fi) == + BTRFS_FILE_EXTENT_INLINE) + continue; + if (btrfs_file_extent_disk_bytenr(buf, fi) == 0) + continue; + bytenr = btrfs_file_extent_disk_bytenr(buf, fi); + cache = lookup_cache_extent(extent_cache, bytenr, 1); + if (!cache) + goto full_backref; + offset = btrfs_file_extent_offset(buf, fi); + rec = container_of(cache, struct
Re: [PATCH 3/3] Btrfs-progs, fsck: move root items repair after root rebuilding
Oops, sorry for sending this twice, just entered one more *.patch… I did send these patches a long while ago, but due to some reasons, they were not merged, these are important fixes for fsck, without these patches, extent tree rebuilding did not work with snapshots. Also, /tests/fsck-tests.sh's extent tree rebuild test could always detect failure without these patches, unluckily, it need extra enviroment setting, so i supposed it was always 'NOTRUN'! last patch fix a regression for root rebuilding, root rebuilding should be at first, because if root(extent root eg) corrupted, root items also won't succeed. patches are based on David's integration-20141125 Wang Shilong (3): Btrfs-progs: fsck: deal with snapshot one by one when rebuilding extent tree Btrfs-progs: fsck: add ability to rebuild extent tree with snapshots Btrfs-progs, fsck: move root items repair after root rebuilding cmds-check.c | 412 +++ 1 file changed, 303 insertions(+), 109 deletions(-) -- 2.1.0 Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: deal with all 'subvol=xxx' options once
Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount -t btrfs /dev/sdb /mnt # btrfs sub create /mnt/dir # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir It fails with: mount: mount(2) failed: No such file or directory Btrfs deal with subvolume mounting in a recursive way, to avoid looping, it will stripe out 'subvol=' string, then next loop will stop.Problem here is it only deal one string once, if users specify mount option multiple times. It will loop several times which is not good, and above reproducing steps will also return confusing results. Fix this problem by striping out all 'subvol=xxx' options, only last is valid. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/super.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 54bd91e..2b2fa4b 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1115,7 +1115,7 @@ static inline int is_subvolume_inode(struct inode *inode) * subvolid=0 to make sure we get the actual tree root for path walking to the * subvol we want. */ -static char *setup_root_args(char *args) +static char *__setup_root_args(char *args) { unsigned len = strlen(args) + 2 + 1; char *src, *dst, *buf; @@ -1161,6 +1161,24 @@ static char *setup_root_args(char *args) return buf; } +static char *setup_root_args(char *args) +{ + char *p, *new_args; + + p = new_args = __setup_root_args(args); + /* in case users specify subvol=xxx option multiple times */ + while (p) { + p = __setup_root_args(new_args); + if (p) { + kfree(new_args); + new_args = p; + } else { + break; + } + } + return new_args; +} + static struct dentry *mount_subvol(const char *subvol_name, int flags, const char *device_name, char *data) { -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: deal with all 'subvol=xxx' options once
Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount -t btrfs /dev/sdb /mnt # btrfs sub create /mnt/dir # mount -t btrfs /dev/sdb /mnt -o subvol=dir,subvol=dir It fails with: mount: mount(2) failed: No such file or directory Btrfs deal with subvolume mounting in a recursive way, to avoid looping, it will stripe out 'subvol=' string, then next loop will stop.Problem here is it only deal one string once, if users specify mount option multiple times. It will loop several times which is not good, and above reproducing steps will also return confusing results. Fix this problem by striping out all 'subvol=xxx' options, only last is valid. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/super.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 54bd91e..2b2fa4b 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1115,7 +1115,7 @@ static inline int is_subvolume_inode(struct inode *inode) * subvolid=0 to make sure we get the actual tree root for path walking to the * subvol we want. */ -static char *setup_root_args(char *args) +static char *__setup_root_args(char *args) { unsigned len = strlen(args) + 2 + 1; char *src, *dst, *buf; @@ -1161,6 +1161,24 @@ static char *setup_root_args(char *args) return buf; } +static char *setup_root_args(char *args) +{ + char *p, *new_args; + + p = new_args = __setup_root_args(args); + /* in case users specify subvol=xxx option multiple times */ + while (p) { + p = __setup_root_args(new_args); Oh, i missed error handling here, will update it.. + if (p) { + kfree(new_args); + new_args = p; + } else { + break; + } + } + return new_args; +} + static struct dentry *mount_subvol(const char *subvol_name, int flags, const char *device_name, char *data) { -- 1.7.12.4 Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: fix allocationg memory failure for btrfsic_state structure
size of @btrfsic_state needs more than 2M, it is very likely to fail allocating memory using kzalloc(). see following mesage: [91428.902148] Call Trace: [816f6e0f] dump_stack+0x4d/0x66 [811b1c7f] warn_alloc_failed+0xff/0x170 [811b66e1] __alloc_pages_nodemask+0x951/0xc30 [811fd9da] alloc_pages_current+0x11a/0x1f0 [811b1e0b] ? alloc_kmem_pages+0x3b/0xf0 [811b1e0b] alloc_kmem_pages+0x3b/0xf0 [811d1018] kmalloc_order+0x18/0x50 [811d1074] kmalloc_order_trace+0x24/0x140 [a06c097b] btrfsic_mount+0x8b/0xae0 [btrfs] [810af555] ? check_preempt_curr+0x85/0xa0 [810b2de3] ? try_to_wake_up+0x103/0x430 [a063d200] open_ctree+0x1bd0/0x2130 [btrfs] [a060fdde] btrfs_mount+0x62e/0x8b0 [btrfs] [811fd9da] ? alloc_pages_current+0x11a/0x1f0 [811b0a5e] ? __get_free_pages+0xe/0x50 [81230429] mount_fs+0x39/0x1b0 [812509fb] vfs_kern_mount+0x6b/0x150 [812537fb] do_mount+0x27b/0xc30 [811b0a5e] ? __get_free_pages+0xe/0x50 [812544f6] SyS_mount+0x96/0xf0 [81701970] system_call_fastpath+0x16/0x1b Since we are allocating memory for hash table array, so it will be good if we could allocate continuous pages here. Fix this problem by firstly trying kzalloc(), if we fail, use vzalloc() instead. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- v1-v2: include vmalloc.h and swith kvfree() helper. --- fs/btrfs/check-integrity.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index cb7f3fe..f88b3a6 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -94,6 +94,7 @@ #include linux/mutex.h #include linux/genhd.h #include linux/blkdev.h +#include linux/vmalloc.h #include ctree.h #include disk-io.h #include hash.h @@ -3130,10 +3131,13 @@ int btrfsic_mount(struct btrfs_root *root, root-sectorsize, PAGE_CACHE_SIZE); return -1; } - state = kzalloc(sizeof(*state), GFP_NOFS); - if (NULL == state) { - printk(KERN_INFO btrfs check-integrity: kmalloc() failed!\n); - return -1; + state = kzalloc(sizeof(*state), GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); + if (!state) { + state = vzalloc(sizeof(*state)); + if (!state) { + printk(KERN_INFO btrfs check-integrity: vzalloc() failed!\n); + return -1; + } } if (!btrfsic_is_initialized) { @@ -3277,5 +3281,5 @@ void btrfsic_unmount(struct btrfs_root *root, mutex_unlock(btrfsic_mutex); - kfree(state); + kvfree(state); } -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: switch to kvfree() helper
A new helper kvfree() in mm/utils.c will do this. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/raid56.c | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 6a41631..12e343b 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -221,12 +221,8 @@ int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info) } x = cmpxchg(info-stripe_hash_table, NULL, table); - if (x) { - if (is_vmalloc_addr(x)) - vfree(x); - else - kfree(x); - } + if (x) + kvfree(x); return 0; } @@ -436,10 +432,7 @@ void btrfs_free_stripe_hash_table(struct btrfs_fs_info *info) if (!info-stripe_hash_table) return; btrfs_clear_rbio_cache(info); - if (is_vmalloc_addr(info-stripe_hash_table)) - vfree(info-stripe_hash_table); - else - kfree(info-stripe_hash_table); + kvfree(info-stripe_hash_table); info-stripe_hash_table = NULL; } -- 1.7.12.4 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Quota question
Anyone on this ? There is an issue with quotas depending on the write rate. The more we can write before a sync, the more we can exceed quotas limits Absolutely, this was a regression bug since Josef introduce delay refs merging for qgroup. Josef, would you please spend some time tracking this problem down? -- Cyril SCETBON I did some new tests and the issue with quota exceeded comes from the fact that the data is not sync to disk : http://pastebin.com/fZdh2YRu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: request for info on the list of parameters to tweak for PCIe SSDs
Hi Lakshmi, Personally, i haven’t done many benchmarks about performance tuning, but there are some info you could refer to: If you feel intrested, you could take a look at following url which offers btrfs mount options: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/btrfs.txt And for Wiki, there are many information which might help you: https://btrfs.wiki.kernel.org/index.php/Main_Page There are also some options that you could tune for mkfs, see detailed info by ‘man mkfs.btrfs’.I supposed for example different ’nodesize/leafsize’ will have effects for performance. Also there are some new features Btrfs, like skinny_metadata, no_holes etc, you could clone a Latest btrfs-progs tools from: http://git.kernel.org/cgit/linux/kernel/git/kdave/btrfs-progs.git/ Some features are not enabled by default(skinny_medata,no_holes etc), but for benchmarking, you could turn on them and using latest kernel with latest btrfs-progs: root@localhost autotest]# mkfs.btrfs -O list-all Filesystem features available at mkfs time: mixed-bg- mixed data and metadata block groups (0x4) extref - increased hardlink limit per file to 65536 (0x40, default) raid56 - raid56 extended format (0x80) skinny-metadata - reduced-size metadata extent refs (0x100) no-holes- no explicit hole extents for files (0x200) It will be good that you could do some benchmarks for Btrfs with different parameters, and feel free to let others know your results(i am also curious).^_^ Best Regards, Wang Shilong Hi, Could you kindly help us with the list of all the btrfs file system parameters , that can be tweaked for the best performance of the PCIe based SSDs ? Thanks, Lakshmi -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Poll: time to switch skinny-metadata on by default?
info in comments 10,14 in https://bugzilla.kernel.org/show_bug.cgi?id=64961 I can't reproduce this, how big is your home directory, and are you still seeing corruptions after just rsyncing to a clean fs? Thanks, as I wrote in comment 10, it has improved since year ago when I reported it: I see no corruption at all, neither after rsync, nor after balance crash: btrfs check doesn't find anything wrong, files look ok. The only problem is that after adding a disk the balance segfaults on a kernel bug and the fs gets stuck. When I run balance again after reboot, it makes only a very small progress and crashes again the same way. There are some 2.5TB of data in 7.5M files on that fs. And couple dozen ro snapshots -- I'm testing 3.17 + revert of 9c3b306e1c9e right now, but it takes more than day to copy the data and recreate all the snapshots. But a test with ^skinny-metadata showed no problems, so I don't thing I got bitten by that bug. I have btrfs-image of one of previous runs after crashed balance. It's 15GB. I can place it somewhere with fast link, are you interested? Yup, send me the link and I'll pull it down. Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Poll: time to switch skinny-metadata on by default?
Sure, that is cool, let me know if i could give any help! I have an idle VM that could run btrfs tests there.^_^ Thanks I'll run this on Monday. Josef Wang Shilong wangshilong1...@gmail.com wrote: Hello Josef, With Skinny metadta and i running your btrfs-next repo for-suse branch (which has extent ref patch), i hit following problem: [ 250.679705] BTRFS info (device sdb): relocating block group 35597058048 flags 36 [ 250.728815] BTRFS info (device sdb): relocating block group 35462840320 flags 36 [ 253.562133] Dropping a ref for a root that doesn't have a ref on the block [ 253.562475] Dumping block entry [34793177088 8192], num_refs 3, metadata 0 [ 253.562795] Ref root 0, parent 35532013568, owner 23988, offset 0, num_refs 18446744073709551615 [ 253.563126] Ref root 0, parent 35560964096, owner 23988, offset 0, num_refs 1 [ 253.563505] Ref root 0, parent 35654615040, owner 23988, offset 0, num_refs 1 [ 253.563837] Ref root 0, parent 35678650368, owner 23988, offset 0, num_refs 1 [ 253.564162] Root entry 5, num_refs 1 [ 253.564520] Root entry 18446744073709551608, num_refs 18446744073709551615 [ 253.564860] Ref action 4, root 5, ref_root 5, parent 0, owner 23988, offset 0, num_refs 1 [ 253.565205][a049d2f1] process_leaf.isra.6+0x281/0x3e0 [btrfs] [ 253.565225][a049de83] build_ref_tree_for_root+0x433/0x460 [btrfs] [ 253.565234][a049e1af] btrfs_build_ref_tree+0x18f/0x1c0 [btrfs] [ 253.565241][a0419ce8] open_ctree+0x18b8/0x21a0 [btrfs] [ 253.565247][a03ecb0e] btrfs_mount+0x62e/0x8b0 [btrfs] [ 253.565251][812324e9] mount_fs+0x39/0x1b0 [ 253.565255][8125285b] vfs_kern_mount+0x6b/0x150 [ 253.565257][8125565b] do_mount+0x27b/0xc30 [ 253.565259][81256356] SyS_mount+0x96/0xf0 [ 253.565260][81795429] system_call_fastpath+0x16/0x1b [ 253.565263][] 0x [ 253.565272] Ref action 1, root 18446744073709551608, ref_root 0, parent 35654615040, owner 23988, offset 0, num_refs 1 [ 253.565681][a049d564] btrfs_ref_tree_mod+0x114/0x570 [btrfs] [ 253.565692][a03f946b] btrfs_inc_extent_ref+0x6b/0x120 [btrfs] [ 253.565697][a03fb77c] __btrfs_mod_ref+0x16c/0x2b0 [btrfs] [ 253.565702][a0401504] btrfs_inc_ref+0x14/0x20 [btrfs] [ 253.565707][a03f05ff] update_ref_for_cow+0x15f/0x380 [btrfs] [ 253.565711][a03f0a3d] __btrfs_cow_block+0x21d/0x540 [btrfs] [ 253.565716][a03f0f0c] btrfs_cow_block+0x12c/0x290 [btrfs] [ 253.565721][a046f59c] do_relocation+0x49c/0x570 [btrfs] [ 253.565728][a04723ce] relocate_tree_blocks+0x60e/0x660 [btrfs] [ 253.565735][a0473ce7] relocate_block_group+0x407/0x690 [btrfs] [ 253.565741][a0474148] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs] [ 253.565746][a04455a7] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs] [ 253.565753][a0448a8b] __btrfs_balance+0x4eb/0x8d0 [btrfs] [ 253.565760][a044928a] btrfs_balance+0x41a/0x720 [btrfs] [ 253.565766][a045112a] btrfs_ioctl_balance+0x16a/0x530 [btrfs] [ 253.565772][a0456df8] btrfs_ioctl+0x588/0x2cb0 [btrfs] [ 253.565779] Ref action 1, root 18446744073709551608, ref_root 0, parent 35560964096, owner 23988, offset 0, num_refs 1 [ 253.566143][a049d564] btrfs_ref_tree_mod+0x114/0x570 [btrfs] [ 253.566152][a03f946b] btrfs_inc_extent_ref+0x6b/0x120 [btrfs] [ 253.566180][a03fb77c] __btrfs_mod_ref+0x16c/0x2b0 [btrfs] [ 253.566186][a0401504] btrfs_inc_ref+0x14/0x20 [btrfs] [ 253.566191][a03f071b] update_ref_for_cow+0x27b/0x380 [btrfs] [ 253.566195][a03f0a3d] __btrfs_cow_block+0x21d/0x540 [btrfs] [ 253.566199][a03f0f0c] btrfs_cow_block+0x12c/0x290 [btrfs] [ 253.566203][a046f59c] do_relocation+0x49c/0x570 [btrfs] [ 253.566210][a04723ce] relocate_tree_blocks+0x60e/0x660 [btrfs] [ 253.566216][a0473ce7] relocate_block_group+0x407/0x690 [btrfs] [ 253.566222][a0474148] btrfs_relocate_block_group+0x1d8/0x2f0 [btrfs] [ 253.566227][a04455a7] btrfs_relocate_chunk.isra.30+0x77/0x800 [btrfs] [ 253.566233][a0448a8b] __btrfs_balance+0x4eb/0x8d0 [btrfs] [ 253.566240][a044928a] btrfs_balance+0x41a/0x720 [btrfs] [ 253.566245][a045112a] btrfs_ioctl_balance+0x16a/0x530 [btrfs] [ 253.566252][a0456df8] btrfs_ioctl+0x588/0x2cb0 [btrfs] [ 253.566258] Ref action 2, root 18446744073709551608, ref_root 5, parent 0, owner 23988, offset 0, num_refs 18446744073709551615 [ 253.566641][a049d710] btrfs_ref_tree_mod+0x2c0/0x570 [btrfs] [ 253.566651][a040404a
Re: [PATCH 1/2] Btrfs: check_int: use the known block location
The xfstest btrfs/014 which tests the balance operation caused issues with the check_int module. The attempt was made to use btrfs_map_block() to find the physical location for a written block. However, this was not at all needed since the location of the written block was known since a hook to submit_bio() was the reason for entering the check_int module. Additionally, after a block relocation it happened that btrfs_map_block() failed causing misleading error messages afterwards. This patch changes the check_int module to use the known information of the physical location from the bio. Reported-by: Wang Shilong wangshilong1...@gmail.com Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de This passed my Tests after applying both patches. Tested-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/check-integrity.c | 66 -- 1 file changed, 11 insertions(+), 55 deletions(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index ce92ae30250f..65fc2e0bbc4a 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -326,9 +326,6 @@ static int btrfsic_handle_extent_data(struct btrfsic_state *state, static int btrfsic_map_block(struct btrfsic_state *state, u64 bytenr, u32 len, struct btrfsic_block_data_ctx *block_ctx_out, int mirror_num); -static int btrfsic_map_superblock(struct btrfsic_state *state, u64 bytenr, - u32 len, struct block_device *bdev, - struct btrfsic_block_data_ctx *block_ctx_out); static void btrfsic_release_block_ctx(struct btrfsic_block_data_ctx *block_ctx); static int btrfsic_read_block(struct btrfsic_state *state, struct btrfsic_block_data_ctx *block_ctx); @@ -1609,25 +1606,6 @@ static int btrfsic_map_block(struct btrfsic_state *state, u64 bytenr, u32 len, return ret; } -static int btrfsic_map_superblock(struct btrfsic_state *state, u64 bytenr, - u32 len, struct block_device *bdev, - struct btrfsic_block_data_ctx *block_ctx_out) -{ - block_ctx_out-dev = btrfsic_dev_state_lookup(bdev); - block_ctx_out-dev_bytenr = bytenr; - block_ctx_out-start = bytenr; - block_ctx_out-len = len; - block_ctx_out-datav = NULL; - block_ctx_out-pagev = NULL; - block_ctx_out-mem_to_free = NULL; - if (NULL != block_ctx_out-dev) { - return 0; - } else { - printk(KERN_INFO btrfsic: error, cannot lookup dev (#2)!\n); - return -ENXIO; - } -} - static void btrfsic_release_block_ctx(struct btrfsic_block_data_ctx *block_ctx) { if (block_ctx-mem_to_free) { @@ -2004,24 +1982,13 @@ again: } } - if (block-is_superblock) - ret = btrfsic_map_superblock(state, bytenr, - processed_len, - bdev, block_ctx); - else - ret = btrfsic_map_block(state, bytenr, processed_len, - block_ctx, 0); - if (ret) { - printk(KERN_INFO -btrfsic: btrfsic_map_block(root @%llu) - failed!\n, bytenr); - goto continue_loop; - } - block_ctx.datav = mapped_datav; - /* the following is required in case of writes to mirrors, - * use the same that was used for the lookup */ block_ctx.dev = dev_state; block_ctx.dev_bytenr = dev_bytenr; + block_ctx.start = bytenr; + block_ctx.len = processed_len; + block_ctx.pagev = NULL; + block_ctx.mem_to_free = NULL; + block_ctx.datav = mapped_datav; if (is_metadata || state-include_extent_data) { block-never_written = 0; @@ -2135,10 +2102,6 @@ again: /* this is getting ugly for the * include_extent_data case... */ bytenr = 0; /* unknown */ - block_ctx.start = bytenr; - block_ctx.len = processed_len; - block_ctx.mem_to_free = NULL; - block_ctx.pagev = NULL; } else { processed_len = state-metablock_size; bytenr = btrfs_stack_header_bytenr( @@ -2151,22 +2114,15 @@ again: Written block @%llu (%s/%llu/?) !found in hash table, M.\n, bytenr, dev_state-name, dev_bytenr); - - ret
Re: [PATCH 2/2] Btrfs: check-int: don't complain about balanced blocks
The xfstest btrfs/014 which tests the balance operation caused that the check_int module complained that known blocks changed their physical location. Since this is not an error in this case, only print such message if the verbose mode was enabled. Reported-by: Wang Shilong wangshilong1...@gmail.com Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de This passed my Tests after applying both patches. Tested-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/check-integrity.c | 87 ++ 1 file changed, 49 insertions(+), 38 deletions(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 65fc2e0bbc4a..65226d7c9fe0 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -1325,24 +1325,28 @@ static int btrfsic_create_link_to_next_block( l = NULL; next_block-generation = BTRFSIC_GENERATION_UNKNOWN; } else { - if (next_block-logical_bytenr != next_bytenr - !(!next_block-is_metadata - 0 == next_block-logical_bytenr)) { - printk(KERN_INFO -Referenced block @%llu (%s/%llu/%d) - found in hash table, %c, - bytenr mismatch (!= stored %llu).\n, -next_bytenr, next_block_ctx-dev-name, -next_block_ctx-dev_bytenr, *mirror_nump, -btrfsic_get_block_type(state, next_block), -next_block-logical_bytenr); - } else if (state-print_mask BTRFSIC_PRINT_MASK_VERBOSE) - printk(KERN_INFO -Referenced block @%llu (%s/%llu/%d) - found in hash table, %c.\n, -next_bytenr, next_block_ctx-dev-name, -next_block_ctx-dev_bytenr, *mirror_nump, -btrfsic_get_block_type(state, next_block)); + if (state-print_mask BTRFSIC_PRINT_MASK_VERBOSE) { + if (next_block-logical_bytenr != next_bytenr + !(!next_block-is_metadata + 0 == next_block-logical_bytenr)) + printk(KERN_INFO +Referenced block @%llu (%s/%llu/%d) + found in hash table, %c, + bytenr mismatch (!= stored %llu).\n, +next_bytenr, next_block_ctx-dev-name, +next_block_ctx-dev_bytenr, *mirror_nump, +btrfsic_get_block_type(state, + next_block), +next_block-logical_bytenr); + else + printk(KERN_INFO +Referenced block @%llu (%s/%llu/%d) + found in hash table, %c.\n, +next_bytenr, next_block_ctx-dev-name, +next_block_ctx-dev_bytenr, *mirror_nump, +btrfsic_get_block_type(state, + next_block)); + } next_block-logical_bytenr = next_bytenr; next_block-mirror_num = *mirror_nump; @@ -1528,7 +1532,9 @@ static int btrfsic_handle_extent_data( return -1; } if (!block_was_created) { - if (next_block-logical_bytenr != next_bytenr + if ((state-print_mask + BTRFSIC_PRINT_MASK_VERBOSE) + next_block-logical_bytenr != next_bytenr !(!next_block-is_metadata 0 == next_block-logical_bytenr)) { printk(KERN_INFO @@ -1881,25 +1887,30 @@ again: dev_state, dev_bytenr); } - if (block-logical_bytenr != bytenr - !(!block-is_metadata - block-logical_bytenr == 0)) - printk(KERN_INFO -Written block @%llu (%s/%llu/%d) - found in hash table, %c, - bytenr mismatch - (!= stored %llu).\n, -bytenr, dev_state-name, dev_bytenr
Re: [PATCH v2] Btrfs-progs: check, fix return value check of is_child_root()
The following commit: btrfs-progs: fsck: remove unfriendly BUG_ON() for searching tree failure f495a2ac66116f0a1b15e73380c8cbca6e0a4ca0 introduced a regression, detected through xfstests/btrfs/054, where previously a negative return value (-1) was used to mean a particular root didn't had any parent root, and now, after that change, a negative value is also used to mean that an error happened. That change also made the only caller of is_child_root() interpret any negative return value as an error and therefore incorrectly made the caller leave with an error, instead of continuing. This affects only the 3.17 release candidates (3.16 and older releases don't have this issue). Signed-off-by: Filipe Manana fdman...@suse.com Thanks for Fixing this, Filipe! Reviewed-by: Wang Shilong wangshilong1...@gmail.com --- V2: Made it return 2 (instead of -1) when the root child_root_id doesn't have any parent roots, in order to behave exactly like the code pre-commit f495a2ac66116f0a1b15e73380c8cbca6e0a4ca0. cmds-check.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/cmds-check.c b/cmds-check.c index 99d1a94..310eb2a 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -895,6 +895,14 @@ static int leave_shared_node(struct btrfs_root *root, return 0; } +/* + * Returns: + * 0 - on error + * 1 - if the root with id child_root_id is a child of root parent_root_id + * 0 - if the root child_root_id isn't a child of the root parent_root_id but + * has other root(s) as parent(s) + * 2 - if the root child_root_id doesn't have any parent roots + */ static int is_child_root(struct btrfs_root *root, u64 parent_root_id, u64 child_root_id) { @@ -952,7 +960,7 @@ out: btrfs_release_path(path); if (ret 0) return ret; - return has_parent? 0 : -1; + return has_parent ? 0 : 2; } static int process_dir_item(struct btrfs_root *root, -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Problems running Balance and checking integrity concurrently.
Hi Stefan Behrens, Thanks very much for your attention! More stress tests will be followed and will give you feedbacks when finished. On Fri, 10 Oct 2014 21:35:02 +0800, Wang Shilong wrote: Hello, I have reproduced a problem with Btrfs integrity and balance using latest btrfs kernel,it is very easy to reproduce: With mount option “check_int”, and run xfstests/btrfs/ tests, below test could definitely reproduce problem: ./check tests/btrfs/014 The following two patches fix the issue. Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Problems running Balance and checking integrity concurrently.
Hello, I have reproduced a problem with Btrfs integrity and balance using latest btrfs kernel,it is very easy to reproduce: With mount option “check_int”, and run xfstests/btrfs/ tests, below test could definitely reproduce problem: ./check tests/btrfs/014 5.122248] Written block @25800343552 (sdc/1210187776/0) found in hash table, M, bytenr mismatch (!= stored 20968505344). [ 455.122283] Written block @25800359936 (sdc/1210204160/1) found in hash table, M, bytenr mismatch (!= stored 20968521728). [ 455.122301] Written block @25800376320 (sdc/1210220544/1) found in hash table, M, bytenr mismatch (!= stored 20968538112). [ 455.122347] Referenced block @25800343552 (sdc/1344405504/2) found in hash table, M, bytenr mismatch (!= stored 20968505344). [ 455.122427] Referenced block @25800359936 (sdc/1344421888/2) found in hash table, M, bytenr mismatch (!= stored 20968521728). [ 455.122853] Written block @25800376320 (sdc/1344438272/0) found in hash table, M, bytenr mismatch (!= stored 20968538112). [ 455.126275] BTRFS critical (device sdc): unable to find logical 3452043264 len 4096 [ 455.126491] btrfsic: btrfsic_map_block(root @3452043264) failed! [ 455.126496] BTRFS critical (device sdc): unable to find logical 3452059648 len 4096 [ 455.126769] btrfsic: btrfsic_map_block(root @3452059648) failed! [ 455.126773] BTRFS critical (device sdc): unable to find logical 3452076032 len 4096 [ 455.127007] btrfsic: btrfsic_map_block(root @3452076032) failed! [ 455.127011] BTRFS critical (device sdc): unable to find logical 3452092416 len 4096 snip [ 455.129864] btrfsic: btrfsic_map_block(root @3452272640) failed! [ 455.129869] BTRFS critical (device sdc): unable to find logical 3452289024 len 4096 [ 455.130113] btrfsic: btrfsic_map_block(root @3452289024) failed! [ 455.130756] BTRFS critical (device sdc): unable to find logical 3451994112 len 4096 [ 455.130984] btrfsic: btrfsic_map_block(root @3451994112) failed! [ 455.130988] BTRFS critical (device sdc): unable to find logical 3452010496 len 4096 [ 455.131192] btrfsic: btrfsic_map_block(root @3452010496) failed! [ 455.131195] BTRFS critical (device sdc): unable to find logical 3452026880 len 4096 [ 455.131401] btrfsic: btrfsic_map_block(root @3452026880) failed! [ 455.131914] Written block @25800392704 (sdc/1210236928/0) found in hash table, M, bytenr mismatch (!= stored 20968554496). [ 455.131938] Referenced block @25800409088 (sdc/1210253312/1) found in hash table, M, bytenr mismatch (!= stored 20968570880). [ 455.131971] Referenced block @25800409088 (sdc/1344471040/2) found in hash table, M, bytenr mismatch (!= stored 20968570880). [ 455.132017] Referenced block @25800425472 (sdc/1210269696/1) found in hash table, M, bytenr mismatch (!= stored 20968587264). [ 455.132030] Referenced block @25800425472 (sdc/1344487424/2) found in hash table, M, bytenr mismatch (!= stored 20968587264). [ 455.132344] Written block @25800392704 (sdc/1344454656/0) found in hash table, M, bytenr mismatch (!= stored 20968554496 I supposed there are some race problems between integrity check and Btrfs balance, could you guys please take a look at this issue! Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix allocationg memory failure for btrfsic_state structure
size of @btrfsic_state needs more than 2M, it is very likely to fail allocating memory using kzalloc(). see following mesage: [91428.902148] Call Trace: [816f6e0f] dump_stack+0x4d/0x66 [811b1c7f] warn_alloc_failed+0xff/0x170 [811b66e1] __alloc_pages_nodemask+0x951/0xc30 [811fd9da] alloc_pages_current+0x11a/0x1f0 [811b1e0b] ? alloc_kmem_pages+0x3b/0xf0 [811b1e0b] alloc_kmem_pages+0x3b/0xf0 [811d1018] kmalloc_order+0x18/0x50 [811d1074] kmalloc_order_trace+0x24/0x140 [a06c097b] btrfsic_mount+0x8b/0xae0 [btrfs] [810af555] ? check_preempt_curr+0x85/0xa0 [810b2de3] ? try_to_wake_up+0x103/0x430 [a063d200] open_ctree+0x1bd0/0x2130 [btrfs] [a060fdde] btrfs_mount+0x62e/0x8b0 [btrfs] [811fd9da] ? alloc_pages_current+0x11a/0x1f0 [811b0a5e] ? __get_free_pages+0xe/0x50 [81230429] mount_fs+0x39/0x1b0 [812509fb] vfs_kern_mount+0x6b/0x150 [812537fb] do_mount+0x27b/0xc30 [811b0a5e] ? __get_free_pages+0xe/0x50 [812544f6] SyS_mount+0x96/0xf0 [81701970] system_call_fastpath+0x16/0x1b Since we are allocating memory for hash table array, so it will be good if we could allocate continuous pages here. Fix this problem by firstly trying kzalloc(), if we fail, use vzalloc() instead. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/check-integrity.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index cb7f3fe..66e820f 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -3130,10 +3130,13 @@ int btrfsic_mount(struct btrfs_root *root, root-sectorsize, PAGE_CACHE_SIZE); return -1; } - state = kzalloc(sizeof(*state), GFP_NOFS); - if (NULL == state) { - printk(KERN_INFO btrfs check-integrity: kmalloc() failed!\n); - return -1; + state = kzalloc(sizeof(*state), GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT); + if (!state) { + state = vzalloc(sizeof(*state)); + if (!state) { + printk(KERN_INFO btrfs check-integrity: vzalloc() failed!\n); + return -1; + } } if (!btrfsic_is_initialized) { @@ -3277,5 +3280,8 @@ void btrfsic_unmount(struct btrfs_root *root, mutex_unlock(btrfsic_mutex); - kfree(state); + if (is_vmalloc_addr(state)) + vfree(state); + else + kfree(state); } -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix incorrect compression ratio detection
Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount -t btrfs /dev/sdb /mnt -o compress=lzo # dd if=/dev/zero of=/mnt/data bs=$((33*4096)) count=1 after previous steps, inode will be detected as bad compression ratio, and NOCOMPRESS flag will be set for that inode. Reason is that compress have a max limit pages every time(128K), if a 132k write in, it will be splitted into two write(128k+4k), this bug is a leftover for commit 68bb462d42a(Btrfs: don't compress for a small write) Fix this problem by checking every time before compression, if it is a small write(=blocksize), we bail out and fall into nocompression directly. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- fs/btrfs/inode.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 344a322..b78e90a 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -411,14 +411,6 @@ static noinline int compress_file_range(struct inode *inode, (start 0 || end + 1 BTRFS_I(inode)-disk_i_size)) btrfs_add_inode_defrag(NULL, inode); - /* -* skip compression for a small file range(=blocksize) that -* isn't an inline extent, since it dosen't save disk space at all. -*/ - if ((end - start + 1) = blocksize - (start 0 || end + 1 BTRFS_I(inode)-disk_i_size)) - goto cleanup_and_bail_uncompressed; - actual_end = min_t(u64, isize, end + 1); again: will_compress = 0; @@ -440,6 +432,14 @@ again: total_compressed = actual_end - start; + /* +* skip compression for a small file range(=blocksize) that +* isn't an inline extent, since it dosen't save disk space at all. +*/ + if (total_compressed = blocksize + (start 0 || end + 1 BTRFS_I(inode)-disk_i_size)) + goto cleanup_and_bail_uncompressed; + /* we want to make sure that amount of ram required to uncompress * an extent is reasonable, so we limit the total size in ram * of a compressed extent to 128k. This is a crucial number -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] xfstests: add regression for inode bad compression ratio detection
This test add a regression test to make sure btrfs dosen't set inode NOCOMPRESS flag when a small write(=blocksize) happens. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- Test is a little relying on btrfs now have a max 128k limit write, not sure whether it is good since it is possible we may change it in the future, let me know if you have better idea ^_^. --- tests/btrfs/060 | 72 + tests/btrfs/060.out | 3 +++ tests/btrfs/group | 1 + 3 files changed, 76 insertions(+) create mode 100644 tests/btrfs/060 create mode 100644 tests/btrfs/060.out diff --git a/tests/btrfs/060 b/tests/btrfs/060 new file mode 100644 index 000..6ef2e0f --- /dev/null +++ b/tests/btrfs/060 @@ -0,0 +1,72 @@ +#! /bin/bash +# FS QA Test No. btrfs/060 +# +# Regression test for btrfs incorrection inode ratio detection. +# This was fixed in the following linux kernel patch: +# +# Btrfs: fix incorrect compression ratio detection +# +#--- +# Copyright (c) 2014 Wang Shilong. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -fr $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_test +_require_scratch +_need_to_be_root + +rm -f $seqres.full + +_scratch_mkfs $seqres.full 21 +_scratch_mount -o compress=lzo + +$XFS_IO_PROG -f -c pwrite 0 10M -c fsync \ + $SCRATCH_MNT/data $seqres.full 21 + +filefrag $SCRATCH_MNT/data | $AWK_PROG '{print $2}' + +$XFS_IO_PROG -f -c pwrite 0 $((4096*33)) -c fsync \ + $SCRATCH_MNT/data $seqres.full 21 + +$XFS_IO_PROG -f -c pwrite 0 10M -c fsync \ + $SCRATCH_MNT/data $seqres.full 21 +filefrag $SCRATCH_MNT/data | $AWK_PROG '{print $2}' + +status=0 +exit diff --git a/tests/btrfs/060.out b/tests/btrfs/060.out new file mode 100644 index 000..1c306d2 --- /dev/null +++ b/tests/btrfs/060.out @@ -0,0 +1,3 @@ +QA output created by 060 +80 +80 diff --git a/tests/btrfs/group b/tests/btrfs/group index 68b5c79..d06c163 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -62,3 +62,4 @@ 057 auto quick 058 auto quick 059 auto quick +060 auto quick -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] xfstests: add regression for inode bad compression ratio detection
This test add a regression test to make sure btrfs dosen't set inode NOCOMPRESS flag when a small write(=blocksize) happens. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- v1-v2: switch _extent_count() helper. Test relies on btrfs have a max 128k limit write, not sure whether it is good since it is possible we may change it in the future. let me know if you have better idea here. ^_^ --- tests/btrfs/060 | 74 + tests/btrfs/060.out | 3 +++ tests/btrfs/group | 1 + 3 files changed, 78 insertions(+) create mode 100644 tests/btrfs/060 create mode 100644 tests/btrfs/060.out diff --git a/tests/btrfs/060 b/tests/btrfs/060 new file mode 100644 index 000..3ff4055 --- /dev/null +++ b/tests/btrfs/060 @@ -0,0 +1,74 @@ +#! /bin/bash +# FS QA Test No. btrfs/060 +# +# Regression test for btrfs incorrect inode ratio detection. +# This was fixed in the following linux kernel patch: +# +# Btrfs: fix incorrect compression ratio detection +# +#--- +# Copyright (c) 2014 Wang Shilong. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo QA output created by $seq + +here=`pwd` +tmp=/tmp/$$ + +status=1 # failure is the default! +trap _cleanup; exit \$status 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -fr $tmp +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/defrag + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_test +_require_scratch +_need_to_be_root + +rm -f $seqres.full + +_scratch_mkfs $seqres.full 21 +_scratch_mount -o compress=lzo + +$XFS_IO_PROG -f -c pwrite 0 10M -c fsync \ + $SCRATCH_MNT/data $seqres.full 21 + +_extent_count $SCRATCH_MNT/data + +$XFS_IO_PROG -f -c pwrite 0 $((4096*33)) -c fsync \ + $SCRATCH_MNT/data $seqres.full 21 + +$XFS_IO_PROG -f -c pwrite 0 10M -c fsync \ + $SCRATCH_MNT/data $seqres.full 21 + +_extent_count $SCRATCH_MNT/data + +status=0 +exit diff --git a/tests/btrfs/060.out b/tests/btrfs/060.out new file mode 100644 index 000..1c306d2 --- /dev/null +++ b/tests/btrfs/060.out @@ -0,0 +1,3 @@ +QA output created by 060 +80 +80 diff --git a/tests/btrfs/group b/tests/btrfs/group index 68b5c79..d06c163 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -62,3 +62,4 @@ 057 auto quick 058 auto quick 059 auto quick +060 auto quick -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] xfstests,btrfs/010: use _extent_count() helper
cleanup to swith _extent_count(), this way we remove a dependence on filefrag. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- tests/btrfs/010 | 8 1 file changed, 4 insertions(+), 4 deletions(-) mode change 100755 = 100644 tests/btrfs/010 diff --git a/tests/btrfs/010 b/tests/btrfs/010 old mode 100755 new mode 100644 index 371f1b8..6cfc561 --- a/tests/btrfs/010 +++ b/tests/btrfs/010 @@ -40,6 +40,7 @@ _cleanup() # get standard environment, filters and checks . ./common/rc . ./common/filter +. ./common/defrag # real QA test starts here @@ -47,7 +48,6 @@ _cleanup() _supported_fs btrfs _supported_os Linux _require_scratch -_require_command /usr/sbin/filefrag rm -f $seqres.full @@ -71,9 +71,9 @@ done $BTRFS_UTIL_PROG filesystem defrag $subvol/foobar sync -num_extent_orig=`filefrag $subvol/foobar | awk -F ' ' '{print $2}'` -num_extent_snap1=`filefrag $SCRATCH_MNT/snap-1/foobar | awk -F ' ' '{print $2}'` -num_extent_snap2=`filefrag $SCRATCH_MNT/snap-2/foobar | awk -F ' ' '{print $2}'` +num_extent_orig=`_extent_count $subvol/foobar` +num_extent_snap1=`_extent_count $SCRATCH_MNT/snap-1/foobar` +num_extent_snap2=`_extent_count $SCRATCH_MNT/snap-2/foobar` echo $num_extent_orig $num_extent_snap1 $num_extent_snap2 $seqres.full -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] xfstests,btrfs/004: use FILEFRAG_PROG macro
Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- i didn't find a way to remove filefrag here, but let us use macro firstly... --- tests/btrfs/004 | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tests/btrfs/004 b/tests/btrfs/004 index 1d5b816..1f59963 100755 --- a/tests/btrfs/004 +++ b/tests/btrfs/004 @@ -53,7 +53,6 @@ _supported_os Linux _require_scratch _require_no_large_scratch_dev _require_btrfs inspect-internal -_require_command /usr/sbin/filefrag rm -f $seqres.full @@ -82,7 +81,7 @@ _filter_extents() _check_file_extents() { - cmd=filefrag -v $1 + cmd=$FILEFRAG_PROG -v $1 echo # $cmd $seqres.full out=`$cmd | _filter_extents` if [ -z $out ]; then -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] xfstests,btrfs/004: use FILEFRAG_PROG macro
Cc list On Wed, Oct 08, 2014 at 06:35:22AM -0400, Wang Shilong wrote: Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- i didn't find a way to remove filefrag here, but let us use macro firstly... YOu shoul dbe able to use fiemap to calculate the same output. i.e. _check_file_extents() outputs a strings of the format #physical#length#logical#flags and this is then cut up later later by the test. fiemap outputs the same information, it just requires slightly different processing via the _filter_extents function to munge it into the same format… I supposed there are some differences between fiemap and filefrag: [root@vm01 mnt]# filefrag -v data Filesystem type is: 9123683e File size of data is 1024 (1 block of 4096 bytes) ext: logical_offset:physical_offset: length: expected: flags: 0:0..4095: 0.. 4095: 4096: not_aligned,inline,eof data: 1 extent found [root@vm01 mnt]# xfs_io -c fiemap data data: 0: [0..7]: 0..7 Hm…. Cheers, Dave. -- Dave Chinner da...@fromorbit.com Best Regards, Wang Shilong Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs rescue super-recover memory corruption
Hi Eric, This: # truncate --size=8g # dd if=/dev/zero of=file conv=notrunc bs=4 seek=16384 count=1 # valgrind ./btrfs rescue super-recover file -v yields: ==4604== Memcheck, a memory error detector ==4604== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al. ==4604== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info ==4604== Command: ./btrfs rescue super-recover file -v ==4604== All Devices: Device: id = 1, name = file Before Recovering: [All good supers]: device name = file superblock bytenr = 67108864 [All bad supers]: device name = file superblock bytenr = 65536 Make sure this is a btrfs disk otherwise the tool will destroy other fs, Are you sure? [y/N]: y Recovered bad superblocks successful ==4604== Invalid read of size 8 ==4604==at 0x426B55: btrfs_recover_superblocks (list.h:204) ==4604==by 0x421C79: cmd_super_recover (cmds-rescue.c:148) ==4604==by 0x40420A: handle_command_group (btrfs.c:145) ==4604==by 0x421B54: cmd_rescue (cmds-rescue.c:162) ==4604==by 0x404199: main (btrfs.c:247) ==4604== Address 0x4c250b0 is 48 bytes inside a block of size 96 free'd ==4604==at 0x4A063F0: free (vg_replace_malloc.c:446) ==4604==by 0x43C77E: btrfs_close_devices (volumes.c:196) ==4604==by 0x42F5D1: close_ctree (disk-io.c:1404) ==4604==by 0x426A85: btrfs_recover_superblocks (super-recover.c:340) ==4604==by 0x421C79: cmd_super_recover (cmds-rescue.c:148) ==4604==by 0x40420A: handle_command_group (btrfs.c:145) ==4604==by 0x421B54: cmd_rescue (cmds-rescue.c:162) ==4604==by 0x404199: main (btrfs.c:247) ==4604== ==4604== Invalid free() / delete / delete[] / realloc() ==4604==at 0x4A063F0: free (vg_replace_malloc.c:446) ==4604==by 0x426B9E: btrfs_recover_superblocks (super-recover.c:85) ==4604==by 0x421C79: cmd_super_recover (cmds-rescue.c:148) ==4604==by 0x40420A: handle_command_group (btrfs.c:145) ==4604==by 0x421B54: cmd_rescue (cmds-rescue.c:162) ==4604==by 0x404199: main (btrfs.c:247) ==4604== Address 0x4c25080 is 0 bytes inside a block of size 96 free'd ==4604==at 0x4A063F0: free (vg_replace_malloc.c:446) ==4604==by 0x43C77E: btrfs_close_devices (volumes.c:196) ==4604==by 0x42F5D1: close_ctree (disk-io.c:1404) ==4604==by 0x426A85: btrfs_recover_superblocks (super-recover.c:340) ==4604==by 0x421C79: cmd_super_recover (cmds-rescue.c:148) ==4604==by 0x40420A: handle_command_group (btrfs.c:145) ==4604==by 0x421B54: cmd_rescue (cmds-rescue.c:162) ==4604==by 0x404199: main (btrfs.c:247) ==4604== ==4604== ==4604== HEAP SUMMARY: ==4604== in use at exit: 0 bytes in 0 blocks ==4604== total heap usage: 72 allocs, 73 frees, 140,384 bytes allocated ==4604== ==4604== All heap blocks were freed -- no leaks are possible ==4604== ==4604== For counts of detected and suppressed errors, rerun with: -v ==4604== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 6 from 6) i.e. I think we are double freeing memory: close_ctree(root); // -- here no_recover: recover_err_str(ret); free_recover_superblock(recover); // -- and here I can't really work out what all this is all doing, but maybe the fix is obvious to Wang Shilong (who wrote the original code)? Though i no longer spend much time on btrfs, i will take a look at this. ^_^ Thanks Eric Thanks, -Eric Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: super-recover: fix double free fs_devices memory
super-recover collects btrfs devices infomation using existed functions scan_one_devices(). Problem is fs_devices is freed twice in close_ctree() and free_recover_superblock() for super correction path. Fix this problem by checking whether fs_devices memory have been freed before we free it. Cc: Eric Sandeen sand...@redhat.com Cc: Chris Murphy li...@colorremedies.com Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- super-recover.c | 13 +++-- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/super-recover.c b/super-recover.c index 767de4b..419b86a 100644 --- a/super-recover.c +++ b/super-recover.c @@ -69,21 +69,11 @@ void init_recover_superblock(struct btrfs_recover_superblock *recover) static void free_recover_superblock(struct btrfs_recover_superblock *recover) { - struct btrfs_device *device; struct super_block_record *record; if (!recover-fs_devices) return; - while (!list_empty(recover-fs_devices-devices)) { - device = list_entry(recover-fs_devices-devices.next, - struct btrfs_device, dev_list); - list_del_init(device-dev_list); - free(device-name); - free(device); - } - free(recover-fs_devices); - while (!list_empty(recover-good_supers)) { record = list_entry(recover-good_supers.next, struct super_block_record, list); @@ -341,6 +331,9 @@ int btrfs_recover_superblocks(const char *dname, no_recover: recover_err_str(ret); free_recover_superblock(recover); + /* check if we have freed fs_deivces in close_ctree() */ + if (!root) + btrfs_close_devices(recover.fs_devices); return ret; } -- 1.9.3 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: remove empty block groups automatically
@@ static int btrfs_relocate_chunk(struct btrfs_root *root, if (map-stripes[i].dev) { ret = btrfs_update_device(trans, map-stripes[i].dev); - BUG_ON(ret); + if (ret) { + btrfs_abort_transaction(trans, root, ret); + goto out; + } } } ret = btrfs_free_chunk(trans, root, chunk_tree, chunk_objectid, chunk_offset); - - BUG_ON(ret); + if (ret) { + btrfs_abort_transaction(trans, root, ret); + goto out; + } trace_btrfs_chunk_free(root, map, chunk_offset, em-len); if (map-type BTRFS_BLOCK_GROUP_SYSTEM) { ret = btrfs_del_sys_chunk(root, chunk_objectid, chunk_offset); - BUG_ON(ret); + if (ret) { + btrfs_abort_transaction(trans, root, ret); + goto out; + } } ret = btrfs_remove_block_group(trans, extent_root, chunk_offset); - BUG_ON(ret); + if (ret) { + btrfs_abort_transaction(trans, extent_root, ret); + goto out; + } write_lock(em_tree-lock); remove_extent_mapping(em_tree, em); @@ -2658,11 +2660,46 @@ static int btrfs_relocate_chunk(struct btrfs_root *root, /* once for the tree */ free_extent_map(em); +out: /* once for us */ free_extent_map(em); + return ret; +} + +static int btrfs_relocate_chunk(struct btrfs_root *root, + u64 chunk_tree, u64 chunk_objectid, + u64 chunk_offset) +{ + struct btrfs_root *extent_root; + struct btrfs_trans_handle *trans; + int ret; + + root = root-fs_info-chunk_root; + extent_root = root-fs_info-extent_root; + + ret = btrfs_can_relocate(extent_root, chunk_offset); + if (ret) + return -ENOSPC; + + /* step one, relocate all the extents inside this chunk */ + ret = btrfs_relocate_block_group(extent_root, chunk_offset); + if (ret) + return ret; + trans = btrfs_start_transaction(root, 0); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + btrfs_std_error(root-fs_info, ret); + return ret; + } + + /* + * step two, delete the device extents and the + * chunk tree entries + */ + ret = btrfs_remove_chunk(trans, root, chunk_offset); btrfs_end_transaction(trans, root); - return 0; + return ret; } static int btrfs_relocate_sys_chunks(struct btrfs_root *root) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 2b37da3..f648d19 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -462,6 +462,8 @@ unsigned long btrfs_full_stripe_len(struct btrfs_root *root, int btrfs_finish_chunk_alloc(struct btrfs_trans_handle *trans, struct btrfs_root *extent_root, u64 chunk_offset, u64 chunk_size); +int btrfs_remove_chunk(struct btrfs_trans_handle *trans, +struct btrfs_root *root, u64 chunk_offset); static inline int btrfs_dev_stats_dirty(struct btrfs_device *dev) { -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Best Regards, Wang Shilong -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs-progs: fix some build warnings on 32bit platform
Hi David, On Sun, Jul 27, 2014 at 12:49:55AM +0800, Wang Shilong wrote: cmds-restore.c:120:4: warning: format %lu expects argument of type long unsigned int, but argument 3 has type size_t [-Wformat=] fprintf(stderr, bad compress length %lu\n, in_len); --- a/cmds-restore.c +++ b/cmds-restore.c @@ -117,7 +117,7 @@ static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len, in_len = read_compress_length(inbuf); if ((tot_in + LZO_LEN + in_len) tot_len) { -fprintf(stderr, bad compress length %lu\n, in_len); +fprintf(stderr, bad compress length %u\n, in_len); in_len is size_t, this prints a warning on 64bit. Let's make it %lu and cas to unsigned long (fixed locally, no need to resend), we'er not using the 'z' size_t modifier for printf anywhere. I am sorry that i forgot to test the patch on 64bit. As i have left current company, i could not setup a 64 bit machine until next week. So i will appreciate it if you could fix this when merging or wait until next week once i got a 64bit machine. Thanks, Wang-- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch
Hi Takashi, This seems like a promising fix, i just have a little question: ulist_add() logic is we firstly cast a pointer to u64(see ptr_to_u64()), and then cast u64 to pointer back(u64_to_ptr()). So normally, arg u64 is actually a pointer. If the below overflow happens, that means casting can change original value? Am i missing something here? Thanks, Wang On 07/28/2014 04:57 PM, Takashi Iwai wrote: We've got bug reports that btrfs crashes when quota is enabled on 32bit kernel, typically with the Oops like below: BUG: unable to handle kernel NULL pointer dereference at 0004 IP: [f9234590] find_parent_nodes+0x360/0x1380 [btrfs] *pde = Oops: [#1] SMP CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S W 3.15.2-1.gd43d97e-default #1 Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs] task: f1478130 ti: f147c000 task.ti: f147c000 EIP: 0060:[f9234590] EFLAGS: 00010213 CPU: 0 EIP is at find_parent_nodes+0x360/0x1380 [btrfs] EAX: f147dda8 EBX: f147ddb0 ECX: 0011 EDX: ESI: EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 0004 CR3: 00bf3000 CR4: 0690 Stack: f147dda4 0050 0001 0001 0050 0001 d3059000 0001 0022 00a8 00a1 0001 1180 Call Trace: [f923564d] __btrfs_find_all_roots+0x9d/0xf0 [btrfs] [f9237bb1] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs] [f9206148] normal_work_helper+0xc8/0x270 [btrfs] [c025e38b] process_one_work+0x11b/0x390 [c025eea1] worker_thread+0x101/0x340 [c026432b] kthread+0x9b/0xb0 [c0712a71] ret_from_kernel_thread+0x21/0x30 [c0264290] kthread_create_on_node+0x110/0x110 This indicates a NULL corruption in prefs_delayed list. The further investigation and bisection pointed that the call of ulist_add_merge() results in the corruption. ulist_add_merge() takes u64 as aux and writes a 64bit value into old_aux. The callers of this function in backref.c, however, pass a pointer of a pointer to old_aux. That is, the function overwrites 64bit value on 32bit pointer. This caused a NULL in the adjacent variable, in this case, prefs_delayed. Here is a quick attempt to band-aid over this: a new function, ulist_add_merge_ptr() is introduced to pass/store properly a pointer value instead of u64. There are still ugly void ** cast remaining in the callers because void ** cannot be taken implicitly. But, it's safer than explicit cast to u64, anyway. Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046 Cc: sta...@vger.kernel.org [v3.11+] Signed-off-by: Takashi Iwai ti...@suse.de --- Alternatively, we can change the argument of aux and old_aux to a pointer from u64, as backref.c is the only user of ulist_add_merge() function. I'll cook up another patch if it's the preferred way. fs/btrfs/backref.c | 11 +-- fs/btrfs/ulist.h | 15 +++ 2 files changed, 20 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index e25564bfcb46..d7a24620a963 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -276,9 +276,8 @@ static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, } if (ret 0) goto next; - ret = ulist_add_merge(parents, eb-start, - (uintptr_t)eie, - (u64 *)old, GFP_NOFS); + ret = ulist_add_merge_ptr(parents, eb-start, + eie, (void **)old, GFP_NOFS); if (ret 0) break; if (!ret extent_item_pos) { @@ -1008,9 +1007,9 @@ again: goto out; ref-inode_list = eie; } - ret = ulist_add_merge(refs, ref-parent, - (uintptr_t)ref-inode_list, - (u64 *)eie, GFP_NOFS); + ret = ulist_add_merge_ptr(refs, ref-parent, + ref-inode_list, + (void **)eie, GFP_NOFS); if (ret 0) goto out; if (!ret extent_item_pos) { diff --git a/fs/btrfs/ulist.h b/fs/btrfs/ulist.h index 7f78cbf5cf41..695fc2bac680 100644 --- a/fs/btrfs/ulist.h +++ b/fs/btrfs/ulist.h @@ -57,6 +57,21 @@ void ulist_free(struct ulist *ulist); int ulist_add(struct ulist *ulist, u64 val, u64 aux, gfp_t gfp_mask); int ulist_add_merge(struct ulist *ulist, u64 val, u64 aux, u64 *old_aux, gfp_t
Re: Help with Project on brtfs wiki
Hi Hugo, Am Samstag, 26. Juli 2014, 11:05:03 schrieb Hugo Mills: I am not asking to hold my hand You are, though. Something like using grep to find code that (might be) related to the thing you're trying to work with is _fundamental_. You should expect to be reading the code -- several layers deep, in both directions from any given function -- in order to understand what it does and how it fits in, and what problems you might enncounter if you change it. If you won't put in that degree of basic effort, then you won't get very far. Actually, from what I read from Nick Krause so far here and on LKML: Can it be that he is at admittedly quite inventive trolling? I found none of his posts to be even remotely convincing although he submitted a patch elsewhere. Thanks for your detailed explaination tough. Hugo is one of most reliable and serious man i have known. Anyway, Nick if you really want to do something, just follow Hugo, Ted and others' advices. Contributing Btrfs has many ways, try it and report some BUGs, add some regression tests etc are all welcome IMO. Thanks, Wang To Nick: If I am mistaken – mea culpa. But your posts just seem to follow what I perceive the usual style of something into trolling. Ciao, -- Martin 'Helios' Steigerwald - http://www.Lichtvoll.de GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: fix some build warnings on 32bit platform
Fix following build warnings on 32bit platform: ... utils.c:1708:3: warning: left shift count = width of type [enabled by default] if (x i (1UL 63)) ^ qgroup-verify.c:393:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] return (struct tree_block *)unode-aux; ^ qgroup-verify.c:407:38: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] if (ulist_add(tree_blocks, bytenr, (unsigned long long)block, 0) = 0) ^ cmds-restore.c:120:4: warning: format %lu expects argument of type long unsigned int, but argument 3 has type size_t [-Wformat=] fprintf(stderr, bad compress length %lu\n, in_len); ... BTW, this patch also switches other castings with new helpers. Signed-off-by: Wang Shilong wangshilong1...@gmail.com --- cmds-inspect.c | 4 ++-- cmds-restore.c | 2 +- extent-tree.c | 3 +-- kerncompat.h| 4 qgroup-verify.c | 4 ++-- utils.c | 2 +- 6 files changed, 11 insertions(+), 8 deletions(-) diff --git a/cmds-inspect.c b/cmds-inspect.c index cd9d2c6..c6c17a5 100644 --- a/cmds-inspect.c +++ b/cmds-inspect.c @@ -49,7 +49,7 @@ static int __ino_to_path_fd(u64 inum, int fd, int verbose, const char *prepend) memset(fspath, 0, sizeof(*fspath)); ipa.inum = inum; ipa.size = 4096; - ipa.fspath = (uintptr_t)fspath; + ipa.fspath = ptr_to_u64(fspath); ret = ioctl(fd, BTRFS_IOC_INO_PATHS, ipa); if (ret) { @@ -185,7 +185,7 @@ static int cmd_logical_resolve(int argc, char **argv) memset(inodes, 0, sizeof(*inodes)); loi.logical = arg_strtou64(argv[optind]); loi.size = size; - loi.inodes = (uintptr_t)inodes; + loi.inodes = ptr_to_u64(inodes); fd = open_file_or_dir(argv[optind+1], dirstream); if (fd 0) { diff --git a/cmds-restore.c b/cmds-restore.c index 3465f84..abd9773 100644 --- a/cmds-restore.c +++ b/cmds-restore.c @@ -117,7 +117,7 @@ static int decompress_lzo(unsigned char *inbuf, char *outbuf, u64 compress_len, in_len = read_compress_length(inbuf); if ((tot_in + LZO_LEN + in_len) tot_len) { - fprintf(stderr, bad compress length %lu\n, in_len); + fprintf(stderr, bad compress length %u\n, in_len); return -1; } diff --git a/extent-tree.c b/extent-tree.c index 7979457..c46c92b 100644 --- a/extent-tree.c +++ b/extent-tree.c @@ -3090,8 +3090,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info) break; ret = get_state_private(info-block_group_cache, start, ptr); if (!ret) { - cache = (struct btrfs_block_group_cache *) - (uintptr_t)ptr; + cache = u64_to_ptr(ptr); if (cache-free_space_ctl) { btrfs_remove_free_space_cache(cache); kfree(cache-free_space_ctl); diff --git a/kerncompat.h b/kerncompat.h index 652275e..bb03194 100644 --- a/kerncompat.h +++ b/kerncompat.h @@ -28,6 +28,10 @@ #include assert.h #include stddef.h #include linux/types.h +#include stdint.h + +#define ptr_to_u64(x) ((u64)(uintptr_t)x) +#define u64_to_ptr(x) ((void *)(uintptr_t)x) #ifndef READ #define READ 0 diff --git a/qgroup-verify.c b/qgroup-verify.c index 2e1716d..a5bd4f8 100644 --- a/qgroup-verify.c +++ b/qgroup-verify.c @@ -390,7 +390,7 @@ static u64 resolve_one_root(u64 bytenr) static inline struct tree_block *unode_tree_block(struct ulist_node *unode) { - return (struct tree_block *)unode-aux; + return u64_to_ptr(unode-aux); } static inline u64 unode_bytenr(struct ulist_node *unode) { @@ -404,7 +404,7 @@ static int alloc_tree_block(u64 bytenr, u64 num_bytes, int level) if (block) { block-num_bytes = num_bytes; block-level = level; - if (ulist_add(tree_blocks, bytenr, (unsigned long long)block, 0) = 0) + if (ulist_add(tree_blocks, bytenr, ptr_to_u64(block), 0) = 0) return 0; free(block); } diff --git a/utils.c b/utils.c index 11250d9..241e5ff 100644 --- a/utils.c +++ b/utils.c @@ -1705,7 +1705,7 @@ static int fls64(u64 x) int i; for (i = 0; i 64; i++) - if (x i (1UL 63)) + if (x i (1ULL 63)) return 64 - i; return 64 - i; } -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs-progs: remove author and copyright info from man page
From: http://man7.org/linux/man-pages/man7/man-pages.7.html ... AUTHORS lists authors of the documentation or program.Use of an AUTHORS section is strongly discouraged. Generally, it is better not to clutter every page with a list of (over time potentially numerous) authors; if you write or significantly amend a page, add a copyright notice as a comment in the source file. If you are the author of a device driver and want to include an address for reporting bugs, place this under the BUGS section. ... Suggested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- Documentation/btrfs-convert.txt | 12 Documentation/btrfs-debug-tree.txt | 12 Documentation/btrfs-find-root.txt | 12 Documentation/btrfs-image.txt | 12 Documentation/btrfs-map-logical.txt | 12 Documentation/btrfs-show-super.txt | 12 Documentation/btrfs-zero-log.txt| 12 Documentation/btrfstune.txt | 12 8 files changed, 96 deletions(-) diff --git a/Documentation/btrfs-convert.txt b/Documentation/btrfs-convert.txt index 1eff0bf..11d6044 100644 --- a/Documentation/btrfs-convert.txt +++ b/Documentation/btrfs-convert.txt @@ -31,18 +31,6 @@ EXIT STATUS *btrfs-convert* will return 0 if no error happened. If any problems happened, 1 will be returned. -AUTHOR --- -Written by Shilong Wang and Wenruo Qu. - -COPYRIGHT -- -Copyright (C) 2013 FUJITSU LIMITED. - -License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html. - -This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. - SEE ALSO `mkfs.btrfs`(8) diff --git a/Documentation/btrfs-debug-tree.txt b/Documentation/btrfs-debug-tree.txt index bfc8aa4..23fc115 100644 --- a/Documentation/btrfs-debug-tree.txt +++ b/Documentation/btrfs-debug-tree.txt @@ -33,18 +33,6 @@ EXIT STATUS *btrfs-debug-tree* will return 0 if no error happened. If any problems happened, 1 will be returned. -AUTHOR --- -Written by Shilong Wang and Wenruo Qu. - -COPYRIGHT -- -Copyright (C) 2013 FUJITSU LIMITED. - -License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html. - -This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. - SEE ALSO `mkfs.btrfs`(8) diff --git a/Documentation/btrfs-find-root.txt b/Documentation/btrfs-find-root.txt index a360f8f..c934b4c 100644 --- a/Documentation/btrfs-find-root.txt +++ b/Documentation/btrfs-find-root.txt @@ -28,18 +28,6 @@ EXIT STATUS *btrfs-find-root* will return 0 if no error happened. If any problems happened, 1 will be returned. -AUTHOR --- -Written by Shilong Wang and Wenruo Qu. - -COPYRIGHT -- -Copyright (C) 2013 FUJITSU LIMITED. - -License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html. - -This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. - SEE ALSO `mkfs.btrfs`(8) diff --git a/Documentation/btrfs-image.txt b/Documentation/btrfs-image.txt index 155194a..b7751f9 100644 --- a/Documentation/btrfs-image.txt +++ b/Documentation/btrfs-image.txt @@ -56,18 +56,6 @@ EXIT STATUS *btrfs-image* will return 0 if no error happened. If any problems happened, 1 will be returned. -AUTHOR --- -Written by Shilong Wang and Wenruo Qu. - -COPYRIGHT -- -Copyright (C) 2013 FUJITSU LIMITED. - -License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html. - -This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. - SEE ALSO `mkfs.btrfs`(8) diff --git a/Documentation/btrfs-map-logical.txt b/Documentation/btrfs-map-logical.txt index a8710bf..a3d110c 100644 --- a/Documentation/btrfs-map-logical.txt +++ b/Documentation/btrfs-map-logical.txt @@ -32,18 +32,6 @@ EXIT STATUS *btrfs-map-logical* will return 0 if no error happened. If any problems happened, 1 will be returned. -AUTHOR --- -Written by Shilong Wang and Wenruo Qu. - -COPYRIGHT -- -Copyright (C) 2013 FUJITSU LIMITED. - -License GPLv2: GNU GPL version 2 http://gnu.org/licenses/gpl.html. - -This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. - SEE ALSO `mkfs.btrfs`(8) diff --git a/Documentation/btrfs-show-super.txt b/Documentation/btrfs-show-super.txt index 6fee0f1..1646be3 100644 --- a/Documentation/btrfs-show-super.txt +++ b/Documentation/btrfs-show-super.txt @@ -45,18 +45,6 @@ EXIT STATUS *btrfs-show-super* will return 0 if no error happened. If any problems happened, 1 will be returned. -AUTHOR --- -Written by Shilong Wang and Wenruo Qu. - -COPYRIGHT -- -Copyright (C) 2013 FUJITSU LIMITED. - -License GPLv2: GNU GPL version 2 http
Re: btrfs_qgroup_create unused parameter
Hi Kevin, On 07/25/2014 07:23 AM, Kevin Brandstatter wrote: I submitted a patch for this a week or two ago (https://patchwork.kernel.org/patch/4486121/), but latest for-linus doesn't have it merged, is it just being put of as minor, or is there a problem with it? I believe your patch will be picked up by Chris and sent to Linus when next merge window is open. Since Chris is sometimes busy, patch merging is always delayed for some time. Thanks, Wang -Kevin On 07/04/2014 09:09 PM, Wang Shilong wrote: Hi I think you are right, @name here is unneeded.. You can give a patch for that.^_^ Wang |The code is pasted below for convenience of reference, but in the function to create a qgruop, it taks a 4th parameter (char * name). I assume this is the name of the path to limit, however, i don't see where its used anywhere in the function. -Kevin Brandstatter int btrfs_create_qgroup(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 qgroupid, *char** *name**)* { struct btrfs_root *quota_root; struct btrfs_qgroup *qgroup; int ret = 0; mutex_lock(fs_info-qgroup_ioctl_lock); quota_root = fs_info-quota_root; if (!quota_root) { ret = -EINVAL; goto out; } qgroup = find_qgroup_rb(fs_info, qgroupid); if (qgroup) { ret = -EEXIST; goto out; } ret = add_qgroup_item(trans, quota_root, qgroupid); if (ret) goto out; spin_lock(fs_info-qgroup_lock); qgroup = add_qgroup_rb(fs_info, qgroupid); spin_unlock(fs_info-qgroup_lock); if (IS_ERR(qgroup)) ret = PTR_ERR(qgroup); out: mutex_unlock(fs_info-qgroup_ioctl_lock); return ret; }| -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix compressed write corruption on enospc
On 07/24/2014 10:48 PM, Liu Bo wrote: When failing to allocate space for the whole compressed extent, we'll fallback to uncompressed IO, but we've forgotten to redirty the pages which belong to this compressed extent, and these 'clean' pages will simply skip 'submit' part and go to endio directly, at last we got data corruption as we write nothing. Signed-off-by: Liu Bo bo.li@oracle.com --- fs/btrfs/inode.c | 12 1 file changed, 12 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3668048..8ea7610 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -709,6 +709,18 @@ retry: unlock_extent(io_tree, async_extent-start, async_extent-start + async_extent-ram_size - 1); + + /* +* we need to redirty the pages if we decide to +* fallback to uncompressed IO, otherwise we +* will not submit these pages down to lower +* layers. +*/ + extent_range_redirty_for_io(inode, + async_extent-start, + async_extent-start + + async_extent-ram_size - 1); + goto retry; BTW, if such ENOSPC happens, it means we could not reserve compressed space. So we retry with nocompression codes, it will try to reserve more space. Any reason we do such things? Thanks, Wang } goto out_free; -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix compressed write corruption on enospc
On 07/25/2014 10:08 AM, Liu Bo wrote: On Fri, Jul 25, 2014 at 09:53:43AM +0800, Wang Shilong wrote: On 07/24/2014 10:48 PM, Liu Bo wrote: When failing to allocate space for the whole compressed extent, we'll fallback to uncompressed IO, but we've forgotten to redirty the pages which belong to this compressed extent, and these 'clean' pages will simply skip 'submit' part and go to endio directly, at last we got data corruption as we write nothing. Signed-off-by: Liu Bo bo.li@oracle.com --- fs/btrfs/inode.c | 12 1 file changed, 12 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3668048..8ea7610 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -709,6 +709,18 @@ retry: unlock_extent(io_tree, async_extent-start, async_extent-start + async_extent-ram_size - 1); + + /* +* we need to redirty the pages if we decide to +* fallback to uncompressed IO, otherwise we +* will not submit these pages down to lower +* layers. +*/ + extent_range_redirty_for_io(inode, + async_extent-start, + async_extent-start + + async_extent-ram_size - 1); + goto retry; BTW, if such ENOSPC happens, it means we could not reserve compressed space. So we retry with nocompression codes, it will try to reserve more space. Any reason we do such things? Compressed extents needs continuous space while uncompressed extents can have more choices. Yeah, that is reasonable. Thanks for your answer.^_^ thanks, -liubo Thanks, Wang } goto out_free; . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Quota Ignored On write problem still exist with 3.16-rc5
Hi Satoru-san, On 07/23/2014 08:53 AM, Satoru Takeuchi wrote: Hi Wang, (2014/07/18 19:29), Wang Shilong wrote: On 07/18/2014 04:45 PM, Satoru Takeuchi wrote: Hi Josef, Chris, I found Quota Ignored On write problem still exist with 3.16-rc5, which Kevin reported before. Kevin's report: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35292.html The result of bisect: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35304.html I guess this is because Josef's patch delayed qgroup accounting, it will cause @refer and @excl updating very late... The patch maybe optimize to merge some delayed refs(for example), but it updates qgroup accounting when commiting transaction which will be very late, we may have accumulated many data.. Thank you for your comment. I know of the code logic which caused this problem. However, what I want to say here is that this problem should be fixed as soon as possible. It is a important regression problem and we've already know the root cause. If it's impossible to fix it by releasing 3.16, I consider this patch should be reverted. Since Btrfs Quota function is under heavy development, and should be considered as *broken*. I think we'd better close Btrfs quota function(Like snapshot-aware function) until we really sit down and solve everything. Thanks, Wang Thanks, Satoru Thanks, Wang I bisected and found the bad commit is the following patch. === commit fcebe4562dec83b3f8d3088d77584727b09130b2 Author: Josef Bacik jba...@fb.com Date: Tue May 13 17:30:47 2014 -0700 Btrfs: rework qgroup accounting === Josef, please take a look at this patch. Reproducer: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35299.html Could you tell me the progress of fixing this bug? In addition, could you fix it by 3.16? command log: === # ./test.sh + uname -a Linux luna.soft.fujitsu.com 3.16.0-rc5 #2 SMP Tue Jul 15 13:39:46 JST 2014 x86_64 x86_64 x86_64 GNU/Linux + df -T /test7 Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/sdc7 btrfs 29296640 1536 27169536 1% /test7 + btrfs quota ena /test7 + cd /test7 + btrfs sub cre test Create subvolume './test' + btrfs sub l -a /test7 ID 270 gen 66 top level 5 path test + btrfs qg lim 1G test # limit test subvol to 1GB + btrfs qg show -pcre /test7 qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16384 16384 0 0--- --- 0/27016384 16384 1073741824 0--- --- + dd if=/dev/zero of=test/file0 bs=1M count=2000 2000+0 records in 2000+0 records out 2097152000 bytes (2.1 GB) copied, 9.67876 s, 217 MB/s # write 2GB. It's a bug! + sync + ls -lisaR /test7 /test7: total 20 256 16 drwxr-xr-x 1 root root8 Jul 18 15:12 . 2 4 drwxr-xr-x. 43 root root 4096 Jul 16 08:34 .. 256 0 drwxr-xr-x 1 root root 10 Jul 18 15:17 test /test7/test: total 2048016 256 0 drwxr-xr-x 1 root root 10 Jul 18 15:17 . 256 16 drwxr-xr-x 1 root root 8 Jul 18 15:12 .. 257 2048000 -rw-r--r-- 1 root root 2097152000 Jul 18 15:17 file0 + btrfs qg show -pcre /test7 qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16384 16384 0 0--- --- 0/2702097168384 2097168384 1073741824 0--- --- + btrfs quota dis /test7 + btrfs sub del test Transaction commit: none (default) Delete subvolume '/test7/test' + set +x # === NOTE: The reproducer here (./test.sh) is a bit different from above-mentioned one because of some reason. Thanks, Satoru -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 1 week to rebuid 4x 3TB raid10 is a long time!
On 07/21/2014 10:00 PM, TM wrote: Wang Shilong wangsl.fnst at cn.fujitsu.com writes: Just my two cents: Since 'btrfs replace' support RADI10, I suppose using replace operation is better than 'device removal and add'. Another Question is related to btrfs snapshot-aware balance. How many snapshots did you have in your system? Of course, During balance/resize/device removal operations, you could still snapshot, but fewer snapshots should speed things up! Anyway 'btrfs replace' is implemented more effective than 'device remova and add'. Hi Wang, just one subvolume, no snaphots or anything else. device replace: to tell you the truth I have not used it in the past. Most of my testing was done 2 years ago. So in this 'kind of production' system I did not try it. But if I knew that it was faster, perhaps I could have used it. Anyone has statistics for such a replace and the time it takes? I don't have specific statistics about this. The conclusion come from implementation differences between replace and 'device removal'. Also, can replace be used when one device is missing? Cant find documentation. eg. btrfs replace start missing /dev/sdXX TM -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 1 week to rebuid 4x 3TB raid10 is a long time!
On 07/21/2014 10:00 PM, TM wrote: Wang Shilong wangsl.fnst at cn.fujitsu.com writes: Just my two cents: Since 'btrfs replace' support RADI10, I suppose using replace operation is better than 'device removal and add'. Another Question is related to btrfs snapshot-aware balance. How many snapshots did you have in your system? Of course, During balance/resize/device removal operations, you could still snapshot, but fewer snapshots should speed things up! Anyway 'btrfs replace' is implemented more effective than 'device remova and add'. Hi Wang, just one subvolume, no snaphots or anything else. device replace: to tell you the truth I have not used it in the past. Most of my testing was done 2 years ago. So in this 'kind of production' system I did not try it. But if I knew that it was faster, perhaps I could have used it. Anyone has statistics for such a replace and the time it takes? I don't have specific statistics about this. The conclusion come from implementation differences between replace and 'device removal'. Also, can replace be used when one device is missing? Cant find documentation. eg. btrfs replace start missing /dev/sdXX The latest btrfs-progs include man page of btrfs-replace. Actually, you could use it something like: btrfs replace start srcdev|devid targetdev mnt You could use 'btrfs file show' to see missing device id. and then run btrfs replace. Thanks, Wang TM -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 1 week to rebuid 4x 3TB raid10 is a long time!
Hi, On 07/20/2014 04:45 PM, TM wrote: Hi, I have a raid10 with 4x 3TB disks on a microserver http://n40l.wikia.com/wiki/Base_Hardware_N54L , 8Gb RAM Recently one disk started to fail (smart errors), so I replaced it Mounted as degraded, added new disk, removed old Started yesterday I am monitoring /var/log/messages and it seems it will take a long time Started at about 8010631739392 And 20 hours later I am at 6910631739392 btrfs: relocating block group 6910631739392 flags 65 At this rate it will take a week to complete the raid rebuild!!! Just my two cents: Since 'btrfs replace' support RADI10, I suppose using replace operation is better than 'device removal and add'. Another Question is related to btrfs snapshot-aware balance. How many snapshots did you have in your system? Of course, During balance/resize/device removal operations, you could still snapshot, but fewer snapshots should speed things up! Anyway 'btrfs replace' is implemented more effective than 'device remova and add'.:-) Thanks, Wang Furthermore it seems that the operation is getting slower and slower When the rebuild started I had a new message every half a minute, now it’s getting to OneAndHalf minutes Most files are small files like flac/jpeg One week for a raid10 rebuild 4x3TB drives is a very long time. Any thoughts? Can you share any statistics from your RAID10 rebuilds? If I shut down the system, before the rebuild, what is the proper procedure to remount it? Again degraded? Or normally? Can the process of rebuilding the raid continue after a reboot? Will it survive, and continue rebuilding? Thanks in advance TM -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Quota Ignored On write problem still exist with 3.16-rc5
On 07/18/2014 04:45 PM, Satoru Takeuchi wrote: Hi Josef, Chris, I found Quota Ignored On write problem still exist with 3.16-rc5, which Kevin reported before. Kevin's report: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35292.html The result of bisect: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35304.html I guess this is because Josef's patch delayed qgroup accounting, it will cause @refer and @excl updating very late... The patch maybe optimize to merge some delayed refs(for example), but it updates qgroup accounting when commiting transaction which will be very late, we may have accumulated many data.. Thanks, Wang I bisected and found the bad commit is the following patch. === commit fcebe4562dec83b3f8d3088d77584727b09130b2 Author: Josef Bacik jba...@fb.com Date: Tue May 13 17:30:47 2014 -0700 Btrfs: rework qgroup accounting === Josef, please take a look at this patch. Reproducer: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35299.html Could you tell me the progress of fixing this bug? In addition, could you fix it by 3.16? command log: === # ./test.sh + uname -a Linux luna.soft.fujitsu.com 3.16.0-rc5 #2 SMP Tue Jul 15 13:39:46 JST 2014 x86_64 x86_64 x86_64 GNU/Linux + df -T /test7 Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/sdc7 btrfs 29296640 1536 27169536 1% /test7 + btrfs quota ena /test7 + cd /test7 + btrfs sub cre test Create subvolume './test' + btrfs sub l -a /test7 ID 270 gen 66 top level 5 path test + btrfs qg lim 1G test # limit test subvol to 1GB + btrfs qg show -pcre /test7 qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16384 16384 0 0--- --- 0/27016384 16384 1073741824 0--- --- + dd if=/dev/zero of=test/file0 bs=1M count=2000 2000+0 records in 2000+0 records out 2097152000 bytes (2.1 GB) copied, 9.67876 s, 217 MB/s # write 2GB. It's a bug! + sync + ls -lisaR /test7 /test7: total 20 256 16 drwxr-xr-x 1 root root8 Jul 18 15:12 . 2 4 drwxr-xr-x. 43 root root 4096 Jul 16 08:34 .. 256 0 drwxr-xr-x 1 root root 10 Jul 18 15:17 test /test7/test: total 2048016 256 0 drwxr-xr-x 1 root root 10 Jul 18 15:17 . 256 16 drwxr-xr-x 1 root root 8 Jul 18 15:12 .. 257 2048000 -rw-r--r-- 1 root root 2097152000 Jul 18 15:17 file0 + btrfs qg show -pcre /test7 qgroupid rfer excl max_rfer max_excl parent child -- - 0/5 16384 16384 0 0--- --- 0/2702097168384 2097168384 1073741824 0--- --- + btrfs quota dis /test7 + btrfs sub del test Transaction commit: none (default) Delete subvolume '/test7/test' + set +x # === NOTE: The reproducer here (./test.sh) is a bit different from above-mentioned one because of some reason. Thanks, Satoru -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: `btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags 1)' failed.` in `btrfsck`
Hi, There are some patches for fsck flighting, they are integrated in David's branches. You can pull from David's latest branch, and see if it helps: https://github.com/kdave/btrfs-progs integration-20140704 Have a try and see if it helps anyway. Thanks, Wang Hi together, I'm experiencing the following issues when I invoke `btrfsck` on a sparse file image with a GPT and one (the only) btrfs partition attached to a loop device $ sudo btrfsck --repair --init-csum-tree --init-extent-tree -b /dev/loop0p1 Incorrect local backref count on 128510738432 root 5 owner 3849475 offset 0 found 1 wanted 0 back 0xbab41270 backpointer mismatch on [128510738432 4096] ref mismatch on [128510742528 12288] extent item 0, found 1 btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags 1)' failed. $ sudo btrfsck --repair --init-csum-tree --init-extent-tree /dev/loop0p1 Incorrect local backref count on 128510726144 root 5 owner 3849470 offset 0 found 1 wanted 0 back 0xbbcb9500 backpointer mismatch on [128510726144 12288] ref mismatch on [128510738432 4096] extent item 0, found 1 adding new data backref on 128510738432 root 5 owner 3849475 offset 0 found 1 Backref 128510738432 root 5 owner 3849475 offset 0 num_refs 0 not found in extent tree Incorrect local backref count on 128510738432 root 5 owner 3849475 offset 0 found 1 wanted 0 back 0xbbcb9630 backpointer mismatch on [128510738432 4096] ref mismatch on [128510742528 12288] extent item 0, found 1 btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags 1)' failed. $ sudo btrfsck --repair /dev/loop0p1 Incorrect local backref count on 130861096960 root 5 owner 22733727 offset 0 found 1 wanted 0 back 0xc7c7d170 backpointer mismatch on [130861096960 8192] ref mismatch on [130861105152 8192] extent item 0, found 1 btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags 1)' failed. $ sudo btrfsck --repair /dev/loop0p1 Backref 130861096960 root 5 owner 22733727 offset 0 num_refs 0 not found in extent tree Incorrect local backref count on 130861096960 root 5 owner 22733727 offset 0 found 1 wanted 0 back 0xc7f31170 backpointer mismatch on [130861096960 8192] ref mismatch on [130861105152 8192] extent item 0, found 1 btrfsck: extent_io.c:612: free_extent_buffer: Assertion `!(eb-flags 1)' failed. I'm using `btrfs-progs` 24cf4d8c3ee924b474f68514e0167cc2e602a48d on Linux 3.16-rc5 (anything else, i.e. older versions, give me immediate error after start because errornous file system) I'd like to know whether this (assertion) error is related to a bug or missing feature in btrfs-progs and might be fixed at some point or whether this might indicate a completely messed up btrfs. Best regards, Karl-P. Richter -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] Btrfs: fix wrong extent mapping for DirectIO
btrfs_next_leaf() will use current leaf's last key to search and then return a bigger one. So it may still return a file extent item that is smaller than expected value and we will get an overflow here for @em-len. This is easy to reproduce for Btrfs Direct writting, it did not cause any problem, because writting will re-insert right mapping later. However, by hacking code to make DIO support compression, wrong extent mapping is kept and it encounter merging failure(EEXIST) quickly. Fix this problem by looping to find next file extent item that is bigger than @start or we could not find anything more. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- fs/btrfs/inode.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ed8b55c..d1ba5e4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6308,6 +6308,8 @@ next: goto not_found; if (start + len = found_key.offset) goto not_found; + if (start found_key.offset) + goto next; em-start = start; em-orig_start = start; em-len = found_key.offset - start; -- 1.8.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/6] Btrfs: fix off-by-one in cow_file_range_inline()
Btrfs could still inline file data if its size is same as page size, so don't skip max value here. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e678391..e2c3d63 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -249,8 +249,8 @@ static noinline int cow_file_range_inline(struct btrfs_root *root, data_len = compressed_size; if (start 0 || - actual_end = PAGE_CACHE_SIZE || - data_len = BTRFS_MAX_INLINE_DATA_SIZE(root) || + actual_end PAGE_CACHE_SIZE || + data_len BTRFS_MAX_INLINE_DATA_SIZE(root) || (!compressed_size (actual_end (root-sectorsize - 1)) == 0) || end + 1 isize || -- 1.8.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] Btrfs: fix wrong max inline data size limit
inline data is stored from offset of @disk_bytenr in struct btrfs_file_extent_item. So substracting total size of struct btrfs_file_extent_item is wrong, fix it. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- fs/btrfs/ctree.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index be91397..22fbd04 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -394,7 +394,7 @@ struct btrfs_header { #define BTRFS_LEAF_DATA_SIZE(r) (__BTRFS_LEAF_DATA_SIZE(r-leafsize)) #define BTRFS_MAX_INLINE_DATA_SIZE(r) (BTRFS_LEAF_DATA_SIZE(r) - \ sizeof(struct btrfs_item) - \ - sizeof(struct btrfs_file_extent_item)) + offsetof(struct btrfs_file_extent_item, disk_bytenr)) #define BTRFS_MAX_XATTR_SIZE(r)(BTRFS_LEAF_DATA_SIZE(r) - \ sizeof(struct btrfs_item) -\ sizeof(struct btrfs_dir_item)) -- 1.8.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] Btrfs: fix wrong skipping compression for an inode
If a file's compression ratios is bad, we will set NOCOMPRESS flag for it, and it will skip compression for that inode next time. However, if we remount fs to COMPRESS_FORCE, it still should try if we could compress pages for that inode, this patch fix wrong check for this problem. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- fs/btrfs/inode.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index a58f31c..20798f8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -348,6 +348,23 @@ static noinline int add_async_extent(struct async_cow *cow, return 0; } +static inline int inode_need_compress(struct inode *inode) +{ + struct btrfs_root *root = BTRFS_I(inode)-root; + + /* force compress */ + if (btrfs_test_opt(root, FORCE_COMPRESS)) + return 1; + /* bad compression ratios */ + if (BTRFS_I(inode)-flags BTRFS_INODE_NOCOMPRESS) + return 0; + if (btrfs_test_opt(root, COMPRESS) || + BTRFS_I(inode)-flags BTRFS_INODE_COMPRESS || + BTRFS_I(inode)-force_compress) + return 1; + return 0; +} + /* * we create compressed extents in two phases. The first * phase compresses a range of pages that have already been @@ -444,10 +461,7 @@ again: * inode has not been flagged as nocompress. This flag can * change at any time if we discover bad compression ratios. */ - if (!(BTRFS_I(inode)-flags BTRFS_INODE_NOCOMPRESS) - (btrfs_test_opt(root, COMPRESS) || -(BTRFS_I(inode)-force_compress) || -(BTRFS_I(inode)-flags BTRFS_INODE_COMPRESS))) { + if (inode_need_compress(inode)) { WARN_ON(pages); pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS); if (!pages) { @@ -1076,7 +1090,8 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page, async_cow-locked_page = locked_page; async_cow-start = start; - if (BTRFS_I(inode)-flags BTRFS_INODE_NOCOMPRESS) + if (BTRFS_I(inode)-flags BTRFS_INODE_NOCOMPRESS + !btrfs_test_opt(root, FORCE_COMPRESS)) cur_end = end; else cur_end = min(end, start + 512 * 1024 - 1); -- 1.8.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] Btrfs: fall into nocompression codes quickly if possible
If flag NOCOMPRESS is set which means bad compression ratio, we could avoid call cow_file_range_async() for this case earlier. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- fs/btrfs/inode.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 20798f8..e678391 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1468,7 +1468,6 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page, unsigned long *nr_written) { int ret; - struct btrfs_root *root = BTRFS_I(inode)-root; int force_cow = need_force_cow(inode, start, end); if (BTRFS_I(inode)-flags BTRFS_INODE_NODATACOW !force_cow) { @@ -1477,9 +1476,7 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page, } else if (BTRFS_I(inode)-flags BTRFS_INODE_PREALLOC !force_cow) { ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); - } else if (!btrfs_test_opt(root, COMPRESS) - !(BTRFS_I(inode)-force_compress) - !(BTRFS_I(inode)-flags BTRFS_INODE_COMPRESS)) { + } else if (!inode_need_compress(inode)) { ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); } else { -- 1.8.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] Btrfs: fix wrong write range for filemap_fdatawrite_range()
filemap_fdatawrite_range() expect the third arg to be @end not @len, fix it. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- fs/btrfs/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e2c3d63..ed8b55c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7567,7 +7567,8 @@ static ssize_t btrfs_direct_IO(int rw, struct kiocb *iocb, count = iov_iter_count(iter); if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, BTRFS_I(inode)-runtime_flags)) - filemap_fdatawrite_range(inode-i_mapping, offset, count); + filemap_fdatawrite_range(inode-i_mapping, offset, +offset + count - 1); if (rw WRITE) { /* -- 1.8.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qgroup destroy / assign
Hi Kevin, On 07/05/2014 05:10 AM, Kevin Brandstatter wrote: how are qgroups accounted for? Are they specifially tied to one subvolume on creation? Qgroup implementation is aslo a little confusing for me at first:-) . Yes, a qgroup is created automatically tied to one subvolume on creation with the same objectid. To implement qgroup group, you may want to do something like following: [1/1] / \ / \ sub1(5) subv2(257) If so, is it possible to auto delete relavant qgroups on deletion of the subvolume? I supposed so, according to latest qgroup patches flighting on, a subvolume qgroup should be destroyed safely, when it finished sub-tree space accounting. also, how exactly does qgroup assign work? I havent been able to get it to work at all. in btrfsprogs cmds-cgroup.c if ((args.src 48) = (args.dst 48)) { fprintf(stderr, ERROR: bad relation requested '%s'\n, path); return 1; } Oh, this is to implement a strict level qgroup group, which means a u64 is divided into two parts, 16bits for level and the rest for id. So we ask parent qgroup's level must be greater than child's qgroup.that is the code you see above. You could create a qgroup relation like this: # btrfs qgroup assign 256 1/1 mnt Hopely, this could help you. always seems to fail. I tried creating another qgroup id 1000, and assigning it to as sub, and vice versa, as well as assigning the sub to the root, and vice versa, as well as one subvol to another. The fixme comment leads me to believe that the src should be a path not a qgroup (FIXME src should accept subvol path) but the progs let me create a qgroup without a subvol, which makes sense if you want to be able to have some meta-qgroup for a bunch of subvols. Further on noticing that a sub create also creates a qgroup with the same id as the subvol, it would seem that the qgroup is tied to the subvol via this shared id. -Kevin Brandstatter -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On 07/04/2014 02:02 PM, Marc MERLIN wrote: On Fri, Jul 04, 2014 at 01:29:29PM +0800, Wang Shilong wrote: Well, I explained the problem, ext4 and others of course tell me which devid an error is on, hopefully btrfs will able to do so in the near future. So it is ok for you to print one of btrfs filesystem device(for example device name) ? maybe it is not really physical address the metadata locates in, this is easier. Yes, the device name is great, now I can see which of my 3 filesystems has a problem, that's a start. Next would be knowing which filename this occurred in, but I understand this would be harder to get from that point in the code. Ideally scrub should be able to find that problem and report it, at least I would know which filesystem to rescan for errors: Back to the original problem, would you agree that find / -type f -print0 | xargs grep . /dev/nul? I'll also have to try this to see if I get lucky with it :) + printk_ratelimited(BTRFS (device: %s) parent transid verify failed on %llu wanted %llu found %llu\n, + eb-fs_info-sb-s_id, eb-start, + parent_transid, btrfs_header_generation(eb)); That looks great. Ideally all such errors would look like this. OK, i found there is a mirror num recorded in struct @extent_buffer, so it is not diffcult to locate the real physical address that this corrupt metadata block locates. But another question is that if such problems happen, it should also help little, because usually other mirrors maybe have the same errors... Looking at btrfs codes, if it found such error which means checksum has passed but generation verification fails it dosen't even try other mirrors if existed, so i supposed such kinds of errors is fatal and could be corrected by btrfs itself.. Anyway, let me see if there are any other output even dose not output btrfs specific info. Thanks, Thanks for looking into this, I appreciate it. Best, Marc -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On 07/04/2014 02:02 PM, Marc MERLIN wrote: On Fri, Jul 04, 2014 at 01:29:29PM +0800, Wang Shilong wrote: Well, I explained the problem, ext4 and others of course tell me which devid an error is on, hopefully btrfs will able to do so in the near future. So it is ok for you to print one of btrfs filesystem device(for example device name) ? maybe it is not really physical address the metadata locates in, this is easier. Yes, the device name is great, now I can see which of my 3 filesystems has a problem, that's a start. Next would be knowing which filename this occurred in, but I understand this would be harder to get from that point in the code. Ideally scrub should be able to find that problem and report it, at least I would know which filesystem to rescan for errors: So there is a problem, ususally such generation verifications errors is related to Btrfs metdata block.it is maybe just a Btrfs node , not related to any actually files, or even a leaf that contains more that one file If this is a read/write path from normal fs/file root, we may output its' root..but if this is something like extent root...i think it helps little Back to the original problem, would you agree that find / -type f -print0 | xargs grep . /dev/nul? I'll also have to try this to see if I get lucky with it :) + printk_ratelimited(BTRFS (device: %s) parent transid verify failed on %llu wanted %llu found %llu\n, + eb-fs_info-sb-s_id, eb-start, + parent_transid, btrfs_header_generation(eb)); That looks great. Ideally all such errors would look like this. Thanks for looking into this, I appreciate it. Best, Marc -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs_qgroup_create unused parameter
Hi I think you are right, @name here is unneeded.. You can give a patch for that.^_^ Wang |The code is pasted below for convenience of reference, but in the function to create a qgruop, it taks a 4th parameter (char * name). I assume this is the name of the path to limit, however, i don't see where its used anywhere in the function. -Kevin Brandstatter int btrfs_create_qgroup(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info, u64 qgroupid, *char** *name**)* { struct btrfs_root *quota_root; struct btrfs_qgroup *qgroup; int ret = 0; mutex_lock(fs_info-qgroup_ioctl_lock); quota_root = fs_info-quota_root; if (!quota_root) { ret = -EINVAL; goto out; } qgroup = find_qgroup_rb(fs_info, qgroupid); if (qgroup) { ret = -EEXIST; goto out; } ret = add_qgroup_item(trans, quota_root, qgroupid); if (ret) goto out; spin_lock(fs_info-qgroup_lock); qgroup = add_qgroup_rb(fs_info, qgroupid); spin_unlock(fs_info-qgroup_lock); if (IS_ERR(qgroup)) ret = PTR_ERR(qgroup); out: mutex_unlock(fs_info-qgroup_ioctl_lock); return ret; }| -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On 07/03/2014 04:13 PM, Liu Bo wrote: On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote: This got triggered by an rsync I think. I'm not sure which of my btrfs FS has the issue yet since BUG_ON isn't very helpful as discussed earlier. [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 [160562.950297] [ cut here ] [160562.965904] kernel BUG at fs/btrfs/locking.c:269! But shouldn't messages like 'parent transid verify failed' print which device this happened on to give the operator a hint on where the problem is? Could someone do a pass at those and make sure they all print the device ID/name? Bug below: Full log before the crash: INFO: task btrfs-transacti:3358 blocked for more than 120 seconds. Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. btrfs-transacti D 0 3358 2 0x 8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8 8800c6914390 000141c0 7fff 8801433b8f10 0002 8161c9b0 7fff 8800c50ebc60 Call Trace: [8161c9b0] ? sock_rps_reset_flow+0x32/0x32 [8161d3c6] schedule+0x73/0x75 [8161c9e9] schedule_timeout+0x39/0x129 [8107653d] ? get_parent_ip+0xd/0x3c [8162338f] ? preempt_count_add+0x7a/0x8d [8161dbac] __wait_for_common+0x11a/0x159 [8107810f] ? wake_up_state+0x12/0x12 [8161dc0f] wait_for_completion+0x24/0x26 [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28 [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55 [8122b25a] btrfs_commit_transaction+0x20d/0x8b0 [81227b41] transaction_kthread+0xf8/0x1ab [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c [8106b4b4] kthread+0xae/0xb6 [8106b406] ? __kthread_parkme+0x61/0x61 [8162667c] ret_from_fork+0x7c/0xb0 [8106b406] ? __kthread_parkme+0x61/0x61 INFO: task kworker/u8:13:13157 blocked for more than 120 seconds. Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. kworker/u8:13 D 0 13157 2 0x0080 Workqueue: btrfs-flush_delalloc normal_work_helper 8800041cfc00 0046 8800041cfbd0 8800041cffd8 8800034f40d0 000141c0 88021f2941c0 8800034f40d0 8800041cfca0 810fdc2f 0002 8800041cfc10 Call Trace: [810fdc2f] ? wait_on_page_read+0x3c/0x3c [8161d3c6] schedule+0x73/0x75 [8161d56b] io_schedule+0x60/0x7a [810fdc3d] sleep_on_page+0xe/0x12 [8161d7fd] __wait_on_bit+0x48/0x7a [810fdbdd] wait_on_page_bit+0x7a/0x7c [81084821] ? autoremove_wake_function+0x34/0x34 [810fef03] filemap_fdatawait_range+0x7e/0x126 [8122f1cf] ? btrfs_submit_direct+0x3f4/0x3f4 [8122d7aa] ? btrfs_writepages+0x28/0x2a [811084c6] ? do_writepages+0x1e/0x2c [810ff38e] ? __filemap_fdatawrite_range+0x55/0x57 [8124006f] btrfs_wait_ordered_range+0x6a/0x11a [8122fe01] btrfs_run_delalloc_work+0x27/0x69 [812508db] normal_work_helper+0xfe/0x240 [81065d7e] process_one_work+0x195/0x2d2 [81066020] worker_thread+0x136/0x205 [81065eea] ? process_scheduled_works+0x2f/0x2f [8106b4b4] kthread+0xae/0xb6 [8106b406] ? __kthread_parkme+0x61/0x61 [8162667c] ret_from_fork+0x7c/0xb0 [8106b406] ? __kthread_parkme+0x61/0x61 INFO: task btrfs-transacti:3358 blocked for more than 120 seconds. Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1 echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. btrfs-transacti D 0 3358 2 0x 8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8 8800c6914390 000141c0 7fff 8801433b8f10 0002 8161c9b0 7fff 8800c50ebc60 Call Trace: [8161c9b0] ? sock_rps_reset_flow+0x32/0x32 [8161d3c6] schedule+0x73/0x75 [8161c9e9] schedule_timeout+0x39/0x129 [8107653d] ? get_parent_ip+0xd/0x3c [8162338f] ? preempt_count_add+0x7a/0x8d [8161dbac] __wait_for_common+0x11a/0x159 [8107810f] ? wake_up_state+0x12/0x12 [8161dc0f] wait_for_completion+0x24/0x26 [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28 [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55 [8122b25a] btrfs_commit_transaction+0x20d/0x8b0 [81227b41] transaction_kthread+0xf8/0x1ab [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On 07/04/2014 12:11 PM, Marc MERLIN wrote: On Fri, Jul 04, 2014 at 11:07:22AM +0800, Liu Bo wrote: [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 What should I be doing about this? Does it mean that I do have some kind of corruption/damage on my filesystem? If there is another copy for the block(RAID1, DUP, RAID5/6), it'd try to read the copy and repair the crc with the good one, it's all we can do about it. Right. It's not quite my question though. I mean I don't know what device it's on, never mind what file is affected. If I know which file is corrupted, I can simply delete it and restore from backup, no biggie. Right now I don't even know which one of my 3 btrfs filesystems (over 10TB) has this problem. That makes the message kind of problematic: you have a problem, but not I'm not giving you any fighting chance of finding out where :) Also, is it possible to have all these messages state which devid they occurred on? I don't even know which device I should be worrying about right now, and although I'm running scrub now, my understanding is that scrub doesn't actually look at FS structures and is likely to miss this anyway. Yes we can but it'd need a bit more effort, for now, all device msg we've seen in panic info comes from sb-s_id which points to @fs_info-latest_device. Food for though, as is the message is unfortunately close to useless, except to an FS developer with a system that has only one btrfs filesystem. On Fri, Jul 04, 2014 at 11:50:25AM +0800, Wang Shilong wrote: I am afraid, scrub maybe could not fix such kind of errors, all scrub doing is to verify whether checksums match and if possible use good mirrors to rewrite bad one. I wouldn't be bothered if scrub can't fix it, but it would be good if it could tell me. Such errors seem imply contention itself is corrupted, we may have passed checksum check after ending io, but we fail generation check afterwards. So should I really replace scrub with find / -type f -print0 | xargs grep . /dev/null ? Basically we need something that will scan the filesystem and ensure that all files are reachable correctly without causing filesystem problems, and if one is bad, output the name of the bad file(s). Scrub only does a half job of that it seems. To get physical device name, we still need mirror num to know which device we are locating. Ok, so it's missing for now and therefore the code can't easily report it, I understand. Well, I explained the problem, ext4 and others of course tell me which devid an error is on, hopefully btrfs will able to do so in the near future. So it is ok for you to print one of btrfs filesystem device(for example device name) ? maybe it is not really physical address the metadata locates in, this is easier. Back to the original problem, would you agree that find / -type f -print0 | xargs grep . /dev/nul? may do a better job scanning the entire FS for problems than scrub would? Thanks, Marc -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On 07/04/2014 11:07 AM, Liu Bo wrote: On Thu, Jul 03, 2014 at 06:44:21AM -0700, Marc MERLIN wrote: Thanks for the patch. Hopefully this will make it to the next 3.15.x kernel. I also went back to 3.14 anyway since the 'blocked for 120 seconds' look like another instance of deadlocks we've been discussing here. But just curious: [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120 What should I be doing about this? Does it mean that I do have some kind of corruption/damage on my filesystem? If there is another copy for the block(RAID1, DUP, RAID5/6), it'd try to read the copy and repair the crc with the good one, it's all we can do about it. Also, is it possible to have all these messages state which devid they occurred on? I don't even know which device I should be worrying about right now, and although I'm running scrub now, my understanding is that scrub doesn't actually look at FS structures and is likely to miss this anyway. Yes we can but it'd need a bit more effort, for now, all device msg we've seen in panic info comes from sb-s_id which points to @fs_info-latest_device. You means something like this: + printk_ratelimited(BTRFS (device: %s) parent transid verify failed on %llu wanted %llu found %llu\n, + eb-fs_info-sb-s_id, eb-start, + parent_transid, btrfs_header_generation(eb)); thanks, -liubo Thanks, Marc -- A mouse is a device used to point at the xterm you want to type in - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5] btrfs: label should not contain return char
Hi Satoru and all, I think there maybe a leftover issue. That is if we don't set label, in default it will output a blank line. Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount /dev/sdb # cat /sys/fs/btrfs/uuid/label --an extra line will be outputed. This is because in btrfs_label_show(), we did something like this directly: return snprintf(buf, PAGE_SIZE, %s\n, fs_info-super_copy-label); Maybe we can have a check whether label is NULL before we output? otherwise,the extra blank line is outputed, IMO this is not so nice thing! Thanks, Wang On 07/01/2014 01:22 PM, Satoru Takeuchi wrote: Although Anand once sent the following two patches, - [PATCH 1/2 v4] btrfs: label should not contain return char - [PATCH 2/2 v4] btrfs: usage error should not be logged into system log only the latter patch was merged to mason/for-linus and 3.16-rc3 as 402a0f4 (by accident?). It results in that the former patch can't be cleanly applied to 3.16-rc3. I fixed this problem, wrote a reproducer, and tested it. Test Result: 3.16-rc3 w/o this patch: fail 3.16-rc3 w/ this patch: pass Subject: [PATCH v5] btrfs: label should not contain return char From: Anand Jain anand.j...@oracle.com generally if you use echo test /sys/fs/btrfs/fsid/label it would introduce return char at the end and it can not be part of the label. The correct command is echo -n test /sys/fs/btrfs/fsid/label This patch will check for this user error reproducer.sh: === #!/bin/bash TEST_DEV=/dev/vdb TEST_DIR=/home/sat/mnt umount /home/sat/mnt mkfs.btrfs -f $TEST_DEV UUID=$(btrfs fi show $TEST_DEV | head -1 | sed -e 's/.*uuid: \([-0-9a-z]*\)$/\1/') mount $TEST_DEV $TEST_DIR LABELFILE=/sys/fs/btrfs/$UUID/label echo testlabel $LABELFILE LINES=$(cat $LABELFILE | wc -l | awk '{print $1}') RET=1 if [ $LINES -eq 1 ] ; then echo '[PASS] Trailing \n is removed correctly.' 2 RET=0 else echo '[FAIL] Trailing \n still exists.' 2 fi exit $RET === Signed-off-by: Anand Jain anand.j...@oracle.com Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com --- v5: tweak to be able to apply on the top of 402a0f4. add test program. v4: used memcpy and memset. Thanks David again v3: accepts review comments. Thanks David and Eric again v2: accepts review comments. Thanks Eric and Roman --- fs/btrfs/sysfs.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index df39458..dcae61a 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -374,8 +374,15 @@ static ssize_t btrfs_label_store(struct kobject *kobj, struct btrfs_trans_handle *trans; struct btrfs_root *root = fs_info-fs_root; int ret; + size_t p_len; - if (len = BTRFS_LABEL_SIZE) + /* + * p_len is the len until the first occurrence of either + * '\n' or '\0' + */ + p_len = strcspn(buf, \n); + + if (p_len = BTRFS_LABEL_SIZE) return -EINVAL; trans = btrfs_start_transaction(root, 0); @@ -383,7 +390,8 @@ static ssize_t btrfs_label_store(struct kobject *kobj, return PTR_ERR(trans); spin_lock(root-fs_info-super_lock); - strcpy(fs_info-super_copy-label, buf); + memset(fs_info-super_copy-label, 0, BTRFS_LABEL_SIZE); + memcpy(fs_info-super_copy-label, buf, p_len); spin_unlock(root-fs_info-super_lock); ret = btrfs_commit_transaction(trans, root); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2 v3 RESEND] Btrfs: device_list_add() should not update list when mounted
Hi Anand, Sorry for delay reply, more comments below: On 06/13/2014 12:26 PM, Anand Jain wrote: From: Anand Jain anand.j...@oracle.com device_list_add() is called when user runs btrfs dev scan, which would add any btrfs device into the btrfs_fs_devices list. Now think of a mounted btrfs. And a new device which contains the a SB from the mounted btrfs devices. In this situation when user runs btrfs dev scan, the current code would just replace existing device with the new device. Which is to note that old device is neither closed nor gracefully removed from the btrfs. The FS is still operational with the old bdev however the device name is the btrfs_device is new which is provided by the btrfs dev scan. reproducer: devmgt[1] detach /dev/sdc replace the missing disk /dev/sdc btrfs rep start -f 1 /dev/sde /btrfs Label: none uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120 Total devices 2 FS bytes used 32.00KiB devid1 size 958.94MiB used 115.88MiB path /dev/sde devid2 size 958.94MiB used 103.88MiB path /dev/sdd make /dev/sdc to reappear devmgt attach host2 btrfs dev scan btrfs fi show -m Label: none uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120^M Total devices 2 FS bytes used 32.00KiB^M devid1 size 958.94MiB used 115.88MiB path /dev/sdc - Wrong. devid2 size 958.94MiB used 103.88MiB path /dev/sdd since /dev/sdc has been replaced with /dev/sde, the /dev/sdc shouldn't be part of the btrfs-fsid when it reappears. If user want it to be part of it then sys admin should be using btrfs device add instead. [1] github.com/anajain/devmgt.git Signed-off-by: Anand Jain anand.j...@oracle.com --- v2-v3: simpler commit and comment message v1-v2: remove '---' in commit messages which conflict with git am fs/btrfs/volumes.c | 27 +++ 1 file changed, 27 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index bb2cd66..56822f0 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -472,6 +472,7 @@ static noinline int device_list_add(const char *path, device = __find_device(fs_devices-devices, devid, disk_super-dev_item.uuid); } + You add an extra line here. if (!device) { if (fs_devices-opened) return -EBUSY; @@ -497,6 +498,32 @@ static noinline int device_list_add(const char *path, device-fs_devices = fs_devices; } else if (!device-name || strcmp(device-name-str, path)) { + /* +* When FS is already mounted. +* 1. If you are here and if the device-name is NULL that means +*this device was missing at time of FS mount. +* 2. If you are here and if the device-name is different from 'path' +*that means either +* a. The same device disappeared and reappeared with different +* name. or +* b. The missing-disk-which-was-replaced, has reappeared now. +* +* We must allow 1 and 2a above. But 2b would be a spurious and +* unintentional. +* Further in case of 1 and 2a above, the disk at 'path' would have +* missed some transaction when it was away and in case of 2a +* the stale bdev has to be updated as well. +* 2b must not be allowed at all time. +*/ + + /* +* As of now don't allow update to btrfs_fs_device through the +* btrfs dev scan cli, after FS has been mounted. +*/ + + if (fs_devices-opened) + return -EBUSY; + I agree we don't allow to update device list when mounted, i tested this patch, it worked but it will output the following message: Scanning for Btrfs filesystems in '/dev/sdc' ERROR: unable to scan the device '/dev/sdc' - Device or resource busy I think this is a little confusing for common users. Maybe don't return error but output some log message into kernel buffer, better? Thanks, Wang name = rcu_string_strdup(path, GFP_NOFS); if (!name) return -ENOMEM; -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] btrfs: check generation as replace duplicates devid+uuid
Hi Anand, On 06/13/2014 12:26 PM, Anand Jain wrote: ..snip.. @@ -523,6 +523,16 @@ static noinline int device_list_add(const char *path, if (fs_devices-opened) return -EBUSY; + else { + /* +* That is if the FS is _not_ mounted and if you are here, that +* means there is more than one disk with same uuid and devid. +* We keep the one with larger generation number or the last-in +* if generation are equal. +*/ + if (found_transid device-generation) + return -EINVAL; + } I tried this patch it outputed the following message if it encounter two device with the same uuid and device id: Scanning for Btrfs filesystems ERROR: device scan failed '/dev/sdc' - Invalid argument Same comment as your first patch here. name = rcu_string_strdup(path, GFP_NOFS); if (!name) @@ -535,6 +545,15 @@ static noinline int device_list_add(const char *path, } } + /* +* Unmount does not free the btrfs_device struct but would zero +* generation along with most of the other members. So just update +* it back. We need it to pick the disk with largest generation +* (as above). +*/ + if (!fs_devices-opened) + device-generation = found_transid; + if (found_transid fs_devices-latest_trans) { fs_devices-latest_devid = devid; fs_devices-latest_trans = found_transid; -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6] btrfs: label should not contain return char
Hi Satoru, On 07/01/2014 04:00 PM, Satoru Takeuchi wrote: Hi Wang, (2014/07/01 15:46), Wang Shilong wrote: Hi Satoru and all, I think there maybe a leftover issue. That is if we don't set label, in default it will output a blank line. Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount /dev/sdb # cat /sys/fs/btrfs/uuid/label --an extra line will be outputed. This is because in btrfs_label_show(), we did something like this directly: return snprintf(buf, PAGE_SIZE, %s\n, fs_info-super_copy-label); Maybe we can have a check whether label is NULL before we output? otherwise,the extra blank line is outputed, IMO this is not so nice thing! OK, how about it is? I also add a test for empty-label case. From: Anand Jain anand.j...@oracle.com generally if you use echo test /sys/fs/btrfs/fsid/label it would introduce return char at the end and it can not be part of the label. The correct command is echo -n test /sys/fs/btrfs/fsid/label This patch will check for this user error reproducer.sh: === #!/bin/bash TEST_DEV=/dev/vdb TEST_DIR=/home/sat/mnt umount /home/sat/mnt mkfs.btrfs -f $TEST_DEV UUID=$(btrfs fi show $TEST_DEV | head -1 | sed -e 's/.*uuid: \([-0-9a-z]*\)$/\1/') mount $TEST_DEV $TEST_DIR LABELFILE=/sys/fs/btrfs/$UUID/label echo Test for empty label... 2 LINES=$(cat $LABELFILE | wc -l | awk '{print $1}') RET=0 if [ $LINES -eq 0 ] ; then echo '[PASS] Trailing \n is removed correctly.' 2 else echo '[FAIL] Trailing \n still exists.' 2 RET=1 fi echo Test for non-empty label... 2 echo testlabel $LABELFILE LINES=$(cat $LABELFILE | wc -l | awk '{print $1}') if [ $LINES -eq 1 ] ; then echo '[PASS] Trailing \n is removed correctly.' 2 else echo '[FAIL] Trailing \n still exists.' 2 RET=1 fi exit $RET === Signed-off-by: Anand Jain anand.j...@oracle.com Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com Thanks for pushing this patch, Now it Looks good to me. Reviewed-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- v6: fix empty-label case v5: tweak to be able to apply on the top of 402a0f4. add test program. v4: used memcpy and memset. Thanks David again v3: accepts review comments. Thanks David and Eric again v2: accepts review comments. Thanks Eric and Roman --- fs/btrfs/sysfs.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index df39458..68d1f63 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -363,7 +363,8 @@ static ssize_t btrfs_label_show(struct kobject *kobj, struct kobj_attribute *a, char *buf) { struct btrfs_fs_info *fs_info = to_fs_info(kobj); - return snprintf(buf, PAGE_SIZE, %s\n, fs_info-super_copy-label); + char *label = fs_info-super_copy-label; + return snprintf(buf, PAGE_SIZE, label[0] ? %s\n : %s, label); } static ssize_t btrfs_label_store(struct kobject *kobj, @@ -374,8 +375,15 @@ static ssize_t btrfs_label_store(struct kobject *kobj, struct btrfs_trans_handle *trans; struct btrfs_root *root = fs_info-fs_root; int ret; + size_t p_len; - if (len = BTRFS_LABEL_SIZE) + /* + * p_len is the len until the first occurrence of either + * '\n' or '\0' + */ + p_len = strcspn(buf, \n); + + if (p_len = BTRFS_LABEL_SIZE) return -EINVAL; trans = btrfs_start_transaction(root, 0); @@ -383,7 +391,8 @@ static ssize_t btrfs_label_store(struct kobject *kobj, return PTR_ERR(trans); spin_lock(root-fs_info-super_lock); - strcpy(fs_info-super_copy-label, buf); + memset(fs_info-super_copy-label, 0, BTRFS_LABEL_SIZE); + memcpy(fs_info-super_copy-label, buf, p_len); spin_unlock(root-fs_info-super_lock); ret = btrfs_commit_transaction(trans, root); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: clear compress-force when remounting with compress option
On 07/02/2014 09:58 AM, Satoru Takeuchi wrote: (2014/07/02 1:04), David Sterba wrote: On Mon, Jun 30, 2014 at 10:51:25AM +0800, Wang Shilong wrote: Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount /dev/sdb /mnt -o compress-force=lzo # mount /dev/sdb /mnt -o remount,compress=zlib # cat /proc/mounts Remounting from compress-force to compress could not clear compress-force option. The problem is there is no way for users to clear compress-force option separately. Fix this problem by clearing @FORCE_COMPRESS flag when remounting to compress=xxx. Suggested-by: Tsutomu Itoh t-i...@jp.fujitsu.com Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com I'm sorry for lack of answer in the V1 thread, V2 is the minimal bugfix and I'm fine with it right away. Reviewed-by: David Sterba dste...@suse.cz Test result (I used your reproducer): 3.16-rc3: FAIL 3.16-rc3 w/ your patch: PASS Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com Satoru, Thanks for doing this. ^_^ Regards, Wang Thanks, Satoru -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2 v3 RESEND] Btrfs: device_list_add() should not update list when mounted
On 07/02/2014 12:31 AM, Anand Jain wrote: Thanks for the commenting Wang. inline below. On 01/07/2014 15:16, Wang Shilong wrote: Hi Anand, Sorry for delay reply, more comments below: On 06/13/2014 12:26 PM, Anand Jain wrote: From: Anand Jain anand.j...@oracle.com [...] I agree we don't allow to update device list when mounted, i tested this patch, it worked but Thanks for testing this patch set. it will output the following message: Scanning for Btrfs filesystems in '/dev/sdc' ERROR: unable to scan the device '/dev/sdc' - Device or resource busy I think this is a little confusing for common users. Maybe don't return error but output some log message into kernel buffer, better? the Other choices are: display the error only when specific device is used by the user like 'btrfs dev scan /dev/sdc' And don't print busy / invalid error when a system wide scan is used like 'btrfs dev scan'. To achieve this we have to tweak btrfs-progs. Maybe return EEXIST value here is better. Also for your second patch, if we replace one disk with existed disk, printk that info is good.:-) Regard, Wang or put the error under the verbose option in the btrfs-progs. Thanks, Wang name = rcu_string_strdup(path, GFP_NOFS); if (!name) return -ENOMEM; -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] btrfs: looping 'mkfs.btrfs -f dev' may fail with EBUSY
On 06/12/2014 02:25 PM, Anand Jain wrote: The thread holding the O_EXCL flag seems to be BTRFS_IOC_SCAN_DEV ioctl, which in turn calls btrfs_scan_one_device() to open dev with the O_EXCL flag. But btrfs_scan_one_device() does not write anything to the disk. and it is called by . An intermediary step (not the final open_ctree) in the mount thread to read the SB and . btrfs-control ioctls viz BTRFS_IOC_SCAN_DEV and BTRFS_IOC_DEVICES_READY they don't need the O_EXCL. test script: (run this in a loop) static int test_skip_this_disk(char *path) { int fd; char c; printf(%s , path); fd = open(path, O_RDWR|O_EXCL); if (fd 0) { printf(Open failed\n); return 1; } /*fflush(stdout); printf(Open Fine press enter\n); scanf(%c, c);*/ close(fd); return 0; } main(int arg, char **argv) { int i; if (arg == 1) { printf(usage: %s dev-with-btrfs-sb .. \n, argv[0]); exit(1); } for (i = 1; i arg; i++) test_skip_this_disk(argv[i]); } dump stack after the userland close(fd) dump_stack+0x9/0x60 btrfs_scan_one_device+0x18d/0x1f0 [btrfs] btrfs_control_ioctl+0xb9/0x210 [btrfs] do_vfs_ioctl+0x84/0x4c0 inode_has_perm+0x28/0x30 file_has_perm+0x8a/0xa0 SyS_ioctl+0x91/0xa0 system_call_fastpath+0x16/0x1b Signed-off-by: Anand Jain anand.j...@oracle.com --- fs/btrfs/volumes.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 636faa0..c186b5e 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -914,7 +914,6 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder, * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ bytenr = btrfs_sb_offset(0); - flags |= FMODE_EXCL; I could not think whether it will cause some big problem if we remove this flag. So Cc Chris and others. Any ideas about this problem? Thanks, Wang mutex_lock(uuid_mutex); bdev = blkdev_get_by_path(path, flags, holder); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fill_holes: Fix slot number passed to hole_mergeable() call.
On 07/01/2014 02:34 PM, Chandan Rajendra wrote: For a non-existent key, btrfs_search_slot() sets path-slots[0] to the slot where the key could have been present, which in this case would be the slot containing the extent item which would be the next neighbor of the file range being punched. The current code passes an incremented path-slots[0] and we skip to the wrong file extent item. This would mean that we would fail to merge the yet to be created hole with the next neighboring hole (if one exists). Fix this. Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com --- Nice catch, this looks good to me. Reviewed-by: Wang Shilong wangsl.f...@cn.fujitsu.com fs/btrfs/file.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index ad7c059..3cd7997 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2126,10 +2126,9 @@ static int fill_holes(struct btrfs_trans_handle *trans, struct inode *inode, goto out; } - if (hole_mergeable(inode, leaf, path-slots[0]+1, offset, end)) { + if (hole_mergeable(inode, leaf, path-slots[0], offset, end)) { u64 num_bytes; - path-slots[0]++; key.offset = offset; btrfs_set_item_key_safe(root, path, key); fi = btrfs_item_ptr(leaf, path-slots[0], -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Scrub being stuck
On 06/30/2014 01:14 AM, Florian Lindner wrote: Hello, I've just completed a scrub on my home filesystem and now I wanted to start on my Archiv btrfs RAID 0. root@horus ~ # uname - a Linux horus 3.15.2-1-ARCH #1 SMP PREEMPT Fri Jun 27 07:41:19 CEST 2014 x86_64 GNU/Linux root@horus ~ # btrfs --version Btrfs v3.14.2-dirty root@horus ~ # btrfs fi show Label: 'home' uuid: 571009c0-6cc4-4598-a77b-f175501d23fd Total devices 1 FS bytes used 2.03TiB devid1 size 2.73TiB used 2.04TiB path /dev/sdb1 Label: 'Archiv' uuid: b2504193-efeb-4ef3-8797-a8d73536a416 Total devices 2 FS bytes used 5.14TiB devid1 size 2.73TiB used 2.73TiB path /dev/sdd1 devid2 size 2.73TiB used 2.73TiB path /dev/sde1 Btrfs v3.14.2-dirty root@horus ~ # btrfs fi df /mnt/Archiv Data, RAID0: total=5.44TiB, used=5.13TiB Data, single: total=8.00MiB, used=244.00KiB System, RAID1: total=32.00MiB, used=396.00KiB System, single: total=4.00MiB, used=0.00 Metadata, RAID1: total=10.94GiB, used=8.71GiB unknown, single: total=512.00MiB, used=0.00 root@horus ~ # btrfs scrub status /mnt/Archiv scrub status for b2504193-efeb-4ef3-8797-a8d73536a416 scrub started at Thu Jan 2 21:13:35 2014, running for 1336 seconds total bytes scrubbed: 473.28GiB with 0 errors root@horus ~ # btrfs scrub cancel /mnt/Archiv ERROR: scrub cancel failed on /mnt/Archiv: not running root@horus ~ # btrfs scrub resume /mnt/Archiv ERROR: scrub is already running. To cancel use 'btrfs scrub cancel /mnt/Archiv'. To see the status use 'btrfs scrub status [-d] /mnt/Archiv'. root@horus ~ # btrfs scrub start /mnt/Archiv ERROR: scrub is already running. To cancel use 'btrfs scrub cancel /mnt/Archiv'. To see the status use 'btrfs scrub status [-d] /mnt/Archiv'. A scrub I started at Jan 02 seems to be stuck somehow. The command outputs above were generated right now. I'll be hapy to provide any more information. Thx! Florian Scrub log record file seems corrupted sometimes, if you use latest btrfs-progs version. There should be an option '-f' for 'btrfs scrub start' which will allow scrub to skip check in userspace. Actually, i am thinking whether it would be better to remove userspace check about scrub status since log record file is not so reliable. Thanks, Wang -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html . -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: clear compress-force when remounting with compress option
Steps to reproduce: # mkfs.btrfs -f /dev/sdb # mount /dev/sdb /mnt -o compress-force=lzo # mount /dev/sdb /mnt -o remount,compress=zlib # cat /proc/mounts Remounting from compress-force to compress could not clear compress-force option. The problem is there is no way for users to clear compress-force option separately. Fix this problem by clearing @FORCE_COMPRESS flag when remounting to compress=xxx. Suggested-by: Tsutomu Itoh t-i...@jp.fujitsu.com Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- v1-v2: remove unnecessary message output which was addressed by David. --- fs/btrfs/super.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 4662d92..5aae8f0 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -513,6 +513,13 @@ int btrfs_parse_options(struct btrfs_root *root, char *options) btrfs_info(root-fs_info, btrfs: use %s compression, compress_type); + /* +* If we remount from compress-force=xxx to +* compress=xxx, we need clear FORCE_COMPRESS +* flag, otherwise, there is no way for users +* to disable forcible compression separately. +*/ + btrfs_clear_opt(info-mount_opt, FORCE_COMPRESS); } break; case Opt_ssd: -- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: fix race between balance recovery and root deletion
Balance recovery is called when RW mounting or remounting from RO to RW, it is called to finish roots merging. When doing balance recovery, relocation root's corresponding fs root(whose root refs is 0) might be destroyed by cleaner thread, this will make btrfs fail to mount. Fix this problem by holding @cleaner_mutex when doing balance recovery. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com --- fs/btrfs/disk-io.c | 2 ++ fs/btrfs/super.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 8bb4aa1..61d8e87 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2904,7 +2904,9 @@ retry_root_backup: if (ret) goto fail_qgroup; + mutex_lock(fs_info-cleaner_mutex); ret = btrfs_recover_relocation(tree_root); + mutex_unlock(fs_info-cleaner_mutex); if (ret 0) { printk(KERN_WARNING BTRFS: failed to recover relocation\n); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 4662d92..b6ebde2 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1467,7 +1467,9 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data) goto restore; /* recover relocation */ + mutex_lock(fs_info-cleaner_mutex); ret = btrfs_recover_relocation(root); + mutex_unlock(fs_info-cleaner_mutex); if (ret) goto restore; -- 1.8.2.1 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html