Re: [PATCH] xfstests: remove check_scratch_fs in btrfs/012
On Wed, Sep 03, 2014 at 11:25:59AM +0800, Liu Bo wrote:
> From: Liu Bo liub.li...@gmail.com
>
> btrfs/012 is a case to verify the btrfs-convert feature: it first converts
> an ext4 filesystem to btrfs, does some work, then rolls back to ext4. So at
> the end we have an ext4 filesystem on the scratch device, but setting
> _require_scratch forces a btrfsck on that ext4 fs because $FSTYP here is
> btrfs, and it ends up with a failure report from _check_btrfs_filesystem.
>
> Now that we deliberately check the final ext4 fs in btrfs/012, just do not
> set _require_scratch in this case.
>
> Signed-off-by: Liu Bo liub.li...@gmail.com
> ---
>  tests/btrfs/012 | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/tests/btrfs/012 b/tests/btrfs/012
> index f7e5da5..12f6462 100755
> --- a/tests/btrfs/012
> +++ b/tests/btrfs/012
> @@ -52,7 +52,6 @@ _cleanup()
>  # Modify as appropriate.
>  _supported_fs btrfs
>  _supported_os Linux
> -_require_scratch

The test still requires a scratch device, so we cannot simply remove this line. We can now use the _require_scratch_nocheck helper instead, and it works fine in my testing.

Thanks,
Eryu

>  
>  BTRFS_CONVERT_PROG=`set_prog_path btrfs-convert`
>  MKFS_EXT4_PROG=`set_prog_path mkfs.ext4`
> -- 
> 1.8.1.4

___
xfs mailing list
x...@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] btrfs: correct a message on setting nodatacow
Hi Qu,

Thank you for your comment.

(2014/09/19 11:03), Qu Wenruo wrote:
> Original Message
> Subject: [PATCH 2/5] btrfs: correct a message on setting nodatacow
> From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
> To: linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
> Date: 2014-09-18 16:28
>> From: Naohiro Aota na...@elisp.net
>>
>> If we set the nodatacow mount option after the compress-force option,
>> we don't get the compression-disabling message.
>>
>> ===
>> $ sudo mount -o remount,compress-force,nodatacow /; dmesg|tail -n 3
>> [ 3845.719047] BTRFS info (device vda2): force zlib compression
>> [ 3845.719052] BTRFS info (device vda2): setting nodatacow
>> [ 3845.719055] BTRFS info (device vda2): disk space caching is enabled
>> ===
>>
>> Signed-off-by: Naohiro Aota na...@elisp.net
>> Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
>> ---
>>  fs/btrfs/super.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index d1c5b6d..d131098 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -462,8 +462,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>>  			break;
>>  		case Opt_nodatacow:
>>  			if (!btrfs_test_opt(root, NODATACOW)) {
>> -				if (!btrfs_test_opt(root, COMPRESS) ||
>> -				    !btrfs_test_opt(root, FORCE_COMPRESS)) {
>> +				if (btrfs_test_opt(root, COMPRESS)) {
>>  					btrfs_info(root->fs_info,
>>  						   "setting nodatacow, compression disabled");
>>  				} else {
>> -- 
>> 1.8.3.1
>
> Although the patch makes the output OK, the core problem is the missing
> check for conflicting options.
> The compress-force mount option implies datacow and datasum, but a following
> nodatasum will disable datasum and compress; in fact they are conflicting
> mount options...
>
> Even though the current behavior (a later mount option overrides previous
> ones) provides great tolerance, IMO there should be some conflict check for
> mount options.
>
> For example, we first save all the mount options passed in into a temporary
> bitmap to find the conflicts, and only when it contains no conflicts do we
> set the mount options in fs_info.
> (Maybe a bitmap is not enough for this case, since we can't distinguish a
> default value from a value to be set?)
>
> What do you think about this idea?

I'm against your idea for two reasons, and I think it's better to keep the current behavior, though it's a bit complex.

First, the rule "last one wins" is not only a convention but also what mount(8) says.

https://git.kernel.org/cgit/utils/util-linux/util-linux.git/tree/sys-utils/mount.8#n253
==
The usual behavior is that the last option wins if there are conflicting ones.
==

Second, if we change the behavior, we would break existing systems. In the worst case, users would fail to boot their system after updating the kernel, because mounting Btrfs fails during the init process.

Thanks,
Satoru

> Thanks,
> Qu
Re: [PATCH 2/5] btrfs: correct a message on setting nodatacow
Original Message
Subject: Re: [PATCH 2/5] btrfs: correct a message on setting nodatacow
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
Date: 2014-09-19 14:45
> Hi Qu,
>
> Thank you for your comment.
>
> (2014/09/19 11:03), Qu Wenruo wrote:
>>> [...]
>> Although the patch makes the output ok, the core problem is missing
>> conflict options check.
>> compress-force mount options implies datacow and datasum, but following
>> nodatasum will disable datasum and compress, in fact they are conflicting
>> mount option...
>>
>> Even the current behavior(later mount option will override previous ones)
>> provides great tolerance, IMO there should better be some conflicting
>> check for mount options.
>>
>> For example, we first save all the mount options passed in into a
>> temporary bitmaps to finds out the conflicting and only when they contains
>> no conflicts, set the mount options to fs_info.
>> (Maybe bitmap is not enough for this case, since we can't distinguish
>> default value and value to be set?)
>>
>> What do you think about this idea ?
>
> I'm against your idea for two reasons and it's better to stay in current
> behavior though it's a bit complex.
>
> First, the rule last one wins is not only a conventional rule, but also is
> what mount(8) says.
>
> https://git.kernel.org/cgit/utils/util-linux/util-linux.git/tree/sys-utils/mount.8#n253
> ==
> The usual behavior is that the last option wins if there are conflicting ones.
> ==
>
> Second, if we change the behavior, we would break existing systems. At
> worst case, users would fail to boot their system after updating kernel,
> because of the failure of mounting Btrfs at the init process.
>
> Thanks,
> Satoru

It really makes sense.
So I'm OK to keep things and it's true that the conflict check is somewhat overkilled.

Thanks,
Qu

>> Thanks,
>> Qu
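The "last option wins" rule discussed in this thread can be illustrated with a small userspace sketch. This is not the kernel's btrfs_parse_options(); the struct, the function names and the exact set of options each token clears are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical option state; field names are illustrative, not the kernel's. */
struct mount_opts {
    bool compress;
    bool force_compress;
    bool nodatacow;
};

/* Apply one token. A later token simply overrides what an earlier one set. */
static void apply_option(struct mount_opts *o, const char *tok)
{
    if (strcmp(tok, "compress") == 0) {
        o->compress = true;
        o->force_compress = false;
        o->nodatacow = false;      /* assumed: compression needs COW */
    } else if (strcmp(tok, "compress-force") == 0) {
        o->compress = true;
        o->force_compress = true;
        o->nodatacow = false;
    } else if (strcmp(tok, "nodatacow") == 0) {
        o->nodatacow = true;
        o->compress = false;       /* assumed: nodatacow disables compression */
        o->force_compress = false;
    }
    /* Unknown tokens are ignored in this sketch. */
}

/* Walk a comma-separated list left to right, with no conflict check. */
static struct mount_opts parse_options(const char *list)
{
    struct mount_opts o = { false, false, false };

    while (*list) {
        size_t n = strcspn(list, ",");
        char tok[32];

        if (n > 0 && n < sizeof(tok)) {
            memcpy(tok, list, n);
            tok[n] = '\0';
            apply_option(&o, tok);
        }
        list += n;
        if (*list == ',')
            list++;
    }
    return o;
}
```

With this shape, "compress-force,nodatacow" ends with nodatacow set and compression cleared, while the reverse order ends with forced compression. Neither order is rejected, which is the compatibility property Satoru argues for.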
Re: [PATCH v2] btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
(2014/09/18 17:55), Qu Wenruo wrote:
> The following commit enhanced merge_extent_mapping() to reduce
> fragmentation in the extent map tree, but it can't handle the case where
> the existing extent lies before map_start:
> 51f39 btrfs: Use right extent length when inserting overlap extent map.
>
> [BUG]
> When the existing extent map's start is before map_start, em->len becomes
> negative, which corrupts the extent map and makes inserting the new extent
> map fail.
> This happens when someone gets a large extent map, but by the time it is
> inserted into the extent map tree, someone else has already committed some
> writes and split the huge extent into small parts.
>
> [REPRODUCER]
> It is very easy to trigger using filebench with the randomrw personality.
> It reproduces almost 100% of the time when using an 8G preallocated file in
> a 60s randomrw test.
>
> [FIX]
> This patch can now handle any existing extent position.
> Since it does not directly use existing->start, it now finds the previous
> and next extents around map_start.
> So the old existing->start < map_start bug will never happen again.
>
> [ENHANCE]
> This patch inserts the best fitted extent map into the extent map tree,
> rather than the oldest [map_start, map_start + sectorsize) or the
> relatively newer but not perfect [map_start, existing->start).
>
> The patch first searches for an existing extent that does not intersect
> with the desired map range [map_start, map_start + len).
> The existing extent will be either before or behind map_start, and based on
> the existing extent, we can find the previous and next extents around
> map_start.
>
> So the best fitted extent would be [prev->end, next->start).
> If prev or next is not found, em->start is used in place of prev->end and
> em->end in place of next->start.
>
> With this patch, fragmentation in the extent map tree should be reduced
> much more than with the 51f39 commit, and an unneeded extent map tree
> search is avoided.
>
> Reported-by: Tsutomu Itoh t-i...@jp.fujitsu.com
> Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
> Reviewed-by: Liu Bo bo.li@oracle.com

Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Sorry for the late reply. I confirmed that this problem happens without this patch and does not happen with this patch.

Thanks,
Satoru

> ---
> changelog:
> v2:
>    Liu Bo points out that the if() use 'start + len = existing-start' is
>    not equal to original if(), which may cause problem in no-holes mode.
> ---
>  fs/btrfs/inode.c | 79
>  1 file changed, 57 insertions(+), 22 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 016c403..b3864b7 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6191,21 +6191,60 @@ out_fail_inode:
>  	goto out_fail;
>  }
>  
> +/* Find next extent map of a given extent map, caller needs to ensure locks */
> +static struct extent_map *next_extent_map(struct extent_map *em)
> +{
> +	struct rb_node *next;
> +
> +	next = rb_next(&em->rb_node);
> +	if (!next)
> +		return NULL;
> +	return container_of(next, struct extent_map, rb_node);
> +}
> +
> +static struct extent_map *prev_extent_map(struct extent_map *em)
> +{
> +	struct rb_node *prev;
> +
> +	prev = rb_prev(&em->rb_node);
> +	if (!prev)
> +		return NULL;
> +	return container_of(prev, struct extent_map, rb_node);
> +}
> +
>  /* helper for btfs_get_extent.  Given an existing extent in the tree,
> + * the existing extent is the nearest extent to map_start,
>   * and an extent that you want to insert, deal with overlap and insert
> - * the new extent into the tree.
> + * the best fitted new extent into the tree.
>   */
>  static int merge_extent_mapping(struct extent_map_tree *em_tree,
>  				struct extent_map *existing,
>  				struct extent_map *em,
>  				u64 map_start)
>  {
> +	struct extent_map *prev;
> +	struct extent_map *next;
> +	u64 start;
> +	u64 end;
>  	u64 start_diff;
>  
>  	BUG_ON(map_start < em->start || map_start >= extent_map_end(em));
> -	start_diff = map_start - em->start;
> -	em->start = map_start;
> -	em->len = existing->start - em->start;
> +
> +	if (existing->start > map_start) {
> +		next = existing;
> +		prev = prev_extent_map(next);
> +	} else {
> +		prev = existing;
> +		next = next_extent_map(prev);
> +	}
> +
> +	start = prev ? extent_map_end(prev) : em->start;
> +	start = max_t(u64, start, em->start);
> +	end = next ? next->start : extent_map_end(em);
> +	end = min_t(u64, end, extent_map_end(em));
> +	start_diff = start - em->start;
> +	em->start = start;
> +	em->len = end - start;
>  	if (em->block_start < EXTENT_MAP_LAST_BYTE &&
>  	    !test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) {
>  		em->block_start += start_diff;
> @@ -6482,25 +6521,21 @@ insert:
>  	ret = 0;
>  
> -	existing = lookup_extent_mapping(em_tree, start, len);
> -	if (existing (existing-start
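The "best fitted extent" computation described in the patch above (clamp the new extent to the gap between the previous and next extents) can be sketched outside the kernel as follows. The struct range type and the fit_between() name are hypothetical stand-ins for struct extent_map and merge_extent_mapping(); the sketch assumes the gap between prev and next actually overlaps the candidate range, which the kernel code guarantees through its BUG_ON and lookup logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent map: a half-open range [start, start + len). */
struct range {
    uint64_t start;
    uint64_t len;
};

static uint64_t range_end(const struct range *r)
{
    return r->start + r->len;
}

/*
 * Trim em so that it fills the gap between prev and next without
 * overlapping either. prev or next may be NULL when no neighbour exists,
 * in which case em's own boundary is kept on that side.
 * Assumes the gap and em overlap, so end >= start after clamping.
 */
static void fit_between(struct range *em,
                        const struct range *prev,
                        const struct range *next)
{
    uint64_t em_end = range_end(em);
    uint64_t start = prev ? range_end(prev) : em->start;
    uint64_t end = next ? next->start : em_end;

    if (start < em->start)   /* max(start, em->start) */
        start = em->start;
    if (end > em_end)        /* min(end, em_end) */
        end = em_end;

    em->start = start;
    em->len = end - start;
}
```

For a candidate [0, 100) with prev = [10, 30) and next = [60, 70), the result is [30, 60): the largest range that fits the hole, rather than a single sector at map_start.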
[PATCH 1/4] btrfs: correct empty compression property behavior
From: Naohiro Aota na...@elisp.net

In the current implementation, compression property == "" has two different meanings: one with BTRFS_INODE_NOCOMPRESS set, and the other without this flag. So, even if two files a and b have the same compression property, "", and the same contents, one file may be compressed while the other is not. This is difficult for users to understand and confuses them.

Here is a real example. Assume the following two cases.

a) A freshly created file (under a directory with neither the COMPRESS nor the NOCOMPRESS flag).
b) An existing file whose compression property is explicitly set to "".

In addition, here is the command log (I attached the source of the getflags program in this patch.)

===
$ rm -f a b; touch a b
$ btrfs prop set b compression ""
# both a and b have the same compression property: ""
$ btrfs prop get a compression
$ btrfs prop get b compression
# but ... let's take a look at inode flags
$ ./getflags a
0x0
$ ./getflags b
0x400
# 0x400 (FS_NOCOMP_FL) corresponds to BTRFS_INODE_NOCOMPRESS
===

So both files have compression property == "", but have different NOCOMPRESS flag states, leading to different behavior.

case | BTRFS_INODE_NOCOMPRESS | behavior
=====+========================+====================
a    | unset                  | might be compressed
b    | set                    | never be compressed

I consider that we should not expect users to remember whether their files are case a or b, and that we should introduce another value for the compression property anyway.
getflags.c:
===
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <linux/fs.h>

int main(int argc, char const* argv[])
{
	const char *name = argv[1];
	int fd = open(name, O_RDONLY);
	long x;
	ioctl(fd, FS_IOC_GETFLAGS, &x);
	printf("0x%lx\n", x);
	return 0;
}
===

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 fs/btrfs/props.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index 129b1dd..bf005f4 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -393,8 +393,8 @@ static int prop_compression_apply(struct inode *inode,
 	int type;
 
 	if (len == 0) {
-		BTRFS_I(inode)->flags |= BTRFS_INODE_NOCOMPRESS;
-		BTRFS_I(inode)->flags &= ~BTRFS_INODE_COMPRESS;
+		BTRFS_I(inode)->flags &=
+			~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
 		BTRFS_I(inode)->force_compress = BTRFS_COMPRESS_NONE;
 
 		return 0;
-- 
1.8.3.1
[PATCH 2/4] btrfs: introduce new compression property to disable compression at all
From: Naohiro Aota na...@elisp.net

Introduce a new compression property value, "off", which disables compression of the file entirely.

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 fs/btrfs/props.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index bf005f4..38efbe1 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -382,6 +382,8 @@ static int prop_compression_validate(const char *value, size_t len)
 		return 0;
 	else if (!strncmp("zlib", value, len))
 		return 0;
+	else if (!strncmp("off", value, len))
+		return 0;
 
 	return -EINVAL;
 }
@@ -400,6 +402,14 @@ static int prop_compression_apply(struct inode *inode,
 		return 0;
 	}
 
+	if (!strncmp("off", value, len)) {
+		BTRFS_I(inode)->flags |= BTRFS_INODE_NOCOMPRESS;
+		BTRFS_I(inode)->flags &= ~BTRFS_INODE_COMPRESS;
+		BTRFS_I(inode)->force_compress = BTRFS_COMPRESS_NONE;
+
+		return 0;
+	}
+
 	if (!strncmp("lzo", value, len))
 		type = BTRFS_COMPRESS_LZO;
 	else if (!strncmp("zlib", value, len))
@@ -423,5 +433,8 @@ static const char *prop_compression_extract(struct inode *inode)
 		return "lzo";
 	}
 
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
+		return "off";
+
 	return NULL;
 }
-- 
1.8.3.1
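Taken together, patches 1/4 and 2/4 give each property value a distinct flag state: "" clears both flags, "off" sets only NOCOMPRESS, and "lzo"/"zlib" set only COMPRESS. The following userspace sketch shows that mapping. The flag constants, struct inode_state and both function names are hypothetical; unlike the kernel code, it takes NUL-terminated strings instead of a (value, len) pair and returns "" rather than NULL when nothing is set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for BTRFS_INODE_COMPRESS / BTRFS_INODE_NOCOMPRESS. */
#define INODE_COMPRESS   (1u << 0)
#define INODE_NOCOMPRESS (1u << 1)

struct inode_state {
    unsigned flags;
    const char *algo;   /* forced algorithm name, or NULL when none */
};

/* Apply a property value; returns 0 on success, -1 for an unknown value. */
static int apply_compression_prop(struct inode_state *st, const char *value)
{
    if (value[0] == '\0') {
        /* "": no preference, neither flag set */
        st->flags &= ~(INODE_COMPRESS | INODE_NOCOMPRESS);
        st->algo = NULL;
        return 0;
    }
    if (strcmp(value, "off") == 0) {
        /* "off": never compress */
        st->flags |= INODE_NOCOMPRESS;
        st->flags &= ~INODE_COMPRESS;
        st->algo = NULL;
        return 0;
    }
    if (strcmp(value, "lzo") == 0 || strcmp(value, "zlib") == 0) {
        st->flags |= INODE_COMPRESS;
        st->flags &= ~INODE_NOCOMPRESS;
        st->algo = (value[0] == 'l') ? "lzo" : "zlib";
        return 0;
    }
    return -1;
}

/* Report the property value that corresponds to the current state. */
static const char *extract_compression_prop(const struct inode_state *st)
{
    if (st->algo)
        return st->algo;
    if (st->flags & INODE_NOCOMPRESS)
        return "off";
    return "";
}
```

The point of the sketch is that every state now round-trips: what extract reports is exactly what apply would need to recreate the same flags, which was not true when "" could mean either case a or case b.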
[PATCH 3/4] btrfs: export __btrfs_set_prop
From: Naohiro Aota na...@elisp.net

Export __btrfs_set_prop() so that it can be called within a running transaction.

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 fs/btrfs/props.c | 2 +-
 fs/btrfs/props.h | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index 38efbe1..6f56f5b 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -99,7 +99,7 @@ find_prop_handler(const char *name,
 	return NULL;
 }
 
-static int __btrfs_set_prop(struct btrfs_trans_handle *trans,
+int __btrfs_set_prop(struct btrfs_trans_handle *trans,
 			    struct inode *inode,
 			    const char *name,
 			    const char *value,
diff --git a/fs/btrfs/props.h b/fs/btrfs/props.h
index 100f188..cff91e0 100644
--- a/fs/btrfs/props.h
+++ b/fs/btrfs/props.h
@@ -28,6 +28,12 @@ int btrfs_set_prop(struct inode *inode,
 		   const char *value,
 		   size_t value_len,
 		   int flags);
+int __btrfs_set_prop(struct btrfs_trans_handle *trans,
+		     struct inode *inode,
+		     const char *name,
+		     const char *value,
+		     size_t value_len,
+		     int flags);
 
 int btrfs_load_inode_props(struct inode *inode, struct btrfs_path *path);
 
-- 
1.8.3.1
[PATCH v2 0/2] Move BTRFS RCU string to common library
This patch series moves the generic RCU string library used internally by BTRFS to be accessible by anyone. It provides printk_in_rcu and printk_ratelimited_in_rcu to print these strings. In order to avoid a weird inconsistency between the two, the first patch fixes printk_ratelimited so it passes on the return value from printk. The second patch actually moves the RCU string library.

Version 2 passes on the return values from printk{,_ratelimited} and fixes some style issues.

Omar Sandoval (2):
  Return a value from printk_ratelimited
  Move BTRFS RCU string to common library

 fs/btrfs/check-integrity.c |  6 +--
 fs/btrfs/dev-replace.c     | 19 +-
 fs/btrfs/disk-io.c         |  6 +--
 fs/btrfs/extent_io.c       |  4 +-
 fs/btrfs/ioctl.c           |  4 +-
 fs/btrfs/raid56.c          |  2 +-
 fs/btrfs/rcu-string.h      | 56
 fs/btrfs/scrub.c           | 15
 fs/btrfs/super.c           |  2 +-
 fs/btrfs/volumes.c         | 14 +++
 include/linux/printk.h     |  4 +-
 include/linux/rcustring.h  | 91 ++
 12 files changed, 131 insertions(+), 92 deletions(-)
 delete mode 100644 fs/btrfs/rcu-string.h
 create mode 100644 include/linux/rcustring.h

-- 
2.1.0
[PATCH v2 2/2] Move BTRFS RCU string to common library
The RCU-friendy string API used internally by BTRFS is generic enough for common use. This doesn't add any new functionality, but instead just moves the code and documents the existing API. Signed-off-by: Omar Sandoval osan...@osandov.com --- fs/btrfs/check-integrity.c | 6 +-- fs/btrfs/dev-replace.c | 19 +- fs/btrfs/disk-io.c | 6 +-- fs/btrfs/extent_io.c | 4 +- fs/btrfs/ioctl.c | 4 +- fs/btrfs/raid56.c | 2 +- fs/btrfs/rcu-string.h | 56 fs/btrfs/scrub.c | 15 fs/btrfs/super.c | 2 +- fs/btrfs/volumes.c | 14 +++ include/linux/rcustring.h | 91 ++ 11 files changed, 128 insertions(+), 91 deletions(-) delete mode 100644 fs/btrfs/rcu-string.h create mode 100644 include/linux/rcustring.h diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index ce92ae3..4ccd7da 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -94,6 +94,7 @@ #include linux/mutex.h #include linux/genhd.h #include linux/blkdev.h +#include linux/rcustring.h #include ctree.h #include disk-io.h #include hash.h @@ -103,7 +104,6 @@ #include print-tree.h #include locking.h #include check-integrity.h -#include rcu-string.h #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1 #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1 @@ -851,8 +851,8 @@ static int btrfsic_process_superblock_dev_mirror( printk_in_rcu(KERN_INFO New initial S-block (bdev %p, %s) @%llu (%s/%llu/%d)\n, superblock_bdev, -rcu_str_deref(device-name), dev_bytenr, -dev_state-name, dev_bytenr, +rcu_string_dereference(device-name), +dev_bytenr, dev_state-name, dev_bytenr, superblock_mirror_num); list_add(superblock_tmp-all_blocks_node, state-all_blocks_list); diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index eea26e1..87d10cc 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -25,6 +25,7 @@ #include linux/capability.h #include linux/kthread.h #include linux/math64.h +#include linux/rcustring.h #include asm/div64.h #include ctree.h #include extent_map.h @@ -34,7 +35,6 @@ #include volumes.h 
#include async-thread.h #include check-integrity.h -#include rcu-string.h #include dev-replace.h #include sysfs.h @@ -376,9 +376,9 @@ int btrfs_dev_replace_start(struct btrfs_root *root, printk_in_rcu(KERN_INFO BTRFS: dev_replace from %s (devid %llu) to %s started\n, src_device-missing ? missing disk : - rcu_str_deref(src_device-name), + rcu_string_dereference(src_device-name), src_device-devid, - rcu_str_deref(tgt_device-name)); + rcu_string_dereference(tgt_device-name)); tgt_device-total_bytes = src_device-total_bytes; tgt_device-disk_total_bytes = src_device-disk_total_bytes; @@ -528,9 +528,10 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, printk_in_rcu(KERN_ERR BTRFS: btrfs_scrub_dev(%s, %llu, %s) failed %d\n, src_device-missing ? missing disk : - rcu_str_deref(src_device-name), + rcu_string_dereference(src_device-name), src_device-devid, - rcu_str_deref(tgt_device-name), scrub_ret); + rcu_string_dereference(tgt_device-name), + scrub_ret); btrfs_dev_replace_unlock(dev_replace); mutex_unlock(root-fs_info-fs_devices-device_list_mutex); mutex_unlock(root-fs_info-chunk_mutex); @@ -544,9 +545,9 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info, printk_in_rcu(KERN_INFO BTRFS: dev_replace from %s (devid %llu) to %s) finished\n, src_device-missing ? missing disk : - rcu_str_deref(src_device-name), + rcu_string_dereference(src_device-name), src_device-devid, - rcu_str_deref(tgt_device-name)); + rcu_string_dereference(tgt_device-name)); tgt_device-is_tgtdev_for_dev_replace = 0; tgt_device-devid = src_device-devid; src_device-devid = BTRFS_DEV_REPLACE_DEVID; @@ -805,10 +806,10 @@ static int btrfs_dev_replace_kthread(void *data) printk_in_rcu(KERN_INFO BTRFS: continuing dev_replace from %s (devid
[PATCH v2 1/2] Return a value from printk_ratelimited
printk returns an integer; there's no reason for printk_ratelimited to swallow it.

Signed-off-by: Omar Sandoval osan...@osandov.com
---
 include/linux/printk.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index d78125f..67534bc 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -343,12 +343,14 @@ extern asmlinkage void dump_stack(void) __cold;
 #ifdef CONFIG_PRINTK
 #define printk_ratelimited(fmt, ...)					\
 ({									\
+	int __ret = 0;							\
 	static DEFINE_RATELIMIT_STATE(_rs,				\
 				      DEFAULT_RATELIMIT_INTERVAL,	\
 				      DEFAULT_RATELIMIT_BURST);		\
 									\
 	if (__ratelimit(&_rs))						\
-		printk(fmt, ##__VA_ARGS__);				\
+		__ret = printk(fmt, ##__VA_ARGS__);			\
+	__ret;								\
 })
 #else
 #define printk_ratelimited(fmt, ...)					\
-- 
2.1.0
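The idea of the patch above (hand the print function's return value back to the caller, and report 0 when the message is suppressed) can be tried in userspace with a minimal sketch. The kernel version is a statement-expression macro with per-call-site static state; this sketch uses a plain variadic function and a hypothetical burst-only limiter instead, so struct ratelimit, ratelimit_ok() and printf_ratelimited() are illustrative names, not kernel APIs.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical limiter: only the first `burst` calls are allowed through. */
struct ratelimit {
    int burst;
    int used;
};

static int ratelimit_ok(struct ratelimit *rl)
{
    if (rl->used >= rl->burst)
        return 0;
    rl->used++;
    return 1;
}

/*
 * Print through the limiter and return what vprintf() returned,
 * or 0 when the message was dropped.
 */
static int printf_ratelimited(struct ratelimit *rl, const char *fmt, ...)
{
    va_list ap;
    int ret = 0;

    if (ratelimit_ok(rl)) {
        va_start(ap, fmt);
        ret = vprintf(fmt, ap);
        va_end(ap);
    }
    return ret;
}
```

A caller can then distinguish "printed N characters" from "suppressed", which is what lets a wrapper such as printk_ratelimited_in_rcu behave consistently with printk_in_rcu.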
[PATCH 4/4] btrfs: Fix compression related ioctl to run atomic operations in one transaction
From: Naohiro Aota na...@elisp.net

Fix the following two problems in the compression-related ioctl code.

a) Compression flags and the inode attribute are updated in two separate transactions. So, if something bad happens after the former and before the latter, the file system would be left in an inconsistent state. This patch moves them into one transaction.

b) The ioctl code updates compression flags and then calls btrfs_set_prop(). However, the flags are also updated in that function. This patch removes the duplicated flag-updating code from the ioctl code and leaves this work entirely to __btrfs_set_prop().

Signed-off-by: Naohiro Aota na...@elisp.net
Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
---
 fs/btrfs/ioctl.c | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0ff2127..47ac6da 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -221,6 +221,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	u64 ip_oldflags;
 	unsigned int i_oldflags;
 	umode_t mode;
+	const char *comp;
 
 	if (!inode_owner_or_capable(inode))
 		return -EPERM;
@@ -310,40 +311,29 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
 	 * things smaller.
 	 */
 	if (flags & FS_NOCOMP_FL) {
-		ip->flags &= ~BTRFS_INODE_COMPRESS;
-		ip->flags |= BTRFS_INODE_NOCOMPRESS;
-
-		ret = btrfs_set_prop(inode, "btrfs.compression", NULL, 0, 0);
-		if (ret && ret != -ENODATA)
-			goto out_drop;
+		comp = "off";
 	} else if (flags & FS_COMPR_FL) {
-		const char *comp;
-
-		ip->flags |= BTRFS_INODE_COMPRESS;
-		ip->flags &= ~BTRFS_INODE_NOCOMPRESS;
-
 		if (root->fs_info->compress_type == BTRFS_COMPRESS_LZO)
 			comp = "lzo";
 		else
 			comp = "zlib";
-		ret = btrfs_set_prop(inode, "btrfs.compression",
-				     comp, strlen(comp), 0);
-		if (ret)
-			goto out_drop;
-
 	} else {
-		ret = btrfs_set_prop(inode, "btrfs.compression", NULL, 0, 0);
-		if (ret && ret != -ENODATA)
-			goto out_drop;
-		ip->flags &= ~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
+		comp = "";
 	}
 
-	trans = btrfs_start_transaction(root, 1);
+	trans = btrfs_start_transaction(root, 2);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		goto out_drop;
 	}
 
+	ret = __btrfs_set_prop(trans, inode, "btrfs.compression", comp,
+			       strlen(comp), 0);
+	if (ret && ret != -ENODATA) {
+		btrfs_end_transaction(trans, root);
+		goto out_drop;
+	}
+
 	btrfs_update_iflags(inode);
 	inode_inc_iversion(inode);
 	inode->i_ctime = CURRENT_TIME;
-- 
1.8.3.1
Help for creating a useful bugreport
Hi,

my btrfs partition got corrupted. With some trouble I got most of the valuable data out of it using `btrfs restore -i` (it crashed a few times, but on the fourth or fifth run it reached the stuff I wanted to recover).

As far as I can tell, the file system broke during normal operations without any hardware failures. Before I switch back to ext4, I'd like to file a bug report so my troubles were not completely in vain. Unfortunately I don't have much to work with. Can you help me with extracting enough information to create a useful bugreport?

Regards,
Jakob

$ cat /etc/fedora-release
Fedora release 20 (Heisenbug)

$ uname -a
Linux localhost.localdomain 3.16.2-200.fc20.x86_64 #1 SMP Mon Sep 8 11:54:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

$ sudo btrfs fi show /dev/dm-1
Label: 'EncJakobExtern'  uuid: 8ccbd085-564d-4022-bfc9-18fd429d0a8d
	Total devices 1 FS bytes used 731.07GiB
	devid 1 size 931.60GiB used 769.04GiB path /dev/mapper/luks-a266c492-2360-404d-9ad7-00edc2f0c09d

Btrfs v3.16

$ sudo mount /dev/dm-1 /mnt/diverse/
mount: wrong fs type, bad option, bad superblock on /dev/mapper/luks-a266c492-2360-404d-9ad7-00edc2f0c09d,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

$ sudo mount /dev/dm-1 /mnt/diverse/ -o recovery
mount: wrong fs type, bad option, bad superblock on /dev/mapper/luks-a266c492-2360-404d-9ad7-00edc2f0c09d,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

*Syslog:*
Sep 19 10:11:11 localhost.localdomain kernel: BTRFS: device label EncJakobExtern devid 1 transid 4923 /dev/dm-1
Sep 19 10:11:11 localhost.localdomain udisksd[1971]: Unlocked LUKS device /dev/sdb1 as /dev/dm-1
Sep 19 10:16:13 localhost.localdomain sudo[5080]: jakob : TTY=pts/6 ; PWD=/home/jakob ; USER=root ; COMMAND=/bin/mount /dev/dm-1 /mnt/diverse/
Sep 19 10:16:17 localhost.localdomain kernel: BTRFS info (device dm-1): disk space caching is enabled
Sep 19 10:16:18 localhost.localdomain kernel: parent transid verify failed on 46678016 wanted 4923 found 3306
Sep 19 10:16:18 localhost.localdomain kernel: parent transid verify failed on 46678016 wanted 4923 found 3306
Sep 19 10:16:18 localhost.localdomain kernel: BTRFS: Failed to read block groups: -5
Sep 19 10:16:18 localhost.localdomain kernel: BTRFS: open_ctree failed
Sep 19 10:16:45 localhost.localdomain sudo[5456]: jakob : TTY=pts/6 ; PWD=/home/jakob ; USER=root ; COMMAND=/bin/mount /dev/dm-1 /mnt/diverse/ -o recovery
Sep 19 10:16:45 localhost.localdomain kernel: BTRFS info (device dm-1): enabling auto recovery
Sep 19 10:16:45 localhost.localdomain kernel: BTRFS info (device dm-1): disk space caching is enabled
Sep 19 10:16:46 localhost.localdomain kernel: parent transid verify failed on 46678016 wanted 4923 found 3306
Sep 19 10:16:46 localhost.localdomain kernel: parent transid verify failed on 46678016 wanted 4923 found 3306
Sep 19 10:16:46 localhost.localdomain kernel: BTRFS: Failed to read block groups: -5
Sep 19 10:16:46 localhost.localdomain kernel: BTRFS: open_ctree failed

*fstab entry with which it was usually mounted:*
/dev/mapper/LuksOpenendEncJakobExtern /mnt/EncJakobExtern btrfs compress=lzo,nofail 0 0
Re: [PATCH] Revert Btrfs: device_list_add() should not update list when
Looks good to me Chris. Thank you.

Reviewed-by: Anand Jain anand.j...@oracle.com

On 09/18/2014 11:00 PM, Chris Mason wrote:
> Johannes and Sam, could you please confirm this patch fixes your mount regression for now? Anand, please make sure I kept the generation check properly.
>
> This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.
>
> This commit is triggering failures to mount by subvolume id in some configurations. The main problem is how many different ways this scanning function is used, both for scanning while mounted and unmounted. A proper cleanup is too big for late rcs.
>
> For now, just revert the commit and we'll put a better fix into a later merge window.
>
> Signed-off-by: Chris Mason c...@fb.com
> ---
>  fs/btrfs/volumes.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 340a92d..2c2d6d1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -529,12 +529,12 @@ static noinline int device_list_add(const char *path,
>  		 */
>
>  		/*
> -		 * As of now don't allow update to btrfs_fs_device through
> -		 * the btrfs dev scan cli, after FS has been mounted.
> +		 * For now, we do allow update to btrfs_fs_device through the
> +		 * btrfs dev scan cli after FS has been mounted. We're still
> +		 * tracking a problem where systems fail mount by subvolume id
> +		 * when we reject replacement on a mounted FS.
>  		 */
> -		if (fs_devices->opened) {
> -			return -EBUSY;
> -		} else {
> +		if (!fs_devices->opened && found_transid < device->generation) {
>  			/*
>  			 * That is if the FS is _not_ mounted and if you
>  			 * are here, that means there is more than one
> @@ -542,8 +542,7 @@ static noinline int device_list_add(const char *path,
>  			 * with larger generation number or the last-in if
>  			 * generation are equal.
>  			 */
> -			if (found_transid < device->generation)
> -				return -EEXIST;
> +			return -EEXIST;
>  		}
>
>  		name = rcu_string_strdup(path, GFP_NOFS);
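The effect of the revert on a device rescan can be summarised as a small decision function. What follows is a minimal userspace sketch, not the kernel code: `scan_before_revert` and `scan_after_revert` are hypothetical names, and their three parameters stand in for `fs_devices->opened`, `found_transid` and `device->generation` as they appear in the hunk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Behaviour being reverted: a rescan of a mounted filesystem is always
 * refused, and an unmounted one keeps the device with the larger generation.
 */
int scan_before_revert(bool opened, uint64_t found_transid, uint64_t generation)
{
	if (opened)
		return -EBUSY;
	if (found_transid < generation)
		return -EEXIST;
	return 0;	/* caller goes on to update the device path */
}

/*
 * Behaviour after the revert: a mounted filesystem accepts the update again;
 * the generation check only applies while the filesystem is not mounted.
 */
int scan_after_revert(bool opened, uint64_t found_transid, uint64_t generation)
{
	if (!opened && found_transid < generation)
		return -EEXIST;
	return 0;	/* caller goes on to update the device path */
}
```

For a mounted filesystem the first function returns `-EBUSY` regardless of the transids, while the second returns 0, which is the case the mount-by-subvolume-id reports depend on.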
Performance Issues
Hi,

I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.

I've tried mounting with noatime, and this has had no effect. Anyone got any ideas?

Here are the things that the wiki page asked for [1]:

uname -a:

    Linux zarniwoop.blob 3.16.2-200.fc20.x86_64 #1 SMP Mon Sep 8 11:54:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version:

    Btrfs v3.16

btrfs fi show:

    Label: 'fedora'  uuid: 717c0a1b-815c-4e6a-86c0-60b921e84d75
        Total devices 1 FS bytes used 1.49TiB
        devid    1 size 2.72TiB used 1.50TiB path /dev/sda4

    Btrfs v3.16

btrfs fi df /:

    Data, single: total=1.48TiB, used=1.48TiB
    System, DUP: total=32.00MiB, used=208.00KiB
    Metadata, DUP: total=11.50GiB, used=10.43GiB
    unknown, single: total=512.00MiB, used=0.00

dmesg dump is attached.

Please CC any responses to me, as I'm not subscribed to the list.
Cheers, Rob [1] https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Initializing cgroup subsys cpuacct [0.00] Linux version 3.16.2-200.fc20.x86_64 (mockbu...@bkernel01.phx2.fedoraproject.org) (gcc version 4.8.3 20140624 (Red Hat 4.8.3-1) (GCC) ) #1 SMP Mon Sep 8 11:54:45 UTC 2014 [0.00] Command line: BOOT_IMAGE=/vmlinuz-3.16.2-200.fc20.x86_64 root=UUID=717c0a1b-815c-4e6a-86c0-60b921e84d75 ro rootflags=subvol=root vconsole.font=latarcyrheb-sun16 rhgb quiet LANG=en_GB.UTF-8 [0.00] e820: BIOS-provided physical RAM map: [0.00] BIOS-e820: [mem 0x-0x0009dbff] usable [0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved [0.00] BIOS-e820: [mem 0x0010-0xdfed] usable [0.00] BIOS-e820: [mem 0xdfee-0xdfee2fff] ACPI NVS [0.00] BIOS-e820: [mem 0xdfee3000-0xdfee] ACPI data [0.00] BIOS-e820: [mem 0xdfef-0xdfef] reserved [0.00] BIOS-e820: [mem 0xf000-0xf3ff] reserved [0.00] BIOS-e820: [mem 0xfec0-0x] reserved [0.00] BIOS-e820: [mem 0x0001-0x00019fff] usable [0.00] NX (Execute Disable) protection: active [0.00] SMBIOS 2.4 present. [0.00] DMI: Gigabyte Technology Co., Ltd. 
P35-S3G/P35-S3G, BIOS F5 06/19/2009 [0.00] e820: update [mem 0x-0x0fff] usable == reserved [0.00] e820: remove [mem 0x000a-0x000f] usable [0.00] e820: last_pfn = 0x1a max_arch_pfn = 0x4 [0.00] MTRR default type: uncachable [0.00] MTRR fixed ranges enabled: [0.00] 0-9 write-back [0.00] A-B uncachable [0.00] C-CDFFF write-protect [0.00] CE000-E uncachable [0.00] F-F write-through [0.00] MTRR variable ranges enabled: [0.00] 0 base 0 mask F write-back [0.00] 1 base 0E000 mask FE000 uncachable [0.00] 2 base 1 mask F write-back [0.00] 3 base 1C000 mask FC000 uncachable [0.00] 4 base 1A000 mask FE000 uncachable [0.00] 5 base 0DFF0 mask 0 uncachable [0.00] 6 disabled [0.00] 7 disabled [0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 [0.00] e820: update [mem 0xdff0-0x] usable == reserved [0.00] e820: last_pfn = 0xdfee0 max_arch_pfn = 0x4 [0.00] found SMP MP-table at [mem 0x000f50d0-0x000f50df] mapped at [880f50d0] [0.00] Base memory trampoline at [88097000] 97000 size 24576 [0.00] init_memory_mapping: [mem 0x-0x000f] [0.00] [mem 0x-0x000f] page 4k [0.00] BRK [0x02004000, 0x02004fff] PGTABLE [0.00] BRK [0x02005000, 0x02005fff] PGTABLE [0.00] BRK [0x02006000, 0x02006fff] PGTABLE [0.00] init_memory_mapping: [mem 0x19fe0-0x19fff] [0.00] [mem 0x19fe0-0x19fff] page 2M [0.00] BRK [0x02007000, 0x02007fff] PGTABLE [0.00] init_memory_mapping: [mem 0x19c00-0x19fdf] [0.00] [mem 0x19c00-0x19fdf] page 2M [0.00] init_memory_mapping: [mem 0x18000-0x19bff] [0.00] [mem 0x18000-0x19bff] page 2M [0.00]
Re: Performance Issues
On Friday 19 September 2014, 13:18:34, Rob Spanton wrote:
> I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs.

Weeelll I have the same over-complicated kind of setup, and an Arch Linux BTRFS system which used to boot in some decent amount of time in the past now takes about 5 full minutes to just make it to the KDM login prompt, and another 5 minutes before KDE is fully started.

Makes me think of the good ole' times of Windows 95 OSR2 on a 486SX with a dying 1 GB hard disk...

Now, let me add that I had removed all snapshots, ran a full defrag, and even rebalanced the damn thing without any positive effect... (And yes, my HD is physically in good shape, SMART feels fully happy, and it's less than 75% full...)

I've been using BTRFS for 2-3 years on a dozen different systems, and if something doesn't surprise me at all, it's « slow performance », indeed, although I'm myself more accustomed to « incredibly fscking damn slow performance »...

HTH

--
Swâmi Petaramesh sw...@petaramesh.org http://petaramesh.org PGP 9076E32E
A man should not swallow more tall tales than he can digest. -- Henry Brooks Adams
Re: Performance Issues
On 2014-09-19 08:18, Rob Spanton wrote:
> Hi,
>
> I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.
>
> I've tried mounting with noatime, and this has had no effect. Anyone got any ideas?
>
> Here are the things that the wiki page asked for [1]:
>
> uname -a:
>
>     Linux zarniwoop.blob 3.16.2-200.fc20.x86_64 #1 SMP Mon Sep 8 11:54:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs --version:
>
>     Btrfs v3.16
>
> btrfs fi show:
>
>     Label: 'fedora'  uuid: 717c0a1b-815c-4e6a-86c0-60b921e84d75
>         Total devices 1 FS bytes used 1.49TiB
>         devid    1 size 2.72TiB used 1.50TiB path /dev/sda4
>
>     Btrfs v3.16
>
> btrfs fi df /:
>
>     Data, single: total=1.48TiB, used=1.48TiB
>     System, DUP: total=32.00MiB, used=208.00KiB
>     Metadata, DUP: total=11.50GiB, used=10.43GiB
>     unknown, single: total=512.00MiB, used=0.00
>
> dmesg dump is attached.
>
> Please CC any responses to me, as I'm not subscribed to the list.
>
> Cheers,
>
> Rob
>
> [1] https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list

WRT the performance of Evolution, the issue is probably fragmentation of the data files. If you run the command:

# btrfs fi defrag -rv /home

you should see some improvement in evolution performance (until you get any new mail that is). Evolution (like most graphical e-mail clients these days) uses sqlite for data storage, and sqlite database files are one of the known pathological cases for COW filesystems in general; the solution is to mark the files as NOCOW (see the info about VM images in [1] and [2], the same suggestions apply to database files).

As for git, I haven't seen any performance issues specific to BTRFS; are you using any compress= mount option? zlib based compression is known to cause serious slowdowns. I don't think that git uses any kind of database for data storage. Also, if the performance comparison is from other systems, unless those systems have the EXACT same hardware configuration, they aren't really a good comparison. Unless the PC this is on is a relatively recent system (less than a year or two old), it may just be hardware that is the performance bottleneck.
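The NOCOW suggestion above is normally applied with `chattr +C`; the same attribute can be set from a program through the inode-flags ioctls. The sketch below is only an illustration under stated assumptions: `create_nocow_file` is a hypothetical helper, and it deliberately creates a new, empty file first, on the understanding that btrfs only honours the flag for files that have no data yet (an existing database would have to be recreated or copied into such a file).

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Create `path` as a new, empty file and set the No_COW attribute on it,
 * roughly what "touch path && chattr +C path" would do.
 * Returns 0 on success, -1 on failure with errno set. On failure the
 * half-prepared file is removed again.
 */
int create_nocow_file(const char *path)
{
	int flags = 0;
	int saved_errno;
	int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

	if (fd < 0)
		return -1;

	/* Read the current inode flags, add FS_NOCOW_FL, write them back. */
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		goto fail;
	flags |= FS_NOCOW_FL;
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		goto fail;

	return close(fd);

fail:
	saved_errno = errno;	/* close()/unlink() may overwrite errno */
	close(fd);
	unlink(path);
	errno = saved_errno;
	return -1;
}
```

On filesystems that do not implement these ioctls or this flag the call is expected to fail, so the return value should be checked rather than assumed.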
Re: Performance Issues
On 2014-09-19 08:25, Swâmi Petaramesh wrote:
> On Friday 19 September 2014, 13:18:34, Rob Spanton wrote:
>> I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs.
>
> Weeelll I have the same over-complicated kind of setup, and an Arch Linux BTRFS system which used to boot in some decent amount of time in the past now takes about 5 full minutes to just make it to the KDM login prompt, and another 5 minutes before KDE is fully started.
>
> Makes me think of the good ole' times of Windows 95 OSR2 on a 486SX with a dying 1 GB hard disk...

Well, part of your problem might be KDE itself, it's extremely CPU intensive these days. I'd suggest disabling the 'semantic desktop' stuff, because that tends to be the worst offender as far as soaking up system resources. Also, if you recently switched to systemd, that may be causing some slowdown as well (journald's default settings are terrible for performance).

> Now, let me add that I had removed all snapshots, ran a full defrag, and even rebalanced the damn thing without any positive effect... (And yes, my HD is physically in good shape, SMART feels fully happy, and it's less than 75% full...)
>
> I've been using BTRFS for 2-3 years on a dozen different systems, and if something doesn't surprise me at all, it's « slow performance », indeed, although I'm myself more accustomed to « incredibly fscking damn slow performance »...

It's kind of funny, but I haven't had any performance issues with BTRFS since about 3.10, even on the systems my employer is using Fedora 20 on, and those use only a Core 2 Duo processor, DDR2-800 RAM, and SATA2 hard drives.

> HTH
Re: Performance Issues
On 2014-09-19 08:49, Austin S Hemmelgarn wrote:
> On 2014-09-19 08:18, Rob Spanton wrote:
>> Hi,
>>
>> I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.
>>
>> I've tried mounting with noatime, and this has had no effect. Anyone got any ideas?
>>
>> Here are the things that the wiki page asked for [1]:
>>
>> uname -a:
>>
>>     Linux zarniwoop.blob 3.16.2-200.fc20.x86_64 #1 SMP Mon Sep 8 11:54:45 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>>
>> btrfs --version:
>>
>>     Btrfs v3.16
>>
>> btrfs fi show:
>>
>>     Label: 'fedora'  uuid: 717c0a1b-815c-4e6a-86c0-60b921e84d75
>>         Total devices 1 FS bytes used 1.49TiB
>>         devid    1 size 2.72TiB used 1.50TiB path /dev/sda4
>>
>>     Btrfs v3.16
>>
>> btrfs fi df /:
>>
>>     Data, single: total=1.48TiB, used=1.48TiB
>>     System, DUP: total=32.00MiB, used=208.00KiB
>>     Metadata, DUP: total=11.50GiB, used=10.43GiB
>>     unknown, single: total=512.00MiB, used=0.00
>>
>> dmesg dump is attached.
>>
>> Please CC any responses to me, as I'm not subscribed to the list.
>>
>> Cheers,
>>
>> Rob
>>
>> [1] https://btrfs.wiki.kernel.org/index.php/Btrfs_mailing_list
>
> WRT the performance of Evolution, the issue is probably fragmentation of the data files. If you run the command:
>
> # btrfs fi defrag -rv /home
>
> you should see some improvement in evolution performance (until you get any new mail that is). Evolution (like most graphical e-mail clients these days) uses sqlite for data storage, and sqlite database files are one of the known pathological cases for COW filesystems in general; the solution is to mark the files as NOCOW (see the info about VM images in [1] and [2], the same suggestions apply to database files).
>
> As for git, I haven't seen any performance issues specific to BTRFS; are you using any compress= mount option? zlib based compression is known to cause serious slowdowns. I don't think that git uses any kind of database for data storage. Also, if the performance comparison is from other systems, unless those systems have the EXACT same hardware configuration, they aren't really a good comparison. Unless the PC this is on is a relatively recent system (less than a year or two old), it may just be hardware that is the performance bottleneck.

Realized after I sent this that I forgot the links for [1] and [2]:

[1] https://btrfs.wiki.kernel.org/index.php/UseCases
[2] https://btrfs.wiki.kernel.org/index.php/FAQ
Re: kernel integration branch updated
On 09/18/2014 09:45 PM, Qu Wenruo wrote:
> Hi Chris,
>
> I'm sorry that the commit 'btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map' has a V2 patch, so the one in tree is not up-to-date.
>
> Although the v2 change is quite small and it's relevantly dependent, so it should not be a pain change.

Thanks, please send the v2 as an incremental. We'll send both to stable.

-chris
Re: [PATCH] Revert Btrfs: device_list_add() should not update list when
On 18/09/14 16:00, Chris Mason wrote:
> Johannes and Sam, could you please confirm this patch fixes your mount regression for now? Anand, please make sure I kept the generation check properly.

I've just tested this patch on top of 3.17-rc5 and it fixes the issue for me. Thanks!

Sam

--
Sam Thursfield, Codethink Ltd.
Office telephone: +44 161 236 5575
Re: Performance Issues
On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
> I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.

This is - unfortunately - a particular btrfs oddity/characteristic/flaw, whatever you want to call it. git relies a lot on fast stat() calls, and those seem to be particularly slow with btrfs esp. on rotational media. I have the same problem with rsync on a freshly mounted volume; it gets fast (quite so!) after the first run.

The simplest thing to fix this is a du -s >/dev/null to pre-cache all file inodes.

I'd also love a technical explanation why this happens and how it could be fixed. Maybe it's just a consequence of how the metadata tree(s) are laid out on disk.

> I've tried mounting with noatime, and this has had no effect. Anyone got any ideas?

Don't drop the caches :-)

-h
[PATCH] Btrfs: cleanup error handling in build_backref_tree
When balance panics it tends to panic in the

BUG_ON(!upper->checked);

test, because it means it couldn't build the backref tree properly. This is annoying to users and frankly a recoverable error, nothing in this function is actually fatal since it is just an in-memory building of the backrefs for a given bytenr. So go through and change all the BUG_ON()'s to ASSERT()'s, and fix the BUG_ON(!upper->checked) thing to just return an error.

This patch also fixes the error handling so it tears down the work we've done properly. This code was horribly broken since we always just panic'ed instead of actually erroring out, so it needed to be completely re-worked. With this patch my broken image no longer panics when I mount it. Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/relocation.c | 88 ++-
 1 file changed, 59 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 2d221c4..19726af 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -736,7 +736,8 @@ again:
 			err = ret;
 			goto out;
 		}
-		BUG_ON(!ret || !path1->slots[0]);
+		ASSERT(ret);
+		ASSERT(path1->slots[0]);
 
 		path1->slots[0]--;
 
@@ -746,10 +747,10 @@ again:
 			 * the backref was added previously when processing
 			 * backref of type BTRFS_TREE_BLOCK_REF_KEY
 			 */
-			BUG_ON(!list_is_singular(&cur->upper));
+			ASSERT(list_is_singular(&cur->upper));
 			edge = list_entry(cur->upper.next, struct backref_edge,
 					  list[LOWER]);
-			BUG_ON(!list_empty(&edge->list[UPPER]));
+			ASSERT(list_empty(&edge->list[UPPER]));
 			exist = edge->node[UPPER];
 			/*
 			 * add the upper level block to pending list if we need
@@ -831,7 +832,7 @@ again:
 			cur->cowonly = 1;
 		}
 #else
-		BUG_ON(key.type == BTRFS_EXTENT_REF_V0_KEY);
+		ASSERT(key.type != BTRFS_EXTENT_REF_V0_KEY);
 		if (key.type == BTRFS_SHARED_BLOCK_REF_KEY) {
 #endif
 			if (key.objectid == key.offset) {
@@ -840,7 +841,7 @@ again:
 				 * backref of this type.
 				 */
 				root = find_reloc_root(rc, cur->bytenr);
-				BUG_ON(!root);
+				ASSERT(root);
 				cur->root = root;
 				break;
 			}
@@ -868,7 +869,7 @@ again:
 			} else {
 				upper = rb_entry(rb_node, struct backref_node,
 						 rb_node);
-				BUG_ON(!upper->checked);
+				ASSERT(upper->checked);
 				INIT_LIST_HEAD(&edge->list[UPPER]);
 			}
 			list_add_tail(&edge->list[LOWER], &cur->upper);
@@ -892,7 +893,7 @@ again:
 
 		if (btrfs_root_level(&root->root_item) == cur->level) {
 			/* tree root */
-			BUG_ON(btrfs_root_bytenr(&root->root_item) !=
+			ASSERT(btrfs_root_bytenr(&root->root_item) ==
 			       cur->bytenr);
 			if (should_ignore_root(root))
 				list_add(&cur->list, &useless);
@@ -927,7 +928,7 @@ again:
 		need_check = true;
 		for (; level < BTRFS_MAX_LEVEL; level++) {
 			if (!path2->nodes[level]) {
-				BUG_ON(btrfs_root_bytenr(&root->root_item) !=
+				ASSERT(btrfs_root_bytenr(&root->root_item) ==
 				       lower->bytenr);
 				if (should_ignore_root(root))
 					list_add(&lower->list, &useless);
@@ -982,7 +983,7 @@ again:
 			} else {
 				upper = rb_entry(rb_node, struct backref_node,
 						 rb_node);
-				BUG_ON(!upper->checked);
+				ASSERT(upper->checked);
 				INIT_LIST_HEAD(&edge->list[UPPER]);
 				if (!upper->owner)
 					upper->owner = btrfs_header_owner(eb);
@@ -1026,7 +1027,7 @@ next:
 	 * everything goes well, connect backref nodes and insert backref nodes
 	 * into the cache.
 	 */
-	BUG_ON(!node->checked);
+	ASSERT(node->checked);
 	cowonly = node->cowonly;
 	if (!cowonly) {
 		rb_node = tree_insert(&cache->rb_root, node->bytenr,
@@ -1062,8 +1063,21 @@ next:
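The pattern the commit message describes, returning an error and tearing down partial work instead of panicking, can be illustrated outside the kernel. The sketch below is an analogy under stated assumptions, not the relocation code: `struct chain_node`, `build_chain` and `free_chain` are hypothetical, an unchecked node plays the role of the `!upper->checked` case, and the choice of `-EIO` as the error code is an assumption since the relevant hunk is not shown here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct chain_node {
	int checked;
	struct chain_node *next;
};

/* Free every node of a chain built by build_chain(). */
void free_chain(struct chain_node *head)
{
	while (head) {
		struct chain_node *next = head->next;

		free(head);
		head = next;
	}
}

/*
 * Build an in-memory chain from `checked[0..n)`. An unchecked node used to
 * be the kind of condition guarded by BUG_ON(); here it becomes an error
 * return, and everything allocated so far is released before returning.
 * On success *out holds the chain; on failure *out is NULL.
 */
int build_chain(const int *checked, size_t n, struct chain_node **out)
{
	struct chain_node *head = NULL;
	int err = 0;
	size_t i;

	*out = NULL;
	for (i = 0; i < n; i++) {
		struct chain_node *node = malloc(sizeof(*node));

		if (!node) {
			err = -ENOMEM;
			goto unwind;
		}
		node->checked = checked[i];
		node->next = head;
		head = node;

		if (!node->checked) {
			err = -EIO;	/* recoverable: report it instead of panicking */
			goto unwind;
		}
	}

unwind:
	if (err) {
		free_chain(head);	/* tear down the partial work */
		return err;
	}
	*out = head;
	return 0;
}
```

The single `unwind` label keeps the cleanup in one place, so adding another failure case later only needs another `goto`.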
Re: Performance Issues
On 2014-09-19 09:51, Holger Hoffstätte wrote:
> On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
>> I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.
>
> This is - unfortunately - a particular btrfs oddity/characteristic/flaw, whatever you want to call it. git relies a lot on fast stat() calls, and those seem to be particularly slow with btrfs esp. on rotational media. I have the same problem with rsync on a freshly mounted volume; it gets fast (quite so!) after the first run.

I find that kind of funny, because regardless of filesystem, stat() is one of the *slowest* syscalls on almost every *nix system in existence.

> The simplest thing to fix this is a du -s >/dev/null to pre-cache all file inodes.
>
> I'd also love a technical explanation why this happens and how it could be fixed. Maybe it's just a consequence of how the metadata tree(s) are laid out on disk.

While I don't know for certain, I think it's largely just a side effect of the lack of performance tuning in the BTRFS code.

>> I've tried mounting with noatime, and this has had no effect. Anyone got any ideas?
>
> Don't drop the caches :-)
>
> -h
Re: [PATCH] xfstests: remove check_scratch_fs in btrfs/012
On 09/02/2014 11:25 PM, Liu Bo wrote:
> From: Liu Bo liub.li...@gmail.com
>
> btrfs/012 is a case to verify btrfs-convert feature, it converts an ext4 to btrfs firstly and do something, then rolls back to ext4. So at last we have a ext4 on the scratch device, but setting _require_scratch will force a btrfsck on a ext4 fs because $FSTYP here is btrfs, and it ends up with a failure report of _check_btrfs_filesystem.
>
> Now that we have deliberately check the final ext4 fs in btrfs/012, just do not set _require_scratch in this case.
>
> Signed-off-by: Liu Bo liub.li...@gmail.com

I sent a patch for this already, it's on the fs-tests list. Thanks,

Josef
Re: Performance Issues
On 09/19/2014 08:18 AM, Rob Spanton wrote:
> Hi,
>
> I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.

Weird, I get the exact opposite performance. Anyway it's probably because of your file layouts, try defragging your git dir and see if that helps. Thanks,

Josef
Re: [PATCH v2 0/2] Move BTRFS RCU string to common library
On Fri, Sep 19, 2014 at 02:01:28AM -0700, Omar Sandoval wrote:
> This patch series moves the generic RCU string library used internally by BTRFS to be accessible by anyone. It provides printk_in_rcu and printk_ratelimited_in_rcu to print these strings. In order to avoid a weird inconsistency between the two, the first patch fixes printk_ratelimited so it passes on the return value from printk. The second patch actually moves the RCU string library.
>
> Version 2 passes on the return values from printk{,_ratelimited} and fixes some style issues.
>
> Omar Sandoval (2):

For the series:

Acked-by: Paul E. McKenney paul...@linux.vnet.ibm.com

>   Return a value from printk_ratelimited
>   Move BTRFS RCU string to common library
>
>  fs/btrfs/check-integrity.c |  6 +--
>  fs/btrfs/dev-replace.c     | 19 +-
>  fs/btrfs/disk-io.c         |  6 +--
>  fs/btrfs/extent_io.c       |  4 +-
>  fs/btrfs/ioctl.c           |  4 +-
>  fs/btrfs/raid56.c          |  2 +-
>  fs/btrfs/rcu-string.h      | 56
>  fs/btrfs/scrub.c           | 15
>  fs/btrfs/super.c           |  2 +-
>  fs/btrfs/volumes.c         | 14 +++
>  include/linux/printk.h     |  4 +-
>  include/linux/rcustring.h  | 91 ++
>  12 files changed, 131 insertions(+), 92 deletions(-)
>  delete mode 100644 fs/btrfs/rcu-string.h
>  create mode 100644 include/linux/rcustring.h
>
> --
> 2.1.0
Re: [PATCH v2 0/2] Move BTRFS RCU string to common library
On Fri, Sep 19, 2014 at 11:47:53AM -0400, Chris Mason wrote:
> On 09/19/2014 11:45 AM, Paul E. McKenney wrote:
>> On Fri, Sep 19, 2014 at 02:01:28AM -0700, Omar Sandoval wrote:
>>> This patch series moves the generic RCU string library used internally by BTRFS to be accessible by anyone. It provides printk_in_rcu and printk_ratelimited_in_rcu to print these strings. In order to avoid a weird inconsistency between the two, the first patch fixes printk_ratelimited so it passes on the return value from printk. The second patch actually moves the RCU string library.
>>>
>>> Version 2 passes on the return values from printk{,_ratelimited} and fixes some style issues.
>>>
>>> Omar Sandoval (2):
>>
>> For the series:
>>
>> Acked-by: Paul E. McKenney paul...@linux.vnet.ibm.com
>
> Fine by me too, Paul, do you want to merge it in?

I would be happy to. Are you thinking in terms of 3.18 or 3.19? These look OK either way, but thought I should check.

Thanx, Paul
Re: [PATCH v2 0/2] Move BTRFS RCU string to common library
On 09/19/2014 12:05 PM, Paul E. McKenney wrote:
> On Fri, Sep 19, 2014 at 11:47:53AM -0400, Chris Mason wrote:
>> On 09/19/2014 11:45 AM, Paul E. McKenney wrote:
>>> On Fri, Sep 19, 2014 at 02:01:28AM -0700, Omar Sandoval wrote:
>>>> This patch series moves the generic RCU string library used internally by BTRFS to be accessible by anyone. It provides printk_in_rcu and printk_ratelimited_in_rcu to print these strings. In order to avoid a weird inconsistency between the two, the first patch fixes printk_ratelimited so it passes on the return value from printk. The second patch actually moves the RCU string library.
>>>>
>>>> Version 2 passes on the return values from printk{,_ratelimited} and fixes some style issues.
>>>>
>>>> Omar Sandoval (2):
>>>
>>> For the series:
>>>
>>> Acked-by: Paul E. McKenney paul...@linux.vnet.ibm.com
>>
>> Fine by me too, Paul, do you want to merge it in?
>
> I would be happy to. Are you thinking in terms of 3.18 or 3.19? These look OK either way, but thought I should check.

Either way is fine with me. Actually this will have minor conflicts with my current branch headed for-next, so I can resolve and send as a stand alone pull.

-chris
Re: Performance Issues
On Fri, 19 Sep 2014 10:53:03 -0400, Austin S Hemmelgarn wrote:
> I find that kind of funny, because regardless of filesystem, stat() is one of the *slowest* syscalls on almost every *nix system in existence.

Sure. I didn't mean to imply that stat() in its various incarnations is fast by itself, just that git relies a lot on it since it necessarily needs to look at every file.

-h
Re: Performance Issues
Hi,

Thanks for the response everyone.

I wrote:
> I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.

The evolution problem has been improved: the sqlite db that it was using had over 18000 fragments, so I got evolution to recreate that file with nocow set. It now takes only 30s to load my mail rather than 80s, which is better...

On Fri, 2014-09-19 at 11:05 -0400, Josef Bacik wrote:
> Weird, I get the exact opposite performance. Anyway it's probably because of your file layouts, try defragging your git dir and see if that helps. Thanks,

Defragging has improved matters a bit: it now takes 26s (was 46s) to run git status. Still not amazing, but at the moment I have no evidence to suggest that it's not something to do with the machine's hardware. If I get time over the weekend I'll dig out an external hard disk and try a couple of benchmarks with that.

For reference, these are the mount flags:

/dev/sda4 on / type btrfs (rw,noatime,space_cache)
/dev/sda4 on /home type btrfs (rw,noatime,space_cache)

Cheers,

Rob
Re: Problem with unmountable filesystem.
Possibly btrfs-select-super can do some of the things I was doing the hard way. It's possible to select a super to overwrite other supers, even if they're good ones. Whereas btrfs rescue super-recover won't do that, and neither will btrfsck, hence why I corrupted the one I didn't want first. This command isn't built by default (at least not on Fedora).

Chris
Re: Help for creating a useful bugreport
On Sep 19, 2014, at 2:58 AM, Jakob Breier jakob.bre...@rwth-aachen.de wrote:
> Unfortunately I don't have much to work with. Can you help me with extracting enough information to create a useful bugreport?

What storage device(s)? Include results from

# btrfs check

And also a note whether you get different results with -s1, -s2, -s3 (how many backup superblocks you have depends on file system size so some of those might not work).

Since it won't mount you can't get fi df, but provide that info if you can, so we know whether, e.g., the metadata is single (the default on SSD) or DUP.

Was it created with btrfs-progs 3.16, and has it only been written to with kernel 3.16 or other kernels also?

If you can use btrfs-image per the wiki, and keep the image around, it might come in handy for a Btrfs developer.

> Sep 19 10:16:18 localhost.localdomain kernel: parent transid verify failed on 46678016 wanted 4923 found 3306
> Sep 19 10:16:18 localhost.localdomain kernel: parent transid verify failed on 46678016 wanted 4923 found 3306

These messages come up often on the list. The notes written in disk-io.c say this:

 * we can't consider a given block up to date unless the transid of the
 * block matches the transid in the parent node's pointer. This is how we
 * detect blocks that either didn't get written at all or got written
 * in the wrong place.

I don't know whether this definitely means hardware related problems of some sort, but it sounds suspiciously like that because blocks should get written in the correct place. Right? But they didn't.

> Sep 19 10:16:18 localhost.localdomain kernel: BTRFS: Failed to read block groups: -5

This came up in a recent thread Problem with a filesystem. I'm not sure what it means.

Once you've taken the btrfs-image, and you're about ready to toss the file system it's worth trying these commands.

btrfs rescue super-recover -v              ## if it fixes anything, don't continue, try to mount the fs
btrfs check --repair                       ## I'd try mounting even if it doesn't say it's repaired anything
btrfs check --repair --init-extent-tree    ## Again try to mount the fs

And report kernel and user space messages.

Chris Murphy
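The rule quoted from disk-io.c amounts to comparing two numbers: the transid recorded in the parent's pointer and the one found in the block's own header. The following is a simplified userspace sketch of that comparison, not the kernel function; `check_parent_transid` is a hypothetical name, and the message format is copied from the syslog lines in this thread. The `-EIO` return is the same value as the `-5` seen in the "Failed to read block groups" line.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Compare the transid a parent node expects for a block (`wanted`) with the
 * transid stored in the block itself (`found`).
 * Returns 0 when they match. On mismatch, formats a message into `msg`
 * and returns -EIO, since the block cannot be considered up to date.
 */
int check_parent_transid(uint64_t bytenr, uint64_t wanted, uint64_t found,
			 char *msg, size_t len)
{
	if (len > 0)
		msg[0] = '\0';
	if (wanted == found)
		return 0;

	snprintf(msg, len,
		 "parent transid verify failed on %" PRIu64
		 " wanted %" PRIu64 " found %" PRIu64,
		 bytenr, wanted, found);
	return -EIO;
}
```

With the values from the log (block 46678016, wanted 4923, found 3306) the check fails, which matches a block whose newest version either never reached the disk or was written somewhere else.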
Re: [PATCH v2 1/2] Return a value from printk_ratelimited
On Fri, 19 Sep 2014 02:01:29 -0700 Omar Sandoval osan...@osandov.com wrote:

printk returns an integer; there's no reason for printk_ratelimited to swallow it.

Signed-off-by: Omar Sandoval osan...@osandov.com
---
 include/linux/printk.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index d78125f..67534bc 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -343,12 +343,14 @@ extern asmlinkage void dump_stack(void) __cold;
 #ifdef CONFIG_PRINTK
 #define printk_ratelimited(fmt, ...) \
 ({ \
+ int __ret = 0; \

My only issue is with the __ret name. It's not really unique enough. If something else uses __ret and does

printk_ratelimit("some fmt string %d\n", __ret);

this will not print the right value. printk_ratelimit can be used almost anywhere, thus using a really unique name may be worthwhile here. What about: int __r ?

-- Steve

 static DEFINE_RATELIMIT_STATE(_rs, \
 DEFAULT_RATELIMIT_INTERVAL, \
 DEFAULT_RATELIMIT_BURST); \
 \
 if (__ratelimit(&_rs)) \
- printk(fmt, ##__VA_ARGS__); \
+ __ret = printk(fmt, ##__VA_ARGS__); \
+ __ret; \
 })
 #else
 #define printk_ratelimited(fmt, ...) \
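The naming concern raised in this review can be reproduced in user space. The sketch below is an assumption-laden stand-in, not the kernel macro: it drops the rate limiting, uses snprintf() in place of printk() so the result can be inspected, and relies on GNU C statement expressions. The macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Stand-in for the patched macro. Names starting with "__" are reserved
 * outside the kernel and are used here only to mirror the patch.
 */
#define FORMAT_WITH_RET(buf, size, fmt, ...)                 \
({                                                           \
    int __ret = 0;                                           \
    __ret = snprintf(buf, size, fmt, ##__VA_ARGS__);         \
    __ret;                                                   \
})

/* Same macro, but with a local name callers are unlikely to use. */
#define FORMAT_WITH_UNIQUE(buf, size, fmt, ...)              \
({                                                           \
    int __format_with_unique_ret = 0;                        \
    __format_with_unique_ret =                               \
        snprintf(buf, size, fmt, ##__VA_ARGS__);             \
    __format_with_unique_ret;                                \
})

/* The caller's __ret (42) is hidden by the macro's own __ret (0). */
static int format_shadowed(char *buf, size_t size)
{
    int __ret = 42;

    (void)__ret; /* the macro below never sees this variable */
    assert(buf != NULL && size > 0);
    return FORMAT_WITH_RET(buf, size, "%d", __ret);
}

/* With a distinct local name, the caller's __ret reaches snprintf(). */
static int format_unique(char *buf, size_t size)
{
    int __ret = 42;

    assert(buf != NULL && size > 0);
    return FORMAT_WITH_UNIQUE(buf, size, "%d", __ret);
}
```

In the first helper the formatted text is "0" rather than "42", because the argument expands inside the statement expression where the macro's own variable is in scope. A longer, macro-specific local name makes such a collision less likely, though no name is strictly collision-proof.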
Re: Problem with unmountable filesystem.
On 2014-09-19 13:07, Chris Murphy wrote: Possibly btrfs-select-super can do some of the things I was doing the hard way. It's possible to select a super to overwrite other supers, even if they're good ones. Whereas btrfs rescue super-recover won't do that, and neither will btrfsck, hence why I corrupted the one I didn't want first. This command isn't built by default (at least not on Fedora). I don't think it's built by default on any of the major distributions. On Gentoo you need to set package specific configure options.
Re: Performance Issues
On 09/19/2014 11:51 AM, Rob Spanton wrote:

Hi, Thanks for the response everyone. I wrote:

I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.

The evolution problem has been improved: the sqlite db that it was using had over 18000 fragments, so I got evolution to recreate that file with nocow set. It now takes only 30s to load my mail rather than 80s, which is better...

On Fri, 2014-09-19 at 11:05 -0400, Josef Bacik wrote:

Weird, I get the exact opposite performance. Anyway it's probably because of your file layouts, try defragging your git dir and see if that helps. Thanks,

Defragging has improved matters a bit: it now takes 26s (was 46s) to run git status. Still not amazing, but at the moment I have no evidence to suggest that it's not something to do with the machine's hardware. If I get time over the weekend I'll dig out an external hard disk and try a couple of benchmarks with that.

For reference, these are the mount flags:

/dev/sda4 on / type btrfs (rw,noatime,space_cache)
/dev/sda4 on /home type btrfs (rw,noatime,space_cache)

You have an awful lot of metadata, do you have a lot of snapshots? Also I'd be interested in making sure most of this is just from shitty metadata layout, could you make sure you have a recent version of trace-cmd and then drop caches and do

trace-cmd record -e sched:sched_switch git status

and send me the trace.dat so I can see where all the time is spent? Thanks,

Josef
Re: Performance Issues
On Fri, Sep 19, 2014 at 01:51:22PM +, Holger Hoffstätte wrote:

On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:

I have a particularly uncomplicated setup (a desktop PC with a hard disk) and I'm seeing particularly slow performance from btrfs. A `git status` in the linux source tree takes about 46 seconds after dropping caches, whereas on other machines using ext4 this takes about 13s. My mail client (evolution) also seems to perform particularly poorly on this setup, and my hunch is that it's spending a lot of time waiting on the filesystem.

This is - unfortunately - a particular btrfs oddity/characteristic/flaw, whatever you want to call it. git relies a lot on fast stat() calls, and those seem to be particularly slow with btrfs, esp. on rotational media. I have the same problem with rsync on a freshly mounted volume; it gets fast (quite so!) after the first run. The simplest thing to fix this is a du -s > /dev/null to pre-cache all file inodes. I'd also love a technical explanation why this happens and how it could be fixed. Maybe it's just a consequence of how the metadata tree(s) are laid out on disk.

There's a lot of meat behind that "just a consequence" but, yes, that's the heart of it. Different metadata designs result in different io patterns which single rotating drives are exquisitely sensitive to. You can look for differences in io patterns with iostat, blktrace, iowatcher, etc. They'll show differences in io sizes, concurrency, locality, and often differences in the amount of blocks of data read.

http://masoncoding.com/iowatcher/

As for fixing it, well, it's arguably working as intended. If you turned btrfs from one cow tree into lots of journaled trees of trees then, well, we'd be left with an absurd reimplementation of ext*|xfs.

- z
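The pre-caching trick mentioned in this thread works because du has to stat every file it visits. A rough C equivalent is sketched below; the function name is hypothetical, and the code only shows the access pattern (one lstat() per entry), not anything btrfs-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Recursively lstat() everything under `path` so the kernel reads (and
 * caches) each inode. Returns the number of entries that were stat'ed
 * successfully, counting `path` itself.
 */
static long warm_inode_cache(const char *path)
{
    struct stat st;
    long count;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return 0;
    count = 1;
    if (!S_ISDIR(st.st_mode))
        return count;

    DIR *dir = opendir(path);
    if (dir == NULL)
        return count;

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char child[PATH_MAX];
        int len;

        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        len = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (len < 0 || (size_t)len >= sizeof child)
            continue; /* path too long for this sketch; skip it */
        count += warm_inode_cache(child);
    }
    closedir(dir);
    return count;
}
```

Calling warm_inode_cache(".") from the top of a source tree before the first git status should have roughly the same warming effect as the du run, since both simply touch every inode once.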
Re: Problem with unmountable filesystem.
On Sep 17, 2014, at 5:23 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote: [ 30.920536] BTRFS: bad tree block start 0 130402254848 [ 30.924018] BTRFS: bad tree block start 0 130402254848 [ 30.926234] BTRFS: failed to read log tree [ 30.953055] BTRFS: open_ctree failed I'm still confused. Btrfs knows this tree root is bad, but it has backup roots. So why wasn't one of those used by -o recovery? I thought that's the whole point of that mount option. Backup tree roots are per superblock, so conceivably you'd have up to 8 of these with two superblocks, they're shown with btrfs-show-super -af ## and -F even if a super is bad But skipping that, to fix this you need to know which super is pointing to the wrong tree root, since you're using ssd mount option with rotating supers. I assume mount uses the super with the highest generation number. So you'd need to: btrfs-show-super -a to find out the super with the most recent generation. You'd assume that one was wrong. And then use btrfs-select-super to pick the right one, and replace the wrong one. Then you could mount. I also wonder if btrfs check -sX would show different results in your case. I'd think it would because it ought to know one of those tree roots is bad, seeing as mount knows it. And then it seems (I'm speculating a ton) that --repair might try to fix the bad tree root, and then if it fails I'd like to think it can just find the most recent good tree root, ideally one listed as a backup_tree_root by any good superblock, and then have the next mount use that. I'm not sure why this persistently fails, and I wonder if there are cases of users giving up and blowing away file systems that could actually be mountable. But it's just really a manual process figuring out what things to do in what order to get them to mount. 
Chris Murphy
Re: [PATCH v2 1/2] Return a value from printk_ratelimited
On Fri, 2014-09-19 at 13:21 -0400, Steven Rostedt wrote: On Fri, 19 Sep 2014 02:01:29 -0700 Omar Sandoval osan...@osandov.com wrote: printk returns an integer; there's no reason for printk_ratelimited to swallow it. Except for the lack of usefulness of the return value itself. See: https://lkml.org/lkml/2009/10/7/275
Re: Problem with unmountable filesystem.
On 2014-09-19 13:54, Chris Murphy wrote: On Sep 17, 2014, at 5:23 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote: [ 30.920536] BTRFS: bad tree block start 0 130402254848 [ 30.924018] BTRFS: bad tree block start 0 130402254848 [ 30.926234] BTRFS: failed to read log tree [ 30.953055] BTRFS: open_ctree failed I'm still confused. Btrfs knows this tree root is bad, but it has backup roots. So why wasn't one of those used by -o recovery? I thought that's the whole point of that mount option. Backup tree roots are per superblock, so conceivably you'd have up to 8 of these with two superblocks, they're shown with btrfs-show-super -af ## and -F even if a super is bad But skipping that, to fix this you need to know which super is pointing to the wrong tree root, since you're using ssd mount option with rotating supers. I assume mount uses the super with the highest generation number. So you'd need to: btrfs-show-super -a to find out the super with the most recent generation. You'd assume that one was wrong. And then use btrfs-select-super to pick the right one, and replace the wrong one. Then you could mount. I also wonder if btrfs check -sX would show different results in your case. I'd think it would because it ought to know one of those tree roots is bad, seeing as mount knows it. And then it seems (I'm speculating a ton) that --repair might try to fix the bad tree root, and then if it fails I'd like to think it can just find the most recent good tree root, ideally one listed as a backup_tree_root by any good superblock, and then have the next mount use that. I'm not sure why this persistently fails, and I wonder if there are cases of users giving up and blowing away file systems that could actually be mountable. But it's just really a manual process figuring out what things to do in what order to get them to mount. From what I can tell, btrfs check doesn't do anything about backup superblocks unless you specifically tell it to. 
In this case, running btrfs check without specifying a superblock mirror, and explicitly specifying the primary superblock, produced identical results (namely it choked, hard, with an error message similar to that from the kernel). However, running it with -s1 to select the first backup superblock returned no errors at all other than the space_cache being invalid and the count of used blocks being wrong. Based on my (limited) understanding of the mount code, it does try to use the superblock with the highest generation (regardless of whether we are on an ssd or not), but doesn't properly fall back to a secondary superblock after trying to mount using the primary. As far as btrfs check --repair trying to fix this, I don't think that it does so currently, probably for the same reason that mount fails.
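The fallback that this thread wishes for (prefer the newest superblock, but skip copies whose tree root cannot be read) could look roughly like the sketch below. It is purely illustrative: the struct, its fields and the function are hypothetical and do not reflect how mount or btrfs check are actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one superblock copy. */
struct super_copy {
    uint64_t generation;     /* generation recorded in this copy */
    bool     root_readable;  /* could the tree root it points to be read? */
};

/*
 * Return the index of the usable copy with the highest generation,
 * or -1 if no copy points at a readable tree root.
 */
static int pick_super(const struct super_copy *copies, size_t n)
{
    int best = -1;

    assert(copies != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!copies[i].root_readable)
            continue; /* newest is not enough; it must also be usable */
        if (best < 0 || copies[i].generation > copies[best].generation)
            best = (int)i;
    }
    return best;
}
```

In the case reported here, the primary copy would be skipped as unusable and the first backup (the -s1 copy) would be chosen instead.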
Re: Single disk parrallelization
On 2014-09-19 14:10, Jeb Thomson wrote:

With the advanced features of btrfs, it would be an additional simple task to make different platters run in parallel. In this case, say a disk has three platters, and so three seek heads as well. If we can identify that much, and what offsets they are at, it then becomes a trivial matter to place the reads and writes to different platters at the same time. In effect, this means each platter should be operating as a single virtualized unit, instead of one single unit...

In theory this is a great idea except for two things:
1) Most consumer drives have only one platter.
2) The kernel doesn't have such low-level hardware access, so it would have to be implemented in device firmware (and I'd be willing to bet that most drive manufacturers already stripe data across multiple platters when possible).
[PATCH] Btrfs: fix build_backref_tree issue with multiple shared blocks
Marc Merlin sent me a broken fs image months ago where it would blow up in the upper->checked BUG_ON() in build_backref_tree. This is because we had a scenario like this

block a -- level 4 (not shared)
   |
block b -- level 3 (reloc block, shared)
   |
block c -- level 2 (not shared)
   |
block d -- level 1 (shared)
   |
block e -- level 0 (shared)

We go to build a backref tree for block e, we notice block d is shared and add it to the list of blocks to look up its backrefs for. Now when we loop around we will check edges for the block, so we will see we looked up block c last time. So we look up block d and then see that the block that points to it is block c and we can just skip that edge since we've already been up this path. The problem is because we clear need_check when we see block d (as it is shared) we never add block b as needing to be checked. And because block c is in our path already we bail out before we walk up to block b and add it to the backref check list.

To fix this we need to reset need_check if we trip over a block that doesn't need to be checked. This will make sure that any subsequent blocks in the path as we're walking up afterwards are added to the list to be processed. With this patch I can now mount Marc's fs image and it'll complete the balance without panicking.
Thanks,

Reported-by: Marc MERLIN m...@merlins.org
Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/relocation.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 19726af..b55ea37 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -978,8 +978,11 @@ again:
 need_check = false;
 list_add_tail(&edge->list[UPPER], &list);
- } else
+ } else {
+ if (upper->checked)
+ need_check = true;
 INIT_LIST_HEAD(&edge->list[UPPER]);
+ }
 } else {
 upper = rb_entry(rb_node, struct backref_node, rb_node);
--
1.8.3.1
[GIT PULL] Btrfs fixes
Hi Linus,

We have two more fixes for pulling:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

I've got a revert to fix a regression with btrfs device registration, and Filipe has part two of his fsync fix from last week.

Chris Mason (1) commits (+6/-7):
    Revert "Btrfs: device_list_add() should not update list when mounted"

Filipe Manana (1) commits (+13/-14):
    Btrfs: set inode's logged_trans/last_log_commit after ranged fsync

Total: (2) commits (+19/-21)

 fs/btrfs/btrfs_inode.h | 13 +++--
 fs/btrfs/tree-log.c    | 14 ++
 fs/btrfs/volumes.c     | 13 ++---
 3 files changed, 19 insertions(+), 21 deletions(-)
Re: general thoughts and questions + general and RAID5/6 stability?
Hey guys...

I was just crawling through the wiki and this list's archive to find answers to some questions. Actually, many of them match those that Christoph asked here some time ago, though it seems no answers came up at all. Isn't it possible to answer them, at least one by one? I'd believe that most of these questions and their answers would be of common interest, and having them properly answered should be a benefit for all possible btrfs users.

Regards, William.

On Sun, 2014-08-31 at 06:02 +0200, Christoph Anton Mitterer wrote:

Hey. For some time now I've been considering using btrfs at a larger scale, basically in two scenarios:

a) As the backend for data pools handled by dcache (dcache.org), where we run a Tier-2 in the higher PiB range for the LHC Computing Grid... For now that would be a rather boring use of btrfs (i.e. not really using any of its advanced features), and RAID functionality would still be provided by hardware (at least with the current hardware generations we have in use).

b) Personally, for my NAS. Here the main goal is less performance but rather data safety (i.e. I want something like RAID6 or better), security (i.e. it will be on top of dm-crypt/LUKS) and integrity. Hardware-wise I'll use a UPS as well as enterprise SATA disks, from different vendors or different production lots. (Of course I'm aware that btrfs is experimental, and I would have regular backups.)

1) Now I've followed linux-btrfs for a while and blogs like Marc's... and I still read about a lot of stability problems, some of which sound quite serious. Sure we have a fsck now, but even in the wiki one can read statements like "the developers use it on their systems without major problems"... but also "if you do this, it could help you... or break even more". I mean I understand that there won't be a single point in time where Chris Mason says "now it's stable" and it would be rock solid from that point on... but especially since new features (e.g.
things like subvolume quota groups, online/offline dedup, online/offline fsck) move (or will move) in with every new version... one has (as an end-user) basically no chance to determine what can be used safely and what tickles the devil. So one issue I have is to determine the general stability of the different parts.

2) Documentation status... I feel that some general and extensive documentation is missing. One that basically handles (and teaches) all the things which are specific to modern (especially CoW) filesystems.
- General design, features and problems of CoW and btrfs
- Special situations that arise from the CoW, e.g. that one may not be able to remove files once the fs is full,... or that just reading files could make the used space grow (via the atime)
- General guidelines when and how to use nodatacow... i.e. telling people for which kinds of files this SHOULD usually be done (VM images)... and what this means for those files (no checksumming) and what the drawbacks are if it's not used (e.g. if people insist on having the checksumming - what happens to the performance of VM images? what about the wear with SSDs?)
- the implications of things like compression and hash algos... whether and when this will have performance impacts (positive or negative) and when not.
- the typical lifecycles and procedures when using stuff like multiple devices (how to replace a faulty disk) or important hints like (don't span a btrfs RAID over multiple partitions on the same disk)
- especially with the different (mount) options, I mean things that change the way the fs works like no-hole or mixed data/meta block groups... people need to have some general information when to choose which and some real world examples of disadvantages / advantages. E.g. what are the disadvantages of having mixed data/meta block groups? If there'd be only advantages, why wouldn't it be the default?
Parts of this are already scattered over LWN articles, the wiki (however the quality greatly varies there), blog posts or mailing list posts... much of the information there is, however, outdated... and suggested procedures (e.g. how to replace a faulty disk) differ from example to example. An admin that wants to use btrfs shouldn't be required to piece all this together (which is basically impossible).. there should be a manpage (which is kept up to date!) that describes all this.

Other important things to document (which I couldn't find so far in most cases): What is actually guaranteed by btrfs and its design? For example:
- If there'd be no bugs in the code,.. would the fs be guaranteed to be always consistent by its CoW design? Or are there circumstances where it can still run into being inconsistent?
- Does this basically mean that, even without any fs journal,.. my database is always consistent even if I have a power cut or system crash?
- At which places does checksumming take place? Just
Re: Single disk parrallelization
On 09/19/2014 11:10 AM, Jeb Thomson wrote:

With the advanced features of btrfs, it would be an additional simple task to make different platters run in parallel. In this case, say a disk has three platters, and so three seek heads as well. If we can identify that much, and what offsets they are at, it then becomes a trivial matter to place the reads and writes to different platters at the same time. In effect, this means each platter should be operating as a single virtualized unit, instead of one single unit... Regards,

A disk drive has only one actuator that moves all heads in parallel. Also, disk drives are presented as an array of logical blocks today; nobody uses cylinder/head/sector addressing any more.
Help Out with the Btrfs Code base and User Space Tools
Hey Fellow Developers, I am new to working on the Linux kernel and am interested in helping out with the btrfs file system and its respective user space tools. If anyone either has some work or would like to mentor me with the code base, that would be greatly appreciated. In addition, I hope to do this professionally in the future as an actual career. Cheers and Thanks, Nick
Re: Performance Issues
Rob Spanton posted on Fri, 19 Sep 2014 17:51:09 +0100 as excerpted: The evolution problem has been improved: the sqlite db that it was using had over 18000 fragments, so I got evolution to recreate that file with nocow set. It now takes only 30s to load my mail rather than 80s, which is better... On Fri, 2014-09-19 at 11:05 -0400, Josef Bacik wrote: Weird, I get the exact opposite performance. Anyway it's probably because of your file layouts, try defragging your git dir and see if that helps. Thanks, Defragging has improved matters a bit: it now takes 26s (was 46s) to run git status. Still not amazing, but at the moment I have no evidence to suggest that it's not something to do with the machine's hardware. If I get time over the weekend I'll dig out an external hard disk and try a couple of benchmarks with that. [Replying via mail and list both, as requested.] If you're snapshotting those nocow files, be aware (if you aren't already) that nocow, snapshots and defrag (all on the same files) don't work all that well together... First let's deal with snapshots of nocow files. What does a snapshot do? It locks in place the existing version of a file, both logically, so you can get at that version of it via the snapshot even after changes have been made, and physically, it locks existing extents where they are. With normal cow files this is fine, since any changes would cause the changed block to be written elsewhere, freeing the now replaced block if there's nothing holding it in place. A snapshot simply keeps a reference to the existing extent when the data is cowed elsewhere instead of releasing it, so there's a way to get the old version as referenced by that snapshot back too. But nocow files are normally overwritten in place, that's what nocow /is/. Obviously that conflicts with what a snapshot does, locking the existing version in place. 
What btrfs does, then, to handle that, is on the first write to a (4KB) block in a (normally) nocow file after a snapshot, a cow write is forced on that block anyway. The file remains nocow, and additional writes to the /same/ block continue to write to the same new location... until another snapshot locks /that/ in place.

All fine if you're just doing occasional snapshots and/or if the nocow file isn't being very actively rewritten after all; it's not that big a deal in that case. *BUT*, if you're doing time-based snapshots say every hour or so, and the file is actively being semi-randomly rewritten, the constant snapshotting locking in place the current version, forcing many of those writes to cow anyway, is going to end up fragmenting that file nearly as fast as it would without the nocow. IOW, the nocow ends up being nearly worthless on that file!

There is a (partial) workaround, however. You can use the fact that snapshots stop at subvolume boundaries, putting the nocow files on their own dedicated subvolume. You can then continue snapshotting the up-tree subvolume as you were before and it'll stop at the dedicated subvolume, so the nocow files on that subvolume don't get snapshotted and thus don't get fragmented anyway. Of course without that snapshotting you'll need to do conventional backup on the files in that dedicated nocow subvolume.

Another alternative is to continue snapshotting the dedicated subvolume and its nocow files, but at a lower frequency, perhaps every day or twice a day instead of every hour, or maybe twice a week instead of daily, or whatever. That will slow down but not eliminate the snapshot-triggered fragmentation of the nocow files. If you then combine that with scheduled (presumably cron job or systemd-timer) defrag of that dedicated subvolume, perhaps weekly or monthly, depending on how fast it still fragments, that can help keep performance from dragging down too badly.
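The snapshot/nocow interaction described above can be modelled with a toy simulation. Everything in it is an assumption made for illustration (a fixed eight-block file, a per-block "pinned" flag, the function names); it only counts which writes would be redirected and says nothing about the real on-disk layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8 /* toy file size in blocks, chosen arbitrarily */

struct nocow_file {
    bool     pinned[NBLOCKS]; /* current copy of the block is held by a snapshot */
    unsigned forced_cow;      /* writes redirected elsewhere (fragmenting) */
    unsigned in_place;        /* writes that overwrote the existing location */
};

/* A snapshot pins the current location of every block. */
static void take_snapshot(struct nocow_file *f)
{
    assert(f != NULL);
    for (size_t i = 0; i < NBLOCKS; i++)
        f->pinned[i] = true;
}

/* The first write to a pinned block is cowed; later writes go in place. */
static void write_block(struct nocow_file *f, size_t i)
{
    assert(f != NULL && i < NBLOCKS);
    if (f->pinned[i]) {
        f->pinned[i] = false; /* the new location is not held by any snapshot */
        f->forced_cow++;
    } else {
        f->in_place++;
    }
}
```

Calling take_snapshot() between every round of writes drives forced_cow up with each round, which is the hourly-snapshot case above; spacing the snapshots out lets most writes land in place again.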
Of course you can use the scheduled defrag technique without the dedicated subvolume and just up the frequency of the defrags instead of decreasing the frequency of the snapshotting, too, if it works better for you.

Meanwhile, how big are those files? If you're not dealing with any nocow-candidate files approaching a gig or larger, you may find that the autodefrag mount option helps. However, it works by queuing up a rewrite of the entire file for a worker thread that comes along a bit later, and if the file is too big and being written to too much, the changes to the file can end up coming faster than the file can be rewritten. Obviously that's not a good thing.

Generally, for files under 100 MB autodefrag works very well. For actively rewritten files over a GB, it doesn't work well at all, and for files between 100 MB and 1 GB, it depends on the speed of your hardware and how fast the rewrites are coming in. Actually, most folks seem to be OK up to a quarter GiB or so, and most folks have problems starting around 3/4 GiB or so. 256-768 MiB is the YMMV zone.