Re: btrfs: open_ctree failed error
On 2011-12-21 20:06, Chris Mason wrote:
> On Wed, Dec 21, 2011 at 01:54:06PM +, Malwina Bartoszynska wrote:
>> Hello,
>> after unmounting btrfs partition, I can't mount it again.
>>
>> root@xxx:~# btrfs device scan
>> Scanning for Btrfs filesystems
>> root@xxx:~# mount /dev/sdb /data/osd.0/
>> mount: wrong fs type, bad option, bad superblock on /dev/sdb,
>>        missing codepage or helper program, or other error
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail or so
>>
>> root@:~# dmesg|tail
>> [57192.607912] device fsid ed25c604-3e11-4459-85b5-e4090c4d22d0 devid 2 transid 14429 /dev/sda
>> [57204.796573] end_request: I/O error, dev fd0, sector 0
>> [57231.660913] device fsid ed25c604-3e11-4459-85b5-e4090c4d22d0 devid 1 transid 14429 /dev/sdb
>> [57231.680387] parent transid verify failed on 424308420608 wanted 6970 found 8959
>> [57231.680546] parent transid verify failed on 424308420608 wanted 6970 found 8959
>> [57231.680705] parent transid verify failed on 424308420608 wanted 6970 found 8959
>> [57231.680861] parent transid verify failed on 424308420608 wanted 6970 found 8959
>> [57231.680869] parent transid verify failed on 424308420608 wanted 6970 found 8959
>> [57231.680875] Failed to read block groups: -5
>> [57231.704165] btrfs: open_ctree failed
>
> Can you tell us more about this filesystem? Was there an unclean
> shutdown or did you just unmount, mount again?
>
> The confusing thing is that all of your disks seem to have the same copy
> of the block, so it looks like things were written properly.
>
> -chris

There was no shutdown before this; the filesystem was just unmounted (which
appeared to complete properly, with no errors). I then tried to mount it again.
Is there a way of fixing it?

--
Malwina Bartoszynska
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] xfstests: new check 276 to ensure btrfs backref integrity
Thanks for the feedback. I've now removed the $fresh code completely as it's not meant to be used by anyone but me :-) _require_btrfs will become a new helper in common.rc. Will resend soon as 278 (hoping that number holds).

-Jan
[PATCH v2] xfstests: new check 278 to ensure btrfs backref integrity
This is a btrfs specific scratch test checking the backref walker. It creates
a file system with compressed and uncompressed data extents, picks files
randomly and uses filefrag to get their extents. It then asks the btrfs utility
(inspect-internal) to do the backref resolving from fs-logical address (the
one filefrag calls "physical") back to the inode number and file-logical
offset, verifying the result.

Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
change log -v2:
- renamed 276 -> 278
- added _require_btrfs helper
- check for filefrag with _require_command
- added some comments
- removed $fresh code
- don't set FSTYP
---
 278           |  255 +
 278.out       |    4 +
 common.config |    1 +
 common.rc     |   12 +++
 group         |    1 +
 5 files changed, 273 insertions(+), 0 deletions(-)
 create mode 100755 278
 create mode 100644 278.out

diff --git a/278 b/278
new file mode 100755
index 0000000..f831a0e
--- /dev/null
+++ b/278
@@ -0,0 +1,255 @@
+#! /bin/bash
+# FSQA Test No. 278
+#
+# Run fsstress to create a reasonably strange file system, make a
+# snapshot and run more fsstress. Then select some files from that fs,
+# run filefrag to get the extent mapping and follow the backrefs.
+# We check to end up back at the original file with the correct offset.
+#
+#-----------------------------------------------------------------------
+# Copyright (C) 2011 STRATO. All rights reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#-----------------------------------------------------------------------
+#
+# creator
+owner=list.bt...@jan-o-sch.net
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1
+
+_cleanup()
+{
+	echo "*** unmount"
+	umount $SCRATCH_MNT 2>/dev/null
+	rm -f $tmp.*
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+_require_nobigloopfs
+_require_btrfs inspect-internal
+_require_command "/usr/sbin/filefrag"
+
+rm -f $seq.full
+
+FILEFRAG_FILTER='if (/, blocksize (\d+)/) {$blocksize = $1; next} ($ext, '\
+'$logical, $physical, $expected, $length, $flags) = (/^\s*(\d+)\s+(\d+)'\
+'\s+(\d+)\s+(?:(\d+)\s+)?(\d+)\s+(.*)/) or next; $flags =~ '\
+'/(?:^|,)inline(?:,|$)/ and next; print $physical * $blocksize, "#", '\
+'$length * $blocksize, "#", $logical * $blocksize, " "'
+
+# this makes filefrag output script readable by using a perl helper.
+# output is one extent per line, with three numbers separated by '#'
+# the numbers are: physical, length, logical (all in bytes)
+# sample output: "1234#10#5678" - physical 1234, length 10, logical 5678
+_filter_extents()
+{
+	tee -a $seq.full | $PERL_PROG -ne "$FILEFRAG_FILTER"
+}
+
+_check_file_extents()
+{
+	cmd="filefrag -vx $1"
+	echo "# $cmd" >> $seq.full
+	out=`$cmd | _filter_extents`
+	if [ -z "$out" ]; then
+		return 1
+	fi
+	echo "after filter: $out" >> $seq.full
+	echo $out
+	return 0
+}
+
+# use a logical address and walk the backrefs back to the inode.
+# compare to the expected result.
+# returns 0 on success, 1 on error (with output made)
+_btrfs_inspect_addr()
+{
+	mp=$1
+	addr=$2
+	expect_addr=$3
+	expect_inum=$4
+	file=$5
+	cmd="$BTRFS_UTIL_PROG inspect-internal logical-resolve -P $addr $mp"
+	echo "# $cmd" >> $seq.full
+	out=`$cmd`
+	echo "$out" >> $seq.full
+	grep_expr="inode $expect_inum offset $expect_addr root"
+	echo "$out" | grep "^$grep_expr 5$" >/dev/null
+	ret=$?
+	if [ $ret -eq 0 ]; then
+		# look for a root number that is not 5
+		echo "$out" | grep "^$grep_expr \([0-46-9][0-9]*\|5[0-9]\+\)$" \
+			>/dev/null
+		ret=$?
+	fi
+	if [ $ret -eq 0 ]; then
+		return 0
+	fi
+	echo "unexpected output from"
+	echo "	$cmd"
+	echo "expected inum: $expect_inum, expected address: $expect_addr,"\
+		"file: $file, got:"
+	echo "$out"
+	return 1
+}
+
+# use an inode number and walk the backrefs back to the file name.
+# compare to the expected result.
+# returns 0 on success, 1 on error (with output made)
+_btrfs_inspect_inum()
+{
Re: COW a file from snapshot
On Thu, 22 Dec 2011 07:12:13 PM Roman Kapusta wrote:
> I'm using btrfs for about two years and this is the key feature I'm
> missing all the time. Why is it not part of mainline btrfs already?

Because nobody has written the code to do it yet? I'm sure the developers would welcome patches for this with open arms!

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
Re: COW a file from snapshot
Chris, I recommend reading the previously linked thread. The supplied (and reportedly working) patch was nacked because it violates some principle or other of file systems (although, from my limited understanding, it only does so in the same way that btrfs snapshots do in the first place).

On Thu, Dec 22, 2011 at 10:35 PM, Chris Samuel ch...@csamuel.org wrote:
> On Thu, 22 Dec 2011 07:12:13 PM Roman Kapusta wrote:
>> I'm using btrfs for about two years and this is the key feature I'm
>> missing all the time. Why is it not part of mainline btrfs already?
>
> Because nobody has written the code to do it yet? I'm sure the
> developers would welcome patches for this with open arms!
>
> cheers,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
>
> This email may come with a PGP signature as a file. Do not panic.
> For more info see: http://en.wikipedia.org/wiki/OpenPGP

--
Gareth Pye
Level 2 Judge, Melbourne, Australia
Australian MTG Forum: mtgau.com
gar...@cerberos.id.au - www.rockpaperdynamite.wordpress.com
"Dear God, I would like to file a bug report"
Re: COW a file from snapshot
Chris Samuel wrote (ao):
> On Thu, 22 Dec 2011 07:12:13 PM Roman Kapusta wrote:
>> I'm using btrfs for about two years and this is the key feature I'm
>> missing all the time. Why is it not part of mainline btrfs already?
>
> Because nobody has written the code to do it yet? I'm sure the
> developers would welcome patches for this with open arms!

As posted in this thread by Jerome two days ago:

  You would need to apply this patch to your kernel:
  http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09096.html

Is there any chance this patch gets into linux-next? I use this feature all the time and it never broke on me.

	Sander

--
Humilis IT Services and Solutions
http://www.humilis.net
Re: COW a file from snapshot
On Thu, 22 Dec 2011 10:57:10 PM Gareth Pye wrote:
> Chris, I recommend reading the previously linked thread.

Mea culpa, I blame reading email out of order after getting out of hospital yesterday. :-(

We now return you to your regularly scheduled mailing list..

--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
Re: [PATCH 0/2] btrfs: allow cross-subvolume BTRFS_IOC_CLONE
Christoph,

On Sat, 2 Apr 2011 12:40:11 AM Chris Mason wrote:
> Excerpts from Christoph Hellwig's message of 2011-04-01 09:34:05 -0400:
>> I don't think it's a good idea to introduce any user visible
>> operations over subvolume boundaries. Currently we don't have
>> any operations over mount boundaries, which is pretty
>> fundamental to the unix filesystem semantics. If you want to
>> change this please come up with a clear description of the
>> semantics and post it to linux-fsdevel for discussion. That of
>> course requires a clear description of the btrfs subvolumes,
>> which is still completely missing.
>
> The subvolume is just a directory tree that can be snapshotted, and has
> its own private inode number space.
>
> reflink across subvolumes is no different from copying a file from one
> subvolume to another at the VFS level. The src and destination are
> different files and different inodes, they just happen to share data
> extents.

Were Chris Mason's points above enough to sway your opposition to this functionality/patch?

There is demand for the ability to move data between subvolumes without needing to copy the extents themselves; it has cropped up again on the list in recent days.

It seems a little hard (and counterintuitive) to enforce a wasteful use of resources to copy data between different parts of the same filesystem that happen to be on different subvolumes, when the same operation is permitted within a single subvolume.

I don't dispute the comment about documentation on subvolumes though. There is a short discussion of them on the btrfs wiki in the sysadmin's guide, but not really a lot of detail. :-)

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP
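To make the operation under discussion concrete, here is a minimal userspace sketch of a reflink copy. It assumes kernel headers that provide FICLONE in <linux/fs.h> (Linux 4.5 and later, where it is the generic name for the same ioctl number as BTRFS_IOC_CLONE); at the time of this thread one would have used BTRFS_IOC_CLONE from the btrfs headers instead. The helper name reflink_file and the 0644 file mode are illustrative choices, not taken from any patch in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FICLONE; assumes Linux >= 4.5 headers */

/*
 * Hypothetical helper: make dst share the data extents of src.
 * Returns 0 on success or a negative errno value on failure.
 */
static int reflink_file(const char *src, const char *dst)
{
	int src_fd, dst_fd, ret = 0;

	assert(src && dst);

	src_fd = open(src, O_RDONLY);
	if (src_fd < 0)
		return -errno;

	dst_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (dst_fd < 0) {
		ret = -errno;
		close(src_fd);
		return ret;
	}

	/* Ask the filesystem to share extents instead of copying data. */
	if (ioctl(dst_fd, FICLONE, src_fd) < 0)
		ret = -errno;

	close(dst_fd);
	close(src_fd);
	return ret;
}
```

A caller would typically fall back to an ordinary copy when this returns an error, since the ioctl is refused on filesystems without reflink support and, as far as this thread describes the behaviour before the patch, across subvolume boundaries.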
[PATCH v1 02/10] Btrfs: added helper btrfs_next_item()
btrfs_next_item() makes the btrfs path point to the next item, crossing leaf
boundaries if needed.

Signed-off-by: Arne Jansen sensi...@gmx.net
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/ctree.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 50634abe..3e4a07b 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2482,6 +2482,13 @@ static inline int btrfs_insert_empty_item(struct btrfs_trans_handle *trans,
 }

 int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path);
+static inline int btrfs_next_item(struct btrfs_root *root, struct btrfs_path *p)
+{
+	++p->slots[0];
+	if (p->slots[0] >= btrfs_header_nritems(p->nodes[0]))
+		return btrfs_next_leaf(root, p);
+	return 0;
+}
 int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path);
 int btrfs_leaf_free_space(struct btrfs_root *root, struct extent_buffer *leaf);
 void btrfs_drop_snapshot(struct btrfs_root *root,
--
1.7.3.4
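The control flow of the helper can be tried out in userspace with a toy model. This is only a sketch: the toy_leaf and toy_path types and the toy_* functions are invented here for illustration and stand in for the kernel's extent buffers, btrfs_path and btrfs_next_leaf(); the only behaviour borrowed from the kernel is that the leaf-advancing function returns 0 on success and 1 when no further leaf exists.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_leaf {
	int nritems;		/* number of items stored in this leaf */
	struct toy_leaf *next;	/* right neighbour, NULL at the end of the tree */
};

struct toy_path {
	struct toy_leaf *leaf;	/* plays the role of p->nodes[0] */
	int slot;		/* plays the role of p->slots[0] */
};

/* Plays the role of btrfs_next_leaf(): 0 on success, 1 if no leaf is left. */
static int toy_next_leaf(struct toy_path *p)
{
	if (!p->leaf->next)
		return 1;
	p->leaf = p->leaf->next;
	p->slot = 0;
	return 0;
}

/* Same shape as the patch: bump the slot, move on only when past the end. */
static int toy_next_item(struct toy_path *p)
{
	assert(p && p->leaf);

	++p->slot;
	if (p->slot >= p->leaf->nritems)
		return toy_next_leaf(p);
	return 0;
}
```

Walking a two-leaf chain with this model stays in the first leaf until its last slot is passed, then lands on slot 0 of the second leaf, and finally reports 1 once the second leaf is exhausted.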
[PATCH v1 04/10] Btrfs: always save ref_root in delayed refs
From: Arne Jansen sensi...@gmx.net

For consistent backref walking and (later) qgroup calculation the
information to which root a delayed ref belongs is useful even for shared
refs.

Signed-off-by: Arne Jansen sensi...@gmx.net
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/delayed-ref.c |   18 ++++++++----------
 fs/btrfs/delayed-ref.h |   12 ++++--------
 2 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 3a0f0ab..babd37b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -495,13 +495,12 @@ static noinline int add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 	ref->in_tree = 1;

 	full_ref = btrfs_delayed_node_to_tree_ref(ref);
-	if (parent) {
-		full_ref->parent = parent;
+	full_ref->parent = parent;
+	full_ref->root = ref_root;
+	if (parent)
 		ref->type = BTRFS_SHARED_BLOCK_REF_KEY;
-	} else {
-		full_ref->root = ref_root;
+	else
 		ref->type = BTRFS_TREE_BLOCK_REF_KEY;
-	}
 	full_ref->level = level;

 	trace_btrfs_delayed_tree_ref(ref, full_ref, action);
@@ -551,13 +550,12 @@ static noinline int add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	ref->in_tree = 1;

 	full_ref = btrfs_delayed_node_to_data_ref(ref);
-	if (parent) {
-		full_ref->parent = parent;
+	full_ref->parent = parent;
+	full_ref->root = ref_root;
+	if (parent)
 		ref->type = BTRFS_SHARED_DATA_REF_KEY;
-	} else {
-		full_ref->root = ref_root;
+	else
 		ref->type = BTRFS_EXTENT_DATA_REF_KEY;
-	}
 	full_ref->objectid = owner;
 	full_ref->offset = offset;

diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 8316bff..a5fb2bc 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -98,19 +98,15 @@ struct btrfs_delayed_ref_head {

 struct btrfs_delayed_tree_ref {
 	struct btrfs_delayed_ref_node node;
-	union {
-		u64 root;
-		u64 parent;
-	};
+	u64 root;
+	u64 parent;
 	int level;
 };

 struct btrfs_delayed_data_ref {
 	struct btrfs_delayed_ref_node node;
-	union {
-		u64 root;
-		u64 parent;
-	};
+	u64 root;
+	u64 parent;
 	u64 objectid;
 	u64 offset;
 };
--
1.7.3.4
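Why the union had to go can be shown with a small userspace sketch. The struct and function names below (ref_before, ref_after, fill_before, fill_after) are made up for illustration and only mirror the two layouts in the diff; uint64_t stands in for the kernel's u64.

```c
#include <assert.h>
#include <stdint.h>

/* Layout before the patch: root and parent share the same storage. */
struct ref_before {
	union {
		uint64_t root;
		uint64_t parent;
	};
};

/* Layout after the patch: both values are kept side by side. */
struct ref_after {
	uint64_t root;
	uint64_t parent;
};

/* Old logic: a shared ref (parent != 0) never records its root. */
static void fill_before(struct ref_before *r, uint64_t parent, uint64_t ref_root)
{
	assert(r);
	if (parent)
		r->parent = parent;
	else
		r->root = ref_root;
}

/* New logic: the root is always saved, shared ref or not. */
static void fill_after(struct ref_after *r, uint64_t parent, uint64_t ref_root)
{
	assert(r);
	r->parent = parent;
	r->root = ref_root;
}
```

With a non-zero parent, reading root from the old layout simply yields the parent value again, so the owning root is unrecoverable; the new layout costs eight more bytes per ref and keeps both.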
[PATCH v1 05/10] Btrfs: add nested locking mode for paths
From: Arne Jansen sensi...@gmx.net This patch adds the possibilty to read-lock an extent even if it is already write-locked from the same thread. btrfs_find_all_roots() needs this capability. Signed-off-by: Arne Jansen sensi...@gmx.net Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/ctree.c | 22 fs/btrfs/ctree.h |1 + fs/btrfs/extent_io.c |1 + fs/btrfs/extent_io.h |2 + fs/btrfs/locking.c | 51 +++-- fs/btrfs/locking.h |2 +- 6 files changed, 66 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 0639a55..d0cd67e 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -186,13 +186,14 @@ struct extent_buffer *btrfs_lock_root_node(struct btrfs_root *root) * tree until you end up with a lock on the root. A locked buffer * is returned, with a reference held. */ -struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root) +struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root, + int nested) { struct extent_buffer *eb; while (1) { eb = btrfs_root_node(root); - btrfs_tree_read_lock(eb); + btrfs_tree_read_lock(eb, nested); if (eb == root-node) break; btrfs_tree_read_unlock(eb); @@ -1637,6 +1638,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root /* everything at write_lock_level or lower must be write locked */ int write_lock_level = 0; u8 lowest_level = 0; + int nested = p-nested; lowest_level = p-lowest_level; WARN_ON(lowest_level ins_len 0); @@ -1678,8 +1680,9 @@ again: b = root-commit_root; extent_buffer_get(b); level = btrfs_header_level(b); + BUG_ON(p-skip_locking nested); if (!p-skip_locking) - btrfs_tree_read_lock(b); + btrfs_tree_read_lock(b, 0); } else { if (p-skip_locking) { b = btrfs_root_node(root); @@ -1688,7 +1691,7 @@ again: /* we don't know the level of the root node * until we actually have it read locked */ - b = btrfs_read_lock_root_node(root); + b = btrfs_read_lock_root_node(root, nested); level = btrfs_header_level(b); if (level = write_lock_level) 
{ /* whoops, must trade for write lock */ @@ -1827,7 +1830,8 @@ cow_done: err = btrfs_try_tree_read_lock(b); if (!err) { btrfs_set_path_blocking(p); - btrfs_tree_read_lock(b); + btrfs_tree_read_lock(b, +nested); btrfs_clear_path_blocking(p, b, BTRFS_READ_LOCK); } @@ -3972,7 +3976,7 @@ int btrfs_search_forward(struct btrfs_root *root, struct btrfs_key *min_key, WARN_ON(!path-keep_locks); again: - cur = btrfs_read_lock_root_node(root); + cur = btrfs_read_lock_root_node(root, 0); level = btrfs_header_level(cur); WARN_ON(path-nodes[level]); path-nodes[level] = cur; @@ -4066,7 +4070,7 @@ find_next_key: cur = read_node_slot(root, cur, slot); BUG_ON(!cur); - btrfs_tree_read_lock(cur); + btrfs_tree_read_lock(cur, 0); path-locks[level - 1] = BTRFS_READ_LOCK; path-nodes[level - 1] = cur; @@ -4260,7 +4264,7 @@ again: ret = btrfs_try_tree_read_lock(next); if (!ret) { btrfs_set_path_blocking(path); - btrfs_tree_read_lock(next); + btrfs_tree_read_lock(next, 0); btrfs_clear_path_blocking(path, next, BTRFS_READ_LOCK); } @@ -4297,7 +4301,7 @@ again: ret = btrfs_try_tree_read_lock(next); if (!ret) { btrfs_set_path_blocking(path); - btrfs_tree_read_lock(next); + btrfs_tree_read_lock(next, 0); btrfs_clear_path_blocking(path, next,
[PATCH v1 09/10] Btrfs: added btrfs_find_all_roots()
This function gets a byte number (a data extent), collects all the leafs pointing to it and walks up the trees to find all fs roots pointing to those leafs. It also returns the list of all leafs pointing to that extent. It does proper locking for the involved trees, can be used on busy file systems and honors delayed refs. Signed-off-by: Arne Jansen sensi...@gmx.net Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/backref.c | 784 fs/btrfs/backref.h |5 + 2 files changed, 789 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index 22c64ff..e01790e 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -19,6 +19,9 @@ #include ctree.h #include disk-io.h #include backref.h +#include ulist.h +#include transaction.h +#include delayed-ref.h struct __data_ref { struct list_head list; @@ -32,6 +35,787 @@ struct __shared_ref { u64 disk_byte; }; +/* + * this structure records all encountered refs on the way up to the root + */ +struct __prelim_ref { + struct list_head list; + u64 root_id; + struct btrfs_key key; + int level; + int count; + u64 parent; + u64 wanted_disk_byte; +}; + +static int __add_prelim_ref(struct list_head *head, u64 root_id, + struct btrfs_key *key, int level, u64 parent, + u64 wanted_disk_byte, int count) +{ + struct __prelim_ref *ref; + + /* in case we're adding delayed refs, we're holding the refs spinlock */ + ref = kmalloc(sizeof(*ref), GFP_ATOMIC); + if (!ref) + return -ENOMEM; + + ref-root_id = root_id; + if (key) + ref-key = *key; + else + memset(ref-key, 0, sizeof(ref-key)); + + ref-level = level; + ref-count = count; + ref-parent = parent; + ref-wanted_disk_byte = wanted_disk_byte; + list_add_tail(ref-list, head); + + return 0; +} + +static int add_all_parents(struct btrfs_root *root, struct btrfs_path *path, + struct ulist *parents, + struct extent_buffer *eb, int level, + u64 wanted_objectid, u64 wanted_disk_byte) +{ + int ret; + int slot; + struct btrfs_file_extent_item *fi; + struct 
btrfs_key key; + u64 disk_byte; + +add_parent: + ret = ulist_add(parents, eb-start, 0, GFP_NOFS); + if (ret 0) + return ret; + + if (level != 0) + return 0; + + /* +* if the current leaf is full with EXTENT_DATA items, we must +* check the next one if that holds a reference as well. +* ref-count cannot be used to skip this check. +* repeat this until we don't find any additional EXTENT_DATA items. +*/ + while (1) { + ret = btrfs_next_leaf(root, path); + if (ret 0) + return ret; + if (ret) + return 0; + + eb = path-nodes[0]; + for (slot = 0; slot btrfs_header_nritems(eb); ++slot) { + btrfs_item_key_to_cpu(eb, key, slot); + if (key.objectid != wanted_objectid || + key.type != BTRFS_EXTENT_DATA_KEY) + return 0; + fi = btrfs_item_ptr(eb, slot, + struct btrfs_file_extent_item); + disk_byte = btrfs_file_extent_disk_bytenr(eb, fi); + if (disk_byte == wanted_disk_byte) + goto add_parent; + } + } + + return 0; +} + +/* + * resolve an indirect backref in the form (root_id, key, level) + * to a logical address + */ +static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info, + struct __prelim_ref *ref, + struct ulist *parents) +{ + struct btrfs_path *path; + struct btrfs_root *root; + struct btrfs_key root_key; + struct btrfs_key key = {0}; + struct extent_buffer *eb; + int ret = 0; + int root_level; + int level = ref-level; + + path = btrfs_alloc_path(); + if (!path) + return -ENOMEM; + + root_key.objectid = ref-root_id; + root_key.type = BTRFS_ROOT_ITEM_KEY; + root_key.offset = (u64)-1; + root = btrfs_read_fs_root_no_name(fs_info, root_key); + if (IS_ERR(root)) { + ret = PTR_ERR(root); + goto out; + } + + rcu_read_lock(); + root_level = btrfs_header_level(root-node); + rcu_read_unlock(); + + if (root_level + 1 == level) + goto out; + + path-lowest_level = level; + path-nested = 1; + ret =
[PATCH v1 10/10] Btrfs: new backref walking code
The old backref iteration code could only safely be used on commit roots. Besides this limitation, it had bugs in finding the roots for these references. This commit replaces large parts of it by btrfs_find_all_roots() which a) really finds all roots and the correct roots, b) works correctly under heavy file system load, c) considers delayed refs. Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/backref.c | 354 +++- fs/btrfs/ioctl.c |8 +- fs/btrfs/scrub.c |7 +- 3 files changed, 107 insertions(+), 262 deletions(-) diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c index e01790e..2fdb4b1 100644 --- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -23,18 +23,6 @@ #include transaction.h #include delayed-ref.h -struct __data_ref { - struct list_head list; - u64 inum; - u64 root; - u64 extent_data_item_offset; -}; - -struct __shared_ref { - struct list_head list; - u64 disk_byte; -}; - /* * this structure records all encountered refs on the way up to the root */ @@ -965,8 +953,11 @@ int extent_from_logical(struct btrfs_fs_info *fs_info, u64 logical, btrfs_item_key_to_cpu(path-nodes[0], found_key, path-slots[0]); if (found_key-type != BTRFS_EXTENT_ITEM_KEY || found_key-objectid logical || - found_key-objectid + found_key-offset = logical) + found_key-objectid + found_key-offset = logical) { + pr_debug(logical %llu is not within any extent\n, +(unsigned long long)logical); return -ENOENT; + } eb = path-nodes[0]; item_size = btrfs_item_size_nr(eb, path-slots[0]); @@ -975,6 +966,13 @@ int extent_from_logical(struct btrfs_fs_info *fs_info, u64 logical, ei = btrfs_item_ptr(eb, path-slots[0], struct btrfs_extent_item); flags = btrfs_extent_flags(eb, ei); + pr_debug(logical %llu is at position %llu within the extent (%llu +EXTENT_ITEM %llu) flags %#llx size %u\n, +(unsigned long long)logical, +(unsigned long long)(logical - found_key-objectid), +(unsigned long long)found_key-objectid, +(unsigned long long)found_key-offset, +(unsigned long long)flags, 
item_size); if (flags BTRFS_EXTENT_FLAG_TREE_BLOCK) return BTRFS_EXTENT_FLAG_TREE_BLOCK; if (flags BTRFS_EXTENT_FLAG_DATA) @@ -1071,128 +1069,11 @@ int tree_backref_for_extent(unsigned long *ptr, struct extent_buffer *eb, return 0; } -static int __data_list_add(struct list_head *head, u64 inum, - u64 extent_data_item_offset, u64 root) -{ - struct __data_ref *ref; - - ref = kmalloc(sizeof(*ref), GFP_NOFS); - if (!ref) - return -ENOMEM; - - ref-inum = inum; - ref-extent_data_item_offset = extent_data_item_offset; - ref-root = root; - list_add_tail(ref-list, head); - - return 0; -} - -static int __data_list_add_eb(struct list_head *head, struct extent_buffer *eb, - struct btrfs_extent_data_ref *dref) -{ - return __data_list_add(head, btrfs_extent_data_ref_objectid(eb, dref), - btrfs_extent_data_ref_offset(eb, dref), - btrfs_extent_data_ref_root(eb, dref)); -} - -static int __shared_list_add(struct list_head *head, u64 disk_byte) -{ - struct __shared_ref *ref; - - ref = kmalloc(sizeof(*ref), GFP_NOFS); - if (!ref) - return -ENOMEM; - - ref-disk_byte = disk_byte; - list_add_tail(ref-list, head); - - return 0; -} - -static int __iter_shared_inline_ref_inodes(struct btrfs_fs_info *fs_info, - u64 logical, u64 inum, - u64 extent_data_item_offset, - u64 extent_offset, - struct btrfs_path *path, - struct list_head *data_refs, - iterate_extent_inodes_t *iterate, - void *ctx) -{ - u64 ref_root; - u32 item_size; - struct btrfs_key key; - struct extent_buffer *eb; - struct btrfs_extent_item *ei; - struct btrfs_extent_inline_ref *eiref; - struct __data_ref *ref; - int ret; - int type; - int last; - unsigned long ptr = 0; - - WARN_ON(!list_empty(data_refs)); - ret = extent_from_logical(fs_info, logical, path, key); - if (ret BTRFS_EXTENT_FLAG_DATA) - ret = -EIO; - if (ret 0) - goto out; - - eb = path-nodes[0]; - ei = btrfs_item_ptr(eb, path-slots[0], struct btrfs_extent_item); - item_size = btrfs_item_size_nr(eb,
[PATCH v1 06/10] Btrfs: add sequence numbers to delayed refs
From: Arne Jansen sensi...@gmx.net Sequence numbers are needed to reconstruct the backrefs of a given extent to a certain point in time. The total set of backrefs consist of the set of backrefs recorded on disk plus the enqueued delayed refs for it that existed at that moment. This patch also adds a list that records all delayed refs which are currently in the process of being added. When walking all refs of an extent in btrfs_find_all_roots(), we freeze the current state of delayed refs, honor anythinh up to this point and prevent processing newer delayed refs to assert consistency. Signed-off-by: Arne Jansen sensi...@gmx.net Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net --- fs/btrfs/delayed-ref.c | 34 +++ fs/btrfs/delayed-ref.h | 70 fs/btrfs/transaction.c |4 +++ 3 files changed, 108 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index babd37b..a405db0 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -101,6 +101,11 @@ static int comp_entry(struct btrfs_delayed_ref_node *ref2, return -1; if (ref1-type ref2-type) return 1; + /* merging of sequenced refs is not allowed */ + if (ref1-seq ref2-seq) + return -1; + if (ref1-seq ref2-seq) + return 1; if (ref1-type == BTRFS_TREE_BLOCK_REF_KEY || ref1-type == BTRFS_SHARED_BLOCK_REF_KEY) { return comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref2), @@ -209,6 +214,24 @@ int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans, return 0; } +int btrfs_check_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs, + u64 seq) +{ + struct seq_list *elem; + + assert_spin_locked(delayed_refs-lock); + if (list_empty(delayed_refs-seq_head)) + return 0; + + elem = list_first_entry(delayed_refs-seq_head, struct seq_list, list); + if (seq = elem-seq) { + pr_debug(holding back delayed_ref %llu, lowest is %llu (%p)\n, +seq, elem-seq, delayed_refs); + return 1; + } + return 0; +} + int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans, struct list_head *cluster, 
u64 start) { @@ -438,6 +461,7 @@ static noinline int add_delayed_ref_head(struct btrfs_fs_info *fs_info, ref-action = 0; ref-is_head = 1; ref-in_tree = 1; + ref-seq = 0; head_ref = btrfs_delayed_node_to_head(ref); head_ref-must_insert_reserved = must_insert_reserved; @@ -479,6 +503,7 @@ static noinline int add_delayed_tree_ref(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_node *existing; struct btrfs_delayed_tree_ref *full_ref; struct btrfs_delayed_ref_root *delayed_refs; + u64 seq = 0; if (action == BTRFS_ADD_DELAYED_EXTENT) action = BTRFS_ADD_DELAYED_REF; @@ -494,6 +519,10 @@ static noinline int add_delayed_tree_ref(struct btrfs_fs_info *fs_info, ref-is_head = 0; ref-in_tree = 1; + if (need_ref_seq(for_cow, ref_root)) + seq = inc_delayed_seq(delayed_refs); + ref-seq = seq; + full_ref = btrfs_delayed_node_to_tree_ref(ref); full_ref-parent = parent; full_ref-root = ref_root; @@ -534,6 +563,7 @@ static noinline int add_delayed_data_ref(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_node *existing; struct btrfs_delayed_data_ref *full_ref; struct btrfs_delayed_ref_root *delayed_refs; + u64 seq = 0; if (action == BTRFS_ADD_DELAYED_EXTENT) action = BTRFS_ADD_DELAYED_REF; @@ -549,6 +579,10 @@ static noinline int add_delayed_data_ref(struct btrfs_fs_info *fs_info, ref-is_head = 0; ref-in_tree = 1; + if (need_ref_seq(for_cow, ref_root)) + seq = inc_delayed_seq(delayed_refs); + ref-seq = seq; + full_ref = btrfs_delayed_node_to_data_ref(ref); full_ref-parent = parent; full_ref-root = ref_root; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index a5fb2bc..174416f 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -33,6 +33,9 @@ struct btrfs_delayed_ref_node { /* the size of the extent */ u64 num_bytes; + /* seq number to keep track of insertion order */ + u64 seq; + /* ref count on this data structure */ atomic_t refs; @@ -136,6 +139,20 @@ struct btrfs_delayed_ref_root { int flushing; u64 run_delayed_start; + + /* +* seq 
number of delayed refs. We need to know if a backref was being +* added before the currently processed ref or afterwards. +*/ + u64 seq; + + /* +* seq_list holds a list of all seq numbers that are
[PATCH v1 00/10] Btrfs: backref walking rewrite
This patch series is a major rewrite of the backref walking code. The patch series Arne sent some weeks ago for quota groups had a very interesting function, find_all_roots. I took this from him together with the bits needed for find_all_roots to work and replaced a major part of the code in backref.c with it. It can be pulled from git://git.jan-o-sch.net/btrfs-unstable for-chris There's also a gitweb for that repo on http://git.jan-o-sch.net/?p=btrfs-unstable My old backref code had several problems: - it relied on a consistent state of the trees in memory - it ignored delayed refs - it only featured rudimentary locking - it could miss some references depending on the tree layout The biggest advantage is, that we're now able to do reliable backref resolving, even on busy file systems. So we've got benefits for: - the existing btrfs inspect-internal commands - aforementioned qgroups (patches on the list) - btrfs send (currently in development) - snapshot-aware defrag - ... possibly more to come Splitting the needed bits out of Arne's code was a quite intrusive operation. In case this goes into 3.3, any of us will soon make a rebased version of the qgroup patch set. Things corrected/changed in Arne's code along the way: - don't assume INODE_ITEMs and the corresponding EXTENT_DATA items are in the same leaf (use the correct EXTENT_DATA_KEY for tree searches) - don't assume all EXTENT_DATA items with the same backref for the same inode are in the same leaf (__resolve_indirect_refs can now add more refs) - added missing key and level to prelim lists for shared block refs - delayed ref sequence locking ability without wasting sequence numbers - waitqueue instead of busy waiting for more delayed refs As this touches a critical part of the file system, I also did some speed benchmarks. It turns out that dbench shows no performance decrease on my hardware. I can do more tests if desired. 

By the way: this patch series fixes xfstest 278 (to be published soon) :-)

-Jan

Arne Jansen (6):
  Btrfs: generic data structure to build unique lists
  Btrfs: mark delayed refs as for cow
  Btrfs: always save ref_root in delayed refs
  Btrfs: add nested locking mode for paths
  Btrfs: add sequence numbers to delayed refs
  Btrfs: put back delayed refs that are too new

Jan Schmidt (4):
  Btrfs: added helper btrfs_next_item()
  Btrfs: add waitqueue instead of doing busy waiting for more delayed refs
  Btrfs: added btrfs_find_all_roots()
  Btrfs: new backref walking code

 fs/btrfs/Makefile      |    2 +-
 fs/btrfs/backref.c     | 1132 +---
 fs/btrfs/backref.h     |    5 +
 fs/btrfs/ctree.c       |   64 ++--
 fs/btrfs/ctree.h       |   25 +-
 fs/btrfs/delayed-ref.c |  153 +--
 fs/btrfs/delayed-ref.h |  104 -
 fs/btrfs/disk-io.c     |    3 +-
 fs/btrfs/extent-tree.c |  187 ++--
 fs/btrfs/extent_io.c   |    1 +
 fs/btrfs/extent_io.h   |    2 +
 fs/btrfs/file.c        |   10 +-
 fs/btrfs/inode.c       |    2 +-
 fs/btrfs/ioctl.c       |   13 +-
 fs/btrfs/locking.c     |   51 ++-
 fs/btrfs/locking.h     |    2 +-
 fs/btrfs/relocation.c  |   18 +-
 fs/btrfs/scrub.c       |    7 +-
 fs/btrfs/transaction.c |    9 +-
 fs/btrfs/tree-log.c    |    2 +-
 fs/btrfs/ulist.c       |  220 ++
 fs/btrfs/ulist.h       |   68 +++
 22 files changed, 1651 insertions(+), 429 deletions(-)
 create mode 100644 fs/btrfs/ulist.c
 create mode 100644 fs/btrfs/ulist.h

--
1.7.3.4
[PATCH v1 01/10] Btrfs: generic data structure to build unique lists
From: Arne Jansen sensi...@gmx.net

ulist is a generic data structure to hold a collection of unique u64
values. The only operations it supports are adding to the list and
enumerating it.
It is possible to store an auxiliary value along with the key.
The implementation is preliminary and can probably be sped up
significantly.
It is used by btrfs_find_all_roots() to translate recursions into
iterative loops.

Signed-off-by: Arne Jansen sensi...@gmx.net
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/Makefile |    2 +-
 fs/btrfs/ulist.c  |  220 +
 fs/btrfs/ulist.h  |   68 
 3 files changed, 289 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index c0ddfd2..7079840 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,6 +8,6 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-	   reada.o backref.o
+	   reada.o backref.o ulist.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c
new file mode 100644
index 000..12f5147
--- /dev/null
+++ b/fs/btrfs/ulist.c
@@ -0,0 +1,220 @@
+/*
+ * Copyright (C) 2011 STRATO AG
+ * written by Arne Jansen sensi...@gmx.net
+ * Distributed under the GNU GPL license version 2.
+ */
+
+#include <linux/slab.h>
+#include <linux/module.h>
+#include "ulist.h"
+
+/*
+ * ulist is a generic data structure to hold a collection of unique u64
+ * values. The only operations it supports are adding to the list and
+ * enumerating it.
+ * It is possible to store an auxiliary value along with the key.
+ *
+ * The implementation is preliminary and can probably be sped up
+ * significantly. A first step would be to store the values in an rbtree
+ * as soon as ULIST_SIZE is exceeded.
+ *
+ * A sample usage for ulists is the enumeration of directed graphs without
+ * visiting a node twice. The pseudo-code could look like this:
+ *
+ * ulist = ulist_alloc();
+ * ulist_add(ulist, root);
+ * elem = NULL;
+ *
+ * while ((elem = ulist_next(ulist, elem))) {
+ * 	for (all child nodes n in elem)
+ *		ulist_add(ulist, n);
+ *	do something useful with the node;
+ * }
+ * ulist_free(ulist);
+ *
+ * This assumes the graph nodes are addressable by u64. This stems from the
+ * usage for tree enumeration in btrfs, where the logical addresses are
+ * 64 bit.
+ *
+ * It is also useful for tree enumeration which could be done elegantly
+ * recursively, but is not possible due to kernel stack limitations. The
+ * loop would be similar to the above.
+ */
+
+/**
+ * ulist_init - freshly initialize a ulist
+ * @ulist:	the ulist to initialize
+ *
+ * Note: don't use this function to init an already used ulist, use
+ * ulist_reinit instead.
+ */
+void ulist_init(struct ulist *ulist)
+{
+	ulist->nnodes = 0;
+	ulist->nodes = ulist->int_nodes;
+	ulist->nodes_alloced = ULIST_SIZE;
+}
+EXPORT_SYMBOL(ulist_init);
+
+/**
+ * ulist_fini - free up additionally allocated memory for the ulist
+ * @ulist:	the ulist from which to free the additional memory
+ *
+ * This is useful in cases where the base 'struct ulist' has been statically
+ * allocated.
+ */
+void ulist_fini(struct ulist *ulist)
+{
+	/*
+	 * The first ULIST_SIZE elements are stored inline in struct ulist.
+	 * Only if more elements are allocated do they need to be freed.
+	 */
+	if (ulist->nodes_alloced > ULIST_SIZE)
+		kfree(ulist->nodes);
+	ulist->nodes_alloced = 0;	/* in case ulist_fini is called twice */
+}
+EXPORT_SYMBOL(ulist_fini);
+
+/**
+ * ulist_reinit - prepare a ulist for reuse
+ * @ulist:	ulist to be reused
+ *
+ * Free up all additional memory allocated for the list elements and reinit
+ * the ulist.
+ */
+void ulist_reinit(struct ulist *ulist)
+{
+	ulist_fini(ulist);
+	ulist_init(ulist);
+}
+EXPORT_SYMBOL(ulist_reinit);
+
+/**
+ * ulist_alloc - dynamically allocate a ulist
+ * @gfp_mask:	allocation flags for the base allocation
+ *
+ * The allocated ulist will be returned in an initialized state.
+ */
+struct ulist *ulist_alloc(unsigned long gfp_mask)
+{
+	struct ulist *ulist = kmalloc(sizeof(*ulist), gfp_mask);
+
+	if (!ulist)
+		return NULL;
+
+	ulist_init(ulist);
+
+	return ulist;
+}
+EXPORT_SYMBOL(ulist_alloc);
+
+/**
+ * ulist_free - free dynamically allocated ulist
+ * @ulist:	ulist to free
+ *
+ * It is not necessary to call ulist_fini before.
+ */
+void ulist_free(struct ulist *ulist)
+{
+	if (!ulist)
+		return;
+	ulist_fini(ulist);
+	kfree(ulist);
+}
+EXPORT_SYMBOL(ulist_free);
+
+/**
+ * ulist_add - add an element to the ulist
+ * @ulist:
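
The enumeration pattern from the ulist comment can be tried in userspace. What follows is only a rough sketch under stated assumptions, not the kernel implementation: all toy_* names are hypothetical, the list is a plain growable array with a linear duplicate check, and the graph is passed as an edge list. It shows how a list of unique u64 values can serve as both work queue and visited set, so no recursion is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace toy, NOT the kernel ulist; every toy_* name is hypothetical. */
struct toy_ulist {
	uint64_t *vals;	/* unique values in insertion order */
	size_t n;	/* number of values stored */
	size_t cap;	/* allocated capacity */
};

static void toy_ulist_init(struct toy_ulist *ul)
{
	ul->vals = NULL;
	ul->n = 0;
	ul->cap = 0;
}

static void toy_ulist_free(struct toy_ulist *ul)
{
	free(ul->vals);
	toy_ulist_init(ul);
}

/* Returns 1 if val was added, 0 if already present, -1 on allocation failure. */
static int toy_ulist_add(struct toy_ulist *ul, uint64_t val)
{
	size_t i;

	for (i = 0; i < ul->n; i++)
		if (ul->vals[i] == val)
			return 0;
	if (ul->n == ul->cap) {
		size_t cap = ul->cap ? ul->cap * 2 : 16;
		uint64_t *p = realloc(ul->vals, cap * sizeof(*p));

		if (!p)
			return -1;
		ul->vals = p;
		ul->cap = cap;
	}
	ul->vals[ul->n++] = val;
	return 1;
}

/*
 * Count the nodes reachable from root (root included) without recursion.
 * Entries appended while iterating are picked up by the same loop.
 * Returns -1 on allocation failure.
 */
static long toy_count_reachable(const uint64_t (*edges)[2], size_t nedges,
				uint64_t root)
{
	struct toy_ulist ul;
	size_t pos, e;
	long count;

	toy_ulist_init(&ul);
	if (toy_ulist_add(&ul, root) < 0)
		return -1;
	for (pos = 0; pos < ul.n; pos++) {
		for (e = 0; e < nedges; e++) {
			if (edges[e][0] != ul.vals[pos])
				continue;
			if (toy_ulist_add(&ul, edges[e][1]) < 0) {
				toy_ulist_free(&ul);
				return -1;
			}
		}
	}
	count = (long)ul.n;
	toy_ulist_free(&ul);
	return count;
}
```

The linear duplicate check keeps the sketch short; the patch comment itself suggests an rbtree as a first step towards something faster.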
[PATCH v1 07/10] Btrfs: put back delayed refs that are too new
From: Arne Jansen sensi...@gmx.net

When processing a delayed ref, first check if there are still old refs in
the process of being added. If so, put this ref back to the tree. To avoid
looping on this ref, choose a newer one in the next loop.
btrfs_find_ref_cluster has to take care of that.

Signed-off-by: Arne Jansen sensi...@gmx.net
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/delayed-ref.c |   43 +--
 fs/btrfs/extent-tree.c |   27 ++-
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index a405db0..ee18198 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -155,16 +155,22 @@ static struct btrfs_delayed_ref_node *tree_insert(struct rb_root *root,
 
 /*
  * find an head entry based on bytenr. This returns the delayed ref
- * head if it was able to find one, or NULL if nothing was in that spot
+ * head if it was able to find one, or NULL if nothing was in that spot.
+ * If return_bigger is given, the next bigger entry is returned if no exact
+ * match is found.
  */
 static struct btrfs_delayed_ref_node *find_ref_head(struct rb_root *root,
 				  u64 bytenr,
-				  struct btrfs_delayed_ref_node **last)
+				  struct btrfs_delayed_ref_node **last,
+				  int return_bigger)
 {
-	struct rb_node *n = root->rb_node;
+	struct rb_node *n;
 	struct btrfs_delayed_ref_node *entry;
-	int cmp;
+	int cmp = 0;
 
+again:
+	n = root->rb_node;
+	entry = NULL;
 	while (n) {
 		entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
 		WARN_ON(!entry->in_tree);
@@ -187,6 +193,19 @@ static struct btrfs_delayed_ref_node *find_ref_head(struct rb_root *root,
 		else
 			return entry;
 	}
+	if (entry && return_bigger) {
+		if (cmp > 0) {
+			n = rb_next(&entry->rb_node);
+			if (!n)
+				n = rb_first(root);
+			entry = rb_entry(n, struct btrfs_delayed_ref_node,
+					 rb_node);
+			bytenr = entry->bytenr;
+			return_bigger = 0;
+			goto again;
+		}
+		return entry;
+	}
 	return NULL;
 }
 
@@ -246,20 +265,8 @@ int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans,
 		node = rb_first(&delayed_refs->root);
 	} else {
 		ref = NULL;
-		find_ref_head(&delayed_refs->root, start, &ref);
+		find_ref_head(&delayed_refs->root, start + 1, &ref, 1);
 		if (ref) {
-			struct btrfs_delayed_ref_node *tmp;
-
-			node = rb_prev(&ref->rb_node);
-			while (node) {
-				tmp = rb_entry(node,
-					       struct btrfs_delayed_ref_node,
-					       rb_node);
-				if (tmp->bytenr < start)
-					break;
-				ref = tmp;
-				node = rb_prev(&ref->rb_node);
-			}
 			node = &ref->rb_node;
 		} else
 			node = rb_first(&delayed_refs->root);
@@ -748,7 +755,7 @@ btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr)
 	struct btrfs_delayed_ref_root *delayed_refs;
 
 	delayed_refs = &trans->transaction->delayed_refs;
-	ref = find_ref_head(&delayed_refs->root, bytenr, NULL);
+	ref = find_ref_head(&delayed_refs->root, bytenr, NULL, 0);
 	if (ref)
 		return btrfs_delayed_node_to_head(ref);
 	return NULL;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dc8b9a8..bbcca12 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2237,6 +2237,28 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
 		}
 
 		/*
+		 * locked_ref is the head node, so we have to go one
+		 * node back for any delayed ref updates
+		 */
+		ref = select_delayed_ref(locked_ref);
+
+		if (ref && ref->seq &&
+		    btrfs_check_delayed_seq(delayed_refs, ref->seq)) {
+			/*
+			 * there are still refs with lower seq numbers in the
+			 * process of being added. Don't run this ref yet.
+			 */
+			list_del_init(&locked_ref->cluster);
+			mutex_unlock(&locked_ref->mutex);
+			locked_ref = NULL;
+			delayed_refs->num_heads_ready++;
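
The lookup semantics that return_bigger adds (exact match, otherwise the next bigger entry, wrapping around to the first one) can be modelled on a sorted array. This is only a sketch under that assumption: toy_find_head is a hypothetical name, a binary search stands in for the rbtree walk, and an index stands in for the node pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model, not kernel code: returns the index of bytenr in the sorted
 * array. Without an exact match it returns -1, unless return_bigger is
 * set, in which case it returns the next bigger entry and wraps to index 0
 * when nothing bigger exists. An empty array always yields -1.
 */
static long toy_find_head(const uint64_t *sorted, size_t n, uint64_t bytenr,
			  int return_bigger)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (sorted[mid] < bytenr)
			lo = mid + 1;
		else
			hi = mid;
	}
	/* lo is now the first index whose value is >= bytenr */
	if (lo < n && sorted[lo] == bytenr)
		return (long)lo;
	if (!return_bigger || n == 0)
		return -1;
	return lo < n ? (long)lo : 0;
}
```

Calling it with start + 1 and return_bigger set, as btrfs_find_ref_cluster does in the patch, always moves past the head that was just put back, which is what avoids looping on the same ref.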
[RESEND] [PATCH v2] Btrfs: runtime integrity check tool
Sigh. In the previously sent v2 patch the mail 1/4 exceeded the archaic 100,000 chars limit of vger.kernel.org (no complaints from checkpatch.pl though). Therefore I have now prepared a git-daemon for pulling. Please pull from
	git://btrfs.giantdisaster.de/git/btrfs integrity-check-patch-v2

Changes v1->v2:
- Merge with updated disk flush code
- Use bdevname to print the bdev's name instead of the disk's name
- Fix v1 formatting issue (labels, and a few casts still had been wrong)
- Merge with current scrub.c
- Cast all u64 parameters to unsigned long long for printk
- Fix issue with read errors on lower layers (caused by one signed / unsigned mixup)
- Fix comment in check-integrity.c
- Check that data that is referenced from a newly written superblock was FLUSHed before
- Change the way the hash keys are calculated
- Add code to runtime integrity check tool that verifies that all written meta blocks contain a logical bytenr that maps to the device and physical bytenr that is used to submit the bio
- Shrink Kconfig help entry to 14 lines

This patch series adds a new module to the btrfs kernel mode code. This new module can be used to catch cases when the btrfs kernel code executes write requests to the disk that bring the file system into an inconsistent state. In such a state, a power-loss or kernel panic event would cause the data on disk to be lost or at least damaged.

Code is added that examines all block write requests during runtime (including writes of the super block). Three rules are verified and an error is printed on violation of the rules:
1. It is not allowed to write a disk block which is currently referenced by the super block (either directly or indirectly).
2. When a super block is written, it is verified that all referenced (directly or indirectly) blocks fulfill the following requirements:
2a. All referenced blocks have either been present when the file system was mounted (i.e., they have been referenced by the super block), or they have been written since then and the write completion callback was called.
2b. All referenced blocks need to have a generation number which is equal to the parent's number.

Before commit v3.1-83-g5ff921b from Chris Mason, the xfstests 013 and 083 used to trigger integrity issues in the log tree. Disk blocks that had been in use for the log tree had been freed and reused too early, while being referenced by the written super block. Since this issue with the log tree is fixed, no more issues with the on-disk file system have been found while running all currently available xfstests for btrfs and generic.

The search term in the kernel log that can be used to filter on the existence of detected integrity issues is "btrfs: attempt".

The integrity check is enabled via mount options. These mount options are only supported if the integrity check tool is compiled by defining BTRFS_FS_CHECK_INTEGRITY.

Example #1, apply integrity checks to all metadata:
	mount /dev/sdb1 /mnt -o check_int

Example #2, apply integrity checks to all metadata and to data extents:
	mount /dev/sdb1 /mnt -o check_int_data

Example #3, apply integrity checks to all metadata and dump the tree that the super block references to kernel messages each time after a super block was written:
	mount /dev/sdb1 /mnt -o check_int,check_int_print_mask=263

If the integrity check tool is included and activated in the mount options, plenty of kernel memory is used, and plenty of additional CPU cycles are spent. Enabling this functionality is not intended for normal use. In most cases, unless you are a btrfs developer who needs to verify the integrity of (super)-block write requests, do not enable the config option BTRFS_FS_CHECK_INTEGRITY to include and compile the integrity check tool.

The patches are based on Chris Mason's v3.1-182-gd85c8a6.

Stefan Behrens (4):
  Btrfs: add optional integrity check code
  Btrfs: add config option to enable btrfs integrity check
  Btrfs: Makefile changes to optionally include btrfs integrity check
  Btrfs: integrate integrity check module into btrfs

 fs/btrfs/Kconfig           |   19 +
 fs/btrfs/Makefile          |    1 +
 fs/btrfs/check-integrity.c | 3068 
 fs/btrfs/check-integrity.h |   36 +
 fs/btrfs/ctree.h           |    8 +-
 fs/btrfs/disk-io.c         |   26 +-
 fs/btrfs/extent_io.c       |    5 +-
 fs/btrfs/scrub.c           |    5 +-
 fs/btrfs/super.c           |   39 +-
 fs/btrfs/volumes.c         |    7 +-
 10 files changed, 3203 insertions(+), 11 deletions(-)
 create mode 100644 fs/btrfs/check-integrity.c
 create mode 100644 fs/btrfs/check-integrity.h

--
1.7.3.4
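
Rules 1 and 2a from the description above can be stated as two small predicates. This is a toy model only, assuming a flat array of per-block state flags; struct toy_block and both function names are hypothetical and are not taken from check-integrity.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block bookkeeping, not the real check-integrity state. */
struct toy_block {
	int referenced;	/* reachable from the last written super block */
	int on_disk;	/* present at mount, or its write completion was seen */
};

/* Rule 1: a block currently referenced by the super block must not be written. */
static int toy_write_allowed(const struct toy_block *b)
{
	return !b->referenced;
}

/*
 * Rule 2a: when a super block is written, every block it references must
 * already be on disk. Returns the number of violating blocks among refs[].
 */
static size_t toy_super_violations(const struct toy_block *blocks,
				   const size_t *refs, size_t nrefs)
{
	size_t i, bad = 0;

	for (i = 0; i < nrefs; i++)
		if (!blocks[refs[i]].on_disk)
			bad++;
	return bad;
}
```

The log tree issue mentioned in the description would show up as a rule 1 violation in this model: a block is reused for a new write while the written super block still references it.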
Re: [PATCH v1 05/10] Btrfs: add nested locking mode for paths
On Thu, Dec 22, 2011 at 05:03:19PM +0100, Jan Schmidt wrote:
> From: Arne Jansen sensi...@gmx.net
>
> This patch adds the possibility to read-lock an extent even if it is already
> write-locked from the same thread. btrfs_find_all_roots() needs this
> capability.

I'd rather not add a nested flag to the locking code, let's just make the nesting explicitly allowed.

You shouldn't need locks around lock->owner. Either your process owns the lock (and it won't change away from your pid), or you don't own it and it won't be your pid.

Just make sure the owner field gets cleared when you do your final unlock.

So, if you are the owner of a write lock, you can add more write locks or a read lock as required.

Could you please describe the case where btrfs_find_all_roots needs this?

-chris
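
The ownership rule suggested here can be written down as plain bookkeeping. The following is a single-threaded sketch with hypothetical names (struct toy_lock, toy_try_*), using try-lock semantics and no real synchronization, so it only illustrates the rule and is not a usable lock: the owner of a write lock may take further write locks or a read lock, and the owner field is cleared on the final unlock.

```c
#include <assert.h>

/* Toy bookkeeping only: no atomics, no waiting, all names hypothetical. */
struct toy_lock {
	int owner;		/* id of the write-lock holder, 0 if none */
	int write_depth;	/* write locks held by the owner */
	int nested_reads;	/* read locks the owner took under its write lock */
	int readers;		/* read locks held by everyone else */
};

/* Returns 1 on success, 0 if the caller would have to wait. */
static int toy_try_write_lock(struct toy_lock *l, int me)
{
	if (l->owner == me) {
		l->write_depth++;	/* owner may nest write locks */
		return 1;
	}
	if (l->owner || l->readers)
		return 0;
	l->owner = me;
	l->write_depth = 1;
	return 1;
}

/* Returns 1 on success, 0 if the caller would have to wait. */
static int toy_try_read_lock(struct toy_lock *l, int me)
{
	if (l->owner == me) {
		l->nested_reads++;	/* owner may read under its write lock */
		return 1;
	}
	if (l->owner)
		return 0;
	l->readers++;
	return 1;
}

static void toy_read_unlock(struct toy_lock *l, int me)
{
	if (l->owner == me && l->nested_reads) {
		l->nested_reads--;
		return;
	}
	assert(l->readers > 0);
	l->readers--;
}

static void toy_write_unlock(struct toy_lock *l, int me)
{
	assert(l->owner == me && l->write_depth > 0);
	if (--l->write_depth == 0) {
		assert(l->nested_reads == 0);
		l->owner = 0;	/* clear the owner on the final unlock */
	}
}
```

The owner comparison needs no extra lock for the reason given in the mail: if the caller is the owner, the field cannot change underneath it, and if it is not, the field can never hold the caller's id.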
[3.2.0-rc6] WARNING: at fs/btrfs/extent-tree.c:4771 while deleting subvolume
Hello btrfs...

I tried to delete a subvolume which probably has some transid errors. After this, the subvolume is gone but I cannot reboot - it hangs. After reisub, the deleted subvolume is right back there (this is different from previous kernel versions before 3.2.0-rc4 (afair), where the subvolume was gone even after a hard reboot, but during mount btrfs then got some hiccups from btrfs_cleaner).

From this I suppose that btrfs is much more robust to unexpected reboots now, but how can I get rid of this broken subvolume now?

Here's my dmesg (including sysrq+w):

[ 121.411013] device fsid 311dda08-f33f-4cb9-9d59-6eac6026b1b1 devid 2 transid 146955 /dev/sda3
[ 121.411330] btrfs: use lzo compression
[ 121.411333] btrfs: disk space caching is enabled
[ 125.232594] zcache: created ephemeral tmem pool, id=2, client=65535
[ 157.519388] Old style space inode found, converting.
[ 157.525214] Old style space inode found, converting.
[ 157.525227] Old style space inode found, converting.
[ 157.525236] Old style space inode found, converting.
[ 157.525242] Old style space inode found, converting.
[ 157.525446] Old style space inode found, converting.
[ 157.525634] Old style space inode found, converting.
[ 157.528640] Old style space inode found, converting.
[ 157.529025] Old style space inode found, converting.
[ 157.534514] Old style space inode found, converting.
[ 157.534916] Old style space inode found, converting.
[ 157.544907] Old style space inode found, converting.
[ 157.545118] Old style space inode found, converting.
[ 157.545312] Old style space inode found, converting.
[ 157.545489] Old style space inode found, converting.
[ 157.545675] Old style space inode found, converting.
[ 157.545683] Old style space inode found, converting.
[ 157.550657] Old style space inode found, converting.
[ 157.550677] Old style space inode found, converting.
[ 157.550879] Old style space inode found, converting.
[ 157.551085] Old style space inode found, converting.
[ 157.551265] Old style space inode found, converting.
[ 157.551272] Old style space inode found, converting.
[ 157.564007] btrfs: truncated 1 orphans
[ 157.854236] Old style space inode found, converting.
[ 157.895119] btrfs: unlinked 6 orphans
[ 157.895122] btrfs: truncated 8 orphans
[ 225.682864] Old style space inode found, converting.
[ 262.885468] Old style space inode found, converting.
[ 262.885477] Old style space inode found, converting.
[ 262.885484] Old style space inode found, converting.
[ 262.885490] Old style space inode found, converting.
[ 262.885498] Old style space inode found, converting.
[ 262.885504] Old style space inode found, converting.
[ 262.885511] Old style space inode found, converting.
[ 262.885525] Old style space inode found, converting.
[ 262.885531] Old style space inode found, converting.
[ 262.885537] Old style space inode found, converting.
[ 262.885543] Old style space inode found, converting.
[ 298.668898] Old style space inode found, converting.
[ 298.668906] Old style space inode found, converting.
[ 302.264552] parent transid verify failed on 622147694592 wanted 130733 found 134506
[ 302.264562] parent transid verify failed on 622147694592 wanted 130733 found 134506
[ 302.264575] parent transid verify failed on 622147694592 wanted 130733 found 134506
[ 302.264579] parent transid verify failed on 622147694592 wanted 130733 found 134506
[ 302.264582] parent transid verify failed on 622147694592 wanted 130733 found 134506
[ 302.264585] [ cut here ]
[ 302.264592] WARNING: at fs/btrfs/extent-tree.c:4771 __btrfs_free_extent+0x290/0x5c7()
[ 302.264595] Hardware name: To Be Filled By O.E.M.
[ 302.264596] Modules linked in: af_packet snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss nls_iso8859_15 nls_cp437 vfat fat zram(C) loop tcp_cubic snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq_device gspca_sonixj gspca_main videodev v4l2_compat_ioctl32 evdev i2c_i801 pcspkr unix fuse xfs nfs nfs_acl auth_rpcgss lockd sunrpc reiserfs scsi_wait_scan hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech usbhid usb_storage hid sr_mod cdrom sg pata_cmd64x [last unloaded: microcode]
[ 302.264635] Pid: 6303, comm: btrfs-delayed-m Tainted: G C 3.2.0-rc6 #5
[ 302.264637] Call Trace:
[ 302.264644] [810333ea] ? warn_slowpath_common+0x78/0x8c
[ 302.264647] [8114e6f3] ? __btrfs_free_extent+0x290/0x5c7
[ 302.264651] [810b2998] ? __slab_free+0xd1/0x236
[ 302.264655] [81151a9f] ? run_clustered_refs+0x66c/0x6b8
[ 302.264659] [81151bb4] ? btrfs_run_delayed_refs+0xc9/0x173
[ 302.264663] [8115f82c] ? __btrfs_end_transaction+0x90/0x1dd
[ 302.264668] [810274b3] ? should_resched+0x5/0x24
[ 302.264673] [8119690d] ? btrfs_async_run_delayed_node_done+0x16c/0x1ca
[ 302.264677]
Re: Error handling: How to lose a transaction
On 12/23/2011 01:12 PM, Jeff Mahoney wrote:
> On 12/21/2011 10:38 PM, Jeff Mahoney wrote:
>> On 12/21/2011 10:21 PM, Liu Bo wrote:
>>> On 12/22/2011 10:59 AM, Jeff Mahoney wrote:
>>>> Sorry I haven't responded to this yet. I started digging right in
>>>> and I've started to have some good results. It turns out there's
>>>> already a btrfs_cleanup_transaction call that will tear down
>>>> outstanding transactions. It's not perfect and I've fixed a few
>>>> bugs in there, but it saved me a bunch of effort. I just wish I
>>>> had noticed it a day earlier since I had it half implemented
>>>> myself. :)
>>>
>>> Hi Jeff,
>>>
>>> Yes, it should be, and I wrote this cleanup_transaction; I should
>>> have notified you earlier... Anyway, thanks for your effort.
>>>
>>> The error handling part has lots of corner cases, so I just picked a
>>> brute-force way to tear down the current transaction in order to make
>>> the FS RO.
>>
>> Oh, and it's worked great. The brute force method is a good start and
>> will address the most severe problems (and most cases) well. I've
>> decided to ignore most cases of -ENOMEM for now. The biggest bug I ran
>> into so far was calling mutex_lock while holding a spinlock. It was a
>> quick fix.
>>
>> The method I've generally used is to mark the transaction aborted and
>> pass the error up as quickly as possible, cleaning up the local
>> allocations and locks as I go. The transaction gets completed normally,
>> returns an error, isn't committed, and then is destroyed (with others,
>> potentially) when called from btrfs_commit_transaction. Btrfs makes
>> this super easy since we can just skip all the CoW writes.
>
> Now, just out of curiosity, would it be ok if I printed this when we ran
> out of memory in deep call paths?

I'm ok with this, but it depends on Chris :)

Indeed, ENOMEM in deep call paths is big trouble for us. We don't yet have a graceful solution, and we could make memory allocations with the __GFP_NOFAIL flag for simplicity, although it is not recommended:

 * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
 * cannot handle allocation failures. This modifier is deprecated and no new
 * users should be added.

> FAIL WHALE!
>
> [ASCII-art whale]
>
> Happy Holidays ;)

Happy Holidays!

thanks,
liubo

> -Jeff
>
>> Thanks!
>>
>> -Jeff
>>
>>> thanks,
>>> liubo
>>>
>>>> This afternoon I started running xfstests on a dm-linear mapped
>>>> partition. Halfway through a sufficiently long test, I swap out
>>>> the linear mapping to an error mapping. It still crashes, but
>>>> somewhat less spectacularly. There are still a ton of BUG_ON's I
>>>> need to eliminate as well as work out the usual I/O
>>>> error-recovery issue of uninterruptible, unrecoverable writeback
>>>> contexts and still-locked pages holding up exit. I'm pretty
>>>> pleased with the results so far and am pretty optimistic.
>>>>
>>>> -Jeff
>
> --
> Jeff Mahoney
> SUSE Labs
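
The approach described in the thread (mark the transaction aborted, pass the error up quickly, clean up local allocations on the way, skip the commit) has roughly the following shape. This is a userspace sketch with hypothetical names (struct toy_trans, toy_abort, toy_step, toy_commit); it models only the control flow and none of the btrfs transaction code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transaction handle. */
struct toy_trans {
	int aborted;	/* 0, or the first negative errno that aborted it */
};

/* Record the first error only; later failures do not overwrite it. */
static void toy_abort(struct toy_trans *t, int err)
{
	if (!t->aborted)
		t->aborted = err;
}

/*
 * One unit of work inside the transaction. fail_with simulates an error
 * from a lower layer (0 means success). Local allocations are released
 * before the error is passed up.
 */
static int toy_step(struct toy_trans *t, int fail_with)
{
	char *buf = malloc(64);
	int ret = 0;

	if (!buf) {
		toy_abort(t, -ENOMEM);
		return -ENOMEM;
	}
	if (fail_with) {
		ret = fail_with;
		toy_abort(t, ret);
	}
	free(buf);
	return ret;
}

/* An aborted transaction is never written out; the first error is returned. */
static int toy_commit(struct toy_trans *t)
{
	if (t->aborted)
		return t->aborted;
	return 0;
}
```

Keeping only the first error mirrors the idea that the original failure is the interesting one; whether the real code does the same is not stated in the thread.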