raid 10 corruption from single drive failure
Hi, I'm evaluating btrfs for a future deployment, and managed to (repeatedly) get btrfs into a state where the system can't mount, can't fsck and can't recover. The test setup is pretty small, 6 devices of various sizes:

  butter-1.5GA vg_dolt -wi-a 1.50g
  butter-1.5GB vg_dolt -wi-a 1.50g
  butter-2GA   vg_dolt -wi-a 2.00g
  butter-2GB   vg_dolt -wi-a 2.00g
  butter-3GA   vg_dolt -wi-a 3.00g
  butter-3GB   vg_dolt -wi-a 3.00g

Created a btrfs volume:

  mkfs.btrfs -d raid10 -m raid1 /dev/mapper/vg_dolt-butter--1.5GA \
      /dev/mapper/vg_dolt-butter--1.5GA /dev/mapper/vg_dolt-butter--2GA \
      /dev/mapper/vg_dolt-butter--2GB /dev/mapper/vg_dolt-butter--3GA \
      /dev/mapper/vg_dolt-butter--3GB

(Note the typo above: 1.5GA is listed twice and 1.5GB not at all, so this is effectively a 5-disk raid10.)

Mounted it and filled it with files (I downloaded parts of the Fedora src.rpm tree), unmounted the partition, then zeroed one drive:

  dd if=/dev/zero of=/dev/vg_dolt/butter-3GB bs=1M skip=100

(It's sort of hard to fake a corrupt drive; this is a decent way of doing it.)

Trying to mount it gives the following:

Jun 28 23:58:34 dolt kernel: [2815554.803082] device fsid 379e495a-9ba7-4485-ae74-6c8939f7b22e devid 3 transid 27 /dev/mapper/vg_dolt-butter--2GB
Jun 28 23:58:34 dolt kernel: [2815554.850211] btrfs: disk space caching is enabled
Jun 28 23:58:34 dolt kernel: [2815554.850856] btrfs: failed to read chunk tree on dm-6
Jun 28 23:58:34 dolt kernel: [2815554.856453] btrfs: open_ctree failed
Jun 28 23:58:44 dolt kernel: [2815565.475519] device fsid 379e495a-9ba7-4485-ae74-6c8939f7b22e devid 3 transid 27 /dev/mapper/vg_dolt-butter--2GB
Jun 28 23:58:44 dolt kernel: [2815565.476939] btrfs: enabling auto recovery
Jun 28 23:58:44 dolt kernel: [2815565.476944] btrfs: disk space caching is enabled
Jun 28 23:58:44 dolt kernel: [2815565.477648] btrfs: failed to read chunk tree on dm-6
Jun 28 23:58:44 dolt kernel: [2815565.486300] btrfs: open_ctree failed
Jun 28 23:58:52 dolt kernel: [2815573.522271] device fsid 379e495a-9ba7-4485-ae74-6c8939f7b22e devid 2 transid 27 /dev/mapper/vg_dolt-butter--2GA
Jun 28 23:58:52 dolt kernel: [2815573.536624] btrfs: enabling auto recovery
Jun 28 23:58:52 dolt kernel: [2815573.536628] btrfs: disk space caching is enabled
Jun 28 23:58:52 dolt kernel: [2815573.537185] btrfs: failed to read chunk tree on dm-6
Jun 28 23:58:52 dolt kernel: [2815573.542938] btrfs: open_ctree failed

[root@dolt mnt]# btrfsck /dev/vg_dolt/butter-2GA
failed to read /dev/sr0
failed to read /dev/sr0
warning, device 5 is missing
warning devid 5 not found already
checking extents
checking fs roots
checking root refs
Segmentation fault

[root@dolt mnt]# mount -o recovery,ro /dev/mapper/vg_dolt-butter--2GA /mnt/test/
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg_dolt-butter--2GA,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so

[root@dolt mnt]# debuginfo-install btrfs-progs-0.20.rc1.20121017git91d9eec-3.fc18.x86_64
[root@dolt mnt]# gdb btrfsck /dev/vg_dolt/butter-2GA
GNU gdb (GDB) Fedora (7.5.1-38.fc18)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/btrfsck...Reading symbols from /usr/lib/debug/usr/sbin/btrfsck.debug...done.
done.
/dev/vg_dolt/butter-2GA is not a core dump: File format not recognized
(gdb) run /dev/vg_dolt/butter-2GA
Starting program: /usr/sbin/btrfsck /dev/vg_dolt/butter-2GA
failed to read /dev/sr0
failed to read /dev/sr0
warning, device 5 is missing
warning devid 5 not found already
checking extents
checking fs roots
checking root refs

Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x80) at malloc.c:2907
2907            if (chunk_is_mmapped(p))        /* release mmapped memory. */
(gdb) bt full
#0  __GI___libc_free (mem=0x80) at malloc.c:2907
        ar_ptr = <optimized out>
        p = <optimized out>
        hook = 0x0
#1  0x0040d429 in close_all_devices (fs_info=0x6323e0) at disk-io.c:1088
        list = 0x631050
        next = 0x6300b0
        tmp = 0x630430
        device = 0x6300b0
#2  0x0040e3df in close_ctree (root=root@entry=0x6426e0) at disk-io.c:1135
        ret = <optimized out>
        fs_info = 0x6323e0
        __PRETTY_FUNCTION__ = "close_ctree"
#3  0x00401d8d in main (ac=<optimized out>, av=<optimized out>) at btrfsck.c:3593
        root_cache = {root = {rb_node = 0x0, rotate_notify = 0x423aad <__libc_csu_init+93>}}
        root = <optimized out>
        info = <optimized out>
        trans = <optimized out>
        bytenr =
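One plausible reading of that backtrace, offered as a sketch rather than a diagnosis: free() is being handed the bogus pointer 0x80 from close_all_devices(), which would be consistent with the missing devid 5 being represented by a device struct whose pointer fields were never initialized before the teardown path frees them. A minimal standalone illustration of that class of bug (the struct and field names below are invented for illustration, not btrfs-progs code):

  #include <stdlib.h>
  #include <string.h>

  struct device {
          char *name;    /* strdup'd when the device is actually scanned */
          char *label;
  };

  int main(void)
  {
          /* a present device gets fully initialized... */
          struct device ok = { strdup("/dev/sda"), strdup("data") };

          /* ...but a placeholder for a *missing* device allocated with
           * malloc() (not calloc()) carries garbage in its pointer
           * fields -- e.g. 0x80 -- and a teardown loop that blindly
           * frees every member reproduces the crash above. */
          struct device *missing = malloc(sizeof(*missing));

          free(ok.name);
          free(ok.label);
          /* free(missing->name);   <- would die inside libc's free(),
           *                           just like btrfsck did */
          free(missing);
          return 0;
  }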
Re: raid 10 corruption from single drive failure
> Making the filesystem with all 6 devices from the beginning, btrfsck
> doesn't segfault. But it also doesn't repair the filesystem enough to
> make it mountable. (Neither does -o recovery; however -o degraded
> works, and the files are then accessible.)

Not sure I entirely follow: mounting with -o degraded (not -o recovery)
is how you're supposed to mount if there's a disk missing.
Re: raid1 inefficient unbalanced filesystem reads
On Sat, 29 Jun 2013, Martin <m_bt...@ml1.co.uk> wrote:
> Mmmm... I'm not sure trying to balance historical read/write counts is
> the way to go...
>
> What happens for the use case of an SSD paired up with a HDD? (For
> example an SSD and a similarly sized Raptor or enterprise SCSI?...) Or
> even just JBODs of a mishmash of different speeds?
>
> Rather than trying to balance io counts, can a realtime utilisation
> check be made and go for the least busy?

It would also be nice to be able to tune this. For example I've got a
RAID-1 array that's mounted noatime, hardly ever written, and accessed
via NFS on 100baseT. It would be nice if one disk could be spun down for
most of the time and save 7W of system power. Something like the
--write-mostly option of mdadm would be good here.

Also it should be possible for a RAID-1 array to allow faster reads for
a single process reading a single file if the file in question is
fragmented.

-- 
My Main Blog       http://etbe.coker.com.au/
My Documents Blog  http://doc.coker.com.au/
Re: [PATCH v3] Btrfs: fix crash regarding to ulist_add_merge
On Fri, Jun 28, 2013 at 12:43:14PM -0700, Zach Brown wrote:
> On Fri, Jun 28, 2013 at 12:37:45PM +0800, Liu Bo wrote:
> > Several users reported this crash of NULL pointer or general
> > protection. The story is that we added an rbtree to speed up ulist
> > iteration, and we use krealloc() to handle ulist growth, and
> > krealloc() uses memcpy to copy old data to the new memory area. That
> > is OK for an array, as it doesn't use pointers, but it is not OK for
> > an rbtree, as it does use pointers. So krealloc() will mess up our
> > rbtree and it ends up with a crash.
> >
> > Reviewed-by: Wang Shilong <wangsl-f...@cn.fujitsu.com>
> > Signed-off-by: Liu Bo <bo.li@oracle.com>
>
> Yeah, this should fix the problem. Thanks for being persistent.
>
> Reviewed-by: Zach Brown <z...@redhat.com>
>
> > +	for (i = 0; i < ulist->nnodes; i++)
> > +		rb_erase(&ulist->nodes[i].rb_node, &ulist->root);
>
> (still twitching over here because this is a bunch of work that
> achieves nothing :))

Hmm, I think that this is necessary for the inline array inside ulist,
so I keep it :)

- liubo
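The failure mode Liu Bo describes is easy to reproduce outside the kernel. Here is a userspace analogue (a sketch: plain realloc() stands in for krealloc(), and a singly linked field stands in for the embedded rb_node):

  #include <stdio.h>
  #include <stdlib.h>

  struct node {
          struct node *next;   /* stands in for the embedded rb_node */
          int val;
  };

  int main(void)
  {
          struct node *arr = malloc(2 * sizeof(*arr));
          arr[0].val = 1;
          arr[1].val = 2;
          arr[0].next = &arr[1];          /* intra-array link */

          /* realloc may move the block; the bytes of 'next' are copied
           * verbatim, so the link still points at the old, now-freed
           * allocation -- exactly what krealloc()'s memcpy did to the
           * pointers inside the ulist's rbtree. */
          arr = realloc(arr, 4096 * sizeof(*arr));
          if (arr && arr[0].next != &arr[1])
                  printf("stale link %p, element now lives at %p\n",
                         (void *)arr[0].next, (void *)&arr[1]);

          free(arr);
          return 0;
  }

Erasing every node from the tree before the krealloc() and re-adding them afterwards, as the patch does, sidesteps this because nothing holds a pointer across the move.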
Re: [PATCH v2] Btrfs: fix crash regarding to ulist_add_merge
On Fri, Jun 28, 2013 at 01:08:21PM -0400, Josef Bacik wrote:
> On Fri, Jun 28, 2013 at 10:25:39AM +0800, Liu Bo wrote:
> > Several users reported this crash of NULL pointer or general
> > protection. The story is that we added an rbtree to speed up ulist
> > iteration, and we use krealloc() to handle ulist growth, and
> > krealloc() uses memcpy to copy old data to the new memory area. That
> > is OK for an array, as it doesn't use pointers, but it is not OK for
> > an rbtree, as it does use pointers. So krealloc() will mess up our
> > rbtree and it ends up with a crash.
> >
> > Signed-off-by: Liu Bo <bo.li@oracle.com>
> > ---
> > v2: fix a use-after-free bug and a finger error (thanks Zach and
> > Josef).
>
> Is this supposed to fix this bug?
>
> [ 1215.561033] ------------[ cut here ]------------
> [ 1215.561064] kernel BUG at fs/btrfs/ctree.c:1183!
> [ 1215.561087] invalid opcode: [#1] PREEMPT SMP
> [ 1215.561114] Modules linked in: btrfs raid6_pq zlib_deflate xor libcrc32c ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ip6t_REJECT nf_conntrack_ipv6 ib_core nf_defrag_ipv6 ib_addr nf_conntrack_ipv4 iscsi_tcp nf_defrag_ipv4 xt_state nf_conntrack libiscsi_tcp ip6table_filter libiscsi ip6_tables scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm vhost_net snd_timer macvtap snd macvlan tun virtio_net soundcore kvm_amd sunrpc kvm snd_page_alloc sp5100_tco edac_core microcode pcspkr serio_raw k10temp edac_mce_amd i2c_piix4 r8169 mii iomemory_vsl(OF) floppy firewire_ohci firewire_core ata_generic pata_acpi crc_itu_t pata_via radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
> [ 1215.561585] CPU 1
> [ 1215.561597] Pid: 28188, comm: btrfs-endio-wri Tainted: GF O 3.9.0+ #9 To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5
> [ 1215.561649] RIP: 0010:[a06f529b] [a06f529b] __tree_mod_log_rewind+0x26b/0x270 [btrfs]
> [ 1215.561706] RSP: 0018:8803b7529828 EFLAGS: 00010293
> [ 1215.561729] RAX: RBX: 8803b42d5960 RCX: 8803b75297c8
> [ 1215.561759] RDX: 0002577d RSI: 0921 RDI: 8803b3e92440
> [ 1215.561788] RBP: 8803b7529858 R08: 1000 R09: 8803b75297d8
> [ 1215.561818] R10: 1bbb R11: R12: 8803b630ddc0
> [ 1215.561848] R13: 0044 R14: 8803b3e92540 R15: 00017add
> [ 1215.561878] FS: 7f9ba1ce7700() GS:88043fc4() knlGS:
> [ 1215.561911] CS: 0010 DS: ES: CR0: 8005003b
> [ 1215.561936] CR2: 7fa4a6148d90 CR3: 000427ff7000 CR4: 07e0
> [ 1215.561965] DR0: DR1: DR2:
> [ 1215.561995] DR3: DR6: 0ff0 DR7: 0400
> [ 1215.562025] Process btrfs-endio-wri (pid: 28188, threadinfo 8803b7528000, task 8803eb5a97d0)
> [ 1215.562063] Stack:
> [ 1215.562073]  88042998e1c0 8800 88042998e1c0 8803c41b8000
> [ 1215.562109]  8803b43c4e20 0001 8803b7529908 a06fda47
> [ 1215.562146]  8803b7694458 00017add 8803b7529888 8803b42d5960
> [ 1215.562182] Call Trace:
> [ 1215.562200]  [a06fda47] btrfs_search_old_slot+0x757/0xa40 [btrfs]
> [ 1215.562237]  [a0779fcd] __resolve_indirect_refs+0x11d/0x670 [btrfs]
> [ 1215.562273]  [a077ab4c] find_parent_nodes+0x1fc/0xe90 [btrfs]
> [ 1215.562307]  [a077b879] btrfs_find_all_roots+0x99/0x100 [btrfs]
> [ 1215.562341]  [a07240b0] ? btrfs_submit_direct+0x680/0x680 [btrfs]
> [ 1215.562376]  [a077c224] iterate_extent_inodes+0x144/0x2f0 [btrfs]
> [ 1215.562412]  [a077c462] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
> [ 1215.562449]  [a07240b0] ? btrfs_submit_direct+0x680/0x680 [btrfs]
> [ 1215.562484]  [a07214f8] record_extent_backrefs+0x78/0xf0 [btrfs]
> [ 1215.562519]  [a072bac6] btrfs_finish_ordered_io+0x156/0x9d0 [btrfs]
> [ 1215.562556]  [a072c355] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 1215.562589]  [a074d96a] worker_loop+0x16a/0x570 [btrfs]
> [ 1215.562618]  [8108f348] ? __wake_up_common+0x58/0x90
> [ 1215.562649]  [a074d800] ? btrfs_queue_worker+0x300/0x300 [btrfs]
> [ 1215.562680]  [81086c10] kthread+0xc0/0xd0
> [ 1215.562703]  [8165] ? acpi_processor_add+0xcb/0x47d
> [ 1215.562731]  [81086b50] ? flush_kthread_worker+0xb0/0xb0
> [ 1215.562758]  [8166452c] ret_from_fork+0x7c/0xb0
> [ 1215.562783]  [81086b50] ? flush_kthread_worker+0xb0/0xb0
> [ 1215.562809] Code: c1 49 63 46 58 48 89 c2 48 c1 e2 05 48 8d 54 10 65 49 63 46 2c 48 89 c6 48 c1 e6 05 48 8d 74 30 65 e8 0a c7 04 00 e9 9d fe ff ff 0f
Re: [PATCH] Btrfs: make backref walking code handle skinny metadata
On Fri, Jun 28, 2013 at 01:12:58PM -0400, Josef Bacik wrote:
> I missed fixing the backref stuff when I introduced the skinny
> metadata. If you try and do things like snapshot-aware defrag with
> skinny metadata you are going to see tons of warnings related to the
> backref count being less than 0. This is because the delayed refs will
> be found for stuff just fine, but it won't find the skinny metadata
> extent refs. With this patch I'm not seeing warnings anymore. Thanks,

Reviewed-by: Liu Bo <bo.li@oracle.com>

- liubo

> Signed-off-by: Josef Bacik <jba...@fusionio.com>
> ---
>  fs/btrfs/backref.c |   31 +++++++++++++++++++++++++------
>  1 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 431ea92..eaf1333 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -597,6 +597,7 @@ static int __add_inline_refs(struct btrfs_fs_info *fs_info,
>  	int slot;
>  	struct extent_buffer *leaf;
>  	struct btrfs_key key;
> +	struct btrfs_key found_key;
>  	unsigned long ptr;
>  	unsigned long end;
>  	struct btrfs_extent_item *ei;
> @@ -614,17 +615,21 @@ static int __add_inline_refs(struct btrfs_fs_info *fs_info,
>
>  	ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
>  	flags = btrfs_extent_flags(leaf, ei);
> +	btrfs_item_key_to_cpu(leaf, &found_key, slot);
>
>  	ptr = (unsigned long)(ei + 1);
>  	end = (unsigned long)ei + item_size;
>
> -	if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
> +	if (found_key.type == BTRFS_EXTENT_ITEM_KEY &&
> +	    flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
>  		struct btrfs_tree_block_info *info;
>
>  		info = (struct btrfs_tree_block_info *)ptr;
>  		*info_level = btrfs_tree_block_level(leaf, info);
>  		ptr += sizeof(struct btrfs_tree_block_info);
>  		BUG_ON(ptr > end);
> +	} else if (found_key.type == BTRFS_METADATA_ITEM_KEY) {
> +		*info_level = found_key.offset;
>  	} else {
>  		BUG_ON(!(flags & BTRFS_EXTENT_FLAG_DATA));
>  	}
> @@ -796,8 +801,11 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
>  	INIT_LIST_HEAD(&prefs_delayed);
>
>  	key.objectid = bytenr;
> -	key.type = BTRFS_EXTENT_ITEM_KEY;
>  	key.offset = (u64)-1;
> +	if (btrfs_fs_incompat(fs_info, SKINNY_METADATA))
> +		key.type = BTRFS_METADATA_ITEM_KEY;
> +	else
> +		key.type = BTRFS_EXTENT_ITEM_KEY;
>
>  	path = btrfs_alloc_path();
>  	if (!path)
> @@ -862,7 +870,8 @@ again:
>  		slot = path->slots[0];
>  		btrfs_item_key_to_cpu(leaf, &key, slot);
>  		if (key.objectid == bytenr &&
> -		    key.type == BTRFS_EXTENT_ITEM_KEY) {
> +		    (key.type == BTRFS_EXTENT_ITEM_KEY ||
> +		     key.type == BTRFS_METADATA_ITEM_KEY)) {
>  			ret = __add_inline_refs(fs_info, path, bytenr,
>  						&info_level, &prefs);
>  			if (ret)
> @@ -1276,12 +1285,16 @@ int extent_from_logical(struct btrfs_fs_info *fs_info, u64 logical,
>  {
>  	int ret;
>  	u64 flags;
> +	u64 size = 0;
>  	u32 item_size;
>  	struct extent_buffer *eb;
>  	struct btrfs_extent_item *ei;
>  	struct btrfs_key key;
>
> -	key.type = BTRFS_EXTENT_ITEM_KEY;
> +	if (btrfs_fs_incompat(fs_info, SKINNY_METADATA))
> +		key.type = BTRFS_METADATA_ITEM_KEY;
> +	else
> +		key.type = BTRFS_EXTENT_ITEM_KEY;
>  	key.objectid = logical;
>  	key.offset = (u64)-1;
>
> @@ -1294,9 +1307,15 @@ int extent_from_logical(struct btrfs_fs_info *fs_info, u64 logical,
>  		return ret;
>
>  	btrfs_item_key_to_cpu(path->nodes[0], found_key, path->slots[0]);
> -	if (found_key->type != BTRFS_EXTENT_ITEM_KEY ||
> +	if (found_key->type == BTRFS_METADATA_ITEM_KEY)
> +		size = fs_info->extent_root->leafsize;
> +	else if (found_key->type == BTRFS_EXTENT_ITEM_KEY)
> +		size = found_key->offset;
> +
> +	if ((found_key->type != BTRFS_EXTENT_ITEM_KEY &&
> +	     found_key->type != BTRFS_METADATA_ITEM_KEY) ||
>  	    found_key->objectid > logical ||
> -	    found_key->objectid + found_key->offset <= logical) {
> +	    found_key->objectid + size <= logical) {
>  		pr_debug("logical %llu is not within any extent\n",
>  			 (unsigned long long)logical);
>  		return -ENOENT;
> --
> 1.7.7.6
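The whole patch boils down to the two on-disk key layouts the backref walker now has to accept: with the classic key the extent length lives in the key offset, while with the skinny key the offset holds the tree level instead and the length is implicitly one metadata block. Distilled from the hunks above into a single helper (a sketch in kernel context, not a function that exists in the source):

  /* key layouts the backref walker must now accept:
   *   (bytenr, BTRFS_EXTENT_ITEM_KEY,   num_bytes)  -- classic
   *   (bytenr, BTRFS_METADATA_ITEM_KEY, tree level) -- skinny
   */
  static u64 extent_size_from_key(const struct btrfs_key *key, u64 leafsize)
  {
          if (key->type == BTRFS_METADATA_ITEM_KEY)
                  return leafsize;     /* skinny: offset is the level,
                                        * size is one tree block */
          if (key->type == BTRFS_EXTENT_ITEM_KEY)
                  return key->offset;  /* classic: offset is the length */
          return 0;                    /* not an extent item at all */
  }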
btrfsck output: What does it all mean?
This is the btrfsck output for a real-world rsync backup onto a btrfs
raid1 mirror across 4 drives (yes, I know at the moment for btrfs raid1
there's only ever two copies of the data...):

checking extents
checking fs roots
root 5 inode 18446744073709551604 errors 2000
root 5 inode 18446744073709551605 errors 1
root 256 inode 18446744073709551604 errors 2000
root 256 inode 18446744073709551605 errors 1
found 3183604633600 bytes used err is 1
total csum bytes: 3080472924
total tree bytes: 28427821056
total fs tree bytes: 23409475584
btree space waste bytes: 4698218231
file data blocks allocated: 3155176812544
 referenced 3155176812544
Btrfs Btrfs v0.19
Command exited with non-zero status 1

So: What does that little lot mean?

The drives were mounted and active during an unexpected power-plug
pull :-(

Safe to mount again or are there other checks/fixes needed?

Thanks,
Martin
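For what it's worth, the per-inode "errors" values are printed in hex and are a bitmask of inode problems. The bit names below are quoted from memory of later btrfs-progs fsck sources and may not match v0.19 exactly, so treat them as an assumption to verify against your own tree:

  /* two of the inode error bits in btrfs-progs' fsck code (later
   * releases; positions may differ in v0.19 -- verify locally): */
  #define I_ERR_NO_INODE_ITEM     (1 << 0)    /* prints "errors 1"    */
  #define I_ERR_LINK_COUNT_WRONG  (1 << 13)   /* prints "errors 2000" */

Read that way, "errors 2000" would be a wrong link count and "errors 1" a missing inode item, but that mapping is only as trustworthy as the bit table above.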
Re: raid1 inefficient unbalanced filesystem reads
On 29/06/13 10:41, Russell Coker wrote:
> On Sat, 29 Jun 2013, Martin wrote:
>> Mmmm... I'm not sure trying to balance historical read/write counts
>> is the way to go...
>>
>> What happens for the use case of an SSD paired up with a HDD? (For
>> example an SSD and a similarly sized Raptor or enterprise SCSI?...)
>> Or even just JBODs of a mishmash of different speeds?
>>
>> Rather than trying to balance io counts, can a realtime utilisation
>> check be made and go for the least busy?
>
> It would also be nice to be able to tune this. For example I've got a
> RAID-1 array that's mounted noatime, hardly ever written, and accessed
> via NFS on 100baseT. It would be nice if one disk could be spun down
> for most of the time and save 7W of system power. Something like the
> --write-mostly option of mdadm would be good here.

For that case, a --read-mostly would be more apt ;-)

Hence, add a check to preferentially use the last disk used if all are
idle?

> Also it should be possible for a RAID-1 array to allow faster reads
> for a single process reading a single file if the file in question is
> fragmented.

That sounds good but complicated: gathering and sorting the fragments
into groups per disk... Or is something like that already done by the
block device elevator for HDDs? Also, is head seek optimisation turned
off for SSD accesses?

(This is sounding like a lot more than just swapping

    current->pid % map->num_stripes

to a

    pseudorandom_hash(current->pid) % map->num_stripes

... ;-) )

Is there any readily accessible present state, such as disk activity,
queue length, or access latency, available for the btrfs process to
read?

I suspect a good first guess to cover many conditions would be to
'simply' choose whichever device is powered up and has the lowest
current latency, or if idle has the lowest historical latency...

Regards,
Martin
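To make the proposal concrete, here is roughly what the two policies look like side by side. This is a userspace sketch, not btrfs code, and the inflight[] counter is an invented stand-in for whatever per-device busyness metric turns out to be available:

  /* current behaviour: a static choice from the caller's pid */
  static int pick_mirror_pid(int pid, int num_stripes)
  {
          return pid % num_stripes;
  }

  /* proposed: start from the pid-based pick, then prefer whichever
   * mirror has the fewest requests in flight; ties keep the static
   * choice, so a fully idle array keeps hitting the same disk, which
   * roughly matches the "reuse the last disk when idle" idea above. */
  static int pick_mirror_least_busy(int pid, int num_stripes,
                                    const int inflight[])
  {
          int best = pid % num_stripes;
          int i;

          for (i = 0; i < num_stripes; i++)
                  if (inflight[i] < inflight[best])
                          best = i;
          return best;
  }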
Re: btrfsck output: What does it all mean?
Martin posted on Sat, 29 Jun 2013 14:48:40 +0100 as excerpted:

> This is the btrfsck output for a real-world rsync backup onto a btrfs
> raid1 mirror across 4 drives (yes, I know at the moment for btrfs
> raid1 there's only ever two copies of the data...)

Being just a btrfs user I don't have a detailed answer, but perhaps this
helps.

First of all, a btrfs-tools update is available, v0.20-rc1. Given that
btrfs is still experimental and the rate of development, even using the
live-git version (as I do) is probably the best idea, but certainly I'd
encourage you to get the 0.20-rc1 version at least. FWIW,
v0.20-rc1-335-gf00dd83 is what I'm running; that's 335 commits after
rc1, on git commit f00dd83.

(Of course similarly with the kernel. You may not want to run the
live-git mainline kernel during the commit window or even the first
couple of rcs, but starting with rc3 or so, a new mainline pre-release
kernel should be /reasonably/ safe to run in general, and the new kernel
will have enough fixes to btrfs that you really should be running it. Of
course if you've experienced and filed a bug with it and are back on the
latest full stable release until it's fixed, or if there's a known btrfs
regression in the new version that you're waiting on a fix for, then the
latest version without that fix is good. But otherwise, if you're not
running the latest kernel and btrfs-tools, you might be taking chances
with your data that you don't need to take, due to already-existing
fixes you're not yet running.)

> checking extents
> checking fs roots
> root 5 inode 18446744073709551604 errors 2000
> root 5 inode 18446744073709551605 errors 1
> root 256 inode 18446744073709551604 errors 2000
> root 256 inode 18446744073709551605 errors 1

Based on the root numbers, I'd guess those are subvolume IDs. The
original root volume has ID 5, and the first subvolume created under it
has ID 256, based on my own experience. What the error numbers refer to
I don't know. However, based on the identical inode and error numbers
seen in both subvolumes, I'd guess that #256 is a snapshot of #5, and
that whatever is triggering the errors hadn't been written after the
snapshot (which would have copied the data to a new location), so when
the errors happened in the one, they happened in the other as well,
since they're both the same location.

The good news is that in reality that's only one set of errors, reported
twice. The bad news is that it affects both snapshots, so if you don't
have a different snapshot with a newer/older copy of whatever's damaged
in those two, you may simply lose it.

> found 3183604633600 bytes used err is 1
> total csum bytes: 3080472924

csum would be checksum... The rest, above and below, says in the output
pretty much what I'd be able to make of it, so I've nothing really to
add about that.

> total tree bytes: 28427821056
> total fs tree bytes: 23409475584
> btree space waste bytes: 4698218231
> file data blocks allocated: 3155176812544
>  referenced 3155176812544
> Btrfs Btrfs v0.19

Meanwhile, you didn't mention anything about the --repair option. If you
didn't use it just because you want to know a bit more about what it's
doing first, OK; but while btrfsck lacked a repair option for quite some
time, it has had a --repair option for over a year now, so it /is/
possible to try to repair the detected damage these days.

Of course you might be running a really old 0.19+ snapshot without that
ability (distros packaged 0.19+ snapshots for some time during which
there was no upstream release, tho hopefully the distro package says
something about which snapshot it was; we know your version is old in
any case, since it's not 0.20-rc1 or newer but still 0.19 something).

I'd suggest ensuring that you're running the latest almost-release
3.10-rc7+ kernel and the latest btrfs-tools, then both trying a mount
and running the btrfsck again. You can watch the output and check the
kernel log as it runs and as you try to mount the filesystem. It may be
that a newer kernel (presuming your kernel is as old as your btrfs-tools
appear to be) fixes whatever's damaged on-mount, so btrfsck won't have
anything left to do.

If not, since you have backups of the data (well, this was the backup,
so you have the originals) if anything goes wrong, you can try the
--repair option and see what happens. If that doesn't fix it, post the
logs and output from the updated kernel and btrfs-tools btrfsck, and ask
the experts about it once they have that to look at too.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
[PATCH] Btrfs: hold the tree mod lock in __tree_mod_log_rewind
We need to hold the tree mod log lock in __tree_mod_log_rewind since we
walk forward in the tree mod entries, otherwise we'll end up with random
entries and trip the BUG_ON() at the front of __tree_mod_log_rewind.
This fixes the panics people were seeing when running

  find /whatever -type f -exec btrfs fi defrag {} \;

Thanks,

Signed-off-by: Josef Bacik <jba...@fusionio.com>
---
 fs/btrfs/ctree.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index c32d03d..7921e1d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1161,8 +1161,8 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
  * time_seq).
  */
 static void
-__tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
-		      struct tree_mod_elem *first_tm)
+__tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
+		      u64 time_seq, struct tree_mod_elem *first_tm)
 {
 	u32 n;
 	struct rb_node *next;
@@ -1172,6 +1172,7 @@ __tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
 	unsigned long p_size = sizeof(struct btrfs_key_ptr);
 
 	n = btrfs_header_nritems(eb);
+	tree_mod_log_read_lock(fs_info);
 	while (tm && tm->seq >= time_seq) {
 		/*
 		 * all the operations are recorded with the operator used for
@@ -1226,6 +1227,7 @@ __tree_mod_log_rewind(struct extent_buffer *eb, u64 time_seq,
 		if (tm->index != first_tm->index)
 			break;
 	}
+	tree_mod_log_read_unlock(fs_info);
 	btrfs_set_header_nritems(eb, n);
 }
 
@@ -1274,7 +1276,7 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
 
 	extent_buffer_get(eb_rewin);
 	btrfs_tree_read_lock(eb_rewin);
-	__tree_mod_log_rewind(eb_rewin, time_seq, tm);
+	__tree_mod_log_rewind(fs_info, eb_rewin, time_seq, tm);
 	WARN_ON(btrfs_header_nritems(eb_rewin) >
 		BTRFS_NODEPTRS_PER_BLOCK(fs_info->tree_root));
 
@@ -1350,7 +1352,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
 		btrfs_set_header_generation(eb, old_generation);
 	}
 	if (tm)
-		__tree_mod_log_rewind(eb, time_seq, tm);
+		__tree_mod_log_rewind(root->fs_info, eb, time_seq, tm);
 	else
 		WARN_ON(btrfs_header_level(eb) != 0);
 	WARN_ON(btrfs_header_nritems(eb) > BTRFS_NODEPTRS_PER_BLOCK(root));
-- 
1.7.7.6
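The shape of the fix, restated in userspace terms (a sketch only; the pthread rwlock stands in for tree_mod_log_read_lock()/_unlock(), and none of these names exist in the kernel). The point, as the commit message says, is that the whole forward walk through the mod log entries must happen under the read lock, so concurrent writers cannot reshuffle the tree mid-traversal and feed the rewind entries for the wrong block:

  #include <pthread.h>

  static pthread_rwlock_t mod_log_lock = PTHREAD_RWLOCK_INITIALIZER;

  struct mod_entry {
          struct mod_entry *next;   /* stands in for rb_next() links */
  };

  /* hold the read side across the entire traversal, not per entry */
  static void rewind_walk(struct mod_entry *first)
  {
          struct mod_entry *e;

          pthread_rwlock_rdlock(&mod_log_lock);
          for (e = first; e; e = e->next)
                  ;   /* replay one logged operation per entry */
          pthread_rwlock_unlock(&mod_log_lock);
  }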