parent transid verify failed
Hello, this is btrfs-on-LUKS, with a USB HDD as the block device. I can't mount my btrfs anymore; I keep getting the same syslog error:

- Last output repeated twice -
May 11 07:58:25 [kernel] BTRFS error (device dm-3): failed to read block groups: -5
May 11 07:58:25 [kernel] BTRFS error (device dm-3): open_ctree failed
May 11 07:58:31 [kernel] BTRFS info (device dm-3): use zlib compression
May 11 07:58:31 [kernel] BTRFS info (device dm-3): enabling auto defrag
May 11 07:58:31 [kernel] BTRFS info (device dm-3): disk space caching is enabled
May 11 07:58:31 [kernel] BTRFS info (device dm-3): has skinny extents
May 11 07:58:33 [kernel] BTRFS error (device dm-3): parent transid verify failed on 541635395584 wanted 10388 found 10385

This is the last part of btrfs check --repair (I know, highly experimental, but I didn't get an alternative solution on #btrfs):

parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
Chunk[256, 228, 429526089728]: length(1073741824), offset(429526089728), type(1) is not found in block group
Chunk[256, 228, 430599831552]: length(1073741824), offset(430599831552), type(1) is not found in block group
Chunk[256, 228, 431673573376]: length(1073741824), offset(431673573376), type(1) is not found in block group
Chunk[256, 228, 434894798848]: length(1073741824), offset(434894798848), type(1) is not found in block group
Chunk[256, 228, 435968540672]: length(1073741824), offset(435968540672), type(1) is not found in block group
Chunk[256, 228, 437042282496]: length(1073741824), offset(437042282496), type(1) is not found in block group
Chunk[256, 228, 438116024320]: length(1073741824), offset(438116024320), type(1) is not found in block group
ref mismatch on [429497528320 40960] extent item 0, found 1
Backref 429497528320 parent 858210304 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 429497528320 parent 858210304 owner 0 offset 0 found 1 wanted 0 back 0x37aaefc0
backpointer mismatch on [429497528320 40960]
parent transid verify failed on 541635395584 wanted 10388 found 10385
Ignoring transid failure
Failed to find [541635395584, 168, 16384]
btrfs unable to find ref byte nr 541635395584 parent 0 root 2 owner 1 offset 0
failed to repair damaged filesystem, aborting

How did that happen? Yesterday I sent a big snapshot from the local drive to a slower USB drive via btrbk. That had already finished. However, the USB drive was filled up to 99% and was apparently still doing some IO. Then I was not able to shut down the machine. Shutdown was really slow; the unmounts eventually completed, services stopped, and the shutdown was almost finished, but the machine never powered off. I did SysRq E, I, U, S, R, B: no reboot. SysRq-O did not even power off. So as a last resort I disconnected the power supply.

The broken btrfs is actually only a snapshot receiver used as a backup. I would prefer to get it repaired. Seeing that btrfs is sensitive to being filled up to 99% usage, I'm worried about my production btrfs.

This is Gentoo Linux, kernel 4.10.14-ck, btrfs-progs-4.10.2.

Best regards,
Massimo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
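Before reaching for btrfs check --repair on a filesystem in this state, the less invasive read-only mount fallbacks are usually worth trying first. A sketch, not from the thread itself; the device name is hypothetical, substitute your own LUKS mapping:

```sh
# Hypothetical device name for the opened LUKS container.
DEV=/dev/mapper/backup-crypt

# Least invasive first: plain read-only, then ask the kernel to fall
# back to an older tree root (available since kernel 4.6), then skip
# log-tree replay entirely.
mount -o ro "$DEV" /mnt
mount -o ro,usebackuproot "$DEV" /mnt
mount -o ro,nologreplay "$DEV" /mnt

# If any of these succeeds, copy the data off before attempting repair.
```

None of these options writes to the device, so they cannot make the situation worse the way --repair can.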
Re: "parent transid verify failed" out of blue sky
At 02/28/2017 02:51 AM, Andrei Borzenkov wrote:
> This is a VM under QEMU/KVM running openSUSE Tumbleweed. I boot it infrequently,
> for short periods, to test something. Last time it installed quite a lot of
> updates including a kernel (I think 4.9.11 was the last version); I do not
> remember whether I rebooted it after that. Today I booted it to check
> something; after 10 minutes I did "reboot" and was greeted with a grub rescue
> prompt (grub is located on the btrfs itself and apparently failed to read its
> modules as well). Any attempt to mount it fails with "parent transid verify
> failed".
>
> btrfsck --mode=lowmem from the current Tumbleweed snapshot has been running
> for half an hour now with the same never-ending message.

Would you please provide the size of the fs?

lowmem mode is indeed slow, as it doesn't use much memory, so it does tons of tree searches instead. That will produce tons of the same "parent transid verify failed" message if the corrupted node/leaf lies in a hot tree, like the root tree or the extent tree.

Despite that, would you please try to run btrfsck in its original (default) mode on the fs? It may take some memory, but it's more mature than lowmem mode. In fact there are nearly 10 bug fixes for lowmem mode that are not merged yet.

> I do not really care about the disk content, but I would be interested in
> trying to recover it under guidance. Also, if it may be useful, I can provide
> an image or other information.

An image would be best.

However, I'm more interested in how such a problem happens. In theory, btrfs' mandatory metadata CoW and default data CoW should keep btrfs bulletproof against any power loss. (While the real world is far from theory.)

Thanks,
Qu

> TIA
> -andrei
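For providing such an image to the list, btrfs-image captures metadata only (file data is not included), which keeps the result small and mostly shareable. A sketch, assuming the filesystem sits on a hypothetical /dev/vda2 and is not mounted:

```sh
# Metadata-only image of the broken filesystem:
#   -c9  maximum compression
#   -t4  four worker threads
#   -s   additionally sanitize (scramble) file names, if they are private
btrfs-image -c9 -t4 -s /dev/vda2 /tmp/fs-metadata.img
```

The resulting image can be restored onto a scratch device with btrfs-image -r for offline analysis without touching the original.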
"parent transid verify failed" out of blue sky
This is a VM under QEMU/KVM running openSUSE Tumbleweed. I boot it infrequently, for short periods, to test something. Last time it installed quite a lot of updates including a kernel (I think 4.9.11 was the last version); I do not remember whether I rebooted it after that. Today I booted it to check something; after 10 minutes I did "reboot" and was greeted with a grub rescue prompt (grub is located on the btrfs itself and apparently failed to read its modules as well). Any attempt to mount it fails with "parent transid verify failed".

btrfsck --mode=lowmem from the current Tumbleweed snapshot has been running for half an hour now with the same never-ending message.

I do not really care about the disk content, but I would be interested in trying to recover it under guidance. Also, if it may be useful, I can provide an image or other information.

TIA

-andrei
"parent transid verify failed"
Hi,

I am getting some "parent transid verify failed" errors. Is there any way to find out what's affected? Are these errors in metadata, data, or both? And if they are errors in the data: how can I find out which files are affected?

Regards,
Tobias
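A note on triaging these: each "parent transid verify failed on &lt;bytenr&gt; wanted &lt;gen&gt; found &lt;gen&gt;" line names a metadata tree block by its logical byte number, not a file. On a filesystem that still mounts, `btrfs inspect-internal logical-resolve -v <bytenr> /mnt` can map a logical address back to file paths (for data extents). A small sketch for pulling the affected byte numbers out of kernel or btrfsck output; the helper name is made up for illustration:

```python
import re

# Matches both dmesg-prefixed and bare btrfsck transid-failure lines.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

def transid_failures(log_text):
    """Collect (bytenr, wanted_gen, found_gen) triples from log output."""
    return [
        (int(bytenr), int(wanted), int(found))
        for bytenr, wanted, found in TRANSID_RE.findall(log_text)
    ]

if __name__ == "__main__":
    sample = (
        "BTRFS error (device dm-3): parent transid verify failed "
        "on 541635395584 wanted 10388 found 10385\n"
    )
    for bytenr, wanted, found in transid_failures(sample):
        # Each bytenr can then be fed to logical-resolve by hand.
        print(bytenr, wanted, found)
```

A large gap between "wanted" and "found" generations (as in several reports in this thread) means the block on disk is many transactions stale, which points at lost or reordered writes rather than simple bit rot.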
"Fixed", Re: parent transid verify failed on snapshot deletion
On Sat, 12 Mar 2016 20:48:47 +0500
Roman Mamedov <r...@romanrm.net> wrote:

> The system was seemingly running just fine for days or weeks, then I
> routinely deleted a bunch of old snapshots, and suddenly got hit with:
>
> [Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify failed on 7483566862336 wanted 410578 found 404133
> [Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify failed on 7483566862336 wanted 410578 found 404133

As I mentioned, the initial run of btrfsck --repair did not do anything to fix this problem. I then started btrfsck --repair --init-extent-tree, but it still had not finished after 5 days, so I looked for other options.

While reviewing the btrfs-progs source for some way to make btrfsck do something about these transid failures, I spotted the tool called btrfs-corrupt-block. At this point I was ready to accept some loss of data, which I'd expect to be minor if user-visible at all (after all, the original backtrace happens in "btrfs_clean_one_deleted_snapshot", so perhaps all that the "bad" block was storing was related to a snapshot that's already been deleted).

I ran:

/root/btrfs-corrupt-block -l 7483566862336 /dev/nbd8

Btrfsck then finally reported something inspiring some hope:

checking extents
checksum verify failed on 7483566862336 found 295F0086 wanted
checksum verify failed on 7483566862336 found 295F0086 wanted
checksum verify failed on 7483566862336 found 295F0086 wanted
checksum verify failed on 7483566862336 found 295F0086 wanted
bytenr mismatch, want=7483566862336, have=0
deleting pointer to block 7483566862336
ref mismatch on [6504947712 118784] extent item 0, found 1
adding new data backref on 6504947712 parent 4311306919936 owner 0 offset 0 found 1
Backref 6504947712 parent 4311306919936 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 6504947712 parent 4311306919936 owner 0 offset 0 found 1 wanted 0 back 0x57cfdff0
backpointer mismatch on [6504947712 118784]
...etc

After a few passes it settled into a state with no new errors reported (only a few of "bad metadata crossing stripe boundary", but those seem to be commonly reported even on filesystems otherwise exhibiting no issues). Finally I was able to mount the FS with no backtrace occurring anymore; the btrfs-cleaner process then finished all the remaining snapshot-deletion work, freeing up 20 GB or so. All data seems to be present, and selective checksum verifications showed no corruption. In any case, this machine is primarily a backup server using rsync, so it should catch and fix up any losses.

As a side note, for experiments with 'btrfsck --repair', 'btrfs-corrupt-block' and my own patched versions of btrfsck, the technique of making writable CoW snapshots of the whole block device has proved invaluable. At first I used the nbd-server '-c' mode, but quickly discovered it to be flaky: it seems to crash once the amount of changes gets over 150 MB or so, and anyway its RAM usage seems to match "block device size / 1000", i.e. it used 6 GB of RAM for a 6 TB filesystem.

So in the end I changed to using the dm-snapshot target as described in [1]. One just has to remember never to have the snapshot and the original device both visible, with one of them mounted, on the same machine (this will confuse btrfs with duplicate UUIDs). For that, I used the same nbd-server (not using its built-in CoW anymore), exporting writable snapshots over the network and mounting them on a different server or VM.

[1] http://stackoverflow.com/questions/7582019/lvm-like-snapshot-on-a-normal-block-device

--
With respect,
Roman
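The dm-snapshot setup from [1] can be sketched as follows. Device and file names are hypothetical, and this is an outline of the technique rather than a tested recipe; all writes during the experiment land in a sparse COW file while the original device stays untouched:

```sh
DEV=/dev/alpha/lv1      # original block device, never written to
COW=/var/tmp/cow.img    # sparse file that absorbs all writes

# Create the COW file; size it for the amount of change you expect.
truncate -s 20G "$COW"
LOOP=$(losetup --find --show "$COW")

# Build a writable snapshot: reads fall through to $DEV, writes go to
# $LOOP. Table format: start length snapshot <origin> <cow> P <chunksize>.
SIZE=$(blockdev --getsz "$DEV")   # device size in 512-byte sectors
echo "0 $SIZE snapshot $DEV $LOOP P 8" | dmsetup create fs-sandbox

# Run the risky experiment against the sandbox only:
#   btrfs check --repair /dev/mapper/fs-sandbox
# Tear down afterwards:
#   dmsetup remove fs-sandbox && losetup -d "$LOOP"
```

As the message above warns, never let btrfs see both the snapshot and the origin as mountable devices on the same machine, since they share a filesystem UUID.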
Re: parent transid verify failed on snapshot deletion
On Sun, 13 Mar 2016 15:52:52 -0600
Chris Murphy wrote:

> I really think you need a minute's worth of kernel messages prior to
> that time stamp.

There were no messages a minute, or even (from memory) many hours, prior to the crash. If there had been anything even remotely weird, or block-device- or FS-related, I would of course have included it with the original report.

--
With respect,
Roman
Re: parent transid verify failed on snapshot deletion
On Sun, Mar 13, 2016 at 2:55 PM, Roman Mamedov wrote:
> On Sun, 13 Mar 2016 14:10:47 -0600
> Chris Murphy wrote:
>
>> I'm going to guess it's a metadata block, and the profile is single.
>> Otherwise, if it were data it'd just be a corrupt file and you'd be
>> told which one is affected. And if metadata had more than one copy,
>> then it should recover from the copy. The exact nature of the loss
>> isn't clear, a kernel message for the time of the bad block message
>> might help but I'm going to guess again that it's a 4096 byte missing
>> block of metadata. Depending on what it is, that could be a pretty
>> serious hole for any file system.
>
> Pretty sure the metadata is DUP on that FS.

Big difference. If it's single and the block is bad, it's uncertain whether it's something btrfs should be able to recover from. If it's DUP then it should be a non-factor. In either case, kernel messages would be a lot more enlightening about what happened right before this. The call trace really isn't that helpful in my opinion; all it tells us is that btrfs got confused.
> I saved this from before the btrfsck passes:
>
> # btrfs-debug-tree -b 7483566862336 /dev/alpha/lv1
> :(
> node 7483566862336 level 3 items 95 free 26 generation 404133 owner 7
> [remainder of the quoted btrfs-debug-tree listing trimmed; the full dump is in Roman's original message]
Re: parent transid verify failed on snapshot deletion
On Sun, 13 Mar 2016 14:10:47 -0600
Chris Murphy wrote:

> I'm going to guess it's a metadata block, and the profile is single.
> Otherwise, if it were data it'd just be a corrupt file and you'd be
> told which one is affected. And if metadata had more than one copy,
> then it should recover from the copy. The exact nature of the loss
> isn't clear, a kernel message for the time of the bad block message
> might help but I'm going to guess again that it's a 4096 byte missing
> block of metadata. Depending on what it is, that could be a pretty
> serious hole for any file system.

Pretty sure the metadata is DUP on that FS. Besides, the "bad" block (only going by btrfsck's lingo here; it's not the usual "hard disk got a bad block" problem) is not entirely missing, it's just 6 thousand transids older than it should be(???).

I saved this from before the btrfsck passes:

# btrfs-debug-tree -b 7483566862336 /dev/alpha/lv1
:(
node 7483566862336 level 3 items 95 free 26 generation 404133 owner 7
fs uuid 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99
chunk uuid 4688dce4-89dd-43eb-a0f4-d10900535183
	key (EXTENT_CSUM EXTENT_CSUM 1062973087744) block 4314139631616 (1053256746) gen 402032
	key (EXTENT_CSUM EXTENT_CSUM 1091441795072) block 4314548232192 (1053356502) gen 402102
	key (EXTENT_CSUM EXTENT_CSUM 1107647541248) block 7482607947776 (1826808581) gen 402791
	key (EXTENT_CSUM EXTENT_CSUM 1176289222656) block 7482608832512 (1826808797) gen 402791
	key (EXTENT_CSUM EXTENT_CSUM 1199852232704) block 7483421888512 (1827007297) gen 403882
	key (EXTENT_CSUM EXTENT_CSUM 1252762054656) block 7483566968832 (1827042717) gen 404133
	key (EXTENT_CSUM EXTENT_CSUM 1302207705088) block 7486122131456 (1827666536) gen 399086
	key (EXTENT_CSUM EXTENT_CSUM 1342292983808) block 7486136766464 (1827670109) gen 399086
	key (EXTENT_CSUM EXTENT_CSUM 1357230608384) block 7486143053824 (1827671644) gen 399088
	key (EXTENT_CSUM EXTENT_CSUM 1374801608704) block 7486219661312 (1827690347) gen 399097
	key (EXTENT_CSUM EXTENT_CSUM 140654296) block 7482936365056 (1826888761) gen 403108
	key (EXTENT_CSUM EXTENT_CSUM 1425602490368) block 7482806996992 (1826857177) gen 402938
	key (EXTENT_CSUM EXTENT_CSUM 1439588401152) block 7492133109760 (1829134060) gen 400631
	key (EXTENT_CSUM EXTENT_CSUM 1471449923584) block 7486878142464 (1827851109) gen 399121
	key (EXTENT_CSUM EXTENT_CSUM 1494641868800) block 7486882181120 (1827852095) gen 399121
	key (EXTENT_CSUM EXTENT_CSUM 1511553085440) block 7492376141824 (1829193394) gen 400803
	key (EXTENT_CSUM EXTENT_CSUM 1530452836352) block 7492377698304 (1829193774) gen 400803
	key (EXTENT_CSUM EXTENT_CSUM 1557468987392) block 7544937934848 (1842025863) gen 401275
	key (EXTENT_CSUM EXTENT_CSUM 1589122428928) block 7544937947136 (1842025866) gen 401275
	key (EXTENT_CSUM EXTENT_CSUM 1623402835968) block 7544935043072 (1842025157) gen 401275
	key (EXTENT_CSUM EXTENT_CSUM 1660158967808) block 7544935292928 (1842025218) gen 401275
	key (EXTENT_CSUM EXTENT_CSUM 1686639628288) block 7544935317504 (1842025224) gen 401275
	key (EXTENT_CSUM EXTENT_CSUM 1717318074368) block 7545404669952 (1842139812) gen 401300
	key (EXTENT_CSUM EXTENT_CSUM 1755587174400) block 7544935378944 (1842025239) gen 401275
	key (EXTENT_CSUM EXTENT_CSUM 1771312803840) block 7482802622464 (1826856109) gen 402938
	key (EXTENT_CSUM EXTENT_CSUM 1792774889472) block 7545001177088 (1842041303) gen 401281
	key (EXTENT_CSUM EXTENT_CSUM 1833762066432) block 7545013350400 (1842044275) gen 401278
	key (EXTENT_CSUM EXTENT_CSUM 1848938086400) block 7545009430528 (1842043318) gen 401278
	key (EXTENT_CSUM EXTENT_CSUM 1874773962752) block 7545013170176 (1842044231) gen 401278
	key (EXTENT_CSUM EXTENT_CSUM 1912300650496) block 4309044703232 (1052012867) gen 401366
	key (EXTENT_CSUM EXTENT_CSUM 1934921564160) block 4308804886528 (1051954318) gen 401354
	key (EXTENT_CSUM EXTENT_CSUM 1951308283904) block 4310900432896 (1052465926) gen 401686
	key (EXTENT_CSUM EXTENT_CSUM 1966261223424) block 4309153787904 (1052039499) gen 401376
	key (EXTENT_CSUM EXTENT_CSUM 1985369530368) block 4311094611968 (105251) gen 401757
	key (EXTENT_CSUM EXTENT_CSUM 2002212573184) block 4311279501312 (1052558472) gen 401766
	key (EXTENT_CSUM EXTENT_CSUM 2031789600768) block 4311093194752 (1052512987) gen 401757
	key (EXTENT_CSUM EXTENT_CSUM 2056985681920) block 4311095111680 (1052513455) gen 401757
	key (EXTENT_CSUM EXTENT_CSUM 2086494728192) block 4310101364736 (1052270841) gen 401441
	key (EXTENT_CSUM EXTENT_CSUM 2114637971456) block 4311356846080 (1052577355) gen 401773
	key (EXTENT_CSUM EXTENT_CSUM
Re: parent transid verify failed on snapshot deletion
My unfortunate experience with these transid problems is that they (1) randomly appear without warning and (2) --repair completely destroys the filesystem. I have right now two separate volumes on two separate disks reporting that error, and --repair surely destroyed the first one. I am trying to see what I can restore from the second one before I try --repair on it as well.

The frustrating part is that these volumes in my case are only used to receive subvolumes and delete them. From an outsider's point of view, it hardly seems a very intensive workload.

Sylvain

2016-03-12 12:48 GMT-03:00 Roman Mamedov <r...@romanrm.net>:
> Hello,
>
> The system was seemingly running just fine for days or weeks, then I
> routinely deleted a bunch of old snapshots, and suddenly got hit with:
>
> [Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify failed on 7483566862336 wanted 410578 found 404133
> [Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify failed on 7483566862336 wanted 410578 found 404133
> [Sat Mar 12 20:17:10 2016] [ cut here ]
> [Sat Mar 12 20:17:10 2016] WARNING: CPU: 0 PID: 217 at fs/btrfs/extent-tree.c:6549 __btrfs_free_extent.isra.67+0x2c2/0xd40 [btrfs]()
> [Sat Mar 12 20:17:10 2016] BTRFS: Transaction aborted (error -5)
> [Sat Mar 12 20:17:10 2016] Modules linked in: xt_tcpudp xt_multiport xt_limit xt_length xt_conntrack ip6t_rpfilter ipt_rpfilter ip6table_raw ip6table_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative cfg80211 rfkill arc4 ecb md4 hmac nls_utf8 cifs dns_resolver fscache 8021q garp mrp bridge stp llc tcp_illinois ext4 crc16 mbcache jbd2 fuse kvm_amd kvm irqbypass serio_raw evdev pcspkr joydev snd_hda_codec_realtek k10temp snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep acpi_cpufreq sp5100_tco snd_pcm snd_timer tpm_tis snd tpm shpchp soundcore i2c_piix4 button processor btrfs dm_mod raid1 raid456
> [Sat Mar 12 20:17:10 2016] async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sg ata_generic sd_mod hid_generic usbhid hid uas usb_storage ohci_pci xhci_pci xhci_hcd r8169 mii sata_mv ahci libahci pata_atiixp ehci_pci ohci_hcd ehci_hcd libata usbcore usb_common scsi_mod
> [Sat Mar 12 20:17:10 2016] CPU: 0 PID: 217 Comm: btrfs-cleaner Tainted: G W 4.4.4-rm1+ #108
> [Sat Mar 12 20:17:10 2016] Hardware name: Gigabyte Technology Co., Ltd. GA-E350N-USB3/GA-E350N-USB3, BIOS F2 09/19/2011
> [Sat Mar 12 20:17:10 2016] 0286 7223a131 880406befa88 81315721
> [Sat Mar 12 20:17:10 2016] 880406befad0 a03539b2 880406befac0 8107e735
> [Sat Mar 12 20:17:10 2016] 000183c9c000 fffb 88032dbc0e01 069c4f95b000
> [Sat Mar 12 20:17:10 2016] Call Trace:
> [Sat Mar 12 20:17:10 2016] [] dump_stack+0x63/0x82
> [Sat Mar 12 20:17:10 2016] [] warn_slowpath_common+0x95/0xe0
> [Sat Mar 12 20:17:10 2016] [] warn_slowpath_fmt+0x5c/0x80
> [Sat Mar 12 20:17:10 2016] [] __btrfs_free_extent.isra.67+0x2c2/0xd40 [btrfs]
> [Sat Mar 12 20:17:10 2016] [] __btrfs_run_delayed_refs+0x412/0x1230 [btrfs]
> [Sat Mar 12 20:17:10 2016] [] ? __percpu_counter_add+0x5d/0x80
> [Sat Mar 12 20:17:10 2016] [] btrfs_run_delayed_refs+0x7e/0x2b0 [btrfs]
> [Sat Mar 12 20:17:10 2016] [] btrfs_should_end_transaction+0x68/0x70 [btrfs]
> [Sat Mar 12 20:17:10 2016] [] btrfs_drop_snapshot+0x45d/0x840 [btrfs]
> [Sat Mar 12 20:17:10 2016] [] ? __schedule+0x355/0xa30
> [Sat Mar 12 20:17:10 2016] [] btrfs_clean_one_deleted_snapshot+0xbd/0x120 [btrfs]
> [Sat Mar 12 20:17:10 2016] [] cleaner_kthread+0x17d/0x210 [btrfs]
> [Sat Mar 12 20:17:10 2016] [] ? check_leaf+0x370/0x370 [btrfs]
> [Sat Mar 12 20:17:10 2016] [] kthread+0xea/0x100
> [Sat Mar 12 20:17:10 2016] [] ? kthread_park+0x60/0x60
> [Sat Mar 12 20:17:10 2016] [] ret_from_fork+0x3f/0x70
> [Sat Mar 12 20:17:10 2016] [] ? kthread_park+0x60/0x60
> [Sat Mar 12 20:17:10 2016] ---[ end trace 4a0a05309f1c27f4 ]---
> [Sat Mar 12 20:17:10 2016] BTRFS: error (device dm-0) in __btrfs_free_extent:6549: errno=-5 IO failure
> [Sat Mar 12 20:17:10 2016] BTRFS info (device dm-0): forced readonly
> [Sat Mar 12 20:17:10 2016] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2927: errno=-5 IO failure
> [Sat Mar 12 20:17:10 2016] pending csums is 103825408
>
> Now this happens after each reboot too, causing the FS to be remounted
Re: parent transid verify failed on snapshot deletion
On Sun, Mar 13, 2016 at 11:24 AM, Roman Mamedov wrote:
>
> "Blowing away" a 6TB filesystem just because some block randomly went "bad",

I'm going to guess it's a metadata block, and the profile is single. Otherwise, if it were data it'd just be a corrupt file and you'd be told which one is affected. And if metadata had more than one copy, then it should recover from the copy. The exact nature of the loss isn't clear; a kernel message for the time of the bad-block message might help, but I'm going to guess again that it's a 4096-byte missing block of metadata. Depending on what it is, that could be a pretty serious hole for any file system.

> I'm running --init-extent-tree right now in a "what if" mode, using
> the copy-on-write feature of 'nbd-server' (this way the original block device
> is not modified, and all changes are saved in a separate file).

So it's a btrfs on NBD with no replication either from btrfs or from the storage backing it on the server? Offhand I'd say one of them needs redundancy to avoid this very problem; otherwise it's just too easy for even network corruption (NBD or iSCSI) to cause a problem.

Not related to your problem, but I'm not sure whether, and how many times, btrfs retries corrupt reads. That is, the device returns the read command OK (no error), but btrfs detects corruption. Does it retry, or immediately fail? For flash- and network-based btrfs, it's possible the result is intermittent, so it should try again.

> It's been
> running for a good 8 hours now, with 100% CPU use of btrfsck and very little
> disk access.

Yeah, btrfs check is very much RAM intensive.

--
Chris Murphy
Re: parent transid verify failed on snapshot deletion
On Sun, 13 Mar 2016 17:03:54 +0000 (UTC)
Duncan <1i5t5.dun...@cox.net> wrote:

> With backups I'd try it, if only for the personal experience value and to
> see what the result was. But that's certainly more intensive "surgery"
> on the filesystem than --repair, and I'd only do it either for that
> experience value or if I was seriously desperate to recover files, as I'd
> not trust the filesystem's health after that intensive a surgery, and
> would blow the filesystem away after I recovered what I needed, even if
> it did appear to work successfully.

"Blowing away" a 6 TB filesystem just because some block randomly went "bad", without any explanation why, or any guarantee that this won't happen again, is not the best outcome. Sure, there might be no way to "guarantee" anything, but let's at least figure out a robust way to recover from this failure state.

I'm running --init-extent-tree right now in a "what if" mode, using the copy-on-write feature of 'nbd-server' (this way the original block device is not modified, and all changes are saved in a separate file). It's been running for a good 8 hours now, with 100% CPU use by btrfsck and very little disk access. Unless I'm mistaken and something went majorly wrong, these messages (100 MB worth of them by now) seem to indicate it is indeed proceeding to recreate the extent tree:

adding new data backref on 3282190336 parent 4315246948352 owner 0 offset 0 found 1
Backref 3282190336 root 256 owner 1187677 offset 4096 num_refs 0 not found in extent tree
Incorrect local backref count on 3282190336 root 256 owner 1187677 offset 4096 found 1 wanted 0 back 0x23496e40
Backref 3282190336 parent 4315038240768 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 3282190336 parent 4315038240768 owner 0 offset 0 found 1 wanted 0 back 0x4b29f3a0
Backref 3282190336 parent 4315246948352 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 3282190336 parent 4315246948352 owner 0 offset 0 found 1 wanted 0 back 0x4c330f60
backpointer mismatch on [3282190336 4096]
ref mismatch on [3282194432 32768] extent item 0, found 1
adding new data backref on 3282194432 parent 4309109956608 owner 0 offset 0 found 1
Backref 3282194432 parent 4309109956608 owner 0 offset 0 num_refs 0 not found in extent tree
Incorrect local backref count on 3282194432 parent 4309109956608 owner 0 offset 0 found 1 wanted 0 back 0x52903a20
backpointer mismatch on [3282194432 32768]
ref mismatch on [3282227200 4096] extent item 0, found 1

As it finishes I'll check whether files are present and not corrupted, then will have to run it once more, this time "for real". Unfortunately this also seems to be an O(n) operation (if I'm using the term correctly), as the rate at which new log messages appear has been slowing down considerably as it progresses.

--
With respect,
Roman
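The "what if" setup described above can be sketched roughly as follows. Host names and ports are hypothetical, the old-style nbd-server command-line invocation is assumed (newer versions prefer a config file), and this is an untested outline of the technique, not a recipe:

```sh
# On the machine holding the disk: export it copy-on-write (-c), so all
# writes are diverted to a diff file and the device stays unmodified.
nbd-server 10809 /dev/alpha/lv1 -c

# On a test machine: attach the export and experiment freely against it.
nbd-client storage-host.example.com 10809 /dev/nbd8
btrfs check --repair --init-extent-tree /dev/nbd8
```

As noted in the "Fixed" follow-up in this thread, the nbd-server CoW mode proved flaky for large change sets, and a dm-snapshot over the raw device is a more robust way to get the same effect.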
Re: parent transid verify failed on snapshot deletion
Roman Mamedov posted on Sun, 13 Mar 2016 14:24:28 +0500 as excerpted:

> With "Errors found in extent allocation tree", I wonder if I should try
> --init-extent-tree next.

With backups I'd try it, if only for the personal experience value and to see what the result was. But that's certainly more intensive "surgery" on the filesystem than --repair, and I'd only do it either for that experience value or if I was seriously desperate to recover files, as I'd not trust the filesystem's health after that intensive a surgery, and would blow the filesystem away after I recovered what I needed, even if it did appear to work successfully.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: parent transid verify failed on snapshot deletion
On Sat, 12 Mar 2016 22:15:24 +0500
Roman Mamedov <r...@romanrm.net> wrote:

> Seems like it should be safe to run --repair?

Well, this is unexpected: I ran --repair, and it did not do anything.

# btrfsck --repair /dev/alpha/lv1
enabling repair mode
Checking filesystem on /dev/alpha/lv1
UUID: 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99
checking extents
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
bad block 7483566862336
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
Fixed 0 roots.
checking free space cache
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
There is no free space entry for 6504947712-7537164288
cache appears valid but isnt 6463422464
found 2455135691065 bytes used err is -22
total csum bytes: 0
total tree bytes: 368590848
total fs tree bytes: 0
total extent tree bytes: 364605440
btree space waste bytes: 122267201
file data blocks allocated: 1294204928
 referenced 1294204928

# btrfsck /dev/alpha/lv1
Checking filesystem on /dev/alpha/lv1
UUID: 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99
checking extents
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
bad block 7483566862336
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
checking free space cache
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
There is no free space entry for 6504947712-7537164288
cache appears valid but isnt 6463422464
found 2455135691065 bytes used err is -22
total csum bytes: 0
total tree bytes: 368590848
total fs tree bytes: 0
total extent tree bytes: 364605440
btree space waste bytes: 122267201
file data blocks allocated: 1294204928
 referenced 1294204928

With "Errors found in extent allocation tree", I wonder if I should try --init-extent-tree next.

--
With respect,
Roman
Re: parent transid verify failed on snapshot deletion
Roman Mamedov posted on Sat, 12 Mar 2016 20:48:47 +0500 as excerpted:

> I wonder what's the best way to proceed here. Maybe try btrfs-zero-log?
> But the difference between transid numbers of 6 thousands is concerning.

btrfs-zero-log is a very specific tool designed to fix a very specific problem, and transid differences >1 are not it.

I read your followup, posting btrfs check output and wondering about enabling --repair, as well. As long as you have a backup, it shouldn't be a problem, even if it does cause further damage (which it doesn't appear likely to in your case).

If you don't have a backup it shouldn't be a problem either, since the very fact that you don't have a backup indicates, by your actions, that you consider the data at risk as of less value than the time, effort and resources necessary to have that backup in the first place. As such, even if you lose the data, you saved what was obviously more important to you than that data: the time, effort and resources that you would have otherwise put into making and testing that backup. So you're still coming out ahead. =:^)

Which means the only case not clearly covered is that of data worth having backed up, which you do, but the backup is somewhat stale, and as long as the risk was theoretical, you didn't consider the chance of something happening to the data updated since the backup worth more than the cost of updating that backup. But now that the theoretical chance has become reality, while loss of that incremental data isn't earth-shattering in its consequences, you'd prefer not to lose it if you can save it without too much trouble.

That's quite understandable, and is the exact position I've been in myself a couple of times. In both my cases, where I did end up actually giving up on repair and eventually blowing away the filesystem, btrfs restore (before that blow-away) was able to get me back the incremental changes since my last proper backup.

If it hadn't worked I'd have certainly lost some work and been less than absolutely happy, but as I _did_ have backups (which, by the fact that I had them, indicated I actually valued the data at risk at something above trivial) that were simply somewhat stale, it wouldn't have been the end of the world.

Of course in your case you _can_ mount, if only in read-only mode. So take the opportunity you've been handed and update your backups, just in case. (And of course backups that haven't been verified readable/restorable aren't yet completed backups; a would-be backup isn't complete and can't really be considered a backup until that verification is done.) Then, even in the worst-case scenario, btrfs check --repair can't do more than inconvenience you a bit if it makes the problem worse instead of fixing it, since you have current backups and will only need to blow away the filesystem and recreate it fresh, in order to restore them.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
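[Editorial sketch.] The "backup isn't a backup until verified" step above can be exercised without touching the broken filesystem at all. The sketch below uses throwaway temp directories in place of the real (read-only mounted) filesystem and the backup target, and cp/diff in place of whatever backup tool you actually use; all paths and tooling here are stand-ins, not the poster's setup.

```shell
# Hedged sketch of "update your backup, then verify it", on temp dirs
# rather than a real read-only-mounted btrfs.
src=$(mktemp -d); dst=$(mktemp -d)
echo "document" > "$src/doc.txt"
mkdir "$src/media"; echo "photo" > "$src/media/p1"

# 1. copy everything to the backup target
cp -a "$src/." "$dst/"

# 2. verify the copy is actually readable and identical --
#    only now does it count as a completed backup
if diff -r "$src" "$dst" > /dev/null; then
    verified=yes
else
    verified=no
fi
echo "backup verified: $verified"
rm -rf "$src" "$dst"
```

With the real filesystem, the same shape applies: mount read-only, copy out, then verify the copy before risking btrfs check --repair.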
Re: parent transid verify failed on snapshot deletion
Hello,

btrfsck output:

# btrfsck /dev/alpha/lv1
Checking filesystem on /dev/alpha/lv1
UUID: 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99
checking extents
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
bad block 7483566862336
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
checking free space cache
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
There is no free space entry for 6504947712-7537164288
cache appears valid but isnt 6463422464
found 2455135703350 bytes used err is -22
total csum bytes: 0
total tree bytes: 368590848
total fs tree bytes: 0
total extent tree bytes: 364605440
btree space waste bytes: 122267203
file data blocks allocated: 1294204928
 referenced 1294204928

Seems like it should be safe to run --repair?

--
With respect,
Roman
parent transid verify failed on snapshot deletion
Hello,

The system was seemingly running just fine for days or weeks, then I routinely deleted a bunch of old snapshots, and suddenly got hit with:

[Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify failed on 7483566862336 wanted 410578 found 404133
[Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify failed on 7483566862336 wanted 410578 found 404133
[Sat Mar 12 20:17:10 2016] [ cut here ]
[Sat Mar 12 20:17:10 2016] WARNING: CPU: 0 PID: 217 at fs/btrfs/extent-tree.c:6549 __btrfs_free_extent.isra.67+0x2c2/0xd40 [btrfs]()
[Sat Mar 12 20:17:10 2016] BTRFS: Transaction aborted (error -5)
[Sat Mar 12 20:17:10 2016] Modules linked in: xt_tcpudp xt_multiport xt_limit xt_length xt_conntrack ip6t_rpfilter ipt_rpfilter ip6table_raw ip6table_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative cfg80211 rfkill arc4 ecb md4 hmac nls_utf8 cifs dns_resolver fscache 8021q garp mrp bridge stp llc tcp_illinois ext4 crc16 mbcache jbd2 fuse kvm_amd kvm irqbypass serio_raw evdev pcspkr joydev snd_hda_codec_realtek k10temp snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep acpi_cpufreq sp5100_tco snd_pcm snd_timer tpm_tis snd tpm shpchp soundcore i2c_piix4 button processor btrfs dm_mod raid1 raid456
[Sat Mar 12 20:17:10 2016] async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sg ata_generic sd_mod hid_generic usbhid hid uas usb_storage ohci_pci xhci_pci xhci_hcd r8169 mii sata_mv ahci libahci pata_atiixp ehci_pci ohci_hcd ehci_hcd libata usbcore usb_common scsi_mod
[Sat Mar 12 20:17:10 2016] CPU: 0 PID: 217 Comm: btrfs-cleaner Tainted: G W 4.4.4-rm1+ #108
[Sat Mar 12 20:17:10 2016] Hardware name: Gigabyte Technology Co., Ltd. GA-E350N-USB3/GA-E350N-USB3, BIOS F2 09/19/2011
[Sat Mar 12 20:17:10 2016] 0286 7223a131 880406befa88 81315721
[Sat Mar 12 20:17:10 2016] 880406befad0 a03539b2 880406befac0 8107e735
[Sat Mar 12 20:17:10 2016] 000183c9c000 fffb 88032dbc0e01 069c4f95b000
[Sat Mar 12 20:17:10 2016] Call Trace:
[Sat Mar 12 20:17:10 2016] [] dump_stack+0x63/0x82
[Sat Mar 12 20:17:10 2016] [] warn_slowpath_common+0x95/0xe0
[Sat Mar 12 20:17:10 2016] [] warn_slowpath_fmt+0x5c/0x80
[Sat Mar 12 20:17:10 2016] [] __btrfs_free_extent.isra.67+0x2c2/0xd40 [btrfs]
[Sat Mar 12 20:17:10 2016] [] __btrfs_run_delayed_refs+0x412/0x1230 [btrfs]
[Sat Mar 12 20:17:10 2016] [] ? __percpu_counter_add+0x5d/0x80
[Sat Mar 12 20:17:10 2016] [] btrfs_run_delayed_refs+0x7e/0x2b0 [btrfs]
[Sat Mar 12 20:17:10 2016] [] btrfs_should_end_transaction+0x68/0x70 [btrfs]
[Sat Mar 12 20:17:10 2016] [] btrfs_drop_snapshot+0x45d/0x840 [btrfs]
[Sat Mar 12 20:17:10 2016] [] ? __schedule+0x355/0xa30
[Sat Mar 12 20:17:10 2016] [] btrfs_clean_one_deleted_snapshot+0xbd/0x120 [btrfs]
[Sat Mar 12 20:17:10 2016] [] cleaner_kthread+0x17d/0x210 [btrfs]
[Sat Mar 12 20:17:10 2016] [] ? check_leaf+0x370/0x370 [btrfs]
[Sat Mar 12 20:17:10 2016] [] kthread+0xea/0x100
[Sat Mar 12 20:17:10 2016] [] ? kthread_park+0x60/0x60
[Sat Mar 12 20:17:10 2016] [] ret_from_fork+0x3f/0x70
[Sat Mar 12 20:17:10 2016] [] ? kthread_park+0x60/0x60
[Sat Mar 12 20:17:10 2016] ---[ end trace 4a0a05309f1c27f4 ]---
[Sat Mar 12 20:17:10 2016] BTRFS: error (device dm-0) in __btrfs_free_extent:6549: errno=-5 IO failure
[Sat Mar 12 20:17:10 2016] BTRFS info (device dm-0): forced readonly
[Sat Mar 12 20:17:10 2016] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2927: errno=-5 IO failure
[Sat Mar 12 20:17:10 2016] pending csums is 103825408

Now this happens after each reboot too, causing the FS to be remounted read-only.

I wonder what's the best way to proceed here. Maybe try btrfs-zero-log? But the difference between the transid numbers, some six thousand, is concerning.

Also puzzling is why this happened in the first place; I don't think this filesystem had any crashes or storage device-related issues recently.

--
With respect,
Roman
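[Editorial sketch.] The wanted/found gap that worries the poster can be pulled mechanically out of a dmesg line. A minimal sketch, using one of the error lines quoted above as embedded sample input (no device access needed):

```shell
# Hedged sketch: compute the wanted-vs-found transid gap from a dmesg line.
line='BTRFS error (device dm-0): parent transid verify failed on 7483566862336 wanted 410578 found 404133'
wanted=$(printf '%s\n' "$line" | awk '{for (i=1;i<=NF;i++) if ($i=="wanted") print $(i+1)}')
found=$(printf '%s\n' "$line" | awk '{for (i=1;i<=NF;i++) if ($i=="found") print $(i+1)}')
gap=$((wanted - found))
echo "transid gap: $gap"
```

For this thread's numbers the gap is 6445 commits, which is why a log-replay tool like btrfs-zero-log (which only covers the few seconds of fsyncs since the last commit) cannot be the fix.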
Re: Fixing recursive fault and parent transid verify failed
On Wed, Dec 09, 2015 at 10:19:41AM +, Duncan wrote:
> Alistair Grant posted on Wed, 09 Dec 2015 09:38:47 +1100 as excerpted:
>
> > On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
> >
> > Thanks again Duncan for your assistance.
> >
> > I plugged the ext4 drive I planned to use for the recovery in to the
> > machine and immediately got a couple of errors, which makes me wonder
> > whether there isn't a hardware problem with the machine somewhere.
> >
> > So decided to move to another machine to do the recovery.
>
> Ouch! That can happen, and if you moved the ext4 drive to a different
> machine and it was fine there, then it's not the drive.
>
> But you didn't say what kind of errors or if you checked SMART, or even
> how it was plugged in (USB or SATA-direct or...). So I guess you have
> that side of things under control. (If not, there's some here who know
> quite a bit about that sort of thing...)

Yep, I'm familiar enough with smartmontools, etc. to (hopefully) figure this out on my own.

> > So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1
> > (the latest version from archlinuxarm.org).
> >
> > Attempting:
> >
> > sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee
> > btrfs-recover.log
> >
> > only recovered 53 of the more than 106,000 files that should be
> > available.
> >
> > The log is available at:
> >
> > https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0
> >
> > I did attempt btrfs-find-root, but couldn't make sense of the output:
> >
> > https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0
>
> Yeah, btrfs-find-root's output deciphering takes a bit of knowledge.
> Between what I had said and the wiki, I was hoping you could make sense
> of things without further help, but...
> ...

It turns out that a drive from a separate filesystem was dying and causing all the weird behaviour on the original machine.

Having two failures at the same time (drive physical failure and btrfs filesystem corruption) was a bit too much for me, so I aborted the btrfs restore attempts, bought a replacement drive and just went back to the backups (for both failures). Unfortunately, I now won't be able to determine whether there was any connection between the failures or not.

So while I didn't get to practice my restore skills, the good news is that it is all back up and running without any problems (yet :-)).

Thank you very much for the description and detailed set of steps for using btrfs-find-root and restore. While I didn't get to use them this time, I've added links to the mailing list archive in my btrfs wiki user page so I can find my way back (and if others search for restore and find-root they may also benefit from your effort).

Thanks again,
Alistair
Re: Fixing recursive fault and parent transid verify failed
Alistair Grant posted on Wed, 09 Dec 2015 09:38:47 +1100 as excerpted:

> On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
>> Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:
>>
>> > On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
>> >> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as
>> >> excerpted:
>> >>
>> >> > I think I'll try the btrfs restore as a learning exercise
>> >>
>> >> Trying btrfs restore is an excellent idea. It'll make things far
>> >> easier if you have to use it for real some day.
>
> Thanks again Duncan for your assistance.
>
> I plugged the ext4 drive I planned to use for the recovery in to the
> machine and immediately got a couple of errors, which makes me wonder
> whether there isn't a hardware problem with the machine somewhere.
>
> So decided to move to another machine to do the recovery.

Ouch! That can happen, and if you moved the ext4 drive to a different machine and it was fine there, then it's not the drive.

But you didn't say what kind of errors or if you checked SMART, or even how it was plugged in (USB or SATA-direct or...). So I guess you have that side of things under control. (If not, there's some here who know quite a bit about that sort of thing...)

> So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1
> (the latest version from archlinuxarm.org).
>
> Attempting:
>
> sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee
> btrfs-recover.log
>
> only recovered 53 of the more than 106,000 files that should be
> available.
>
> The log is available at:
>
> https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0
>
> I did attempt btrfs-find-root, but couldn't make sense of the output:
>
> https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0

Yeah, btrfs-find-root's output deciphering takes a bit of knowledge. Between what I had said and the wiki, I was hoping you could make sense of things without further help, but...

Well, at least this gets you some practice before you are desperate. =:^)

FWIW, I was really hoping that it would find generation/transid 2308, since that's what it was finding on those errors, but that seems to be too far back.

OK, here's the thing about transaction IDs aka transids aka generations. Normally, it's a monotonically increasing number, representing the transaction/commit count at that point.

Taking a step back, btrfs organizes things as a tree of trees, with each change cascading up (down?) the tree to its root, and then to the master tree's root. Between this and btrfs' copy-on-write nature, this means the filesystem is atomic. If the system crashes at any point, either the latest changes are committed and the master root reflects them, or the master root points to the previous consistent state of all the subtrees, which is still in place due to copy-on-write and the fact that the changes hadn't cascaded all the way up the trees to the master root yet. And each time the master root is updated, the generation aka transid is incremented by one.

So 3503 is the current generation (see the superblock thinks... bit), 3502 the one before that, 3501 the one before that...

The superblocks record the current transid and point (by address, aka bytenr) to that master root. But, because btrfs is copy-on-write, older copies of the master root (and the other roots it points to) tend to hang around for a while. Which is where btrfs-find-root comes along, as it's designed to find all those old roots, listing them by bytenr and generation/transid.

In your case, while generation 3361 is current, there's a list going back to generation 2497 with only a few (just eyeballing it) missing, then 2326, and pretty much nothing before that but the REALLY early generations 2 and 3, which are likely a nearly empty filesystem.

OK, that explains the generations/transids.

There are also levels, which I don't clearly understand myself; definitely not well enough to try to explain, tho I could make some WAGs, but that'd just confuse things if they're equally wildly wrong. But it turns out that levels aren't in practice something you normally need to worry much about anyway, so ignoring them seems to work fine.

Then there are bytenrs, the block addresses. These are more or less randomly large numbers from an admin perspective, but they're very important numbers, because this is the number you feed to restore's -t option, that tells it which tree root to use. Put a different way, humans read the generation aka transid numbers; btrfs reads the block numbers. So what we do is find a generation number that looks reasonable, and get its corresponding block number, to feed to restore -t.

OK, knowing that, you can perhaps make a bit more sense of what those transid verify failed messages are all about. As I said, the current generation is 3503. Apparently, there's a problem in a subtree, however, where the
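[Editorial sketch.] The "find a reasonable generation, take its bytenr" step described above can be scripted. The output-line format below is an assumption modeled on btrfs-find-root output of that era (lines resembling "Well block <bytenr>(gen: <generation> level: <level>) seems good"); check it against what your btrfs-progs version actually prints before relying on the regex.

```shell
# Hedged sketch: from sample btrfs-find-root output, pick the bytenr of
# the newest generation to feed to `btrfs restore -t`.
# The sample lines below are illustrative, not from the poster's logs.
sample='Well block 7216660783104(gen: 3361 level: 1) seems good
Well block 7216644472832(gen: 3360 level: 1) seems good
Well block 7216625991680(gen: 3359 level: 1) seems good'

# Extract "generation bytenr" pairs, sort newest-first, keep the bytenr.
best=$(printf '%s\n' "$sample" |
    sed -n 's/.*block \([0-9]*\)(gen: \([0-9]*\).*/\2 \1/p' |
    sort -rn | head -n1 | awk '{print $2}')
echo "newest root bytenr: $best"
```

In a real recovery you would walk down this list, not just take the newest entry: try the top candidate with `btrfs restore -t <bytenr> -D`, and fall back to older generations if the dry run looks wrong.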
Re: Fixing recursive fault and parent transid verify failed
Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:

> On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
>> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:
>>
>> > I think I'll try the btrfs restore as a learning exercise, and to
>> > check the contents of my backup (I don't trust my memory, so
>> > something could have changed since the last backup).
>>
>> Trying btrfs restore is an excellent idea. It'll make things far
>> easier if you have to use it for real some day.
>>
>> Note that while I see your kernel is reasonably current (4.2 series),
>> I don't know what btrfs-progs ubuntu ships. There have been some
>> marked improvements to restore somewhat recently; checking the wiki
>> btrfs-progs release-changelog list says 4.0 brought optional metadata
>> restore, 4.0.1 added --symlinks, and 4.2.3 fixed a symlink path check
>> off-by-one error. (And don't use 4.1.1 as its mkfs.btrfs is broken
>> and produces invalid filesystems.) So you'll want at least progs 4.0
>> to get the optional metadata restoration, and 4.2.3 to get full
>> symlinks restoration support.
>
> Ubuntu 15.10 comes with btrfs-progs v4.0. It looks like it is easy
> enough to compile and install the latest version from
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git so
> I'll do that.
>
> Should I stick to 4.2.3 or use the latest 4.3.1?

I generally use the latest myself, but recommend as a general guideline that, at minimum, a userspace version series matching that of your kernel be used. If the usual kernel recommendations (within two kernel series of either current or LTS, so presently 4.2 or 4.3 for current, or 3.18 or 4.1 for LTS) are followed, that will keep userspace reasonably current as well, and the userspace of a particular version was being developed concurrently with the kernel of the same series, so they're relatively in sync. So with a 4.2 kernel, I'd suggest at least a 4.2 userspace.

If you want the latest, as I generally do, and are willing to put up with occasional bleeding-edge bugs like that broken mkfs.btrfs in 4.1.1, by all means use the latest; but otherwise, the general same-series-as-your-kernel guideline is quite acceptable.

The exception would be if you're trying to fix or recover from a broken filesystem, in which case the very latest tends to have the best chance at fixing things, since it has fixes for (or, lacking that, at least detection of) the latest round of discovered bugs, that older versions will lack.

While btrfs restore does fall into the recover-from-broken category, we know from the changelogs that nothing specific has gone into it since the mentioned 4.2.3 symlink off-by-one fix, so while I would recommend at least that since you are going to be working with restore, there's no urgent need for 4.3.0 or 4.3.1 if you're more comfortable with the older version. (In fact, while I knew I was on 4.3.something, I just had to run btrfs version to check whether it was 4.3 or 4.3.1, myself. FWIW, it was 4.3.1.)
Re: Fixing recursive fault and parent transid verify failed
On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
> Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:
>
> > On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
> >> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:
> >>
> >> > I think I'll try the btrfs restore as a learning exercise, and to
> >> > check the contents of my backup (I don't trust my memory, so
> >> > something could have changed since the last backup).
> >>
> >> Trying btrfs restore is an excellent idea. It'll make things far
> >> easier if you have to use it for real some day.
> >>
> >> Note that while I see your kernel is reasonably current (4.2 series),
> >> I don't know what btrfs-progs ubuntu ships. There have been some
> >> marked improvements to restore somewhat recently, checking the wiki
> >> btrfs-progs release-changelog list says 4.0 brought optional metadata
> >> restore, 4.0.1 added --symlinks, and 4.2.3 fixed a symlink path check
> >> off-by-one error. (And don't use 4.1.1 as its mkfs.btrfs is broken
> >> and produces invalid filesystems.) So you'll want at least progs 4.0
> >> to get the optional metadata restoration, and 4.2.3 to get full
> >> symlinks restoration support.
> >>
> >> ...

Thanks again Duncan for your assistance.

I plugged the ext4 drive I planned to use for the recovery in to the machine and immediately got a couple of errors, which makes me wonder whether there isn't a hardware problem with the machine somewhere.

So decided to move to another machine to do the recovery.

So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1 (the latest version from archlinuxarm.org).

Attempting:

sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee btrfs-recover.log

only recovered 53 of the more than 106,000 files that should be available.

The log is available at:

https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0

I did attempt btrfs-find-root, but couldn't make sense of the output:

https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0

Simply mounting the drive, then re-mounting it read-only, and rsync'ing the files to the backup drive recovered 97,974 files before crashing. If anyone is interested, I've uploaded a photo of the console to:

https://www.dropbox.com/s/xbrp6hiah9y6i7s/rsync%20crash.jpg?dl=0

I'm currently running a hashdeep audit between the recovered files and the backup to see how the recovery went.

If you'd like me to try any other tests, I'll keep the damaged file system for at least the next day or so.

Thanks again for all your assistance,
Alistair
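[Editorial sketch.] The hashdeep audit mentioned above boils down to comparing every recovered file against its counterpart in the backup. A minimal stand-in using cmp on throwaway temp directories (hashdeep itself, with its -a audit mode, is the better tool for large trees; the files and paths here are fabricated for illustration):

```shell
# Hedged sketch: per-file audit of a "recovered" tree against a "backup"
# tree, counting mismatches, using temp dirs as stand-ins.
recovered=$(mktemp -d); backup=$(mktemp -d)
echo "hello" > "$recovered/a.txt"; echo "hello" > "$backup/a.txt"
echo "old"   > "$recovered/b.txt"; echo "new"   > "$backup/b.txt"

mismatches=0
for f in "$recovered"/*; do
    name=$(basename "$f")
    # cmp -s: silent byte-for-byte comparison
    if ! cmp -s "$f" "$backup/$name"; then
        echo "differs: $name"
        mismatches=$((mismatches + 1))
    fi
done
echo "mismatched files: $mismatches"
rm -rf "$recovered" "$backup"
```

Any file flagged here is either corruption in the recovered copy or a legitimate change made since the backup; deciding which is the manual part of the audit.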
Re: Fixing recursive fault and parent transid verify failed
Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:

> I think I'll try the btrfs restore as a learning exercise, and to check
> the contents of my backup (I don't trust my memory, so something could
> have changed since the last backup).

Trying btrfs restore is an excellent idea. It'll make things far easier if you have to use it for real some day.

Note that while I see your kernel is reasonably current (4.2 series), I don't know what btrfs-progs ubuntu ships. There have been some marked improvements to restore somewhat recently; checking the wiki btrfs-progs release-changelog list says 4.0 brought optional metadata restore, 4.0.1 added --symlinks, and 4.2.3 fixed a symlink path check off-by-one error. (And don't use 4.1.1 as its mkfs.btrfs is broken and produces invalid filesystems.) So you'll want at least progs 4.0 to get the optional metadata restoration, and 4.2.3 to get full symlinks restoration support.

> Does btrfs restore require the path to be on a btrfs filesystem? I've
> got an existing ext4 drive with enough free space to do the restore, so
> would prefer to use it than have to buy another drive.

Restoring to ext4 should be fine.

Btrfs restore writes files as would an ordinary application, which is the reason metadata restoration is optional (otherwise it uses normal file change and mod times, with files written as the running user, root, using umask-based file perms, all exactly the same as if it were a normal file-writing application), so it will restore to any normal filesystem. The filesystem it's restoring /from/ of course must be btrfs... unmounted, since it's designed to be used when mounting is broken, but it writes files normally, so can write them to any filesystem.

FWIW, I restored to my reiserfs-based media partition (still on spinning rust, my btrfs are all on ssd) here, since that's where I had the room to work with.

> My plan is:
>
> * btrfs restore /dev/sdX /path/to/ext4/restorepoint
> ** Where /dev/sdX is one of the two drives that were part of the raid1
>    fileystem
> * hashdeep audit the restored drive and backup
> * delete the existing corrupted btrfs filesystem and recreate
> * rsync the merged filesystem (from backup and restore) on to the new
>   filesystem
>
> Any comments or suggestions are welcome.

Looks very reasonable, here.

There's a restore page on the wiki with more information than the btrfs-restore manpage, describing how to use it with btrfs-find-root if necessary, etc.:

https://btrfs.wiki.kernel.org/index.php/Restore

Some details on the page are a bit dated; it doesn't cover the dry-run, list-roots, metadata and symlink options, for instance, and these can be very helpful, but the general idea remains the same.

The general idea is to use btrfs-find-root to get a listing of available root generations (if restore can't find a working root from the superblocks, or you want to try restoring an earlier root), then feed the corresponding bytenr to restore's -t option.

Note that generation and transid refer to the same thing, a normally increasing number, so higher generations are newer. The wiki page makes this much clearer than it used to, but the old wording anyway was confusing to me until I figured that out.

Where the wiki page talks about root object-ids, those are the various subtrees; low numbers are the base trees, 256+ are subvolumes/snapshots. Note that restore's list-roots option lists these for the given bytenr as well.

So you try restore with list-roots (-l) to see what it gives you; try btrfs-find-root if not satisfied, to find older generations and get their bytenrs to plug into restore with -t; and then confirm specific generation bytenrs with list-roots again. Once you have a good generation/bytenr candidate, try a dry-run (-D) to see if you get a list of files it's trying to restore that looks reasonable.

If the dry-run goes well, you can try the full restore, not forgetting the metadata and symlinks options (-m, -S, respectively), if desired. From there you can continue with your plan as above.

One more bonus hint. Since you'll be doing a new mkfs.btrfs, it's a good time to review active features and decide which ones you might wish to activate (or not, if you're concerned about old-kernel compatibility). Additionally, before repopulating your new filesystem, you may want to review mount options, particularly autodefrag if appropriate, and compression if desired, so they take effect from the very first file created on the new filesystem. =:^)

FWIW, in the past I usually did an immediate post-mkfs.btrfs mount and balance with -dusage=0 -musage=0 to get rid of the single-mode chunk artifacts from the mkfs.btrfs as well, but with a new enough mkfs.btrfs you may be able to avoid that now, as -progs 4.2 was supposed to eliminate those single-mode mkfs.btrfs artifacts on multi-device filesystems. I've just not done any fresh mkfs.btrfs since then so haven't had a
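[Editorial sketch.] The list-roots → find-root → dry-run → full-restore sequence described above, condensed into one place. The device and bytenr are placeholders, and the commands are echoed rather than executed, since they require a real unmounted btrfs filesystem:

```shell
# Hedged sketch of the restore workflow described above.  /dev/sdX is a
# placeholder device; the bytenr is a hypothetical value you would copy
# out of btrfs-find-root output.  Commands are printed, not run.
dev=/dev/sdX
bytenr=7216660783104
dest=/mnt/btrfs-recover

echo "btrfs restore -l $dev"                      # 1. list tree roots
echo "btrfs-find-root $dev"                       # 2. find older root bytenrs
echo "btrfs restore -t $bytenr -D $dev $dest"     # 3. dry run against a candidate
echo "btrfs restore -t $bytenr -m -S $dev $dest"  # 4. full restore: metadata + symlinks
```

Step 3 is the cheap safety check: only proceed to step 4 once the dry run lists a plausible set of files.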
Re: Fixing recursive fault and parent transid verify failed
Alistair Grant posted on Mon, 07 Dec 2015 12:57:15 +1100 as excerpted: > I've ran btrfs scrub and btrfsck on the drives, with the output included > below. Based on what I've found on the web, I assume that a > btrfs-zero-log is required. > > * Is this the recommended path? [Just replying to a couple more minor points, here.] Absolutely not. btrfs-zero-log isn't the tool you need here. About the btrfs log... Unlike most journaling filesystems, btrfs is designed to be atomic and consistent at commit time (every 30 seconds by default) and doesn't log normal filesystem activity at all. The only thing logged is fsyncs, allowing them to deliver on their file-written-to-hardware guarantees, without forcing the entire atomic filesystem sync, which would trigger a normal atomic commit and thus is a far heavier weight process. IOW, all it does is log and speedup fsyncs. The filesystem is designed to be atomically consistent at commit time, with or without the log, with the only thing missing if the log isn't replayed being the last few seconds of fsyncs since the last atomic commit. So the btrfs log is very limited in scope and will in many cases be entirely empty, if there were no fsyncs after the last atomic filesystem commit, again, every 30 seconds by default, so in human terms at least, not a lot of time. About btrfs log replay... The kernel, meanwhile, is designed to replay the log automatically at mount time. If the mount is successful, the log has by definition been replayed successfully and zeroing it wouldn't have done much of anything but possibly lose you a few seconds worth of fsyncs. Since you are able to run scrub, which requires a writable mount, the mount is definitely successful, which means btrfs-zero-log is the wrong tool for the job, since it addresses a problem you obviously don't have. > * Is there a way to find out which files will be affected by the loss of > the transactions? 
I'm interpreting that question in the context of the transid wanted/found listings in your linked logs, since it no longer makes sense in the context of btrfs-zero-log, given the information above. I believe so, but the most direct method requires manual use of btrfs- debug and similar tools, looking up addresses and tracing down the files to which they belong. Of course that's if the addresses trace to actual files at all. If they trace to metadata instead of data, then it's not normally files, but the metadata (including checksums and very small files of only a few KiB) about files, instead. Of course if it's metadata the problem's worse, as a single bad metadata block can affect multiple actual files. The more indirect way would be to use btrfs restore with the -t option, feeding it the root address associated with the transid found (with that association traced via btrfs-find-root), to restore the file from the filesystem as it existed at that point, to some other mounted filesystem, also using the restore metadata option. You could then do for instance a diff of the listing (or possibly a per-file checksum, say md5sum, of both versions) between your current backup (or current mounted filesystem, since you can still mount it) and the restored version, which would be the files at the time of that transaction-id, and see which ones changed. That of course would be the affected files. =:^] > I do have a backup of the drive (which I believe is completely up to > date, the btrfs volume is used for archiving media and documents, and > single person use of git repositories, i.e. only very light writing and > reading). Of course either one of the above is going to be quite some work, and if you have a current backup, simply restoring it is likely to be far easier, unless of course you're interested in practicing your recovery technique or the like, certainly not a valueless endeavor, if you have the time and patience for it. 
The *GOOD* thing is that you *DO* have a current backup. Far *FAR* too many people we see posting here are unfortunately finding out the hard way that their actions, or more precisely lack thereof, in failing to do backups, put the lie to any claims that they actually valued the data.

As any good sysadmin can tell you, often from unhappy lessons such as this: if it's not backed up, then by definition your actions are placing its value at less than the time and resources necessary to do that backup (modified of course by the risk factor of actually needing it, thus taking care of the Nth-level backup, some of which are off-site, if the data is really /that/ valuable, while also covering the throw-away data that's so trivial as to not justify even the effort of a single level of backup).

So hurray for you! =:^)

(FWIW, I personally have backups of most stuff here, often several levels, tho I don't always keep them current. But should I be forced to resort to them, I'm prepared to lose the intervening updates, as I
Re: Fixing recursive fault and parent transid verify failed
On Mon, Dec 07, 2015 at 08:25:01AM +, Duncan wrote:
> Alistair Grant posted on Mon, 07 Dec 2015 12:57:15 +1100 as excerpted:
>
> > I've run btrfs scrub and btrfsck on the drives, with the output included
> > below. Based on what I've found on the web, I assume that a
> > btrfs-zero-log is required.
> >
> > * Is this the recommended path?
>
> [Just replying to a couple more minor points, here.]
>
> Absolutely not. btrfs-zero-log isn't the tool you need here.
>
> [...]
>
> Since you are able to run scrub, which requires a writable mount, the
> mount is definitely successful, which means btrfs-zero-log is the wrong
> tool for the job, since it addresses a problem you obviously don't have.
OK, thanks for the detailed explanation (here and below, so I don't have to repeat myself). The reason I thought it might be required was that the parent transid failed errors appeared even after a reboot (and obviously remounting the filesystem) and without any user activity.

> > * Is there a way to find out which files will be affected by the loss of
> > the transactions?
>
> [...]
>
> The more indirect way would be to use btrfs restore with the -t option,
> feeding it the root address associated with the transid found (with that
> association traced via btrfs-find-root), to restore the file from the
> filesystem as it existed at that point, to some other mounted filesystem,
> also using the restore metadata option. You could then do for instance a
> diff of the listing (or possibly a per-file checksum, say md5sum, of both
> versions) between your current backup (or current mounted filesystem,
> since you can still mount it) and the restored version, which would be
> the files at the time of that transaction-id, and see which ones
> changed. That of course would be the affected files.
> =:^]

I think I'll try the btrfs restore as a learning exercise, and to check the contents of my backup (I don't trust my memory, so something could have changed since the last backup).

Does btrfs restore require the target path to be on a btrfs filesystem? I've got an existing ext4 drive with enough free space to do the restore, so would prefer to use it rather than having to buy another drive.

My plan is:

* btrfs restore /dev/sdX /path/to/ext4/restorepoint
  ** Where /dev/sdX is one of the two drives that were part of the raid1
     filesystem
* hashdeep audit of the restored drive against the backup
* delete the existing corrupted btrfs filesystem and recreate it
* rsync the merged filesystem (from backup and restore) on to the new
  filesystem

Any comments or suggestions are welcome.

> > I do have a backup of the drive (which I believe is completely up to
> > date, the btrfs volume is used for archiving media and documents, and
> > single person use of git repositories, i.e. only very light writing and
> > reading).
>
> Of course either one of the above is going to be quite some work, and if
> you have a current backup, simply
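For the audit step, hashdeep isn't the only option; the same idea can be sketched with plain coreutils (helper names invented here), in case that's handy:

```shell
# Build a checksum manifest of one tree, then verify another tree
# against it; sha256sum --check reports any missing or altered files.
build_manifest() { ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2 ); }
audit_tree()     { ( cd "$1" && sha256sum --check --quiet "$2" ); }
# Usage (hypothetical paths; manifest path should be absolute):
#   build_manifest /path/to/backup > /tmp/manifest.sums
#   audit_tree /path/to/ext4/restorepoint /tmp/manifest.sums
```

`--quiet` keeps the output to just the failures, which is what matters when comparing a multi-hundred-GiB tree.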
Re: Fixing recursive fault and parent transid verify failed
On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:
>
> > I think I'll try the btrfs restore as a learning exercise, and to check
> > the contents of my backup (I don't trust my memory, so something could
> > have changed since the last backup).
>
> Trying btrfs restore is an excellent idea. It'll make things far easier
> if you have to use it for real some day.
>
> Note that while I see your kernel is reasonably current (4.2 series), I
> don't know what btrfs-progs ubuntu ships. There have been some marked
> improvements to restore somewhat recently; checking the btrfs-progs
> release-changelog list on the wiki says 4.0 brought optional metadata
> restore, 4.0.1 added --symlinks, and 4.2.3 fixed a symlink path check
> off-by-one error. (And don't use 4.1.1, as its mkfs.btrfs is broken and
> produces invalid filesystems.) So you'll want at least progs 4.0 to get
> the optional metadata restoration, and 4.2.3 to get full symlink
> restoration support.

Ubuntu 15.10 comes with btrfs-progs v4.0. It looks like it is easy enough to compile and install the latest version from git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git so I'll do that. Should I stick to 4.2.3 or use the latest 4.3.1?

> > Does btrfs restore require the path to be on a btrfs filesystem? I've
> > got an existing ext4 drive with enough free space to do the restore, so
> > would prefer to use it than have to buy another drive.
>
> Restoring to ext4 should be fine.
>
> Btrfs restore writes files as would an ordinary application, which is the
> reason metadata restoration is optional (otherwise it uses normal file
> change and mod times, with files written as the running user, root, using
> umask-based file perms, all exactly the same as if it were a normal
> file-writing application), so it will restore to any normal filesystem.
> The filesystem it's restoring /from/ of course must be btrfs...
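On the 4.2.3-vs-4.3.1 question, a newer release is generally fine; if you want to script a minimum-version check before relying on --symlinks, here's a sketch using GNU `sort -V` (the version strings and the `btrfs --version` parsing are examples, not a guaranteed output format):

```shell
# Succeed iff $2 is at least version $1; GNU sort -V does the
# dotted-version comparison, so the minimum sorts first when satisfied.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
# e.g. full restore --symlinks support needs btrfs-progs >= 4.2.3:
#   version_at_least 4.2.3 "$(btrfs --version | sed 's/.*v//')"
```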
> unmounted, since it's designed to be used when mounting is broken, but it
> writes files normally, so can write them to any filesystem.
>
> FWIW, I restored to my reiserfs based media partition (still on spinning
> rust, my btrfs are all on ssd) here, since that's where I had the room to
> work with.

Thanks for the confirmation.

> > My plan is:
> >
> > * btrfs restore /dev/sdX /path/to/ext4/restorepoint
> > ** Where /dev/sdX is one of the two drives that were part of the raid1
> >    fileystem
> > * hashdeep audit the restored drive and backup
> > * delete the existing corrupted btrfs filesystem and recreate
> > * rsync the merge filesystem (from backup and restore)
> >   on to the new filesystem
> >
> > Any comments or suggestions are welcome.
>
> Looks very reasonable, here. There's a restore page on the wiki with
> more information than the btrfs-restore manpage, describing how to use it
> with btrfs-find-root if necessary, etc.
>
> https://btrfs.wiki.kernel.org/index.php/Restore

I'd seen this, but it isn't explicit about the target filesystem support. I should try and update the page a bit.

> Some details on the page are a bit dated; it doesn't cover the dryrun,
> list-roots, metadata and symlink options, for instance, and these can be
> very helpful, but the general idea remains the same.
>
> The general idea is to use btrfs-find-root to get a listing of available
> root generations (if restore can't find a working root from the
> superblocks, or you want to try restoring an earlier root), then feed the
> corresponding bytenr to restore's -t option.
>
> Note that generation and transid refer to the same thing, a normally
> increasing number, so higher generations are newer. The wiki page makes
> this much clearer than it used to, but the old wording anyway was
> confusing to me until I figured that out.
>
> Where the wiki page talks about root object-ids, those are the various
> subtrees; low numbers are the base trees, 256+ are subvolumes/snapshots.
> Note that restore's list-roots option lists these for the given bytenr as
> well.
>
> So you try restore with list-roots (-l) to see what it gives you, try
> btrfs-find-root if not satisfied, to find older generations and get their
> bytenrs to plug into restore with -t, and then confirm specific
> generation bytenrs with list-roots again.
>
> Once you have a good generation/bytenr candidate, try a dry-run (-D) to
> see if you get a list of files it's trying to restore that looks
> reasonable.
>
> If the dry-run goes well, you can try the full restore, not forgetting
> the metadata and symlinks options (-m and -S, respectively), if desired.
>
> From there you can continue with your plan as above.
>
> One more bonus hint: since you'll be doing a new mkfs.btrfs, it's a good
> time to review active features and decide which ones you might wish to
> activate (or not, if you're concerned about old-kernel compatibility).
> Additionally, before
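To keep track of which bytenr goes with which generation while doing the find-root/-t dance, the "Well block" lines from btrfs-find-root can be sorted newest-first with a little sed and sort (helper name invented; it assumes lines of the shape shown elsewhere in this thread):

```shell
# Extract (generation, bytenr, level) from btrfs-find-root's
# "Well block BYTENR(gen: G level: L) ..." lines and sort by
# generation, newest first; the top bytenrs are the first candidates
# to try with `btrfs restore -t <bytenr>`.
pick_roots() {
    sed -n 's/^Well block \([0-9]*\)(gen: \([0-9]*\) level: \([0-9]*\)).*/\2 \1 \3/p' |
        sort -r -n |
        awk '{ print "bytenr=" $2 " gen=" $1 " level=" $3 }'
}
# Usage (hypothetical device):
#   btrfs-find-root /dev/sdX | pick_roots | head
```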
Re: Fixing recursive fault and parent transid verify failed
On 12/07/2015 02:57 PM, Alistair Grant wrote as excerpted:

> Fixing recursive fault, but reboot is needed

For the record: I saw the same message (incl. hard lockup) when doing a balance on a single-disk btrfs. Apart from that, the fs works flawlessly (~60GB; usage: no snapshots, ~15 lxc containers, low-load databases, a few mails, a couple of web servers). As this is a production machine, I rebooted it rather than investigating, but the error is reproducible if that would be of great interest.

> I've run btrfs scrub and btrfsck on the drives, with the output
> included below. Based on what I've found on the web, I assume that a
> btrfs-zero-log is required.
>
> * Is this the recommended path?
> * Is there a way to find out which files will be affected by the loss of
>   the transactions?
>
> Kernel: Ubuntu 4.2.0-19-generic (which is based on mainline 4.2.6)

I used Debian Backports 4.2.6.

Cheers,
Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Fixing recursive fault and parent transid verify failed
Hi,

(Resending as it looks like the first attempt didn't get through, probably too large, so the logs are now in Dropbox.)

I have a btrfs volume which is raid1 across two spinning rust disks, each 2TB. When trying to access some files from another machine using sshfs, the server machine has crashed twice, resulting in a hard lock up, i.e. a power off was required to restart the machine. There are no crash dumps in /var/log/syslog, or anything that looks like an associated error message to me; however, on the second occasion I was able to see the following message flash up on the console (in addition to some stack dumps):

Fixing recursive fault, but reboot is needed

I've run btrfs scrub and btrfsck on the drives, with the output included below. Based on what I've found on the web, I assume that a btrfs-zero-log is required.

* Is this the recommended path?
* Is there a way to find out which files will be affected by the loss of
  the transactions?

I do have a backup of the drive (which I believe is completely up to date; the btrfs volume is used for archiving media and documents, and single-person use of git repositories, i.e. only very light writing and reading).
Some basic details:

OS: Ubuntu 15.10
Kernel: Ubuntu 4.2.0-19-generic (which is based on mainline 4.2.6)

> sudo btrfs fi df /srv/d2root

Data, RAID1: total=250.00GiB, used=248.86GiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=466.77MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=160.00MiB, used=0.00B

> sudo btrfs fi usage /srv/d2root

Overall:
    Device size:          3.64TiB
    Device allocated:   502.04GiB
    Device unallocated:   3.15TiB
    Device missing:         0.00B
    Used:               498.62GiB
    Free (estimated):     1.58TiB  (min: 1.58TiB)
    Data ratio:              2.00
    Metadata ratio:          1.99
    Global reserve:     160.00MiB  (used: 0.00B)

Data,single: Size:8.00MiB, Used:0.00B
    /dev/sdc    8.00MiB

Data,RAID1: Size:250.00GiB, Used:248.86GiB
    /dev/sdb  250.00GiB
    /dev/sdc  250.00GiB

Metadata,single: Size:8.00MiB, Used:0.00B
    /dev/sdc    8.00MiB

Metadata,RAID1: Size:1.00GiB, Used:466.77MiB
    /dev/sdb    1.00GiB
    /dev/sdc    1.00GiB

System,single: Size:4.00MiB, Used:0.00B
    /dev/sdc    4.00MiB

System,RAID1: Size:8.00MiB, Used:64.00KiB
    /dev/sdb    8.00MiB
    /dev/sdc    8.00MiB

Unallocated:
    /dev/sdb    1.57TiB
    /dev/sdc    1.57TiB

btrfs scrub output:
https://www.dropbox.com/s/blqvopa1lhkghe5/scrub.log?dl=0

btrfsck sdb output:
https://www.dropbox.com/s/hw6w6cupuu1rny4/btrfsck.sdb.log?dl=0

btrfsck sdc output:
https://www.dropbox.com/s/mijz492mjr76p8z/btrfsck.sdc.log?dl=0

Thanks very much,
Alistair
Re: fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
Martin Tippmann wrote on 2015/08/08 20:43 +0200:

> Hi, after a hard reboot (powercycle) a btrfs volume did not come up
> again. It's a single 4TB disk - only btrfs with lzo -
> data=single,metadata=dup
>
> [ 121.831814] BTRFS info (device sda): disk space caching is enabled
> [ 121.857820] BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
> [ 121.861607] BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
> [ 121.861715] BTRFS: failed to read tree root on sda
> [ 121.878111] BTRFS: open_ctree failed
>
> btrfs-progs v4.0
> Kernel: 4.1.4
>
> I'm quite sure that the HDD is fine (no SMART problems, the disk error
> log is empty, and it's a new enterprise drive that worked well in the
> past days/weeks). So I'm kind of at a loss what to do: how can I recover
> from that problem? I've found just a note in the FAQ[1] but no solution
> to the problem. Maybe someone can give some clues why this happens in
> the first place? Is it unfortunate timing due to the abrupt power cycle?
> Shouldn't CoW protect against this somewhat?
>
> Thanks for any hints!
>
> Additional info:
>
> # btrfs check /dev/sda
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> Ignoring transid failure
> Couldn't setup extent tree

Seems the extent tree or tree root is corrupted.
> Couldn't open file system
>
> Not sure what it does but it looks not too good:
>
> # btrfs-find-root /dev/sda
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Superblock thinks the generation is 390924
> Superblock thinks the level is 1
> Well block 427084988416(gen: 390923 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427084021760(gen: 390923 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1

These two blocks seem to be good, though I'm not sure why there are two of them.

Try btrfsck --tree-root 427084988416 and btrfsck --tree-root 427084021760 to see which produces the fewest errors.

Thanks,
Qu

> Well block 427084431360(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427084398592(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427083988992(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427038621696(gen: 390914 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427031035904(gen: 390913 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427285069824(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427060887552(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427013128192(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427001872384(gen: 390909 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965237760(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965221376(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965188608(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965172224(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965155840(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426964271104(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426964156416(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426950377472(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426944512000(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426940841984(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426940612608(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426940465152(gen: 390905 level: 0) seems good
fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
Hi,

after a hard reboot (powercycle) a btrfs volume did not come up again. It's a single 4TB disk - only btrfs with lzo - data=single,metadata=dup

[ 121.831814] BTRFS info (device sda): disk space caching is enabled
[ 121.857820] BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
[ 121.861607] BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
[ 121.861715] BTRFS: failed to read tree root on sda
[ 121.878111] BTRFS: open_ctree failed

btrfs-progs v4.0
Kernel: 4.1.4

I'm quite sure that the HDD is fine (no SMART problems, the disk error log is empty, and it's a new enterprise drive that worked well in the past days/weeks). So I'm kind of at a loss what to do: how can I recover from that problem? I've found just a note in the FAQ[1] but no solution to the problem.

Maybe someone can give some clues why this happens in the first place? Is it unfortunate timing due to the abrupt power cycle? Shouldn't CoW protect against this somewhat?

Thanks for any hints!
Additional info:

# btrfs check /dev/sda
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
Ignoring transid failure
Couldn't setup extent tree
Couldn't open file system

Not sure what it does but it looks not too good:

# btrfs-find-root /dev/sda
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Superblock thinks the generation is 390924
Superblock thinks the level is 1
Well block 427084988416(gen: 390923 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427084021760(gen: 390923 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427084431360(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427084398592(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427083988992(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427038621696(gen: 390914 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427031035904(gen: 390913 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427285069824(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427060887552(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427013128192(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 427001872384(gen: 390909 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426965237760(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426965221376(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426965188608(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426965172224(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426965155840(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426964271104(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426964156416(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426950377472(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426944512000(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426940841984(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426940612608(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426940465152(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426940153856(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block 426939809792(gen: 390905 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
Well block
Re: fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
On Sat, Aug 08, 2015 at 08:43:34PM +0200, Martin Tippmann wrote:

> Hi, after a hard reboot (powercycle) a btrfs volume did not come up
> again. It's a single 4TB disk - only btrfs with lzo -
> data=single,metadata=dup
>
> [ 121.831814] BTRFS info (device sda): disk space caching is enabled
> [ 121.857820] BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
> [ 121.861607] BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
> [ 121.861715] BTRFS: failed to read tree root on sda
> [ 121.878111] BTRFS: open_ctree failed
>
> btrfs-progs v4.0
> Kernel: 4.1.4
>
> I'm quite sure that the HDD is fine (no SMART problems, the disk error
> log is empty, and it's a new enterprise drive that worked well in the
> past days/weeks). So I'm kind of at a loss what to do: how can I recover
> from that problem? I've found just a note in the FAQ[1] but no solution
> to the problem. Maybe someone can give some clues why this happens in
> the first place? Is it unfortunate timing due to the abrupt power cycle?
> Shouldn't CoW protect against this somewhat?

Not somewhat: it should protect it completely. There are two ways that this can happen: it's a bug in btrfs, or there's something stopping barriers from working. The latter case can be either a bug in the kernel's block layer (pretty unlikely), or the hardware behaving badly and ignoring the barriers (more likely, particularly if it's on a USB/SATA converter).

I don't think there's a good solution to transid failures, I'm afraid. The best that I'm aware of is to use btrfs restore to grab the pieces of your FS that aren't up to date in your backups, and then restore from them.

> Thanks for any hints!
> Additional info:
>
> # btrfs check /dev/sda
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't open file system
>
> Not sure what it does but it looks not too good:

Actually, it's pretty good, other than the transid failure, which is a real problem.

Hugo.

> # btrfs-find-root /dev/sda
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> parent transid verify failed on 427084513280 wanted 390924 found 390922
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> Superblock thinks the generation is 390924
> Superblock thinks the level is 1
> Well block 427084988416(gen: 390923 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427084021760(gen: 390923 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427084431360(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427084398592(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427083988992(gen: 390915 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427038621696(gen: 390914 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427031035904(gen: 390913 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427285069824(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427060887552(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427013128192(gen: 390912 level: 1) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 427001872384(gen: 390909 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965237760(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965221376(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965188608(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965172224(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426965155840(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426964271104(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block 426964156416(gen: 390906 level: 0) seems good, but generation/level doesn't match, want gen: 390924 level: 1
> Well block
Re: fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
2015-08-08 21:05 GMT+02:00 Hugo Mills h...@carfax.org.uk:

> > Maybe someone can give some clues why this happens in the first place?
> > Is it unfortunate timing due to the abrupt power cycle? Shouldn't CoW
> > protect against this somewhat?
>
> Not somewhat: it should protect it completely. There are two ways that
> this can happen: it's a bug in btrfs, or there's something stopping
> barriers from working. The latter case can be either a bug in the
> kernel's block layer (pretty unlikely), or the hardware behaving badly
> and ignoring the barriers (more likely, particularly if it's on a
> USB/SATA converter).

Thanks for the information. The setup is nothing out of the ordinary. The disks are HGST HUS724040ALA640 running on a Dell H310 SATA controller configured as JBOD. It's all running on defaults on a Dell PowerEdge R720. SMART says the disk write cache is enabled; maybe that's part of the problem?

> I don't think there's a good solution to transid failures, I'm afraid.
> The best that I'm aware of is to use btrfs restore to grab the pieces
> of your FS that aren't up to date in your backups, and then restore
> from them.

Okay, fortunately I can dismiss the data. Or is the broken image of any use to anyone? It's a 4TB disk, but I guess I could create a compressed (partial) image if it's of interest.

regards
Martin
Re: fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
Martin Tippmann posted on Sat, 08 Aug 2015 20:43:34 +0200 as excerpted:

> Hi, after a hard reboot (powercycle) a btrfs volume did not come up
> again. It's a single 4TB disk - only btrfs with lzo -
> data=single,metadata=dup
>
> [ 121.831814] BTRFS info (device sda): disk space caching is enabled
> [ 121.857820] BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
> [ 121.861607] BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922
> [ 121.861715] BTRFS: failed to read tree root on sda
> [ 121.878111] BTRFS: open_ctree failed
>
> btrfs-progs v4.0
> Kernel: 4.1.4
>
> I'm quite sure that the HDD is fine (no SMART problems, the disk error
> log is empty, and it's a new enterprise drive that worked well in the
> past days/weeks). So I'm kind of at a loss what to do: how can I recover
> from that problem? I've found just a note in the FAQ[1] but no solution
> to the problem.

[The FAQ reference was to the wiki problem FAQ's transid failure explanation, but it didn't say what to do about it.]

Did you try the recovery mount option suggested earlier in the problem FAQ, under mount problems?

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2

For transid failures, that's what I'd try first, since it scans previous tree roots and tries to use the first one it can read. Since the transid it wants (390924) is only a couple ahead of what it finds (390922), and the recovery mount option scans backward in the tree-root history to see if it can find any that work, that could well solve the problem.

If not, as Hugo mentions, given that the btrfs-find-root output looks good, btrfs restore has a good chance of working. I've used that myself to good effect a couple of times when a btrfs refused to mount (I have backups if I have to use 'em, but recovery or restore, when they work, will normally leave me with more current copies, since I tend to let my backups get somewhat stale).
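A minimal sketch of that first attempt, with a hypothetical device and mountpoint (note that on kernels 4.6 and later this mount option was renamed usebackuproot, with recovery kept as a deprecated alias):

```
# Try read-only first, so nothing is written while checking whether an
# older tree root is usable:
mount -o ro,recovery /dev/sda /mnt

# If the data looks sane, remount read-write:
mount -o remount,rw /mnt
```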
There's a page on the wiki for using it with find-root if necessary, but the wiki page is a bit dated. The btrfs-restore manpage should be current, but doesn't have the detail about using it with find-root that the wiki page has.

> Maybe someone can give some clues as to why this happens in the first place? Is it unfortunate timing due to the abrupt power cycle? Shouldn't CoW protect against this somewhat?

As Hugo says, in theory CoW should protect against this, but the combination of possible bugs in a still not yet fully stable and mature btrfs, and possibly buggy hardware, means theory and practice don't always line up as well as they should, in theory. (How's that for an ouroboros, aka snake-eating-its-tail circular-reference, explanation? =:^) But the recovery mount option is a reasonable first recovery (now ouroboroi =:^) option, and btrfs restore is not too bad to work with if that fails.

Referencing the hardware write-caching option you mentioned later: yes, turning that off can help... in theory... but it also tends to have a DRAMATICALLY bad effect on spinning-rust write performance (I don't know enough about SSD write caching to venture a guess), and in some cases voids warranties due to the additional thrashing it's likely to cause, so do your research before turning it off. In general it's not a good idea, as it's simply not worth it. Both Linux at the generic IO level and the various filesystem stacks are designed to work around all but the worst hardware IO-barrier failures, and the write slowdown and increased disk thrashing are simply not worth it, in most cases. If the hardware is actually bad enough that it's worth it, I'd strongly consider different hardware.

-- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master.
Richard Stallman -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
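[Editorial note: the least-to-most-invasive order that Duncan and Hugo recommend in this thread can be sketched as a shell runbook. /dev/sdX, /mnt and /path/to/copy are placeholders, and the script only *prints* each command rather than running it, since the last step is destructive:]

```shell
#!/bin/sh
# Sketch of the recovery order discussed in this thread, least to most
# invasive. /dev/sdX and the paths are placeholders; the run() helper only
# echoes the commands, so nothing is touched until you run them yourself.
DEV=/dev/sdX
MNT=/mnt

run() { echo "would run: $*"; }          # dry-run helper

run mount -o ro,recovery "$DEV" "$MNT"   # 1. read-only, scan older tree roots
run btrfs-find-root "$DEV"               # 2. locate candidate tree roots
run btrfs restore "$DEV" /path/to/copy   # 3. copy files off without mounting
run btrfs check --repair "$DEV"          # 4. last resort; can make things worse
```

The point of the ordering is that steps 1-3 never write to the device, so they cannot destroy evidence that a later, more drastic tool might still need.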
parent transid verify failed
I am running kernel 4.0 and mainline btrfs-progs. I have a backup.

None of the following commands removes the "parent transid verify failed" errors from the disk:

btrfs check --repair <device>
btrfsck --repair <device>
mount -t btrfs -o recovery <device> <mountpoint>
btrfs scrub start <mountpoint>

The disk was going read-only. It now mounts and seems to be fine. However, these "errors" persist. Is there any tool, other than zeroing the log, which will "repair" this?
'parent transid verify failed' for 13637 missing transactions, resulting in 'BTRFS: Transaction aborted'
Hi, I have a btrfs volume in RAID0 across 2 SSDs which has (for no apparent reason) become corrupted. Although I am able to mount the partition, several messages appear in the kernel log when doing so. I have copied the files off the filesystem, but would like to know whether they can be relied upon (and if not, which ones are corrupt). I would also like to know if the filesystem itself is recoverable, or whether it should be erased entirely and recreated. I have tried 'btrfs check --repair' and btrfs-zero-log to no avail. The SMART data for both drives suggests there are no issues with the hardware. Thanks in advance.

Distro: Sabayon amd64
Kernel in use when corruption occurred: 3.17.4
Kernel in use when collecting diagnostic info: 3.16.0-23-generic (Ubuntu livecd)
btrfs-progs version: 3.18

btrfs fi df (used space is incorrect - should be at least 30 GB):

Data, RAID0: total=93.16GiB, used=25.19MiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=8.01GiB, used=73.81MiB
unknown, single: total=16.00MiB, used=16.00KiB

btrfs fi show (truncated to show the relevant filesystem only):

Label: none  uuid: d75ecf88-9b18-4ca6-8fd4-7bda0630de9b
Total devices 2  FS bytes used 73.81MiB
devid 1 size 54.62GiB used 54.62GiB path /dev/sda1
devid 2 size 54.62GiB used 54.62GiB path /dev/sdb1

Kernel log when mounting the filesystem:

[  106.564009] BTRFS info (device sda1): disk space caching is enabled
[  106.577597] BTRFS: detected SSD devices, enabling SSD mode
[  106.578440] BTRFS: checking UUID tree
[  106.581198] parent transid verify failed on 168079851520 wanted 6329580 found 6343217
[  106.581857] parent transid verify failed on 168079851520 wanted 6329580 found 6343217
[  106.581880] BTRFS warning (device sda1): btrfs_uuid_tree_iterate failed -12

When unmounting:

[  113.814408] ------------[ cut here ]------------
[  113.814454] WARNING: CPU: 0 PID: 3872 at /build/buildd/linux-3.16.0/fs/btrfs/extent-tree.c:5956 __btrfs_free_extent+0x675/0xc00 [btrfs]()
[  113.814460] Modules linked in: joydev
btrfs dm_crypt xor snd_hda_codec_hdmi raid6_pq dm_multipath scsi_dh kvm_amd kvm snd_seq_midi snd_hda_codec_realtek snd_seq_midi_event snd_hda_codec_generic snd_rawmidi edac_core snd_hda_intel snd_hda_controller k10temp serio_raw edac_mce_amd snd_seq snd_hda_codec bnep snd_hwdep rfcomm snd_seq_device snd_pcm bluetooth snd_timer snd 6lowpan_iphc sp5100_tco soundcore i2c_piix4 shpchp mac_hid parport_pc ppdev lp parport squashfs overlayfs nls_utf8 isofs jfs xfs libcrc32c reiserfs dm_mirror dm_region_hash dm_log hid_generic nouveau mxm_wmi video i2c_algo_bit ttm usbhid drm_kms_helper pata_acpi firewire_ohci tg3 hid firewire_core r8169 drm ahci ptp crc_itu_t mii pata_jmicron libahci pps_core wmi
[  113.814558] CPU: 0 PID: 3872 Comm: umount Tainted: G W 3.16.0-23-generic #31-Ubuntu
[  113.814564] Hardware name: Gigabyte Technology Co., Ltd. GA-870A-UD3/GA-870A-UD3, BIOS F5 08/01/2011
[  113.814569] 0009 8800bd5afa28 8177fcbc
[  113.814577] 8800bd5afa60 8106fd8d 00218f175000 8800cb98f000
[  113.814584] 8800a80e9000 fffe 8800bd5afa70
[  113.814591] Call Trace:
[  113.814605] [8177fcbc] dump_stack+0x45/0x56
[  113.814615] [8106fd8d] warn_slowpath_common+0x7d/0xa0
[  113.814623] [8106fe6a] warn_slowpath_null+0x1a/0x20
[  113.814651] [c0d15345] __btrfs_free_extent+0x675/0xc00 [btrfs]
[  113.814661] [811c16a6] ? __slab_free+0xa6/0x320
[  113.814690] [c0d1a044] __btrfs_run_delayed_refs+0x424/0x11e0 [btrfs]
[  113.814721] [c0d1edf3] btrfs_run_delayed_refs.part.64+0x73/0x270 [btrfs]
[  113.814750] [c0d1f51d] btrfs_write_dirty_block_groups+0x46d/0x710 [btrfs]
[  113.814784] [c0d2d64d] commit_cowonly_roots+0x18d/0x240 [btrfs]
[  113.814818] [c0d301ad] btrfs_commit_transaction.part.22+0x49d/0x970 [btrfs]
[  113.814852] [c0d2f27a] btrfs_commit_transaction+0x3a/0x80 [btrfs]
[  113.814875] [c0cfe760] btrfs_sync_fs+0x50/0xc0 [btrfs]
[  113.814884] [81211a82] sync_filesystem+0x72/0xb0
[  113.814891] [811e2d50] generic_shutdown_super+0x30/0xf0
[  113.814897] [811e30a2] kill_anon_super+0x12/0x20
[  113.814920] [c0d01e86] btrfs_kill_super+0x16/0x90 [btrfs]
[  113.814926] [811e3429] deactivate_locked_super+0x49/0x60
[  113.814932] [811e3874] deactivate_super+0x64/0x70
[  113.814940] [812015ef] mntput_no_expire+0xdf/0x180
[  113.814947] [81202bac] SyS_umount+0x8c/0x100
[  113.814954] [81787ced] system_call_fastpath+0x1a/0x1f
[  113.814959] ---[ end trace 328a5b6c02402780 ]---
[  113.814967] BTRFS info (device sda1): leaf 104182874112 total ptrs 209 free space 75
[  113.814973] item 0 key (140680462336 168 16384) itemoff 16232 itemsize 51
[  113.814978] extent refs 1
Unmountable filesystem parent transid verify failed
Hi again. Sorry for top posting. I have a 9-disk filesystem that does not mount anymore, and I need some help/advice so I can recover the data.

What happened was that I was running a btrfs device delete under Ubuntu 13.04, kernel 3.8, and after a long time of moving data around it crashed with a SEGV. Now the filesystem does not mount, and none of the recovery options I have tried work. I have upgraded to Debian testing and am now using kernel 3.10-2-amd64.

When I try btrfsck I get heaps of these:

Ignoring transid failure
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495

I have tried using btrfs-image, but it too crashes eventually:

btrfs-image -c9 -t4 /dev/sde btrfs-image
btrfs-image: ctree.c:787: read_node_slot: Assertion `!(level == 0)' failed.
Aborted

mount -o ro,recovery fails:

# mount -o ro,recovery /dev/sde /DATA
mount: wrong fs type, bad option, bad superblock on /dev/sde, ...

# btrfs-zero-log /dev/sde eventually fails with:

btrfs-zero-log: ctree.c:342: __btrfs_cow_block: Assertion `!(btrfs_header_generation(buf) > trans->transid)' failed.
Aborted

What should I try next?

regards
ronnie sahlberg
Unmountable BTRFS with parent transid verify failed
Hi, I have a 9-disk RAID1 filesystem that is no longer mountable. I am using Ubuntu 13.04 with kernel 3.8.0-26-generic.

What happened was that I was removing a device using btrfs device delete; this ran for quite a while (I was removing a 3T device) but eventually failed with the btrfs command segfaulting. I have rebooted, but now the filesystem does not mount. When I run btrfsck /dev/sde I get a lot of:

parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
Ignoring transid failure
leaf parent key incorrect 3539986560
leaf parent key incorrect 3536398464
bad block 3536398464

And while btrfsck eventually does complete, the filesystem remains unmountable. Any advice?

regards
ronnie sahlberg
Re: Unmountable BTRFS with parent transid verify failed
ronnie sahlberg posted on Sat, 31 Aug 2013 14:50:36 -0700 as excerpted:

> And while btrfsck eventually does complete, the filesystem remains unmountable. Any advice?

This isn't specific to your question, but in general... In the "Question: How can I recover this partition? (unable to find logical $hugenum len 4096)" thread about a week ago, there's a post from Hugo Mills listing the general troubleshooting steps he recommends, and in what order. I'd try that.

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999

(I have it marked to possibly add the info to the wiki, as I don't remember seeing such a concise list there, but I haven't gotten around to it yet.) Wiki: https://btrfs.wiki.kernel.org/

-- Duncan - List replies preferred. No HTML msgs.
Re: Unmountable BTRFS with parent transid verify failed
On Aug 31, 2013, at 4:01 PM, Duncan 1i5t5.dun...@cox.net wrote:

> ronnie sahlberg posted on Sat, 31 Aug 2013 14:50:36 -0700 as excerpted:
>> And while btrfsck eventually does complete, the filesystem remains unmountable. Any advice?
>
> This isn't specific to your question, but in general... In the "Question: How can I recover this partition? (unable to find logical $hugenum len 4096)" thread about a week ago, there's a post from Hugo Mills listing the general troubleshooting steps he recommends, and in what order. I'd try that.
>
> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999

I was about to suggest the same thing, but also to use something newer than 3.8.0, and, before getting to any of the btrfs-specific commands, to make sure a recent btrfs-progs is being used. There have been lots of fixes between 3.8 and 3.10.

Chris Murphy
Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd
Also, here's the output of btrfs-find-root:

./btrfs-find-root /dev/sdb1
Super think's the tree root is at 1229060866048, chunk root 1259695439872
Went past the fs size, exiting

Not sure where to go from here.

On Sat, Dec 29, 2012 at 6:04 AM, Jordan Windsor jorda...@gmail.com wrote:
> Hello, thanks for the response! Here's the output of -o recovery:
>
> [ 5473.725751] device label Storage devid 1 transid 116023 /dev/sdb1
> [ 5473.726612] btrfs: enabling auto recovery
> [ 5473.726615] btrfs: disk space caching is enabled
> [ 5473.734581] parent transid verify failed on 1229060423680 wanted 116023 found 116027
> [ 5473.734797] parent transid verify failed on 1229060423680 wanted 116023 found 116027
> [ 5473.734801] btrfs: failed to read tree root on sdb1
> [ 5473.735010] parent transid verify failed on 1229060423680 wanted 116023 found 116027
> [ 5473.735259] parent transid verify failed on 1229060423680 wanted 116023 found 116027
> [ 5473.735262] btrfs: failed to read tree root on sdb1
> [ 5473.756367] parent transid verify failed on 1229060243456 wanted 116022 found 116028
> [ 5473.761968] parent transid verify failed on 1229060243456 wanted 116022 found 116028
> [ 5473.761975] btrfs: failed to read tree root on sdb1
> [ 5475.561208] btrfs bad tree block start 7479324919942847850 1241518882816
> [ 5475.567008] btrfs bad tree block start 13410158725948676859 1241518882816
> [ 5475.567056] Failed to read block groups: -5
> [ 5475.570200] btrfs: open_ctree failed
>
> I'm on kernel 3.6.10 and have been since before this problem. Thanks.
>
> On Sat, Dec 29, 2012 at 5:29 AM, cwillu cwi...@cwillu.com wrote:
>> On Fri, Dec 28, 2012 at 12:09 PM, Jordan Windsor jorda...@gmail.com wrote:
>>> Hello, I moved my btrfs to the beginning of my drive, updated the partition table, and restarted. I'm currently unable to mount it; here's the output in dmesg.
>>>
>>> [ 481.513432] device label Storage devid 1 transid 116023 /dev/sdb1
>>> [ 481.514277] btrfs: disk space caching is enabled
>>> [ 481.522611] parent transid verify failed on 1229060423680 wanted 116023 found 116027
>>> [ 481.522789] parent transid verify failed on 1229060423680 wanted 116023 found 116027
>>> [ 481.522790] btrfs: failed to read tree root on sdb1
>>> [ 481.523656] btrfs: open_ctree failed
>>>
>>> What command should I run from here?
>>
>> The filesystem was uncleanly unmounted, likely on an older kernel. Try mounting with -o recovery
Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd
On Sat, Dec 29, 2012 at 7:14 AM, Jordan Windsor jorda...@gmail.com wrote:
> Also, here's the output of btrfs-find-root:
>
> ./btrfs-find-root /dev/sdb1
> Super think's the tree root is at 1229060866048, chunk root 1259695439872
> Went past the fs size, exiting
>
> Not sure where to go from here.

I can't say for certain, but that suggests that the move-via-dd didn't succeed / wasn't correct, and/or the partitioning changes didn't match, and/or the dd happened from a mounted filesystem (which would also explain the transid errors, if there wasn't an unclean umount involved). btrfs-restore might be able to pick out files, but you may be in restore-from-backup territory.
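[Editorial note: in a "parent transid verify failed on B wanted W found F" line, W is the generation the parent pointer expects and F is what the block on disk actually carries, so the *direction* of the mismatch is informative. The heuristic below is a summary of the diagnoses given in these threads, not an official btrfs tool:]

```shell
#!/bin/sh
# Classify a "parent transid verify failed" line. "found" older than
# "wanted" suggests a lost or stale write to the child block (crash,
# dropped write barrier); "found" newer than "wanted" suggests the parent
# metadata itself is from an older point in time, e.g. a superblock/tree
# root copied with dd from a mounted (still-changing) filesystem.
classify_transid() {
    wanted=$(printf '%s\n' "$1" | sed -n 's/.*wanted \([0-9]*\).*/\1/p')
    found=$(printf '%s\n' "$1" | sed -n 's/.*found \([0-9]*\).*/\1/p')
    if [ "$found" -lt "$wanted" ]; then
        echo "stale child block (found < wanted): lost write?"
    elif [ "$found" -gt "$wanted" ]; then
        echo "stale parent/root (found > wanted): rolled-back metadata?"
    else
        echo "transids match"
    fi
}

classify_transid "parent transid verify failed on 1229060423680 wanted 116023 found 116027"
```

In this thread the kernel wants 116023 but keeps finding 116027, i.e. the tree blocks are *newer* than the superblock expects, which fits the dd-from-a-mounted-filesystem theory above.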
Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd
On Dec 29, 2012, at 8:38 AM, cwillu cwi...@cwillu.com wrote:
> On Sat, Dec 29, 2012 at 7:14 AM, Jordan Windsor jorda...@gmail.com wrote:
>> Also, here's the output of btrfs-find-root:
>>
>> ./btrfs-find-root /dev/sdb1
>> Super think's the tree root is at 1229060866048, chunk root 1259695439872
>> Went past the fs size, exiting
>>
>> Not sure where to go from here.
>
> I can't say for certain, but that suggests that the move-via-dd didn't succeed / wasn't correct, and/or the partitioning changes didn't match, and/or the dd happened from a mounted filesystem (which would also explain the transid errors, if there wasn't an unclean umount involved). btrfs-restore might be able to pick out files, but you may be in restore-from-backup territory.

Yeah, I'm vaguely curious about how the move was done, in particular whether it was dd'd from a mounted fs.

Chris Murphy
Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd
On Fri, Dec 28, 2012 at 12:09 PM, Jordan Windsor jorda...@gmail.com wrote:
> Hello, I moved my btrfs to the beginning of my drive, updated the partition table, and restarted. I'm currently unable to mount it; here's the output in dmesg.
>
> [ 481.513432] device label Storage devid 1 transid 116023 /dev/sdb1
> [ 481.514277] btrfs: disk space caching is enabled
> [ 481.522611] parent transid verify failed on 1229060423680 wanted 116023 found 116027
> [ 481.522789] parent transid verify failed on 1229060423680 wanted 116023 found 116027
> [ 481.522790] btrfs: failed to read tree root on sdb1
> [ 481.523656] btrfs: open_ctree failed
>
> What command should I run from here?

The filesystem was uncleanly unmounted, likely on an older kernel. Try mounting with -o recovery
Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd
Hello, thanks for the response! Here's the output of -o recovery:

[ 5473.725751] device label Storage devid 1 transid 116023 /dev/sdb1
[ 5473.726612] btrfs: enabling auto recovery
[ 5473.726615] btrfs: disk space caching is enabled
[ 5473.734581] parent transid verify failed on 1229060423680 wanted 116023 found 116027
[ 5473.734797] parent transid verify failed on 1229060423680 wanted 116023 found 116027
[ 5473.734801] btrfs: failed to read tree root on sdb1
[ 5473.735010] parent transid verify failed on 1229060423680 wanted 116023 found 116027
[ 5473.735259] parent transid verify failed on 1229060423680 wanted 116023 found 116027
[ 5473.735262] btrfs: failed to read tree root on sdb1
[ 5473.756367] parent transid verify failed on 1229060243456 wanted 116022 found 116028
[ 5473.761968] parent transid verify failed on 1229060243456 wanted 116022 found 116028
[ 5473.761975] btrfs: failed to read tree root on sdb1
[ 5475.561208] btrfs bad tree block start 7479324919942847850 1241518882816
[ 5475.567008] btrfs bad tree block start 13410158725948676859 1241518882816
[ 5475.567056] Failed to read block groups: -5
[ 5475.570200] btrfs: open_ctree failed

I'm on kernel 3.6.10 and have been since before this problem. Thanks.

On Sat, Dec 29, 2012 at 5:29 AM, cwillu cwi...@cwillu.com wrote:
> On Fri, Dec 28, 2012 at 12:09 PM, Jordan Windsor jorda...@gmail.com wrote:
>> Hello, I moved my btrfs to the beginning of my drive, updated the partition table, and restarted. I'm currently unable to mount it; here's the output in dmesg.
>>
>> [ 481.513432] device label Storage devid 1 transid 116023 /dev/sdb1
>> [ 481.514277] btrfs: disk space caching is enabled
>> [ 481.522611] parent transid verify failed on 1229060423680 wanted 116023 found 116027
>> [ 481.522789] parent transid verify failed on 1229060423680 wanted 116023 found 116027
>> [ 481.522790] btrfs: failed to read tree root on sdb1
>> [ 481.523656] btrfs: open_ctree failed
>>
>> What command should I run from here?
>
> The filesystem was uncleanly unmounted, likely on an older kernel. Try mounting with -o recovery
Re: Recovering parent transid verify failed
Anything new? I'm still trying to fix my FS every once in a while; none of the tools helps.

This is what find-root gives: http://pastebin.com/KycgzhaP

btrfsck still only gives this:

# sudo ./btrfsck --repair /dev/sda4
enabling repair mode
parent transid verify failed on 216925220864 wanted 135714 found 135713
parent transid verify failed on 216925220864 wanted 135714 found 135713
parent transid verify failed on 216925220864 wanted 135714 found 135713
parent transid verify failed on 216925220864 wanted 135714 found 135713
Ignoring transid failure
btrfsck: root-tree.c:46: btrfs_find_last_root: Assertion `!(path->slots[0] == 0)' failed.

Any more details I can give you which would help resolve this? Thanks.

Yo'av

On 6 March 2011 at 11:02, Hugo Mills hugo-l...@carfax.org.uk wrote:
> On Sun, Mar 06, 2011 at 12:28:41PM +0200, Yo'av Moshe wrote:
>> Hey, I'd start by saying that I know btrfs is still experimental, and so there's no guarantee that anyone will be able to help me at all... but I thought I'd try anyway :-)
>>
>> A few months ago I bought a new laptop and installed Arch Linux on it, with btrfs on the root filesystem... I know, it's not the smartest thing to do... After a few months I had issues with my hibernation scripts, and one day I tried to hibernate my computer but it didn't go that well, and, well, ever since then my btrfs partition has not been accessible. I opened up the btrfs FAQ and saw that the fsck tool should be out by the end of 2010, and thought "oh well, I can wait until then", and went on and installed Ubuntu with ext4 on another small partition. But time goes on and the fsck tool is still in development... I've tried using the code from git and it didn't work, and I'm starting to wonder (a) if there's any hope at all and (b) what other steps I can take to recover my old btrfs partition.
>
> Yes, there is hope. This error should be fixable with the new fsck.
>
>> When trying to mount the btrfs partition I get this in dmesg:
>>
>> [105252.779080] device fsid d14e78a602757297-bf762d859b406ca9 devid 1 transid 135714 /dev/sda4
>> [105252.818697] parent transid verify failed on 216925220864 wanted 135714 found 135713
>> [snip]
>>
>> Should I wait for btrfsck to be ready?
>
> Yes.
>
>> Am I not using it correctly now?
>
> No, there's not a lot the current version can do right now.
>
>> Is there any way to recover this partition, or should I just wipe it and reinstall btrfs only when I'm supposed to?.. Your help is appreciated.
>
> HTH, Hugo.
>
> -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- I am the author. You are the audience. I outrank you! ---
Re: corrupted btrfs volume: parent transid verify failed
Regarding your backup regimen, consider using rsync instead of dd: after the initial backup, rsync can update the existing backup _much_ more quickly, making it practical to do a backup every night, or even multiple times a day. dd also has the downside of potentially _really_ confusing btrfs if it ever sees the backup and the original at the same time.

A still better option is to use an online backup service such as CrashPlan or SpiderOak, as that way your backups are also safe from fire or theft. Most will also automatically create incremental backups several times per hour, so that you can access old versions of your files easily. CrashPlan has a free online backup service where you back up to a friend's computer over the internet instead of to their servers.

Another cheap alternative for small and very important files is to email them to your Google Mail account, so you can retrieve lost files from any computer. I know of one Comp Sci professor who advises all his students to use that email method of backup for important theses and suchlike, in addition to any other backup method. His argument is that if a student's roommate gets arrested, the cops are likely to take away all computers and backup media, so in that case an online backup will be the only usable one.

-- David Pottage Error compiling committee.c To many arguments to function.
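[Editorial note: the rsync-instead-of-dd suggestion can be sketched as below. For safety this demo syncs a small temporary tree; in real use the source would be e.g. /home/ and the destination a mounted backup drive. The flags shown are one common choice (-a archive mode, --delete to drop files removed from the source), not something prescribed in the thread:]

```shell
#!/bin/sh
# Incremental backup sketch using rsync instead of dd. Unlike dd, only
# changed files are copied on each run, and the destination keeps its own
# filesystem (and UUID), so btrfs never sees two devices with the same fsid.
SRC=$(mktemp -d)   # stands in for /home/
DST=$(mktemp -d)   # stands in for /mnt/backup/

echo "draft of thesis" > "$SRC/thesis.txt"
rsync -a --delete "$SRC/" "$DST/"   # first run: full copy

echo "chapter 2" >> "$SRC/thesis.txt"
rsync -a --delete "$SRC/" "$DST/"   # later runs: only the delta
```

The trailing slash on "$SRC/" matters: it copies the directory's contents rather than the directory itself. Flags like -H (hard links), -A (ACLs) and -X (xattrs) can be added when the destination filesystem supports them.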
Re: corrupted btrfs volume: parent transid verify failed
Halp! I was recently forced to power-cycle my desktop PC, and upon restart the btrfs /home volume would no longer mount, citing the error "BUG: scheduling while atomic: mount /5584/0x2". I retrieved the latest btrfs-progs git repositories from git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git and http://git.darksatanic.net/repo/btrfs-progs-unstable.git -b integration-20110805, but when running sudo ./btrfsck -s 1 /dev/mapper/home from either repo's build, I receive the error "parent transid verify failed on 647363842048 wanted 210333 found 210302" (repeated 3x). I've also tried the flags -s 0, -s 1, and -s 2, all with the same results.

I take care to complete a full dd copy of my disk every 2 weeks, but my previous backup is nearly 2 weeks old and I've put in almost 2 weeks of effort on my master's thesis since then. I'm quite desperate to recover this volume. Any help is appreciated, as I've exhausted the existing suggestions from the mailing-list posts to date. I've tried to ask in #btrfs, but suspect that they're all sleepy bearded people :(

Regards,
-Yalonda
Re: corrupted btrfs volume: parent transid verify failed
On Mon, Aug 15, 2011 at 4:13 AM, Yalonda Gishtaka yalonda.gisht...@gmail.com wrote:
> Halp! I was recently forced to power-cycle my desktop PC, and upon restart the btrfs /home volume would no longer mount, citing the error "BUG: scheduling while atomic: mount /5584/0x2". I retrieved the latest btrfs-progs git repositories from git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git and http://git.darksatanic.net/repo/btrfs-progs-unstable.git -b integration-20110805, but when running sudo ./btrfsck -s 1 /dev/mapper/home from either repo's build, I receive the error "parent transid verify failed on 647363842048 wanted 210333 found 210302" (repeated 3x). I've also tried the flags -s 0, -s 1, and -s 2, all with the same results.

Is there something in the log about replaying the log? If yes, try btrfs-zero-log: https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

-- Fajar
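[Editorial note: Fajar's check can be scripted. This is a hedged, self-contained sketch: the grep pattern and device path are illustrative, the dmesg text is fed from a sample string, and the destructive btrfs-zero-log invocation is only echoed, never run:]

```shell
#!/bin/sh
# If a mount failed while replaying the log tree, btrfs-zero-log may help;
# otherwise it just throws away data for nothing. Dry-run sketch: we grep a
# sample message instead of live dmesg, and only print the command.
DEV=/dev/mapper/home   # placeholder device

dmesg_sample="btrfs: replaying log for device dm-0"   # stand-in for: dmesg
if printf '%s\n' "$dmesg_sample" | grep -qi "replaying log"; then
    echo "log replay implicated; would run: btrfs-zero-log $DEV"
else
    echo "no log-replay message; btrfs-zero-log unlikely to help"
fi
```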
Re: corrupted btrfs volume: parent transid verify failed
Fajar,

Thank you for the suggestion. Unfortunately, running sudo ./btrfs-zero-log /dev/mapper/home results in the same "parent transid verify failed on 647363842048 wanted 210333 found 210302" errors, repeated 3 times. I am running Arch Linux with the latest 3.0.1 kernel on an x86_64 machine.

Regards,
-Yalonda

On Sun, Aug 14, 2011 at 11:40 PM, Fajar A. Nugraha l...@fajar.net wrote:
> On Mon, Aug 15, 2011 at 4:13 AM, Yalonda Gishtaka yalonda.gisht...@gmail.com wrote:
>> [...]
>
> Is there something in the log about replaying the log? If yes, try btrfs-zero-log: https://btrfs.wiki.kernel.org/index.php/Problem_FAQ
>
> -- Fajar
Re: corrupted btrfs volume: parent transid verify failed
"Soon" seems a bit subjective, given that the devs have been touting this since the beginning of time. /Helpful/ advice would be nice. This blog posting (http://stujordan.wordpress.com/2011/06/20/churning-the-butter/) sounded promising, but none of the superblock copies on my btrfs volume are OK, as I keep receiving the same "parent transid verify failed" messages. Will the to-be-released btrfsck tool handle this case?

On Mon, Aug 15, 2011 at 1:10 AM, Michael Cronenworth m...@cchtml.com wrote:
> On 08/14/2011 04:13 PM, Yalonda Gishtaka wrote:
>> I'm quite desperate to recover this volume.
>
> You should have had backups. Btrfs has no file system repair tool, but one is supposed to be out soon (tm). You will have to wait.
Re: corrupted btrfs volume: parent transid verify failed
Telling someone (who has a ~2-week-stale backup) that they should have kept backups is hardly constructive. We're all aware there's no official btrfs repair tool. But it appears there has been some hard, dedicated work towards one, resulting in many commits and patches. I'm here to find out what there is to know about recent developments that may help my current situation. Please consider offering helpful advice instead of pointing out the obvious about my backup schedule.

Cheers,
-Yalonda

On Mon, Aug 15, 2011 at 1:51 AM, Michael Cronenworth m...@cchtml.com wrote:
> On 08/14/2011 06:32 PM, Yalonda Gishtaka wrote:
>> /Helpful/ advice would be nice.
>
> Being hostile will net you zero advice.
Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726
On Tue, Jun 21, 2011 at 05:01:53PM +0200, Francesco R wrote:
> 2011/6/21 Daniel Witzel dannyboy48...@gmail.com:
>> Welcome to the club, I have a similar issue. We pretty much have to wait for the fsck tool to finish being developed. If possible, unhook the drives and leave them be until the tool is done. I don't know when it will be done, as I am not a developer, merely a follower.
>
> Are there tools to view the metadata stored as RAID10, possibly in a high-level language? I see Chris Mason stopped git commits to btrfs-progs-unstable in 2010; is someone working on it?

There have been lots of commits and patches since then. The tmp branch contains a bunch of commits from Chris, and the integration-20110616 branch in my git repository[1] contains more or less all of the other patches that have made it to this mailing list since. Sadly, none of them contain the new btrfsck code. :(

Hugo.

[1] http://git.darksatanic.net/repo/btrfs-progs-unstable.git/

-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- There are three mistaiks in this sentance. ---
Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726
Well, I'm patient. I'd rather have a fsck that works than a fsck that may trash the FS, so no 'gun to the head' on this one. Some feedback on RECENT progress would be nice. Besides your merge branch, Hugo (yes, I tried it; still no cigar...), it's been quiet since December 2010.
Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726
Hello, I am facing the same issue on a Btrfs RAID0 with 2 drives: Label: 'root' uuid: 1e26b203-fc1e-4ebf-9551-451bd34d3ac4 Total devices 2 FS bytes used 36.14GB devid1 size 80.43GB used 41.65GB path /dev/sda6 devid2 size 80.43GB used 41.63GB path /dev/sdb6 Btrfs v0.19-36-g70c6c10-dirty Tried btrfs-select-super -s 1 /dev/sd[ab]6, but that does not help at all. On both drives, its standard output is identical: parent transid verify failed on 576901120 wanted 70669 found 70755 btrfs-select-super: disk-io.c:412: find_and_setup_root: Assertion `!(!root->node)' failed. using SB copy 1, bytenr 67108864 This is the first error message from dmesg: [ 156.617407] parent transid verify failed on 576901120 wanted 70669 found 70755 [ 156.617504] parent transid verify failed on 576901120 wanted 70669 found 70755 [ 156.635322] btrfs: open_ctree failed The problem occurred shortly after this issue: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg10618.html The machine booted (and worked normally) at least ten times between the error message and the current problem. The kernel version was 2.6.39.1 when the first BUG message appeared in dmesg. I downgraded to 2.6.38.8 after that and everything seemed to work fine ... up to now. Any suggestions? ;-) I can always restore the data from another machine with identical installation. But first of all I'd like to understand this problem and know whether it can be dealt with somehow. Andrej
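A side note for anyone about to repeat this experiment: as I understand it, `btrfs-select-super` *writes* the chosen backup superblock over the primary one, so it is worth saving the current primary first. A minimal sketch; the 64 KiB offset and 4 KiB superblock size are the standard btrfs on-disk layout, and the `backup_super` function name is mine, purely illustrative:

```shell
# backup_super DEVICE OUTFILE
# Copy the primary btrfs superblock (4 KiB at byte offset 64 KiB)
# to a file before letting btrfs-select-super overwrite it.
backup_super() {
    # bs=4096 skip=16 starts the read at byte 65536 (64 KiB).
    dd if="$1" of="$2" bs=4096 skip=16 count=1 2>/dev/null
}
```

Restoring is the same `dd` with `if`/`of` swapped and `seek=16 conv=notrunc` instead of `skip=16`.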
[HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726
Hi list, I have a broken btrfs filesystem to deal with. Can someone please help me recover the data? The filesystem was created a couple of years ago with 4 devices, using the command at #create, and is mounted with the #fstab options. Recently I added a pair of devices and ran a `btrfs filesystem balance`; after it succeeded I was doing a `btrfs device delete` on space02 (the currently broken one) when, in the middle of this, the power cable was cut. After replacing the power cord, 'space01' is mountable, 'space02' is not. I tried to use a backup copy of the super with `btrfs-select-super`, but it fails as reported in #btrfs-select-super. Please, do you have any suggestions on what to try next? #current kernel (vanilla + linux-vserver) uname -a Linux dobbia 2.6.38.8-vs2.3.0.37-rc17 #5 SMP Mon Jun 20 15:04:39 CEST 2011 x86_64 Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz GenuineIntel GNU/Linux #create modprobe btrfs mkfs.btrfs -L space01 -m raid10 -d raid10 $DEVICES1 mkfs.btrfs -L space02 -m raid10 -d raid10 $DEVICES2 # fstab /dev/sda6 /mnt/space01 btrfs defaults,device=/dev/sda6,device=/dev/sdc6,device=/dev/sdd1,device=/dev/sde1,device=/dev/sdf6,device=/dev/sdg6 0 0 /dev/sda7 /mnt/space02 btrfs defaults,device=/dev/sda7,device=/dev/sdb7,device=/dev/sdc7,device=/dev/sdd2,device=/dev/sde2,device=/dev/sdf7,device=/dev/sdg7 0 0 # current layout btrfs filesystem show failed to read /dev/sr0 Label: 'space01' uuid: c77c6e87-fccd-4204-bd2c-d924fe06be31 Total devices 6 FS bytes used 164.81GB devid7 size 244.14GB used 56.59GB path /dev/sdf6 devid5 size 244.93GB used 56.59GB path /dev/sdd1 devid8 size 244.14GB used 56.59GB path /dev/sdg6 devid6 size 244.93GB used 56.59GB path /dev/sde1 devid4 size 244.14GB used 56.59GB path /dev/sda6 devid3 size 244.14GB used 56.59GB path /dev/sdc6 Label: 'space02' uuid: f752def1-1abc-48c7-8ebb-47ba37b8ffa6 Total devices 7 FS bytes used 172.94GB devid7 size 487.65GB used 0.00
path /dev/sdf7 devid6 size 488.94GB used 60.25GB path /dev/sde2 devid5 size 488.94GB used 58.75GB path /dev/sdd2 devid4 size 487.65GB used 60.26GB path /dev/sda7 devid7 size 487.65GB used 1.50GB path /dev/sdg7 devid2 size 487.65GB used 58.76GB path /dev/sdb7 devid3 size 487.65GB used 60.26GB path /dev/sdc7 Btrfs v0.19-35-g1b444cd-dirty # first error messages Jun 20 14:04:35 dobbia kernel: [ 806.587580] device label space02 devid 4 transid 757294 /dev/sda7 Jun 20 14:04:35 dobbia kernel: [ 806.629781] device label space02 devid 2 transid 756848 /dev/sdb7 Jun 20 14:04:35 dobbia kernel: [ 806.630107] device label space02 devid 3 transid 757294 /dev/sdc7 Jun 20 14:04:35 dobbia kernel: [ 806.652126] device label space02 devid 5 transid 756846 /dev/sdd2 Jun 20 14:04:37 dobbia kernel: [ 808.201719] device label space02 devid 6 transid 757294 /dev/sde2 Jun 20 14:04:37 dobbia kernel: [ 808.218108] device label space02 devid 7 transid 756846 /dev/sdf7 Jun 20 14:04:37 dobbia kernel: [ 808.218433] device label space02 devid 7 transid 757294 /dev/sdg7 Jun 20 14:04:37 dobbia kernel: [ 808.218715] device label space02 devid 4 transid 757294 /dev/sda7 Jun 20 14:04:37 dobbia kernel: [ 808.271797] btrfs: failed to read the system array on sdg7 Jun 20 14:04:37 dobbia kernel: [ 808.293776] btrfs: open_ctree failed Jun 20 14:04:56 dobbia kernel: [ 827.190208] device label space02 devid 4 transid 757294 /dev/sda7 Jun 20 14:04:56 dobbia kernel: [ 827.254517] btrfs: failed to read the system array on sdg7 Jun 20 14:04:56 dobbia kernel: [ 827.280152] btrfs: open_ctree failed Jun 20 14:05:01 dobbia kernel: [ 832.442454] device label space02 devid 4 transid 757294 /dev/sda7 Jun 20 14:05:01 dobbia kernel: [ 832.502017] btrfs: failed to read the system array on sdg7 Jun 20 14:05:01 dobbia kernel: [ 832.521492] btrfs: open_ctree failed Jun 20 14:05:20 dobbia kernel: [ 851.113237] device label space02 devid 4 transid 757294 /dev/sda7 Jun 20 14:05:20 dobbia kernel: [ 851.199478] btrfs: allowing 
degraded mounts Jun 20 14:05:20 dobbia kernel: [ 851.563583] parent transid verify failed on 600755752960 wanted 757102 found 756726 Jun 20 14:05:20 dobbia kernel: [ 851.564146] parent transid verify failed on 600755752960 wanted 757102 found 756726 Jun 20 14:05:20 dobbia kernel: [ 851.651006] btrfs bad tree block start 0 600859951104 Jun 20 14:05:20 dobbia kernel: [ 851.671362] parent transid verify failed on 600859955200 wanted 756926 found 756726 Jun 20 14:05:20 dobbia kernel: [ 851.671636] parent transid verify failed on 600859955200 wanted 756926 found 756726 Jun 20 14:05:20 dobbia kernel: [ 851.693515] btrfs bad tree block start 0 601053986816 Jun 20 14:05:20 dobbia kernel: [ 851.693559] btrfs bad tree block start 0 601054003200 Jun 20 14:05:20 dobbia kernel: [ 851.693566
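An aside on the fstab above: the long `device=` lists were needed at the time because a multi-device btrfs had to have every member spelled out in the mount options (or registered beforehand with `btrfs device scan`). A throwaway helper that builds such an option string from a device list; the `btrfs_mount_opts` name is mine, purely illustrative:

```shell
# btrfs_mount_opts DEV... — emit "defaults,device=...,device=..."
# suitable for the fourth fstab field of a multi-device btrfs.
btrfs_mount_opts() {
    opts="defaults"
    for dev in "$@"; do
        opts="$opts,device=$dev"
    done
    printf '%s\n' "$opts"
}
```

For example, `btrfs_mount_opts /dev/sda7 /dev/sdb7` prints `defaults,device=/dev/sda7,device=/dev/sdb7`.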
Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726
Welcome to the club, I have a similar issue. We pretty much have to wait for the fsck tool to finish being developed. If possible unhook the drives and leave them be until the tool is done. I don't know when it will be done as I am not a developer, merely a follower.
Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726
2011/6/21 Daniel Witzel dannyboy48...@gmail.com: Welcome to the club, I have a similar issue. We pretty much have to wait for the fsck tool to finish being developed. If possible unhook the drives and leave them be until the tool is done. I don't know when it will be done as I am not a developer, merely a follower. Are there tools to view the metadata stored as raid10? Possibly in a high-level language? I see Chris Mason stopped git commits to btrfs-progs-unstable in 2010; is someone working on it?
Re: Having parent transid verify failed
On Thursday 05 May 2011 22:32:42 Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400: I think I made some progress. When I tried to remove the directory that I suspect contains the problematic file, I got this on the console: rm -rf serverloft/ Ok, our one bad block is in the extent allocation tree. This is going to be the very hardest thing to fix. Until I finish off the code to rebuild parts of the extent allocation tree, I think your best bet is to copy the files off. The big question is, what happened to make this error? Can you describe your setup in more detail? -chris It seems that I ran into the same problem: parent transid verify failed on 32940560384 wanted 210334 found 210342 BUG: scheduling while atomic: chrome/17058/0x0002 Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss fuse dm_crypt dm_mod usbhid snd_intel8x0 snd_ac97_codec sr_mod cdrom ac97_bus snd_pcm sg snd_timer snd e1000 fschmd uhci_hcd snd_page_alloc i2c_i801 [last unloaded: microcode] Pid: 17058, comm: chrome Tainted: GW 2.6.39 #29 Call Trace: [c13cf70c] ? schedule+0x78/0x6ef [c11acabb] ? generic_make_request+0x1d5/0x22f [c11acbad] ? submit_bio+0x98/0x9f [c118026a] ? btrfs_map_bio+0x1ab/0x1b5 [c13cfdc2] ? io_schedule+0x3f/0x50 [c105723d] ? sleep_on_page+0x5/0x8 [c13d0292] ? __wait_on_bit+0x31/0x58 [c1057238] ? __lock_page+0x52/0x52 [c1057388] ? wait_on_page_bit+0x5a/0x62 [c1037f92] ? autoremove_wake_function+0x29/0x29 [c117ab39] ? read_extent_buffer_pages+0x33a/0x3b5 [c115891f] ? btree_read_extent_buffer_pages.clone.51+0x44/0x9e [c11578b0] ? verify_parent_transid+0x147/0x147 [c11593aa] ? read_tree_block+0x2d/0x3e [c1144f90] ? read_block_for_search.clone.36+0xc3/0x35d [c11863bf] ? btrfs_tree_unlock+0x19/0x3a [c11420bb] ? unlock_up+0x88/0x9f [c1146f7e] ? btrfs_search_slot+0x39d/0x4fe [c1149fa1] ? lookup_inline_extent_backref+0x116/0x49b [c11773b0] ? set_extent_dirty+0x19/0x1d [c114cbd0] ?
__btrfs_free_extent+0xe2/0x6c6 [c114fa28] ? run_clustered_refs+0x6ad/0x720 [c1191330] ? btrfs_find_ref_cluster+0x53/0x11f [c114fb53] ? btrfs_run_delayed_refs+0xb8/0x18d [c115d395] ? __btrfs_end_transaction+0x5a/0x17f [c115d4dc] ? btrfs_end_transaction+0x9/0xb [c1165e19] ? btrfs_evict_inode+0x190/0x1a7 [c1092c45] ? evict+0x56/0xeb [c108baa8] ? do_unlinkat+0xc3/0x103 [c13d1c90] ? sysenter_do_call+0x12/0x26 [c13d] ? console_conditional_schedule+0x8/0xf parent transid verify failed on 32940560384 wanted 210334 found 210342 BUG: scheduling while atomic: chrome/17058/0x0002 Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss fuse dm_crypt dm_mod usbhid snd_intel8x0 snd_ac97_codec sr_mod cdrom ac97_bus snd_pcm sg snd_timer snd e1000 fschmd uhci_hcd snd_page_alloc i2c_i801 [last unloaded: microcode] Pid: 17058, comm: chrome Tainted: GW 2.6.39 #29 Call Trace: [c13cf70c] ? schedule+0x78/0x6ef [c11acabb] ? generic_make_request+0x1d5/0x22f [c11acbad] ? submit_bio+0x98/0x9f [c118026a] ? btrfs_map_bio+0x1ab/0x1b5 [c13cfdc2] ? io_schedule+0x3f/0x50 [c105723d] ? sleep_on_page+0x5/0x8 [c13d0292] ? __wait_on_bit+0x31/0x58 [c1057238] ? __lock_page+0x52/0x52 [c1057388] ? wait_on_page_bit+0x5a/0x62 [c1037f92] ? autoremove_wake_function+0x29/0x29 [c117ab39] ? read_extent_buffer_pages+0x33a/0x3b5 [c116bd50] ? lookup_extent_mapping+0x5a/0x148 [c115891f] ? btree_read_extent_buffer_pages.clone.51+0x44/0x9e [c11578b0] ? verify_parent_transid+0x147/0x147 [c11593aa] ? read_tree_block+0x2d/0x3e [c1144f90] ? read_block_for_search.clone.36+0xc3/0x35d [c11863bf] ? btrfs_tree_unlock+0x19/0x3a [c11420bb] ? unlock_up+0x88/0x9f [c1146f7e] ? btrfs_search_slot+0x39d/0x4fe [c1149fa1] ? lookup_inline_extent_backref+0x116/0x49b [c11773b0] ? set_extent_dirty+0x19/0x1d [c114cbd0] ? __btrfs_free_extent+0xe2/0x6c6 [c114fa28] ? run_clustered_refs+0x6ad/0x720 [c1191330] ? btrfs_find_ref_cluster+0x53/0x11f [c114fb53] ? btrfs_run_delayed_refs+0xb8/0x18d [c115d395] ? 
__btrfs_end_transaction+0x5a/0x17f [c115d4dc] ? btrfs_end_transaction+0x9/0xb [c1165e19] ? btrfs_evict_inode+0x190/0x1a7 [c1092c45] ? evict+0x56/0xeb [c108baa8] ? do_unlinkat+0xc3/0x103 [c13d1c90] ? sysenter_do_call+0x12/0x26 [c13d] ? console_conditional_schedule+0x8/0xf parent transid verify failed on 32940560384 wanted 210334 found 210342 BUG: scheduling while atomic: chrome/17058/0x0002 Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss fuse dm_crypt dm_mod usbhid snd_intel8x0 snd_ac97_codec sr_mod cdrom ac97_bus snd_pcm sg snd_timer snd e1000 fschmd uhci_hcd snd_page_alloc i2c_i801 [last unloaded: microcode] Pid: 17058, comm: chrome Tainted: GW 2.6.39 #29 Call Trace: [c13cf70c] ? schedule+0x78/0x6ef [c11acabb] ? generic_make_request+0x1d5/0x22f [c11acbad] ? submit_bio+0x98/0x9f [c118026a
cannot mount btrfs - parent transid verify failed
Hello! A power outage damaged a btrfs - it could not be mounted upon startup. kernel: 2.6.38 (from the Ubuntu kernel PPA). dmesg: [ 88.562819] device fsid 844676ff057abdd4-ccd6cf8af4e14dba devid 1 transid 112504 /dev/sdb1 [ 88.596515] verify_parent_transid: 6 callbacks suppressed [ 88.596518] parent transid verify failed on 408626470912 wanted 24 found 111474 [ 88.596686] parent transid verify failed on 408626470912 wanted 24 found 111474 [ 88.600062] parent transid verify failed on 408626470912 wanted 24 found 111474 [ 88.600067] parent transid verify failed on 408626470912 wanted 24 found 111474 [ 88.670071] btrfs: open_ctree failed I compiled the latest btrfs-progs-unstable and tried btrfsck: root@tesla:/root/btrfs-progs-unstable# ./btrfsck /dev/sdb1 parent transid verify failed on 408626470912 wanted 24 found 111474 parent transid verify failed on 408626470912 wanted 24 found 111474 parent transid verify failed on 408626470912 wanted 24 found 111474 btrfsck: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)' failed. Aborted root@tesla:/root/btrfs-progs-unstable# ./btrfsck -s 1 /dev/sdb1 using SB copy 1, bytenr 67108864 parent transid verify failed on 408626470912 wanted 24 found 111474 parent transid verify failed on 408626470912 wanted 24 found 111474 parent transid verify failed on 408626470912 wanted 24 found 111474 btrfsck: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)' failed. Aborted root@tesla:/root/btrfs-progs-unstable# ./btrfsck -s 2 /dev/sdb1 using SB copy 2, bytenr 274877906944 parent transid verify failed on 408626470912 wanted 24 found 111474 parent transid verify failed on 408626470912 wanted 24 found 111474 parent transid verify failed on 408626470912 wanted 24 found 111474 btrfsck: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)' failed.
Aborted root@tesla:/root/btrfs-progs-unstable# ./btrfs filesystem show /dev/sdb1 failed to read /dev/sr0 Label: none uuid: d4bd7a05-ff76-4684-ba4d-e1f48acfd6cc Total devices 1 FS bytes used 544.12GB devid1 size 931.51GB used 547.79GB path /dev/sdb1 Btrfs v0.19-36-g70c6c10 The data on this filesystem is not important, though it would be nice if I could regain access to it. Is there anything I can try to salvage it? thanks Robert
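When all three `btrfsck -s` attempts die on the same block, it can help to first check, strictly read-only, which superblock copies are present on the device at all. A sketch, assuming the standard btrfs mirror offsets (64 KiB primary, 64 MiB and 256 GiB copies — matching the bytenr values 67108864 and 274877906944 printed above) and the `_BHRfS_M` magic 64 bytes into each copy; the `check_supers` function name is mine:

```shell
# check_supers DEVICE — report which btrfs superblock copies carry
# the on-disk magic. Purely read-only, unlike btrfs-select-super.
check_supers() {
    for off in 65536 67108864 274877906944; do
        # The 8-byte magic "_BHRfS_M" sits 64 bytes into each copy.
        magic=$(dd if="$1" bs=1 skip=$((off + 64)) count=8 2>/dev/null)
        if [ "$magic" = "_BHRfS_M" ]; then
            echo "superblock at byte $off: magic OK"
        else
            echo "superblock at byte $off: missing or damaged"
        fi
    done
}
```

A present magic only means the copy exists, not that its tree pointers are sane — but a missing one rules that copy out before you point `btrfsck -s` at it.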
Having parent transid verify failed
Hello, I have a 5.5TB Btrfs filesystem on top of a md-raid 5 device. Now if i run some file operations like find, i get these messages. kernel is 2.6.38.5-1 on arch linux May 5 14:15:12 mail kernel: [13559.089713] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 14:15:12 mail kernel: [13559.089834] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 14:15:14 mail kernel: [13560.752074] btrfs-transacti D 88007211ac78 0 5339 2 0x May 5 14:15:14 mail kernel: [13560.752078] 880023167d30 0046 8800 8800195b6000 May 5 14:15:14 mail kernel: [13560.752082] 880023167c10 02c8f27b4000 880023167fd8 88007211a9a0 May 5 14:15:14 mail kernel: [13560.752085] 880023167fd8 880023167fd8 88007211ac80 880023167fd8 May 5 14:15:14 mail kernel: [13560.752087] Call Trace: May 5 14:15:14 mail kernel: [13560.752101] [a0850d02] ? run_clustered_refs+0x132/0x830 [btrfs] May 5 14:15:14 mail kernel: [13560.752105] [813aff3d] schedule_timeout+0x2fd/0x380 May 5 14:15:14 mail kernel: [13560.752108] [813b0cf9] ? mutex_unlock+0x9/0x10 May 5 14:15:14 mail kernel: [13560.752115] [a087e9f4] ? btrfs_run_ordered_operations+0x1f4/0x210 [btrfs] May 5 14:15:14 mail kernel: [13560.752122] [a0860fa3] btrfs_commit_transaction+0x263/0x750 [btrfs] May 5 14:15:14 mail kernel: [13560.752126] [81079ff0] ? autoremove_wake_function+0x0/0x40 May 5 14:15:14 mail kernel: [13560.752131] [a085a9bd] transaction_kthread+0x26d/0x290 [btrfs] May 5 14:15:14 mail kernel: [13560.752137] [a085a750] ? transaction_kthread+0x0/0x290 [btrfs] May 5 14:15:14 mail kernel: [13560.752139] [81079717] kthread+0x87/0x90 May 5 14:15:14 mail kernel: [13560.752142] [8100bc24] kernel_thread_helper+0x4/0x10 May 5 14:15:14 mail kernel: [13560.752145] [81079690] ? kthread+0x0/0x90 May 5 14:15:14 mail kernel: [13560.752147] [8100bc20] ? 
kernel_thread_helper+0x0/0x10 May 5 14:15:17 mail kernel: [13564.092081] verify_parent_transid: 40736 callbacks suppressed May 5 14:15:17 mail kernel: [13564.092084] parent transid verify failed on 3062073683968 wanted 5181 found 5188 --snip-- May 5 14:17:13 mail kernel: [13679.169772] parent transid verify failed on 3062073683968 wanted 5181 found 5188 --snip-- May 5 14:17:14 mail kernel: [13680.751996] btrfs-transacti D 88007211ac78 0 5339 2 0x May 5 14:17:14 mail kernel: [13680.752000] 880023167d30 0046 8800 8800195b6000 May 5 14:17:14 mail kernel: [13680.752004] 880023167c10 02c8f27b4000 880023167fd8 88007211a9a0 May 5 14:17:14 mail kernel: [13680.752006] 880023167fd8 880023167fd8 88007211ac80 880023167fd8 May 5 14:17:14 mail kernel: [13680.752009] Call Trace: May 5 14:17:14 mail kernel: [13680.752024] [a0850d02] ? run_clustered_refs+0x132/0x830 [btrfs] May 5 14:17:14 mail kernel: [13680.752030] [813aff3d] schedule_timeout+0x2fd/0x380 May 5 14:17:14 mail kernel: [13680.752032] [813b0cf9] ? mutex_unlock+0x9/0x10 May 5 14:17:14 mail kernel: [13680.752040] [a087e9f4] ? btrfs_run_ordered_operations+0x1f4/0x210 [btrfs] May 5 14:17:14 mail kernel: [13680.752046] [a0860fa3] btrfs_commit_transaction+0x263/0x750 [btrfs] May 5 14:17:14 mail kernel: [13680.752051] [81079ff0] ? autoremove_wake_function+0x0/0x40 May 5 14:17:14 mail kernel: [13680.752057] [a085a9bd] transaction_kthread+0x26d/0x290 [btrfs] May 5 14:17:14 mail kernel: [13680.752062] [a085a750] ? transaction_kthread+0x0/0x290 [btrfs] May 5 14:17:14 mail kernel: [13680.752065] [81079717] kthread+0x87/0x90 May 5 14:17:14 mail kernel: [13680.752068] [8100bc24] kernel_thread_helper+0x4/0x10 May 5 14:17:14 mail kernel: [13680.752070] [81079690] ? kthread+0x0/0x90 May 5 14:17:14 mail kernel: [13680.752072] [8100bc20] ? 
kernel_thread_helper+0x0/0x10 May 5 14:17:14 mail kernel: [13680.752079] dd D 8800714c4838 0 5792 5740 0x0004 May 5 14:17:14 mail kernel: [13680.752082] 88006a205b38 0082 88006a205af8 0246 May 5 14:17:14 mail kernel: [13680.752085] ea00017f57e8 88006a205fd8 88006a205fd8 8800714c4560 May 5 14:17:14 mail kernel: [13680.752088] 88006a205fd8 88006a205fd8 8800714c4840 88006a205fd8 May 5 14:17:14 mail kernel: [13680.752090] Call Trace: May 5 14:17:14 mail kernel: [13680.752095] [810ff145] ? zone_statistics+0x75/0x90 May 5 14:17:14 mail kernel: [13680.752098] [810ea8b7] ? get_page_from_freelist+0x3c7/0x820 May 5 14:17:14 mail kernel: [13680.752101] [810e3588] ? find_get_page+0x68/0xb0 May 5 14:17:14 mail kernel: [13680.752108] [a08603f9
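For readers trying to decode these messages: "wanted" is the generation number the parent tree node recorded for its child, and "found" is the generation actually stored in the child block on disk. Which side is newer hints at the failure mode. A rough rule of thumb as a tiny helper (my interpretation, not an official diagnostic; the `transid_hint` name is mine):

```shell
# transid_hint WANTED FOUND — rough interpretation of a
# "parent transid verify failed" pair.
transid_hint() {
    if [ "$2" -lt "$1" ]; then
        # Child is older than the parent expects: its newer version
        # never reached the platter (classic lost-write / volatile
        # write-cache symptom after power loss).
        echo "child block stale (found $2 < wanted $1)"
    else
        # Child is newer than the parent expects: the parent
        # pointer itself is out of date.
        echo "parent pointer stale (found $2 > wanted $1)"
    fi
}
```

The log above (wanted 5181, found 5188) is the second case; the transid 10388/10385 errors earlier in this thread are the first.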
Re: Having parent transid verify failed
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:19:52 -0400: Hello, I have a 5.5TB Btrfs filesystem on top of a md-raid 5 device. Now if i run some file operations like find, i get these messages. kernel is 2.6.38.5-1 on arch linux Are all of the messages for this one block? parent transid verify failed on 3062073683968 wanted 5181 found 5188 -chris
Re: Having parent transid verify failed
On 5/5/2011 2:42 PM, Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:19:52 -0400: Hello, I have a 5.5TB Btrfs filesystem on top of a md-raid 5 device. Now if i run some file operations like find, i get these messages. kernel is 2.6.38.5-1 on arch linux Are all of the messages for this one block? parent transid verify failed on 3062073683968 wanted 5181 found 5188 yes, only this block -chris
Re: Having parent transid verify failed
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:45:08 -0400: On 5/5/2011 2:42 PM, Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:19:52 -0400: Hello, I have a 5.5TB Btrfs filesystem on top of a md-raid 5 device. Now if i run some file operations like find, i get these messages. kernel is 2.6.38.5-1 on arch linux Are all of the messages for this one block? parent transid verify failed on 3062073683968 wanted 5181 found 5188 yes, only this block Ok, what were the call traces in there? Was there an oops or a hung task? It looks like part of the messages are missing. -chris
Re: Having parent transid verify failed
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 10:27:30 -0400: attached you can find the whole dmesg log. I can trigger the error again if more logs are needed Yes, I'll send you a patch to get rid of the printk for the transid failed message. That way we can get a clean view of the other errors. Will you be able to compile/test it? -chris
Re: Having parent transid verify failed
On 5/5/2011 6:06 PM, Chris Mason wrote: Excerpts from Konstantinos Skarlatos's message of 2011-05-05 10:27:30 -0400: attached you can find the whole dmesg log. I can trigger the error again if more logs are needed Yes, I'll send you a patch to get rid of the printk for the transid failed message. That way we can get a clean view of the other errors. Will you be able to compile/test it? Yes, I think I will be able to manage it, but because I have only done this once, and in a quite hackish way, I may need some help in order to do it right. -chris
Re: Having parent transid verify failed
I think i made some progress. When i tried to remove the directory that i suspect contains the problematic file, i got this on the console rm -rf serverloft/ 2011 May 5 23:32:53 mail [ 200.580195] Oops: [#1] PREEMPT SMP 2011 May 5 23:32:53 mail [ 200.580220] last sysfs file: /sys/module/vt/parameters/default_utf8 2011 May 5 23:32:53 mail [ 200.581145] Stack: 2011 May 5 23:32:53 mail [ 200.581276] Call Trace: 2011 May 5 23:32:53 mail [ 200.581732] Code: cc 00 00 48 8d 91 28 e0 ff ff 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 65 e0 48 89 f3 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8b 76 30 83 42 1c 01 48 b8 00 00 00 00 00 16 00 00 48 01 f0 2011 May 5 23:32:53 mail [ 200.583376] CR2: 0030 here is the part of dmesg that does not contain the thousands of parent transid verify failed messages May 5 23:32:51 mail kernel: [ 198.371084] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:51 mail kernel: [ 198.371204] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:53 mail kernel: [ 200.572774] Modules linked in: ipv6 btrfs zlib_deflate crc32c libcrc32c ext2 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod usb_storage uas snd_seq_dummy snd_seq_oss radeon snd_seq_midi_event ttm snd_seq snd_hda_codec_hdmi snd_seq_device drm_kms_helper ohci_hcd snd_hda_intel snd_hda_codec snd_pcm_oss snd_hwdep drm i2c_algo_bit snd_mixer_oss snd_pcm i2c_piix4 snd_timer snd soundcore snd_page_alloc ehci_hcd wmi i2c_core usbcore evdev processor button k10temp serio_raw pcspkr sg r8169 edac_core shpchp pci_hotplug edac_mce_amd mii sp5100_tco ext4 mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod May 5 23:32:53 mail kernel: [ 200.572808] Pid: 1037, comm: btrfs-transacti Not tainted 2.6.38-ARCH #1 May 5 23:32:53 mail kernel: [ 200.572810] Call Trace: May 5 23:32:53 mail kernel: [ 200.572817] [813a932b] ? 
__schedule_bug+0x59/0x5d May 5 23:32:53 mail kernel: [ 200.572820] [813af827] ? schedule+0x9f7/0xad0 May 5 23:32:53 mail kernel: [ 200.572823] [811e5827] ? generic_unplug_device+0x37/0x40 May 5 23:32:53 mail kernel: [ 200.572827] [a07ac164] ? md_raid5_unplug_device+0x64/0x110 [raid456] May 5 23:32:53 mail kernel: [ 200.572830] [a07ac223] ? raid5_unplug_queue+0x13/0x20 [raid456] May 5 23:32:53 mail kernel: [ 200.572833] [81012d79] ? read_tsc+0x9/0x20 May 5 23:32:53 mail kernel: [ 200.572837] [8108418c] ? ktime_get_ts+0xac/0xe0 May 5 23:32:53 mail kernel: [ 200.572840] [810e36c0] ? sync_page+0x0/0x50 May 5 23:32:53 mail kernel: [ 200.572842] [813af96e] ? io_schedule+0x6e/0xb0 May 5 23:32:53 mail kernel: [ 200.572844] [810e36fb] ? sync_page+0x3b/0x50 May 5 23:32:53 mail kernel: [ 200.572846] [813b0077] ? __wait_on_bit+0x57/0x80 May 5 23:32:53 mail kernel: [ 200.572848] [810e38c0] ? wait_on_page_bit+0x70/0x80 May 5 23:32:53 mail kernel: [ 200.572851] [8107a030] ? wake_bit_function+0x0/0x40 May 5 23:32:53 mail kernel: [ 200.572861] [a08348d2] ? read_extent_buffer_pages+0x412/0x480 [btrfs] May 5 23:32:53 mail kernel: [ 200.572867] [a0809e00] ? btree_get_extent+0x0/0x1b0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572873] [a080ac7e] ? btree_read_extent_buffer_pages.isra.60+0x5e/0xb0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572880] [a080c0bc] ? read_tree_block+0x3c/0x60 [btrfs] May 5 23:32:53 mail kernel: [ 200.572884] [a07f272b] ? read_block_for_search.isra.34+0x1fb/0x410 [btrfs] May 5 23:32:53 mail kernel: [ 200.572890] [a08417d1] ? btrfs_tree_unlock+0x51/0x60 [btrfs] May 5 23:32:53 mail kernel: [ 200.572895] [a07f5ca0] ? btrfs_search_slot+0x430/0xa30 [btrfs] May 5 23:32:53 mail kernel: [ 200.572900] [a07fb3a6] ? lookup_inline_extent_backref+0x96/0x460 [btrfs] May 5 23:32:53 mail kernel: [ 200.572904] [8112b8d3] ? kmem_cache_alloc+0x133/0x150 May 5 23:32:53 mail kernel: [ 200.572908] [a07fd452] ? 
__btrfs_free_extent+0xc2/0x6d0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572914] [a0800f59] ? run_clustered_refs+0x389/0x830 [btrfs] May 5 23:32:53 mail kernel: [ 200.572920] [a084d900] ? btrfs_find_ref_cluster+0x10/0x190 [btrfs] May 5 23:32:53 mail kernel: [ 200.572925] [a08014c0] ? btrfs_run_delayed_refs+0xc0/0x210 [btrfs] May 5 23:32:53 mail kernel: [ 200.572927] [813b0cf9] ? mutex_unlock+0x9/0x10 May 5 23:32:53 mail kernel: [ 200.572933] [a0810db8] ? btrfs_commit_transaction+0x78/0x750 [btrfs] May 5 23:32:53 mail kernel: [ 200.572936] [81079ff0] ? autoremove_wake_function+0x0/0x40 May 5 23:32:53 mail kernel: [ 200.572941] [a080a9bd] ? transaction_kthread+0x26d/0x290 [btrfs] May 5 23:32:53 mail kernel
Re: Having parent transid verify failed
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400: I think i made some progress. When i tried to remove the directory that i suspect contains the problematic file, i got this on the console rm -rf serverloft/ Ok, our one bad block is in the extent allocation tree. This is going to be the very hardest thing to fix. Until I finish off the code to rebuild parts of the extent allocation tree, I think your best bet is to copy the files off. The big question is, what happened to make this error? Can you describe your setup in more detail? -chris 2011 May 5 23:32:53 mail [ 200.580195] Oops: [#1] PREEMPT SMP 2011 May 5 23:32:53 mail [ 200.580220] last sysfs file: /sys/module/vt/parameters/default_utf8 2011 May 5 23:32:53 mail [ 200.581145] Stack: 2011 May 5 23:32:53 mail [ 200.581276] Call Trace: 2011 May 5 23:32:53 mail [ 200.581732] Code: cc 00 00 48 8d 91 28 e0 ff ff 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 65 e0 48 89 f3 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8b 76 30 83 42 1c 01 48 b8 00 00 00 00 00 16 00 00 48 01 f0 2011 May 5 23:32:53 mail [ 200.583376] CR2: 0030 here is the part of dmesg that does not contain the thousands of parent transid verify failed messages May 5 23:32:51 mail kernel: [ 198.371084] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:51 mail kernel: [ 198.371204] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:53 mail kernel: [ 200.572774] Modules linked in: ipv6 btrfs zlib_deflate crc32c libcrc32c ext2 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod usb_storage uas snd_seq_dummy snd_seq_oss radeon snd_seq_midi_event ttm snd_seq snd_hda_codec_hdmi snd_seq_device drm_kms_helper ohci_hcd snd_hda_intel snd_hda_codec snd_pcm_oss snd_hwdep drm i2c_algo_bit snd_mixer_oss snd_pcm i2c_piix4 snd_timer snd soundcore snd_page_alloc ehci_hcd wmi i2c_core usbcore evdev processor button k10temp serio_raw pcspkr sg r8169 
edac_core shpchp pci_hotplug edac_mce_amd mii sp5100_tco ext4 mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod May 5 23:32:53 mail kernel: [ 200.572808] Pid: 1037, comm: btrfs-transacti Not tainted 2.6.38-ARCH #1 May 5 23:32:53 mail kernel: [ 200.572810] Call Trace: May 5 23:32:53 mail kernel: [ 200.572817] [813a932b] ? __schedule_bug+0x59/0x5d May 5 23:32:53 mail kernel: [ 200.572820] [813af827] ? schedule+0x9f7/0xad0 May 5 23:32:53 mail kernel: [ 200.572823] [811e5827] ? generic_unplug_device+0x37/0x40 May 5 23:32:53 mail kernel: [ 200.572827] [a07ac164] ? md_raid5_unplug_device+0x64/0x110 [raid456] May 5 23:32:53 mail kernel: [ 200.572830] [a07ac223] ? raid5_unplug_queue+0x13/0x20 [raid456] May 5 23:32:53 mail kernel: [ 200.572833] [81012d79] ? read_tsc+0x9/0x20 May 5 23:32:53 mail kernel: [ 200.572837] [8108418c] ? ktime_get_ts+0xac/0xe0 May 5 23:32:53 mail kernel: [ 200.572840] [810e36c0] ? sync_page+0x0/0x50 May 5 23:32:53 mail kernel: [ 200.572842] [813af96e] ? io_schedule+0x6e/0xb0 May 5 23:32:53 mail kernel: [ 200.572844] [810e36fb] ? sync_page+0x3b/0x50 May 5 23:32:53 mail kernel: [ 200.572846] [813b0077] ? __wait_on_bit+0x57/0x80 May 5 23:32:53 mail kernel: [ 200.572848] [810e38c0] ? wait_on_page_bit+0x70/0x80 May 5 23:32:53 mail kernel: [ 200.572851] [8107a030] ? wake_bit_function+0x0/0x40 May 5 23:32:53 mail kernel: [ 200.572861] [a08348d2] ? read_extent_buffer_pages+0x412/0x480 [btrfs] May 5 23:32:53 mail kernel: [ 200.572867] [a0809e00] ? btree_get_extent+0x0/0x1b0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572873] [a080ac7e] ? btree_read_extent_buffer_pages.isra.60+0x5e/0xb0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572880] [a080c0bc] ? read_tree_block+0x3c/0x60 [btrfs] May 5 23:32:53 mail kernel: [ 200.572884] [a07f272b] ? read_block_for_search.isra.34+0x1fb/0x410 [btrfs] May 5 23:32:53 mail kernel: [ 200.572890] [a08417d1] ? btrfs_tree_unlock+0x51/0x60 [btrfs] May 5 23:32:53 mail kernel: [ 200.572895] [a07f5ca0] ? 
btrfs_search_slot+0x430/0xa30 [btrfs] May 5 23:32:53 mail kernel: [ 200.572900] [a07fb3a6] ? lookup_inline_extent_backref+0x96/0x460 [btrfs] May 5 23:32:53 mail kernel: [ 200.572904] [8112b8d3] ? kmem_cache_alloc+0x133/0x150 May 5 23:32:53 mail kernel: [ 200.572908] [a07fd452] ? __btrfs_free_extent+0xc2/0x6d0 [btrfs] May 5 23:32:53 mail kernel: [ 200.572914] [a0800f59] ? run_clustered_refs+0x389/0x830 [btrfs] May 5 23:32:53 mail kernel: [ 200.572920] [a084d900] ? btrfs_find_ref_cluster+0x10/0x190 [btrfs] May 5 23:32:53 mail kernel
Re: Having parent transid verify failed
On 5/5/2011 11:32 μμ, Chris Mason wrote:
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400:
I think I made some progress. When I tried to remove the directory that I suspect contains the problematic file, I got this on the console: rm -rf serverloft/

Ok, our one bad block is in the extent allocation tree. This is going to be the very hardest thing to fix. Until I finish off the code to rebuild parts of the extent allocation tree, I think your best bet is to copy the files off. The big question is, what happened to make this error? Can you describe your setup in more detail?

I created this btrfs filesystem on an Arch Linux system (amd64, quad core) with kernel 2.6.38.1. It is on top of an md RAID 5.

[root@linuxserver ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[3] sdc1[1] sda1[0] sdf1[4]
5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] []

The raid was grown from 3 devices to 4, and then btrfs was grown to max size. Mount options were clear_cache,compress-force.

I was investigating a performance issue I had, because over the network I could only write to the filesystem at about 32 MB/s; while writing, btrfs-delalloc- CPU usage was at 100%. While investigating I disabled compression, enabled space_cache and tried zlib compression, and various combinations, while copying large files back and forth using Samba. BTW, I tried to change some mount options using mount -o remount, but although the new options were printed in dmesg I think they were not enabled.

I got the first error when I was copying some files and at the same time created a directory over Samba. After a while I upgraded to 2.6.38.5, but nothing seems to have changed.
I really dont think there is a hardware error here, but to be safe I am now running a check on the raid -chris 2011 May 5 23:32:53 mail [ 200.580195] Oops: [#1] PREEMPT SMP 2011 May 5 23:32:53 mail [ 200.580220] last sysfs file: /sys/module/vt/parameters/default_utf8 2011 May 5 23:32:53 mail [ 200.581145] Stack: 2011 May 5 23:32:53 mail [ 200.581276] Call Trace: 2011 May 5 23:32:53 mail [ 200.581732] Code: cc 00 00 48 8d 91 28 e0 ff ff 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 65 e0 48 89 f3 4c 89 6d e8 4c 89 75 f0 4c 89 7d f848 8b 76 30 83 42 1c 01 48 b8 00 00 00 00 00 16 00 00 48 01 f0 2011 May 5 23:32:53 mail [ 200.583376] CR2: 0030 here is the part of dmesg that does not contain the thousands of parent transid verify failed messages May 5 23:32:51 mail kernel: [ 198.371084] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:51 mail kernel: [ 198.371204] parent transid verify failed on 3062073683968 wanted 5181 found 5188 May 5 23:32:53 mail kernel: [ 200.572774] Modules linked in: ipv6 btrfs zlib_deflate crc32c libcrc32c ext2 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod usb_storage uas snd_seq_dummy snd_seq_oss radeon snd_seq_midi_event ttm snd_seq snd_hda_codec_hdmi snd_seq_device drm_kms_helper ohci_hcd snd_hda_intel snd_hda_codec snd_pcm_oss snd_hwdep drm i2c_algo_bit snd_mixer_oss snd_pcm i2c_piix4 snd_timer snd soundcore snd_page_alloc ehci_hcd wmi i2c_core usbcore evdev processor button k10temp serio_raw pcspkr sg r8169 edac_core shpchp pci_hotplug edac_mce_amd mii sp5100_tco ext4 mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod May 5 23:32:53 mail kernel: [ 200.572808] Pid: 1037, comm: btrfs-transacti Not tainted 2.6.38-ARCH #1 May 5 23:32:53 mail kernel: [ 200.572810] Call Trace: May 5 23:32:53 mail kernel: [ 200.572817] [813a932b] ? __schedule_bug+0x59/0x5d May 5 23:32:53 mail kernel: [ 200.572820] [813af827] ? 
schedule+0x9f7/0xad0 May 5 23:32:53 mail kernel: [ 200.572823] [811e5827] ? generic_unplug_device+0x37/0x40 May 5 23:32:53 mail kernel: [ 200.572827] [a07ac164] ? md_raid5_unplug_device+0x64/0x110 [raid456] May 5 23:32:53 mail kernel: [ 200.572830] [a07ac223] ? raid5_unplug_queue+0x13/0x20 [raid456] May 5 23:32:53 mail kernel: [ 200.572833] [81012d79] ? read_tsc+0x9/0x20 May 5 23:32:53 mail kernel: [ 200.572837] [8108418c] ? ktime_get_ts+0xac/0xe0 May 5 23:32:53 mail kernel: [ 200.572840] [810e36c0] ? sync_page+0x0/0x50 May 5 23:32:53 mail kernel: [ 200.572842] [813af96e] ? io_schedule+0x6e/0xb0 May 5 23:32:53 mail kernel: [ 200.572844] [810e36fb] ? sync_page+0x3b/0x50 May 5 23:32:53 mail kernel: [ 200.572846] [813b0077] ? __wait_on_bit+0x57/0x80 May 5 23:32:53 mail kernel: [ 200.572848] [810e38c0] ? wait_on_page_bit+0x70/0x80 May 5 23:32:53 mail kernel: [ 200.572851] [8107a030] ? wake_bit_function+0x0/0x40 May 5 23:32:53 mail kernel: [ 200.572861] [a08348d2] ? read_extent_buffer_pages+0x412/0x480 [btrfs] May 5 23:32:53
Re: Having parent transid verify failed
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 17:04:00 -0400:
On 5/5/2011 11:32 μμ, Chris Mason wrote:
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400:
I think I made some progress. When I tried to remove the directory that I suspect contains the problematic file, I got this on the console: rm -rf serverloft/

Ok, our one bad block is in the extent allocation tree. This is going to be the very hardest thing to fix. Until I finish off the code to rebuild parts of the extent allocation tree, I think your best bet is to copy the files off. The big question is, what happened to make this error? Can you describe your setup in more detail?

I created this btrfs filesystem on an Arch Linux system (amd64, quad core) with kernel 2.6.38.1. It is on top of an md RAID 5.

[root@linuxserver ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[3] sdc1[1] sda1[0] sdf1[4]
5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] []

The raid was grown from 3 devices to 4, and then btrfs was grown to max size. Mount options were clear_cache,compress-force.

I was investigating a performance issue I had, because over the network I could only write to the filesystem at about 32 MB/s; while writing, btrfs-delalloc- CPU usage was at 100%. While investigating I disabled compression, enabled space_cache and tried zlib compression, and various combinations, while copying large files back and forth using Samba. BTW, I tried to change some mount options using mount -o remount, but although the new options were printed in dmesg I think they were not enabled.

I got the first error when I was copying some files and at the same time created a directory over Samba. After a while I upgraded to 2.6.38.5, but nothing seems to have changed. I really don't think there is a hardware error here, but to be safe I am now running a check on the raid.

This error basically means we didn't write the block.
It could be because the write went to the wrong spot, or the hardware stack messed it up, or because of a btrfs bug. But 2.6.38 is relatively recent. It doesn't look like memory corruption because the transids are fairly close.

When you grew the raid device, did you grow a partition as well? We've had trouble in the past with block dev flushing code kicking in as devices are resized.

Samba isn't doing anything exotic, and 2.6.38 has my recent fixes for rare metadata corruption bugs in btrfs.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
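The check that produces this message can be sketched in a few lines. This is a conceptual illustration, not the btrfs source: a parent tree node records the transid (generation) it expects its child block to carry, and the error fires when the block actually read from disk carries a different one. The numbers below are the ones from this thread.

```shell
# Conceptual sketch of the "parent transid verify failed" check.
# The parent node's pointer stores the generation the child block was
# written with; a mismatch means the expected write never made it to disk
# (or an older copy of the block was read back).
wanted=5181   # generation recorded in the parent's pointer to the child
found=5188    # generation actually stored in the child block on disk
if [ "$wanted" -ne "$found" ]; then
    echo "parent transid verify failed: wanted $wanted found $found"
fi
```

Note that here the found generation is *newer* than the wanted one, which is why Chris remarks below that the transids being "fairly close" argues against random memory corruption.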
Re: Having parent transid verify failed
Chris Mason wrote:
We've had trouble in the past with block dev flushing code kicking in as devices are resized.

Might this be the problem with my root node? I wish my problem was in only one directory. :)

//Peter
Re: Having parent transid verify failed
On 6/5/2011 2:50 πμ, Chris Mason wrote:
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 17:04:00 -0400:
On 5/5/2011 11:32 μμ, Chris Mason wrote:
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400:
I think I made some progress. When I tried to remove the directory that I suspect contains the problematic file, I got this on the console: rm -rf serverloft/

Ok, our one bad block is in the extent allocation tree. This is going to be the very hardest thing to fix. Until I finish off the code to rebuild parts of the extent allocation tree, I think your best bet is to copy the files off. The big question is, what happened to make this error? Can you describe your setup in more detail?

I created this btrfs filesystem on an Arch Linux system (amd64, quad core) with kernel 2.6.38.1. It is on top of an md RAID 5.

[root@linuxserver ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[3] sdc1[1] sda1[0] sdf1[4]
5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] []

The raid was grown from 3 devices to 4, and then btrfs was grown to max size. Mount options were clear_cache,compress-force.

I was investigating a performance issue I had, because over the network I could only write to the filesystem at about 32 MB/s; while writing, btrfs-delalloc- CPU usage was at 100%. While investigating I disabled compression, enabled space_cache and tried zlib compression, and various combinations, while copying large files back and forth using Samba. BTW, I tried to change some mount options using mount -o remount, but although the new options were printed in dmesg I think they were not enabled.

I got the first error when I was copying some files and at the same time created a directory over Samba. After a while I upgraded to 2.6.38.5, but nothing seems to have changed. I really don't think there is a hardware error here, but to be safe I am now running a check on the raid.

This error basically means we didn't write the block. It could be because the write went to the wrong spot, or the hardware stack messed it up, or because of a btrfs bug. But 2.6.38 is relatively recent. It doesn't look like memory corruption because the transids are fairly close. When you grew the raid device, did you grow a partition as well? We've had trouble in the past with block dev flushing code kicking in as devices are resized.

No, I did not grow any partitions. I just added one disk to the RAID 5 md0 device and then grew the btrfs filesystem to max size (there are no partitions on md0). I can remember that as a test (to see if shrink works) I shrank the fs by 1 GB and then grew it again to max size.

Samba isn't doing anything exotic, and 2.6.38 has my recent fixes for rare metadata corruption bugs in btrfs.

-chris
Recovering parent transid verify failed
Hi, I'm having the same issues as previously mentioned. Apparently the new fsck tool will be able to recover this? A few questions: is there a GIT version I can compile and use already for this? If not, is there any indication of when this will be released?

---
Luke Sheldrick
e: l...@sheldrick.co.uk
p: 07880 725099
Re: Recovering parent transid verify failed
Excerpts from Luke Sheldrick's message of 2011-03-23 14:12:45 -0400:
Hi, I'm having the same issues as previously mentioned. Apparently the new fsck tool will be able to recover this? Few questions, is there a GIT version I can compile and use already for this? If not, is there any indication of when this will be released?

Yes, I'm still hammering out a reliable way to resolve most of these. But please post the messages you're hitting; it is actually a very generic problem and has many different causes. What happened to your FS that made them come up? Which kernel were you running, and what was the FS built on top of?

What happens when you grab the latest btrfsck from git and do:

btrfsck -s 1 /dev/xxx

-chris
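The mirror-superblock check Chris keeps suggesting in these threads can be scripted as below. This is a dry-run sketch: /dev/xxx is a placeholder device path, and the commands are only printed, not executed, so nothing on disk is touched until you run them yourself.

```shell
#!/bin/sh
# Dry-run sketch: try each backup superblock mirror in turn with btrfsck.
# DEV is a placeholder -- substitute the real (unmounted!) device.
DEV=/dev/xxx
for copy in 1 2; do
    # Print the command instead of running it; remove the echo to execute.
    echo "btrfsck -s $copy $DEV"
done
```

If either mirror reads cleanly ("err is 0"), the thread below shows how to promote that copy over the damaged default superblock.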
Re: Recovering parent transid verify failed
On Sun, Mar 06, 2011 at 12:28:41PM +0200, Yo'av Moshe wrote:
Hey, I'd start by saying that I know Btrfs is still experimental, and so there's no guarantee that anyone would be able to help me at all... but I thought I'd try anyway :-)

A few months ago I bought a new laptop and installed ArchLinux on it, with Btrfs on the root filesystem... I know, it's not the smartest thing to do... After a few months I had issues with my hibernation scripts, and one day I tried to hibernate my computer but it didn't go that well, and, well, ever since then my Btrfs partition is not accessible. I opened up the Btrfs FAQ and saw that the fsck tool should be out by the end of 2010, and thought oh well, I could wait until then, and went on and installed Ubuntu with Ext4 on another small partition. But time goes on and the fsck tool is still in development... I've tried using the code from GIT and it didn't work, and I'm starting to wonder (a) if there's any hope at all and (b) what other steps I can take to recover my old Btrfs partition.

Yes, there is hope. This error should be fixable with the new fsck.

When trying to mount the Btrfs partition I get this in dmesg:
[105252.779080] device fsid d14e78a602757297-bf762d859b406ca9 devid 1 transid 135714 /dev/sda4
[105252.818697] parent transid verify failed on 216925220864 wanted 135714 found 135713
[snip]

Should I wait for btrfsck to be ready?

Yes.

Am I not using it correctly now?

No, there's not a lot the current version can do right now.

Is there any way to recover this partition, or should I just wipe it and reinstall Btrfs only when I'm supposed to? Your help is appreciated.

HTH, Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I am the author. You are the audience. I outrank you! ---
Re: Fsck, parent transid verify failed
On Thu, Dec 9, 2010 at 6:14 PM, Chris Mason chris.ma...@oracle.com wrote:
Excerpts from Tommy Jonsson's message of 2010-12-08 15:07:58 -0500:

Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx
If either of these work we have an easy way to get it mounted. Just let me know.
-chris

$ btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

$ btrfsck -s 2 /dev/sda
using SB copy 2, bytenr 274877906944
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda:
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

dmesg said:
[268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 transid 39650 /dev/sdd
[268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 transid 39651 /dev/sdc
[268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda

Sorry to be a bother, but do you have any other suggestions?

Not a bother at all, I'm polishing off a version of fsck that I hope will be able to construct a good tree for you. It's my main priority right now and I hope to have something ready early Monday.
-chris

Hi again Chris. Hope you survived Christmas and New Year :] Just wanted to check in and see how you are progressing on the btrfsck? Drop me a mail if you want me to test/debug anything.
-tommy
Re: Fsck, parent transid verify failed
Excerpts from Tommy Jonsson's message of 2010-12-08 15:07:58 -0500:

Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx
If either of these work we have an easy way to get it mounted. Just let me know.
-chris

$ btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

$ btrfsck -s 2 /dev/sda
using SB copy 2, bytenr 274877906944
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda:
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

dmesg said:
[268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 transid 39650 /dev/sdd
[268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 transid 39651 /dev/sdc
[268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda

Sorry to be a bother, but do you have any other suggestions?

Not a bother at all, I'm polishing off a version of fsck that I hope will be able to construct a good tree for you. It's my main priority right now and I hope to have something ready early Monday.
-chris

Hi Chris. Thanks for all your help. Any progress on the fsck? I pulled the latest btrfs-progs-unstable and recompiled; same output from all the commands (btrfsck -s / btrfs-debug-tree).
-tommy
Re: Fsck, parent transid verify failed
On Fr, 10.12.10 15:11 Chris Mason chris.ma...@oracle.com wrote:

What would be the steps to get it mounted?

If btrfsck -s is able to find a good super, I've set up a tool that will copy the good super over into the default super. It is currently sitting in the next branch of the btrfs-progs-unstable repo.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git next
(or git pull into your existing checkout)

Then:
make btrfs-select-super
./btrfs-select-super -s 1 /dev/xxx

After this you'll want to do a full backup and make sure things are working properly.
-chris

This worked fine. I was able to mount and completely read it. The volume seems healthy and is fully usable so far. Thanks a lot!
~thomas
Re: Fsck, parent transid verify failed
Excerpts from Tom Kuther's message of 2010-12-09 11:21:03 -0500:
Chris Mason chris.mason at oracle.com writes: [...]
Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx
If either of these work we have an easy way to get it mounted. Just let me know.

Hello, I get those parent transid verify failed errors too after a system failure.

# btrfsck -s 1 /dev/md0
using SB copy 1, bytenr 67108864
found 1954912653312 bytes used err is 0
total csum bytes: 1892054684
total tree bytes: 3455627264
total fs tree bytes: 1082691584
btree space waste bytes: 584155173
file data blocks allocated: 12808940421120 referenced 1933520879616
Btrfs v0.19-35-g1b444cd-dirty

# btrfsck -s 2 /dev/md0
using SB copy 2, bytenr 274877906944
found 1954912653312 bytes used err is 0
-snip-

Both seem to work. What would be the steps to get it mounted?

If btrfsck -s is able to find a good super, I've set up a tool that will copy the good super over into the default super. It is currently sitting in the next branch of the btrfs-progs-unstable repo.

git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git next
(or git pull into your existing checkout)

Then:
make btrfs-select-super
./btrfs-select-super -s 1 /dev/xxx

After this you'll want to do a full backup and make sure things are working properly.
-chris
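The recovery sequence Chris describes can be collected into one script. The sketch below is a dry run: the clone URL and commands are the ones from the thread, /dev/xxx is a placeholder device, and everything is echoed rather than executed, since btrfs-select-super rewrites the default superblock and should only be run against a device whose backup super already checked out clean.

```shell
#!/bin/sh
# Dry-run sketch of the select-super recovery path described above.
# Remove the echo wrappers (and substitute a real device) to execute.
DEV=/dev/xxx
echo "git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git next"
echo "cd next && make btrfs-select-super"
echo "./btrfs-select-super -s 1 $DEV"
```

As Chris notes, after promoting the backup super the sensible next step is a full backup and verification, since the filesystem just survived a superblock inconsistency.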
Re: Fsck, parent transid verify failed
Chris Mason chris.mason at oracle.com writes: [...]
Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx
If either of these work we have an easy way to get it mounted. Just let me know.

Hello, I get those parent transid verify failed errors too after a system failure.

# btrfsck -s 1 /dev/md0
using SB copy 1, bytenr 67108864
found 1954912653312 bytes used err is 0
total csum bytes: 1892054684
total tree bytes: 3455627264
total fs tree bytes: 1082691584
btree space waste bytes: 584155173
file data blocks allocated: 12808940421120 referenced 1933520879616
Btrfs v0.19-35-g1b444cd-dirty

# btrfsck -s 2 /dev/md0
using SB copy 2, bytenr 274877906944
found 1954912653312 bytes used err is 0
-snip-

Both seem to work. What would be the steps to get it mounted? Thanks in advance.
~thomas
Re: Fsck, parent transid verify failed
Excerpts from Tommy Jonsson's message of 2010-12-08 15:07:58 -0500:

Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx
If either of these work we have an easy way to get it mounted. Just let me know.
-chris

$ btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

$ btrfsck -s 2 /dev/sda
using SB copy 2, bytenr 274877906944
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda:
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

dmesg said:
[268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 transid 39650 /dev/sdd
[268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 transid 39651 /dev/sdc
[268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda

Sorry to be a bother, but do you have any other suggestions?

Not a bother at all, I'm polishing off a version of fsck that I hope will be able to construct a good tree for you. It's my main priority right now and I hope to have something ready early Monday.
-chris
Re: Fsck, parent transid verify failed
Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx
If either of these work we have an easy way to get it mounted. Just let me know.
-chris

$ btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

$ btrfsck -s 2 /dev/sda
using SB copy 2, bytenr 274877906944
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda:
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

dmesg said:
[268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 transid 39650 /dev/sdd
[268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 transid 39651 /dev/sdc
[268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda

Sorry to be a bother, but do you have any other suggestions? Thanks!
-tommy
Re: Fsck, parent transid verify failed
Excerpts from Tommy Jonsson's message of 2010-12-01 06:00:56 -0500:
Hi folks! Been using btrfs for quite a while now, worked great until now. Got power-loss on my machine and now I have the "parent transid verify failed on X wanted X found X" problem. So I can't get it to mount. My btrfs is spread over sda (2tb), sdc (2tb), sdd (1tb). Is this something that an offline fsck could fix? If so, is the fsck-util being developed? Is there a way to mount the FS in a read-only mode or something to rescue the data?

Which kernel are you on? Unless you formatted with -m raid0, the current git tree should be able to read this FS by using the second copy of the metadata.
-chris
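The read-only rescue mount Chris alludes to can be sketched as below. This is a dry run with placeholder device and mount point: on the kernels of this era the duplicate metadata copy is tried automatically when the first copy fails verification, while later kernels added an explicit fallback mount option (`recovery`, renamed `usebackuproot` in 4.6) that is not available to the posters in this thread.

```shell
#!/bin/sh
# Dry-run sketch of a read-only rescue mount. DEV and MNT are
# placeholders; commands are echoed, not executed.
DEV=/dev/sda
MNT=/mnt/rescue
echo "mount -o ro $DEV $MNT"
echo "mount -o ro,usebackuproot $DEV $MNT   # only on much newer kernels"
```

Mounting read-only is the safe first move in every thread here: it lets you copy data off without giving btrfs a chance to write and widen the inconsistency.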
Re: Fsck, parent transid verify failed
Excerpts from Tommy Jonsson's message of 2010-12-02 16:45:39 -0500:
I can't remember if I used -m raid0. I think I just used mkfs.btrfs /dev/sda, then btrfs device add /dev/sdb, and the same for sdc. I am sure that I didn't explicitly use -m raid1 or raid10. Is there a way that I can check this?

The defaults will maintain raid1 as you add more drives. We can check it with btrfs-debug-tree from the git repository. But, more below.

If I do have raid0 for both metadata and data, is there anything I can do? I've been looking at the source but haven't got my head around it yet. What would happen if I just ignore/bypass the transid error? The error:

[265889.197279] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda
[265889.198266] btrfs: use compression
[265889.647817] parent transid verify failed on 2721514774528 wanted 39651 found 39649
[265889.672632] btrfs: open_ctree failed

Or could I update the metadata to want 39649?

The first thing I would try is:
git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx

If either of these work we have an easy way to get it mounted. Just let me know.
-chris
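Checking which raid profile the metadata actually uses, as discussed above, can be sketched as a dry run. Both commands and paths are placeholders: btrfs-debug-tree works on the unmounted device (inspect the chunk-item type flags in its output), while on a filesystem that still mounts, `btrfs filesystem df` prints the data and metadata profiles directly.

```shell
#!/bin/sh
# Dry-run sketch: two ways to confirm whether metadata is duplicated
# (raid1/dup) rather than raid0. Commands are echoed only.
DEV=/dev/sda
echo "btrfs-debug-tree $DEV | less   # look at CHUNK_ITEM type flags"
echo "btrfs filesystem df /mnt       # if it still mounts: prints profiles"
```

Knowing the metadata profile matters because, as Chris says, a duplicated metadata copy is what makes recovery from a single bad block possible at all.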
Re: Fsck, parent transid verify failed
$ btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

$ btrfsck -s 2 /dev/sda
using SB copy 2, bytenr 274877906944
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

-tommy

On Thu, Dec 2, 2010 at 10:50 PM, Chris Mason chris.ma...@oracle.com wrote:
Excerpts from Tommy Jonsson's message of 2010-12-02 16:45:39 -0500:
I can't remember if I used -m raid0. I think I just used mkfs.btrfs /dev/sda, then btrfs device add /dev/sdb, and the same for sdc. I am sure that I didn't explicitly use -m raid1 or raid10. Is there a way that I can check this?

The defaults will maintain raid1 as you add more drives. We can check it with btrfs-debug-tree from the git repository. But, more below.

If I do have raid0 for both metadata and data, is there anything I can do? I've been looking at the source but haven't got my head around it yet. What would happen if I just ignore/bypass the transid error? The error:

[265889.197279] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda
[265889.198266] btrfs: use compression
[265889.647817] parent transid verify failed on 2721514774528 wanted 39651 found 39649
[265889.672632] btrfs: open_ctree failed

Or could I update the metadata to want 39649?

The first thing I would try is:
git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx

If either of these work we have an easy way to get it mounted. Just let me know.
-chris
Re: Fsck, parent transid verify failed
Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda:
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

dmesg said:
[268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 transid 39650 /dev/sdd
[268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 transid 39651 /dev/sdc
[268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda

-tommy

On Thu, Dec 2, 2010 at 10:59 PM, Tommy Jonsson quaz...@gmail.com wrote:
$ btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

$ btrfsck -s 2 /dev/sda
using SB copy 2, bytenr 274877906944
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

-tommy

On Thu, Dec 2, 2010 at 10:50 PM, Chris Mason chris.ma...@oracle.com wrote:
Excerpts from Tommy Jonsson's message of 2010-12-02 16:45:39 -0500:
I can't remember if I used -m raid0. I think I just used mkfs.btrfs /dev/sda, then btrfs device add /dev/sdb, and the same for sdc. I am sure that I didn't explicitly use -m raid1 or raid10. Is there a way that I can check this?

The defaults will maintain raid1 as you add more drives. We can check it with btrfs-debug-tree from the git repository. But, more below.

If I do have raid0 for both metadata and data, is there anything I can do? I've been looking at the source but haven't got my head around it yet. What would happen if I just ignore/bypass the transid error? The error:

[265889.197279] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda
[265889.198266] btrfs: use compression
[265889.647817] parent transid verify failed on 2721514774528 wanted 39651 found 39649
[265889.672632] btrfs: open_ctree failed

Or could I update the metadata to want 39649?

The first thing I would try is:
git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

Build the latest tools, then:
btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx

If either of these work we have an easy way to get it mounted. Just let me know.
-chris
Fsck, parent transid verify failed
Hi folks! I've been using btrfs for quite a while now; it worked great until now. My machine lost power, and now I have the "parent transid verify failed on X wanted X found X" problem, so I can't get the filesystem to mount.

My btrfs is spread over sda (2TB), sdc (2TB) and sdd (1TB).

Is this something that an offline fsck could fix? If so, is the fsck utility being developed? Is there a way to mount the FS in a read-only mode or something, to rescue the data?

Thanks, Tommy.
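[Editorial aside] No read-only rescue mount existed when this was posted, but later kernels grew mount options for exactly this situation: -o recovery (around kernel 3.2), later renamed -o usebackuproot (4.6+), makes the kernel fall back to an older tree root when the newest one fails the transid check. A guarded sketch; the device path is hypothetical:

```shell
#!/bin/sh
# Hypothetical device path; substitute your own.
DEV=/dev/mapper/example-btrfs

if [ -b "$DEV" ]; then
    # Read-only mount that may fall back to an older tree root.
    # On kernels older than 4.6, use "-o ro,recovery" instead.
    mount -o ro,usebackuproot "$DEV" /mnt
    status=attempted
else
    status=skipped
    echo "no such device, dry run only ($status)"
fi
```

Mounting read-only keeps the rescue attempt from writing anything, so a later repair attempt starts from an unchanged disk.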
Re: parent transid verify failed, continued
On Thu, Sep 23, 2010 at 08:24, Francis Galiegue fgalie...@gmail.com wrote:

Hello list,

I've been using btrfs for nearly 6 months now, on three machines, with no problems except for _one_ filesystem on one machine. The problem is the message in $subject. For this particular filesystem, which contains qemu-kvm disk images in raw mode with caching mode set to writeback, the symptoms are:

* with 2.6.34 and lower, I could mount the filesystem, with the "parent transid verify failed" message appearing once;
* with 2.6.35 and upper, however, not anymore: I mount it and the same "parent transid verify failed" message now floods dmesg, and I cannot kill -9 any program trying to access that filesystem.

[...] I just fear that I get [for my rootfs] into the situation of the hosed filesystem which I cannot mount anymore...

And I just did. Dang. Fortunately I have the sysresccd with which I _can_ mount the filesystem! Phew.

--
Francis Galiegue, fgalie...@gmail.com
"It seems obvious [...] that at least some 'business intelligence' tools invest so much intelligence on the business side that they have nothing left for generating SQL queries" (Stéphane Faroult, in The Art of SQL, ISBN 0-596-00894-5)
parent transid verify failed, continued
Hello list,

I've been using btrfs for nearly 6 months now, on three machines, with no problems except for _one_ filesystem on one machine. The problem is the message in $subject. For this particular filesystem, which contains qemu-kvm disk images in raw mode with caching mode set to writeback, the symptoms are:

* with 2.6.34 and lower, I could mount the filesystem, with the "parent transid verify failed" message appearing once;
* with 2.6.35 and upper, however, not anymore: I mount it and the same "parent transid verify failed" message now floods dmesg, and I cannot kill -9 any program trying to access that filesystem.

I sent a mail to the list at the time: I bisected that to 5bdd3536cbbe2ecd94ecc14410c6b1b31da16381. The problem is still there.

And this morning, while doing a btrfs filesystem defragment on the / of one of my machines (the one I'm writing this mail from, in fact), I saw this message four times again (kernel 2.6.36-rc5):

Sep 23 07:42:11 erwin kernel: [  148.689191] parent transid verify failed on 14077947904 wanted 316581 found 316247
Sep 23 07:42:11 erwin kernel: [  148.689529] parent transid verify failed on 14077947904 wanted 316581 found 316247
Sep 23 07:42:13 erwin kernel: [  151.059728] parent transid verify failed on 14084829184 wanted 316581 found 316247
Sep 23 07:42:13 erwin kernel: [  151.060036] parent transid verify failed on 14084829184 wanted 316581 found 316247

Does that mean that there is corruption on the filesystem somewhere? I just fear that I get into the situation of the hosed filesystem which I cannot mount anymore...

--
Francis Galiegue, fgalie...@gmail.com
Any btrfsck to try out? [was: parent transid verify failed, continued]
On Thu, Sep 23, 2010 at 08:24, Francis Galiegue fgalie...@gmail.com wrote:

[...] For this particular filesystem, which contains qemu-kvm disk images in raw mode with caching mode set to writeback, the symptoms are:

* with 2.6.34 and lower, I could mount the filesystem, with the "parent transid verify failed" message appearing once;
* with 2.6.35 and upper, however, not anymore: I mount it and the same "parent transid verify failed" message now floods dmesg, and I cannot kill -9 any program trying to access that filesystem.

[...]

Another thing: as I can afford to recreate the hosed filesystem if need be, I'm also ready to try any offline (of course) repairing btrfsck on this filesystem and see if I can mount it again safely. Any btrfs-progs tree that I might try out? I have the possibility to boot from a USB key with a sufficiently recent kernel, test that, and attempt to mount the fs again...

--
Francis Galiegue, fgalie...@gmail.com
parent transid verify failed
After an unclean shutdown, my btrfs is now unmountable:

device label root devid 1 transid 375202 /dev/sdc4
parent transid verify failed on 53984886784 wanted 375202 found 375201
parent transid verify failed on 53984886784 wanted 375202 found 375201
parent transid verify failed on 53984886784 wanted 375202 found 375201
btrfs: open_ctree failed

btrfsck aborts: couldn't open because of unsupported option features (2).

btrfsck: disk-io.c:682: open_ctree_fd: Assertion `!(1)' failed.
[1]    14899 abort  btrfsck /dev/sdc4

Is there any way to recover the filesystem?
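[Editorial aside] Current btrfs-progs ship a "btrfs rescue" command group aimed at exactly this kind of post-crash state; none of it existed when this was posted. A guarded sketch below; the device path comes from the report above, and since both subcommands write to the device, take an image first (dd or btrfs-image):

```shell
#!/bin/sh
DEV=/dev/sdc4   # device from the report above; adjust to your setup

# Only act when the device, the tools, and root privileges are all present.
if [ -b "$DEV" ] && command -v btrfs >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    # Overwrite a damaged primary superblock from a good mirror copy.
    btrfs rescue super-recover -v "$DEV"
    # Last resort after an unclean shutdown: discard the log tree so mount
    # no longer trips over replay (loses the last few seconds of writes).
    btrfs rescue zero-log "$DEV"
    status=attempted
else
    status=skipped
    echo "preconditions not met, nothing done ($status)"
fi
```

zero-log in particular only helps when log-tree replay itself is what fails; it is not a general fix for transid mismatches.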
How can I recover the btrfs after parent transid verify failed?
My raid5 crashed, so I changed the old hdd and started the rebuild. But now I can't mount the btrfs: the mount command hangs and the kernel prints the following lines in a loop:

Jul 29 18:37:04 fileserver kernel: [ 1229.692268] verify_parent_transid: 2492 callbacks suppressed
Jul 29 18:37:04 fileserver kernel: [ 1229.692274] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.692287] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.696549] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.696564] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.700419] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.700433] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.704392] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.704404] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.708384] parent transid verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.708396] parent transid verify failed on 6975016271872 wanted 204247 found 204249

Is there any way to recover the fs?

rgds
Fabian Kramer
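[Editorial aside] When mount itself hangs, pulling data off the unmounted device can be the safer route: btrfs restore (part of later btrfs-progs, postdating several threads above) walks the filesystem from user space and copies files out without writing to the source. A sketch with hypothetical paths:

```shell
#!/bin/sh
DEV=/dev/mapper/example-member   # any member device of the fs (hypothetical)
DEST=/mnt/rescue                 # needs enough free space (hypothetical)

if [ -b "$DEV" ] && command -v btrfs >/dev/null 2>&1; then
    # -v lists each file as it is copied; the source device is only read.
    btrfs restore -v "$DEV" "$DEST"
    status=attempted
else
    status=skipped
    echo "restore skipped: device or btrfs-progs unavailable"
fi
```

Because restore never writes to the source, it can be retried freely before attempting any in-place repair.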
dmesg filled with parent transid verify failed messages using Linus' HEAD as of 20100704
[Note: I am not on the list, can you please Cc: me on replies? Thank you in advance]

Hello,

Kernels: 2.6.34 to HEAD. Using ~amd64 Gentoo. All my filesystems except /boot are btrfs.

btrfsck says I have a corrupted btrfs filesystem on my machine, and I see this message in dmesg when mounting the filesystem (from fs/btrfs/disk-io.c:284):

parent transid verify failed on 48136192 wanted 16424 found 16420

But there is something strange:

* this message only appears twice when running 2.6.34;
* it fills my dmesg (several tens of thousands of times a second; printk_ratelimit() triggers to suppress the vast majority of them) when running HEAD.

After a quick "git log v2.6.34.. -- fs/btrfs", I found commit 5bdd3536cbbe2ecd94ecc14410c6b1b31da16381, which I reverted: HEAD now behaves like 2.6.34, dmesg-wise. Unintended side effect?

--
Francis Galiegue
ONE2TEAM
Ingénieur système
Mob : +33 (0) 683 877 875
Tel : +33 (0) 178 945 552
f...@one2team.com
40 avenue Raymond Poincaré 75116 Paris