parent transid verify failed

2017-05-11 Thread Massimo B.
Hello,

this is btrfs on LUKS, with a USB HDD as the block device.
I can't mount the btrfs anymore; I continuously get the same syslog errors:

- Last output repeated twice -
May 11 07:58:25 [kernel] BTRFS error (device dm-3): failed to read block groups: -5
May 11 07:58:25 [kernel] BTRFS error (device dm-3): open_ctree failed
May 11 07:58:31 [kernel] BTRFS info (device dm-3): use zlib compression
May 11 07:58:31 [kernel] BTRFS info (device dm-3): enabling auto defrag
May 11 07:58:31 [kernel] BTRFS info (device dm-3): disk space caching is enabled
May 11 07:58:31 [kernel] BTRFS info (device dm-3): has skinny extents
May 11 07:58:33 [kernel] BTRFS error (device dm-3): parent transid verify failed on 541635395584 wanted 10388 found 10385

This is the last part of the btrfs check --repair output (I know, highly
experimental, but I didn't get an alternative solution on #btrfs):

parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
parent transid verify failed on 541577035776 wanted 10388 found 10384
Chunk[256, 228, 429526089728]: length(1073741824), offset(429526089728), type(1)
is not found in block group
Chunk[256, 228, 430599831552]: length(1073741824), offset(430599831552), type(1)
is not found in block group
Chunk[256, 228, 431673573376]: length(1073741824), offset(431673573376), type(1)
is not found in block group
Chunk[256, 228, 434894798848]: length(1073741824), offset(434894798848), type(1)
is not found in block group
Chunk[256, 228, 435968540672]: length(1073741824), offset(435968540672), type(1)
is not found in block group
Chunk[256, 228, 437042282496]: length(1073741824), offset(437042282496), type(1)
is not found in block group
Chunk[256, 228, 438116024320]: length(1073741824), offset(438116024320), type(1)
is not found in block group
ref mismatch on [429497528320 40960] extent item 0, found 1
Backref 429497528320 parent 858210304 owner 0 offset 0 num_refs 0 not found in
extent tree
Incorrect local backref count on 429497528320 parent 858210304 owner 0 offset 0
found 1 wanted 0 back 0x37aaefc0
backpointer mismatch on [429497528320 40960]
parent transid verify failed on 541635395584 wanted 10388 found 10385
Ignoring transid failure
Failed to find [541635395584, 168, 16384]
btrfs unable to find ref byte nr 541635395584 parent 0 root 2  owner 1 offset 0
failed to repair damaged filesystem, aborting

How did that happen?
Yesterday I sent a big snapshot from the local drive to a slower USB drive via
btrbk. That had already finished. However, the USB drive was filled up to 99%
and was apparently still doing some IO. Then I was not able to shut down the
machine. Shutdown was really slow; the umounts were eventually accomplished,
services stopped, and the shutdown almost finished, but the machine never
powered off. I did SysRq E, I, U, S, R, B with no reboot; SysRq-O did not shut
it off either. So as a last resort I disconnected the power supply.

The broken btrfs is actually only a snapshot receiver used as a backup, and I
would prefer to get it repaired. Seeing that btrfs is this sensitive to being
filled up to 99% usage, I'm worried about my production btrfs.
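
For reference, the usual low-risk first attempts on a LUKS-backed btrfs look
roughly like this; the device, mapper name and directories below are
placeholders, not taken from this report:

  cryptsetup luksOpen /dev/sdX1 usbbackup                          # open the LUKS container
  mount -o ro,usebackuproot /dev/mapper/usbbackup /mnt/usbbackup   # try an older tree root, read-only
  # if that still fails, pull out whatever is readable without mounting at all:
  btrfs restore -v /dev/mapper/usbbackup /some/scratch/dir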

This is Gentoo-Linux, 4.10.14-ck, btrfs-progs-4.10.2.

Best regards,
Massimo


Re: "parent transid verify failed" out of blue sky

2017-02-27 Thread Qu Wenruo



At 02/28/2017 02:51 AM, Andrei Borzenkov wrote:

This is a VM under QEMU/KVM running openSUSE Tumbleweed. I boot it
infrequently, for short periods, to test something. Last time it installed
quite a lot of updates including a kernel (I think 4.9.11 was the latest
version); I do not remember whether I rebooted it after that. Today I
booted it to check something, did "reboot" after 10 minutes, and was
greeted with the grub rescue prompt (grub is located on the btrfs itself and
apparently failed to read its modules as well). Any attempt to mount it
fails with "parent transid verify failed". btrfsck --mode=lowmem from the
current Tumbleweed snapshot has been running for half an hour now with the
same never-ending message.


Would you please provide the size of the fs?

lowmem mode is indeed slow; since it doesn't use much memory, it does
tons of tree searches instead, which will produce tons of the same "parent
transid verify failed" messages if the corrupted node/leaf lies in a hot
tree, like the root tree or the extent tree.


Despite that, would you please try running btrfsck in its original (default)
mode on the fs?


It may take quite some memory, but it's more mature than lowmem mode.
In fact, nearly 10 bug fixes for lowmem mode have yet to be merged.
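
For clarity, the two check modes differ only in the --mode switch; /dev/vdX
below stands in for the VM's actual device:

  btrfs check /dev/vdX                   # original/default mode: loads metadata into RAM
  btrfs check --mode=lowmem /dev/vdX     # lowmem mode: trades memory for repeated tree searches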



I do not really care about the disk content, but I would be interested in
trying to recover it under guidance. Also, if it would be useful, I can
provide an image or other information.


An image would be best.
However, I'm more interested in how such a problem happens.

In theory, btrfs' mandatory metadata CoW and default data CoW should
keep it bulletproof against any power loss.

(Though the real world is far from theory.)
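
For what it's worth, a metadata-only image suitable for sharing can be
produced with btrfs-image; the output path is just an example, and -s
sanitizes file names if privacy is a concern:

  btrfs-image -c9 -t4 /dev/vdX /tmp/metadata.img   # compressed, multi-threaded metadata-only dump
  # add -s to sanitize (obfuscate) file names before sending the image anywhere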

Thanks,
Qu



TIA

-andrei


"parent transid verify failed" out of blue sky

2017-02-27 Thread Andrei Borzenkov
This is a VM under QEMU/KVM running openSUSE Tumbleweed. I boot it
infrequently, for short periods, to test something. Last time it installed
quite a lot of updates including a kernel (I think 4.9.11 was the latest
version); I do not remember whether I rebooted it after that. Today I
booted it to check something, did "reboot" after 10 minutes, and was
greeted with the grub rescue prompt (grub is located on the btrfs itself and
apparently failed to read its modules as well). Any attempt to mount it
fails with "parent transid verify failed". btrfsck --mode=lowmem from the
current Tumbleweed snapshot has been running for half an hour now with the
same never-ending message.

I do not really care about the disk content, but I would be interested in
trying to recover it under guidance. Also, if it would be useful, I can
provide an image or other information.

TIA

-andrei


"parent transid verify failed"

2016-06-11 Thread Tobias Holst
Hi

I am getting some "parent transid verify failed"-errors. Is there any
way to find out what's affected? Are these errors in metadata, data or
both - and if they are errors in the data: How can I find out which
files are affected?
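
For what it's worth, a rough way to narrow this down with stock btrfs-progs
tools; <bytenr> is a placeholder for the number in the "parent transid verify
failed on <bytenr> ..." message, and /mnt for the mounted filesystem:

  btrfs-debug-tree -b <bytenr> /dev/sdX                    # dump that block; the owner field says which tree it belongs to
  btrfs inspect-internal logical-resolve -v <bytenr> /mnt  # for data addresses, list the files that reference it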

Regards,
Tobias


"Fixed", Re: parent transid verify failed on snapshot deletion

2016-03-20 Thread Roman Mamedov
On Sat, 12 Mar 2016 20:48:47 +0500
Roman Mamedov <r...@romanrm.net> wrote:

> The system was seemingly running just fine for days or weeks, then I
> routinely deleted a bunch of old snapshots, and suddenly got hit with:
> 
> [Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify 
> failed on 7483566862336 wanted 410578 found 404133
> [Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify 
> failed on 7483566862336 wanted 410578 found 404133

As I mentioned, the initial run of btrfsck --repair did not do anything to fix
this problem; I then started btrfsck --repair --init-extent-tree, but it still
had not finished after 5 days, so I looked for other options.

While reviewing the btrfs-progs source for ways to make btrfsck do
something about these transid failures, I spotted a tool called
btrfs-corrupt-block. At this point I was ready to accept some loss of data,
which I'd expect to be minor if user-visible at all (after all, the
original backtrace happens in "btrfs_clean_one_deleted_snapshot", so
perhaps all that the "bad" block was storing was only related to a snapshot
that had already been deleted).

I ran:

  /root/btrfs-corrupt-block -l 7483566862336 /dev/nbd8

Btrfsck then finally reported something inspiring some hope:

checking extents
checksum verify failed on 7483566862336 found 295F0086 wanted 
checksum verify failed on 7483566862336 found 295F0086 wanted 
checksum verify failed on 7483566862336 found 295F0086 wanted 
checksum verify failed on 7483566862336 found 295F0086 wanted 
bytenr mismatch, want=7483566862336, have=0
deleting pointer to block 7483566862336
ref mismatch on [6504947712 118784] extent item 0, found 1
adding new data backref on 6504947712 parent 4311306919936 owner 0 offset 0 
found 1
Backref 6504947712 parent 4311306919936 owner 0 offset 0 num_refs 0 not found 
in extent tree
Incorrect local backref count on 6504947712 parent 4311306919936 owner 0 offset 
0 found 1 wanted 0 back 0x57cfdff0
backpointer mismatch on [6504947712 118784]
...etc

After a few passes it settled into a state with no new errors reported (only
a few "bad metadata crossing stripe boundary" messages, but those seem to be
commonly reported even for filesystems otherwise exhibiting no issues).

Finally I was able to mount the FS with no backtrace occurring anymore -- the
btrfs-cleaner process then finished all the remaining snapshot-deletion work,
freeing up 20GB or so. All data seems to be present, and selective checksum
verifications showed no corruption. In any case, this machine is primarily a
backup server using rsync, so it should catch and fix up any losses.

As a side note, for experiments with 'btrfsck --repair', 'btrfs-corrupt-block'
and my own patched versions of btrfsck, the technique of making writable CoW
snapshots of the whole block device has proved invaluable:

At first I used the nbd-server '-c' mode, but quickly discovered it to be
flaky: it seems to crash once the amount of changes gets over 150 MB or so, and
anyway its RAM usage seems to be about "block device size / 1000", i.e. it
used 6GB of RAM for a 6TB filesystem. So in the end I switched to using the
dm-snapshot target as described in [1] (a minimal sketch follows the link
below). One just has to remember never to have the snapshot and the original
device both visible and mountable on the same machine (that will confuse btrfs
with duplicate UUIDs); for that, I used the same nbd-server (not using its
built-in CoW anymore), exporting writable snapshots over the network and
mounting them on a different server or VM.

[1]http://stackoverflow.com/questions/7582019/lvm-like-snapshot-on-a-normal-block-device
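
A minimal sketch of that dm-snapshot setup, assuming a 20G sparse COW file and
an arbitrary snapshot name (only /dev/alpha/lv1 is taken from this thread):

  SIZE=$(blockdev --getsz /dev/alpha/lv1)         # origin size in 512-byte sectors
  truncate -s 20G /tmp/lv1-cow.img                # sparse file that receives all writes
  COWDEV=$(losetup --show -f /tmp/lv1-cow.img)    # expose the COW file as a block device
  echo "0 $SIZE snapshot /dev/alpha/lv1 $COWDEV N 8" | dmsetup create lv1-scratch
  # experiment on /dev/mapper/lv1-scratch; the origin is never written. Never let
  # btrfs see both the origin and the snapshot on the same machine (duplicate UUIDs).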

-- 
With respect,
Roman




Re: parent transid verify failed on snapshot deletion

2016-03-19 Thread Roman Mamedov
On Sun, 13 Mar 2016 15:52:52 -0600
Chris Murphy  wrote:

> I really think you need a minute's worth of kernel messages prior to
> that time stamp.

There were no messages a minute, or even (from memory) many hours, prior to
the crash. If there had been anything even remotely weird, block-device- or
FS-related, I would of course have included it with the original report.

-- 
With respect,
Roman




Re: parent transid verify failed on snapshot deletion

2016-03-13 Thread Chris Murphy
On Sun, Mar 13, 2016 at 2:55 PM, Roman Mamedov  wrote:
> On Sun, 13 Mar 2016 14:10:47 -0600
> Chris Murphy  wrote:
>
>> I'm going to guess it's a metadata block, and the profile is single.
>> Otherwise, if it were data it'd just be a corrupt file and you'd be
>> told which one is affected. And if metadata had more than one copy,
>> then it should recover from the copy. The exact nature of the loss
>> isn't clear, a kernel message for the time of the bad block message
>> might help but I'm going to guess again that it's a 4096 byte missing
>> block of metadata. Depending on what it is, that could be a pretty
>> serious hole for any file system.
>
> Pretty sure the metadata is DUP on that FS.

Big difference. If it's single and the block is bad, it's uncertain whether
it's something Btrfs should be able to recover from. If it's DUP then
it should be a non-factor. In either case, kernel messages would be a
lot more enlightening about what happened right before this. The call
trace really isn't that helpful in my opinion; all it tells us is that
Btrfs got confused.



Re: parent transid verify failed on snapshot deletion

2016-03-13 Thread Roman Mamedov
On Sun, 13 Mar 2016 14:10:47 -0600
Chris Murphy  wrote:

> I'm going to guess it's a metadata block, and the profile is single.
> Otherwise, if it were data it'd just be a corrupt file and you'd be
> told which one is affected. And if metadata had more than one copy,
> then it should recover from the copy. The exact nature of the loss
> isn't clear, a kernel message for the time of the bad block message
> might help but I'm going to guess again that it's a 4096 byte missing
> block of metadata. Depending on what it is, that could be a pretty
> serious hole for any file system.

Pretty sure the metadata is DUP on that FS.

Besides, the "bad" block (only going by btrfsck's lingo here, it's not the usual
"hard disk got a bad block" problem) is not entirely missing, just 6k transids
older than it should be(???). I saved this from before the btrfsck passes:

# btrfs-debug-tree -b 7483566862336 /dev/alpha/lv1  
 :(
node 7483566862336 level 3 items 95 free 26 generation 404133 owner 7
fs uuid 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99
chunk uuid 4688dce4-89dd-43eb-a0f4-d10900535183
key (EXTENT_CSUM EXTENT_CSUM 1062973087744) block 4314139631616 
(1053256746) gen 402032
key (EXTENT_CSUM EXTENT_CSUM 1091441795072) block 4314548232192 
(1053356502) gen 402102
key (EXTENT_CSUM EXTENT_CSUM 1107647541248) block 7482607947776 
(1826808581) gen 402791
key (EXTENT_CSUM EXTENT_CSUM 1176289222656) block 7482608832512 
(1826808797) gen 402791
key (EXTENT_CSUM EXTENT_CSUM 1199852232704) block 7483421888512 
(1827007297) gen 403882
key (EXTENT_CSUM EXTENT_CSUM 1252762054656) block 7483566968832 
(1827042717) gen 404133
key (EXTENT_CSUM EXTENT_CSUM 1302207705088) block 7486122131456 
(1827666536) gen 399086
key (EXTENT_CSUM EXTENT_CSUM 1342292983808) block 7486136766464 
(1827670109) gen 399086
key (EXTENT_CSUM EXTENT_CSUM 1357230608384) block 7486143053824 
(1827671644) gen 399088
key (EXTENT_CSUM EXTENT_CSUM 1374801608704) block 7486219661312 
(1827690347) gen 399097
key (EXTENT_CSUM EXTENT_CSUM 140654296) block 7482936365056 
(1826888761) gen 403108
key (EXTENT_CSUM EXTENT_CSUM 1425602490368) block 7482806996992 
(1826857177) gen 402938
key (EXTENT_CSUM EXTENT_CSUM 1439588401152) block 7492133109760 
(1829134060) gen 400631
key (EXTENT_CSUM EXTENT_CSUM 1471449923584) block 7486878142464 
(1827851109) gen 399121
key (EXTENT_CSUM EXTENT_CSUM 1494641868800) block 7486882181120 
(1827852095) gen 399121
key (EXTENT_CSUM EXTENT_CSUM 1511553085440) block 7492376141824 
(1829193394) gen 400803
key (EXTENT_CSUM EXTENT_CSUM 1530452836352) block 7492377698304 
(1829193774) gen 400803
key (EXTENT_CSUM EXTENT_CSUM 1557468987392) block 7544937934848 
(1842025863) gen 401275
key (EXTENT_CSUM EXTENT_CSUM 1589122428928) block 7544937947136 
(1842025866) gen 401275
key (EXTENT_CSUM EXTENT_CSUM 1623402835968) block 7544935043072 
(1842025157) gen 401275
key (EXTENT_CSUM EXTENT_CSUM 1660158967808) block 7544935292928 
(1842025218) gen 401275
key (EXTENT_CSUM EXTENT_CSUM 1686639628288) block 7544935317504 
(1842025224) gen 401275
key (EXTENT_CSUM EXTENT_CSUM 1717318074368) block 7545404669952 
(1842139812) gen 401300
key (EXTENT_CSUM EXTENT_CSUM 1755587174400) block 7544935378944 
(1842025239) gen 401275
key (EXTENT_CSUM EXTENT_CSUM 1771312803840) block 7482802622464 
(1826856109) gen 402938
key (EXTENT_CSUM EXTENT_CSUM 1792774889472) block 7545001177088 
(1842041303) gen 401281
key (EXTENT_CSUM EXTENT_CSUM 1833762066432) block 7545013350400 
(1842044275) gen 401278
key (EXTENT_CSUM EXTENT_CSUM 1848938086400) block 7545009430528 
(1842043318) gen 401278
key (EXTENT_CSUM EXTENT_CSUM 1874773962752) block 7545013170176 
(1842044231) gen 401278
key (EXTENT_CSUM EXTENT_CSUM 1912300650496) block 4309044703232 
(1052012867) gen 401366
key (EXTENT_CSUM EXTENT_CSUM 1934921564160) block 4308804886528 
(1051954318) gen 401354
key (EXTENT_CSUM EXTENT_CSUM 1951308283904) block 4310900432896 
(1052465926) gen 401686
key (EXTENT_CSUM EXTENT_CSUM 1966261223424) block 4309153787904 
(1052039499) gen 401376
key (EXTENT_CSUM EXTENT_CSUM 1985369530368) block 4311094611968 
(105251) gen 401757
key (EXTENT_CSUM EXTENT_CSUM 2002212573184) block 4311279501312 
(1052558472) gen 401766
key (EXTENT_CSUM EXTENT_CSUM 2031789600768) block 4311093194752 
(1052512987) gen 401757
key (EXTENT_CSUM EXTENT_CSUM 2056985681920) block 4311095111680 
(1052513455) gen 401757
key (EXTENT_CSUM EXTENT_CSUM 2086494728192) block 4310101364736 
(1052270841) gen 401441
key (EXTENT_CSUM EXTENT_CSUM 2114637971456) block 4311356846080 
(1052577355) gen 401773
key (EXTENT_CSUM EXTENT_CSUM 

Re: parent transid verify failed on snapshot deletion

2016-03-13 Thread Sylvain Joyeux
My unfortunate experience with these transid problems is that they (1)
randomly appear without warning and (2) --repair completely destroys
the filesystem. Right now I have two separate volumes on two separate
disks reporting that error, and --repair definitely destroyed the first
one. I am trying to see what I can restore from the second one before
I try --repair on it as well.

The frustrating part is that these volumes in my case are only used to
receive subvolumes, and delete them. From an outsider's point of view,
it hardly seems to be a very intensive workload.

Sylvain


Re: parent transid verify failed on snapshot deletion

2016-03-13 Thread Chris Murphy
On Sun, Mar 13, 2016 at 11:24 AM, Roman Mamedov  wrote:

>
> "Blowing away" a 6TB filesystem just because some block randomly went "bad",

I'm going to guess it's a metadata block, and the profile is single.
Otherwise, if it were data it'd just be a corrupt file and you'd be
told which one is affected. And if metadata had more than one copy,
then it should recover from the copy. The exact nature of the loss
isn't clear, a kernel message for the time of the bad block message
might help but I'm going to guess again that it's a 4096 byte missing
block of metadata. Depending on what it is, that could be a pretty
serious hole for any file system.
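
(For reference, once the filesystem is mounted, even read-only, the profile is
easy to check; the mount point here is just an example:)

  btrfs filesystem df /mnt     # the "Metadata, ..." line shows single, DUP, RAID1, etc.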


> I'm running --init-extent-tree right now in a "what if" mode, using
> the copy-on-write feature of 'nbd-server' (this way the original block device
> is not modified, and all changes are saved in a separate file).

So it's a Btrfs on NBD with no replication either from Btrfs or from the
storage backing it on the server? Offhand I'd say one of them needs
redundancy to avoid this very problem; otherwise it's just too easy
for even network corruption to cause a problem (NBD or iSCSI).

Not related to your problem, but I'm not sure whether and how many times
Btrfs retries corrupt reads. That is, the device returns the read command OK
(no error), but Btrfs detects corruption. Does it retry, or
immediately fail? For flash- and network-based Btrfs, it's possible the
result is intermittent, so it should try again.

> It's been
> running for a good 8 hours now, with 100% CPU use of btrfsck and very little
> disk access.

Yeah btrfs check is very much RAM intensive.


-- 
Chris Murphy


Re: parent transid verify failed on snapshot deletion

2016-03-13 Thread Roman Mamedov
On Sun, 13 Mar 2016 17:03:54 + (UTC)
Duncan <1i5t5.dun...@cox.net> wrote:

> With backups I'd try it, if only for the personal experience value and to 
> see what the result was.  But that's certainly more intensive "surgery" 
> on the filesystem than --repair, and I'd only do it either for that 
> experience value or if I was seriously desperate to recover files, as I'd 
> not trust the filesystem's health after that intensive a surgery, and 
> would blow the filesystem away after I recovered what I needed, even if 
> it did appear to work successfully.

"Blowing away" a 6TB filesystem just because some block randomly went "bad",
without any explanation why, or guarantees that this won't happen again, is not
the best outcome. Sure there might be no way to "guarantee" anything, but let's
at least figure out a robust way to recover from this failure state.

I'm running --init-extent-tree right now in a "what if" mode, using
the copy-on-write feature of 'nbd-server' (this way the original block device
is not modified, and all changes are saved to a separate file). It's been
running for a good 8 hours now, with btrfsck at 100% CPU and very little
disk access. Unless I'm mistaken and something has gone majorly wrong, these
messages (100 MB worth of them by now) seem to indicate it is indeed
proceeding to recreate the extent tree.

adding new data backref on 3282190336 parent 4315246948352 owner 0 offset 0 
found 1
Backref 3282190336 root 256 owner 1187677 offset 4096 num_refs 0 not found in 
extent tree
Incorrect local backref count on 3282190336 root 256 owner 1187677 offset 4096 
found 1 wanted 0 back 0x23496e40
Backref 3282190336 parent 4315038240768 owner 0 offset 0 num_refs 0 not found 
in extent tree
Incorrect local backref count on 3282190336 parent 4315038240768 owner 0 offset 
0 found 1 wanted 0 back 0x4b29f3a0
Backref 3282190336 parent 4315246948352 owner 0 offset 0 num_refs 0 not found 
in extent tree
Incorrect local backref count on 3282190336 parent 4315246948352 owner 0 offset 
0 found 1 wanted 0 back 0x4c330f60
backpointer mismatch on [3282190336 4096]
ref mismatch on [3282194432 32768] extent item 0, found 1
adding new data backref on 3282194432 parent 4309109956608 owner 0 offset 0 
found 1
Backref 3282194432 parent 4309109956608 owner 0 offset 0 num_refs 0 not found 
in extent tree
Incorrect local backref count on 3282194432 parent 4309109956608 owner 0 offset 
0 found 1 wanted 0 back 0x52903a20
backpointer mismatch on [3282194432 32768]
ref mismatch on [3282227200 4096] extent item 0, found 1

As it finishes I'll check whether the files are present and not corrupted,
then I will have to run it once more, this time "for real". Unfortunately this
also does not appear to scale linearly, as the rate at which new log messages
appear has been slowing down considerably as it progresses.

-- 
With respect,
Roman




Re: parent transid verify failed on snapshot deletion

2016-03-13 Thread Duncan
Roman Mamedov posted on Sun, 13 Mar 2016 14:24:28 +0500 as excerpted:

> With "Errors found in extent allocation tree", I wonder if I should try
> --init-extent-tree next.

With backups I'd try it, if only for the personal experience value and to 
see what the result was.  But that's certainly more intensive "surgery" 
on the filesystem than --repair, and I'd only do it either for that 
experience value or if I was seriously desperate to recover files, as I'd 
not trust the filesystem's health after that intensive a surgery, and 
would blow the filesystem away after I recovered what I needed, even if 
it did appear to work successfully.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: parent transid verify failed on snapshot deletion

2016-03-13 Thread Roman Mamedov
On Sat, 12 Mar 2016 22:15:24 +0500
Roman Mamedov <r...@romanrm.net> wrote:

> Seems like it should be safe to run --repair?

Well, this is unexpected: I ran --repair, and it did not do anything.

# btrfsck --repair /dev/alpha/lv1
enabling repair mode
Checking filesystem on /dev/alpha/lv1
UUID: 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99
checking extents
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
bad block 7483566862336
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
Fixed 0 roots.
checking free space cache
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
There is no free space entry for 6504947712-7537164288
cache appears valid but isnt 6463422464
found 2455135691065 bytes used err is -22
total csum bytes: 0
total tree bytes: 368590848
total fs tree bytes: 0
total extent tree bytes: 364605440
btree space waste bytes: 122267201
file data blocks allocated: 1294204928
 referenced 1294204928

# btrfsck /dev/alpha/lv1
Checking filesystem on /dev/alpha/lv1
UUID: 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99
checking extents
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
bad block 7483566862336
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
checking free space cache
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
There is no free space entry for 6504947712-7537164288
cache appears valid but isnt 6463422464
found 2455135691065 bytes used err is -22
total csum bytes: 0
total tree bytes: 368590848
total fs tree bytes: 0
total extent tree bytes: 364605440
btree space waste bytes: 122267201
file data blocks allocated: 1294204928
 referenced 1294204928

With "Errors found in extent allocation tree", I wonder if I should
try --init-extent-tree next.

-- 
With respect,
Roman




Re: parent transid verify failed on snapshot deletion

2016-03-12 Thread Duncan
Roman Mamedov posted on Sat, 12 Mar 2016 20:48:47 +0500 as excerpted:

> I wonder what's the best way to proceed here. Maybe try btrfs-zero-log?
> But the difference between transid numbers of 6 thousands is concerning.

btrfs-zero-log is a very specific tool designed to fix a very specific 
problem, and transid differences >1 are not it.
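
(For the record, the tool is a one-liner and only discards the tree log, which
helps when a mount crashes during log replay; it is shown here purely for
illustration and does not apply to a stale parent transid:)

  btrfs-zero-log /dev/alpha/lv1   # clears only the tree log; not applicable to this transid mismatch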

I read your followup, posting btrfs check output and wondering about
enabling --repair, as well.

As long as you have a backup, it shouldn't be a problem, even if it does
cause further damage (which it doesn't look like it will in your case).

If you don't have a backup, it shouldn't be a problem either, since the
very fact that you don't have a backup indicates, by your actions, that
you consider the data at risk to be of less value than the time, effort and
resources necessary to have that backup in the first place.  As such,
even if you lose the data, you saved what was obviously more important
to you than that data: the time, effort and resources you would otherwise
have put into making and testing that backup.  So you're still coming
out ahead. =:^)

Which means the only case not clearly covered is that of data worth 
having backed up, which you do, but the backup is somewhat stale, and as 
long as the risk was theoretical, you didn't consider the chance of 
something happening to the data updated since the backup worth more than 
the cost of updating that backup.  But now that the theoretical chance 
has become reality, while loss of that incremental data isn't earth 
shattering in its consequences, you'd prefer not to lose it if you can 
save it without too much trouble.  That's quite understandable, and is 
the exact position I've been in myself a couple times.

In both my cases where I did end up actually giving up on repair and 
eventually blowing away the filesystem, btrfs restore (before that blow-
away) was able to get me back the incremental changes since my last 
proper backup.  If it hadn't worked I'd have certainly lost some work and 
been less than absolutely happy, but as I _did_ have backups (which by 
the fact that I had them indicated I actually valued the data at risk at 
something above trivial), they were simply somewhat stale, it wouldn't 
have been the end of the world.


Of course in your case you _can_ mount, if only in read-only mode.  So
take the opportunity you've been handed and update your backups, just in
case (and of course backups that haven't been verified readable/restorable
aren't complete backups yet; a would-be backup can't really be considered
a backup until that verification is done).  Then, even in the worst-case
scenario, btrfs check --repair can't do more than inconvenience you a bit
if it makes the problem worse instead of fixing it, since you'll have
current backups and will only need to blow away the filesystem, recreate
it fresh, and restore them.
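
A minimal sketch of that "refresh the backup first" step, assuming a read-only
mount and an rsync target; the paths are placeholders:

  mount -o ro /dev/alpha/lv1 /mnt/lv1
  rsync -aHAXv /mnt/lv1/ /backup/lv1/    # refresh the backup before attempting --repair
  # spot-check some of the copied files afterwards so the backup counts as verified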

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: parent transid verify failed on snapshot deletion

2016-03-12 Thread Roman Mamedov
Hello,

btrfsck output:

# btrfsck /dev/alpha/lv1
Checking filesystem on /dev/alpha/lv1
UUID: 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99
checking extents
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
bad block 7483566862336
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
checking free space cache
parent transid verify failed on 7483566862336 wanted 410578 found 404133
Ignoring transid failure
There is no free space entry for 6504947712-7537164288
cache appears valid but isnt 6463422464
found 2455135703350 bytes used err is -22
total csum bytes: 0
total tree bytes: 368590848
total fs tree bytes: 0
total extent tree bytes: 364605440
btree space waste bytes: 122267203
file data blocks allocated: 1294204928
 referenced 1294204928

Seems like it should be safe to run --repair?

-- 
With respect,
Roman


pgpWb9lTJLQG2.pgp
Description: OpenPGP digital signature


parent transid verify failed on snapshot deletion

2016-03-12 Thread Roman Mamedov
Hello,

The system was seemingly running just fine for days or weeks, then I
routinely deleted a bunch of old snapshots, and suddenly got hit with:

[Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify 
failed on 7483566862336 wanted 410578 found 404133
[Sat Mar 12 20:17:10 2016] BTRFS error (device dm-0): parent transid verify 
failed on 7483566862336 wanted 410578 found 404133
[Sat Mar 12 20:17:10 2016] [ cut here ]
[Sat Mar 12 20:17:10 2016] WARNING: CPU: 0 PID: 217 at 
fs/btrfs/extent-tree.c:6549 __btrfs_free_extent.isra.67+0x2c2/0xd40 [btrfs]()
[Sat Mar 12 20:17:10 2016] BTRFS: Transaction aborted (error -5)
[Sat Mar 12 20:17:10 2016] Modules linked in: xt_tcpudp xt_multiport xt_limit 
xt_length xt_conntrack ip6t_rpfilter ipt_rpfilter ip6table_raw ip6table_mangle 
iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat 
nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables 
cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_conservative cfg80211 
rfkill arc4 ecb md4 hmac nls_utf8 cifs dns_resolver fscache 8021q garp mrp 
bridge stp llc tcp_illinois ext4 crc16 mbcache jbd2 fuse kvm_amd kvm irqbypass 
serio_raw evdev pcspkr joydev snd_hda_codec_realtek k10temp 
snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec 
snd_hda_core snd_hwdep acpi_cpufreq sp5100_tco snd_pcm snd_timer tpm_tis snd 
tpm shpchp soundcore i2c_piix4 button processor btrfs dm_mod raid1 raid456
[Sat Mar 12 20:17:10 2016]  async_raid6_recov async_memcpy async_pq async_xor 
async_tx xor raid6_pq libcrc32c crc32c_generic md_mod sg ata_generic sd_mod 
hid_generic usbhid hid uas usb_storage ohci_pci xhci_pci xhci_hcd r8169 mii 
sata_mv ahci libahci pata_atiixp ehci_pci ohci_hcd ehci_hcd libata usbcore 
usb_common scsi_mod
[Sat Mar 12 20:17:10 2016] CPU: 0 PID: 217 Comm: btrfs-cleaner Tainted: G   
 W   4.4.4-rm1+ #108
[Sat Mar 12 20:17:10 2016] Hardware name: Gigabyte Technology Co., Ltd. 
GA-E350N-USB3/GA-E350N-USB3, BIOS F2 09/19/2011
[Sat Mar 12 20:17:10 2016]  0286 7223a131 880406befa88 
81315721
[Sat Mar 12 20:17:10 2016]  880406befad0 a03539b2 880406befac0 
8107e735
[Sat Mar 12 20:17:10 2016]  000183c9c000 fffb 88032dbc0e01 
069c4f95b000
[Sat Mar 12 20:17:10 2016] Call Trace:
[Sat Mar 12 20:17:10 2016]  [] dump_stack+0x63/0x82
[Sat Mar 12 20:17:10 2016]  [] warn_slowpath_common+0x95/0xe0
[Sat Mar 12 20:17:10 2016]  [] warn_slowpath_fmt+0x5c/0x80
[Sat Mar 12 20:17:10 2016]  [] 
__btrfs_free_extent.isra.67+0x2c2/0xd40 [btrfs]
[Sat Mar 12 20:17:10 2016]  [] 
__btrfs_run_delayed_refs+0x412/0x1230 [btrfs]
[Sat Mar 12 20:17:10 2016]  [] ? 
__percpu_counter_add+0x5d/0x80
[Sat Mar 12 20:17:10 2016]  [] 
btrfs_run_delayed_refs+0x7e/0x2b0 [btrfs]
[Sat Mar 12 20:17:10 2016]  [] 
btrfs_should_end_transaction+0x68/0x70 [btrfs]
[Sat Mar 12 20:17:10 2016]  [] 
btrfs_drop_snapshot+0x45d/0x840 [btrfs]
[Sat Mar 12 20:17:10 2016]  [] ? __schedule+0x355/0xa30
[Sat Mar 12 20:17:10 2016]  [] 
btrfs_clean_one_deleted_snapshot+0xbd/0x120 [btrfs]
[Sat Mar 12 20:17:10 2016]  [] cleaner_kthread+0x17d/0x210 
[btrfs]
[Sat Mar 12 20:17:10 2016]  [] ? check_leaf+0x370/0x370 
[btrfs]
[Sat Mar 12 20:17:10 2016]  [] kthread+0xea/0x100
[Sat Mar 12 20:17:10 2016]  [] ? kthread_park+0x60/0x60
[Sat Mar 12 20:17:10 2016]  [] ret_from_fork+0x3f/0x70
[Sat Mar 12 20:17:10 2016]  [] ? kthread_park+0x60/0x60
[Sat Mar 12 20:17:10 2016] ---[ end trace 4a0a05309f1c27f4 ]---
[Sat Mar 12 20:17:10 2016] BTRFS: error (device dm-0) in 
__btrfs_free_extent:6549: errno=-5 IO failure
[Sat Mar 12 20:17:10 2016] BTRFS info (device dm-0): forced readonly
[Sat Mar 12 20:17:10 2016] BTRFS: error (device dm-0) in 
btrfs_run_delayed_refs:2927: errno=-5 IO failure
[Sat Mar 12 20:17:10 2016] pending csums is 103825408

Now this happens after each reboot too, causing the FS to be remounted 
read-only.

I wonder what's the best way to proceed here. Maybe try btrfs-zero-log? But
the difference of six thousand between the transid numbers is concerning.

Also puzzling is why this happened in the first place; I don't think this
filesystem had any crashes or storage-device-related issues recently.

-- 
With respect,
Roman




Re: Fixing recursive fault and parent transid verify failed

2015-12-12 Thread Alistair Grant
On Wed, Dec 09, 2015 at 10:19:41AM +, Duncan wrote:
> Alistair Grant posted on Wed, 09 Dec 2015 09:38:47 +1100 as excerpted:
> 
> > On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
> > Thanks again Duncan for your assistance.
> > 
> > I plugged the ext4 drive I planned to use for the recovery in to the
> > machine and immediately got a couple of errors, which makes me wonder
> > whether there isn't a hardware problem with the machine somewhere.
> > 
> > So decided to move to another machine to do the recovery.
> 
> Ouch!  That can happen, and if you moved the ext4 drive to a different 
> machine and it was fine there, then it's not the drive.
> 
> But you didn't say what kind of errors or if you checked SMART, or even 
> how it was plugged in (USB or SATA-direct or...).  So I guess you have 
> that side of things under control.  (If not, there's some here who know 
> quite a bit about that sort of thing...)

Yep, I'm familiar enough with smartmontools, etc. to (hopefully) figure
this out on my own.


> 
> > So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1
> > (the latest version from archlinuxarm.org).
> > 
> > Attempting:
> > 
> > sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee
> > btrfs-recover.log
> > 
> > only recovered 53 of the more than 106,000 files that should be
> > available.
> > 
> > The log is available at:
> > 
> > https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0
> > 
> > I did attempt btrfs-find-root, but couldn't make sense of the output:
> > 
> > https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0
> 
> Yeah, btrfs-find-root's output deciphering takes a bit of knowledge.  
> Between what I had said and the wiki, I was hoping you could make sense 
> of things without further help, but...
>
> ...

It turns out that a drive from a separate filesystem was dying and
causing all the weird behaviour on the original machine.

Having two failures at the same time (drive physical failure and btrfs
filesystem corruption) was a bit too much for me, so I aborted the btrfs
restore attempts, bought a replacement drive and just went back to the
backups (for both failures).

Unfortunately, I now won't be able to determine whether there was any
connection between the failures or not.

So while I didn't get to practice my restore skills, the good news is
that it is all back up and running without any problems (yet :-)).

Thank you very much for the description and detailed set of steps for
using btrfs-find-root and restore.  While I didn't get to use them this
time, I've added links to the mailing list archive in my btrfs wiki user
page so I can find my way back (and if others search for restore and
find root they may also benefit from your effort).

Thanks again,
Alistair



Re: Fixing recursive fault and parent transid verify failed

2015-12-09 Thread Duncan
Alistair Grant posted on Wed, 09 Dec 2015 09:38:47 +1100 as excerpted:

> On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
>> Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:
>> 
>> > On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
>> >> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as
>> >> excerpted:
>> >> 
>> >> > I think I'll try the btrfs restore as a learning exercise
>> >> 
>> >> Trying btrfs restore is an excellent idea.  It'll make things far
>> >> easier if you have to use it for real some day.
> 
> Thanks again Duncan for your assistance.
> 
> I plugged the ext4 drive I planned to use for the recovery in to the
> machine and immediately got a couple of errors, which makes me wonder
> whether there isn't a hardware problem with the machine somewhere.
> 
> So decided to move to another machine to do the recovery.

Ouch!  That can happen, and if you moved the ext4 drive to a different 
machine and it was fine there, then it's not the drive.

But you didn't say what kind of errors or if you checked SMART, or even 
how it was plugged in (USB or SATA-direct or...).  So I guess you have 
that side of things under control.  (If not, there's some here who know 
quite a bit about that sort of thing...)

> So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1
> (the latest version from archlinuxarm.org).
> 
> Attempting:
> 
> sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee
> btrfs-recover.log
> 
> only recovered 53 of the more than 106,000 files that should be
> available.
> 
> The log is available at:
> 
> https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0
> 
> I did attempt btrfs-find-root, but couldn't make sense of the output:
> 
> https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0

Yeah, btrfs-find-root's output deciphering takes a bit of knowledge.  
Between what I had said and the wiki, I was hoping you could make sense 
of things without further help, but...

Well, at least this gets you some practice before you are desperate. =:^)

FWIW, I was really hoping that it would find generation/transid 2308, 
since that's what it was finding on those errors, but that seems to be 
too far back.

OK, here's the thing about transaction IDs aka transids aka generations.  
Normally, it's a monotonically increasing number, representing the 
transaction/commit count at that point.

Taking a step back, btrfs organizes things as a tree of trees, with each 
change cascading up (down?) the tree to its root, and then to the master 
tree's root.  Between this and btrfs' copy-on-write nature, this means 
the filesystem is atomic.  If the system crashes at any point, either the 
latest changes are committed and the master root reflects them, or the 
master root points to the previous consistent state of all the subtrees, 
which is still in place due to copy-on-write and the fact that the 
changes hadn't cascaded all the way up the trees to the master root, yet.

And each time the master root is updated, the generation aka transid is 
incremented by one.  So 3503 is the current generation (see the superblock 
thinks... bit), 3502 the one before that, 3501 the one before that...

The superblocks record the current transid and point (by address, aka 
bytenr) to that master root.
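
You can see exactly those two fields with the show-super tool shipped in
btrfs-progs of that era; /dev/sdb is the device from the restore attempts
above:

  btrfs-show-super /dev/sdb    # the "generation" line is the current transid,
                               # and the "root" line is the bytenr of the master root it points to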

But, because btrfs is copy-on-write, older copies of the master root (and 
the other roots it points to) tend to hang around for awhile.  Which is 
where btrfs-find-root comes along, as it's designed to find all those old 
roots, listing them by bytenr and generation/transid.

In your case, while generation 3361 is current, there's a list going back 
to generation 2497 with only a few (just eyeballing it) missing, then 
2326, and pretty much nothing before that but the REALLY early generation 
2 and 3, which are likely a nearly empty filesystem.

OK, that explains the generations/transids.  There's also levels, which I 
don't clearly understand myself; definitely not well enough to try to 
explain, though I could make some WAGs, but that would just confuse things 
if they turned out to be wildly wrong.  It turns out, however, that levels 
aren't in practice something you normally need to worry much about anyway, 
so ignoring them seems to work fine.

Then, there's bytenrs, the block addresses.  These are more or less 
randomly large numbers, from an admin perspective, but they're very 
important numbers, because this is the number you feed to restore's -t 
option, that tells it which tree root to use.

Put a different way, humans read the generation aka transid numbers; 
btrfs reads the block numbers.  So what we do is find a generation number 
that looks reasonable, and get its corresponding block number, to feed to 
restore -t.
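
Put as commands, that workflow is roughly the following; <bytenr> is a
placeholder for whichever block number btrfs-find-root printed next to the
generation you picked:

  btrfs-find-root /dev/sdb                                          # list candidate roots: bytenr + generation
  btrfs restore -t <bytenr> -S -m -v /dev/sdb /mnt/btrfs-recover/   # restore using that specific tree root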


OK, knowing that, you can perhaps make a bit more sense of what those 
transid verify failed messages are all about.  As I said, the current 
generation is 3503.  Apparently, there's a problem in a subtree, however, 
where the 

Re: Fixing recursive fault and parent transid verify failed

2015-12-08 Thread Duncan
Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:

> On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
>> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:
>> 
>> > I think I'll try the btrfs restore as a learning exercise, and to
>> > check the contents of my backup (I don't trust my memory, so
>> > something could have changed since the last backup).
>> 
>> Trying btrfs restore is an excellent idea.  It'll make things far
>> easier if you have to use it for real some day.
>> 
>> Note that while I see your kernel is reasonably current (4.2 series), I
>> don't know what btrfs-progs ubuntu ships.  There have been some marked
>> improvements to restore somewhat recently, checking the wiki
>> btrfs-progs release-changelog list says 4.0 brought optional metadata
>> restore, 4.0.1 added --symlinks, and 4.2.3 fixed a symlink path check
>> off-by-one error. (And don't use 4.1.1 as its mkfs.btrfs is broken and
>> produces invalid filesystems.)  So you'll want at least progs 4.0 to
>> get the optional metadata restoration, and 4.2.3 to get full symlinks
>> restoration support.
>> 
>> 
> Ubuntu 15.10 comes with btrfs-progs v4.0.  It looks like it is easy
> enough to compile and install the latest version from
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git so
> I'll do that.
> 
> Should I stick to 4.2.3 or use the latest 4.3.1?

I generally use the latest myself, but recommend as a general guideline 
that at minimum, a userspace version series matching that of your kernel 
be used, as if the usual kernel recommendations (within two kernel series 
of either current or LTS, so presently 4.2 or 4.3 for current or 3.18 or 
4.1 for LTS) are followed, that will keep userspace reasonably current as 
well, and the userspace of a particular version was being developed 
concurrently with the kernel of the same series, so they're relatively in 
sync.

So with a 4.2 kernel, I'd suggest at least a 4.2 userspace.  If you want 
the latest, as I generally do, and are willing to put up with occasional 
bleeding edge bugs like that broken mkfs.btrfs in 4.1.1, by all means, 
use the latest, but otherwise, the general same series as your kernel 
guideline is quite acceptable.

The exception would be if you're trying to fix or recover from a broken 
filesystem, in which case the very latest tends to have the best chance 
at fixing things, since it has fixes for (or lacking that, at least 
detection of) the latest round of discovered bugs, that older versions 
will lack.

While btrfs restore does fall into the recover from broken category, we 
know from the changelogs that nothing specific has gone into it since the 
mentioned 4.2.3 symlink off-by-one fix, so while I would recommend at 
least that since you are going to be working with restore, there's no 
urgent need for 4.3.0 or 4.3.1 if you're more comfortable with the older 
version.  (In fact, while I knew I was on 4.3.something, I just had to 
run btrfs version, to check whether it was 4.3 or 4.3.1, myself.  FWIW, 
it was 4.3.1.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Fixing recursive fault and parent transid verify failed

2015-12-08 Thread Alistair Grant
On Tue, Dec 08, 2015 at 03:25:14PM +, Duncan wrote:
> Alistair Grant posted on Tue, 08 Dec 2015 06:55:04 +1100 as excerpted:
> 
> > On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
> >> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:
> >> 
> >> > I think I'll try the btrfs restore as a learning exercise, and to
> >> > check the contents of my backup (I don't trust my memory, so
> >> > something could have changed since the last backup).
> >> 
> >> Trying btrfs restore is an excellent idea.  It'll make things far
> >> easier if you have to use it for real some day.
> >> 
> >> Note that while I see your kernel is reasonably current (4.2 series), I
> >> don't know what btrfs-progs ubuntu ships.  There have been some marked
> >> improvements to restore somewhat recently, checking the wiki
> >> btrfs-progs release-changelog list says 4.0 brought optional metadata
> >> restore, 4.0.1 added --symlinks, and 4.2.3 fixed a symlink path check
> >> off-by-one error. (And don't use 4.1.1 as its mkfs.btrfs is broken and
> >> produces invalid filesystems.)  So you'll want at least progs 4.0 to
> >> get the optional metadata restoration, and 4.2.3 to get full symlinks
> >> restoration support.
> >> 
> >> ...


Thanks again Duncan for your assistance.

I plugged the ext4 drive I planned to use for the recovery into the
machine and immediately got a couple of errors, which makes me wonder
whether there isn't a hardware problem with the machine somewhere.  So I
decided to move to another machine to do the recovery.

So I'm now recovering on Arch Linux 4.1.13-1 with btrfs-progs v4.3.1
(the latest version from archlinuxarm.org).

Attempting:

sudo btrfs restore -S -m -v /dev/sdb /mnt/btrfs-recover/ 2>&1 | tee 
btrfs-recover.log

only recovered 53 of the more than 106,000 files that should be available.

The log is available at: 

https://www.dropbox.com/s/p8bi6b8b27s9mhv/btrfs-recover.log?dl=0

I did attempt btrfs-find-root, but couldn't make sense of the output:

https://www.dropbox.com/s/qm3h2f7c6puvd4j/btrfs-find-root.log?dl=0

Simply mounting the drive, then re-mounting it read only, and rsync'ing
the files to the backup drive recovered 97,974 files before crashing.
If anyone is interested, I've uploaded a photo of the console to:

https://www.dropbox.com/s/xbrp6hiah9y6i7s/rsync%20crash.jpg?dl=0

I'm currently running a hashdeep audit between the recovered files and
the backup to see how the recovery went.
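
In case it's useful to anyone else, the audit is roughly this (the paths
are just placeholders for my actual mount points):

cd /mnt/backup && hashdeep -r -l . > /tmp/backup.hashes
cd /mnt/btrfs-recover && hashdeep -a -vv -k /tmp/backup.hashes -r -l .

The first run records relative-path hashes of the backup; the second
audits the recovered tree against them, with -vv listing the files that
don't match.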

If you'd like me to try any other tests, I'll keep the damaged file
system for at least the next day or so.

Thanks again for all your assistance,
Alistair

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fixing recursive fault and parent transid verify failed

2015-12-07 Thread Duncan
Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:

> I think I'll try the btrfs restore as a learning exercise, and to check
> the contents of my backup (I don't trust my memory, so something could
> have changed since the last backup).

Trying btrfs restore is an excellent idea.  It'll make things far easier 
if you have to use it for real some day.

Note that while I see your kernel is reasonably current (4.2 series), I 
don't know what btrfs-progs ubuntu ships.  There have been some marked 
improvements to restore somewhat recently, checking the wiki btrfs-progs 
release-changelog list says 4.0 brought optional metadata restore, 4.0.1 
added --symlinks, and 4.2.3 fixed a symlink path check off-by-one error.  
(And don't use 4.1.1 as its mkfs.btrfs is broken and produces invalid 
filesystems.)  So you'll want at least progs 4.0 to get the optional 
metadata restoration, and 4.2.3 to get full symlinks restoration support.

> Does btrfs restore require the path to be on a btrfs filesystem?  I've
> got an existing ext4 drive with enough free space to do the restore, so
> would prefer to use it than have to buy another drive.

Restoring to ext4 should be fine.

Btrfs restore writes files as an ordinary application would; that's why 
metadata restoration is optional (without it, files get normal change and 
mod times, are written as the running user, root, with umask-based 
permissions, exactly as if a normal file-writing application had created 
them), so it will restore to any normal filesystem.  The filesystem it's 
restoring /from/ of course must be btrfs, and unmounted, since restore is 
designed to be used when mounting is broken, but it writes files 
normally, so it can write them to any filesystem.

FWIW, I restored to my reiserfs based media partition (still on spinning 
rust, my btrfs are all on ssd) here, since that's where I had the room to 
work with.

> My plan is:
> 
> * btrfs restore /dev/sdX /path/to/ext4/restorepoint
> ** Where /dev/sdX is one of the two drives that were part of the raid1
> >    filesystem
> * hashdeep audit the restored drive and backup
> * delete the existing corrupted btrfs filesystem and recreate
> * rsync the merge filesystem (from backup and restore)
>   on to the new filesystem
> 
> Any comments or suggestions are welcome.


Looks very reasonable, here.  There's a restore page on the wiki with 
more information than the btrfs-restore manpage, describing how to use it 
with btrfs-find-root if necessary, etc.

https://btrfs.wiki.kernel.org/index.php/Restore

Some details on the page are a bit dated; it doesn't cover the dryrun, 
list-roots, metadata and symlink options, for instance, and these can be 
very helpful, but the general idea remains the same.

The general idea is to use btrfs-find-root to get a listing of available 
root generations (if restore can't find a working root from the 
superblocks or you want to try restoring an earlier root), then feed the 
corresponding bytenr to restore's -t option.

Note that generation and transid refer to the same thing, a normally 
increasing number, so higher generations are newer.  The wiki page makes 
this much clearer than it used to, but the old wording anyway was 
confusing to me until I figured that out.

Where the wiki page talks about root object-ids, those are the various 
subtrees, low numbers are the base trees, 256+ are subvolumes/snapshots.  
Note that restore's list-roots option lists these for the given bytenr as 
well.

So you try restore with list-roots (-l) to see what it gives you, try 
btrfs-find-root if not satisfied, to find older generations and get their 
bytenrs to plug into restore with -t, and then confirm specific 
generation bytenrs with list-roots again.

Once you have a good generation/bytenr candidate, try a dry-run (-D) to 
see if you get a list of files it's trying to restore that looks 
reasonable.

If the dry-run goes well, you can try the full restore, not forgetting 
the metadata and symlinks options (-m, -S, respectively), if desired.
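
As a rough sketch of the whole sequence (the device, bytenr and target 
path here are placeholders, obviously, not anything from your system):

btrfs restore -l /dev/sdX                          # list roots from the superblock
btrfs-find-root /dev/sdX                           # list older root generations and bytenrs
btrfs restore -t <bytenr> -l /dev/sdX              # confirm the roots at that bytenr
btrfs restore -t <bytenr> -D /dev/sdX /mnt/target  # dry-run, just lists what it'd restore
btrfs restore -t <bytenr> -m -S -v /dev/sdX /mnt/target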

From there you can continue with your plan as above.

One more bonus hint.  Since you'll be doing a new mkfs.btrfs, it's a good 
time to review active features and decide which ones you might wish to 
activate (or not, if you're concerned about old-kernel compatibility).  
Additionally, before repopulating your new filesystem, you may want to 
review mount options, particularly autodefrag if appropriate, and 
compression if desired, so they take effect from the very first file 
created on the new filesystem. =:^)
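
Purely as an illustrative sketch (your choice of profiles, features and 
options will of course differ; device names are placeholders):

mkfs.btrfs -m raid1 -d raid1 /dev/sdX /dev/sdY
mount -o autodefrag,compress=lzo /dev/sdX /mnt/newfs

... with whatever feature flags you settled on added via mkfs.btrfs -O.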

FWIW in the past I usually did an immediate post-mkfs.btrfs mount and 
balance with -dusage=0 -musage=0 to get rid of the single-mode chunk 
artifacts from the mkfs.btrfs as well, but with a new enough mkfs.btrfs 
you may be able to avoid that now, as -progs 4.2 was supposed to 
eliminate those single-mode mkfs.btrfs artifacts on multi-device 
filesystems.  I've just not done any fresh mkfs.btrfs since then so 
haven't had a 

Re: Fixing recursive fault and parent transid verify failed

2015-12-07 Thread Duncan
Alistair Grant posted on Mon, 07 Dec 2015 12:57:15 +1100 as excerpted:

> I've run btrfs scrub and btrfsck on the drives, with the output included
> below.  Based on what I've found on the web, I assume that a
> btrfs-zero-log is required.
> 
> * Is this the recommended path?

[Just replying to a couple more minor points, here.]

Absolutely not.  btrfs-zero-log isn't the tool you need here.

About the btrfs log...

Unlike most journaling filesystems, btrfs is designed to be atomic and 
consistent at commit time (every 30 seconds by default) and doesn't log 
normal filesystem activity at all.  The only thing logged is fsyncs, 
allowing them to deliver on their file-written-to-hardware guarantees, 
without forcing the entire atomic filesystem sync, which would trigger a 
normal atomic commit and thus is a far heavier weight process.  IOW, all 
it does is log and speed up fsyncs.  The filesystem is designed to be 
atomically consistent at commit time, with or without the log, with the 
only thing missing if the log isn't replayed being the last few seconds 
of fsyncs since the last atomic commit.

So the btrfs log is very limited in scope and will in many cases be 
entirely empty, if there were no fsyncs after the last atomic filesystem 
commit, again, every 30 seconds by default, so in human terms at least, 
not a lot of time.
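
(FWIW, that 30-second default is a tunable, the commit= mount option; 
purely as an illustration, with a placeholder device:

mount -o commit=30 /dev/sdX /mnt

... tho there's rarely a reason to change it, as a longer interval just 
widens the window of not-yet-committed data at risk.)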

About btrfs log replay...

The kernel, meanwhile, is designed to replay the log automatically at 
mount time.  If the mount is successful, the log has by definition been 
replayed successfully and zeroing it wouldn't have done much of anything 
but possibly lose you a few seconds worth of fsyncs.

Since you are able to run scrub, which requires a writable mount, the 
mount is definitely successful, which means btrfs-zero-log is the wrong 
tool for the job, since it addresses a problem you obviously don't have.

> * Is there a way to find out which files will be affected by the loss of
>   the transactions?

I'm interpreting that question in the context of the transid wanted/found 
listings in your linked logs, since it no longer makes sense in the 
context of btrfs-zero-log, given the information above.

I believe so, but the most direct method requires manual use of btrfs-
debug and similar tools, looking up addresses and tracing down the files 
to which they belong.  Of course that's if the addresses trace to actual 
files at all.  If they trace to metadata instead of data, then it's not 
normally files, but the metadata (including checksums and very small 
files of only a few KiB) about files, instead.  Of course if it's 
metadata the problem's worse, as a single bad metadata block can affect 
multiple actual files.

The more indirect way would be to use btrfs restore with the -t option, 
feeding it the root address associated with the transid found (with that 
association traced via btrfs-find-root), to restore the file from the 
filesystem as it existed at that point, to some other mounted filesystem, 
also using the restore metadata option.  You could then do for instance a 
diff of the listing (or possibly a per-file checksum, say md5sum, of both 
versions) between your current backup (or current mounted filesystem, 
since you can still mount it) and the restored version, which would be 
the files at the time of that transaction-id, and see which ones 
changed.  That of course would be the affected files. =:^]
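
A sketch of that comparison, with the bytenr and paths as placeholders:

btrfs restore -t <bytenr> -m /dev/sdX /mnt/scratch/old-gen/
( cd /mnt/scratch/old-gen && find . -type f -exec md5sum {} + | sort -k2 ) > old.md5
( cd /mnt/backup && find . -type f -exec md5sum {} + | sort -k2 ) > cur.md5
diff old.md5 cur.md5

Any line that differs (or appears on only one side) is a file that 
changed between that generation and your current copy.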

> I do have a backup of the drive (which I believe is completely up to
> date, the btrfs volume is used for archiving media and documents, and
> single person use of git repositories, i.e. only very light writing and
> reading).

Of course either one of the above is going to be quite some work, and if 
you have a current backup, simply restoring it is likely to be far 
easier, unless of course you're interested in practicing your recovery 
technique or the like, certainly not a valueless endeavor, if you have 
the time and patience for it.

The *GOOD* thing is that you *DO* have a current backup.  Far *FAR* too 
many people we see posting here are unfortunately finding out the hard 
way that their actions, or more precisely lack thereof, in failing to do 
backups put the lie to any claims that they actually valued the data.  
As any good sysadmin can tell you, often from unhappy lessons such 
as this, if it's not backed up, by definition, your actions are placing 
its value at less than the time and resources necessary to do that backup 
(modified of course by the risk factor of actually needing it, thus 
taking care of the Nth level backup, some of which are off-site, if the 
data is really /that/ valuable, while also covering the throw-away data 
that's so trivial as to not justify even the effort of a single level of 
backup).

So hurray for you! =:^)

(FWIW, I personally have backups of most stuff here, often several 
levels, tho I don't always keep them current.  But should I be forced to 
resort to them, I'm prepared to lose the intervening updates, as I 

Re: Fixing recursive fault and parent transid verify failed

2015-12-07 Thread Alistair Grant
On Mon, Dec 07, 2015 at 08:25:01AM +, Duncan wrote:
> Alistair Grant posted on Mon, 07 Dec 2015 12:57:15 +1100 as excerpted:
> 
> > I've run btrfs scrub and btrfsck on the drives, with the output included
> > below.  Based on what I've found on the web, I assume that a
> > btrfs-zero-log is required.
> > 
> > * Is this the recommended path?
> 
> [Just replying to a couple more minor points, here.]
> 
> Absolutely not.  btrfs-zero-log isn't the tool you need here.
> 
> About the btrfs log...
> 
> Unlike most journaling filesystems, btrfs is designed to be atomic and 
> consistent at commit time (every 30 seconds by default) and doesn't log 
> normal filesystem activity at all.  The only thing logged is fsyncs, 
> allowing them to deliver on their file-written-to-hardware guarantees, 
> without forcing the entire atomic filesystem sync, which would trigger a 
> normal atomic commit and thus is a far heavier weight process.  IOW, all 
> it does is log and speed up fsyncs.  The filesystem is designed to be 
> atomically consistent at commit time, with or without the log, with the 
> only thing missing if the log isn't replayed being the last few seconds 
> of fsyncs since the last atomic commit.
> 
> So the btrfs log is very limited in scope and will in many cases be 
> entirely empty, if there were no fsyncs after the last atomic filesystem 
> commit, again, every 30 seconds by default, so in human terms at least, 
> not a lot of time.
> 
> About btrfs log replay...
> 
> The kernel, meanwhile, is designed to replay the log automatically at 
> mount time.  If the mount is successful, the log has by definition been 
> replayed successfully and zeroing it wouldn't have done much of anything 
> but possibly lose you a few seconds worth of fsyncs.
> 
> Since you are able to run scrub, which requires a writable mount, the 
> mount is definitely successful, which means btrfs-zero-log is the wrong 
> tool for the job, since it addresses a problem you obviously don't have.

OK, thanks for the detailed explanation (here and below, so I don't have
to repeat myself).

The reason I thought it might be required was that the parent transid
failed errors were found even after a reboot (and obviously remounting
the filesystem) and without any user activity.

> 
> > * Is there a way to find out which files will be affected by the loss of
> >   the transactions?
> 
> I'm interpreting that question in the context of the transid wanted/found 
> listings in your linked logs, since it no longer makes sense in the 
> context of btrfs-zero-log, given the information above.
> 
> I believe so, but the most direct method requires manual use of btrfs-
> debug and similar tools, looking up addresses and tracing down the files 
> to which they belong.  Of course that's if the addresses trace to actual 
> files at all.  If they trace to metadata instead of data, then it's not 
> normally files, but the metadata (including checksums and very small 
> files of only a few KiB) about files, instead.  Of course if it's 
> metadata the problem's worse, as a single bad metadata block can affect 
> multiple actual files.
> 
> The more indirect way would be to use btrfs restore with the -t option, 
> feeding it the root address associated with the transid found (with that 
> association traced via btrfs-find-root), to restore the file from the 
> filesystem as it existed at that point, to some other mounted filesystem, 
> also using the restore metadata option.  You could then do for instance a 
> diff of the listing (or possibly a per-file checksum, say md5sum, of both 
> versions) between your current backup (or current mounted filesystem, 
> since you can still mount it) and the restored version, which would be 
> the files at the time of that transaction-id, and see which ones 
> changed.  That of course would be the affected files. =:^]
> 

I think I'll try the btrfs restore as a learning exercise, and to check
the contents of my backup (I don't trust my memory, so something could
have changed since the last backup).

Does btrfs restore require the path to be on a btrfs filesystem?  I've
got an existing ext4 drive with enough free space to do the restore, so
would prefer to use it than have to buy another drive.

My plan is:

* btrfs restore /dev/sdX /path/to/ext4/restorepoint
** Where /dev/sdX is one of the two drives that were part of the raid1
   filesystem
* hashdeep audit the restored drive and backup
* delete the existing corrupted btrfs filesystem and recreate
* rsync the merge filesystem (from backup and restore) on to the new
  filesystem

Any comments or suggestions are welcome.
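
To be concrete about that last rsync step, I'm picturing something along
the lines of (paths are placeholders for my actual mount points):

rsync -aHAX --progress /path/to/merged/ /mnt/new-btrfs/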


> > I do have a backup of the drive (which I believe is completely up to
> > date, the btrfs volume is used for archiving media and documents, and
> > single person use of git repositories, i.e. only very light writing and
> > reading).
> 
> Of course either one of the above is going to be quite some work, and if 
> you have a current backup, simply 

Re: Fixing recursive fault and parent transid verify failed

2015-12-07 Thread Alistair Grant
On Mon, Dec 07, 2015 at 01:48:47PM +, Duncan wrote:
> Alistair Grant posted on Mon, 07 Dec 2015 21:02:56 +1100 as excerpted:
> 
> > I think I'll try the btrfs restore as a learning exercise, and to check
> > the contents of my backup (I don't trust my memory, so something could
> > have changed since the last backup).
> 
> Trying btrfs restore is an excellent idea.  It'll make things far easier 
> if you have to use it for real some day.
> 
> Note that while I see your kernel is reasonably current (4.2 series), I 
> don't know what btrfs-progs ubuntu ships.  There have been some marked 
> improvements to restore somewhat recently, checking the wiki btrfs-progs 
> release-changelog list says 4.0 brought optional metadata restore, 4.0.1 
> added --symlinks, and 4.2.3 fixed a symlink path check off-by-one error.  
> (And don't use 4.1.1 as its mkfs.btrfs is broken and produces invalid 
> filesystems.)  So you'll want at least progs 4.0 to get the optional 
> metadata restoration, and 4.2.3 to get full symlinks restoration support.
> 

Ubuntu 15.10 comes with btrfs-progs v4.0.  It looks like it is easy
enough to compile and install the latest version from
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git so
I'll do that.

Should I stick to 4.2.3 or use the latest 4.3.1?


> > Does btrfs restore require the path to be on a btrfs filesystem?  I've
> > got an existing ext4 drive with enough free space to do the restore, so
> > would prefer to use it than have to buy another drive.
> 
> Restoring to ext4 should be fine.
> 
> Btrfs restore writes files as an ordinary application would; that's why 
> metadata restoration is optional (without it, files get normal change and 
> mod times, are written as the running user, root, with umask-based 
> permissions, exactly as if a normal file-writing application had created 
> them), so it will restore to any normal filesystem.  The filesystem it's 
> restoring /from/ of course must be btrfs, and unmounted, since restore is 
> designed to be used when mounting is broken, but it writes files 
> normally, so it can write them to any filesystem.
> 
> FWIW, I restored to my reiserfs based media partition (still on spinning 
> rust, my btrfs are all on ssd) here, since that's where I had the room to 
> work with.
>

Thanks for the confirmation.

 
> > My plan is:
> > 
> > * btrfs restore /dev/sdX /path/to/ext4/restorepoint
> > ** Where /dev/sdX is one of the two drives that were part of the raid1
> >    filesystem
> > * hashdeep audit the restored drive and backup
> > * delete the existing corrupted btrfs filesystem and recreate
> > * rsync the merge filesystem (from backup and restore)
> >   on to the new filesystem
> > 
> > Any comments or suggestions are welcome.
> 
> 
> Looks very reasonable, here.  There's a restore page on the wiki with 
> more information than the btrfs-restore manpage, describing how to use it 
> with btrfs-find-root if necessary, etc.
> 
> https://btrfs.wiki.kernel.org/index.php/Restore
> 

I'd seen this, but it isn't explicit about the target filesystem
support.  I should try and update the page a bit.


> Some details on the page are a bit dated; it doesn't cover the dryrun, 
> list-roots, metadata and symlink options, for instance, and these can be 
> very helpful, but the general idea remains the same.
> 
> The general idea is to use btrfs-find-root to get a listing of available 
> root generations (if restore can't find a working root from the 
> superblocks or you want to try restoring an earlier root), then feed the 
> corresponding bytenr to restore's -t option.
> 
> Note that generation and transid refer to the same thing, a normally 
> increasing number, so higher generations are newer.  The wiki page makes 
> this much clearer than it used to, but the old wording anyway was 
> confusing to me until I figured that out.
> 
> Where the wiki page talks about root object-ids, those are the various 
> subtrees, low numbers are the base trees, 256+ are subvolumes/snapshots.  
> Note that restore's list-roots option lists these for the given bytenr as 
> well.
> 
> So you try restore with list-roots (-l) to see what it gives you, try 
> btrfs-find-root if not satisfied, to find older generations and get their 
> bytenrs to plug into restore with -t, and then confirm specific 
> generation bytenrs with list-roots again.
> 
> Once you have a good generation/bytenr candidate, try a dry-run (-D) to 
> see if you get a list of files it's trying to restore that looks 
> reasonable.
> 
> If the dry-run goes well, you can try the full restore, not forgetting 
> the metadata and symlinks options (-m, -S, respectively), if desired.
> 
> From there you can continue with your plan as above.
> 
> One more bonus hint.  Since you'll be doing a new mkfs.btrfs, it's a good 
> time to review active features and decide which ones you might wish to 
> activate (or not, if you're concerned about old-kernel compatibility).  
> Additionally, before 

Re: Fixing recursive fault and parent transid verify failed

2015-12-06 Thread Lukas Pirl
On 12/07/2015 02:57 PM, Alistair Grant wrote as excerpted:
> Fixing recursive fault, but reboot is needed

For the record:

I saw the same message (incl. hard lockup) when doing a balance on a
single-disk btrfs.

Besides that, the fs works flawlessly (~60GB, usage: no snapshots, ~15
lxc containers, low-load databases, few mails, a couple of Web servers).

As this is a production machine, I rebooted it rather than
investigating, but the error is reproducible if that would be of
great interest.

> I've run btrfs scrub and btrfsck on the drives, with the output
> included below.  Based on what I've found on the web, I assume that a
> btrfs-zero-log is required.
> 
> * Is this the recommended path?
> * Is there a way to find out which files will be affected by the loss of
>   the transactions?

> Kernel: Ubuntu 4.2.0-19-generic (which is based on mainline 4.2.6)

I used Debian Backports 4.2.6.

Cheers,

Lukas
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fixing recursive fault and parent transid verify failed

2015-12-06 Thread Alistair Grant
Hi,

(Resending as it looks like the first attempt didn't get through,
probably too large, so logs are now in dropbox)

I have a btrfs volume which is raid1 across two spinning rust disks,
each 2TB.

When trying to access some files from another machine using sshfs, the
server machine has crashed twice, resulting in a hard lockup, i.e. a
power off was required to restart the machine.

There are no crash dumps in /var/log/syslog, or anything that looks like
an associated error message to me; however, on the second occasion I was
able to see the following message flash up on the console (in addition to
some stack dumps):

Fixing recursive fault, but reboot is needed

I've run btrfs scrub and btrfsck on the drives, with the output
included below.  Based on what I've found on the web, I assume that a
btrfs-zero-log is required.

* Is this the recommended path?
* Is there a way to find out which files will be affected by the loss of
  the transactions?

I do have a backup of the drive (which I believe is completely up to
date, the btrfs volume is used for archiving media and documents, and
single person use of git repositories, i.e. only very light writing and
reading).

Some basic details:

OS: Ubuntu 15.10
Kernel: Ubuntu 4.2.0-19-generic (which is based on mainline 4.2.6)

> sudo btrfs fi df /srv/d2root
==

Data, RAID1: total=250.00GiB, used=248.86GiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=466.77MiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=160.00MiB, used=0.00B

> sudo btrfs fi usage /srv/d2root
=

Overall:
    Device size:                   3.64TiB
    Device allocated:            502.04GiB
    Device unallocated:            3.15TiB
    Device missing:                  0.00B
    Used:                        498.62GiB
    Free (estimated):              1.58TiB      (min: 1.58TiB)
    Data ratio:                       2.00
    Metadata ratio:                   1.99
    Global reserve:              160.00MiB      (used: 0.00B)

Data,single: Size:8.00MiB, Used:0.00B
   /dev/sdc        8.00MiB

Data,RAID1: Size:250.00GiB, Used:248.86GiB
   /dev/sdb      250.00GiB
   /dev/sdc      250.00GiB

Metadata,single: Size:8.00MiB, Used:0.00B
   /dev/sdc        8.00MiB

Metadata,RAID1: Size:1.00GiB, Used:466.77MiB
   /dev/sdb        1.00GiB
   /dev/sdc        1.00GiB

System,single: Size:4.00MiB, Used:0.00B
   /dev/sdc        4.00MiB

System,RAID1: Size:8.00MiB, Used:64.00KiB
   /dev/sdb        8.00MiB
   /dev/sdc        8.00MiB

Unallocated:
   /dev/sdb        1.57TiB
   /dev/sdc        1.57TiB


btrfs scrub output:
https://www.dropbox.com/s/blqvopa1lhkghe5/scrub.log?dl=0


btrfsck sdb output:
https://www.dropbox.com/s/hw6w6cupuu1rny4/btrfsck.sdb.log?dl=0


btrfsck sdc output:
https://www.dropbox.com/s/mijz492mjr76p8z/btrfsck.sdc.log?dl=0



Thanks very much,
Alistair

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922

2015-08-16 Thread Qu Wenruo



Martin Tippmann wrote on 2015/08/08 20:43 +0200:

Hi, after a hard reboot (powercycle) a btrfs volume did not come up again:

It's a single 4TB disk - only btrfs with lzo - data=single,metadata=dup

[  121.831814] BTRFS info (device sda): disk space caching is enabled
[  121.857820] BTRFS (device sda): parent transid verify failed on
427084513280 wanted 390924 found 390922
[  121.861607] BTRFS (device sda): parent transid verify failed on
427084513280 wanted 390924 found 390922
[  121.861715] BTRFS: failed to read tree root on sda
[  121.878111] BTRFS: open_ctree failed

btrfs-progs v4.0
Kernel: 4.1.4

I'm quite sure that the HDD is fine (no SMART Problems, Disk Errorlog
is empty, It's a new Enterprise-Drive that worked well in the past
days/weeks).

So I'm kind of at a loss what to do:

How can I recover from that problem? I've found just a note in the
FAQ[1] but no solution to the problem.

Maybe someone can give some clues why does this happen in the first
place? Is it unfortunate timing due to the abrupt power cycle?
Shouldn't CoW protect against this somewhat?

Thanks for any hints!

Additional info:

# btrfs check /dev/sda
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
Ignoring transid failure
Couldn't setup extent tree

Seems extent tree or tree root is corrupted.


Couldn't open file system

Not sure what it does but it looks not too good:

# btrfs-find-root /dev/sda
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Superblock thinks the generation is 390924
Superblock thinks the level is 1
   Well block 427084988416(gen: 390923 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427084021760(gen: 390923 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
These two blocks seem to be good, but I'm not sure why there are two of
them.


Try btrfsck --tree-root 427084988416 and btrfsck --tree-root 
427084021760 to see which produces the fewest errors.
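
For example, something like this, just to compare the amount of output
(the log file names are only a suggestion):

btrfs check --tree-root 427084988416 /dev/sda 2>&1 | tee check-988416.log
btrfs check --tree-root 427084021760 /dev/sda 2>&1 | tee check-021760.log
wc -l check-*.log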


Thanks,
Qu

Well block 427084431360(gen: 390915 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427084398592(gen: 390915 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427083988992(gen: 390915 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427038621696(gen: 390914 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427031035904(gen: 390913 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427285069824(gen: 390912 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427060887552(gen: 390912 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427013128192(gen: 390912 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427001872384(gen: 390909 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965237760(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965221376(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965188608(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965172224(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965155840(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426964271104(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426964156416(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426950377472(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426944512000(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426940841984(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426940612608(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426940465152(gen: 390905 level: 0) seems good

fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922

2015-08-08 Thread Martin Tippmann
Hi, after a hard reboot (powercycle) a btrfs volume did not come up again:

It's a single 4TB disk - only btrfs with lzo - data=single,metadata=dup

[  121.831814] BTRFS info (device sda): disk space caching is enabled
[  121.857820] BTRFS (device sda): parent transid verify failed on
427084513280 wanted 390924 found 390922
[  121.861607] BTRFS (device sda): parent transid verify failed on
427084513280 wanted 390924 found 390922
[  121.861715] BTRFS: failed to read tree root on sda
[  121.878111] BTRFS: open_ctree failed

btrfs-progs v4.0
Kernel: 4.1.4

I'm quite sure that the HDD is fine (no SMART Problems, Disk Errorlog
is empty, It's a new Enterprise-Drive that worked well in the past
days/weeks).

So I'm kind of at a loss what to do:

How can I recover from that problem? I've found just a note in the
FAQ[1] but no solution to the problem.

Maybe someone can give some clues why does this happen in the first
place? Is it unfortunate timing due to the abrupt power cycle?
Shouldn't CoW protect against this somewhat?

Thanks for any hints!

Additional info:

# btrfs check /dev/sda
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
Ignoring transid failure
Couldn't setup extent tree
Couldn't open file system

Not sure what it does but it looks not too good:

# btrfs-find-root /dev/sda
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
parent transid verify failed on 427084513280 wanted 390924 found 390922
Ignoring transid failure
Couldn't setup extent tree
Couldn't setup device tree
Superblock thinks the generation is 390924
Superblock thinks the level is 1
  Well block 427084988416(gen: 390923 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427084021760(gen: 390923 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427084431360(gen: 390915 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427084398592(gen: 390915 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427083988992(gen: 390915 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427038621696(gen: 390914 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427031035904(gen: 390913 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427285069824(gen: 390912 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427060887552(gen: 390912 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427013128192(gen: 390912 level: 1) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 427001872384(gen: 390909 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965237760(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965221376(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965188608(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965172224(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426965155840(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426964271104(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426964156416(gen: 390906 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426950377472(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426944512000(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426940841984(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426940612608(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426940465152(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426940153856(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block 426939809792(gen: 390905 level: 0) seems good, but
generation/level doesn't match, want gen: 390924 level: 1
Well block

Re: fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922

2015-08-08 Thread Hugo Mills
On Sat, Aug 08, 2015 at 08:43:34PM +0200, Martin Tippmann wrote:
 Hi, after a hard reboot (powercycle) a btrfs volume did not come up again:
 
 It's a single 4TB disk - only btrfs with lzo - data=single,metadata=dup
 
 [  121.831814] BTRFS info (device sda): disk space caching is enabled
 [  121.857820] BTRFS (device sda): parent transid verify failed on
 427084513280 wanted 390924 found 390922
 [  121.861607] BTRFS (device sda): parent transid verify failed on
 427084513280 wanted 390924 found 390922
 [  121.861715] BTRFS: failed to read tree root on sda
 [  121.878111] BTRFS: open_ctree failed
 
 btrfs-progs v4.0
 Kernel: 4.1.4
 
 I'm quite sure that the HDD is fine (no SMART Problems, Disk Errorlog
 is empty, It's a new Enterprise-Drive that worked well in the past
 days/weeks).
 
 So I'm kind of at a loss what to do:
 
 How can I recover from that problem? I've found just a note in the
 FAQ[1] but no solution to the problem.
 
 Maybe someone can give some clues why does this happen in the first
 place? Is it unfortunate timing due to the abrupt power cycle?
 Shouldn't CoW protect against this somewhat?

   Not somewhat: it should protect it completely. There are two ways
that this can happen: it's a bug in btrfs, or there's something
stopping barriers from working. That latter case can be either a bug
in the kernel's block layer (pretty unlikely), or the hardware is
behaving badly and ignoring the barriers (more likely, particularly if
it's on a USB/SATA converter).

   I don't think there's a good solution to transid failures, I'm
afraid. The best that I'm aware of is to use btrfs restore to grab the
pieces of your FS that aren't up to date in your backups, and then
restore from them.

 Thanks for any hints!
 
 Additional info:
 
 # btrfs check /dev/sda
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't open file system
 
 Not sure what it does but it looks not too good:

   Actually, it's pretty good, other than the transid failure, which
is a real problem.

   Hugo.

 # btrfs-find-root /dev/sda
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 Ignoring transid failure
 Couldn't setup extent tree
 Couldn't setup device tree
 Superblock thinks the generation is 390924
 Superblock thinks the level is 1
   Well block 427084988416(gen: 390923 level: 1) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427084021760(gen: 390923 level: 1) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427084431360(gen: 390915 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427084398592(gen: 390915 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427083988992(gen: 390915 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427038621696(gen: 390914 level: 1) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427031035904(gen: 390913 level: 1) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427285069824(gen: 390912 level: 1) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427060887552(gen: 390912 level: 1) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427013128192(gen: 390912 level: 1) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 427001872384(gen: 390909 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 426965237760(gen: 390906 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 426965221376(gen: 390906 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 426965188608(gen: 390906 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 426965172224(gen: 390906 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 426965155840(gen: 390906 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 426964271104(gen: 390906 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block 426964156416(gen: 390906 level: 0) seems good, but
 generation/level doesn't match, want gen: 390924 level: 1
 Well block

Re: fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922

2015-08-08 Thread Martin Tippmann
2015-08-08 21:05 GMT+02:00 Hugo Mills h...@carfax.org.uk:
 Maybe someone can give some clues why does this happen in the first
 place? Is it unfortunate timing due to the abrupt power cycle?
 Shouldn't CoW protect against this somewhat?

Not somewhat: it should protect it completely. There are two ways
 that this can happen: it's a bug in btrfs, or there's something
 stopping barriers from working. That latter case can be either a bug
 in the kernel's block layer (pretty unlikely), or the hardware is
 behaving badly and ignoring the barriers (more likely, particularly if
 it's on a USB/SATA converter).

Thanks for the information. The setup is nothing out of the ordinary.
The disks are HGST HUS724040ALA640 running on a Dell H310 SATA
controller and configured as JBOD. It's all running on defaults on a
Dell PowerEdge R720. SMART says the disk write cache is enabled -
maybe that's part of the problem?

I don't think there's a good solution to transid failures, I'm
 afraid. The best that I'm aware of is to use btrfs restore to grab the
 pieces of your FS that aren't up to date in your backups, and then
 restore from them.

Okay, fortunately I can dismiss the data - or is the broken image of
any use to anyone? It's a 4TB disk, but I guess I could create a
compressed (partial) image if it's of interest to anyone.

regards
Martin
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: fs unreadable after powercycle: BTRFS (device sda): parent transid verify failed on 427084513280 wanted 390924 found 390922

2015-08-08 Thread Duncan
Martin Tippmann posted on Sat, 08 Aug 2015 20:43:34 +0200 as excerpted:

 Hi, after a hard reboot (powercycle) a btrfs volume did not come up
 again:
 
 It's a single 4TB disk - only btrfs with lzo - data=single,metadata=dup
 
 [  121.831814] BTRFS info (device sda): disk space caching is enabled [
 121.857820] BTRFS (device sda): parent transid verify failed on
 427084513280 wanted 390924 found 390922 [  121.861607] BTRFS (device
 sda):
 parent transid verify failed on 427084513280 wanted 390924 found 390922
 [ 121.861715] BTRFS: failed to read tree root on sda [  121.878111]
 BTRFS: open_ctree failed
 
 btrfs-progs v4.0 Kernel: 4.1.4
 
 I'm quite sure that the HDD is fine (no SMART Problems, Disk Errorlog is
 empty, It's a new Enterprise-Drive that worked well in the past
 days/weeks).
 
 So I'm kind of at a loss what to do:
 
 How can I recover from that problem? I've found just a note in the
 FAQ[1] but no solution to the problem.

[The FAQ reference was to the wiki problem faq, transid failure 
explanation, but it didn't say what to do about it.]

Did you try the recovery mount option suggested earlier in the problem-faq 
under mount problems?

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2

For transid failures, that's what I'd try first, since that scans 
previous tree-roots and tries to use the first one it can read.  Since 
the transid it wants (390924) is only a couple ahead of what it finds 
(390922), and the recovery mount option scans backward in the tree-root 
history to see if it can find any that work, that could well solve the 
problem.
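
IOW, something like this, read-only first to be safe:

mount -o ro,recovery /dev/sda /mnt

If that gets you a mount, copy off anything not already in your backups 
before trying anything writable.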

If not, as Hugo mentions, given find-tree-root looks good, btrfs restore 
has a good chance of working.  I've used that myself to good effect a 
couple times when a btrfs refused to mount (I have backups if I have to 
use 'em, but recovery or restore, when they work, will normally leave me 
with more current copies, since I tend to let my backups get somewhat 
stale).  There's a page on the wiki for using it with find-root if 
necessary, but the wiki page is a bit dated.  The btrfs-restore manpage 
should be current, but doesn't have the detail about using it with find-
root that the wiki page has.

 Maybe someone can give some clues why does this happen in the first
 place?
 Is it unfortunate timing due to the abrupt power cycle?
 Shouldn't CoW protect against this somewhat?

As Hugo says, in theory cow should protect against this, but the 
combination of possible bugs in a still not yet fully stable and mature 
btrfs, and possibly buggy hardware, means theory and practice don't 
always line up as well as they should, in theory. (How's that for an 
ouroboros, aka snake eating its tail circular-reference, explanation? 
=:^)

But the recovery mount option is a reasonable first recovery (now 
ouroboroi =:^) option, and btrfs restore not too bad to work with if that 
fails.

Referencing the hardware write-caching option you mentioned later, yes, 
turning that off can help... in theory... but it also tends to have a 
DRAMATICALLY bad effect on spinning rust write performance (I don't know 
enough about SSD write caching to venture a guess), and in some cases 
voids warranties due to the additional thrashing it's likely to cause as 
well, so do your research before turning it off.  In general, it's not a 
good idea as it's simply not worth it.  Both Linux at the generic IO 
level and the various filesystem stacks are designed to work around all 
but the worst hardware IO barrier failures, and the write slowdown and 
increased disk thrashing are simply not worth it, in most cases.  If the 
hardware is actually bad enough that it's worth it, I'd strongly consider 
different hardware.
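
(If you do decide to experiment with it anyway, hdparm can query and 
toggle it, assuming the controller passes the ATA commands through; 
device name is a placeholder:

hdparm -W /dev/sdX     # query the current write-cache setting
hdparm -W0 /dev/sdX    # disable; -W1 re-enables

... but again, do the research on the performance and warranty 
implications first.)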





-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


parent transid verify failed

2015-04-28 Thread Anthony Plack
I am running kernel 4.0 and btrfs-prog mainline.  I have a backup.

Of the following commands:

btrfs check --repair device
btrfsck --repair device
mount -t btrfs -o recovery device mount && btrfs scrub start mount

-- none of them remove the "parent transid verify failed" errors from the disk.

The disk was going read-only.  The disk now mounts and seems to be fine.
However, these "errors" persist.

Is there any tool, other than zeroing the log, which will "repair" the log?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


'parent transid verify failed' for 13637 missing transactions, resulting in 'BTRFS: Transaction aborted'

2015-01-08 Thread Reuben D'Netto
Hi,
I have a btrfs volume in RAID0 across 2 SSDs which has (for no apparent reason) 
become corrupted. Although I am able to mount the partition, there are several 
messages displayed in the kernel log when doing so.
I have copied the files off the file system, but would like to know if they can 
be relied upon or not (and if not, which ones are corrupt). I would also like 
to know if the file system itself is recoverable, or should be erased entirely 
and replaced.
I have tried 'btrfs check --repair' and btrfs-zero-log to no avail. The SMART 
data for both drives suggests there are no issues with the hardware.

Thanks in advance.


Distro: Sabayon amd64
Kernel in use when corruption occurred: 3.17.4
Kernel in use when collecting diagnostic info: 3.16.0-23-generic (Ubuntu livecd)
Btrfs-progs version: 3.18


btrfs fi df: (Used space is incorrect - should be at least 30 GB)
Data, RAID0: total=93.16GiB, used=25.19MiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=8.01GiB, used=73.81MiB
unknown, single: total=16.00MiB, used=16.00KiB


btrfs fi show: (truncated to show relevant filesystem only)
Label: none  uuid: d75ecf88-9b18-4ca6-8fd4-7bda0630de9b
Total devices 2 FS bytes used 73.81MiB
devid1 size 54.62GiB used 54.62GiB path /dev/sda1
devid2 size 54.62GiB used 54.62GiB path /dev/sdb1


Kernel log when mounting file system:
[  106.564009] BTRFS info (device sda1): disk space caching is enabled
[  106.577597] BTRFS: detected SSD devices, enabling SSD mode
[  106.578440] BTRFS: checking UUID tree
[  106.581198] parent transid verify failed on 168079851520 wanted 6329580 
found 6343217
[  106.581857] parent transid verify failed on 168079851520 wanted 6329580 
found 6343217
[  106.581880] BTRFS warning (device sda1): btrfs_uuid_tree_iterate failed -12


When unmounting:
[  113.814408] [ cut here ]
[  113.814454] WARNING: CPU: 0 PID: 3872 at 
/build/buildd/linux-3.16.0/fs/btrfs/extent-tree.c:5956 
__btrfs_free_extent+0x675/0xc00 [btrfs]()
[  113.814460] Modules linked in: joydev btrfs dm_crypt xor snd_hda_codec_hdmi 
raid6_pq dm_multipath scsi_dh kvm_amd kvm snd_seq_midi snd_hda_codec_realtek 
snd_seq_midi_event snd_hda_codec_generic snd_rawmidi edac_core snd_hda_intel 
snd_hda_controller k10temp serio_raw edac_mce_amd snd_seq snd_hda_codec bnep 
snd_hwdep rfcomm snd_seq_device snd_pcm bluetooth snd_timer snd 6lowpan_iphc 
sp5100_tco soundcore i2c_piix4 shpchp mac_hid parport_pc ppdev lp parport 
squashfs overlayfs nls_utf8 isofs jfs xfs libcrc32c reiserfs dm_mirror 
dm_region_hash dm_log hid_generic nouveau mxm_wmi video i2c_algo_bit ttm usbhid 
drm_kms_helper pata_acpi firewire_ohci tg3 hid firewire_core r8169 drm ahci ptp 
crc_itu_t mii pata_jmicron libahci pps_core wmi
[  113.814558] CPU: 0 PID: 3872 Comm: umount Tainted: GW 
3.16.0-23-generic #31-Ubuntu
[  113.814564] Hardware name: Gigabyte Technology Co., Ltd. 
GA-870A-UD3/GA-870A-UD3, BIOS F5 08/01/2011
[  113.814569]  0009 8800bd5afa28 8177fcbc 

[  113.814577]  8800bd5afa60 8106fd8d 00218f175000 
8800cb98f000
[  113.814584]  8800a80e9000 fffe  
8800bd5afa70
[  113.814591] Call Trace:
[  113.814605]  [8177fcbc] dump_stack+0x45/0x56
[  113.814615]  [8106fd8d] warn_slowpath_common+0x7d/0xa0
[  113.814623]  [8106fe6a] warn_slowpath_null+0x1a/0x20
[  113.814651]  [c0d15345] __btrfs_free_extent+0x675/0xc00 [btrfs]
[  113.814661]  [811c16a6] ? __slab_free+0xa6/0x320
[  113.814690]  [c0d1a044] __btrfs_run_delayed_refs+0x424/0x11e0 
[btrfs]
[  113.814721]  [c0d1edf3] btrfs_run_delayed_refs.part.64+0x73/0x270 
[btrfs]
[  113.814750]  [c0d1f51d] btrfs_write_dirty_block_groups+0x46d/0x710 
[btrfs]
[  113.814784]  [c0d2d64d] commit_cowonly_roots+0x18d/0x240 [btrfs]
[  113.814818]  [c0d301ad] 
btrfs_commit_transaction.part.22+0x49d/0x970 [btrfs]
[  113.814852]  [c0d2f27a] btrfs_commit_transaction+0x3a/0x80 [btrfs]
[  113.814875]  [c0cfe760] btrfs_sync_fs+0x50/0xc0 [btrfs]
[  113.814884]  [81211a82] sync_filesystem+0x72/0xb0
[  113.814891]  [811e2d50] generic_shutdown_super+0x30/0xf0
[  113.814897]  [811e30a2] kill_anon_super+0x12/0x20
[  113.814920]  [c0d01e86] btrfs_kill_super+0x16/0x90 [btrfs]
[  113.814926]  [811e3429] deactivate_locked_super+0x49/0x60
[  113.814932]  [811e3874] deactivate_super+0x64/0x70
[  113.814940]  [812015ef] mntput_no_expire+0xdf/0x180
[  113.814947]  [81202bac] SyS_umount+0x8c/0x100
[  113.814954]  [81787ced] system_call_fastpath+0x1a/0x1f
[  113.814959] ---[ end trace 328a5b6c02402780 ]---
[  113.814967] BTRFS info (device sda1): leaf 104182874112 total ptrs 209 free 
space 75
[  113.814973]  item 0 key (140680462336 168 16384) itemoff 16232 itemsize 51
[  113.814978]  extent refs 1

Unmountable filesystem parent transid verify failed

2013-09-01 Thread ronnie sahlberg
Hi again.
Sorry for top posting.


I have a 9 disk filesystem that does not mount anymore and need some
help/advice so I can recover the data.

What happened was that I was running a btrfs device delete
under Ubuntu 13.04, kernel 3.8,
and after a long time of moving data around it crashed with a SEGV.

Now the filesystem does not mount and none of the recovery options I
have tried work.

I have upgraded to Debian testing and am now using kernel 3.10-2-amd64



When I try btrfsck I get heaps of these :
Ignoring transid failure
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495
parent transid verify failed on 24419581267968 wanted 301480 found 301495


I have tried using btrfs-image :
but it too crashes eventually with :

btrfs-image -c9 -t4 /dev/sde btrfs-image
...
btrfs-image: ctree.c:787: read_node_slot: Assertion `!(level == 0)' failed.
Aborted


mount -o ro,recovery fails
# mount -o ro,recovery /dev/sde /DATA
mount: wrong fs type, bad option, bad superblock on /dev/sde,
...


# btrfs-zero-log /dev/sde
eventually fails with :
btrfs-zero-log: ctree.c:342: __btrfs_cow_block: Assertion
`!(btrfs_header_generation(buf) > trans->transid)' failed.
Aborted


What should I try next?


regards
ronnie sahlberg
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Unmountable BTRFS with parent transid verify failed

2013-08-31 Thread ronnie sahlberg
Hi,

I have a 9 disk raid1 filesystem that is no longer mountable.
I am using ubuntu 13.04 with kernel 3.8.0-26-generic


What happened was that I was removing a device using
btrfs device delete
and this was running for quite a while (I was removing a 3T device)
but eventually this failed with the btrfs command segfaulting.

Now when I have rebooted but the filesystem does not mount.
When I run btrfsck /dev/sde I get a lot of

parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
parent transid verify failed on 3539986560 wanted 301481 found 301495
Ignoring transid failure
leaf parent key incorrect 3539986560
leaf parent key incorrect 3536398464
bad block 3536398464

And while btrfsck eventually does complete  the filesystem remains unmountable.

Any advice ?


regards
ronnie sahlberg
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unmountable BTRFS with parent transid verify failed

2013-08-31 Thread Duncan
ronnie sahlberg posted on Sat, 31 Aug 2013 14:50:36 -0700 as excerpted:

 And while btrfsck eventually does complete  the filesystem remains
 unmountable.
 
 Any advice ?

This isn't specific to your question, but in general...

In the "Question: How can I recover this partition? (unable to find 
logical $hugenum len 4096)" thread about a week ago, there's a post from 
Hugo Mills, listing the general troubleshooting steps he recommends and 
in what order.  I'd try that.

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999

(I have it marked to possibly add the info to the wiki as I don't 
remember seeing such a concise list there, but I haven't gotten around to 
it yet.)

Wiki: https://btrfs.wiki.kernel.org/

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unmountable BTRFS with parent transid verify failed

2013-08-31 Thread Chris Murphy

On Aug 31, 2013, at 4:01 PM, Duncan 1i5t5.dun...@cox.net wrote:

 ronnie sahlberg posted on Sat, 31 Aug 2013 14:50:36 -0700 as excerpted:
 
 And while btrfsck eventually does complete  the filesystem remains
 unmountable.
 
 Any advice ?
 
 This isn't specific to your question, but in general...
 
 In the Question: How can I recover this partition? (unable to find 
 logical $hugenum len 4096) thread about a week ago, there's a post from 
 Hugo Mills, listing the general troubleshooting steps he recommends and 
 in what order.  I'd try that.
 
 http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999

I was about to suggest the same thing, but also to use something newer than 
3.8.0 and, before getting to any of the btrfs-specific commands, to make sure a 
recent btrfs-progs is being used. There have been lots of fixes between 3.8 and 
3.10.
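
Checking both versions before doing anything destructive is quick; something
along these lines (illustrative, output will differ per distribution):

uname -r          # running kernel
btrfs version     # btrfs-progs version; very old releases may only report it via mkfs.btrfs -V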


Chris Murphy--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd

2012-12-29 Thread Jordan Windsor
Also here's the output of btrfs-find-root:

./btrfs-find-root /dev/sdb1
Super think's the tree root is at 1229060866048, chunk root 1259695439872
Went past the fs size, exiting

Not sure where to go from here.

On Sat, Dec 29, 2012 at 6:04 AM, Jordan Windsor jorda...@gmail.com wrote:
 Hello,
 thanks for the response!
 Here's the output of -o recovery

 [ 5473.725751] device label Storage devid 1 transid 116023 /dev/sdb1
 [ 5473.726612] btrfs: enabling auto recovery
 [ 5473.726615] btrfs: disk space caching is enabled
 [ 5473.734581] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [ 5473.734797] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [ 5473.734801] btrfs: failed to read tree root on sdb1
 [ 5473.735010] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [ 5473.735259] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [ 5473.735262] btrfs: failed to read tree root on sdb1
 [ 5473.756367] parent transid verify failed on 1229060243456 wanted
 116022 found 116028
 [ 5473.761968] parent transid verify failed on 1229060243456 wanted
 116022 found 116028
 [ 5473.761975] btrfs: failed to read tree root on sdb1
 [ 5475.561208] btrfs bad tree block start 7479324919942847850 1241518882816
 [ 5475.567008] btrfs bad tree block start 13410158725948676859 1241518882816
 [ 5475.567056] Failed to read block groups: -5
 [ 5475.570200] btrfs: open_ctree failed

 I'm on kernel 3.6.10 and have been before this problem.

 Thanks.

 On Sat, Dec 29, 2012 at 5:29 AM, cwillu cwi...@cwillu.com wrote:
 On Fri, Dec 28, 2012 at 12:09 PM, Jordan Windsor jorda...@gmail.com wrote:
 Hello,
 I moved my btrfs to the beginning of my drive & updated the partition
 table & also restarted, I'm currently unable to mount it, here's the
 output in dmesg.

 [  481.513432] device label Storage devid 1 transid 116023 /dev/sdb1
 [  481.514277] btrfs: disk space caching is enabled
 [  481.522611] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [  481.522789] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [  481.522790] btrfs: failed to read tree root on sdb1
 [  481.523656] btrfs: open_ctree failed

 What command should I run from here?

 The filesystem wasn't uncleanly unmounted, likely on an older kernel.

 Try mounting with -o recovery
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd

2012-12-29 Thread cwillu
On Sat, Dec 29, 2012 at 7:14 AM, Jordan Windsor jorda...@gmail.com wrote:
 Also here's the output of btrfs-find-root:

 ./btrfs-find-root /dev/sdb1
 Super think's the tree root is at 1229060866048, chunk root 1259695439872
 Went past the fs size, exiting

 Not sure where to go from here.

I can't say for certain, but that suggests that the move-via-dd didn't
succeed / wasn't correct, and/or the partitioning changes didn't
match, and/or the dd happened from a mounted filesystem (which would
also explain the transid errors, if there wasn't an unclean umount
involved).

btrfs-restore might be able to pick out files, but you may be in
restore-from-backup territory.
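
A minimal restore attempt might look like the following; it only reads the
damaged device and writes the recovered files elsewhere. The destination path
is a placeholder, and the exact invocation varies with btrfs-progs version
(older releases ship restore as a standalone binary):

mkdir -p /mnt/rescue                     # somewhere with enough free space, not on /dev/sdb1
btrfs restore -v /dev/sdb1 /mnt/rescue   # -v lists files as they are recovered
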
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd

2012-12-29 Thread Chris Murphy

On Dec 29, 2012, at 8:38 AM, cwillu cwi...@cwillu.com wrote:

 On Sat, Dec 29, 2012 at 7:14 AM, Jordan Windsor jorda...@gmail.com wrote:
 Also here's the output of btrfs-find-root:
 
 ./btrfs-find-root /dev/sdb1
 Super think's the tree root is at 1229060866048, chunk root 1259695439872
 Went past the fs size, exiting
 
 Not sure where to go from here.
 
 I can't say for certain, but that suggests that the move-via-dd didn't
 succeed / wasn't correct, and/or the partitioning changes didn't
 match, and/or the dd happened from a mounted filesystem (which would
 also explain the transid errors, if there wasn't an unclean umount
 involved).
 
 btrfs-restore might be able to pick out files, but you may be in
 restore-from-backup territory.

Yeah I'm vaguely curious about how the move was done, in particular if it was 
dd'd from a mounted fs. 

Chris Murphy--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd

2012-12-28 Thread cwillu
On Fri, Dec 28, 2012 at 12:09 PM, Jordan Windsor jorda...@gmail.com wrote:
 Hello,
 I moved my btrfs to the beginning of my drive & updated the partition
 table & also restarted, I'm currently unable to mount it, here's the
 output in dmesg.

 [  481.513432] device label Storage devid 1 transid 116023 /dev/sdb1
 [  481.514277] btrfs: disk space caching is enabled
 [  481.522611] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [  481.522789] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [  481.522790] btrfs: failed to read tree root on sdb1
 [  481.523656] btrfs: open_ctree failed

 What command should I run from here?

The filesystem wasn't uncleanly unmounted, likely on an older kernel.

Try mounting with -o recovery
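
Concretely, that would be something like the lines below (mount point is a
placeholder; adding ro keeps the first attempt read-only, which is a cautious
variant of the suggestion above):

mount -o ro,recovery /dev/sdb1 /mnt
dmesg | tail -n 30    # see what the kernel logged for this attempt
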
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: parent transid verify failed on -- After moving btrfs closer to the beginning of drive with dd

2012-12-28 Thread Jordan Windsor
Hello,
thanks for the response!
Here's the output of -o recovery

[ 5473.725751] device label Storage devid 1 transid 116023 /dev/sdb1
[ 5473.726612] btrfs: enabling auto recovery
[ 5473.726615] btrfs: disk space caching is enabled
[ 5473.734581] parent transid verify failed on 1229060423680 wanted
116023 found 116027
[ 5473.734797] parent transid verify failed on 1229060423680 wanted
116023 found 116027
[ 5473.734801] btrfs: failed to read tree root on sdb1
[ 5473.735010] parent transid verify failed on 1229060423680 wanted
116023 found 116027
[ 5473.735259] parent transid verify failed on 1229060423680 wanted
116023 found 116027
[ 5473.735262] btrfs: failed to read tree root on sdb1
[ 5473.756367] parent transid verify failed on 1229060243456 wanted
116022 found 116028
[ 5473.761968] parent transid verify failed on 1229060243456 wanted
116022 found 116028
[ 5473.761975] btrfs: failed to read tree root on sdb1
[ 5475.561208] btrfs bad tree block start 7479324919942847850 1241518882816
[ 5475.567008] btrfs bad tree block start 13410158725948676859 1241518882816
[ 5475.567056] Failed to read block groups: -5
[ 5475.570200] btrfs: open_ctree failed

I'm on kernel 3.6.10 and have been before this problem.

Thanks.

On Sat, Dec 29, 2012 at 5:29 AM, cwillu cwi...@cwillu.com wrote:
 On Fri, Dec 28, 2012 at 12:09 PM, Jordan Windsor jorda...@gmail.com wrote:
 Hello,
 I moved my btrfs to the beginning of my drive & updated the partition
 table & also restarted, I'm currently unable to mount it, here's the
 output in dmesg.

 [  481.513432] device label Storage devid 1 transid 116023 /dev/sdb1
 [  481.514277] btrfs: disk space caching is enabled
 [  481.522611] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [  481.522789] parent transid verify failed on 1229060423680 wanted
 116023 found 116027
 [  481.522790] btrfs: failed to read tree root on sdb1
 [  481.523656] btrfs: open_ctree failed

 What command should I run from here?

 The filesystem wasn't uncleanly unmounted, likely on an older kernel.

 Try mounting with -o recovery
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Recovering parent transid verify failed

2012-03-25 Thread Yo'av Moshe
Anything new? I'm still trying to fix my FS every once in a while, but
none of the tools helps.


 This is what find-root gives: http://pastebin.com/KycgzhaP

 Btrfsck still only gives this:
 # sudo ./btrfsck --repair /dev/sda4
 enabling repair mode
 parent transid verify failed on 216925220864 wanted 135714 found 135713
 parent transid verify failed on 216925220864 wanted 135714 found 135713
 parent transid verify failed on 216925220864 wanted 135714 found 135713
 parent transid verify failed on 216925220864 wanted 135714 found 135713
 Ignoring transid failure
 btrfsck: root-tree.c:46: btrfs_find_last_root: Assertion
`!(path->slots[0] == 0)' failed.

 Anymore details I can give you which will help resolving this? Thanks.

 Yo'av

 On 6 March 2011 11:02, Hugo Mills hugo-l...@carfax.org.uk wrote:

 On Sun, Mar 06, 2011 at 12:28:41PM +0200, Yo'av Moshe wrote:
  Hey,
  I'd start by saying that I know Btrfs is still experimental, and so
  there's no guarantee that one would be able to help me at all... But I
  thought I'll try anyway :-)
 
  Few months ago I bought a new laptop and installed ArchLinux on it,
  with Btrfs on the root filesystem... I know, it's not the smartest
  thing to do...
  After a few months I had issues with my hibernation scripts, and one
  day I tried to hibernate my computer but it didn't go that well, and,
  well, ever since then my Btrfs partition is not accessible.
  I opened up the Btrfs FAQ and saw that the fsck tool should be out by
  the end of 2010, and thought oh well, I could wait until then, and
  went on and installed Ubuntu with Ext4 on another small partition.
 
  But time goes on and the fsck tool is still in development... I've
  tried using the code from GIT and it didn't work, and I'm starting to
  wonder (a) if there's any hope at all and (b) what other steps I can
  take to recover my old Btrfs partition.

   Yes, there is hope. This error should be fixable with the new fsck.

  When trying to mount the Btrfs parition I get this in dmesg:
  [105252.779080] device fsid d14e78a602757297-bf762d859b406ca9 devid 1
  transid 135714 /dev/sda4
  [105252.818697] parent transid verify failed on 216925220864 wanted
  135714 found 135713
 [snip]
  Should I wait for btrfsck to be ready?

   Yes.

  Am I not using it correctly now?

   No, there's not a lot the current version can do right now.

  Is there anyway to recover this partition or should I just wipe it and
  reinstall Btrfs only when I'm supposed to?..
 
  Your help is appreciated.

   HTH,
   Hugo.

 --
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
     --- I am the author. You are the audience. I outrank you! ---





 --
 Yo'av Moshe



--
Yo'av Moshe
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corrupted btrfs volume: parent transid verify failed

2011-08-15 Thread David Pottage

 Regarding your backup regimen, consider using rsync instead of dd:
 after the initial backup, rsync can update the existing backup _much_
 more quickly, making it practical to do a backup every night, or even
 multiple times a day.  dd also has the downside of potentially
 _really_ confusing btrfs if it ever sees the backup and the original
 at the same time.

A still better option is to use an online backup service such as CrashPlan
or SpiderOak, as that way your backups are also safe from fire or theft.
Also, most will automatically create incremental backups several times per
hour, so that you can access old versions of your files easily.

CrashPlan has a free online backup service where you back up to a friend's
computer over the internet instead of to their servers. Another cheap
alternative for small and very important files is to email them to your
Google Mail account so you can retrieve lost files from any computer.

I know of one Comp Sci professor who advises all his students to use that
email method of backup for important theses and suchlike as well as any
other backup method. His argument is that if a student's roommate gets
arrested, then the cops are likely to take away all computers and backup
media, so in that case an online backup will be the only usable one.
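
For reference, the rsync-based approach quoted at the top of this message can
be as simple as a nightly command like the one below (paths are illustrative;
-aHAX preserves permissions, hard links, ACLs and xattrs):

rsync -aHAX --delete /home/ /mnt/backupdisk/home/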

-- 
David Pottage

Error compiling committee.c: too many arguments to function.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corrupted btrfs volume: parent transid verify failed

2011-08-14 Thread Yalonda Gishtaka
Halp!  I was recently forced to power cycle my desktop PC, and upon
restart, the btrfs /home volume would no longer mount, citing the
error "BUG: scheduling while atomic: mount /5584/0x2".  I
retrieved the latest btrfs-progs git repositories from
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
and http://git.darksatanic.net/repo/btrfs-progs-unstable.git -b
integration-20110805, but when running sudo ./btrfsck -s 1
/dev/mapper/home from either repo's build, I receive the error "parent
transid verify failed on 647363842048 wanted 210333 found 210302"
(repeated 3x).  I've also tried the flags -s 0, -s 1, and -s 2, all
with the same results.
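
For what it's worth, the -s flag just selects one of the three superblock
copies, which sit at fixed byte offsets (64KiB, 64MiB and 256GiB). A read-only
way to see whether each copy still carries the btrfs magic string, using the
device path from above and the standard offsets:

for off in 65536 67108864 274877906944; do
    printf '%s: ' "$off"
    dd if=/dev/mapper/home bs=1 skip=$((off + 64)) count=8 2>/dev/null   # magic sits 64 bytes into each copy
    echo
done
# each line should end in _BHRfS_M if that copy's header is still readable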

I take care to complete a full dd copy of my disk every 2 weeks, but
my previous backup is nearly 2 weeks old and I've put in almost 2
weeks of effort on my master's thesis since then.  I'm quite desperate
to recover this volume.  Any help is appreciated, as I've exhausted
the existing suggestions from the mailing list posts to date.  I've
tried to ask in #btrfs, but suspect that they're all sleepy bearded
people :(

Regards,
-Yalonda
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corrupted btrfs volume: parent transid verify failed

2011-08-14 Thread Fajar A. Nugraha
On Mon, Aug 15, 2011 at 4:13 AM, Yalonda Gishtaka
yalonda.gisht...@gmail.com wrote:
 Halp!  I was recently forced to power cycle my desktop PC, and upon
 restart, the btrfs /home volume would no longer mount, citing the
 error BUG: scheduling while atomic: mount /5584/0x2.  I
 retrieved the latest btrfs-progs git repositories from
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
 and http://git.darksatanic.net/repo/btrfs-progs-unstable.git -b
 integration-20110805, but when running sudo ./btrfsck -s 1
 /dev/mapper/home from either repo builds, I receive the error parent
 transid verify failed on 647363842048 wanted 210333 found 210302
 (repeated 3x).  I've also tried the flags -s 0, -s 1, and -s 2, all
 with the same results.

Is there something in the log about replaying the log? If yes, try btrfs-zero-log
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ
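
In other words, something like the following (device path taken from the
message above; btrfs-zero-log discards the log tree, so only use it if the
mount failure really happens during log replay):

dmesg | grep -i 'replay\|log tree'    # look for a crash or error during log replay
btrfs-zero-log /dev/mapper/home       # then retry the mount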

-- 
Fajar
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corrupted btrfs volume: parent transid verify failed

2011-08-14 Thread Yalonda Gishtaka
Fajar,

Thank you for the suggestion.  Unfortunately, running sudo
./btrfs-zero-log /dev/mapper/home results in the same "parent transid
verify failed on 647363842048 wanted 210333 found 210302" errors,
repeated 3 times.

I am running Arch Linux with the latest 3.0.1 kernel on a x86_64 machine.

Regards,
-Yalonda

On Sun, Aug 14, 2011 at 11:40 PM, Fajar A. Nugraha l...@fajar.net wrote:
 On Mon, Aug 15, 2011 at 4:13 AM, Yalonda Gishtaka
 yalonda.gisht...@gmail.com wrote:
 Halp!  I was recently forced to power cycle my desktop PC, and upon
 restart, the btrfs /home volume would no longer mount, citing the
 error BUG: scheduling while atomic: mount /5584/0x2.  I
 retrieved the latest btrfs-progs git repositories from
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
 and http://git.darksatanic.net/repo/btrfs-progs-unstable.git -b
 integration-20110805, but when running sudo ./btrfsck -s 1
 /dev/mapper/home from either repo builds, I receive the error parent
 transid verify failed on 647363842048 wanted 210333 found 210302
 (repeated 3x).  I've also tried the flags -s 0, -s 1, and -s 2, all
 with the same results.

 Is there something in the log about replaying log? If yes, try btrfs-zero-log
 https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

 --
 Fajar
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corrupted btrfs volume: parent transid verify failed

2011-08-14 Thread Yalonda Gishtaka
"Soon" seems a bit subjective given that the devs have been touting
this since the beginning of time.

/Helpful/ advice would be nice.

This blog posting
(http://stujordan.wordpress.com/2011/06/20/churning-the-butter/)
sounded promising, but none of the superblock copies on my btrfs
volume are ok, as I keep receiving the same "parent transid verify
failed" messages.  Will the soon-to-be-released btrfsck tool handle this case?

On Mon, Aug 15, 2011 at 1:10 AM, Michael Cronenworth m...@cchtml.com wrote:
 On 08/14/2011 04:13 PM, Yalonda Gishtaka wrote:

 I'm quite desperate
 to recover this volume.

 You should have had backups.

 Btrfs has no file system repair tool, but it is supposed to be out soon
 (tm). You will have to wait.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: corrupted btrfs volume: parent transid verify failed

2011-08-14 Thread Yalonda Gishtaka
Telling someone (that has a ~2 week stale backup) that they should
have kept backups is hardly constructive.  We're all aware there's no
official btrfs repair tool.  But it appears there has been some
hard, dedicated work towards this that has resulted in many commits
and patches.  I'm here to find out what there is to know about recent
developments that may help my current situation.  Please consider
offering helpful advice instead of pointing out the obvious about my
backup schedule.

Cheers,
-Yalonda

On Mon, Aug 15, 2011 at 1:51 AM, Michael Cronenworth m...@cchtml.com wrote:
 On 08/14/2011 06:32 PM, Yalonda Gishtaka wrote:

 /Helpful/  advice would be nice.

 Being hostile will net you zero advice.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-22 Thread Hugo Mills
On Tue, Jun 21, 2011 at 05:01:53PM +0200, Francesco R wrote:
 2011/6/21 Daniel Witzel dannyboy48...@gmail.com:
  Welcome to the club, I have a similar issue. We pretty much have to wait 
  for the
  fsck tool to finish being developed. If possible unhook the drives and leave
  them be until the tool is done. I don't know when it will be done as I am 
  not a
  developer, mearly a follower.
 
 
 There are tools to view the metadata stored as raid10? possibly in
 high level language?
 
 I see Chris Mason stopped git commits to btrfs-progs-unstable in 2010,
 there is someone working on it?

   There have been lots of commits and patches since then. The tmp
branch contains a bunch of commits from Chris, and the
integration-20110616 branch in my git repository[1] contains more or
less all of the other patches that have made it to this mailing list
since.

   Sadly, none of them contain the new btrfsck code. :(

   Hugo.

[1] http://git.darksatanic.net/repo/btrfs-progs-unstable.git/

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There are three mistaiks in this sentance. ---




Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-22 Thread Daniel Witzel
Well, I'm patient. Rather have a fsck that works than a fsck that may thrash the
FS, so no 'gun to the head' on this one. Some feedback on RECENT progress would
be nice. Besides your merge branch, Hugo (yes, I tried it, still no cigar...), it's
been a ghost town since December 2010.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-22 Thread Andrej Podzimek

Hello,

I am facing the same issue on a Btrfs RAID0 with 2 drives:

Label: 'root'  uuid: 1e26b203-fc1e-4ebf-9551-451bd34d3ac4
Total devices 2 FS bytes used 36.14GB
devid1 size 80.43GB used 41.65GB path /dev/sda6
devid2 size 80.43GB used 41.63GB path /dev/sdb6

Btrfs v0.19-36-g70c6c10-dirty

Tried btrfs-select-super -s 1 /dev/sd[ab]6, but that does not help at all. On 
both drives, its standard output is identical:

parent transid verify failed on 576901120 wanted 70669 found 70755
btrfs-select-super: disk-io.c:412: find_and_setup_root: Assertion 
`!(!root->node)' failed.
using SB copy 1, bytenr 67108864

This is the first error message from dmesg:

[  156.617407] parent transid verify failed on 576901120 wanted 70669 
found 70755
[  156.617504] parent transid verify failed on 576901120 wanted 70669 
found 70755
[  156.635322] btrfs: open_ctree failed

The problem occurred shortly after this issue: 
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg10618.html The 
machine booted (and worked normally) at least ten times between the error 
message and the current problem.

The kernel version was 2.6.39.1 when the first BUG message appeared in dmesg. I 
downgraded to 2.6.38.8 after that and everything seemed to work fine ... up to 
now.

Any suggestions? ;-) I can always restore the data from another machine with 
an identical installation. But first of all I'd like to understand this problem 
and know whether it can be dealt with somehow.

Andrej





[HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-21 Thread Francesco R
[HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

Hi list, I have a broken btrfs filesystem to deal with. Can someone
please help me recover the data?

The filesystem was created a couple of years ago with 4 devices
with the command at #create and is mounted with the #fstab options.
Recently I added a couple of devices and ran a `btrfs filesystem
balance`; after it succeeded I was doing a `btrfs device delete` on
space02 (the currently broken one), and in the middle of this the power
cable was axed.
After replacing the cable, 'space01' is mountable, 'space02' is not.

I tried to use a backup copy of the super with `btrfs-select-super`, but it
fails as reported in #btrfs-select-super.

Please, pretty please, do you have suggestions on what to try next?


#current kernel (vanilla + linux-vserver)
uname -a
Linux dobbia 2.6.38.8-vs2.3.0.37-rc17 #5 SMP Mon Jun 20 15:04:39 CEST
2011 x86_64 Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz GenuineIntel
GNU/Linux

#create
modprobe btrfs
mkfs.btrfs -L space01 -m raid10 -d raid10 $DEVICES1
mkfs.btrfs -L space02 -m raid10 -d raid10 $DEVICES2

# fstab
/dev/sda6 /mnt/space01 btrfs
defaults,device=/dev/sda6,device=/dev/sdc6,device=/dev/sdd1,device=/dev/sde1,device=/dev/sdf6,device=/dev/sdg6
0 0
/dev/sda7 /mnt/space02 btrfs
defaults,device=/dev/sda7,device=/dev/sdb7,device=/dev/sdc7,device=/dev/sdd2,device=/dev/sde2,device=/dev/sdf7,device=/dev/sdg7
0 0

# current layout
btrfs filesystem show
failed to read /dev/sr0
Label: 'space01'  uuid: c77c6e87-fccd-4204-bd2c-d924fe06be31
Total devices 6 FS bytes used 164.81GB
devid7 size 244.14GB used 56.59GB path /dev/sdf6
devid5 size 244.93GB used 56.59GB path /dev/sdd1
devid8 size 244.14GB used 56.59GB path /dev/sdg6
devid6 size 244.93GB used 56.59GB path /dev/sde1
devid4 size 244.14GB used 56.59GB path /dev/sda6
devid3 size 244.14GB used 56.59GB path /dev/sdc6

Label: 'space02'  uuid: f752def1-1abc-48c7-8ebb-47ba37b8ffa6
Total devices 7 FS bytes used 172.94GB
devid7 size 487.65GB used 0.00 path /dev/sdf7
devid6 size 488.94GB used 60.25GB path /dev/sde2
devid5 size 488.94GB used 58.75GB path /dev/sdd2
devid4 size 487.65GB used 60.26GB path /dev/sda7
devid7 size 487.65GB used 1.50GB path /dev/sdg7
devid2 size 487.65GB used 58.76GB path /dev/sdb7
devid3 size 487.65GB used 60.26GB path /dev/sdc7

Btrfs v0.19-35-g1b444cd-dirty

# first error messages
Jun 20 14:04:35 dobbia kernel: [  806.587580] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:04:35 dobbia kernel: [  806.629781] device label space02
devid 2 transid 756848 /dev/sdb7
Jun 20 14:04:35 dobbia kernel: [  806.630107] device label space02
devid 3 transid 757294 /dev/sdc7
Jun 20 14:04:35 dobbia kernel: [  806.652126] device label space02
devid 5 transid 756846 /dev/sdd2
Jun 20 14:04:37 dobbia kernel: [  808.201719] device label space02
devid 6 transid 757294 /dev/sde2
Jun 20 14:04:37 dobbia kernel: [  808.218108] device label space02
devid 7 transid 756846 /dev/sdf7
Jun 20 14:04:37 dobbia kernel: [  808.218433] device label space02
devid 7 transid 757294 /dev/sdg7
Jun 20 14:04:37 dobbia kernel: [  808.218715] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:04:37 dobbia kernel: [  808.271797] btrfs: failed to read
the system array on sdg7
Jun 20 14:04:37 dobbia kernel: [  808.293776] btrfs: open_ctree failed
Jun 20 14:04:56 dobbia kernel: [  827.190208] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:04:56 dobbia kernel: [  827.254517] btrfs: failed to read
the system array on sdg7
Jun 20 14:04:56 dobbia kernel: [  827.280152] btrfs: open_ctree failed
Jun 20 14:05:01 dobbia kernel: [  832.442454] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:05:01 dobbia kernel: [  832.502017] btrfs: failed to read
the system array on sdg7
Jun 20 14:05:01 dobbia kernel: [  832.521492] btrfs: open_ctree failed
Jun 20 14:05:20 dobbia kernel: [  851.113237] device label space02
devid 4 transid 757294 /dev/sda7
Jun 20 14:05:20 dobbia kernel: [  851.199478] btrfs: allowing degraded mounts
Jun 20 14:05:20 dobbia kernel: [  851.563583] parent transid verify
failed on 600755752960 wanted 757102 found 756726
Jun 20 14:05:20 dobbia kernel: [  851.564146] parent transid verify
failed on 600755752960 wanted 757102 found 756726
Jun 20 14:05:20 dobbia kernel: [  851.651006] btrfs bad tree block
start 0 600859951104
Jun 20 14:05:20 dobbia kernel: [  851.671362] parent transid verify
failed on 600859955200 wanted 756926 found 756726
Jun 20 14:05:20 dobbia kernel: [  851.671636] parent transid verify
failed on 600859955200 wanted 756926 found 756726
Jun 20 14:05:20 dobbia kernel: [  851.693515] btrfs bad tree block
start 0 601053986816
Jun 20 14:05:20 dobbia kernel: [  851.693559] btrfs bad tree block
start 0 601054003200
Jun 20 14:05:20 dobbia kernel: [  851.693566

Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-21 Thread Daniel Witzel
Welcome to the club, I have a similar issue. We pretty much have to wait for the
fsck tool to finish being developed. If possible, unhook the drives and leave
them be until the tool is done. I don't know when it will be done, as I am not a
developer, merely a follower.



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [HELP!] parent transid verify failed on 600755752960 wanted 757102 found 756726

2011-06-21 Thread Francesco R
2011/6/21 Daniel Witzel dannyboy48...@gmail.com:
 Welcome to the club, I have a similar issue. We pretty much have to wait for 
 the
 fsck tool to finish being developed. If possible unhook the drives and leave
 them be until the tool is done. I don't know when it will be done as I am not 
 a
 developer, mearly a follower.


Are there tools to view the metadata stored as raid10, possibly in a
high-level language?

I see Chris Mason stopped git commits to btrfs-progs-unstable in 2010;
is someone working on it?
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Having parent transid verify failed

2011-06-02 Thread Johannes Hirte
On Thursday 05 May 2011 22:32:42 Chris Mason wrote:
 Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400:
  I think i made some progress. When i tried to remove the directory that
  i suspect contains the problematic file, i got this on the console
  
  rm -rf serverloft/
 
 Ok, our one bad block is in the extent allocation tree.  This is going
 to be the very hardest thing to fix.
 
 Until I finish off the code to rebuild parts of the extent allocation
 tree, I think your best bet is to copy the files off.
 
 The big question is, what happened to make this error?  Can you describe
 your setup in more detail?
 
 -chris

It seems that I ran into the same problem:

parent transid verify failed on 32940560384 wanted 210334 found 210342
BUG: scheduling while atomic: chrome/17058/0x0002
Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device 
snd_pcm_oss snd_mixer_oss fuse dm_crypt dm_mod usbhid snd_intel8x0 
snd_ac97_codec sr_mod cdrom ac97_bus snd_pcm sg snd_timer snd e1000 fschmd 
uhci_hcd snd_page_alloc i2c_i801 [last unloaded: microcode]
Pid: 17058, comm: chrome Tainted: GW   2.6.39 #29
Call Trace:
 [c13cf70c] ? schedule+0x78/0x6ef
 [c11acabb] ? generic_make_request+0x1d5/0x22f
 [c11acbad] ? submit_bio+0x98/0x9f
 [c118026a] ? btrfs_map_bio+0x1ab/0x1b5
 [c13cfdc2] ? io_schedule+0x3f/0x50
 [c105723d] ? sleep_on_page+0x5/0x8
 [c13d0292] ? __wait_on_bit+0x31/0x58
 [c1057238] ? __lock_page+0x52/0x52
 [c1057388] ? wait_on_page_bit+0x5a/0x62
 [c1037f92] ? autoremove_wake_function+0x29/0x29
 [c117ab39] ? read_extent_buffer_pages+0x33a/0x3b5
 [c115891f] ? btree_read_extent_buffer_pages.clone.51+0x44/0x9e
 [c11578b0] ? verify_parent_transid+0x147/0x147
 [c11593aa] ? read_tree_block+0x2d/0x3e
 [c1144f90] ? read_block_for_search.clone.36+0xc3/0x35d
 [c11863bf] ? btrfs_tree_unlock+0x19/0x3a
 [c11420bb] ? unlock_up+0x88/0x9f
 [c1146f7e] ? btrfs_search_slot+0x39d/0x4fe
 [c1149fa1] ? lookup_inline_extent_backref+0x116/0x49b
 [c11773b0] ? set_extent_dirty+0x19/0x1d
 [c114cbd0] ? __btrfs_free_extent+0xe2/0x6c6
 [c114fa28] ? run_clustered_refs+0x6ad/0x720
 [c1191330] ? btrfs_find_ref_cluster+0x53/0x11f
 [c114fb53] ? btrfs_run_delayed_refs+0xb8/0x18d
 [c115d395] ? __btrfs_end_transaction+0x5a/0x17f
 [c115d4dc] ? btrfs_end_transaction+0x9/0xb
 [c1165e19] ? btrfs_evict_inode+0x190/0x1a7
 [c1092c45] ? evict+0x56/0xeb
 [c108baa8] ? do_unlinkat+0xc3/0x103
 [c13d1c90] ? sysenter_do_call+0x12/0x26
 [c13d] ? console_conditional_schedule+0x8/0xf
parent transid verify failed on 32940560384 wanted 210334 found 210342
BUG: scheduling while atomic: chrome/17058/0x0002
Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device 
snd_pcm_oss snd_mixer_oss fuse dm_crypt dm_mod usbhid snd_intel8x0 
snd_ac97_codec sr_mod cdrom ac97_bus snd_pcm sg snd_timer snd e1000 fschmd 
uhci_hcd snd_page_alloc i2c_i801 [last unloaded: microcode]
Pid: 17058, comm: chrome Tainted: GW   2.6.39 #29
Call Trace:
 [c13cf70c] ? schedule+0x78/0x6ef
 [c11acabb] ? generic_make_request+0x1d5/0x22f
 [c11acbad] ? submit_bio+0x98/0x9f
 [c118026a] ? btrfs_map_bio+0x1ab/0x1b5
 [c13cfdc2] ? io_schedule+0x3f/0x50
 [c105723d] ? sleep_on_page+0x5/0x8
 [c13d0292] ? __wait_on_bit+0x31/0x58
 [c1057238] ? __lock_page+0x52/0x52
 [c1057388] ? wait_on_page_bit+0x5a/0x62
 [c1037f92] ? autoremove_wake_function+0x29/0x29
 [c117ab39] ? read_extent_buffer_pages+0x33a/0x3b5
 [c116bd50] ? lookup_extent_mapping+0x5a/0x148
 [c115891f] ? btree_read_extent_buffer_pages.clone.51+0x44/0x9e
 [c11578b0] ? verify_parent_transid+0x147/0x147
 [c11593aa] ? read_tree_block+0x2d/0x3e
 [c1144f90] ? read_block_for_search.clone.36+0xc3/0x35d
 [c11863bf] ? btrfs_tree_unlock+0x19/0x3a
 [c11420bb] ? unlock_up+0x88/0x9f
 [c1146f7e] ? btrfs_search_slot+0x39d/0x4fe
 [c1149fa1] ? lookup_inline_extent_backref+0x116/0x49b
 [c11773b0] ? set_extent_dirty+0x19/0x1d
 [c114cbd0] ? __btrfs_free_extent+0xe2/0x6c6
 [c114fa28] ? run_clustered_refs+0x6ad/0x720
 [c1191330] ? btrfs_find_ref_cluster+0x53/0x11f
 [c114fb53] ? btrfs_run_delayed_refs+0xb8/0x18d
 [c115d395] ? __btrfs_end_transaction+0x5a/0x17f
 [c115d4dc] ? btrfs_end_transaction+0x9/0xb
 [c1165e19] ? btrfs_evict_inode+0x190/0x1a7
 [c1092c45] ? evict+0x56/0xeb
 [c108baa8] ? do_unlinkat+0xc3/0x103
 [c13d1c90] ? sysenter_do_call+0x12/0x26
 [c13d] ? console_conditional_schedule+0x8/0xf
parent transid verify failed on 32940560384 wanted 210334 found 210342
BUG: scheduling while atomic: chrome/17058/0x0002
Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device 
snd_pcm_oss snd_mixer_oss fuse dm_crypt dm_mod usbhid snd_intel8x0 
snd_ac97_codec sr_mod cdrom ac97_bus snd_pcm sg snd_timer snd e1000 fschmd 
uhci_hcd snd_page_alloc i2c_i801 [last unloaded: microcode]
Pid: 17058, comm: chrome Tainted: GW   2.6.39 #29
Call Trace:
 [c13cf70c] ? schedule+0x78/0x6ef
 [c11acabb] ? generic_make_request+0x1d5/0x22f
 [c11acbad] ? submit_bio+0x98/0x9f
 [c118026a

cannot mount btrfs - parent transid verify failed

2011-05-23 Thread Robert Schöftner
hello!

A power outage damaged a btrfs - it could not be mounted upon startup.
kernel: 2.6.38 (from the Ubuntu kernel PPA).

dmesg:

[   88.562819] device fsid 844676ff057abdd4-ccd6cf8af4e14dba devid 1
transid 112504 /dev/sdb1
[   88.596515] verify_parent_transid: 6 callbacks suppressed
[   88.596518] parent transid verify failed on 408626470912 wanted
24 found 111474
[   88.596686] parent transid verify failed on 408626470912 wanted
24 found 111474
[   88.600062] parent transid verify failed on 408626470912 wanted
24 found 111474
[   88.600067] parent transid verify failed on 408626470912 wanted
24 found 111474
[   88.670071] btrfs: open_ctree failed

I compiled the latest btrfs-progs-unstable and tried btrfsck:

root@tesla:/root/btrfs-progs-unstable# ./btrfsck /dev/sdb1
parent transid verify failed on 408626470912 wanted 24 found 111474
parent transid verify failed on 408626470912 wanted 24 found 111474
parent transid verify failed on 408626470912 wanted 24 found 111474
btrfsck: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)'
failed.
Aborted

root@tesla:/root/btrfs-progs-unstable# ./btrfsck -s 1 /dev/sdb1
using SB copy 1, bytenr 67108864
parent transid verify failed on 408626470912 wanted 24 found 111474
parent transid verify failed on 408626470912 wanted 24 found 111474
parent transid verify failed on 408626470912 wanted 24 found 111474
btrfsck: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)'
failed.
Aborted

root@tesla:/root/btrfs-progs-unstable# ./btrfsck -s 2 /dev/sdb1
using SB copy 2, bytenr 274877906944
parent transid verify failed on 408626470912 wanted 24 found 111474
parent transid verify failed on 408626470912 wanted 24 found 111474
parent transid verify failed on 408626470912 wanted 24 found 111474
btrfsck: disk-io.c:416: find_and_setup_root: Assertion `!(!root->node)'
failed.
Aborted


root@tesla:/root/btrfs-progs-unstable# ./btrfs filesystem show /dev/sdb1
failed to read /dev/sr0
Label: none  uuid: d4bd7a05-ff76-4684-ba4d-e1f48acfd6cc
Total devices 1 FS bytes used 544.12GB
devid1 size 931.51GB used 547.79GB path /dev/sdb1

Btrfs v0.19-36-g70c6c10

The data on this filesystem is not important, though it would be nice if
I could regain access to it. Is there anything I can try to salvage it?

thanks

Robert
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Having parent transid verify failed

2011-05-05 Thread Konstantinos Skarlatos
Hello, I have a 5.5TB Btrfs filesystem on top of an md-raid 5 device. Now 
if I run some file operations like find, I get these messages.

The kernel is 2.6.38.5-1 on Arch Linux.

May  5 14:15:12 mail kernel: [13559.089713] parent transid verify failed 
on 3062073683968 wanted 5181 found 5188
May  5 14:15:12 mail kernel: [13559.089834] parent transid verify failed 
on 3062073683968 wanted 5181 found 5188
May  5 14:15:14 mail kernel: [13560.752074] btrfs-transacti D 
88007211ac78 0  5339  2 0x
May  5 14:15:14 mail kernel: [13560.752078]  880023167d30 
0046 8800 8800195b6000
May  5 14:15:14 mail kernel: [13560.752082]  880023167c10 
02c8f27b4000 880023167fd8 88007211a9a0
May  5 14:15:14 mail kernel: [13560.752085]  880023167fd8 
880023167fd8 88007211ac80 880023167fd8

May  5 14:15:14 mail kernel: [13560.752087] Call Trace:
May  5 14:15:14 mail kernel: [13560.752101]  [a0850d02] ? 
run_clustered_refs+0x132/0x830 [btrfs]
May  5 14:15:14 mail kernel: [13560.752105]  [813aff3d] 
schedule_timeout+0x2fd/0x380
May  5 14:15:14 mail kernel: [13560.752108]  [813b0cf9] ? 
mutex_unlock+0x9/0x10
May  5 14:15:14 mail kernel: [13560.752115]  [a087e9f4] ? 
btrfs_run_ordered_operations+0x1f4/0x210 [btrfs]
May  5 14:15:14 mail kernel: [13560.752122]  [a0860fa3] 
btrfs_commit_transaction+0x263/0x750 [btrfs]
May  5 14:15:14 mail kernel: [13560.752126]  [81079ff0] ? 
autoremove_wake_function+0x0/0x40
May  5 14:15:14 mail kernel: [13560.752131]  [a085a9bd] 
transaction_kthread+0x26d/0x290 [btrfs]
May  5 14:15:14 mail kernel: [13560.752137]  [a085a750] ? 
transaction_kthread+0x0/0x290 [btrfs]
May  5 14:15:14 mail kernel: [13560.752139]  [81079717] 
kthread+0x87/0x90
May  5 14:15:14 mail kernel: [13560.752142]  [8100bc24] 
kernel_thread_helper+0x4/0x10
May  5 14:15:14 mail kernel: [13560.752145]  [81079690] ? 
kthread+0x0/0x90
May  5 14:15:14 mail kernel: [13560.752147]  [8100bc20] ? 
kernel_thread_helper+0x0/0x10
May  5 14:15:17 mail kernel: [13564.092081] verify_parent_transid: 40736 
callbacks suppressed
May  5 14:15:17 mail kernel: [13564.092084] parent transid verify failed 
on 3062073683968 wanted 5181 found 5188


--snip--
May  5 14:17:13 mail kernel: [13679.169772] parent transid verify failed 
on 3062073683968 wanted 5181 found 5188

--snip--
May  5 14:17:14 mail kernel: [13680.751996] btrfs-transacti D 
88007211ac78 0  5339  2 0x
May  5 14:17:14 mail kernel: [13680.752000]  880023167d30 
0046 8800 8800195b6000
May  5 14:17:14 mail kernel: [13680.752004]  880023167c10 
02c8f27b4000 880023167fd8 88007211a9a0
May  5 14:17:14 mail kernel: [13680.752006]  880023167fd8 
880023167fd8 88007211ac80 880023167fd8

May  5 14:17:14 mail kernel: [13680.752009] Call Trace:
May  5 14:17:14 mail kernel: [13680.752024]  [a0850d02] ? 
run_clustered_refs+0x132/0x830 [btrfs]
May  5 14:17:14 mail kernel: [13680.752030]  [813aff3d] 
schedule_timeout+0x2fd/0x380
May  5 14:17:14 mail kernel: [13680.752032]  [813b0cf9] ? 
mutex_unlock+0x9/0x10
May  5 14:17:14 mail kernel: [13680.752040]  [a087e9f4] ? 
btrfs_run_ordered_operations+0x1f4/0x210 [btrfs]
May  5 14:17:14 mail kernel: [13680.752046]  [a0860fa3] 
btrfs_commit_transaction+0x263/0x750 [btrfs]
May  5 14:17:14 mail kernel: [13680.752051]  [81079ff0] ? 
autoremove_wake_function+0x0/0x40
May  5 14:17:14 mail kernel: [13680.752057]  [a085a9bd] 
transaction_kthread+0x26d/0x290 [btrfs]
May  5 14:17:14 mail kernel: [13680.752062]  [a085a750] ? 
transaction_kthread+0x0/0x290 [btrfs]
May  5 14:17:14 mail kernel: [13680.752065]  [81079717] 
kthread+0x87/0x90
May  5 14:17:14 mail kernel: [13680.752068]  [8100bc24] 
kernel_thread_helper+0x4/0x10
May  5 14:17:14 mail kernel: [13680.752070]  [81079690] ? 
kthread+0x0/0x90
May  5 14:17:14 mail kernel: [13680.752072]  [8100bc20] ? 
kernel_thread_helper+0x0/0x10
May  5 14:17:14 mail kernel: [13680.752079] dd  D 
8800714c4838 0  5792   5740 0x0004
May  5 14:17:14 mail kernel: [13680.752082]  88006a205b38 
0082 88006a205af8 0246
May  5 14:17:14 mail kernel: [13680.752085]  ea00017f57e8 
88006a205fd8 88006a205fd8 8800714c4560
May  5 14:17:14 mail kernel: [13680.752088]  88006a205fd8 
88006a205fd8 8800714c4840 88006a205fd8

May  5 14:17:14 mail kernel: [13680.752090] Call Trace:
May  5 14:17:14 mail kernel: [13680.752095]  [810ff145] ? 
zone_statistics+0x75/0x90
May  5 14:17:14 mail kernel: [13680.752098]  [810ea8b7] ? 
get_page_from_freelist+0x3c7/0x820
May  5 14:17:14 mail kernel: [13680.752101]  [810e3588] ? 
find_get_page+0x68/0xb0
May  5 14:17:14 mail kernel: [13680.752108]  [a08603f9

Re: Having parent transid verify failed

2011-05-05 Thread Chris Mason
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:19:52 -0400:
 Hello, I have a 5.5TB Btrfs filesystem on top of a md-raid 5 device. Now 
 if i run some file operations like find, i get these messages.
 kernel is 2.6.38.5-1 on arch linux

Are all of the messages for this one block?

parent transid verify failed on 3062073683968 wanted 5181 found 5188

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Having parent transid verify failed

2011-05-05 Thread Konstantinos Skarlatos



On 5/5/2011 2:42 PM, Chris Mason wrote:

Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:19:52 -0400:

Hello, I have a 5.5TB Btrfs filesystem on top of a md-raid 5 device. Now
if i run some file operations like find, i get these messages.
kernel is 2.6.38.5-1 on arch linux


Are all of the messages for this one block?

parent transid verify failed on 3062073683968 wanted 5181 found 5188

yes, only this block


-chris

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Having parent transid verify failed

2011-05-05 Thread Chris Mason
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:45:08 -0400:
 
 On 5/5/2011 2:42 PM, Chris Mason wrote:
  Excerpts from Konstantinos Skarlatos's message of 2011-05-05 07:19:52 -0400:
  Hello, I have a 5.5TB Btrfs filesystem on top of a md-raid 5 device. Now
  if i run some file operations like find, i get these messages.
  kernel is 2.6.38.5-1 on arch linux
 
  Are all of the messages for this one block?
 
  parent transid verify failed on 3062073683968 wanted 5181 found 5188
 yes, only this block

Ok, what were the call traces in there?  Was there an oops or a hung
task?  It looks like part of the messages are missing.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Having parent transid verify failed

2011-05-05 Thread Chris Mason
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 10:27:30 -0400:
 attached you can find the whole dmesg log. I can trigger the error again 
 if more logs are needed

Yes, I'll send you a patch to get rid of the printk for the transid
failed message.  That way we can get a clean view of the other errors.

Will you be able to compile/test it?

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Having parent transid verify failed

2011-05-05 Thread Konstantinos Skarlatos



On 5/5/2011 6:06 PM, Chris Mason wrote:

Excerpts from Konstantinos Skarlatos's message of 2011-05-05 10:27:30 -0400:

attached you can find the whole dmesg log. I can trigger the error again
if more logs are needed


Yes, I'll send you a patch to get rid of the printk for the transid
failed message.  That way we can get a clean view of the other errors.

Will you be able to compile/test it?


Yes, I think I will be able to make it, but because I have only done 
this once, and in a quite hackish way, I may need some help in order to 
do it right.
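
The usual way to test a small btrfs patch like that is to apply it to the
matching kernel source and rebuild; a rough sketch, where the source path and
the patch file name are assumptions, not the actual patch:

cd /usr/src/linux-2.6.38.5              # source tree matching the running kernel (assumed path)
patch -p1 < quiet-transid-printk.patch  # hypothetical file name for the patch from Chris
make oldconfig
make -j4 bzImage modules
make modules_install install            # then reboot into the rebuilt kernel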




-chris

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Having parent transid verify failed

2011-05-05 Thread Konstantinos Skarlatos
I think I made some progress. When I tried to remove the directory that 
I suspect contains the problematic file, I got this on the console:


rm -rf serverloft/

2011 May  5 23:32:53 mail [  200.580195] Oops:  [#1] PREEMPT SMP
2011 May  5 23:32:53 mail [  200.580220] last sysfs file: 
/sys/module/vt/parameters/default_utf8

2011 May  5 23:32:53 mail [  200.581145] Stack:
2011 May  5 23:32:53 mail [  200.581276] Call Trace:
2011 May  5 23:32:53 mail [  200.581732] Code: cc 00 00 48 8d 91 28 e0 
ff ff 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 65 e0 48 89 f3 4c 
89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8b 76 30 83 42 1c 01 48 b8 00 00 
00 00 00 16 00 00 48 01 f0

2011 May  5 23:32:53 mail [  200.583376] CR2: 0030


Here is the part of dmesg that does not contain the thousands of 
"parent transid verify failed" messages:



May  5 23:32:51 mail kernel: [  198.371084] parent transid verify failed 
on 3062073683968 wanted 5181 found 5188
May  5 23:32:51 mail kernel: [  198.371204] parent transid verify failed 
on 3062073683968 wanted 5181 found 5188
May  5 23:32:53 mail kernel: [  200.572774] Modules linked in: ipv6 
btrfs zlib_deflate crc32c libcrc32c ext2 raid456 async_raid6_recov 
async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod usb_storage 
uas snd_seq_dummy snd_seq_oss radeon snd_seq_midi_event ttm snd_seq 
snd_hda_codec_hdmi snd_seq_device drm_kms_helper ohci_hcd snd_hda_intel 
snd_hda_codec snd_pcm_oss snd_hwdep drm i2c_algo_bit snd_mixer_oss 
snd_pcm i2c_piix4 snd_timer snd soundcore snd_page_alloc ehci_hcd wmi 
i2c_core usbcore evdev processor button k10temp serio_raw pcspkr sg 
r8169 edac_core shpchp pci_hotplug edac_mce_amd mii sp5100_tco ext4 
mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod
May  5 23:32:53 mail kernel: [  200.572808] Pid: 1037, comm: 
btrfs-transacti Not tainted 2.6.38-ARCH #1

May  5 23:32:53 mail kernel: [  200.572810] Call Trace:
May  5 23:32:53 mail kernel: [  200.572817]  [813a932b] ? 
__schedule_bug+0x59/0x5d
May  5 23:32:53 mail kernel: [  200.572820]  [813af827] ? 
schedule+0x9f7/0xad0
May  5 23:32:53 mail kernel: [  200.572823]  [811e5827] ? 
generic_unplug_device+0x37/0x40
May  5 23:32:53 mail kernel: [  200.572827]  [a07ac164] ? 
md_raid5_unplug_device+0x64/0x110 [raid456]
May  5 23:32:53 mail kernel: [  200.572830]  [a07ac223] ? 
raid5_unplug_queue+0x13/0x20 [raid456]
May  5 23:32:53 mail kernel: [  200.572833]  [81012d79] ? 
read_tsc+0x9/0x20
May  5 23:32:53 mail kernel: [  200.572837]  [8108418c] ? 
ktime_get_ts+0xac/0xe0
May  5 23:32:53 mail kernel: [  200.572840]  [810e36c0] ? 
sync_page+0x0/0x50
May  5 23:32:53 mail kernel: [  200.572842]  [813af96e] ? 
io_schedule+0x6e/0xb0
May  5 23:32:53 mail kernel: [  200.572844]  [810e36fb] ? 
sync_page+0x3b/0x50
May  5 23:32:53 mail kernel: [  200.572846]  [813b0077] ? 
__wait_on_bit+0x57/0x80
May  5 23:32:53 mail kernel: [  200.572848]  [810e38c0] ? 
wait_on_page_bit+0x70/0x80
May  5 23:32:53 mail kernel: [  200.572851]  [8107a030] ? 
wake_bit_function+0x0/0x40
May  5 23:32:53 mail kernel: [  200.572861]  [a08348d2] ? 
read_extent_buffer_pages+0x412/0x480 [btrfs]
May  5 23:32:53 mail kernel: [  200.572867]  [a0809e00] ? 
btree_get_extent+0x0/0x1b0 [btrfs]
May  5 23:32:53 mail kernel: [  200.572873]  [a080ac7e] ? 
btree_read_extent_buffer_pages.isra.60+0x5e/0xb0 [btrfs]
May  5 23:32:53 mail kernel: [  200.572880]  [a080c0bc] ? 
read_tree_block+0x3c/0x60 [btrfs]
May  5 23:32:53 mail kernel: [  200.572884]  [a07f272b] ? 
read_block_for_search.isra.34+0x1fb/0x410 [btrfs]
May  5 23:32:53 mail kernel: [  200.572890]  [a08417d1] ? 
btrfs_tree_unlock+0x51/0x60 [btrfs]
May  5 23:32:53 mail kernel: [  200.572895]  [a07f5ca0] ? 
btrfs_search_slot+0x430/0xa30 [btrfs]
May  5 23:32:53 mail kernel: [  200.572900]  [a07fb3a6] ? 
lookup_inline_extent_backref+0x96/0x460 [btrfs]
May  5 23:32:53 mail kernel: [  200.572904]  [8112b8d3] ? 
kmem_cache_alloc+0x133/0x150
May  5 23:32:53 mail kernel: [  200.572908]  [a07fd452] ? 
__btrfs_free_extent+0xc2/0x6d0 [btrfs]
May  5 23:32:53 mail kernel: [  200.572914]  [a0800f59] ? 
run_clustered_refs+0x389/0x830 [btrfs]
May  5 23:32:53 mail kernel: [  200.572920]  [a084d900] ? 
btrfs_find_ref_cluster+0x10/0x190 [btrfs]
May  5 23:32:53 mail kernel: [  200.572925]  [a08014c0] ? 
btrfs_run_delayed_refs+0xc0/0x210 [btrfs]
May  5 23:32:53 mail kernel: [  200.572927]  [813b0cf9] ? 
mutex_unlock+0x9/0x10
May  5 23:32:53 mail kernel: [  200.572933]  [a0810db8] ? 
btrfs_commit_transaction+0x78/0x750 [btrfs]
May  5 23:32:53 mail kernel: [  200.572936]  [81079ff0] ? 
autoremove_wake_function+0x0/0x40
May  5 23:32:53 mail kernel: [  200.572941]  [a080a9bd] ? 
transaction_kthread+0x26d/0x290 [btrfs]
May  5 23:32:53 mail kernel

Re: Having parent transid verify failed

2011-05-05 Thread Chris Mason
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400:
 I think i made some progress. When i tried to remove the directory that 
 i suspect contains the problematic file, i got this on the console
 
 rm -rf serverloft/

Ok, our one bad block is in the extent allocation tree.  This is going
to be the very hardest thing to fix.

Until I finish off the code to rebuild parts of the extent allocation
tree, I think your best bet is to copy the files off.
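
Since the filesystem still mounts here, copying off can be as plain as a
read-only mount plus an archive copy; a sketch, with the md device from this
thread and a placeholder destination:

mount -o ro /dev/md0 /mnt/btrfs-ro
cp -a /mnt/btrfs-ro/. /mnt/spare/rescue/    # or rsync -aHAX for restartable copies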

The big question is, what happened to make this error?  Can you describe
your setup in more detail?

-chris

 
 2011 May  5 23:32:53 mail [  200.580195] Oops:  [#1] PREEMPT SMP
 2011 May  5 23:32:53 mail [  200.580220] last sysfs file: 
 /sys/module/vt/parameters/default_utf8
 2011 May  5 23:32:53 mail [  200.581145] Stack:
 2011 May  5 23:32:53 mail [  200.581276] Call Trace:
 2011 May  5 23:32:53 mail [  200.581732] Code: cc 00 00 48 8d 91 28 e0 
 ff ff 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 65 e0 48 89 f3 4c 
 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8b 76 30 83 42 1c 01 48 b8 00 00 
 00 00 00 16 00 00 48 01 f0
 2011 May  5 23:32:53 mail [  200.583376] CR2: 0030
 
 
 here is the  part of dmesg that does not contain the  thousands of 
 parent transid verify failed messages
 
 
 May  5 23:32:51 mail kernel: [  198.371084] parent transid verify failed 
 on 3062073683968 wanted 5181 found 5188
 May  5 23:32:51 mail kernel: [  198.371204] parent transid verify failed 
 on 3062073683968 wanted 5181 found 5188
 May  5 23:32:53 mail kernel: [  200.572774] Modules linked in: ipv6 
 btrfs zlib_deflate crc32c libcrc32c ext2 raid456 async_raid6_recov 
 async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod usb_storage 
 uas snd_seq_dummy snd_seq_oss radeon snd_seq_midi_event ttm snd_seq 
 snd_hda_codec_hdmi snd_seq_device drm_kms_helper ohci_hcd snd_hda_intel 
 snd_hda_codec snd_pcm_oss snd_hwdep drm i2c_algo_bit snd_mixer_oss 
 snd_pcm i2c_piix4 snd_timer snd soundcore snd_page_alloc ehci_hcd wmi 
 i2c_core usbcore evdev processor button k10temp serio_raw pcspkr sg 
 r8169 edac_core shpchp pci_hotplug edac_mce_amd mii sp5100_tco ext4 
 mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod
 May  5 23:32:53 mail kernel: [  200.572808] Pid: 1037, comm: 
 btrfs-transacti Not tainted 2.6.38-ARCH #1
 May  5 23:32:53 mail kernel: [  200.572810] Call Trace:
 May  5 23:32:53 mail kernel: [  200.572817]  [813a932b] ? 
 __schedule_bug+0x59/0x5d
 May  5 23:32:53 mail kernel: [  200.572820]  [813af827] ? 
 schedule+0x9f7/0xad0
 May  5 23:32:53 mail kernel: [  200.572823]  [811e5827] ? 
 generic_unplug_device+0x37/0x40
 May  5 23:32:53 mail kernel: [  200.572827]  [a07ac164] ? 
 md_raid5_unplug_device+0x64/0x110 [raid456]
 May  5 23:32:53 mail kernel: [  200.572830]  [a07ac223] ? 
 raid5_unplug_queue+0x13/0x20 [raid456]
 May  5 23:32:53 mail kernel: [  200.572833]  [81012d79] ? 
 read_tsc+0x9/0x20
 May  5 23:32:53 mail kernel: [  200.572837]  [8108418c] ? 
 ktime_get_ts+0xac/0xe0
 May  5 23:32:53 mail kernel: [  200.572840]  [810e36c0] ? 
 sync_page+0x0/0x50
 May  5 23:32:53 mail kernel: [  200.572842]  [813af96e] ? 
 io_schedule+0x6e/0xb0
 May  5 23:32:53 mail kernel: [  200.572844]  [810e36fb] ? 
 sync_page+0x3b/0x50
 May  5 23:32:53 mail kernel: [  200.572846]  [813b0077] ? 
 __wait_on_bit+0x57/0x80
 May  5 23:32:53 mail kernel: [  200.572848]  [810e38c0] ? 
 wait_on_page_bit+0x70/0x80
 May  5 23:32:53 mail kernel: [  200.572851]  [8107a030] ? 
 wake_bit_function+0x0/0x40
 May  5 23:32:53 mail kernel: [  200.572861]  [a08348d2] ? 
 read_extent_buffer_pages+0x412/0x480 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572867]  [a0809e00] ? 
 btree_get_extent+0x0/0x1b0 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572873]  [a080ac7e] ? 
 btree_read_extent_buffer_pages.isra.60+0x5e/0xb0 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572880]  [a080c0bc] ? 
 read_tree_block+0x3c/0x60 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572884]  [a07f272b] ? 
 read_block_for_search.isra.34+0x1fb/0x410 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572890]  [a08417d1] ? 
 btrfs_tree_unlock+0x51/0x60 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572895]  [a07f5ca0] ? 
 btrfs_search_slot+0x430/0xa30 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572900]  [a07fb3a6] ? 
 lookup_inline_extent_backref+0x96/0x460 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572904]  [8112b8d3] ? 
 kmem_cache_alloc+0x133/0x150
 May  5 23:32:53 mail kernel: [  200.572908]  [a07fd452] ? 
 __btrfs_free_extent+0xc2/0x6d0 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572914]  [a0800f59] ? 
 run_clustered_refs+0x389/0x830 [btrfs]
 May  5 23:32:53 mail kernel: [  200.572920]  [a084d900] ? 
 btrfs_find_ref_cluster+0x10/0x190 [btrfs]
 May  5 23:32:53 mail kernel

Re: Having parent transid verify failed

2011-05-05 Thread Konstantinos Skarlatos

On 5/5/2011 11:32 PM, Chris Mason wrote:

Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400:

I think i made some progress. When i tried to remove the directory that
i suspect contains the problematic file, i got this on the console

rm -rf serverloft/


Ok, our one bad block is in the extent allocation tree.  This is going
to be the very hardest thing to fix.

Until I finish off the code to rebuild parts of the extent allocation
tree, I think your best bet is to copy the files off.

The big question is, what happened to make this error?  Can you describe
your setup in more detail?


I created this btrfs filesystem on an Arch Linux system (amd64, quad
core) with kernel 2.6.38.1. It is on top of an md RAID 5.


[root@linuxserver ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[3] sdc1[1] sda1[0] sdf1[4]
  5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 
[4/4] []


the raid was grown from 3 devices to 4, and then btrfs was grown to max 
size. mount options were clear_cache,compress-force.


I was investigating a performance issue that i had, because over the 
network i could only write to the filesystem at about 32mb/sec.


when writing btrfs-delalloc- cpu usage was at 100%.

While investigating i disabled compression, enabled space_cache and 
tried zlib compression, and various combinations, while copying large 
files back and forth using samba.


BTW I tried to change some mount options using mount -o remount but 
although the new options were printed on dmesg i think that they were 
not enabled.


I got the first error when i was copying some files and at the same time 
created a directory over samba. After a while i upgraded to 2.6.38.5 but 
nothing seems to have changed.


I really don't think there is a hardware error here, but to be safe I am 
now running a check on the raid





-chris



2011 May  5 23:32:53 mail [  200.580195] Oops:  [#1] PREEMPT SMP
2011 May  5 23:32:53 mail [  200.580220] last sysfs file:
/sys/module/vt/parameters/default_utf8
2011 May  5 23:32:53 mail [  200.581145] Stack:
2011 May  5 23:32:53 mail [  200.581276] Call Trace:
2011 May  5 23:32:53 mail [  200.581732] Code: cc 00 00 48 8d 91 28 e0
ff ff 48 89 e5 48 81 ec 90 00 00 00 48 89 5d d8 4c 89 65 e0 48 89 f3 4c
89 6d e8 4c 89 75 f0 4c 89 7d f848  8b 76 30 83 42 1c 01 48 b8 00 00
00 00 00 16 00 00 48 01 f0
2011 May  5 23:32:53 mail [  200.583376] CR2: 0030


Here is the part of dmesg that does not contain the thousands of
parent transid verify failed messages:


May  5 23:32:51 mail kernel: [  198.371084] parent transid verify failed
on 3062073683968 wanted 5181 found 5188
May  5 23:32:51 mail kernel: [  198.371204] parent transid verify failed
on 3062073683968 wanted 5181 found 5188
May  5 23:32:53 mail kernel: [  200.572774] Modules linked in: ipv6
btrfs zlib_deflate crc32c libcrc32c ext2 raid456 async_raid6_recov
async_pq raid6_pq async_xor xor async_memcpy async_tx md_mod usb_storage
uas snd_seq_dummy snd_seq_oss radeon snd_seq_midi_event ttm snd_seq
snd_hda_codec_hdmi snd_seq_device drm_kms_helper ohci_hcd snd_hda_intel
snd_hda_codec snd_pcm_oss snd_hwdep drm i2c_algo_bit snd_mixer_oss
snd_pcm i2c_piix4 snd_timer snd soundcore snd_page_alloc ehci_hcd wmi
i2c_core usbcore evdev processor button k10temp serio_raw pcspkr sg
r8169 edac_core shpchp pci_hotplug edac_mce_amd mii sp5100_tco ext4
mbcache jbd2 crc16 sd_mod pata_acpi ahci libahci pata_atiixp libata scsi_mod
May  5 23:32:53 mail kernel: [  200.572808] Pid: 1037, comm:
btrfs-transacti Not tainted 2.6.38-ARCH #1
May  5 23:32:53 mail kernel: [  200.572810] Call Trace:
May  5 23:32:53 mail kernel: [  200.572817]  [813a932b] ?
__schedule_bug+0x59/0x5d
May  5 23:32:53 mail kernel: [  200.572820]  [813af827] ?
schedule+0x9f7/0xad0
May  5 23:32:53 mail kernel: [  200.572823]  [811e5827] ?
generic_unplug_device+0x37/0x40
May  5 23:32:53 mail kernel: [  200.572827]  [a07ac164] ?
md_raid5_unplug_device+0x64/0x110 [raid456]
May  5 23:32:53 mail kernel: [  200.572830]  [a07ac223] ?
raid5_unplug_queue+0x13/0x20 [raid456]
May  5 23:32:53 mail kernel: [  200.572833]  [81012d79] ?
read_tsc+0x9/0x20
May  5 23:32:53 mail kernel: [  200.572837]  [8108418c] ?
ktime_get_ts+0xac/0xe0
May  5 23:32:53 mail kernel: [  200.572840]  [810e36c0] ?
sync_page+0x0/0x50
May  5 23:32:53 mail kernel: [  200.572842]  [813af96e] ?
io_schedule+0x6e/0xb0
May  5 23:32:53 mail kernel: [  200.572844]  [810e36fb] ?
sync_page+0x3b/0x50
May  5 23:32:53 mail kernel: [  200.572846]  [813b0077] ?
__wait_on_bit+0x57/0x80
May  5 23:32:53 mail kernel: [  200.572848]  [810e38c0] ?
wait_on_page_bit+0x70/0x80
May  5 23:32:53 mail kernel: [  200.572851]  [8107a030] ?
wake_bit_function+0x0/0x40
May  5 23:32:53 mail kernel: [  200.572861]  [a08348d2] ?
read_extent_buffer_pages+0x412/0x480 [btrfs]
May  5 23:32:53

Re: Having parent transid verify failed

2011-05-05 Thread Chris Mason
Excerpts from Konstantinos Skarlatos's message of 2011-05-05 17:04:00 -0400:
 On 5/5/2011 11:32 μμ, Chris Mason wrote:
  Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400:
  I think i made some progress. When i tried to remove the directory that
  i suspect contains the problematic file, i got this on the console
 
  rm -rf serverloft/
 
  Ok, our one bad block is in the extent allocation tree.  This is going
  to be the very hardest thing to fix.
 
  Until I finish off the code to rebuild parts of the extent allocation
  tree, I think your best bet is to copy the files off.
 
  The big question is, what happened to make this error?  Can you describe
  your setup in more detail?
 
 I created this btrfs filesystem on an Arch Linux system (amd64, quad 
 core) with kernel 2.6.38.1. It is on top of an md RAID 5.
 
 [root@linuxserver ~]# cat /proc/mdstat
 Personalities : [raid6] [raid5] [raid4]
 md0 : active raid5 sde1[3] sdc1[1] sda1[0] sdf1[4]
5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 
 [4/4] []
 
 the raid was grown from 3 devices to 4, and then btrfs was grown to max 
 size. mount options were clear_cache,compress-force.
 
 I was investigating a performance issue that i had, because over the 
 network i could only write to the filesystem at about 32mb/sec.
 
 when writing btrfs-delalloc- cpu usage was at 100%.
 
 While investigating i disabled compression, enabled space_cache and 
 tried zlib compression, and various combinations, while copying large 
 files back and forth using samba.
 
 BTW I tried to change some mount options using mount -o remount but 
 although the new options were printed on dmesg i think that they were 
 not enabled.
 
 I got the first error when i was copying some files and at the same time 
 created a directory over samba. After a while i upgraded to 2.6.38.5 but 
 nothing seems to have changed.
 
 I really don't think there is a hardware error here, but to be safe I am 
 now running a check on the raid

This error basically means we didn't write the block.  It could be
because the write went to the wrong spot, or the hardware stack messed
it up, or because of a btrfs bug.  But, 2.6.38 is relatively recent.  It
doesn't look like memory corruption because the transids are fairly
close.

When you grew the raid device, did you grow a partition as well?  We've
had trouble in the past with block dev flushing code kicking in as
devices are resized.

Samba isn't doing anything exotic, and 2.6.38 has my recent fixes for
rare metadata corruption bugs in btrfs.

-chris
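
One hedged way to sanity-check for that kind of size mismatch after a grow; both
commands are stock tools, and /dev/md0 stands in for whatever block device backs
the filesystem rather than anything taken from the original report:

blockdev --getsize64 /dev/md0      # size the kernel currently reports for the device
btrfs filesystem show /dev/md0     # size btrfs last recorded for that device at resize time

If the two disagree, the filesystem was grown or shrunk against a stale view of
the underlying device.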


Re: Having parent transid verify failed

2011-05-05 Thread Peter Stuge
Chris Mason wrote:
 We've had trouble in the past with block dev flushing code kicking
 in as devices are resized.

Might this be the problem with my root node? I wish my problem was
in only one directory. :)


//Peter


Re: Having parent transid verify failed

2011-05-05 Thread Konstantinos Skarlatos



On 6/5/2011 2:50 πμ, Chris Mason wrote:

Excerpts from Konstantinos Skarlatos's message of 2011-05-05 17:04:00 -0400:

On 5/5/2011 11:32 μμ, Chris Mason wrote:

Excerpts from Konstantinos Skarlatos's message of 2011-05-05 16:27:54 -0400:

I think i made some progress. When i tried to remove the directory that
i suspect contains the problematic file, i got this on the console

rm -rf serverloft/


Ok, our one bad block is in the extent allocation tree.  This is going
to be the very hardest thing to fix.

Until I finish off the code to rebuild parts of the extent allocation
tree, I think your best bet is to copy the files off.

The big question is, what happened to make this error?  Can you describe
your setup in more detail?


I created this btrfs filesystem on an Arch Linux system (amd64, quad
core) with kernel 2.6.38.1. It is on top of an md RAID 5.

[root@linuxserver ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde1[3] sdc1[1] sda1[0] sdf1[4]
5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2
[4/4] []

the raid was grown from 3 devices to 4, and then btrfs was grown to max
size. mount options were clear_cache,compress-force.

I was investigating a performance issue that i had, because over the
network i could only write to the filesystem at about 32mb/sec.

when writing btrfs-delalloc- cpu usage was at 100%.

While investigating i disabled compression, enabled space_cache and
tried zlib compression, and various combinations, while copying large
files back and forth using samba.

BTW I tried to change some mount options using mount -o remount but
although the new options were printed on dmesg i think that they were
not enabled.

I got the first error when i was copying some files and at the same time
created a directory over samba. After a while i upgraded to 2.6.38.5 but
nothing seems to have changed.

I really don't think there is a hardware error here, but to be safe I am
now running a check on the raid


This error basically means we didn't write the block.  It could be
because the write went to the wrong spot, or the hardware stack messed
it up, or because of a btrfs bug.  But, 2.6.38 is relatively recent.  It
doesn't look like memory corruption because the transids are fairly
close.

When you grew the raid device, did you grow a partition as well?  We've
had trouble in the past with block dev flushing code kicking in as
devices are resized.


no, I did not grow any partitions, I just added one disk to the Raid 5 
md0 device, and then grew the btrfs filesystem to max size(no partitions 
on md0).


I can remember that as a test (to see if shrink works) i shrank the fs 
by 1 gb and then grew it again to max size.




Samba isn't doing anything exotic, and 2.6.38 has my recent fixes for
rare metadata corruption bugs in btrfs.

-chris
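
A rough sketch of the grow sequence described above; the device, member disk and
mount point names are placeholders, not a transcript of the commands actually run:

mdadm --add /dev/md0 /dev/sdf1            # add the new member disk
mdadm --grow /dev/md0 --raid-devices=4    # reshape the RAID5 from 3 to 4 devices
# once the reshape finishes, grow the btrfs sitting directly on md0
btrfs filesystem resize max /mnt/storage
# the shrink-and-regrow test mentioned above would look roughly like
btrfs filesystem resize -1g /mnt/storage
btrfs filesystem resize max /mnt/storage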



Recovering parent transid verify failed

2011-03-23 Thread Luke Sheldrick
Hi,

I'm having the same issues as previously mentioned.

Apparently the new fsck tool will be able to recover this?

Few questions, is there a GIT version I can compile and use already for this?

If not, is there any indication of when this will be released?

---
Luke Sheldrick
e: l...@sheldrick.co.uk
p: 07880 725099


Re: Recovering parent transid verify failed

2011-03-23 Thread Chris Mason
Excerpts from Luke Sheldrick's message of 2011-03-23 14:12:45 -0400:
 Hi,
 
 I'm having the same issues as previously mentioned.
 
 Apparently the new fsck tool will be able to recover this?
 
 Few questions, is there a GIT version I can compile and use already for this?
 
 If not, is there any indication of when this will be released?

Yes, I'm still hammering out a reliable way to resolve most of these.
But, please post the messages you're hitting, it is actually a very
generic problem and has many different causes.

What happened to your FS that made them come up?

Which kernel were you running and what was the FS built on top of?

What happens when you grab the latest btrfsck from git and do:

btrfsck -s 1 /dev/xxx

-chris
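
A minimal sketch of that check, assuming the freshly built btrfsck from
btrfs-progs-unstable sits in the current directory and /dev/sdX stands in for the
affected device; -s selects which superblock mirror is read (copy 1 sits at
64 MiB, copy 2 at 256 GiB on sufficiently large devices):

for mirror in 1 2; do
    ./btrfsck -s "$mirror" /dev/sdX && echo "superblock copy $mirror opens cleanly"
done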


Re: Recovering parent transid verify failed

2011-03-06 Thread Hugo Mills
On Sun, Mar 06, 2011 at 12:28:41PM +0200, Yo'av Moshe wrote:
 Hey,
 I'd start by saying that I know Btrfs is still experimental, and so
 there's no guarantee that anyone will be able to help me at all... But I
 thought I'd try anyway :-)
 
 A few months ago I bought a new laptop and installed Arch Linux on it,
 with Btrfs on the root filesystem... I know, it's not the smartest
 thing to do...
 After a few months I had issues with my hibernation scripts, and one
 day I tried to hibernate my computer but it didn't go that well, and,
 well, ever since then my Btrfs partition has not been accessible.
 I opened up the Btrfs FAQ and saw that the fsck tool should be out by
 the end of 2010, and thought, oh well, I could wait until then, and
 went on and installed Ubuntu with Ext4 on another small partition.
 
 But time goes on and the fsck tool is still in development... I've
 tried using the code from git and it didn't work, and I'm starting to
 wonder (a) if there's any hope at all and (b) what other steps I can
 take to recover my old Btrfs partition.

   Yes, there is hope. This error should be fixable with the new fsck.

 When trying to mount the Btrfs parition I get this in dmesg:
 [105252.779080] device fsid d14e78a602757297-bf762d859b406ca9 devid 1
 transid 135714 /dev/sda4
 [105252.818697] parent transid verify failed on 216925220864 wanted
 135714 found 135713
[snip]
 Should I wait for btrfsck to be ready?

   Yes.

 Am I not using it correctly now?

   No, there's not a lot the current version can do right now.

 Is there anyway to recover this partition or should I just wipe it and
 reinstall Btrfs only when I'm supposed to?..
 
 Your help is appreciated.

   HTH,
   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- I am the author. You are the audience. I outrank you! --- 




Re: Fsck, parent transid verify failed

2011-01-03 Thread Tommy Jonsson
On Thu, Dec 9, 2010 at 6:14 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Tommy Jonsson's message of 2010-12-08 15:07:58 -0500:
  Build the latest tools, then:
 
  btrfsck -s 1 /dev/xxx
  btrfsck -s 2 /dev/xxx
 
  If either of these work we have an easy way to get it mounted.  Just let
  me know.
 
  -chris
 

  $ btrfsck -s 1 /dev/sda
  using SB copy 1, bytenr 67108864
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' 
  failed.
  Aborted
 
  $ btrfsck -s 2 /dev/sda
  using SB copy 2, bytenr 274877906944
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' 
  failed.
  Aborted
 
  Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda :
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion
  `!(!tree_root->node)' failed.
 
  dmesg said:
  [268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 
  transid 39650 /dev/sdd
  [268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 
  transid 39651 /dev/sdc
  [268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 
  transid 39651 /dev/sda

 Sorry to be a bother, but do you have any other suggestions ?

 Not a bother at all, I'm polishing off a version of fsck that I hope
 will be able to construct a good tree for you.  It's my main priority
 right now and I hope to have something ready early Monday.

 -chris


Hi again Chris.
Hope you survived Christmas and new year :]

Just wanted to check in and see how you are progressing on the btrfsck?
Drop me a mail if you want me to test/debug anything.

-tommy


Re: Fsck, parent transid verify failed

2010-12-15 Thread Tommy Jonsson
 Excerpts from Tommy Jonsson's message of 2010-12-08 15:07:58 -0500:
  Build the latest tools, then:
 
  btrfsck -s 1 /dev/xxx
  btrfsck -s 2 /dev/xxx
 
  If either of these work we have an easy way to get it mounted.  Just let
  me know.
 
  -chris
 

  $ btrfsck -s 1 /dev/sda
  using SB copy 1, bytenr 67108864
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' 
  failed.
  Aborted
 
  $ btrfsck -s 2 /dev/sda
  using SB copy 2, bytenr 274877906944
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' 
  failed.
  Aborted
 
  Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda :
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion
  `!(!tree_root->node)' failed.
 
  dmesg said:
  [268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 
  transid 39650 /dev/sdd
  [268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 
  transid 39651 /dev/sdc
  [268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 
  transid 39651 /dev/sda

 Sorry to be a bother, but do you have any other suggestions ?

 Not a bother at all, I'm polishing off a version of fsck that I hope
 will be able to construct a good tree for you.  It's my main priority
 right now and I hope to have something ready early Monday.

 -chris


Hi Chris.
Thanks for all your help. Any progress on the fsck ?
I pulled the latest btrfs-progs-unstable and recompiled, same output
from all the commands (btrfsck -s / btrfs-debug-tree).

-tommy


Re: Fsck, parent transid verify failed

2010-12-12 Thread Tom Kuther
On Fr, 10.12.10 15:11 Chris Mason chris.ma...@oracle.com wrote:

  What would be the steps to get it mounted?
 
 If btrfsck -s is able to find a good super, I've setup a tool that
 will copy the good super over into the default super.  It is currently
 sitting in the next branch of the btrfs-progs-unstable repo.
 
 git clone
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
 next
 
 (or git pull into your existing checkout)
 
 Then
 
 make btrfs-select-super
 ./btrfs-select-super -s 1 /dev/xxx
 
 After this you'll want to do a full backup and make sure things are
 working properly.
 
 -chris

This worked fine. I was able to mount and completely read it.
The volume seems healthy and is fully usable so far.

Thanks a lot!

~thomas


Re: Fsck, parent transid verify failed

2010-12-10 Thread Chris Mason
Excerpts from Tom Kuther's message of 2010-12-09 11:21:03 -0500:
 Chris Mason chris.mason at oracle.com writes:
 
  [...]
  Build the latest tools, then:
  
  btrfsck -s 1 /dev/xxx
  btrfsck -s 2 /dev/xxx
  
  If either of these work we have an easy way to get it mounted.  Just let
  me know.
  
 
 Hello,
 
 I get those parent transid verify failed errors too after a system failure.
 
 # btrfsck -s 1 /dev/md0 
 using SB copy 1, bytenr 67108864
 found 1954912653312 bytes used err is 0
 total csum bytes: 1892054684
 total tree bytes: 3455627264
 total fs tree bytes: 1082691584
 btree space waste bytes: 584155173
 file data blocks allocated: 12808940421120
  referenced 1933520879616
 Btrfs v0.19-35-g1b444cd-dirty
 # btrfsck -s 2 /dev/md0 
 using SB copy 2, bytenr 274877906944
 found 1954912653312 bytes used err is 0
 -snip-
 
 Both seem to work.
 What would be the steps to get it mounted?

If btrfsck -s is able to find a good super, I've setup a tool that will
copy the good super over into the default super.  It is currently
sitting in the next branch of the btrfs-progs-unstable repo.

git clone 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git 
next

(or git pull into your existing checkout)

Then

make btrfs-select-super
./btrfs-select-super -s 1 /dev/xxx

After this you'll want to do a full backup and make sure things are
working properly.

-chris
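
Pulled together, the sequence above looks roughly like this, assuming btrfsck -s 1
has already reported a usable superblock; /dev/sdX and /mnt are placeholders:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
cd btrfs-progs-unstable
git checkout next                    # the tool sits in the 'next' branch
make btrfs-select-super
./btrfs-select-super -s 1 /dev/sdX   # copy the good mirror over the default superblock
mount -o ro /dev/sdX /mnt            # then mount read-only and take a full backup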


Re: Fsck, parent transid verify failed

2010-12-09 Thread Tom Kuther
Chris Mason chris.mason at oracle.com writes:

 [...]
 Build the latest tools, then:
 
 btrfsck -s 1 /dev/xxx
 btrfsck -s 2 /dev/xxx
 
 If either of these work we have an easy way to get it mounted.  Just let
 me know.
 

Hello,

I get those parent transid verify failed errors too after a system failure.

# btrfsck -s 1 /dev/md0 
using SB copy 1, bytenr 67108864
found 1954912653312 bytes used err is 0
total csum bytes: 1892054684
total tree bytes: 3455627264
total fs tree bytes: 1082691584
btree space waste bytes: 584155173
file data blocks allocated: 12808940421120
 referenced 1933520879616
Btrfs v0.19-35-g1b444cd-dirty
# btrfsck -s 2 /dev/md0 
using SB copy 2, bytenr 274877906944
found 1954912653312 bytes used err is 0
-snip-

Both seem to work.
What would be the steps to get it mounted?

Thanks in advance.

~thomas




Re: Fsck, parent transid verify failed

2010-12-09 Thread Chris Mason
Excerpts from Tommy Jonsson's message of 2010-12-08 15:07:58 -0500:
  Build the latest tools, then:
 
  btrfsck -s 1 /dev/xxx
  btrfsck -s 2 /dev/xxx
 
  If either of these work we have an easy way to get it mounted.  Just let
  me know.
 
  -chris
 
 
  $ btrfsck -s 1 /dev/sda
  using SB copy 1, bytenr 67108864
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' 
  failed.
  Aborted
 
  $ btrfsck -s 2 /dev/sda
  using SB copy 2, bytenr 274877906944
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' 
  failed.
  Aborted
 
  Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda :
  parent transid verify failed on 2721514774528 wanted 39651 found 39649
  btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion
  `!(!tree_root->node)' failed.
 
  dmesg said:
  [268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 
  transid 39650 /dev/sdd
  [268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 
  transid 39651 /dev/sdc
  [268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 
  transid 39651 /dev/sda
 
 Sorry to be a bother, but do you have any other suggestions ?

Not a bother at all, I'm polishing off a version of fsck that I hope
will be able to construct a good tree for you.  It's my main priority
right now and I hope to have something ready early Monday.

-chris


Re: Fsck, parent transid verify failed

2010-12-08 Thread Tommy Jonsson
 Build the latest tools, then:

 btrfsck -s 1 /dev/xxx
 btrfsck -s 2 /dev/xxx

 If either of these work we have an easy way to get it mounted.  Just let
 me know.

 -chris


 $ btrfsck -s 1 /dev/sda
 using SB copy 1, bytenr 67108864
 parent transid verify failed on 2721514774528 wanted 39651 found 39649
 btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
 Aborted

 $ btrfsck -s 2 /dev/sda
 using SB copy 2, bytenr 274877906944
 parent transid verify failed on 2721514774528 wanted 39651 found 39649
 btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
 Aborted

 Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda :
 parent transid verify failed on 2721514774528 wanted 39651 found 39649
 btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion
 `!(!tree_root->node)' failed.

 dmesg said:
 [268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 transid 
 39650 /dev/sdd
 [268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 transid 
 39651 /dev/sdc
 [268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 
 39651 /dev/sda

Sorry to be a bother, but do you have any other suggestions ?

Thanks!

-tommy


Re: Fsck, parent transid verify failed

2010-12-02 Thread Chris Mason
Excerpts from Tommy Jonsson's message of 2010-12-01 06:00:56 -0500:
 Hi folks!
 
 Been using btrfs for quite a while now, worked great until now. 
 Got power-loss on my machine and now i have the parent transid verify
 failed on X wanted X found X problem.
 So I can't get it to mount.
 
 My btrfs is spread over sda (2tb), sdc(2tb), sdd(1tb).
 
 Is this something that an offline fsck could fix ? 
 If so is the fsck-util being developed ?
 Is there a way to mount the FS in a read-only mode or something to rescue
 the data ?

Which kernel are you on?  Unless you formatted with -m raid0, the
current git tree should be able to read this FS by using the second copy
of the metadata.

-chris
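
For context, a hedged illustration of the metadata-profile choice being discussed;
these are generic mkfs.btrfs invocations with placeholder devices, not a
reconstruction of how the original filesystem was made:

mkfs.btrfs /dev/sdX                                       # single-disk default typically duplicates metadata (dup)
mkfs.btrfs -m raid1 -d raid0 /dev/sdX /dev/sdY /dev/sdZ   # mirrored metadata, striped data
mkfs.btrfs -m raid0 -d raid0 /dev/sdX /dev/sdY /dev/sdZ   # no second metadata copy left to fall back on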


Re: Fsck, parent transid verify failed

2010-12-02 Thread Chris Mason
Excerpts from Tommy Jonsson's message of 2010-12-02 16:45:39 -0500:
 I can't remember if i used -m raid0.
 I think i just used mkfs.btrfs /dev/sda then btrfs device add /dev/sdb
 and same for sdc.
 I am sure that i didn't explicitly use -m raid1 or raid10.
 Is there a way that i can check this ?

The defaults will maintain raid1 as you add more drives.  We can check
it with btrfs-debug-tree from the git repository.  But, more below.

 
 If i do have raid0 for both metadata and data is there anything i can do ?
 I've been looking at the source but haven't got my head around it yet.
 
 What would happen if I just ignore/bypass the transid error?
 
 The error:
 [265889.197279] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3
 transid 39651 /dev/sda
 [265889.198266] btrfs: use compression
 [265889.647817] parent transid verify failed on 2721514774528 wanted 39651
 found 39649
 [265889.672632] btrfs: open_ctree failed
 
 Or could i update the metadata to want 39649 ?

The first thing I would try is:

git pull 
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

Build the latest tools, then:

btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx

If either of these work we have an easy way to get it mounted.  Just let
me know.

-chris


Re: Fsck, parent transid verify failed

2010-12-02 Thread Tommy Jonsson
$ btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

$ btrfsck -s 2 /dev/sda
using SB copy 2, bytenr 274877906944
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

-tommy


On Thu, Dec 2, 2010 at 10:50 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Tommy Jonsson's message of 2010-12-02 16:45:39 -0500:
 I can't remember if i used -m raid0.
 I think i just used mkfs.btrfs /dev/sda then btrfs device add /dev/sdb
 and same for sdc.
 I am sure that i didn't explicitly use -m raid1 or raid10.
 Is there a way that i can check this ?

 The defaults will maintain raid1 as you add more drives.  We can check
 it with btrfs-debug-tree from the git repository.  But, more below.


 If i do have raid0 for both metadata and data is there anything i can do ?
 I've been looking at the source but haven't got my head around it yet.

 What would happen if I just ignore/bypass the transid error?

 The error:
 [265889.197279] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3
 transid 39651 /dev/sda
 [265889.198266] btrfs: use compression
 [265889.647817] parent transid verify failed on 2721514774528 wanted 39651
 found 39649
 [265889.672632] btrfs: open_ctree failed

 Or could i update the metadata to want 39649 ?

 The first thing I would try is:

 git pull 
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

 Build the latest tools, then:

 btrfsck -s 1 /dev/xxx
 btrfsck -s 2 /dev/xxx

 If either of these work we have an easy way to get it mounted.  Just let
 me know.

 -chris



Re: Fsck, parent transid verify failed

2010-12-02 Thread Tommy Jonsson
Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda :
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion
`!(!tree_root->node)' failed.

dmesg said:
[268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2
transid 39650 /dev/sdd
[268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1
transid 39651 /dev/sdc
[268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3
transid 39651 /dev/sda

-tommy

On Thu, Dec 2, 2010 at 10:59 PM, Tommy Jonsson quaz...@gmail.com wrote:
 $ btrfsck -s 1 /dev/sda
 using SB copy 1, bytenr 67108864
 parent transid verify failed on 2721514774528 wanted 39651 found 39649
 btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
 Aborted

 $ btrfsck -s 2 /dev/sda
 using SB copy 2, bytenr 274877906944
 parent transid verify failed on 2721514774528 wanted 39651 found 39649
 btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
 Aborted

 -tommy


 On Thu, Dec 2, 2010 at 10:50 PM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Tommy Jonsson's message of 2010-12-02 16:45:39 -0500:
 I can't remember if i used -m raid0.
 I think i just used mkfs.btrfs /dev/sda then btrfs device add /dev/sdb
 and same for sdc.
 I am sure that i didn't explicitly use -m raid1 or raid10.
 Is there a way that i can check this ?

 The defaults will maintain raid1 as you add more drives.  We can check
 it with btrfs-debug-tree from the git repository.  But, more below.


 If i do have raid0 for both metadata and data is there anything i can do ?
 I've been looking at the source but haven't got my head around it yet.

 What would happen if I just ignore/bypass the transid error?

 The error:
 [265889.197279] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3
 transid 39651 /dev/sda
 [265889.198266] btrfs: use compression
 [265889.647817] parent transid verify failed on 2721514774528 wanted 39651
 found 39649
 [265889.672632] btrfs: open_ctree failed

 Or could i update the metadata to want 39649 ?

 The first thing I would try is:

 git pull 
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

 Build the latest tools, then:

 btrfsck -s 1 /dev/xxx
 btrfsck -s 2 /dev/xxx

 If either of these work we have an easy way to get it mounted.  Just let
 me know.

 -chris




Fsck, parent transid verify failed

2010-12-01 Thread Tommy Jonsson
Hi folks!

Been using btrfs for quite a while now, worked great until now. 
Got power-loss on my machine and now i have the parent transid verify
failed on X wanted X found X problem.
So I can't get it to mount.

My btrfs is spread over sda (2tb), sdc(2tb), sdd(1tb).

Is this something that an offline fsck could fix ? 
If so is the fsck-util being developed ?
Is there a way to mount the FS in a read-only mode or something to rescue
the data ?

Thanks, Tommy.




Re: parent transid verify failed, continued

2010-09-25 Thread Francis Galiegue
On Thu, Sep 23, 2010 at 08:24, Francis Galiegue fgalie...@gmail.com wrote:
 Hello list,

 I've been using btrfs for nearly 6 months now, on three machines, with
 no problems but for _one_ filesystem on one machine. The problem is
 the message in $subject.

 For this particular filesystem, which contains qemu-kvm disk images in
 raw mode with caching mode set to writeback, the symptoms are that:

 * in 2.6.34 and lower, I could mount the filesystem, with the parent
 transid verify failed message appearing once;
 * with 2.6.35 and later, however, not anymore: I mount it and the
 same parent transid verify failed message now floods dmesg, and I
 cannot kill -9 any program trying to access that filesystem.

[...]

 I just fear that I get [for my rootfs] into the situation of the hosed 
 filesystem
 which I cannot mount anymore...


And I just did. Dang.

Fortunately I have the sysreccd with which I _can_ mount the filesystem! Phew.

-- 
Francis Galiegue, fgalie...@gmail.com
It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries (Stéphane Faroult, in The
Art of SQL, ISBN 0-596-00894-5)


parent transid verify failed, continued

2010-09-23 Thread Francis Galiegue
Hello list,

I've been using btrfs for nearly 6 months now, on three machines, with
no problems but for _one_ filesystem on one machine. The problem is
the message in $subject.

For this particular filesystem, which contains qemu-kvm disk images in
raw mode with caching mode set to writeback, the symptoms are that:

* in 2.6.34 and lower, I could mount the filesystem, with the parent
transid verify failed message appearing once;
* with 2.6.35 and later, however, not anymore: I mount it and the
same parent transid verify failed message now floods dmesg, and I
cannot kill -9 any program trying to access that filesystem.

I sent a mail to the list at the time: I bisected that to
5bdd3536cbbe2ecd94ecc14410c6b1b31da16381. The problem is still there.

And this morning, while doing a btrfs filesystem defragment on the /
of one of my machines (the one I'm writing this mail from, in fact), I
saw this message four times again (kernel 2.6.36-rc5):


Sep 23 07:42:11 erwin kernel: [  148.689191] parent transid verify
failed on 14077947904 wanted 316581 found 316247
Sep 23 07:42:11 erwin kernel: [  148.689529] parent transid verify
failed on 14077947904 wanted 316581 found 316247
Sep 23 07:42:13 erwin kernel: [  151.059728] parent transid verify
failed on 14084829184 wanted 316581 found 316247
Sep 23 07:42:13 erwin kernel: [  151.060036] parent transid verify
failed on 14084829184 wanted 316581 found 316247


Does that mean that there is corruption on the filesystem, somewhere?
I just fear that I get into the situation of the hosed filesystem
which I cannot mount anymore...

-- 
Francis Galiegue, fgalie...@gmail.com
It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries (Stéphane Faroult, in The
Art of SQL, ISBN 0-596-00894-5)


Any btrfsck to try out? [was: parent transid verify failed, continued]

2010-09-23 Thread Francis Galiegue
On Thu, Sep 23, 2010 at 08:24, Francis Galiegue fgalie...@gmail.com wrote:
[...]

 For this particular filesystem, which contains qemu-kvm disk images in
 raw mode with caching mode set to writeback, the symptoms are that:

 * in 2.6.34 and lower, I could mount the filesystem, with the parent
 transid verify failed message appearing once;
 * with 2.6.35 and later, however, not anymore: I mount it and the
 same parent transid verify failed message now floods dmesg, and I
 cannot kill -9 any program trying to access that filesystem.

[...]

Another thing: as I can afford to recreate the hosed filesystem if
need be, I'm also ready to try any offline (of course) repairing
btrfsck on this filesystem and see if I can mount it again safely.

Any btrfs-progs tree that I might try out? I have the possibility to
boot from a USB key with a sufficiently recent kernel and test that,
and attempt to mount the fs again...

-- 
Francis Galiegue, fgalie...@gmail.com
It seems obvious [...] that at least some 'business intelligence'
tools invest so much intelligence on the business side that they have
nothing left for generating SQL queries (Stéphane Faroult, in The
Art of SQL, ISBN 0-596-00894-5)


parent transid verify failed

2010-09-06 Thread Jan Steffens
After an unclean shutdown, my btrfs is now unmountable:

device label root devid 1 transid 375202 /dev/sdc4
parent transid verify failed on 53984886784 wanted 375202 found 375201
parent transid verify failed on 53984886784 wanted 375202 found 375201
parent transid verify failed on 53984886784 wanted 375202 found 375201
btrfs: open_ctree failed

btrfsck aborts:

couldn't open because of unsupported option features (2).
btrfsck: disk-io.c:682: open_ctree_fd: Assertion `!(1)' failed.
[1]14899 abort  btrfsck /dev/sdc4

Is there any way to recover the filesystem?


how can i recover the btrfs after parent transid verify failed?

2010-08-01 Thread Fabian Kramer
My RAID 5 crashed, so I replaced the old HDD and started the rebuild.
But now I can't mount the btrfs: the mount command freezes and the
kernel gives me the following lines in a loop:

Jul 29 18:37:04 fileserver kernel: [ 1229.692268]
verify_parent_transid: 2492 callbacks suppressed
Jul 29 18:37:04 fileserver kernel: [ 1229.692274] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.692287] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.696549] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.696564] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.700419] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.700433] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.704392] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.704404] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.708384] parent transid
verify failed on 6975016271872 wanted 204247 found 204249
Jul 29 18:37:04 fileserver kernel: [ 1229.708396] parent transid
verify failed on 6975016271872 wanted 204247 found 204249

Is there any way to recover the fs?

rgds Fabian Kramer


dmesg filled with parent transid verify failed messages using Linus' HEAD as of 20100704

2010-07-04 Thread Francis GALIEGUE
[Note: I am not on the list, can you please Cc: me on replies? Thank
you in advance]

Hello,

Kernels: 2.6.34 to HEAD. Using ~amd64 Gentoo. All my filesystems
except /boot are btrfs.

btrfsck says I have a corrupted btrfs filesystem on my machine, and I
see this message in dmesg when mounting the filesystem (in
fs/btrfs/disk-io.c:284):

parent transid verify failed on 48136192 wanted 16424 found 16420

But there is something strange:

* this message only appears twice when running 2.6.34;
* it fills my dmesg (several tens of thousands of times a second -
printk_ratelimit() triggers to suppress the vast majority of them)
when running HEAD.

After a quick git log v2.6.34.. -- fs/btrfs, I found commit
5bdd3536cbbe2ecd94ecc14410c6b1b31da16381, which I reverted: HEAD now
behaves like 2.6.34, dmesg-wise. Unintended side effect?
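
A minimal sketch of that narrowing-down step; the commit id is the one quoted
above, the rest is ordinary git usage in a kernel tree:

git log --oneline v2.6.34.. -- fs/btrfs              # list btrfs changes since 2.6.34
git revert 5bdd3536cbbe2ecd94ecc14410c6b1b31da16381  # rebuild and retest with the suspect commit undone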

-- 

Francis Galiegue
ONE2TEAM
Ingénieur système
Mob : +33 (0) 683 877 875
Tel : +33 (0) 178 945 552
f...@one2team.com
40 avenue Raymond Poincaré
75116 Paris