Re: System unable to mount partition after a power loss

2018-12-07 Thread Doni Crosby
I ran that command, but I cannot get the email to go through to the
mailing list because the attached output is over 4.6M.


On 12/7/2018 11:49 AM, Doni Crosby wrote:

The output of the command is attached. This is what errors showed up
on the system:
parent transid verify failed on 3563224842240 wanted 5184691 found 5184689
parent transid verify failed on 3563224842240 wanted 5184691 found 5184689
parent transid verify failed on 3563222974464 wanted 5184691 found 5184688
parent transid verify failed on 3563222974464 wanted 5184691 found 5184688
parent transid verify failed on 3563223121920 wanted 5184691 found 5184688
parent transid verify failed on 3563223121920 wanted 5184691 found 5184688
parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
Ignoring transid failure
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
Ignoring transid failure
parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
Ignoring transid failure
parent transid verify failed on 3563231412224 wanted 5184691 found 5183325
parent transid verify failed on 3563231412224 wanted 5184691 found 5183325
parent transid verify failed on 3563231412224 wanted 5184691 found 5183325
parent transid verify failed on 3563231412224 wanted 5184691 found 5183325
Ignoring transid failure
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
Ignoring transid failure
WARNING: eb corrupted: parent bytenr 31801344 slot 132 level 1 child
bytenr 3563231461376 level has 1 expect 0, skipping the slot
parent transid verify failed on 3563231494144 wanted 5184691 found 5183325
parent transid verify failed on 3563231494144 wanted 5184691 found 5183325
parent transid verify failed on 3563231494144 wanted 5184691 found 5183325
parent transid verify failed on 3563231494144 wanted 5184691 found 5183325
Ignoring transid failure
parent transid verify failed on 3563231526912 wanted 5184691 found 5183325
parent transid verify failed on 3563231526912 wanted 5184691 found 5183325
parent transid verify failed on 3563231526912 wanted 5184691 found 5183325
parent transid verify failed on 3563231526912 wanted 5184691 found 5183325
Ignoring transid failure
parent transid verify failed on 3563229626368 wanted 5184691 found 5184689
parent transid verify failed on 3563229626368 wanted 5184691 found 5184689
parent transid verify failed on 3563229937664 wanted 5184691 found 5184689
parent transid verify failed on 3563229937664 wanted 5184691 found 5184689
parent transid verify failed on 3563226857472 wanted 5184691 found 5184689
parent transid verify failed on 3563226857472 wanted 5184691 found 5184689
parent transid verify failed on 3563230674944 wanted 5184691 found 5183325
parent transid verify failed on 3563230674944 wanted 5184691 found 5183325
parent transid verify failed on 3563230674944 wanted 5184691 found 5183325
parent transid verify failed on 3563230674944 wanted 5184691 found 5183325
Ignoring transid failure
On Fri, Dec 7, 2018 at 2:22 AM Qu Wenruo wrote:

On 2018/12/7 1:24 PM, Doni Crosby wrote:

All,

I'm coming to you to see if there is a way to fix or at least recover
most of the data I have from a btrfs filesystem. The system went down
after both a breaker and the battery backup failed. I cannot currently
mount the system, with the following error from dmesg:

Note: The vda1 is just the entire disk being passed from the VM host
to the VM it's not an actual true virtual block device

[ 499.704398] BTRFS info (device vda1): disk space caching is enabled
[  499.704401] BTRFS info (device vda1): has skinny extents
[  499.739522] BTRFS error (device vda1): parent transid verify failed
on 3563231428608 wanted 5184691 found 5183327


Transid mismatch normally means the fs is more or less screwed up.

And according to your mount failure, it looks like the fs got screwed up badly.

What's the kernel version used in the VM?
I don't really think the VM is always using the latest kernel.


[  499.740257] BTRFS error (device vda1): parent transid verify failed
on 3563231428608 wanted 5184691 found 5183327

Re: System unable to mount partition after a power loss

2018-12-07 Thread Doni Crosby
I just looked at the VM; it does not have a cache. That's the default
in Proxmox, to improve performance.

On Fri, Dec 7, 2018 at 7:25 AM Austin S. Hemmelgarn wrote:
>
> On 2018-12-07 01:43, Doni Crosby wrote:
> >> This is qemu-kvm? What's the cache mode being used? It's possible the
> >> usual write guarantees are thwarted by VM caching.
> > Yes it is a proxmox host running the system so it is a qemu vm, I'm
> > unsure on the caching situation.
> On the note of QEMU and the cache mode, the only cache mode I've seen to
> actually cause issues for BTRFS volumes _inside_ a VM is 'cache=unsafe',
> but that causes problems for most filesystems, so it's probably not the
> issue here.
>
> OTOH, I've seen issues with most of the cache modes other than
> 'cache=writeback' and 'cache=writethrough' when dealing with BTRFS as
> the back-end storage on the host system, and most of the time such
> issues will manifest as both problems with the volume inside the VM
> _and_ the volume the disk images are being stored on.


Re: System unable to mount partition after a power loss

2018-12-07 Thread Austin S. Hemmelgarn

On 2018-12-07 01:43, Doni Crosby wrote:

>> This is qemu-kvm? What's the cache mode being used? It's possible the
>> usual write guarantees are thwarted by VM caching.
> Yes it is a proxmox host running the system so it is a qemu vm, I'm
> unsure on the caching situation.

On the note of QEMU and the cache mode, the only cache mode I've seen to
actually cause issues for BTRFS volumes _inside_ a VM is 'cache=unsafe',
but that causes problems for most filesystems, so it's probably not the
issue here.


OTOH, I've seen issues with most of the cache modes other than 
'cache=writeback' and 'cache=writethrough' when dealing with BTRFS as 
the back-end storage on the host system, and most of the time such 
issues will manifest as both problems with the volume inside the VM 
_and_ the volume the disk images are being stored on.
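For anyone checking the same thing on their own host, here is a minimal
sketch of reading the cache mode off a disk line in a VM config. It assumes
Proxmox-style `key: volume,opt=value,...` lines and that a missing cache=
option means the default (no cache, as Doni reports for his VM). The
function name and the sample lines are made up for illustration.

```python
def disk_cache_mode(config_line, default="none"):
    """Extract the cache= option from one Proxmox-style disk line.

    Assumes the `key: volume,opt=value,...` layout; `default` is returned
    when no cache option is present.
    """
    _, _, value = config_line.partition(":")
    # The first comma-separated field is the volume itself, so skip it.
    for option in value.strip().split(",")[1:]:
        key, _, val = option.partition("=")
        if key.strip() == "cache":
            return val.strip()
    return default


# Hypothetical disk lines, not taken from the system in this thread.
lines = [
    "virtio0: /dev/disk/by-id/example-disk,size=100G",
    "virtio1: local:100/vm-100-disk-1.qcow2,cache=writeback,size=32G",
    "virtio2: local:100/vm-100-disk-2.qcow2,cache=unsafe",
]
for line in lines:
    print(line.split(":")[0], "->", disk_cache_mode(line))
```

A line that reports 'unsafe' is the one mode singled out above as known to
cause trouble for filesystems inside the guest.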


Re: System unable to mount partition after a power loss

2018-12-06 Thread Qu Wenruo


On 2018/12/7 1:24 PM, Doni Crosby wrote:
> All,
> 
> I'm coming to you to see if there is a way to fix or at least recover
> most of the data I have from a btrfs filesystem. The system went down
> after both a breaker and the battery backup failed. I cannot currently
> mount the system, with the following error from dmesg:
> 
> Note: The vda1 is just the entire disk being passed from the VM host
> to the VM it's not an actual true virtual block device
> 
> [ 499.704398] BTRFS info (device vda1): disk space caching is enabled
> [  499.704401] BTRFS info (device vda1): has skinny extents
> [  499.739522] BTRFS error (device vda1): parent transid verify failed
> on 3563231428608 wanted 5184691 found 5183327

Transid mismatch normally means the fs is more or less screwed up.

And according to your mount failure, it looks like the fs got screwed up badly.

What's the kernel version used in the VM?
I don't really think the VM is always using the latest kernel.

> [  499.740257] BTRFS error (device vda1): parent transid verify failed
> on 3563231428608 wanted 5184691 found 5183327
> [  499.770847] BTRFS error (device vda1): open_ctree failed
> 
> I have tried running btrfsck:
> parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
> parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
> parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
> parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
> parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
> parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
> parent transid verify failed on 3563221630976 wanted 5184691 found 5184688
> parent transid verify failed on 3563221630976 wanted 5184691 found 5184688
> parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
> parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
> parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
> parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
> parent transid verify failed on 3563224072192 wanted 5184691 found 5184688
> parent transid verify failed on 3563224072192 wanted 5184691 found 5184688
> parent transid verify failed on 3563225268224 wanted 5184691 found 5184689
> parent transid verify failed on 3563225268224 wanted 5184691 found 5184689
> parent transid verify failed on 3563227398144 wanted 5184691 found 5184689
> parent transid verify failed on 3563227398144 wanted 5184691 found 5184689
> parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
> parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
> parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
> parent transid verify failed on 3563229593600 wanted 5184691 found 5184689

According to your later dump-super output, it looks quite possible that
the corrupted extents all belong to the extent tree.

So it's still possible that your fs tree and other essential trees are OK.

Please dump the following output (with its stderr) to further confirm
the damage.
# btrfs ins dump-tree -b 31801344 --follow /dev/vda1

If your objective is only to recover data, then you could start by trying
btrfs-restore.
It's pretty hard to fix a heavily damaged extent tree.
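
A rough sketch of what a first recovery attempt could look like, written as a
small Python helper that only builds the `btrfs restore` command line and
does not run it. The helper name and the destination path are hypothetical;
the flags used are -v (verbose), -D (dry run, nothing is written) and -i
(ignore errors and keep going). The destination must be on a different,
healthy filesystem.

```python
import shlex


def build_restore_cmd(device, dest, dry_run=True, ignore_errors=True):
    """Build (not run) a `btrfs restore` command line as an argv list."""
    cmd = ["btrfs", "restore", "-v"]
    if dry_run:
        cmd.append("-D")  # --dry-run: list what would be restored, write nothing
    if ignore_errors:
        cmd.append("-i")  # --ignore-errors: continue past unreadable items
    cmd += [device, dest]
    return cmd


# /mnt/recovery is a hypothetical destination on a separate, healthy filesystem.
print(shlex.join(build_restore_cmd("/dev/vda1", "/mnt/recovery")))
print(shlex.join(build_restore_cmd("/dev/vda1", "/mnt/recovery", dry_run=False)))
```

Starting with the dry run gives an idea of how much is still reachable
before committing to copying several terabytes.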

Thanks,
Qu
> Ignoring transid failure
> Checking filesystem on /dev/vda1
> UUID: 7c76bb05-b3dc-4804-bf56-88d010a214c6
> checking extents
> parent transid verify failed on 3563224842240 wanted 5184691 found 5184689
> parent transid verify failed on 3563224842240 wanted 5184691 found 5184689
> parent transid verify failed on 3563222974464 wanted 5184691 found 5184688
> parent transid verify failed on 3563222974464 wanted 5184691 found 5184688
> parent transid verify failed on 3563223121920 wanted 5184691 found 5184688
> parent transid verify failed on 3563223121920 wanted 5184691 found 5184688
> parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
> parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
> parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
> parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
> Ignoring transid failure
> parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
> parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
> parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
> parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
> Ignoring transid failure
> parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
> parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
> parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
> parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
> Ignoring transid failure
> parent transid verify 

Re: System unable to mount partition after a power loss

2018-12-06 Thread Doni Crosby
> This is qemu-kvm? What's the cache mode being used? It's possible the
> usual write guarantees are thwarted by VM caching.
Yes it is a proxmox host running the system so it is a qemu vm, I'm
unsure on the caching situation.

> Old version of progs, I suggest upgrading to 4.17.1 and run
I updated the progs to 4.17 and ran the following

btrfs insp dump-s -f /device/:
See attachment

btrfs rescue super -v /device/ (insp rescue super wasn't valid)
All Devices:
Device: id = 1, name = /dev/vda1

Before Recovering:
[All good supers]:
device name = /dev/vda1
superblock bytenr = 65536

device name = /dev/vda1
superblock bytenr = 67108864

device name = /dev/vda1
superblock bytenr = 274877906944

[All bad supers]:

All supers are valid, no need to recover
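
As a sanity check on the three byte offsets reported above, a short sketch
(plain arithmetic, not a btrfs-progs feature) comparing them against the
usual fixed locations of the primary superblock and its two mirrors at
64 KiB, 64 MiB and 256 GiB:

```python
KIB, MIB, GIB = 2**10, 2**20, 2**30

# Byte offsets reported by `btrfs rescue super` above.
reported = [65536, 67108864, 274877906944]

# Primary superblock plus its two mirrors at their usual fixed offsets.
expected = [64 * KIB, 64 * MIB, 256 * GIB]

for offset, want in zip(reported, expected):
    verdict = "matches" if offset == want else "differs from"
    print(offset, verdict, want)
```

All three copies sit where they are expected to, which fits the tool's
conclusion that the superblocks themselves are intact and the damage is
further down in the trees.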

btrfs check --mode=lowmem /dev/vda1:
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563221630976 wanted 5184691 found 5184688
parent transid verify failed on 3563221630976 wanted 5184691 found 5184688
parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
parent transid verify failed on 3563224072192 wanted 5184691 found 5184688
parent transid verify failed on 3563224072192 wanted 5184691 found 5184688
parent transid verify failed on 3563225268224 wanted 5184691 found 5184689
parent transid verify failed on 3563225268224 wanted 5184691 found 5184689
parent transid verify failed on 3563227398144 wanted 5184691 found 5184689
parent transid verify failed on 3563227398144 wanted 5184691 found 5184689
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=3563210342400 item=120 parent
level=1 child level=1
ERROR: cannot open file system

mount -o ro,norecovery,usebackuproot /dev/vda1 /mnt:
Same dmesg output as before.

On Fri, Dec 7, 2018 at 12:56 AM Chris Murphy wrote:
>
> On Thu, Dec 6, 2018 at 10:24 PM Doni Crosby  wrote:
> >
> > All,
> >
> > I'm coming to you to see if there is a way to fix or at least recover
> > most of the data I have from a btrfs filesystem. The system went down
> > after both a breaker and the battery backup failed. I cannot currently
> > mount the system, with the following error from dmesg:
> >
> > Note: The vda1 is just the entire disk being passed from the VM host
> > to the VM it's not an actual true virtual block device
>
> This is qemu-kvm? What's the cache mode being used? It's possible the
> usual write guarantees are thwarted by VM caching.
>
>
>
> > btrfs check --recover also ends in a segmentation fault
>
> I'm not familiar with --recover option, the --repair option is flagged
> with a warning in the man page.
>Warning
>Do not use --repair unless you are advised to do so by a
> developer or an experienced user,
>
>
> > btrfs --version:
> > btrfs-progs v4.7.3
>
> Old version of progs, I suggest upgrading to 4.17.1 and run
>
> btrfs insp dump-s -f /device/
> btrfs insp rescue super -v /device/
> btrfs check --mode=lowmem /device/
>
> These are all read only commands. Please post output to the list,
> hopefully a developer will get around to looking at it.
>
> It is safe to try:
>
> mount -o ro,norecovery,usebackuproot /device/ /mnt/
>
> If that works, I suggest updating your backup while it's still
> possible in the meantime.
>
>
> --
> Chris Murphy

superblock: bytenr=65536, device=/dev/vda1
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0xbfa6fd72 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    7c76bb05-b3dc-4804-bf56-88d010a214c6
label                   Array
generation              5184693
root                    31801344
sys_array_size          226
chunk_root_generation   5183734
root_level              1
chunk_root              20971520
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             32003947737088
bytes_used              6652776640512
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x161
                        ( MIXED_BACKREF |

Re: System unable to mount partition after a power loss

2018-12-06 Thread Chris Murphy
On Thu, Dec 6, 2018 at 10:24 PM Doni Crosby wrote:
>
> All,
>
> I'm coming to you to see if there is a way to fix or at least recover
> most of the data I have from a btrfs filesystem. The system went down
> after both a breaker and the battery backup failed. I cannot currently
> mount the system, with the following error from dmesg:
>
> Note: The vda1 is just the entire disk being passed from the VM host
> to the VM it's not an actual true virtual block device

This is qemu-kvm? What's the cache mode being used? It's possible the
usual write guarantees are thwarted by VM caching.



> btrfs check --recover also ends in a segmentation fault

I'm not familiar with --recover option, the --repair option is flagged
with a warning in the man page.
   Warning
   Do not use --repair unless you are advised to do so by a
developer or an experienced user,


> btrfs --version:
> btrfs-progs v4.7.3

That's an old version of progs; I suggest upgrading to 4.17.1 and running:

btrfs insp dump-s -f /device/
btrfs insp rescue super -v /device/
btrfs check --mode=lowmem /device/

These are all read-only commands. Please post the output to the list;
hopefully a developer will get around to looking at it.

It is safe to try:

mount -o ro,norecovery,usebackuproot /device/ /mnt/

If that works, I suggest updating your backup while it's still
possible in the meantime.


-- 
Chris Murphy


System unable to mount partition after a power loss

2018-12-06 Thread Doni Crosby
All,

I'm coming to you to see if there is a way to fix or at least recover
most of the data I have from a btrfs filesystem. The system went down
after both a breaker and the battery backup failed. I cannot currently
mount the system, with the following error from dmesg:

Note: vda1 is just the entire disk being passed from the VM host
to the VM; it's not an actual true virtual block device.

[  499.704398] BTRFS info (device vda1): disk space caching is enabled
[  499.704401] BTRFS info (device vda1): has skinny extents
[  499.739522] BTRFS error (device vda1): parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
[  499.740257] BTRFS error (device vda1): parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
[  499.770847] BTRFS error (device vda1): open_ctree failed

I have tried running btrfsck:
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563221630976 wanted 5184691 found 5184688
parent transid verify failed on 3563221630976 wanted 5184691 found 5184688
parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
parent transid verify failed on 3563223138304 wanted 5184691 found 5184688
parent transid verify failed on 3563224072192 wanted 5184691 found 5184688
parent transid verify failed on 3563224072192 wanted 5184691 found 5184688
parent transid verify failed on 3563225268224 wanted 5184691 found 5184689
parent transid verify failed on 3563225268224 wanted 5184691 found 5184689
parent transid verify failed on 3563227398144 wanted 5184691 found 5184689
parent transid verify failed on 3563227398144 wanted 5184691 found 5184689
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
Ignoring transid failure
Checking filesystem on /dev/vda1
UUID: 7c76bb05-b3dc-4804-bf56-88d010a214c6
checking extents
parent transid verify failed on 3563224842240 wanted 5184691 found 5184689
parent transid verify failed on 3563224842240 wanted 5184691 found 5184689
parent transid verify failed on 3563222974464 wanted 5184691 found 5184688
parent transid verify failed on 3563222974464 wanted 5184691 found 5184688
parent transid verify failed on 3563223121920 wanted 5184691 found 5184688
parent transid verify failed on 3563223121920 wanted 5184691 found 5184688
parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
parent transid verify failed on 3563229970432 wanted 5184691 found 5184689
Ignoring transid failure
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
Ignoring transid failure
parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
parent transid verify failed on 3563231444992 wanted 5184691 found 5183325
Ignoring transid failure
parent transid verify failed on 3563231412224 wanted 5184691 found 5183325
parent transid verify failed on 3563231412224 wanted 5184691 found 5183325
parent transid verify failed on 3563231412224 wanted 5184691 found 5183325
parent transid verify failed on 3563231412224 wanted 5184691 found 5183325
Ignoring transid failure
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
Ignoring transid failure
Segmentation fault
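
The output above repeats each failure several times. A minimal sketch
(Python, not part of btrfs-progs; the function name is made up for
illustration) that collapses such output to one entry per block and shows
how many generations each block lags behind what its parent expects:

```python
import re
from collections import OrderedDict

# Matches btrfs-progs output as well as the kernel's (possibly wrapped) dmesg wording.
TRANSID_RE = re.compile(
    r"parent transid verify failed\s+on (\d+) wanted (\d+) found (\d+)"
)


def summarize_transid_failures(text):
    """Return {bytenr: (wanted, found, lag)} with one entry per distinct block."""
    summary = OrderedDict()
    for match in TRANSID_RE.finditer(text):
        bytenr, wanted, found = (int(group) for group in match.groups())
        summary[bytenr] = (wanted, found, wanted - found)
    return summary


# A few lines copied from the output above.
sample = """\
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563224121344 wanted 5184691 found 5184688
parent transid verify failed on 3563229593600 wanted 5184691 found 5184689
parent transid verify failed on 3563231428608 wanted 5184691 found 5183327
parent transid verify failed on 3563231461376 wanted 5184691 found 5183325
"""

for bytenr, (wanted, found, lag) in summarize_transid_failures(sample).items():
    print(f"block {bytenr}: wanted {wanted}, found {found}, {lag} generations behind")
```

On the sample, some blocks are only 2 or 3 generations behind while others
are more than 1300 behind, so the failures fall into two distinct groups.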

btrfs check --recover also ends in a segmentation fault

I am aware of chunk-recover and have tried to run it, but got wary
when I saw dev0 rather than vda1.

Any help would be appreciated,
Doni

uname -a:
Linux Homophone 4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1
(2018-09-13)