Re: Uncorrectable errors on RAID6

2015-06-15 Thread Tobias Holst
Hi Qu, hi all,

 RO snapshot, I remember there is a RO snapshot bug, but seems fixed in 4.x?
Yes, that bug has already been fixed.

 For recovery, first just try cp -r mnt/* to grab what's still completely OK.
 Maybe recovery mount option can do some help in the process?
That's what I did now. I mounted with the recovery option and copied all
of my important data. But several folders/files couldn't be read; the
whole system stopped responding. Nothing in the logs, nothing on the
screen - everything was simply frozen. So I have to take these files out
of my backup.
Also, several files produced "checksum verify failed", "csum failed"
and "no csum found" errors in the syslog.
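For reference, the rough sequence I used was something like this (the
device name and target path are just placeholders for my LUKS mappings
and the rescue disk):

  # mount read-only with the recovery option so nothing gets written
  mount -o ro,recovery /dev/mapper/luks-disk1 /mnt/raid
  # copy what is still readable and log the files that fail
  cp -a /mnt/raid/. /mnt/rescue/ 2>/mnt/rescue/copy-errors.log

Watching the syslog in parallel shows which paths trigger the csum
errors.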

 Then you may try btrfs restore, which is the safest method, won't
 write any byte into the offline disks.
Yes, but I would need at least as much storage space as the original
data occupies - and I don't have that much free space anywhere else (or
at least not quickly available).
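For completeness, if I had the space, the restore call would be roughly
(device and target path are placeholders; btrfs restore only reads from
the source devices):

  btrfs restore -v /dev/mapper/luks-disk1 /mnt/other-storage/restore/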

 Lastly, you can try the btrfsck --repair, *WITH BINARY BACKUP OF YOUR DISKS*
I don't have a bitwise copy of my disks, but all important data is
secure now. So I tried it, see below.
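For anyone else in this situation who does have spare disks, the binary
backup Qu means would be something like this for every member device,
done before letting the repair write anything (paths are placeholders):

  # image each member device first
  dd if=/dev/mapper/luks-disk1 of=/mnt/backup/disk1.img bs=64M conv=noerror,sync
  # only then run the repair, with the filesystem unmounted
  btrfsck --repair /dev/mapper/luks-disk1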

 BTW, if you decide to use btrfsck --repair, please upload the full
 output, since we can use it to improve the b-tree recovery code.
OK, see below.

 (Yeah, welcome to being a laboratory mouse for real-world b-tree recovery code)
Haha, right. Since I have been testing the experimental RAID6 features
of btrfs for a while, I know what it means to be a laboratory mouse ;)

So back to btrfsck. I started it and after a while this happened in
the syslog. Again and again: https://paste.ee/p/BIs56
According to the internet this is a known but very rare problem with
my LSI 9211-8i controller. It happens when the PCIe generation
autodetection detects the card as a PCIe 3.0 card instead of 2.0 and
heavy I/O is happening. I had never hit this bug before, so it showing
up now must be a coincidence (triggered by the heavy I/O of btrfsck) -
but it is not the root cause of this broken filesystem.
As a result there were many "blk_update_request: I/O error", "FAILED
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE", "Add. Sense: Power
on, reset, or bus device reset occurred" and "Buffer I/O error"/"lost
async page write" messages in the syslog.

The result of btrfsck --repair until this point: https://paste.ee/p/nzzAo
Then btrfsck died: https://paste.ee/p/0Brku

Now I rebooted and forced the card to PCIe generation 2.0, so this bug
shouldn't happen again, and started btrfsck --repair once more.
This time it ran without controller problems and you can find the full
output here: 
https://ssl-account.com/oc.tobby.eu/public.php?service=filest=8b93f56a69ea04886e9bc2c8534b32f6
(huge, about 13MB)

Result: One (out of four) folder in my root directory is completely
gone (about 8 TB). Two folders seem to be OK (about 1.4 TB). And the
last folder is OK in terms of folder and subfolder structure, but
nearly all subfolders are empty (only 230 GB of 3.1 TB are still there).
So roughly 90% of the data is gone now.

I will now destroy the filesystem, create a new btrfs RAID6 and fetch
the data out of my backups. I hope my logs help a little bit in finding
the cause. I didn't have the time to try to reproduce this broken
filesystem - did you try it with loop devices?

Regards,
Tobias


2015-05-29 4:27 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com:


  Original Message  
 Subject: Re: Uncorrectable errors on RAID6
 From: Tobias Holst to...@tobby.eu
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2015-05-29 10:00

 Thanks, Qu, sad news... :-(
 No, I also didn't defrag with older kernels. Maybe I did it a while
 ago with 3.19.x, but there was a scrub afterwards and it showed no
 error, so this shouldn't be the problem. The things described above
 were all done with 4.0.3/4.0.4.

 Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. The balance stops
 with an error in the log; scrub just doesn't do anything any more
 according to dstat, shows no error and still reports itself as running.

 The errors/problems started during the first balance, but maybe the
 balance only exposed them and is not the cause.

 Here is detailed debug info to (maybe?) recreate the problem. This is
 exactly what happened here over some time. As I can only tell when it
 definitely was still clean (scrub at the beginning of May) and when it
 definitely was broken (now, end of May), some more steps may be
 necessary to reproduce it, because several things happened in the
 meantime:
 - filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L
 t-raid -O extref,raid56,skinny-metadata,no-holes on 6
 LUKS-encrypted HDDs on kernel 3.19

 LUKS...
 Even though LUKS is much more stable than btrfs and may well be
 unrelated to the bug, your setup is quite complex anyway.

 - mounted with options
 defaults,compress-force=zlib,space_cache,autodefrag


 Normally I'd not recommend compress-force, as btrfs can auto-detect the
 compression ratio.
 But such a complex setup, with these mount options on top of LUKS,
 should be quite a good playground for producing bugs.

 - copied all data onto it
 - all data

Re: Uncorrectable errors on RAID6

2015-06-15 Thread Qu Wenruo



Tobias Holst wrote on 2015/06/16 03:31 +0200:

Hi Qu, hi all,


RO snapshot, I remember there is a RO snapshot bug, but seems fixed in 4.x?

Yes, that bug has already been fixed.


For recovery, first just try cp -r mnt/* to grab what's still completely OK.
Maybe recovery mount option can do some help in the process?

That's what I did now. I mounted with the recovery option and copied all
of my important data. But several folders/files couldn't be read; the
whole system stopped responding. Nothing in the logs, nothing on the
screen - everything was simply frozen. So I have to take these files out
of my backup.
Also, several files produced "checksum verify failed", "csum failed"
and "no csum found" errors in the syslog.


Then you may try btrfs restore, which is the safest method, won't
write any byte into the offline disks.

Yes, but I would need at least as much storage space as the original
data occupies - and I don't have that much free space anywhere else (or
at least not quickly available).


Lastly, you can try the btrfsck --repair, *WITH BINARY BACKUP OF YOUR DISKS*

I don't have a bitwise copy of my disks, but all important data is
secure now. So I tried it, see below.


BTW, if you decide to use btrfsck --repair, please upload the full
output, since we can use it to improve the b-tree recovery code.

OK, see below.


(Yeah, welcome to being a laboratory mouse for real-world b-tree recovery code)

Haha, right. Since I have been testing the experimental RAID6 features
of btrfs for a while, I know what it means to be a laboratory mouse ;)

So back to btrfsck. I started it and after a while this happened in
the syslog. Again and again: https://paste.ee/p/BIs56
According to the internet this is a known but very rare problem with
my LSI 9211-8i controller. It happens when the PCIe generation
autodetection detects the card as a PCIe 3.0 card instead of 2.0 and
heavy I/O is happening. I had never hit this bug before, so it showing
up now must be a coincidence (triggered by the heavy I/O of btrfsck) -
but it is not the root cause of this broken filesystem.
As a result there were many "blk_update_request: I/O error", "FAILED
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE", "Add. Sense: Power
on, reset, or bus device reset occurred" and "Buffer I/O error"/"lost
async page write" messages in the syslog.


Hardware bugs are quite hard to debug, but you still found it, nice!



The result of btrfsck --repair until this point: https://paste.ee/p/nzzAo
Then btrfsck died: https://paste.ee/p/0Brku

Now I rebooted and forced the card to PCIe generation 2.0, so this bug
shouldn't happen again, and started btrfsck --repair once more.
This time it ran without controller problems and you can find the full
output here: 
https://ssl-account.com/oc.tobby.eu/public.php?service=filest=8b93f56a69ea04886e9bc2c8534b32f6
(huge, about 13MB)
After a brief check, about 55K inodes were salvaged; no doubt some of
them will have lost their data.




Result: One (out of four) folder in my root directory is completely
gone (about 8 TB). Two folders seem to be OK (about 1.4 TB). And the
last folder is OK in terms of folder and subfolder structure, but
nearly all subfolders are empty (only 230 GB of 3.1 TB are still there).
So roughly 90% of the data is gone now.


Quite a lot of inodes were salvaged in a heavily broken state.
Did you check the lost+found dir in each subvolume?
Almost every salvaged inode is moved to that dir.
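Something like this should show what ended up there (the mount point
and subvolume names are of course specific to your layout):

  btrfs subvolume list /mnt/raid            # find the subvolumes
  ls -la /mnt/raid/lost+found               # top-level subvolume
  ls -la /mnt/raid/<subvolume>/lost+found   # repeat for each subvolume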



I will now destroy the filesystem, create a new btrfs RAID6 and fetch
the data out of my backups. I hope my logs help a little bit in finding
the cause. I didn't have the time to try to reproduce this broken
filesystem - did you try it with loop devices?


Not yet, but according to your description it's a problem with the
controller, right?


Thanks,
Qu


Regards,
Tobias


2015-05-29 4:27 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com:



 Original Message  
Subject: Re: Uncorrectable errors on RAID6
From: Tobias Holst to...@tobby.eu
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015-05-29 10:00


Thanks, Qu, sad news... :-(
No, I also didn't defrag with older kernels. Maybe I did it a while
ago with 3.19.x, but there was a scrub afterwards and it showed no
error, so this shouldn't be the problem. The things described above
were all done with 4.0.3/4.0.4.

Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. The balance stops
with an error in the log; scrub just doesn't do anything any more
according to dstat, shows no error and still reports itself as running.

The errors/problems started during the first balance, but maybe the
balance only exposed them and is not the cause.

Here is detailed debug info to (maybe?) recreate the problem. This is
exactly what happened here over some time. As I can only tell when it
definitely was still clean (scrub at the beginning of May) and when it
definitely was broken (now, end of May), some more steps may be
necessary to reproduce it, because several things happened in the
meantime:
- filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L
t-raid -O extref,raid56,skinny-metadata,no-holes on 6
LUKS-encrypted HDDs

Re: Uncorrectable errors on RAID6

2015-05-28 Thread Qu Wenruo



 Original Message  
Subject: Re: Uncorrectable errors on RAID6
From: Tobias Holst to...@tobby.eu
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015-05-28 21:13


Ah it's already done. You can find the error-log over here:
https://paste.ee/p/sxCKF

In short there are several of these:
bytenr mismatch, want=6318462353408, have=56676169344768
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A

and these:
ref mismatch on [13431504896 16384] extent item 1, found 0
Backref 13431504896 root 7 not referenced back 0x1202acc0
Incorrect global backref count on 13431504896 found 1 wanted 0
backpointer mismatch on [13431504896 16384]
owner ref check failed [13431504896 16384]

and these:
ref mismatch on [1951739412480 524288] extent item 0, found 1
Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0
not found in extent tree
Incorrect local backref count on 1951739412480 root 5 owner 27852
offset 644349952 found 1 wanted 0 back 0x1a92aa20
backpointer mismatch on [1951739412480 524288]

Any ideas? :)


The metadata is really corrupted...

I'd recommend salvaging your data as soon as possible.

As for the cause: since you didn't run replace, it should at least not
be the bug spotted by Zhao Lei.

BTW, did you run defrag on older kernels?
IIRC, old kernels had a bug with snapshot-aware defrag, which is why it
was later disabled in newer kernels.
Not sure if it's related.

Balance may be related, but I'm not familiar with balance on RAID5/6,
so it's hard to say.

Sorry for being unable to provide much help.

But if you have enough time to find a stable method to reproduce the
bug, it would be best to try it on loop devices - that would definitely
help us to debug.
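A minimal loop-device playground for that could look like this (sizes
and file paths are only an example):

  # six sparse backing files, six loop devices, one raid6 btrfs on top
  for i in 1 2 3 4 5 6; do
      truncate -s 10G /tmp/disk$i
      losetup /dev/loop$i /tmp/disk$i
  done
  mkfs.btrfs -f -m raid6 -d raid6 -O extref,raid56,skinny-metadata,no-holes \
      /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6
  mount /dev/loop1 /mnt/test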


Thanks,
Qu


Regards
Tobias


2015-05-28 14:57 GMT+02:00 Tobias Holst to...@tobby.eu:

Hi Qu,

no, I didn't run a replace. But I ran a defrag with -clzo on all
files while there was slight I/O on the devices. I don't know if
this could cause corruption, too?
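Concretely, the defrag was along these lines (the mount point here is a
placeholder):

  # recursive defrag, recompressing everything to lzo while the fs stays in use
  btrfs filesystem defragment -r -clzo /mnt/raid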

Later on I deleted an r/o snapshot, which should have freed a big
amount of storage space. It didn't free as much as it should have, so
after a few days I started a balance to free the space. During the
balance the first checksum errors happened and the whole balance
process crashed:

[19174.342882] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.365473] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.365651] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.366168] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.366250] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.366392] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.367313] [ cut here ]
[19174.367340] kernel BUG at /home/kernel/COD/linux/fs/btrfs/relocation.c:242!
[19174.367384] invalid opcode:  [#1] SMP
[19174.367418] Modules linked in: iosf_mbi kvm_intel kvm
crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper
cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp
parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt
ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy
psmouse pata_acpi
[19174.367656] CPU: 1 PID: 4960 Comm: btrfs Not tainted
4.0.4-040004-generic #201505171336
[19174.367703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[19174.367752] task: 8804274e8000 ti: 880367b5 task.ti:
880367b5
[19174.367797] RIP: 0010:[c05ec4ba]  [c05ec4ba]
backref_cache_cleanup+0xea/0x100 [btrfs]
[19174.367867] RSP: 0018:880367b53bd8  EFLAGS: 00010202
[19174.367905] RAX: 88008250d8f8 RBX: 88008250d820 RCX: 00018021
[19174.367948] RDX: 88008250d8d8 RSI: 88008250d8e8 RDI: 4000
[19174.367992] RBP: 880367b53bf8 R08: 880418b77780 R09: 00018021
[19174.368037] R10: c05ec1d9 R11: 00018bf8 R12: 0001
[19174.368081] R13: 88008250d8e8 R14: fffb R15: 880367b53c28
[19174.368125] FS:  7f7fd6831c80() GS:88043fc4()
knlGS:
[19174.368172] CS:  0010 DS:  ES:  CR0: 80050033
[19174.368210] CR2: 7f65f7564770 CR3: 0003ac92f000 CR4: 001407e0
[19174.368257] Stack:
[19174.368279]  fffb 88008250d800 88042b3d46e0
88006845f990
[19174.368327]  880367b53c78 c05f25eb 880367b53c78
0002

Re: Uncorrectable errors on RAID6

2015-05-28 Thread Qu Wenruo



 Original Message  
Subject: Re: Uncorrectable errors on RAID6
From: Tobias Holst to...@tobby.eu
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015-05-29 10:00


Thanks, Qu, sad news... :-(
No, I also didn't defrag with older kernels. Maybe I did it a while
ago with 3.19.x, but there was a scrub afterwards and it showed no
error, so this shouldn't be the problem. The things described above
were all done with 4.0.3/4.0.4.

Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. The balance stops
with an error in the log; scrub just doesn't do anything any more
according to dstat, shows no error and still reports itself as running.

The errors/problems started during the first balance, but maybe the
balance only exposed them and is not the cause.

Here is detailed debug info to (maybe?) recreate the problem. This is
exactly what happened here over some time. As I can only tell when it
definitely was still clean (scrub at the beginning of May) and when it
definitely was broken (now, end of May), some more steps may be
necessary to reproduce it, because several things happened in the
meantime:
- filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L
t-raid -O extref,raid56,skinny-metadata,no-holes on 6
LUKS-encrypted HDDs on kernel 3.19

LUKS...
Even though LUKS is much more stable than btrfs and may well be
unrelated to the bug, your setup is quite complex anyway.

- mounted with options defaults,compress-force=zlib,space_cache,autodefrag


Normally I'd not recommend compress-force, as btrfs can auto-detect the
compression ratio.

But such a complex setup, with these mount options on top of LUKS,
should be quite a good playground for producing bugs.
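The difference is only in the mount option: with plain compress, btrfs
gives up on extents that don't compress well, while compress-force
compresses everything regardless. Roughly (device and mount point are
placeholders):

  # let btrfs skip incompressible data
  mount -o compress=zlib /dev/mapper/luks-disk1 /mnt/raid
  # force compression of every write, as used in this setup
  mount -o remount,compress-force=zlib /mnt/raid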

- copied all data onto it
- all data on the devices is now compressed with zlib
- until now the filesystem is ok, scrub shows no errors

autodefrag seems not related to this bug, as you removed it from the
mount options later.
It doesn't even have an effect anyway, since you copied the data over
from another place without overwriting anything.


- now mount it with defaults,compress-force=lzo,space_cache instead
- use kernel 4.0.3/4.0.4
- create a r/o-snapshot

RO snapshot, I remember there is a RO snapshot bug, but seems fixed in 4.x?

- defrag some data with -clzo
- have some (not much) I/O during the process
- this should approx. double the size of the defragged data because
your snapshot contains your data compressed with zlib and your volume
contains your data compressed with lzo
- delete the snapshot
- wait some time until the cleaning is complete, still some other I/O
during this
- this doesn't free as much data as the snapshot contained (?)
- is this ok? Maybe here the problem already existed/started
- defrag the rest of all data on the devices with -clzo, still some
other I/O during this
- now start a balance of the whole array
- errors will spam the log and it's broken.

I hope, it is possible to reproduce the errors and find out exactly
when this happens. I'll do the same steps again, too, but maybe there
is someone else who could try it as well?

I'll try it with script, but maybe without LUKS to simplify the setup.

With some small loop-devices
just for testing this shouldn't take too long even if it sounds like
that ;-)

Back to my actual data: Are there any tips on how to recover?
For recovery, first just try cp -r mnt/* to grab what's still 
completely OK.

Maybe recovery mount option can do some help in the process?

Then you may try btrfs restore, which is the safest method, won't
write any byte into the offline disks.

Lastly, you can try the btrfsck --repair, *WITH BINARY BACKUP OF YOUR DISKS*

With the best luck, it can make your filesystem completely clean, at the
cost of some files being lost (maybe just the file name, maybe part of
the data, or maybe nothing remains at all).
Some corrupted files can be partly recovered into the 'lost+found' dir
of each subvolume.

In the best case, the recovered fs can pass btrfsck without any error.

But in your case, the salvaged data may be somewhat meaningless, as the
recovery works best for uncompressed data!

And in the worst case, your filesystem will be corrupted even more.
So think twice before using btrfsck --repair.

BTW, if you decide to use btrfsck --repair, please upload the full
output, since we can use it to improve the b-tree recovery code.
(Yeah, welcome to being a laboratory mouse for real-world b-tree recovery code)

Thanks,
Qu
Mount with recover, copy over and see the log, which files seem to be
broken? Or some (dangerous) tricks on how to repair this broken file
system?
I do have a full backup, but it's very slow and may take weeks
(months?), if I have to recover everything.

Regards,
Tobias



2015-05-29 2:36 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com:



 Original Message  
Subject: Re: Uncorrectable errors on RAID6
From: Tobias Holst to...@tobby.eu
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2015-05-28 21:13


Ah it's already done. You can find the error-log over here:
https://paste.ee/p/sxCKF

In short there are several of these:
bytenr mismatch, want=6318462353408

Re: Uncorrectable errors on RAID6

2015-05-28 Thread Tobias Holst
Thanks, Qu, sad news... :-(
No, I also didn't defrag with older kernels. Maybe I did it a while
ago with 3.19.x, but there was a scrub afterwards and it showed no
error, so this shouldn't be the problem. The things described above
were all done with 4.0.3/4.0.4.

Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. The balance stops
with an error in the log; scrub just doesn't do anything any more
according to dstat, shows no error and still reports itself as running.

The errors/problems started during the first balance, but maybe the
balance only exposed them and is not the cause.

Here is detailed debug info to (maybe?) recreate the problem. This is
exactly what happened here over some time. As I can only tell when it
definitely was still clean (scrub at the beginning of May) and when it
definitely was broken (now, end of May), some more steps may be
necessary to reproduce it, because several things happened in the
meantime:
- filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L
t-raid -O extref,raid56,skinny-metadata,no-holes on 6
LUKS-encrypted HDDs on kernel 3.19
- mounted with options defaults,compress-force=zlib,space_cache,autodefrag
- copied all data onto it
- all data on the devices is now compressed with zlib
- until now the filesystem is ok, scrub shows no errors
- now mount it with defaults,compress-force=lzo,space_cache instead
- use kernel 4.0.3/4.0.4
- create a r/o-snapshot
- defrag some data with -clzo
- have some (not much) I/O during the process
- this should approx. double the size of the defragged data because
your snapshot contains your data compressed with zlib and your volume
contains your data compressed with lzo
- delete the snapshot
- wait some time until the cleaning is complete, still some other I/O
during this
- this doesn't free as much data as the snapshot contained (?)
- is this ok? Maybe here the problem already existed/started
- defrag the rest of all data on the devices with -clzo, still some
other I/O during this
- now start a balance of the whole array
- errors will spam the log and it's broken.

I hope it is possible to reproduce the errors and find out exactly
when this happens. I'll do the same steps again, too, but maybe there
is someone else who could try it as well? With some small loop-devices
just for testing this shouldn't take too long even if it sounds like
that ;-)
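Scripted against loop devices, the whole sequence would look roughly
like this (all paths, sizes and the snapshot name are made up for the
test, and the loop devices are assumed to be set up already):

  # 1. create and mount the raid6 fs (loop devices instead of LUKS, to keep it simple)
  mkfs.btrfs -f -m raid6 -d raid6 -L t-raid -O extref,raid56,skinny-metadata,no-holes \
      /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6
  mount -o compress-force=zlib,space_cache,autodefrag /dev/loop1 /mnt/test

  # 2. fill it with data and make sure it starts out clean
  mkdir /mnt/test/data
  cp -a /usr/share/. /mnt/test/data/
  btrfs scrub start -B /mnt/test

  # 3. switch to lzo, snapshot read-only, defrag with recompression, drop the snapshot
  mount -o remount,compress-force=lzo,space_cache /mnt/test
  btrfs subvolume snapshot -r /mnt/test /mnt/test/snap-ro
  btrfs filesystem defragment -r -clzo /mnt/test/data
  btrfs subvolume delete /mnt/test/snap-ro

  # 4. defrag the rest, then balance the whole array and watch the syslog
  btrfs filesystem defragment -r -clzo /mnt/test
  btrfs balance start /mnt/test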

Back to my actual data: Are there any tips on how to recover? Mount
with recover, copy over and see the log, which files seem to be
broken? Or some (dangerous) tricks on how to repair this broken file
system?
I do have a full backup, but it's very slow and may take weeks
(months?), if I have to recover everything.

Regards,
Tobias



2015-05-29 2:36 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com:


  Original Message  
 Subject: Re: Uncorrectable errors on RAID6
 From: Tobias Holst to...@tobby.eu
 To: Qu Wenruo quwen...@cn.fujitsu.com
 Date: 2015-05-28 21:13

 Ah it's already done. You can find the error-log over here:
 https://paste.ee/p/sxCKF

 In short there are several of these:
 bytenr mismatch, want=6318462353408, have=56676169344768
 checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
 checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
 checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
 checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
 checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A

 and these:
 ref mismatch on [13431504896 16384] extent item 1, found 0
 Backref 13431504896 root 7 not referenced back 0x1202acc0
 Incorrect global backref count on 13431504896 found 1 wanted 0
 backpointer mismatch on [13431504896 16384]
 owner ref check failed [13431504896 16384]

 and these:
 ref mismatch on [1951739412480 524288] extent item 0, found 1
 Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0
 not found in extent tree
 Incorrect local backref count on 1951739412480 root 5 owner 27852
 offset 644349952 found 1 wanted 0 back 0x1a92aa20
 backpointer mismatch on [1951739412480 524288]

 Any ideas? :)

 The metadata is really corrupted...

 I'd recommend salvaging your data as soon as possible.

 As for the cause: since you didn't run replace, it should at least not
 be the bug spotted by Zhao Lei.

 BTW, did you run defrag on older kernels?
 IIRC, old kernels had a bug with snapshot-aware defrag, which is why it
 was later disabled in newer kernels.
 Not sure if it's related.

 Balance may be related, but I'm not familiar with balance on RAID5/6,
 so it's hard to say.

 Sorry for being unable to provide much help.

 But if you have enough time to find a stable method to reproduce the
 bug, it would be best to try it on loop devices - that would definitely
 help us to debug.

 Thanks,
 Qu


 Regards
 Tobias


 2015-05-28 14:57 GMT+02:00 Tobias Holst to...@tobby.eu:

 Hi Qu,

 no, I didn't run a replace. But I ran a defrag with -clzo on all
 files while there was slight I/O on the devices. I don't know if
 this could cause

Re: Uncorrectable errors on RAID6

2015-05-28 Thread Duncan
Tobias Holst posted on Fri, 29 May 2015 04:00:15 +0200 as excerpted:

 Back to my actual data: Are there any tips on how to recover? Mount
 with recover, copy over and see the log, which files seem to be
 broken? Or some (dangerous) tricks on how to repair this broken file
 system?
 I do have a full backup, but it's very slow and may take weeks
 (months?), if I have to recover everything.

Unfortunately I can't be of any direct help.  For that, Qu is a dev and 
already providing quite a bit.  But perhaps this will help a bit with 
background and in further decisions once the big current issue is dealt 
with...

With that out of the way...

As a (non-dev) btrfs user, sysadmin, and list regular, I can point out 
that full btrfs raid56 mode support is quite new: 3.19 was the first 
kernel with complete support in theory, and any code that new is very 
likely buggy enough that you won't want to rely on it for anything but 
testing.  Real-world deployment... can come later, after a few kernel 
cycles' worth of maturing.  I've been recommending waiting at least two 
kernel cycles to work out the worst bugs, and even that would still be 
very leading, perhaps bleeding, edge.  Better to wait about five cycles, 
a year or so, after which point btrfs raid56 mode should have stabilized 
to about the level of the rest of btrfs, which is to say, not entirely 
stable yet, but reasonably usable for most people, provided they're 
following the sysadmin's backups rule: if they don't have backups, by 
definition they don't care about the data, regardless of claims to the 
contrary, and untested would-be backups cannot, for purposes of this 
rule, be considered backups.

The recommendation for now thus remains to stick with btrfs raid1 or 
raid10 modes, which are already effectively as mature as btrfs itself 
is.  Of course, given the six devices in your raid6, raid10 would be the 
more common choice, but since btrfs raid1 is only two-way-mirrored in 
any case, you'd get the same effective three-device capacity (assuming 
devices of roughly the same size) either way.

And in fact the list unfortunately has several threads of folks with 
similar raid56 mode issues.  On the bright side, I guess their disasters 
are where the improvements and stabilization come from that benefit the 
folks who wait the recommended two kernel cycles minimum, better a year 
(five kernel cycles); were those reports not there, the recommended wait 
time would have to be longer.  Unfortunately that's little help for the 
folks with the problem...

So you have a backup, but it's slow enough you're looking at weeks or 
months to recover from it.  So it's a last-resort backup, but not a 
/practical/ backup.

How on earth did you come to use btrfs raid56 mode for this more or less 
not practically backed up data in the first place, despite the 
recommendations and the long history of partial raid56 support 
indicating its complexity and thus the likelihood of severe bugs still 
being present?  In fact, given a restore time of weeks to months and the 
fact that btrfs itself isn't yet completely stable, I'd wonder about 
choosing it in any mode.  (I can't imagine doing so myself with that 
sort of restore time; I'd give up fancy features in order to get 
something as stable as possible, to cut down as far as possible the 
chance of having to use that backup... or, perhaps more practically, I'd 
have an on-site primary backup with a restore time on the order of hours 
to days, in addition to the presumably remote, slow backup, which 
nevertheless remains an excellent insurance policy for the worst case.)  
But certainly, raid56 mode, still so new that it's extremely likely to 
be buggy enough to eat data, isn't appropriate.

Hopefully you can restore, either via direct copy-off or using btrfs 
restore (as Qu mentions).  The latter is in fact something I've used a 
couple of times myself (on btrfs raid1; there's a reason I say btrfs 
itself isn't fully stable yet) when I had backups but they weren't 
current (obviously a tradeoff I was willing to make, given my knowledge 
of the sysadmin's backup rule above), and btrfs restore worked better 
for me than the backups would have.

But given that you'll have to be restoring to something else anyway, I'd 
strongly recommend at /least/ switching to btrfs raid1/10 mode for the 
time being, if not to something other than btrfs entirely, should you 
still not have backups that restore in hours to days rather than weeks 
to months, because btrfs really /isn't/ stable enough for the latter 
case yet.

Then, since you'll have the extra storage freed up after switching to 
the restored copy, I'd use that to create the local backup, restorable 
in days at maximum rather than weeks at minimum, that you're currently 
missing.  With that backup in place and tested, going ahead and playing 
with btrfs (still not entirely stable, but stable /enough/ for daily use 
with backups ready if needed) is reasonable.  Just stay away from the 
raid56 stuff.

Re: Uncorrectable errors on RAID6

2015-05-28 Thread Tobias Holst
Ah it's already done. You can find the error-log over here:
https://paste.ee/p/sxCKF

In short there are several of these:
bytenr mismatch, want=6318462353408, have=56676169344768
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A

and these:
ref mismatch on [13431504896 16384] extent item 1, found 0
Backref 13431504896 root 7 not referenced back 0x1202acc0
Incorrect global backref count on 13431504896 found 1 wanted 0
backpointer mismatch on [13431504896 16384]
owner ref check failed [13431504896 16384]

and these:
ref mismatch on [1951739412480 524288] extent item 0, found 1
Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0
not found in extent tree
Incorrect local backref count on 1951739412480 root 5 owner 27852
offset 644349952 found 1 wanted 0 back 0x1a92aa20
backpointer mismatch on [1951739412480 524288]

Any ideas? :)

Regards
Tobias


2015-05-28 14:57 GMT+02:00 Tobias Holst to...@tobby.eu:
 Hi Qu,

 no, I didn't run a replace. But I ran a defrag with -clzo on all
 files while there was slight I/O on the devices. I don't know if
 this could cause corruption, too?

 Later on I deleted an r/o snapshot, which should have freed a big
 amount of storage space. It didn't free as much as it should have, so
 after a few days I started a balance to free the space. During the
 balance the first checksum errors happened and the whole balance
 process crashed:

 [19174.342882] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.365473] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.365651] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.366168] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.366250] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.366392] BTRFS: dm-5 checksum verify failed on 6318462353408
 wanted 25D94CD6 found 8BA427D4 level 1
 [19174.367313] [ cut here ]
 [19174.367340] kernel BUG at /home/kernel/COD/linux/fs/btrfs/relocation.c:242!
 [19174.367384] invalid opcode:  [#1] SMP
 [19174.367418] Modules linked in: iosf_mbi kvm_intel kvm
 crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel
 aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper
 cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp
 parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt
 ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy
 psmouse pata_acpi
 [19174.367656] CPU: 1 PID: 4960 Comm: btrfs Not tainted
 4.0.4-040004-generic #201505171336
 [19174.367703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
 BIOS Bochs 01/01/2011
 [19174.367752] task: 8804274e8000 ti: 880367b5 task.ti:
 880367b5
 [19174.367797] RIP: 0010:[c05ec4ba]  [c05ec4ba]
 backref_cache_cleanup+0xea/0x100 [btrfs]
 [19174.367867] RSP: 0018:880367b53bd8  EFLAGS: 00010202
 [19174.367905] RAX: 88008250d8f8 RBX: 88008250d820 RCX: 
 00018021
 [19174.367948] RDX: 88008250d8d8 RSI: 88008250d8e8 RDI: 
 4000
 [19174.367992] RBP: 880367b53bf8 R08: 880418b77780 R09: 
 00018021
 [19174.368037] R10: c05ec1d9 R11: 00018bf8 R12: 
 0001
 [19174.368081] R13: 88008250d8e8 R14: fffb R15: 
 880367b53c28
 [19174.368125] FS:  7f7fd6831c80() GS:88043fc4()
 knlGS:
 [19174.368172] CS:  0010 DS:  ES:  CR0: 80050033
 [19174.368210] CR2: 7f65f7564770 CR3: 0003ac92f000 CR4: 
 001407e0
 [19174.368257] Stack:
 [19174.368279]  fffb 88008250d800 88042b3d46e0
 88006845f990
 [19174.368327]  880367b53c78 c05f25eb 880367b53c78
 0002
 [19174.368376]  00ff880429e4c670 a910d8fb7e00 
 
 [19174.368424] Call Trace:
 [19174.368459]  [c05f25eb] relocate_block_group+0x2cb/0x510 [btrfs]
 [19174.368509]  [c05f29e0]
 btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
 [19174.368562]  [c05c6eab]
 btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
 [19174.368615]  [c05c82e8] __btrfs_balance+0x348/0x460 [btrfs]
 [19174.368663]  [c05c87b5] btrfs_balance+0x3b5/0x5d0 [btrfs]
 [19174.368710]  [c05d5cac] btrfs_ioctl_balance+0x1cc/0x530 [btrfs]
 [19174.368756]  [811b52e0] ? handle_mm_fault+0xb0/0x160
 [19174.368802]  [c05d7c7e] btrfs_ioctl+0x69e/0xb20 [btrfs]
 [19174.368845]  [8120f5b5] 

Re: Uncorrectable errors on RAID6

2015-05-28 Thread Tobias Holst
Hi Qu,

no, I didn't run a replace. But I ran a defrag with -clzo on all
files while there was slight I/O on the devices. I don't know if
this could cause corruption, too?

Later on I deleted an r/o snapshot, which should have freed a big
amount of storage space. It didn't free as much as it should have, so
after a few days I started a balance to free the space. During the
balance the first checksum errors happened and the whole balance
process crashed:

[19174.342882] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.365473] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.365651] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.366168] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.366250] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.366392] BTRFS: dm-5 checksum verify failed on 6318462353408
wanted 25D94CD6 found 8BA427D4 level 1
[19174.367313] [ cut here ]
[19174.367340] kernel BUG at /home/kernel/COD/linux/fs/btrfs/relocation.c:242!
[19174.367384] invalid opcode:  [#1] SMP
[19174.367418] Modules linked in: iosf_mbi kvm_intel kvm
crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper
cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp
parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt
ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy
psmouse pata_acpi
[19174.367656] CPU: 1 PID: 4960 Comm: btrfs Not tainted
4.0.4-040004-generic #201505171336
[19174.367703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[19174.367752] task: 8804274e8000 ti: 880367b5 task.ti:
880367b5
[19174.367797] RIP: 0010:[c05ec4ba]  [c05ec4ba]
backref_cache_cleanup+0xea/0x100 [btrfs]
[19174.367867] RSP: 0018:880367b53bd8  EFLAGS: 00010202
[19174.367905] RAX: 88008250d8f8 RBX: 88008250d820 RCX: 00018021
[19174.367948] RDX: 88008250d8d8 RSI: 88008250d8e8 RDI: 4000
[19174.367992] RBP: 880367b53bf8 R08: 880418b77780 R09: 00018021
[19174.368037] R10: c05ec1d9 R11: 00018bf8 R12: 0001
[19174.368081] R13: 88008250d8e8 R14: fffb R15: 880367b53c28
[19174.368125] FS:  7f7fd6831c80() GS:88043fc4()
knlGS:
[19174.368172] CS:  0010 DS:  ES:  CR0: 80050033
[19174.368210] CR2: 7f65f7564770 CR3: 0003ac92f000 CR4: 001407e0
[19174.368257] Stack:
[19174.368279]  fffb 88008250d800 88042b3d46e0
88006845f990
[19174.368327]  880367b53c78 c05f25eb 880367b53c78
0002
[19174.368376]  00ff880429e4c670 a910d8fb7e00 

[19174.368424] Call Trace:
[19174.368459]  [c05f25eb] relocate_block_group+0x2cb/0x510 [btrfs]
[19174.368509]  [c05f29e0]
btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
[19174.368562]  [c05c6eab]
btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
[19174.368615]  [c05c82e8] __btrfs_balance+0x348/0x460 [btrfs]
[19174.368663]  [c05c87b5] btrfs_balance+0x3b5/0x5d0 [btrfs]
[19174.368710]  [c05d5cac] btrfs_ioctl_balance+0x1cc/0x530 [btrfs]
[19174.368756]  [811b52e0] ? handle_mm_fault+0xb0/0x160
[19174.368802]  [c05d7c7e] btrfs_ioctl+0x69e/0xb20 [btrfs]
[19174.368845]  [8120f5b5] do_vfs_ioctl+0x75/0x320
[19174.368882]  [8120f8f1] SyS_ioctl+0x91/0xb0
[19174.368923]  [817f098d] system_call_fastpath+0x16/0x1b
[19174.368962] Code: 3b 00 75 29 44 8b a3 00 01 00 00 45 85 e4 75 1b
44 8b 9b 04 01 00 00 45 85 db 75 0d 48 83 c4 08 5b 41 5c 41 5d 5d c3
0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00
00 00
[19174.369133] RIP  [c05ec4ba]
backref_cache_cleanup+0xea/0x100 [btrfs]
[19174.369186]  RSP 880367b53bd8
[19174.369827] [ cut here ]
[19174.369827] kernel BUG at /home/kernel/COD/linux/arch/x86/mm/pageattr.c:216!
[19174.369827] invalid opcode:  [#2] SMP
[19174.369827] Modules linked in: iosf_mbi kvm_intel kvm
crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel
aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper
cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp
parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt
ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy
psmouse pata_acpi
[19174.369827] CPU: 1 PID: 4960 Comm: btrfs Not tainted
4.0.4-040004-generic #201505171336
[19174.369827] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[19174.369827] task: 8804274e8000 ti: 880367b5 task.ti:

Re: Uncorrectable errors on RAID6

2015-05-27 Thread Qu Wenruo



 Original Message  
Subject: Uncorrectable errors on RAID6
From: Tobias Holst to...@tobby.eu
To: linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
Date: 2015-05-28 10:18


Hi

I am doing a scrub on my 6-drive btrfs RAID6. Last time it found zero
errors, but now I am getting this in my log:

[ 6610.888020] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888025] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888029] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 6611.271334] BTRFS: unable to fixup (regular) error at logical
478232346624 on dev /dev/dm-2
[ 6611.831370] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831373] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831375] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 6612.396402] BTRFS: unable to fixup (regular) error at logical
478232346624 on dev /dev/dm-2
[ 6904.027456] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027460] BTRFS: checksum error at logical 478232346624 on dev
/dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027463] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0

Looks like it is always the same sector.
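The per-device error counters can also be read directly with something
like (the mount point is a placeholder):

  btrfs device stats /mnt/raid   # write/read/flush/corruption/generation counters per device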

btrfs scrub status shows me:
scrub status for a34ce68b-bb9f-49f0-91fe-21a924ef11ae
 scrub started at Thu May 28 02:25:31 2015, running for 6759 seconds
 total bytes scrubbed: 448.87GiB with 14 errors
 error details: read=8 csum=6
 corrected errors: 3, uncorrectable errors: 11, unverified errors: 0

What does it mean, and why are these errors uncorrectable even on a RAID6?
Can I find out which files are affected?

If it's OK for you to take the fs offline,
btrfsck is the best way to check what is happening, although it may take
a long time.
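With the array unmounted, that is just a plain read-only check (the
device name is a placeholder for one of your LUKS mappings):

  umount /mnt/raid
  btrfsck /dev/mapper/luks-disk1   # read-only by default; only --repair writes

For the data csum errors, btrfs inspect-internal logical-resolve
<logical> <mountpoint> can map a logical byte number from the scrub log
back to file paths, though the metadata-leaf errors in tree 2 won't
resolve to a single file.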


There is a known bug, found by Zhao Lei, where replace can cause
checksum errors.

So did you run replace while there was still some other disk I/O
happening?

Thanks,
Qu


system: Ubuntu 14.04.2
kernel version 4.0.4
btrfs-tools version: 4.0

Regards
Tobias

