Re: Uncorrectable errors on RAID6
Hi Qu, hi all,

RO snapshot - I remember there is a RO snapshot bug, but it seems fixed in 4.x? Yes, that bug has already been fixed.

For recovery, first just try cp -r mnt/* to grab what's still completely OK. Maybe the recovery mount option can help in the process?

That's what I did now. I mounted with recovery and copied all of my important data. But several folders/files couldn't be read; the whole system stopped responding. Nothing in the logs, nothing on the screen - but everything was frozen. So I have to take these files out of my backup. Also, several files produced "checksum verify failed", "csum failed" and "no csum found" errors in the syslog.

Then you may try btrfs restore, which is the safest method - it won't write a single byte to the offline disks.

Yes, but I would need at least the same storage space as for the original data - and I don't have that much free space somewhere else (or not quickly available).

Lastly, you can try btrfsck --repair, *WITH A BINARY BACKUP OF YOUR DISKS*.

I don't have a bitwise copy of my disks, but all important data is secure now. So I tried it, see below.

BTW, if you decided to use btrfsck --repair, please upload the full output, since we can use it to improve the b-tree recovery code.

OK, see below.

(Yeah, welcome to be a laboratory mouse of real-world b-tree recovery code)

Haha, right. Since I have been testing the experimental RAID6 features of btrfs for a while, I know what it means to be a laboratory mouse ;)

So back to btrfsck. I started it, and after a while this happened in the syslog, again and again: https://paste.ee/p/BIs56

According to the internet this is a known but very rare problem with my LSI 9211-8i controller. It happens when the PCIe generation autodetection detects the card as a PCIe 3.0 card instead of 2.0 and heavy I/O is happening. Because I never ever had this bug before, it must be coincidence... But it is not the root cause of this broken filesystem.
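The three recovery tiers discussed above (copy out under the recovery mount option, then btrfs restore, then btrfsck --repair only with a binary backup) could be sketched as a script. This is a dry run - each command is printed rather than executed - and the device and paths are placeholders for the real (LUKS-mapped) setup:

```shell
#!/bin/sh
# Placeholders -- substitute the real device-mapper node and paths.
DEV=/dev/mapper/t-raid
MNT=/mnt/t-raid
OUT=/srv/salvage

run() { echo "+ $*"; }  # dry run: print each command instead of executing it

# Tier 1: mount read-only with the recovery option, copy out what still reads.
run mount -o ro,recovery "$DEV" "$MNT"
run cp -r "$MNT/" "$OUT/"
run umount "$MNT"

# Tier 2: btrfs restore works on the unmounted fs and never writes to it,
# but needs as much free space elsewhere as the data being salvaged.
run btrfs restore "$DEV" "$OUT"

# Tier 3: last resort, only with a binary (bitwise) backup of every disk.
run btrfsck --repair "$DEV"
```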
As a result there were many "blk_update_request: I/O error", "FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE", "Add. Sense: Power on, reset, or bus device reset occurred" and "Buffer I/O error"/"lost async page write" messages in the syslog.

The result of btrfsck --repair until this point: https://paste.ee/p/nzzAo Then btrfsck died: https://paste.ee/p/0Brku

Now I rebooted and forced the card to PCIe generation 2.0, so this bug shouldn't happen again, and started btrfsck --repair again. This time it ran without controller problems, and you can find the full output here: https://ssl-account.com/oc.tobby.eu/public.php?service=filest=8b93f56a69ea04886e9bc2c8534b32f6 (huge, about 13 MB)

Result: One (out of four) folders in my root directory is completely gone (about 8 TB). Two folders seem to be ok (about 1.4 TB). And the last folder is ok in terms of folder and subfolder structure, but nearly all subfolders are empty (only 230 GB of 3.1 TB are still there). So roughly 90% of the data is gone now. I will now destroy the filesystem, create a new btrfs RAID6 and fetch the data out of my backups. I hope my logs help a little bit to find the cause. I didn't have the time to try to reproduce this broken filesystem - did you try it with loop devices?

Regards, Tobias

2015-05-29 4:27 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com: Original Message Subject: Re: Uncorrectable errors on RAID6 From: Tobias Holst to...@tobby.eu To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015-05-29 10:00 Thanks, Qu, sad news... :-( No, I also didn't defrag with older kernels. Maybe I did it a while ago with 3.19.x, but there was a scrub afterwards and it showed no error, so this shouldn't be the problem. The things described above were all done with 4.0.3/4.0.4. Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. Balance stops with an error in the log; scrub just doesn't do anything according to dstat, without any error, and still shows as running.
The errors/problems started during the first balance, but maybe this only revealed them and is not the cause.

Here are detailed debug infos to (maybe?) recreate the problem. This is exactly what happened here over some time. As I can only tell when it definitively has been clean (scrub at the beginning of May) and when it definitively was broken (now, end of May), there may be some more steps necessary to reproduce, because several things happened in the meantime:

- filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L t-raid -O extref,raid56,skinny-metadata,no-holes with 6 LUKS-encrypted HDDs on kernel 3.19

LUKS... Even LUKS is much more stable than btrfs, and may not be related to the bug, but your setup is quite complex anyway.

- mounted with options defaults,compress-force=zlib,space_cache,autodefrag

Normally I'd not recommend compress-force, as btrfs can auto-detect the compression ratio. But such a complex setup, with these mount options on a LUKS base, should be quite a good playground to produce bugs.

- copied all data onto it - all data
Re: Uncorrectable errors on RAID6
Tobias Holst wrote on 2015/06/16 03:31 +0200:

Hi Qu, hi all,

RO snapshot - I remember there is a RO snapshot bug, but it seems fixed in 4.x? Yes, that bug has already been fixed.

For recovery, first just try cp -r mnt/* to grab what's still completely OK. Maybe the recovery mount option can help in the process?

That's what I did now. I mounted with recovery and copied all of my important data. But several folders/files couldn't be read; the whole system stopped responding. Nothing in the logs, nothing on the screen - but everything was frozen. So I have to take these files out of my backup. Also, several files produced "checksum verify failed", "csum failed" and "no csum found" errors in the syslog.

Then you may try btrfs restore, which is the safest method - it won't write a single byte to the offline disks.

Yes, but I would need at least the same storage space as for the original data - and I don't have that much free space somewhere else (or not quickly available).

Lastly, you can try btrfsck --repair, *WITH A BINARY BACKUP OF YOUR DISKS*.

I don't have a bitwise copy of my disks, but all important data is secure now. So I tried it, see below.

BTW, if you decided to use btrfsck --repair, please upload the full output, since we can use it to improve the b-tree recovery code.

OK, see below.

(Yeah, welcome to be a laboratory mouse of real-world b-tree recovery code)

Haha, right. Since I have been testing the experimental RAID6 features of btrfs for a while, I know what it means to be a laboratory mouse ;)

So back to btrfsck. I started it, and after a while this happened in the syslog, again and again: https://paste.ee/p/BIs56

According to the internet this is a known but very rare problem with my LSI 9211-8i controller. It happens when the PCIe generation autodetection detects the card as a PCIe 3.0 card instead of 2.0 and heavy I/O is happening. Because I never ever had this bug before, it must be coincidence... But it is not the root cause of this broken filesystem.
As a result there were many "blk_update_request: I/O error", "FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE", "Add. Sense: Power on, reset, or bus device reset occurred" and "Buffer I/O error"/"lost async page write" messages in the syslog.

Hardware bugs are quite hard to debug, but you still found the bug, nice!

The result of btrfsck --repair until this point: https://paste.ee/p/nzzAo Then btrfsck died: https://paste.ee/p/0Brku

Now I rebooted and forced the card to PCIe generation 2.0, so this bug shouldn't happen again, and started btrfsck --repair again. This time it ran without controller problems, and you can find the full output here: https://ssl-account.com/oc.tobby.eu/public.php?service=filest=8b93f56a69ea04886e9bc2c8534b32f6 (huge, about 13 MB)

After a brief check, about 55K inodes were salvaged; no doubt some will lose their data.

Result: One (out of four) folders in my root directory is completely gone (about 8 TB). Two folders seem to be ok (about 1.4 TB). And the last folder is ok in terms of folder and subfolder structure, but nearly all subfolders are empty (only 230 GB of 3.1 TB are still there). So roughly 90% of the data is gone now.

Quite a lot of inodes were salvaged in a heavily broken state. Did you check the lost+found dir in each subvolume? Almost every salvaged inode is moved to that dir.

I will now destroy the filesystem, create a new btrfs RAID6 and fetch the data out of my backups. I hope my logs help a little bit to find the cause. I didn't have the time to try to reproduce this broken filesystem - did you try it with loop devices?

Not yet, but according to your description it's a problem of the controller, right?

Thanks, Qu

Regards, Tobias

2015-05-29 4:27 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com: Original Message Subject: Re: Uncorrectable errors on RAID6 From: Tobias Holst to...@tobby.eu To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015-05-29 10:00 Thanks, Qu, sad news... :-( No, I also didn't defrag with older kernels.
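Qu's suggestion to check the lost+found directory of each subvolume could be scripted. A minimal sketch, assuming the mount point as a placeholder and using plain find rather than enumerating subvolumes with btrfs subvolume list:

```shell
#!/bin/sh
# List everything btrfsck --repair salvaged into the per-subvolume
# lost+found directories under a given mount point.
list_salvaged() {
    find "$1" -type d -name 'lost+found' 2>/dev/null | while read -r dir; do
        echo "== $dir =="
        ls -A "$dir" | sed 's/^/    /'   # indent the salvaged entries
    done
}

# Placeholder mount point -- substitute the real one.
list_salvaged "${1:-/mnt/t-raid}"
```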
Maybe I did it a while ago with 3.19.x, but there was a scrub afterwards and it showed no error, so this shouldn't be the problem. The things described above were all done with 4.0.3/4.0.4. Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. Balance stops with an error in the log; scrub just doesn't do anything according to dstat, without any error, and still shows as running. The errors/problems started during the first balance, but maybe this only revealed them and is not the cause.

Here are detailed debug infos to (maybe?) recreate the problem. This is exactly what happened here over some time. As I can only tell when it definitively has been clean (scrub at the beginning of May) and when it definitively was broken (now, end of May), there may be some more steps necessary to reproduce, because several things happened in the meantime:

- filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L t-raid -O extref,raid56,skinny-metadata,no-holes with 6 LUKS-encrypted HDDs
Re: Uncorrectable errors on RAID6
Original Message Subject: Re: Uncorrectable errors on RAID6 From: Tobias Holst to...@tobby.eu To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015-05-28 21:13

Ah, it's already done. You can find the error-log over here: https://paste.ee/p/sxCKF

In short there are several of these:

bytenr mismatch, want=6318462353408, have=56676169344768
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A

and these:

ref mismatch on [13431504896 16384] extent item 1, found 0
Backref 13431504896 root 7 not referenced back 0x1202acc0
Incorrect global backref count on 13431504896 found 1 wanted 0
backpointer mismatch on [13431504896 16384]
owner ref check failed [13431504896 16384]

and these:

ref mismatch on [1951739412480 524288] extent item 0, found 1
Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0 not found in extent tree
Incorrect local backref count on 1951739412480 root 5 owner 27852 offset 644349952 found 1 wanted 0 back 0x1a92aa20
backpointer mismatch on [1951739412480 524288]

Any ideas? :)

The metadata is really corrupted... I'd recommend salvaging your data as soon as possible. As you didn't run replace, it should at least not be the bug spotted by Zhao Lei. BTW, did you run defrag on older kernels? IIRC, old kernels had a bug with snapshot-aware defrag, so it was later disabled in newer kernels. Not sure if it's related. Balance may be related, but I'm not familiar with balance on RAID5/6, so it's hard to say. Sorry I'm unable to provide much help. But if you have enough time to find a stable method to reproduce the bug, best try it on loop devices - it would definitely help us to debug.
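The "found"/"wanted" pairs in those messages are CRC-32C values: btrfs stores a checksum for each metadata block and recomputes it on read. As a toy illustration (a pure-Python bitwise CRC-32C, not btrfs's actual hardware-accelerated implementation), a single flipped byte on disk is enough to produce exactly this kind of mismatch:

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial, reflected form), the checksum
    algorithm btrfs used for metadata and data csums at the time of this thread."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Stand-in "metadata block": checksum computed at write time...
block = bytes(range(256)) * 16           # 4 KiB of deterministic bytes
wanted = crc32c(block)

# ...then one byte gets corrupted on disk, so the read-time check fails.
corrupted = block[:100] + bytes([block[100] ^ 0x01]) + block[101:]
found = crc32c(corrupted)
if found != wanted:
    print(f"checksum verify failed: wanted {wanted:08X} found {found:08X}")
```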
Thanks, Qu Regards Tobias 2015-05-28 14:57 GMT+02:00 Tobias Holst to...@tobby.eu: Hi Qu, no, I didn't run a replace. But I ran a defrag with -clzo on all files while there has been slightly I/O on the devices. Don't know if this could cause corruptions, too? Later on I deleted a r/o-snapshot which should free a big amount of storage space. It didn't free as much as it should so after a few days I started a balance to free the space. During the balance the first checksum errors happened and the whole balance process crashed: [19174.342882] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1 [19174.365473] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1 [19174.365651] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1 [19174.366168] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1 [19174.366250] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1 [19174.366392] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1 [19174.367313] [ cut here ] [19174.367340] kernel BUG at /home/kernel/COD/linux/fs/btrfs/relocation.c:242! 
[19174.367384] invalid opcode: [#1] SMP [19174.367418] Modules linked in: iosf_mbi kvm_intel kvm crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy psmouse pata_acpi [19174.367656] CPU: 1 PID: 4960 Comm: btrfs Not tainted 4.0.4-040004-generic #201505171336 [19174.367703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [19174.367752] task: 8804274e8000 ti: 880367b5 task.ti: 880367b5 [19174.367797] RIP: 0010:[c05ec4ba] [c05ec4ba] backref_cache_cleanup+0xea/0x100 [btrfs] [19174.367867] RSP: 0018:880367b53bd8 EFLAGS: 00010202 [19174.367905] RAX: 88008250d8f8 RBX: 88008250d820 RCX: 00018021 [19174.367948] RDX: 88008250d8d8 RSI: 88008250d8e8 RDI: 4000 [19174.367992] RBP: 880367b53bf8 R08: 880418b77780 R09: 00018021 [19174.368037] R10: c05ec1d9 R11: 00018bf8 R12: 0001 [19174.368081] R13: 88008250d8e8 R14: fffb R15: 880367b53c28 [19174.368125] FS: 7f7fd6831c80() GS:88043fc4() knlGS: [19174.368172] CS: 0010 DS: ES: CR0: 80050033 [19174.368210] CR2: 7f65f7564770 CR3: 0003ac92f000 CR4: 001407e0 [19174.368257] Stack: [19174.368279] fffb 88008250d800 88042b3d46e0 88006845f990 [19174.368327] 880367b53c78 c05f25eb 880367b53c78 0002
Re: Uncorrectable errors on RAID6
Original Message Subject: Re: Uncorrectable errors on RAID6 From: Tobias Holst to...@tobby.eu To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015-05-29 10:00

Thanks, Qu, sad news... :-( No, I also didn't defrag with older kernels. Maybe I did it a while ago with 3.19.x, but there was a scrub afterwards and it showed no error, so this shouldn't be the problem. The things described above were all done with 4.0.3/4.0.4. Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. Balance stops with an error in the log; scrub just doesn't do anything according to dstat, without any error, and still shows as running. The errors/problems started during the first balance, but maybe this only revealed them and is not the cause.

Here are detailed debug infos to (maybe?) recreate the problem. This is exactly what happened here over some time. As I can only tell when it definitively has been clean (scrub at the beginning of May) and when it definitively was broken (now, end of May), there may be some more steps necessary to reproduce, because several things happened in the meantime:

- filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L t-raid -O extref,raid56,skinny-metadata,no-holes with 6 LUKS-encrypted HDDs on kernel 3.19

LUKS... Even LUKS is much more stable than btrfs, and may not be related to the bug, but your setup is quite complex anyway.

- mounted with options defaults,compress-force=zlib,space_cache,autodefrag

Normally I'd not recommend compress-force, as btrfs can auto-detect the compression ratio. But such a complex setup, with these mount options on a LUKS base, should be quite a good playground to produce bugs.

- copied all data onto it - all data on the devices is now compressed with zlib
- until now the filesystem is ok, scrub shows no errors

Autodefrag seems not related to this bug, as you removed it from the mount options. It wouldn't even have had an effect anyway, as you copied data in from another place, without overwrites.
- now mount it with defaults,compress-force=lzo,space_cache instead
- use kernel 4.0.3/4.0.4
- create a r/o-snapshot

RO snapshot - I remember there is a RO snapshot bug, but it seems fixed in 4.x?

- defrag some data with -clzo
- have some (not much) I/O during the process
- this should approx. double the size of the defragged data, because your snapshot contains your data compressed with zlib and your volume contains your data compressed with lzo
- delete the snapshot
- wait some time until the cleaning is complete, still some other I/O during this
- this doesn't free as much data as the snapshot contained (?) - is this ok? Maybe here the problem already existed/started
- defrag the rest of all data on the devices with -clzo, still some other I/O during this
- now start a balance of the whole array
- errors will spam the log and it's broken.

I hope it is possible to reproduce the errors and find out exactly when this happens. I'll do the same steps again, too, but maybe there is someone else who could try it as well? I'll try it with a script, but maybe without LUKS to simplify the setup. With some small loop devices just for testing, this shouldn't take too long, even if it sounds like that ;-)

Back to my actual data: Are there any tips on how to recover?

For recovery, first just try cp -r mnt/* to grab what's still completely OK. Maybe the recovery mount option can help in the process? Then you may try btrfs restore, which is the safest method - it won't write a single byte to the offline disks. Lastly, you can try btrfsck --repair, *WITH A BINARY BACKUP OF YOUR DISKS*. With luck, it can make your filesystem completely clean at the cost of some files being lost (maybe the file name, part of the data, or nothing remaining). Some corrupted files can be partly recovered into the 'lost+found' dir of each subvolume. In the best case, the recovered fs can pass btrfsck without any error.
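The reproduction steps above could be scripted on loop devices, without LUKS, as suggested. This sketch is a dry run - each command is printed, not executed, since running it for real needs root and btrfs-progs - and /mnt/t-raid and /data/testset are placeholder paths:

```shell
#!/bin/sh
run() { echo "+ $*"; }  # dry run: print each command instead of executing it

# Six small backing files instead of six LUKS-encrypted HDDs.
for i in 1 2 3 4 5 6; do
    run truncate -s 8G "disk$i.img"
    run losetup "/dev/loop$i" "disk$i.img"
done

run mkfs.btrfs -f -m raid6 -d raid6 -L t-raid \
    -O extref,raid56,skinny-metadata,no-holes \
    /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5 /dev/loop6

# Fill with zlib-compressed data, then remount with lzo forced instead.
run mount -o defaults,compress-force=zlib,space_cache,autodefrag /dev/loop1 /mnt/t-raid
run cp -a /data/testset /mnt/t-raid/
run umount /mnt/t-raid
run mount -o defaults,compress-force=lzo,space_cache /dev/loop1 /mnt/t-raid

# RO snapshot, defrag to lzo (ideally with concurrent I/O), drop the snapshot.
run btrfs subvolume snapshot -r /mnt/t-raid /mnt/t-raid/snap
run btrfs filesystem defragment -r -clzo /mnt/t-raid
run btrfs subvolume delete /mnt/t-raid/snap

# Balance is where the checksum errors first appeared; scrub to verify.
run btrfs balance start /mnt/t-raid
run btrfs scrub start -B /mnt/t-raid
```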
But for your case, the salvaged data will be somewhat meaningless, as it works best for uncompressed data! And in the worst case, your filesystem will be corrupted even more. So think twice before using btrfsck --repair.

BTW, if you decided to use btrfsck --repair, please upload the full output, since we can use it to improve the b-tree recovery code. (Yeah, welcome to be a laboratory mouse of real-world b-tree recovery code)

Thanks, Qu

Mount with recovery, copy over and see in the log which files seem to be broken? Or some (dangerous) tricks on how to repair this broken file system? I do have a full backup, but it's very slow and may take weeks (months?) if I have to recover everything.

Regards, Tobias

2015-05-29 2:36 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com: Original Message Subject: Re: Uncorrectable errors on RAID6 From: Tobias Holst to...@tobby.eu To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015-05-28 21:13 Ah, it's already done. You can find the error-log over here: https://paste.ee/p/sxCKF In short there are several of these: bytenr mismatch, want=6318462353408
Re: Uncorrectable errors on RAID6
Thanks, Qu, sad news... :-( No, I also didn't defrag with older kernels. Maybe I did it a while ago with 3.19.x, but there was a scrub afterwards and it showed no error, so this shouldn't be the problem. The things described above were all done with 4.0.3/4.0.4. Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. Balance stops with an error in the log; scrub just doesn't do anything according to dstat, without any error, and still shows as running. The errors/problems started during the first balance, but maybe this only revealed them and is not the cause.

Here are detailed debug infos to (maybe?) recreate the problem. This is exactly what happened here over some time. As I can only tell when it definitively has been clean (scrub at the beginning of May) and when it definitively was broken (now, end of May), there may be some more steps necessary to reproduce, because several things happened in the meantime:

- filesystem was created with mkfs.btrfs -f -m raid6 -d raid6 -L t-raid -O extref,raid56,skinny-metadata,no-holes with 6 LUKS-encrypted HDDs on kernel 3.19
- mounted with options defaults,compress-force=zlib,space_cache,autodefrag
- copied all data onto it - all data on the devices is now compressed with zlib
- until now the filesystem is ok, scrub shows no errors
- now mount it with defaults,compress-force=lzo,space_cache instead
- use kernel 4.0.3/4.0.4
- create a r/o-snapshot
- defrag some data with -clzo
- have some (not much) I/O during the process
- this should approx. double the size of the defragged data, because your snapshot contains your data compressed with zlib and your volume contains your data compressed with lzo
- delete the snapshot
- wait some time until the cleaning is complete, still some other I/O during this
- this doesn't free as much data as the snapshot contained (?) - is this ok? Maybe here the problem already existed/started
- defrag the rest of all data on the devices with -clzo, still some other I/O during this
- now start a balance of the whole array
- errors will spam the log and it's broken.

I hope it is possible to reproduce the errors and find out exactly when this happens. I'll do the same steps again, too, but maybe there is someone else who could try it as well? With some small loop devices just for testing, this shouldn't take too long, even if it sounds like that ;-)

Back to my actual data: Are there any tips on how to recover? Mount with recovery, copy over and see in the log which files seem to be broken? Or some (dangerous) tricks on how to repair this broken file system? I do have a full backup, but it's very slow and may take weeks (months?) if I have to recover everything.

Regards, Tobias

2015-05-29 2:36 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com: Original Message Subject: Re: Uncorrectable errors on RAID6 From: Tobias Holst to...@tobby.eu To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2015-05-28 21:13 Ah, it's already done.
You can find the error-log over here: https://paste.ee/p/sxCKF

In short there are several of these:

bytenr mismatch, want=6318462353408, have=56676169344768
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A

and these:

ref mismatch on [13431504896 16384] extent item 1, found 0
Backref 13431504896 root 7 not referenced back 0x1202acc0
Incorrect global backref count on 13431504896 found 1 wanted 0
backpointer mismatch on [13431504896 16384]
owner ref check failed [13431504896 16384]

and these:

ref mismatch on [1951739412480 524288] extent item 0, found 1
Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0 not found in extent tree
Incorrect local backref count on 1951739412480 root 5 owner 27852 offset 644349952 found 1 wanted 0 back 0x1a92aa20
backpointer mismatch on [1951739412480 524288]

Any ideas? :)

The metadata is really corrupted... I'd recommend salvaging your data as soon as possible. As you didn't run replace, it should at least not be the bug spotted by Zhao Lei. BTW, did you run defrag on older kernels? IIRC, old kernels had a bug with snapshot-aware defrag, so it was later disabled in newer kernels. Not sure if it's related. Balance may be related, but I'm not familiar with balance on RAID5/6, so it's hard to say. Sorry I'm unable to provide much help. But if you have enough time to find a stable method to reproduce the bug, best try it on loop devices - it would definitely help us to debug.

Thanks, Qu

Regards Tobias

2015-05-28 14:57 GMT+02:00 Tobias Holst to...@tobby.eu: Hi Qu, no, I didn't run a replace. But I ran a defrag with -clzo on all files while there was slight I/O on the devices. Don't know if this could cause
Re: Uncorrectable errors on RAID6
Tobias Holst posted on Fri, 29 May 2015 04:00:15 +0200 as excerpted:

Back to my actual data: Are there any tips on how to recover? Mount with recovery, copy over and see in the log which files seem to be broken? Or some (dangerous) tricks on how to repair this broken file system? I do have a full backup, but it's very slow and may take weeks (months?) if I have to recover everything.

Unfortunately I can't be of any direct help. For that, Qu is a dev and is already providing quite a bit. But perhaps this will help a bit with background and in further decisions once the big current issue is dealt with...

With that out of the way: as a (non-dev) btrfs user, sysadmin, and list regular, I can point out that full btrfs raid56 mode support is quite new; 3.19 was the first release that had complete support in theory, and any code that new is very likely buggy enough that you won't want to rely on it for anything but testing. Real-world deployment... can come later, after a few kernel cycles worth of maturing. I've been recommending waiting at least two kernel cycles to work out the worst bugs, and that would still be very leading, perhaps bleeding, edge. Better to wait about five cycles, a year or so, after which point btrfs raid56 mode should have stabilized to about the level of the rest of btrfs, which is to say, not entirely stable yet, but reasonably usable for most people, provided they're following the sysadmin's backups rule: if they don't have backups, by definition they don't care about the data, regardless of claims to the contrary, and untested would-be backups cannot, for purposes of this rule, be considered backups. The recommendation for now thus remains to stick with btrfs raid1 or raid10 modes, which are already effectively as mature as btrfs itself is.
Of course, given the six devices in your raid6, raid10 would be the more common choice, but since btrfs raid1 is only two-way-mirrored in any case, you'd get the same effective three-device capacity (assuming devices of roughly the same size) either way. And in fact the list unfortunately has several threads of folks with similar raid56 mode issues. On the bright side, I guess their disasters are where the improvements and stabilization come from that the folks waiting the recommended two kernel cycles minimum, better a year (five kernel cycles), get; were they not there, the recommended wait time would have to be longer. Unfortunately that's little help for the folks with the problem...

So you have a backup, but it's slow enough you're looking at weeks or months to recover from it. So it's a last-resort backup, but not a /practical/ backup. How on earth did you come to use btrfs raid56 mode for this more or less not practically backed up data in the first place, despite the recommendations, and despite the long history of partial raid56 support indicating its complexity and thus the likelihood of severe bugs still being present? In fact, given a restore time of weeks to months and the fact that btrfs itself isn't yet completely stable, I'd wonder about choosing it in any mode (I can't imagine doing so myself with that sort of restore time; I'd give up fancy features in order to get something as stable as possible, to cut down as far as possible the chance of having to use the backup... or perhaps more practically, I'd have an on-site primary backup with a restore time on the order of hours to days, in addition to the presumably remote, slow backup, which nevertheless remains an excellent insurance policy for the worst case), but certainly, raid56 mode, still so new it's extremely likely to be buggy enough to eat data, isn't appropriate.
Hopefully you can restore, either via direct copy-off, or using btrfs restore (as Qu mentions), which has in fact been something I've used a couple times myself (on btrfs raid1, there's a reason I say btrfs itself isn't fully stable yet) as I've had backups but they weren't current (obviously a tradeoff I was willing to make, given my knowledge of the sysadmin's backup rule above), and btrfs restore worked better for me than the backups would have. But given that you'll have to be restoring to something else, I'd strongly recommend at /least/ switching to btrfs raid1/10 mode, for the time being, if not to something other than btrfs if you still aren't going to have backups that restore in hours to days rather than weeks to months, because btrfs really /isn't/ stable enough for the latter case yet. Then, since you'll have the extra storage you'll have freed after switching to the restored copy, I'd use that to create that local backup, restorable in days at maximum, rather than weeks at minimum, that you're currently missing. With that backup in-place and tested, going ahead and playing with btrfs in its still not entirely stable, but for daily use with backups ready if needed, stable /enough/, is reasonable. Just stay away from the raid56 stuff
Re: Uncorrectable errors on RAID6
Ah it's already done. You can find the error-log over here: https://paste.ee/p/sxCKF In short there are several of these: bytenr mismatch, want=6318462353408, have=56676169344768 checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890 checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890 checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A and these: ref mismatch on [13431504896 16384] extent item 1, found 0 Backref 13431504896 root 7 not referenced back 0x1202acc0 Incorrect global backref count on 13431504896 found 1 wanted 0 backpointer mismatch on [13431504896 16384] owner ref check failed [13431504896 16384] and these: ref mismatch on [1951739412480 524288] extent item 0, found 1 Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0 not found in extent tree Incorrect local backref count on 1951739412480 root 5 owner 27852 offset 644349952 found 1 wanted 0 back 0x1a92aa20 backpointer mismatch on [1951739412480 524288] Any ideas? :) Regards Tobias 2015-05-28 14:57 GMT+02:00 Tobias Holst to...@tobby.eu: Hi Qu, no, I didn't run a replace. But I ran a defrag with -clzo on all files while there has been slightly I/O on the devices. Don't know if this could cause corruptions, too? Later on I deleted a r/o-snapshot which should free a big amount of storage space. It didn't free as much as it should so after a few days I started a balance to free the space. 
During the balance the first checksum errors happened and the whole balance process crashed:

[19174.342882] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
[19174.365473] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
[19174.365651] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
[19174.366168] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
[19174.366250] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
[19174.366392] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
[19174.367313] ------------[ cut here ]------------
[19174.367340] kernel BUG at /home/kernel/COD/linux/fs/btrfs/relocation.c:242!
[19174.367384] invalid opcode: [#1] SMP
[19174.367418] Modules linked in: iosf_mbi kvm_intel kvm crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy psmouse pata_acpi
[19174.367656] CPU: 1 PID: 4960 Comm: btrfs Not tainted 4.0.4-040004-generic #201505171336
[19174.367703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[19174.367752] task: 8804274e8000 ti: 880367b5 task.ti: 880367b5
[19174.367797] RIP: 0010:[c05ec4ba] [c05ec4ba] backref_cache_cleanup+0xea/0x100 [btrfs]
[19174.367867] RSP: 0018:880367b53bd8 EFLAGS: 00010202
[19174.367905] RAX: 88008250d8f8 RBX: 88008250d820 RCX: 00018021
[19174.367948] RDX: 88008250d8d8 RSI: 88008250d8e8 RDI: 4000
[19174.367992] RBP: 880367b53bf8 R08: 880418b77780 R09: 00018021
[19174.368037] R10: c05ec1d9 R11: 00018bf8 R12: 0001
[19174.368081] R13: 88008250d8e8 R14: fffb R15: 880367b53c28
[19174.368125] FS: 7f7fd6831c80() GS:88043fc4() knlGS:
[19174.368172] CS: 0010 DS: ES: CR0: 80050033
[19174.368210] CR2: 7f65f7564770 CR3: 0003ac92f000 CR4: 001407e0
[19174.368257] Stack:
[19174.368279] fffb 88008250d800 88042b3d46e0 88006845f990
[19174.368327] 880367b53c78 c05f25eb 880367b53c78 0002
[19174.368376] 00ff880429e4c670 a910d8fb7e00
[19174.368424] Call Trace:
[19174.368459] [c05f25eb] relocate_block_group+0x2cb/0x510 [btrfs]
[19174.368509] [c05f29e0] btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
[19174.368562] [c05c6eab] btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
[19174.368615] [c05c82e8] __btrfs_balance+0x348/0x460 [btrfs]
[19174.368663] [c05c87b5] btrfs_balance+0x3b5/0x5d0 [btrfs]
[19174.368710] [c05d5cac] btrfs_ioctl_balance+0x1cc/0x530 [btrfs]
[19174.368756] [811b52e0] ? handle_mm_fault+0xb0/0x160
[19174.368802] [c05d7c7e] btrfs_ioctl+0x69e/0xb20 [btrfs]
[19174.368845] [8120f5b5] do_vfs_ioctl+0x75/0x320
[19174.368882] [8120f8f1] SyS_ioctl+0x91/0xb0
[19174.368923] [817f098d] system_call_fastpath+0x16/0x1b
[19174.368962] Code: 3b 00 75 29 44 8b a3 00 01 00 00 45 85 e4 75 1b 44 8b 9b 04 01 00 00 45 85 db 75 0d 48 83 c4 08 5b 41 5c 41 5d 5d c3 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
[19174.369133] RIP [c05ec4ba] backref_cache_cleanup+0xea/0x100 [btrfs]
[19174.369186] RSP 880367b53bd8
[19174.369827] ------------[ cut here ]------------
[19174.369827] kernel BUG at /home/kernel/COD/linux/arch/x86/mm/pageattr.c:216!
[19174.369827] invalid opcode: [#2] SMP
[19174.369827] Modules linked in: iosf_mbi kvm_intel kvm crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy psmouse pata_acpi
[19174.369827] CPU: 1 PID: 4960 Comm: btrfs Not tainted 4.0.4-040004-generic #201505171336
[19174.369827] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[19174.369827] task: 8804274e8000 ti: 880367b5 task.ti:
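For context on what a "checksum verify failed on X found A wanted B" line means: btrfs recomputes the CRC-32C (Castagnoli) checksum of the metadata block it just read and compares it against the sum stored on disk; "found" is the stored value, "wanted" the recomputed one. As a hedged illustration only (btrfs's exact framing of the checksum, such as the seed and the covered byte range, is not reproduced here), the core bit-reflected CRC-32C computation looks like this in pure Python:

```python
# Minimal bit-reflected CRC-32C (Castagnoli), reversed polynomial
# 0x82F63B78 -- the checksum family btrfs uses. Illustrative sketch,
# not btrfs's actual implementation.
def crc32c(data: bytes, crc: int = 0) -> int:
    crc ^= 0xFFFFFFFF                       # pre-invert
    for byte in data:
        crc ^= byte
        for _ in range(8):                  # process one bit at a time
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1))
    return crc ^ 0xFFFFFFFF                 # post-invert
```

The standard CRC-32C check value is `crc32c(b"123456789") == 0xE3069283`; a mismatch between this computation and the sum stored in the block header is exactly what the log lines above report.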
Original Message
Subject: Uncorrectable errors on RAID6
From: Tobias Holst to...@tobby.eu
To: linux-btrfs@vger.kernel.org
Date: 2015-05-28 10:18

Hi,

I am doing a scrub on my 6-drive btrfs RAID6. Last time it found zero errors, but now I am getting this in my log:

[ 6610.888020] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888025] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888029] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 6611.271334] BTRFS: unable to fixup (regular) error at logical 478232346624 on dev /dev/dm-2
[ 6611.831370] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831373] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831375] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 6612.396402] BTRFS: unable to fixup (regular) error at logical 478232346624 on dev /dev/dm-2
[ 6904.027456] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027460] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027463] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0

Looks like it is always the same sector. btrfs scrub status shows me:

scrub status for a34ce68b-bb9f-49f0-91fe-21a924ef11ae
scrub started at Thu May 28 02:25:31 2015, running for 6759 seconds
total bytes scrubbed: 448.87GiB with 14 errors
error details: read=8 csum=6
corrected errors: 3, uncorrectable errors: 11, unverified errors: 0

What does this mean, and why are these errors uncorrectable even on a RAID6? Can I find out which files are affected?
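On the question of why errors can be uncorrectable even on RAID6: parity can only rebuild blocks whose failure location is known (an erasure, e.g. a dead device or a stripe member that itself fails its checksum); when a block was silently written wrong but every device returns data without complaint, each reconstruction candidate simply disagrees with the stored csum again. As background, this is a hedged sketch of the P/Q parity math (not btrfs code; it uses GF(2^8) with polynomial 0x11D and generator 2, the same field as the Linux RAID6 implementation):

```python
# Sketch of RAID6 P/Q parity over GF(2^8) mod x^8+x^4+x^3+x^2+1 (0x11D),
# generator g = 2 -- the field the Linux RAID6 code uses. Illustration
# of the math only, not btrfs's actual implementation.

def gf_mul(a, b):
    """Multiply in GF(2^8), reducing by 0x11D on overflow."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, 254)  # a^254 = a^-1, since the group has order 255

def pq_syndromes(blocks):
    """P = XOR of the data blocks; Q = sum over GF of g^i * d_i."""
    p, q = bytearray(len(blocks[0])), bytearray(len(blocks[0]))
    for i, blk in enumerate(blocks):
        coef = gf_pow(2, i)
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)

def recover_two(blocks, x, y, p, q):
    """Rebuild data blocks x and y (both known-lost) from P and Q."""
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        pp, qq = p[j], q[j]                # strip surviving blocks out
        for i, blk in enumerate(blocks):
            if i not in (x, y):
                pp ^= blk[j]
                qq ^= gf_mul(gf_pow(2, i), blk[j])
        # d_x = (Q' ^ g^y * P') / (g^x ^ g^y);  d_y = P' ^ d_x
        dx[j] = gf_mul(gf_mul(gy, pp) ^ qq, denom)
        dy[j] = pp ^ dx[j]
    return bytes(dx), bytes(dy)
```

With one lost block, P alone suffices (plain XOR, as in RAID5); with two, the P and Q syndromes give two independent equations per byte, which is why a healthy RAID6 survives any two known device failures yet cannot fix a block that was written wrong on an otherwise "healthy" device.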
If it's OK for you to take the fs offline, btrfsck is the best method to check what happened, although it may take a long time.

There is a known bug, found by Zhao Lei, where replace can cause checksum errors. So did you run a replace while there was still other disk I/O happening?

Thanks,
Qu

system: Ubuntu 14.04.2
kernel version: 4.0.4
btrfs-tools version: 4.0

Regards
Tobias
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
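The recovery ladder discussed in this thread, from least to most invasive, can be summarized as a command sketch. Device paths and destination directories below are placeholders, the mount option on 4.x-era kernels is `recovery` (later renamed `usebackuproot`), and these commands operate on real block devices, so take this as an outline rather than a recipe:

```shell
# 0) Before anything destructive: a bitwise backup, or at least a
#    metadata-only image the developers can inspect.
dd if=/dev/sdX of=/backup/sdX.img bs=1M conv=sync,noerror
btrfs-image -c9 /dev/sdX /backup/sdX.btrfs-image

# 1) Read-only rescue mount: copy out whatever still reads cleanly.
mount -o ro,recovery /dev/sdX /mnt
cp -r /mnt/* /safe/place/

# 2) Offline extraction -- writes nothing to the damaged devices,
#    but needs free space elsewhere equal to the recovered data.
btrfs restore -D /dev/sdX /dev/null   # dry run: list what is recoverable
btrfs restore /dev/sdX /safe/place/

# 3) Last resort, only with the backup from step 0 in hand.
btrfs check --repair /dev/sdX
```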