Re: cannot mount read-write because of unsupported optional features (2)
Ah, thanks! Looks like I missed this bug on the list... But as I understand it, it should be mountable with btrfs-progs >=4.7.3 or when mounting it with "-o clear_cache,nospace_cache". But neither works. Is it only mountable with kernel >=4.9 now? That would be interesting, since I never ran a kernel newer than 4.8...

Regards,
Tobias

2016-11-01 6:24 GMT+01:00 Qu Wenruo <quwen...@cn.fujitsu.com>:
> At 11/01/2016 12:46 PM, Tobias Holst wrote:
>> Hi
>>
>> I can't mount my boot partition anymore. When I try it by entering
>> "mount /dev/sdi1 /mnt/boot/" I get:
>>>
>>> mount: wrong fs type, bad option, bad superblock on /dev/sdi1,
>>> missing codepage or helper program, or other error
>>>
>>> In some cases useful info is found in syslog - try
>>> dmesg | tail or so.
>>
>> "dmesg | tail" gives me:
>>>
>>> BTRFS info (device sdi1): using free space tree
>>> BTRFS info (device sdi1): has skinny extents
>>> BTRFS error (device sdi1): cannot mount read-write because of unsupported
>>> optional features (2)
>>> BTRFS: open_ctree failed
>>
>> I am using an Ubuntu 16.10 Live CD with Kernel 4.8 and btrfs-progs v4.8.2.
>>
>> "btrfs inspect-internal dump-super /dev/sdi1" gives me the following:
>>>
>>> superblock: bytenr=65536, device=/dev/sdi1
>>> ---------------------------------------------------------
>>> csum_type               0 (crc32c)
>>> csum_size               4
>>> csum                    0x0f346b08 [match]
>>> bytenr                  65536
>>> flags                   0x1
>>>                         ( WRITTEN )
>>> magic                   _BHRfS_M [match]
>>> fsid                    67ac5740-1ced-4d59-8999-03bb3195ec49
>>> label                   t-hyper-boot
>>> generation              64
>>> root                    20971520
>>> sys_array_size          129
>>> chunk_root_generation   43
>>> root_level              1
>>> chunk_root              12587008
>>> chunk_root_level        0
>>> log_root                0
>>> log_root_transid        0
>>> log_root_level          0
>>> total_bytes             2147483648
>>> bytes_used              102813696
>>> sectorsize              4096
>>> nodesize                4096
>>> leafsize                4096
>>> stripesize              4096
>>> root_dir                6
>>> num_devices             1
>>> compat_flags            0x0
>>> compat_ro_flags         0x3
>
> compat_ro_flags 0x3 is FREE_SPACE_TREE and FREE_SPACE_TREE_VALID.
> FREE_SPACE_TREE_VALID was introduced later, to fix an endianness bug in the
> free space tree.
>
> And that seems to be the cause.
>
> Thanks,
> Qu
>
>>> incompat_flags          0x34d
>>>                         ( MIXED_BACKREF |
>>>                           MIXED_GROUPS |
>>>                           COMPRESS_LZO |
>>>                           EXTENDED_IREF |
>>>                           SKINNY_METADATA |
>>>                           NO_HOLES )
>>> cache_generation        53
>>> uuid_tree_generation    64
>>> dev_item.uuid           273d5800-add1-45bb-8a11-ecd6d8c1503e
>>> dev_item.fsid           67ac5740-1ced-4d59-8999-03bb3195ec49 [match]
>>> dev_item.type           0
>>> dev_item.total_bytes    2147483648
>>> dev_item.bytes_used     667680768
>>> dev_item.io_align       4096
>>> dev_item.io_width       4096
>>> dev_item.sector_size    4096
>>> dev_item.devid          1
>>> dev_item.dev_group      0
>>> dev_item.seek_speed     0
>>> dev_item.bandwidth      0
>>> dev_item.generation     0
>>
>> Does anyone have an idea why I can't mount it anymore?
>>
>> Regards,
>> Tobias
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
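[For reference, a sketch not taken from the thread: the flag words above decode mechanically, using the bit values from the kernel's btrfs feature-flag definitions (tables shortened to the flags present in this dump). The "(2)" in the mount error is the compat_ro bit a 4.8 kernel does not recognize, i.e. FREE_SPACE_TREE_VALID (0x2).]

```python
# Decode btrfs superblock feature-flag words. Bit values as in the
# kernel's btrfs feature-flag definitions; shortened to the flags that
# appear in this particular dump.
COMPAT_RO = {0x1: "FREE_SPACE_TREE", 0x2: "FREE_SPACE_TREE_VALID"}
INCOMPAT = {
    0x1: "MIXED_BACKREF", 0x4: "MIXED_GROUPS", 0x8: "COMPRESS_LZO",
    0x40: "EXTENDED_IREF", 0x100: "SKINNY_METADATA", 0x200: "NO_HOLES",
}

def decode(flags, table):
    """Return the names of all bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if flags & bit]

print(decode(0x3, COMPAT_RO))   # the two free-space-tree bits from the dump
print(decode(0x34d, INCOMPAT))  # matches the incompat list in the dump
```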
cannot mount read-write because of unsupported optional features (2)
Hi

I can't mount my boot partition anymore. When I try it by entering "mount /dev/sdi1 /mnt/boot/" I get:

> mount: wrong fs type, bad option, bad superblock on /dev/sdi1,
> missing codepage or helper program, or other error
>
> In some cases useful info is found in syslog - try
> dmesg | tail or so.

"dmesg | tail" gives me:

> BTRFS info (device sdi1): using free space tree
> BTRFS info (device sdi1): has skinny extents
> BTRFS error (device sdi1): cannot mount read-write because of unsupported
> optional features (2)
> BTRFS: open_ctree failed

I am using an Ubuntu 16.10 Live CD with Kernel 4.8 and btrfs-progs v4.8.2.

"btrfs inspect-internal dump-super /dev/sdi1" gives me the following:

> superblock: bytenr=65536, device=/dev/sdi1
> ---------------------------------------------------------
> csum_type               0 (crc32c)
> csum_size               4
> csum                    0x0f346b08 [match]
> bytenr                  65536
> flags                   0x1
>                         ( WRITTEN )
> magic                   _BHRfS_M [match]
> fsid                    67ac5740-1ced-4d59-8999-03bb3195ec49
> label                   t-hyper-boot
> generation              64
> root                    20971520
> sys_array_size          129
> chunk_root_generation   43
> root_level              1
> chunk_root              12587008
> chunk_root_level        0
> log_root                0
> log_root_transid        0
> log_root_level          0
> total_bytes             2147483648
> bytes_used              102813696
> sectorsize              4096
> nodesize                4096
> leafsize                4096
> stripesize              4096
> root_dir                6
> num_devices             1
> compat_flags            0x0
> compat_ro_flags         0x3
> incompat_flags          0x34d
>                         ( MIXED_BACKREF |
>                           MIXED_GROUPS |
>                           COMPRESS_LZO |
>                           EXTENDED_IREF |
>                           SKINNY_METADATA |
>                           NO_HOLES )
> cache_generation        53
> uuid_tree_generation    64
> dev_item.uuid           273d5800-add1-45bb-8a11-ecd6d8c1503e
> dev_item.fsid           67ac5740-1ced-4d59-8999-03bb3195ec49 [match]
> dev_item.type           0
> dev_item.total_bytes    2147483648
> dev_item.bytes_used     667680768
> dev_item.io_align       4096
> dev_item.io_width       4096
> dev_item.sector_size    4096
> dev_item.devid          1
> dev_item.dev_group      0
> dev_item.seek_speed     0
> dev_item.bandwidth      0
> dev_item.generation     0

Does anyone have an idea why I can't mount it anymore?
Regards,
Tobias
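[A side note on the dump above, not from the mail: csum_type 0 means the superblock checksum is CRC-32C, i.e. CRC-32 with the Castagnoli polynomial. For illustration only, a minimal bit-by-bit sketch of that checksum kernel; the exact range of superblock bytes btrfs feeds into it is defined by the on-disk format and not reproduced here.]

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli): reflected, reversed poly 0x82F63B78,
    init and final XOR of 0xFFFFFFFF."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value for the ASCII string "123456789":
print(hex(crc32c(b"123456789")))  # 0xe3069283
```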
"parent transid verify failed"
Hi

I am getting some "parent transid verify failed" errors. Is there any way to find out what's affected? Are these errors in metadata, data or both - and if they are errors in the data, how can I find out which files are affected?

Regards,
Tobias
Re: [PATCH v4 0/9] Btrfs: free space B-tree
Ah, thanks for the information! Happy testing :)

2015-11-03 19:34 GMT+01:00 Chris Mason <c...@fb.com>:
> On Tue, Nov 03, 2015 at 07:13:37PM +0100, Tobias Holst wrote:
>> Hi
>>
>> Anything new on this topic?
>>
>> I think it would be a great thing and should be merged as soon as it
>> is stable. :)
>
> I've been testing it, but my plan is 4.5.
>
> -chris
Re: [PATCH v4 0/9] Btrfs: free space B-tree
Hi

Anything new on this topic? I think it would be a great thing and should be merged as soon as it is stable. :)

Regards,
Tobias

2015-10-02 13:47 GMT+02:00 Austin S Hemmelgarn:
> On 2015-09-29 23:50, Omar Sandoval wrote:
>>
>> Hi,
>>
>> Here's one more reroll of the free space B-tree patches, a more scalable
>> alternative to the free space cache. Minimal changes this time around, I
>> mainly wanted to resend this after Holger and I cleared up his bug
>> report here: http://www.spinics.net/lists/linux-btrfs/msg47165.html. It
>> initially looked like it was a bug in a patch that Josef sent, then in
>> this series, but finally Holger and I figured out that it was something
>> else in the queue of patches he carries around, we just don't know what
>> yet (I'm in the middle of looking into it). While trying to reproduce
>> that bug, I ran xfstests about a trillion times and a bunch of stress
>> tests, so this is fairly well tested now. Additionally, the last time
>> around, Holger and Austin both bravely offered their Tested-bys on the
>> series. I wasn't sure which patch(es) to tack them onto, so here they
>> are:
>>
>> Tested-by: Holger Hoffstätte
>> Tested-by: Austin S. Hemmelgarn
>
> I've re-run the same testing I did for the last iteration, and also tested
> that the btrfs_end_transaction thing mentioned below works right now
> (ironically, that's one of the few things I didn't think of testing last
> time :)), so the Tested-by from me is current now.
>
>>
>> Thanks, everyone!
>>
>> Omar
>>
>> Changes from v3->v4:
>>
>> - Added a missing btrfs_end_transaction() to btrfs_create_free_space_tree() and
>>   btrfs_clear_free_space_tree() in the error cases after we abort the
>>   transaction (see http://www.spinics.net/lists/linux-btrfs/msg47545.html)
>> - Rebased the kernel patches on v4.3-rc3
>> - Rebased the progs patches on v4.2.1
>>
>> v3: http://www.spinics.net/lists/linux-btrfs/msg47095.html
>>
>> Changes from v2->v3:
>>
>> - Fixed a warning in the free space tree sanity tests caught by Zhao Lei.
>> - Moved the addition of a block group to the free space tree to occur either on
>>   the first attempt to modify the free space for the block group or in
>>   btrfs_create_pending_block_groups(), whichever happens first. This avoids a
>>   deadlock (lock recursion) when modifying the free space tree requires
>>   allocating a new block group. In order to do this, it was simpler to change
>>   the on-disk semantics: the superblock stripes will now appear to be free space
>>   according to the free space tree, but load_free_space_tree() will still
>>   exclude them when building the in-memory free space cache.
>> - Changed the free_space_tree option to space_cache=v2 and made clear_cache
>>   clear the free space tree. If the free space tree has been created, the
>>   mount will fail unless space_cache=v2 or nospace_cache,clear_cache is given,
>>   because we cannot allow the free space tree to get out of date.
>> - Did a once-over of the code and caught a couple of error handling typos.
>>
>> v2: http://www.spinics.net/lists/linux-btrfs/msg46796.html
>>
>> Changes from v1->v2:
>>
>> - Cleaned up a bunch of unnecessary instances of "if (ret) goto out; ret = 0"
>> - Added aborts in the free space tree code closer to the site the error is
>>   encountered: where we add or remove block groups, add or remove free space,
>>   and also when we convert formats
>> - Moved loading of the free space tree into caching_thread() and added a new
>>   patch 3 in preparation for it
>> - Commented a bunch of stuff in the extent buffer bitmap operations and
>>   refactored some of the complicated logic
>>
>> v1: http://www.spinics.net/lists/linux-btrfs/msg46713.html
>>
>> Omar Sandoval (9):
>>   Btrfs: add extent buffer bitmap operations
>>   Btrfs: add extent buffer bitmap sanity tests
>>   Btrfs: add helpers for read-only compat bits
>>   Btrfs: refactor caching_thread()
>>   Btrfs: introduce the free space B-tree on-disk format
>>   Btrfs: implement the free space B-tree
>>   Btrfs: add free space tree sanity tests
>>   Btrfs: wire up the free space tree to the extent tree
>>   Btrfs: add free space tree mount option
>>
>>  fs/btrfs/Makefile                |    5 +-
>>  fs/btrfs/ctree.h                 |  157 +++-
>>  fs/btrfs/disk-io.c               |   38 +
>>  fs/btrfs/extent-tree.c           |   98 +-
>>  fs/btrfs/extent_io.c             |  183 +++-
>>  fs/btrfs/extent_io.h             |   10 +-
>>  fs/btrfs/free-space-tree.c       | 1584
>>  fs/btrfs/free-space-tree.h       |   72 ++
>>  fs/btrfs/super.c                 |   56 +-
>>  fs/btrfs/tests/btrfs-tests.c     |   52 ++
>>  fs/btrfs/tests/btrfs-tests.h     |   10 +
>>  fs/btrfs/tests/extent-io-tests.c
Re: "free_raid_bio" crash on RAID6
Hi

No, I never figured this out... After a while of waiting for answers I just started over and took the data from my backup.

> Did you try removing the bad drive and did the system keep crashing anyway?

As you can see in my first mail, the drive was already removed when this error started to happen ("some devices missing"). ;)

Regards,
Tobias

2015-10-18 16:14 GMT+02:00 Philip Seeger <p0h0i0l0...@gmail.com>:
> Hi Tobias
>
> On 07/20/2015 06:20 PM, Tobias Holst wrote:
>>
>> My btrfs-RAID6 seems to be broken again :(
>>
>> When reading from it I get several of these:
>> [  176.349943] BTRFS info (device dm-4): csum failed ino 1287707
>> extent 21274957705216 csum 2830458701 wanted 426660650 mirror 2
>>
>> then followed by a "free_raid_bio" crash:
>>
>> [  176.349961] ------------[ cut here ]------------
>> [  176.349981] WARNING: CPU: 6 PID: 110 at
>> /home/kernel/COD/linux/fs/btrfs/raid56.c:831
>> __free_raid_bio+0xfc/0x130 [btrfs]()
>> ...
>
> It's been 3 months now, have you ever figured this out? Do you know if the
> bug has been identified and fixed, or have you filed a bugzilla report?
>
>> One drive is broken, so at the moment it is mounted with "-O
>> defaults,ro,degraded,recovery,compress=lzo,space_cache,subvol=raid".
>
> Did you try removing the bad drive and did the system keep crashing anyway?
>
> Philip
free_raid_bio crash on RAID6
Hi

My btrfs-RAID6 seems to be broken again :(

When reading from it I get several of these:

[  176.349943] BTRFS info (device dm-4): csum failed ino 1287707 extent 21274957705216 csum 2830458701 wanted 426660650 mirror 2

then followed by a "free_raid_bio" crash:

[  176.349961] ------------[ cut here ]------------
[  176.349981] WARNING: CPU: 6 PID: 110 at /home/kernel/COD/linux/fs/btrfs/raid56.c:831 __free_raid_bio+0xfc/0x130 [btrfs]()
[  176.349982] Modules linked in: iosf_mbi kvm_intel kvm ppdev crct10dif_pclmul crc32_pclmul dm_crypt ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper serio_raw 8250_fintek i2c_piix4 pvpanic cryptd mac_hid virtio_rng parport_pc lp parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt ttm drm_kms_helper mpt2sas drm raid_class psmouse floppy scsi_transport_sas pata_acpi
[  176.349998] CPU: 6 PID: 110 Comm: kworker/u16:2 Not tainted 4.1.2-040102-generic #201507101335
[  176.34] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  176.350007] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[  176.350008]  c026fc18 8800baa4f978 817d076c
[  176.350010]  8800baa4f9b8 81079b0a 0246
[  176.350011]  88034e7baa68 88008619b800 fffb
[  176.350013] Call Trace:
[  176.350023]  [817d076c] dump_stack+0x45/0x57
[  176.350026]  [81079b0a] warn_slowpath_common+0x8a/0xc0
[  176.350029]  [81079bfa] warn_slowpath_null+0x1a/0x20
[  176.350036]  [c025e91c] __free_raid_bio+0xfc/0x130 [btrfs]
[  176.350041]  [c025f351] rbio_orig_end_io+0x51/0xa0 [btrfs]
[  176.350047]  [c02610e3] __raid56_parity_recover+0x1d3/0x210 [btrfs]
[  176.350052]  [c0261cb0] raid56_parity_recover+0x110/0x180 [btrfs]
[  176.350058]  [c0216cdb] btrfs_map_bio+0xdb/0x4e0 [btrfs]
[  176.350065]  [c0236024] btrfs_submit_compressed_read+0x354/0x4e0 [btrfs]
[  176.350070]  [c01ee681] btrfs_submit_bio_hook+0x1d1/0x1e0 [btrfs]
[  176.350076]  [81376dbe] ? bio_add_page+0x5e/0x70
[  176.350083]  [c020c176] ? btrfs_create_repair_bio+0xe6/0x110 [btrfs]
[  176.350089]  [c020c6ab] end_bio_extent_readpage+0x50b/0x560 [btrfs]
[  176.350094]  [c020c1a0] ? btrfs_create_repair_bio+0x110/0x110 [btrfs]
[  176.350096]  [8137934b] bio_endio+0x5b/0xa0
[  176.350103]  [811d9e19] ? kmem_cache_free+0x1d9/0x1f0
[  176.350104]  [813793a2] bio_endio_nodec+0x12/0x20
[  176.350109]  [c01e10df] end_workqueue_fn+0x3f/0x50 [btrfs]
[  176.350115]  [c021b522] normal_work_helper+0xc2/0x2b0 [btrfs]
[  176.350121]  [c021b7e2] btrfs_endio_helper+0x12/0x20 [btrfs]
[  176.350124]  [8109324f] process_one_work+0x14f/0x420
[  176.350127]  [81093a08] worker_thread+0x118/0x530
[  176.350128]  [810938f0] ? rescuer_thread+0x3d0/0x3d0
[  176.350129]  [81098f89] kthread+0xc9/0xe0
[  176.350130]  [81098ec0] ? kthread_create_on_node+0x180/0x180
[  176.350134]  [817d86a2] ret_from_fork+0x42/0x70
[  176.350135]  [81098ec0] ? kthread_create_on_node+0x180/0x180
[  176.350136] ---[ end trace 81289955f20d48ee ]---

Did I find a kernel bug? What can/should I do? Don't worry about my data - I have tape backups of the important data, I just want to help fix RAID-related btrfs bugs.

Hardware: KVM with all drives attached to a passed-through SAS controller
System: Ubuntu 14.04.2
Kernel: 4.1.2
btrfs-tools: 4.0

It's a btrfs RAID-6 on top of 6 LUKS-encrypted volumes, created with "-O extref,raid56,skinny-metadata,no-holes". Normally it's mounted with "defaults,compress=lzo,space_cache,autodefrag,subvol=raid". One drive is broken, so at the moment it is mounted with "-O defaults,ro,degraded,recovery,compress=lzo,space_cache,subvol=raid".
It's pretty much full, so "btrfs fi show" shows:

Label: 't-raid'  uuid: 3938baeb-cb02-4909-8e75-6ec2f47d1d19
        Total devices 6 FS bytes used 14.44TiB
        devid    2 size 3.64TiB used 3.64TiB path /dev/mapper/sdb_crypt
        devid    3 size 3.64TiB used 3.64TiB path /dev/mapper/sdc_crypt
        devid    4 size 3.64TiB used 3.64TiB path /dev/mapper/sdd_crypt
        devid    5 size 3.64TiB used 3.64TiB path /dev/mapper/sde_crypt
        devid    6 size 3.64TiB used 3.64TiB path /dev/mapper/sdf_crypt
        *** Some devices missing

and "btrfs fi df /raid" shows:

Data, RAID6: total=14.52TiB, used=14.42TiB
System, RAID6: total=64.00MiB, used=1.00MiB
Metadata, RAID6: total=24.00GiB, used=21.78GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Regards,
Tobias
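[A quick cross-check of the numbers above - my arithmetic, not from the mail: RAID6 spends two devices' worth of every stripe on parity, so n equal devices yield (n-2)/n of the raw space for data.]

```python
def raid6_usable_tib(n_devices: int, device_tib: float) -> float:
    # Two devices' worth of each stripe goes to the P and Q parity blocks.
    return n_devices * device_tib * (n_devices - 2) / n_devices

# Six 3.64 TiB devices, as in the "btrfs fi show" output above:
print(raid6_usable_tib(6, 3.64))  # ~14.56 TiB
```

That is consistent with the ~14.5 TiB of data plus ~24 GiB of metadata reported, i.e. the array really is full.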
Re: Uncorrectable errors on RAID6
Hi Qu, hi all,

> RO snapshot... I remember there is a RO snapshot bug, but it seems fixed in 4.x?

Yes, that bug has already been fixed.

> For recovery, first just try "cp -r mnt/*" to grab what's still completely OK.
> Maybe the "recovery" mount option can help in the process?

That's what I did now. I mounted with "recovery" and copied all of my important data. But several folders/files couldn't be read - the whole system stopped responding. Nothing in the logs, nothing on the screen, but everything was frozen. So I have to take these files out of my backup. Several files also produced "checksum verify failed", "csum failed" and "no csum found" errors in the syslog.

> Then you may try btrfs restore, which is the safest method - it won't write a
> single byte to the offline disks.

Yes, but I would need at least the same storage space as for the original data - and I don't have that much free space somewhere else (or not quickly available).

> Lastly, you can try btrfsck --repair, *WITH A BINARY BACKUP OF YOUR DISKS*.

I don't have a bitwise copy of my disks, but all important data is safe now. So I tried it, see below.

> BTW, if you decide to use btrfsck --repair, please upload the full output,
> since we can use it to improve the b-tree recovery code.

OK, see below.

> (Yeah, welcome to be a laboratory mouse for real-world b-tree recovery code.)

Haha, right. Since I have been testing the experimental RAID6 features of btrfs for a while, I know what it means to be a laboratory mouse ;)

So back to btrfsck. I started it, and after a while this happened in the syslog, again and again: https://paste.ee/p/BIs56

According to the internet this is a known but very rare problem with my LSI 9211-8i controller. It happens when the PCIe generation autodetection detects the card as a PCIe 3.0 card instead of 2.0 and heavy I/O is happening. Because I never ever had this bug before, it must be coincidence... but it is not the root cause of this broken filesystem.

As a result there were many "blk_update_request: I/O error", "FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE", "Add. Sense: Power on, reset, or bus device reset occurred" and "Buffer I/O error"/"lost async page write" messages in the syslog. The result of btrfsck --repair until this point: https://paste.ee/p/nzzAo

Then btrfsck died: https://paste.ee/p/0Brku

Now I rebooted and forced the card to PCIe generation 2.0, so this bug shouldn't happen again, and started btrfsck --repair again. This time it ran without controller problems and you can find the full output here: https://ssl-account.com/oc.tobby.eu/public.php?service=filest=8b93f56a69ea04886e9bc2c8534b32f6 (huge, about 13MB)

Result: one (out of four) folders in my root directory is completely gone (about 8 TB). Two folders seem to be OK (about 1.4 TB). And the last folder is OK in terms of folder and subfolder structure, but nearly all subfolders are empty (only 230GB of 3.1TB are still there). So roughly 90% of the data is gone now.

I will now destroy the filesystem, create a new btrfs RAID-6 and fetch the data out of my backups. I hope my logs help a little bit to find the cause. I didn't have the time to try to reproduce this broken filesystem - did you try it with loop devices?

Regards,
Tobias

2015-05-29 4:27 GMT+02:00 Qu Wenruo <quwen...@cn.fujitsu.com>:
> -------- Original Message --------
> Subject: Re: Uncorrectable errors on RAID6
> From: Tobias Holst <to...@tobby.eu>
> To: Qu Wenruo <quwen...@cn.fujitsu.com>
> Date: 2015-05-29 10:00
>
>> Thanks, Qu, sad news... :-(
>>
>> No, I also didn't defrag with older kernels. Maybe I did it a while ago with
>> 3.19.x, but there was a scrub afterwards and it showed no error, so this
>> shouldn't be the problem. The things described above were all done with
>> 4.0.3/4.0.4. Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. Balance
>> with an error in the log; scrub just doesn't do anything according to dstat,
>> without any error, and still shows "running".
>>
>> The errors/problems started during the first balance, but maybe this only
>> showed them and is not the cause. Here is detailed debug info to (maybe?)
>> recreate the problem. This is exactly what happened here over some time. As
>> I can only tell when it definitely has been clean (scrub at the beginning of
>> May) and when it definitely was broken (now, end of May), there may be some
>> more steps necessary to reproduce, because several things happened in the
>> meantime:
>>
>> - filesystem was created with "mkfs.btrfs -f -m raid6 -d raid6 -L t-raid
>>   -O extref,raid56,skinny-metadata,no-holes" with 6 LUKS-encrypted HDDs on
>>   kernel 3.19
>
> LUKS... Even LUKS is much more stable than btrfs, and may not be related to
> the bug, but your setup is quite complex anyway.
>
>> - mounted with options "defaults,compress-force=zlib,space_cache,autodefrag"
>
> Normally I'd not recommend compress-force, as btrfs can auto-detect the
> compression ratio. But such a complex setup, with such mount options on a
> LUKS base, should be quite a good playground to produce some bugs.
>
>> - copied all data onto it
>> - all data
checksum verify failed vs. csum failed
Hi

Just a question to understand my logs. It doesn't matter where these errors come from, I just want to understand them. What is the difference between these two message types?

> BTRFS: dm-4 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1

vs.

> BTRFS warning (device dm-4): csum failed ino 27594 off 1266679808 csum 1065556573 expected csum 0

Maybe the first one was a correctable error and the second one not?

Regards,
Tobias
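[For what it's worth, an editorial note rather than a list reply: the two messages come from different layers, and the difference can be read straight out of the text. "checksum verify failed on <bytenr> ... level N" is a metadata tree block at that logical address failing verification; "csum failed ino <ino> off <off>" is a data-extent checksum failure for a particular inode and file offset, so the inode number can be resolved to a path (e.g. with "btrfs inspect-internal inode-resolve"). Neither line by itself says whether the error was later corrected. A toy classifier over the two sample lines, as an illustration only:]

```python
import re

# "checksum verify failed on <bytenr>": a metadata B-tree block read back
# at that logical address did not match its stored checksum.
METADATA_RE = re.compile(r"checksum verify failed on (\d+)")
# "csum failed ino <inode> off <offset>": file data disagreed with the
# checksum tree; the inode number identifies the affected file.
DATA_RE = re.compile(r"csum failed ino (\d+) off (\d+)")

def classify(line):
    m = METADATA_RE.search(line)
    if m:
        return ("metadata", int(m.group(1)))  # logical bytenr
    m = DATA_RE.search(line)
    if m:
        return ("data", int(m.group(1)))      # inode number
    return ("other", None)

print(classify("BTRFS: dm-4 checksum verify failed on 6318462353408 "
               "wanted 25D94CD6 found 8BA427D4 level 1"))
print(classify("BTRFS warning (device dm-4): csum failed ino 27594 off "
               "1266679808 csum 1065556573 expected csum 0"))
```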
Re: Uncorrectable errors on RAID6
Thanks, Qu, sad news... :-(

No, I also didn't defrag with older kernels. Maybe I did it a while ago with 3.19.x, but there was a scrub afterwards and it showed no error, so this shouldn't be the problem. The things described above were all done with 4.0.3/4.0.4. Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. Balance with an error in the log; scrub just doesn't do anything according to dstat, without any error, and still shows "running".

The errors/problems started during the first balance, but maybe this only showed them and is not the cause. Here is detailed debug info to (maybe?) recreate the problem. This is exactly what happened here over some time. As I can only tell when it definitely has been clean (scrub at the beginning of May) and when it definitely was broken (now, end of May), there may be some more steps necessary to reproduce, because several things happened in the meantime:

- filesystem was created with "mkfs.btrfs -f -m raid6 -d raid6 -L t-raid -O extref,raid56,skinny-metadata,no-holes" with 6 LUKS-encrypted HDDs on kernel 3.19
- mounted with options "defaults,compress-force=zlib,space_cache,autodefrag"
- copied all data onto it
- all data on the devices is now compressed with zlib
- until now the filesystem is OK, scrub shows no errors
- now mount it with "defaults,compress-force=lzo,space_cache" instead
- use kernel 4.0.3/4.0.4
- create a r/o snapshot
- defrag some data with "-clzo"
- have some (not much) I/O during the process
- this should approximately double the size of the defragged data, because your snapshot contains your data compressed with zlib and your volume contains your data compressed with lzo
- delete the snapshot
- wait some time until the cleaning is complete, still some other I/O during this
- this doesn't free as much data as the snapshot contained (?) - is this OK? Maybe here the problem already existed/started
- defrag the rest of all data on the devices with "-clzo", still some other I/O during this
- now start a balance of the whole array
- errors will spam the log and it's broken.

I hope it is possible to reproduce the errors and find out exactly when this happens. I'll do the same steps again, too, but maybe there is someone else who could try it as well? With some small loop devices just for testing, this shouldn't take too long, even if it sounds like that ;-)

Back to my actual data: are there any tips on how to recover? Mount with "recovery", copy over and watch the log to see which files seem to be broken? Or some (dangerous) tricks on how to repair this broken filesystem? I do have a full backup, but it's very slow and may take weeks (months?) if I have to recover everything.

Regards,
Tobias

2015-05-29 2:36 GMT+02:00 Qu Wenruo <quwen...@cn.fujitsu.com>:
> -------- Original Message --------
> Subject: Re: Uncorrectable errors on RAID6
> From: Tobias Holst <to...@tobby.eu>
> To: Qu Wenruo <quwen...@cn.fujitsu.com>
> Date: 2015-05-28 21:13
>
>> Ah it's already done.
>>
>> You can find the error-log over here: https://paste.ee/p/sxCKF
>>
>> In short there are several of these:
>>
>> bytenr mismatch, want=6318462353408, have=56676169344768
>> checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
>> checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
>> checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
>> checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
>> checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
>>
>> and these:
>>
>> ref mismatch on [13431504896 16384] extent item 1, found 0
>> Backref 13431504896 root 7 not referenced back 0x1202acc0
>> Incorrect global backref count on 13431504896 found 1 wanted 0
>> backpointer mismatch on [13431504896 16384]
>> owner ref check failed [13431504896 16384]
>>
>> and these:
>>
>> ref mismatch on [1951739412480 524288] extent item 0, found 1
>> Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0 not found in extent tree
>> Incorrect local backref count on 1951739412480 root 5 owner 27852 offset 644349952 found 1 wanted 0 back 0x1a92aa20
>> backpointer mismatch on [1951739412480 524288]
>>
>> Any ideas? :)
>
> The metadata is really corrupted... I'd recommend salvaging your data as
> soon as possible.
>
> As for the reason: since you didn't run replace, it should at least not be
> the bug spotted by Zhao Lei.
>
> BTW, did you run defrag on older kernels? IIRC, old kernels had a bug with
> snapshot-aware defrag, so it was later disabled in newer kernels. Not sure
> if it's related.
>
> Balance may be related, but I'm not familiar with balance on RAID5/6, so
> it's hard to say.
>
> Sorry for being unable to provide much help. But if you have enough time to
> find a stable method to reproduce the bug, best try it on loop devices - it
> would definitely help us to debug.
>
> Thanks,
> Qu
>
>> Regards
>> Tobias
>>
>> 2015-05-28 14:57 GMT+02:00 Tobias Holst <to...@tobby.eu>:
>>> Hi Qu, no, I didn't run a replace. But I ran a defrag with -clzo on all
>>> files while there has been slight I/O on the devices. Don't know if this
>>> could cause
Re: Uncorrectable errors on RAID6
Ah it's already done.

You can find the error-log over here: https://paste.ee/p/sxCKF

In short there are several of these:

bytenr mismatch, want=6318462353408, have=56676169344768
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A

and these:

ref mismatch on [13431504896 16384] extent item 1, found 0
Backref 13431504896 root 7 not referenced back 0x1202acc0
Incorrect global backref count on 13431504896 found 1 wanted 0
backpointer mismatch on [13431504896 16384]
owner ref check failed [13431504896 16384]

and these:

ref mismatch on [1951739412480 524288] extent item 0, found 1
Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0 not found in extent tree
Incorrect local backref count on 1951739412480 root 5 owner 27852 offset 644349952 found 1 wanted 0 back 0x1a92aa20
backpointer mismatch on [1951739412480 524288]

Any ideas? :)

Regards
Tobias

2015-05-28 14:57 GMT+02:00 Tobias Holst <to...@tobby.eu>:
> Hi Qu, no, I didn't run a replace. But I ran a defrag with -clzo on all
> files while there has been slight I/O on the devices. Don't know if this
> could cause corruptions, too? Later on I deleted a r/o-snapshot which should
> free a big amount of storage space. It didn't free as much as it should, so
> after a few days I started a balance to free the space.
> During the balance the first checksum errors happened and the whole balance
> process crashed:
>
> [19174.342882] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
> [19174.365473] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
> [19174.365651] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
> [19174.366168] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
> [19174.366250] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
> [19174.366392] BTRFS: dm-5 checksum verify failed on 6318462353408 wanted 25D94CD6 found 8BA427D4 level 1
> [19174.367313] ------------[ cut here ]------------
> [19174.367340] kernel BUG at /home/kernel/COD/linux/fs/btrfs/relocation.c:242!
> [19174.367384] invalid opcode: [#1] SMP
> [19174.367418] Modules linked in: iosf_mbi kvm_intel kvm crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy psmouse pata_acpi
> [19174.367656] CPU: 1 PID: 4960 Comm: btrfs Not tainted 4.0.4-040004-generic #201505171336
> [19174.367703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [19174.367752] task: 8804274e8000 ti: 880367b5 task.ti: 880367b5
> [19174.367797] RIP: 0010:[c05ec4ba] [c05ec4ba] backref_cache_cleanup+0xea/0x100 [btrfs]
> [19174.367867] RSP: 0018:880367b53bd8 EFLAGS: 00010202
> [19174.367905] RAX: 88008250d8f8 RBX: 88008250d820 RCX: 00018021
> [19174.367948] RDX: 88008250d8d8 RSI: 88008250d8e8 RDI: 4000
> [19174.367992] RBP: 880367b53bf8 R08: 880418b77780 R09: 00018021
> [19174.368037] R10: c05ec1d9 R11: 00018bf8 R12: 0001
> [19174.368081] R13: 88008250d8e8 R14: fffb R15: 880367b53c28
> [19174.368125] FS: 7f7fd6831c80() GS:88043fc4() knlGS:
> [19174.368172] CS: 0010 DS: ES: CR0: 80050033
> [19174.368210] CR2: 7f65f7564770 CR3: 0003ac92f000 CR4: 001407e0
> [19174.368257] Stack:
> [19174.368279]  fffb 88008250d800 88042b3d46e0 88006845f990
> [19174.368327]  880367b53c78 c05f25eb 880367b53c78 0002
> [19174.368376]  00ff880429e4c670 a910d8fb7e00
> [19174.368424] Call Trace:
> [19174.368459]  [c05f25eb] relocate_block_group+0x2cb/0x510 [btrfs]
> [19174.368509]  [c05f29e0] btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
> [19174.368562]  [c05c6eab] btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
> [19174.368615]  [c05c82e8] __btrfs_balance+0x348/0x460 [btrfs]
> [19174.368663]  [c05c87b5] btrfs_balance+0x3b5/0x5d0 [btrfs]
> [19174.368710]  [c05d5cac] btrfs_ioctl_balance+0x1cc/0x530 [btrfs]
> [19174.368756]  [811b52e0] ? handle_mm_fault+0xb0/0x160
> [19174.368802]  [c05d7c7e] btrfs_ioctl+0x69e/0xb20 [btrfs]
> [19174.368845]  [8120f5b5
Re: Uncorrectable errors on RAID6
] [c05ec43c] ? backref_cache_cleanup+0x6c/0x100 [btrfs]
[19174.369827] [c05f25eb] relocate_block_group+0x2cb/0x510 [btrfs]
[19174.369827] [c05f29e0] btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
[19174.369827] [c05c6eab] btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
[19174.369827] [c05c82e8] __btrfs_balance+0x348/0x460 [btrfs]
[19174.369827] [c05c87b5] btrfs_balance+0x3b5/0x5d0 [btrfs]
[19174.369827] [c05d5cac] btrfs_ioctl_balance+0x1cc/0x530 [btrfs]
[19174.369827] [811b52e0] ? handle_mm_fault+0xb0/0x160
[19174.369827] [c05d7c7e] btrfs_ioctl+0x69e/0xb20 [btrfs]
[19174.369827] [8120f5b5] do_vfs_ioctl+0x75/0x320
[19174.369827] [8120f8f1] SyS_ioctl+0x91/0xb0
[19174.369827] [817f098d] system_call_fastpath+0x16/0x1b
[19174.369827] Code: 4e 8b 2c 23 eb cd 66 0f 1f 44 00 00 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 90 be 00 10 00 00 4c 89 ef e8 a3 ee ff ff eb c7 0f 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[19174.369827] RIP [8106875f] cpa_flush_array+0x10f/0x120
[19174.369827] RSP 880367b52cf8
[19174.369827] ---[ end trace 60adc437bd944044 ]---
After a reboot and a remount it always tried to resume the balance and then crashed again, so I had to be quick to do a btrfs balance cancel. Then I started the scrub and got these uncorrectable errors I mentioned in the first mail. I just unmounted it and started a btrfsck. Will post the output when it's done.
It's already showing me several of these:
checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
checksum verify failed on 18523667709952 found 5EAB6BFE wanted BA48D648
checksum verify failed on 18523667709952 found 8E19F60E wanted E3A34D18
checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
bytenr mismatch, want=18523667709952, have=10838194617263884761
Thanks, Tobias
2015-05-28 4:49 GMT+02:00 Qu Wenruo quwen...@cn.fujitsu.com:
Original Message
Subject: Uncorrectable errors on RAID6
From: Tobias Holst to...@tobby.eu
To: linux-btrfs@vger.kernel.org linux-btrfs@vger.kernel.org
Date: 2015-05-28 10:18
Hi
I am doing a scrub on my 6-drive btrfs RAID6. Last time it found zero errors, but now I am getting this in my log:
[ 6610.888020] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888025] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888029] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 6611.271334] BTRFS: unable to fixup (regular) error at logical 478232346624 on dev /dev/dm-2
[ 6611.831370] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831373] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831375] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 6612.396402] BTRFS: unable to fixup (regular) error at logical 478232346624 on dev /dev/dm-2
[ 6904.027456] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027460] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027463] BTRFS: bdev
/dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Looks like it is always the same sector. btrfs scrub status shows me:
scrub status for a34ce68b-bb9f-49f0-91fe-21a924ef11ae
scrub started at Thu May 28 02:25:31 2015, running for 6759 seconds
total bytes scrubbed: 448.87GiB with 14 errors
error details: read=8 csum=6
corrected errors: 3, uncorrectable errors: 11, unverified errors: 0
What does it mean and why are these errors uncorrectable even on a RAID6? Can I find out which files are affected?
If it's OK for you to put the fs offline, btrfsck is the best method to check what happens, although it may take a long time.
There is a known bug that replace can cause checksum errors, found by Zhao Lei. So did you run replace while some other disk I/O was happening?
Thanks, Qu
system: Ubuntu 14.04.2 kernel version 4.0.4 btrfs-tools version: 4.0
Regards Tobias
-- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
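On the "which files are affected" question: the logical byte address from the error messages can usually be mapped back to file paths with `btrfs inspect-internal logical-resolve` on the mounted filesystem. A minimal sketch that only assembles the command (the address and the /mnt mount point are taken from this thread as examples; running it for real needs root and a mounted fs):

```python
def logical_resolve_cmd(logical, mountpoint):
    """Build the btrfs-progs command that resolves a logical byte
    address (as printed in 'checksum error at logical ...') to the
    file paths that reference it."""
    return ["btrfs", "inspect-internal", "logical-resolve",
            "-v", str(logical), mountpoint]

cmd = logical_resolve_cmd(478232346624, "/mnt")
print(" ".join(cmd))
# Execute for real (as root, fs mounted) with e.g. subprocess.run(cmd)
```

Addresses that fall in metadata (tree 2 is the extent tree) will not resolve to regular files, so an empty result is itself informative.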
Uncorrectable errors on RAID6
Hi
I am doing a scrub on my 6-drive btrfs RAID6. Last time it found zero errors, but now I am getting this in my log:
[ 6610.888020] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888025] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6610.888029] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[ 6611.271334] BTRFS: unable to fixup (regular) error at logical 478232346624 on dev /dev/dm-2
[ 6611.831370] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831373] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6611.831375] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[ 6612.396402] BTRFS: unable to fixup (regular) error at logical 478232346624 on dev /dev/dm-2
[ 6904.027456] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027460] BTRFS: checksum error at logical 478232346624 on dev /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
[ 6904.027463] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Looks like it is always the same sector. btrfs scrub status shows me:
scrub status for a34ce68b-bb9f-49f0-91fe-21a924ef11ae
scrub started at Thu May 28 02:25:31 2015, running for 6759 seconds
total bytes scrubbed: 448.87GiB with 14 errors
error details: read=8 csum=6
corrected errors: 3, uncorrectable errors: 11, unverified errors: 0
What does it mean and why are these errors uncorrectable even on a RAID6? Can I find out which files are affected?
system: Ubuntu 14.04.2 kernel version 4.0.4 btrfs-tools version: 4.0
Regards Tobias
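When watching counters like the ones above over repeated scrubs, the `btrfs scrub status` text can be parsed mechanically. A small sketch against the exact lines quoted in this mail (the counter names vary slightly across btrfs-progs versions, so treat the field list as an assumption):

```python
import re

# Counter names exactly as they appear in the scrub status quoted above.
_FIELDS = (r"(read|csum|corrected errors|uncorrectable errors|"
           r"unverified errors)[=:]\s*(\d+)")

def parse_scrub_errors(status_text):
    """Extract the error counters from `btrfs scrub status` output."""
    return {name: int(val) for name, val in re.findall(_FIELDS, status_text)}

sample = """error details: read=8 csum=6
corrected errors: 3, uncorrectable errors: 11, unverified errors: 0"""
print(parse_scrub_errors(sample))
# {'read': 8, 'csum': 6, 'corrected errors': 3,
#  'uncorrectable errors': 11, 'unverified errors': 0}
```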
Re: Repair broken btrfs raid6?
OK, I see. Maybe there is even more damage... Now I finished my second backup of the important data and just killed this damaged raid. I created a new one and now I am restoring my data. Let's hope it will last longer this time :)
Regards, Tobias
2015-02-15 4:30 GMT+01:00 Liu Bo bo.li@oracle.com: On Fri, Feb 13, 2015 at 10:54:22PM +0100, Tobias Holst wrote:
It's me again. I just found out why my system crashed during the backup. I don't know what it means, but maybe it helps you?
The warning means the checksum somehow became inconsistent with the file extents, but there are no clear clues about the cause :-(
Thanks, -liubo
WARNING: CPU: 7 PID: 22878 at /home/kernel/COD/linux/fs/btrfs/extent_io.c:5203 read_extent_buffer+0xe3/0x120 [btrfs]()
Modules linked in: raid0(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) ppdev(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) virtio_rng(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) serio_raw(E) 8250_fintek(E) parport_pc(E) pvpanic(E) i2c_piix4(E) mac_hid(E) lp(E) parport(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) mpt2sas(E) ttm(E) drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E) scsi_transport_sas(E) drm(E)
[c05089f3] read_extent_buffer+0xe3/0x120 [btrfs] [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs] [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs] [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs] [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs] [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs] [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs] [c0506087] __do_readpage+0x3f7/0x640 [btrfs] [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs] [c04eb400] ?
btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c050734e] extent_readpages+0x15e/0x1a0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs] [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs] hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) ppdev(E) crc32_pclmul(E) [c05089ca] ? read_extent_buffer+0xba/0x120 [btrfs] [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs] [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs] [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs] [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs] [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs] [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs] [c0506087] __do_readpage+0x3f7/0x640 [btrfs] [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c050734e] extent_readpages+0x15e/0x1a0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs] [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs] Modules linked in: raid0(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) [c05089ca] ? read_extent_buffer+0xba/0x120 [btrfs] [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs] [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs] [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs] [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs] [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs] [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs] [c0506087] __do_readpage+0x3f7/0x640 [btrfs] [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs] [c04eb400] ? 
btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c050734e] extent_readpages+0x15e/0x1a0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs] [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs] Regards, Tobias 2015-02-13 19:26 GMT+01:00 Tobias Holst to...@tobby.eu: 2015-02-13 9:06 GMT+01:00 Liu Bo bo.li@oracle.com: On Fri, Feb 13, 2015 at 12:22:16AM +0100, Tobias Holst wrote: Hi I don't remember the exact mkfs.btrfs options anymore but ls /sys/fs/btrfs/[UUID]/features/ shows the following
Re: Repair broken btrfs raid6?
2015-02-13 9:06 GMT+01:00 Liu Bo bo.li@oracle.com: On Fri, Feb 13, 2015 at 12:22:16AM +0100, Tobias Holst wrote:
Hi
I don't remember the exact mkfs.btrfs options anymore but ls /sys/fs/btrfs/[UUID]/features/ shows the following output: big_metadata compress_lzo extended_iref mixed_backref raid56
Well... mkfs.btrfs can specify a '-m' for metadata profile and a '-d' for data profile, the default profile for metadata is RAID1, so we're not sure if your metadata is RAID1 or RAID6, if raid1 and both copies are corrupted, then please use your backup.
Ah, I used RAID6 for both, so btrfs fi df /[mountpoint] looks like this:
Data, RAID6: total=13.11TiB, used=13.10TiB
System, RAID6: total=64.00MiB, used=928.00KiB
Metadata, RAID6: total=25.00GiB, used=23.29GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
I also tested my device with a short hdparm -tT /dev/dm5 and got
/dev/mapper/sdc_crypt:
Timing cached reads: 30712 MB in 2.00 seconds = 15376.11 MB/sec
Timing buffered disk reads: 444 MB in 3.01 seconds = 147.51 MB/sec
Looks ok to me. Should I test more?
Okay, looks good.
I bought a few new hard drives so currently I am copying all my data to a second (faster) backup, so I can maybe overwrite the current file system, if it's not repairable.
Another question, have you tried mount -o recovery, did it work?
Yes and no. At the moment I mounted it with defaults,recovery,ro,compress-force=lzo,nospace_cache,clear_cache. I am still getting some errors in the syslog, but less than before. Also it doesn't get unreadable after a while like before. But it seems to be a little bit slow sometimes and two times the whole system froze until I did a hard reset.
Thanks, -liubo
Regards, Tobias
2015-02-12 10:16 GMT+01:00 Liu Bo bo.li@oracle.com: On Wed, Feb 11, 2015 at 03:46:33PM +0100, Tobias Holst wrote: Hmm, it looks like it is getting worse...
Here are some parts of my syslog, including two crashed btrfs-threads: So I am still getting many of these:
BTRFS (device dm-5): parent transid verify failed on 25033166798848 wanted 108976 found 108958
BTRFS warning (device dm-5): page private not zero on page 25033166798848
BTRFS warning (device dm-5): page private not zero on page 25033166802944
BTRFS warning (device dm-5): page private not zero on page 25033166807040
BTRFS warning (device dm-5): page private not zero on page 25033166811136
First we should probably make sure that your device is set up correctly, since these messages usually occur after a drive is removed (the device is somehow dropping writes), and the -EIO below also implies btrfs cannot read/write data from or to that drive. And in theory, RAID6 can tolerate two drive failures, so what's your mkfs.btrfs option?
Thanks, -liubo
BTRFS info (device dm-5): force lzo compression
BTRFS info (device dm-5): disk space caching is enabled
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0
Then there is this crash of super/btrfs_abort_transaction:
[ cut here ]
WARNING: CPU: 0 PID: 30526 at /home/kernel/COD/linux/fs/btrfs/super.c:260 __btrfs_abort_transaction+0x5f/0x140 [btrfs]()
BTRFS: Transaction aborted (error -5)
Modules linked in: ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) aesni_intel(E)
aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) 8250_fintek(E) serio_raw(E) virtio_rng(E) parport_pc(E) mac_hid(E) pvpanic(E) i2c_piix4(E) lp(E) parport(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ttm(E) mpt2sas(E) drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E) drm(E) scsi_transport_sas(E) CPU: 0 PID: 30526 Comm: kworker/u16:6 Tainted: GW E 3.19.0-031900-generic #201502091451 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] 0104 880002743c18 817c4c00 0007 880002743c68 880002743c58 81076e87 880002743c58 88020a8694d0 8801fb715800 fffb 0ae8 Call Trace: [817c4c00] dump_stack+0x45/0x57
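Liu Bo's point that RAID6 should tolerate two drive failures rests on the two parity blocks btrfs raid56 keeps per stripe: P is a plain byte-wise XOR of the data blocks, and Q is a Reed-Solomon syndrome over GF(2^8). A toy sketch of the P side only, showing why any single lost block is recoverable (recovering two lost data blocks additionally needs Q, which is omitted here):

```python
def xor_parity(blocks):
    """P parity of a stripe: byte-wise XOR of all given blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # toy 3-disk data stripe
p = xor_parity(stripe)

# Pretend the second disk died: XOR of the survivors and P
# reproduces the missing block.
recovered = xor_parity([stripe[0], stripe[2], p])
assert recovered == stripe[1]
print(recovered)   # b'BBBB'
```

Note that parity only helps when the surviving copies are themselves consistent; the parent transid failures in this thread suggest stale writes, which parity cannot detect by itself.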
Re: Repair broken btrfs raid6?
It's me again. I just found out why my system crashed during the back up. I don't know what it means, but maybe it helps you? WARNING: CPU: 7 PID: 22878 at /home/kernel/COD/linux/fs/btrfs/extent_io.c:5203 read_extent_buffer+0xe3/0x120 [btrfs]() Modules linked in: raid0(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) ppdev(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) virtio_rng(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) serio_raw(E) 8250_fintek(E) parport_pc(E) pvpanic(E) i2c_piix4(E) mac_hid(E) lp(E) parport(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) mpt2sas(E) ttm(E) drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E) scsi_transport_sas(E) drm(E) [c05089f3] read_extent_buffer+0xe3/0x120 [btrfs] [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs] [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs] [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs] [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs] [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs] [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs] [c0506087] __do_readpage+0x3f7/0x640 [btrfs] [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c050734e] extent_readpages+0x15e/0x1a0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs] [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs] hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) ppdev(E) crc32_pclmul(E) [c05089ca] ? 
read_extent_buffer+0xba/0x120 [btrfs] [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs] [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs] [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs] [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs] [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs] [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs] [c0506087] __do_readpage+0x3f7/0x640 [btrfs] [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c050734e] extent_readpages+0x15e/0x1a0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs] [c04dc969] ? btrfs_congested_fn+0x49/0xb0 [btrfs] Modules linked in: raid0(E) ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) [c05089ca] ? read_extent_buffer+0xba/0x120 [btrfs] [c04d7dde] __btrfs_lookup_bio_sums.isra.8+0x2ce/0x540 [btrfs] [c04d82a6] btrfs_lookup_bio_sums+0x36/0x40 [btrfs] [c05301e6] btrfs_submit_compressed_read+0x316/0x4e0 [btrfs] [c04ea031] btrfs_submit_bio_hook+0x1c1/0x1d0 [btrfs] [c05010ca] submit_one_bio+0x6a/0xa0 [btrfs] [c0504958] submit_extent_page.isra.34+0xe8/0x210 [btrfs] [c0506087] __do_readpage+0x3f7/0x640 [btrfs] [c05057a0] ? clean_io_failure+0x1b0/0x1b0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c0506606] __extent_readpages.constprop.45+0x266/0x290 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c050734e] extent_readpages+0x15e/0x1a0 [btrfs] [c04eb400] ? btrfs_submit_direct+0x1b0/0x1b0 [btrfs] [c04e771f] btrfs_readpages+0x1f/0x30 [btrfs] [c04dc969] ? 
btrfs_congested_fn+0x49/0xb0 [btrfs] Regards, Tobias 2015-02-13 19:26 GMT+01:00 Tobias Holst to...@tobby.eu: 2015-02-13 9:06 GMT+01:00 Liu Bo bo.li@oracle.com: On Fri, Feb 13, 2015 at 12:22:16AM +0100, Tobias Holst wrote: Hi I don't remember the exact mkfs.btrfs options anymore but ls /sys/fs/btrfs/[UUID]/features/ shows the following output: big_metadata compress_lzo extended_iref mixed_backref raid56 Well... mkfs.btrfs can specify a '-m' for metadata profile and a '-d' for data profile, the default profile for metadata is RAID1, so we're not sure if your metadata is RAID1 or RAID6, if raid1 and both copies are corrupted, then please use your backup. Ah, I used RAID6 for both, so btrfs fi df /[mountpoint] looks like this: Data, RAID6: total=13.11TiB, used=13.10TiB System, RAID6: total=64.00MiB, used=928.00KiB Metadata, RAID6: total=25.00GiB, used=23.29GiB GlobalReserve, single: total=512.00MiB
Re: Repair broken btrfs raid6?
Hi
I don't remember the exact mkfs.btrfs options anymore but ls /sys/fs/btrfs/[UUID]/features/ shows the following output: big_metadata compress_lzo extended_iref mixed_backref raid56
I also tested my device with a short hdparm -tT /dev/dm5 and got
/dev/mapper/sdc_crypt:
Timing cached reads: 30712 MB in 2.00 seconds = 15376.11 MB/sec
Timing buffered disk reads: 444 MB in 3.01 seconds = 147.51 MB/sec
Looks ok to me. Should I test more?
I bought a few new hard drives so currently I am copying all my data to a second (faster) backup, so I can maybe overwrite the current file system, if it's not repairable.
Regards, Tobias
2015-02-12 10:16 GMT+01:00 Liu Bo bo.li@oracle.com: On Wed, Feb 11, 2015 at 03:46:33PM +0100, Tobias Holst wrote: Hmm, it looks like it is getting worse... Here are some parts of my syslog, including two crashed btrfs-threads: So I am still getting many of these:
BTRFS (device dm-5): parent transid verify failed on 25033166798848 wanted 108976 found 108958
BTRFS warning (device dm-5): page private not zero on page 25033166798848
BTRFS warning (device dm-5): page private not zero on page 25033166802944
BTRFS warning (device dm-5): page private not zero on page 25033166807040
BTRFS warning (device dm-5): page private not zero on page 25033166811136
First we should probably make sure that your device is set up correctly, since these messages usually occur after a drive is removed (the device is somehow dropping writes), the below -EIO also implies btrfs cannot read/write data from or to that drive. And in theory, RAID6 can tolerate two drive failures, so what's your mkfs.btrfs option?
Thanks, -liubo BTRFS info (device dm-5): force lzo compression BTRFS info (device dm-5): disk space caching is enabled BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0 BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0 BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0 BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0 BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0 BTRFS: dm-5 checksum verify failed on 30525304061952 wanted 55270A94 found B18E3934 level 0 Then there is this crash of super/btrfs_abort_transaction: [ cut here ] WARNING: CPU: 0 PID: 30526 at /home/kernel/COD/linux/fs/btrfs/super.c:260 __btrfs_abort_transaction+0x5f/0x140 [btrfs]() BTRFS: Transaction aborted (error -5) Modules linked in: ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) msdos(E) jfs(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) iosf_mbi(E) dm_crypt(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E) ppdev(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) 8250_fintek(E) serio_raw(E) virtio_rng(E) parport_pc(E) mac_hid(E) pvpanic(E) i2c_piix4(E) lp(E) parport(E) cirrus(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ttm(E) mpt2sas(E) drm_kms_helper(E) raid_class(E) floppy(E) psmouse(E) drm(E) scsi_transport_sas(E) CPU: 0 PID: 30526 Comm: kworker/u16:6 Tainted: GW E 3.19.0-031900-generic #201502091451 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] 0104 880002743c18 817c4c00 0007 880002743c68 880002743c58 81076e87 880002743c58 88020a8694d0 8801fb715800 fffb 0ae8 Call Trace: [817c4c00] dump_stack+0x45/0x57 [81076e87] warn_slowpath_common+0x97/0xe0 [81076f86] warn_slowpath_fmt+0x46/0x50 [c06375cf] 
__btrfs_abort_transaction+0x5f/0x140 [btrfs] [c0655105] btrfs_run_delayed_refs.part.82+0x175/0x290 [btrfs] [c0655237] btrfs_run_delayed_refs+0x17/0x20 [btrfs] [c0655507] delayed_ref_async_start+0x37/0x90 [btrfs] [c069720e] normal_work_helper+0x7e/0x1b0 [btrfs] [c0697572] btrfs_extent_refs_helper+0x12/0x20 [btrfs] [8108f76d] process_one_work+0x14d/0x460 [8109014b] worker_thread+0x11b/0x3f0 [81090030] ? create_worker+0x1e0/0x1e0 [81095d59] kthread+0xc9/0xe0 [81095c90] ? flush_kthread_worker+0x90/0x90 [817d1e7c] ret_from_fork+0x7c/0xb0 [81095c90] ? flush_kthread_worker+0x90/0x90 ---[ end trace dd65465954546462 ]--- BTRFS: error (device dm-5) in btrfs_run_delayed_refs:2792: errno=-5 IO failure BTRFS info (device dm-5): forced readonly and this crash of delayed-ref/btrfs_select_ref_head: [ cut here ] WARNING: CPU: 7 PID: 3159 at /home/kernel/COD/linux/fs/btrfs/delayed-ref.c:438 btrfs_select_ref_head+0x120/0x130 [btrfs
Re: Repair broken btrfs raid6?
[btrfs] [c0652cd1] __btrfs_run_delayed_refs+0x1e1/0x5f0 [btrfs] [c0654ffa] btrfs_run_delayed_refs.part.82+0x6a/0x290 [btrfs] [c0664e5c] ? join_transaction.isra.31+0x13c/0x380 [btrfs] [c0655237] btrfs_run_delayed_refs+0x17/0x20 [btrfs] [c0665e50] btrfs_commit_transaction+0xb0/0xa70 [btrfs] [c0663d95] transaction_kthread+0x1d5/0x250 [btrfs] [c0663bc0] ? open_ctree+0x1f40/0x1f40 [btrfs] [81095d59] kthread+0xc9/0xe0 [81095c90] ? flush_kthread_worker+0x90/0x90 [817d1e7c] ret_from_fork+0x7c/0xb0 [81095c90] ? flush_kthread_worker+0x90/0x90 ---[ end trace dd65465954546463 ]---
BTRFS warning (device dm-5): Skipping commit of aborted transaction.
BTRFS: error (device dm-5) in cleanup_transaction:1670: errno=-5 IO failure
Any thoughts? Would it help to unplug the dm-5 device which seems to be causing these errors and then balance the array?
Regards, Tobias
2015-02-09 23:45 GMT+01:00 Tobias Holst to...@tobby.eu:
Hi
I'm having some trouble with my six-drive btrfs raid6 (each drive encrypted with LUKS). At first: Yes, I do have backups, but it may take at least days, maybe weeks or even some months to restore everything from the (offsite) backups.
So it is not essential to recover the data, but would be great ;-)
OS: Ubuntu 14.04 Kernel: 3.19.0 btrfs-progs: 3.19-rc2
When booting my server I am getting this in the syslog:
[8.026362] BTRFS: device label tobby-btrfs devid 3 transid 108721 /dev/dm-0
[8.118896] BTRFS: device label tobby-btrfs devid 6 transid 108721 /dev/dm-1
[8.202477] BTRFS: device label tobby-btrfs devid 1 transid 108721 /dev/dm-2
[8.520988] BTRFS: device label tobby-btrfs devid 4 transid 108721 /dev/dm-3
[8.70] BTRFS info (device dm-3): force lzo compression
[8.74] BTRFS info (device dm-3): disk space caching is enabled
[8.556310] BTRFS: failed to read the system array on dm-3
[8.592135] BTRFS: open_ctree failed
[9.039187] BTRFS: device label tobby-btrfs devid 2 transid 108721 /dev/dm-4
[9.107779] BTRFS: device label tobby-btrfs devid 5 transid 108721 /dev/dm-5
Looks like there is something wrong on drive 3, giving me open_ctree failed. I have to press S to skip mounting of the btrfs volume. It boots and with sudo mount --all I can successfully mount the btrfs volume. Sometimes it takes one or two minutes but it will mount.
After a while I am sometimes/randomly getting this in the syslog:
[ 1161.283246] BTRFS: dm-5 checksum verify failed on 39099619901440 wanted BB5B0AD5 found 6B6F5040 level 0
Looks like something else is broken on dm-5... But shouldn't this be repaired with the new raid56-repair-features of kernel 3.19?
After some more time I am getting this:
[637017.631044] BTRFS (device dm-4): parent transid verify failed on 39099305132032 wanted 108722 found 108719
Then it is not possible to access the mounted volume anymore. I have to umount -l to unmount it and then I can remount it. Until it happens again (after some time)... I also tried a balance and a scrub but they crash.
Syslog is full of messages like the following examples:
[ 3355.523157] csum_tree_block: 53 callbacks suppressed
[ 3355.523160] BTRFS: dm-5 checksum verify failed on 39099306917888 wanted F90D8231 found 5981C697 level 0
[ 4006.935632] BTRFS (device dm-5): parent transid verify failed on 30525418536960 wanted 108975 found 108767
and btrfs scrub status /[device] gives me the following output:
scrub status for [UUID]
scrub started at Mon Feb 9 18:16:38 2015 and was aborted after 2008 seconds
total bytes scrubbed: 113.04GiB with 0 errors
So a short summary:
- btrfs raid6 on 3.19.0 with btrfs-progs 3.19-rc2
- does not mount at boot up, open_ctree failed (disk 3)
- mounts successfully after bootup
- randomly checksum verify failed (disk 5)
- balance and scrub crash after some time
- after a while the volume gets unreadable, saying parent transid verify failed (disk 4 or 5)
And it looks like there still is no way to btrfsck a raid6. Any ideas how to repair this filesystem?
Regards, Tobias
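All of the "checksum verify failed on ... wanted X found Y" lines above come from btrfs comparing a metadata block's stored CRC-32C (Castagnoli) checksum with the one recomputed over the block as read; when the two 32-bit values disagree, the block is rejected. A bitwise reference implementation (slow but matches the standard CRC-32C test vector):

```python
def crc32c(data, crc=0):
    """CRC-32C (Castagnoli), the default checksum btrfs stores for
    metadata blocks and data extents. Bitwise reference version;
    real implementations are table-driven or use the SSE4.2 crc32
    instruction."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the reflected Castagnoli polynomial.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

# Standard check value for CRC-32C:
print(hex(crc32c(b"123456789")))   # 0xe3069283
```

That btrfs kept seeing the *same* wrong value on every retry (as in the logs above) indicates stably stored corruption rather than transient read errors.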
Re: Repair broken btrfs raid6?
2015-02-10 8:17 GMT+01:00 Kai Krakow hurikha...@gmail.com: Tobias Holst to...@tobby.eu schrieb:
and btrfs scrub status /[device] gives me the following output:
scrub status for [UUID]
scrub started at Mon Feb 9 18:16:38 2015 and was aborted after 2008 seconds
total bytes scrubbed: 113.04GiB with 0 errors
Does not look very correct to me: Why should a scrub in a six-drive btrfs array which is probably multi-terabytes big (as you state a restore from backup would take days) take only ~2000 seconds? And scrub only ~120 GB worth of data. Either your 6 devices are really small (then why RAID-6), or your data is very sparse (then why does it take so long), or scrub prematurely aborts and never checks the complete devices (I guess this is it).
Yes, sorry, I didn't post an output of btrfs filesystem show - but here it is:
Label: 'tobby-btrfs' uuid: b689ab76-7ff5-434c-a2c6-03efb45faa46
Total devices 6 FS bytes used 13.13TiB
devid 1 size 3.64TiB used 3.28TiB path /dev/mapper/sde_crypt
devid 2 size 3.64TiB used 3.28TiB path /dev/mapper/sdd_crypt
devid 3 size 3.64TiB used 3.28TiB path /dev/mapper/sdf_crypt
devid 4 size 3.64TiB used 3.28TiB path /dev/mapper/sda_crypt
devid 5 size 3.64TiB used 3.28TiB path /dev/mapper/sdb_crypt
devid 6 size 3.64TiB used 3.28TiB path /dev/mapper/sdc_crypt
btrfs-progs v3.19-rc2
So there are ~13TiB of data on this raid6 - but like it says it was aborted after 2008 seconds (about half an hour) and ~120GB of data. Then a parent transid verify failed happened, the volume got unreadable and the scrub was aborted. Until a remount of the btrfs - and until it happens again...
And that's what it actually says: aborted after 2008 seconds. I'd expect finished after seconds if I remember my scrub runs correctly (which I currently don't do regularly because it takes long and IO performance sucks during running it).
-- Replies to list only preferred.
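Kai's sanity check can be made concrete: at the observed rate, a full pass over the ~13 TiB of data would take far longer than the 2008 seconds reported, which supports the premature-abort reading. Rough arithmetic with the numbers from the thread:

```python
def scrub_eta_hours(scrubbed_gib, elapsed_s, total_gib):
    """Naive remaining-time estimate from `btrfs scrub status` figures,
    assuming a constant scrub rate."""
    rate = scrubbed_gib / elapsed_s                    # GiB per second
    return (total_gib - scrubbed_gib) / rate / 3600    # hours left

# 113.04 GiB scrubbed in 2008 s, out of ~13.13 TiB of data:
print(round(scrub_eta_hours(113.04, 2008, 13.13 * 1024), 1))   # ~65.8
```

So a healthy scrub of this array would run for roughly two and a half days, not half an hour.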
btrfs features
Hi
I am just looking at the features enabled on my btrfs volume. ls /sys/fs/btrfs/[UUID]/features/ shows the following output: big_metadata compress_lzo extended_iref mixed_backref raid56
So big_metadata means I am not using skinny-metadata, compress_lzo means I am using compression. raid56 means I am using the experimental RAID features of btrfs. But the other two flags are a little bit unclear... I think extended_iref is the extref feature of mkfs.btrfs - right? I am not sure about the mixed_backref feature. What does it mean? Is this the mixed-bg feature of mkfs.btrfs?
Also I am trying to change these features. I am missing the skinny extents; these can be enabled by btrfstune -x [one device of my raid], correct? And how can I enable the missing no-holes feature on my volume?
Regards, Tobias
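The names under /sys/fs/btrfs/[UUID]/features/ correspond roughly to mkfs.btrfs/btrfstune feature names. A sketch with an assumed, partial mapping (big_metadata and mixed_backref are base-format features with no mkfs switch of their own, and compress_lzo merely records that lzo compression has been used; the descriptions are this sketch's own, not official tool output):

```python
import os

# Assumed, partial mapping from sysfs feature names to tool-side names.
SYSFS_TO_TOOL = {
    "mixed_backref": "base format feature (default since 2.6.31)",
    "big_metadata": "base format feature (large metadata blocks)",
    "extended_iref": "mkfs.btrfs -O extref",
    "raid56": "mkfs.btrfs -m/-d raid5 or raid6",
    "skinny_metadata": "btrfstune -x / mkfs.btrfs -O skinny-metadata",
    "no_holes": "mkfs.btrfs -O no-holes",
    "compress_lzo": "mount -o compress=lzo (recorded on first use)",
}

def describe_features(uuid, sysfs="/sys/fs/btrfs"):
    """Map a mounted filesystem's advertised features to tool flags."""
    names = os.listdir(os.path.join(sysfs, uuid, "features"))
    return {n: SYSFS_TO_TOOL.get(n, "unknown feature") for n in sorted(names)}

print(SYSFS_TO_TOOL["extended_iref"])
```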
Repair broken btrfs raid6?
Hi I'm having some trouble with my six-drive btrfs raid6 (each drive encrypted with LUKS). At first: Yes, I do have backups, but it may take at least days, maybe weeks or even some months to restore everything from the (offsite) backups. So it is not essential to recover the data, but it would be great ;-) OS: Ubuntu 14.04 Kernel: 3.19.0 btrfs-progs: 3.19-rc2 When booting my server I am getting this in the syslog:
[8.026362] BTRFS: device label tobby-btrfs devid 3 transid 108721 /dev/dm-0
[8.118896] BTRFS: device label tobby-btrfs devid 6 transid 108721 /dev/dm-1
[8.202477] BTRFS: device label tobby-btrfs devid 1 transid 108721 /dev/dm-2
[8.520988] BTRFS: device label tobby-btrfs devid 4 transid 108721 /dev/dm-3
[8.70] BTRFS info (device dm-3): force lzo compression
[8.74] BTRFS info (device dm-3): disk space caching is enabled
[8.556310] BTRFS: failed to read the system array on dm-3
[8.592135] BTRFS: open_ctree failed
[9.039187] BTRFS: device label tobby-btrfs devid 2 transid 108721 /dev/dm-4
[9.107779] BTRFS: device label tobby-btrfs devid 5 transid 108721 /dev/dm-5
Looks like there is something wrong on drive 3, giving me "open_ctree failed". I have to press S to skip mounting of the btrfs volume. It boots, and with "sudo mount --all" I can successfully mount the btrfs volume. Sometimes it takes one or two minutes but it will mount. After a while I am sometimes/randomly getting this in the syslog:
[ 1161.283246] BTRFS: dm-5 checksum verify failed on 39099619901440 wanted BB5B0AD5 found 6B6F5040 level 0
Looks like something else is broken on dm-5... But shouldn't this be repaired by the new raid56 repair features of kernel 3.19? After some more time I am getting this:
[637017.631044] BTRFS (device dm-4): parent transid verify failed on 39099305132032 wanted 108722 found 108719
Then it is not possible to access the mounted volume anymore. I have to "umount -l" to unmount it and then I can remount it. Until it happens again (after some time)...
I also tried a balance and a scrub but they crash. Syslog is full of messages like the following examples: [ 3355.523157] csum_tree_block: 53 callbacks suppressed [ 3355.523160] BTRFS: dm-5 checksum verify failed on 39099306917888 wanted F90D8231 found 5981C697 level 0 [ 4006.935632] BTRFS (device dm-5): parent transid verify failed on 30525418536960 wanted 108975 found 108767 and btrfs scrub status /[device] gives me the following output: scrub status for [UUID] scrub started at Mon Feb 9 18:16:38 2015 and was aborted after 2008 seconds total bytes scrubbed: 113.04GiB with 0 errors So a short summary: - btrfs raid6 on 3.19.0 with btrfs-progs 3.19-rc2 - does not mount at boot up, open_ctree failed (disk 3) - mounts successfully after bootup - randomly checksum verify failed (disk 5) - balance and scrub crash after some time - after a while the volume gets unreadable, saying parent transid verify failed (disk 4 or 5) And it looks like there still is no way to btrfsck a raid6. Any ideas how to repair this filesystem? Regards, Tobias
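Before attempting any repair on a filesystem in this state, one non-destructive step worth trying is a read-only mount from an older tree root (a sketch only; the device name is a placeholder for one of the LUKS-mapped members, and "recovery" was the mount option's name in the 3.19-era kernels, later renamed "usebackuproot"):

```shell
# Mount read-only using a backup tree root; makes no changes on disk.
mount -o ro,recovery /dev/mapper/sde_crypt /mnt
```

If that mounts cleanly, data can be copied off before riskier repair attempts.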
Re: how to repair a damaged filesystem with btrfs raid5
Hi. There is a known bug when you re-plug in a missing hdd of a btrfs raid without wiping the device before. In the worst case this results in a totally corrupted filesystem, as it did sometimes during my tests of the raid6 implementation. With raid1 it may just go back in time to the point when you unplugged the device - which is also bad, but still no complete data loss - but with raid6 it was sometimes worse. Sounds like you did that (plugged in the missing device without wiping)? Next thing is that scrub and filesystem check of raid5/6 are not implemented/completed (yet), as Duncan said. It will be (mostly) included in 3.19, but maybe with bugs. You may try to do a balance instead of a scrub, as this should read and check your data and then write it back. This worked for me most of the time during my personal raid6 stability and stress tests. But maybe your filesystem has already been corrupted... Give it a try :) Regards Tobias 2015-01-27 10:12 GMT+01:00 Alexander Fieroch alexander.fier...@mpi-dortmund.mpg.de: Hello, I'm testing btrfs RAID5 on three encrypted hdds (dm-crypt) and I'm simulating a hard disk failure by unplugging one device while writing some files. Now the filesystem is damaged. By now is there any chance to repair the filesystem? My operating system is ubuntu server (vivid) with kernel 3.18 and btrfs 3.18.1 (external PPA). I've unplugged device sdb with UUID 65f62f63-6526-4d5e-82d4-adf6d7508092 and crypt device name /dev/mapper/crypt-1. This one should be repaired. Attached is the dmesg log file with corresponding errors. btrfs check does not seem to work.
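The wipe-before-re-adding advice above can be sketched like this (device and mount point are placeholders; this assumes the degraded array is still mountable):

```shell
# Drop the stale btrfs signature so the disk is not recognized as the
# out-of-date old member, then re-add it and rebuild onto it.
wipefs -a /dev/mapper/crypt-1
btrfs device add /dev/mapper/crypt-1 /mnt/data
btrfs device delete missing /mnt/data
```

"delete missing" removes the record of the vanished old member and triggers the rebuild onto the remaining (now re-added) devices.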
# btrfs check --repair /dev/mapper/crypt-1 enabling repair mode Checking filesystem on /dev/mapper/crypt-1 UUID: 504c2850-3977-4340-8849-18dd3ac2e5e4 checking extents Check tree block failed, want=165396480, have=5385177728513973313 Check tree block failed, want=165396480, have=5385177728513973313 Check tree block failed, want=165396480, have=65536 Check tree block failed, want=165396480, have=5385177728513973313 Check tree block failed, want=165396480, have=5385177728513973313 read block failed check_tree_block Check tree block failed, want=165740544, have=6895225932619678086 Check tree block failed, want=165740544, have=6895225932619678086 Check tree block failed, want=165740544, have=65536 Check tree block failed, want=165740544, have=6895225932619678086 Check tree block failed, want=165740544, have=6895225932619678086 read block failed check_tree_block Check tree block failed, want=165756928, have=13399486021073017810 Check tree block failed, want=165756928, have=13399486021073017810 Check tree block failed, want=165756928, have=65536 Check tree block failed, want=165756928, have=13399486021073017810 Check tree block failed, want=165756928, have=13399486021073017810 read block failed check_tree_block Check tree block failed, want=165773312, have=12571697019259051064 Check tree block failed, want=165773312, have=12571697019259051064 Check tree block failed, want=165773312, have=65536 Check tree block failed, want=165773312, have=12571697019259051064 Check tree block failed, want=165773312, have=12571697019259051064 read block failed check_tree_block Check tree block failed, want=165789696, have=4069002570438424782 Check tree block failed, want=165789696, have=4069002570438424782 Check tree block failed, want=165789696, have=65536 Check tree block failed, want=165789696, have=4069002570438424782 Check tree block failed, want=165789696, have=4069002570438424782 read block failed check_tree_block Check tree block failed, want=165838848, have=9612508092910615774 
Check tree block failed, want=165838848, have=9612508092910615774 Check tree block failed, want=165838848, have=65536 Check tree block failed, want=165838848, have=9612508092910615774 Check tree block failed, want=165838848, have=9612508092910615774 read block failed check_tree_block ref mismatch on [99516416 16384] extent item 1, found 0 failed to repair damaged filesystem, aborting Trying a btrfs scrub is finishing with uncorrectable errors: # btrfs scrub start -d /dev/mapper/crypt-1 scrub started on /dev/mapper/crypt-1, fsid 504c2850-3977-4340-8849-18dd3ac2e5e4 (pid=2014) # btrfs scrub status -d /mnt/data/ scrub status for 504c2850-3977-4340-8849-18dd3ac2e5e4 scrub device /dev/mapper/crypt-1 (id 1) history scrub started at Mon Jan 26 14:36:57 2015 and finished after 617 seconds total bytes scrubbed: 29.78GiB with 10906 errors error details: csum=10906 corrected errors: 0, uncorrectable errors: 10906, unverified errors: 0 scrub device /dev/mapper/crypt-2 (id 2) no stats available scrub device /dev/mapper/crypt-3 (id 3) no stats available Any chance to fix the errors or do I have to wait for the next btrfs version? Thank you very much, Alexander # uname -a Linux antares 3.18.0-9-generic #10-Ubuntu SMP Mon Jan 12 21:41:54 UTC 2015 x86_64 x86_64 x86_64
Re: filesystem corruption
Thank you for your reply. I'll answer in-line. 2014-11-02 5:49 GMT+01:00 Robert White rwh...@pobox.com: On 10/31/2014 10:34 AM, Tobias Holst wrote: I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my btrfs-RAID1 into it. I can't add the second one as there are only two slots in that server. This is what I got: tobby@ubuntu: sudo btrfs check /dev/sdb1 warning, device 2 is missing warning devid 2 not found already root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2 Found 1 roots with an outdated root item. Please run a filesystem check with the option --repair to fix them. tobby@ubuntu: sudo btrfs check --repair /dev/sdb1 enabling repair mode warning, device 2 is missing warning devid 2 not found already Unable to find block group for 0 extent-tree.c:289: find_search_start: Assertion `1` failed. The read-only snapshots taken under 3.17.1 are your core problem. OK Now btrfsck is refusing to operate on the degraded RAID because a degraded RAID is degraded, so it's read-only. (this is an educated guess). Since btrfsck is _not_ a mount type of operation, it's got no degraded mode that would let you deal with half a RAID, as far as I know. OK, good to know. In your case... It is _known_ that you need to be _not_ running 3.17.0 or 3.17.1 if you are going to make read-only snapshots safely. It is _known_ that you need to be running 3.17.2 to get a number of fixes that impact your circumstance. It is _known_ that you need to be running btrfs-progs 3.17 to repair the read-only snapshots that are borked up, and that you must _not_ have previously tried to repair the problem with an older btrfsck. No, I didn't try to repair it with older kernels/btrfs-tools. Were I you, I would... Put the two disks back in the same computer before something bad happens. Upgrade that computer to 3.17.2 and 3.17 respectively.
As I mentioned before I only have two slots and my system on this btrfs-raid1 is not working anymore. Not just when accessing ro-snapshots - it crashes everytime at the login prompt. So now I installed Ubuntu 14.04 to an USB stick (so I can readd both btrfs HDDs) and upgraded the kernel to 3.17.2 and btrfs-tools to 3.17. Take a backup (because I am paranoid like that, though current threat seems negligible). I already have a backup. :) btrfsck your raid with --repair. OK. And this is what I get now: tobby@ubuntu: sudo btrfs check /dev/sda1 root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2 Found 1 roots with an outdated root item. Please run a filesystem check with the option --repair to fix them. tobby@ubuntu: sudo btrfs check /dev/sda1 --repair enabling repair mode fixing root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2 Fixed 1 roots. Checking filesystem on /dev/sda1 UUID: 3ad065be-2525-4547-87d3-0e195497f9cf checking extents checking free space cache cache and super generation don't match, space cache will be invalidated checking fs roots root 18446744073709551607 inode 258 errors 1000, some csum missing found 36031450184 bytes used err is 1 total csum bytes: 59665716 total tree bytes: 3523330048 total fs tree bytes: 3234054144 total extent tree bytes: 202358784 btree space waste bytes: 755547262 file data blocks allocated: 122274091008 referenced 211741990912 Btrfs v3.17 Alternately, if you previously tried to btrfsck the raid with a version prior to 3.17 tools after the read-only snapshot(s) problem, you will need to resort to mkfs.btrfs to solve the problem. But Hey! you have two disks, so break the RAID, then mkfs one of them, then copy the data, then re-make the RAID such that the new FS rules. Enjoy your system no longer taking racy read-only snapshots... 8-) And this worked! 
:) Server is back online without restoring any files from the backup. Looks good to me! But I can't do a balance anymore? root@t-mon:~# btrfs balance start /dev/sda1 ERROR: can't access '/dev/sda1' Regards Tobias
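A plausible explanation for that last error, for what it's worth: btrfs balance start expects the path of a mounted filesystem, not a device node, so pointing it at /dev/sda1 fails with "can't access". Sketch (mount point is a placeholder):

```shell
# balance operates on the mounted filesystem, not on the block device:
btrfs balance start /mnt
```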
Re: filesystem corruption
I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add the second one as there are only two slots in that server. This is what I got: tobby@ubuntu: sudo btrfs check /dev/sdb1 warning, device 2 is missing warning devid 2 not found already root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2 Found 1 roots with an outdated root item. Please run a filesystem check with the option --repair to fix them. tobby@ubuntu: sudo btrfs check --repair /dev/sdb1 enabling repair mode warning, device 2 is missing warning devid 2 not found already Unable to find block group for 0 extent-tree.c:289: find_search_start: Assertion `1` failed. btrfs[0x42bd62] btrfs[0x42ffe5] btrfs[0x430211] btrfs[0x4246ec] btrfs[0x424d11] btrfs[0x426af3] btrfs[0x41b18c] btrfs[0x40b46a] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ffca1119ec5] btrfs[0x40b497] This can be repeated as often as I want ;) Nothing changed. Regards Tobias 2014-10-31 3:41 GMT+01:00 Rich Freeman r-bt...@thefreemanclan.net: On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst to...@tobby.eu wrote: Addition: I found some posts here about a general file system corruption in 3.17 and 3.17.1 - is this the cause? Additionally I am using ro-snapshots - maybe this is the cause, too? Anyway: Can I fix that or do I have to reinstall? Haven't touched the filesystem, just did a scrub (found 0 errors). Yup - ro-snapshots is a big problem in 3.17. You can probably recover now by: 1. Update your kernel to 3.17.2 - that takes care of all the big known 3.16/17 issues in general. 2. Run btrfs check using btrfs-tools 3.17. That can clean up the broken snapshots in your filesystem. That is fairly likely to get your filesystem working normally again. It worked for me. 
I was getting some balance issues when trying to add another device and I'm not sure if 3.17.2 totally fixed that - I ended up cancelling the balance and it will be a while before I have to balance this particular filesystem again, so I'll just hold off and hope things stabilize. -- Rich
filesystem corruption
Hi I was using a btrfs RAID1 with two disks under Ubuntu 14.04, kernel 3.13 and btrfs-tools 3.14.1 for weeks without issues. Now I updated to kernel 3.17.1 and btrfs-tools 3.17. After a reboot everything looked fine and I started some tests. While running duperemove (just scanning, not doing anything) and a balance at the same time the load suddenly went up to 30 and the system was not responding anymore. Everything working with the filesystem stopped responding. So I did a hard reset. I was able to reboot, but on the login prompt nothing happened but a kernel bug. The same happened back on kernel 3.13. Now I started a live system (Ubuntu 14.10, kernel 3.16.x, btrfs-tools 3.14.1), and mounted the btrfs filesystem. I can browse through the files but sometimes, especially when accessing my snapshots or trying to create a new snapshot, the kernel bug appears and the filesystem hangs. It shows this: Oct 31 00:09:14 ubuntu kernel: [ 187.661731] [ cut here ] Oct 31 00:09:14 ubuntu kernel: [ 187.661770] WARNING: CPU: 1 PID: 4417 at /build/buildd/linux-3.16.0/fs/btrfs/relocation.c:924 build_backref_tree+0xcab/0x1240 [btrfs]() Oct 31 00:09:14 ubuntu kernel: [ 187.661772] Modules linked in: nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci e1000e libahci ptp pps_core Oct 31 00:09:14 ubuntu kernel: [ 187.661800] CPU: 1 PID: 4417 Comm: btrfs-balance Tainted: G C3.16.0-23-generic #31-Ubuntu Oct 31 00:09:14 ubuntu kernel: [ 187.661802] Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009 Oct 31 00:09:14 ubuntu kernel: [ 187.661804] 0009 8800a0ae7a00 8177fcbc Oct 31 00:09:14 ubuntu kernel: [ 187.661807] 8800a0ae7a38 8106fd8d 8800a1440750 8800a1440b48 Oct 31 00:09:14 ubuntu kernel: [ 187.661809] 88020a8ce000
0001 88020b6b0d00 8800a0ae7a48 Oct 31 00:09:14 ubuntu kernel: [ 187.661812] Call Trace: Oct 31 00:09:14 ubuntu kernel: [ 187.661820] [8177fcbc] dump_stack+0x45/0x56 Oct 31 00:09:14 ubuntu kernel: [ 187.661825] [8106fd8d] warn_slowpath_common+0x7d/0xa0 Oct 31 00:09:14 ubuntu kernel: [ 187.661827] [8106fe6a] warn_slowpath_null+0x1a/0x20 Oct 31 00:09:14 ubuntu kernel: [ 187.661842] [c01b734b] build_backref_tree+0xcab/0x1240 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661857] [c01b7ae1] relocate_tree_blocks+0x201/0x600 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661872] [c01b88d8] ? add_data_references+0x268/0x2a0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661887] [c01b96fd] relocate_block_group+0x25d/0x6b0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661902] [c01b9d36] btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661916] [c0190988] btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661926] [c0140dc1] ? btrfs_set_path_blocking+0x41/0x80 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661935] [c0145dfd] ? btrfs_search_slot+0x48d/0xa40 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661950] [c018b49b] ? release_extent_buffer+0x2b/0xd0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661964] [c018b95f] ? free_extent_buffer+0x4f/0xa0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661979] [c01936c3] __btrfs_balance+0x4d3/0x8d0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661993] [c0193d48] btrfs_balance+0x288/0x600 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.662008] [c019411d] balance_kthread+0x5d/0x80 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.662022] [c01940c0] ? btrfs_balance+0x600/0x600 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.662026] [81094aeb] kthread+0xdb/0x100 Oct 31 00:09:14 ubuntu kernel: [ 187.662029] [81094a10] ? kthread_create_on_node+0x1c0/0x1c0 Oct 31 00:09:14 ubuntu kernel: [ 187.662032] [81787c3c] ret_from_fork+0x7c/0xb0 Oct 31 00:09:14 ubuntu kernel: [ 187.662035] [81094a10] ? 
kthread_create_on_node+0x1c0/0x1c0 Oct 31 00:09:14 ubuntu kernel: [ 187.662037] ---[ end trace fb7849e4a6f20424 ]--- and this: Oct 31 00:09:14 ubuntu kernel: [ 187.682629] [ cut here ] Oct 31 00:09:14 ubuntu kernel: [ 187.682635] kernel BUG at /build/buildd/linux-3.16.0/fs/btrfs/extent-tree.c:868! Oct 31 00:09:14 ubuntu kernel: [ 187.682638] invalid opcode: [#1] SMP Oct 31 00:09:14 ubuntu kernel: [ 187.682642] Modules linked in: nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp
Re: filesystem corruption
Addition: I found some posts here about a general file system corruption in 3.17 and 3.17.1 - is this the cause? Additionally I am using ro-snapshots - maybe this is the cause, too? Anyway: Can I fix that or do I have to reinstall? Haven't touched the filesystem, just did a scrub (found 0 errors). Regards Tobias 2014-10-31 1:29 GMT+01:00 Tobias Holst to...@tobby.eu: Hi I was using a btrfs RAID1 with two disks under Ubuntu 14.04, kernel 3.13 and btrfs-tools 3.14.1 for weeks without issues. Now I updated to kernel 3.17.1 and btrfs-tools 3.17. After a reboot everything looked fine and I started some tests. While running duperemove (just scanning, not doing anything) and a balance at the same time the load suddenly went up to 30 and the system was not responding anymore. Everything working with the filesystem stopped responding. So I did a hard reset. I was able to reboot, but on the login prompt nothing happened but a kernel bug. The same happened back on kernel 3.13. Now I started a live system (Ubuntu 14.10, kernel 3.16.x, btrfs-tools 3.14.1), and mounted the btrfs filesystem. I can browse through the files but sometimes, especially when accessing my snapshots or trying to create a new snapshot, the kernel bug appears and the filesystem hangs.
It shows this: Oct 31 00:09:14 ubuntu kernel: [ 187.661731] [ cut here ] Oct 31 00:09:14 ubuntu kernel: [ 187.661770] WARNING: CPU: 1 PID: 4417 at /build/buildd/linux-3.16.0/fs/btrfs/relocation.c:924 build_backref_tree+0xcab/0x1240 [btrfs]() Oct 31 00:09:14 ubuntu kernel: [ 187.661772] Modules linked in: nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci e1000e libahci ptp pps_core Oct 31 00:09:14 ubuntu kernel: [ 187.661800] CPU: 1 PID: 4417 Comm: btrfs-balance Tainted: G C3.16.0-23-generic #31-Ubuntu Oct 31 00:09:14 ubuntu kernel: [ 187.661802] Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009 Oct 31 00:09:14 ubuntu kernel: [ 187.661804] 0009 8800a0ae7a00 8177fcbc Oct 31 00:09:14 ubuntu kernel: [ 187.661807] 8800a0ae7a38 8106fd8d 8800a1440750 8800a1440b48 Oct 31 00:09:14 ubuntu kernel: [ 187.661809] 88020a8ce000 0001 88020b6b0d00 8800a0ae7a48 Oct 31 00:09:14 ubuntu kernel: [ 187.661812] Call Trace: Oct 31 00:09:14 ubuntu kernel: [ 187.661820] [8177fcbc] dump_stack+0x45/0x56 Oct 31 00:09:14 ubuntu kernel: [ 187.661825] [8106fd8d] warn_slowpath_common+0x7d/0xa0 Oct 31 00:09:14 ubuntu kernel: [ 187.661827] [8106fe6a] warn_slowpath_null+0x1a/0x20 Oct 31 00:09:14 ubuntu kernel: [ 187.661842] [c01b734b] build_backref_tree+0xcab/0x1240 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661857] [c01b7ae1] relocate_tree_blocks+0x201/0x600 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661872] [c01b88d8] ? 
add_data_references+0x268/0x2a0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661887] [c01b96fd] relocate_block_group+0x25d/0x6b0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661902] [c01b9d36] btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661916] [c0190988] btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661926] [c0140dc1] ? btrfs_set_path_blocking+0x41/0x80 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661935] [c0145dfd] ? btrfs_search_slot+0x48d/0xa40 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661950] [c018b49b] ? release_extent_buffer+0x2b/0xd0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661964] [c018b95f] ? free_extent_buffer+0x4f/0xa0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661979] [c01936c3] __btrfs_balance+0x4d3/0x8d0 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.661993] [c0193d48] btrfs_balance+0x288/0x600 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.662008] [c019411d] balance_kthread+0x5d/0x80 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.662022] [c01940c0] ? btrfs_balance+0x600/0x600 [btrfs] Oct 31 00:09:14 ubuntu kernel: [ 187.662026] [81094aeb] kthread+0xdb/0x100 Oct 31 00:09:14 ubuntu kernel: [ 187.662029] [81094a10] ? kthread_create_on_node+0x1c0/0x1c0 Oct 31 00:09:14 ubuntu kernel: [ 187.662032] [81787c3c] ret_from_fork+0x7c/0xb0 Oct 31 00:09:14 ubuntu kernel: [ 187.662035] [81094a10] ? kthread_create_on_node+0x1c0/0x1c0 Oct 31 00:09:14 ubuntu kernel: [ 187.662037] ---[ end trace fb7849e4a6f20424 ]--- and this: Oct 31 00:09:14 ubuntu kernel: [ 187.682629] [ cut here
Re: general thoughts and questions + general and RAID5/6 stability?
If it is unknown which of these options were used at btrfs creation time, is it possible to check the state of these options afterwards on a mounted or unmounted filesystem? 2014-09-23 15:38 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com: Well, running 'mkfs.btrfs -O list-all' with 3.16 btrfs-progs gives the following list of features: mixed-bg - mixed data and metadata block groups extref - increased hard-link limit per file to 65536 raid56 - raid56 extended format skinny-metadata - reduced-size metadata extent refs no-holes - no explicit hole extents for files
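To the question above: yes, the state can be checked afterwards. The mkfs-time features end up as flag bits in the superblock, and for a mounted filesystem they also appear in sysfs. A sketch with placeholder device and UUID (btrfs-show-super was the progs-3.x tool name; newer progs use "btrfs inspect-internal dump-super"):

```shell
# Works on a mounted or unmounted device; incompat_flags encodes
# mixed-bg, extref, raid56, skinny-metadata, no-holes, ...
btrfs-show-super /dev/sdX1 | grep -i flags

# A mounted filesystem also exposes its enabled features as files:
ls /sys/fs/btrfs/<UUID>/features/
```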
Re: Blocked tasks on 3.15.1
Hi Is there anything new on this topic? I am using Ubuntu 14.04.1 and experiencing the same problem. - 6 HDDs - LUKS on every HDD - btrfs RAID6 over these 6 crypt devices No LVM, no nodatacow files. Mount options: defaults,compress-force=lzo,space_cache With the original 3.13 kernel (3.13.0-32-generic) it is working fine. Then I tried the following kernels from here: http://kernel.ubuntu.com/~kernel-ppa/mainline/ linux-image-3.14.15-031415-generic_3.14.15-031415.201407311853_amd64.deb - not even booting, kernel panic at boot. linux-image-3.15.6-031506-generic_3.15.6-031506.201407172034_amd64.deb, linux-image-3.15.7-031507-generic_3.15.7-031507.201407281235_amd64.deb, and linux-image-3.16.0-031600-generic_3.16.0-031600.201408031935_amd64.deb cause hangs like the ones described in this thread. When doing big IO (unpacking a .rar archive of multiple GB) the filesystem stops working. Load stays very high but nothing actually happens on the drives according to dstat. htop shows a D (uninterruptible sleep (usually IO)) at many kworker threads. Unmounting of the btrfs filesystem only works with the -l (lazy) option. Reboot or shutdown doesn't work because of the blocking threads. So only a power cut works. After the reboot the last written data before the hang is lost. I am now back on 3.13. Regards 2014-07-25 4:27 GMT+02:00 Cody P Schafer d...@codyps.com: On Tue, Jul 22, 2014 at 9:53 AM, Chris Mason c...@fb.com wrote: On 07/19/2014 02:23 PM, Martin Steigerwald wrote: Running 3.15.6 with this patch applied on top: - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/` - no extra error messages printed (`dmesg | grep racing`) compared to without the patch I got the same results with 3.16-rc5 + this patch (see thread "BTRFS hang with 3.16-rc5"). 3.16-rc4 still is fine with me. No hang whatsoever so far. To recap some details (so I can have it all in one place): - /home/ is btrfs with compress=lzo BTRFS RAID 1 with lzo. - I have _not_ created any nodatacow files.
Me neither. - Full stack is: sata - dmcrypt - lvm - btrfs (I noticed others mentioning the use of dmcrypt) Same, except no dmcrypt. Thanks for the help in tracking this down everyone. We'll get there! Are you all running multi-disk systems (from a btrfs POV, more than one device?) I don't care how many physical drives this maps to, just does btrfs think there's more than one drive. No, both of my btrfs filesystems are single disk.
Re: How to handle a RAID5 array with a failing drive? - raid5 mostly works, just no rebuilds
I think after the balance it was a fine, non-degraded RAID again... As far as I remember. Tobby 2014-03-20 1:46 GMT+01:00 Marc MERLIN m...@merlins.org: On Thu, Mar 20, 2014 at 01:44:20AM +0100, Tobias Holst wrote: I tried the RAID6 implementation of btrfs and it looks like I had the same problem. Rebuild with balance worked, but when a drive was removed while mounted and then re-added, the chaos began. I tried it a few times. So when a drive fails (and this is just because of a lost connection or similar non-severe problems), then it is necessary to wipe the disc first before re-adding it, so btrfs will add it as a new disk and not try to re-add the old one. Good to know you got this too. Just to confirm: did you get it to rebuild, or once a drive is lost/gets behind, are you in degraded mode forever for those blocks? Or were you able to balance? Marc -- A mouse is a device used to point at the xterm you want to type in - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
Re: Massive BTRFS performance degradation
2014-03-09 18:36 GMT+01:00 Austin S Hemmelgarn ahferro...@gmail.com: On 03/09/2014 04:17 AM, Swâmi Petaramesh wrote: On Sunday, 9 March 2014 at 08:48:20, KC wrote: I am experiencing massive performance degradation on my BTRFS root partition on SSD. BTW, is BTRFS still an SSD-killer? It had this reputation a while ago, and I'm not sure if this still is the case, but I don't dare (yet) converting to BTRFS one of my laptops that has an SSD... Actually, because of the COW nature of BTRFS, it should be better for SSDs than stuff like ext4 (which DOES kill SSDs when journaling is enabled, because it ends up doing thousands of read-modify-write cycles to the same 128k of the disk under just generic usage). Just make sure that you use the 'ssd' and 'discard' mount options. Every modern SSD does wear leveling. Doing a read-modify-write cycle on the same block doesn't mean it writes to the same memory cell: the SSD controller distributes the write cycles over all (empty) cells. So in the best case every cell in the SSD is used equally, no matter whether you do random writes or write the same block over and over. This works better with lots of empty space on the SSD; that's why you should never use more than 90% of the space on an SSD. Garbage collection and TRIM also help the SSD controller to find empty cells.
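The 'ssd' and 'discard' advice above translates to a mount-option line like the following (a sketch only; the UUID and mount point are placeholders, and whether continuous TRIM via 'discard' is appropriate depends on the drive):

```shell
# /etc/fstab - btrfs root on an SSD with continuous TRIM:
# UUID=<fs-uuid>  /  btrfs  defaults,ssd,discard  0  1
```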