Healthy amount of free space?
Greetings,

I would like to ask what is a healthy amount of free space to keep on each device for btrfs to be happy. This is what my disk array currently looks like:

[root@dennas ~]# btrfs fi usage /raid
Overall:
    Device size:          29.11TiB
    Device allocated:     21.26TiB
    Device unallocated:    7.85TiB
    Device missing:          0.00B
    Used:                 21.18TiB
    Free (estimated):      3.96TiB  (min: 3.96TiB)
    Data ratio:               2.00
    Metadata ratio:           2.00
    Global reserve:      512.00MiB  (used: 0.00B)

Data,RAID1: Size:10.61TiB, Used:10.58TiB
   /dev/mapper/data1     1.75TiB
   /dev/mapper/data2     1.75TiB
   /dev/mapper/data3   856.00GiB
   /dev/mapper/data4   856.00GiB
   /dev/mapper/data5     1.75TiB
   /dev/mapper/data6     1.75TiB
   /dev/mapper/data7     6.29TiB
   /dev/mapper/data8     6.29TiB

Metadata,RAID1: Size:15.00GiB, Used:13.00GiB
   /dev/mapper/data1     2.00GiB
   /dev/mapper/data2     3.00GiB
   /dev/mapper/data3     1.00GiB
   /dev/mapper/data4     1.00GiB
   /dev/mapper/data5     3.00GiB
   /dev/mapper/data6     1.00GiB
   /dev/mapper/data7     9.00GiB
   /dev/mapper/data8    10.00GiB

System,RAID1: Size:64.00MiB, Used:1.50MiB
   /dev/mapper/data2    32.00MiB
   /dev/mapper/data6    32.00MiB
   /dev/mapper/data7    32.00MiB
   /dev/mapper/data8    32.00MiB

Unallocated:
   /dev/mapper/data1  1004.52GiB
   /dev/mapper/data2  1004.49GiB
   /dev/mapper/data3  1006.01GiB
   /dev/mapper/data4  1006.01GiB
   /dev/mapper/data5  1004.52GiB
   /dev/mapper/data6  1004.49GiB
   /dev/mapper/data7  1005.00GiB
   /dev/mapper/data8  1005.00GiB

Btrfs does quite a good job of evenly using space on all devices. Now, how low can I let that go? In other words, at how much free/unallocated remaining space should I consider adding a new disk?

Thanks for advice :)

W.

--
There are only two hard things in Computer Science: cache
invalidation, naming things and off-by-one errors.
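One way to act on the question above is to watch the unallocated space per filesystem and alert before it runs out. The sketch below is a rough heuristic, not an official btrfs rule: the 5% threshold and the byte values are assumptions chosen to roughly match the numbers in the post; on a real system the inputs would come from `btrfs fi usage -b`.

```shell
#!/bin/sh
# Heuristic check: warn when unallocated space drops below a chosen
# fraction of total device size. THRESHOLD_PCT is an assumption; pick
# what suits your workload (many people also keep at least a few GiB
# unallocated per device so metadata chunks can always be created).
THRESHOLD_PCT=5

check_unallocated() {
    # $1 = total device size in bytes, $2 = unallocated bytes
    pct=$(( $2 * 100 / $1 ))
    if [ "$pct" -lt "$THRESHOLD_PCT" ]; then
        echo "LOW: ${pct}% unallocated - consider adding a disk or balancing"
    else
        echo "OK: ${pct}% unallocated"
    fi
}

# Example with values approximating the post (~7.85TiB of 29.11TiB):
check_unallocated 32006957230260 8631452981268
```

On the array shown above this prints an "OK" line, since roughly a quarter of the raw space is still unallocated.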
btrfs scrub not repairing corruption
Greetings,

I'm running btrfs scrub on my raid each week (is that too often?) and I have a problem: it reports corruption, says it's repaired, but the next week it reports it again. Here is the output from dmesg:

$ cat btrfs_error_2017_12_18
[390739.538838] BTRFS warning (device dm-13): checksum error at logical 53939989975040 on dev /dev/mapper/data2, sector 1565559688, root 5, inode 11930, offset 2866180096, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390739.538850] BTRFS error (device dm-13): bdev /dev/mapper/data2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[390739.555187] BTRFS error (device dm-13): fixed up error at logical 53939989975040 on dev /dev/mapper/data2
[390739.555735] BTRFS warning (device dm-13): checksum error at logical 53939989979136 on dev /dev/mapper/data2, sector 1565559696, root 5, inode 11930, offset 2866184192, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390739.555743] BTRFS error (device dm-13): bdev /dev/mapper/data2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[390739.556273] BTRFS error (device dm-13): fixed up error at logical 53939989979136 on dev /dev/mapper/data2
[390739.556557] BTRFS warning (device dm-13): checksum error at logical 53939989983232 on dev /dev/mapper/data2, sector 1565559704, root 5, inode 11930, offset 2866188288, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390739.556562] BTRFS error (device dm-13): bdev /dev/mapper/data2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[390739.560065] BTRFS error (device dm-13): fixed up error at logical 53939989983232 on dev /dev/mapper/data2
[390739.560871] BTRFS warning (device dm-13): checksum error at logical 53939989987328 on dev /dev/mapper/data2, sector 1565559712, root 5, inode 11930, offset 2866192384, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390739.560876] BTRFS error (device dm-13): bdev /dev/mapper/data2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[390739.561310] BTRFS error (device dm-13): fixed up error at logical 53939989987328 on dev /dev/mapper/data2
[390739.561652] BTRFS warning (device dm-13): checksum error at logical 53939989991424 on dev /dev/mapper/data2, sector 1565559720, root 5, inode 11930, offset 2866196480, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390739.561657] BTRFS error (device dm-13): bdev /dev/mapper/data2 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[390739.562054] BTRFS error (device dm-13): fixed up error at logical 53939989991424 on dev /dev/mapper/data2
[390794.712097] BTRFS warning (device dm-13): checksum error at logical 53941063716864 on dev /dev/mapper/data1, sector 1565559688, root 5, inode 11930, offset 3805704192, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390794.712105] BTRFS error (device dm-13): bdev /dev/mapper/data1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[390794.734505] BTRFS error (device dm-13): fixed up error at logical 53941063716864 on dev /dev/mapper/data1
[390794.745848] BTRFS warning (device dm-13): checksum error at logical 53941063720960 on dev /dev/mapper/data1, sector 1565559696, root 5, inode 11930, offset 3805708288, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390794.745859] BTRFS error (device dm-13): bdev /dev/mapper/data1 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[390794.749087] BTRFS error (device dm-13): fixed up error at logical 53941063720960 on dev /dev/mapper/data1
[390794.766806] BTRFS warning (device dm-13): checksum error at logical 53941063725056 on dev /dev/mapper/data1, sector 1565559704, root 5, inode 11930, offset 3805712384, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390794.766816] BTRFS error (device dm-13): bdev /dev/mapper/data1 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[390794.769624] BTRFS error (device dm-13): fixed up error at logical 53941063725056 on dev /dev/mapper/data1
[390794.775683] BTRFS warning (device dm-13): checksum error at logical 53941063729152 on dev /dev/mapper/data1, sector 1565559712, root 5, inode 11930, offset 3805716480, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390794.775694] BTRFS error (device dm-13): bdev /dev/mapper/data1 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
[390794.789669] BTRFS error (device dm-13): fixed up error at logical 53941063729152 on dev /dev/mapper/data1
[390794.803555] BTRFS warning (device dm-13): checksum error at logical 53941063733248 on dev /dev/mapper/data1, sector 1565559720, root 5, inode 11930, offset 3805720576, length 4096, links 1 (path: pw/Filmy/Krotitelé duchů.mp4)
[390794.803566] BTRFS error (device dm-13): bdev /dev/mapper/data1 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
[390794.824942] BTRFS error (device dm-13): fixed up error at logical 53941063733248 on dev /dev/mapper/data1

$ cat btrfs_error_2018_01_01
[439232.021975] BTRFS warning (device dm-13): checksum error at logical 53939989975040 on dev /dev/mapper/data2, sector 1565559688, root 5, inode 11969, offset 17240064,
Re: Scrub doesn't correct corruption
On , ein wrote:
> On 10/23/2017 10:39 AM, Wolf wrote:
> > [...]
> >
> > Is this an issue somewhere inside btrfs, or is it a disk HW related problem?
>
> Highly unlikely hardware related. According to SMART and dmesg, there's
> no indication which would suggest disk failure.

That's my thinking too (and the reason why the disk is still in the array instead of going back for warranty), but since the scrub failed to correct the issue despite saying it did, I'm a bit curious what's going on.

W.

--
There are only two hard things in Computer Science: cache
invalidation, naming things and off-by-one errors.
Re: Scrub doesn't correct corruption
On , Qu Wenruo wrote:
> [27240.680874] perf: interrupt took too long (3952 > 3942), lowering
> kernel.perf_event_max_sample_rate to 50400
>
> > [30658.875802] BTRFS warning (device dm-12): checksum error at logical
> > 37889245122560 on dev /dev/mapper/data7, sector 2743145096, root 23674,
> > inode 206751, offset 762638336, length 4096, links 1 (path:
> > アニメ/!waiting_for_better_quality/Gate: Jieitai Kanochi nite, Kaku
> > Tatakaeri/GATE Jieitai Kanochi nite, Kaku Tatakaeri 05v2.mp4)
>
> Well, it's several seasons ago, and I think there are better BDrip raws now.
> (Yeah, I'm also an Otaku)
>
> Despite that, it's better to hide such personal info though.

Since downloading stuff from the internet is legal in my country I don't usually bother to hide stuff like this, but I will do so if it's an issue on this mailing list.

> And, did you try to scrub the corrupted device rather than the whole fs?
> Btrfs default scrub will start threads to scrub all devices at the same
> time, maybe some concurrency caused the false alert.

Tbh I had no idea I could scrub just the device and not the whole filesystem; running it now (but the scrub on this drive takes like 12 hours, so I'll see tomorrow whether it helped).

> Also, it could be possible to check/repair it by using btrfs-progs.
> Although it's still out-of-tree.
>
> Could you please try the following branch and use "btrfs scrub start
> --offline /dev/mapper/data7" to check if it reports the corruption is
> fixable?
> https://github.com/gujx2017/btrfs-progs/tree/offline_scrub
>
> Offline scrub gives us a quite good reference on whether it's fixable,
> without the possible hassle in kernel.
> So it's worth trying.

If scrubbing just the device doesn't help in any way, I will give it a try.

> (But hey, there are better BDrip raws already, so I don't think
> you're really interested in fixing the corruption)

True. Plus, since it's RAID1, no data were actually lost and the file is working without problem.
I'm mainly interested in knowing whether it's
1) an issue with the HW, or
2) some hidden issue with the whole fs, and it's going to fall apart soon.

Thanks for the tips so far :)

W.

--
There are only two hard things in Computer Science: cache
invalidation, naming things and off-by-one errors.
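Qu's suggestion of scrubbing a single device instead of the whole filesystem looks like the sketch below. The device path is taken from the thread; the scrub command itself needs real hardware, so it is shown as a comment, and the status output being parsed is a made-up sample (its shape mirrors the scrub output quoted elsewhere in this digest, but the numbers are illustrative).

```shell
#!/bin/sh
# Scrub one btrfs device instead of the whole filesystem; -B stays in the
# foreground, -d prints per-device statistics when done:
#   btrfs scrub start -Bd /dev/mapper/data7
# Afterwards the error count can be pulled out of `btrfs scrub status`.
# Sample (illustrative) status text, so the parsing below is testable:
sample='scrub status for 9a4be3ac-e942-4e6a-bb24-2c4009a42572
        scrub started at Mon Jan  1 10:00:00 2018 and finished after 12:01:33
        total bytes scrubbed: 6.48TiB with 5 errors
        error details: csum=5
        corrected errors: 5, uncorrectable errors: 0, unverified errors: 0'

# Extract the total error count from the "with N errors" line:
errors=$(printf '%s\n' "$sample" | sed -n 's/.*with \([0-9]*\) errors.*/\1/p')
echo "errors found: $errors"
```

Running scrub per device also serializes the I/O, which can rule out the concurrency effect Qu speculated about.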
Scrub doesn't correct corruption
Hi,

I'm having a problem with corruption in one file on my disk array. This is the third time it has happened (probably). The first time I didn't check the offending file, so I'm not sure, but it's likely. Btrfs scrub finds the corruption and, according to both dmesg and its own output, fixes it. However, the next run finds it again. Meanwhile, according to SMART the disk appears to be healthy (see below), and the corruption is limited to one file.

Is this an issue somewhere inside btrfs, or is it a disk HW related problem?

Thank you for your help :)

W.

smartctl -a /dev/sde
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b 100   100   016    Pre-fail Always  -           0
  2 Throughput_Performance  0x0005 131   131   054    Pre-fail Offline -           116
  3 Spin_Up_Time            0x0007 100   100   024    Pre-fail Always  -           0
  4 Start_Stop_Count        0x0012 100   100   000    Old_age  Always  -           8
  5 Reallocated_Sector_Ct   0x0033 100   100   005    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x000b 100   100   067    Pre-fail Always  -           0
  8 Seek_Time_Performance   0x0005 140   140   020    Pre-fail Offline -           15
  9 Power_On_Hours          0x0012 100   100   000    Old_age  Always  -           401
 10 Spin_Retry_Count        0x0013 100   100   060    Pre-fail Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           8
 22 Unknown_Attribute       0x0023 100   100   025    Pre-fail Always  -           100
192 Power-Off_Retract_Count 0x0032 100   100   000    Old_age  Always  -           33
193 Load_Cycle_Count        0x0012 100   100   000    Old_age  Always  -           33
194 Temperature_Celsius     0x0002 147   147   000    Old_age  Always  -           44 (Min/Max 23/46)
196 Reallocated_Event_Count 0x0032 100   100   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0022 100   100   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0008 100   100   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x000a 200   200   000    Old_age  Always  -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%        357              -
# 2  Short offline     Completed without error  00%        335              -

uname -a
Linux ws 4.13.8-1-ARCH #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017 x86_64 GNU/Linux

btrfs --version
===
btrfs-progs v4.13

btrfs fi show
=
Label: none  uuid: db7e86f5-649d-44ce-9514-53c7ee0fbe09
        Total devices 2 FS bytes used 9.91GiB
        devid 1 size 103.79GiB used 20.03GiB path /dev/mapper/storage1-root
        devid 2 size 103.79GiB used 20.03GiB path /dev/mapper/storage2-root

Label: 'RAID'  uuid: 9a4be3ac-e942-4e6a-bb24-2c4009a42572
        Total devices 7 FS bytes used 6.48TiB
        devid 1 size 1.82TiB used 715.03GiB path /dev/mapper/data3
        devid 2 size 1.82TiB used 715.00GiB path /dev/mapper/data4
        devid 3 size 2.73TiB used 1.40TiB path /dev/mapper/data2
        devid 4 size 2.73TiB used 1.40TiB path /dev/mapper/data1
        devid 5 size 2.73TiB used 1.40TiB path /dev/mapper/data5
        devid 6 size 2.73TiB used 1.40TiB path /dev/mapper/data6
        devid 7 size 7.28TiB used 5.95TiB path /dev/mapper/data7

btrfs fi df /raid
=
Data, RAID1: total=6.47TiB, used=6.47TiB
System, RAID1: total=64.00MiB, used=944.00KiB
Metadata, RAID1: total=9.00GiB, used=7.56GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

dmesg
=
[0.00] microcode: microcode updated early to revision 0xba, date = 2017-04-09
[0.00] random: get_random_bytes called from start_kernel+0x42/0x4b7 with crng_init=0
[0.00] Linux version 4.13.8-1-ARCH (builduser@tobias) (gcc version 7.2.0 (GCC)) #1 SMP PREEMPT Wed Oct 18 11:49:44 CEST 2017
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=db7e86f5-649d-44ce-9514-53c7ee0fbe09 rw cryptdevice=UUID=eb4011d2-38cd-467d-b515-7acf3ef68f01:storage1:allow-discards cryptkey=rootfs:/boot/crypto_keyfile.bin cryptdevice2=UUID=dd0821ae-8fc4-41d2-aab8-f313e2f6d0e8:storage2:allow-discards cryptkey2=rootfs:/boot/crypto_keyfile2.bin root=/dev/mapper/storage1-root
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu:
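When SMART looks clean but checksum errors persist, the usual next step is to queue a long self-test and keep an eye on the media-health attributes (5, 197, 198). The device path is the one from the post; the device commands are shown as comments since they need real hardware, and the line being parsed is copied from the SMART output above.

```shell
#!/bin/sh
# Commands for the disk from the post (need real hardware, hence comments):
#   smartctl -t long /dev/sde    # queue an extended (offline) self-test
#   smartctl -a /dev/sde         # read the results once it finishes
# The attributes most indicative of failing media are 5 (Reallocated_Sector_Ct),
# 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable).
# Pulling the raw value out of a captured attribute line is testable:
line='197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0'
pending=$(printf '%s\n' "$line" | awk '{print $NF}')
echo "pending sectors: $pending"
```

A raw value of 0 here is consistent with ein's conclusion in the follow-up: nothing in SMART points at the disk.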
Mounting bad btrfs img file
My OS drive had issues with metadata (disk full even though it wasn't, etc.), so I reinstalled my OS, and now I'm learning that my backup img is bad. What steps should I go through to fix it?

$ sudo mount -o offset=827808,loop,recovery,ro,nospace_cache,nospace_cache /data/Backup/Nephele.img /mnt
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

nephele@Nephele:/data/Backup$ dmesg | tail
[48539.443711] BTRFS error (device loop0): parent transid verify failed on 98021605376 wanted 85976 found 85978
[48539.473837] BTRFS error (device loop0): parent transid verify failed on 98021605376 wanted 85976 found 85978
[48539.485032] BTRFS: failed to read tree root on loop0
[48539.494782] BTRFS error (device loop0): parent transid verify failed on 98049490944 wanted 85975 found 85981
[48539.510541] BTRFS error (device loop0): parent transid verify failed on 98049490944 wanted 85975 found 85981
[48539.523464] BTRFS: failed to read tree root on loop0
[48539.526374] BTRFS error (device loop0): parent transid verify failed on 98020917248 wanted 85974 found 85979
[48539.545628] BTRFS error (device loop0): parent transid verify failed on 98020917248 wanted 85974 found 85979
[48539.560595] BTRFS: failed to read tree root on loop0
[48539.589038] BTRFS: open_ctree failed

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
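For an image that fails to mount with "parent transid verify failed", a common sequence is: attach it read-only through a loop device, try the backup superblocks, and, failing that, copy files out without mounting at all. This is a sketch under assumptions (loop device name, rescue directory, and the example start sector are illustrative, not from the post); the destructive-looking steps are read-only or write to a separate directory.

```shell
#!/bin/sh
# Recovery sketch for an unmountable btrfs image (device names illustrative):
#   losetup -o "$offset" -fr --show /data/Backup/Nephele.img  # read-only loop
#   btrfs rescue super-recover -v /dev/loop0   # try the backup superblocks
#   btrfs restore -v /dev/loop0 /mnt/rescue    # pull files out without mounting
# The -o byte offset for losetup is the partition's start sector times the
# sector size; that arithmetic is the testable part:
start_sector=2048   # example value; read yours from `fdisk -l image.img`
sector_size=512
offset=$((start_sector * sector_size))
echo "offset: $offset"
```

`btrfs restore` is usually the safest first move here, since it never writes to the damaged filesystem.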
Re: rm: it is dangerous to operate recursively on '/data/Backup/@' (same as '/')
So somehow my @ and @home subvolumes became mounted at /data/Backup in addition to their normal locations (/ and /home). So when I used 'dd' I was outputting to my OS drive instead of my data pool. How did this happen? How do I undo it? I'll try restarting now, but I'll await further responses before replying to myself again.

---
Eric Wolf
(201) 316-6098
19w...@gmail.com

On Tue, Sep 26, 2017 at 10:41 AM, Eric Wolf <19w...@gmail.com> wrote:
> I accidentally filled my OS drive with a copy of itself? The problem
> is, /data/ is a separate pool from the OS drive. And now it looks like
> I can't erase it? I don't even know how I got here. All I did was "dd
> if=/dev/sda of=/data/Backup/backup-new.img && mv
> /data/Backup/backup-new.img /data/Backup/backup.img" I don't really
> know where to go from here.
rm: it is dangerous to operate recursively on '/data/Backup/@' (same as '/')
I accidentally filled my OS drive with a copy of itself? The problem is, /data/ is a separate pool from the OS drive. And now it looks like I can't erase it? I don't even know how I got here. All I did was "dd if=/dev/sda of=/data/Backup/backup-new.img && mv /data/Backup/backup-new.img /data/Backup/backup.img" I don't really know where to go from here.
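The follow-up in this thread found that @ was mounted both at / and under /data/Backup, which is exactly the kind of thing `findmnt` makes visible. The sketch below shows the inspection commands (as comments, since they need a live system) and a testable parse of sample output; the sample mount table is illustrative, not taken from the poster's machine.

```shell
#!/bin/sh
# Listing every btrfs mount with its source makes a double-mounted
# subvolume (like @ showing up under /data/Backup) easy to spot:
#   findmnt -t btrfs -o TARGET,SOURCE,OPTIONS
#   umount /data/Backup     # then check /etc/fstab for the stray entry
# Illustrative sample of such output, so the duplicate check is testable:
sample='TARGET       SOURCE                    OPTIONS
/            /dev/sda2[/@]             rw,subvol=/@
/home        /dev/sda2[/@home]         rw,subvol=/@home
/data/Backup /dev/sda2[/@]             rw,subvol=/@'

# Any source listed more than once is mounted in two places:
dupes=$(printf '%s\n' "$sample" | awk 'NR>1 {print $2}' | sort | uniq -d)
echo "mounted twice: $dupes"
```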
Snapper Rollback filesystem is read only
I rolled back my filesystem with 'snapper rollback 81 (or whatever snapshot it was)' and now when I boot my filesystem is read-only. How do I fix it?
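One plausible cause (an assumption on my part, not confirmed in the thread) is that the system booted into a read-only snapshot rather than the writable copy snapper created. These are the checks I'd run; the subvolume path `/mnt/@` is illustrative, and the property commands need a real filesystem, so they are comments. Parsing the property output is the testable part.

```shell
#!/bin/sh
# Checks for a read-only root after `snapper rollback` (paths illustrative;
# run from a live system with the pool mounted at /mnt):
#   btrfs subvolume get-default /mnt        # which subvolume boots by default?
#   btrfs property get -ts /mnt/@ ro        # is that subvolume read-only?
#   btrfs property set -ts /mnt/@ ro false  # make it writable again
# `btrfs property get` prints "ro=true" or "ro=false"; extract the value:
out='ro=true'
state=${out#ro=}
echo "read-only: $state"
```

If the default subvolume really is a read-only snapshot, clearing the `ro` property (or pointing the default at the writable subvolume) should make the next boot writable.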
Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11
On Thu, Aug 31, 2017 at 4:11 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Thu, Aug 31, 2017 at 03:21:07PM -0400, Eric Wolf wrote:
>> I've previously confirmed it's a bad ram module which I have already
>> submitted an RMA for. Any advice for manually fixing the bits?
>
>    What I'd do... use a hex editor and the contents of ctree.h as
> documentation to find the byte in question, change it back to what it
> should be, mount the FS, try reading the directory again, look up the
> csum failure in dmesg, edit the block again to fix up the csum, and
> it's done. (Yes, I've done this before, and I'm a massive nerd).
>
>    It's also possible to use Hans van Kranenberg's btrfs-python to fix
> up this kind of thing, but I've not done it myself. There should be a
> couple of talk-throughs from Hans in various archives -- both this
> list (find it on, say, http://www.spinics.net/lists/linux-btrfs/), and
> on the IRC archives (http://logs.tvrrug.org.uk/logs/%23btrfs/latest.html).
>
>> Sorry for top leveling, not sure how mailing lists work (again sorry
>> if this message is top leveled, how do I ensure it's not?)
>
>    Just write your answers _after_ the quoted text that you're
> replying to, not before. It's a convention, rather than a technical
> thing...
>
>    Hugo.
>
>>
>>
>> On Thu, Aug 31, 2017 at 2:59 PM, Hugo Mills <h...@carfax.org.uk> wrote:
>> >    (Please don't top-post; edited for conversation flow)
>> >
>> > On Thu, Aug 31, 2017 at 02:44:39PM -0400, Eric Wolf wrote:
>> >> On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills <h...@carfax.org.uk> wrote:
>> >> > On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
>> >> >> I'm having issues with a bad block(?) on my root ssd.
>> >> >>
>> >> >> dmesg is consistently outputting "BTRFS critical (device sda2):
>> >> >> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
>> >> >>
>> >> >> "btrfs scrub stat /" outputs "scrub status for
>> >> >> b2c9ff7b-[snip]-48a02cc4f508
>> >> >> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
>> >> >> total bytes scrubbed: 53.41GiB with 2 errors
>> >> >> error details: verify=2
>> >> >> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
>> >> >>
>> >> >> Running "btrfs check --repair /dev/sda2" from a live system stalls
>> >> >> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
>> >> >> 100% and disk activity remains at 0.
>> >> >
>> >> >    This error is usually attributable to bad hardware. Typically RAM,
>> >> > but might also be marginal power regulation (blown capacitor
>> >> > somewhere) or a slightly broken CPU.
>> >> >
>> >> >    Can you show us the output of "btrfs-debug-tree -b 293438636032
>> >> > /dev/sda2"?
>> >
>> >    Here's the culprit:
>> >
>> > [snip]
>> >> item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
>> >>    inline extent data size 248 ram 248 compress 0
>> >> item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
>> >>    inode generation 5386763 transid 5386764 size 135 nbytes 135
>> >>    block group 0 mode 100644 links 1 uid 10 gid 10
>> >>    rdev 0 flags 0x0
>> >> item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
>> >>    inode ref index 2745 namelen 19 name: dpkg.statoverride.0
>> >> item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
>> >>    inline extent data size 135 ram 135 compress 0
>> > [snip]
>> >
>> >    Note the objectid field -- the first number in the brackets after
>> > "key" for each item. This sequence of values should be non-decreasing.
>> > Thus, item 12 should have an objectid of 890554 to match the items
>> > either side of it, and instead it has 856762.
>> >
>> >    In hex, these are:
>> >
>> >>>> hex(890554)
>> > '0xd96ba'
>> >>>> hex(856762)
>> > '0xd12ba'
>> >
>> >    Which means you've had two bitflips close together:
>> >
>> >>>> hex(856762 ^ 890554)
>> > '0x8400'
>> >
>> >    Given that everything else is OK, and it's just one byte affected
>> > in the m
Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11
Okay, I have a hex editor open. Now what? Your instructions seem straightforward, but I have no idea what I'm doing.

---
Eric Wolf
(201) 316-6098
19w...@gmail.com

On Thu, Aug 31, 2017 at 4:11 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Thu, Aug 31, 2017 at 03:21:07PM -0400, Eric Wolf wrote:
>> I've previously confirmed it's a bad ram module which I have already
>> submitted an RMA for. Any advice for manually fixing the bits?
>
>    What I'd do... use a hex editor and the contents of ctree.h as
> documentation to find the byte in question, change it back to what it
> should be, mount the FS, try reading the directory again, look up the
> csum failure in dmesg, edit the block again to fix up the csum, and
> it's done. (Yes, I've done this before, and I'm a massive nerd).
>
>    It's also possible to use Hans van Kranenberg's btrfs-python to fix
> up this kind of thing, but I've not done it myself. There should be a
> couple of talk-throughs from Hans in various archives -- both this
> list (find it on, say, http://www.spinics.net/lists/linux-btrfs/), and
> on the IRC archives (http://logs.tvrrug.org.uk/logs/%23btrfs/latest.html).
>
>> Sorry for top leveling, not sure how mailing lists work (again sorry
>> if this message is top leveled, how do I ensure it's not?)
>
>    Just write your answers _after_ the quoted text that you're
> replying to, not before. It's a convention, rather than a technical
> thing...
>
>    Hugo.
>
>> ---
>> Eric Wolf
>> (201) 316-6098
>> 19w...@gmail.com
>>
>>
>> On Thu, Aug 31, 2017 at 2:59 PM, Hugo Mills <h...@carfax.org.uk> wrote:
>> >    (Please don't top-post; edited for conversation flow)
>> >
>> > On Thu, Aug 31, 2017 at 02:44:39PM -0400, Eric Wolf wrote:
>> >> On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills <h...@carfax.org.uk> wrote:
>> >> > On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
>> >> >> I'm having issues with a bad block(?) on my root ssd.
>> >> >>
>> >> >> dmesg is consistently outputting "BTRFS critical (device sda2):
>> >> >> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
>> >> >>
>> >> >> "btrfs scrub stat /" outputs "scrub status for
>> >> >> b2c9ff7b-[snip]-48a02cc4f508
>> >> >> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
>> >> >> total bytes scrubbed: 53.41GiB with 2 errors
>> >> >> error details: verify=2
>> >> >> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
>> >> >>
>> >> >> Running "btrfs check --repair /dev/sda2" from a live system stalls
>> >> >> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
>> >> >> 100% and disk activity remains at 0.
>> >> >
>> >> >    This error is usually attributable to bad hardware. Typically RAM,
>> >> > but might also be marginal power regulation (blown capacitor
>> >> > somewhere) or a slightly broken CPU.
>> >> >
>> >> >    Can you show us the output of "btrfs-debug-tree -b 293438636032
>> >> > /dev/sda2"?
>> >
>> >    Here's the culprit:
>> >
>> > [snip]
>> >> item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
>> >>    inline extent data size 248 ram 248 compress 0
>> >> item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
>> >>    inode generation 5386763 transid 5386764 size 135 nbytes 135
>> >>    block group 0 mode 100644 links 1 uid 10 gid 10
>> >>    rdev 0 flags 0x0
>> >> item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
>> >>    inode ref index 2745 namelen 19 name: dpkg.statoverride.0
>> >> item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
>> >>    inline extent data size 135 ram 135 compress 0
>> > [snip]
>> >
>> >    Note the objectid field -- the first number in the brackets after
>> > "key" for each item. This sequence of values should be non-decreasing.
>> > Thus, item 12 should have an objectid of 890554 to match the items
>> > either side of it, and instead it has 856762.
>> >
>> >    In hex, these are:
>> >
>> >>>> hex(890554)
>> > '0xd96ba'
>> >>>> hex(856762)
>> > '0xd12ba'
>> >
>> >    Which means you'v
Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11
I've previously confirmed it's a bad ram module which I have already submitted an RMA for. Any advice for manually fixing the bits?

Sorry for top leveling, not sure how mailing lists work (again sorry if this message is top leveled, how do I ensure it's not?)

---
Eric Wolf
(201) 316-6098
19w...@gmail.com

On Thu, Aug 31, 2017 at 2:59 PM, Hugo Mills <h...@carfax.org.uk> wrote:
>    (Please don't top-post; edited for conversation flow)
>
> On Thu, Aug 31, 2017 at 02:44:39PM -0400, Eric Wolf wrote:
>> On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills <h...@carfax.org.uk> wrote:
>> > On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
>> >> I'm having issues with a bad block(?) on my root ssd.
>> >>
>> >> dmesg is consistently outputting "BTRFS critical (device sda2):
>> >> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
>> >>
>> >> "btrfs scrub stat /" outputs "scrub status for
>> >> b2c9ff7b-[snip]-48a02cc4f508
>> >> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
>> >> total bytes scrubbed: 53.41GiB with 2 errors
>> >> error details: verify=2
>> >> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
>> >>
>> >> Running "btrfs check --repair /dev/sda2" from a live system stalls
>> >> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
>> >> 100% and disk activity remains at 0.
>> >
>> >    This error is usually attributable to bad hardware. Typically RAM,
>> > but might also be marginal power regulation (blown capacitor
>> > somewhere) or a slightly broken CPU.
>> >
>> >    Can you show us the output of "btrfs-debug-tree -b 293438636032
>> > /dev/sda2"?
>
>    Here's the culprit:
>
> [snip]
>> item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
>>    inline extent data size 248 ram 248 compress 0
>> item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
>>    inode generation 5386763 transid 5386764 size 135 nbytes 135
>>    block group 0 mode 100644 links 1 uid 10 gid 10
>>    rdev 0 flags 0x0
>> item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
>>    inode ref index 2745 namelen 19 name: dpkg.statoverride.0
>> item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
>>    inline extent data size 135 ram 135 compress 0
> [snip]
>
>    Note the objectid field -- the first number in the brackets after
> "key" for each item. This sequence of values should be non-decreasing.
> Thus, item 12 should have an objectid of 890554 to match the items
> either side of it, and instead it has 856762.
>
>    In hex, these are:
>
>>>> hex(890554)
> '0xd96ba'
>>>> hex(856762)
> '0xd12ba'
>
>    Which means you've had two bitflips close together:
>
>>>> hex(856762 ^ 890554)
> '0x8400'
>
>    Given that everything else is OK, and it's just one byte affected
> in the middle of a load of data that's really quite sensitive to
> errors, it's very unlikely that it's the result of a misplaced pointer
> in the kernel, or some other subsystem accidentally walking over that
> piece of RAM. It is, therefore, almost certainly your hardware that's
> at fault.
>
>    I would strongly suggest running memtest86 on your machine -- I'd
> usually say a minimum of 8 hours, or longer if you possibly can (24
> hours), or until you have errors reported. If you get errors reported
> in the same place on multiple passes, then it's the RAM. If you have
> errors scattered around seemingly at random, then it's probably your
> power regulation (PSU or motherboard).
>
>    Sadly, btrfs check on its own won't be able to fix this, as it's
> two bits flipped. (It can cope with one bit flipped in the key, most
> of the time, but not two). It can be fixed manually, if you're
> familiar with a hex editor and the on-disk data structures.
>
>    Hugo.
>
> --
> Hugo Mills             | "You got very nice eyes, Deedee. Never noticed them
> hugo@... carfax.org.uk | before. They real?"
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          | Don Logan, Sexy Beast
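Hugo's bitflip analysis (done in a Python prompt in the quoted text) can be reproduced directly in the shell: XOR the expected and observed objectids, and a result with only one or two set bits points at flipped memory bits rather than random corruption. The numbers are the ones from the thread.

```shell
#!/bin/sh
# Reproduce the bitflip analysis from the thread: item 12's objectid
# should be 890554 but reads 856762. XOR exposes which bits differ.
good=890554
bad=856762
printf 'good: 0x%x\n' "$good"        # 0xd96ba
printf 'bad:  0x%x\n' "$bad"         # 0xd12ba
printf 'xor:  0x%x\n' $((good ^ bad))
```

The XOR comes out as 0x8400, i.e. exactly two bits set (bits 10 and 15), matching Hugo's "two bitflips close together" diagnosis.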
Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11
Also, I know it was caused by bad RAM, and that RAM has since been removed.

---
Eric Wolf
(201) 316-6098
19w...@gmail.com

On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
>> I'm having issues with a bad block(?) on my root ssd.
>>
>> dmesg is consistently outputting "BTRFS critical (device sda2):
>> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
>>
>> "btrfs scrub stat /" outputs "scrub status for b2c9ff7b-[snip]-48a02cc4f508
>> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
>> total bytes scrubbed: 53.41GiB with 2 errors
>> error details: verify=2
>> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
>>
>> Running "btrfs check --repair /dev/sda2" from a live system stalls
>> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
>> 100% and disk activity remains at 0.
>
>    This error is usually attributable to bad hardware. Typically RAM,
> but might also be marginal power regulation (blown capacitor
> somewhere) or a slightly broken CPU.
>
>    Can you show us the output of "btrfs-debug-tree -b 293438636032 /dev/sda2"?
>
>    Hugo.
>
> --
> Hugo Mills             | "You got very nice eyes, Deedee. Never noticed them
> hugo@... carfax.org.uk | before. They real?"
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          | Don Logan, Sexy Beast
Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11
402713112576 nr 8192
    extent data offset 0 nr 4096 ram 8192
    extent compression 0
item 149 key (890558 EXTENT_DATA 1105920) itemoff 6804 itemsize 53
    extent data disk byte 402714587136 nr 8192
    extent data offset 0 nr 4096 ram 8192
    extent compression 0
item 150 key (890558 EXTENT_DATA 1110016) itemoff 6751 itemsize 53
    extent data disk byte 402716921856 nr 8192
    extent data offset 0 nr 4096 ram 8192
    extent compression 0
item 151 key (890558 EXTENT_DATA 1114112) itemoff 6698 itemsize 53
    extent data disk byte 402717990912 nr 8192
    extent data offset 0 nr 4096 ram 8192
    extent compression 0
item 152 key (890558 EXTENT_DATA 1118208) itemoff 6645 itemsize 53
    extent data disk byte 402721890304 nr 20480
    extent data offset 0 nr 16384 ram 20480
    extent compression 0

---
Eric Wolf
(201) 316-6098
19w...@gmail.com

On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
>> I'm having issues with a bad block(?) on my root ssd.
>>
>> dmesg is consistently outputting "BTRFS critical (device sda2):
>> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
>>
>> "btrfs scrub stat /" outputs "scrub status for b2c9ff7b-[snip]-48a02cc4f508
>> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
>> total bytes scrubbed: 53.41GiB with 2 errors
>> error details: verify=2
>> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
>>
>> Running "btrfs check --repair /dev/sda2" from a live system stalls
>> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
>> 100% and disk activity remains at 0.
>
>    This error is usually attributable to bad hardware. Typically RAM,
> but might also be marginal power regulation (blown capacitor
> somewhere) or a slightly broken CPU.
>
>    Can you show us the output of "btrfs-debug-tree -b 293438636032 /dev/sda2"?
>
>    Hugo.
>
> --
> Hugo Mills             | "You got very nice eyes, Deedee. Never noticed them
> hugo@... carfax.org.uk | before. They real?"
> http://carfax.org.uk/  |
> PGP: E2AB1DE4          | Don Logan, Sexy Beast
BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11
I'm having issues with a bad block(?) on my root ssd.

dmesg is consistently outputting "BTRFS critical (device sda2): corrupt
leaf, bad key order: block=293438636032, root=1, slot=11"

"btrfs scrub status /" outputs:

scrub status for b2c9ff7b-[snip]-48a02cc4f508
	scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
	total bytes scrubbed: 53.41GiB with 2 errors
	error details: verify=2
	corrected errors: 0, uncorrectable errors: 2, unverified errors: 0

Running "btrfs check --repair /dev/sda2" from a live system stalls after
telling me corrupt leaf etc etc then "11 12". CPU usage hits 100% and
disk activity remains at 0.
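Before reaching for --repair, the state of the suspect block can be inspected without writing to the device. A minimal sketch of the read-only steps, using the block number from the dmesg line above; the RUN=echo wrapper is only for dry-running this sketch and is not part of any btrfs tool:

```shell
#!/bin/sh
# Read-only diagnosis sketch for a "corrupt leaf, bad key order" report.
# RUN defaults to echo, so this prints the commands instead of running
# them; drop RUN (set RUN="") on a real system. Device and block number
# come from the dmesg line quoted above.
RUN="${RUN:-echo}"
DEV=/dev/sda2
BLOCK=293438636032

# Dump the suspect tree block (read-only), as asked for in the reply.
$RUN btrfs-debug-tree -b "$BLOCK" "$DEV"

# Check metadata without modifying the device.
$RUN btrfs check --readonly "$DEV"
```

Since replies in this thread point at bad hardware as the usual cause, a RAM test is worth doing before any writing repair attempt.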
Re: Incremental send receive of snapshot fails
Hi

As the fs in question is my root, I tried the following using a live usb
stick of a Xubuntu 16.10:

Checking filesystem on /dev/sdb1
UUID: 122ecca7-9804-4c8a-b4ed-42fd6c6bbe7a
checking extents [o]
checking free space cache [.]
checking fs roots [o]
found 40577679360 bytes used err is 0
total csum bytes: 39027548
total tree bytes: 571277312
total fs tree bytes: 453001216
total extent tree bytes: 71745536
btree space waste bytes: 116244847
file data blocks allocated: 46952968192
 referenced 44081487872

"err is 0" ... so I guess that means everything is fine?

Out of curiosity I retried the new_snap+send+receive on that same fs
using the live cd: same results (ERROR unlink ...). Though I noticed
that the exact file in question (reported by ERROR) is somewhat random ...

For this test with the live usb, I mounted the root volume directly
instead of subvolumes via fstab. So that doesn't seem to have been the
problem either.

I did some further meditating on what happens here. From what I read and
understand of send/receive, the stream produced by send is about
replaying the fs events. If I give send a parent, it will just replay
the difference between the two snapshots and only produce a stream that
contains the changes needed to "transform" one (parent) snap into the
other (on the receiving end). Now I'm not sure how the receiving end
figures out what the parent is, and whether it has it, but I guess
that's where all those UUIDs come into play.
There are three UUIDs; if I compare them on the sending ("lab") and
receiving ("server") side, I see:

## sender
# btrfs subv show /.back/last_snap_by_script
/.back/last_snap_by_script
	UUID:			b4634a8b-b74b-154a-9f17-1115f6d07524
	Parent UUID:		b5f9a301-69f7-0646-8cf1-ba29e0c24fac
	Received UUID:		196a0866-cd05-d24e-bac6-84e8e5eb037a

## receiver
# btrfs subv show /media/bak/lab/root/last_snap_by_script
	UUID:			89321ec1-2de6-0a4c-8f9f-cdd30fa3a7af
	Parent UUID:		-
	Received UUID:		196a0866-cd05-d24e-bac6-84e8e5eb037a

So that does make sense to me, as neither "Parent UUID" nor "UUID" is
what would fit our needs (both are kind of local to one system). Instead
the "Received UUID" seems to be the link identifying snaps on both ends
as "equal". But then why do both snaps on the sending side have the same
"Received UUID" for me:

## from my original post, on sender side, this is the "new" delta snapshot
# btrfs subv show /.back/new_snap
/.back/new_snap
	Name:			new_snap
	UUID:			fca51929-8101-db45-8df6-f25935c04f98
	Parent UUID:		b5f9a301-69f7-0646-8cf1-ba29e0c24fac
	Received UUID:		196a0866-cd05-d24e-bac6-84e8e5eb037a

It would be great if someone could clear this up ... could this point to
the reason why the "replay" stream is produced on a wrong basis?

Another thing I tried is the "--max-errors 0" option of receive. That
lets it continue after an error, but that produced an endless slur of
more of the same errors. Is that another indicator that the parent on
the sending or receiving side is identified wrongly or not at all?

In any case, thanks for the tip Giuseppe :-)

Regards
Rene

On 29.12.2016 16:31, Giuseppe Della Bianca wrote:
> Hi.
>
> In such cases, I have run btrfs check (not repair mode !!!) on every
> file system/partition that is involved in creating, sending and
> receiving snapshots.
>
> Regards.
>
> Gdb
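For scripted sanity checks, the "Received UUID" can be pulled out of `btrfs subvolume show` output and compared on both ends before sending. A small sketch; the parsing helper is mine (not part of btrfs-progs), and the sample input is the receiver-side output quoted above:

```shell
#!/bin/sh
# Extract the Received UUID from `btrfs subvolume show` output so the
# sender and receiver sides can be compared in a script. The sample text
# below is the receiver-side output quoted in this thread.
sample='	UUID:			89321ec1-2de6-0a4c-8f9f-cdd30fa3a7af
	Parent UUID:		-
	Received UUID:		196a0866-cd05-d24e-bac6-84e8e5eb037a'

received_uuid() {
    # Print the value after "Received UUID:", trimming whitespace.
    awk -F':' '/Received UUID/ { gsub(/[ \t]/, "", $2); print $2 }'
}

printf '%s\n' "$sample" | received_uuid
# prints: 196a0866-cd05-d24e-bac6-84e8e5eb037a
```

On a live system the input would come from `btrfs subvolume show <snap>` run on each side (the receiver side over ssh); mismatching values would explain receive picking the wrong parent.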
Incremental send receive of snapshot fails
Hi all

I have a problem with incremental snapshot send/receive in btrfs. Maybe
my google-fu is weak, but I couldn't find any pointers, so here goes.

A few words about my setup first: I have multiple clients that back up
to a central server. All clients (and the server) are running (K)Ubuntu
16.10 64bit on btrfs. Backing up works with btrfs send / receive, either
full or incremental, depending on what's available on the server side.
All clients have the usual (Ubuntu) btrfs layout: 2 subvolumes, one for
/ and one for /home; explicit entries in fstab; root volume not mounted
anywhere. For further details see the P.s. at the end.

Here's what happens: In general I stick to the example from
https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . Backing up
is done daily by a script, and it works successfully on all of my
clients except one (called "lab").

I start with the first snapshot on "lab" and do a full send to the
server. This works as expected (sending takes some hours as it is done
over wifi+ssh). After that is done I send an incremental snapshot based
on the previous parent. This also works as expected (no error etc).
Sending deltas then happens once a day, with the script always keeping
the last two snapshots on the client and many more on the server. Also
after each run of the script I do a bit of "house keeping" to prevent
"disk full" etc (see the P.s. below for commands).

I can't say exactly when, but after some time (possibly the next day)
snapshot sending fails with an error on the receiving end:

ERROR: unlink some/file failed. No such file or directory

Some searching around led me to
https://bugzilla.kernel.org/show_bug.cgi?id=60673 . So I checked to make
sure my script doesn't use the wrong parent; and it does not.
But to make really sure I tried a send / receive directly on "lab"
without the server:

# btrfs subvol snap -r / /.back/new_snap
Create a readonly snapshot of '/' in '/.back/new_snap'

# btrfs subv show /.back/last_snap_by_script
/.back/last_snap_by_script
	Name:			last_snap_by_script
	UUID:			b4634a8b-b74b-154a-9f17-1115f6d07524
	Parent UUID:		b5f9a301-69f7-0646-8cf1-ba29e0c24fac
	Received UUID:		196a0866-cd05-d24e-bac6-84e8e5eb037a
	Creation time:		2016-12-27 17:55:10 +0100
	Subvolume ID:		486
	Generation:		52036
	Gen at creation:	51524
	Parent ID:		257
	Top level ID:		257
	Flags:			readonly
	Snapshot(s):

# btrfs subv show /.back/new_snap
/.back/new_snap
	Name:			new_snap
	UUID:			fca51929-8101-db45-8df6-f25935c04f98
	Parent UUID:		b5f9a301-69f7-0646-8cf1-ba29e0c24fac
	Received UUID:		196a0866-cd05-d24e-bac6-84e8e5eb037a
	Creation time:		2016-12-28 11:51:43 +0100
	Subvolume ID:		506
	Generation:		52271
	Gen at creation:	52271
	Parent ID:		257
	Top level ID:		257
	Flags:			readonly
	Snapshot(s):

# btrfs send -p /.back/last_snap_by_script /.back/new_snap > delta
At subvol /.back/new_snap
# btrfs subvol del /.back/new_snap
Delete subvolume (no-commit): '/.back/new_snap'
# cat delta | btrfs receive /.back/
At snapshot new_snap
ERROR: unlink some/file failed. No such file or directory

And the receive always fails with some ERROR similar to the above!

What I find a bit odd is the identical "Received UUID", even before
new_snap was sent / received ... but maybe that's normal?

If instead of "last_snap_by_script" I also create a new read-only
snapshot and send the delta between these two "new" ones, everything
works as expected. But then there are few differences between the two
new snaps ...

I tried to look for differences between the "lab" client and another one
("navi") where backing up works. So far I couldn't really find anything.
I did create both file systems at different points in time (possibly
with different kernels). All fs were created as btrfs and not
"converted" from ext.
"lab" has an SSD, "navi" a spinning disc. Both systems run on Intel CPUs in 64Bit ... So now I have a snapshot on "lab" which I cannot use as a parent, but why? What did I do wrong? The whole procedure does work on my other clients (with the exact same script), why not on the "lab" client? And this is a re-occuring problem: I tried deleting all of the snaps (on both ends) and start all over again ... it will again end up with a "broken" snapshot eventually. Up until now using btrfs has been a great experience and I always could resolve my troubles quite quickly, but this time I don't know what to do? Thanks in advance for any suggestions and feel free to ask for other / missing details :-) Regards Rene P.s.:
btrfs restore fails because of NO SPACE
Hallo,

I have a confusing problem with my btrfs raid. Currently I am using the
following setup:

btrfs fi show
Label: none  uuid: 93000933-e46d-403b-80d7-60475855e3f3
	Total devices 2 FS bytes used 2.56TiB
	devid	1 size 2.73TiB used 2.71TiB path /dev/sda
	devid	4 size 2.73TiB used 2.71TiB path /dev/sdb

As you can see, both disks are full. Actually I cannot mount my raid,
even with recovery options enabled:

mount /dev/sda /mnt/Data -t btrfs -o nospace_cache,clear_cache,enospc_debug,nodatacow
mount: wrong fs type, bad option, bad superblock on /dev/sda,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

dmesg shows:

[ 1066.221696] BTRFS info (device sda): disabling disk space caching
[ 1066.227990] BTRFS info (device sda): force clearing of disk cache
[ 1066.234331] BTRFS info (device sda): setting nodatacow, compression disabled
[ 1066.243813] BTRFS error (device sda): parent transid verify failed on 9676657786880 wanted 242139 found 0
[ 1066.253672] BTRFS error (device sda): parent transid verify failed on 9676657786880 wanted 242139 found 0
[ 1066.263450] BTRFS error (device sda): parent transid verify failed on 9676657786880 wanted 242139 found 0
[ 1066.273234] BTRFS: failed to read chunk root on sda
[ 1066.279675] BTRFS warning (device sda): page private not zero on page 9676657786880
[ 1066.287482] BTRFS warning (device sda): page private not zero on page 9676657790976
[ 1066.295361] BTRFS warning (device sda): page private not zero on page 9676657795072
[ 1066.303204] BTRFS warning (device sda): page private not zero on page 9676657799168
[ 1066.369266] BTRFS: open_ctree failed

After spending some time with Google I found a possible solution for my
problem by running:

btrfs restore -v /dev/sda /mnt/Data

Actually this operation fails silently (the computer freezes). After
examining the kernel logs I found out that the operation fails because
of "NO SPACE LEFT ON DEVICE".
Can anybody please give me a solution for this problem?

Greetings
Wolf Bublitz
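Not advice from the thread, but the usual next steps for a "failed to read chunk root" situation look roughly like this. One detail worth noting: `btrfs restore` copies files *to* its destination argument, so an ENOSPC during restore usually means the target is full, and /mnt/Data (pointing back at the full raid) is a poor choice. /mnt/rescue below is an illustrative mount of a separate disk with enough free space:

```shell
#!/bin/sh
# Hedged sketch of recovery steps for a damaged chunk root. RUN defaults
# to echo, keeping this a dry run; /mnt/rescue stands for a separate
# disk with enough room to receive the files restore copies out.
RUN="${RUN:-echo}"
DEV=/dev/sda
DEST=/mnt/rescue

# Copy files out to a roomy destination instead of the full raid.
$RUN btrfs restore -v "$DEV" "$DEST"

# If the chunk tree itself is damaged, try rebuilding it from metadata
# copies (this writes to the device -- only after restore was attempted).
$RUN btrfs rescue chunk-recover "$DEV"

# List older tree roots; restore can be pointed at one with -t <bytenr>.
$RUN btrfs-find-root "$DEV"
```

Order matters: the read-only restore attempt to a fresh destination comes first, and anything that writes to the broken device comes last.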