Re: Seeking Help on Corruption Issues
On Tue, Oct 03, 2017 at 03:49:25PM -0700, Stephen Nesbitt wrote:
>
> On 10/3/2017 2:11 PM, Hugo Mills wrote:
> >Hi, Stephen,
> >
> >On Tue, Oct 03, 2017 at 08:52:04PM +, Stephen Nesbitt wrote:
> >>Here it is. There are a couple of out-of-order entries beginning at 117. And
> >>yes I did uncover a bad stick of RAM:
> >>
> >>btrfs-progs v4.9.1
> >>leaf 2589782867968 items 134 free space 6753 generation 3351574 owner 2
> >>fs uuid 24b768c3-2141-44bf-ae93-1c3833c8c8e3
> >>chunk uuid 19ce12f0-d271-46b8-a691-e0d26c1790c6
> >[snip]
> >>item 116 key (1623012749312 EXTENT_ITEM 45056) itemoff 10908 itemsize 53
> >>extent refs 1 gen 3346444 flags DATA
> >>extent data backref root 271 objectid 2478 offset 0 count 1
> >>item 117 key (1621939052544 EXTENT_ITEM 8192) itemoff 10855 itemsize 53
> >>extent refs 1 gen 3346495 flags DATA
> >>extent data backref root 271 objectid 21751764 offset 6733824 count 1
> >>item 118 key (1623012450304 EXTENT_ITEM 8192) itemoff 10802 itemsize 53
> >>extent refs 1 gen 3351513 flags DATA
> >>extent data backref root 271 objectid 5724364 offset 680640512 count 1
> >>item 119 key (1623012802560 EXTENT_ITEM 12288) itemoff 10749 itemsize 53
> >>extent refs 1 gen 3346376 flags DATA
> >>extent data backref root 271 objectid 21751764 offset 6701056 count 1
> >>>> hex(1623012749312)
> >'0x179e3193000'
> >>>> hex(1621939052544)
> >'0x179a319e000'
> >>>> hex(1623012450304)
> >'0x179e314a000'
> >>>> hex(1623012802560)
> >'0x179e31a0000'
> >
> >That's "e" -> "a" in the fourth hex digit, which is a single-bit
> >flip, and should be fixable by btrfs check (I think). However, even
> >fixing that, it's not ordered, because 118 is then before 117, which
> >could be another bitflip ("9" -> "4" in the 7th digit), but two bad
> >bits that close to each other seems unlikely to me.
> >
> >Hugo.
>
> Hope this is a duplicate reply - I might have fat fingered something.
>
> The underlying file is disposable/replaceable. Any way to zero
> out/zap the bad BTRFS entry?

Not really.
Even trying to delete the related file(s), it's going to fall over
when reading the metadata in the first place. (The key order check is
a metadata invariant, like the csum checks and transid checks.)

At best, you'd have to get btrfs check to fix it. It should be able
to manage a single-bit error, but you've got two single-bit errors in
close proximity, and I'm not sure it'll be able to deal with it. Might
be worth trying it. The FS _might_ blow up as a result of an attempted
fix, but you say it's replaceable, so that's kind of OK. The worst I'd
_expect_ to happen with btrfs check --repair is that it just won't be
able to deal with it and you're left where you started.

Go for it.

Hugo.

-- 
Hugo Mills             | You shouldn't anthropomorphise computers. They
hugo@... carfax.org.uk | really don't like that.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
signature.asc Description: Digital signature
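To make the "key order is a metadata invariant" point concrete: inside a leaf, every item key (objectid, type, offset) must sort strictly after the key in the previous slot. The kernel's own check is C code; what follows is only a hypothetical Python illustration of the same rule, applied to items 116-119 from the dump quoted above. It reports offending pairs in the same shape as btrfs check's "bad key ordering 116 117". The numeric type code used for EXTENT_ITEM is an assumption; since all four keys share it, it does not affect the result.

```python
from typing import List, Tuple

# A btrfs key is compared field by field: objectid, then type, then offset,
# which is exactly how Python compares tuples.
Key = Tuple[int, int, int]

# Assumed numeric type code for EXTENT_ITEM; all four keys share it,
# so its exact value does not change the result.
EXTENT_ITEM = 168

# Items 116-119 of leaf 2589782867968, copied from the btrfs-debug-tree output.
LEAF_KEYS: List[Key] = [
    (1623012749312, EXTENT_ITEM, 45056),  # item 116
    (1621939052544, EXTENT_ITEM, 8192),   # item 117
    (1623012450304, EXTENT_ITEM, 8192),   # item 118
    (1623012802560, EXTENT_ITEM, 12288),  # item 119
]


def bad_key_pairs(keys: List[Key], first_slot: int = 0) -> List[Tuple[int, int]]:
    """Return (slot, slot + 1) for each adjacent pair that is not strictly ascending."""
    bad = []
    for i in range(len(keys) - 1):
        # Equal keys are also invalid, hence >= rather than >.
        if keys[i] >= keys[i + 1]:
            bad.append((first_slot + i, first_slot + i + 1))
    return bad


if __name__ == "__main__":
    print(bad_key_pairs(LEAF_KEYS, first_slot=116))  # [(116, 117)]
```

Unlike a reader that stops at the first bad pair, this sketch collects every bad pair, which makes it easy to see whether "fixing" one key would leave others out of order.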
Re: Seeking Help on Corruption Issues
On 10/3/2017 2:11 PM, Hugo Mills wrote:
>Hi, Stephen,
>
>On Tue, Oct 03, 2017 at 08:52:04PM +, Stephen Nesbitt wrote:
>>Here it is. There are a couple of out-of-order entries beginning at 117. And
>>yes I did uncover a bad stick of RAM:
>>
>>btrfs-progs v4.9.1
>>leaf 2589782867968 items 134 free space 6753 generation 3351574 owner 2
>>fs uuid 24b768c3-2141-44bf-ae93-1c3833c8c8e3
>>chunk uuid 19ce12f0-d271-46b8-a691-e0d26c1790c6
>[snip]
>>item 116 key (1623012749312 EXTENT_ITEM 45056) itemoff 10908 itemsize 53
>>extent refs 1 gen 3346444 flags DATA
>>extent data backref root 271 objectid 2478 offset 0 count 1
>>item 117 key (1621939052544 EXTENT_ITEM 8192) itemoff 10855 itemsize 53
>>extent refs 1 gen 3346495 flags DATA
>>extent data backref root 271 objectid 21751764 offset 6733824 count 1
>>item 118 key (1623012450304 EXTENT_ITEM 8192) itemoff 10802 itemsize 53
>>extent refs 1 gen 3351513 flags DATA
>>extent data backref root 271 objectid 5724364 offset 680640512 count 1
>>item 119 key (1623012802560 EXTENT_ITEM 12288) itemoff 10749 itemsize 53
>>extent refs 1 gen 3346376 flags DATA
>>extent data backref root 271 objectid 21751764 offset 6701056 count 1
>>>>> hex(1623012749312)
>'0x179e3193000'
>>>> hex(1621939052544)
>'0x179a319e000'
>>>> hex(1623012450304)
>'0x179e314a000'
>>>> hex(1623012802560)
>'0x179e31a0000'
>
>That's "e" -> "a" in the fourth hex digit, which is a single-bit
>flip, and should be fixable by btrfs check (I think). However, even
>fixing that, it's not ordered, because 118 is then before 117, which
>could be another bitflip ("9" -> "4" in the 7th digit), but two bad
>bits that close to each other seems unlikely to me.
>
>Hugo.

Hope this is a duplicate reply - I might have fat fingered something.

The underlying file is disposable/replaceable. Any way to zero
out/zap the bad BTRFS entry?

-steve
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Seeking Help on Corruption Issues
Hi, Stephen,

On Tue, Oct 03, 2017 at 08:52:04PM +, Stephen Nesbitt wrote:
> Here it is. There are a couple of out-of-order entries beginning at 117. And
> yes I did uncover a bad stick of RAM:
>
> btrfs-progs v4.9.1
> leaf 2589782867968 items 134 free space 6753 generation 3351574 owner 2
> fs uuid 24b768c3-2141-44bf-ae93-1c3833c8c8e3
> chunk uuid 19ce12f0-d271-46b8-a691-e0d26c1790c6
[snip]
> item 116 key (1623012749312 EXTENT_ITEM 45056) itemoff 10908 itemsize 53
> extent refs 1 gen 3346444 flags DATA
> extent data backref root 271 objectid 2478 offset 0 count 1
> item 117 key (1621939052544 EXTENT_ITEM 8192) itemoff 10855 itemsize 53
> extent refs 1 gen 3346495 flags DATA
> extent data backref root 271 objectid 21751764 offset 6733824 count 1
> item 118 key (1623012450304 EXTENT_ITEM 8192) itemoff 10802 itemsize 53
> extent refs 1 gen 3351513 flags DATA
> extent data backref root 271 objectid 5724364 offset 680640512 count 1
> item 119 key (1623012802560 EXTENT_ITEM 12288) itemoff 10749 itemsize 53
> extent refs 1 gen 3346376 flags DATA
> extent data backref root 271 objectid 21751764 offset 6701056 count 1

>>> hex(1623012749312)
'0x179e3193000'
>>> hex(1621939052544)
'0x179a319e000'
>>> hex(1623012450304)
'0x179e314a000'
>>> hex(1623012802560)
'0x179e31a0000'

That's "e" -> "a" in the fourth hex digit, which is a single-bit
flip, and should be fixable by btrfs check (I think). However, even
fixing that, it's not ordered, because 118 is then before 117, which
could be another bitflip ("9" -> "4" in the 7th digit), but two bad
bits that close to each other seems unlikely to me.

Hugo.

-- 
Hugo Mills             | Great films about cricket: Silly Point Break
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
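The hex comparison above can be reproduced with plain integer operations. This is a minimal sketch, not part of any btrfs tool: `differing_bits` is a hypothetical helper that XORs two values and counts the set bits, and the "repaired" objectid for item 117 is simply the suspected "e" -> "a" flip undone, an assumption rather than a verified value.

```python
# Objectids (extent start addresses) of items 116-119 from the leaf dump.
ITEMS = {
    116: 1623012749312,
    117: 1621939052544,
    118: 1623012450304,
    119: 1623012802560,
}


def differing_bits(a: int, b: int) -> int:
    """Count how many bit positions differ between two integers."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    for slot, objectid in ITEMS.items():
        print(slot, hex(objectid))

    # Hypothetical repair: turn the "a" in item 117's fourth hex digit back into "e".
    # That digit covers bits 28-31; "a" (1010) and "e" (1110) differ only in bit 30.
    repaired_117 = ITEMS[117] ^ (1 << 30)
    print(hex(repaired_117), differing_bits(ITEMS[117], repaired_117))

    # With 117 repaired it sits between 116 and 119 ...
    print(ITEMS[116] < repaired_117 < ITEMS[119])  # True
    # ... but 118 still sorts before it, so the leaf remains out of order.
    print(ITEMS[118] < repaired_117)  # True
```

XOR followed by a population count is the usual way to measure how far apart two values are in bits. A result of 1 is what you would expect from a single flipped bit in RAM.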
Re: Seeking Help on Corruption Issues
On Tue, Oct 03, 2017 at 01:06:50PM -0700, Stephen Nesbitt wrote:
> All:
>
> I came back to my computer yesterday to find my filesystem in read
> only mode. Running a btrfs scrub start -dB aborts as follows:
>
> btrfs scrub start -dB /mnt
> ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5
> (Input/output error)
> ERROR: scrubbing /mnt failed for device id 5: ret=-1, errno=5
> (Input/output error)
> scrub device /dev/sdb (id 4) canceled
> scrub started at Mon Oct 2 21:51:46 2017 and was aborted after
> 00:09:02
> total bytes scrubbed: 75.58GiB with 1 errors
> error details: csum=1
> corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
> scrub device /dev/sdc (id 5) canceled
> scrub started at Mon Oct 2 21:51:46 2017 and was aborted after
> 00:11:11
> total bytes scrubbed: 50.75GiB with 0 errors
>
> The resulting dmesg is:
> [ 699.534066] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0,
> rd 0, flush 0, corrupt 6, gen 0
> [ 699.703045] BTRFS error (device sdc): unable to fixup (regular)
> error at logical 1609808347136 on dev /dev/sdb
> [ 783.306525] BTRFS critical (device sdc): corrupt leaf, bad key
> order: block=2589782867968, root=1, slot=116

This error usually means bad RAM. Can you show us the output of
"btrfs-debug-tree -b 2589782867968 /dev/sdc"?

Hugo.

> [ 789.776132] BTRFS critical (device sdc): corrupt leaf, bad key
> order: block=2589782867968, root=1, slot=116
> [ 911.529842] BTRFS critical (device sdc): corrupt leaf, bad key
> order: block=2589782867968, root=1, slot=116
> [ 918.365225] BTRFS critical (device sdc): corrupt leaf, bad key
> order: block=2589782867968, root=1, slot=116
>
> Running btrfs check /dev/sdc results in:
> btrfs check /dev/sdc
> Checking filesystem on /dev/sdc
> UUID: 24b768c3-2141-44bf-ae93-1c3833c8c8e3
> checking extents
> bad key ordering 116 117
> bad block 2589782867968
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache
> There is no free space entry for 1623012450304-1623012663296
> There is no free space entry for 1623012450304-1623225008128
> cache appears valid but isn't 1622151266304
> found 288815742976 bytes used err is -22
> total csum bytes: 0
> total tree bytes: 350781440
> total fs tree bytes: 0
> total extent tree bytes: 350027776
> btree space waste bytes: 115829777
> file data blocks allocated: 156499968
>
> uname -a:
> Linux sysresccd 4.9.24-std500-amd64 #2 SMP Sat Apr 22 17:14:43 UTC
> 2017 x86_64 Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz GenuineIntel
> GNU/Linux
>
> btrfs --version: btrfs-progs v4.9.1
>
> btrfs fi show:
> Label: none uuid: 24b768c3-2141-44bf-ae93-1c3833c8c8e3
> Total devices 2 FS bytes used 475.08GiB
> devid 4 size 931.51GiB used 612.06GiB path /dev/sdb
> devid 5 size 931.51GiB used 613.09GiB path /dev/sdc
>
> btrfs fi df /mnt:
> Data, RAID1: total=603.00GiB, used=468.03GiB
> System, RAID1: total=64.00MiB, used=112.00KiB
> System, single: total=32.00MiB, used=0.00B
> Metadata, RAID1: total=9.00GiB, used=7.04GiB
> Metadata, single: total=1.00GiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> What is the recommended procedure at this point? Run btrfs check
> --repair?
> I have backups so losing a file or two isn't critical, but
> I really don't want to go through the effort of a bare metal
> reinstall.
>
> In the process of researching this I did uncover a bad DIMM. Am I
> correct that the problems I'm seeing are likely linked to the
> resulting memory errors?
>
> Thx in advance,
>
> -steve
>

-- 
Hugo Mills             | Quidquid latine dictum sit, altum videtur
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
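All four "corrupt leaf" lines in that dmesg name the same tree block, which is the number Hugo plugs into btrfs-debug-tree. The following Python sketch pulls block, root and slot out of such lines and removes duplicates. The regular expression is written against the message text quoted in this thread, with each line unwrapped from the email. It is an assumption and may not match other kernel versions' wording. The device path is simply the one from this report.

```python
import re
from typing import Set, Tuple

# Matches the kernel message format seen in this thread; other kernel
# versions may word the message differently.
CORRUPT_LEAF = re.compile(
    r"corrupt leaf, bad key order: block=(\d+), root=(\d+), slot=(\d+)"
)

DMESG = """\
[ 783.306525] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116
[ 789.776132] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116
[ 911.529842] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116
[ 918.365225] BTRFS critical (device sdc): corrupt leaf, bad key order: block=2589782867968, root=1, slot=116
"""


def corrupt_leaves(log: str) -> Set[Tuple[int, ...]]:
    """Return the distinct (block, root, slot) triples reported in a dmesg dump."""
    return {tuple(int(g) for g in m.groups()) for m in CORRUPT_LEAF.finditer(log)}


if __name__ == "__main__":
    for block, root, slot in sorted(corrupt_leaves(DMESG)):
        # The same command Hugo asks for, one per distinct bad block.
        print(f"btrfs-debug-tree -b {block} /dev/sdc  # root={root} slot={slot}")
```

Repeated identical reports usually just mean the same block was read again, for example by retried scrub or mount attempts. Removing duplicates makes it clearer that there is a single bad leaf to inspect.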
Seeking Help on Corruption Issues
All:

I came back to my computer yesterday to find my filesystem in read
only mode. Running a btrfs scrub start -dB aborts as follows:

btrfs scrub start -dB /mnt
ERROR: scrubbing /mnt failed for device id 4: ret=-1, errno=5
(Input/output error)
ERROR: scrubbing /mnt failed for device id 5: ret=-1, errno=5
(Input/output error)
scrub device /dev/sdb (id 4) canceled
scrub started at Mon Oct 2 21:51:46 2017 and was aborted after
00:09:02
total bytes scrubbed: 75.58GiB with 1 errors
error details: csum=1
corrected errors: 0, uncorrectable errors: 1, unverified errors: 0
scrub device /dev/sdc (id 5) canceled
scrub started at Mon Oct 2 21:51:46 2017 and was aborted after
00:11:11
total bytes scrubbed: 50.75GiB with 0 errors

The resulting dmesg is:
[ 699.534066] BTRFS error (device sdc): bdev /dev/sdb errs: wr 0,
rd 0, flush 0, corrupt 6, gen 0
[ 699.703045] BTRFS error (device sdc): unable to fixup (regular)
error at logical 1609808347136 on dev /dev/sdb
[ 783.306525] BTRFS critical (device sdc): corrupt leaf, bad key
order: block=2589782867968, root=1, slot=116
[ 789.776132] BTRFS critical (device sdc): corrupt leaf, bad key
order: block=2589782867968, root=1, slot=116
[ 911.529842] BTRFS critical (device sdc): corrupt leaf, bad key
order: block=2589782867968, root=1, slot=116
[ 918.365225] BTRFS critical (device sdc): corrupt leaf, bad key
order: block=2589782867968, root=1, slot=116

Running btrfs check /dev/sdc results in:
btrfs check /dev/sdc
Checking filesystem on /dev/sdc
UUID: 24b768c3-2141-44bf-ae93-1c3833c8c8e3
checking extents
bad key ordering 116 117
bad block 2589782867968
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
There is no free space entry for 1623012450304-1623012663296
There is no free space entry for 1623012450304-1623225008128
cache appears valid but isn't 1622151266304
found 288815742976 bytes used err is -22
total csum bytes: 0
total tree bytes: 350781440
total fs tree bytes: 0
total extent tree bytes: 350027776
btree space waste bytes: 115829777
file data blocks allocated: 156499968

uname -a:
Linux sysresccd 4.9.24-std500-amd64 #2 SMP Sat Apr 22 17:14:43 UTC
2017 x86_64 Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz GenuineIntel
GNU/Linux

btrfs --version: btrfs-progs v4.9.1

btrfs fi show:
Label: none uuid: 24b768c3-2141-44bf-ae93-1c3833c8c8e3
Total devices 2 FS bytes used 475.08GiB
devid 4 size 931.51GiB used 612.06GiB path /dev/sdb
devid 5 size 931.51GiB used 613.09GiB path /dev/sdc

btrfs fi df /mnt:
Data, RAID1: total=603.00GiB, used=468.03GiB
System, RAID1: total=64.00MiB, used=112.00KiB
System, single: total=32.00MiB, used=0.00B
Metadata, RAID1: total=9.00GiB, used=7.04GiB
Metadata, single: total=1.00GiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

What is the recommended procedure at this point? Run btrfs check
--repair? I have backups so losing a file or two isn't critical, but
I really don't want to go through the effort of a bare metal
reinstall.

In the process of researching this I did uncover a bad DIMM. Am I
correct that the problems I'm seeing are likely linked to the
resulting memory errors?

Thx in advance,

-steve