Re: SOLVED - 32-bit kernel 4.13 bug - Mount failing - unable to find logical

2017-10-18 Thread Cameron Kelley



On 10-17-2017 10:10 PM, Roman Mamedov wrote:
> On Wed, 18 Oct 2017 09:24:01 +0800
> Qu Wenruo  wrote:
>
>> On 2017-10-18 04:43, Cameron Kelley wrote:
>>> Hey btrfs gurus,
>>>
>>> I have a 4 disk btrfs filesystem that has suddenly stopped mounting
>>> after a recent reboot. The data is in an odd configuration due to
>>> originally being in a 3 disk RAID1 before adding a 4th disk and running
>>> a balance to convert to RAID10. There wasn't enough free space to
>>> completely convert, so about half the data is still in RAID1 while the
>>> other half is in RAID10. Both metadata and system are RAID10. It has
>>> been in this configuration for 6 months or so now since adding the 4th
>>> disk. It just holds archived media and hasn't had any data added or
>>> modified in quite some time. I feel pretty stupid now for not correcting
>>> that sooner though.
>>>
>>> I have tried mounting with different mount options for recovery, ro,
>>> degraded, etc. Log shows errors about "unable to find logical
>>> 3746892939264 length 4096"
>>>
>>> When I do a btrfs check, it doesn't find any issues. Running
>>> btrfs-find-root comes up with a message about a block that the
>>> generation doesn't match. If I specify that block on the btrfs check, I
>>> get transid verify failures.
>>>
>>> I ran a dry run of a recovery of the entire filesystem which runs
>>> through every file with no errors. I would just restore the data and
>>> start fresh, but unfortunately I don't have the free space at the moment
>>> for the ~4.5TB of data.
>>>
>>> I also ran full SMART self-tests on all 4 disks with no errors.
>>>
>>> root@nas2:~# uname -a
>>> Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06 UTC 2017 i686 i686 i686 GNU/Linux
>>
>> I don't think i686 kernel will cause any difference, but considering
>> most of us are using x86_64 to develop/test, maybe it will be a good
>> idea to upgrade to x86_64 kernel?
>
> Indeed a problem with mounting on 32-bit in 4.13 has been reported recently:
> https://www.spinics.net/lists/linux-btrfs/msg69734.html
> with the same error message.
>
> I believe it's this patchset that is supposed to fix that.
> https://www.spinics.net/lists/linux-btrfs/msg70001.html
>
> @Cameron maybe you didn't just reboot, but also upgraded your kernel at the
> same time? In any case, try a 4.9 series kernel, or a 64-bit machine if you
> want to stay with 4.13.



Just for reference to anyone else having this issue, it is indeed a bug in 
the 32-bit release of the 4.13 kernel. The x64 kernel had no issues 
mounting it.
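The address in the error message, 3746892939264, is larger than 2^32, so it cannot be held in a 32-bit `unsigned long`. The sketch below only illustrates that arithmetic for one generic class of 32-bit bug (a page index shifted back to a byte offset without first being widened to 64 bits). It is an assumption for illustration, not a description of what the patchset linked in this thread actually changes; the 4 KiB page size is also assumed.

```python
# Logical address taken from the "unable to find logical" error in this thread.
LOGICAL = 3746892939264
PAGE_SHIFT = 12            # assumption: 4 KiB pages, as on x86
U32_MAX = 0xFFFFFFFF       # largest value a 32-bit unsigned long can hold


def fits_in_u32(value):
    """True if `value` can be stored in a 32-bit unsigned integer."""
    return 0 <= value <= U32_MAX


page_index = LOGICAL >> PAGE_SHIFT
# Hypothetical bug pattern: shifting a 32-bit index back to a byte offset
# without widening it first keeps only the low 32 bits of the result.
truncated = (page_index << PAGE_SHIFT) & U32_MAX

print(fits_in_u32(LOGICAL))      # False: the byte address needs more than 32 bits
print(fits_in_u32(page_index))   # True: the page index alone still fits
print(page_index, truncated)     # 914768784 1681457152
```

On a 64-bit kernel the same shift has room for the full value, which is consistent with the address only being a problem on 32-bit builds.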


An interesting thing to note is that I still had all the exact same mount
issues and errors when I booted the latest PartedMagic live image with
kernel 4.12.9 in 32-bit mode. The same PartedMagic image in 64-bit mode had
no issues, which is how I confirmed your suspicions.


Now for the part where I feel more stupid than I have in a long time.

1. Apparently I had updated the kernel on this NAS without realizing it,
since I was doing updates on multiple appliances at once a little while
ago and just hadn't rebooted it since. When I ran into issues, I updated
the kernel to the latest version without checking which kernel I was on,
just to see if that solved it.


2. And here's the real kicker. The processor in this NAS (Pentium E5200) 
is actually x64 capable. I must have skimmed information too quickly when 
I first built this years ago and thought it wasn't x64 capable.
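A 64-bit-capable CPU running a 32-bit kernel, as in point 2, can be detected from the running system: on x86 Linux the `lm` (long mode) flag in `/proc/cpuinfo` indicates 64-bit support, while `uname -m` (Python's `platform.machine()`) reports the kernel's architecture, `i686` in this thread. A minimal sketch; the helper names are hypothetical and it only inspects the first `flags` line.

```python
import platform
from pathlib import Path

# Kernel architectures reported by `uname -m` for 32-bit x86 kernels.
X86_32_MACHINES = {"i386", "i486", "i586", "i686"}


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def running_32bit_on_64bit_cpu(machine: str, flags: set) -> bool:
    """True if a 32-bit x86 kernel is running on a CPU that supports long mode."""
    return machine in X86_32_MACHINES and "lm" in flags


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = cpu_flags(cpuinfo.read_text())
        machine = platform.machine()
        print(machine, "lm" in flags, running_32bit_on_64bit_cpu(machine, flags))
```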


I have rebuilt the NAS and I'm now running a scrub just to make sure the
steps I took while trying to recover didn't cause any issues.


Anything else you would recommend to make sure there aren't any other 
issues that could have been caused by my tinkering?


Thank you very much for your help as I was banging my head against a wall. 
This NAS does so little that I tend to get careless with it. Lesson 
learned and embarrassment felt. The only solace is that this might help 
someone else who runs into this with kernel 4.13 on a 32-bit system.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Mount failing - unable to find logical

2017-10-17 Thread Cameron Kelley



On 10-17-2017 6:24 PM, Qu Wenruo wrote:
>
>
> On 2017-10-18 04:43, Cameron Kelley wrote:
>> Hey btrfs gurus,
>>
>> I have a 4 disk btrfs filesystem that has suddenly stopped mounting
>> after a recent reboot. The data is in an odd configuration due to
>> originally being in a 3 disk RAID1 before adding a 4th disk and running
>> a balance to convert to RAID10. There wasn't enough free space to
>> completely convert, so about half the data is still in RAID1 while the
>> other half is in RAID10. Both metadata and system are RAID10. It has
>> been in this configuration for 6 months or so now since adding the 4th
>> disk. It just holds archived media and hasn't had any data added or
>> modified in quite some time. I feel pretty stupid now for not
>> correcting that sooner though.
>>
>> I have tried mounting with different mount options for recovery, ro,
>> degraded, etc. Log shows errors about "unable to find logical
>> 3746892939264 length 4096"
>>
>> When I do a btrfs check, it doesn't find any issues. Running
>> btrfs-find-root comes up with a message about a block that the
>> generation doesn't match. If I specify that block on the btrfs check, I
>> get transid verify failures.
>>
>> I ran a dry run of a recovery of the entire filesystem which runs
>> through every file with no errors. I would just restore the data and
>> start fresh, but unfortunately I don't have the free space at the
>> moment for the ~4.5TB of data.
>>
>> I also ran full SMART self-tests on all 4 disks with no errors.
>>
>> root@nas2:~# uname -a
>> Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06 UTC 2017 i686 i686 i686 GNU/Linux
>
> I don't think i686 kernel will cause any difference, but considering
> most of us are using x86_64 to develop/test, maybe it will be a good
> idea to upgrade to x86_64 kernel?
>

Thanks for the quick response.

This is an old x86 Pentium NAS I inherited, so unfortunately I'm stuck on 
a 32-bit kernel. If push comes to shove, I can disassemble another x64 
machine to test with.


>>
>> root@nas2:~# btrfs version
>> btrfs-progs v4.13.2
>>
>> root@nas2:~# btrfs fi show
>> Label: none  uuid: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>>     Total devices 4 FS bytes used 4.60TiB
>>     devid    1 size 2.73TiB used 2.33TiB path /dev/sdb1
>>     devid    2 size 2.73TiB used 2.33TiB path /dev/sdc
>>     devid    3 size 2.73TiB used 2.33TiB path /dev/sdd1
>>     devid    4 size 2.73TiB used 2.33TiB path /dev/sde1
>>
>> root@nas2:~# mount /dev/sdb1 /mnt/nas2/
>> mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
>> missing codepage or helper program, or other error
>>
>> In some cases useful info is found in syslog - try
>> dmesg | tail or so.
>>
>> root@nas2:~# dmesg | tail
>> [  801.332623] BTRFS info (device sdb1): disk space caching is enabled
>> [  801.332627] BTRFS info (device sdb1): has skinny extents
>> [  801.86] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333472] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333769] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333835] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333909] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.333968] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
>> [  801.334028] BTRFS error (device sdb1): failed to read chunk root
>> [  801.365452] BTRFS error (device sdb1): open_ctree failed
>
> Some of the chunk tree failed to be read out.
>
> Either chunk tree or system chunk array has some problem.
>
> Would you please dump the chunk tree and superblock by the following
> commands?
>
> # btrfs inspect-internal dump-tree -t chunk /dev/sdb1
> # btrfs inspect-internal dump-super -fa /dev/sdb1
>

# btrfs inspect-internal dump-tree -t chunk /dev/sdb1
http://pastebin.ubuntu.com/25763241/

# btrfs inspect-internal dump-super -fa /dev/sdb1
http://pastebin.ubuntu.com/25763246/
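btrfs tree blocks are addressed by logical addresses that are translated to device offsets through chunks, and the superblock carries both the chunk root's logical address and a system chunk array describing the chunks needed to read it. In the dmesg output, the lookup for logical 3746892939264 fails immediately before "failed to read chunk root", which matches Qu's reading that either the chunk tree or the system chunk array is involved. The sketch below is a simplified range-lookup model with hypothetical names (`Chunk`, `ChunkMap`) and a hypothetical chunk layout; it is not the kernel's data structure, only the address and length come from the log.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One logical address range (hypothetical model, not the on-disk format)."""
    logical_start: int
    length: int
    profile: str


class ChunkMap:
    def __init__(self, chunks):
        # Keep chunks sorted by start address so lookups can use binary search.
        self._chunks = sorted(chunks, key=lambda c: c.logical_start)
        self._starts = [c.logical_start for c in self._chunks]

    def lookup(self, logical, length):
        """Return the chunk that fully covers [logical, logical + length)."""
        # Index of the last chunk starting at or before `logical`.
        i = bisect.bisect_right(self._starts, logical) - 1
        if i >= 0:
            chunk = self._chunks[i]
            if logical + length <= chunk.logical_start + chunk.length:
                return chunk
        # Message text modeled on the dmesg line in this thread.
        raise LookupError(f"unable to find logical {logical} length {length}")


if __name__ == "__main__":
    # Hypothetical layout: one 32 MiB system chunk that does NOT cover the
    # address from the log, so the lookup fails the way the mount did.
    sys_chunks = ChunkMap([Chunk(22020096, 32 * 2**20, "RAID10")])
    try:
        sys_chunks.lookup(3746892939264, 4096)
    except LookupError as err:
        print(err)  # unable to find logical 3746892939264 length 4096
```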

>>
>> root@nas2:~# btrfs check /dev/sdb1
>> Checking filesystem on /dev/sdb1
>> UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
>> checking extents
>> checking free space cache
>> cache and super generation don't match, space cache will be invalidated
>> checking fs roots

Mount failing - unable to find logical

2017-10-17 Thread Cameron Kelley

Hey btrfs gurus,

I have a 4 disk btrfs filesystem that has suddenly stopped mounting after a 
recent reboot. The data is in an odd configuration due to originally being in a 
3 disk RAID1 before adding a 4th disk and running a balance to convert to 
RAID10. There wasn't enough free space to completely convert, so about half the 
data is still in RAID1 while the other half is in RAID10. Both metadata and 
system are RAID10. It has been in this configuration for 6 months or so now 
since adding the 4th disk. It just holds archived media and hasn't had any data 
added or modified in quite some time. I feel pretty stupid now for not 
correcting that sooner though.

I have tried mounting with different mount options for recovery, ro, degraded, etc. Log 
shows errors about "unable to find logical 3746892939264 length 4096"

When I do a btrfs check, it doesn't find any issues. Running btrfs-find-root 
comes up with a message about a block that the generation doesn't match. If I 
specify that block on the btrfs check, I get transid verify failures.

I ran a dry run of a recovery of the entire filesystem which runs through every 
file with no errors. I would just restore the data and start fresh, but 
unfortunately I don't have the free space at the moment for the ~4.5TB of data.

I also ran full SMART self-tests on all 4 disks with no errors.

root@nas2:~# uname -a
Linux nas2 4.13.7-041307-generic #201710141430 SMP Sat Oct 14 14:39:06 UTC 2017 i686 i686 i686 GNU/Linux

root@nas2:~# btrfs version
btrfs-progs v4.13.2

root@nas2:~# btrfs fi show
Label: none  uuid: 827029a4-8625-4a50-a22d-0fd28dbe2d36
    Total devices 4 FS bytes used 4.60TiB
    devid    1 size 2.73TiB used 2.33TiB path /dev/sdb1
    devid    2 size 2.73TiB used 2.33TiB path /dev/sdc
    devid    3 size 2.73TiB used 2.33TiB path /dev/sdd1
    devid    4 size 2.73TiB used 2.33TiB path /dev/sde1

root@nas2:~# mount /dev/sdb1 /mnt/nas2/
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
   missing codepage or helper program, or other error

   In some cases useful info is found in syslog - try
   dmesg | tail or so.

root@nas2:~# dmesg | tail
[  801.332623] BTRFS info (device sdb1): disk space caching is enabled
[  801.332627] BTRFS info (device sdb1): has skinny extents
[  801.86] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333472] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333769] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333835] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333909] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.333968] BTRFS critical (device sdb1): unable to find logical 3746892939264 length 4096
[  801.334028] BTRFS error (device sdb1): failed to read chunk root
[  801.365452] BTRFS error (device sdb1): open_ctree failed

root@nas2:~# btrfs check /dev/sdb1
Checking filesystem on /dev/sdb1
UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 5054297628672 bytes used, no error found
total csum bytes: 4929567064
total tree bytes: 5197856768
total fs tree bytes: 15237120
total extent tree bytes: 43433984
btree space waste bytes: 161510789
file data blocks allocated: 5050024812544
 referenced 5049610178560
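As a sanity check on the figures in this post, the byte count reported by `btrfs check` matches the "FS bytes used 4.60TiB" from `btrfs fi show` once binary units are used, and the per-device allocation is consistent with two copies of everything. The 2.33TiB per-device values are rounded display figures, so the last result is approximate.

```python
TIB = 2**40   # binary tebibyte, the unit btrfs-progs labels "TiB"
TB = 10**12   # decimal terabyte

used_bytes = 5054297628672   # "found 5054297628672 bytes used"
print(round(used_bytes / TIB, 2))   # 4.6  (fi show: "FS bytes used 4.60TiB")
print(round(used_bytes / TB, 2))    # 5.05

# fi show: four devices, each with 2.33TiB allocated to chunks (rounded).
raw_allocated_tib = 4 * 2.33
copies = 2   # btrfs RAID1 and RAID10 both store two copies of each block
print(round(raw_allocated_tib / copies, 2))   # 4.66
```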

root@nas2:~# btrfs-find-root /dev/sdb1
Superblock thinks the generation is 147970
Superblock thinks the level is 1
Found tree root at 21335861559296 gen 147970 level 1
Well block 21335857758208(gen: 147969 level: 1) seems good, but 
generation/level doesn't match, want gen: 147970 level: 1

root@nas2:~# btrfs check -r 21335857758208 /dev/sdb1
parent transid verify failed on 21335857758208 wanted 147970 found 147969
parent transid verify failed on 21335857758208 wanted 147970 found 147969
parent transid verify failed on 21335857758208 wanted 147970 found 147969
parent transid verify failed on 21335857758208 wanted 147970 found 147969
Ignoring transid failure
Checking filesystem on /dev/sdb1
UUID: 827029a4-8625-4a50-a22d-0fd28dbe2d36
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
ERROR: transid errors in file system
found 5054297628672 bytes used, error(s) found
total csum bytes: 4929567064
total tree bytes: 5197856768
total fs tree bytes: 15237120
total extent tree bytes: 43433984
btree space waste bytes: 161510789
file data blocks allocated: 5050024812544
 referenced 5049610178560
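The find-root and `check -r` output above fit a simple model: each tree block header records the generation (transaction ID) that wrote it, and a reader compares it with the generation it expects. Block 21335857758208 is one transaction older (147969) than the superblock generation (147970), and the log shows `btrfs check -r` still expecting 147970 for it, so the transid failures appear to follow from pointing the checker at an older root rather than from damage to the current tree, which plain `btrfs check` reported as clean. The sketch below is a simplified model with hypothetical names (`TreeBlock`, `matches_superblock`, `verify_parent_transid`), not btrfs-progs code; only the numbers come from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeBlock:
    """Header fields of a candidate tree-root block (simplified model)."""
    bytenr: int
    generation: int
    level: int


def matches_superblock(block, super_gen, super_level):
    # Modeled on the find-root output: a candidate is accepted only when
    # both its generation and its level match what the superblock records.
    return block.generation == super_gen and block.level == super_level


def verify_parent_transid(block, wanted):
    """Return an error string if the block's generation differs from `wanted`."""
    if block.generation == wanted:
        return None
    return (f"parent transid verify failed on {block.bytenr} "
            f"wanted {wanted} found {block.generation}")


if __name__ == "__main__":
    SUPER_GEN, SUPER_LEVEL = 147970, 1   # "Superblock thinks ..." lines
    current = TreeBlock(21335861559296, 147970, 1)
    older = TreeBlock(21335857758208, 147969, 1)
    for blk in (current, older):
        print(blk.bytenr, matches_superblock(blk, SUPER_GEN, SUPER_LEVEL))
    # The log shows the older root being compared with the superblock
    # generation; this reproduces that comparison.
    print(verify_parent_transid(older, SUPER_GEN))
```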