Re: Bad hard drive - checksum verify failure forces readonly mount
Bug reported:
https://bugzilla.kernel.org/show_bug.cgi?id=121491

Thank you for helping.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bad hard drive - checksum verify failure forces readonly mount
On Sun, 26-06-2016 at 13:54 -0600, Chris Murphy wrote:
> On Sun, Jun 26, 2016 at 7:05 AM, Vasco Almeida <vascomalme...@sapo.pt> wrote:
> > I have tried "btrfs check --repair /device" but that does not seem to
> > do any good.
> > http://paste.fedoraproject.org/384960/66945936/
>
> It did fix things, in particular with the snapshot that was having
> problems being dropped. But it seems it's not enough to prevent it
> from going read only.
>
> There's more than one bug here, you might see if the repair was good
> enough that it's possible to use btrfs-image now.

File system image available at (choose one link)
https://mega.nz/#!AkAEgKyB!RUa7G5xHIygWm0ALx5ZxQjjXNdFYa7lDRHJ_sW0bWLs
https://www.sendspace.com/file/i70cft

> If not, use
> btrfs-debug-tree > file.txt and post that file somewhere. This
> does expose file names. Maybe that'll shed some light on the problem.
> But it's also worth filing a bug at bugzilla.kernel.org with this debug
> tree referenced (probably too big to attach), maybe a dev will be able
> to look at it and improve things so they don't fail.

Should I file a bug report with the image dump linked above, the
btrfs-debug-tree output, or both?
I think I will use the subject of this thread as the summary when filing
the bug. Can you think of something more suitable or is that fine?

> > What else can I do, or must I rebuild the file system?
>
> Well, it's a long shot but you could try using --repair --init-csum-tree
> which will create a new csum tree. But that applies to data, if the
> problem with it going read only is due to metadata corruption this
> won't help. And then last you could try --init-extent-tree. The thing I
> can't answer is which order to do it in.
>
> In any case there will be files that you shouldn't trust after the csum
> tree has been recreated, anything corrupt will now have a new csum, so you
> can get silent data corruption. It's better to just blow away this
> file system and make a new one and reinstall the OS. But if you're
> feeling brave, you can try one or both of those additional options and
> see if they can help.

I think I will reinstall the OS since, even if I manage to recover the
file system from this issue, that OS will be something I cannot fully
trust.
Re: Bad hard drive - checksum verify failure forces readonly mount
On Sat, 25-06-2016 at 14:54 -0600, Chris Murphy wrote:
> On Sat, Jun 25, 2016 at 2:10 PM, Vasco Almeida <vascomalme...@sapo.pt> wrote:
> > Quoting Chris Murphy <li...@colorremedies.com>:
> > > 3. btrfs-image so that devs can see what's causing the problem that
> > > the current code isn't handling well enough.
> >
> > btrfs-image does not create a dump image:
> >
> > # btrfs-image /dev/mapper/vg_pupu-lv_opensuse_root btrfs-lv_opensuse_root.image
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > Csum didn't match
> > Error reading metadata block
> > Error adding block -5
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
> > Csum didn't match
> > Error reading metadata block
> > Error flushing pending -5
> > create failed (Success)
> > # echo $?
> > 1
>
> Well it's pretty strange to have DUP metadata and for the checksum
> verify to fail on both copies. I don't have much optimism that btrfsck
> repair can fix it either. But still it's worth a shot since there's
> not much else to go on.

I have tried "btrfs check --repair /device" but that does not seem to do
any good.
http://paste.fedoraproject.org/384960/66945936/

I then issued "mount /device /mnt" and, like before, it was mounted
readwrite and then forced readonly. I got some kernel oopses and traces.

I noticed that btrfs-balance was using ~100% CPU while the btrfs device
was mounted readonly. I let it run for about 20 minutes. Then I had to
reboot because the system was not responding well: I was unable to open or
close applications or use the internet. I did SysRq+reisu (operations were
enabled) and then pressed the reset button on the computer.
Unfortunately the dmesg dumps were lost after resetting the computer.

What else can I do, or must I rebuild the file system?
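
The btrfs-image output quoted above repeats the same failure many times. A minimal Python sketch, assuming the btrfs-progs line format shown in this thread ("checksum verify failed on <bytenr> found <hex> wanted <hex>"), can collapse such a log into one entry per distinct block and checksum pair. The names SAMPLE, PATTERN and summarize_csum_failures are made up for this illustration and are not part of any btrfs tool; kernel dmesg lines, which print "wanted" before "found", are not handled here.

```python
import re
from collections import Counter

# Two of the failure lines quoted in this thread, plus surrounding noise.
SAMPLE = """\
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
Csum didn't match
Error reading metadata block
"""

# Assumed line format, taken from the btrfs-image output above.
PATTERN = re.compile(
    r"checksum verify failed on (?P<bytenr>\d+) "
    r"found (?P<found>[0-9A-Fa-f]+) wanted (?P<wanted>[0-9A-Fa-f]+)"
)


def summarize_csum_failures(text):
    """Count failures per (block address, found csum, wanted csum)."""
    counts = Counter()
    for match in PATTERN.finditer(text):
        key = (
            int(match["bytenr"]),
            match["found"].upper(),
            match["wanted"].upper(),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for (bytenr, found, wanted), seen in summarize_csum_failures(SAMPLE).items():
        print(f"block {bytenr}: found {found}, wanted {wanted}, seen {seen}x")
```

If every entry points at the same block address, as it does in the output above, that suggests a single damaged metadata block rather than widespread corruption, although the log alone cannot confirm that.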
Re: Bad hard drive - checksum verify failure forces readonly mount
Quoting Chris Murphy <li...@colorremedies.com>:

> On Fri, Jun 24, 2016 at 6:06 PM, Vasco Almeida <vascomalme...@sapo.pt> wrote:
> > Quoting Chris Murphy <li...@colorremedies.com>:
> >
> > dmesg http://paste.fedoraproject.org/384352/80842814/
>
> [ 1837.386732] BTRFS info (device dm-9): continuing balance
> [ 1838.006038] BTRFS info (device dm-9): relocating block group 15799943168 flags 34
> [ 1838.684892] BTRFS info (device dm-9): relocating block group 10934550528 flags 36
> [ 1839.301453] ------------[ cut here ]------------
> [ 1839.301495] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:1625 lookup_inline_extent_backref+0x45c/0x5a0 [btrfs]()
>
> followed by
>
> [ 1839.301797] WARNING: CPU: 3 PID: 76 at fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x29d/0x2d0 [btrfs]()
> [ 1839.301798] BTRFS: Transaction aborted (error -5)
> [...]
> [ 1839.301972] BTRFS: error (device dm-9) in btrfs_run_delayed_refs:2946: errno=-5 IO failure
> [ 1839.301975] BTRFS info (device dm-9): forced readonly
>
> So it looks like it was resuming a balance automatically, and while
> processing delayed references it's running into something it doesn't
> expect and doesn't have a way to fix, so it goes read only to avoid
> causing more problems.
>
> I would do a couple things in order:
> 1. Mount ro and copy off what you want in case the whole thing gets
> worse and can't ever be mounted again.
> 2. Mount with only these options:
> -o skip_balance,subvolid=5,nospace_cache

I have mounted with those options; it was readwrite at first and was
then forced readonly. You can see a delay between the first BTRFS
messages and the "BTRFS info: forced readonly" message in dmesg.

/dev/mapper/vg_pupu-lv_opensuse_root on /mnt type btrfs (ro,relatime,seclabel,nospace_cache,skip_balance,subvolid=5,subvol=/)

> If it mounts rw, don't do anything with it, just see if it cleans up
> after itself. It also looks from the previous trace it was trying to
> remove a snapshot and there are complaints of problems in that
> snapshot. So hopefully just waiting 5 minutes doing nothing and it'll
> clean up after itself (you can check with top to see if there are any
> btrfs related transactions that run, including the btrfs-cleaner
> process), wait until they're done.

I can see btrfs processes, including btrfs-cleaner, but they may not be
doing much since the device was forced readonly after mounting it.

> Then umount. If you want you could
> have two other consoles ready first, one for 'journalctl -f' and
> another for sysrq+t to issue in case you get a hang. This doesn't fix
> anything but it collects more information for a bug report for the
> devs. Once you get it umounted normally or by force, the next thing to
> do is

I unmounted it normally (umount /mnt) more than 20 minutes after
mounting it.

> 3. btrfs-image so that devs can see what's causing the problem that
> the current code isn't handling well enough.

btrfs-image does not create a dump image:

# btrfs-image /dev/mapper/vg_pupu-lv_opensuse_root btrfs-lv_opensuse_root.image
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
Csum didn't match
Error reading metadata block
Error adding block -5
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
checksum verify failed on 437944320 found 8BF8C752 wanted 39F456C8
Csum didn't match
Error reading metadata block
Error flushing pending -5
create failed (Success)
# echo $?
1

> 4. btrfs check --repair

I have not issued this command yet.

dmesg http://paste.fedoraproject.org/384799/14668851/

Thank you for helping.
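
The dmesg excerpt in this message prints block group types as raw numbers ("flags 34", "flags 36") and the failure as "error -5". The Python sketch below decodes both. The bit values follow the BTRFS_BLOCK_GROUP_* definitions in the kernel headers as far as I know them; the helper names decode_block_group_flags and describe_errno are invented for this example.

```python
import errno
import os

# Block group type/profile bits (BTRFS_BLOCK_GROUP_* in the kernel headers).
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA",
    1 << 1: "SYSTEM",
    1 << 2: "METADATA",
    1 << 3: "RAID0",
    1 << 4: "RAID1",
    1 << 5: "DUP",
    1 << 6: "RAID10",
    1 << 7: "RAID5",
    1 << 8: "RAID6",
}


def decode_block_group_flags(flags):
    """Return the names of all known bits set in a 'flags N' value."""
    names = [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]
    return "|".join(names)


def describe_errno(code):
    """Map a kernel-style negative error code to its symbolic name and text."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    for flags in (34, 36):
        print(flags, "->", decode_block_group_flags(flags))
    print(-5, "->", describe_errno(-5))
```

Under these definitions, 34 decodes to SYSTEM|DUP and 36 to METADATA|DUP, which matches the "System, DUP" and "Metadata, DUP" rows that "btrfs fi df" reports elsewhere in this thread, and -5 is EIO, the "IO failure" in the kernel message.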
Re: Bad hard drive - checksum verify failure forces readonly mount
Quoting Chris Murphy <li...@colorremedies.com>:

> On Fri, Jun 24, 2016 at 9:52 AM, Vasco Almeida <vascomalme...@sapo.pt> wrote:
> > > From the pasted kernel messages:
> > > > Linux version 3.18.34-std473-amd64 (root@rl-sysrcd-p11) (gcc version 4.8.5
> > > > (Gentoo 4.8.5 p1.3, pie-0.6.2) ) #2 SMP Tue May 24 20:34:19 UTC 2016
> > >
> > > 3.18.34 is ancient. Find something newer and try to remount normally.
> >
> > The present information concerns the openSUSE Leap 42.1 (x86_64) mount
> > of the root file system at boot time. That should mount it normally.
> > Hope that fits what you mean.
>
> OK but it's not mounting it normally, it's still being forced readonly
> at btrfs_drop_snapshot and the only thing I'm coming up with search
> wise is that it's related to qgroups. Have you enabled quotas on this
> file system ever?

Unless openSUSE does that by default, I did not enable quotas. It is
not something I am aware of doing.

> > btrfs-progs v4.1.2+20151002
>
> A lot of changes have happened since 4.1.2. I would still use something
> newer and try to repair it.

By repair do you mean issuing "btrfs check --repair /device"?

> > $ /usr/sbin/btrfs fi df /
> > Data, single: total=10.01GiB, used=9.06GiB
> > System, DUP: total=64.00MiB, used=16.00KiB
> > Metadata, DUP: total=1.12GiB, used=596.69MiB
> > GlobalReserve, single: total=208.00MiB, used=0.00B
> >
> > I forgot to mention in my last e-mail that I ran Marc MERLIN's scrubbing
> > script [1] after mounting the device with "-o ro,recovery" on System
> > Rescue CD. Even after that the device is forced readonly.
>
> OK but System Rescue CD uses an old kernel by btrfs standards, even
> accounting for all the backports in that particular version:
>
> 4.7.3) 2016-06-04:
> Standard kernels: Long-Term-Supported linux-3.18.34 (rescue32 + rescue64)
>
> So that's why I'm suggesting you use something newer, like 4.5.x, same
> for btrfs-progs. The old versions aren't working. There's no assurance
> it'll work with new versions, but that it doesn't get fixed up with
> old versions means you either try new versions or you rebuild the file
> system. *shrug*

I am using Fedora 24 and have issued
"mount /dev/mapper/vg_pupu-lv_opensuse_root /mnt".
I got some call traces and scary output that I did not get before on
other systems. Please check the dmesg output linked below.

Linux catarina 4.5.7-300.fc24.x86_64 #1 SMP Wed Jun 8 18:12:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
btrfs-progs v4.5.2

# btrfs fi show
Label: none  uuid: ad167e92-fbb1-4148-b54d-6345b6fb26da
        Total devices 1 FS bytes used 9.63GiB
        devid 1 size 50.00GiB used 12.32GiB path /dev/mapper/vg_pupu-lv_opensuse_root

# btrfs fi df /mnt/
Data, single: total=10.01GiB, used=9.05GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.12GiB, used=597.62MiB
GlobalReserve, single: total=208.00MiB, used=224.00KiB

dmesg
http://paste.fedoraproject.org/384352/80842814/

dmesg after umount
http://paste.fedoraproject.org/384359/14668108/

diff between the two
http://paste.fedoraproject.org/384364/11704146/

btrfs check --readonly /dev/mapper/vg_pupu-lv_opensuse_root
http://paste.fedoraproject.org/384361/68112421/

After unmounting and mounting again, the device was mounted readwrite
normally again:

/dev/mapper/vg_pupu-lv_opensuse_root on /mnt type btrfs (rw,relatime,seclabel,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot)

But trying to unmount it afterwards makes the umount command hang. The
device no longer shows in the mount output, though. CTRL-C or SIGTERM
cannot kill umount.

dmesg
http://paste.fedoraproject.org/384371/14668130/

> > I would like to find a solution to be able to mount normally readwrite
> > again and hopefully understand what caused the issue.
>
> My best guess is qgroup related, there were a lot of problems with
> multiple quota implementations and snapshots, and openSUSE does take
> many many snapshots. So that could be it. But without a reproducer
> it's hard to say what caused it.

Thank you again for your time and reply.
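
To compare the two "btrfs fi df" listings in this message without doing the unit arithmetic by hand, a small Python sketch like the one below may help. It assumes the "Kind, profile: total=..., used=..." line format exactly as pasted here; parse_fi_df, LINE, UNITS and SAMPLE are illustrative names, not part of btrfs-progs.

```python
import re

# Binary unit multipliers as printed by "btrfs fi df".
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Output copied from the Fedora 24 session in this thread.
SAMPLE = """\
Data, single: total=10.01GiB, used=9.05GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.12GiB, used=597.62MiB
GlobalReserve, single: total=208.00MiB, used=224.00KiB
"""

LINE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB|B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB|B)$"
)


def parse_fi_df(text):
    """Parse 'btrfs fi df' lines into bytes and a used percentage per kind."""
    rows = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a usage line
        total = float(match["total"]) * UNITS[match["tunit"]]
        used = float(match["used"]) * UNITS[match["uunit"]]
        rows[match["kind"]] = {
            "profile": match["profile"],
            "total": total,
            "used": used,
            "used_pct": 100.0 * used / total if total else 0.0,
        }
    return rows


if __name__ == "__main__":
    for kind, row in parse_fi_df(SAMPLE).items():
        print(f"{kind:14s} {row['profile']:7s} {row['used_pct']:5.1f}% used")
```

The percentages are relative to the space already allocated to each chunk type, not to the 50.00GiB device size shown by "btrfs fi show".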
Bad hard drive - checksum verify failure forces readonly mount
I was running openSUSE Leap 42.1 with btrfs and LVM (Logical Volume
Management).

The last time I checked the smartd log, I noticed there were 30 sectors
pending reallocation and 1 unrecoverable bad sector on the hard drive.
I think my hard drive got some corrupted sectors and now btrfs fails
some checksums and forces the mount readonly.
The device is successfully mounted readonly.

The openSUSE dmesg reported:

BTRFS: dm-1 checksum verify failed on 437944320 wanted 39F45669 found 8BF8C752 level 0
(2 more times)
BTRFS: error (device dm-1) in btrfs_drop_snapshot:???: error=-5 IO failure
BTRFS: info (device dm-1): forced readonly

Now I'm on System Rescue CD and that is not reported. I wrote those log
lines down on paper, so there may be some typos. Seemingly there is no
journalctl installed on this system to check the openSUSE logs again.

All the following logs are from System Rescue CD.

mount -o ro,recovery /dev/mapper/vg_pupu-lv_opensuse_root /mnt/opensuse
https://bpaste.net/show/263e5f7ae9d4

After mounting and unmounting several times with and without "-o ro,recovery"
https://bpaste.net/show/43eb64decb63

btrfs check --readonly /dev/mapper/vg_pupu-lv_opensuse_root
https://bpaste.net/show/7ecf422c73a2

Would it be appropriate to run either "btrfs check --repair /device" or
"btrfs check --init-csum-tree /device" to be able to mount readwrite
again?

smartctl --all /dev/disk/by-id/ata-SAMSUNG_HD154UI_S1Y6JDWSC01351
https://bpaste.net/show/a6c132618974

btrfs check manpage:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
btrfsck page:
https://btrfs.wiki.kernel.org/index.php/Btrfsck