Re: Filesystem forced to readonly after use
On 2016-09-13 16:39, Cesar Strauss wrote:
> On 13-09-2016 16:49, Austin S. Hemmelgarn wrote:
>> I'd be kind of curious to see the results from btrfs check run without repair, but I doubt that will help narrow things down any further.
>
> Attached.
>
>> As of right now, the absolute first thing I'd do is check your logs to see if you can find any indication of errors from the disk itself. I don't think it's likely, but it's worth checking.
>
> Will do.
>
>> The couple of lines just before the crash in the attached kernel log would indicate to me that some of the metadata is corrupted. There are two likely possibilities for how that happened:
>>
>> 1. Running with no extra space for new chunks to be allocated is not a common use case, so it's not well tested, and it wouldn't surprise me if some accounting falls apart in that situation.
>
> Indeed. I periodically remove old snapshots and check for disk space, but I guess I ran a bit too near the limit this time.

In theory, BTRFS _should_ work in such a situation. In practice, you get all kinds of odd behaviors. In your case, you still have a reasonable amount of free space in both data and metadata chunks, so it isn't quite as bad as it could be (trying to get a FS working again when you have zero space in any chunks is a serious pain).

>> 2. You might have bad RAM or a bad PSU. This is the second thing you should check after checking to see if the disk is OK, as either will likely cause any repair attempts to make things worse. RAM is pretty easy to check, but for a PSU you need a proper testing device. You can get such a device on Amazon or similar sites for about 25 USD, and it's generally worth having around for troubleshooting.
>
> Understood. This notebook has occasional failures when resuming from hibernation. I suppose, from the point of view of the filesystem, this corresponds to an unclean reboot.
Yeah, although it's generally not quite as bad as an unclean reboot (default configurations on almost all Linux distros call sync just before the actual power off, so you don't have to worry about stuff in the write cache being lost). That said, it can also be worse than an unclean reboot depending on when the crash happens. This brings up a good point that I forgot, though: repeated unclean shutdowns (or failed resumes) can cause stuff like this to happen. I don't often think about it since I rarely have issues with power loss or hard crashes (and I don't use hibernation), so it's not something I often remember to mention when helping people with filesystem issues.

>> Assuming your disk and RAM are good, the next thing to do would be to try and get the filesystem into a more usable state. The best option for this is to expand the filesystem if possible. Given that you're running right near capacity, I'd suggest at least 16G of extra space if possible.
>>
>> If that isn't a viable solution for you, the other option is to delete some of the oldest snapshots (ideally enough that you have at least a few GB of extra space in the data chunks and a few hundred MB in the metadata chunks), then add a 4-8GB device to the FS temporarily (a ramdisk or flash drive works well for this), and run a full balance. If you're lucky, this will fix any metadata that's messed up, and the system should be usable. If not, it shouldn't make things any worse, and you probably want to look at btrfs restore to copy out the data to a new filesystem (ideally a bigger one).
>
> I will try this next.

Like Chris mentioned, you probably want to use a different version of btrfs-progs. I hadn't seen that that version was marked to not be used, otherwise I would have said something in my first reply.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Filesystem forced to readonly after use
From the fsck...

bad block 160420741120

I can't tell though if that's a bad Btrfs leaf/node where both dup copies are bad, or if it's a bad sector.

I'd mount it ro, and take a backup of anything you care about before proceeding further. smartctl -x might reveal if there are problems the drive itself is aware of.
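The read-only mount, backup and SMART check suggested above could be scripted roughly as follows. This is a sketch under assumptions: the function names, the use of `rsync` for the copy and all device and directory arguments are illustrative, not from the thread. With `DRY_RUN=1` it only prints what it would run.

```shell
#!/bin/sh
# Sketch only: mount the damaged filesystem read-only, copy off what matters,
# then dump the drive's SMART data. Function names and the rsync-based copy
# are illustrative assumptions. DRY_RUN=1 prints the commands instead.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: safe_backup PARTITION DISK MOUNTPOINT BACKUP_DIR
#   e.g. safe_backup /dev/sdb5 /dev/sdb /mnt /media/backup
safe_backup() {
    dev=$1    # the btrfs partition
    disk=$2   # the whole disk, which is what smartctl should look at
    mnt=$3
    dest=$4   # should live on a different physical disk
    # Read-only mount: no new transactions get written to the damaged FS.
    run mount -o ro "$dev" "$mnt" || return 1
    # -a archive mode, -H hard links, -A ACLs, -X extended attributes.
    run rsync -aHAX "$mnt/" "$dest/" || return 1
    # Extended SMART report, including the drive's own error log.
    run smartctl -x "$disk"
}
```

If snapshots are visible under the mount point, `rsync` will copy each one as a full tree, so in practice you would point it at the specific subvolumes you care about.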
Re: Filesystem forced to readonly after use
On 13-09-2016 16:39, Chris Murphy wrote:
> I just wouldn't use btrfs repair with this version of progs, go back to v4.6.1 or upgrade to 4.7.2.

Thanks for the tip. I upgraded to 4.7.2.

Cesar
Re: Filesystem forced to readonly after use
On 13-09-2016 16:49, Austin S. Hemmelgarn wrote:
> I'd be kind of curious to see the results from btrfs check run without repair, but I doubt that will help narrow things down any further.

Attached.

> As of right now, the absolute first thing I'd do is check your logs to see if you can find any indication of errors from the disk itself. I don't think it's likely, but it's worth checking.

Will do.

> The couple of lines just before the crash in the attached kernel log would indicate to me that some of the metadata is corrupted. There are two likely possibilities for how that happened:
>
> 1. Running with no extra space for new chunks to be allocated is not a common use case, so it's not well tested, and it wouldn't surprise me if some accounting falls apart in that situation.

Indeed. I periodically remove old snapshots and check for disk space, but I guess I ran a bit too near the limit this time.

> 2. You might have bad RAM or a bad PSU. This is the second thing you should check after checking to see if the disk is OK, as either will likely cause any repair attempts to make things worse. RAM is pretty easy to check, but for a PSU you need a proper testing device. You can get such a device on Amazon or similar sites for about 25 USD, and it's generally worth having around for troubleshooting.

Understood. This notebook has occasional failures when resuming from hibernation. I suppose, from the point of view of the filesystem, this corresponds to an unclean reboot.

> Assuming your disk and RAM are good, the next thing to do would be to try and get the filesystem into a more usable state. The best option for this is to expand the filesystem if possible. Given that you're running right near capacity, I'd suggest at least 16G of extra space if possible.
> If that isn't a viable solution for you, the other option is to delete some of the oldest snapshots (ideally enough that you have at least a few GB of extra space in the data chunks and a few hundred MB in the metadata chunks), then add a 4-8GB device to the FS temporarily (a ramdisk or flash drive works well for this), and run a full balance. If you're lucky, this will fix any metadata that's messed up, and the system should be usable. If not, it shouldn't make things any worse, and you probably want to look at btrfs restore to copy out the data to a new filesystem (ideally a bigger one).

I will try this next.

Thanks for the help!

Cesar

checking extents
parent transid verify failed on 160420773888 wanted 181826 found 181573
parent transid verify failed on 160420773888 wanted 181826 found 181573
parent transid verify failed on 160420773888 wanted 181826 found 181573
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160420741120 wanted 181826 found 181573
parent transid verify failed on 160420741120 wanted 181826 found 181573
parent transid verify failed on 160420741120 wanted 181826 found 181573
parent transid verify failed on 160420741120 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420741120
bad block 160420741120
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160420741120 wanted 181826 found 181573
Ignoring transid failure
Error: could not find btree root
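The attached check output repeats the same few failures many times. Each `parent transid verify failed` line means the tree block at that logical address carries an older generation (`found`) than the one its parent pointer expects (`wanted`). A small filter, sketched here with a hypothetical function name, collapses such output to one line per distinct block:

```shell
#!/bin/sh
# Sketch: collapse "parent transid verify failed" messages (from btrfs check
# or dmesg, read on stdin) into one line per distinct block/wanted/found
# triple, with a repeat count. The function name is hypothetical.
# Usage: btrfs check /dev/sdb5 2>&1 | summarize_transid
summarize_transid() {
    # grep -o prints each match on its own line, even if several share a line.
    # The first awk keeps only: block address, wanted gen, found gen.
    grep -o 'parent transid verify failed on [0-9]* wanted [0-9]* found [0-9]*' |
        awk '{ print $6, $8, $10 }' |
        sort | uniq -c |
        awk '{ printf "%d x block %s (wanted gen %s, found gen %s)\n", $1, $2, $3, $4 }'
}
```

Run over the output above, it reports three distinct blocks (160418889728, 160420741120 and 160420773888), all expected at generation 181826 but found at 181572 or 181573; the `bad block 160420741120` line is one of them.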
Re: Filesystem forced to readonly after use
On Tue, Sep 13, 2016 at 1:49 PM, Austin S. Hemmelgarn wrote:
> On 2016-09-13 15:20, Cesar Strauss wrote:
>>
>> btrfs-progs v4.7
>
> It's always good to see people who are staying up-to-date on the kernel and
> userspace :)

Yes, although it and 4.7.1 are marked as do not use.
https://btrfs.wiki.kernel.org/index.php/Changelog#btrfs-progs-4.7.2_.28Sep_2016.29

--
Chris Murphy
Re: Filesystem forced to readonly after use
On 2016-09-13 15:20, Cesar Strauss wrote:
> Hello,
>
> I have a BTRFS filesystem that is reverting to read-only after a few moments of use. There is a stack trace visible in the kernel log, which is attached.
>
> Here is my system information:
>
> # uname -a
> Linux rescue 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux
>
> # btrfs --version
> btrfs-progs v4.7

It's always good to see people who are staying up-to-date on the kernel and userspace :)

> # btrfs fi show
> Label: 'linux' uuid: 79862c20-d0b0-4ffa-a9af-e3a40868a243
>         Total devices 1 FS bytes used 284.60GiB
>         devid    1 size 300.03GiB used 300.03GiB path /dev/sdb5

Given this, you're running with the whole device fully allocated by the chunk allocator. This is not a good state to be in for any extended period of time on a filesystem which is being written to and modified.

> # btrfs fi df /mnt
> Data, single: total=278.00GiB, used=274.68GiB
> System, DUP: total=8.00MiB, used=64.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=11.00GiB, used=9.92GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B

But you appear to have a reasonable amount of slack space within the chunks themselves.

> As soon as the problem started, I saw that the Metadata, DUP was completely used. It became a little better (like above) after a scrub. I can easily recover disk space by removing old snapshots, if needed.
>
> The dmesg output is attached.
>
> Before making further recovery attempts, or even restoring from backup, I would like to ask for the best option to proceed.

I'd be kind of curious to see the results from btrfs check run without repair, but I doubt that will help narrow things down any further.

As of right now, the absolute first thing I'd do is check your logs to see if you can find any indication of errors from the disk itself. I don't think it's likely, but it's worth checking.
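The reading of the `btrfs fi show` and `btrfs fi df` output quoted above (no unallocated space left on the device, but some slack inside the existing data and metadata chunks) is simple arithmetic on the reported sizes. A rough sketch in shell and awk that does it from that output; the function name and the conversion to MiB are assumptions, and newer btrfs-progs can report similar figures directly with `btrfs filesystem usage`.

```shell
#!/bin/sh
# Sketch: read `btrfs fi show` and/or `btrfs fi df` output on stdin and print
#  - per device: size minus used (space with no chunk allocated yet)
#  - per chunk type: total minus used (slack inside allocated chunks)
# Sizes are converted to MiB. The function name is hypothetical.
# Usage: { btrfs fi show /mnt; btrfs fi df /mnt; } | btrfs_space_report
btrfs_space_report() {
    awk '
        # "278.00GiB" -> 284672 (MiB). Unknown units yield -1.
        function mib(s,    n, u) {
            n = s + 0                  # leading numeric part
            u = s
            sub(/^[0-9.]+/, "", u)     # unit suffix
            if (u == "B")   return n / 1048576
            if (u == "KiB") return n / 1024
            if (u == "MiB") return n
            if (u == "GiB") return n * 1024
            if (u == "TiB") return n * 1048576
            return -1
        }
        # fi show: "devid    1 size 300.03GiB used 300.03GiB path /dev/sdb5"
        $1 == "devid" {
            printf "%s: %.2f MiB unallocated\n", $8, mib($4) - mib($6)
        }
        # fi df: "Data, single: total=278.00GiB, used=274.68GiB".
        # GlobalReserve is reserved out of metadata chunks, so skip it.
        /total=/ && $1 != "GlobalReserve," {
            t = $3; sub(/^total=/, "", t); sub(/,$/, "", t)
            u = $4; sub(/^used=/, "", u)
            printf "%s %s %.2f MiB free inside allocated chunks\n", $1, $2, mib(t) - mib(u)
        }'
}
```

For the numbers in this thread it reports 0.00 MiB unallocated on /dev/sdb5, about 3.3 GiB of slack in the data chunks and about 1.1 GiB in the DUP metadata chunks.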
The couple of lines just before the crash in the attached kernel log would indicate to me that some of the metadata is corrupted. There are two likely possibilities for how that happened:

1. Running with no extra space for new chunks to be allocated is not a common use case, so it's not well tested, and it wouldn't surprise me if some accounting falls apart in that situation.

2. You might have bad RAM or a bad PSU. This is the second thing you should check after checking to see if the disk is OK, as either will likely cause any repair attempts to make things worse. RAM is pretty easy to check, but for a PSU you need a proper testing device. You can get such a device on Amazon or similar sites for about 25 USD, and it's generally worth having around for troubleshooting.

Assuming your disk and RAM are good, the next thing to do would be to try and get the filesystem into a more usable state. The best option for this is to expand the filesystem if possible. Given that you're running right near capacity, I'd suggest at least 16G of extra space if possible.

If that isn't a viable solution for you, the other option is to delete some of the oldest snapshots (ideally enough that you have at least a few GB of extra space in the data chunks and a few hundred MB in the metadata chunks), then add a 4-8GB device to the FS temporarily (a ramdisk or flash drive works well for this), and run a full balance. If you're lucky, this will fix any metadata that's messed up, and the system should be usable. If not, it shouldn't make things any worse, and you probably want to look at btrfs restore to copy out the data to a new filesystem (ideally a bigger one).
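The fallback plan above (delete some old snapshots, temporarily add a 4-8 GB device, run a full balance) could look roughly like this. The final `btrfs device remove` step is an assumption implied by "temporarily": it migrates chunks back off the extra device, which matters a great deal if that device is a ramdisk. Function names and arguments are hypothetical; with `DRY_RUN=1` the sketch only prints the commands.

```shell
#!/bin/sh
# Sketch of: delete old snapshots, add a temporary device, full balance,
# then remove the temporary device again. Names are hypothetical.
# DRY_RUN=1 prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: rebalance_with_temp_device MOUNTPOINT TEMP_DEVICE [SNAPSHOT...]
rebalance_with_temp_device() {
    if [ "$#" -lt 2 ]; then
        printf 'usage: rebalance_with_temp_device MNT TMPDEV [SNAPSHOT...]\n' >&2
        return 2
    fi
    mnt=$1
    tmpdev=$2    # 4-8 GB spare device; its current contents are destroyed
    shift 2
    if [ "$#" -gt 0 ]; then
        for snap in "$@"; do
            run btrfs subvolume delete "$snap" || return 1
        done
        # Snapshot deletion is finished in the background by the cleaner;
        # wait for it so the freed space exists before balancing.
        run btrfs subvolume sync "$mnt" || return 1
    fi
    # Gives the chunk allocator unallocated space to work with.
    # (Add -f if the device already carries a filesystem signature.)
    run btrfs device add "$tmpdev" "$mnt" || return 1
    # Full balance: rewrites every chunk, compacting partially used ones.
    run btrfs balance start "$mnt" || return 1
    # Migrates everything off the temporary device and drops it from the
    # filesystem. Do not skip this, especially with a ramdisk.
    run btrfs device remove "$tmpdev" "$mnt"
}
```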
Re: Filesystem forced to readonly after use
I just wouldn't use btrfs repair with this version of progs, go back to v4.6.1 or upgrade to 4.7.2.

You could do an offline check (no repair) and see if that reveals anything useful for developers. But I can't tell what's going on from the call trace.

--
Chris Murphy
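The advice to avoid repair with btrfs-progs 4.7 and 4.7.1 can be encoded as a guard to run before any repair attempt. A minimal sketch; the function name is hypothetical, and it only knows about the two releases mentioned in this thread, so it is not a general compatibility check.

```shell
#!/bin/sh
# Sketch: refuse "btrfs check --repair" on btrfs-progs 4.7 and 4.7.1, the two
# releases this thread says are marked "do not use". The function name is
# hypothetical. Argument: the output of `btrfs --version`,
# e.g. "btrfs-progs v4.7". Returns 0 if OK, 1 if blacklisted, 2 if unparsable.
progs_ok_for_repair() {
    ver=$(printf '%s\n' "$1" | sed -n 's/^btrfs-progs v\([0-9.]*\).*/\1/p')
    case "$ver" in
        "")
            printf 'cannot parse version from: %s\n' "$1" >&2
            return 2 ;;
        4.7 | 4.7.1)
            printf 'btrfs-progs %s is marked do-not-use; try 4.6.1 or 4.7.2\n' "$ver" >&2
            return 1 ;;
        *)
            return 0 ;;
    esac
}
```

For example, `progs_ok_for_repair "$(btrfs --version)" || exit 1` ahead of a `btrfs check --repair` call. A plain `btrfs check` without `--repair`, run on the unmounted device, does not modify it, which is the offline check suggested above.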
Filesystem forced to readonly after use
Hello,

I have a BTRFS filesystem that is reverting to read-only after a few moments of use. There is a stack trace visible in the kernel log, which is attached.

Here is my system information:

# uname -a
Linux rescue 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.7

# btrfs fi show
Label: 'linux' uuid: 79862c20-d0b0-4ffa-a9af-e3a40868a243
        Total devices 1 FS bytes used 284.60GiB
        devid    1 size 300.03GiB used 300.03GiB path /dev/sdb5

# btrfs fi df /mnt
Data, single: total=278.00GiB, used=274.68GiB
System, DUP: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=11.00GiB, used=9.92GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

As soon as the problem started, I saw that the Metadata, DUP was completely used. It became a little better (like above) after a scrub. I can easily recover disk space by removing old snapshots, if needed.

The dmesg output is attached.

Before making further recovery attempts, or even restoring from backup, I would like to ask for the best option to proceed.
Thanks,
Cesar

[20048.035688] BTRFS info (device sdb5): disk space caching is enabled
[20190.871802] BTRFS error (device sdb5): parent transid verify failed on 160420773888 wanted 181826 found 181573
[20190.882573] BTRFS error (device sdb5): parent transid verify failed on 160420773888 wanted 181826 found 181573
[20190.882607] [ cut here ]
[20190.882642] WARNING: CPU: 3 PID: 5026 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs]
[20190.882645] BTRFS: Transaction aborted (error -5)
[20190.882648] Modules linked in: hid_generic usbhid hid btrfs xor raid6_pq sr_mod cdrom intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass joydev mousedev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul dell_wmi dell_laptop amdkfd amd_iommu_v2 glue_helper sparse_keymap dell_smbios uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev radeon media ums_realtek ablk_helper ttm cryptd snd_hda_codec_hdmi arc4 snd_hda_codec_realtek snd_hda_codec_generic dcdbas dell_smm_hwmon iwldvm iTCO_wdt mac80211 iTCO_vendor_support xhci_pci xhci_hcd r8169 snd_hda_intel snd_hda_codec mii iwlwifi evdev i915 input_leds led_class btusb btrtl btbcm btintel bluetooth intel_cstate intel_rapl_perf cfg80211 psmouse pcspkr
[20190.882725]  snd_hda_core mac_hid rfkill snd_hwdep thermal wmi snd_pcm drm_kms_helper snd_timer drm snd soundcore intel_gtt shpchp syscopyarea ahci sysfillrect sysimgblt fb_sys_fops i2c_algo_bit libahci fjes libata button ac battery mei_me video mei i2c_i801 lpc_ich dell_smo8800 tpm_tis tpm sch_fq_codel ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod uas usb_storage scsi_mod serio_raw atkbd libps2 ehci_pci ehci_hcd usbcore usb_common i8042 serio
[20190.882782] CPU: 3 PID: 5026 Comm: kworker/u16:2 Tainted: G        W       4.7.2-1-ARCH #1
[20190.882785] Hardware name: Dell Inc. Dell System Vostro 3450/0GG0VM, BIOS A05 05/24/2011
[20190.882814] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[20190.882818] 0286 860a9f71 88015420fc90 812eb132
[20190.882824] 88015420fce0 88015420fcd0 8107a3ab
[20190.882828] 0b938efd7800 a0c0be15 880085917688 034c
[20190.882834] Call Trace:
[20190.882842] [] dump_stack+0x63/0x81
[20190.882847] [] __warn+0xcb/0xf0
[20190.882852] [] warn_slowpath_fmt+0x5f/0x80
[20190.882875] [] btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs]
[20190.882895] [] delayed_ref_async_start+0x94/0xb0 [btrfs]
[20190.882920] [] btrfs_scrubparity_helper+0x77/0x350 [btrfs]
[20190.882943] [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[20190.882948] [] process_one_work+0x1e5/0x480
[20190.882953] [] worker_thread+0x48/0x4e0
[20190.882958] [] ? process_one_work+0x480/0x480
[20190.882962] [] ? process_one_work+0x480/0x480
[20190.882968] [] kthread+0xd8/0xf0
[20190.882975] [] ret_from_fork+0x1f/0x40
[20190.882981] [] ? kthread_worker_fn+0x170/0x170
[20190.882985] ---[ end trace 99d6d7ec847d19d4 ]---
[20190.882990] BTRFS: error (device sdb5) in btrfs_run_delayed_refs:2963: errno=-5 IO failure
[20190.882994] BTRFS info (device sdb5): forced readonly
[20295.373706] BTRFS error (device sdb5): cleaner transaction attach returned -30
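The first diagnostic step suggested in this thread, looking through the kernel log for errors reported by the disk itself rather than by btrfs, can be approximated with a grep filter. The patterns are a heuristic assumption covering common libata, SCSI and block-layer messages; they are not exhaustive, and the function name is hypothetical. Run over the dmesg excerpt above it prints nothing: every error there comes from btrfs, and the `errno=-5 IO failure` line is btrfs reporting EIO, which here most likely follows from the transid failures logged just before it rather than from the drive.

```shell
#!/bin/sh
# Sketch: keep only kernel-log lines that look like errors from the drive or
# the block layer, not btrfs's own messages. The patterns are heuristic
# assumptions. The function name is hypothetical.
# Usage: dmesg | disk_errors
#        journalctl -k -b -1 | disk_errors    (kernel log of previous boot)
disk_errors() {
    grep -E 'I/O error|Medium Error|ata[0-9]+(\.[0-9]+)?: (exception|failed command|error:)'
}
```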