
Re: Filesystem forced to readonly after use

2016-09-14 Thread Austin S. Hemmelgarn


On 2016-09-13 16:39, Cesar Strauss wrote:
> On 13-09-2016 16:49, Austin S. Hemmelgarn wrote:
>> I'd be kind of curious to see the results from btrfs check run without
>> repair, but I doubt that will help narrow things down any further.
>
> Attached.
>
>> As of right now, the absolute first thing I'd do is check your logs to
>> see if you can find any indication of errors from the disk itself.  I
>> don't think it's likely, but it's worth checking.
>
> Will do.
>
>> The couple of lines just before the crash in the attached kernel log
>> would indicate to me that some of the metadata is corrupted.  There are
>> two likely possibilities for how that happened:
>> 1. Running with no extra space for new chunks to be allocated is not a
>> common use case, so it's not well tested, and it wouldn't surprise me if
>> some accounting falls apart in that situation.
>
> Indeed. I periodically remove old snapshots and check for disk space,
> but I guess I ran a bit too near the limit this time.

In theory, BTRFS _should_ work in such a situation.  In practice, you 
get all kinds of odd behaviors.  In your case, you still have a 
reasonable amount of free space in both data and metadata chunks, so it 
isn't quite as bad as it could be (trying to get a FS working again when 
you have zero space in any chunks is a serious pain).



>> 2. You might have bad RAM or a bad PSU.  This is the second thing you
>> should check after checking to see if the disk is OK, as either will
>> likely cause any repair attempts to make things worse.  RAM is pretty
>> easy to check, but for a PSU you need a proper testing device.  You can
>> get such a device on Amazon or similar sites for about 25 USD, and it's
>> generally worth having around for troubleshooting.
>
> Understood.
>
> This notebook has occasional failures when resuming from hibernation. I
> suppose, from the point of view of the filesystem, this corresponds to
> an unclean reboot.

Yeah, although it's generally not quite as bad as an unclean reboot 
(default configurations on almost all Linux distros call sync just 
before the actual power off, so you don't have to worry about stuff in 
the write cache being lost).  That said, it can also be worse than an 
unclean reboot depending on when the crash happens.


This brings up a good point that I forgot, though: repeated unclean 
shutdowns (or failed resumes) can cause stuff like this to happen.  I 
don't often think about it since I rarely have issues with power loss or 
hard crashes (and I don't use hibernation), so it's not something I 
often remember to mention when helping people with filesystem issues.




>> Assuming your disk and RAM are good, the next thing to do would be to
>> try to get the filesystem into a more usable state.  The best option for
>> this is to expand the filesystem if possible.  Given that you're running
>> right near capacity, I'd suggest at least 16G of extra space if
>> possible.  If that isn't a viable solution for you, the other option is
>> to delete some of the oldest snapshots (ideally enough that you have at
>> least a few GB of extra space in the data chunks and a few hundred MB in
>> the metadata chunks), then add a 4-8GB device to the FS temporarily (a
>> ramdisk or flash drive works well for this), and run a full balance.  If
>> you're lucky, this will fix any metadata that's messed up, and the
>> system should be usable.  If not, it shouldn't make things any worse,
>> and you probably want to look at btrfs restore to copy out the data to a
>> new filesystem (ideally a bigger one).
>
> I will try this next.

Like Chris mentioned, you probably want to use a different version of 
btrfs-progs.  I hadn't noticed that your version was marked "do not use"; 
otherwise I would have said something in my first reply.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Filesystem forced to readonly after use

2016-09-13 Thread Chris Murphy

From the fsck...

bad block 160420741120

I can't tell, though, whether that's a bad Btrfs leaf/node where both DUP
copies are bad, or a bad sector.

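The check output repeats the same few complaints many times, so collapsing it shows how many distinct blocks are involved. A minimal sketch, assuming bash with grep, awk and sort; `summarize_transid_failures` is a hypothetical helper name. In each "parent transid verify failed on B wanted W found F" line, the parent node expects block B to carry transaction generation W, while the copy read from disk carries the older generation F, so `behind` shows how stale the on-disk block is.

```shell
# Collapse "parent transid verify failed" lines (from btrfs check or dmesg)
# into one line per distinct (block, wanted, found) triple.
summarize_transid_failures() {
  grep -o 'parent transid verify failed on [0-9]* wanted [0-9]* found [0-9]*' |
    awk '{
           # $6 = block bytenr, $8 = generation the parent expects,
           # $10 = generation actually found in the block on disk
           key = $6 " " $8 " " $10
           count[key]++
         }
         END {
           for (k in count) {
             split(k, f, " ")
             printf "block %s wanted=%s found=%s behind=%d count=%d\n",
                    f[1], f[2], f[3], f[2] - f[3], count[k]
           }
         }' |
    sort
}
```

Usage: `btrfs check --readonly /dev/sdb5 2>&1 | summarize_transid_failures`. For the check output posted in this thread, that reduces to three distinct blocks (160418889728, 160420741120 and 160420773888), each 253 or 254 generations behind.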
I'd mount it ro, and take a backup of anything you care about before
proceeding further.
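A sketch of that read-only backup step, assuming bash. The device, mountpoint, subdirectory and destination are placeholders, and `run`/`backup_readonly` are hypothetical helpers; commands are only printed unless DRY_RUN=0. If the read-only mount fails, it falls back to `btrfs restore`, which copies files off the unmounted device without writing to it.

```shell
# Print commands instead of running them unless DRY_RUN=0 (safe default).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: backup_readonly DEVICE MOUNTPOINT SUBDIR DESTINATION
# DESTINATION must live on a different, healthy filesystem.
backup_readonly() {
  local dev=$1 mnt=$2 subdir=$3 dest=$4
  if run mount -o ro "$dev" "$mnt"; then
    # Copy only what you care about: rsync expands every snapshot it
    # encounters into a full, unshared copy.
    run rsync -aHAX "$mnt/$subdir/" "$dest/"
  else
    # Mount failed: btrfs restore reads the unmounted device directly.
    run btrfs restore -v "$dev" "$dest"
  fi
}
```

For example, `DRY_RUN=1 backup_readonly /dev/sdb5 /mnt home /backup` only prints the mount and rsync commands so they can be reviewed first.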

smartctl -x might reveal if there are problems the drive itself is aware of.
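When reading `smartctl -x` output, the attribute rows that most directly indicate failing sectors are Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable; a non-zero raw value is a bad sign. A small filter for them, assuming bash and awk (`smart_sector_counters` is a hypothetical name). It takes the last field as the raw value, which works for both the long `-A` table and the brief table printed by `-x`. Drives that do not report these ATA attributes (NVMe, for instance) simply produce no output.

```shell
# Print "ATTRIBUTE_NAME RAW_VALUE" for the sector-health SMART attributes.
smart_sector_counters() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
         print $2, $NF
       }'
}
```

Usage: `smartctl -x /dev/sdb | smart_sector_counters`.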

Re: Filesystem forced to readonly after use

2016-09-13 Thread Cesar Strauss


On 13-09-2016 16:39, Chris Murphy wrote:
> I just wouldn't use btrfs repair with this version of progs; go back
> to v4.6.1 or upgrade to 4.7.2.

Thanks for the tip. I upgraded to 4.7.2.

Cesar


Re: Filesystem forced to readonly after use

2016-09-13 Thread Cesar Strauss


On 13-09-2016 16:49, Austin S. Hemmelgarn wrote:
> I'd be kind of curious to see the results from btrfs check run without
> repair, but I doubt that will help narrow things down any further.

Attached.

> As of right now, the absolute first thing I'd do is check your logs to
> see if you can find any indication of errors from the disk itself.  I
> don't think it's likely, but it's worth checking.

Will do.

> The couple of lines just before the crash in the attached kernel log
> would indicate to me that some of the metadata is corrupted.  There are
> two likely possibilities for how that happened:
> 1. Running with no extra space for new chunks to be allocated is not a
> common use case, so it's not well tested, and it wouldn't surprise me if
> some accounting falls apart in that situation.

Indeed. I periodically remove old snapshots and check for disk space,
but I guess I ran a bit too near the limit this time.

> 2. You might have bad RAM or a bad PSU.  This is the second thing you
> should check after checking to see if the disk is OK, as either will
> likely cause any repair attempts to make things worse.  RAM is pretty
> easy to check, but for a PSU you need a proper testing device.  You can
> get such a device on Amazon or similar sites for about 25 USD, and it's
> generally worth having around for troubleshooting.

Understood.

This notebook has occasional failures when resuming from hibernation. I
suppose, from the point of view of the filesystem, this corresponds to
an unclean reboot.

> Assuming your disk and RAM are good, the next thing to do would be to
> try to get the filesystem into a more usable state.  The best option for
> this is to expand the filesystem if possible.  Given that you're running
> right near capacity, I'd suggest at least 16G of extra space if
> possible.  If that isn't a viable solution for you, the other option is
> to delete some of the oldest snapshots (ideally enough that you have at
> least a few GB of extra space in the data chunks and a few hundred MB in
> the metadata chunks), then add a 4-8GB device to the FS temporarily (a
> ramdisk or flash drive works well for this), and run a full balance.  If
> you're lucky, this will fix any metadata that's messed up, and the
> system should be usable.  If not, it shouldn't make things any worse,
> and you probably want to look at btrfs restore to copy out the data to a
> new filesystem (ideally a bigger one).

I will try this next.

Thanks for the help!

Cesar

checking extents
parent transid verify failed on 160420773888 wanted 181826 found 181573
parent transid verify failed on 160420773888 wanted 181826 found 181573
parent transid verify failed on 160420773888 wanted 181826 found 181573
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420773888
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160420741120 wanted 181826 found 181573
parent transid verify failed on 160420741120 wanted 181826 found 181573
parent transid verify failed on 160420741120 wanted 181826 found 181573
parent transid verify failed on 160420741120 wanted 181826 found 181573
Ignoring transid failure
leaf parent key incorrect 160420741120
bad block 160420741120
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 160420773888 wanted 181826 found 181573
Ignoring transid failure
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160418889728 wanted 181826 found 181572
parent transid verify failed on 160420741120 wanted 181826 found 181573
Ignoring transid failure
Error: could not find btree root

Re: Filesystem forced to readonly after use

2016-09-13 Thread Chris Murphy

On Tue, Sep 13, 2016 at 1:49 PM, Austin S. Hemmelgarn wrote:
> On 2016-09-13 15:20, Cesar Strauss wrote:

>>
>> btrfs-progs v4.7
>
> It's always good to see people who are staying up-to-date on the kernel and
> userspace :)

Yes, although both it and 4.7.1 are marked "do not use":

https://btrfs.wiki.kernel.org/index.php/Changelog#btrfs-progs-4.7.2_.28Sep_2016.29
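Before running any repair, a quick guard against the versions flagged in that changelog can help. A minimal sketch, assuming bash; `progs_version_ok` is a hypothetical helper, and it only knows about the two releases named in this thread (4.7 and 4.7.1), not the full list of withdrawn versions.

```shell
# Return non-zero for btrfs-progs releases marked "do not use" in this thread.
# Argument: the output of `btrfs --version`, e.g. "btrfs-progs v4.7".
progs_version_ok() {
  local ver=${1%%$'\n'*}   # keep only the first line of the output
  ver=${ver##*v}           # strip everything up to the last "v": "4.7"
  case "$ver" in
    4.7|4.7.1) return 1 ;;
    *)         return 0 ;;
  esac
}
```

Usage: `progs_version_ok "$(btrfs --version)" || echo "upgrade btrfs-progs before using btrfs check --repair"`.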


-- 
Chris Murphy

Re: Filesystem forced to readonly after use

2016-09-13 Thread Austin S. Hemmelgarn


On 2016-09-13 15:20, Cesar Strauss wrote:

> Hello,
>
> I have a BTRFS filesystem that is reverting to read-only after a few
> moments of use. There is a stack trace visible in the kernel log, which
> is attached.
>
> Here is my system information:
>
> # uname -a
>
> Linux rescue 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016
> x86_64 GNU/Linux
>
> # btrfs --version
>
> btrfs-progs v4.7

It's always good to see people who are staying up-to-date on the kernel 
and userspace :)


> # btrfs fi show
>
> Label: 'linux'  uuid: 79862c20-d0b0-4ffa-a9af-e3a40868a243
> Total devices 1 FS bytes used 284.60GiB
> devid    1 size 300.03GiB used 300.03GiB path /dev/sdb5

Given this, you're running with the whole device fully allocated by the 
chunk allocator.  That is not a good state to be in for any extended 
period of time on a filesystem which is being written to and modified.

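"Fully allocated" means the devid line's used figure equals its size, so no unallocated space is left for new chunks. A minimal sketch that computes the difference from `btrfs fi show` output, assuming bash and awk and that both sizes are printed in GiB as they are here (`unallocated_gib` is a hypothetical name). Newer btrfs-progs also report this directly as "Device unallocated" in `btrfs filesystem usage`.

```shell
# For each devid line of `btrfs fi show`, print the device path and
# size minus used, i.e. the space not yet allocated to any chunk.
# Assumes both values carry a GiB suffix.
unallocated_gib() {
  awk '$1 == "devid" {
         size = $4; used = $6          # e.g. "300.03GiB"
         sub(/GiB$/, "", size); sub(/GiB$/, "", used)
         printf "%s %.2fGiB unallocated\n", $NF, size - used
       }'
}
```

For the devid line quoted above, size and used are both 300.03GiB, so this prints `/dev/sdb5 0.00GiB unallocated`.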

> # btrfs fi df /mnt
>
> Data, single: total=278.00GiB, used=274.68GiB
> System, DUP: total=8.00MiB, used=64.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=11.00GiB, used=9.92GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B

But you appear to have a reasonable amount of slack space within the 
chunks themselves.

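The slack inside chunks is total minus used for each line of `btrfs fi df`. A minimal sketch that computes it in MiB, assuming bash and awk (`chunk_slack_mib` is a hypothetical name). It skips GlobalReserve, which is reserved out of metadata space rather than being its own chunk type. For the output quoted above it gives 3399.68 MiB (about 3.3GiB) free in data chunks and 1105.92 MiB (about 1.1GiB) in the DUP metadata chunks.

```shell
# Per-type slack (allocated to chunks but unused) from `btrfs fi df`, in MiB.
# Fields split on ':', ',' and '=': $1 type, $2 profile, $4 total, $6 used.
chunk_slack_mib() {
  awk -F'[:,=]' '
    function mib(v,   n) {
      n = v + 0                        # numeric prefix: "278.00GiB" -> 278
      if (v ~ /TiB$/) return n * 1024 * 1024
      if (v ~ /GiB$/) return n * 1024
      if (v ~ /MiB$/) return n
      if (v ~ /KiB$/) return n / 1024
      return n / (1024 * 1024)         # plain bytes, e.g. "0.00B"
    }
    /total=/ && $1 != "GlobalReserve" {
      printf "%s,%s: %.2f MiB free\n", $1, $2, mib($4) - mib($6)
    }'
}
```

Usage: `btrfs fi df /mnt | chunk_slack_mib`.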

> As soon as the problem started, I saw that the Metadata, DUP was
> completely used. It became a little better (like above) after a scrub.
> I can easily recover disk space by removing old snapshots, if needed.
>
> The dmesg output is attached.
>
> Before making further recovery attempts, or even restoring from backup,
> I would like to ask for the best option to proceed.

I'd be kind of curious to see the results from btrfs check run without 
repair, but I doubt that will help narrow things down any further.


As of right now, the absolute first thing I'd do is check your logs to 
see if you can find any indication of errors from the disk itself.  I 
don't think it's likely, but it's worth checking.
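One way to do that log check is to filter for messages from the block and ATA layers rather than from btrfs. A rough sketch, assuming bash and GNU grep; `disk_error_lines` is a hypothetical name and the patterns are a heuristic starting point, not an exhaustive list. Note that btrfs's own "errno=-5 IO failure" line is deliberately not matched, since on its own it does not show that the disk reported an error.

```shell
# Keep kernel log lines that point at the disk or its link, not at btrfs:
#   "I/O error"     block layer read/write failures
#   "Medium Error"  SCSI sense key for unreadable media
#   "{ UNC }"       ATA uncorrectable-sector error flag
#   ataN: ...       failed commands, exceptions, link resets
disk_error_lines() {
  grep -E 'I/O error|Medium Error|\{ UNC \}|ata[0-9]+(\.[0-9]+)?: .*(failed command|exception|hard resetting link)'
}
```

Usage: `dmesg | disk_error_lines`, or `journalctl -k | disk_error_lines` to search the kernel messages kept by journald.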


The couple of lines just before the crash in the attached kernel log 
would indicate to me that some of the metadata is corrupted.  There are 
two likely possibilities for how that happened:
1. Running with no extra space for new chunks to be allocated is not a 
common use case, so it's not well tested, and it wouldn't surprise me if 
some accounting falls apart in that situation.
2. You might have bad RAM or a bad PSU.  This is the second thing you 
should check after checking to see if the disk is OK, as either will 
likely cause any repair attempts to make things worse.  RAM is pretty 
easy to check, but for a PSU you need a proper testing device.  You can 
get such a device on Amazon or similar sites for about 25 USD, and it's 
generally worth having around for troubleshooting.


Assuming your disk and RAM are good, the next thing to do would be to 
try to get the filesystem into a more usable state.  The best option for 
this is to expand the filesystem if possible.  Given that you're running 
right near capacity, I'd suggest at least 16G of extra space if 
possible.  If that isn't a viable solution for you, the other option is 
to delete some of the oldest snapshots (ideally enough that you have at 
least a few GB of extra space in the data chunks and a few hundred MB in 
the metadata chunks), then add a 4-8GB device to the FS temporarily (a 
ramdisk or flash drive works well for this), and run a full balance.  If 
you're lucky, this will fix any metadata that's messed up, and the 
system should be usable.  If not, it shouldn't make things any worse, 
and you probably want to look at btrfs restore to copy out the data to a 
new filesystem (ideally a bigger one).
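The snapshot-delete, temporary-device and balance sequence above can be sketched as follows, assuming bash. The mountpoint, temporary device and snapshot paths are placeholders, and `run`/`free_space_and_rebalance` are hypothetical helpers; with DRY_RUN=1 (the default) the commands are only printed so they can be reviewed before anything touches the filesystem.

```shell
# Print commands instead of running them unless DRY_RUN=0 (safe default).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: free_space_and_rebalance MOUNTPOINT TEMP_DEVICE OLD_SNAPSHOT...
# Preferred alternative if the partition itself can be enlarged:
#   btrfs filesystem resize max MOUNTPOINT
free_space_and_rebalance() {
  local mnt=$1 tmpdev=$2 snap
  shift 2
  # 1. Drop the oldest snapshots to create slack inside existing chunks.
  for snap in "$@"; do
    run btrfs subvolume delete "$snap"
  done
  # Wait until the deleted subvolumes have actually been cleaned up.
  run btrfs subvolume sync "$mnt"
  # 2. Temporarily add a small (4-8GB) device so balance can allocate chunks.
  #    A RAM disk can be created with e.g. `modprobe brd rd_nr=1 rd_size=8388608`
  #    (rd_size is in KiB, so this gives an 8GiB /dev/ram0).
  run btrfs device add "$tmpdev" "$mnt"
  # 3. Unfiltered balance: rewrites every data and metadata chunk.
  run btrfs balance start "$mnt"
  # 4. Remove the temporary device; its chunks are migrated back first.
  run btrfs device delete "$tmpdev" "$mnt"
}
```

For example, `DRY_RUN=1 free_space_and_rebalance /mnt /dev/ram0 /mnt/snaps/oldest` prints the six-step plan; rerun with `DRY_RUN=0` only after checking the disk, RAM and btrfs-progs version as discussed above.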


Re: Filesystem forced to readonly after use

2016-09-13 Thread Chris Murphy

I just wouldn't use btrfs repair with this version of progs; go back
to v4.6.1 or upgrade to 4.7.2.  You could do an offline check (no
repair) and see if that reveals anything useful for developers. But I
can't tell what's going on from the call trace.
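A sketch of that offline, no-repair check, assuming bash; `is_mounted` and `offline_check` are hypothetical helpers and the log file name is a placeholder. `btrfs check` is meant to run on an unmounted filesystem, so the guard refuses if the device appears in the mount table (it compares device paths literally, so a mapper or by-uuid alias would not be caught). The output is saved to a file so it can be attached to a reply on the list.

```shell
# Succeed if device $1 appears in the mount table $2 (default /proc/mounts).
is_mounted() {
  awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage: offline_check DEVICE [LOGFILE]
offline_check() {
  local dev=$1 log=${2:-btrfs-check.log}
  if is_mounted "$dev"; then
    echo "refusing: $dev is mounted; unmount it first" >&2
    return 1
  fi
  # Read-only is the default mode; spelling it out makes the intent explicit.
  btrfs check --readonly "$dev" 2>&1 | tee "$log"
  return "${PIPESTATUS[0]}"   # exit status of btrfs check, not of tee
}
```

Usage: `umount /mnt && offline_check /dev/sdb5`.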



-- 
Chris Murphy

Filesystem forced to readonly after use

2016-09-13 Thread Cesar Strauss


Hello,

I have a BTRFS filesystem that is reverting to read-only after a few 
moments of use. There is a stack trace visible in the kernel log, which 
is attached.


Here is my system information:

# uname -a

Linux rescue 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 
x86_64 GNU/Linux


# btrfs --version

btrfs-progs v4.7

# btrfs fi show

Label: 'linux'  uuid: 79862c20-d0b0-4ffa-a9af-e3a40868a243
Total devices 1 FS bytes used 284.60GiB
devid    1 size 300.03GiB used 300.03GiB path /dev/sdb5

# btrfs fi df /mnt

Data, single: total=278.00GiB, used=274.68GiB
System, DUP: total=8.00MiB, used=64.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=11.00GiB, used=9.92GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

As soon as the problem started, I saw that the Metadata, DUP was 
completely used. It became a little better (like above) after a scrub.

I can easily recover disk space by removing old snapshots, if needed.

The dmesg output is attached.

Before making further recovery attempts, or even restoring from backup, 
I would like to ask for the best option to proceed.


Thanks,

Cesar

[20048.035688] BTRFS info (device sdb5): disk space caching is enabled
[20190.871802] BTRFS error (device sdb5): parent transid verify failed on 160420773888 wanted 181826 found 181573
[20190.882573] BTRFS error (device sdb5): parent transid verify failed on 160420773888 wanted 181826 found 181573
[20190.882607] ------------[ cut here ]------------
[20190.882642] WARNING: CPU: 3 PID: 5026 at fs/btrfs/extent-tree.c:2963 btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs]
[20190.882645] BTRFS: Transaction aborted (error -5)
[20190.882648] Modules linked in: hid_generic usbhid hid btrfs xor raid6_pq sr_mod cdrom intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass joydev mousedev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul dell_wmi dell_laptop amdkfd amd_iommu_v2 glue_helper sparse_keymap dell_smbios uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev radeon media ums_realtek ablk_helper ttm cryptd snd_hda_codec_hdmi arc4 snd_hda_codec_realtek snd_hda_codec_generic dcdbas dell_smm_hwmon iwldvm iTCO_wdt mac80211 iTCO_vendor_support xhci_pci xhci_hcd r8169 snd_hda_intel snd_hda_codec mii iwlwifi evdev i915 input_leds led_class btusb btrtl btbcm btintel bluetooth intel_cstate intel_rapl_perf cfg80211 psmouse pcspkr
[20190.882725]  snd_hda_core mac_hid rfkill snd_hwdep thermal wmi snd_pcm drm_kms_helper snd_timer drm snd soundcore intel_gtt shpchp syscopyarea ahci sysfillrect sysimgblt fb_sys_fops i2c_algo_bit libahci fjes libata button ac battery mei_me video mei i2c_i801 lpc_ich dell_smo8800 tpm_tis tpm sch_fq_codel ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod uas usb_storage scsi_mod serio_raw atkbd libps2 ehci_pci ehci_hcd usbcore usb_common i8042 serio
[20190.882782] CPU: 3 PID: 5026 Comm: kworker/u16:2 Tainted: GW   4.7.2-1-ARCH #1
[20190.882785] Hardware name: Dell Inc.  Dell System Vostro 3450/0GG0VM, BIOS A05 05/24/2011
[20190.882814] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[20190.882818]  0286 860a9f71 88015420fc90 812eb132
[20190.882824]  88015420fce0  88015420fcd0 8107a3ab
[20190.882828]  0b938efd7800 a0c0be15 880085917688 034c
[20190.882834] Call Trace:
[20190.882842]  [] dump_stack+0x63/0x81
[20190.882847]  [] __warn+0xcb/0xf0
[20190.882852]  [] warn_slowpath_fmt+0x5f/0x80
[20190.882875]  [] btrfs_run_delayed_refs+0x28c/0x2c0 [btrfs]
[20190.882895]  [] delayed_ref_async_start+0x94/0xb0 [btrfs]
[20190.882920]  [] btrfs_scrubparity_helper+0x77/0x350 [btrfs]
[20190.882943]  [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[20190.882948]  [] process_one_work+0x1e5/0x480
[20190.882953]  [] worker_thread+0x48/0x4e0
[20190.882958]  [] ? process_one_work+0x480/0x480
[20190.882962]  [] ? process_one_work+0x480/0x480
[20190.882968]  [] kthread+0xd8/0xf0
[20190.882975]  [] ret_from_fork+0x1f/0x40
[20190.882981]  [] ? kthread_worker_fn+0x170/0x170
[20190.882985] ---[ end trace 99d6d7ec847d19d4 ]---
[20190.882990] BTRFS: error (device sdb5) in btrfs_run_delayed_refs:2963: errno=-5 IO failure
[20190.882994] BTRFS info (device sdb5): forced readonly
[20295.373706] BTRFS error (device sdb5): cleaner transaction attach returned -30
