Re: Filesystem mounts fine but hangs on access
On 04.11.18 19:31, Duncan wrote:

[This mail was also posted to gmane.comp.file-systems.btrfs.]

Sebastian Ochmann posted on Sun, 04 Nov 2018 14:15:55 +0100 as excerpted:

Hello, I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which stopped working correctly. Kernel 4.18.16 (Arch Linux)

I see upgrading to 4.19 seems to have solved your problem, but this is more about something I saw in the trace that has me wondering...

[ 368.267315] touch_atime+0xc0/0xe0

Do you have any atime-related mount options set?

That's an interesting point. On some machines, I have explicitly set "noatime", but on that particular system, I did not, thus using the default "relatime" option. Since I'm not using mutt or anything else (that I'm aware of) that exploits this feature, I will set noatime there as well.

FWIW, noatime is strongly recommended on btrfs. Now I'm not a dev, just a btrfs user and list regular, and I don't know if that function is called and just does nothing when noatime is set, so you may well already have it set and this is "much ado about nothing", but the chance that it's relevant, if not for you, perhaps for others that may read it, begs for this post...

The problem with atime, access time, is that it turns most otherwise read-only operations into read-and-write operations in order to update the access time. And on copy-on-write (COW) based filesystems such as btrfs, that can be a big problem, because updating that tiny bit of metadata will trigger a rewrite of the entire metadata block containing it, which will trigger an update of the metadata for /that/ block in the parent metadata tier... all the way up the metadata tree, ultimately to its root, the filesystem root and the superblocks, at the next commit (normally every 30 seconds or less).
Not only is that a bunch of otherwise unnecessary work for a bit of metadata barely anything actually uses, but forcing most read operations to read-write obviously compounds the risk for all of those would-be read-only operations when a filesystem already has problems.

Additionally, if your use-case includes regular snapshotting, with atime on, on mostly-read workloads with few writes (other than atime updates), it may actually be the case that most of the changes in a snapshot are atime updates, making recurring snapshots far larger than they'd otherwise be.

Now a few years ago the kernel did change the default to relatime, basically updating the atime for any particular file only once a day, which does help quite a bit, and on traditional filesystems it's arguably a reasonably sane default. But COW makes atime tracking enough more expensive that setting noatime is still strongly recommended on btrfs, particularly if you're doing regular snapshotting. So do consider adding noatime to your mount options if you haven't done so already.

AFAIK, the only /semi-common/ app that actually uses atimes these days is mutt (for read-message tracking), and then only for mbox, so you should be safe to at least test turning it off. And YMMV, but if you do use mutt or something else that uses atimes, I'd go so far as to recommend finding an alternative, replacing either btrfs (because as I said, relatime is arguably enough on a traditional non-COW filesystem) or whatever it is that uses atimes, your call, because IMO it really is that big a deal.

Meanwhile, particularly after seeing that in the trace, if the 4.19 update hadn't already fixed it, I'd have suggested trying a read-only mount, both as a test and, assuming it worked, as a way to at least access the data without the lockup, which would then have been related to the write due to the atime update, not the actual read.
It would be nice to have a 1:1 image of the filesystem (or rather the raw block device) for more testing, but unfortunately I don't have another 10 TB drive lying around. :)

I didn't really expect the 4.19 upgrade to (apparently) fix the problem right away, so I also couldn't test the mentioned patch, but yeah... If it happens again (which I hope it won't), I'll try your suggestion.

Actually, a read-only mount test is always a good troubleshooting step when the trouble is a filesystem that either won't mount normally, or will, but then locks up when you try to access something. It's far less risky than a normal writable mount, and at minimum it provides you the additional test data of whether it worked or not, plus, if it does, a chance to access the data and make sure your backups are current before actually trying any repairs.
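Following Duncan's noatime advice, here is a minimal sketch of switching an fstab entry over; the UUID, mount point, and existing options are hypothetical placeholders:

```shell
# Hypothetical /etc/fstab entry; UUID, mount point and options are placeholders.
line='UUID=abcd-1234 /mnt/backup btrfs defaults,compress=zstd 0 0'

# Append noatime to the btrfs option list unless it is already present.
case "$line" in
  *noatime*) new="$line" ;;
  *)         new=$(printf '%s' "$line" | sed 's/\(btrfs [^ ]*\)/\1,noatime/') ;;
esac
printf '%s\n' "$new"
# -> UUID=abcd-1234 /mnt/backup btrfs defaults,compress=zstd,noatime 0 0

# An already-mounted filesystem can be switched on the fly:
#   mount -o remount,noatime /mnt/backup
```

Note that on Linux, noatime implies nodiratime, so the single option suffices.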
Re: Filesystem mounts fine but hangs on access
Thank you very much for the quick reply.

On 04.11.18 14:37, Qu Wenruo wrote:

On 2018/11/4 下午9:15, Sebastian Ochmann wrote:

Hello, I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which stopped working correctly. The drive is used as a backup drive with zstd compression to which I regularly rsync and make daily snapshots. After I routinely removed a bunch of snapshots (about 20), I noticed later that the machine would hang when trying to unmount the filesystem.

The current state is that I'm able to mount the filesystem without errors and I can view (ls) files at the root level, but trying to view the contents of directories contained therein hangs, just like trying to unmount the filesystem. I have not yet tried to run check, repair, etc. Do you have any advice what I should try next?

Could you please run "btrfs check" on the unmounted fs?

I ran btrfs check on the unmounted fs and it reported no errors.

A notable hardware change I made a few days before the problem is a switch from an Intel Xeon platform to AMD Threadripper. However, I haven't seen problems with the rest of the btrfs filesystems (in particular, a RAID-1 consisting of three HDDs) which I also migrated to the new platform yet. I just want to mention it in case there are known issues in that direction.

Kernel 4.18.16 (Arch Linux)
btrfs-progs 4.17.1

Kernel log after trying to "ls" a directory contained in the filesystem's root directory:

[ 79.279349] BTRFS info (device dm-5): use zstd compression, level 0 [ 79.279351] BTRFS info (device dm-5): disk space caching is enabled [ 79.279352] BTRFS info (device dm-5): has skinny extents [ 135.202344] kauditd_printk_skb: 2 callbacks suppressed [ 135.202347] audit: type=1130 audit(1541335770.667:45): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success' [ 135.364850] audit: type=1130 audit(1541335770.831:46): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=udisks2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 135.589255] audit: type=1130 audit(1541335771.054:47): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 368.266653] INFO: task kworker/u256:1:728 blocked for more than 120 seconds. [ 368.266657] Tainted: P OE 4.18.16-arch1-1-ARCH #1 [ 368.266658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 368.20] kworker/u256:1 D 0 728 2 0x8080 [ 368.266680] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 368.266681] Call Trace: [ 368.266687] ? __schedule+0x29b/0x8b0 [ 368.266690] ? preempt_count_add+0x68/0xa0 [ 368.266692] schedule+0x32/0x90 [ 368.266707] btrfs_tree_read_lock+0x7d/0x110 [btrfs] [ 368.266710] ? wait_woken+0x80/0x80 [ 368.266719] btrfs_read_lock_root_node+0x2f/0x40 [btrfs] [ 368.266729] btrfs_search_slot+0xf6/0xa00 [btrfs] [ 368.266732] ? _raw_spin_unlock+0x16/0x30 [ 368.266734] ? inode_insert5+0x105/0x1a0 [ 368.266746] btrfs_lookup_inode+0x3a/0xc0 [btrfs] [ 368.266749] ? kmem_cache_alloc+0x179/0x1d0 [ 368.266762] btrfs_iget+0x113/0x690 [btrfs] [ 368.266764] ? _raw_spin_unlock+0x16/0x30 [ 368.266778] __lookup_free_space_inode+0xd8/0x150 [btrfs] [ 368.266792] lookup_free_space_inode+0x63/0xc0 [btrfs] [ 368.266806] load_free_space_cache+0x6e/0x190 [btrfs] [ 368.266808] ? kmem_cache_alloc_trace+0x181/0x1d0 [ 368.266817] ? cache_block_group+0x73/0x3e0 [btrfs] [ 368.266827] cache_block_group+0x1c1/0x3e0 [btrfs] This thread is trying to get tree root lock to create free space cache, while some one already has locked the tree root. [ 368.266829] ? 
wait_woken+0x80/0x80 [ 368.266839] find_free_extent+0x872/0x10e0 [btrfs] [ 368.266851] btrfs_reserve_extent+0x9b/0x180 [btrfs] [ 368.266862] btrfs_alloc_tree_block+0x1b3/0x4d0 [btrfs] [ 368.266872] __btrfs_cow_block+0x11d/0x500 [btrfs] [ 368.266882] btrfs_cow_block+0xdc/0x1a0 [btrfs] [ 368.266891] btrfs_search_slot+0x282/0xa00 [btrfs] [ 368.266893] ? _raw_spin_unlock+0x16/0x30 [ 368.266903] btrfs_insert_empty_items+0x67/0xc0 [btrfs] [ 368.266913] __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs] [ 368.266915] ? preempt_count_add+0x68/0xa0 [ 368.266926] btrfs_run_delayed_refs+0x72/0x180 [btrfs] [ 368.266937] delayed_ref_async_start+0x81/0x90 [btrfs] [ 368.266950] normal_work_helper+0xbd/0x350 [btrfs] [ 368.266953] process_one_work+0x1eb/0x3c0 [ 368.266955] worker_thread+0x2d/0x3d0 [ 368.266956] ? process_one_work+0x3c0/0x3c0 [ 368.266958] kthread+0x112/0x130 [ 368.266960] ? kthread_flush_work_fn+0x10/0x10 [ 368.266961] ret_from_fork+0x22/0x40 [ 368.266978] INFO: task btrfs-cleaner:1196 blocked for more than 120
Filesystem mounts fine but hangs on access
Hello, I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which stopped working correctly. The drive is used as a backup drive with zstd compression to which I regularly rsync and make daily snapshots. After I routinely removed a bunch of snapshots (about 20), I noticed later that the machine would hang when trying to unmount the filesystem.

The current state is that I'm able to mount the filesystem without errors and I can view (ls) files at the root level, but trying to view the contents of directories contained therein hangs, just like trying to unmount the filesystem. I have not yet tried to run check, repair, etc. Do you have any advice what I should try next?

A notable hardware change I made a few days before the problem is a switch from an Intel Xeon platform to AMD Threadripper. However, I haven't seen problems with the rest of the btrfs filesystems (in particular, a RAID-1 consisting of three HDDs) which I also migrated to the new platform yet. I just want to mention it in case there are known issues in that direction.

Kernel 4.18.16 (Arch Linux)
btrfs-progs 4.17.1

Kernel log after trying to "ls" a directory contained in the filesystem's root directory:

[ 79.279349] BTRFS info (device dm-5): use zstd compression, level 0 [ 79.279351] BTRFS info (device dm-5): disk space caching is enabled [ 79.279352] BTRFS info (device dm-5): has skinny extents [ 135.202344] kauditd_printk_skb: 2 callbacks suppressed [ 135.202347] audit: type=1130 audit(1541335770.667:45): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=polkit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 135.364850] audit: type=1130 audit(1541335770.831:46): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=udisks2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success' [ 135.589255] audit: type=1130 audit(1541335771.054:47): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 368.266653] INFO: task kworker/u256:1:728 blocked for more than 120 seconds. [ 368.266657] Tainted: P OE 4.18.16-arch1-1-ARCH #1 [ 368.266658] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 368.20] kworker/u256:1 D0 728 2 0x8080 [ 368.266680] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [ 368.266681] Call Trace: [ 368.266687] ? __schedule+0x29b/0x8b0 [ 368.266690] ? preempt_count_add+0x68/0xa0 [ 368.266692] schedule+0x32/0x90 [ 368.266707] btrfs_tree_read_lock+0x7d/0x110 [btrfs] [ 368.266710] ? wait_woken+0x80/0x80 [ 368.266719] btrfs_read_lock_root_node+0x2f/0x40 [btrfs] [ 368.266729] btrfs_search_slot+0xf6/0xa00 [btrfs] [ 368.266732] ? _raw_spin_unlock+0x16/0x30 [ 368.266734] ? inode_insert5+0x105/0x1a0 [ 368.266746] btrfs_lookup_inode+0x3a/0xc0 [btrfs] [ 368.266749] ? kmem_cache_alloc+0x179/0x1d0 [ 368.266762] btrfs_iget+0x113/0x690 [btrfs] [ 368.266764] ? _raw_spin_unlock+0x16/0x30 [ 368.266778] __lookup_free_space_inode+0xd8/0x150 [btrfs] [ 368.266792] lookup_free_space_inode+0x63/0xc0 [btrfs] [ 368.266806] load_free_space_cache+0x6e/0x190 [btrfs] [ 368.266808] ? kmem_cache_alloc_trace+0x181/0x1d0 [ 368.266817] ? cache_block_group+0x73/0x3e0 [btrfs] [ 368.266827] cache_block_group+0x1c1/0x3e0 [btrfs] [ 368.266829] ? wait_woken+0x80/0x80 [ 368.266839] find_free_extent+0x872/0x10e0 [btrfs] [ 368.266851] btrfs_reserve_extent+0x9b/0x180 [btrfs] [ 368.266862] btrfs_alloc_tree_block+0x1b3/0x4d0 [btrfs] [ 368.266872] __btrfs_cow_block+0x11d/0x500 [btrfs] [ 368.266882] btrfs_cow_block+0xdc/0x1a0 [btrfs] [ 368.266891] btrfs_search_slot+0x282/0xa00 [btrfs] [ 368.266893] ? 
_raw_spin_unlock+0x16/0x30 [ 368.266903] btrfs_insert_empty_items+0x67/0xc0 [btrfs] [ 368.266913] __btrfs_run_delayed_refs+0x8ef/0x10a0 [btrfs] [ 368.266915] ? preempt_count_add+0x68/0xa0 [ 368.266926] btrfs_run_delayed_refs+0x72/0x180 [btrfs] [ 368.266937] delayed_ref_async_start+0x81/0x90 [btrfs] [ 368.266950] normal_work_helper+0xbd/0x350 [btrfs] [ 368.266953] process_one_work+0x1eb/0x3c0 [ 368.266955] worker_thread+0x2d/0x3d0 [ 368.266956] ? process_one_work+0x3c0/0x3c0 [ 368.266958] kthread+0x112/0x130 [ 368.266960] ? kthread_flush_work_fn+0x10/0x10 [ 368.266961] ret_from_fork+0x22/0x40 [ 368.266978] INFO: task btrfs-cleaner:1196 blocked for more than 120 seconds. [ 368.266980] Tainted: P OE 4.18.16-arch1-1-ARCH #1 [ 368.266981] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 368.266982] btrfs-cleaner D0 1196 2 0x8080 [ 368.266983] Call Trace: [ 368.266985] ? __schedule+0x29b/0x8b0 [ 368.266987] schedule+0x32/0x90 [ 368.266997] cache_block_group+0x148/0x3e0 [btrfs] [ 368.266998] ? wait_woken+0x80/0x80 [
Re: Periodic frame losses when recording to btrfs volume with OBS
Hello, I attached to the ffmpeg-mux process for a little while and pasted the result here: https://pastebin.com/XHaMLX8z

Can you help me with interpreting this result? If you'd like me to run strace with specific options, please let me know. This is a level of debugging I'm not dealing with on a daily basis. :)

Best regards
Sebastian

On 22.01.2018 20:08, Chris Mason wrote:

On 01/22/2018 01:33 PM, Sebastian Ochmann wrote:

[ skipping to the traces ;) ]

2866 ffmpeg-mux D [] btrfs_start_ordered_extent+0x101/0x130 [btrfs] [] lock_and_cleanup_extent_if_need+0x340/0x380 [btrfs] [] __btrfs_buffered_write+0x261/0x740 [btrfs] [] btrfs_file_write_iter+0x20f/0x650 [btrfs] [] __vfs_write+0xf9/0x170 [] vfs_write+0xad/0x1a0 [] SyS_write+0x52/0xc0 [] entry_SYSCALL_64_fastpath+0x1a/0x7d [] 0x

This is where we wait for writes that are already in flight before we're allowed to redirty those pages in the file. It'll happen either when we overwrite a page in the file that we've already written, or when we're trickling down writes slowly in non-4K-aligned writes. You can probably figure out pretty quickly which is the case by stracing ffmpeg-mux. Since lower dirty ratios made it happen more often for you, my guess is the app is sending down unaligned writes.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
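Chris's hunch about unaligned writes can be checked without offcputime: strace the writer and look at the write(2) sizes. A rough sketch; the strace lines below are synthetic stand-ins, not taken from the pastebin:

```shell
# In practice: strace -f -e trace=write -p <pid> -o "$log"
# The lines below are synthetic examples of what strace output looks like.
log=$(mktemp)
cat > "$log" <<'EOF'
write(4, "..."..., 4096) = 4096
write(4, "..."..., 65536) = 65536
write(4, "..."..., 1837) = 1837
EOF

# Count write() calls whose size is not a multiple of 4096.
awk -F'[(,)= ]+' '/^write/ { n++; if ($4 % 4096 != 0) odd++ }
                  END { printf "%d of %d writes unaligned\n", odd+0, n }' "$log"
# -> 1 of 3 writes unaligned
rm -f "$log"
```

If most writes come out unaligned, that matches Chris's trickling-writes explanation; if they are all 4K multiples, the overwrite case is more likely.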
Re: Periodic frame losses when recording to btrfs volume with OBS
First off, thank you for all the responses! Let me reply to multiple suggestions at once in this mail.

On 22.01.2018 01:39, Qu Wenruo wrote:

Either such mount option has a bug, or some unrelated problem. As you mentioned, the output is about 10~50MiB/s, so 30s means 300~1500MiB. Maybe it's related to the dirty data amount? Would you please verify whether a lower or higher profile (resulting in a much larger or smaller data stream) would affect it?

A much lower rate seems to mitigate the problem somewhat; however, I'm talking about low single-digit MB/s when the problem seems to vanish. But even with low, but more realistic, amounts of data the drops still happen.

Despite that, I'll dig to see if the commit= option has any bug. And you could also try the nospace_cache mount option provided by Chris Murphy, which may also help.

I tried the nospace_cache option but it doesn't seem to make a difference to me.

On 22.01.2018 15:27, Chris Mason wrote:

> This could be a few different things, trying without the space cache was
> already suggested, and that's a top suspect.
>
> How does the application do the writes? Are they always 4K aligned or
> does it send them out in odd sizes?
>
> The easiest way to nail it down is to use offcputime from the iovisor
> project:
>
> https://github.com/iovisor/bcc/blob/master/tools/offcputime.py
>
> If you haven't already configured this it may take a little while, but
> it's the perfect tool for this problem.
>
> Otherwise, if the stalls are long enough you can try to catch it with
> /proc//stack. I've attached a helper script I often use to dump
> the stack trace of all the tasks in D state.
>
> Just run walker.py and it should give you something useful. You can use
> walker.py -a to see all the tasks instead of just D state. This just
> walks /proc//stack, so you'll need to run it as someone with
> permissions to see the stack traces of the procs you care about.
>
> -chris

I tried the walker.py script and was able to catch stack traces when the lag happens. I'm pasting two traces at the end of this mail - one when it happened using a USB-connected HDD and one when it happened on a SATA SSD. The latter is encrypted, hence the dmcrypt_write process. Note however that my original problem appeared on an SSD that was not encrypted.

In reply to the mail by Duncan:

64 GB RAM... Do you know about the /proc/sys/vm/dirty_* files and how to use/tweak them? If not, read $KERNDIR/Documentation/sysctl/vm.txt, focusing on these files.

At least I have never tweaked those settings yet. I certainly didn't know about the foreground/background distinction, that is really interesting. Thank you for the very extensive info and guide btw!

So try setting something a bit more reasonable and see if it helps. That 1% ratio at 16 GiB RAM for ~160 MB was fine for me, but I'm not doing critical streaming, and at 64 GiB you're looking at ~640 MB per 1%, as I said, too chunky. For streaming, I'd suggest something approaching the value of your per-second IO bandwidth. We're assuming 100 MB/sec here, so 100 MiB, but let's round that up to a nice binary 128 MiB for the background value; for foreground, perhaps half a GiB, 5 seconds worth of writeback time and 4 times the background value. So:

vm.dirty_background_bytes = 134217728  # 128*1024*1024, 128 MiB
vm.dirty_bytes = 536870912  # 512*1024*1024, 512 MiB

Now I have good and bad news. The good news is that setting these tunables to different values does change something. The bad news is that lowering these values only seems to make the lag and frame drops happen more quickly/frequently. I have also tried lowering the background bytes to, say, 128 MB while leaving the non-background bytes at 1 or 2 GB, but even the background task seems to have a bad enough effect to start dropping frames. :(

When writing to the SSD, the effect seems to be mitigated a little bit, but frame drops still occur quickly, which is unacceptable given that the system is generally able to do better. By the way, as you can see from the stack traces, blk_mq is in use in the SSD case.

But I know less about that stuff and it's googlable, should you decide to try playing with it too. I know what the dirty_* stuff does from personal experience. =:^)

"I know what the dirty_* stuff does from personal experience. =:^)" sounds quite interesting... :D

Best regards and thanks again
Sebastian

First stack trace:

690 usb-storage D [] usb_sg_wait+0xf4/0x150 [usbcore] [] usb_stor_bulk_transfer_sglist.part.1+0x63/0xb0 [usb_storage] [] usb_stor_bulk_srb+0x49/0x80 [usb_storage] [] usb_stor_Bulk_transport+0x163/0x3d0 [usb_storage] [] usb_stor_invoke_transport+0x37/0x4c0 [usb_storage] [] usb_stor_control_thread+0x1d8/0x2c0 [usb_storage] [] kthread+0x118/0x130 [] ret_from_fork+0x1f/0x30 [] 0x 2505 kworker/u16:2 D [] io_schedule+0x12/0x40 [] wbt_wait+0x1b8/0x340 [] blk_mq_make_request+0xe6/0x6e0 []
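Duncan's suggested writeback limits can be expressed as commands; a sketch using his 128 MiB / 512 MiB values (the sysctl.d file name is a hypothetical example):

```shell
bg=$((128 * 1024 * 1024))   # 134217728 bytes, 128 MiB background threshold
fg=$((512 * 1024 * 1024))   # 536870912 bytes, 512 MiB foreground threshold
echo "vm.dirty_background_bytes = $bg"
echo "vm.dirty_bytes = $fg"

# Apply at runtime (root required):
#   sysctl -w vm.dirty_background_bytes="$bg"
#   sysctl -w vm.dirty_bytes="$fg"
# Persist by putting the two echoed lines into e.g. /etc/sysctl.d/99-writeback.conf.
```

Note that setting the *_bytes tunables zeroes their *_ratio counterparts (and vice versa), so only one pair is in effect at a time.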
Re: Periodic frame losses when recording to btrfs volume with OBS
On 21.01.2018 23:04, Chris Murphy wrote:

On Sun, Jan 21, 2018 at 8:27 AM, Sebastian Ochmann <ochm...@cs.uni-bonn.de> wrote:

On 21.01.2018 11:04, Qu Wenruo wrote:

The output of "mount" after setting a 10 second commit interval:

/dev/sdc1 on /mnt/rec type btrfs (rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)

I wonder if it gets stuck updating the v1 space cache. Instead of trying v2, you could try the nospace_cache mount option and see if there's a change in behavior.

I tried disabling space_cache, also on a newly formatted volume when first mounting it. However, it doesn't seem to make a difference. Basically the same lags in the same interval, sorry.

Best regards
Sebastian
Re: Periodic frame losses when recording to btrfs volume with OBS
On 21.01.2018 11:04, Qu Wenruo wrote:

On 2018年01月20日 18:47, Sebastian Ochmann wrote:

Hello, I would like to describe a real-world use case where btrfs does not perform well for me. I'm recording 60 fps, larger-than-1080p video using OBS Studio [1] where it is important that the video stream is encoded and written out to disk in real-time for a prolonged period of time (2-5 hours). The result is an H264 video encoded on the GPU with a data rate ranging from approximately 10-50 MB/s. The hardware used is powerful enough to handle this task.

When I use an XFS volume for recording, no matter whether it's an SSD or HDD, the recording is smooth and no frame drops are reported (OBS has a nice Stats window where it shows the number of frames dropped due to encoding lag, which seemingly also includes writing the data out to disk). However, when using a btrfs volume I quickly observe severe, periodic frame drops. It's not single frames but larger chunks of frames that are dropped at a time. I tried mounting the volume with nobarrier but to no avail.

What's the drop interval? Something near 30s? If so, try the mount option commit=300 to see if it helps.

Thank you for your reply. I observed the interval more closely and it shows that the first, quite small drop occurs about 10 seconds after starting the recording (some initial metadata being written?). After that, the interval is indeed about 30 seconds, with large drops each time.

Thus I tried setting the commit option to different values. I confirmed that the setting was activated by looking at the options "mount" shows (see below). However, no matter whether I set the commit interval to 300, 60 or 10 seconds, the results were always similar. About every 30 seconds the drive shows activity for a few seconds and the drop occurs shortly thereafter. It almost seems like the commit setting doesn't have any effect. By the way, the machine I'm currently testing on has 64 GB of RAM, so it should have plenty of room for caching.
Of course, the simple fix is to use a FS that works for me(TM). However, I thought since this is a common real-world use case, I'd describe the symptoms here in case anyone is interested in analyzing this behavior. It's not immediately obvious that the FS makes such a difference. Also, if anyone has an idea what I could try to mitigate this issue (mount or mkfs options?), I can try that.

Mkfs options can help, but only marginally AFAIK. You could try mkfs with -n 4K (the minimal supported nodesize) to reduce the tree lock critical region a little, at the cost of more metadata fragmentation. And are there any special features enabled, like quota? Or a scheduled balance running in the background? That is known to dramatically impact the performance of transaction commits, so it's recommended to disable quota/scheduled balance first. Another recommendation is the nodatacow mount option to reduce the CoW metadata overhead, but I have doubts about its effectiveness.

I tried the -n 4K and nodatacow options, but they don't seem to make a big difference, if any. No quota or auto-balance is active. It's basically using Arch Linux default options. The output of "mount" after setting a 10 second commit interval:

/dev/sdc1 on /mnt/rec type btrfs (rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)

Also tried noatime, but that didn't make a difference either.

Best regards
Sebastian

Thanks,
Qu

I saw this behavior on two different machines with kernels 4.14.13 and 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b built from git.

Best regards
Sebastian

[1] https://github.com/jp9000/obs-studio
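As an aside, the commit interval actually in effect can be read back from the mount output; a small sketch using the line quoted above:

```shell
# Line taken from the `mount` output quoted in the mail above.
m='/dev/sdc1 on /mnt/rec type btrfs (rw,relatime,space_cache,commit=10,subvolid=5,subvol=/)'

# Extract the commit interval; btrfs defaults to 30 seconds when the option is absent.
commit=$(printf '%s\n' "$m" | grep -o 'commit=[0-9]*' | cut -d= -f2)
echo "commit interval: ${commit:-30}s"
# -> commit interval: 10s
```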
Periodic frame losses when recording to btrfs volume with OBS
Hello, I would like to describe a real-world use case where btrfs does not perform well for me. I'm recording 60 fps, larger-than-1080p video using OBS Studio [1] where it is important that the video stream is encoded and written out to disk in real-time for a prolonged period of time (2-5 hours). The result is an H264 video encoded on the GPU with a data rate ranging from approximately 10-50 MB/s. The hardware used is powerful enough to handle this task.

When I use an XFS volume for recording, no matter whether it's an SSD or HDD, the recording is smooth and no frame drops are reported (OBS has a nice Stats window where it shows the number of frames dropped due to encoding lag, which seemingly also includes writing the data out to disk). However, when using a btrfs volume I quickly observe severe, periodic frame drops. It's not single frames but larger chunks of frames that are dropped at a time. I tried mounting the volume with nobarrier but to no avail.

Of course, the simple fix is to use a FS that works for me(TM). However, I thought since this is a common real-world use case, I'd describe the symptoms here in case anyone is interested in analyzing this behavior. It's not immediately obvious that the FS makes such a difference. Also, if anyone has an idea what I could try to mitigate this issue (mount or mkfs options?), I can try that.

I saw this behavior on two different machines with kernels 4.14.13 and 4.14.5, both Arch Linux. btrfs-progs 4.14, OBS 20.1.3-241-gf5c3af1b built from git.

Best regards
Sebastian

[1] https://github.com/jp9000/obs-studio
Re: btrfs-freespace ever doing anything?
On 31.07.2017 14:08, Austin S. Hemmelgarn wrote:

On 2017-07-31 06:51, Sebastian Ochmann wrote:

Hello, I have a quite simple and possibly stupid question. Since I'm occasionally seeing warnings about failed loading of the free space cache, I wanted to clear and rebuild the space cache. So I mounted the filesystem(s) with -o clear_cache and subsequently with my regular options, which include space_cache. Indeed, dmesg tells me:

[ 60.285190] BTRFS info (device dm-1): force clearing of disk cache

and then

[ 137.151845] BTRFS info (device dm-1): use ssd allocation scheme
[ 137.151850] BTRFS info (device dm-1): disk space caching is enabled
[ 137.151852] BTRFS info (device dm-1): has skinny extents

To my understanding, btrfs-freespace should then start working to rebuild the free space cache. However, I can't remember ever having seen btrfs work hard after clearing the space cache. The drives aren't working much, and the btrfs-freespace processes (which are indeed there) don't do anything either.

So, simple question: Can anyone try to clear their space cache and confirm that btrfs actually does something after doing so? Is there anything I could do to confirm that something is happening?

Based on my (limited) understanding of that code, assuming you're using the original free space cache (which I think is the case, since you said you're using the regular 'space_cache' option instead of 'space_cache=v2'), there's not _much_ work that needs to be done unless free space is heavily fragmented and the disk is reasonably full. The original free space cache is pretty similar to an allocation bitmap, and computing that is not hard to do (you just figure out which blocks are actually used). Based on my own experience, you'll see almost zero activity most of the time when rebuilding the free space cache regardless of which you are using (the original, or the new version), although the newer free space tree code appears to do a bit more work.
Ah, that's interesting, I have to admit I wasn't even aware of space_cache v2. That said, the btrfs wiki doesn't state its existence on the mount options page. It's only mentioned at the bottom of the Status page, where clearing the space cache for v2 using "btrfs check" is explained.

The man page of "btrfs-check" states something interesting regarding clearing the space cache via the "clear_cache" mount option when using v1: "For free space cache v1, the clear_cache kernel mount option only rebuilds the free space cache for block groups that are modified while the filesystem is mounted with that option."

So "clear_cache" is, to my understanding, pretty much a misnomer. Only for v2 does it actually clear the whole cache. I now used "btrfs check --clear-space-cache v1" on one of the devices and it took a while to clear the cache (way longer than when using the clear_cache mount option); rebuilding still seems to be quick, though.

The explanation of the space_cache and clear_cache options in the wiki should be updated. The mount options page doesn't mention space_cache v2, and the clear_cache option supposedly clears "all the free space caches" according to the wiki, which contradicts the btrfs check manpage.

The drives in question are an SSD and a HDD, both in the range of 1-2 TB in size. I'm on Arch Linux, kernel 4.12.3, btrfs-progs 4.11.1
btrfs-freespace ever doing anything?
Hello, I have a quite simple and possibly stupid question. Since I'm occasionally seeing warnings about failed loading of the free space cache, I wanted to clear and rebuild the space cache. So I mounted the filesystem(s) with -o clear_cache and subsequently with my regular options, which include space_cache. Indeed, dmesg tells me:

[ 60.285190] BTRFS info (device dm-1): force clearing of disk cache

and then

[ 137.151845] BTRFS info (device dm-1): use ssd allocation scheme
[ 137.151850] BTRFS info (device dm-1): disk space caching is enabled
[ 137.151852] BTRFS info (device dm-1): has skinny extents

To my understanding, btrfs-freespace should then start working to rebuild the free space cache. However, I can't remember ever having seen btrfs work hard after clearing the space cache. The drives aren't working much, and the btrfs-freespace processes (which are indeed there) don't do anything either.

So, simple question: Can anyone try to clear their space cache and confirm that btrfs actually does something after doing so? Is there anything I could do to confirm that something is happening?

The drives in question are an SSD and a HDD, both in the range of 1-2 TB in size. I'm on Arch Linux, kernel 4.12.3, btrfs-progs 4.11.1

Best regards
Sebastian
Mounting RAID1 degraded+rw only works once
Hello,

I'm trying to understand how to correctly handle and recover from degraded RAID1 setups in btrfs. In particular, I don't understand a behavior I'm seeing which, for me, takes away part of the advantage of having a RAID in the first place.

The main issue I have is as follows. I can mount a RAID1 with missing devices using the "degraded" option. So far so good. I know I should add another device at that point, but let's say I don't have a device ready but need to keep my system running. The problem is that once I write some data to the filesystem and unmount it, I cannot mount it degraded+rw again, only degraded+ro. And at that point I can't make changes such as adding new devices and rebalancing, as far as I can see, rendering the degraded RAID useless for keeping the system running.

Example:

- Create a RAID1 (data+metadata) with 2 devices.
- Mount the filesystem.
- btrfs fi df /something:

    Data, RAID1: total=112.00MiB, used=40.76MiB
    System, RAID1: total=8.00MiB, used=16.00KiB
    Metadata, RAID1: total=32.00MiB, used=768.00KiB
    GlobalReserve, single: total=16.00MiB, used=0.00B

- Unmount.
- Destroy one of the devices (dd over it or whatever).
- Mount with "-o degraded".
- Write some data to the filesystem (not sure if this is strictly necessary).
- Unmount the filesystem.
- Try to mount with "-o degraded" again. It doesn't allow you to do so (if it does, try to unmount and mount again; sometimes it needs two tries). dmesg says:

    missing devices(1) exceeds the limit(0), writable mount is not allowed

- Mounting with "-o degraded,ro" works.
- btrfs fi df [dev]:

    Data, RAID1: total=112.00MiB, used=40.76MiB
    System, RAID1: total=8.00MiB, used=16.00KiB
    System, single: total=8.00MiB, used=0.00B
    Metadata, RAID1: total=32.00MiB, used=768.00KiB
    Metadata, single: total=28.00MiB, used=80.00KiB
    GlobalReserve, single: total=16.00MiB, used=0.00B

Now there is some RAID1 and some single data, which could make sense, since writing to the filesystem in degraded mode could only go to the single remaining device. Still, I cannot make changes to the filesystem now and can "only" recover the data from it, but that's not really the idea of a RAID1, in my opinion. Any advice?

Versions:
- Kernel 4.4.5-1-ARCH
- btrfs-progs 4.4.1

Best regards
Sebastian
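The reproduction steps above can be condensed into a command sketch. The device names are placeholders, the commands require root, and they destroy all data on both devices; this is an illustration of the sequence described in the mail, not a tested script.

```shell
# Hypothetical devices; adjust before running. Destroys their contents.
mkfs.btrfs -f -d raid1 -m raid1 /dev/sdx /dev/sdy
mount /dev/sdx /mnt
btrfs filesystem df /mnt
umount /mnt

# Simulate losing the second device.
dd if=/dev/zero of=/dev/sdy bs=1M count=64

# The first degraded mount works read-write; new writes land in
# "single" chunks because only one device is present.
mount -o degraded /dev/sdx /mnt
touch /mnt/testfile
umount /mnt

# On kernels of this era (4.4), a second degraded read-write mount is
# refused because of the single chunks; only read-only succeeds.
mount -o degraded /dev/sdx /mnt || mount -o degraded,ro /dev/sdx /mnt
```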
Re: Is it safe to mount subvolumes of already-mounted volumes (even with different options)?
Hello,

I need to clarify: I'm _not_ sharing a drive between multiple computers at the _same_ time. It's a portable device which I use at different locations with different computers. I just wanted to give a rationale for mounting the whole drive to some mountpoint and then also mounting part of that drive (a subvolume) to the respective computer's /home mountpoint. So it's all controlled by the same kernel in the same computer; it's just that part of the filesystem is mounted at multiple mountpoints, much like a bind-mount, except that I'm interested in mounting a subvolume of the already-mounted volume to some other mountpoint. Sorry for the confusion.

Best regards
Sebastian

On 17.07.2014 01:18, Chris Murphy wrote:
> On Jul 16, 2014, at 4:18 PM, Sebastian Ochmann <ochm...@informatik.uni-bonn.de> wrote:
>> Hello, I'm sharing a btrfs-formatted drive between multiple computers
>> and each of the machines has a separate home directory on that drive.
>
> 2+ computers writing to the same block device? I don't see how this is
> safe. Seems possibly a bug that the 1st mount event isn't setting some
> metadata so that another kernel instance knows not to allow another
> mount.
>
> Chris Murphy
Re: Why does btrfs defrag work worse than making a copy of a file?
On 16.07.2014 09:53, Liu Bo wrote:
> On Tue, Jul 15, 2014 at 11:17:26PM +0200, Sebastian Ochmann wrote:
>> Hello, I have a VirtualBox hard drive image which is quite fragmented
>> even after very light use; it is 1.6 GB in size and has around 5000
>> fragments (I'm using filefrag to determine the number of fragments).
>> Doing a btrfs fi defrag -f image.vdi reduced the number of fragments
>> to 3749. Even doing a btrfs fi defrag -f -t 1 image.vdi which should
>> make sure every extent is rewritten (according to the btrfs-progs
>> 3.14.2 manpage) does not yield any better result and seems to return
>> immediately. Copying the file, however, yields a copy which has only
>> 5 fragments (simply doing a cp image.vdi image2.vdi; sync; filefrag
>> image2.vdi). What do I have to do to defrag the file to the minimal
>> number of fragments possible? Am I missing something?
>>
>> Kernel version 3.15.5, btrfs progs 3.14.2, Arch Linux.
>
> So usually btrfs thinks of an extent whose size is bigger than 256K as
> a big enough extent. Another possible reason is that there is something
> wrong with btrfs_fiemap which gives 'filefrag' a wrong output. Would
> you please show us the 'filefrag -v' output?
>
> thanks,
> -liubo

Sure, I have pasted the output of filefrag -v here: http://pastebin.com/kcZhVhkc

However, I think the problem is merely in the documentation (man page of btrfs-filesystem). The description of the -t option differs between two locations and doesn't make sense in general, I think. It is first described as follows:

    "Any extent bigger than threshold given by -t option, will be
    considered already defragged. Use 0 to take the kernel default, and
    use 1 to say every single extent must be rewritten."

So I used -t 1 because I thought it would defrag as much as possible. However, when thinking about it, any extent at least 1 byte in size will be ignored this way, am I correct? Further below, the -t option is described as follows:

    -t size    defragment only files at least size bytes big

Here, the option suddenly refers to the file size. In any case, doing

    btrfs fi defrag -f -t 10G image.vdi

defragged my file to the 5 extents I also get by simply copying the file. I think the documentation should be updated to reflect what the -t option actually does.

Best regards
Sebastian
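Putting the thread's finding into one sequence (file name as used in the thread; the oversized -t value ensures every extent falls below the threshold and gets rewritten):

```shell
# On this btrfs-progs (3.14.x), -t is compared against extent size,
# so a threshold larger than any extent forces a full rewrite.
btrfs fi defrag -f -t 10G image.vdi
sync
filefrag image.vdi   # the thread reports 5 extents after this, same as a fresh copy
```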
Is it safe to mount subvolumes of already-mounted volumes (even with different options)?
Hello,

I'm sharing a btrfs-formatted drive between multiple computers and each of the machines has a separate home directory on that drive. The root of the drive is mounted at /mnt/tray, and the home directory for machine {hostname} is under /mnt/tray/Homes/{hostname}. Up until now, I have mounted /mnt/tray like a normal volume and then did an additional bind-mount of /mnt/tray/Homes/{hostname} to /home.

Now I have a new drive and wanted to do things a bit more advanced by creating subvolumes for each of the machines' home directories so that I can also do independent snapshotting. I guess I could use the bind-mount method like before, but my question is whether it is considered safe to do an additional, regular mount of one of the subvolumes to /home instead, like:

    mount /dev/sdxN /mnt/tray
    mount -o subvol=/Homes/{hostname} /dev/sdxN /home

When I experimented with such additional mounts of subvolumes of already-mounted volumes, I noticed that the mount options of the additional subvolume mount might differ from those of the original mount. For instance, the root volume might be mounted with noatime while the subvolume mount may have relatime.

So my questions are: Is mounting a subvolume of an already-mounted volume considered safe, and are there any combinations of possibly conflicting mount options one should be aware of (compression, autodefrag, cache clearing)? Is it advisable to use the same mount options for all mounts pointing to the same physical device?

Best regards,
Sebastian
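For what it's worth, a persistent version of the double mount described above might look like this in /etc/fstab. The device name and hostname below are placeholders, not taken from the original setup:

```
# Both entries reference the same btrfs filesystem; the second one
# selects a subvolume. /dev/sdb1 and "alpha" are hypothetical.
/dev/sdb1  /mnt/tray  btrfs  noatime                      0 0
/dev/sdb1  /home      btrfs  noatime,subvol=/Homes/alpha  0 0
```

Note that both entries use identical options here, which sidesteps the conflicting-options question raised in the mail.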
Why does btrfs defrag work worse than making a copy of a file?
Hello,

I have a VirtualBox hard drive image which is quite fragmented even after very light use; it is 1.6 GB in size and has around 5000 fragments (I'm using filefrag to determine the number of fragments). Doing

    btrfs fi defrag -f image.vdi

reduced the number of fragments to 3749. Even doing

    btrfs fi defrag -f -t 1 image.vdi

which should make sure every extent is rewritten (according to the btrfs-progs 3.14.2 manpage), does not yield any better result and seems to return immediately. Copying the file, however, yields a copy which has only 5 fragments (simply doing cp image.vdi image2.vdi; sync; filefrag image2.vdi).

What do I have to do to defrag the file to the minimal number of fragments possible? Am I missing something?

Kernel version 3.15.5, btrfs-progs 3.14.2, Arch Linux.

Best regards,
Sebastian
Re: Meaning of "no_csum" field when scrubbing with -R option
Hello,

> Sebastian Ochmann posted on Wed, 19 Feb 2014 13:58:17 +0100 as excerpted:
>> So my question is, why does scrub show a high (i.e. non-zero) value
>> for no_csum? I never enabled nodatasum or a similar option.
>
> Did you enable the nodatacow option? If the nodatacow option is
> enabled, data checksums will also be disabled at the same time.

No, never, not even on single files. Some additional info: the filesystem is only a few weeks old (even though I see similar results on an older filesystem as well), it's my root filesystem, and as mount options I use rw,noatime,ssd,discard,space_cache (it's on an SSD). Kernel version is 3.12.9.

Best regards,
Sebastian
Meaning of "no_csum" field when scrubbing with -R option
Hello everyone,

I have a question: what exactly does the value for no_csum mean when doing a scrub with the -R option? Example output:

    $ sudo btrfs scrub start -BR /
    scrub done for ...
    ...
    csum_errors: 0
    verify_errors: 0
    no_csum: 70517
    csum_discards: 87381
    super_errors: 0
    ...

In the btrfs header, I found the following comment for the no_csum field of the btrfs_scrub_progress struct:

    "# of 4k data block for which no csum is present, probably the
    result of data written with nodatasum"

So my question is: why does scrub show a high (i.e. non-zero) value for no_csum? I never enabled nodatasum or a similar option.

Best regards
Sebastian
Re: [PATCH v4 1/2] Btrfs: fix wrong super generation mismatch when scrubbing supers
Hello,

seems to be working for me (only tested with both parts of the patch applied); I wasn't able to trigger the errors after almost an hour of stress-testing.

Best regards,
Sebastian

On 04.12.2013 14:15, Wang Shilong wrote:
> We came upon a race condition when scrubbing superblocks, the story is:
>
> In committing transaction, we will update @last_trans_commited after
> writing superblocks. If a scrubber starts after writing superblocks
> and before updating @last_trans_commited, a generation mismatch happens!
>
> We fix this by checking @scrub_pause_req, and we won't start a scrubber
> until committing transaction is finished (after btrfs_scrub_continue()
> finished).
>
> Reported-by: Sebastian Ochmann <ochm...@informatik.uni-bonn.de>
> Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
> Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
> ---
> v3->v4: by checking @scrub_pause_req, block a scrubber if we are
> committing a transaction (thanks to Miao and Liu)
> ---
>  fs/btrfs/scrub.c | 45 ++++++++++++++++++++++++++---------------------
>  1 file changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 2544805..d27f95e 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -257,6 +257,7 @@ static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root,
>  static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
>  			    int mirror_num, u64 physical_for_dev_replace);
>  static void copy_nocow_pages_worker(struct btrfs_work *work);
> +static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
>  
>  static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
> @@ -270,6 +271,16 @@ static void scrub_pending_bio_dec(struct scrub_ctx *sctx)
>  	wake_up(&sctx->list_wait);
>  }
>  
> +static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
> +{
> +	while (atomic_read(&fs_info->scrub_pause_req)) {
> +		mutex_unlock(&fs_info->scrub_lock);
> +		wait_event(fs_info->scrub_pause_wait,
> +			   atomic_read(&fs_info->scrub_pause_req) == 0);
> +		mutex_lock(&fs_info->scrub_lock);
> +	}
> +}
> +
>  /*
>   * used for workers that require transaction commits (i.e., for the
>   * NOCOW case)
> @@ -2330,14 +2341,10 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
>  	btrfs_reada_wait(reada2);
>  
>  	mutex_lock(&fs_info->scrub_lock);
> -	while (atomic_read(&fs_info->scrub_pause_req)) {
> -		mutex_unlock(&fs_info->scrub_lock);
> -		wait_event(fs_info->scrub_pause_wait,
> -			   atomic_read(&fs_info->scrub_pause_req) == 0);
> -		mutex_lock(&fs_info->scrub_lock);
> -	}
> +	scrub_blocked_if_needed(fs_info);
>  	atomic_dec(&fs_info->scrubs_paused);
>  	mutex_unlock(&fs_info->scrub_lock);
> +	wake_up(&fs_info->scrub_pause_wait);
>  
>  	/*
> @@ -2377,15 +2384,12 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
>  			atomic_set(&sctx->wr_ctx.flush_all_writes, 0);
>  			atomic_inc(&fs_info->scrubs_paused);
>  			wake_up(&fs_info->scrub_pause_wait);
> +
>  			mutex_lock(&fs_info->scrub_lock);
> -			while (atomic_read(&fs_info->scrub_pause_req)) {
> -				mutex_unlock(&fs_info->scrub_lock);
> -				wait_event(fs_info->scrub_pause_wait,
> -					   atomic_read(&fs_info->scrub_pause_req) == 0);
> -				mutex_lock(&fs_info->scrub_lock);
> -			}
> +			scrub_blocked_if_needed(fs_info);
>  			atomic_dec(&fs_info->scrubs_paused);
>  			mutex_unlock(&fs_info->scrub_lock);
> +			wake_up(&fs_info->scrub_pause_wait);
>  		}
> @@ -2707,14 +2711,10 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
>  			   atomic_read(&sctx->workers_pending) == 0);
>  
>  		mutex_lock(&fs_info->scrub_lock);
> -		while (atomic_read(&fs_info->scrub_pause_req)) {
> -			mutex_unlock(&fs_info->scrub_lock);
> -			wait_event(fs_info->scrub_pause_wait,
> -				   atomic_read(&fs_info->scrub_pause_req) == 0);
> -			mutex_lock(&fs_info->scrub_lock);
> -		}
> +		scrub_blocked_if_needed(fs_info);
>  		atomic_dec(&fs_info->scrubs_paused);
>  		mutex_unlock(&fs_info->scrub_lock);
> +		wake_up(&fs_info->scrub_pause_wait);
>  
>  		btrfs_put_block_group(cache);
> @@ -2926,7 +2926,13 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
>  	}
>  	sctx->readonly = readonly;
>  	dev->scrub_device = sctx;
> +	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>  
> +	/*
> +	 * checking @scrub_pause_req here, we can avoid
> +	 * race between committing transaction and scrubbing.
> +	 */
> +	scrub_blocked_if_needed(fs_info
Re: 2 errors when scrubbing - but I don't know what they mean
Hello,

> However, if you find such superblocks checksum mismatch very often
> during scrub, it may be there is something wrong with the disk!

I'm sorry, but I don't think there's a problem with my disks, because I was able to trigger the errors that increment the gen error counter during scrub on a completely different machine and drive today. I basically performed some I/O operations on a drive and scrubbed at the same time, over and over again, until I actually saw super errors during scrub. But the error is really hard to trigger; it seems to me like a race condition somewhere.

So I went a step further and tried to create a repro for this. It seems like I can trigger the errors now once every few minutes with the method described below, but sometimes it really takes a long time until the error pops up, so be patient when trying this...

For the repro, I'm using a btrfs image in RAM for two reasons: I can scrub quickly over and over again, and I can rule out hard drive errors. My machine has 32 GB of RAM, so that comes in handy here; if you try this on a physical drive, make sure to adjust some parameters if necessary.

Create a tmpfs and a testing image, format as btrfs:

    $ mkdir btrfstest
    $ cd btrfstest/
    $ mkdir tmp
    $ mount -t tmpfs -o size=20G none tmp
    $ dd if=/dev/zero of=tmp/vol bs=1G count=19
    $ mkfs.btrfs tmp/vol
    $ mkdir mnt
    $ mount -o commit=1 tmp/vol mnt

Note the commit=1 mount option. It's not strictly necessary, but I have the feeling it helps with triggering the problem...

So now we have a 19 GB btrfs filesystem in RAM, mounted in mnt. What I did for performing some artificial I/O operations is to rm and cp a Linux source tree over and over again. Suppose you have an unpacked Linux source tree available in the /somewhere/linux directory (and you're using bash). We'll spawn some loops that keep the filesystem busy:

    $ while true; do rm -fr mnt/a; sleep 1.0; cp -R /somewhere/linux mnt/a; sleep 1.0; done
    $ while true; do rm -fr mnt/b; sleep 1.1; cp -R /somewhere/linux mnt/b; sleep 1.1; done
    $ while true; do rm -fr mnt/c; sleep 1.2; cp -R /somewhere/linux mnt/c; sleep 1.2; done

Now that the filesystem is busy, we'll also scrub it repeatedly (without backgrounding, -B):

    $ while true; do btrfs scrub start -B mnt; sleep 0.5; done

On my machine and in RAM, each scrub takes 0-1 seconds, and the total bytes scrubbed should fluctuate (this seems to be especially true with commit=1, but I'm not sure). Get a beverage of your choice and wait.

(about 10 minutes later)

When I was writing this repro, it took about 10 minutes until scrub said:

    total bytes scrubbed: 1.20GB with 2 errors
    error details: super=2
    corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

and in dmesg:

    [15282.155170] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
    [15282.155176] btrfs: bdev /dev/loop0 errs: wr 0, rd 0, flush 0, corrupt 0, gen 2

After that, scrub is happy again and will continue normally until the same errors happen again after a few hundred scrubs or so.

So, all in all, it seems the error can be triggered using normal I/O operations and scrubbing at the right moments, even with a btrfs image in RAM, so no hard drive error is possible. I hope someone can reproduce this and maybe debug it.

Best regards
Sebastian
Re: 2 errors when scrubbing - but I don't know what they mean
Hello,

thank you for your input. I didn't know that btrfs keeps the error counters across mounts/reboots, but that's nice. I'm still trying to figure out how such a generation error may occur in the first place.

One thing I noticed looking at the btrfs code is that the generation error counter only gets incremented in the actual scrubbing code (either in scrub_checksum_super or in scrub_handle_errored_block, both in scrub.c; please correct me if I'm wrong, I'm not a btrfs dev). Also, the dmesg errors I saw were not there at boot time, but appeared about 10 minutes after boot, which was about the time when I started the scrub, so I'm pretty sure that it was the scrub that detected the errors.

The question remains what can cause superblock/gen errors. Sure, it could be some read error, but I'd really like to make sure that it's not a systematic error. I wasn't able to reproduce it yet, though.

Best
Sebastian
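Since the thread notes that btrfs keeps per-device error counters across mounts, one way to inspect them outside of a scrub is btrfs device stats; a sketch, assuming the filesystem is mounted at /mnt (the mountpoint is a placeholder, and the subcommand requires a reasonably recent kernel and btrfs-progs of that era):

```shell
# Print the persistent write/read/flush/corruption/generation error
# counters for each device of the filesystem mounted at /mnt.
btrfs device stats /mnt

# Print the counters and then reset them to zero, which helps tell
# old errors apart from new ones on the next scrub.
btrfs device stats -z /mnt
```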
2 errors when scrubbing - but I don't know what they mean
Hello everyone,

when I scrubbed one of my btrfs volumes today, the result of the scrub was:

    total bytes scrubbed: 1.27TB with 2 errors
    error details: super=2
    corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

and dmesg said:

    btrfs: bdev /dev/mapper/tray errs: wr 0, rd 0, flush 0, corrupt 0, gen 1
    btrfs: bdev /dev/mapper/tray errs: wr 0, rd 0, flush 0, corrupt 0, gen 2

Can someone please enlighten me as to what these errors mean (especially the super and gen values)? As additional info: the drive is sometimes used in a machine with kernel 3.11.6 and sometimes with 3.12.0; could this swapping explain the problem somehow?

Best regards
Sebastian