Re: cause of dmesg call traces?
Adam Bahe posted on Sat, 26 Aug 2017 15:30:54 -0500 as excerpted:

> Hello all. Recently I added another 10TB sas drive to my btrfs array
> and I have received the following messages in dmesg during the
> balance. I was hoping someone could clarify what seems to be causing
> this.
>
> Some additional info, I did a smartctl long test and one of my brand
> new 8TB drives warned me with this:
>
> 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 136
> # 5 Extended offline Completed: servo/seek failure 90% 474 0
>
> Are the messages in dmesg caused by the issues with the hard drive,
> or something else entirely?

I am not a developer, just a btrfs user and list regular, with my reply
being based on what I've seen on-list. For a more authoritative answer
you can wait for other replies, but this one can cover a few basics.

Answering the above question, FWIW, the dmesg below seems to be
something else...

> A few months ago I had a total failure requiring a complete nuke and
> pave so I am trying to track down any potential issues aggressively
> and appreciate any help. Thanks!
>
> Also, how many current_pending_sectors do you tolerate before you
> swap a drive? I am going to pull this drive as soon as this current
> balance finishes. But for future reference it would be good to keep
> an eye on.
>
> [Sat Aug 26 03:01:53 2017] WARNING: CPU: 30 PID: 5516 at
> fs/btrfs/extent-tree.c:3197 btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]

Note warning, not error... It's unexpected but not fatal, and the
balance should continue without making whatever triggered the warning
worse.

If I'm not mistaken (and if I am it doesn't change the conclusion), the
triggering of this warning is a known issue related to a rather narrow
kernel version window. A newer current-series kernel, or potentially an
older LTS-series kernel, could well fix the problem. See below.

> [Sat Aug 26 03:01:53 2017] CPU: 30 PID: 5516 Comm: kworker/u97:5
> Tainted: G W 4.10.6-1.el7.elrepo.x86_64 #1

Kernel 4.10.x.
That's outside this list's recommended and best supported range, tho not
massively so.

Given that this list is development focused and btrfs, while
stabilizing, isn't yet considered fully stable and mature, emphasis
tends to be forward-focused toward relatively new kernels. The list
recommendation is therefore one of the two latest kernel release series
in either the current-mainline-stable or mainline-LTS support tracks.

For the current track, 4.12 is the latest release (with 4.13 getting
close), so 4.12 and 4.11 are best supported, and with 4.13 nearing
release 4.11 is actually already EOLed with no further mainline updates.

For the LTS track, 4.9 is the latest LTS series, with 4.4 the previous
one, and 4.1 the one before that, tho btrfs development is moving fast
enough that 4.1 is no longer recommended, and even with 4.4, requests to
duplicate reported issues with 4.9 may be expected.

So 4.10 has dropped off the recommended list as a non-LTS series kernel
that's too old, and the recommendation would be to either upgrade to the
latest 4.12-stable release (4.12.9 according to kernel.org as I post),
or downgrade to the latest 4.9-LTS release (4.9.45 ATM).

And if I'm not mixing up issues and that's the one I think it is, the
latest 4.12 should have that fix (tho 4.12.0 may not; IIRC the fix made
4.13 and was backported to 4.12.x), and 4.9, IIRC, wasn't subject to the
issue. If you continue to see that warning with 4.13-rc6+, 4.12.9+ or
4.9.45+, then I'm obviously mixed up, and the devs may well be quite
interested, as it may be a new issue.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
cause of dmesg call traces?
Hello all. Recently I added another 10TB sas drive to my btrfs array and
I have received the following messages in dmesg during the balance. I
was hoping someone could clarify what seems to be causing this.

Some additional info, I did a smartctl long test and one of my brand new
8TB drives warned me with this:

197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 136
# 5 Extended offline Completed: servo/seek failure 90% 474 0

Are the messages in dmesg caused by the issues with the hard drive, or
something else entirely? A few months ago I had a total failure
requiring a complete nuke and pave so I am trying to track down any
potential issues aggressively and appreciate any help. Thanks!

Also, how many current_pending_sectors do you tolerate before you swap a
drive? I am going to pull this drive as soon as this current balance
finishes. But for future reference it would be good to keep an eye on.

[Sat Aug 26 03:01:53 2017] WARNING: CPU: 30 PID: 5516 at fs/btrfs/extent-tree.c:3197 btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]
[Sat Aug 26 03:01:53 2017] Modules linked in: dm_mod rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul ghash_clmulni_intel pcbc ext4 aesni_intel jbd2 crypto_simd mbcache glue_helper cryptd intel_cstate intel_rapl_perf ses enclosure pcspkr mei_me lpc_ich input_leds i2c_i801 joydev mfd_core mei sg ioatdma shpchp wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl 8021q lockd garp grace mrp sunrpc ip_tables btrfs xor raid6_pq mlx4_en sd_mod crc32c_intel mlx4_core ast i2c_algo_bit ata_generic
[Sat Aug 26 03:01:53 2017] pata_acpi drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ata_piix mdio mpt3sas ptp raid_class pps_core libata scsi_transport_sas dca fjes
[Sat Aug 26 03:01:53 2017] CPU: 30 PID: 5516 Comm: kworker/u97:5 Tainted: G W 4.10.6-1.el7.elrepo.x86_64 #1
[Sat Aug 26 03:01:53 2017] Hardware name: Supermicro Super Server/X10DRi-T4+, BIOS 2.0 12/17/2015
[Sat Aug 26 03:01:53 2017] Workqueue: writeback wb_workfn (flush-btrfs-2)
[Sat Aug 26 03:01:53 2017] Call Trace:
[Sat Aug 26 03:01:53 2017]  dump_stack+0x63/0x87
[Sat Aug 26 03:01:53 2017]  __warn+0xd1/0xf0
[Sat Aug 26 03:01:53 2017]  warn_slowpath_null+0x1d/0x20
[Sat Aug 26 03:01:53 2017]  btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]
[Sat Aug 26 03:01:53 2017]  run_delalloc_nocow+0x6e7/0xc00 [btrfs]
[Sat Aug 26 03:01:53 2017]  ? test_range_bit+0xd0/0x160 [btrfs]
[Sat Aug 26 03:01:53 2017]  run_delalloc_range+0x7d/0x3a0 [btrfs]
[Sat Aug 26 03:01:53 2017]  ? find_lock_delalloc_range.constprop.56+0x1d1/0x200 [btrfs]
[Sat Aug 26 03:01:53 2017]  writepage_delalloc.isra.48+0x10c/0x170 [btrfs]
[Sat Aug 26 03:01:53 2017]  __extent_writepage+0xd6/0x2e0 [btrfs]
[Sat Aug 26 03:01:53 2017]  extent_write_cache_pages.isra.44.constprop.59+0x2c4/0x480 [btrfs]
[Sat Aug 26 03:01:53 2017]  extent_writepages+0x5c/0x90 [btrfs]
[Sat Aug 26 03:01:53 2017]  ? btrfs_submit_direct+0x8b0/0x8b0 [btrfs]
[Sat Aug 26 03:01:53 2017]  btrfs_writepages+0x28/0x30 [btrfs]
[Sat Aug 26 03:01:53 2017]  do_writepages+0x1e/0x30
[Sat Aug 26 03:01:53 2017]  __writeback_single_inode+0x45/0x330
[Sat Aug 26 03:01:53 2017]  writeback_sb_inodes+0x280/0x570
[Sat Aug 26 03:01:53 2017]  __writeback_inodes_wb+0x8c/0xc0
[Sat Aug 26 03:01:53 2017]  wb_writeback+0x276/0x310
[Sat Aug 26 03:01:53 2017]  wb_workfn+0x2e1/0x410
[Sat Aug 26 03:01:53 2017]  process_one_work+0x165/0x410
[Sat Aug 26 03:01:53 2017]  worker_thread+0x137/0x4c0
[Sat Aug 26 03:01:53 2017]  kthread+0x101/0x140
[Sat Aug 26 03:01:53 2017]  ? rescuer_thread+0x3b0/0x3b0
[Sat Aug 26 03:01:53 2017]  ? kthread_park+0x90/0x90
[Sat Aug 26 03:01:53 2017]  ret_from_fork+0x2c/0x40
[Sat Aug 26 03:01:53 2017] ---[ end trace 7ba8e3b5c60c322d ]---
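For keeping an eye on the pending-sector count over time, as the post
asks, the raw value can be scraped from smartctl output. A minimal
sketch, assuming smartmontools is installed; /dev/sdX and the
parse_pending helper name are placeholders for illustration, not from
the original post:

```shell
# Pull the raw Current_Pending_Sector value (last field of the
# attribute line) so it can be logged or alerted on.
parse_pending() {
    awk '/Current_Pending_Sector/ { print $NF }'
}

# Real usage would be (needs root and a real device):
#   smartctl -A /dev/sdX | parse_pending

# Demo on the attribute line quoted in the post above:
echo "197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 136" | parse_pending
# -> 136
```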
Re: status of inline deduplication in btrfs
On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
> The second has to do with btrfs scaling issues due to reflinking,
> which of course is the operational mechanism for both snapshotting
> and dedup. Snapshotting of course reflinks the entire subvolume, so
> it's reflinking on a /massive/ scale. While normal file operations
> aren't affected much, btrfs maintenance operations such as balance
> and check scale badly enough with snapshotting (due to the
> reflinking) that keeping the number of snapshots per subvolume under
> 250 or so is strongly recommended, and keeping them to double-digits
> or even single-digits is recommended if possible.
>
> Dedup works by reflinking as well, but its effect on btrfs
> maintenance will be far more variable, depending of course on how
> effective the deduping, and thus the reflinking, is. But considering
> that snapshotting is effectively 100% effective deduping of the
> entire subvolume (until the snapshot and active copy begin to
> diverge, at least), that tends to be the worst case, so figuring a
> full two-copy dedup as equivalent to one snapshot is a reasonable
> estimate of effect. If dedup only catches 10%, only once, then it
> would be 10% of a snapshot's effect. If it's 10% but there's 10
> duplicated instances, that's the effect of a single snapshot.
> Assuming of course that the dedup domain is the same as the subvolume
> that's being snapshotted.

Nope, snapshotting is not anywhere near the worst case of dedup:

[/]$ find /bin /sbin /lib /usr /var -type f -exec md5sum '{}' + |
     cut -d' ' -f1 | sort | uniq -c | sort -nr | head

Even on the system parts (ie, ignoring my data) of my desktop, the top
files have the following dup counts: 532 384 373 164 123 122 101. On
this small SSD, the system parts are reflinked by snapshots with 10
dailies, and by deduping with 10 regular chroots, 11 sbuild chroots and
3 full-system lxc containers (chroots are mostly a zoo of different
architectures).
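That pipeline can be tried out on a small synthetic tree to see what the
counts mean. A sketch; the /tmp paths and file contents below are
placeholders, not from the original post:

```shell
# Build a tiny tree with one duplicated file and one unique file,
# then run the same duplicate-count pipeline over it.
mkdir -p /tmp/duptest/a /tmp/duptest/b
echo "same payload" > /tmp/duptest/a/one
echo "same payload" > /tmp/duptest/b/two
echo "unique bytes" > /tmp/duptest/b/three

find /tmp/duptest -type f -exec md5sum '{}' + |
    cut -d' ' -f1 | sort | uniq -c | sort -nr | head
# First line has count 2 (the shared content), second line count 1.
```

Each output line is "count hash": the count is how many files hash to
the same content, which is exactly what a dedup pass would reflink
together.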
This is nothing compared to the backup server, which stores backups of
46 machines (only system/user and small data, bulky stuff is backed up
elsewhere), 24 snapshots each (a mix of dailies, 1/11/21, monthlies and
a yearly). This worked well enough until I made the mistake of deduping
the whole thing.

But, this is still not the worst horror imaginable. I'd recommend using
whole-file dedup only, as it avoids the following pitfall: take two VM
images and run block dedup on them. Identical blocks in them will be
cross-reflinked, and there's _many_. The vast majority of duplicate
blocks are all-zero: I just ran fallocate -d on a 40G win10 VM and it
shrank to 19G. AFAIK file_extent_same is not yet smart enough to dedupe
them to a hole instead.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄
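The fallocate -d (--dig-holes) trick mentioned above replaces runs of
all-zero blocks with holes without changing the file's contents or
logical size. A minimal sketch on a throwaway file; /tmp/demo.img and
the 4M size are illustrative placeholders:

```shell
# Create a file that is 4 MiB of zeros, then punch holes in the
# zero-filled extents. The logical size (%s) stays the same; the
# allocated block count (%b) drops sharply.
dd if=/dev/zero of=/tmp/demo.img bs=1M count=4 2>/dev/null
echo "size:   $(stat -c %s /tmp/demo.img) bytes"   # logical size
echo "before: $(stat -c %b /tmp/demo.img) blocks"  # allocated blocks
fallocate -d /tmp/demo.img                         # --dig-holes
echo "after:  $(stat -c %b /tmp/demo.img) blocks"  # far fewer blocks
```

Reads of the holes still return zeros, so tools that only look at file
contents see no difference; only the on-disk allocation shrinks.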
Re: [PATCH] btrfs: resume qgroup rescan on rw remount
On 07/12/2017 03:03 AM, David Sterba wrote:
> On Mon, Jul 10, 2017 at 04:56:36PM +0300, Nikolay Borisov wrote:
>> On 10.07.2017 16:12, Nikolay Borisov wrote:
>>> On 4.07.2017 14:49, Aleksa Sarai wrote:
>>>> Several distributions mount the "proper root" as ro during initrd
>>>> and then remount it as rw before pivot_root(2). Thus, if a rescan
>>>> had been aborted by a previous shutdown, the rescan would never be
>>>> resumed.
>>>>
>>>> This issue would manifest itself as several btrfs ioctl(2)s
>>>> causing the entire machine to hang when
>>>> btrfs_qgroup_wait_for_completion was hit (due to the
>>>> fs_info->qgroup_rescan_running flag being set but the rescan
>>>> itself not being resumed). Notably, Docker's btrfs storage driver
>>>> makes regular use of BTRFS_QUOTA_CTL_DISABLE and
>>>> BTRFS_IOC_QUOTA_RESCAN_WAIT (causing this problem to be manifested
>>>> on boot for some machines).
>>>>
>>>> Cc:  # v3.11+
>>>> Cc: Jeff Mahoney
>>>> Fixes: b382a324b60f ("Btrfs: fix qgroup rescan resume on mount")
>>>> Signed-off-by: Aleksa Sarai
>>>
>>> Indeed, looking at the code it seems that b382a324b60f ("Btrfs: fix
>>> qgroup rescan resume on mount") missed adding the
>>> qgroup_rescan_resume in the remount path. One thing which I
>>> couldn't verify though is whether reading fs_info->qgroup_flags
>>> without any locking is safe from remount context.
>>
>> During remount I don't see any locks taken that prevent operations
>> which can modify qgroup_flags. Further inspection reveals that the
>> access rules to qgroup_flags are somewhat broken so this patch
>> doesn't really make things any worse than they are.
>
> The usage follows a pattern for a bitfield, updated by
> set_bit/clear_bit etc. The updates to the state or inconsistency are
> not safe, so some updates could get lost under some circumstances.
>
> Patch added to devel queue, possibly will be submitted to 4.13 so
> stable can pick it.

Looks like it wasn't merged in the 4.13 window (so stable hasn't picked
it), will this be submitted for 4.14? Thanks.
-- 
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/