Re: cause of dmesg call traces?

2017-08-26 Thread Duncan
Adam Bahe posted on Sat, 26 Aug 2017 15:30:54 -0500 as excerpted:

> Hello all. Recently I added another 10TB sas drive to my btrfs array and
> I have received the following messages in dmesg during the balance. I
> was hoping someone could clarify what seems to be causing this.
> 
> Some additional info, I did a smartctl long test and one of my brand new
> 8TB drives warned me with this:
> 
> 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 136
> # 5  Extended offline    Completed: servo/seek failure    90%    474    0
> 
> Are the messages in dmesg caused by the issues with the hard drive, or
> something else entirely?

I am not a developer, just a btrfs user and list regular, so my reply is 
based on what I've seen on-list.  For a more authoritative answer you can 
wait for other replies, but this one can cover a few basics.

Answering the above question, FWIW, the dmesg below seems to be something 
else...

> A few months ago I had a total failure
> requiring a complete nuke and pave so I am trying to track down any
> potential issues aggressively and appreciate any help. Thanks!
> 
> Also, how many current_pending_sectors do you tolerate before you swap a
> drive? I am going to pull this drive as soon as this current balance
> finishes. But for future reference it would be good to keep an eye on.
> 
> 
> 
> [Sat Aug 26 03:01:53 2017] WARNING: CPU: 30 PID: 5516 at
> fs/btrfs/extent-tree.c:3197 btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]

Note that it's a warning, not an error...  It's unexpected but not fatal, 
and the balance should continue without making whatever triggered the 
warning any worse.

If I'm not mistaken (and if I am, it doesn't change the conclusion), this 
warning is a known issue confined to a rather narrow kernel version 
window.  A newer current-series kernel, or potentially an older LTS-series 
kernel, could well fix the problem.  See below.

> [Sat Aug 26 03:01:53 2017] CPU: 30 PID: 5516 Comm: kworker/u97:5
> Tainted: GW   4.10.6-1.el7.elrepo.x86_64 #1

Kernel 4.10.x.  That's outside this list's recommended and best supported 
range, tho not massively so.  Given that this list is development focused 
and btrfs, while stabilizing, isn't yet considered fully stable and 
mature, emphasis tends to be forward-focused toward relatively new 
kernels.

The list recommendation is therefore one of the two latest kernel release 
series in either current-mainline-stable or mainline-LTS support tracks.

For the current track, 4.12 is the latest release, so 4.12 and 4.11 are 
best supported, tho with 4.13 nearing release, 4.11 is already EOLed and 
gets no further mainline updates.

For the LTS track, 4.9 is the latest LTS series, with 4.4 the previous 
one and 4.1 the one before that, tho btrfs development is moving fast 
enough that 4.1 is no longer recommended, and even with 4.4, expect 
requests to reproduce reported issues on 4.9.

So 4.10 has dropped off the recommended list as a non-LTS series kernel 
that's too old, and the recommendation would be to either upgrade to the 
latest 4.12-stable release (4.12.9 according to kernel.org as I post), or 
downgrade to the latest 4.9-LTS release (4.9.45 ATM).
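
If it helps, here's a throwaway shell sketch for checking the running 
kernel against those series (nothing authoritative, and the version 
numbers are only current as of this writing):

ver=$(uname -r | cut -d. -f1,2)
# 4.12/4.13 on the current-stable track, 4.9 on LTS, per the above
case "$ver" in
  4.13|4.12|4.9) echo "$ver: within the list-recommended series" ;;
  *) echo "$ver: outside the recommended series; consider 4.12.9+ or 4.9.45+" ;;
esac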

And if I'm not mixing up issues and that's the one I think it is, the 
latest 4.12 should have the fix (tho 4.12.0 may not; IIRC the fix landed 
in 4.13 and was backported to 4.12.x), while 4.9, IIRC, wasn't subject to 
the issue at all.

If you continue to see that warning with 4.13-rc6+, 4.12.9+ or 4.9.45+, 
then I'm obviously mixed up, and the devs may well be quite interested as 
it may be a new issue.
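
As for your other question, keeping an eye on Current_Pending_Sector 
between full self-tests is easy enough with smartctl.  A minimal sketch 
(/dev/sdX is a placeholder for the drive in question, and how many pending 
sectors to tolerate is local policy, nothing btrfs-specific):

# re-run the long self-test (takes several hours on an 8-10 TB drive)
smartctl -t long /dev/sdX

# later: the self-test log plus the pending/reallocated sector attributes
smartctl -l selftest /dev/sdX
smartctl -A /dev/sdX | grep -E 'Current_Pending_Sector|Reallocated_Sector_Ct'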

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



cause of dmesg call traces?

2017-08-26 Thread Adam Bahe
Hello all. Recently I added another 10TB sas drive to my btrfs array
and I have received the following messages in dmesg during the
balance. I was hoping someone could clarify what seems to be causing
this.

Some additional info, I did a smartctl long test and one of my brand
new 8TB drives warned me with this:

197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 136
# 5  Extended offline    Completed: servo/seek failure    90%    474    0

Are the messages in dmesg caused by the issues with the hard drive, or
something else entirely? A few months ago I had a total failure
requiring a complete nuke and pave so I am trying to track down any
potential issues aggressively and appreciate any help. Thanks!

Also, how many current_pending_sectors do you tolerate before you swap
a drive? I am going to pull this drive as soon as this current balance
finishes. But for future reference it would be good to keep an eye on.



[Sat Aug 26 03:01:53 2017] WARNING: CPU: 30 PID: 5516 at
fs/btrfs/extent-tree.c:3197 btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]
[Sat Aug 26 03:01:53 2017] Modules linked in: dm_mod rpcrdma ib_isert
iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt
target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib ib_core sb_edac
edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass iTCO_wdt crct10dif_pclmul iTCO_vendor_support crc32_pclmul
ghash_clmulni_intel pcbc ext4 aesni_intel jbd2 crypto_simd mbcache
glue_helper cryptd intel_cstate intel_rapl_perf ses enclosure pcspkr
mei_me lpc_ich input_leds i2c_i801 joydev mfd_core mei sg ioatdma
shpchp wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter
acpi_pad nfsd auth_rpcgss nfs_acl 8021q lockd garp grace mrp sunrpc
ip_tables btrfs xor raid6_pq mlx4_en sd_mod crc32c_intel mlx4_core ast
i2c_algo_bit ata_generic
[Sat Aug 26 03:01:53 2017]  pata_acpi drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ata_piix mdio mpt3sas
ptp raid_class pps_core libata scsi_transport_sas dca fjes
[Sat Aug 26 03:01:53 2017] CPU: 30 PID: 5516 Comm: kworker/u97:5
Tainted: GW   4.10.6-1.el7.elrepo.x86_64 #1
[Sat Aug 26 03:01:53 2017] Hardware name: Supermicro Super
Server/X10DRi-T4+, BIOS 2.0 12/17/2015
[Sat Aug 26 03:01:53 2017] Workqueue: writeback wb_workfn (flush-btrfs-2)
[Sat Aug 26 03:01:53 2017] Call Trace:
[Sat Aug 26 03:01:53 2017]  dump_stack+0x63/0x87
[Sat Aug 26 03:01:53 2017]  __warn+0xd1/0xf0
[Sat Aug 26 03:01:53 2017]  warn_slowpath_null+0x1d/0x20
[Sat Aug 26 03:01:53 2017]  btrfs_cross_ref_exist+0xd1/0xf0 [btrfs]
[Sat Aug 26 03:01:53 2017]  run_delalloc_nocow+0x6e7/0xc00 [btrfs]
[Sat Aug 26 03:01:53 2017]  ? test_range_bit+0xd0/0x160 [btrfs]
[Sat Aug 26 03:01:53 2017]  run_delalloc_range+0x7d/0x3a0 [btrfs]
[Sat Aug 26 03:01:53 2017]  ? find_lock_delalloc_range.constprop.56+0x1d1/0x200 [btrfs]
[Sat Aug 26 03:01:53 2017]  writepage_delalloc.isra.48+0x10c/0x170 [btrfs]
[Sat Aug 26 03:01:53 2017]  __extent_writepage+0xd6/0x2e0 [btrfs]
[Sat Aug 26 03:01:53 2017]  extent_write_cache_pages.isra.44.constprop.59+0x2c4/0x480 [btrfs]
[Sat Aug 26 03:01:53 2017]  extent_writepages+0x5c/0x90 [btrfs]
[Sat Aug 26 03:01:53 2017]  ? btrfs_submit_direct+0x8b0/0x8b0 [btrfs]
[Sat Aug 26 03:01:53 2017]  btrfs_writepages+0x28/0x30 [btrfs]
[Sat Aug 26 03:01:53 2017]  do_writepages+0x1e/0x30
[Sat Aug 26 03:01:53 2017]  __writeback_single_inode+0x45/0x330
[Sat Aug 26 03:01:53 2017]  writeback_sb_inodes+0x280/0x570
[Sat Aug 26 03:01:53 2017]  __writeback_inodes_wb+0x8c/0xc0
[Sat Aug 26 03:01:53 2017]  wb_writeback+0x276/0x310
[Sat Aug 26 03:01:53 2017]  wb_workfn+0x2e1/0x410
[Sat Aug 26 03:01:53 2017]  process_one_work+0x165/0x410
[Sat Aug 26 03:01:53 2017]  worker_thread+0x137/0x4c0
[Sat Aug 26 03:01:53 2017]  kthread+0x101/0x140
[Sat Aug 26 03:01:53 2017]  ? rescuer_thread+0x3b0/0x3b0
[Sat Aug 26 03:01:53 2017]  ? kthread_park+0x90/0x90
[Sat Aug 26 03:01:53 2017]  ret_from_fork+0x2c/0x40
[Sat Aug 26 03:01:53 2017] ---[ end trace 7ba8e3b5c60c322d ]---


Re: status of inline deduplication in btrfs

2017-08-26 Thread Adam Borowski
On Sat, Aug 26, 2017 at 01:36:35AM +, Duncan wrote:
> The second has to do with btrfs scaling issues due to reflinking, which 
> of course is the operational mechanism for both snapshotting and dedup.  
> Snapshotting of course reflinks the entire subvolume, so it's reflinking 
> on a /massive/ scale.  While normal file operations aren't affected much, 
> btrfs maintenance operations such as balance and check scale badly enough 
> with snapshotting (due to the reflinking) that keeping the number of 
> snapshots per subvolume under 250 or so is strongly recommended, and 
> keeping them to double-digits or even single-digits is recommended if 
> possible.
> 
> Dedup works by reflinking as well, but its effect on btrfs maintenance 
> will be far more variable, depending of course on how effective the 
> deduping, and thus the reflinking, is.  But considering that snapshotting 
> is effectively 100% effective deduping of the entire subvolume (until the 
> snapshot and active copy begin to diverge, at least), that tends to be 
> the worst case, so figuring a full two-copy dedup as equivalent to one 
> snapshot is a reasonable estimate of effect.  If dedup only catches 10%, 
> only once, then it would be 10% of a snapshot's effect.  If it's 10% but 
> there's 10 duplicated instances, that's the effect of a single snapshot.  
> Assuming of course that the dedup domain is the same as the subvolume 
> that's being snapshotted.

Nope, snapshotting is not anywhere near the worst case of dedup:

[/]$ find /bin /sbin /lib /usr /var -type f -exec md5sum '{}' +|
cut -d' ' -f1|sort|uniq -c|sort -nr|head

Even on the system parts (ie, ignoring my data) of my desktop, top files
have the following dup counts: 532 384 373 164 123 122 101.  On this small
SSD, the system parts are reflinked by snapshots with 10 dailies, and by
deduping with 10 regular chroots, 11 sbuild chroots and 3 full-system lxc
containers (chroots are mostly a zoo of different architectures).

This is nothing compared to the backup server, which stores backups of 46
machines (only system/user and small data, bulky stuff is backed up
elsewhere), 24 snapshots each (a mix of dailies, 1/11/21, monthlies and
yearly).  This worked well enough until I made the mistake of deduping the
whole thing.

But this is still not the worst horror imaginable.  I'd recommend using
whole-file dedup only, as that avoids the following pitfall: take two VM
images and run block dedup on them.  Identical blocks in them will be
cross-reflinked, and there are _many_; the vast majority of duplicate
blocks are all-zero.  I just ran fallocate -d on a 40G win10 VM and it
shrank to 19G.  AFAIK file_extent_same is not yet smart enough to dedupe
them to a hole instead.
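
For anyone wanting to try the same, the hole-digging step looks like this 
(file name hypothetical; fallocate -d only deallocates aligned runs of 
zero bytes, so the file contents are unchanged):

ls -lsh win10.img        # allocated size (first column) vs apparent size
fallocate -d win10.img   # -d / --dig-holes: punch holes over all-zero ranges
ls -lsh win10.img        # allocated size drops; apparent size stays the same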


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀ -- Genghis Ht'rok'din
⠈⠳⣄ 


Re: [PATCH] btrfs: resume qgroup rescan on rw remount

2017-08-26 Thread Aleksa Sarai

On 07/12/2017 03:03 AM, David Sterba wrote:

On Mon, Jul 10, 2017 at 04:56:36PM +0300, Nikolay Borisov wrote:

On 10.07.2017 16:12, Nikolay Borisov wrote:

On  4.07.2017 14:49, Aleksa Sarai wrote:

Several distributions mount the "proper root" as ro during initrd and
then remount it as rw before pivot_root(2). Thus, if a rescan had been
aborted by a previous shutdown, the rescan would never be resumed.

This issue would manifest itself as several btrfs ioctl(2)s causing the
entire machine to hang when btrfs_qgroup_wait_for_completion was hit
(due to the fs_info->qgroup_rescan_running flag being set but the rescan
itself not being resumed). Notably, Docker's btrfs storage driver makes
regular use of BTRFS_QUOTA_CTL_DISABLE and BTRFS_IOC_QUOTA_RESCAN_WAIT
(causing this problem to be manifested on boot for some machines).

Cc:  # v3.11+
Cc: Jeff Mahoney 
Fixes: b382a324b60f ("Btrfs: fix qgroup rescan resume on mount")
Signed-off-by: Aleksa Sarai 
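
(Not part of the patch, just a hedged sketch of the failure mode described 
above.  Device and mount point are hypothetical, and it assumes a qgroup 
rescan was left unfinished by an earlier shutdown.)

mount -o ro /dev/sdX /sysroot     # initrd mounts the real root read-only
mount -o remount,rw /sysroot      # remounted rw before pivot_root(2)
btrfs quota rescan -w /sysroot    # waits via BTRFS_IOC_QUOTA_RESCAN_WAIT;
                                  # hangs without the fix, since the
                                  # interrupted rescan is never resumed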


Indeed, looking at the code it seems that b382a324b60f ("Btrfs: fix
qgroup rescan resume on mount") missed adding the qgroup_rescan_resume
call in the remount path.  One thing I couldn't verify, though, is whether
reading fs_info->qgroup_flags without any locking is safe from the remount
context.

During remount I don't see any locks taken that prevent operations which
can modify qgroup_flags.


Further inspection reveals that the access rules for qgroup_flags are
somewhat broken, so this patch doesn't really make things any worse than
they already are.


The usage follows a bitfield pattern, updated by set_bit/clear_bit etc.
The updates to the state or inconsistency flags are not safe, so some
updates could get lost under some circumstances.

Patch added to devel queue, possibly will be submitted to 4.13 so stable
can pick it.


Looks like it wasn't merged in the 4.13 window (so stable hasn't picked 
it up); will this be submitted for 4.14? Thanks.


--
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/