On Thu, Jan 09, 2020 at 01:02:35PM +0000, Neil Brown wrote:
> This is just a heads up in case anyone else has seen or sees
> something similar.

*hijacking thread*

We've been seeing fairly regular panics on 1.8.2 running on CentOS 7
(log below). It's clearly a ZFS bug and not AFS, but the curious thing
is that:

* we've never seen this on any of our FreeBSD AFS servers (also ZFS)
* we've never seen this ZFS behavior on anything other than our AFS
  fileservers, across machines running various flavors of Linux,
  FreeBSD, and IllumOS. (NFS fileservers have never seen it;
  virtualization, never seen it; webservers, never seen it)
* we've never seen this on any of our AFS fileservers using
  Linux + ext4, only with ZFSoL.

So it seems the AFS fileserver is doing something peculiar that ZFSoL
doesn't like. Though others have reported other applications triggering
it, we've never seen it with anything other than the AFS fileserver.

Here's a link to the ZFS bug report that seems relevant:

https://github.com/zfsonlinux/zfs/issues/8673

On a dafileserver, only the volumes that trigger the bug hang. On the
traditional fileserver, the whole thing hangs.

Here is a recent hang:

Jan 10 21:07:08 sloth kernel: PANIC: zfs: accessing past end of object 8/4a7559 (size=9216 access=4136+8848)
Jan 10 21:07:08 sloth kernel: Showing stack for process 10371
Jan 10 21:07:08 sloth kernel: CPU: 2 PID: 10371 Comm: dafileserver Kdump: loaded Tainted: P OE ------------ 3.10.0-957.27.2.el7.x86_64 #1
Jan 10 21:07:08 sloth kernel: Hardware name: Supermicro X8DT6/X8DT6, BIOS 2.0b 08/30/2011
Jan 10 21:07:08 sloth kernel: Call Trace:
Jan 10 21:07:08 sloth kernel: [<ffffffffb8764147>] dump_stack+0x19/0x1b
Jan 10 21:07:08 sloth kernel: [<ffffffffc070b9db>] spl_dumpstack+0x2b/0x30 [spl]
Jan 10 21:07:08 sloth kernel: [<ffffffffc070bb5c>] vcmn_err+0x6c/0x110 [spl]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffc10ee5ba>] ? dmu_zfetch+0x4ea/0x590 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10cb863>] ? dbuf_rele_and_unlock+0x283/0x5c0 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffc10c85f3>] ? dbuf_find+0x1e3/0x200 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffb821e911>] ? __kmalloc_node+0x1d1/0x2b0
Jan 10 21:07:08 sloth kernel: [<ffffffffc11483a9>] zfs_panic_recover+0x69/0x90 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10d72a7>] dmu_buf_hold_array_by_dnode+0x2d7/0x4a0 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc110560a>] ? dsl_dir_tempreserve_space+0x1fa/0x4a0 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10d8c45>] dmu_write_uio_dnode+0x55/0x150 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10ed9ad>] ? dmu_tx_assign+0x20d/0x490 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10d8d94>] dmu_write_uio_dbuf+0x54/0x70 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc11abe6c>] zfs_write+0xd3c/0xed0 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc10c85f3>] ? dbuf_find+0x1e3/0x200 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffb8768192>] ? mutex_lock+0x12/0x2f
Jan 10 21:07:08 sloth kernel: [<ffffffffb80ddd9e>] ? account_entity_dequeue+0xae/0xd0
Jan 10 21:07:08 sloth kernel: [<ffffffffb802a621>] ? __switch_to+0x151/0x580
Jan 10 21:07:08 sloth kernel: [<ffffffffc11cb74e>] zpl_write_common_iovec.constprop.8+0x9e/0x100 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffc11cb8b4>] zpl_aio_write+0x104/0x120 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8241cdb>] do_sync_readv_writev+0x7b/0xd0
Jan 10 21:07:08 sloth kernel: [<ffffffffb824391e>] do_readv_writev+0xce/0x260
Jan 10 21:07:08 sloth kernel: [<ffffffffc11cb7b0>] ? zpl_write_common_iovec.constprop.8+0x100/0x100 [zfs]
Jan 10 21:07:08 sloth kernel: [<ffffffffb8241b80>] ? do_sync_read+0xe0/0xe0
Jan 10 21:07:08 sloth kernel: [<ffffffffb8243b45>] vfs_writev+0x35/0x60
Jan 10 21:07:08 sloth kernel: [<ffffffffb8243f42>] SyS_pwritev+0xc2/0xf0
Jan 10 21:07:08 sloth kernel: [<ffffffffb8776ddb>] system_call_fastpath+0x22/0x27
Jan 10 21:07:08 sloth kernel: [<ffffffffb8776d21>] ? system_call_after_swapgs+0xae/0x146

-- 
M. Casper Lewis                | [email protected]
Systems Administrator          | Voice: (530) 754-7978
Genome Center                  |
University of California, Davis |
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
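
For anyone comparing their own logs against the PANIC line in the trace above, here is a rough sketch of how the numbers in it can be read. It assumes (this is my reading of the message, not something stated in the post) that the object is printed as dataset/object in hex, that "size" is the object's data block size in bytes, and that "access" is offset+length in decimal. The function name parse_panic and the dictionary keys are made up for illustration.

```python
import re

# Panic line copied verbatim from the kernel log in this message.
PANIC = ("PANIC: zfs: accessing past end of object 8/4a7559 "
         "(size=9216 access=4136+8848)")

# Assumed layout (not confirmed by the post): hex dataset/object ids,
# then decimal size, offset and length.
PATTERN = re.compile(
    r"accessing past end of object "
    r"(?P<dataset>[0-9a-fA-F]+)/(?P<object>[0-9a-fA-F]+) "
    r"\(size=(?P<size>\d+) access=(?P<offset>\d+)\+(?P<length>\d+)\)"
)


def parse_panic(line):
    """Return the fields of a 'past end of object' panic line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    size = int(m.group("size"))
    offset = int(m.group("offset"))
    length = int(m.group("length"))
    end = offset + length  # first byte past the requested range
    return {
        "dataset": int(m.group("dataset"), 16),
        "object": int(m.group("object"), 16),
        "size": size,
        "offset": offset,
        "length": length,
        "end": end,
        "overrun": end - size,  # positive when the access runs past 'size'
    }


if __name__ == "__main__":
    info = parse_panic(PANIC)
    print("write ends at byte", info["end"],
          "which is", info["overrun"], "bytes past size", info["size"])
```

Under those assumptions, the write in this hang starts inside the object (offset 4136 is below 9216) but ends at 4136 + 8848 = 12984, which is 3768 bytes past the reported size.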
