Re: btrfs btree_ctree_super fault
> On 11/17/2016 12:39 AM, Chris Cui wrote:
> > We have just encountered the same bug on 4.9.0-rc2. Any solution now?
> >
> > > kernel BUG at fs/btrfs/ctree.c:3172!
> > > invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
> > > task: 8804ffde37c0 task.stack: c90002188000
> > > RIP: 0010:[] [] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
> > > RSP: :c9000218b8a8 EFLAGS: 00010246
> > > RAX: RBX: 8804fddcf348 RCX: 1000
> > > RDX: RSI: c9000218b9ce RDI: c9000218b8c7
> > > RBP: c9000218b908 R08: 4000 R09: c9000218b8c8
> > > R10: R11: 0001 R12: c9000218b8b6
> > > R13: c9000218b9ce R14: 0001 R15: 880480684a88
> > > FS: 7f7c7f998b40() GS:88050780() knlGS:
> > > CS: 0010 DS: ES: CR0: 80050033
> > > CR2: CR3: 00044f15f000 CR4: 001406f0
> > > DR0: 7f4ce439d000 DR1: DR2:
> > > DR3: DR6: 0ff0 DR7: 0600
> > > Stack:
> > > 88050143 d305a00a2245 006c0002 0510
> > > 6c0002d3 1000 6427eebb 880480684a88
> > > 8804fddcf348 2000
> > > Call Trace:
> > > [] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
>
> We're going to bash on Josef's patch and probably send it with the next merge window (queued for stable as well).
>
> https://patchwork.kernel.org/patch/9431679/
>
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Hello,

We are seeing this issue regularly across many of the CentOS 7 servers we use for automated software builds. We’ve hit what seems to be this bug from kernel 3.10 through to 4.9.5-1 on physical hardware (HP BL460C G7 Blades, P410i RAID controller in RAID1) for several years now.

I’m finding it a little hard to navigate the plethora of mailing list archives and changelogs I’ve found thus far, and from the patch Chris provided above I couldn’t find a way to see if this had been merged into the kernel, so I’m wondering:

1) Did it make it in?
2) If so, in what kernel version? (and if possible, how can one correlate this information to a release in the future)
3) And finally, if so, do people generally agree that it’s resolved the issue?

Below is a crash (resulting in a reboot) we experienced this morning on one of the hosts. (Note that since rebooting, this host has booted into a newer 4.9.9 kernel.)

Kernel at time of crash: 4.9.5-1.el7.elrepo.x86_64

root@s1-b12:~ # btrfs --version
btrfs-progs v4.4.1

root@s1-b12:~ # btrfs fi show
Label: none uuid: 87f6d740-0675-41d7-896d-b04d252c7783
  Total devices 1 FS bytes used 1.08GiB
  devid 1 size 426.61GiB used 4.02GiB path /dev/sda3

root@s1-b12:~ # btrfs fi df /var/lib/docker
Data, single: total=2.01GiB, used=1.00GiB
System, DUP: total=8.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=76.88MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

[1712950.168671] [ cut here ]
[1712950.169806] kernel BUG at fs/btrfs/ctree.c:3172!
[1712950.170925] invalid opcode: [#1] SMP
[1712950.172034] Modules linked in: fuse ufs hfsplus hfs vfat msdos fat veth binfmt_misc mptctl mptbase ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack bonding xfs libcrc32c intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd intel_cstate btrfs xor ipmi_devintf raid6_pq iTCO_wdt gpio_ich iTCO_vendor_support pcspkr sg lpc_ich mfd_core hpwdt hpilo ipmi_si ipmi_msghandler be2iscsi iscsi_boot_sysfs libiscsi i7core_edac acpi_power_meter scsi_transport_iscsi edac_core shpchp pcc_cpufreq acpi_cpufreq ip_tables ext4 jbd2 mbcache sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[1712950.179662] crc32c_intel fb_sys_fops serio_raw ttm hpsa drm scsi_transport_sas be2net fjes dm_mirror dm_region_hash dm_log dm_mod
[1712950.182391] CPU: 7 PID: 18324 Comm: apt-get Tainted: G I 4.9.5-1.el7.elrepo.x86_64 #1
[1712950.183805] Hardware name: HP ProLiant BL460c G7, BIOS I27 08/16/2015
[1712950.185223] task: 880549f48000 task.stack: c9000d64
[1712950.186655] RIP: 0010:[] [] btrfs_set_item_key_safe+0x172/0x180 [btrfs]
[1712950.188180] RSP: 0018:c9000d643920 EFLAGS: 00010246
[1712950.189664] RAX: RBX: 0031 RCX: 000a
[1712950.191155] RDX: RSI: c9000d643a3e RDI: c9000d64393f
[1712950.192639] RBP: c9000d643980 R08: 4000 R09: c9000d643940
[1712950.194111] R10: R11: 0003 R12: c9000d64392e
[1712950.195569] R13: 8808efb15d90 R14: c9000d643a3e R15: 8807ef220d20
[1712950.197044] FS: 7ff1686d56e0() GS:880bdb8c() knlGS:
[1712950.198529] CS: 0010 DS: ES: CR0: 80050033
[1712950.28] CR2: 7ff1672adb8c CR3: 000ac136f000 CR4: 06e0
[1712950.201524] Stack:
[1712950.203021] 8812b46
Re: btrfs btree_ctree_super fault
On 11/17/2016 12:39 AM, Chris Cui wrote:
> We have just encountered the same bug on 4.9.0-rc2. Any solution now?
>
> > kernel BUG at fs/btrfs/ctree.c:3172!
> > invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
> > task: 8804ffde37c0 task.stack: c90002188000
> > RIP: 0010:[] [] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
> > RSP: :c9000218b8a8 EFLAGS: 00010246
> > RAX: RBX: 8804fddcf348 RCX: 1000
> > RDX: RSI: c9000218b9ce RDI: c9000218b8c7
> > RBP: c9000218b908 R08: 4000 R09: c9000218b8c8
> > R10: R11: 0001 R12: c9000218b8b6
> > R13: c9000218b9ce R14: 0001 R15: 880480684a88
> > FS: 7f7c7f998b40() GS:88050780() knlGS:
> > CS: 0010 DS: ES: CR0: 80050033
> > CR2: CR3: 00044f15f000 CR4: 001406f0
> > DR0: 7f4ce439d000 DR1: DR2:
> > DR3: DR6: 0ff0 DR7: 0600
> > Stack:
> > 88050143 d305a00a2245 006c0002 0510
> > 6c0002d3 1000 6427eebb 880480684a88
> > 8804fddcf348 2000
> > Call Trace:
> > [] __btrfs_drop_extents+0xb00/0xe30 [btrfs]

We're going to bash on Josef's patch and probably send it with the next merge window (queued for stable as well).

https://patchwork.kernel.org/patch/9431679/

-chris
Re: btrfs btree_ctree_super fault
We have just encountered the same bug on 4.9.0-rc2. Any solution now?

> kernel BUG at fs/btrfs/ctree.c:3172!
> invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
> task: 8804ffde37c0 task.stack: c90002188000
> RIP: 0010:[] [] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
> RSP: :c9000218b8a8 EFLAGS: 00010246
> RAX: RBX: 8804fddcf348 RCX: 1000
> RDX: RSI: c9000218b9ce RDI: c9000218b8c7
> RBP: c9000218b908 R08: 4000 R09: c9000218b8c8
> R10: R11: 0001 R12: c9000218b8b6
> R13: c9000218b9ce R14: 0001 R15: 880480684a88
> FS: 7f7c7f998b40() GS:88050780() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: CR3: 00044f15f000 CR4: 001406f0
> DR0: 7f4ce439d000 DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0600
> Stack:
> 88050143 d305a00a2245 006c0002 0510
> 6c0002d3 1000 6427eebb 880480684a88
> 8804fddcf348 2000
> Call Trace:
> [] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
Re: btrfs btree_ctree_super fault
On 11/10/2016 09:35 AM, Dave Jones wrote:
> On Tue, Nov 08, 2016 at 10:08:04AM -0500, Chris Mason wrote:
> > > And another new one:
> > >
> > > kernel BUG at fs/btrfs/ctree.c:3172!
> > >
> > > Call Trace:
> > > [] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
> >
> > We've been hunting this one for at least two years. It's the white
> > whale of btrfs bugs. Josef has a semi-reliable reproducer now, but I
> > think it's not the same as the pagevec based problems you reported earlier.
>
> Great, now for whatever reason, I'm hitting this over and over.
> Even better, after the last time I hit it, it rebooted and this happened during boot..
>
> BTRFS info (device sda6): disk space caching is enabled
> BTRFS info (device sda6): has skinny extents
> BTRFS info (device sda3): disk space caching is enabled
> [ cut here ]
> WARNING: CPU: 1 PID: 443 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x411/0x420 [btrfs]
> CPU: 1 PID: 443 Comm: mount Not tainted 4.9.0-rc4-think+ #1
> c9c4b468 813b66bc c9c4b4a8 81086d2b 022200c4b488 0002f265 40c8dded1afd6000 8804ff5cddc8 8804ef26f2b8 40c8dded1afd5000
> Call Trace:
> [] dump_stack+0x4f/0x73
> [] __warn+0xcb/0xf0
> [] warn_slowpath_null+0x1d/0x20
> [] btrfs_drop_extent_cache+0x411/0x420 [btrfs]
> [] ? alloc_debug_processing+0x73/0x1b0
> [] __btrfs_drop_extents+0x44f/0xe30 [btrfs]
> [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
> [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
> [] ? kmem_cache_alloc+0x2aa/0x330
> [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
> [] btrfs_drop_extents+0x79/0xa0 [btrfs]
> [] replay_one_extent+0x1e1/0x710 [btrfs]
> [] replay_one_buffer+0x26d/0x7e0 [btrfs]
> [] ? ___slab_alloc.constprop.83+0x27c/0x5c0
> [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
> [] ? debug_smp_processor_id+0x17/0x20
> [] walk_up_log_tree+0xeb/0x240 [btrfs]
> [] walk_log_tree+0xa6/0x1d0 [btrfs]
> [] btrfs_recover_log_trees+0x1dc/0x460 [btrfs]
> [] ? replay_one_extent+0x710/0x710 [btrfs]
> [] open_ctree+0x2575/0x2670 [btrfs]
> [] btrfs_mount+0xd0b/0xe10 [btrfs]
> [] ? pcpu_alloc+0x2d4/0x660
> [] ? lockdep_init_map+0x61/0x200
> [] ? __init_waitqueue_head+0x3b/0x50
> [] mount_fs+0x14/0xa0
> [] vfs_kern_mount+0x6b/0x150
> [] btrfs_mount+0x2c8/0xe10 [btrfs]
> [] ? pcpu_alloc+0x2d4/0x660
> [] ? lockdep_init_map+0x61/0x200
> [] ? lockdep_init_map+0x61/0x200
> [] ? __init_waitqueue_head+0x3b/0x50
> [] mount_fs+0x14/0xa0
> [] vfs_kern_mount+0x6b/0x150
> [] do_mount+0x1c2/0xda0
> [] ? memdup_user+0x60/0x90
> [] SyS_mount+0x83/0xd0
> [] do_syscall_64+0x61/0x170
> [] entry_SYSCALL64_slow_path+0x25/0x25
> ---[ end trace d3fa03bb9c115bbe ]---
> BTRFS: error (device sda3) in btrfs_replay_log:2491: errno=-17 Object already exists (Failed to recover log tree)
> BTRFS error (device sda3): cleaner transaction attach returned -30
> BTRFS error (device sda3): open_ctree failed
>
> Guess I'll hit it with btrfsck and hope for the best..

You can zero the log if you need to. Josef has a ton of tracing around this right now, so I'm hoping we nail it down very soon.

-chris
Re: btrfs btree_ctree_super fault
On Tue, Nov 08, 2016 at 10:08:04AM -0500, Chris Mason wrote:
> > And another new one:
> >
> > kernel BUG at fs/btrfs/ctree.c:3172!
> >
> > Call Trace:
> > [] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
>
> We've been hunting this one for at least two years. It's the white
> whale of btrfs bugs. Josef has a semi-reliable reproducer now, but I
> think it's not the same as the pagevec based problems you reported earlier.

Great, now for whatever reason, I'm hitting this over and over.
Even better, after the last time I hit it, it rebooted and this happened during boot..

BTRFS info (device sda6): disk space caching is enabled
BTRFS info (device sda6): has skinny extents
BTRFS info (device sda3): disk space caching is enabled
[ cut here ]
WARNING: CPU: 1 PID: 443 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x411/0x420 [btrfs]
CPU: 1 PID: 443 Comm: mount Not tainted 4.9.0-rc4-think+ #1
c9c4b468 813b66bc c9c4b4a8 81086d2b 022200c4b488 0002f265 40c8dded1afd6000 8804ff5cddc8 8804ef26f2b8 40c8dded1afd5000
Call Trace:
[] dump_stack+0x4f/0x73
[] __warn+0xcb/0xf0
[] warn_slowpath_null+0x1d/0x20
[] btrfs_drop_extent_cache+0x411/0x420 [btrfs]
[] ? alloc_debug_processing+0x73/0x1b0
[] __btrfs_drop_extents+0x44f/0xe30 [btrfs]
[] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[] ? kmem_cache_alloc+0x2aa/0x330
[] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[] btrfs_drop_extents+0x79/0xa0 [btrfs]
[] replay_one_extent+0x1e1/0x710 [btrfs]
[] replay_one_buffer+0x26d/0x7e0 [btrfs]
[] ? ___slab_alloc.constprop.83+0x27c/0x5c0
[] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[] ? debug_smp_processor_id+0x17/0x20
[] walk_up_log_tree+0xeb/0x240 [btrfs]
[] walk_log_tree+0xa6/0x1d0 [btrfs]
[] btrfs_recover_log_trees+0x1dc/0x460 [btrfs]
[] ? replay_one_extent+0x710/0x710 [btrfs]
[] open_ctree+0x2575/0x2670 [btrfs]
[] btrfs_mount+0xd0b/0xe10 [btrfs]
[] ? pcpu_alloc+0x2d4/0x660
[] ? lockdep_init_map+0x61/0x200
[] ? __init_waitqueue_head+0x3b/0x50
[] mount_fs+0x14/0xa0
[] vfs_kern_mount+0x6b/0x150
[] btrfs_mount+0x2c8/0xe10 [btrfs]
[] ? pcpu_alloc+0x2d4/0x660
[] ? lockdep_init_map+0x61/0x200
[] ? lockdep_init_map+0x61/0x200
[] ? __init_waitqueue_head+0x3b/0x50
[] mount_fs+0x14/0xa0
[] vfs_kern_mount+0x6b/0x150
[] do_mount+0x1c2/0xda0
[] ? memdup_user+0x60/0x90
[] SyS_mount+0x83/0xd0
[] do_syscall_64+0x61/0x170
[] entry_SYSCALL64_slow_path+0x25/0x25
---[ end trace d3fa03bb9c115bbe ]---
BTRFS: error (device sda3) in btrfs_replay_log:2491: errno=-17 Object already exists (Failed to recover log tree)
BTRFS error (device sda3): cleaner transaction attach returned -30
BTRFS error (device sda3): open_ctree failed

Guess I'll hit it with btrfsck and hope for the best..

Dave
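
For anyone reading the two error codes in the log above (errno=-17 and "returned -30"): kernel functions return negated errno values, so the sign can be dropped and the number looked up. Below is a minimal sketch in Python, assuming it is run on Linux so that the platform's errno table matches the kernel's; `describe_kernel_errno` is a hypothetical helper name, not something from btrfs or btrfs-progs.

```python
import errno
import os


def describe_kernel_errno(value: int) -> str:
    """Map a (usually negative) kernel return code to its errno name.

    Assumption: the interpreter runs on Linux, so errno.errorcode
    reflects the same numbering the kernel log uses.
    """
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{value}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The two values reported in the mount failure above.
    for rc in (-17, -30):
        print(describe_kernel_errno(rc))
```

On Linux, 17 is EEXIST, which matches the "Object already exists" text btrfs printed, and 30 is EROFS (read-only file system).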
Re: btrfs btree_ctree_super fault
On 11/08/2016 09:59 AM, Dave Jones wrote:
> On Sun, Nov 06, 2016 at 11:55:39AM -0500, Dave Jones wrote:
> >
> > On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote:
> > > On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
> > > > On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones wrote:
> > > > >
> > > > > BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> > > > > page:ea0013838e40 count:0 mapcount:0 mapping:8804a20310e0 index:0x100c
> > > > > flags: 0x400c(referenced|uptodate)
> > > > > page dumped because: non-NULL mapping
> > > >
> > > > Hmm. So this seems to be btrfs-specific, right?
> > > >
> > > > I searched for all your "non-NULL mapping" cases, and they all seem to
> > > > have basically the same call trace, with some work thread doing
> > > > writeback and going through btrfs_writepages().
> > > >
> > > > Sounds like it's a race with either fallocate hole-punching or
> > > > truncate. I'm not seeing it, but I suspect it's btrfs, since DaveJ
> > > > clearly ran other filesystems too but I am not seeing this backtrace
> > > > for anything else.
> > >
> > > Agreed, I think this is a separate bug, almost certainly btrfs specific.
> > > I'll work with Dave on a better reproducer.
> >
> > Still refining my 'capture ftrace when trinity detects taint' feature,
> > but in the meantime, here's a variant I don't think we've seen before:
>
> And another new one:
>
> kernel BUG at fs/btrfs/ctree.c:3172!
> invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
> task: 8804ffde37c0 task.stack: c90002188000
> RIP: 0010:[] [] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
> RSP: :c9000218b8a8 EFLAGS: 00010246
> RAX: RBX: 8804fddcf348 RCX: 1000
> RDX: RSI: c9000218b9ce RDI: c9000218b8c7
> RBP: c9000218b908 R08: 4000 R09: c9000218b8c8
> R10: R11: 0001 R12: c9000218b8b6
> R13: c9000218b9ce R14: 0001 R15: 880480684a88
> FS: 7f7c7f998b40() GS:88050780() knlGS:
> CS: 0010 DS: ES: CR0: 80050033
> CR2: CR3: 00044f15f000 CR4: 001406f0
> DR0: 7f4ce439d000 DR1: DR2:
> DR3: DR6: 0ff0 DR7: 0600
> Stack:
> 88050143 d305a00a2245 006c0002 0510
> 6c0002d3 1000 6427eebb 880480684a88
> 8804fddcf348 2000
> Call Trace:
> [] __btrfs_drop_extents+0xb00/0xe30 [btrfs]

We've been hunting this one for at least two years. It's the white whale of btrfs bugs. Josef has a semi-reliable reproducer now, but I think it's not the same as the pagevec based problems you reported earlier.

-chris
Re: btrfs btree_ctree_super fault
On Sun, Nov 06, 2016 at 11:55:39AM -0500, Dave Jones wrote:
>
> On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote:
> > On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
> > > On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones wrote:
> > > >
> > > > BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> > > > page:ea0013838e40 count:0 mapcount:0 mapping:8804a20310e0 index:0x100c
> > > > flags: 0x400c(referenced|uptodate)
> > > > page dumped because: non-NULL mapping
> > >
> > > Hmm. So this seems to be btrfs-specific, right?
> > >
> > > I searched for all your "non-NULL mapping" cases, and they all seem to
> > > have basically the same call trace, with some work thread doing
> > > writeback and going through btrfs_writepages().
> > >
> > > Sounds like it's a race with either fallocate hole-punching or
> > > truncate. I'm not seeing it, but I suspect it's btrfs, since DaveJ
> > > clearly ran other filesystems too but I am not seeing this backtrace
> > > for anything else.
> >
> > Agreed, I think this is a separate bug, almost certainly btrfs specific.
> > I'll work with Dave on a better reproducer.
>
> Still refining my 'capture ftrace when trinity detects taint' feature,
> but in the meantime, here's a variant I don't think we've seen before:

And another new one:

kernel BUG at fs/btrfs/ctree.c:3172!
invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 0 PID: 22702 Comm: trinity-c40 Not tainted 4.9.0-rc4-think+ #1
task: 8804ffde37c0 task.stack: c90002188000
RIP: 0010:[] [] btrfs_set_item_key_safe+0x179/0x190 [btrfs]
RSP: :c9000218b8a8 EFLAGS: 00010246
RAX: RBX: 8804fddcf348 RCX: 1000
RDX: RSI: c9000218b9ce RDI: c9000218b8c7
RBP: c9000218b908 R08: 4000 R09: c9000218b8c8
R10: R11: 0001 R12: c9000218b8b6
R13: c9000218b9ce R14: 0001 R15: 880480684a88
FS: 7f7c7f998b40() GS:88050780() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: CR3: 00044f15f000 CR4: 001406f0
DR0: 7f4ce439d000 DR1: DR2:
DR3: DR6: 0ff0 DR7: 0600
Stack:
88050143 d305a00a2245 006c0002 0510
6c0002d3 1000 6427eebb 880480684a88
8804fddcf348 2000
Call Trace:
[] __btrfs_drop_extents+0xb00/0xe30 [btrfs]
[] ? function_trace_call+0x13c/0x190
[] ? __btrfs_drop_extents+0x5/0xe30 [btrfs]
[] ? do_raw_write_lock+0xb0/0xc0
[] btrfs_log_changed_extents+0x35d/0x630 [btrfs]
[] ? release_extent_buffer+0xa4/0x110 [btrfs]
[] ? btrfs_log_changed_extents+0x5/0x630 [btrfs]
[] btrfs_log_inode+0xb05/0x11d0 [btrfs]
[] ? trace_function+0x6c/0x80
[] ? log_directory_changes+0xc0/0xc0 [btrfs]
[] ? btrfs_log_inode_parent+0x240/0x940 [btrfs]
[] ? function_trace_call+0x13c/0x190
[] btrfs_log_inode_parent+0x240/0x940 [btrfs]
[] ? btrfs_log_inode_parent+0x5/0x940 [btrfs]
[] ? dget_parent+0x71/0x150
[] btrfs_log_dentry_safe+0x62/0x80 [btrfs]
[] btrfs_sync_file+0x344/0x4d0 [btrfs]
[] vfs_fsync_range+0x4b/0xb0
[] ? __fget_light+0x5/0x60
[] do_fsync+0x3d/0x70
[] ? do_fsync+0x5/0x70
[] SyS_fdatasync+0x13/0x20
[] do_syscall_64+0x61/0x170
[] entry_SYSCALL64_slow_path+0x25/0x25
Code: 48 8b 45 b7 48 8d 7d bf 4c 89 ee 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 af f2 ff ff 85 c0 0f 8f 43 ff ff ff <0f> 0b 0f 0b e8 ee f3 02 e1 0f 1f 40 00 66 2e 0f 1f 84 00 00 00

Unfortunately, because this was a BUG_ON, it locked up the box, so it didn't save any additional debug info. Tempted to see if making BUG_ON a no-op will at least let it live long enough to save the ftrace buffer.

Given this seems to be mutating every time I see something go wrong, I'm wondering if this is fallout from memory corruption again.

Dave
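
A note on reading the frames in the traces in this thread: each entry has the form symbol+0xoffset/0xsize, optionally followed by the module in brackets, and a leading "?" marks a frame the unwinder was not sure about. So btrfs_set_item_key_safe+0x179/0x190 is an address 0x179 bytes into a function that is 0x190 bytes long. The following is a minimal Python sketch that splits such a line into its parts; `Frame` and `parse_frame` are hypothetical names made up for this illustration, and the regular expression only covers the frame shapes that appear in this thread.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    symbol: str            # function name as printed by the kernel
    offset: int            # byte offset of the address inside the function
    size: int              # total size of the function in bytes
    module: Optional[str]  # module name, or None for built-in code
    reliable: bool         # False when the frame was printed with a "?"


# Matches e.g. "? ___slab_alloc.constprop.83+0x27c/0x5c0" or
# "btrfs_sync_file+0x344/0x4d0 [btrfs]" anywhere inside a log line.
FRAME_RE = re.compile(
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(text: str) -> Optional[Frame]:
    """Return the first symbol+offset/size frame in text, or None."""
    match = FRAME_RE.search(text)
    if match is None:
        return None
    return Frame(
        symbol=match.group("symbol"),
        offset=int(match.group("offset"), 16),
        size=int(match.group("size"), 16),
        module=match.group("module"),
        reliable=match.group("unreliable") is None,
    )


if __name__ == "__main__":
    frame = parse_frame("[] btrfs_set_item_key_safe+0x179/0x190 [btrfs]")
    print(frame)
```

The same symbol+offset/size string is what the kernel tree's scripts/faddr2line helper expects as input, if your tree has it and the module was built with debug info.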
Re: btrfs btree_ctree_super fault
On Mon, Oct 31, 2016 at 01:44:55PM -0600, Chris Mason wrote:
> On Mon, Oct 31, 2016 at 12:35:16PM -0700, Linus Torvalds wrote:
> > On Mon, Oct 31, 2016 at 11:55 AM, Dave Jones wrote:
> > >
> > > BUG: Bad page state in process kworker/u8:12 pfn:4e0e39
> > > page:ea0013838e40 count:0 mapcount:0 mapping:8804a20310e0 index:0x100c
> > > flags: 0x400c(referenced|uptodate)
> > > page dumped because: non-NULL mapping
> >
> > Hmm. So this seems to be btrfs-specific, right?
> >
> > I searched for all your "non-NULL mapping" cases, and they all seem to
> > have basically the same call trace, with some work thread doing
> > writeback and going through btrfs_writepages().
> >
> > Sounds like it's a race with either fallocate hole-punching or
> > truncate. I'm not seeing it, but I suspect it's btrfs, since DaveJ
> > clearly ran other filesystems too but I am not seeing this backtrace
> > for anything else.
>
> Agreed, I think this is a separate bug, almost certainly btrfs specific.
> I'll work with Dave on a better reproducer.

Still refining my 'capture ftrace when trinity detects taint' feature, but in the meantime, here's a variant I don't think we've seen before:

general protection fault: [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 3 PID: 1913 Comm: trinity-c51 Not tainted 4.9.0-rc3-think+ #3
task: 880503350040 task.stack: c924
RIP: 0010:[] [] write_ctree_super+0x96/0xb30 [btrfs]
RSP: 0018:c9243c90 EFLAGS: 00010286
RAX: dae05adadadad000 RBX: RCX: 0002
RDX: 8804fdfcc000 RSI: 8804edcee313 RDI: 8804edcee1c3
RBP: c9243d00 R08: 0003 R09: 8800
R10: 0001 R11: 0100 R12: 88045151c548
R13: R14: 8804ee5122a8 R15: 8804572267e8
FS: 7f25c3e0eb40() GS:880507e0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f25c1560d44 CR3: 000454e2 CR4: 001406e0
DR0: 7fee93506000 DR1: DR2:
DR3: DR6: 0ff0 DR7: 0600
Stack:
0001 88050227b3f8 8804fff01b28 0001810b7f35 a007c265 0001 c9243cb8 2c9a8645 8804fff01b28 8804fff01b28 88045151c548
Call Trace:
[] ? write_ctree_super+0x5/0xb30 [btrfs]
[] btrfs_sync_log+0x886/0xa60 [btrfs]
[] btrfs_sync_file+0x479/0x4d0 [btrfs]
[] vfs_fsync_range+0x4b/0xb0
[] ? __fget_light+0x5/0x60
[] do_fsync+0x3d/0x70
[] ? do_fsync+0x5/0x70
[] SyS_fsync+0x10/0x20
[] do_syscall_64+0x61/0x170
[] entry_SYSCALL64_slow_path+0x25/0x25
Code: c7 48 8b 42 30 4c 8b 08 48 b8 00 00 00 00 00 16 00 00 49 03 81 a0 01 00 00 49 b9 00 00 00 00 00 88 ff ff 48 c1 f8 06 48 c1 e0 0c <4a> 8b 44 08 50 48 39 46 08 0f 84 8d 08 00 00 49 63 c0 48 8d 0c
RIP [] write_ctree_super+0x96/0xb30 [btrfs]
RSP

All code
   0:  c7                      (bad)
   1:  48 8b 42 30             mov    0x30(%rdx),%rax
   5:  4c 8b 08                mov    (%rax),%r9
   8:  48 b8 00 00 00 00 00    movabs $0x160000000000,%rax
   f:  16 00 00
  12:  49 03 81 a0 01 00 00    add    0x1a0(%r9),%rax
  19:  49 b9 00 00 00 00 00    movabs $0xffff880000000000,%r9
  20:  88 ff ff
  23:  48 c1 f8 06             sar    $0x6,%rax
  27:  48 c1 e0 0c             shl    $0xc,%rax
  2b:* 4a 8b 44 08 50          mov    0x50(%rax,%r9,1),%rax    <-- trapping instruction
  30:  48 39 46 08             cmp    %rax,0x8(%rsi)
  34:  0f 84 8d 08 00 00       je     0x8c7
  3a:  49 63 c0                movslq %r8d,%rax
  3d:  48                      rex.W
  3e:  8d                      .byte 0x8d
  3f:  0c                      .byte 0xc

Code starting with the faulting instruction
===
   0:  4a 8b 44 08 50          mov    0x50(%rax,%r9,1),%rax
   5:  48 39 46 08             cmp    %rax,0x8(%rsi)
   9:  0f 84 8d 08 00 00       je     0x89c
   f:  49 63 c0                movslq %r8d,%rax
  12:  48                      rex.W
  13:  8d                      .byte 0x8d
  14:  0c                      .byte 0xc

According to objdump -S, it looks like this is an inlined copy of backup_super_roots:

        root_backup = info->super_for_commit->super_roots + last_backup;
    2706:  48 8d b8 2b 0b 00 00    lea    0xb2b(%rax),%rdi
    270d:  48 63 c1                movslq %ecx,%rax
    2710:  48 8d 34 80             lea    (%rax,%rax,4),%rsi
    2714:  48 8d 04 b0             lea    (%rax,%rsi,4),%rax
    2718:  48 8d 34 c7             lea    (%rdi,%rax,8),%rsi
        btrfs_header_generation(info->tree_root->node))
    271c:  48 8b 42 30             mov    0x30(%rdx),%rax
    2720:  4c 8b 08                mov    (%rax),%r9
    2723:  48 b8 00 00 00 00 00    movabs $0x160000000000,%rax
    272a:  16 00 00
    272d