Bug#841144: kernel BUG at /build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!

2017-03-14 Thread Ian Campbell
OCFS2 folks, any thoughts on this crash?

On Tue, 2017-01-17 at 02:12 +, Ben Hutchings wrote:
> On Mon, 2017-01-16 at 13:12 -0600, Russell Mosemann wrote:
> [...]
> > Jan 15 17:31:03 vhost032 kernel: [ cut here ]
> > Jan 15 17:31:03 vhost032 kernel: kernel BUG at 
> > /build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!
> 
> This is:
> 
> static int ocfs2_grow_tree(handle_t *handle, struct ocfs2_extent_tree *et,
>                            int *final_depth, struct buffer_head **last_eb_bh,
>                            struct ocfs2_alloc_context *meta_ac)
> {
>         ...
>         BUG_ON(meta_ac == NULL);
> 
> > [...]
> > Jan 15 17:31:03 vhost032 kernel: Call Trace:
> > Jan 15 17:31:03 vhost032 kernel:  [] ? 
> > ocfs2_set_buffer_uptodate+0x35/0x4a0 [ocfs2]
> > Jan 15 17:31:03 vhost032 kernel:  [] ? 
> > __find_get_block+0xa7/0x110
> > Jan 15 17:31:03 vhost032 kernel:  [] ? 
> > ocfs2_split_and_insert+0x307/0x490 [ocfs2]
> > Jan 15 17:31:03 vhost032 kernel:  [] ? 
> > ocfs2_split_extent+0x3ee/0x560 [ocfs2]
> > Jan 15 17:31:03 vhost032 kernel:  [] ? 
> > ocfs2_change_extent_flag+0x273/0x450 [ocfs2]
> > Jan 15 17:31:03 vhost032 kernel:  [] ? 
> > ocfs2_mark_extent_written+0x110/0x1d0 [ocfs2]
> > Jan 15 17:31:03 vhost032 kernel:  [] ? 
> > ocfs2_dio_end_io_write+0x44d/0x600 [ocfs2]
> 
> meta_ac is passed down from ocfs2_dio_end_io_write(), which allocates
> it using ocfs2_lock_allocators()... but the latter only allocates it
> conditionally.  It seems like the condition is wrong somehow.

This still seems to be happening for this user with 4.9.13. Looking at
"git log -p v4.9.13..origin/master -- fs/ocfs2", I wonder if
https://git.kernel.org/torvalds/c/3e10b793fc40dfdbe51762e0d084bd6f2c8acaaa
might be relevant?

The commit message mentions meta_ac not getting allocated and an extent
split vs refcount split differentiation, and we have ocfs2_split_extent
in the trace. Slim reasoning, I know; maybe someone who knows the code
could make a better determination.

As Ben said before the whole bug report can be found at
https://bugs.debian.org/841144

Ian.



Bug#841144: kernel BUG at /build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!

2017-01-16 Thread Ben Hutchings
On Mon, 2017-01-16 at 13:12 -0600, Russell Mosemann wrote:
[...]
> Jan 15 17:31:03 vhost032 kernel: [ cut here ]
> Jan 15 17:31:03 vhost032 kernel: kernel BUG at 
> /build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!

This is:

static int ocfs2_grow_tree(handle_t *handle, struct ocfs2_extent_tree *et,
                           int *final_depth, struct buffer_head **last_eb_bh,
                           struct ocfs2_alloc_context *meta_ac)
{
        ...
        BUG_ON(meta_ac == NULL);

> Jan 15 17:31:03 vhost032 kernel: invalid opcode:  [#1] SMP
> Jan 15 17:31:03 vhost032 kernel: Modules linked in: vhost_net(E) vhost(E) 
> macvtap(E) macvlan(E) tun(E) ocfs2(E) quota_tree(E) hmac(E) veth(E) 
> iptable_filter(E) ip_tables(E) x_tables(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) 
> nfs(E) lockd(E) grace(E) fscache(E) sunrpc(E) ocfs2_dlmfs(E) 
> ocfs2_stack_o2cb(E) ocfs2_dlm(E) ocfs2_nodemanager(E) ocfs2_stackglue(E) 
> configfs(E) bridge(E) stp(E) llc(E) bonding(E) intel_rapl(E) sb_edac(E) 
> edac_core(E) x86_pkg_temp_thermal(E) coretemp(E) ast(E) kvm_intel(E) ttm(E) 
> drm_kms_helper(E) mxm_wmi(E) iTCO_wdt(E) kvm(E) iTCO_vendor_support(E) igb(E) 
> evdev(E) irqbypass(E) drm(E) xhci_pci(E) dca(E) ehci_pci(E) xhci_hcd(E) 
> crct10dif_pclmul(E) ehci_hcd(E) crc32_pclmul(E) i2c_algo_bit(E) e1000e(E) 
> usbcore(E) ptp(E) mei_me(E) lpc_ich(E) ghash_clmulni_intel(E) i2c_i801(E) 
> pcspkr(E) usb_common(E) sg(E)
> Jan 15 17:31:03 vhost032 kernel:  mei(E) shpchp(E) i2c_smbus(E) pps_core(E) 
> mfd_core(E) ipmi_si(E) wmi(E) fjes(E) ipmi_msghandler(E) tpm_tis(E) 
> tpm_tis_core(E) tpm(E) acpi_power_meter(E) acpi_pad(E) button(E) fuse(E) 
> drbd(E) lru_cache(E) libcrc32c(E) crc32c_generic(E) autofs4(E) ext4(E) 
> crc16(E) jbd2(E) fscrypto(E) mbcache(E) dm_mod(E) md_mod(E) sd_mod(E) 
> crc32c_intel(E) aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) 
> gf128mul(E) ablk_helper(E) cryptd(E) ahci(E) libahci(E) libata(E) scsi_mod(E)
> Jan 15 17:31:03 vhost032 kernel: CPU: 5 PID: 28586 Comm: qemu-system-x86 
> Tainted: GE   4.8.0-0.bpo.2-amd64 #1 Debian 4.8.11-1~bpo8+1
> Jan 15 17:31:03 vhost032 kernel: Hardware name: To Be Filled By O.E.M. To Be 
> Filled By O.E.M./EPC612D4I, BIOS P2.10 03/31/2016
> Jan 15 17:31:03 vhost032 kernel: task: 8e6e8584d000 task.stack: 
> 8e6d8079
> Jan 15 17:31:03 vhost032 kernel: RIP: 0010:[]  
> [] ocfs2_grow_tree+0x6f2/0x780 [ocfs2]
> Jan 15 17:31:03 vhost032 kernel: RSP: 0018:8e6d80793618  EFLAGS: 00010246
> Jan 15 17:31:03 vhost032 kernel: RAX:  RBX: 0004 
> RCX: 8e6d80793790
> Jan 15 17:31:03 vhost032 kernel: RDX: 8e6d807936bc RSI: 8e6d80793968 
> RDI: 8e6ea5012690
> Jan 15 17:31:03 vhost032 kernel: RBP: 8e6d80793678 R08:  
> R09: 00141d0b
> Jan 15 17:31:03 vhost032 kernel: R10: 01586960 R11: 8e6e36ab30c0 
> R12: 0001
> Jan 15 17:31:03 vhost032 kernel: R13: 8e6d80793828 R14: 8e6e36ab30c0 
> R15: 0001
> Jan 15 17:31:03 vhost032 kernel: FS:  7f578affd700() 
> GS:8e7cbf34() knlGS:
> Jan 15 17:31:03 vhost032 kernel: CS:  0010 DS:  ES:  CR0: 
> 80050033
> Jan 15 17:31:03 vhost032 kernel: CR2: b0015a300238 CR3: 0001f5579000 
> CR4: 001426e0
> Jan 15 17:31:03 vhost032 kernel: Stack:
> Jan 15 17:31:03 vhost032 kernel:  8e6d80793728 8e6d80793728 
> c092aa75 8e6cb1fc0c30
> Jan 15 17:31:03 vhost032 kernel:  8e6e81ddb548 9e63ba27 
> ab4b7e2a 0004
> Jan 15 17:31:03 vhost032 kernel:  0001 8e6d80793828 
> 8e6d80793968 8e6e78de1700
> Jan 15 17:31:03 vhost032 kernel: Call Trace:
> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> ocfs2_set_buffer_uptodate+0x35/0x4a0 [ocfs2]
> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> __find_get_block+0xa7/0x110
> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> ocfs2_split_and_insert+0x307/0x490 [ocfs2]
> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> ocfs2_split_extent+0x3ee/0x560 [ocfs2]
> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> ocfs2_change_extent_flag+0x273/0x450 [ocfs2]
> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> ocfs2_mark_extent_written+0x110/0x1d0 [ocfs2]
> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> ocfs2_dio_end_io_write+0x44d/0x600 [ocfs2]

meta_ac is passed down from ocfs2_dio_end_io_write(), which allocates
it using ocfs2_lock_allocators()... but the latter only allocates it
conditionally.  It seems like the condition is wrong somehow.
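
To make the suspected failure mode concrete, here is a minimal user-space
sketch. It is not the ocfs2 code: every toy_* name, the struct, and the
est_meta_blocks / split_needs_new_block parameters are made up for
illustration. The only thing it assumes is what is described above, namely
that the metadata context is reserved conditionally while a later
tree-growing step requires it unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ocfs2_alloc_context. */
struct toy_alloc_ctx {
	int reserved_blocks;
};

/*
 * Toy model of the "lock allocators" step: a metadata context is only
 * reserved when the up-front estimate says metadata blocks are needed.
 * Returns NULL when the estimate is zero (or when malloc fails).
 */
static struct toy_alloc_ctx *toy_lock_allocators(int est_meta_blocks)
{
	struct toy_alloc_ctx *ac;

	assert(est_meta_blocks >= 0);
	if (est_meta_blocks == 0)
		return NULL;

	ac = malloc(sizeof(*ac));
	if (ac)
		ac->reserved_blocks = est_meta_blocks;
	return ac;
}

/*
 * Toy model of the tree-growing step. The kernel function has
 * BUG_ON(meta_ac == NULL) at this point; the sketch returns false
 * instead so that the condition can be observed without crashing.
 */
static bool toy_grow_tree(struct toy_alloc_ctx *meta_ac)
{
	if (meta_ac == NULL)
		return false;	/* the case the BUG_ON catches */
	if (meta_ac->reserved_blocks < 1)
		return false;
	meta_ac->reserved_blocks--;
	return true;
}

/*
 * Toy model of the completion path: reserve according to the estimate,
 * then perform a split that may or may not need a new tree block.
 */
static bool toy_end_io(int est_meta_blocks, bool split_needs_new_block)
{
	struct toy_alloc_ctx *meta_ac = toy_lock_allocators(est_meta_blocks);
	bool ok = true;

	if (split_needs_new_block)
		ok = toy_grow_tree(meta_ac);

	free(meta_ac);	/* free(NULL) is a no-op */
	return ok;
}
```

In this model toy_end_io(1, true) and toy_end_io(0, false) both succeed,
while toy_end_io(0, true) fails: the estimate said no metadata was needed,
but the split had to grow the tree anyway. Whether the real condition in
ocfs2_lock_allocators() underestimates in this way is exactly the open
question.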

I didn't see any relevant changes post-4.8 (though I did see a number
of unrelated bug fixes that maybe ought to go to stable).

The rest of the traceback is below; the whole bug report can be found
at https://bugs.debian.org/841144

Ben.

> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
> Jan 15 17:31:03 vhost032 kernel:  [] ? 
> ocfs2_dio_end_io+0x3b/0x60 [ocfs2]
> Jan 15 17:31:03 vhost032 kernel:  [] ? 

Bug#841144: kernel BUG at /build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!

2017-01-16 Thread Russell Mosemann

Package: src:linux
Version: 4.8.11-1~bpo8+1
Severity: critical

Dear Maintainer,

   * What led up to the situation?
Jan 15 17:31:03 vhost032 kernel: [ cut here ]
Jan 15 17:31:03 vhost032 kernel: kernel BUG at 
/build/linux-Wgpe2M/linux-4.8.11/fs/ocfs2/alloc.c:1514!
Jan 15 17:31:03 vhost032 kernel: invalid opcode:  [#1] SMP
Jan 15 17:31:03 vhost032 kernel: Modules linked in: vhost_net(E) vhost(E) 
macvtap(E) macvlan(E) tun(E) ocfs2(E) quota_tree(E) hmac(E) veth(E) 
iptable_filter(E) ip_tables(E) x_tables(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) 
nfs(E) lockd(E) grace(E) fscache(E) sunrpc(E) ocfs2_dlmfs(E) 
ocfs2_stack_o2cb(E) ocfs2_dlm(E) ocfs2_nodemanager(E) ocfs2_stackglue(E) 
configfs(E) bridge(E) stp(E) llc(E) bonding(E) intel_rapl(E) sb_edac(E) 
edac_core(E) x86_pkg_temp_thermal(E) coretemp(E) ast(E) kvm_intel(E) ttm(E) 
drm_kms_helper(E) mxm_wmi(E) iTCO_wdt(E) kvm(E) iTCO_vendor_support(E) igb(E) 
evdev(E) irqbypass(E) drm(E) xhci_pci(E) dca(E) ehci_pci(E) xhci_hcd(E) 
crct10dif_pclmul(E) ehci_hcd(E) crc32_pclmul(E) i2c_algo_bit(E) e1000e(E) 
usbcore(E) ptp(E) mei_me(E) lpc_ich(E) ghash_clmulni_intel(E) i2c_i801(E) 
pcspkr(E) usb_common(E) sg(E)
Jan 15 17:31:03 vhost032 kernel:  mei(E) shpchp(E) i2c_smbus(E) pps_core(E) 
mfd_core(E) ipmi_si(E) wmi(E) fjes(E) ipmi_msghandler(E) tpm_tis(E) 
tpm_tis_core(E) tpm(E) acpi_power_meter(E) acpi_pad(E) button(E) fuse(E) 
drbd(E) lru_cache(E) libcrc32c(E) crc32c_generic(E) autofs4(E) ext4(E) crc16(E) 
jbd2(E) fscrypto(E) mbcache(E) dm_mod(E) md_mod(E) sd_mod(E) crc32c_intel(E) 
aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) 
cryptd(E) ahci(E) libahci(E) libata(E) scsi_mod(E)
Jan 15 17:31:03 vhost032 kernel: CPU: 5 PID: 28586 Comm: qemu-system-x86 
Tainted: GE   4.8.0-0.bpo.2-amd64 #1 Debian 4.8.11-1~bpo8+1
Jan 15 17:31:03 vhost032 kernel: Hardware name: To Be Filled By O.E.M. To Be 
Filled By O.E.M./EPC612D4I, BIOS P2.10 03/31/2016
Jan 15 17:31:03 vhost032 kernel: task: 8e6e8584d000 task.stack: 
8e6d8079
Jan 15 17:31:03 vhost032 kernel: RIP: 0010:[]  
[] ocfs2_grow_tree+0x6f2/0x780 [ocfs2]
Jan 15 17:31:03 vhost032 kernel: RSP: 0018:8e6d80793618  EFLAGS: 00010246
Jan 15 17:31:03 vhost032 kernel: RAX:  RBX: 0004 
RCX: 8e6d80793790
Jan 15 17:31:03 vhost032 kernel: RDX: 8e6d807936bc RSI: 8e6d80793968 
RDI: 8e6ea5012690
Jan 15 17:31:03 vhost032 kernel: RBP: 8e6d80793678 R08:  
R09: 00141d0b
Jan 15 17:31:03 vhost032 kernel: R10: 01586960 R11: 8e6e36ab30c0 
R12: 0001
Jan 15 17:31:03 vhost032 kernel: R13: 8e6d80793828 R14: 8e6e36ab30c0 
R15: 0001
Jan 15 17:31:03 vhost032 kernel: FS:  7f578affd700() 
GS:8e7cbf34() knlGS:
Jan 15 17:31:03 vhost032 kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Jan 15 17:31:03 vhost032 kernel: CR2: b0015a300238 CR3: 0001f5579000 
CR4: 001426e0
Jan 15 17:31:03 vhost032 kernel: Stack:
Jan 15 17:31:03 vhost032 kernel:  8e6d80793728 8e6d80793728 
c092aa75 8e6cb1fc0c30
Jan 15 17:31:03 vhost032 kernel:  8e6e81ddb548 9e63ba27 
ab4b7e2a 0004
Jan 15 17:31:03 vhost032 kernel:  0001 8e6d80793828 
8e6d80793968 8e6e78de1700
Jan 15 17:31:03 vhost032 kernel: Call Trace:
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_set_buffer_uptodate+0x35/0x4a0 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
__find_get_block+0xa7/0x110
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_split_and_insert+0x307/0x490 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_split_extent+0x3ee/0x560 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_change_extent_flag+0x273/0x450 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_mark_extent_written+0x110/0x1d0 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_dio_end_io_write+0x44d/0x600 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_dio_end_io+0x3b/0x60 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? dio_complete+0x68/0x160
Jan 15 17:31:03 vhost032 kernel:  [] ? 
do_blockdev_direct_IO+0x2079/0x23f0
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_write_end_nolock+0x560/0x560 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_direct_IO+0x83/0x90 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
generic_file_direct_write+0xb3/0x180
Jan 15 17:31:03 vhost032 kernel:  [] ? 
__generic_file_write_iter+0xb6/0x1e0
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_file_write_iter+0x44e/0xae0 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? 
do_iter_readv_writev+0xb0/0x130
Jan 15 17:31:03 vhost032 kernel:  [] ? 
do_readv_writev+0x1a2/0x240
Jan 15 17:31:03 vhost032 kernel:  [] ? 
ocfs2_check_range_for_refcount+0x130/0x130 [ocfs2]
Jan 15 17:31:03 vhost032 kernel:  [] ? schedule+0x31/0x80
Jan 15 17:31:03 vhost032