Bug#883413: src:linux: Still reproducible with linux-image-4.15.0-rc8-amd64

2019-02-09 Thread Ben Hutchings
Control: severity -1 important
Control: tag -1 moreinfo

On Mon, 29 Jan 2018 15:05:00 + Chris Boot  wrote:
> Package: src:linux
> Followup-For: Bug #883413
> 
> Hi Ben,
> 
> Unfortunately I can still reproduce this problem on 4.15-rc8 from
> experimental.
> 
> The cmdline for this boot was:
> 
> BOOT_IMAGE=/boot/vmlinuz-4.15.0-rc8-amd64
> root=/dev/mapper/vg_tarquin-rootfs ro intel_iommu=on vsyscall=emulate
> scsi_mod.use_blk_mq=Y dm_mod.use_blk_mq=Y intel_pstate=passive
> i915.disable_display=Y i915.enable_gvt=Y apparmor=0
> systemd.unified_cgroup_hierarchy=1 console=ttyS1,115200n8 console=tty0
> 
> This triggers with DefaultMemoryAccounting=yes enabled in
> /etc/systemd/system.conf, and NUT seems to regularly be involved in the
> crash on my system. Sadly the systemd unit is very simple indeed, and
> because my UPS is network-connected I'm not even doing dodgy things like
> USB from within NUT.
> 
> Quite how the kernel thinks that nut-server.service is using 16 ZiB of
> memory is beyond me; presumably this is a slightly negative 64-bit int
> being cast to unsigned. The following also feels like a smoking gun:
>
> [ 2982.158622] percpu ref (css_release) <= 0 (-197) after switching to atomic
[...]

Sorry for leaving this unanswered so long.  Are you still seeing this? 
I found some apparently related reports on the Red Hat Bugzilla but not
on anything newer than 4.17.

Ben.

-- 
Ben Hutchings
The world is coming to an end.  Please log off.






Bug#883413: src:linux: Still reproducible with linux-image-4.15.0-rc8-amd64

2018-01-29 Thread Chris Boot
Package: src:linux
Followup-For: Bug #883413

Hi Ben,

Unfortunately I can still reproduce this problem on 4.15-rc8 from
experimental.

The cmdline for this boot was:

BOOT_IMAGE=/boot/vmlinuz-4.15.0-rc8-amd64
root=/dev/mapper/vg_tarquin-rootfs ro intel_iommu=on vsyscall=emulate
scsi_mod.use_blk_mq=Y dm_mod.use_blk_mq=Y intel_pstate=passive
i915.disable_display=Y i915.enable_gvt=Y apparmor=0
systemd.unified_cgroup_hierarchy=1 console=ttyS1,115200n8 console=tty0

This triggers with DefaultMemoryAccounting=yes enabled in
/etc/systemd/system.conf, and NUT seems to regularly be involved in the
crash on my system. Sadly the systemd unit is very simple indeed, and
because my UPS is network-connected I'm not even doing dodgy things like
USB from within NUT.

Quite how the kernel thinks that nut-server.service is using 16 ZiB of
memory is beyond me; presumably this is a slightly negative 64-bit int
being cast to unsigned. The following also feels like a smoking gun:

[ 2982.158622] percpu ref (css_release) <= 0 (-197) after switching to atomic
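
The unsigned-cast guess above can be checked with a little arithmetic. This is a minimal Python sketch, assuming the counter is a 64-bit two's-complement value. The -4096 input (one 4 KiB page below zero) is a made-up illustration, not a value taken from this log. For reference, 2^64 bytes is 16 EiB, so a reading just under that is what a slightly negative byte count would look like once treated as unsigned.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set


def as_u64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as unsigned (two's complement)."""
    return value & MASK64


def to_eib(n_bytes: int) -> float:
    """Convert a byte count to exbibytes (1 EiB = 2**60 bytes)."""
    return n_bytes / 2**60


# Hypothetical input: a byte counter that has dropped one 4 KiB page below zero.
wrapped = as_u64(-4096)
print(wrapped)                          # just under 2**64
print(f"{to_eib(wrapped):.1f} EiB")     # rounds to 16.0 EiB
```

Any small negative value gives the same picture: the wrapped result sits a few bytes or pages short of 2^64.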

The kernel log is:

[ 2611.549862] WARNING: CPU: 0 PID: 20830 at /build/linux-b8fmzT/linux-4.15~rc8/mm/page_counter.c:27 page_counter_cancel+0x17/0x20
[ 2611.561360] Modules linked in: binfmt_misc fuse vhost_net vhost tap tun devlink bridge 8021q garp mrp stp llc nls_ascii nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm ast irqbypass crct10dif_pclmul crc32_pclmul ttm drm_kms_helper ghash_clmulni_intel intel_cstate sg efi_pstore mei_me intel_uncore iTCO_wdt evdev iTCO_vendor_support intel_rapl_perf efivars pcspkr drm mei cdc_acm intel_pch_thermal shpchp joydev ie31200_edac video acpi_power_meter button acpi_pad nfsd nfs_acl lockd grace auth_rpcgss ipmi_si ipmi_devintf sunrpc ipmi_msghandler efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb dm_mod ses enclosure scsi_transport_sas sd_mod hid_generic usbhid hid xhci_pci xhci_hcd ahci crc32c_intel ixgbe libahci igb i2c_algo_bit
[ 2611.633015]  aesni_intel aes_x86_64 dca ptp usbcore megaraid_sas crypto_simd libata cryptd glue_helper i2c_i801 pps_core usb_common mdio scsi_mod fan thermal
[ 2611.647163] CPU: 0 PID: 20830 Comm: check_ups Not tainted 4.15.0-rc8-amd64 #1 Debian 4.15~rc8-1~exp1
[ 2611.656338] Hardware name: Supermicro Super Server/X11SSH-F, BIOS 2.0c 10/06/2017
[ 2611.663857] RIP: 0010:page_counter_cancel+0x17/0x20
[ 2611.668765] RSP: 0018:a74c8433fc70 EFLAGS: 00010097
[ 2611.674017] RAX:  RBX: 8bc863c0b4c0 RCX: 
[ 2611.681186] RDX: 3b83ba4109d0 RSI: 0001 RDI: 8bc863c0b4c0
[ 2611.688370] RBP: 0001 R08: 8bc8c50da8a0 R09: 0001
[ 2611.695556] R10: a74c8433fd48 R11: 0100 R12: 8bc863c0b400
[ 2611.702740] R13: 8bc89c092800 R14: 8bc8a1270e10 R15: 8bc76955ec30
[ 2611.709924] FS:  7f0669316fc0() GS:8bc8c500() knlGS:
[ 2611.718063] CS:  0010 DS:  ES:  CR0: 80050033
[ 2611.723853] CR2: 7f0668550930 CR3: 00075ce30005 CR4: 003626f0
[ 2611.731036] DR0:  DR1:  DR2: 
[ 2611.738218] DR3:  DR6: fffe0ff0 DR7: 0400
[ 2611.745397] Call Trace:
[ 2611.747881]  page_counter_uncharge+0x1d/0x30
[ 2611.752195]  drain_stock.isra.37+0x32/0xa0
[ 2611.756327]  refill_stock+0x41/0x70
[ 2611.759855]  __sk_mem_reduce_allocated+0x83/0xd0
[ 2611.764508]  tcp_write_queue_purge+0x1a7/0x1d0
[ 2611.768990]  tcp_v4_destroy_sock+0x3f/0x180
[ 2611.773208]  tcp_v6_destroy_sock+0xe/0x20
[ 2611.777257]  inet_csk_destroy_sock+0x47/0x100
[ 2611.781650]  tcp_rcv_state_process+0x980/0xe20
[ 2611.786130]  ? tcp_v6_do_rcv+0x1a7/0x3e0
[ 2611.790090]  tcp_v6_do_rcv+0x1a7/0x3e0
[ 2611.793880]  __release_sock+0x76/0xc0
[ 2611.797581]  release_sock+0x2b/0x90
[ 2611.801107]  tcp_close+0x165/0x3f0
[ 2611.804547]  inet_release+0x36/0x60
[ 2611.808075]  sock_release+0x1a/0x70
[ 2611.811601]  sock_close+0xe/0x20
[ 2611.814861]  __fput+0xd5/0x210
[ 2611.819465]  task_work_run+0x84/0xa0
[ 2611.824577]  exit_to_usermode_loop+0xb9/0xc0
[ 2611.830383]  syscall_return_slowpath+0x88/0x90
[ 2611.836364]  system_call_fast_compare_end+0x73/0x75
[ 2611.842741] RIP: 0033:0x7f0668ac8d84
[ 2611.847774] RSP: 002b:7ffe23f9c7b8 EFLAGS: 0246 ORIG_RAX: 0003
[ 2611.856787] RAX:  RBX:  RCX: 7f0668ac8d84
[ 2611.865332] RDX: 1fff RSI: 7ffe23f9c800 RDI: 
[ 2611.873833] RBP: 0006 R08:  R09: 
[ 2611.882405] R10:  R11: 0246 R12: 7ffe23f9e800
[ 2611.890813] R13: 7ffe23f9c800 R14: 2000 R15: 
[ 2611.899185] Code: e8 39 b5 eb ff e9 49 ff ff ff 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 89 f0 48 f7 d8 f0 48 0f c1 07 48 39 f0 78 02 f3 c3 <0f> ff
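
The trace above shows drain_stock calling page_counter_uncharge, with the warning raised in page_counter_cancel. The following is a toy Python model of what that would mean, assuming the check at page_counter.c:27 warns when the count drops below zero after an uncharge. That is an assumption, since the source line is not quoted in this report. ToyPageCounter and its methods are hypothetical names for illustration only and are not the kernel's data structures.

```python
class ToyPageCounter:
    """Toy stand-in for a charge/uncharge counter (not the kernel struct)."""

    def __init__(self) -> None:
        self.count = 0        # pages currently charged
        self.warnings = []    # messages recorded when the count underflows

    def charge(self, nr_pages: int) -> None:
        """Account nr_pages as in use."""
        self.count += nr_pages

    def cancel(self, nr_pages: int) -> int:
        """Uncharge nr_pages; record a warning if more was uncharged than charged."""
        self.count -= nr_pages
        if self.count < 0:
            self.warnings.append(f"count went negative: {self.count}")
        return self.count


counter = ToyPageCounter()
counter.charge(3)
counter.cancel(3)  # balanced: count is back to 0, no warning
counter.cancel(1)  # one uncharge too many: count is -1, warning recorded
print(counter.count, counter.warnings)
```

In this model the warning is only a symptom: the accounting mistake is whichever uncharge had no matching charge, which may have happened well before the point where the count finally crosses zero.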