Bug#883413: src:linux: Still reproducible with linux-image-4.15.0-rc8-amd64
Control: severity -1 important
Control: tag -1 moreinfo

On Mon, 29 Jan 2018 15:05:00 + Chris Boot wrote:
> Package: src:linux
> Followup-For: Bug #883413
>
> Hi Ben,
>
> Unfortunately I can still reproduce this problem on 4.15-rc8 from
> experimental.
>
> The cmdline for this boot was:
>
> BOOT_IMAGE=/boot/vmlinuz-4.15.0-rc8-amd64
> root=/dev/mapper/vg_tarquin-rootfs ro intel_iommu=on vsyscall=emulate
> scsi_mod.use_blk_mq=Y dm_mod.use_blk_mq=Y intel_pstate=passive
> i915.disable_display=Y i915.enable_gvt=Y apparmor=0
> systemd.unified_cgroup_hierarchy=1 console=ttyS1,115200n8 console=tty0
>
> This triggers with DefaultMemoryAccounting=yes enabled in
> /etc/systemd/system.conf, and NUT seems to regularly be involved in the
> crash on my system. Sadly the systemd unit is very simple indeed, and
> because my UPS is network-connected I'm not even doing dodgy things like
> USB from within NUT.
>
> Quite how the kernel thinks that nut-server.service is using 16 ZiB of
> memory is beyond me; presumably this is a slightly negative 64-bit int
> being cast unsigned. The following also feels like a smoking gun:
>
> [ 2982.158622] percpu ref (css_release) <= 0 (-197) after switching to atomic
[...]

Sorry for leaving this unanswered so long. Are you still seeing this?

I found some apparently related reports on the Red Hat Bugzilla but not
on anything newer than 4.17.

Ben.

-- 
Ben Hutchings
The world is coming to an end. Please log off.
Bug#883413: src:linux: Still reproducible with linux-image-4.15.0-rc8-amd64
Package: src:linux
Followup-For: Bug #883413

Hi Ben,

Unfortunately I can still reproduce this problem on 4.15-rc8 from
experimental.

The cmdline for this boot was:

BOOT_IMAGE=/boot/vmlinuz-4.15.0-rc8-amd64
root=/dev/mapper/vg_tarquin-rootfs ro intel_iommu=on vsyscall=emulate
scsi_mod.use_blk_mq=Y dm_mod.use_blk_mq=Y intel_pstate=passive
i915.disable_display=Y i915.enable_gvt=Y apparmor=0
systemd.unified_cgroup_hierarchy=1 console=ttyS1,115200n8 console=tty0

This triggers with DefaultMemoryAccounting=yes enabled in
/etc/systemd/system.conf, and NUT seems to regularly be involved in the
crash on my system. Sadly the systemd unit is very simple indeed, and
because my UPS is network-connected I'm not even doing dodgy things like
USB from within NUT.

Quite how the kernel thinks that nut-server.service is using 16 ZiB of
memory is beyond me; presumably this is a slightly negative 64-bit int
being cast unsigned. The following also feels like a smoking gun:

[ 2982.158622] percpu ref (css_release) <= 0 (-197) after switching to atomic

The kernel log is:

[ 2611.549862] WARNING: CPU: 0 PID: 20830 at /build/linux-b8fmzT/linux-4.15~rc8/mm/page_counter.c:27 page_counter_cancel+0x17/0x20
[ 2611.561360] Modules linked in: binfmt_misc fuse vhost_net vhost tap tun devlink bridge 8021q garp mrp stp llc nls_ascii nls_cp437 vfat fat intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm ast irqbypass crct10dif_pclmul crc32_pclmul ttm drm_kms_helper ghash_clmulni_intel intel_cstate sg efi_pstore mei_me intel_uncore iTCO_wdt evdev iTCO_vendor_support intel_rapl_perf efivars pcspkr drm mei cdc_acm intel_pch_thermal shpchp joydev ie31200_edac video acpi_power_meter button acpi_pad nfsd nfs_acl lockd grace auth_rpcgss ipmi_si ipmi_devintf sunrpc ipmi_msghandler efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb dm_mod ses enclosure scsi_transport_sas sd_mod hid_generic usbhid hid xhci_pci xhci_hcd ahci crc32c_intel ixgbe libahci igb i2c_algo_bit
[ 2611.633015] aesni_intel aes_x86_64 dca ptp usbcore megaraid_sas crypto_simd libata cryptd glue_helper i2c_i801 pps_core usb_common mdio scsi_mod fan thermal
[ 2611.647163] CPU: 0 PID: 20830 Comm: check_ups Not tainted 4.15.0-rc8-amd64 #1 Debian 4.15~rc8-1~exp1
[ 2611.656338] Hardware name: Supermicro Super Server/X11SSH-F, BIOS 2.0c 10/06/2017
[ 2611.663857] RIP: 0010:page_counter_cancel+0x17/0x20
[ 2611.668765] RSP: 0018:a74c8433fc70 EFLAGS: 00010097
[ 2611.674017] RAX: RBX: 8bc863c0b4c0 RCX:
[ 2611.681186] RDX: 3b83ba4109d0 RSI: 0001 RDI: 8bc863c0b4c0
[ 2611.688370] RBP: 0001 R08: 8bc8c50da8a0 R09: 0001
[ 2611.695556] R10: a74c8433fd48 R11: 0100 R12: 8bc863c0b400
[ 2611.702740] R13: 8bc89c092800 R14: 8bc8a1270e10 R15: 8bc76955ec30
[ 2611.709924] FS: 7f0669316fc0() GS:8bc8c500() knlGS:
[ 2611.718063] CS: 0010 DS: ES: CR0: 80050033
[ 2611.723853] CR2: 7f0668550930 CR3: 00075ce30005 CR4: 003626f0
[ 2611.731036] DR0: DR1: DR2:
[ 2611.738218] DR3: DR6: fffe0ff0 DR7: 0400
[ 2611.745397] Call Trace:
[ 2611.747881] page_counter_uncharge+0x1d/0x30
[ 2611.752195] drain_stock.isra.37+0x32/0xa0
[ 2611.756327] refill_stock+0x41/0x70
[ 2611.759855] __sk_mem_reduce_allocated+0x83/0xd0
[ 2611.764508] tcp_write_queue_purge+0x1a7/0x1d0
[ 2611.768990] tcp_v4_destroy_sock+0x3f/0x180
[ 2611.773208] tcp_v6_destroy_sock+0xe/0x20
[ 2611.777257] inet_csk_destroy_sock+0x47/0x100
[ 2611.781650] tcp_rcv_state_process+0x980/0xe20
[ 2611.786130] ? tcp_v6_do_rcv+0x1a7/0x3e0
[ 2611.790090] tcp_v6_do_rcv+0x1a7/0x3e0
[ 2611.793880] __release_sock+0x76/0xc0
[ 2611.797581] release_sock+0x2b/0x90
[ 2611.801107] tcp_close+0x165/0x3f0
[ 2611.804547] inet_release+0x36/0x60
[ 2611.808075] sock_release+0x1a/0x70
[ 2611.811601] sock_close+0xe/0x20
[ 2611.814861] __fput+0xd5/0x210
[ 2611.819465] task_work_run+0x84/0xa0
[ 2611.824577] exit_to_usermode_loop+0xb9/0xc0
[ 2611.830383] syscall_return_slowpath+0x88/0x90
[ 2611.836364] system_call_fast_compare_end+0x73/0x75
[ 2611.842741] RIP: 0033:0x7f0668ac8d84
[ 2611.847774] RSP: 002b:7ffe23f9c7b8 EFLAGS: 0246 ORIG_RAX: 0003
[ 2611.856787] RAX: RBX: RCX: 7f0668ac8d84
[ 2611.865332] RDX: 1fff RSI: 7ffe23f9c800 RDI:
[ 2611.873833] RBP: 0006 R08: R09:
[ 2611.882405] R10: R11: 0246 R12: 7ffe23f9e800
[ 2611.890813] R13: 7ffe23f9c800 R14: 2000 R15:
[ 2611.899185] Code: e8 39 b5 eb ff e9 49 ff ff ff 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 89 f0 48 f7 d8 f0 48 0f c1 07 48 39 f0 78 02 f3 c3 <0f> ff
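
For illustration only, here is a minimal sketch of the suspicion voiced above, that a slightly negative 64-bit count ends up being read as unsigned. The names `uncharge` and `as_unsigned64` are hypothetical and this is not the kernel's code. It only models a counter that is uncharged once more than it was charged, then shows what that value looks like with the sign dropped. It does not try to reproduce the exact 16 ZiB figure, which would also depend on the units used when the value is reported.

```python
# Sketch only: a 64-bit counter modelled with plain Python integers.
MASK64 = (1 << 64) - 1  # all-ones bit pattern of a 64-bit word


def as_unsigned64(value):
    """Reinterpret a signed 64-bit value as unsigned (two's complement)."""
    return value & MASK64


def uncharge(count, nr_pages):
    """Subtract nr_pages from count and report whether it went negative.

    Hypothetical stand-in for a charge counter; a negative result means
    more was uncharged than had ever been charged.
    """
    new = count - nr_pages
    return new, new < 0


# One uncharge more than was charged leaves the counter at -1 ...
count, underflowed = uncharge(0, 1)
print(count, underflowed)        # -1 True

# ... which, read as an unsigned 64-bit number, is just below 2**64.
print(as_unsigned64(count))      # 18446744073709551615
```

A counter that is only a little below zero therefore shows up as an enormous positive number, which would fit an absurdly large memory usage being reported for a small service.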