Public bug reported:

Release: Ubuntu 14.04.5 LTS
Kernel: Linux 4.4.0-67-generic #88~14.04.1-Ubuntu SMP
Filesystems: ext4 on Hardware RAID 6

We regularly run a backup script that mainly uses rsync and mv. When there is a lot of change, the server sometimes freezes and can only be recovered by power cycling. I thought it was a hardware problem, but we now have this problem on 2 out of 18 identical machines. They have different BIOS versions, so it is probably related to the amount of data. During the process I see high load from the rsync and chmod processes.

Kernel messages:

Apr 2 01:09:58 server kernel: [483707.688686] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s! [kswapd0:83]
Apr 2 01:09:58 server kernel: [483707.688716] Modules linked in: drbg ansi_cprng ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp bridge stp llc dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass ipmi_devintf crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac dcdbas edac_core acpi_power_meter shpchp ipmi_si mei_me input_leds lpc_ich ipmi_msghandler mei 8250_fintek mac_hid parport_pc ppdev lp parport igb dca ptp hid_generic usbhid hid ahci pps_core libahci i2c_algo_bit megaraid_sas wmi fjes
Apr 2 01:09:58 server kernel: [483707.688718] CPU: 7 PID: 83 Comm: kswapd0 Tainted: G L 4.4.0-67-generic #88~14.04.1-Ubuntu
Apr 2 01:09:58 server kernel: [483707.688719] Hardware name: Dell Inc. PowerEdge T630, BIOS 1.5.4 10/04/2015
Apr 2 01:09:58 server kernel: [483707.688720] task: ffff881034ac6200 ti: ffff88102da44000 task.ti: ffff88102da44000
Apr 2 01:09:58 server kernel: [483707.688722] RIP: 0010:[<ffffffff810c671a>] [<ffffffff810c671a>] native_queued_spin_lock_slowpath+0x10a/0x170
Apr 2 01:09:58 server kernel: [483707.688723] RSP: 0018:ffff88102da47c58 EFLAGS: 00000246
Apr 2 01:09:58 server kernel: [483707.688724] RAX: 0000000000000000 RBX: 000000000000037a RCX: ffff88103d3d7940
Apr 2 01:09:58 server kernel: [483707.688725] RDX: ffff88103d417940 RSI: 0000000000200000 RDI: ffffffff821dc7e0
Apr 2 01:09:58 server kernel: [483707.688725] RBP: ffff88102da47c58 R08: 0000000000000101 R09: 28f5c28f5c28f5c3
Apr 2 01:09:58 server kernel: [483707.688726] R10: 0000000000000000 R11: ffff88102da47a58 R12: 0000000000000080
Apr 2 01:09:58 server kernel: [483707.688727] R13: 0000000000000000 R14: ffffffff81e8ae40 R15: 0000000000007ace
Apr 2 01:09:58 server kernel: [483707.688728] FS: 0000000000000000(0000) GS:ffff88103d3c0000(0000) knlGS:0000000000000000
Apr 2 01:09:58 server kernel: [483707.688728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 2 01:09:58 server kernel: [483707.688729] CR2: 00007ff3c624c0f2 CR3: 0000000001e0c000 CR4: 00000000001426e0
Apr 2 01:09:58 server kernel: [483707.688730] Stack:
Apr 2 01:09:58 server kernel: [483707.688731] ffff88102da47c68 ffffffff81183477 ffff88102da47c78 ffffffff81806af0
Apr 2 01:09:58 server kernel: [483707.688733] ffff88102da47c88 ffffffff8125dfd5 ffff88102da47d60 ffffffff8119601a
Apr 2 01:09:58 server kernel: [483707.688734] 0000000000000000 0000000000000000 ffff880da9fdf340 0000000000e86866
Apr 2 01:09:58 server kernel: [483707.688735] Call Trace:
Apr 2 01:09:58 server kernel: [483707.688737] [<ffffffff81183477>] queued_spin_lock_slowpath+0xb/0xf
Apr 2 01:09:58 server kernel: [483707.688739] [<ffffffff81806af0>] _raw_spin_lock+0x20/0x30
Apr 2 01:09:58 server kernel: [483707.688740] [<ffffffff8125dfd5>] mb_cache_shrink_count+0x15/0xb0
Apr 2 01:09:58 server kernel: [483707.688742] [<ffffffff8119601a>] shrink_slab.part.40+0x10a/0x3f0
Apr 2 01:09:58 server kernel: [483707.688744] [<ffffffff8119a6f7>] shrink_zone+0x2a7/0x2c0
Apr 2 01:09:58 server kernel: [483707.688746] [<ffffffff8119b6c7>] kswapd+0x4c7/0x970
Apr 2 01:09:58 server kernel: [483707.688749] [<ffffffff8119b200>] ? mem_cgroup_shrink_node_zone+0x190/0x190
Apr 2 01:09:58 server kernel: [483707.688750] [<ffffffff8109cd19>] kthread+0xc9/0xe0
Apr 2 01:09:58 server kernel: [483707.688752] [<ffffffff8109cc50>] ? kthread_park+0x60/0x60
Apr 2 01:09:58 server kernel: [483707.688753] [<ffffffff8180724f>] ret_from_fork+0x3f/0x70
Apr 2 01:09:58 server kernel: [483707.688754] [<ffffffff8109cc50>] ? kthread_park+0x60/0x60
Apr 2 01:09:58 server kernel: [483707.688772] Code: c2 c1 e8 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 79 01 00 48 03 14 c5 00 99 f3 81 48 89 0a 8b 41 08 85 c0 75 0d f3 90 <8b> 41 08 85 c0 74 f7 eb 02 f3 90 8b 17 66 85 d2 75 f7 39 f2 66
Apr 2 01:09:58 server kernel: [483707.698419] Modules linked in: drbg ansi_cprng ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables 8021q garp mrp bridge stp llc dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass ipmi_devintf crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac dcdbas edac_core acpi_power_meter shpchp ipmi_si mei_me input_leds lpc_ich ipmi_msghandler mei 8250_fintek mac_hid parport_pc ppdev lp parport igb dca ptp hid_generic usbhid hid ahci pps_core libahci i2c_algo_bit megaraid_sas wmi fjes
Apr 2 01:09:58 server kernel: [483707.698441] CPU: 3 PID: 3119 Comm: freshclam Tainted: G L 4.4.0-67-generic #88~14.04.1-Ubuntu
Apr 2 01:09:58 server kernel: [483707.698441] Hardware name: Dell Inc. PowerEdge T630, BIOS 1.5.4 10/04/2015
Apr 2 01:09:58 server kernel: [483707.698443] task: ffff88102b9b3800 ti: ffff88102ef28000 task.ti: ffff88102ef28000
Apr 2 01:09:58 server kernel: [483707.698444] RIP: 0010:[<ffffffff810c671d>] [<ffffffff810c671d>] native_queued_spin_lock_slowpath+0x10d/0x170
Apr 2 01:09:58 server kernel: [483707.698447] RSP: 0018:ffff88102ef2b7c0 EFLAGS: 00000246
Apr 2 01:09:58 server kernel: [483707.698448] RAX: 0000000000000000 RBX: 000000000000037a RCX: ffff88103d2d7940
Apr 2 01:09:58 server kernel: [483707.698448] RDX: ffff88103d3d7940 RSI: 0000000000100000 RDI: ffffffff821dc7e0
Apr 2 01:09:58 server kernel: [483707.698449] RBP: ffff88102ef2b7c0 R08: 0000000000000101 R09: 28f5c28f5c28f5c3
Apr 2 01:09:58 server kernel: [483707.698450] R10: 0000000000000000 R11: ffff88102ef2b5c8 R12: 0000000000000080
Apr 2 01:09:58 server kernel: [483707.698451] R13: 0000000000000000 R14: ffffffff81e8ae40 R15: 0000000000007ace
Apr 2 01:09:58 server kernel: [483707.698452] FS: 00007fe59bc02780(0000) GS:ffff88103d2c0000(0000) knlGS:0000000000000000
Apr 2 01:09:58 server kernel: [483707.698453] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 2 01:09:58 server kernel: [483707.698454] CR2: 00007fe59bc13000 CR3: 000000102c83f000 CR4: 00000000001426e0
Apr 2 01:09:58 server kernel: [483707.698455] Stack:
Apr 2 01:09:58 server kernel: [483707.698456] ffff88102ef2b7d0 ffffffff81183477 ffff88102ef2b7e0 ffffffff81806af0
Apr 2 01:09:58 server kernel: [483707.698457] ffff88102ef2b7f0 ffffffff8125dfd5 ffff88102ef2b8c8 ffffffff8119601a
Apr 2 01:09:58 server kernel: [483707.698459] 0000000000000003 0000000000000001 0000000000000000 0000000000e876d8
Apr 2 01:09:58 server kernel: [483707.698461] Call Trace:
Apr 2 01:09:58 server kernel: [483707.698463] [<ffffffff81183477>] queued_spin_lock_slowpath+0xb/0xf
Apr 2 01:09:58 server kernel: [483707.698465] [<ffffffff81806af0>] _raw_spin_lock+0x20/0x30
Apr 2 01:09:58 server kernel: [483707.698467] [<ffffffff8125dfd5>] mb_cache_shrink_count+0x15/0xb0
Apr 2 01:09:58 server kernel: [483707.698469] [<ffffffff8119601a>] shrink_slab.part.40+0x10a/0x3f0
Apr 2 01:09:58 server kernel: [483707.698471] [<ffffffff8119a6f7>] shrink_zone+0x2a7/0x2c0
Apr 2 01:09:58 server kernel: [483707.698473] [<ffffffff8119aa86>] do_try_to_free_pages+0x166/0x3d0
Apr 2 01:09:58 server kernel: [483707.698475] [<ffffffff81197dfd>] ? throttle_direct_reclaim+0x8d/0x230
Apr 2 01:09:58 server kernel: [483707.698477] [<ffffffff8119ada5>] try_to_free_pages+0xb5/0x170
Apr 2 01:09:58 server kernel: [483707.698479] [<ffffffff811fbb6e>] __alloc_pages_slowpath.constprop.87+0x323/0x78c
Apr 2 01:09:58 server kernel: [483707.698482] [<ffffffff8118e3c7>] __alloc_pages_nodemask+0x237/0x240
Apr 2 01:09:58 server kernel: [483707.698483] [<ffffffff811d4298>] alloc_pages_current+0x88/0x120
Apr 2 01:09:58 server kernel: [483707.698485] [<ffffffff8118562e>] __page_cache_alloc+0xae/0xc0
Apr 2 01:09:58 server kernel: [483707.698487] [<ffffffff81186029>] pagecache_get_page+0x59/0x1c0
Apr 2 01:09:58 server kernel: [483707.698488] [<ffffffff811861b6>] grab_cache_page_write_begin+0x26/0x40
Apr 2 01:09:58 server kernel: [483707.698490] [<ffffffff8128e6d1>] ext4_da_write_begin+0xa1/0x330
Apr 2 01:09:58 server kernel: [483707.698492] [<ffffffff811851f0>] generic_perform_write+0xc0/0x1a0
Apr 2 01:09:58 server kernel: [483707.698494] [<ffffffff8121a89b>] ? file_update_time+0x3b/0xf0
Apr 2 01:09:58 server kernel: [483707.698496] [<ffffffff811873a7>] __generic_file_write_iter+0x197/0x1e0
Apr 2 01:09:58 server kernel: [483707.698498] [<ffffffff812832e6>] ext4_file_write_iter+0xf6/0x360
Apr 2 01:09:58 server kernel: [483707.698500] [<ffffffff812008f8>] new_sync_write+0x88/0xb0
Apr 2 01:09:58 server kernel: [483707.698501] [<ffffffff81200947>] __vfs_write+0x27/0x40
Apr 2 01:09:58 server kernel: [483707.698503] [<ffffffff81200f52>] vfs_write+0xa2/0x1a0
Apr 2 01:09:58 server kernel: [483707.698504] [<ffffffff81201c76>] SyS_write+0x46/0xa0
Apr 2 01:09:58 server kernel: [483707.698506] [<ffffffff81806eb6>] entry_SYSCALL_64_fastpath+0x16/0x75
Apr 2 01:09:58 server kernel: [483707.698507] Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 79 01 00 48 03 14 c5 00 99 f3 81 48 89 0a 8b 41 08 85 c0 75 0d f3 90 8b 41 08 <85> c0 74 f7 eb 02 f3 90 8b 17 66 85 d2 75 f7 39 f2 66 90 75 0f

The problem has existed for a while now. None of the latest kernel updates helped.

Can you please advise me what to do?

Thank you!

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-4.4.0-67-generic 4.4.0-67.88~14.04.1
ProcVersionSignature: Ubuntu 4.4.0-67.88~14.04.1-generic 4.4.49
Uname: Linux 4.4.0-67-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.23
Architecture: amd64
Date: Tue Apr 4 12:38:13 2017
InstallationDate: Installed on 2016-02-22 (406 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-lts-xenial
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: linux-lts-xenial (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-bug trusty

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1679625

Title:
  Server crashes on soft lockup