Public bug reported:

== Comment: #0 - YUECHANG E. MEI <[email protected]> - 2015-12-11 17:19:07 ==
---Problem Description---
We have an Ubuntu 14.04.4 LPAR, conelp2. It is running the stress tests base, io, and tcp. When checking "dmesg", we see this interruption:
[Fri Dec 11 13:58:50 2015] --- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
[Fri Dec 11 13:58:50 2015] LR = check_and_cede_processor+0x34/0x50

In the previous test, conelp2 stopped all the stress tests by itself because it ran out of memory. Is the out-of-memory issue related to the interruption?

Contact Information = Yuechang (Erin) Mei /[email protected], Raja Sunkari /[email protected]

---uname output---
Linux conelp2 4.2.0-21-generic #25~14.04.1-Ubuntu SMP Thu Dec 3 13:55:42 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = EUH Alpine 8408-E8E

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Install Ubuntu 14.04.4 in an LPAR, then update to the latest 14.04.4 kernel by using this workaround:
   echo "deb http://software.linux.ibm.com/pub/ubuntu-ppc64el-repository/ trusty-proposed main restricted universe multiverse" >> /etc/apt/sources.list
   apt-get update
   apt-get install linux-image-generic-lts-wily
2. Set up the stress test, and start base, io, and tcp.
3. After an hour, check dmesg; you will see the message about the interruption.

Stack trace output:
 no

Oops output:
 no

System Dump Info:
  The system is not configured to capture a system dump.

*Additional Instructions for Yuechang (Erin) Mei /[email protected], Raja Sunkari /[email protected]:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach sysctl -a output to the bug.

== Comment: #1 - YUECHANG E. MEI <[email protected]> - 2015-12-11 17:23:00 ==

== Comment: #3 - YUECHANG E. MEI <[email protected]> - 2015-12-14 15:23:33 ==

== Comment: #4 - MAMATHA INAMDAR <[email protected]> - 2015-12-15 03:56:14 ==
dmesg shows a page allocation failure:

[Fri Dec 11 13:45:38 2015] swapper/127: page allocation failure: order:0, mode:0x120
[Fri Dec 11 13:45:38 2015] CPU: 127 PID: 0 Comm: swapper/127 Not tainted 4.2.0-21-generic #25~14.04.1-Ubuntu
[Fri Dec 11 13:45:38 2015] Call Trace:
[Fri Dec 11 13:45:38 2015] [c00000027fbc3890] [c000000000a805ec] dump_stack+0x90/0xbc (unreliable)
[Fri Dec 11 13:45:38 2015] [c00000027fbc38c0] [c00000000021c118] warn_alloc_failed+0x118/0x160
[Fri Dec 11 13:45:38 2015] [c00000027fbc3960] [c000000000221114] __alloc_pages_nodemask+0x834/0xa60
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b10] [c000000000221404] __alloc_page_frag+0xc4/0x190
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b50] [c0000000008f6d20] netdev_alloc_frag+0x50/0x80
[Fri Dec 11 13:45:38 2015] [c00000027fbc3b80] [c000000000764e80] tg3_alloc_rx_data+0xa0/0x2c0
[Fri Dec 11 13:45:38 2015] [c00000027fbc3be0] [c000000000767344] tg3_poll_work+0x484/0x1070
[Fri Dec 11 13:45:38 2015] [c00000027fbc3ce0] [c000000000767f8c] tg3_poll_msix+0x5c/0x210
[Fri Dec 11 13:45:38 2015] [c00000027fbc3d30] [c00000000090ebb8] net_rx_action+0x2d8/0x430
[Fri Dec 11 13:45:38 2015] [c00000027fbc3e40] [c0000000000ba124] __do_softirq+0x174/0x390
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f40] [c0000000000ba6c8] irq_exit+0xc8/0x100
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f60] [c0000000000111ec] __do_irq+0x8c/0x190
[Fri Dec 11 13:45:38 2015] [c00000027fbc3f90] [c000000000024278] call_do_irq+0x14/0x24
[Fri Dec 11 13:45:38 2015] [c0000002763a39b0] [c000000000011390] do_IRQ+0xa0/0x120
[Fri Dec 11 13:45:38 2015] [c0000002763a3a10] [c0000000000099b0] restore_check_irq_replay+0x2c/0x70
[Fri Dec 11 13:45:38 2015] --- interrupt: 501 at plpar_hcall_norets+0x1c/0x28
[Fri Dec 11 13:45:38 2015] LR = check_and_cede_processor+0x34/0x50
[Fri Dec 11 13:45:38 2015] [c0000002763a3d00] [c0000000008a8d90] check_and_cede_processor+0x20/0x50 (unreliable)
[Fri Dec 11 13:45:38 2015] [c0000002763a3d60] [c0000000008a8fb8] shared_cede_loop+0x68/0x170
[Fri Dec 11 13:45:38 2015] [c0000002763a3da0] [c0000000008a615c] cpuidle_enter_state+0xbc/0x350
[Fri Dec 11 13:45:38 2015] [c0000002763a3e00] [c000000000110f3c] call_cpuidle+0x7c/0xd0
[Fri Dec 11 13:45:38 2015] [c0000002763a3e40] [c0000000001112d0] cpu_startup_entry+0x340/0x450
[Fri Dec 11 13:45:38 2015] [c0000002763a3f10] [c000000000044ab4] start_secondary+0x364/0x3a0
[Fri Dec 11 13:45:38 2015] [c0000002763a3f90] [c000000000008b6c] start_secondary_prolog+0x10/0x14
[Fri Dec 11 13:45:38 2015] Mem-Info:
[Fri Dec 11 13:45:38 2015] active_anon:714 inactive_anon:2255 isolated_anon:0

== Comment: #5 - Luciano Chavez <[email protected]> - 2016-01-04 14:28:59 ==
Hi Yuechang,

Atomic page allocation failure warnings originating from network stack allocation requests are common under stress conditions. The order-0 page allocation failures are probably the easiest to tune for, assuming there isn't a leak. I suggest you start with a minimum free pool reservation of at least 64MB and see if that helps eliminate that particular warning. First check that the current value is lower than that:

cat /proc/sys/vm/min_free_kbytes

and then set it with

echo 65536 > /proc/sys/vm/min_free_kbytes

If the existing value is already higher than 64MB, then pick a larger value. If this helps, update the /etc/sysctl.conf file to keep the setting persistent between boots with an entry of

vm.min_free_kbytes = 65536

or whatever value helped best.
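
The check-then-set advice in comment #5 could be scripted roughly as follows. This is only a sketch: the helper name suggest_min_free_kbytes is hypothetical, and doubling the current value when it already exceeds 64MB is an assumption standing in for "pick a larger value", not a rule taken from this bug. The script prints the commands instead of running them, since writing to /proc requires root.

```shell
#!/bin/sh
# Sketch only: suggest a vm.min_free_kbytes value following comment #5.
# The function name and the doubling rule are illustrative assumptions.

# suggest_min_free_kbytes CURRENT [FLOOR]
# Prints FLOOR (default 65536 kB = 64MB) when CURRENT is below it,
# otherwise prints CURRENT * 2 as one possible "larger value".
suggest_min_free_kbytes() {
    floor=${2:-65536}
    if [ "$1" -lt "$floor" ]; then
        echo "$floor"
    else
        echo $(( $1 * 2 ))
    fi
}

# Read the live value when available; otherwise fall back to the
# value reported for conelp2 in this bug (180224).
proc_file=/proc/sys/vm/min_free_kbytes
if [ -r "$proc_file" ]; then
    current=$(cat "$proc_file")
else
    current=180224
fi
suggested=$(suggest_min_free_kbytes "$current")

# Print the commands rather than executing them (they need root).
echo "current value: $current"
echo "to apply now:  echo $suggested > $proc_file"
echo "to persist, add to /etc/sysctl.conf: vm.min_free_kbytes = $suggested"
```

Whatever value is chosen should still be validated under the same stress load before it is made persistent.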
== Comment: #6 - Jonathan Dalton <[email protected]> - 2016-01-06 15:34:18 ==
root@conelp2:~#
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
180224
root@conelp2:~# echo 365536 > /proc/sys/vm/min_free_kbytes
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536
root@conelp2:~#

== Comment: #7 - Jonathan Dalton <[email protected]> - 2016-01-07 11:41:51 ==
root@conelp2:~#
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
180224
root@conelp2:~# echo 365536 > /proc/sys/vm/min_free_kbytes
root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536
root@conelp2:~#

== Comment: #8 - Raja Shekhar Reddy Sunkari <[email protected]> - 2016-01-11 02:30:19 ==
Hi Luciano,

I ran the stress test on conelp2 after updating the value to:

root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536

Tests ran successfully for 72 hours without any interruption. However, the dmesg output still shows the page allocation failure messages, though they appear less frequently than in the last run.

== Comment: #9 - Jonathan Dalton <[email protected]> - 2016-01-13 13:02:16 ==
I restarted the stress tests Monday and verified today (Wednesday) that the value was increased:

root@conelp2:~# cat /proc/sys/vm/min_free_kbytes
365536

With the increased "min_free_kbytes" there is nothing in the current dmesg that says:

interrupt 501
page allocation fault

So, increasing "min_free_kbytes" during stress eliminated the fault. However, is this still a bug? Should "min_free_kbytes" have to be increased?

Attached is the dmesg associated with this comment.

== Comment: #12 - Luciano Chavez <[email protected]> - 2016-01-22 20:22:08 ==
(In reply to comment #11)
> Hi Luciano,
>
> I see some info for --set-recommended-min_free_kbytes documented in the
> following link
>
> http://manpages.ubuntu.com/manpages/trusty/man8/hugeadm.8.html
>
> Can you please check and let me know.

Hi Mamatha,

Thanks. That documentation is specific to a utility for huge pages though, so we may have to mirror it and see if the Canonical folks can point to Ubuntu documentation they have on when to change min_free_kbytes.

Hi Canonical,

Please point to Ubuntu documentation that explains when to change min_free_kbytes.

** Affects: ubuntu
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New

** Tags: architecture-ppc64le bugnameltc-134023 severity-high targetmilestone-inin---

** Tags added: architecture-ppc64le bugnameltc-134023 severity-high targetmilestone-inin---

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1537666

Title:
  ISST-LTE: Ubuntu 14.04.4 LPAR interrupts at check_and_cede_processor
