Leann,
Another one for the Kernel team to track.
Michael
On 02/15/2017 12:10 PM, Launchpad Bug Tracker wrote:
> bugproxy (bugproxy) has assigned this bug to you for Ubuntu:
>
> Issue:
> -----------
> The kernel is unable to handle a paging request and panics when a very large
> number of hugepages is passed as a boot argument to the kernel.
>
> Environment:
> ----------------------
> Power NV : Habanaro Bare metal
> OS : Ubuntu 17.04
> Kernel Version : 4.9.0-11-generic
>
> Steps To reproduce:
> -----------------------------------
>
> 1 - When booting the Ubuntu kernel, add the boot argument
> 'hugepages=12000000'.
>
> The kernel panics and displays a call trace like the one below.
>
> [ 5.030274] Unable to handle kernel paging request for data at address 0x00000000
> [ 5.030323] Faulting instruction address: 0xc000000000302848
> [ 5.030366] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 5.030399] SMP NR_CPUS=2048 [ 5.030416] NUMA
> [ 5.039443] PowerNV
> [ 5.039461] Modules linked in:
> [ 5.050091] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.9.0-11-generic #12-Ubuntu
> [ 5.053266] Workqueue: events pcpu_balance_workfn
> [ 5.080647] task: c000003c8fe9b800 task.stack: c000003ffb118000
> [ 5.090876] NIP: c000000000302848 LR: c0000000002709d4 CTR: c00000000016cef0
> [ 5.094175] REGS: c000003ffb11b410 TRAP: 0300 Not tainted (4.9.0-11-generic)
> [ 5.103040] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>[ 5.114466] CR: 22424222 XER: 00000000
> [ 5.124932] CFAR: c000000000008a60 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
> GPR00: c0000000002709d4 c000003ffb11b690 c00000000141a400 c000003fff50e300
> GPR04: 0000000000000000 00000000024001c2 c000003ffb11b780 000000219df50000
> GPR08: 0000003ffb090000 c000000001454fd8 0000000000000000 0000000000000000
> GPR12: 0000000000004400 c000000007b60000 00000000024001c2 00000000024001c2
> GPR16: 00000000024001c2 0000000000000000 0000000000000000 0000000000000002
> GPR20: 000000000000000c 0000000000000000 0000000000000000 00000000024200c0
> GPR24: c0000000016eef48 0000000000000000 c000003fff50fd00 00000000024001c2
> GPR28: 0000000000000000 c000003fff50fd00 c000003fff50e300 c000003ffb11b820
> NIP [c000000000302848] mem_cgroup_soft_limit_reclaim+0xf8/0x4f0
> [ 5.213613] LR [c0000000002709d4] do_try_to_free_pages+0x1b4/0x450
> [ 5.230521] Call Trace:
> [ 5.230643] [c000003ffb11b760] [c0000000002709d4] do_try_to_free_pages+0x1b4/0x450
> [ 5.254184] [c000003ffb11b800] [c000000000270d68] try_to_free_pages+0xf8/0x270
> [ 5.281896] [c000003ffb11b890] [c000000000259b88] __alloc_pages_nodemask+0x7a8/0xff0
> [ 5.321407] [c000003ffb11bab0] [c000000000282cd0] pcpu_populate_chunk+0x110/0x520
> [ 5.336262] [c000003ffb11bb50] [c0000000002841b8] pcpu_balance_workfn+0x758/0x960
> [ 5.351526] [c000003ffb11bc50] [c0000000000ecdd0] process_one_work+0x2b0/0x5a0
> [ 5.362561] [c000003ffb11bce0] [c0000000000ed168] worker_thread+0xa8/0x660
> [ 5.374007] [c000003ffb11bd80] [c0000000000f5320] kthread+0x110/0x130
> [ 5.385160] [c000003ffb11be30] [c00000000000c0e8] ret_from_kernel_thread+0x5c/0x74
> [ 5.389456] Instruction dump:
> [ 5.410036] eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 3d230001 e9499a42 3d220004
> [ 5.423598] 3929abd8 794a1f24 7d295214 eac90100 <e9360000> 2fa90000 419eff74 3b200000
> [ 5.436503] ---[ end trace 23b650e96be5c549 ]---
> [ 5.439700]
>
> This is purely a negative test scenario: the system does not have enough
> memory because hugepages is given a very large value.
>
> Output of free on the system:
> free -h
>               total        used        free      shared  buff/cache   available
> Mem:           251G        2.1G        248G        5.2M        502M        248G
> Swap:          2.0G        159M        1.8G
>
> The same scenario was tried after Linux was up, as follows:
>
> echo 12000000 > /proc/sys/vm/nr_hugepages
>
> HugePages_Total: 15069
> HugePages_Free: 15069
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 16384 kB
> root@ltc-haba2:~# free -h
>               total        used        free      shared  buff/cache   available
> Mem:           251G        237G         13G        5.6M        311M         13G
> Swap:          2.0G        159M        1.8G
>
> In this case the kernel is able to allocate around 237 GB for hugetlb.
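
As a rough check of the figures quoted above, the sketch below converts the two hugepage counts from the report into sizes, using the 16384 kB Hugepagesize shown in the report's output. The constant and function names are illustrative only. The small gap between the computed allocation and the 237G that free reports as used is presumably other memory in use, which is an assumption, not something the report states.

```python
# Figures taken from the report; the names here are illustrative only.
HUGEPAGE_SIZE_KB = 16384        # Hugepagesize as shown in the report
REQUESTED_PAGES = 12_000_000    # value passed via hugepages= / nr_hugepages
ALLOCATED_PAGES = 15_069        # HugePages_Total after the runtime request

KB_PER_GIB = 1024 ** 2
KB_PER_TIB = 1024 ** 3


def pages_to_kb(pages, page_size_kb=HUGEPAGE_SIZE_KB):
    """Return the total size in kB of the given number of hugepages."""
    return pages * page_size_kb


requested_tib = pages_to_kb(REQUESTED_PAGES) / KB_PER_TIB
allocated_gib = pages_to_kb(ALLOCATED_PAGES) / KB_PER_GIB

print(f"requested: {requested_tib:.1f} TiB")   # far beyond the 251G installed
print(f"allocated: {allocated_gib:.1f} GiB")
```

The request is several hundred times larger than the installed memory, so at runtime the kernel simply stops once it can allocate no more pages.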
>
> But at boot time the kernel panics, so please let us know whether
> this scenario is expected to be handled.
>
> I identified the root cause of the panic.
> When the system is running low on memory during mem cgroup initialisation,
> because most of the pages have been grabbed as huge pages, we hit a
> chicken-and-egg issue: when trying to allocate memory for the node's cgroup
> descriptor, we try to free some memory, and in this path cgroup services are
> called which assume the node's cgroup descriptor is already allocated.
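
The ordering problem described above can be illustrated with a toy model. This is not kernel code and does not claim to reflect what the upstream patch does: `ToyNode`, `reclaim_unguarded`, `reclaim_guarded` and the other names are all hypothetical, and the guard shown is only one possible way to break the cycle.

```python
class ToyNode:
    """Stand-in for a NUMA node; 'descriptor' plays the per-node cgroup data."""

    def __init__(self, free_pages):
        self.free_pages = free_pages
        self.descriptor = None  # stays None until init_descriptor() finishes


def reclaim_unguarded(node):
    # Assumes the descriptor already exists, like the faulting path.
    return node.descriptor["reclaimable"]


def reclaim_guarded(node):
    # Hypothetical guard: nothing to reclaim if the descriptor is missing.
    if node.descriptor is None:
        return 0
    return node.descriptor["reclaimable"]


def alloc_page(node, reclaim):
    # With no free pages, the allocator tries reclaim before giving up.
    if node.free_pages == 0:
        node.free_pages += reclaim(node)
    if node.free_pages == 0:
        raise MemoryError("out of memory")
    node.free_pages -= 1


def init_descriptor(node, reclaim):
    alloc_page(node, reclaim)             # memory for the descriptor itself
    node.descriptor = {"reclaimable": 0}


def outcome(reclaim):
    node = ToyNode(free_pages=0)          # hugepages took every page
    try:
        init_descriptor(node, reclaim)
    except TypeError:
        return "crash"                    # None dereference, like the oops
    except MemoryError:
        return "oom"
    return "ok"


print(outcome(reclaim_unguarded))
print(outcome(reclaim_guarded))
```

In this model the unguarded path fails on the missing descriptor, while the guarded path falls through to a plain out-of-memory failure, which is the behaviour the report says would be acceptable.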
>
> I'm working on a patch which fixes this panic, but I think it is
> expected that the system fails due to OOM when all the pages are assigned
> to huge pages.
>
> Patch sent upstream, waiting for review:
> https://patchwork.kernel.org/patch/9573799/
>
> ** Affects: ubuntu
> Importance: Undecided
> Assignee: Taco Screen team (taco-screen-team)
> Status: New
>
>
> ** Tags: architecture-ppc64le bugnameltc-150852 severity-high
> targetmilestone-inin1704
--
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1665113
Title:
[Ubuntu 17.04] Kernel panics when large number of hugepages is passed
as an boot argument to kernel.