Louis,
While we can't test this without access to a machine with large amounts
of memory, is it possible to apply this patch and provide an image to
IBM for testing?
Michael
On 02/01/2017 11:09 PM, bugproxy wrote:
> Public bug reported:
>
> Problem Description
> ===========================
> On Ubuntu 16.10, we tried kdump on a Brazos system (32TB memory and 192
> cores). When a panic is triggered, the kdump kernel gets stuck during boot
> and a forced reboot is needed. After the reboot, the system had captured
> only a vmcore-incomplete file.
>
> Steps to Reproduce:
> ======================
> 1- Install Ubuntu 16.10
> 2- Boot the system with 31TB of memory and 192 cores
> 3- Configure kdump on the system
> 4- Verify that kdump is running
> 5- Trigger a panic on the system
>
> Actual Result
> --------------------------
> The kdump kernel gets stuck during boot; a forced reboot is required.
>
> Expected Result
> -----------------------------
> Kdump proceeds and the vmcore is captured successfully.
>
> LOG:
>
> root@ltc-brazos1:~# cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic
> root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash
> crashkernel=4096M
> root@ltc-brazos1:~# kdump-config show
> DUMP_MODE: kdump
> USE_KDUMP: 1
> KDUMP_SYSCTL: kernel.panic_on_oops=1
> KDUMP_COREDIR: /var/crash
> crashkernel addr:
> /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-30-generic
> kdump initrd:
> /var/lib/kdump/initrd.img: symbolic link to
> /var/lib/kdump/initrd.img-4.4.0-30-generic
> current state: ready to kdump
>
> kexec command:
> /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic
> root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash irqpoll
> nr_cpus=1 nousb systemd.unit=kdump-tools.service"
> --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
> root@ltc-brazos1:~#
> root@ltc-brazos1:~# dpkg -l | grep kdump
> ii kdump-tools 1:1.6.0-2 all
> scripts and tools for automating kdump (Linux crash dumps)
> root@ltc-brazos1:~#
> root@ltc-brazos1:~# echo c > /proc/sysrq-trigger
>
>
> ltc-brazos1 login: [ 416.229464] sysrq: SysRq : Trigger a crash
> [ 416.229496] Unable to handle kernel paging request for data at address 0x00000000
> [ 416.229502] Faulting instruction address: 0xc000000000670014
> [ 416.229508] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 416.229511] SMP NR_CPUS=2048 NUMA pSeries
> [ 416.229517] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
> [ 416.229532] CPU: 65 PID: 404785 Comm: bash Not tainted 4.4.0-30-generic #49-Ubuntu
> [ 416.229537] task: c00001f9d583c8e0 ti: c00001fa13cd8000 task.ti: c00001fa13cd8000
> [ 416.229543] NIP: c000000000670014 LR: c0000000006710c8 CTR: c00000000066ffe0
> [ 416.229548] REGS: c00001fa13cdb990 TRAP: 0300 Not tainted (4.4.0-30-generic)
> [ 416.229552] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242222 XER: 00000001
> [ 416.229565] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
> GPR00: c0000000006710c8 c00001fa13cdbc10 c0000000015b5d00 0000000000000063
> GPR04: c00001fab9049c50 c00001fab905b4e0 c0001f3fff3d0000 0000000000000313
> GPR08: 0000000000000007 0000000000000001 0000000000000000 c0001f3fff3dec68
> GPR12: c00000000066ffe0 c000000007546980 ffffffffffffffff 0000000022000000
> GPR16: 0000000010170dc8 00000100174901d8 0000000010140f58 00000000100c7570
> GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
> GPR24: 00003ffff8966c94 0000000000000001 c0000000014f8e58 0000000000000004
> GPR28: c0000000014f9218 0000000000000063 c0000000014b11dc 0000000000000000
> [ 416.229631] NIP [c000000000670014] sysrq_handle_crash+0x34/0x50
> [ 416.229636] LR [c0000000006710c8] __handle_sysrq+0xe8/0x270
> [ 416.229640] Call Trace:
> [ 416.229645] [c00001fa13cdbc10] [c000000000e08f28] _fw_tigon_tg3_bin_name+0x2ce58/0x342b0 (unreliable)
> [ 416.229652] [c00001fa13cdbc30] [c0000000006710c8] __handle_sysrq+0xe8/0x270
> [ 416.229658] [c00001fa13cdbcd0] [c000000000671868] write_sysrq_trigger+0x78/0xa0
> [ 416.229666] [c00001fa13cdbd00] [c00000000037ae30] proc_reg_write+0xb0/0x110
> [ 416.229673] [c00001fa13cdbd50] [c0000000002e186c] __vfs_write+0x6c/0xe0
> [ 416.229678] [c00001fa13cdbd90] [c0000000002e25a0] vfs_write+0xc0/0x230
> [ 416.229684] [c00001fa13cdbde0] [c0000000002e35dc] SyS_write+0x6c/0x110
> [ 416.229690] [c00001fa13cdbe30] [c000000000009204] system_call+0x38/0xb4
> [ 416.229695] Instruction dump:
> [ 416.229698] 38425d20 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 394931e4
> [ 416.229707] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
> [ 416.229717] ---[ end trace 16e5fbbf7faa7340 ]---
> [ 416.232059]
> [ 416.232086] Sending IPI to other CPUs
> [ 416.242558] IPI complete
>
> I'm in purgatory
> -> smp_release_cpus()
> spinning_secondaries = 1528
> <- smp_release_cpus()
> <- setup_system()
> [ 1.146155] sd 0:2:1:0: [sdb] Assuming drive cache: write through
> [ 1.154176] sd 0:2:0:0: [sda] Assuming drive cache: write through
> /dev/sdb2: recovering journal
> /dev/sdb2: clean, 69482/136331264 files, 9047821/545318400 blocks
>
> ---------------------------------------------------------------------------------------
> [Console output: the boot splash text "Ubuntu 16.10" followed by progress
> dots, interleaved with stray terminal escape sequences, repeated many times
> with no further boot progress shown.]
> ---------------------------------------------------------------------------------------
>
> After the forced reboot:
>
> root@ltc-brazos1:/var/crash# ls
> 201607161510 kexec_cmd
> root@ltc-brazos1:/var/crash# cd 201607161510/
> root@ltc-brazos1:/var/crash/201607161510# ls
> vmcore-incomplete
> root@ltc-brazos1:
>
> Note: We waited more than 2 hours for the kdump process.
>
> Regards
> Praveen
>
> == Comment: #12 - Vaishnavi Bhat <[email protected]> - 2016-09-16 02:40:20
> ==
> root@ltc-brazos1:~# kdump-config show
> DUMP_MODE: kdump
> USE_KDUMP: 1
> KDUMP_SYSCTL: kernel.panic_on_oops=1
> KDUMP_COREDIR: /var/crash
> crashkernel addr:
> /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-9136-generic
> kdump initrd:
> /var/lib/kdump/initrd.img: symbolic link to
> /var/lib/kdump/initrd.img-4.4.0-9136-generic
> current state: ready to kdump
>
> kexec command:
> /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic
> root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro quiet splash irqpoll
> nr_cpus=1 nousb systemd.unit=kdump-tools.service"
> --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
>
> root@ltc-brazos1:~# cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic
> root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet
> splash crashkernel=4096M
>
> root@ltc-brazos1:~# dmesg | grep -i crash
> [ 0.000000] Reserving 4096MB of memory at 128MB for crashkernel (System
> RAM: 31744000MB)
> [ 0.000000] Kernel command line:
> BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic
> root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet
> splash crashkernel=4096M
>
> == Comment: #26 - Hari Krishna Bathini <[email protected]> - 2017-02-01
> 02:02:36 ==
> The following kexec-tools commit is needed to fix this issue:
>
> commit f63d8530b9b6a2d7e79b946e326e5a2197eb8f87
> Author: Petr Tesarik <[email protected]>
> Date: Thu Jan 19 18:37:09 2017 +0100
>
> ppc64: Reduce number of ELF LOAD segments
>
> The number of program header table entries (e_phnum) is an Elf64_Half,
> which is a 16-bit entity, i.e. the limit is 65534 entries (one entry is
> reserved for NOTE). This is a hard limit, defined by the ELF standard.
> It is possible that more LMBs (Logical Memory Blocks) are needed to
> represent all RAM on some machines, and this field overflows, causing
> an incomplete /proc/vmcore file.
>
> This has actually happened on a machine with 31TB of RAM and an LMB size
> of 256MB.
>
> However, since there is usually no memory hole between adjacent LMBs, the
> map can be "compressed", combining multiple adjacent LMBs into a single
> LOAD segment.
>
> Signed-off-by: Petr Tesarik <[email protected]>
> Signed-off-by: Simon Horman <[email protected]>
>
> ** Affects: kexec-tools (Ubuntu)
> Importance: Undecided
> Assignee: Taco Screen team (taco-screen-team)
> Status: New
>
>
> ** Tags: architecture-ppc64le bugnameltc-143828 severity-high
> targetmilestone-inin---
>
> ** Tags added: architecture-ppc64le bugnameltc-143828 severity-high
> targetmilestone-inin---
>
> ** Changed in: ubuntu
> Assignee: (unassigned) => Taco Screen team (taco-screen-team)
>
> ** Package changed: ubuntu => kexec-tools (Ubuntu)
>
--
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1661168
Title:
In Ubuntu16.10: Kdump stuck in boot for longer time need to force
reboot via HMC in 32TB Brazos System
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/1661168/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs