Public bug reported:
Problem Description
===========================
On Ubuntu 16.10, we tried kdump on a Brazos system (32TB of memory, 192 cores).
When a panic is triggered, the kdump process gets stuck during boot and the
system has to be force-rebooted. After the reboot, the system has captured only
a vmcore-incomplete file.
Steps to Reproduce:
======================
1- Install Ubuntu 16.10
2- Boot the system with 31TB of memory and 192 cores
3- Configure kdump on the system
4- Verify that kdump is running on the system
5- Trigger a panic on the system
Actual Result
--------------------------
The kdump process gets stuck during boot; a forced reboot is needed.
Expected Result
-----------------------------
Kdump proceeds and the vmcore is captured successfully.
LOG:
root@ltc-brazos1:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash crashkernel=4096M
root@ltc-brazos1:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-30-generic
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to
/var/lib/kdump/initrd.img-4.4.0-30-generic
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic
root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash irqpoll
nr_cpus=1 nousb systemd.unit=kdump-tools.service"
--initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@ltc-brazos1:~#
root@ltc-brazos1:~# dpkg -l | grep kdump
ii kdump-tools 1:1.6.0-2 all
scripts and tools for automating kdump (Linux crash dumps)
root@ltc-brazos1:~#
root@ltc-brazos1:~# echo c > /proc/sysrq-trigger
ltc-brazos1 login: [ 416.229464] sysrq: SysRq : Trigger a crash
[ 416.229496] Unable to handle kernel paging request for data at address 0x00000000
[ 416.229502] Faulting instruction address: 0xc000000000670014
[ 416.229508] Oops: Kernel access of bad area, sig: 11 [#1]
[ 416.229511] SMP NR_CPUS=2048 NUMA pSeries
[ 416.229517] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
[ 416.229532] CPU: 65 PID: 404785 Comm: bash Not tainted 4.4.0-30-generic #49-Ubuntu
[ 416.229537] task: c00001f9d583c8e0 ti: c00001fa13cd8000 task.ti: c00001fa13cd8000
[ 416.229543] NIP: c000000000670014 LR: c0000000006710c8 CTR: c00000000066ffe0
[ 416.229548] REGS: c00001fa13cdb990 TRAP: 0300 Not tainted (4.4.0-30-generic)
[ 416.229552] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242222 XER: 00000001
[ 416.229565] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000006710c8 c00001fa13cdbc10 c0000000015b5d00 0000000000000063
GPR04: c00001fab9049c50 c00001fab905b4e0 c0001f3fff3d0000 0000000000000313
GPR08: 0000000000000007 0000000000000001 0000000000000000 c0001f3fff3dec68
GPR12: c00000000066ffe0 c000000007546980 ffffffffffffffff 0000000022000000
GPR16: 0000000010170dc8 00000100174901d8 0000000010140f58 00000000100c7570
GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
GPR24: 00003ffff8966c94 0000000000000001 c0000000014f8e58 0000000000000004
GPR28: c0000000014f9218 0000000000000063 c0000000014b11dc 0000000000000000
[ 416.229631] NIP [c000000000670014] sysrq_handle_crash+0x34/0x50
[ 416.229636] LR [c0000000006710c8] __handle_sysrq+0xe8/0x270
[ 416.229640] Call Trace:
[ 416.229645] [c00001fa13cdbc10] [c000000000e08f28] _fw_tigon_tg3_bin_name+0x2ce58/0x342b0 (unreliable)
[ 416.229652] [c00001fa13cdbc30] [c0000000006710c8] __handle_sysrq+0xe8/0x270
[ 416.229658] [c00001fa13cdbcd0] [c000000000671868] write_sysrq_trigger+0x78/0xa0
[ 416.229666] [c00001fa13cdbd00] [c00000000037ae30] proc_reg_write+0xb0/0x110
[ 416.229673] [c00001fa13cdbd50] [c0000000002e186c] __vfs_write+0x6c/0xe0
[ 416.229678] [c00001fa13cdbd90] [c0000000002e25a0] vfs_write+0xc0/0x230
[ 416.229684] [c00001fa13cdbde0] [c0000000002e35dc] SyS_write+0x6c/0x110
[ 416.229690] [c00001fa13cdbe30] [c000000000009204] system_call+0x38/0xb4
[ 416.229695] Instruction dump:
[ 416.229698] 38425d20 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 394931e4
[ 416.229707] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
[ 416.229717] ---[ end trace 16e5fbbf7faa7340 ]---
[ 416.232059]
[ 416.232086] Sending IPI to other CPUs
[ 416.242558] IPI complete
I'm in purgatory
-> smp_release_cpus()
spinning_secondaries = 1528
<- smp_release_cpus()
<- setup_system()
[ 1.146155] sd 0:2:1:0: [sdb] Assuming drive cache: write through
[ 1.154176] sd 0:2:0:0: [sda] Assuming drive cache: write through
/dev/sdb2: recovering journal
/dev/sdb2: clean, 69482/136331264 files, 9047821/545318400 blocks
[From this point, the console output consists only of the Ubuntu 16.10 boot splash text ("Ubuntu 16.10 . . . ."), repeated over and over and garbled by terminal escape sequences; the boot makes no further visible progress.]
After the forced reboot:
root@ltc-brazos1:/var/crash# ls
201607161510 kexec_cmd
root@ltc-brazos1:/var/crash# cd 201607161510/
root@ltc-brazos1:/var/crash/201607161510# ls
vmcore-incomplete
root@ltc-brazos1:
Note: we waited for the kdump process for more than 2 hours.
Regards
Praveen
== Comment: #12 - Vaishnavi Bhat - 2016-09-16 02:40:20 ==
root@ltc-brazos1:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
/var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-9136-generic
kdump initrd:
/var/lib/kdump/initrd.img: symbolic link to
/var/lib/kdump/initrd.img-4.4.0-9136-generic
current state: ready to kdump
kexec command:
/sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic
root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro quiet splash irqpoll
nr_cpus=1 nousb systemd.unit=kdump-tools.service"
--initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@ltc-brazos1:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet splash crashkernel=4096M
root@ltc-brazos1:~# dmesg | grep -i crash
[ 0.000000] Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 31744000MB)
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet splash crashkernel=4096M
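For scale, here is a rough calculation from the System RAM figure in the dmesg line above. It is a sketch under stated assumptions, not output from any tool. It assumes the 256MB LMB size cited in the kexec-tools commit quoted in comment #26. It also assumes that, before that fix, every LMB was emitted as its own LOAD program header, as the commit message implies. The constant names are hypothetical. truncated_phnum only illustrates what a 16-bit field can hold; it is not a claim about how kexec-tools actually stored the value.

```python
# Rough arithmetic behind the e_phnum overflow described in the kexec-tools fix.
# Assumptions (not taken from any tool output):
#   - the LMB size is 256MB, the figure quoted in the commit message;
#   - each LMB is emitted as its own PT_LOAD program header.

SYSTEM_RAM_MB = 31744000   # from the dmesg "System RAM" line
LMB_SIZE_MB = 256          # assumed LMB size
ELF64_HALF_MAX = 0xFFFF    # e_phnum is an Elf64_Half (16-bit unsigned)
MAX_LOAD_SEGMENTS = 65534  # limit stated in the commit message (one entry kept for NOTE)


def lmb_count(ram_mb: int, lmb_mb: int) -> int:
    """Number of LMBs needed to cover ram_mb, rounding a partial block up."""
    return -(-ram_mb // lmb_mb)


def truncated_phnum(n: int) -> int:
    """Value a 16-bit field would hold if n were stored without a range check."""
    return n & ELF64_HALF_MAX


if __name__ == "__main__":
    loads = lmb_count(SYSTEM_RAM_MB, LMB_SIZE_MB)
    phnum = loads + 1  # LOAD segments plus one PT_NOTE
    print(f"LOAD segments needed: {loads}")
    print(f"LOAD segment limit:   {MAX_LOAD_SEGMENTS}")
    print(f"within limit:         {loads <= MAX_LOAD_SEGMENTS}")
    print(f"e_phnum if truncated: {truncated_phnum(phnum)}")
```

Under those assumptions, the machine needs 124000 LOAD segments, nearly twice the 65534 limit, so the count cannot be represented in e_phnum.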
== Comment: #26 - Hari Krishna Bathini - 2017-02-01 02:02:36 ==
The following kexec-tools commit is needed to fix this issue:
commit f63d8530b9b6a2d7e79b946e326e5a2197eb8f87
Author: Petr Tesarik
Date: Thu Jan 19 18:37:09 2017 +0100
ppc64: Reduce number of ELF LOAD segments
The number of program header table entries (e_phnum) is an Elf64_Half,
which is a 16-bit entity, i.e. the limit is 65534 entries (one entry is
reserved for NOTE). This is a hard limit, defined by the ELF standard.
It is possible that more LMBs (Logical Memory Blocks) are needed to
represent all RAM on some machines, and this field overflows, causing
an incomplete /proc/vmcore file.
This has actually happened on a machine with 31TB of RAM and an LMB size
of 256MB.
However, since there is usually no memory hole between adjacent LMBs, the
map can be "compressed", combining multiple adjacent LMBs into a single LOAD
segment.
Signed-off-by: Petr Tesarik
Signed-off-by: Simon Horman
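The "compression" idea in the commit message can be sketched in Python. The actual kexec-tools change is C code in its ppc64 support and is not reproduced here. merge_adjacent and the memory layout in the example are hypothetical. LMBs are modelled as half-open (start, end) address ranges. Any range that starts at or before the end of the previous one is folded into it, so a run of LMBs with no hole between them becomes a single LOAD segment.

```python
from typing import List, Tuple

# Hypothetical model: a memory range is a half-open (start, end) byte-address pair.
Range = Tuple[int, int]


def merge_adjacent(ranges: List[Range]) -> List[Range]:
    """Coalesce ranges that touch or overlap into single ranges.

    Illustrates the idea from the commit message: when there is no hole
    between adjacent LMBs, one LOAD segment can cover all of them.
    """
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it instead of
            # starting a new segment.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # A hole separates this range from the previous one.
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    LMB = 256 << 20  # hypothetical 256MB LMB size, in bytes
    # Hypothetical layout: 8 contiguous LMBs, a one-LMB hole, then 4 more LMBs.
    lmbs = [(i * LMB, (i + 1) * LMB) for i in range(8)]
    lmbs += [(i * LMB, (i + 1) * LMB) for i in range(9, 13)]
    segments = merge_adjacent(lmbs)
    print(len(lmbs), "LMBs ->", len(segments), "LOAD segments")
    for start, end in segments:
        print(hex(start), hex(end))
```

With the hypothetical layout in the example, the twelve LMBs collapse into two LOAD segments, one on each side of the hole.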
** Affects: kexec-tools (Ubuntu)
Importance: Undecided
Assignee: Taco Screen team (taco-screen-team)
Status: New
** Tags: architecture-ppc64le bugnameltc-143828 severity-high
targetmilestone-inin---
** Tags added: architecture-ppc64le bugnameltc-143828 severity-high
targetmilestone-inin---
** Changed in: ubuntu
Assignee: (unassigned) => Taco Screen team (taco-screen-team)
** Package changed: ubuntu => kexec-tools (Ubuntu)
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1661168
Title:
In Ubuntu16.10: Kdump stuck in boot for longer time need to force
reboot via HMC in 32TB Brazos System
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/1661168/+subscriptions
--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs