Louis,

While we can't test this without access to a machine with large amounts
of memory, is it possible to apply this patch and provide an image to
IBM for testing?

                      Michael


On 02/01/2017 11:09 PM, bugproxy wrote:
> Public bug reported:
>
> Problem Description
> ===========================
>   On Ubuntu 16.10, kdump was tried on a Brazos system (32TB memory, 192 cores).
> When a panic is triggered, the kdump process gets stuck during boot and a
> forced reboot is needed. After the reboot, the system had captured only a
> vmcore-incomplete.
>
> Steps to Reproduce:
> ======================
> 1- Install Ubuntu 16.10
> 2- Boot the system with 31TB of memory and 192 cores
> 3- Configure kdump on the system
> 4- Verify that kdump is running on the system
> 5- Trigger a panic on the system
>
> Actual Result
> --------------------------
> The kdump process gets stuck during boot; a forced reboot is needed.
>
> Expected Result
> -----------------------------
> Kdump proceeds and the vmcore is captured successfully.
>
> LOG:
>
> root@ltc-brazos1:~# cat /proc/cmdline 
> BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic 
> root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash 
> crashkernel=4096M
> root@ltc-brazos1:~# kdump-config show
> DUMP_MODE:        kdump
> USE_KDUMP:        1
> KDUMP_SYSCTL:     kernel.panic_on_oops=1
> KDUMP_COREDIR:    /var/crash
> crashkernel addr: 
>    /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-30-generic
> kdump initrd: 
>    /var/lib/kdump/initrd.img: symbolic link to 
> /var/lib/kdump/initrd.img-4.4.0-30-generic
> current state:    ready to kdump
>
> kexec command:
>   /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-30-generic 
> root=UUID=516c4b1b-6700-4b55-bd37-d61c4c5af6af ro quiet splash irqpoll 
> nr_cpus=1 nousb systemd.unit=kdump-tools.service" 
> --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
> root@ltc-brazos1:~# 
> root@ltc-brazos1:~# dpkg -l | grep kdump         
> ii  kdump-tools                        1:1.6.0-2                         all  
>         scripts and tools for automating kdump (Linux crash dumps)
> root@ltc-brazos1:~# 
> root@ltc-brazos1:~# echo c > /proc/sysrq-trigger 
>
>
> ltc-brazos1 login: [  416.229464] sysrq: SysRq : Trigger a crash
> [  416.229496] Unable to handle kernel paging request for data at address 0x00000000
> [  416.229502] Faulting instruction address: 0xc000000000670014
> [  416.229508] Oops: Kernel access of bad area, sig: 11 [#1]
> [  416.229511] SMP NR_CPUS=2048 NUMA pSeries
> [  416.229517] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
> [  416.229532] CPU: 65 PID: 404785 Comm: bash Not tainted 4.4.0-30-generic #49-Ubuntu
> [  416.229537] task: c00001f9d583c8e0 ti: c00001fa13cd8000 task.ti: c00001fa13cd8000
> [  416.229543] NIP: c000000000670014 LR: c0000000006710c8 CTR: c00000000066ffe0
> [  416.229548] REGS: c00001fa13cdb990 TRAP: 0300   Not tainted  (4.4.0-30-generic)
> [  416.229552] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28242222  XER: 00000001
> [  416.229565] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
> GPR00: c0000000006710c8 c00001fa13cdbc10 c0000000015b5d00 0000000000000063
> GPR04: c00001fab9049c50 c00001fab905b4e0 c0001f3fff3d0000 0000000000000313
> GPR08: 0000000000000007 0000000000000001 0000000000000000 c0001f3fff3dec68
> GPR12: c00000000066ffe0 c000000007546980 ffffffffffffffff 0000000022000000 
> GPR16: 0000000010170dc8 00000100174901d8 0000000010140f58 00000000100c7570 
> GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608 
> GPR24: 00003ffff8966c94 0000000000000001 c0000000014f8e58 0000000000000004 
> GPR28: c0000000014f9218 0000000000000063 c0000000014b11dc 0000000000000000 
> [  416.229631] NIP [c000000000670014] sysrq_handle_crash+0x34/0x50
> [  416.229636] LR [c0000000006710c8] __handle_sysrq+0xe8/0x270
> [  416.229640] Call Trace:
> [  416.229645] [c00001fa13cdbc10] [c000000000e08f28] 
> _fw_tigon_tg3_bin_name+0x2ce58/0x342b0 (unreliable)
> [  416.229652] [c00001fa13cdbc30] [c0000000006710c8] __handle_sysrq+0xe8/0x270
> [  416.229658] [c00001fa13cdbcd0] [c000000000671868] 
> write_sysrq_trigger+0x78/0xa0
> [  416.229666] [c00001fa13cdbd00] [c00000000037ae30] proc_reg_write+0xb0/0x110
> [  416.229673] [c00001fa13cdbd50] [c0000000002e186c] __vfs_write+0x6c/0xe0
> [  416.229678] [c00001fa13cdbd90] [c0000000002e25a0] vfs_write+0xc0/0x230
> [  416.229684] [c00001fa13cdbde0] [c0000000002e35dc] SyS_write+0x6c/0x110
> [  416.229690] [c00001fa13cdbe30] [c000000000009204] system_call+0x38/0xb4
> [  416.229695] Instruction dump:
> [  416.229698] 38425d20 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 
> 394931e4 
> [  416.229707] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 
> e8010010 7c0803a6 
> [  416.229717] ---[ end trace 16e5fbbf7faa7340 ]---
> [  416.232059] 
> [  416.232086] Sending IPI to other CPUs
> [  416.242558] IPI complete
> I'm in purgatory
>  -> smp_release_cpus()
> spinning_secondaries = 1528
>  <- smp_release_cpus()
>  <- setup_system()
> [    1.146155] sd 0:2:1:0: [sdb] Assuming drive cache: write through
> [    1.154176] sd 0:2:0:0: [sda] Assuming drive cache: write through
> /dev/sdb2: recovering journal
> /dev/sdb2: clean, 69482/136331264 files, 9047821/545318400 blocks
>
> ---------------------------------------------------------------------------------------
> --------------------------------------------------------------------------------------
> Ubuntu 16.10 .  .  .  .
> [the boot splash line above repeats continuously on the console, interleaved
> with terminal escape sequences, until the forced reboot]
>
>
> ---------------------------------------------------------------------------------------
> --------------------------------------------------------------------------------------
>
> after force reboot
>
> root@ltc-brazos1:/var/crash# ls
> 201607161510  kexec_cmd
> root@ltc-brazos1:/var/crash# cd 201607161510/
> root@ltc-brazos1:/var/crash/201607161510# ls
> vmcore-incomplete
> root@ltc-brazos1:
>
> Note: waited more than 2 hours for the kdump process to finish.
>
> Regards
> Praveen
>
> == Comment: #12 - Vaishnavi Bhat <[email protected]> - 2016-09-16 02:40:20 
> ==
> root@ltc-brazos1:~# kdump-config show 
> DUMP_MODE:        kdump
> USE_KDUMP:        1
> KDUMP_SYSCTL:     kernel.panic_on_oops=1
> KDUMP_COREDIR:    /var/crash
> crashkernel addr: 
>    /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinux-4.4.0-9136-generic
> kdump initrd: 
>    /var/lib/kdump/initrd.img: symbolic link to 
> /var/lib/kdump/initrd.img-4.4.0-9136-generic
> current state:    ready to kdump
>
> kexec command:
>   /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic 
> root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro quiet splash irqpoll 
> nr_cpus=1 nousb systemd.unit=kdump-tools.service" 
> --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
>
> root@ltc-brazos1:~# cat /proc/cmdline 
> BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic 
> root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet 
> splash crashkernel=4096M
>
> root@ltc-brazos1:~# dmesg  | grep -i crash
> [    0.000000] Reserving 4096MB of memory at 128MB for crashkernel (System 
> RAM: 31744000MB)
> [    0.000000] Kernel command line: 
> BOOT_IMAGE=/boot/vmlinux-4.4.0-9136-generic 
> root=UUID=bfdd4041-1b2f-42b1-b202-2c09f781bbcc ro crashkernel=4096M quiet 
> splash crashkernel=4096M
>
> == Comment: #26 - Hari Krishna Bathini <[email protected]> - 2017-02-01 
> 02:02:36 ==
> The following kexec-tools commit is needed to fix this issue:
>
>   commit f63d8530b9b6a2d7e79b946e326e5a2197eb8f87
>   Author: Petr Tesarik <[email protected]>
>   Date:   Thu Jan 19 18:37:09 2017 +0100
>
>     ppc64: Reduce number of ELF LOAD segments
>     
>     The number of program header table entries (e_phnum) is an Elf64_Half,
>     which is a 16-bit entity, i.e. the limit is 65534 entries (one entry is
>     reserved for NOTE). This is a hard limit, defined by the ELF standard.
>     It is possible that more LMBs (Logical Memory Blocks) are needed to
>     represent all RAM on some machines, and this field overflows, causing
>     an incomplete /proc/vmcore file.
>     
>     This has actually happened on a machine with 31TB of RAM and an LMB size
>     of 256MB.
>     
>     However, since there is usually no memory hole between adjacent LMBs, the
>     map can be "compressed", combining multiple adjacent into a single LOAD
>     segment.
>     
>     Signed-off-by: Petr Tesarik <[email protected]>
>     Signed-off-by: Simon Horman <[email protected]>
>
> ** Affects: kexec-tools (Ubuntu)
>      Importance: Undecided
>      Assignee: Taco Screen team (taco-screen-team)
>          Status: New
>
>
> ** Tags: architecture-ppc64le bugnameltc-143828 severity-high 
> targetmilestone-inin---
>
> ** Tags added: architecture-ppc64le bugnameltc-143828 severity-high
> targetmilestone-inin---
>
> ** Changed in: ubuntu
>      Assignee: (unassigned) => Taco Screen team (taco-screen-team)
>
> ** Package changed: ubuntu => kexec-tools (Ubuntu)
>
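
For readers following the quoted commit message, the arithmetic and the
"compression" idea can be sketched roughly as below. This is only an
illustration in Python, not the kexec-tools code (which is written in C).
The function name merge_adjacent, the constants, and the assumption that
the LMBs are contiguous and start at address 0 are hypothetical, chosen
for the example. The 31TB and 256MB figures come from the commit message.

```python
# Illustrative sketch only; not taken from kexec-tools.
# Assumption: LMBs are contiguous 256MB blocks starting at address 0.

ELF_HALF_MAX = 0xFFFF          # largest value a 16-bit e_phnum can hold
LMB_SIZE = 256 * 2**20         # 256MB, as stated in the commit message
RAM_SIZE = 31 * 2**40          # 31TB, as stated in the commit message


def merge_adjacent(ranges):
    """Combine (start, end) ranges whose end touches the next start.

    'end' is exclusive. Ranges separated by a hole are kept apart.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and merged[-1][1] == start:
            # No hole between the previous range and this one: extend it.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# One range per LMB, as if each LMB became its own LOAD segment.
lmbs = [(i * LMB_SIZE, (i + 1) * LMB_SIZE) for i in range(RAM_SIZE // LMB_SIZE)]

segments = merge_adjacent(lmbs)

if __name__ == "__main__":
    print("LMBs:", len(lmbs), "exceeds 16-bit field:", len(lmbs) > ELF_HALF_MAX)
    print("LOAD segments after merging:", len(segments))
```

With these numbers there are 126976 LMBs, roughly twice what a 16-bit
field can count, which matches the overflow the commit message describes.
Under the contiguity assumption they collapse into a single range. A real
memory map with holes would yield one range per contiguous run.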

-- 
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1661168

Title:
  In Ubuntu16.10: Kdump stuck in  boot for longer time need to force
  reboot via HMC in 32TB Brazos System

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/kexec-tools/+bug/1661168/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
