Public bug reported:

Problem Description
==========================
On talclp1, I enabled kdump, but kdump failed and the system dropped to a BusyBox shell.

root@talclp1:~# echo c> /proc/sysrq-trigger
[ 132.643690] sysrq: SysRq : Trigger a crash
[ 132.643739] Unable to handle kernel paging request for data at address 0x00000000
[ 132.643745] Faulting instruction address: 0xc0000000005c28f4
[ 132.643749] Oops: Kernel access of bad area, sig: 11 [#1]
[ 132.643753] SMP NR_CPUS=2048 NUMA pSeries
[ 132.643758] Modules linked in: fuse ufs qnx4 hfsplus hfs minix ntfs msdos jfs rpadlpar_io rpaphp rpcsec_gss_krb5 nfsv4 dccp_diag cifs nfs dns_resolver dccp tcp_diag fscache udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xfs libcrc32c pseries_rng rng_core ghash_generic gf128mul vmx_crypto sg nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto mbcache crc32c_generic btrfs xor raid6_pq dm_round_robin sr_mod sd_mod cdrom ses enclosure scsi_transport_sas ibmveth crc32c_vpmsum ipr scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath dm_mod
[ 132.643819] CPU: 49 PID: 10174 Comm: bash Not tainted 4.8.0-15-generic #16-Ubuntu
[ 132.643824] task: c000000111767080 task.stack: c0000000d82e0000
[ 132.643828] NIP: c0000000005c28f4 LR: c0000000005c39d8 CTR: c0000000005c28c0
[ 132.643832] REGS: c0000000d82e3990 TRAP: 0300 Not tainted (4.8.0-15-generic)
[ 132.643836] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242422 XER: 00000001
[ 132.643848] CFAR: c0000000000087d0 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000005c39d8 c0000000d82e3c10 c000000000f67b00 0000000000000063
GPR04: c00000011d04a9b8 c00000011d05f7e0 c00000047fb00000 0000000000015998
GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
GPR12: c0000000005c28c0 c000000007b4b900 ffffffffffffffff 0000000022000000
GPR16: 0000000010170dc8 000001002b566368 0000000010140f58 00000000100c7570
GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
GPR24: 00003ffffe87a294 0000000000000001 c000000000ebff60 0000000000000004
GPR28: c000000000ec0320 0000000000000063 c000000000e72a90 0000000000000000
[ 132.643906] NIP [c0000000005c28f4] sysrq_handle_crash+0x34/0x50
[ 132.643911] LR [c0000000005c39d8] __handle_sysrq+0xe8/0x280
[ 132.643914] Call Trace:
[ 132.643917] [c0000000d82e3c10] [c000000000a245e8] 0xc000000000a245e8 (unreliable)
[ 132.643923] [c0000000d82e3c30] [c0000000005c39d8] __handle_sysrq+0xe8/0x280
[ 132.643928] [c0000000d82e3cd0] [c0000000005c4188] write_sysrq_trigger+0x78/0xa0
[ 132.643935] [c0000000d82e3d00] [c0000000003ad770] proc_reg_write+0xb0/0x110
[ 132.643941] [c0000000d82e3d50] [c00000000030fc3c] __vfs_write+0x6c/0xe0
[ 132.643946] [c0000000d82e3d90] [c000000000311144] vfs_write+0xd4/0x240
[ 132.643950] [c0000000d82e3de0] [c000000000312e5c] SyS_write+0x6c/0x110
[ 132.643957] [c0000000d82e3e30] [c0000000000095e0] system_call+0x38/0x108
[ 132.643961] Instruction dump:
[ 132.643963] 38425240 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 3949ba60
[ 132.643972] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
[ 132.643981] ---[ end trace eed6bbcd2c3bdfdf ]---
[ 132.646105]
[ 132.646176] Sending IPI to other CPUs
[ 132.647490] IPI complete
I'm in purgatory
-> smp_release_cpus()
spinning_secondaries = 104
<- smp_release_cpus()
[ 2.011346] alg: hash: Test 1 failed for crc32c-vpmsum
[ 2.729254] sd 0:2:0:0: [sda] Assuming drive cache: write through
[ 2.731554] sd 1:2:5:0: [sdn] Assuming drive cache: write through
[ 2.739087] sd 1:2:4:0: [sdm] Assuming drive cache: write through
[ 2.739089] sd 1:2:6:0: [sdo] Assuming drive cache: write through
[ 2.739110] sd 1:2:7:0: [sdp] Assuming drive cache: write through
[ 2.739115] sd 1:2:0:0: [sdi] Assuming drive cache: write through
[ 2.739122] sd 1:2:3:0: [sdl] Assuming drive cache: write through
[ 2.739123] sd 1:2:2:0: [sdk] Assuming drive cache: write through
[ 2.739148] sd 1:2:1:0: [sdj] Assuming drive cache: write through
[ 2.748938] sd 0:2:1:0: [sdb] Assuming drive cache: write through
[ 2.748939] sd 0:2:7:0: [sdh] Assuming drive cache: write through
[ 2.748940] sd 0:2:6:0: [sdg] Assuming drive cache: write through
[ 2.748942] sd 0:2:2:0: [sdc] Assuming drive cache: write through
[ 2.748958] sd 0:2:5:0: [sdf] Assuming drive cache: write through
[ 2.748963] sd 0:2:4:0: [sde] Assuming drive cache: write through
[ 2.748978] sd 0:2:3:0: [sdd] Assuming drive cache: write through
[ 2.999087] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.119912] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.252513] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.343680] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.381234] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.419515] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.474587] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.482188] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.531439] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.552824] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.594489] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.619222] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.672208] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.680298] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.731718] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.761333] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.794955] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.819212] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.871913] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.889439] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 3.922620] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 3.960707] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.002959] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.035611] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.054476] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.092241] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.099432] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.182358] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.182823] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.234767] device-mapper: table: 254:1: multipath: error attaching hardware handler
[ 4.333309] device-mapper: table: 254:0: multipath: error attaching hardware handler
[ 4.402827] device-mapper: table: 254:0: multipath: error attaching hardware handler
Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! UUID=853769e5-1dc5-41be-a689-b430320d207f does not exist. Dropping to a shell!

BusyBox v1.22.1 (Ubuntu 1:1.22.0-19ubuntu2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

== Comment: #7 - Vaishnavi Bhat <[email protected]> - 2016-10-07 05:37:53 ==
The blkid output does not show any device with UUID=853769e5-1dc5-41be-a689-b430320d207f which is the root device used in the kexec command line (from kdump-config show)

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinux-4.8.0-15-generic root=UUID=853769e5-1dc5-41be-a689-b430320d207f ro xmon=on splash quiet irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

Hence the kdump kernel is failing to boot here.

== Comment: #11 - Xue Sheng Li <[email protected]> - 2016-10-17 01:54:56 ==
recreated with -24 kernel.

root@talclp1:~# echo c > /proc/sysrq-trigger
[ 72.655416] sysrq: SysRq : Trigger a crash
[ 72.655458] Unable to handle kernel paging request for data at address 0x00000000
[ 72.655463] Faulting instruction address: 0xc00000000069d148
[ 72.655469] Oops: Kernel access of bad area, sig: 11 [#1]
[ 72.655472] SMP NR_CPUS=2048 NUMA pSeries
[ 72.655477] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 nfsv4 nfs cifs fscache binfmt_misc xfs pseries_rng vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs xor raid6_pq dm_round_robin ses enclosure scsi_transport_sas bnx2x ipr mdio libcrc32c crc32c_vpmsum scsi_dh_emc scsi_dh_rdac scsi_dh_alua dm_multipath
[ 72.655521] CPU: 25 PID: 9730 Comm: bash Not tainted 4.8.0-24-generic #26-Ubuntu
[ 72.655525] task: c0000001d8451e00 task.stack: c0000001d8494000
[ 72.655529] NIP: c00000000069d148 LR: c00000000069e198 CTR: c00000000069d120
[ 72.655534] REGS: c0000001d84979f0 TRAP: 0300 Not tainted (4.8.0-24-generic)
[ 72.655537] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28242222 XER: 00000001
[ 72.655549] CFAR: c000000000008750 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c00000000069e198 c0000001d8497c70 c000000001476700 0000000000000063
GPR04: c00000047e64aca0 c00000047e65fb40 c00000047df00000 0000000000015ed8
GPR08: 0000000000000007 0000000000000001 0000000000000000 0000000000000001
GPR12: c00000000069d120 c000000007b3e100 ffffffffffffffff 0000000022000000
GPR16: 0000000010170dc8 0000010036d36398 0000000010140f58 00000000100c7570
GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
GPR24: 00003ffff5582464 0000000000000001 c00000000138e6a0 0000000000000004
GPR28: c00000000138ea60 0000000000000063 c000000001342590 0000000000000000
[ 72.655608] NIP [c00000000069d148] sysrq_handle_crash+0x28/0x30
[ 72.655613] LR [c00000000069e198] __handle_sysrq+0xe8/0x280
[ 72.655616] Call Trace:
[ 72.655619] [c0000001d8497c70] [c00000000069e178] __handle_sysrq+0xc8/0x280 (unreliable)
[ 72.655625] [c0000001d8497d10] [c00000000069e8ec] write_sysrq_trigger+0x6c/0x90
[ 72.655631] [c0000001d8497d40] [c0000000003a9568] proc_reg_write+0x88/0xd0
[ 72.655637] [c0000001d8497d70] [c00000000030c40c] __vfs_write+0x3c/0x70
[ 72.655642] [c0000001d8497d90] [c00000000030d674] vfs_write+0xd4/0x240
[ 72.655647] [c0000001d8497de0] [c00000000030f1c8] SyS_write+0x68/0x110
[ 72.655652] [c0000001d8497e30] [c000000000009584] system_call+0x38/0xec
[ 72.655656] Instruction dump:
[ 72.655658] 60000000 60000000 3c4c00de 384295e0 7c0802a6 60000000 3d22001a 3949c8e0
[ 72.655667] 39200001 912a0000 7c0004ac 39400000 <992a0000> 4e800020 3c4c00de 384295b0
[ 72.655677] ---[ end trace 43b490f085103bf5 ]---
[ 72.659366]
[ 72.659429] Sending IPI to other CPUs
[ 72.660740] IPI complete
I'm in purgatory
-> smp_release_cpus()
spinning_secondaries = 104
<- smp_release_cpus()
[ 1.699068] ibmveth 30000002 (unnamed net_device) (uninitialized): unable to change IPv4 checksum offload settings. 1 rc=4
[ 1.699093] ibmveth 30000002 (unnamed net_device) (uninitialized): unable to change IPv6 checksum offload settings. 1 rc=4
[ 1.699101] ibmveth 30000002 (unnamed net_device) (uninitialized): unable to change tso settings. 1 rc=4
[ 2.657700] sd 0:2:1:0: [sdb] Assuming drive cache: write through
[ 2.657701] sd 0:2:0:0: [sda] Assuming drive cache: write through
[ 2.657781] sd 0:2:2:0: [sdc] Assuming drive cache: write through
[ 2.660641] sd 0:2:7:0: [sdh] Assuming drive cache: write through
[ 2.667731] sd 0:2:4:0: [sde] Assuming drive cache: write through
[ 2.677685] sd 0:2:6:0: [sdg] Assuming drive cache: write through
[ 2.677688] sd 0:2:5:0: [sdf] Assuming drive cache: write through
[ 2.677708] sd 0:2:3:0: [sdd] Assuming drive cache: write through
[ 2.697737] sd 1:2:6:0: [sdo] Assuming drive cache: write through
[ 2.697743] sd 1:2:1:0: [sdj] Assuming drive cache: write through
[ 2.697744] sd 1:2:4:0: [sdm] Assuming drive cache: write through
[ 2.697747] sd 1:2:2:0: [sdk] Assuming drive cache: write through
[ 2.697749] sd 1:2:3:0: [sdl] Assuming drive cache: write through
[ 2.697753] sd 1:2:5:0: [sdn] Assuming drive cache: write through
[ 2.699340] sd 1:2:7:0: [sdp] Assuming drive cache: write through
[ 2.699360] sd 1:2:0:0: [sdi] Assuming drive cache: write through
[ 3.350794] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.471468] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.540387] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.628523] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.657731] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 3.733416] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.752066] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 3.808884] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.838148] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 3.919247] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 3.950262] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 3.997839] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.007810] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.082174] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.089411] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.162200] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.202441] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.252289] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.279870] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.311712] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.348150] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.402076] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.432069] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.487871] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.518282] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.573338] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.599280] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.632144] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.671142] device-mapper: table: 252:1: multipath: error attaching hardware handler
[ 4.713352] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.782117] device-mapper: table: 252:0: multipath: error attaching hardware handler
[ 4.890336] device-mapper: table: 252:0: multipath: error attaching hardware handler

== Comment: #13 - Hari Krishna Bathini <[email protected]> - 2016-10-19 16:26:57 ==
(In reply to comment #12)
> Hi Hari,
>
> Can you please take a look at this issue and suggest what would be the next
> step ?
> We are facing this issue with -24 kernel as well. Can this be a issue with
> kdump kernel that has missing multipath modules or some other issue ?
>

Hi Vaishnavi,

The necessary hardware handler modules are missing from the kdump initrd.
The console log of the kdump kernel says the same:

--
Begin: Loading multipath hardware handlers ...
Failure: failed to load module scsi_dh_alua.
Failure: failed to load module scsi_dh_rdac.
Failure: failed to load module scsi_dh_emc.
--

After including these modules explicitly and rebuilding the initrd for kdump, I was able to get to the point where makedumpfile starts to capture the dump, but it fails with:

  "get_mem_map: Can't distinguish the memory type."

which is already tracked with bug 146571.

Thanks
Hari

PS1: To explicitly add modules to the kdump initrd

1. List the necessary modules in the /var/lib/kdump/initramfs-tools/modules file
2. mkinitramfs -d /var/lib/kdump/initramfs-tools -o /var/lib/kdump/initrd.img-$kver
3. systemctl restart kdump-tools.service

Mirroring this bug to Canonical for their input on whether to include the missing hardware handler modules in the kdump initrd or to proceed with the workaround.

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
     Status: New

** Tags: architecture-ppc64le bugnameltc-146907 severity-high targetmilestone-inin---

** Tags added: architecture-ppc64le bugnameltc-146907 severity-high targetmilestone-inin---

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1635597

Title:
  Ubuntu16.10:talclp1: Kdump failed with multipath disk

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1635597/+subscriptions

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
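
For reference, the three PS1 steps from comment #13 can be sketched as a small shell script. This is only an illustration of that workaround, not a tested fix: the helper names add_kdump_modules and rebuild_kdump_initrd and the APPLY_WORKAROUND switch are hypothetical, the module names are the three that the kdump console log reported as failing to load, and $kver is assumed to be the running kernel version (uname -r), which the report does not state.

```shell
#!/bin/sh
# Sketch of the PS1 workaround from comment #13.
# add_kdump_modules and rebuild_kdump_initrd are hypothetical helper names.

# Step 1: append each module name to the list file (one name per line),
# skipping names that are already listed so reruns do not add duplicates.
add_kdump_modules() {
    list_file=$1
    shift
    for mod in "$@"; do
        grep -qxF -- "$mod" "$list_file" 2>/dev/null ||
            printf '%s\n' "$mod" >> "$list_file"
    done
}

# Steps 2 and 3: rebuild the kdump initrd and restart the service (needs root).
# Assumption: $kver is the running kernel version unless set by the caller.
rebuild_kdump_initrd() {
    kver=${kver:-$(uname -r)}
    mkinitramfs -d /var/lib/kdump/initramfs-tools \
        -o "/var/lib/kdump/initrd.img-$kver"
    systemctl restart kdump-tools.service
}

# Nothing is changed unless explicitly requested.
if [ "${APPLY_WORKAROUND:-0}" = 1 ]; then
    add_kdump_modules /var/lib/kdump/initramfs-tools/modules \
        scsi_dh_alua scsi_dh_rdac scsi_dh_emc
    rebuild_kdump_initrd
fi
```

Running the script as root with APPLY_WORKAROUND=1 would apply all three steps; without it, the script only defines the helpers. Whether the modules should instead be included in the kdump initrd by default is the open question of this bug.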
