This might or might not be relevant:
Detected CPU family 6 model 63
UNSUPPORTED HARDWARE DEVICE: Intel CPU model
------------[ cut here ]------------
WARNING: at kernel/rh_taint.c:13 mark_hardware_unsupported+0x39/0x40() (Not tainted)
Hardware name: Lenovo NeXtScale nx360 M5: -[5465AC1]-
Your hardware is unsupported. Please do not report bugs, panics, oopses, etc.,
on this hardware.
On Jun 25, 2015, at 3:46 PM, David D Johnson <[email protected]> wrote:
> Well, that got me a shell, and the system got the correct address on eth0.
> Waiting for device with address 40:f2:e9:c5:54:00 to appear..Done
> Acquired IPv4 address on eth0: 172.20.204.56/16
> FATAL: Module ipmi_si not found.
> Dropping to debug shell, exit to check for further action
> [xCAT Genesis running on node855 /]# ***A request variable unknown to the
> server
> ***A request variable unknown to the server
> ***A request variable unknown to the server
> ***A request variable unknown to the server
> ***A request variable unknown to the server
> ***A request variable unknown to the server
> ***A request variable unknown to the server
> ***A request variable unknown to the server
> ***A request variable unknown to the server
> ***A request variable unknown to the server
>
> One of these messages every second.
>
> So several annoying issues:
> 1) the ASU PXE lines do not show up in any predictable order.
> The second new host has them scrambled, #2 is the shared 1GbE port...
> PXE.NicPortMacAddress.1=E4:1D:2D:73:57:B1
> PXE.NicPortMacAddress.2=40:F2:E9:C5:54:00
> PXE.NicPortMacAddress.3=40:F2:E9:C5:54:01
> PXE.NicPortMacAddress.4=E4:1D:2D:73:57:B2
>
> I have been used to grabbing Address.1 and stuffing the MAC into tabedit mac.
>
> 2) view of Lenovo M5 node booting is very different, some lines do not show
> up in the serial redirected rcons view that do show up with a real KVM
> console.
>
> But the main issue remaining 3) is why the boot fails.
> I just tried again nodeset node855 osimage=centos6.5-x86_64-netboot-comp
> and rebooted...
>
> Trying to unpack rootfs image as initramfs...
> Freeing initrd memory: 20283k freed
> dmar: Unsupported device scope
> audit: initializing netlink socket (disabled)
> type=2000 audit(1435290228.438:1): initialized
> HugeTLB registered 2 MB page size, pre-allocated 0 pages
>
> ...
>
> usb 2-1.1.5: Manufacturer: IBM
> usb 2-1.1.5: configuration #1 chosen from 2 choices
>
>
> dracut Warning: No root device "1" found
>
> dracut Warning: Boot has failed. To debug this issue add "rdshell" to the
> kernel command line.
>
>
> dracut Warning: Signal caught!
>
> dracut Warning: Boot has failed. To debug this issue add "rdshell" to the
> kernel command line.
> Kernel panic - not syncing: Attempted to kill init!
> Pid: 1, comm: init Tainted: G --------------- H
> 2.6.32-358.23.2.el6.x86_64 #1
> Call Trace:
> [<ffffffff8150daac>] ? panic+0xa7/0x16f
> [<ffffffff81073be2>] ? do_exit+0x862/0x870
> [<ffffffff81182c85>] ? fput+0x25/0x30
> [<ffffffff81073c48>] ? do_group_exit+0x58/0xd0
> [<ffffffff81073cd7>] ? sys_exit_group+0x17/0x20
> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
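
On issue 1 above (the ASU PXE lines not coming back in a predictable order): rather than always grabbing Address.1, one option is to pick the MAC by its vendor prefix. This is only a sketch. It assumes the ASU output has been saved as text, and that the prefix 40:F2:E9 marks the onboard Broadcom/tg3 ports while E4:1D:2D marks the ConnectX-3 Pro mezzanine ports. That is what the two M5 listings in this thread suggest, but it has not been checked against other nodes. The function name and the prefix constant are made up for illustration.

```python
import re

# Assumption from this thread only: onboard Broadcom (tg3) ports on the M5
# nodes start with 40:F2:E9; the E4:1D:2D ports belong to the IB mezzanine.
ONBOARD_PREFIX = "40:F2:E9"

PXE_LINE = re.compile(r"^PXE\.NicPortMacAddress\.(\d+)=([0-9A-Fa-f:]{17})\s*$")

def onboard_macs(asu_output, prefix=ONBOARD_PREFIX):
    """Return onboard MACs (lower case), sorted by address, ignoring ASU index."""
    macs = []
    for line in asu_output.splitlines():
        match = PXE_LINE.match(line.strip())
        if match and match.group(2).upper().startswith(prefix):
            macs.append(match.group(2).lower())
    return sorted(macs)

node855 = """\
PXE.NicPortMacAddress.1=E4:1D:2D:73:57:B1
PXE.NicPortMacAddress.2=40:F2:E9:C5:54:00
PXE.NicPortMacAddress.3=40:F2:E9:C5:54:01
PXE.NicPortMacAddress.4=E4:1D:2D:73:57:B2
"""

print(onboard_macs(node855)[0])  # 40:f2:e9:c5:54:00
```

For node855 this picks 40:f2:e9:c5:54:00, the same address Genesis waited for on eth0. Taking the lowest onboard MAC is a guess at which port is the shared one. In the other M5 listing the shared port was reported as #3, 40:F2:E9:C5:51:12, which is also the lower of the two.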
>
> On Jun 25, 2015, at 2:04 PM, Jarrod Johnson <[email protected]> wrote:
>
>> You can nodeset <nodes> shell
>>
>> That'll get you an environment that should boot in them regardless, complete
>> with ssh and all.
>>
>> From: David D Johnson [mailto:[email protected]]
>> Sent: Thursday, June 25, 2015 2:00 PM
>> To: xCAT Users Mailing list
>> Subject: Re: [xcat-user] NextScale deployment kernel crash
>>
>> I may have jumped to conclusions about the reason, but in any case the two
>> new M5 machines don't boot.
>>
>> This is the line specifying drivers from our build script:
>> ./genimage -i eth0 -n dca,8021q,igb,bnx2,tg3 -o centos6.5 -k
>> 2.6.32-358.23.2.el6.x86_64 -p comp
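
To answer the "what NIC driver was built in" question mechanically, the -n list can be pulled out of the genimage line and compared against whatever a panic reports under "Modules linked in". A rough sketch follows. The helper names are hypothetical. The genimage line is the one quoted just above, and the module line is copied from the first report in this thread, which came from a different site's image, so the comparison is illustrative only.

```python
import shlex

def drivers_from_genimage(cmdline):
    """Return the comma-separated driver list given to genimage's -n option."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args[:-1]):
        if arg == "-n":
            return args[i + 1].split(",")
    return []

def loaded_modules(console_line):
    """Parse a 'Modules linked in:' line from a panic into module names."""
    _, _, rest = console_line.partition("Modules linked in:")
    rest = rest.split("[", 1)[0]  # drop the '[last unloaded: ...]' suffix
    return rest.split()

# Sample inputs copied from this thread (see the caveat in the lead-in).
genimage = ("./genimage -i eth0 -n dca,8021q,igb,bnx2,tg3 "
            "-o centos6.5 -k 2.6.32-358.23.2.el6.x86_64 -p comp")
panic = ("Modules linked in: sd_mod crc_t10dif ahci mlx4_core "
         "[last unloaded: scsi_wait_scan]")

requested = drivers_from_genimage(genimage)
missing = [d for d in requested if d not in loaded_modules(panic)]
print("tg3 requested:", "tg3" in requested)
print("requested but not loaded:", missing)
```

With these inputs tg3 is in the requested list, yet none of the requested drivers appear as loaded at the time of the panic. That does not say why; it only narrows where to look.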
>>
>>
>> As to the ethernet interfaces, from M4 machine the relevant ASU output looks
>> like
>> PXE.NicPortMacAddress.1=6C:AE:8B:08:94:ED
>> PXE.NicPortMacAddress.2=6C:AE:8B:08:94:EE
>> IntelRI350GigabitNetworkConnection-6CAE8B0894ED.LinkStatus=Connected
>> IntelRI350GigabitNetworkConnection-6CAE8B0894ED.AlternateMACAddress=6C:AE:8B:08:94:ED
>> IntelRI350GigabitNetworkConnection-6CAE8B0894ED.LinkSpeed=AutoNeg
>> IntelRI350GigabitNetworkConnection-6CAE8B0894ED.WakeonLAN=Enabled
>> IntelRI350GigabitNetworkConnection-6CAE8B0894EE.LinkStatus=Disconnected
>> IntelRI350GigabitNetworkConnection-6CAE8B0894EE.AlternateMACAddress=6C:AE:8B:08:94:EE
>> IntelRI350GigabitNetworkConnection-6CAE8B0894EE.LinkSpeed=AutoNeg
>> IntelRI350GigabitNetworkConnection-6CAE8B0894EE.WakeonLAN=Enabled
>>
>> On the new M5 machines, there are only two ports on the front (no dedicated
>> IMM port), but there are now four PXE MAC lines; #3 is the shared port.
>> PXE.NicPortMacAddress.1=E4:1D:2D:73:56:01
>> PXE.NicPortMacAddress.2=E4:1D:2D:73:56:02
>> PXE.NicPortMacAddress.3=40:F2:E9:C5:51:12
>> PXE.NicPortMacAddress.4=40:F2:E9:C5:51:13
>>
>> Now I realize the first two are from the dual port FDR IB mezzanine card
>> (ConnectX-3 Pro). They can be used as 10/40/56 GbE, I suppose, but we want
>> to use one of them for FDR IB only, and the other one isn't connected to
>> anything.
>> The other two are Broadcom / tg3.
>>
>> I wish I could ssh to the machine so I could poke around and see what the
>> NICs are called. Maybe I will have to boot off a USB key. No disks in these
>> hosts.
>>
>> -- ddj
>>
>> On Jun 25, 2015, at 1:33 PM, Jarrod Johnson <[email protected]> wrote:
>>
>>
>> What nic driver was built in the initrd? m4 was igb, m5 uses tg3.
>>
>> " extra unusable Ethernet ports on the motherboard that mess up the
>> interface naming. Is there a workaround for this???"
>>
>> I'm interested in what this means and if I can help on that.
>>
>> From: David Johnson [mailto:[email protected]]
>> Sent: Thursday, June 25, 2015 11:30 AM
>> To: xCAT Users Mailing list
>> Subject: Re: [xcat-user] NextScale deployment kernel crash
>>
>> Yes, we are seeing exactly the same problem. 300 nodes from nehalem to
>> nextscale m4 all work fine with the same centos 6.5 image, but not so for
>> the Lenovo nextscale M5 nodes. They seem to have extra unusable Ethernet
>> ports on the motherboard that mess up the interface naming. Is there a
>> workaround for this???
>>
>> -- ddj
>> Dave Johnson
>>
>> On Jun 25, 2015, at 10:49 AM, Damir Krstic <[email protected]> wrote:
>>
>> We are trying to boot NextScale nodes with our RedHat 6.4 stateless image.
>> They are crashing during the initrd boot process with following error:
>>
>> dracut Warning: No root device "1" found
>>
>> dracut Warning: Boot has failed. To debug this issue add "rdshell" to the
>> kernel command line.
>>
>> dracut Warning: Signal caught!
>>
>> dracut Warning: Boot has failed. To debug this issue add "rdshell" to the
>> kernel command line.
>> Kernel panic - not syncing: Attempted to kill init!
>> Pid: 1, comm: init Tainted: G --------------- H
>> 2.6.32-358.el6.x86_64 #1
>> Call Trace:
>> [<ffffffff8150cfc8>] ? panic+0xa7/0x16f
>> [<ffffffff81073ae2>] ? do_exit+0x862/0x870
>> [<ffffffff81182885>] ? fput+0x25/0x30
>> [<ffffffff81073b48>] ? do_group_exit+0x58/0xd0
>> [<ffffffff81073bd7>] ? sys_exit_group+0x17/0x20
>> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
>> ------------[ cut here ]------------
>> WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule+0x5c/0x60()
>> (Tainted: G --------------- H )
>> Hardware name: IBM NeXtScale nx360 M5: -[5465AC1]-
>> Modules linked in: sd_mod crc_t10dif ahci mlx4_core [last unloaded:
>> scsi_wait_scan]
>> Pid: 1, comm: init Tainted: G --------------- H
>> 2.6.32-358.el6.x86_64 #1
>> Call Trace:
>> <IRQ> [<ffffffff8106e2e7>] ? warn_slowpath_common+0x87/0xc0
>> [<ffffffff8106e33a>] ? warn_slowpath_null+0x1a/0x20
>> [<ffffffff8102dd9c>] ? native_smp_send_reschedule+0x5c/0x60
>> [<ffffffff8105ae28>] ? scheduler_tick+0x208/0x260
>> [<ffffffff810a7fd0>] ? tick_sched_timer+0x0/0xc0
>> [<ffffffff810811de>] ? update_process_times+0x6e/0x90
>> [<ffffffff810a8036>] ? tick_sched_timer+0x66/0xc0
>> [<ffffffff8109b38e>] ? __run_hrtimer+0x8e/0x1a0
>> [<ffffffff810a182f>] ? ktime_get_update_offsets+0x4f/0xd0
>> [<ffffffff8107700f>] ? __do_softirq+0x11f/0x1e0
>> [<ffffffff8109b6f6>] ? hrtimer_interrupt+0xe6/0x260
>> [<ffffffff81516d7b>] ? smp_apic_timer_interrupt+0x6b/0x9b
>> [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
>> <EOI> [<ffffffff8150d06d>] ? panic+0x14c/0x16f
>> [<ffffffff8150cffa>] ? panic+0xd9/0x16f
>> [<ffffffff81073ae2>] ? do_exit+0x862/0x870
>> [<ffffffff81182885>] ? fput+0x25/0x30
>> [<ffffffff81073b48>] ? do_group_exit+0x58/0xd0
>> [<ffffffff81073bd7>] ? sys_exit_group+0x17/0x20
>> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Damir
>>
>> ------------------------------------------------------------------------------
>> Monitor 25 network devices or servers for free with OpManager!
>> OpManager is web-based network management software that monitors
>> network devices and physical & virtual servers, alerts via email & sms
>> for fault. Monitor 25 devices for free with no restriction. Download now
>> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
>> _______________________________________________
>> xCAT-user mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/xcat-user