Well, that got me a shell, and the system got the correct address on eth0.
Waiting for device with address 40:f2:e9:c5:54:00 to appear..Done
Acquired IPv4 address on eth0: 172.20.204.56/16
FATAL: Module ipmi_si not found.
Dropping to debug shell, exit to check for further action
[xCAT Genesis running on node855 /]# ***A request variable unknown to the server
***A request variable unknown to the server
***A request variable unknown to the server
***A request variable unknown to the server
***A request variable unknown to the server
***A request variable unknown to the server
***A request variable unknown to the server
***A request variable unknown to the server
***A request variable unknown to the server
***A request variable unknown to the server
One of these messages appears every second.
So several annoying issues:
1) The ASU PXE lines do not show up in any predictable order.
On the second new host they are scrambled; #2 is the shared 1GbE port:
PXE.NicPortMacAddress.1=E4:1D:2D:73:57:B1
PXE.NicPortMacAddress.2=40:F2:E9:C5:54:00
PXE.NicPortMacAddress.3=40:F2:E9:C5:54:01
PXE.NicPortMacAddress.4=E4:1D:2D:73:57:B2
I am used to grabbing Address.1 and stuffing that MAC into tabedit mac.
2) The view of a Lenovo M5 node booting is very different: some lines that
show up on a real KVM console do not show up in the serial-redirected rcons view.
3) But the main remaining issue is why the boot fails.
I just tried again with nodeset node855 osimage=centos6.5-x86_64-netboot-comp
and rebooted...
Trying to unpack rootfs image as initramfs...
Freeing initrd memory: 20283k freed
dmar: Unsupported device scope
audit: initializing netlink socket (disabled)
type=2000 audit(1435290228.438:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
...
usb 2-1.1.5: Manufacturer: IBM
usb 2-1.1.5: configuration #1 chosen from 2 choices
dracut Warning: No root device "1" found
dracut Warning: Boot has failed. To debug this issue add "rdshell" to the
kernel command line.
dracut Warning: Signal caught!
dracut Warning: Boot has failed. To debug this issue add "rdshell" to the
kernel command line.
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Tainted: G --------------- H
2.6.32-358.23.2.el6.x86_64 #1
Call Trace:
[<ffffffff8150daac>] ? panic+0xa7/0x16f
[<ffffffff81073be2>] ? do_exit+0x862/0x870
[<ffffffff81182c85>] ? fput+0x25/0x30
[<ffffffff81073c48>] ? do_group_exit+0x58/0xd0
[<ffffffff81073cd7>] ? sys_exit_group+0x17/0x20
[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
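
On issue 1, here is a minimal sketch of picking the MAC by vendor prefix instead of by position in the ASU output. It assumes the onboard Broadcom ports always start with 40:F2:E9 and that the shared port is the lower of the two, which is only what these two M5 hosts show. The function names are made up for illustration.

```python
import re

# Matches lines such as: PXE.NicPortMacAddress.2=40:F2:E9:C5:54:00
PXE_LINE = re.compile(r"^PXE\.NicPortMacAddress\.(\d+)=([0-9A-Fa-f:]{17})$")


def pxe_macs(asu_output):
    """Return {index: MAC} for every PXE.NicPortMacAddress line found."""
    macs = {}
    for line in asu_output.splitlines():
        match = PXE_LINE.match(line.strip())
        if match:
            macs[int(match.group(1))] = match.group(2).upper()
    return macs


def pick_onboard_mac(asu_output, oui="40:F2:E9"):
    """Pick the lowest MAC that starts with the given vendor prefix.

    Assumption: 40:F2:E9 marks the onboard 1GbE ports and the shared port
    is the lower address of the pair. Observed on two hosts only.
    """
    candidates = sorted(
        mac for mac in pxe_macs(asu_output).values() if mac.startswith(oui)
    )
    if not candidates:
        raise ValueError("no PXE MAC with prefix " + oui)
    return candidates[0]


if __name__ == "__main__":
    node855 = """\
PXE.NicPortMacAddress.1=E4:1D:2D:73:57:B1
PXE.NicPortMacAddress.2=40:F2:E9:C5:54:00
PXE.NicPortMacAddress.3=40:F2:E9:C5:54:01
PXE.NicPortMacAddress.4=E4:1D:2D:73:57:B2
"""
    print(pick_onboard_mac(node855))
```

The result would go into the mac table in place of Address.1. If a later batch of nodes shows a different prefix, the oui argument is the only thing to change.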
On Jun 25, 2015, at 2:04 PM, Jarrod Johnson <[email protected]> wrote:
> You can nodeset <nodes> shell
>
> That'll get you an environment that should boot in them regardless, complete
> with ssh and all.
>
> From: David D Johnson [mailto:[email protected]]
> Sent: Thursday, June 25, 2015 2:00 PM
> To: xCAT Users Mailing list
> Subject: Re: [xcat-user] NextScale deployment kernel crash
>
> I may have jumped to conclusions about the reason, but in any case the two
> new M5 machines don't boot.
>
> This is the line specifying drivers from our build script:
> ./genimage -i eth0 -n dca,8021q,igb,bnx2,tg3 -o centos6.5 -k
> 2.6.32-358.23.2.el6.x86_64 -p comp
>
>
> As to the ethernet interfaces, from M4 machine the relevant ASU output looks
> like
> PXE.NicPortMacAddress.1=6C:AE:8B:08:94:ED
> PXE.NicPortMacAddress.2=6C:AE:8B:08:94:EE
> IntelRI350GigabitNetworkConnection-6CAE8B0894ED.LinkStatus=Connected
> IntelRI350GigabitNetworkConnection-6CAE8B0894ED.AlternateMACAddress=6C:AE:8B:08:94:ED
> IntelRI350GigabitNetworkConnection-6CAE8B0894ED.LinkSpeed=AutoNeg
> IntelRI350GigabitNetworkConnection-6CAE8B0894ED.WakeonLAN=Enabled
> IntelRI350GigabitNetworkConnection-6CAE8B0894EE.LinkStatus=Disconnected
> IntelRI350GigabitNetworkConnection-6CAE8B0894EE.AlternateMACAddress=6C:AE:8B:08:94:EE
> IntelRI350GigabitNetworkConnection-6CAE8B0894EE.LinkSpeed=AutoNeg
> IntelRI350GigabitNetworkConnection-6CAE8B0894EE.WakeonLAN=Enabled
>
> On the new M5 machines, there are only two ports on the front (no dedicated
> IMM port), but there are now four PXE Mac lines, #3 is shared port.
> PXE.NicPortMacAddress.1=E4:1D:2D:73:56:01
> PXE.NicPortMacAddress.2=E4:1D:2D:73:56:02
> PXE.NicPortMacAddress.3=40:F2:E9:C5:51:12
> PXE.NicPortMacAddress.4=40:F2:E9:C5:51:13
>
> Now I realize the first two are from the dual port FDR IB mezzanine card
> (ConnectX-3 Pro). They can be used as 10/40/56 GbE, I suppose, but we want to
> use one of them for FDR IB only, and the other one isn't connected to
> anything.
> The other two are Broadcom / tg3.
>
> I wish I could ssh to the machine so I could poke around and see what the
> NICs are called. Maybe I will have to boot off a USB key. No disks in these
> hosts.
>
> -- ddj
>
> On Jun 25, 2015, at 1:33 PM, Jarrod Johnson <[email protected]> wrote:
>
>
> What nic driver was built in the initrd? m4 was igb, m5 uses tg3.
>
> " extra unusable Ethernet ports on the motherboard that mess up the interface
> naming. Is there a workaround for this???"
>
> I'm interested in what this means and if I can help on that.
>
> From: David Johnson [mailto:[email protected]]
> Sent: Thursday, June 25, 2015 11:30 AM
> To: xCAT Users Mailing list
> Subject: Re: [xcat-user] NextScale deployment kernel crash
>
> Yes, we are seeing exactly the same problem. 300 nodes from nehalem to
> nextscale m4 all work fine with the same centos 6.5 image, but not so for
> the Lenovo nextscale M5 nodes. They seem to have extra unusable Ethernet
> ports on the motherboard that mess up the interface naming. Is there a
> workaround for this???
>
> -- ddj
> Dave Johnson
>
> On Jun 25, 2015, at 10:49 AM, Damir Krstic <[email protected]> wrote:
>
> We are trying to boot NextScale nodes with our RedHat 6.4 stateless image.
> They are crashing during the initrd boot process with following error:
>
> dracut Warning: No root device "1" found
> dracut Warning: Boot has failed. To debug this issue add "rdshell" to the
> kernel command line.
> dracut Warning: Signal caught!
> dracut Warning: Boot has failed. To debug this issue add "rdshell" to the
> kernel command line.
> Kernel panic - not syncing: Attempted to kill init!
> Pid: 1, comm: init Tainted: G --------------- H
> 2.6.32-358.el6.x86_64 #1
> Call Trace:
> [<ffffffff8150cfc8>] ? panic+0xa7/0x16f
> [<ffffffff81073ae2>] ? do_exit+0x862/0x870
> [<ffffffff81182885>] ? fput+0x25/0x30
> [<ffffffff81073b48>] ? do_group_exit+0x58/0xd0
> [<ffffffff81073bd7>] ? sys_exit_group+0x17/0x20
> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule+0x5c/0x60()
> (Tainted: G --------------- H )
> Hardware name: IBM NeXtScale nx360 M5: -[5465AC1]-
> Modules linked in: sd_mod crc_t10dif ahci mlx4_core [last unloaded:
> scsi_wait_scan]
> Pid: 1, comm: init Tainted: G --------------- H
> 2.6.32-358.el6.x86_64 #1
> Call Trace:
> <IRQ> [<ffffffff8106e2e7>] ? warn_slowpath_common+0x87/0xc0
> [<ffffffff8106e33a>] ? warn_slowpath_null+0x1a/0x20
> [<ffffffff8102dd9c>] ? native_smp_send_reschedule+0x5c/0x60
> [<ffffffff8105ae28>] ? scheduler_tick+0x208/0x260
> [<ffffffff810a7fd0>] ? tick_sched_timer+0x0/0xc0
> [<ffffffff810811de>] ? update_process_times+0x6e/0x90
> [<ffffffff810a8036>] ? tick_sched_timer+0x66/0xc0
> [<ffffffff8109b38e>] ? __run_hrtimer+0x8e/0x1a0
> [<ffffffff810a182f>] ? ktime_get_update_offsets+0x4f/0xd0
> [<ffffffff8107700f>] ? __do_softirq+0x11f/0x1e0
> [<ffffffff8109b6f6>] ? hrtimer_interrupt+0xe6/0x260
> [<ffffffff81516d7b>] ? smp_apic_timer_interrupt+0x6b/0x9b
> [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
> <EOI> [<ffffffff8150d06d>] ? panic+0x14c/0x16f
> [<ffffffff8150cffa>] ? panic+0xd9/0x16f
> [<ffffffff81073ae2>] ? do_exit+0x862/0x870
> [<ffffffff81182885>] ? fput+0x25/0x30
> [<ffffffff81073b48>] ? do_group_exit+0x58/0xd0
> [<ffffffff81073bd7>] ? sys_exit_group+0x17/0x20
> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
>
> Any help would be appreciated.
>
>
>
> Thanks,
>
> Damir
>
> ------------------------------------------------------------------------------
> Monitor 25 network devices or servers for free with OpManager!
> OpManager is web-based network management software that monitors
> network devices and physical & virtual servers, alerts via email & sms
> for fault. Monitor 25 devices for free with no restriction. Download now
> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
> _______________________________________________
> xCAT-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/xcat-user