Hi Folks,

We recently replaced the mainboard on a Dell R640.

I removed the MAC address from the node definition and let switch-based
discovery take care of discovering the new MAC address and running BMC
setup. Everything went well and the node ended up at the xCAT shell.

However, when I tried to boot the node (statelite), it fails to find the
image, and if I persist it dies with a horrible UEFI error. The node also has
this problem if I nodeset it to boot to shell.

As other nodes are able to boot statelite fine, I assumed that it was a
hardware error. Dell has replaced the mainboard a second time, but the
issue still persists.

It might be worth mentioning that the last time we had a mainboard
replacement on a compute node was about 9 months ago, and we have updated xCAT
a couple of times since then. Attached is the console log of the UEFI crash,
and below are the PXE boot messages seen on a working and a non-working node.

Is anyone able to suggest any tricks to further debug this issue? I'm
reluctant to pin the problem on xCAT, but find it unlikely that I have hit
two mainboards with the same fault.

Thanks,

Carl.



#### These are the PXE boot messages for the node that isn't working ####
[2019-07-10T10:45:47+10:00] ESC[2JESC[01;01HBooting from PXE Device 2:
Integrated NIC 1 Port 3 Partition 1
[2019-07-10T10:45:48+10:00]
[2019-07-10T10:45:48+10:00] >>Start PXE over IPv4.
[2019-07-10T10:45:52+10:00]   Station IP address is 100.64.1.78
[2019-07-10T10:45:52+10:00]
[2019-07-10T10:45:52+10:00]   Server IP address is 100.64.0.1
[2019-07-10T10:45:52+10:00]   NBP filename is xcat/xnba.efi
[2019-07-10T10:45:52+10:00]   NBP filesize is 139200 Bytes
[2019-07-10T10:45:52+10:00]  Downloading NBP file...
[2019-07-10T10:45:52+10:00]
[2019-07-10T10:45:52+10:00]   NBP file downloaded successfully.
[2019-07-10T10:45:52+10:00] xNBA initialising devices...ok
[2019-07-10T10:45:52+10:00]
[2019-07-10T10:45:52+10:00]
[2019-07-10T10:45:52+10:00] xCAT Network Boot Agent
[2019-07-10T10:45:52+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028
(d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware --
ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
[2019-07-10T10:45:52+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
[2019-07-10T10:45:52+10:00] net0: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP
(open)
[2019-07-10T10:45:52+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
[2019-07-10T10:45:52+10:00] DHCP (net0 00:0a:f7:be:b7:d2)... ok
[2019-07-10T10:45:52+10:00] net0: 100.64.1.78/255.255.248.0 gw 100.64.0.1
[2019-07-10T10:45:52+10:00] Next server: 100.64.0.1
[2019-07-10T10:45:52+10:00] Filename:
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
[2019-07-10T10:45:52+10:00]
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi..................
Connection timed out (http://ipxe.org/4c0a6012)
[2019-07-10T10:46:08+10:00] No more network devices
[2019-07-10T10:46:08+10:00] xNBA initialising devices...ok
[2019-07-10T10:46:08+10:00]
[2019-07-10T10:46:08+10:00]
[2019-07-10T10:46:08+10:00] xCAT Network Boot Agent
[2019-07-10T10:46:08+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028
(d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware --
ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
[2019-07-10T10:46:08+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
[2019-07-10T10:46:08+10:00] net1: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP
(open)
[2019-07-10T10:46:08+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
[2019-07-10T10:46:08+10:00] DHCP (net1 00:0a:f7:be:b7:d2)... ok
[2019-07-10T10:46:08+10:00] net1: 100.64.1.78/255.255.248.0 gw 100.64.0.1
[2019-07-10T10:46:08+10:00] Next server: 100.64.0.1
[2019-07-10T10:46:08+10:00] Filename:
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
[2019-07-10T10:46:08+10:00]
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi..................
Connection timed out (http://ipxe.org/4c0a6012)
[2019-07-10T10:46:24+10:00] No more network devices



#### As a comparison, this is what we see on a node that boots fine ####
[2019-07-18T11:59:45+10:00] ESC[0mESC[37mESC[40mESC[2JESC[01;01HBooting
from PXE Device 1: Integrated NIC 1 Port 3 Partition 1
[2019-07-18T11:59:46+10:00]
[2019-07-18T11:59:46+10:00] >>Start PXE over IPv4.
[2019-07-18T11:59:50+10:00]   Station IP address is 100.64.1.86
[2019-07-18T11:59:50+10:00]
[2019-07-18T11:59:50+10:00]   Server IP address is 100.64.0.1
[2019-07-18T11:59:50+10:00]   NBP filename is xcat/xnba.efi
[2019-07-18T11:59:50+10:00]   NBP filesize is 139200 Bytes
[2019-07-18T11:59:50+10:00]  Downloading NBP file...
[2019-07-18T11:59:50+10:00]
[2019-07-18T11:59:50+10:00]   NBP file downloaded successfully.
[2019-07-18T11:59:50+10:00] xNBA initialising devices...ok
[2019-07-18T11:59:50+10:00]
[2019-07-18T11:59:50+10:00]
[2019-07-18T11:59:50+10:00] xCAT Network Boot Agent
[2019-07-18T11:59:50+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028
(d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware --
ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
[2019-07-18T11:59:50+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
[2019-07-18T11:59:50+10:00] net0: 00:0a:f7:bd:e6:b8 using <NULL> on EFI SNP
(open)
[2019-07-18T11:59:50+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
[2019-07-18T11:59:50+10:00] DHCP (net0 00:0a:f7:bd:e6:b8)... ok
[2019-07-18T11:59:50+10:00] net0: 100.64.1.86/255.255.248.0 gw 100.64.0.1
[2019-07-18T11:59:50+10:00] Next server: 100.64.0.1
[2019-07-18T11:59:50+10:00] Filename:
http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi
[2019-07-18T11:59:51+10:00]
http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi... ok
[2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/elilo-x64.efi...
ok
[2019-07-18T11:59:51+10:00] ELILO v3.14 for EFI/x86_64
[2019-07-18T11:59:51+10:00] Loading kernel
/tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/kernel...  done
[2019-07-18T11:59:51+10:00] Loading file
/tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/initrd-stateless.gz...done
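
One way to compare the two excerpts above is to pull out the per-node boot
script URLs that xNBA tries to fetch. The following is only a minimal Python
sketch: the function names (boot_script_urls, describe) and the shortened
log snippets are made up for illustration and are not part of xCAT. It shows
that the failing node requests a URL with an explicit ":80" port while the
working node does not. Whether that difference is related to the timeout is
an open question, not a diagnosis.

```python
import re
from urllib.parse import urlsplit

# Per-node boot-script URLs as printed by xNBA; the trailing progress dots
# are excluded by stopping at the first ".uefi".
BOOT_SCRIPT_RE = re.compile(r"http://\S+?\.uefi")


def boot_script_urls(log_text):
    """Return the distinct boot-script URLs found in a console log, in order."""
    seen = []
    for url in BOOT_SCRIPT_RE.findall(log_text):
        if url not in seen:
            seen.append(url)
    return seen


def describe(url):
    """Split a URL into the parts worth comparing between nodes."""
    parts = urlsplit(url)
    # parts.port is None when the URL carries no explicit port.
    return {"host": parts.hostname, "explicit_port": parts.port, "path": parts.path}


# Shortened copies of the console output quoted in this message.
failing = """\
[2019-07-10T10:45:52+10:00] Filename:
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi..................
Connection timed out (http://ipxe.org/4c0a6012)
"""
working = """\
[2019-07-18T11:59:50+10:00] Filename:
http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi
http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi... ok
"""

for label, text in (("failing", failing), ("working", working)):
    for url in boot_script_urls(text):
        print(label, describe(url))
```

The same two functions could be pointed at the full console logs instead of
the shortened snippets.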

Attachment: comp078-crash.log
Description: Binary data
