Thanks Jarrod. Yes, it is a little strange.
I'm not seeing anything in the HTTP server logs when the DHCP lease has :80 in the entry.

I don't fully understand how xNBA is built. Could it be bringing in something from the management node (CentOS 6.5) that might be part of the issue?

Cheers,

Carl.

On Thu, 18 Jul. 2019, 22:35 Jarrod Johnson, <jjohns...@lenovo.com> wrote:

> The change is from:
>
> commit 1889ec879d2ba721869217ad2e4f03d47b7fba40
> Author: yangsbj <yang...@cn.ibm.com>
> Date: Thu Nov 1 23:29:01 2018 -0400
>
> support site.httpport in nodeset and mknb
>
> Prior to that change, non-80 ports did not work.
>
> What is unusual is that 80 should be the normal port and the url parsing
> should be xNBA and not UEFI specific, so I’m uncertain why :80 would cause
> a problem in your environment.
>
> Nodes that have not been ‘nodeset’ since your upgrade would not have the
> :80….
>
> A reasonable mitigation in the code would be to skip the port designation
> if it is default, though it is still fairly odd that this would do anything
> different…
>
> *From:* Carl <mutantll...@gmail.com>
> *Sent:* Thursday, July 18, 2019 4:01 AM
> *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> *Subject:* [External] Re: [xcat-user] Unable to pxe boot node after
> mainboard replacement
>
> Hi all,
>
> Further to the above I have managed to isolate the issue.
>
> It looks like when nodeset is run, it is adding :80 to the boot options in
> the leases file.
>
> E.g.:
>
> host comp078 {
>     dynamic;
>     hardware ethernet 00:0a:f7:be:fc:de;
>     uid 00:0a:f7:be:fc:de;
>     fixed-address 100.64.1.78;
>     supersede server.ddns-hostname = "comp078";
>     supersede host-name = "comp078";
>     if option user-class-identifier = "xNBA" and option client-architecture = 00:00 {
>         supersede server.always-broadcast = 01;
>         supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078";
>     } elsif option user-class-identifier = "xNBA" and option client-architecture = 00:09 {
>         supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078.uefi";
>     } elsif option client-architecture = 00:07 {
>         supersede server.filename = "xcat/xnba.efi";
>     } elsif option client-architecture = 00:00 {
>         supersede server.filename = "xcat/xnba.kpxe";
>     } else {
>         supersede server.filename = "";
>     }
> }
>
> If I manually edit the leases file and remove :80 from the two filename
> entries above, the node is able to boot fine.
>
> Is anyone able to advise on why my environment might be now doing this?
>
> Thanks,
>
> Carl.
>
> On Thu, 18 Jul 2019 at 16:22, Carl <mutantll...@gmail.com> wrote:
>
> Hi Folks,
>
> We recently replaced the mainboard on a Dell R640.
>
> I removed the MAC address from the node definition and let switch based
> discovery take care of discovering the new MAC address and running BMC
> setup. Everything went well and the node ended at the xcat shell.
>
> However, when I tried to boot the node (statelite) it's failing to find the
> image and if I persist it dies with a horrible UEFI error. The node also has
> this problem if I nodeset it to boot to shell.
>
> As other nodes are able to boot statelite fine, I assumed that it was a
> hardware error. Dell has replaced the mainboard a second time, but the
> issue still persists.
>
> It might be worth mentioning that the last time that we had a mainboard
> replacement on a comp node was about 9 months ago and we have updated xCAT
> a couple of times since then. Attached is the console log of the UEFI crash
> and the pxe boot messages that are seen on a working and non-working node.
>
> Is anyone able to suggest any tricks to further debug this issue? I'm
> reluctant to pin the problem on xCAT, but find it unlikely that I have hit
> two mainboards with the same fault.
>
> Thanks,
>
> Carl.
>
> #### These are the pxe boot messages for the node that isn't working ####
> [2019-07-10T10:45:47+10:00] ESC[2JESC[01;01HBooting from PXE Device 2: Integrated NIC 1 Port 3 Partition 1
> [2019-07-10T10:45:48+10:00]
> [2019-07-10T10:45:48+10:00] >>Start PXE over IPv4.
> [2019-07-10T10:45:52+10:00] Station IP address is 100.64.1.78
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00] Server IP address is 100.64.0.1
> [2019-07-10T10:45:52+10:00] NBP filename is xcat/xnba.efi
> [2019-07-10T10:45:52+10:00] NBP filesize is 139200 Bytes
> [2019-07-10T10:45:52+10:00] Downloading NBP file...
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00] NBP file downloaded successfully.
> [2019-07-10T10:45:52+10:00] xNBA initialising devices...ok
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00] xCAT Network Boot Agent
> [2019-07-10T10:45:52+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 (d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
> [2019-07-10T10:45:52+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-10T10:45:52+10:00] net0: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP (open)
> [2019-07-10T10:45:52+10:00] [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-10T10:45:52+10:00] DHCP (net0 00:0a:f7:be:b7:d2)... ok
> [2019-07-10T10:45:52+10:00] net0: 100.64.1.78/255.255.248.0 gw 100.64.0.1
> [2019-07-10T10:45:52+10:00] Next server: 100.64.0.1
> [2019-07-10T10:45:52+10:00] Filename: http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
> [2019-07-10T10:45:52+10:00] http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. Connection timed out (http://ipxe.org/4c0a6012)
> [2019-07-10T10:46:08+10:00] No more network devices
> [2019-07-10T10:46:08+10:00] xNBA initialising devices...ok
> [2019-07-10T10:46:08+10:00]
> [2019-07-10T10:46:08+10:00]
> [2019-07-10T10:46:08+10:00] xCAT Network Boot Agent
> [2019-07-10T10:46:08+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 (d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
> [2019-07-10T10:46:08+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-10T10:46:08+10:00] net1: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP (open)
> [2019-07-10T10:46:08+10:00] [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-10T10:46:08+10:00] DHCP (net1 00:0a:f7:be:b7:d2)... ok
> [2019-07-10T10:46:08+10:00] net1: 100.64.1.78/255.255.248.0 gw 100.64.0.1
> [2019-07-10T10:46:08+10:00] Next server: 100.64.0.1
> [2019-07-10T10:46:08+10:00] Filename: http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
> [2019-07-10T10:46:08+10:00] http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. Connection timed out (http://ipxe.org/4c0a6012)
> [2019-07-10T10:46:24+10:00] No more network devices
>
> #### As a comparison, this is what we see on a node that boots fine ####
> [2019-07-18T11:59:45+10:00] ESC[0mESC[37mESC[40mESC[2JESC[01;01HBooting from PXE Device 1: Integrated NIC 1 Port 3 Partition 1
> [2019-07-18T11:59:46+10:00]
> [2019-07-18T11:59:46+10:00] >>Start PXE over IPv4.
> [2019-07-18T11:59:50+10:00] Station IP address is 100.64.1.86
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00] Server IP address is 100.64.0.1
> [2019-07-18T11:59:50+10:00] NBP filename is xcat/xnba.efi
> [2019-07-18T11:59:50+10:00] NBP filesize is 139200 Bytes
> [2019-07-18T11:59:50+10:00] Downloading NBP file...
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00] NBP file downloaded successfully.
> [2019-07-18T11:59:50+10:00] xNBA initialising devices...ok
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00] xCAT Network Boot Agent
> [2019-07-18T11:59:50+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 (d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
> [2019-07-18T11:59:50+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-18T11:59:50+10:00] net0: 00:0a:f7:bd:e6:b8 using <NULL> on EFI SNP (open)
> [2019-07-18T11:59:50+10:00] [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-18T11:59:50+10:00] DHCP (net0 00:0a:f7:bd:e6:b8)... ok
> [2019-07-18T11:59:50+10:00] net0: 100.64.1.86/255.255.248.0 gw 100.64.0.1
> [2019-07-18T11:59:50+10:00] Next server: 100.64.0.1
> [2019-07-18T11:59:50+10:00] Filename: http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi
> [2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi... ok
> [2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/elilo-x64.efi... ok
> [2019-07-18T11:59:51+10:00] ELILO v3.14 for EFI/x86_64
> [2019-07-18T11:59:51+10:00] Loading kernel /tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/kernel... done
> [2019-07-18T11:59:51+10:00] Loading file /tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/initrd-stateless.gz...done
>
> _______________________________________________
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user
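
For what it's worth, the mitigation suggested in the quoted reply (skip the port designation when it is the default) might look roughly like the sketch below. This is Python for illustration only and is not xCAT's actual code; the function name build_boot_url and its parameters are hypothetical, and only the URL layout is taken from the lease entry quoted above.

```python
def build_boot_url(node, httpport="80", suffix=""):
    """Build the xNBA boot URL for a node (hypothetical helper, not xCAT code).

    The port is appended only when it differs from the HTTP default (80),
    so a default setup yields the same URL as before site.httpport existed.
    """
    port = str(httpport).strip()
    host = "${next-server}"  # literal placeholder, as it appears in the lease entry
    if port and port != "80":
        host = f"{host}:{port}"
    return f"http://{host}/tftpboot/xcat/xnba/nodes/{node}{suffix}"


if __name__ == "__main__":
    # Default port: no ":80" in the result.
    print(build_boot_url("comp078", "80", ".uefi"))
    # Non-default port: the port is kept.
    print(build_boot_url("comp078", "8080"))
```

With the default port this gives the same filename that booted correctly once :80 was removed from the leases file by hand, while a non-default port is still passed through.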