Great, thanks. I'm happy to contribute back to the community, so I'll have a look to see what I can do.
Cheers,
Carl.

On Thu, 18 Jul. 2019, 23:08 Jarrod Johnson, <jjohns...@lenovo.com> wrote:

> It should come in the rpm prebuilt, so shouldn’t be different…
>
> So the most ‘make the problem go away’ solution would be to have xnba.pm
> only do this when needed. Offhand, I think this would be right (untested):
>
> https://github.com/jjohnson42/xcat-core/commit/cd61fd9db468cd142537e5bd495b71310e6a6d07
>
> If I were in the situation, I would probably satisfy curiosity by running
> wireshark to see if any packets are emitted with :80 and, if so, what looks
> odd about them.
>
> Of course, another thing I’d be tempted to do would be to try a newer ipxe
> build. I happen to have one built to see if a newer codebase would behave
> differently. However, the last time I checked, it seemed to have
> compatibility issues with elilo. Elilo is no longer required for CentOS 7
> and up (in conjunction with a modified xnba.pm I have), but CentOS 6
> kernels still need elilo.
>
> So I suppose there are three options, depending on how little time you
> want to spend to make the problem go away or understand more.
>
> *From:* Carl <mutantll...@gmail.com>
> *Sent:* Thursday, July 18, 2019 8:53 AM
> *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> *Subject:* Re: [xcat-user] [External] Re: Unable to pxe boot node after mainboard replacement
>
> Thanks Jarrod,
>
> Yes, it is a little strange.
>
> I'm not seeing anything in the http server logs when the dhcp lease has
> :80 in the entry.
>
> I don't fully understand how xnba is built; could it be bringing in
> something from the management node (CentOS 6.5) that might be part of the
> issue?
>
> Cheers,
>
> Carl.
>
> On Thu, 18 Jul. 2019, 22:35 Jarrod Johnson, <jjohns...@lenovo.com> wrote:
>
> > The change is from:
> >
> > commit 1889ec879d2ba721869217ad2e4f03d47b7fba40
> > Author: yangsbj <yang...@cn.ibm.com>
> > Date: Thu Nov 1 23:29:01 2018 -0400
> >
> >     support site.httpport in nodeset and mknb
> >
> > Prior to that change, non-80 ports did not work.
> >
> > What is unusual is that 80 should be the normal port, and the URL parsing
> > should be xNBA-specific rather than UEFI-specific, so I’m uncertain why :80
> > would cause a problem in your environment.
> >
> > Nodes that have not been ‘nodeset’ since your upgrade would not have the
> > :80…
> >
> > A reasonable mitigation in the code would be to skip the port designation
> > if it is the default, though it is still fairly odd that this would do
> > anything different…
> >
> > *From:* Carl <mutantll...@gmail.com>
> > *Sent:* Thursday, July 18, 2019 4:01 AM
> > *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> > *Subject:* [External] Re: [xcat-user] Unable to pxe boot node after mainboard replacement
> >
> > Hi all,
> >
> > Further to the above, I have managed to isolate the issue.
> >
> > It looks like when nodeset is run, it is adding :80 to the boot options in
> > the leases file.
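
Jarrod's point that an explicit :80 should be harmless can be checked against ordinary URL rules. The sketch below uses only Python's standard `urllib.parse` to show that the failing filename (`http://100.64.0.1:80/...`) and the working one (`http://100.64.0.1/...`) resolve to the same effective port, and it illustrates the suggested mitigation of leaving the port out when it is the default. `effective_port` and `build_boot_url` are hypothetical helpers for illustration only; they are not xCAT's xnba.pm code (which is Perl), and the actual proposed change is in Jarrod's linked commit.

```python
from urllib.parse import urlsplit

# Default ports per scheme. RFC 3986 (section 3.2.3) recommends omitting the
# port component when it equals the scheme's default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the TCP port a client would connect to for this URL."""
    parts = urlsplit(url)
    # .port is None when the URL has no explicit port; fall back to the default.
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]


def build_boot_url(server: str, path: str, port: int = 80, scheme: str = "http") -> str:
    """Hypothetical helper: append ':port' only when it differs from the default."""
    netloc = server if port == DEFAULT_PORTS[scheme] else f"{server}:{port}"
    return f"{scheme}://{netloc}{path}"


if __name__ == "__main__":
    failing = "http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi"
    working = "http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi"
    print(effective_port(failing), effective_port(working))  # 80 80
    # Default port is dropped; a non-default site.httpport value would be kept.
    print(build_boot_url("${next-server}", "/tftpboot/xcat/xnba/nodes/comp078.uefi"))
```

If both forms are equivalent under standard rules, the difference presumably lies in how this particular iPXE build (1.0.3-131028) handles an explicit port, which the Wireshark capture Jarrod suggests could help confirm.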
> >
> > Eg:
> >
> > host comp078 {
> >   dynamic;
> >   hardware ethernet 00:0a:f7:be:fc:de;
> >   uid 00:0a:f7:be:fc:de;
> >   fixed-address 100.64.1.78;
> >   supersede server.ddns-hostname = "comp078";
> >   supersede host-name = "comp078";
> >   if option user-class-identifier = "xNBA" and option client-architecture = 00:00 {
> >     supersede server.always-broadcast = 01;
> >     supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078";
> >   } elsif option user-class-identifier = "xNBA" and option client-architecture = 00:09 {
> >     supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078.uefi";
> >   } elsif option client-architecture = 00:07 {
> >     supersede server.filename = "xcat/xnba.efi";
> >   } elsif option client-architecture = 00:00 {
> >     supersede server.filename = "xcat/xnba.kpxe";
> >   } else {
> >     supersede server.filename = "";
> >   }
> > }
> >
> > If I manually edit the leases file and remove :80 from the two filename
> > entries above, the node is able to boot fine.
> >
> > Is anyone able to advise on why my environment might now be doing this?
> >
> > Thanks,
> >
> > Carl.
> >
> > On Thu, 18 Jul 2019 at 16:22, Carl <mutantll...@gmail.com> wrote:
> >
> > > Hi Folks,
> > >
> > > We recently replaced the mainboard on a Dell R640.
> > >
> > > I removed the MAC address from the node definition and let switch-based
> > > discovery take care of discovering the new MAC address and running BMC
> > > setup. Everything went well and the node ended up at the xCAT shell.
> > >
> > > However, when I tried to boot the node (statelite), it's failing to find the
> > > image, and if I persist, it dies with a horrible UEFI error. The node also
> > > has this problem if I nodeset it to boot to shell.
> > >
> > > As other nodes are able to boot statelite fine, I assumed that it was a
> > > hardware error. Dell has replaced the mainboard a second time, but the
> > > issue still persists.
> > >
> > > It might be worth mentioning that the last time we had a mainboard
> > > replacement on a comp node was about 9 months ago, and we have updated xCAT
> > > a couple of times since then. Attached is the console log of the UEFI crash
> > > and the pxe boot messages that are seen on a working and a non-working node.
> > >
> > > Is anyone able to suggest any tricks to further debug this issue? I'm
> > > reluctant to pin the problem on xCAT, but find it unlikely that I have hit
> > > two mainboards with the same fault.
> > >
> > > Thanks,
> > >
> > > Carl.
> > >
> > > #### These are the pxe boot messages for the node that isn't working ####
> > > [2019-07-10T10:45:47+10:00] Booting from PXE Device 2: Integrated NIC 1 Port 3 Partition 1
> > > [2019-07-10T10:45:48+10:00]
> > > [2019-07-10T10:45:48+10:00] >>Start PXE over IPv4.
> > > [2019-07-10T10:45:52+10:00] Station IP address is 100.64.1.78
> > > [2019-07-10T10:45:52+10:00]
> > > [2019-07-10T10:45:52+10:00] Server IP address is 100.64.0.1
> > > [2019-07-10T10:45:52+10:00] NBP filename is xcat/xnba.efi
> > > [2019-07-10T10:45:52+10:00] NBP filesize is 139200 Bytes
> > > [2019-07-10T10:45:52+10:00] Downloading NBP file...
> > > [2019-07-10T10:45:52+10:00]
> > > [2019-07-10T10:45:52+10:00] NBP file downloaded successfully.
> > > [2019-07-10T10:45:52+10:00] xNBA initialising devices...ok
> > > [2019-07-10T10:45:52+10:00]
> > > [2019-07-10T10:45:52+10:00]
> > > [2019-07-10T10:45:52+10:00] xCAT Network Boot Agent
> > > [2019-07-10T10:45:52+10:00] iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware -- http://ipxe.org
> > > [2019-07-10T10:45:52+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> > > [2019-07-10T10:45:52+10:00] net0: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP (open)
> > > [2019-07-10T10:45:52+10:00] [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> > > [2019-07-10T10:45:52+10:00] DHCP (net0 00:0a:f7:be:b7:d2)... ok
> > > [2019-07-10T10:45:52+10:00] net0: 100.64.1.78/255.255.248.0 gw 100.64.0.1
> > > [2019-07-10T10:45:52+10:00] Next server: 100.64.0.1
> > > [2019-07-10T10:45:52+10:00] Filename: http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
> > > [2019-07-10T10:45:52+10:00] http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. Connection timed out (http://ipxe.org/4c0a6012)
> > > [2019-07-10T10:46:08+10:00] No more network devices
> > > [2019-07-10T10:46:08+10:00] xNBA initialising devices...ok
> > > [2019-07-10T10:46:08+10:00]
> > > [2019-07-10T10:46:08+10:00]
> > > [2019-07-10T10:46:08+10:00] xCAT Network Boot Agent
> > > [2019-07-10T10:46:08+10:00] iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware -- http://ipxe.org
> > > [2019-07-10T10:46:08+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> > > [2019-07-10T10:46:08+10:00] net1: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP (open)
> > > [2019-07-10T10:46:08+10:00] [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> > > [2019-07-10T10:46:08+10:00] DHCP (net1 00:0a:f7:be:b7:d2)... ok
> > > [2019-07-10T10:46:08+10:00] net1: 100.64.1.78/255.255.248.0 gw 100.64.0.1
> > > [2019-07-10T10:46:08+10:00] Next server: 100.64.0.1
> > > [2019-07-10T10:46:08+10:00] Filename: http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
> > > [2019-07-10T10:46:08+10:00] http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. Connection timed out (http://ipxe.org/4c0a6012)
> > > [2019-07-10T10:46:24+10:00] No more network devices
> > >
> > > #### As a comparison, this is what we see on a node that boots fine ####
> > > [2019-07-18T11:59:45+10:00] Booting from PXE Device 1: Integrated NIC 1 Port 3 Partition 1
> > > [2019-07-18T11:59:46+10:00]
> > > [2019-07-18T11:59:46+10:00] >>Start PXE over IPv4.
> > > [2019-07-18T11:59:50+10:00] Station IP address is 100.64.1.86
> > > [2019-07-18T11:59:50+10:00]
> > > [2019-07-18T11:59:50+10:00] Server IP address is 100.64.0.1
> > > [2019-07-18T11:59:50+10:00] NBP filename is xcat/xnba.efi
> > > [2019-07-18T11:59:50+10:00] NBP filesize is 139200 Bytes
> > > [2019-07-18T11:59:50+10:00] Downloading NBP file...
> > > [2019-07-18T11:59:50+10:00]
> > > [2019-07-18T11:59:50+10:00] NBP file downloaded successfully.
> > > [2019-07-18T11:59:50+10:00] xNBA initialising devices...ok
> > > [2019-07-18T11:59:50+10:00]
> > > [2019-07-18T11:59:50+10:00]
> > > [2019-07-18T11:59:50+10:00] xCAT Network Boot Agent
> > > [2019-07-18T11:59:50+10:00] iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware -- http://ipxe.org
> > > [2019-07-18T11:59:50+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> > > [2019-07-18T11:59:50+10:00] net0: 00:0a:f7:bd:e6:b8 using <NULL> on EFI SNP (open)
> > > [2019-07-18T11:59:50+10:00] [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> > > [2019-07-18T11:59:50+10:00] DHCP (net0 00:0a:f7:bd:e6:b8)... ok
> > > [2019-07-18T11:59:50+10:00] net0: 100.64.1.86/255.255.248.0 gw 100.64.0.1
> > > [2019-07-18T11:59:50+10:00] Next server: 100.64.0.1
> > > [2019-07-18T11:59:50+10:00] Filename: http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi
> > > [2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi... ok
> > > [2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/elilo-x64.efi... ok
> > > [2019-07-18T11:59:51+10:00] ELILO v3.14 for EFI/x86_64
> > > [2019-07-18T11:59:51+10:00] Loading kernel /tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/kernel... done
> > > [2019-07-18T11:59:51+10:00] Loading file /tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/initrd-stateless.gz...done
> > >
> > > _______________________________________________
> > > xCAT-user mailing list
> > > xCAT-user@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/xcat-user
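
Following Jarrod's remark that only nodes nodeset since the upgrade carry the :80, and Carl's manual workaround of removing it from the filename entries, a small script can list the affected host declarations and produce a copy of the text with the default port removed. This is a hypothetical sketch in Python, assuming the host-declaration layout shown in Carl's leases example; `hosts_with_explicit_port_80` and `strip_default_port` are illustrative names, not xCAT tools. It only transforms text: work on a copy of the leases file, and note that, as described in the thread, running nodeset again would presumably write the :80 back until the xnba.pm change is in place.

```python
import re

# An http:// URL whose authority carries an explicit ":80" right before the path,
# e.g. "http://${next-server}:80/tftpboot/...". Group 1 is everything before ":80".
EXPLICIT_80 = re.compile(r'(http://[^/":]+):80/')

# Start of a dhcpd host declaration such as "host comp078 {".
HOST_START = re.compile(r'^\s*host\s+(\S+)\s*\{', re.MULTILINE)


def hosts_with_explicit_port_80(text: str) -> list:
    """Names of host declarations whose text contains an explicit ':80/' URL."""
    starts = list(HOST_START.finditer(text))
    found = []
    for i, match in enumerate(starts):
        # Assumption: a declaration runs until the next "host ... {" line (or EOF).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        if EXPLICIT_80.search(text, match.end(), end):
            found.append(match.group(1))
    return found


def strip_default_port(text: str) -> str:
    """Return text with every 'http://X:80/' rewritten to 'http://X/'."""
    return EXPLICIT_80.sub(r"\1/", text)
```

Because each declaration is assumed to end where the next `host ... {` line begins, a file with several declarations for the same node reports that node once per matching declaration. Ports such as :8080 are left untouched, since the pattern requires `/` immediately after `:80`.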