xNBA should come prebuilt in the rpm, so the management node shouldn’t make a difference…

So the most ‘make the problem go away’ solution would be to have xnba.pm only 
add the port when needed.  Offhand I think this would be right (untested):
https://github.com/jjohnson42/xcat-core/commit/cd61fd9db468cd142537e5bd495b71310e6a6d07

If I were in the situation, I would probably satisfy curiosity by running 
wireshark to see if any packets are emitted with :80 and if so, what looks odd 
about them.

Of course, another thing I’d be tempted to do would be to try a newer ipxe 
build.  I happen to have one built to see if newer codebase would behave 
differently.  However, the last time I checked, it seemed to have compatibility 
issues with elilo.  Elilo is no longer required for CentOS7 and up (in 
conjunction with a modified xnba.pm I have), but CentOS6 kernels still need 
elilo.

So I suppose there are three options, depending on how little time you want to 
spend to make the problem go away or understand more.



From: Carl <mutantll...@gmail.com>
Sent: Thursday, July 18, 2019 8:53 AM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] Re: Unable to pxe boot node after mainboard 
replacement

Thanks Jarrod,

Yes it is a little strange.

I'm not seeing anything on the http server logs when the dhcp lease has :80 in 
the entry.

I don't fully understand how xnba is built, could it be bringing in something 
from the management node (CentOS 6.5) that might be part of the issue?

Cheers,

Carl.

On Thu, 18 Jul. 2019, 22:35 Jarrod Johnson, 
<jjohns...@lenovo.com> wrote:
The change is from:
commit 1889ec879d2ba721869217ad2e4f03d47b7fba40
Author: yangsbj <yang...@cn.ibm.com>
Date:   Thu Nov 1 23:29:01 2018 -0400

    support site.httpport in nodeset and mknb


Prior to that change, non-80 ports did not work.

What is unusual is that 80 should be the normal port and the url parsing should 
be xNBA and not UEFI specific, so I’m uncertain why :80 would cause a problem 
in your environment.

Nodes that have not been ‘nodeset’ since your upgrade would not have the :80….

A reasonable mitigation in the code would be to skip the port designation if it 
is default, though it is still fairly odd that this would do anything different…
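
As a rough illustration of that mitigation (not the actual Perl in xnba.pm, and not the linked commit), the idea is to leave the port out of the generated URL whenever it is the HTTP default. The function name boot_url and its parameters are hypothetical; only the ${next-server} placeholder and the /tftpboot/xcat/xnba/nodes/ path come from this thread.

```python
def boot_url(host: str, path: str, port: int = 80) -> str:
    """Build an http:// boot URL, omitting the port when it is the default.

    Hypothetical helper for illustration only; `port` stands in for
    whatever value site.httpport holds.
    """
    # Port 80 is implied by the http scheme, so only spell out other ports.
    authority = host if port == 80 else f"{host}:{port}"
    return f"http://{authority}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(boot_url("${next-server}", "tftpboot/xcat/xnba/nodes/comp078.uefi"))
    print(boot_url("${next-server}", "tftpboot/xcat/xnba/nodes/comp078.uefi", 8080))
```

With the default port the result has no ":80" in it, which matches the filename form seen on the nodes that still boot; a non-default port is still written out explicitly.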

From: Carl <mutantll...@gmail.com>
Sent: Thursday, July 18, 2019 4:01 AM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: [External] Re: [xcat-user] Unable to pxe boot node after mainboard 
replacement

Hi all,

Further to the above I have managed to isolate the issue.

It looks like when nodeset is run, it is adding :80 to the boot options in the 
leases file.

Eg:

host comp078 {
  dynamic;
  hardware ethernet 00:0a:f7:be:fc:de;
  uid 00:0a:f7:be:fc:de;
  fixed-address 100.64.1.78;
        supersede server.ddns-hostname = "comp078";
        supersede host-name = "comp078";
        if option user-class-identifier = "xNBA" and option client-architecture
             = 00:00 {
          supersede server.always-broadcast = 01;
          supersede server.filename =
                  "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078";
        } elsif option user-class-identifier = "xNBA" and option
                client-architecture = 00:09 {
          supersede server.filename =
                  "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078.uefi";
        } elsif option client-architecture = 00:07 {
          supersede server.filename = "xcat/xnba.efi";
        } elsif option client-architecture = 00:00 {
          supersede server.filename = "xcat/xnba.kpxe";
        } else {
          supersede server.filename = "";
        }
}

If I manually edit the leases file and remove :80 from the two filename entries 
above, the node is able to boot fine.
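
The manual edit described above could be expressed as a small text transformation, sketched here under assumptions: the function name strip_default_http_port is hypothetical, it only operates on a string (it does not read or write the leases file), and it only targets an explicit ":80" directly after the host part of an http:// URL.

```python
import re

# Match "http://<host>:80" only when ":80" is followed by "/" or a closing
# quote, so ports such as :8080 are left alone.
_DEFAULT_PORT = re.compile(r'(http://[^/\s":]+):80(?=[/"])')


def strip_default_http_port(text: str) -> str:
    """Return `text` with an explicit default port removed from http URLs."""
    return _DEFAULT_PORT.sub(r"\1", text)


if __name__ == "__main__":
    line = '"http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078.uefi";'
    print(strip_default_http_port(line))
```

This only reproduces the workaround; entries would presumably get the ":80" back the next time nodeset regenerates them, so it is not a substitute for a fix in the code that writes them.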

Is anyone able to advise on why my environment might be now doing this?

Thanks,

Carl.





On Thu, 18 Jul 2019 at 16:22, Carl 
<mutantll...@gmail.com> wrote:
Hi Folks,

We recently replaced the mainboard on a Dell R640.

I removed the mac address from the node definition and let switch based 
discovery take care of discovering the new MAC address and running BMC setup. 
Everything went well and the node ended at the xcat shell.

However, when I tried to boot the node (statelite) it's failing to find the image 
and if I persist it dies with a horrible UEFI error. The node also has this 
problem if I nodeset it to boot to shell.

As other nodes are able to boot statelite fine, I assumed that it was a 
hardware error. Dell has replaced the mainboard a second time, but the issue 
still persists.

It might be worth mentioning that the last time that we had a mainboard 
replacement on a comp node was about 9 months ago and we have updated xCAT a 
couple of times since then. Attached is the console log of the UEFI crash and 
the pxe boot messages that are seen on a working and non-working node.

Is anyone able to suggest any tricks to further debug this issue? I'm reluctant 
to pin the problem on xCAT, but find it unlikely that I have hit two mainboards 
with the same fault.

Thanks,

Carl.



#### These are the pxe boot messages for the node that isn't working ####
[2019-07-10T10:45:47+10:00] ESC[2JESC[01;01HBooting from PXE Device 2: 
Integrated NIC 1 Port 3 Partition 1
[2019-07-10T10:45:48+10:00]
[2019-07-10T10:45:48+10:00] >>Start PXE over IPv4.
[2019-07-10T10:45:52+10:00]   Station IP address is 100.64.1.78
[2019-07-10T10:45:52+10:00]
[2019-07-10T10:45:52+10:00]   Server IP address is 100.64.0.1
[2019-07-10T10:45:52+10:00]   NBP filename is xcat/xnba.efi
[2019-07-10T10:45:52+10:00]   NBP filesize is 139200 Bytes
[2019-07-10T10:45:52+10:00]  Downloading NBP file...
[2019-07-10T10:45:52+10:00]
[2019-07-10T10:45:52+10:00]   NBP file downloaded successfully.
[2019-07-10T10:45:52+10:00] xNBA initialising devices...ok
[2019-07-10T10:45:52+10:00]
[2019-07-10T10:45:52+10:00]
[2019-07-10T10:45:52+10:00] xCAT Network Boot Agent
[2019-07-10T10:45:52+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 
(d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- 
ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
[2019-07-10T10:45:52+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
[2019-07-10T10:45:52+10:00] net0: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP 
(open)
[2019-07-10T10:45:52+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
[2019-07-10T10:45:52+10:00] DHCP (net0 00:0a:f7:be:b7:d2)... ok
[2019-07-10T10:45:52+10:00] net0: 100.64.1.78/255.255.248.0 gw 100.64.0.1
[2019-07-10T10:45:52+10:00] Next server: 100.64.0.1
[2019-07-10T10:45:52+10:00] Filename: 
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
[2019-07-10T10:45:52+10:00] 
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. 
Connection timed out (http://ipxe.org/4c0a6012)
[2019-07-10T10:46:08+10:00] No more network devices
[2019-07-10T10:46:08+10:00] xNBA initialising devices...ok
[2019-07-10T10:46:08+10:00]
[2019-07-10T10:46:08+10:00]
[2019-07-10T10:46:08+10:00] xCAT Network Boot Agent
[2019-07-10T10:46:08+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 
(d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- 
ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
[2019-07-10T10:46:08+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
[2019-07-10T10:46:08+10:00] net1: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP 
(open)
[2019-07-10T10:46:08+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
[2019-07-10T10:46:08+10:00] DHCP (net1 00:0a:f7:be:b7:d2)... ok
[2019-07-10T10:46:08+10:00] net1: 100.64.1.78/255.255.248.0 gw 100.64.0.1
[2019-07-10T10:46:08+10:00] Next server: 100.64.0.1
[2019-07-10T10:46:08+10:00] Filename: 
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
[2019-07-10T10:46:08+10:00] 
http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. 
Connection timed out (http://ipxe.org/4c0a6012)
[2019-07-10T10:46:24+10:00] No more network devices



#### As a comparison, this is what we see on a node that boots fine ####
[2019-07-18T11:59:45+10:00] ESC[0mESC[37mESC[40mESC[2JESC[01;01HBooting from 
PXE Device 1: Integrated NIC 1 Port 3 Partition 1
[2019-07-18T11:59:46+10:00]
[2019-07-18T11:59:46+10:00] >>Start PXE over IPv4.
[2019-07-18T11:59:50+10:00]   Station IP address is 100.64.1.86
[2019-07-18T11:59:50+10:00]
[2019-07-18T11:59:50+10:00]   Server IP address is 100.64.0.1
[2019-07-18T11:59:50+10:00]   NBP filename is xcat/xnba.efi
[2019-07-18T11:59:50+10:00]   NBP filesize is 139200 Bytes
[2019-07-18T11:59:50+10:00]  Downloading NBP file...
[2019-07-18T11:59:50+10:00]
[2019-07-18T11:59:50+10:00]   NBP file downloaded successfully.
[2019-07-18T11:59:50+10:00] xNBA initialising devices...ok
[2019-07-18T11:59:50+10:00]
[2019-07-18T11:59:50+10:00]
[2019-07-18T11:59:50+10:00] xCAT Network Boot Agent
[2019-07-18T11:59:50+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 
(d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- 
ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
[2019-07-18T11:59:50+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
[2019-07-18T11:59:50+10:00] net0: 00:0a:f7:bd:e6:b8 using <NULL> on EFI SNP 
(open)
[2019-07-18T11:59:50+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
[2019-07-18T11:59:50+10:00] DHCP (net0 00:0a:f7:bd:e6:b8)... ok
[2019-07-18T11:59:50+10:00] net0: 100.64.1.86/255.255.248.0 gw 100.64.0.1
[2019-07-18T11:59:50+10:00] Next server: 100.64.0.1
[2019-07-18T11:59:50+10:00] Filename: 
http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi
[2019-07-18T11:59:51+10:00] 
http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi... ok
[2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/elilo-x64.efi... ok
[2019-07-18T11:59:51+10:00] ELILO v3.14 for EFI/x86_64
[2019-07-18T11:59:51+10:00] Loading kernel 
/tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/kernel...  done
[2019-07-18T11:59:51+10:00] Loading file 
/tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/initrd-stateless.gz...done

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user