Hi all,

Further to my message below, I have managed to isolate the issue.

It looks like when nodeset is run, it adds :80 to the xNBA boot filename URLs in
the leases file.

For example:

host comp078 {
  dynamic;
  hardware ethernet 00:0a:f7:be:fc:de;
  uid 00:0a:f7:be:fc:de;
  fixed-address 100.64.1.78;
        supersede server.ddns-hostname = "comp078";
        supersede host-name = "comp078";
        if option user-class-identifier = "xNBA" and option client-architecture = 00:00 {
          supersede server.always-broadcast = 01;
          supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078";
        } elsif option user-class-identifier = "xNBA" and option client-architecture = 00:09 {
          supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078.uefi";
        } elsif option client-architecture = 00:07 {
          supersede server.filename = "xcat/xnba.efi";
        } elsif option client-architecture = 00:00 {
          supersede server.filename = "xcat/xnba.kpxe";
        } else {
          supersede server.filename = "";
        }
}

If I manually edit the leases file and remove :80 from the two filename
entries above, the node is able to boot fine.
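For anyone hitting the same thing, the manual edit can be scripted. This is only a minimal Python sketch, assuming each filename URL sits on a single line in the leases file and that "${next-server}:80/" is the only form that needs fixing. The leases file path is passed on the command line because its location varies by distro, and the script name in the usage line is hypothetical. ISC dhcpd rewrites its leases file while it runs, so stop dhcpd before editing, and expect the next nodeset to put :80 back. Treat this as a stopgap, not a fix.

```python
import re
import shutil
import sys

# Matches "${next-server}:80/" inside xNBA filename URLs so the explicit
# port can be dropped, mirroring the manual edit described above.
_NEXT_SERVER_PORT_80 = re.compile(r"(\$\{next-server\}):80/")


def strip_port_80(text):
    """Return (fixed_text, count): text with ':80' removed after ${next-server}."""
    return _NEXT_SERVER_PORT_80.subn(r"\1/", text)


def fix_leases_file(path):
    """Rewrite the leases file in place (keeping a .bak copy); return the count."""
    with open(path) as f:
        original = f.read()
    fixed, count = strip_port_80(original)
    if count:
        shutil.copy2(path, path + ".bak")  # keep the untouched file for rollback
        with open(path, "w") as f:
            f.write(fixed)
    return count


# Only act when exactly one argument (the leases file path) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    n = fix_leases_file(sys.argv[1])
    print(f"removed :80 from {n} filename URL(s) in {sys.argv[1]}")
```

Usage (hypothetical script name): python3 strip_port80.py /path/to/dhcpd.leases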

Is anyone able to advise on why my environment might now be doing this?

Thanks,

Carl.

On Thu, 18 Jul 2019 at 16:22, Carl <mutantll...@gmail.com> wrote:

> Hi Folks,
>
> We recently replaced the mainboard on a Dell R640.
>
> I removed the MAC address from the node definition and let switch-based
> discovery take care of discovering the new MAC address and running BMC
> setup. Everything went well and the node ended up at the xCAT shell.
>
> However, when I try to boot the node (statelite), it fails to find the
> image, and if I persist it dies with a horrible UEFI error. The node also
> has this problem if I nodeset it to boot to the shell.
>
> As other nodes are able to boot statelite fine, I assumed that it was a
> hardware error. Dell has replaced the mainboard a second time, but the
> issue still persists.
>
> It might be worth mentioning that the last time that we had a mainboard
> replacement on a comp node was about 9 months ago, and we have updated xCAT
> a couple of times since then. Attached is the console log of the UEFI crash
> and the pxe boot messages that are seen on a working and non-working node.
>
> Is anyone able to suggest any tricks to debug this issue further? I'm
> reluctant to pin the problem on xCAT, but I find it unlikely that I have hit
> two mainboards with the same fault.
>
> Thanks,
>
> Carl.
>
>
>
> #### These are the PXE boot messages for the node that isn't working ####
> [2019-07-10T10:45:47+10:00] Booting from PXE Device 2: Integrated NIC 1 Port 3 Partition 1
> [2019-07-10T10:45:48+10:00]
> [2019-07-10T10:45:48+10:00] >>Start PXE over IPv4.
> [2019-07-10T10:45:52+10:00]   Station IP address is 100.64.1.78
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00]   Server IP address is 100.64.0.1
> [2019-07-10T10:45:52+10:00]   NBP filename is xcat/xnba.efi
> [2019-07-10T10:45:52+10:00]   NBP filesize is 139200 Bytes
> [2019-07-10T10:45:52+10:00]  Downloading NBP file...
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00]   NBP file downloaded successfully.
> [2019-07-10T10:45:52+10:00] xNBA initialising devices...ok
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00] xCAT Network Boot Agent
> [2019-07-10T10:45:52+10:00] iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware -- http://ipxe.org
> [2019-07-10T10:45:52+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-10T10:45:52+10:00] net0: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP (open)
> [2019-07-10T10:45:52+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-10T10:45:52+10:00] DHCP (net0 00:0a:f7:be:b7:d2)... ok
> [2019-07-10T10:45:52+10:00] net0: 100.64.1.78/255.255.248.0 gw 100.64.0.1
> [2019-07-10T10:45:52+10:00] Next server: 100.64.0.1
> [2019-07-10T10:45:52+10:00] Filename: http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
> [2019-07-10T10:45:52+10:00] http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. Connection timed out (http://ipxe.org/4c0a6012)
> [2019-07-10T10:46:08+10:00] No more network devices
> [2019-07-10T10:46:08+10:00] xNBA initialising devices...ok
> [2019-07-10T10:46:08+10:00]
> [2019-07-10T10:46:08+10:00]
> [2019-07-10T10:46:08+10:00] xCAT Network Boot Agent
> [2019-07-10T10:46:08+10:00] iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware -- http://ipxe.org
> [2019-07-10T10:46:08+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-10T10:46:08+10:00] net1: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP (open)
> [2019-07-10T10:46:08+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-10T10:46:08+10:00] DHCP (net1 00:0a:f7:be:b7:d2)... ok
> [2019-07-10T10:46:08+10:00] net1: 100.64.1.78/255.255.248.0 gw 100.64.0.1
> [2019-07-10T10:46:08+10:00] Next server: 100.64.0.1
> [2019-07-10T10:46:08+10:00] Filename: http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
> [2019-07-10T10:46:08+10:00] http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. Connection timed out (http://ipxe.org/4c0a6012)
> [2019-07-10T10:46:24+10:00] No more network devices
>
>
>
> #### As a comparison, this is what we see on a node that boots fine ####
> [2019-07-18T11:59:45+10:00] Booting from PXE Device 1: Integrated NIC 1 Port 3 Partition 1
> [2019-07-18T11:59:46+10:00]
> [2019-07-18T11:59:46+10:00] >>Start PXE over IPv4.
> [2019-07-18T11:59:50+10:00]   Station IP address is 100.64.1.86
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00]   Server IP address is 100.64.0.1
> [2019-07-18T11:59:50+10:00]   NBP filename is xcat/xnba.efi
> [2019-07-18T11:59:50+10:00]   NBP filesize is 139200 Bytes
> [2019-07-18T11:59:50+10:00]  Downloading NBP file...
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00]   NBP file downloaded successfully.
> [2019-07-18T11:59:50+10:00] xNBA initialising devices...ok
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00] xCAT Network Boot Agent
> [2019-07-18T11:59:50+10:00] iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware -- http://ipxe.org
> [2019-07-18T11:59:50+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-18T11:59:50+10:00] net0: 00:0a:f7:bd:e6:b8 using <NULL> on EFI SNP (open)
> [2019-07-18T11:59:50+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-18T11:59:50+10:00] DHCP (net0 00:0a:f7:bd:e6:b8)... ok
> [2019-07-18T11:59:50+10:00] net0: 100.64.1.86/255.255.248.0 gw 100.64.0.1
> [2019-07-18T11:59:50+10:00] Next server: 100.64.0.1
> [2019-07-18T11:59:50+10:00] Filename: http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi
> [2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi... ok
> [2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/elilo-x64.efi... ok
> [2019-07-18T11:59:51+10:00] ELILO v3.14 for EFI/x86_64
> [2019-07-18T11:59:51+10:00] Loading kernel /tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/kernel...  done
> [2019-07-18T11:59:51+10:00] Loading file /tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/initrd-stateless.gz...done
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
