Great, thanks.

I'm happy to contribute back to the community, so I'll have a look to see
what I can do.

Cheers,

Carl.


On Thu, 18 Jul. 2019, 23:08 Jarrod Johnson, <jjohns...@lenovo.com> wrote:

> It should come in the rpm prebuilt, so shouldn’t be different…
>
>
>
> So the most ‘make the problem go away’ solution would be to have xnba.pm
> only do this when needed.  Offhand, I think this would be right (untested):
>
>
> https://github.com/jjohnson42/xcat-core/commit/cd61fd9db468cd142537e5bd495b71310e6a6d07
>
>
>
> If I were in the situation, I would probably satisfy curiosity by running
> wireshark to see if any packets are emitted with :80 and if so, what looks
> odd about them.
>
>
>
> Of course, another thing I’d be tempted to do would be to try a newer ipxe
> build.  I happen to have one built to see if newer codebase would behave
> differently.  However last time I had checked it seemed to have
> compatibility issues with elilo.  Elilo is no longer required for CentOS7
> and up (in conjunction with a modified xnba.pm I have), but CentOS6
> kernels still need elilo.
>
>
>
> So I suppose there are three options, depending on how little time you
> want to spend to make the problem go away or understand more.
>
> *From:* Carl <mutantll...@gmail.com>
> *Sent:* Thursday, July 18, 2019 8:53 AM
> *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> *Subject:* Re: [xcat-user] [External] Re: Unable to pxe boot node after
> mainboard replacement
>
>
>
> Thanks Jarrod,
>
>
>
> Yes it is a little strange.
>
>
>
> I'm not seeing anything on the http server logs when the dhcp lease has
> :80 in the entry.
>
>
>
> I don't fully understand how xnba is built, could it be bringing in
> something from the management node (CentOS 6.5) that might be part of the
> issue?
>
>
>
> Cheers,
>
>
>
> Carl.
>
>
>
> On Thu, 18 Jul. 2019, 22:35 Jarrod Johnson, <jjohns...@lenovo.com> wrote:
>
> The change is from:
>
> commit 1889ec879d2ba721869217ad2e4f03d47b7fba40
>
> Author: yangsbj <yang...@cn.ibm.com>
>
> Date:   Thu Nov 1 23:29:01 2018 -0400
>
>
>
>     support site.httpport in nodeset and mknb
>
>
>
>
>
> Prior to that change, non-80 ports did not work.
>
>
>
> What is unusual is that 80 should be the normal port and the url parsing
> should be xNBA and not UEFI specific, so I’m uncertain why :80 would cause
> a problem in your environment.
>
>
>
> Nodes that have not been ‘nodeset’ since your upgrade would not have the
> :80….
>
>
>
> A reasonable mitigation in the code would be to skip the port designation
> if it is default, though it is still fairly odd that this would do anything
> different…
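
The mitigation described above (leave the port out of the URL when it is the default) can be sketched as follows. This is an illustrative Python sketch only, not the actual xnba.pm code, which is Perl; the function name boot_url and its parameters are hypothetical.

```python
DEFAULT_HTTP_PORT = 80  # the port implied by a plain http:// URL


def boot_url(host, path, port=DEFAULT_HTTP_PORT):
    """Build an http:// boot URL, adding ':port' only for a non-default port.

    'port' may be an int or a numeric string (as a site setting would be).
    """
    port = int(port)
    netloc = host if port == DEFAULT_HTTP_PORT else "%s:%d" % (host, port)
    return "http://%s/%s" % (netloc, path.lstrip("/"))


if __name__ == "__main__":
    # Default port: no ':80' suffix is emitted.
    print(boot_url("${next-server}", "tftpboot/xcat/xnba/nodes/comp078.uefi"))
    # Non-default port: the suffix is kept.
    print(boot_url("${next-server}", "tftpboot/xcat/xnba/nodes/comp078.uefi", "8080"))
```

With this shape, nodes on a default setup would get the same URL they had before the httpport change, while a non-default port would still be passed through.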
>
>
>
> *From:* Carl <mutantll...@gmail.com>
> *Sent:* Thursday, July 18, 2019 4:01 AM
> *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> *Subject:* [External] Re: [xcat-user] Unable to pxe boot node after
> mainboard replacement
>
>
>
> Hi all,
>
>
>
> Further to the above I have managed to isolate the issue.
>
>
>
> It looks like when nodeset is run, it is adding :80 to the boot options in
> the leases file.
>
>
>
> Eg:
>
>
>
> host comp078 {
>   dynamic;
>   hardware ethernet 00:0a:f7:be:fc:de;
>   uid 00:0a:f7:be:fc:de;
>   fixed-address 100.64.1.78;
>         supersede server.ddns-hostname = "comp078";
>         supersede host-name = "comp078";
>         if option user-class-identifier = "xNBA" and option client-architecture
>              = 00:00 {
>           supersede server.always-broadcast = 01;
>           supersede server.filename =
>                   "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078";;
>         } elsif option user-class-identifier = "xNBA" and option
>                 client-architecture = 00:09 {
>           supersede server.filename =
>                                       "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078.uefi";;
>         } elsif option client-architecture = 00:07 {
>           supersede server.filename = "xcat/xnba.efi";
>         } elsif option client-architecture = 00:00 {
>           supersede server.filename = "xcat/xnba.kpxe";
>         } else {
>           supersede server.filename = "";
>         }
> }
>
>
>
> If I manually edit the leases file and remove :80 from the two filename
> entries above, the node is able to boot fine.
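
To check how many entries are affected before editing anything by hand, the lease text can be scanned for filename URLs that carry an explicit :80. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, it only works on text passed to it, and it does not deal with how dhcpd itself manages or rewrites the leases file.

```python
import re

# Matches a quoted http URL whose host part is followed by an explicit ':80'
# and then a path, e.g. "http://${next-server}:80/tftpboot/...".
# ':8080' is not matched because a '/' must follow the '80'.
URL_WITH_PORT_80 = re.compile(r'"(http://[^/":]+):80(/[^"]*)"')


def urls_with_default_port(lease_text):
    """Return every quoted URL in lease_text that has an explicit ':80'."""
    return [m.group(0).strip('"') for m in URL_WITH_PORT_80.finditer(lease_text)]


def strip_default_port(lease_text):
    """Return lease_text with ':80' removed from the matching URLs."""
    return URL_WITH_PORT_80.sub(r'"\1\2"', lease_text)


if __name__ == "__main__":
    sample = (
        'supersede server.filename =\n'
        '        "http://${next-server}:80/tftpboot/xcat/xnba/nodes/comp078.uefi";;\n'
    )
    print(urls_with_default_port(sample))
    print(strip_default_port(sample))
```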
>
>
>
> Is anyone able to advise on why my environment might be now doing this?
>
>
>
> Thanks,
>
>
>
> Carl.
>
> On Thu, 18 Jul 2019 at 16:22, Carl <mutantll...@gmail.com> wrote:
>
> Hi Folks,
>
> We recently replaced the mainboard on a Dell R640.
>
> I removed the mac address from the node definition and let switch based
> discovery take care of discovering the new MAC address and running BMC
> setup. Everything went well and the node ended at the xcat shell.
>
> However, when I try to boot the node (statelite), it fails to find the
> image, and if I persist it dies with a horrible UEFI error. The node also has
> this problem if I nodeset it to boot to shell.
>
> As other nodes are able to boot statelite fine, I assumed that it was a
> hardware error. Dell has replaced the mainboard a second time, but the
> issue still persists.
>
>
>
> It might be worth mentioning that the last time that we had a mainboard
> replacement on a comp node was about 9 months ago and we have updated xCAT
> a couple of times since then. Attached is the console log of the UEFI crash
> and the pxe boot messages that are seen on a working and non-working node.
>
> Is anyone able to suggest any tricks to further debug this issue? I'm
> reluctant to pin the problem on xCAT, but find it unlikely that I have hit
> two mainboards with the same fault.
>
> Thanks,
>
> Carl.
>
>
>
> #### These are the pxe boot messages for the node that isn't working ####
> [2019-07-10T10:45:47+10:00] ESC[2JESC[01;01HBooting from PXE Device 2: Integrated NIC 1 Port 3 Partition 1
> [2019-07-10T10:45:48+10:00]
> [2019-07-10T10:45:48+10:00] >>Start PXE over IPv4.
> [2019-07-10T10:45:52+10:00]   Station IP address is 100.64.1.78
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00]   Server IP address is 100.64.0.1
> [2019-07-10T10:45:52+10:00]   NBP filename is xcat/xnba.efi
> [2019-07-10T10:45:52+10:00]   NBP filesize is 139200 Bytes
> [2019-07-10T10:45:52+10:00]  Downloading NBP file...
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00]   NBP file downloaded successfully.
> [2019-07-10T10:45:52+10:00] xNBA initialising devices...ok
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00]
> [2019-07-10T10:45:52+10:00] xCAT Network Boot Agent
> [2019-07-10T10:45:52+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 (d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
> [2019-07-10T10:45:52+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-10T10:45:52+10:00] net0: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP (open)
> [2019-07-10T10:45:52+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-10T10:45:52+10:00] DHCP (net0 00:0a:f7:be:b7:d2)... ok
> [2019-07-10T10:45:52+10:00] net0: 100.64.1.78/255.255.248.0 gw 100.64.0.1
> [2019-07-10T10:45:52+10:00] Next server: 100.64.0.1
> [2019-07-10T10:45:52+10:00] Filename: http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
> [2019-07-10T10:45:52+10:00] http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. Connection timed out (http://ipxe.org/4c0a6012)
> [2019-07-10T10:46:08+10:00] No more network devices
> [2019-07-10T10:46:08+10:00] xNBA initialising devices...ok
> [2019-07-10T10:46:08+10:00]
> [2019-07-10T10:46:08+10:00]
> [2019-07-10T10:46:08+10:00] xCAT Network Boot Agent
> [2019-07-10T10:46:08+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 (d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
> [2019-07-10T10:46:08+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-10T10:46:08+10:00] net1: 00:0a:f7:be:b7:d2 using <NULL> on EFI SNP (open)
> [2019-07-10T10:46:08+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-10T10:46:08+10:00] DHCP (net1 00:0a:f7:be:b7:d2)... ok
> [2019-07-10T10:46:08+10:00] net1: 100.64.1.78/255.255.248.0 gw 100.64.0.1
> [2019-07-10T10:46:08+10:00] Next server: 100.64.0.1
> [2019-07-10T10:46:08+10:00] Filename: http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi
> [2019-07-10T10:46:08+10:00] http://100.64.0.1:80/tftpboot/xcat/xnba/nodes/comp078.uefi.................. Connection timed out (http://ipxe.org/4c0a6012)
> [2019-07-10T10:46:24+10:00] No more network devices
>
>
>
> #### As a comparison, this is what we see on a node that boots fine ####
> [2019-07-18T11:59:45+10:00] ESC[0mESC[37mESC[40mESC[2JESC[01;01HBooting from PXE Device 1: Integrated NIC 1 Port 3 Partition 1
> [2019-07-18T11:59:46+10:00]
> [2019-07-18T11:59:46+10:00] >>Start PXE over IPv4.
> [2019-07-18T11:59:50+10:00]   Station IP address is 100.64.1.86
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00]   Server IP address is 100.64.0.1
> [2019-07-18T11:59:50+10:00]   NBP filename is xcat/xnba.efi
> [2019-07-18T11:59:50+10:00]   NBP filesize is 139200 Bytes
> [2019-07-18T11:59:50+10:00]  Downloading NBP file...
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00]   NBP file downloaded successfully.
> [2019-07-18T11:59:50+10:00] xNBA initialising devices...ok
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00]
> [2019-07-18T11:59:50+10:00] xCAT Network Boot Agent
> [2019-07-18T11:59:50+10:00] ESC[1mESC[37mESC[40miPXE 1.0.3-131028 (d603e)ESC[0mESC[37mESC[40m -- Open Source Network Boot Firmware -- ESC[0mESC[36mESC[40mhttp://ipxe.orgESC[0mESC[37mESC[40m
> [2019-07-18T11:59:50+10:00] Features: HTTP HTTPS iSCSI DNS TFTP EFI
> [2019-07-18T11:59:50+10:00] net0: 00:0a:f7:bd:e6:b8 using <NULL> on EFI SNP (open)
> [2019-07-18T11:59:50+10:00]   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
> [2019-07-18T11:59:50+10:00] DHCP (net0 00:0a:f7:bd:e6:b8)... ok
> [2019-07-18T11:59:50+10:00] net0: 100.64.1.86/255.255.248.0 gw 100.64.0.1
> [2019-07-18T11:59:50+10:00] Next server: 100.64.0.1
> [2019-07-18T11:59:50+10:00] Filename: http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi
> [2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/xnba/nodes/comp086.uefi... ok
> [2019-07-18T11:59:51+10:00] http://100.64.0.1/tftpboot/xcat/elilo-x64.efi... ok
> [2019-07-18T11:59:51+10:00] ELILO v3.14 for EFI/x86_64
> [2019-07-18T11:59:51+10:00] Loading kernel /tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/kernel...  done
> [2019-07-18T11:59:51+10:00] Loading file /tftpboot/xcat/osimage/centos75-gpfs5.0.2.0-compute/initrd-stateless.gz...done
>
> _______________________________________________
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user
>
