Thomas, have you tried using iPXE version 1.21.1 as suggested by kcgthb in https://github.com/xcat2/xcat-core/issues/7036 ?

Mark Gurevich
Poughkeepsie Development Lab
HPC Software Development - xCAT

"If we knew what it was we were doing, it would not be called research, would it?" --Albert Einstein

From: "Thomas HUMMEL" <thomas.hum...@pasteur.fr>
To: <xcat-user@lists.sourceforge.net>
Date: 09/23/2021 04:11 AM
Subject: [EXTERNAL] Re: [xcat-user] xCAT 2.16.2 new xNBA issue

On 07/09/2021 20:45, Mark Gurevich wrote:

> So this behavior is different from the errors you were seeing prior to
> this dhcp.pm change, where dhcpd.conf contained
> "http://<xCAT MN>:80/tftpboot/xcat/xnba/nets/<network>.uefi" for
> "else if option user-class-identifier = "xNBA" and option
> client-architecture = 00:09" ?
>
> Can you show what /tftpboot/xcat/xnba/nets/<network> file contains on
> your management node ?

Hello Mark,

Sorry for the delay and the long post below.

Here's the answer to your question:

# ls -l /tftpboot/xcat/xnba/nets/
total 24
-rw-r--r-- 1 root root 247 Jul 6 18:42 127.0.0.0_8
-rw-r--r-- 1 root root 241 Jul 6 18:42 127.0.0.0_8.elilo
-rw-r--r-- 1 root root 112 Jul 6 18:42 127.0.0.0_8.uefi
-rw-r--r-- 1 root root 253 Jul 6 18:42 192.168.144.0_20
-rw-r--r-- 1 root root 259 Jul 6 18:42 192.168.144.0_20.elilo
-rw-r--r-- 1 root root 117 Jul 6 18:42 192.168.144.0_20.uefi

# cat /tftpboot/xcat/xnba/nets/192.168.144.0_20.uefi
#!gpxe
chain http://$%7Bnext-server%7D:80/tftpboot/xcat/elilo-x64.efi -C /tftpboot/xcat/xnba/nets/192.168.144.0_20.elilo

# cat /tftpboot/xcat/xnba/nets/192.168.144.0_20.elilo
default="xCAT Genesis (192.168.149.100)"
delay=5
image=/tftpboot/xcat/genesis.kernel.x86_64
    label="xCAT Genesis (192.168.149.100)"
    initrd=/tftpboot/xcat/genesis.fs.x86_64.gz
    append="quiet xcatd=192.168.149.100:3001 destiny=discover BOOTIF=%B"

I'd like to add the following new info regarding xNBA issues:

To sum up what's been going on in this thread, I ended up having 3
different possible xNBA binaries, one of which I choose to run by cp -p'ing it to xnba.efi. I named them like this:

# ls -1 xnba*\.efi*
xnba.efi         -> cp -p of one of the 3 below
xnba.efi-2.15    -> the original from the xCAT-2.15 branch
xnba.efi-2.16.2  -> the original from the latest stable xCAT-2.16 branch
xnba.efi-beta    -> the original (from xCAT-2.16) with the patch you provided earlier in this thread

Depending on which one I use, I have a different issue on a different component/process:

xnba.efi-2.16.2: problem booting some stateless hardware nodes (the original post opening this thread)
xnba.efi-beta: problem booting genesis (our current discussion)

That's the reason why I am currently, and have been for some time now, running xnba.efi-2.15.

However, I recently discovered an issue with this xnba.efi-2.15 as well: stateful VMs (VMware) booting in UEFI mode with PXE first in their UEFI boot order just won't boot. It seems that this xNBA somehow triggers a bug which prevents UEFI from booting the next target configured in the UEFI boot order (disk). This happens whether or not the /tftpboot/xcat/xnba/nodes/host.uefi file (an iPXE script just issuing 'exit') is present.

Basically, with xNBA-2.15 and such a VM:

- the host fetches xNBA over tftp
- xNBA issues an http GET for the node.uefi script file, successfully or not (it does not change anything whether the http response is 404 or 200)
- the VM does not manage to boot and ends up in the firmware interface

This issue does not occur with the 2.16 or 2.16 beta versions.

Which leads me to the following additional questions:

1) Initially, those VMs are instantiated from a VMware template which itself comes from a standard xCAT stateful install (remember I'm talking CentOS 8.3 here). Usually such a remote stateful install ends up, as expected, with the UEFI boot order changed from PXE first to disk first.
However, I can see that some of my VMs currently have PXE first, and I cannot figure out why: do you think this could be xCAT related, as in the stateful install process failing, for some reason, to complete or to change the UEFI boot order?

2) Also, I know that on a particular date in August these VMs rebooted (because of some issue in the datacenter), but looking at the logs for that day and those VMs I cannot see any tftp transfer; I only see DHCPDISCOVER/OFFER. I always assumed that PXE boot in the xCAT paradigm would imply a tftp transfer of the xNBA agent: do you know of a use case where the node would only issue DHCP without receiving any next-server option? (Note: I know the tftpd daemon was running at that time.)

3) Finally, on those installs I'm using the confignetwork -s script, which supersedes the NetworkManager dhcp profile with a higher-priority static one: do you think this could generate dhcp logs on the MN (while the static profile has not auto-connected yet)? Although, judging from the timestamp, the OFFER/DISCOVER I mentioned in 2) could not come from this (same second as tftp transaction).

Thanks for your help and time

--
Thomas HUMMEL

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user