Thomas, have you tried using iPXE version 1.21.1 as suggested by kcgthb in
https://github.com/xcat2/xcat-core/issues/7036 ?

Mark Gurevich
Poughkeepsie Development Lab
HPC Software Development - xCAT

"If we knew what it was we were doing, it would not be called research,
would it?"
--Albert Einstein





From:   "Thomas HUMMEL" <thomas.hum...@pasteur.fr>
To:     <xcat-user@lists.sourceforge.net>
Date:   09/23/2021 04:11 AM
Subject:        [EXTERNAL] Re: [xcat-user] xCAT 2.16.2 new xNBA issue





On 07/09/2021 20:45, Mark Gurevich wrote:
> So this behavior is different from the errors you were seeing prior to
> this dhcp.pm change, where dhcpd.conf contained
> "http://<xCAT MN>:80/tftpboot/xcat/xnba/nets/<network>.uefi" for
> "else if option user-class-identifier = "xNBA" and option
> client-architecture = 00:09" ?
>
> Can you show what /tftpboot/xcat/xnba/nets/<network> file contains on
> your management node ?


Hello Mark, sorry for the delay and the long post below

Here's the answer to your question:

# ls -l /tftpboot/xcat/xnba/nets/
total 24
-rw-r--r-- 1 root root 247 Jul  6 18:42 127.0.0.0_8
-rw-r--r-- 1 root root 241 Jul  6 18:42 127.0.0.0_8.elilo
-rw-r--r-- 1 root root 112 Jul  6 18:42 127.0.0.0_8.uefi
-rw-r--r-- 1 root root 253 Jul  6 18:42 192.168.144.0_20
-rw-r--r-- 1 root root 259 Jul  6 18:42 192.168.144.0_20.elilo
-rw-r--r-- 1 root root 117 Jul  6 18:42 192.168.144.0_20.uefi

# cat /tftpboot/xcat/xnba/nets/192.168.144.0_20.uefi
#!gpxe
chain http://${next-server}:80/tftpboot/xcat/elilo-x64.efi -C /tftpboot/xcat/xnba/nets/192.168.144.0_20.elilo

# cat /tftpboot/xcat/xnba/nets/192.168.144.0_20.elilo
default="xCAT Genesis (192.168.149.100)"
    delay=5
    image=/tftpboot/xcat/genesis.kernel.x86_64
    label="xCAT Genesis (192.168.149.100)"
    initrd=/tftpboot/xcat/genesis.fs.x86_64.gz
    append="quiet xcatd=192.168.149.100:3001 destiny=discover  BOOTIF=%B"

I'd like to add the following new info regarding xNBA issues.
To sum up what's been going on in this thread, I ended up with 3
different candidate xNBA binaries, one of which I choose to run by
cp -p'ing it into xnba.efi.

I named them like this:

# ls -1 xnba*\.efi*
xnba.efi -> cp -p of one of the 3 below
xnba.efi-2.15 -> the original from the xCAT-2.15 branch
xnba.efi-2.16.2 -> the original from the latest stable xCAT-2.16 branch
xnba.efi-beta -> the patched build (based on xCAT-2.16) that you provided
earlier in this thread
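
Since the active xnba.efi is only a copy, it is easy to lose track of which
variant is currently in place. A minimal sketch of a check, assuming the
saved variants sit next to xnba.efi under the naming shown above (the
function name active_xnba and the directory argument are hypothetical, not
part of xCAT):

```shell
# Hypothetical helper: print the name of the saved variant that is
# byte-identical to the active xnba.efi in the given directory.
active_xnba() {
    dir="${1:-.}"
    for f in "$dir"/xnba.efi-*; do
        [ -f "$f" ] || continue              # skip an unmatched glob
        if cmp -s "$dir/xnba.efi" "$f"; then
            basename "$f"
            return 0
        fi
    done
    echo "unknown"                           # no saved variant matches
    return 1
}
```

It would be called with the directory holding the binaries, for example
active_xnba /tftpboot/xcat (that path is an assumption about the layout).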

Depending on which one I use, I hit a different issue in a different
component/process:

xnba.efi-2.16.2: problem booting some stateless hardware nodes (the
original post opening this thread)
xnba.efi-beta: problem booting genesis (our current discussion)

That is why I have been running xnba.efi-2.15 for some time now.

However, I recently discovered an issue with xnba.efi-2.15 as well:
stateful VMs (VMware) that boot in UEFI mode with PXE first in their UEFI
boot order just won't boot. It seems that this xNBA somehow triggers a bug
that prevents UEFI from moving on to the next target configured in the
UEFI boot order (disk). This happens whether or not the
/tftpboot/xcat/xnba/nodes/host.uefi file (an iPXE script that just issues
'exit') is present.

Basically, with xNBA-2.15 and such a VM:

- the host fetches xNBA over tftp
- xNBA issues an http GET for the node.uefi script file (the outcome is
the same whether the http response is 404 or 200)
- the VM does not manage to boot and ends up in the firmware interface

This issue does not occur with 2.16 or 2.16 beta versions
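
To confirm from inside a guest which target comes first in the UEFI boot
order, the output of efibootmgr can be parsed. The following is only a
sketch: it assumes the usual "BootOrder:" and "BootXXXX* label" line layout,
and first_boot_entry is a hypothetical helper name.

```shell
# Hypothetical helper: read `efibootmgr` output on stdin and print the
# label of the first entry listed in BootOrder.
first_boot_entry() {
    awk '
        /^BootOrder:/ { split($2, order, ","); first = order[1] }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            id = substr($0, 5, 4)            # the four hex digits
            labels[id] = substr($0, 11)      # text after "BootXXXX* "
        }
        END { if (first in labels) print labels[first]; else exit 1 }
    '
}
```

On a UEFI-booted node this would be run as root with
efibootmgr | first_boot_entry; the label that comes back depends on the
firmware.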

Which leads me to the following additional questions:

1) Initially, those VMs are instantiated from a VMware template which
itself comes from a standard xCAT stateful install (remember I'm talking
CentOS 8.3 here).
Usually such a remote stateful install ends up, as expected, with the UEFI
boot order changed from PXE first to disk first.
However, some of my VMs currently have PXE first, and I cannot figure out
why. Do you think this could be xCAT related, as in the stateful install
process failing for some reason to complete or to change the UEFI boot
order?

2) Also, I know that these VMs rebooted on a particular date in August
(because of an issue in the datacenter), but looking at the logs for that
day and those VMs I cannot see any tftp transfer; I only see
DHCPDISCOVER/OFFER.
I always assumed that PXE boot in the xCAT paradigm would imply a tftp
transfer of the xNBA agent. Do you know of a use case where the node would
only issue DHCP without receiving any next-server option? (Note: I know
the tftpd daemon was running at that time.)

3) Finally, on those installs I'm using the confignetwork -s script, which
supersedes the NetworkManager dhcp profile with a higher-priority static
one. Do you think this could generate dhcp logs on the MN (while the
static profile has not auto-connected yet)? Although, judging from the
timestamps, the DISCOVER/OFFER I mentioned in 2) could not come from
this (same second as tftp transaction).
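
For questions 2) and 3), the kind of log check described can be sketched as
below. It assumes dhcpd and tftpd both log to one syslog-style file, that
dhcpd lines contain the client MAC, and that tftpd read requests appear as
"RRQ from <ip> ...". These formats, and the helper name boot_log_summary,
are assumptions to adapt to the MN's actual logging setup.

```shell
# Hypothetical helper: count DHCP and TFTP log lines for one client.
# Arguments: log file, client MAC (as logged by dhcpd), client IP
# (as logged by tftpd).
boot_log_summary() {
    log="$1"; mac="$2"; ip="$3"
    # grep -c still prints 0 when nothing matches, but exits non-zero
    dhcp=$(grep -i -c -E "DHCP(DISCOVER|OFFER|REQUEST|ACK).*$mac" "$log") || true
    # -F so that the dots in the IP address are matched literally
    tftp=$(grep -c -F "RRQ from $ip " "$log") || true
    echo "dhcp=$dhcp tftp=$tftp"
}
```

A result with a non-zero dhcp count and tftp=0 for the reboot day would
match the DHCPDISCOVER/OFFER-only pattern described in 2).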

Thanks for your help and time

--
Thomas HUMMEL


_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


