xNBA 1.0.3 was based on a much older version of iPXE, so there are likely
many changes in the iPXE base between xNBA 1.0.3 and 1.20.1.
xNBA 1.20.1 is almost identical to iPXE 1.20.1 with minimal customization.
The original rationale for updating xNBA to a newer iPXE base was that
several hardware platforms were not functional with xNBA 1.0.3 but work
better after rebasing xNBA to iPXE 1.20.1.

Are there any differences in the BIOS boot mode settings between the
servers that are working and the servers that are failing?
Legacy mode vs Legacy + UEFI mode vs UEFI only mode?
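
One way to compare the nodes that do boot is to check which firmware
mode the running Linux image was started in. The following is only a
minimal sketch: it relies on the fact that Linux exposes
/sys/firmware/efi when booted via UEFI. The function name boot_mode and
its optional directory argument are made up for illustration (the
argument exists only so the check can be tried against a test
directory). It reports how the node actually booted, not what is
configured in BIOS setup, so it cannot be run on a node that fails to
netboot.

```shell
#!/bin/sh
# Report whether the running Linux system was booted via UEFI or legacy BIOS.
# The optional first argument is a hypothetical override of the sysfs
# firmware directory, used only for testing; it defaults to /sys/firmware.
boot_mode() {
    fw_dir="${1:-/sys/firmware}"
    if [ -d "$fw_dir/efi" ]; then
        # The kernel creates this directory only on a UEFI boot.
        echo "UEFI"
    else
        echo "Legacy BIOS"
    fi
}

boot_mode
```

Running this on one node from each working batch (for example via
xdsh) would show whether they differ in boot mode.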

Do you have better results if you downgrade xNBA to the previous version?
wget https://xcat.org/files/xcat/repos/yum/2.16/xcat-dep/xnba-undi-1.0.3-131028.noarch.rpm
rpm -e --nodeps xnba-undi
rpm -ivh xnba-undi-1.0.3-131028.noarch.rpm

Also, there is a possibly related issue here:
https://github.com/xcat2/xcat-core/issues/6518

Any feedback from other xCAT x86 users about which hardware platforms
work better or worse with xNBA 1.0.3 vs 1.20.1 will help determine the
best recipe for the different hardware platforms in use and, hopefully,
allow a golden version of xNBA to be built that works well for everyone.



From:   Thomas HUMMEL <thomas.hum...@pasteur.fr>
To:     <xcat-user@lists.sourceforge.net>
Date:   05/31/2021 12:31 PM
Subject:        [EXTERNAL] [xcat-user] xCAT 2.16.2 new xNBA issue



Hello,

I'm successfully using xCAT to provision stateless CentOS 8.3 nodes.

Last week I upgraded from xCAT 2.16.1 to 2.16.2 without noticing any
issue. However today I experienced the following:

My nodes are mainly SuperMicro AS-2124BT-HNTR (Big Twin - 2U) systems
(H12DST-B motherboard) which PXE boot on an RJ45 NIC adapter that
lspci lists as:

Intel Corporation Ethernet Controller 10G X550T (rev 01)

These nodes, though physically identical, were purchased in 3 batches
at 3 different points in time.

This has always worked fine across several xCAT versions. However, since
the 2.16.2 upgrade and because - I reckon - of the new xNBA ("xNBA has
been updated to an iPXE 1.20.1 base"):

1) batch 1 PXE boots and downloads xNBA, which prints "xNBA initializing
devices" for a very long time (~25s real time) and, when finally
finished, prints:

xNBA 1.0.0+ ...
Features...

No more network devices

-> so the node fails to boot

2) batches 2 and 3, though physically identical and with the same
BIOS and BMC firmware versions (though a different CPLD version), work
(the OS image ultimately gets fetched)

However, the device initialization takes a fair amount of time too
(~10s) and the whole xNBA stage (before the OS image is fetched) takes
very long as well (~28s).
Note: before this, downloading the NBP file takes some time too.

I tried to search for hints in the iPXE repo commit messages, but I
don't know the specifics well enough.

Do you have any clue about what has changed in iPXE/xNBA to cause
such behavior?

Thanks for your help

--
Thomas HUMMEL


_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user



