Re: [xcat-user] [External] Re: xCAT Consortium Update

Jarrod Johnson via xCAT-user Wed, 23 Oct 2024 05:56:39 -0700

> a) So the BMC config is made out of band ? This would imply 2 DHCP
> servers if 1 vlan for data and 1 for BMC ?

One thing to point out is that we may have arbitrarily many or few networks, 
similar to xCAT.  I realize this isn't a given, as I hear some other software 
bakes in a fairly hard assumption about a fixed number of networks with 
specific roles.  Also, DHCP is optional in both cases, for different 
reasons.  For the PXE case, confluent directly receives (and optionally 
answers) the DHCP discover packet.  That packet carries the MAC and UUID, 
which is enough to list the node, correlate it with out-of-band information, 
and search ethernet switches for it.  Technically, that means PXE discovery is 
limited to those data, but those are generally the most relevant.  As to the 
BMCs, where supported...
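
To illustrate what the DHCP discover packet offers, here is a minimal Python sketch, not confluent's actual code, that pulls the MAC and UUID out of a raw DHCP payload.  It assumes the standard BOOTP layout from RFC 2131 and the client machine identifier in option 97 (RFC 4578).  The function names and sample values are made up for illustration, and UUID byte order can differ between firmware implementations.

```python
import struct
import uuid

DHCP_MAGIC = bytes([99, 130, 83, 99])  # RFC 2131 magic cookie


def parse_discover(packet: bytes):
    """Return (mac, uuid_or_None) for a DHCPDISCOVER payload, else None."""
    if len(packet) < 240 or packet[236:240] != DHCP_MAGIC:
        return None
    hlen = packet[2]  # hardware address length, 6 for ethernet
    mac = ":".join(f"{b:02x}" for b in packet[28:28 + hlen])  # chaddr field
    options = {}
    i = 240
    while i < len(packet):
        code = packet[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(packet):
            break
        length = packet[i + 1]
        options[code] = packet[i + 2:i + 2 + length]
        i += 2 + length
    if options.get(53) != b"\x01":  # option 53 value 1 means DISCOVER
        return None
    client_uuid = None
    raw = options.get(97, b"")
    if len(raw) == 17 and raw[0] == 0:  # type 0 followed by a 16-byte UUID
        # Assumption: bytes are taken in wire order with no endian swap.
        client_uuid = str(uuid.UUID(bytes=raw[1:]))
    return mac, client_uuid


def build_sample(mac: bytes, client_uuid: uuid.UUID) -> bytes:
    """Build a synthetic DISCOVER payload (hypothetical helper for the demo)."""
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1, 1, 6, 0, 0x12345678, 0, 0x8000,
        b"", b"", b"", b"", mac, b"", b"",
    )
    opts = bytes([53, 1, 1]) + bytes([97, 17, 0]) + client_uuid.bytes + bytes([255])
    return header + DHCP_MAGIC + opts


sample = build_sample(
    bytes.fromhex("020000000001"),
    uuid.UUID("00112233-4455-6677-8899-aabbccddeeff"),
)
print(parse_discover(sample))
```

Nothing here needs a lease to be handed out: the listener only has to see the broadcast, which is why a dynamic range is not a prerequisite for this kind of discovery.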

> b) how can no dynamic range work without ending up either BMC ip address
> being "random" (i.e 2 discover would not necessary end up the same BMC
> having the same ip address) or exposing to ip address conflict risk (one
> BMC get set up its final ip address while it is used by another one
> still being discovered) ?

In the event of a routed network with BMCs only accessible through a router, 
that assessment is in the ballpark, and confluent can move a dynamic IP to a 
static one and renumber.  As you surmise, in that scenario there are some risks 
and potential for it to not work, so results are best if we are in the same 
vlan.  If we are in the same vlan, we can tolerate some things that are 
traditionally really bad.  For example, I can torture my nodes by giving them 
all the identical IPv4 address, and things still work:
[root@r3u20 ~]# nodeconfig r3u[21:24] bmc.ipv4_address
r3u21: bmc.ipv4_address: 172.30.91.1/16
r3u22: bmc.ipv4_address: 172.30.91.1/16
r3u23: bmc.ipv4_address: 172.30.91.1/16
r3u24: bmc.ipv4_address: 172.30.91.1/16
[root@r3u20 ~]# arping -I bridge0 172.30.91.1 -c 1
ARPING 172.30.91.1 from 172.30.193.20 bridge0
Unicast reply from 172.30.91.1 [90:2E:16:0E:33:F2]  0.709ms
Unicast reply from 172.30.91.1 [90:2E:16:0E:34:7E]  0.737ms
Unicast reply from 172.30.91.1 [90:2E:16:0E:32:08]  0.751ms
Unicast reply from 172.30.91.1 [90:2E:16:0D:C7:85]  0.764ms
Sent 1 probes (1 broadcast(s))
Received 4 response(s)
[root@r3u20 ~]#
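
As an aside, the duplicate-address situation above can be spotted mechanically.  A minimal Python sketch, assuming arping output in the format shown above (the function names are hypothetical), that collects the distinct MACs answering for each address:

```python
import re

# Matches lines such as:
#   Unicast reply from 172.30.91.1 [90:2E:16:0E:33:F2]  0.709ms
_REPLY = re.compile(r"reply from (\S+) \[([0-9A-Fa-f:]{17})\]")


def responders(arping_output: str) -> dict:
    """Map each probed IP to the set of MACs that answered for it."""
    seen = {}
    for ip, mac in _REPLY.findall(arping_output):
        seen.setdefault(ip, set()).add(mac.lower())
    return seen


def conflicts(arping_output: str) -> dict:
    """Keep only the IPs that more than one MAC claims."""
    return {ip: macs for ip, macs in responders(arping_output).items() if len(macs) > 1}


SAMPLE = """\
ARPING 172.30.91.1 from 172.30.193.20 bridge0
Unicast reply from 172.30.91.1 [90:2E:16:0E:33:F2]  0.709ms
Unicast reply from 172.30.91.1 [90:2E:16:0E:34:7E]  0.737ms
Unicast reply from 172.30.91.1 [90:2E:16:0E:32:08]  0.751ms
Unicast reply from 172.30.91.1 [90:2E:16:0D:C7:85]  0.764ms
Sent 1 probes (1 broadcast(s))
Received 4 response(s)
"""

for ip, macs in conflicts(SAMPLE).items():
    print(ip, "is claimed by", len(macs), "MACs:", ", ".join(sorted(macs)))
```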

The answer is that we have an entirely different, cooler network that is 
pretty bulletproof:
# nodediscover list -o node -f node,mac,type,ip -t lenovo-xcc |grep -E '(Node|----|r3u2)'|cat
 Node|               Mac|       Type|                                              IP
-----|------------------|-----------|------------------------------------------------
r3u21| 90:2e:16:0e:32:08| lenovo-xcc|               fe80::922e:16ff:fe0e:3208%bridge0
r3u22| 90:2e:16:0d:c7:85| lenovo-xcc|               fe80::922e:16ff:fe0d:c785%bridge0
r3u23| 90:2e:16:0e:34:7e| lenovo-xcc|               fe80::922e:16ff:fe0e:347e%bridge0
r3u24| 90:2e:16:0e:33:f2| lenovo-xcc|   172.30.91.1,fe80::922e:16ff:fe0e:33f2%bridge0
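
Incidentally, the link-local addresses in this listing match the modified EUI-64 form of each MAC (flip the universal/local bit of the first octet and insert ff:fe in the middle).  A small Python sketch of that derivation follows.  It is only an illustration, the function name is hypothetical, and a BMC is not required to derive its link-local address this way:

```python
import ipaddress


def mac_to_link_local(mac: str) -> str:
    """Derive the modified EUI-64 IPv6 link-local address for a MAC."""
    octets = bytearray(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 6-octet MAC, got {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix followed by the 64-bit interface identifier
    return str(ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + interface_id))


print(mac_to_link_local("90:2e:16:0e:32:08"))  # fe80::922e:16ff:fe0e:3208
```

Because the address is a function of the MAC, it is stable across rediscovery and cannot collide the way the shared 172.30.91.1 does.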

So for things like BMCs, we use the link-local address, which generally 
participates well in multicast protocols like SSDP or mDNS, is always unique, 
and is always there if IPv6 is enabled at all, with or without a 'real' 
address.  In this case I used 'collect LLA' mode, so I can fix this easily 
enough, as a model for how the addressing would be straightened out by 
discovery:
[root@r3u20 ~]# nodeconfig r3u[21:24] bmc.ipv4_address=172.30.{location.rack+128}.{location.u}/16
[root@r3u20 ~]# nodeping -s 172.30.{location.rack+128}.{location.u} r3u[21:24]
172.30.131.21: ping
172.30.131.22: ping
172.30.131.23: ping
172.30.131.24: ping
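
The expansion is easy to picture with a toy version.  The Python sketch below is not confluent's expression engine: it only handles `{name}`, `{name+N}` and `{name-N}`, and the attribute values (rack 3, u 21 for r3u21) are inferred from the ping output above.

```python
import re

# {attribute} optionally followed by +N or -N, e.g. {location.rack+128}
_FIELD = re.compile(r"\{([A-Za-z_][\w.]*)\s*(?:([+-])\s*(\d+))?\}")


def expand(template: str, attrs: dict) -> str:
    """Substitute per-node attributes into a template (illustrative only)."""
    def substitute(match):
        name, sign, offset = match.groups()
        value = attrs[name]
        if sign:
            delta = int(offset) if sign == "+" else -int(offset)
            value = int(value) + delta
        return str(value)
    return _FIELD.sub(substitute, template)


# Assumed attributes for r3u21: rack 3, rack unit 21.
r3u21 = {"location.rack": 3, "location.u": 21}
print(expand("172.30.{location.rack+128}.{location.u}", r3u21))  # 172.30.131.21
```

The point of the pattern is that each BMC's final address follows from where the node sits, so repeated discovery lands on the same address rather than a 'random' one.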

> The above command implies BMC is already configured (in order to pxe
> boot the node), correct ?

In the xCAT norm, we assume a power button press leads to a natural PXE boot, 
and that carries over to confluent PXE discovery. So if you can't assume BMC 
first (unsupported BMC, or a BMC that needs shared but is disabled or needs a 
vlan tag and doesn't have it), then you do PXE-first the xCAT way, and hope 
that a power button press leads to PXE boot. In this example I used BMC because 
it was more convenient, but the data shown was consistent with a PXE-only 
approach.

> So image based (genesis like) discovery is ready to be used or do we
> have to self generate our own genesis-like image ?

A genesis is included, but is optional, even for PXE discovery.  When a server 
console looks like:
>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: 84-16-0C-FB-B7-0C.

Then it's far enough along to perform confluent PXE discovery.  If you have 
done `nodedeploy noderange -n -p genesis-x86_64` prior to discovery, then it 
would take you into the canned genesis profile. Or you can provide your own 
diskless image, or take care of traditional genesis behaviors during %pre of a 
kickstart.

> You mean running ansible from the inside at boot or from the management
> node ? In a pull mode ?

The scripts are pulled.  The ansible plays, if used, are executed by the 
confluent service (as the confluent uid) on the deploying server, with the 
node being deployed as the host (any host specified in the play is superseded 
by whatever host is actually ready to be hit by the play).

> Do image use initialramfs (if yes, dracut type ?) ?

Genesis is still generated by dracut, with no 'root' stage.  For diskless 
images, it depends on the OS: SUSE and Red Hat use dracut, while Ubuntu (and 
theoretically other Debian-related distributions) uses initramfs-tools, since 
that's the 'usual' stack for those platforms.  There are use cases that drive 
folks to switch to dracut on Ubuntu, but confluent currently sticks with 
initramfs-tools.

________________________________
From: Thomas HUMMEL <thomas.hum...@pasteur.fr>
Sent: Wednesday, October 23, 2024 6:04 AM
To: xcat-user@lists.sourceforge.net <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] Re: xCAT Consortium Update

Hello, thanks for your answer.

On 10/22/24 4:18 PM, Jarrod Johnson via xCAT-user wrote:

> But for non-Lenovo, you would do it roughly xCAT style, with 'pxe-client' and 
> maybe a genesis image using configbmc (or another profile).  One difference 
> is you don't need a dynamic range in Confluent, as it does discovery against 
> the DHCPDISCOVER packet rather than needing linux first.

The BMC discovery mechanism seems cool but I don't think I really
understand how it works :

a) So the BMC config is made out of band ? This would imply 2 DHCP
servers if 1 vlan for data and 1 for BMC ?

b) how can no dynamic range work without ending up either BMC ip address
being "random" (i.e 2 discover would not necessary end up the same BMC
having the same ip address) or exposing to ip address conflict risk (one
BMC get set up its final ip address while it is used by another one
still being discovered) ?

> [root@r3u20 ~]# nodediscover list -t pxe-client -f node,uuid,type,switch,port -o node
>   Node|                                 UUID|       Type| Switch|  Port
> ------|-------------------------------------|-----------|-------|------
>  r3u21| 11137727-3f6e-11ed-9dcc-92feca966289| pxe-client|   r3c1| swp34
>  r3u22| cfeaac8f-341f-11ed-a12e-ca86724e9d51| pxe-client|   r3c1|  swp2
>  r3u23| 57ab573c-327b-11ed-92ce-bf13e73b0f63| pxe-client|   r3c1|  swp3
>  r3u24| 40146251-4bb5-11ed-95a4-adc935918367| pxe-client|   r3c1|  swp5

The above command implies BMC is already configured (in order to pxe
boot the node), correct ?

So image based (genesis like) discovery is ready to be used or do we
have to self generate our own genesis-like image ?

> Like most stop at 'power on/off, *maybe* setboot, but nodeconsole, 
> nodeconfig, nodeinventory, etc are frequently out of scope for non-xCAT, 
> non-confluent OS deployment tools.

Well minimal config has to be supported just in order to customize the
image to the particular node (like "statifying" network setting and
hostname at the very minimal)

> Note also that confluent profiles have 'ansible/post.d' as well as
> 'scripts/post.d', opening up the possibility of triggering ansible plays
> on the deployer rather than scripts on the node if desired.

You mean running ansible from the inside at boot or from the management
node ? In a pull mode ?

> In short, if you ignore the new BMC-driven discovery you end up with xCAT as 
> mostly a subset of confluent functions (excepting non-deployment DHCP 
> configuration, and ISC DNS, though that could be addressed if desired, and 
> perhaps extended to more use cases).  So we have BMC-driven, PXE-driven, and 
> manual operation all as options, just have to be very careful and clear which 
> one matches the right audience.

Do image use initialramfs (if yes, dracut type ?) ?

I have to actually play with confluent to go further in my questions.

Thanks for your help

--
Thomas HUMMEL
HPC Group
Institut PASTEUR
Paris, FRANCE

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

