> a) So the BMC config is made out of band? This would imply 2 DHCP servers if 1 vlan for data and 1 for BMC?
One thing to point out is that we may have arbitrarily many or few networks, similar to xCAT. I realized this isn't a given, as I hear some other software bakes in a fairly hard assumption about a fixed number of networks with specific roles. Also, DHCP is optional in both cases, for different reasons. For the PXE case, confluent directly receives (and optionally answers) the DHCPDISCOVER packet, and that packet carries the MAC and UUID, which is enough to list the node, correlate it with out-of-band information, and search Ethernet switches for it. Technically, this means PXE discovery is limited to those data, but they are generally the most relevant. As to the BMCs, where supported...

> b) how can no dynamic range work without ending up either BMC ip address being "random" (i.e. 2 discover would not necessarily end up the same BMC having the same ip address) or exposing to ip address conflict risk (one BMC gets set up its final ip address while it is used by another one still being discovered)?

In the event of a routed network with BMCs only accessible through a router, that assessment is in the ballpark, and confluent can move a dynamic IP to a static one and renumber. As you surmise, in that scenario there are some risks and potential for it not to work, so the best results come when we are in the same VLAN. If we are in the same VLAN, we can tolerate some traditionally really bad stuff.
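To make the PXE-case point concrete: the MAC and UUID really are sitting in the DHCPDISCOVER itself. Below is a minimal Python sketch, not confluent's actual code, that pulls them out of a raw BOOTP/DHCP payload: the MAC from the chaddr field and the UUID from option 97 (the RFC 4578 client machine identifier). The function names and the synthetic test packet are hypothetical. It assumes the UUID bytes are in network order; PXE firmware is inconsistent here, and some implementations use the SMBIOS mixed-endian layout, which would call for uuid.UUID(bytes_le=...) instead.

```python
import struct
import uuid

BOOTP_HEADER_LEN = 236              # fixed BOOTP header before the options
MAGIC_COOKIE = b"\x63\x82\x53\x63"  # RFC 2131 marker at the start of options
OPT_PAD, OPT_END = 0, 255
OPT_MSG_TYPE = 53                   # DHCP message type option
OPT_CLIENT_UUID = 97                # RFC 4578 client machine identifier
DHCPDISCOVER = 1


def parse_discover(packet: bytes):
    """Return (mac, uuid or None) for a DHCPDISCOVER, else None."""
    if len(packet) < BOOTP_HEADER_LEN + 4:
        return None
    if packet[BOOTP_HEADER_LEN:BOOTP_HEADER_LEN + 4] != MAGIC_COOKIE:
        return None
    hlen = min(packet[2], 16)  # hardware address length, 6 for Ethernet
    mac = ":".join(f"{b:02x}" for b in packet[28:28 + hlen])  # chaddr field
    msg_type = None
    client_uuid = None
    i = BOOTP_HEADER_LEN + 4
    while i < len(packet):
        code = packet[i]
        if code == OPT_PAD:
            i += 1
            continue
        if code == OPT_END or i + 1 >= len(packet):
            break
        length = packet[i + 1]
        value = packet[i + 2:i + 2 + length]
        if code == OPT_MSG_TYPE and len(value) == 1:
            msg_type = value[0]
        elif code == OPT_CLIENT_UUID and len(value) == 17 and value[0] == 0:
            # Type byte 0 is followed by the 16-byte UUID (network order assumed)
            client_uuid = uuid.UUID(bytes=bytes(value[1:]))
        i += 2 + length
    if msg_type != DHCPDISCOVER:
        return None
    return mac, client_uuid


def build_discover(mac: bytes, client_uuid: uuid.UUID) -> bytes:
    """Build a minimal synthetic DHCPDISCOVER for testing (hypothetical helper)."""
    # op=1 (request), htype=1 (Ethernet), hlen, hops, xid, secs, flags=broadcast
    header = struct.pack("!BBBBIHH", 1, 1, len(mac), 0, 0x12345678, 0, 0x8000)
    header += bytes(16)               # ciaddr, yiaddr, siaddr, giaddr
    header += mac.ljust(16, b"\x00")  # chaddr, padded to 16 bytes
    header += bytes(64 + 128)         # sname and file
    options = bytes([OPT_MSG_TYPE, 1, DHCPDISCOVER])
    options += bytes([OPT_CLIENT_UUID, 17, 0]) + client_uuid.bytes
    options += bytes([OPT_END])
    return header + MAGIC_COOKIE + options


if __name__ == "__main__":
    pkt = build_discover(bytes.fromhex("020000000001"), uuid.UUID(int=1))
    print(parse_discover(pkt))
```

Those two values are exactly what a discovery service can key on before any OS, DHCP lease, or dynamic range is involved.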
For example, I'm torturing my nodes by giving them all the identical IPv4 address, and things still work:

[root@r3u20 ~]# nodeconfig r3u[21:24] bmc.ipv4_address
r3u21: bmc.ipv4_address: 172.30.91.1/16
r3u22: bmc.ipv4_address: 172.30.91.1/16
r3u23: bmc.ipv4_address: 172.30.91.1/16
r3u24: bmc.ipv4_address: 172.30.91.1/16
[root@r3u20 ~]# arping -I bridge0 172.30.91.1 -c 1
ARPING 172.30.91.1 from 172.30.193.20 bridge0
Unicast reply from 172.30.91.1 [90:2E:16:0E:33:F2] 0.709ms
Unicast reply from 172.30.91.1 [90:2E:16:0E:34:7E] 0.737ms
Unicast reply from 172.30.91.1 [90:2E:16:0E:32:08] 0.751ms
Unicast reply from 172.30.91.1 [90:2E:16:0D:C7:85] 0.764ms
Sent 1 probes (1 broadcast(s))
Received 4 response(s)
[root@r3u20 ~]#

The answer is that we have an entirely different, cooler network that is pretty bulletproof:

# nodediscover list -o node -f node,mac,type,ip -t lenovo-xcc |grep -E '(Node|----|r3u2)'|cat
Node| Mac| Type| IP
-----|------------------|-----------|------------------------------------------------
r3u21| 90:2e:16:0e:32:08| lenovo-xcc| fe80::922e:16ff:fe0e:3208%bridge0
r3u22| 90:2e:16:0d:c7:85| lenovo-xcc| fe80::922e:16ff:fe0d:c785%bridge0
r3u23| 90:2e:16:0e:34:7e| lenovo-xcc| fe80::922e:16ff:fe0e:347e%bridge0
r3u24| 90:2e:16:0e:33:f2| lenovo-xcc| 172.30.91.1,fe80::922e:16ff:fe0e:33f2%bridge0

So for things like BMCs, we use the link-local address, which generally participates well in multicast protocols like SSDP or mDNS, and is always unique and always present if IPv6 is enabled at all, with or without a 'real' address.
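Why the link-local address stays unique even when IPv4 collides: when a BMC forms its IPv6 link-local address from its MAC with a modified EUI-64 interface identifier, that address is as unique as the MAC. The Python sketch below (hypothetical helper names, not part of confluent) parses the arping replies shown above and derives that address for each responding MAC; for 90:2E:16:0E:32:08 it yields the fe80::922e:16ff:fe0e:3208 listed for r3u21. It assumes EUI-64 identifiers; hosts using stable-privacy (RFC 7217) or random interface identifiers do not follow this mapping, in which case the address has to be learned from the wire rather than computed.

```python
import ipaddress
import re

# Matches arping reply lines such as:
#   Unicast reply from 172.30.91.1 [90:2E:16:0E:33:F2] 0.709ms
REPLY_RE = re.compile(r"Unicast reply from (\S+) \[([0-9A-Fa-f:]{17})\]")

ARPING_SAMPLE = """\
Unicast reply from 172.30.91.1 [90:2E:16:0E:33:F2] 0.709ms
Unicast reply from 172.30.91.1 [90:2E:16:0E:34:7E] 0.737ms
Unicast reply from 172.30.91.1 [90:2E:16:0E:32:08] 0.751ms
Unicast reply from 172.30.91.1 [90:2E:16:0D:C7:85] 0.764ms
"""


def replies_by_ip(arping_output: str) -> dict:
    """Group responding MAC addresses by the IPv4 address they answered for."""
    found = {}
    for ip, mac in REPLY_RE.findall(arping_output):
        found.setdefault(ip, set()).add(mac.lower())
    return found


def mac_to_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive an fe80::/64 address using a modified EUI-64 interface identifier."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC, got {mac!r}")
    octets[0] ^= 0x02  # flip the universal/local bit
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid)


if __name__ == "__main__":
    for ip, macs in replies_by_ip(ARPING_SAMPLE).items():
        if len(macs) > 1:
            print(f"{ip} is claimed by {len(macs)} MACs")
        for mac in sorted(macs):
            print(f"  {mac} -> {mac_to_link_local(mac)}")
```

Four MACs collide on one IPv4 address, yet each still maps to its own fe80:: address, which matches the nodediscover table above.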
In this case I used 'collect LLA' mode, so I can fix this easily enough, as a model for how discovery would straighten out the addressing:

[root@r3u20 ~]# nodeconfig r3u[21:24] bmc.ipv4_address=172.30.{location.rack+128}.{location.u}/16
[root@r3u20 ~]# nodeping -s 172.30.{location.rack+128}.{location.u} r3u[21:24]
172.30.131.21: ping
172.30.131.22: ping
172.30.131.23: ping
172.30.131.24: ping

> The above command implies BMC is already configured (in order to pxe boot the node), correct?

In the xCAT norm, we assume a power button press leads to a natural PXE boot, and that carries over to confluent PXE discovery. So if you can't assume BMC first (an unsupported BMC, a BMC that needs shared mode but has it disabled, or one that needs a VLAN tag and doesn't have it), then you do PXE-first the xCAT way and hope that a power button press leads to a PXE boot. In this example I used the BMC because it was more convenient, but the data shown was consistent with a PXE-only approach.

> So image based (genesis like) discovery is ready to be used or do we have to self generate our own genesis-like image?

A genesis image is included, but it is optional, even for PXE discovery. When a server console looks like:

    >>Checking Media Presence......
    >>Media Present......
    >>Start PXE over IPv4 on MAC: 84-16-0C-FB-B7-0C.

then it's far enough along to perform confluent PXE discovery. If you have run `nodedeploy noderange -n -p genesis-x86_64` prior to discovery, then it would take you into the canned genesis profile. Or you can provide your own diskless image, or take care of traditional genesis behaviors during the %pre of a kickstart.

> You mean running ansible from the inside at boot or from the management node? In a pull mode?

The scripts are pulled; the ansible plays, if used, are executed by the confluent service (as the confluent uid) on the deploying server, with the deploying node as the host (any host specified in the play is superseded by whatever host is actually ready to be hit by the play).
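The bmc.ipv4_address value above is an attribute expression evaluated per node. Here is a rough Python sketch of that substitution, handling only the {attribute}, {attribute+N} and {attribute-N} forms used here; confluent's real expression support is broader, and the expand() helper and the location values are assumptions for illustration. With location.rack=3 and location.u=21, which is consistent with the 172.30.131.21 that nodeping reported for r3u21, the template yields 172.30.131.21/16.

```python
import re

# {attribute} or {attribute+N} / {attribute-N}: a deliberately tiny subset
FIELD_RE = re.compile(r"\{([^{}]+)\}")
EXPR_RE = re.compile(r"^\s*([A-Za-z_][\w.]*)\s*(?:([+-])\s*(\d+))?\s*$")


def expand(template: str, attrs: dict) -> str:
    """Replace each {expr} in template using one node's attribute values."""
    def substitute(match):
        parsed = EXPR_RE.match(match.group(1))
        if parsed is None:
            raise ValueError(f"unsupported expression: {match.group(1)!r}")
        name, op, offset = parsed.groups()
        value = int(attrs[name])
        if op == "+":
            value += int(offset)
        elif op == "-":
            value -= int(offset)
        return str(value)
    return FIELD_RE.sub(substitute, template)


TEMPLATE = "172.30.{location.rack+128}.{location.u}/16"

# Hypothetical location data, consistent with the nodeping output for r3u[21:24]
NODES = {f"r3u{u}": {"location.rack": 3, "location.u": u} for u in range(21, 25)}

if __name__ == "__main__":
    for node, attrs in NODES.items():
        print(f"{node}: bmc.ipv4_address: {expand(TEMPLATE, attrs)}")
```

Because the address is computed from location attributes rather than handed out from a pool, a node gets the same address every time it is discovered, which is the property question b) was asking about.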
> Do images use initramfs (if yes, dracut type)?

Genesis is still generated by dracut, with no 'root' stage. For diskless images it depends on the OS: SUSE and Red Hat use dracut, while Ubuntu (and theoretically other Debian-related distributions) uses initramfs-tools, since that's the 'usual' stack for those platforms. Though there are use cases that drive folks to switch to dracut on Ubuntu, confluent currently sticks with initramfs-tools.

________________________________
From: Thomas HUMMEL <thomas.hum...@pasteur.fr>
Sent: Wednesday, October 23, 2024 6:04 AM
To: xcat-user@lists.sourceforge.net <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] Re: xCAT Consortium Update

Hello,

thanks for your answer.

On 10/22/24 4:18 PM, Jarrod Johnson via xCAT-user wrote:
> But for non-Lenovo, you would do it roughly xCAT style, with 'pxe-client' and
> maybe a genesis image using configbmc (or another profile). One difference
> is you don't need a dynamic range in Confluent, as it does discovery against
> the DHCPDISCOVER packet rather than needing linux first.

The BMC discovery mechanism seems cool but I don't think I really understand how it works:

a) So the BMC config is made out of band? This would imply 2 DHCP servers if 1 vlan for data and 1 for BMC?

b) how can no dynamic range work without ending up either BMC ip address being "random" (i.e. 2 discover would not necessarily end up the same BMC having the same ip address) or exposing to ip address conflict risk (one BMC gets set up its final ip address while it is used by another one still being discovered)?
> [root@r3u20 ~]# nodediscover list -t pxe-client -f node,uuid,type,switch,port
> -o node
> Node| UUID| Type| Switch| Port
> ------|-------------------------------------|-----------|-------|------
> r3u21| 11137727-3f6e-11ed-9dcc-92feca966289| pxe-client| r3c1| swp34
> r3u22| cfeaac8f-341f-11ed-a12e-ca86724e9d51| pxe-client| r3c1| swp2
> r3u23| 57ab573c-327b-11ed-92ce-bf13e73b0f63| pxe-client| r3c1| swp3
> r3u24| 40146251-4bb5-11ed-95a4-adc935918367| pxe-client| r3c1| swp5

The above command implies BMC is already configured (in order to pxe boot the node), correct?

So image based (genesis like) discovery is ready to be used or do we have to self generate our own genesis-like image?

> Like most stop at 'power on/off, *maybe* setboot, but nodeconsole,
> nodeconfig, nodeinventory, etc are frequently out of scope for non-xCAT,
> non-confluent OS deployment tools.

Well minimal config has to be supported just in order to customize the image to the particular node (like "statifying" network setting and hostname at the very minimal)

> Note also that confluent profiles have 'ansible/post.d' as well as 'scripts/post.d', opening up the possibility of triggering ansible plays on the deployer rather than scripts on the node if desired.

You mean running ansible from the inside at boot or from the management node? In a pull mode?

> In short, if you ignore the new BMC-driven discovery you end up with xCAT as
> mostly a subset of confluent functions (excepting non-deployment DHCP
> configuration, and ISC DNS, though that could be addressed if desired, and
> perhaps extended to more use cases). So we have BMC-driven, PXE-driven, and
> manual operation all as options, just have to be very careful and clear which
> one matches the right audience.

Do images use initramfs (if yes, dracut type)?

I have to actually play with confluent to go further in my questions.
Thanks for your help

--
Thomas HUMMEL
HPC Group
Institut PASTEUR
Paris, FRANCE

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user