So we have switch-based discovery. In fact, it works with systems 'off' if the BMC is supported. Here's a quick example of 4 systems at firmware defaults, with no known IP addresses or MAC addresses or anything, powered down, and going through switch-based discovery:
[root@r3u20 ~]# nodedefine r3u[21:24] groups=rackmount
r3u21: created
r3u24: created
r3u22: created
r3u23: created
[root@r3u20 ~]# nodediscover rescan
Rescan complete
[root@r3u20 ~]# nodepower r3u[21:24]
r3u21: off
r3u22: off
r3u23: off
r3u24: off
[root@r3u20 ~]# nodeattrib r3u[21:24] net.*switch* --blame
r3u21: net.switch: r3c1 (inherited from group rackmount, derived from expression "r{n1}c1")
r3u21: net.switchport: 21 (inherited from group rackmount, derived from expression "{location.u}")
r3u22: net.switch: r3c1 (inherited from group rackmount, derived from expression "r{n1}c1")
r3u22: net.switchport: 22 (inherited from group rackmount, derived from expression "{location.u}")
r3u23: net.switch: r3c1 (inherited from group rackmount, derived from expression "r{n1}c1")
r3u23: net.switchport: 23 (inherited from group rackmount, derived from expression "{location.u}")
r3u24: net.switch: r3c1 (inherited from group rackmount, derived from expression "r{n1}c1")
r3u24: net.switchport: 24 (inherited from group rackmount, derived from expression "{location.u}")
[root@r3u20 ~]# nodedeploy r3u[21:24] -n alma-9.4-diskless
r3u21: network
r3u22: network
r3u23: network
r3u24: network
r3u21: on
r3u22: on
r3u23: on
r3u24: on

But for non-Lenovo, you would do it roughly xCAT style, with 'pxe-client' and maybe a genesis image using configbmc (or another profile). One difference is you don't need a dynamic range in Confluent, as it does discovery against the DHCPDISCOVER packet rather than needing linux first.
[root@r3u20 ~]# nodediscover list -t pxe-client -f node,uuid,type,switch,port -o node
  Node|                                 UUID|       Type| Switch|  Port
------|-------------------------------------|-----------|-------|------
 r3u21| 11137727-3f6e-11ed-9dcc-92feca966289| pxe-client|   r3c1| swp34
 r3u22| cfeaac8f-341f-11ed-a12e-ca86724e9d51| pxe-client|   r3c1|  swp2
 r3u23| 57ab573c-327b-11ed-92ce-bf13e73b0f63| pxe-client|   r3c1|  swp3
 r3u24| 40146251-4bb5-11ed-95a4-adc935918367| pxe-client|   r3c1|  swp5

However, I fully anticipate that non-Lenovo BMCs could get that treatment as well, it just needs someone to write the plugins. There's a sample in the repository of 'generic redfish' that probably works with light customization for different vendors, but no one has invested in that yet.

For external DNS, currently, we only provide one canned thing, 'confluent2hosts', which is enough for, for example, dnsmasq to read directly. We're debating which other scenarios to also can (nsupdate, writing zone files directly, wrapping ipa, etc). There's also 'noderun' to generically formulaize any command, which we have used, for example, to demonstrate feeding node data into Foreman using hammer.
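
On the formula side, the expressions reported by --blame earlier ("r{n1}c1" and "{location.u}") give a feel for how per-node values get derived. Here is a rough Python sketch of that idea only. It is not confluent's actual code: the expand() helper is hypothetical, and it assumes {n1} means the first number embedded in the node name (which matches r3u21 yielding r3c1 in the output above) and that any other {name} is a plain attribute lookup.

```python
import re


def expand(expression, nodename, attribs):
    """Expand a simplified subset of attribute expressions for one node.

    Illustrative only: handles {nodename}, {nN} (the Nth number embedded
    in the node name) and {some.attribute} looked up in attribs.
    """
    numbers = re.findall(r"\d+", nodename)

    def substitute(match):
        key = match.group(1)
        if key == "nodename":
            return nodename
        if re.fullmatch(r"n\d+", key):
            # {n1} is the first number in the name, {n2} the second, ...
            return numbers[int(key[1:]) - 1]
        return str(attribs[key])

    return re.sub(r"\{([^{}]+)\}", substitute, expression)


# Hypothetical node data mirroring the example above
nodes = {"r3u21": {"location.u": 21}, "r3u24": {"location.u": 24}}
for name, attribs in sorted(nodes.items()):
    switch = expand("r{n1}c1", name, attribs)
    port = expand("{location.u}", name, attribs)
    print(f"{name}: net.switch={switch} net.switchport={port}")
```

The point is just that one expression set on a group can yield a different value for each member node, which is why the four nodes above needed no per-node switch configuration.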
The commands are xCAT-like:

[root@r3u20 ~]# nodeeventlog r3u21-r3u23 |head
r3u21: 10/22/2024 09:50:03 Power Unit - Host Power - Power off
r3u21: 10/22/2024 09:50:08 Cable/Interconnect - Front Video - Connected
r3u21: 10/22/2024 09:50:12 Entity Presence - Front Panel - Present
r3u21: 10/22/2024 09:50:14 Management Subsystem Health - Low Security Jmp - Present
r3u21: 10/22/2024 10:00:03 Power Unit - Host Power - Power on
r3u21: 10/22/2024 10:00:23 System Firmware - Progress - Unspecified
r3u21: 10/22/2024 10:01:54 System Firmware - Progress - Starting OS boot
r3u22: 10/22/2024 09:50:45 Power Unit - Host Power - Power off
r3u22: 10/22/2024 09:50:50 Cable/Interconnect - Front Video - Connected
r3u22: 10/22/2024 09:50:54 Entity Presence - Front Panel - Present
[root@r3u20 ~]# nodehealth r3u[21:23]
r3u21: ok
r3u22: ok
r3u23: ok
[root@r3u20 ~]# nodeconfig r3u[21:24] processors | collate -d
====================================
r3u22,r3u23,r3u24
====================================
Processors.DeterminismSlider: Performance
Processors.CorePerformanceBoost: Enabled
Processors.cTDP: Auto
Processors.PackagePowerLimit: Auto
Processors.4-LinkxGMIMaxSpeed: Minimum
Processors.GlobalC-stateControl: Enabled
Processors.SOCP-states: Auto
Processors.DFC-States: Enabled
Processors.MONITORMWAIT: Enabled
Processors.P-state1: Enabled
Processors.P-state2: Enabled
Processors.CPUSpeculativeStoreModes: Balanced
Processors.ACPISRATL3CacheasNUMADomain: Disabled
Processors.L1StreamHWPrefetcher: Enabled
Processors.L2StreamHWPrefetcher: Enabled
Processors.L1StridePrefetcher: Enabled
Processors.L1RegionPrefetcher: Enabled
Processors.L2UpDownPrefetcher: Enabled
Processors.SMTMode: Enabled
Processors.CPPC: Enabled
Processors.BoostFmax: Auto
Processors.SVMMode: Enabled
Processors.xGMIMaximumLinkWidth: Auto
Processors.APICMode: Auto
Processors.SEV-SNPSupport: Disabled
Processors.HSMPSupport: Auto
Processors.EnhancedREPMOVSBSTOSB: Enabled
Processors.FastShortREPMOVSB: Enabled
Processors.SNPMemoryRMPTableCoverage: Disabled
Processors.xGMIForceLinkWidth: Auto
Processors.NumberofEnabledCPUCoresPerSocket: All
Processors.Processor1FuseStatus: Unfused
Processors.Processor2FuseStatus: Unfused
====================================
r3u21
====================================
@@
  Processors.SOCP-states: Auto
  Processors.DFC-States: Enabled
  Processors.MONITORMWAIT: Enabled
- Processors.P-state1: Enabled
+ Processors.P-State: Enabled
- Processors.P-state2: Enabled
  Processors.CPUSpeculativeStoreModes: Balanced
  Processors.ACPISRATL3CacheasNUMADomain: Disabled
  Processors.L1StreamHWPrefetcher: Enabled
@@
  Processors.FastShortREPMOVSB: Enabled
  Processors.SNPMemoryRMPTableCoverage: Disabled
  Processors.xGMIForceLinkWidth: Auto
+ Processors.3DV-Cache: Auto
+ Processors.ACPICSTC2Latency: 800
+ Processors.ProbeFilterOrganization: Dedicated
+ Processors.PeriodicDirectoryRinsePDRTuning: Auto
  Processors.NumberofEnabledCPUCoresPerSocket: All
  Processors.Processor1FuseStatus: Unfused
- Processors.Processor2FuseStatus: Unfused
+ Processors.Processor2FuseStatus: N/A

> What do you mean by that ?

Like, most stop at power on/off and *maybe* setboot, but nodeconsole, nodeconfig, nodeinventory, etc. are frequently out of scope for non-xCAT, non-confluent OS deployment tools.

> expressive way (ex: via running ansible inside the chroot before packing).
> What's your take about this ?

So in confluent, we provide an 'imgutil build/imgutil exec/imgutil pack' flow to build a diskless image, similar to genimage/packimage but with 'exec' in the middle and a more natural injection of 'normal' mkinitramfs-like activity without needing a 'geninitrd'. Like in xCAT, the thing lives as a 'chrootable' that you can do whatever to, with or without the help of imgutil exec. There are 'onboot' scripts, which are still available, though I personally like to bake in as much as possible, since onboot means slower boot and frequently larger network transfers. Either way, the flexibility is provided.
Note also that confluent profiles have 'ansible/post.d' as well as 'scripts/post.d', opening up the possibility of triggering ansible plays on the deployer rather than scripts on the node if desired.

In short, if you ignore the new BMC-driven discovery you end up with xCAT as mostly a subset of confluent functions (excepting non-deployment DHCP configuration, and ISC DNS, though that could be addressed if desired, and perhaps extended to more use cases). So we have BMC-driven, PXE-driven, and manual operation all as options, just have to be very careful and clear which one matches the right audience.

________________________________
From: Thomas HUMMEL <thomas.hum...@pasteur.fr>
Sent: Tuesday, October 22, 2024 9:19 AM
To: xcat-user@lists.sourceforge.net <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] Re: xCAT Consortium Update

On 10/22/24 12:00 AM, Jarrod Johnson via xCAT-user wrote:
> FYI, to share my perspective, it's biased since my work is confluent.

Hello Jarrod,

> Another complication is that there's several more ways to start. You can PXE
> boot and collect mac addresses, but you can also do BMC driven discovery
> instead, or just add BMCs manually and run 'nodeinventory nodes -s' to get
> there. Which is nice, but requires better documentation so you don't end up
> wasting time with an approach you don't like.

Actually a killer feature of xCAT is switch-based node discovery. One may not be confident enough in sequentially booting nodes in the hope the discovered order would match the power-on's.

If I had to sort xCAT features (heavily biased toward my use case, which is HPC stateless), I would probably list :

1. switch-based discovery + BMC initial setup
2. external dns feeding capabilities
3. formulas and aliases handling
4. commandline monitoring commands (reventlog, rpower, ...)

Where does confluent stand relative to those points ? (non Lenovo x86_64 hardware).
> Mostly I hear about alternatives that are about OS deployment, so not as many
> as concerned with deep BMC operation.

What do you mean by that ?

About stateless deployment, it always questions the delimitation mark between tools the software offers to configure the image (ex: via postscripts) and what can be done agnostically from it, often in a more expressive way (ex: via running ansible inside the chroot before packing). What's your take about this ?

Also, postscripts paradigm may introduce some "critical sections" (for instance you could ssh too soon to a node before the postscript which configures its ssh key runs).

Those of course are general thoughts but I'd be interested to understand more confluent paradigms (compared to xCAT) around those as from what I understand (maybe wrongly) confluent has somehow shifted apart from xCAT relatively simple "pxe this image" paradigm (not to reduce xCAT to only that)

Thanks for your help

--
Thomas HUMMEL
HPC Group
Institut PASTEUR
Paris, FRANCE

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user