So we have switch-based discovery.  In fact, it works with systems powered off 
if the BMC is supported.  Here's a quick example of 4 systems at firmware 
defaults, with no known IP or MAC addresses, powered down, going through 
switch-based discovery:

[root@r3u20 ~]# nodedefine r3u[21:24] groups=rackmount
r3u21: created
r3u24: created
r3u22: created
r3u23: created
[root@r3u20 ~]# nodediscover rescan
Rescan complete
[root@r3u20 ~]# nodepower r3u[21:24]
r3u21: off
r3u22: off
r3u23: off
r3u24: off
[root@r3u20 ~]# nodeattrib r3u[21:24] net.*switch* --blame
r3u21: net.switch: r3c1 (inherited from group rackmount, derived from expression "r{n1}c1")
r3u21: net.switchport: 21 (inherited from group rackmount, derived from expression "{location.u}")
r3u22: net.switch: r3c1 (inherited from group rackmount, derived from expression "r{n1}c1")
r3u22: net.switchport: 22 (inherited from group rackmount, derived from expression "{location.u}")
r3u23: net.switch: r3c1 (inherited from group rackmount, derived from expression "r{n1}c1")
r3u23: net.switchport: 23 (inherited from group rackmount, derived from expression "{location.u}")
r3u24: net.switch: r3c1 (inherited from group rackmount, derived from expression "r{n1}c1")
r3u24: net.switchport: 24 (inherited from group rackmount, derived from expression "{location.u}")
[root@r3u20 ~]# nodedeploy r3u[21:24] -n alma-9.4-diskless
r3u21: network
r3u22: network
r3u23: network
r3u24: network
r3u21: on
r3u22: on
r3u23: on
r3u24: on
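
To make the --blame output above a bit more concrete, here is a minimal Python 
sketch of how expressions like "r{n1}c1" and "{location.u}" could expand per 
node.  This is only an illustration, not confluent's actual implementation: 
the expand() helper is hypothetical, it assumes {n1} means "the first number 
in the node name", and it assumes location.u has already been set for each 
node (the transcript does not show how).

```python
import re


def expand(expression, nodename, attributes):
    """Expand {n1}, {n2}, ... and {attribute.name} placeholders for one node."""
    numbers = re.findall(r"\d+", nodename)

    def substitute(match):
        key = match.group(1)
        numbered = re.fullmatch(r"n(\d+)", key)
        if numbered:
            # Assumption: {n1} is the first number in the node name, {n2} the second.
            return numbers[int(numbered.group(1)) - 1]
        # Anything else is looked up as an attribute already set on the node.
        return str(attributes[key])

    return re.sub(r"\{([^{}]+)\}", substitute, expression)


for node in ("r3u21", "r3u22"):
    # Assumption: location.u was populated beforehand; here it is taken from the name.
    attrs = {"location.u": int(node.split("u")[1])}
    print(node, expand("r{n1}c1", node, attrs), expand("{location.u}", node, attrs))
```

The point is that one pair of expressions on the group is enough to give every 
node in the rack its switch and port, which is what lets nodedefine plus a 
rescan work without per-node data entry.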


But for non-Lenovo, you would do it roughly xCAT style, with 'pxe-client' and 
maybe a genesis image using configbmc (or another profile).  One difference is 
that you don't need a dynamic range in Confluent, as it does discovery against 
the DHCPDISCOVER packet rather than needing to boot Linux first.

[root@r3u20 ~]# nodediscover list -t pxe-client -f node,uuid,type,switch,port -o node
 Node|                                 UUID|       Type| Switch|  Port
------|-------------------------------------|-----------|-------|------
r3u21| 11137727-3f6e-11ed-9dcc-92feca966289| pxe-client|   r3c1| swp34
r3u22| cfeaac8f-341f-11ed-a12e-ca86724e9d51| pxe-client|   r3c1|  swp2
r3u23| 57ab573c-327b-11ed-92ce-bf13e73b0f63| pxe-client|   r3c1|  swp3
r3u24| 40146251-4bb5-11ed-95a4-adc935918367| pxe-client|   r3c1|  swp5

However, I fully anticipate that non-Lenovo BMCs could get that treatment as 
well; it just needs someone to write the plugins.  There's a sample in the 
repository of 'generic redfish' that probably works with light customization 
for different vendors, but no one has invested in that yet.

For external DNS, we currently provide only one canned tool, 
'confluent2hosts', which is enough for dnsmasq, for example, to read directly.  
We are debating which other scenarios to can as well (nsupdate, writing zone 
files directly, wrapping ipa, etc).  There's also 'noderun' to generically 
apply formulas to any command, which we have used, for example, to demonstrate 
feeding node data into Foreman using hammer.
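
As a rough illustration of why a hosts file is enough for the dnsmasq case: 
dnsmasq serves names from hosts-format files (one "address name [alias...]" 
entry per line), so anything that can render node data in that shape can feed 
it.  The sketch below is not confluent2hosts; the hosts_lines() helper, the 
addresses and the domain are all made up for the example.

```python
def hosts_lines(nodes, domain):
    """Render '<ip> <fqdn> <shortname>' lines in /etc/hosts format."""
    lines = []
    for name, ip in sorted(nodes.items()):
        lines.append(f"{ip} {name}.{domain} {name}")
    return lines


# Hypothetical addresses and domain, for illustration only.
nodes = {"r3u22": "192.0.2.22", "r3u21": "192.0.2.21"}
print("\n".join(hosts_lines(nodes, "example.org")))
```

The other scenarios mentioned (nsupdate, zone files, ipa) would need a 
different output format, but the input side would look the same.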

The commands are xCAT-like:
[root@r3u20 ~]# nodeeventlog r3u21-r3u23 |head
r3u21: 10/22/2024 09:50:03 Power Unit - Host Power - Power off
r3u21: 10/22/2024 09:50:08 Cable/Interconnect - Front Video - Connected
r3u21: 10/22/2024 09:50:12 Entity Presence - Front Panel - Present
r3u21: 10/22/2024 09:50:14 Management Subsystem Health - Low Security Jmp - Present
r3u21: 10/22/2024 10:00:03 Power Unit - Host Power - Power on
r3u21: 10/22/2024 10:00:23 System Firmware - Progress - Unspecified
r3u21: 10/22/2024 10:01:54 System Firmware - Progress - Starting OS boot
r3u22: 10/22/2024 09:50:45 Power Unit - Host Power - Power off
r3u22: 10/22/2024 09:50:50 Cable/Interconnect - Front Video - Connected
r3u22: 10/22/2024 09:50:54 Entity Presence - Front Panel - Present
[root@r3u20 ~]# nodehealth r3u[21:23]
r3u21: ok
r3u22: ok
r3u23: ok
[root@r3u20 ~]# nodeconfig r3u[21:24] processors | collate -d
====================================
r3u22,r3u23,r3u24
====================================
Processors.DeterminismSlider: Performance
Processors.CorePerformanceBoost: Enabled
Processors.cTDP: Auto
Processors.PackagePowerLimit: Auto
Processors.4-LinkxGMIMaxSpeed: Minimum
Processors.GlobalC-stateControl: Enabled
Processors.SOCP-states: Auto
Processors.DFC-States: Enabled
Processors.MONITORMWAIT: Enabled
Processors.P-state1: Enabled
Processors.P-state2: Enabled
Processors.CPUSpeculativeStoreModes: Balanced
Processors.ACPISRATL3CacheasNUMADomain: Disabled
Processors.L1StreamHWPrefetcher: Enabled
Processors.L2StreamHWPrefetcher: Enabled
Processors.L1StridePrefetcher: Enabled
Processors.L1RegionPrefetcher: Enabled
Processors.L2UpDownPrefetcher: Enabled
Processors.SMTMode: Enabled
Processors.CPPC: Enabled
Processors.BoostFmax: Auto
Processors.SVMMode: Enabled
Processors.xGMIMaximumLinkWidth: Auto
Processors.APICMode: Auto
Processors.SEV-SNPSupport: Disabled
Processors.HSMPSupport: Auto
Processors.EnhancedREPMOVSBSTOSB: Enabled
Processors.FastShortREPMOVSB: Enabled
Processors.SNPMemoryRMPTableCoverage: Disabled
Processors.xGMIForceLinkWidth: Auto
Processors.NumberofEnabledCPUCoresPerSocket: All
Processors.Processor1FuseStatus: Unfused
Processors.Processor2FuseStatus: Unfused

====================================
r3u21
====================================
@@
 Processors.SOCP-states: Auto
 Processors.DFC-States: Enabled
 Processors.MONITORMWAIT: Enabled
- Processors.P-state1: Enabled
+ Processors.P-State: Enabled
- Processors.P-state2: Enabled
 Processors.CPUSpeculativeStoreModes: Balanced
 Processors.ACPISRATL3CacheasNUMADomain: Disabled
 Processors.L1StreamHWPrefetcher: Enabled
@@
 Processors.FastShortREPMOVSB: Enabled
 Processors.SNPMemoryRMPTableCoverage: Disabled
 Processors.xGMIForceLinkWidth: Auto
+ Processors.3DV-Cache: Auto
+ Processors.ACPICSTC2Latency: 800
+ Processors.ProbeFilterOrganization: Dedicated
+ Processors.PeriodicDirectoryRinsePDRTuning: Auto
 Processors.NumberofEnabledCPUCoresPerSocket: All
 Processors.Processor1FuseStatus: Unfused
- Processors.Processor2FuseStatus: Unfused
+ Processors.Processor2FuseStatus: N/A
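
For anyone unfamiliar with that output shape, here is a minimal Python sketch 
of the idea behind it: nodes with identical output are grouped, the largest 
group is printed in full, and every other group is shown as a diff against it.  
This is a sketch of the technique using the standard difflib module, not the 
actual collate code; collate_with_diff() and the sample data are hypothetical.

```python
import difflib
from collections import defaultdict


def collate_with_diff(outputs):
    """Group nodes by identical output; diff smaller groups against the largest."""
    groups = defaultdict(list)
    for node, text in outputs.items():
        groups[text].append(node)
    # Largest group first; it is the reference that gets reported in full.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), sorted(item[1])))
    reference_text, reference_nodes = ordered[0]
    report = [(sorted(reference_nodes), reference_text.splitlines())]
    for text, nodes in ordered[1:]:
        diff = difflib.unified_diff(
            reference_text.splitlines(), text.splitlines(), lineterm="", n=1
        )
        # Skip the two '---' / '+++' file header lines, keep the hunks.
        report.append((sorted(nodes), list(diff)[2:]))
    return report


# Made-up, shortened sample data in the spirit of the output above.
sample = {
    "r3u21": "SMTMode: Enabled\nProcessor2FuseStatus: N/A",
    "r3u22": "SMTMode: Enabled\nProcessor2FuseStatus: Unfused",
    "r3u23": "SMTMode: Enabled\nProcessor2FuseStatus: Unfused",
}
for nodes, lines in collate_with_diff(sample):
    print(",".join(nodes))
    print("\n".join(lines))
```

With a few nodes the savings are small, but across a large cluster this turns 
thousands of lines into "everything matches except these few nodes".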

> What do you mean by that ?

Most stop at power on/off and *maybe* setboot; nodeconsole, nodeconfig, 
nodeinventory, etc. are frequently out of scope for non-xCAT, non-confluent OS 
deployment tools.

> expressive way (ex: via running ansible inside the chroot before packing). 
> What's your take about this ?

So in confluent, we provide an 'imgutil build/imgutil exec/imgutil pack' flow 
to build a diskless image, similar to genimage/packimage but with 'exec' in the 
middle and a more natural injection of 'normal' mkinitramfs-like activity, 
with no 'geninitrd' needed.  Like in xCAT, the image lives as a 'chrootable' 
tree that you can do whatever you like to, with or without the help of imgutil 
exec.  'onboot' scripts are still available, though I personally like to bake 
in as much as possible, since onboot means slower boot and frequently larger 
network transfers; the flexibility is there either way.  Note also that 
confluent profiles have 'ansible/post.d' as well as 'scripts/post.d', opening 
up the possibility of triggering ansible plays on the deployer rather than 
scripts on the node if desired.

In short, if you ignore the new BMC-driven discovery you end up with xCAT as 
mostly a subset of confluent functions (excepting non-deployment DHCP 
configuration, and ISC DNS, though that could be addressed if desired, and 
perhaps extended to more use cases).  So we have BMC-driven, PXE-driven, and 
manual operation all as options; we just have to be very careful and clear 
about which one suits which audience.

________________________________
From: Thomas HUMMEL <thomas.hum...@pasteur.fr>
Sent: Tuesday, October 22, 2024 9:19 AM
To: xcat-user@lists.sourceforge.net <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] Re: xCAT Consortium Update

On 10/22/24 12:00 AM, Jarrod Johnson via xCAT-user wrote:
> FYI, to share my perspective, it's biased since my work is confluent.

Hello Jarrod,


> Another complication is that there's several more ways to start.  You can PXE 
> boot and collect mac addresses, but you can also do BMC driven discovery 
> instead, or just add BMCs manually and run 'nodeinventory nodes -s' to get 
> there. Which is nice, but requires better documentation so you don't end up 
> wasting time with an approach you don't like.

Actually a killer feature of xCAT is switch-based node discovery. One
may not be confident enough in sequentially booting nodes in the hope
the discovered order would match the power-on's.

If I had to sort xCAT features (heavily biased toward my use case, which
is HPC stateless), I would probably list :

1. switch-based discovery + BMC initial setup
2. external dns feeding capabilities
3. formulas and aliases handling
4. commandline monitoring commands (reventlog, rpower, ...)

Where does confluent stand relative to those points ? (non Lenovo x86_64
hardware).

> Mostly I hear about alternatives that are about OS deployment, so not as many 
> as concerned with deep BMC operation.

What do you mean by that ?

About stateless deployment, it always raises the question of where to draw
the line between the tools the software offers to configure the image (ex: via
postscripts) and what can be done agnostically from it, often in a more
expressive way (ex: via running ansible inside the chroot before packing).

What's your take about this ?

Also, postscripts paradigm may introduce some "critical sections" (for
instance you could ssh too soon to a node before the postscript which
configures its ssh key runs).

Those of course are general thoughts but I'd be interested to understand
more of confluent's paradigms (compared to xCAT) around those, as from what I
understand (maybe wrongly) confluent has somehow shifted away from xCAT's
relatively simple "pxe this image" paradigm (not to reduce xCAT to only
that)


Thanks for your help


--
Thomas HUMMEL
HPC Group
Institut PASTEUR
Paris, FRANCE


_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user