I'd probably strip it down to the most generic test first and run 
tcpdump/wireshark along the lines of:

# tcpdump -i <deployment facing nic> ether host <mac address of a test system>

Do you see UDP traffic during the attempt, or does it seem to vanish?  If it 
seems to vanish, then either the vSwitch uplink is somehow messed up on one or 
both of the hosts, or the switch is failing to connect you...
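
To narrow the capture to just the traffic in question, something like the 
following sketch might help.  The helper name pxe_filter, the NIC variable and 
the example MAC are made up for illustration; ports 67/68 are the standard 
bootps/bootpc ports, and 4011 is included on the assumption that proxyDHCP/PXE 
traffic may show up there in your setup.

```shell
# Hypothetical helper: build a tcpdump filter that matches only DHCP/PXE
# UDP traffic to or from one test system's MAC address.
pxe_filter() {
    mac="$1"
    # 67/68 = bootps/bootpc; 4011 = proxyDHCP/PXE (assumed relevant here)
    printf 'ether host %s and udp and (port 67 or port 68 or port 4011)' "$mac"
}

# Example use, as root on the head node (NIC and MAC are placeholders):
#   NIC=eth0
#   tcpdump -nn -e -i "$NIC" "$(pxe_filter 00:50:56:aa:bb:cc)"
# -nn skips name/port lookups, -e prints the link-level (ethernet) header.
```

If nothing at all shows up with this filter while the VM is attempting PXE, 
that points at the "vanishing" case rather than at confluent.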

If you see the bootps requests but confluent seems to miss them, then I'd have 
to dig deeper.  But if the exact same VM works after a migration, depending 
only on the target host, I'd assume a vSwitch or switch configuration issue...  
Maybe a disagreement on vlan tag/pvid, maybe tagged on the vswitch and pvid on 
the switch or vice versa...  Maybe missing uplink ports on the related 
vswitch...
________________________________
From: Brian Joiner <[email protected]>
Sent: Friday, December 5, 2025 11:25 AM
To: xCAT Users Mailing list <[email protected]>
Cc: Jarrod Johnson <[email protected]>
Subject: Re: [xcat-user] [External] Confluent PXE boot and VMware Virtual 
switches


I think it's PXE itself.  The node tries to PXE boot but the head node never 
registers an attempt and nodediscover rescan/list shows no entries.  After 
vMotioning the node to the same host as the head node and retrying, the attempt 
is registered and I can assign the mac to the node and deploy it.  This has 
been consistent behavior in my testing.


We have the same issue at our colo partner's private cloud, only the web 
interface we're provided doesn't give us access to the underlying hardware 
hosts.  So it works on occasion if we're lucky, but there's no way to see it 
unless I raise a ticket with them.  So if I can figure it out on our internal 
cluster I have more details to give our colo vendor.



On 12/1/25 15:28, Jarrod Johnson via xCAT-user wrote:
Is it PXE stage that gives you trouble or a step after Linux is booted?
________________________________
From: Brian Joiner <[email protected]>
Sent: Monday, December 1, 2025 4:22 PM
To: xCAT Users Mailing list <[email protected]>
Subject: [External] [xcat-user] Confluent PXE boot and VMware Virtual switches


Hi all,

I emailed a while back regarding some inconsistent behavior with PXE boot on 
Confluent deployments in virtual environments, and now that I have some more 
data points I've narrowed it down to this:

IF both the head node and client node VM are on the same physical (esxi in this 
case) host, then PXE answers as expected and deployment works.  If they are on 
different hosts, it times out and the client gets no answer.  I have followed 
several guides and enlisted the AI to make sure my multicast setup is right, 
but at the end of the day I'm not a network engineer or a VMware expert.  Has 
anyone else got experience with configuring a distributed virtual switch in 
vSphere for a management/deployment network that allows PXE to traverse to 
other physical hosts?


Thanks,

Brian Joiner




_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
