This should be a fairly routine scenario. I'll double-check Ubuntu deployment, 
but just to be sure: are the two interfaces actually on distinct Ethernet 
segments, not aliases on the same Ethernet segment?
________________________________
From: Brian Joiner <martinitime1...@gmail.com>
Sent: Tuesday, May 6, 2025 9:25 AM
To: Jarrod Johnson <jjohns...@lenovo.com>; xCAT Users Mailing list 
<xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] [External] Confluent: Anyone get Ubuntu to deploy?


FYI

Something I am noticing, and it seems to be consistent, is that the 
installation still fails IF the VM has more than 1 nic.

Even though it PXE boots from the deployment interface, once the install starts 
it fails to configure the network if a second nic is enabled (in this case the 
enterprise network).  The network does not come up, DHCP will not get an 
address, and they have to be configured by hand.  I also (sometimes) have 
issues where nodediscover adds the right mac but still tries to PXE boot off 
the wrong interface due to the UUID associated with both interfaces being the 
same.

xCAT would deploy the node but often the network was still not correct even if 
the routes table was set up right.  Both nics would set up their own default 
gateway instead of using one (the enterprise).  The 'setroutes' postscript 
would fix the issue but it didn't persist after reboot.  Easy enough to fix 
with a postscript full of nmcli statements.  However, if the node won't deploy 
with 2 nics present and it has to be added manually as a device in vSphere that 
defeats a hands off deploy.
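
A quick way to confirm the duplicate default gateway symptom described above is to count the default entries in the routing table. The following is a minimal Python sketch, assuming the text format printed by `ip route`; the sample output, interface names (ens192, ens224) and addresses are hypothetical placeholders, not taken from the affected nodes.

```python
def default_routes(ip_route_output):
    """Return a (gateway, device) pair for every 'default' line in `ip route` text."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        # 'via' and 'dev' are each followed by their value when present.
        gateway = fields[fields.index("via") + 1] if "via" in fields else None
        device = fields[fields.index("dev") + 1] if "dev" in fields else None
        routes.append((gateway, device))
    return routes


# Hypothetical sample: two nics, each installing its own default route.
SAMPLE = """\
default via 192.0.2.1 dev ens192 proto dhcp metric 100
default via 198.51.100.1 dev ens224 proto dhcp metric 101
192.0.2.0/24 dev ens192 proto kernel scope link src 192.0.2.10 metric 100
198.51.100.0/24 dev ens224 proto kernel scope link src 198.51.100.10 metric 101
"""

if __name__ == "__main__":
    found = default_routes(SAMPLE)
    for gateway, device in found:
        print(f"default via {gateway} on {device}")
    if len(found) > 1:
        print("more than one default route present")
```

On a real node the input would come from running `ip route` rather than from the sample string.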


So far I only see this with the Ubuntu installs, Rocky seems to deploy (then I 
fix the default route and custom dns with a bash postscript).

Since the bulk of a cluster is made up of compute nodes this may not be a huge 
deal, just for login/remote visualization/etc nodes that need connectivity to 
the corporate network, but I thought I would document it here.

So for now, I only add the one nic, deploy the node, add the second nic and 
update the netplan file with the new nic, default gw, and proper dns 
address/search order.
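
As a rough illustration of that netplan update, here is a Python sketch that renders a two-nic netplan file as text. It is not the file from my nodes: the interface names, addresses, and domain are hypothetical, and keeping the deployment nic on DHCP with `use-routes: false` so that only the enterprise nic carries the default route is an assumption about one way to lay it out.

```python
def render_netplan(deploy_nic, enterprise_nic, address, gateway,
                   dns_servers, search_domains):
    """Render a netplan v2 document where only enterprise_nic has a default route."""
    lines = [
        "network:",
        "  version: 2",
        "  ethernets:",
        f"    {deploy_nic}:",
        "      dhcp4: true",
        "      dhcp4-overrides:",
        # Assumption: ignore routes and DNS handed out on the deployment network.
        "        use-routes: false",
        "        use-dns: false",
        f"    {enterprise_nic}:",
        "      addresses:",
        f"        - {address}",
        "      routes:",
        "        - to: default",
        f"          via: {gateway}",
        "      nameservers:",
        f"        addresses: [{', '.join(dns_servers)}]",
        f"        search: [{', '.join(search_domains)}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    print(render_netplan("ens192", "ens224", "192.0.2.10/24", "192.0.2.1",
                         ["192.0.2.53"], ["example.com"]), end="")
```

The rendered text would still need to be written under /etc/netplan and applied with `netplan apply`.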

Hope it helps,

Brian J


On 4/25/25 05:48, Jarrod Johnson wrote:
Ok, changes were made to the Ubuntu deployment bootstrap to be a bit more 
tenacious. There was previously a chance to fall through the automation setup 
into manual setup. Now it should never do that, either hanging or erroring.  
I'll look at a hanging scenario I saw yesterday and make it more clear what is 
missing, but that shouldn't apply to a normal PXE install.

Also as an aside I've been doing "hardware" control to target VCSA.  I can't 
seem to get "setboot" to work yet, but could provide power, inventory and text 
console. The API for boot device doesn't seem to work as I would expect.  I 
hope to push libvirt, VCSA, and proxmox in the near future.

________________________________
From: Brian Joiner <martinitime1...@gmail.com>
Sent: Thursday, April 24, 2025 8:24 PM
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc: Jarrod Johnson <jjohns...@lenovo.com>
Subject: Re: [xcat-user] [External] Confluent: Anyone get Ubuntu to deploy?

Ok so today I had some time to try again:

Ubuntu server 22.04 pxe boots, starts installer, but becomes interactive and 
will not detect the network
Added ubuntu server 24.04, same result

Then I saw the email about the new release.  Installed updates normally via 
yum, retried 24.04.  No other changes.

It worked!  Fully hands off, hostname correct.

I may not have mentioned that my environment is ESXi 7 VMs all the way, so I 
don't know if there was a problem with the nic firmware or what, but the 
Confluent update fixed it.

Thanks for all you devs do!

Brian Joiner


On Wed, Apr 9, 2025 at 11:25 AM Brian Joiner <martinitime1...@gmail.com> wrote:

Awesome, thanks.  Just knowing that a > 18.04 deployment should work as 
expected is a good start. I'll double check paths, permissions, logs and the 
curl test in the other reply and report back.




On 4/1/25 12:57, Jarrod Johnson via xCAT-user wrote:
The 'password not accepted' can happen if you reboot a deployment and retry 
without doing a 'nodedeploy' again. In confluent you have to explicitly say you 
want to deploy, as there's a security mechanism that locks down after a node 
API token is claimed.

[root@r3u20 ~]# nodedeploy r3u24
r3u24: pending: ubuntu-22.04.5-x86_64-default (node authentication armed)
[root@r3u20 ~]#


Note 'node authentication armed'. This means it is configured to allow a single 
more weakly authenticated request for a node token.

At some point during an install attempt, you get to:
[root@r3u20 ~]# nodedeploy r3u24
r3u24: pending: ubuntu-22.04.5-x86_64-default
[root@r3u20 ~]#

In this case, a new attempt must be accompanied by another nodedeploy: a reboot 
without completing deployment will discard the node token that was granted.
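
If it helps to check this before rebooting a node, here is a minimal Python sketch that reads nodedeploy output and reports whether each node is still armed. It assumes only the `node: state: profile` line shape seen in the two outputs above; other states or messages may look different, and the function name is made up for this example.

```python
ARMED_MARKER = "(node authentication armed)"


def parse_nodedeploy(output):
    """Map node name to its state, profile, and whether authentication is armed."""
    status = {}
    for line in output.splitlines():
        parts = [part.strip() for part in line.split(":", 2)]
        if len(parts) != 3:
            continue  # not a 'node: state: detail' line
        node, state, detail = parts
        armed = detail.endswith(ARMED_MARKER)
        profile = detail[: -len(ARMED_MARKER)].strip() if armed else detail
        status[node] = {"state": state, "profile": profile, "armed": armed}
    return status


if __name__ == "__main__":
    before = "r3u24: pending: ubuntu-22.04.5-x86_64-default (node authentication armed)"
    after = "r3u24: pending: ubuntu-22.04.5-x86_64-default"
    for text in (before, after):
        for node, info in parse_nodedeploy(text).items():
            hint = "ready to boot" if info["armed"] else "run nodedeploy again before retrying"
            print(f"{node}: {info['profile']}: {hint}")
```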

However, I assume at least the first attempt did fail for other reasons, and I 
may need to know more about that.  I just did a deployment of 22.04.5 myself 
without issue, so at least in theory it should be workable, but I will probably 
need to see some files when you get to a manual interaction point.

Files like /conf/param.conf, files in /custom-installation/confluent/..
________________________________
From: Brian Joiner <martinitime1...@gmail.com>
Sent: Tuesday, April 1, 2025 12:00 PM
To: xcat-user@lists.sourceforge.net <xcat-user@lists.sourceforge.net>
Subject: [External] [xcat-user] Confluent: Anyone get Ubuntu to deploy?

I have attempted to deploy an Ubuntu 22.04 server via Confluent, and
I've run into all kinds of issues.  Either the process dies after trying
to mount the cd, I get the option to enter an emergency command prompt,
or I get sent into an interactive setup screen where I'm prompted to
enter account/network/disk info by hand.  I've had the same experience
in my home lab and my office Confluent instance.

However, if I deploy Rocky 9.x to the same node (with no attribute
changes) it deploys as expected.  Has anyone gotten Ubuntu server to
deploy without intervention?  Have I missed some kind of setup step for
Ubuntu based installs?

My confluent server has all updated packages, and I didn't see anything
of interest in /var/log/confluent

Attached is one of the fail screen shots if that helps.

Thanks,

Brian Joiner




_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

