Ultimately, it should be doing this:
autoinstall
ds=nocloud-net;s=https://${ipv4s}/confluent-public/os/${osprofile}/autoinstall/
Making changes as appropriate and pulling in the autoinstall in that way.
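As a rough illustration of the arguments above, here is a minimal Python sketch that fills in the two placeholders. The function name autoinstall_args and the sample values are assumptions for illustration only; this is not Confluent code.

```python
def autoinstall_args(ipv4s: str, osprofile: str) -> str:
    """Build kernel arguments that point cloud-init at a NoCloud seed URL."""
    seed = f"https://{ipv4s}/confluent-public/os/{osprofile}/autoinstall/"
    # Keep the trailing slash: cloud-init appends "meta-data" and
    # "user-data" directly to the seed URL.
    return f"autoinstall ds=nocloud-net;s={seed}"


if __name__ == "__main__":
    # Sample values taken from elsewhere in this thread.
    print(autoinstall_args("172.17.15.254", "ubuntu-20.04.6-x86_64-default"))
```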
However, the networking comes from:
{
echo "DEVICE='$DEVICE'"
echo "PROTO='none'"
echo "IPV4PROTO='none'"
echo "IPV4ADDR='$v4addr'"
echo "IPV4NETMASK='$v4nm'"
echo "IPV4BROADCAST='$v4nm'"
echo "IPV4GATEWAY='$v4gw'"
echo "IPV4DNS1='$dns'"
echo "HOSTNAME='$NODENAME'"
echo "DNSDOMAIN='$dnsdomain'"
echo "DOMAINSEARCH='$dnsdomain'"
} > "/run/net-$DEVICE.conf"
Something along those lines.
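The fragment above is approximate and reuses the netmask for IPV4BROADCAST. If a real broadcast address is wanted there, it can be derived from the address and netmask. The following is a sketch only: render_net_conf is a hypothetical helper, and the DNS server and domain in the example are made-up placeholders.

```python
import ipaddress


def render_net_conf(device, v4addr, v4nm, v4gw, dns, nodename, dnsdomain):
    """Return the text of a /run/net-<device>.conf style file."""
    # strict=False accepts a host address instead of the network address.
    net = ipaddress.ip_network(f"{v4addr}/{v4nm}", strict=False)
    fields = [
        ("DEVICE", device),
        ("PROTO", "none"),
        ("IPV4PROTO", "none"),
        ("IPV4ADDR", v4addr),
        ("IPV4NETMASK", v4nm),
        ("IPV4BROADCAST", str(net.broadcast_address)),  # computed, not the netmask
        ("IPV4GATEWAY", v4gw),
        ("IPV4DNS1", dns),
        ("HOSTNAME", nodename),
        ("DNSDOMAIN", dnsdomain),
        ("DOMAINSEARCH", dnsdomain),
    ]
    return "".join(f"{key}='{value}'\n" for key, value in fields)


if __name__ == "__main__":
    # Address, netmask and gateway are from the PXE response in this thread;
    # the DNS server and domain are placeholders.
    print(render_net_conf("eno0", "172.17.15.155", "255.255.248.0",
                          "172.17.15.254", "172.17.15.254",
                          "MYHOST", "example.com"), end="")
```

With the address and netmask from the PXE response (172.17.15.155/255.255.248.0), the network is 172.17.8.0/21 and the broadcast address is 172.17.15.255.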
At the time of failure, are you able to ssh in?
________________________________
From: David Magda <[email protected]>
Sent: Tuesday, November 14, 2023 11:28 AM
To: xCAT Users Mailing list <[email protected]>
Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
So is Confluent supposed to act as a cloud-init datasource?
https://cloudinit.readthedocs.io/en/22.4.2/topics/datasources.html
There exists in /var/lib/confluent/public/os/ubuntu-20.04.6-x86_64/ an
autoinstall/ directory that contains “meta-data” and “user-data” files.
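For reference, a NoCloud seed needs both of those files to be present. A small sketch that checks a seed directory for them follows; missing_seed_files is a hypothetical helper and the path is simply the one quoted above.

```python
from pathlib import Path

# File names a NoCloud seed directory is expected to contain.
REQUIRED = ("meta-data", "user-data")


def missing_seed_files(seed_dir):
    """Return the required seed files that are absent from seed_dir."""
    base = Path(seed_dir)
    return [name for name in REQUIRED if not (base / name).is_file()]


if __name__ == "__main__":
    path = "/var/lib/confluent/public/os/ubuntu-20.04.6-x86_64/autoinstall"
    print(missing_seed_files(path) or "seed looks complete")
```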
There’s a lot of output that flies by quite quickly, so I edited the
“boot.ipxe” file to add “console=tty0 console=ttyS1,115200” so that the Lenovo
webUI console could more fully see and capture the output in
/var/log/confluent/console/. From there I see Confluent giving a PXE response:
net0: 172.17.15.155/255.255.248.0 gw 172.17.15.254
Next server: 172.17.15.254
Filename:
http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe
It then switches to link-local IPv6 (?) to fetch the ISO:
Preparing to deploy ubuntu-20.04.6-x86_64-default from
[fe80::AAbb:Cff:feCd:dEE%2]
Connecting to [fe80::EEcc:Bff:feBa:aXX%2] ([fe80::[…]%eno0]:80)
install.iso 3% |* | 52.0M 0:00:26 ETA
install.iso 11% |*** | 162M 0:00:15 ETA
[…]
Cloud-init then seems to be kicked off (with only an IPv6 LL address?):
[ 57.599545] cloud-init[2691]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running
'init-local' at Tue, 14 Nov 2023 16:10:04 +0000. Up 52.98 seconds.
[ 69.044787] cloud-init[2742]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running
'init' at Tue, 14 Nov 2023 16:10:09 +0000. Up 58.09 seconds.
[ 69.064878] cloud-init[2742]: ci-info:
+++++++++++++++++++++++++++++++++++++Net device
info+++++++++++++++++++++++++++++++++++++
[ 69.084789] cloud-init[2742]: ci-info:
+--------+-------+------------------------------+-----------+-------+-------------------+
[ 69.104844] cloud-init[2742]: ci-info: | Device | Up | Address
| Mask | Scope | Hw-Address |
[ 69.124838] cloud-init[2742]: ci-info:
+--------+-------+------------------------------+-----------+-------+-------------------+
[ 69.144756] cloud-init[2742]: ci-info: | eno0 | True | fe80::ae1f:[…]/64
| . | link | ac:1f:[…] |
[ 69.164837] cloud-init[2742]: ci-info: | ens4f1 | False | .
| . | . | ac:1f:[…] |
[…]
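The fe80::ae1f:[…] address next to the ac:1f:[…] hardware address is consistent with a link-local address derived from the MAC via modified EUI-64 (ac becomes ae once the universal/local bit is flipped). A sketch of that derivation, assuming that is indeed how the address was formed; the MAC in the example is the placeholder value from the nodeattrib output, not a real one.

```python
import ipaddress


def mac_to_link_local(mac: str) -> str:
    """Derive the modified EUI-64 IPv6 link-local address for a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to stretch 48 bits to a 64-bit interface ID.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix followed by the interface identifier.
    packed = b"\xfe\x80" + bytes(6) + iid
    return str(ipaddress.IPv6Address(packed))


if __name__ == "__main__":
    print(mac_to_link_local("ac:1f:AA:BB:CC:DD"))
```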
This seems to fail / error out:
[ 69.456748] cloud-init[2742]: 2023-11-14 16:10:20,895 - util.py[WARNING]:
Getting data from <class
'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloudNet'> failed
[ 69.810439] cloud-init[2742]: 2023-11-14 16:10:21,661 -
activators.py[WARNING]: Running ['netplan', 'apply'] resulted in stderr output:
Failed to connect system bus: No such file or directory
[ 69.836748] cloud-init[2742]: Falling back to a hard restart of
systemd-networkd.service
[ 70.170428] cloud-init[2742]: Generating public/private rsa key pair.
Bunch of SSH key generation stuff, until we get to:
[ 77.218133] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running
'modules:final' at Tue, 14 Nov 2023 16:10:28 +0000. Up 76.89 seconds.
[ 77.240868] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 finished
at Tue, 14 Nov 2023 16:10:29 +0000. Datasource DataSourceNone. Up 77.20 seconds
[ 77.264872] cloud-init[3848]: 2023-11-14 16:10:29,068 -
cc_final_message.py[WARNING]: Used fallback datasource
Ubuntu 20.04.6 LTS ubuntu-server ttyS1
connecting...
waiting for cloud-init…
After which the manual installation of Ubuntu kicks in (the installer noticed
that it is (now) running in a serial console, per “boot.ipxe” changes above,
and asked if I wanted ‘rich’ or ‘basic’ mode).
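Since the output scrolls by quickly, the cloud-init warnings quoted above can be pulled out of a captured console log afterwards. The following is a sketch only: the marker names and scan_console_log are hypothetical, the patterns are copied from the excerpt in this message, and the per-node log file name under /var/log/confluent/console/ is left as a command-line argument because it is not shown here.

```python
import re
import sys

# Patterns copied from the console excerpt in this message.
MARKERS = {
    "datasource_failed": re.compile(r"Getting data from <class '[^']+'> failed"),
    "datasource_none": re.compile(r"Datasource DataSourceNone"),
    "fallback_used": re.compile(r"Used fallback datasource"),
}


def scan_console_log(text):
    """Return the names of the cloud-init failure markers found in text."""
    # Console lines are wrapped, so collapse all whitespace before matching.
    flat = " ".join(text.split())
    return [name for name, pattern in MARKERS.items() if pattern.search(flat)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        for marker in scan_console_log(handle.read()):
            print(marker)
```

All three markers together suggest that the NoCloud-net seed could not be fetched and cloud-init fell back to DataSourceNone, which matches the interactive installer appearing.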
> On Nov 10, 2023, at 17:06, David Magda <[email protected]> wrote:
>
>
> $ nodedeploy MYHOST
> MYHOST: pending: ubuntu-20.04.6-x86_64-default
>
> I have U22.04 available already as well if testing with that is useful.
>
> The server in question isn’t used for anything special currently. My hope is
> that once I get some basic stuff going with the SuperMicro hardware we can
> start upgrading our Lenovo systems.
>
>> On Nov 10, 2023, at 14:25, Jarrod Johnson <[email protected]> wrote:
>>
>> It should cloud-init as a matter of course, just like for the kickstart
>> installs...
>>
>> What does nodedeploy <node> look like when you hit interactive? May need to
>> look into this more directly next week...
>>
>>> From: David Magda <[email protected]>
>>> Sent: Friday, November 10, 2023 2:16 PM
>>> To: xCAT Users Mailing list <[email protected]>
>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>
>>> Ah, silly me: bad copy-paste.
>>>
>>> That command gives:
>>>
>>> File "/opt/confluent/bin/confluent_selfcheck", line 241
>>> for rsp in sess.read(f'/nodes/{args.node}/attributes/all'):
>>>
>>> ^
>>> SyntaxError: invalid syntax
>>>
>>> Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help. I
>>> then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and
>>> the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE
>>> boot parameters and the system launched into the Ubuntu 20.04 installer.
>>> The console is prompting me a bunch of questions.
>>>
>>> So I think I’ve finally managed to muddle through this part of the
>>> documentation:
>>>
>>> https://hpc.lenovo.com/users/documentation/confluentosdeploy.html
>>>
>>> Is there any documentation about automating Ubuntu installs with Confluent?
>>> Does Confluent handle any cloud-init stuff (which was run during the boot
>>> process), or is there some other method to send things like partitioning
>>> and package information to Ubuntu?
>>>
>>>
>>>> On Nov 10, 2023, at 11:01, Jarrod Johnson <[email protected]> wrote:
>>>>
>>>> The attribute name is plural, with s at the end.
>>>> deployment.useinsecureprotocols rather than deployment.useinsecureprotocol.
>>>>
>>>> confluent_selfcheck -n MYHOST
>>>>
>>>> Say anything interesting?
>>>>
>>>>> From: David Magda <[email protected]>
>>>>> Sent: Friday, November 10, 2023 10:50 AM
>>>>> To: xCAT Users Mailing list <[email protected]>
>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>
>>>>> Looking in that file there was:
>>>>>
>>>>> Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in
>>>>> insecure
>>>>> mode, but insecure mode is disabled. Set the attribute
>>>>> `deployment.useinsecureprotocols` to `firmware` or `always` to
>>>>> enable
>>>>> support, or use UEFI HTTP boot with HTTPS." }
>>>>>
>>>>> Trying to tweak that attribute, I got:
>>>>>
>>>>> $ nodeattrib MYHOST deployment.useinsecureprotocol=firmware
>>>>> Error: Bad Request - deployment.useinsecureprotocol attribute on
>>>>> node MYHOST is invalid
>>>>>
>>>>> I tried using nodegroupattrib as well on a group that the host was in,
>>>>> and got:
>>>>>
>>>>> Error: Bad Request - deployment.useinsecureprotocol attribute is
>>>>> invalid
>>>>>
>>>>> I then edited the reply_dhcp4() function in
>>>>> /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py to change
>>>>> the default check to remove the "return;" in the "if insecuremode ==
>>>>> 'never' and not httpboot:" stanza so that it would continue going. The
>>>>> log message still appears (so I know the code is getting there), but the
>>>>> events file now has:
>>>>>
>>>>> Nov 09 09:18:34 {"info": "Offering PXE boot without address, served
>>>>> from 172.17.15.254 to MYHOST"}
>>>>>
>>>>> And the system is still booting xCAT (I have commented out
>>>>> "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted).
>>>>>
>>>>> Not running the dhcpd at all simply has the system timeout on its PXE
>>>>> attempt. I told Confluent about the particular IP address the system
>>>>> should have:
>>>>>
>>>>> $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21
>>>>>
>>>>> And that did not help.
>>>>>
>>>>> Per "lsof -i udp", Confluent is listening on (amongst many other ports)
>>>>> *:bootps, *:dhcpv6-server, *:pxe (etc).
>>>>>
>>>>> Should I edit my dhcpd.conf and rip out things like:
>>>>>
>>>>> […]
>>>>> if option user-class-identifier = "xNBA" and option
>>>>> client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>>>>> always-broadcast on;
>>>>> filename = "…"
>>>>> […]
>>>>>
>>>>> to try to see if that will get things going with Confluent? Or are things
>>>>> expected to work with all of that?
>>>>>
>>>>>
>>>>>
>>>>>> On Nov 8, 2023, at 16:19, Jarrod Johnson <[email protected]> wrote:
>>>>>>
>>>>>> tail /var/log/confluent/events for a hint on why it might be ignoring
>>>>>> the request.
>>>>>>
>>>>>>> From: David Magda <[email protected]>
>>>>>>> Sent: Wednesday, November 8, 2023 2:46 PM
>>>>>>> To: xCAT Users Mailing list <[email protected]>
>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>
>>>>>>>
>>>>>>> I did a “service dhcpd stop” and a “service confluent restart”, and the
>>>>>>> SuperMicro did not receive any reply to the DHCP/PXE packets it was
>>>>>>> sending out. I then did a “service dhcpd start” and the “xcat/genesis”
>>>>>>> file was loaded.
>>>>>>>
>>>>>>> The dhcpd.conf did have “gpxe.no-pxedhcp”, but removing it and
>>>>>>> restarting did not change any behaviour. I noticed that
>>>>>>> “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being
>>>>>>> referenced.
>>>>>>>
>>>>>>> Per “lsof -i udp”, Confluent is listening on *:bootps, so I’m not
>>>>>>> sure why it is not answering. I had run a “nodedeploy MYHOST -n
>>>>>>> ubuntu-20.04.6-x86_64-default” earlier.
>>>>>>>
>>>>>>> $ nodeattrib MYHOST
>>>>>>> MYHOST: console.method: ipmi
>>>>>>> MYHOST: deployment.apiarmed: once
>>>>>>> MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default
>>>>>>> MYHOST: deployment.profile:
>>>>>>> MYHOST: deployment.stagedprofile:
>>>>>>> MYHOST: deployment.state:
>>>>>>> MYHOST: deployment.state_detail:
>>>>>>> MYHOST: groups: prox,ipmi,all,everything
>>>>>>> MYHOST: hardwaremanagement.manager: MYHOST-ipmi
>>>>>>> MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD
>>>>>>> MYHOST: net.ipv4_method: dhcp
>>>>>>> MYHOST: secret.hardwaremanagementpassword: ********
>>>>>>> MYHOST: secret.hardwaremanagementuser: ********
>>>>>>>
>>>>>>>
>>>>>>>> On Nov 7, 2023, at 13:40, Jarrod Johnson wrote:
>>>>>>>>
>>>>>>>> If dhcpd.conf is set to not send any 'filename', it's best. If you
>>>>>>>> don't need a dhcp server, then you can turn it off. There's also
>>>>>>>>
>>>>>>>> If you have a dhcp server with a dynamic range on it, then:
>>>>>>>> nodeattrib <node> net.ipv4_method=firmwaredhcp
>>>>>>>>
>>>>>>>> If you have a dhcp server with static reservations, you could either
>>>>>>>> have dhcp continue, or disallow dhcp for the confluent node.
>>>>>>>>
>>>>>>>> If you have no dhcp server, then it should just do the right thing
>>>>>>>> directly.
>>>>>>>>
>>>>>>>> If you want to keep using dhcp ongoing, then 'net.ipv4_method=dhcp';
>>>>>>>> however, you then own the IPAM responsibility entirely.
>>>>>>>>
>>>>>>>> If your dhcp has:
>>>>>>>> option gpxe.no-pxedhcp 1;
>>>>>>>> Please remove that to let confluent merge an offer with an
>>>>>>>> uncoordinated dhcp server.
>>>>>>>>
>>>>>>>> I need to do a deeper write-up on the detail about dhcp interaction,
>>>>>>>> how it is now optional, and how it can coexist with an unmanaged dhcp
>>>>>>>> server and free the dhcp server from 'filename'
>>>>>>>>
>>>>>>>>> From: David Magda
>>>>>>>>> Sent: Tuesday, November 7, 2023 9:27 AM
>>>>>>>>> To: xCAT Users Mailing list
>>>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>>>
>>>>>>>>> After running the first few commands, I have
>>>>>>>>> /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os,
>>>>>>>>> distribution}/ubuntu* present, along with genesis-x86_64/.
>>>>>>>>>
>>>>>>>>> However the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such
>>>>>>>>> that “filename” is “xcat/xnba.*”, so that’s what gets loaded.
>>>>>>>>>
>>>>>>>>> Do I need to tweak the dhcpd.conf just for the test system I’m
>>>>>>>>> playing with, or should a completely new dhcpd.conf file be put in
>>>>>>>>> place for using Confluent? (Moving the current one out of the way,
>>>>>>>>> perhaps temporarily until I get an understanding of Confluent so I
>>>>>>>>> can revert to xCAT if need-be.)
>>>>>>>>>
>>>>>>>>>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote:
>>>>>>>>>>
>>>>>>>>>> I will say that EL7 hasn't been tested and thus we haven't pushed
>>>>>>>>>> updates since 3.8.0, but 3.8.0 should be plenty.
>>>>>>>>>>
>>>>>>>>>> The confluent you have going is already enough to start examining OS
>>>>>>>>>> deployment profiles. If you would like to, you can use commands
>>>>>>>>>> like osdeploy initialize and osdeploy import and even imgutil build,
>>>>>>>>>> and it won't mess with xCAT.
>>>>>>>>>>
>>>>>>>>>> When you get to nodedeploy, that is the time when you have to start
>>>>>>>>>> planning around potential disruption as xCAT and confluent might
>>>>>>>>>> fight over who gets to deploy a system, and that can be confusing.
>>>>>>>>>> We should document formally how to mask a node from xCAT ('!*NOIP*'
>>>>>>>>>> in mac table) to let one kick the tires with a node...
>>>>>>>>>>
>>>>>>>>>> I can help look at a few people kicking tires, certainly seems
>>>>>>>>>> worthy of documentation or video example...
>>>>>>>>>>> From: David Magda
>>>>>>>>>>> Sent: Thursday, October 26, 2023 11:22 AM
>>>>>>>>>>> To: xCAT Users Mailing list
>>>>>>>>>>> Subject: [External] Re: [xcat-user] xCAT-Confluent
>>>>>>>>>>>
>>>>>>>>>>> Yes, there was perhaps auto-completion with regard to
>>>>>>>>>>> Confluent/Confluence.
>>>>>>>>>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6)
>>>>>>>>>>> installation on RHEL 7 that I inherited; if one wants to fully move
>>>>>>>>>>> from xCAT to Confluent, is there a document on how to ‘extract’
>>>>>>>>>>> oneself from xCAT? I don’t see anything that jumps out at:
>>>>>>>>>>> https://hpc.lenovo.com/users/
>>>>>>>>>>> https://hpc.lenovo.com/users/documentation/
>>>>>>>>>>> Should I simply abandon the previous installation and do a fresh
>>>>>>>>>>> install? While there is some documentation, the system leans
>>>>>>>>>>> towards being heavily vendor-used so people completely new to it
>>>>>>>>>>> have a steep learning curve (xCAT is/was also challenging to get
>>>>>>>>>>> into since it was fairly vendor-focused).
>>>>>>>>> […]
>>>>>
>>>
>>>
>>> _______________________________________________
>>> xCAT-user mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/xcat-user
>
>
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user