So is Confluent supposed to act as a cloud-init datasource? https://cloudinit.readthedocs.io/en/22.4.2/topics/datasources.html
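
For reference, the NoCloud datasource documented under that page can take a network seed: a `ds=nocloud-net;s=<URL>` argument on the kernel command line, from which cloud-init fetches “meta-data” and “user-data”. Below is a minimal Python sketch of that lookup, to show which URLs a given command line would imply. The function name and the deployer.example URL are hypothetical, and whether Confluent's boot.ipxe actually passes such an argument is an assumption, not something confirmed here.

```python
import shlex


def nocloud_seed_urls(cmdline):
    """Return the seed file URLs implied by a ds=nocloud-net argument, or None.

    Hypothetical helper for illustration; it is not part of cloud-init or Confluent.
    """
    for token in shlex.split(cmdline):
        if not token.startswith("ds=nocloud"):
            continue
        # Options after the datasource name are ';'-separated key=value pairs.
        opts = dict(
            part.split("=", 1) for part in token.split(";")[1:] if "=" in part
        )
        seed = opts.get("s")
        if seed:
            # The documented examples show the seed URL ending in '/',
            # so the file names are simply appended to it.
            return {name: seed + name for name in ("meta-data", "user-data")}
    return None


if __name__ == "__main__":
    # Hypothetical command line; the seed URL is made up for the example.
    example = "autoinstall ds=nocloud-net;s=http://deployer.example/seed/ console=tty0"
    print(nocloud_seed_urls(example))
```

Comparing the resulting URLs with what the deployment server actually serves is one way to check whether the seed is reachable from the installer environment.
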

In /var/lib/confluent/public/os/ubuntu-20.04.6-x86_64/ there is an autoinstall/ directory that contains “meta-data” and “user-data” files.

A lot of output flies by quite quickly, so I edited the “boot.ipxe” file to add “console=tty0 console=ttyS1,115200” so that the Lenovo webUI console could more fully see and capture the output in /var/log/confluent/console/. From there I see Confluent giving a PXE response:

    net0: 172.17.15.155/255.255.248.0 gw 172.17.15.254
    Next server: 172.17.15.254
    Filename: http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe
    http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe

It then switches to link-local IPv6 (?) to fetch the ISO:

    Preparing to deploy ubuntu-20.04.6-x86_64-default from [fe80::AAbb:Cff:feCd:dEE%2]
    Connecting to [fe80::EEcc:Bff:feBa:aXX%2] ([fe80::[…]%eno0]:80)
    install.iso            3% |*                               | 52.0M  0:00:26 ETA
    install.iso           11% |***                             |  162M  0:00:15 ETA
    […]

Cloud-init then seems to be kicked off (with only an IPv6 LL address?):

    [ 57.599545] cloud-init[2691]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init-local' at Tue, 14 Nov 2023 16:10:04 +0000. Up 52.98 seconds.
    [ 69.044787] cloud-init[2742]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init' at Tue, 14 Nov 2023 16:10:09 +0000. Up 58.09 seconds.
    [ 69.064878] cloud-init[2742]: ci-info: +++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++
    [ 69.084789] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
    [ 69.104844] cloud-init[2742]: ci-info: | Device |   Up  |           Address            |    Mask   | Scope |     Hw-Address    |
    [ 69.124838] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
    [ 69.144756] cloud-init[2742]: ci-info: |  eno0  |  True |      fe80::ae1f:[…]/64       |     .     |  link |     ac:1f:[…]     |
    [ 69.164837] cloud-init[2742]: ci-info: | ens4f1 | False |              .               |     .     |   .   |     ac:1f:[…]     |
    […]

This seems to fail / error out:

    [ 69.456748] cloud-init[2742]: 2023-11-14 16:10:20,895 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloudNet'> failed
    [ 69.810439] cloud-init[2742]: 2023-11-14 16:10:21,661 - activators.py[WARNING]: Running ['netplan', 'apply'] resulted in stderr output: Failed to connect system bus: No such file or directory
    [ 69.836748] cloud-init[2742]: Falling back to a hard restart of systemd-networkd.service
    [ 70.170428] cloud-init[2742]: Generating public/private rsa key pair.

Bunch of SSH key generation stuff, until we get to:

    [ 77.218133] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'modules:final' at Tue, 14 Nov 2023 16:10:28 +0000. Up 76.89 seconds.
    [ 77.240868] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 finished at Tue, 14 Nov 2023 16:10:29 +0000. Datasource DataSourceNone. Up 77.20 seconds
    [ 77.264872] cloud-init[3848]: 2023-11-14 16:10:29,068 - cc_final_message.py[WARNING]: Used fallback datasource

    Ubuntu 20.04.6 LTS ubuntu-server ttyS1

    connecting...
    waiting for cloud-init…

After that, the manual installation of Ubuntu kicks in (the installer noticed that it is (now) running on a serial console, per the “boot.ipxe” changes above, and asked if I wanted ‘rich’ or ‘basic’ mode).

> On Nov 10, 2023, at 17:06, David Magda <dmagda+x...@ee.torontomu.ca> wrote:
>
>     $ nodedeploy MYHOST
>     MYHOST: pending: ubuntu-20.04.6-x86_64-default
>
> I have U22.04 available already as well if testing with that is useful.
>
> The server in question isn’t used for anything special currently. My hope is
> that once I get some basic stuff going with the SuperMicro hardware we can
> start upgrading our Lenovo systems.
>
>> On Nov 10, 2023, at 14:25, Jarrod Johnson <jjohns...@lenovo.com> wrote:
>>
>> It should cloud-init as a matter of course, just like for the kickstart
>> installs...
>>
>> What does nodedeploy <node> look like when you hit interactive? May need to
>> look into this more directly next week...
>>
>>> From: David Magda <dmagda+x...@ee.torontomu.ca>
>>> Sent: Friday, November 10, 2023 2:16 PM
>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>
>>> Ah, silly me: bad copy-paste.
>>>
>>> That command gives:
>>>
>>>   File "/opt/confluent/bin/confluent_selfcheck", line 241
>>>     for rsp in sess.read(f'/nodes/{args.node}/attributes/all'):
>>>                                                              ^
>>> SyntaxError: invalid syntax
>>>
>>> Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help. I
>>> then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and
>>> the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE
>>> boot parameters and the system launched into the Ubuntu 20.04 installer.
>>> The console is prompting me with a bunch of questions.
>>>
>>> So I think I’ve finally managed to muddle through this part of the
>>> documentation:
>>>
>>>     https://hpc.lenovo.com/users/documentation/confluentosdeploy.html
>>>
>>> Is there any documentation about automating Ubuntu installs with Confluent?
>>> Does Confluent handle any cloud-init stuff (which was run during the boot
>>> process), or is there some other method to send things like partitioning
>>> and packaging information to Ubuntu?
>>>
>>>> On Nov 10, 2023, at 11:01, Jarrod Johnson <jjohns...@lenovo.com> wrote:
>>>>
>>>> The attribute name is plural, with s at the end.
>>>> deployment.useinsecureprotocols rather than deployment.useinsecureprotocol.
>>>>
>>>>     confluent_selfcheck -n MYHOST
>>>>
>>>> Say anything interesting?
>>>>
>>>>> From: David Magda <dmagda+x...@ee.torontomu.ca>
>>>>> Sent: Friday, November 10, 2023 10:50 AM
>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>
>>>>> Looking in that file there was:
>>>>>
>>>>>     Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure mode, but insecure mode is disabled. Set the attribute `deployment.useinsecureprotocols` to `firmware` or `always` to enable support, or use UEFI HTTP boot with HTTPS." }
>>>>>
>>>>> Trying to tweak that attribute, I got:
>>>>>
>>>>>     $ nodeattrib MYHOST deployment.useinsecureprotocol=firmware
>>>>>     Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid
>>>>>
>>>>> I tried using nodegroupattrib as well on a group that the host was in,
>>>>> and got:
>>>>>
>>>>>     Error: Bad Request - deployment.useinsecureprotocol attribute is invalid
>>>>>
>>>>> I then edited the reply_dhcp4() function in
>>>>> /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py, removing
>>>>> the "return" in the "if insecuremode == 'never' and not httpboot:" stanza
>>>>> so that it would continue going. The log message still appears (so I know
>>>>> the code is getting there), but the events file now has:
>>>>>
>>>>>     Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"}
>>>>>
>>>>> And the system is still booting xCAT (I have commented out
>>>>> "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted).
>>>>>
>>>>> Not running the dhcpd at all simply has the system time out on its PXE
>>>>> attempt. I told Confluent about the particular IP address the system
>>>>> should have:
>>>>>
>>>>>     $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21
>>>>>
>>>>> And that did not help.
>>>>>
>>>>> Per "lsof -i udp", Confluent is listening on (amongst many other ports)
>>>>> *:bootps, *:dhcpv6-server, *:pxe (etc).
>>>>>
>>>>> Should I edit my dhcpd.conf and rip out things like:
>>>>>
>>>>>     […]
>>>>>     if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>>>>>         always-broadcast on;
>>>>>         filename = "…"
>>>>>     […]
>>>>>
>>>>> to try to see if that will get things going with Confluent? Or are things
>>>>> expected to work with all of that?
>>>>>
>>>>>> On Nov 8, 2023, at 16:19, Jarrod Johnson <jjohns...@lenovo.com> wrote:
>>>>>>
>>>>>> tail /var/log/confluent/events for a hint on why it might be ignoring
>>>>>> the request.
>>>>>>
>>>>>>> From: David Magda <dma...@ee.torontomu.ca>
>>>>>>> Sent: Wednesday, November 8, 2023 2:46 PM
>>>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>
>>>>>>> I did a “service dhcpd stop” and a “service confluent restart”, and the
>>>>>>> SuperMicro did not receive any reply to the DHCP/PXE packets it was
>>>>>>> sending out. I then did a “service dhcpd start” and the “xcat/genesis”
>>>>>>> file was loaded.
>>>>>>>
>>>>>>> The dhcpd.conf did have “gpxe.no-pxedhcp”, but removing it and
>>>>>>> restarting did not change any behaviour. I noticed that
>>>>>>> “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being
>>>>>>> referenced.
>>>>>>>
>>>>>>> Per “lsof -i udp”, Confluent is listening on *:bootps, so I’m not
>>>>>>> sure why it is not answering. I had run a “nodedeploy MYHOST -n
>>>>>>> ubuntu-20.04.6-x86_64-default” earlier.
>>>>>>>
>>>>>>>     $ nodeattrib MYHOST
>>>>>>>     MYHOST: console.method: ipmi
>>>>>>>     MYHOST: deployment.apiarmed: once
>>>>>>>     MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default
>>>>>>>     MYHOST: deployment.profile:
>>>>>>>     MYHOST: deployment.stagedprofile:
>>>>>>>     MYHOST: deployment.state:
>>>>>>>     MYHOST: deployment.state_detail:
>>>>>>>     MYHOST: groups: prox,ipmi,all,everything
>>>>>>>     MYHOST: hardwaremanagement.manager: MYHOST-ipmi
>>>>>>>     MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD
>>>>>>>     MYHOST: net.ipv4_method: dhcp
>>>>>>>     MYHOST: secret.hardwaremanagementpassword: ********
>>>>>>>     MYHOST: secret.hardwaremanagementuser: ********
>>>>>>>
>>>>>>>> On Nov 7, 2023, at 13:40, Jarrod Johnson wrote:
>>>>>>>>
>>>>>>>> If dhcpd.conf is set to not send any 'filename', it's best. If you
>>>>>>>> don't need a dhcp server, then you can turn it off. There's also
>>>>>>>>
>>>>>>>> If you have a dhcp server with a dynamic range on it, then:
>>>>>>>>     nodeattrib net.ipv4_method=firmwaredhcp
>>>>>>>>
>>>>>>>> If you have a dhcp server with static reservations, you could either
>>>>>>>> have dhcp continue, or disallow dhcp for the confluent node.
>>>>>>>>
>>>>>>>> If you have no dhcp server, then it should just do the right thing
>>>>>>>> directly.
>>>>>>>>
>>>>>>>> If you want to use dhcp ongoing, then 'net.ipv4_method=dhcp', however
>>>>>>>> you own the IPAM sort of responsibility totally.
>>>>>>>>
>>>>>>>> If your dhcp has:
>>>>>>>>     option gpxe.no-pxedhcp 1;
>>>>>>>> Please remove that to let confluent merge an offer with an
>>>>>>>> uncoordinated dhcp server.
>>>>>>>>
>>>>>>>> I need to do a deeper write-up on the detail about dhcp interaction,
>>>>>>>> how it is now optional, and how it can coexist with an unmanaged dhcp
>>>>>>>> server and free the dhcp server from 'filename'
>>>>>>>>
>>>>>>>>> From: David Magda
>>>>>>>>> Sent: Tuesday, November 7, 2023 9:27 AM
>>>>>>>>> To: xCAT Users Mailing list
>>>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>>>
>>>>>>>>> After running the first few commands, I have
>>>>>>>>> /tftpboot/confluent/x86_64/ipxe* and
>>>>>>>>> /var/lib/confluent/public/{os,distribution}/ubuntu* present, along
>>>>>>>>> with genesis-x86_64/.
>>>>>>>>>
>>>>>>>>> However, the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such
>>>>>>>>> that “filename” is “xcat/xnba.*”, so that’s what gets loaded.
>>>>>>>>>
>>>>>>>>> Do I need to tweak the dhcpd.conf just for the test system I’m
>>>>>>>>> playing with, or should a completely new dhcpd.conf file be put in
>>>>>>>>> place for using Confluent? (Moving the current one out of the way,
>>>>>>>>> perhaps temporarily until I get an understanding of Confluent so I
>>>>>>>>> can revert to xCAT if need be.)
>>>>>>>>>
>>>>>>>>>> On Oct 26, 2023, at 11:33, Jarrod Johnson wrote:
>>>>>>>>>>
>>>>>>>>>> I will say that EL7 hasn't been tested and thus we haven't pushed
>>>>>>>>>> updates since 3.8.0, but 3.8.0 should be plenty.
>>>>>>>>>>
>>>>>>>>>> The confluent you have going is already enough to start examining OS
>>>>>>>>>> deployment profiles. If you would like to, you can use commands
>>>>>>>>>> like osdeploy initialize and osdeploy import and even imgutil build,
>>>>>>>>>> and it won't mess with xCAT.
>>>>>>>>>>
>>>>>>>>>> When you get to nodedeploy, that is the time when you have to start
>>>>>>>>>> planning around potential disruption as xCAT and confluent might
>>>>>>>>>> fight over who gets to deploy a system, and that can be confusing.
>>>>>>>>>> We should document formally how to mask a node from xCAT ('!*NOIP*'
>>>>>>>>>> in mac table) to let one kick the tires with a node...
>>>>>>>>>>
>>>>>>>>>> I can help look at a few people kicking tires, certainly seems
>>>>>>>>>> worthy of documentation or video example...
>>>>>>>>>>
>>>>>>>>>>> From: David Magda
>>>>>>>>>>> Sent: Thursday, October 26, 2023 11:22 AM
>>>>>>>>>>> To: xCAT Users Mailing list
>>>>>>>>>>> Subject: [External] Re: [xcat-user] xCAT-Confluent
>>>>>>>>>>>
>>>>>>>>>>> Yes, there was perhaps auto-completion with regards to
>>>>>>>>>>> Confluent/Confluence.
>>>>>>>>>>>
>>>>>>>>>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6)
>>>>>>>>>>> installation on RHEL 7 that I inherited; if one wants to fully move
>>>>>>>>>>> from xCAT to Confluent, is there a document on how to ‘extract’
>>>>>>>>>>> oneself from xCAT? I don’t see anything that jumps out at:
>>>>>>>>>>>
>>>>>>>>>>>     https://hpc.lenovo.com/users/
>>>>>>>>>>>     https://hpc.lenovo.com/users/documentation/
>>>>>>>>>>>
>>>>>>>>>>> Should I simply abandon the previous installation and do a fresh
>>>>>>>>>>> install? While there is some documentation, the system leans
>>>>>>>>>>> towards being heavily vendor-used so people completely new to it
>>>>>>>>>>> have a steep learning curve (xCAT is/was also challenging to get
>>>>>>>>>>> into since it was fairly vendor-focused).
>>>>>>>>> […]

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user