So is Confluent supposed to act as a cloud-init datasource?

        https://cloudinit.readthedocs.io/en/22.4.2/topics/datasources.html

There is an autoinstall/ directory in
/var/lib/confluent/public/os/ubuntu-20.04.6-x86_64/ that contains “meta-data”
and “user-data” files.
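
For context, cloud-init's NoCloud datasource reads “meta-data” and “user-data” from a seed location, and an Ubuntu autoinstall “user-data” is normally a cloud-config document with a top-level “autoinstall:” key. A rough Python sketch for sanity-checking such a seed is below. The helper names are made up for illustration, and nothing here confirms whether (or from which URL) Confluent hands a seed to the installer.

```python
from urllib.parse import urljoin


def seed_urls(seed_base):
    """Return the meta-data and user-data URLs under a NoCloud seed base.

    seed_base is whatever URL the installer is pointed at; which URL
    Confluent uses, if any, is an assumption left to the caller.
    """
    # The file names are appended to the seed, so force a trailing slash.
    base = seed_base if seed_base.endswith("/") else seed_base + "/"
    return {name: urljoin(base, name) for name in ("meta-data", "user-data")}


def looks_like_autoinstall(user_data_text):
    """Cheap textual check: '#cloud-config' header plus an 'autoinstall:' key.

    This is not a YAML parse; it only catches obviously wrong files.
    """
    lines = user_data_text.splitlines()
    has_header = bool(lines) and lines[0].strip() == "#cloud-config"
    has_key = any(line.startswith("autoinstall:") for line in lines)
    return has_header and has_key
```

Fetching the two URLs from the deployment server and passing the “user-data” body to looks_like_autoinstall() would at least show whether the files are reachable and plausibly formed.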

There’s a lot of output that flies by quite quickly, so I edited the 
“boot.ipxe” file to add “console=tty0 console=ttyS1,115200” so that the Lenovo 
webUI console could more fully see and capture the output in 
/var/log/confluent/console/. From there I see Confluent giving a PXE response:

net0: 172.17.15.155/255.255.248.0 gw 172.17.15.254
Next server: 172.17.15.254
Filename: http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe
http://172.17.15.254/confluent-public/os/ubuntu-20.04.6-x86_64-default/boot.ipxe

It then switches to link-local IPv6 (?) to fetch the ISO:

Preparing to deploy ubuntu-20.04.6-x86_64-default from [fe80::AAbb:Cff:feCd:dEE%2]
Connecting to [fe80::EEcc:Bff:feBa:aXX%2] ([fe80::[…]%eno0]:80)
install.iso            3% |*                               | 52.0M  0:00:26 ETA
install.iso           11% |***                             |  162M  0:00:15 ETA
[…]

Cloud-init then seems to be kicked off (with only an IPv6 LL address?):

[   57.599545] cloud-init[2691]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init-local' at Tue, 14 Nov 2023 16:10:04 +0000. Up 52.98 seconds.
[   69.044787] cloud-init[2742]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'init' at Tue, 14 Nov 2023 16:10:09 +0000. Up 58.09 seconds.
[   69.064878] cloud-init[2742]: ci-info: +++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++
[   69.084789] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
[   69.104844] cloud-init[2742]: ci-info: | Device |   Up  |           Address            |    Mask   | Scope |     Hw-Address    |
[   69.124838] cloud-init[2742]: ci-info: +--------+-------+------------------------------+-----------+-------+-------------------+
[   69.144756] cloud-init[2742]: ci-info: |  eno0  |  True | fe80::ae1f:[…]/64 |     .     |  link | ac:1f:[…] |
[   69.164837] cloud-init[2742]: ci-info: | ens4f1 | False |              .               |     .     |   .   | ac:1f:[…] |
[…]

This seems to fail / error out:

[   69.456748] cloud-init[2742]: 2023-11-14 16:10:20,895 - util.py[WARNING]: Getting data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloudNet'> failed
[   69.810439] cloud-init[2742]: 2023-11-14 16:10:21,661 - activators.py[WARNING]: Running ['netplan', 'apply'] resulted in stderr output: Failed to connect system bus: No such file or directory
[   69.836748] cloud-init[2742]: Falling back to a hard restart of systemd-networkd.service
[   70.170428] cloud-init[2742]: Generating public/private rsa key pair.

A bunch of SSH key generation output follows, until we get to:

[   77.218133] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'modules:final' at Tue, 14 Nov 2023 16:10:28 +0000. Up 76.89 seconds.
[   77.240868] cloud-init[3848]: Cloud-init v. 22.4.2-0ubuntu0~20.04.2 finished at Tue, 14 Nov 2023 16:10:29 +0000. Datasource DataSourceNone.  Up 77.20 seconds
[   77.264872] cloud-init[3848]: 2023-11-14 16:10:29,068 - cc_final_message.py[WARNING]: Used fallback datasource
Ubuntu 20.04.6 LTS ubuntu-server ttyS1
connecting...  
waiting for cloud-init…  

After which the manual installation of Ubuntu kicks in (the installer noticed 
that it is (now) running in a serial console, per “boot.ipxe” changes above, 
and asked if I wanted ‘rich’ or ‘basic’ mode).
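
Since the console output scrolls past quickly, a small filter over the captured console log can pull out just the cloud-init warnings and the datasource it finally settled on. The following is only a sketch: the regular expressions are guesses based on the lines quoted above, and the function name is made up for illustration.

```python
import re

# Both patterns are assumptions derived from the console lines quoted above.
FINISHED_RE = re.compile(r"finished at .*?\. Datasource (.+?)\.\s+Up ")
WARNING_RE = re.compile(r"(\w+\.py)\[WARNING\]: (.*)")


def summarize_cloud_init(log_text):
    """Return (datasource, warnings) found in captured console output.

    datasource is None if no "finished at ... Datasource ..." line is seen;
    warnings is a list of (module, message) tuples in log order.
    """
    datasource = None
    warnings = []
    for line in log_text.splitlines():
        finished = FINISHED_RE.search(line)
        if finished:
            datasource = finished.group(1)
        warning = WARNING_RE.search(line)
        if warning:
            warnings.append((warning.group(1), warning.group(2).strip()))
    return datasource, warnings
```

Run against the text of the node's console capture under /var/log/confluent/console/, a result of “DataSourceNone” together with the DataSourceNoCloudNet warning would match what is described above: cloud-init never found a usable seed and fell back.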

> On Nov 10, 2023, at 17:06, David Magda <dmagda+x...@ee.torontomu.ca> wrote:
> 
> 
> $ nodedeploy MYHOST
> MYHOST: pending: ubuntu-20.04.6-x86_64-default
> 
> I have U22.04 available already as well if testing with that is useful. 
> 
> The server in question isn’t used for anything special currently. My hope is 
> that once I get some basic stuff going with the SuperMicro hardware we can 
> start upgrading our Lenovo systems.
> 
>> On Nov 10, 2023, at 14:25, Jarrod Johnson <jjohns...@lenovo.com> wrote:
>> 
>> It should cloud-init as a matter of course, just like for the kickstart 
>> installs...
>> 
>> What does nodedeploy <node> look like when you hit interactive?  May need to 
>> look into this more directly next week...
>> 
>>> From: David Magda <dmagda+x...@ee.torontomu.ca>
>>> Sent: Friday, November 10, 2023 2:16 PM
>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>> 
>>> Ah, silly me: bad copy-paste.
>>> 
>>> That command gives:
>>> 
>>>        File "/opt/confluent/bin/confluent_selfcheck", line 241
>>>          for rsp in sess.read(f'/nodes/{args.node}/attributes/all'):
>>>                                                                  ^
>>>       SyntaxError: invalid syntax
>>> 
>>> Regardless, I (re-)ran the `nodeattrib` correctly, but that did not help. I 
>>> then removed all the “filename=…” stanzas in dhcpd.conf, did a restart, and 
>>> the system got (AFAICT) an IP from DHCPd, but Confluent gave it the PXE 
>>> boot parameters and the system launched into the Ubuntu 20.04 installer. 
>>> The console is prompting me a bunch of questions.
>>> 
>>> So I think I’ve finally managed to muddle through this part of the 
>>> documentation:
>>> 
>>>       https://hpc.lenovo.com/users/documentation/confluentosdeploy.html
>>> 
>>> Is there any documentation about automating Ubuntu installs with Confluent? 
>>> Does Confluent handle any cloud-init stuff (which was run during the boot 
>>> process), or is there some other method to send things like partitioning 
>>> and package information to Ubuntu?
>>> 
>>> 
>>>> On Nov 10, 2023, at 11:01, Jarrod Johnson <jjohns...@lenovo.com> wrote:
>>>> 
>>>> The attribute name is plural, with s at the end.  
>>>> deployment.useinsecureprotocols rather than deployment.useinsecureprotocol.
>>>> 
>>>> confluent_selfcheck -n MYHOST
>>>> 
>>>> Say anything interesting?
>>>> 
>>>>> From: David Magda <dmagda+x...@ee.torontomu.ca>
>>>>> Sent: Friday, November 10, 2023 10:50 AM
>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>> 
>>>>> Looking in that file there was:
>>>>> 
>>>>>       Nov 09 09:02:06 {"info": "Boot attempt by MYHOST detected in insecure
>>>>>       mode, but insecure mode is disabled.  Set the attribute
>>>>>       `deployment.useinsecureprotocols` to `firmware` or `always` to enable
>>>>>       support, or use UEFI HTTP boot with HTTPS." }
>>>>> 
>>>>> Trying to tweak that attribute, I got:
>>>>> 
>>>>>       $  nodeattrib MYHOST deployment.useinsecureprotocol=firmware
>>>>>       Error: Bad Request - deployment.useinsecureprotocol attribute on node MYHOST is invalid
>>>>> 
>>>>> I tried using nodegroupattrib as well on a group that the host was in, 
>>>>> and got:
>>>>> 
>>>>>       Error: Bad Request - deployment.useinsecureprotocol attribute is invalid
>>>>> 
>>>>> I then edited the reply_dhcp4() function in 
>>>>> /opt/confluent/lib/python/confluent/discovery/protocols/pxe.py to remove 
>>>>> the "return;" in the "if insecuremode == 'never' and not httpboot:" 
>>>>> stanza so that it would continue going. The log message still appears 
>>>>> (so I know the code is getting there), but the events file now has:
>>>>> 
>>>>>       Nov 09 09:18:34 {"info": "Offering PXE boot without address, served from 172.17.15.254 to MYHOST"}
>>>>> 
>>>>> And the system is still booting xCAT (I have commented out 
>>>>> "gpxe.no-pxedhcp 1" in dhcpd.conf and restarted).
>>>>> 
>>>>> Not running the dhcpd at all simply has the system timeout on its PXE 
>>>>> attempt. I told Confluent about the particular IP address the system 
>>>>> should have:
>>>>> 
>>>>>       $ nodeattrib MYHOST net.ipv4_address=172.17.15.223/21
>>>>> 
>>>>> And that did not help.
>>>>> 
>>>>> Per "lsof -i udp", Confluent is listening on (amongst many other ports) 
>>>>> *:bootps, *:dhcpv6-server, *:pxe (etc).
>>>>> 
>>>>> Should I edit my dhcpd.conf and rip out things like:
>>>>> 
>>>>>       […]
>>>>>       if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>>>>>               always-broadcast on;
>>>>>               filename = "…"
>>>>>       […]
>>>>> 
>>>>> to try to see if that will get things going with Confluent? Or are things 
>>>>> expected to work with all of that?
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Nov 8, 2023, at 16:19, Jarrod Johnson <jjohns...@lenovo.com> wrote:
>>>>>> 
>>>>>> tail /var/log/confluent/events for a hint on why it might be ignoring 
>>>>>> the request.
>>>>>> 
>>>>>>> From: David Magda <dma...@ee.torontomu.ca>
>>>>>>> Sent: Wednesday, November 8, 2023 2:46 PM
>>>>>>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>> 
>>>>>>> 
>>>>>>> I did a “service dhcpd stop” and a “service confluent restart”, and the 
>>>>>>> SuperMicro did not receive any reply to the DHCP/PXE packets it was 
>>>>>>> sending out. I then did a “service dhcpd start” and the “xcat/genesis” 
>>>>>>> file was loaded.
>>>>>>> 
>>>>>>> The dhcpd.conf did have “gpxe.no-pxedhcp”, but removing it and 
>>>>>>> restarting did not change any behaviour. I noticed that 
>>>>>>> “http://IP:80/tftpboot/xcat/xnba/nets/172.17.8.0_21” is being 
>>>>>>> referenced.
>>>>>>> 
>>>>>>> Per “lsof -i udp”, Confluent is listening on *:bootps, so I’m not 
>>>>>>> sure why it is not answering. I had run a “nodedeploy MYHOST -n 
>>>>>>> ubuntu-20.04.6-x86_64-default” earlier.
>>>>>>> 
>>>>>>> $ nodeattrib MYHOST
>>>>>>> MYHOST: console.method: ipmi
>>>>>>> MYHOST: deployment.apiarmed: once
>>>>>>> MYHOST: deployment.pendingprofile: ubuntu-20.04.6-x86_64-default
>>>>>>> MYHOST: deployment.profile:
>>>>>>> MYHOST: deployment.stagedprofile:
>>>>>>> MYHOST: deployment.state:
>>>>>>> MYHOST: deployment.state_detail:
>>>>>>> MYHOST: groups: prox,ipmi,all,everything
>>>>>>> MYHOST: hardwaremanagement.manager: MYHOST-ipmi
>>>>>>> MYHOST: net.hwaddr: ac:1f:AA:BB:CC:DD
>>>>>>> MYHOST: net.ipv4_method: dhcp
>>>>>>> MYHOST: secret.hardwaremanagementpassword: ********
>>>>>>> MYHOST: secret.hardwaremanagementuser: ********
>>>>>>> 
>>>>>>> 
>>>>>>>> On Nov 7, 2023, at 13:40, Jarrod Johnson  wrote:
>>>>>>>> 
>>>>>>>> If dhcpd.conf is set to not send any 'filename', it's best.  If you 
>>>>>>>> don't need a dhcp server, then you can turn it off.  There's also
>>>>>>>> 
>>>>>>>> If you have a dhcp server with a dynamic range on it, then:
>>>>>>>> nodeattrib <node> net.ipv4_method=firmwaredhcp
>>>>>>>> 
>>>>>>>> If you have a dhcp server with static reservations, you could either 
>>>>>>>> have dhcp continue, or disallow dhcp for the confluent node.
>>>>>>>> 
>>>>>>>> If you have no dhcp server, then it should just do the right thing 
>>>>>>>> directly.
>>>>>>>> 
>>>>>>>> If you want to keep using dhcp on an ongoing basis, then set 
>>>>>>>> 'net.ipv4_method=dhcp'; however, IPAM is then entirely your 
>>>>>>>> responsibility.
>>>>>>>> 
>>>>>>>> If your dhcp has:
>>>>>>>> option gpxe.no-pxedhcp 1;
>>>>>>>> Please remove that to let confluent merge an offer with an 
>>>>>>>> uncoordinated dhcp server.
>>>>>>>> 
>>>>>>>> I need to do a deeper write-up on the details of dhcp interaction, 
>>>>>>>> how it is now optional, and how it can coexist with an unmanaged dhcp 
>>>>>>>> server and free the dhcp server from 'filename'.
>>>>>>>> 
>>>>>>>>> From: David Magda 
>>>>>>>>> Sent: Tuesday, November 7, 2023 9:27 AM
>>>>>>>>> To: xCAT Users Mailing list 
>>>>>>>>> Subject: Re: [xcat-user] [External] Re: xCAT-Confluent
>>>>>>>>> 
>>>>>>>>> After running the first few commands, I have 
>>>>>>>>> /tftpboot/confluent/x86_64/ipxe* and /var/lib/confluent/public/{os, 
>>>>>>>>> distribution}/ubuntu* present, along with genesis-x86_64/.
>>>>>>>>> 
>>>>>>>>> However the contents of the RHEL/CentOS /etc/dhcp/dhcpd.conf are such 
>>>>>>>>> that “filename” is “xcat/xnba.*”, so that’s what gets loaded.
>>>>>>>>> 
>>>>>>>>> Do I need to tweak the dhcpd.conf just for the test system I’m 
>>>>>>>>> playing with, or should a completely new dhcpd.conf file be put in 
>>>>>>>>> place for using Confluent? (Moving the current one out of the way, 
>>>>>>>>> perhaps temporarily until I get an understanding of Confluent so I 
>>>>>>>>> can revert to xCAT if need be.)
>>>>>>>>> 
>>>>>>>>>> On Oct 26, 2023, at 11:33, Jarrod Johnson  wrote:
>>>>>>>>>> 
>>>>>>>>>> I will say that EL7 hasn't been tested and thus we haven't pushed 
>>>>>>>>>> updates since 3.8.0, but 3.8.0 should be plenty.
>>>>>>>>>> 
>>>>>>>>>> The confluent you have going is already enough to start examining OS 
>>>>>>>>>> deployment profiles.  If you would like to, you can use commands 
>>>>>>>>>> like osdeploy initialize and osdeploy import and even imgutil build, 
>>>>>>>>>> and it won't mess with xCAT.
>>>>>>>>>> 
>>>>>>>>>> When you get to nodedeploy, that is the time when you have to start 
>>>>>>>>>> planning around potential disruption as xCAT and confluent might 
>>>>>>>>>> fight over who gets to deploy a system, and that can be confusing.  
>>>>>>>>>> We should document formally how to mask a node from xCAT ('!*NOIP*' 
>>>>>>>>>> in mac table) to let one kick the tires with a node...
>>>>>>>>>> 
>>>>>>>>>> I can help look at a few people kicking tires, certainly seems 
>>>>>>>>>> worthy of documentation or video example...
>>>>>>>>>>> From: David Magda 
>>>>>>>>>>> Sent: Thursday, October 26, 2023 11:22 AM
>>>>>>>>>>> To: xCAT Users Mailing list 
>>>>>>>>>>> Subject: [External] Re: [xcat-user] xCAT-Confluent
>>>>>>>>>>> 
>>>>>>>>>>> Yes, there was perhaps auto-completion with regards to 
>>>>>>>>>>> Confluent/Confluence.
>>>>>>>>>>> I currently have a (legacy?) ‘joint’ xCAT-Confluent (3.6) 
>>>>>>>>>>> installation on RHEL 7 that I inherited; if one wants to fully move 
>>>>>>>>>>> from xCAT to Confluent, is there a document on how to ‘extract’ 
>>>>>>>>>>> oneself from xCAT? I don’t see anything that jumps out at:
>>>>>>>>>>>      https://hpc.lenovo.com/users/
>>>>>>>>>>>      https://hpc.lenovo.com/users/documentation/
>>>>>>>>>>> Should I simply abandon the previous installation and do a fresh 
>>>>>>>>>>> install? While there is some documentation, the system leans 
>>>>>>>>>>> towards being heavily vendor-used so people completely new to it 
>>>>>>>>>>> have a steep learning curve (xCAT is/was also challenging to get 
>>>>>>>>>>> into since it was fairly vendor-focused).
>>>>>>>>> […]
>>>>> 
>>> 
>>> 
>>> _______________________________________________
>>> xCAT-user mailing list
>>> xCAT-user@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/xcat-user
> 
> 


