Yup:

"""
#   sha1sum /var/lib/confluent/public/site/ssh/*pubkey /etc/confluent/ssh/automation.pub
b88168467bf2920011f4a769d7cbd7aab0de0b35  /var/lib/confluent/public/site/ssh/mp01.example.com.automationpubkey
27574dd33ad3781bb588d7fcef2b8a6dd189d3cb  /var/lib/confluent/public/site/ssh/mp01.example.com.rootpubkey
b88168467bf2920011f4a769d7cbd7aab0de0b35  /etc/confluent/ssh/automation.pub
"""
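
In case it is useful for repeating this check from a script, here is a minimal Python sketch of the same comparison. It is only an illustration, not part of Confluent: `sha1_of` and `group_by_digest` are names made up for this example, and the only assumption is that two key files "match" when their contents hash to the same SHA-1, as sha1sum(1) reports above.

```python
import hashlib
from collections import defaultdict


def sha1_of(path):
    """Return the hex SHA-1 digest of a file's contents, like sha1sum(1)."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each digest to the list of paths whose contents hash to it."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha1_of(path)].append(path)
    return dict(groups)
```

Calling `group_by_digest()` on the three paths from the output above should put the "automationpubkey" file and "/etc/confluent/ssh/automation.pub" in the same group if they still match.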


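For context on the failure quoted below: as far as I know, 255 is the status ssh(1) itself returns when the connection or authentication fails, and rsync passes that through, so the actual reason is most likely in the stderr text that the CalledProcessError does not show. A rough sketch for re-running a command by hand and keeping its stderr follows; `run_and_report` is a hypothetical helper written for this message, not something from Confluent.

```python
import subprocess


def run_and_report(argv):
    """Run argv and return (exit_status, stdout, stderr) without raising."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode both streams to text
    )
    return proc.returncode, proc.stdout, proc.stderr
```

For example, passing it the same argument list as in the traceback, `['rsync', '-rvLD', SOME_DIR + '/', 'root@[172.17.15.222]:/']` with `SOME_DIR` being any scratch directory, should return the ssh error message in the third element.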
> On Jan 26, 2024, at 15:59, Jarrod Johnson <jjohns...@lenovo.com> wrote:
> 
>> Hmm, how odd....  
>> 
>> # cat /var/lib/confluent/public/site/ssh/mp01.example.com.automationpubkey /etc/confluent/ssh/automation.pub
>> 
>> Do they match?
>> 
>> From: David Magda <dmagda+x...@ee.torontomu.ca>
>> Sent: Friday, January 26, 2024 3:50 PM
>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> Subject: Re: [xcat-user] [External] Ansible and Confluent
>>  
>> """
>> #  ls -lrth /var/lib/confluent/public/site/ssh/*pubkey
>> -rw-r--r-- 1 confluent root 410 Oct 31 12:05 /var/lib/confluent/public/site/ssh/mp01.example.com.rootpubkey
>> -rw-r--r-- 1 confluent root 129 Oct 31 12:05 /var/lib/confluent/public/site/ssh/mp01.example.com.automationpubkey
>> 
>> #  ssh dm-boot1 'hostname -f; uptime'
>> dm-boot1
>>  15:47:19 up 21 min,  0 users,  load average: 0.00, 0.00, 0.00
>> """
>> 
>> 
>> > On Jan 26, 2024, at 15:43, Jarrod Johnson <jjohns...@lenovo.com> wrote:
>> > 
>> > # ls /var/lib/confluent/public/site/ssh/*pubkey
>> > 
>> > 
>> > From: David Magda <dmagda+x...@ee.torontomu.ca>
>> > Sent: Friday, January 26, 2024 3:40 PM
>> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> > Subject: Re: [xcat-user] [External] Ansible and Confluent
>> >  
>> > There’s no “syncfiles” in the default Ubuntu profile, nor anything in the 
>> > web docs on its format, but I found a template in 
>> > "/opt/confluent/lib/osdeploy/el9/profiles/default/syncfiles”.
>> > 
>> > Created a file with the line:
>> > 
>> >         /etc/hosts -> /etc/hosts_test
>> > 
>> > With the results:
>> > 
>> > """
>> > #  nodeapply -F dm-boot1
>> > dm-boot1: 
>> > dm-boot1: ---------------------------------------------------------------------------
>> > dm-boot1: Running python script 'syncfileclient' from https://[fe80::749f:43ff:fe72:55e4]/confluent-public/os/ubuntu-22.04.3-x86_64-test1/scripts/
>> > dm-boot1: Executing in /tmp/confluentscripts.HUGo3sMtt
>> > dm-boot1: Traceback (most recent call last):
>> > dm-boot1:   File "/tmp/confluentscripts.HUGo3sMtt/syncfileclient", line 286, in <module>
>> > dm-boot1:     synchronize()
>> > dm-boot1:   File "/tmp/confluentscripts.HUGo3sMtt/syncfileclient", line 233, in synchronize
>> > dm-boot1:     status, rsp = ac.grab_url_with_status('/confluent-api/self/remotesyncfiles')
>> > dm-boot1:   File "/opt/confluent/bin/apiclient", line 413, in grab_url_with_status
>> > dm-boot1:     raise Exception(rsp.read())
>> > dm-boot1: Exception: b"500 - Command '['rsync', '-rvLD', '/tmp/tmpSUbmoD.synctodm-boot1/', 'root@[172.17.15.222]:/']' returned non-zero exit status 255"
>> > dm-boot1: 'syncfileclient' exited with code 1
>> > """
>> > 
>> > In "/var/log/confluent/stderr” we have:
>> > 
>> > """
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator): Traceback (most recent call last):
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):   File "/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 111, in wait
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):     listener.cb(fileno)
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):   File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 53, in on_read
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):     current.switch(([original], [], []))
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 221, in main
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):     result = function(*args, **kwargs)
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):   File "/opt/confluent/lib/python/confluent/syncfiles.py", line 197, in sync_list_to_node
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):     ['rsync', '-rvLD', targdir + '/', 'root@[{}]:/'.format(targip)])[0]
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):   File "/opt/confluent/lib/python/confluent/util.py", line 45, in run
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator):     raise subprocess.CalledProcessError(retcode, process.args, output=stdout)
>> > Jan 26 15:28:53   File "/usr/lib64/python2.7/traceback.py", line 13, in _print
>> >     file.write(str+terminator): CalledProcessError: Command '['rsync', '-rvLD', '/tmp/tmpVbi9YY.synctodm-boot1/', 'root@[172.17.15.222]:/']' returned non-zero exit status 255
>> > Jan 26 15:28:53   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 317, in squelch_exception
>> >     sys.stderr.write("Removing descriptor: %r\n" % (fileno,)): Removing descriptor: 65
>> > """
>> > 
>> > And in “trace” we have:
>> > 
>> > """
>> > Jan 26 15:28:53 Traceback (most recent call last):
>> >   File "/opt/confluent/lib/python/confluent/httpapi.py", line 612, in resourcehandler
>> >     for rsp in resourcehandler_backend(env, start_response):
>> >   File "/opt/confluent/lib/python/confluent/httpapi.py", line 636, in resourcehandler_backend
>> >     for res in selfservice.handle_request(env, start_response):
>> >   File "/opt/confluent/lib/python/confluent/selfservice.py", line 502, in handle_request
>> >     status, output = syncfiles.get_syncresult(nodename)
>> >   File "/opt/confluent/lib/python/confluent/syncfiles.py", line 321, in get_syncresult
>> >     result = syncrunners[nodename].wait()
>> >   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 181, in wait
>> >     return self._exit_event.wait()
>> >   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 132, in wait
>> >     current.throw(*self._exc)
>> >   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 221, in main
>> >     result = function(*args, **kwargs)
>> >   File "/opt/confluent/lib/python/confluent/syncfiles.py", line 197, in sync_list_to_node
>> >     ['rsync', '-rvLD', targdir + '/', 'root@[{}]:/'.format(targip)])[0]
>> >   File "/opt/confluent/lib/python/confluent/util.py", line 45, in run
>> >     raise subprocess.CalledProcessError(retcode, process.args, output=stdout)
>> > CalledProcessError: Command '['rsync', '-rvLD', '/tmp/tmpVbi9YY.synctodm-boot1/', 'root@[172.17.15.222]:/']' returned non-zero exit status 255
>> > 
>> > """
>> > 
>> > > On Jan 26, 2024, at 15:01, Jarrod Johnson <jjohns...@lenovo.com> wrote:
>> > > 
>> > > Ok, another track (trying to compensate for not being able to use 
>> > > selfcheck).
>> > > 
>> > > Can you try sticking some file in the profile's syncfiles, then do:
>> > > nodeapply -F <node>
>> > > 
>> > > And see if any errors happen, either in the output or in the 
>> > > /var/log/confluent area.
>> > > 
>> > >> From: David Magda <dmagda+x...@ee.torontomu.ca>
>> > >> Sent: Friday, January 26, 2024 2:01 PM
>> > >> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> > >> Subject: Re: [xcat-user] [External] Ansible and Confluent
>> > >>  
>> > >> We have Confluent installed on a RH/CentOS 7 system that originally 
>> > >> had/has xCat installed for deployment of our Lenovo hardware/HPC 
>> > >> solution. I just installed it there as it was/is our 'install server'. 
>> > >> (We don't want to touch it too much, as it was a previous team of folks 
>> > >> that set things up, and there's been a lot of team churn.)
>> > >> 
>> > >> I've attached the "hangtraces" to this message; hopefully the mailing 
>> > >> list software will pass it along. I noticed “ipmi” in some of the 
>> > >> paths, and for the record this is a VM running under Proxmox, and does 
>> > >> not have any LOM configured:
>> > >> 
>> > >> """
>> > >> # nodeattrib dm-boot1
>> > >> dm-boot1: crypted.selfapikey: ********
>> > >> dm-boot1: deployment.apiarmed:
>> > >> dm-boot1: deployment.pendingprofile: ubuntu-22.04.3-x86_64-test1
>> > >> dm-boot1: deployment.profile:
>> > >> dm-boot1: deployment.sealedapikey:
>> > >> dm-boot1: deployment.stagedprofile:
>> > >> dm-boot1: deployment.state:
>> > >> dm-boot1: deployment.state_detail:
>> > >> dm-boot1: deployment.useinsecureprotocols: always
>> > >> dm-boot1: dns.servers: 172.17.15.252,172.17.15.247,172.17.15.254
>> > >> dm-boot1: groups: everything
>> > >> dm-boot1: net.hwaddr: 4e:78:df:d3:8d:59
>> > >> dm-boot1: net.ipv4_address: 172.17.15.222/21
>> > >> dm-boot1: net.ipv4_gateway: 172.17.8.254
>> > >> """
>> > >> 
>> > >> Running an strace(1) on the 'apiclient' that runs as part of the 
>> > >> "post.sh" process, we have a continuous poll/read/write stream:
>> > >> 
>> > >> """
>> > >> […]
>> > >> write(3, "\27\3\3\0\371Sm2\233\337\222n\221\377vZs\21\22S\10\351\232\321I7Y$R\370]\312"..., 254) = 254
>> > >> read(3, 0x560b6949e8f3, 5)              = -1 EAGAIN (Resource temporarily unavailable)
>> > >> poll([{fd=3, events=POLLIN}], 1, 15000) = 1 ([{fd=3, revents=POLLIN}])
>> > >> read(3, "\27\3\3\0\226", 5)             = 5
>> > >> read(3, "\0055\271\274&\2464\237\242h\341\30\231\274\327g\224\344g\306\313\206\326\355x\307\341\331C\366H\331"..., 150) = 150
>> > >> poll([{fd=3, events=POLLOUT}], 1, 15000) = 1 ([{fd=3, revents=POLLOUT}])
>> > >> write(3, "\27\3\3\0\371Sm2\233\337\222n\222\334e\336f\353u\343p\22\215:\264e\30a\3172\245\361"..., 254) = 254
>> > >> read(3, 0x560b6949e8f3, 5)              = -1 EAGAIN (Resource temporarily unavailable)
>> > >> poll([{fd=3, events=POLLIN}], 1, 15000) = 1 ([{fd=3, revents=POLLIN}])
>> > >> read(3, "\27\3\3\0\226", 5)             = 5
>> > >> read(3, "\0055\271\274&\2464\240\326\202\347(\213\311\260|\333\230\372A\235\341\273U\201\223\2209ah\325J"..., 150) = 150
>> > >> poll([{fd=3, events=POLLOUT}], 1, 15000) = 1 ([{fd=3, revents=POLLOUT}])
>> > >> write(3, "\27\3\3\0\371Sm2\233\337\222n\223\240\341<\3602\323\177Y\311\317/\371\336P/s\301t8"..., 254) = 254
>> > >> read(3, 0x560b6949e8f3, 5)              = -1 EAGAIN (Resource temporarily unavailable)
>> > >> poll([{fd=3, events=POLLIN}], 1, 15000^Cstrace: Process 27477 detached
>> > >> <detached ...>
>> > >> """
>> > >> 
>> > >> Per lsof(1), FD 3 is:
>> > >> 
>> > >> """
>> > >> python3 27477 root    3u  IPv6             158157      0t0    TCP [fe80::[EUI-64_client]]:44800->[fe80::[EUI-64_server]]:https (ESTABLISHED)
>> > >> """
>> > >> 
>> > >> 
>> > >> 
>> > >> On Thu, January 25, 2024 16:34, Jarrod Johnson wrote:
>> > >> > What is the OS of the deployment server?
>> > >> > 
>> > >> > kill -USR1 $(cat /var/run/confluent/pid)
>> > >> > 
>> > >> > This should produce a /var/log/confluent/hangtraces
>> > >> > 
>> > >> > Would be interesting to see if there's ansible related stacks in
>> > >> > hangtraces that seem stuck...
>> > >> > 
>> > >> > 
>> > >> > ________________________________
>> > >> > From: David Magda <dmagda+x...@ee.torontomu.ca>
>> > >> > Sent: Thursday, January 25, 2024 4:25 PM
>> > >> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> > >> > Subject: Re: [xcat-user] [External] Ansible and Confluent
>> > >> > 
>> > >> > First suggested command:
>> > >> > 
>> > >> > """
>> > >> > #   confluent_selfcheck
>> > >> > OS Deployment: Initialized
>> > >> > Confluent UUID: Consistent
>> > >> > Web Server: Running
>> > >> > Web Certificate: Traceback (most recent call last):
>> > >> > File "/opt/confluent/bin/confluent_selfcheck", line 178, in <module>
>> > >> >   cert = certificates_missing_ips(conn)
>> > >> > File "/opt/confluent/bin/confluent_selfcheck", line 57, in
>> > >> > certificates_missing_ips
>> > >> >   ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
>> > >> > AttributeError: 'module' object has no attribute 'PROTOCOL_TLS_CLIENT'
>> > >> > """
>> > >> > 
>> > >> > On the being-installed system, ignoring the typical Linux stuff, the
>> > >> > output of 'ps -elfH' has:
>> > >> > 
>> > >> > """
>> > >> > 
>> > >> > 4 S root        1247       1  0  80   0 -  7499 do_pol 17:53 ?        00:00:00   /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
>> > >> > 4 S root        1248       1  0  80   0 - 58623 do_pol 17:53 ?        00:00:00   /usr/libexec/polkitd --no-debug
>> > >> > 4 S syslog      1250       1  0  80   0 - 55600 do_sel 17:53 ?        00:00:00   /usr/sbin/rsyslogd -n -iNONE
>> > >> > 4 S root        1252       1  0  80   0 - 385081 futex_ 17:53 ?       00:00:03   /usr/lib/snapd/snapd
>> > >> > 4 S root        1253       1  0  80   0 -  3831 ep_pol 17:53 ?        00:00:00   /lib/systemd/systemd-logind
>> > >> > 4 S root        1255       1  0  80   0 - 98198 do_pol 17:53 ?        00:00:02   /usr/libexec/udisks2/udisksd
>> > >> > 4 S root        1283       1  0  80   0 - 26778 do_pol 17:53 ?        00:00:00   /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
>> > >> > 4 S root        1291       1  0  80   0 - 61055 do_pol 17:53 ?        00:00:00   /usr/sbin/ModemManager
>> > >> > 4 S root        2042       1  0  80   0 -   722 do_wai 17:53 ?        00:00:00   /bin/sh /snap/subiquity/5004/usr/bin/subiquity-server
>> > >> > 4 S root        2086    2042  0  80   0 - 149574 ep_pol 17:53 ?       00:00:07     /snap/subiquity/5004/usr/bin/python3.10 -m subiquity.cmd.server
>> > >> > 4 S root       27499    2086  0  80   0 -   722 do_wai 18:09 ?        00:00:00       sh -c /custom-installation/post.sh
>> > >> > 4 S root       27501   27499  0  80   0 -  1150 do_wai 18:09 ?        00:00:00         /bin/bash /custom-installation/post.sh
>> > >> > 4 S root       27588   27501  4  80   0 -  7403 do_pol 18:09 ?        00:03:16           /usr/bin/python3 /opt/confluent/bin/apiclient /confluent-api/self/remoteconfig/status -w 204
>> > >> > 4 S root        2049       1  0  80   0 - 24167 ep_pol 17:53 tty1     00:00:05   /snap/subiquity/5004/usr/bin/python3.10 /snap/subiquity/5004/usr/bin/subiquity
>> > >> > 4 S root        2137       1  0  80   0 -  3855 do_pol 17:53 ?        00:00:00   sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
>> > >> > 4 S root       37842    2137  0  80   0 -  4310 -      19:15 ?        00:00:00     sshd: root@pts/0
>> > >> > 4 S root       37952   37842  0  80   0 -  1543 do_wai 19:15 ?        00:00:00       -bash
>> > >> > 4 R root       38032   37952  0  80   0 -  1911 -      19:16 ?        00:00:00         ps -elfH
>> > >> > 4 S root        2206       1  0  80   0 -  3266 ep_pol 17:53 ?        00:00:00   /lib/netplan/netplan-dbus
>> > >> > 4 S root        2570       1  0  80   0 - 73244 do_pol 17:53 ?        00:00:00   /usr/libexec/packagekitd
>> > >> > 4 S root       37848       1  1  80   0 -  4301 ep_pol 19:15 ?        00:00:00   /lib/systemd/systemd --user
>> > >> > 5 S root       37850   37848  0  80   0 - 26271 do_sig 19:15 ?        00:00:00     (sd-pam)
>> > >> > """
>> > >> > 
>> > >> > While 'ps axf' produces (trimmed):
>> > >> > 
>> > >> > """
>> > >> >  2042 ?        Ss     0:00 /bin/sh /snap/subiquity/5004/usr/bin/subiquity-server
>> > >> >  2086 ?        Sl     0:07  \_ /snap/subiquity/5004/usr/bin/python3.10 -m subiquity.cmd.server
>> > >> > 27499 ?        S      0:00      \_ sh -c /custom-installation/post.sh
>> > >> > 27501 ?        S      0:00          \_ /bin/bash /custom-installation/post.sh
>> > >> > 27588 ?        S      3:21              \_ /usr/bin/python3 /opt/confluent/bin/apiclient /confluent-api/self/remoteconfig/status -w 204
>> > >> >  2049 tty1     Ss+    0:05 /snap/subiquity/5004/usr/bin/python3.10 /snap/subiquity/5004/usr/bin/subiquity
>> > >> > """
>> > >> > 
>> > >> > Doing a "kill -9 27588" (on apiclient) causes the installation to
>> > >> > 'finish'. After the reboot, and after "firstboot.sh" does its thing, we
>> > >> > have the following from 'ps axf':
>> > >> > 
>> > >> > """
>> > >> >  1372 ?        Ss     0:00 /usr/bin/python3 /usr/bin/cloud-init modules --mode=final
>> > >> >  1376 ?        S      0:00  \_ /bin/sh -c tee -a /var/log/cloud-init-output.log
>> > >> >  1377 ?        S      0:00  |   \_ tee -a /var/log/cloud-init-output.log
>> > >> >  1378 ?        S      0:00  \_ /bin/sh /var/lib/cloud/instance/scripts/runcmd
>> > >> >  1379 ?        S      0:00      \_ /bin/bash /etc/confluent/firstboot.sh
>> > >> >  1429 ?        S      0:01          \_ /usr/bin/python3 /opt/confluent/bin/apiclient /confluent-api/self/remoteconfig/status -w 204
>> > >> > """
>> > >> > 
>> > >> > This causes the "/var/log/httpd/ssl_access_log" to start filling up. A
>> > >> > subsequent reboot, where "firstboot.sh" is not run, has the system
>> > >> > coming up without "apiclient" running, and so there's no longer 'spam' in
>> > >> > "ssl_access_log".
>> > >> > 
>> > >> > Running "apiclient" manually from the CLI with the same options causes a
>> > >> > bunch of stuff in "ssl_access_log":
>> > >> > 
>> > >> > """
>> > >> > fe80::[EUI-64] - - [25/Jan/2024:14:52:15 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
>> > >> > """
>> > >> > 
>> > >> > At the same time as the above is being generated, there is nothing in
>> > >> > "/var/log/confluent/trace" or "stderr".
>> > >> > 
>> > >> > 
>> > >> > On Thu, January 25, 2024 07:52, Jarrod Johnson wrote:
>> > >> >> Anything in /var/log/confluent/stderr or /var/log/confluent/trace? Also
>> > >> >> would be tempted to see if 'confluent_selfcheck' has any suggestions. You
>> > >> >> can also ssh into the node during that phase to confirm what it is doing
>> > >> >> while it is seemingly hung, e.g. looking at ps axf
>> > >> >> ________________________________
>> > >> >> From: David Magda <dmagda+x...@ee.torontomu.ca>
>> > >> >> Sent: Wednesday, January 24, 2024 9:37 PM
>> > >> >> To: xCAT-user@lists.sourceforge.net <xCAT-user@lists.sourceforge.net>
>> > >> >> Subject: [External] [xcat-user] Ansible and Confluent
>> > >> >> 
>> > >> >> Hello,
>> > >> >> 
>> > >> >> I'm trying to get Ansible working with Confluent 3.8.0. (Using an older
>> > >> >> version due to legacy OS reasons.)
>> > >> >> 
>> > >> >> In /var/lib/confluent/public/os/ I created a new profile called
>> > >> >> ubuntu-22.04.3-x86_64-test1/, and this seems to work just fine: I took the
>> > >> >> provided "autoinstall/user-data" file, added some partition stanzas, some
>> > >> >> packages, etc.
>> > >> >> 
>> > >> >> Once I sorted out a 'basic' automated Ubuntu install I tried creating a
>> > >> >> "ansible/post.d/01-packages.yaml" file within the profile directory with
>> > >> >> the following contents:
>> > >> >> 
>> > >> >> """
>> > >> >> - name: install chrony
>> > >> >>   apt:
>> > >> >>     pkg:
>> > >> >>       - chrony
>> > >> >> """
>> > >> >> 
>> > >> >> The Ubuntu (subiquity) installer seems to 'hang' at:
>> > >> >> 
>> > >> >> """
>> > >> >> start: subiquity/Late/run/command_1: /custom-installation/post.sh
>> > >> >> """
>> > >> >> 
>> > >> >> which probably corresponds to this part of the "user-data" file:
>> > >> >> 
>> > >> >> """
>> > >> >> late-commands:
>> > >> >>  - chroot /target apt-get -y -q purge snapd modemmanager
>> > >> >>  - /custom-installation/post.sh
>> > >> >> """
>> > >> >> 
>> > >> >> When the 'hang' occurs the following starts filling up the
>> > >> >> "/var/log/httpd/ssl_access_log" file of the Confluent/xcat server:
>> > >> >> 
>> > >> >> """
>> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
>> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
>> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
>> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
>> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
>> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
>> > >> >> """
>> > >> >> 
>> > >> >> When I force a restart of the system/VM, it can boot off the disk, and
>> > >> >> goes through the regular start-up process, including a bunch of cloud-init
>> > >> >> stuff. Though after it runs "/etc/confluent/firstboot.sh", the
>> > >> >> "ssl_access_log" file once again starts filling with the
>> > >> >> "remoteconfig/status" stuff per above.
>> > >> >> 
>> > >> >> Renaming "ansible/" to "ansible_off/" seems to make the problem go away.
>> > >> >> Similar behaviour with Ubuntu 20.04.
>> > >> >> 
>> > >> >> I'm wondering what's going on with the 'hang' when "post.sh" is executed,
>> > >> >> and the flooding after "firstboot.sh".
>> > >> >> 
>> > >> >> Regards,
>> > >> >> David

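A side note on the 'confluent_selfcheck' traceback quoted above: `ssl.PROTOCOL_TLS_CLIENT` was only added in Python 3.6, so an AttributeError is what one would expect when the script runs under the Python 2.7 that the other tracebacks show on this server. The sketch below is not Confluent's code and not a proposed patch; `make_client_context` is a hypothetical helper that just illustrates one way to guard for the missing constant, assuming that falling back to `ssl.create_default_context()` is acceptable.

```python
import ssl


def make_client_context():
    """Return a verifying client-side SSLContext on old and new Pythons."""
    protocol = getattr(ssl, "PROTOCOL_TLS_CLIENT", None)
    if protocol is not None:
        # Python 3.6+: certificate and hostname checks are on by default.
        return ssl.SSLContext(protocol)
    # Older interpreters: create_default_context() (Python 2.7.9+ / 3.4+)
    # also returns a context with certificate verification enabled.
    return ssl.create_default_context()
```

Either branch gives a context with certificate verification switched on; which CA certificates get loaded into it is left to the caller.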

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
