Yup:

"""
# sha1sum /var/lib/confluent/public/site/ssh/*pubkey /etc/confluent/ssh/automation.pub
b88168467bf2920011f4a769d7cbd7aab0de0b35  /var/lib/confluent/public/site/ssh/mp01.example.com.automationpubkey
27574dd33ad3781bb588d7fcef2b8a6dd189d3cb  /var/lib/confluent/public/site/ssh/mp01.example.com.rootpubkey
b88168467bf2920011f4a769d7cbd7aab0de0b35  /etc/confluent/ssh/automation.pub
"""
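For anyone who wants to script the same check, here is a rough Python sketch of the digest comparison done above with sha1sum. It is not part of Confluent; the function names (sha1_of, group_by_digest, keys_match) are made up for illustration, and it only assumes the files are readable.

```python
import hashlib
from collections import defaultdict


def sha1_of(path):
    """Return the hex SHA-1 digest of a file's contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each digest to the sorted list of paths that share it."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha1_of(path)].append(str(path))
    return {digest: sorted(names) for digest, names in groups.items()}


def keys_match(published, local):
    """True when the two files have byte-identical contents."""
    return sha1_of(published) == sha1_of(local)
```

Given the digests in the output above, keys_match("/var/lib/confluent/public/site/ssh/mp01.example.com.automationpubkey", "/etc/confluent/ssh/automation.pub") would be expected to return True on this server, since both files hash to b88168467bf2920011f4a769d7cbd7aab0de0b35.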
> On Jan 26, 2024, at 15:59, Jarrod Johnson <jjohns...@lenovo.com> wrote: > >> Hmm, how odd.... >> >> # cat /var/lib/confluent/public/site/ssh/mp01.example.com.automationpubkey >> /etc/confluent/ssh/automation.pub >> >> Do they match? >> >> From: David Magda <dmagda+x...@ee.torontomu.ca> >> Sent: Friday, January 26, 2024 3:50 PM >> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> >> Subject: Re: [xcat-user] [External] Ansible and Confluent >> >> """ >> # ls -lrth /var/lib/confluent/public/site/ssh/*pubkey >> -rw-r--r-- 1 confluent root 410 Oct 31 12:05 >> /var/lib/confluent/public/site/ssh/mp01.example.com.rootpubkey >> -rw-r--r-- 1 confluent root 129 Oct 31 12:05 >> /var/lib/confluent/public/site/ssh/mp01.example.com.automationpubkey >> >> # ssh dm-boot1 'hostname -f; uptime' >> dm-boot1 >> 15:47:19 up 21 min, 0 users, load average: 0.00, 0.00, 0.00 >> """ >> >> >> > On Jan 26, 2024, at 15:43, Jarrod Johnson <jjohns...@lenovo.com> wrote: >> > >> > # ls /var/lib/confluent/public/site/ssh/*pubkey >> > >> > >> > From: David Magda <dmagda+x...@ee.torontomu.ca> >> > Sent: Friday, January 26, 2024 3:40 PM >> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> >> > Subject: Re: [xcat-user] [External] Ansible and Confluent >> > >> > There’s no “syncfiles” in the default Ubuntu profile, nor anything in the >> > web docs on its format, but I found a template in >> > "/opt/confluent/lib/osdeploy/el9/profiles/default/syncfiles”. 
>> > >> > Created a file with the line: >> > >> > /etc/hosts -> /etc/hosts_test >> > >> > With the results: >> > >> > """ >> > # nodeapply -F dm-boot1 >> > dm-boot1: >> > dm-boot1: >> > --------------------------------------------------------------------------- >> > dm-boot1: Running python script 'syncfileclient' from >> > https://[fe80::749f:43ff:fe72:55e4]/confluent-public/os/ubuntu-22.04.3-x86_64-test1/scripts/ >> > dm-boot1: Executing in /tmp/confluentscripts.HUGo3sMtt >> > dm-boot1: Traceback (most recent call last): >> > dm-boot1: File "/tmp/confluentscripts.HUGo3sMtt/syncfileclient", line >> > 286, in <module> >> > dm-boot1: synchronize() >> > dm-boot1: File "/tmp/confluentscripts.HUGo3sMtt/syncfileclient", line >> > 233, in synchronize >> > dm-boot1: status, rsp = >> > ac.grab_url_with_status('/confluent-api/self/remotesyncfiles') >> > dm-boot1: File "/opt/confluent/bin/apiclient", line 413, in >> > grab_url_with_status >> > dm-boot1: raise Exception(rsp.read()) >> > dm-boot1: Exception: b"500 - Command '['rsync', '-rvLD', >> > '/tmp/tmpSUbmoD.synctodm-boot1/', 'root@[172.17.15.222]:/']' returned >> > non-zero exit status 255" >> > dm-boot1: 'syncfileclient' exited with code 1 >> > """ >> > >> > In "/var/log/confluent/stderr" we have: >> > >> > """ >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): Traceback (most recent call last): >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): File >> > 
"/usr/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 111, in wait >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): listener.cb(fileno) >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): File >> > "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 53, in >> > on_read >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): current.switch(([original], [], [])) >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): File >> > "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 221, in >> > main >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): result = function(*args, **kwargs) >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): File >> > "/opt/confluent/lib/python/confluent/syncfiles.py", line 197, in >> > sync_list_to_node >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): ['rsync', '-rvLD', targdir + '/', >> > 'root@[{}]:/'.format(targip)])[0] >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): File >> > "/opt/confluent/lib/python/confluent/util.py", line 45, in run >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): raise >> > subprocess.CalledProcessError(retcode, process.args, output=stdout) >> > Jan 26 15:28:53 File "/usr/lib64/python2.7/traceback.py", line 13, in >> > _print >> > file.write(str+terminator): CalledProcessError: Command '['rsync', >> > '-rvLD', '/tmp/tmpVbi9YY.synctodm-boot1/', 'root@[172.17.15.222]:/']' >> > 
returned non-zero exit status 255 >> > Jan 26 15:28:53 File >> > "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 317, in >> > squelch_exception >> > sys.stderr.write("Removing descriptor: %r\n" % (fileno,)): Removing >> > descriptor: 65 >> > """ >> > >> > And in “trace” we have: >> > >> > """ >> > Jan 26 15:28:53 Traceback (most recent call last): >> > File "/opt/confluent/lib/python/confluent/httpapi.py", line 612, in >> > resourcehandler >> > for rsp in resourcehandler_backend(env, start_response): >> > File "/opt/confluent/lib/python/confluent/httpapi.py", line 636, in >> > resourcehandler_backend >> > for res in selfservice.handle_request(env, start_response): >> > File "/opt/confluent/lib/python/confluent/selfservice.py", line 502, in >> > handle_request >> > status, output = syncfiles.get_syncresult(nodename) >> > File "/opt/confluent/lib/python/confluent/syncfiles.py", line 321, in >> > get_syncresult >> > result = syncrunners[nodename].wait() >> > File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line >> > 181, in wait >> > return self._exit_event.wait() >> > File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 132, in >> > wait >> > current.throw(*self._exc) >> > File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line >> > 221, in main >> > result = function(*args, **kwargs) >> > File "/opt/confluent/lib/python/confluent/syncfiles.py", line 197, in >> > sync_list_to_node >> > ['rsync', '-rvLD', targdir + '/', 'root@[{}]:/'.format(targip)])[0] >> > File "/opt/confluent/lib/python/confluent/util.py", line 45, in run >> > raise subprocess.CalledProcessError(retcode, process.args, >> > output=stdout) >> > CalledProcessError: Command '['rsync', '-rvLD', >> > '/tmp/tmpVbi9YY.synctodm-boot1/', 'root@[172.17.15.222]:/']' returned >> > non-zero exit status 255 >> > >> > """ >> > >> > > On Jan 26, 2024, at 15:01, Jarrod Johnson <jjohns...@lenovo.com> wrote: >> > > >> > > Ok, another track (trying to compensate 
for not being able to use >> > > selfcheck). >> > > >> > > Can you try sticking some file in the profile's syncfiles, then do: >> > > nodeapply -F <node> >> > > >> > > And see if any errors happen, either in output or in the >> > > /var/log/confluent area. >> > > >> > >> From: David Magda <dmagda+x...@ee.torontomu.ca> >> > >> Sent: Friday, January 26, 2024 2:01 PM >> > >> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> >> > >> Subject: Re: [xcat-user] [External] Ansible and Confluent >> > >> >> > >> We have Confluent installed on a RH/CentOS 7 system that originally >> > >> had/has xCAT installed for deployment of our Lenovo hardware/HPC >> > >> solution. I just installed it there as it was/is our 'install server'. >> > >> (We don't want to touch it too much, as it was a previous team of folks >> > >> that set things up, and there's been a lot of team churn.) >> > >> >> > >> I've attached the "hangtraces" to this message; hopefully the mailing >> > >> list software will pass it along. 
I noticed “ipmi” in some of the >> > >> paths, and for the record this is a VM running under Proxmox, and does >> > >> not have any LOM configured: >> > >> >> > >> """ >> > >> # nodeattrib dm-boot1 >> > >> dm-boot1: crypted.selfapikey: ******** >> > >> dm-boot1: deployment.apiarmed: >> > >> dm-boot1: deployment.pendingprofile: ubuntu-22.04.3-x86_64-test1 >> > >> dm-boot1: deployment.profile: >> > >> dm-boot1: deployment.sealedapikey: >> > >> dm-boot1: deployment.stagedprofile: >> > >> dm-boot1: deployment.state: >> > >> dm-boot1: deployment.state_detail: >> > >> dm-boot1: deployment.useinsecureprotocols: always >> > >> dm-boot1: dns.servers: 172.17.15.252,172.17.15.247,172.17.15.254 >> > >> dm-boot1: groups: everything >> > >> dm-boot1: net.hwaddr: 4e:78:df:d3:8d:59 >> > >> dm-boot1: net.ipv4_address: 172.17.15.222/21 >> > >> dm-boot1: net.ipv4_gateway: 172.17.8.254 >> > >> """ >> > >> >> > >> Running an strace(1) on the 'apiclient' that runs as part of the >> > >> "post.sh" process, we have a continuous poll/read/write stream: >> > >> >> > >> """ >> > >> […] >> > >> write(3, >> > >> "\27\3\3\0\371Sm2\233\337\222n\221\377vZs\21\22S\10\351\232\321I7Y$R\370]\312"..., >> > >> 254) = 254 >> > >> read(3, 0x560b6949e8f3, 5) = -1 EAGAIN (Resource >> > >> temporarily unavailable) >> > >> poll([{fd=3, events=POLLIN}], 1, 15000) = 1 ([{fd=3, revents=POLLIN}]) >> > >> read(3, "\27\3\3\0\226", 5) = 5 >> > >> read(3, >> > >> "\0055\271\274&\2464\237\242h\341\30\231\274\327g\224\344g\306\313\206\326\355x\307\341\331C\366H\331"..., >> > >> 150) = 150 >> > >> poll([{fd=3, events=POLLOUT}], 1, 15000) = 1 ([{fd=3, revents=POLLOUT}]) >> > >> write(3, >> > >> "\27\3\3\0\371Sm2\233\337\222n\222\334e\336f\353u\343p\22\215:\264e\30a\3172\245\361"..., >> > >> 254) = 254 >> > >> read(3, 0x560b6949e8f3, 5) = -1 EAGAIN (Resource >> > >> temporarily unavailable) >> > >> poll([{fd=3, events=POLLIN}], 1, 15000) = 1 ([{fd=3, revents=POLLIN}]) >> > >> read(3, "\27\3\3\0\226", 5) = 5 >> > >> 
read(3, >> > >> "\0055\271\274&\2464\240\326\202\347(\213\311\260|\333\230\372A\235\341\273U\201\223\2209ah\325J"..., >> > >> 150) = 150 >> > >> poll([{fd=3, events=POLLOUT}], 1, 15000) = 1 ([{fd=3, revents=POLLOUT}]) >> > >> write(3, >> > >> "\27\3\3\0\371Sm2\233\337\222n\223\240\341<\3602\323\177Y\311\317/\371\336P/s\301t8"..., >> > >> 254) = 254 >> > >> read(3, 0x560b6949e8f3, 5) = -1 EAGAIN (Resource >> > >> temporarily unavailable) >> > >> poll([{fd=3, events=POLLIN}], 1, 15000^Cstrace: Process 27477 detached >> > >> <detached ...> >> > >> """ >> > >> >> > >> Per lsof(1), FD 3 is: >> > >> >> > >> """ >> > >> python3 27477 root 3u IPv6 158157 0t0 TCP >> > >> [fe80::[EUI-64_client]]:44800->[fe80::[EUI-64_server]]:https >> > >> (ESTABLISHED) >> > >> """ >> > >> >> > >> >> > >> >> > >> On Thu, January 25, 2024 16:34, Jarrod Johnson wrote: >> > >> > What is the OS of the deployment server? >> > >> > >> > >> > kill -USR1 $(cat /var/run/confluent/pid) >> > >> > >> > >> > This should produce a /var/log/confluent/hangtraces >> > >> > >> > >> > Would be interesting to see if there are Ansible-related stacks in >> > >> > hangtraces that seem stuck... 
>> > >> > >> > >> > >> > >> > ________________________________ >> > >> > From: David Magda <dmagda+x...@ee.torontomu.ca> >> > >> > Sent: Thursday, January 25, 2024 4:25 PM >> > >> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> >> > >> > Subject: Re: [xcat-user] [External] Ansible and Confluent >> > >> > >> > >> > First suggested command: >> > >> > >> > >> > """ >> > >> > # confluent_selfcheck >> > >> > OS Deployment: Initialized >> > >> > Confluent UUID: Consistent >> > >> > Web Server: Running >> > >> > Web Certificate: Traceback (most recent call last): >> > >> > File "/opt/confluent/bin/confluent_selfcheck", line 178, in <module> >> > >> > cert = certificates_missing_ips(conn) >> > >> > File "/opt/confluent/bin/confluent_selfcheck", line 57, in >> > >> > certificates_missing_ips >> > >> > ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT) >> > >> > AttributeError: 'module' object has no attribute 'PROTOCOL_TLS_CLIENT' >> > >> > """ >> > >> > >> > >> > On the being-installed system, ignoring the typical Linux stuff, the >> > >> > output of 'ps -elfH' has: >> > >> > >> > >> > """ >> > >> > >> > >> > 4 S root 1247 1 0 80 0 - 7499 do_pol 17:53 ? >> > >> > 00:00:00 /usr/bin/python3 /usr/bin/networkd-dispatcher >> > >> > --run-startup-triggers >> > >> > 4 S root 1248 1 0 80 0 - 58623 do_pol 17:53 ? >> > >> > 00:00:00 /usr/libexec/polkitd --no-debug >> > >> > 4 S syslog 1250 1 0 80 0 - 55600 do_sel 17:53 ? >> > >> > 00:00:00 /usr/sbin/rsyslogd -n -iNONE >> > >> > 4 S root 1252 1 0 80 0 - 385081 futex_ 17:53 ? >> > >> > 00:00:03 /usr/lib/snapd/snapd >> > >> > 4 S root 1253 1 0 80 0 - 3831 ep_pol 17:53 ? >> > >> > 00:00:00 /lib/systemd/systemd-logind >> > >> > 4 S root 1255 1 0 80 0 - 98198 do_pol 17:53 ? >> > >> > 00:00:02 /usr/libexec/udisks2/udisksd >> > >> > 4 S root 1283 1 0 80 0 - 26778 do_pol 17:53 ? 
>> > >> > 00:00:00 /usr/bin/python3 >> > >> > /usr/share/unattended-upgrades/unattended-upgrade-shutdown >> > >> > --wait-for-signal >> > >> > 4 S root 1291 1 0 80 0 - 61055 do_pol 17:53 ? >> > >> > 00:00:00 /usr/sbin/ModemManager >> > >> > 4 S root 2042 1 0 80 0 - 722 do_wai 17:53 ? >> > >> > 00:00:00 /bin/sh /snap/subiquity/5004/usr/bin/subiquity-server >> > >> > 4 S root 2086 2042 0 80 0 - 149574 ep_pol 17:53 ? >> > >> > 00:00:07 /snap/subiquity/5004/usr/bin/python3.10 -m >> > >> > subiquity.cmd.server >> > >> > 4 S root 27499 2086 0 80 0 - 722 do_wai 18:09 ? >> > >> > 00:00:00 sh -c /custom-installation/post.sh >> > >> > 4 S root 27501 27499 0 80 0 - 1150 do_wai 18:09 ? >> > >> > 00:00:00 /bin/bash /custom-installation/post.sh >> > >> > 4 S root 27588 27501 4 80 0 - 7403 do_pol 18:09 ? >> > >> > 00:03:16 /usr/bin/python3 /opt/confluent/bin/apiclient >> > >> > /confluent-api/self/remoteconfig/status -w 204 >> > >> > 4 S root 2049 1 0 80 0 - 24167 ep_pol 17:53 tty1 >> > >> > 00:00:05 /snap/subiquity/5004/usr/bin/python3.10 >> > >> > /snap/subiquity/5004/usr/bin/subiquity >> > >> > 4 S root 2137 1 0 80 0 - 3855 do_pol 17:53 ? >> > >> > 00:00:00 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups >> > >> > 4 S root 37842 2137 0 80 0 - 4310 - 19:15 ? >> > >> > 00:00:00 sshd: root@pts/0 >> > >> > 4 S root 37952 37842 0 80 0 - 1543 do_wai 19:15 ? >> > >> > 00:00:00 -bash >> > >> > 4 R root 38032 37952 0 80 0 - 1911 - 19:16 ? >> > >> > 00:00:00 ps -elfH >> > >> > 4 S root 2206 1 0 80 0 - 3266 ep_pol 17:53 ? >> > >> > 00:00:00 /lib/netplan/netplan-dbus >> > >> > 4 S root 2570 1 0 80 0 - 73244 do_pol 17:53 ? >> > >> > 00:00:00 /usr/libexec/packagekitd >> > >> > 4 S root 37848 1 1 80 0 - 4301 ep_pol 19:15 ? >> > >> > 00:00:00 /lib/systemd/systemd --user >> > >> > 5 S root 37850 37848 0 80 0 - 26271 do_sig 19:15 ? >> > >> > 00:00:00 (sd-pam) >> > >> > """ >> > >> > >> > >> > While 'ps axf' produces (trimmed): >> > >> > >> > >> > """ >> > >> > 2042 ? 
Ss 0:00 /bin/sh >> > >> > /snap/subiquity/5004/usr/bin/subiquity-server >> > >> > 2086 ? Sl 0:07 \_ >> > >> > /snap/subiquity/5004/usr/bin/python3.10 -m >> > >> > subiquity.cmd.server >> > >> > 27499 ? S 0:00 \_ sh -c /custom-installation/post.sh >> > >> > 27501 ? S 0:00 \_ /bin/bash >> > >> > /custom-installation/post.sh >> > >> > 27588 ? S 3:21 \_ /usr/bin/python3 >> > >> > /opt/confluent/bin/apiclient /confluent-api/self/remoteconfig/status >> > >> > -w >> > >> > 204 >> > >> > 2049 tty1 Ss+ 0:05 /snap/subiquity/5004/usr/bin/python3.10 >> > >> > /snap/subiquity/5004/usr/bin/subiquity >> > >> > """ >> > >> > >> > >> > Doing a "kill -9 27588" (on apiclient) causes the installation to >> > >> > 'finish'. After the reboot, and after "firstboot.sh" does its thing, >> > >> > we >> > >> > have the following from 'ps axf': >> > >> > >> > >> > """ >> > >> > 1372 ? Ss 0:00 /usr/bin/python3 /usr/bin/cloud-init modules >> > >> > --mode=final >> > >> > 1376 ? S 0:00 \_ /bin/sh -c tee -a >> > >> > /var/log/cloud-init-output.log >> > >> > 1377 ? S 0:00 | \_ tee -a >> > >> > /var/log/cloud-init-output.log >> > >> > 1378 ? S 0:00 \_ /bin/sh >> > >> > /var/lib/cloud/instance/scripts/runcmd >> > >> > 1379 ? S 0:00 \_ /bin/bash >> > >> > /etc/confluent/firstboot.sh >> > >> > 1429 ? S 0:01 \_ /usr/bin/python3 >> > >> > /opt/confluent/bin/apiclient /confluent-api/self/remoteconfig/status >> > >> > -w >> > >> > 204 >> > >> > """ >> > >> > >> > >> > This causes the "/var/log/httpd/ssl_access_log" to start filling up. A >> > >> > subsequent reboot, where "firstboot.sh" is not run, has the system >> > >> > coming up without "apiclient" running, and so there's no longer >> > >> > 'spam' in >> > >> > "ssl_access_log". 
>> > >> > >> > >> > Running "apiclient" manually from the CLI with the exact options >> > >> > causes a >> > >> > bunch of stuff in "ssl_access_log": >> > >> > >> > >> > """ >> > >> > fe80::[EUI-64] - - [25/Jan/2024:14:52:15 -0500] "GET >> > >> > /confluent-api/self/remoteconfig/status HTTP/1.1" 200 - >> > >> > """ >> > >> > >> > >> > At the same time as the above is being generated, there is nothing in >> > >> > "/var/log/confluent/trace" or "stderr". >> > >> > >> > >> > >> > >> > On Thu, January 25, 2024 07:52, Jarrod Johnson wrote: >> > >> >> Anything in /var/log/confluent/stderr or /var/log/confluent/trace? >> > >> >> Also >> > >> >> would be tempted to see if 'confluent_selfcheck' has any >> > >> >> suggestions. >> > >> >> You >> > >> >> can also ssh into the node during that phase to confirm what it is >> > >> >> doing >> > >> >> while it is seemingly hung, e.g. looking at ps axf >> > >> >> ________________________________ >> > >> >> From: David Magda <dmagda+x...@ee.torontomu.ca> >> > >> >> Sent: Wednesday, January 24, 2024 9:37 PM >> > >> >> To: xCAT-user@lists.sourceforge.net <xCAT-user@lists.sourceforge.net> >> > >> >> Subject: [External] [xcat-user] Ansible and Confluent >> > >> >> >> > >> >> Hello, >> > >> >> >> > >> >> I'm trying to get Ansible working with Confluent 3.8.0. (Using an >> > >> >> older >> > >> >> version due to legacy OS reasons.) >> > >> >> >> > >> >> In /var/lib/confluent/public/os/ I created a new profile called >> > >> >> ubuntu-22.04.3-x86_64-test1/, and this seems to work just fine: I >> > >> >> took >> > >> >> the >> > >> >> provided "autoinstall/user-data" file, added some partition stanzas, >> > >> >> some >> > >> >> packages, etc. 
>> > >> >> >> > >> >> Once I sorted out a 'basic' automated Ubuntu install I tried >> > >> >> creating a >> > >> >> "ansible/post.d/01-packages.yaml" file with-in the profile directory >> > >> >> with >> > >> >> the following contents: >> > >> >> >> > >> >> """ >> > >> >> - name: install chrony >> > >> >> apt: >> > >> >> pkg: >> > >> >> - chrony >> > >> >> """ >> > >> >> >> > >> >> The Ubuntu (subiquity) installer seems to 'hang' at: >> > >> >> >> > >> >> """ >> > >> >> start: subiquity/Late/run/command_1: /custom-installation/post.sh >> > >> >> """ >> > >> >> >> > >> >> which probably corresponds to this part of the "user-data" file: >> > >> >> >> > >> >> """ >> > >> >> late-commands: >> > >> >> - chroot /target apt-get -y -q purge snapd modemmanager >> > >> >> - /custom-installation/post.sh >> > >> >> """ >> > >> >> >> > >> >> When the 'hang' occurs the following starts filling up the >> > >> >> "/var/log/httpd/ssl_access_log" file of the Confluent/xcat server: >> > >> >> >> > >> >> """ >> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET >> > >> >> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 - >> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET >> > >> >> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 - >> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET >> > >> >> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 - >> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET >> > >> >> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 - >> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET >> > >> >> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 - >> > >> >> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET >> > >> >> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 - >> > >> >> """ >> > >> >> >> > >> >> When I force a restart of the system/VM, it can boot off the disk, >> > >> >> and >> > >> >> goes through the regular start-up process, including a bunch of >> > >> >> 
cloud-init >> > >> >> stuff. Though after it runs "/etc/confluent/firstboot.sh", the >> > >> >> "ssl_access_log" file once again starts filling with the >> > >> >> "remoteconfig/status" stuff per above. >> > >> >> >> > >> >> Renaming "ansible/" to "ansible_off/" seems to make the problem go >> > >> >> away. >> > >> >> Similar behaviour with Ubuntu 20.04. >> > >> >> >> > >> >> I'm wondering what's going on with the 'hang' when "post.sh" is >> > >> >> executed, >> > >> >> and >> > >> >> the flooding after "firstboot.sh". >> > >> >> >> > >> >> Regards, >> > >> >> David _______________________________________________ xCAT-user mailing list xCAT-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xcat-user