First suggested command:
"""
# confluent_selfcheck
OS Deployment: Initialized
Confluent UUID: Consistent
Web Server: Running
Web Certificate: Traceback (most recent call last):
  File "/opt/confluent/bin/confluent_selfcheck", line 178, in <module>
    cert = certificates_missing_ips(conn)
  File "/opt/confluent/bin/confluent_selfcheck", line 57, in certificates_missing_ips
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
AttributeError: 'module' object has no attribute 'PROTOCOL_TLS_CLIENT'
"""
On the system being installed, ignoring the usual Linux processes, the output
of 'ps -elfH' has:
"""
4 S root 1247 1 0 80 0 - 7499 do_pol 17:53 ? 00:00:00
/usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
4 S root 1248 1 0 80 0 - 58623 do_pol 17:53 ? 00:00:00
/usr/libexec/polkitd --no-debug
4 S syslog 1250 1 0 80 0 - 55600 do_sel 17:53 ? 00:00:00
/usr/sbin/rsyslogd -n -iNONE
4 S root 1252 1 0 80 0 - 385081 futex_ 17:53 ? 00:00:03
/usr/lib/snapd/snapd
4 S root 1253 1 0 80 0 - 3831 ep_pol 17:53 ? 00:00:00
/lib/systemd/systemd-logind
4 S root 1255 1 0 80 0 - 98198 do_pol 17:53 ? 00:00:02
/usr/libexec/udisks2/udisksd
4 S root 1283 1 0 80 0 - 26778 do_pol 17:53 ? 00:00:00
/usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown
--wait-for-signal
4 S root 1291 1 0 80 0 - 61055 do_pol 17:53 ? 00:00:00
/usr/sbin/ModemManager
4 S root 2042 1 0 80 0 - 722 do_wai 17:53 ? 00:00:00
/bin/sh /snap/subiquity/5004/usr/bin/subiquity-server
4 S root 2086 2042 0 80 0 - 149574 ep_pol 17:53 ? 00:00:07
/snap/subiquity/5004/usr/bin/python3.10 -m subiquity.cmd.server
4 S root 27499 2086 0 80 0 - 722 do_wai 18:09 ? 00:00:00
sh -c /custom-installation/post.sh
4 S root 27501 27499 0 80 0 - 1150 do_wai 18:09 ? 00:00:00
/bin/bash /custom-installation/post.sh
4 S root 27588 27501 4 80 0 - 7403 do_pol 18:09 ? 00:03:16
/usr/bin/python3 /opt/confluent/bin/apiclient
/confluent-api/self/remoteconfig/status -w 204
4 S root 2049 1 0 80 0 - 24167 ep_pol 17:53 tty1 00:00:05
/snap/subiquity/5004/usr/bin/python3.10 /snap/subiquity/5004/usr/bin/subiquity
4 S root 2137 1 0 80 0 - 3855 do_pol 17:53 ? 00:00:00
sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
4 S root 37842 2137 0 80 0 - 4310 - 19:15 ? 00:00:00
sshd: root@pts/0
4 S root 37952 37842 0 80 0 - 1543 do_wai 19:15 ? 00:00:00
-bash
4 R root 38032 37952 0 80 0 - 1911 - 19:16 ? 00:00:00
ps -elfH
4 S root 2206 1 0 80 0 - 3266 ep_pol 17:53 ? 00:00:00
/lib/netplan/netplan-dbus
4 S root 2570 1 0 80 0 - 73244 do_pol 17:53 ? 00:00:00
/usr/libexec/packagekitd
4 S root 37848 1 1 80 0 - 4301 ep_pol 19:15 ? 00:00:00
/lib/systemd/systemd --user
5 S root 37850 37848 0 80 0 - 26271 do_sig 19:15 ? 00:00:00
(sd-pam)
"""
While 'ps axf' produces (trimmed):
"""
 2042 ?     Ss   0:00 /bin/sh /snap/subiquity/5004/usr/bin/subiquity-server
 2086 ?     Sl   0:07  \_ /snap/subiquity/5004/usr/bin/python3.10 -m subiquity.cmd.server
27499 ?     S    0:00      \_ sh -c /custom-installation/post.sh
27501 ?     S    0:00          \_ /bin/bash /custom-installation/post.sh
27588 ?     S    3:21              \_ /usr/bin/python3 /opt/confluent/bin/apiclient /confluent-api/self/remoteconfig/status -w 204
 2049 tty1  Ss+  0:05 /snap/subiquity/5004/usr/bin/python3.10 /snap/subiquity/5004/usr/bin/subiquity
"""
Doing a "kill -9 27588" (on apiclient) causes the installation to 'finish'.
After the reboot, and after "firstboot.sh" does its thing, we have the
following from 'ps axf':
"""
 1372 ?     Ss   0:00 /usr/bin/python3 /usr/bin/cloud-init modules --mode=final
 1376 ?     S    0:00  \_ /bin/sh -c tee -a /var/log/cloud-init-output.log
 1377 ?     S    0:00  |   \_ tee -a /var/log/cloud-init-output.log
 1378 ?     S    0:00  \_ /bin/sh /var/lib/cloud/instance/scripts/runcmd
 1379 ?     S    0:00      \_ /bin/bash /etc/confluent/firstboot.sh
 1429 ?     S    0:01          \_ /usr/bin/python3 /opt/confluent/bin/apiclient /confluent-api/self/remoteconfig/status -w 204
"""
This causes "/var/log/httpd/ssl_access_log" to start filling up. On a
subsequent reboot, where "firstboot.sh" is not run, the system comes up
without "apiclient" running, and so there's no longer any 'spam' in
"ssl_access_log".
Running "apiclient" manually from the CLI with the exact same options produces
a stream of entries like this in "ssl_access_log":
"""
fe80::[EUI-64] - - [25/Jan/2024:14:52:15 -0500] "GET
/confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
"""
At the same time as the above is being generated, there is nothing in
"/var/log/confluent/trace" or "stderr".
On Thu, January 25, 2024 07:52, Jarrod Johnson wrote:
> Anything in /var/log/confluent/stderr or /var/log/confluent/trace? Also
> would be tempted to see if 'confluent_selfcheck' has any suggestions. You
> can also ssh into the node during that phase to confirm what it is doing
> while it is seemingly hung, e.g. looking at ps axf
> ________________________________
> From: David Magda <[email protected]>
> Sent: Wednesday, January 24, 2024 9:37 PM
> To: [email protected] <[email protected]>
> Subject: [External] [xcat-user] Ansible and Confluent
>
> Hello,
>
> I'm trying to get Ansible working with Confluent 3.8.0. (Using an older
> version due to legacy OS reasons.)
>
> In /var/lib/confluent/public/os/ I created a new profile called
> ubuntu-22.04.3-x86_64-test1/, and this seems to work just fine: I took the
> provided "autoinstall/user-data" file, added some partition stanzas, some
> packages, etc.
>
> Once I sorted out a 'basic' automated Ubuntu install I tried creating a
> "ansible/post.d/01-packages.yaml" file within the profile directory with
> the following contents:
>
> """
> - name: install chrony
>   apt:
>     pkg:
>       - chrony
> """
>
> The Ubuntu (subiquity) installer seems to 'hang' at:
>
> """
> start: subiquity/Late/run/command_1: /custom-installation/post.sh
> """
>
> which probably corresponds to this part of the "user-data" file:
>
> """
> late-commands:
>   - chroot /target apt-get -y -q purge snapd modemmanager
>   - /custom-installation/post.sh
> """
>
> When the 'hang' occurs the following starts filling up the
> "/var/log/httpd/ssl_access_log" file of the Confluent/xcat server:
>
> """
> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET
> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET
> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET
> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET
> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET
> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
> fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET
> /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
> """
>
> When I force a restart of the system/VM, it can boot off the disk, and
> goes through the regular start-up process, including a bunch of cloud-init
> stuff. Though after it runs "/etc/confluent/firstboot.sh", the
> "ssl_access_log" file once again starts filling with the
> "remoteconfig/status" stuff per above.
>
> Renaming "ansible/" to "ansible_off/" seems to make the problem go away.
> Similar behaviour with Ubuntu 20.04.
>
> I'm wondering what's going on with the 'hang' when "post.sh" is executed,
> and the flooding after "firstboot.sh".
>
> Regards,
> David