Is there anything in /var/log/confluent/stderr or /var/log/confluent/trace?
I would also be tempted to see whether 'confluent_selfcheck' has any
suggestions. You can also ssh into the node during that phase to confirm what
it is doing while it is seemingly hung, e.g. by looking at 'ps axf'.
________________________________
From: David Magda <dmagda+x...@ee.torontomu.ca>
Sent: Wednesday, January 24, 2024 9:37 PM
To: xCAT-user@lists.sourceforge.net <xCAT-user@lists.sourceforge.net>
Subject: [External] [xcat-user] Ansible and Confluent

Hello,

I'm trying to get Ansible working with Confluent 3.8.0. (Using an older version 
due to legacy OS reasons.)

In /var/lib/confluent/public/os/ I created a new profile called 
ubuntu-22.04.3-x86_64-test1/, and this seems to work just fine: I took the 
provided "autoinstall/user-data" file, added some partition stanzas, some 
packages, etc.

Once I sorted out a 'basic' automated Ubuntu install I tried creating an
"ansible/post.d/01-packages.yaml" file within the profile directory with the
following contents:

"""
- name: install chrony
 apt:
   pkg:
     - chrony
"""

The Ubuntu (subiquity) installer seems to 'hang' at:

"""
start: subiquity/Late/run/command_1: /custom-installation/post.sh
"""

which probably corresponds to this part of the "user-data" file:

"""
 late-commands:
   - chroot /target apt-get -y -q purge snapd modemmanager
   - /custom-installation/post.sh
"""

When the 'hang' occurs the following starts filling up the 
"/var/log/httpd/ssl_access_log" file of the Confluent/xcat server:

"""
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
"""

When I force a restart of the system/VM, it can boot off the disk, and goes 
through the regular start-up process, including a bunch of cloud-init stuff. 
Though after it runs "/etc/confluent/firstboot.sh", the "ssl_access_log" file 
once again starts filling with the "remoteconfig/status" stuff per above.
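
To put a number on the polling described above, the status requests can be
counted per second. The following is only a rough sketch and not part of
Confluent: the function name count_status_polls and the regular expression are
made up for illustration, and it assumes the access log uses the layout shown
in the excerpt, with each request on a single line.

```python
import re
from collections import Counter

# Assumed layout: ... [timestamp] "GET <path> HTTP/1.1" ...
# (taken from the excerpt above; adjust if your LogFormat differs).
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "GET\s+(?P<path>\S+)')

# Request path seen in the flood.
STATUS_PATH = "/confluent-api/self/remoteconfig/status"


def count_status_polls(lines):
    """Count remoteconfig/status requests per timestamp (1-second resolution).

    `lines` is any iterable of access-log lines, e.g. an open file object.
    Returns a Counter mapping the bracketed timestamp string to a count.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("path") == STATUS_PATH:
            counts[match.group("ts")] += 1
    return counts
```

Calling count_status_polls(open("/var/log/httpd/ssl_access_log")) and looking
at the largest values shows how many polls arrive in a single second, which
may help when comparing the install-time 'hang' with the behaviour after
"firstboot.sh".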

Renaming "ansible/" to "ansible_off/" seems to make the problem go away. 
Similar behaviour with Ubuntu 20.04.

I'm wondering what's going on with the 'hang' when "post.sh" is executed, and
the flooding after "firstboot.sh".

Regards,
David

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
