Hey all,

I finally got some time to check this on our servers and build a separate test cluster, and I found the issue with no debugging required. It seems we were using IP addresses instead of names in corosync.conf. I replicated that on the separate test cluster and saw the exact same behaviour.

Thanks for all the support! I really appreciate it!
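For anyone who finds this thread later, here is a minimal sketch of what a name-based nodelist in corosync.conf might look like, assuming a two-node cluster. The hostnames, the example IP address in the comment, and the nodeid values are hypothetical placeholders, not taken from the cluster discussed here. Each name would need to resolve through /etc/hosts at boot.

```
nodelist {
    node {
        # hypothetical hostname, used in place of an IP literal such as 192.0.2.11
        ring0_addr: node1.example.com
        name: node1
        nodeid: 1
    }
    node {
        # hypothetical hostname for the second node
        ring0_addr: node2.example.com
        name: node2
        nodeid: 2
    }
}
```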

Respectfully,
Tyler Phillippe

Apr 25, 2023, 3:23 AM by jfrie...@redhat.com:

> On 24/04/2023 22:16, Tyler Phillippe via Users wrote:
>
>> Hello all,
>>
>> We are currently using RHEL9 and have set up a PCS cluster. When restarting
>> the servers, we noticed Corosync 3.1.5 doesn't start properly with the below
>> error message:
>>
>> Parse error in config: No valid name found for local host
>> Corosync Cluster Engine exiting with status 8 at main.c:1445.
>> Corosync.service: Main process exited, code=exited, status=8/n/a
>>
>> These are physical, blade machines that are using a 2x Fibre Channel NIC in
>> a Mode 6 bond as their networking interface for the cluster; other than
>> that, there is really nothing special about these machines. We have ensured
>> the names of the machines exist in /etc/hosts and that they can resolve
>> those names via the hosts file first. The strange
>>
>
> This is really weird. All described symptoms simply points to name service
> (DNS/NIS/...) is not available during bootup and it will become available
> later. But if /etc/hosts really contains static entries it should just work.
>
> Could you please try to set debug: trace in corosync.conf like
> ```
> ...
> logging {
>     to_syslog: yes
>     to_stderr: yes
>     timestamp: on
>     to_logfile: yes
>     logfile: /var/log/cluster/corosync.log
>
>     debug: trace
> }
> ...
> ```
>
> and observe very beginning output of corosync (either in syslog or in
> /var/log/cluster/corosync.log)? There should be something like
>
> totemip_parse: IPv4 address of NAME resolved as IPADDR
>
> Also compare the difference between corosync started on boot and later after
> multi-user.target.
>
>> thing is if we start Corosync manually after we can SSH into the machines,
>> Corosync starts immediately and without issue. We did manage to get Corosync
>> to autostart properly by modifying the service file and changing the
>> After=network-online.target to After=multi-user.target. In doing this, at
>> first, Pacemaker complains about mismatching dependencies in the service
>> between Corosync and Pacemaker. Changing the Pacemaker service to
>> After=multi-user.target fixes that self-caused issue. Any ideas on this one?
>> Mostly checking to see if changing the After dependency will harm us in the
>> future.
>>
>
> That's questionable. It's always best if resolve uses /etc/hosts reliably,
> what is not the case now, so IMHO better to find a reason why /etc/hosts
> doesn't work rather than "workaround" it.
>
> Regards,
> Honza
>
>>
>> Thanks!
>>
>> Respectfully,
>> Tyler Phillippe
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>