Hey all,

I finally got some time to check this on our servers and to build a separate
test cluster, and I found the issue without any debugging. It turns out we have
been using IP addresses instead of host names in corosync.conf. I replicated
that setup on the test cluster and saw exactly the same behaviour. Thanks for
all the support! I really appreciate it.

Respectfully,
 Tyler Phillippe
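
The root cause above was that the nodelist in corosync.conf used IP addresses
rather than host names. Below is a minimal sketch for checking which form a given
config uses. It assumes the common one `key: value` per line layout, and
`classify_ring_addrs` is a hypothetical helper, not part of corosync or pcs.

```python
from __future__ import annotations

import ipaddress
import re

# Matches "ringN_addr: value" lines. This is a naive line-based scan, not a
# full corosync.conf parser: it does not track which section a line is in,
# and commented-out lines ("# ring0_addr: ...") simply do not match.
RING_ADDR_RE = re.compile(r"^\s*ring\d+_addr\s*:\s*(\S+)\s*$")


def classify_ring_addrs(conf_text: str) -> dict[str, str]:
    """Return {value: "ip" or "name"} for each ringN_addr in conf_text."""
    result: dict[str, str] = {}
    for line in conf_text.splitlines():
        match = RING_ADDR_RE.match(line)
        if match is None:
            continue
        value = match.group(1)
        try:
            ipaddress.ip_address(value)  # raises ValueError for host names
            result[value] = "ip"
        except ValueError:
            result[value] = "name"
    return result
```

On a node this could be called as
`classify_ring_addrs(open("/etc/corosync/corosync.conf").read())`. It only
reports which form each entry uses. The thread does not spell out why the IP
form failed at boot, so treat this as a quick check rather than a diagnosis.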



Apr 25, 2023, 3:23 AM by jfrie...@redhat.com:

> On 24/04/2023 22:16, Tyler Phillippe via Users wrote:
>
>> Hello all,
>>
>> We are currently using RHEL 9 and have set up a PCS cluster. When we restart
>> the servers, Corosync 3.1.5 fails to start with the error message below:
>>
>> Parse error in config: No valid name found for local host
>> Corosync Cluster Engine exiting with status 8 at main.c:1445.
>> Corosync.service: Main process exited, code=exited, status=8/n/a
>>
>> These are physical blade machines that use a 2x Fibre Channel NIC in a
>> Mode 6 bond as their networking interface for the cluster; other than that,
>> there is nothing special about them. We have ensured that the machines'
>> names exist in /etc/hosts and that the machines resolve those names via the
>> hosts file first.
>>
>
> This is really weird. All the described symptoms simply point to the name
> service (DNS/NIS/...) not being available during boot and only becoming
> available later. But if /etc/hosts really contains static entries, it should
> just work.
>
> Could you please try setting debug: trace in corosync.conf, like
> ```
> ...
> logging {
>  to_syslog: yes
>  to_stderr: yes
>  timestamp: on
>  to_logfile: yes
>  logfile: /var/log/cluster/corosync.log
>
>  debug: trace
> }
> ...
> ```
>
> and observe the very beginning of corosync's output (either in syslog or in
> /var/log/cluster/corosync.log)? There should be something like
>
> totemip_parse: IPv4 address of NAME resolved as IPADDR
>
> Also compare the output of corosync started at boot with its output when
> started later, after multi-user.target.
>
>> The strange thing is that if we start Corosync manually once we can SSH into
>> the machines, it starts immediately and without issue. We did manage to get
>> Corosync to autostart properly by modifying its service file and changing
>> After=network-online.target to After=multi-user.target. After doing this,
>> Pacemaker at first complained about mismatched dependencies between the
>> Corosync and Pacemaker services. Changing the Pacemaker service to
>> After=multi-user.target as well fixed that self-inflicted issue. Any ideas on
>> this one? Mostly we are checking whether changing the After dependency will
>> harm us in the future.
>
> That's questionable. It's always best if name resolution uses /etc/hosts
> reliably, which is not the case now, so IMHO it's better to find out why
> /etc/hosts doesn't work than to work around it.
>
> Regards,
>  Honza
>
>>
>> Thanks!
>>
>> Respectfully,
>>   Tyler Phillippe
>>
>>
>> _______________________________________________
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>
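
Honza's first suspicion in the thread was that name resolution is not yet
available at boot even though /etc/hosts holds static entries. Below is a minimal
sketch of a resolution check through the system resolver, assuming Python 3 is
available on the nodes. `resolve_ipv4` is a hypothetical helper, and it only
approximates whatever lookup corosync performs internally.

```python
from __future__ import annotations

import socket


def resolve_ipv4(name: str) -> list[str]:
    """Resolve name to IPv4 addresses via the system resolver.

    Python's socket.getaddrinfo calls the C library's getaddrinfo, so on
    glibc systems the "hosts:" line in /etc/nsswitch.conf decides whether
    /etc/hosts ("files") is consulted before DNS. Returns an empty list
    if the name does not resolve.
    """
    try:
        infos = socket.getaddrinfo(
            name, None, family=socket.AF_INET, type=socket.SOCK_DGRAM
        )
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("node1")` (with `node1` standing in for a real cluster
host name) from an SSH session and from something started early in boot would
show whether resolution itself behaves differently in the two cases.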

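Honza also suggested comparing the `debug: trace` output of a corosync start at
boot with one started later. Below is a minimal sketch that extracts the
name-to-address lines from two saved logs and reports differences. It assumes
the lines look like the `totemip_parse: ... resolved as ...` example he gave;
that wording is his approximation, so the regex may need adjusting, and the
helper names are hypothetical.

```python
from __future__ import annotations

import re

# Approximates the line Honza quoted; real corosync log lines may be worded
# differently, so adjust the pattern to match what the log actually shows.
RESOLVED_RE = re.compile(
    r"totemip_parse: IPv([46]) address of (\S+) resolved as (\S+)"
)


def resolved_addresses(log_text: str) -> dict[str, str]:
    """Map each logged name to the last address it was resolved as."""
    found: dict[str, str] = {}
    for match in RESOLVED_RE.finditer(log_text):
        _family, name, addr = match.groups()
        found[name] = addr
    return found


def diff_resolutions(boot_log: str, later_log: str) -> dict[str, tuple]:
    """Return {name: (boot_addr, later_addr)} for names that differ.

    A value of None means the name was not logged in that run at all.
    """
    boot = resolved_addresses(boot_log)
    later = resolved_addresses(later_log)
    return {
        name: (boot.get(name), later.get(name))
        for name in sorted(boot.keys() | later.keys())
        if boot.get(name) != later.get(name)
    }
```

The two inputs could be copied from /var/log/cluster/corosync.log (or from
`journalctl -u corosync`) for each run.
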