On 08.08.23 11:45, Nicholas Yang wrote:
Maybe a firewall is blocking the ports used by corosync. In a 2-node cluster using UDPU, four UDP streams should be visible on one of the LAN interfaces:

1. from local port 5405 to remote port 5405
2. from remote port 5405 to local port 5405
3. from a local ephemeral port to remote port 5405
4. from a remote ephemeral port to local port 5405

Blocking any one of these streams will stop corosync from working.
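
The four streams listed above all have port 5405 on one side, so a single capture filter should show them. The following is only a sketch: the helper name `corosync_filter`, the interface name `eth0`, and the peer address in the usage comment are assumptions for illustration, not values from this thread.

```shell
# Hypothetical helper: build a tcpdump/pcap filter that matches corosync
# UDPU traffic between this node and one peer.
# $1 = peer IP address, $2 = corosync port (defaults to 5405).
corosync_filter() {
    peer_ip="$1"
    port="${2:-5405}"
    printf 'udp and host %s and port %s' "$peer_ip" "$port"
}

# Usage (as root; eth0 and the address are placeholders):
#   tcpdump -ni eth0 "$(corosync_filter 192.0.2.10)"
```

If packets only ever appear in one direction, something between the nodes is probably dropping the other direction.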

I'm not sure why I didn't start checking this from the beginning... After adding a firewall rule for UDP port 5405, the node connected without any issues.
Thank you for your help.
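
For reference, a rule like the one mentioned above might be added as follows. This is a sketch that assumes firewalld is the active firewall and that the default zone is the right one. The thread does not say which tool was actually used, and the function name `open_corosync_port` and the `DRY_RUN` variable are made up for illustration.

```shell
# Sketch: allow corosync's UDP port through firewalld (assumed firewall).
# By default the commands are only printed; set DRY_RUN=0 to execute them
# (this requires root).
open_corosync_port() {
    port="${1:-5405}"
    run=""
    [ "${DRY_RUN:-1}" = "1" ] && run="echo"
    $run firewall-cmd --permanent --add-port="${port}/udp"
    $run firewall-cmd --reload
}

# Print the commands that would be run for the default port.
open_corosync_port
```

The rule needs to be present on every cluster node, since each node both sends to and receives on that port.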

Regards,
bartk


On 8/8/23 16:50, Bartosz Kaczyński wrote:
On 08.08.23 10:06, Nicholas Yang wrote:
Hi Bartk,

Hi Nicholas,

Thank you for your message and advice. I've checked the logs on node02 during the cluster joining process. I've pasted them to the Pastebin service:

- Logs from the systemd corosync service [1]
- Logs from the systemd pacemaker service [2]
- Log located at /var/log/pacemaker/pacemaker.log [3]

While reviewing the logs, I didn't particularly notice any specific errors, certainly nothing directly related to the inability to resolve the node01 name. However, I'm not an expert, so it's hard for me to make a definite statement.

[1] https://paste.opensuse.org/pastes/009aaf4f60c8
[2] https://paste.opensuse.org/pastes/8a31fec97802
[3] https://paste.opensuse.org/pastes/4641f5adc940

Regards,
bartk


You may look into the status and logs of the `corosync` and `pacemaker` services to check what's going wrong.

systemctl status corosync
systemctl status pacemaker
journalctl --unit corosync
journalctl --unit pacemaker

Regards,
Nicholas Yang

On 8/8/23 04:45, Bartosz Kaczyński wrote:
Hello ClusterLabs community!

I'm in the process of learning this technology, using the course "Say Goodbye to Downtime with SUSE Linux Enterprise Server (Repeat)" [1], where the instructor guides you through installing a cluster on one node and joining the cluster from another node.

The instructor is using SLES 15. The entire test environment is quite complex in terms of networking and storage. I think I managed to replicate it on openSUSE Leap 15.5.

I initiated the cluster from node01 with the following command:

node01:~ # crm cluster init -u \
    -s /dev/disk/by-path/ip-10.10.11.111:3260-iscsi-iqn.(...)-lun-0 \
    -s /dev/disk/by-path/ip-10.10.12.112:3260-iscsi-iqn.(...)-lun-0 \
    -s /dev/disk/by-path/ip-10.10.13.113:3260-iscsi-iqn.(...)-lun-0

From node02, I then tried to join it to the cluster with the following command [2].

I also believe that I met all the requirements regarding the network connectivity between the nodes and cluster resolution. ICMP is working between all interfaces, the /etc/hosts file is also filled in on both hosts, and time synchronization is ensured with chronyd.

The network diagram is available as a downloadable graphic [3], and someone has created a similar tutorial using VMware as the hypervisor (while I am using KVM) [4].

I am aware that providing assistance in such a situation can be challenging, and the problem may occur at various levels. However, is there anything that I might have overlooked, or is there anything that comes to your mind?

[1] https://open.sap.com/courses/suse2-1-pc
[2] https://paste.opensuse.org/pastes/254c28adf1c5
[3] https://opensap-pinboard.s3.openhpicloud.de/courses/6IB8s3snE8SkA2AxXCJWOL/topics/1nYPUtHIu4f73RLSTHdkqw/44nDvDjlO5bs8eoCxu0Uqh/openSAP-HA-LabEnv-Diagram.png
[4] https://blogs.sap.com/2021/03/13/setting-up-suse-high-availability-cluster-quick-start-demo/

Regards,
bartk
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



