There are also no errors detected on public ring 1, unlike private ring 0.
I suspect this error could be related to the private VLAN settings, but unfortunately I have no good idea how to find the issue.
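Since the log only ever shows the FAULTY marking for ring 0, one way to quantify "no errors on ring 1 unlike ring 0" is to count, per ring, how often the TOTEM layer marks an interface FAULTY and how often it reports an automatic recovery. Below is a minimal sketch in Python 3. It assumes Python 3 is available (the stock EL6 interpreter is Python 2.6, so it may need to run on another host against a copied log). The message patterns are copied from the log lines quoted in this thread. The text in front of them (timestamp, PID) is assumed to vary, so the patterns are matched anywhere in a line. `summarize_ring_events` and `print_summary` are hypothetical helper names, not part of Corosync.

```python
import re
from collections import defaultdict

# Message texts copied from the log lines quoted in this thread. The prefix
# before them (timestamp, "corosync[PID]:", subsystem tag) is assumed to vary,
# so re.search() matches them anywhere in a line.
FAULTY_RE = re.compile(r"Marking ringid (\d+) interface \S+ FAULTY")
RECOVERED_RE = re.compile(r"Automatically recovered ring (\d+)")


def summarize_ring_events(lines):
    """Count FAULTY markings and automatic recoveries per ring number.

    Hypothetical helper: `lines` is any iterable of log lines (e.g. an open file).
    """
    summary = defaultdict(lambda: {"faulty": 0, "recovered": 0, "last_faulty_line": None})
    for line in lines:
        faulty = FAULTY_RE.search(line)
        if faulty:
            ring = int(faulty.group(1))
            summary[ring]["faulty"] += 1
            # Keep the most recent FAULTY line so its timestamp can be inspected.
            summary[ring]["last_faulty_line"] = line.rstrip("\n")
            continue
        recovered = RECOVERED_RE.search(line)
        if recovered:
            summary[int(recovered.group(1))]["recovered"] += 1
    return dict(summary)


def print_summary(path="/var/log/cluster/corosync.log"):
    """Print per-ring counts; the default path is the logfile from the quoted corosync.conf."""
    with open(path, errors="replace") as fh:
        for ring, counts in sorted(summarize_ring_events(fh).items()):
            print(f"ring {ring}: faulty={counts['faulty']} recovered={counts['recovered']}")
            if counts["last_faulty_line"]:
                print(f"  last FAULTY: {counts['last_faulty_line']}")
```

Because the quoted config has `timestamp: on`, the last FAULTY line printed by `print_summary()` carries a time that could be compared with the switch logs for the private VLAN, which might help correlate the failure with a network event.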
On 22/02/17 09:37, Ulrich Windl wrote:
Is "ttl 1" a good idea for a public network?

Denis Gribkov <d...@itsts.net> wrote on 21.02.2017 at 18:26 in message <4f5543c4-b80c-659d-ed5e-7a99e1482...@itsts.net>:

Hi Everyone.

I have a 16-node asynchronous cluster configured with the Corosync redundant ring feature.

Each node has 2 similarly connected/configured NICs. One NIC is connected to the public network, the other one to our private VLAN. When I checked Corosync ring operability I found:

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
    id      = 192.168.1.54
    status  = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
    id      = 111.11.11.1
    status  = ring 1 active with no faults

After some digging, I found that if I re-enable the failed ring with the command:

# corosync-cfgtool -r

RING ID 0 is marked as "active" for a few minutes, but after that it is permanently marked as faulty.

The log has no useful info, just a single message:

corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY

And no message like:

[TOTEM ] Automatically recovered ring 1

My corosync.conf looks like:

compatibility: whitetank

totem {
    version: 2
    secauth: on
    threads: 4
    rrp_mode: passive

    interface {
        member {
            memberaddr: PRIVATE_IP_1
        }
        ...
        member {
            memberaddr: PRIVATE_IP_16
        }
        ringnumber: 0
        bindnetaddr: PRIVATE_NET_ADDR
        mcastaddr: 226.0.0.1
        mcastport: 5505
        ttl: 1
    }

    interface {
        member {
            memberaddr: PUBLIC_IP_1
        }
        ...
        member {
            memberaddr: PUBLIC_IP_16
        }
        ringnumber: 1
        bindnetaddr: PUBLIC_NET_ADDR
        mcastaddr: 224.0.0.1
        mcastport: 5405
        ttl: 1
    }

    transport: udpu
}

logging {
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    logfile_priority: info
    to_syslog: yes
    syslog_priority: warning
    debug: on
    timestamp: on
}

I tried changing rrp_mode and mcastaddr/mcastport for ringnumber: 0, but the result was similar. I checked multicast/unicast operability using the omping utility and didn't find any issues. Also, no errors were found on the network equipment for our private VLAN.
Why did Corosync permanently disable the second ring? How can I debug the issue?

Other properties:

Corosync Cluster Engine, version '1.4.7'

Pacemaker properties:
    cluster-infrastructure: cman
    cluster-recheck-interval: 5min
    dc-version: 1.1.14-8.el6-70404b0
    expected-quorum-votes: 3
    have-watchdog: false
    last-lrm-refresh: 1484068350
    maintenance-mode: false
    no-quorum-policy: ignore
    pe-error-series-max: 1000
    pe-input-series-max: 1000
    pe-warn-series-max: 1000
    stonith-action: reboot
    stonith-enabled: false
    symmetric-cluster: false

Thank you.

--
Regards
Denis Gribkov
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
--
Regards
Denis Gribkov
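The `corosync-cfgtool -s` output quoted above can also be checked mechanically, for example after collecting it from all 16 nodes to see whether ring 0 is marked FAULTY everywhere or only on some nodes. Below is a minimal Python 3 sketch. It assumes the output layout shown in this thread: a `RING ID n` line followed by indented `id = ...` and `status = ...` lines. `parse_ring_status` and `faulty_rings` are hypothetical helper names.

```python
import re

# Layout assumed from the `corosync-cfgtool -s` output quoted in this thread:
# a "RING ID <n>" line, then indented "id = <addr>" and "status = <text>" lines.
RING_RE = re.compile(r"^RING ID (\d+)$")
FIELD_RE = re.compile(r"^\s*(id|status)\s*=\s*(.*?)\s*$")


def parse_ring_status(text):
    """Return {ring_number: {"id": ..., "status": ...}} parsed from cfgtool output."""
    rings = {}
    current = None
    for line in text.splitlines():
        ring_match = RING_RE.match(line.strip())
        if ring_match:
            # Start a new ring section; following id/status lines belong to it.
            current = int(ring_match.group(1))
            rings[current] = {}
            continue
        field_match = FIELD_RE.match(line)
        if field_match and current is not None:
            rings[current][field_match.group(1)] = field_match.group(2)
    return rings


def faulty_rings(rings):
    """Ring numbers whose status text contains the word FAULTY."""
    return sorted(n for n, info in rings.items() if "FAULTY" in info.get("status", ""))
```

For the output quoted in this thread, `faulty_rings(parse_ring_status(output))` returns `[0]`. Running the same check on the output saved from every node would show whether the fault is cluster-wide or limited to particular nodes or switch ports.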