On May 5, 2020 6:39:54 AM GMT+03:00, "Nickle, Richard" <rnic...@holycross.edu> 
wrote:
I have a two-node cluster managing a VIP.  The service is an SMTP service.
This could be active/active; it doesn't matter which node accepts the SMTP
connection, but I wanted to make sure that a VIP was in place so that there
was a well-known address.

This service has been running for quite a while with no problems.  All of a
sudden, it partitioned, and now I can't work out a good way to get the two
partitions to merge back into one cluster.  Right now one partition takes the
resource and starts the VIP, but doesn't see the other node.  The other node
doesn't create a resource, and can't seem to see the first node either.

At this point, I am perfectly willing to create another node and make an
odd-numbered cluster, the arguments for this being fairly persuasive.  But
I'm not sure why they are blocking.

Surely there must be some manual way to get a partitioned cluster to
merge?  Some trick?  I also had a scenario several weeks ago where an
odd-numbered cluster configured in a similar way partitioned into a 3 and 2
node cluster, and I was unable to work out how to get them to merge, until
all of a sudden they seemed to fix themselves after doing a 'pcs node
remove/pcs node add' which had failed many times before.  I have tried that
here but with no success so far.

I ruled out some common cases I've seen in discussions and threads, such as
having my host name defined in /etc/hosts as localhost, etc.

Corosync 2.4.3, Pacemaker 1.1.18, pcs 0.9.164 (Ubuntu 18.04).

Output from pcs status for both nodes:

Cluster name: mail
Stack: corosync
Current DC: mail2 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon May  4 23:28:53 2020
Last change: Mon May  4 21:50:04 2020 by hacluster via crmd on mail2

2 nodes configured
1 resource configured

Online: [ mail2 ]
OFFLINE: [ mail3 ]

Full list of resources:

mail_vip (ocf::heartbeat:IPaddr2): Started mail2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Cluster name: mail
Stack: corosync
Current DC: mail3 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon May  4 22:13:10 2020
Last change: Mon May  4 22:10:34 2020 by root via cibadmin on mail3

2 nodes configured
0 resources configured

Online: [ mail3 ]
OFFLINE: [ mail2 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

/etc/corosync/corosync.conf:

totem {
    version: 2
    cluster_name: mail
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.80.128
        mcastport: 5405
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
}

quorum {
    provider: corosync_votequorum
    wait_for_all: 0
    two_node: 1
}

nodelist {
    node {
        ring0_addr: mail2
        name: mail2
        nodeid: 1
    }

    node {
        ring0_addr: mail3
        name: mail3
        nodeid: 2
    }
}

Thanks!

Rick

Ah, Rick, all,

Just ignore the previous one - I guess I'm too sleepy.

Honestly, I think your advice was good. The current config uses the default transport, which for 2.4.3 means multicast, so trying unicast (udpu) may solve the problem.
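
As a rough sketch only, assuming the rest of the posted corosync.conf stays as it is, the unicast change would amount to one extra line in the totem section. Only the `transport: udpu` line and its comment are new here; the existing nodelist (mail2, mail3) already supplies the member addresses that udpu needs. Both nodes would need the same file, and corosync would have to be restarted on each for the change to take effect.

```
totem {
    version: 2
    cluster_name: mail
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none
    # Assumption: switch from the default (multicast) transport to UDP unicast.
    transport: udpu

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.80.128
        mcastport: 5405
    }
}
```

With this setting mcastport is still the UDP port in use, and corosync.conf(5) describes corosync as also using mcastport - 1, so 5404-5405/udp would be the ports to check between the two nodes.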

If not, I would take a look at the classic things like the firewall, ...

Regards,
  Honza



Best Regards,
Strahil Nikolov
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

