Hi, we set up an Artemis cluster with three broker instances on AWS, each in a different availability zone (A, B, C). Only one of the broker instances is configured as master, because Artemis message redistribution does not take message filters into account (https://activemq.apache.org/components/artemis/documentation/latest/clusters.html#redistribution-and-filters-selectors) and we use/need message filters.
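For context, our queues use filters roughly like the following (the address, queue, and property names here are made up for illustration); with more than one live broker, redistribution would ignore such filters:

  <addresses>
    <address name="orders">
      <anycast>
        <!-- only messages whose "region" property equals 'EU' land in this queue -->
        <queue name="orders.eu">
          <filter string="region = 'EU'"/>
        </queue>
      </anycast>
    </address>
  </addresses>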
Last week we experienced a network outage in one AWS availability zone. When network connectivity was restored, we ended up with two of the broker instances claiming to be active/live. I tried to replicate this by manually blocking all TCP connections between the servers; this is the behavior I see:

1) I start all three broker instances.
   => The broker in AZ A reports "live", the broker in AZ B reports "backup server", and the broker in AZ C reports "stopped".

2) I cut the network connection between the server in AZ A and the other servers, but leave the broker process in AZ A running.
   => Broker B is promoted from backup to live, and broker C starts as its backup.
   => Broker A still thinks it is live, which is not a problem for us, since no client can reach it.

3) I re-enable TCP connections between the server in AZ A and the servers in the other AZs.
   => Broker A and broker B both permanently stay "live".

How can we achieve that after step 3 either broker A or broker B shuts down?
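One idea we had, but have not tested yet, is Artemis's broker-level network health check, so that broker A deactivates itself while it is isolated instead of staying live. A sketch of what we would add to broker.xml (the address 10.0.0.1 is just a placeholder for something stable to ping, e.g. the VPC gateway):

  <!-- ping the listed address every 10s; if it cannot be reached,
       the broker deactivates itself and reactivates once the network is back -->
  <network-check-period>10000</network-check-period>
  <network-check-timeout>1000</network-check-timeout>
  <network-check-list>10.0.0.1</network-check-list>

However, we are not sure this resolves the situation after step 3, where both brokers are already live once connectivity returns.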
Here is the cluster configuration we are currently using:

Broker Cluster Config in AZ A:
------------------------------
  <connectors>
    <connector name="local-node-connector">tcp://broker-a:61617</connector>
    <connector name="remote-node-connector-0">tcp://broker-b:61617</connector>
    <connector name="remote-node-connector-1">tcp://broker-c:61617</connector>
  </connectors>

  <cluster-connections>
    <cluster-connection name="cluster1">
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <connector-ref>local-node-connector</connector-ref>
      <static-connectors allow-direct-connections-only="true">
        <connector-ref>remote-node-connector-0</connector-ref>
        <connector-ref>remote-node-connector-1</connector-ref>
      </static-connectors>
    </cluster-connection>
  </cluster-connections>

  <ha-policy>
    <replication>
      <master>
        <cluster-name>cluster1</cluster-name>
        <check-for-live-server>true</check-for-live-server>
        <vote-on-replication-failure>true</vote-on-replication-failure>
      </master>
    </replication>
  </ha-policy>

Broker Cluster Config in AZ B:
------------------------------
  <connectors>
    <connector name="local-node-connector">tcp://broker-b:61617</connector>
    <connector name="remote-node-connector-0">tcp://broker-a:61617</connector>
    <connector name="remote-node-connector-1">tcp://broker-c:61617</connector>
  </connectors>

  <cluster-connections>
    <cluster-connection name="cluster1">
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <connector-ref>local-node-connector</connector-ref>
      <static-connectors allow-direct-connections-only="true">
        <connector-ref>remote-node-connector-0</connector-ref>
        <connector-ref>remote-node-connector-1</connector-ref>
      </static-connectors>
    </cluster-connection>
  </cluster-connections>

  <ha-policy>
    <replication>
      <slave>
        <cluster-name>cluster1</cluster-name>
        <allow-failback>true</allow-failback>
        <restart-backup>true</restart-backup>
        <quorum-vote-wait>15</quorum-vote-wait>
        <vote-retries>12</vote-retries>
        <vote-retry-wait>5000</vote-retry-wait>
      </slave>
    </replication>
  </ha-policy>

Broker Cluster Config in AZ C:
------------------------------
  <connectors>
    <connector name="local-node-connector">tcp://broker-c:61617</connector>
    <connector name="remote-node-connector-0">tcp://broker-a:61617</connector>
    <connector name="remote-node-connector-1">tcp://broker-b:61617</connector>
  </connectors>

  <cluster-connections>
    <cluster-connection name="cluster1">
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <connector-ref>local-node-connector</connector-ref>
      <static-connectors allow-direct-connections-only="true">
        <connector-ref>remote-node-connector-0</connector-ref>
        <connector-ref>remote-node-connector-1</connector-ref>
      </static-connectors>
    </cluster-connection>
  </cluster-connections>

  <ha-policy>
    <replication>
      <slave>
        <cluster-name>cluster1</cluster-name>
        <allow-failback>true</allow-failback>
        <restart-backup>true</restart-backup>
        <quorum-vote-wait>15</quorum-vote-wait>
        <vote-retries>12</vote-retries>
        <vote-retry-wait>5000</vote-retry-wait>
      </slave>
    </replication>
  </ha-policy>

Thanks for any help,
Jo