Hi,

We are using qpid-cpp v1.39 on a 2-node CentOS 7 cluster running under
Pacemaker.

Stopping the active node 1) causes the qpid daemon to start up on the
backup node, 2) runs qpidd-primary as a one-shot to promote qpidd on the
backup node, and 3) migrates the Virtual IP to the backup node, all as it
should be.

The problem is with the AMQP queues and exchanges.  Their configuration is
apparently not replicated to the backup node: the queues and exchanges
simply disappear upon failover.  The same happens when failover occurs
while a client is writing to the queues with reconnect:true; set.

The nodes are connected to each other via a crossover cable, and each node
is also connected to the outside world via a separate NIC.  Each node has
the same password set for user hacluster.  I also tried a non-crossover
cable for the replication connection, and tried connecting the nodes
through a switch.

The contents of the qpidd.conf for both nodes:

auth=no
port=5672
tcp-nodelay=yes
default-queue-limit=104857600
module-dir=/usr/lib64/qpid/daemon
log-enable=debug+:Application
log-enable=debug+:Broker
log-enable=debug+:Cluster
log-enable=debug+:HA
log-enable=debug+:Network
log-to-file=/var/log/qpidd.log
ha-cluster=yes
ha-public-url=17.210.37.67
ha-brokers-url=192.168.5.156,192.168.5.157
ha-replicate=all
ha-mechanism=ANONYMOUS
ha-username=hacluster
ha-password=hacluster
link-heartbeat-interval=10
link-maintenance-interval=2
mgmt-enable=yes
mgmt-qmf2=yes
mgmt-qmf1=no
mgmt-pub-interval=10
enable-timestamp=yes 
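
To double-check that the two files really are identical, a minimal Python
sketch along these lines could be used.  The function names are
hypothetical and not part of qpid, and the two sample snippets in the usage
part are made up for illustration (they are not the real files).

```python
from collections import defaultdict


def parse_qpidd_conf(text):
    """Parse key=value lines; repeated keys (e.g. log-enable) collect into lists."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # line without "="; ignored in this sketch
        settings[key.strip()].append(value.strip())
    return dict(settings)


def diff_confs(a, b):
    """Return {key: (values_in_a, values_in_b)} for every key whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Hypothetical sample data, not the real node configs.
    node1 = parse_qpidd_conf("ha-cluster=yes\nha-replicate=all\n")
    node2 = parse_qpidd_conf("ha-cluster=yes\nha-replicate=none\n")
    print(diff_confs(node1, node2))
```

An empty result means the parsed settings match on both nodes.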

In the course of a failover test I see these HA messages in the log file:

2021-02-03 18:46:13 [HA] info 47115303(standalone) Starting HA broker
2021-02-03 18:46:13 [HA] info 47115303(standalone) Status change: standalone -> joining
2021-02-03 18:46:13 [HA] debug amq.failover Updating URLs amqp:tcp:10.20.34.63:5672  to 0 subscribers.
2021-02-03 18:46:13 [HA] debug 47115303(joining) Public URL set to: amqp:tcp:10.20.34.63:5672
2021-02-03 18:46:13 [HA] info 47115303(joining) Brokers URL set to: amqp:tcp:192.168.5.156:5672,tcp:192.168.5.157:5672
2021-02-03 18:46:13 [HA] info 47115303(joining) Connecting to cluster: amqp:tcp:192.168.5.156:5672,tcp:192.168.5.157:5672
2021-02-03 18:46:13 [HA] info Status check amqp:tcp:192.168.5.156:5672: Failed to connect (reconnect disabled)
2021-02-03 18:46:13 [HA] info 47115303(joining) Set self address to: tcp:192.168.5.157:5672
2021-02-03 18:46:13 [HA] info Status check amqp:tcp:192.168.5.157:5672: Failed to connect (reconnect disabled)
2021-02-03 18:46:15 [HA] info 47115303(joining) Status change: joining -> recovering
2021-02-03 18:46:15 [HA] notice 47115303(recovering) Promoted to primary
2021-02-03 18:46:15 [HA] info 47115303(recovering) Status change: recovering -> active
2021-02-03 18:46:15 [HA] notice 47115303(active) All backups recovered.
2021-02-03 18:46:22 [HA] info 47115303(active) Accepted client connection qpid.10.20.34.63:5672-10.20.112.107:53657  qpid-config(92614)
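
To make the sequence of HA states in a log like this easier to read, a
minimal Python sketch could pull out the "Status change" transitions.  The
function name is hypothetical and not part of qpid.  It collapses
whitespace before matching, so it also tolerates lines that were wrapped by
a mail client.  The sample lines are copied from the log above.

```python
import re

# Matches e.g. "Status change: standalone -> joining"
STATUS_CHANGE = re.compile(r"Status change:\s*(\w+)\s*->\s*(\w+)")


def status_transitions(log_text):
    """Return the (old, new) HA status pairs in the order they appear."""
    # Collapse all whitespace, including newlines, so wrapped lines still match.
    flat = " ".join(log_text.split())
    return STATUS_CHANGE.findall(flat)


SAMPLE = """\
2021-02-03 18:46:13 [HA] info 47115303(standalone) Status change: standalone
-> joining
2021-02-03 18:46:15 [HA] info 47115303(joining) Status change: joining ->
recovering
2021-02-03 18:46:15 [HA] notice 47115303(recovering) Promoted to primary
2021-02-03 18:46:15 [HA] info 47115303(recovering) Status change: recovering
-> active
"""

if __name__ == "__main__":
    for old, new in status_transitions(SAMPLE):
        print(f"{old} -> {new}")
```

For this excerpt it lists standalone -> joining, joining -> recovering and
recovering -> active.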

The "reconnect disabled" text comes from autoReconnect not being set, but I
assume that is not pertinent to the queues?
The firewall is not running on either machine.
SELinux is disabled.
The nodes can ping each other via the 192.168.5.nnn NICs.
I can ssh between the nodes.
There are no aliases for the node names.
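
Ping and ssh do not exercise the broker port itself, so a TCP-level check
against port 5672 on the replication addresses may be a useful extra data
point.  Below is a minimal Python sketch; the function name is hypothetical
and the code uses only the standard library, nothing from qpid.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, tcp_port_open("192.168.5.156", 5672) run from the other node
reports whether that broker accepts connections on the replication link at
that moment.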

Any ideas about what I've got wrong?  Any help would be welcome.

Thanks


