Hi,

We are using qpid-cpp v1.39 on a two-node CentOS 7 cluster, running under Pacemaker.
Stopping the active node (1) starts the qpid daemon on the backup node, (2) runs qpidd-primary as a one-shot to promote qpidd on the backup node, and (3) migrates the virtual IP to the backup node, all as it should.

The problem is with the AMQP queues/exchanges. The configuration is apparently not replicated to the backup node: the queues and exchanges just disappear upon failover. The same happens when the failover is done while writing to the queues with reconnect:true; the queues just disappear.

The nodes are connected to each other via a crossover cable, and each node is also connected to the world via a separate NIC. Each node has the same password set for user hacluster. I also tried a non-crossover cable for the replication connection, and also tried connecting the nodes through a switch.

The contents of qpidd.conf on both nodes:

auth=no
port=5672
tcp-nodelay=yes
default-queue-limit=104857600
module-dir=/usr/lib64/qpid/daemon
log-enable=debug+:Application
log-enable=debug+:Broker
log-enable=debug+:Cluster
log-enable=debug+:HA
log-enable=debug+:Network
log-to-file=/var/log/qpidd.log
ha-cluster=yes
ha-public-url=17.210.37.67
ha-brokers-url=192.168.5.156,192.168.5.157
ha-replicate=all
ha-mechanism=ANONYMOUS
ha-username=hacluster
ha-password=hacluster
link-heartbeat-interval=10
link-maintenance-interval=2
mgmt-enable=yes
mgmt-qmf2=yes
mgmt-qmf1=no
mgmt-pub-interval=10
enable-timestamp=yes

In the course of a failover test I see these HA messages in the log file:

2021-02-03 18:46:13 [HA] info 47115303(standalone) Starting HA broker
2021-02-03 18:46:13 [HA] info 47115303(standalone) Status change: standalone -> joining
2021-02-03 18:46:13 [HA] debug amq.failover Updating URLs amqp:tcp:10.20.34.63:5672 to 0 subscribers.
2021-02-03 18:46:13 [HA] debug 47115303(joining) Public URL set to: amqp:tcp:10.20.34.63:5672
2021-02-03 18:46:13 [HA] info 47115303(joining) Brokers URL set to: amqp:tcp:192.168.5.156:5672,tcp:192.168.5.157:5672
2021-02-03 18:46:13 [HA] info 47115303(joining) Connecting to cluster: amqp:tcp:192.168.5.156:5672,tcp:192.168.5.157:5672
2021-02-03 18:46:13 [HA] info Status check amqp:tcp:192.168.5.156:5672: Failed to connect (reconnect disabled)
2021-02-03 18:46:13 [HA] info 47115303(joining) Set self address to: tcp:192.168.5.157:5672
2021-02-03 18:46:13 [HA] info Status check amqp:tcp:192.168.5.157:5672: Failed to connect (reconnect disabled)
2021-02-03 18:46:15 [HA] info 47115303(joining) Status change: joining -> recovering
2021-02-03 18:46:15 [HA] notice 47115303(recovering) Promoted to primary
2021-02-03 18:46:15 [HA] info 47115303(recovering) Status change: recovering -> active
2021-02-03 18:46:15 [HA] notice 47115303(active) All backups recovered.
2021-02-03 18:46:22 [HA] info 47115303(active) Accepted client connection qpid.10.20.34.63:5672-10.20.112.107:53657 qpid-config(92614)

The "reconnect disabled" comes from autoReconnect not being set, but is that pertinent to the queues?

Other details:

- The firewall is not running on either machine.
- SELinux is disabled.
- The nodes can ping each other via the 192.168.5.nnn NICs.
- I can ssh between the nodes.
- There are no aliases for the node names.

Any ideas about what I've got wrong? Any help would be welcome.

Thanks
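
P.S. Ping and ssh only show that the hosts can reach each other, not that anything is accepting connections on port 5672 of the replication addresses, which is what the "Failed to connect" status checks in the log are about. Below is a minimal sketch of a TCP check that could be run from each node, assuming Python 3 is available. The function names can_connect and check_brokers are my own (not part of qpid); the addresses and port are the ones from ha-brokers-url and port in qpidd.conf.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False


def check_brokers(hosts, port=5672, timeout=3.0):
    """Map each broker address to whether its port accepted a TCP connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling check_brokers(["192.168.5.156", "192.168.5.157"]) on each node while the other node's qpidd is running would show whether the broker port is reachable over the replication link. It only tests the TCP handshake, not AMQP or HA authentication.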