We've been using Broker-J 7.0.2 BDB_HA successfully (on Windows servers) for
a couple years now without issue.  I was working on upgrading to Broker-J
7.1.6 and performing some tests in our Test Broker-J cluster prior to the
upgrade and noticed that BDB_HA message replication wasn't working properly
and we've never had an issue like this before.  By not working properly,
what is happening is when publishing messages to the master virtual host
node and then forcing (via the admin console) another virtual host node to
become the master, the messages I just added are not in the queue in the new
master.  And if I publish messages in the new master, those messages aren't
in the queue in the next node I make the master.

One difference in our Test environment from our Dev and Production
environment (which all use 3 nodes/servers) is that a couple months ago we
replaced 2 nodes (2 VMs - swapped out Windows 2008 Server with Windows 2016
Server for those 2 VMs).   For that process I just changed the Virtual Host
Node to add the 2 new VMs and then removed the old nodes (old Windows2008
VMs).  Everything appeared to be fine after that change but I didn't run the
tests I am running now so perhaps that is what is causing this issue.

To see if this replication issue was for our existing BDB_HA virtual host
(the VH that existed prior to the Windows VM replacement discussed
previously) I created a brand new virtual host (dixonbroker) in our Test
Qpid environment and added a virtual host node on each of our VMs in Test. 
I am still getting the same behavior where messages aren't replicating
correctly to each node when that node becomes master.  To help with
troubleshooting I used this new dixonbroker virtual host for the tests I've
documented below.

I've attached log files from all 3 nodes with some debugging enabled (I can
add more debugging and run the tests again if it helps).    


Here are the steps of my test:
1) Node1 was master.  I created a new durable queue named app_test.  I have
a little Java app using qpid-jms-client .31 (I tried .44 and it still
produced the same bad results).  I use Spring's JmsTemplate to publish 50
messages.  This was done at 11:00 (to help see the flow in the log files).
2) I made Node2 the master and there were no messages in app_test queue (see
log message 2020-01-24 11:00:00,243 INFO  [Queue Recoverer : app_test (vh:
dixonbroker)] (q.m.t.recovered) - [Broker]
[vh(/dixonbroker)/ms(ProvidedBDBMessageStore)] TXN-1005 : Recovered 0
messages for queue app_test in the node2 log file).
3) I made Node3 the master.  The 50 messages DID exist in the app_test
queue.  I then added 50 more messages to app_test queue (so total of 100)
while Node3 was the master.  This was at 11:02.
4) I made Node1 the master.  It only had 50 messages in app_test queue -
should have had 100.
5) I made Node2 the master. It still didn't have any messages in app_test
queue - should have had 100.



qpid-node1.log
<http://qpid.2158936.n2.nabble.com/file/t396334/qpid-node1.log>  
qpid-node2.log
<http://qpid.2158936.n2.nabble.com/file/t396334/qpid-node2.log>  
qpid-node3.log
<http://qpid.2158936.n2.nabble.com/file/t396334/qpid-node3.log>  



--
Sent from: http://qpid.2158936.n2.nabble.com/Apache-Qpid-users-f2158936.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to