Hi Igor,

Could you please clarify which client you are using to publish messages, and how exactly the messages were published?

The log entry about the orphaned message indicates that during Virtual Host recovery (after the node became Master), no queue entry record was found in the message store for that message:

[VirtualHostNode-RCO_1_FIX_VHN-Config] (o.a.q.s.v.SynchronousMessageStoreRecoverer) - Discarded 1 orphaned message(s).

As a result, the broker discarded the message. This can happen when the BDB JE transaction for the message header and content was replicated, but the enqueue operation was not. It looks like the switch to a new Master occurred after the message arrived at the broker but before the enqueue operation finished.

When a message is published asynchronously, the client's send operation does not wait for the message to arrive at the broker, so the message can be lost. For example, this can happen when messages are published using the legacy JMS client for AMQP 0-x, which publishes asynchronously by default. If synchronous publishing mode is used, the publishing operation should fail with an exception.

To rule out any possibility of message loss, you need to use transactions.

Kind Regards,
Alex

On Tue, 28 May 2019 at 20:35, Igor Natanzon <[email protected]> wrote:

> Hi, we have a 3-node cluster defined, with MASTER set to SYNC and REPLICAS
> set to WRITE_NO_SYNC.
> Today we did a stress test, sending 200k+ messages on each of 3 queues.
> Some time during the transmission I performed a failover of Master to
> another node (RCO_1_FIX_VHN). The node was in 'waiting' state for about 20
> seconds before it became a master.
>
> Once the queues emptied, we noticed we lost 4 messages. Looking into qpid
> server log, I only see the following exception:
>
> 2019-05-28 13:20:56,581 WARN [Broker-Config]
> (o.a.q.s.v.b.BDBHAVirtualHostNodeImpl) - Transfer master did not complete
> within 100ms. Node may still be elected master at a later time.
> ...
> 2019-05-28 13:21:27,842 INFO [VirtualHostNode-RCO_1_FIX_VHN-Config]
> (o.a.q.s.v.SynchronousMessageStoreRecoverer) - Discarded 1 orphaned
> message(s).
>
> There are no other errors or issues in any logs.
>
> I'm not sure what the orphaned message is, and I'm not sure if I need to
> set all replicas to be SYNC in addition to the master to handle this
> scenario. Is there anything I can look at to track down what happened to the
> missing messages?
>
> Thanks!
>
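To illustrate the transactional publishing suggested above, here is a minimal sketch of the pattern: send a batch, commit, and resend the whole batch if the commit fails. The names TxSession, TransactionalPublisher and FlakySession are hypothetical and are not part of any Qpid or JMS API. TxSession only stands in for a transacted JMS session, for example one created with connection.createSession(true, Session.SESSION_TRANSACTED), together with its MessageProducer. In a real failover the connection and session would most likely have to be re-created before retrying, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a transacted JMS Session plus its MessageProducer. */
interface TxSession {
    void send(String body) throws Exception; // plays the role of MessageProducer.send(...)
    void commit() throws Exception;          // plays the role of Session.commit()
    void rollback() throws Exception;        // plays the role of Session.rollback()
}

final class TransactionalPublisher {
    private TransactionalPublisher() {}

    /**
     * Sends the whole batch inside one transaction and commits it.
     * If sending or committing fails, the transaction is rolled back and
     * the whole batch is sent again, up to maxAttempts times.
     * Returns the number of attempts that were needed.
     */
    static int publish(TxSession session, List<String> batch, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                for (String body : batch) {
                    session.send(body);
                }
                // Only a successful commit means the broker has accepted the batch.
                session.commit();
                return attempt;
            } catch (Exception e) {
                last = e;
                try {
                    session.rollback();
                } catch (Exception ignored) {
                    // The session may already be unusable; the next attempt decides.
                }
            }
        }
        throw new IllegalStateException(
                "batch not committed after " + maxAttempts + " attempts", last);
    }
}

/** In-memory fake that fails a given number of commits, imitating a failover. */
final class FlakySession implements TxSession {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private int commitsToFail;

    FlakySession(int commitsToFail) {
        this.commitsToFail = commitsToFail;
    }

    @Override
    public void send(String body) {
        pending.add(body);
    }

    @Override
    public void commit() throws Exception {
        if (commitsToFail > 0) {
            commitsToFail--;
            throw new Exception("simulated failover during commit");
        }
        committed.addAll(pending);
        pending.clear();
    }

    @Override
    public void rollback() {
        pending.clear();
    }
}
```

The point of the pattern is that the publisher treats a message as delivered only after commit returns, so a failover in the middle of a batch surfaces as an exception on the client side instead of a silently discarded message.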
