Re: [Broker-J 7.1.3] Message loss during failover

Oleksandr Rudyy Wed, 29 May 2019 03:22:36 -0700

Hi Igor,

Could you please clarify which client you are using to publish messages and
how exactly the messages were published?

The log entry about the orphaned message indicates that on Virtual Host
recovery (after the node became Master), no queue entry record was found in
the message store for that message:

[VirtualHostNode-RCO_1_FIX_VHN-Config]
(o.a.q.s.v.SynchronousMessageStoreRecoverer) - Discarded 1 orphaned
message(s).

As a result, the broker discarded that message.

The above might happen when the BDB JE transaction for the message header and
content was replicated, but the message enqueuing was not. It looks like the
switch to a new Master occurred after the message arrived at the broker but
before the enqueuing operation finished.
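
To make the orphaning scenario concrete, here is a toy model in Java. It is
not Broker-J's recovery code; the class and method names are made up for
illustration. It only assumes what is described above: the message record
and the queue entry record are written separately, and recovery discards
any message that no queue entry refers to.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Broker-J's actual store or recoverer.
class OrphanRecoverySketch {
    // Message records (header + content), keyed by message id.
    final Map<Long, String> messages = new HashMap<>();
    // Ids of messages referenced by at least one queue entry record.
    final Set<Long> enqueuedIds = new HashSet<>();

    // First write: the message header and content.
    void storeMessage(long id, String body) {
        messages.put(id, body);
    }

    // Second write: the queue entry that points at the message.
    void enqueue(long id) {
        enqueuedIds.add(id);
    }

    // Recovery: drop every message without a queue entry, return how many were dropped.
    int recover() {
        int before = messages.size();
        messages.keySet().retainAll(enqueuedIds);
        return before - messages.size();
    }
}
```

If message 1 is stored and enqueued, while message 2 is stored but the
master changes before its queue entry is replicated, recover() on the new
master drops message 2 and reports one discarded message.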

When a message is published asynchronously, the client send operation does
not wait for the message to arrive at the broker, so there is a possibility
of message loss. For example, this might happen when the message was
published using the legacy JMS client for AMQP 0-x. By default, this client
publishes messages asynchronously.

If synchronous publishing mode is used, the publishing operation should
fail with an exception. In order to exclude any possibility of message
loss, you need to use transactions.
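
A rough sketch of the transactional pattern, in plain Java so it runs
without a broker. TxSession and FakeSession are hypothetical stand-ins, not
a client API: with JMS you would create the session via
Connection.createSession(true, Session.SESSION_TRANSACTED) and call
Session.commit() and Session.rollback(). The idea is that a batch counts as
published only once commit returns, and is sent again otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transacted JMS session; not a real client API.
interface TxSession {
    void send(String body);
    void commit() throws Exception; // throws if the broker did not confirm
    void rollback();
}

class TransactionalPublisher {
    // Sends the batch in one transaction and re-sends it if commit fails.
    // Returns the number of attempts that were needed.
    static int publish(TxSession session, List<String> batch, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            for (String body : batch) {
                session.send(body);
            }
            try {
                session.commit();
                return attempt;
            } catch (Exception e) {
                last = e;
                session.rollback(); // discard the uncommitted batch before retrying
            }
        }
        throw last;
    }
}

// In-memory fake that fails the first `failures` commits, imitating a failover.
class FakeSession implements TxSession {
    final List<String> delivered = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private int failures;

    FakeSession(int failures) {
        this.failures = failures;
    }

    public void send(String body) {
        pending.add(body);
    }

    public void commit() throws Exception {
        if (failures > 0) {
            failures--;
            throw new Exception("commit not confirmed");
        }
        delivered.addAll(pending);
        pending.clear();
    }

    public void rollback() {
        pending.clear();
    }
}
```

With a real client, the session may also need to be re-created after a
failover. A retry can produce duplicates if the broker applied the commit
but the confirmation was lost, so consumers may need to tolerate that.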

Kind Regards,
Alex

On Tue, 28 May 2019 at 20:35, Igor Natanzon <[email protected]> wrote:

> Hi, we have a 3-node cluster defined, with MASTER set to SYNC and REPLICAS
> set to WRITE_NO_SYNC.
> Today we did a stress test, sending 200k+ messages on each of 3 queues.
> Some time during the transmission I performed a failover of Master to
> another node (RCO_1_FIX_VHN). The node was in 'waiting' state for about 20
> seconds before it became a master.
>
> Once the queues emptied, we noticed we lost 4 messages. Looking into qpid
> server log, I only see the following exception:
>
> 2019-05-28 13:20:56,581 WARN  [Broker-Config]
> (o.a.q.s.v.b.BDBHAVirtualHostNodeImpl) - Transfer master did not complete
> within 100ms. Node may still be elected master at a later time.
> ...
> 2019-05-28 13:21:27,842 INFO  [VirtualHostNode-RCO_1_FIX_VHN-Config]
> (o.a.q.s.v.SynchronousMessageStoreRecoverer) - Discarded 1 orphaned
> message(s).
>
> There are no other errors or issues in any logs.
>
> I'm not sure what the orphaned message is, and I'm not sure if I need to
> set all replicas to be SYNC in addition to the master to handle this
> scenario. Is there anything I can look at to track down what happened to the
> missing messages?
>
> Thanks!
>
