At the moment, whichever server was alive last has to be started first.

I know there's a task to compare the ages of the journals before
synchronizing them, but it's not done yet.

On Tue, Jul 17, 2018 at 6:48 PM, Udayan Sahu <udayan.s...@oracle.com> wrote:
> It's a simple HA subsystem, and the simple ask of a replicated state system is
> that it should start from the last committed state…
>
>
>
> Step1: Master (M1) & Standby (S1) Alive
>
> Step2: Producer sends 10 messages -> M1 receives them and replicates them to S1
>
> Step3: Kill Master (M1) -> S1 becomes the new Master
>
> Step4: Producer sends 10 messages -> S1 receives them, but they are not
> replicated because M1 is down
>
> Step5: Kill Standby (S1)
>
> Step6: Start Master (M1)
>
> Step7: Start Standby (S1) (it syncs with Master (M1), discarding its internal
> state)
>
> This is wrong. M1 should sync with S1 since S1 represents the current state
> of the queue.
>
>
>
> How can we protect the Step 4 messages from being lost? We are using a
> transacted session and calling commit to make sure messages are persisted.
>
>
>
> --- Udayan Sahu
>
>
>
>
>
> From: Clebert Suconic [mailto:clebert.suco...@gmail.com]
> Sent: Tuesday, July 17, 2018 2:50 PM
> To: users@activemq.apache.org
> Cc: Udayan Sahu <udayan.s...@oracle.com>
> Subject: Re: Potential message loss seen with HA topology in Artemis 2.6.2
> on failback
>
>
>
> HA is about preserving the journals between failures.
>
>
>
> When you read and send messages you may still have a failure during the read.
> I would need to understand what you do in case of a failure in your consumer
> and producer.
>
>
>
> Retries on send and duplicate detection are key for your case.
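>
> Something along these lines, for example (just a sketch -- the URL, queue name,
> and retry count are placeholders, but "_AMQ_DUPL_ID" is the property Artemis
> uses for duplicate detection):
>
>     import javax.jms.*;
>     import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;
>
>     public class RetryingProducer {
>         public static void main(String[] args) throws Exception {
>             ConnectionFactory cf = new ActiveMQConnectionFactory(
>                     "tcp://localhost:61616?ha=true&reconnectAttempts=-1");
>             try (Connection conn = cf.createConnection()) {
>                 conn.start();
>                 // transacted session, as in your test application
>                 Session session = conn.createSession(true, Session.SESSION_TRANSACTED);
>                 MessageProducer producer =
>                         session.createProducer(session.createQueue("exampleQueue"));
>                 for (int i = 0; i < 10; i++) {
>                     TextMessage msg = session.createTextMessage("payload-" + i);
>                     // stable duplicate-detection id: if the same message is resent,
>                     // the broker drops the copy instead of delivering it twice
>                     msg.setStringProperty("_AMQ_DUPL_ID", "test-run-1-msg-" + i);
>                     sendWithRetry(session, producer, msg, 5);
>                 }
>             }
>         }
>
>         static void sendWithRetry(Session session, MessageProducer producer,
>                                   Message msg, int maxAttempts) throws JMSException {
>             for (int attempt = 1; ; attempt++) {
>                 try {
>                     producer.send(msg);
>                     session.commit();   // the message is persisted once this returns
>                     return;
>                 } catch (JMSException e) {
>                     session.rollback();
>                     if (attempt >= maxAttempts) throw e;
>                     // otherwise retry; duplicate detection makes the retry safe
>                 }
>             }
>         }
>     }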
>
>
>
> You could also play with XA and a transaction manager.
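>
> If you try XA, the shape is roughly this (only a sketch -- it assumes a JTA
> TransactionManager is already available, e.g. from an application server or a
> standalone implementation, and the URL and queue name are placeholders):
>
>     import javax.jms.*;
>     import javax.transaction.TransactionManager;
>     import org.apache.activemq.artemis.jms.client.ActiveMQXAConnectionFactory;
>
>     public class XaSendSketch {
>         static void sendInXa(TransactionManager tm, String text) throws Exception {
>             XAConnectionFactory xacf =
>                     new ActiveMQXAConnectionFactory("tcp://localhost:61616?ha=true");
>             try (XAConnection conn = xacf.createXAConnection()) {
>                 XASession xaSession = conn.createXASession();
>                 Session session = xaSession.getSession();
>                 MessageProducer producer =
>                         session.createProducer(session.createQueue("exampleQueue"));
>                 conn.start();
>
>                 tm.begin();
>                 try {
>                     // enlist the JMS session so the send is part of the global tx
>                     tm.getTransaction().enlistResource(xaSession.getXAResource());
>                     producer.send(session.createTextMessage(text));
>                     tm.commit();   // two-phase commit across all enlisted resources
>                 } catch (Exception e) {
>                     tm.rollback();
>                     throw e;
>                 }
>             }
>         }
>     }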
>
>
>
> On Tue, Jul 17, 2018 at 5:01 PM Neha Sareen <neha.sar...@oracle.com> wrote:
>
> Hi,
>
>
>
> We are setting up a cluster of 6 brokers using Artemis 2.6.2.
>
>
>
> The cluster has 3 groups.
>
> - Each group has one master, and one slave broker pair.
>
> - The HA uses replication.
>
> - Each master broker configuration has the flag 'check-for-live-server' set
> to true.
>
> - Each slave broker configuration has the flag 'allow-failback' set to true.
>
> - We use static connectors for allowing cluster topology discovery.
>
> - Each broker's static connector list includes the connectors to the other 5
> servers in the cluster.
>
> - Each broker declares its acceptor.
>
> - Each broker exports its own connector information via the 'connector-ref'
> configuration element.
>
> - The acceptor and the connector URLs for each broker are identical with
> respect to the host and port information.
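>
> For illustration, broker.xml sections along these lines match the setup
> described above (a sketch, not the literal files; host names, connector names,
> and the group name are placeholders):
>
>     <connectors>
>        <connector name="broker1">tcp://host1:61616</connector>
>        <connector name="broker2">tcp://host2:61616</connector>
>        <!-- ...connectors for the other four brokers... -->
>     </connectors>
>
>     <acceptors>
>        <acceptor name="artemis">tcp://host1:61616</acceptor>
>     </acceptors>
>
>     <!-- on a master broker -->
>     <ha-policy>
>        <replication>
>           <master>
>              <group-name>group-1</group-name>
>              <check-for-live-server>true</check-for-live-server>
>           </master>
>        </replication>
>     </ha-policy>
>
>     <!-- on the matching slave broker -->
>     <ha-policy>
>        <replication>
>           <slave>
>              <group-name>group-1</group-name>
>              <allow-failback>true</allow-failback>
>           </slave>
>        </replication>
>     </ha-policy>
>
>     <cluster-connections>
>        <cluster-connection name="my-cluster">
>           <connector-ref>broker1</connector-ref>
>           <static-connectors>
>              <connector-ref>broker2</connector-ref>
>              <!-- ...the other four brokers... -->
>           </static-connectors>
>        </cluster-connection>
>     </cluster-connections>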
>
>
>
> We have a standalone test application that creates producers and consumers
> to write and receive messages, respectively, using a transacted JMS session.
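>
> The consumer side is essentially the following (simplified; the URL and queue
> name are placeholders), and the producer side likewise commits the session
> after each send:
>
>     import javax.jms.*;
>     import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;
>
>     public class TransactedConsumer {
>         public static void main(String[] args) throws Exception {
>             ConnectionFactory cf =
>                     new ActiveMQConnectionFactory("tcp://host1:61616?ha=true");
>             try (Connection conn = cf.createConnection()) {
>                 conn.start();
>                 Session session = conn.createSession(true, Session.SESSION_TRANSACTED);
>                 MessageConsumer consumer =
>                         session.createConsumer(session.createQueue("exampleQueue"));
>                 Message msg;
>                 while ((msg = consumer.receive(5000)) != null) {
>                     System.out.println("received: " + ((TextMessage) msg).getText());
>                     session.commit();   // acknowledge the message within the transaction
>                 }
>             }
>         }
>     }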
>
>
>
> We are trying to execute an automatic failover test case followed by a
> failback, as follows:
>
> Test Case - 1
>
> Step1: Master & Standby Alive
>
> Step2: Producer sends messages, say 9
>
> Step3: Kill Master
>
> Step4: Producer sends messages, say another 9
>
> Step5: Kill Standby
>
> Step6: Start Master
>
> Step7: Start Standby.
>
> What we see is that the Standby syncs with the Master, discarding its internal
> state, and we are able to consume only 9 messages, leading to a loss of 9
> messages.
>
>
>
>
>
> Test Case - 2
>
> Step1: Master & Standby Alive
>
> Step2: Producer sends messages
>
> Step3: Kill Master
>
> Step4: Producer sends messages
>
> Step5: Kill Standby
>
> Step6: Start Standby (it waits for the Master)
>
> Step7: Start Master (question: does it wait for the slave?)
>
> Step8: Consume messages
>
>
>
> Can someone provide any insights here regarding the potential message loss?
>
> Also, is there a different topology we could use to get around this issue?
>
>
>
> Thanks
>
> Neha
>
>
>
> --
>
> Clebert Suconic



-- 
Clebert Suconic
