Hi Justin,

Thanks for answering!

I did not describe the whole picture, just the part which caused issues. I set up 2 physical nodes, each running 1 master and 1 slave, crossed over so that each slave is on a different node from its master. I did not copy any journals; I created every instance separately with the artemis create command and then made the necessary changes to broker.xml. I started the masters first, and while looking at the logs I noticed that the master I started second took the role of slave.

BR!
Mihkel

On 20 October 2015 at 18:36, Justin Bertram <jbert...@apache.com> wrote:

> I'm not sure I understand the point of having an HA policy without any HA (i.e. without any backups). If you want 2 master servers then don't configure HA, just configure 2 clustered servers.
>
> Also, make sure you don't copy the journal from one server to another when configuring replication as the journal contains the unique ID of each node.
>
>
> Justin
>
> ----- Original Message -----
> From: "Mihkel Nõges" <mihkel.no...@transferwise.com>
> To: users@activemq.apache.org
> Sent: Tuesday, October 20, 2015 9:46:24 AM
> Subject: Re: [Artemis] Master fails to start up after failback
>
> Yes, sorry, had typo in email. Had replicated conf and second master became slave for the first.
>
> Mihkel
>
> On 20 October 2015 at 17:44, Justin Bertram <jbert...@apache.com> wrote:
>
> > You can't have 2 masters using the same shared-store. However, you can have 2 masters each with their own store.
> >
> >
> > Justin
> >
> > ----- Original Message -----
> > From: "Mihkel Nõges" <mihkel.no...@transferwise.com>
> > To: users@activemq.apache.org
> > Sent: Tuesday, October 20, 2015 9:24:21 AM
> > Subject: Re: [Artemis] Master fails to start up after failback
> >
> > Also I had a question earlier about having more than one Artemis master in a single cluster.
> > When I tried this it resulted in only one master becoming a master; the other one became a slave for the first one started, even though I set different group-name values for them in broker.xml. Is this expected behavior?
> >
> > <ha-policy>
> >    <shared-store>
> >       <master>
> >          <group-name>ha-cluster1</group-name>
> >       </master>
> >    </shared-store>
> > </ha-policy>
> >
> > <ha-policy>
> >    <shared-store>
> >       <master>
> >          <group-name>ha-cluster2</group-name>
> >       </master>
> >    </shared-store>
> > </ha-policy>
> >
> > Mihkel
> >
> > On 20 October 2015 at 16:53, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
> >
> > > Hi Tim, Clebert!
> > >
> > > Yes, we also considered the alternatives (http://activemq.apache.org/masterslave.html):
> > >
> > > *Shared Storage:*
> > >
> > > We do not have a high performance shared storage solution. We have a solution for our current file storage needs, but its I/O is said to be very slow and it would need to be extended to support extra load.
> > >
> > > *Replicated LevelDB:*
> > >
> > > It sounds cool, but I'm a little bit afraid of moving from one experimental solution to the next. I noticed LevelDB does not support some of the features we need, like scheduled message delivery:
> > > http://activemq.apache.org/replicated-leveldb-store.html
> > > The LevelDB store does not yet support storing data associated with Delay and Schedule Message Delivery. Those are are stored in a separate non-replicated KahaDB data files. Unexpected results will occur if you use Delay and Schedule Message Delivery with the replicated leveldb store since that data will be not be there when the master fails over to a slave.
> > >
> > > Notes like this make me feel very uneasy about the solution.
> > >
> > > *JDBC:*
> > >
> > > So it seems to me like the most reliable highly available messaging solution in ActiveMQ 5 is JDBC.
> > > We have MySQL running as our main DB, and setting up a second DB for messaging would be fairly simple with our standard procedures for maintenance, backups, disaster recovery etc.
> > >
> > >
> > > I consider this only a temporary solution until we can use a more performant alternative configuration, and I'm not expecting Artemis to implement support for JDBC storage ever.
> > >
> > > We are using messaging in the process of splitting our monolithic application into micro-services. As this is a gradual process, the amount of messages would be very small in the beginning, so having a low performing but reliable JDBC backed broker configuration seems good for a start.
> > >
> > > I was trying to find the more orthodox approach, but could not find or get good suggestions. I tried disabling fail-back, and starting the master like that resulted in both servers spamming the logs with warnings that another server with the same ID is running. Do I understand correctly that I should have backed up and removed the /data folder of the master, reconfigured it as a slave and started it then?
> > >
> > > Can you give me some overview of already existing deployments of highly available and failing over (not necessarily failing back) Artemis installations in production? I may change my mind about going with it from the start.
> > >
> > > Mihkel
> > >
> > >
> > > On 20 October 2015 at 16:19, Clebert Suconic <clebert.suco...@gmail.com> wrote:
> > >
> > >> As far as I know ActiveMQ5 doesn't do failback on the master-slave journal... and it doesn't have any protocol to sync the data between master and slave.
> > >>
> > >>
> > >> There is a small regression on the failback that we are dealing with now... if you set the master as a backup it would work fine...
> > >>
> > >>
> > >> I think your testcase is a bit non orthodox...
> > >>
> > >> TBH production guys usually don't use failback..
> > >> they keep the backup until they can get to a quiet period and then do the failback (or restart the system) under low load.
> > >>
> > >>
> > >> I also second Tim Bain on your choice for JDBC.
> > >>
> > >> I actually always say this.. if you can use JDBC as a storage for messaging.. don't use messaging at all.. just store and retrieve from the Database.
> > >>
> > >>
> > >> There's a JIRA open for Artemis on JDBC.. but usually those things are written because users want, not need it.
> > >>
> > >> On Tue, Oct 20, 2015 at 3:12 AM, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
> > >> > Yes, I saw that issue too and set myself as a watcher when it was created. I did not think it could be exactly the same, as it is described as presenting itself only in narrow timing related conditions. My case seems to be much more broad and basic. Seems like nobody actually tried to set this up in a realistic situation.
> > >> >
> > >> > Do you know of any existing production deployments of Artemis (or hornetq) with failover? I thought Artemis, as based on hornetq, should have its features as stable as the last hornetq version. We have already used embedded hornetq for some time happily. I think it would make a lot of sense to grade the Artemis features publicly by maturity, with usage statistics of each feature if known, so it would be easier to compare the brokers even among the 3 variants of the ActiveMQ family.
> > >> >
> > >> > I think it's safer for us to start building our first messaging features on ActiveMQ 5.12.1 with JDBC backed Master-Slave instead of Artemis, and switch to Artemis once it has become more stable and our needs for scalability have grown enough to make it reasonable.
> > >> > Right now it seems there are still too big blockers in Artemis which may threaten the stability of our system.
> > >> >
> > >> > I did not mean this letter to be negative by any means. On the contrary, I really hope Artemis does well, as it comes with such a great technical foundation and elegant ideas. I think the best for Artemis would be to find users that can trust its features and improve it as they grow. This means the nucleus of Artemis must be really solid and stable.
> > >> >
> > >> > BR!
> > >> > Mihkel Nõges
> > >> >
> > >> >
> > >> > On 19 October 2015 at 22:15, Clebert Suconic <clebert.suco...@gmail.com> wrote:
> > >> >
> > >> >> Looks related to me:
> > >> >>
> > >> >> https://issues.apache.org/jira/browse/ARTEMIS-256
> > >> >>
> > >> >>
> > >> >> On Mon, Oct 19, 2015 at 4:04 AM, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
> > >> >> > Basic flow of getting an unresponsive failback cluster:
> > >> >> > Have a machine with Ubuntu 14.04.3
> > >> >> >
> > >> >> >    1. Install libaio1, Java 1.8.0_60, maven 3.3.3, download and extract apache-artemis-1.1.0-bin <http://www.eu.apache.org/dist/activemq/activemq-artemis/1.1.0/apache-artemis-1.1.0-bin.tar.gz> in /opt
> > >> >> >    2. Run $ mvn -Prelease install and $ mvn verify in /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback
> > >> >> >       SUCCESS
> > >> >> >    3. Clean data folders and start both servers manually:
> > >> >> >       $ cd /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target
> > >> >> >       $ rm -R server0/data/
> > >> >> >       $ rm -R server1/data/
> > >> >> >       $ server0/bin/artemis-service start
> > >> >> >       Starting artemis-service
> > >> >> >       artemis-service is now running (7154)
> > >> >> >       $ server1/bin/artemis-service start
> > >> >> >       Starting artemis-service
> > >> >> >       artemis-service is now running (7180)
> > >> >> >    4. Kill master server and wait for slave to take over
> > >> >> >       $ kill -9 7154
> > >> >> >
> > >> >> >       $ tail -f server1/log/artemis.log
> > >> >> >       08:52:54,798 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
> > >> >> >       08:53:02,145 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, waiting live to fail before it gets active
> > >> >> >       08:53:03,582 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is synchronized with live-server.
> > >> >> >       08:53:03,777 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
> > >> >> >       08:55:59,292 INFO [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to become 'live'
> > >> >> >       08:55:59,302 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
> > >> >> >       08:55:59,778 INFO [org.apache.activemq.artemis.core.server] AMQ221003: trying to deploy queue jms.queue.exampleQueue
> > >> >> >       08:55:59,829 WARN [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=64ddff0f-7636-11e5-bfa8-f5004e6195f0
> > >> >> >       08:55:59,836 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
> > >> >> >       08:55:59,869 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started Acceptor at broker3:61617 for protocols [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE]
> > >> >> >    5. Start master again and observe the logs:
> > >> >> >       $ server0/bin/artemis-service start
> > >> >> >       Starting artemis-service
> > >> >> >       artemis-service is now running (7388)
> > >> >> >
> > >> >> >       $ tail -f server0/log/artemis.log
> > >> >> >       08:57:24,625 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
> > >> >> >       08:57:24,694 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
> > >> >> >       08:57:24,702 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
> > >> >> >       08:57:24,731 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
> > >> >> >       08:57:24,733 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
> > >> >> >       08:57:24,743 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
> > >> >> >       08:57:24,878 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
> > >> >> >       08:57:25,082 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, waiting live to fail before it gets active
> > >> >> >       08:57:27,043 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is synchronized with live-server.
> > >> >> >       08:57:27,948 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
> > >> >> >       08:57:31,227 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
> > >> >> >       08:57:31,252 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
> > >> >> >       08:57:31,307 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
> > >> >> >       08:57:31,339 INFO [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to become 'live'
> > >> >> >       08:57:31,360 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
> > >> >> >       08:57:31,413 ERROR [org.apache.activemq.artemis.core.server] AMQ224008: Failed to store id: java.lang.IllegalStateException: Cannot find add info 1
> > >> >> >       at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java:1784) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1625) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
> > >> >> >       08:57:31,540 WARN [org.apache.activemq.artemis.core.server] AMQ222173: Queue jms.queue.exampleQueue is duplicated during reload. This queue will be renamed as jms.queue.exampleQueue-0
> > >> >> >       08:57:31,550 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.IllegalStateException: Cursor 2 had already been created
> > >> >> >       at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1631) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >       at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
> > >> >> >
> > >> >> >
> > >> >> > On 19 October 2015 at 10:31, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
> > >> >> >
> > >> >> >> Hi Clebert,
> > >> >> >>
> > >> >> >> I do not have other code to share with you besides the example code in the Artemis 1.1.0 binary deployment package. I'm running
> > >> >> >> org.apache.activemq.artemis.jms.example.ReplicatedFailbackExample
> > >> >> >>
> > >> >> >> I only commented out the serverStart and killServer calls, which I am doing manually.
> > >> >> >>
> > >> >> >> I do not think I do any of the steps too fast, as I tail the server log files in parallel and see everything is finished when I start the fail back. I have waited many minutes in between.
> > >> >> >>
> > >> >> >> The only change in configuration to the test is replacing localhost addresses with broker3 to make the cluster accessible remotely.
> > >> >> >>
> > >> >> >> BR!
> > >> >> >> Mihkel
> > >> >> >>
> > >> >> >> On 18 October 2015 at 17:49, Clebert <clebert.suco...@gmail.com> wrote:
> > >> >> >>
> > >> >> >>> I'm not on my computer now but it sounds like you are doing a fail back immediately after failing over. It takes some time (seconds) for the server to activate on the backup.
> > >> >> >>>
> > >> >> >>> Later the server will need to copy the data back before it can be activated in fail back mode.
> > >> >> >>>
> > >> >> >>> It sounds like the live is not reaching the backup for fail back.
> > >> >> >>>
> > >> >> >>> I will try looking at it on Monday. Maybe you could post your example at your GitHub fork.
> > >> >> >>>
> > >> >> >>> -- Clebert Suconic typing on the iPhone.
> > >> >> >>>
> > >> >> >>> > On Oct 18, 2015, at 08:15, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
> > >> >> >>> >
> > >> >> >>> > Hello again!
> > >> >> >>> >
> > >> >> >>> > I would be very grateful if someone could answer my questions. We need the high availability to work to use the broker in production.
> > >> >> >>> >
> > >> >> >>> > When I run the replicated-failback example on one machine (broker3) it succeeds.
> > >> >> >>> >
> > >> >> >>> > It fails when I run the same test - exactly the same servers with a slightly modified client remotely.
> > >> >> >>> >
> > >> >> >>> > I run the client in debug mode from my IDE with the serverStart and killServer calls commented out.
> > >> >> >>> > Deleted data folders and started the servers:
> > >> >> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ rm -R server0/data/
> > >> >> >>> >
> > >> >> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ rm -R server1/data/
> > >> >> >>> >
> > >> >> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server0/bin/artemis-service start
> > >> >> >>> >
> > >> >> >>> > Starting artemis-service
> > >> >> >>> >
> > >> >> >>> > artemis-service is now running (23357)
> > >> >> >>> >
> > >> >> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server1/bin/artemis-service start
> > >> >> >>> >
> > >> >> >>> > Starting artemis-service
> > >> >> >>> >
> > >> >> >>> > artemis-service is now running (23383)
> > >> >> >>> >
> > >> >> >>> > Starting client and stopping on breakpoint at line 103:
> > >> >> >>> > //ServerUtil.killServer(server0);
> > >> >> >>> > // Step 11. Acknowledging the 2nd half of the sent messages will fail as failover to the
> > >> >> >>> > // backup server has occurred
> > >> >> >>> > try {
> > >> >> >>> > message0.acknowledge(); //line 103
> > >> >> >>> > Killing server0:
> > >> >> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ kill -9 23357
> > >> >> >>> >
> > >> >> >>> > Proceeding to breakpoint at line 121:
> > >> >> >>> > //server0 = ServerUtil.startServer(args[0], ReplicatedFailbackExample.class.getSimpleName() + "0", 0, 10000);
> > >> >> >>> >
> > >> >> >>> > // Step 11. Acknowledging the 2nd half of the sent messages will fail as failover to the
> > >> >> >>> > // backup server has occurred
> > >> >> >>> > try {
> > >> >> >>> > message0.acknowledge(); // line 121
> > >> >> >>> > Starting server0:
> > >> >> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server0/bin/artemis-service start
> > >> >> >>> >
> > >> >> >>> > Starting artemis-service
> > >> >> >>> >
> > >> >> >>> > artemis-service is now running (24240)
> > >> >> >>> >
> > >> >> >>> > Server0 writes ERROR to its log (see attached server0_artemis.log).
> > >> >> >>> > Now when trying to proceed with the client, it writes the following in the log and does not exit, but remains hanging forever:
> > >> >> >>> >
> > >> >> >>> > Oct 18, 2015 2:55:34 PM org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl fail
> > >> >> >>> >
> > >> >> >>> > WARN: AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 20 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got exception while acknowledging message: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 43
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 21 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 22 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 23 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 24 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 25 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 26 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 27 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 28 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 29 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > As a result the slave (server1) remains stopped, not restarted as expected, and the master (server0) process appears to be running but does not accept any connections.
> > >> >> >>> >
> > >> >> >>> > Exactly the same behavior is observable every time I try this.
> > >> >> >>> >
> > >> >> >>> > BR!
> > >> >> >>> > Mihkel
> > >> >> >>> >
> > >> >> >>> >> On 13 October 2015 at 20:17, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
> > >> >> >>> >> Hi Clebert,
> > >> >> >>> >>
> > >> >> >>> >> No test, just doing it on the command line with standalone servers. I'm using 1.1.0 installed with wget, not the snapshot.
> > >> >> >>> >>
> > >> >> >>> >> I'm wondering what the suggested procedure should be for admins making changes to an HA cluster of 2 or 3 Artemis nodes. If one of the nodes is master by configuration, do they need to change its config before restarting it so that it becomes a slave, to have a seamless change process, and make some instance master by configuration only if all the instances need to be restarted?
> > >> >> >>> >>
> > >> >> >>> >> I also tried a cluster with 2 masters and 2 slaves with 2 separate group-name values, but for some reason the second master I started immediately became a slave for the first. I expected it to become a clustered load-balancing parallel master. Our loads are not yet high enough to require more than one master, so it's just a theoretical question.
> > >> >> >>> >>
> > >> >> >>> >> BR!
> > >> >> >>> >> Mihkel
> > >> >> >>> >>
> > >> >> >>> >>> On 13 October 2015 at 20:03, Clebert Suconic <clebert.suco...@gmail.com> wrote:
> > >> >> >>> >>> The master needs to copy its data from the backup back to live before it's activated.
> > >> >> >>> >>>
> > >> >> >>> >>> Do you have a test replicating this?
> > >> >> >>> >>>
> > >> >> >>> >>> Did you try the snapshot build?
> > >> >> >>> >>>
> > >> >> >>> >>> On Tue, Oct 13, 2015 at 11:58 AM, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
> > >> >> >>> >>> > Hi,
> > >> >> >>> >>> >
> > >> >> >>> >>> > I configured replicating HA master-slave of Artemis 1.1.0 instances on Ubuntu 14.04.3.
> > >> >> >>> >>> >
> > >> >> >>> >>> > When I kill the master, the slave takes over as expected and starts serving as the new master. When I then start the old master, it fails with the following errors in the log:
> > >> >> >>> >>> >
> > >> >> >>> >>> > 16:35:46,476 ERROR [org.apache.activemq.artemis.core.server] AMQ224008: Failed to store id: java.lang.IllegalStateException: Cannot find add info 1
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java:1784) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1625) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
> > >> >> >>> >>> >
> > >> >> >>> >>> > 16:35:46,572 WARN [org.apache.activemq.artemis.core.server] AMQ222173: Queue jms.queue.DLQ is duplicated during reload. This queue will be renamed as jms.queue.DLQ-0
> > >> >> >>> >>> > 16:35:46,572 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.IllegalStateException: Cursor 2 had already been created
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1631) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
> > >> >> >>> >>> >
> > >> >> >>> >>> > As a result both the master and the slave remain inaccessible and no further restarts solve the situation.
> > >> >> >>> >>> >
> > >> >> >>> >>> > Attached also master and slave broker.xml files.
> > >> >> >>> >>> >
> > >> >> >>> >>> > BR!
> > >> >> >>> >>> >
> > >> >> >>> >>> > Mihkel Nõges
> > >> >> >>> >>>
> > >> >> >>> >>> --
> > >> >> >>> >>> Clebert Suconic
> > >> >>
> > >> >> --
> > >> >> Clebert Suconic
> > >>
> > >> --
> > >> Clebert Suconic
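
P.S. For reference, a minimal sketch of what the ha-policy sections for one crossed-over replicated pair might look like. This is an illustration under assumptions, not the broker.xml attached earlier in the thread: the group-name value is borrowed from the shared-store snippet quoted above, and the check-for-live-server and allow-failback values are assumed settings for a failback setup. The second pair would mirror this with its own group-name (for example ha-cluster2) and the nodes swapped.

```xml
<!-- Node A: broker.xml of the master instance (assumed layout) -->
<ha-policy>
   <replication>
      <master>
         <!-- pairs this master with the slave that uses the same group-name -->
         <group-name>ha-cluster1</group-name>
         <!-- on restart, look for a live server with the same node ID before activating -->
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>

<!-- Node B: broker.xml of the slave instance for the same pair (assumed layout) -->
<ha-policy>
   <replication>
      <slave>
         <group-name>ha-cluster1</group-name>
         <!-- hand control back when the original master returns -->
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>
```

Each fragment would sit inside the core element of that instance's own broker.xml, and each instance keeps its own data folder so that no node ID is shared by accident.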