You can't have 2 masters using the same shared-store. However, you can have 2 masters each with their own store.
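As a rough sketch of what "each with their own store" could look like, here are two broker.xml fragments, one per master. The /mnt/store-a and /mnt/store-b paths are hypothetical placeholders, and this illustrates the idea rather than a tested configuration; the point is only that the two masters must not share journal, bindings, paging or large-message directories.

```xml
<!-- broker.xml fragment for master A (paths are hypothetical) -->
<core xmlns="urn:activemq:core">
   <paging-directory>/mnt/store-a/paging</paging-directory>
   <bindings-directory>/mnt/store-a/bindings</bindings-directory>
   <journal-directory>/mnt/store-a/journal</journal-directory>
   <large-messages-directory>/mnt/store-a/large-messages</large-messages-directory>
   <ha-policy>
      <shared-store>
         <master/>
      </shared-store>
   </ha-policy>
</core>

<!-- broker.xml fragment for master B: same policy, different store (paths are hypothetical) -->
<core xmlns="urn:activemq:core">
   <paging-directory>/mnt/store-b/paging</paging-directory>
   <bindings-directory>/mnt/store-b/bindings</bindings-directory>
   <journal-directory>/mnt/store-b/journal</journal-directory>
   <large-messages-directory>/mnt/store-b/large-messages</large-messages-directory>
   <ha-policy>
      <shared-store>
         <master/>
      </shared-store>
   </ha-policy>
</core>
```

A slave for either master would then point at the same directories as the master it backs up.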
Justin

----- Original Message -----
From: "Mihkel Nõges" <mihkel.no...@transferwise.com>
To: users@activemq.apache.org
Sent: Tuesday, October 20, 2015 9:24:21 AM
Subject: Re: [Artemis] Master fails to start up after failback

Also, I had a question earlier about having more than one Artemis master in a single cluster. When I tried this, only one broker became a master; the other became a slave of the one started first, even though I set different group-name values for them in broker.xml. Is this expected behavior?

<ha-policy>
   <shared-store>
      <master>
         <group-name>ha-cluster1</group-name>
      </master>
   </shared-store>
</ha-policy>

<ha-policy>
   <shared-store>
      <master>
         <group-name>ha-cluster2</group-name>
      </master>
   </shared-store>
</ha-policy>

Mihkel

On 20 October 2015 at 16:53, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:

> Hi Tim, Clebert!
>
> Yes, we also considered the alternatives (http://activemq.apache.org/masterslave.html):
>
> *Shared Storage:*
>
> We do not have a high-performance shared storage solution. We have a solution for our current file storage needs, but its I/O is said to be very slow and it would need to be extended to support the extra load.
>
> *Replicated LevelDB:*
>
> It sounds cool, but I'm a little bit afraid of moving from one experimental solution to the next. I noticed LevelDB does not support some of the features we need, like scheduled message delivery:
> http://activemq.apache.org/replicated-leveldb-store.html
> The LevelDB store does not yet support storing data associated with Delay and Schedule Message Delivery. Those are stored in separate non-replicated KahaDB data files. Unexpected results will occur if you use Delay and Schedule Message Delivery with the replicated leveldb store since that data will not be there when the master fails over to a slave.
>
> Notes like this make me feel very uneasy about the solution.
>
> *JDBC:*
>
> So it seems to me that the most reliable highly available messaging solution in ActiveMQ 5 is JDBC. We have MySQL running as our main DB, and setting up a second DB for messaging would be fairly simple for standard procedures of maintenance, backups, disaster recovery etc.
>
> I consider this only a temporary solution until we can use a more performant alternative configuration, and I'm not expecting Artemis to ever implement support for JDBC storage.
>
> We are using messaging in the process of splitting our monolithic application into micro-services. As this is a gradual process, the number of messages will be very small in the beginning, so a low-performing but reliable JDBC-backed broker configuration seems good for a start.
>
> I was trying to find the more orthodox approach, but could not find or get good suggestions. I tried disabling fail-back, and starting the master like that resulted in both servers spamming the logs with messages that another server with the same ID is running. Do I understand correctly that I should have backed up and removed the /data folder of the master, reconfigured it as a slave and started it then?
>
> Can you give me some overview of existing production deployments of highly available Artemis installations that fail over (not necessarily fail back)? I may change my mind about going with it from the start.
>
> Mihkel
>
>
> On 20 October 2015 at 16:19, Clebert Suconic <clebert.suco...@gmail.com> wrote:
>
>> As far as I know ActiveMQ5 doesn't do failback on the master-slave journal... and it doesn't have any protocol to sync the data between master and slave.
>>
>> There is a small regression on the failback that we are dealing with now... if you set the master as a backup it would work fine...
>>
>> I think your testcase is a bit unorthodox...
>>
>> TBH production guys usually don't use failback..
>> they keep the backup until they can get to a quiet period and then do the failback (or restart the system) under low load.
>>
>> I also second Tim Bain on your choice for JDBC.
>>
>> I actually always say this.. if you can use JDBC as a storage for messaging.. don't use messaging at all.. just store and retrieve from the database.
>>
>> There's a JIRA open for Artemis on JDBC.. but usually those things are written because users want it, not because they need it.
>>
>> On Tue, Oct 20, 2015 at 3:12 AM, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
>> > Yes, I saw that issue too and set myself as a watcher when it was created. I did not think it could be exactly the same, as it is described as presenting itself only in narrow timing-related conditions. My case seems to be much more broad and basic. It seems like nobody has actually tried to set this up in a realistic situation.
>> >
>> > Do you know of any existing production deployments of Artemis (or hornetq) with failover? I thought Artemis, being based on hornetq, should have its features as stable as the last hornetq version. We have already used embedded hornetq happily for some time. I think it would make a lot of sense to publicly grade the Artemis features by maturity, with usage statistics for each feature if known, so it would be easier to compare the brokers even among the 3 variants of the ActiveMQ family.
>> >
>> > I think it's safer for us to start building our first messaging features on ActiveMQ 5.12.1 with JDBC-backed master-slave instead of Artemis, and to switch to Artemis once it has become more stable and our scalability needs have grown enough to make it reasonable. Right now it seems there are still blockers in Artemis big enough to threaten the stability of our system.
>> >
>> > I did not mean this letter to be in any way negative.
>> > On the contrary, I really hope Artemis does well, as it comes with such a great technical foundation and elegant ideas. I think the best thing for Artemis would be to find users that can trust its features and improve it as they grow. This means the nucleus of Artemis must be really solid and stable.
>> >
>> > BR!
>> > Mihkel Nõges
>> >
>> > On 19 October 2015 at 22:15, Clebert Suconic <clebert.suco...@gmail.com> wrote:
>> >
>> >> Looks related to me:
>> >>
>> >> https://issues.apache.org/jira/browse/ARTEMIS-256
>> >>
>> >> On Mon, Oct 19, 2015 at 4:04 AM, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
>> >> > Basic flow of getting unresponsive failback cluster:
>> >> > Have machine with Ubuntu 14.04.3
>> >> >
>> >> > 1. Install libaio1, Java 1.8.0_60, maven 3.3.3, download and extract apache-artemis-1.1.0-bin <http://www.eu.apache.org/dist/activemq/activemq-artemis/1.1.0/apache-artemis-1.1.0-bin.tar.gz> in /opt
>> >> > 2. Run $ mvn -Prelease install and $ mvn verify in /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback
>> >> > SUCCESS
>> >> > 3. Clean the data folders and start both servers manually:
>> >> > $ cd /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target
>> >> > $ rm -R server0/data/
>> >> > $ rm -R server1/data/
>> >> > $ server0/bin/artemis-service start
>> >> > Starting artemis-service
>> >> > artemis-service is now running (7154)
>> >> > $ server1/bin/artemis-service start
>> >> > Starting artemis-service
>> >> > artemis-service is now running (7180)
>> >> > 4. Kill master server and wait for slave to take over
>> >> > $ kill -9 7154
>> >> >
>> >> > $ tail -f server1/log/artemis.log
>> >> > 08:52:54,798 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol].
Adding protocol >> >> support >> >> > for: STOMP >> >> > 08:53:02,145 INFO [org.apache.activemq.artemis.core.server] >> >> AMQ221109: >> >> > Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] >> started, >> >> waiting >> >> > live to fail before it gets active >> >> > 08:53:03,582 INFO [org.apache.activemq.artemis.core.server] >> >> AMQ221024: >> >> > Backup server >> >> > >> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is >> >> > synchronized with live-server. >> >> > 08:53:03,777 INFO [org.apache.activemq.artemis.core.server] >> >> AMQ221031: >> >> > backup announced >> >> > 08:55:59,292 INFO [org.apache.activemq.artemis.core.server] >> >> AMQ221037: >> >> > >> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to >> >> > become 'live' >> >> > 08:55:59,302 WARN [org.apache.activemq.artemis.core.client] >> >> AMQ212004: >> >> > Failed to connect to server. >> >> > 08:55:59,778 INFO [org.apache.activemq.artemis.core.server] >> >> AMQ221003: >> >> > trying to deploy queue jms.queue.exampleQueue >> >> > 08:55:59,829 WARN [org.apache.activemq.artemis.core.client] >> >> AMQ212034: >> >> > There are more than one servers on the network broadcasting the >> same >> >> node >> >> > id. You will see this message exactly once (per node) if a node is >> >> > restarted, in which case it can be safely ignored. But if it is >> logged >> >> > continuously it means you really do have more than one node on the >> >> same >> >> > network active concurrently with the same node id. This could >> occur >> >> if you >> >> > have a backup node active at the same time as its live node. 
>> >> > nodeID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 >> >> > 08:55:59,836 INFO [org.apache.activemq.artemis.core.server] >> >> AMQ221007: >> >> > Server is now live >> >> > 08:55:59,869 INFO [org.apache.activemq.artemis.core.server] >> >> AMQ221020: >> >> > Started Acceptor at broker3:61617 for protocols >> >> > [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE] >> >> > 5. >> >> > >> >> > Start master again and observer the logs: >> >> > $ server0/bin/artemis-service start >> >> > Starting artemis-service >> >> > artemis-service is now running (7388) >> >> > >> >> > $ tail -f server0/log/artemis.log >> >> > 08:57:24,625 INFO [org.apache.activemq.artemis.core.server] >> AMQ221012: >> >> > Using AIO Journal >> >> > 08:57:24,694 INFO [org.apache.activemq.artemis.core.server] >> AMQ221043: >> >> > Protocol module found: [artemis-server]. Adding protocol support for: >> >> CORE >> >> > 08:57:24,702 INFO [org.apache.activemq.artemis.core.server] >> AMQ221043: >> >> > Protocol module found: [artemis-amqp-protocol]. Adding protocol >> support >> >> > for: AMQP >> >> > 08:57:24,731 INFO [org.apache.activemq.artemis.core.server] >> AMQ221043: >> >> > Protocol module found: [artemis-hornetq-protocol]. Adding protocol >> >> support >> >> > for: HORNETQ >> >> > 08:57:24,733 INFO [org.apache.activemq.artemis.core.server] >> AMQ221043: >> >> > Protocol module found: [artemis-mqtt-protocol]. Adding protocol >> support >> >> > for: MQTT >> >> > 08:57:24,743 INFO [org.apache.activemq.artemis.core.server] >> AMQ221043: >> >> > Protocol module found: [artemis-openwire-protocol]. Adding protocol >> >> support >> >> > for: OPENWIRE >> >> > 08:57:24,878 INFO [org.apache.activemq.artemis.core.server] >> AMQ221043: >> >> > Protocol module found: [artemis-stomp-protocol]. 
Adding protocol >> support >> >> > for: STOMP >> >> > 08:57:25,082 INFO [org.apache.activemq.artemis.core.server] >> AMQ221109: >> >> > Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, >> >> waiting >> >> > live to fail before it gets active >> >> > 08:57:27,043 INFO [org.apache.activemq.artemis.core.server] >> AMQ221024: >> >> > Backup server >> >> > ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 >> is >> >> > synchronized with live-server. >> >> > 08:57:27,948 INFO [org.apache.activemq.artemis.core.server] >> AMQ221031: >> >> > backup announced >> >> > 08:57:31,227 WARN [org.apache.activemq.artemis.core.client] >> AMQ212037: >> >> > Connection failure has been detected: AMQ119015: The connection was >> >> > disconnected because of server shutdown [code=DISCONNECTED] >> >> > 08:57:31,252 WARN [org.apache.activemq.artemis.core.client] >> AMQ212037: >> >> > Connection failure has been detected: AMQ119015: The connection was >> >> > disconnected because of server shutdown [code=DISCONNECTED] >> >> > 08:57:31,307 WARN [org.apache.activemq.artemis.core.client] >> AMQ212037: >> >> > Connection failure has been detected: AMQ119015: The connection was >> >> > disconnected because of server shutdown [code=DISCONNECTED] >> >> > 08:57:31,339 INFO [org.apache.activemq.artemis.core.server] >> AMQ221037: >> >> > ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 >> to >> >> > become 'live' >> >> > 08:57:31,360 WARN [org.apache.activemq.artemis.core.client] >> AMQ212004: >> >> > Failed to connect to server. 
>> >> > 08:57:31,413 ERROR [org.apache.activemq.artemis.core.server] >> AMQ224008: >> >> > Failed to store id: java.lang.IllegalStateException: Cannot find add >> >> info 1 >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) >> >> > [artemis-journal-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) >> >> > [artemis-journal-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) >> >> > [artemis-journal-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java: >> 1784) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: >> 1625) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: >> 1535) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> 
>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] >> >> > 08:57:31,540 WARN [org.apache.activemq.artemis.core.server] >> AMQ222173: >> >> > Queue jms.queue.exampleQueue is duplicated during reload. This queue >> will >> >> > be renamed as jms.queue.exampleQueue-0 >> >> > 08:57:31,550 ERROR [org.apache.activemq.artemis.core.server] >> AMQ224000: >> >> > Failure in initialisation: java.lang.IllegalStateException: Cursor 2 >> had >> >> > already been created >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: >> 1631) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: >> 1535) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at >> >> > >> >> >> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) >> >> > [artemis-server-1.1.0.jar:1.1.0] >> >> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] >> >> > >> >> > >> >> > On 19 October 2015 at 10:31, Mihkel Nõges < >> mihkel.no...@transferwise.com >> >> > >> >> > wrote: >> >> > >> >> >> Hi Clebert, >> >> >> >> >> >> I do not have other code to share with you but the example code in >> >> Artemis >> >> >> 1.1.0 binary deployment package. I'm running >> >> >> org.apache.activemq.artemis.jms.example.ReplicatedFailbackExample >> >> >> >> >> >> And only commented out the serverStart and killServer calls which I >> am >> >> >> doing manually. 
>> >> >> >> >> >> I do not think I do any of the steps too fast as I tail the server >> log >> >> >> files in parallel and see everything is finished when I start the >> fail >> >> >> back. I have waited many minutes in between. >> >> >> >> >> >> Only changes in configuration to the test is changing localhost >> >> addresses >> >> >> with broker3 to make the cluster accessible remotely. >> >> >> >> >> >> BR! >> >> >> MIhkel >> >> >> >> >> >> On 18 October 2015 at 17:49, Clebert <clebert.suco...@gmail.com> >> wrote: >> >> >> >> >> >>> Im not on my computer now but it sounds like you are doing a fail >> back >> >> >>> immediately after failed over. It takes some time (seconds) to the >> >> server >> >> >>> to activate on the backup. >> >> >>> >> >> >>> Later the server will need to copy the data back before it can be >> >> >>> activated in fail back mode. >> >> >>> >> >> >>> It sounds the live is not reaching backup for fail back. >> >> >>> >> >> >>> I will try looking it at it on Monday. Maybe you could post your >> >> example >> >> >>> at your GitHub fork. >> >> >>> >> >> >>> -- Clebert Suconic typing on the iPhone. >> >> >>> >> >> >>> > On Oct 18, 2015, at 08:15, Mihkel Nõges < >> >> mihkel.no...@transferwise.com> >> >> >>> wrote: >> >> >>> > >> >> >>> > Hello again! >> >> >>> > >> >> >>> > I would be very grateful If someone could answer my questions. We >> >> need >> >> >>> the high availability to work to use the broker in production. >> >> >>> > >> >> >>> > When I run the replicated-failback example in one machine >> (broker3) >> >> it >> >> >>> succeeds. >> >> >>> > >> >> >>> > It fails when I run the same test - exactly the same servers with >> >> >>> slightly modified client remotely. >> >> >>> > >> >> >>> > I run client in debug mode from my IDE with commented out >> serverStart >> >> >>> and killServer calls. 
>> >> >>> > Deleted data folders and started the servers: >> >> >>> > artemis@broker3 >> >> >> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >> >> >>> rm -R server0/data/ >> >> >>> > >> >> >>> > artemis@broker3 >> >> >> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >> >> >>> rm -R server1/data/ >> >> >>> > >> >> >>> > artemis@broker3 >> >> >> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >> >> >>> server0/bin/artemis-service start >> >> >>> > >> >> >>> > Starting artemis-service >> >> >>> > >> >> >>> > artemis-service is now running (23357) >> >> >>> > >> >> >>> > artemis@broker3 >> >> >> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >> >> >>> server1/bin/artemis-service start >> >> >>> > >> >> >>> > Starting artemis-service >> >> >>> > >> >> >>> > artemis-service is now running (23383) >> >> >>> > >> >> >>> > Starting client and stopping on breakpoint at line 103: >> >> >>> > //ServerUtil.killServer(server0); >> >> >>> > // Step 11. Acknowledging the 2nd half of the sent messages will >> fail >> >> >>> as failover to the >> >> >>> > // backup server has occurred >> >> >>> > try { >> >> >>> > message0.acknowledge(); //line 103 >> >> >>> > killing server0 >> >> >>> > artemis@broker3 >> >> >> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >> >> >>> kill -9 23357 >> >> >>> > >> >> >>> > Proceeding to breakpoint at line 121: >> >> >>> > //server0 = ServerUtil.startServer(args[0], >> >> >>> ReplicatedFailbackExample.class.getSimpleName() + "0", 0, 10000); >> >> >>> > >> >> >>> > // Step 11. 
Acknowledging the 2nd half of the sent messages will >> fail >> >> >>> as failover to the >> >> >>> > // backup server has occurred >> >> >>> > try { >> >> >>> > message0.acknowledge(); // line 121 >> >> >>> > Starting server0: >> >> >>> > artemis@broker3 >> >> >> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >> >> >>> server0/bin/artemis-service start >> >> >>> > >> >> >>> > Starting artemis-service >> >> >>> > >> >> >>> > artemis-service is now running (24240) >> >> >>> > >> >> >>> > Server0 writes ERROR to it's log (see attached >> server0_artemis.log). >> >> >>> > Now when trying to proceed with the client it writes the >> following in >> >> >>> the log and does not exit, but remains hanging forever: >> >> >>> > >> >> >>> > Oct 18, 2015 2:55:34 PM >> >> >>> >> >> >> org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl >> >> >>> fail >> >> >>> > >> >> >>> > WARN: AMQ212037: Connection failure has been detected: >> AMQ119015: The >> >> >>> connection was disconnected because of server shutdown >> >> [code=DISCONNECTED] >> >> >>> > >> >> >>> > Got message: This is text message 20 (redelivered?: false) >> >> >>> > >> >> >>> > Got exception while acknowledging message: AMQ119014: Timed out >> after >> >> >>> waiting 30,000 ms for response when sending packet 43 >> >> >>> > >> >> >>> > Got message: This is text message 21 (redelivered?: false) >> >> >>> > >> >> >>> > Got message: This is text message 22 (redelivered?: false) >> >> >>> > >> >> >>> > Got message: This is text message 23 (redelivered?: false) >> >> >>> > >> >> >>> > Got message: This is text message 24 (redelivered?: false) >> >> >>> > >> >> >>> > Got message: This is text message 25 (redelivered?: false) >> >> >>> > >> >> >>> > Got message: This is text message 26 (redelivered?: false) >> >> >>> > >> >> >>> > Got message: This is text message 27 (redelivered?: false) >> >> >>> > >> >> >>> > Got message: This is text message 28 (redelivered?: false) >> 
>> >>> > >> >> >>> > Got message: This is text message 29 (redelivered?: false) >> >> >>> > >> >> >>> > As a result the slave (server1) remains stopped, not restarted as >> >> >>> expected and the master (server0) process appears to be running but >> >> does >> >> >>> not accept any connections. >> >> >>> > >> >> >>> > Exactly the same behavior is observable every time I try this. >> >> >>> > >> >> >>> > BR! >> >> >>> > Mihkel >> >> >>> > >> >> >>> >> On 13 October 2015 at 20:17, Mihkel Nõges < >> >> >>> mihkel.no...@transferwise.com> wrote: >> >> >>> >> Hi Clebert, >> >> >>> >> >> >> >>> >> No test, just doing it on command line with standalone servers. >> I'm >> >> >>> using 1.1.0 installed with wget, not the snapshot. >> >> >>> >> >> >> >>> >> I'm wondering what should be the suggested procedure for admins >> to >> >> do >> >> >>> changes to HA cluster of 2 or 3 nodes of Artemis. If one of the >> nodes >> >> is >> >> >>> master by configuration, do they need to change it's config before >> >> >>> restarting it to become slave to have seamless change process and >> make >> >> some >> >> >>> instance master by configuration only if all the instances need to >> be >> >> >>> restarted? >> >> >>> >> >> >> >>> >> I tried also a cluster with 2 masters and 2 slaves with 2 >> separate >> >> >>> group-name values, but for some reason the second master I started >> >> became >> >> >>> slave for the first immediately. I expected it to become a >> clustered >> >> >>> load-balancing parallel master. Our loads are not yet that high to >> >> require >> >> >>> more than one master, so it's just a theoretical question. >> >> >>> >> >> >> >>> >> BR! >> >> >>> >> Mihkel >> >> >>> >> >> >> >>> >>> On 13 October 2015 at 20:03, Clebert Suconic < >> >> >>> clebert.suco...@gmail.com> wrote: >> >> >>> >>> The master needs to copy its data from the backup back to live >> >> before >> >> >>> >>> it's activated. >> >> >>> >>> >> >> >>> >>> Do you have a test replicating this? 
>> >> >>> >>> >> >> >>> >>> Did you try the snapshot build? >> >> >>> >>> >> >> >>> >>> On Tue, Oct 13, 2015 at 11:58 AM, Mihkel Nõges >> >> >>> >>> <mihkel.no...@transferwise.com> wrote: >> >> >>> >>> > Hi, >> >> >>> >>> > >> >> >>> >>> > I configured replicating HA master-slave of Artemis 1.1.0 >> >> instances >> >> >>> on >> >> >>> >>> > Ubuntu 14.04.3. >> >> >>> >>> > >> >> >>> >>> > When I kill master the slave takes over as expected and >> starts >> >> >>> serving as >> >> >>> >>> > new master. When I then start the old master, it fails with >> the >> >> >>> following >> >> >>> >>> > errors in the log: >> >> >>> >>> > >> >> >>> >>> > 16:35:46,476 ERROR [org.apache.activemq.artemis.core.server] >> >> >>> AMQ224008: >> >> >>> >>> > Failed to store id: java.lang.IllegalStateException: Cannot >> find >> >> >>> add info 1 >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> 
org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java: >> >> >>> 1784) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: >> >> >>> 1625) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: >> >> >>> 1535) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] >> >> >>> >>> > >> >> >>> >>> > 16:35:46,572 WARN [org.apache.activemq.artemis.core.server] >> >> >>> AMQ222173: >> >> >>> >>> > Queue jms.queue.DLQ is duplicated during reload. 
This queue >> will >> >> be >> >> >>> renamed >> >> >>> >>> > as jms.queue.DLQ-0 >> >> >>> >>> > 16:35:46,572 ERROR [org.apache.activemq.artemis.core.server] >> >> >>> AMQ224000: >> >> >>> >>> > Failure in initialisation: java.lang.IllegalStateException: >> >> Cursor >> >> >>> 2 had >> >> >>> >>> > already been created >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: >> >> >>> 1631) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: >> >> >>> 1535) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at >> >> >>> >>> > >> >> >>> >> >> >> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >> >> >>> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] >> >> >>> >>> > >> >> >>> >>> > As a result both master and the slave remain unaccessible >> and no >> >> >>> further >> >> >>> >>> > restarts solve the situation. >> >> >>> >>> > >> >> >>> >>> > Attached also master and slave broker.xml files. >> >> >>> >>> > >> >> >>> >>> > BR! 
>> >> >>> >>> >
>> >> >>> >>> > Mihkel Nõges
>> >> >>> >>>
>> >> >>> >>> --
>> >> >>> >>> Clebert Suconic
>> >>
>> >> --
>> >> Clebert Suconic
>>
>> --
>> Clebert Suconic