Yes, sorry, I had a typo in my email. I had the replicated configuration, and the second master became a slave for the first.
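For reference, assuming "replicated conf" here means the replication ha-policy rather than the shared-store one pasted in the earlier mail quoted below, the first master's snippet would look roughly like this. It is a sketch, not a copy of the actual broker.xml: only the policy element is swapped, and the second master would carry ha-cluster2 as its group-name.

```xml
<!-- Sketch only: the quoted snippet with shared-store swapped for replication -->
<ha-policy>
   <replication>
      <master>
         <group-name>ha-cluster1</group-name>
      </master>
   </replication>
</ha-policy>
```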
Mihkel

On 20 October 2015 at 17:44, Justin Bertram <jbert...@apache.com> wrote:

> You can't have 2 masters using the same shared-store. However, you can
> have 2 masters each with their own store.
>
>
> Justin
>
> ----- Original Message -----
> From: "Mihkel Nõges" <mihkel.no...@transferwise.com>
> To: users@activemq.apache.org
> Sent: Tuesday, October 20, 2015 9:24:21 AM
> Subject: Re: [Artemis] Master fails to start up after failback
>
> Also I had a question earlier about having more than one Artemis master in
> single cluster. When I tried this it resulted in only one master becoming a
> master, the other one became a slave for the first one started even though
> I set different group-name values for them in broker.xml. Is this expected
> behavior?
>
> <ha-policy>
>    <shared-store>
>       <master>
>          <group-name>ha-cluster1</group-name>
>       </master>
>    </shared-store>
> </ha-policy>
>
> <ha-policy>
>    <shared-store>
>       <master>
>          <group-name>ha-cluster2</group-name>
>       </master>
>    </shared-store>
> </ha-policy>
>
> Mihkel
>
> On 20 October 2015 at 16:53, Mihkel Nõges <mihkel.no...@transferwise.com>
> wrote:
>
> > Hi Tim, Clebert!
> >
> > Yes we considered also the alternatives
> > (http://activemq.apache.org/masterslave.html):
> > *Shared Storage:*
> >
> > We do not have high performance shared storage solution. We have some
> > solution for our current file storage needs, but it's I/O is said to be
> > very slow and would need to be extended to support extra load.
> >
> > *Replicated LevelDB:*
> >
> > It sounds cool, but I'm a little bit afraid of moving from one
> > experimental solution to the next. I noticed LevelDB does not support some
> > of the features we need like Scheduled message delivery:
> > http://activemq.apache.org/replicated-leveldb-store.html
> > The LevelDB store does not yet support storing data associated with Delay
> > and Schedule Message Delivery. Those are are stored in a separate
> > non-replicated KahaDB data files.
> > Unexpected results will occur if you use Delay and Schedule Message
> > Delivery with the replicated leveldb store since that data will be not be
> > there when the master fails over to a slave.
> >
> > Note like this make me feel very uneasy about the solution.
> >
> > *JDBC:*
> >
> > So it seems to me like the most reliable highly available messaging
> > solution in ActiveMQ 5 is JDBC. We have MySQL running as our main DB and
> > setting up a second DB for messaging would be fairly simple for standard
> > procedures of maintenance, backups and disaster recovery etc.
> >
> >
> > I consider this only as a temporary solution until we can use more
> > performant alternative configuration and I'm not expecting Artemis to
> > implement support for JDBC storage ever.
> >
> > We are using messaging in process of splitting our monolithic application
> > into micro-services. As this is gradual process, the amount of messages
> > would be very small in the beginning, so having low performing but
> > reliable JDBC backed broker configuration seems good for start.
> >
> > I was trying to find the more orthodox approach, but could not find or
> > get good suggestions. I tried disabling fail-back and starting master
> > like that resulted in both servers spamming in the logs another server
> > with the same ID is running. Do I understand correctly I should have
> > backed up and removed the /data folder of the master, reconfigured it as
> > a slave and started it then?
> >
> > Can you give me some overview of already existing deployments of highly
> > available and failing over (not necessarily failing back) Artemis
> > installations in production I may change my mind about going with it from
> > the start.
> >
> > Mihkel
> >
> >
> > On 20 October 2015 at 16:19, Clebert Suconic <clebert.suco...@gmail.com>
> > wrote:
> >
> >> As far as I know ActiveMQ5 doesn't do failback on the master-slave
> >> journal...
> >> and it doesn't have any protocol to sync the data between
> >> master and slave.
> >>
> >>
> >> There is a small regression on the failback that we are dealing now...
> >> if you set the master as a backup it would work fine...
> >>
> >>
> >> I think your testcase is a bit non orthodox...
> >>
> >> TBH production guys usually don't use failback.. they keep the backup
> >> until they can get to a quiet period and then do the failback (or
> >> restart the system) under low load.
> >>
> >>
> >> I also second Tim Bain on your choice for JDBC.
> >>
> >> I actually always say this.. if you can use JDBC as a storage for
> >> messaging.. don't use messaging at all.. just store and retrieve from
> >> the Database.
> >>
> >>
> >> There's a JIRA open for Artemis on JDBC.. but usually those things are
> >> written because users want, not need it.
> >>
> >> On Tue, Oct 20, 2015 at 3:12 AM, Mihkel Nõges
> >> <mihkel.no...@transferwise.com> wrote:
> >> > Yes I saw that issue too and set myself as watcher of this when it was
> >> > created. I did not think it could be exactly the same as it is
> >> > described to present itself only in narrow timing related conditions.
> >> > My case seems to be much more broad and basic. Seems like nobody
> >> > actually tried to set this up in realistic situation.
> >> >
> >> > Do you know of any existing production deployments of Artemis (or
> >> > hornetq) with failover? I thought Artemis as based on hornetq should
> >> > have it's features as stable as last hornetq version. We have already
> >> > used embedded hornetq for some time happily. I think it would make a
> >> > lot of sense to grade the Artemis features publicly as what is their
> >> > maturity and usage statistics of each feature if known, so it would be
> >> > easier to compare the brokers even among the 3 variants of ActiveMQ
> >> > family.
> >> > > >> > I think it's more safe for us to start building our first messaging > >> > features on ActiveMQ 5.12.1 with JDBC backed Master-Slave instead of > >> > Artemis and switch to Artemis once it has become more stable and also > >> our > >> > needs for scalability have grown to make it reasonable. Right now it > >> seems > >> > there are still too big blockers which may threaten the stability of > our > >> > system in Artemis. > >> > > >> > I did not mean this letter to be in no means negative. In the > opposite I > >> > really hope Artemis would do well as it comes with such a great > >> technical > >> > foundation and elegant ideas. I think the best for Artemis would be to > >> find > >> > users that can trust it's features and improve it as they grow. This > >> means > >> > the nucleus of Artemis must be really solid and stable. > >> > > >> > BR! > >> > Mihkel Nõges > >> > > >> > > >> > > >> > On 19 October 2015 at 22:15, Clebert Suconic < > clebert.suco...@gmail.com > >> > > >> > wrote: > >> > > >> >> Looks related to me: > >> >> > >> >> https://issues.apache.org/jira/browse/ARTEMIS-256 > >> >> > >> >> > >> >> > >> >> On Mon, Oct 19, 2015 at 4:04 AM, Mihkel Nõges > >> >> <mihkel.no...@transferwise.com> wrote: > >> >> > Basic flow of getting unresponsive failback cluster: > >> >> > Have machine with Ubuntu 14.04.3 > >> >> > > >> >> > 1. Install libaio1, Java 1.8.0_60, maven 3.3.3, download and > >> extract > >> >> > apache-artemis-1.1.0-bin > >> >> > < > >> >> > >> > http://www.eu.apache.org/dist/activemq/activemq-artemis/1.1.0/apache-artemis-1.1.0-bin.tar.gz > >> >> > > >> >> > in > >> >> > /opt > >> >> > 2. run $ mvn -Prelease install and $ mnv verify in > >> >> > > >> /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback > >> >> > SUCCESS > >> >> > 3. 
Clean data folders and starts both servers manually: > >> >> > $ > >> >> > cd > >> >> > >> > /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target > >> >> > $ rm -R server0/data/ > >> >> > $ rm -R server1/data/ > >> >> > $ server0/bin/artemis-service start > >> >> > Starting artemis-service > >> >> > artemis-service is now running (7154) > >> >> > $ server1/bin/artemis-service start > >> >> > Starting artemis-service > >> >> > artemis-service is now running (7180) > >> >> > 4. Kill master server and wait for slave to take over > >> >> > $ kill -9 7154 > >> >> > > >> >> > $ tail -f server1/log/artemis.log > >> >> > 08:52:54,798 INFO [org.apache.activemq.artemis.core.server] > >> >> AMQ221043: > >> >> > Protocol module found: [artemis-stomp-protocol]. Adding protocol > >> >> support > >> >> > for: STOMP > >> >> > 08:53:02,145 INFO [org.apache.activemq.artemis.core.server] > >> >> AMQ221109: > >> >> > Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] > >> started, > >> >> waiting > >> >> > live to fail before it gets active > >> >> > 08:53:03,582 INFO [org.apache.activemq.artemis.core.server] > >> >> AMQ221024: > >> >> > Backup server > >> >> > > >> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is > >> >> > synchronized with live-server. > >> >> > 08:53:03,777 INFO [org.apache.activemq.artemis.core.server] > >> >> AMQ221031: > >> >> > backup announced > >> >> > 08:55:59,292 INFO [org.apache.activemq.artemis.core.server] > >> >> AMQ221037: > >> >> > > >> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to > >> >> > become 'live' > >> >> > 08:55:59,302 WARN [org.apache.activemq.artemis.core.client] > >> >> AMQ212004: > >> >> > Failed to connect to server. 
> >> >> > 08:55:59,778 INFO [org.apache.activemq.artemis.core.server] > >> >> AMQ221003: > >> >> > trying to deploy queue jms.queue.exampleQueue > >> >> > 08:55:59,829 WARN [org.apache.activemq.artemis.core.client] > >> >> AMQ212034: > >> >> > There are more than one servers on the network broadcasting the > >> same > >> >> node > >> >> > id. You will see this message exactly once (per node) if a node > is > >> >> > restarted, in which case it can be safely ignored. But if it is > >> logged > >> >> > continuously it means you really do have more than one node on > the > >> >> same > >> >> > network active concurrently with the same node id. This could > >> occur > >> >> if you > >> >> > have a backup node active at the same time as its live node. > >> >> > nodeID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 > >> >> > 08:55:59,836 INFO [org.apache.activemq.artemis.core.server] > >> >> AMQ221007: > >> >> > Server is now live > >> >> > 08:55:59,869 INFO [org.apache.activemq.artemis.core.server] > >> >> AMQ221020: > >> >> > Started Acceptor at broker3:61617 for protocols > >> >> > [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE] > >> >> > 5. > >> >> > > >> >> > Start master again and observer the logs: > >> >> > $ server0/bin/artemis-service start > >> >> > Starting artemis-service > >> >> > artemis-service is now running (7388) > >> >> > > >> >> > $ tail -f server0/log/artemis.log > >> >> > 08:57:24,625 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221012: > >> >> > Using AIO Journal > >> >> > 08:57:24,694 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221043: > >> >> > Protocol module found: [artemis-server]. Adding protocol support > for: > >> >> CORE > >> >> > 08:57:24,702 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221043: > >> >> > Protocol module found: [artemis-amqp-protocol]. 
Adding protocol > >> support > >> >> > for: AMQP > >> >> > 08:57:24,731 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221043: > >> >> > Protocol module found: [artemis-hornetq-protocol]. Adding protocol > >> >> support > >> >> > for: HORNETQ > >> >> > 08:57:24,733 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221043: > >> >> > Protocol module found: [artemis-mqtt-protocol]. Adding protocol > >> support > >> >> > for: MQTT > >> >> > 08:57:24,743 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221043: > >> >> > Protocol module found: [artemis-openwire-protocol]. Adding protocol > >> >> support > >> >> > for: OPENWIRE > >> >> > 08:57:24,878 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221043: > >> >> > Protocol module found: [artemis-stomp-protocol]. Adding protocol > >> support > >> >> > for: STOMP > >> >> > 08:57:25,082 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221109: > >> >> > Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, > >> >> waiting > >> >> > live to fail before it gets active > >> >> > 08:57:27,043 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221024: > >> >> > Backup server > >> >> > ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 > >> is > >> >> > synchronized with live-server. 
> >> >> > 08:57:27,948 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221031: > >> >> > backup announced > >> >> > 08:57:31,227 WARN [org.apache.activemq.artemis.core.client] > >> AMQ212037: > >> >> > Connection failure has been detected: AMQ119015: The connection was > >> >> > disconnected because of server shutdown [code=DISCONNECTED] > >> >> > 08:57:31,252 WARN [org.apache.activemq.artemis.core.client] > >> AMQ212037: > >> >> > Connection failure has been detected: AMQ119015: The connection was > >> >> > disconnected because of server shutdown [code=DISCONNECTED] > >> >> > 08:57:31,307 WARN [org.apache.activemq.artemis.core.client] > >> AMQ212037: > >> >> > Connection failure has been detected: AMQ119015: The connection was > >> >> > disconnected because of server shutdown [code=DISCONNECTED] > >> >> > 08:57:31,339 INFO [org.apache.activemq.artemis.core.server] > >> AMQ221037: > >> >> > ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 > >> to > >> >> > become 'live' > >> >> > 08:57:31,360 WARN [org.apache.activemq.artemis.core.client] > >> AMQ212004: > >> >> > Failed to connect to server. 
> >> >> > 08:57:31,413 ERROR [org.apache.activemq.artemis.core.server] > >> AMQ224008: > >> >> > Failed to store id: java.lang.IllegalStateException: Cannot find > add > >> >> info 1 > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) > >> >> > [artemis-journal-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) > >> >> > [artemis-journal-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) > >> >> > [artemis-journal-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java: > >> 1784) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: > >> 1625) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: > >> 1535) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > 
org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] > >> >> > 08:57:31,540 WARN [org.apache.activemq.artemis.core.server] > >> AMQ222173: > >> >> > Queue jms.queue.exampleQueue is duplicated during reload. This > queue > >> will > >> >> > be renamed as jms.queue.exampleQueue-0 > >> >> > 08:57:31,550 ERROR [org.apache.activemq.artemis.core.server] > >> AMQ224000: > >> >> > Failure in initialisation: java.lang.IllegalStateException: Cursor > 2 > >> had > >> >> > already been created > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: > >> 1631) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: > >> 1535) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at > >> >> > > >> >> > >> > org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) > >> >> > [artemis-server-1.1.0.jar:1.1.0] > >> >> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] > >> >> > > >> >> > > >> >> > On 19 October 2015 at 10:31, Mihkel Nõges < > >> mihkel.no...@transferwise.com > >> >> > > >> >> > wrote: > >> >> > > >> >> >> Hi Clebert, > >> >> >> > >> >> >> I do not have other code to share with you but the example code in > >> >> 
Artemis > >> >> >> 1.1.0 binary deployment package. I'm running > >> >> >> org.apache.activemq.artemis.jms.example.ReplicatedFailbackExample > >> >> >> > >> >> >> And only commented out the serverStart and killServer calls which > I > >> am > >> >> >> doing manually. > >> >> >> > >> >> >> I do not think I do any of the steps too fast as I tail the server > >> log > >> >> >> files in parallel and see everything is finished when I start the > >> fail > >> >> >> back. I have waited many minutes in between. > >> >> >> > >> >> >> Only changes in configuration to the test is changing localhost > >> >> addresses > >> >> >> with broker3 to make the cluster accessible remotely. > >> >> >> > >> >> >> BR! > >> >> >> MIhkel > >> >> >> > >> >> >> On 18 October 2015 at 17:49, Clebert <clebert.suco...@gmail.com> > >> wrote: > >> >> >> > >> >> >>> Im not on my computer now but it sounds like you are doing a fail > >> back > >> >> >>> immediately after failed over. It takes some time (seconds) to > the > >> >> server > >> >> >>> to activate on the backup. > >> >> >>> > >> >> >>> Later the server will need to copy the data back before it can be > >> >> >>> activated in fail back mode. > >> >> >>> > >> >> >>> It sounds the live is not reaching backup for fail back. > >> >> >>> > >> >> >>> I will try looking it at it on Monday. Maybe you could post your > >> >> example > >> >> >>> at your GitHub fork. > >> >> >>> > >> >> >>> -- Clebert Suconic typing on the iPhone. > >> >> >>> > >> >> >>> > On Oct 18, 2015, at 08:15, Mihkel Nõges < > >> >> mihkel.no...@transferwise.com> > >> >> >>> wrote: > >> >> >>> > > >> >> >>> > Hello again! > >> >> >>> > > >> >> >>> > I would be very grateful If someone could answer my questions. > We > >> >> need > >> >> >>> the high availability to work to use the broker in production. > >> >> >>> > > >> >> >>> > When I run the replicated-failback example in one machine > >> (broker3) > >> >> it > >> >> >>> succeeds. 
> >> >> >>> > > >> >> >>> > It fails when I run the same test - exactly the same servers > with > >> >> >>> slightly modified client remotely. > >> >> >>> > > >> >> >>> > I run client in debug mode from my IDE with commented out > >> serverStart > >> >> >>> and killServer calls. > >> >> >>> > Deleted data folders and started the servers: > >> >> >>> > artemis@broker3 > >> >> > >> > :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ > >> >> >>> rm -R server0/data/ > >> >> >>> > > >> >> >>> > artemis@broker3 > >> >> > >> > :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ > >> >> >>> rm -R server1/data/ > >> >> >>> > > >> >> >>> > artemis@broker3 > >> >> > >> > :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ > >> >> >>> server0/bin/artemis-service start > >> >> >>> > > >> >> >>> > Starting artemis-service > >> >> >>> > > >> >> >>> > artemis-service is now running (23357) > >> >> >>> > > >> >> >>> > artemis@broker3 > >> >> > >> > :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ > >> >> >>> server1/bin/artemis-service start > >> >> >>> > > >> >> >>> > Starting artemis-service > >> >> >>> > > >> >> >>> > artemis-service is now running (23383) > >> >> >>> > > >> >> >>> > Starting client and stopping on breakpoint at line 103: > >> >> >>> > //ServerUtil.killServer(server0); > >> >> >>> > // Step 11. 
Acknowledging the 2nd half of the sent messages > will > >> fail > >> >> >>> as failover to the > >> >> >>> > // backup server has occurred > >> >> >>> > try { > >> >> >>> > message0.acknowledge(); //line 103 > >> >> >>> > killing server0 > >> >> >>> > artemis@broker3 > >> >> > >> > :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ > >> >> >>> kill -9 23357 > >> >> >>> > > >> >> >>> > Proceeding to breakpoint at line 121: > >> >> >>> > //server0 = ServerUtil.startServer(args[0], > >> >> >>> ReplicatedFailbackExample.class.getSimpleName() + "0", 0, 10000); > >> >> >>> > > >> >> >>> > // Step 11. Acknowledging the 2nd half of the sent messages > will > >> fail > >> >> >>> as failover to the > >> >> >>> > // backup server has occurred > >> >> >>> > try { > >> >> >>> > message0.acknowledge(); // line 121 > >> >> >>> > Starting server0: > >> >> >>> > artemis@broker3 > >> >> > >> > :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ > >> >> >>> server0/bin/artemis-service start > >> >> >>> > > >> >> >>> > Starting artemis-service > >> >> >>> > > >> >> >>> > artemis-service is now running (24240) > >> >> >>> > > >> >> >>> > Server0 writes ERROR to it's log (see attached > >> server0_artemis.log). 
> >> >> >>> > Now when trying to proceed with the client it writes the > >> following in > >> >> >>> the log and does not exit, but remains hanging forever: > >> >> >>> > > >> >> >>> > Oct 18, 2015 2:55:34 PM > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl > >> >> >>> fail > >> >> >>> > > >> >> >>> > WARN: AMQ212037: Connection failure has been detected: > >> AMQ119015: The > >> >> >>> connection was disconnected because of server shutdown > >> >> [code=DISCONNECTED] > >> >> >>> > > >> >> >>> > Got message: This is text message 20 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got exception while acknowledging message: AMQ119014: Timed out > >> after > >> >> >>> waiting 30,000 ms for response when sending packet 43 > >> >> >>> > > >> >> >>> > Got message: This is text message 21 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got message: This is text message 22 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got message: This is text message 23 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got message: This is text message 24 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got message: This is text message 25 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got message: This is text message 26 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got message: This is text message 27 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got message: This is text message 28 (redelivered?: false) > >> >> >>> > > >> >> >>> > Got message: This is text message 29 (redelivered?: false) > >> >> >>> > > >> >> >>> > As a result the slave (server1) remains stopped, not restarted > as > >> >> >>> expected and the master (server0) process appears to be running > but > >> >> does > >> >> >>> not accept any connections. > >> >> >>> > > >> >> >>> > Exactly the same behavior is observable every time I try this. > >> >> >>> > > >> >> >>> > BR! 
> >> >> >>> > Mihkel > >> >> >>> > > >> >> >>> >> On 13 October 2015 at 20:17, Mihkel Nõges < > >> >> >>> mihkel.no...@transferwise.com> wrote: > >> >> >>> >> Hi Clebert, > >> >> >>> >> > >> >> >>> >> No test, just doing it on command line with standalone > servers. > >> I'm > >> >> >>> using 1.1.0 installed with wget, not the snapshot. > >> >> >>> >> > >> >> >>> >> I'm wondering what should be the suggested procedure for > admins > >> to > >> >> do > >> >> >>> changes to HA cluster of 2 or 3 nodes of Artemis. If one of the > >> nodes > >> >> is > >> >> >>> master by configuration, do they need to change it's config > before > >> >> >>> restarting it to become slave to have seamless change process and > >> make > >> >> some > >> >> >>> instance master by configuration only if all the instances need > to > >> be > >> >> >>> restarted? > >> >> >>> >> > >> >> >>> >> I tried also a cluster with 2 masters and 2 slaves with 2 > >> separate > >> >> >>> group-name values, but for some reason the second master I > started > >> >> became > >> >> >>> slave for the first immediately. I expected it to become a > >> clustered > >> >> >>> load-balancing parallel master. Our loads are not yet that high > to > >> >> require > >> >> >>> more than one master, so it's just a theoretical question. > >> >> >>> >> > >> >> >>> >> BR! > >> >> >>> >> Mihkel > >> >> >>> >> > >> >> >>> >>> On 13 October 2015 at 20:03, Clebert Suconic < > >> >> >>> clebert.suco...@gmail.com> wrote: > >> >> >>> >>> The master needs to copy its data from the backup back to > live > >> >> before > >> >> >>> >>> it's activated. > >> >> >>> >>> > >> >> >>> >>> Do you have a test replicating this? > >> >> >>> >>> > >> >> >>> >>> Did you try the snapshot build? 
> >> >> >>> >>> > >> >> >>> >>> On Tue, Oct 13, 2015 at 11:58 AM, Mihkel Nõges > >> >> >>> >>> <mihkel.no...@transferwise.com> wrote: > >> >> >>> >>> > Hi, > >> >> >>> >>> > > >> >> >>> >>> > I configured replicating HA master-slave of Artemis 1.1.0 > >> >> instances > >> >> >>> on > >> >> >>> >>> > Ubuntu 14.04.3. > >> >> >>> >>> > > >> >> >>> >>> > When I kill master the slave takes over as expected and > >> starts > >> >> >>> serving as > >> >> >>> >>> > new master. When I then start the old master, it fails with > >> the > >> >> >>> following > >> >> >>> >>> > errors in the log: > >> >> >>> >>> > > >> >> >>> >>> > 16:35:46,476 ERROR > [org.apache.activemq.artemis.core.server] > >> >> >>> AMQ224008: > >> >> >>> >>> > Failed to store id: java.lang.IllegalStateException: Cannot > >> find > >> >> >>> add info 1 > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) > >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) > >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) > >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > 
>> >> > >> > org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java: > >> >> >>> 1784) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: > >> >> >>> 1625) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: > >> >> >>> 1535) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] > >> >> >>> >>> > > >> >> >>> >>> > 16:35:46,572 WARN > [org.apache.activemq.artemis.core.server] > >> >> >>> AMQ222173: > >> >> >>> >>> > Queue jms.queue.DLQ is duplicated during reload. 
This queue > >> will > >> >> be > >> >> >>> renamed > >> >> >>> >>> > as jms.queue.DLQ-0 > >> >> >>> >>> > 16:35:46,572 ERROR > [org.apache.activemq.artemis.core.server] > >> >> >>> AMQ224000: > >> >> >>> >>> > Failure in initialisation: java.lang.IllegalStateException: > >> >> Cursor > >> >> >>> 2 had > >> >> >>> >>> > already been created > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: > >> >> >>> 1631) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: > >> >> >>> 1535) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at > >> >> >>> >>> > > >> >> >>> > >> >> > >> > org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] > >> >> >>> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] > >> >> >>> >>> > > >> >> >>> >>> > As a result both master and the slave remain unaccessible > >> and no > >> >> >>> further > >> >> >>> >>> > restarts solve the situation. > >> >> >>> >>> > > >> >> >>> >>> > Attached also master and slave broker.xml files. > >> >> >>> >>> > > >> >> >>> >>> > BR! 
> >> >> >>> >>> > > >> >> >>> >>> > Mihkel Nõges > >> >> >>> >>> > >> >> >>> >>> > >> >> >>> >>> > >> >> >>> >>> -- > >> >> >>> >>> Clebert Suconic > >> >> >>> > > >> >> >>> > >> >> >> > >> >> >> > >> >> > >> >> > >> >> > >> >> -- > >> >> Clebert Suconic > >> >> > >> > >> > >> > >> -- > >> Clebert Suconic > >> > > > > >