Sorry, the last mail went out too soon. I was using replication, not shared-store.
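For context, a replicated live/backup pair in broker.xml would look roughly like the sketch below. This is a minimal illustration, not my exact files: the group-name value is only an example, and the check-for-live-server and allow-failback elements are written from my reading of the Artemis HA documentation, so they should be checked against the version in use.

```xml
<!-- Master (live) broker.xml fragment; sketch only, values are assumptions -->
<ha-policy>
   <replication>
      <master>
         <!-- a backup pairs only with a live server that has the same group-name -->
         <group-name>ha-cluster1</group-name>
         <!-- on restart, look for a server already live with this node id
              and request failback instead of starting as a second live -->
         <check-for-live-server>true</check-for-live-server>
      </master>
   </replication>
</ha-policy>

<!-- Slave (backup) broker.xml fragment for the same pair -->
<ha-policy>
   <replication>
      <slave>
         <group-name>ha-cluster1</group-name>
         <!-- step down when the original master returns and requests failback -->
         <allow-failback>true</allow-failback>
      </slave>
   </replication>
</ha-policy>
```

Each broker carries only one of the two fragments. As far as I understand, a second master/slave pair would use its own group-name in both of its files.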
On 20 October 2015 at 17:24, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
> Also, I had a question earlier about having more than one Artemis master in
> a single cluster. When I tried this, only one server became a master; the
> other one became a slave of the first one started, even though I set
> different group-name values for them in broker.xml. Is this expected
> behavior?
>
> <ha-policy>
>    <replication>
>       <master>
>          <group-name>ha-cluster1</group-name>
>       </master>
>    </replication>
> </ha-policy>
>
> <ha-policy>
>    <replication>
>       <master>
>          <group-name>ha-cluster2</group-name>
>       </master>
>    </replication>
> </ha-policy>
>
> Mihkel
>
> On 20 October 2015 at 16:53, Mihkel Nõges <mihkel.no...@transferwise.com>
> wrote:
>
>> Hi Tim, Clebert!
>>
>> Yes, we also considered the alternatives
>> (http://activemq.apache.org/masterslave.html):
>>
>> *Shared Storage:*
>>
>> We do not have a high-performance shared storage solution. We have a
>> solution for our current file storage needs, but its I/O is said to be
>> very slow and it would need to be extended to support the extra load.
>>
>> *Replicated LevelDB:*
>>
>> It sounds cool, but I'm a little bit afraid of moving from one
>> experimental solution to the next. I noticed LevelDB does not support some
>> of the features we need, like scheduled message delivery:
>> http://activemq.apache.org/replicated-leveldb-store.html
>> The LevelDB store does not yet support storing data associated with Delay
>> and Schedule Message Delivery. Those are stored in a separate
>> non-replicated KahaDB data files. Unexpected results will occur if you use
>> Delay and Schedule Message Delivery with the replicated leveldb store since
>> that data will not be there when the master fails over to a slave.
>>
>> Notes like this make me feel very uneasy about the solution.
>>
>> *JDBC:*
>>
>> So it seems to me like the most reliable highly available messaging
>> solution in ActiveMQ 5 is JDBC.
>> We have MySQL running as our main DB, and
>> setting up a second DB for messaging would be fairly simple with our standard
>> procedures for maintenance, backups, disaster recovery, etc.
>>
>> I consider this only a temporary solution until we can use a more
>> performant alternative configuration, and I'm not expecting Artemis to
>> ever implement support for JDBC storage.
>>
>> We are using messaging in the process of splitting our monolithic application
>> into micro-services. As this is a gradual process, the number of messages
>> will be very small in the beginning, so a low-performing but reliable
>> JDBC-backed broker configuration seems good for a start.
>>
>> I was trying to find the more orthodox approach, but could not find or
>> get good suggestions. I tried disabling fail-back and starting the master;
>> that resulted in both servers spamming the logs with messages that another
>> server with the same ID is running. Do I understand correctly that I should
>> have backed up and removed the /data folder of the master, reconfigured it
>> as a slave, and started it then?
>>
>> Can you give me an overview of existing production deployments of highly
>> available Artemis installations that fail over (not necessarily failing
>> back)? I may change my mind about going with it from the start.
>>
>> Mihkel
>>
>>
>> On 20 October 2015 at 16:19, Clebert Suconic <clebert.suco...@gmail.com>
>> wrote:
>>
>>> As far as I know ActiveMQ 5 doesn't do failback on the master-slave
>>> journal... and it doesn't have any protocol to sync the data between
>>> master and slave.
>>>
>>> There is a small regression on the failback that we are dealing with now...
>>> if you set the master as a backup it would work fine...
>>>
>>> I think your test case is a bit unorthodox...
>>>
>>> TBH production guys usually don't use failback... they keep the backup
>>> until they can get to a quiet period and then do the failback (or
>>> restart the system) under low load.
>>>
>>> I also second Tim Bain on your choice for JDBC.
>>>
>>> I actually always say this... if you can use JDBC as storage for
>>> messaging... don't use messaging at all... just store and retrieve from
>>> the database.
>>>
>>> There's a JIRA open for Artemis on JDBC... but usually those things are
>>> written because users want it, not because they need it.
>>>
>>> On Tue, Oct 20, 2015 at 3:12 AM, Mihkel Nõges
>>> <mihkel.no...@transferwise.com> wrote:
>>> > Yes, I saw that issue too and set myself as a watcher when it was
>>> > created. I did not think it could be exactly the same, as it is described
>>> > as presenting itself only in narrow timing-related conditions. My case
>>> > seems to be much more broad and basic. It seems like nobody has actually
>>> > tried to set this up in a realistic situation.
>>> >
>>> > Do you know of any existing production deployments of Artemis (or hornetq)
>>> > with failover? I thought Artemis, being based on hornetq, should have its
>>> > features as stable as the last hornetq version. We have already used
>>> > embedded hornetq happily for some time. I think it would make a lot of
>>> > sense to publicly grade the Artemis features by maturity, with usage
>>> > statistics for each feature if known, so it would be easier to compare
>>> > the brokers, even among the 3 variants of the ActiveMQ family.
>>> >
>>> > I think it's safer for us to start building our first messaging
>>> > features on ActiveMQ 5.12.1 with JDBC-backed master-slave instead of
>>> > Artemis, and switch to Artemis once it has become more stable and our
>>> > needs for scalability have grown enough to make it reasonable. Right now
>>> > it seems there are still blockers in Artemis big enough to threaten the
>>> > stability of our system.
>>> >
>>> > I did not mean this letter to be negative by any means.
>>> > On the contrary, I
>>> > really hope Artemis does well, as it comes with such a great technical
>>> > foundation and elegant ideas. I think the best thing for Artemis would be
>>> > to find users that can trust its features and improve it as they grow.
>>> > This means the nucleus of Artemis must be really solid and stable.
>>> >
>>> > BR!
>>> > Mihkel Nõges
>>> >
>>> > On 19 October 2015 at 22:15, Clebert Suconic <clebert.suco...@gmail.com>
>>> > wrote:
>>> >
>>> >> Looks related to me:
>>> >>
>>> >> https://issues.apache.org/jira/browse/ARTEMIS-256
>>> >>
>>> >> On Mon, Oct 19, 2015 at 4:04 AM, Mihkel Nõges
>>> >> <mihkel.no...@transferwise.com> wrote:
>>> >> > Basic flow for getting an unresponsive failback cluster,
>>> >> > on a machine with Ubuntu 14.04.3:
>>> >> >
>>> >> > 1. Install libaio1, Java 1.8.0_60, maven 3.3.3; download and extract
>>> >> >    apache-artemis-1.1.0-bin
>>> >> >    <http://www.eu.apache.org/dist/activemq/activemq-artemis/1.1.0/apache-artemis-1.1.0-bin.tar.gz>
>>> >> >    in /opt
>>> >> > 2. Run $ mvn -Prelease install and $ mvn verify in
>>> >> >    /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback
>>> >> >    SUCCESS
>>> >> > 3. Clean the data folders and start both servers manually:
>>> >> >    $ cd /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target
>>> >> >    $ rm -R server0/data/
>>> >> >    $ rm -R server1/data/
>>> >> >    $ server0/bin/artemis-service start
>>> >> >    Starting artemis-service
>>> >> >    artemis-service is now running (7154)
>>> >> >    $ server1/bin/artemis-service start
>>> >> >    Starting artemis-service
>>> >> >    artemis-service is now running (7180)
>>> >> > 4.
Kill master server and wait for slave to take over >>> >> > $ kill -9 7154 >>> >> > >>> >> > $ tail -f server1/log/artemis.log >>> >> > 08:52:54,798 INFO [org.apache.activemq.artemis.core.server] >>> >> AMQ221043: >>> >> > Protocol module found: [artemis-stomp-protocol]. Adding protocol >>> >> support >>> >> > for: STOMP >>> >> > 08:53:02,145 INFO [org.apache.activemq.artemis.core.server] >>> >> AMQ221109: >>> >> > Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] >>> started, >>> >> waiting >>> >> > live to fail before it gets active >>> >> > 08:53:03,582 INFO [org.apache.activemq.artemis.core.server] >>> >> AMQ221024: >>> >> > Backup server >>> >> > >>> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is >>> >> > synchronized with live-server. >>> >> > 08:53:03,777 INFO [org.apache.activemq.artemis.core.server] >>> >> AMQ221031: >>> >> > backup announced >>> >> > 08:55:59,292 INFO [org.apache.activemq.artemis.core.server] >>> >> AMQ221037: >>> >> > >>> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to >>> >> > become 'live' >>> >> > 08:55:59,302 WARN [org.apache.activemq.artemis.core.client] >>> >> AMQ212004: >>> >> > Failed to connect to server. >>> >> > 08:55:59,778 INFO [org.apache.activemq.artemis.core.server] >>> >> AMQ221003: >>> >> > trying to deploy queue jms.queue.exampleQueue >>> >> > 08:55:59,829 WARN [org.apache.activemq.artemis.core.client] >>> >> AMQ212034: >>> >> > There are more than one servers on the network broadcasting the >>> same >>> >> node >>> >> > id. You will see this message exactly once (per node) if a node >>> is >>> >> > restarted, in which case it can be safely ignored. But if it is >>> logged >>> >> > continuously it means you really do have more than one node on >>> the >>> >> same >>> >> > network active concurrently with the same node id. This could >>> occur >>> >> if you >>> >> > have a backup node active at the same time as its live node. 
>>> >> > nodeID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 >>> >> > 08:55:59,836 INFO [org.apache.activemq.artemis.core.server] >>> >> AMQ221007: >>> >> > Server is now live >>> >> > 08:55:59,869 INFO [org.apache.activemq.artemis.core.server] >>> >> AMQ221020: >>> >> > Started Acceptor at broker3:61617 for protocols >>> >> > [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE] >>> >> > 5. >>> >> > >>> >> > Start master again and observer the logs: >>> >> > $ server0/bin/artemis-service start >>> >> > Starting artemis-service >>> >> > artemis-service is now running (7388) >>> >> > >>> >> > $ tail -f server0/log/artemis.log >>> >> > 08:57:24,625 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221012: >>> >> > Using AIO Journal >>> >> > 08:57:24,694 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221043: >>> >> > Protocol module found: [artemis-server]. Adding protocol support >>> for: >>> >> CORE >>> >> > 08:57:24,702 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221043: >>> >> > Protocol module found: [artemis-amqp-protocol]. Adding protocol >>> support >>> >> > for: AMQP >>> >> > 08:57:24,731 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221043: >>> >> > Protocol module found: [artemis-hornetq-protocol]. Adding protocol >>> >> support >>> >> > for: HORNETQ >>> >> > 08:57:24,733 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221043: >>> >> > Protocol module found: [artemis-mqtt-protocol]. Adding protocol >>> support >>> >> > for: MQTT >>> >> > 08:57:24,743 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221043: >>> >> > Protocol module found: [artemis-openwire-protocol]. Adding protocol >>> >> support >>> >> > for: OPENWIRE >>> >> > 08:57:24,878 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221043: >>> >> > Protocol module found: [artemis-stomp-protocol]. 
Adding protocol >>> support >>> >> > for: STOMP >>> >> > 08:57:25,082 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221109: >>> >> > Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, >>> >> waiting >>> >> > live to fail before it gets active >>> >> > 08:57:27,043 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221024: >>> >> > Backup server >>> >> > ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 >>> is >>> >> > synchronized with live-server. >>> >> > 08:57:27,948 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221031: >>> >> > backup announced >>> >> > 08:57:31,227 WARN [org.apache.activemq.artemis.core.client] >>> AMQ212037: >>> >> > Connection failure has been detected: AMQ119015: The connection was >>> >> > disconnected because of server shutdown [code=DISCONNECTED] >>> >> > 08:57:31,252 WARN [org.apache.activemq.artemis.core.client] >>> AMQ212037: >>> >> > Connection failure has been detected: AMQ119015: The connection was >>> >> > disconnected because of server shutdown [code=DISCONNECTED] >>> >> > 08:57:31,307 WARN [org.apache.activemq.artemis.core.client] >>> AMQ212037: >>> >> > Connection failure has been detected: AMQ119015: The connection was >>> >> > disconnected because of server shutdown [code=DISCONNECTED] >>> >> > 08:57:31,339 INFO [org.apache.activemq.artemis.core.server] >>> AMQ221037: >>> >> > ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 >>> to >>> >> > become 'live' >>> >> > 08:57:31,360 WARN [org.apache.activemq.artemis.core.client] >>> AMQ212004: >>> >> > Failed to connect to server. 
>>> >> > 08:57:31,413 ERROR [org.apache.activemq.artemis.core.server] >>> AMQ224008: >>> >> > Failed to store id: java.lang.IllegalStateException: Cannot find add >>> >> info 1 >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) >>> >> > [artemis-journal-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) >>> >> > [artemis-journal-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) >>> >> > [artemis-journal-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java: >>> 1784) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: >>> 1625) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: >>> 1535) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> 
org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] >>> >> > 08:57:31,540 WARN [org.apache.activemq.artemis.core.server] >>> AMQ222173: >>> >> > Queue jms.queue.exampleQueue is duplicated during reload. This >>> queue will >>> >> > be renamed as jms.queue.exampleQueue-0 >>> >> > 08:57:31,550 ERROR [org.apache.activemq.artemis.core.server] >>> AMQ224000: >>> >> > Failure in initialisation: java.lang.IllegalStateException: Cursor >>> 2 had >>> >> > already been created >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: >>> 1631) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: >>> 1535) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at >>> >> > >>> >> >>> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) >>> >> > [artemis-server-1.1.0.jar:1.1.0] >>> >> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] >>> >> > >>> >> > >>> >> > On 19 October 2015 at 10:31, Mihkel Nõges < >>> mihkel.no...@transferwise.com >>> >> > >>> >> > wrote: >>> >> > >>> >> >> Hi Clebert, >>> >> >> >>> >> >> I do not have other code to share with you but the example code in >>> >> Artemis >>> >> >> 1.1.0 binary deployment package. 
>>> >> >> I'm running
>>> >> >> org.apache.activemq.artemis.jms.example.ReplicatedFailbackExample
>>> >> >>
>>> >> >> and only commented out the serverStart and killServer calls, which I am
>>> >> >> doing manually.
>>> >> >>
>>> >> >> I do not think I do any of the steps too fast, as I tail the server log
>>> >> >> files in parallel and see that everything is finished when I start the
>>> >> >> failback. I have waited many minutes in between.
>>> >> >>
>>> >> >> The only configuration change to the test is replacing the localhost
>>> >> >> addresses with broker3 to make the cluster accessible remotely.
>>> >> >>
>>> >> >> BR!
>>> >> >> Mihkel
>>> >> >>
>>> >> >> On 18 October 2015 at 17:49, Clebert <clebert.suco...@gmail.com> wrote:
>>> >> >>
>>> >> >>> I'm not on my computer now, but it sounds like you are doing a failback
>>> >> >>> immediately after failover. It takes some time (seconds) for the
>>> >> >>> server to activate on the backup.
>>> >> >>>
>>> >> >>> Later the server will need to copy the data back before it can be
>>> >> >>> activated in failback mode.
>>> >> >>>
>>> >> >>> It sounds like the live is not reaching the backup for failback.
>>> >> >>>
>>> >> >>> I will try looking at it on Monday. Maybe you could post your
>>> >> >>> example on your GitHub fork.
>>> >> >>>
>>> >> >>> -- Clebert Suconic typing on the iPhone.
>>> >> >>>
>>> >> >>> > On Oct 18, 2015, at 08:15, Mihkel Nõges <mihkel.no...@transferwise.com>
>>> >> >>> wrote:
>>> >> >>> >
>>> >> >>> > Hello again!
>>> >> >>> >
>>> >> >>> > I would be very grateful if someone could answer my questions. We
>>> >> >>> > need high availability to work in order to use the broker in production.
>>> >> >>> >
>>> >> >>> > When I run the replicated-failback example on one machine (broker3),
>>> >> >>> > it succeeds.
>>> >> >>> >
>>> >> >>> > It fails when I run the same test (exactly the same servers) with a
>>> >> >>> > slightly modified client remotely.
>>> >> >>> > >>> >> >>> > I run client in debug mode from my IDE with commented out >>> serverStart >>> >> >>> and killServer calls. >>> >> >>> > Deleted data folders and started the servers: >>> >> >>> > artemis@broker3 >>> >> >>> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >>> >> >>> rm -R server0/data/ >>> >> >>> > >>> >> >>> > artemis@broker3 >>> >> >>> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >>> >> >>> rm -R server1/data/ >>> >> >>> > >>> >> >>> > artemis@broker3 >>> >> >>> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >>> >> >>> server0/bin/artemis-service start >>> >> >>> > >>> >> >>> > Starting artemis-service >>> >> >>> > >>> >> >>> > artemis-service is now running (23357) >>> >> >>> > >>> >> >>> > artemis@broker3 >>> >> >>> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >>> >> >>> server1/bin/artemis-service start >>> >> >>> > >>> >> >>> > Starting artemis-service >>> >> >>> > >>> >> >>> > artemis-service is now running (23383) >>> >> >>> > >>> >> >>> > Starting client and stopping on breakpoint at line 103: >>> >> >>> > //ServerUtil.killServer(server0); >>> >> >>> > // Step 11. Acknowledging the 2nd half of the sent messages >>> will fail >>> >> >>> as failover to the >>> >> >>> > // backup server has occurred >>> >> >>> > try { >>> >> >>> > message0.acknowledge(); //line 103 >>> >> >>> > killing server0 >>> >> >>> > artemis@broker3 >>> >> >>> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >>> >> >>> kill -9 23357 >>> >> >>> > >>> >> >>> > Proceeding to breakpoint at line 121: >>> >> >>> > //server0 = ServerUtil.startServer(args[0], >>> >> >>> ReplicatedFailbackExample.class.getSimpleName() + "0", 0, 10000); >>> >> >>> > >>> >> >>> > // Step 11. 
Acknowledging the 2nd half of the sent messages >>> will fail >>> >> >>> as failover to the >>> >> >>> > // backup server has occurred >>> >> >>> > try { >>> >> >>> > message0.acknowledge(); // line 121 >>> >> >>> > Starting server0: >>> >> >>> > artemis@broker3 >>> >> >>> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ >>> >> >>> server0/bin/artemis-service start >>> >> >>> > >>> >> >>> > Starting artemis-service >>> >> >>> > >>> >> >>> > artemis-service is now running (24240) >>> >> >>> > >>> >> >>> > Server0 writes ERROR to it's log (see attached >>> server0_artemis.log). >>> >> >>> > Now when trying to proceed with the client it writes the >>> following in >>> >> >>> the log and does not exit, but remains hanging forever: >>> >> >>> > >>> >> >>> > Oct 18, 2015 2:55:34 PM >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl >>> >> >>> fail >>> >> >>> > >>> >> >>> > WARN: AMQ212037: Connection failure has been detected: >>> AMQ119015: The >>> >> >>> connection was disconnected because of server shutdown >>> >> [code=DISCONNECTED] >>> >> >>> > >>> >> >>> > Got message: This is text message 20 (redelivered?: false) >>> >> >>> > >>> >> >>> > Got exception while acknowledging message: AMQ119014: Timed out >>> after >>> >> >>> waiting 30,000 ms for response when sending packet 43 >>> >> >>> > >>> >> >>> > Got message: This is text message 21 (redelivered?: false) >>> >> >>> > >>> >> >>> > Got message: This is text message 22 (redelivered?: false) >>> >> >>> > >>> >> >>> > Got message: This is text message 23 (redelivered?: false) >>> >> >>> > >>> >> >>> > Got message: This is text message 24 (redelivered?: false) >>> >> >>> > >>> >> >>> > Got message: This is text message 25 (redelivered?: false) >>> >> >>> > >>> >> >>> > Got message: This is text message 26 (redelivered?: false) >>> >> >>> > >>> >> >>> > Got message: This is text message 27 (redelivered?: false) >>> >> >>> > >>> >> >>> > Got 
message: This is text message 28 (redelivered?: false)
>>> >> >>> >
>>> >> >>> > Got message: This is text message 29 (redelivered?: false)
>>> >> >>> >
>>> >> >>> > As a result, the slave (server1) remains stopped instead of being
>>> >> >>> > restarted as expected, and the master (server0) process appears to be
>>> >> >>> > running but does not accept any connections.
>>> >> >>> >
>>> >> >>> > Exactly the same behavior is observable every time I try this.
>>> >> >>> >
>>> >> >>> > BR!
>>> >> >>> > Mihkel
>>> >> >>> >
>>> >> >>> >> On 13 October 2015 at 20:17, Mihkel Nõges <mihkel.no...@transferwise.com> wrote:
>>> >> >>> >> Hi Clebert,
>>> >> >>> >>
>>> >> >>> >> No test, just doing it on the command line with standalone servers.
>>> >> >>> >> I'm using 1.1.0 installed with wget, not the snapshot.
>>> >> >>> >>
>>> >> >>> >> I'm wondering what the suggested procedure is for admins making
>>> >> >>> >> changes to an HA cluster of 2 or 3 Artemis nodes. If one of the nodes
>>> >> >>> >> is master by configuration, do they need to change its config to slave
>>> >> >>> >> before restarting it to get a seamless change process, and make some
>>> >> >>> >> instance master by configuration only if all the instances need to be
>>> >> >>> >> restarted?
>>> >> >>> >>
>>> >> >>> >> I also tried a cluster with 2 masters and 2 slaves with 2 separate
>>> >> >>> >> group-name values, but for some reason the second master I started
>>> >> >>> >> immediately became a slave of the first. I expected it to become a
>>> >> >>> >> clustered, load-balancing parallel master. Our loads are not yet high
>>> >> >>> >> enough to require more than one master, so it's just a theoretical
>>> >> >>> >> question.
>>> >> >>> >>
>>> >> >>> >> BR!
>>> >> >>> >> Mihkel >>> >> >>> >> >>> >> >>> >>> On 13 October 2015 at 20:03, Clebert Suconic < >>> >> >>> clebert.suco...@gmail.com> wrote: >>> >> >>> >>> The master needs to copy its data from the backup back to live >>> >> before >>> >> >>> >>> it's activated. >>> >> >>> >>> >>> >> >>> >>> Do you have a test replicating this? >>> >> >>> >>> >>> >> >>> >>> Did you try the snapshot build? >>> >> >>> >>> >>> >> >>> >>> On Tue, Oct 13, 2015 at 11:58 AM, Mihkel Nõges >>> >> >>> >>> <mihkel.no...@transferwise.com> wrote: >>> >> >>> >>> > Hi, >>> >> >>> >>> > >>> >> >>> >>> > I configured replicating HA master-slave of Artemis 1.1.0 >>> >> instances >>> >> >>> on >>> >> >>> >>> > Ubuntu 14.04.3. >>> >> >>> >>> > >>> >> >>> >>> > When I kill master the slave takes over as expected and >>> starts >>> >> >>> serving as >>> >> >>> >>> > new master. When I then start the old master, it fails with >>> the >>> >> >>> following >>> >> >>> >>> > errors in the log: >>> >> >>> >>> > >>> >> >>> >>> > 16:35:46,476 ERROR [org.apache.activemq.artemis.core.server] >>> >> >>> AMQ224008: >>> >> >>> >>> > Failed to store id: java.lang.IllegalStateException: Cannot >>> find >>> >> >>> add info 1 >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) >>> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) >>> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) >>> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) 
>>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java: >>> >> >>> 1784) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: >>> >> >>> 1625) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: >>> >> >>> 1535) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] >>> >> >>> >>> > >>> >> >>> >>> > 16:35:46,572 WARN [org.apache.activemq.artemis.core.server] >>> >> >>> AMQ222173: >>> >> >>> >>> > Queue jms.queue.DLQ is duplicated during reload. 
This queue >>> will >>> >> be >>> >> >>> renamed >>> >> >>> >>> > as jms.queue.DLQ-0 >>> >> >>> >>> > 16:35:46,572 ERROR [org.apache.activemq.artemis.core.server] >>> >> >>> AMQ224000: >>> >> >>> >>> > Failure in initialisation: java.lang.IllegalStateException: >>> >> Cursor >>> >> >>> 2 had >>> >> >>> >>> > already been created >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java: >>> >> >>> 1631) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java: >>> >> >>> 1535) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at >>> >> >>> >>> > >>> >> >>> >>> >> >>> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) >>> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0] >>> >> >>> >>> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60] >>> >> >>> >>> > >>> >> >>> >>> > As a result both master and the slave remain unaccessible >>> and no >>> >> >>> further >>> >> >>> >>> > restarts solve the situation. >>> >> >>> >>> > >>> >> >>> >>> > Attached also master and slave broker.xml files. >>> >> >>> >>> > >>> >> >>> >>> > BR! 
>>> >> >>> >>> >
>>> >> >>> >>> > Mihkel Nõges
>>> >> >>> >>>
>>> >> >>> >>> --
>>> >> >>> >>> Clebert Suconic
>>> >>
>>> >> --
>>> >> Clebert Suconic
>>>
>>> --
>>> Clebert Suconic