[jira] [Commented] (AMQ-6174) LevelDB gets corrupted when Primary ActiveMQ server is shutdown while messages are queued to it
[ https://issues.apache.org/jira/browse/AMQ-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152589#comment-15152589 ]

Sunil Vishwanath commented on AMQ-6174:
---

Here is the Persistence Adapter configuration:
[jira] [Created] (AMQ-6174) LevelDB gets corrupted when Primary ActiveMQ server is shutdown while messages are queued to it
Sunil Vishwanath created AMQ-6174:
---

Summary: LevelDB gets corrupted when Primary ActiveMQ server is shutdown while messages are queued to it
Key: AMQ-6174
URL: https://issues.apache.org/jira/browse/AMQ-6174
Project: ActiveMQ
Issue Type: Bug
Components: activemq-leveldb-store
Affects Versions: 5.13.0
Environment: Virtual type detected as xen-para.
Last rubix: Mon Feb 1 11:02:37 2016 release: 74867 version: 2.0.7
Installed kernel: 2.6.18-308.0.0.0.1.el5xen x86_64
Reporter: Sunil Vishwanath
Priority: Critical

Currently I am testing the following setup:
ActiveMQ 5.13.0 with LevelDB (3 node cluster).
Zookeeper 3.4.6 (3 node cluster).
File system: NFSv3

Started up all 3 Zookeeper nodes. (aamqzk1, aamqzk2 and aamqzk3)
Started up all 5 ActiveMQ nodes. (aamql1, aamql2, aamql3, aamql4 and aamql5)

I started aamql1 first and all the others in order. aamql1 is the master and I am able to see all the queue statistics for aamql1 via the ActiveMQ Web Console.
I am also watching the "application.log" file of all 5 AMQ instances using the "tail -f application.log" command.

The message producer starts sending messages (about 120,000 of them). While the messages are being queued and also being consumed, I stopped the master instance (aamql1). Now aamql2 becomes the master. About 10 seconds later, after all the slave aamq instances report to the new master, aamql2 throws the following exception and the instance dies. This keeps repeating as it fails over to the next instance.

2016-02-17T15:43:48.358885-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-02-17 15:43:48,354 thread=LevelDB IOException handler. category=org.apache.activemq.util.DefaultIOExceptionHandler Stopping BrokerService[localhost] due to exception, java.io.EOFException: File '/aamql/local/activemq/data/leveldb/38409848.log' offset: 477491777
2016-02-17T15:43:48.370881-08:00 aamql2.bus.jetqa1.syseng.tmcs java.io.EOFException: File '/aamql/local/activemq/data/leveldb/38409848.log' offset: 477491777
2016-02-17T15:43:48.371003-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.RecordLog$LogReader.read(RecordLog.scala:389)
2016-02-17T15:43:48.371082-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.RecordLog$$anonfun$read$2.apply(RecordLog.scala:654)
2016-02-17T15:43:48.371148-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.RecordLog$$anonfun$read$2.apply(RecordLog.scala:654)
2016-02-17T15:43:48.371219-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.RecordLog.get_reader(RecordLog.scala:644)
2016-02-17T15:43:48.371380-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.RecordLog.read(RecordLog.scala:654)
2016-02-17T15:43:48.371454-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.getMessage(LevelDBClient.scala:1335)
2016-02-17T15:43:48.371526-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1274)
2016-02-17T15:43:48.371604-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1271)
2016-02-17T15:43:48.371675-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1359)
2016-02-17T15:43:48.371746-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1358)
2016-02-17T15:43:48.371818-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:323)
2016-02-17T15:43:48.371888-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:325)
2016-02-17T15:43:48.371960-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1358)
2016-02-17T15:43:48.372034-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1358)
2016-02-17T15:43:48.372104-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1358)
2016-02-17T15:43:48.372175-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:1038)
2016-02-17T15:43:48.372259-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:1044)
2016-02-17T15:43:48.372334-08:00 aamql2.bus.jetqa1.syseng.tmcs at
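
The EOFException in AMQ-6174 names a record log file and an offset. One way to see whether that offset lies past the end of the file on a given node is to compare it with the file's size on disk. The following is only a rough sketch, not something taken from the report: the regular expression is modelled on the message text in the trace, the helper names `parse_eof_error` and `offset_beyond_end` are hypothetical, and it assumes the offset is the position the reader tried to read from.

```python
import os
import re

# Pattern modelled on the message text seen in the AMQ-6174 trace:
#   java.io.EOFException: File '<path>' offset: <number>
EOF_PATTERN = re.compile(r"java\.io\.EOFException: File '([^']+)' offset: (\d+)")


def parse_eof_error(line):
    """Return (path, offset) if the line carries an EOFException message, else None."""
    match = EOF_PATTERN.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def offset_beyond_end(path, offset):
    """True if the offset lies at or past the end of the file on disk.

    A False result does not rule out a record that starts inside the file
    but is cut off part-way through.
    """
    return os.path.getsize(path) <= offset


if __name__ == "__main__":
    sample = ("java.io.EOFException: File "
              "'/aamql/local/activemq/data/leveldb/38409848.log' offset: 477491777")
    print(parse_eof_error(sample))
```

Running `offset_beyond_end` against the same log file on each broker node could show whether the replicas disagree about how much of the file was written before the master was stopped.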
[jira] [Created] (AMQ-6173) ActiveMQ with replicated LevelDB using NFSv4 corrupts on failover back to the initial instance
Sunil Vishwanath created AMQ-6173:
---

Summary: ActiveMQ with replicated LevelDB using NFSv4 corrupts on failover back to the initial instance
Key: AMQ-6173
URL: https://issues.apache.org/jira/browse/AMQ-6173
Project: ActiveMQ
Issue Type: Bug
Components: activemq-leveldb-store
Affects Versions: 5.13.0
Environment: Linux: Installed kernel: 2.6.18-308.0.0.0.1.el5xen x86_64 with NFSv4
Reporter: Sunil Vishwanath

I have set up the following to test with the NFSv4 file system:
ActiveMQ 5.13.0 with LevelDB (3 node cluster).
Zookeeper 3.4.6 (3 node cluster).
NFSv4 file system local to each server. (not shared)

Started up all 3 Zookeeper nodes.
Started up all 3 ActiveMQ nodes.

As I started aamq2 first, it became the master. I am able to see all the queue statistics via the ActiveMQ Web Console.
I am watching the "application.log" file of all 3 AMQ instances using the "tail -f application.log" command.

Now I stopped the aamq2 instance. Aamq3 is now promoted to master as per the messages in aamq3's application.log.
I restarted aamq2 and its LevelDB caught up.
Now I stopped the aamq3 instance. Aamq1 is now promoted to master as per the message in the application log.
I restarted aamq3 and its LevelDB caught up.
Now I stopped the aamq1 instance. Aamq2 is now promoted to master as per the messages below and it encounters errors:

2016-01-31T16:39:20.097313-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:39:20,097 thread=hawtdispatch-DEFAULT-3 category=org.apache.activemq.leveldb.replicated.SlaveLevelDBStore Attaching... Downloaded 66.47/258.72 kb and 5/6 files
2016-01-31T16:39:20.103037-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:39:20,102 thread=hawtdispatch-DEFAULT-3 category=org.apache.activemq.leveldb.replicated.SlaveLevelDBStore Attaching... Downloaded 258.72/258.72 kb and 6/6 files
2016-01-31T16:39:20.104353-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:39:20,104 thread=hawtdispatch-DEFAULT-3 category=org.apache.activemq.leveldb.replicated.SlaveLevelDBStore Attached
2016-01-31T16:46:45.021281-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,020 thread=main-EventThread category=org.apache.activemq.leveldb.replicated.MasterElector Not enough cluster members have reported their update positions yet.
2016-01-31T16:46:45.115987-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,115 thread=main-EventThread category=org.apache.activemq.leveldb.replicated.MasterElector Not enough cluster members have reported their update positions yet.
2016-01-31T16:46:45.188385-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,187 thread=ActiveMQ BrokerService[localhost] Task-4 category=org.apache.activemq.leveldb.replicated.MasterElector Slave stopped
2016-01-31T16:46:45.189199-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,188 thread=ActiveMQ BrokerService[localhost] Task-4 category=org.apache.activemq.leveldb.replicated.MasterElector Not enough cluster members have reported their update positions yet.
2016-01-31T16:46:45.214426-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,214 thread=main-EventThread category=org.apache.activemq.leveldb.replicated.MasterElector Promoted to master
2016-01-31T16:46:45.256560-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,255 thread=ActiveMQ BrokerService[localhost] Task-5 category=org.apache.activemq.leveldb.LevelDBClient Using the pure java LevelDB implementation.
2016-01-31T16:46:45.729608-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,729 thread=LevelDB IOException handler. category=org.apache.activemq.broker.BrokerService No IOExceptionHandler registered, ignoring IO exception
2016-01-31T16:46:45.735717-08:00 aamql2.bus.jetqa1.syseng.tmcs java.io.IOException: java.lang.IllegalArgumentException: File is not a table (bad magic number)
2016-01-31T16:46:45.735717-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
2016-01-31T16:46:45.735752-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:552)
2016-01-31T16:46:45.735752-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:1044)
2016-01-31T16:46:45.735858-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.listCollections(LevelDBClient.scala:1167)
2016-01-31T16:46:45.735858-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.DBManager$$anonfun$3.apply(DBManager.scala:837)
2016-01-31T16:46:45.735877-08:00 aamql2.bus.jetqa1.syseng.tmcs at
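
The "File is not a table (bad magic number)" message in AMQ-6173 refers to the trailer of a LevelDB table file: in the standard LevelDB table format, the last 8 bytes of a table hold the magic number 0xdb4775248b80fb57 in little-endian order. A minimal sketch of checking a file for that trailer follows. It assumes the broker's index uses the standard table format, and the function names are hypothetical. The log does not say which file triggered the error, so each table file in the index directory (typically `*.sst`) would need to be checked in turn.

```python
import struct

# Magic number that terminates a LevelDB table file, stored as a
# little-endian 64-bit integer in the final 8 bytes of the footer.
TABLE_MAGIC = 0xDB4775248B80FB57


def has_table_magic(data):
    """Check whether a bytes object ends with the LevelDB table magic number."""
    if len(data) < 8:
        return False
    (magic,) = struct.unpack("<Q", data[-8:])
    return magic == TABLE_MAGIC


def file_has_table_magic(path):
    """Read the last 8 bytes of a file and compare them with the magic number."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)          # jump to the end to learn the file size
        size = handle.tell()
        if size < 8:
            return False
        handle.seek(size - 8)
        return has_table_magic(handle.read(8))
```

A file that fails this check was most likely truncated or only partly written, which would fit a replica that was promoted before its copy was complete, although the report itself does not establish that.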
[jira] [Commented] (AMQ-6173) ActiveMQ with replicated LevelDB using NFSv4 corrupts on failover back to the initial instance
[ https://issues.apache.org/jira/browse/AMQ-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152488#comment-15152488 ]

Sunil Vishwanath commented on AMQ-6173:
---

I changed the file system to NFSv3 and the issue went away. It looks like it is not doing well with NFSv4.
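
Since the AMQ-6173 behaviour differs between NFSv3 and NFSv4, it may help to confirm which file system the LevelDB data directory really sits on for each node. The sketch below parses text in the Linux `/proc/mounts` format (device, mount point, type, options) and returns the entry for the longest matching mount point. The function name `find_mount` and the sample mount lines are made up for illustration; they are not taken from the report.

```python
def find_mount(path, mounts_text):
    """Return (mount_point, fs_type, options) for the mount that contains path.

    mounts_text is expected in /proc/mounts format. The longest matching
    mount point wins, so nested mounts resolve to the innermost one.
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type, options)
    return best


if __name__ == "__main__":
    # Hypothetical mount table; on a real node read /proc/mounts instead.
    sample = (
        "rootfs / ext3 rw 0 0\n"
        "filer:/export /aamql nfs rw,vers=3 0 0\n"
    )
    print(find_mount("/aamql/local/activemq/data/leveldb", sample))
```

Depending on the kernel, an NFSv4 mount may show up as type `nfs4` or as type `nfs` with a version option, so both the type and the options fields are worth inspecting.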