[ 
https://issues.apache.org/jira/browse/AMQ-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513740#comment-15513740
 ] 

William Brendel commented on AMQ-5618:
--------------------------------------

Another data point: My team also encountered this issue recently, in June 2016, 
while evaluating replicated LevelDB storage for our application. We were using 
AMQ 5.13.3 and ZK 3.4.8 with 3 instances. Like Jonathan G, our database seemed 
to be corrupted beyond repair, causing us to eliminate replicated LevelDB as an 
option.

> Infinite loop in log replay with Replicated LevelDB
> ---------------------------------------------------
>
>                 Key: AMQ-5618
>                 URL: https://issues.apache.org/jira/browse/AMQ-5618
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.11.0, 5.11.1
>         Environment: Linux, Google Compute Engine
>            Reporter: Artem Karpenko
>            Priority: Critical
>
> This is very similar to AMQ-5300 except that I use replicatedLevelDB 
> persistence adapter and in order to reproduce I don't have to delete any 
> index files.
> Setup: 1 ZK instance, 3 AMQ nodes.
> One of the AMQ configs:
> {code}
> <replicatedLevelDB directory="${activemq.data}/replicatedLevelDB"
>                                 replicas="3"
>                                 bind="tcp://0.0.0.0:61619"
>                                 zkAddress="instance-6:2181"
>                                 zkPath="/activemq/leveldb-stores"
>                                 hostname="instance-7" />
> {code}
> Difference between nodes is only in hostname attribute.
> The way to reproduce is almost the same as in AMQ-5300: 
> # Produce lots of messages to generate several log files in leveldb data 
> directory.
> # Consume _some_ messages until you see "Deleting log" in activemq.log.
> # Restart master. Wait for system to rebalance itself. Everything's fine at 
> this point.
> # Restart the second master.
> # Observe the massive (infinite?) logging on slave and relatively calm but 
> still possibly infinite logging on master.
> This is what the first master logs after it's restarted:
> {code}
> 2015-02-25 21:37:08,338 | DEBUG | Download session connected... | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:08,582 | INFO  | Slave skipping download of: 
> log/00000000190be289.log | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,099 | INFO  | Slave skipping download of: 
> log/000000000642f848.log | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,411 | INFO  | Slave skipping download of: 
> log/000000000c85f06d.log | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,838 | INFO  | Slave skipping download of: 
> log/0000000012c8e921.log | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,842 | INFO  | Slave requested: 
> 000000001c9373b4.index/CURRENT | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,846 | INFO  | Slave requested: 
> 000000001c9373b4.index/MANIFEST-000002 | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,850 | INFO  | Slave requested: 
> 000000001c9373b4.index/000003.log | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,857 | INFO  | Attaching... Downloaded 0.02/95.65 kb and 
> 1/3 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,859 | INFO  | Attaching... Downloaded 0.06/95.65 kb and 
> 2/3 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,861 | INFO  | Attaching... Downloaded 95.65/95.65 kb and 
> 3/3 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,862 | INFO  | Attached | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:37:09,878 | DEBUG | Taking a snapshot of the current index: 
> /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000001c9373b4.index
>  | org.apache.activemq.leveldb.LevelDBClient | Thread-2
> 2015-02-25 21:37:10,352 | DEBUG | Recovering from last index snapshot at: 
> /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/dirty.index | 
> org.apache.activemq.leveldb.LevelDBClient | Thread-2
> {code}
> Right after that everything seems fine. But as soon as I stop the new master, 
> the another new master (that would be the third one) logs
> {code}
> 2015-02-25 21:38:43,876 | INFO  | Promoted to master | 
> org.apache.activemq.leveldb.replicated.MasterElector | main-EventThread
> 2015-02-25 21:38:43,894 | INFO  | Using the pure java LevelDB implementation. 
> | org.apache.activemq.leveldb.LevelDBClient | ActiveMQ 
> BrokerService[localhost] Task-5
> 2015-02-25 21:38:45,954 | WARN  | Invalid log position: 44 | 
> org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] 
> Task-5
> 2015-02-25 21:38:45,955 | WARN  | Invalid log position: 325280 | 
> org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] 
> Task-5
> ...
> 2015-02-25 21:38:46,696 | WARN  | Invalid log position: 104726993 | 
> org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] 
> Task-5
> 2015-02-25 21:38:46,902 | INFO  | Master started: tcp://instance-8:61619 | 
> org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ 
> BrokerService[localhost] Task-6
> ...
> 2015-02-25 21:38:52,127 | INFO  | Initializing Spring FrameworkServlet 
> 'dispatcher' | /admin | main
> 2015-02-25 21:38:53,181 | INFO  | jolokia-agent: No access restrictor found 
> at classpath:/jolokia-access.xml, access to all MBeans is allowed | /api | 
> main
> 2015-02-25 21:38:56,881 | WARN  | Invalid log position: 0 | 
> org.apache.activemq.leveldb.LevelDBClient | Thread-4
> 2015-02-25 21:39:06,884 | WARN  | Invalid log position: 0 | 
> org.apache.activemq.leveldb.LevelDBClient | Thread-4
> 2015-02-25 21:39:16,887 | WARN  | Invalid log position: 0 | 
> org.apache.activemq.leveldb.LevelDBClient | Thread-4
> ...
> {code}
> the last message starts repeating every 10 seconds without hint of stopping.
> At the same time the one left slave (that was the 1st master) logs
> {code}
> 2015-02-25 21:38:50,124 | INFO  | Attached | 
> org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
> hawtdispatch-DEFAULT-1
> 2015-02-25 21:38:50,134 | DEBUG | Taking a snapshot of the current index: 
> /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000001c9373b4.index
>  | org.apache.activemq.leveldb.LevelDBClient | Thread-3
> 2015-02-25 21:38:50,149 | DEBUG | Recovering from last index snapshot at: 
> /usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/dirty.index | 
> org.apache.activemq.leveldb.LevelDBClient | Thread-3
> 2015-02-25 21:38:50,189 | DEBUG | Replay of journal from: 0 to 479425460. | 
> org.apache.activemq.leveldb.LevelDBClient | Thread-3
> 2015-02-25 21:38:50,328 | WARN  | No reader available for position: 0, 
> log_infos: 
> {105052232=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000642f848.log,105052232,105052197),
>  
> 210104429=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000c85f06d.log,210104429,105052340),
>  
> 315156769=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/0000000012c8e921.log,315156769,105052520),
>  
> 420209289=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/00000000190be289.log,420209289,0)}
>  | org.apache.activemq.leveldb.RecordLog | Thread-3
> 2015-02-25 21:38:50,332 | WARN  | No reader available for position: 0, 
> log_infos: 
> {105052232=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000642f848.log,105052232,105052197),
>  
> 210104429=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000c85f06d.log,210104429,105052340),
>  
> 315156769=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/0000000012c8e921.log,315156769,105052520),
>  
> 420209289=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/00000000190be289.log,420209289,0)}
>  | org.apache.activemq.leveldb.RecordLog | Thread-3
> ...
> {code}
> the last message repeats infinitely and every ~1ms, with activemq process 
> taking 100% CPU.
> If I would now start the stopped node (that was the 2d master) it would log 
> exactly the same message with the same intensity.
> It should also be noted that while I can observe correct diminished 
> (relatively to initial, due to some consumption) count of messages in a queue 
> via console after the first restart, the second restart changes that and 
> queue size becomes bigger (although I'm not sure if it is equal to the size 
> of queue before any consumption).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to