Re: failed to start ActiveMQ
OK, it sounds like I understood you correctly, then. Did you use the tools and techniques outlined in the wiki page I provided to determine which destination(s) contain the messages that are preventing the files from being deleted? Tim On Wed, Apr 4, 2018, 6:02 PM norinoswrote: > Sorry, my information is not enough. > I changed the ActiveMQ setting as follows, and restarted. > > - > offlineDurableSubscriberTimeout="12" > offlineDurableSubscriberTaskSchedule="18" > - > > In this case, the client application was not connected to the server. > So I assumed that the journal files will be deleted in a few minutes after > the server started up. > > But when the offline durable subscription cleanup is started, journal file > could not be deleted. > > Since the file was not deleted, I connected the application thought that I > could receive a message, but I could not receive the message. > > > > > -- > Sent from: > http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html >
Re: failed to start ActiveMQ
Sorry, my information is not enough. I changed the ActiveMQ setting as follows, and restarted. - offlineDurableSubscriberTimeout="12" offlineDurableSubscriberTaskSchedule="18" - In this case, the client application was not connected to the server. So I assumed that the journal files will be deleted in a few minutes after the server started up. But when the offline durable subscription cleanup is started, journal file could not be deleted. Since the file was not deleted, I connected the application thought that I could receive a message, but I could not receive the message. -- Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: failed to start ActiveMQ
Sorry, my information is not enough. I changed the ActiveMQ setting as follows, and restarted. - offlineDurableSubscriberTimeout="12" offlineDurableSubscriberTaskSchedule="18" - In this case, the client application was not connected to the server. So I assumed that the journal files will be deleted in a few minutes after the server started up. But when the offline durable subscription cleanup is started, journal file could not be deleted. Since the file was not deleted, I connected the application thought that I could receive a message, but I could not receive the message. -- Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: failed to start ActiveMQ
I'm not understanding. Are you saying that after those durable subscriptions were deleted, there were no more unconsumed messages and so the journal files should have been deleted but were not? If I've understood correctly, http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html will let you determine why the files are being kept. (Is there anything in the DLQ?) Tim On Wed, Apr 4, 2018, 4:24 AM norinoswrote: > I tried deleting db.data and db.redo files, and starting up ActiveMQ. > This try succeeded.(ActiveMQ start successfuly, and recreated db.data and > db.redo) > > But when the offline durable subscription cleanup is started, journal file > could not be deleted. > > The following message was logged to the activemq.log sometimes, and it was > no longer logged at the end. > In this state, the client cannot subscribe message. > > > > 2018-04-04 12:39:45,600 | INFO | Destroying durable subscriber due to > inactivity: > > DurableTopicSubscription-OjPgmaDmsor4oyyGGONiR18:EXACTLY_ONCE:OjPgmaDmsor4oyyGGONiR18, > id=OFFLINE:1:6137, active=false, destinations=1, total=0, pending=0, > dispatched=0, inflight=0, prefetchExtension=0 | > org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber > Cleanup Timer > 2018-04-04 12:41:06,857 | INFO | Destroying durable subscriber due to > inactivity: > > DurableTopicSubscription-xOyHFpTRyeCRG5Ymjszmi6Q:EXACTLY_ONCE:xOyHFpTRyeCRG5Ymjszmi6Q, > id=OFFLINE:1:3439, active=false, destinations=1, total=3, pending=3, > dispatched=0, inflight=0, prefetchExtension=0 | > org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber > Cleanup Timer > 2018-04-04 12:45:14,792 | INFO | Destroying durable subscriber due to > inactivity: > > DurableTopicSubscription-EySMl5bvJ5Ae2HZNqPmoizn:EXACTLY_ONCE:EySMl5bvJ5Ae2HZNqPmoizn, > id=OFFLINE:1:3459, active=false, destinations=1, total=0, pending=0, > dispatched=0, inflight=0, prefetchExtension=0 | > org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber > Cleanup Timer > > > > > > > > -- > Sent from: > http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html >
Re: failed to start ActiveMQ
I tried deleting db.data and db.redo files, and starting up ActiveMQ. This try succeeded.(ActiveMQ start successfuly, and recreated db.data and db.redo) But when the offline durable subscription cleanup is started, journal file could not be deleted. The following message was logged to the activemq.log sometimes, and it was no longer logged at the end. In this state, the client cannot subscribe message. 2018-04-04 12:39:45,600 | INFO | Destroying durable subscriber due to inactivity: DurableTopicSubscription-OjPgmaDmsor4oyyGGONiR18:EXACTLY_ONCE:OjPgmaDmsor4oyyGGONiR18, id=OFFLINE:1:6137, active=false, destinations=1, total=0, pending=0, dispatched=0, inflight=0, prefetchExtension=0 | org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber Cleanup Timer 2018-04-04 12:41:06,857 | INFO | Destroying durable subscriber due to inactivity: DurableTopicSubscription-xOyHFpTRyeCRG5Ymjszmi6Q:EXACTLY_ONCE:xOyHFpTRyeCRG5Ymjszmi6Q, id=OFFLINE:1:3439, active=false, destinations=1, total=3, pending=3, dispatched=0, inflight=0, prefetchExtension=0 | org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber Cleanup Timer 2018-04-04 12:45:14,792 | INFO | Destroying durable subscriber due to inactivity: DurableTopicSubscription-EySMl5bvJ5Ae2HZNqPmoizn:EXACTLY_ONCE:EySMl5bvJ5Ae2HZNqPmoizn, id=OFFLINE:1:3459, active=false, destinations=1, total=0, pending=0, dispatched=0, inflight=0, prefetchExtension=0 | org.apache.activemq.broker.region.TopicRegion | ActiveMQ Durable Subscriber Cleanup Timer -- Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: failed to start ActiveMQ
I've submitted https://issues.apache.org/jira/browse/AMQ-6938 to capture the fact that the broker failed to start due to the missing journal file, even though ignoreMissingJournalFiles was set to true. Please add any information to it that you think would be useful. I think the next workaround is to remove your db.data and db.redo files (the KahaDB index and recovery files) and try starting the broker. This should cause it to not know that file 531 is missing, and the index will be rebuilt based on the content of those journal files that are present. Sorry you've hit two successive bugs in the course of this work; I'm sure that's frustrating for you. Hopefully this second workaround will get your broker back up and running, and hopefully both bug reports I submitted will get implemented and will avoid this problem for future users. On the subject of the offline durable subscription cleanup, you'd see an INFO-level log line saying "Destroying durable subscriber due to inactivity: ..." if a subscriber were getting removed due to inactivity. If you've got a subscriber who should be getting removed and you don't see that log line, I'd suggest grabbing the source code and putting a breakpoint in org.apache.activemq.broker.region.TopicRegion.doCleanup(), to see why that subscription is not considered eligible for removal. The code looks pretty straightforward, so there's nothing obvious that I see that might cause it to not remove the subscription, so hopefully the debugger will indicate what the root of the problem is. Tim On Thu, Mar 22, 2018 at 11:35 PM, norinoswrote: > > You need to set ignoreMissingJournalFiles. See > > http://activemq.apache.org/kahadb.html for more information. > > Thank you for the information. > > I tryed the above settings, but the following error occured and > failed to start ActiveMQ. > > > 2018-03-22 17:21:56,218 | ERROR | Looking for key 531 but not found in > fileMap: > 2018-03-22 17:21:56,222 | ERROR | Failed to start Apache ActiveMQ > ([broker0, > null], java.io.IOException: Could not locate data file > /data/kahadb/db-531.log) | org.apache.activemq.broker.BrokerService | > WrapperSimpleAppMain > > > > > That's certainly possible. Are you sure that the durable subscription in > > question has actually been offline continuously for more than a month? It > > hasn't been sporadically connecting briefly and then disconnecting? > > My application connect and subscribe the message at the same time. > So the durable subscription has actually been offline continuously for more > than a month. > > > > > -- > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User- > f2341805.html >
Re: failed to start ActiveMQ
> You need to set ignoreMissingJournalFiles. See > http://activemq.apache.org/kahadb.html for more information. Thank you for the information. I tryed the above settings, but the following error occured and failed to start ActiveMQ. 2018-03-22 17:21:56,218 | ERROR | Looking for key 531 but not found in fileMap: 2018-03-22 17:21:56,222 | ERROR | Failed to start Apache ActiveMQ ([broker0, null], java.io.IOException: Could not locate data file /data/kahadb/db-531.log) | org.apache.activemq.broker.BrokerService | WrapperSimpleAppMain > That's certainly possible. Are you sure that the durable subscription in > question has actually been offline continuously for more than a month? It > hasn't been sporadically connecting briefly and then disconnecting? My application connect and subscribe the message at the same time. So the durable subscription has actually been offline continuously for more than a month. -- Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: failed to start ActiveMQ
On Tue, Mar 20, 2018, 4:18 AM norinoswrote: > Hi Tim. > > I deleted "db-531.log" and re-started ActiveMQ, but failed to start because > of following error. > > > > 2018-03-20 17:16:15,890 | ERROR | Looking for key 531 but not found in > fileMap: { > 29=db-29.log number = 29 , length = 33554432, . > , 482=db-482.log number = 482 , length = 33554432} > > > You need to set ignoreMissingJournalFiles. See http://activemq.apache.org/kahadb.html for more information. > you'll probably want to figure out why db-29.log is still alive (check the > > DLQ) and solve that problem. > > Yes. > It is very strange why this old file is not deleted becase I don't use DLQ. > There are tools and techniques for analyzing what destinations contain the messages in a set of KahaDB files, as described in http://activemq.apache.org/why-do-kahadb-log-files-remain-after-cleanup.html. Are they what told you that you have offline durable subscribers with unconsumed messages? If not, I'd recommend doing the analysis described on that page to know for sure. I set offlineDurableSubscriberTimeout and > offlineDurableSubscriberTaskSchedule. > > offlineDurableSubscriberTimeout="259200" > offlineDurableSubscriberTaskSchedule="8640" > > But I checked the journal files, It seems that messages sent months ago are > left without being deleted. > > Is offlineDurableSubscriberTimeout not working properly? > That's certainly possible. Are you sure that the durable subscription in question has actually been offline continuously for more than a month? It hasn't been sporadically connecting briefly and then disconnecting? Tim >
Re: failed to start ActiveMQ
Hi Tim. Sorry, because it contains confidential information, I can not attach the file to the issue. > So the question now is how you move forward. If you're able to live with > reprocessing the 511 messages that those acks acknowledged, then just > delete that file and continue on without it. But note that if you've got > 10GB in the persistence store and the only file that has a problem is the > most-recent one, you're going to very quickly hit the store limit again, > so I deleted "db-531.log" and re-started ActiveMQ, but failed to start because of following error. 2018-03-20 17:16:15,890 | ERROR | Looking for key 531 but not found in fileMap: { 29=db-29.log number = 29 , length = 33554432, . , 482=db-482.log number = 482 , length = 33554432} > you'll probably want to figure out why db-29.log is still alive (check the > DLQ) and solve that problem. Yes. It is very strange why this old file is not deleted becase I don't use DLQ. I set offlineDurableSubscriberTimeout and offlineDurableSubscriberTaskSchedule. offlineDurableSubscriberTimeout="259200" offlineDurableSubscriberTaskSchedule="8640" But I checked the journal files, It seems that messages sent months ago are left without being deleted. Is offlineDurableSubscriberTimeout not working properly? -- Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: failed to start ActiveMQ
I submitted https://issues.apache.org/jira/browse/AMQ-6931 to capture this issue. Please add any additional information you think might be useful. If possible, please attach the db-531.log file to the issue in case that helps whoever investigates the issue. So the question now is how you move forward. If you're able to live with reprocessing the 511 messages that those acks acknowledged, then just delete that file and continue on without it. But note that if you've got 10GB in the persistence store and the only file that has a problem is the most-recent one, you're going to very quickly hit the store limit again, so you'll probably want to figure out why db-29.log is still alive (check the DLQ) and solve that problem. If you can't afford to delete the data file, your best remaining option may be to try to patch the code yourself, to account for the possibility that it sounds like some messages don't have a 5-byte header. You're going to want to confirm the changes you make against the data file in question, to avoid making a code change that misinterprets your data files, so if you're going to go down that path, you'll want to get very familiar with the file format of KahaDB journal files. If neither of those options sounds good, I think your best remaining option is to delete all of the journal files and start over with an empty KahaDB instance. That way, you'll lose data, but you won't process any messages more than once. Tim On Wed, Mar 14, 2018 at 2:06 AM, norinoswrote: > >1. Download the 5.13.1 source code (via a sources JAR or Git). > >2. In your debugger, set a breakpoint on the catch block in > >DataFileAccessor.readRecord (). > >3. Attach a debugger to the broker when starting it (use suspend=y > > since > >this is occurring during initialization). > >4. When you hit the breakpoint, use the Drop To Frame feature (in > >Eclipse, or similar in whatever debugger you're using) to return to > the > >beginning of the method, then step through again to confirm that the > >initializer of the data local variable is the problem. If it is, > you'll > >find that location.getSize() is less than 5, and the question will be > > "why?" > > I tried the above. > > My kahadb consists of "db-29.log" to "db-531.log".(reached the store limit > of 10GB) > I set breakpoint at DataFileAccessor.readRecord. > > First, "db-512.log" is read and the location size is "9188". > Next, "db-531.log" is read and the location size is "0". > > Alos I checked "db-531.log" using the amq-kahadb-tool. > https://github.com/Hill30/amq-kahadb-tool > > The result is below. > > --- > Command statistics: > - Topics: 0 (messages: 0, +subscriptions: 0, -subscription: 0). > - Queues: 0 (messages: 0). > - Other messages: 511. > > Commands: > + CmdType: KAHA_ACK_MESSAGE_FILE_MAP_COMMAND (Count: 511, TotalSize: 10.95 > MB (11478593), ~AvrgSize: 21.94 KB (22463), LastBigSize: 21.94 KB (22463), > LastSize: 21.94 KB (22463)) > All commands: 511 (Total size: 10.95 MB (11478593). > > --- > > > > > > > > -- > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User- > f2341805.html >
Re: failed to start ActiveMQ
>1. Download the 5.13.1 source code (via a sources JAR or Git). >2. In your debugger, set a breakpoint on the catch block in >DataFileAccessor.readRecord (). >3. Attach a debugger to the broker when starting it (use suspend=y > since >this is occurring during initialization). >4. When you hit the breakpoint, use the Drop To Frame feature (in >Eclipse, or similar in whatever debugger you're using) to return to the >beginning of the method, then step through again to confirm that the >initializer of the data local variable is the problem. If it is, you'll >find that location.getSize() is less than 5, and the question will be > "why?" I tried the above. My kahadb consists of "db-29.log" to "db-531.log".(reached the store limit of 10GB) I set breakpoint at DataFileAccessor.readRecord. First, "db-512.log" is read and the location size is "9188". Next, "db-531.log" is read and the location size is "0". Alos I checked "db-531.log" using the amq-kahadb-tool. https://github.com/Hill30/amq-kahadb-tool The result is below. --- Command statistics: - Topics: 0 (messages: 0, +subscriptions: 0, -subscription: 0). - Queues: 0 (messages: 0). - Other messages: 511. Commands: + CmdType: KAHA_ACK_MESSAGE_FILE_MAP_COMMAND (Count: 511, TotalSize: 10.95 MB (11478593), ~AvrgSize: 21.94 KB (22463), LastBigSize: 21.94 KB (22463), LastSize: 21.94 KB (22463)) All commands: 511 (Total size: 10.95 MB (11478593). --- -- Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: failed to start ActiveMQ
On Sat, Mar 10, 2018 at 12:50 AM, norinoswrote: > Hi, Tim ! > > Yes. I'm using 5.13.1. > > I will try debugging and check the location size. > > > In this case, my KahaDB size is very large. > It seems that the message in which the offlineDurableSubscriberTimeout > occurred Uh, sorry, context please? This was not mentioned in your initial post. > has not been deleted, > so I restart ActiveMQ then this trouble is occrred. > > Do you think that the message is not deleted also because KahaDB location > size is broken? I don't really know the KahaDB code well enough to answer that question. I certainly can't rule it out, though it's quite possible that a copy of the message that doesn't have the problem might be in the memory store and this problem might occur only when loading persistent messages from disk on a broker restart. Either is possible, but I'm not sure it really matters. The ultimate problem is that either the message was invalid and we didn't detect it on receipt, or it got corrupted sometime after that; either way, if we fix that root cause, the inability to delete the message (if that's indeed what's going on) won't matter. But first let's have you confirm that an invalid message size is in fact what's going on, via the steps I provided earlier, and then we can figure out where to go from there. Also, were there any filesystem/storage issues with the disk contents of the KahaDB files around the time this message would have been received? If there's a meaningful possibility that the file was simply corrupted due to something outside the broker's control, let's know that up front. Tim
Re: failed to start ActiveMQ
Looking through your stack trace, I can see you're using 5.13.1, not 5.3.1, which is a good thing. (If you were using 5.3.1, I'd tell you no one was likely to help you with a version that old.) It looks like that exception is probably being thrown in org.apache.activemq.store.kahadb.disk.journal.DataFileAccessor.readRecord(), probably at the line where byte[] data is initialized. I advise you to do the following: 1. Download the 5.13.1 source code (via a sources JAR or Git). 2. In your debugger, set a breakpoint on the catch block in DataFileAccessor.readRecord (). 3. Attach a debugger to the broker when starting it (use suspend=y since this is occurring during initialization). 4. When you hit the breakpoint, use the Drop To Frame feature (in Eclipse, or similar in whatever debugger you're using) to return to the beginning of the method, then step through again to confirm that the initializer of the data local variable is the problem. If it is, you'll find that location.getSize() is less than 5, and the question will be "why?" 5. You can reset the Location's size to Location.NOT_SET to re-read the content from disk to confirm that the bytes really do indicate invalid content, but assuming that's the case, it's going to be hard to determine why the file has those bytes once they've already been written. 6. Were there any disk write failures or anything similar while running the broker prior to these errors occurring? 7. Do you need to keep the content in KahaDB? If not, you should be able to just delete the journal files and the index and then start with an empty broker. 8. If you absolutely can't afford to lose the content in KahaDB (except maybe the seemingly-corrupted record that it's failing to load at the moment), you'll probably need to figure out the byte signature of KahaDB and then determine where the next message starts, and either remove this invalid message by editing the file or modify the content to give a length that's valid for the location where the next message starts. Hope this helps. Tim On Thu, Mar 8, 2018 at 8:00 PM, n-swdwrote: > Hi. > > When I started ActiveMQ (5.3.1) , the following logs are output and can not > be started. > What kind of cause can be considered? > > 2018-03-09 11:43:32,468 | INFO | Refreshing > org.apache.activemq.xbean.XBeanBrokerFactory$1@2eeda025: startup date [Fri > Mar 09 11:43:32 JST 2018]; root of context hierarchy | > org.apache.activemq.xbean.XBeanBrokerFactory$1 | WrapperSimpleAppMain > 2018-03-09 11:43:35,073 | INFO | Loading properties file from URL > [file:/var/opt/FJSVimaps/domains/imaps0/imapsbroker_ > push/conf/credentials-enc.properties] > | org.jasypt.spring31.properties.EncryptablePropertyPlaceholderConfigurer > | > WrapperSimpleAppMain > 2018-03-09 11:43:37,388 | INFO | Using Persistence Adapter: > KahaDBPersistenceAdapter[/var/opt/FJSVimaps/domains/imaps0/ > imapsbroker_push/data/kahadb] > | org.apache.activemq.broker.BrokerService | WrapperSimpleAppMain > 2018-03-09 11:43:37,480 | INFO | JMX consoles can connect to > service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi | > org.apache.activemq.broker.jmx.ManagementContext | JMX connector > 2018-03-09 11:43:38,195 | INFO | KahaDB is version 6 | > org.apache.activemq.store.kahadb.MessageDatabase | WrapperSimpleAppMain > 2018-03-09 11:43:42,028 | ERROR | Failed to start Apache ActiveMQ > ([broker0, > null], java.io.IOException: Invalid location: 531:11590936, : > java.lang.NegativeArraySizeException) | > org.apache.activemq.broker.BrokerService | WrapperSimpleAppMain > 2018-03-09 11:43:42,043 | INFO | Apache ActiveMQ 5.13.1 (broker0, null) is > shutting down | org.apache.activemq.broker.BrokerService | > WrapperSimpleAppMain > 2018-03-09 11:43:42,047 | INFO | Connector mqtt+ssl stopped | > org.apache.activemq.broker.TransportConnector | WrapperSimpleAppMain > 2018-03-09 11:43:42,059 | INFO | > PListStore:[/var/opt/FJSVimaps/domains/imaps0/ > imapsbroker_push/data/broker0/tmp_storage] > stopped | org.apache.activemq.store.kahadb.plist.PListStoreImpl | > WrapperSimpleAppMain > 2018-03-09 11:43:42,062 | INFO | Stopping async queue tasks | > org.apache.activemq.store.kahadb.KahaDBStore | WrapperSimpleAppMain > 2018-03-09 11:43:42,064 | INFO | Stopping async topic tasks | > org.apache.activemq.store.kahadb.KahaDBStore | WrapperSimpleAppMain > 2018-03-09 11:43:42,067 | INFO | Stopped KahaDB | > org.apache.activemq.store.kahadb.KahaDBStore | WrapperSimpleAppMain > 2018-03-09 11:43:42,692 | INFO | Apache ActiveMQ 5.13.1 (broker0, null) > uptime 5.493 seconds | org.apache.activemq.broker.BrokerService | > WrapperSimpleAppMain > 2018-03-09 11:43:42,696 | INFO | Apache ActiveMQ 5.13.1 (broker0, null) is > shutdown | org.apache.activemq.broker.BrokerService | WrapperSimpleAppMain > 2018-03-09 11:43:42,700 | INFO | Closing > org.apache.activemq.xbean.XBeanBrokerFactory$1@2eeda025: startup