[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658305#comment-16658305 ] ASF subversion and git services commented on AMQ-6590: -- Commit c6103415b9a185776ebc16343c6574f7ff884806 in activemq's branch refs/heads/activemq-5.15.x from gtully [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=c610341 ] AMQ-7082 - recover index free pages in parallel with start, merge in flush, clean shutdown if complete. follow up on AMQ-6590 > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16656900#comment-16656900 ] ASF subversion and git services commented on AMQ-6590: -- Commit 79c74998dc1efb72b05d32f920052a1df4b6dd8e in activemq's branch refs/heads/master from gtully [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=79c7499 ] AMQ-7082 - recover index free pages in parallel with start, merge in flush, clean shutdown if complete. follow up on AMQ-6590 > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655742#comment-16655742 ] Alan Protasio commented on AMQ-6590: Thanks! Just created a new Jira for that: https://issues.apache.org/jira/browse/AMQ-7080 > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655547#comment-16655547 ] Christopher L. Shannon commented on AMQ-6590: - I'm good with the checkpoint but want to see what [~gtully] thinks as he brought up some concerns about checkpointing the free list. As Jeff said we can continue the conversation against the PR in a new jira. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655537#comment-16655537 ] Alan Protasio commented on AMQ-6590: [~cshannon] What about saving the free pages in the check point? Do you think that make sense? I was talking other guys and the only concern seems to be that we will increase the Checkpoint timings (as we are writing one more file) > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655503#comment-16655503 ] Christopher L. Shannon commented on AMQ-6590: - The ACTIVEMQ_KILL_MAXSECONDS actually occurred to me as well a bit after I responded to this. I use a custom start up script but still have something similar where if the broker hangs it will be killed which obviously breaks in this case. So that case needs to be handled. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655414#comment-16655414 ] Jeff Genender commented on AMQ-6590: Also, add a unit test. The unit test can show a normal checkpoint, and you can also exercise the failure by force calling brokerService.getPersistenceAdapter().checkpoint() and then changing the db.data or db.free fingerprint with some file I/O using the File object, then restarting and testing that it did a full recovery. Should be straight forward. Open a new Jira on this :) > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655260#comment-16655260 ] Alan Protasio commented on AMQ-6590: Yep... Or we can change the fingerprint for freePageTxId or something like that and be a single increment (freePageTxId++) on every time i save db.free > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655236#comment-16655236 ] Jeff Genender commented on AMQ-6590: One more small comment, I would change from using Random as the fingerprint and perhaps use a time stamp or something. The Random could potentially have a clash and possibly could have a collision which would cause corruption. Perhaps use a Time function or time based UUID as your finger print. Since this will not be doing 2 checkpoints in under a millisecond, I think that would be the safest. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654678#comment-16654678 ] Jeff Genender commented on AMQ-6590: Hi Alan... that's awesome! That would totally work in cutting down the potential for the recovery. May I suggest that you open a new Jira to track this since this one is resolved. Right now parts of this are in multiple releases, so its own Jira may help ensure we know what release it would be a part of. Your code looks pretty clear and I think it deserves its own JIRA Please reference this one in the new JIRA. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654671#comment-16654671 ] Alan Protasio commented on AMQ-6590: Actually this fixed the failover speed, but my proposal I just makes it more streamlined > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16654661#comment-16654661 ] Alan Protasio commented on AMQ-6590: Hum I dont really think that this is solving the all problem Of course in the case of a crash (unclean shutdown) the fail over will be faster But after that, let's imagine that we wanna do a normal restart. Well the broker will have a long shutdown and while doing that it will still hold the lock making the failover slow the same way. And maybe now is worst... Imagine that the shutdown is taking 2 minutes and the ACTIVEMQ_KILL_MAXSECONDS is set to 30 secs... In this case all the shutdown will be unclean (so lots of dangling free pages - larger index file on each restart) until we delete the index file and perform a full recovery. Why don't we save the free page in every checkpoint + keep track of it's hash or fingerPrint (SequenceSetHash) in the metadata (the page size is already saved there). [https://github.com/apache/activemq/blob/5d3a3fcca7fbc04e60a146e42fb1f65b94e4ea7b/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/page/PageFile.java#L468] Now, in the load method, we can check if the fingerprint of the metadata is equal to the saved free list... if it's equal we can load this list, if it's not, we can recover it (read the whole index file as before). So in each checkpoint we should: Calculate freeList hash (or this can be pre calculated changing it every time that this list is modified) Save the free list Save this information in the metadata Now, on the load we can check if the hash of the loaded free list is the equal to the metadata... if it is we are good. If you think that this makes sense i can create a Jira to try to implement this.. This is a rough draft of what I meant and seems to be working. [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5] Is important to note that in the Pagefile#load method we need to rebuild the index in the same way it was in the checkpoint (the recovery will be performed after to replay the time btw the checkpoint and the crash) So... Imagine the timeline: T0 -> P1, P2 and P3 are free. T1 -> Checkpoint T2 -> P1 got occupied. T3 -> Crash In the current scenario after the Pagefile#load the P1 will be free and then the replay will mark P1 as occupied... My change only make sure that db.data and db.free are in sync and showing the reality in T1 (checkpoint), If they are in sync i can use the db.free. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653873#comment-16653873 ] Gary Tully commented on AMQ-6590: - [~cshannon] sure, the recovery penalty needs to be taken at some stage. The ~2mins is not ideal, AMQ-7055 helps, but also for the really large index cases, being able to set the index page size may help. That is currently not exposed in kahadb and defaults to 4k, which means loads of small seek/reads of the index file. Exposing that may help. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653872#comment-16653872 ] Christopher L. Shannon commented on AMQ-6590: - [~jgenender] , ah yeah that makes sense, on the failover case on a shared store you want to resume as fast as possible So yeah for sure this is a good chance as it helps that failover case and then you can plan later when to take that hit in a clean restart or shutdown. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653831#comment-16653831 ] Jeff Genender commented on AMQ-6590: [~cshannon], I think the criticality of this is restart on a failed broker. It could cause slowness in failover that delays startup of the broker. In a clean shut down, yes the delay moved to the back, but its a known delay when you are shutting it down in a clean fashion. Thus you can plan it. This makes sense as during a failover, you likely need availability immediately since that generally is not planned. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653818#comment-16653818 ] Christopher L. Shannon commented on AMQ-6590: - Nice catch [~gtully] , I agree that it is fine to temporarily take the space hit as it will get cleaned up on the next clean shutdown. My only question is will there now be a large delay on shutdown instead? So are we just moving the 2 minute lag time to having to wait on shutdown instead of startup? In that case it doesn’t seem to be much different on a restart case as you have to wait the 2 minutes either way > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653629#comment-16653629 ] Jeff Genender commented on AMQ-6590: Nice Gary Tully. I pushed this to 5.15.x (5.15.7) as this was supposed to be in the 5.15 line as well. Probably should have been a new improvement JIRA. Updated the versions above as well. Thank you! > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4, 5.16.0, 5.15.7 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653622#comment-16653622 ] ASF subversion and git services commented on AMQ-6590: -- Commit cee7f014fa4afca04681066946e5a2b6811141af in activemq's branch refs/heads/activemq-5.15.x from gtully [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=cee7f01 ] AMQ-6590 - rework fix to take the recovery hit on clean shutdown rather than on restart, trade off availability for short term disk usage > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653549#comment-16653549 ] Jamie goodyear commented on AMQ-6590: - Can we back port this change to the 5.15.x line as well? > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653398#comment-16653398 ] Gary Tully commented on AMQ-6590: - [~cshannon] I have move this fix to the shutdown phase b/c the recovery work can be significant on very large stores. The down side is temporary disk space usage by the index which is a better trade off IMHO. I also added an info log message to indicate that the some recovery is going on. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653395#comment-16653395 ] ASF subversion and git services commented on AMQ-6590: -- Commit 5d3a3fcca7fbc04e60a146e42fb1f65b94e4ea7b in activemq's branch refs/heads/master from gtully [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=5d3a3fc ] AMQ-6590 - rework fix to take the recovery hit on clean shutdown rather than on restart, trade off availability for short term disk usage > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653357#comment-16653357 ] Gary Tully commented on AMQ-6590: - The checkpoint won't help, it will not be possible to work with a stale free list b/c one of those may now be occupied, hense the index file will have to grow, however we can still do the full and necessary free recovery on shutdown rather than on start. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653349#comment-16653349 ] Gary Tully commented on AMQ-6590: - It turns out that this change, while good, means that we take a hit on start in the normal failover case where the primary dies uncleanly. There have been reports of more than 2mins to start and the threads are stuck in sequence.set add. AMQ-7055 helps a good bit, but the problem is we are trading off availability for disk usage and taking the hit during restart. I am thinking it may be better to do a checkpoint of the feeList when we do gc, the cleanup phase, and accept that information on restart. If the restart is unclean, we remember that and do a freeList recovery when we next do an orderly shutdown. In that way, we can still restart fast, lose some disk space to some missed free pages and gracefully recover when we are stopping. [~cshannon] I wonder if that will hold together? The other approach is do have the option to do this offline, offline work has always been on the todo. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon >Priority: Major > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850043#comment-15850043 ] Christopher L. Shannon commented on AMQ-6590: - Thanks for taking a look. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850004#comment-15850004 ] Gary Tully commented on AMQ-6590: - [~cshannon] sure. that fix does look correct and is a great catch. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849847#comment-15849847 ] Christopher L. Shannon commented on AMQ-6590: - [~gtully], Can you do a quick sanity check on this? I believe this is the correct fix. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849845#comment-15849845 ] ASF subversion and git services commented on AMQ-6590: -- Commit cc28f5dc3905e411b3e5ce39240baf21ccedb0af in activemq's branch refs/heads/activemq-5.14.x from [~cshannon] [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=cc28f5d ] https://issues.apache.org/jira/browse/AMQ-6590 Removing un-needed imports (cherry picked from commit 83511c96e5cf91f348a81ecc3c4439aa345548dd) > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849844#comment-15849844 ] ASF subversion and git services commented on AMQ-6590: -- Commit 83511c96e5cf91f348a81ecc3c4439aa345548dd in activemq's branch refs/heads/master from [~cshannon] [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=83511c9 ] https://issues.apache.org/jira/browse/AMQ-6590 Removing un-needed imports > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > Fix For: 5.15.0, 5.14.4 > > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849841#comment-15849841 ] ASF subversion and git services commented on AMQ-6590: -- Commit 6a87e13eb3d6f819b0d05ee8d16f6d955442425e in activemq's branch refs/heads/activemq-5.14.x from [~cshannon] [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=6a87e13 ] https://issues.apache.org/jira/browse/AMQ-6590 Fix KahaDB index free page recovery on unclean shutdown so that existing free pages will be tracked and not lost. (cherry picked from commit 38d85be476a06fe9a1f60b4f38232d64a6d0398a) > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AMQ-6590) KahaDB index loses track of free pages on unclean shutdown
[ https://issues.apache.org/jira/browse/AMQ-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849839#comment-15849839 ] ASF subversion and git services commented on AMQ-6590: -- Commit 38d85be476a06fe9a1f60b4f38232d64a6d0398a in activemq's branch refs/heads/master from [~cshannon] [ https://git-wip-us.apache.org/repos/asf?p=activemq.git;h=38d85be ] https://issues.apache.org/jira/browse/AMQ-6590 Fix KahaDB index free page recovery on unclean shutdown so that existing free pages will be tracked and not lost. > KahaDB index loses track of free pages on unclean shutdown > -- > > Key: AMQ-6590 > URL: https://issues.apache.org/jira/browse/AMQ-6590 > Project: ActiveMQ > Issue Type: Bug > Components: Broker >Affects Versions: 5.14.3 >Reporter: Christopher L. Shannon >Assignee: Christopher L. Shannon > > I have discovered an issue with the KahaDB index recovery after an unclean > shutdown (OOM error, kill -9, etc) that leads to excessive disk space usage. > Normally on clean shutdown the index stores the known set of free pages to > db.free and reads that in on start up to know which pages can be re-used. On > an unclean shutdown this is not written to disk so on start up the index is > supposed to scan the page file to figure out all of the free pages. > Unfortunately it turns out that this scan of the page file is being done > before the total page count value has been set so when the iterator is > created it always thinks there are 0 pages to scan. > The end result is that every time an unclean shutdown occurs all known free > pages are lost and no longer tracked. This of course means new free pages > have to be allocated and all of the existing space is now lost which will > lead to excessive index file growth over time. -- This message was sent by Atlassian JIRA (v6.3.15#6346)