[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358389#comment-16358389 ]

Anton Vinogradov commented on IGNITE-7019:
------------------------------------------

[~cyberdemon], please ping me once TC has been fully checked and I'll merge the changes.

[~gvvinblade], thanks a lot for the final review!

> Cluster can not survive after IgniteOOM
> ---------------------------------------
>
>                 Key: IGNITE-7019
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7019
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.3
>            Reporter: Mikhail Cherkasov
>            Assignee: Dmitriy Sorokin
>            Priority: Critical
>              Labels: iep-7
>             Fix For: 2.5
>
>
> Even with full sync mode and a transactional cache, we can't add new nodes
> if there was an IgniteOOM: after adding new nodes and rebalancing, old nodes
> can't evict partitions:
> {code}
> [2017-11-17 20:02:24,588][ERROR][sys-#65%DR1%][GridDhtPreloader] Partition eviction failed, this can cause grid hang.
> class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Not enough memory allocated [policyName=100MB_Region_Eviction, size=104.9 MB]
> Consider increasing memory policy size, enabling evictions, adding more nodes to the cluster, reducing number of backups or reducing model size.
>     at org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.allocatePage(PageMemoryNoStoreImpl.java:294)
>     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePageNoReuse(DataStructure.java:117)
>     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePage(DataStructure.java:105)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.addStripe(PagesList.java:413)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.getPageForPut(PagesList.java:528)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.put(PagesList.java:617)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.addForRecycle(FreeListImpl.java:582)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.reuseFreePages(BPlusTree.java:3847)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.releaseAll(BPlusTree.java:4106)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.access$6900(BPlusTree.java:3166)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1782)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1567)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1387)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:892)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:750)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
>     at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
>     at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> Discussion on the dev list:
> http://apache-ignite-developers.2346864.n4.nabble.com/How-properly-handle-IgniteOOM-td25288.html

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358365#comment-16358365 ]

Igor Seliverstov commented on IGNITE-7019:
------------------------------------------

[~cyberdemon], looks OK
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355742#comment-16355742 ]

Igor Seliverstov commented on IGNITE-7019:
------------------------------------------

[~cyberdemon], I've left a couple of comments on GitHub.
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343239#comment-16343239 ]

Dmitriy Sorokin commented on IGNITE-7019:
-----------------------------------------

[~avinogradov], Review my patch, please.
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340947#comment-16340947 ]

Dmitriy Sorokin commented on IGNITE-7019:
-----------------------------------------

The final solution, as implemented, passes the ReuseBag instance as a parameter through PagesList's getPageForPut and addStripe methods to the allocatePage method. That allows the ReuseBag's pages to be used before trying to allocate new pages.
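
The idea in Dmitriy Sorokin's comment above (hand the ReuseBag down to allocatePage so that freed pages are consumed before new ones are requested) can be illustrated with a small stand-alone model. This is only a sketch under assumptions: the classes `Region`, `ReuseBag` and `RegionFullException` below are simplified stand-ins written for this example, not Ignite's real classes or signatures, and page id 0 is assumed to mean "no page".

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, hypothetical model of the fix; these are not Ignite's actual classes. */
final class ReuseBagSketch {
    /** Stand-in for IgniteOutOfMemoryException: the region has no free pages left. */
    static final class RegionFullException extends RuntimeException {
        RegionFullException(String msg) { super(msg); }
    }

    /** Fixed-capacity page region: allocation fails once the limit is reached. */
    static final class Region {
        private final int maxPages;
        private int allocated;

        Region(int maxPages) { this.maxPages = maxPages; }

        long allocateNew() {
            if (allocated == maxPages)
                throw new RegionFullException("Not enough memory allocated");
            return ++allocated; // page ids start at 1; 0 means "no page"
        }
    }

    /** Pages released by the current remove operation and available for reuse. */
    static final class ReuseBag {
        private final Deque<Long> pages = new ArrayDeque<>();

        void add(long pageId) { pages.push(pageId); }

        /** Returns a reusable page id, or 0 if the bag is empty. */
        long poll() { return pages.isEmpty() ? 0L : pages.pop(); }

        int size() { return pages.size(); }
    }

    /** Takes a page from the bag first and asks the region only when the bag is empty. */
    static long allocatePage(Region region, ReuseBag bag) {
        long pageId = bag != null ? bag.poll() : 0L;
        return pageId != 0L ? pageId : region.allocateNew();
    }

    /** Creating a stripe needs one page; the bag is simply passed through. */
    static long addStripe(Region region, ReuseBag bag) {
        return allocatePage(region, bag);
    }

    public static void main(String[] args) {
        Region region = new Region(2);
        long freed = region.allocateNew();
        region.allocateNew(); // the region is now full

        ReuseBag bag = new ReuseBag();
        bag.add(freed); // page freed by a remove operation

        // Succeeds without touching the full region because the bag is consulted first.
        long stripe = addStripe(region, bag);
        System.out.println("stripe page = " + stripe + ", bag size = " + bag.size());
    }
}
```

In this model, calling `addStripe` on a full region with an empty bag still throws, which mirrors the original stack trace; with a non-empty bag the same call succeeds.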
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334085#comment-16334085 ]

Dmitriy Sorokin commented on IGNITE-7019:
-----------------------------------------

We discussed possible solutions with [~mcherkasov] and [~avinogradov], and chose the following: first, when an IgniteOutOfMemoryException (IOOME) occurs while moving a page from a bucket with a lower index to a higher one, we leave the page in the old bucket; second, we add a periodic task that looks up such pages (placed in the wrong buckets) and corrects their placement when possible (no IOOME on page moving).

We also need a reproducer for this bug; I'll write it first.
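
The two-step plan in the comment above (leave the page in its old bucket on IOOME, then let a periodic task retry) can be sketched as follows. Everything here is an assumption made for illustration: the class `BucketRepairSketch`, its methods, and the rule that each move costs one bookkeeping page are a toy model, not Ignite's free-list code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Toy model: pages live in numbered buckets, and a move may fail when memory is exhausted. */
final class BucketRepairSketch {
    /** Stand-in for IgniteOutOfMemoryException. */
    static final class OutOfPagesException extends RuntimeException { }

    private final Map<Long, Integer> current = new HashMap<>(); // page id -> actual bucket
    private final Map<Long, Integer> wanted = new HashMap<>();  // misplaced page id -> target bucket
    private int freePages;                                      // pages left for bookkeeping

    BucketRepairSketch(int freePages) { this.freePages = freePages; }

    synchronized void put(long pageId, int bucket) { current.put(pageId, bucket); }

    synchronized void release(int pages) { freePages += pages; }

    synchronized int bucketOf(long pageId) { return current.get(pageId); }

    synchronized int misplacedCount() { return wanted.size(); }

    /** Assumption of this model: every move needs one fresh bookkeeping page. */
    private void allocateBookkeepingPage() {
        if (freePages == 0)
            throw new OutOfPagesException();
        freePages--;
    }

    /** Step 1: on failure the page stays in its old bucket and the target is remembered. */
    synchronized boolean move(long pageId, int newBucket) {
        try {
            allocateBookkeepingPage();
        } catch (OutOfPagesException e) {
            wanted.put(pageId, newBucket);
            return false;
        }
        current.put(pageId, newBucket);
        wanted.remove(pageId);
        return true;
    }

    /** Step 2: retry every misplaced page; returns how many were corrected in this pass. */
    synchronized int fixMisplaced() {
        int fixed = 0;
        for (Map.Entry<Long, Integer> e : new HashMap<>(wanted).entrySet())
            if (move(e.getKey(), e.getValue()))
                fixed++;
        return fixed;
    }

    public static void main(String[] args) throws InterruptedException {
        BucketRepairSketch list = new BucketRepairSketch(0); // memory already exhausted
        list.put(42L, 1);
        System.out.println("moved immediately: " + list.move(42L, 3));

        // The periodic task simply reruns the repair pass.
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(list::fixMisplaced, 0, 50, TimeUnit.MILLISECONDS);

        list.release(1); // memory becomes available later
        Thread.sleep(300);
        exec.shutdownNow();
        System.out.println("bucket after repair: " + list.bucketOf(42L));
    }
}
```

Note that the next comment in the thread describes a different final solution (passing the ReuseBag down to allocatePage), so this sketch reflects the plan as discussed at this point, not what was merged.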
[jira] [Commented] (IGNITE-7019) Cluster can not survive after IgniteOOM
[ https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290065#comment-16290065 ]

Denis Magda commented on IGNITE-7019:
-------------------------------------

This problem is related to the discussion around Ignite internal problems and their possible resolution:
http://apache-ignite-developers.2346864.n4.nabble.com/Internal-problems-requiring-graceful-node-shutdown-reboot-etc-td24856.html

Referring to that discussion, I would define a special IgniteFailureAction in response to IgniteOOM (IgniteFailureCause in terms of the new API). The action can purge or wipe out the page memory, or take other extra steps.

> Cluster can not survive after IgniteOOM
> ---------------------------------------
>
>                 Key: IGNITE-7019
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7019
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.3
>            Reporter: Mikhail Cherkasov
>            Priority: Critical
>             Fix For: 2.4
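
Denis Magda's suggestion above (map a failure cause such as IgniteOOM to a dedicated action) could look roughly like the sketch below. The comment names only IgniteFailureAction and IgniteFailureCause as part of a proposed API and gives no signatures, so every type here (`Cause`, `Action`, `Node`, `FailureActionSketch`) is hypothetical and written purely for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical cause-to-action dispatch; not an existing Ignite API. */
final class FailureActionSketch {
    /** Hypothetical failure causes; only the IgniteOOM case comes from the comment. */
    enum Cause { IGNITE_OOM, OTHER }

    /** Hypothetical reaction to a failure. */
    interface Action { void apply(Node node); }

    /** Toy node that tracks a page count in place of real page memory. */
    static final class Node {
        int usedPages;
        boolean stopped;

        Node(int usedPages) { this.usedPages = usedPages; }
    }

    private final Map<Cause, Action> actions = new EnumMap<>(Cause.class);
    private final Action defaultAction;

    FailureActionSketch(Action defaultAction) { this.defaultAction = defaultAction; }

    void register(Cause cause, Action action) { actions.put(cause, action); }

    /** Runs the action registered for the cause, or the default one if none is registered. */
    void onFailure(Cause cause, Node node) {
        actions.getOrDefault(cause, defaultAction).apply(node);
    }

    public static void main(String[] args) {
        // Default reaction: stop the node. For IgniteOOM: wipe the page memory instead.
        FailureActionSketch handler = new FailureActionSketch(n -> n.stopped = true);
        handler.register(Cause.IGNITE_OOM, n -> n.usedPages = 0);

        Node node = new Node(100);
        handler.onFailure(Cause.IGNITE_OOM, node);
        System.out.println("used pages = " + node.usedPages + ", stopped = " + node.stopped);
    }
}
```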