[jira] [Commented] (IGNITE-8469) Non-heap memory leak for calling cluster activation multi times
[ https://issues.apache.org/jira/browse/IGNITE-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16679644#comment-16679644 ]

Maxim Muzafarov commented on IGNITE-8469:

[~mshonichev] If you are checking the 2.7 release branch, it seems to me that it's not possible to run a different cluster on the same PDS files (I don't know if this is correct). This was my case.

> Non-heap memory leak for calling cluster activation multi times
> ---
>
> Key: IGNITE-8469
> URL: https://issues.apache.org/jira/browse/IGNITE-8469
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Reporter: Maxim Muzafarov
> Assignee: Maxim Muzafarov
> Priority: Major
> Fix For: 2.7
>
> Calling cluster activation {{ig3CB.cluster().active(true);}} multiple times
> (with persistence enabled and client nodes started) leads to a non-heap
> memory leak.
> Line {{org/apache/ignite/internal/pagemem/impl/PageMemoryNoStoreImpl.java:234}}
> looks suspicious: if the method
> {{org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl#start}}
> is called multiple times (e.g. activate(true) is called multiple times), we
> lose the information about allocated regions.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-8469) Non-heap memory leak for calling cluster activation multi times
[ https://issues.apache.org/jira/browse/IGNITE-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678452#comment-16678452 ]

Max Shonichev commented on IGNITE-8469:

It seems that the scenario with two grids on the same PDS is not reproducible anymore: the second node just refuses to start and the process dies, thus no leaks, right? :)

{noformat}
[19:12:05,058][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
    at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1784)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1008)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2020)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1725)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1153)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1071)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:957)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:856)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:726)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:695)
    at org.apache.ignite.Ignition.start(Ignition.java:348)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to acquire file lock during 10 sec, (locked by [ba1467ea-0180-4f18-8151-973cb56ef4a7][]): /opt/tmp/gg/db/ign_8469/lock
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$FileLockHolder.tryLock(GridCacheDatabaseSharedManager.java:4303)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.acquireFileLock(GridCacheDatabaseSharedManager.java:593)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:496)
    at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:741)
    at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1781)
    ... 11 more
[19:12:05,059][WARNING][main][IgniteKernal] Attempt to stop starting grid. This operation cannot be guaranteed to be successful.
[19:12:05,070][INFO][main][FilePageStoreManager] Cleanup cache stores [total=0, left=0, cleanFiles=false]
[19:12:05,076][INFO][main][IgniteKernal]
>>> +--+
>>> Ignite ver.
>>> stopped OK
>>> +--+
>>> Grid uptime: 00:00:11.197
{noformat}

[~Mmuzaf] was that your case? Or did you start two Ignition instances manually within one JVM?
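The "Failed to acquire file lock during 10 sec" error in the log above can be illustrated with a minimal sketch of a lock-file guard that retries until a timeout. This is not Ignite's {{FileLockHolder}} implementation; the class {{LockFileSketch}}, the method {{tryAcquire}} and the retry interval are hypothetical names and values chosen for illustration, and only standard {{java.nio}} calls are used.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of a lock-file guard; not Ignite's FileLockHolder. */
final class LockFileSketch {
    /** Keeps the channel and its lock together so both are released at once. */
    static final class Held implements AutoCloseable {
        private final FileChannel ch;
        private final FileLock lock;

        Held(FileChannel ch, FileLock lock) {
            this.ch = ch;
            this.lock = lock;
        }

        @Override public void close() throws IOException {
            lock.release();
            ch.close();
        }
    }

    /**
     * Tries to lock the file until the timeout expires.
     * Returns null if the lock is still held elsewhere at the deadline.
     */
    static Held tryAcquire(Path lockFile, long timeoutMs) throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        FileChannel ch = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        boolean acquired = false;
        try {
            while (true) {
                FileLock lock;
                try {
                    // Returns null when another process holds the lock.
                    lock = ch.tryLock();
                } catch (OverlappingFileLockException e) {
                    // Thrown when this same JVM already holds the lock.
                    lock = null;
                }
                if (lock != null) {
                    acquired = true;
                    return new Held(ch, lock);
                }
                if (System.nanoTime() >= deadline)
                    return null;
                Thread.sleep(50); // hypothetical retry interval
            }
        } finally {
            // Caveat: on some platforms closing any channel on the file drops
            // OS-level locks held by this JVM, so real code should keep a
            // single channel per lock file.
            if (!acquired)
                ch.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("sketch", ".lock");
        Held first = tryAcquire(f, 200);
        Held second = tryAcquire(f, 200); // same file, still locked
        System.out.println("first=" + (first != null) + ", second=" + (second != null));
        if (first != null)
            first.close();
    }
}
```

Under this sketch, a second node pointed at the same lock file gets no lock before the timeout and has to abort startup, which matches the behaviour reported in the log.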
[jira] [Commented] (IGNITE-8469) Non-heap memory leak for calling cluster activation multi times
[ https://issues.apache.org/jira/browse/IGNITE-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678378#comment-16678378 ]

Max Shonichev commented on IGNITE-8469:

Thank you for the feedback. I'll try to test the scenario with two clusters as well and report later.
[jira] [Commented] (IGNITE-8469) Non-heap memory leak for calling cluster activation multi times
[ https://issues.apache.org/jira/browse/IGNITE-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678367#comment-16678367 ]

Maxim Muzafarov commented on IGNITE-8469:

[~mshonichev] Hello! Thanks for testing this issue.

The reproducer was added to the test scope and merged with the PR. Please see {{PageMemoryNoStoreLeakTest}}.

As a high-level example of this memory leak, you can look at {{IgniteChangeGlobalStateTest.testActivateAfterFailGetLock}}. If we run it 100+ times on TC without this fix, it will fail with OOM.

I don't remember all the details of the memory leak scenario, but you should configure the backup cluster (with or without client nodes), and its activation should fail while the main cluster is online. This is the source of the memory leak.

I've done several tests with and without this fix, and it seems to me that everything works fine.
[jira] [Commented] (IGNITE-8469) Non-heap memory leak for calling cluster activation multi times
[ https://issues.apache.org/jira/browse/IGNITE-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677009#comment-16677009 ]

Max Shonichev commented on IGNITE-8469:

OK, here are the final results for the activate/deactivate scenario with 100 iterations.

Without the fix, total process memory slowly increases approximately every 20 iterations:

{noformat}
10298776
...
10283356
10284384
...
10286432
{noformat}

With the fix, total process memory changes a little during the first few runs, then remains constant for all 90+ iterations:

{noformat}
14501520
10099096
10232216
10232216
10297752
...
10297752
{noformat}

So it seems that this scenario also shows that the memory leak is fixed. [~Mmuzaf], please confirm.
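The comparison above comes down to checking whether the memory samples level off. A minimal sketch of such a check follows. The class {{MemoryTrendSketch}} and the method {{plateaued}} are hypothetical, the arrays contain only the values listed in the comment (the elided samples are simply left out), and with so few samples the result is indicative at best.

```java
/** Hypothetical helper for eyeballing a series of process-memory samples. */
final class MemoryTrendSketch {
    /** Returns true if the last {@code tail} samples are all identical. */
    static boolean plateaued(long[] samples, int tail) {
        if (tail < 2 || samples.length < tail)
            return false;
        long last = samples[samples.length - 1];
        for (int i = samples.length - tail; i < samples.length; i++) {
            if (samples[i] != last)
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Only the values quoted in the comment; elided ("...") samples are omitted.
        long[] withoutFix = {10298776L, 10283356L, 10284384L, 10286432L};
        long[] withFix = {14501520L, 10099096L, 10232216L, 10232216L, 10297752L, 10297752L};

        System.out.println("without fix plateaued: " + plateaued(withoutFix, 2));
        System.out.println("with fix plateaued:    " + plateaued(withFix, 2));
    }
}
```

A real check would use all 100 samples and a longer tail, since an exact-equality test on two values is sensitive to noise.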
[jira] [Commented] (IGNITE-8469) Non-heap memory leak for calling cluster activation multi times
[ https://issues.apache.org/jira/browse/IGNITE-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676934#comment-16676934 ]

Max Shonichev commented on IGNITE-8469:

Possibly I'm seeing a different kind of leak. About this scenario:

{quote}Global scenario for this leak is:
1. Start topology with client nodes, persistence enabled.
2. Set active(true) for cluster. This activation should fail due to some circumstances (e.g. some locks exist).
3. IgniteCacheDatabaseSharedManager started and onActive method called here. New memory segment allocated for client node.
4. Set active(true) again. Activation successful, non-heap memory leak introduced.{quote}

1. Should I start client nodes with persistence enabled?!
2. How do I achieve the 'locks exist' condition to make activation fail?
3. Should IgniteCacheDatabaseSharedManager be started manually, or did you mean that it is automagically called under the hood upon active(true)?
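The quoted scenario, together with the issue description's remark that repeated {{start}} calls lose the information about allocated regions, can be modelled in a few lines. The sketch below is a simplified simulation, not Ignite code: {{Region}}, {{Allocator}}, {{LeakyMemory}} and {{GuardedMemory}} are hypothetical stand-ins, and the guard shown is only one possible remedy, since the thread does not describe the actual change made in the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of losing track of regions on repeated start(). */
final class RegionLeakSketch {
    /** Stand-in for a natively allocated memory region. */
    static final class Region {
        boolean freed;
    }

    /** Remembers every allocation so the demo can count unreleased regions. */
    static final class Allocator {
        private final List<Region> all = new ArrayList<>();

        Region allocate() {
            Region r = new Region();
            all.add(r);
            return r;
        }

        long live() {
            return all.stream().filter(r -> !r.freed).count();
        }
    }

    /** Leaky variant: every start() overwrites the only reference to the previous region. */
    static final class LeakyMemory {
        private final Allocator alloc;
        private Region region;

        LeakyMemory(Allocator alloc) { this.alloc = alloc; }

        void start() { region = alloc.allocate(); }

        void stop() {
            if (region != null) {
                region.freed = true;
                region = null;
            }
        }
    }

    /** Guarded variant: a second start() without stop() is a no-op. */
    static final class GuardedMemory {
        private final Allocator alloc;
        private Region region;
        private boolean started;

        GuardedMemory(Allocator alloc) { this.alloc = alloc; }

        void start() {
            if (started)
                return;
            region = alloc.allocate();
            started = true;
        }

        void stop() {
            if (!started)
                return;
            region.freed = true;
            region = null;
            started = false;
        }
    }

    public static void main(String[] args) {
        Allocator a1 = new Allocator();
        LeakyMemory leaky = new LeakyMemory(a1);
        leaky.start(); // activation attempt that later fails
        leaky.start(); // second, successful activation
        leaky.stop();
        System.out.println("leaky: unreleased regions = " + a1.live());

        Allocator a2 = new Allocator();
        GuardedMemory guarded = new GuardedMemory(a2);
        guarded.start();
        guarded.start();
        guarded.stop();
        System.out.println("guarded: unreleased regions = " + a2.live());
    }
}
```

In the leaky variant the region from the failed activation can no longer be reached by {{stop}}, which is the kind of loss the issue description points at.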
[jira] [Commented] (IGNITE-8469) Non-heap memory leak for calling cluster activation multi times
[ https://issues.apache.org/jira/browse/IGNITE-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676922#comment-16676922 ]

Max Shonichev commented on IGNITE-8469:

[~Mmuzaf] what is the exact reproducer for this issue? I'm executing 100 iterations of activate-deactivate and see a small increase in the total pages sum per process for both the fixed and the original version.
[jira] [Commented] (IGNITE-8469) Non-heap memory leak for calling cluster activation multi times
[ https://issues.apache.org/jira/browse/IGNITE-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482554#comment-16482554 ]

ASF GitHub Bot commented on IGNITE-8469:

Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3986

> Non-heap memory leak for calling cluster activation multi times
> ---
>
> Key: IGNITE-8469
> URL: https://issues.apache.org/jira/browse/IGNITE-8469
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Reporter: Maxim Muzafarov
> Assignee: Maxim Muzafarov
> Priority: Major
> Fix For: 2.6
>
> Calling cluster activation {{ig3CB.cluster().active(true);}} multiple times
> (with persistence enabled and client nodes started) leads to a non-heap
> memory leak.
> Line {{org/apache/ignite/internal/pagemem/impl/PageMemoryNoStoreImpl.java:234}}
> looks suspicious: if the method
> {{org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl#start}}
> is called multiple times (e.g. activate(true) is called multiple times), we
> lose the information about allocated regions.