Out-of-memory issue on single node cache with durable persistence
Hello, I recently ran into an out-of-memory error on a durable persistent cache I set up a few weeks ago. I have a single node with durable persistence enabled, as well as WAL archiving. I'm running Ignite ver. 2.8.1#20200521-sha1:86422096. I looked at the stack trace, but I couldn't get a clear fix on what part of the system ran out of memory, or what parameters I should change to fix the problem. From what I could tell of the stack dump, it looks like the WAL archive ran out of memory; but the memory usage report that occurred just a minute before the exception showed plenty of memory available. Can someone with more experience tuning Ignite memory point me towards the configuration parameters I should adjust? Below are my log and my configuration. (I have read the wiki page on memory tuning, but I'm happy to be referred back to it.)

The log, with the metrics right before the OOM exception, then the OOM exception:

[2020-11-22T19:20:39,787][INFO ][grid-timeout-worker-#22][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=2845fe3e, uptime=5 days, 15:08:38.033]
    ^-- Cluster [hosts=1, CPUs=4, servers=1, clients=0, topVer=1, minorTopVer=1]
    ^-- Network [addrs=[0:0:0:0:0:0:0:1%lo, xxx.xxx.xxx.xxx, 127.0.0.1, yyy.yyy.yyy.yyy], discoPort=47500, commPort=47100]
    ^-- CPU [CPUs=4, curLoad=0.33%, avgLoad=0.29%, GC=0%]
    ^-- Heap [used=316MB, free=62.34%, comm=812MB]
    ^-- Off-heap memory [used=4288MB, free=33.45%, allocated=6344MB]
    ^-- Page memory [pages=1085139]
    ^-- sysMemPlc region [type=internal, persistence=true, lazyAlloc=false, ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]
    ^-- default_region region [type=default, persistence=true, lazyAlloc=true, ... initCfg=256MB, maxCfg=6144MB, usedRam=4288MB, freeRam=30.2%, allocRam=6144MB, allocTotal=4240MB]
    ^-- metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false, ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.94%, allocRam=0MB, allocTotal=0MB]
    ^-- TxLog region [type=internal, persistence=true, lazyAlloc=false, ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
    ^-- Ignite persistence [used=4240MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]

[2020-11-22T19:21:15,585][ERROR][db-checkpoint-thread-#63][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.OutOfMemoryError]]
java.lang.OutOfMemoryError: null
    at sun.misc.Unsafe.allocateMemory(Native Method) ~[?:1.8.0_121]
    at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1205) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.util.GridUnsafe.allocateBuffer(GridUnsafe.java:264) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.ByteBufferExpander.<init>(ByteBufferExpander.java:36) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.<init>(AbstractWalRecordsIterator.java:125) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2701) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2637) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:944) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:920) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.initIfNeeded(CheckpointEntry.java:347) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.access$300(CheckpointEntry.java:243) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.initIfNeeded(CheckpointEntry.java:122) ~[ignite-core-2.9.0.jar:2.9.0]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.groupState(CheckpointEntry.java:104) ~[ignite-core-2.9.0.jar:2.9.0]
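[Editor's note: the trace shows an off-heap (Unsafe) buffer allocation failing while the checkpointer replays WAL archive segments, so the relevant knobs live in DataStorageConfiguration rather than on the heap. As a hedged sketch only — the property names are real Ignite 2.x settings, but every size below is a placeholder, not a recommendation — this is where those parameters sit in a Spring XML config:]

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
      <property name="defaultDataRegionConfiguration">
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <property name="persistenceEnabled" value="true"/>
          <!-- corresponds to maxCfg=6144MB for default_region in the log above -->
          <property name="maxSize" value="#{6L * 1024 * 1024 * 1024}"/>
        </bean>
      </property>
      <!-- WAL settings that bound segment size and how much archive is kept -->
      <property name="walSegmentSize" value="#{64 * 1024 * 1024}"/>
      <property name="maxWalArchiveSize" value="#{4L * 1024 * 1024 * 1024}"/>
    </bean>
  </property>
</bean>
```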
Re: Question about Ignite persistence disk space used after clearing cache
Thanks, Denis. I'll take a look at that documentation.

On Fri, Sep 25, 2020 at 6:30 PM Denis Magda wrote:

> Hi Scott,
>
> The disk space is not compacted even after you clear the entire cache. The
> compaction feature will be introduced to Ignite soon. So, the metric shows
> the allocated size. This doc section suggests an approach for the actual
> size calculation:
>
> https://www.gridgain.com/docs/latest/administrators-guide/monitoring-metrics/metrics#allocated-space-vs-actual-size-of-data
>
> -
> Denis
>
> On Fri, Sep 25, 2020 at 1:52 PM Scott Prater wrote:
>
>> I have a question about how the off-heap usage is reported when Ignite
>> persistence is configured. I have a single node set up. I stored about
>> 1GB of items in the cache, then cleared the cache (remotely, using the
>> Java thin client: ClientCache.clear()).
>>
>> I then verified that the items were no longer in the cache.
>>
>> However, when I look at the Ignite log, I do not see that the disk space
>> was freed:
>>
>> [2020-09-25T11:17:36,299][INFO ][grid-timeout-worker-#23][IgniteKernal]
>> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>>     ^-- Node [id=db4ed295, uptime=00:34:00.176]
>>     ^-- H/N/C [hosts=1, nodes=1, CPUs=8]
>>     ^-- CPU [cur=0.2%, avg=0.3%, GC=0%]
>>     ^-- PageMemory [pages=250315]
>>     ^-- Heap [used=180MB, free=94.85%, comm=438MB]
>>     ^-- Off-heap [used=989MB, free=88.35%, comm=8392MB]
>>     ^-- sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
>>     ^-- default_region region [used=989MB, free=87.92%, comm=8192MB]
>>     ^-- metastoreMemPlc region [used=0MB, free=99.94%, comm=0MB]
>>     ^-- TxLog region [used=0MB, free=100%, comm=100MB]
>>     ^-- Ignite persistence [used=998MB]
>>     ^-- sysMemPlc region [used=0MB]
>>     ^-- default_region region [used=998MB]
>>     ^-- metastoreMemPlc region [used=0MB]
>>     ^-- TxLog region [used=0MB]
>>     ^-- Outbound messages queue [size=0]
>>     ^-- Public thread pool [active=0, idle=0, qSize=0]
>>     ^-- System thread pool [active=0, idle=6, qSize=0]
>>
>> "Ignite persistence [used=998MB]" seems to indicate that 1GB of data is
>> still in the cache. Is this simply a report of the disk space *allocated*,
>> or is actual disk space in use? Is there a way to get both measurements?
>>
>> thanks,
>>
>> -- Scott
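[Editor's note: a hedged pointer on getting at both numbers. In Ignite 2.x the per-region memory metrics (e.g. the allocated size exposed as `DataRegionMetrics#getTotalAllocatedSize()` via the server-side `Ignite.dataRegionMetrics()` API) are only populated when metrics are enabled on the region. A minimal Spring XML sketch of the configuration fragment — property names are real, everything else is illustrative:]

```xml
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
  <!-- expose storage-level metrics (WAL, checkpoints, allocated size) -->
  <property name="metricsEnabled" value="true"/>
  <property name="defaultDataRegionConfiguration">
    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
      <property name="persistenceEnabled" value="true"/>
      <!-- expose per-region metrics such as TotalAllocatedSize -->
      <property name="metricsEnabled" value="true"/>
    </bean>
  </property>
</bean>
```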
Question about Ignite persistence disk space used after clearing cache
I have a question about how the off-heap usage is reported when Ignite persistence is configured. I have a single node set up. I stored about 1GB of items in the cache, then cleared the cache (remotely, using the Java thin client: ClientCache.clear()).

I then verified that the items were no longer in the cache.

However, when I look at the Ignite log, I do not see that the disk space was freed:

[2020-09-25T11:17:36,299][INFO ][grid-timeout-worker-#23][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=db4ed295, uptime=00:34:00.176]
    ^-- H/N/C [hosts=1, nodes=1, CPUs=8]
    ^-- CPU [cur=0.2%, avg=0.3%, GC=0%]
    ^-- PageMemory [pages=250315]
    ^-- Heap [used=180MB, free=94.85%, comm=438MB]
    ^-- Off-heap [used=989MB, free=88.35%, comm=8392MB]
    ^-- sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
    ^-- default_region region [used=989MB, free=87.92%, comm=8192MB]
    ^-- metastoreMemPlc region [used=0MB, free=99.94%, comm=0MB]
    ^-- TxLog region [used=0MB, free=100%, comm=100MB]
    ^-- Ignite persistence [used=998MB]
    ^-- sysMemPlc region [used=0MB]
    ^-- default_region region [used=998MB]
    ^-- metastoreMemPlc region [used=0MB]
    ^-- TxLog region [used=0MB]
    ^-- Outbound messages queue [size=0]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=6, qSize=0]

"Ignite persistence [used=998MB]" seems to indicate that 1GB of data is still in the cache. Is this simply a report of the disk space *allocated*, or is actual disk space in use? Is there a way to get both measurements?

thanks,

-- Scott
Re: Feature request: method to test active connection in Ignite thin client
Thanks, Pavel. I'll follow the discussion there.

On Sun, Sep 13, 2020 at 5:27 AM Pavel Tupitsyn wrote:

> Sorry, IgniteClient.cluster() was added for Ignite 2.9, which is not yet
> released.
> ClientCache.size() works too.
>
> I've started a discussion on the dev list [1] regarding a dedicated ping
> API.
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Thin-Client-ping-operation-td49181.html
>
> On Sat, Sep 12, 2020 at 9:33 PM Scott Prater wrote:
>
>> Hello, Pavel --
>>
>> There isn't a method cluster() in the IgniteClient java class. I came up
>> with a different workaround: I retrieve the number of cached entries on
>> the heap, and check that it's greater than -1 and no exception is thrown.
>> That's not ideal, but should work, as long as ClientCache.size() never
>> returns a negative number. Ideally I would use something like
>> clientCacheObject.ping(), which would simply send a request to the node or
>> cluster, and get back a response (or not).
>>
>> -- Scott
>>
>> On Sat, Sep 12, 2020 at 9:33 AM Scott Prater wrote:
>>
>>> Correct, something like a ping. But the state() method should serve the
>>> purpose, too. Thanks!
>>>
>>> On Sat, Sep 12, 2020 at 2:30 AM Pavel Tupitsyn wrote:
>>>
>>>> Hello Scott,
>>>>
>>>> ClientCache.getName() is a local operation, it simply returns a cached
>>>> string.
>>>>
>>>> IgniteClient.cluster().state() is a good way to check the connectivity -
>>>> it sends a lightweight request to the server.
>>>>
>>>> As I understood, you are asking for something like IgniteClient.ping(),
>>>> right?
>>>>
>>>> Thanks,
>>>> Pavel
>>>>
>>>> On Sat, Sep 12, 2020 at 2:25 AM Scott Prater wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm a new Ignite user, and with just a little bit of exposure, I'm
>>>>> quite impressed with it -- it did not take me long to get a single
>>>>> standalone remote node up and running and start using it as a durable
>>>>> memory key-value store.
>>>>>
>>>>> I created a connection pool for ClientCache in Java, using Apache
>>>>> Commons Pool 2. So far it's working well, but I had to fudge overriding
>>>>> the commons-pool2 "validateObject()" method, which is a method to test
>>>>> that your pooled object is still alive and well: I used
>>>>> clientCache.getName().equals("MY_CACHE") as a test, but I have no idea
>>>>> if this indicates whether the connection to my remote cache is active
>>>>> or not.
>>>>>
>>>>> In some future release, could you add a "isConnected()" method or
>>>>> similar to ClientCache or IgniteClient (if it makes more sense there),
>>>>> for ease of testing connections and determining when to discard bad
>>>>> client objects?
>>>>>
>>>>> thanks,
>>>>>
>>>>> -- Scott
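[Editor's note: the workaround settled on in this thread — treat any completed lightweight call as proof of a live connection — can be sketched generically. `ConnectionValidator` and its `Supplier` parameter are hypothetical names; the supplier stands in for a cheap remote call such as `ClientCache.size()` or, on Ignite 2.9+, `IgniteClient.cluster().state()`:]

```java
import java.util.function.Supplier;

// Hedged sketch of the thread's validation idea: a pooled thin-client
// connection is considered valid iff a cheap remote call completes
// without throwing. The Supplier stands in for ClientCache.size() or
// IgniteClient.cluster().state() (Ignite 2.9+); names here are
// illustrative, not Ignite API.
final class ConnectionValidator {
    static boolean isAlive(Supplier<?> cheapCall) {
        try {
            cheapCall.get();           // any answer means the server responded
            return true;
        } catch (RuntimeException e) { // thin-client failures surface as unchecked exceptions
            return false;
        }
    }
}
```

[Inside a commons-pool2 factory, `validateObject(PooledObject<IgniteClient>)` would delegate to something like this and return the boolean, so dead clients get discarded instead of handed out.]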
Re: Feature request: method to test active connection in Ignite thin client
Hello, Pavel --

There isn't a method cluster() in the IgniteClient java class. I came up with a different workaround: I retrieve the number of cached entries on the heap, and check that it's greater than -1 and no exception is thrown. That's not ideal, but should work, as long as ClientCache.size() never returns a negative number. Ideally I would use something like clientCacheObject.ping(), which would simply send a request to the node or cluster, and get back a response (or not).

-- Scott

On Sat, Sep 12, 2020 at 9:33 AM Scott Prater wrote:

> Correct, something like a ping. But the state() method should serve the
> purpose, too. Thanks!
>
> On Sat, Sep 12, 2020 at 2:30 AM Pavel Tupitsyn wrote:
>
>> Hello Scott,
>>
>> ClientCache.getName() is a local operation, it simply returns a cached
>> string.
>>
>> IgniteClient.cluster().state() is a good way to check the connectivity -
>> it sends a lightweight request to the server.
>>
>> As I understood, you are asking for something like IgniteClient.ping(),
>> right?
>>
>> Thanks,
>> Pavel
>>
>> On Sat, Sep 12, 2020 at 2:25 AM Scott Prater wrote:
>>
>>> Hello,
>>>
>>> I'm a new Ignite user, and with just a little bit of exposure, I'm quite
>>> impressed with it -- it did not take me long to get a single standalone
>>> remote node up and running and start using it as a durable memory
>>> key-value store.
>>>
>>> I created a connection pool for ClientCache in Java, using Apache
>>> Commons Pool 2. So far it's working well, but I had to fudge overriding
>>> the commons-pool2 "validateObject()" method, which is a method to test
>>> that your pooled object is still alive and well: I used
>>> clientCache.getName().equals("MY_CACHE") as a test, but I have no idea
>>> if this indicates whether the connection to my remote cache is active
>>> or not.
>>>
>>> In some future release, could you add a "isConnected()" method or
>>> similar to ClientCache or IgniteClient (if it makes more sense there),
>>> for ease of testing connections and determining when to discard bad
>>> client objects?
>>>
>>> thanks,
>>>
>>> -- Scott
Re: Feature request: method to test active connection in Ignite thin client
Correct, something like a ping. But the state() method should serve the purpose, too. Thanks!

On Sat, Sep 12, 2020 at 2:30 AM Pavel Tupitsyn wrote:

> Hello Scott,
>
> ClientCache.getName() is a local operation, it simply returns a cached
> string.
>
> IgniteClient.cluster().state() is a good way to check the connectivity -
> it sends a lightweight request to the server.
>
> As I understood, you are asking for something like IgniteClient.ping(),
> right?
>
> Thanks,
> Pavel
>
> On Sat, Sep 12, 2020 at 2:25 AM Scott Prater wrote:
>
>> Hello,
>>
>> I'm a new Ignite user, and with just a little bit of exposure, I'm quite
>> impressed with it -- it did not take me long to get a single standalone
>> remote node up and running and start using it as a durable memory
>> key-value store.
>>
>> I created a connection pool for ClientCache in Java, using Apache Commons
>> Pool 2. So far it's working well, but I had to fudge overriding the
>> commons-pool2 "validateObject()" method, which is a method to test that
>> your pooled object is still alive and well: I used
>> clientCache.getName().equals("MY_CACHE") as a test, but I have no idea if
>> this indicates whether the connection to my remote cache is active or not.
>>
>> In some future release, could you add a "isConnected()" method or similar
>> to ClientCache or IgniteClient (if it makes more sense there), for ease of
>> testing connections and determining when to discard bad client objects?
>>
>> thanks,
>>
>> -- Scott
Feature request: method to test active connection in Ignite thin client
Hello,

I'm a new Ignite user, and with just a little bit of exposure, I'm quite impressed with it -- it did not take me long to get a single standalone remote node up and running and start using it as a durable memory key-value store.

I created a connection pool for ClientCache in Java, using Apache Commons Pool 2. So far it's working well, but I had to fudge overriding the commons-pool2 "validateObject()" method, which is a method to test that your pooled object is still alive and well: I used clientCache.getName().equals("MY_CACHE") as a test, but I have no idea if this indicates whether the connection to my remote cache is active or not.

In some future release, could you add a "isConnected()" method or similar to ClientCache or IgniteClient (if it makes more sense there), for ease of testing connections and determining when to discard bad client objects?

thanks,

-- Scott