D'oh! I thought our production machine (where the error occurred) had the same amount of memory as our test machine, but it doesn't: I configured the test machine with 15GB of memory, but the production machine only has 4GB. That would explain it.
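For anyone finding this thread later: the metrics quoted below show default_region with maxCfg=6144MB, which is more than the 4GB of physical memory on the production machine. A minimal sketch of the corresponding change follows. It assumes a cap of 2GB, which is only an illustrative figure and not a value recommended anywhere in this thread. The property names are the ones already used in the configuration quoted below. Because persistence is enabled, the region can be smaller than the data set, with the pages that do not fit in memory kept on disk.

```xml
<!-- Fragment of the DataStorageConfiguration bean; only maxSize differs from the original. -->
<property name="defaultDataRegionConfiguration">
  <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
    <property name="name" value="default_region"/>
    <property name="persistenceEnabled" value="true"/>
    <!-- Assumption: 2GB off-heap cap for a 4GB host (was 6GB). -->
    <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
  </bean>
</property>
```

The JVM heap (comm=812MB in the metrics), the three internal regions (maxCfg=100MB each), and the operating system itself also need room, so the right value depends on what else runs on the host.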
Thank you very much!

-- Scott

On Tue, Nov 24, 2020 at 9:23 AM Ilya Kasnacheev <[email protected]> wrote:

> Hello!
>
> It seems that you have run out of available memory. I.e., your operating
> system could not allocate more memory even though the demand was still in
> the range permitted by data region configuration. How much RAM do you have
> on that machine?
>
> That you still have heap left is irrelevant here, since the allocation is
> for non-heap memory.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Mon, Nov 23, 2020 at 21:44, Scott Prater <[email protected]> wrote:
>
>> Hello,
>>
>> I recently ran into an out-of-memory error on a durable persistent cache
>> I set up a few weeks ago. I have a single node, with durable persistence
>> enabled, as well as WAL archiving. I'm running Ignite ver.
>> 2.8.1#20200521-sha1:86422096.
>>
>> I looked at the stack trace, but I couldn't get a clear fix on what part
>> of the system ran out of memory, or what parameters I should change to fix
>> the problem. From what I could tell of the stack dump, it looks like the
>> WAL archive ran out of memory; but the memory usage report that occurred
>> just a minute before the exception showed plenty of memory was available.
>>
>> Can someone with more experience tuning Ignite memory point me towards
>> the configuration parameters I should adjust? Below are my log and my
>> configuration. (I have read the wiki page on memory tuning, but I'm happy
>> to be referred back to it.)
>>
>> The log, with the metrics right before the OOM exception, then the OOM
>> exception:
>>
>> [2020-11-22T19:20:39,787][INFO ][grid-timeout-worker-#22][IgniteKernal]
>> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>>     ^-- Node [id=2845fe3e, uptime=5 days, 15:08:38.033]
>>     ^-- Cluster [hosts=1, CPUs=4, servers=1, clients=0, topVer=1, minorTopVer=1]
>>     ^-- Network [addrs=[0:0:0:0:0:0:0:1%lo, xxx.xxx.xxx.xxx, 127.0.0.1, yyy.yyy.yyy.yyy], discoPort=47500, commPort=47100]
>>     ^-- CPU [CPUs=4, curLoad=0.33%, avgLoad=0.29%, GC=0%]
>>     ^-- Heap [used=316MB, free=62.34%, comm=812MB]
>>     ^-- Off-heap memory [used=4288MB, free=33.45%, allocated=6344MB]
>>     ^-- Page memory [pages=1085139]
>>     ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
>>       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]
>>     ^--   default_region region [type=default, persistence=true, lazyAlloc=true,
>>       ...  initCfg=256MB, maxCfg=6144MB, usedRam=4288MB, freeRam=30.2%, allocRam=6144MB, allocTotal=4240MB]
>>     ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
>>       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.94%, allocRam=0MB, allocTotal=0MB]
>>     ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,
>>       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
>>     ^-- Ignite persistence [used=4240MB]
>>     ^-- Outbound messages queue [size=0]
>>     ^-- Public thread pool [active=0, idle=0, qSize=0]
>>     ^-- System thread pool [active=0, idle=6, qSize=0]
>> [2020-11-22T19:21:15,585][ERROR][db-checkpoint-thread-#63][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.OutOfMemoryError]]
>> java.lang.OutOfMemoryError: null
>>     at sun.misc.Unsafe.allocateMemory(Native Method) ~[?:1.8.0_121]
>>     at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1205) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.util.GridUnsafe.allocateBuffer(GridUnsafe.java:264) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.wal.ByteBufferExpander.<init>(ByteBufferExpander.java:36) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.<init>(AbstractWalRecordsIterator.java:125) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2701) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2637) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:944) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:920) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.initIfNeeded(CheckpointEntry.java:347) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.access$300(CheckpointEntry.java:243) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.initIfNeeded(CheckpointEntry.java:122) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.groupState(CheckpointEntry.java:104) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCpToEarliestCpMap(CheckpointHistory.java:242) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCheckpoint(CheckpointHistory.java:175) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3952) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3515) ~[ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3404) [ignite-core-2.9.0.jar:2.9.0]
>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.9.0.jar:2.9.0]
>>     at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
>>
>> My configuration:
>>
>> <beans xmlns="http://www.springframework.org/schema/beans"
>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>        xsi:schemaLocation="
>>         http://www.springframework.org/schema/beans
>>         http://www.springframework.org/schema/beans/spring-beans.xsd">
>>   <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>>     <property name="gridLogger">
>>       <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>         <constructor-arg type="java.lang.String" value="/etc/apache-ignite/log4j2.xml"/>
>>       </bean>
>>     </property>
>>     <property name="dataStorageConfiguration">
>>       <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>         <property name="storagePath" value="/data/ignite/persistent-cache"/>
>>         <property name="walPath" value="/data/ignite/wal"/>
>>         <property name="walArchivePath" value="/data/ignite/wal/archive"/>
>>         <property name="defaultDataRegionConfiguration">
>>           <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>             <property name="name" value="default_region"/>
>>             <property name="persistenceEnabled" value="true"/>
>>             <property name="maxSize" value="#{6L * 1024 * 1024 * 1024}"/>
>>           </bean>
>>         </property>
>>         <property name="pageSize" value="#{4 * 1024}"/>
>>         <property name="maxWalArchiveSize" value="#{100L * 1024 * 1024 * 1024}"/>
>>       </bean>
>>     </property>
>>     <property name="cacheConfiguration">
>>       <list>
>>         <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>           <property name="name" value="default" />
>>           <property name="atomicityMode" value="ATOMIC" />
>>           <property name="backups" value="1"/>
>>           <property name="dataRegionName" value="default_region"/>
>>         </bean>
>>       </list>
>>     </property>
>>     <property name="discoverySpi">
>>       <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>         <property name="ipFinder">
>>           <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>             <property name="addresses">
>>               <list>
>>                 <value>127.0.0.1</value>
>>               </list>
>>             </property>
>>           </bean>
>>         </property>
>>       </bean>
>>     </property>
>>     <property name="communicationSpi">
>>       <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>>         <property name="idleConnectionTimeout" value="60000"/>
>>       </bean>
>>     </property>
>>   </bean>
>> </beans>
>>
>> Thanks in advance,
>>
>> -- Scott
>>
>
