D'oh!  I thought our production machine (where the error occurred) had the
same amount of memory as our test machine, but it doesn't:  I configured
the test machine with 15GB of memory, but the production machine only has
4GB.  That would explain it.
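The arithmetic behind "that would explain it": add the worst-case size of every data region (the maxCfg values in the metrics log further down) to the heap, then compare the total with physical RAM. This is a minimal sketch with hypothetical class and method names. It uses the committed heap (comm=812MB) as a stand-in for the heap ceiling, although the real ceiling is whatever -Xmx is set to. It also ignores other native allocations, such as the WAL buffers that appear in the stack trace, so the true footprint is larger.

```java
// Hypothetical budget check: does the worst-case memory footprint fit in physical RAM?
// Figures are taken from the metrics log in this thread. This is a lower bound only:
// native buffers outside the data regions (e.g. WAL buffers) and the OS are not counted.
public class MemoryBudget {
    static final long MB = 1024L * 1024;

    /** Heap plus the maxSize of every data region, in bytes. */
    static long worstCaseBytes(long heapBytes, long... regionMaxBytes) {
        long total = heapBytes;
        for (long r : regionMaxBytes) {
            total += r;
        }
        return total;
    }

    /** True if the worst-case footprint does not exceed the given physical RAM. */
    static boolean fits(long physicalBytes, long heapBytes, long... regionMaxBytes) {
        return worstCaseBytes(heapBytes, regionMaxBytes) <= physicalBytes;
    }

    public static void main(String[] args) {
        long heap = 812 * MB;               // "comm=812MB" from the Heap metrics line
        long[] regions = {
            6144 * MB,                      // default_region maxCfg
            100 * MB, 100 * MB, 100 * MB    // sysMemPlc, metastoreMemPlc, TxLog maxCfg
        };
        System.out.println("worst case MB: " + worstCaseBytes(heap, regions) / MB);
        System.out.println("fits in 15GB:  " + fits(15 * 1024 * MB, heap, regions));
        System.out.println("fits in 4GB:   " + fits(4 * 1024 * MB, heap, regions));
    }
}
```

With these figures the worst case is 7256MB. That total fits comfortably on the 15GB test machine, but the 6GB default_region alone already exceeds the 4GB production machine.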

Thank you very much!

-- Scott

On Tue, Nov 24, 2020 at 9:23 AM Ilya Kasnacheev <[email protected]> wrote:

> Hello!
>
> It seems that you have run out of available memory: your operating system
> could not allocate more memory, even though the demand was still within
> the range permitted by the data region configuration. How much RAM do you
> have on that machine?
>
> The fact that you still have heap left is irrelevant here, since the
> allocation is for off-heap memory.
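To illustrate the distinction, the sketch below prints the JVM's remaining heap headroom next to the operating system's free and total physical memory. Data regions are allocated natively (the stack trace shows sun.misc.Unsafe.allocateMemory), so the OS figures are the more relevant ones for whether such an allocation can succeed. It assumes a HotSpot/OpenJDK runtime, where ManagementFactory.getOperatingSystemMXBean() can be cast to com.sun.management.OperatingSystemMXBean; the class name is hypothetical.

```java
import java.lang.management.ManagementFactory;

// Hypothetical illustration: JVM heap headroom and OS physical memory are separate numbers.
// Assumes a HotSpot/OpenJDK runtime exposing com.sun.management.OperatingSystemMXBean
// (the *PhysicalMemorySize getters are deprecated, not removed, in newer JDKs).
public class HeapVsOsMemory {
    static final long MB = 1024L * 1024;

    /** Heap the JVM could still use: max heap minus what is currently occupied. */
    static long heapHeadroomBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    private static com.sun.management.OperatingSystemMXBean os() {
        return (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
    }

    /** Physical memory the OS currently reports as free. */
    static long osFreePhysicalBytes() {
        return os().getFreePhysicalMemorySize();
    }

    /** Total physical memory the OS reports. */
    static long osTotalPhysicalBytes() {
        return os().getTotalPhysicalMemorySize();
    }

    public static void main(String[] args) {
        System.out.println("heap headroom MB:     " + heapHeadroomBytes() / MB);
        System.out.println("OS free physical MB:  " + osFreePhysicalBytes() / MB);
        System.out.println("OS total physical MB: " + osTotalPhysicalBytes() / MB);
    }
}
```

A large heap headroom alongside a small OS free figure is exactly the situation described above: the heap looks healthy while native allocations fail.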
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Mon, Nov 23, 2020 at 21:44, Scott Prater <[email protected]> wrote:
>
>> Hello,
>>
>> I recently ran into an out-of-memory error on a durable persistent cache
>> I set up a few weeks ago.  I have a single node with durable persistence
>> enabled, as well as WAL archiving.  I'm running Ignite ver.
>> 2.8.1#20200521-sha1:86422096.
>>
>> I looked at the stack trace, but I couldn't get a clear fix on what part
>> of the system ran out of memory, or what parameters I should change to fix
>> the problem.  From what I could tell from the stack dump, it looks like the
>> WAL archive ran out of memory, but the memory usage report logged just a
>> minute before the exception showed plenty of memory available.
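One way to read that report: the free percentages appear to be measured against the memory Ignite has set aside for its own regions, not against what the operating system can still provide. The Off-heap memory line reports allocated=6344MB, which is more than the 4GB the production machine turned out to have. A minimal sketch, with hypothetical names, that extracts the used and allocated figures from that line and compares the allocation with a given amount of physical RAM:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check over Ignite's "Off-heap memory [...]" metrics line.
// The regex matches the exact format seen in this thread's log; other versions may differ.
public class OffHeapMetricCheck {
    static final Pattern OFF_HEAP = Pattern.compile(
        "Off-heap memory \\[used=(\\d+)MB, free=([\\d.]+)%, allocated=(\\d+)MB\\]");

    /** Returns {usedMb, allocatedMb}, or null if the line is not an off-heap metrics line. */
    static long[] parse(String line) {
        Matcher m = OFF_HEAP.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(3))};
    }

    /** True if Ignite reports more off-heap memory allocated than the machine physically has. */
    static boolean exceedsPhysical(String line, long physicalMb) {
        long[] v = parse(line);
        return v != null && v[1] > physicalMb;
    }

    public static void main(String[] args) {
        String line = "^-- Off-heap memory [used=4288MB, free=33.45%, allocated=6344MB]";
        long[] v = parse(line);
        System.out.println("used=" + v[0] + "MB allocated=" + v[1] + "MB");
        System.out.println("exceeds 4GB host:  " + exceedsPhysical(line, 4 * 1024));
        System.out.println("exceeds 15GB host: " + exceedsPhysical(line, 15 * 1024));
    }
}
```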
>>
>> Can someone with more experience tuning Ignite memory point me towards
>> the configuration parameters I should adjust?  Below are my log and my
>> configuration.  (I have read the wiki page on memory tuning, but I'm happy
>> to be referred back to it.)
>>
>> The log, with the metrics right before the OOM exception, then the OOM
>> exception:
>>
>> [2020-11-22T19:20:39,787][INFO ][grid-timeout-worker-#22][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>>     ^-- Node [id=2845fe3e, uptime=5 days, 15:08:38.033]
>>     ^-- Cluster [hosts=1, CPUs=4, servers=1, clients=0, topVer=1, minorTopVer=1]
>>     ^-- Network [addrs=[0:0:0:0:0:0:0:1%lo, xxx.xxx.xxx.xxx, 127.0.0.1, yyy.yyy.yyy.yyy], discoPort=47500, commPort=47100]
>>     ^-- CPU [CPUs=4, curLoad=0.33%, avgLoad=0.29%, GC=0%]
>>     ^-- Heap [used=316MB, free=62.34%, comm=812MB]
>>     ^-- Off-heap memory [used=4288MB, free=33.45%, allocated=6344MB]
>>     ^-- Page memory [pages=1085139]
>>     ^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
>>       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]
>>     ^--   default_region region [type=default, persistence=true, lazyAlloc=true,
>>       ...  initCfg=256MB, maxCfg=6144MB, usedRam=4288MB, freeRam=30.2%, allocRam=6144MB, allocTotal=4240MB]
>>     ^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
>>       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.94%, allocRam=0MB, allocTotal=0MB]
>>     ^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,
>>       ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
>>     ^-- Ignite persistence [used=4240MB]
>>     ^-- Outbound messages queue [size=0]
>>     ^-- Public thread pool [active=0, idle=0, qSize=0]
>>     ^-- System thread pool [active=0, idle=6, qSize=0]
>> [2020-11-22T19:21:15,585][ERROR][db-checkpoint-thread-#63][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.OutOfMemoryError]]
>> java.lang.OutOfMemoryError: null
>>         at sun.misc.Unsafe.allocateMemory(Native Method) ~[?:1.8.0_121]
>>         at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1205) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.util.GridUnsafe.allocateBuffer(GridUnsafe.java:264) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.wal.ByteBufferExpander.<init>(ByteBufferExpander.java:36) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.<init>(AbstractWalRecordsIterator.java:125) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2701) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2637) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:944) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:920) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.initIfNeeded(CheckpointEntry.java:347) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.access$300(CheckpointEntry.java:243) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.initIfNeeded(CheckpointEntry.java:122) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.groupState(CheckpointEntry.java:104) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCpToEarliestCpMap(CheckpointHistory.java:242) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.addCheckpoint(CheckpointHistory.java:175) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3952) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3515) ~[ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3404) [ignite-core-2.9.0.jar:2.9.0]
>>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.9.0.jar:2.9.0]
>>         at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
>>
>> My configuration:
>>
>> <beans xmlns="http://www.springframework.org/schema/beans"
>>        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>        xsi:schemaLocation="
>>        http://www.springframework.org/schema/beans
>>        http://www.springframework.org/schema/beans/spring-beans.xsd">
>>     <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
>>       <property name="gridLogger">
>>         <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
>>           <constructor-arg type="java.lang.String" value="/etc/apache-ignite/log4j2.xml"/>
>>         </bean>
>>       </property>
>>       <property name="dataStorageConfiguration">
>>         <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>           <property name="storagePath" value="/data/ignite/persistent-cache"/>
>>           <property name="walPath" value="/data/ignite/wal"/>
>>           <property name="walArchivePath" value="/data/ignite/wal/archive"/>
>>           <property name="defaultDataRegionConfiguration">
>>             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>               <property name="name" value="default_region"/>
>>               <property name="persistenceEnabled" value="true"/>
>>               <property name="maxSize" value="#{6L * 1024 * 1024 * 1024}"/>
>>             </bean>
>>           </property>
>>           <property name="pageSize" value="#{4 * 1024}"/>
>>           <property name="maxWalArchiveSize" value="#{100L * 1024 * 1024 * 1024}"/>
>>         </bean>
>>       </property>
>>       <property name="cacheConfiguration">
>>         <list>
>>           <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>             <property name="name" value="default" />
>>             <property name="atomicityMode" value="ATOMIC" />
>>             <property name="backups" value="1"/>
>>             <property name="dataRegionName" value="default_region"/>
>>           </bean>
>>         </list>
>>       </property>
>>       <property name="discoverySpi">
>>         <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
>>           <property name="ipFinder">
>>             <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>               <property name="addresses">
>>                 <list>
>>                   <value>127.0.0.1</value>
>>                 </list>
>>               </property>
>>             </bean>
>>           </property>
>>         </bean>
>>       </property>
>>       <property name="communicationSpi">
>>         <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>>           <property name="idleConnectionTimeout" value="60000"/>
>>         </bean>
>>       </property>
>>     </bean>
>> </beans>
>>
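The default_region maxSize of 6GB is the setting that exceeds the production machine's RAM, so one option is to derive maxSize from the memory actually available. The helper below is hypothetical: it subtracts the heap, the three internal regions (100MB each in the metrics log) and an assumed OS reserve from physical RAM, then renders the result in the same SpEL style as the maxSize property in the config. The 1GB heap and 1GB reserve used in main are illustrative assumptions, not Ignite recommendations. With persistence enabled, the region holds a working set of pages backed by the files on disk, so it can be smaller than the 4240MB reported as persisted.

```java
// Hypothetical sizing helper: choose a data region maxSize that leaves room for the heap,
// Ignite's internal regions and an OS reserve. The reserve is an assumption, not an Ignite default,
// and other native allocations (e.g. WAL buffers) still need headroom on top of this.
public class RegionSizer {

    /** Physical RAM minus heap, internal regions and OS reserve, all in MB. */
    static long suggestMaxSizeMb(long physicalMb, long heapMb, long internalRegionsMb, long osReserveMb) {
        long remaining = physicalMb - heapMb - internalRegionsMb - osReserveMb;
        if (remaining <= 0) {
            throw new IllegalArgumentException("No room left for a data region: " + remaining + "MB");
        }
        return remaining;
    }

    /** Renders the size in the same SpEL style as the maxSize property in the XML config. */
    static String toXmlProperty(long sizeMb) {
        return "<property name=\"maxSize\" value=\"#{" + sizeMb + "L * 1024 * 1024}\"/>";
    }

    public static void main(String[] args) {
        // 4GB host, 1GB heap (assumed), 3 x 100MB internal regions, 1GB OS reserve (assumed)
        long sizeMb = suggestMaxSizeMb(4096, 1024, 300, 1024);
        System.out.println(sizeMb + "MB");
        System.out.println(toXmlProperty(sizeMb));
    }
}
```

For the 4GB host and these assumptions the helper yields 1748MB, i.e. `<property name="maxSize" value="#{1748L * 1024 * 1024}"/>`.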
>> Thanks in advance,
>>
>> -- Scott
>>
>
