Out-of-memory issue on single node cache with durable persistence

2020-11-23 Thread Scott Prater
Hello,

I recently ran into an out-of-memory error on a durable persistent cache I
set up a few weeks ago.  I have a single node, with durable persistence
enabled, as well as WAL archiving.  I'm running Ignite ver.
2.8.1#20200521-sha1:86422096.

I looked at the stack trace, but I couldn't get a clear fix on what part of
the system ran out of memory, or what parameters I should change to fix the
problem.  From what I could tell from the stack dump, it looks like the WAL
archive ran out of memory, but the memory usage report logged just a
minute before the exception showed that plenty of memory was available.

Can someone with more experience tuning Ignite memory point me towards the
configuration parameters I should adjust?  Below are my log and my
configuration.  (I have read the wiki page on memory tuning, but I'm happy
to be referred back to it.)

The log, with the metrics right before the OOM exception, then the OOM
exception:

[2020-11-22T19:20:39,787][INFO ][grid-timeout-worker-#22][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=2845fe3e, uptime=5 days, 15:08:38.033]
^-- Cluster [hosts=1, CPUs=4, servers=1, clients=0, topVer=1, minorTopVer=1]
^-- Network [addrs=[0:0:0:0:0:0:0:1%lo, xxx.xxx.xxx.xxx, 127.0.0.1, yyy.yyy.yyy.yyy], discoPort=47500, commPort=47100]
^-- CPU [CPUs=4, curLoad=0.33%, avgLoad=0.29%, GC=0%]
^-- Heap [used=316MB, free=62.34%, comm=812MB]
^-- Off-heap memory [used=4288MB, free=33.45%, allocated=6344MB]
^-- Page memory [pages=1085139]
^--   sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
  ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%, allocRam=100MB, allocTotal=0MB]
^--   default_region region [type=default, persistence=true, lazyAlloc=true,
  ...  initCfg=256MB, maxCfg=6144MB, usedRam=4288MB, freeRam=30.2%, allocRam=6144MB, allocTotal=4240MB]
^--   metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
  ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.94%, allocRam=0MB, allocTotal=0MB]
^--   TxLog region [type=internal, persistence=true, lazyAlloc=false,
  ...  initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%, allocRam=100MB, allocTotal=0MB]
^-- Ignite persistence [used=4240MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]
[2020-11-22T19:21:15,585][ERROR][db-checkpoint-thread-#63][] Critical
system error detected. Will be handled accordingly to configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=CRITICAL_ERROR,
err=java.lang.OutOfMemoryError]]
java.lang.OutOfMemoryError: null
at sun.misc.Unsafe.allocateMemory(Native Method) ~[?:1.8.0_121]
at org.apache.ignite.internal.util.GridUnsafe.allocateMemory(GridUnsafe.java:1205) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.util.GridUnsafe.allocateBuffer(GridUnsafe.java:264) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.wal.ByteBufferExpander.<init>(ByteBufferExpander.java:36) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.wal.AbstractWalRecordsIterator.<init>(AbstractWalRecordsIterator.java:125) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2701) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2637) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:944) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:920) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.initIfNeeded(CheckpointEntry.java:347) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry$GroupStateLazyStore.access$300(CheckpointEntry.java:243) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.initIfNeeded(CheckpointEntry.java:122) ~[ignite-core-2.9.0.jar:2.9.0]
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry.groupState(CheckpointEntry.java:104) ~[ignite-core-2.9.0.jar:2.9.0]
at
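
The off-heap figures in the metrics block above can be cross-checked by
hand.  The exception was thrown from sun.misc.Unsafe.allocateMemory, a
native allocation outside the Java heap, so the "Heap" line may not be the
relevant figure.  The sketch below is illustrative only: it hard-codes the
per-region allocRam values copied from the log and assumes the reported
total is a plain sum over the four regions.  The class and method names are
hypothetical and are not part of Ignite.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class OffHeapBudget {
    /** Sums per-region sizes given in MB. */
    static long sumMb(Map<String, Long> regionsMb) {
        long total = 0;
        for (long mb : regionsMb.values()) {
            total += mb;
        }
        return total;
    }

    /** Free share of a region in percent, from used and maximum sizes in MB. */
    static double freePercent(long usedMb, long maxMb) {
        return 100.0 * (maxMb - usedMb) / maxMb;
    }

    public static void main(String[] args) {
        // allocRam values copied from the metrics log (MB).
        Map<String, Long> allocRamMb = new LinkedHashMap<>();
        allocRamMb.put("sysMemPlc", 100L);
        allocRamMb.put("default_region", 6144L);
        allocRamMb.put("metastoreMemPlc", 0L);
        allocRamMb.put("TxLog", 100L);

        long offHeapAllocatedMb = sumMb(allocRamMb);
        long heapCommittedMb = 812; // Heap [comm=812MB] from the same log

        System.out.println("off-heap allocated MB: " + offHeapAllocatedMb);
        System.out.println(String.format(Locale.ROOT,
            "default_region free: %.1f%%", freePercent(4288, 6144)));
        // Assumption: heap plus data regions is only a lower bound on the
        // memory the process needs from the operating system.
        System.out.println("heap + off-heap MB: " + (heapCommittedMb + offHeapAllocatedMb));
    }
}
```

The 6344 MB sum agrees with "allocated=6344MB" in the log, and the 30.2%
result agrees with "freeRam=30.2%" for default_region.  The combined
7156 MB is a lower bound under these assumptions; buffers allocated
directly through Unsafe, as in the stack trace, presumably come on top of
it and have to fit in whatever the host has left.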

Re: Question about Ignite persistence disk space used after clearing cache

2020-09-26 Thread Scott Prater
Thanks, Denis.  I'll take a look at that documentation.

On Fri, Sep 25, 2020 at 6:30 PM Denis Magda wrote:

> Hi Scott,
>
> The disk space is not compacted even after you clear the entire cache. The
> compaction feature will be introduced to Ignite soon. So, the metric shows
> the allocated size. This doc section suggests an approach for the actual
> size calculation:
>
> https://www.gridgain.com/docs/latest/administrators-guide/monitoring-metrics/metrics#allocated-space-vs-actual-size-of-data
>
>
>
> -
> Denis
>
>
> On Fri, Sep 25, 2020 at 1:52 PM Scott Prater wrote:
>
>> I have a question about how the off-heap usage is reported when Ignite
>> persistence is configured.  I have a single node set up.  I stored about
>> 1GB of items in the cache, then cleared the cache (remotely, using the Java
>> thin client:  ClientCache.clear()).
>>
>> I then verified that the items were no longer in the cache.
>>
>> However, when I look at the Ignite log, I do not see that the disk space
>> was freed:
>>
>> [2020-09-25T11:17:36,299][INFO ][grid-timeout-worker-#23][IgniteKernal]
>> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>> ^-- Node [id=db4ed295, uptime=00:34:00.176]
>> ^-- H/N/C [hosts=1, nodes=1, CPUs=8]
>> ^-- CPU [cur=0.2%, avg=0.3%, GC=0%]
>> ^-- PageMemory [pages=250315]
>> ^-- Heap [used=180MB, free=94.85%, comm=438MB]
>> ^-- Off-heap [used=989MB, free=88.35%, comm=8392MB]
>> ^--   sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
>> ^--   default_region region [used=989MB, free=87.92%, comm=8192MB]
>> ^--   metastoreMemPlc region [used=0MB, free=99.94%, comm=0MB]
>> ^--   TxLog region [used=0MB, free=100%, comm=100MB]
>> ^-- Ignite persistence [used=998MB]
>> ^--   sysMemPlc region [used=0MB]
>> ^--   default_region region [used=998MB]
>> ^--   metastoreMemPlc region [used=0MB]
>> ^--   TxLog region [used=0MB]
>> ^-- Outbound messages queue [size=0]
>> ^-- Public thread pool [active=0, idle=0, qSize=0]
>> ^-- System thread pool [active=0, idle=6, qSize=0]
>>
>> "Ignite persistence [used=998MB]" seems to indicate that 1GB of data is
>> still in the cache.  Is this simply a report of the disk space *allocated*,
>> or is actual disk space in use?  Is there a way to get both measurements?
>>
>> thanks,
>>
>> -- Scott
>>
>
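
To illustrate the distinction Denis draws: the figure in the log counts
space allocated in page-sized units, which does not shrink when entries are
removed.  A rough estimate of the live data can be had by scaling the
allocated size by an average page fill factor.  The sketch below is an
assumption-laden illustration, not the documented procedure: the 4096-byte
page size and the 0.05 fill factor are hypothetical inputs, the page count
from the log is used purely as sample data, and the class and method names
are made up.  In a real deployment the fill factor would have to come from
the data region metrics.

```java
class DataSizeEstimate {
    /** Bytes allocated for a region: allocated pages times page size. */
    static long allocatedBytes(long allocatedPages, int pageSizeBytes) {
        return allocatedPages * pageSizeBytes;
    }

    /**
     * Rough live-data estimate: allocated bytes scaled by the average
     * page fill factor, which must lie in the range 0..1.
     */
    static long estimatedDataBytes(long allocatedPages, int pageSizeBytes, double fillFactor) {
        if (fillFactor < 0.0 || fillFactor > 1.0) {
            throw new IllegalArgumentException("fillFactor must be in [0, 1]");
        }
        return Math.round(allocatedBytes(allocatedPages, pageSizeBytes) * fillFactor);
    }

    public static void main(String[] args) {
        long pages = 250_315;  // PageMemory [pages=250315], used as sample input
        int pageSize = 4096;   // assumption: page size in bytes
        double fill = 0.05;    // hypothetical fill factor after clearing the cache

        System.out.println("allocated bytes: " + allocatedBytes(pages, pageSize));
        System.out.println("estimated data bytes: " + estimatedDataBytes(pages, pageSize, fill));
    }
}
```

With a low fill factor the estimate falls far below the allocated size,
which is consistent with the log still reporting roughly 1GB after the
cache was cleared.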


Question about Ignite persistence disk space used after clearing cache

2020-09-25 Thread Scott Prater
I have a question about how the off-heap usage is reported when Ignite
persistence is configured.  I have a single node set up.  I stored about
1GB of items in the cache, then cleared the cache (remotely, using the Java
thin client:  ClientCache.clear()).

I then verified that the items were no longer in the cache.

However, when I look at the Ignite log, I do not see that the disk space
was freed:

[2020-09-25T11:17:36,299][INFO ][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=db4ed295, uptime=00:34:00.176]
^-- H/N/C [hosts=1, nodes=1, CPUs=8]
^-- CPU [cur=0.2%, avg=0.3%, GC=0%]
^-- PageMemory [pages=250315]
^-- Heap [used=180MB, free=94.85%, comm=438MB]
^-- Off-heap [used=989MB, free=88.35%, comm=8392MB]
^--   sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
^--   default_region region [used=989MB, free=87.92%, comm=8192MB]
^--   metastoreMemPlc region [used=0MB, free=99.94%, comm=0MB]
^--   TxLog region [used=0MB, free=100%, comm=100MB]
^-- Ignite persistence [used=998MB]
^--   sysMemPlc region [used=0MB]
^--   default_region region [used=998MB]
^--   metastoreMemPlc region [used=0MB]
^--   TxLog region [used=0MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]

"Ignite persistence [used=998MB]" seems to indicate that 1GB of data is
still in the cache.  Is this simply a report of the disk space *allocated*,
or is actual disk space in use?  Is there a way to get both measurements?

thanks,

-- Scott


Re: Feature request: method to test active connection in Ignite thin client

2020-09-13 Thread Scott Prater
Thanks, Pavel.  I'll follow the discussion there.

On Sun, Sep 13, 2020 at 5:27 AM Pavel Tupitsyn wrote:

> Sorry, IgniteClient.cluster() was added for Ignite 2.9, which is not yet
> released.
> ClientCache.size() works too.
>
> I've started a discussion on the dev list [1] regarding a dedicated ping
> API.
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Thin-Client-ping-operation-td49181.html
>
> On Sat, Sep 12, 2020 at 9:33 PM Scott Prater wrote:
>
>> Hello, Pavel --
>>
>> There isn't a method cluster() in the IgniteClient Java class.  I came up
>> with a different workaround:  I retrieve the number of cached entries on
>> the heap, and check that it's greater than -1 and no exception is thrown.
>> That's not ideal, but should work, as long as ClientCache.size() never
>> returns a negative number.  Ideally I would use something like
>> clientCacheObject.ping(), which would simply send a request to the node or
>> cluster, and get back a response (or not).
>>
>> -- Scott
>>
>>
>> On Sat, Sep 12, 2020 at 9:33 AM Scott Prater wrote:
>>
>>> Correct, something like a ping.  But the state() method should serve the
>>> purpose, too.  Thanks!
>>>
>>> On Sat, Sep 12, 2020 at 2:30 AM Pavel Tupitsyn wrote:
>>>
>>>> Hello Scott,
>>>>
>>>> ClientCache.getName() is a local operation; it simply returns a cached
>>>> string.
>>>>
>>>> IgniteClient.cluster().state() is a good way to check the connectivity:
>>>> it sends a lightweight request to the server.
>>>>
>>>> As I understand it, you are asking for something like
>>>> IgniteClient.ping(), right?
>>>>
>>>> Thanks,
>>>> Pavel
>>>>
>>>> On Sat, Sep 12, 2020 at 2:25 AM Scott Prater wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm a new Ignite user, and with just a little bit of exposure, I'm
>>>>> quite impressed with it -- it did not take me long to get a
>>>>> single standalone remote node up and running and start using it as a
>>>>> durable memory key-value store.
>>>>>
>>>>> I created a connection pool for ClientCache in Java, using Apache
>>>>> Commons Pool 2.  So far it's working well, but I had to fudge overriding
>>>>> the commons-pool2 "validateObject()" method, which is a method to test
>>>>> that your pooled object is still alive and well:  I used
>>>>> clientCache.getName().equals("MY_CACHE") as a test, but I have no idea if
>>>>> this indicates whether the connection to my remote cache is active or not.
>>>>>
>>>>> In some future release, could you add an "isConnected()" method or
>>>>> similar to ClientCache or IgniteClient (if it makes more sense there), for
>>>>> ease of testing connections and determining when to discard bad client
>>>>> objects?
>>>>>
>>>>> thanks,
>>>>>
>>>>> -- Scott
>>>>>
>>>>>
>>>>>
>>>>
>>>>


Re: Feature request: method to test active connection in Ignite thin client

2020-09-12 Thread Scott Prater
Hello, Pavel --

There isn't a method cluster() in the IgniteClient Java class.  I came up
with a different workaround:  I retrieve the number of cached entries on
the heap, and check that it's greater than -1 and no exception is thrown.
That's not ideal, but should work, as long as ClientCache.size() never
returns a negative number.  Ideally I would use something like
clientCacheObject.ping(), which would simply send a request to the node or
cluster, and get back a response (or not).

-- Scott
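
The workaround described above can be sketched as a small helper: treat the
connection as alive only when a call that actually reaches the server
returns normally with a non-negative count.  This is a minimal sketch under
stated assumptions.  The class name ConnectionProbe and the method isAlive
are hypothetical, and the Callable stands in for a real round trip such as
() -> clientCache.size(); no Ignite or commons-pool2 classes are used here
so that the example stays self-contained.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class ConnectionProbe {
    /**
     * Returns true only if the probe completes without throwing and
     * yields a non-negative count.  Any exception counts as "not alive".
     */
    static boolean isAlive(Callable<Integer> probe) {
        try {
            Integer count = probe.call();
            return count != null && count >= 0;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a server round trip; a pool's validateObject()
        // could return isAlive(() -> clientCache.size()) instead.
        Callable<Integer> healthy = () -> 42;
        Callable<Integer> broken = () -> {
            throw new IOException("connection reset");
        };

        System.out.println(isAlive(healthy)); // true
        System.out.println(isAlive(broken));  // false
    }
}
```

The point of the design is that the probe must cross the network: a purely
local call such as getName() would pass this check even when the server is
unreachable.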


On Sat, Sep 12, 2020 at 9:33 AM Scott Prater wrote:

> Correct, something like a ping.  But the state() method should serve the
> purpose, too.  Thanks!
>
> On Sat, Sep 12, 2020 at 2:30 AM Pavel Tupitsyn wrote:
>
>> Hello Scott,
>>
>> ClientCache.getName() is a local operation; it simply returns a cached
>> string.
>>
>> IgniteClient.cluster().state() is a good way to check the connectivity:
>> it sends a lightweight request to the server.
>>
>> As I understand it, you are asking for something like IgniteClient.ping(),
>> right?
>>
>> Thanks,
>> Pavel
>>
>> On Sat, Sep 12, 2020 at 2:25 AM Scott Prater wrote:
>>
>>> Hello,
>>>
>>> I'm a new Ignite user, and with just a little bit of exposure, I'm quite
>>> impressed with it -- it did not take me long to get a single standalone
>>> remote node up and running and start using it as a durable memory key-value
>>> store.
>>>
>>> I created a connection pool for ClientCache in Java, using Apache
>>> Commons Pool 2.  So far it's working well, but I had to fudge overriding
>>> the commons-pool2 "validateObject()" method, which is a method to test that
>>> your pooled object is still alive and well:  I used
>>> clientCache.getName().equals("MY_CACHE") as a test, but I have no idea if
>>> this indicates whether the connection to my remote cache is active or not.
>>>
>>> In some future release, could you add an "isConnected()" method or
>>> similar to ClientCache or IgniteClient (if it makes more sense there), for
>>> ease of testing connections and determining when to discard bad client
>>> objects?
>>>
>>> thanks,
>>>
>>> -- Scott
>>>
>>>
>>>
>>
>>


Re: Feature request: method to test active connection in Ignite thin client

2020-09-12 Thread Scott Prater
Correct, something like a ping.  But the state() method should serve the
purpose, too.  Thanks!

On Sat, Sep 12, 2020 at 2:30 AM Pavel Tupitsyn wrote:

> Hello Scott,
>
> ClientCache.getName() is a local operation; it simply returns a cached
> string.
>
> IgniteClient.cluster().state() is a good way to check the connectivity:
> it sends a lightweight request to the server.
>
> As I understand it, you are asking for something like IgniteClient.ping(),
> right?
>
> Thanks,
> Pavel
>
> On Sat, Sep 12, 2020 at 2:25 AM Scott Prater wrote:
>
>> Hello,
>>
>> I'm a new Ignite user, and with just a little bit of exposure, I'm quite
>> impressed with it -- it did not take me long to get a single standalone
>> remote node up and running and start using it as a durable memory key-value
>> store.
>>
>> I created a connection pool for ClientCache in Java, using Apache Commons
>> Pool 2.  So far it's working well, but I had to fudge overriding the
>> commons-pool2 "validateObject()" method, which is a method to test that
>> your pooled object is still alive and well:  I used
>> clientCache.getName().equals("MY_CACHE") as a test, but I have no idea if
>> this indicates whether the connection to my remote cache is active or not.
>>
>> In some future release, could you add an "isConnected()" method or similar
>> to ClientCache or IgniteClient (if it makes more sense there), for ease of
>> testing connections and determining when to discard bad client objects?
>>
>> thanks,
>>
>> -- Scott
>>
>>
>>
>
>


Feature request: method to test active connection in Ignite thin client

2020-09-11 Thread Scott Prater
Hello,

I'm a new Ignite user, and with just a little bit of exposure, I'm quite
impressed with it -- it did not take me long to get a single standalone
remote node up and running and start using it as a durable memory key-value
store.

I created a connection pool for ClientCache in Java, using Apache Commons
Pool 2.  So far it's working well, but I had to fudge overriding the
commons-pool2 "validateObject()" method, which is a method to test that
your pooled object is still alive and well:  I used
clientCache.getName().equals("MY_CACHE") as a test, but I have no idea if
this indicates whether the connection to my remote cache is active or not.

In some future release, could you add an "isConnected()" method or similar
to ClientCache or IgniteClient (if it makes more sense there), for ease of
testing connections and determining when to discard bad client objects?

thanks,

-- Scott