Re: Configuring persistence for large dataset on small cluster

Sebastien Blind Mon, 15 Mar 2021 09:12:16 -0700

Definite slowdown (from 160k/sec to 8k/sec with 10 concurrent clients) and
plenty of messages like:

[07:08:16] Possible failure suppressed accordingly to a configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=SYSTEM_CRITICAL_OPERATION_TIMEOUT,
err=class o.a.i.IgniteException: Checkpoint read lock acquisition has been
timed out.]]
[07:08:16,915][SEVERE][client-connector-#85][GridCacheDatabaseSharedManager]
Checkpoint read lock acquisition has been timed out.
class
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$CheckpointReadLockTimeoutException:
Checkpoint read lock acquisition has been timed out.
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.failCheckpointReadLock(GridCacheDatabaseSharedManager.java:1728)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1654)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1819)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1734)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:300)
....

[09:04:25,831][SEVERE][tcp-disco-msg-worker-[crd]-#2-#56][G] Blocked
system-critical thread has been detected. This can lead to cluster-wide
undefined behaviour [workerName=db-checkpoint-thread,
threadName=db-checkpoint-thread-#79, blockedFor=13s]
[09:04:25] Possible failure suppressed accordingly to a configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=db-checkpoint-thread,
igniteInstanceName=null, finished=false, heartbeatTs=1615824252073]]]

On Mon, Mar 15, 2021 at 2:44 AM Stephen Darlington <
[email protected]> wrote:

> What happens after you have 150M records? You say it’s not okay, but you
> don’t say what does happen. Does it crash, slow down, or something else?
>
> With SQL and persistence, I think you’d probably need more heap space.
> Increasing the number of partitions isn’t likely to help.
>
> On 14 Mar 2021, at 20:14, Sebastien Blind <[email protected]>
> wrote:
>
> Hello Apache Ignite friends,
> I am trying to load 1 billion records into an Ignite cache with persistence
> enabled for a PoC. Each record consists only of an id, a uuid (the token) and
> a state, and I'd like to look up by id (the cache key) and by token (SQL
> queries are fine for that; a second cache keyed by uuid is another option).
>
> The experiment is running locally on a macOS laptop (16 GB memory), and I'd
> like to use as little memory and as much disk storage as possible.
> Everything works OK until ~150M records with the query index enabled, and
> ~600M without an index or side cache, so I'd be curious what settings (if
> any) could help with this setup, and how much data a single node can be
> expected to handle. Restarting my Ignite node and the loading process seems
> to help, and more data can then be stuffed into the cache.
>
> I've tried playing with settings like size and number of partitions, and
> looked for information on how to control what's on-heap vs. off-heap and on
> any overhead/sizing to take into account, without much success. Maybe the
> experiment is doomed to fail just based on the specs, but it would be good
> to understand the constraints, so any help would be much appreciated!
>
> Thanks in advance!
> Sebastien
>
> PS:
> JVM is running w/ -Xms4g -Xmx4g -server -XX:MaxMetaspaceSize=256m settings.
>
> Some snippet of the config:
> <property name="persistenceEnabled" value="true" />
> <property name="initialSize" value="#{4L * 1024 * 1024 * 1024}" />
> <property name="maxSize" value="#{4L * 1024 * 1024 * 1024}" />
> <property name="pageEvictionMode" value="RANDOM_2_LRU" />
>
> Cache configuration
> <property name="cacheConfiguration">
>   <bean class="org.apache.ignite.configuration.CacheConfiguration">
>     <!-- Set the cache name. -->
>     <property name="name" value="qa_sqr_txn" />
>     <!-- Set the cache mode. -->
>     <property name="cacheMode" value="PARTITIONED" />
>     <property name="backups" value="0" />
>     <property name="storeKeepBinary" value="true" />
>
>     <property name="affinity">
>       <bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
>         <property name="partitions" value="8192" />
>       </bean>
>     </property>
>
>     <!-- Configure query entities -->
>     <property name="queryEntities">
>       <list>
>         <bean class="org.apache.ignite.cache.QueryEntity">
>           <!-- Setting the type of the key -->
>           <property name="keyType" value="java.lang.String" />
>           <property name="keyFieldName" value="id" />
>
>           <!-- Setting type of the value -->
>           <property name="valueType" value="com.xxx.Txn" />
>
>           <property name="fields">
>             <map>
>               <entry key="id" value="java.lang.String" />
>               <entry key="token" value="java.lang.String" />
>               <entry key="state" value="java.lang.String" />
>             </map>
>           </property>
>           <!--
>           <property name="indexes">
>             <list>
>               <bean class="org.apache.ignite.cache.QueryIndex">
>                 <constructor-arg value="token" />
>               </bean>
>             </list>
>           </property>
>           -->
>         </bean>
>       </list>
>     </property>
>   </bean>
> </property>
>
>
>
>
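For a rough sense of scale in the setup quoted above, here is a back-of-the-envelope sketch in plain Java (no Ignite dependency). The record count and the 4 GiB region size come from the thread. The per-entry byte counts are purely hypothetical placeholders, since the thread gives no measured entry size, and the sketch does not model index pages or Ignite's own per-entry and per-page overhead. The class and method names are made up for illustration.

```java
public class RegionSizingSketch {
    // Taken from the thread: 1 billion records, 4 GiB data region (maxSize).
    public static final long RECORDS = 1_000_000_000L;
    public static final long REGION_MAX_BYTES = 4L * 1024 * 1024 * 1024;

    /** Total bytes for the dataset, given an ASSUMED average size per entry. */
    public static long estimatedDataBytes(long records, long assumedBytesPerEntry) {
        // multiplyExact throws on overflow instead of silently wrapping.
        return Math.multiplyExact(records, assumedBytesPerEntry);
    }

    /** Fraction of the dataset that could be memory-resident at once, capped at 1.0. */
    public static double residentFraction(long regionBytes, long dataBytes) {
        return Math.min(1.0, (double) regionBytes / dataBytes);
    }

    public static void main(String[] args) {
        // Hypothetical average entry sizes in bytes; replace with measured values.
        long[] assumedBytesPerEntry = {100L, 200L};
        double gib = 1024.0 * 1024.0 * 1024.0;

        for (long bytes : assumedBytesPerEntry) {
            long total = estimatedDataBytes(RECORDS, bytes);
            double resident = residentFraction(REGION_MAX_BYTES, total);
            System.out.printf("%d B/entry: %.1f GiB total, %.2f%% resident in a 4 GiB region%n",
                    bytes, total / gib, resident * 100.0);
        }
    }
}
```

Under any assumption in this range, only a small fraction of the data fits in the region at once, so most lookups and updates during a bulk load would have to go through disk-backed pages.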
