>We have observed one interesting issue with checkpointing. We are using 64 GB
>RAM, 12 CPUs, and SSDs rated at 3K IOPS / 128 MB/s. Our application fills up
>the WAL directory really fast, and hence the RAM. We made the following
>observations:
>
>0. The not-so-bad news first: processing resumes after getting stuck for
>several minutes.
>
>1. WAL and WAL archive writes are a lot faster than the writes to the work
>directory done by checkpointing. Very curious to know why this is the case:
>checkpointing writes never exceed 15 MB/s, while WAL and WAL archive writes
>go right up to the SSD's limits.
A very simple example: sequentially changing one key. The WAL receives every
single change, while the checkpoint (checkpointing, in your terms) contains
only the single key change.
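
A rough Java sketch of what I mean (the cache name, config file name and
iteration count are made up, just for illustration):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

public class WalVsCheckpointDemo {
    public static void main(String[] args) {
        // "ignite-config.xml" is a hypothetical config with persistence enabled.
        try (Ignite ignite = Ignition.start("ignite-config.xml")) {
            ignite.cluster().state(ClusterState.ACTIVE);

            IgniteCache<Integer, Long> cache = ignite.getOrCreateCache("demo");

            // 1,000,000 updates of the SAME key: every update appends a record
            // to the WAL (and later to the WAL archive), so WAL traffic grows
            // with the number of updates ...
            for (long i = 0; i < 1_000_000; i++)
                cache.put(1, i);

            // ... but only the data page holding key 1 is dirty, so the next
            // checkpoint writes roughly one page to the work directory. That is
            // why WAL/archive MB/s can be far higher than checkpoint MB/s.
        }
    }
}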
>
>2. We observed that when off-heap memory usage tends to zero, checkpointing
>takes minutes to complete, sometimes 30+ minutes, which stalls application
>writes completely on all nodes. In effect the whole cluster freezes.
It seems Ignite enables write throttling in such a case; you need some system
and cluster tuning.
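
For example, something along these lines (a minimal sketch; the sizes are
placeholders you would need to adapt to your 64 GB nodes). Giving the
checkpoint page buffer more headroom and enabling throttling lets Ignite slow
writers down gradually instead of hitting the "too many dirty pages" hard stop:

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CheckpointTuningSketch {
    public static void main(String[] args) {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default")                                    // placeholder region name
            .setPersistenceEnabled(true)
            .setMaxSize(32L * 1024 * 1024 * 1024)                  // placeholder: 32 GB off-heap
            .setCheckpointPageBufferSize(4L * 1024 * 1024 * 1024); // placeholder: 4 GB checkpoint buffer

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setWriteThrottlingEnabled(true)  // throttle writers smoothly instead of freezing the cluster
            .setCheckpointFrequency(60_000);  // checkpoint more often -> fewer dirty pages per checkpoint

        Ignition.start(new IgniteConfiguration().setDataStorageConfiguration(storage));
    }
}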
>
>3. The checkpointing thread gets stuck at the checkpoint page futures.get(),
>and after several minutes it logs this error and the grid resumes processing:
>
>"sys-stripe-0-#1" #19 prio=5 os_prio=0 cpu=86537.69ms elapsed=2166.63s
>tid=0x00007fa52a6f1000 nid=0x3b waiting on condition [0x00007fa4c58be000]
> java.lang.Thread.State: WAITING (parking)
>at jdk.internal.misc.Unsafe.park( [email protected]/Native Method)
>at java.util.concurrent.locks.LockSupport.park( [email protected]/Unknown
>Source)
>at
>org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>at
>org.apache.ignite.internal.util.future.GridFutureAdapter.getUninterruptibly(GridFutureAdapter.java:146)
>at
>org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:144)
>at
>org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1613)
>at
>org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3313)
>at
>org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:143)
>at
>org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:322)
>at
>org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:317)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
>at
>org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1908)
>at
>org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1529)
>at
>org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
>at
>org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1422)
>at
>org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
>at
>org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:569)
>at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>at java.lang.Thread.run( [email protected]/Unknown Source)
>CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
>checkpointReadWriteLock.readUnlock();
>
>if (timeout > 0 && U.currentTimeMillis() - start >= timeout)
>    failCheckpointReadLock();
>
>try {
>    pages
>        .futureFor(LOCK_RELEASED)
>        .getUninterruptibly();
>}
>
> [2022-09-09 18:58:35,148][ERROR][sys-stripe-9-#10][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out.
>class org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock$CheckpointReadLockTimeoutException: Checkpoint read lock acquisition has been timed out.
>at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:210)
>at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:108)
>at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1613)
>at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3313)
>at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:143)
>at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:322)
>[2022-09-09 18:58:35,148][INFO ][sys-stripe-7-#8][FailureProcessor] Thread
>dump is hidden due to throttling settings. Set
>IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT property to 0 to see all
>thread dumps.
>
>
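As far as I remember, the timeout you hit here is the checkpoint read lock
timeout (by default tied to the failure detection timeout). Raising it only
hides the symptom and the real fix is on the checkpointing/IO side, but for
completeness here is a sketch of where it is configured; the value is just an
example:

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ReadLockTimeoutSketch {
    public static IgniteConfiguration configure() {
        // Assumption: DataStorageConfiguration#setCheckpointReadLockTimeout is
        // available in your release; check the Javadoc before relying on it.
        return new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setCheckpointReadLockTimeout(60_000)); // example: allow 60 s before the timeout fires
    }
}
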
>4. Other nodes print the logs below during the window in which the
>problematic node is stuck at checkpointing:
>
>[2022-09-09 18:58:35,153][WARN ][push-metrics-exporter-#80][G] >>> Possible
>starvation in striped pool.
> Thread name: sys-stripe-5-#6
> Queue:
>[o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@eb9f832,
> Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
>ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1,
>arr=[351148]]]]], Message closure [msg=GridIoMessage [plc=2,
>topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=2,
>arr=[273841,273843]]]]], Message closure [msg=GridIoMessage [plc=2,
>topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridNearSingleGetRequest [futId=1662749921887, key=BinaryObjectImpl [arr=
>true, ctx=false, start=0], flags=1, topVer=AffinityTopologyVersion [topVer=14,
>minorTopVer=0], subjId=12746da1-ac0d-4ba1-933e-5aa3f92d2f68, taskNameHash=0,
>createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure
>[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false,
>timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
>[futIds=GridLongList [idx=1, arr=[351149]]]]],
>o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@110ec0fa,
> Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
>ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=10,
>arr=[414638,414655,414658,414661,414662,414663,414666,414668,414673,414678]]]]],
>
>o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@63ae8204,
>
>o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@2d3cc0b,
> Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
>ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1,
>arr=[414667]]]]], Message closure [msg=GridIoMessage [plc=2,
>topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=4,
>arr=[351159,351162,351163,351164]]]]], Message closure [msg=GridIoMessage
>[plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
>skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
>[futIds=GridLongList [idx=1, arr=[290762]]]]], Message closure
>[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false,
>timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
>[futIds=GridLongList [idx=1, arr=[400357]]]]],
>o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@71887193,
> Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
>ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true,
>ctx=false, start=0], val=BinaryObjectImpl [arr= true, ctx=false, start=0],
>prevVal=null, super=GridDhtAtomicAbstractUpdateRequest [onRes=false,
>nearNodeId=null, nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage
>[plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
>skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest
>[key=BinaryObjectImpl [arr= true, ctx=false, start=0],
>parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=1324019,
>topVer=AffinityTopologyVersion [topVer=14, minorTopVer=0],
>parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]]
> Deadlock: false
> Completed: 205703
>
>On Wed, Sep 7, 2022 at 4:25 PM Zhenya Stanilovsky via user <
>[email protected] > wrote:
>>OK, Raymond, I understand. But it seems no one has a good answer here; it
>>depends on the particular filesystem and the underlying (probably cloud)
>>storage layer implementation.
>>If you do not observe «throttling» messages (described in the previous link),
>>then things are probably fine, but of course you can benchmark your I/O
>>yourself with a third-party tool.
>>
>>>Thanks Zhenya.
>>>
>>>I have seen that the link you provided has a lot of good information on this
>>>system, but it does not discuss the checkpoint writers in any detail.
>>>
>>>I appreciate this cannot be a bottleneck; my question is more along the lines
>>>of: "If I have more checkpointing threads, will checkpoints take less time?"
>>>In our case we use AWS EFS, so if each checkpoint thread spends a relatively
>>>long time blocking on write I/O to the persistent store, then more checkpoint
>>>threads would allow more concurrent writes to take place. Of course, if the
>>>checkpoint threads themselves use async I/O and interleave I/O activities on
>>>that basis, then there may not be an opportunity for performance improvement,
>>>but I am not an expert in the Ignite code base :)
>>>
>>>Raymond.
>>>
>>>On Wed, Sep 7, 2022 at 7:51 PM Zhenya Stanilovsky via user <
>>>[email protected] > wrote:
>>>>
>>>>No, there are no such log or metrics suggestions, and as I said earlier,
>>>>this place can't become a bottleneck. If you have any performance problems,
>>>>describe them in more detail. There is also some interesting reading here [1]
>>>>
>>>>[1]
>>>>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>>>>
>>>>>Thanks Zhenya.
>>>>>
>>>>>Are there any logs or metrics that would indicate whether there is value in
>>>>>increasing the size of this pool?
>>>>>
>>>>>
>>>>>On Fri, 2 Sep 2022 at 8:20 PM, Zhenya Stanilovsky via user <
>>>>>[email protected] > wrote:
>>>>>>Hi Raymond
>>>>>>
>>>>>>The checkpoint threads are responsible for dumping modified (dirty) pages,
>>>>>>so you can consider checkpointing an I/O-bound operation only, and the pool
>>>>>>size is the number of disk-writing workers.
>>>>>>I think the default is enough and there is no need to raise it, but that is
>>>>>>up to you.
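
If you do want to experiment with the pool size anyway, a minimal sketch of
where it is set (8 is just an example value; the default is 4):

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CheckpointPoolSketch {
    public static IgniteConfiguration configure() {
        return new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setCheckpointThreads(8)); // workers that write dirty pages during a checkpoint
    }
}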
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>I am looking at our configuration of the Ignite checkpointing system to
>>>>>>>ensure we have it tuned correctly.
>>>>>>>
>>>>>>>There is a checkpointing thread pool defined, which defaults to 4
>>>>>>>threads in size. I have not been able to find much of a discussion on
>>>>>>>when/how this pool size should be changed to reflect the node size
>>>>>>>Ignite is running on.
>>>>>>>
>>>>>>>In our case, we are running 16 core servers with 128 GB RAM with
>>>>>>>persistence on an NFS storage layer.
>>>>>>>
>>>>>>>Given the number of cores, and the relative latency of NFS compared to
>>>>>>>local SSD, is 4 checkpointing threads appropriate, or are we likely to
>>>>>>>see better performance if we increased it to 8 (or more)?
>>>>>>>
>>>>>>>If there is a discussion related to this, a pointer to it would be good
>>>>>>>(it's not really covered in the performance tuning section).
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Raymond.
>>>>>>> --
>>>>>>>
>>>>>>>Raymond Wilson
>>>>>>>Trimble Distinguished Engineer, Civil Construction Software (CCS)
>>>>>>>11 Birmingham Drive | Christchurch, New Zealand
>>>>>>>[email protected]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>>
>>>Raymond Wilson
>>>Trimble Distinguished Engineer, Civil Construction Software (CCS)
>>>11 Birmingham Drive | Christchurch, New Zealand
>>>[email protected]
>>>
>>>
>>>
>>
>>
>>
>>