>We have observed one interesting issue with checkpointing. We are using 64 GB
>RAM, 12 CPUs, and SSDs rated at 3K IOPS / 128 MB/s. Our application fills up
>the WAL directory really fast, and hence the RAM. We made the following
>observations:
>
>0. The not-so-bad news first: processing resumes after getting stuck for
>several minutes.
>
>1. WAL and WAL archive writes are a lot faster than the writes to the work
>directory done by checkpointing. Very curious to know why this is the case:
>checkpointing writes never exceed 15 MB/s, while WAL and WAL archive writes
>go right up to the SSD's limits.
A very simple example: sequentially changing one key. The WAL receives every
single change, while the checkpoint (checkpointing, in your terms) contains
only the single key change.
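
A rough Java sketch of what I mean (the cache name, config file name and
iteration count are made up, just for illustration):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

public class WalVsCheckpointDemo {
    public static void main(String[] args) {
        // "ignite-config.xml" is a hypothetical config with persistence enabled.
        try (Ignite ignite = Ignition.start("ignite-config.xml")) {
            ignite.cluster().state(ClusterState.ACTIVE);

            IgniteCache<Integer, Long> cache = ignite.getOrCreateCache("demo");

            // 1,000,000 updates of the SAME key: every update appends a record
            // to the WAL (and later to the WAL archive), so WAL traffic grows
            // with the number of updates ...
            for (long i = 0; i < 1_000_000; i++)
                cache.put(1, i);

            // ... but only the data page holding key 1 is dirty, so the next
            // checkpoint writes roughly one page to the work directory. That is
            // why WAL/archive MB/s can be far higher than checkpoint MB/s.
        }
    }
}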
>
>2. We observed that when off-heap memory usage tends to zero, checkpointing
>takes minutes to complete, sometimes 30+ minutes, which stalls application
>writes completely on all nodes. In effect the whole cluster freezes.
It seems Ignite enables write throttling in such a case; you need some system
and cluster tuning.
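
For example, something along these lines (a minimal sketch; the sizes are
placeholders you would need to adapt to your 64 GB nodes). Giving the
checkpoint page buffer more headroom and enabling throttling lets Ignite slow
writers down gradually instead of hitting the "too many dirty pages" hard stop:

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CheckpointTuningSketch {
    public static void main(String[] args) {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default")                                    // placeholder region name
            .setPersistenceEnabled(true)
            .setMaxSize(32L * 1024 * 1024 * 1024)                  // placeholder: 32 GB off-heap
            .setCheckpointPageBufferSize(4L * 1024 * 1024 * 1024); // placeholder: 4 GB checkpoint buffer

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setWriteThrottlingEnabled(true)  // throttle writers smoothly instead of freezing the cluster
            .setCheckpointFrequency(60_000);  // checkpoint more often -> fewer dirty pages per checkpoint

        Ignition.start(new IgniteConfiguration().setDataStorageConfiguration(storage));
    }
}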
>
>3. The checkpointing thread gets stuck at the checkpoint page futures.get(),
>and after several minutes it logs this error and the grid resumes processing:
>
>"sys-stripe-0-#1" #19 prio=5 os_prio=0 cpu=86537.69ms elapsed=2166.63s
>tid=0x00007fa52a6f1000 nid=0x3b waiting on condition [0x00007fa4c58be000]
> java.lang.Thread.State: WAITING (parking)
>at jdk.internal.misc.Unsafe.park( [email protected]/Native Method)
>at java.util.concurrent.locks.LockSupport.park( [email protected]/Unknown
>Source)
>at
>org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
>at
>org.apache.ignite.internal.util.future.GridFutureAdapter.getUninterruptibly(GridFutureAdapter.java:146)
>at
>org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:144)
>at
>org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1613)
>at
>org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3313)
>at
>org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:143)
>at
>org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:322)
>at
>org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:317)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
>at
>org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
>at
>org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1908)
>at
>org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1529)
>at
>org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
>at
>org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1422)
>at
>org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
>at
>org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:569)
>at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>at java.lang.Thread.run( [email protected]/Unknown Source)
>CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
>checkpointReadWriteLock.readUnlock();
>
>if (timeout > 0 && U.currentTimeMillis() - start >= timeout)
>    failCheckpointReadLock();
>
>try {
>    pages
>        .futureFor(LOCK_RELEASED)
>        .getUninterruptibly();
>}
>
> [2022-09-09 18:58:35,148][ERROR][sys-stripe-9-#10][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out.
>class org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock$CheckpointReadLockTimeoutException: Checkpoint read lock acquisition has been timed out.
>at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:210)
>at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:108)
>at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1613)
>at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3313)
>at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:143)
>at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:322)
>[2022-09-09 18:58:35,148][INFO ][sys-stripe-7-#8][FailureProcessor] Thread
>dump is hidden due to throttling settings. Set
>IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT property to 0 to see all
>thread dumps.
>
>
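As far as I remember, the timeout you hit here is the checkpoint read lock
timeout (by default tied to the failure detection timeout). Raising it only
hides the symptom and the real fix is on the checkpointing/IO side, but for
completeness here is a sketch of where it is configured; the value is just an
example:

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ReadLockTimeoutSketch {
    public static IgniteConfiguration configure() {
        // Assumption: DataStorageConfiguration#setCheckpointReadLockTimeout is
        // available in your release; check the Javadoc before relying on it.
        return new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setCheckpointReadLockTimeout(60_000)); // example: allow 60 s before the timeout fires
    }
}
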
>4. Other nodes print the logs below during the window in which the
>problematic node is stuck at checkpointing:
>
>[2022-09-09 18:58:35,153][WARN ][push-metrics-exporter-#80][G] >>> Possible
>starvation in striped pool.
> Thread name: sys-stripe-5-#6
> Queue:
>[o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@eb9f832,
> Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
>ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1,
>arr=[351148]]]]], Message closure [msg=GridIoMessage [plc=2,
>topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=2,
>arr=[273841,273843]]]]], Message closure [msg=GridIoMessage [plc=2,
>topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridNearSingleGetRequest [futId=1662749921887, key=BinaryObjectImpl [arr=
>true, ctx=false, start=0], flags=1, topVer=AffinityTopologyVersion [topVer=14,
>minorTopVer=0], subjId=12746da1-ac0d-4ba1-933e-5aa3f92d2f68, taskNameHash=0,
>createTtl=-1, accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure
>[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false,
>timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
>[futIds=GridLongList [idx=1, arr=[351149]]]]],
>o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@110ec0fa,
> Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
>ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=10,
>arr=[414638,414655,414658,414661,414662,414663,414666,414668,414673,414678]]]]],
>
>o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@63ae8204,
>
>o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@2d3cc0b,
> Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
>ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1,
>arr=[414667]]]]], Message closure [msg=GridIoMessage [plc=2,
>topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=4,
>arr=[351159,351162,351163,351164]]]]], Message closure [msg=GridIoMessage
>[plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
>skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
>[futIds=GridLongList [idx=1, arr=[290762]]]]], Message closure
>[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false,
>timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
>[futIds=GridLongList [idx=1, arr=[400357]]]]],
>o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@71887193,
> Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
>ordered=false, timeout=0, skipOnTimeout=false,
>msg=GridDhtAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true,
>ctx=false, start=0], val=BinaryObjectImpl [arr= true, ctx=false, start=0],
>prevVal=null, super=GridDhtAtomicAbstractUpdateRequest [onRes=false,
>nearNodeId=null, nearFutId=0, flags=]]]], Message closure [msg=GridIoMessage
>[plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
>skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest
>[key=BinaryObjectImpl [arr= true, ctx=false, start=0],
>parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=1324019,
>topVer=AffinityTopologyVersion [topVer=14, minorTopVer=0],
>parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]]
> Deadlock: false
> Completed: 205703
>
>On Wed, Sep 7, 2022 at 4:25 PM Zhenya Stanilovsky via user <
>[email protected] > wrote:
>>OK, Raymond, I understand. But it seems no one has a good answer here; it
>>depends on the particular filesystem and the underlying (probably cloud)
>>storage layer implementation.
>>If you do not observe «throttling» messages (described in the previous link),
>>then things are probably fine, but of course you can benchmark your I/O
>>yourself with a third-party tool.
>>
>>>Thanks Zhenya.
>>>
>>>I have seen that the link you provided has a lot of good information on this
>>>system, but it does not discuss the checkpoint writers in any detail.
>>>
>>>I appreciate this cannot be a bottleneck; my question is more along the lines
>>>of: "If I have more checkpointing threads, will checkpoints take less time?"
>>>In our case we use AWS EFS, so if each checkpoint thread spends a relatively
>>>long time blocking on write I/O to the persistent store, then more checkpoint
>>>threads would allow more concurrent writes to take place. Of course, if the
>>>checkpoint threads themselves use async I/O and interleave I/O activities on
>>>that basis, then there may not be an opportunity for performance improvement,
>>>but I am not an expert in the Ignite code base :)
>>>
>>>Raymond.
>>>
>>>On Wed, Sep 7, 2022 at 7:51 PM Zhenya Stanilovsky via user <
>>>[email protected] > wrote:
>>>>
>>>>No, there are no such log or metrics suggestions, and as I said earlier,
>>>>this place can't become a bottleneck. If you have any performance problems,
>>>>describe them in more detail. There is also some interesting reading here [1]
>>>>
>>>>[1]
>>>>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>>>>
>>>>>Thanks Zhenya.
>>>>>
>>>>>Are there any logs or metrics that would indicate whether there is value in
>>>>>increasing the size of this pool?
>>>>>
>>>>>
>>>>>On Fri, 2 Sep 2022 at 8:20 PM, Zhenya Stanilovsky via user <
>>>>>[email protected] > wrote:
>>>>>>Hi Raymond
>>>>>>
>>>>>>The checkpoint threads are responsible for dumping modified (dirty) pages,
>>>>>>so you can consider checkpointing an I/O-bound operation only, and the pool
>>>>>>size is the number of disk-writing workers.
>>>>>>I think the default is enough and there is no need to raise it, but that is
>>>>>>up to you.
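
If you do want to experiment with the pool size anyway, a minimal sketch of
where it is set (8 is just an example value; the default is 4):

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CheckpointPoolSketch {
    public static IgniteConfiguration configure() {
        return new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setCheckpointThreads(8)); // workers that write dirty pages during a checkpoint
    }
}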
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>I am looking at our configuration of the Ignite checkpointing system to
>>>>>>>ensure we have it tuned correctly.
>>>>>>>
>>>>>>>There is a checkpointing thread pool defined, which defaults to 4
>>>>>>>threads in size. I have not been able to find much of a discussion on
>>>>>>>when/how this pool size should be changed to reflect the node size
>>>>>>>Ignite is running on.
>>>>>>>
>>>>>>>In our case, we are running 16 core servers with 128 GB RAM with
>>>>>>>persistence on an NFS storage layer.
>>>>>>>
>>>>>>>Given the number of cores, and the relative latency of NFS compared to
>>>>>>>local SSD, is 4 checkpointing threads appropriate, or are we likely to
>>>>>>>see better performance if we increased it to 8 (or more)?
>>>>>>>
>>>>>>>If there is a discussion related to this, a pointer to it would be good
>>>>>>>(it's not really covered in the performance tuning section).
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Raymond.
>>>>>>> --
>>>>>>>
>>>>>>>Raymond Wilson
>>>>>>>Trimble Distinguished Engineer, Civil Construction Software (CCS)
>>>>>>>11 Birmingham Drive | Christchurch, New Zealand
>>>>>>>[email protected]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>>
>>>Raymond Wilson
>>>Trimble Distinguished Engineer, Civil Construction Software (CCS)
>>>11 Birmingham Drive | Christchurch, New Zealand
>>>[email protected]
>>>
>>>
>>>
>>
>>
>>
>>