We have observed an interesting issue with checkpointing. Our nodes have 64 GB
RAM and 12 CPUs, with SSDs rated at 3K IOPS / 128 MB/s. Our application fills
up the WAL directory very quickly, and the RAM along with it. We made the
following observations:

0. The not-so-bad news first: the cluster resumes processing after being stuck
for several minutes.

1. WAL and WAL archive writes are much faster than the checkpoint writes to the
work directory, and we are very curious to know why. Checkpoint writes never
exceed 15 MB/s, while WAL and WAL archive writes go all the way up to the SSD's
rated limits.
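To correlate the two rates from inside a node, the persistence metrics can be
sampled. A minimal sketch, assuming Ignite 2.x with
DataStorageConfiguration.setMetricsEnabled(true); the method names are from the
public org.apache.ignite.DataStorageMetrics interface:

import org.apache.ignite.DataStorageMetrics;
import org.apache.ignite.Ignite;

public class StorageRates {
    /** Prints WAL-side vs checkpoint-side figures for a running node. */
    static void dump(Ignite ignite) {
        DataStorageMetrics m = ignite.dataStorageMetrics();

        // WAL side: logging rate and average fsync time.
        System.out.printf("WAL logging rate: %.1f records/s%n", m.getWalLoggingRate());
        System.out.printf("WAL fsync avg:    %.1f ms%n", m.getWalFsyncTimeAverage());

        // Checkpoint side: duration and page count of the last checkpoint
        // (pages * pageSize approximates bytes written to the work dir).
        System.out.printf("Last checkpoint: %d ms total, %d pages, %d ms fsync%n",
            m.getLastCheckpointDuration(),
            m.getLastCheckpointTotalPagesNumber(),
            m.getLastCheckpointFsyncDuration());
    }
}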

2. When free off-heap memory tends to zero, checkpointing takes minutes to
complete, sometimes 30+ minutes, and application writes stall completely on all
nodes. In effect, the whole cluster freezes.
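For completeness, these are the knobs we understand to be relevant to this
freeze (a configuration sketch only, assuming Ignite 2.x; the sizes and counts
are illustrative, not a recommendation):

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StorageTuning {
    static IgniteConfiguration cfg() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default")
            .setPersistenceEnabled(true)
            // More room for page writes that arrive while a checkpoint runs.
            .setCheckpointPageBufferSize(4L * 1024 * 1024 * 1024); // 4 GB, illustrative

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            // Slow writers down gradually instead of freezing them outright
            // when dirty pages / the checkpoint buffer hit their limits.
            .setWriteThrottlingEnabled(true)
            // More checkpoint writers; only helps if the disk has IOPS headroom.
            .setCheckpointThreads(8)
            .setMetricsEnabled(true);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}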

3. The striped pool threads get stuck in checkpointReadLock, parked on a
checkpoint page future (futures.get). After several minutes the node logs the
error below and the grid resumes processing:

"sys-stripe-0-#1" #19 prio=5 os_prio=0 cpu=86537.69ms elapsed=2166.63s
tid=0x00007fa52a6f1000 nid=0x3b waiting on condition  [0x00007fa4c58be000]
   java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.14.1/Native Method)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.14.1/Unknown
Source)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.getUninterruptibly(GridFutureAdapter.java:146)
at
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:144)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1613)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3313)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:143)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:322)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:317)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1908)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1529)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1422)
at
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
at
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:569)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(java.base@11.0.14.1/Unknown Source)
For reference, this appears to be the corresponding source in
CheckpointTimeoutLock.checkpointReadLock() (the CheckpointTimeoutLock.java:144
frame above): once there are too many dirty pages, the thread forces a
checkpoint, then blocks until the checkpoint releases its lock.

// Too many dirty pages: force a checkpoint.
CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");

checkpointReadWriteLock.readUnlock();

if (timeout > 0 && U.currentTimeMillis() - start >= timeout)
    failCheckpointReadLock();

try {
    // This is the futures.get where the striped threads above are parked.
    pages
        .futureFor(LOCK_RELEASED)
        .getUninterruptibly();
}


[2022-09-09 18:58:35,148][ERROR][sys-stripe-9-#10][CheckpointTimeoutLock] Checkpoint read lock acquisition has been timed out.
class org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock$CheckpointReadLockTimeoutException: Checkpoint read lock acquisition has been timed out.
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.failCheckpointReadLock(CheckpointTimeoutLock.java:210)
at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:108)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1613)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processDhtAtomicUpdateRequest(GridDhtAtomicCache.java:3313)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$600(GridDhtAtomicCache.java:143)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$7.apply(GridDhtAtomicCache.java:322)
[2022-09-09 18:58:35,148][INFO ][sys-stripe-7-#8][FailureProcessor] Thread dump is hidden due to throttling settings. Set IGNITE_DUMP_THREADS_ON_FAILURE_THROTTLING_TIMEOUT property to 0 to see all thread dumps.


4. Other nodes print the logs below during the window in which the problematic
node is stuck at checkpointing:

[2022-09-09 18:58:35,153][WARN ][push-metrics-exporter-#80][G] >>> Possible
starvation in striped pool.
    Thread name: sys-stripe-5-#6
    Queue:
[o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@eb9f832,
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1,
arr=[351148]]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
[futIds=GridLongList [idx=2, arr=[273841,273843]]]]], Message closure
[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false,
timeout=0, skipOnTimeout=false, msg=GridNearSingleGetRequest
[futId=1662749921887, key=BinaryObjectImpl [arr= true, ctx=false, start=0],
flags=1, topVer=AffinityTopologyVersion [topVer=14, minorTopVer=0],
subjId=12746da1-ac0d-4ba1-933e-5aa3f92d2f68, taskNameHash=0, createTtl=-1,
accessTtl=-1, txLbl=null, mvccSnapshot=null]]], Message closure
[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false,
timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
[futIds=GridLongList [idx=1, arr=[351149]]]]],
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@110ec0fa,
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=10,
arr=[414638,414655,414658,414661,414662,414663,414666,414668,414673,414678]]]]],
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@63ae8204,
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@2d3cc0b,
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1,
arr=[414667]]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
[futIds=GridLongList [idx=4, arr=[351159,351162,351163,351164]]]]], Message
closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1,
arr=[290762]]]]], Message closure [msg=GridIoMessage [plc=2,
topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse
[futIds=GridLongList [idx=1, arr=[400357]]]]],
o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout@71887193,
Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8,
ordered=false, timeout=0, skipOnTimeout=false,
msg=GridDhtAtomicSingleUpdateRequest [key=BinaryObjectImpl [arr= true,
ctx=false, start=0], val=BinaryObjectImpl [arr= true, ctx=false, start=0],
prevVal=null, super=GridDhtAtomicAbstractUpdateRequest [onRes=false,
nearNodeId=null, nearFutId=0, flags=]]]], Message closure
[msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false,
timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest
[key=BinaryObjectImpl [arr= true, ctx=false, start=0],
parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null,
futId=1324019, topVer=AffinityTopologyVersion [topVer=14, minorTopVer=0],
parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]]]]]]
    Deadlock: false
    Completed: 205703


On Wed, Sep 7, 2022 at 4:25 PM Zhenya Stanilovsky via user <
user@ignite.apache.org> wrote:

> Ok, Raymond, I understand. But it seems no one has a good answer here; it
> depends on the particular fs and near (probably cloud) layer implementation.
> If you do not observe «throttling» messages (described in the previous link),
> it all seems OK, but of course you can benchmark your I/O yourself with a
> third-party tool.
>
>
> Thanks Zhenya.
>
> I have seen that the link you provided has a lot of good information on this
> system, but it does not discuss the checkpoint writers in any detail.
>
> I appreciate this cannot be a bottleneck; my question is more related to:
> "If I have more checkpointing threads, will checkpoints take less time?"
> In our case we use AWS EFS, so if each checkpoint thread spends
> relatively long times blocking on write I/O to the persistent store, then
> more checkpoint threads allow more concurrent writes to take place. Of course, if
> the checkpoint threads themselves utilise async I/O tasks and
> interleave I/O activities on that basis, then there may not be an
> opportunity for performance improvement, but I am not an expert in the
> Ignite code base :)
>
> Raymond.
>
>
> On Wed, Sep 7, 2022 at 7:51 PM Zhenya Stanilovsky via user <
> user@ignite.apache.org> wrote:
>
>
> No, there are no log or metrics suggestions, and as I said earlier,
> this place can't become a bottleneck. If you have any performance
> problems, describe them in more detail; there is interesting reading here [1]
>
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>
>
>
> Thanks Zhenya.
>
> Is there any logging or metrics that would indicate whether there is value
> in increasing the size of this pool?
>
>
> On Fri, 2 Sep 2022 at 8:20 PM, Zhenya Stanilovsky via user <
> user@ignite.apache.org> wrote:
>
> Hi Raymond
>
> The checkpoint threads are responsible for dumping modified pages, so you may
> consider it an IO-bound-only operation, and the pool size is the number of
> disk-writing workers.
> I think the default is enough and there is no need to raise it, but it is
> also up to you.
>
>
>
> Hi,
>
> I am looking at our configuration of the Ignite checkpointing system to
> ensure we have it tuned correctly.
>
> There is a checkpointing thread pool defined, which defaults to 4 threads
> in size. I have not been able to find much of a discussion on when/how this
> pool size should be changed to reflect the node size Ignite is running on.
>
> In our case, we are running 16 core servers with 128 GB RAM with
> persistence on an NFS storage layer.
>
> Given the number of cores, and the relative latency of NFS compared to
> local SSD, is 4 checkpointing threads appropriate, or are we likely to see
> better performance if we increased it to 8 (or more)?
>
> If there is a discussion related to this a pointer to it would be good
> (it's not really covered in the performance tuning section).
>
> Thanks,
> Raymond.
>
> --
> Raymond Wilson
> Trimble Distinguished Engineer, Civil Construction Software (CCS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wil...@trimble.com
