The relevant code runs from here:
if (checkpointReadWriteLock.getReadHoldCount() > 1 ||
safeToUpdatePageMemories() || checkpointer.runner() == null)
break;
else {
CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");
and nearby you can see that:
maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
? pool.pages() * 3L / 4
: Math.min(pool.pages() * 2L / 3, cpPoolPages);
Thus, once 3/4 of the whole DataRegion's pages are dirty, this "too many dirty pages" checkpoint will be triggered.
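As a rough back-of-the-envelope sketch (assumed numbers only: a 4 GB region with the default 4 KB page size; this is not actual Ignite code), that threshold works out like this:

    // Sketch: how the 3/4 dirty-page threshold plays out for a 4 GB region.
    public class DirtyPageThresholdSketch {
        public static void main(String[] args) {
            long regionSize = 4L * 1024 * 1024 * 1024; // assumed 4 GB data region
            long pageSize = 4 * 1024;                  // default 4 KB page size

            long totalPages = regionSize / pageSize;   // 1,048,576 pages

            // When the throttling policy is not DISABLED, a checkpoint is scheduled
            // once 3/4 of the pool's pages are dirty (maxDirtyPages in the code above).
            long maxDirtyPages = totalPages * 3L / 4;  // 786,432 pages

            System.out.printf("totalPages=%d, maxDirtyPages=%d%n", totalPages, maxDirtyPages);
        }
    }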
>In (
>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
> ), there is a mention of a dirty pages limit as a factor that can
>trigger checkpoints.
>
>I also found this issue:
>http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
> where "too many dirty pages" is a reason given for initiating a checkpoint.
>
>After reviewing our logs I found this: (one example)
>
>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint
>started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>splitAndSortCpPagesDuration=45ms, pages=33421, reason=' too many dirty pages
>']
>
>Which suggests we may have the issue where writes are frozen until the check
>point is completed.
>
>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to
>be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>
> /**
>  * Threshold to calculate limit for pages list on-heap caches.
>  * <p>
>  * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>  * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>  * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>  * assuming that checkpoint will be triggered if no more than 3/4 of pages will be marked as dirty (there will be
>  * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>  * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>  * more than 2 pages). Also some amount of page memory needed to store page list metadata.
>  */
> private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
>This raises two questions:
>
>1. The data region where most writes are occurring has 4Gb allocated to it,
>though it is permitted to start at a much lower level. 4Gb should be 1,000,000
>pages, 10% of which should be 100,000 dirty pages.
>
>The 'limit holder' is calculated like this:
>
> /**
>  * @return Holder for page list cache limit for given data region.
>  */
> public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>     if (dataRegion.config().isPersistenceEnabled()) {
>         return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(),
>             name -> new AtomicLong((long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages()
>                 * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>     }
>
>     return null;
> }
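>
>As a quick illustration of what that computes (assumed numbers, not Ignite code: roughly 1,048,576 total pages for a 4 GB region at the default 4 KB page size):
>
>    // Sketch (fragment): what pageListCacheLimitHolder() works out to for a 4 GB region.
>    long totalPages = 1_048_576L; // assumed: 4 GB region / 4 KB pages
>    long pageListCacheLimit = (long)(totalPages * 0.1 /* PAGE_LIST_CACHE_LIMIT_THRESHOLD */);
>    // -> 104,857 pages, i.e. roughly the 100,000 figure mentioned above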
>
>... but I am unsure if totalPages() is referring to the current size of the
>data region, or the size it is permitted to grow to. ie: Could the 'dirty page
>limit' be a sliding limit based on the growth of the data region? Is it better
>to set the initial and maximum sizes of data regions to be the same number?
>
>2. We have two data regions, one supporting inbound arrival of data (with low
>numbers of writes), and one supporting storage of processed results from the
>arriving data (with many more writes).
>
>The block on writes due to the number of dirty pages appears to affect all
>data regions, not just the one which has violated the dirty page limit. Is
>that correct? If so, is this something that can be improved?
>
>Thanks,
>Raymond.
>
>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson < [email protected] >
>wrote:
>>I'm working on getting automatic JVM thread stack dumping occurring if we
>>detect long delays in put (PutIfAbsent) operations. Hopefully this will
>>provide more information.
>>On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky < [email protected] >
>>wrote:
>>>
>>>Don't think so; checkpointing already worked perfectly well before this fix.
>>>We need additional info to start digging into your problem. Can you share Ignite
>>>logs somewhere?
>>>
>>>>I noticed an entry in the Ignite 2.9.1 changelog:
>>>>* Improved checkpoint concurrent behaviour
>>>>I am having trouble finding the relevant Jira ticket for this in the 2.9.1
>>>>Jira area at
>>>>https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>>
>>>>Perhaps this change may improve the checkpointing issue we are seeing?
>>>>
>>>>Raymond.
>>>>
>>>>On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson < [email protected]
>>>>> wrote:
>>>>>Hi Zhenya,
>>>>>
>>>>>1. We currently use AWS EFS for primary storage, with provisioned IOPS to
>>>>>provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage
>>>>>(with at least 5 nodes writing to it, including WAL and WAL archive), so
>>>>>we are not saturating the EFS interface. We use the default page size
>>>>>(experiments with larger page sizes showed instability when checkpointing
>>>>>due to free page starvation, so we reverted to the default size).
>>>>>
>>>>>2. Thanks for the detail, we will look for that in thread dumps when we
>>>>>can create them.
>>>>>
>>>>>3. We are using the default CP buffer size, which is max(256Mb,
>>>>>DataRegionSize / 4) according to the Ignite documentation (rough numbers
>>>>>are sketched after point 4 below), so this should have more than enough
>>>>>checkpoint buffer space to cope with writes. As
>>>>>additional information, the cache which is displaying very slow writes is
>>>>>in a data region with relatively slow write traffic. There is a primary
>>>>>(default) data region with large write traffic, and the vast majority of
>>>>>pages being written in a checkpoint will be for that default data region.
>>>>>
>>>>>4. Yes, this is very surprising. Anecdotally from our logs it appears
>>>>>write traffic into the low write traffic cache is blocked during
>>>>>checkpoints.
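>>>>>
>>>>>Rough numbers for point 3, using the default-size formula cited there (a
>>>>>sketch with assumed region sizes, not Ignite code):
>>>>>
>>>>>    long mb = 1024L * 1024;
>>>>>    // Formula cited above: max(256 MB, DataRegionSize / 4).
>>>>>    long mainRegionBuffer  = Math.max(256 * mb, (4096 * mb) / 4); // 4 GB region   -> 1 GB buffer
>>>>>    long smallRegionBuffer = Math.max(256 * mb, (128 * mb) / 4);  // 128 MB region -> 256 MB buffer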
>>>>>
>>>>>Thanks,
>>>>>Raymond.
>>>>>
>>>>>
>>>>>On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky < [email protected] >
>>>>>wrote:
>>>>>>* In addition to Ilya's reply, you can check the vendor's page for more
>>>>>>info; everything on that page is applicable to Ignite too [1]. Increasing
>>>>>>the thread count leads to concurrent IO usage, so with something like NVMe
>>>>>>it's up to you, but in the case of SAS it would possibly be better to
>>>>>>reduce this parameter.
>>>>>>* The log will show you something like:
>>>>>>Parking thread=%Thread name% for timeout(ms)=%time% and the corresponding:
>>>>>>Unparking thread=
>>>>>>* No additional logging of checkpoint buffer usage is provided. The
>>>>>>checkpoint buffer needs to be more than 10% of the overall persistent
>>>>>>DataRegions size (sketched below).
>>>>>>* 90 seconds or longer: that looks like a problem with IO or system
>>>>>>tuning; it's a very bad score.
>>>>>>[1] https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
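>>>>>>
>>>>>>As a sketch of that checkpoint buffer rule of thumb (the region name and
>>>>>>sizes are assumptions for illustration; setCheckpointPageBufferSize() is
>>>>>>the standard DataRegionConfiguration setter):
>>>>>>
>>>>>>    import org.apache.ignite.configuration.DataRegionConfiguration;
>>>>>>
>>>>>>    public class CpBufferRuleOfThumbSketch {
>>>>>>        public static void main(String[] args) {
>>>>>>            long mb = 1024L * 1024;
>>>>>>
>>>>>>            DataRegionConfiguration region = new DataRegionConfiguration()
>>>>>>                .setName("default")                     // assumed region name
>>>>>>                .setPersistenceEnabled(true)
>>>>>>                .setMaxSize(4096 * mb)                  // assumed 4 GB region
>>>>>>                // Keep the checkpoint buffer above ~10% of the persistent
>>>>>>                // region size, per the advice above (1 GB here, i.e. 25%).
>>>>>>                .setCheckpointPageBufferSize(1024 * mb);
>>>>>>
>>>>>>            System.out.println(region.getCheckpointPageBufferSize());
>>>>>>        }
>>>>>>    }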
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>We have been investigating some issues which appear to be related to
>>>>>>>checkpointing. We currently use Ignite 2.8.1 with the C# client.
>>>>>>>
>>>>>>>I have been trying to gain clarity on how certain aspects of the Ignite
>>>>>>>configuration relate to the checkpointing process:
>>>>>>>
>>>>>>>1. Number of checkpointing threads (the setter is sketched after
>>>>>>>question 3 below). This defaults to 4, but I don't
>>>>>>>understand how it applies to the checkpointing process. Are more threads
>>>>>>>generally better (eg: because it makes the disk IO parallel across the
>>>>>>>threads), or does it only have a positive effect if you have many data
>>>>>>>storage regions? Or something else? If this could be clarified in the
>>>>>>>documentation (or a pointer to it which Google has not yet found), that
>>>>>>>would be good.
>>>>>>>
>>>>>>>2. Checkpoint frequency. This defaults to 180 seconds. I was
>>>>>>>thinking that reducing this time would result in smaller, less disruptive
>>>>>>>checkpoints. Setting it to 60 seconds seems pretty safe, but is there a
>>>>>>>practical lower limit that should be used for use cases with new data
>>>>>>>constantly being added, eg: 5 seconds, 10 seconds?
>>>>>>>
>>>>>>>3. Write exclusivity constraints during checkpointing. I understand that
>>>>>>>while a checkpoint is occurring ongoing writes will be supported into
>>>>>>>the caches being checkpointed, and if those are writes to existing
>>>>>>>pages then those will be duplicated into the checkpoint buffer. If this
>>>>>>>buffer becomes full or stressed then Ignite will throttle, and perhaps
>>>>>>>block, writes until the checkpoint is complete. If this is the case then
>>>>>>>Ignite will emit logging (warning or informational?) that writes are
>>>>>>>being throttled.
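>>>>>>>
>>>>>>>For reference, a minimal sketch of where the settings in questions 1-3
>>>>>>>live (standard DataStorageConfiguration setters; the values shown are
>>>>>>>just the defaults mentioned above, except write throttling, which is an
>>>>>>>explicit opt-in):
>>>>>>>
>>>>>>>    import org.apache.ignite.configuration.DataStorageConfiguration;
>>>>>>>    import org.apache.ignite.configuration.IgniteConfiguration;
>>>>>>>
>>>>>>>    public class CheckpointConfigSketch {
>>>>>>>        public static void main(String[] args) {
>>>>>>>            DataStorageConfiguration storage = new DataStorageConfiguration()
>>>>>>>                .setCheckpointThreads(4)           // question 1: checkpoint writer threads (default 4)
>>>>>>>                .setCheckpointFrequency(180_000L)  // question 2: checkpoint interval in ms (default 180 s)
>>>>>>>                .setWriteThrottlingEnabled(true);  // question 3: throttle writes rather than block them
>>>>>>>
>>>>>>>            IgniteConfiguration cfg = new IgniteConfiguration()
>>>>>>>                .setDataStorageConfiguration(storage);
>>>>>>>        }
>>>>>>>    }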
>>>>>>>
>>>>>>>We have cases where simple puts to caches (a few requests per second)
>>>>>>>are taking up to 90 seconds to execute when there is an active check
>>>>>>>point occurring, where the check point has been triggered by the
>>>>>>>checkpoint timer. When a checkpoint is not occurring the time to do this
>>>>>>>is usually in the milliseconds. The checkpoints themselves can take 90
>>>>>>>seconds or longer, and are updating up to 30,000-40,000 pages, across a
>>>>>>>pair of data storage regions, one with 4Gb in-memory space allocated
>>>>>>>(which should be 1,000,000 pages at the standard 4kb page size), and one
>>>>>>>small region with 128Mb. There is no 'throttling' logging being emitted
>>>>>>>that we can tell, so the checkpoint buffer (which should be 1Gb for the
>>>>>>>first data region and 256 Mb for the second smaller region in this case)
>>>>>>>does not look like it can fill up during the checkpoint.
>>>>>>>
>>>>>>>It seems like the checkpoint is affecting the put operations, but I
>>>>>>>don't understand why that may be given the documented checkpointing
>>>>>>>process, and the checkpoint itself (at least via Informational logging)
>>>>>>>is not advertising any restrictions.
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Raymond.
>>>>>>> --
>>>>>>>
>>>>>>>Raymond Wilson
>>>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>Raymond Wilson
>>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>>11 Birmingham Drive | Christchurch, New Zealand
>>>>>+64-21-2013317 Mobile
>>>>>[email protected]
>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>>Raymond Wilson
>>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>>11 Birmingham Drive | Christchurch, New Zealand
>>>>+64-21-2013317 Mobile
>>>>[email protected]
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>> --
>>
>>Raymond Wilson
>>Solution Architect, Civil Construction Software Systems (CCSS)
>>11 Birmingham Drive | Christchurch, New Zealand
>>+64-21-2013317 Mobile
>>[email protected]
>>
>>
>
> --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>11 Birmingham Drive | Christchurch, New Zealand
>+64-21-2013317 Mobile
>[email protected]
>
>