fsync=37104ms is too long for that number of pages (pages=33421). Please check how you can improve fsync performance on your storage.
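Before tuning Ignite itself, it may be worth measuring what a raw write-plus-fsync costs on the underlying volume (the EFS mount discussed later in this thread). A rough standalone probe, assuming plain Java NIO and a placeholder path on the volume that hosts the Ignite work directory:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    /** Rough probe of raw write+fsync latency on a storage volume. */
    public class FsyncProbe {
        public static void main(String[] args) throws IOException {
            // Placeholder: point this at the volume hosting the Ignite work directory.
            Path target = Paths.get(args.length > 0 ? args[0] : "/mnt/ignite-work/fsync-probe.bin");

            ByteBuffer page = ByteBuffer.allocate(4096); // one default-sized Ignite page

            try (FileChannel ch = FileChannel.open(target,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                for (int i = 0; i < 100; i++) {
                    page.rewind();
                    ch.write(page, (long)i * page.capacity());

                    long start = System.nanoTime();
                    ch.force(true); // fsync: flush data and metadata to the device
                    long micros = (System.nanoTime() - start) / 1_000;

                    System.out.println("fsync #" + i + ": " + micros + " us");
                }
            }
        }
    }

If individual force(true) calls here already take hundreds of milliseconds or more, a 37-second checkpoint fsync is likely dominated by the storage layer rather than by checkpoint settings.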
>------- Forwarded message -------
>From: "Raymond Wilson" <raymond_wil...@trimble.com>
>To: user <user@ignite.apache.org>, "Zhenya Stanilovsky" <arzamas...@mail.ru>
>Cc:
>Subject: Re: Re[4]: Questions related to check pointing
>Date: Thu, 31 Dec 2020 01:46:20 +0300
>
>Hi Zhenya,
>
>The matching checkpoint finished log is this:
>
>2020-12-15 19:07:39,253 [106] INF [MutableCacheComputeServer] Checkpoint finished [cpId=e2c31b43-44df-43f1-b162-6b6cefa24e28, pages=33421, markPos=FileWALPointer [idx=6339, fileOff=243287334, len=196573], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=218ms, pagesWrite=1150ms, fsync=37104ms, total=38571ms]
>
>Regarding your comment that 3/4 of the pages in the whole data region need to be dirty to trigger this, can you confirm whether that is 3/4 of the maximum size of the data region, or of the currently used size? (E.g. if Min is 1Gb, Max is 4Gb, and 2Gb is used, would 1.5Gb of dirty pages trigger this?)
>
>Are data regions independently checkpointed, or are they checkpointed as a whole, so that a 'too many dirty pages' condition affects all data regions in terms of write blocking?
>
>Can you comment on my query regarding whether we should set the Min and Max sizes of the data region to be the same? I.e. don't bother growing the data region memory use on demand, just allocate the maximum.
>
>In terms of the checkpoint lock hold time metric, of the checkpoints quoting 'too many dirty pages' there is one instance, apart from the one I provided earlier, violating this limit, i.e.:
>
>2020-12-17 18:56:39,086 [104] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e9ccf0ca-f813-4f91-ac93-5483350fdf66, startPtr=FileWALPointer [idx=7164, fileOff=389224517, len=196573], checkpointBeforeLockTime=276ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=39ms, walCpRecordFsyncDuration=254ms, writeCheckpointEntryDuration=32ms, splitAndSortCpPagesDuration=276ms, pages=77774, reason='too many dirty pages']
>
>This is out of a population of 16 instances I can find. The remainder have lock times of 16-17ms.
>
>Regarding writes of pages to the persistent store, does the checkpointing system parallelise writes across partitions to maximise throughput?
>
>Thanks,
>Raymond.
>
>
>On Thu, Dec 31, 2020 at 1:17 AM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>>
>>All write operations will be blocked for this duration: checkpointLockHoldTime=32ms (write lock holding). If you observe a huge number of such messages with reason='too many dirty pages', maybe you need to store some data in non-persisted regions, for example, or reduce indexes (if you use them). And please attach the other part of the cp message, the one starting with: Checkpoint finished.
>>
>>
>>>In (https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood), there is a mention of a dirty pages limit that is a factor that can trigger checkpoints.
>>>
>>>I also found this issue: http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html where "too many dirty pages" is a reason given for initiating a checkpoint.
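Since several of the questions above hinge on how close each region gets to its dirty-page threshold, the per-region counters can be watched directly. A small sketch, assuming the Ignite 2.8.x Java metrics API with metrics enabled on the data regions; the class and how it is scheduled are illustrative only:

    import java.util.Collection;
    import org.apache.ignite.DataRegionMetrics;
    import org.apache.ignite.Ignite;

    /** Logs dirty page counts per data region (requires metrics enabled on the regions). */
    public class DirtyPageWatcher implements Runnable {
        private final Ignite ignite;

        public DirtyPageWatcher(Ignite ignite) {
            this.ignite = ignite;
        }

        @Override public void run() {
            Collection<DataRegionMetrics> regions = ignite.dataRegionMetrics();

            for (DataRegionMetrics m : regions) {
                long total = m.getTotalAllocatedPages(); // pages currently allocated, not the configured max
                long dirty = m.getDirtyPages();

                double ratio = total == 0 ? 0.0 : (double)dirty / total;

                System.out.printf("region=%s dirtyPages=%d allocatedPages=%d dirtyRatio=%.2f%n",
                    m.getName(), dirty, total, ratio);
            }
        }
    }

Running something like this on a timer around the periods of slow puts would show which region is actually approaching its limit when the 'too many dirty pages' checkpoints fire.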
>>>
>>>After reviewing our logs I found this (one example):
>>>
>>>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573], checkpointBeforeLockTime=99ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms, splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty pages']
>>>
>>>This suggests we may have the issue where writes are frozen until the checkpoint is completed.
>>>
>>>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to be 0.1 (10%), via this entry in GridCacheDatabaseSharedManager.java:
>>>
>>>    /**
>>>     * Threshold to calculate limit for pages list on-heap caches.
>>>     * <p>
>>>     * Note: When a checkpoint is triggered, we need some amount of page memory to store pages list on-heap cache.
>>>     * If a checkpoint is triggered by "too many dirty pages" reason and pages list cache is rather big, we can get
>>>     * {@code IgniteOutOfMemoryException}. To prevent this, we can limit the total amount of cached page list buckets,
>>>     * assuming that checkpoint will be triggered if no more than 3/4 of pages will be marked as dirty (there will be
>>>     * at least 1/4 of clean pages) and each cached page list bucket can be stored to up to 2 pages (this value is not
>>>     * static, but depends on PagesCache.MAX_SIZE, so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take
>>>     * more than 2 pages). Also some amount of page memory is needed to store page list metadata.
>>>     */
>>>    private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>>
>>>This raises two questions:
>>>
>>>1. The data region where most writes are occurring has 4Gb allocated to it, though it is permitted to start at a much lower level. 4Gb should be 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>>>
>>>The 'limit holder' is calculated like this:
>>>
>>>    /**
>>>     * @return Holder for page list cache limit for given data region.
>>>     */
>>>    public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>>>        if (dataRegion.config().isPersistenceEnabled()) {
>>>            return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(), name -> new AtomicLong(
>>>                (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages() * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>>>        }
>>>
>>>        return null;
>>>    }
>>>
>>>... but I am unsure whether totalPages() refers to the current size of the data region, or to the size it is permitted to grow to. I.e. could the 'dirty page limit' be a sliding limit based on the growth of the data region? Is it better to set the initial and maximum sizes of data regions to the same number?
>>>
>>>2. We have two data regions, one supporting inbound arrival of data (with low numbers of writes), and one supporting storage of processed results from the arriving data (with many more writes).
>>>
>>>The block on writes due to the number of dirty pages appears to affect all data regions, not just the one which has violated the dirty page limit. Is that correct? If so, is this something that can be improved?
>>>
>>>Thanks,
>>>Raymond.
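On the question of pinning the region size: a minimal sketch of a fixed-size persistent region, assuming the Ignite 2.x Java API (the region name and sizes are placeholders, not the cluster's actual configuration). Setting the initial and maximum sizes to the same value means anything derived from the region's page count stays constant:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class FixedSizeRegionExample {
        public static void main(String[] args) {
            long fourGb = 4L * 1024 * 1024 * 1024;

            // Persistent region sized identically up front, so thresholds derived
            // from the region's page count do not move as the region grows.
            DataRegionConfiguration resultsRegion = new DataRegionConfiguration()
                .setName("ProcessedResults")   // hypothetical region name
                .setPersistenceEnabled(true)
                .setInitialSize(fourGb)
                .setMaxSize(fourGb)
                .setMetricsEnabled(true);      // needed for the dirty-page metrics sketch above

            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                .setDataRegionConfigurations(resultsRegion);

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg);

            try (Ignite ignite = Ignition.start(cfg)) {
                // With persistence enabled the cluster starts inactive.
                ignite.cluster().active(true);
            }
        }
    }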
>>>
>>>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:
>>>>I'm working on getting automatic JVM thread stack dumps taken when we detect long delays in put (PutIfAbsent) operations. Hopefully this will provide more information.
>>>>
>>>>On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>>>>>
>>>>>I don't think so; checkpointing already worked perfectly well before this fix. We need additional info to start digging into your problem. Can you share the Ignite logs somewhere?
>>>>>
>>>>>>I noticed an entry in the Ignite 2.9.1 changelog:
>>>>>>* Improved checkpoint concurrent behaviour
>>>>>>I am having trouble finding the relevant Jira ticket for this in the 2.9.1 Jira area at https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>>>>
>>>>>>Perhaps this change may improve the checkpointing issue we are seeing?
>>>>>>
>>>>>>Raymond.
>>>>>>
>>>>>>On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:
>>>>>>>Hi Zhenya,
>>>>>>>
>>>>>>>1. We currently use AWS EFS for primary storage, with provisioned IOPS to provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage (with at least 5 nodes writing to it, including the WAL and WAL archive), so we are not saturating the EFS interface. We use the default page size (experiments with larger page sizes showed instability when checkpointing due to free page starvation, so we reverted to the default size).
>>>>>>>
>>>>>>>2. Thanks for the detail, we will look for that in thread dumps when we can create them.
>>>>>>>
>>>>>>>3. We are using the default CP buffer size, which is max(256Mb, DataRegionSize / 4) according to the Ignite documentation, so this should provide more than enough checkpoint buffer space to cope with writes. As additional information, the cache which is displaying very slow writes is in a data region with relatively low write traffic. There is a primary (default) data region with heavy write traffic, and the vast majority of pages being written in a checkpoint will be for that default data region.
>>>>>>>
>>>>>>>4. Yes, this is very surprising. Anecdotally, from our logs it appears write traffic into the low-write-traffic cache is blocked during checkpoints.
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Raymond.
>>>>>>>
>>>>>>>
>>>>>>>On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>>>>>>>>* In addition to Ilya's reply, you can check the vendor's page for additional info; everything on that page is applicable to Ignite too [1]. Increasing the number of threads leads to concurrent IO usage, so if you have something like NVMe it is up to you, but in the case of SAS it would possibly be better to reduce this parameter.
>>>>>>>>* The log will show you something like: Parking thread=%Thread name% for timeout(ms)=%time% and the corresponding: Unparking thread=
>>>>>>>>* No additional logging of cp buffer usage is provided. The cp buffer needs to be more than 10% of the overall persistent DataRegions size.
>>>>>>>>* 90 seconds or longer: that seems like a problem in IO or system tuning; it is a very bad result, I would say.
>>>>>>>>
>>>>>>>>[1] https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
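Regarding point 3 above, the checkpoint page buffer does not have to be left at its size-derived default; it can be set explicitly per region. A minimal sketch, assuming the Ignite 2.x Java API; the region names reuse the 4Gb/128Mb and 1Gb/256Mb figures discussed in this thread purely for illustration:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    /** Sketch: pin the checkpoint page buffer size instead of relying on the size-derived default. */
    public class CheckpointBufferConfig {
        public static DataStorageConfiguration storageConfig() {
            DataRegionConfiguration busyRegion = new DataRegionConfiguration()
                .setName("Default_Region")                          // hypothetical region name
                .setPersistenceEnabled(true)
                .setMaxSize(4L * 1024 * 1024 * 1024)                // 4 GB region
                .setCheckpointPageBufferSize(1024L * 1024 * 1024);  // 1 GB checkpoint buffer

            DataRegionConfiguration quietRegion = new DataRegionConfiguration()
                .setName("Inbound_Region")                          // hypothetical region name
                .setPersistenceEnabled(true)
                .setMaxSize(128L * 1024 * 1024)                     // 128 MB region
                .setCheckpointPageBufferSize(256L * 1024 * 1024);   // 256 MB checkpoint buffer

            return new DataStorageConfiguration()
                .setDataRegionConfigurations(busyRegion, quietRegion);
        }
    }

Pinning the buffer removes any doubt about which default formula applies and makes it easier to reason about whether the buffer could fill during a long checkpoint.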
>>>>>>>>
>>>>>>>>>Hi,
>>>>>>>>>
>>>>>>>>>We have been investigating some issues which appear to be related to checkpointing. We currently use AI 2.8.1 with the C# client.
>>>>>>>>>
>>>>>>>>>I have been trying to gain clarity on how certain aspects of the Ignite configuration relate to the checkpointing process:
>>>>>>>>>
>>>>>>>>>1. Number of checkpointing threads. This defaults to 4, but I don't understand how it applies to the checkpointing process. Are more threads generally better (e.g. because it makes the disk IO parallel across the threads), or does it only have a positive effect if you have many data storage regions? Or something else? If this could be clarified in the documentation (or a pointer to it which Google has not yet found), that would be good.
>>>>>>>>>
>>>>>>>>>2. Checkpoint frequency. This defaults to 180 seconds. I was thinking that reducing this time would result in smaller, less disruptive checkpoints. Setting it to 60 seconds seems pretty safe, but is there a practical lower limit that should be used for use cases where new data is constantly being added, e.g. 5 seconds, 10 seconds?
>>>>>>>>>
>>>>>>>>>3. Write exclusivity constraints during checkpointing. I understand that while a checkpoint is occurring, ongoing writes will be supported into the caches being checkpointed, and if those are writes to existing pages then those will be duplicated into the checkpoint buffer. If this buffer becomes full or stressed then Ignite will throttle, and perhaps block, writes until the checkpoint is complete. If this is the case then Ignite will emit logging (warning or informational?) that writes are being throttled.
>>>>>>>>>
>>>>>>>>>We have cases where simple puts to caches (a few requests per second) are taking up to 90 seconds to execute when there is an active checkpoint occurring, where the checkpoint has been triggered by the checkpoint timer. When a checkpoint is not occurring the time to do this is usually in the milliseconds. The checkpoints themselves can take 90 seconds or longer, and are updating up to 30,000-40,000 pages across a pair of data storage regions, one with 4Gb in-memory space allocated (which should be 1,000,000 pages at the standard 4kb page size), and one small region with 128Mb. There is no 'throttling' logging being emitted that we can tell, so the checkpoint buffer (which should be 1Gb for the first data region and 256Mb for the second, smaller region in this case) does not look like it can fill up during the checkpoint.
>>>>>>>>>
>>>>>>>>>It seems like the checkpoint is affecting the put operations, but I don't understand why that may be, given the documented checkpointing process, and the checkpoint itself (at least via informational logging) is not advertising any restrictions.
>>>>>>>>>
>>>>>>>>>Thanks,
>>>>>>>>>Raymond.
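Questions 1-3 above all map onto settings of DataStorageConfiguration. A minimal sketch of those knobs, assuming the Ignite 2.x Java API (the C# client exposes equivalent properties on its DataStorageConfiguration); the chosen values are illustrative, not recommendations:

    import org.apache.ignite.configuration.CheckpointWriteOrder;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    /** Sketch: storage-level checkpoint settings referred to in the questions above. */
    public class CheckpointTuning {
        public static DataStorageConfiguration storageConfig() {
            return new DataStorageConfiguration()
                .setCheckpointThreads(4)                   // default; parallelises page writes within a checkpoint
                .setCheckpointFrequency(60_000)            // 60 s instead of the 180 s default
                .setCheckpointWriteOrder(CheckpointWriteOrder.SEQUENTIAL) // default; sorts pages for more sequential IO
                .setWriteThrottlingEnabled(true);          // gradual slow-down instead of abrupt write stalls
        }
    }

Write throttling in particular is intended to slow writers down progressively as the checkpoint buffer fills or checkpoint progress lags, rather than letting the node hit a hard limit and block writes outright.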