Re[2]: Questions related to check pointing

2021-01-12 Thread Zhenya Stanilovsky
  >Hi Zhenya, >  >Thanks for the pointers - I will look into them. >  >I have been doing some additional reading into this and discovered we are >using a 4.0 NFS client, which seems to be the first 'no-no'; we will look at >updating to use the 4.1 NFS client. >  >We have modified our default ti

Re: Questions related to check pointing

2021-01-12 Thread Raymond Wilson
Hi Zhenya, Thanks for the pointers - I will look into them. I have been doing some additional reading into this and discovered we are using a 4.0 NFS client, which seems to be the first 'no-no'; we will look at updating to use the 4.1 NFS client. We have modified our default timer cadence for che

Re: Questions related to check pointing

2021-01-10 Thread Zhenya Stanilovsky
"Zhenya Stanilovsky" < arzamas...@mail.ru >> >Cc: >Subject: Re: Re[4]: Questions related to check pointing >Date: Thu, 31 Dec 2020 01:46:20 +0300 >  >Hi Zhenya, >  >The matching checkpoint finished log is this: >  >2020-12-15 19:07:39,253 [106] INF [MutableCa

Re: Re[4]: Questions related to check pointing

2021-01-07 Thread Ilya Kasnacheev
Hello! I think it's a sensible explanation. Regards, -- Ilya Kasnacheev On Wed, Jan 6, 2021 at 14:32, Raymond Wilson wrote: > I checked our code that creates the primary data region, and it does set > the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in > that region. > > The sec

Re: Re[4]: Questions related to check pointing

2021-01-06 Thread Raymond Wilson
I checked our code that creates the primary data region, and it does set the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in that region. The secondary data region is much smaller, and is set to min/max = 128 Mb of memory. The checkpoints with the "too many dirty pages" reaso
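
As a rough check of the page counts discussed above, here is a minimal Java sketch. It assumes a page size of 4096 bytes (the thread does not state the value in use) and reads "4Gb" and "128 Mb" as 4 GiB and 128 MiB; the class and method names are hypothetical and not part of Ignite. Under those assumptions the primary region holds 1,048,576 pages, which is close to the 1,000,000 quoted.

```java
// Hypothetical helper, not an Ignite API: estimates how many pages fit in a data region.
class PageCountSketch {
    // Number of whole pages that fit in a region of the given size.
    static long pagesInRegion(long regionBytes, int pageSizeBytes) {
        return regionBytes / pageSizeBytes;
    }

    public static void main(String[] args) {
        int pageSize = 4096;                          // assumed page size in bytes
        long primaryRegion = 4L * 1024 * 1024 * 1024; // 4 GiB (min == max)
        long secondaryRegion = 128L * 1024 * 1024;    // 128 MiB (min == max)

        System.out.println(pagesInRegion(primaryRegion, pageSize));   // 1048576
        System.out.println(pagesInRegion(secondaryRegion, pageSize)); // 32768
    }
}
```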

Re: Re[4]: Questions related to check pointing

2021-01-04 Thread Ilya Kasnacheev
Hello! I guess it's pool.pages() * 3L / 4, since, counterintuitively, the default ThrottlingPolicy is not ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY. Regards, -- Ilya Kasnacheev On Thu, Dec 31, 2020 at 04:33, Raymond Wilson wrote: > Regarding this section of code: > > maxDirty

Re: Re[4]: Questions related to check pointing

2020-12-30 Thread Raymond Wilson
Regarding this section of code: maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED ? pool.pages() * 3L / 4 : Math.min(pool.pages() * 2L / 3, cpPoolPages); I think the correct ratio will be 2/3 of pages, as we do not have a throttling policy defined
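
To make the two branches of the quoted expression concrete, here is a minimal standalone Java sketch. It copies only the arithmetic from the snippet above; the enum is a local stand-in with the two policy names mentioned in this thread, and the pool sizes are hypothetical example values (1,048,576 pages in the region, 262,144 pages in the checkpoint pool), not figures from the poster's cluster.

```java
// Standalone illustration of the quoted maxDirtyPages expression; not Ignite code.
class MaxDirtyPagesSketch {
    // Local stand-in for the policy names mentioned in the thread.
    enum ThrottlingPolicy { DISABLED, CHECKPOINT_BUFFER_ONLY }

    // Same arithmetic as the quoted snippet: 3/4 of the pool unless throttling is
    // DISABLED, otherwise the smaller of 2/3 of the pool and the checkpoint pool size.
    static long maxDirtyPages(ThrottlingPolicy throttlingPlc, long poolPages, long cpPoolPages) {
        return throttlingPlc != ThrottlingPolicy.DISABLED
            ? poolPages * 3L / 4
            : Math.min(poolPages * 2L / 3, cpPoolPages);
    }

    public static void main(String[] args) {
        long poolPages = 1_048_576L; // hypothetical: 4 GiB region with 4096-byte pages
        long cpPoolPages = 262_144L; // hypothetical checkpoint pool size in pages

        // Any policy other than DISABLED uses the 3/4 branch.
        System.out.println(maxDirtyPages(ThrottlingPolicy.CHECKPOINT_BUFFER_ONLY, poolPages, cpPoolPages)); // 786432
        // DISABLED uses min(2/3 of the pool, checkpoint pool size).
        System.out.println(maxDirtyPages(ThrottlingPolicy.DISABLED, poolPages, cpPoolPages)); // 262144
    }
}
```

Note that in the DISABLED branch the limit is capped by the checkpoint pool size, so the 2/3 ratio applies only when the checkpoint pool is at least that large.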

Re: Re[4]: Questions related to check pointing

2020-12-30 Thread Raymond Wilson
Hi Zhenya, The matching checkpoint finished log is this: 2020-12-15 19:07:39,253 [106] INF [MutableCacheComputeServer] Checkpoint finished [cpId=e2c31b43-44df-43f1-b162-6b6cefa24e28, pages=33421, markPos=FileWALPointer [idx=6339, fileOff=243287334, len=196573], walSegmentsCleared=0, walSegmentsCo

Re[4]: Questions related to check pointing

2020-12-30 Thread Zhenya Stanilovsky
All write operations will be blocked for this duration: checkpointLockHoldTime=32ms (write lock holding). If you observe a huge number of such messages (reason='too many dirty pages'), maybe you need to store some data in non-persistent regions, for example, or reduce indexes (if you use them

Re[4]: Questions related to check pointing

2020-12-30 Thread Zhenya Stanilovsky
The relevant code runs from here: if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null) break; else { CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages"); and nearby you can see that: maxDirty

Re: Re[2]: Questions related to check pointing

2020-12-30 Thread Raymond Wilson
In ( https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood), there is a mention of a dirty pages limit that is a factor that can trigger check points. I also found this issue: http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html whe

Re: Re[2]: Questions related to check pointing

2020-12-30 Thread Raymond Wilson
I'm working on having JVM thread stack dumps taken automatically when we detect long delays in put (PutIfAbsent) operations. Hopefully this will provide more information. On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky wrote: > > Don`t think so, checkpointing work perfectly well already befo

Re[2]: Questions related to check pointing

2020-12-29 Thread Zhenya Stanilovsky
I don't think so; checkpointing already worked perfectly well before this fix. I need additional info to start digging into your problem. Can you share the Ignite logs somewhere?   >I noticed an entry in the Ignite 2.9.1 changelog: >* Improved checkpoint concurrent behaviour >I am having trouble finding the

Re: Questions related to check pointing

2020-12-29 Thread Raymond Wilson
I noticed an entry in the Ignite 2.9.1 changelog: - Improved checkpoint concurrent behaviour I am having trouble finding the relevant Jira ticket for this in the 2.9.1 Jira area at https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20

Re: Questions related to check pointing

2020-12-28 Thread Raymond Wilson
Hi Zhenya, 1. We currently use AWS EFS for primary storage, with provisioned IOPS to provide sufficient IO. Our Ignite cluster currently tops out at ~10% usage (with at least 5 nodes writing to it, including WAL and WAL archive), so we are not saturating the EFS interface. We use the default page

Re: Questions related to check pointing

2020-12-28 Thread Zhenya Stanilovsky
* In addition to Ilya's reply, you can check the vendor's page for additional info; everything on that page is applicable to Ignite too [1]. Increasing the number of threads leads to concurrent IO usage, so if you have something like NVMe, it's up to you, but in the case of SAS it would possibly be better to reduce

Re: Questions related to check pointing

2020-12-28 Thread Raymond Wilson
As another detail, we have the WriteThrottlingEnabled property left at its default value of 'false', so I would not ordinarily expect throttling, correct? On Tue, Dec 29, 2020 at 10:04 AM Raymond Wilson wrote: > Hi Ilya, > > Regarding the throttling question, I have not yet looked at thread dump

Re: Questions related to check pointing

2020-12-28 Thread Raymond Wilson
Hi Ilya, Regarding the throttling question, I have not yet looked at thread dumps - the observed behaviour has been seen in production metrics and logging. What would you expect a thread dump to show in this case? Given my description of the sizes of the data regions and the numbers of pages bein

Re: Questions related to check pointing

2020-12-28 Thread Ilya Kasnacheev
Hello! 1. If we knew the specific circumstances in which a specific setting value will yield the most benefit, we would've already set it to that value. A setting means that you may tune it and get better results, or not. But in general we can't promise you anything. I did see improvements from in

Questions related to check pointing

2020-12-23 Thread Raymond Wilson
Hi, We have been investigating some issues which appear to be related to checkpointing. We currently use the IA 2.8.1 with the C# client. I have been trying to gain clarity on how certain aspects of the Ignite configuration relate to the checkpointing process: 1. Number of check pointing threads