On 13/01/2021, at 9:15 PM, Zhenya Stanilovsky
<[email protected]> wrote:
Is there an API version of the cluster deactivation?
https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131
On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky
<[email protected]> wrote:
Hi Zhenya,
Thanks for confirming that performing checkpoints more
often will help here.
Hi Raymond!
I have established this configuration, so I will
experiment with the settings a little.
On a related note, is there any way to
automatically trigger a checkpoint, for instance
as a pre-shutdown activity?
If you shut down your cluster gracefully, i.e. with
deactivation [1], the next start will not trigger WAL
reads.
[1]
https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster
Checkpoints seem to be much faster than the
process of applying WAL updates.
Raymond.
On Wed, Jan 13, 2021 at 8:07 PM Zhenya
Stanilovsky <[email protected]> wrote:
We have noticed that startup time for our
server nodes has been slowly increasing
over time as the amount of data stored in
the persistent store grows.
This appears to be closely related to
recovery of WAL changes that were not
checkpointed at the time the node was
stopped.
After enabling debug logging we see that
the WAL file is scanned, and for every
cache, all partitions in the cache are
examined, and if there are any
uncommitted changes in the WAL file then
the partition is updated (I assume this
requires reading of the partition itself
as a part of this process).
We now have ~150 GB of data in our
persistent store and we see WAL updates
taking 5-10 minutes to complete,
during which the node is unavailable.
We use fairly large WAL files (512 MB) and
use 10 segments, with WAL archiving enabled.
We anticipate data in persistent storage
to grow to Terabytes, and if the startup
time continues to grow as storage grows
then this makes deploys and restarts
difficult.
Until now we have been using the default
checkpoint timeout of 3 minutes, which
may mean we have significant
uncheckpointed data in the WAL files. We
are moving to a 1 minute checkpoint interval but
don't yet know if this will improve startup
times. We also use the default 1024
partitions per cache, though some
partitions may be large.
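As a rough illustration of why the checkpoint interval matters for startup, the sketch below estimates how much WAL could be left uncheckpointed when a node stops. It assumes a steady write rate (the 20 MB/s figure is hypothetical, not a number from this thread) and ignores checkpoints triggered early for other reasons, so treat it as back-of-envelope arithmetic rather than a model of Ignite's recovery.

```python
# Back-of-envelope estimate only. SEGMENT_MB is the WAL segment size quoted
# in this thread; the write rate is a hypothetical figure for illustration.
SEGMENT_MB = 512

def uncheckpointed_wal_mb(write_rate_mb_s, checkpoint_interval_s):
    """Worst-case WAL volume written since the last checkpoint started,
    assuming a constant write rate and purely timer-driven checkpoints."""
    return write_rate_mb_s * checkpoint_interval_s

ASSUMED_WRITE_RATE_MB_S = 20  # hypothetical sustained WAL write rate

if __name__ == "__main__":
    # Compare the default 3 minute interval with the planned 1 minute one.
    for interval_s in (180, 60):
        mb = uncheckpointed_wal_mb(ASSUMED_WRITE_RATE_MB_S, interval_s)
        segments = mb / SEGMENT_MB
        print(f"{interval_s}s interval: up to ~{mb} MB of WAL to replay "
              f"(~{segments:.1f} segments)")
```

Under these assumptions the worst-case replay volume scales linearly with the interval, so going from 3 minutes to 1 minute would cut it to about a third. Whether startup time falls by the same factor depends on how recovery cost relates to WAL volume, which this sketch does not capture.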
Can anyone confirm this is expected
behaviour, and are there recommendations for
resolving it?
Will reducing checkpointing intervals
help?
Yes, it will help. Check
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
Is the entire content of a partition read
while applying WAL changes?
I don't think so; maybe someone else can comment here?
Does anyone else have this issue?
Thanks,
Raymond.
--
<http://www.trimble.com/>
Raymond Wilson
Solution Architect, Civil Construction
Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New
Zealand
[email protected]