Hi,
I don't think there are any other options at the moment other than the
ones you mentioned.
However, you can also create your own application that checks the topology and
activates the cluster when all nodes from the baseline are online. For example,
you could add some Java code that runs when a server node starts.
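A rough sketch of such a watcher (assuming the Ignite 2.9+ cluster state API; the
class name, config path and polling interval are placeholders):

import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.BaselineNode;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.cluster.ClusterState;

/** Hypothetical watcher: activates the cluster once every baseline node is back online. */
public class BaselineActivationWatcher {
    public static void main(String[] args) throws InterruptedException {
        // Join the cluster with a placeholder configuration.
        try (Ignite ignite = Ignition.start("watcher-config.xml")) {
            while (ignite.cluster().state() != ClusterState.ACTIVE) {
                Collection<BaselineNode> baseline = ignite.cluster().currentBaselineTopology();

                Set<Object> online = ignite.cluster().forServers().nodes().stream()
                    .map(ClusterNode::consistentId)
                    .collect(Collectors.toSet());

                boolean allBaselineOnline = baseline != null && baseline.stream()
                    .map(BaselineNode::consistentId)
                    .allMatch(online::contains);

                if (allBaselineOnline)
                    ignite.cluster().state(ClusterState.ACTIVE); // same effect as manual activation
                else
                    Thread.sleep(5_000); // poll until the remaining baseline nodes rejoin
            }
        }
    }
}

The .NET client exposes the same operation as ICluster.SetActive, mentioned
further down in this thread.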
In case you require any changes to the current Ignite implementation,
you can start a thread on the Ignite developer list:
http://apache-ignite-developers.2346864.n4.nabble.com/
BR,
Andrei
On 1/20/2021 9:16 PM, Raymond Wilson wrote:
Hi Andrei,
I would like to see Ignite support the graceful shutdown scenario you get with
deactivation, but without the need to manually reactivate the cluster afterwards.
We run a pretty agile process and it is not uncommon to have multiple
deploys to production throughout a week. This is a pretty automated
affair (essentially push-button) and it works well, except for the WAL
rescan on startup.
Today there are two approaches we can take for a deployment:
1. Stop the nodes (which is what we currently do), leaving the WAL and
persistent store inconsistent. This requires a rescan of the WAL
before the grid is auto re-activated on startup. The time to do this
is increasing with the size of the persistent store - it does not
appear to be related to the size of the WAL.
2. Deactivate the grid, which leaves the WAL and persistent store in a
consistent state. This requires manual re-activation on restart, but
does not incur the increasing WAL restart cost.
Is an option like the one below possible?
3. Suspend the grid, which performs the same steps deactivation does
to make the WAL and persistent store consistent, but which leaves the
grid activated so the manual activation process is not required on
restart.
Thanks,
Raymond.
On Thu, Jan 21, 2021 at 4:02 AM andrei <[email protected]> wrote:
Hi,
Yes, that was to be expected. The main auto-activation scenario is a
cluster restart. If you deactivate the cluster manually, you should
also activate it manually.
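For example, the manual activation itself is a single call from Java (a sketch;
the class name and config path are placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

/** Hypothetical one-shot helper: re-activate the cluster manually after a restart. */
public class ActivateCluster {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("client-config.xml")) { // placeholder config
            ignite.cluster().state(ClusterState.ACTIVE);
            // On versions before 2.9, the equivalent (now deprecated) call is:
            // ignite.cluster().active(true);
        }
    }
}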
BR,
Andrei
On 1/20/2021 5:50 AM, Raymond Wilson wrote:
We have been experimenting with using deactivation to shut down the grid in
order to reduce the time it takes the grid to start up again.
It appears there is a downside to this: once deactivated, the grid does not
appear to auto-activate once the baseline topology is achieved, which means we
will need to run through a bootstrapping protocol of ensuring the grid has
restarted correctly before activating it again.
The baseline topology documentation at
https://ignite.apache.org/docs/latest/clustering/baseline-topology
does not cover this condition.
Is this expected?
Thanks,
Raymond.
On Wed, Jan 13, 2021 at 11:49 PM Pavel Tupitsyn <[email protected]> wrote:
Raymond,
Please use ICluster.SetActive [1] instead; the API linked above is obsolete.
[1]
https://ignite.apache.org/releases/latest/dotnetdoc/api/Apache.Ignite.Core.Cluster.ICluster.html?#Apache_Ignite_Core_Cluster_ICluster_SetActive_System_Boolean_
On Wed, Jan 13, 2021 at 11:54 AM Raymond Wilson <[email protected]> wrote:
Of course. Obvious! :)
Sent from my iPhone
On 13/01/2021, at 9:15 PM, Zhenya Stanilovsky <[email protected]> wrote:
Is there an API version of the cluster deactivation?
https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131
On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky <[email protected]> wrote:
Hi Zhenya,
Thanks for confirming that performing checkpoints more often will help here.
Hi Raymond!
I have established this configuration, so I will experiment with the settings a little.
On a related note, is there any way to automatically trigger a checkpoint,
for instance as a pre-shutdown activity?
If you shut down your cluster gracefully, i.e. with deactivation [1], a further
start will not trigger WAL reads.
[1]
https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster
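A rough sketch of doing that deactivation programmatically before stopping the
nodes (assuming the Ignite 2.9+ cluster state API; the class name and config
path are placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

/** Hypothetical pre-shutdown step: deactivate first so the next start does not replay the WAL. */
public class GracefulStop {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("client-config.xml")) { // placeholder config
            // Deactivation checkpoints and stops updates, leaving the WAL and store consistent.
            ignite.cluster().state(ClusterState.INACTIVE);
        }
        // Server nodes can now be stopped; remember to activate again after restart.
    }
}

The control script linked above does the same thing from the command line.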
Checkpoints seem to be much faster than the process of applying WAL updates.
Raymond.
On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky <[email protected]> wrote:
We have noticed that the startup time for our server nodes has been slowly
increasing as the amount of data stored in the persistent store grows.
This appears to be closely related to the recovery of WAL changes that were
not checkpointed at the time the node was stopped.
After enabling debug logging we see that the WAL file is scanned and, for every
cache, all partitions in the cache are examined; if there are any uncommitted
changes in the WAL file then the partition is updated (I assume this requires
reading the partition itself as part of this process).
We now have ~150 GB of data in our persistent store and we see WAL updates take
5-10 minutes to complete, during which the node is unavailable.
We use fairly large WAL files (512 MB) with 10 segments, and WAL archiving enabled.
We anticipate the data in persistent storage will grow to terabytes, and if the
startup time continues to grow with storage size, this will make deploys and
restarts difficult.
Until now we have been using the default checkpoint timeout of 3 minutes, which
may mean we have significant uncheckpointed data in the WAL files. We are moving
to a 1-minute checkpoint frequency but don't yet know if this will improve
startup times. We also use the default 1024 partitions per cache, though some
partitions may be large.
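For concreteness, those settings correspond to something like the following in
the Java configuration API (a sketch; the values are the ones mentioned above,
and whether a 1-minute checkpoint frequency is the right trade-off is exactly
the open question):

import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StorageConfigSketch {
    /** Builds a configuration with the storage knobs mentioned in this thread. */
    public static IgniteConfiguration build() {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        storageCfg.setCheckpointFrequency(60_000L);      // 1-minute checkpoints instead of the 3-minute default
        storageCfg.setWalSegmentSize(512 * 1024 * 1024); // 512 MB WAL segments
        storageCfg.setWalSegments(10);                   // 10 segments in the WAL working directory
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        return new IgniteConfiguration().setDataStorageConfiguration(storageCfg);
    }
}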
Can anyone confirm whether this is expected behaviour, and recommend how to resolve it?
Will reducing checkpointing intervals help?
Yes, it will help. Check
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
Is the entire content of a partition read while applying WAL changes?
I don't think so; maybe someone else can suggest here?
Does anyone else have this issue?
Thanks,
Raymond.
--
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
[email protected]