Re: Ever increasing startup times as data grow in persistent storage

andrei Wed, 20 Jan 2021 07:02:49 -0800

Hi,

Yes, that was to be expected. The main autoactivation scenario is cluster restart. If you are using manual deactivation, you should also manually activate your cluster.
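
As a rough illustration of the manual deactivate/activate cycle discussed in this thread, here is a minimal Java sketch. It does not use the Ignite API directly: the ClusterControl interface, the ManualActivation class and both method names are hypothetical, introduced only so the logic can run without a cluster. Treating "enough server nodes are online" as the readiness test is also an assumption; a stricter check would compare the joined nodes against the stored baseline.

```java
import java.time.Duration;

/** Hypothetical helper, not part of Ignite: gates manual activation on node count. */
final class ManualActivation {

    /** Minimal view of a cluster (assumption); a real one would wrap the cluster API. */
    interface ClusterControl {
        int onlineServerNodes();
        boolean isActive();
        void setActive(boolean active);
    }

    private ManualActivation() {
    }

    /**
     * Polls until the expected number of server nodes is online, then activates.
     * Returns true if the cluster is active on return, false on timeout or interrupt.
     */
    static boolean activateWhenReady(ClusterControl cluster, int expectedServers,
                                     Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (cluster.isActive()) {
                return true; // someone else already activated it
            }
            if (cluster.onlineServerNodes() >= expectedServers) {
                cluster.setActive(true);
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // not enough nodes joined in time; leave inactive
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    /** Deactivates before node shutdown so the stop counts as graceful. */
    static void deactivateBeforeShutdown(ClusterControl cluster) {
        if (cluster.isActive()) {
            cluster.setActive(false);
        }
    }
}
```

In a real deployment the setActive call would presumably map to ICluster.SetActive in Ignite.NET (the method recommended later in this thread) or its Java counterpart on IgniteCluster, and the timeout branch is where an operator alert would go.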


BR,
Andrei

1/20/2021 5:50 AM, Raymond Wilson wrote:

We have been experimenting with using deactivation to shut down the grid to reduce the time for the grid to start up again.

It appears there is a downside to this: once deactivated, the grid does not appear to auto-activate once baseline topology is achieved, which means we will need to run through the bootstrapping protocol of ensuring the grid has restarted correctly before activating it once again.

The baseline topology documentation at https://ignite.apache.org/docs/latest/clustering/baseline-topology does not cover this condition.


Is this expected?

Thanks,
Raymond.

On Wed, Jan 13, 2021 at 11:49 PM Pavel Tupitsyn wrote:


    Raymond,

    Please use ICluster.SetActive [1] instead; the API linked above is
    obsolete.

    [1] https://ignite.apache.org/releases/latest/dotnetdoc/api/Apache.Ignite.Core.Cluster.ICluster.html?#Apache_Ignite_Core_Cluster_ICluster_SetActive_System_Boolean_

    On Wed, Jan 13, 2021 at 11:54 AM Raymond Wilson wrote:

        Of course. Obvious! :)

        Sent from my iPhone

        On 13/01/2021, at 9:15 PM, Zhenya Stanilovsky wrote:

        



            Is there an API version of the cluster deactivation?

        
        https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131

            On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky wrote:



                    Hi Zhenya,
                    Thanks for confirming that performing checkpoints
                    more often will help here.

                Hi Raymond !

                    I have set up this configuration, so I will
                    experiment with the settings a little.
                    On a related note, is there any way to
                    automatically trigger a checkpoint, for instance
                    as a pre-shutdown activity?

                If you shut down your cluster gracefully, i.e. with
                deactivation [1], the next start will not trigger WAL
                reading.
                [1] https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster

                    Checkpoints seem to be much faster than the
                    process of applying WAL updates.
                    Raymond.
                    On Wed, Jan 13, 2021 at 8:07 PM Zhenya
                    Stanilovsky wrote:




                            We have noticed that startup time for our
                            server nodes has been slowly increasing
                            over time as the amount of data stored in
                            the persistent store grows.
                            This appears to be closely related to
                            recovery of WAL changes that were not
                            checkpointed at the time the node was
                            stopped.
                            After enabling debug logging we see that
                            the WAL file is scanned, and for every
                            cache, all partitions in the cache are
                            examined, and if there are any
                            uncommitted changes in the WAL file then
                            the partition is updated (I assume this
                            requires reading of the partition itself
                            as a part of this process).
                            We now have ~150 GB of data in our
                            persistent store and we see WAL updates
                            take between 5 and 10 minutes to complete,
                            during which the node is unavailable.
                            We use fairly large WAL files (512 MB) and
                            use 10 segments, with WAL archiving enabled.
                            We anticipate data in persistent storage
                            to grow to Terabytes, and if the startup
                            time continues to grow as storage grows
                            then this makes deploys and restarts
                            difficult.
                            Until now we have been using the default
                            checkpoint timeout of 3 minutes, which
                            may mean we have significant
                            uncheckpointed data in the WAL files. We
                            are moving to a 1 minute checkpoint
                            interval but don't yet know if this will
                            improve startup times. We also use the
                            default 1024 partitions per cache, though
                            some partitions may be large.
                            Can anyone confirm this is expected
                            behaviour, and offer recommendations for
                            resolving it?
                            Will reducing checkpointing intervals
                            help?

                        Yes, it will help. Check
                        https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood

                            Is the entire content of a partition read
                            while applying WAL changes?

                        I don't think so; maybe someone else can suggest something here?

                            Does anyone else have this issue?
                            Thanks,
                            Raymond.

--<http://www.trimble.com/>

                            Raymond Wilson
                            Solution Architect, Civil Construction
                            Software Systems (CCSS)
                            11 Birmingham Drive | Christchurch, New
                            Zealand
                            [email protected]

