Hi,

Please find the Ignite config and error log below.

Config:

<property name="gridLogger">
    <bean class="org.apache.ignite.logger.log4j.Log4JLogger">
        <constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/ignite-log4j.xml"/>
    </bean>
</property>
<property name="peerClassLoadingEnabled" value="true"/>
<property name="deploymentMode" value="CONTINUOUS"/>
<property name="workDirectory" value="/ignite/work"/>
<property name="snapshotPath" value="/ignite/snapshots"/>
<property name="queryThreadPoolSize" value="8"/>
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="walBufferSize" value="#{128L * 1024 * 1024}"/>
        <property name="walSegmentSize" value="#{512L * 1024 * 1024}"/>
        <property name="maxWalArchiveSize" value="#{2L * 1024 * 1024 * 1024}"/>
        <property name="checkpointFrequency" value="#{60 * 1000}"/>
        <property name="writeThrottlingEnabled" value="true"/>
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="persistenceEnabled" value="true"/>
                <property name="initialSize" value="#{100L * 1024 * 1024}"/>
                <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
                <!-- https://ignite.apache.org/docs/latest/persistence/persistence-tuning#adjusting-checkpointing-buffer-size -->
                <property name="checkpointPageBufferSize" value="#{512L * 1024 * 1024}"/>
                <!--<property name="pageReplacementMode" value="SEGMENTED_LRU"/>-->
            </bean>
        </property>
        <property name="walPath" value="/ignite/wal"/>
        <property name="walArchivePath" value="/ignite/walarchive"/>
    </bean>
</property>

Error log:

at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2763)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:870)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3200)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1116)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1799)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1721)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1160)
at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1054)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:940)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:839)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:709)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
at org.apache.ignite.Ignition.start(Ignition.java:353)
... 1 more

Failed to start grid: WAL history is too short [descs=[
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000060.wal, idx=60],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000061.wal, idx=61],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000062.wal, idx=62],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000063.wal, idx=63],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000064.wal, idx=64],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000065.wal, idx=65],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000066.wal, idx=66],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000067.wal, idx=67],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000068.wal, idx=68],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000069.wal, idx=69],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000070.wal, idx=70],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000071.wal, idx=71],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000072.wal, idx=72],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000073.wal, idx=73],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000074.wal, idx=74],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000075.wal, idx=75],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000076.wal, idx=76],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000077.wal, idx=77],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000078.wal, idx=78],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000079.wal, idx=79],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000080.wal, idx=80],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000081.wal, idx=81],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000082.wal, idx=82],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000083.wal, idx=83],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000084.wal, idx=84],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000085.wal, idx=85],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000086.wal, idx=86],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000087.wal, idx=87],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000088.wal, idx=88],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000089.wal, idx=89],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000090.wal, idx=90],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000091.wal, idx=91],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000092.wal, idx=92],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000093.wal, idx=93],
FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000094.wal, idx=94]
], start=WALPointer [idx=0, fileOff=0, len=0]]

On Thu, May 26, 2022 at 8:56 PM Николай Ижиков <nizhi...@apache.org> wrote:

> Can you, please, send your config and the full log file that contains the
> error message.
>
> On May 26, 2022, at 17:50, Surinder Mehra <redni...@gmail.com> wrote:
>
> Hello,
> I upgraded to 2.13.0 and I am able to take sync snapshots now. However, I
> ran into another problem while restoring from a snapshot using the manual
> steps mentioned in the documentation.
>
> We run the Ignite StatefulSet on a Kubernetes cluster, so when we scale it
> to N nodes, it brings up one node at a time.
>
> Now I am trying to attach an init container which will clear the db
> directory in the work directory, copy the /db directory from the snapshot
> to the work directory, and then start the main container which runs Ignite.
>
> It works well on a single node; it's able to start the cluster with the
> snapshot data.
>
> When I start multiple nodes, the init container runs on each of them as
> the first step. Since nodes start one at a time, it runs into an error
> saying "too small WAL segments data".
>
> I suppose that could be because the 2nd node is still in the init step
> while the first one is in running mode. There are a few which haven't
> started yet, waiting for the 2nd node to be in the running state.
>
> Any idea how we can make the main containers wait until all init
> containers have completed?
>
> Asking this here as it's related to the Ignite setup in Kubernetes.
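One way to approach the startup-ordering question above is at the StatefulSet level: with the default `OrderedReady` policy, Kubernetes starts pods one at a time, so the first Ignite node is already running while later nodes are still in their init step. Setting `podManagementPolicy: Parallel` launches all replicas, and therefore all init containers, together. The sketch below is a minimal illustration, not a tested manifest: the names, image tags, snapshot name (`SNAPSHOT_NAME`), and volume layout are placeholders, not taken from this thread, and `podManagementPolicy` cannot be changed on an existing StatefulSet (it must be set at creation):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ignite
spec:
  # Default is OrderedReady (one pod at a time). Parallel starts all
  # replicas at once, so no node runs alone while the others still
  # restore from the snapshot in their init containers.
  podManagementPolicy: Parallel
  serviceName: ignite
  replicas: 3
  selector:
    matchLabels:
      app: ignite
  template:
    metadata:
      labels:
        app: ignite
    spec:
      initContainers:
        - name: restore-snapshot
          image: busybox:1.36   # placeholder image
          # Hypothetical restore step: clear the db directory, then copy
          # it back from the snapshot, per the manual-restore docs.
          command: ["sh", "-c",
                    "rm -rf /ignite/work/db && cp -a /ignite/snapshots/SNAPSHOT_NAME/db /ignite/work/db"]
          volumeMounts:
            - { name: ignite-data, mountPath: /ignite }
      containers:
        - name: ignite
          image: apacheignite/ignite:2.13.0
          volumeMounts:
            - { name: ignite-data, mountPath: /ignite }
  volumeClaimTemplates:
    - metadata:
        name: ignite-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 60Gi
```

Note that `Parallel` only removes the one-at-a-time gating; it does not synchronize the init steps with each other, but it avoids the situation where the first node is fully up while the rest are still copying.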
> Any help will be appreciated. Thanks
>
> On Wed, 25 May 2022, 00:04 Surinder Mehra <redni...@gmail.com> wrote:
>
>> Thanks a lot. I will try this.
>>
>> On Tue, 24 May 2022, 23:50 Николай Ижиков <nizhi...@apache.org> wrote:
>>
>>> > Does it ensure consistency while copying data that is being updated
>>> > in parallel by application writes?
>>>
>>> Yes.
>>>
>>> From the documentation:
>>>
>>> «An Ignite snapshot includes a consistent cluster-wide copy of all data
>>> records persisted on disk and some other files needed for a restore
>>> procedure.»
>>>
>>> > Will this be a stop-the-world process?
>>>
>>> No.
>>>
>>> On May 24, 2022, at 21:17, Surinder Mehra <redni...@gmail.com> wrote:
>>>
>>> Hi,
>>> Thanks for the reply.
>>>
>>> #1: So it's not a stop-the-world task. Does it ensure consistency while
>>> copying data that is being updated in parallel by application writes?
>>> Or does it mark the data to be copied and ignore further updates to it?
>>>
>>> #2:
>>> I will try the sync snapshot. But just to confirm, will this be a
>>> stop-the-world process? I couldn't find anything about it on the
>>> documentation page.
>>>
>>> On Tue, 24 May 2022, 23:12 Николай Ижиков <nizhi...@apache.org> wrote:
>>>
>>>> Hello, Mehra.
>>>>
>>>> > 1. Is it a stop-the-world process?
>>>>
>>>> No, you can perform any actions.
>>>> Note: topology changes will cancel the snapshot creation process.
>>>>
>>>> > 2. If so, is it stop-the-world only during command execution
>>>> > (500 ms), or until the snapshot data is fully copied (which takes
>>>> > many minutes)?
>>>>
>>>> Please take a look at the `--sync` option of the create snapshot
>>>> command (you can see the help in the `control.sh` output).
>>>> `EVT_CLUSTER_SNAPSHOT_FINISHED` is raised when snapshot creation
>>>> finishes.
>>>>
>>>> > 3. Is there a way to speed this up other than increasing the number
>>>> > of snapshot threads?
>>>>
>>>> Stop write operations.
>>>> The less you change, the quicker the snapshot will be created.
>>>>
>>>> On May 24, 2022, at 20:12, Surinder Mehra <redni...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> I have a 3-node Ignite cluster; each node contains a 60G work
>>>> directory (EBS), and I need to create snapshots.
>>>> I followed the steps to create snapshots and ran the create snapshot
>>>> command using the control utility. The command completed in 500 ms,
>>>> but the snapshot directory only had 400 MB of data. Later I realised
>>>> the directory size grew to 30G. I suppose it will reach the size of
>>>> the work directory.
>>>>
>>>> I have a few questions:
>>>> 1. Is it a stop-the-world process?
>>>> 2. If so, is it stop-the-world only during command execution (500 ms),
>>>> or until the snapshot data is fully copied (which takes many minutes)?
>>>> 3. Is there a way to speed this up other than increasing the number
>>>> of snapshot threads?
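For reference, the `--sync` option discussed above changes only when the command returns, not how the snapshot is taken, which would explain the original observation of a ~500 ms return while data kept being copied in the background. A minimal sketch of the CLI usage, assuming a hypothetical snapshot name; exact flags vary by Ignite version, so verify against your `control.sh --help` output:

```
# Start snapshot creation and return immediately (the default behaviour).
./control.sh --snapshot create mySnapshot

# Block until the snapshot is fully written before returning.
./control.sh --snapshot create mySnapshot --sync

# Verify snapshot integrity after creation.
./control.sh --snapshot check mySnapshot
```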