Hey,
I have one more query on this one. Currently the StatefulSet's
"podManagementPolicy" is "OrderedReady", which means Kubernetes starts one
pod at a time until the configured replica count is reached.
If we change "podManagementPolicy" to "Parallel", all pods start in
parallel, so the init containers can do what I intended: clean up and copy
from the snapshot before any main container starts (a rough sketch of the
manifest is below).

The question is: is it fine to start Ignite pods in parallel, or should
they always start one after another? The documentation doesn't say anything
about this, and I just want to make sure it has no side effects before I
start implementing my approach to restoring data.
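
For reference, here is a minimal sketch of the StatefulSet change I have in
mind. The image, snapshot name and storage size are placeholders, and the
init container script is just shorthand for the cleanup/copy steps described
earlier in this thread; the paths match the Ignite config quoted further
below.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ignite
spec:
  serviceName: ignite
  replicas: 3
  # Parallel: pods (and therefore their init containers) are started
  # together instead of one after another.
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: ignite
  template:
    metadata:
      labels:
        app: ignite
    spec:
      initContainers:
        - name: restore-from-snapshot
          image: busybox:1.36          # placeholder image
          command: ["sh", "-c"]
          args:
            - |
              # Clean the checkpoint/WAL/work data, then copy this node's
              # snapshot data ("my-snapshot" is a placeholder name).
              rm -rf /ignite/work/db /ignite/work/cp /ignite/wal/* /ignite/walarchive/* &&
              cp -r /ignite/snapshots/my-snapshot/db /ignite/work/
          volumeMounts:
            - name: ignite-data
              mountPath: /ignite
      containers:
        - name: ignite
          image: apacheignite/ignite:2.13.0
          volumeMounts:
            - name: ignite-data
              mountPath: /ignite
  volumeClaimTemplates:
    - metadata:
        name: ignite-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi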

On Fri, May 27, 2022 at 11:33 PM Surinder Mehra <redni...@gmail.com> wrote:

> Hi Maxim,
> As I explained in the original email, I do the cleanup as part of an init
> container. Since the Ignite nodes start one after another, the init
> containers also run in sequence.
> So node 1 completes its startup (its init container has cleaned the work
> directory and copied data from the snapshot) while node 2 is still running
> its init container, which may still be copying data. And since node 3 is
> yet to start, its wal, cp, and work directories haven't been cleaned yet.
>
> So my question was: is there a way to clean up the work directory and copy
> from the snapshot for all nodes before the first Ignite node starts?
>
> On Fri, 27 May 2022, 23:11 Maxim Muzafarov, <mmu...@apache.org> wrote:
>
>> Hello,
>>
>> If you're copying a snapshot part to the new node, then you have to be
>> sure that the /ignite/work/cp, /ignite/wal, /ignite/walarchive
>> directories are empty prior to the node start. Is it true for your
>> case?
>>
>> On Fri, 27 May 2022 at 10:29, Surinder Mehra <redni...@gmail.com> wrote:
>> >
>> > Hi,
>> > Please find the Ignite config and error log below.
>> >
>> > config:
>> > <property name="gridLogger">
>> >     <bean class="org.apache.ignite.logger.log4j.Log4JLogger">
>> >         <constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/ignite-log4j.xml"/>
>> >     </bean>
>> > </property>
>> > <property name="peerClassLoadingEnabled" value="true"/>
>> > <property name="deploymentMode" value="CONTINUOUS"/>
>> > <property name="workDirectory" value="/ignite/work"/>
>> > <property name="snapshotPath" value="/ignite/snapshots"/>
>> > <property name="queryThreadPoolSize" value="8"/>
>> >
>> > <property name="dataStorageConfiguration">
>> >     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>> >         <property name="walBufferSize" value="#{128L * 1024 * 1024}"/>
>> >         <property name="walSegmentSize" value="#{512L * 1024 * 1024}"/>
>> >         <property name="maxWalArchiveSize" value="#{2L * 1024 * 1024 * 1024}"/>
>> >         <property name="checkpointFrequency" value="#{60 * 1000}"/>
>> >         <property name="writeThrottlingEnabled" value="true"/>
>> >         <property name="defaultDataRegionConfiguration">
>> >             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>> >                 <property name="persistenceEnabled" value="true"/>
>> >                 <property name="initialSize" value="#{100L * 1024 * 1024}"/>
>> >                 <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
>> >                 <!-- https://ignite.apache.org/docs/latest/persistence/persistence-tuning#adjusting-checkpointing-buffer-size -->
>> >                 <property name="checkpointPageBufferSize" value="#{512L * 1024 * 1024}"/>
>> >                 <!--<property name="pageReplacementMode" value="SEGMENTED_LRU"/>-->
>> >             </bean>
>> >         </property>
>> >         <property name="walPath" value="/ignite/wal"/>
>> >         <property name="walArchivePath" value="/ignite/walarchive"/>
>> >     </bean>
>> > </property>
>> >
>> >
>> > Error log:
>> >
>> > at
>> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2763)
>> > at
>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:870)
>> > at
>> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3200)
>> > at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1116)
>> > at
>> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1799)
>> > at
>> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1721)
>> > at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1160)
>> > at
>> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1054)
>> > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:940)
>> > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:839)
>> > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:709)
>> > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
>> > at org.apache.ignite.Ignition.start(Ignition.java:353)
>> > ... 1 more
>> > Failed to start grid: WAL history is too short [descs=[FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000060.wal,
>> idx=60], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000061.wal,
>> idx=61], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000062.wal,
>> idx=62], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000063.wal,
>> idx=63], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000064.wal,
>> idx=64], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000065.wal,
>> idx=65], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000066.wal,
>> idx=66], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000067.wal,
>> idx=67], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000068.wal,
>> idx=68], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000069.wal,
>> idx=69], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000070.wal,
>> idx=70], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000071.wal,
>> idx=71], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000072.wal,
>> idx=72], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000073.wal,
>> idx=73], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000074.wal,
>> idx=74], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000075.wal,
>> idx=75], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000076.wal,
>> idx=76], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000077.wal,
>> idx=77], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000078.wal,
>> idx=78], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000079.wal,
>> idx=79], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000080.wal,
>> idx=80], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000081.wal,
>> idx=81], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000082.wal,
>> idx=82], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000083.wal,
>> idx=83], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000084.wal,
>> idx=84], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000085.wal,
>> idx=85], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000086.wal,
>> idx=86], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000087.wal,
>> idx=87], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000088.wal,
>> idx=88], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000089.wal,
>> idx=89], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000090.wal,
>> idx=90], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000091.wal,
>> idx=91], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000092.wal,
>> idx=92], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000093.wal,
>> idx=93], FileDescriptor
>> [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000094.wal,
>> idx=94]], start=WALPointer [idx=0, fileOff=0, len=0]]
>> >
>> >
>> > On Thu, May 26, 2022 at 8:56 PM Николай Ижиков <nizhi...@apache.org>
>> wrote:
>> >>
>> >> Can you please send your config and the full log file that contains the error message?
>> >>
>> >> On 26 May 2022, at 17:50, Surinder Mehra <redni...@gmail.com> wrote:
>> >>
>> >> Hello,
>> >> I upgraded to 2.13.0 and I am able to take sync snapshots now. However, I ran into another problem while restoring from a snapshot using the manual steps mentioned in the documentation.
>> >>
>> >> We run an Ignite StatefulSet on a Kubernetes cluster, so when we scale it to N nodes, it brings up one node at a time.
>> >>
>> >> Now I am trying to attach an init container which clears the db directory in the work directory, copies the /db directory from the snapshot into the work directory, and then lets the main container, which runs Ignite, start.
>> >>
>> >> It works well on a single node; it's able to start the cluster with the snapshot data.
>> >>
>> >> When I start multiple nodes, the init container runs in each of them as the first step. Since the nodes start one at a time, it runs into an error saying "too small WAL segments data".
>> >>
>> >> I suppose that could be because the 2nd node is still in its init step while the first one is already running. There are a few that haven't started yet, waiting for the 2nd node to be in the running state.
>> >>
>> >> Any idea how we can make the main containers wait until all init containers have completed?
>> >>
>> >> Asking this here as it's related to the Ignite setup in Kubernetes.
>> >>
>> >> Any help will be appreciated. Thanks.
>> >>
>> >> On Wed, 25 May 2022, 00:04 Surinder Mehra, <redni...@gmail.com> wrote:
>> >>>
>> >>> Thanks a lot. I will try this.
>> >>>
>> >>> On Tue, 24 May 2022, 23:50 Николай Ижиков, <nizhi...@apache.org>
>> wrote:
>> >>>>
>> >>>> > Does it ensure consistency while copying data that is being updated in parallel by application writes?
>> >>>>
>> >>>> Yes.
>> >>>>
>> >>>> From the documentation:
>> >>>>
>> >>>> «An Ignite snapshot includes a consistent cluster-wide copy of all
>> data records persisted on disk and some other files needed for a restore
>> procedure.»
>> >>>>
>> >>>> > will this be a stop-the-world process?
>> >>>>
>> >>>> No.
>> >>>>
>> >>>>
>> >>>> On 24 May 2022, at 21:17, Surinder Mehra <redni...@gmail.com> wrote:
>> >>>>
>> >>>> Hi
>> >>>> Thanks for the reply.
>> >>>>
>> >>>> #1: So it's not a stop-the-world task. Does it ensure consistency while copying data that is being updated in parallel by application writes, or does it mark the data to be copied and ignore further updates to it?
>> >>>>
>> >>>> #2:
>> >>>> I will try a sync snapshot. But just to confirm, will this be a stop-the-world process? I couldn't find anything about it on the documentation page.
>> >>>>
>> >>>> On Tue, 24 May 2022, 23:12 Николай Ижиков, <nizhi...@apache.org>
>> wrote:
>> >>>>>
>> >>>>> Hello, Mehra.
>> >>>>>
>> >>>>> > 1. Is it a stop-the-world process?
>> >>>>>
>> >>>>> No, you can perform any actions.
>> >>>>> Note that topology changes will cancel the snapshot creation process.
>> >>>>>
>> >>>>> > 2. If so, is it stop-the-world only during command execution (500 ms) or until the snapshot data is fully copied (which takes many minutes)?
>> >>>>>
>> >>>>> Please take a look at the `--sync` option of the create snapshot command (you can see the help in the `control.sh` output).
>> >>>>> `EVT_CLUSTER_SNAPSHOT_FINISHED` is raised when snapshot creation finishes.
>> >>>>>
>> >>>>> > 3. Is there a way to speed this up other than increasing snapshot threads?
>> >>>>>
>> >>>>> Stop write operations.
>> >>>>> The less you change, the quicker the snapshot will be created.
>> >>>>>
>> >>>>> On 24 May 2022, at 20:12, Surinder Mehra <redni...@gmail.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>> I have a 3-node Ignite cluster; each node has a 60 GB work directory (EBS) and I need to create snapshots.
>> >>>>> I followed the steps to create snapshots and ran the create snapshot command using the control utility. The command completed in 500 ms, but the snapshot directory only had 400 MB of data. Later I realised the directory size had grown to 30 GB. I suppose it will eventually reach the size of the work directory.
>> >>>>>
>> >>>>>
>> >>>>> I have a few questions.
>> >>>>> 1. Is it a stop-the-world process?
>> >>>>> 2. If so, is it stop-the-world only during command execution (500 ms) or until the snapshot data is fully copied (which takes many minutes)?
>> >>>>> 3. Is there a way to speed this up other than increasing snapshot threads?
>> >>>>>
>> >>>>>
>> >>>>
>> >>
>>
>
