Hi, Maxim!
This is a very useful feature, great job!
Could you clarify a few aspects for me, though?
- Does a snapshot contain only primary partitions, only backup partitions,
or both?
- Can I create a snapshot on an m-node cluster and apply it to an n-node
cluster (n != m)?
- Does a data node need extra space in its persistent store to create a
snapshot? Or, put another way, would the size of the temporary file be
equal to the size of all data on the cluster node?
- What is the resulting snapshot: a single file or a collection of files
(one for every data node)?
I apologize for all the questions, but I am really interested in this feature.
On Tue, 7 Apr 2020 at 22:10, Maxim Muzafarov wrote:
> Igniters,
>
>
> I'd like to get back to the discussion of a snapshot operation for Apache
> Ignite for persistent cache groups, and I propose my changes below. I
> have prepared everything so that the discussion is as meaningful and
> specific as possible:
>
> - IEP-43: Cluster snapshot [1]
> - The Jira task IGNITE-11073 [2]
> - PR with described changes, Patch Available [4]
>
> Changes are ready for review.
>
>
> Here are a few implementation details and my thoughts:
>
> 1. Snapshot restore is assumed to be manual as a first step. The
> process will be described on our documentation pages, but it is
> possible to start a node right from the snapshot directory since the
> directory structure is preserved (check
> `testConsistentClusterSnapshotUnderLoad` in the PR). We also have some
> options here for what the restore process should look like:
> - fully manual snapshot restore (will be documented)
> - ansible or shell scripts for restore
> - Java API for restore (I doubt we should go this way).
>
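To illustrate the "fully manual" option: since the snapshot is said to preserve the node's directory structure, a manual restore could amount to copying the snapshot tree into the storage directory of a stopped node. The sketch below only shows that copy step with the standard Java file API. The class name, method name, and directory layout are hypothetical and are not taken from Ignite or from the PR; the documented procedure should be followed for a real restore.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of Ignite or the PR. */
final class SnapshotRestoreSketch {
    /**
     * Recursively copies a snapshot directory into a target directory,
     * keeping the relative layout. Assumes the node that owns the target
     * directory is stopped while the copy runs.
     */
    static void copyTree(Path snapshotDir, Path targetDir) throws IOException {
        // Files.walk visits a directory before its entries, so parent
        // directories always exist by the time a file is copied.
        try (Stream<Path> paths = Files.walk(snapshotDir)) {
            paths.forEach(src -> {
                Path dst = targetDir.resolve(snapshotDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src))
                        Files.createDirectories(dst);
                    else
                        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    // Lambdas cannot throw checked exceptions, so wrap it.
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

A shell or ansible script for the second option would do the same thing with `cp -r` or `rsync` on every node.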
> 2. The snapshot `create` procedure creates a snapshot of all
> persistent caches available on the cluster (see limitations [1]).
>
> 3. The snapshot `create` procedure is available through the Java API and
> JMX (control.sh support may be implemented later).
>
> Java API:
>     IgniteFuture<Void> fut = ignite.snapshot()
>         .createSnapshot(name);
>
> JMX:
>     SnapshotMXBean mxBean = getMBean(ignite.name());
>     mxBean.createSnapshot(name);
>
> 4. The Distributed Process [3] is used to perform a cluster-wide
> snapshot procedure, so we've avoided a lot of boilerplate code here.
>
> 5. The design document [1] also contains an internal API for creating
> a consistent local snapshot of the requested cache groups and transferring
> it to another node using the FileTransmission protocol [6]. This is one
> of the parts of IEP-28 [5] for cluster rebalancing via partition files
> and an important part for understanding the whole design.
>
> Java API:
>     public IgniteInternalFuture<Void> createRemoteSnapshot(
>         UUID rmtNodeId,
>         Map<Integer, Set<Integer>> parts,
>         BiConsumer<File, GroupPartitionId> partConsumer);
>
>
> Please share your thoughts and take a look at my changes [4].
>
>
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots
> [2] https://issues.apache.org/jira/browse/IGNITE-11073
> [3]
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/util/distributed/DistributedProcess.java#L49
> [4] https://github.com/apache/ignite/pull/7607
> [5]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-28%3A+Cluster+peer-2-peer+balancing#IEP-28:Clusterpeer-2-peerbalancing-Filetransferbetweennodes
> [6]
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/managers/communication/TransmissionHandler.java#L42
>
>
> On Thu, 28 Feb 2019 at 14:43, Dmitriy Pavlov wrote:
> >
> > Hi Maxim,
> >
> > I agree with Denis and I have just one concern here.
> >
> > Apache Ignite has quite a long history (it started even before Apache),
> > and it now has far too many features. Some of these features
> > - are developed and well known by community members,
> > - some were contributed a long time ago and nobody develops them,
> > - and, in some rare cases, nobody in the community knows how they
> > work or how to change them.
> >
> > Such features may attract users, but a bug in one of them may ruin the
> > impression of the product. Even worse, nobody can help to fix it, and
> > only the users themselves may be encouraged to contribute a fix.
> >
> > My concern here is that such a big feature should have a number of
> > interested contributors who can support it if others lose interest. I
> > will be happy if 3-5 members come and say, "Yes, I will do a review / I
> > will help with further changes."
> >
> > Just to be clear, I'm not against it, and I'll never cast a -1 for it,
> > but it would be more comfortable to develop this feature with the
> > understanding that this work will not be useless.
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > On Wed, 27 Feb 2019 at 23:36, Denis Magda wrote:
> >
> > > Maxim,
> > >
> > > GridGain has this exact feature available for Ignite native persistence
> > > deployments. It's not as easy as it might have been seen from the
> > > enablement perspective. Took us many