Re: [DISCUSSION] Hot cache backup

2020-04-08 Thread Andrey Dolmatov
I would like to understand your solution more deeply. I hope my questions
are of interest to others as well:

   - What about data consistency between primary and backup nodes? I found
   that [1] Cassandra uses eventually consistent backups, so some backup data
   could be missing from a snapshot. If I apply a snapshot, would Ignite
   detect this and rebalance data to the backup nodes?
   - I can't quite picture how persistence rebalancing works, but according
   to [2] it uses WAL logs. A snapshot doesn't contain WAL data, correct? Did
   you analyze alternative snapshot solutions based on the WAL?

[1]
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsAboutSnapshots.html
[2]
https://cwiki.apache.org/confluence/display/IGNITE/Persistent+Store+Architecture#PersistentStoreArchitecture-Rebalancing

On Wed, 8 Apr 2020 at 18:22, Maxim Muzafarov wrote:

> Andrey,
>
>
> Thanks for your questions, I've also clarified some details on the
> IEP-43 [1] page according to them.
>
> > Does a snapshot contain only primary data, only backup partitions, or both?
>
> A snapshot contains a full copy of the persistence data on each local
> node. This means that all primary and backup partitions and the SQL index
> files available on the local node are copied to the snapshot.
>
> > Could I create a snapshot from an m-node cluster and apply it to an
> > n-node cluster (n<>m)?
>
> Currently, the restore procedure is fully manual, but in general it is
> possible to restore on a different topology. There are a few options
> here:
> - m == n, the easiest and fastest way
> - m < n, the cluster will start and rebalancing will happen (see
> testClusterSnapshotWithRebalancing in the PR). If some SQL indexes exist,
> it may take quite a long time to complete.
> - m > n, the hardest case. For instance, if backups > 1 you can start
> a cluster and remove nodes one by one from the baseline. I think this case
> should be covered by additional recovery scripts, which will be
> developed later.
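
The three topology cases listed above can be written down as a small decision helper. This is only an illustrative sketch: `RestoreTopology`, `Scenario`, and `classify` are hypothetical names that are not part of Ignite, and the helper only labels the case; it does not perform any restore.

```java
/** Labels the restore scenario for a snapshot taken on m nodes and applied to n nodes. */
final class RestoreTopology {
    /** Hypothetical names for the three cases described in the message. */
    enum Scenario {
        SAME_SIZE,            // m == n: easiest and fastest
        GROW_WITH_REBALANCE,  // m < n: cluster starts, then rebalancing happens
        SHRINK_VIA_BASELINE   // m > n: hardest, nodes are removed from the baseline
    }

    static Scenario classify(int snapshotNodes, int targetNodes) {
        if (snapshotNodes <= 0 || targetNodes <= 0)
            throw new IllegalArgumentException("node counts must be positive");

        if (snapshotNodes == targetNodes)
            return Scenario.SAME_SIZE;

        return snapshotNodes < targetNodes
            ? Scenario.GROW_WITH_REBALANCE
            : Scenario.SHRINK_VIA_BASELINE;
    }
}
```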
>
> > - Should a data node have extra space on the persistent store to create
> > a snapshot? Or, from another point of view, would the size of the
> > temporary file be equal to the size of all data on the cluster node?
>
> If a cluster has no load, you will need only enough free space to store
> the snapshot, which is almost equal to the size of the node's `db`
> directory.
>
> If a cluster is under load, it needs some extra space to store
> intermediate snapshot results. The amount of such space depends on how
> fast cache partition files are copied to the snapshot directory (for
> example, if disks are slow). The maximum size of the temporary file for
> each partition is equal to the size of the corresponding partition file.
> So, in the worst case you need x3 extra disk space. But according to my
> measurements, assuming SSDs are used and each partition is 300 MB, a
> cluster under high load will require no more than 1-3% extra.
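
The sizing rules in the answer above can be turned into a rough per-node calculation. The sketch below models only what the message states: the snapshot copy is about the size of the `db` directory, each partition's temporary file can grow up to the size of that partition, and the measured overhead under load was 1-3%. The class and method names are hypothetical, and the numbers are estimates rather than guarantees.

```java
import java.util.List;

/** Rough disk-space bounds for taking a snapshot on one node (illustrative only). */
final class SnapshotSpaceEstimate {
    /** Space for the snapshot copy itself: roughly the size of the node's db directory. */
    static long snapshotBytes(long dbDirBytes) {
        return dbDirBytes;
    }

    /** Upper bound for temporary files: each temp file is at most as large as its partition. */
    static long maxTempBytes(List<Long> partitionBytes) {
        long total = 0;
        for (long size : partitionBytes)
            total += size;
        return total;
    }

    /** Temporary space for an observed overhead fraction, e.g. 0.03 for 3%. */
    static long observedTempBytes(long dbDirBytes, double overheadFraction) {
        return Math.round(dbDirBytes * overheadFraction);
    }
}
```

For example, three 300 MB partitions give a temporary-file upper bound of 900 MB, while a 3% observed overhead on the same data is about 27 MB.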
>
> > - What does the resulting snapshot look like: a single file or a
> > collection of files (one for every data node)?
>
> Check the example of the snapshot directory structure on the IEP-43
> page [1]; this is what a completed snapshot will look like.
>
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Restoresnapshot(manually)

Re: [DISCUSSION] Hot cache backup

2020-04-08 Thread Andrey Dolmatov
Hi, Maxim!
This is a very useful feature, great job!

But could you explain some aspects to me?

   - Does a snapshot contain only primary data, only backup partitions, or
   both?
   - Could I create a snapshot from an m-node cluster and apply it to an
   n-node cluster (n<>m)?
   - Should a data node have extra space on the persistent store to create a
   snapshot? Or, from another point of view, would the size of the temporary
   file be equal to the size of all data on the cluster node?
   - What does the resulting snapshot look like: a single file or a
   collection of files (one for every data node)?

I apologize for my questions, but I am really interested in this feature.


On Tue, 7 Apr 2020 at 22:10, Maxim Muzafarov wrote:

> Igniters,
>
>
> I'd like to get back to the discussion of a snapshot operation for Apache
> Ignite persistent cache groups, and I propose my changes below. I
> have prepared everything so that the discussion is as meaningful and
> specific as possible:
>
> - IEP-43: Cluster snapshot [1]
> - The Jira task IGNITE-11073 [2]
> - PR with described changes, Patch Available [4]
>
> Changes are ready for review.
>
>
> Here are a few implementation details and my thoughts:
>
> 1. Snapshot restore is assumed to be manual at the first step. The
> process will be described on our documentation pages, but it is
> possible to start a node right from the snapshot directory since the
> directory structure is preserved (check
> `testConsistentClusterSnapshotUnderLoad` in the PR). We also have some
> options here for what the restore process should look like:
> - fully manual snapshot restore (will be documented)
> - Ansible or shell scripts for restore
> - Java API for restore (I doubt we should go this way).
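
Because the snapshot keeps the node's directory structure, a manual restore largely comes down to copying the snapshot tree back into a node's work directory. The sketch below shows such a copy with plain `java.nio`. It assumes the node is stopped and that overwriting files in the target directory is acceptable. `ManualSnapshotRestore` and `copyTree` are hypothetical names, not part of Ignite, and the actual directory layout is the one shown on the IEP-43 page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Copies a snapshot directory tree into a work directory (illustrative only). */
final class ManualSnapshotRestore {
    /** Recursively copies snapshotDir into workDir, keeping relative paths unchanged. */
    static void copyTree(Path snapshotDir, Path workDir) throws IOException {
        // Files.walk visits a directory before its contents, so parents exist first.
        try (Stream<Path> paths = Files.walk(snapshotDir)) {
            paths.forEach(src -> {
                Path dst = workDir.resolve(snapshotDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src))
                        Files.createDirectories(dst);
                    else {
                        Files.createDirectories(dst.getParent());
                        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
                catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

A shell or Ansible script, the second option in the list above, would perform the same tree copy on every node before the cluster is started.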
>
> 2. The snapshot `create` procedure creates a snapshot of all
> persistent caches available on the cluster (see limitations [1]).
>
> 3. The snapshot `create` procedure is available through the Java API and
> JMX (control.sh support may be implemented later).
>
> Java API:
> IgniteFuture<Void> fut = ignite.snapshot()
> .createSnapshot(name);
>
> JMX:
> SnapshotMXBean mxBean = getMBean(ignite.name());
> mxBean.createSnapshot(name);
>
> 4. The DistributedProcess [3] is used to perform the cluster-wide
> snapshot procedure, so we've avoided a lot of boilerplate code here.
>
> 5. The design document [1] also contains an internal API for creating
> a consistent local snapshot of the requested cache groups and
> transferring it to another node using the FileTransmission protocol [6].
> This is one of the parts of IEP-28 [5] for cluster rebalancing via
> partition files and an important part for understanding the whole design.
>
> Java API:
> public IgniteInternalFuture createRemoteSnapshot(
> UUID rmtNodeId,
> Map<Integer, Set<Integer>> parts,
> BiConsumer partConsumer);
>
>
> Please share your thoughts and take a look at my changes [4].
>
>
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots
> [2] https://issues.apache.org/jira/browse/IGNITE-11073
> [3]
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/util/distributed/DistributedProcess.java#L49
> [4] https://github.com/apache/ignite/pull/7607
> [5]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-28%3A+Cluster+peer-2-peer+balancing#IEP-28:Clusterpeer-2-peerbalancing-Filetransferbetweennodes
> [6]
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/managers/communication/TransmissionHandler.java#L42
>
>
> On Thu, 28 Feb 2019 at 14:43, Dmitriy Pavlov  wrote:
> >
> > Hi Maxim,
> >
> > I agree with Denis and I have just one concern here.
> >
> > Apache Ignite has quite a long history (started even before Apache), and
> > now it has far too many features. Some of these features
> > - are developed and well known by community members,
> > - some of them were contributed a long time ago and nobody develops them,
> > - and, actually, in some rare cases, nobody in the community knows how
> > they work and how to change them.
> >
> > Such features may attract users, but a bug in one of them may ruin the
> > impression of the product. Even worse, nobody can help to solve it, and
> > only the user himself or herself may be encouraged to contribute a fix.
> >
> > And my concern here is that such a big feature should have a number of
> > interested contributors who can support it in case others lose
> > interest. I will be happy if 3-5 members come and say, yes, I will
> > do a review/I will help with further changes.
> >
> > Just to be clear, I'm not against it, and I'll never cast -1 for it, but
> > it would be more comfortable to develop this feature with the
> > understanding that this work will not be useless.
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > On Wed, 27 Feb 2019 at 23:36, Denis Magda wrote:
> >
> > > Maxim,
> > >
> > > GridGain has this exact feature available for Ignite native persistence
> > > deployments. It's not as easy as it might have been seen from the
> > > enablement perspective. Took us many 

Hello

2020-04-07 Thread Andrey Dolmatov
Hello everyone! I have been using the Apache Ignite data/compute grid
professionally for about 2 years.
I would like to contribute to Ignite. My fields of interest are
persistence and cache.

My Jira account is:
Username: itdolmatov