Hello, stackers. I'd like to start a thread about the backup procedure
for MagnetoDB or, to be precise, for its Cassandra backend.

To implement a backup procedure for Cassandra, we first need to
understand how Cassandra backups work.

To perform a backup, we need to:

   1. SSH into each node.

   2. Call ‘nodetool snapshot’ with the appropriate parameters.

   3. Collect the backup.

   4. Send the backup to remote storage.

   5. Remove the initial snapshot.
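
To make the sequence above concrete, here is a minimal sketch of how the per-node commands could be assembled and run. It is only an illustration, not MagnetoDB code: the function names, the default archive path, the local destination directory and the use of plain ssh/scp are all assumptions, and the flush step is taken from the command list later in this mail.

```python
import shlex
import subprocess


def backup_steps(host, backup_name, datadir="/var/lib/cassandra/data",
                 archive="/tmp/all_ks.tar.gz", dest_dir="."):
    """Return the commands for one node, in the order of the steps above.

    Every default value here is a hypothetical placeholder.
    """
    name = shlex.quote(backup_name)
    find = "sudo find %s -type d -name %s" % (shlex.quote(datadir), name)
    ssh = ["ssh", host]
    return [
        ssh + ["nodetool flush"],
        ssh + ["nodetool snapshot -t %s" % name],
        # Collect the snapshot directories into one archive on the node.
        ssh + ["sudo tar cpzfP %s $(%s)" % (shlex.quote(archive), find)],
        # Fetch the archive; a real setup would push it to remote storage.
        ["scp", "%s:%s" % (host, archive), dest_dir],
        # Remove the snapshot on the node once the archive is safe.
        ssh + ["nodetool clearsnapshot -t %s" % name],
    ]


def run_backup(hosts, backup_name, runner=subprocess.check_call):
    """Run every step on every host, one host after another."""
    for host in hosts:
        for command in backup_steps(host, backup_name):
            runner(command)
```

Passing a different runner (for example print) gives a dry run, which is handy for checking the commands before touching a real cluster.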

Let's take a look at how ‘nodetool snapshot’ works. Cassandra backs up
data by taking a snapshot of all on-disk data files (SSTable files) stored
in the data directory. For each SSTable that has been flushed to disk, the
snapshot creates a hard link to the original file, so the snapshot pins the
state of the data files as of the moment it was taken.
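
The hard-link behaviour is what makes snapshots cheap, and it is easy to see with ordinary files. The sketch below does not use Cassandra at all; the file name and directory layout are made up for illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def snapshot_file(sstable_path, snapshot_dir):
    """Hard-link one data file into a snapshot directory (no data is copied)."""
    os.makedirs(snapshot_dir, exist_ok=True)
    link_path = os.path.join(snapshot_dir, os.path.basename(sstable_path))
    os.link(sstable_path, link_path)
    return link_path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        sstable = os.path.join(tmp, "ks-cf-1-Data.db")  # hypothetical name
        with open(sstable, "wb") as f:
            f.write(b"rows")
        link = snapshot_file(sstable, os.path.join(tmp, "snapshots", "snap1"))
        # Both names point at the same inode, so no extra space is used.
        same_inode = os.stat(sstable).st_ino == os.stat(link).st_ino
        # Removing the original name does not remove the data.
        os.remove(sstable)
        with open(link, "rb") as f:
            survived = f.read() == b"rows"
        return same_inode, survived
```

This is also why the snapshot has to be removed after it has been shipped: as long as the links exist, the disk space of the old SSTables cannot be reclaimed.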

Snapshots are taken per keyspace or per column family (CF), while the
system is online. However, nodes must be taken offline in order to restore
a snapshot.

Using a parallel ssh tool (such as pssh), you can flush and then snapshot
an entire cluster. This provides an eventually consistent backup. Although
no one node is guaranteed to be consistent with its replica nodes at the
time a snapshot is taken, a restored snapshot can resume consistency using
Cassandra's built-in consistency mechanisms.

After a system-wide snapshot has been taken, you can enable incremental
backups on each node (disabled by default) to back up data that has changed
since the last snapshot was taken. Each time an SSTable is flushed, a hard
link to it is created in a /backups subdirectory of the data directory.
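
For reference, incremental backups are controlled by the incremental_backups setting in cassandra.yaml. A rough sketch of how the resulting files could be located is shown below. The assumption that the backups directories sit under each keyspace/CF directory, and all names in the demo, are mine and should be checked against a real node.

```python
import os
import tempfile


def incremental_backup_files(datadir):
    """Return every file found in a directory named 'backups' under datadir."""
    found = []
    for root, _dirs, files in os.walk(datadir):
        if os.path.basename(root) == "backups":
            found.extend(os.path.join(root, name) for name in files)
    return sorted(found)


def demo():
    with tempfile.TemporaryDirectory() as datadir:
        cf_dir = os.path.join(datadir, "ks1", "cf1")  # hypothetical layout
        os.makedirs(os.path.join(cf_dir, "backups"))
        live = os.path.join(cf_dir, "ks1-cf1-1-Data.db")
        backup = os.path.join(cf_dir, "backups", "ks1-cf1-1-Data.db")
        for path in (live, backup):
            with open(path, "wb") as f:
                f.write(b"rows")
        # Only the file inside backups/ is reported, not the live SSTable.
        return [os.path.relpath(p, datadir)
                for p in incremental_backup_files(datadir)]
```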

Now let's see how to deal with a snapshot once it's taken. Below is the
list of commands that need to be executed to prepare a snapshot:

    Flush SSTables for consistency:

    nodetool flush

    Create snapshots (for example, of all keyspaces):

    nodetool snapshot -t %(backup_name)s 1>/dev/null

   where backup_name is the name of the snapshot.
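
The %(backup_name)s placeholder is a Python %-style template, so the commands can be rendered with a mapping. A small sketch follows; the render helper and the rule for acceptable snapshot names are my assumptions, added so that a name cannot inject shell syntax.

```python
import re

FLUSH_CMD = "nodetool flush"
SNAPSHOT_CMD = "nodetool snapshot -t %(backup_name)s 1>/dev/null"

# Conservative, hypothetical rule: letters, digits, dot, dash, underscore.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def render(template, **params):
    """Fill a %-style command template after checking every value."""
    for key, value in params.items():
        if not SAFE_NAME.fullmatch(value):
            raise ValueError("unsafe value for %s: %r" % (key, value))
    return template % params


print(render(SNAPSHOT_CMD, backup_name="nightly"))
# prints: nodetool snapshot -t nightly 1>/dev/null
```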

Once that's done, we need to collect all the hard links into a single
archive (preserving the original file hierarchy):

sudo tar cpzfP /tmp/all_ks.tar.gz \
    $(sudo find %(datadir)s -type d -name %(backup_name)s)



   where backup_name is the name of the snapshot and

   datadir is the storage location (/var/lib/cassandra/data by default).
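
The same collection step can be sketched in pure Python with the standard tarfile module, which may be easier to test than the shell pipeline. This is an assumption-laden illustration rather than a drop-in replacement: function names and the demo layout are hypothetical, and unlike tar's P flag, tarfile strips the leading "/" from member names.

```python
import os
import tarfile
import tempfile


def find_snapshot_dirs(datadir, backup_name):
    """Equivalent of: find <datadir> -type d -name <backup_name>"""
    matches = []
    for root, dirs, _files in os.walk(datadir):
        matches.extend(os.path.join(root, d) for d in dirs if d == backup_name)
    return sorted(matches)


def archive_snapshot(datadir, backup_name, archive_path):
    """Pack every matching snapshot directory into one gzip'ed tarball."""
    snapshot_dirs = find_snapshot_dirs(datadir, backup_name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for directory in snapshot_dirs:
            tar.add(directory)  # recursive; keeps the directory hierarchy
    return snapshot_dirs


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        datadir = os.path.join(tmp, "data")
        snap = os.path.join(datadir, "ks1", "cf1", "snapshots", "snap1")
        os.makedirs(snap)
        with open(os.path.join(snap, "ks1-cf1-1-Data.db"), "wb") as f:
            f.write(b"rows")
        archive = os.path.join(tmp, "all_ks.tar.gz")
        archive_snapshot(datadir, "snap1", archive)
        with tarfile.open(archive) as tar:
            return tar.getnames()
```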

Note that this operation can be extended:


   if Cassandra was launched with more than one data directory (see

   if we want to back up only:

      certain keyspaces at the same time

      one keyspace

      a list of CFs for a given keyspace
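
A possible shape for these extensions is sketched below. It assumes that ‘nodetool snapshot’ accepts keyspace names as positional arguments and a -cf option for a single column family of a single keyspace, which should be verified against the nodetool version in use. The function names are hypothetical.

```python
def snapshot_commands(backup_name, keyspaces=(), column_families=()):
    """Build nodetool argument lists for the scopes listed above.

    No keyspaces means all keyspaces. Column families need exactly one
    keyspace and produce one command per CF (assumed -cf semantics).
    """
    base = ["nodetool", "snapshot", "-t", backup_name]
    keyspaces = list(keyspaces)
    column_families = list(column_families)
    if not column_families:
        return [base + keyspaces]
    if len(keyspaces) != 1:
        raise ValueError("column families require exactly one keyspace")
    return [base + ["-cf", cf, keyspaces[0]] for cf in column_families]


def find_command(datadirs, backup_name):
    """Search several data directories at once; find accepts many start paths."""
    return ["find"] + list(datadirs) + ["-type", "d", "-name", backup_name]
```

For example, snapshot_commands("snap1", ["ks1"], ["cf1", "cf2"]) yields two commands, one per CF, both using the same snapshot name.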

Best regards,
Denis Makogon
