Re: [openstack-dev] [magnetodb] Backup procedure for Cassandra backend

2014-09-02 Thread Dmitriy Ukhlov
Hi Romain!

Thank you for the useful info about your Cassandra backups.

We have not tried to tune Cassandra compaction properties yet.
MagnetoDB is a DynamoDB-like REST API, which means it is a key-value
store itself and should be able to handle different kinds of load,
because the load profile depends on the user application that uses MagnetoDB.

Do you have any recommendations or comments based on information about
the read/write ratio?

On Tue, Sep 2, 2014 at 4:29 PM, Romain Hardouin 
romain.hardo...@cloudwatt.com wrote:

 Hi Mirantis guys,

 I have set up two Cassandra backups:
 The first backup procedure was similar to the one you want to achieve.
 The second backup used SAN features (EMC VNX snapshots) so it was very
 specific to the environment.

 Backing up an entire cluster (and therefore all replicas) is challenging when
 dealing with big data, and not really needed. If your replicas are spread
 across several data centers then you could back up just one data center. In
 that case you back up only one replica.
 Depending on your needs you may want to back up twice (I mean back up the
 backup, using a tape library for example) and then store it in an external
 location for disaster recovery, compliance requirements, norms, etc.

 The snapshot command issues a flush before actually taking the
 snapshot, so a separate flush command is not necessary.

 https://github.com/apache/cassandra/blob/c7ebc01bbc6aa602b91e105b935d6779245c87d1/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2213
 (snapshotWithoutFlush() is used by the scrub command)

 Just out of curiosity, have you tried the leveled compaction strategy? It
 seems that you use STCS.
 Does your use case imply many updates? What is your read/write ratio?

 Best,

 Romain

 --
 *From: *Denis Makogon dmako...@mirantis.com
 *To: *OpenStack Development Mailing List (not for usage questions) 
 openstack-dev@lists.openstack.org
 *Sent: *Friday, August 29, 2014 4:33:59 PM
 *Subject: *Re: [openstack-dev] [magnetodb] Backup procedure for
 Cassandrabackend





 On Fri, Aug 29, 2014 at 4:29 PM, Dmitriy Ukhlov dukh...@mirantis.com
 wrote:

 Hello Denis,
 Thank you for sharing this very useful knowledge.

 But I have one more question. As far as I understand, if we have a
 replication factor of 3 it means that our backup may contain three copies of
 the same data. It may also contain a set of uncompacted SSTables. Do we
 have any way to compact the collected backup data before moving it to
 backup storage?


 Thanks for the fast response, Dmitriy.

 With replication factor 3 - yes, this looks like a property that allows us to
 back up only one node instead of all 3 of them. Otherwise we would need to
 iterate over each node, as you know.
 Correct, it is possible to have uncompacted SSTables. To accomplish
 compaction we might need to use the compaction mechanism provided by
 nodetool (see
 http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCompact.html);
 we just need to take into account that an SSTable may already have been
 compacted, in which case forcing compaction wouldn't give valuable benefits.


 Best regards,
 Denis Makogon



 On Fri, Aug 29, 2014 at 2:01 PM, Denis Makogon dmako...@mirantis.com
 wrote:

 Hello, stackers. I'd like to start a thread related to the backup procedure
 for MagnetoDB, to be precise, for the Cassandra backend.

 In order to implement a backup procedure for Cassandra we need to
 understand how Cassandra backups work.

 To perform a backup, we need to:

    1. SSH into each node
    2. Call ‘nodetool snapshot’ with appropriate parameters
    3. Collect the backup
    4. Send the backup to remote storage
    5. Remove the initial snapshot

 Let's take a look at how ‘nodetool snapshot’ works. Cassandra backs
 up data by taking a snapshot of all on-disk data files (SSTable files)
 stored in the data directory. Each time an SSTable gets flushed and
 snapshotted, the snapshot becomes a hard link to the initial SSTable,
 pinned to a specific timestamp.

 Snapshots are taken per keyspace or per CF, and while the system is
 online. However, nodes must be taken offline in order to restore a snapshot.

 Using a parallel ssh tool (such as pssh), you can flush and then
 snapshot an entire cluster. This provides an eventually consistent
 backup. Although no one node is guaranteed to be consistent with its
 replica nodes at the time a snapshot is taken, a restored snapshot can
 resume consistency using Cassandra's built-in consistency mechanisms.

 After a system-wide snapshot has been taken, you can enable incremental
 backups on each node (disabled by default) to back up data that has changed
 since the last snapshot was taken. Each time an SSTable is flushed, a hard
 link is copied into a /backups subdirectory of the data directory.

 Now let's see how we can deal with a snapshot once it's taken. Below is a
 list of commands that need to be executed to prepare a snapshot:

 Flushing SSTables for consistency

Re: [openstack-dev] [magnetodb] Backup procedure for Cassandra backend

2014-09-02 Thread Romain Hardouin
On Tue, 2014-09-02 at 17:34 +0300, Dmitriy Ukhlov wrote:
 Hi Romain!
 
 
 Thank you for the useful info about your Cassandra backups.

It's always a pleasure to talk about Cassandra :)

 
 We have not tried to tune Cassandra compaction properties yet.
 
 MagnetoDB is a DynamoDB-like REST API, which means it is a key-value
 store itself and should be able to handle different kinds of
 load, because the load profile depends on the user application that uses MagnetoDB.

The compaction strategy choice really matters when setting up a cluster.
In a use case such as MagnetoDB's, we can assume that the database
will be updated frequently. Thus LCS is more suitable than STCS.


 Do you have any recommendations or comments based on information about
 the read/write ratio?

Yes, if the read/write ratio is >= 2 then LCS is a must-have.
Just be aware that LCS is more I/O intensive during compaction than STCS,
but it's for a good cause.

You'll find information here:
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
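
For reference, switching an existing table to LCS is just a schema change;
a minimal sketch run against cqlsh (the keyspace/table name magnetodb.user_data
is only a placeholder, not something MagnetoDB defines):

# run the ALTER on one node; existing SSTables are re-levelled in the background
echo "ALTER TABLE magnetodb.user_data WITH compaction = {'class': 'LeveledCompactionStrategy'};" | cqlsh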

Best,

Romain






[openstack-dev] [magnetodb] Backup procedure for Cassandra backend

2014-08-29 Thread Denis Makogon
Hello, stackers. I'd like to start a thread related to the backup procedure
for MagnetoDB, to be precise, for the Cassandra backend.

In order to implement a backup procedure for Cassandra we need to
understand how Cassandra backups work.

To perform a backup, we need to:

   1. SSH into each node
   2. Call ‘nodetool snapshot’ with appropriate parameters
   3. Collect the backup
   4. Send the backup to remote storage
   5. Remove the initial snapshot


Let's take a look at how ‘nodetool snapshot’ works. Cassandra backs up
data by taking a snapshot of all on-disk data files (SSTable files) stored
in the data directory. Each time an SSTable gets flushed and snapshotted,
the snapshot becomes a hard link to the initial SSTable, pinned to a
specific timestamp.

Snapshots are taken per keyspace or per-CF and while the system is online.
However, nodes must be taken offline in order to restore a snapshot.
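
For completeness, restoring from such a snapshot is roughly the reverse,
done per node while it is down; a hedged sketch (keyspace/CF names, the
snapshot tag and the service name are placeholders, paths assume the
default layout):

sudo service cassandra stop
# drop the current SSTables for the CF being restored, and the commit log
sudo rm /var/lib/cassandra/data/my_keyspace/my_cf/*.db
sudo rm /var/lib/cassandra/commitlog/*
# copy the snapshotted SSTables back into the CF directory
sudo cp /var/lib/cassandra/data/my_keyspace/my_cf/snapshots/my_backup/* \
    /var/lib/cassandra/data/my_keyspace/my_cf/
sudo service cassandra start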

Using a parallel ssh tool (such as pssh), you can flush and then snapshot
an entire cluster. This provides an eventually consistent backup. Although
no one node is guaranteed to be consistent with its replica nodes at the
time a snapshot is taken, a restored snapshot can resume consistency using
Cassandra's built-in consistency mechanisms.
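
For illustration, steps 1-2 above could be driven from a single host with
pssh; a hedged example (the hosts file name and snapshot tag are just
placeholders):

pssh -h cassandra_nodes.txt -i 'nodetool flush && nodetool snapshot -t nightly_backup 1>/dev/null'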

After a system-wide snapshot has been taken, you can enable incremental
backups on each node (disabled by default) to back up data that has changed
since the last snapshot was taken. Each time an SSTable is flushed, a hard
link is copied into a /backups subdirectory of the data directory.
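
The incremental backup switch itself lives in cassandra.yaml and is read
when the node starts; the relevant line (file path differs per distribution):

# commonly /etc/cassandra/cassandra.yaml
incremental_backups: true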

Now let's see how we can deal with a snapshot once it's taken. Below is a
list of commands that need to be executed to prepare a snapshot:

Flushing SSTables for consistency:

'nodetool flush'

Creating snapshots (for example, of all keyspaces):

nodetool snapshot -t %(backup_name)s 1>/dev/null

where:

   - backup_name - the name of the snapshot


Once it's done, we would need to collect all hard links into a common
directory (keeping the initial file hierarchy):

sudo tar cpzfP /tmp/all_ks.tar.gz \
$(sudo find %(datadir)s -type d -name %(backup_name)s)

where:

   - backup_name - the name of the snapshot
   - datadir - the storage location (/var/lib/cassandra/data by default)
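
To finish steps 4 and 5 from the list above, one hedged option is to push the
archive to object storage and then drop the local snapshots (the container
name is a placeholder, and Swift is only one possible remote storage):

# upload the archive (assumes python-swiftclient and credentials in the environment)
swift upload magnetodb-backups /tmp/all_ks.tar.gz

# remove local snapshots; without arguments clearsnapshot drops all snapshots on the node
nodetool clearsnapshot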


Note that this operation can be extended (see the examples after this list):

   - if Cassandra was launched with more than one data directory (see
     cassandra.yaml:
     http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html)
   - if we want to back up only:
      - certain keyspaces at the same time
      - one keyspace
      - a list of CFs for a given keyspace

Useful links

http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html

Best regards,
Denis Makogon


Re: [openstack-dev] [magnetodb] Backup procedure for Cassandra backend

2014-08-29 Thread Dmitriy Ukhlov
Hello Denis,
Thank you for sharing this very useful knowledge.

But I have one more question. As far as I understand, if we have a replication
factor of 3 it means that our backup may contain three copies of the same
data. It may also contain a set of uncompacted SSTables. Do we have any
way to compact the collected backup data before moving it to backup storage?


On Fri, Aug 29, 2014 at 2:01 PM, Denis Makogon dmako...@mirantis.com
wrote:

 Hello, stackers. I'd like to start a thread related to the backup procedure
 for MagnetoDB, to be precise, for the Cassandra backend.

 In order to implement a backup procedure for Cassandra we need to
 understand how Cassandra backups work.

 To perform a backup, we need to:

    1. SSH into each node
    2. Call ‘nodetool snapshot’ with appropriate parameters
    3. Collect the backup
    4. Send the backup to remote storage
    5. Remove the initial snapshot


  Let's take a look at how ‘nodetool snapshot’ works. Cassandra backs up
 data by taking a snapshot of all on-disk data files (SSTable files) stored
 in the data directory. Each time an SSTable gets flushed and snapshotted,
 the snapshot becomes a hard link to the initial SSTable, pinned to a
 specific timestamp.

 Snapshots are taken per keyspace or per-CF and while the system is online.
 However, nodes must be taken offline in order to restore a snapshot.

 Using a parallel ssh tool (such as pssh), you can flush and then snapshot
 an entire cluster. This provides an eventually consistent backup.
 Although no one node is guaranteed to be consistent with its replica nodes
 at the time a snapshot is taken, a restored snapshot can resume consistency
 using Cassandra's built-in consistency mechanisms.

 After a system-wide snapshot has been taken, you can enable incremental
 backups on each node (disabled by default) to backup data that has changed
 since the last snapshot was taken. Each time an SSTable is flushed, a hard
 link is copied into a /backups subdirectory of the data directory.

 Now let's see how we can deal with a snapshot once it's taken. Below is a
 list of commands that need to be executed to prepare a snapshot:

 Flushing SSTables for consistency:

 'nodetool flush'

 Creating snapshots (for example, of all keyspaces):

 nodetool snapshot -t %(backup_name)s 1>/dev/null

 where:

    - backup_name - the name of the snapshot


 Once it's done, we would need to collect all hard links into a common
 directory (keeping the initial file hierarchy):

 sudo tar cpzfP /tmp/all_ks.tar.gz \
 $(sudo find %(datadir)s -type d -name %(backup_name)s)

 where:

    - backup_name - the name of the snapshot
    - datadir - the storage location (/var/lib/cassandra/data by default)


  Note that this operation can be extended:

    - if Cassandra was launched with more than one data directory (see
      cassandra.yaml:
      http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html)
    - if we want to back up only:
       - certain keyspaces at the same time
       - one keyspace
       - a list of CFs for a given keyspace


 Useful links


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html

 Best regards,
 Denis Makogon





-- 
Best regards,
Dmitriy Ukhlov
Mirantis Inc.


Re: [openstack-dev] [magnetodb] Backup procedure for Cassandra backend

2014-08-29 Thread Denis Makogon
On Fri, Aug 29, 2014 at 4:29 PM, Dmitriy Ukhlov dukh...@mirantis.com
wrote:

 Hello Denis,
 Thank you for sharing this very useful knowledge.

 But I have one more question. As far as I understand, if we have a
 replication factor of 3 it means that our backup may contain three copies of
 the same data. It may also contain a set of uncompacted SSTables. Do we
 have any way to compact the collected backup data before moving it to
 backup storage?


Thanks for the fast response, Dmitriy.

With replication factor 3 - yes, this looks like a property that allows us to
back up only one node instead of all 3 of them. Otherwise we would need to
iterate over each node, as you know.
Correct, it is possible to have uncompacted SSTables. To accomplish
compaction we might need to use the compaction mechanism provided by
nodetool (see
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCompact.html);
we just need to take into account that an SSTable may already have been
compacted, in which case forcing compaction wouldn't give valuable benefits.
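
If such a compaction turned out to be worthwhile, a hedged example of forcing
it on the source node before taking the snapshot (keyspace and CF names are
placeholders):

nodetool compact my_keyspace my_cf

With STCS, this major compaction merges the CF's SSTables into a single large
one, which is also why it may not bring much benefit if compaction has already
happened.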


Best regards,
Denis Makogon



 On Fri, Aug 29, 2014 at 2:01 PM, Denis Makogon dmako...@mirantis.com
 wrote:

 Hello, stackers. I'd like to start a thread related to the backup procedure
 for MagnetoDB, to be precise, for the Cassandra backend.

 In order to implement a backup procedure for Cassandra we need to
 understand how Cassandra backups work.

 To perform a backup, we need to:

    1. SSH into each node
    2. Call ‘nodetool snapshot’ with appropriate parameters
    3. Collect the backup
    4. Send the backup to remote storage
    5. Remove the initial snapshot


  Let's take a look at how ‘nodetool snapshot’ works. Cassandra backs up
 data by taking a snapshot of all on-disk data files (SSTable files) stored
 in the data directory. Each time an SSTable gets flushed and snapshotted,
 the snapshot becomes a hard link to the initial SSTable, pinned to a
 specific timestamp.

 Snapshots are taken per keyspace or per-CF and while the system is
 online. However, nodes must be taken offline in order to restore a snapshot.

 Using a parallel ssh tool (such as pssh), you can flush and then snapshot
 an entire cluster. This provides an eventually consistent backup.
 Although no one node is guaranteed to be consistent with its replica nodes
 at the time a snapshot is taken, a restored snapshot can resume consistency
 using Cassandra's built-in consistency mechanisms.

 After a system-wide snapshot has been taken, you can enable incremental
 backups on each node (disabled by default) to backup data that has changed
 since the last snapshot was taken. Each time an SSTable is flushed, a hard
 link is copied into a /backups subdirectory of the data directory.

 Now let's see how we can deal with a snapshot once it's taken. Below is a
 list of commands that need to be executed to prepare a snapshot:

 Flushing SSTables for consistency:

 'nodetool flush'

 Creating snapshots (for example, of all keyspaces):

 nodetool snapshot -t %(backup_name)s 1>/dev/null

 where:

    - backup_name - the name of the snapshot


 Once it's done, we would need to collect all hard links into a common
 directory (keeping the initial file hierarchy):

 sudo tar cpzfP /tmp/all_ks.tar.gz \
 $(sudo find %(datadir)s -type d -name %(backup_name)s)

 where:

    - backup_name - the name of the snapshot
    - datadir - the storage location (/var/lib/cassandra/data by default)


  Note that this operation can be extended:

    - if Cassandra was launched with more than one data directory (see
      cassandra.yaml:
      http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html)
    - if we want to back up only:
       - certain keyspaces at the same time
       - one keyspace
       - a list of CFs for a given keyspace


 Useful links


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html

 Best regards,
 Denis Makogon





 --
 Best regards,
 Dmitriy Ukhlov
 Mirantis Inc.


