Re: how to take consistant snapshot?

2012-12-07 Thread Tyler Hobbs
Snapshots trigger a flush first, so data that's currently in the commit log
will be covered by the snapshot.
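
As a rough sketch of what that looks like operationally, the Python helper below builds one `nodetool snapshot` command per node, all sharing the same tag. The host names, keyspace name, and tag are made up for illustration, and the helper only constructs the commands; it assumes `nodetool` is installed wherever they are eventually run.

```python
from typing import List


def snapshot_commands(hosts: List[str], keyspace: str, tag: str) -> List[List[str]]:
    """Build one `nodetool snapshot` command per node, all sharing one tag.

    Nothing is executed here; each command can be handed to subprocess.run()
    on a machine that has nodetool installed.
    """
    return [
        ["nodetool", "-h", host, "snapshot", "-t", tag, keyspace]
        for host in hosts
    ]


if __name__ == "__main__":
    # Hypothetical host names, keyspace, and tag.
    hosts = ["node1", "node2", "node3"]
    for cmd in snapshot_commands(hosts, "my_keyspace", "clone-20121207"):
        print(" ".join(cmd))
```

Each node flushes on its own when its snapshot is taken, so the nodes are still captured at slightly different moments.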



-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: how to take consistant snapshot?

2012-12-07 Thread Andrey Ilinykh
That's right. But with incremental backup on, each CF gets flushed
independently. I have a hot CF which gets flushed every few minutes and a
regular CF which gets flushed every hour or so. They have references to
each other, and the data in the sstables is definitely inconsistent.




Re: how to take consistant snapshot?

2012-12-07 Thread Tyler Hobbs
Right.  I don't personally think incremental backup is useful beyond
restoring individual nodes unless none of your data happens to reference
any other rows.



-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: how to take consistant snapshot?

2012-12-07 Thread Andrey Ilinykh
Agreed.




Re: how to take consistant snapshot?

2012-12-06 Thread aaron morton
For background

http://wiki.apache.org/cassandra/Operations?highlight=%28snapshot%29#Consistent_backups

If you do it for a single node then yes, there is a chance of inconsistency across 
CFs. 

If you have multiple nodes, the snapshots you take on the later nodes will help. 
If you use CL QUORUM for reads you *may* be ok (I cannot work it out quickly). 
If you use CL ALL for reads you will be ok. Or you can use nodetool repair to 
ensure the data is consistent. 

I doubt that even using repair would give you a provable guarantee though. 
Anyone?

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: how to take consistant snapshot?

2012-12-06 Thread Andrey Ilinykh
I'm talking about restoring the whole cluster, so all nodes are restored from
backup and all of them are inconsistent because they lost data from the commit
logs. It doesn't matter what CL I use, some data may be lost.
Cassandra 1.1 supports commit log archiving:
http://www.datastax.com/docs/1.1/configuration/commitlog_archiving
I think if I store both the flushed sstables and the commit logs it should solve
my problem. I'm wondering if anyone has experience with this feature?
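
To make the idea concrete, here is a toy Python model of restoring from flushed data plus an archived log. It is only a sketch of the reasoning, not Cassandra's actual replay mechanism: the `restore` function, the integer timestamps, and the CF and row names are all invented for the example.

```python
def restore(sstables, archived_log, up_to):
    """Rebuild state from flushed data plus archived commit log entries.

    sstables: {cf_name: {key: value}} as captured in the backup.
    archived_log: list of (timestamp, cf_name, key, value) in write order.
    up_to: replay only entries whose timestamp is <= up_to.
    """
    # Copy the backup so the replay does not mutate it.
    state = {cf: dict(rows) for cf, rows in sstables.items()}
    for ts, cf, key, value in archived_log:
        if ts <= up_to:
            # Replaying a write that was already flushed just overwrites it.
            state.setdefault(cf, {})[key] = value
    return state


# Backup in which only the index CF had been flushed.
sstables = {"index": {"by-name:alice": "row-1"}, "data": {}}
# Archived log covering both writes (timestamps are arbitrary integers).
log = [
    (1, "data", "row-1", "payload"),
    (2, "index", "by-name:alice", "row-1"),
]

state = restore(sstables, log, up_to=2)
# Index entries whose target row is missing after the restore.
missing = [k for k, v in state["index"].items() if v not in state["data"]]
print(missing)
```

In this model the replay fills in the row that the index points to, because the log holds every write regardless of which CF had been flushed when the backup was taken.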

Thank you,
  Andrey


how to take consistant snapshot?

2012-12-05 Thread Andrey Ilinykh
Hello, everybody!
I have a production cluster with incremental backup on and I want to clone it
(create a test one). There is one thing I don't understand: each column family
gets flushed (and copied to backup storage) independently, which means the
total snapshot is inconsistent. If I restore from such a snapshot I have a
totally useless system. To be more specific, let's say I have two CFs, one of
which serves as an index for the other. Every time I update one CF I update
the index CF. There is a good chance that all replicas flush the index CF
first. Then I move it into backup storage, restore, and get a CF which has
pointers to nonexistent data in the other CF. What is the way to avoid this
situation?
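
The scenario can be reproduced with a small Python toy model. This is an illustration only: `ColumnFamily`, `dangling_pointers`, and the row names are hypothetical and do not correspond to Cassandra internals.

```python
class ColumnFamily:
    """Toy model: writes land in a memtable; flush() moves them to sstables."""

    def __init__(self, name):
        self.name = name
        self.memtable = {}
        self.sstables = {}

    def write(self, key, value):
        self.memtable[key] = value

    def flush(self):
        self.sstables.update(self.memtable)
        self.memtable.clear()


def dangling_pointers(index_backup, data_backup):
    """Return index entries whose target row is missing from the data backup."""
    return {k: v for k, v in index_backup.items() if v not in data_backup}


data = ColumnFamily("data")
index = ColumnFamily("index")

# The application writes a row, then an index entry pointing at it.
data.write("row-1", "payload")
index.write("by-name:alice", "row-1")

# Only the index CF has been flushed when the backup copies the sstables.
index.flush()
backup_index = dict(index.sstables)
backup_data = dict(data.sstables)

print(dangling_pointers(backup_index, backup_data))
```

The backup ends up with an index entry for `row-1` but no `row-1` in the data CF, which is exactly the dangling pointer described above.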

Thank you,
  Andrey