MVCC

2009-08-03 Thread Ivan Chang
Does Cassandra support MVCC?

I am building an application with concurrent updates (add, update, delete)
and one of the requirements is to be able to run audits that reproduce all
the update histories and the data objects in different versions.  What's the
best way to go about this in Cassandra?  As long as histories and versions
are maintained?  Does Cassandra support MVCC?

-Ivan


Re: MVCC

2009-08-03 Thread Jun Rao

Ivan,

The original cassandra keeps multiple versions of the column data. However,
that support has been removed in the apache code. Right now, only the
latest version is kept. In the future, we could add the versioning support
back.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

jun...@almaden.ibm.com




Re: MVCC

2009-08-03 Thread Chris Goffinet

How was it used in the original?






Re: MVCC

2009-08-03 Thread Jonathan Ellis
On Mon, Aug 3, 2009 at 10:49 AM, Jun Rao <jun...@almaden.ibm.com> wrote:
> Ivan,
>
> The original cassandra keeps multiple versions of the column data.

No, it didn't.  (It had versioning-related bugs but multiple versions
a la Bigtable was never part of the design.)

-Jonathan


Re: MVCC

2009-08-03 Thread mobiledreamers
I always thought cassandra had free multiple versions and we needed to
manually delete the older versions





-- 
Bidegg worlds best auction site
http://bidegg.com


Re: MVCC

2009-08-03 Thread Mark McBride
If this is the case, what does the timestamp passed in to the remove
call do?  I assumed you had to have it match up with a specific
version...




Re: MVCC

2009-08-03 Thread Jonathan Ellis
It's there for the same reason as the other timestamps: it lets
cassandra ignore obsolete operations.  So if you do a delete at time X
and an insert at time Y where X < Y, the insert will not be deleted by
mistake even if a node is down temporarily and gets the delete later.
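
To illustrate the rule described above, here is a minimal sketch, not
Cassandra's actual code. The Cell type and reconcile function are
hypothetical names, a delete is modelled as a cell with no value, and the
tie-breaking choice (keep the existing cell on equal timestamps) is an
arbitrary assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a stored column: a value or a tombstone."""
    timestamp: int
    value: Optional[str]  # None marks a delete (tombstone)


def reconcile(current: Optional[Cell], incoming: Cell) -> Cell:
    """Keep whichever operation carries the newer client-supplied timestamp."""
    if current is None or incoming.timestamp > current.timestamp:
        return incoming
    return current  # incoming is obsolete (or tied), so it is ignored


# Delete issued at X=10, insert issued at Y=20, so X < Y.
delete_at_x = Cell(timestamp=10, value=None)
insert_at_y = Cell(timestamp=20, value="hello")

# A healthy node sees the operations in the order they were issued.
in_order = reconcile(reconcile(None, delete_at_x), insert_at_y)

# A node that was down receives the delete late, after the insert.
late_delete = reconcile(reconcile(None, insert_at_y), delete_at_x)

print(in_order.value, late_delete.value)
```

Both nodes end up holding the insert, because the late delete carries the
older timestamp and is treated as obsolete.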

-Jonathan





Re: MVCC

2009-08-03 Thread Wilson Mar
So if different servers are not synchronized in time (to a Tier 1 time
server), then updates from slower server will not be updated on faster
servers?


Re: MVCC

2009-08-03 Thread Mark McBride
Thanks, that makes sense.  Is it an ok general rule that the
timestamps should be set to

1) The time that the data to be mutated was generated
2) The current system time if the time the data was mutated isn't available

Looking around at code it seems like time 0 is used a lot, which seems
pretty dangerous.

   ---Mark




Re: MVCC

2009-08-03 Thread Jonathan Ellis
Strictly speaking, no; timestamp is client-provided.

But in the sense that you'd better use ntpd on your clients, yes.
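
A small sketch of why client clocks matter, assuming the newest timestamp
wins. The newest helper, the dictionary layout, and the numbers are made up
for illustration.

```python
def newest(writes):
    """Return the value whose client-supplied timestamp is highest."""
    return max(writes, key=lambda w: w["timestamp"])["value"]


# Client A's clock runs 5 seconds fast; client B's clock is accurate.
# Real order: A writes first (true time 100), B writes second (true time 102).
writes = [
    {"client": "A", "timestamp": 100 + 5, "value": "from A (older)"},
    {"client": "B", "timestamp": 102, "value": "from B (newer)"},
]

winner = newest(writes)
print(winner)
```

The write from A wins even though B wrote later in real time, which is the
kind of surprise that keeping client clocks in sync is meant to avoid.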

On Mon, Aug 3, 2009 at 12:10 PM, Wilson Mar <wilson...@gmail.com> wrote:
> So if different servers are not synchronized in time (to a Tier 1 time
> server), then updates from slower server will not be updated on faster
> servers?



Re: MVCC

2009-08-03 Thread Jonathan Ellis
On Mon, Aug 3, 2009 at 12:12 PM, Mark McBride <mark.mcbr...@gmail.com> wrote:
> Thanks, that makes sense.  Is it an ok general rule that the
> timestamps should be set to
>
> 1) The time that the data to be mutated was generated
> 2) The current system time if the time the data was mutated isn't available

Yes.

> Looking around at code it seems like time 0 is used a lot, which seems
> pretty dangerous.

We do this in test code to make it obviously clock-independent, yes.
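
The two-step rule quoted above could be sketched as follows. The function
name write_timestamp is hypothetical, and the unit (microseconds since the
epoch) is an assumption of this sketch; the thread does not specify one.

```python
import time
from typing import Optional


def write_timestamp(generated_at: Optional[float] = None) -> int:
    """Return a timestamp in microseconds since the epoch (assumed unit).

    Prefers the time the data was generated (in seconds); falls back to
    the current system time when that is not available.
    """
    seconds = generated_at if generated_at is not None else time.time()
    return int(seconds * 1_000_000)


# The data carries its own generation time: use it.
print(write_timestamp(generated_at=1.5))

# No generation time available: fall back to the system clock.
print(write_timestamp())
```

An explicit 0 is still honoured as a generation time here, which matches
the fixed, clock-independent timestamps used in test code.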

-Jonathan


Re: MVCC

2009-08-03 Thread Mark McBride
Cool.  There are a few things I've found out recently that should
probably go into the wiki (this, the fact that get_columns_since
silently returns no results if your column family isn't ordered by
time)... is it moderated at all?  Should I run changes by the mailing
list?




Re: MVCC

2009-08-03 Thread Ivan Chang
Is this going to be an inherent limitation of Cassandra?

There is no doubt many applications will benefit from a db with built-in
support for multiple versions of the same data: features that allow
reversal of operations, and applications that require historical data
maintained (e.g. a credit/debit application) for an indefinite amount of
time or number of versions.

It would be nice to be able to configure column families with versioning
attributes. Would we ever get that, or do we have to implement our own
version stack in Cassandra?

-Ivan

On Mon, Aug 3, 2009 at 11:56 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Mon, Aug 3, 2009 at 10:49 AM, Jun Rao <jun...@almaden.ibm.com> wrote:
> > Ivan,
> >
> > The original cassandra keeps multiple versions of the column data.
>
> No, it didn't.  (It had versioning-related bugs but multiple versions
> a la Bigtable was never part of the design.)
>
> -Jonathan



Re: MVCC

2009-08-03 Thread Evan Weaver
You can support this at the domain level with custom comparators, I
think. It doesn't need to be in Cassandra itself as a first-class
operation.
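
One way to read this, as a rough sketch only: keep every version as its
own entry, ordered by timestamp, and never overwrite. The VersionedRow
class below is a hypothetical in-memory model, not the Cassandra API; the
sorted list stands in for whatever ordering a comparator would provide.

```python
import bisect
from typing import Any, List, Optional, Tuple


class VersionedRow:
    """Hypothetical model: one entry per version, kept sorted by timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._values: List[Any] = []

    def put(self, timestamp: int, value: Any) -> None:
        """Add a version instead of overwriting; None records a delete."""
        index = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(index, timestamp)
        self._values.insert(index, value)

    def get(self, as_of: Optional[int] = None) -> Any:
        """Return the value visible at as_of, or the latest if omitted."""
        if as_of is None:
            index = len(self._timestamps)
        else:
            index = bisect.bisect_right(self._timestamps, as_of)
        return self._values[index - 1] if index else None

    def history(self) -> List[Tuple[int, Any]]:
        """Return every (timestamp, value) pair, oldest first, for audits."""
        return list(zip(self._timestamps, self._values))


row = VersionedRow()
row.put(100, {"balance": 50})
row.put(200, {"balance": 75})
row.put(300, None)  # the delete is kept as a version too

print(row.get(as_of=250))
print(row.history())
```

Reads pick the newest version at or before the requested time, and the
full history stays available for audits.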

Evan






-- 
Evan Weaver


Re: MVCC

2009-08-03 Thread Jonathan Ellis
On Mon, Aug 3, 2009 at 3:39 PM, Ivan Chang <ivan.ch...@medigy.com> wrote:
> Is this going to be an inherent limitation of Cassandra?

If someone writes a patch that adds multi-version support without
compromising single-version performance then I don't see any reasons
to turn it down.

-Jonathan