MVCC
Does Cassandra support MVCC? I am building an application with concurrent updates (add, update, delete) and one of the requirements is to be able to run audits that reproduce all the update histories and the data objects in different versions. What's the best way to go about this in Cassandra? As long as histories and versions are maintained? Does Cassandra support MVCC? -Ivan
Re: MVCC
Ivan, The original cassandra keeps multiple versions of the column data. However, that support has been removed in the apache code. Right now, only the latest version is kept. In the future, we could add the versioning support back. Jun IBM Almaden Research Center K55/B1, 650 Harry Road, San Jose, CA 95120-6099 jun...@almaden.ibm.com
Re: MVCC
How was it used in the original?
Re: MVCC
On Mon, Aug 3, 2009 at 10:49 AM, Jun Rao jun...@almaden.ibm.com wrote:
> Ivan, The original cassandra keeps multiple versions of the column data.

No, it didn't. (It had versioning-related bugs but multiple versions a la Bigtable was never part of the design.)

-Jonathan
Re: MVCC
I always thought Cassandra kept multiple versions for free and that we needed to manually delete the older versions.

-- Bidegg worlds best auction site http://bidegg.com
Re: MVCC
If this is the case, what does the timestamp passed in to the remove call do? I assumed you had to have it match up with a specific version...
Re: MVCC
It's there for the same reason as the other timestamps: it lets cassandra ignore obsolete operations. So if you do a delete at time X and an insert at time Y where X < Y, the insert will not be deleted by mistake even if a node is down temporarily and gets the delete later. -Jonathan
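A minimal sketch of the rule described above, not Cassandra's actual code: each column keeps only the operation with the newest client-supplied timestamp, and a delete is stored as a marker with its own timestamp. The names `Cell`, `Column`, and `replay` are hypothetical, and the tie-breaking choice (an equal timestamp does not overwrite) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Cell:
    value: Optional[str]  # None marks a delete
    timestamp: int        # supplied by the client, not by the server


class Column:
    """Holds a single version: whichever operation has the newest timestamp."""

    def __init__(self) -> None:
        self.cell: Optional[Cell] = None

    def apply(self, value: Optional[str], timestamp: int) -> None:
        # An operation older than what is already stored is obsolete and ignored.
        # Assumption for this sketch: an equal timestamp does not overwrite.
        if self.cell is None or timestamp > self.cell.timestamp:
            self.cell = Cell(value, timestamp)

    def read(self) -> Optional[str]:
        return None if self.cell is None else self.cell.value


def replay(ops: List[Tuple[Optional[str], int]]) -> Optional[str]:
    """Apply (value, timestamp) operations in arrival order and read the result."""
    column = Column()
    for value, timestamp in ops:
        column.apply(value, timestamp)
    return column.read()


X, Y = 100, 200  # delete at time X, insert at time Y, with X < Y

# The node sees the operations in timestamp order.
in_order = replay([(None, X), ("v2", Y)])

# The node was down and receives the older delete after the insert.
late_delete = replay([("v2", Y), (None, X)])
```

Both arrival orders leave the insert in place, because the delete carries the older timestamp and is discarded when it arrives late.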
Re: MVCC
So if different servers are not synchronized in time (to a Tier 1 time server), then updates from a slower server will not be applied on faster servers?
Re: MVCC
Thanks, that makes sense. Is it an ok general rule that the timestamps should be set to

1) The time that the data to be mutated was generated
2) The current system time if the time the data was mutated isn't available

Looking around at code it seems like time 0 is used a lot, which seems pretty dangerous.

---Mark
Re: MVCC
Strictly speaking, no; timestamp is client-provided. But in the sense that you'd better use ntpd on your clients, yes.
Re: MVCC
On Mon, Aug 3, 2009 at 12:12 PM, Mark McBride mark.mcbr...@gmail.com wrote:
> Thanks, that makes sense. Is it an ok general rule that the timestamps should be set to 1) The time that the data to be mutated was generated 2) The current system time if the time the data was mutated isn't available

Yes.

> Looking around at code it seems like time 0 is used a lot, which seems pretty dangerous.

We do this in test code to make it obviously clock-independent, yes.

-Jonathan
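The two-part rule confirmed above could be wrapped in a small client-side helper. This is only a sketch: the function name `mutation_timestamp` is hypothetical, and the choice of microseconds since the Unix epoch as the unit is an assumption, since the thread does not say which unit to use.

```python
import time
from typing import Optional


def mutation_timestamp(generated_at_micros: Optional[int] = None) -> int:
    """Pick the timestamp to send with a mutation.

    1) Prefer the time at which the data was generated, if the caller knows it.
    2) Otherwise fall back to the current system time on the client.

    Assumption: timestamps are microseconds since the Unix epoch.
    """
    if generated_at_micros is not None:
        return generated_at_micros
    return time.time_ns() // 1_000


# Data that carries its own generation time keeps it.
known = mutation_timestamp(1_249_315_200_000_000)

# Data without one gets the client's clock, which is why clients should run ntpd.
fallback = mutation_timestamp()
```

A constant such as 0 stays out of production paths with this helper; passing a fixed value explicitly is still possible in tests that need to be clock-independent.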
Re: MVCC
Cool. There are a few things I've found out recently that should probably go into the wiki (this, the fact that get_columns_since silently returns no results if your column family isn't ordered by time)... is it moderated at all? Should I run changes by the mailing list?
Re: MVCC
Is this going to be an inherent limitation of Cassandra? There is no doubt many applications will benefit from a db with built-in support for multiple versions of the same data: features that allow reversal of operations, and applications that require historical data to be maintained (e.g. a credit/debit application) for an indefinite amount of time or number of versions. It would be nice to be able to configure column families with versioning attributes. Will we ever get that, or do we have to implement our own version stack in Cassandra? -Ivan
Re: MVCC
You can support this at the domain level with custom comparators, I think. It doesn't need to be in Cassandra itself as a first-class operation.

Evan

-- Evan Weaver
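One way to read the suggestion above is to give every version its own column, with the version number encoded in the column name so that the sort order keeps a field's history together. The sketch below models that idea in memory only and makes no Cassandra calls: `VersionedRow` and its methods are hypothetical names, and the (field, version) tuple ordering stands in for whatever comparator the column family would actually use.

```python
import bisect
from typing import Dict, List, Optional, Tuple

ColumnName = Tuple[str, int]  # (field, version); sorts by field, then version


class VersionedRow:
    """In-memory model of a row whose column names are sorted (field, version) pairs."""

    def __init__(self) -> None:
        self._names: List[ColumnName] = []        # kept in sorted order
        self._values: Dict[ColumnName, str] = {}

    def put(self, field: str, version: int, value: str) -> None:
        # Each update writes a new column instead of overwriting the old one.
        name = (field, version)
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def history(self, field: str) -> List[Tuple[int, str]]:
        """Every stored version of a field, oldest first (the audit trail)."""
        return [(v, self._values[(f, v)]) for f, v in self._names if f == field]

    def as_of(self, field: str, version: int) -> Optional[str]:
        """Value of the field at the newest stored version <= the one requested."""
        i = bisect.bisect_right(self._names, (field, version))
        if i == 0:
            return None
        name = self._names[i - 1]
        return self._values[name] if name[0] == field else None


row = VersionedRow()
row.put("balance", 1, "100")
row.put("balance", 3, "70")
row.put("balance", 2, "120")  # arrives out of order, still sorts into place

audit = row.history("balance")
at_v2 = row.as_of("balance", 2)
```

Because nothing is overwritten, old versions accumulate until the application removes them, so pruning policy becomes the application's responsibility under this approach.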
Re: MVCC
On Mon, Aug 3, 2009 at 3:39 PM, Ivan Chang ivan.ch...@medigy.com wrote:
> Is this going to be an inherent limitation of Cassandra?

If someone writes a patch that adds multi-version support without compromising single-version performance then I don't see any reasons to turn it down.

-Jonathan