Charlie, I would be willing to help you out with your issues tomorrow afternoon. Feel free to give me a call after 4pm ET. There are lots of people who store *and* update data with Cassandra (at scale).
--
Colin Clark | Solutions Architect
DataStax | www.datastax.com
m | +1-320-221-9531
e | colin.cl...@datastax.com

We power the big data applications that transform business. More than 400 customers, including startups and twenty-five percent of the Fortune 100, rely on DataStax's massively scalable, flexible, fast and continuously available big data platform built on Apache Cassandra™. DataStax integrates in one cluster (thus requiring no ETL) enterprise-ready Cassandra, Apache Hadoop™ for analytics and Apache Solr™ for search, across multiple data centers and in the cloud, all while providing advanced enterprise security features that keep data safe.

> On May 27, 2014, at 4:16 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Sun, May 25, 2014 at 12:01 PM, Charlie Mason <charlie....@gmail.com>
>> wrote:
>> I have a table which has one column per user. It receives a lot of updates
>> to these columns throughout its lifetime. They are always updates on a few
>> specific columns. Firstly, is Cassandra storing a tombstone for each of
>> these old column values?
>> ...
>> As you can see, that's an awful lot of tombstoned cells. That's after a full
>> compaction as well. Just so you are aware, this table is updated using a
>> Paxos IF statement.
>
> If you do a lot of UPDATEs, perhaps a log structured database with immutable
> datafiles from which row fragments are reconciled on read is not for you.
> Especially if you have to use lightweight "transactions" to make your
> application semantics work.
>
>> Would I be better off adding a time-based key to the primary key, then doing
>> a separate insert and deleting the original? If I did the query with a
>> limit of one it should always find the first rows before hitting a
>> tombstone. Is that correct?
>
> I have no idea what you're asking regarding a LIMIT of 1... in general
> anything that scans over multiple partitions is bad. I'm pretty sure you
> almost always want to use a design which allows you to use FIRST instead of
> LIMIT for this reason.
>
> The overall form of your questions suggests you might be better off using the
> right tool for the job, which may not be Cassandra.
>
> =Rob
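
For anyone following the thread, Rob's point about "immutable datafiles from which row fragments are reconciled on read" can be sketched in a few lines of Python. This is only a toy model and not Cassandra's actual storage engine: the names `SSTable`, `read_row` and `compact` are hypothetical, the sample data is made up, and it assumes plain last-write-wins by timestamp with no deletes or tombstones.

```python
from dataclasses import dataclass, field

# Toy model only: hypothetical names, not Cassandra's real storage engine.
# Each "SSTable" is treated as an immutable map of
# row key -> {column: (timestamp, value)}.


@dataclass(frozen=True)
class SSTable:
    rows: dict = field(default_factory=dict)


def read_row(sstables, key):
    """Merge the fragments of one row from every file; newest timestamp wins."""
    merged = {}
    for table in sstables:
        for column, (ts, value) in table.rows.get(key, {}).items():
            if column not in merged or ts > merged[column][0]:
                merged[column] = (ts, value)
    return {column: value for column, (ts, value) in merged.items()}


def compact(sstables):
    """Rewrite all files into one, keeping only the newest cell per column."""
    merged = {}
    for table in sstables:
        for key, columns in table.rows.items():
            row = merged.setdefault(key, {})
            for column, (ts, value) in columns.items():
                if column not in row or ts > row[column][0]:
                    row[column] = (ts, value)
    return [SSTable(rows=merged)]


# Three flushed files, each holding a newer update to the same column.
files = [
    SSTable({"user1": {"status": (1, "new"), "name": (1, "Charlie")}}),
    SSTable({"user1": {"status": (2, "active")}}),
    SSTable({"user1": {"status": (3, "suspended")}}),
]

current = read_row(files, "user1")   # has to consult all three files
compacted = compact(files)           # one file, shadowed cells dropped
```

In this sketch every read of `user1` walks all three files until `compact` folds them into one, which is roughly the cost being pointed at for update-heavy tables.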