Re: Avoiding High Cell Tombstone Count

Colin Tue, 27 May 2014 14:56:31 -0700

Charlie,

I would be willing to help you out with your issues tomorrow afternoon; feel 
free to give me a call after 4 pm ET.  There are lots of people who store *and* 
update data with Cassandra (at scale).


--
Colin Clark   | Solutions Architect
DataStax  |  www.datastax.com 
m | +1-320-221-9531
e  | colin.cl...@datastax.com


 

> On May 27, 2014, at 4:16 PM, Robert Coli <rc...@eventbrite.com> wrote:
> 
>> On Sun, May 25, 2014 at 12:01 PM, Charlie Mason <charlie....@gmail.com> 
>> wrote:
>> I have a table which has one column per user. It receives a lot of updates 
>> to these columns throughout its lifetime. They are always updates to a few 
>> specific columns. Firstly, is Cassandra storing a tombstone for each of these 
>> old column values?
>> ...
>> As you can see, that's an awful lot of tombstoned cells. That's after a full 
>> compaction as well. Just so you are aware, this table is updated using a 
>> Paxos IF statement.
> 
> If you do a lot of UPDATEs, perhaps a log structured database with immutable 
> datafiles from which row fragments are reconciled on read is not for you. 
> Especially if you have to use lightweight "transactions" to make your 
> application semantics work.
>  
>> Would I be better off adding a time-based key to the primary key, then 
>> doing a separate insert and deleting the original? If I did the query with 
>> a limit of one, it should always find the first row before hitting a 
>> tombstone. Is that correct?
> 
> I have no idea what you're asking regarding a LIMIT of 1... in general 
> anything that scans over multiple partitions is bad. I'm pretty sure you 
> almost always want to use a design which allows you to use FIRST instead of 
> LIMIT for this reason.
> 
> The overall form of your questions suggests you might be better off using the 
> right tool for the job, which may not be Cassandra.
> 
> =Rob
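
For anyone following the point above about immutable datafiles whose row
fragments are reconciled on read, here is a rough toy model in Python. It is
only a sketch under stated assumptions, not Cassandra's actual storage engine:
the names Cell, TOMBSTONE and read are hypothetical, each "segment" stands in
for one immutable data file, and the newest timestamp is assumed to win.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical marker written in place of a value when a cell is deleted.
TOMBSTONE = object()


@dataclass(frozen=True)
class Cell:
    timestamp: int
    value: object  # a real value, or TOMBSTONE


# Toy stand-in for one immutable data file: (row key, column) -> Cell.
Segment = Dict[Tuple[str, str], Cell]


def read(segments: List[Segment], row: str, column: str):
    """Reconcile fragments from every segment; the newest timestamp wins.

    Returns (value, tombstones_seen). The read still has to look at
    tombstones in older segments even when a newer live value exists.
    """
    newest = None
    tombstones_seen = 0
    for segment in segments:
        cell = segment.get((row, column))
        if cell is None:
            continue
        if cell.value is TOMBSTONE:
            tombstones_seen += 1
        if newest is None or cell.timestamp > newest.timestamp:
            newest = cell
    if newest is None or newest.value is TOMBSTONE:
        return None, tombstones_seen
    return newest.value, tombstones_seen


# Three "files", written at different times, never modified in place.
segments = [
    {("u1", "score"): Cell(1, 10)},         # original write
    {("u1", "score"): Cell(2, TOMBSTONE)},  # delete recorded as a tombstone
    {("u1", "score"): Cell(3, 42)},         # later write
]
result = read(segments, "u1", "score")
```

In this toy model nothing is rewritten in place, so every delete adds a marker
that later reads must step over until the files are merged and the marker is
dropped.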
