Re: Storing log structured data in Cassandra without compactions for performance boost.
If you make the timestamp the partition key you won't be able to do range queries (unless you use an ordered partitioner). Assuming you are logging from multiple devices, you will want your partition key to be the device id and the date, your clustering key to be the timestamp (timeuuids are good to prevent collisions), and then the log message, level, etc. as the other columns. You can also create a new table for every week (or day/month, depending on how much granularity you want) and just write to the current week's table. This lets you delete old data without Cassandra using tombstones (you just drop the table for the week of logs you want to delete). For a much clearer explanation see http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries (the last few slides).

As for compaction, I would leave it enabled, as having lots of sstables hanging around can make range queries slower (the query has more files to visit). See http://stackoverflow.com/questions/8917882/cassandra-sstables-and-compaction (a little old but still relevant). Compaction also fixes up things like merging row fragments (when you write new columns to the same row).

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 07/05/2014, at 10:55 AM, Kevin Burton bur...@spinn3r.com wrote:
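A minimal CQL sketch of the model Ben describes — all table, column, and bucket names here are illustrative, not from the thread:

```sql
-- One table per week, e.g. logs_2014_w19; write only to the current week's table.
-- Partition key: (device id, day bucket) to keep partitions bounded;
-- clustering key: a timeuuid, so concurrent writes never collide.
CREATE TABLE logs_2014_w19 (
    device_id text,
    day       text,      -- e.g. '2014-05-07'
    ts        timeuuid,
    level     text,
    message   text,
    PRIMARY KEY ((device_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- Range query within one partition, newest first:
SELECT ts, level, message
FROM logs_2014_w19
WHERE device_id = 'dev42' AND day = '2014-05-07'
  AND ts > maxTimeuuid('2014-05-07 00:00:00')
  AND ts < minTimeuuid('2014-05-07 12:00:00');

-- Expiring a whole week of logs is then just a table drop, no tombstones:
DROP TABLE logs_2014_w19;
```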
Re: Storing log structured data in Cassandra without compactions for performance boost.
> If the data is read from a slice of a partition that has been added to over time, there will be a part of that row in almost every sstable. That would mean all of them (multiple disk seeks, depending on clustering order, per sstable) would have to be read in order to service the query. Data model can help or hurt a lot though.

Yes… totally agree, but we wouldn't do that. The entire 'row' is immutable, passes through the system, and then expires due to TTL. TTL is probably the way to go here, especially if Cassandra just drops the whole SSTable on TTL expiration, which is what I think I'm hearing.

> If you set the TTL for the columns you added then C* will clean up sstables (if size tiered and post 1.2) once the data's been expired. Since you never delete, set gc_grace_seconds to 0 so the TTL expiration doesn't result in tombstones.

Thanks!

Kevin
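The TTL setup Chris describes might look like the following in CQL. Note that the table-level default_time_to_live option needs Cassandra 2.0+; on 1.2 the TTL would have to go on each write instead. Table names and durations are illustrative:

```sql
-- Expire every row 30 days after insert; with no explicit deletes,
-- gc_grace_seconds can be 0 so expired cells don't linger as tombstones.
CREATE TABLE logs (
    device_id text,
    day       text,
    ts        timeuuid,
    message   text,
    PRIMARY KEY ((device_id, day), ts)
) WITH default_time_to_live = 2592000   -- 30 days, in seconds
  AND gc_grace_seconds = 0;

-- A per-write TTL also works, and overrides the table default:
INSERT INTO logs (device_id, day, ts, message)
VALUES ('dev42', '2014-05-07', now(), 'disk full')
USING TTL 604800;  -- 7 days
```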
Re: Storing log structured data in Cassandra without compactions for performance boost.
The following article has some good information for what you describe: http://www.datastax.com/dev/blog/optimizations-around-cold-sstables

Some related tickets which will provide background:
https://issues.apache.org/jira/browse/CASSANDRA-5228
https://issues.apache.org/jira/browse/CASSANDRA-5515

On Tue, May 6, 2014 at 7:55 PM, Kevin Burton bur...@spinn3r.com wrote:

--
Nate McCall
Austin, TX
@zznate
Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
Re: Storing log structured data in Cassandra without compactions for performance boost.
What's your data model look like?

> I think it would be best to just disable compactions.

Why? Are you never doing reads? There is also a cost to repairs/bootstrapping when you have a ton of sstables. This might be a premature optimization.

If the data is read from a slice of a partition that has been added to over time, there will be a part of that row in almost every sstable. That would mean all of them (multiple disk seeks, depending on clustering order, per sstable) would have to be read in order to service the query. Data model can help or hurt a lot though.

If you set the TTL for the columns you added then C* will clean up sstables (if size tiered and post 1.2) once the data's been expired. Since you never delete, set gc_grace_seconds to 0 so the TTL expiration doesn't result in tombstones.

---
Chris Lohfink

On May 6, 2014, at 7:55 PM, Kevin Burton bur...@spinn3r.com wrote:
Re: Storing log structured data in Cassandra without compactions for performance boost.
Hello Kevin

You can disable compaction by configuring the compaction options of your table as follows:

compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '0', 'max_threshold': '0'}

Regards

Duy Hai DOAN

On Wed, May 7, 2014 at 2:55 AM, Kevin Burton bur...@spinn3r.com wrote:
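Applied to an existing table, that option would look like this (the table name is illustrative; with both thresholds at 0, size-tiered compaction never finds enough sstables to merge):

```sql
-- Hypothetical table name; stops automatic size-tiered compaction
-- from ever being triggered on this table.
ALTER TABLE logs
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'min_threshold': '0',
                     'max_threshold': '0'};
```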
Storing log structured data in Cassandra without compactions for performance boost.
I'm looking at storing log data in Cassandra… Every record is a unique timestamp for the key, and then the log line for the value.

I think it would be best to just disable compactions:

- there will never be any deletes.
- all the data will be accessed in time range (probably partitioned randomly) and sequentially.

So every time a memtable flushes, we will just keep that SSTable forever. Compacting the data is kind of redundant in this situation.

I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high so compactions are never triggered.

Also, it would be IDEAL to be able to tell Cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone.

Is this possible?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts
http://spinn3r.com

War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.