Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-16 Thread Ben Bromhead
If you make the timestamp the partition key you won't be able to do range 
queries (unless you use an ordered partitioner).
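A toy sketch of why that is, assuming an MD5-based token in the spirit of RandomPartitioner. The `token` helper and the sample timestamps are made up for illustration; this is not Cassandra's actual code:

```python
import hashlib

def token(partition_key: str) -> int:
    """Toy stand-in for a hash partitioner: the MD5 digest of the key as an integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")

# Fifty consecutive (hypothetical) timestamps used directly as partition keys.
keys = [str(1399400000 + i) for i in range(50)]

# The order in which a token-ordered scan would encounter those keys.
by_token = sorted(keys, key=token)

print(by_token[:5])      # first keys in token order
print(by_token == keys)  # whether token order happens to match timestamp order
```

Because neighbouring timestamps hash to unrelated tokens, a scan in token order cannot serve "everything between t1 and t2" on the partition key.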

Assuming you are logging from multiple devices, you will want your partition 
key to be the device id and the date, your clustering key to be the timestamp 
(timeuuids are good for preventing collisions), and then the log message, 
level, etc. as the other columns.
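A minimal sketch of that layout, assuming a table called `logs` with columns `device_id`, `day`, `ts`, `level` and `message` (all of these names are hypothetical). The CQL is only held in a string here; the helper builds one row and derives the day from the timeuuid itself so the two can never disagree:

```python
import uuid
from datetime import datetime, timedelta, timezone

# Hypothetical schema: composite partition key (device_id, day), timeuuid clustering key.
CREATE_LOGS = """
CREATE TABLE logs (
    device_id text,
    day       text,
    ts        timeuuid,
    level     text,
    message   text,
    PRIMARY KEY ((device_id, day), ts)
)
"""

# Version 1 UUIDs count 100 ns intervals since the start of the Gregorian calendar.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def log_row(device_id: str, level: str, message: str) -> dict:
    """Build one log row; the day bucket is taken from the timeuuid's own timestamp."""
    ts = uuid.uuid1()  # time-based (version 1) UUID, the Python analogue of a timeuuid
    written_at = UUID_EPOCH + timedelta(microseconds=ts.time // 10)
    return {
        "device_id": device_id,
        "day": written_at.strftime("%Y-%m-%d"),
        "ts": ts,
        "level": level,
        "message": message,
    }

row = log_row("device-42", "INFO", "service started")
print(row["device_id"], row["day"], row["ts"])
```

With this key, a time-range query for one device on one day is a slice on `ts` inside a single partition.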

Then you can also create a new table for every week (or day/month, depending 
on how much granularity you want) and just write to the current week's table. 
This step allows you to delete old data without Cassandra using tombstones 
(you just drop the table for the week of logs you want to delete).
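The weekly-table idea could be sketched like this, assuming ISO week numbering and a made-up naming scheme (`logs_<year>w<week>`). The helper names are hypothetical and the DROP statements are only generated, not executed:

```python
from datetime import date, timedelta

def weekly_table(day: date, prefix: str = "logs") -> str:
    """Name of the table that holds the ISO week containing `day`."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}_{iso_year}w{iso_week:02d}"

def drop_statements(today: date, keep_weeks: int, known_tables) -> list:
    """DROP statements for every known table outside the last `keep_weeks` weeks."""
    # The current week plus the keep_weeks - 1 weeks before it are retained.
    keep = {weekly_table(today - timedelta(weeks=i)) for i in range(keep_weeks)}
    return [f"DROP TABLE {name}" for name in sorted(known_tables) if name not in keep]

today = date(2014, 5, 16)
print(weekly_table(today))  # table that new writes go to
print(drop_statements(today, 2, ["logs_2014w18", "logs_2014w19", "logs_2014w20"]))
```

Readers that span a week boundary have to query two tables and merge the results, which is the price of getting tombstone-free deletes.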

For a much clearer explanation see 
http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries (the last 
few slides).

As for compaction, I would leave it enabled, as having lots of sstables hanging 
around can make range queries slower (the query has more files to visit). See 
http://stackoverflow.com/questions/8917882/cassandra-sstables-and-compaction (a 
little old but still relevant). Compaction also fixes up things like merging 
row fragments (when you write new columns to the same row).


Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359


On 07/05/2014, at 10:55 AM, Kevin Burton bur...@spinn3r.com wrote:

 I'm looking at storing log data in Cassandra… 
 
 Every record is a unique timestamp for the key, and then the log line for the 
 value.
 
 I think it would be best to just disable compactions.
 
 - there will never be any deletes.
 
 - all the data will be accessed in time range (probably partitioned randomly) 
 and sequentially.
 
 So every time a memtable flushes, we will just keep that SSTable forever.  
 
 Compacting the data is kind of redundant in this situation.
 
 I was thinking the best strategy is to use setcompactionthreshold and set the 
 value VERY high so compactions are never triggered.
 
 Also, it would be IDEAL to be able to tell Cassandra to just drop a full 
 SSTable so that I can truncate older data without having to do a major 
 compaction and without having to mark everything with a tombstone.  Is this 
 possible?
 
 
 
 -- 
 
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 Skype: burtonator
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 War is peace. Freedom is slavery. Ignorance is strength. Corporations are 
 people.
 



Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-16 Thread Kevin Burton


 If the data is read from a slice of a partition that has been added over
 time there will be a part of that row in almost every sstable. That would
 mean all of them (multiple disk seeks depending on clustering order per
 sstable) would have to be read from in order to service the query.  Data
 model can help or hurt a lot though.


Yes… totally agree, but we wouldn't do that.  The entire 'row' is immutable
and passes through the system and then expires due to TTL.

TTL is probably the way to go here, especially if Cassandra just drops the
whole SSTable on TTL expiration, which is what I think I'm hearing.


 If you set the TTL for the columns you added then C* will clean up
 sstables (if size tiered and post 1.2) once the data has expired.  Since
 you never delete, set gc_grace_seconds to 0 so the TTL expiration doesn't
 result in tombstones.


Thanks!

Kevin


Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-15 Thread Nate McCall
The following article has some good information for what you describe:
http://www.datastax.com/dev/blog/optimizations-around-cold-sstables

Some related tickets which will provide background:
https://issues.apache.org/jira/browse/CASSANDRA-5228
https://issues.apache.org/jira/browse/CASSANDRA-5515






-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-13 Thread Chris Lohfink
What does your data model look like?

 I think it would be best to just disable compactions.

Why? Are you never doing reads?  There is also a cost to repairs/bootstrapping 
when you have a ton of sstables.  This might be a premature optimization.

If the data is read from a slice of a partition that has been added over time 
there will be a part of that row in almost every sstable. That would mean all 
of them (multiple disk seeks depending on clustering order per sstable) would 
have to be read from in order to service the query.  Data model can help or 
hurt a lot though.
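A toy model of that effect, assuming no compaction ever runs and that the memtable is flushed to a new sstable after a fixed number of writes. The function name, device names and numbers are made up for illustration:

```python
from collections import defaultdict

def sstables_per_partition(writes, flush_every):
    """Count how many sstables hold a fragment of each partition.

    `writes` is a sequence of partition keys in arrival order; every
    `flush_every` writes the memtable becomes a new sstable that is never compacted.
    """
    holders = defaultdict(set)
    for i, key in enumerate(writes):
        holders[key].add(i // flush_every)  # index of the sstable this write ends up in
    return {key: len(tables) for key, tables in holders.items()}

# Two devices logging five events each per day for six days (hypothetical workload).
events = [(f"dev{d}", day) for day in range(6) for d in range(2) for _ in range(5)]

# One ever-growing partition per device vs. one partition per device per day.
wide = sstables_per_partition([dev for dev, _ in events], flush_every=10)
bucketed = sstables_per_partition([f"{dev}:{day}" for dev, day in events], flush_every=10)

print(wide)                    # sstables a read of each wide partition must touch
print(max(bucketed.values()))  # worst case for a day-bucketed partition
```

In this model the wide partitions are spread over every sstable, while each day-bucketed partition stays in the sstable that was being written that day.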

If you set the TTL for the columns you added then C* will clean up sstables (if 
size tiered and post 1.2) once the data has expired.  Since you never delete, 
set gc_grace_seconds to 0 so the TTL expiration doesn't result in tombstones.
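As a rough sketch, the two settings could be applied with one ALTER TABLE. This assumes a Cassandra version that supports the `default_time_to_live` table option (otherwise the TTL has to be set per write with `USING TTL`); the table name `logs` and the helper are hypothetical, and the statement is only built as a string:

```python
def ttl_table_options(table: str, retention_days: int) -> str:
    """Build an ALTER TABLE that expires rows after `retention_days` days
    and sets gc_grace_seconds to 0, as suggested above."""
    ttl_seconds = retention_days * 24 * 60 * 60
    return (
        f"ALTER TABLE {table} "
        f"WITH default_time_to_live = {ttl_seconds} "
        f"AND gc_grace_seconds = 0"
    )

print(ttl_table_options("logs", 30))
```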

---
Chris Lohfink 






Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-12 Thread DuyHai Doan
Hello Kevin

 You can disable compaction by configuring the compaction options of your
table as follows:

  compaction={'min_threshold': '0',
              'class': 'SizeTieredCompactionStrategy',
              'max_threshold': '0'}
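
For context, a small sketch of how such an options map might be attached to a table. The table name `logs` and both helper functions are hypothetical; the code only renders the CQL text and does not talk to a cluster:

```python
def compaction_clause(options: dict) -> str:
    """Render a dict of compaction options as a CQL map literal."""
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"compaction = {{{body}}}"

def alter_compaction(table: str, options: dict) -> str:
    """Build the ALTER TABLE statement that applies the given compaction options."""
    return f"ALTER TABLE {table} WITH {compaction_clause(options)}"

# The option values quoted in this message.
options = {
    "class": "SizeTieredCompactionStrategy",
    "min_threshold": "0",
    "max_threshold": "0",
}
print(alter_compaction("logs", options))
```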

Regards

 Duy Hai DOAN






Storing log structured data in Cassandra without compactions for performance boost.

2014-05-07 Thread Kevin Burton
I'm looking at storing log data in Cassandra…

Every record is a unique timestamp for the key, and then the log line for
the value.

I think it would be best to just disable compactions.

- there will never be any deletes.

- all the data will be accessed in time range (probably partitioned
randomly) and sequentially.

So every time a memtable flushes, we will just keep that SSTable forever.

Compacting the data is kind of redundant in this situation.

I was thinking the best strategy is to use setcompactionthreshold and set
the value VERY high so compactions are never triggered.

Also, it would be IDEAL to be able to tell Cassandra to just drop a full
SSTable so that I can truncate older data without having to do a major
compaction and without having to mark everything with a tombstone.  Is this
possible?


