Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-28 Thread aaron morton
 dataset... just under 4 months of data is less than 2GB!  I'm pretty
 thrilled.
Be thrilled by all the compressions! :)

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/08/2012, at 6:10 AM, Aaron Turner synfina...@gmail.com wrote:

 On Mon, Aug 27, 2012 at 1:19 AM, aaron morton aa...@thelastpickle.com wrote:
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.
 
 Sort of. We only want one instance of the row per SSTable created.
 
 Ah, good clarification, although I think for my purposes they're one
 and the same.
 
 
 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?
 
 Fewer SSTables means less compaction. So go as high as you can on the
 bufferSizeInMB param for the
 SSTableSimpleUnsortedWriter.
 
 Ok.
 
 There is also a SSTableSimpleWriter. Because it expects rows to be ordered
 it does not buffer and can create bigger sstables.
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java
 
 Hmm, probably not realistic in my situation... doing so would likely
 thrash the disks on my PG server a lot more and kill my read
 throughput, and that server is already hitting a wall.
 
 
 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical
 
 ingest all the histories!
 
 Actually, I was a little worried about how much space that would
 take... my estimate was ~305GB/year, which is a lot when you consider
 the 300-400GB/node limit (something I didn't know about at the time).
 However, compression has turned out to be extremely efficient on my
 dataset... just under 4 months of data is less than 2GB!  I'm pretty
 thrilled.
 
 
 -- 
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
 carpe diem quam minimum credula postero



Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-27 Thread aaron morton
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.  
Sort of. We only want one instance of the row per SSTable created. 


 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity? 

Fewer SSTables means less compaction. So go as high as you can on the
bufferSizeInMB param for the 
SSTableSimpleUnsortedWriter. 
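
For reference, here's a minimal sketch of what that looks like with the
1.1-era API from the DataStax bulk-loading post; the directory, keyspace
and column family names are hypothetical, and the last constructor
argument is bufferSizeInMB:

    import java.io.File;
    import java.io.IOException;
    import org.apache.cassandra.db.marshal.AsciiType;
    import org.apache.cassandra.dht.RandomPartitioner;
    import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

    public class BulkWriterSketch
    {
        public static void main(String[] args) throws IOException
        {
            // A bigger bufferSizeInMB means more rows are buffered in memory
            // before each flush, so the writer emits fewer, larger sstables
            // and leaves less compaction work for the cluster after the load.
            SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                    new File("/tmp/load/Metrics/datapoints"), // output dir (hypothetical)
                    new RandomPartitioner(),
                    "Metrics",          // keyspace (hypothetical)
                    "datapoints",       // column family (hypothetical)
                    AsciiType.instance, // column name comparator
                    null,               // no subcomparator (not a super CF)
                    256);               // bufferSizeInMB: as high as memory allows
            // ... newRow()/addColumn() calls go here ...
            writer.close();
        }
    }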

There is also a SSTableSimpleWriter. Because it expects rows to be ordered it 
does not buffer and can create bigger sstables.
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java
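
The catch is that it needs rows fed to it in partitioner order, i.e.
sorted by token rather than by raw key. A rough sketch of that
constraint, where the keys list is a hypothetical stand-in for however
you pull row keys out of your source:

    import java.nio.ByteBuffer;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.cassandra.dht.IPartitioner;
    import org.apache.cassandra.dht.RandomPartitioner;

    public class SortedKeyFeed
    {
        // RandomPartitioner orders rows by MD5 token, not by raw key, so
        // keys must be sorted by decorated key before being written with
        // SSTableSimpleWriter.
        static void sortForPartitioner(List<ByteBuffer> keys)
        {
            final IPartitioner partitioner = new RandomPartitioner();
            Collections.sort(keys, new Comparator<ByteBuffer>()
            {
                public int compare(ByteBuffer a, ByteBuffer b)
                {
                    return partitioner.decorateKey(a).compareTo(partitioner.decorateKey(b));
                }
            });
            // feed keys in this order to SSTableSimpleWriter.newRow(...)
        }
    }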


 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical 
ingest all the histories!

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 12:56 PM, Aaron Turner synfina...@gmail.com wrote:

 So I've read: http://www.datastax.com/dev/blog/bulk-loading
 
 Are there any tips for using sstableloader /
 SSTableSimpleUnsortedWriter to migrate time series data from our old
 datastore (PostgreSQL) to Cassandra?  After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.  I.e.: if each row == 1 year's worth of data and you
 have, say, 30,000 rows, write one full row at a time (a full year's worth
 of data points for a given metric) rather than 1 data point for 30,000
 rows.
 
 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?  All my CFs I'll be writing to
 use compression and leveled compaction.
 
 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical (not sure yet how much we'll actually load,
 but minimally 1 year's worth).
 
 Thanks!
 
 -- 
 Aaron Turner
 http://synfin.net/ Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
 carpe diem quam minimum credula postero



Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-27 Thread Aaron Turner
On Mon, Aug 27, 2012 at 1:19 AM, aaron morton aa...@thelastpickle.com wrote:
 After thinking about how
 sstables are done on disk, it seems best (required??) to write out
 each row at once.

 Sort of. We only want one instance of the row per SSTable created.

Ah, good clarification, although I think for my purposes they're one
and the same.


 Any other tips to improve load time or reduce the load on the cluster
 or subsequent compaction activity?

 Fewer SSTables means less compaction. So go as high as you can on the
 bufferSizeInMB param for the
 SSTableSimpleUnsortedWriter.

Ok.

 There is also a SSTableSimpleWriter. Because it expects rows to be ordered
 it does not buffer and can create bigger sstables.
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java

Hmm, probably not realistic in my situation... doing so would likely
thrash the disks on my PG server a lot more and kill my read
throughput, and that server is already hitting a wall.


 Right now my Cassandra data store has about 4 months of data and we
 have 5 years of historical

 ingest all the histories!

Actually, I was a little worried about how much space that would
take... my estimate was ~305GB/year, which is a lot when you consider
the 300-400GB/node limit (something I didn't know about at the time).
However, compression has turned out to be extremely efficient on my
dataset... just under 4 months of data is less than 2GB!  I'm pretty
thrilled.


-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero


optimizing use of sstableloader / SSTableSimpleUnsortedWriter

2012-08-24 Thread Aaron Turner
So I've read: http://www.datastax.com/dev/blog/bulk-loading

Are there any tips for using sstableloader /
SSTableSimpleUnsortedWriter to migrate time series data from our old
datastore (PostgreSQL) to Cassandra?  After thinking about how
sstables are done on disk, it seems best (required??) to write out
each row at once.  I.e.: if each row == 1 year's worth of data and you
have, say, 30,000 rows, write one full row at a time (a full year's worth
of data points for a given metric) rather than 1 data point for 30,000
rows.
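
To make that concrete, here's roughly the write loop I have in mind,
borrowing the writer from the DataStax post; the row key format and the
DataPoint/fetch plumbing are made up for illustration:

    import java.io.IOException;
    import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
    import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

    // Hypothetical value object for one sample pulled out of PostgreSQL.
    class DataPoint { long epochSeconds; double value; }

    class RowAtATimeLoader
    {
        // One row == one metric-year: emit the whole row before moving on,
        // so each row key is written exactly once per generated sstable.
        static void writeMetricYear(SSTableSimpleUnsortedWriter writer,
                                    String metric, int year,
                                    Iterable<DataPoint> points) throws IOException
        {
            long ts = System.currentTimeMillis() * 1000;  // microseconds
            writer.newRow(bytes(metric + ":" + year));    // hypothetical row key
            for (DataPoint p : points)                    // column name = sample time
                writer.addColumn(bytes(p.epochSeconds), bytes(p.value), ts);
        }
    }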

Any other tips to improve load time or reduce the load on the cluster
or subsequent compaction activity?  All my CFs I'll be writing to
use compression and leveled compaction.

Right now my Cassandra data store has about 4 months of data and we
have 5 years of historical (not sure yet how much we'll actually load,
but minimally 1 year's worth).

Thanks!

-- 
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
carpe diem quam minimum credula postero