Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter
On 28/08/2012, at 6:10 AM, Aaron Turner <synfina...@gmail.com> wrote:

> dataset... just under 4 months of data is less than 2GB! I'm pretty thrilled.

Be thrilled by all the compression! :)

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter
> After thinking about how sstables are done on disk, it seems best (required??) to write out each row at once.

Sort of. We only want one instance of the row per SSTable created.

> Any other tips to improve load time or reduce the load on the cluster or subsequent compaction activity?

Fewer SSTables means less compaction. So go as high as you can on the bufferSizeInMB param for the SSTableSimpleUnsortedWriter.

There is also a SSTableSimpleWriter. Because it expects rows to be ordered it does not buffer and can create bigger sstables.
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java

> Right now my Cassandra data store has about 4 months of data and we have 5 years of historical

Ingest all the histories!

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/08/2012, at 12:56 PM, Aaron Turner <synfina...@gmail.com> wrote:
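Morton's point that fewer SSTables means less compaction follows from how the unsorted writer behaves: it buffers columns in memory and flushes a new sstable each time the buffer fills, so a larger bufferSizeInMB means fewer, larger sstables. A toy model of that flush behaviour (the class below is a hypothetical stand-in for illustration, not the real Cassandra writer):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of SSTableSimpleUnsortedWriter's buffering: columns accumulate
// per row key in memory, and a new "sstable" is flushed whenever the
// buffered bytes reach the configured threshold. Illustration only.
class BufferedWriterModel {
    private final long bufferSizeBytes;
    private final Map<String, List<byte[]>> buffer = new HashMap<>();
    private long buffered = 0;
    private int sstablesFlushed = 0;

    BufferedWriterModel(long bufferSizeInMB) {
        this.bufferSizeBytes = bufferSizeInMB * 1024 * 1024;
    }

    void addColumn(String rowKey, byte[] value) {
        buffer.computeIfAbsent(rowKey, k -> new ArrayList<>()).add(value);
        buffered += value.length;
        if (buffered >= bufferSizeBytes) flush();
    }

    private void flush() {
        // The real writer sorts the buffered rows and writes one sstable
        // to disk here; the model just counts how many flushes happen.
        buffer.clear();
        buffered = 0;
        sstablesFlushed++;
    }

    int close() {
        if (!buffer.isEmpty()) flush();
        return sstablesFlushed;
    }

    // Write the same 64MB of 1KB columns with a 16MB and a 64MB buffer.
    static int[] compareBufferSizes() {
        long[] bufferMB = {16, 64};
        int[] sstables = new int[bufferMB.length];
        byte[] column = new byte[1024]; // 1KB per column
        for (int i = 0; i < bufferMB.length; i++) {
            BufferedWriterModel w = new BufferedWriterModel(bufferMB[i]);
            for (int c = 0; c < 64 * 1024; c++)  // 64MB of data in total
                w.addColumn("metric-" + (c % 100), column);
            sstables[i] = w.close();
        }
        return sstables;
    }
}
```

For the same 64MB of column data, the 16MB buffer flushes four sstables while the 64MB buffer flushes one, which is why raising bufferSizeInMB (memory permitting) reduces the compaction work the cluster has to do after the load.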
Re: optimizing use of sstableloader / SSTableSimpleUnsortedWriter
On Mon, Aug 27, 2012 at 1:19 AM, aaron morton <aa...@thelastpickle.com> wrote:

>> After thinking about how sstables are done on disk, it seems best (required??) to write out each row at once.
>
> Sort of. We only want one instance of the row per SSTable created.

Ah, good clarification, although I think for my purposes they're one and the same.

>> Any other tips to improve load time or reduce the load on the cluster or subsequent compaction activity?
>
> Fewer SSTables means less compaction. So go as high as you can on the bufferSizeInMB param for the SSTableSimpleUnsortedWriter.

Ok.

> There is also a SSTableSimpleWriter. Because it expects rows to be ordered it does not buffer and can create bigger sstables.
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java

Hmmm, probably not realistic in my situation... doing so would likely thrash the disks on my PG server a lot more and kill my read throughput, and that server is already hitting a wall.

>> Right now my Cassandra data store has about 4 months of data and we have 5 years of historical
>
> Ingest all the histories!

Actually, I was a little worried about how much space that would take... my estimate was ~305GB/year, which is a lot when you consider the 300-400GB/node limit (something I didn't know about at the time). However, compression has turned out to be extremely efficient on my dataset... just under 4 months of data is less than 2GB! I'm pretty thrilled.

--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
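The capacity numbers above work out to roughly a 50x compression ratio. A back-of-envelope sketch, assuming the ~305GB/year raw estimate and the observed ~2GB per 4 months both scale linearly (numbers from this thread, nothing measured):

```java
// Back-of-envelope capacity math from the thread's own numbers:
// ~305 GB/year estimated raw, ~2 GB observed on disk for ~4 months.
class CapacityEstimate {
    // Returns { effective compression ratio, projected 5-year on-disk GB }.
    static double[] estimate() {
        double rawPerYearGB = 305.0;
        double observedGB = 2.0;
        double observedMonths = 4.0;
        double rawObservedGB = rawPerYearGB * observedMonths / 12.0; // ~101.7 GB raw
        double compressionRatio = rawObservedGB / observedGB;        // ~51x
        // Projected on-disk size for the full 5-year history, assuming
        // the observed ratio holds across the whole dataset.
        double fiveYearOnDiskGB = rawPerYearGB * 5.0 / compressionRatio; // 30 GB
        return new double[] { compressionRatio, fiveYearOnDiskGB };
    }
}
```

At that ratio even the full 5-year history would land around 30GB on disk, comfortably under the 300-400GB/node guideline mentioned above.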
optimizing use of sstableloader / SSTableSimpleUnsortedWriter
So I've read: http://www.datastax.com/dev/blog/bulk-loading

Are there any tips for using sstableloader / SSTableSimpleUnsortedWriter to migrate time series data from our old datastore (PostgreSQL) to Cassandra?

After thinking about how sstables are done on disk, it seems best (required??) to write out each row at once. I.e., if each row == 1 year's worth of data and you have say 30,000 rows, write one full row at a time (a full year's worth of data points for a given metric) rather than 1 data point for 30,000 rows.

Any other tips to improve load time or reduce the load on the cluster or subsequent compaction activity? All the CFs I'll be writing to use compression and leveled compaction.

Right now my Cassandra data store has about 4 months of data and we have 5 years of historical (not sure how much we'll actually load yet, but minimally 1 year's worth).

Thanks!

--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
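The "write one full row at a time" intuition can be illustrated with a toy model: each flush of a bulk writer's buffer produces one sstable, and we count how many sstables a single row key ends up scattered across under each write order. The classes and constants below are hypothetical stand-ins for illustration, not the real Cassandra API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model contrasting write orders when bulk-loading time series.
// Every FLUSH_EVERY columns, the buffer is flushed as one "sstable";
// we track which sstables each row key's columns land in.
class WriteOrderModel {
    static final int ROWS = 100;          // metrics (rows)
    static final int POINTS = 1000;       // data points per metric
    static final int FLUSH_EVERY = 5000;  // columns per flush (buffer size)

    // Returns the max number of sstables any single row is spread across.
    static int maxFragments(boolean rowAtATime) {
        Map<String, Set<Integer>> rowToSstables = new HashMap<>();
        int written = 0;
        if (rowAtATime) {
            // All points for one metric, then the next metric.
            for (int r = 0; r < ROWS; r++)
                for (int p = 0; p < POINTS; p++)
                    written = record(rowToSstables, "metric-" + r, written);
        } else {
            // One point per metric, cycling through every metric each step.
            for (int p = 0; p < POINTS; p++)
                for (int r = 0; r < ROWS; r++)
                    written = record(rowToSstables, "metric-" + r, written);
        }
        int max = 0;
        for (Set<Integer> s : rowToSstables.values())
            max = Math.max(max, s.size());
        return max;
    }

    static int record(Map<String, Set<Integer>> m, String key, int written) {
        int sstable = written / FLUSH_EVERY; // which flush this column lands in
        m.computeIfAbsent(key, k -> new HashSet<>()).add(sstable);
        return written + 1;
    }
}
```

Writing row-at-a-time keeps each row's columns contiguous in the buffer, so every row lands in a single flushed sstable; interleaving one point per row scatters every row across all 20 flushes, leaving compaction to reassemble each row afterwards.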