On Mon, Aug 27, 2012 at 1:19 AM, aaron morton <aa...@thelastpickle.com> wrote:

> After thinking about how sstables are done on disk, it seems best
> (required??) to write out each row at once.
>
> Sort of. We only want one instance of the row per SSTable created.
Ah, good clarification, although I think for my purposes they're one and the
same.

> Any other tips to improve load time or reduce the load on the cluster
> or subsequent compaction activity?
>
> Less SSTables means less compaction. So go as high as you can on the
> bufferSizeInMB param for the SSTableSimpleUnsortedWriter.

Ok. I've pasted a rough sketch of what that looks like on my end at the
bottom of this mail.

> There is also a SSTableSimpleWriter. Because it expects rows to be ordered
> it does not buffer and can create bigger sstables.
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableSimpleWriter.java

Hmmm.... probably not realistic in my situation... doing so would likely
thrash the disks on my PG server a lot more and kill my read throughput,
and that server is already hitting a wall.

> Right now my Cassandra data store has about 4 months of data and we
> have 5 years of historical
>
> ingest all the histories!

Actually, I was a little worried about how much space that would take... my
estimate was ~305GB/year (call it roughly 100GB for these four months), which
is a lot when you consider the 300-400GB/node limit (something I didn't know
about at the time). However, compression has turned out to be extremely
efficient on my dataset... just under 4 months of data is less than 2GB!
I'm pretty thrilled.
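For reference, here's roughly what my loader's writer setup looks like with a
bigger buffer, per the advice above. This is only a sketch against the 1.1-era
SSTableSimpleUnsortedWriter constructor; the keyspace/column family names,
output path, partitioner, comparator, and the 256MB buffer are placeholders
rather than my real schema:

import java.io.File;
import java.io.IOException;

import org.apache.cassandra.db.marshal.UTF8Type;
import org.apache.cassandra.dht.RandomPartitioner;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;
import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

public class BulkWriterSketch
{
    public static void main(String[] args) throws IOException
    {
        // Buffers rows in memory and flushes a new sstable whenever roughly
        // bufferSizeInMB of serialized data has accumulated, so the input
        // doesn't have to arrive in token order.
        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                new File("/tmp/bulkload/MyKeyspace/Metrics"), // output dir, must already exist
                new RandomPartitioner(),
                "MyKeyspace",      // keyspace (placeholder)
                "Metrics",         // column family (placeholder)
                UTF8Type.instance, // column name comparator
                null,              // not a super column family
                256);              // bufferSizeInMB: bigger buffer => fewer, larger sstables

        long timestamp = System.currentTimeMillis() * 1000; // microseconds

        // Write all of a row's columns together before moving to the next key,
        // so each sstable only ends up with one instance of the row.
        writer.newRow(bytes("device-1234"));
        writer.addColumn(bytes("2012-08-27T01:19"), bytes("42.0"), timestamp);

        writer.close(); // flushes whatever is still buffered
    }
}

That last constructor arg is the knob Aaron mentioned: each time the buffer
fills, the writer flushes a complete sstable, so the bigger I can make it
(heap permitting), the fewer sstables I end up streaming into the cluster and
the less compaction it has to do afterwards.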